Combining fuzzy grouping results
Dear all,
I'm using the fuzzy grouping task to de-duplicate customers. It seems that we may need to run 2 fuzzy groupings and then combine the results:
1. Fuzzy grouping on customer title, last name and first name
2. Fuzzy grouping on address1, address2, address3, city, zip and country
I've configured this and the results are placed into 2 separate tables. Now I'd like to combine the 2 sets (i.e. fuzzy grouping on name or address) to show 1 consolidated list of duplicate records and the record which should be used instead.
Does anyone have any suggestion or experience on how to do this?
Thanks,
Matt
May 10th, 2011 6:05pm
There should be no reason why you can't achieve this with just one Fuzzy Grouping task.
Does your source table have a primary key, i.e. something like CustomerID? If so, are you storing the CustomerID with the data being stored in each of the tables?
Jeff Wharton MSysDev (C.Sturt), MDbDsgnMgt (C.Sturt)
May 10th, 2011 8:12pm
Jeff,
Thanks for your reply.
The source table has CustomerID and StoreID. It is the combination of these that makes the customer record unique. It also has a surrogate key for the purposes of Analysis Services.
Having studied the table and looked at where the duplicate records are, it seems to me that we need the fuzzy grouping to group records where it finds a high similarity on Title + FirstName + LastName OR a high similarity on Address1 + Address2 + Address3 + Zip etc.
The data has come from a point of sale system, so you can imagine what the quality of it is like!
Matt
May 11th, 2011 6:02am
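For reference, one possible way to get the "name OR address" behaviour from two separate result tables is to treat every (record, canonical record) row as a link and merge the links transitively. The sketch below is only an illustration in Python, not something taken from the thread: the function name `combine_groupings` and the sample rows are hypothetical, and it assumes each row from both tables has already been mapped back to its (CustomerID, StoreID) pair, since the `_key_in` / `_key_out` values that Fuzzy Grouping generates are not comparable between two separate runs. Choosing the smallest key as the surviving record is also just an assumption; a real rule might prefer the most complete or most recent record.

```python
def combine_groupings(*groupings):
    """Merge fuzzy-grouping results with OR semantics.

    Each grouping is an iterable of (record_key, canonical_key) pairs,
    e.g. one pair per row of a fuzzy grouping result table.
    Returns a dict mapping every record key to its surviving key.
    """
    parent = {}

    def find(key):
        # Walk up to the root of the group, then compress the path.
        parent.setdefault(key, key)
        root = key
        while parent[root] != root:
            root = parent[root]
        while parent[key] != root:
            parent[key], key = root, parent[key]
        return root

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Assumption: the smallest key survives, so the result is deterministic.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    for grouping in groupings:
        for record_key, canonical_key in grouping:
            union(record_key, canonical_key)

    return {key: find(key) for key in parent}


# Hypothetical sample rows, keyed by (CustomerID, StoreID).
by_name = [((1, 10), (1, 10)), ((2, 10), (1, 10)),
           ((3, 10), (3, 10)), ((4, 10), (4, 10))]
by_address = [((1, 10), (1, 10)), ((2, 10), (2, 10)),
              ((3, 10), (2, 10)), ((4, 10), (4, 10))]

combined = combine_groupings(by_name, by_address)
```

Here record (2, 10) matches (1, 10) on name and (3, 10) matches (2, 10) on address, so all three end up under (1, 10), while (4, 10) stays on its own. That chaining is the main thing to watch with this approach: a weak name match and a weak address match can pull unrelated customers into one group, so it may be worth filtering each result table on its similarity score before combining.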