Lookup transform
One suggestion from a MSDN blog is:
"[SCD]: for large number of rows that will not exist in the dimension table, consider using a lookup before the SCD. "
http://blogs.msdn.com/b/ashvinis/archive/2005/09/27/474563.aspx
But I don't konw the reason. Why we need to add another Lookup? What's the internal action in SCD? For example, SCDII, it will check if the record exists in destination, and then update or insert. But how doesn't SCD
check the record?
Please let me konw your idea.
November 13th, 2010 6:15am
The default SCD transform that comes with the SSIS stack is a row by row transform. In order to find which rows are new and which already exist Lookup transform is much faster . With SSIS 2008 you can use Cache Transform . But I will not suggest any of these..
I use http://kimballscd.codeplex.com/.
If you cannot install it on your machine due to some policy then all you can use T-SQL to find the Type-1 , Type -2 and other types if you have and dump them in separate tables . Use a SET based operation to UPDATE or INSERT in destination table.
If you are using SQL 2008 then you might find that the MERGE statement in SQL is the best way to get this done in one single statement.
There is another thing which I wanted to mention here...Lets say you have 5 scd-2 attributes in the dimension table. Now if you try to do a JOIN checking each of these attributes the statement will be very slow.
What you can do is create a column in the source table (staging) and destination (warehouse dimension table) and take all these 5 attributes and compute a hash value using some Hashing algorithm , lets say SHA1 . Now you have hash equivalent for these 5
attributes in one column so all you need to do it to just check for this column .... saves your JOIN ..
have a look here http://anastasiosyal.com/archive/2008/04/23/quick-tip-use-hashbytes-to-create-a-hash-in-tsql.aspx
HTHAshwani Roy
Blog
Twitter
Please click the Mark as Answer button if a post solves your problem!
Free Windows Admin Tool Kit Click here and download it now
November 13th, 2010 9:03am
The default SCD transform that comes with the SSIS stack is a row by row transform. In order to find which rows are new and which already exist Lookup transform is much faster . With SSIS 2008 you can use Cache Transform . But I will not suggest any of these..
I use http://kimballscd.codeplex.com/.
If you cannot install it on your machine due to some policy then all you can use T-SQL to find the Type-1 , Type -2 and other types if you have and dump them in separate tables . Use a SET based operation to UPDATE or INSERT in destination table.
If you are using SQL 2008 then you might find that the MERGE statement in SQL is the best way to get this done in one single statement.
There is another thing which I wanted to mention here...Lets say you have 5 scd-2 attributes in the dimension table. Now if you try to do a JOIN checking each of these attributes the statement will be very slow.
What you can do is create a column in the source table (staging) and destination (warehouse dimension table) and take all these 5 attributes and compute a hash value using some Hashing algorithm , lets say SHA1 . Now you have hash equivalent for these 5
attributes in one column so all you need to do it to just check for this column .... saves your JOIN ..
have a look here http://anastasiosyal.com/archive/2008/04/23/quick-tip-use-hashbytes-to-create-a-hash-in-tsql.aspx
HTHAshwani Roy
Blog
Twitter
Please click the Mark as Answer button if a post solves your problem!
November 13th, 2010 9:03am
Thank you Ashwani. It sounds greate, I will try the kimballscd. The imformation is very useful!
Before closing this thread, I still have two questions.
1) Regarding that suggestion in MSDN blog, since I use a Lookup, then I don't need to use SCD any more, right? I can simply use Lookup and conditional split to split the data for updating and inserting. I mean I don't need to use SCD, so why he suggested
us do a lookup before SCD?
2) The link:
http://kimballscd.codeplex.com/wikipage?title=SCD%20Wizard%20Disadvantages&referringTitle=Home mentioned: "the Wizard doing an uncached RBAR round-trip lookup for every row of the source. " What's that? I created a trace but I still cannot find
the internal action of SCD wizard. (Perhaps only the member from SSIS team can better answer this quesiton. So, if this is a silly question, please simply ignore it). Thanks.
Free Windows Admin Tool Kit Click here and download it now
November 14th, 2010 7:15am
Thank you Ashwani. It sounds greate, I will try the kimballscd. The imformation is very useful!
Before closing this thread, I still have two questions.
1) Regarding that suggestion in MSDN blog, since I use a Lookup, then I don't need to use SCD any more, right? I can simply use Lookup and conditional split to split the data for updating and inserting. I mean I don't need to use SCD, so why he suggested
us do a lookup before SCD?
2) The link:
http://kimballscd.codeplex.com/wikipage?title=SCD%20Wizard%20Disadvantages&referringTitle=Home mentioned: "the Wizard doing an uncached RBAR round-trip lookup for every row of the source. " What's that? I created a trace but I still cannot find
the internal action of SCD wizard. (Perhaps only the member from SSIS team can better answer this quesiton. So, if this is a silly question, please simply ignore it). Thanks.
November 14th, 2010 7:15am
1. Yes you dont really need a SCD you can do the DML using SQL there . I am not sure but it was written in 2005 and it is more like a suggestion I guess I would not pay too much attention to it.
2. The issue with SDC transform is it is row by row and does not work really well when the number of rows / data in destination table is big ... but I am not sure that does the "Uncaching " means here...
In short never use the default SCD transform that comes with the SSIS package... By the way I would also say transforms like OLEDB command are also row by row and really perform bad...
If you want to get some tried and tested pattern and best practices look at SQL CAT blog before any other http://sqlcat.com/ .. Ashwani Roy
Blog
Twitter
Please click the Mark as Answer button if a post solves your problem!
Free Windows Admin Tool Kit Click here and download it now
November 14th, 2010 4:11pm