SSIS 2005 lookup with a largish data set
Hi all -
I'm doing things horribly wrong and need a push in the right direction!
I am trying to load a smallish amount of data (10k-600k rows) from a raw file into SQL Server (the destination is an OLE DB Destination). My destination tables range from 100k to 100 million rows. By random occurrence, some of the records in that file may already
exist in the table - at which point I don't really care; ignoring the error is fine. I do need to retain the identities on the table, so I'm doing a table/fast load insert.
I have a master package calling some child packages, and I need the child ones to return success so the master doesn't crash.
I have "ForcedExecutionResult" = Success, "ForcedExecutionValue" = True on the child package and it's components ... and it still returns as failure to the master package instead of success as I would assume that it should?
OK, so let's say my table has records a, b, c in it and the file has records c, d, e in it. If I just had a simple Raw File -> OLE DB Destination data flow, that would crash, since c is in both the source and the destination. If I say "ignore error",
then I lose records d and e.
I can't delete record c because of FK reasons.
So if I say "OK! Add a lookup transformation to make sure we only insert rows that are new", that adds a lot of overhead - especially when my lookup table has millions of rows in it.
I can't do a row-by-row insert and catch the bad values that way, because I need to retain the identity values (unless I'm missing a trick to do that).
Ideally I'd like the data flow not to error and to act like the lookup component's functionality, where I just insert the new records regardless of whether there are dupes between the raw file and the table. But if I can only get the component to fail and report success so the rest of the operations
can go on, that'd be fine with me too.
I haven't worked much with the SQL Server Destination. Would that be the key to what I'm looking for if I just set the max error count to 100000? Would that still allow my package to hit the error handler I have associated with it, or would it completely ignore that
and report success? Would it load faster than a fast load from the OLE DB Destination?
This is SSIS 2005, so no cache tricks are available.
Thoughts?
Thanks!
November 6th, 2010 2:06pm
Yes, your assumption is correct, please see this post for some information:
http://www.tbig.com.au/forums/bi-monkey/3420-preventing-sql-agent-jobs-calling-ssis-reporting-failure.html
Why not:
1) load the rows from the file into a staging table;
2) Insert the data from this staging table into the target using SQL, e.g.
INSERT INTO MyTargetTable
SELECT s.* FROM tbStaging s
WHERE NOT EXISTS (SELECT 1 FROM MyTargetTable t
                  WHERE t.MyColumn1 = s.MyColumn1 AND t.MyColumn2 = s.MyColumn2)
or you can use a join to achieve the same (see the sketch after this list);
(use an Execute SQL Task)
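A minimal sketch of the join variant, assuming the same staging table tbStaging and that MyColumn1/MyColumn2 make up the business key (swap in your actual key columns):

-- Sketch only: table and column names are placeholders carried over from the example above.
INSERT INTO MyTargetTable
SELECT s.*
FROM tbStaging s
LEFT JOIN MyTargetTable t
    ON t.MyColumn1 = s.MyColumn1
   AND t.MyColumn2 = s.MyColumn2
WHERE t.MyColumn1 IS NULL   -- keep only the staging rows that have no match in the target

If the target has an identity column and you need to keep the file's identity values, you would also need SET IDENTITY_INSERT MyTargetTable ON (with an explicit column list in the INSERT) around this statement.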
An extra source of info:
http://stackoverflow.com/questions/726448/using-sql-server-ce-possible-to-insert-only-if-not-exists
Arthur - My Blog
November 6th, 2010 9:25pm
What I would do is make just a few small alterations in your inserting package(s). DON'T use the SQL Server Destination - that won't get you anything.
Reduce the Destination's "batch size" from zero (all rows) to something "reasonable" (you pick - experiment a little).
Redirect errors from the Destination to another Destination, configured identically - except for having a batch size of 1 (one).
What will happen:
You'll get some buffers input into your first Destination, until you reach the "batch size" you've set on it. At that point, the Destination will actually try the insert operation. If there's no duplicate in the batch, the insert will succeed.
If there is a duplicate, all the rows from that batch will be sent to the error output (no inserts will have succeeded). The second Destination will then try a (slower) row-by-row insert. Only the specific rows that were duplicates will cause errors
and get dumped out the error output for you to collect/report on/whatever.
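For intuition on why a whole batch lands on the error output, here's a tiny T-SQL illustration (the temp table and values are made up for the example, not taken from your package):

CREATE TABLE #Target (Id INT PRIMARY KEY)
INSERT INTO #Target (Id) SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3   -- rows already in the table

-- Batch-style attempt: 3 is a duplicate, so the whole statement fails and
-- 4 and 5 are not inserted either - this is the batch that gets redirected.
INSERT INTO #Target (Id) SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5

-- Row-by-row attempt: only the duplicate fails, 4 and 5 go in.
INSERT INTO #Target (Id) SELECT 3   -- fails (PK violation)
INSERT INTO #Target (Id) SELECT 4   -- succeeds
INSERT INTO #Target (Id) SELECT 5   -- succeeds

The second Destination's batch size of 1 gives you exactly that row-by-row behavior, at the cost of speed, but only for the batches that actually contained a duplicate.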
November 7th, 2010 11:09am