SSIS Update only subset of rows
I have an UPDATE tasks in the package. We will have an issue if these run as is against our production database. The issue is that due to the size of
these two tables being updated, these UPDATE statements will fill the transaction log. Since a given UPDATE statement is an atomic transaction, each row that is modified by the UPDATE statement will be written to the transaction until all the rows affected
by the UPDATE statement have been modified.
As such, you'll need to execute the UPDATE in a loop, updating only a subset of rows (say 500,000) with each update statement to avoid filling up the transaction log.
The Update Statement :
UPDATE Table1 SET Table1.Field1 = StagingTable.Field1 FROM Table1 INNER JOIN StagingTable ON Table1.Field2 = StagingTable.Field2
Any ideas ? Thanks
May 5th, 2011 11:11am
In loop or not, how the Transaction log will not be filled?
And what kind of a database space constraint you have so you cannot update data, it is already an un-healthy environment, beware.Arthur My Blog
Free Windows Admin Tool Kit Click here and download it now
May 5th, 2011 11:33am
You might write script which loop and update only 1000-2000 rows each iteration, commit data before pull another set that way you can avoid fill up this transactional log.
We have a similary kind of problem to update 5million rows every day - for that we implimented process which pull 5000 rows update and commit before pull next 5000 row set.
Hope this helps you.
If this is a data warehouse (not OLTP) then you can change recovery model
simple before doing this updates and revert it back full
once you finish this process.
http://uk.linkedin.com/in/ramjaddu
May 5th, 2011 11:33am
Here is an example on how to do an update in batches:
http://www.codeproject.com/KB/aspnet/BatchUpdate.aspxArthur My Blog
Free Windows Admin Tool Kit Click here and download it now
May 5th, 2011 11:36am
Thanks RamJaddu and Arthur !!
But i think writing it all in T-SQl would be nice to avoid those round trips, lets say we are updating 5000 row for an iteration with over 680 million rows between those two tables, then it will take 136,000 round trips.
Could you suggest me how to write it all in a T-SQL ?
here is my update statement
UPDATE Table1 SET Table1.Field1 = StagingTable.Field1 FROM Table1 INNER JOIN StagingTable ON Table1.Field2 = StagingTable.Field2
Thanks again !!
May 5th, 2011 12:28pm
I do not think there are round trips involved, all the SQL gets executed on the SQL Server anyways.Arthur My Blog
Free Windows Admin Tool Kit Click here and download it now
May 5th, 2011 2:10pm
You did not specify what is vendor of your database is that you are trying to update but if you want to learn something new in T-SQL you can use the MERGE command to do that update on SQL Server 2005 and up as described here:
http://charliedigital.com/2009/06/05/t-sql-merge-my-new-friend/Arthur My Blog
May 5th, 2011 2:13pm
I know that you have a Logging issue
Just heads-up you may need to some tunning check
http://blogs.msdn.com/b/mattm/archive/2010/06/29/performance-tuning-ssis-video.aspx
check it all or just jump to 33:30sec and see what its saying about updates in SSIS
story short 4:30Sec using a UPDATE in ssis becomes 1s using a different technique, anyways nice to check that
Good luck
Nik
Sincerely SH -- MVP, MCITP 2008, MCTS 2008 & 2005 -- Please kindly mark the post(s) that answered your question and/or vote for the post(s).
Free Windows Admin Tool Kit Click here and download it now
May 5th, 2011 4:15pm
Hi Can anyone suggest how to do it
I mean how to modify the update query in sql script task for batch update ?
Lets say for this statement :
The Update Statement :
UPDATE Table1 SET Table1.Field1 = StagingTable.Field1 FROM Table1 INNER JOIN StagingTable ON Table1.Field2 = StagingTable.Field2
Sorry if that sounds silly ...Im just learning
Thank You :)
May 5th, 2011 6:45pm
Assuming that Field2 is a sequential integer and that there's one row per value, here's some code to do the next 5,000 rows.
--Find out how the point that has already been updated
DECLARE @Field2AlreadyUpdated INTEGER
SELECT @Field2AlreadyUpdated=MAX(Table1.Field2)
FROM Table1
WHERE Table1.Field2 is not null -- Assuming your field is null until updated
-- Update only the next 5,000 rows
UPDATE Table1
SET Table1.Field1 = StagingTable.Field1
FROM Table1
INNER JOIN StagingTable ON Table1.Field2 = StagingTable.Field2
WHERE StagingTable.Field2 Between @Field2AlreadyUpdated + 1 and @Field2AlreadyUpdated + 5000
Free Windows Admin Tool Kit Click here and download it now
May 5th, 2011 10:16pm
I am assuming table TABLE1 column value null until updated
Use SET ROWCOUNT
---- some thing like this
While (select count(1) from Table1 where Field1 is null) > 0
begin
begin tran
SET ROWCOUNT 1000; ---- this restrict below update to 1000 rows
UPDATE Table1
SET Table1.Field1 = StagingTable.Field1
FROM Table1 INNER JOIN StagingTable
ON Table1.Field2 = StagingTable.Field2
Where Table1.Field1 is null
If @@ERROR = 0
Commit tran
Else
Rollback tran
end
http://uk.linkedin.com/in/ramjaddu
May 6th, 2011 7:50am
Thanks a lot Ramjaddu
Can you please modify the following update statement for better understanding !! i was in military for a while and just changed job to software
update cds
set [mailing_history] = cdst.[mailing_history],
[mailing_history_initial] = cdst.[mailing_history_initial],
[mailing_history_followup] = cdst.[mailing_history_followup]
from dbo.campaign_detail_static cds
inner join dbo.campaign_detail_static_temp cdst
on cdst.campaign_code_skey = cds.[campaign_code_skey]
and cdst.sequence_nr = cds.sequence_nr
Thank You :)
Learning bee
Free Windows Admin Tool Kit Click here and download it now
May 6th, 2011 11:57am