Group by query on a huge table
BTW - I think you should post this question in the T-SQL forum. A SQL to SQL table copy like this is best done all inside the DB engine, so your first preference should be to solve this tempDB size issue.
Assuming you are or have sought advice on a pure T-SQL solution, we can investigate SSIS ones... can your server handle SELECTing with an ORDER BY on those columns?
If yes, then you can issue that ORDERed SELECT as a source in SSIS, then use an asynchronous script to collate running totals.
Talk to me now on
May 29th, 2012 6:05pm
Hi everybody,
I want to aggregate the data of one of my tables (5 billions of records) and put it in a target table.
I tried to do it within a SQL query but I reach my tempdb limit such as:
insert into target_table
select log_date, site_id,server_id,sum(queries),sum(volumes)
from source_table
group by log_date, site_id,server_id
I would like to know if there is a way to do it progressively with SSIS for instance.
Thanks Pete
Free Windows Admin Tool Kit Click here and download it now
June 3rd, 2012 4:54pm
You can probably aggregate in a loop based on date ranges e.g a week. Then accumulate the running totals in temp variables, and then further group from there.
Perhaps shredding the recordset may work: http://www.sqlis.com/post/Shredding-a-Recordset.aspxArthur My Blog
June 3rd, 2012 5:07pm
BTW - I think you should post this question in the T-SQL forum. A SQL to SQL table copy like this is best done all inside the DB engine, so your first preference should be to solve this tempDB size issue.
Assuming you are or have sought advice on a pure T-SQL solution, we can investigate SSIS ones... can your server handle SELECTing with an ORDER BY on those columns?
If yes, then you can issue that ORDERed SELECT as a source in SSIS, then use an asynchronous script to collate running totals.
Talk to me now on
Free Windows Admin Tool Kit Click here and download it now
June 3rd, 2012 6:27pm
Instead of doing grouping on billions of records in one go, you can also do it by site_id + server_id, assuming you don't have millions of sites and servers.
insert into target_table
select log_date, sum(queries),sum(volumes)
from source_table where site_id = ? and server_id = ?
group by log_date
Randy Aldrich Paulo
MCTS(BizTalk 2010/2006,WCF NET4.0), MCPD |
My Blog
BizTalk Message Archiving - SQL and File
Automating/Silent Installation of BizTalk Deployment Framework using Powershell >
Sending IDOCs using SSIS
June 4th, 2012 3:40am


