Group by query on a huge table
BTW - I think you should post this question in the T-SQL forum. A SQL to SQL table copy like this is best done all inside the DB engine, so your first preference should be to solve this tempDB size issue. Assuming you are or have sought advice on a pure T-SQL solution, we can investigate SSIS ones... can your server handle SELECTing with an ORDER BY on those columns? If yes, then you can issue that ORDERed SELECT as a source in SSIS, then use an asynchronous script to collate running totals. Talk to me now on
May 29th, 2012 6:05pm

Hi everybody, I want to aggregate the data of one of my tables (5 billions of records) and put it in a target table. I tried to do it within a SQL query but I reach my tempdb limit such as: insert into target_table select log_date, site_id,server_id,sum(queries),sum(volumes) from source_table group by log_date, site_id,server_id I would like to know if there is a way to do it progressively with SSIS for instance. Thanks Pete
Free Windows Admin Tool Kit Click here and download it now
June 3rd, 2012 4:54pm

You can probably aggregate in a loop based on date ranges e.g a week. Then accumulate the running totals in temp variables, and then further group from there. Perhaps shredding the recordset may work: http://www.sqlis.com/post/Shredding-a-Recordset.aspxArthur My Blog
June 3rd, 2012 5:07pm

BTW - I think you should post this question in the T-SQL forum. A SQL to SQL table copy like this is best done all inside the DB engine, so your first preference should be to solve this tempDB size issue. Assuming you are or have sought advice on a pure T-SQL solution, we can investigate SSIS ones... can your server handle SELECTing with an ORDER BY on those columns? If yes, then you can issue that ORDERed SELECT as a source in SSIS, then use an asynchronous script to collate running totals. Talk to me now on
Free Windows Admin Tool Kit Click here and download it now
June 3rd, 2012 6:27pm

Instead of doing grouping on billions of records in one go, you can also do it by site_id + server_id, assuming you don't have millions of sites and servers. insert into target_table select log_date, sum(queries),sum(volumes) from source_table where site_id = ? and server_id = ? group by log_date Randy Aldrich Paulo MCTS(BizTalk 2010/2006,WCF NET4.0), MCPD | My Blog BizTalk Message Archiving - SQL and File Automating/Silent Installation of BizTalk Deployment Framework using Powershell > Sending IDOCs using SSIS
June 4th, 2012 3:40am

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics