Parallel Vs Sequential Processing
Hi.
I have a SSIS package which reads 120 M records from a DB2 warehouse. The ware house has various tables for diff Geographies. around 7 tables.
The current design has 7 source adapters for DB2 , and union all followed by the required transofrmation. I am thinking of having one source adapter in for each loop and reading all the 7 tables sequentially... I could not test the second approach because
, i wanted to test production sized data and they do updates on week ends.
Which one will give me the best results in loading those many records in terms of execution time. ? ( is the 1st approach a parallel processing)..Does SSIS enforces parallelism when i put more then one source comp in a data flow.
Please help me understand this..
November 7th, 2010 8:02pm
Ouch! How long does it take to run now?
I'm often skeptical about the value of parellelism, since even one busy thread might max out your IO channels. OTOH, in my company's environment when we get on the WAN we get throttled connections each maxed out at 1mbs, so using seven connections
in parallel would probably win.
IOW, likely the answer is, "it depends!"
Hope you will post the results as you try them out.
Thanks.
Josh
Free Windows Admin Tool Kit Click here and download it now
November 7th, 2010 8:46pm
Parallelism is one of the most common design patterns that is used to increase the performance of an SSIS workload, like many other Software solutions.
Having more than 1 data source in a data flow or having more than 1 data source where each loads 1 table (there shoulb be NO connection between the data flows) will force SSIS to run in parallel.
Meanwhile, you should be aware of the followings.
* Is there any dependency among the 7 tables. I mean is there any reference amon gthe tables that you need to take care during loading?
* Take care of EngineThreads property of the data flow task, you may need to increase the number.
* Do you have enough resource (memory, cpu) on the SSIS Server? Especially if you run out of memory, you may see that parallelism performs worse than sequential execution.
* Do you have enough network bandwith? (you may try to look at network setting tricks.
http://sqlcat.com/top10lists/archive/2008/10/01/top-10-sql-server-integration-services-best-practices.aspx)
* DO you have enough resource on the Source Server (DB2). I mean are you allowed to create a big workload by executing in parallel on DB2?
These are the issues I have in my mind. There may be some more additions to this list.
Hope that helps,
Sedat
November 8th, 2010 5:46am