Suggestion required on SSIS package design approach
I need your suggestion on implementing an SSIS package.
Problem: We receive millions of records in a single flat file, covering multiple categories, like below:
ACC | 1 | new account |FirstName|LastName |address|
VEN|VenName|2,000
VEN|VenName|2,000
BUS|xcx|343|gfgfdg|ggdsg|
BUS|xcx|343|gfgfdg|ggdsg|
BUS|xcx|343|gfgfdg|ggdsg|
As shown, (1) we can identify each record by its record identifier (ACC, VEN, BUS); (2) the data is pipe delimited, but the number of columns varies by category.
Note: the file location needs to be passed in as a variable.
I need to write an SSIS package that reads the data from the file and loads it into the respective tables, viz. Account (ACC), Vendor (VEN), Business (BUS), etc.
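For reference, the per-line parsing this requires is straightforward: split on the pipe and treat the first field as the record identifier. A minimal sketch of that logic in Python (the helper name is mine, and an SSIS script object would express the same thing in C# or VB.NET):

```python
def parse_line(line):
    """Split a pipe-delimited record into its identifier and field list.

    Trailing pipes and newlines (as in the ACC/BUS sample rows) are
    stripped, and stray padding around fields is trimmed.
    """
    fields = [f.strip() for f in line.rstrip("|\n").split("|")]
    return fields[0], fields[1:]
```

For example, `parse_line("VEN|VenName|2,000")` yields `("VEN", ["VenName", "2,000"])`, which tells the package which destination table the remaining fields belong to.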
My approach for the solution in SSIS is: (1) Create a master package to check for the existence of the file in the specified location. (2) If the file exists, use a Script Task to split the file into multiple files dynamically, i.e. all ACC records go to Account.txt, and so forth. (3) Develop child packages for each category type (Account, Business, Vendor) that read the data from the files created in step 2. (4) Call all the child packages from the parent package.
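Step 2 (the Script Task that splits the source file) could be sketched roughly as below. This is an illustrative Python version of the logic only; in SSIS the Script Task would be C# or VB.NET, and the function name, output directory layout, and the fixed identifier list are my assumptions:

```python
import os

def split_by_identifier(source_path, out_dir, known_ids=("ACC", "VEN", "BUS")):
    """Write each record of a pipe-delimited file to <out_dir>/<ID>.txt,
    where <ID> is the record identifier in the first field."""
    handles = {}  # one output file handle per category, opened lazily
    try:
        with open(source_path) as src:
            for line in src:
                ident = line.split("|", 1)[0].strip()
                if ident not in known_ids:
                    continue  # unknown identifiers could instead go to a reject file
                if ident not in handles:
                    handles[ident] = open(os.path.join(out_dir, ident + ".txt"), "w")
                handles[ident].write(line)
    finally:
        for h in handles.values():
            h.close()
```

Each child package then points its flat-file connection at the corresponding `<ID>.txt` file, so every child sees a consistent column count.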
We could do this in a single package using a Conditional Split transformation; however, considering the complexity of the transformations involved in loading each category of record, and to avoid loading millions of rows into the memory pipeline, I have chosen the above solution.
I need your guidance/help in evaluating the above solution approach; please help me improve the design.
Thanks in advance
May 23rd, 2011 6:23am
Your approach won't perform well, because you loop through the records and load them one by one.
I strongly recommend you use a Script Component as Source.
With a few lines of C# or VB.NET script you can fetch data however you want from any line format in a flat file, and generate a single-format output or multiple outputs based on your destination logic.
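The multiple-outputs idea amounts to a single pass over the file that routes each row to a per-category output buffer. Sketched here in Python for illustration (a real Script Component would do this in C#/VB.NET by calling the add-row method of each output buffer; the function name and identifier list are my assumptions):

```python
def route_records(lines, identifiers=("ACC", "VEN", "BUS")):
    """One pass over the flat file, routing each row's fields to a
    per-category output -- the same idea as a Script Component as Source
    with one output per record type."""
    outputs = {ident: [] for ident in identifiers}
    for line in lines:
        fields = line.rstrip("|\n").split("|")
        ident = fields[0].strip()
        if ident in outputs:
            outputs[ident].append([f.strip() for f in fields[1:]])
    return outputs
```

Downstream transformations and destinations can then consume each output independently, with no intermediate files on disk.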
This is a sample of a Script Component as Source:
http://www.rad.pasfu.com/index.php?/archives/38-Script-Component-as-Source-SSIS.html
http://www.rad.pasfu.com
May 23rd, 2011 6:30am
Hi Reza Raad - Thanks for the prompt reply. As I mentioned in my query, I am also planning to use a Script Component to first write the rows to separate files (Acc.txt, Bus.txt, Ven.txt, etc.). Then I am planning to develop child packages that connect to these files and load the data into the respective destination tables. I am copying the steps again here:
My approach for the solution in SSIS is: (1) Create a master package to check for the existence of the file in the specified location. (2) If the file exists, use a Script Task to split the file into multiple files dynamically, i.e. all ACC records go to Account.txt, and so forth. (3) Develop child packages for each category type (Account, Business, Vendor) that read the data from the files created in step 2. (4) Call all the child packages from the parent package.
May 23rd, 2011 6:51am
You have an extra step in your approach: you load the records into memory using the script component, then export them to flat files, and after all that you load each file again into a data flow.
This may help the readability of your package, but it reduces performance, because you can do everything you want during the first load into memory. Instead of writing the results of the Script Component to different flat files, return them as different outputs from the Script Component (you can have multiple outputs there) and use those outputs as the source for the other transformations in the data flow task.
http://www.rad.pasfu.com
May 23rd, 2011 7:05am
Just give me your thoughts/feedback on my approach and evaluate my design. Regards
May 23rd, 2011 7:05am
Great, thanks Reza Raad! The only reason I am considering writing the data to multiple files is to separate the lengthy transformations for each type into separate packages, which increases package readability and also lets me assign each package to a separate SSIS developer.
Thanks
May 23rd, 2011 8:02am
I agree with you that this will improve readability, but it will cost you a lot of time and resources if your source file contains a huge amount of data. That is why I recommend the all-in-memory approach I suggested.
But if you insist on this approach, you can write the records to separate files; just note that your child packages should then do nothing but a Bulk Insert Task, to batch-load the data and raise performance. (This method applies only if you don't need any data transformation during your ETL scenario; if you do need to transform the data, then you must load those files into memory again with data flow sources, and that re-fetch of data into memory is not a good solution for a large ETL scenario.)
http://www.rad.pasfu.com
May 23rd, 2011 8:25am
Hi All,
I have a suggestion; I don't know whether it is helpful or not, let me know.
I worked on a similar problem with a fixed-width file, where I had header, detail, and footer records across multiple flat files, with a record count of around 100,000. My solution was a master package and a load child package. The master package checked whether the file existed; if yes, it called the load package, and once the load was complete it moved the file to an archive folder and sent a success email; otherwise it sent an informational "file does not exist" email. In the load package, I configured a Flat File Source and used a Conditional Split to separate the data three ways (header, detail, footer) using the SUBSTRING function and a Derived Column, configured to load into three different tables.
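The header/detail/footer split described above can be sketched like this (an illustrative Python version of the routing logic only; the single-character record-type layout, "H"/"D"/"F" codes, and function name are my assumptions, since the original post doesn't give the file layout):

```python
def split_fixed_width(lines):
    """Separate a fixed-width file into header/detail/footer buckets by
    the record-type code in the first character of each line -- the same
    test a Conditional Split would do with SUBSTRING([col], 1, 1)."""
    buckets = {"H": [], "D": [], "F": []}
    for line in lines:
        rec_type = line[:1]
        if rec_type in buckets:
            buckets[rec_type].append(line.rstrip("\n"))
    return buckets
```

In SSIS, each bucket then feeds its own Derived Column (to carve the fixed-width fields out of the line) and its own destination table.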
BB
May 23rd, 2011 9:29am