Building ODS(Operational Data Source)
I am Integrating 3 different database sources into an ODS. The ODS is going to have the exact tables and columns as the source system. I just tried to load one sample table (3 columns) with 2 million records, just to check the systems. It took around 30 minutes, I don’t want to go into asking you guys how to optimize it. My plan is to create a schema, and the table structure for all the sources in the ODS in the first week and once the structure is ready I will start loading data from one database source at a time Q1) Is there a golden rule or method to follow while creating the table structure or should I just start doing the create table scripts for all the tables in the ODS? Q2) Since the tables are relational in the source system, and my guess is that ODS need not have any relation among tables. So I don’t have to create any constraints (like PK, FK) while writing the create scripts. Correct me if I am wrong. Q3) I am going to use SSIS to load data, so how do I create the packages, should I have each package for each source or should I break up the tables, say 5 tables in one package. Does this affect the performance? It would be really great if you could answer these questions and also give me some tips/advice. I owe you guys. Thanks
November 18th, 2010 3:24pm

Hi Daniel, About the Q1) I just do the scripts. About the Q2) You're right. But if there exist PK on tables you should be able to improve performance using fast load on oledb destination component with Order hint. About the Q3) The SSIS have the capability of execute parallel task. I can recomend you create at least a dataflow for each table, doing this you can execute in a single package that gone to run the 5 dataflows at same time if you don´t configure precedence between componentes. To take the decision about isolate each table to a package, you should consider if in the future you want or need do the process only to a table. About ODS. Often, an ODS source mean that you build the data source view to cube from the relational/transactional system. What you mean with ODS? And of course... maybe somone can indicate you better.Vctor M. Snchez Garca (ES) (BI) Hope this help. Please vote if you find this posting was helpful. if this is an answer to your question, please mark it. http://bifase.blogspot.com | http://twitter.com/atharky
Free Windows Admin Tool Kit Click here and download it now
November 19th, 2010 8:01am

There are at least 30 tables in each source system. How can I create data flow task for each one of them. There must be something simpler I can do. ODS is something that can have all the source databases, it can be used for reporting and quering purpose rather than use the source database itself. And also ODS can be as a staging database, to build the data warehouse.
November 19th, 2010 8:26am

Yes, there is another way... but is more trickly. you can design a basic package as pattern, and from code (c#, vb.net) take this model package to add source, mapping the source, add destination and mapping it. Building dynamically a package for each table of source, reading the properties of each table and configure componentes from code. Is a big effort at beging but is a reusable solution Check http://www.sqllion.com/2009/05/ssis-programming-basic/ , http://msdn.microsoft.com/en-us/library/ms136025.aspx and http://ssisbi.com/building-ssis-packages-programmatically-part-1/ Víctor M. Sánchez García (ES) (BI) Hope this help. Please vote if you find this posting was helpful. if this is an answer to your question, please mark it. http://bifase.blogspot.com | http://twitter.com/atharky
Free Windows Admin Tool Kit Click here and download it now
November 19th, 2010 8:51am

Hmm. I don't think I would want to go that way.
November 19th, 2010 4:34pm

When you say you are integrating 3 different sources, are you doing anything to check the integrity? Or is it just the UNION ALL of all 3 sources without any regard to what gets stored. Depends on what you doing. If all sources have different data, then your database tables should have unique contraints on it. And say if you importing 'Table1' from all sources and inserting it into 'Table1', how would you make out which row came from which source. It'd be a good idea to have column like 'source' indicating its source. There also seems a performance issue there. You will have to look at what is the bottleneck. If network is the issue, you might not wanna do packages in parallel. You should also use checkpoints, so that the progress gets preserved. Also rethink on the purpose of this database. Like if its for reporting, do you really need all data or just summarized data would do. And if you need row level details, do you really need all the tables. Perhaps you should finalize reporting requirements first. If data is for cubing, you can dimensionalize the data. I'm only saying this, because looking at the big picture, I see no purpose of storing redundant data. (unless its for performance gains). Cheers,Press CTRL-ALT-DEL now for an IQ test.
Free Windows Admin Tool Kit Click here and download it now
November 20th, 2010 9:57pm

I importing all the data into the ODS. Same structure as the source. I appreciate your input. Thanks brother
December 10th, 2010 3:42pm

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics