Merge Join in SSIS 2005 Vs 2008
Hi,
I have a process which reads from 2 tables (using two connection managers) and "merge join" them to a third table by using left outer join.
This process is implemented on SSIS 2005 and SSIS 2008 identically, however, for some reason I get
less records at the third table at the 2008 process.
When implementing this merge join on Sql Server 2008 and Sql Server 2005 I get the same result as I got at the SSIS 2005 process - so it seems like the problem is at the SSIS 2008 process.
I've checked the merge join properties - they look similar on 2005 and 2008.
Does anyone knows if the 2008 merge join object should be configured differently than the 2005 merge join?
I have nulls on some the fields that connect the two tables - do you think it's got to do with it?
What else can I check?
Thanks!
David
November 23rd, 2010 1:04am
are the LEFT part of join and RIGHT part of join in both 2005 and 2008 same? they should be exactly same if you use LEFT OUTER JOIN typehttp://www.rad.pasfu.com
Free Windows Admin Tool Kit Click here and download it now
November 23rd, 2010 1:26am
I have nulls on some the fields that connect the two tables - do you think it's got to do with it?
What else can I check?
Thanks!
David
As Reza said the merge joins are the same in 2005 & 2008
You need to replace nulls with blanks or some other value. Nulls does not provide correct data for joins.
Tweet me..
November 23rd, 2010 1:39am
The Merge Joins for the two versions operate identically.
Typical issues with unexpected behaviour for Merge Joins are:
Mistakes made in the Sort components preceding the Merge Join. You may be sorting on something different than you think.
Misconfiguration of the Merge Join - the "join type." Make sure it's what you intend.
Misconfiguration of the Merge Join - the "join columns." If you've edited the sort order, it's possible that the Merge Join may not be merging on the entire sort key. Make sure that the columns you intend to be used as the join key have a checkmark
in the "Join Key" column in the Merge Join The Sort component wasn't used, the Source component requested sorted data - but the Source wasn't configured properly. If you're using an ORDER BY in your Source, make sure that the Advanced Editor shows the correct information in the SortKeyPosition
properties of the order by columns. The Sort component wasn't used, the Source component requested sorted data - but the two Source collations aren't compatible. Make sure that if you're doing this, the sort order of the two sources is compatible - this means the data should be sorted
the same way by both mechanisms.
Talk to me now on
Free Windows Admin Tool Kit Click here and download it now
November 23rd, 2010 5:08am
Hi Todd,
Thanks for your reply.
I've checked all the Typical issues you mentioned but couldn't see any problem.
It's very strange that the same "merge join" process at SSIS 2008 returns less rows than it's equivalent at SSIS 2005 - while they both use the exact same data sources (sql server 2005 table + Oracle table)
I discovered two interesting facts
1. The rows that don't pass through the SSIS 2008 "merge join" but DO pass through the SSIS 2005 "merge join"- contain in their join key both numbers and letter ( for example: "I-0008504"), while those that do pass contain only numbers
("0000001007"). This join key in configured as NVARCHAR(10) on both datasets.
Do you think it's got to do with a double-byte character set (DBCS)?
2. If I add "Multicast" object connected to an "OLE DB destination" object - between one of the datasets and the "merge join" - the problem is solved.
i.e the 2008 "merge join" joins the exact rows number as the 2005 "merge join" does!
I know this sounds strange but it works!
Although the Multicat has nothing to do with the inputs the "merge join" object receives.
By the way - This only happens when I use "Table or view" option on the Data Access Mode of the "OLE DB destination" object.
If I choose "Table or view - fast load" - the problem remains
Did anyone encounter a similarr problem or can offer a direction?
Thanks Again,
David
November 24th, 2010 3:44pm
Whoa. That really sounds weird - especially the mangling of the column values, and the success you saw with the Multicast.
Have you tried deleting the Merge Join and dropping a new one on the surface?
Even if that works, if you can possibly extract a "safe" sample data set from your sources, store it in RAW files, and repro the issue... then upload the package and those files to a Connect issue, I'm sure the SSIS team will try to figure out WTF is going
on. That sounds awful weird.
Talk to me now on
Free Windows Admin Tool Kit Click here and download it now
November 24th, 2010 9:23pm
I am experiencing the same issue with SSIS 2008. The package worked fine (Merge Join returned correct / complete results) in SSIS 2005, but since I upgraded the package to 2008, the Merge Join is not returning all of the expected rows. The behavior
exhibited is that the Merge Join turns green part way through the rows being passed to it from one of the sources. I have approximately 12 million rows being passed to the Merge Join, and after approximately 480,000 rows, the Merge Join step turns green,
as do all downstream steps, but the OLE DB source is still yellow and passing rows to the Merge Join transform. If I alter the Data Flow Task's DefaultBufferMaxRows from the default (10,000) to 50,000, then the Merge Join will pass approximately 3 million
rows before it exhibits the behavior as above (turning green although more rows still need to be processed).
I haven't tried adding a Multicast, because I need to use Fast Load on the destination (it seems that only worked when specifying Table or View for the destination).
Our environment is SQL Server 2008 Enterprise Edition SP1. Also, I have tested this on 2008 R2, and it works fine (but we can't currently migrate to R2 in our production environment).
Has Microsoft identified this as an issue or does anyone know of a workaround?
Thanks for any insight.
November 19th, 2011 5:38pm