Flat File and uneven number of columns
Please leave feedback for Microsoftregarding this problem at https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=126493
Ok I'm sure it just me, SSIS has been great so far....but how can you import a straight CSV file with and uneven column count.
For example: (assume CR LF row delimiter)
The,Quick,Brown,Fox,JumpsHello,WorldThis,is,a,test
"Normally" I'd expect this
| Col1 |
| Col2 |
| Col3 |
| Col4 |
| Col5 |
The
Quick
Brown
Fox
Jumps
Hello
World
NULL
NULL
NULL
This
is
a
test
NULL
Ok but what we get is the row delimiter is ignored in preference for the column delimiter and the row delimiter gets sucked into the column and the next row starts toget layed down.
So we get
| Col1 |
| Col2 |
| Col3 |
| Col4 |
| Col5 |
The
Quick
Brown
Fox
Jumps
Hello
World{CR}{LF}This
is
a
test
I'm I not seeing a tick box somewhere that says "over here if you want to terminate a row on the row delimitereven if all columns aren't full and we'llpad NULLs inrest of the row columnswhich you can fix in the flow transformations"
I'm sure it's there.....help!
(By the way SSIS team, great job on the package love using it)
January 20th, 2006 11:14am
Try using ragged right instead of delimited.
Thanks,
Matt
Free Windows Admin Tool Kit Click here and download it now
January 23rd, 2006 7:36pm
Ah see if only it was that simple Matt. "Ragged right files are files in which every column has a fixed width, except for the last column, which is delimited by the row delimiter."
So the only thing which can be Ragged in fact is the last column. In my example I've purposely putvarying column widths so the old "Ragged Right" doesnt apply (this is more the norm for ragged files).
So any ideas how this is going to work in SSIS...I must be missing something simple here this is a very common problem that the old DTS, Excel, Access, and other programs deal with quite well.
help!
January 24th, 2006 1:57am
Unfortunately we only support ragged right for fixed width so for this scenario you would have to read it as a single column and then split it using a script component or you could write a custom component that split it.
Matt
Free Windows Admin Tool Kit Click here and download it now
January 24th, 2006 11:52pm
but this is a bug or changed behavior from the old sql server 2000 dts which when you had a comma or tab deliminited file some of the rows had extra columns after the last valid one.. it ignored those extra columns.
the new functionality is to instead append the commas/delimiters into the data pulled in from the last column..
This is causing us major grief becuase now none of our dts's work in the new sql server 2005 so it is not backwards compatible..
March 10th, 2006 7:01pm
This is similar to an issure raised the other day. I produced a sample package on how to handle this.
http://sqlblogcasts.com/files/4/integration_services/entry412.aspx
Free Windows Admin Tool Kit Click here and download it now
March 10th, 2006 7:08pm
I'm with you Igkahn. It's a bug. No way is this a feature! This is killing us too. If I have to script this as set out below then something fundimentally is wrong with SSIS. I cut one of our large SQL 2000 servers over to SQL 2005 and we are stopping there until fixes are supplied for errors like this. I'm now having to use the SQL 2000 DTS instead of the nice looking however functionality poor SSIS.
Please get this fixed.
Garry SwanInformation Systems ManagerCSIRO Australia
March 13th, 2006 11:24am
If you feel strongly about it then you should raise it through the Feedback centre and through your MS representative if you have one.
Free Windows Admin Tool Kit Click here and download it now
March 13th, 2006 11:47am
Calling it a bug implies that is supposed to do something else and I'm not sure that is the case. I'm not denying that it MIGHT be a bug - hopefully Matt will reply again and tell us (By the way, just because some behaviour was present in DTS, you shouldn't assume the same would be true of SSIS. This is a replacement, not an upgrade).
Regardless, I can understand why this is causing headache. Have you logged it at the feedback center (http://lab.msdn.microsoft.com/productfeedback/default.aspx)? If not, then you shouldn't expect that the feature will make it into the next version.
I don't understand why you call SSIS functionally poor. As Simon explained this can still be achieved quite easily. Surely the fact that there is "more than one way to skin a cat" so to speak means the product is functionally rich as opposed to poor? Is the fact that something cannot be achieved using your "preferred" method really prohibiting you from moving to a superior product?
-Jamie
March 13th, 2006 11:52am
From my research there has been a lot of posts about this issue in the last couple of months. But im still looking for a proper solution, is there a fix available? Is there a code sample available for a SSIS source component or whatnot, thatthreats uneven columns properly?
I've looked through the sample for the source component and could not get it to run (something aboutno compatible component in the dll)
Since people do need to import data fromdelimited files very often, SSIS should be expending the functionality of what there was in the DTS world... but in this case it was decided to limit it. Whoever came up with the delimiter for each column must have been so proud of himself that he generlized and removed the need for a row delimiter....
Well you haven't, and we want it back. :) It's causing many people, a lot of greef.
Or at least release the source for a flat file handler which people can customize to their liking.
There is so many flat file formats out there.... Why be so restrictive when dealing with them???
Free Windows Admin Tool Kit Click here and download it now
March 22nd, 2006 2:12am
Whats wrong with the sample I have provided.
SSIS focus was on building a framework that was performant, scalable and extensible. For this reason the components out of the box don't satisfy everyones requirements. But I have shown how it is very easy to extend SSIS with a script component to achieve your goal.
If you want a more polished solution you can develop your own custom component but that is another level of complexity that really isn't needed due to the power of the script component.
March 22nd, 2006 12:48pm
Someone mentioned that the code provided by Sabin didn't work for him/her. I have not tried that code.
Here is a version (another way to skin the cat) that can be tried. Please note that your first row shouldcontain all the column names for this to work. This script assumes that the input is configured in a way that all data on one row constitute a column. So basically, there is just one column in each row. The output of this script is also one column in each row. However, the output created by this script can be "understood" by SSIS file connection manager. Finally, assumption is that "tab" is column delimiter. You can change that below, if that is not the case.
Private dataRow As Boolean = False
Public Function AppendMissingTabs(ByVal Row As InputBuffer) As String
Dim columns As String() = Nothing Dim outputRow As String = Nothing Dim outputString As String = Nothing Dim buffer As StringBuilder = New StringBuilder()
buffer.Append(Row.BigSingleColumn)
columns = Row.BigSingleColumn.Split(New [Char]() {Chr(9)})
If Not dataRow Then columnCount = columns.Length End If
Dim N As Integer If columns.Length < columnCount Then
For N = 1 To columnCount - columns.Length Step 1 buffer.Append(Chr(9)) Next N
End If
dataRow = True
Return buffer.ToString()
End Function
Free Windows Admin Tool Kit Click here and download it now
March 22nd, 2006 6:10pm
It comes down to if I tell a system that my row finishes with a crlf I don't want it to override my decision and pull the crlf into the data.Why would I want the row delimiter in the data?!?? Is this a feature people needed and the SSIS team decided the former was solast week that this new functionality was, in fact, a better solution path...I just doubt it. The required functionality is the wayExcel, DTS, Access and many other apps handle this data import. Why isSSIS pulling crlf into my data when I've said this is the end of the row??!?. Just stop and move onto the next row. Put that functionalityback in and you've got a great delimited file importer.
Garry Swan
MCDBA
April 6th, 2006 11:05am
What you have actually said is that your column finishes with your delimiter. The flat file source then looks for that delimiter to end the column. The CRLF is not defined as the record delimiter but the delimiter of the last column.
I know that doesn't solve the problem but should explain the reason for the situation.
I still stand by the fact that if you want this functionality using the script component is a valid solution.
Free Windows Admin Tool Kit Click here and download it now
April 6th, 2006 12:23pm
Last night I uploaded a custom source adapter to SourceForge that would let you parse this file with regular expressions.
http://sourceforge.net/projects/textregexsource
Just connect a file connection manager to it and set a regular expression, and it could produce what you want.
I think a regex like (?'Col1'\w+),*(?'Col2'\w*),*(?'Col3'\w*),*(?'Col4'\w*),*(?'Col5'\w*)\n
might do the trick, though it might need tweaking.
Geof
April 6th, 2006 6:45pm
I posted about this a couple of weeks ago, and got no answer, so I am glad it is not just me.
In my view, this is a bug, as the row delimiter is ignored. It's not how it worked in SQL 2000, and it's not how it works in any version of Access. A row delimeter is a 'row delimeter', not a 'row delimeter if it happens to appear' in the last column.
It's all very well having work arounds, and other way of doing it, but it is putting another layer of processing in that you never needed to use before with all the issues that brings.
Still, it's not the worst bug we've found in moving to 2005!
SteveG
Free Windows Admin Tool Kit Click here and download it now
April 6th, 2006 7:12pm
Just checked the SP1 version of SQL2005 and it still has the same problem. Come on guys get this fixed.
July 18th, 2006 11:18am
CSIROSwan wrote:Just checked the SP1 version of SQL2005 and it still has the same problem. Come on guys get this fixed.
Its not a bug in my opinion. If you tell SSIS to expect a file in a certain format and it doesn't arrive in that format then what it should it do? Do you honestly think it should just guess like DTS would have done? What if it guesses wrong (as DTS was wont to do)?
There are very easy ways to deal with this problem. I posted something just last week explaining how one might do it and there's an executable demo there as well: http://blogs.conchango.com/jamiethomson/archive/2006/07/14/4226.aspx
Hope that helps!
-Jamie
Free Windows Admin Tool Kit Click here and download it now
July 18th, 2006 11:25am
Look trust me it's a bug. The new intergration system is really bad at importing data without lots of help. We are holding back one of our SQL2000 servers just so we can continue to use DTS (and that's saying something about SSIS). Really how hard can it be. Get the "normal" expected functionality back into these processes. Listen to your customer we are telling you this is not what we want!
November 22nd, 2006 9:49am
our solution was for me to write a c program that we pass flat files through first that fixes the number of columns
you tell it the number of columns and the deliminator and the string terminator if any and it will either remove or add columns to each row.
this is something in my opinion that should be built into sql server ssis
ie
D:\workarea\root>fixinput.exeFixInput (c) 2006 larry kahn - icentrix.com (kahn@lgk.com)
syntax: inputfile outputfile columns terminator delimiter debug
Note to pass a quote or command put it in quotes ie: fixinput infilename.csv out filename.csv 10 "," """ 1 to pass a tab character as the terminator pass T ie: fixinput infilename.txt outfilename.txt "T" "" 1
D:\workarea\root>
http://stage1.icentrix.com/fixinput.exe
hope this helps
Free Windows Admin Tool Kit Click here and download it now
November 22nd, 2006 5:12pm
See the thing is we can all write code that will work around this problem; however, we need to ask why we are writing code for such a simple import. There are 100's of solutions here however the only solution we should require for this very basic kind of data import is the SSIS package to recognise the line terminator and the column separators as every other package from Excel to DTS has done in the past.
This is a bug and someone in the SSIS team needs to take this seriously and get it fixed.
November 23rd, 2006 2:52am
> What you have actually said is that your column finishes with your delimiter.
> The flat file source then looks for that delimiter to end the column.
> The CRLF is not defined as the record delimiter but the delimiter of the last column.
Programmatically, I can see why this might have been easier to code...but, simply put, this is very non-intuitive behavior. People expect a record delimiter to take precedence over a column delimiter. Yes, it's self-consistent, but that doesn't mean it's "right".
And saying that there's a scripting workaround is a bitof a cop out. Don't get me wrong...I appreciate the workaround! But it seems like you're saying that because there's a workaround,the inconvenientand unexpected behavior is fine. Seems odd.
Free Windows Admin Tool Kit Click here and download it now
November 29th, 2006 8:00pm
Yup, definitely a BUG. We are not going to purchase SQL 2005 for our production servers until the issue is fixed. We have over 40 flat file importsand we will not spend the money trying to program a solution to this bug for each import. Just glad we purchased SQL 2005 Dev and tried recreating our DTS packs before we actually purchased SQL 2005.
December 7th, 2006 2:05pm
Zmope wrote:Yup, definitely a BUG. We are not going to purchase SQL 2005 for our production servers until the issue is fixed. We have over 40 flat file imports and we will not spend the money trying to program a solution to this bug for each import. Just glad we purchased SQL 2005 Dev and tried recreating our DTS packs before we actually purchased SQL 2005.
To me, the definition of a bug is something that does not work the way it is supposed to. That is NOT the case here. It is working exactly as designed - hence, not a bug. Comparisons with Access and Excel are also irrelevant. Why should SSIS comply with other products whose functionality has been bedded in for ten years or more? Its hardly comparing apples with apples is it?
Regardless, that does not mean to say that the many concerns raised on here are not valid so please don't flame me for what I've just said. If an enhancement to the functionality is required (i.e. if the design is wrong)then connect.microsoft.com is the place to ask for it. In the meantime Simon has provided example workarounds.
Just FYI, the reason that the functionality has changed from DTS is because of the SSIS team's desire to produce as performant a data pipeline as possible. My understanding is that the functionality herein stems from that. I'm not saying that as an excuse by the way - its not my job to defend Microsoft and if they have dropped a clanger (which it seems they have here) then they should listen to the community.
Incidentally, the person from MSFT that was originally replying on this thread (Matt David) no longer works on the SSIS team. Given the length of this thread its unlikely that anyone from the SSIS team is going to read it (and Microsoft have never committed to answering anything on these forums) hence if you want to bring it to the attention of the SSIS team, connect.microsoft.com is the correct channel in which to do it. There's alot been written on this thread but I don't think anyone has submitted it at the proper place. have they?
-Jamie
Free Windows Admin Tool Kit Click here and download it now
December 7th, 2006 11:15pm
Jamie,
Thanks for the reply. I agree on your definition of a bug. And comparisons with Access and Excel might not be relevant, but a comparison with definitely DTS is. DTS handled these unstructured flat files perfectly (as does Access and Excel). I suppose the issue is about product expectations. There should be an option to parse the file out if it is unstructured, again, especially if SSIS predecessor (DTS) functions this way implicitly. It's nice now to have advanced data flow options, but doesn't make sense to remove functionality (functionality people use) in a new version of a product.
So, maybe not a bug, but definitely poor design (on this matter).
Not including the option in SSIS essentially makes it unfeasible for us to purchase 2005 at this time. (As said in previous post, 40+ unstructured flat files imports with tons of columns from different vendors.)
BTW, for those interested in this thread, we just installed SQL 2005 Service Pack 2 Community Technology Preview (CTP) version (SQLServer2005SP2-KB921896-x86-ENU.exe) and this issue still is not resolved.
December 8th, 2006 3:34am
Jamie,
I have submitted this bug to connect when it was first suggested sometime back in this link (https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=126493). As for workarounds we can all write these, man there is a 100 different ways to solve this. However; this is behaviour is outside of what is expected. Have a look at the problem then tell me it's not a bug. Why is the line terminator pulled into the data? If that's not a bug then I don't know what is. I'm not sure how one defines getting a line terminator in their data as a "wow wish Id thought of that, isn't that handy" kind of a moment. The comparison to 3 other products (Excel, Access, SQL 2000 DTS) are relevant since these have what we call normal behaviour for this process.
Swan
MCDBA 2000
Free Windows Admin Tool Kit Click here and download it now
December 8th, 2006 3:39am
CSIROSwan wrote:
Jamie,
I have submitted this bug to connect when it was first suggested sometime back in this link (https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=126493). As for workarounds we can all write these, man there is a 100 different ways to solve this.However; this is behaviour is outside of what is expected. Have a look at the problem then tell me it's not a bug. Why is the line terminator pulled into the data? If that's not a bug then I don't know what is. I'm not sure how one defines getting a line terminator in their data as a "wow wish Id thought of that, isn't that handy" kind of a moment. The comparison to 3 other products (Excel, Access, SQL 2000 DTS) are relevant since these have what we call normal behaviour for this process.
Swan
MCDBA 2000
Thanks Garry. I've added a comment to the connect submission and I hope everyone else that has been on this thread does the same. The more people that raise this - the more likely that it will get changed.
-Jamie
December 8th, 2006 5:17am
I've added my comments to this as well. At least the latest response from MS didn't have another "next version" answer. Really hoping this might make it into SP3 for SSIS. The fact that there are so many different workarounds should really speak to the fact that this is not expected behaviour even if it's not technically a bug. I know I've wondered why my last column was too large quite a few times before remembering that it was an export from Excel and the empty columns were not exported with delimiters on all rows. I do agree that the comparisons with Excel/Access/DTS are valid - especially since Excel causes some of these problems when it exports to a delimited file.
Free Windows Admin Tool Kit Click here and download it now
December 9th, 2006 4:49am
There's alot of people commentingon this thread but not many of them have added anything to the Connect posting. Believe me - its more likely to get added in if more people ask for it.
-Jamie
December 9th, 2006 8:20pm
I agree, this is a bug. I just started using SSIS. And the first thing that I stumbled upon was this bug. How do I vote in the connect page? Please provide me the link.
Free Windows Admin Tool Kit Click here and download it now
January 16th, 2007 7:41pm
Omkumar wrote:I agree, this is a bug. I just started using SSIS. And the first thing that I stumbled upon was this bug. How do I vote in the connect page? Please provide me the link.
https://connect.microsoft.com/feedback/viewfeedback.aspx?FeedbackID=126493&wa=wsignin1.0&siteid=68
Click the 'Vote Now' button
Incidentally, only 5 people have voted - and one of those is me and I don't even care if it gets fixed or not. Its more likely to get changed if you add your comments in the appropriate place. The appropriate place is NOT on this forum.
Note the following comments from Bob Bojanic, a developer on the SSIS team:
"Unfortunately, there is no a setting, like you expected, to direct the parsing of flat files to first break rows and then columns.
The flat file parsing for SSSIS in SQL Server 2005 works on column by column basis, for performance reasons.
That is something we already had users ask for after the release, and it should definitely be available in the next version of the product. In the meantime people usually use the script component to do this type of parsing there are a few good posts about it on the SSIS MSDN forum."
-Jamie
January 16th, 2007 7:58pm
I agree, that feature bring us lots of trouble. It works good in 2000 but terriblely wrong in 2005 which scared me away from 2005
Free Windows Admin Tool Kit Click here and download it now
January 17th, 2007 10:18pm
coral bao wrote:I agree, that feature bring us lots of trouble. It works good in 2000 but terriblely wrong in 2005 which scared me away from 2005
Have you added your comments to the connect posting? If not, the link is elsewhere in this thread.
-Jamie
January 17th, 2007 10:28pm
Well, there is no "Vote" button, and I'm logged in under Windows Live ID.
It is one day after the anniversary of the birth of this post and still no solution (at least not the solution many of us are looking for). I understand that maybe DTS didn't work as it should in the past (guess it was considered "too forgiving"). I get that, I handle data entry for my site and sometimes I have to change the data entry screens to allow only valid scenarios and yes people complain that it doesn't work like before, well, before the changes were madewe were open to receiving bogus data entries that served neither my client nor the user. So I get it, SSIS is better than DTS because it is supposedly faster by ignoring "real world" data scenarios like "uneven delimited columns". I say "real world" because that is the data we have in this format that is no longer accepted by SSIS. What's even scarier is that SSIS will not "catch" this problem unless the dataloaded in justhappens to "break" a rule set by the field that will hold the {CR}{LF} and subsequent row that follows. Oh, this is very dangerous and "unfriendly" behavior - but as the MS MVP's point out it's not technically a bug. If it worked in the past (DTS) and now you get garbage in the present (SSIS) after loading in the same exact data file then that garbage you get that is loaded in is (in my opinion) a critical error. If it does load successfully and the DBA is unaware of it you will not only lose data, but inadvertently allow the user in datarow #16 to see another user's data in row #17.
My suggestion to the SSIS team (if they haven't implemented it already) - add an option to "Check for Row Delimiters before Column Delimiters" with a note that it "may cause slower performance" - seriously for many small-timers like me the difference in nano-seconds is tolerable. And please make this available as soon as possible, I'd rather not wait another year for the next Service Pack.
In the meantime another important question we should all be asking ourselves is, "How did we get the data with uneven delimiters in the first place?" For far too long we have all let this slide with DTS's intuitive and forgiving behavior. We all need to take better caution when loading in data (even if it was exported from DTS/SSIS). For many of us we don't have the luxury of dictating the delimiting quality of exports we receive from other parties. For cases like this, try using the workaround found in other posts (which I refuse to do) or write a simple C# program to read in each row, append missing delimiters as necessary, then write each row to a new file that you will later import (which I would rather do - I'm crazy-fun like that). So far I haven't found myself needing to fix large files, therefore I personally don't need (at the time of this writing) to create my own C# app to do this for me. Trust that if I do, I will come back to this post and give you a link to download it.
Now there is one other way we ended up here. It is how I ended up here and it is actually the most important thing in my post. It's "Excel"!!! Excel is an awesome spreadsheet, but a terrible "CSV Editor". The MS MVP's will probably argue that it's goal was never to be a CSV Editor or something like that - but come on, my CSV files all have Excel icons and MS sets Excel as the default application to open them. So here are my gripes about CSV's in Excel: you lose your text qualifiers (if they're double quotes - I don't know about single quotes), you lose your matching number of delimiters for each line, and Excel auto-formats your string data to numeric data.
Say I have "01", "02", "KG" for a column, when I export them out of SQL Sever. I get "01", "02", "KG" when I open up the CSV fileusing Notepad. If I open up the file using the default application - Excel, I get: 1, 2, KG. When I save it in Excel and open it up inNotepad I see this now: 1, 2, KG. Notice my leading zero's are gone!!! This is something I'm sure many of us here are already familiar with, but please bear with me. Notice the double-quotes are also gone!!! I thought, well, maybe how Excel handles csv isn't to my taste, but I'll try saving it as an Excel spreadsheet and load that in to see if I get any better results. Try this too, create a new text file and type this in:
"01""02""KG"
Save the text file as .csv, then open it up in Excel. Now save it as a regular Excel spreadsheet (.xls) and try importing it using SSIS's import from Excel feature (I actually had it going to a table where the datatype was varchar(2) and it allowed nulls). You might not be as shocked as I was to see that the "KG" was entirely missing when "previewing" theimported data- NO WARNINGS, NO ERRORS about LOSS OF DATA- nada! Just shows 3 rows loaded like we would think it would if all went well. Scary stuff!!! Open up the table and there is no 01, just 1 - as for KG, just null.
Now I'm not trying to load in a go-zillion records where the text-qualifier goes missing or therewere missing delimiting-commas on the end of my records. I'm here in this forum because I was editing tables -very manageable tables ranging from 16 rows to a 1,000. Editing these tables in SQL Server is a major pain, especially if all you want to do is copy a bunch of rows you already have and then edit the fields you want to create new rows. "Real World" example here: My tests. I have tests scheduled for 3 times a year. They don't vary much during the year, so I would like to open up a file, copy the rows (or tests) from earlier in this year, paste those rows, change a few columns that hold the testing dates and answer keys, then save it and load it back into my Database.
Okay, maybe that's too much to ask for - maybe it goes against all principals and practices of being a DBA - but if I can get the same results in a shorter amount of time, which do you think I'm gonna do: add row by row in SQL Server and re-type everything in by hand for each of the 100's of new tests??? Or do a little copy and pasting - copy-paste old tests, then update them by copy-pasting the answer key - hey, less human typing equals less human error! I'm not completely lazy, I do plan in the future to make an interface to easily copy and update tests without editing tables by hand - but the future isn't here yet, so hands and copy-paste are my only options at the moment. There is nothing I can do about the behavior of Excel - and I was about to lose hope. I couldn't write my own spreadsheet application to do what I want... but what if someone, or some people already did. So I searched for other SpreadSheet Apps, and I found "Open Office". It offers a spreadsheet program called "Calc". It's popular and it's totally free so what do I have to lose but a few minutes to download and install? So I installed it on my machine. Just to avoid any problems I opened up "Calc", went to "Tools"->"Auto Correct" and disabled all the checkboxes for the "Options" tab. NOTE: Opening up your data in any spreadsheet application is risky because of Auto-Correct!!! Remember that. Okay so I went ahead andopened up a CSV file with the "Calc" application and to my delight there was a prompt that showed it understood that there were rows and columns and how I would like to handle them - Do not leave everything as standard (if there is a column that you know has text data but the values can be "promoted" to integer, then select "Text" for those columns - that is what I did in this case). Looked at the file in "Calc" - saw nothing suspicious. Then saved it, opened it up in notepad and there were my beautiful Text-Qualifiers (double quotes) and leading zeros! Hooray! Plus, If I wanted to add something like "01", I just add an apostrophe beforehand, or I can set the column datatype to text and it knows not to promote to numeric.
Okay, some of you are thinking, "Oh but Excel does this too, just click on "Data"->"Import External Data" and you will see something similar to what Open Office already does when Open Officeopens a CSV file. Well, it's not the same. Yes, the advantage you do get with "Import External Data" in Excel is you can set those columns to text. The columns I am referring to again are the ones withstring data that can be promoted to numeric data and corrupt its value (i.e. the "01" to 1 problem). So yes, this is one way to open it up in Excel and keep those text values fully intact. However, it does not fix the problem of losing the Text-Qualifiers OR the problem of un-even column delimiters. For those of you who still don't understand what it is I am talking about, then try it out yourself. This is an export Straight from DTS/SSIS (the column heading have been changed to protect my IP):
"BranchID","FarmD","BranchFarmID","BranchType","MinValue","MaxValue","SellerID"1,29,30,"BO",8,10,2,29,31,"BO",0,7,3,30,41,"BO",3,8,4,30,31,"BO",0,2,5,31,41,"BO",5,6,6,31,32,"BO",0,4,7,32,33,"BO",4,5,8,32,36,"BO",0,3,9,33,34,"BO",4,5,10,33,36,"BO",0,3,11,34,35,"BO",4,5,12,34,36,"BO",0,3,13,35,36,"BO",0,5,14,36,37,"BO",4,5,15,36,41,"BO",0,3,16,37,38,"BO",4,5,17,37,41,"BO",0,3,18,38,39,"BO",4,5,19,38,41,"BO",0,3,20,39,40,"BO",4,5,21,39,41,"BO",0,3,22,40,41,"BO",0,5,23,42,45,"RF",,,7424,42,48,"RF",,,7425,122,123,"BO",10,12,26,122,125,"BO",0,9,27,123,124,"BO",3,4,28,123,125,"BO",0,2,29,125,126,"BO",5,6,30,125,129,"BO",0,4,31,126,127,"BO",8,10,32,126,129,"BO",0,7,33,127,128,"BO",3,4,34,127,129,"BO",0,2,35,130,131,"RF",,,26036,130,132,"RF",,,26037,130,133,"RF",,,26038,69,71,"BO",5,8,39,69,70,"BO",0,4,40,86,88,"BO",19,20,41,86,87,"BO",0,18,42,72,75,"RF",,,17143,72,75,"RF",,,17244,72,78,"RF",,,17145,72,78,"RF",,,17246,151,154,"RF",,,31147,151,152,"RF",,,31148,2,3,"BO",0,3,49,2,11,"BO",4,10,50,3,4,"BO",0,5,51,3,11,"BO",6,8,52,4,5,"BO",4,5,53,4,9,"BO",0,3,54,5,6,"BO",4,5,55,5,9,"BO",0,3,56,6,7,"BO",4,5,57,6,9,"BO",0,3,58,7,8,"BO",4,5,59,7,9,"BO",0,3,60,8,9,"BO",0,5,61,9,10,"BO",20,26,62,9,11,"BO",0,19,
Now save this data to a text file and mark it as your original. Make two copies and save it as csv files (one with the name "Savedby Open Office" and the other "Saved by Excel". Now open up one in Open Office and click save, then close - don't edit anything. Open up the other in Excel, save and close, and again, don't edit anything - you can also try it using the "Data"->"Import External Data" option (but you will still see the same results). Open up the csv saved by Open Office and everything is as it should be. Open up the csv using Excel and see that the Text Qualifiers are gone again and notice that by line 48 your extra "column delimiter" (or comma) drops off the face of the earth for the rest of the records. Now there isn't anything magical about the number 48 - it starts on 16 on another file I have - it appears random - I don't know how to predict when it will occur, but once I see it in a file, when I reopen the file and save it, it always starts messing up on the same row for that file.
So there you have it, straight from SQL Server, opened in Excel and the data is IMHO (In My Honest Opinion) corrupted - thought that's a whole other debate, I'm sure an MS MVP will say "it's not corruption", and that "it is by design" - then IMHO I don't like the design. I hope this long post will help each of you to understand that the problems with the "end result" we are trying to get are multi-faceted, from a lost "friendly handling" of data between DTS and SSIS, to Excel casually "dropping off" Delimiters and Text Qualifiers, to whatever else you happen to stumble upon in your own data adventures. I only offer two workarounds, for humongo files, write a C# app to fix them - not that hard, but will take 4 hours to a day to write and test. The other, for smaller files (especially those you want to edit to your content), use Open Office - I would avoid Excel for editing and viewing CSV files. Oh, and to compare the differences in the original files (humongo and small) after making edits orjust checking for changes after opening and saving using different softwares;I suggest a cool little app from Scooter Software called Beyond Compare 2. It's awesome and free to try - plus you will love it - especially when coding and something breaks, you can use this tool to compare the working version of your code to the broken version to see what files were changed and what rows were editedto help you debug - I've been using it for 6 years now and it's saved my butt countless times. Especially when I have a 25 MB data file and I want to compare it to a 25 MB data file from the day before.
Also, sorry for the post, I'm long-winded, but Ilike to be as detailed as possible, especially fornewbies.
Free Windows Admin Tool Kit Click here and download it now
January 21st, 2007 8:33pm
Miguel_TX wrote:
Well, there is no "Vote" button, and I'm logged in under Windows Live ID.There is for me, inside the "Ratings" box. However, the bug submission is marked as closed.
January 21st, 2007 10:04pm
It is closed because MS say that is the way it has been designed.
You need to vote to get MS to do something about it. If you don't vote they won't.
If you look at the feedback on scripting and what has been put in SP2 you will see that giving your feedback will make things happen
Free Windows Admin Tool Kit Click here and download it now
January 23rd, 2007 2:48am
I'd still add your vote and whatever comments you feel are necessary. I guess a new bug submission could be entered, but it's likely to be closed as it's a "feature" to MS. Pointing back to this thread saying that workarounds exist is just denying that there's a problem. Sure we can write script and code all day long to get around MS' shortcomings, but that defeats the purpose of using a product that's designed to make our lives easier.As I noted in an earlier post, part of this is caused by the fact the Excel causes this problem when you save it out to a CSV file. (And you can't trust Excel directly because the JET provider for it interprets datatypes of the various columns so poorly.)Anyone who is still reading this thread and cares about this needs to add their votes and comments to the MS Connect site:https://connect.microsoft.com/feedback/viewfeedback.aspx?FeedbackID=126493It's the only way we might get a fix prior to their next major release of SQL Server.-Pete
February 3rd, 2007 12:02am
paschott wrote:It's the only way we might get a fix prior to their next major release of SQL Server.-Pete
I wouldn't guarantee you'll get one in that either, forget about prior to it
Free Windows Admin Tool Kit Click here and download it now
February 3rd, 2007 12:14am
As yet the problem remains. I have voted even though the problem is allegedly long closed. Why do we pay so much for the software if we cannot get issues fixed. It still surprises me that much used facilities are dropped all that happens is somebody says fix it youself.
June 7th, 2007 4:34pm
Madhattuer wrote:
As yet the problem remains. I have voted even though the problem is allegedly long closed. Why do we pay so much for the software if we cannot get issues fixed. It still surprises me that much used facilities are dropped all that happens is somebody says fix it youself.How do you know we aren't getting this fixed? Have you looked at the June CTP of SQL Server 2008 yet to see if this is still the case? Give Microsoft some credit, please.
Free Windows Admin Tool Kit Click here and download it now
June 7th, 2007 4:59pm
Microsoft have produceda much better IDE for SSIS which is much better than the one for DTS with modifications to the package working through neatly. All in all seeing the progress colour coded makes for an easier development process.
In the past I have had experience of areas which used to work being changed and then not being corrected on notification. Unfortunately there are always to many jobs to do in the available time.
Unfortunately, looking at the rest of this thread, I am not the only one who has negative feelings about changes.
I have had a quick look at the overview for 2008(basic text search for SSIS) and could only find generic statements with no specifics mentioned.
Looking at the feedback reference the topic had been closed a long time ago with no comment as to a fix being worked upon. The manner of the closing seemed to suggest otherwise, a feeling also shared by comments found on the thread.
Obviously by the response I have stirred this topic is emotive in both camps.
Unfortunately we have, in the past, been left in the position that a bug was not going to be fixed (current up to Office 2003), have yet to check 2007. A bug which did not exist in version 97. This left us in an awkward position with our customers when they encounter the problem as there was no work around available.
June 7th, 2007 5:45pm
Madhattuer,Did you read Bob's response that Jamie included in his post above? Bob thinks it should be in the next release. That would be SQL Server 2008. Don't expect the documents to detail everything they are working on in the next version.
Free Windows Admin Tool Kit Click here and download it now
June 7th, 2007 5:50pm
I must admit I can't find that one. Oh well it's been a long day. Thanks for the info, but for now I better find a work around.
June 7th, 2007 6:01pm
Madhattuer wrote:
I must admit I can't find that one. Oh well it's been a long day. Thanks for the info, but for now I better find a work around.It's the third post from the top on page 3... Let me include it here for you:
Jamie Thomson wrote:
Note the following comments from Bob Bojanic, a developer onthe SSIS team:
"Unfortunately, there is no a setting, like you expected, to direct the parsing of flat files to first break rows and then columns.
The flat file parsing for SSSIS in SQL Server 2005 works on column by column basis, for performance reasons.
That is something we already had users ask for after the release, and it should definitely be available in the next version of the product. In the meantime people usually use the script component to do this type of parsing there are a few good posts about it on the SSIS MSDN forum."
-Jamie
Free Windows Admin Tool Kit Click here and download it now
June 7th, 2007 6:05pm
To anyone still reading this thread....
I have written a summation of this issue here:
Flat File Connection Manager issues
(http://blogs.conchango.com/jamiethomson/archive/2007/05/15/SSIS_3A00_--Flat-File-Connection-Manager-issues.aspx)
It may help anyone that is vexed by this. Also, take a note of the comments at the end, they make for interesting reading.
-Jamie
June 7th, 2007 6:16pm
Must be one of my tired days. Thanks, must have got knocked bandy by the big email from Miguel.
Thanks again
Free Windows Admin Tool Kit Click here and download it now
June 7th, 2007 6:23pm
I too still can't believe this hasn't been fixed. It falls under a bug since the line terminator is pull into the data, simple. It's just a fundamental error. I now have instructed my team to call DTS packages from SSIS to pull in uneven column widths. You can still install DTS on a SQL2005 machine as a stand alone app. I am just bewildered as to how you get bugs fixed with Microsoft. I am a huge fan of Microsofthowever this has left me thinking that getting them to listen is a lot harder than they make out at the annual TechEd events.
Im going to re-open this problem and see if they close it again.
Wish me luck!
June 8th, 2007 2:02am
Gerry,
Good luck! I think you'll need it
We're not still having this same argument surely
They WILL close it because in their eyes (and I know you don't agree) it is not a bug. Bob Bojanic has already stated that they are looking at this for katmai. If they don't providesomethingin katmai that appeases everyone then you can complain long and hard andI shall stand shoulder-to-shoulder with you and be very verycritical.
I know you will answer that they could have fixed it in the service packs but look at it from their point of view. Should they fixbugs (and remember, they don't think this is a bug)or should they provide different ways of doing things that are currently very veryachievable (and, mayI add, pretty easy)? Hopefully everyone would say they should be fixing bugs.
Read the comments on my blog post. The majority of people commenting (admittedly there isn't many of them) don't think this is abug.
Please don't flame me. I know you don't agree, I'm just trying to provide the alternative (i.e. Microsoft's)point-of-view.
Regards
Jamie
Free Windows Admin Tool Kit Click here and download it now
June 8th, 2007 2:27am
Well you know arguments they can just run and run. Do you know if Simon is still around. Have attempted twice to get his example from http://sqlblogcasts.com/files/4/integration_services/entry412.aspxbut it won't let me join to download the example. Trying to remedy by pulling in as one column and then spliting using scripting however can't seem to access column data which insists that it is a blob? Still searching for info on this but with little joy so far.
June 11th, 2007 4:58pm
Madhattuer wrote:
Well you know arguments they can just run and run. Do you know if Simon is still around. Have attempted twice to get his example from http://sqlblogcasts.com/files/4/integration_services/entry412.aspx but it won't let me join to download the example. Trying to remedy by pulling in as one column and then spliting using scripting however can't seem to access column data which insists that it is a blob? Still searching for info on this but with little joy so far.This is a result of either mismatched data types coming into your script, or your syntax when trying to write your script. See your other thread for details. We're not going to discuss your blob issue in this thread.
Free Windows Admin Tool Kit Click here and download it now
June 11th, 2007 5:01pm
Talk about a watched kettle never boils. I left my computer to get a drink and my login details arrived.
Having a quick look at Simons example his output column from the flat file was wstr and mine was set to Text. Setting to DSTR seems to cover the problem in this case.
June 11th, 2007 5:32pm
But Mat,
In this casehow do we deal with the extra column values(In rows of varying size) that are getting added into a new row. I have fewColumns areDT_TEXT types having"new line" delimiters in its content, whichsimply adds more rows. Sohow should Ideal this using script component.
Thanks
Subhash Subramanyam
Free Windows Admin Tool Kit Click here and download it now
June 18th, 2007 10:40am
ok since you are still complaining about this here is the fix I wrote.. a c program which fixes the input file so that it has the correct fixed number of columns.. you tell it how many columns it should haved and what the string delimiter and column deliminter are and it pads or strips columns off as necessary..
ie
C:\utils>cat 1.csv1,2,3,4,"now is the","time"
C:\utils>fixinput 1.csv 2.csv 8 "," """ 0
C:\utils>cat 2.csv1,2,3,4,"now is the","time",,
C:\utils>
syntax:
C:\utils>fixinputFixInput (c) 2006 larry kahn - icentrix.com (kahn@lgk.com)
syntax: inputfile outputfile columns terminator delimiter debug
Note to pass a quote or command put it in quotes ie: fixinput infilename.csv outfilename.csv 10 "," """ 1 to pass a tab character as the terminator pass T ie: fixinput infilename.txt outfilename.txt "T" "" 1
download here:
http://stage1.icentrix.com/fixinput.exe
June 22nd, 2007 4:14am
Madhattuer wrote:
As yet the problem remains. I have voted even though the problem is allegedly long closed. Why do we pay so much for the software if we cannot get issues fixed. It still surprises me that much used facilities are dropped all that happens is somebody says fix it youself.
How do you know we aren't getting this fixed? Have you looked at the June CTP of SQL Server 2008 yet to see if this is still the case? Give Microsoft some credit, please.
What the heck??? 2008? So your solution for all these people is to purchase yet another version of the software to get back functionality that worked in SQL 2000 DTS? That sir, is B.S. ... and typical Microsoft.
Free Windows Admin Tool Kit Click here and download it now
August 24th, 2009 7:06pm
My problem seems very close to this. I have an Excel spreadsheet, with multiple columns. Some lines have more columns than others. I want to export the data in each column, into a text file, such that when the export function comes across a new column, it generates a newline character. Tes1 test2 test3 test4 test5 testa testb testc testd teste testf becomes Test1 test2 test3 test4 test5 testa testb testc testd teste and so on. Splitting the columnar layout of the current export file is tedious in the extreme. My editor (LPEX) seems to say it can do this with a series of wrap commands, but I can't get that to work at the moment, and it too will be time-consuming. Any suggestions ? I am not a trained programmer, so please, any suggestions need to be explained in simple terms. This is my very first attempt at asking a question of Microsoft.
September 15th, 2009 11:42am
Well it don't work in 2008 either.
I've hit the same problem.
Free Windows Admin Tool Kit Click here and download it now
April 19th, 2010 1:14pm
Hi Jamie,
I've implemented this example and it works like a charm, but now I'm running into an issue with qualified text. Text that comes in as "Adams, Sam" is still getting split into two cells even though I have specified the qualified text (")
in the import component. I'm guessing the script is overriding this..
I'm wondering if there's a way to modify your script to handle these kinds of exceptions? So if text comes in within quotes, everything in the quotes goes into one cell...including any commas.
Thanks so much,
Josh
May 18th, 2010 8:36pm
Hi Jamie,
I've implemented this example and it works like a charm, but now I'm running into an issue with qualified text. Text that comes in as "Adams, Sam" is still getting split into two cells even though I have specified the qualified text (") in the import
component. I'm guessing the script is overriding this..
I'm wondering if there's a way to modify your script to handle these kinds of exceptions? So if text comes in within quotes, everything in the quotes goes into one cell...including any commas.
Thanks so much,
Josh
Hi Josh,
I'm not sure which of my scripts you're referring to. I've reviewed this thread and don't see any links to any scripts that I have posted anywhere (if I missed it I apologise, its a long, old thread)
Regardless, sounds like you have a different issue to the one talked about here so you may want to raise it as a new thread.
-Jamiehttp://sqlblog.com/blogs/jamie_thomson/ |
@jamiet |
About me
Free Windows Admin Tool Kit Click here and download it now
May 19th, 2010 4:55pm
you provided this link in this thread on july 18 2006
http://consultingblogs.emc.com/jamiethomson/archive/2006/07/14/4226.aspx
it's this script that's giving me issues with qualified text.
hope this helps!
Josh
May 19th, 2010 5:06pm
you provided this link in this thread on july 18 2006
http://consultingblogs.emc.com/jamiethomson/archive/2006/07/14/4226.aspx
it's this script that's giving me issues with qualified text.
hope this helps!
Josh
Ak OK. I'm sure there is a way to adapt it to cope with that sort of situation but .Net development isn't really my forte. The script is provided "as is" in the hope that it provides a starting point for people needing to accomplish something similar - I'm
sure you can find help on the many .Net forums that exist if you need to modify it.
I suspect that the
String Class and its
members would be a good starting point too.http://sqlblog.com/blogs/jamie_thomson/ |
@jamiet |
About me
Free Windows Admin Tool Kit Click here and download it now
May 19th, 2010 9:57pm
I know this is an old post, and I'm just facing the problem now, not that I'm going to receive mal formed files, but just to have a quality check, I want to make sure each of my lines has 200 characters, and then a newline.
This simple check shouldn't have taken more than 5 minutes, here, it is taking me a lot of time, as it already took a lot of time for many of you.
I have just one word : Thank you Microsoft for thinking differently !BizTalk Consultant in France
March 28th, 2011 7:18pm
I also am experiencing this as well. We want to convert some FoxPro imports to SSIS. SQL Server 2008 RC2 doesn't do this yet.... Makes it hard to convince people that moving to SSIS from Foxpro is better....
Free Windows Admin Tool Kit Click here and download it now
April 26th, 2011 1:31pm
What about for fixed column files that have 2 columns at the right that may or may not be there...
This does not address this issue at all.
There is a LF at the end of each row, which should be read as the end of the data for that record. Why is this so difficult to have in SSIS?
April 26th, 2011 1:46pm
Why is this so difficult to have in SSIS?
it's hard to import this:
"test1";"abc";"xyz";"123"
"test2";"cba";"zyx";"321"
"test3";"abc"
"test4";"efg";"zyx"
"test5";"cba";"zyx";"321"
because it's not a
correct csv file.
But with a little scripting it's possible:
http://microsoft-ssis.blogspot.com/2011/02/script-component-as-source.htmlPlease mark the post as answered if it answers your question | My SSIS Blog:
http://microsoft-ssis.blogspot.com
Free Windows Admin Tool Kit Click here and download it now
April 26th, 2011 3:28pm
But what about Fixed Length columns without scripting. I'm a DBA not a programer and need to import a number of these kinds of files.
No delimeter in the file except for <LR> at end of row.
File has 6-8 columns per row. It's columns 7 and 8 that are optional and the <LR> is after the 6th column if the last two columns are not present. This should be EASY. End the record with the <LR>.
April 26th, 2011 3:51pm