data Cleansing-standarizing column,s data
I have several DBases with many attributes . I'd like to use SSIS to read the source DBs, clean the data, and insert into the relational DB.
first part of cleansing requires
Gender column has 0/1 in lahore DB
Gender column has 1/0 in karachi DB
Gender column has m/f in islamabad DB
Gender column has null in peshawar DB.
I want SSIS to read all Dbases one by one then convert Gender column values to "Male And Female" and send erronous data to "error/exception table"
Second thing is i have Dtae Of Birth column in which erronous dates exists like 31-feb-90,22-apl-87 ,21/03/76 etc.Similarly I want SSIS to read Date of birth column from each database seperately , check erronous/dirty data if exists,send dirty data to a
new "exception/error table". and send data after modification to original tables.
PLZ i need help ASAP plzzzzzzzzzzzz
November 2nd, 2010 6:44am
If I am not wrong you have already posted this query in an erlier thread. No point in creating duplicate threads. If you are not satisfied by any reply you can ask the questions/confusions/doubts in the same thread.Nitesh Rai- Please mark the post as answered if it answers your question
Free Windows Admin Tool Kit Click here and download it now
November 2nd, 2010 6:56am
thanks nitesh...i have seen your answer there,here i have explained my question in detail.thats why i posted this and i had not seen your reply when i posted this one.
1) Use case statement to convert 0 and 1 to M and F in source itself. You can also use derived column to do this: ColName==0?"Male":(ColName==1?"Female":"NA")
2) You need to convert the date value in proper format (yyyy-mm-dd) and then use a derived column with expression as : DT_DBDATE(DateCol). Configure the error op of the derived column to redirect the erroneous date to a error table/file
i have to do all this using SSIS and i don`t know how to use SSIS to perform my data cleansing before ETL. i have used some queries in my SQL to check dirty data,
for gender
SELECT
[Gender]
FROM [Isb_practice].[dbo].[BS_Students]
Where
(Gender <>'0') AND ( Gender <>'1')
i found 8 values with (00,11,01,10,001,M) .
similarly, for date of birth
SELECT *
FROM Student
WHERE ([Date of Birth] NOT LIKE '%_-Jan-__')
AND ([Date of Birth] NOT LIKE '%_-Feb-__')
AND ([Date of Birth] NOT LIKE '%_-Mar-__')
AND ([Date of Birth] NOT LIKE '%_-Apr-__')
AND ([Date of Birth] NOT LIKE '%_-May-__')
AND ([Date of Birth] NOT LIKE '%_-Jun-__')
AND ([Date of Birth] NOT LIKE '%_-Jul-__')
AND ([Date of Birth] NOT LIKE '%_-Aug-__')
AND ([Date of Birth] NOT LIKE '%_-Sep-__')
AND ([Date of Birth] NOT LIKE '%_-Oct-__')
AND ([Date of Birth] NOT LIKE '%_-Nov-__')
AND ([Date of Birth] NOT LIKE '%_-Dec-__')
AND ([Date of Birth] <> '') AND ([Date of Birth] IS NOT NULL)
The output of the above query shows 3 invalid dates. 22-Jal-75,1/27/75,27-Apl-77.
SELECT * FROM Student
WHERE ([Date of Birth] LIKE '29-Feb-%') OR
([Date of Birth] LIKE '3_-%')
The output of the above query shows 9 invalid dates. 29-Feb-75,31-jun-89 etc.
WHAT I WANT????
1)first SSIS read data.
2)if don`t find 0 or 1 in gender send erronous data to "error table".
3)in "error table", dirty data
(00,11,01,10,001,M) should be converted in 0,1
4)after standarization, error table data should go to actaul table at its place.
5)Now i will change 0/1 to male and female.
6)now i will send complete table of student(stid,name,father,gender,adress,last degree,reg data,dob) to a new table STD_INFO.
7) i will repeat same 1-6 steps for all databases(lahore,karachi,islamabad,peshawar) and then i will integrate STD_INFO tables of all databases into one.
i hope now you can understand what i have to do. Similar case with DOB first dirty data to erronous table after conversion there to original table and
then i will check all records at the same time.
November 2nd, 2010 7:53am
Gender==0?"Male":(Gender==1?"Female":"NA") giving error
TITLE: Microsoft Visual Studio
------------------------------
Error at Data Flow Task [Derived Column [46]]: The data types "DT_WSTR" and "DT_I4" are incompatible for binary operator "==". The operand types could not be implicitly cast into compatible types for the operation. To perform this operation, one or both
operands need to be explicitly cast with a cast operator.
Error at Data Flow Task [Derived Column [46]]: Attempt to set the result type of binary operation "Gender == 0" failed with error code 0xC0047080.
Error at Data Flow Task [Derived Column [46]]: Computing the expression "[Gender] ==0?"Male":[Gender]==1?"Female":"NA"" failed with error code 0xC0047084. The expression may have errors, such as divide by zero, that cannot be detected at parse time, or there
may be an out-of-memory error.
Error at Data Flow Task [Derived Column [46]]: The expression "[Gender] ==0?"Male":[Gender]==1?"Female":"NA"" on "input column "Gender" (114)" is not valid.
Error at Data Flow Task [Derived Column [46]]: Failed to set property "Expression" on "input column "Gender" (114)".
------------------------------
ADDITIONAL INFORMATION:
Exception from HRESULT: 0xC0204006 (Microsoft.SqlServer.DTSPipelineWrap)
------------------------------
BUTTONS:
OK
------------------------------mhassanshahbaz
Free Windows Admin Tool Kit Click here and download it now
November 2nd, 2010 9:30am
Hi mhassanshahbaz,
The error "The data types "DT_WSTR" and "DT_I4" are incompatible for binary operator "==". The operand types could not be implicitly cast into compatible types for the operation. To perform this operation, one or both operands need to be explicitly cast
with a cast operator." is caused by that, the Gender is defined as Char in the database, and it is considered as DT_WSTR in the SQL Server Integration Services(SSIS), 0 is DT_I4, we can compare DT_WSTR with DT_I4.
To fix the issue, please change the expression to be:
Gender=="0"?"Male":(Gender=="1"?"Female":"NA")
Thanks,
Jin ChenJin Chen - MSFT
November 4th, 2010 5:16am