Showing posts with label lookup ssis. Show all posts
Showing posts with label lookup ssis. Show all posts

Thursday, 18 October 2018

Load of the datawarehouse without the lookup transform-SSIS





               Load of the datawarehouse without the use of lookup tables

This particular module has been developed for those customers where the lookup tables are very large and the join transformations shall help process the information faster than lookup. In this particular case the ETL team has replaced the lookup with join transforms.
Note: This method cannot be applied for type 2 entities
This method currently handled insert, updates and deletes.
The solution shall be developed over 3 data flows as given under:





In the supply load dataflow the ETL team has developed the methodology for inserts, updates and deletes as follows:





Method for handling inserts and updates
·         Join the source and the target table using the merge transform based on the composite keys. The end client could pick up the surrogate keys based on the same.






In case of the insert, the identity key column should be checked for the identity row insert.




For updates the table data is stored in a copy of the target table. Thereafter the update sql is run for the same using the sql transform.



Handling deletes:
In the case of deletes, the ETL team looks for the composite key details in the target table and the given details should not be in the source table. In case the given condition is applicable then the row in the target table is marked as ‘D’(delete flag)


Contact details:
For resource requirements:resources@mnnbi.com
For training and consulting services: apoorv@mnnbi.com
Blogspot details: https://mndatasolutionsindia.blogspot.com/

Written by Gauri(gauri@mnnbi.com)

https://mndatasolutionsindia.blogspot.com/p/gauri.html



Monday, 8 October 2018

Fuzzy Lookup for replacing special characters in SSIS

Using fuzzy lookup for cleansing data using SSIS
Problem Statement : The ETL team has developed a sql code for cleansing the data from ERP such as Tally. The code entails multiple conditions for lookup of special characters. The ETL team intends to automate the lookup process having special characters. Hence it has developed an SSIS package for the same.

The original sql code has code for replacement of special characters as follows:

 



1st special character lookup where ‘amp;’ was replaced by   ‘&’ and ‘#13;#10;’ was replaced by space  



2nd special character lookup: in this the special characters within SQL such as char(9),char(11),char(12),char(13) have been replaced by blank.

The issue with the above code was that during each ETL extract across multiple companies, the ETL team had to look for various special characters because of the nature of the input data source. The ETL team desired to automate the entire process using SSIS  ( fuzzy lookup function).


 The replacement of special characters using fuzzy look up transformation which performs data cleansing task such as data correction and providing missing values. Fuzzy look up transforms return data with one or more closest matching values. It does not differentiate between close values and provides exact matching. Fuzzy look up transformation is used to correct and cleanse the data. The fuzzy look up transform takes input data from the data flow and matches with a table for the mismatch values.



The fuzzy look up transform takes the input data from data flow and provide the connection with sql server   for a  specific database to  retrieve the input table from the same database and match the column of the table  as shown in below:

In the first step, the ETL team extracted the required columns from the source table.




After retrieving the  required input columns fuzzy lookup transform uses the reference table to match the exact value between input column and look up column as shown below:



For exact matching set the value of threshold in fuzzy lookup transforms as shown in below:
The ETL team used an approximation of 65%(based on trial and error). It also set up the maximum number of matches per lookup output to be 1(no duplicate values). The special characters to be fed into the lookup transform have been attached in the fuzzy lookup ‘additional delimiters’ section.




For automating the following portion of the sql code entailing filter parameters:

The team used the conditional split as follows:

The conditional split is used in SSIS tool for condition such as IF and case statement as used in programming language and set the variables, parameters  ,predefined functions and operators .



Union all is used to combine multiple sources and produce one output without sorted the data where the order of data is not necessary. If wants to sorted the data merge join is used instead of union all transformation.

Contact details:
For resource requirements:resources@mnnbi.com
For training and consulting services: apoorv@mnnbi.com
Blogspot details: https://mndatasolutionsindia.blogspot.com/

Written by Gauri(gauri@mnnbi.com)

https://mndatasolutionsindia.blogspot.com/p/gauri.html


Initial and Incremental data Load Template by M&N Business Intelligence-SAP Data Services

  INCREMENTAL LOAD/CREATION OF DIMENSION TABLE LOGIC AT SAP DATA SERVICES END CLIENT In this particular document we shall be looking at the ...