Thursday, 18 October 2018

Loading the data warehouse without the Lookup transform - SSIS





Loading the data warehouse without the use of the Lookup transform

This module has been developed for customers whose lookup tables are very large, where join transformations process the data faster than the Lookup transform. In this case the ETL team has replaced the lookup with join transforms.
Note: this method cannot be applied to type-2 entities.
The method currently handles inserts, updates and deletes.
The solution is developed across three data flows, as given below:





In the supply load data flow, the ETL team has developed the handling of inserts, updates and deletes as follows:





Method for handling inserts and updates
·         Join the source and the target table using the Merge Join transform on the composite keys. The end client can then pick up the surrogate keys from the matched rows.






In the case of an insert, the identity (surrogate key) column on the target should be checked so that the identity value is generated for the inserted row.
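A minimal T-SQL sketch of this insert branch is given below, under assumed names (SRC_SUPPLY as the source, DW_SUPPLY as the target with an identity surrogate key, and ORDER_NO + ITEM_NO as the composite key); the actual tables and columns will differ per deployment.

-- Hedged sketch: insert rows whose composite key is not yet in the target.
-- The surrogate key column is assumed to be an IDENTITY column, so it is
-- omitted from the column list and generated by the engine on insert.
INSERT INTO dbo.DW_SUPPLY (ORDER_NO, ITEM_NO, QTY, AMOUNT)
SELECT s.ORDER_NO, s.ITEM_NO, s.QTY, s.AMOUNT
FROM dbo.SRC_SUPPLY AS s
LEFT JOIN dbo.DW_SUPPLY AS t
       ON t.ORDER_NO = s.ORDER_NO
      AND t.ITEM_NO  = s.ITEM_NO
WHERE t.ORDER_NO IS NULL;   -- no match on the composite key => new row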




For updates, the matched rows are first stored in a copy of the target table. Thereafter the update SQL is run against the target using the Execute SQL task.
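A hedged sketch of that update statement, assuming the matched rows were first landed in a staging copy with the hypothetical name STG_DW_SUPPLY_UPD, could look like this:

-- Update the target from the staging copy, matching on the composite key.
UPDATE t
SET    t.QTY    = u.QTY,
       t.AMOUNT = u.AMOUNT
FROM   dbo.DW_SUPPLY AS t
JOIN   dbo.STG_DW_SUPPLY_UPD AS u
       ON u.ORDER_NO = t.ORDER_NO
      AND u.ITEM_NO  = t.ITEM_NO;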



Handling deletes:
In the case of deletes, the ETL team looks for composite keys that are present in the target table but not in the source table. Where this condition holds, the row in the target table is marked with a delete flag (‘D’).
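A rough T-SQL equivalent of this delete-flag step, with the same assumed table names and a hypothetical ROW_FLAG column, is:

-- Flag rows that exist in the target but are no longer present in the source.
UPDATE t
SET    t.ROW_FLAG = 'D'      -- delete flag; column name is an assumption
FROM   dbo.DW_SUPPLY AS t
WHERE  NOT EXISTS (SELECT 1
                   FROM  dbo.SRC_SUPPLY AS s
                   WHERE s.ORDER_NO = t.ORDER_NO
                     AND s.ITEM_NO  = t.ITEM_NO);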


Contact details:
For resource requirements: resources@mnnbi.com
For training and consulting services: apoorv@mnnbi.com
Blogspot details: https://mndatasolutionsindia.blogspot.com/

Written by Gauri (gauri@mnnbi.com)

https://mndatasolutionsindia.blogspot.com/p/gauri.html



Wednesday, 17 October 2018

Deployment of SSIS package


Deployment of SSIS package

This note addresses the following issues with regard to the deployment of SSIS packages.

·         Migrate the packages from one environment to another: development to integration to production.

·         Code the job dependency: once one job is completed, the next job begins.

·         Handle multiple database connections across different environments.

·         Handle errors during execution of the jobs.

Given below is the flow of steps for deploying the package.

1.       Build the package.

2.       Convert to the package deployment model.
3.       Click on ‘Properties’ and change the deployment utility property to ‘True’.



4.       Create the manifest file.

5.       Run the Package Installation Wizard.
6.       Click ‘Next’ and select ‘File system deployment’.
7.       Click ‘Next’ and select the path where you want to deploy your package.
8.       Click ‘Next’ and then ‘Finish’; the deployable package is ready.
Attached are the other blog links:


Website: www.mnnbi.com


















Monday, 15 October 2018

Scheduling in SSIS


SCHEDULED PACKAGE IN SSIS TOOL

SSIS packages can be automated and scheduled with SQL Server Agent.

The method we have used for automation is to schedule the package as a SQL Server Agent job, as shown below.


After creating the job, define its name, go to the Steps page, and click New to specify the step details.


In the job step configuration, set the properties and click OK to save them.

In the next step, on the Advanced page, we can choose the action to take on success or failure of the step. When the step succeeds, the default is to go to the next step, but it can instead quit the job reporting success or quit the job reporting failure. Likewise, on failure, the default is to quit the job reporting failure, but it can be set to go to the next step or to quit the job reporting success.

Retry attempts are used when the job fails for transient reasons such as a network connection problem; once the connection works again, the step is retried up to the configured number of attempts.



In the next step we schedule the execution time of the package. We can pick a schedule type such as ‘Recurring’, which is used for periodic execution, or one of the other schedule types according to the task requirement, and then set the frequency at which the job occurs: daily, weekly, monthly or hourly.
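The same job can also be created in T-SQL through the msdb stored procedures that the wizard calls behind the scenes. The sketch below uses assumed job, package and schedule names and a daily 02:00 schedule.

USE msdb;
GO
-- Create the job (all names and paths below are placeholders).
EXEC dbo.sp_add_job       @job_name = N'Load_DW_Nightly';
-- Add a step that runs the SSIS package through the SSIS subsystem.
EXEC dbo.sp_add_jobstep   @job_name  = N'Load_DW_Nightly',
                          @step_name = N'Run SSIS package',
                          @subsystem = N'SSIS',
                          @command   = N'/FILE "D:\Packages\LoadDW.dtsx"';
-- Recurring daily schedule at 02:00 (freq_type 4 = daily, freq_interval 1 = every day).
EXEC dbo.sp_add_schedule  @schedule_name     = N'Nightly_0200',
                          @freq_type         = 4,
                          @freq_interval     = 1,
                          @active_start_time = 020000;   -- 02:00:00
EXEC dbo.sp_attach_schedule @job_name      = N'Load_DW_Nightly',
                            @schedule_name = N'Nightly_0200';
-- Target the job at the local server.
EXEC dbo.sp_add_jobserver @job_name = N'Load_DW_Nightly';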


SQL Server Agent can also fire an alert on an event and respond to that event; it can monitor SQL Server performance conditions and Windows Management Instrumentation (WMI) events.
An alert can be raised on the basis of conditions such as error severity level, message number, or a WMI event.
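As an illustration, a severity-based alert that runs a response job can be created with msdb.dbo.sp_add_alert; the alert and job names below are assumptions.

-- Raise an alert on any severity-17 error and run a response job.
EXEC msdb.dbo.sp_add_alert @name     = N'Severity 17 errors',
                           @severity = 17,
                           @job_name = N'Notify_On_Error';   -- assumed job name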



Contact details:
For resource requirements: support@mndatasolutions.com
For training and consulting services: support@turbodatatool.com
Blogspot details: https://mndatasolutionsindia.blogspot.com/
Phone: +91-8802466356







Lookup Transform

In this case, the ETL team devised a job to convert the SQL code that loads the order master table in its product into SSIS code. For the surrogate key upload, the ETL team used the Lookup transform.
Reasons for converting the code to SSIS:

·         Increase the speed and ease of debugging and auditing errors. The ETL team intends to add features such as auto-correct load (restarting a failed job from the point at which the error occurred).
·         Enable easy connectivity with a large number of ERPs.
·         Help automate the process across a large number of deployments with SSIS.

This blog indicates how the multiple joins used for loading the data warehouse can be converted into SSIS code with the Lookup transform.

The input SQL code was converted into CTEs (common table expressions). These common table expressions formed the procedure that needed to be converted into SSIS jobs.
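For illustration, one such CTE could look roughly like the sketch below (the column names are assumptions; only the table name STG_DAYBOOK comes from this post). Each CTE of this shape then maps to one source or merge-join step in the SSIS data flow.

-- Illustrative shape of a single CTE used as the unit of conversion.
WITH cte_daybook AS
(
    SELECT d.VOUCHER_NO, d.ORDER_DATE, d.PARTY_LEDGER_NAME, d.AMOUNT
    FROM   dbo.STG_DAYBOOK AS d
)
SELECT *
FROM   cte_daybook;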

Understand the joins between the tables and the required columns for developing the required dataflow.

·         Develop the code in SSIS for the various joins using SSIS transforms, as below:
In this section we indicate the join between STG_DAYBOOK and Dim_DATE based on order_date.


·         Pick up the relevant columns from ‘merge join’ after each join transformation.


SSIS input: using the Merge Join transform.
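In SQL terms, this first merge join is roughly equivalent to the statement below (the Dim_DATE column names are assumptions). Note that in SSIS both inputs must be sorted on the join key before the Merge Join transform.

-- Rough SQL equivalent of the STG_DAYBOOK / Dim_DATE merge join on order_date.
SELECT d.*, dd.DATE_KEY                     -- surrogate key column name assumed
FROM   dbo.STG_DAYBOOK AS d
INNER JOIN dbo.Dim_DATE AS dd
        ON dd.DATE_VALUE = d.ORDER_DATE;    -- Dim_DATE date column assumed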


In this portion, we join with the dim warehouse table to get the relevant input:


Develop the code in SSIS for various joins as below:

SSIS input: using the Merge Join transform.

In this portion, we join with the Dim_Supplier table to get the relevant input:


SSIS input: in this portion, the Sort transform is applied to the Dim_Supplier table and to the output of the merge join above.

SSIS input: in this portion, the Merge Join transform is used based upon the party ledger name and the supplier name.

This portion of the code is used to get the unique voucher type names from the stg_voucher_type table.

The following condition applies: the voucher type must be sales, purchase or journal (the specific voucher types have been extracted using a Conditional Split).

SSIS input: in this portion, take the stg_voucher_type table, apply the Sort transform to get the unique voucher type names, and then apply the above condition with a Conditional Split transform. The final sort was used to remove any duplicates that could arise.
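The SQL counterpart of this sort plus conditional split is roughly the following; the column name VOUCHER_TYPE_NAME is an assumption.

-- DISTINCT removes duplicates (the sort); the WHERE clause keeps only
-- the required voucher types (the conditional split).
SELECT DISTINCT v.VOUCHER_TYPE_NAME
FROM   dbo.STG_VOUCHER_TYPE AS v
WHERE  v.VOUCHER_TYPE_NAME IN ('Sales', 'Purchase', 'Journal');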


SSIS: finally, the SSIS Lookup transform is applied, combining the available input columns with the available lookup columns based on the composite key to obtain the surrogate keys.
Final table:


SSIS input:




The ETL team used the Full cache option to minimize the number of times the database is accessed: all the lookup values are held in the lookup cache.

The ETL team used the OLE DB connection manager since the lookup was being done from a local machine.
Where there is no match, the ETL team assumed a case of insert, and hence the option ‘Redirect rows to no match output’ was chosen.
The ETL team has assumed that there are no exceptions to be handled.
In the ‘no match’ case, the output was written into another table (a copy of the item supplier order table), as below:


The Execute SQL task enables insertion of the non-matched rows into the item supplier order target table.
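The statement run by that Execute SQL task would look roughly like the sketch below; the staging copy name ITEM_SUPPLIER_ORDER_STG and the column list are assumptions.

-- Move the 'no match' (insert) rows from the copy table into the target.
INSERT INTO dbo.ITEM_SUPPLIER_ORDER (ORDER_NO, ITEM_CODE, SUPPLIER_KEY, ORDER_DATE_KEY, QTY)
SELECT ORDER_NO, ITEM_CODE, SUPPLIER_KEY, ORDER_DATE_KEY, QTY
FROM   dbo.ITEM_SUPPLIER_ORDER_STG;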
Usage of a Sequence Container for completing the lookup transformations:

The other tasks to be executed include sending the mail, etc.

Thursday, 11 October 2018

Converting complex SQL Queries into SSIS code




This blog indicates how one can convert the highly complex SQL queries used for loading the data warehouse into SSIS code.

End client requirement: the end client wanted the SQL queries used to load the data warehouse to be converted into SSIS code for better auditing and error logging. The graphical, intuitive visuals in SSIS make the complex SQL code easier to manage.
The input SQL code was converted into CTEs (common table expressions). These common table expressions formed the procedure that needed to be converted into SSIS jobs.

The data flow developed in SSIS corresponded to a procedure with multiple common table expressions.
Step 1:
·         Understand the joins between the tables and the required columns for developing the required dataflow.

·         Develop the code in SSIS for various joins as below:


·         Pick up the relevant columns from ‘merge join’ after each join transformation.
           


SSIS input:

·         Identify any filter parameters: I have used the conditional split for the filter parameters.


SSIS package:


·         Identify any calculated columns.


SSIS package implementation: derived column.


·         Aggregation with required group by parameters:
SQL implementation:




Finally, the data is inserted into the target table.
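To make the mapping concrete, the sketch below compresses the pattern described above into one statement: the WHERE clause corresponds to the Conditional Split, the computed column to the Derived Column transform, and the GROUP BY to the Aggregate transform, ending in the insert into the target. All table and column names are assumed for illustration.

WITH cte_sales AS
(
    SELECT s.CUSTOMER_ID,
           s.QTY * s.UNIT_PRICE AS LINE_AMOUNT    -- derived (calculated) column
    FROM   dbo.STG_SALES AS s
    WHERE  s.ORDER_DATE >= '2018-01-01'           -- filter / conditional split
)
INSERT INTO dbo.FACT_SALES_SUMMARY (CUSTOMER_ID, TOTAL_AMOUNT)
SELECT CUSTOMER_ID, SUM(LINE_AMOUNT)              -- aggregation with GROUP BY
FROM   cte_sales
GROUP BY CUSTOMER_ID;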

In this manner the ETL team converted complex SQL code into SSIS code for easier maintainability.

Website: www.mnnbi.com




Initial and Incremental data Load Template by M&N Business Intelligence-SAP Data Services





INCREMENTAL LOAD/CREATION OF DIMENSION TABLE LOGIC AT SAP DATA SERVICES END CLIENT

In this document we shall be looking at the incremental load logic at the SAP Data Services end client. This document has been prepared for the following purposes:

·         Data Integrator does not have a dimension creation wizard. Thus using the transforms within the Data Integrator we need to make the required type-1 and type-2 dimensions.

·         This document will serve as a template for all new developers to populate the data warehouse and the data mart as part of the PAMDSS project at SAP Data Services end client.

·         The required templates are attached to this document and shall further help in understanding the processes. With this document we seek to systematize the development of the history, base and dimension tables within the project.

·         Prevent some basic mistakes that developers can make while populating the history and the dimension tables.

Note: along with this document we shall be attaching/referring to the required .atl data flows. To aid understanding, we shall attach screenshots of the required .atl and send the .atl itself.


Methodology for populating the Data Warehouse

Scenario for loading the data warehouse: depending upon the project requirements, a client's data warehouse can contain the following tables:

·         Base tables

·         History tables.

In this document we shall be looking at populating the History tables. The methodology for populating the base tables is simpler and does not require special treatment.

We have history-making events and non-history-making events, and we have to track inserts, updates and deletes for both. The scenario can be expressed with the following matrix:


Event                              Insert   Update   Delete
History-making attributes             1        2        3
Non-history-making attributes         4        5        6

Also many times we have last update time stamps in the source. We shall be looking at a specific case where we have the time stamp in the source and have to populate the target in such a scenario.


Methodology for handling inserts: in this case we shall be looking at scenarios 1 and 4.

There shall be two kinds of inserts within this scenario.

·         Those records whose business key is not present in the data warehouse.

·         Those history making records that have been updated. We shall generate an insert row for each one of them.

Methodology for handling those records whose business key is not present in the data warehouse

For this purpose we propose that you follow activities within Flow 1. In the following diagram Flow1 includes Query_ID, Validation, Query_Insert, Merge and Surrogate key generation.






Steps in the process are as following:

1.      Look up the business key in the data warehouse table.

2.      Validate whether the business key is NULL or not. Within the Validation transform put "Business key is NULL" as the custom condition.

3.      If the business key is NULL then it is straightforwardly a case of an insert.

4.      Set the Insert Time Stamp, Update Time Stamp and Effective From (History Start) Time Stamp as today's date [Query_Insert].

5.      Set the Effective Through (History End) time stamp to the end of time.


Note: This shall handle Insert operation for all non-history making events also.


Handling inserts from history-making records: the records that are to be updated are handled separately within the updates section. However, for the new row to be inserted after the History Preserving transform, we perform the following operations:

·         Map the Insert row as normal (discard all the other row types) through the use of Map Transform

·         Merge the output with the Inserts from Flow1.

·         Generate a surrogate key after union operation.


Handling Updates for History Making attributes (scenario 3): In this particular case we have the following scenario:

·         The Lookup value of the business key(from Query_ID) is not NULL.

·         We have History Start and History End Time stamps in the target table.

·         We know the attributes on which we propose to preserve history. Then the combination of activities used shall be as following:

·         Table Comparison: this shall help determine all the required updates within the target table. All the output rows from this transform shall be marked as ‘U’. The following care should be taken when we use this transform:

o The Compare column should not have the Target table Load Time Stamps and any of the primary key columns.



o   The compare column should have only the history tracking attributes from the target table.

·         History Preserving transform: The History preserve transform shall have input rows marked as ‘Update’. Within the History preserve transform we need to take the following care:

o   The history preserve columns are the same as in Table Comparison.

o   If there is a row status in the target table then its Reset Status should be changed.

·         Map the update rows: this is done using the MAP transform and only marking the Update rows output as Normal. The other row types should be discarded.


Handling Deletes: within Data Integrator we have an option for marking Deletes as Update rows within the History Preserve transform. Our experience with the same has not been very convincing. The alternate methodology is as following:

·         In the Table Comparison, use the option Preserve deleted rows.

·         Map the deleted rows using MAP transform(discard all input types except the deletes).

·         Set the last update time stamp and the History end date as today’s date.

·         Set the row status as ‘delete’.

·         Merge it with the update rows.

Explanation for the transform Map_Operation_Update: it is important to consider why we need to add one last Map transform before loading the data into the target table. For rows of type Normal (we converted the update rows into Normal rows), mark the row as Update (as given in the following diagram). The reason for this step is that if the rows remain marked as Normal, they will be inserted as separate rows in the target table; if the target has primary key constraints, this will then result in an error.





































Handling Updates for the Non-history making attributes (Scenario 5): Unfortunately we cannot handle this case within the same data flow. For this we designed another data flow whose implementation shall follow the execution of the previous data flow. In this case we simply track the rows flagged as Updates from the Table Comparison Transform in the following manner:








































In the Query_2 step we shall set the Last Update Time Stamp to today's date. (Note: we should take the following care while implementing the above steps.)

·         In the Compare column category of Table Comparison include only the non-History making attributes from the target table. Do not include the target table load Time Stamps and the primary key columns in the Compare column category.

·         In the last Map operation Map_Operation_Update set the Input Normal rows as output type update.


Input Tables have a Last Update Time stamp: In case the input tables have a last update time stamp then we can limit the number of rows to be scanned. This is done with the help of the following steps:



·         Generate a script before the dataflow in which you capture the maximum value of the Last Update Time Stamp from the target table (see the rows marked in red in the attachment below). This can be done by writing a simple script as follows:

$FULLUPLOAD = 'NO';

IF ($FULLUPLOAD = 'YES')
begin
# Full load: truncate the target, seed a default row, and reset the timestamp variable.
print( 'The process of full upload is set to begin');
sql( 'SQL_SERVER_SANDBOX', 'TRUNCATE TABLE DSSTB_DW_CAL_CODE_TS');
sql( 'SQL_SERVER_SANDBOX', 'INSERT INTO DSSTB_DW_CAL_CODE_TS (CCD_DW_ID, CCD_DW_ROW_STATUS, CCD_DW_INSERT_TS, CCD_DW_LAST_UPD_TS, CCD_ID, CCD_CODE, CCD_DESC, CCD_SITE_CODE, CCD_STAFF_ID, CCD_IDX_PRVDR_CODE, CCD_DFLT_REG_SECT, CCD_ACTIVE_DATE, CCD_DISABLED_DATE, CCD_INSERT_TS, CCD_LAST_UPD_TS, CCD_CCAT_CODE, CCD_EXTENSION_WKS) VALUES (0, \'\', \'1900-01-01 00:00:00.000\', \'1900-01-01 00:00:00.000\', 0, \'N\A\', \'\', \'\', 0, \'\', \'\', \'1900-01-01 00:00:00.000\', \'1900-01-01 00:00:00.000\', \'1900-01-01 00:00:00.000\', \'1900-01-01 00:00:00.000\', \'\', 0)');
$Last_Timestamp_var = '1900-01-01 00:00:00.000';
end
else
begin
# Incremental load: capture the maximum last-update timestamp from the target.
$Last_Timestamp_var = SQL('SQL_SERVER_SANDBOX', 'SELECT MAX(CCD_LAST_UPD_TS) FROM DSSTB_DW_CAL_CODE_TS');
print('the process of incremental load has started');
end


Thereafter we do the following steps:

·         Declare the variable as a parameter.

·         Pass the parameter into the dataflow.

·         In the ‘where’ clause (as given in the attachment below), select the rows from the source table whose time stamp is greater than the parameter value, as sketched after this list.
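For example, if the dataflow parameter is named $P_Last_Timestamp (an assumed name) and the source table feeding DSSTB_DW_CAL_CODE_TS is aliased as CAL_CODE (also assumed), the where clause of the Query transform would be roughly:

CAL_CODE.CCD_LAST_UPD_TS > $P_Last_Timestamp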



















































Dimension Tables: In this case we shall look at both Type-1 and Type-2 dimension tables.

How to work with Type-1 Dimension tables: In this case we wish to carry out the following steps:

·         In case of an update keep the latest record. Do not increment the dimension key.

·         However in case of an insert we wish to increment the dimension key.

We carry the above using the following steps:
































































The logic for the above data flow is as follows.



From the source tables we shall get records marked as Insert (new records) and Update (old records that have changed). If the record is an update, it is loaded directly into the target table; this is done by passing it through a Map transform (the output type for Update being ‘Update’, with the other row types discarded). However, if it is an insert, then we generate the surrogate key.


Methodology for populating type-2 dimension tables: the methodology for populating the type-2 dimension tables is the same as for populating the history tables with history-making attributes. The sequence of operations and the logic are as follows:


Blog link: https://mndatasolutionsindia.blogspot.com/2018/10/initial-and-incremental-data-load.html

Website: www.mnnbi.com






