Monday, 17 December 2018

Dimensions and fact data load optimization using SSIS-speed the data load using control flow and dataflow settings

In this particular blog, we are looking at the initial and incremental data loads for the following:
·         Item journal credit
·         Item journal debit
·         Dim category
·         Dim warehouse

The given set of product was created to enable the following:
·         Optimize the speed of the data load
·         Optimize the error logging, audit logging facilities within SSIS.

The sql code for the item journal debit and item journal credit fact load was converted into SSIS as shown below.
Indexes were dropped before the data load and created after the data load as shown below.





For the fact load, Lookup transform was used as shown below:




The lookup transform was used to pick up the minimum surrogate key in case multiple outputs were derived from the solution.

Attached are the control flow, container and dataflow setting and the reasons thereof.

Control flow settings:

Checkpoint was enabled as shown below:
  • ·         Checkpoint file exists
  • ·         The job to restart if the error happens
  • ·         The parent to fail in case there is an error at the dataflow level.


      Delay validation was not enabled because we are not using temporary tables.


.

Control flow settings:
In order for the rollback to function, we have set the following at the transaction level.
·         Transaction option: Required
  • ·         Isolationlevel: Read Uncommitted. This is because the tables from which the data is been loaded do not have any other update or delete transactions happening with the same.
  • ·         Logging mode has been enabled at the control flow level.
  • ·       The delay validation is false since we are not using any temporary tables.

  
     The data flow settings are as follows:

  • ·         Isolation level: read uncommitted
  • ·         Transaction option: supported.
  •        This has been done because the isolation level at the control flow was uncommitted.
  •        The transaction level supported indicates that the transaction at the container level would continue to run at the data flow level.


The ‘failparentonfailure’ has been marked to be true to enable checkpoint usage and better auditing and error logging.






 For the logging mode the dataflow is using the container settings.




With the given settings the given set of tasks was able to achieve the following results:
  • ·         Optimized the running of the procedures and tasks within SSIS
  • ·         Optimum error handling and logging activities
  • ·         Restarting the jobs from the error points using breakpoints and checkpoints
  • ·         Transaction rollback was enabled where required


Prepared by:
Ishwar Singh
Lead ETL Developer
Website: www.mnnbi.com
Email address: apoorv@mnnbi.com
Phone: +91-8802466356







Thursday, 6 December 2018

Crashing the Load Times using Turbodata


Turbodata-SQL Problem Statement
·        Ever increasing data of the end clients
·        Serial execution of the queries. If one query fails then the entire process fails. Process has errors. Manual resources required to restart the process.
·        Error logging: usage of temporary tables. No intermediate error data available.
·        Nightly process-SQL. Client interface in C#/.NET
·        The product had to be made scalable. Large number of multiple end users along with audit logs, error logs and data loading. The product to support large number of different kinds of reports-GST, inventory and ledger for the end clients.
·        Scalable module with required security is to be developed.

Turbo Data Solution
·        Crash the execution time-migrate from SQL to SSIS for load times
·        Handle errors separately.
·        Bypass the bottleneck processes: store historical data in data warehouse during the fiscal day
·        Extensive error mapping, logging systems.

Change in the end client systems:





Business Benefits
Ø Meet the SLA for the management
Ø Reduce execution times: parallel execution
Ø Remove error prone data
Ø Keep error data in staging tables
Ø Restart the process from the place where error occurred: checkpoint and breakpoint


Change in Turbodata-SQL to SSIS

The ETL team adopted the following methodology for converting the sql code to SSIS code:
·         Conversion of cte-container
·         All the joins-SSIS transforms
·         Redirect the error rows for mappings and table inserts
·         Drop the indexes before data load and recreate the indexes after data load.



TurboData Conversion Experience



For the linkage of one execute package Task to another, we are using three concepts (Success, Failure, Completion). Hence precedent constraints are marked as ‘completion’ and not ‘successes. The reason for conversion of the same is that the process should run smoothly even if there is an error.


Improved Meta Data Handling


Error history: in the Turbodata initial product, we were not able to track the details of errors at the transform level due to usage of temporary tables. The SSIS team
Validate external metadata is set to false for temporary tables. It should be set to true. Hence the ETL team uses permanent staging tables instead of temporary tables.

Better Error Handling



The ETL team added the breakpoints at container level.
Note *A number of actions possible on breakpoint error such as emailing, inserting details into transaction logs etc.


Error Handling: Dataflow


By some transform error can be generated and we can store error or ignore and fail it. Rows were redirected to ensure smooth dataflow run during the execution process: quicker execution time

Error Handling: Control Flow Level
*Note:  Container (At control level): Transaction Option should be Required, So the child component if fail then the whole transaction will be rollback.
The child component of container: The transctionOption should be supported. So that if it fails the all transaction of same container wills rollback.
Error Handling: Control Flow Level


FailParentOnFailure: True (It is applied in case of check point enable) Indicates whether the parent of the executable fail when the executable of child fail.

 Rerunning the job from error point:

*The ETL team used the checkpoints to re run the jobs from the places where error had occurred

*Reduce the process re running time

Optimize the Job Run


For job optimization: - On Development side we applied Repeatable and production side applied ReadUncommited.
RunOptimizedMode: True (Applied on Data Flow Task)
Unused columns are not put in the ETL tool buffer (Lower load on the system. Quicker query execution)


Log Settings (Logging Mode enabled at container level and ‘Use Parent Settings’ at dataflow level. Capture execution details at lowest level possible)

Know the number of rows loaded in SQL Load: Better logging features

Benefits for Turbodata
Key Result Areas for End Client

Steps for implementation-End Client



Prepared by:

Aprajita Kumari
Contact Details:
Website: www.mnnbi.com
Phone: +91-8802466356

Sunday, 25 November 2018

Conversion of Nil rated GST report from SQL to SSIS




The necessity of converting the sql code to SSIS code arose from the following requirements.
1.      Design scalable module across various ERPs. For the same the ETL team, segregated  various modules using parent and child packages as shown below.




                A  hierarchy of packages was built in within the SSIS dataflow.




·        
The need to design an error proof reporting system across large number of extraction systems. Say an end client has customers on Tally, SAP, Navision etc. then the end client can use the Turbodata GST module to run the GST reports from all the customers simultaneously. 

  •           The need to rerun the jobs when error occurs.
    Say, we are running the particular set of stored procedures. 


The above set of procedures shall fail if any of the intermediary procedures fail.

In case of large scale deployment of GST consolidated solutions, the above is a bottleneck. to get around the same, the ETL team used the SSIS code to achieve the following.
  •       The entire set of the sql code  was embedded within containers. Each container execution was sequenced to be dependent upon the prior container(completion and not success). a sample as shown under:



Implications for the business: the entire process shall run smoothly over large data loads. The process of error handling shall be easier. The resources can check for the errors after the completion of the load during the night(in morning).

By passing the failure problem during sql execution.
For the same, the ETL team used the failure tracking processes at the dataflow level for the both the data insertion and data processing.
An example is attached herewith:
·         Error handling during data processing.
On a particular condition, attached herewith, the code was stopping multiple times.



The ETL team at the transformation level had 3 options:


  •         Ignore failure
  •        Redirect the row
  •         Fail Component.

The ETL team decided to redirect the rows on each failure into the audit table. This audit table was available to the end client managers for analysis at a later date. However this feature ensures that in case an error happened during the night then the code would not stop across multiple end users.

Similarly during the data insertion and data updates, errors could occur. such rows were redirected at the ETL level.





·         Saving on the execution times: for the same the ETL team adopted the 5 pronged strategy.

o   Identifying the processes that could be run in parallel. for example in sql code, the fact load for item sales order, item journal fact and others was carried out sequentially. The ETL team decided to execute the same in parallel.



o   Minimizing the number of columns to be processed: the extraction module had minimum number of columns to be processed and minimum rows to be processed.

o   Capturing the error prone data before processing the same as Nil rated data.
o   dropping of indexes before data load in a dimension table and recreating the indexes after the data load.



For more information on how to do one of the following:
  •          Design optimum GST reports
  •          Catch error prone data before processing
  •          Design scalable datawarehouses for SAP, Navision, tally on premises and on cloud

Contact:
Apoorv Chaturvedi
Phone: +91-8802466356
website: www.mnnbi.com

SSIS module and blog prepared by Aprajita Kumari:


Saturday, 10 November 2018

Turbodata : capture GST filing errors with quickness and ease(ETL tools: SSIS, Qliktech Expressor, Alteryx, SAP Data Services)


Turbodata ETL solution migrates the GST SQL code to ETL code for better data audit and data profiling.
Capture the errors better by using one of the 3(three)options:
  •          Fail the task
  •         Redirect the error rows: for example in case there is a registered customer. However at the voucher level, the GST number has not been indicated for the given transaction. In such a case these rows can be redirected.
  •      Ignore the error: In case the end client has typed in a wrong reference number for the credit note then the given transaction could be ignored and the transaction passed. In such a scenario the error check point to indicate that the reference number in the credit note should reference the sales voucher number in the datawarehouse

Capture the details at each of the transformation points within the dataflow and in the target table.
Send emails and alerts in case of any errors to the required stakeholders(event handlers)

  •         Breakpoints: stop the job on any of the errors at the procedure level(container level) . For example in case the unit of measurement logic after extraction is not clear for all the transactions then  there could be a breakpoint for the given job.       
  •       Checkpoints: restart the job from the place where the error has happened. This is specially critical for those companies where data is been consolidated, cleansed and thereafter GST reports have been developed. For example if there was a problem with the rate or Unit of measurement extraction in the container at row number 450. The check point shall ensure that the jobs shall be restarted from row number 451.

How is the Turbodata ETL solution better than sql solutions/custom solutions for GST filing:
  •          Built in data consolidation facility: handle missing extraction inputs. Automate data extraction from multiple data sources.
  •          Data cleansing before GST reporting: use the error handling facilities to load the data. The unclean data is thereafter corrected by the end user at a date and time of its convenience
  •          GST reports using standard GST logic across all ERPs
  •          Re run the tasks from the place where error has happened: extremely critical for those customers that have multiple data sources. This is important for those customers that have large scale implementations.
  •          Develop the GST reports based on your criteria: critical for those customers who desire to develop their own reports(based on their own business logic)
  •           GST reconciliation with ease and convenience: data cleansing, data consolidation, data auditing and data profiling has been done. GST reports have been filed using standard business logic.






Contact:
Sharad Maheshwari, Apoorv Chaturvedi
Email: sharad@govisory.in, apoorv@mnnbi.com, tureasesolutions@gmail.com

Thursday, 18 October 2018

Load of the datawarehouse without the lookup transform-SSIS





               Load of the datawarehouse without the use of lookup tables

This particular module has been developed for those customers where the lookup tables are very large and the join transformations shall help process the information faster than lookup. In this particular case the ETL team has replaced the lookup with join transforms.
Note: This method cannot be applied for type 2 entities
This method currently handled insert, updates and deletes.
The solution shall be developed over 3 data flows as given under:





In the supply load dataflow the ETL team has developed the methodology for inserts, updates and deletes as follows:





Method for handling inserts and updates
·         Join the source and the target table using the merge transform based on the composite keys. The end client could pick up the surrogate keys based on the same.






In case of the insert, the identity key column should be checked for the identity row insert.




For updates the table data is stored in a copy of the target table. Thereafter the update sql is run for the same using the sql transform.



Handling deletes:
In the case of deletes, the ETL team looks for the composite key details in the target table and the given details should not be in the source table. In case the given condition is applicable then the row in the target table is marked as ‘D’(delete flag)


Contact details:
For resource requirements:resources@mnnbi.com
For training and consulting services: apoorv@mnnbi.com
Blogspot details: https://mndatasolutionsindia.blogspot.com/

Written by Gauri(gauri@mnnbi.com)

https://mndatasolutionsindia.blogspot.com/p/gauri.html



Data insertion-Tally(approaches)

 Problem statement: Many softwares look to insert data into Tally from their application.This blog looks at issues and approaches for the sa...