The need to convert the SQL code into SSIS packages arose from the following requirements.
· Designing a scalable module across various ERPs. For this, the ETL team segregated the modules using parent and child packages as shown below, building a hierarchy of packages within the SSIS dataflow.
· The need to design an error-proof reporting system across a large number of extraction systems. For example, if an end client has customers on Tally, SAP, Navision etc., the client can use the Turbodata GST module to run GST reports for all those customers simultaneously.
· The need to rerun jobs when an error occurs. Suppose a particular set of stored procedures is being run: the whole set fails if any intermediary procedure fails. In a large-scale deployment of the consolidated GST solution, this is a bottleneck. To get around it, the ETL team used SSIS to achieve the following.
- The entire set of SQL code was embedded within containers. Each container was sequenced to depend on the prior container's completion (not its success). A sample is shown below:
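The "completion, not success" precedence idea can be sketched in plain Python (not SSIS itself); the step names below are hypothetical stand-ins for the containers:

```python
# Sketch of "completion, not success" precedence constraints:
# each step runs after the previous one finishes, even if it failed.
def run_steps(steps):
    results = {}
    for name, step in steps:
        try:
            step()
            results[name] = "success"
        except Exception as exc:
            # Record the failure but keep going, like an SSIS
            # precedence constraint set to "Completion".
            results[name] = f"failure: {exc}"
    return results

steps = [
    ("load_dim", lambda: None),
    ("load_fact", lambda: 1 / 0),   # simulated failure
    ("build_report", lambda: None), # still runs after the failure
]
results = run_steps(steps)
print(results)
```

With a "Success" constraint, by contrast, the loop would re-raise the exception and `build_report` would never run.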
Implications for the business: the entire process runs smoothly over large data loads, and error handling becomes easier. The team can check for errors in the morning, after the overnight load has completed.
· Bypassing the failure problem during SQL execution. For this, the ETL team used failure tracking at the dataflow level for both data insertion and data processing.
An example is attached herewith:
· Error handling during data processing. Under a particular condition, shown in the attachment, the code was stopping repeatedly.
The ETL team at the transformation level had 3 options:
- Ignore failure
- Redirect the row
- Fail Component.
The ETL team decided to redirect the rows on each failure into an audit table, which was available to the end client's managers for later analysis. This ensures that if an error occurred during the night, the code would not stop across multiple end users. Similarly, errors could occur during data insertion and data updates; such rows were also redirected at the ETL level.
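The "redirect row" behaviour can be illustrated with a small Python sketch (the row fields and transform are hypothetical, not the actual Turbodata schema):

```python
# Sketch of the "redirect row" option: a failing row is captured
# into an audit list instead of stopping the whole load.
def load_rows(rows, transform):
    loaded, audit = [], []
    for row in rows:
        try:
            loaded.append(transform(row))
        except Exception as exc:
            # Redirect the failing row, with its error, to the audit table.
            audit.append({"row": row, "error": str(exc)})
    return loaded, audit

rows = [{"qty": "5"}, {"qty": "bad"}, {"qty": "7"}]
loaded, audit = load_rows(rows, lambda r: {"qty": int(r["qty"])})
print(len(loaded), len(audit))  # 2 rows loaded, 1 redirected
```

The audit list plays the role of the SSIS audit table: the load completes, and the bad rows remain available for analysis the next day.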
· Saving on execution times: for this, the ETL team adopted a multi-pronged strategy.
o Identifying processes that could run in parallel. For example, in the SQL code the fact loads for item sales orders, item journals and others were carried out sequentially; the ETL team decided to execute them in parallel.
o Minimizing the data to be processed: the extraction module processed only the minimum number of columns and rows required.
o Capturing error-prone data before it is processed as nil-rated data.
o Dropping indexes on a dimension table before the data load and recreating them after the load.
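The first prong, parallelising independent fact loads, can be sketched in Python; the load functions here are hypothetical stand-ins for the stored procedures named above:

```python
# Sketch: run independent fact loads concurrently instead of
# sequentially, using a thread pool.
from concurrent.futures import ThreadPoolExecutor
import time

def load_item_sales_order():
    time.sleep(0.1)  # simulated load work
    return "item_sales_order"

def load_item_journal():
    time.sleep(0.1)  # simulated load work
    return "item_journal"

with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda f: f(), [load_item_sales_order, load_item_journal]))
print(results)
```

Run sequentially, the two simulated loads would take the sum of their individual times; in the pool they overlap, which is the saving the ETL team obtained by moving the fact loads into parallel SSIS containers.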
For more information on any of the following:
- Design optimum GST reports
- Catch error-prone data before processing
- Design scalable data warehouses for SAP, Navision and Tally, on premises and on the cloud
Contact:
Apoorv Chaturvedi
Phone: +91-8802466356
SSIS module and blog prepared by Aprajita Kumari:
LinkedIn profile: https://www.linkedin.com/in/aprajita-kumari-81635548/