The need to convert the SQL code to SSIS packages arose from the following requirements.
1. The need to design a scalable module across various ERPs. For this, the ETL team segregated the various modules using parent and child packages, building a hierarchy of packages within the SSIS dataflow.
2. The need to design an error-proof reporting system across a large number of extraction systems. If an end client has customers on Tally, SAP, Navision etc., the client can use the Turbodata GST module to run the GST reports for all of those customers simultaneously.
3. The need to rerun jobs when an error occurs. Suppose we are running a particular set of stored procedures: the whole set fails if any intermediary procedure fails. In a large-scale deployment of consolidated GST solutions this is a bottleneck. To get around it, the ETL team used SSIS code to achieve the following.
- The entire set of SQL code was embedded within containers. Each container's execution was sequenced to depend on the prior container's completion (not its success). A sample is shown below:
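The completion-based sequencing described above can be sketched in Python. This is an illustration only, not the actual SSIS implementation: the step names (`load_customers`, `load_sales`, `load_gst_report`) are hypothetical placeholders for the SQL containers, and the point is that each step runs once the previous one has finished, whether or not it succeeded, which mirrors an SSIS precedence constraint set to "Completion" rather than "Success".

```python
import logging
from typing import Callable, List

logging.basicConfig(level=logging.INFO)

def run_with_completion_precedence(steps: List[Callable[[], None]]) -> List[str]:
    """Run every step in order; a failure is logged, not propagated.

    Mimics an SSIS precedence constraint evaluated on 'Completion':
    each container starts once the prior one has finished, regardless
    of whether the prior one succeeded.
    """
    failures = []
    for step in steps:
        try:
            step()
        except Exception as exc:  # record the failure and keep going
            logging.error("step %s failed: %s", step.__name__, exc)
            failures.append(step.__name__)
    return failures

def load_customers():    # placeholder for one SQL container
    pass

def load_sales():        # placeholder that fails mid-run
    raise RuntimeError("constraint violation")

def load_gst_report():   # still runs after the failure above
    pass

failures = run_with_completion_precedence(
    [load_customers, load_sales, load_gst_report]
)
print(failures)  # → ['load_sales']
```

With "Success" semantics the run would stop at `load_sales`; with "Completion" semantics the failed step is merely recorded for review after the load.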
Implications for the business: the entire process runs smoothly over large data loads, and error handling becomes easier. Resources can check for errors in the morning, after the overnight load completes.
Bypassing failures during SQL execution: for this, the ETL team used failure tracking at the dataflow level for both data insertion and data processing.
An example is attached herewith:
- Error handling during data processing.
Under a particular condition (attached herewith), the code stopped multiple times.
At the transformation level, the ETL team had three options:
- Ignore failure
- Redirect the row
- Fail Component.
The ETL team decided to redirect the rows on each failure into an audit table. This audit table was available to the end-client managers for later analysis. This feature ensures that if an error occurred during the night, the code would not stop across multiple end users.
Similarly, errors could occur during data insertion and data updates; such rows were redirected at the ETL level.
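The redirect-row behaviour can be sketched as follows. This is a simplified Python illustration, not the SSIS component itself: `redirect_bad_rows`, the sample rows, and the `transform` are all hypothetical, and the in-memory `audit` list stands in for the audit table described above.

```python
def redirect_bad_rows(rows, transform):
    """Apply `transform` to each row; rows that fail are redirected to
    an audit list (the SSIS 'Redirect row' option) instead of stopping
    the whole load."""
    good, audit = [], []
    for row in rows:
        try:
            good.append(transform(row))
        except Exception as exc:
            audit.append({"row": row, "error": str(exc)})
    return good, audit

# One deliberately malformed row: 'n/a' cannot be converted to a number.
rows = [{"amount": "100"}, {"amount": "n/a"}, {"amount": "250"}]
good, audit = redirect_bad_rows(rows, lambda r: {"amount": float(r["amount"])})
print(len(good), len(audit))  # → 2 1
```

The load finishes with two good rows, while the malformed row waits in the audit store for a manager to inspect later, instead of aborting the overnight run.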
- Saving on execution times: for this, the ETL team adopted a four-pronged strategy.
o Identifying processes that could be run in parallel. For example, in the SQL code the fact loads for item sales order, item journal fact and others were carried out sequentially; the ETL team decided to execute them in parallel.
o Minimizing the number of columns to be processed: the extraction module processed the minimum number of columns and the minimum number of rows.
o Capturing the error-prone data before processing it as nil-rated data.
o Dropping the indexes on a dimension table before the data load and recreating them after the load.
For more information on how to do any of the following:
- Design optimum GST reports
- Catch error-prone data before processing
- Design scalable data warehouses for SAP, Navision and Tally, on premises and in the cloud
Contact:
Apoorv Chaturvedi
Phone: +91-8802466356
SSIS module and blog prepared by Aprajita Kumari:
LinkedIn profile: https://www.linkedin.com/in/aprajita-kumari-81635548/