Turbodata-SQL Problem Statement
· Ever-increasing data volumes at the end clients.
· Serial execution of the queries: if one query failed, the entire process failed. The process was error-prone, and manual resources were required to restart it.
· Error logging relied on temporary tables, so no intermediate error data was available.
· Nightly process in SQL; client interface in C#/.NET.
· The product had to be made scalable: a large number of concurrent end users along with audit logs, error logs, and data loading. The product had to support many different kinds of reports (GST, inventory, and ledger) for the end clients.
· A scalable module with the required security had to be developed.
Turbodata Solution
· Crash the execution time: migrate from SQL to SSIS to cut load times.
· Handle errors separately.
· Bypass the bottleneck processes: store historical data in the data warehouse during the fiscal day.
· Extensive error-mapping and logging systems.
Changes in the end-client systems:
Business Benefits
Ø Meet the SLA for the management
Ø Reduce execution times: parallel execution
Ø Remove error-prone data
Ø Keep error data in staging tables
Ø Restart the process from the point where the error occurred: checkpoints and breakpoints
Change in Turbodata-SQL to SSIS
The ETL team adopted the following methodology for converting the SQL code to SSIS code:
· Conversion of CTEs into containers.
· All joins converted into SSIS transforms.
· Redirect the error rows for mappings and table inserts.
· Drop the indexes before the data load and recreate them after the data load (see the sketch after this list).
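A minimal T-SQL sketch of the last point, assuming a hypothetical fact table dbo.FactLedger and index IX_FactLedger_Date (neither name is from the actual Turbodata schema):

-- Hypothetical names; the real Turbodata tables and indexes differ.
IF EXISTS (SELECT 1 FROM sys.indexes
           WHERE name = 'IX_FactLedger_Date'
             AND object_id = OBJECT_ID('dbo.FactLedger'))
    DROP INDEX IX_FactLedger_Date ON dbo.FactLedger;

-- The bulk load runs here (in SSIS, the data flow task's destination).

-- Recreate the index once the load has finished.
CREATE NONCLUSTERED INDEX IX_FactLedger_Date
    ON dbo.FactLedger (TransactionDate);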
Turbodata Conversion Experience
For linking one Execute Package task to another, SSIS offers three precedence-constraint options (Success, Failure, Completion). Hence the precedence constraints were marked as 'Completion' and not 'Success'. The reason for this choice is that the process should keep running smoothly even if an upstream task raises an error.
Improved Metadata Handling
Error history: in the initial Turbodata product, we were not able to track the details of errors at the transform level due to the usage of temporary tables. With temporary tables, the ValidateExternalMetadata property has to be set to False, whereas for reliable design-time validation it should be True. Hence the ETL team uses permanent staging tables instead of temporary tables, along the lines of the sketch below.
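A minimal sketch of such a permanent staging table, with illustrative names and columns (ErrorCode and ErrorColumn mirror the columns SSIS adds on an error output):

-- Illustrative schema; not the actual Turbodata staging table.
IF OBJECT_ID('dbo.stg_LedgerLoad', 'U') IS NULL
    CREATE TABLE dbo.stg_LedgerLoad
    (
        LoadId      INT IDENTITY(1,1) PRIMARY KEY,
        VoucherNo   VARCHAR(50),
        RawAmount   VARCHAR(50),
        ErrorCode   INT NULL,      -- populated from the SSIS error output
        ErrorColumn INT NULL,
        LoadedAtUtc DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    );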
Better Error Handling
The ETL team added the breakpoints at the container level.
*Note: a number of actions are possible on a breakpoint error, such as sending an email or inserting details into transaction logs.
Error Handling: Dataflow
Some transforms can generate errors, and for each error we can fail the component, ignore the error, or redirect the row. Rows were redirected to ensure a smooth dataflow run during the execution process, giving quicker execution times. A set-based analogue is sketched below.
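A minimal T-SQL sketch of the same idea, assuming the staging table above plus hypothetical destination and error tables (this is an analogue of the SSIS 'Redirect row' disposition, not the package itself):

-- All object names are illustrative.
CREATE TABLE dbo.err_LedgerLoad
(
    VoucherNo   VARCHAR(50),
    RawAmount   VARCHAR(50),
    ErrorReason VARCHAR(200)
);

-- Good rows flow to the destination...
INSERT INTO dbo.FactLedger (VoucherNo, Amount)
SELECT VoucherNo, TRY_CONVERT(DECIMAL(18,2), RawAmount)
FROM   dbo.stg_LedgerLoad
WHERE  TRY_CONVERT(DECIMAL(18,2), RawAmount) IS NOT NULL;

-- ...bad rows are redirected, and the run continues.
INSERT INTO dbo.err_LedgerLoad (VoucherNo, RawAmount, ErrorReason)
SELECT VoucherNo, RawAmount, 'Amount not convertible to DECIMAL(18,2)'
FROM   dbo.stg_LedgerLoad
WHERE  TRY_CONVERT(DECIMAL(18,2), RawAmount) IS NULL;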
Error Handling: Control Flow Level
*Note: on the container (at the control flow level), TransactionOption should be set to Required, so that if any child component fails, the whole transaction is rolled back.
On the child components of the container, TransactionOption should be set to Supported, so that a failure rolls back all the work done inside the same container's transaction.
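The T-SQL equivalent of this rollback behaviour, with hypothetical child procedures standing in for the container's child tasks:

BEGIN TRY
    BEGIN TRANSACTION;                 -- the container's Required transaction

    EXEC dbo.usp_LoadDimensions;       -- hypothetical child step (Supported)
    EXEC dbo.usp_LoadFacts;            -- hypothetical child step (Supported)

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;          -- any child failure undoes everything
    THROW;
END CATCH;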
Error Handling: Control Flow Level
FailParentOnFailure: True (applied when checkpoints are enabled). It indicates whether the parent of an executable fails when the child executable fails.
*The ETL team used the checkpoints to re-run the jobs from the point where the error had occurred.
*This reduces the process re-running time; a sketch of the restart idea follows.
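A minimal sketch of that restart behaviour in T-SQL terms (SSIS itself does this through the SaveCheckpoints and CheckpointFileName package properties; the control table and step names here are illustrative):

-- Illustrative control table recording the last completed step.
IF OBJECT_ID('dbo.etl_Checkpoint', 'U') IS NULL
    CREATE TABLE dbo.etl_Checkpoint
    (
        StepName     SYSNAME PRIMARY KEY,
        CompletedUtc DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    );

-- On re-run, a step executes only if it has not already completed.
IF NOT EXISTS (SELECT 1 FROM dbo.etl_Checkpoint WHERE StepName = 'LoadFacts')
BEGIN
    EXEC dbo.usp_LoadFacts;   -- hypothetical step
    INSERT INTO dbo.etl_Checkpoint (StepName) VALUES ('LoadFacts');
END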
Optimize the Job Run
For job optimization, the RepeatableRead isolation level was applied on the development side and ReadUncommitted on the production side, as shown below.
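The corresponding T-SQL settings (SSIS exposes the same choices through the IsolationLevel property):

-- Development: repeatable reads while investigating data issues.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Production: dirty reads allowed, so the nightly load is not blocked
-- by concurrent readers.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;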
RunInOptimizedMode: True (applied on the data flow task). Unused columns are not put into the ETL tool's buffer, giving a lower load on the system and quicker query execution.
Log Settings (logging enabled at the container level and 'Use Parent Setting' at the dataflow level; capture execution details at the lowest level possible).
Know the number of rows loaded in the SQL load: better logging features. A sketch of row-count logging follows.
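A minimal sketch of row-count logging in T-SQL, with an illustrative audit table (SSIS can capture the same figure with a Row Count transform):

-- Illustrative audit table.
IF OBJECT_ID('dbo.etl_LoadLog', 'U') IS NULL
    CREATE TABLE dbo.etl_LoadLog
    (
        StepName   SYSNAME,
        RowsLoaded INT,
        LoggedUtc  DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    );

INSERT INTO dbo.FactLedger (VoucherNo, Amount)
SELECT VoucherNo, TRY_CONVERT(DECIMAL(18,2), RawAmount)
FROM   dbo.stg_LedgerLoad;

-- @@ROWCOUNT must be read immediately after the statement it measures.
INSERT INTO dbo.etl_LoadLog (StepName, RowsLoaded)
VALUES ('LoadFactLedger', @@ROWCOUNT);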
Benefits for Turbodata
Key Result Areas for End Client
Steps for Implementation: End Client
Prepared by:
Aprajita Kumari
LinkedIn details: https://www.linkedin.com/in/aprajita-kumari-81635548/
Website: www.mnnbi.com
Phone: +91-8802466356