Tuesday, 11 June 2024

Deployment of SSIS package

 


This note addresses the following issues with regard to the deployment of SSIS packages.

·         Migrate the packages from one environment to another: development to integration to production.

·         Code the job dependency so that once one job completes, the next job begins (see the sketch after this list).

·         Handle multiple database connections across different environments.

·         Handle errors during execution of the jobs.
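One way to code such a job dependency at the SQL Agent level is to start the second job from the final step of the first job; a minimal sketch, with a hypothetical job name, not necessarily how the Turbodata team implemented it:

-- Final step of the first SQL Agent job: kick off the dependent job.
-- msdb.dbo.sp_start_job is the standard Agent procedure; the job name is hypothetical.
EXEC msdb.dbo.sp_start_job @job_name = N'SSIS_Load_Fact_Tables';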

Attached is the flow of steps for the deployment of the package.

1.       Build the package.

2.       Convert the project to the package deployment model.

3.       Click on ‘Properties’ and set the ‘CreateDeploymentUtility’ property to ‘True’.

4.       Build again to create the manifest file.

5.       Run the Package Installation Wizard.

6.       Click ‘Next’ and select ‘File system deployment’.

7.       Click ‘Next’ and select the path where you want to deploy your package.

8.       Click ‘Next’ and then ‘Finish’; your deployable package is ready.
Attached are the other blog links:


Website: www.mnnbi.com

















Conversion of Nil rated GST report from SQL to SSIS

 



The necessity of converting the SQL code to SSIS code arose from the following requirements.
1.      Design a scalable module across various ERPs. For this, the ETL team segregated the various modules using parent and child packages, as shown below.




                A hierarchy of packages was built within the SSIS data flow.




·         The need to design an error-proof reporting system across a large number of extraction systems. Say an end client has customers on Tally, SAP, Navision, etc.; the end client can then use the Turbodata GST module to run the GST reports for all of those customers simultaneously.

·         The need to rerun the jobs when an error occurs. Say we are running a particular set of stored procedures.
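For illustration, a minimal T-SQL sketch of such a serial chain (the procedure names are hypothetical, not the actual Turbodata procedures):

-- Hypothetical serial chain of stored procedures for the nightly GST load.
-- If any intermediate procedure raises an error, the run fails and the
-- remaining steps typically have to be restarted manually.
EXEC dbo.load_stg_daybook;
EXEC dbo.load_dim_ledger;
EXEC dbo.load_fact_sales;
EXEC dbo.build_gst_nil_rated_report;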


The above set of procedures shall fail if any of the intermediary procedures fail.

In case of a large-scale deployment of consolidated GST solutions, the above is a bottleneck. To get around it, the ETL team used SSIS to achieve the following.
  •       The entire set of SQL code was embedded within containers. Each container's execution was sequenced to depend on the prior container (completion, not success). A sample is shown below:



Implications for the business: the entire process runs smoothly over large data loads, error handling becomes easier, and resources can check for errors in the morning, after the nightly load has completed.

Bypassing the failure problem during SQL execution.
For this, the ETL team used failure tracking at the dataflow level for both data insertion and data processing.
An example is attached herewith:
·         Error handling during data processing.
Under a particular condition, shown in the attached example, the code was stopping multiple times.



At the transformation level, the ETL team had three options:

  •         Ignore failure
  •         Redirect the row
  •         Fail the component

The ETL team decided to redirect the rows on each failure into the audit table. This audit table was available to the end client's managers for analysis at a later date. This feature ensures that if an error happens during the night, the code does not stop across multiple end users.

Similarly, errors could occur during data insertion and data updates; such rows were redirected at the ETL level into the audit table.
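A minimal sketch of what such an audit table could look like (the table and column names are hypothetical; ErrorCode and ErrorColumn correspond to the columns SSIS adds on an error output):

-- Hypothetical audit table for rows redirected from SSIS error outputs.
CREATE TABLE dbo.audit_error_rows
(
    audit_id     INT IDENTITY(1, 1) PRIMARY KEY,
    package_name NVARCHAR(128),
    source_table NVARCHAR(128),
    error_code   INT,            -- ErrorCode value from the SSIS error output
    error_column INT,            -- ErrorColumn (lineage id) from the SSIS error output
    row_payload  NVARCHAR(MAX),  -- the offending row, serialized for later analysis
    load_ts      DATETIME2 DEFAULT SYSDATETIME()
);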





·         Saving on execution times: for this, the ETL team adopted the following strategy.

o   Identifying the processes that could be run in parallel. For example, in the SQL code the fact loads for item sales order, item journal and others were carried out sequentially; the ETL team decided to execute them in parallel.



o   Minimizing the number of columns to be processed: the extraction module processed only the minimum number of columns and rows required.

o   Capturing the error-prone data before processing it as nil-rated data.
o   Dropping the indexes on a dimension table before the data load and recreating them after the load, as sketched below.
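A minimal T-SQL sketch of the drop-and-recreate-index pattern (the table and index names are hypothetical; DROP INDEX IF EXISTS assumes SQL Server 2016 or later):

-- Drop the nonclustered index before the dimension load ...
DROP INDEX IF EXISTS ix_dim_customer_name ON dbo.dim_customer;

-- ... the bulk load into dbo.dim_customer runs here (e.g. the SSIS data flow) ...

-- ... and recreate the index once the load has finished.
CREATE NONCLUSTERED INDEX ix_dim_customer_name ON dbo.dim_customer (customer_name);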



For more information on how to do any of the following:
  •          Design optimum GST reports
  •          Catch error-prone data before processing
  •          Design scalable data warehouses for SAP, Navision and Tally, on premises and on the cloud

Contact:
Apoorv Chaturvedi
Phone: +91-8802466356
website: www.mnnbi.com

SSIS module and blog prepared by Aprajita Kumari:

Crashing the Load Times using Turbodata

 

Turbodata-SQL Problem Statement
·        Ever-increasing data volumes at the end clients.
·        Serial execution of the queries: if one query fails, the entire process fails; the process has errors and manual resources are required to restart it.
·        Error logging: usage of temporary tables, so no intermediate error data is available.
·        Nightly process in SQL; client interface in C#/.NET.
·        The product had to be made scalable: a large number of end users along with audit logs, error logs and data loading, and support for many different kinds of reports (GST, inventory and ledger) for the end clients.
·        A scalable module with the required security had to be developed.

Turbo Data Solution
·        Crash the execution time: migrate the load from SQL to SSIS.
·        Handle errors separately.
·        Bypass the bottleneck processes: store historical data in the data warehouse during the fiscal day.
·        Extensive error mapping and logging systems.

Change in the end client systems:





Business Benefits
Ø Meet the SLA for the management
Ø Reduce execution times: parallel execution
Ø Remove error prone data
Ø Keep error data in staging tables
Ø Restart the process from the place where the error occurred: checkpoints and breakpoints


Change in Turbodata-SQL to SSIS

The ETL team adopted the following methodology for converting the SQL code to SSIS code:
·         Conversion of each CTE into a container
·         All the joins converted into SSIS transforms
·         Redirection of error rows for both mappings and table inserts
·         Dropping the indexes before the data load and recreating them after the data load



TurboData Conversion Experience



For linking one Execute Package Task to another, there are three precedence constraint options (Success, Failure, Completion). The precedence constraints were marked as ‘Completion’ and not ‘Success’. The reason for this is that the process should keep running smoothly even if there is an error.


Improved Meta Data Handling


Error history: in the initial Turbodata product, we were not able to track the details of errors at the transform level because of the use of temporary tables. For temporary tables, ValidateExternalMetadata has to be set to False, whereas it should be True; hence the ETL team uses permanent staging tables instead of temporary tables.
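A minimal sketch of the substitution (stg_daybook_temp is a staging table named elsewhere in this blog; the columns shown here are hypothetical):

-- Instead of a session-scoped temporary table (e.g. CREATE TABLE #stg_daybook ...),
-- a permanent staging table is created once and truncated before each load, so that
-- SSIS can validate its metadata and the error history can be inspected afterwards.
CREATE TABLE dbo.stg_daybook_temp
(
    voucher_no   NVARCHAR(50),
    voucher_date DATE,
    ledger_name  NVARCHAR(255),
    amount       DECIMAL(18, 2),
    load_ts      DATETIME2 DEFAULT SYSDATETIME()
);

TRUNCATE TABLE dbo.stg_daybook_temp;  -- reset at the start of each load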

Better Error Handling



The ETL team added breakpoints at the container level.
Note: a number of actions are possible when a breakpoint is hit, such as sending an email or inserting details into transaction logs.


Error Handling: Dataflow


A transform can generate errors, and we can store the error rows, ignore them, or fail the component. Rows were redirected to ensure the data flow ran smoothly during execution, giving a quicker execution time.

Error Handling: Control Flow Level
*Note: for the container (at the control flow level), TransactionOption should be Required, so that if a child component fails, the whole transaction is rolled back.
For the child components of the container, TransactionOption should be Supported, so that if one fails, all the work of the same container is rolled back.
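As a rough T-SQL analogy of this rollback behaviour (in SSIS it is configured through TransactionOption rather than explicit transactions; the procedure names are hypothetical):

-- Container with TransactionOption = Required: everything inside is one unit of work.
BEGIN TRY
    BEGIN TRANSACTION;
    EXEC dbo.load_dim_ledger;   -- child components with TransactionOption = Supported
    EXEC dbo.load_fact_sales;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;   -- one child failing rolls back the whole container's work
    THROW;
END CATCH;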
Error Handling: Control Flow Level


FailParentOnFailure: True (applied when checkpoints are enabled). This indicates whether the parent executable fails when a child executable fails.

Rerunning the job from the error point:

*The ETL team used checkpoints to rerun the jobs from the point where the error had occurred.

*This reduces the time needed to rerun the process.

Optimize the Job Run


For job optimization, the isolation level was set to Repeatable on the development side and ReadUncommitted on the production side, as sketched below.
RunOptimizedMode: True (applied on the Data Flow Task).
Unused columns are not put into the ETL tool's buffer (lower load on the system, quicker query execution).
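For reference, the plain T-SQL equivalents of those isolation settings (a sketch; stg_daybook is a staging table named elsewhere in this blog):

-- Development side: repeatable read.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Production side (querying during or after the nightly load): read uncommitted.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) AS loaded_rows FROM dbo.stg_daybook;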


Log settings: logging mode enabled at the container level and ‘Use Parent Settings’ at the dataflow level, so that execution details are captured at the lowest level possible.

Know the number of rows loaded in each SQL load: better logging features (a sketch follows).
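A minimal sketch of logging the number of rows loaded (the log table, target table and column names are hypothetical):

-- Capture @@ROWCOUNT immediately after the insert into the target table
-- and record it in a hypothetical ETL log table.
DECLARE @rows INT;

INSERT INTO dbo.fact_sales (voucher_no, amount)
SELECT voucher_no, amount
FROM dbo.stg_daybook;

SET @rows = @@ROWCOUNT;

INSERT INTO dbo.etl_load_log (package_name, table_name, rows_loaded, load_ts)
VALUES (N'FactSalesLoad', N'dbo.fact_sales', @rows, SYSDATETIME());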

Benefits for Turbodata
Key Result Areas for End Client

Steps for implementation-End Client



Prepared by:

Aprajita Kumari
Contact Details:
Website: www.mnnbi.com
Phone: +91-8802466356

Testing methodology for Turbodata using SSIS

 

Testing experience-summary

The attached blog indicates the comprehensive testing methodology adopted by the Turbodata team for GST and audit reporting purposes.






1.       Business requirements and understanding: the resource understood the business logic requirements for the GST reports. Each measure and value was specified with the end client (GSP partner).
2.       Test planning and estimation: for this, the ETL team did the following:
a.       Arranged for sample Tally data from the end client.
b.      Looked at the final GST reports to be developed at the ETL end. The numbers from the source system GST reports were matched with the data warehouse GST reports.
3.       Designing test cases and preparing the test plan: for auditing the numbers, the ETL team adopted the following methodology:
a.       The testing was done for the entire fiscal year and fiscal month (the granularity of GST reporting is the fiscal year and fiscal month).
b.      The testing was done across multiple companies.
c.       Scenarios were developed for GST reports based on ledger entries and voucher type entries; for example, how should credit notes be handled, and how should sales invoices be handled?
d.      The specifications were signed off by the end client (GSP partner).
4.       Test execution with bug closure and reporting:
·         The ETL process was run across various fiscal years and fiscal months. For each fiscal year and fiscal month, the variances with regard to the source ERP reporting module were reported to the developers.
·         Finding the bug: the business logic because of which the error was happening was deduced and the details given to the developer.
5.       Summary report and result analysis: the final details were given to the ETL team based on the data audit numbers.


The types of testing done were as follows:
·         Incremental testing: when the jobs were migrated from development to production, the output of the incremental data was checked. For this, the ETL team used merge joins and lookup transforms; the non-matching entries from the lookup were stored in separate tables (see the SQL sketch after this list).

·         Data quality testing: the dirty data was checked during the data load. For example, between stg_daybook_temp and stg_daybook the dirty data was cleansed, and a log of the same has been kept for audit purposes. For matching, the ETL team used:
o   Fuzzy lookup
·         Data transformation testing: The business rules were validated with the end client. Thereafter the business logic was tested using SQL and SSIS coding. The features in SSIS used for the same were as follows:
o   Redirect rows on failure
o   Used lookup for finding the unmatched rows
·         Data accuracy testing: the output numbers in the reports were matched with the Tally GST reports to check the accuracy of the output data.
·         Data completeness testing: for this, the ETL team did the following:
o   Used the @execute variable to check the row counts
o   Used the following at the dataflow level:
§  Redirect rows for transformations and target table loading: the unmatched rows were stored in staging tables for audit purposes.
·         At the sequence container level, the ETL team used the following settings:




o   Control flow:
§  Transaction level: Required
§  Isolation level: Serializable (where possible), to enable rollback of transactions. For a nightly load run after the day's processing is completed, the ETL team can use ReadUncommitted.
§  Enable logging
§  Enable the usage of the @execute variable
§  Usage of checkpoints and breakpoints for error handling


o   Dataflow:
§  Transaction level: Supported
§  Isolation level: ReadCommitted (where possible)
§  Enable logging (use parent settings)
§  Enable the usage of the @execute variable
§  Error logging: use parent settings
§  FailParentOnFailure: True
·         Metadata testing: for this, the ETL team used the following:
o   Usage of log tables
o   Used the data viewer during the testing process

For metadata testing, the Data Profiling task can also be used.

·         Application upgrades: this entailed migration across multiple end clients. For this, the ETL team tested with the following:
o   Multiple package configurations
o   Multiple project deployment configurations, using manifest files
·         GUI/navigation testing: the final reports were tested for report reload and report refresh times.
·         Source to target testing (validation testing): for this, the filter parameters were checked during extraction to the staging area. The following method was adopted:
o   The extractor C# code was checked for the required data columns.
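The SQL equivalent of capturing the non-matching (lookup ‘no match’) rows mentioned under incremental testing above, sketched with hypothetical target and audit table names:

-- Rows present in staging but absent from the target: the same set a lookup
-- transform sends to its 'no match' output.
INSERT INTO dbo.audit_unmatched_rows (voucher_no, voucher_date)
SELECT s.voucher_no, s.voucher_date
FROM dbo.stg_daybook AS s
LEFT JOIN dbo.fact_daybook AS t
       ON t.voucher_no = s.voucher_no
WHERE t.voucher_no IS NULL;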


Methodology adopted for testing: the source-to-target mappings given by the end clients were taken as the base. Thereafter, the testing process was started.

Test case scenarios:

Serial number | Mapping doc validation | M&N BI test case
1 | Verify whether the business logic was mentioned | Every measure and dimension was tested against the source mapping.
2 | Validation | 1) Null check: derived column (FINDSTRING). 2) Source and target data types must be the same: data conversion transform. 3) Validate the column names in the table against the mapping doc: checked manually.
3 | Constraints | All the target tables had primary key constraints. In the intermediary tables, the composite key constraints were checked using ROW_NUMBER().
4 | Data consistency issues | stg_daybook_temp to stg_daybook load: used fuzzy lookup, REPLACE and FINDSTRING.
5 | Completeness issues | 1) Confirm all the data is loaded: @valuevariable, log tables. 2) Check for any rejected records: dataflow level (audit tables). 3) Check that data is not truncated: SSIS warnings, output varchar(255). 4) Boundary value analysis: WIP. 5) Compare unique values of key fields between source and target: audit table (SQL NOT IN statement).
6 | Correctness issues | 1) Misspelled data: manual checking, lookup. 2) Null, unique, non-null: referential key constraints were checked.
7 | Data quality | 1) Null check: handled above (conditional split). 2) Date check: usage of parameters and table variables. The Data Profiling task can be used.
8 | Duplicate check | 1) Check the unique key and primary key constraints in all tables. 2) Sort transform: remove duplicates.
9 | Date validation | 1) Know the row creation date: insert_ts, update_ts and load_ts to be added. Table variables and parameters were used for date entry.
10 | Data validation | 1) To validate the complete data set in the source and target tables, a MINUS/EXCEPT query is the best solution: staging tables, audit tables. 2) Difference between source and target: redirect the rows.
11 | Data cleanliness | 1) Remove unnecessary columns: optimize the dataflow.
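Hedged T-SQL sketches for a few of the checks above (stg_daybook is named in this blog; fact_daybook and the column names are hypothetical):

-- Composite-key / duplicate check with ROW_NUMBER() (rows 3 and 8):
SELECT voucher_no, ledger_name, voucher_date, rn
FROM (
    SELECT voucher_no, ledger_name, voucher_date,
           ROW_NUMBER() OVER (PARTITION BY voucher_no, ledger_name, voucher_date
                              ORDER BY voucher_no) AS rn
    FROM dbo.stg_daybook
) AS d
WHERE rn > 1;   -- any row with rn > 1 violates the composite key

-- Source-to-target validation with EXCEPT, the T-SQL form of MINUS (row 10):
SELECT voucher_no FROM dbo.stg_daybook
EXCEPT
SELECT voucher_no FROM dbo.fact_daybook;   -- loaded into staging but missing from the target

-- Unique key comparison with NOT IN (row 5, item 5):
SELECT voucher_no
FROM dbo.stg_daybook
WHERE voucher_no NOT IN (SELECT voucher_no FROM dbo.fact_daybook);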




Types of ETL bugs taken into account:

Serial number | Types of ETL bugs | Details
1 | User interface bug | Extractor for Tally and SAP: interface bug testing for the from and to dates.
2 | Boundary value analysis | The maximum and minimum dates in stg_daybook_temp and stg_daybook should lie between the input dates given to the ETL extractor. Use the Data Profiling task.
3 | Equivalence class partitioning (the data type should be specific) | The HSN code should be 16 characters; if it is shorter or longer than 16 characters, an error is to be generated. Use the Data Profiling task.
4 | Input/output bugs | Related to the above.
5 | Calculation bugs | Checked manually.
6 | Load condition bugs | Do not allow multiple users; the configuration file is to be checked.
7 | Race condition bugs | Test the system with a full load; check the incremental data load.
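Sketches of how two of these checks can be expressed in T-SQL (the column names and date parameters are hypothetical; the 16-character rule is as stated in the table above):

-- Boundary value analysis: staging dates must lie between the extractor's input dates.
DECLARE @from_date DATE = '2024-04-01',   -- hypothetical extractor "from" date
        @to_date   DATE = '2025-03-31';   -- hypothetical extractor "to" date

SELECT voucher_no, voucher_date
FROM dbo.stg_daybook_temp
WHERE voucher_date NOT BETWEEN @from_date AND @to_date;

-- Equivalence class partitioning: flag HSN codes that are not exactly 16 characters long.
SELECT voucher_no, hsn_code
FROM dbo.stg_daybook_temp
WHERE LEN(hsn_code) <> 16;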




Prepared by:

Ishwar Singh
Lead ETL Developer
Linkedin profile: https://www.linkedin.com/in/ishwar-singh-034324139/

Website: www.mnnbi.com
Email address: apoorv@mnnbi.com
Phone: +91-8802466356

Initial and Incremental data Load Template by M&N Business Intelligence-SAP Data Services

  INCREMENTAL LOAD/CREATION OF DIMENSION TABLE LOGIC AT SAP DATA SERVICES END CLIENT In this particular document we shall be looking at the ...