Showing posts with label data audit. Show all posts
Showing posts with label data audit. Show all posts

Wednesday, 26 December 2018

Testing methodology for Turbodata using SSIS


Testing experience-summary

The attached blog indicates the comprehensive testing methodology adopted by Turbodata team for GST and audit  reporting purposes.   






       Business Requirements and Understanding:
The resource understood the business logic requirements for GST reports. Each measure and the value was specified with the end client(GSP partner). 
2.       Test planning and estimation: For the same the ETL team did the following steps:
a.       Arranged for sample Tally data from the end client
b.      Looked for the final GST reports to be developed at the ETL end. The numbers from the source system GST reports were matched with the data warehouse GST reports.
3.       Designing test cases and preparing test plan: for the numbers auditing purposes, the following was the methodology adopted by the ETL team.
a.       The testing was done for the entire fiscal year and fiscal month(the granularity of GST reporting is for the fiscal year and fiscal month)
b.      The testing was done across multiple companies
c.       The scenarios were developed for GST reports based on ledger entries, voucher type entries. For example how should credit notes be handled, how should sales invoice be handled?
d.      The specifications were signed off by the end client(GSP partner)
4.       Test execution with Bug closure and reporting:
·         The ETL process was run across various fiscal years and fiscal months. For each fiscal year and fiscal month, the variances with regards to the source ERP reporting module was reported to the developers
·         Finding the bug: The required business logic because of which error was happening was deducted and the details given to the developer.
1.       Summary report and result analysis: the final details were given to the ETL team based on the data audit numbers.


The types of testing that was done was as follows:
·         Incremental testing: when the jobs were migrated from development to production, the output of the incremental data was checked. For the same the ETL team used the following:
o   Used merge joins and lookup transforms for the same. The non-matching entries from lookup were stored in separate tables

·         Data quality testing: the dirty data was checked during data load. For example between stg_daybook_temp and stg_daybook the dirty data was cleansed. The log of the same has been kept for the audit purposes.
For matching the same, the ETL team used the following:
o   Fuzzy lookup
·         Data transformation testing: The business rules were validated with the end client. Thereafter the business logic was tested using SQL and SSIS coding. The features in SSIS used for the same were as follows:
o   Redirect rows on failure
o   Used lookup for finding the unmatched rows
·         Data accuracy testing: the output numbers in the reports were matched with the Tally GST reports to check the accuracy of the output data.
·         Data completeness testing: for the same, the ETL team did the following:
o   For the same, the ETL team did the following:
§  Used the @execute variable to check the row counts
§  Used the following at the dataflow level:
·         Redirect rows for transformations and target table loading: the unmatched rows were stored in staging tables for audit purposes.
·         At the sequence container level, the etl team used the following connection parameters for the following parameters:




  •                          Control flow:
§  Transaction level: required
§  Isolation level: serializable (where possible). To enable rollback of transactions. In case of nightly load after the day’s processing is completed, the etl team can used ‘transaction uncommitted’
§  Enable logging
§  Enable the usage of @executevariable
§  Usage of checkpoints and breakpoints for error handling


     o Dataflow:
§  Transaction level: supported
§  Isolation level: transaction committed (where possible).
§  Enable logging(use parent settings)
§  Enable the usage of @executevariable
§  Error logging: use parent settings
§                                                                                                                            Fail parent on failure : true  
     Metadata Testing: for the same, the ETL team used the following:
o   Usage of log tables
o   Used the data viewer during testing process

For metadata testing, Data Profiling transform can be used.

·         Application Upgrades: this entailed migration purposes across multiple end clients. For the same, the ETL team tested with the following:
o   Multiple package configurations
o   Multiple project deployment configurations: using manifest files
·         GUI/Navigation Testing: the final reports were tested for report reload and report refresh times
·         Source to Target Testing (Validation Testing): for the same, the filter parameters were checked while extraction to staging area. The following were the methods adopted for the same:
o   The extractor C# code was checked for the required data columns


Methodology adopted for testing: The source to target mappings given by end clients was taken as a base. Thereafter the testing process was started.

Test case scenarios:

Serial number
Mapping doc validation
M&N BI test case
1
Verify whether the business logic was mentioned
Every measure and dimension was tested for source mapping
2
Validation
        1.) Null: Derived column(findstring)
        2.)  Source and target data type to be same: Data conversion
      3.)     Validate the name of columns in the table against mapping doc: manually checked
.         



3
Constraints
All the target tables had primary key constraints
In the intermediary tables, the composite key constraints were checked using Row_Number().
4
Data consistency issues 
.        1.)      Stg_daybook_temp to stg_daybook load. Used fuzzy lookup, replace            and findstring for the same
5
Completeness issues
        1.        Confirm all the data is loaded: @valuevariable, log tables
        2.        Check for any rejected records: dataflow level(audit tables)
       3.        Check for data not to be truncated: warnings in SSI, output                varchar(255)
        4.        Boundary value analysis: WIP.
       5.        Compare unique value of key fields between source and target: audit table(sql not in statement used)
6
Correctness issues                     
        1.)      Data that is misspelled: manual checking, looku
        2.)      Null, unique, non null: referential key constraints were checked
7
Data quality
        1.)      Null check: handled above(conditional split)
        2.)      Date check: usage  of parameters, table variables
        Use Data profiler task
8
Duplicate check
        1.)      Check for unique key, primary key constraints in all tables
        2.)      Sort transform: remove duplicates
9
Date validation
      1.)      Know the row creation date: insert_ts, update_ts and load_ts to be added.
Used table variable and parameters for date entry

10
Data validation
1.)    To validate the complete data set in source and target table minus a query in a best solution: staging tables, audit tables
                 2.)      Difference between source and target: redirect the rows
11.
Data cleanliness
1.)    Remove unnecessary columns: optimize the dataflow




Types ETL bugs taken into account:

Serial number
Types of ETL bugs
details
1.)       
User interface bug
       1.)      Extractor of Tally and SAP: interface bug testing for and from dates
2.)
Boundary value analysis      
       1.)      The maximum and minimum dates in stg_daybook_temp and stg_daybook should be between the input dates in the ETL extractor. Use data profiler task
              3.)
Equivalence class partitioning(the data type should be specific)
HSN CODE should be 16 characters. If less or more than 16 characters then an error to be generated. Use data profiler task
              4.)
Input/output bugs
Related to above
5.)
Calculation bugs
Check physically
             6.)
Load condition bugs:
Not allow multiple users
Configuration file to be checked
              7.)
Race condition bugs
Test the system with full load. Check for incremental data load




Prepared by:

Ishwar Singh
Lead ETL Developer
Linkedin profile: https://www.linkedin.com/in/ishwar-singh-034324139/

Website: www.mnnbi.com
Email address: apoorv@mnnbi.com
Phone: +91-8802466356

Saturday, 3 March 2018

Capturing Data Entry errors for audit purposes



A number of times the end client types in wrong data into the source ERP system thereby resulting in wrong outputs and results. Junk inputs imply junk outputs.  The ETL team would recommend an auditable output from Turbodata to be used as part of the reporting purposes.  Wrong data inputs can impact the end client in one or more of the following ways:
  •        Wrong tax filing specifically in online scenario.
  •         Wrong business picture
  •         Wrong predictive analytics.
As per the Toyota ProductionSystem, bad inputs should not be processed further as it adds to the final costs.
The ETL team(my firm) has found the following errors with regards to the data entry inputs specifically with Tally ERP 9.0.  

·         Stock input has been in one godown but stock outward movement has been from other godowns:





·         Missing purchase or sales order entries resulting in negative stocks at given points in time. One cannot have negative stock balances at any point in time.



Other data input errors that we have commonly seen are as follows:

  •       Duplicate payment entries
  •      Duplicate sales entries
  •        Receipt note entries but no purchase invoice entries
  •         Payments not having the required bill reference numbers.
How to resolve the errors:
·         In an object oriented program it is difficult to catch the errors on a real time basis. The ETL team recommends using the relational databases for catching the errors. The real time extraction module for Turbodata should be used for the same.
·         Transferring the data onto the third normal database is recommended. This helps catch data duplicity based on the composite keys.
For example if an end client has made the same amount payment for a given voucher on a given fiscal date, then the same should come as part of the discrepancy report. It is possible that the end client could be correct. There is also a possibility that the payment entries have been made by 2 different resources. Further handling of the given situation is as follows:
·         If the end client desires to catch the following error then the username by which the data entries have been done shall not be added to the composite key. In such a scenario there is a discrepancy between the Turbodata ledger balance output and the Tally report. The end client to approve the discrepant entry before the data is input into the system for auditing purposes.
Using perpetual valuations for ledger and inventory instead of periodic valuations. For example if an end client relies on periodic valuations for ledger balances then a duplicate payment entry then the periodic balances at the end of the fiscal month are difficult to catch. For example if an end client has a duplicate entry of Rs. 100k(One hundred thousand  only) over a balance of say Rs. 15000k(One fifty million only).
However using the perpetual system it is easy to catch the data entry errors.

Matching the consolidated trial balances and closing stock balances at the database level with the on fly calculations at the software level.

A small story for the end user: as Yuval Harari is Sapians says that mankind is primarily driven by myths. Hence many a managers are driven by myths regarding software or the consulting companies having the right audit numbers(with the managers inputting junk numbers).
A small story from one of my favourite books(Raag Darbari by Srilal Shukla) could best illustrate the point.
The protagonist Ranganath had gone from the city to visit his relative, an aunt’s husband , in the village. During the course of the village fair, it was suggested that the group goes and sees the village temple for the local goddess. At the temple Ranganath found that the statue instead of been of a goddess was of a soldier( for a goddess he was looking for two lumps  in front and two lumps in the back). The priest asked for donations for the goddess. To this request Ranganath refused saying that the statue was not of a goddess but of a man. There was an ensuing scuffle between the villagers and Ranganath. Ranganath was eventually rescued by his cousin. On going out and meeting other people, the cousin mentioned the following:
"My cousin has come from the city and is very well read. That is why he talks like a fool."
The author has always associated himself with Ranganath.


Apoorv Chaturvedi
Email: support@mndatasolutions.com;support@turbodatatool.com
Phone:+91-8802466356














Tuesday, 16 January 2018

Excel Consolidation-spreadsheet consolidation

Excel consolidation Problem
Many a times we have seen the following:
·         MS Excel reports been developed and used since developing complex reports is very cumbersome at database level.
o   Large amount of data to be processed to develop these complex reports. Hence the end client does not desire to load the transaction system with these reports.
o   The reports are very complex.
o   The reports require data consolidation from multiple data sources to be developed.
o   The data sanctity of the reports is under question since the data can be tampered within MS Excel
o   Developing the MS Excel reports takes a lot of time.
If the end client is facing these issues then Turbodata should be able to help with the following:
·         Data consolidation services: consolidate data from multiple data sources onto a single location for developing consolidated reports.
·         Data cleansing services: cleanse the data for any bad entries.
·         Data transformation services: provide any business logic that is required for the product.
·         Business Intelligence services at lowest cost: development of complex reports using best in class Business Intelligence  tool at lowest cost(free if possible).
·         User security: each user to see its own data only.
·         Design of online forms is possible.
·         Data auditing and data profiling services are available.
Contact:
Apoorv chaturvedi
Website: www.mnnbi.com
Phone: +91-8802466356


Initial and Incremental data Load Template by M&N Business Intelligence-SAP Data Services

  INCREMENTAL LOAD/CREATION OF DIMENSION TABLE LOGIC AT SAP DATA SERVICES END CLIENT In this particular document we shall be looking at the ...