Monday, 8 October 2018

Fuzzy Lookup for replacing special characters in SSIS

Using fuzzy lookup for cleansing data using SSIS
Problem Statement : The ETL team has developed a sql code for cleansing the data from ERP such as Tally. The code entails multiple conditions for lookup of special characters. The ETL team intends to automate the lookup process having special characters. Hence it has developed an SSIS package for the same.

The original sql code has code for replacement of special characters as follows:

 



1st special character lookup where ‘amp;’ was replaced by   ‘&’ and ‘#13;#10;’ was replaced by space  



2nd special character lookup: in this the special characters within SQL such as char(9),char(11),char(12),char(13) have been replaced by blank.

The issue with the above code was that during each ETL extract across multiple companies, the ETL team had to look for various special characters because of the nature of the input data source. The ETL team desired to automate the entire process using SSIS  ( fuzzy lookup function).


 The replacement of special characters using fuzzy look up transformation which performs data cleansing task such as data correction and providing missing values. Fuzzy look up transforms return data with one or more closest matching values. It does not differentiate between close values and provides exact matching. Fuzzy look up transformation is used to correct and cleanse the data. The fuzzy look up transform takes input data from the data flow and matches with a table for the mismatch values.



The fuzzy look up transform takes the input data from data flow and provide the connection with sql server   for a  specific database to  retrieve the input table from the same database and match the column of the table  as shown in below:

In the first step, the ETL team extracted the required columns from the source table.




After retrieving the  required input columns fuzzy lookup transform uses the reference table to match the exact value between input column and look up column as shown below:



For exact matching set the value of threshold in fuzzy lookup transforms as shown in below:
The ETL team used an approximation of 65%(based on trial and error). It also set up the maximum number of matches per lookup output to be 1(no duplicate values). The special characters to be fed into the lookup transform have been attached in the fuzzy lookup ‘additional delimiters’ section.




For automating the following portion of the sql code entailing filter parameters:

The team used the conditional split as follows:

The conditional split is used in SSIS tool for condition such as IF and case statement as used in programming language and set the variables, parameters  ,predefined functions and operators .



Union all is used to combine multiple sources and produce one output without sorted the data where the order of data is not necessary. If wants to sorted the data merge join is used instead of union all transformation.

Contact details:
For resource requirements:resources@mnnbi.com
For training and consulting services: apoorv@mnnbi.com
Blogspot details: https://mndatasolutionsindia.blogspot.com/

Written by Gauri(gauri@mnnbi.com)

https://mndatasolutionsindia.blogspot.com/p/gauri.html


Thursday, 4 October 2018

Weighted average valuations-Turbodata


The given method of stock valuation matches with the Tally ‘Default’ method of stock valuation with ‘stock in hand’ logic.
The weighted average calculations by Turbodata have been calculated at the monthly level. There is an option to go at the daily level for the weighted average calculations. However in order to calculate the daily weighted average , the following are the pre requisites:
·         The quality of the data should be fine(there should be no negative stocks on any fiscal date).
·         In case the calculations have to be done at a lower level of granularity(godown, cost center etc) then the quality of data has to be good at that level of calculation.
In a similar fashion, the weighted average rate can be calculated at a quarterly and yearly level.

Logic of the weighted average calculations :
·         The ETL team calculates the cumulative sum of debit entries(purchase invoice and journal debit invoice, credit notes and debit notes. The Purchase order and sales order entries are excluded.) as of the given fiscal date. These calculations are done on a monthly basis. Within the cumulative sum of the debit entries , the sum of quantities and the sum of values(without taxes) for the items are calculated. There are some pre requisites to the calculations of the cumulative sums:
o   The voucher entries should have item purchase or item journal debit numbers. Without the given entries the ETL team does not consider the given entries
o   The cumulative process shall stop at a point for the items where the value becomes negative. This is because stocks cannot have negative values.
o   The ETL team has considered only purchase and item journal debit entries. Other debit entries such as cash payment or cash receipts have not been considered even if the item name is in  requisite vouuchers. The reason for not considering the payment and the receipt entries is that there are no quantities associated with the same.
·         The ETL team calculates the cumulative sum of credit entries(sales and journal credit )[ purchase invoice and journal debit invoice, credit notes and debit notes. The Purchase order and sales order entries are excluded.]  as of the given fiscal date. These calculations are done on a monthly basis. Within the cumulative sum of the credit  entries , the sum of quantities and the sum of values(without taxes) for the items are calculated. There are some pre requisites to the calculations of the cumulative sums:
o   The voucher entries should have item sales or item journal credit  numbers. Without the given entries the ETL team does not consider the given entries.
o   The cumulative process shall stop at a point for the items where the value becomes negative. This is because stocks cannot have negative values.
o   The ETL team has considered only sales and item journal credit entries. Other credit entries such as payment or receipts have not been considered even if the item name is in the stock item details. The reason for not considering the payment and the receipt entries is that there are no quantities associated with the same.
Difference between the Tally logic and ‘Turbodata’ logic
·         Tally takes in all the debit and credit entries without considering the voucher type logic. Turbodata stocks with only purchase, sales, journal credit and journal debit entries.
·         Turbodata does not allow the negative valuations to go through in the inventory valuations. In such a scenario the end user shall need to correct the input entries.


Are you a customer having the following issues:


Having issues with large value of  slow moving inventory
Have issues with cash flow cycles
Do not have clarity regarding product profitability




Our product Turbodata can help your firm with resolving the above issues. The product is inspired by philosophy of The Goal by Eliyahu Goldratt and Profit Beyond Measure by Thomas Johnson and Ander Brohms(please see the appendix 1 for a summary of the philosophies)

Both the philosophies imply that the end client should use the order line profitability instead of using the periodic calculations. Only then would the end client get complete visibility into its operations and profitability by customer, region etc.

What is required for determining the orderline profitability?
For determining the same the end client needs to have valuations of inventory using perpetual method instead of the periodic method.
As a case to the point, consider the following:




In the attached scenario of an item, the valuation using weighted average/FIFO has been done on periodic basis. Hence the end client looses the orderline profitability details by using the same.

However in the snapshot below using Turbodata, the weighted average calculations are done on a daily basis(as in the attached snapshot)

 

This enables the end client to calculate orderline profitability.

Issues with calculating the orderline profitability:
v  In some of the software,  negative stock is allowed.  Because of the same orderline profitability calculations might be impacted. The sample below gives the first instance of negative stock for an item.
Sample attached below:






v  The physical stock entries valued at 0(zero) value can create discrepancies in the stock valuations.
v  Data consolidation from multiple systems could be required for calculating the same.
v  Data transformation in terms of business logic of the end client needs to be done so that the required calculations come into force.

By using Turbodata, the end clients shall be able to achieve the following:
v  Go towards orderline profitability by getting an estimate of cost of goods sold based on perpetual FIFO and weighted average calculations.
v  Achieve the following activities
o   Data cleansing: clean the master data before reporting is done
o   Data profiling: find the first instance when the closing stock of an item turned negative at godown or consolidated level.
       Data analytics: have consolidated dashboards along with predictive analytics facilities at economical costs.
v  Better management of inventories: by finding the profitability of the sale of items at the orderline level for a given set of customers.
v  Prepare the data for predictive analytics and forecasting through data compression and sql reduction. The predictive analytics and forecasting is required to capture the variations from the standard values for sales. A significant variation is to be captured early so that the end client could take the corrective actions quickly.

Interested in moving towards orderline profitability:
v  Deployment of Turbodata solution(for testing sample data): USD 3000/-(USD Three thousand only)+taxes as applicable. Contact us for a sample demo
v  Buy our standard book based on Turbodata project experiences: USD 5/-(five) dollars
Please contact the following for the above for a demo
Name: Apoorv Chaturvedi
Website: www.mnnbi.com


Appendix 1

What do the above management philosophies say?
The Goal:
The Goal is inspired by the theory of constraints. This implies that there are 3 parameters that are critical for any firm:
v  Throughput: the rate at which the system generates the sales(our definition of cash sales)
v  Inventory: the input material required to convert the inputs material to final product for generating throughput.
v  Labor: The manpower required for converting inventory to throughput.
The protagonist Jonah in ‘Goal’ also insisted on standard deviations and variations to be part of the process. The variations to be detected on a close to real time basis so that any errors are caught beforehand.

Profit Beyond Measure:
Profit Beyond Measure  is inspired by the Toyota Production system. It emphasizes that the manufacturing company should function like a human body. The functional managers should account for self sustainability(standard cycle times), diversity and interdependence( the manufacturing managers need to look at the whole system like a human body and not just a single component).
The book emphasizes that there should be a reduction in inventory by reducing a changeover times at each of the working station. That is the manufacturing process should start once the customer order has come into the system. The book further looks at ‘Design to order’ by designing multiple configurable modules to offer the end clients multiple types of products.
The system emphasizes catching the errors in production cycle quickly so that there is reduced material wastage.

Sample example of inventory optimization:Inventory optimization of large trading company

For Inventory Optimisation, our top end functional consultants and partners



For inventory optimization services, consider the following:



Automated GST Filing using Turbodata and GSP Partners

Are you facing the following issues with regards to GST filing?
  • ·         Delay in filing
  • ·         Concern regarding the changing regulations from the government
  • ·         Concern regarding reconciliation: specially for customers using MS Excel upload.
  • ·         Have a manual process for GSTR filing. The manual process is prone to error
  • ·         Have high manpower costs related with GST filing.

Turbodata shall help your firm with faster, easier and more convenient GST filing.
How is Turbodata different?
  • ·         All the reports for the end client shall be developed on the cloud installation. Only a minimal extract for all the vouchers and masters shall be done from the end client location. The ETL team shall commit to usage of maximum amount of RAM for the same(say 1 GB for incremental data extract)
  • ·         The end client can do the prior change of the data. The system shall automatically take care of the same. This is enabled through incremental data load process using data normalization.
  • ·         No reports shall be developed at the client location. All the reporting work shall be done at the server location.
  • ·         Initial and incremental transaction data extract shall be done from the end client location.
  • ·         The end client need not worry about re filing the GST reports since it shall be done by the GSP partner automatically.
  • ·         The package is very easy to deploy, deliver and maintain. No high end software are required. The system can extract data from SAP, Tally and other source systems with ease.
  • ·         Dependence on MS Excel for tax filing purposes is taken away since it could result in data errors and discrepancies.
  Current system:




Why is the Turbodata system better?

Turbodata system:


·         Turbodata system is inspired by ‘The Deming Way’, ‘The Goal’ and the Toyota production system and the Inmon methodology.  In a nutshell the following are the features copied from the above systems by Turbodata:
o   No error prone data should be passed for the reporting purposes. The data needs to be cleansed, audited and consolidated before report development.
o   The processing of the transaction should be done as soon as the transaction has been fed in the source system. That is the processing should take place on a real time basis and not specifically at the end of the month. Turbodata enables this feature in the following manner:
§  Each transaction fed into the end client source system is assumed to be an order from the end client.
§  The system offers the facility for real time extract and upload(current system is manual but the data can be loaded on a daily basis by the end client go the server)
o   Once the data has been loaded onto the server, it is transferred to a normalized database(insert, update and deletes). At the data warehouse level the data cleansingdata transformationdata consolidation activities are done
o   Once the data has been cleansed at the datawarehouse level then the reports for GST are developed. In one single lot, GSTR1, GSTR2 and GSTR3 reports can be developed.
o   Turbodata is integrated with at least one GSP partner. The end client could look at other GSP partner solutions if it desires the same.
o   The deployment of the solution is very easy and convenient. For any end client the deployment should take not more than 20(twenty) minutes. Minimum installation pre requisites are required.
o   The data for the end client is stored in a datawarehouse. The end client does not need to worry about changes in the statutory requirements. Other high end services like inventory optimization and predictive analytics are possible on the cloud.

To check why should the end client consider Turbodata GST, please check the following linkage:


GST Reporting Issues:
Contact
Apoorv Chaturvedi
Phone: 8802466356
Website: www.mnnbi.com

Indicate the following:
·         ERP system/ERP systems
·         Turnover: frequency of load
·         Number of locations

 ·         Automation of GST filings(a sample with a GSP).
·         Attached is a blog for the same:


·         Blog link: http://mnnbi.net/gst/gst.aspx


·         Preliminary video: https://www.youtube.com/watch?v=-Vp7JzHCYp0&feature=youtu.be

·         Detailed demo link: https://youtu.be/d-KD3mp1jic



Detailed demo link: https://youtu.be/d-KD3mp1jic

The demo was attended by a number of CFOs from industry.

Sample problems that a datawarehouse can solve:
·         Data audit and data profiling before GST is filed-samples given in the demo.
·         Single source of truth for all online GST reports. For example if GSTR1B2B is updated then HSN summary and document summary should also be updated.

Automation of spreadsheets: client reference(a major exporter based in Gurgaon). We are now able to automate the spreadsheets due to our technology of data compression and sql reduction at lower costs. The end client could send the sample profit and loss and balance sheets to be automated along with the source system details.
Attached is the blog link for the same:


Capturing Data Entry Errors for Audit Purposes
A number of times the end client types in wrong data into the source ERP system thereby resulting in wrong outputs and results. Junk inputs imply junk outputs.  The ETL team would recommend an auditable output from Turbodata to be used as part of the reporting purposes.  Wrong data inputs can impact the end client in one or more of the following ways:
  •        Wrong tax filing specifically in online scenario.
  •         Wrong business picture
  •         Wrong predictive analytics.
As per the Toyota ProductionSystem, bad inputs should not be processed further as it adds to the final costs.
The ETL team(my firm) has found the following errors with regards to the data entry inputs specifically with Tally ERP 9.0.  

·         Stock input has been in one godown but stock outward movement has been from other godowns:





·         Missing purchase or sales order entries resulting in negative stocks at given points in time. One cannot have negative stock balances at any point in time.



Other data input errors that we have commonly seen are as follows:

  •       Duplicate payment entries
  •      Duplicate sales entries
  •        Receipt note entries but no purchase invoice entries
  •         Payments not having the required bill reference numbers.
How to resolve the errors:
·         In an object oriented program it is difficult to catch the errors on a real time basis. The ETL team recommends using the relational databases for catching the errors. The real time extraction module for Turbodata should be used for the same.
·         Transferring the data onto the third normal database is recommended. This helps catch data duplicity based on the composite keys.
For example if an end client has made the same amount payment for a given voucher on a given fiscal date, then the same should come as part of the discrepancy report. It is possible that the end client could be correct. There is also a possibility that the payment entries have been made by 2 different resources. Further handling of the given situation is as follows:
·         If the end client desires to catch the following error then the username by which the data entries have been done shall not be added to the composite key. In such a scenario there is a discrepancy between the Turbodata ledger balance output and the Tally report. The end client to approve the discrepant entry before the data is input into the system for auditing purposes.
Using perpetual valuations for ledger and inventory instead of periodic valuations. For example if an end client relies on periodic valuations for ledger balances then a duplicate payment entry then the periodic balances at the end of the fiscal month are difficult to catch. For example if an end client has a duplicate entry of Rs. 100k(One hundred thousand  only) over a balance of say Rs. 15000k(One fifty million only).
However using the perpetual system it is easy to catch the data entry errors.

Matching the consolidated trial balances and closing stock balances at the database level with the on fly calculations at the software level.

A small story for the end user: as Yuval Harari is Sapians says that mankind is primarily driven by myths. Hence many a managers are driven by myths regarding software or the consulting companies having the right audit numbers(with the managers inputting junk numbers).
A small story from one of my favourite books(Raag Darbari by Srilal Shukla) could best illustrate the point.
The protagonist Ranganath had gone from the city to visit his relative, an aunt’s husband , in the village. During the course of the village fair, it was suggested that the group goes and sees the village temple for the local goddess. At the temple Ranganath found that the statue instead of been of a goddess was of a soldier( for a goddess he was looking for two lumps  in front and two lumps in the back). The priest asked for donations for the goddess. To this request Ranganath refused saying that the statue was not of a goddess but of a man. There was an ensuing scuffle between the villagers and Ranganath. Ranganath was eventually rescued by his cousin. On going out and meeting other people, the cousin mentioned the following:
"My cousin has come from the city and is very well read. That is why he talks like a fool."
The author has always associated himself with Ranganath.


Apoorv Chaturvedi
Website: www.mnnbi.com

For getting your  data checked and data audited before filing the GST reports, consider the following:
Or fill up the contact form on the website  http://mnnbi.com/.



Initial and Incremental data Load Template by M&N Business Intelligence-SAP Data Services

  INCREMENTAL LOAD/CREATION OF DIMENSION TABLE LOGIC AT SAP DATA SERVICES END CLIENT In this particular document we shall be looking at the ...