Business requirement: M&N BI has a product
called Turbo data wherein data consolidation, data cleansing, data auditing and
data profiling for reporting purposes was been done through SQL Server
database. The limitations of the above solution were as follows:
·
Installation time of SQL server: SQL
server would take a lot of time and memory to install on the end client
machine.
·
Requirement of a database expert for the
ETL process: the data from Tally and other ERPs can be very unclean. For
cleansing the same, the ETL team required a database expert for the same.
·
Reporting flexibility: The ETL team
looked to use MS Excel for increased reporting flexibility.
·
A quick solution for standalone machine:
the product should work on standalone machines.
That is the ETL team desired to use Excel for
reporting, data auditing and data profiling purposes to give the end clients
the following benefits:
·
Reduced time of deployment
·
Reduced cost of the solution
·
Increased flexibility in
reporting(specifically for GST purposes)
·
Reduce the cost of the manpower for
Excel deployment.
|
Deployment
|
Delivery
|
Usage
|
Disposal
|
Faster
|
ü
|
ü
|
|
|
Easier
|
ü
|
ü
|
ü
|
|
Convenient
|
|
|
ü
|
|
More Fun
|
|
|
ü
|
|
Environment
Friendly
|
|
|
|
|
To develop the scalable solution, the following are
the steps to be followed:
·
Extraction of the raw data from the ERP
(Tally): The raw data needs to be cleansed, audited and profiled. Thereafter
the reports of trial balance, balance sheet, profit and loss statements and
duplicate analysis to be developed.
·
The historical version of the balance
sheet extract to be stored at the client location (through the consolidation
feature in Excel).
·
From the multiple spreadsheets that have
been extracted over periods in time, the ETL team shall develop consolidated
reports.
Extraction of the raw Tally data: for the same,
Tally extractor shall be used (get the data onto flat files using pipe
delimiter). The ETL team shall extract daybook, ledger master, item, master,
godown master data from Tally ERP 9.0.
For cleaning of the data from tally database.
The ETL team
imported the Daybook data. The given input spread sheets has various special
characters across various columns. The ETL team was required to clean those
special characters.
For example in
the case of voucher type, Voucher number one has special characters like JV\9\MAYapos;16
For removing the
same the ETL team used the following excel function.
The cleansing
function of the voucher shall incorporate cleansing of the following character
sets:
·
apos;
·
amp;
·
quot;
·
|
·
#13;#10;
·
#10;
·
#13;
·
&
=IF(ISNUMBER(SEARCH("apos;16",B2))=TRUE,REPLACE(B2,FIND("apos;16",B2),7,""),
IF(ISNUMBER(SEARCH("amp;",B2))=TRUE,REPLACE(B2,FIND("amp;",B2),7,""),IF(ISNUMBER(SEARCH("quot;",B2))=TRUE,REPLACE(B2,FIND("quot;",B2),7,""),IF(ISNUMBER(SEARCH("amount;",B2))=TRUE,REPLACE(B2,FIND("amount;",B2),7,""),B2))))
The methodology of cleansing is as follows:
·
Check whether a particular character is present in the string using
Find function in excel.
·
Code the replace function in excel to replace the specialized
character with blank.
·
Use nested if functions to check for the availability of the
characters using ISNUMBER or ISTEXT or Isnontext.
·
The multiple conditions shall be incorporated in multiple IF
statements.
This logic was applied for
voucher number, voucher typename, companyname, ledgername, godownmname
For example in
the case of Voucher typename, Voucher typename one has special characters like Journal-amp;1
For removing the
same the ETL team used the following excel function.
The cleansing
function of the voucher shall incorporate cleansing of the following character
sets:
·
apos;
·
amp;
·
quot;
·
#13;#10;
·
#10;
·
#13;
·
&
·
-amp;1
The cleansing
function of the voucher type name
=IF(ISNUMBER(SEARCH("-amp;1",G2))=TRUE,REPLACE(G2,FIND("-amp;1",G2),8,""),IF(ISNUMBER(SEARCH("quot;",G2))=TRUE,REPLACE(A2,FIND("quot;",G2),8,""),IF(ISNUMBER(SEARCH("amount;",G2))=TRUE,REPLACE(G2,FIND("amount;",G2),8,""),G2)))
The methodology of cleansing is as follows:
·
Check whether a particular character is present in the string using
Find function in excel.
·
Code the replace function in excel to replace the specialized
character with blank.
·
Use nested if functions to check for the availability of the characters
using ISNUMBER or ISTEXT or Isnontext.
·
The multiple conditions shall be incorporated in multiple IF
statements.
In the case of
Company Name, Company Name one has special characters like ZISON TECHNOLOGIES GHANA LTD Jan-2013
For removing the
same the ETL team used the following excel function.
The cleansing
function of the voucher shall incorporate cleansing of the following character
sets:
·
-
2013
·
apos;
·
amp;
·
quot;
·
Jan-2013
·
-amp;1
The cleansing
function of the Company name
=IF(ISNUMBER(SEARCH("Jan-2013",A2))=TRUE,REPLACE(A2,FIND("Jan-2013",A2),29,""),IF(ISNUMBER(SEARCH("quot;",A2))=TRUE,REPLACE(A2,FIND("quot;",A2),29,""),IF(ISNUMBER(SEARCH("amount;",A2))=TRUE,REPLACE(A2,FIND("amount;",A2),29,""),A2)))
The methodology of cleansing is as follows:
·
Check whether a particular character is present in the string using
Find function in excel.
·
Code the replace function in excel to replace the specialized
character with blank.
·
Use nested if functions to check for the availability of the
characters using ISNUMBER or ISTEXT or Isnontext.
The multiple conditions shall be incorporated in
multiple IF statements.
The cleansing
function of the Party Ledger Name shall incorporate cleansing of the following
character sets:
·
apos;
·
amp;
·
quot;
·
Jan-2013
·
&
·
-amp;1
The cleansing
function of the Party Ledger Name
=IF(ISNUMBER(SEARCH("apos;16",J5))=TRUE,REPLACE(J5,FIND("apos;16",J5),7,""),
IF(ISNUMBER(SEARCH("amp;",J5))=TRUE,REPLACE(J5,FIND("amp;",J5),7,""),IF(ISNUMBER(SEARCH("quot;",J5))=TRUE,REPLACE(J5,FIND("quot;",J5),7,""),IF(ISNUMBER(SEARCH("amount;",J5))=TRUE,REPLACE(J5,FIND("amount;",J5),7,""),J5))))
·
|
·
$@
GH? 3.88/ =
·
-3202.18
$ @ GH? 3.88/ $
·
=
-GH?
·
2613.00 $ @ GH? 4/ $ = GH? 10452.00|
The cleansing
function of the amount
=IF(ISNUMBER(SEARCH("=",
Q3))=TRUE,
REPLACE(IF(ISNUMBER(SEARCH("|",Q3))=TRUE,RIGHT(Q3,LEN(Q3)-(SEARCH("=",Q3)+5))),FIND("|",IF(ISNUMBER(SEARCH("|",Q3))=TRUE,RIGHT(Q3,LEN(Q3)-(SEARCH("=",Q3)+5)))),7,""),IF(ISNUMBER(SEARCH("|",
Q3))=TRUE,REPLACE(Q3,FIND("|", Q3),1,""),Q3))
The methodology of cleansing is as follows:
·
Check whether a particular character is present in the string using
Find function in excel.
·
Code the replace function in excel to replace the specialized
character with blank.
·
Used search function in excel for extracting the substring.
·
Use nested if functions to check for the availability of the
characters using ISNUMBER or ISTEXT or Isnontext.
·
The multiple conditions shall be incorporated in multiple IF
statements.
Next steps:
Please send the Excel spreadsheets for data cleansing
Please send the Tally data for data consolidation, data cleansing, data auditing and data profiling.
Blog prepared by :
Agrima Lohia: https://www.linkedin.com/in/agrima-lohia-77799692/
Please contact the following for additional information:
Name: Apoorv Chaturvedi
Email: apoorv@mnnbi.com
Phone: +91-8802466356