Tuesday, 21 May 2019

Documentation for Dashboards and Data Modeling for Bottling Company with sales in Africa



The end client desires to upgrade the sales reporting for its resources from the current MS Excel spreadsheets to an online system.

The requirements of the end client were as follows:
1 .)    Consolidate the data from  multiple spreadsheets for analysis over period in time.
2.)    Provide visibility for sales and expenses by route and  truck .
Step 1: the ETL  team mapped  the spreadsheets onto a database(initially MS Access) for easy data entry at the client main location for after trip and before trip details. Attached are the details for the same (appendix a)
Design of the audit report:
As a first step the ETL team replicated the audit report for the end client in Powerbi. The purpose of replicating the audit report was as follows:
1.)    Check whether the numbers are correct as per the data entry done for the end client
2.)    Consolidate all the audit numbers for the end client in one interactive framework.
The details of the replication of the audit report are in appendix b.

Development of analytics and dashboards: the ETL team  identified four(4) key kpis for analytic purposes.
·         Sales: date, truck and route
·         Returned sales-returned crates, returned full bottles, returned empty bottles: date, truck and route
·         Samples: date, truck and route
·         Expenses: date, truck, route and expense type.
The behavior of expense is as follows:
·         Aggregate by date dimension and expense type
·         Semi aggregate by route and truck master
·         Non aggregate by item dimension

The behavior of sales is as follows:
·         Aggregate by date dimension
·         Aggregate by truck and route master
·         Non aggregate by expense type
·         Aggregate by item entities.

In case of expenses analysis the analytics to be developed were to be grouped by expense type(that does not fall in direct hierarchy with route and truck hierarchies). Hence the calculated measures need to be fixed at each level of analysis by expense type( for route truck and overall analysis).
Accordingly the etl team developed the following kpis for the expense analytics:
Expense(fixed overall):
All expense: CALCULATE(SUM(after_trip_allocation[expense]),ALL(after_trip_allocation),after_trip_allocation[type_of_expense_id]>0)
NOTE: The type of expense id was stipulated to be more than 0 since without the expense id expense aggregate would not make sense.

Expense across route id and expense nature:the given measures fixes the total expense
EXPENSE_ACROSS_ROUTE_EXPENSETYPE = CALCULATE(SUM(after_trip_allocation[expense]),ALLEXCEPT(after_trip_allocation,after_trip_allocation[route_id],'type of expense'[expense_nature]), after_trip_allocation[type_of_expense_id]>0)

Percentage calculation for route:
PERCENT_ROUTE = (after_trip_allocation[EXPENSE_ACROSS_ROUTE_EXPENSETYPE]/after_trip_allocation[expense_excl_nature])*100

Expense across truck id and expense nature:
EXPENSE_ACROSS_TRUCK_EXPENSE_NATURE = CALCULATE(SUM(after_trip_allocation[expense]),ALLEXCEPT(after_trip_allocation,Truck_master[Truck_ID],'type of expense'[expense_nature]),after_trip_allocation[type_of_expense_id]>0)


Percent expense along the truck route is defined as follows:
PERCENT_TRUCK = (after_trip_allocation[EXPENSE_ACROSS_TRUCK_EXPENSE_NATURE]/after_trip_allocation[expense_excl_nature])*100

Attached is the dashboard for the expense analysis:



Calculation  of sold bottles:
Sold bottles is the difference of quantity loaded,return bottles and sample bottles.
Attached is the snapshot for the sold bottles
sold_bottles =
SUM('Before_trip_allocation'[quantity_loaded]) - SUM('after_trip_allocation_item'[Returned_bottles])-CALCULATE(sum(after_trip_allocation_item[returned_full_bottles]),after_trip_allocation_item[sample_flag]="'sample'")

Sold bottles are calculated on the basis of:
·         Vehicle no.
·         Route description
·         Date dimension
·         Item dimension(brand,volume and holding type)
Attached is the dashboard of the sold bottles:



Vehicle no and the route description  is the slicer .we can analyzed the sold bottles for the particular  vehicle and route.


Calculation  of sample bottles
Sample bottles is extracted from the sample flag.
Attached is the snapshot for the sold bottles
Sample = CALCULATE(sum(after_trip_allocation_item[returned_full_bottles]),after_trip_allocation_item[sample_flag]="'sample'")

Sample bottles are calculated on the basis of:
·         Vehicle no.
·         Route description
·         Date dimension
·         Item dimension(brand,volume and  holding type)
Attached is the dashboard of the sample bottles:



Calculation  of return bottles:
Return bottles is extracted from the sample flag.
Attached is the snapshot for the sold bottles
Returned_bottles = CALCULATE(sum(after_trip_allocation_item[returned_full_bottles]),after_trip_allocation_item[sample_flag]="'returned'")

Return bottles are calculated on the basis of:
·         Vehicle no.
·         Route description
·         Date dimension
·         Item dimension(brand,volume and holding type)
Attached is the dashboard of the return bottles:



Appendix A
LIQUOR COMPANY  MOBILITY APP

The given implementation has been done to automate the field sales collection data for Liquor Company.

As a first step, the ETL team has deployed the MS Access database on the client end for data capture of before and after truck load. That is the ETL team has devised a system wherein it captures the data when the truck is loaded by various SKUs. Also the end client inputs the data after the truck has completed its journey.
The benefit to the end client is that the ms excel spreadsheets shall be replaced by database and consolidated reporting shall be available to the end client.
Access  Implementation
The end client desired  to have a set of reports using PowerBI  and database is created in MS Access  using the following features:
·         Calculate the crates loaded on the basis of driver name,areaname,sales person.
·         Calculate the Daily routine  sales and commission  report on the basis of brand ,volume and holding types.
·         Calculate the total bottles sold after the trip.
Database model for Access interface is as follows:( the data model was developed by reverse engineering the client spreadsheets)




Attached is the explanation of the database model:

Truck  Master: The truck  master entails all the master data for the truck.It includes the vehicle no.,driver no.,helper,sales person of the truck.

Route  Master: The route master entails all the master data for the route.The route master  has one to many relation with the truck  master.That is a truck can use the various route to travel or will have the same route on multiple dates. .It includes all the information of route i.e. when the  journey started ,what was the meter  reading and how many crates are loaded.

Before trip allocation: Before trip allocation entails all the information of the item before the journey  was started i.e. how much  quantity was loaded of which brand,volume and holding type.which type of load was it,either it is for “sales” ,”promotion”or ”sample”.

After trip allocation:it entails all the information of after trip of the truck on the given route.i.e how any full bottles,emptybottles,emptycrates,breakage bottles are returned,how much fuel was consumed and the expense details .

After trip allocation item: it entails all the information of after trip.i.e how any full bottles,emptybottles,emptycrates,breakage bottles are returned,which types of material  was sold “sample”,”promotion” or “sold”.

The database is created in MS Access.There are two buttons before trip and after trip.
Attached is the snapshot :

Before trip allocation contains all the details of the truck ,when the journey was started and how much quantity was loaded of which brand ,what is the volume and the holding types.





After trip allocation:the after trip form contain the  after trip allocation detail and expense detail on the basis of vehicle no. and area name.
Attached is the snapshot:

After trip allocation form contains the duration of journey and the description of the sample flag i.e  either the item returned was “return” or “sample” on the basis of brand, volume and holding types.




Expense detail: This form contains the expense  details of the truck.



Appendix B
Attached is the explanation for the dashboard developed for replicating the audit report.





The above portion has been mapped in PowerBI as follows

Area name and Vehicle no: These 2 have been assumed to be filter parameter as follows:







Vehicle no. and area name have been used as slicers for the given dashboards.



DAX function:The sample and returned bottles are calculate by  using the dax functionality.
Attached is the snapshot:





This is the aggregate function.we use the DAX functionality for the calculation of the returned bottles


For the same the etl team generated an aggregate for ‘after_trip_allocation’
Attached is the snapshot




Cross filter sample:



In case interested in converting spreadsheets to data model, please contact the following:

Name: Apoorv Chaturvedi
Phone: +91-8802466356
email: apoorv@mnnbi.com

Blog prepared by:

Thursday, 16 May 2019

Automation of GST filing for a large cinema chain-more than 150 locations



Problem statement:
The end client has more than 156 cinema locations for which the GST data needs to be consolidated for monthly GST filing. The problems that the end client desires to solve are as follows:
·         Automation of the GST loads process.
·         Error handling and data auditing

Tasks achieved
Full load of all 156 locations:

The ETL team tried single load and 2 parallel load approach for implementation of the same.
The performance parameters of the ETL process was gauged by the following parameters:
·         Time of load
·         Error logging and status logging
·         Types of errors
·         Validation issues

Process flow: The end client has a set of stored procedures running at remote locations. The procedures are of 2(two) types: 2 parameter and 3 parameters. The ETL load accommodated both the types of stored procedures into the final solution.
The ETL team has done an initial load for 2 parameter(only the date from and date to are given as input parameters) and 3 parameter locations(only the cinema_operator,date from and date to are given as input parameters).

Attached is the entire process flow at the process task level:



In the initial load package, the initial load of 2 parameters and 3 parameters happens in parallel.
Hereafter there could be some locations that are not loaded (of 2 parameters and 3 parameters). These locations are loaded in the incremental set up.
Condition for the incremental set up is derived from the variable value:



@[user::count]>0 condition is given for add missing value in data for  the process is running  when count Value is greater then 0.

The given variable @usercount is derived from the execution of the sql transform as follows:



Mapping of the result output to the variable:



Package explanation of initial load :



3 parameters and 2 parameters are loaded in parallel in initial load package.




Explanation of two parameters load




Fetch of the maximum row_id from audit_table. The table name audit_table has all the locations with 2(two) parameter load.



Set the parameters for loop increments.
The variable @inc is set to 1.
@max variable has been set to the maximum number of rows in the audit table.





For each location a dynamic connection string is used for connecting to the remote location and extracting data from the remote location.
Fetching of connection string for dynamic location: the audit table has the connection parameters or all locations. These connection parameters are extracted one by one using the execute sql command.






Sqlcommand: =
" select Cinema_strCode code, Accessed_user id, Channel_Address ip, convert(varchar(30), DECRYPTBYPASSPHRASE('8', password)) pass, SUBSTRING(tableSynonym, 0, CHARINDEX('.', tableSynonym, 1))dbname from tblCinema_Master where Cinema_strCode in (select code from audit_table where  row_id="+(dt_wstr,4) @[User::inc]+" );"

The data is set to result set



Dynamic Location is set to the static location by setting the expression connection string








Con1:(setting up the connection string)

"Data Source="+ @[User::ip1]  +";User ID="+ @[User::id1] +";Initial Catalog="+ @[User::dbname1] +";Provider=SQLNCLI11.1"+";Password="+ @[User::pass1] +";"



Now the Sql Execute Task connection string assigned dynamic connection.


Fetching the data from remote location using dynamic connection string:


Updated Intermediate_Test






Validation rules: the end client asked the ETL team to apply validation rules to check the quality of the data. the End client had input percentages of 18,12,11,14,0 and 5. Any transactions not in the given percentage ranges were classified as having wrong data.

Rule Apply first Check (18,12,11,14,0,5)


Checked Rule Second : this rule entailed that any entry having 0% GST should not have total value less than -2 or more than 2.




Update the Intermediate location with covered date and location.




Executed the exjoinexceptiontable for apply both rules.







Truncated the intermediate tables





truncate table tblGstIntermediate;
truncate table [dbo].[AuditTable1];
truncate table [dbo].[auditTable2];



 Incremental load of two parameters; this module indicates the approach for 2 parameter incremental load.
The incremental load for 2 parameters was required since there could be some locations that have been missed out of the initial load due to the connection errors. In such a scenario the incremental load for 2 parameter locations needs to be done. 

ALTER procedure [dbo].[sp_missing_2_parameter]
as
begin
truncate table  missing_2_parameters;
--drop index idx_cinema_str on tblGst_Test11;

with cte1 as(select distinct cinema_strcode from [dbo].[tblGst_Test11])
select * into #cte1 from cte1;
create index #idx_cte1 on #cte1(cinema_strcode);


with
cte2
as( select * from [dbo].[audit_table]

where code not in (select distinct cinema_strcode from #cte1))

insert  into missing_2_parameters
select *  from cte2;

--create index idx_cinema_str on tblGst_Test11(cinema_strcode);
end;


We will find the missing location from missing_2_parameters, and we

Start loading the data from missing_2_parameters table.




Executed the sp_missing_2_parameter for load the missing location
exec sp_missing_2_parameter;
The find out the maximum missing_id  from missing_2_parameters.

select max( missing_id)  from missing_2_parameters



Setting the value of for loop parameters


For loop will iterate till the max_inc meet.
Note :
Then we have taken sequence container

Inside taken Execute SQL task (Taking Dynamic connection string one by one)


The process flow hereafter is the same as that was done for the initial 2 parameter.

The ETL team replicated the same process for 3 parameter load.

Note: this blog covers only the serial execution. The process of parallel execution(running multiple dataflows simultaneously for 2 parameter data load was work in progress).


APPENDIX
Activity
1.)    Logging : the logging mode was enabled for package data load.

1.       Set the error log for the package.

Steps:-



Set the Provider type and configure the type of data base where to store.

Then set the event to store the error points






In case of any GST automation requirements, please contact the following:
Name: Apoorv Chaturvedi
email: support@mndatasolutions.com;support@turbodatatool.com
Phone; +91-8802466356
Website: https://mn-business-intelligence-india.business.site/









Data insertion-Tally(approaches)

 Problem statement: Many softwares look to insert data into Tally from their application.This blog looks at issues and approaches for the sa...