Showing posts with label Table Calculations. Show all posts
Showing posts with label Table Calculations. Show all posts

Tuesday, 26 February 2019

Tableau implementation : case study


The end client desired to have a set of reports for a large number of end users with the following features:
  •        Web access
  •         Drill down/drill through
  •          Cascading prompts: in the location hierarchy, say the city level details
  •          Date parameters as prompts. This shall dictate the following for reporting purposes:
  •          On fly calculation of various measures such as IPH(Item per head),SPH(sales per head) and ASP(average selling price) across all levels of hierarchies.
  •           Calculation of percentages based on various level s of aggregations.
Definition of Item per head: this is defined as the division of the following base measures.
Total number of items: this is the total number of items(item skus) sold for the given fiscal date for a given cinema_hopk.
Total number of admits: This is fixed at the cinema level. This is the total number of admits for a given cinema_hopk for a given fiscal date and hour block.

·         Requirement of fixing of the admits usage at cinema_HOPK, data_appended and hour block level. This fixed value of admits of shall thereafter be used for analysis across item hierarchies .

For meeting the above requirements, the following was the approach adopted by the end client.
·         Usage of Tableau server: For development of dashboards, web access of the dashboards, securitization (to be handled in more detail hereafter).
·         Fixing of the usage of admits: This problem is called the fan trap problem wherein there are 2 entities A and B with one to many relationships and the list of measures include those from entity ‘A’. In such a scenario there shall be a higher number of counts/sums from Table A.
For handling the same, the ETL team did the following:
  •         Usage of ‘Fixed’ table calculation function along with ‘Exclude’ Table calculation function. This enables fixing the values of admits for all levels of item hierarchies.
  •          In terms of sql, the etl team did a join between the admits and the  functional table(having fnb values and number of items values). A single table after coding all the required business logic was developed. Thereafter the sum of admits was chosen from the minimum row number of the consolidated table based on cinema_hopk, hour_bracket and date. During the usage of the date parameters, the number of admits was summed up  date based on the minimum row entries.

Logic for the the duplicate values removal is as follows:

[selectMIN(bcg_aggregate_id)as id_pick,
cinema_hopk,
date_appended,
hour_block
from infoworks_bcg_aggregate

groupby
cinema_hopk,
date_appended,
hour_block)


select*into #ctef from cte1;
createindex #idx_ctef on #ctef(id_pick);

update infoworks_bcg_aggregate
set row_pick_date='s'
from #ctef a
join infoworks_bcg_aggregate b
on a.id_pick=b.bcg_aggregate_id;]

Admit calculation in Tableau:  for the same, 3 step process was followed:
  •          Pick up admit values with minimum row numbers.
  •          Sum the admits based on parameters
  •          At the report level, exclude the group by item level entities for admit.

Attached are the snapshots for admit calculation from Tableau:

Admit_final=(if [Row Pick Date]='s' THEN ([Total Admits]) else 0 END)


Adm=SUM(if [date_from_]<=[Date Appended] and [date_to_]>=[Date Appended] then  ([admit_final]) else 0 end)
Adm1=({ EXCLUDE [Item Category],[Item Sub Category 1],
[Item Sub Category 2],[Item Desc Revised]:
[adm]})


Items calculation in Tableau : For the same, the following processes are followed:
·         Choose the items based on the parameter.
·         If the date appended   is between the  range of the input  parameters  then take the total items else mark the total number of items as 0.


Attached are the snapshots for items calculation from Tableau:
Total_items=(if [date_from_]<=[Date Appended] and [date_to_]>=[Date Appended] then  ([Total Items]) else 0 end)

Gross  Value  calculation in Tableau:For the same,following process are followed
  •          Choose the  Gross value    based on the date parameter.
  •        If the date appended   is between the  range of the input  parameters  then take the gross  values else mark the total number of gross  values  as 0.

Attached are the snapshots for fnb  calculation from Tableau:
Fnb_gross=(if [date_from_]<=[Date Appended] and [date_to_]>=[Date Appended] then  ([Fnb Value]) else 0 end)
IPH 
IPH stands for item per head and its logic is: total_items/total_admit
This is a calculated measure based on FNB gross and total number of admits .

Attached are the snapshots for IPH calculation from Tableau:
·         if sum([adm1])=0 then 0 else SUM([total_items_current])/SUM([adm1]) end

ASP 
It stand average selling price and its logic is: fnb value/total Items
This is a calculated measure based on FNB gross and total number of items

Attached are the snapshots for asp calculation from Tableau:
SUM([asp_1])/SUM([asp_2])

·         Asp_1(fnb): (if [date_from_]<=[Date Appended] and [date_to_]>=[Date Appended] then  ([Fnb Value]) else 0 end)

·         Total_Item(asp_2)=(if [date_from_]<=[Date Appended] and [date_to_]>=[Date Appended] then  ([Total Items]) else 0 end)

 SPH
SPH stands for sales per head and  its logic is: fnb/Total  Admits
This is a calculated measure based on FNB gross and total number of admits. The formulae is given below.

Attached are the snapshots for SPH calculation from Tableau
      if sum([adm1])=0 then 0 else sum([fnb_value_current])/SUM([adm1]) end


IPH Contribution (%) : IPH Contribution is the percentage calculation of the IPH value.
The logic is:
 ([IPH_PER_1] / [iph_per2])
The numerator is defined as follows:
·         IPH value at level of granularity for the date range as given in the input parameters. The granularity level can be cinema_hopk, city, state, region. The given IPH can be calculated at category, sub category 1, sub category2 and item description level.

The denominator is defined as follows:
·         Sum of IPH for the given date parameters at a level of aggregation that is defined by the numerator.  
For the same the following business logic has been followed:
Numerator(item description level): denominator(item sub category 2 level)
Numerator(item sub category 2): denominator(item sub category1 level)
Numerator(item sub category 1): denominator(item  category level)
This percentage calculation in an on fly calculation(snapshot) that has been done by the ETL team.

 Attached are the snapshots for IPH calculation from Tableau:
          IPH Contribution (%) : ([IPH_PER_1] / [iph_per2]
·         IPH_PER_1= SUM ([total_items_current]/[adm1])
·         iph_per2     =WINDOW_SUM ([IPH]). In order to calculate the sum across all the IPH at the given level of granularity the ETL team used the concept of Pane down within Tableau.
·         Go to the iph_per2  select the pane(down) option.  
SPH Contribution(%):SPH Contribution is the percentage calculation of the SPH value.
The logic is:
  ([SPH_PER_1]/([SPH_PER_2]))
Attached are the snapshots for SPH calculation from Tableau:
SPH Contribution(%):  ([SPH_PER_1]/([SPH_PER_2]))
·         SPH_PER_1:= SUM([fnb_value_current]/[adm1])
·         SPH_PER_2:=WINDOW_SUM([SPH])

Business logic details : The ETL team has added the SQL logic and Tableau logic  along with the same.
The Business logic is applied inside the project is as follows

SQL end:


1.  Date is applied on the basis of the   6:00 AM to 6:01 AM

The logic is used for the concept as :


WHENdatepart(hour,(TransI_dtmDateTime ))<=6 THENDATEADD(DAY,-1,cast(TransI_dtmDateTime asDATE))
ELSEcast(TransI_dtmDateTime asDATE)END
as date_appended,

1.  Time_Slot is applied  on basis of dateTime logic as follows
(
  9AM – 12 PM
  12PM – 3PM
        3PM – 6PM
  6PM – 9PM
  Post 9PM

)

whendatepart(hour,TransI_dtmDateTime)>=9 anddatepart(hour,TransI_dtmDateTime)<=12 then'9am-12pm'
whendatepart(hour,TransI_dtmDateTime)>=12 anddatepart(hour,TransI_dtmDateTime)<=15 then'12pm-3pm'
whendatepart(hour,TransI_dtmDateTime)>=15 anddatepart(hour,TransI_dtmDateTime)<=18 then'3pm-6pm'
whendatepart(hour,TransI_dtmDateTime)>=18 anddatepart(hour,TransI_dtmDateTime)<=21 then'6pm-9pm'
else'Post 9pm'endas hour_block

Business Logic coded at Tableau end.

3.WeekDay_WeekEnd: at the Tableau end

IF DATENAME('weekday',[Date Appended])="Sunday" OR
   DATENAME('weekday',[Date Appended])="Saturday" OR
   DATENAME('weekday',[Date Appended])="Friday" 
THEN "Weekend"
ELSE "Weekday"
END





4.By cinema type(Applied on the basis of filter basis)
- Mainstream - Gold class

5. Hierarchies (Roll-down,Roll-up basis of hierarchies)

Measure business logic:

Some condition applied for the measure calculation

·         i)In Some cases on some cinemaHopk the admit count in specific HourBlock , there is missing admit In that case default value will be 0
CinemaHOPK-Admit(No value)->HourBlock Then default value will be 0.

·         ii) In Some cases on some cinemaHopk the admit count in specific HourBlock , there is missing Items in that case default value will be 0
CinemaHOPK-Item(No value)->HourBlock Then default value will be 0.


·         iii) In measure calculation condition if the admit count will be zero then  means denominator value will be zero then condition will be N/A.




Level of aggregate of objects:
·         Gross sale: aggregate and cumulative measure. That is the measure can be rolled up by all levels of item hierarchy, all level of cinema hierarchy and all levels of date dimension.
·         Item sale: aggregate and cumulative measure. That is the measure can be rolled up by all levels of item hierarchy, all level of cinema hierarchy and all levels of date dimension.
·         Number of admits: semi aggregate and snapshot. This measure can be rolled up by date dimension and location dimension. However for the item dimension, the value is fixed at cinema hopk level.


Samples of Tableau functions

Use of fixed function

We are taking sum of total admits as admit_final  and we fixed the admit_final  on the basis of hopk.
Admit_final=sum(total admits)
Admits={fixed(hopk): Admit_final}


ETL load for SAMPLE Cinemas

 

          Full load is time consuming because it loads the full data again (all records, including those that haven't changed since the last load) into the database.Incremental load means only loading the records into the warehouse that have changed (inserts, updates etc.). The advantages of incremental load are as follows:·         Lower load times for nightly load. This implies that the end client shall have more time to do report reloads.·         Lower load on the system.·         Flexibility to change the time of incremental load based on the end client requirements.·         Flexibility to add additional sources as required.

The advantage is that it reduces the amount of data being transferred from system to system, as a full load may take hours / days to complete depending on volume of data.

The approach of incremental data load for SAMPLE Cinemas:
·         For all the tables the ETL team shall load the LOAD_TS.
·         For the incremental data load, the ETL team shall compare the date_appended from the source systems with the Load_ts.
·         The ETL team shall pick up only the data from getdate()-10 for the incremental data load. This shall help capture changes for all entries that have getdate()-10 as date inputs.
·         For the inserts, updates and deletes, the ETL team shall use the merge statement. The composite keys for the same shall be cinema_hopk, date_appended, hour_block and item_desc_revised.

Other parameters to be seen during the incremental data load:
·         All hash joins shall be converted to merge joins/nested loop joins to minimize the page scans. This has been done using the following method:
o   All the intermediate data has been stored in the temporary tables in the order of the composite keys: cinema_hopk, date_appended,hour_block,item_desc_revised.
o   Indexes have been created on the temporary tables based on the order of the composite keys in which data has been stored.
·         Creation of indexes: all indexes in the target tables shall be dropped before the incremental load begins. After the incremental load ends, the indexes shall be re created.
In case the end client decides to do reporting using ‘live’ extraction then the following are the actions steps by the ETL team.
o   Creation of columnar indexes on all group by and filter parameters.
·         Error logging: For each table upload, the required error logs can be generated. The end client can have the option of stopping the process of incremental data load in case the required number of errors have been crossed during the incremental data load process.
·         Isolation levels: For the same, the ETL team shall require the end client inputs.
For minimal load on the transaction system, the ETL team would recommend transaction isolation level on ‘uncommmitted’. In case updates can happen then ‘read repeatable’, if deletes can happen then ‘serializable’.
·         Precedence constraint: For the sql execution, the precedence constraint is ‘success’. That is a procedure execution has to be successful before the next procedure is executed. No code has been written for failed execution for the sql version of the code.
·         Transaction level: The entire code shall run in one session. So the transaction level for the sql version is ‘required’.
·         Audit logs: Currently the ETL team has not coded anything for the audit logs. In case the end client desires audit logs then the temporary tables to be replaced by permanent tables. This could require re coding in SQL.
·         Checkpoints and breakpoints: No checkpoints and breakpoints have been coded in the sql code currently. For checkpoints and breakpoints, the ETL team needs an ETL tool.
·         Parallelism: It is not enabled in the current version of the sql code.

    GO
/****** Object:  StoredProcedure [dbo].[sp_master_x_load]    Script Date: 02/26/2019 10:29:14 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER procedure [dbo].[sp_master_x_load]
as
begin
--execute dbo.SP_BCGWORKS_AGGREGATE;
execute dbo.SP_consultWORKS_AGGREGATE_new;
execute dbo.sp_infoworks;
   execute dbo.sp_infoworks_bcg_aggregate;
--execute sp_infoworks_bcg_aggregate_sub_category_above;

end;


Incremental  load  in sql

We use a load_ts  known   as load timestamp to load the update the data.


WITH CTE_MAX_LOAD_TS
AS(SELECT MAX(LOAD_TS) AS MAX_LOAD_TS FROM dbo.BCG_AGGREGATE_APPENDED)

SELECT * INTO #MAX_LOAD_TS FROM CTE_MAX_LOAD_TS;







Execution plan




with cte1
as(select sum(fnb_value) as fnb_value,
sum(total_items) as total_items,
hour_block,
cinema_hopk,
item_sub_category_1,
item_sub_category_2,
item_category,
item_desc_revised,
DATE_APPENDED,
cinema_name,
country,
cinema_location,
state_strname,
region

from dbo.BCG_AGGREGATE_APPENDED a
left outer join Cinema_master_Edited  b
on a.cinema_hopk=b.hopk
WHERE DATE_APPENDED >(SELECT  MAX_LOAD_TS FROM #MAX_LOAD_TS)

group by
hour_block,
cinema_hopk,
item_sub_category_1,
item_sub_category_2,
item_category,
item_desc_revised,
DATE_APPENDED,
cinema_name,
country,
cinema_location,
state_strname,
region)

select * into MAX_LOAD_TS from cte1

order by cinema_hopk,DATE_APPENDED,hour_block;

Here we check whether the date_appended is greater than load timestamp or not?

with cte1a
as(select SUM(total_admits) as total_admits,
a.hour_block,
a.cinema_HOPK,
a.infoworks_date

from infoworks_aggregate a
WHERE infoworks_date>(SELECT  MAX_LOAD_TS FROM #MAX_LOAD_TSss)

group by
a.hour_block,
a.cinema_HOPK,
a.infoworks_date)
select * into MAX_LOAD_TS from cte1a



Business case study:

The etl  team has solved  the following problems for the end client:
1.       Identify the stock on the basis of cinema location and item description
2.       Estimate the overstock and under stock values of the items based on the IPH consumption.
3.       Estimate the trend line of consumption based on IPH(Item per head) consumption.

So the following approach was used:
1.       Fixed the iph on the basis of the cinema location and item description
2.       Categorize the item  as ‘’A”,”B”,”C” based on top 70%, 70%-90% and above 90% values respectively.
3.       Estimate the iph  of category “A”on the basis of weekday and weekend
4.       Estimate the iph of  category “A” on the basis of fiscal date across the fiscal month. 

Steps used for solving the :
1.       Drill down the iph on the basis of cinema location and item description
1.       Calculate the iph%
 For  iph %  following tableau calculation are done:
·         Iph%=  (RUNNING_SUM([iph])/[windows_sum])*100
                      Iph= (if sum([admit])=0 then 0 else SUM([total_item_current])/SUM([admit]) end)
                                         Window_sum=window_sum([iph])
                            

       3.  For the categorization of iph% following steps are used:
a.If the iph% is less than or equal to 70 then it will fall on category A.
b.If the iph% is between 70 and 90 then it will fall on category B.
c.If the iph% is greater than 90 then it will fall on category C.
Following is the snapshot from Tableau:


3.1. This is the category “A” item which have the maximum sale on the fiscal month April






1.1. When we  analyze the fig:1.1 and fig:1.2 for the iph of regular pepsi on the basis of admits and weekend/weekdays, the following  inferences can be drawn:
1.       The number of admits and the IPH fall over the course of the fiscal weeks for the month analyzed.
2.       In fig:1.1 on date 8th April(weekend) number of admits are highest stock(above monthly average) but the IPH was average (indicating possible stocking issues on the given fiscal date)
3.       In fig:1.2 on date 8th April  the iph was lower than 7th april The same needs to be explained by the end client.
So the following conclusions can be drawn:




1.       The IPH could  have  decreased because of understocking at the end client on the given fiscal dates.




Fig. 1.1.






Fig:1.2


3.3.When we analyze on the day basis the iph of the regular pepsi on 25th April the IPH is 3% instead of the average of 8% across the fiscal month(indicating possible  lack of availability of the Pepsi stock)





Fig:1.2
Trend line
Trend line is use to predict the trend of the usage.
3.4..Trend line of the iph lying on the “A” category



3.5. When we  analyze the small pepsi which belongs to the category “A” it is understock on  5th april and 13th april and on 25th april the stock was maximum.






Please contact:

Apoorv Chaturvedi
Email: support@mndatasolutions.com;support@turbodatatool.com
Phone: +91-8802466356
website : https://mn-business-intelligence-india.business.site/

















Data insertion-Tally(approaches)

 Problem statement: Many softwares look to insert data into Tally from their application.This blog looks at issues and approaches for the sa...