The end client desired a set of reports for a large number of end users with the following features:
- Web access
- Drill down/drill through
- Cascading prompts: for example, city-level details within the location hierarchy
- Date parameters as prompts, which dictate the following for reporting purposes:
- On-the-fly calculation of measures such as IPH (items per head), SPH (sales per head) and ASP (average selling price) across all levels of the hierarchies.
- Calculation of percentages based on various levels of aggregation.
Definition of items per head (IPH): this is defined as the division of the following base measures, i.e. total number of items / total number of admits.
Total number of items: the total number of items (item SKUs) sold for the given fiscal date for a given cinema_hopk.
Total number of admits: this is fixed at the cinema level. It is the total number of admits for a given cinema_hopk for a given fiscal date and hour block.
- Requirement of fixing the admits usage at the cinema_hopk, date_appended and hour block level. This fixed value of admits shall thereafter be used for analysis across the item hierarchies.
To meet the above requirements, the end client adopted the following approach:
- Usage of Tableau Server: for development of the dashboards, web access to the dashboards, and security (handled in more detail hereafter).
- Fixing the usage of admits: this is the classic fan trap problem, wherein two entities A and B have a one-to-many relationship and the list of measures includes those from entity 'A'. In such a scenario the counts/sums from table A are inflated, as illustrated in the sketch below.
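The following is a minimal T-SQL sketch of the fan trap; the table and column names here are illustrative only, not the project tables. One admits row per cinema/hour fans out over many item rows, so a plain SUM double counts the admits.

-- Hypothetical example: one admits row fans out over three item rows.
create table #admits (cinema_hopk int, hour_block varchar(10), total_admits int);
create table #items  (cinema_hopk int, hour_block varchar(10), item_desc varchar(50), total_items int);

insert into #admits values (1, '9am-12pm', 100);
insert into #items  values (1, '9am-12pm', 'Popcorn', 40),
                           (1, '9am-12pm', 'Pepsi',   30),
                           (1, '9am-12pm', 'Nachos',  20);

-- The join repeats the admits value once per item row: SUM(total_admits) = 300, not 100.
select a.cinema_hopk,
       a.hour_block,
       sum(i.total_items)  as total_items,
       sum(a.total_admits) as inflated_admits
from #admits a
join #items  i
  on i.cinema_hopk = a.cinema_hopk
 and i.hour_block  = a.hour_block
group by a.cinema_hopk, a.hour_block;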
To handle this, the ETL team did the following:
- Usage of the 'FIXED' level-of-detail expression along with the 'EXCLUDE' level-of-detail expression in Tableau. This enables fixing the value of admits at all levels of the item hierarchy.
- In terms of SQL, the ETL team joined the admits table and the fnb table (having the fnb values and the number-of-items values) and developed a single consolidated table after coding all the required business logic. Thereafter, the sum of admits was taken only from the minimum row number of the consolidated table per cinema_hopk, hour_block and date. When the date parameters are applied, the number of admits is summed per date based only on these minimum row entries.
The logic for the duplicate values removal is as follows:
with cte1 as (
select min(bcg_aggregate_id) as id_pick,
       cinema_hopk,
       date_appended,
       hour_block
from infoworks_bcg_aggregate
group by cinema_hopk,
         date_appended,
         hour_block
)
select * into #ctef from cte1;

create index idx_ctef on #ctef (id_pick);

-- Mark the single row per cinema_hopk/date_appended/hour_block whose admits should be counted.
update b
set row_pick_date = 's'
from infoworks_bcg_aggregate b
join #ctef a
  on a.id_pick = b.bcg_aggregate_id;
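Once row_pick_date has been marked, the admits can be summed without fan-trap inflation. Below is only an illustrative sketch of how the marked rows might be aggregated in SQL, assuming the consolidated infoworks_bcg_aggregate table carries total_admits and total_items columns (as implied by the Tableau fields); the date variables mirror the report's date parameters and are placeholders.

declare @date_from date = '2019-04-01';  -- illustrative parameter values
declare @date_to   date = '2019-04-30';

select cinema_hopk,
       date_appended,
       hour_block,
       -- Only the marked (minimum-row) entries contribute to admits, so each
       -- cinema/date/hour combination is counted exactly once.
       sum(case when row_pick_date = 's' then total_admits else 0 end) as admits,
       sum(total_items) as total_items
from infoworks_bcg_aggregate
where date_appended between @date_from and @date_to
group by cinema_hopk, date_appended, hour_block;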
Admit calculation in Tableau: a three-step process was followed:
- Pick up the admit values with the minimum row numbers.
- Sum the admits based on the date parameters.
- At the report level, exclude the item-level entities from the group by for admits.
Attached are the snapshots for the admit calculation from Tableau:
Admit_final = (if [Row Pick Date]='s' THEN ([Total Admits]) else 0 END)
Adm = SUM(if [date_from_]<=[Date Appended] and [date_to_]>=[Date Appended] then ([admit_final]) else 0 end)
Adm1 = ({ EXCLUDE [Item Category],[Item Sub Category 1],[Item Sub Category 2],[Item Desc Revised] : [adm] })
Items calculation in Tableau: the following process is followed:
- Choose the items based on the date parameter.
- If the date appended falls within the range of the input parameters, take the total items, else mark the total number of items as 0.
Attached are the snapshots for the items calculation from Tableau:
Total_items = (if [date_from_]<=[Date Appended] and [date_to_]>=[Date Appended] then ([Total Items]) else 0 end)
Gross value calculation in Tableau: the following process is followed:
- Choose the gross value based on the date parameter.
- If the date appended falls within the range of the input parameters, take the gross value, else mark the gross value as 0.
Attached are the snapshots for the fnb calculation from Tableau:
Fnb_gross = (if [date_from_]<=[Date Appended] and [date_to_]>=[Date Appended] then ([Fnb Value]) else 0 end)
IPH
IPH stands for items per head and its logic is: total_items / total_admits.
This is a calculated measure based on the total number of items and the total number of admits.
Attached are the snapshots for the IPH calculation from Tableau:
IPH = if sum([adm1])=0 then 0 else SUM([total_items_current])/SUM([adm1]) end
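The same guard against a zero admits denominator can be expressed at the SQL end with NULLIF. This is only an illustrative sketch of the pattern, assuming the consolidated table columns used earlier, not code from the project:

select cinema_hopk,
       date_appended,
       -- NULLIF turns a zero denominator into NULL so the division does not error;
       -- COALESCE then maps the NULL back to 0, matching the Tableau formula above.
       coalesce(sum(total_items) * 1.0
                / nullif(sum(case when row_pick_date = 's' then total_admits else 0 end), 0), 0) as iph
from infoworks_bcg_aggregate
group by cinema_hopk, date_appended;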
ASP
ASP stands for average selling price and its logic is: fnb value / total items.
This is a calculated measure based on the FNB gross value and the total number of items.
Attached are the snapshots for the ASP calculation from Tableau:
ASP = SUM([asp_1])/SUM([asp_2])
- asp_1 (fnb) = (if [date_from_]<=[Date Appended] and [date_to_]>=[Date Appended] then ([Fnb Value]) else 0 end)
- asp_2 (total items) = (if [date_from_]<=[Date Appended] and [date_to_]>=[Date Appended] then ([Total Items]) else 0 end)
SPH
SPH stands for sales per head and its logic is: fnb value / total admits.
This is a calculated measure based on the FNB gross value and the total number of admits. The formula is given below.
Attached are the snapshots for the SPH calculation from Tableau:
SPH = if sum([adm1])=0 then 0 else sum([fnb_value_current])/SUM([adm1]) end
IPH Contribution (%): IPH contribution is the percentage calculation of the IPH value.
The logic is:
([IPH_PER_1] / [iph_per2])
The numerator is defined as follows:
- IPH value at the level of granularity for the date range given in the input parameters. The granularity level can be cinema_hopk, city, state or region. The IPH can be calculated at the category, sub category 1, sub category 2 and item description level.
The denominator is defined as follows:
- Sum of IPH for the given date parameters, at the level of aggregation one step above the numerator.
The following business logic has been followed:
- Numerator (item description level) : denominator (item sub category 2 level)
- Numerator (item sub category 2) : denominator (item sub category 1 level)
- Numerator (item sub category 1) : denominator (item category level)
This percentage is an on-the-fly calculation (snapshot) done by the ETL team.
Attached are the snapshots for the IPH contribution calculation from Tableau:
IPH Contribution (%) = ([IPH_PER_1] / [iph_per2])
- IPH_PER_1 = SUM([total_items_current]/[adm1])
- iph_per2 = WINDOW_SUM([IPH]). In order to calculate the sum across all the IPH values at the given level of granularity, the ETL team used the Pane (down) compute option within Tableau.
- Go to iph_per2 and select the Pane (down) option.
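At the SQL end the same contribution percentage can be sketched with a windowed sum, where the window partition plays the role of the 'pane' in Tableau. The following T-SQL is only an illustrative analogue, assuming the consolidated table carries the item hierarchy columns shown later in the document:

with iph_by_item as (
    select item_sub_category_2,
           item_desc_revised,
           sum(total_items) * 1.0
             / nullif(sum(case when row_pick_date = 's' then total_admits else 0 end), 0) as iph
    from infoworks_bcg_aggregate
    group by item_sub_category_2, item_desc_revised
)
select item_sub_category_2,
       item_desc_revised,
       iph,
       -- Numerator: IPH of the item; denominator: sum of IPH across the parent sub category.
       iph / nullif(sum(iph) over (partition by item_sub_category_2), 0) * 100 as iph_contribution_pct
from iph_by_item;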
SPH Contribution (%): SPH contribution is the percentage calculation of the SPH value.
The logic is:
([SPH_PER_1] / [SPH_PER_2])
Attached are the snapshots for the SPH contribution calculation from Tableau:
SPH Contribution (%) = ([SPH_PER_1] / [SPH_PER_2])
- SPH_PER_1 = SUM([fnb_value_current]/[adm1])
- SPH_PER_2 = WINDOW_SUM([SPH])
Business logic details: the ETL team has added the SQL logic and the Tableau logic along with the same.
The business logic applied inside the project is as follows.
SQL end:
1. Date (date_appended) is applied on a 6:00 AM to 6:00 AM business-day basis: transactions recorded up to 6 AM are attributed to the previous calendar day.
The logic used for this is:
case when datepart(hour, TransI_dtmDateTime) <= 6
     then dateadd(day, -1, cast(TransI_dtmDateTime as date))
     else cast(TransI_dtmDateTime as date)
end as date_appended,
2. Time_Slot (hour_block) is applied on the basis of the transaction datetime, using the following buckets: 9AM-12PM, 12PM-3PM, 3PM-6PM, 6PM-9PM, Post 9PM.
case when datepart(hour, TransI_dtmDateTime) >= 9  and datepart(hour, TransI_dtmDateTime) < 12 then '9am-12pm'
     when datepart(hour, TransI_dtmDateTime) >= 12 and datepart(hour, TransI_dtmDateTime) < 15 then '12pm-3pm'
     when datepart(hour, TransI_dtmDateTime) >= 15 and datepart(hour, TransI_dtmDateTime) < 18 then '3pm-6pm'
     when datepart(hour, TransI_dtmDateTime) >= 18 and datepart(hour, TransI_dtmDateTime) < 21 then '6pm-9pm'
     else 'Post 9pm'
end as hour_block
Business logic coded at the Tableau end:
3. WeekDay_WeekEnd: at the Tableau end
IF DATENAME('weekday',[Date Appended])="Sunday"
   OR DATENAME('weekday',[Date Appended])="Saturday"
   OR DATENAME('weekday',[Date Appended])="Friday"
THEN "Weekend"
ELSE "Weekday"
END
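If the same flag were needed at the SQL end instead of in Tableau, a rough T-SQL equivalent (illustrative only, assuming a date_appended column on the consolidated table) would be:

select distinct
       date_appended,
       -- Friday, Saturday and Sunday are treated as the weekend, mirroring the Tableau rule above.
       case when datename(weekday, date_appended) in ('Friday', 'Saturday', 'Sunday')
            then 'Weekend'
            else 'Weekday'
       end as weekday_weekend
from infoworks_bcg_aggregate;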
4. By cinema type (applied as a filter):
- Mainstream
- Gold class
5. Hierarchies (roll-down/roll-up on the basis of the hierarchies)
Measure business logic:
The following conditions are applied for the measure calculations:
- i) For some cinema_hopk values the admit count is missing for a specific hour block; in that case the default value is 0.
CinemaHOPK - Admit (no value) -> HourBlock: the default value is 0.
- ii) For some cinema_hopk values the item count is missing for a specific hour block; in that case the default value is 0.
CinemaHOPK - Item (no value) -> HourBlock: the default value is 0.
- iii) In the measure calculations, if the admit count is zero (i.e. the denominator is zero), the result is shown as N/A.
Level of aggregation of the objects:
- Gross sale: an aggregate, cumulative measure. That is, the measure can be rolled up across all levels of the item hierarchy, all levels of the cinema hierarchy and all levels of the date dimension.
- Item sale: an aggregate, cumulative measure. That is, the measure can be rolled up across all levels of the item hierarchy, all levels of the cinema hierarchy and all levels of the date dimension.
- Number of admits: a semi-aggregate, snapshot measure. This measure can be rolled up by the date dimension and the location dimension. However, for the item dimension the value is fixed at the cinema_hopk level.
Samples of Tableau functions
Use of the FIXED function
We take the sum of total admits as admit_final and fix admit_final on the basis of hopk:
Admit_final = SUM([Total Admits])
Admits = { FIXED [hopk] : [Admit_final] }
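For readers more comfortable with SQL, a { FIXED } level-of-detail expression behaves roughly like a windowed aggregate over the fixed dimension. The following T-SQL sketch is only an analogy, with column names assumed as before; it is not the project code:

select cinema_hopk,
       item_desc_revised,
       total_items,
       -- Equivalent in spirit to { FIXED [hopk] : SUM([Total Admits]) }: the admits total
       -- is computed per cinema_hopk regardless of the item-level detail on the row.
       sum(case when row_pick_date = 's' then total_admits else 0 end)
           over (partition by cinema_hopk) as admits_fixed_at_hopk
from infoworks_bcg_aggregate;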
ETL load for SAMPLE Cinemas
A full load is time consuming because it loads the full data set again (all records, including those that have not changed since the last load) into the database. An incremental load means loading only the records into the warehouse that have changed (inserts, updates etc.). The advantages of an incremental load are as follows:
- Lower load times for the nightly load. This implies that the end client shall have more time to do report reloads.
- Lower load on the system.
- Flexibility to change the time of the incremental load based on the end client requirements.
- Flexibility to add additional sources as required.
The advantage is that it reduces the amount of data being transferred from system to system, as a full load may take hours or days to complete depending on the volume of data.
The approach to the incremental data load for SAMPLE Cinemas:
- For all the tables, the ETL team shall load the LOAD_TS (load timestamp).
- For the incremental data load, the ETL team shall compare the date_appended from the source systems with the Load_ts.
- The ETL team shall pick up only the data from getdate()-10 onwards for the incremental data load. This helps capture changes for all entries dated within the last 10 days.
- For the inserts, updates and deletes, the ETL team shall use the MERGE statement. The composite keys for the same shall be cinema_hopk, date_appended, hour_block and item_desc_revised (a sketch of such a MERGE follows this list).
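The following is a minimal sketch of how such a MERGE on the stated composite keys might look. The target and staging names (bcg_aggregate_target, #bcg_aggregate_stage) and the value columns are illustrative assumptions, not the project's actual object names:

merge bcg_aggregate_target as tgt
using #bcg_aggregate_stage as src
   on  tgt.cinema_hopk       = src.cinema_hopk
   and tgt.date_appended     = src.date_appended
   and tgt.hour_block        = src.hour_block
   and tgt.item_desc_revised = src.item_desc_revised
when matched then
    update set tgt.fnb_value   = src.fnb_value,
               tgt.total_items = src.total_items,
               tgt.load_ts     = getdate()
when not matched by target then
    insert (cinema_hopk, date_appended, hour_block, item_desc_revised, fnb_value, total_items, load_ts)
    values (src.cinema_hopk, src.date_appended, src.hour_block, src.item_desc_revised, src.fnb_value, src.total_items, getdate())
-- In practice the delete branch would be restricted to the incremental date window.
when not matched by source then
    delete;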
Other parameters to be considered during the incremental data load:
- All hash joins shall be converted to merge joins/nested loop joins to minimize page scans. This has been done using the following method:
  - All the intermediate data is stored in temporary tables in the order of the composite keys: cinema_hopk, date_appended, hour_block, item_desc_revised.
  - Indexes are created on the temporary tables based on the order of the composite keys in which the data has been stored.
- Creation of indexes: all indexes on the target tables shall be dropped before the incremental load begins. After the incremental load ends, the indexes shall be recreated.
In case the end client decides to do reporting using a 'live' connection, the ETL team shall take the following action:
  - Creation of columnstore indexes on all group-by and filter columns.
- Error logging: for each table upload, the required error logs can be generated. The end client has the option of stopping the incremental data load process in case the permitted number of errors is exceeded.
- Isolation levels: for this, the ETL team shall require the end client's inputs.
For minimal load on the transaction system, the ETL team would recommend the 'read uncommitted' transaction isolation level. If updates can happen during the load then 'repeatable read', and if deletes can happen then 'serializable' (see the sketch after this list).
- Precedence constraint: for the SQL execution, the precedence constraint is 'success'. That is, a procedure execution has to be successful before the next procedure is executed. No code has been written for failed execution in the SQL version of the code.
- Transaction level: the entire code shall run in one session, so the transaction level for the SQL version is 'required'.
- Audit logs: currently the ETL team has not coded anything for the audit logs. In case the end client desires audit logs, the temporary tables are to be replaced by permanent tables. This could require recoding in SQL.
- Checkpoints and breakpoints: no checkpoints or breakpoints have been coded in the SQL code currently. For checkpoints and breakpoints, the ETL team needs an ETL tool.
- Parallelism: it is not enabled in the current version of the SQL code.
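A minimal illustration of setting the recommended isolation level at the start of an extraction batch follows; the choice of level mirrors the note above, and the query itself is only a placeholder over the columns listed elsewhere in this document:

-- Read source data with minimal locking impact on the transaction system.
set transaction isolation level read uncommitted;

select cinema_hopk, date_appended, hour_block, item_desc_revised, fnb_value, total_items
from dbo.BCG_AGGREGATE_APPENDED
where DATE_APPENDED > dateadd(day, -10, getdate());

-- If source updates are possible during the load, prefer:
-- set transaction isolation level repeatable read;
-- If deletes are possible, prefer:
-- set transaction isolation level serializable;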
GO
/****** Object: StoredProcedure [dbo].[sp_master_x_load] Script Date: 02/26/2019 10:29:14 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER procedure [dbo].[sp_master_x_load]
as
begin
--execute dbo.SP_BCGWORKS_AGGREGATE;
execute dbo.SP_consultWORKS_AGGREGATE_new;
execute dbo.sp_infoworks;
execute dbo.sp_infoworks_bcg_aggregate;
--execute sp_infoworks_bcg_aggregate_sub_category_above;
end;
Incremental load in SQL
We use a load_ts (load timestamp) column to identify the data to be loaded and updated.
WITH CTE_MAX_LOAD_TS
AS (SELECT MAX(LOAD_TS) AS MAX_LOAD_TS FROM dbo.BCG_AGGREGATE_APPENDED)
SELECT * INTO #MAX_LOAD_TS FROM CTE_MAX_LOAD_TS;
Execution plan
with cte1 as (
select sum(fnb_value) as fnb_value,
sum(total_items) as total_items,
hour_block,
cinema_hopk,
item_sub_category_1,
item_sub_category_2,
item_category,
item_desc_revised,
DATE_APPENDED,
cinema_name,
country,
cinema_location,
state_strname,
region
from dbo.BCG_AGGREGATE_APPENDED a
left outer join
Cinema_master_Edited b
on a.cinema_hopk=b.hopk
WHERE DATE_APPENDED > (SELECT MAX_LOAD_TS FROM #MAX_LOAD_TS)
group by
hour_block,
cinema_hopk,
item_sub_category_1,
item_sub_category_2,
item_category,
item_desc_revised,
DATE_APPENDED,
cinema_name,
country,
cinema_location,
state_strname,
region)
select * into MAX_LOAD_TS from cte1
order by cinema_hopk,DATE_APPENDED,hour_block;
Here we check whether the date_appended is greater than the load timestamp.
with cte1a as (
select SUM(total_admits) as total_admits,
a.hour_block,
a.cinema_HOPK,
a.infoworks_date
from
infoworks_aggregate a
WHERE infoworks_date > (SELECT MAX_LOAD_TS FROM #MAX_LOAD_TS)
group by
a.hour_block,
a.cinema_HOPK,
a.infoworks_date)
select * into MAX_LOAD_TS from cte1a
The ETL team has solved the following problems for the end client:
1. Identify the stock on the basis of cinema location and item description.
2. Estimate the overstock and understock values of the items based on the IPH consumption.
3. Estimate the trend line of consumption based on IPH (items per head) consumption.
So the following approach was used:
1. Fix the IPH on the basis of the cinema location and item description.
2. Categorize the items as "A", "B", "C" based on the top 70%, 70%-90% and above-90% values respectively.
3. Estimate the IPH of category "A" on the basis of weekday and weekend.
4. Estimate the IPH of category "A" on the basis of fiscal date across the fiscal month.
Steps used for solving the above:
1. Drill down the IPH on the basis of cinema location and item description.
2. Calculate the IPH%.
For the IPH%, the following Tableau calculations are done:
- Iph% = (RUNNING_SUM([iph])/[window_sum])*100
- Iph = (if sum([admit])=0 then 0 else SUM([total_item_current])/SUM([admit]) end)
- Window_sum = WINDOW_SUM([iph])
3. For the categorization of the IPH%, the following rules are used:
a. If the cumulative IPH% is less than or equal to 70, the item falls in category A.
b. If the cumulative IPH% is between 70 and 90, the item falls in category B.
c. If the cumulative IPH% is greater than 90, the item falls in category C.
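The same running-percentage (ABC/Pareto) categorization can be sketched at the SQL end with window functions. This is only an illustrative analogue of the Tableau logic above, with column names assumed as in the earlier sketches:

with iph_by_item as (
    select item_desc_revised,
           sum(total_items) * 1.0
             / nullif(sum(case when row_pick_date = 's' then total_admits else 0 end), 0) as iph
    from infoworks_bcg_aggregate
    group by item_desc_revised
),
ranked as (
    select item_desc_revised,
           iph,
           -- Running share of total IPH, ordered from the largest contributor down.
           sum(iph) over (order by iph desc rows unbounded preceding) * 100.0
             / nullif(sum(iph) over (), 0) as cumulative_iph_pct
    from iph_by_item
)
select item_desc_revised,
       iph,
       cumulative_iph_pct,
       case when cumulative_iph_pct <= 70 then 'A'
            when cumulative_iph_pct <= 90 then 'B'
            else 'C'
       end as abc_category
from ranked;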
Following is the snapshot from Tableau:
3.1. These are the category "A" items which have the maximum sales in the fiscal month of April.
3.2. When we analyze fig 1.1 and fig 1.2 for the IPH of regular Pepsi on the basis of admits and weekend/weekdays, the following inferences can be drawn:
1. The number of admits and the IPH fall over the course of the fiscal weeks for the month analyzed.
2. In fig 1.1, on 8th April (a weekend) the number of admits was the highest (above the monthly average) but the IPH was only average, indicating possible stocking issues on the given fiscal date.
3. In fig 1.2, on 8th April the IPH was lower than on 7th April. The same needs to be explained by the end client.
So the following conclusion can be drawn:
1. The IPH could have decreased because of understocking at the end client on the given fiscal dates.
Fig. 1.1.
Fig:1.2
3.3. When we analyze the IPH of regular Pepsi on a daily basis, on 25th April the IPH is 3% instead of the average of 8% across the fiscal month, indicating a possible lack of availability of Pepsi stock.
Fig:1.2
Trend line
A trend line is used to predict the trend of the usage.
3.4. Trend line of the IPH for items in the "A" category.
3.5. When we analyze small Pepsi, which belongs to category "A", it was understocked on 5th April and 13th April, while on 25th April the stock was at its maximum.
Please contact:
Apoorv Chaturvedi
Email: support@mndatasolutions.com;support@turbodatatool.com
Phone: +91-8802466356
website : https://mn-business-intelligence-india.business.site/