The approach of the
etl team is to optimize the sql given by the end clients without doing any of
the following:
- Separating the transaction and the reporting layers
- Developing another transaction layer.
In this module we
shall analyze the joins that where used as part of the execution process.
Thereafter we shall optimize the joins by converting hash joins(no index usage)
to nested loop joins(index usage from one table) and thereafter to merge
joins(optimize index usage from both the tables).
For the process of
join optimization, the etl resource chose the stored procedure
sp_item_purchase_order from the datawarehouse load of turbodata as given under.
DECLARE @SUPPLIER INT;
SET
@SUPPLIER=(SELECT
DIM_SUPPLIER_ID FROM DIM_SUPPLIER WHERE SUPPLIER_NAME LIKE
'N/A');
UPDATE [STATUS] SET [DISPLAYTEXT]='ITEM_PURCHASE_ORDER LOAD DONE',[COMPLETED]=66;
WITH
CTE1 AS(
SELECT A.COMPANY_NAME,A.ORDER_NO,A.PARTY_LEDGER_NAME,A.ENTERED_BY,ORDER_DATE,
A.AMOUNT,A.AMOUNT_SUBSTRING,A.AMOUNT_SUBSTRING2,A.AMOUNT_SUBSTRING_AFTER$,A.CATEGORY,A.COST_CENTER,
A.VOUCHER_TYPE_NAME,A.XYZ,
ISNULL(D.DIM_DATE_ID,1) AS DIM_DATE_ID,
LEDGERNAME
FROM STG_DAYBOOK A
LEFT OUTER JOIN DIM_DATE D
ON A.ORDER_DATE=D.DATE_DESC
WHERE (A.VOUCHER_TYPE_NAME IN
(SELECT DISTINCT
VOUCHER_TYPE_NAME
FROM STG_VOUCHER_TYPE
WHERE VOUCHER_TYPE IN ('PURCHASE')))
--AND A.IS_DEEMED_POSITIVE LIKE '%YES%'
)
,
CTE2 AS (
SELECT SUM (CAST(A.AMOUNT_SUBSTRING2 AS
FLOAT)) AS AMOUNT_SUBSTRING2
,SUM(CAST(A.AMOUNT_SUBSTRING_AFTER$
AS FLOAT)) AS
AMOUNT_SUBSTRING_AFTER$
,A.ORDER_DATE, A.ENTERED_BY,
A.ORDER_NO, A.VOUCHER_TYPE_NAME, ISNULL(A.PARTY_LEDGER_NAME,'N/A') AS PARTY_LEDGER_NAME ,
ISNULL(A.LEDGERNAME,'N/A') AS LEDGERNAME,
K.ITEM_SUPPLIER_ORDER_ID,
A.DIM_DATE_ID,A.COMPANY_NAME,A.CATEGORY,
A.COST_CENTER,A.XYZ
FROM CTE1 A
LEFT OUTER JOIN ITEM_SUPPLIER_ORDER K
ON A.ORDER_NO = K.ORDER_NO
AND
A.VOUCHER_TYPE_NAME=K.VOUCHER_TYPE_NAME
AND A.DIM_DATE_ID=K.DIM_DATE_ID
AND A.COMPANY_NAME=K.COMPANY_NAME
AND A.PARTY_LEDGER_NAME=K.PARTY_LEDGER_NAME
AND
A.LEDGERNAME=K.LEDGERNAME
AND
A.COST_CENTER=K.COST_CENTER
GROUP BY
--A.DIM_ITEM_ID,
A.ORDER_DATE, A.ENTERED_BY,
A.ORDER_NO, A.VOUCHER_TYPE_NAME, K.ITEM_SUPPLIER_ORDER_ID,
ISNULL(A.PARTY_LEDGER_NAME,'N/A')
,A.COMPANY_NAME,A.DIM_DATE_ID,A.CATEGORY,
A.COST_CENTER,A.XYZ,
ISNULL(A.LEDGERNAME,'N/A')
)
,
CTE3 AS (
SELECT COUNT (DISTINCT(ORDER_NO))AS COUNT_ORDER_ID,
ITEM_SUPPLIER_ORDER_ID
FROM CTE2
GROUP BY
ITEM_SUPPLIER_ORDER_ID
),
CTE4 AS (
SELECT A.AMOUNT_SUBSTRING2,
A.AMOUNT_SUBSTRING_AFTER$
,A.ORDER_DATE, A.ENTERED_BY,
A.ORDER_NO, A.VOUCHER_TYPE_NAME, A.PARTY_LEDGER_NAME ,
A.ITEM_SUPPLIER_ORDER_ID,
A.DIM_DATE_ID,A.COMPANY_NAME,A.CATEGORY,
A.COST_CENTER,A.XYZ,A.LEDGERNAME
FROM CTE2 A
JOIN CTE3 B ON
A.ITEM_SUPPLIER_ORDER_ID=B.ITEM_SUPPLIER_ORDER_ID
WHERE B.COUNT_ORDER_ID=1
),
CTE5 AS (
SELECT a.AMOUNT_SUBSTRING2,
a.AMOUNT_SUBSTRING_AFTER$,A.ORDER_DATE, A.ENTERED_BY,
A.ORDER_NO, A.VOUCHER_TYPE_NAME, A.PARTY_LEDGER_NAME ,
A.ITEM_SUPPLIER_ORDER_ID,
--A.DIM_ITEM_ID,
A.DIM_DATE_ID,A.COMPANY_NAME,A.CATEGORY,
A.COST_CENTER,A.XYZ,A.LEDGERNAME
FROM CTE2 A
JOIN CTE3 B ON
A.ITEM_SUPPLIER_ORDER_ID=B.ITEM_SUPPLIER_ORDER_ID
WHERE B.COUNT_ORDER_ID>1
AND A.ENTERED_BY LIKE '%ADMIN%'
),
CTE6 AS (
SELECT a.AMOUNT_SUBSTRING2,
a.AMOUNT_SUBSTRING_AFTER$,A.ORDER_DATE, A.ENTERED_BY,
A.ORDER_NO, A.VOUCHER_TYPE_NAME, A.PARTY_LEDGER_NAME ,
A.ITEM_SUPPLIER_ORDER_ID,
A.DIM_DATE_ID,A.COMPANY_NAME,A.CATEGORY,
A.COST_CENTER,A.XYZ,A.LEDGERNAME
FROM CTE4 A
UNION ALL
SELECT a.AMOUNT_SUBSTRING2,
a.AMOUNT_SUBSTRING_AFTER$,A.ORDER_DATE, A.ENTERED_BY,
A.ORDER_NO, A.VOUCHER_TYPE_NAME, A.PARTY_LEDGER_NAME ,
A.ITEM_SUPPLIER_ORDER_ID,
A.DIM_DATE_ID,A.COMPANY_NAME,A.CATEGORY,
A.COST_CENTER,A.XYZ,A.LEDGERNAME
FROM CTE5 A
)
MERGE ITEM_PURCHASE_ORDER AS TARGET
USING CTE6 AS SOURCE
ON(TARGET.ITEM_SUPPLIER_ORDER_ID
= SOURCE.ITEM_SUPPLIER_ORDER_ID
AND TARGET.DIM_DATE_ID= SOURCE.DIM_DATE_ID
AND TARGET.COMPANY_NAME=SOURCE.COMPANY_NAME
--AND
TARGET.PARTY_LEDGER_NAME=SOURCE.PARTY_LEDGER_NAME
--AND TARGET.LEDGERNAME=SOURCE.LEDGERNAME
)
WHEN
NOT MATCHED BY TARGET
THEN INSERT(AMOUNT_SUBSTRING2,AMOUNT_SUBSTRING_AFTER$,
--ORDER_DATE,
ENTERED_BY,
--ORDER_NO,VOUCHER_TYPE_NAME,
--PARTY_LEDGER_NAME,
ITEM_SUPPLIER_ORDER_ID,
DIM_DATE_ID,
COMPANY_NAME
--,CATEGORY,COST_CENTER,XYZ,LEDGERNAME
)
VALUES(SOURCE.AMOUNT_SUBSTRING2,SOURCE.AMOUNT_SUBSTRING_AFTER$,
--SOURCE.ORDER_DATE ,
SOURCE.ENTERED_BY ,
--SOURCE.ORDER_NO ,
--SOURCE. VOUCHER_TYPE_NAME ,
--SOURCE. PARTY_LEDGER_NAME ,
SOURCE. ITEM_SUPPLIER_ORDER_ID ,
SOURCE. DIM_DATE_ID ,
SOURCE.COMPANY_NAME
--SOURCE.CATEGORY,
--SOURCE.COST_CENTER,
--SOURCE.XYZ,
--SOURCE.LEDGERNAME
)
WHEN
MATCHED
THEN UPDATE SET TARGET.AMOUNT_SUBSTRING2= SOURCE.AMOUNT_SUBSTRING2
,TARGET.AMOUNT_SUBSTRING_AFTER$=SOURCE.AMOUNT_SUBSTRING_AFTER$;
Thereafter the ETL
resource ran the uery execution panel to check for the joins and the index
usage. The resource identified the usage of hash joins wherever possible.
The ETL resource looked for the following:
·
Hash match cost
·
Table scan cost.
Within the hash match cost, the etl
resource looked for the following:
·
The join creating the hash
match.
·
The table size
·
The total costs including the
subtree cost.
The same operation shall be repeated for
the table scans:
Hereafter the ETL resource identified the
join to be worked upon from the given sql.
Steps to convert the hash join to nested
loop join:
- · Take the cte output and store in a temporary table in the same order as the joins.
- · Create index in the same order as joins.
Developing the temporary table for the
second table alo9ng with the indexes in the same order in which the joins have
been done.
Hereafter the ETL developer has 2 options:
- Use the join conditions
- Change the join conditions to where clause
Understanding the impact of where clause.
Using the join conditions instead of the
where clause: in our case instead of using the join conditions instead of where
clause the optimization process remains the same.
Summary of the findings:
·
In converting the hash joins to
nested loops, the maximum memory is used by the following:
o
Sort operation
o
Index seek: indexed nested loop
scan
o
Table scan
For the next steps regarding the join optimization, please contact the following:
Name: Apoorv Chaturvedi
Email: apoorv@mnnbi.com
Phone: +91-8802466356
Website: www.mnnbi.com
Prepared by Ankur Gupta(Intern): https://www.linkedin.com/in/ankur-gupta-3b9a71179