Monday, 6 May 2024

Optimizing the query execution times by converting the hash joins to merge joins

 The approach of the etl team is to optimize the sql given by the end clients without doing any of the following:

  •          Separating the transaction and the reporting layers
  •         Developing another transaction layer.


In this module we shall analyze the joins that where used as part of the execution process. Thereafter we shall optimize the joins by converting hash joins(no index usage) to nested loop joins(index usage from one table) and thereafter to merge joins(optimize index usage from both the tables).
For the process of join optimization, the etl resource chose the stored procedure sp_item_purchase_order from the datawarehouse load of turbodata as given under.
DECLARE @SUPPLIER INT;
SET @SUPPLIER=(SELECT DIM_SUPPLIER_ID FROM DIM_SUPPLIER WHERE SUPPLIER_NAME LIKE 'N/A');

UPDATE  [STATUS] SET [DISPLAYTEXT]='ITEM_PURCHASE_ORDER LOAD DONE',[COMPLETED]=66;

WITH CTE1 AS(
                 
                  SELECT A.COMPANY_NAME,A.ORDER_NO,A.PARTY_LEDGER_NAME,A.ENTERED_BY,ORDER_DATE,
                  A.AMOUNT,A.AMOUNT_SUBSTRING,A.AMOUNT_SUBSTRING2,A.AMOUNT_SUBSTRING_AFTER$,A.CATEGORY,A.COST_CENTER,
                  A.VOUCHER_TYPE_NAME,A.XYZ,
                  ISNULL(D.DIM_DATE_ID,1) AS DIM_DATE_ID,
                  LEDGERNAME
                  FROM STG_DAYBOOK A
                  LEFT OUTER JOIN DIM_DATE D
                  ON A.ORDER_DATE=D.DATE_DESC        
                        WHERE (A.VOUCHER_TYPE_NAME IN   
                                    (SELECT DISTINCT VOUCHER_TYPE_NAME
                                                FROM STG_VOUCHER_TYPE
                                                WHERE VOUCHER_TYPE IN ('PURCHASE')))
                        --AND A.IS_DEEMED_POSITIVE LIKE '%YES%'
                  )
                  ,
CTE2 AS (
                  SELECT SUM (CAST(A.AMOUNT_SUBSTRING2 AS FLOAT)) AS AMOUNT_SUBSTRING2
                  ,SUM(CAST(A.AMOUNT_SUBSTRING_AFTER$ AS FLOAT)) AS AMOUNT_SUBSTRING_AFTER$
                  ,A.ORDER_DATE, A.ENTERED_BY,
                  A.ORDER_NO, A.VOUCHER_TYPE_NAME, ISNULL(A.PARTY_LEDGER_NAME,'N/A') AS PARTY_LEDGER_NAME ,
                  ISNULL(A.LEDGERNAME,'N/A') AS LEDGERNAME,
                  K.ITEM_SUPPLIER_ORDER_ID,
                  A.DIM_DATE_ID,A.COMPANY_NAME,A.CATEGORY,
A.COST_CENTER,A.XYZ
                  FROM CTE1 A
                  LEFT OUTER JOIN ITEM_SUPPLIER_ORDER K
                  ON  A.ORDER_NO = K.ORDER_NO
                  AND A.VOUCHER_TYPE_NAME=K.VOUCHER_TYPE_NAME
                  AND A.DIM_DATE_ID=K.DIM_DATE_ID
                  AND A.COMPANY_NAME=K.COMPANY_NAME
                  AND A.PARTY_LEDGER_NAME=K.PARTY_LEDGER_NAME
AND A.LEDGERNAME=K.LEDGERNAME
AND A.COST_CENTER=K.COST_CENTER
                  GROUP BY
                  --A.DIM_ITEM_ID,
                   A.ORDER_DATE, A.ENTERED_BY,
                  A.ORDER_NO, A.VOUCHER_TYPE_NAME, K.ITEM_SUPPLIER_ORDER_ID,
                  ISNULL(A.PARTY_LEDGER_NAME,'N/A')
                  ,A.COMPANY_NAME,A.DIM_DATE_ID,A.CATEGORY,
A.COST_CENTER,A.XYZ,
ISNULL(A.LEDGERNAME,'N/A')
            )
           
           
           
           
           
            ,
CTE3 AS (
                  SELECT COUNT (DISTINCT(ORDER_NO))AS COUNT_ORDER_ID,
                  ITEM_SUPPLIER_ORDER_ID
                  FROM CTE2
                  GROUP BY
                  ITEM_SUPPLIER_ORDER_ID
            ),
CTE4 AS (
                        SELECT A.AMOUNT_SUBSTRING2,
A.AMOUNT_SUBSTRING_AFTER$
,A.ORDER_DATE, A.ENTERED_BY,
                  A.ORDER_NO, A.VOUCHER_TYPE_NAME, A.PARTY_LEDGER_NAME ,
                  A.ITEM_SUPPLIER_ORDER_ID,
                  A.DIM_DATE_ID,A.COMPANY_NAME,A.CATEGORY,
A.COST_CENTER,A.XYZ,A.LEDGERNAME
                  FROM CTE2 A
                  JOIN CTE3 B ON
                   A.ITEM_SUPPLIER_ORDER_ID=B.ITEM_SUPPLIER_ORDER_ID
                  WHERE B.COUNT_ORDER_ID=1
            ),
CTE5 AS (
                        SELECT a.AMOUNT_SUBSTRING2,
a.AMOUNT_SUBSTRING_AFTER$,A.ORDER_DATE, A.ENTERED_BY,
                  A.ORDER_NO, A.VOUCHER_TYPE_NAME, A.PARTY_LEDGER_NAME ,
                  A.ITEM_SUPPLIER_ORDER_ID,
                  --A.DIM_ITEM_ID,
                  A.DIM_DATE_ID,A.COMPANY_NAME,A.CATEGORY,
A.COST_CENTER,A.XYZ,A.LEDGERNAME
                  FROM CTE2 A
                  JOIN CTE3 B ON
                   A.ITEM_SUPPLIER_ORDER_ID=B.ITEM_SUPPLIER_ORDER_ID
                  WHERE B.COUNT_ORDER_ID>1
                  AND A.ENTERED_BY LIKE '%ADMIN%'
            ),
CTE6 AS (
                 
                  SELECT a.AMOUNT_SUBSTRING2,
a.AMOUNT_SUBSTRING_AFTER$,A.ORDER_DATE, A.ENTERED_BY,
                  A.ORDER_NO, A.VOUCHER_TYPE_NAME, A.PARTY_LEDGER_NAME ,
                  A.ITEM_SUPPLIER_ORDER_ID,
                  A.DIM_DATE_ID,A.COMPANY_NAME,A.CATEGORY,
A.COST_CENTER,A.XYZ,A.LEDGERNAME
                   FROM CTE4 A
            UNION ALL
                  SELECT a.AMOUNT_SUBSTRING2,
a.AMOUNT_SUBSTRING_AFTER$,A.ORDER_DATE, A.ENTERED_BY,
                  A.ORDER_NO, A.VOUCHER_TYPE_NAME, A.PARTY_LEDGER_NAME ,
                  A.ITEM_SUPPLIER_ORDER_ID,
                  A.DIM_DATE_ID,A.COMPANY_NAME,A.CATEGORY,
A.COST_CENTER,A.XYZ,A.LEDGERNAME
                   FROM CTE5 A
            )
     
MERGE ITEM_PURCHASE_ORDER AS TARGET
USING CTE6 AS SOURCE
                  ON(TARGET.ITEM_SUPPLIER_ORDER_ID = SOURCE.ITEM_SUPPLIER_ORDER_ID
                  AND TARGET.DIM_DATE_ID= SOURCE.DIM_DATE_ID
                  AND TARGET.COMPANY_NAME=SOURCE.COMPANY_NAME
                  --AND TARGET.PARTY_LEDGER_NAME=SOURCE.PARTY_LEDGER_NAME
                  --AND TARGET.LEDGERNAME=SOURCE.LEDGERNAME
                  )
                 
WHEN NOT MATCHED BY TARGET
                  THEN INSERT(AMOUNT_SUBSTRING2,AMOUNT_SUBSTRING_AFTER$,
                  --ORDER_DATE,
                  ENTERED_BY,
                  --ORDER_NO,VOUCHER_TYPE_NAME,
                  --PARTY_LEDGER_NAME,
                  ITEM_SUPPLIER_ORDER_ID,
                  DIM_DATE_ID,
                  COMPANY_NAME
                  --,CATEGORY,COST_CENTER,XYZ,LEDGERNAME
                  )
                  VALUES(SOURCE.AMOUNT_SUBSTRING2,SOURCE.AMOUNT_SUBSTRING_AFTER$,
                  --SOURCE.ORDER_DATE     ,
                  SOURCE.ENTERED_BY ,
                  --SOURCE.ORDER_NO ,
                  --SOURCE.   VOUCHER_TYPE_NAME ,
                  --SOURCE.   PARTY_LEDGER_NAME ,
                  SOURCE.     ITEM_SUPPLIER_ORDER_ID  ,
                  SOURCE.     DIM_DATE_ID ,
                  SOURCE.COMPANY_NAME
                  --SOURCE.CATEGORY,
                  --SOURCE.COST_CENTER,
                  --SOURCE.XYZ,
                  --SOURCE.LEDGERNAME
                  )
WHEN MATCHED
                  THEN UPDATE SET TARGET.AMOUNT_SUBSTRING2= SOURCE.AMOUNT_SUBSTRING2
                  ,TARGET.AMOUNT_SUBSTRING_AFTER$=SOURCE.AMOUNT_SUBSTRING_AFTER$;

Thereafter the ETL resource ran the uery execution panel to check for the joins and the index usage. The resource identified the usage of hash joins wherever possible.


The ETL resource looked for the  following:

·         Hash match cost
·         Table scan cost.
Within the hash match cost, the etl resource looked for the following:
·         The join creating the hash match.
·         The table size
·         The total costs including the subtree cost. 



The same operation shall be repeated for the table scans:

Hereafter the ETL resource identified the join to be worked upon from the given sql.


Steps to convert the hash join to nested loop join:
  • ·         Take the cte output and store in a temporary table in the same order as the joins.
  • ·         Create index in the same order as joins.




Developing the temporary table for the second table alo9ng with the indexes in the same order in which the joins have been done.





Hereafter the ETL developer has 2 options:
  •          Use the join conditions
  •          Change the join conditions to where clause




Understanding the impact of where clause.




Using the join conditions instead of the where clause: in our case instead of using the join conditions instead of where clause the optimization process remains the same.





Summary of the findings:
·         In converting the hash joins to nested loops, the maximum memory is used by the following:
o   Sort operation
o   Index seek: indexed nested loop scan
o   Table scan
The cost reduction comes in nested loop join been used instead of the hash join



For the next steps regarding the join optimization, please contact the following:
Name: Apoorv Chaturvedi
Email: apoorv@mnnbi.com
Phone: +91-8802466356
Website: www.mnnbi.com
Prepared by Ankur Gupta(Intern): https://www.linkedin.com/in/ankur-gupta-3b9a71179

GST2A Reconciliation

 Documentation for GSTR2A reconciliation for India’s leading Retail Chain

Reason for the development of the module:

 

The module has been developed for chartered accountants and end clients to help resolve the GSTR2A reconciliation issues.

Why is this module required?

·         The data been handled for GSTR2A reconciliation can be very large. With excel handling of the data could be very cumbersome

·         There are a large number of conditions by which GSTR2 might not make a match between the client purchase invoices and the client buyer sales invoices. The reasons for the variations could be with regards to document number, document date, Recepient GSTIN, buyer GSTIN and even the transaction amounts. In order to handle the mismatches between these types of invoices there are a large number of join conditions involved.

·         The developed solution can be a standalone module for Chartered accountants not involving the complete GST extraction and services from various data providers

·         Benefits with regards to ERPs:

o   The solution is based on Azure cloud. It uses the high processing power of Azure cloud for faster data processing

o   The solution offers web and mobile access for GSTR2A reconciliation

o   The client can also share with the vendors using whatsapp and email any discrepancy details

o   The development team is bringing with itself automated extractors from Tally and other ERPs.

o   The development team is bringing with itself specialized Chartered  accountants with experience is GST 2A reconciliation(https://www.linkedin.com/in/sharad-maheshwari-688b733/)

o   The data should be secure(on Microsoft Azure cloud) and there should be facility for each company to see its own data(row level security)

o   The business logic should be auditable.

 

The following are the steps followed for the GST2A Audit process:


 For the GSTR2 A reconciliation the sample client is using the following composite keys

·         Document number

·         Document date

·         Supplier GSTIN

·         Company GSTIN

·         Taxable Amount

·         Total GST amount

 

There can be some variations in the Taxable amount and the total GST amount. That logic has to be incorporated into the final audit report.

Supplier GSTIN and Company GSTIN have to match. However there can be discrepancies in Document Number and document date because of different ERP systems.

The client’s logic entailed that match shall include all 6 criteria. Alternatively, if 4(supplier GSTIN, Company GSTIN, taxable amount and Total GST amount) criteria match but there is discrepancy in the remaining 2 criteria (document number an document date) then it should still be considered matched.

Methodology of the development of the solution:

·         The solution was developed in Powerbi for the following reasons:

o   Consolidate the data from different data sources

o   Incorporate the client logic

o   Easy and secure viewing through the Azure portal

o   Standardization of the formulae.

o   Reduce the handling times for GSTR2A reconciliation with large and varied data.

o   No row limitations as far as the processing of the data is concerned



Attached is the methodology for the same:

1.       Find the exact match for the entries:



1.       Find those entries where the client has filed the returns but the returns are not in GST audit-sample code


3.       Find those entries where the client’s customer has filed the data but client has not filed the data


3.       Find those entries where the client’s customer has filed the data but client has not filed the data


 3.) Find those entries where the document number is not matching:


4.) Document date mismatch data


Once  the above analysis has been done then specific matching criteria can also be done by GST number, document number and document date



Other reports: Based on the analysis after steps 1 ,2 and 3 it is possible to divide the data to find the exact mismatch

Custom variations based on Total tax amount and total amount are also possible. A sample case study is attached herewith:

4.       Match  the ERP and the GST data . The ERP data shall come from any ERP(including Tally). The GST data shall come from the GST portal as part of the GSTR2 reconciliation).


1.       Exclude the entries not in the exact match


5.       Extract those entries that have matching Taxable amount, Total amount, recipient GSTIN, Supplier GSTIN but not matching document number and document date

5.       Extract those entries that have matching Taxable amount, Total amount, recipient GSTIN, Supplier GSTIN but not matching document number and document date





Please contact the following for automating your GST 2A audit process

 https://mn-business-intelligence-india.business.site/

Apoorv Chaturvedi(Phone: +91-8802466356, email: support@turbodatatool.com, support@mndatasolutions.com)


Website: https://mn-business-intelligence-india.business.site/

Data insertion-Tally(approaches)

 Problem statement: Many softwares look to insert data into Tally from their application.This blog looks at issues and approaches for the sa...