Fig 1. SAP To Snowflake Migration, image by author
Many customers worldwide, across various verticals, have migrated or are currently migrating from SAP systems to Snowflake. Legacy SAP analytical systems come with well-known obstacles: complexity, high cost, high maintenance, the need for a premium workforce, difficulty scaling, difficulty integrating non-SAP data (unstructured and semi-structured), and hardware/renewal costs. Snowflake can remove these obstacles by giving SAP customers a platform that is simple, secure, and cost-effective, with access to virtually unlimited compute for workload separation and independently scalable compute and storage.
These customers are combining SAP and non-SAP data from different systems, such as ERP, custom-built applications, IoT systems, social media, and off-the-shelf software, all of which matter to the business. By building a secure, governed, and unified data platform on Snowflake that complies with privacy and regulatory standards, customers can break down data silos and drive analytics.
I’m sure you will all agree that the most critical part of any migration process is moving the objects, and this is true for SAP HANA/BW to Snowflake migration too. In SAP HANA, one peculiar object is the ‘Graphical Calculation View’, which is used to consume analytic, attribute, and other calculation views to perform complex calculations. These objects are built using union, join, subquery, projection, and aggregation nodes. Moreover, they often have several layers of nesting, which significantly increases the complexity of migrating them into Snowflake.
The complexity of graphical calculation views depends on these parameters:
- Number of nesting levels in a nested graphical calculation view
- Multiple union, join, and group by statements
- Complex and highly nested CASE statements
- Use of unwanted functions (such as string/date functions) on query columns
- Number of lines of code (may be more than 100,000 lines after migration to .sql)
- Extremely complex queries in graphical calculation views
- Built-in complex logic for run-time data transformation and data retrieval in the same graphical calculation view
- Cross-functional dependencies
- Business requirements that result in data explosion across multiple layers
- Cross joins used to perform complex logic
- Exponentially high compilation time
- Execution plan can’t be generated because it exceeds the limit, even though the query gets executed
For migrating such complex nested graphical views into Snowflake, an automation tool plays an important role in reducing migration time, manual effort, and cost. This automation tool has two phases, namely the analyzer and the converter.
The analyzer examines all of the given graphical calculation views and produces a report covering complexity, the dependency matrix, manual intervention (if required), expected duration/expected cost, and other details.
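To make the analyzer idea concrete, here is a minimal sketch in Python. It assumes a simplified, hypothetical XML layout (`<view>`, `<node type="...">`, `<input ref="...">`); real SAP HANA exports are richer and use different element names, so the parsing would need to be adapted. The view names and the `analyze` function are illustrative only. For each view, the sketch counts node types, lists direct dependencies, and derives the nesting depth.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified stand-in for exported calculation views.
# Real SAP HANA *.xml exports use a different and much richer schema.
SAMPLE_VIEWS = {
    "CV_SALES": """
        <view name="CV_SALES">
          <node type="join"><input ref="CV_ORDERS"/><input ref="CV_CUSTOMERS"/></node>
          <node type="aggregation"/>
        </view>""",
    "CV_ORDERS": """
        <view name="CV_ORDERS">
          <node type="union"><input ref="CV_ORDERS_BASE"/></node>
          <node type="projection"/>
        </view>""",
    "CV_CUSTOMERS": '<view name="CV_CUSTOMERS"><node type="projection"/></view>',
    "CV_ORDERS_BASE": '<view name="CV_ORDERS_BASE"><node type="projection"/></view>',
}


def analyze(views):
    """Return node counts, direct dependencies, and nesting depth per view."""
    node_counts, deps = {}, {}
    for name, xml_text in views.items():
        root = ET.fromstring(xml_text.strip())
        node_counts[name] = Counter(n.get("type") for n in root.iter("node"))
        refs = {i.get("ref") for i in root.iter("input")}
        # Only references to other analyzed views count as dependencies.
        deps[name] = sorted(refs & set(views))

    def depth(name, seen=()):
        # A view with no dependencies has depth 1; each nested layer adds 1.
        if name in seen:
            raise ValueError(f"circular dependency at {name}")
        return 1 + max((depth(d, seen + (name,)) for d in deps[name]), default=0)

    return {
        name: {
            "nodes": dict(node_counts[name]),
            "depends_on": deps[name],
            "nesting_depth": depth(name),
        }
        for name in views
    }


report = analyze(SAMPLE_VIEWS)
for view_name, details in report.items():
    print(view_name, details)
```

A real analyzer would add further signals from the list above (lines of generated SQL, CASE nesting, cross joins) and turn them into effort and cost estimates.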
The converter automatically converts the exported *.xml complex graphical calculation views into *.sql files. The platform/tool should provide enough automation that manual intervention is kept to a minimum, thereby reducing the overall time and cost of migration. Overall, there are two approaches that help with the migration of complex nested graphical views: first, tuning the migrated views (during the manual or automation phase), and second, re-engineering.
First: For tuning migrated views (during the manual or automation phase), below are a few actions that may need to be performed:
- Flattening complex queries.
- Simplifying complex queries by removing functionally/technically unwanted filters, for instance string/date filters.
- Reducing the lines of code by using CTEs (named subqueries) to improve compilation time and code maintainability.
- Adopting a bottom-up approach for performance tuning of nested views.
- Replacing calculated columns in a GROUP BY clause with their alias names.
- Removing unnecessary always-true checks (such as 1=1) in WHERE clause conditions.
- Removing unnecessary CAST functions that manipulate the datatype of the columns.
- Creating a super dataset using CTEs, so that it can be referenced downstream to generate multiple small CASE statements as per functional requirements. Additionally, we can create calculated columns for multiple SELECT statements.
- Optimizing the order of filter conditions as per Snowflake standards, for example by putting date columns first, followed by columns of lower cardinality.
- Increasing the Snowflake compilation memory size (1MB-4MB) if the size of the code is huge.
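A small before/after illustration of several of the tuning actions above: flattening nested subqueries into a CTE, grouping by an alias, and dropping `1=1` checks and a redundant CAST. This is only a sketch. The `orders` table and its data are made up, and the queries run against an in-memory SQLite database as a stand-in so the example is self-contained. It shows that the rewrite preserves results; it does not measure Snowflake compilation or execution time.

```python
import sqlite3

# Shape typical of a mechanically converted view: nested subqueries,
# always-true filters, a redundant CAST, and a repeated GROUP BY expression.
ORIGINAL = """
SELECT UPPER(SUBSTR(t.region, 1, 2)) AS region_code,
       SUM(CAST(t.amount AS REAL))   AS total_amount
FROM (
    SELECT * FROM (
        SELECT region, amount, order_date FROM orders WHERE 1=1
    ) WHERE order_date >= '2022-01-01'
) AS t
WHERE 1=1
GROUP BY UPPER(SUBSTR(t.region, 1, 2))
ORDER BY region_code
"""

# Tuned version: one CTE, no 1=1 checks, no CAST (amount is already REAL),
# and GROUP BY refers to the column alias.
TUNED = """
WITH recent_orders AS (
    SELECT region, amount
    FROM orders
    WHERE order_date >= '2022-01-01'
)
SELECT UPPER(SUBSTR(region, 1, 2)) AS region_code,
       SUM(amount)                 AS total_amount
FROM recent_orders
GROUP BY region_code
ORDER BY region_code
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("north", 10.0, "2022-03-01"),
        ("north", 5.5, "2022-05-01"),
        ("south", 7.0, "2021-12-31"),  # filtered out by the date condition
        ("south", 2.0, "2022-02-01"),
    ],
)

original_rows = conn.execute(ORIGINAL).fetchall()
tuned_rows = conn.execute(TUNED).fetchall()
print(original_rows == tuned_rows, tuned_rows)
```

Comparing the results of the original and the tuned query on the same data, as done here, is a cheap regression check to run after every tuning step.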
Second: For re-engineering the migration (materialized tables, or tools such as DBT/Coalesce), below are a few actions that may need to be performed:
- For migrating extremely complex nested graphical calculation views to Snowflake, we need to take the materialized table approach or use tools such as DBT/Coalesce.
- Nested views with high compilation and execution time can be materialized to improve performance and reduce costs.
- After breaking complex graphical calculation views into materialized tables for the processing modules, we need to build a processing framework.
- To build the processing framework model, we should consider the parameters below:
  - Exception handling: This includes defining, capturing, and handling all scenarios, including no-data alerts, as well as rollback requirements.
  - For current and history data processing, we need to flag the datasets with a current or history flag, build a control table, and parametrize the ETL pipeline for current and history data processing.
  - Data quality checks need to be put in place to catch invalid data processing, unreasonable data explosion, stale data, and data reconciliation issues.
  - Send email alerts to specific groups on failures during model processing.
  - Define the query tagging patterns by considering vertical names/model names, ETL job names, and cost centers.
- To reduce elapsed time and consumption cost, and to achieve better data quality, we need to choose the right model processing methods:
  - Parallel vs. sequential model processing
  - History vs. incremental model processing
- When considering history vs. incremental model processing, we should always try to choose the incremental method to reduce consumption cost and improve performance. At the recently concluded Snowflake Summit (June 2022), we announced a new feature called Materialized Table (currently in private preview). Materialized Tables can be used for automatic incremental processing and are definitely worth exploring in depth.
- When considering parallel vs. sequential model processing, we should always try to choose the right method based on the performance and data quality requirements.
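The processing-framework points above (control table, history vs. incremental runs, a reconciliation check with rollback, and query tagging) can be sketched roughly as follows. Everything here is an assumption for illustration: the table names, the `run_model` and `build_query_tag` helpers, and the tag fields are hypothetical, and SQLite stands in for Snowflake so the example runs on its own. The only Snowflake-specific piece is the `ALTER SESSION SET QUERY_TAG` statement, which is built as a string but not executed. Email alerting is left out.

```python
import json
import sqlite3


def build_query_tag(vertical, model, job, cost_center):
    """Build a JSON query tag; the field names are an illustrative convention."""
    return json.dumps(
        {"vertical": vertical, "model": model, "job": job, "cost_center": cost_center},
        sort_keys=True,
    )


def run_model(conn, model, mode):
    """Load target_sales in 'HISTORY' (full reload) or 'INCREMENTAL' mode."""
    if mode not in ("HISTORY", "INCREMENTAL"):
        raise ValueError(f"unknown mode: {mode}")
    with conn:  # commits on success, rolls back if an exception is raised
        if mode == "HISTORY":
            conn.execute("DELETE FROM target_sales")
            watermark = ""
        else:
            row = conn.execute(
                "SELECT last_loaded_at FROM control WHERE model_name = ?", (model,)
            ).fetchone()
            watermark = row[0] if row else ""
        before = conn.execute("SELECT COUNT(*) FROM target_sales").fetchone()[0]
        # Only rows newer than the watermark are processed.
        conn.execute(
            "INSERT INTO target_sales "
            "SELECT id, amount, loaded_at FROM source_sales WHERE loaded_at > ?",
            (watermark,),
        )
        after = conn.execute("SELECT COUNT(*) FROM target_sales").fetchone()[0]
        source_count = conn.execute("SELECT COUNT(*) FROM source_sales").fetchone()[0]
        # Data quality check: source and target row counts must reconcile.
        if after != source_count:
            raise RuntimeError(f"reconciliation failed for {model}")
        new_watermark = conn.execute(
            "SELECT COALESCE(MAX(loaded_at), '') FROM target_sales"
        ).fetchone()[0]
        conn.execute("INSERT OR REPLACE INTO control VALUES (?, ?)", (model, new_watermark))
    return after - before


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_sales (id INTEGER, amount REAL, loaded_at TEXT);
    CREATE TABLE target_sales (id INTEGER, amount REAL, loaded_at TEXT);
    CREATE TABLE control (model_name TEXT PRIMARY KEY, last_loaded_at TEXT);
    """
)
conn.executemany(
    "INSERT INTO source_sales VALUES (?, ?, ?)",
    [(1, 10.0, "2022-06-01"), (2, 20.0, "2022-06-02")],
)

first_load = run_model(conn, "sales", "HISTORY")
conn.execute("INSERT INTO source_sales VALUES (3, 30.0, '2022-06-03')")
second_load = run_model(conn, "sales", "INCREMENTAL")

tag = build_query_tag("finance", "sales", "nightly_load", "CC-100")
tag_sql = f"ALTER SESSION SET QUERY_TAG = '{tag}'"
print(first_load, second_load, tag_sql)
```

The incremental run touches only the one new row instead of reloading all three, which is where the consumption savings of the incremental method come from. Independent models could additionally be run in parallel, while models that depend on each other still need to run sequentially.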