Change Data Capture in DWBI (ETL/ELT implementations)

Pooja Kelgaonkar
Dev Genius
Published in
3 min readSep 17, 2021

--

Change Data Capture(CDC) is one of the most important feature of DWBI implementations. CDC is used to identify changed data and apply changes to Data Warehouse. Change data capture (CDC) is the process of recognizing when data has been changed in a source system so a downstream process or system can action that change. A common use case is to reflect the change in a different target system so that the data in the systems stay in sync.

Change data capture (CDC) records insert, update, and delete activity that applies to a table. This activity makes the details of the changes available in an easily consumed relational format. Column information and the metadata that is required to apply the changes to a target environment is captured for the modified rows and stored in change tables that mirror the column structure of the tracked source tables.

A good example of a data consumer that this technology targets is an extraction, transformation, and loading (ETL) application. An ETL application incrementally loads change data from source systems to a data warehouse or data mart.

There are different types of CDCs which can be implemented - Type I ,Type II and Type III. Type I doesn't store history. This updates record directly, there is no record of historical values, only current state. Type II - Row versioning , Track changes as version records with current flag & active dates and other metadata. Type III - History versioning, preserve versions of data. Track change to a specific attribute, add a column to show the previous value, which is updated as further changes occur. SCD Type II is widely used implementation in DWBI.

Let's see how can we implement SCD Type II in ETL as well as ELT implementations. with SCD Type II implementation, target table need to add few extra columns to track versioning - e.g. Active_IND, Start_Date, End_Date etc. Active Indicator column represents if the record is active or inactive(old/historical record). For new/updated record the indicator column is set to "Y" or"Yes" and for old records, its set to "N" or "No". Start date is date when the record is start date/load date. End date is date when the record gets updated for historical record. For all new/active record inserted, end date is set to NULL.

Upserts - Update and Inserts are applied to the target table/system based on the changes. This can be used for ETL and ELT implementations. SCDs can be implemented using SQL, ETL tools, Database Objects - Stored Procedure , Functions etc.

Below are the steps followed to identify and apply changes to the target systems as part of SCD Type II.

  1. Delta Capture - Source data received from source system on scheduled basis.
  2. Identify Change - Source data compared against target table to identify records existed/updated and new records to be considered as inserted to the target table
  3. Apply Change - Once records identified to be inserted or updated. Update and Inserts can be applied to the target system. In case of updates, old records are updated for flag - Active_IND and End_date with date of updation. In case of inserts, active_ind set to Y and End_date as NULL .All records to be updated and inserted all together as inserts to the target system.
  4. In some of the implementations, Batch_ID is also used along with date and indicator columns. with Batch id , loads can be identified with batch of data load. Batch id can also be used as part of Audit Batch Control.
  5. Follow my next blog to read more about ABC implementation for DWBI-ETL/ELT implementations.
  • About Me

I am DWBI and Cloud Architect! I have been working with various Legacy data warehouses, Bigdata Implementations, Cloud platforms/Migrations. I am Google Certified Professional Cloud Architect .You can reach out to me @ LinkedIn if you need any further help on certification, GCP Implementations!

--

--

My words keep me going, Keep me motivating to reach out to more and more!