Migrating Workloads to GCP: Part II: Phases of Migration
Thanks for reading the first blog in my GCP migration series. If you missed it, you can read it here: https://poojakelgaonkar.medium.com/migrating-workloads-to-gcp-part-i-planning-migrations-ae8e301d207b
This blog is the second chapter of the migration series; here we discuss the migration path and the phases of migration. In the earlier blog, we covered assessing the existing system, defining the goal, and defining the type of migration applicable to the current project/application.
Let's look at the migration path, which leads through assessment/discovery, planning/designing, development/testing/deployment, and validation/optimization of the migrated applications.

These are the four phases of migration –
1. Assess — This is a very critical phase of the migration. It is also referred to as the Discovery phase, where we focus on analyzing the existing applications and capturing the metrics required for the migration. Based on the type of migration, we need to gather some of the below metrics as part of the Assess phase.
a. Legacy Data Warehouse migration to GCP — we need to capture metrics that represent the complexity of the system, which helps in designing the migration plan.
i. Let's consider a scenario of migrating a Teradata Data Warehouse to GCP.

ii. We will need to capture some of the below metrics for analysis
1. Source integrations — what types of sources are integrated with the existing TD system? How is source data brought into TD for processing?
2. Target integrations — what are the different types of integrations — BI/Reporting/AI/ML or any downstream applications accessing data from TD?
3. Type of loads — real-time or batch loads
4. Type of data processed — structured, semi-structured, etc.
5. Type of implementation — ETL or ELT
6. Type of transformations — which transformations run on top of TD? Categorize them as Simple/Medium/Complex
7. Which TD utilities are used to process data — BTEQ/TPT, etc.
8. What is the size of the data stored/processed in TD?
9. Which scheduler/orchestrator is used to schedule the loads/transformations?
iii. These metrics are required to analyze the existing system and to categorize the data, jobs/pipelines, and the complexity of applications & integrations. A small sketch for collecting one of these metrics follows below.
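For instance, the data-size metric (point 8 above) can be pulled straight from Teradata's DBC catalog views. Below is a minimal Python sketch of that idea; the host, credentials, and driver choice (the teradatasql package) are assumptions for illustration, not part of the original setup.

```python
# Sketch: collect a basic size metric from Teradata's DBC catalog views.
# The host, user, and password below are hypothetical placeholders.
import teradatasql

TD_HOST = "td-prod.example.com"   # hypothetical Teradata host
TD_USER = "assess_user"           # hypothetical read-only user
TD_PASS = "********"

SIZE_QUERY = """
SELECT DatabaseName,
       CAST(SUM(CurrentPerm) / (1024*1024*1024) AS DECIMAL(18,2)) AS size_gb
FROM   DBC.TableSizeV
GROUP  BY DatabaseName
ORDER  BY size_gb DESC
"""

with teradatasql.connect(host=TD_HOST, user=TD_USER, password=TD_PASS) as conn:
    with conn.cursor() as cur:
        cur.execute(SIZE_QUERY)
        for database_name, size_gb in cur.fetchall():
            print(f"{database_name}: {size_gb} GB")
```

Similar queries against other DBC views (object counts, query logs, utility usage) can feed the remaining metrics into a simple complexity scorecard.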
2. Plan — In this phase, we design the infrastructure, choice of services, networking, project structure, folders, and Organization setup. Based on the analysis metrics captured in the first phase, we put a design in place that defines the migration strategy.
a. Based on the Assess phase of the Teradata DW migration use case, the Plan phase consists of the below strategy
i. Organization setup — projects, folder setup
ii. Networking — firewall, subnet, and VPC setup for source & target integrations
iii. Define data strategy — hot, warm, and cold data
iv. Define data loads — Historical loads & incremental loads
v. Define data migration strategy — choice of services, devices, and transfer services to copy data to the GCP environment
vi. Define storage layer — define the GCS bucket structure (see the sketch after this list)
vii. Define pipeline strategies, design pipelines to process and transform data
viii. Define orchestration and scheduling using GCP native services
ix. Define phases of migration — which parts of the application are migrated in which order, based on the interdependency of processes within the TD system
x. Define operations & maintenance activities and setup
xi. Define cost strategy — cost estimation and setup. Refer to https://poojakelgaonkar.medium.com/bigquery-pricing-model-and-cost-optimization-recommendations-d57ae1ebea36 to understand more about the BQ pricing model and cost optimizations
xii. Define CI/CD on GCP
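To make the hot/warm/cold data strategy and the GCS storage layer (points iii and vi above) concrete, here is a minimal sketch using the google-cloud-storage client; the project and bucket names are hypothetical, and the age thresholds are illustrative only, not a recommendation.

```python
# Sketch: a GCS landing bucket whose lifecycle rules implement a simple
# hot -> warm -> cold strategy. Names and thresholds are placeholders.
from google.cloud import storage

client = storage.Client(project="my-migration-project")        # hypothetical project

bucket = storage.Bucket(client, name="td-migration-landing")   # hypothetical bucket
bucket.storage_class = "STANDARD"                              # hot data by default

# Warm: objects older than 30 days move to Nearline.
bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
# Cold: objects older than 90 days move to Coldline.
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
# Optionally expire raw extracts after a year.
bucket.add_lifecycle_delete_rule(age=365)

client.create_bucket(bucket, location="US")
print(f"Created gs://{bucket.name} with lifecycle rules")
```

In a real plan, the bucket layout (per source system, per layer such as raw/staging/curated) and the lifecycle thresholds would follow the retention requirements captured in the Assess phase.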
3. Develop/Deploy — In this phase, we design and build the deployment strategy and the process to move loads to GCP. Based on the initial two phases for the given use case, we will perform the below steps –
a. Continuing the same scenario — let's assume GCP BigQuery (BQ) is used to replace the Teradata DW system
b. Assumptions — existing application loads are all ELT using BTEQ, TD data is sourced using TD native utilities, and BI reports run on top of TD
c. Data loads — one-time historical data migration
d. BigQuery Data Transfer Service is used to bring historical data into BQ
e. All ELT BTEQ pipelines are converted to equivalent BQ pipelines
f. All converted pipelines are triggered using Cloud Composer (Airflow on GCP) as the orchestrator & scheduler (see the DAG sketch after this list)
g. All BI reports continue to run on the existing BI platform; reporting SQL and connections are changed to read data from BQ
h. For real-time loads — use GCP services such as Pub/Sub or Dataflow to bring data into BQ, or use BQ native utilities to read data from a GCS bucket and push it to BQ on a near-real-time basis, depending on application requirements
i. The choice of services depends entirely on the existing application setup, business needs, application requirements, and the SLAs to be met for data availability
j. Deploy the sample code using the GCP CI/CD setup
k. Set up logging & monitoring — alerts, thresholds, and automated emails/reports
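As a small illustration of points e and f, here is a minimal Airflow DAG (as run on Cloud Composer) that executes one converted BTEQ transformation as a BigQuery SQL job. The DAG id, schedule, dataset, and SQL are hypothetical placeholders, not the actual converted pipeline.

```python
# Sketch: one converted ELT step scheduled through Airflow (Cloud Composer).
# DAG id, tables, schedule, and SQL are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

with DAG(
    dag_id="td_to_bq_daily_sales_load",   # hypothetical DAG name
    schedule_interval="0 2 * * *",        # daily at 02:00, as an example
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:

    # Equivalent of a converted BTEQ transformation, expressed as BigQuery SQL.
    load_daily_sales = BigQueryInsertJobOperator(
        task_id="load_daily_sales",
        configuration={
            "query": {
                "query": """
                    INSERT INTO `my-project.dw.sales_daily`          -- hypothetical table
                    SELECT order_id, SUM(amount) AS total_amount
                    FROM `my-project.staging.sales_raw`
                    WHERE order_date = @run_date
                    GROUP BY order_id
                """,
                "useLegacySql": False,
                "queryParameters": [
                    {
                        "name": "run_date",
                        "parameterType": {"type": "DATE"},
                        "parameterValue": {"value": "{{ ds }}"},
                    }
                ],
            }
        },
        location="US",
    )
```

Each converted pipeline becomes one or more such tasks, and the task dependencies in the DAG mirror the job dependencies that existed in the legacy scheduler.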
4. Optimize & Validations — In this phase, we validate the application / data migrated to GCP. There are various ways to validate data migrated . We can use any existing validation tools or open source tools or can develop an automated framework based on the application needs. We will perform below steps as part of this phase –
a. Validate migrated data — run validation scripts to ensure the migrated historical data matches the TD system (a simple comparison sketch follows this list)
b. Validate data pipelines — run the migrated pipelines and the existing legacy pipelines on the same datasets and compare the pipeline output, execution time, error handling, logs captured, etc.
c. Validate converted SQL — run the same BI reports on TD and BQ and compare the results
d. Validate application SLAs
e. Validate application performance, DR, scalability, and availability
f. Validate the operations and maintenance setup on TD vs. BQ
g. Validate cost expenditure — compare the estimated costs against the actual costs of the application on GCP
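For point a, the simplest form of a validation script is a count/aggregate comparison between TD and BQ. The sketch below compares row counts for a single table; hosts, credentials, and table names are hypothetical, and a real framework would also compare column-level checksums and partition-level aggregates.

```python
# Sketch: compare row counts for one table between Teradata and BigQuery.
# Connection details and table names are hypothetical placeholders.
import teradatasql
from google.cloud import bigquery

TD_TABLE = "dw.sales_daily"              # hypothetical Teradata table
BQ_TABLE = "my-project.dw.sales_daily"   # hypothetical BigQuery table

def teradata_row_count() -> int:
    with teradatasql.connect(host="td-prod.example.com",
                             user="assess_user", password="********") as conn:
        with conn.cursor() as cur:
            cur.execute(f"SELECT COUNT(*) FROM {TD_TABLE}")
            return cur.fetchone()[0]

def bigquery_row_count() -> int:
    client = bigquery.Client(project="my-project")
    rows = client.query(f"SELECT COUNT(*) AS cnt FROM `{BQ_TABLE}`").result()
    return next(iter(rows)).cnt

td_count, bq_count = teradata_row_count(), bigquery_row_count()
status = "MATCH" if td_count == bq_count else "MISMATCH"
print(f"{TD_TABLE}: TD={td_count}, BQ={bq_count} -> {status}")
```

The same pattern extends to comparing pipeline outputs (point b): run both pipelines on the same input dataset and diff the resulting tables and run metadata.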
Here, we have considered a data warehouse migration use case — Teradata to GCP. You can refer to the phases, sample metrics, and steps captured here; these are based on my experience with GCP migrations. You can find more details on each of these tasks in the Google documentation — https://cloud.google.com/architecture/migration-to-gcp-getting-started
In the next blog, we will look at some sample use cases and reference architectures for GCP migrations.
About Me:
I am a DWBI and Cloud Architect! I have been working with various legacy data warehouses, Big Data implementations, and cloud platforms/migrations. I am a Google Certified Professional Cloud Architect. You can reach out to me on LinkedIn if you need any further help with certification or GCP implementations!