ELT with Google BigQuery: Part II
Thanks for reading my first blog on ELT with Google BigQuery. If you haven't read it yet, you can find Part I here — https://poojakelgaonkar.medium.com/elt-with-google-bigquery-d4b84d77adaf.
In the previous blog, we saw two different approaches to implementing ELT on GCP using BigQuery together with other GCP services. In this blog, we are going to look at some important aspects of designing and architecting ELT on GCP, specifically when BigQuery is chosen as the data warehouse (DW) or data lake. Before getting into the specifics, let's look at BigQuery and try to answer some common questions: How do we access and retrieve data? How are data security and data loss prevention handled? How do we run data analysis and analytics, generate statistics, build dashboards, and explore data through reporting? And, most importantly, how can we build and run ML models on BigQuery? These are some of the most important questions when choosing a DW/data lake service on GCP. GCP offers many data, storage, and database services; follow my next blogs for more insight into these services and the criteria for choosing among them.
Talking more about BigQuery, let's look at some of these questions:
BigQuery (BQ) is a GCP service that offers data storage, data security, and DLP, and allows data to be accessed by services both within and outside GCP. BQ is seamlessly configurable and usable with any other GCP service: we can process data anywhere, in any service inside or outside GCP, and store the results in BQ. BQ offers connectivity over the internet through its command-line utilities, REST APIs, and the GCP console, as well as through the various connectors available for it; BQ plugins also exist for most ETL and reporting tools. BigQuery can be used like any other DW, with ANSI SQL queries, functions, stored procedures, and so on. BQ extends support to ML as well. We can use GCP's built-in services to access data stored in BQ, create ML models, and deploy them using the AI Platform. BQ also supports ML with its own capabilities: with BigQuery ML, models can be created, deployed in BQ, and accessed through SQL statements. BigQuery ML has some restrictions, such as the set of supported algorithms and which models can be created and accessed, but it makes it easy for any business user who knows SQL to create, deploy, and use ML models.
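As a sketch of the BigQuery ML workflow described above, a model can be created and queried with plain SQL. The dataset, table, model, and column names below (`mydataset`, `churn_features`, `churn_model`, `churned`, `new_customers`) are hypothetical placeholders, not from any real project:

```python
# Sketch of BigQuery ML statements, built as plain strings.
# All dataset/table/column names here are hypothetical examples.

# Train a model directly in BigQuery with a CREATE MODEL statement.
create_model_sql = """
CREATE OR REPLACE MODEL `mydataset.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT * FROM `mydataset.churn_features`
"""

# Score new rows with ML.PREDICT, again as ordinary SQL.
predict_sql = """
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.churn_model`,
  (SELECT * FROM `mydataset.new_customers`))
"""

# These strings would be submitted with the BigQuery client, e.g.:
#   from google.cloud import bigquery
#   bigquery.Client().query(create_model_sql).result()
```

This is what makes BigQuery ML approachable for SQL-savvy business users: training and prediction are both just queries, with no separate serving infrastructure.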
BQ supports most DW functions and operations, except row-level processing and individual inserts/updates on a table. There is a limit on the number of transactions per table per day if individual transactions are run OLTP-style; inserts and updates can be handled very well through MERGE statements instead. BQ's performance has been key to its success. BQ supports schema segregation like any other DW; schemas are referred to as datasets in BQ. We can create different datasets for the different layers of the DW, just as we would create separate databases or schemas in any other DW, and BQ supports accessing objects both within and across datasets. Follow my next blog for more insight into how data security and accessibility are implemented in BQ.
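Because row-by-row DML is limited, upserts in BigQuery are typically expressed as a single set-based MERGE. A minimal sketch, with hypothetical target and staging table names (`dw.customers`, `staging.customers_delta`):

```python
# Upsert pattern for BigQuery: one set-based MERGE instead of per-row DML.
# Table and column names are hypothetical examples.

merge_sql = """
MERGE `dw.customers` AS t
USING `staging.customers_delta` AS s
ON t.customer_id = s.customer_id
WHEN MATCHED THEN
  UPDATE SET t.name = s.name, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (customer_id, name, updated_at)
  VALUES (s.customer_id, s.name, s.updated_at)
"""
```

One MERGE like this counts as a single DML statement no matter how many rows it touches, which is how it sidesteps the per-table transaction limits mentioned above.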
When implementing ELT on GCP, BQ is one of the services most widely used to store and process data. The source can be set up based on the requirements, with BQ as the target. Now let's look at the different layers and services involved in designing and developing ELT on GCP. We already know services such as Dataflow, Data Fusion, Dataproc, and Cloud Functions with which ETL can be implemented on GCP; to implement ELT, we can use a variety of stacks, services, and open-source tools. The table below shows some use cases for implementing ELT on GCP:
In most of these scenarios and use cases, landing the data in GCS is the first step: it allows source data validation and keeps a versioned copy of the source data for some time to avoid any kind of data loss. Once the data is copied to GCS, native or open-source utilities are used to load it into the appropriate database service. The data model and layers below are recommended, and are typically followed in most DW implementations:
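The land-in-GCS-then-load step above can be sketched with the `bq` CLI or the Python client library. The bucket, dataset, and table names here are hypothetical:

```python
# Sketch: load a file already landed in GCS into a BigQuery staging table.
# Bucket/dataset/table names are hypothetical examples.

gcs_uri = "gs://my-landing-bucket/sales/2023-01-01/sales.csv"
target = "staging_ds.sales_raw"

# Equivalent bq CLI invocation for the same load:
bq_load_cmd = (
    f"bq load --source_format=CSV --skip_leading_rows=1 "
    f"--autodetect {target} {gcs_uri}"
)

# With the google-cloud-bigquery Python client it would look like:
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   job_config = bigquery.LoadJobConfig(
#       source_format=bigquery.SourceFormat.CSV,
#       skip_leading_rows=1,
#       autodetect=True)
#   client.load_table_from_uri(gcs_uri, target, job_config=job_config).result()
```

Loading from GCS rather than streaming directly into BQ keeps the raw files around for validation and replay, which is exactly the data-loss safeguard described above.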
These are some of the datasets/layers created in BQ when implementing ELT or ETL with BQ. The same layers can also be created when implementing ETL on GCP; the E and T steps, and the transformations, may vary based on the implementation.
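As a sketch of the layered-dataset idea, each DW layer maps to its own BigQuery dataset, created with DDL. The project name and layer names below are illustrative assumptions, not a prescribed naming scheme:

```python
# Sketch: one BigQuery dataset per DW layer.
# The project id and layer names are hypothetical examples.

layers = ["landing", "staging", "integration", "presentation"]

# In BigQuery DDL, CREATE SCHEMA creates a dataset.
ddl_statements = [
    f"CREATE SCHEMA IF NOT EXISTS `my_project.{layer}`" for layer in layers
]

for ddl in ddl_statements:
    print(ddl)  # each statement could be run via bigquery.Client().query(ddl)
```

Keeping each layer in its own dataset makes it easy to grant different access per layer (for example, read-only on `presentation` for report users), which ties into the security topics promised for the next blog.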
I hope this helps you understand how BQ is set up and used in both ELT and ETL implementations. Follow my next blogs to read more about BQ and other GCP services.
I am a DWBI and Cloud Architect! I have been working with various legacy data warehouses, big data implementations, and cloud platforms/migrations. I am a Google Certified Professional Cloud Architect. You can reach out to me on LinkedIn if you need any further help with certification or GCP implementations!