BigQuery Pricing Model and Cost Optimization Recommendations

Pooja Kelgaonkar
Google Cloud - Community
6 min read · Mar 24, 2022


Thanks for reading my previous blog on Performance Optimization with BigQuery. In this blog, we are going to explore BQ pricing in more detail: how is the BQ price calculated, and how can we predict costs if we go with a GCP BQ implementation? We are also going to look at BQ cost optimization techniques along with some recommendations.

What is BQ cost? How is BQ pricing calculated?

BQ is a managed offering of Google Cloud, and there are three components contributing to BQ pricing.

1. BQ storage cost — the storage cost associated with tables/data stored in BQ. Both active storage and long-term storage contribute to storage costs.

2. BQ query cost — the query processing cost, based on the bytes scanned while running queries. There are two pricing models available — on-demand and flat-rate.

3. BQ I/O cost — data ingestion costs in the case of streaming inserts/real-time data loads.

BQ cost can be estimated using the Google pricing calculator — https://cloud.google.com/products/calculator

Now, let’s see what inputs are required to calculate or estimate BQ cost. We need to know the inputs below before we can calculate BQ pricing:

1. What data volume is stored in BQ — both long-term storage and short-term (active) storage. Tables which are created and changed during processing count as active storage. Any data which has not been modified for more than 90 days counts as long-term storage. Storage pricing is different for active and long-term storage.

2. What is the query volume and how much data do queries process — data scanned is one of the critical components of BQ pricing. On-demand pricing is based on the volume of data scanned by queries. We need to know the queries, the types of queries, the data operations performed through SQL, and the data volume expected for data processing/data analytics/data science use cases, etc.

3. What types of data loads are used? — We need to know the different types of data loads happening in the application that will continue to load data into BQ. Batch (bulk) loads are free in BQ, while streaming inserts/real-time data loads are chargeable.

4. How many users are there? How often will data be accessed? What types of queries/processing/BI/analytics would be run by different teams? — We need to know the consumer side of the application, where data will be exported, accessed, or retrieved from BQ. This is required to calculate the query cost for the application. We can use a query dry run to estimate the approximate data scanned by queries, as in the sketch below.
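As a minimal sketch of that dry-run approach, the snippet below uses the google-cloud-bigquery Python client to ask BQ how many bytes a query would scan without actually running it (the project, table, and query are hypothetical placeholders):

```python
# Minimal sketch: estimate bytes scanned by a query using a dry run,
# which incurs no query cost.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default GCP project/credentials

sql = """
SELECT order_id, amount
FROM `my_project.sales.orders`          -- hypothetical table
WHERE order_date >= DATE '2022-01-01'
"""
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(sql, job_config=job_config)

tb_scanned = job.total_bytes_processed / 1024**4
print(f"This query would scan ~{tb_scanned:.4f} TB")
```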

Sample pricing — let’s take a sample use case and see how we can use the Google pricing calculator to calculate cost.

1 TB of query scan is free every month. We can quickly verify that using the pricing calculator.

There are two tabs on the pricing calculator — on-demand and flat-rate. We can compare the pricing of the same storage and query volumes under both models.

Flat-rate pricing works on the number of slots. Once a customer analyses the storage, queries, and volume processed/scanned by their loads and applications, they can choose to go with flat rate if the scanned volume is high enough to make on-demand pricing expensive. Under on-demand pricing, the first 1 TB scanned per month is free, and every additional TB costs $5. For example, if a report query scans 5 TB of data, 1 TB goes into the free bucket and the remaining 4 TB are billed at 4 * $5 = $20.
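To make that arithmetic concrete, here is a tiny sketch of the on-demand formula used in the example above; the $5/TB rate and the 1 TB monthly free tier are the figures quoted in this post, so check the pricing calculator for your region:

```python
# Tiny sketch of the on-demand query cost formula used in the example above.
def on_demand_query_cost(tb_scanned: float, free_tb: float = 1.0, usd_per_tb: float = 5.0) -> float:
    # Only the bytes beyond the monthly free tier are billed.
    return max(0.0, tb_scanned - free_tb) * usd_per_tb

print(on_demand_query_cost(5.0))  # 5 TB report query -> 4 billable TB -> 20.0 USD
```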

Sample flat-rate pricing for 20 GB of storage and 2,000 slots is shown below.

Let’s take sample pricing for 100 GB of storage and no streaming inserts; it will look something like this:

We can add the storage and query cost for on-demand and get the total cost per month. Refer to the snapshot below for the sample use case.

Now we know how to calculate GCP cost for BQ usage. Let’s look at some recommendations for BQ cost optimization.

1. Keep required data only as long as it’s needed

a. Set up table expirations — we can set up table as well as dataset expirations to avoid storage cost beyond the useful life of the data. For example, this can be set up for temporary/staging tables, which will then be deleted/dropped automatically. A minimal sketch follows.
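The sketch below shows one way to set a default table expiration on a staging dataset and an explicit expiration on a single temporary table with the Python client; the project, dataset, and table names are hypothetical:

```python
# Minimal sketch: set dataset-level and table-level expirations so temporary
# data stops accruing storage cost automatically.
import datetime
from google.cloud import bigquery

client = bigquery.Client()

# Default expiration for all new tables in a staging dataset: 7 days.
dataset = client.get_dataset("my_project.staging")              # hypothetical dataset
dataset.default_table_expiration_ms = 7 * 24 * 60 * 60 * 1000
client.update_dataset(dataset, ["default_table_expiration_ms"])

# Expire one temporary table 24 hours from now.
table = client.get_table("my_project.staging.tmp_orders")       # hypothetical table
table.expires = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=24)
client.update_table(table, ["expires"])
```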

2. Be cautious of how often data is edited

a. Tables can be partitioned in BQ. If a table or partition is not modified for 90 consecutive days, its storage is converted from active storage to long-term storage, which is billed at a lower rate. Consider a use case where SCD (slowly changing dimension) logic is implemented but updates are expected only for a certain period. We can partition the table accordingly so that older partitions are converted to long-term storage, since no updates happen on them beyond 90 days. A sketch of creating such a partitioned table follows.
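As an illustration, the sketch below creates a date-partitioned table with the Python client so that only recent partitions keep getting rewritten while untouched partitions age into long-term storage; all names and the schema are hypothetical:

```python
# Minimal sketch: create a date-partitioned table so old, unmodified partitions
# qualify for cheaper long-term storage after 90 days.
from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "my_project.sales.orders",                     # hypothetical table
    schema=[
        bigquery.SchemaField("order_id", "STRING"),
        bigquery.SchemaField("amount", "NUMERIC"),
        bigquery.SchemaField("order_date", "DATE"),
    ],
)
# Partition by the date column; only recent partitions will be updated by SCD loads.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="order_date",
)
client.create_table(table)
```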

3. Avoid duplicate copies of data

a. BQ supports external tables — we can query data stored on other GCP services such as GCS, Bigtable, Cloud SQL, and Google Drive. It is recommended to leave data on those services rather than copying it into BQ if it does not need to be accessed frequently. A sketch of defining an external table follows.
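The sketch below defines an external table over Parquet files in GCS instead of keeping a second copy inside BQ storage; the bucket path and table name are hypothetical:

```python
# Minimal sketch: define an external table over files in GCS, avoiding a
# duplicate copy in BQ storage (only query cost applies when reading it).
from google.cloud import bigquery

client = bigquery.Client()

external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://my-bucket/exports/orders/*.parquet"]  # hypothetical bucket

table = bigquery.Table("my_project.sales.orders_external")  # hypothetical table
table.external_data_configuration = external_config
client.create_table(table)
```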

4. Double-check whether streaming inserts are really needed

a. BQ charges for streaming inserts. Check whether they are really needed; if the customer or application does not need data to be available in real time, it is recommended to go with near-real-time data loads instead. Convert real-time data loads to near real time by grouping them into micro-batches. Load data every 3–5 minutes and it will still be available in near real time. The sketch below contrasts the two approaches.
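As a rough illustration, the sketch below contrasts a chargeable streaming insert with a free batch load job from GCS, which could run as a micro-batch every few minutes; table and bucket names are hypothetical:

```python
# Minimal sketch: streaming insert (billed per GB) vs. batch load job (free).
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my_project.sales.orders"  # hypothetical table

# Streaming insert: rows are available immediately but incur streaming cost.
errors = client.insert_rows_json(table_id, [{"order_id": "o-1", "amount": 10}])
assert not errors

# Batch (micro-batch) load job: no load cost; schedule it every 3-5 minutes instead.
load_job = client.load_table_from_uri(
    "gs://my-bucket/incoming/orders-*.json",       # hypothetical bucket
    table_id,
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    ),
)
load_job.result()  # wait for the load to finish
```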

5. BQ snapshots — DR and backup process

a. BQ maintains a history of table data for seven days by default (time travel). We can query the table as of any point within that seven-day window. It is recommended to avoid creating multiple copies of tables as part of a backup process unless it is really required. A sketch of a time-travel query follows.
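The sketch below reads a table as it was two days ago using time travel, instead of relying on extra backup copies; the table name is hypothetical:

```python
# Minimal sketch: query a table as of an earlier point in time via time travel.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT order_id, amount
FROM `my_project.sales.orders`              -- hypothetical table
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)
"""
rows = client.query(sql).result()
print(f"Rows as of two days ago: {rows.total_rows}")
```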

6. Be mindful of queries run on BQ

a. BQ charges for every column selected in a query. Avoid running SELECT *; it is recommended to select only the columns you need.

b. Partition the data stored and loaded in BQ.

c. Filter on partitions and restrict the partitions scanned while running queries against BQ data, as in the sketch below.
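Putting those points together, the sketch below selects only the needed columns, prunes partitions with a date filter, and caps the bytes a job may bill; the table, columns, and limits are hypothetical:

```python
# Minimal sketch: select only needed columns, restrict the scan to recent
# partitions, and cap the maximum bytes the job is allowed to bill.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT order_id, amount                                        -- no SELECT *
FROM `my_project.sales.orders`                                 -- hypothetical partitioned table
WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)   -- prune partitions
"""
job_config = bigquery.QueryJobConfig(maximum_bytes_billed=10 * 1024**3)  # 10 GiB cap
result = client.query(sql, job_config=job_config).result()
```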

7. Understand on-demand vs flat-rate pricing

a. Selecting the right pricing model is one of the most important aspects of BQ costing. Compare your expected monthly on-demand bill against the cost of a flat-rate slot commitment; a rough break-even sketch is below.
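As a rough back-of-the-envelope sketch, the snippet below estimates the monthly scan volume at which flat rate starts to win; the $5/TB on-demand rate comes from this post, while the monthly flat-rate figure is only an assumption for illustration, so check the pricing calculator for current regional rates:

```python
# Rough break-even sketch between on-demand and flat-rate pricing.
ON_DEMAND_USD_PER_TB = 5.0                 # rate quoted earlier in this post
ASSUMED_FLAT_RATE_USD_PER_MONTH = 2000.0   # assumed minimal slot commitment (illustrative only)

break_even_tb = ASSUMED_FLAT_RATE_USD_PER_MONTH / ON_DEMAND_USD_PER_TB
print(f"Flat rate starts paying off above ~{break_even_tb:.0f} TB scanned per month")
```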

This is the penultimate story in the BigQuery blog series. Read my next blog on BigQuery Challenges. If you are looking for any more topics, please feel free to drop me a message/note.

About Me:

I am a DWBI and Cloud Architect! I have been working with various legacy data warehouses, big data implementations, and cloud platforms/migrations. I am a Google Certified Professional Cloud Architect. You can reach out to me on LinkedIn if you need any further help with certification or GCP implementations!
