Data Security and Governance with GCP BQ
Data security means protecting data wherever it is located, whether in transit or at rest, and whether it is managed by internal or external teams. It is critical to implement data security whether the data is in the cloud or in on-premises systems.
Data governance is the process of managing the availability, usability, integrity and security of data in enterprise systems, based on internal data standards and policies that also control data usage. Effective data governance ensures that data is consistent and trustworthy and is not misused. It is critical to implement regardless of whether the platform is in the cloud or on premises.
Let’s quickly touch on some of the key concepts for implementing data security and governance with GCP and BigQuery.
Architecture, hierarchy of resources, and data governance
i. Google Cloud architecture framework
Google Cloud’s Architecture Framework describes best practices, makes implementation recommendations, and goes into detail about products and services. The framework aims to help you design your Google Cloud deployment so that it best matches your business needs. Refer to https://cloud.google.com/architecture/framework
ii. Organizing BigQuery resources
Like other Google Cloud services, BigQuery resources are organized in a hierarchy, and this resource hierarchy is fundamental for data governance in your BigQuery deployment. BigQuery objects are organized into organizations, folders, projects and datasets. Refer to https://cloud.google.com/bigquery/docs/resource-hierarchy
iii. Data Governance
It is important to understand the concept of data governance and which controls you might need to secure your BigQuery resources. You can read more details in the GCP documentation.
Refer to https://cloud.google.com/bigquery/docs/data-governance
Securing resources with Identity and Access Management (IAM)
i. IAM — Identity and Access Management
a. IAM allows us to grant granular access to specific Google Cloud resources and helps prevent access to other resources. This lets us adopt the security principle of least privilege, which states that nobody should have more permissions than they actually need. This is one of the most important security principles.
b. We can set IAM policies at different levels of the resource hierarchy. Resources inherit the policies of their parent resource. The resulting policy for a resource is the union of the policy set at that resource and the policy inherited from its parent. We recommend understanding the resource hierarchy and granting users only the roles they require.
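The inheritance rule above can be sketched as a small Python model. This is a toy, not the IAM API. The resource names, principals and role grants are hypothetical, and the model ignores IAM deny policies and conditions. It shows two things:

- A grant made higher in the hierarchy flows down to a dataset.
- A grant made on a dataset does not flow back up to its project.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

# Toy allow policy: role name -> principals granted that role on one resource.
Policy = Dict[str, Set[str]]


@dataclass
class Resource:
    name: str
    parent: Optional["Resource"] = None
    policy: Policy = field(default_factory=dict)


def effective_policy(resource: Resource) -> Policy:
    """Union of the resource's own policy and all policies inherited from ancestors."""
    merged: Policy = {}
    node: Optional[Resource] = resource
    while node is not None:
        for role, members in node.policy.items():
            merged.setdefault(role, set()).update(members)
        node = node.parent  # walk up: dataset -> project -> folder -> organization
    return merged


def has_role(resource: Resource, member: str, role: str) -> bool:
    return member in effective_policy(resource).get(role, set())


# Hypothetical hierarchy: organization -> folder -> project -> dataset.
org = Resource("organizations/123",
               policy={"roles/bigquery.metadataViewer": {"group:auditors@example.com"}})
folder = Resource("folders/analytics", parent=org)
project = Resource("projects/sales-prod", parent=folder,
                   policy={"roles/bigquery.jobUser": {"user:alice@example.com"}})
dataset = Resource("sales-prod.orders", parent=project,
                   policy={"roles/bigquery.dataViewer": {"user:alice@example.com"}})

# Inherited from the organization, so it applies on the dataset.
print(has_role(dataset, "group:auditors@example.com", "roles/bigquery.metadataViewer"))  # True
# Granted only on the dataset, so it does not apply on the parent project.
print(has_role(project, "user:alice@example.com", "roles/bigquery.dataViewer"))  # False
```

Because the result is a union, a lower-level policy can add access but cannot take away access granted higher up. This is why broad grants at the organization or folder level deserve extra scrutiny.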
ii. Access control roles and permissions
a. Predefined IAM roles and permissions in BigQuery
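In IAM, we grant permissions by granting roles to a user, a group, or a service account. Refer to https://cloud.google.com/iam/docs/overview
The BigQuery documentation describes the predefined IAM roles that you can grant to identities to access BigQuery resources, such as BigQuery Data Viewer (roles/bigquery.dataViewer) and BigQuery Job User (roles/bigquery.jobUser).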
b. Basic roles and permissions in BigQuery
BigQuery’s dataset-level basic roles existed before IAM was introduced. We recommend that you minimize the use of basic roles and use IAM roles instead. Refer to https://cloud.google.com/bigquery/docs/access-control for more details.
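Here is a minimal sketch for finding leftover basic roles. It assumes dataset access entries shaped like the `access` list in a dataset's JSON representation, with keys such as `role`, `userByEmail` and `groupByEmail`. It flags entries that still use a dataset-level basic role and suggests the predefined IAM role each one maps to:

- READER maps to BigQuery Data Viewer.
- WRITER maps to BigQuery Data Editor.
- OWNER maps to BigQuery Data Owner.

The sample entries are hypothetical.

```python
from typing import Dict, List

# Dataset-level basic roles and the predefined IAM roles they correspond to.
BASIC_TO_PREDEFINED: Dict[str, str] = {
    "READER": "roles/bigquery.dataViewer",
    "WRITER": "roles/bigquery.dataEditor",
    "OWNER": "roles/bigquery.dataOwner",
}


def find_basic_role_entries(access: List[Dict[str, str]]) -> List[str]:
    """Report access entries that still use a basic role, with the suggested replacement."""
    findings = []
    for entry in access:
        role = entry.get("role", "")
        if role in BASIC_TO_PREDEFINED:
            # The principal sits under whichever key is not "role" (userByEmail, groupByEmail, etc.).
            principal = next((f"{k}={v}" for k, v in entry.items() if k != "role"), "<unknown>")
            findings.append(f"{principal}: {role} -> {BASIC_TO_PREDEFINED[role]}")
    return findings


# Hypothetical access list for one dataset.
access = [
    {"role": "OWNER", "userByEmail": "admin@example.com"},
    {"role": "READER", "groupByEmail": "analysts@example.com"},
    {"role": "roles/bigquery.dataViewer", "userByEmail": "bob@example.com"},
]
for line in find_basic_role_entries(access):
    print(line)
```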
iii. Access control by resource level
a. Controlling access to datasets
Dataset-level permissions determine the users, groups, and service accounts allowed to access the tables, views, and table data in a specific dataset.
b. Introduction to controlling access to tables and views
BigQuery Table ACL lets you set table-level permissions on resources like tables and views. Table-level permissions determine the users, groups, and service accounts that can access a table or view.
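Both dataset-level and table-level grants can also be expressed with BigQuery's SQL `GRANT` statement, where `SCHEMA` means a dataset. The helper below only builds the statement text and does not execute anything. The project, dataset, table and principal names are hypothetical. You would run the resulting statement in the BigQuery console or with `bq query --use_legacy_sql=false`.

```python
from typing import Sequence

# Resource types accepted by BigQuery's GRANT statement; SCHEMA means a dataset.
RESOURCE_TYPES = {"SCHEMA", "TABLE", "VIEW", "EXTERNAL TABLE"}
PRINCIPAL_PREFIXES = ("user:", "group:", "serviceAccount:", "domain:")


def grant_statement(roles: Sequence[str], resource_type: str, resource: str,
                    principals: Sequence[str]) -> str:
    """Build (but do not run) a GRANT statement for a dataset, table or view."""
    if resource_type not in RESOURCE_TYPES:
        raise ValueError(f"unsupported resource type: {resource_type}")
    for principal in principals:
        if not principal.startswith(PRINCIPAL_PREFIXES):
            raise ValueError(f"principal needs a type prefix such as 'user:': {principal}")
    role_list = ", ".join(f"`{role}`" for role in roles)  # roles are backtick-quoted
    user_list = ", ".join(f'"{p}"' for p in principals)    # principals are string literals
    # Backticks let the resource name contain hyphens, as many project IDs do.
    return f"GRANT {role_list}\nON {resource_type} `{resource}`\nTO {user_list}"


# Dataset-level access for a group (hypothetical names).
print(grant_statement(["roles/bigquery.dataViewer"], "SCHEMA", "sales-prod.orders",
                      ["group:analysts@example.com"]))
# Table-level access for a service account (hypothetical names).
print(grant_statement(["roles/bigquery.dataViewer"], "TABLE", "sales-prod.orders.daily_summary",
                      ["serviceAccount:reporting@sales-prod.iam.gserviceaccount.com"]))
```

Granting to groups rather than individual users keeps these statements short. It also makes least-privilege reviews easier.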
iv. Access control by authorization
a. Creating authorized views
Giving a view access to a dataset is also known as creating an authorized view in BigQuery. An authorized view allows you to share query results with particular users and groups without giving them access to the underlying source data. Refer to https://cloud.google.com/bigquery/docs/authorized-views
b. Creating authorized UDFs
An authorized UDF is a UDF that is authorized to access a particular dataset. The UDF can query tables in the dataset, even if the user who calls the UDF does not have access to those tables. Refer to https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions#authorized_udfs
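The idea behind authorized views and UDFs can be sketched as a toy access check. This is not the BigQuery API. In BigQuery, the authorization is recorded as a `view` or `routine` entry in the source dataset's access list. All dataset, view and principal names here are hypothetical, and group membership is not modeled.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Dataset:
    name: str
    readers: Set[str] = field(default_factory=set)     # principals with direct read access
    authorized: Set[str] = field(default_factory=set)  # views/UDFs allowed to read this dataset


def can_read_directly(user: str, source: Dataset) -> bool:
    return user in source.readers


def can_query_through(user: str, obj_dataset: Dataset, obj_name: str, source: Dataset) -> bool:
    """Can `user` run the view/UDF `obj_name` (stored in obj_dataset) that reads `source`?"""
    if user not in obj_dataset.readers:
        return False  # the caller still needs access to the view or UDF itself
    # An authorized object reads the source on the caller's behalf; otherwise the
    # caller's own access to the source decides, as with an ordinary view.
    return obj_name in source.authorized or can_read_directly(user, source)


# Hypothetical setup: raw data stays private; a shared dataset exposes an aggregate view.
raw = Dataset("sales-prod.raw", readers={"group:data-eng@example.com"},
              authorized={"sales-prod.shared.regional_totals"})
shared = Dataset("sales-prod.shared", readers={"group:analysts@example.com"})

analyst = "group:analysts@example.com"
print(can_read_directly(analyst, raw))                                              # False
print(can_query_through(analyst, shared, "sales-prod.shared.regional_totals", raw))  # True
print(can_query_through(analyst, shared, "sales-prod.shared.raw_copy", raw))         # False
```

Keeping the authorized view or UDF in a separate dataset from the source data is what makes this work. Users get access to the shared dataset only, while the source dataset lists the view or UDF, not the users.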
Securing data with classification
Securing data with classification means classifying the data stored in BigQuery and applying column-level or row-level security on top of it. In a typical DWBI implementation, we apply row-level security by filtering data on a few conditions and sharing views with users, or we protect data by masking or encrypting the PII it contains. Let’s see how to implement row-level and column-level security with BigQuery:
i. Column-level security
BigQuery provides fine-grained access to sensitive columns using policy tags, or type-based classification, of data. Using BigQuery column-level security, you can create policies that check, at query time, whether a user has proper access. Refer to https://cloud.google.com/bigquery/docs/column-level-security-intro
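Here is a toy model of the query-time check. It assumes a hypothetical schema where some columns carry policy tags, shortened here to names like `pii/email`; real policy tags are full Data Catalog resource names. In BigQuery, reading a tagged column requires the Fine-Grained Reader role (roles/datacatalog.categoryFineGrainedReader) on that policy tag. The sketch simply raises an error when a query selects a column the caller cannot read.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical table schema: column name -> policy tag (None means no tag).
SCHEMA: Dict[str, Optional[str]] = {
    "order_id": None,
    "region": None,
    "customer_email": "pii/email",
    "card_number": "pii/payment",
}


class AccessDenied(Exception):
    pass


def check_column_access(columns: Iterable[str], readable_tags: Set[str]) -> List[str]:
    """Query-time check: every tagged column requested must be readable by the caller.

    readable_tags stands for the policy tags on which the caller holds Fine-Grained Reader.
    """
    requested = list(columns)
    denied = [c for c in requested
              if SCHEMA[c] is not None and SCHEMA[c] not in readable_tags]
    if denied:
        raise AccessDenied(f"Access denied to columns: {', '.join(denied)}")
    return requested


print(check_column_access(["order_id", "region"], set()))  # untagged columns: allowed
print(check_column_access(["order_id", "customer_email"], {"pii/email"}))
try:
    check_column_access(["order_id", "card_number"], {"pii/email"})
except AccessDenied as err:
    print(err)  # Access denied to columns: card_number
```

A caller without access to the tagged columns can still query the rest of the table by leaving those columns out, for example with `SELECT * EXCEPT (customer_email, card_number)`.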

ii. Row-level security
Row-level security extends the principle of least privilege by enabling fine-grained access control to a subset of data in a BigQuery table, by means of row-level access policies. Refer to https://cloud.google.com/bigquery/docs/row-level-security-intro
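Here is a minimal sketch of how row-level access policies combine. It makes these simplifying assumptions:

- Each policy carries its `FILTER USING` expression as SQL text, plus an equivalent Python predicate.
- Principals are matched by exact string, with no group expansion.
- The table, policy and principal names are hypothetical.

A principal sees the union of rows allowed by the policies granted to them. Once a table has any row access policy, a principal with no matching policy sees no rows. The `ddl` method prints the corresponding `CREATE ROW ACCESS POLICY` statement.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Row = Dict[str, str]


@dataclass
class RowAccessPolicy:
    name: str
    grantees: Set[str]
    filter_sql: str                   # the FILTER USING expression as SQL text
    predicate: Callable[[Row], bool]  # Python equivalent, used only by this toy model

    def ddl(self, table: str) -> str:
        grantees = ", ".join(f'"{g}"' for g in sorted(self.grantees))
        return (f"CREATE ROW ACCESS POLICY {self.name}\n"
                f"ON `{table}`\n"
                f"GRANT TO ({grantees})\n"
                f"FILTER USING ({self.filter_sql})")


def visible_rows(rows: List[Row], policies: List[RowAccessPolicy], principal: str) -> List[Row]:
    if not policies:
        return list(rows)  # no row-level policies: table-level access alone decides
    granted = [p for p in policies if principal in p.grantees]
    # Union of the rows allowed by each granted policy; no granted policy means no rows.
    return [r for r in rows if any(p.predicate(r) for p in granted)]


rows = [
    {"order_id": "1", "region": "US"},
    {"order_id": "2", "region": "EU"},
    {"order_id": "3", "region": "US"},
]
us_policy = RowAccessPolicy("us_sales", {"group:us-sales@example.com"},
                            'region = "US"', lambda r: r["region"] == "US")
eu_policy = RowAccessPolicy("eu_sales", {"group:eu-sales@example.com"},
                            'region = "EU"', lambda r: r["region"] == "EU")
policies = [us_policy, eu_policy]

print([r["order_id"] for r in visible_rows(rows, policies, "group:us-sales@example.com")])  # ['1', '3']
print(visible_rows(rows, policies, "user:intern@example.com"))  # []
print(us_policy.ddl("sales-prod.orders.orders"))
```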
Data Discovery with BigQuery
GCP offers a few data discovery services: Cloud Data Loss Prevention (DLP) for identifying and protecting sensitive data, and Data Catalog, which can be used to derive and maintain data lineage. These services can also be used to scan for PII and add tags to the affected BigQuery tables.
i. Data Loss Prevention
Cloud DLP is a fully managed service that lets Google Cloud customers identify and protect sensitive data at scale. Refer to https://cloud.google.com/bigquery/docs/scan-with-dlp
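To illustrate what a DLP inspection reports, here is a local stand-in. It is not the Cloud DLP API. It scans sample rows with two patterns and reports which columns contain which infoTypes:

- A regular expression for email addresses.
- A Luhn-checked pattern for card numbers.

`EMAIL_ADDRESS` and `CREDIT_CARD_NUMBER` are real built-in DLP infoType names, but Cloud DLP's detectors are far more thorough than these patterns. The rows are hypothetical, and 4111 1111 1111 1111 is a standard test card number.

```python
import re
from typing import Dict, List, Set

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 16 digits, optionally separated by spaces or dashes.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used to discard digit runs that cannot be card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_rows(rows: List[Dict[str, str]]) -> Dict[str, Set[str]]:
    """Return column name -> infoType names detected in that column."""
    findings: Dict[str, Set[str]] = {}
    for row in rows:
        for column, value in row.items():
            if EMAIL.search(value):
                findings.setdefault(column, set()).add("EMAIL_ADDRESS")
            for match in CARD.finditer(value):
                if luhn_ok(re.sub(r"\D", "", match.group())):
                    findings.setdefault(column, set()).add("CREDIT_CARD_NUMBER")
    return findings


rows = [
    {"order_id": "1001", "contact": "alice@example.com", "note": "paid with 4111 1111 1111 1111"},
    {"order_id": "1002", "contact": "bob@example.org", "note": "reference 1234567890123"},
]
print(scan_rows(rows))
```

The second note's 13-digit reference fails the Luhn check, so only the first note is reported. Checksum validation like this is one way to keep false positives down.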
ii. Data Catalog
Data Catalog interacts with Cloud Data Loss Prevention (DLP) to automatically identify sensitive data by using Cloud DLP’s auto-tagging mechanism. We can use policy tags to define access to our data, for example when using BigQuery column-level security. Refer to https://cloud.google.com/bigquery/docs/best-practices-policy-tags
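This sketch assumes a DLP scan has already reported which infoTypes appear in which columns; the findings below are hypothetical. It attaches policy tags to a BigQuery schema in its JSON form, using the `policyTags` field with a `names` list.

- The taxonomy and policy tag IDs are made up. Real names follow the pattern `projects/PROJECT/locations/LOCATION/taxonomies/TAXONOMY_ID/policyTags/TAG_ID`.
- The sketch raises an error if a column maps to more than one tag, because a BigQuery column can carry only one policy tag.

```python
import json
from typing import Dict, List, Set

# Hypothetical taxonomy: infoType -> policy tag resource name.
TAG_FOR_INFOTYPE: Dict[str, str] = {
    "EMAIL_ADDRESS": "projects/sales-prod/locations/us/taxonomies/1111/policyTags/2222",
    "CREDIT_CARD_NUMBER": "projects/sales-prod/locations/us/taxonomies/1111/policyTags/3333",
}


def tag_schema(schema: List[Dict], findings: Dict[str, Set[str]]) -> List[Dict]:
    """Return a copy of a JSON-style BigQuery schema with policy tags attached."""
    tagged = []
    for column in schema:
        column = dict(column)  # do not mutate the caller's schema
        tags = sorted({TAG_FOR_INFOTYPE[t] for t in findings.get(column["name"], set())
                       if t in TAG_FOR_INFOTYPE})
        if len(tags) > 1:
            raise ValueError(f"{column['name']}: pick one policy tag, found {tags}")
        if tags:
            column["policyTags"] = {"names": tags}
        tagged.append(column)
    return tagged


schema = [
    {"name": "order_id", "type": "STRING", "mode": "REQUIRED"},
    {"name": "contact", "type": "STRING", "mode": "NULLABLE"},
]
findings = {"contact": {"EMAIL_ADDRESS"}}  # hypothetical output of a DLP scan
print(json.dumps(tag_schema(schema, findings), indent=2))
```

You could write the resulting JSON to a file and apply it with `bq update`. This follows the approach in the column-level security documentation, though a human review of the findings before tagging is still advisable.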

Follow my upcoming blog for the steps and process to use DLP and Data Catalog with BigQuery.
Data Encryption
BigQuery is a fully managed GCP service that encrypts data both at rest and in transit. Data stored in BigQuery, as well as in Cloud Storage, is encrypted by GCP. Two options are available for managing the encryption keys:
a. Google-managed encryption
BigQuery automatically encrypts all data before it is written to disk. The data is automatically decrypted when read by an authorized user. By default, Google manages the key encryption keys used to protect your data.
b. Customer-managed encryption
We can control encryption ourselves by using customer-managed encryption keys (CMEK) for BigQuery. Instead of Google managing the key encryption keys that protect our data, we control and manage the key encryption keys in Cloud KMS.
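With CMEK, you point a dataset or table at a Cloud KMS key. The sketch below does three things:

- It validates a key resource name of the form `projects/PROJECT/locations/LOCATION/keyRings/RING/cryptoKeys/KEY`.
- It checks that the key's location matches the dataset's location. BigQuery requires this; for example, a `US` multi-region dataset needs a key in `us`.
- It builds a `CREATE SCHEMA` statement that sets `default_kms_key_name`.

All project, key ring and key names are hypothetical. The BigQuery service account must also be granted the Cloud KMS CryptoKey Encrypter/Decrypter role on the key before the statement will succeed.

```python
import re

# projects/PROJECT/locations/LOCATION/keyRings/RING/cryptoKeys/KEY
KMS_KEY = re.compile(r"projects/([^/]+)/locations/([^/]+)/keyRings/([^/]+)/cryptoKeys/([^/]+)")


def cmek_dataset_ddl(dataset: str, dataset_location: str, kms_key: str) -> str:
    """Build (but do not run) a CREATE SCHEMA statement whose tables default to the given key."""
    match = KMS_KEY.fullmatch(kms_key)
    if not match:
        raise ValueError(f"not a Cloud KMS key resource name: {kms_key}")
    key_location = match.group(2)
    # The key must live in the same location as the dataset (e.g. dataset US -> key in "us").
    if key_location.lower() != dataset_location.lower():
        raise ValueError(f"key location {key_location!r} does not match dataset location {dataset_location!r}")
    return (f"CREATE SCHEMA `{dataset}`\n"
            f'OPTIONS (location = "{dataset_location}", default_kms_key_name = "{kms_key}")')


key = "projects/security-prod/locations/us/keyRings/bq-ring/cryptoKeys/bq-key"
print(cmek_dataset_ddl("sales-prod.orders", "US", key))
```

Keeping the keys in a separate security project, as the hypothetical `security-prod` suggests, is one way to separate key administration from data access.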
Another important aspect of data security is logging and monitoring: even after we set up security using GCP’s built-in services, we want to monitor that it is implemented and working as intended. I will cover data monitoring and debugging in an upcoming blog, but you can refer to https://cloud.google.com/bigquery/docs/monitoring for more insight into BigQuery monitoring. Setting up alerts, creating metrics and building dashboards help to capture issues proactively and address them effectively.
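As a starting point for monitoring, the `INFORMATION_SCHEMA.JOBS_BY_PROJECT` view records the jobs run in a project. The helper below builds a query that summarizes, per user, how many jobs ran, how many failed and how many bytes they processed over a recent window. The project and region are hypothetical parameters, and the query text is only built, not executed.

```python
import re


def jobs_audit_sql(project: str, region: str = "us", days: int = 1) -> str:
    """Per-user job counts, failures and bytes processed over the last `days` days."""
    if not re.fullmatch(r"[a-z0-9-]+", region):
        raise ValueError(f"unexpected region: {region}")
    if days <= 0:
        raise ValueError("days must be positive")
    return f"""SELECT
  user_email,
  COUNT(*) AS jobs,
  COUNTIF(error_result IS NOT NULL) AS failed_jobs,
  SUM(total_bytes_processed) AS bytes_processed
FROM `{project}`.`region-{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {int(days)} DAY)
GROUP BY user_email
ORDER BY bytes_processed DESC"""


print(jobs_audit_sql("sales-prod", region="us", days=7))
```

A sudden jump in failed jobs or bytes processed for a single user is the kind of signal worth turning into an alert.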
About Me:
I am a DWBI and Cloud Architect! I have worked with various legacy data warehouses, big data implementations, and cloud platforms and migrations. I am a Google Certified Professional Cloud Architect. You can reach out to me on LinkedIn if you need any further help with certification or GCP implementations!