Snowflake — Data Cloning
Thanks for reading my blog — this is the next chapter in the Snowflake series. In earlier blogs we learnt about Snowflake, its key features, and its distinguishing Data Sharing feature. In this blog, we are going to learn about Data Cloning, another distinguishing feature of Snowflake.
With traditional Data Warehouse and Data Lake implementations, we create copies of data to share across teams and users. Data cloning, as the name says, creates copies/clones of data. In a typical implementation, copying a database physically duplicates the data; additional pipelines must be built to keep each copy consistent with the source, and every copy incurs additional storage cost.
Snowflake's Data Cloning is a feature that creates a copy of data using metadata alone. It does not incur any additional storage cost until the data in the clone gets modified.
What is a Data Clone?
A data clone is a snapshot of data: when we create a clone, it captures the current state of the source object. Any later changes to the source data are not reflected in the clone automatically. Cloned objects are independent of their source objects; any change to a cloned object incurs additional storage cost, and the clone is treated as a separate object from the source.
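As a small sketch of this independence (table and column names here are illustrative):

```sql
-- Create a source table and load one row
create table orders (id int, amount number);
insert into orders values (1, 100);

-- Clone it: the clone captures the current state (1 row)
create table orders_clone clone orders;

-- Modify the source AFTER cloning
insert into orders values (2, 200);

-- The clone still has only the original row; changes to the
-- source after cloning are not reflected in the clone
select count(*) from orders;        -- 2
select count(*) from orders_clone;  -- 1
```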
How to create Data Clones?
We can create a data clone using the CREATE … CLONE DDL statement. We can clone a database, a schema, or a table.
1. To create a clone of a database, use the CREATE statement below; it clones the database and all objects within it at their current state.
create database db_clone clone sourcedb;
2. To create a clone of a schema and all objects within it at their current state:
create schema schema_clone clone source_schema;
3. To create a clone of a table at its current state:
create table orders_clone clone orders;
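Cloning also combines with Time Travel (which we will cover in an upcoming blog): using the AT | BEFORE clause, a clone can capture a past state of the object rather than the current one. The timestamp and query ID below are purely illustrative:

```sql
-- Clone the table as it existed at a specific past timestamp
create table orders_clone_t1 clone orders
  at (timestamp => '2024-01-15 08:00:00'::timestamp_ltz);

-- Or clone the state just before a given statement (query ID) ran
create table orders_clone_q1 clone orders
  before (statement => '01a2b3c4-0000-1111-2222-333344445555');
```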
How does Data Cloning work?
Cloning creates a copy of data without using extra storage; there is no additional cost until the data gets modified. This works because clones are defined purely in metadata. We can think of a clone as a pointer to the underlying storage: when a clone is created, both the source and the clone reference the same micro-partitions, so a single copy of the physical data serves multiple objects.
Let’s see how it looks –
Consider a use case where the primary table is TABLE_A and the clone table is TABLE_A_CLONE. We run a query to create the table clone, and both tables then point to the same memory locations/micro-partitions.
Refer to the image below: both tables refer to the same storage layer and the same micro-partitions, and no additional copy of the data is created at the storage layer. TABLE_A has three partitions: P1, P2, P3. Corresponding entries S1, S2, S3 are created for TABLE_A_CLONE, and these point to the same underlying micro-partitions.
Let's assume a few records are modified in the cloned object, TABLE_A_CLONE. The moment data is modified in the cloned table, new micro-partitions are created in the storage layer, and only this new/modified data incurs additional storage cost. Whenever we run a query on TABLE_A_CLONE, the old or unmodified data is still read from the original micro-partitions shared with the primary object, while the new data is read from the newly created micro-partitions.
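One way to observe this copy-on-write behavior is Snowflake's TABLE_STORAGE_METRICS view (a sketch; it requires access to the SNOWFLAKE.ACCOUNT_USAGE share, and the view is populated with some latency):

```sql
-- ACTIVE_BYTES counts storage owned by a table. Right after cloning,
-- the clone's exclusive storage is ~0 because all of its
-- micro-partitions are shared with the source; it grows only once
-- data in the clone is modified. CLONE_GROUP_ID ties a source table
-- and its clones together.
select table_name, active_bytes, clone_group_id
from snowflake.account_usage.table_storage_metrics
where table_name in ('TABLE_A', 'TABLE_A_CLONE');
```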
What are the benefits of Data Cloning?
The following are some of the benefits/advantages of Snowflake Zero-Copy Cloning:
- Quick Environment Setup: We don't have to spend any additional time setting up a new environment. Traditionally we wait hours, days, or even weeks to create a test or development environment from a copy of a production data warehouse, and pay more for an environment large enough to hold all the replicated data. With cloning, the environment is available almost immediately.
- Quick Clone: Zero-Copy Cloning is a quick technique that lets us create many copies of data without incurring the additional storage expense associated with data replication, saving a lot of time.
- Storage Cost Saving: Zero-Copy Cloning creates a clone of an object without reproducing the underlying storage. Multiple environments can share the same storage layer, so there are no additional storage costs across environments.
- Less Administration: No additional administrative effort or cost is required, since we don't need to maintain multiple physical copies of objects and the pipelines that keep them in sync; a clone serves as a ready-made snapshot of the data.
I believe by now you know about Data Cloning, a distinguishing feature of Snowflake. We will learn more about data protection with Time Travel in detail in upcoming blogs.
About Me:
I am a DWBI and Cloud Architect, currently working as a Senior Data Architect on GCP and Snowflake. I have worked with various legacy data warehouses, Big Data implementations, and cloud platforms/migrations. I am a SnowPro Core certified Data Architect as well as a Google certified Professional Cloud Architect. You can reach out to me on LinkedIn if you need any further help with certification, data solutions, or implementations!