Snowflake is a cloud-native database that can be hosted on any of the three major clouds: AWS, GCP or Azure. With Snowflake, we pay only for the storage we use and the compute we actually consume during the billing cycle.
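The pay-for-what-you-use model can be sketched as simple arithmetic: one charge for storage plus one for compute, which Snowflake meters in credits. The snippet below is only an illustration, not Snowflake's actual billing logic. The function name estimate_monthly_cost is hypothetical, and both rates are made-up placeholders, since real prices depend on edition, region and cloud provider.

```python
def estimate_monthly_cost(storage_tb, credits_used, price_per_tb, price_per_credit):
    """Estimate a monthly bill as storage cost plus compute cost.

    All inputs are supplied by the caller; the rates are placeholders,
    not real Snowflake prices.
    """
    storage_cost = storage_tb * price_per_tb
    compute_cost = credits_used * price_per_credit
    return storage_cost + compute_cost


# Hypothetical example: 2 TB stored and 50 credits consumed,
# with made-up rates of 20 per TB and 3 per credit.
total = estimate_monthly_cost(storage_tb=2, credits_used=50,
                              price_per_tb=20, price_per_credit=3)
print(total)
```

If no queries run during the cycle, credits_used is zero and only the storage term remains.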
Snowflake has a three-layer architecture:
Global Services Layer
Compute Layer
Storage Layer
Snowflake stores the data in the storage layer, on the underlying blob storage of the cloud provider. On top of this, the compute layer runs virtual warehouses, which query the data and deliver the results to the user. The services layer manages all other tasks, such as authorization, metadata management, infrastructure management and caching of results.
Snowflake Data Lifecycle
Organizing Data -> Databases -> Schemas -> Tables
Storing Data -> INSERT/COPY INTO commands
Querying Data -> SELECT statements
Working with Data -> DML statements, DDL statements and cloning
Removing Data -> DELETE, DROP and TRUNCATE
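The lifecycle stages above can be sketched as a small Python mapping from each stage to example Snowflake SQL statements. This is only an illustration: the object names (demo_db, sales, orders, my_stage) are hypothetical, and the statements are just assembled into a script here, not sent to Snowflake.

```python
# Map each lifecycle stage to example Snowflake SQL statements.
# All object names (demo_db, sales, orders, my_stage) are hypothetical.
LIFECYCLE = {
    "organizing": [
        "CREATE DATABASE demo_db;",
        "CREATE SCHEMA demo_db.sales;",
        "CREATE TABLE demo_db.sales.orders (id INTEGER, amount NUMBER(10, 2));",
    ],
    "storing": [
        "INSERT INTO demo_db.sales.orders (id, amount) VALUES (1, 9.99);",
        "COPY INTO demo_db.sales.orders FROM @my_stage FILE_FORMAT = (TYPE = 'CSV');",
    ],
    "querying": [
        "SELECT id, amount FROM demo_db.sales.orders WHERE amount > 5;",
    ],
    "working": [
        "UPDATE demo_db.sales.orders SET amount = 19.99 WHERE id = 1;",
        "ALTER TABLE demo_db.sales.orders ADD COLUMN note VARCHAR;",
        "CREATE TABLE demo_db.sales.orders_clone CLONE demo_db.sales.orders;",
    ],
    "removing": [
        "DELETE FROM demo_db.sales.orders WHERE id = 1;",
        "TRUNCATE TABLE demo_db.sales.orders;",
        "DROP TABLE demo_db.sales.orders;",
    ],
}


def script():
    """Join all statements, in lifecycle order, into one SQL script."""
    return "\n".join(stmt for stage in LIFECYCLE.values() for stmt in stage)


print(script())
```

DELETE removes selected rows, TRUNCATE removes all rows but keeps the table, and DROP removes the table itself.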
Snowflake Tables
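Permanent - The default table type. These tables exist until they are dropped, allow Time Travel of up to 90 days (Enterprise Edition and above; at most 1 day on Standard Edition) and have a Fail-safe period of 7 days.
Temporary - These tables exist only for the duration of the session. They have Time Travel of at most 1 day and no Fail-safe period.
Transient - These tables exist until they are dropped, but Time Travel is limited to at most 1 day and there is no Fail-safe period.
External - The data for these tables lives outside Snowflake, in external cloud storage, and the tables are read-only. They do not have Time Travel or Fail-safe.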
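The retention limits of the table types above can be captured in a small lookup that also builds the matching CREATE TABLE statement. This is a minimal sketch: TableKind, KINDS and create_table_ddl are hypothetical names, external tables are left out because they need a storage location, and the statement is only produced as a string. DATA_RETENTION_TIME_IN_DAYS is the Snowflake parameter that sets the Time Travel period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableKind:
    keyword: str               # word placed between CREATE and TABLE
    max_time_travel_days: int  # upper bound for Time Travel retention
    fail_safe_days: int        # Fail-safe period after Time Travel ends


KINDS = {
    "permanent": TableKind("", 90, 7),  # 90 days needs Enterprise Edition or higher
    "temporary": TableKind("TEMPORARY", 1, 0),
    "transient": TableKind("TRANSIENT", 1, 0),
}


def create_table_ddl(kind, name, columns, retention_days):
    """Build a CREATE TABLE statement, rejecting retention beyond the kind's limit."""
    spec = KINDS[kind]
    if not 0 <= retention_days <= spec.max_time_travel_days:
        raise ValueError(
            f"{kind} tables allow 0 to {spec.max_time_travel_days} days of Time Travel"
        )
    create = " ".join(part for part in ("CREATE", spec.keyword, "TABLE") if part)
    return f"{create} {name} ({columns}) DATA_RETENTION_TIME_IN_DAYS = {retention_days};"


# Hypothetical table name and column list, for illustration only.
ddl = create_table_ddl("transient", "staging_orders", "id INTEGER", 1)
print(ddl)
```

Asking for 2 days of retention on a transient table raises a ValueError, which mirrors the 1-day limit described above.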
There are many other objects in Snowflake, such as Organization, Account, Database, Schema, View, Function, Stored Procedure, Sequence, Task and Stream.
Snowflake can be integrated with a variety of tools from the data world:
Business Intelligence (Tableau, Power BI, QlikView, ThoughtSpot)
Data Integration (dbt Labs, Informatica, Pentaho, Fivetran)
Security and Governance (Collibra, Datadog, HashiCorp Vault)
Machine Learning and Data Science (DataRobot, Dataiku, Amazon SageMaker, Zepl)
SQL Development and Management (SqlDBM, SeekWell, Hackolade, Agile Data Engine)
Snowpark
It is a developer library whose API lets you query and process Snowflake data from code, outside the Snowflake interface. It supports Python, Java and Scala.
It provides DataFrame methods such as .select() and .join() to work with the data.
It executes the code lazily, i.e. only when a result is actually requested.
It uses push-down computation: the data is not moved out of Snowflake; instead, the code is pushed down to Snowflake and runs there.
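Lazy execution and push-down can be illustrated with a toy model. The class below is not Snowpark and uses none of its API: LazyFrame is a hypothetical name, and Python's built-in sqlite3 stands in for the Snowflake engine. Each method call only extends a query plan, and the work happens inside the engine when collect() is called.

```python
import sqlite3


class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (not the Snowpark API)."""

    def __init__(self, conn, table, columns="*", conditions=()):
        self.conn = conn
        self.table = table
        self.columns = columns
        self.conditions = tuple(conditions)

    def select(self, *columns):
        # Returns a new plan; nothing is executed yet.
        return LazyFrame(self.conn, self.table, ", ".join(columns), self.conditions)

    def filter(self, condition):
        # Also just extends the plan with one more WHERE condition.
        return LazyFrame(self.conn, self.table, self.columns,
                         self.conditions + (condition,))

    def to_sql(self):
        sql = f"SELECT {self.columns} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(self.conditions)
        return sql

    def collect(self):
        # Only here is the query sent to the engine, which does the filtering
        # itself and returns just the matching rows.
        return self.conn.execute(self.to_sql()).fetchall()


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 50.0), (2, 150.0), (3, 300.0)])

plan = LazyFrame(conn, "orders").filter("amount > 100").select("id", "amount")
print(plan.to_sql())  # the plan as SQL; no query has run yet
rows = plan.collect()  # the query runs now, inside the engine
print(rows)
```

The client never loads the full orders table; it only describes the query, which is the idea behind pushing code down to where the data lives.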
There are many other aspects to Snowflake; this blog gives just an overview.
Thanks for Reading!!