Snowflake interview questions for freshers & experienced 2022


Most-asked Snowflake questions

1. What is a Snowflake cloud data warehouse?

Snowflake is an analytic data warehouse delivered as Software-as-a-Service (SaaS). It is built on a new SQL database engine with a unique architecture designed for the cloud. This cloud-based data warehouse was first available on AWS, as software for loading and analyzing massive volumes of data. Its most remarkable feature is the ability to spin up any number of virtual warehouses, so users can run many independent workloads against the same data without competing for compute resources.

2. Is Snowflake an ETL tool?

Not strictly. Snowflake is a data warehouse rather than a dedicated ETL tool, but it supports loading data (and transforming it with SQL) in a three-step process:

• Extract data from the source and create data files. Supported file formats include CSV, JSON, XML, and more.

• Load the files into an internal or external stage. An internal stage is a Snowflake-managed location. An external stage references cloud storage such as an Amazon S3 bucket, a Google Cloud Storage bucket, or a Microsoft Azure Blob Storage container.

• Copy the data into a Snowflake database table using the COPY INTO command.
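The three steps can be modelled with a small Python toy. The stage, the file names, and the `copy_into` helper are hypothetical stand-ins, not the Snowflake API. The sketch mimics only one documented behaviour of COPY INTO: by default it skips files it has already loaded (unless the FORCE copy option is set), so rerunning a load does not duplicate rows.

```python
# Toy model of staged loading. Stage, table and helper names are hypothetical;
# this is not the Snowflake API.
import csv
import io

# A "stage" modelled as a mapping of file name -> file contents (CSV text).
stage = {
    "orders_1.csv": "id,amount\n1,10.5\n2,7.25\n",
    "orders_2.csv": "id,amount\n3,3.0\n",
}

def copy_into(table, stage, loaded_files):
    """Load every staged file not already loaded, mimicking COPY INTO's
    default behaviour of skipping files recorded in its load metadata."""
    newly_loaded = []
    for name, text in sorted(stage.items()):
        if name in loaded_files:
            continue  # already loaded: skip to avoid duplicate rows
        for row in csv.DictReader(io.StringIO(text)):
            table.append({"id": int(row["id"]), "amount": float(row["amount"])})
        loaded_files.add(name)
        newly_loaded.append(name)
    return newly_loaded

orders, history = [], set()
print(copy_into(orders, stage, history))  # first run loads both files
stage["orders_3.csv"] = "id,amount\n4,1.0\n"
print(copy_into(orders, stage, history))  # second run loads only the new file
print(len(orders))
```

The load history here is just a set of file names. It is enough to show why a rerun is safe without reproducing Snowflake's actual load metadata.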

3. Explain Snowflake ETL?

ETL stands for Extract, Transform, and Load. It is the process of extracting data from multiple sources, transforming it, and loading it into a particular database or data warehouse. Sources include third-party apps, databases, flat files, and so on.

Snowflake ETL means applying the ETL process to load data into a Snowflake data warehouse or database: extracting the data from the data sources, applying the necessary transformations, and loading the data into Snowflake.
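Here is a minimal Python sketch of the three ETL stages. It assumes a CSV string as the source and a plain list as the target table; both are hypothetical. In real Snowflake pipelines the transform step is often done in SQL after loading (ELT), but the separation of the three stages is the same.

```python
# Minimal ETL sketch. Source data, column names and the in-memory "warehouse"
# are hypothetical stand-ins for real sources and a Snowflake table.
import csv
import io

RAW_CSV = "customer,country,spend\n alice ,us,100\nBob,US,\ncarol,de,250\n"

def extract(text):
    """Extract: read raw records from a source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and normalise text, cast types, drop incomplete rows."""
    clean = []
    for r in rows:
        if not r["spend"]:
            continue  # skip rows missing a required value
        clean.append({
            "customer": r["customer"].strip().title(),
            "country": r["country"].strip().upper(),
            "spend": int(r["spend"]),
        })
    return clean

def load(rows, table):
    """Load: append the cleaned rows to the target table."""
    table.extend(rows)
    return len(rows)

warehouse_table = []
print(load(transform(extract(RAW_CSV)), warehouse_table))  # 2
print(warehouse_table[0])
```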

4. How is data stored in Snowflake?

Snowflake stores data in many micro-partitions, which are internally optimized and compressed. The data is stored in columnar format in Snowflake's cloud storage. These stored data objects are not directly visible or accessible to users; you access them only by running SQL queries through Snowflake.

5. How is Snowflake distinct from AWS?

Snowflake bills storage and compute separately, and its storage cost is roughly comparable to storing the data in cloud object storage such as Amazon S3. AWS addresses this with Redshift Spectrum, which lets you query data directly in S3, though not as seamlessly as Snowflake.

6. What type of database is Snowflake?

Snowflake is a SQL database built for the cloud. It is a columnar-store relational database that works well with Excel, Tableau, and many other tools. Snowflake has its own query tool and supports multi-statement transactions, role-based security, and the other features expected of a SQL database.

7. Can AWS glue connect to Snowflake?

Yes. AWS Glue provides a fully managed environment that connects easily to Snowflake as a data warehouse service. Together, the two let you handle data ingestion and transformation with more ease and flexibility.


8. Explain Snowflake editions.

Snowflake offers multiple editions depending on your usage requirements.

• Standard edition – The introductory offering, with full access to Snowflake's standard features.

• Enterprise edition – All Standard edition features and services, plus additional features required by large-scale enterprises.

• Business Critical edition – Formerly called Enterprise for Sensitive Data (ESD). It offers a higher level of data protection for organizations with sensitive data.

• Virtual Private Snowflake (VPS) – Offers the highest level of security, for organizations with the strictest requirements, such as financial institutions.

9. Define the Snowflake Cluster

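In Snowflake, data partitioning is handled through clustering. You can specify a clustering key on a table so that rows with similar key values are stored in the same micro-partitions. The process of reorganizing a table's data to maintain that clustering is called re-clustering.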
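The effect of clustering can be illustrated with a toy model. Split a column into fixed-size partitions, keep only each partition's min/max values, and count how many partitions could contain a given value. Sorting the data stands in for re-clustering here. The idea is similar in spirit to Snowflake's clustering depth (reported by SYSTEM$CLUSTERING_DEPTH), but the code is a simplified illustration, not Snowflake's algorithm; the partition size and data are made up.

```python
# Toy illustration of why clustering helps. Partition size and data are
# hypothetical; real micro-partitions are far larger.
import random

def make_partitions(values, rows_per_partition):
    """Split values into fixed-size partitions, keeping only min/max metadata."""
    parts = []
    for i in range(0, len(values), rows_per_partition):
        chunk = values[i:i + rows_per_partition]
        parts.append((min(chunk), max(chunk)))
    return parts

def overlap_depth(parts, value):
    """Number of partitions whose [min, max] range could contain `value`."""
    return sum(1 for lo, hi in parts if lo <= value <= hi)

random.seed(0)
days = [random.randint(1, 365) for _ in range(10_000)]  # unclustered load order

unclustered = make_partitions(days, 1_000)
reclustered = make_partitions(sorted(days), 1_000)  # sorting stands in for re-clustering

print(overlap_depth(unclustered, 180))  # almost every partition overlaps
print(overlap_depth(reclustered, 180))  # at most two partitions overlap
```

A lower overlap count means a filter on the clustering key has fewer partitions to read.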

10. Explain Snowflake architecture

Snowflake is a cloud data warehouse that runs on public cloud infrastructure (originally AWS, now also Microsoft Azure and Google Cloud) and is a true SaaS offering. There is no software or hardware to install and no ongoing maintenance or tuning needed to work with Snowflake.

The Snowflake architecture consists of three main layers: database storage, query processing, and cloud services.

• Database storage – When data is loaded, Snowflake reorganizes it into its internal optimized, compressed, columnar format.

• Query processing – Virtual warehouses process the queries in Snowflake.

• Cloud services – This layer coordinates and manages all activities across Snowflake, including authentication, metadata management, infrastructure management, access control, and query parsing and optimization.

11. What are the features of Snowflake? 

Unique features of the Snowflake data warehouse are listed below:

• Database and Object Cloning

• Support for XML

• External tables

• Hive metastore integration

• Supports geospatial data

• Security and data protection

• Data sharing

• Search optimization service

• Table streams on external tables and shared tables

• Result Caching

12. Why is Snowflake highly successful?

Snowflake is highly successful because of the following reasons:

• It supports a wide variety of technology areas, such as data integration, business intelligence, advanced analytics, security, and governance.

• It runs on cloud infrastructure and supports architectures suited to dynamic, fast-changing workloads.

• It offers built-in features such as data cloning, data sharing, separation of compute and storage, and easily scalable compute.

• It simplifies data processing.

• It provides scalable computing power.

• It suits various uses, such as an operational data store (ODS) for staged data, data lakes alongside a data warehouse, raw marts, and data marts with cleansed, modelled data.

13. Tell me something about Snowflake on AWS.

For managing today’s data analytics, companies rely on a data platform that offers rapid deployment, compelling performance, and on-demand scalability. Snowflake on the AWS platform serves as a SQL data warehouse, which makes modern data warehousing effective, manageable, and accessible to all data users. It enables the data-driven enterprise with secure data sharing, elasticity, and per-second pricing.

14. Describe Snowflake computing.

The Snowflake cloud data warehouse platform provides instant, secure, and governed access to an entire network of data, and a core architecture that supports many types of data workloads, including a single platform for developing modern data applications.



15. What is the schema in Snowflake?



Databases and schemas are used to organize the data stored in Snowflake. A schema is a logical grouping of database objects such as tables and views. This is different from the snowflake schema, a data modelling pattern whose benefits are structured data and lower disk usage.


16. What are the benefits of the Snowflake Schema?

• Because dimensions are normalized, it uses less disk space than a denormalized model.

• It improves data quality, since each attribute is stored in only one place.


17. Differentiate Star Schema and Snowflake Schema?


Star and snowflake schemas are similar; the difference lies in how the dimensions are modelled. In a snowflake schema, some or all dimensions are normalized into several related tables. In a star schema, each logical dimension is denormalized into a single table.
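A small Python sketch of the difference uses a hypothetical product dimension. The star version stores the category attributes on every product row. The snowflake version moves them into a separate category table, so they are stored once but must be joined back in.

```python
# Hypothetical product dimension, first denormalised (star), then normalised
# (snowflake) into product and category tables joined by category_id.

# Star schema: one wide dimension table; category attributes repeat per product.
star_dim_product = [
    {"product_id": 1, "name": "Laptop", "category": "Electronics", "dept": "Tech"},
    {"product_id": 2, "name": "Phone", "category": "Electronics", "dept": "Tech"},
    {"product_id": 3, "name": "Desk", "category": "Furniture", "dept": "Home"},
]

# Snowflake schema: category attributes moved to their own table, stored once.
dim_category = {
    10: {"category": "Electronics", "dept": "Tech"},
    20: {"category": "Furniture", "dept": "Home"},
}
dim_product = [
    {"product_id": 1, "name": "Laptop", "category_id": 10},
    {"product_id": 2, "name": "Phone", "category_id": 10},
    {"product_id": 3, "name": "Desk", "category_id": 20},
]

def product_with_category(product_id):
    """Rebuild the star-schema row by joining product to category."""
    p = next(p for p in dim_product if p["product_id"] == product_id)
    c = dim_category[p["category_id"]]
    return {"product_id": p["product_id"], "name": p["name"], **c}

# The extra join yields the same row the star schema stores directly.
print(product_with_category(2) == star_dim_product[1])  # True
```

The trade-off is visible in the data. The snowflake schema stores "Electronics/Tech" once instead of twice, at the cost of an extra join at query time.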


18. What kind of SQL does Snowflake use?

Snowflake supports ANSI SQL, the most common standardized version of SQL, for powerful relational database querying, along with its own extensions.

19. What are the cloud platforms currently supported by Snowflake?

• Amazon Web Services (AWS)

• Google Cloud Platform (GCP)

• Microsoft Azure (Azure)

20. What ETL tools do you use with Snowflake?

Popular ETL tools used with Snowflake include:

• Matillion

• Blendo

• Hevo Data

• StreamSets

• Etleap

• Apache Airflow 

21. Explain zero-copy cloning in Snowflake?

In Snowflake, zero-copy cloning lets us create a copy of a table, schema, or database without physically copying the underlying data. The clone initially shares the original's storage, and new storage is consumed only for data that changes afterwards. Zero-copy cloning uses the CLONE keyword (for example, CREATE TABLE orders_dev CLONE orders;). This lets us take a snapshot of live production data and carry out testing or other work on it without affecting production.
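Zero-copy cloning behaves like copy-on-write over immutable micro-partitions. The toy classes below are hypothetical and model only that idea. Cloning copies the list of partition references, and a later change to the clone writes a new partition instead of modifying the shared one.

```python
# Toy copy-on-write model of zero-copy cloning. The class is hypothetical;
# it only mimics the idea that a clone shares immutable partitions.

class Table:
    def __init__(self, partitions):
        # Each partition is an immutable tuple of rows; the table holds references.
        self.partitions = list(partitions)

    def clone(self):
        # Cloning copies only the list of references (metadata), not the rows.
        return Table(self.partitions)

    def update_partition(self, index, new_rows):
        # A change writes a new partition; the old one stays for other tables.
        self.partitions[index] = tuple(new_rows)

prod = Table([(("a", 1), ("b", 2)), (("c", 3),)])
dev = prod.clone()

shared_before = sum(p is q for p, q in zip(prod.partitions, dev.partitions))
dev.update_partition(1, [("c", 30)])
shared_after = sum(p is q for p, q in zip(prod.partitions, dev.partitions))

print(shared_before, shared_after)  # 2 1
print(prod.partitions[1])           # (('c', 3),)  production is unchanged
```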

22. Explain “Stage” in the Snowflake?

In Snowflake, a stage is an intermediate area to which we upload files before loading them into tables. With Snowpipe, files can be detected as soon as they arrive in a stage and loaded into Snowflake automatically.

Snowflake supports the following stages:

• Table Stage

• User Stage

• Internal Named Stage

• External Stage (Amazon S3, Google Cloud Storage, or Microsoft Azure)

23. Explain data compression in Snowflake?

All data loaded into Snowflake is compressed automatically. Snowflake uses modern compression algorithms to compress and store the data, and customers are billed for the compressed size, not the raw size.
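Snowflake does not document exactly which compression algorithm it picks for each column, so the run-length encoding below is purely illustrative. It shows why columnar storage compresses well: a single column holds many values of the same type, often repeated.

```python
# Illustrative run-length encoding (RLE) of one column. This is NOT
# Snowflake's algorithm; it only shows why columnar data compresses well.

def rle_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_decode(runs):
    """Expand [value, count] pairs back into the original list."""
    return [v for v, n in runs for _ in range(n)]

country_column = ["US"] * 4000 + ["DE"] * 3500 + ["FR"] * 2500
encoded = rle_encode(country_column)

print(len(country_column), "values ->", len(encoded), "runs")  # 10000 values -> 3 runs
assert rle_decode(encoded) == country_column  # lossless
```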

24. How do we secure the data in the Snowflake?

Data security plays a prominent role in every enterprise. Snowflake adopts best-in-class security standards for encrypting and securing customer accounts and the data stored in Snowflake; for example, all data is encrypted at rest and in transit. It also provides industry-leading key management features at no extra cost.

25. Explain Snowflake Time Travel?

Snowflake Time Travel lets us access historical data at any point within a defined retention period, including data that has since been changed or deleted. With it, we can:

• Restore data-related objects (tables, schemas, and databases) that may have been lost accidentally.

• Examine data usage and changes made to the data over a specific time period.

• Duplicate and back up data from key points in the past.
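Here is a toy model of Time Travel in Python. Every change is kept as a timestamped version. A lookup returns the latest version at or before the requested time, as long as that time is still within the retention period. The class and method names are hypothetical. In Snowflake itself, the equivalent is SQL such as SELECT ... AT(TIMESTAMP => ...) or UNDROP TABLE.

```python
# Toy versioned table illustrating Time Travel. The class, timestamps and
# retention check are hypothetical simplifications, not Snowflake internals.
from datetime import datetime, timedelta

class VersionedTable:
    def __init__(self, retention_days=1):
        self.retention = timedelta(days=retention_days)
        self.versions = []  # list of (commit_time, rows), oldest first

    def commit(self, when, rows):
        self.versions.append((when, list(rows)))

    def at(self, when, now):
        """Return the table as of `when` (loosely like AT(TIMESTAMP => ...))."""
        if now - when > self.retention:
            raise ValueError("outside the Time Travel retention period")
        current = None
        for commit_time, rows in self.versions:
            if commit_time <= when:
                current = rows  # latest version not newer than `when`
        if current is None:
            raise ValueError("table did not exist yet")
        return current

t0 = datetime(2022, 1, 1, 9, 0)
orders = VersionedTable(retention_days=1)
orders.commit(t0, ["o1", "o2"])
orders.commit(t0 + timedelta(hours=2), ["o1"])  # o2 deleted by mistake

now = t0 + timedelta(hours=3)
print(orders.at(t0 + timedelta(hours=1), now))  # ['o1', 'o2'] before the delete
```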

26. What is the database storage layer?

Whenever we load data into Snowflake, it reorganizes the data into its internal compressed, columnar, optimized format. Snowflake manages every aspect of how the data is stored, including compression, organization, statistics, file size, and other storage properties. The stored data objects are not directly visible or accessible to users; we can access them only by running SQL queries through Snowflake.


27. Explain Fail-safe in Snowflake?

Fail-safe is a Snowflake feature that protects data against loss, and it plays a vital role in Snowflake's data protection lifecycle. It provides seven days of additional protection after the Time Travel retention period ends. During Fail-safe, historical data can be recovered only by Snowflake, not directly by users.
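The two recovery windows add up, which a little date arithmetic makes concrete. This sketch assumes a permanent table with the default 1-day Time Travel retention (transient and temporary tables have no Fail-safe). The 7-day Fail-safe period is the figure given above, and the helper name is hypothetical.

```python
# Timeline arithmetic for data changed or dropped at a given time.
# retention_days is an example value; 7 days is the Fail-safe period.
from datetime import datetime, timedelta

FAIL_SAFE_DAYS = 7

def recovery_windows(changed_at, retention_days):
    """Return (time_travel_end, fail_safe_end) for data changed at `changed_at`."""
    time_travel_end = changed_at + timedelta(days=retention_days)
    fail_safe_end = time_travel_end + timedelta(days=FAIL_SAFE_DAYS)
    return time_travel_end, fail_safe_end

tt_end, fs_end = recovery_windows(datetime(2022, 3, 1), retention_days=1)
print(tt_end.date())  # 2022-03-02: end of Time Travel (self-service recovery)
print(fs_end.date())  # 2022-03-09: end of Fail-safe (recovery only via Snowflake)
```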

28. Explain Virtual warehouse?

In Snowflake, a virtual warehouse is a cluster of compute resources (or several clusters, for a multi-cluster warehouse) that lets users run queries, load data, and perform other DML operations. A virtual warehouse provides the resources, such as CPU, memory, and temporary storage, needed for these operations.
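Warehouse size determines how fast credits are consumed. The sketch below assumes the published credits-per-hour rates for standard single-cluster warehouses (X-Small = 1, doubling with each size up). It also assumes per-second billing with a 60-second minimum each time a warehouse starts or resumes. Check the current Snowflake pricing documentation before relying on these figures.

```python
# Credit arithmetic for a single-cluster virtual warehouse, assuming the
# standard credits-per-hour table (each size up doubles the rate) and
# per-second billing with a 60-second minimum per start or resume.
CREDITS_PER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4, "Large": 8,
    "X-Large": 16, "2X-Large": 32, "3X-Large": 64, "4X-Large": 128,
}

def credits_used(size, seconds_running):
    """Credits consumed by one run of the warehouse (minimum 60 s billed)."""
    billed_seconds = max(seconds_running, 60)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600

print(credits_used("Medium", 30))    # billed as 60 s: 4 * 60 / 3600
print(credits_used("Large", 1800))   # 8 credits/h for half an hour = 4.0
```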

29. Explain Data Shares

Snowflake data sharing lets organizations share their data securely and instantly. Secure data sharing shares database objects such as tables and secure views between Snowflake accounts without copying the data.

30. What are the various ways to access the Snowflake Cloud data warehouse?

We can access the Snowflake data warehouse through:

• ODBC Drivers

• JDBC Drivers

• Web User Interface

• Python Libraries

• SnowSQL Command-line Client

31. Explain Micro Partitions?



Snowflake uses a unique form of data partitioning called micro-partitioning. Data in Snowflake tables is automatically divided into micro-partitions: contiguous units of storage, each holding roughly 50 MB to 500 MB of uncompressed data, organized by column. Micro-partitioning is performed automatically on all Snowflake tables; users do not need to define or maintain the partitions.
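Snowflake keeps metadata, such as the minimum and maximum value of each column, for every micro-partition. This lets it skip partitions that cannot match a filter (pruning). The toy below is a hypothetical illustration of that idea with three tiny partitions, not Snowflake's implementation.

```python
# Toy partition pruning using per-partition min/max metadata.
# Partition contents are hypothetical.
from dataclasses import dataclass

@dataclass
class MicroPartition:
    rows: list       # the actual values (only read if the partition is scanned)
    min_value: int   # metadata kept per column per partition
    max_value: int

def build(rows):
    return MicroPartition(rows, min(rows), max(rows))

partitions = [build([1, 5, 9]), build([10, 14, 19]), build([20, 23, 29])]

def scan_between(parts, low, high):
    """Return matching values and how many partitions were actually read."""
    matches, scanned = [], 0
    for p in parts:
        if p.max_value < low or p.min_value > high:
            continue  # pruned: metadata proves no row can match
        scanned += 1
        matches.extend(v for v in p.rows if low <= v <= high)
    return matches, scanned

print(scan_between(partitions, 12, 21))  # ([14, 19, 20], 2)
```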



32. Explain Columnar database?



A columnar database differs from a conventional row-oriented database: it stores data by column instead of by row. This speeds up analytical queries, which typically read a few columns across many rows, and generally gives better performance for analytics. Columnar storage is widely used for business intelligence workloads.
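The toy Python comparison below runs both layouts on made-up data. Summing one column in a row store means reading every field of every row, while a column store reads only the column it needs. Counting the values touched is a rough stand-in for I/O.

```python
# Toy comparison of row-oriented and column-oriented layouts for the same data.
# Counting "values touched" is a simplification of the I/O a real engine does.

rows = [
    {"order_id": 1, "customer": "Alice", "region": "EU", "amount": 120},
    {"order_id": 2, "customer": "Bob", "region": "US", "amount": 80},
    {"order_id": 3, "customer": "Carol", "region": "EU", "amount": 200},
]

# Column-oriented layout: one list per column.
columns = {name: [r[name] for r in rows] for name in rows[0]}

def total_amount_row_store(rows):
    touched = sum(len(r) for r in rows)  # whole rows are read
    return sum(r["amount"] for r in rows), touched

def total_amount_column_store(columns):
    amounts = columns["amount"]          # only the needed column is read
    return sum(amounts), len(amounts)

print(total_amount_row_store(rows))        # (400, 12)
print(total_amount_column_store(columns))  # (400, 3)
```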



33. How to create a Snowflake task?



To create a Snowflake task, we use the CREATE TASK command. The role that creates the task needs:

• The CREATE TASK privilege on the schema.

• The USAGE privilege on the warehouse named in the task definition.

• The privileges required to run the SQL statement or stored procedure in the task definition.
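A small hypothetical Python helper can render the statements involved. The task, warehouse, and table names are made up. The statement shapes (CREATE TASK ... WAREHOUSE = ... SCHEDULE = ... AS ..., followed by ALTER TASK ... RESUME, because a new task is created suspended) follow Snowflake's documented syntax. The helper does no quoting or escaping, so it should not be fed untrusted input.

```python
# Hypothetical helper that renders the SQL for a scheduled task.
# No quoting/escaping is done: do not pass untrusted input.

def create_task_sql(task, warehouse, schedule, body):
    """Return the statements to create a task and then start it."""
    create = (
        f"CREATE TASK {task}\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"  SCHEDULE = '{schedule}'\n"
        "AS\n"
        f"  {body};"
    )
    # New tasks start suspended, so they must be resumed before they run.
    resume = f"ALTER TASK {task} RESUME;"
    return [create, resume]

for stmt in create_task_sql(
    "refresh_daily_sales",
    "etl_wh",
    "60 MINUTE",
    "INSERT INTO daily_sales SELECT CURRENT_DATE, SUM(amount) FROM orders",
):
    print(stmt)
```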



34. How do we create temporary tables?



To create a temporary table, which exists only for the duration of the session that created it, we use the following syntax:

CREATE TEMPORARY TABLE mytable (id NUMBER, creation_date DATE);



35. Where do we store data in Snowflake?



Snowflake automatically generates metadata for files in internal or external stages. This metadata is exposed as metadata columns (such as METADATA$FILENAME and METADATA$FILE_ROW_NUMBER) that can be queried with a standard SELECT statement.



36. Does Snowflake use Indexes?



No, Snowflake does not use traditional indexes. Instead, it relies on micro-partition metadata to skip data that a query does not need. This is one of the reasons Snowflake scales so well for queries.



38. How do we execute the Snowflake procedure?



Stored procedures let us write modular code that combines multiple SQL statements with procedural logic to implement complex business logic. A stored procedure is executed with the CALL command. Inside a JavaScript stored procedure, the typical steps are:

• Run a SQL statement

• Retrieve the query results

• Retrieve the result set metadata



39. Does Snowflake support stored procedures?



Yes, Snowflake supports stored procedures. Like a function, a stored procedure is created once and can be used many times. We create it with the CREATE PROCEDURE command and execute it with the CALL command. Snowflake stored procedures were originally written in JavaScript, using a JavaScript API that lets them execute database operations such as SELECT, UPDATE, and CREATE. Newer releases also support other languages, such as SQL (Snowflake Scripting).



40. Is Snowflake OLTP or OLAP?



Snowflake is designed as an Online Analytical Processing (OLAP) database system. Depending on the use case, it can also be used for some Online Transaction Processing (OLTP) workloads, although it is not optimized for them.



41. How is Snowflake distinct from Redshift?



Both Redshift and Snowflake offer on-demand pricing but differ in their feature packages. Snowflake prices compute separately from storage, whereas Redshift (with its traditional node types) bundles the two.



42. What is the use of the Cloud Services layer in Snowflake?



The services layer acts as the brain of Snowflake. It authenticates user sessions, enforces security functions, provides management, performs query optimization, and coordinates all transactions.



43. What is the use of the Compute layer in Snowflake?



In Snowflake, virtual warehouses, which are one or more clusters of compute resources, perform all data processing tasks. When running a query, a virtual warehouse retrieves only the minimum data needed from the storage layer to satisfy the request.



44. What is Unique about Snowflake Cloud Data Warehouse?



Snowflake is cloud native (built for the cloud), so it takes advantage of everything good about the cloud and brings exciting new features such as:

• Auto scaling

• Zero copy cloning

• Dedicated virtual warehouses

• Time travel

• Strong encryption and security

• Robust data protection features

Snowflake is also beautifully crafted, with smart defaults:

• All the data is compressed by default

• All the data is encrypted

• It is columnar, which makes column-level analytical operations much faster

There are also many innovations in the product, e.g. the intelligent services layer, data shares, and tasks and streams. Snowflake also has simple, transparent pricing, which makes a cloud data warehouse affordable even for smaller businesses.



45. What is Snowflake Architecture?



Snowflake is built on a patented, multi-cluster, shared data architecture created for the cloud. The architecture comprises storage, compute, and services layers that are logically integrated but scale independently of one another.



46. What does the Storage Layer do in Snowflake?



The storage layer stores all the diverse data, tables, and query results in Snowflake. It is built on scalable cloud blob storage (the storage system of AWS, GCP, or Azure). Because the storage layer is engineered to scale completely independently of compute resources, it offers maximum scalability, elasticity, and performance capacity for data warehousing and analytics.



47. What does the Compute Layer do in Snowflake?



All data processing tasks within Snowflake are performed by virtual warehouses, which are one or more clusters of compute resources. When performing a query, virtual warehouses retrieve the minimum data required from the storage layer to fulfill the query request.



48. What does the Cloud Services Layer do in Snowflake?



The services layer is the brain of Snowflake. The services layer for Snowflake authenticates user sessions, provides management, enforces security functions, performs query compilation and optimization, and coordinates all transactions.



49. What is a Columnar database and what are its benefits?



Columnar databases organize data at the column level instead of the conventional row level. Column-level operations are much faster and consume fewer resources than in a row-oriented relational database.



50. What is Snowflake Caching?



Snowflake caches the results of every query you run. When a new query is submitted, Snowflake checks previously executed queries. If a matching query exists, the underlying data has not changed, and the results are still cached (persisted results are kept for 24 hours), it returns the cached result set instead of executing the query again. Cached results are global and can be reused across users.
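The simplified Python model below shows result reuse. A cached result is returned only if three conditions hold: the same query text is run again, the underlying data has not changed (modelled here as a table version number), and the result is younger than 24 hours. Each reuse resets the 24-hour clock, as Snowflake does. The toy omits Snowflake's 31-day upper limit and its other reuse conditions, and all names are hypothetical.

```python
# Toy model of result caching. Keying on exact query text plus a table version
# is a simplification; Snowflake's real reuse rules have more conditions.
from datetime import datetime, timedelta

RESULT_TTL = timedelta(hours=24)  # persisted results are kept for 24 hours

class ResultCache:
    def __init__(self):
        self.entries = {}  # query text -> (table_version, result, stored_at)

    def run(self, query, table_version, now, execute):
        """Return (result, from_cache). `execute` runs the query for real."""
        hit = self.entries.get(query)
        if hit:
            version, result, stored_at = hit
            if version == table_version and now - stored_at <= RESULT_TTL:
                self.entries[query] = (version, result, now)  # reuse resets the clock
                return result, True
        result = execute(query)
        self.entries[query] = (table_version, result, now)
        return result, False

cache = ResultCache()
t = datetime(2022, 5, 1, 8, 0)
q = "SELECT COUNT(*) FROM orders"
calls = []

def execute(query):
    calls.append(query)  # record real executions
    return 42

print(cache.run(q, table_version=1, now=t, execute=execute))                       # (42, False)
print(cache.run(q, table_version=1, now=t + timedelta(hours=1), execute=execute))  # (42, True)
print(cache.run(q, table_version=2, now=t + timedelta(hours=2), execute=execute))  # (42, False): data changed
```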



51. What are the different types of caching in Snowflake?



  1. Query Results Caching
  2. Virtual Warehouse Local Disk Caching
  3. Metadata Cache


52. Name the types of caches in Snowflake.


  • Query Results Caching
  • Metadata Cache
  • Virtual Warehouse Local Disk Caching


53. What is Snowflake Time Travel?


Snowflake Time Travel enables you to access historical data at any point within a defined time period, including data that has been changed or deleted. Using it, you can:

  • Restore data-related objects (schemas, tables, and databases) that might have been lost accidentally.
  • Examine data usage and changes made to data within a time period.
  • Back up and duplicate data from key points in the past.


54. What is Fail-safe in Snowflake?


Fail-safe is an advanced Snowflake feature that ensures data protection, and it plays an important role in Snowflake's data protection lifecycle. Fail-safe provides 7 additional days, after the Time Travel retention period is over, during which historical data can still be recovered by Snowflake.



55. Why fail-safe instead of Backup?


To minimize risk, DBAs traditionally run full and incremental data backups at regular intervals. This takes up extra storage space, sometimes double or triple the size of the data. Moreover, the data recovery process is costly and slow, requires business downtime, and more.

Snowflake has a multi-datacenter, redundant architecture that minimizes the need for traditional data backups. Fail-safe in Snowflake is an efficient, cost-effective alternative to traditional data backup that reduces risk and scales along with your data.



56. What is the Data retention period in Snowflake?


The data retention period determines how long Snowflake keeps historical data available for Time Travel. The default retention period for all Snowflake accounts is 1 day (24 hours). On Enterprise Edition and higher, it can be extended up to 90 days for permanent objects.



57. Explain data shares in Snowflake?




The data shares option in Snowflake lets you securely share data objects in a database in your account with other Snowflake accounts. Shared objects are read-only for the accounts that consume them: they cannot be modified.

The following database objects can be shared in Snowflake:

  • Tables
  • Secure views
  • External tables
  • Secure UDFs
  • Secure materialized views


58. What are the data sharing types in Snowflake?


There are three types of data sharing:

  • Sharing data between functional units
  • Sharing data between management units
  • Sharing data between geographically dispersed locations


59. What do you know about zero-copy cloning in Snowflake?