Vol #18 | What is Databricks SQL?
Looking for a simple tool for analysing data directly within the lakehouse? Try Databricks SQL!
A few weeks back, I wrote a post explaining “What is Databricks”. In today’s post, I’ll introduce you to another tool within the Databricks ecosystem - “Databricks SQL”.
Let’s get started!
To implement a lakehouse, you need a compute engine that can understand SQL queries for data analysis. Databricks SQL provides this compute capability, letting you execute SQL queries & run BI reports to analyse data residing within the lakehouse.
Databricks SQL is the compute engine that can be used to query data present in cloud object storage like AWS S3, Azure ADLS, or Google Cloud Storage.
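To give you a feel for it, here is a minimal sketch of a query you could run from the Databricks SQL editor. The catalog, schema & table names (main.retail.orders) are just assumptions for illustration.

```sql
-- A simple aggregation over a (hypothetical) orders table in the lakehouse
SELECT region,
       SUM(order_amount) AS total_sales
FROM   main.retail.orders
GROUP  BY region
ORDER  BY total_sales DESC;
```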
Details
Let’s have a look at some of the important concepts related to Databricks SQL.
SQL Warehouse (Compute Engine)
Databricks SQL uses “SQL Warehouses” as the compute engine required to execute queries. If you have worked on Snowflake, you can compare a Databricks SQL warehouse to a Snowflake virtual warehouse.
Databricks SQL warehouses use T-shirt-style sizing & also offer a serverless option.
Lakehouse (Storage)
Data stored within the lakehouse (cloud storage + open table formats) can be analysed using Databricks SQL. You can either use the internal DBFS as storage or mount your own cloud storage.
Unlike other cloud warehouses, you don’t need to load the data into Databricks SQL first. This reduces the effort & cost required to move & maintain data in two different stores, like a lake & a warehouse.
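As a rough sketch, you can query a Delta table sitting in cloud storage directly, or register it once as an external table & query it like any other table. The S3 path & table names below are hypothetical.

```sql
-- Query a Delta table directly from its cloud storage path
SELECT * FROM delta.`s3://my-bucket/lakehouse/orders` LIMIT 10;

-- Or register it as an external table & query it like any other table
CREATE TABLE IF NOT EXISTS main.retail.orders
USING DELTA
LOCATION 's3://my-bucket/lakehouse/orders';
```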
BI & Visualisations
Databricks SQL also has an inbuilt visualisation tool that can be used to generate quick reports & dashboards with just a few clicks.
It also supports integration with other BI tools like Power BI & Tableau. Reports can be created in these BI tools, while the underlying queries are executed within Databricks SQL.
Photon Engine
Photon is the new query engine that helps accelerate queries on Delta Lake. You can enable it within Databricks SQL for better performance.
Data Explorer
You also get a quick data explorer that can be used to browse the metadata of various tables & views. You can easily view each object's metadata details, its owner & the permissions granted to other users.
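If you prefer SQL over clicking around, the same metadata can also be pulled up with standard commands; the table name below is the same hypothetical one used earlier.

```sql
SHOW TABLES IN main.retail;                  -- list the tables in a schema
DESCRIBE TABLE EXTENDED main.retail.orders;  -- columns, owner & other details
SHOW GRANTS ON TABLE main.retail.orders;     -- permissions granted on the table
```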
Access Control
The Databricks SQL query editor can be used to provide fine-grained access to tables & views. The standard “Grant/Revoke” statements can provide access to specific objects for specific users or groups. You can also create views that mask data “dynamically” based on the user accessing it.
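Here is a minimal sketch of both patterns. The group names (data_analysts, pii_admins) & the table are assumptions for illustration, while is_member() is a built-in Databricks function.

```sql
-- Standard Grant/Revoke on a table
GRANT SELECT ON TABLE main.retail.orders TO `data_analysts`;
REVOKE SELECT ON TABLE main.retail.orders FROM `data_analysts`;

-- A dynamic view that masks an email column unless the user
-- belongs to a privileged group
CREATE OR REPLACE VIEW main.retail.orders_masked AS
SELECT order_id,
       CASE WHEN is_member('pii_admins') THEN customer_email
            ELSE '***MASKED***'
       END AS customer_email,
       order_amount
FROM   main.retail.orders;
```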
Summary
Databricks SQL provides an easy option for analysing data present in the lakehouse. It is better suited for data analysts or non-data engineers who want to analyse data without writing any complex code in Python or Scala.
If you want to explore Databricks SQL further, you can look at the resources below.
Note - Databricks SQL is not available in the community edition :(
Don’t miss the Data Governance Community Day organised by Data QG happening on 14th Sep.
Listen to this podcast -
https://gauravthalpati.substack.com/p/episode-2#details