Vol #14 | What is Medallion Architecture?
and why it matters when building data lakes & lakehouses.
If you work in a data ecosystem built on Databricks, you have probably come across the term “Medallion Architecture”.
In this post, we will see what it means & how it can benefit you.
Medallion Architecture is a design pattern for organising the data in a data lake or lakehouse into logical layers. The layers are separated by data granularity, data quality & the level of access each data persona needs.
What is Medallion Architecture?
Medallion Architecture organises your data storage into three layers:
Bronze
Silver
Gold
If you have previously worked on a Hadoop project or implemented a data lake, you can relate these to the familiar data lake layers: Raw, Cleansed, and Curated.
Bronze (Raw) Layer:
This is the very first layer, where you store all your data “as is” in its rawest format. You can use this data later for auditing or for tracing back what was received from the source systems.
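To make the “as is” idea concrete, here is a minimal sketch in plain Python (not Databricks code). The function name ingest_to_bronze, the orders_api source name and the sample records are hypothetical and only illustrate the point: the payload is stored untouched, even when it is malformed, and only ingestion metadata is added around it.

```python
from datetime import datetime, timezone


def ingest_to_bronze(raw_lines, source_name):
    """Wrap each raw line with ingestion metadata; the payload itself is not modified."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [
        {"raw": line, "source": source_name, "ingested_at": ingested_at}
        for line in raw_lines
    ]


# Hypothetical input: one well-formed record and one malformed record.
raw_lines = ['{"order_id": 1, "amount": "25.50"}', "not valid json"]
bronze = ingest_to_bronze(raw_lines, "orders_api")

# Nothing is parsed, fixed or dropped at this stage.
print([row["raw"] for row in bronze] == raw_lines)
```

Keeping the malformed record is deliberate in this sketch: it is what lets you audit later exactly what the source system sent.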
Silver (Cleansed) Layer:
Data moves from the Bronze layer to the Silver layer after it has been validated & cleaned. The Silver layer holds good-quality data for data scientists, analysts & other users who want to explore it.
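The validate-and-clean step could look something like the sketch below, again in plain Python. The rules used here (a required order_id, a numeric non-negative amount, no duplicate order_id) and the field names are assumptions chosen for illustration; real rules depend on your data.

```python
def clean_to_silver(bronze_rows):
    """Split rows into cleaned Silver rows and rejected rows (illustrative rules only)."""
    silver, rejected = [], []
    seen_order_ids = set()
    for row in bronze_rows:
        order_id = row.get("order_id")
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            rejected.append(row)  # amount missing or not numeric
            continue
        if order_id is None or amount < 0 or order_id in seen_order_ids:
            rejected.append(row)  # missing key, negative amount or duplicate
            continue
        seen_order_ids.add(order_id)
        silver.append(
            {
                "order_id": order_id,
                "region": (row.get("region") or "").strip().upper(),
                "amount": amount,
            }
        )
    return silver, rejected


# Hypothetical parsed Bronze rows.
bronze_rows = [
    {"order_id": 1, "region": " emea ", "amount": "25.50"},
    {"order_id": 1, "region": "EMEA", "amount": "25.50"},  # duplicate
    {"order_id": 2, "region": "apac", "amount": "oops"},   # bad amount
    {"order_id": 3, "region": "apac", "amount": "10"},
]
silver, rejected = clean_to_silver(bronze_rows)
print(len(silver), len(rejected))
```

Rejected rows are returned rather than discarded, so they can still be inspected against the raw data.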
Gold (Curated) Layer:
The Gold layer is the final layer, where data is transformed, aggregated & curated so that business users can query it easily. The data is arranged by business dimensions for quick consumption.
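As a rough illustration of “arranged by business dimensions”, the sketch below aggregates cleaned order rows by region in plain Python. The function name build_gold_sales_by_region, the region dimension and the sample rows are hypothetical; in practice this would typically be a query or job in your platform of choice.

```python
from collections import defaultdict


def build_gold_sales_by_region(silver_rows):
    """Aggregate cleaned order rows into one summary row per region."""
    totals = defaultdict(lambda: {"order_count": 0, "total_amount": 0.0})
    for row in silver_rows:
        summary = totals[row["region"]]
        summary["order_count"] += 1
        summary["total_amount"] += row["amount"]
    # Sort by region so the output order is stable.
    return [{"region": region, **totals[region]} for region in sorted(totals)]


# Hypothetical cleaned (Silver) rows.
silver_rows = [
    {"order_id": 1, "region": "EMEA", "amount": 25.5},
    {"order_id": 3, "region": "APAC", "amount": 10.0},
    {"order_id": 4, "region": "EMEA", "amount": 4.5},
]
gold = build_gold_sales_by_region(silver_rows)
for summary_row in gold:
    print(summary_row)
```

A business user reading this table gets one row per region and does not need to know anything about the individual orders behind it.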
What are the benefits?
A layer-based architecture brings several benefits when building your data lakes & lakehouses:
Easier data management
Granular access control
Improved data quality in the higher layers
Easier data discovery for business users
Self-service analytics
For more details, you can refer to this blog by Databricks.
I hope you have learnt a new term today. Stay tuned for more such stuff in the cloud data world every week.