Hello to all my subscribers. I hope you all are well & had a great week.
Today I'll discuss a relatively new term in the data world: the "Modern Data Stack."
What is the Modern Data Stack?
A modern data stack is a suite of the latest products used in today's world for building a data ecosystem. Compared to the legacy tools, these are more promising in terms of efficiency, ease of use, integration, performance & cost.
The modern data stack comprises tools for data ingestion & load, processing & transformations, storage, analytics, visualization, governance, orchestration & security.
Some of the key features/properties exhibited by tools within the modern data stack are listed below:
Generally cloud-based / SaaS tools that do not need much admin & management effort
Easy to scale as data or users increase
Practical & easy to use for all personas - data engineers, data analysts, machine learning engineers, etc
Primarily based on the ELT approach instead of ETL; they help in building automated data pipelines quickly
Have inbuilt connectors to integrate with other products, tools & platforms easily
They simplify your data architecture and make it much easier to maintain & manage
They support more collaborative work & support open source technologies to a large extent
They can be easily migrated & moved across different cloud providers for multi-cloud implementations
Support "pay as you go" pricing instead of "always-on" infra/license based pricing.
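To make the ELT point from the list above a bit more concrete, here is a minimal sketch of the idea: load the raw data first, then transform it inside the database with SQL. It is only an illustration under some assumptions: Python's built-in sqlite3 stands in for a cloud data warehouse, and the table names (raw_orders, customer_revenue) and sample records are made up for this example.

```python
import sqlite3

# Hypothetical raw records, standing in for data pulled from a source system.
# Everything is kept as text, exactly as it might arrive from the source.
RAW_ORDERS = [
    ("1001", "alice", "25.50", "completed"),
    ("1002", "bob", "10.00", "cancelled"),
    ("1003", "alice", "4.50", "completed"),
]


def run_elt(conn):
    # Extract + Load: land the records untouched in a raw table.
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id TEXT, customer TEXT, amount TEXT, status TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", RAW_ORDERS)

    # Transform: done afterwards, inside the database, using plain SQL.
    # The raw table stays available for other transformations later.
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        WHERE status = 'completed'
        GROUP BY customer
        """
    )
    return conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()


# sqlite3 in-memory database used here only as a stand-in for a warehouse.
conn = sqlite3.connect(":memory:")
result = run_elt(conn)
print(result)
```

In a classic ETL pipeline, the filtering and type casting would happen in a separate tool before loading. With ELT, the raw table is loaded as-is and the transformation is just a SQL statement that runs where the data already lives.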
Modern Data Stack | Platforms and Tools
Here are some of the most popular tools from the modern data stack that many enterprises have started using. I’ve not explored all of these but have attended various webinars, watched YouTube videos or read about them in various blogs.
You can try to explore these further to understand them in detail. I’ve provided links to each of these tools for your further reading.
Data transformation - dbt (data build tool), Databricks
Data storage - Apache Iceberg, Delta Lake, along with cloud object storage like Amazon S3 and Azure Data Lake Storage (ADLS)
Data warehouse - Snowflake, BigQuery, Redshift, Azure Synapse Analytics, Databricks SQL
Orchestration - Airflow, Prefect, Astronomer
Data observability - Monte Carlo, Acceldata
This is by no means a complete list. There are many other players in the market providing tools that can easily be part of the modern data stack family.
Recently, I came across a new platform - Datacoves. They have some of these modern data stack tools bundled into a single platform!
This might be how new data ecosystems are built in the future: on a single platform with multiple tools packaged inside it.
Where can you read more about it?
If you are interested in learning more about the modern data stack, I’ve created a YouTube playlist to explain it. You can watch it at the link below.
Modern Data Stack - YouTube Playlist
There is also an upcoming webinar on the 24th of May where some of the top tech companies will share their vision for the future of the modern data stack. You can register for it here.
Thanks for taking time out to read this newsletter. If you have any suggestions/feedback, please leave a comment.
Don’t forget to share with your friends & other aspiring data engineers!
Databricks SQL is a compute engine that is based on Spark SQL.
You can use Databricks SQL to query data residing in cloud object storage such as S3, ADLS or GCS.