Vol #5 | What Is Data Governance?
Data Governance is an extensive topic to discuss, learn and understand.
This week's newsletter will give you a high-level overview of Data Governance and why it is essential in modern data architectures.
Data Governance is a set of best practices and processes for managing, accessing, sharing and securing data and metadata in your ecosystem. It is one of the core pillars of the data ecosystem and should always be part of your data strategy.
Data Governance includes (but is not limited to) the governance of the following crucial data functions.
Data Quality & Integrity
What it is - The process of validating whether the data is
correct
complete
unique
and up-to-date
before users consume it. This includes data validation, cleansing, formatting and standardization.
Why is it essential - It ensures that users always get correct and complete data, which helps maintain trust in the data.
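To make the four checks above concrete, here is a minimal Python sketch that validates a batch of records before it is handed to users. The "orders" dataset, the field names (`order_id`, `customer_id`, `amount`, `updated_at`), the non-negative-amount rule and the one-day freshness threshold are all hypothetical; in practice these checks usually run inside your pipeline or a dedicated data quality tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rules for an illustrative "orders" dataset.
REQUIRED_FIELDS = ("order_id", "customer_id", "amount", "updated_at")
MAX_AGE = timedelta(days=1)  # assumed freshness threshold


def validate_orders(records, now=None):
    """Return (index, problem) tuples; an empty list means every check passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    problems = []
    seen_ids = set()
    for i, rec in enumerate(records):
        # Complete: every required field is present and not None.
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) is None]
        if missing:
            problems.append((i, "missing fields: " + ", ".join(missing)))
            continue
        # Correct: an example domain rule, amounts must be non-negative numbers.
        if not isinstance(rec["amount"], (int, float)) or rec["amount"] < 0:
            problems.append((i, "invalid amount"))
        # Unique: the key must not repeat within the batch.
        if rec["order_id"] in seen_ids:
            problems.append((i, "duplicate order_id"))
        seen_ids.add(rec["order_id"])
        # Up-to-date: the record must have been refreshed within MAX_AGE.
        if now - rec["updated_at"] > MAX_AGE:
            problems.append((i, "stale record"))
    return problems


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
orders = [
    {"order_id": 1, "customer_id": "C1", "amount": 25.0, "updated_at": now - timedelta(hours=2)},
    {"order_id": 1, "customer_id": "C2", "amount": -5, "updated_at": now - timedelta(days=3)},
    {"order_id": 2, "customer_id": None, "amount": 10, "updated_at": now},
]
print(validate_orders(orders, now=now))
```

Returning a list of problems instead of raising on the first bad record lets the pipeline decide whether to reject, quarantine or cleanse each record.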
Data Security & Access Policies
What it is - Implementing suitable security and access policies while storing and accessing the data.
You should always encrypt data before storing it and implement the right level of access control so that only authorized users can access it.
Why is it essential - To ensure there is no unauthorized access to your data and that it is always stored safely and accessed only by relevant users. You don't want your Sales team to have access to your employees’ salaries!
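A minimal sketch of the access-control side, assuming a simple role-to-column policy. The roles (`hr`, `sales`), the column names and the `read_rows` helper are hypothetical, and roles without a policy are denied by default. Encryption at rest is normally provided by your storage layer or a vetted cryptography library and is not shown here.

```python
# Hypothetical role-to-column policy; real platforms enforce this in their
# access-control layer (grants, views, column masking), not in application code.
COLUMN_POLICY = {
    "hr": {"employee_id", "name", "department", "salary"},
    "sales": {"employee_id", "name", "department"},
}


def read_rows(rows, role):
    """Return only the columns `role` may see; roles without a policy are denied."""
    allowed = COLUMN_POLICY.get(role)
    if allowed is None:
        raise PermissionError(f"role {role!r} has no access to this table")
    return [{col: val for col, val in row.items() if col in allowed} for row in rows]


employees = [{"employee_id": 7, "name": "Asha", "department": "Finance", "salary": 90000}]
print(read_rows(employees, "sales"))  # the salary column is filtered out
```

Deny-by-default means a new role sees nothing until someone explicitly grants it columns, which matches the "only eligible users" principle above.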
Metadata Management
What it is - Storing and managing your metadata in a unified metadata repository, much as you manage your data in a central store.
Why is it essential - I keep saying this in all my articles and talks:
Metadata is as crucial as your data.
A sound metadata management strategy helps you discover and classify your data, and business users can leverage metadata to understand the data and perform analysis.
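As an illustration, here is a toy in-memory metadata repository in Python covering the two uses mentioned above: discovering datasets by keyword and classifying columns by tag. The `DatasetMetadata` fields, the tags such as `PII`, and the dataset names are assumptions for the example; production catalogs add persistence, search indexes, glossaries and access control.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    name: str
    owner: str
    description: str
    # Column name -> classification tag such as "PII" (tags are illustrative).
    columns: dict = field(default_factory=dict)


class MetadataCatalog:
    """A toy, in-memory central metadata repository."""

    def __init__(self):
        self._datasets = {}

    def register(self, meta):
        self._datasets[meta.name] = meta

    def search(self, keyword):
        """Discover: names of datasets whose name or description contains keyword."""
        kw = keyword.lower()
        return sorted(
            name for name, meta in self._datasets.items()
            if kw in name.lower() or kw in meta.description.lower()
        )

    def columns_tagged(self, tag):
        """Classify: (dataset, column) pairs carrying the given tag."""
        return sorted(
            (name, col)
            for name, meta in self._datasets.items()
            for col, col_tag in meta.columns.items()
            if col_tag == tag
        )


catalog = MetadataCatalog()
catalog.register(DatasetMetadata("sales.orders", "sales-eng", "Daily customer orders",
                                 {"order_id": "public", "email": "PII"}))
catalog.register(DatasetMetadata("hr.employees", "hr-data", "Employee master list",
                                 {"name": "PII", "salary": "confidential"}))
print(catalog.search("orders"))
print(catalog.columns_tagged("PII"))
```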
Master Data Management
What it is - The processes for managing master data within your ecosystem. Master data represents your business's core entities, such as customers or products, which need to be managed carefully.
Why is it important - You need to ensure that your master data has no duplicates by merging records that refer to the same customer or product.
For example, a customer who has taken a "Home Loan" and also holds a "Fixed/Term Deposit" should appear as a single, consistent customer. Master Data Management (MDM) gives you a 360-degree view of your core entities.
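A simplified sketch of the match-and-merge step, using the Home Loan and Fixed/Term Deposit example. The matching rule (same normalized email means same customer), the survivorship rule (keep the first non-empty name) and all field names are assumptions; real MDM tools typically use fuzzy matching across several attributes rather than one exact key.

```python
from collections import defaultdict


def match_key(record):
    """Assumed matching rule: the same normalized email means the same customer."""
    return record["email"].strip().lower()


def build_golden_records(records):
    """Group matching source records and merge each group into one customer view."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)

    golden = []
    for key, recs in groups.items():
        golden.append({
            "email": key,
            # Assumed survivorship rule: keep the first non-empty name.
            "name": next((r["name"] for r in recs if r.get("name")), None),
            # Union of the products held across all source systems.
            "products": sorted({p for r in recs for p in r["products"]}),
            # Keep references back to the source records for traceability.
            "source_ids": [r["source_id"] for r in recs],
        })
    return golden


records = [
    {"source_id": "loans-101", "name": "Priya Shah",
     "email": "Priya.Shah@example.com", "products": ["Home Loan"]},
    {"source_id": "deposits-55", "name": "P. Shah",
     "email": " priya.shah@example.com", "products": ["Fixed/Term Deposit"]},
]
print(build_golden_records(records))
```

The two source records collapse into one customer holding both products, while `source_ids` keeps the link back to each originating system.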
Data Audit & Lineage
What it is - Lineage is the ability to track your data from source to target. You should be able to investigate the complete data flow across various systems.
You should also be able to track any errors, rejections or data leakages that occur as data flows through the pipeline.
Why is it essential - It helps you investigate data issues and pinpoint the exact components causing them.
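A minimal sketch of lineage tracking, assuming each pipeline step reports its source, target and row counts. The `LineageGraph` class, the dataset names and the row counts are hypothetical. Tracing upstream answers "where did this data come from?", the recorded events double as a simple audit trail, and flagging steps that emitted fewer rows than they read points you at the components to investigate.

```python
from typing import NamedTuple


class LineageEvent(NamedTuple):
    step: str
    source: str
    target: str
    rows_in: int
    rows_out: int


class LineageGraph:
    """Toy lineage store built from the steps a pipeline reports."""

    def __init__(self):
        self._upstream = {}  # target dataset -> set of direct source datasets
        self.events = []     # audit trail of every reported step

    def record_step(self, step, source, target, rows_in, rows_out):
        self._upstream.setdefault(target, set()).add(source)
        self.events.append(LineageEvent(step, source, target, rows_in, rows_out))

    def trace_upstream(self, dataset):
        """Every dataset feeding into `dataset`, directly or indirectly."""
        seen, stack = set(), [dataset]
        while stack:
            for parent in self._upstream.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def lossy_steps(self):
        """Steps that emitted fewer rows than they read (possible rejections or leakage)."""
        return [e for e in self.events if e.rows_out < e.rows_in]


g = LineageGraph()
g.record_step("ingest", "crm.csv", "raw.customers", 1000, 1000)
g.record_step("clean", "raw.customers", "stg.customers", 1000, 980)
g.record_step("join", "stg.customers", "mart.customer_360", 980, 980)
g.record_step("join", "stg.loans", "mart.customer_360", 500, 500)
print(sorted(g.trace_upstream("mart.customer_360")))
print(g.lossy_steps())
```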
Data Sharing
What it is - The ability to share your data with various stakeholders based on their role and relevance. You may have multiple data consumers with whom you need to share data, and if you are monetizing your data, you need a robust, scalable and secure data-sharing mechanism.
Why is it important - It avoids data duplication and makes data available to your data consumers quickly and securely.
Beyond these pillars, the right approach to implementing Data Governance depends on your organization's size and data strategy. Larger enterprises run a specialized Data Governance program with a dedicated Data Governance Council to ideate, implement, manage and review governance policies.
Smaller businesses can start by implementing essential data quality, security and access policies, then keep improving their governance strategy as they grow.
That's it for today. I hope you found this article helpful. Please share it, comment with your thoughts and let me know what I should write about in my upcoming newsletters.
Have a great weekend!