This is the second part of a two-part series on data governance. Read the first part, on leveraging the potential of data with good governance, here.
Sound data governance delivers organization-wide outcomes. Through cataloguing and quality management, it ensures trust in data. It reduces time to market by making data discoverable, transparent, accessible and easy to use. It also facilitates integrations that expand the partner ecosystem and enable participation in open data initiatives.
It goes without saying that data governance gives organizations the power to leverage their data efficiently, effectively and securely. But implementing an organization-wide data governance program is no easy task. Based on our experience working with a leading Danish investment bank, we outline the top things to keep in mind when implementing a data governance program.
Step 1 - Begin with an 'as-is' state analysis and stakeholder engagement
Before even getting to the drawing board, understand the vision and key drivers of your data governance program, and assess the current state before diving into any solutioning. Based on this understanding, conduct persona interviews with different data stakeholders and document their pain points and goals. Then build a feature matrix that maps stakeholder expectations of the data governance solution to features, recording the priority and criticality of each feature.
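A feature matrix does not need special tooling; structured data is enough to start. The Python sketch below is a minimal illustration under assumptions of our own: interview findings arrive as (persona, feature, priority, criticality) rows, and priority and criticality use hypothetical three-level scales. It merges requests for the same feature across personas and ranks features by criticality, then priority, then the number of personas asking for them.

```python
from dataclasses import dataclass, field

# Hypothetical three-level scales; higher numbers matter more.
PRIORITY = {"low": 1, "medium": 2, "high": 3}
CRITICALITY = {"nice-to-have": 1, "important": 2, "must-have": 3}


@dataclass
class Feature:
    name: str
    priority: str
    criticality: str
    personas: set = field(default_factory=set)  # personas who asked for this feature

    def rank_key(self):
        # Criticality first, then priority, then breadth of demand across personas.
        return (CRITICALITY[self.criticality], PRIORITY[self.priority], len(self.personas))


def build_feature_matrix(interview_rows):
    """Merge (persona, feature, priority, criticality) rows into a ranked feature list.

    When several personas request the same feature, keep the highest
    priority and criticality that any of them assigned.
    """
    matrix = {}
    for persona, name, priority, criticality in interview_rows:
        feature = matrix.setdefault(name, Feature(name, priority, criticality))
        if PRIORITY[priority] > PRIORITY[feature.priority]:
            feature.priority = priority
        if CRITICALITY[criticality] > CRITICALITY[feature.criticality]:
            feature.criticality = criticality
        feature.personas.add(persona)
    return sorted(matrix.values(), key=Feature.rank_key, reverse=True)


if __name__ == "__main__":
    rows = [  # hypothetical interview findings
        ("data consumer", "business glossary search", "high", "important"),
        ("data product owner", "data quality rules", "high", "must-have"),
        ("data steward", "lineage view", "medium", "important"),
        ("data consumer", "lineage view", "high", "important"),
    ]
    for f in build_feature_matrix(rows):
        print(f.name, f.criticality, f.priority, sorted(f.personas))
```

Keeping the matrix as data makes it easy to reuse later, for example as input when eliminating candidate tools.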
We segment our lessons and project expectations as follows:
- World-class customer experience
- Alignment to data mesh
- Total cost of ownership
- Package and sell asset management products
- Alignment of the current implementation and target data architecture
- Reduced operating costs
- Introduce subscription models to clients
- Industrialize the wholesale offering
Step 2 - Select your tools
It is essential to choose tools that are contextually appropriate for the organization. While doing so, remember:
- No one tool fits all requirements
- Commercial tools have a long list of features, but more is not always better
- For open-source tools, the community behind the tool, roadmap visibility, architecture and extensibility all matter
With that in mind, select your tools carefully:
- Analyse the current data ecosystem (source and analytical stores, data processing and pipelines, consuming applications etc.). Based on this analysis, list your non-negotiables and priorities. For instance, you might need a tool that aligns with data mesh principles or enables a data office
- Identify candidate tools from open-source and commercial offerings. Perform a first-pass elimination using a feature matrix that reflects gaps in the existing data ecosystem, stakeholders' priorities and feature expectations of the tool
- Finalize the toolkit based on secondary research, proofs of concept, workshops, demos and interactions with product vendors. Use predefined questionnaires to guide these conversations
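The first-pass elimination is mechanical enough to script. This minimal Python sketch assumes each candidate tool's capabilities have been reduced to a set of feature names, and that the feature matrix yields a set of non-negotiables plus numeric weights; the tool names, features and weights are hypothetical. It drops any tool missing a non-negotiable and ranks the rest, leaving proofs of concept and vendor conversations to settle the final choice.

```python
# Hypothetical example data: replace with your own feature matrix and candidates.
NON_NEGOTIABLES = {"data mesh alignment", "open metadata API"}
WEIGHTS = {
    "data mesh alignment": 5,
    "open metadata API": 5,
    "data quality rules": 4,
    "lineage": 3,
    "business glossary": 2,
}


def first_pass(candidates, non_negotiables=NON_NEGOTIABLES, weights=WEIGHTS):
    """Eliminate tools missing any non-negotiable; rank the rest by weighted feature coverage.

    candidates maps a tool name to the set of features it supports.
    Features absent from the weights table contribute nothing to the score.
    """
    shortlist = []
    for tool, features in candidates.items():
        if not non_negotiables <= features:  # subset test: is every non-negotiable present?
            continue
        score = sum(weights.get(feature, 0) for feature in features)
        shortlist.append((tool, score))
    return sorted(shortlist, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    candidates = {
        "tool_a": {"data mesh alignment", "open metadata API", "lineage"},
        "tool_b": {"open metadata API", "lineage", "business glossary", "data quality rules"},
        "tool_c": {"data mesh alignment", "open metadata API", "data quality rules", "business glossary"},
    }
    print(first_pass(candidates))  # tool_b is eliminated: no data mesh alignment
```

A weighted sum is only one possible scoring choice; it keeps the reasoning behind each elimination transparent to stakeholders.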
Step 3 - Implement the tools
We’d recommend the following steps:
- Install and configure tools in the existing client ecosystem
- Deploy the tools in the cloud or on-premises
- Set up DevOps pipelines
- Create a forking/syncing strategy
- Implement observability and alert mechanisms
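For observability, even a simple rule set catches two common failures of governance pipelines: jobs that silently stop running and jobs that fail repeatedly. The sketch below is a hypothetical Python illustration, not tied to any monitoring product; the job names, the 24-hour staleness window and the 20% failure-rate threshold are assumptions to adapt. In practice the returned messages would be routed to whatever alerting channel the platform already uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class JobRun:
    job: str  # e.g. a metadata ingestion or data quality job (hypothetical names)
    finished_at: datetime
    succeeded: bool


def check_health(runs, now, max_staleness=timedelta(hours=24), max_failure_rate=0.2):
    """Return alert messages for jobs that are stale or fail too often."""
    by_job = {}
    for run in runs:
        by_job.setdefault(run.job, []).append(run)

    alerts = []
    for job, job_runs in sorted(by_job.items()):
        successes = [r.finished_at for r in job_runs if r.succeeded]
        # Staleness: no successful run at all, or the latest one is too old.
        if not successes or now - max(successes) > max_staleness:
            alerts.append(f"{job}: no successful run in the last {max_staleness}")
        # Reliability: share of failed runs among the runs observed.
        failure_rate = sum(not r.succeeded for r in job_runs) / len(job_runs)
        if failure_rate > max_failure_rate:
            alerts.append(f"{job}: failure rate {failure_rate:.0%} exceeds {max_failure_rate:.0%}")
    return alerts


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, tzinfo=timezone.utc)
    runs = [
        JobRun("catalog_ingest", datetime(2024, 1, 2, 6, tzinfo=timezone.utc), True),
        JobRun("dq_checks", datetime(2023, 12, 31, 6, tzinfo=timezone.utc), True),
        JobRun("dq_checks", datetime(2024, 1, 2, 6, tzinfo=timezone.utc), False),
    ]
    for alert in check_health(runs, now):
        print(alert)
```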
While implementing the data catalogue, bring together the technical metadata, ownership, lineage and business glossary, push this metadata to the metadata repository through defined APIs/interfaces and make it available through the discovery interface.
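To make the catalogue idea concrete, the Python sketch below models one catalogue entry that combines technical metadata, ownership, lineage and glossary terms, together with an in-memory stand-in for the metadata repository. The class names, fields and the rule that every entry needs an owner are our own assumptions for illustration; a real deployment would sit behind the chosen catalogue tool's API rather than a Python dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    dataset: str  # hypothetical naming scheme: "<layer>.<name>"
    schema: dict  # technical metadata: column name -> type
    owner: str    # owning domain team
    upstream: list = field(default_factory=list)        # lineage: direct input datasets
    glossary_terms: list = field(default_factory=list)  # linked business glossary terms


class MetadataRepository:
    """In-memory stand-in for the metadata repository behind a catalogue API."""

    def __init__(self):
        self._entries = {}

    def push(self, entry):
        # The defined interface rejects entries without an owner.
        if not entry.owner:
            raise ValueError(f"{entry.dataset}: owner is required")
        self._entries[entry.dataset] = entry

    def search(self, term):
        """Discovery: case-insensitive match on dataset names, columns and glossary terms."""
        term = term.lower()
        return sorted(
            name
            for name, e in self._entries.items()
            if any(term in text.lower() for text in [name, *e.schema, *e.glossary_terms])
        )

    def lineage(self, dataset):
        """All upstream datasets, followed transitively; `seen` also guards against cycles."""
        seen, stack = set(), list(self._entries[dataset].upstream)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                if parent in self._entries:
                    stack.extend(self._entries[parent].upstream)
        return sorted(seen)


if __name__ == "__main__":
    repo = MetadataRepository()
    repo.push(CatalogEntry("raw.trades", {"trade_id": "string", "amount": "decimal"}, "markets",
                           glossary_terms=["Trade"]))
    repo.push(CatalogEntry("curated.positions", {"position_id": "string", "quantity": "decimal"},
                           "markets", upstream=["raw.trades"], glossary_terms=["Position"]))
    repo.push(CatalogEntry("reports.risk", {"var_99": "decimal"}, "risk",
                           upstream=["curated.positions"]))
    print(repo.search("trade"))          # ['raw.trades']
    print(repo.lineage("reports.risk"))  # ['curated.positions', 'raw.trades']
```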
For data quality implementation:
- Define a DSL (domain-specific language) that lets data producers create quality rules, which are pushed to the respective domain repository
- Build pipelines that provision the automated jobs, perform data quality checks and emit the results
- Collect quality results and metrics, push them to metrics stores and make them available to producers and consumers
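The DSL can start very small. The sketch below assumes a hypothetical one-rule-per-line syntax (`<column> <check> [arguments]`) with three checks, and treats the metrics store as a plain dictionary; none of this is a specific product's rule language. It walks through the three steps above: producers write rules as text, a job parses and evaluates them against a batch of rows, and pass rates are pushed to a store that producers and consumers can read.

```python
from dataclasses import dataclass

# Hypothetical check vocabulary: each check takes a value plus the rule's text arguments.
CHECKS = {
    "not_null": lambda value: value is not None,
    "between": lambda value, low, high: value is not None and float(low) <= value <= float(high),
    "in": lambda value, allowed: value in allowed.split(","),
}


@dataclass
class Rule:
    column: str
    check: str
    args: tuple


def parse_rules(text):
    """Parse one rule per line: '<column> <check> [args...]'; skip blanks and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        column, check, *args = line.split()
        if check not in CHECKS:
            raise ValueError(f"unknown check {check!r} in rule: {line}")
        rules.append(Rule(column, check, tuple(args)))
    return rules


def run_checks(dataset, rows, rules, metrics_store):
    """Evaluate every rule on every row; record the pass rate per rule in the metrics store."""
    results = {}
    for rule in rules:
        check = CHECKS[rule.check]
        passed = sum(1 for row in rows if check(row.get(rule.column), *rule.args))
        results[f"{rule.column}.{rule.check}"] = passed / len(rows) if rows else 1.0
    metrics_store.setdefault(dataset, []).append(results)  # keep a history of runs
    return results


if __name__ == "__main__":
    rules = parse_rules("""
        # payments domain rules (hypothetical)
        amount not_null
        amount between 0 1000000
        currency in EUR,DKK,USD
    """)
    rows = [
        {"amount": 100.0, "currency": "DKK"},
        {"amount": None, "currency": "EUR"},
        {"amount": 2_000_000, "currency": "GBP"},
        {"amount": 50, "currency": "USD"},
    ]
    store = {}
    print(run_checks("payments.transactions", rows, rules, store))
    # {'amount.not_null': 0.75, 'amount.between': 0.5, 'currency.in': 0.75}
```

Keeping rules as plain text lets each domain version them in its own repository, while the evaluation job and metrics format stay shared.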
Build a culture of domain data ownership. Form a data governance committee with well-defined roles and responsibilities, and make domain teams responsible for owning and managing their data.
Step 4 - Empower your stakeholders
Once implementation is complete, your stakeholders, the data consumers and data product owners, can perform their role-specific governance functions.
Such a data workbench will enable domain teams to publish transparent and trustworthy data for consumers and pave the way to onboarding new partners with increased confidence.
Data governance is essential for any organization working with big data because its implications are broader than just technology. It is cross-cutting, involving data collection and storage, security of explicit and derived personally identifiable information (PII), consent management, algorithmic design, product design, organisational incentives and more.
This presents a unique opportunity to use technology and working models to eliminate risks and bias in data-driven solutions. Good solutions require diverse multi-disciplinary teams, tools that can be used autonomously, scalable governance models and more. Data mesh is a perfect fit for this use case.
Future of data governance: Data Mesh
Zhamak Dehghani, Director of Emerging Technologies and Founder of Data Mesh, defines it as a novel concept that embraces the ubiquitous data in an organization through the convergence of distributed domain-driven architecture, self-serve platform design and product thinking applied to data. The following features set it apart:
- Data as a product (process and data)
- Data responsibility decentralized by domain
- Data producers are data owners, empowered by a self-service platform and capabilities
- Federated governance enables centralized data governance, data quality and data life cycle management
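Dehghani calls this principle federated computational governance: domains own their data products, while a small set of globally agreed policies is enforced automatically on all of them. The Python sketch below is our own illustration of that idea, not an implementation from the data mesh literature; the DataProduct fields and the three policies are hypothetical examples.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class DataProduct:
    name: str
    domain: str  # owning domain (decentralized responsibility)
    owner: str   # accountable data product owner
    pii_columns: Optional[List[str]] = None  # None means "not yet classified"
    quality_checks: List[str] = field(default_factory=list)


# Hypothetical global policies, agreed once by the federated governance group.
GLOBAL_POLICIES: Dict[str, Callable[[DataProduct], bool]] = {
    "has an owner": lambda p: bool(p.owner),
    "PII classified": lambda p: p.pii_columns is not None,
    "has quality checks": lambda p: len(p.quality_checks) > 0,
}


def audit(products):
    """Apply the same global policies to every domain's products; return violations."""
    report = {}
    for product in products:
        failed = [name for name, policy in GLOBAL_POLICIES.items() if not policy(product)]
        if failed:
            report[f"{product.domain}/{product.name}"] = failed
    return report


if __name__ == "__main__":
    products = [
        DataProduct("trades", "markets", "markets-team", pii_columns=[],
                    quality_checks=["trade_id not_null"]),
        DataProduct("clients", "onboarding", owner=""),
    ]
    print(audit(products))
    # {'onboarding/clients': ['has an owner', 'PII classified', 'has quality checks']}
```

The policies live in one place, but each domain decides how to satisfy them for its own products, which is the balance between central standards and decentralized ownership described above.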
To learn more about Data Mesh, click here.
This blog is a shorter version of a white paper from the 21st International Conference on Electronic Business (ICEB 2021).
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.