Part two: the four-step framework for federated data governance

Divya Joshi, Podila Madhusudhana Rao and Sheetal Pratik

Published: October 25, 2021

This is the second part of a two-part series on data governance. Read the first part, on leveraging the potential of data with good governance, here.

Sound data governance will deliver organization-wide outcomes. With cataloguing and quality management, data governance ensures trust in data. It will reduce time to market by ensuring discoverability, transparency, accessibility and ease of use. It will also facilitate integrations to expand the partner ecosystem and participation in open data initiatives.

Data governance gives organizations the power to leverage their data efficiently, effectively and securely. But implementing an organization-wide data governance program is no easy task. Based on our experience working with a leading Danish investment bank, we outline the top things to keep in mind while implementing one.

Step 1 - Begin with an 'as-is' state analysis and stakeholder engagement

Before even getting to the drawing board, understand the vision and the key drivers of your data governance program. Assess the current state before diving into any solutioning. Based on this understanding, conduct persona interviews with different data stakeholders and document their pain points and goals. Build a feature matrix that corresponds to expectations from the data governance solution along with priority and criticality of the features.
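As an illustration, a feature matrix of this kind can be kept as plain structured data and ranked. The sketch below is a minimal example, not a prescribed method: the feature names, personas, the three-point scales and the weighting formula (priority multiplied by criticality) are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical three-point scales: higher numbers mean more important.
PRIORITY = {"low": 1, "medium": 2, "high": 3}
CRITICALITY = {"nice-to-have": 1, "important": 2, "must-have": 3}


@dataclass(frozen=True)
class Feature:
    name: str
    persona: str       # stakeholder group that raised the need
    priority: str      # key of PRIORITY
    criticality: str   # key of CRITICALITY

    def weight(self) -> int:
        # Product of the two scales; this formula is an assumption.
        return PRIORITY[self.priority] * CRITICALITY[self.criticality]


def rank_features(features):
    """Return features ordered from most to least important."""
    return sorted(features, key=lambda f: f.weight(), reverse=True)


# Illustrative entries only; real ones come from the persona interviews.
feature_matrix = [
    Feature("Data discovery", "consumer", "high", "must-have"),
    Feature("Lineage view", "consumer", "medium", "important"),
    Feature("Quality rule authoring", "data product owner", "high", "important"),
    Feature("Usage analytics", "data office", "low", "nice-to-have"),
]

ranked = rank_features(feature_matrix)
for f in ranked:
    print(f"{f.weight():>2}  {f.name} ({f.persona})")
```

Keeping the persona on each row makes it easy to trace a feature back to the interview that motivated it when priorities are challenged later.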

We segment our lessons and project expectations in the following way:

Business: World-class customer experience; package and sell asset management products; introduce subscription models to clients; industrialize the wholesale offering
Process: Federated governance; self-serve platform
Technical: Alignment to data mesh; alignment of the current implementation and target data architecture
Financial: Total cost of ownership; reduced operating costs

Step 2 - Select your tools

It is essential to choose tools that are contextually appropriate for the organization. While doing so, remember:

No one tool fits all requirements
Commercial tools have a long list of features, but more is not always merrier
For open-source tools, the community behind the tool, roadmap visibility, architecture, extensibility etc. are important

With that in mind, select your tools carefully:

Analyse the current data ecosystem (source/analytical stores, data processing and pipelines, consuming applications etc.). Based on this analysis, list your non-negotiables and priorities. For instance, you might need a tool that can align with the data mesh principle or enable a data office

Identify candidate tools from open-source and commercial offerings. Perform the first-pass elimination using a feature matrix, reflecting the gap in the existing data ecosystem, stakeholders' priorities and feature expectations from the tool

Finalize the toolkit based on secondary research, proofs-of-concept, workshops, demos and interactions with product vendors. Use predefined questionnaires to guide conversations
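The first-pass elimination described above can be sketched as a small scoring routine. This is a minimal illustration under stated assumptions: the tool names, feature names and weights are hypothetical, and the rule used here (drop any tool that misses a non-negotiable, then rank by weighted feature coverage) is one possible reading of the step rather than the method used on the project.

```python
def shortlist(tools, weights, non_negotiables):
    """First-pass elimination followed by a weighted coverage score.

    tools: mapping of tool name -> set of supported feature names
    weights: mapping of feature name -> importance weight
    non_negotiables: set of features every shortlisted tool must support
    """
    total = sum(weights.values())
    scored = []
    for name, supported in tools.items():
        if not non_negotiables <= supported:
            continue  # eliminated: misses at least one non-negotiable
        covered = sum(w for feat, w in weights.items() if feat in supported)
        scored.append((name, round(covered / total, 2)))
    # Highest coverage first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical inputs for illustration.
weights = {"catalogue": 3, "lineage": 2, "quality rules": 3, "api access": 2}
non_negotiables = {"catalogue", "api access"}
tools = {
    "tool_a": {"catalogue", "lineage", "api access"},
    "tool_b": {"catalogue", "quality rules"},
    "tool_c": {"catalogue", "lineage", "quality rules", "api access"},
}

result = shortlist(tools, weights, non_negotiables)
print(result)
```

A score like this only narrows the field; the proofs-of-concept and vendor conversations still decide the final toolkit.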

Step 3 - Implement the tools

We’d recommend the following steps:

Install and configure tools in the existing client ecosystem
Deploy the tool in the on-cloud or on-prem environment
Set up DevOps pipelines
Create a forking/synching strategy
Implement observability and alert mechanisms

While implementing the data catalogue, bring together the technical metadata, ownership, lineage and business glossary. Push them to the metadata repository through defined APIs/interfaces and make them available through the discovery interface.
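To make the shape of a catalogue entry concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `CatalogueEntry`, `MetadataRepository`, the dataset names and the in-memory storage stand in for whatever catalogue tool and API the organization actually selects.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class CatalogueEntry:
    dataset: str
    columns: dict                                        # technical metadata: column -> type
    owner: str                                           # owning domain team
    upstream: list = field(default_factory=list)         # lineage: source datasets
    glossary_terms: list = field(default_factory=list)   # linked business terms


class MetadataRepository:
    """In-memory stand-in for a metadata repository behind an API."""

    def __init__(self):
        self._entries = {}

    def publish(self, entry):
        # A real implementation would call the catalogue tool's API here.
        self._entries[entry.dataset] = asdict(entry)

    def search(self, term):
        """Discovery: match on dataset name or linked glossary term."""
        term = term.lower()
        return sorted(
            name
            for name, e in self._entries.items()
            if term in name.lower()
            or any(term in t.lower() for t in e["glossary_terms"])
        )


repo = MetadataRepository()
repo.publish(CatalogueEntry(
    dataset="trades.daily_positions",
    columns={"account_id": "string", "position": "decimal"},
    owner="trading-domain",
    upstream=["trades.raw_executions"],
    glossary_terms=["Position", "Account"],
))
repo.publish(CatalogueEntry(
    dataset="clients.accounts",
    columns={"account_id": "string", "segment": "string"},
    owner="client-domain",
    glossary_terms=["Account"],
))

print(repo.search("account"))
```

Linking glossary terms to the entry is what lets a business user find `trades.daily_positions` by searching for "account" even though the dataset name does not contain the word.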

For data quality implementation:

Define a DSL that lets data producers create quality rules, which are pushed to the respective domain repository
Build pipelines to provision the automated jobs, perform data quality checks and emit the results
Collect quality results/metrics, push to metrics stores and make them available to producers/consumers
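The three steps above can be sketched end to end in a few lines. This is a hedged illustration only: the rule format, the check names (`not_null`, `unique`, `min`) and the list used as a metrics store are hypothetical and do not reflect the DSL or tooling used on the project.

```python
# Rules as a data producer might declare them in a domain repository.
# The keys and check names are hypothetical.
RULES = [
    {"dataset": "accounts", "column": "account_id", "check": "not_null"},
    {"dataset": "accounts", "column": "account_id", "check": "unique"},
    {"dataset": "accounts", "column": "balance", "check": "min", "value": 0},
]

# Each check receives the column values and the rule, and returns a bool.
CHECKS = {
    "not_null": lambda values, rule: all(v is not None for v in values),
    "unique": lambda values, rule: len(set(values)) == len(values),
    "min": lambda values, rule: all(
        v is not None and v >= rule["value"] for v in values
    ),
}


def run_quality_checks(rows, rules):
    """Evaluate every rule against the rows and return result records."""
    results = []
    for rule in rules:
        values = [row.get(rule["column"]) for row in rows]
        passed = CHECKS[rule["check"]](values, rule)
        results.append({**rule, "passed": passed, "rows_checked": len(rows)})
    return results


metrics_store = []  # stand-in for a real metrics store

# Illustrative sample data with a duplicate id and a negative balance.
rows = [
    {"account_id": "A1", "balance": 100},
    {"account_id": "A2", "balance": -5},
    {"account_id": "A2", "balance": 20},
]

metrics_store.extend(run_quality_checks(rows, RULES))
for m in metrics_store:
    print(m["column"], m["check"], "PASS" if m["passed"] else "FAIL")
```

Keeping rules declarative means producers can add or change them in their own repository without touching the pipeline code that runs the checks.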

Finally, set up a culture of data domain ownership: form a data governance committee with well-defined roles and responsibilities, and make domain teams responsible for owning and managing their data.

Step 4 - Empower your stakeholders

Upon successful implementation, your stakeholders — the consumers and data product owners — can perform specific functions.

Consumer

Discover data products
View metadata
View data lineage to understand the data flow better
View quality snapshot along with metadata
Access quality notifications/alerts for the datasets of interest

Data product owner

Link data products to the business terms, encouraging consumption
Avoid data inconsistencies
Define information classification at the data element level
Profile the data before onboarding

Such a data workbench will enable domain teams to publish transparent and trustworthy data for consumers and pave the way to onboarding new partners with increased confidence.

Data governance is essential for any organization working with big data because its implications are broader than just technology. It is cross-cutting, involving data collection and storage, security of explicit and derived personally identifiable information (PII), management of consent, algorithmic design, product design, organizational incentives, etc.

This presents a unique opportunity to use technology and working models to eliminate risks and bias in data-driven solutions. Good solutions require diverse multi-disciplinary teams, tools that can be used autonomously, scalable governance models and more. Data mesh is a perfect fit for this use case.

Future of data governance: Data Mesh

Zhamak Dehghani, Director of Emerging Technologies and Founder of Data Mesh, defines it as a novel concept that embraces the ubiquitous data in an organization through the convergence of distributed domain-driven architecture, self-serve platform design and product thinking with data. Below are the features that set it apart:

Data as product (process & data)
Data responsibility decentralized by domains
Data producers are data owners, empowered by self-service platform capabilities
Federated governance enables centralized data governance, data quality and data life cycle

To learn more about Data Mesh, click here.

This blog is a shorter version of a white paper from the 21st International Conference on Electronic Business (ICEB 2021).

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.