Data mesh marks a welcome architectural and organizational paradigm shift in how we manage big analytical data. The paradigm is founded on four principles: (1) domain-oriented decentralization of data ownership and architecture; (2) domain-oriented data served as a product; (3) self-serve data infrastructure as a platform to enable autonomous, domain-oriented data teams; and (4) federated governance to enable ecosystems and interoperability. Although the principles are intuitive and attempt to address many of the known challenges of previous centralized analytical data management, they transcend the available analytical data technologies. After building data mesh for multiple clients on top of the existing tooling, we learned two things: (a) there is a large gap in open-source or commercial tooling to accelerate implementation of data mesh (for example, implementation of a universal access model to time-based polyglot data which we currently custom build for our clients) and (b) despite the gap, it's feasible to use the existing technologies as the basic building blocks.
Naturally, technology fit is a major component of implementing your organization's data strategy based on data mesh. Success, however, demands an organizational restructure to separate the data platform team, create the role of data product owner for each domain and introduce the incentive structures necessary for domains to own and share their analytical data as products.
Data mesh is an architectural and organizational paradigm that challenges the age-old assumption that we must centralize big analytical data to use it, have data all in one place or be managed by a centralized data team to deliver value. Data mesh claims that for big data to fuel innovation, its ownership must be federated among domain data owners who are accountable for providing their data as products (with the support of a self-serve data platform to abstract the technical complexity involved in serving data products); it must also adopt a new form of federated governance through automation to enable interoperability of domain-oriented data products. Decentralization, along with interoperability and focus on the experience of data consumers, are key to the democratization of innovation using data.
If your organization has a large number of domains with numerous systems and teams generating data or a diverse set of data-driven use cases and access patterns, we suggest you assess data mesh. Implementation of data mesh requires investment in building a self-serve data platform and embracing an organizational change for domains to take on the long-term ownership of their data products, as well as an incentive structure that rewards domains serving and utilizing data as a product.
Data mesh is an architectural paradigm that unlocks analytical data at scale; rapidly unlocking access to an ever-growing number of distributed domain data sets, for a proliferation of consumption scenarios such as machine learning, analytics or data intensive applications across the organization. Data mesh addresses the common failure modes of the traditional centralized data lake or data platform architecture, with a shift from the centralized paradigm of a lake, or its predecessor, the data warehouse. Data mesh shifts to a paradigm that draws from modern distributed architecture: considering domains as the first-class concern, applying platform thinking to create a self-serve data infrastructure, treating data as a product and implementing open standardization to enable an ecosystem of interoperable distributed data products.