One question has been coming up a lot lately: what tech should be used for Data Mesh? People often ask whether Databricks is a good choice, or whether they should use AWS, Snowflake or open source options. However, just as there’s no right tech for microservices, there’s also no right tech for Data Mesh. So while this blog post won’t provide you with a shopping list of technologies, it will help you understand what tech is out there, and how you should go about evaluating it for your mesh implementation.

Data Mesh is a paradigm, not a solution architecture

Different organizations will have different Data Mesh implementations supported by different architectures. In short, Data Mesh is a style, not a single architecture. That means there’s more than one way of architecting a mesh on AWS, Azure or Google. A useful mental picture of Data Mesh is ‘like microservices, but for analytical data’.

This means there is no simple list of technologies that will let you start doing Data Mesh. As we’ll see, there are some useful tools. But rather than diving straight into the tools, it’s best to consider the characteristics or capabilities of a Data Mesh, because that will help us understand what kinds of tools are needed and for what purpose.

The easiest way to do this is to begin with the core principles of Data Mesh and to consider what these principles mean from a technology perspective:

Domain ownership

This requires data products to be divided according to clear value streams rather than technical boundaries. It’s also essential that every data product team is able to look after its own pipelines and policies, as well as its data storage and output ports (such as APIs).
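
To make that ownership boundary concrete, here is a minimal sketch of a data product described as a single unit that one team owns end to end. The class names, fields and example values (`DataProduct`, `OutputPort`, `orders-daily` and so on) are all hypothetical and only illustrate the idea; a real mesh would define its own descriptor format.

```python
from dataclasses import dataclass, field


@dataclass
class OutputPort:
    """One way consumers can read the product, e.g. an API or a table."""
    name: str
    kind: str      # e.g. "rest_api", "sql_table" (illustrative values)
    location: str  # placeholder address, not a real endpoint


@dataclass
class DataProduct:
    """Everything a single domain team owns for one data product."""
    name: str
    domain: str      # the value stream the product belongs to
    owner_team: str  # the team accountable for all fields below
    pipelines: list[str] = field(default_factory=list)
    policies: list[str] = field(default_factory=list)
    storage: str = ""
    output_ports: list[OutputPort] = field(default_factory=list)

    def port_kinds(self) -> set[str]:
        """Return the distinct kinds of output port the product exposes."""
        return {port.kind for port in self.output_ports}


# Hypothetical example: an orders product owned by a fulfilment team.
orders = DataProduct(
    name="orders-daily",
    domain="order-fulfilment",
    owner_team="fulfilment-data",
    pipelines=["ingest_orders", "aggregate_daily"],
    policies=["pii-masked", "retention-13-months"],
    storage="warehouse.fulfilment.orders_daily",
    output_ports=[
        OutputPort("orders-api", "rest_api", "/fulfilment/orders-daily"),
        OutputPort("orders-table", "sql_table", "fulfilment.orders_daily"),
    ],
)
```

The point of the sketch is that pipelines, policies, storage and output ports all hang off one owner, rather than being split across a central pipeline team and a central storage team.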

Data as a product

Consumers of data want the data in the form that works for them. Teams need to be able to transform and distribute the data in ways that delight those consumers. It’s for this reason that the data as a product principle requires a polyglot ecosystem that can adapt to the demands of each data product. That need for flexibility can constrain your tech choices: an opinionated off-the-shelf solution may look great, but it could prove restrictive.

Self-serve data platform

Teams shouldn’t have to constantly reinvent the wheel when it comes to infrastructure — that wastes their time and energy and keeps them from focusing on building great data products. A self-serve platform empowers and supports developers so that tasks like provisioning are taken care of.
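
As a rough illustration of what "self-serve" could look like from a product team's side, the sketch below turns a short declarative request into an ordered list of provisioning steps. The function name `plan_provisioning`, the request keys and the step wording are all assumptions made for this example; they do not correspond to any particular platform's API.

```python
def plan_provisioning(request: dict) -> list[str]:
    """Turn a declarative request into an ordered list of platform steps.

    The platform, not the product team, knows how each step is carried out.
    """
    product = request["product"]
    steps = [f"create storage for {product}"]
    # One step per requested output port, in the order the team listed them.
    for port in request.get("output_ports", []):
        steps.append(f"expose {port} port for {product}")
    if request.get("scheduled", False):
        steps.append(f"register pipeline schedule for {product}")
    steps.append(f"grant access to team {request['owner_team']}")
    return steps


# Hypothetical request written by a data product team.
plan = plan_provisioning(
    {
        "product": "orders-daily",
        "owner_team": "fulfilment-data",
        "output_ports": ["rest_api", "sql_table"],
        "scheduled": True,
    }
)
for step in plan:
    print(step)
```

The team states what it needs and the platform works out how to deliver it, which is the division of labour the principle is aiming for.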

Federated computational governance

Data products in a mesh need some level of interoperability, which means distributed ownership has to be balanced with standardization. There are two key reasons for this: the first is to make data products more discoverable inside an organization, and the second is to guarantee and maintain certain quality, interoperability and security standards.
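
The "computational" part of this principle suggests that such standards are checked automatically rather than by review meetings. Below is a minimal sketch of that idea, assuming a data product publishes a small metadata record. The field names, the list of allowed formats and the PII rule are hypothetical examples of what a federated governance group might agree on, not a prescribed standard.

```python
# Fields assumed to be required so a product can be found in a catalogue.
REQUIRED_FOR_DISCOVERY = ("name", "domain", "owner_team", "description")
# Interchange formats assumed to be agreed across domains.
ALLOWED_FORMATS = {"parquet", "avro", "json"}


def check_data_product(metadata: dict) -> list[str]:
    """Return a list of violations; an empty list means the product passes."""
    violations = []
    # Discoverability: every required field must be present and non-empty.
    for key in REQUIRED_FOR_DISCOVERY:
        if not metadata.get(key):
            violations.append(f"missing discovery field: {key}")
    # Interoperability: data must use one of the agreed formats.
    if metadata.get("format") not in ALLOWED_FORMATS:
        violations.append("format is not one of the agreed formats")
    # Security: personal data must be masked before it is published.
    if metadata.get("contains_pii") and not metadata.get("pii_masked"):
        violations.append("PII must be masked before publishing")
    return violations


compliant = {
    "name": "orders-daily",
    "domain": "order-fulfilment",
    "owner_team": "fulfilment-data",
    "description": "Daily order totals per region",
    "format": "parquet",
    "contains_pii": False,
}
non_compliant = {
    "name": "customers-raw",
    "domain": "crm",
    "format": "csv",
    "contains_pii": True,
}

print(check_data_product(compliant))
print(check_data_product(non_compliant))
```

Each domain team still decides how its product is built, while a shared check like this could run in every team's deployment pipeline to enforce the handful of rules that apply globally.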