Data Mesh revisited

Podcast hosts Rebecca Parsons and Birgitta Böckeler | Podcast guests Zhamak Dehghani and Emily Gorcenski

December 15, 2022 | 49 min 40 sec

Brief summary

Data Mesh is one of the most powerful and widely discussed concepts to emerge from Thoughtworks in recent years. As the world becomes increasingly aware of the risks and challenges data can pose, from the perspective of both privacy and organizational effectiveness, it has only become more relevant.

In this episode of the Technology Podcast, Zhamak Dehghani (Thoughtworks alumna and author of O'Reilly's Data Mesh: Delivering Data-Driven Value at Scale) and Emily Gorcenski join Rebecca Parsons and Birgitta Böckeler to discuss Data Mesh's place in the industry today, more than three years on from the first time we discussed the topic on the podcast. Together they explore some of the challenges organizations face when adopting it and what its future looks like, as it continues to push the world to rethink data centralization.

Episode transcript

[Music]

Rebecca Parsons: Hello, everyone. My name is Rebecca Parsons. I'm the Chief Technology Officer for Thoughtworks, and I'd like to welcome you to yet another episode of the Thoughtworks Technology podcast. I'm joined today by one of my other co-hosts, Birgitta. Hi!

Birgitta Böckeler: Hi, my name is Birgitta Böckeler. I'm a Technical Principal working for Thoughtworks out of the Berlin, Germany, office.

Rebecca: We are joined today by two guests. One is Zhamak Dehghani, who is the author of the book on Data Mesh. Welcome, Zhamak.

Zhamak Dehghani: Wonderful to be here, Rebecca.

Rebecca: And we're also joined by Emily Gorcenski. I'll let you introduce yourself actually, Emily.

Emily Gorcenski: Great. Thanks. It's good to be here. I am the service line leader for data and AI in Thoughtworks Germany.

Rebecca: Great. Although the book has been out for a while, we wanted to get together and talk about Data Mesh: how it's going, where it's going. So, Zhamak — why don't you start with how did Data Mesh come about?

Zhamak: Well, in fact, it all started around 2018, 2017 with seeing similar patterns with our clients. I was at the time at Thoughtworks and we were working with fairly technologically advanced organizations that had made quite substantial investments in their data and AI and had quite ambitious expectations of getting value from data, but the pattern that was coming across all of them was that they were failing to materialize value. And you could measure that failure based on the disproportional ratio of cost to value (the amount of money they were spending and whether they were competing using that data or not), the long lead time to get value, and the agility, response to change, growth and resiliency of organizations that were very slow in getting value from data. We were seeing a lot of organizations wasting a lot of time dealing with data pipeline issues or data reliability issues. Nobody was really happy, and everyone was looking for the next silver bullet.

At the time, I was relatively new to the big data analytics space. I had always worked in applications, data-driven systems, but not so much in the BI, I guess, area of analytics. But I had seen how we had solved similar problems in the operational, transactional systems by embracing complexity, essentially. When I looked at these organizations, what I realized was that the complexity of organizations, in terms of growth and in terms of speed of change, had essentially broken the patterns and paradigms and the ways that we were trying to get value from data using analytics.

The common underlying characteristics in those past paradigms, like warehousing or lakehouse and anything in between, that were causing these, I guess, failure modes were around centralization that we had assumed, technologically and organizationally. We have to centralize data. Everybody else does whatever they're doing, digital application development, and then we have data. We have it somewhere else: we have it on different platforms, we have it in a different team. That centralization was a point of big friction for actually being able to be fast in getting data.

The other assumption that we had made was this pipeline-oriented thinking: that data comes from sources and then some team puts it through ETLs or transformation pipelines, and then we put it in some storage, and then we layer it with metadata, then we layer it with governance, and finally out the other end pops some value in the form of an applied machine learning model or analytics and reports…

As you can imagine, that processing itself creates this long lead time from data on one side to value popping out on the other side. It's very functionally divided. If you look at the technology teams around this pipeline model and the roles we've defined, we've defined a bunch of fractional roles. We find, like, software engineers on one end and then data engineers in the middle, and then analytics and ML engineers after that, and so on and so on.

These fractional roles are each just doing a fraction of the tasks needed to create that end-to-end value. That creates a very fragile system full of handoffs. That view of the world was in contrast to all the other movements that we had, mainly in the operational world. That's where it came from: distributed systems thinking, microservices, and scaling digital development of applications in complex organizations.

Seeing these antipatterns, and seeing how we had solved complexity at the heart of software, led to coming up with a different approach, which was initially really a question: why do we do these things the same way, and why have we made these assumptions? In challenging those assumptions, Data Mesh came to exist.

Rebecca: What exactly is Data Mesh?

Zhamak: Data Mesh, I guess, is a set of principles that try to address those fundamental challenges with the past paradigms. If you define it at a high level, if you want a one-sentence definition, the definition that I use is: a decentralized socio-technical approach to manage, access, and share data for analytical purposes at scale. At scale means in complex organizations, within the boundaries of trust within the organization, or across organizations globally.

That's the one-line definition, but underlying that definition sit four principles that are a direct response to those issues of centralization and pipelining and so on. Data Mesh tries to address the problem of centralization with decentralized ownership of data by the teams that have been formed around the axes of business functions and business missions.

If you have an organization with these domain teams that have organized themselves around a particular mission and outcome for the business, then those domain teams have accountability, of course, for the technology that works toward achieving that outcome. Then we will have those same teams managing the data. Domain-oriented ownership addresses the centralization by decentralizing data ownership and, consequently, the architecture.

It addresses that long lead time, that pipeline thinking, by creating these localized, domain-oriented data products: going from data to value within a bounded context of, again, a domain dataset, and encapsulating everything that is needed to serve that data in a trustworthy, discoverable, usable way in this concept of a data product, which in fact shifts our attitude more toward data sharing and less toward data hoarding or data collecting.

That's the second pillar. The third and fourth pillars are really designed to address some of the constraints and issues that arise with this decentralized ownership and data sharing. One of them is addressing the feasibility. How can we make it feasible for digital teams or technology teams to become owners of the data, or to manage this data? That's the self-serve data platform thinking: elevating the levels of abstraction to empower these autonomous teams, lower the cognitive load, and so on and so on.

That's the self-serve data platform. Then the final pillar is around addressing interoperability and security and all of these governance-related policies that we want to consistently apply to this now distributed system. Decentralization without harmony and interoperability is total chaos. We can't really get value. Again, it is bringing learnings from operational systems, where we had to solve this problem with microservices and with multi-cloud deployments of applications, into the world of data.

We define this computational federated governance, which is about automating the policies that govern these data products as code, being able to embed them into the data products, and defining a governance model that by definition embraces this autonomous team decision-making and is able to federate a lot of decision-making locally into those teams, rather than centrally creating this system of control, which becomes another point of synchronization and bottleneck.
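[Editor's illustration, not part of the conversation: the idea of governance policies expressed as code and embedded in each data product could be sketched roughly as below. This is only a minimal sketch under stated assumptions. The `DataProduct` descriptor, the two example policies and the `check` function are all hypothetical names invented for illustration; they do not come from the episode, the book or any particular platform.]

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Set


@dataclass
class DataProduct:
    """Hypothetical descriptor that a domain team publishes with its data product."""
    name: str
    owner_team: str
    # Column name -> tags, e.g. {"email": {"pii"}}
    column_tags: Dict[str, Set[str]] = field(default_factory=dict)
    masked_columns: Set[str] = field(default_factory=set)
    description: str = ""


# A policy is plain code: it returns a list of violations (empty means compliant).
Policy = Callable[[DataProduct], List[str]]


def pii_must_be_masked(product: DataProduct) -> List[str]:
    """Assumed global policy: every column tagged 'pii' must be masked."""
    return [
        f"{product.name}: PII column '{col}' is not masked"
        for col, tags in sorted(product.column_tags.items())
        if "pii" in tags and col not in product.masked_columns
    ]


def must_be_discoverable(product: DataProduct) -> List[str]:
    """Assumed global policy: every product needs a description."""
    if product.description.strip():
        return []
    return [f"{product.name}: missing description"]


# Agreed once by the federated governance group, applied automatically everywhere.
GLOBAL_POLICIES: List[Policy] = [pii_must_be_masked, must_be_discoverable]


def check(product: DataProduct, local_policies: Sequence[Policy] = ()) -> List[str]:
    """Run the global policies plus any the owning team adds for itself."""
    violations: List[str] = []
    for policy in [*GLOBAL_POLICIES, *local_policies]:
        violations.extend(policy(product))
    return violations


orders = DataProduct(
    name="orders",
    owner_team="checkout",
    column_tags={"email": {"pii"}, "total": set()},
    description="Completed orders",
)
print(check(orders))  # one violation: the unmasked 'email' column
```

[In this sketch the check would run inside each team's own build or deployment step rather than as a central review, which is one way to read "computational" and "federated" together: the rules are shared, but the decision-making and the enforcement point stay local.]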

Birgitta: The governance one is always the one that stands out a bit to me. With the others, there are all these parallels to what you were saying before about the operational systems and how we solved the challenges there. Usually, on that side, we don't highlight the governance part as much, but it becomes even more important in data. Maybe it's also to our detriment that we don't look into the governance part as much on the operational side.

Zhamak: Actually, I love this point that you mention. We just don't use the word governance; we always say cross-functional concerns. In fact, I got a criticism in one of the conversations that I had when I used the words cross-functional concerns. These are concerns that we always embed in our applications. When we incept building an application, we are always thinking, okay, what about security concerns? What about reliability? What about resiliency? We always do this, but because we don't think about it as a centralized control function, because it's just embedded in everything we do, we don't use these, like, big words like governance. That's just my opinion.

There could be other reasons for it. I think it's actually ever-present in operational systems; it just hasn't been given weight. I think in data systems, because it's never been present and embedded from the left, because it hasn't been pushed left and embedded into the inception of data and data sharing, and it's always a function that happens later down the track by a third party and centralized team, we have this notion of governance and that language. In fact, I didn't want to use that word in Data Mesh because of its connotations, but I realized if I don't use that word then I lose a lot of people because it's so familiar. It was a pointed decision whether to use it or not.

Birgitta: The word governance?

Zhamak: Yes, absolutely.

Rebecca: Yes, it is considered a dirty word in many cases, in part because some of the governance functions that we've dealt with in the past have been just heavy-handed, and really disconnected from the realities.

Emily: I think it's also complex in the data space because you have different interpretations of what governance means depending on who you're talking to, and where they sit in the organization. The conventional view of governance in data is that it's like anything else. It's people, process and technology. It's very easy to get fixated on one of those things when that's your domain of responsibility in the organization.

We tend to think of governance as solving things like metadata management, or building knowledge graphs, or implementing privacy management policies, or compliance policies, or things like that, depending on where we sit in the org, but it's all of those things together. When you have people working with data, and they're not connected to the source of those challenges, it's really hard to actually engineer a system that has governance of any sort at its core.

One of the challenges that I see all the time is companies are looking for a tool that will solve a problem. How do I deal with my PII? How do I deal with my GDPR compliance? What they tend to do is they tend to shy away from building the things that they need to drive their business forward, because they don't know how to implement the governance controls that they need to implement for whatever reason.

The irony of this is that they're not actually solving the problem by simply avoiding it. If you deny access to data, that doesn't mean your data is secured, because eventually somebody comes in and overrides that process by escalating high enough that they get access to the data. Then what happens is you've lost control of the data, because now you don't know where it is. This pattern repeats itself over and over and over.

I think that one of the things I like about Data Mesh is that it's much like DevSecOps or other security-in-our-DNA types of approaches to software development. You can't just have a security team validate all of the code; you need to bake it into everything that you're doing.

Rebecca: That's one concrete experience. I know, Emily, you've had a lot of experience from our data service line in applying these principles in that messy place called the real world. What are some of the stories that you have of trying to bring this idea to life and realize it within the bounds of a messy real organization?

Emily: I think the biggest challenge that we face with every client is that they see Data Mesh as a new platform architecture. They see it as the next variation on a kappa or lambda or data lake or lakehouse or whatever the architecture du jour is. And it is, sort of, in a way; there is a technical architecture that underlies it to some extent, but it's a different way of thinking.

The challenge that I see that most companies have is they start asking questions like, "Well, what does this mean for how many times I copy the data?" Or, "What does this mean about my data access?" Or, "How do I do these things that are very central-paradigm ways of thinking about data? How do I get the data out of the Data Mesh thing?" These are the types of questions that you shouldn't even really try to answer, because it's a totally different paradigmatic way of thinking about how you solve problems with data. Some of these questions actually become less relevant, and it's hard trying to explain that to somebody who has worked for several years on answering these questions, telling them, "Hey, actually, you don't need to ask this question anymore, or we're solving this problem for you implicitly through other things." I think that's one of the biggest challenges: getting the new mindset.

A really concrete example I had actually in a conversation today was somebody asking, "Well, how do we put a knowledge graph into this?" I was giving them a demo of a Data Mesh that we were building. I said, "Well, a knowledge graph is inherent in the Data Mesh, you have your input ports, your output ports. They connect to each other. Your data products form a graph. If you have a knowledge graph that is not isomorphic, then you're probably risking something because now you're saying that there's a semantic dependency or any other type of dependency between product contexts within the organization that is not reflected in your actual technical systems."

The whole question of a knowledge graph doesn't become as important because you're building it as you build the Data Mesh. For your knowledge graph, all you need is a nice tool that extracts it from the specifications and that's not that hard to build. It's really a mindset shift in thinking about that. We're used to thinking, "Okay, all my data is in this big monolith, now I have to put a tool on top of it, and that tool has to reach its tendrils in and do really clever things with algorithms or whatever to assemble this." No, we're not doing that anymore. We're building it as we go.
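[Editor's illustration, not part of the conversation: the point that the graph can be extracted from data product specifications might look something like the sketch below. Everything here is an assumption made for illustration: the `SPECS` dictionary layout, the "product.port" naming for input ports, and the function names are hypothetical and not taken from any real Data Mesh tooling.]

```python
from typing import Dict, List, Set, Tuple

# Hypothetical specifications: each data product declares its output ports,
# and each input port names the upstream "product.port" it reads from.
SPECS: Dict[str, Dict[str, List[str]]] = {
    "orders": {"inputs": [], "outputs": ["completed_orders"]},
    "customers": {"inputs": [], "outputs": ["profiles"]},
    "customer_value": {
        "inputs": ["orders.completed_orders", "customers.profiles"],
        "outputs": ["lifetime_value"],
    },
}

Edge = Tuple[str, str, str]  # (producer, output port, consumer)


def extract_graph(specs: Dict[str, Dict[str, List[str]]]) -> Set[Edge]:
    """Derive the dependency graph purely from the declared ports."""
    edges: Set[Edge] = set()
    for consumer, spec in specs.items():
        for ref in spec["inputs"]:
            producer, port = ref.split(".", 1)
            # An input that points at a port nobody publishes is a spec error.
            if port not in specs.get(producer, {}).get("outputs", []):
                raise ValueError(f"{consumer} reads unknown port {ref}")
            edges.add((producer, port, consumer))
    return edges


def upstream(specs: Dict[str, Dict[str, List[str]]], product: str) -> Set[str]:
    """All products that `product` depends on, directly or transitively."""
    edges = extract_graph(specs)
    seen: Set[str] = set()
    stack = [product]
    while stack:
        current = stack.pop()
        for producer, _port, consumer in edges:
            if consumer == current and producer not in seen:
                seen.add(producer)
                stack.append(producer)
    return seen


print(sorted(extract_graph(SPECS)))
print(sorted(upstream(SPECS, "customer_value")))
```

[Because the edges come straight from the port declarations, the graph cannot drift away from the actual technical dependencies, which is the property Emily describes. A separately maintained knowledge graph would have to be kept in sync by hand.]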

Zhamak: I've been, like, violent! If people saw our video, I'm just shaking my head up and down because I could not agree more! If I can just summarize what Emily said, it's that people try to solve the past problems with the new paradigm. The new paradigm inherently builds in solutions for those past problems. Those are not your problems anymore. You have a new set of problems that you have to now think about to build on top of Data Mesh itself.

Those problems are not the problems you need to worry about anymore because the solutions are inherently built in. Knowledge graph is such a great example, because the knowledge graph becomes an emergent phenomenon from the interconnectivity of your data products. You don't have to layer it, you don't have to maintain it, it emerges. It is by definition built into how you define schemas or semantic models and how you define your lineage by these input-output ports.

And there are just so many examples of that. Copying data is another one. Single source of truth is another one. Master data management is another one. Of course, that ultimate goal of having a consistent view of data when we query data is what we want, but we don't want master data management as the way of getting it. A lot of the “hows” will look completely different. So I would say every time you try to get into the mindset, ask yourself what you're really asking for. Usually, you're not asking for the ultimate outcome, you're asking for an interim solution that you applied in the past, like, how do I do master data management?

I would encourage people to think about “what was master data management trying to achieve? What was the outcome?” Then how does Data Mesh make sure that outcome is achieved? Is it built in? Do we need to add on to it? Don't try to map your past solutions into a new paradigm, it just does not make sense. That's both from the system perspective and from the roles perspective as well, like a lot of the roles will dissolve or go somewhere else.

Emily: Yesterday, I was with a client; we had a very long discussion about Data Mesh. At some point, he says, "Oh, now I finally get it." We've been trying to think about how do we get data out of the system. What we need to be thinking about is how do we get value out of the system. I think that one of the things that Data Mesh has really made clear is that all of these old metaphors like “data's the new oil”, “data's the new gold” don't really work anymore, right? Data's not something that you're supposed to hold. It's not something that has value just because you have a lot of it. You have to use it to extract value. It's a medium for doing something; it's a medium for taking action. If you're worrying about just trying to get data, what do you have at the end of the day if you've got the data? You just have data. Now, you still have to do something with it, so you've got to do the work again.

If you start by focusing and saying, "Hey, actually you know what? I need to figure out how to get value out of this," and you use that as your driver, then everything makes sense: how the Data Mesh maps to your organization, how it will fit in with your ways of working, and all of those other things. I think the central challenge that we have to face is how we get people thinking about pulling value out.

Birgitta: Everything is always about the why, isn't it? Why do we need master data management? Why do we need data in the first place? It's the easiest trick. Always asking why again, right? Yes.

Zhamak: Five times. Ask it five times, right?

Rebecca: I think part of it is, too, we're looking for easy solutions. And so if I focus on the implementation, I at least have a chance of tricking myself into believing I'm making progress on my problem. As you pointed out, Zhamak, this is just the same kind of conversation we've had about architecture and integration and all of those other things over the years. We want to talk about implementations, and what we really need to be talking about is: this is the outcome that I'm trying to achieve, and this is where the value is going to be realized by me doing this thing. That value is tied to the outcome and it's also tied to the domain.

Zhamak: I think when we try to [chuckles] overfit or jam the past solutions into a completely new paradigm, it results in some antipatterns as well. That point you made about looking for easy solutions: the path of least resistance creates quite harmful antipatterns, where we spend a lot of money and a lot of time building something that feels like, okay, we made movement, we implemented a solution, but we're not really addressing those underlying whys and underlying challenges. I see some of that in the wild as well. For example, and I partially blame vendors for this particular antipattern, we see people who have built pipelines of data movement and centralized storage, with application teams far, far away from the data.

The easiest way to use the language of Data Mesh without really making big changes or big shifts is to say: this data-as-a-product seems a great idea. We're just going to apply it downstream to the data that we were dumping in the lake, and then we create this nice, clear, maybe cleansed or beautified dataset, augmented with documentation and metadata, and we call it a data product. You probably buy some new tools to do that, and you feel good about yourself, but at the end of the day, you have not removed the centralized points of friction that you had, like your technology and the team.

You have not removed that pipeline's long-lead-time, fragile process of getting value from data. You have not addressed the siloing; you just use this, I don't know, language that feels warm and fuzzy. Yes, you might get some immediate incremental benefits in thinking about the data as a product, but on those three fundamental signals, you're still spending a lot of money, you still have long lead times, and it's still fragile. We're not really addressing those problems. That's just one of the antipatterns that result from quickly adopting a language and taking the path of least resistance to build solutions.

Emily: I understand some of the temptation though that some people have. Because there are things about Data Mesh that are hard and that are scary, right? When we start talking about moving these responsibilities and having to have the capabilities to work on data, a lot of organizations take a step back. They say, "How are we going to support this?" They're not used to distributing data engineering skills into multiple domains and multiple teams. There's a new way of working that needs to come up with that.

That means that you can't just do Data Mesh and not also have these cross-functional capabilities with your software development practice, with your product engineering, and your product design practice. You do need to have a certain number of fluencies already under your belt before you can start saying, "Well, I'm going to start distributing the data ownership and things like that as well", because frankly, it's still hard to find people at scale that can do these things. I understand why people look for alternate solutions because they're working in a constraint set that is hard to adjust.

Birgitta: Then if we accept the premise that there is no miracle solution that will make all of these problems go away with zero effort, zero people management, how do you get started? If we can't just buy a tool and drop it in and say open sesame and a miracle occurs, how do we get started? When we're talking to people, what are those early days like when you're trying to start this mindset shift that we know needs to happen?

Zhamak: When we think about change, we're talking about change at different levels here: technology, architecture, process, people. The best way to create change is through movement. I think folks at IDEO had some writings about organizational change through movement, and I think there's a bigger literature behind it. Then what does it mean to create change through movement?

What it means is that you want to create structural, organizational, cultural, and reward-system changes, all of those organizational changes, as you implement and deliver value using the Data Mesh approach, using the technology and architecture. We use different language to express this. I know at Thoughtworks, we always (I still say we) talk about use case-driven thin slicing. What it means is, let's create an end-to-end experience.

In the case of Data Mesh, the end-to-end experience is data going from a source to the point where that data has been applied, as either a manually consumed piece of insight or an automatically applied machine learning model that has been trained. Let's go through that whole end-to-end experience. Let's go through a simple, perhaps, end-to-end experience. Let's build all of those steps out.

Let's build the platform that needs to support that thin slice and create the change around teams, around people, while you're making movement and creating that thin slice. Start with smaller thin slices that give you a space for exploration and figuring out what the right way is, and then move on to scale and exploitation, where you figure out exactly how you're going to build the platform and capabilities and make the change, and then scale it out and roll it out to the larger organization.

Then we optimize and, hopefully, we harvest a lot of the investment that we may make early on. Change the organization through real use cases and implementation. Always be business-focused and business-oriented. Do thin slices, do them iteratively. Rebecca, you talk about evolutionary approaches quite a lot in architecture. The same evolutionary approach applies to organizational change. Be cognizant of where you are in that curve of adoption. Again, I love Everett Rogers' diffusion of innovations curve of adoption, where you consciously decide and understand who would be the first persona of users.

Who are those innovator adopters that would come through these early iterations? And then decide who are the middle majority and who are the laggards, and don't optimize for the laggards early on, perhaps; when you build something, build for those innovator adopters. That's how you plan it. That's the summary of, I guess, for me, a movement-based change creation.

Emily: Yes, I think when we look at the Data Mesh work that we've done I think that there's a couple of patterns that can lead to better chances of success. I think one of the risk areas that I see or the risky patterns that I see is trying to start too small. Data Mesh cannot just be built with — one product does not make a Data Mesh, right? You do need to have a minimal set of things that you need to start working on changing.

I really recommend looking at the initial steps as first steps in a larger-scale transformation. That means you need to look at your infrastructure, you need to look at your cloud strategy. You need to look at your software development strategy as part of a Data Mesh transformation. You also need to have a change management and then governance — there's that word again, governance — function that is supporting the evolution of the Data Mesh work within the organization.

And then, of course, you have to have people that can build data products. You need to have data product development going on. I think where we've seen it not be as successful is when we see companies trying to just say, "Well, I'm just going to build a product and I'm going to see if this works." Frankly, you should be able to build a product that works. A data product is not a very complicated thing. It's data coming in, going through some pipelines, and then you have some access model, some output port for that data.

That's not where the magic comes from; the magic comes from when you have the ability to do this repeatedly, to do this autonomously, to do this in a governed and controlled manner, and to do this in a way where you're bringing product thinking into how you build the product. That's where the magic comes in. Of course, that means that you need to have the platform tooling to do it. You need to have the right mindset within the organization.

You need to have people who are willing to try the approach knowing that it could fail but it's going to be the right thing to try anyways and to have that appetite within the organization. No change comes easy. If anyone's telling you that the change will be easy, they're trying to sell you something.

Zhamak: In fact, it's interesting because you said something there which I think reminds me of another antipattern. You said: "They said okay let's go build one data product." To me, that's an antipattern because actually, we don't want to build data products for the sake of data products. We want to build data products so that we can easily implement these end analytical use cases at scale.

Often, analytical use cases — whether you are doing retrospective trend analysis of what trend has happened and predicting what's going to come next, whether you're using statistical model, ML, or reports, even simple reports and dashboards — often they're not really confined or localized to a data product. Often those go across multiple data products, you need a view of multiple datasets being correlated and multiple teams sharing data with each other.

I do agree that if you actually thought about your end analytical use cases and then worked backward, even for that very first thin slice, you would find yourselves with multiple data products. If the end goal of the first little iteration is just a data product, we're not really thinking about it end-to-end. That again is, to me, reminiscent of this idea of the more data the better. It was like how many terabytes or how many tables we have, and now it's like how many data products we have. Again, we don't build products for the sake of building products. We build products so the users use those products and put them to some useful function, to improve the way they work or live.

Again, even from the beginning when you get started, measuring success and progress based on usage, and on this network effect of the exchange of value between data products and not just their creation, is key, as is having those goals set up early on.

Birgitta: Oh, that was actually one of my misconceptions in the beginning about Data Mesh that, in my mind, those data products were a lot bigger than they actually are when I look at all the examples that have been around now. I guess coming more from the operational software side, I always thought of a product as a much bigger thing so that was one of my realizations over time, learning more about Data Mesh.

Emily: I think the other important thing is that products evolve. Data products can evolve. One thing that I was talking about recently was the data that lives in the Data Mesh is not the data that is living in the systems, it's the data that comes out of the output port. If you don't have an output port for data, the data's not in the mesh as far as you're concerned.

It's very conceivable to build a data product that does some inference, that does some analysis, and you might need to create intermediate steps to get to the analysis that you want to create. That's a very normal thing. It may be that at some point in your evolution you say, "Actually, hey, this intermediate step that I'm doing is a data product that's valuable in and of itself."

Then you can carve that out. You can turn that into its own data product as you go. It's an evolutionary thing. The whole goal is to find the uses for these data as you're developing. It's not to say, we need to do this data model. Data modeling I think is almost something that becomes an afterthought. Everyone's like, "Oh, how do we do the data model?" You're like, "You don't, you build the data product. You transform the data the way that you need to. The model is going to fall out of it."

Then if you need to find how this model works with something else, you can iterate, but you're not going to come up with the full data model a priori. You're going to solve your problem, and then after you solve your problem, there's a good chance you might find that something you did along the way has added downstream benefit.

I think that that is the power that we see. It's not this chaining of materialized views or whatever it is that we do in data warehouse land, where we are concerned about copies of data because we worry that we start generating multiple sources of truth that we can't control. Those problems just go away. They don't exist.
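
[Editor's note: The point that data is only "in the mesh" once it comes out of an output port, and that an intermediate step can later be carved out as its own data product, can be sketched as follows. This is a minimal illustration, not a definitive implementation: the `DataProduct` class, the port names and the sample rows are all hypothetical.]

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Rows = List[dict]


@dataclass
class DataProduct:
    """Hypothetical data product: only what is in output_ports is 'in the mesh'."""
    name: str
    output_ports: Dict[str, Callable[[], Rows]] = field(default_factory=dict)

    def read(self, port: str) -> Rows:
        # Consumers can only reach data through a declared output port.
        if port not in self.output_ports:
            raise KeyError(f"{self.name} has no output port named {port!r}")
        return self.output_ports[port]()


# Hypothetical operational data that lives inside the producing system.
RAW_ORDERS = [
    {"customer": "a", "amount": 10.0},
    {"customer": "a", "amount": 5.0},
    {"customer": "b", "amount": 7.0},
]


def spend_per_customer() -> Rows:
    """Intermediate step: total spend per customer."""
    totals: Dict[str, float] = {}
    for row in RAW_ORDERS:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return [{"customer": c, "total": t} for c, t in sorted(totals.items())]


def top_customers() -> Rows:
    """Final analysis built on top of the intermediate step."""
    return [row for row in spend_per_customer() if row["total"] >= 10.0]


# At first only the final analysis is exposed; the intermediate step is internal.
analysis = DataProduct("top-customers", {"top": top_customers})

# Later the intermediate step proves valuable on its own and is carved out.
spend = DataProduct("customer-spend", {"totals": spend_per_customer})
```

[Until the second product is created, the per-customer totals exist in the system but cannot be read through any port, so as far as a consumer is concerned they are not in the mesh.]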

Rebecca: In an earlier conversation, Emily, one of the things you mentioned was a common question that you get asked as people are trying to figure out “okay, how do I make this thing called Data Mesh real within my organization?”

The question was: we're shifting all kinds of focus and responsibility away from the centralized data team back into the data producing teams, what's in it for them to do the work? Why should they care? That mandate used to sit in that central organization. It was very clear that was their responsibility. How do you respond to that?

Emily: I think there's a couple of different perspectives on how to respond to that. The nirvana perspective is that if we're all working in the Data Mesh, then the benefit of you doing the work is that you're also then able to benefit from others' contributions to the Data Mesh. So everyone's working in the same ecosystem. You have to play by the rules in order to be a participant in the ecosystem and therefore, it's better for you.

That's a hard sell at the beginning because the ecosystem doesn't yet exist and you're asking people to do a lot of work; so what is actually in it for them? My own take on this is that we actually vastly underestimate the value of our data and the usage of our data even in our current product development. Even when we are building things that are data-driven products like AI systems or recommendation engines or whatever. We really don't actually understand how powerful that data can be.

One of the things that I think is the reason for that is we're still used to this left-to-right thinking. We have data sources and data producers on the left and data consumers on the right. We start with gathering the data, then we transform the data, then we push the data somewhere, and we're not creating a feedback loop in that process. I think that what's in it for data-producing teams is that with Data Mesh, you're able to have the tooling to make generating those feedback loops easy.

Then you also have the ability to see what the downstream value of your data is so that you can actually design a better product. And so what this is actually giving you is deeper insights into the value and the secondary value of your product to other users within the Data Mesh. That helps you design things better. That helps you make a better thing, whatever the thing is that you are actually building.

I think that what's in it for them is, "Hey, if I tell you to build a better data-driven product, you're going to need to actually look at the data that you're generating and use that to generate a feedback cycle." Data Mesh is going to make that easier for you to do. It's going to give you better tools to make that data accessible, and that's going to have immediate benefits for you. Where this is a challenge is that you need the product people to be thinking in these terms, to be thinking in terms of experimentation, hypotheses, and so forth. I do think it has immediate benefits, even for data producers.

Zhamak: I love that answer and I 100% agree with Emily's point of view. I think we share the same opinion. I would summarize it to say that even the question itself, it's again a past paradigm question. The fact that we are separating the role of a data producer from a data consumer, and we see a gap that’s a chasm between them — it's again, that left-to-right thinking.

I would say, try to find the fastest path for a data producer to become a data consumer as well, to have that intrinsic need to care about data. That's what we should work on, and that's about what Emily just mentioned: education and empowering and perhaps augmenting teams internally to think about using data-driven approaches to building applications.

A common use case for data-driven applications is converting rules engines to more ML-based ones, whether it's optimization, classification, or different types of applications, because rules engines are often hard to maintain: false positives, false negatives, a lot of issues. So go to a team that has a rules engine and think about how to change it to a data-driven one.

The data producers providing the data for that rules engine become the data product producers. They're going to use a new ML-based data product internally, and creating that faster feedback would be the best way to shift that question. I think the question is… the question itself has to go away!

Rebecca: The book is out, what's next? What's next for Data Mesh? If you didn't have to worry about finally delivering the book to the publisher, what would you want to have added? Where do you think the holes are? Do you think we need to actually apply this a few more times before we start to see what the things are that people are struggling with, where those gaps are that maybe lots of people are struggling with but shouldn't, because we've identified some patterns that people can follow? Where's this all going?

Zhamak: I think we're really at the earliest stages. I think Data Mesh became too popular too quickly, and even though many organizations are implementing it or thinking about implementing it, we have to accept as an industry that this is still an evolving paradigm and we have to figure out the best practices. Or best practice is not the right word, but suitable or default practices for your type of organization, and evolving those. I defer to Emily to perhaps share a bit more knowledge because she's embedded in applying it in organizations.

The area that I've redirected my energy and focus is building those catalyzing technologies for change. As in right now, we are still in an era where we want to create this new paradigm. We want to reshape people's behavior, we want to create an almost new persona that is this data product producer or user/consumer that's neither an app developer nor a data engineer. So we’re almost creating a new persona with a new behavior. I think an area that needs to be developed and thought through are these catalyzing technologies that are natively built with this model in mind.

Right now, we are duct-taping and integrating the past paradigm's technology in this new way, and that's costly, that's difficult. That's one area. Again, just going through the example of data catalogs, there is a ton of money and investment, and so many companies building data catalogs that have knowledge graphs and that have this and that. Actually, those tools become almost irrelevant, as Emily mentioned.

Once you build intentional metadata sharing, intentional schema sharing, all of those issues go away because your catalog becomes, frankly, a very simple tool surfacing this emergent set of information. I think looking at this whole paradigm and asking what the initial technologies are to empower these new roles and nurture this type of new behavior is one area, but I'm curious what Emily thinks in terms of practices that are evolving.

Emily: I think that there are two sides to where this goes next, because Data Mesh came in at this point between being something super avant-garde and also being a solution for very common problems. It's hovering in between and it's going to grow in both directions. One of the things that I see all the time is things like: "Okay. We like the theory, but the reality is our data comes into an SAP system. We don't have microservices teams on the other side of that system. What do we do about this? How do we do Data Mesh when we have this?"

They're not going to uninstall SAP or whatever. We need to come up with ways to answer some of these common questions in a repeatable way, and we've come up with some of those solutions so far. I think that there's tooling and a basic technology fluency that's going to be developed over time. Tools are great, and I like to see people building tools that support Data Mesh in better and better ways. Where I'm excited here is to see things like better identity management native to the tools or native to the cloud technologies, and stuff like that, which you really need to make Data Mesh work. I think that's going to happen.

The other thing I think is actually more on the avant-garde side. This is where I've started trying to project out what it's going to look like in 5 years, 10 years, because every company I talk to, whether they're convinced Data Mesh is the right solution or not, is moving towards decentralization, whether they know it or not. They all know that centralization is a problem. They're all looking for different solutions on how to decentralize.

There are very few organizations that are still trying to pull all of their stuff into one big centralized place. This is where I think that there's a lot of emerging tech that is going to start mixing together with Data Mesh. We haven't really talked a lot about things like zero trust architectures even within an organization. We haven't talked about data sovereignty across organizational boundaries and how we solve those problems. How do we do data sharing in consortium-type organizations with Data Mesh?

I think those are the real challenges that are going to be driving businesses in the next several years, and data is going to be a part of that. I very much hope that Data Mesh also becomes a part of that. Those are even harder challenges to solve to some extent, because it's hard enough to change one organization; how are you going to change 10 in a consortium to work in this way? You have to get the fluency there to make it the natural and easy solution.

Birgitta: Once we see the anti-Data Mesh lobbying from data catalog companies, then we know we're making progress… [Laughter]

Zhamak: Yes, at the moment it's the opposite. In fact, a lot of data catalog companies can sprinkle some Data Mesh magic dust on top and suddenly your data sets are data products, and so on.

Rebecca: Everyone, this has been great fun as always. Thank you, Zhamak. Thank you, Emily, and thank you, Birgitta, for the lively discussion around Data Mesh. Thank you to all our listeners for listening to another edition of the Thoughtworks Technology Podcast.

Zhamak: Thank you for having me.

Emily: Thank you.

Scott Shaw: Join us on our next Thoughtworks Technology Podcast, where we'll be talking about shifting accessibility left for more equitable technology.

[Music]

[END OF AUDIO]
