Brief summary
In our latest episode, we explore the ideas behind the Data Mesh, an alternative approach to managing, surfacing, and serving data organizationally. Our regular co-hosts Mike Mason and Neal Ford talk to Ken Collier, Head of Data Science and Engineering at Thoughtworks, and Zhamak Dehghani, one of our regular co-hosts and also a Principal Consultant with a focus on distributed systems architecture.
Podcast Transcript
Mike Mason:
Hello everyone and welcome to the Thoughtworks Podcast, my name is Mike Mason.
Neal Ford:
And I'm another of your regular hosts, Neal Ford, and we're joined today by two of our colleagues. I'll let them introduce themselves.
Ken Collier:
I'm Ken Collier, I'm the head of Data Science and Data Engineering at Thoughtworks.
Zhamak Dehghani:
Hi everyone, I'm Zhamak Dehghani, and I'm a Technical Principal at Thoughtworks from San Francisco.
Mike Mason:
And of course you might recognize Zhamak as one of the hosts of the podcast, but in fact today we have her as a guest, so we're going to be picking her brains.
Neal Ford:
That's right, we're into one of her areas of domain expertise, and so we're going to be talking today about data and data architecture, and in particular, the ideas around what's the next generation architecture beyond the Data Lake.
Neal Ford:
So many of you may be familiar with the concept of Data Lake that Martin Fowler writes about in his website, but we've been doing some thinking about what goes beyond that, and that's what Ken and Zhamak are here to talk about.
Zhamak Dehghani:
Thank you for having us.
Neal Ford:
So, what's the problem we're trying to address here?
Ken Collier:
So, I'll jump in a little bit here, because I am an old guy who goes way back in data.
Ken Collier:
Data management and data architectures have largely had a centralized focus, from enterprise data warehousing to data marts and Kimball-style architectures.
Ken Collier:
We've continued to focus on centralizing data. In 2009 James Dixon introduced the concept of Data Lake, which captured everyone's attention and got everybody's imagination working.
Ken Collier:
Largely, Data Lake architectures have followed the paradigm of collating and harmonizing data in one central place, or a few central places. So Zhamak has introduced some new ideas that I think are very exciting.
Zhamak Dehghani:
In fact, I don't go back so far in data and I don't have that historical background, but what I've noticed working with our clients over the course of the last two years is that there are many failure modes in building big data architectures or big data platforms.
Zhamak Dehghani:
We have customers and clients that are stuck in building and designing a Data Lake that never realizes any value. We have customers that have invested immensely in data warehouses and proprietary hardware and software, and they don't get the value they want. So there are problems with bootstrapping, problems with scaling, and problems with actually getting value from investments in big data.
Zhamak Dehghani:
So that led to my curiosity to look under the hood to see what's going on, and to bring some observations and thinking from operational systems, and how operational systems and architectures at large have evolved over the last 10 years, into the world of data.
Ken Collier:
And a lot of those failure modes that Zhamak talks about are pretty well understood in the data community: trying to consolidate data from a broad set of disparate source systems is complicated, trying to do it for all imaginable use cases is nearly impossible, and trying to manage all of your transformation logic in batch jobs or even streaming jobs is overwhelming. For data teams that are only about 13 to 20 people, there are just a number of friction points and bottlenecks.
Mike Mason:
So help me out here, because I thought part of the premise of the Data Lake was that because you could store the raw data, you could do all that sorting-everything-out later in the process. Because you were capturing everything, you could then figure out, okay, we do need to do this harmonization, or we need to build a lakeshore mart for a particular purpose.
Mike Mason:
That seemed promising to me because you could do just enough of that work. Are you saying even that has not really worked out as we'd hoped?
Ken Collier:
So it's gotten better, and in fact over the last several years I've been talking a lot about keeping data in its raw, native state until the last responsible moment, and applying business logic and transformation logic as close to the business as possible. That's a fundamental shift from the way data warehousing has been done in the past.
Ken Collier:
It's been an improvement; I think it certainly has been helpful in reducing friction. But we still have a number of conversations with IT leaders who have a fairly vague sense that there are a lot of use cases that need to be supported, and many of those use cases come from very disparate end users, whether it's marketing or finance or supply chain. One centralized data management platform to do all of those things kind of doesn't make sense.
Ken Collier:
So I think Zhamak has a good point of view on this.
Zhamak Dehghani:
I think, Mike, when you just mentioned the Data Lake, you said the premise was that we just store data from everywhere. That statement, "we're just going to store data from everywhere," is part of the problem.
Zhamak Dehghani:
So one side of the problem, as Ken mentioned, is responding to a very diverse set of use cases and supporting accessibility for them. But the other point of friction, and the problem for scale, is getting data, or making data ubiquitously available, from all these diverse domains through a centralized team that doesn't know those domains, doesn't intimately understand what the data means, what the business represents, or how the business is represented through those data sets or data events, and yet tries to capture that, keep it up to date, and keep up with changes. I think that's flawed. We should accept the reality that data is ubiquitous, and the people who are responsible for operational domains, at the point of generation of that data, should think about that data as an asset for the rest of the organization to consume.
Zhamak Dehghani:
So the distribution of ownership is a fundamental shift to think about, whether it's ownership of the native, raw data at the point of origin, or ownership of the data that is aggregated and modeled at the point of consumption.
Zhamak Dehghani:
So try to distribute that across, and make it the fabric of the organization, as opposed to this Data Lake thing on the side.
Ken Collier:
And there's an interesting pattern that I think can carry us forward. Going back to data warehousing and the Data Lake, there is the need to ingest data from sources, the need to transform or process that data, and the need to serve that data up.
Ken Collier:
So that's a three-tiered architecture that is not new; it's the sort of thinking we've been doing for a long time. The challenge, or I think the new thinking, is that a more federated approach enables that sequence of steps to be broken down at a domain level. Those data domains can take care of their own ingestion of whatever upstream data are needed, take care of their own processing specific to that domain of data, and serve up the data for the consumers that are going to be using it.
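To make Ken's point concrete, here is a minimal sketch, not from the episode, of a domain data product that keeps its ingest, transform, and serve steps as internal details while exposing only a consumable data set. All names here (ClaimsDataProduct, Claim, and the sample data) are hypothetical:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Claim:
    claim_id: str
    member_id: str
    amount: float


class ClaimsDataProduct:
    """Owned by the claims domain team; the pipeline is an internal detail."""

    def ingest(self) -> Iterable[dict]:
        # Pull from the domain's own upstream operational system (stubbed here).
        yield {"claim_id": "c-1", "member_id": "m-1", "amount": 120.0}

    def transform(self, raw: Iterable[dict]) -> Iterable[Claim]:
        # Domain-specific cleansing and modeling stays inside the domain.
        for row in raw:
            yield Claim(row["claim_id"], row["member_id"], float(row["amount"]))

    def serve(self) -> list[Claim]:
        # The only part other teams see: a well-defined, consumable data set.
        return list(self.transform(self.ingest()))


print(ClaimsDataProduct().serve())
```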
Ken Collier:
Now when you package up lots of those small domain data products, or domain data nodes, in a mesh, you can start to isolate and encapsulate the work, the governance, and the trustworthiness of the data, so that you don't have it all trying to be done in one large Lake or one collection of data.
Neal Ford:
Well, and that's exactly what Zhamak was alluding to earlier, when she was talking about taking modern thinking in architecture, and applying it to the world of data, and this whole domain versus technical partitioning stuff, right?
Zhamak Dehghani:
Yeah, absolutely. I think we've seen this in operational systems: when the technology is new, especially early in the maturity of a trend, we put technology, techniques, and tooling at the top of our thinking, and they become the first-class concerns around which we decompose.
Zhamak Dehghani:
So for example, we created layered enterprise architectures, with the customer-facing applications as one layer, business process as another, and centralized databases with DBAs as another layer. Maybe because database technologies and business process modeling technologies were new, creating architecture boundaries around the technology was the main concern.
Zhamak Dehghani:
And with microservices we realized that is actually not an efficient or effective way of decomposing architecture, because change actually requires going across all those layers; the axis of change is orthogonal to the axis of architectural decomposition. So we turned it around and said, "let's localize functionality to where the change happens," and we realized that this concept of domains, with Eric Evans' thinking around domain-driven design, was a nice boundary for localizing and creating the components of a distributed architecture. We still have layered architecture as the internal implementation of those services in the operational world.
Zhamak Dehghani:
So I think the same thing is happening in the data world. As Ken was mentioning, any time you talk to a data engineer about their architecture, they talk about data pipelines. It's a 90-degree rotation of a layered architecture into a horizontal pipeline of ingestion, transformation, and serving. I think it's because the technology is still new, so we're focusing on the challenges around technology, optimizing our ingestion services and optimizing our serving services, as opposed to considering that a second-class concern, breaking the data up around domains, as Ken was mentioning, and keeping data pipelines as a second-class decomposition, an internal implementation of the data domains.
Neal Ford:
At the architectural level, we refer to this as the top-level partitioning, whether it's a technical top-level partitioning, like a layered architecture, or domain partitioning at the top level. So what you're talking about is shifting the top-level partitioning away from the mechanics of data and toward domains as the first-class citizen, and embedding the mechanics within each of those domains.
Zhamak Dehghani:
Exactly, and I think the beauty of that is the kind of ecosystem effect you get, because then you can compose new solutions, new data models, and new data aggregates by composing these domain data products.
Mike Mason:
So I want to keep exploring this, but I'm actually a little bit lost, and I think the listeners might be as well.
Mike Mason:
[inaudible 00:11:50] We talked about paradigm shifts, software engineering approaches, and problems with centralization, and you touched a little bit on data producers thinking about data as a product. From the top, can you sketch for me what this thing looks like?
Mike Mason:
I'm enjoying the discussion, but I'm feeling it's a tiny bit theoretical and I've kind of lost track of the moving parts.
Ken Collier:
An example would be good right about now, as Ward Cunningham would say.
Ken Collier:
So I think about it this way: we're working right now with a healthcare provider, as an example. They have insurance claims and claims processing, they have the health of their members and the medical care that their members receive, and they have financial concerns, the sort of back-office issues that all companies deal with.
Ken Collier:
So what this data architecture might look like, from a decomposition point of view, is an identification of the key domains. We may have a member domain that is de-identified, so that members are anonymous and data scientists can do machine learning from it; they may be the consumers of the data from that data domain. There may be longitudinal data about a member's care over time, with a prediction of what's to come next, or maybe the outcome is a recommended action for the member or the patient, to not eat so much salt, or something like that.
Ken Collier:
So if you think about the end use cases, either data scientist consumers for machine learning or business consumers for other purposes, that begins to drive those data domains. Zhamak and I are both working with the same company, so we're familiar with their scenario. It could be a call center for another company, or it could be streaming music; I think that's the example you talk about, so I'll let you add to it.
Zhamak Dehghani:
Sure. I think maybe this is the point where we introduce the concept and give it a name?
Ken Collier:
Sure, give it a name.
Zhamak Dehghani:
So we talk about this architecture as the Data Mesh, and there are certain characteristics of this architecture; I think we've touched on a few already.
Zhamak Dehghani:
What is the Data Mesh? The Data Mesh is an alternative approach to managing, surfacing, and serving data organizationally, addressing a diverse set of needs, such as analytical needs, business needs, or machine-learning-based needs.
Zhamak Dehghani:
The first characteristic of this architecture is what we just talked about: capturing and exposing your data around domains, and having those data sets consumed by whoever wants to consume them downstream, so there is not a direct pipeline. It means distributed ownership of those data sets, as opposed to central ownership, and bringing the ownership of those data sets as close as possible to the point of origin or the point of consumption.
Zhamak Dehghani:
So as Ken was mentioning in the healthcare example, the people who are actually responsible for building the operational domains that deal with claims and claims systems also become responsible for providing claims information as easily consumable, trustworthy data, whether as events or, you know, historical snapshots, to the rest of the organization. Similarly for the people who deal with members.
Zhamak Dehghani:
So those are the data sets around the point of origin, and there will also be a set of data products or data domains that might be newly aggregated, say a patient's history, for various fit-for-purpose consumptions.
Zhamak Dehghani:
So that's one of the concepts: data around domains, with distributed ownership. The second attribute of this architecture is that if we need to build data pipelines to provide those domain data sets, we do so, but those pipelines are just internal implementations, specific to a particular data domain.
Zhamak Dehghani:
And the third characteristic, which we haven't touched on yet, is that to support this distributed ownership of data across your organization, and allow this rapid creation of different data domains, we need to provide some form of self-serve infrastructure designed to support building these data sets. That means I need to easily set up polyglot storage, depending on the type of data that I have, and I need ways of providing data securely to the rest of the organization, because now I have exposed my data to whoever has access to it.
Zhamak Dehghani:
So there's a set of infrastructure, as a platform, that needs to be put in place to support these distributed data domains.
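As a rough sketch of what such a self-serve platform might offer, here is a hypothetical provisioning call; the SelfServeDataPlatform API and endpoint shown are invented for illustration, not a real product:

```python
from dataclasses import dataclass


@dataclass
class ProvisionedStore:
    domain: str
    storage_kind: str
    endpoint: str


class SelfServeDataPlatform:
    def provision_storage(self, domain: str, storage_kind: str) -> ProvisionedStore:
        # A real platform would create the buckets, topics, or tables here and
        # wire up encryption and access policies automatically, so domain teams
        # never build infrastructure themselves.
        endpoint = f"https://data-platform.internal/{domain}/{storage_kind}"
        return ProvisionedStore(domain, storage_kind, endpoint)


platform = SelfServeDataPlatform()
store = platform.provision_storage(domain="claims", storage_kind="event-stream")
print(store.endpoint)
```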
Zhamak Dehghani:
We emphasize, and you might have heard it in our conversation, these attributes around products: think about data as a product. Because now you're providing this as an asset to the rest of the organization, there are certain attributes that come with product thinking. This data should be easily discoverable, it should have its own SLOs, or service-level objectives, describing the quality associated with the data, and it should have good documentation, to provide a delightful experience to the data scientists who want to find this data and use it. And that's the concept of the Data Mesh.
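One way to picture this product thinking is a small descriptor that travels with each data product; the fields below (owner team, documentation, freshness and completeness SLOs) are an illustrative assumption, not a standard:

```python
from dataclasses import dataclass, field


@dataclass
class DataProductDescriptor:
    name: str
    owner_team: str
    description: str             # documentation for would-be consumers
    freshness_slo_minutes: int   # how stale the data is allowed to get
    completeness_slo_pct: float  # the quality bar the owners commit to
    tags: list[str] = field(default_factory=list)  # aids discoverability


# Hypothetical example, loosely based on the healthcare scenario discussed.
claims_product = DataProductDescriptor(
    name="claims.monthly_snapshots",
    owner_team="claims-domain",
    description="De-identified monthly claim snapshots for analytics.",
    freshness_slo_minutes=60,
    completeness_slo_pct=99.5,
    tags=["claims", "de-identified"],
)
print(claims_product)
```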
Ken Collier:
And I'll add a little bit to that, because as a long-time data guy, when Zhamak first introduced these ideas to me, one of the first things I said was, "So does that mean all this Data Lake stuff we've been talking about for the last 10 years is nonsense?" And it's not; there may still be Data Lakes designed around domain collections of data. But instead of a centralized one, I think you used the term "one Data Lake to rule them all", instead of focusing there, we might have multiple data lakes that are consolidation points along the way.
Ken Collier:
The other thing that you didn't mention is the question of "How do I master my data, so that if a customer record is coming from one source, or from, say, five sources, which one do I trust?"
Ken Collier:
So that interoperability issue requires some global governance umbrella that sits over the top of this Mesh, and you may want to elaborate on that.
Zhamak Dehghani:
Yeah, I think with any distributed architecture, if you don't have standardization at the seams, which is how we communicate information, and you don't have that global governance, the system just falls apart.
Zhamak Dehghani:
The analogy I give is that the API revolution happened because we had standardization around HTTP and REST to allow interoperability. So if I now need to join data from two different sources, what sort of standardization or interoperability do I need to build in so that I can actually join data from different sources? That comes under the umbrella of that global or central governance.
Neal Ford:
Well, being a long-time architecture guy, I can see the parallels where the Mesh idea comes from, because service meshes are very popular in the microservices world. They're a way to consolidate and couple the operational concerns in the architecture while leaving all the domains decoupled, and you're using this for the same purpose here: as a way of tying your operational concerns together, like queryability, while leaving the domains highly decoupled from one another. So the name matches very nicely.
Zhamak Dehghani:
Absolutely, I didn't use a whole lot of imagination to come up with anything.
Neal Ford:
But that also provides you a platform for doing this kind of automated governance at the Data Mesh level, because if there are certain things that the services need to expose, you can build that into the platform and make sure that all of them have a consistent interface, so that you can get to the data you need. That allows you to pave over differences between graph databases, relational databases, or name-value pairs, those kinds of differences.
Ken Collier:
So capabilities like encryption or de-identification, or other kinds of common transformations or calculations, could live at that infrastructure level, and be consumed by the product teams that are creating the data domains.
Mike Mason:
That's something that's quite interesting to me. One of the use cases that you talked about was de-identifying the data, so that the de-identified data product could be used by a different team to generate useful machine-learning-based insights or whatever, but in a way where they don't need to be cleared for access to that PII data, or whatever else it is with a healthcare provider.
Mike Mason:
That actually seems quite interesting and powerful, because one of the problems that we run into, when we talk about democratizing access to data and all this stuff, is that the first thing you get is somebody saying, "well, that's highly personal patient data and we need to secure it really well," and then you run into roadblocks on being able to do anything interesting. This seems to be a really interesting way of producing safe data sets that people can be authorized to use.
Ken Collier:
I think so. One of the things that I've thought about in this, especially in the healthcare sector, is that you may now focus on role-based access control at the data domain level, as opposed to worrying about cell-level or row-level authorization. So you may say, well, we have this data domain that is de-identified, and it's trustworthy, and it's verified that we're not going to identify patients, therefore a broader universe of users, analysts, and data scientists can subscribe to it. Meanwhile, here's another data domain that has patients who are still identifiable, and a smaller subset of people are allowed to subscribe to or consume data from there.
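A minimal sketch of what domain-level role-based access control could look like, with invented role and product names; access is granted per data product, not per row or cell:

```python
# Hypothetical mapping from data products to the roles allowed to subscribe.
PRODUCT_ROLES: dict[str, set[str]] = {
    "members.de_identified": {"analyst", "data-scientist"},  # broad audience
    "members.identified": {"care-coordinator"},              # narrow audience
}


def can_subscribe(user_roles: set[str], product: str) -> bool:
    # Governance happens at the domain seam: one check per product.
    return bool(user_roles & PRODUCT_ROLES.get(product, set()))


assert can_subscribe({"data-scientist"}, "members.de_identified")
assert not can_subscribe({"data-scientist"}, "members.identified")
```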
Mike Mason:
And the thing that's interesting is that even with the de-identified version, people can create useful insights and say, "Look, I've clustered the data, and I want to give this advice to this cluster of patients." You can create that insight and give it to the folks who still actually have all the patient names and addresses, in order to actually get the advice out there, but without causing a privacy issue throughout.
Zhamak Dehghani:
Yeah, absolutely. And I think one of the key aspects of interoperability is that you could still pass around those global identifiers, or federated identifiers, without passing the personal details, and using those global identifiers you can join the personal information back to the insights that you found.
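As an illustration of that global-identifier idea, with invented data and names: the analytics side works only with stable pseudonymous IDs, and a privileged step joins the insights back to personal details at the very end:

```python
# Insights produced from de-identified data only; no PII ever leaves that side.
insights = [
    {"global_id": "g-42", "advice": "reduce salt intake"},
]

# Held only by the team authorized to see personal information.
pii_lookup = {
    "g-42": {"name": "Pat Smith", "address": "1 Main St"},
}


def deliver_advice(insights: list[dict], pii_lookup: dict) -> None:
    for row in insights:
        # Join on the shared global identifier at the last responsible moment.
        person = pii_lookup[row["global_id"]]
        print(f"Send '{row['advice']}' to {person['name']}")


deliver_advice(insights, pii_lookup)
```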
Ken Collier:
One of the other benefits to this way of thinking, especially bringing product thinking into the discussion, is that each data domain is supported by a cross-functional product team, and that product team includes business domain experts as well as technologists and whoever else needs to be involved. Through product ownership, those data domains don't need to live on if they're no longer useful, whereas in our current paradigm, data accumulates immutably, data models grow immutably, data just never really gets cleaned up, and it incurs a lot of technical debt.
Ken Collier:
So this is the notion of encapsulating data as domain data products, with investment being made in the product as long as that product is serving a useful life, and then killing it when it's not. If you don't have users anymore who need that data domain, the data is still available upstream, and you don't need the domain products to live on.
Neal Ford:
That's a huge advantage of not centralizing all that data, because when it gets centralized, it gets coupled too, and you can't get rid of it, and so you're left with it forever.
Neal Ford:
So let me engage in a little bit of metaphorical whimsy here. I think what you guys are suggesting is that rather than Data Lakes, what we've actually had are Data Oceans, and what you're proposing are Data Ponds, with canals between them.
Ken Collier:
Yeah, that's a fair analogy. One of the architects at our healthcare company refers to this as Michigan, the Land of Lakes.
Zhamak Dehghani:
I've been trying to not use the water metaphor.
Ken Collier:
Yeah. Just trying to stay away from water.
Zhamak Dehghani:
I think, to your point, Ken, these cross-functional teams, with product ownership being a recognized role, are so important. But I think there is another side effect.
Zhamak Dehghani:
As an industry, we are struggling to find data engineers, both ways: organizations are struggling to hire, and software engineers are struggling to develop those skills, because of the siloing. If you're already a data engineer somehow, then you know all these tools; you go into these siloed data engineering teams and work with your fellow data engineers on these data platforms, and that has caused data engineers to miss out on a lot of the advancements in software engineering practices that have happened in the operational world. Conversely, if you're a software engineer, you never talk to or sit next to a data engineer colleague to learn from them.
Zhamak Dehghani:
I read the stats: in 2016, I think LinkedIn had 60,000 people, if I remember correctly, who claimed to be data engineers, and that year, in the Bay Area alone, there were 65,000 open jobs for data engineers, and I'm sure it has grown since then. I think by bringing software engineers and data engineers together as one team, we allow that cross-pollination, so that software engineers can add working with data engineering tools to their tool belts, and that becomes just part of the generalist toolkit.
Ken Collier:
I think it is important to point out that the tooling doesn't change, and the underlying technologies don't need to change; we don't need any special new things. We just need to rethink how we're implementing and managing.
Zhamak Dehghani:
Yeah, I think it's just an inverted model, and I hope that with this inverted thinking we develop a whole new language to go with it, because that's important.
Neal Ford:
Well, it sounds like taking some of the best practices and good perspectives we picked up from the software architecture world, particularly microservices, seeing the obvious parallels, and applying those same principles to the data world.
Zhamak Dehghani:
I think when I visualize the ideal state, when we get there, data is really part of the fabric of the organization, the same way that APIs today are part of the fabric of the organization, and not siloed in some Lake or platform in the corner.
Ken Collier:
And one question that comes up, and I think it's a legitimate one, is: isn't there still the need to get a holistic view of the enterprise, and to do interesting analysis, a more 360-degree view, from all these data sources? In that context, it may make sense to have some very clearly stated use cases or analytical goals, and if that's necessary, then that becomes a data domain. It's just a different type of data domain, and you create it for the kind of holistic view that you want, rather than it being the central source.
Zhamak Dehghani:
And I think for that to happen, we do need some centralized views, or global views, of this data. So even though we have this kind of decentralized world, with different teams owning the data, there should be governance in place that says: if your data wants to become a data product used by the rest of the organization, it needs to register itself with this catalog so it can be found, and it needs to have this sort of documentation. That data catalog, or data discovery tool, is a globally available, central tool. Right now there are a lot of technologies around data cataloging, but they came from a different place: a need for discovering data that is siloed and hidden, not data that is intentionally designed to be shared. So I hope that the next generation of data catalogs will actually support data teams that are intentionally trying to make their data available and discoverable.
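As a small sketch of a catalog built for intentional sharing, where a global governance rule rejects undocumented products, here is a hypothetical DataCatalog; the API is invented for illustration:

```python
class DataCatalog:
    """A central, globally available registry for domain data products."""

    def __init__(self):
        self._products: dict[str, dict] = {}

    def register(self, name: str, owner: str, documentation: str) -> None:
        # Governance rule: a product is only discoverable if it ships with
        # documentation and a named owner.
        if not documentation.strip():
            raise ValueError("Governance: products must ship documentation.")
        self._products[name] = {"owner": owner, "docs": documentation}

    def discover(self, keyword: str) -> list[str]:
        # Consumers search the catalog rather than hunting for hidden data.
        return [name for name in self._products if keyword in name]


catalog = DataCatalog()
catalog.register("claims.monthly_snapshots", "claims-domain",
                 "De-identified monthly claim snapshots.")
print(catalog.discover("claims"))
```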
Neal Ford:
Okay, so if you had to summarize this approach in one sentence, what do you think that that summary would be?
Zhamak Dehghani:
It's going to be a long sentence.
Neal Ford:
That's okay, as long as it's a sentence.
Zhamak Dehghani:
It's a Mesh of data, organized around domains, owned by cross-functional teams, governed by a centralized governance to allow interoperability, and served by a self-serve infrastructure.
Zhamak Dehghani:
I hope that makes sense.
Neal Ford:
Perfect.
Mike Mason:
If people wanted to find out more, where could they do that?
Zhamak Dehghani:
Actually, on Martin Fowler's website right now, there is an article on how to move beyond a monolithic Data Lake to a distributed Data Mesh. That's where they can find it, or they can reach us on Twitter.
Mike Mason:
And we'll link it in the show notes as well.
Neal Ford:
All right, thank you very much, very interesting and very informative.
Mike Mason:
Thank you both for joining us.
Zhamak Dehghani:
Thank you.
Ken Collier:
Thank you.
Rebecca Parsons:
Next time on the Thoughtworks podcast, I will be speaking with Satyam Argawala about compliance as code, and we'll be seeing how we're bringing yet another operations and organizational function into the *-as-code family, looking at compliance as code and some of the implications of automating governance, risk, and compliance. So please join us. Thank you.