Brief summary
What if your AI agents could think more like IT operations staff — and less like tools? In this episode, we catch up with Zichuan Xiong to explore the Model Context Protocol (MCP) — a powerful new way to give AI agents deeper awareness of the tools, information and history they need to work effectively in the operations space.
Unlike traditional APIs that just trigger functions, MCP adds a semantic layer of context that helps AI understand what to do, why it matters and how to do it better.
Whether you’re deep in site reliability engineering (SRE) or just curious about the next leap in AIOps, this episode unpacks how MCP could be the missing layer between today’s tools and tomorrow’s autonomous systems.
If you want to find out more, check out this piece co-written by Zichuan.
Ken Mugrage: Hello, everybody. Welcome to another edition of the Thoughtworks Technology Podcast. My name is Ken Mugrage. I am one of your regular hosts. With me today is a longtime Thoughtworker, works a lot on our AI and so forth. I'll let him introduce himself. Zichuan, you want to introduce yourself?
Zichuan Xiong: Hi, everyone. My name is Zichuan. I joined Thoughtworks about 18 years back, and in those 18 years I've moved through a lot of different roles. I got into AI about three or four years back, leading AI solutioning at Thoughtworks with a focus on industry solutions. Then I joined our managed service business at Thoughtworks, called DAMO. Right now, I'm handling AIOps for that managed service business. Usually, we sign three-year deals with customers, and my job is to bring AI solutions into our IT operations, focusing on infrastructure managed services, application managed services, and also data managed services. That's me. I'm based in San Francisco, and MCP is a huge deal for me. Recently, I wrote articles about MCP, and I'm excited about the breakthroughs in AI technology over the past three years.
Ken: I really appreciate you joining, and as you hinted at there, I have a background in DevOps and SRE and that whole movement, and I know that we've interacted that way in the past. You recently did an article with a couple of our partners talking specifically about MCPs and APIs in the SRE-type context. What I wanted to do today is dive into that a little bit for our listeners. First off, just from a context perspective, as an SRE, why do you need more than just APIs? Why MCP? What does it bring? That sort of thing.
Zichuan: When I think about this, SRE means site reliability engineering, and in the past 15 or 20 years, SRE as a term has expanded a lot. If you think about it, it's not only about a site; it can be any type of workload. It can be an AI application, it can be a data product, it can be a pipeline, multiple different things. Reliability has changed a lot too. When people think about reliability, it's not just a non-functional requirement, one of the -ilities. Reliability can mean trust, it can mean continuity, it can be customer experience, it can be performance, multiple different things.
Also, engineering has changed a lot. How we do things is so different compared to 15 years back. To me, context is a huge deal in SRE, because you always need to consider the context, whether you're investigating an incident or trying to figure out how to resolve it. So the backdrop here is a shifted understanding of SRE, plus a lot of technology leaders trying to build AI into their existing SRE practice. Going back two years, when people started talking about agentic AI flows, they were talking about building AI agents into a linear value chain across different workflows.
That's when technology leaders started thinking about bringing AI agents into their SRE practices. About a year ago, Anthropic brought MCP into the game, talking about bringing context to the AI agent so the agent can perform better. That's really the whole story of how things are going: from the changing landscape of SRE, to AI agent usage, to MCP as a context provider that enables AI agents to play better and make better decisions. That's the context we're seeing, and that's why I personally believe MCP is a huge deal for SRE: SRE is changing, AI agents are changing the game, and AI agents require more context in the SRE space.
Ken: Just to be clear, this is doing the site reliability engineering on any application, even if the application isn't AI. It's using AI and MCP and all these things to manage and run and secure and so on, but it's not only for AI apps. Is that correct?
Zichuan: Yes, that's correct. We keep talking about AI for operations and also operations for AI. They're two different things. In this context, I'll mainly talk about AI for operations: operations for any type of workload. It can be an application, it can be a data pipeline, anything beyond AI. An AI application is just one type of workload. That's the context we're bringing here.
Ken: Then just for the benefit of our listeners that may not be aware, what's your short definition of MCP or Model Context Protocol? What do you mean when you say it? Just so they know what we're talking about the rest of the podcast.
Zichuan: MCP means Model Context Protocol. To me, it's a semantic, dynamic context layer that enables AI agents. It helps an AI agent understand the context better, so it can perform better and generate more specific output in a very responsible way, which is really required in SRE practice.
Ken: That's great. In the past, we've talked about APIs in an SRE context. You mentioned the -ilities earlier: observability and scalability and all those. We would have an API that we would use to get information, and often that information is information overload. What's the paradigm shift here? Why MCP and not just the APIs that have worked?
Zichuan: To me, I use this analogy. My wife always gives me a Post-it: "Go to the grocery store to buy stuff." The Post-it is just a list of things. I need to buy milk or chicken wings or something like that. I don't need to ask a question. I just go there, make sure the labels match, and buy it. I don't need to know what she's preparing or what type of dinner these things are for. I just pick up the order and deliver it. To me, that's API. It's a function, it's a script, it's structured. 99% of the time it's not wrong if you follow the instructions, if you follow the schema. Versus MCP is something like, she will say, "Sweetheart, we're going to have an anniversary, our 10th anniversary, next month." What do you do?
That, to me, is MCP. I need to figure out the context. I need to figure out, oh, we're having this anniversary, I need to find out what she likes, and try to plan something. To me, that's really dynamic, that's contextual, that's semantic. You have to put something together with my wife. That's, to me, the difference between MCP and API. API is really good at those strict function-based or schema-based structured inputs and outputs between different systems. MCP is for more dynamic, open-ended requests and generative tasks, where you also have to consider the context. That's the difference between the two things. I would say MCP is not replacing API; they're just for different purposes, different scenarios. There are different jobs to be done, and you need to apply different technologies to them.
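Zichuan's analogy can be sketched in code. Below is a toy illustration, not the real MCP wire protocol or SDK; all names and shapes are hypothetical. The point is the contrast: an API call is a fixed function with a strict schema, while an MCP-style interaction lets the agent gather context first and then plan.

```python
# Toy contrast between a rigid API call and an MCP-style contextual exchange.
# All names here are illustrative; this is not the real MCP protocol.

def grocery_api(order: dict) -> str:
    """API style: strict schema in, deterministic result out."""
    required = {"item", "quantity"}
    if not required.issubset(order):
        raise ValueError(f"missing fields: {required - order.keys()}")
    return f"bought {order['quantity']} x {order['item']}"

def mcp_style_request(goal: str, context_sources: dict) -> str:
    """MCP style: pull in context first, then plan.

    context_sources stands in for MCP servers the agent can query.
    A real agent would hand `goal` plus the gathered context to an
    LLM; here we just assemble the context into a plan string."""
    gathered = {name: fetch() for name, fetch in context_sources.items()}
    facts = "; ".join(f"{k}={v}" for k, v in gathered.items())
    return f"plan for '{goal}' using context [{facts}]"

# API: the Post-it note. No context needed; the schema is everything.
print(grocery_api({"item": "milk", "quantity": 1}))

# MCP: "our anniversary is next month" is open-ended and context-driven.
print(mcp_style_request(
    "plan 10th anniversary",
    {"preferences": lambda: "likes Italian food",
     "calendar": lambda: "free on the 14th"},
))
```

The structured call fails loudly when the schema isn't met, while the contextual call adapts to whatever context sources are plugged in; that is the "different jobs, different technologies" distinction in miniature.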
Ken: Would this be something like, for incident response, for example?
Zichuan: Yes. If you think about incident response, when you have an incident, as an engineer you will connect multiple different contexts to that incident. You have to look at the logs to understand what's going on. From the logs, you have to go to the different impacted systems to understand the impact on those systems. Also, you need to connect the impact to the business metrics to understand the priority: whether this is hurting business continuity, whether this is raising a security issue. You always need to bring that business context in.
Then you have to connect experts to those different contexts. If this is about security, you have to connect to a security expert; if this is about compliance, you have to connect to a compliance expert. In this case, investigating an incident and trying to find a resolution is a context-shifting gig. It's a very dynamic way of doing things, shifting contexts. That's why I personally believe MCP plays a great role here: incident management and resolution is a very dynamic, open-ended, decision-making task. Even though APIs provide support, focused on rule-based support, that dynamic, context-shifting exploration is also important in incident management. That's why MCP plays a critical role in supporting this.
Ken: What would be the scope here? For example, I want to do an incident response, and let's say I'm a retailer, and nobody can place an order on my website. That could be the database, it could be throughput, it could be disk space, it could be a million things. Forgive me if I say something silly from an agentic architecture perspective, but do you have different agents for each of those things that are all sharing a context through MCP? Do you have multiple of these? What does this look like on the ground from an implementation perspective?
Zichuan: From an implementation perspective, there are just two parts, and they're very simple: you have the MCP server and the MCP client. MCP servers are the knowledge systems that provide the context. Let me give you an example. Right now, we're building multiple different MCP clients for our clients' adoption. We build a post-mortem analysis report every time we have an incident, and we're trying to create an agent to generate that report. If you think about that report generation, it will consume the context from past incidents, try to capture the learning, and turn that into a report.
In this case, we're communicating with two MCP servers. One is the Jira MCP server, which provides all the incident details. Then you have the Atlassian Confluence MCP server to create the report. Our agent connects those two servers and brings the two contexts together: one is the incident context, the other is the report format context, and it puts them together to generate the post-mortem analysis. Right now, it's still case by case. Every time we have a solution, we think about whether, rather than building a customized AI agent to do that, writing those instructions into the prompt and building a special knowledge base, we can get help from the tool providers, the MCP server providers, to bring context to the AI agents. Right now, we're still at the stage of building those MCP clients case by case. That's really the progress we're making today: building connections with multiple different MCP servers.
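The post-mortem flow described above can be sketched roughly like this. The fetch functions below are hypothetical stand-ins for calls to the Jira and Confluence MCP servers; the real servers expose tools over the MCP protocol, and the field names and data here are assumptions for illustration only:

```python
# Sketch of an agent that merges two MCP contexts into a post-mortem.
# The fetch_* functions are hypothetical stand-ins for MCP server calls.

def fetch_incident_context(incident_id: str) -> dict:
    """Stand-in for the Jira MCP server: incident details."""
    return {
        "id": incident_id,
        "summary": "Checkout service 5xx spike",
        "root_cause": "connection pool exhaustion",
        "resolution": "raised pool size, added circuit breaker",
    }

def fetch_report_template() -> list[str]:
    """Stand-in for the Confluence MCP server: report sections."""
    return ["Summary", "Root cause", "Resolution", "Lessons learned"]

def build_postmortem(incident_id: str) -> str:
    """The 'agent': join the two contexts into one report.

    A real agent would pass both contexts to an LLM; here we fill
    the template mechanically so the flow is visible."""
    incident = fetch_incident_context(incident_id)
    sections = fetch_report_template()
    body = [f"Post-mortem for {incident['id']}"]
    for section in sections:
        key = section.lower().replace(" ", "_")
        body.append(f"## {section}\n{incident.get(key, 'TBD')}")
    return "\n".join(body)

print(build_postmortem("INC-1042"))
```

The key design point is that the agent itself stays thin: the incident knowledge and the report format each live behind their own context provider, so swapping Jira for another tracker would only change one fetcher.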
Ken: For the whole buy versus build thing, does that imply that at some point, if people have a particular type of infrastructure with a particular type of application, they'll be able to purchase the pieces to do this, or is this always going to be bake-at-home? What's that look like?
Zichuan: My point of view is always buy first; build if necessary. If there's something super important to you, something you think is really your differentiator, then you build. Otherwise, always take a buy-to-accelerate approach. The good news in this space is that most of the mainstream toolchain providers in the SRE space are releasing some sort of MCP server. I talk to at least 10 partners every month, and every single company right now has a roadmap for some sort of MCP server. What you need to do is just build some MCP clients, and that's it.
Also, to build an MCP client, you can use AWS Bedrock or Vertex AI. They support a lot of features, so you can easily build out those MCP clients. To me, building those things is not that hard. Maybe adoption is the harder part: the eval is going to be a problem, the security check is going to be an issue. The industry hasn't really solved that yet. It's not solved yet, but I personally think the whole industry will catch up. I'm trying to build out more solutions focusing on a scaled approach, a scaled implementation, for those MCP workloads in the future.
Ken: If we say that APIs are one side of an equation and full MCP servers with multiple clients and handing off context and then there's the other side, what's in the middle? Is there anything, or is it a jump?
Zichuan: To me, of course, orchestration is super important. Again, we're experimenting here. I don't think the market has a good definition or framework yet for that orchestration layer in between, deciding whether something goes to an API or to MCP. That gateway is something I see multiple technology providers starting to think about. Right now, we've actually created something like a prompt-based routing system to decide whether you should reach out to MCP to get context. If the user query is natural language and we think it's super dynamic, not structured enough, then you should reach out to MCP for help.
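A prompt-based router like the one described could look roughly like this. The real system presumably uses an LLM or prompt classification to make the call; this sketch uses simple heuristics as a stand-in, and every name, pattern, and threshold here is illustrative rather than anything from the actual implementation:

```python
# Toy router: decide whether a query goes to an API or to MCP.
# Regex heuristics stand in for the LLM-based classification a real
# gateway would use; names and patterns are illustrative only.

import re

STRUCTURED_PATTERNS = [
    r"^get \w+",          # e.g. "get cpu_usage"
    r"^restart \w+",      # e.g. "restart checkout-service"
    r"^\w+\([^)]*\)$",    # looks like a direct function call
]

def route(query: str) -> str:
    """Return 'api' for structured commands, 'mcp' for open-ended asks."""
    q = query.strip().lower()
    if any(re.match(p, q) for p in STRUCTURED_PATTERNS):
        return "api"
    # Natural-language, dynamic queries need context: send to MCP.
    return "mcp"

print(route("get cpu_usage"))                     # structured command
print(route("why are checkout orders failing?"))  # open-ended question
```

Structured commands match a known schema and can go straight to an API; anything that reads like an open question gets routed to the MCP side, where context gathering happens before any action.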
Right now, we're also waiting for the industry to develop some really good gateway thinking. We also see some players focused on MCP marketplaces, trying to wrap all the MCP server access, all the security setup, and even the MCP client building into services. Those are really new things. Here's what I tell my clients: at a certain stage, maybe you don't go too fast, because a lot of companies are not even implementing agents into their SRE value chain yet. I would say, try something first: build out your 15 AI agents in SRE.
Then, when you run into the problem of those AI agents not being able to share good context and knowledge, bring in some MCP. The next step, when your MCP ecosystem gets complex, is to bring in some thinking around an MCP gateway or MCP orchestration, to really consider the API ecosystem and the MCP ecosystem together. To me, that's the future step. I see only a small portion of clients who have tried to go that far right now.
Ken: Then, switching gears just a little. There have been technologies that have come out, if we only look at the last 10 or so years, like Docker and Kubernetes and all that stuff, that were open source, had standards, and built fairly good momentum on their own with different people contributing and so forth. Then different cloud providers would pick them up, and the implementations started to drift. It was easy to say, "Oh, I can do my Docker container, I can run it anywhere," or, "I can have my Helm chart for Kubernetes, and I can run it anywhere."
In practice, it actually ran a little differently on cloud provider A versus cloud provider B, or what have you. It wasn't so easy that you could pick up your Kubernetes-based application and shift it, like the dream was. How are they behaving here? Anthropic came out with an open-source thing less than a year ago, and my understanding is OpenAI and Google DeepMind and Microsoft and all these others are saying, "Yes, this is a good idea." How are they behaving, though? Are we diverging? Do we actually have a standard? What's your impression?
Zichuan: I don't think there's a real discussion around a standard yet. As I said, MCP is still at an early stage, to be honest with you, especially in client and enterprise adoption. If you look at, let's say, 60 of our clients' SRE managed services, I would say 30% of them are implementing AI into their SRE, and I myself, my job is to push AI adoption. Maybe only another 10% are considering MCP, because they have implemented multiple different agents and context sharing is becoming the problem for MCP to solve.
I still think the adoption is not there yet, but I personally believe it's a good paradigm to think about. What you just mentioned, maybe that's a happy problem to solve in the next six months. I don't think it's a huge deal right now. People are talking about a standard, talking about the security of MCP, but we're not there yet. That's just my observation of the industry, and maybe that's specific to certain areas; I'm focused on the SRE space, and MCP is being applied in multiple other spaces.
Ken: If I'm on an SRE team or that type of team (I'm trying really hard not to say DevOps team, because I'm a believer there shouldn't be such a thing), if my job is running and managing these things and understanding what to have for incident response and so forth, what should I be learning? What should I be watching? What should I be reading? How do I make sure that I'm not falling behind on this stuff? Because, like you said, it's all still being defined.
Zichuan: My tangible suggestion to all the engineering leaders, the operations leaders in enterprises, is: just create a list of your mainstream, existing toolchain. Whether it's Dynatrace, Datadog, New Relic, whatever, you must have 15 of them. Then reach out to them and ask a direct question: what's your roadmap for MCP? Some will say, "Oh, we already have an MCP server." Then you say, "Let's do a demo." Ask them to provide a demo connecting some of your AI agents to their MCP server and see the difference, because with MCP right now, you're going to rely on those tool providers to help you accelerate.
That's always my genuine suggestion to the IT leaders out there: use your partners to do that. They have a plan to deploy MCP; ask them the question. That's my suggestion. Learning is one thing, but implementing is another. Find tangible use cases and just use it. We use [unintelligible 00:20:58], which we think is promising; it's getting tangible business value for us. Just start implementing and experimenting.
Ken: Admittedly, some of this is hype marketing, but there are a lot of AI-enabled tools and agents and agentic architectures and that kind of thing that have been out for a couple of years now, and much longer if we look beyond LLMs. AI was not actually invented two years ago. There's lots of stuff out there. If someone is already using something that does this, that uses AI and brings in the context another way, whether it's RAG or whatever it is, are there compelling reasons for them to look at changing, or should they wait for this to mature a little more? What do you think about that?
Zichuan: My suggestion: the reason you use MCP is to augment your existing RAG approach, or to improve the efficiency and performance of your AI agents. My opinion is really to just try it, because I think this is tangible. At least from a practitioner's point of view, we're implementing this. We're working with accelerators and with different partners. We believe this is working, because this is our daily job: we're managing our customers' infrastructure and applications, we're trying to do things faster, and there is a business case for us.
Do something in the context of driving a business case, and this area really has a business case. Of all the AI focus areas, I think SRE is definitely a big one, with a great use case and a great business case to deliver. My suggestion is, don't hesitate; work with the community. Of course, there is hype in the market, for sure, but calling something hype is easy, to be honest with you. Going into the hype and trying something out is hard. That's really my suggestion.
Ken: I love that. People are like, "Oh, it's all hype." No, it's not all hype. Hype comes from somewhere.
Zichuan: It's also my job to differentiate what's real and what's not, but always have your own point of view. The only way to have your own point of view is to try something out.
Ken: That's a really good point. If I'm an engineering leader and I'm developing new applications, and I'm going to be creating them, let's assume greenfield, not legacy: are there things that I should be thinking about from an architecture perspective to make this sort of thing easier later? Like, "I want to make my SRE more effective. I want to be able to solve incidents quicker. I want faster mean time to recovery, all the stuff from DORA, et cetera." Are there things that these folks should be thinking about from an architecture perspective, or even team structure? What should people be doing to make sure that they make you more effective?
Zichuan: Because everything we're talking about is making things easy to operate, it follows the same principles as when we're thinking about cloud-native applications. The best practices for building cloud-native applications apply here. I don't have special advice for technology leaders like, "Make the architecture better so that SRE can be better." To me, it's observability first: you always need to make sure your systems are observable by the different tools. And always look at your vendor list. Look at your vendors.
Some tool providers are cloud-native, AI-native, or AI-first. Some providers are established: their tools and capabilities have been around for a long time, but maybe they're not that cloud-native. Those are just two kinds of selections. Try to combine them in your strategy: look at those big, established tool providers operating at large scale, but always bring in some of the AI-native ones. Those new companies, built out only two or three years ago, bring new practices focused on SRE. I always want to take a partnership or ecosystem play: use your ecosystem to evolve your tech stack. It's not only about the technical or architecture decisions you're making; you also use your ecosystem to drive you to improve your tech stack, to improve your tech practice.
That's really the reason you work with Thoughtworks: Thoughtworks always brings some new ideas. Of course, you can work with a large, established company providing services and tools. It's stable, it's predictable. But to drive innovation, sometimes you have to bring those ecosystem players who are AI-native or AI-first into your toolchain. That's really my suggestion.
Ken: All right. I really appreciate that. For the listeners, we'll also put a link to the article that Zichuan wrote with JJ Tang and Rob Skillington in the show notes so you can check it there. Just, thank you for your time, but is there anything in closing that you'd like to add, something I didn't ask you about that you think people should hear?
Zichuan: We're doing multiple things with MCP. If you're implementing MCP, I would suggest you consider the following areas. It's all a matter of context: what context you bring to your incident management. We are trying the following things, which I think are worth recommending and sharing. One is bringing the business context into incident problem-solving, resolution, and investigation. For example, on GCP, we're trying Cloud Run as the MCP technology to bring the business context into incident triaging and resolution. That's one.
The second context you can bring is the security context. We're working with Panther, a security company, bringing their MCP server into security detection, incident detection, and detection-logic creation as a use case. The third interesting context is the observability context. Think about Chronosphere, think about New Relic: they hold all the log information. We're using their MCP servers to connect SRE engineering with the observability context. Those are three areas: business context, security context, observability context. If you're experimenting with this, those are tangible use cases you can work on.
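The three context types above could come together in a triage step roughly like this. The fetchers are hypothetical stand-ins for the business, security, and observability MCP servers mentioned; none of the names, fields, or the priority policy reflect any vendor's real API:

```python
# Toy triage: merge business, security, and observability context
# for one incident. Each fetcher stands in for an MCP server call;
# all names, fields, and thresholds are illustrative only.

def fetch_business_context(incident_id: str) -> dict:
    return {"revenue_impacting": True, "affected_journey": "checkout"}

def fetch_security_context(incident_id: str) -> dict:
    return {"suspicious_activity": False}

def fetch_observability_context(incident_id: str) -> dict:
    return {"error_rate": 0.23, "service": "payments"}

def triage(incident_id: str) -> dict:
    """Combine the three contexts and derive a priority."""
    biz = fetch_business_context(incident_id)
    sec = fetch_security_context(incident_id)
    obs = fetch_observability_context(incident_id)
    # Simple illustrative policy: security or revenue impact means P1.
    if sec["suspicious_activity"] or biz["revenue_impacting"]:
        priority = "P1"
    elif obs["error_rate"] > 0.05:
        priority = "P2"
    else:
        priority = "P3"
    return {"incident": incident_id, "priority": priority,
            "service": obs["service"], "journey": biz["affected_journey"]}

print(triage("INC-2001"))
```

The structure mirrors the advice in the conversation: each context lives behind its own provider, and the triage logic only has to compose them, so adding a fourth context later would not disturb the first three.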
Ken: Great. Again, thank you very much for your time. I appreciate it, and we'll speak to you later.
Zichuan: Yes. No problem. Thank you, Ken. I'll talk to you later.