Brief summary
Everyone seems to be talking about context engineering. That was certainly the case in our recent discussions for the upcoming edition of the Technology Radar (volume 33, due early November 2025). And although we ran into the term on the Technology Podcast just a few weeks ago, we thought it would be useful to try and tackle exactly what people are talking about when they talk about context engineering. We know context is important when it comes to AI, but what does it mean to engineer it?
On this episode of the Technology Podcast, host and Thoughtworks CTO Rachel Laycock is joined by Thoughtworkers Alessio Ferri and Bharani Subramaniam to discuss what context engineering is, how it's being done and what it tells us about the evolution of AI. This certainly won't be the last word — ours or anyone else's — on context engineering, but it might help clarify and cement your understanding as the term comes to dominate technology conversations.
Episode transcript
Rachel Laycock: Hello, everyone, and welcome to the Thoughtworks Technology Podcast. I'm Rachel Laycock, the global CTO for Thoughtworks, and I'm joined today by...?
Bharani Subramaniam: Bharani Subramaniam.
Rachel: Okay, and?
Alessio Ferri: Alessio Ferri. Hi, everyone.
Rachel: We're going to talk about a hot topic from putting together our Technology Radar this week, which is context engineering. We had quite a hot debate about what it is, how you define it, and what's going on in the industry. I have these two special guests with me who seem to be deep experts in this space right now, which is rare given how new it is. Let's start with: what is your definition of context engineering?
Bharani: Context engineering is this emerging field where you curate what the model sees so that you get a better result. That is what context engineering is.
Alessio: Maybe before we go into the context engineering piece, I'd like to define what the actual context is. Context is this concept of providing tokens to a model so that it can eventually produce an output. The engineering is all the practice that sits around that: applying engineering principles throughout the design of what goes into this model, how we build this context, and also how we test it. Context is actually a very broad topic that could include, for example, memory, conversations, and data that's being accessed by an LLM. It's an umbrella term that covers everything that goes into a model as its input.
Rachel: I've heard it argued, though, that this is actually prompt engineering, and that it's just a piece of prompt engineering. What are your thoughts on that? Do you think it's a different thing? Do you think it's an extension of prompt engineering?
Bharani: Yes, I think this is mostly true and, at times, false, because some people say context engineering is a superset of prompt engineering. But if you look at it, we started with a prompt, a carefully curated template, so that a user can structure the input to get the desired output. What really happened is that when you're building complex systems, it is not just one hop, single prompt, single output. You're going to have multiple hops. Then "carefully curated" is no longer true.
You're no longer worried about how you're structuring a given prompt. You're going to deal with a history of such conversations. That's where I think you have to make the switch from "how am I crafting my input?" to "what does the model actually see?" That shift is what, in my mind, made us say, "Let me worry about the bigger picture of what the model sees, as opposed to how I'm crafting a single piece of input."
Rachel: What impact does it have? What does a good version of this look like? Maybe give some examples of doing it well, and the impact of doing it well versus not doing it well?
Bharani: I would say we're still discovering what works well and what doesn't, because if you think about it, almost all models right now are stateless, in the sense that they don't have any memory; it's a pure function. If you think about it, the only thing that can affect the output is the input you give to the model. What we have seen is that if you have an extremely large input, irrespective of how big the capability of the model is, even though a model can take, let's say, a million tokens, if you give it a really large input, we have seen the actual quality of the response degrade.
You also burn a lot more tokens when you have a very lengthy, messy input. I think one technique that has worked for us, and this is where it gets subjective, is that over time you need to compact and curate your history. You don't need the entire messy output of those tools. You may have to consider compacting because, the way I put it, it's almost like when you visit a doctor: they need all of your history, but they also need a neat little summary so they can quickly see what has happened and start from there.
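To make the compaction idea concrete, here is a minimal sketch. It assumes a chat-style history of role/content messages; the summarize helper is a placeholder (in practice it would likely be a cheap model call itself), and the names are illustrative rather than any particular framework's API.

```python
def summarize(turns):
    # Placeholder: in a real system this would ask a model for a short,
    # doctor's-note style summary of the older turns.
    return "Summary of earlier conversation: " + "; ".join(
        t["content"][:40] for t in turns
    )

def compact(history, keep_last=4):
    """Keep the most recent turns verbatim; fold everything older into one summary turn."""
    if len(history) <= keep_last:
        return history
    older, recent = history[:-keep_last], history[-keep_last:]
    return [{"role": "system", "content": summarize(older)}] + recent
```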
Rachel: Yes, almost like when you present to executives and you need the summary up front, and then all the appendices later. Who's building the context? Who should care about it? Who has to craft it? What skills does it take?
Alessio: I think with context engineering, from what Bharani was saying before, we started by experimenting with prompts and understanding how we could better express to a model the instructions we wanted it to carry out for us. Prompt engineering was exposed to everyone: to developers who were using AI to build software, and to users who were interacting with models directly, such as on ChatGPT. This technology is quite ubiquitous today, and everyone can access it.
I think context engineering should be something that we all care about. It's not just those who are building AI or building with AI; it's also users, because as we interact more and more with these tools, as we feed information and documents to these interfaces that allow us to work with the data that we as organizations own, everything we feed into that context has to be relevant. We need to almost help the model that sits behind it to understand and focus on what's relevant for the task we're putting in the prompt.
There's also what we said before about the fact that over time we've seen context windows grow in LLMs but, as Bharani said, the attention focus hasn't grown as fast as those context windows. It's even more relevant today, because if we don't do context engineering, the impact, as you were saying before, Rachel, is that we have unnecessary costs when we run these models. We lose focus from the LLM, or it potentially misses important information, which can be quite impactful for the task we have ahead. Maybe it misses a specific requirement in a document that was fed into the LLM, or it potentially even hallucinates, because there's so much confusion in the context it sees that the model isn't yet powerful enough to handle it.
Rachel: I think escalating cost is certainly going to be a growing concern in the industry when you think about leveraging these models at scale. It'd be good maybe to dive in, and you can both share some of your own examples, because one of the things we talked about a lot in our sessions was the different techniques that exist. Maybe, Bharani and Alessio, you could tell us some of the techniques you've used, what the benefits have been, when you might use one technique over another, and what you've learned. I recognize it's early days, but I think people would be really interested to hear what our experiences have been so far.
Bharani: I would say the most important principle is that you need to build your entire context engineering around the KV cache. Imagine this: let's say you're sending the entire history with every single conversation turn. The model is not going to compute all of that extra input again; it's going to check, "Do I already have it in my cache or not?" The most important engineering technique is to treat the context as append-only.
You should be intentional about rewriting history, because every time you rewrite, the model is going to recompute, and it's not just the last instruction; it's going to recompute your entire new history. I would say the fundamental principle of context engineering is: context is mutable, but treat it as append-only. If you want to change it, be intentional about it. Let's say you're compacting history; then you need to be aware that it's going to cause all that recomputation. That, I would say, is the first principle.
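As a rough illustration of the append-only principle, here is a sketch in which the message list only ever grows, so the token prefix a provider has already processed stays identical and can be served from its KV cache. call_model is a hypothetical stand-in, not a real API; the point is what happens to the list, not the call.

```python
messages = [{"role": "system", "content": "You are a migration assistant."}]

def call_model(msgs):
    # Hypothetical stand-in for a real chat-completion call.
    return f"(model reply to: {msgs[-1]['content']})"

def ask(user_turn):
    # Append-only: earlier entries are never edited, so the already-processed
    # prefix stays byte-identical and cache hits remain possible.
    messages.append({"role": "user", "content": user_turn})
    reply = call_model(messages)
    messages.append({"role": "assistant", "content": reply})
    return reply

# Rewriting history, e.g. editing messages[1]["content"], changes the prefix,
# so everything after it has to be recomputed. Compaction is exactly such a
# rewrite; do it deliberately and accept the one-off recomputation cost.
```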
Rachel: What about you, Alessio?
Alessio: We mentioned before the different users and people that have to be concerned about context engineering, and we've seen some techniques that are very accessible to anyone, for example, anyone using AI to build software. There are emerging techniques such as what we call AGENTS.md, and every coding assistant provides its own version or name for it, but ultimately it's about providing context about, for example, the project you're working with: how the folders are structured, and why you've done it with that intention, so that the LLM can almost understand what your principles are, and as it addresses the tasks you feed it, it knows about that context.
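As an example of the kind of content such a file might carry, here is an illustrative AGENTS.md; the folder names, conventions and commands are invented for the sake of the sketch, not taken from any real project or tool.

```markdown
# AGENTS.md

## Project layout
- src/api/      HTTP handlers only; no business logic here
- src/domain/   Core business rules, framework-free
- src/adapters/ Database and third-party integrations

## Conventions
- Prefer small, pure functions; avoid introducing new global state
- Every new module needs a matching test under tests/

## Commands
- Run tests: ./scripts/test.sh
- Lint:      ./scripts/lint.sh
```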
That, I think, is why it's quite accessible for everyone interacting with these tools to provide that extra level of context to a model. I think there are some very interesting techniques; some of them we've been quite exposed to by building the CodeConcise accelerator internally at Thoughtworks. That is what we call RAG, and RAG has many different facets to it. What we ended up building is GraphRAG, which I find a very interesting technique, because it's not just about providing that context to the LLM; it's also about making the context understandable by people, because that context is a graph and you can navigate it. It's about making the context not only understandable by a model, but also optimizing for that context to be explorable by humans.
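To give a feel for the GraphRAG idea, here is a toy sketch, not the CodeConcise implementation: knowledge about a codebase is held as a graph, the neighbourhood of the entity the question is about is selected, and that subgraph is rendered as a compact, human-readable context block. All node names and relationships here are made up.

```python
graph = {
    "PaymentService": {"calls": ["LedgerRepository", "FraudCheck"], "doc": "Handles card payments"},
    "LedgerRepository": {"calls": [], "doc": "Reads and writes ledger entries"},
    "FraudCheck": {"calls": ["RulesEngine"], "doc": "Scores transactions for fraud"},
    "RulesEngine": {"calls": [], "doc": "Evaluates configurable fraud rules"},
}

def neighbourhood(start, depth=2):
    """Collect nodes reachable from start within depth hops."""
    seen, frontier = {start}, [start]
    for _ in range(depth):
        frontier = [n for node in frontier for n in graph[node]["calls"] if n not in seen]
        seen.update(frontier)
    return seen

def build_context(entity):
    """Render the relevant subgraph as text a human can read and a model can use."""
    lines = [
        f"{n}: {graph[n]['doc']} (calls: {', '.join(graph[n]['calls']) or 'nothing'})"
        for n in sorted(neighbourhood(entity))
    ]
    return "\n".join(lines)

print(build_context("PaymentService"))  # this text would be prepended to the prompt
```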
Rachel: Yes. That's funny, because one of the other topics we talked about was AI-friendly codebases, I remember. That made me think about another topic we explored: that the good coding practices we generally use to create modular, easy-to-read code actually make it easier for the AI to read it as well. Ironically, when you're generating lots of code with AI, it doesn't necessarily build it in a modular and clean way, so you almost need the humans in the loop. I guess that brings me to another question, which is: where are the important human-in-the-loop pieces of doing context engineering well?
Bharani: The point Alessio touched on is important. For example, we were thinking: let's say you have a piece of data, should you settle for a JSON format, or are you good with, let's say, a CSV? Both are human-readable, but the CSV is going to be much more compact, because you're not going to repeat your field names for every record and things like that. It fundamentally comes down to how many tokens you want to spend. It should be human-readable, but at the same time compact. I think that's where the empathy for the tokens comes in, because every token costs, and everything counts towards the latency as well.
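A quick way to see the difference is to serialize the same records both ways and compare sizes; character count is only a rough proxy for token count, but the repeated keys in JSON versus the single header row in CSV make the gap obvious. The records here are invented.

```python
import csv
import io
import json

records = [
    {"id": 1, "name": "Ada", "team": "payments"},
    {"id": 2, "name": "Grace", "team": "lending"},
    {"id": 3, "name": "Alan", "team": "payments"},
]

as_json = json.dumps(records)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "team"])
writer.writeheader()
writer.writerows(records)
as_csv = buf.getvalue()

# JSON repeats every key for every record; CSV states the header once.
print(len(as_json), len(as_csv))
```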
Rachel: Right. It's that mechanical sympathy. We have to have empathy for the humans, so it's human-readable, but also empathy for how the machine is going to operate and the potential for spiraling costs or other infrastructure concerns. I know it's early days, but do you think there are any long-term patterns emerging, or are we still at the stage where some techniques work but we haven't really established what's long-term at this point?
Bharani: I think there is a pattern, I would say, across the AI industry right now. For example, lots of the prompting techniques are now fundamentally at the model layer: chain of thought, for example, is understood by the model and not just at the prompt layer. Over time, if you think of it, people draw an analogy to the operating system model. Right now, you can think of context engineering as where we were when programming in C, perhaps: you're manually managing your memory, your paging, what should be in RAM, what should be swapped out, and things like that.
You can draw the analogy forward: in future, maybe you would have a smarter operating system, an equivalent of that for AI, where the context, as the working memory, is managed better, either by some standardized scheduler-like process or natively in the model. I think we can expect that to come, because everyone who's building agents, everyone who's working with models, has to deal with this problem at this point in time.
Rachel: Right. I guess I'd be remiss not to ask you where MCP fits into this. Where does that play into what you've been building, and what are some of the challenges you've had with leveraging MCP that you've already experienced? Because we did talk about that quite a lot over the last few days as well.
Bharani: In your mind, there are three important protocols, I would say. If you think of an agent that you're building and the resources in your organization, the protocol that bridges that gap is the Model Context Protocol. You need something that provides your contextual resources to the model, and that's the MCP server. Then, I have multiple agents, multi-agent systems, and I need a standard for that; that's your agent-to-agent. We talked about the agent-to-agent protocol as well.
Then the third one is that we're building user interfaces for these agents, and you need a standard for that; that's the agent-to-UI spec. I think all these protocols, MCP, agent-to-agent and agent-to-UI, are all emerging, but of the three, I think MCP is the most mature right now, because we're trying to expose internal data and make these applications a lot more context-aware.
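For orientation, MCP is built on JSON-RPC: a client typically lists a server's tools and then calls one, roughly as below. The tool name and arguments here are invented, and the message is simplified to show the shape of a tools/call request rather than the full protocol.

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "search_customer_records",
    "arguments": { "query": "orders placed in the last 30 days" }
  }
}
```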
One thing to call out when building with MCP is that security is still a challenge. The protocol is evolving, and it's incorporating a lot of recommendations from the security community. We've come across a lot of MCP servers that were just built over a weekend, like at a hackathon, exposing a lot of sensitive data. That's something to watch out for. There's also an emerging tool called MCP Scan that we're positioning on the Radar.
Rachel: Then I think the last thing I want to ask, because we're talking about context, and it's another topic we discussed quite a lot over the last few days, is spec-driven development: this push, in the context of building software, to give as much context as possible and let the AI run away and build things. Is there anything you want to raise with the audience about what's working and not working in that space and what we discussed over the last few days?
Alessio: I think we were saying before that context engineering might sometimes feel like-- and Bharani made this point-- like it's only about what we provide as input to a model. It's actually also about what comes back from a model, because most likely, what comes back is also going to be the input to another model.
A lot of the work that we're doing at Thoughtworks is trying to use AI to understand legacy systems better and faster. On the other side, we also have teams that are exploring how we can use the specifications we produce, which have the right level of detail and which are not ambiguous, for an LLM to be able to implement, with engineering practices around it so that we can test what the LLM is producing.
We're actually looking at whether we can bridge these two: can we, for example, extract documentation and create these specifications from legacy code, and then feed that into forward engineering for another LLM to produce? I really like the nature of the LLM in that it can be quite creative at times at solving problems. When I say creative, it's quite interesting, because how creative it can really be, I don't know.
What I find quite interesting is that when you take legacy code and you describe it in these specifications, finding the right balance between providing a lot of detail and not enough detail is actually quite hard, because when you provide too much detail, you might end up with a solution that looks too similar to the one you already had before. When you find a way to describe the problem without prescribing how it's solved, describing the problem and what you want to get out of it, then that leaves enough room for the LLM to understand it and use its training data to implement that solution well, given the modern techniques we have for solving problems. I think that's actually very, very powerful.
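As an invented illustration of that balance, the same requirement can be written at the problem level, which leaves the model room to design, or at the solution level, which mostly reproduces the legacy design. The module and field names below are made up.

```markdown
## Requirement: duplicate payment detection (illustrative)

Problem level (leaves room for the model):
  Reject a payment if an identical amount, payee and reference was accepted
  from the same account within the previous 10 minutes. Show the customer a
  clear error and record the attempt for audit.

Solution level (over-specified, mirrors the legacy code):
  Loop over the PAYMENTS-HISTORY table as module PY0042 does and compare the
  PH-REF-NO field byte by byte against the last 200 rows...
```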
Bharani: I just want to stress one point: there are lots of different opinions out there saying, "Oh, you know what, you don't need engineers for this anymore." If you actually look at these specs, they are design and abstractions and good examples and structured communication. All of this is what I would expect a software engineer to do. It's just another layer of abstraction. We are going from programming languages with types and rigid structures into English. If you look at these specs, they are still engineering documents.
Yes, you can get a trivial example working with plain English, because the model has seen enough hello world programs out there. But when you start solving for enterprises and really sharp problems, you need good abstractions so that you don't repeat yourself. Expressing that in a document is going to be as challenging as expressing it in any programming language. That is the tough balance, the hard balance.
Rachel: It's a really interesting space, because it feels like the whole industry has been trying to create the citizen developer forever, with low-code and no-code platforms. I think when we try to go too far down that route, especially if you're building new complex systems beyond the hello world, getting the spec right is actually part of the whole agile software delivery feedback loop. That's the hard part.
If you lock that down too much too early, you're going to have to do too much rework. I do think Alessio's example is where I'm seeing more solid and workable use cases: on legacy systems, where you're trying to describe the behavior of something you're essentially going to want to reimplement, a large portion of it at least. It's very early days, and it's probably its own topic in and of itself. We do have to go today. I just want to thank the folks that have joined us, Bharani and Alessio. There will be more to come over the coming episodes about some of the deep discussions we've had this week as we've been recording the Radar together. Thanks very much.