Brief summary
Generative AI can be incredibly powerful when it comes to legacy modernization. Not only can it help us better understand a large, aging codebase, it can even help us reverse engineer a legacy system when we don't have access to the complete source code. Doing it, though, requires a specific approach that's being described as 'context engineering'.
This is something we've been exploring a lot in recent months at Thoughtworks. On this episode of the Technology Podcast, Thoughtworks' lead for AI-enabled software engineering, Birgitta Böckeler, and tech principal Chandirasekar Thiagarajan join hosts Ken Mugrage and Neal Ford to discuss how it works.
They explain the process, the tools and what the work is teaching them about both generative AI and legacy modernization.
Read Birgitta's blog post on reverse engineering with AI.
Episode transcript
Ken Mugrage: Hello, everybody, and welcome to another episode of the Thoughtworks Technology Podcast. My name is Ken Mugrage. I'm one of your regular hosts. We have an exciting episode today talking about experimentation that we're doing in the generative AI space. Co-hosting today with me is Neal Ford.
Neal Ford: For the second podcast in a row, we've promised an exciting episode before we've actually said anything. We'll see if we can make that come true. I think it's absolutely going to come true today because we're talking about one of the favorite hot topics right now, which is generative AI and its uses.
Ken: Our first guest, Birgitta, would you like to introduce yourself?
Birgitta Böckeler: Yes. Hi, I'm Birgitta Böckeler. I'm a Distinguished Engineer for Thoughtworks in Germany, and I'm currently a full-time subject matter expert at Thoughtworks for using generative AI on software teams and for creating software. Not putting AI into a piece of software but using it to build software.
Ken: Chandru, you want to introduce yourself, please?
Chandirasekar Thiagarajan: I've been with Thoughtworks since 2009, and for 10 out of those 15 years I've been with various retail accounts. I'm starting to see myself as a specialist in that space.
Ken: For today's episode, we're going to talk about using generative AI for one facet of legacy modernization, which is one of the bigger use cases for it. Experimentation is really important here, because you need to set a goal, or a hypothesis if we want to overload that term, run tests, and learn from them, kind of the scientific method, if you will. You try something, it passes or fails, you move on, and either way you're learning from the experiment. As an industry and at Thoughtworks, we have a long history of this.
You can go all the way back to continuous integration, which wasn't even called CI at first, then continuous delivery, microservices, data mesh, et cetera. It was trying different things, building on what works, killing what doesn't, sometimes not as fast as we should, frankly. It's that idea of trying things, seeing what will work, and being, frankly, a little bit brave. For the one we're talking about today, I want to ask Birgitta to talk about an experiment we're doing when you don't have access to all of the building blocks for a legacy application. Birgitta, what are we talking about today?
Birgitta: Like you said, legacy migration and legacy modernization is a very interesting use case for generative AI with a lot of potential. We've done it and seen it in a lot of different areas already. It's kind of like this workflow that is emerging that is always very, very similar, but then each step is done in different ways. You always, when you have an application that you want to rebuild, be it a COBOL module or like a 15-year-old monolith or 20-year-old monolith or something like that, you have certain data sources that you can use to find out what that application even does if you want to rebuild that functionality.
You have the code, maybe you can look at the application itself in a test environment or in your production environment, you might have documentation, all of these different things. What we usually see is not "Let's take generative AI, give it the old code and immediately recreate the new application" in one go from A to Z; COBOL to Java is probably the furthest distance for that. It's more like, how can we do this step by step? And in each of the steps, AI helps the human do that step.
The first step being: let's create a description of what the functionality is today, of the COBOL module or whatever the application does, and then take that description and double-check it with a subject matter expert, with users, whoever is relevant in that domain. Then take that description and feed it into the forward engineering: feed it into creating stories, creating new backend API designs, things like that. What we want to talk about today is a case of this workflow where, in terms of the data sources, you don't have access to the full code.
This might be because you had a bad breakup with the vendor and they didn't leave the code behind, so you only have the binary, or it could be because you don't want to access some of the code because it's so messy that, in your experience, it confuses generative AI. That's the premise, the setup of the experimentation that we did. We'll talk about that first, and then later Chandru will also chime in on what we experienced when we started doing this on an actual, real case.
Chandirasekar: Yes, we actually had even fewer data sources than we anticipated. We thought we'd have a running application, a test environment where we could just see how things were. That core requirement was also missing.
Neal: That's getting a little bit ahead of things. Let's talk about this approach. I think it's particularly appealing that it's not using generative AI as a COBOL Whisperer to translate all the messy COBOL code into pristine Java code, because a lot of times the reason legacy migration is so difficult is that there have been layers and layers and layers of changes and updates and maintenance by different teams.
At some point, it becomes incomprehensible even to generative AI. Looking at it purely from a behavioral standpoint is, I think, a useful way of thinking about reverse engineering the existing behavior of an application. We were throwing around some terms before we got started, so let's talk about some of these. Is this reverse engineering? Is this a black box? How black a box are we talking about here? Let's do a little bit of clarification around the approach.
Birgitta: Terminology. Yes, at the moment, I always call it a two-step process: AI-accelerated reverse engineering and then AI-accelerated forward engineering, where the reverse engineering only produces a description of the application, not an actual rebuild of it, so it's forensics, kind of. You do forensics on the existing application and the existing code to recreate a good description of what it does. Because we can now use generative AI in the forward engineering, there's a new incentive for us to actually create this description in textual form in a lot of detail.
In the past, maybe we wouldn't even do it at that level of detail. We would still, in the forward engineering, maybe have stories as the placeholder for a conversation because we want to build a new, fresh application. Now that we can use AI for the forward engineering, there is an incentive there to have very detailed descriptions. It maybe even changes the equation of cost benefit when we think about feature parity. That's also one of the hypotheses. That with AI-accelerated reverse and forward engineering, feature parity might become less of a sticking point.
Neal: Especially if you can actually build better behavior than just replacing the old behavior that's already there. I know Ken has a question, but I'm curious about a little deeper investigation into exactly how do you investigate the behavior of the existing system? Is this through user interface or through inputs and outputs? Is it through change data capture? Or is it all of those things plus other stuff?
Birgitta: Yes. There are some established techniques that we already have, and I see this over and over again: combining generative AI with techniques we already have is where the magic happens, not when you only use generative AI. In our first experiment, we actually used different techniques than we later used at the actual client where Chandru was.
In that experimentation phase, we first focused very much on dynamic data capture. When we did this, a few months ago, the first MCP servers were just coming out that allow you to browse an application, to have your AI coding assistant or whatever AI tool you're using control a browser, click through an application and actually find out what's on the page. That was the first thing we tried. We used the Playwright MCP server, which is one of the MCP servers built by Microsoft. It uses the Playwright end-to-end web testing framework under the hood.
Then my AI-- my coding assistant, for example, my agentic coding assistant can go and explore the application. In this case, we used Odoo, which is an open-source ERP system. We used one of its components, the customer relationship management, and we asked AI, "We want to rebuild this functionality to create a new opportunity in the sales pipeline. Here's the page, the URL of the page where that's happening, create a description of all the functionality, try to find all the possible actions, all the possible click paths, describe all the elements on the screen that you see, describe any dynamic behavior you see."
For example, there were things like when the user selects a value in this dropdown, then the application pre-fills all the other fields, which is dynamic behavior that you cannot see just from a screenshot. When you use something like Playwright MCP, AI actually gets access to the DOM structure. Multimodal large language models can actually also do this just based on visuals.
They can just look at the screen and then say, "Click at coordinate X and Y," because they see a button there. Using something like Playwright MCP is a lot more effective because it gets access to the DOM. It's faster and more reliable. That was the first step, this dynamic click, explore the application, and create a description, and also create screenshots for us on the way. We didn't have to take screenshots manually.
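To make that concrete, here is a minimal sketch of the kind of page exploration being described, using the plain Playwright Python API rather than the Playwright MCP server the team used. The URL and selectors are illustrative assumptions; the output, a DOM-level element inventory plus a screenshot, is the sort of context that would go into the prompt.

```python
# Minimal sketch, not the team's setup: capture what a page exposes so it can
# be fed to an LLM as context. Uses the plain Playwright Python API; the
# Playwright MCP server gives an agent this same kind of DOM access directly.
from playwright.sync_api import sync_playwright

URL = "http://localhost:8069"  # assumed local Odoo instance (CRM module)

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto(URL)
    page.wait_for_load_state("networkidle")

    # Screenshot for the multimodal side of the description.
    page.screenshot(path="crm_page.png", full_page=True)

    # DOM-level inventory of interactive elements -- the part that makes this
    # faster and more reliable than working from pixels alone.
    elements = []
    for locator in page.locator("button, a, input, select, textarea").all():
        elements.append({
            "tag": locator.evaluate("el => el.tagName.toLowerCase()"),
            "text": locator.inner_text().strip(),
            "name": locator.get_attribute("name"),
        })

    print(page.title())
    for element in elements:
        print(element)

    browser.close()
```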
Chandirasekar: Actually, with a lot of prompt experiments, we were able to get a very detailed specification. As soon as Playwright is given access to a running website, it not only knows what visual elements are present, but it can also, to an extent, discover and specify what happens on blur, what happens on focus, how the elements interact with each other. If we had to do that manually, we would either miss some things or spend a really long time putting that specification together. I was particularly impressed with how quickly and easily Playwright can just browse through the pages and give us a specification of the user journey.
This is a really valuable document for us when the time comes to do the forward engineering part. One other thing worth mentioning is the documents that we create can be iteratively enriched. You start with an initial specification and then you discover, okay, I can make Playwright do a different journey, give it a different prompt. Then AI is really good at merging two different specifications together, such that what's missing from one document can be enriched into the other document.
Again, if I had to do that manually, it would really be a painful process: looking at two specifications, finding out what's missing, and creating a master document out of them is not something we want to spend too much time on. I found those aspects super useful when it comes to creating the specification. Birgitta, I don't know if you wanted to mention it next, but the exploration of the UI via the MCP server was not the only data source in Birgitta's experiment. I think she also used CDC, though it's not really change data capture in the normal way people mean it.
It's more like monitoring what is changing in the database as we browse through the application. As the Playwright MCP server was helping us go through page after page, we could potentially use an MCP server to find out what changes are happening behind the scenes in the database, what inserts are happening, what updates are happening, and then provide that as context back to the AI. Our models are smart enough to understand what's happening and put that in the specification as well. That becomes an even richer document for us later.
Ken: Do you have a feel-- Obviously, one of the things that gets thrown around whenever you hear reverse engineering or black box or any of those terms we touched on a few minutes ago is: do you own the application, or are you doing it on somebody else's app? Because then you wouldn't have access to the database, I wouldn't think. What was the quality difference when you did that? When you added that step of watching the data and seeing what changed in the backend, do you have a feel for how much better the discovery got, or did you do a checkpoint there at all?
Chandirasekar: I think it gave us a lot more confidence. Our intention was to recreate the application going forward. The exploration using just the frontend was very good at specifying how the user journey should progress and what the user experience should be, but it was not very good at specifying how we have to build the application going forward. That's where the insight coming from the database helps, and normally applications do a lot more than talk to a database.
As we provide more and more context to our model coming from different data sources, then it gives us more and more confidence to accurately recreate the application when we do the forward engineering. I know Birgitta wanted to say something.
Birgitta: It was a lot more than increasing confidence, because in this case it was the only indicator we had of what was happening in the backend. This was definitely not a black box exploration; it was more like gray box. In the experiment, we set those boundaries ourselves, but we allowed ourselves to access the frontend, for example. That's not a black box anymore; we can see the rendered DOM. We allowed ourselves to access the database schema, and we allowed ourselves to create those database triggers, change data capture in concept.
We didn't use a specific change data capture product, but we created a lightweight version ourselves with database triggers. Without that, we wouldn't have known anything about what happens in the backend; in this experiment, it was the only indication we had of what's happening there. In a real-world application, it depends a bit on the type of application, but often in digital products or enterprise business software, the data processing is the main step: what actually happens at the database query level is the main thing that's happening.
There's often, of course, more stuff happening under the hood. You still have to infer and guess a lot from the queries about what logic is happening. Depending on how much of that logic there is in an application, you might have many more blind spots in one application than in another. That's where the subject matter expert sanity check comes in, where you then look at what you get from AI and see what's missing.
There are also things like calling other services or sending an email, calling up an SMTP service or something like that. We didn't do that as part of the experiment, but that's, of course, also something that you have to keep in mind. It's all about augmenting the humans that are creating this description; it's not that AI creates the description on its own and has everything.
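For illustration, here is a minimal sketch of the trigger-based capture described above, assuming a PostgreSQL database (which Odoo uses), the psycopg2 driver and an illustrative table name; this is not the exact setup from the experiment.

```python
# Minimal sketch: hand-rolled "change data capture" via an audit table and a
# generic trigger, so every insert/update/delete made while clicking through
# the app is recorded and can be fed back to the model as context.
# Assumes PostgreSQL 11+ and psycopg2; connection string and the table name
# "crm_lead" are illustrative.
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS capture_log (
    id          bigserial PRIMARY KEY,
    happened_at timestamptz DEFAULT now(),
    table_name  text,
    operation   text,
    old_row     jsonb,
    new_row     jsonb
);

CREATE OR REPLACE FUNCTION capture_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO capture_log (table_name, operation, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(NEW));
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO capture_log (table_name, operation, old_row, new_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD), to_jsonb(NEW));
        RETURN NEW;
    ELSE  -- DELETE
        INSERT INTO capture_log (table_name, operation, old_row)
        VALUES (TG_TABLE_NAME, TG_OP, to_jsonb(OLD));
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS capture_crm_lead ON crm_lead;
CREATE TRIGGER capture_crm_lead
AFTER INSERT OR UPDATE OR DELETE ON crm_lead
FOR EACH ROW EXECUTE FUNCTION capture_change();
"""

with psycopg2.connect("dbname=odoo user=odoo password=odoo host=localhost") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)

# After clicking through a journey, "SELECT * FROM capture_log" gives the
# inserts and updates to hand back to the model as additional context.
```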
Ken: You just mentioned one or two other possible data sources, like an SMTP server. What are some others? Do you watch the network for API calls? What would give you the most complete picture?
Birgitta: Like I said, it might be calls to other services in the landscape; in the client case that we'll get to in a few minutes, we ultimately discovered there might be some mainframe calls. Another thing might be events created on some kind of enterprise service bus, which may be a good example for an old application like that, one that you want to replace.
As part of this, one of our team members also explored network capture, to see, "Oh, while we're clicking through the application, can we not just check what changed in the database, but also check what network requests happened?" We only did that for network requests between the client and the server, let's say. We could actually recreate Swagger API documentation of what the current backend API looked like.
That gives us another indication. Maybe we don't want to rebuild exactly that API, but it gives us another indication. It might also be interesting to do network capture between the backend and whatever else it's calling to. We didn't try that. I also, at the moment, don't have a good idea of what that would look like or if it's even possible. Again, depends on the environment, but that could maybe be an additional data source.
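For a sense of what that client-to-server capture could look like in practice, here is a minimal sketch using plain Playwright in Python rather than the MCP setup. The URL and filtering are illustrative assumptions, and the resulting endpoint inventory is the raw material you would hand to a model to draft an OpenAPI description.

```python
# Minimal sketch, not the team's setup: record client-to-server API traffic
# while browsing, as raw material for drafting an OpenAPI/Swagger description
# of the existing backend. URL and filtering are illustrative assumptions.
import json

from playwright.sync_api import sync_playwright

URL = "http://localhost:8069"  # assumed application under inspection
captured = []


def on_response(response):
    request = response.request
    if request.resource_type in ("xhr", "fetch"):  # skip images, CSS, scripts
        captured.append({
            "method": request.method,
            "url": response.url,
            "status": response.status,
            "content_type": response.headers.get("content-type", ""),
        })


with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.on("response", on_response)
    page.goto(URL)
    page.wait_for_load_state("networkidle")
    # ...click through the journeys of interest here, manually or agent-driven...
    browser.close()

# This endpoint inventory is what would then be summarized into an API spec.
print(json.dumps(captured, indent=2))
```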
Ken: As this matured, could you see this-- and I'm going to use a little bit of a dirty word here-- as almost a lift and shift? Is that--
Birgitta: This is a lift and shift, yes.
Ken: Okay. Is that a pattern that you recommend for people? I know we're getting into the forward engineering now, okay, I have an application, I rely on it. I don't have access to all the information about it. Should I lift and shift as a baseline?
Birgitta: Let me maybe rephrase. When I just said, "Yes, lift and shift," what I meant is more what I mentioned before about feature parity. It's almost like lift-and-shift feature parity. But typically when people say "lift and shift"-- maybe it depends on the context, but when cloud started coming up, there was a lot of this "just take your workload and lift it and shift it exactly like that to the cloud."
Then maybe it's not optimized for cloud, and so you want to re-architect. This is, I think, for use cases where you want feature-parity-type lift and shift but not technical lift and shift, because this multi-step workflow actually allows you to forward engineer a modern stack that implements the same functionality.
Neal: I would call this not at all black box. I would call this very light gray box reverse engineering, because you had the user interface and you had what changes in the database. For most legacy applications, and I don't want to diminish that, those are the trivial bits, because the really complicated part to recreate is all the layers and layers and layers of source code, all the business logic. Okay, yes, the database changed from these fields to this field, but why, for this particular workflow?
That's going to be layers and layers and layers of changes and exceptions and those sorts of really, really messy, complicated things. I think one of the things you did to get even more confidence in this-- and I think Chandru is here to talk about this-- was actually take some of the code that you only had in binary form and decompile it to see if it's actually doing what you thought it did. Can you talk a little bit about that, please?
Chandirasekar: Yes. We had the same uncertainty that you talked about. We knew that there could be a lot of hidden logic, let's say, within the binaries. We really wanted to see if we could crack the DLL and find out a little bit more about whether there is any business logic within it. We know what's happening in the DB, but like you said, we don't know how the rows were calculated, basically. We don't know if some math happened. What if 10% or some threshold had to be respected? If all of that business logic was happening within, let's say, the middle layer, we did not know whether we could accurately find it and recreate it in the new application.
Birgitta: To set the scene: in this case, this was a Windows stack. As Chandru said, it was a DLL, but we did have access to some of the frontend code, to some of the ASP code. We saw some of the client-side stuff happening, and we had the database schema, of course, and the stored procedures, but the actual black box, to an extent, that we wanted to make more gray was the DLL, the application server.
Chandirasekar: Yes. We knew that it was a legacy binary from around 2000-2005, and a Windows binary as well, so we looked for tools that could help us get insight into it. One tool specifically stood out.
That tool is called Ghidra, and it's an open-source project from the NSA. I was particularly impressed with this tool because it was able to run on a Mac, on an M1 processor, and still give us the assembly sitting within a DLL that was originally built for an x86 processor on Windows. For some reason, I had assumed that if this is a Windows binary built long ago, when .NET wasn't even a hot topic, it's probably going to be tied to that ecosystem and it's not going to be easy to look into it, but Ghidra actually surpassed expectations and still gave us a good view of what's there within that assembly.
Of course, it's free; that's another point in its favor. It offered bulk decompilation: it was able to take the whole binary and convert it back to C code for us. The C code was not super readable, it was still pretty obfuscated, but it was something, because a lot of other tools were not offering the bulk functionality. They would decompile one function or a part of it, and the full feature set was more of a commercial offering. For all those reasons, and also because it's a very powerful editor that's easy to learn, this was really helpful. What we did there is ask Ghidra to basically give us a C version of what's running within the Windows binary. We got something like 5,000 lines out of it.
The C code was also not very readable. Initially we were like, "Okay, did I just catch the tiger by the tail? How am I going to make sense of 5,000 lines of C code?" Then we started to use AI a little bit more to make sense of it. As soon as we started this, we ran into all sorts of limits. The first limit was that the file was too big; it could not be streamed to the AI. Then the next challenge was that we split the file, but still, AI was doing a lot of analysis. It kept going through the files, analyzing more and more, but we did not get any output because the analysis just took such a long time.
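As a rough illustration of that bulk decompilation step, here is a minimal sketch of a headless Ghidra script; Ghidra scripts run under Jython, so the syntax is Python. The output path is illustrative, and this is not the team's actual script.

```python
# BulkDecompileToC.py -- minimal sketch of the bulk decompilation step.
# Ghidra scripts run under Jython; `currentProgram` is provided by Ghidra.
# Run headless, for example:
#   analyzeHeadless /tmp/proj LegacyDll -import legacy.dll -postScript BulkDecompileToC.py
import os

from ghidra.app.decompiler import DecompInterface
from ghidra.util.task import ConsoleTaskMonitor

OUT_DIR = "/tmp/decompiled_functions"  # illustrative output location
if not os.path.isdir(OUT_DIR):
    os.makedirs(OUT_DIR)

decompiler = DecompInterface()
decompiler.openProgram(currentProgram)
monitor = ConsoleTaskMonitor()

for func in currentProgram.getFunctionManager().getFunctions(True):
    result = decompiler.decompileFunction(func, 60, monitor)  # 60s timeout per function
    if not result.decompileCompleted():
        continue
    c_source = result.getDecompiledFunction().getC()
    # One function per file keeps each later prompt small.
    name = "%s_%s.c" % (func.getName().replace("/", "_"), func.getEntryPoint())
    with open(os.path.join(OUT_DIR, name), "w") as fh:
        fh.write(c_source)
```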
Birgitta: We should mention here that we were using Gemini Pro for this; that was the AI tool of choice at this organization. Gemini Pro has a one-million-token context window, so it's even a really large window.
Chandirasekar: Then we ran into 429s. We ran into rate limiting after some time, because at this point we thought we'd split it into one function per file, and then we had, I think, 3,800 files, and we ran into 429 responses. Then we finally realized, okay, we needed to break down this problem. What we did next is we created a Python program to make sure we send the files to our model one by one and ask it to translate each of those functions into pseudocode.
We also asked AI to classify each one: is it a library function, or is it a function that's related to business logic? This filtering really helped us; I don't think it would have been possible before to classify a lot of functions like this in bulk. I think in this sense at least, AI did a pretty good job of telling us, "Okay, this is a C++ standard library function. But this one, on the other hand, is most likely business logic."
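A minimal sketch of what that kind of orchestration could look like in Python: one decompiled function per prompt, pseudocode plus a library-versus-business-logic classification, and exponential backoff for 429s. The file layout, prompt wording and the `call_model` / `RateLimitError` stand-ins are assumptions to be wired to whatever model SDK is in use (the team used Gemini Pro); this is not the team's actual program.

```python
# Minimal sketch, not the actual program: push decompiled C functions through
# a model one at a time, collect pseudocode plus a classification, and retry
# with backoff when the API rate-limits us (HTTP 429).
import json
import pathlib
import time

PROMPT = """You are reverse engineering one decompiled C function from a legacy binary.
1. Summarize what it does as pseudocode.
2. Classify it as LIBRARY (standard/runtime helper) or BUSINESS_LOGIC.
Answer as JSON with the keys "classification" and "pseudocode".

Function:
{code}
"""


class RateLimitError(Exception):
    """Map the SDK's HTTP 429 / quota-exceeded error onto this."""


def call_model(prompt):
    # Stand-in: replace with a call to whichever LLM client you use
    # (the team used Gemini Pro) and raise RateLimitError on a 429.
    raise NotImplementedError("wire up your model client here")


def analyse(path, max_retries=5):
    prompt = PROMPT.format(code=path.read_text(errors="replace"))
    for attempt in range(max_retries):
        try:
            return json.loads(call_model(prompt))
        except RateLimitError:
            time.sleep(2 ** attempt)  # exponential backoff on rate limits
    raise RuntimeError("giving up on %s" % path.name)


if __name__ == "__main__":
    results = {}
    for f in sorted(pathlib.Path("decompiled_functions").glob("*.c")):
        results[f.name] = analyse(f)
    # Keep only what the model thinks carries business logic.
    interesting = {name: r for name, r in results.items()
                   if r.get("classification") == "BUSINESS_LOGIC"}
    pathlib.Path("business_logic.json").write_text(json.dumps(interesting, indent=2))
```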
Ken: Just real quick on that classification, you said it was mostly accurate; do you have a feel for how accurate it was? What would be the consequence if it was inaccurate and you missed something?
Chandirasekar: I think one possible consequence is that we run the risk of sometimes missing business logic. But given the large amount of material already available on the internet about what constitutes a C++ library function, AI was really able to identify the footprint, or let me say the fingerprint, of those functions and tell whether they matched the standard library or not in a pretty reliable fashion.
Birgitta: Maybe before you continue with the decompilation process, Chandru: to your question about the impact, Ken. Generally, when working with generative AI for coding, and I felt this especially in this process, you're constantly doing risk assessment. You're constantly thinking about the probability that gen AI gets this wrong, and that's based on thinking about: what information did I give it? How did it even come up with this? Did it have access to that information? Where is it coming from?
Then the second one is: what's the impact of getting this wrong? In this case, this was actually a relatively straightforward application with a bunch of forms that the user fills out, and then it lands in the database, and we had access to the database queries and stuff like that. That's another part of assessing probability and impact. The process that Chandru is in the middle of describing right now, the decompilation, was for us another sanity check, because at this point, we had already fed AI with screenshots, with the ASP code, with the database schema, and so we had gotten to a point where we had already inferred what we thought was happening in the backend.
Kind of like: based on the schema and what you see here in the form, what do you think the SQL query is? Because we didn't have access to a test environment, we couldn't do data capture; that's what we had at this point in time. What Chandru is talking us through right now was our next level of getting more confidence in the probability that we had it right.
Chandirasekar: Yes, thanks for bringing that up. I think that also helps in validating how accurate our classification is, because we have the information from before and can compare and check whether it falls in line with what we already know. After doing this classification, after filtering the lot of functions that we obtained from the binary down into a small set, we noticed that there were some inaccuracies: we couldn't pinpoint a specific function that we were looking for, a pretty important function that happens on submit of a particular piece of functionality.
We really wanted to pinpoint and identify what's happening in that area. That's when we noticed that with Ghidra, despite it being a very powerful tool, it can sometimes miss important information as it decompiles from assembly to C. In our case, we were looking for a specific query. This query was not available in the C code, but it was available in the assembly code.
We decided, "Okay, AI is doing a pretty good job with analyzing the C, but let's switch to assembly itself." Assembly code is super hard for us to go through, but AI doesn't have all those hindrance, whatever knowledge it has on C, it also has an equivalent amount of knowledge on assembly. It really helped us over there to convert a set of assembly functions into pseudocode. I think one thing worth highlighting here is we don't want to actually decompile the binary back into the original source. We just want to know what's happening within the binary.
This was possible by getting AI to describe in pseudocode what's happening in the assembly code. What we did there was: we knew the region that we were interested in, so we took all the assembly code starting with the main area, then all the functions that it calls and all the places that refer to that function, and we took all of it together. We were also able to get a Type Lib out of the binary. Most old legacy COM DLLs tend to have this Type Lib information, so we were able to extract that.
Birgitta: By the way, this Type Lib information played a really big role in the first step, even before the decompilation, because it had all of the interfaces of the middleware functions that were called by the ASP code. That was already a big component in our inference, in our guesses of what the functionality does, even before we went into the decompilation.
Chandirasekar: With the Type Lib and the set of assembly functions that we knew were relevant to the function we were investigating, together with the context that we already had before, we were able to get AI to connect all the dots. That was another area where it shined. It looked at the assembly, it looked at the Type Lib, it looked at all the previous context that we had, including the stored procedures.
Then it gave us a summary of, "Okay, this is what is happening in that area of the binary." For example, we learned that when doing a certain action, there was logic in there that prevents that action from being duplicated. It was triggered based on a user action: if the user double-clicks, we don't want that to cause issues. The people who wrote that code originally had logic in there to prevent those kinds of things from happening.
We also found that there are some calls to a mainframe, so we now know we have a new data source to monitor. All of those things started to come out. In general, what we noticed is that if you really want to be confident in an area, if you really want to go to that level of detail and make sure you have not missed anything, that's when you bring out this big gun and start looking at the assembly, looking at all the information gathered so far, and then build a full picture there.
Ken: Yes, like many, or maybe even most, things with generative AI, what I'm hearing is that the AI is certainly key to this, it's a very important part of it, but this also required a whole lot of expertise. I guess, as a closing question, even though we could keep talking about this for quite a while, for both of you: if someone wanted to do this, they have an application, and let's assume they had roughly the same level of access that you had to data and so forth, what kind of team does it take to do this? What's the expertise required to do this kind of thing?
Birgitta: The trickiest part was the part that Chandru just went into in detail. I was really glad that Chandru was on this team, because he has experience in the .NET stack and in that whole ecosystem, so he could identify that you can't just decompile this with any old C# decompiler, but that it's actually C++ and an older artifact and all of that. Chandru and Tiago, who were working on this together, had some previous experience in that area. I think that was really helpful.
Chandirasekar: I wanted to add on. If it were a new person, I think the only context they need is some background on how the binary was built. If they're listening to this podcast, they know what kind of tools to use. We started from nothing; we had not done this before. We were able to start making progress within about three days of when we started. We ran into all those challenges that I mentioned, and then we solved them quickly.
Then we progressed to identifying what's present in the crucial function. I think the whole process took not more than a week. As long as they're listening to this podcast and have a basic background on how their binary was built and what its context is, they should be able to ramp up quickly.
Ken: I think it's worth noting that we're also talking about a team of pretty senior technologists. It only took you a few days, but that might not be the-- and that's what I meant: you're probably not going to throw a fresh comp sci grad at this, right?
Birgitta: Yes. I mean, the key skill here is knowing where to look for the data sources. That means a certain level of familiarity with this legacy stack: to know that back in that day there was a web server and an application server, there was no React single-page application, and so, how can we connect to that web server and get the ASP code? Just to know what you need to look for. I think that's the thing.
Often, when it's a really old stack, it helps to have people who have maybe worked on that stack in the past, or who know some of these outdated patterns. I think that's the main skill. Then it's about pulling it all together with prompting and with the constant risk assessment type of thing that I talked about before, where, again, a lot of what feeds into the probability part is me knowing about the available data sources in that stack and having a certain confidence that, yes, I think we covered everything that's probably there; we're not missing anything. If you don't know that stack, you might have a lot of unknown unknowns.
Neal: I think it's a great illustration that at no part of this process did generative AI do any of the thinking for you. All it did was eliminate busy work, because you had to know what the Type Lib business was, how ASP connects to a backend with a DLL, database schemas and all that sort of stuff, and then use AI to get rid of the busy work.
Like Chandru said, if you'd just been given this DLL and decompiled the thing, there's no way you would ever have tried to go through and classify it by hand. That's the utility of generative AI, which you have to trust but verify at every step; as Birgitta said, you always have to think about how it might not be doing the right thing. In the hands of a knowledgeable practitioner, it is a very, very powerful force.
Birgitta: Yes. Basically, what we did, there's a buzzword for it right now: context engineering. It's usually meant in the context of agentic systems, where you want to think about how you engineer the context for the agents so that they always have the right information, not too much information, and all of that. If you apply the term to this case, a slightly different type of context engineering is what we did. We put together the data sources and the information that we fed to AI so that, with a high level of confidence, we could get a description of the functionality that we felt we could trust at a reasonable level for this use case.
Neal: We know that the top 20% of developers are markedly better than the ones below them. Now my hypothesis is: the further you get up that stack toward the top 20% of developers, does AI actually accelerate their abilities even more? In other words, the more expertise you have, does it actually empower you to become even more productive, because you know the places to look, the things to find, and the ways to use generative AI, because you have much more relevant context for the overall problem space?
Chandirasekar: Yes to that as well. If you had asked me this question before we had these models, looking at a lot of assembly files and translating them into pseudocode would have made me give up.
Neal: It would have made you quit your job.
[Laughter]
Chandirasekar: Now we have this service which can quickly look at 16, 17 assembly files, all of them hundreds of lines long, assembly being assembly, and quickly translate them into English for us. That is really a game changer. At least personally, I find it enables a lot more than what was possible before.
Birgitta: Yes, and this was a specific case, of course; we talked about how it's about an old stack. Naturally, even without gen AI, if you have experience with that old stack, it helps you be more effective. I would say yes, especially in that risk assessment part that I talked about, which I feel is a lot of what I'm always doing, whether it's building a prototype, doing agent-assisted coding, doing this type of thing.
There, my experience plays a really big role in this probability and impact assessment, because I have a lot of scars, for example. It's also really helpful for people with less experience to learn faster. But there's definitely a case to be made that when you have that experience, you can use it a lot more responsibly.
Ken: We'll add a link to a blog post that Birgitta published about a month ago on some of this. It also goes into testing for feature parity and some other things that folks might find interesting. With that, I want to say-- we could keep talking about this for a very long time, but watch this space; there will be more updates on this and other experiments that the teams are doing. I want to thank everybody very much for coming, thank you for your time, and thanks to our listeners, of course. We're looking forward to the next episode.
Neal: I think it did, in fact, turn out to be exciting. Ken was right at the beginning. Thanks so much, Birgitta and Chandru.
Chandirasekar: Thank you.
Birgitta: Thank you.