AI giant Anthropic recently explained how developments in Claude Code are transforming COBOL modernization. In doing so it shook IBM (and its investors) — the company has a huge share of the mainframe modernization market.
At Thoughtworks, however, the piece excited us, while also prompting a wry smile: we've been working on modernizing legacy code with AI for some time now and know precisely what is possible and how it should be done. Seeing the topic gain wider visibility, though, is ultimately gratifying and validates our focus and effort over the last 18 months.
While Anthropic’s post offers a good overview of what legacy modernization with generative AI actually looks like, it skips over some of the details and nuances that anyone who’s been working on these challenges will be all too aware of. With that in mind, we wanted to add more color to Anthropic’s story and use the increased attention on the topic to share our experiences and perspectives.
How do AI tools actually map a code base?
Reading Anthropic’s blog post, one might think the process amounts to simply prompting Claude Code to map a legacy codebase. In our experience, what works well is a layered combination of:
Chunking large codebases into manageable units (that could be files, programs, sections and procedures).
Local summarization of those units.
Extracting entities, such as database tables, segments, copybooks, transactions, jobs and screens.
Inferring relationships, like who calls whom and what reads and writes which parts of the data.
A synthesis into higher level abstractions.
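To make the first steps of that layering concrete, here is a minimal, illustrative sketch of entity and relationship extraction from a COBOL unit. The sample program, regexes and field names are our own invention for demonstration; real COBOL dialects and copybook resolution need proper parsing and the static-analysis tooling discussed below.

```python
import re

# Illustrative COBOL fragment; a real codebase would be chunked into
# thousands of such units before local summarization.
SAMPLE = """\
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PAYROLL.
       PROCEDURE DIVISION.
       MAIN-PARA.
           COPY EMPREC.
           CALL 'TAXCALC' USING WS-GROSS WS-TAX.
           CALL 'PRINTSLIP' USING WS-EMP.
"""

def extract_relationships(source: str) -> dict:
    """Pull out the program name plus its CALL and COPY relationships."""
    program = re.search(r"PROGRAM-ID\.\s+(\S+?)\.", source)
    calls = re.findall(r"CALL\s+'([A-Z0-9-]+)'", source)
    copybooks = re.findall(r"COPY\s+([A-Z0-9-]+)\s*\.", source)
    return {
        "program": program.group(1) if program else None,
        "calls": calls,          # who calls whom
        "copybooks": copybooks,  # shared data definitions
    }

rels = extract_relationships(SAMPLE)
print(rels)
```

Outputs like these feed the relationship-inference and synthesis steps; the point is that deterministic extraction does the grunt work before any model is prompted.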
In other words, there are multiple things happening at once. This in itself poses challenges — what we have is essentially probabilistic reasoning over partial context windows. At present, even with large context sizes, models and agents alone cannot ingest tens of millions of lines of code simultaneously and reliably enough to draw a modernization roadmap. This means the mapping process has to rely on:
External scaffolding (static and dynamic analysis).
Iterative summarization passes.
Prompt engineering (i.e. with a human in the loop).
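The iterative summarization passes mentioned above can be sketched as a hierarchical map-reduce over chunks that will never fit in one context window. In this sketch `summarize` is a stand-in for an LLM call (here it just truncates, so the pipeline runs without an API); the fan-in and chunk contents are illustrative assumptions.

```python
def summarize(text: str, budget: int = 80) -> str:
    """Placeholder for a real model call; truncation keeps this runnable."""
    return text[:budget]

def hierarchical_summary(chunks: list[str], fan_in: int = 4) -> str:
    """Summarize each chunk locally, then repeatedly summarize groups
    of summaries until a single top-level abstraction remains."""
    level = [summarize(c) for c in chunks]  # local summarization pass
    while len(level) > 1:
        level = [
            summarize("\n".join(level[i:i + fan_in]))
            for i in range(0, len(level), fan_in)
        ]
    return level[0]

chunks = [f"PARAGRAPH-{n}: moves data, calls subprograms." for n in range(10)]
top = hierarchical_summary(chunks)
print(top)
```

Each pass compresses lossily, which is exactly why the human-in-the-loop and external scaffolding matter: they catch what successive summaries silently drop.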
In practice, then, AI does not replace the structural analysis tools we’ve long used for this work; it works with them. Modernization projects already use static analysis, dependency mapping, data lineage extraction and job flow mapping; AI augments this ecosystem.
We use this approach because it’s strong on things that existing tools are not:
Abstraction and summarization.
Pattern detection across heterogeneous artefacts.
Noise reduction (particularly relevant for verbose, COBOL-specific boilerplate that carries no valuable business or technical information).
Producing knowledge nuggets just in time.
What’s the main challenge of COBOL?
The Anthropic blog post seems to frame the problem of COBOL as one of the world losing its ability to read it, as if COBOL is something impenetrable and ancient. That’s not really true: the main problem with large COBOL systems is rarely that the code is unreadable. In fact, COBOL was deliberately designed to be readable.
It’s really an issue of scale and cognitive load. In this context AI is reducing the burden of processing large volumes of artefacts. That is its core contribution in the reverse engineering space.
How it’s actually done
Treating this as if we’re deploying the tool and letting it run with a prompt is a deeply naive view of execution and modernization overall.
In a real modernization program, the workflow typically includes:
Pre-processing. This involves static/dynamic analysis tools, call flows, data flows, integrations and seams (something we’ve discussed at length elsewhere, specifically for mainframes). Here, AI is not in the picture.
AI-assisted discovery and surfacing of information to build a high-level understanding of the legacy system. We’ve seen this in various forms, from chatbots to documentation to diagrams — there are many things you can build on top of tools like Claude Code. It’s important to remember, though, that it’s the human engineer who consumes this data; they are key for the next step.
AI-assisted modernization planning should account for business priorities, technical constraints, seams, risks, ROI, target delivery platform enablement, transitional architecture and to-be architecture. AI can help incorporate regulatory requirements and other publicly available events or market signals. However, modernization priorities are often shaped by factors that aren’t formally documented or readily available to AIs. Much of what influences a roadmap resides in stakeholders’ tacit knowledge, along with differing incentives and agendas that can shape discussions and outcomes. In these contexts, human expertise is essential. Experienced practitioners provide judgment, surface implicit constraints, reconcile competing interests and ensure that prioritization reflects strategic intent rather than individual influence.
AI-assisted reverse engineering for an increment to deliver: extracting low-level requirements from the legacy system and deciding what to retain, drop or change.
AI-assisted forward engineering to deliver the increment, meaning writing the code and testing, building transitional architecture components, data migration and synchronization, potentially dual run for a period of time, and ultimately cut over. Code generation is a small piece of the problem here, but one that is becoming cheaper and faster with AI tools at our disposal.
Why can’t AI tools like Claude Code just translate code from one language to another?
The assumption that AI can simply translate COBOL into Java treats modernization as a syntactic exercise, as though a system is nothing more than its source code. That premise is flawed.
A direct translation would, in the best case scenario, faithfully reproduce existing architectural constraints, accumulated technical debt and outdated design decisions. It wouldn’t address weaknesses; it would restate them in a different language.
A system is more than code; it includes infrastructure, data and models, operational processes, integration contracts, external dependencies and, of course, people. Modernization, then, isn’t primarily a coding exercise. It’s a delivery challenge: enabling safe cutover, introducing new capabilities, incrementally deprecating legacy components and often applying patterns such as strangler migration to reduce risk to manageable delivery chunks.
In practice, modernization is rarely about preserving the past in a new syntax. It’s about aligning systems with current market demands, infrastructure paradigms, software supply chains and operating models. Even if AI were eventually capable of highly reliable code translation, blind conversion would risk recreating the same system with the same limitations, in another language, without a deliberate strategy for replacing or retiring its legacy ecosystem.
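The strangler migration pattern mentioned above can be sketched as a thin router that directs each capability to either the legacy or the modernized implementation, so components are cut over (and rolled back) one at a time rather than translated wholesale. The capability names and handlers below are hypothetical.

```python
# Hypothetical legacy and modernized implementations of one capability.
def legacy_interest(balance: float) -> float:
    return round(balance * 0.05, 2)

def modern_interest(balance: float) -> float:
    # Must agree with the legacy behavior before cutover; dual-run
    # comparisons would verify this in a real program.
    return round(balance * 0.05, 2)

# Flag table: which capabilities have been migrated so far.
MIGRATED = {"interest": True}

HANDLERS = {
    "interest": (legacy_interest, modern_interest),
}

def route(capability: str, *args):
    """Send traffic to the modern implementation only once migrated."""
    legacy, modern = HANDLERS[capability]
    return (modern if MIGRATED.get(capability) else legacy)(*args)

print(route("interest", 1000.0))
```

The value of this shape is that risk is bounded per capability: a bad cutover is a one-line flag flip away from rollback, which no amount of bulk code translation provides.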
Evaluating tools other than Claude Code for modernizing legacy code
Claude Code is clearly one of the best coding agents available so far, but many open-source alternatives, such as OpenCode, are popular too. They achieve comparable results when used with state-of-the-art large language models.
Teams should always evaluate these tools using evals; doing so not only helps us measure how good our agents are over time, it also helps us assess which agents perform better for specific use cases.
We would suggest there are a few key things that should be considered when judging AI outputs in this context:
Time and effort (for both tool and human).
Cost.
Correctness —
Are the results supported by context? (Or has there been a lot of hallucination?)
Is the context faithfully represented in the results? (Or have lots of things been missed?)
Drift —
When you ask the same tool to do the same thing, what variance do you see in the results? Is this variance impacting the other dimensions?
In terms of Claude Code, it’s exceptional when it comes to time and cost. Where it’s not as strong, though, is on the last two dimensions — correctness and drift (at the moment, anyway).
Does Claude Code make COBOL knowledge obsolete?
While the impact of Anthropic’s blog post on IBM stock prices is understandable given IBM’s reputation in this space, it would be wrong for onlookers to assume that this means COBOL knowledge is now obsolete and that modernization has been fully automated.
Solid mainframe skills are a scarce, supply-constrained resource. Even though AI tools can make a significant impact on this work, the knowledge and skill it requires have not automatically become obsolete, nor will the market for them immediately flip to being demand-constrained.
Every single mainframe modernization program — irrespective of chosen approach and tooling — requires strong SME involvement. For us, what’s exciting is learning how humans and AI can best collaborate. Legacy modernization has been a tough and expensive challenge for businesses, and leading indicators demonstrate a net positive economic impact of generative AI, but they are just that: leading indicators.
The jury is still out on this — we'll know for sure on the basis of how many more mainframes are replaced in the next six to 12 months.