Modernise without the big bang:

Governed AI as a delivery safeguard for legacy public services

In the halls of government, the word "legacy" has long been synonymous with technical debt: old code, flickering green screens, and systems that nobody quite remembers how to fix. But in an AI-accelerated environment, the definition has shifted. Today, legacy is no longer just a technical burden; it is a strategic risk.
 

During a recent gathering of public sector technology leaders in Australia, a pressing reality emerged: as citizen expectations for seamless, integrated digital experiences continue to rise, the risk of outages, cyber exposure, and "front-page failures" has made the status quo untenable.
 

However, the solution isn’t the high-risk, "big-bang" rewrite that has historically haunted government budgets and timelines. Instead, a pragmatic approach is emerging: modernizing in "thin slices" using governed AI as a delivery safeguard.

Rethinking legacy: It’s about data and risk, not just age


To move forward, public sector leaders are starting to define legacy not just by the age of the software, but by the criticality of the risk it represents.
 

As one senior federal technology leader put it, the conversation has shifted from, “Is this system out of vendor support?” to, “What data does it hold, how does that data flow, and what risk does that create for citizens and staff?”
 

Viewed through that lens, agencies can begin prioritizing systems based on the operational, regulatory and public trust risks they create, rather than simply maintaining the status quo.
 

In practice, leaders can classify systems along three dimensions:
 

  • End-of-life support: Is the vendor still present, and are the skills to maintain the system still available in the workforce?

  • Data sensitivity and visibility: Is critical citizen data (for example, health, licensing or voting data) locked in a "black box" that prevents safe cross-agency insight and auditability?

  • Process rigidity: Does the system force agencies to build a "faster horse" by replicating outdated processes and interfaces, rather than redesigning services for the 21st century?


This reframing moves legacy out of the IT basement and into the core of risk, resilience and public trust.

 

Data first: the AI “hygiene” effect


One of the most transformative realizations in modern engineering is that AI is often the best catalyst for long-overdue data hygiene. Many agencies find that the moment they attempt to implement AI, they uncover data quality issues that had gone unnoticed for years.
 

When agencies shift to digital-first systems, such as electronic voting, digital licensing, or online case management, they create a wealth of behavioral and operational data. In one agency, moving from paper-based to electronic voting generated detailed insight into when, where, and how people voted. Once data analysts applied AI and advanced analytics, the team could map turnout patterns, plan staffing and infrastructure more accurately, and start asking new questions about accessibility and engagement. Many senior leaders hadn’t realized this asset even existed until the analysis surfaced it.
 

Across multiple departments, similar patterns are emerging:
 

  • Attempts to use AI expose inconsistent formats, missing fields, and legacy workarounds.
  • Business teams, seeing AI-generated insights for the first time, become more motivated to improve upstream data quality.
  • Data hygiene work moves earlier in the value chain, instead of being a last-minute clean-up exercise.

The turning point for any agency is the moment they can credibly say, “We can trust our data now.” Only then can meaningful, scaled AI adoption begin.
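
As a rough illustration of the kind of check that surfaces inconsistent formats and missing fields, the sketch below profiles a handful of records. The field names, sample values, and the choice of ISO dates as the target format are all assumptions made for the example, not details from any agency's system.

```python
from datetime import datetime

# Hypothetical sample records; field names and values are illustrative only.
RECORDS = [
    {"licence_id": "A100", "issued": "2021-03-04", "postcode": "2000"},
    {"licence_id": "A101", "issued": "04/03/2021", "postcode": ""},
    {"licence_id": "", "issued": "2022-11-30", "postcode": "3000"},
]

# Assumed set of fields that every record should carry.
REQUIRED_FIELDS = ("licence_id", "issued", "postcode")


def is_iso_date(value: str) -> bool:
    """Return True when value parses as a YYYY-MM-DD date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def profile(records):
    """Count missing required fields and non-ISO dates across records."""
    report = {"total": len(records), "missing": {}, "bad_dates": 0}
    for record in records:
        for name in REQUIRED_FIELDS:
            if not record.get(name):
                report["missing"][name] = report["missing"].get(name, 0) + 1
        issued = record.get("issued")
        if issued and not is_iso_date(issued):
            report["bad_dates"] += 1
    return report


print(profile(RECORDS))
```

A report like this is only a starting point, but tracking its counts over time gives business and data teams a shared, concrete view of whether the data is becoming more trustworthy.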

 

 

AI as a governed collaborator, not an autonomous actor


Much of the fear around AI in the public sector stems from the image of a "black box" making life-altering decisions. In a governed engineering environment, AI is reframed as a friendly collaborator in delivery and operations, not an autonomous decision-maker.

Concrete applications already in use include:
 

  • AI-assisted case review: AI models triage and flag inconsistencies in complex applications, so human reviewers can focus on the most ambiguous or high-risk cases. In one program, this allowed the agency to safely reduce the number of human reviewers on each case while maintaining scrutiny, delivering material cost savings (around 30% in that context) and more consistent decisions.
  • Reverse-engineering legacy logic: AI tools help teams infer and document business rules buried deep within mainframe or bespoke systems, creating a clearer picture of dependencies and side effects before any change is made.


In these models, AI is never the final decision-maker. Instead, it acts as a safeguard, catching anomalies, highlighting edge cases, and supporting humans with better evidence.
 

This is the essence of “human in the loop” and “human on the loop”: people remain accountable for decisions, while AI provides structured support, transparency and signal-to-noise filtering.
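
One way to picture this pattern is a routing step in which the model's output only chooses a review queue, and a named person records the decision. The sketch below is a minimal illustration under stated assumptions: the `Case` fields, the 0.7 threshold, and the queue names are hypothetical, and the anomaly score is assumed to come from some upstream model.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: scores at or above it go to a senior reviewer.
HIGH_RISK_THRESHOLD = 0.7


@dataclass
class Case:
    case_id: str
    anomaly_score: float  # assumed output of an upstream model, range 0..1
    flags: list = field(default_factory=list)  # inconsistencies the model found


def route(case: Case) -> str:
    """Pick a human review queue; the model never approves or rejects."""
    if case.anomaly_score >= HIGH_RISK_THRESHOLD or case.flags:
        return "senior_review"
    return "standard_review"


def record_decision(case: Case, reviewer: str, decision: str) -> dict:
    """Build an audit entry pairing the model signal with the human decision."""
    return {
        "case_id": case.case_id,
        "queue": route(case),
        "model_score": case.anomaly_score,
        "model_flags": list(case.flags),
        "decided_by": reviewer,
        "decision": decision,
    }


# Example usage with made-up values.
case = Case("C-001", 0.82, ["income mismatch"])
entry = record_decision(case, reviewer="j.smith", decision="request more evidence")
print(entry["queue"], entry["decided_by"])
```

Keeping the model score, its flags, and the reviewer's name in the same record is what makes the decision explainable afterwards: the entry shows what the AI suggested and who actually decided.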

“The goal isn't to replace the human element, but to augment it with a governed safety net. When we use AI to reverse-engineer legacy logic or triage workloads, we aren't just speeding up the process, we're creating an audit trail and an 'explainability' layer that actually reduces the risk of human error in high-stakes public services.”
Ash Marshall, Client Partner at Thoughtworks

 

Solving for sovereignty and equity


As governments adopt AI, concerns around "technology colonialism" and data sovereignty have moved from the margins to the mainstream. Public sector leaders are increasingly exploring sovereign and federated AI models, keeping sensitive data within jurisdictional control while still sharing patterns and insights safely across agencies.

 

At the same time, equity is non-negotiable. AI models trained on relatively uniform populations can fail badly on edge cases, particularly in health and social services. For example:

 

  • Diagnostic or risk-scoring models built on historically narrow datasets may systematically misclassify underrepresented groups.

  • “Average” patterns identified by models can mask the needs of people at the margins, who often rely most heavily on public services.

Designing for sovereignty and fairness by default is now a prerequisite for any government AI project. Practically, that means:

 

  • Being explicit about where data is stored, how it is shared, and who has access.
  • Testing models rigorously across diverse populations, including edge cases.
  • Embedding accessibility, privacy, and ethics reviews into the delivery process, not as an afterthought. 
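
To make the testing point concrete, the sketch below compares a model's error rate across groups rather than relying on a single overall figure. The evaluation rows, the group labels, and the 0.1 gap tolerance are invented for illustration; a real review would use an agency's own fairness criteria and far more data.

```python
from collections import defaultdict

# Hypothetical evaluation rows: (group label, true outcome, model prediction).
ROWS = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 1), ("group_b", 1, 0),
]


def error_rate_by_group(rows):
    """Return the share of wrong predictions for each group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, actual, predicted in rows:
        totals[group] += 1
        if actual != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def groups_exceeding_gap(rates, max_gap):
    """List groups whose error rate exceeds the best group's by more than max_gap."""
    best = min(rates.values())
    return sorted(g for g, rate in rates.items() if rate - best > max_gap)


rates = error_rate_by_group(ROWS)
overall = sum(1 for _, actual, predicted in ROWS if actual != predicted) / len(ROWS)
print("overall:", overall)
print("by group:", rates)
print("needs review:", groups_exceeding_gap(rates, max_gap=0.1))
```

In this made-up data the overall error rate looks moderate while one group carries all of the errors, which is the "average masks the margins" effect described above.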


Done well, AI can improve equity by making patterns of exclusion more visible. Done poorly, it can entrench or accelerate existing bias. The difference lies in governance and intent.


 

Culture: from resistance to demand

 

Technical hurdles are often smaller than cultural ones. Public sector staff are frequently “change fatigued” and skeptical of new initiatives, especially when past programs have promised transformation and delivered disruption.

 

Leaders who are successfully embedding AI and modernization tend to follow a different pattern:

 

  • Remove the grind: They frame AI first as a way to automate tedious, low-value manual tasks such as rekeying, reconciliations, and simple checks, freeing staff to focus on complex cases, human interaction, and policy work.

  • Secure visible sponsorship: Secretaries, directors-general, and CEOs speak consistently about why the change matters, how staff will be supported, and what guardrails are in place.
  • Leverage the “peer pull” effect: Rather than rolling out tools everywhere at once, they run time-boxed pilots with willing teams. When colleagues see peers benefiting through less rework, fewer late nights, and more focus on meaningful work, demand for AI support grows from the bottom up.

This approach turns AI from something “done to” staff into something they actively ask for.

 

The playbook: modernise without the big bang


Modernizing what matters, one slice at a time, allows agencies to make progress without taking on the catastrophic risk profile of a total system replacement. A typical thin-slice, AI-assisted approach looks like this:

 

Step 1: Reverse-engineer the current reality

Use AI-assisted engineering and targeted discovery to surface the business rules, data flows, and dependencies hidden within legacy systems, especially mainframes and bespoke platforms. Document what the system actually does today, not just what the original specification said.

 

Step 2: Take a “thin slice” tied to a real outcome

Identify a small, high-value journey tied to a specific citizen or staff outcome, such as a licensing renewal, a benefits eligibility check, or a frontline worker workflow. Design that slice end-to-end on modern, cloud-native foundations, with clear guardrails for data, AI, and security.

 

Step 3: Measure, learn and scale

Put the slice into production and measure its impact: reduced cycle times, improved data accuracy, fewer manual handoffs, and better experiences for citizens and staff. Use that evidence to refine governance, adjust risk thresholds, and expand into adjacent processes over time.

 

This approach changes the conversation from, “Can we afford a $100m+ replacement?” to, “Can we invest a fraction of that in a 3–6 month slice that proves value and reduces risk?” It also gives leaders real-world evidence they can take back to ministers, boards and audit committees.


Internationally, agencies such as Singapore's Government Technology Agency (GovTech), through initiatives including the Singpass National Digital Identity platform and the Corppass business-to-government authorisation service, and the UK Department for Transport have followed similar patterns, working with partners including Thoughtworks to evolve digital identity and transport services through incremental modernization. In each case, thin-slice delivery and strong engineering governance have allowed teams to improve critical services while keeping legacy systems running safely in the background.


 

A starting checklist for public sector leaders


Before embarking on the next phase of your digital roadmap, it can be helpful to ask:

 

  • Do we have a shared, risk-based definition of what “legacy” means for our agency across business, technology, and risk functions?
  • Which citizen or staff journey is most constrained by our legacy systems today?
  • What data do we already have that we aren’t using, and can we honestly say we trust it?
  • Where could AI safely assist humans in review, triage, or analysis right now, without making the final decision?
  • Do we have clear governance guardrails for how AI can and cannot be used in critical workflows?
  • What “thin slice” could we take to production in the next 3 to 6 months to prove value and learn, without betting the entire program?

 

Modernization in the public sector is no longer a “nice to have”; it is a trust imperative. By using governed AI to de-risk the process, agencies can finally move away from the gamble of the big-bang rewrite and toward a more resilient and responsive future.
