CircleCI’s latest State of Software Delivery Report confirms something Thoughtworks has been saying to clients for more than a year: AI is accelerating code creation far faster than most organizations can absorb. The report’s analysis of 28 million CI workflows paints an unambiguous picture — average throughput is up 59% year over year, but the overwhelming majority of that activity never reaches production. Main-branch throughput for the median team actually declined 7%, success rates have fallen to a five-year low and recovery times are climbing.
These are not isolated signals. They describe a systemic condition: the delivery pipeline has become the binding constraint on engineering performance. Organizations that fail to recognize this will find that every dollar spent on AI coding tools produces diminishing returns — or worse, actively degrades their ability to ship reliable software.
This article outlines our perspective on the report’s key findings, where we agree, where we think the analysis could go further and what we believe engineering leaders should do next.
The productivity paradox is now an empirical fact
For much of the past two years, the conversation about AI in software engineering has been dominated by throughput metrics — lines of code written, pull requests opened, features prototyped. CircleCI’s data finally puts hard numbers behind the counter-narrative: more code is not the same as more delivered software.
The report shows that even top-10% teams, those running 47% more workflows overall, achieved essentially flat main-branch activity. Feature branch throughput surged by nearly 50%, but almost none of that translated into shipped changes. For the median team, the picture is starker: a 15% increase in feature-branch activity alongside a 7% decline on the main branch.
This is the productivity paradox that the 2025 DORA Report also identified, and that Thoughtworks has observed across our client engagements: AI boosts individual effectiveness and local throughput, but without corresponding investment in the systems those changes flow through, the net effect can be negative. More code enters the pipeline, more of it fails, and recovery takes longer. The result is not acceleration — it is congestion.
Thoughtworks’ Chief Scientist Martin Fowler has noted that most organizations applying AI to software delivery are still treating it as a coding tool rather than a systems transformation. Only a small number treat AI-first software delivery as anything more than the rollout of a tool that requires no change management. CircleCI’s data now makes the cost of that narrow approach measurable.
The bottleneck has moved — permanently
One of the report’s most important contributions is its clear demonstration that integration, not code generation, is now the rate-limiting step in software delivery. This is not a temporary adjustment period. It is a structural shift in where value is created and destroyed in the engineering lifecycle.
Consider the numbers: main-branch success rates have fallen to 70.8%, the lowest in five years and well below CircleCI’s recommended benchmark of 90%. Recovery times have risen to 72 minutes on average and nearly 80 minutes on feature branches. For a team pushing just five changes a day, the report estimates this translates to an additional 250 hours per year lost to debugging and blocked deployments. At enterprise scale — 500 changes per day — that becomes the equivalent of 12 full-time engineers consumed entirely by failure recovery.
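The report does not publish the assumptions behind those estimates, but the arithmetic can be sketched. The model below is an illustrative back-of-envelope reconstruction, not the report's own calculation: it prices only failures in excess of the 90% benchmark at the average recovery time, and the working-day and hours-per-engineer figures are assumptions, so the outputs approximate rather than reproduce the report's 250-hour and 12-engineer figures.

```python
# Back-of-envelope model of recovery cost (illustrative assumptions,
# not CircleCI's published methodology).
SUCCESS_RATE = 0.708          # main-branch success rate from the report
BENCHMARK = 0.90              # CircleCI's recommended benchmark
EXCESS_FAILURES = BENCHMARK - SUCCESS_RATE  # extra failures per change
RECOVERY_HOURS = 72 / 60      # average recovery time per failure
WORKING_DAYS = 230            # assumed working days per year
FTE_HOURS = 8                 # assumed productive hours per engineer-day

def annual_hours_lost(changes_per_day: float) -> float:
    """Extra hours per year spent recovering from above-benchmark failures."""
    return changes_per_day * EXCESS_FAILURES * RECOVERY_HOURS * WORKING_DAYS

def fte_equivalent(changes_per_day: float) -> float:
    """Full-time engineers consumed by recovery at this change volume."""
    return changes_per_day * EXCESS_FAILURES * RECOVERY_HOURS / FTE_HOURS

print(f"5 changes/day   -> {annual_hours_lost(5):.0f} extra hours/year")
print(f"500 changes/day -> {fte_equivalent(500):.1f} FTEs on recovery")
```

Whatever the exact inputs, the shape of the result is the same: the cost of failed integration scales linearly with change volume, which is precisely why AI-driven throughput gains make recovery time the dominant term.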
At Thoughtworks, we see the same pattern in our client work. AI-generated code introduces a specific kind of complexity: it is often syntactically correct but contextually misaligned. It may pass unit tests in isolation but create subtle integration failures when it encounters the real topology of a production system — shared state, legacy APIs, implicit contracts between services. These are precisely the kinds of failures that take longer to diagnose, because the developer who needs to fix the code did not write it and may not fully understand its assumptions.
This is why we have been advocating for what we call AI-first software delivery (AIFSD), an approach that applies AI across the entire lifecycle, from requirements and design through testing, deployment and maintenance, rather than concentrating it at the code-generation step. The CircleCI data validates this framing: the organizations pulling ahead are not simply generating more code. They are operating delivery systems that can absorb, validate and recover from higher volumes of change.
The 5% are not just faster — they are organized differently
The report identifies a striking elite: fewer than one in 20 teams has managed to scale both code creation and code delivery simultaneously. These teams saw main-branch throughput grow 26% while feature-branch activity surged 85%. The top performer ran approximately 136,000 workflows per day — an order of magnitude beyond anything previously observed in the dataset.
CircleCI attributes this to strong validation systems, feedback loops and recovery workflows. We agree, but would frame it more broadly: these teams have invested in their engineering platforms. They have built the internal developer platforms, automated quality gates and observability infrastructure that allow AI-generated change to flow through the system without overwhelming it.
This aligns with another core finding from the 2025 DORA Report: quality internal platforms are the essential foundation for AI success. As Thoughtworks’ Chris Westerhold has argued, while individual productivity soars with AI, sustainable success requires treating AI adoption as an organizational transformation — one centered on platform engineering and developer experience, not just tool procurement.
The CircleCI data on team and company size reinforces this. Performance is roughly U-shaped: small companies (two to five employees) and large enterprises (1,000+) outperform the middle. This is consistent with our experience. Small teams benefit from low coordination overhead and direct ownership. Large enterprises that have invested in platform engineering can absorb complexity through shared infrastructure. Mid-sized organizations, caught between the two, often lack both the simplicity of small teams and the platform maturity of large ones.
Where the report could go further
While the CircleCI report provides an invaluable empirical foundation, there are several dimensions we believe deserve deeper exploration.
Beyond CI metrics to business outcomes
The report measures what happens inside the pipeline — throughput, success rates, recovery time, duration. These are essential operational metrics. But they are proxies for what ultimately matters: whether software is reaching users and creating value. A team that ships 26% more changes to the main branch has not necessarily delivered 26% more business value. Future analysis that connects CI performance to deployment frequency, lead time for changes, or customer-facing incident rates would strengthen the argument considerably.
The role of testing strategy
The declining success rate is one of the report’s most alarming findings, but the analysis stops short of examining why builds are failing more often. Is this primarily a code quality problem, a test coverage problem, or a test design problem? In our experience, many organizations’ test suites were designed for a world where humans wrote code at human speed. They were never built to validate the volume, variety and velocity of AI-generated changes. Flaky tests, slow integration suites and inadequate contract testing become acute bottlenecks when throughput doubles. The solution is not just faster pipelines, but fundamentally rethinking what and how you test in an AI-first world.
Technical debt and code maintainability
The report notes that AI-generated code is harder to debug when it fails, but does not explore the longer-term maintainability implications. Thoughtworks CTO Rachel Laycock has emphasized that while faster delivery grabs headlines, tackling long-standing challenges like technical debt and legacy modernization is where AI can bring truly transformative value. Conversely, if AI is generating code that is difficult to maintain, understand, or extend, organizations may be accumulating technical debt faster than ever — even as their throughput numbers look impressive. The rising recovery times in the CircleCI data may be an early indicator of this debt accumulating.
Security and governance
With AI-generated code entering pipelines at unprecedented volume, the attack surface and governance burden grow proportionally. The Thoughtworks Technology Radar has flagged AI antipatterns including complacency with AI-generated code and AI-accelerated shadow IT as emerging risks. When 30% of main-branch merges are failing, the temptation to relax quality gates or bypass review processes is real — and dangerous. Any framework for scaling AI-driven delivery must include continuous compliance and security validation baked into the pipeline, not bolted on afterward.
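In practice, baking compliance in means making security checks a required pipeline job rather than a manual review stage. A sketch in CircleCI 2.1 syntax is below; the image tag and the specific scanners (`pip-audit`, `bandit`) are illustrative choices, not recommendations from the report, and should be swapped for whatever your organization has standardized on.

```yaml
# Sketch: a required security gate in the pipeline (CircleCI 2.1 syntax).
version: 2.1

jobs:
  security-gate:
    docker:
      - image: cimg/python:3.12
    steps:
      - checkout
      - run:
          name: Dependency vulnerability audit
          command: pip install pip-audit && pip-audit -r requirements.txt
      - run:
          name: Static security analysis
          command: pip install bandit && bandit -r src/

workflows:
  main:
    jobs:
      # Require this job in branch protection so failing scans block merges.
      - security-gate
```

The point is structural: when the gate lives in the workflow definition, relaxing it requires a reviewable change to version-controlled config, not a quiet process shortcut.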
What engineering leaders should do now
The CircleCI report’s conclusion — that success in the AI era depends on the ability to validate, integrate and recover at scale — is correct but incomplete. Validation infrastructure is necessary but not sufficient. Based on our experience transforming engineering organizations, we recommend leaders focus on four priorities.
Invest in your internal developer platform
The data is clear: the teams absorbing AI-driven acceleration are the ones with mature platform engineering practices. Self-service infrastructure, standardized deployment pipelines, automated quality gates and built-in observability are what allow 136,000 daily workflows to succeed rather than collapse. If you do not have a platform engineering function today, creating one should be your top priority. If you do, ensure it is designed to handle AI-scale volume.
Redesign your testing strategy for AI-generated code
Test suites designed for human-speed development are not adequate for AI-speed development. Prioritize fast, targeted feedback: contract tests at service boundaries, property-based tests that can catch unexpected behaviors in generated code and AI-aided, test-first development practices that ensure quality is built in rather than inspected after the fact. Treat test infrastructure as a product, not a cost center.
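Property-based testing is worth illustrating, because it checks invariants across generated inputs rather than a handful of hand-picked cases, which is exactly the coverage profile AI-generated code needs. The sketch below hand-rolls the idea with the standard library against a hypothetical helper, `normalize_tags`; libraries such as Hypothesis generalize this with smarter input generation and failure shrinking.

```python
import random
import string

def normalize_tags(tags):
    """Deduplicate and lowercase tag strings, preserving first-seen order.
    (Hypothetical example of an AI-generated helper under test.)"""
    seen, out = set(), []
    for t in tags:
        key = t.strip().lower()
        if key and key not in seen:
            seen.add(key)
            out.append(key)
    return out

def random_tags(rng):
    """Generate a random list of tag-like strings, including edge cases."""
    alphabet = string.ascii_letters + "  _-"
    return [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))
        for _ in range(rng.randint(0, 10))
    ]

rng = random.Random(0)  # fixed seed for reproducible runs
for _ in range(1000):
    tags = random_tags(rng)
    out = normalize_tags(tags)
    # Properties that must hold for every input, not just chosen examples:
    assert normalize_tags(out) == out                     # idempotent
    assert len(out) == len(set(out))                      # no duplicates
    assert all(t == t.lower() == t.strip() for t in out)  # fully normalized
```

Properties like idempotence and invariant preservation survive a rewrite of the implementation, which makes them far cheaper to maintain than example-based tests when AI regenerates code frequently.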
Apply AI across the full delivery lifecycle
Stop treating AI as a coding accelerator and start treating it as a delivery system transformation. AI can improve requirements analysis, architecture validation, test generation, deployment orchestration, incident detection and root-cause analysis. Organizations that apply AI only at the code-generation step are optimizing one link in the chain while the rest rusts. The goal should be AI-first software delivery: end-to-end integration of AI capabilities across every phase of the software lifecycle.
Measure what matters
Throughput is a seductive metric because it is easy to measure and easy to improve with AI tools. But the CircleCI data shows that throughput without stability is not productivity — it is waste. Track the metrics that reflect end-to-end delivery health: deployment frequency, change failure rate, mean time to recovery and lead time for changes (the four DORA metrics). If your throughput is rising but your change failure rate is climbing and your recovery time is growing, you are moving backward, not forward.
Conclusion: The delivery gap is a strategy gap
The CircleCI State of Software Delivery Report provides compelling evidence that AI is reshaping engineering performance — but not in the way most organizations expected. The teams pulling ahead are not the ones writing the most code. They are the ones whose delivery systems can process the most change without breaking.
This is not a CI/CD optimization problem. It is a strategic problem. The gap between teams that can turn AI acceleration into business results and those that cannot will only widen as AI capabilities advance. Organizations that invest now in platform engineering, testing strategy and full-lifecycle AI adoption will compound those advantages. Those that continue to treat AI as a coding productivity tool will find themselves producing more code, shipping less software and spending more time debugging failures they do not understand.
The era of vibe coding is over. The era of delivery engineering has begun. Get your copy of the report.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.