When a team of software delivery specialists set out to improve how developers use AI coding assistants, they weren’t looking to build a new tool from scratch. Instead, they wanted to shape a repeatable, adaptable workflow that helps people get the best out of AI coding assistants while avoiding common pitfalls.
The result was Casper — a simple but powerful rule-based framework designed to work inside any coding assistant. Casper guides developers through a structured three-phase approach to producing high-quality code with AI: explore, craft and polish.
What problem were you trying to solve?
When developers start using AI coding assistants, they often go through what we call the “honeymoon phase”: productivity seems to skyrocket, and tasks that took hours are done in minutes. But that delight can quickly sour as developers work with the generated code: each new fix breaks something else, the code base becomes a mess and, on occasion, the assistant declares that it's done when nothing is working.
Your team then hits the “disillusionment phase.” You might find that the coding assistant has generated hundreds of lines of code when a dozen would have sufficed. Code becomes overly complex, best practices are ignored and the AI makes incorrect assumptions (hallucinations).
Casper was created to provide a structured workflow that gets developers past that frustration phase and keeps AI contributions aligned with good engineering practice.
How does Casper work?
Casper runs entirely on natural-language rule files rather than custom code, making it portable across different AI coding assistants.
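The article doesn't publish Casper's actual rule files, but to make the mechanism concrete, a hypothetical fragment of what such a natural-language rule file might look like (all headings and thresholds here are invented for illustration):

```markdown
<!-- hypothetical rules.md fragment — NOT Casper's actual rule file -->
## Explore
- Before generating any code, ask the developer about edge cases,
  technical approach and functional requirements; record the answers
  as exploration notes.

## Craft
- Propose a list of test cases and wait for the developer to review it.
- Implement one test case at a time; keep each change small and readable.

## Polish
- Re-check the final diff against the acceptance criteria, edge cases
  and the project's coding style; flag anything missing.
```

Because rules like these are plain language rather than tool-specific code, the same file can be dropped into different coding assistants' rule or instruction mechanisms with little or no change.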
It follows three phases:
Explore. Before writing any code, Casper prompts the developer with targeted questions about edge cases, technical approaches and functional requirements. This mirrors how a human would conduct thorough analysis before starting work, reducing the risk of unspoken assumptions. The result is a set of exploration notes capturing agreed-upon functionality and technical direction.
Craft. Using those notes, Casper encourages a test-driven development (TDD) approach. It generates an initial list of test cases, which the developer reviews and refines. The story is then implemented one test case at a time, producing small, readable code changes. This prevents AI from generating overly long or complex code blocks that are hard to review.
Polish. Independently of the implementation phase, Casper runs validation checks: does the code meet acceptance criteria? Were all edge cases handled? Is the coding style consistent? This final pass can uncover missing test cases, gaps in error handling or refactoring opportunities.
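The craft phase's "one test case at a time" rhythm can be sketched with a generic TDD example (the `parse_price` function and its test cases are invented for illustration; they are not from the Casper project):

```python
# Illustrative sketch of the craft phase: the first test case from the
# reviewed list is written, then just enough code is added to make it
# pass, producing a small, easily reviewed change.

def parse_price(text: str) -> float:
    """Parse a price string like '$19.99' into a float (hypothetical example)."""
    return float(text.lstrip("$"))

# Test case 1 of the reviewed list: plain dollar amounts.
assert parse_price("$19.99") == 19.99

# Test case 2 (e.g. thousands separators) would be tackled next,
# driving another small change rather than one large generated block.
```

Each passing test case becomes a natural checkpoint for human review before the assistant moves on to the next one.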
What makes this different from just “using an AI coding assistant well”?
Without guidance, AI assistants tend to generate large chunks of code quickly — which is impressive but risky. Casper introduces process discipline — what Thoughtworks refers to as sensible defaults — into AI-assisted coding, so road-tested practices like TDD, building for production and security-first thinking are part of every story.
Because Casper is rule-based and written in plain language, teams can easily adapt it for different project contexts — for example, focusing more on refactoring in a brownfield project or integrating extra checks for security-sensitive systems.
What challenges did you encounter while implementing Casper?
Like any workflow, Casper evolved through trial and error.
Early on, the team noticed that the AI sometimes deleted existing test files when adding new tests. This was traced to a misused internal function in the coding assistant. The fix was simple once identified — another benefit of using a flexible, rule-based approach.
More broadly, adoption was gradual. In the initial rollout, about 30 out of 80 developers in the project used Casper regularly; that number was expected to grow to around 50 within weeks. Each team’s feedback helped refine the rules and improve reliability.
How have clients responded to this approach?
While clients were not directly involved in Casper’s design, they did have early questions about security and risk when introducing AI tools. The team addressed these by running workshops, demonstrating the workflow, and working with platform providers (e.g., AWS Bedrock) to ensure data security guarantees were clear and contractually covered.
What are the key lessons learned?
Structure beats spontaneity. AI works best when it follows a process designed for human-AI collaboration.
Natural-language rules are powerful. They make the workflow tool-agnostic and easy to adapt.
Human review remains essential. AI can do much of the heavy lifting but final judgment on quality still comes from people.
Gradual adoption works best. Expanding usage in phases allows teams to refine the approach without overwhelming developers.
What’s next for Casper?
The team is exploring how to let AI take over parts of the workflow autonomously when a developer is unavailable. For example, if a developer has completed four out of ten test cases before being pulled into a meeting, Casper could delegate the remaining six to AI agents — one to write the code, another to review it — and present a pull request for human review upon return.
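This delegation idea can be sketched as a simple writer/reviewer loop. The sketch below is purely illustrative: the agent functions are stubs standing in for LLM calls, and none of the names come from Casper itself.

```python
# Hypothetical sketch of the delegation vision: remaining test cases are
# handed to a "writer" agent and a "reviewer" agent, one case at a time.
# Both agents are stubs here; a real version would call an AI assistant.

def writer_agent(test_case: str) -> str:
    """Stub: would ask an AI agent to implement code for one test case."""
    return f"implementation for {test_case!r}"

def reviewer_agent(change: str) -> bool:
    """Stub: would ask a second agent to review the generated change."""
    return change.startswith("implementation")

def delegate(remaining: list[str]) -> list[str]:
    """Work through remaining test cases, writer first, then reviewer."""
    approved = []
    for case in remaining:
        change = writer_agent(case)
        if reviewer_agent(change):
            approved.append(change)
    return approved  # would be bundled into a pull request for human review

# Developer finished cases 1-4 of 10 before stepping away; delegate the rest.
pending = [f"test case {i}" for i in range(5, 11)]
print(len(delegate(pending)))  # prints 6
```

Keeping the writer and reviewer as separate agents mirrors the human workflow Casper already enforces: no generated change lands without an independent check.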
This vision of multi-agent collaboration could extend Casper’s reach even further, turning it from a guided workflow into an active coding partner.
Augmenting coding assistants
Casper shows that the real opportunity in AI-assisted coding isn’t just about having a smarter AI — it’s about having smarter ways for AI to help humans. By embedding good practices into a flexible, tool-agnostic workflow, teams can harness the speed of AI without sacrificing quality, maintainability or security.
The experiment demonstrates a core truth about AI in software delivery: productivity gains are only sustainable when they’re built on strong engineering discipline — and sometimes, the best way to teach AI discipline is to give it a friendly ghost in the IDE.
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.