Techniques
Adopt
-
1. Continuous compliance
Continuous compliance is the practice of ensuring that software development processes and technologies meet regulatory and security standards on an ongoing basis through automation. Manual compliance checks can slow development and introduce human error, whereas automated checks and audits provide faster feedback, clearer evidence and simplified reporting.
By integrating policy-as-code tools such as Open Policy Agent and generating SBOMs within CD pipelines — aligned with SLSA guidance — teams can detect and address compliance issues early. Codifying rules and best practices enforces standards consistently across teams without creating bottlenecks. OSCAL also shows promise as a framework for automating compliance at scale.
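As an illustration, a pipeline step might gate a build on an OPA policy evaluated against a generated SBOM. Below is a minimal sketch in Python; the policy path, rule name (`data.compliance.allow`) and file names are our own placeholders, not a standard.

```python
import json
import subprocess
import sys

def sbom_is_compliant(sbom_path: str, policy_path: str) -> bool:
    """Evaluate an SBOM against an OPA policy and return the verdict."""
    result = subprocess.run(
        ["opa", "eval",
         "--data", policy_path,     # Rego policy, e.g., license allow-list rules
         "--input", sbom_path,      # the SBOM produced earlier in the pipeline
         "--format", "json",
         "data.compliance.allow"],  # rule path defined in the policy (illustrative)
        capture_output=True, text=True, check=True,
    )
    output = json.loads(result.stdout)
    return output["result"][0]["expressions"][0]["value"] is True

if __name__ == "__main__":
    # Fail the pipeline step if the policy denies the build.
    if not sbom_is_compliant("sbom.json", "policy/sbom.rego"):
        sys.exit("SBOM failed compliance policy check")
```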
Practices and tooling for continuous compliance are now mature enough that it should be treated as a sensible default, which is why we’ve moved our recommendation to Adopt. The increasing use of AI in coding — and the accompanying risk of complacency with AI-generated code — makes embedding compliance into the development process more critical than ever.
-
2. Curated shared instructions for software teams
For teams actively using AI in software delivery, the next step is moving beyond individual prompting toward curated shared instructions for software teams. This practice helps you apply AI effectively across all delivery tasks — not just coding — by sharing proven, high-quality instructions. The most straightforward way to implement this is by committing instruction files, such as an AGENTS.md, directly to your project repository. Most AI coding tools — including Cursor, Windsurf and Claude Code — support sharing instructions through custom slash commands or workflows. For noncoding tasks, you can set up organization-wide prompt libraries so that vetted prompts are ready to use. This systematic approach allows for continuous improvement: As soon as a prompt is refined, the entire team benefits, ensuring consistent access to the best AI instructions.
-
3. Pre-commit hooks
Git hooks have been around for a long time, but we feel they’re still underused. The rise of AI-assisted and agentic coding has increased the risk of accidentally committing secrets or problematic code. While there are many mechanisms for code validation, such as continuous integration, pre-commit hooks are a simple and effective safeguard that more teams should adopt. However, overloading hooks with slow-running checks can discourage developers from using them, so it's best to keep them minimal and focused on risks that are most effectively caught at this stage of the workflow, such as secret scanning.
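As a concrete example, here is a minimal sketch of a `.git/hooks/pre-commit` script that scans only the staged diff for secret-like patterns. The regexes are illustrative; teams may prefer to call a dedicated scanner such as gitleaks from the hook instead.

```python
#!/usr/bin/env python3
"""Minimal pre-commit hook: block commits whose staged diff looks like it contains secrets."""
import re
import subprocess
import sys

# Illustrative patterns only; a real setup would use a dedicated secret scanner.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                        # AWS access key ID
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),  # private key material
    re.compile(r"(?i)(api[_-]?key|secret)\s*[:=]\s*['\"][^'\"]{16,}"),
]

def main() -> int:
    # Only examine what is actually being committed, keeping the hook fast.
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    added_lines = [l for l in diff.splitlines() if l.startswith("+")]
    for line in added_lines:
        for pattern in SECRET_PATTERNS:
            if pattern.search(line):
                print(f"Possible secret in staged changes: {line[:60]}...")
                return 1  # non-zero exit aborts the commit
    return 0

if __name__ == "__main__":
    sys.exit(main())
```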
-
4. Using GenAI to understand legacy codebases
In recent months, we’ve seen clear evidence that using GenAI to understand legacy codebases can significantly accelerate comprehension of large and complex systems. Tools such as Cursor, Claude Code, Copilot, Windsurf, Aider, Cody, Swimm, Unblocked and PocketFlow-Tutorial-Codebase-Knowledge help developers surface business rules, summarize logic and identify dependencies. Used alongside open frameworks and direct LLM prompting, they dramatically reduce the time needed to understand legacy codebases.
Our experience across multiple clients shows that GenAI-assisted understanding of legacy systems is now a practical default rather than an experiment. Setup effort varies, particularly for advanced approaches such as GraphRAG, and tends to scale with the size and complexity of the codebase being analyzed. Despite this, the impact on productivity is consistent and substantial. GenAI has become an essential part of how we explore and understand legacy systems.
试验
-
5. AGENTS.md
AGENTS.md is a common format for providing instructions to AI coding agents working on a project. Essentially a README file for agents, it has no required fields or formatting other than Markdown, relying on the ability of LLM-based coding agents to interpret human-written, human-readable guidance. Typical uses include tips on using tools in the coding environment, testing instructions and preferred practices for managing commits. While AI tools support various methods for context engineering, the value of AGENTS.md lies in creating a simple convention for a file that acts as a starting point.
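A hypothetical AGENTS.md might look like the following; every section, tool name and command in it is a placeholder, since the format prescribes nothing beyond Markdown.

```markdown
# AGENTS.md

## Dev environment
- Toolchain versions are pinned in `.tool-versions`; install them before making changes.
- Run `make dev` to start the local stack for exploratory work.

## Testing
- Run `make test` before proposing any commit; fix failures rather than skipping tests.

## Commit conventions
- Small, focused commits using Conventional Commits (`feat:`, `fix:`, ...).
- Never commit generated files under `build/`.
```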
-
6. AI for code migrations
Code migrations take many forms — from language rewrites to dependency or framework upgrades — and are rarely trivial, often requiring months of manual effort. One of our teams, when upgrading their .NET framework version, experimented with using AI to shorten the process. In the past, we blipped OpenRewrite, a deterministic, rule-based refactoring tool. Using AI alone for such upgrades has often proven costly and prone to meandering conversations. Instead, the team combined traditional upgrade pipelines with agentic coding assistants to manage complex transitions. Rather than delegating a full upgrade, they broke the process into smaller, verifiable steps: analyzing compilation errors, generating migration diffs and validating tests iteratively. This hybrid approach positions AI coding agents as pragmatic collaborators in software maintenance. Industry examples, such as Google’s large-scale int32-to-int64 migration, reflect a similar trend. While our results on measurable time savings are mixed, the potential to reduce developer toil is clear and worth continued exploration.
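Below is a rough sketch of such a loop: the compiler and test suite provide deterministic feedback, while an agent proposes one small fix at a time. The `ask_agent_for_patch` function stands in for whichever coding assistant drives the session and is hypothetical.

```python
import subprocess

def run(cmd: list[str]) -> subprocess.CompletedProcess:
    return subprocess.run(cmd, capture_output=True, text=True)

def ask_agent_for_patch(errors: str) -> None:
    """Hypothetical call into a coding agent that applies a fix for the given errors."""
    raise NotImplementedError

MAX_ITERATIONS = 20

for _ in range(MAX_ITERATIONS):
    build = run(["dotnet", "build"])       # deterministic step: the compiler is the oracle
    if build.returncode == 0:
        tests = run(["dotnet", "test"])    # validate behavior, not just compilation
        if tests.returncode == 0:
            break                          # this upgrade step is done; commit and move on
        ask_agent_for_patch(tests.stdout)  # small, verifiable fix for failing tests
    else:
        ask_agent_for_patch(build.stdout)  # small, verifiable fix for compile errors
```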
-
7. Delta Lake liquid clustering
Liquid clustering is a technique for Delta Lake tables that serves as an alternative to partitioning and Z-ordering. Historically, optimizing Delta tables for read performance required defining partition and Z-order keys at table creation based on anticipated query patterns, and modifying those keys later necessitated a full data rewrite. In contrast, liquid clustering employs a tree-based algorithm to cluster data based on designated keys, which can be incrementally changed without rewriting all data. This provides greater flexibility to support diverse query patterns, thereby reducing compute costs and enhancing read performance. Furthermore, the Databricks Runtime for Delta Lake supports automatic liquid clustering by analyzing historical query workloads, identifying optimal columns, and clustering data accordingly. Both standalone Delta Lake and Databricks Runtime users can leverage liquid clustering to optimize read performance.
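In code, clustering keys are declared with `CLUSTER BY` and can be changed later without rewriting the table. A minimal sketch via PySpark, assuming Delta Lake 3.1 or later and an existing `spark` session:

```python
# Assumes a SparkSession `spark` configured with Delta Lake 3.1+.
spark.sql("""
    CREATE TABLE events (
        event_id   STRING,
        user_id    STRING,
        event_date DATE
    )
    USING DELTA
    CLUSTER BY (event_date)  -- liquid clustering key; no partitioning or Z-ordering needed
""")

# Query patterns changed? Re-key incrementally instead of rewriting the table.
spark.sql("ALTER TABLE events CLUSTER BY (user_id)")

# OPTIMIZE incrementally clusters newly written data by the current keys.
spark.sql("OPTIMIZE events")
```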
-
8. Self-serve UI prototyping with GenAI
We use the phrase self-serve UI prototyping with GenAI to describe an emerging technique where tools such as Claude Code, Figma Make, Miro AI and v0 enable product managers to generate interactive, user-testable prototypes directly from text prompts. Instead of manually crafting wireframes, teams can generate functional HTML, CSS and JS artifacts in minutes — offering the speed of a sketch, but with real interactivity and higher fidelity. These “throwaway” prototypes trade polish for rapid learning, making them ideal for early validation during design sprints. However, higher fidelity can lead to misplaced focus on small details or unrealistic expectations about production effort — making clear framing and expectation management essential.
Used alongside user research, this technique accelerates discovery by turning abstract ideas into tangible experiences for users to react to. However, teams should take care not to let these tools replace the research process itself. Done well, self-serve prototyping shortens feedback loops, lowers barriers for non-designers and helps teams iterate quickly while maintaining a healthy balance between speed and quality.
-
9. Structured output from LLMs
Structured output from LLMs is the practice of constraining a large language model to produce responses in a predefined format — such as JSON or a specific programming class. This technique is essential for building reliable, production-grade applications, transforming the LLM's typically unpredictable text into a machine-readable, deterministic data contract. Based on successful production use, we’re moving this technique from Assess to Trial.
Approaches range from simple prompt-based formatting and model-native structured outputs to more robust constrained decoding methods using tools like Outlines and Instructor, which apply finite-state machines to guarantee valid output. We’ve successfully used this technique to extract complex, unstructured data from diverse document types and convert it into structured JSON for downstream business logic.
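For example, with Instructor a Pydantic model acts as the data contract. The following is a minimal sketch; the model choice and invoice fields are illustrative.

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class Invoice(BaseModel):
    """The data contract the LLM must satisfy; invalid output is retried or rejected."""
    vendor: str
    total_amount: float
    currency: str
    line_items: list[str]

client = instructor.from_openai(OpenAI())

document_text = "ACME Corp invoice #1042: 3 consulting days, total 4,500.00 EUR."

invoice = client.chat.completions.create(
    model="gpt-4o-mini",     # illustrative model choice
    response_model=Invoice,  # Instructor validates the response against this schema
    messages=[{"role": "user", "content": f"Extract the invoice data:\n{document_text}"}],
)
print(invoice.total_amount)  # a typed float, not free text to parse
```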
-
10. TCR (Test && Commit || Revert)
Test && commit || revert (TCR) is a programming workflow derived from test-driven development that promotes very small, continuous steps through a simple rule: After each change, if the tests pass, the changes are committed; if they fail, the changes are reverted. Implementing TCR is straightforward: You only need to define a script that automates this cycle within your codebase. TCR was originally introduced in a canonical article by Kent Beck, and we've found that it reinforces positive coding practices such as YAGNI and KISS. It's worth evaluating as we experiment with new workflows for building software using GenAI.
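A TCR script can be just a few lines. Here is a sketch in Python, assuming the test suite runs under pytest:

```python
#!/usr/bin/env python3
"""test && commit || revert -- run after every small change."""
import subprocess
import sys

tests = subprocess.run(["pytest", "-q"])

if tests.returncode == 0:
    # Tests pass: keep the change, with a deliberately low-ceremony message.
    subprocess.run(["git", "commit", "-am", "tcr: working"], check=True)
else:
    # Tests fail: throw the change away and try a smaller step.
    subprocess.run(["git", "reset", "--hard"], check=True)
    sys.exit(1)
```

Binding the script to a file watcher or an editor shortcut keeps the cycle frictionless.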
Assess
-
11. AI-powered UI testing
In the previous Radar, AI-powered UI testing primarily focused on exploratory testing, where we noted that the non-determinism of LLMs could introduce flakiness. With the rise of MCP, we’re now seeing major UI testing frameworks like Playwright and Selenium introduce their own MCP servers (playwright-mcp, mcp-selenium). These expose browser automation through each framework’s native technology, enabling coding assistants to generate reliable UI tests in Playwright or Selenium. While AI-powered UI testing remains a fast-evolving space — the latest Playwright release, for example, introduced Playwright Agents — we’re excited about these developments and look forward to seeing more practical guidance and field experience emerge.
-
12. Anchoring coding agents to a reference application
In the past we blipped the tailored service templates pattern, which helped organizations adopting microservices by providing sensible defaults to bootstrap new services and integrate them seamlessly with existing infrastructure. Over time, however, code drift between these templates and existing services tends to grow as new dependencies, frameworks and architectural patterns emerge. To maintain good practices and architectural consistency — especially in the age of coding agents — we’ve been experimenting with anchoring coding agents to a reference application. This pattern guides generative code agents by providing a live, compilable reference application instead of static prompt examples. A Model Context Protocol (MCP) server exposes both reference template code and commit diffs, enabling agents to detect drift and propose repairs. This approach transforms static templates into living, adaptable blueprints that AI can reference intelligently — maintaining consistency, reducing divergence and improving control over AI-driven scaffolding as systems evolve.
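The following sketch shows what such an MCP server could look like using the official Python SDK's FastMCP helper. The tool names, repository path and the choice to serve files and diffs straight from the reference repo's git history are our own illustration, not an established convention.

```python
import subprocess
from mcp.server.fastmcp import FastMCP

REFERENCE_REPO = "/srv/reference-app"  # hypothetical path to the living reference application

mcp = FastMCP("reference-app")

@mcp.tool()
def get_reference_file(path: str) -> str:
    """Return a file from the compilable reference application for the agent to imitate."""
    with open(f"{REFERENCE_REPO}/{path}") as f:
        return f.read()

@mcp.tool()
def get_recent_changes(n: int = 5) -> str:
    """Return recent commit diffs so agents can detect and repair drift from the template."""
    return subprocess.run(
        ["git", "-C", REFERENCE_REPO, "log", f"-{n}", "--patch", "--oneline"],
        capture_output=True, text=True, check=True,
    ).stdout

if __name__ == "__main__":
    mcp.run()  # serves the tools over stdio by default
```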
-
13. Context engineering
Context engineering is the systematic design and optimization of the information provided to a large language model during inference to reliably produce the desired output. It involves structuring, selecting and sequencing contextual elements — such as prompts, retrieved data, memory, instructions and environmental signals — so the model’s internal layers operate in an optimal state. Unlike prompt engineering, which focuses only on the wording of prompts, context engineering considers the entire configuration of context: how relevant knowledge, instructions and prior context are organized and delivered to achieve the most effective results.
Today, engineers use a range of discrete techniques that can be grouped into three areas. Context setup covers curation tactics such as minimal system prompts, canonical few-shot examples and token-efficient tools for decisive action. Context management for long-horizon tasks addresses finite context windows through context summarization, structured note-taking to persist external memories and sub-agent architectures to isolate and summarize complex sub-tasks. Dynamic information retrieval relies on just-in-time (JIT) context retrieval, where agents autonomously load external data only when immediately relevant, maximizing efficiency and precision.
-
14. GenAI for forward engineering
GenAI for forward engineering is an emerging technique for modernizing legacy systems through AI-generated descriptions of legacy codebases. It introduces an explicit step focused on what the legacy code does (its specification) while deliberately hiding how it’s currently implemented. This relates to spec-driven development but is specifically applied to legacy modernization.
By generating and iterating on functional descriptions before rewriting the code, teams can use GenAI to surface hidden logic, dependencies and edge cases that might otherwise be overlooked. Emphasizing the problem space rather than the existing system also allows GenAI models to explore more creative and forward-looking solutions. The workflow follows a reverse-engineering → design/solutioning → forward-engineering loop, which enables both humans and AI agents to reason at a higher level before committing to an implementation.
At Thoughtworks, we’re seeing multiple teams apply this approach successfully to accelerate the rewrite of legacy systems. The goal isn’t to obscure implementation details entirely, but to introduce a temporary abstraction that helps teams and agents explore alternatives without being constrained by the current structure. This technique is showing promising results in producing cleaner, more maintainable and future-ready code while also reducing the time spent understanding existing implementations.
-
15. GraphQL as data access pattern for LLMs
GraphQL as a data access pattern for LLMs is an emerging approach for creating a uniform, model-friendly data access layer that enhances context engineering. It enables teams to expose structured, queryable data without granting models direct database access. Unlike REST APIs, which tend to over-fetch data or require new endpoints or filters for each use case, GraphQL lets the model retrieve only the data it needs — reducing noise, improving context relevance and cutting token usage.
A well-defined GraphQL schema also provides metadata that LLMs can use to reason about available entities and relationships, enabling dynamic, schema-aware querying for agentic use cases. This pattern offers a secure middle ground between REST and SQL, balancing governance controls with flexible access.
However, the approach relies on well-structured schemas and meaningful field names. Interpreting schema semantics and navigating complex structures remain challenging — and what is difficult for people to reason about is often equally difficult for LLMs. It’s also important to be cognizant of the increased vectors for DoS attacks as well as typical GraphQL challenges, such as caching and versioning.
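As a simple illustration, the model can be handed a single read-only tool that executes GraphQL queries, letting it select exactly the fields a task needs. The endpoint and schema below are hypothetical.

```python
import requests

GRAPHQL_ENDPOINT = "https://internal.example.com/graphql"  # hypothetical endpoint

def run_graphql(query: str, variables: dict | None = None) -> dict:
    """Tool handed to the LLM: executes a read-only GraphQL query and returns JSON."""
    response = requests.post(
        GRAPHQL_ENDPOINT,
        json={"query": query, "variables": variables or {}},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["data"]

# The model selects exactly the fields it needs: no over-fetching, fewer tokens.
query = """
query($id: ID!) {
  customer(id: $id) {
    name
    openOrders { id total }
  }
}
"""
print(run_graphql(query, {"id": "42"}))
```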
-
16. Knowledge flows over knowledge stocks
One question we get asked a lot is "how do we improve the way we share information amongst our teams?" Techniques for managing organizational knowledge continue to evolve, and one perspective we've found valuable borrows from systems thinking: the concepts of knowledge flows and knowledge stocks. Originally from economics, this framing encourages teams to view their organizational knowledge as a system — stocks representing accumulated knowledge and flows representing how knowledge moves and evolves through the organization. Increasing the flow of external knowledge into an organization tends to boost innovation. A time-tested way to improve flow is to establish communities of practice, which consistently show measurable benefits. Another is by deliberately seeking diverse, external sources of insight. As GenAI tools make existing knowledge stocks more accessible, it's worth remembering that nurturing fresh ideas and external perspectives is just as critical as adopting new technologies.
-
17. LLM as a judge
Using an LLM as a judge — to evaluate the output of another system, usually an LLM-based generator — has garnered attention for its potential to deliver scalable, automated evaluation in generative AI. However, we’re moving this blip from Trial to Assess to reflect newly recognized complexities and risks.
While this technique offers speed and scale, it often fails as a reliable proxy for human judgment. Evaluations are prone to position bias, verbosity bias and low robustness. A more serious issue is scaling contamination: When LLM as a judge is used in training pipelines for reward modeling, it can introduce self-enhancement bias — where a model family favors its own outputs — and preference leakage, blurring the boundary between training and testing. These flaws have led to overfitted results that inflate performance metrics without real-world validity, and recent research has begun to investigate them more rigorously. To counter them, we’re exploring improved techniques, such as using LLMs as a jury (employing multiple models for consensus) or chain-of-thought reasoning during evaluation. While these methods aim to increase reliability, they also increase cost and complexity. We advise teams to treat this technique with caution — ensuring human verification, transparency and ethical oversight before incorporating LLM judges into critical workflows. The approach remains powerful but less mature than once believed.
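As an illustration of the jury variant, the sketch below has several models score the same output independently and takes the median. The model list is illustrative; in practice you'd mix model families to reduce self-enhancement bias.

```python
import statistics
from openai import OpenAI

client = OpenAI()
JURY = ["gpt-4o-mini", "gpt-4o", "gpt-4.1-mini"]  # illustrative; mix vendors in practice

def score(model: str, question: str, answer: str) -> int:
    """Ask one juror for an integer score from 1 to 5."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Rate this answer to the question from 1 (bad) to 5 (excellent). "
                       f"Reply with a single digit.\nQuestion: {question}\nAnswer: {answer}",
        }],
    )
    return int(response.choices[0].message.content.strip()[0])

def jury_verdict(question: str, answer: str) -> float:
    # The median is robust to a single juror going off the rails.
    return statistics.median(score(m, question, answer) for m in JURY)
```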
-
18. On-device information retrieval
On-device information retrieval is a technique that enables search, context-awareness and retrieval-augmented generation (RAG) to run entirely on user devices — mobile, desktop or edge devices — prioritizing privacy and computational efficiency. It combines a lightweight local database with a model optimized for on-device inference. A promising implementation pairs sqlite-vec, a SQLite extension that supports vector search within the embedded database, with EmbeddingGemma, a 300 million parameter embedding model built on the Gemma 3 architecture. Optimized for efficiency and resource-constrained environments, this combination keeps data close to the edge, reducing dependence on cloud APIs and improving latency and privacy. We recommend teams assess this technique for local-first applications and other use cases where data sovereignty, low latency and privacy are critical.
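A minimal sketch of that pairing, assuming the `sqlite-vec` and `sentence-transformers` packages and the `google/embeddinggemma-300m` checkpoint (names as published at the time of writing):

```python
import sqlite3
import sqlite_vec
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")  # ~300M-parameter on-device embedder

db = sqlite3.connect("local.db")
db.enable_load_extension(True)
sqlite_vec.load(db)  # loads the vec0 virtual table module
db.enable_load_extension(False)

db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS notes USING vec0(embedding float[768])")

docs = ["Meeting notes from Tuesday", "Grocery list", "Draft of the quarterly report"]
for rowid, text in enumerate(docs, start=1):
    db.execute(
        "INSERT INTO notes(rowid, embedding) VALUES (?, ?)",
        (rowid, model.encode(text).astype("float32").tobytes()),
    )

# Nearest-neighbor search runs entirely on-device; no cloud API involved.
query_vec = model.encode("what did we discuss this week?").astype("float32").tobytes()
rows = db.execute(
    "SELECT rowid, distance FROM notes WHERE embedding MATCH ? ORDER BY distance LIMIT 2",
    (query_vec,),
).fetchall()
print(rows)
```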
-
19. SAIF
SAIF (Secure AI Framework) is a framework developed by Google to provide a practical guide for managing AI security risks. It systematically addresses common threats such as data poisoning and prompt injection through a clear risk map, component analysis and practical mitigation strategies. We find its focus on the evolving risks of building agentic systems especially timely and valuable. SAIF offers a concise, actionable playbook that teams can use to strengthen security practices for LLM usage and AI-driven applications.
-
20. Service mesh without sidecar
As the cost and operational complexity of sidecar-based service meshes persist, we’re excited to see another option for service mesh without sidecar emerge: Istio ambient mode. Ambient mode introduces a layered architecture that separates concerns between two key components: the per-node L4 proxy (ztunnel) and the per-namespace L7 proxy (Waypoint proxy). ztunnel ensures that L3 and L4 traffic is transported efficiently and securely. It powers the ambient data plane by fetching certificates for all node identities and handles traffic redirection to and from ambient-enabled workloads. The Waypoint proxy, an optional ambient mode component, enables richer Istio features such as traffic management, security and observability. We’ve had good experience with ambient mode in small-scale clusters and look forward to gaining more large-scale insights and best practices as adoption grows.
-
21. Small language models
We’ve observed steady progress in the development of small language models (SLMs) across several volumes of the Technology Radar. With growing interest in building agentic solutions, we’re seeing increasing evidence that SLMs can power agentic AI efficiently. Most current agentic workflows are focused on narrow, repetitive tasks that don’t require advanced reasoning, making them a good match for SLMs. Continued advancements in SLMs such as Phi-3, SmolLM2 and DeepSeek suggest that SLMs offer sufficient capability for these tasks — with the added benefits of lower cost, reduced latency and lower resource consumption compared to LLMs. It’s worth considering SLMs as the default choice for agentic workflows, reserving larger, more resource-intensive LLMs only when necessary.
-
22. Spec-driven development
Spec-driven development is an emerging approach to AI-assisted coding workflows. While the term's definition is still evolving, it generally refers to workflows that begin with a structured functional specification, then proceed through multiple steps to break it down into smaller pieces, solutions and tasks. The specification can take many forms: a single document, a set of documents or structured artifacts that capture different functional aspects.
We’ve seen many developers adopt this style (and have one of our own that we’re sharing internally at Thoughtworks). Three tools in particular have recently explored distinct interpretations of spec-driven development. Amazon's Kiro guides users through three workflow stages — requirements, design and tasks creation. GitHub's spec-kit follows a similar three-step process but adds richer orchestration, configurable prompts and a “constitution” defining immutable principles that must always be followed. Tessl Framework (still in private beta as of September 2025) takes a more radical approach in which the specification itself becomes the maintained artifact, rather than the code.
We find this space fascinating, though the workflows remain elaborate and opinionated. These tools behave very differently depending on task size and type; some generate lengthy spec files that are hard to review, and when they produce PRDs or user stories, it's sometimes unclear who their intended user is. We may be relearning a bitter lesson — that handcrafting detailed rules for AI ultimately doesn’t scale.
-
23. Team of coding agents
Team of coding agents refers to a technique in which a developer orchestrates multiple AI coding agents, each with a distinct role — for example, architect, back-end specialist, tester — to collaborate on a development task. This practice is supported by tools such as Claude Code, Roo Code and Kilo Code, which enable subagents and multiple operating modes. Building on the proven principle that assigning LLMs specific roles and personas improves output quality, the idea is to achieve better results by coordinating multiple role-specific agents rather than relying on a single general-purpose one. The optimal granularity of agent roles is still being explored, but it can extend beyond simple one-to-one mappings with traditional team roles. This approach marks a shift toward orchestrated, multi-step AI-assisted development pipelines.
-
24. Topology-aware scheduling
GPUs and LPUs are no longer standalone devices but tightly coupled networks of accelerators whose performance depends on placement and topology. In rack-scale systems like NVIDIA’s NVL72, 72 GPUs share over 13 TB of VRAM and act as a single accelerator — until workloads cross switch islands, turning collective operations into bottlenecks. Similarly, Groq’s compile-time, software-scheduled architecture assumes deterministic data movement; random scheduling breaks those assumptions and predictability. Even within the same data center, GPU performance can vary significantly, creating demand for topology-aware scheduling that accounts for both hardware layout and variability when placing jobs.
Naive schedulers that ignore NVLink, PCIe or NIC topology often scatter multi-GPU workloads arbitrarily, resulting in degraded step time and efficiency. Training workloads, which are synchronous and bandwidth-bound, favor contiguous NVLink islands with uniform, high-bandwidth paths for all-reduce and pipeline stages. These jobs should be co-scheduled based on fabric bandwidth, avoiding cross-switch hops and treating link, switch and node boundaries as failure domains. Inference workloads, by contrast, are latency- and SLO-bound and typically balance replication for high availability across domains with sharding to keep mixture of experts (MoE) and KV-cache locality on the shortest paths. Optimizing placement for prefill versus decode phases, micro-batching and tenant isolation further improves efficiency. We believe topology-aware scheduling will become essential as accelerator performance grows increasingly dependent on network and data center topology. Our teams are already assessing Kueue and related projects to improve placement precision, boost performance and ensure reliable scaling for our clients.
-
25. Toxic flow analysis for AI
The now-familiar joke that the S in MCP stands for “security” hides a very real problem. When agents communicate with one another — through tool invocation or API calls — they can quickly encounter what's become known as the lethal trifecta: access to private data, exposure to untrusted content and the ability to communicate externally. Agents with all three are highly vulnerable. Because LLMs tend to follow instructions in their input, content that includes a directive to exfiltrate data to an untrusted source can easily lead to data leaks. One emerging technique to mitigate this risk is toxic flow analysis, which examines the flow graph of an agentic system to identify potentially unsafe data paths for further investigation. While still in its early stages, toxic flow analysis represents one of several promising approaches to detecting the new attack vectors that agentic systems and MCP servers are increasingly exposed to.
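As a simplified illustration of the idea (not any particular tool's implementation), tools can be modeled as nodes tagged with the capabilities they grant, flagging any invocation path that accumulates the full trifecta:

```python
# Each agent tool is tagged with the capabilities it grants. Hypothetical, simplified model.
TOOLS = {
    "read_inbox":   {"private_data", "untrusted_content"},  # emails: private AND untrusted
    "search_wiki":  {"private_data"},
    "fetch_url":    {"untrusted_content"},
    "post_webhook": {"external_comm"},
}

LETHAL_TRIFECTA = {"private_data", "untrusted_content", "external_comm"}

def toxic_flows(flows: list[list[str]]) -> list[list[str]]:
    """Flag any tool-invocation path that accumulates the full lethal trifecta."""
    flagged = []
    for path in flows:
        capabilities = set().union(*(TOOLS[tool] for tool in path))
        if LETHAL_TRIFECTA <= capabilities:
            flagged.append(path)
    return flagged

# The second flow can read private data, ingest untrusted content and exfiltrate it.
print(toxic_flows([
    ["search_wiki", "read_inbox"],
    ["read_inbox", "fetch_url", "post_webhook"],
]))
```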
Hold
-
26. AI-accelerated shadow IT
AI is lowering the barriers for noncoders to build and integrate software themselves, instead of waiting for the IT department to get around to their requirements. While we’re excited about the potential this unlocks, we’re also wary of the first signs of AI-accelerated shadow IT. No-code workflow automation platforms now support AI API integration (e.g., OpenAI or Anthropic), making it tempting to use AI as duct tape — stitching together integrations that previously weren’t possible, such as turning chat messages in one system into ERP API calls via AI. At the same time, AI coding assistants are becoming more agentic, enabling noncoders with basic training to build internal utility applications.
This has all the hallmarks of the next evolution of the spreadsheets that still power critical processes in some enterprises — but with a much bigger footprint. Left unchecked, this new shadow IT could lead to a proliferation of ungoverned, potentially insecure applications, scattering data across more and more systems. Organizations should be aware of these risks and carefully weigh the trade-offs between rapid problem-solving and long-term stability.
-
27. Capacity-driven development
Key to the success of modern software development practices is maintaining a focus on the flow of work. Stream-aligned teams concentrate on a single, valuable flow — such as a user journey or product — which allows them to deliver end-to-end value efficiently. However, we’re seeing a concerning trend toward capacity-driven development, where teams aligned in this way take on features from other products or streams when they have spare capacity. While this may seem efficient in the short term, it's a local optimization best suited to handling sudden spikes in demand. When it becomes the norm, it increases cognitive load and technical debt and, in the worst case, can lead to congestion collapse as the cost of context switching across products compounds. Teams with spare capacity are better served by focusing on improving system health. To manage capacity effectively, use WIP limits to control work across adjacent streams, consider cross-training to balance periods of high demand and apply dynamic reteaming techniques when needed.
-
28. Complacency with AI-generated code
As AI coding assistants and agents gain traction, so does the body of data and research highlighting concerns about complacency with AI-generated code. While there’s ample evidence these tools can accelerate development — especially for prototyping and greenfield projects — studies show that code quality can decline over time.
GitClear's 2024 research found that duplicate code and code churn have risen more than expected, while refactoring activity in commit histories has dropped. Reflecting a similar trend, Microsoft research on knowledge workers shows that AI-driven confidence often comes at the expense of critical thinking — a pattern we’ve observed as complacency sets in with prolonged use of coding assistants.
The rise of coding agents further amplifies these risks, since AI now generates larger change sets that are harder to review. As with any system, speeding up one part of the workflow increases pressure on the others. Our teams are finding that using AI effectively in production requires renewed focus on code quality. We recommend reinforcing established practices such as TDD and static analysis, and embedding them directly into coding workflows, for example through curated shared instructions for software teams.
-
29. Naive API-to-MCP conversion
Organizations are eager to let AI agents interact with existing systems, often by attempting a seamless, direct conversion of internal APIs to the Model Context Protocol (MCP). A growing number of tools, such as MCP link and FastAPI-MCP, aim to support this conversion.
We advise against this naive API-to-MCP conversion. APIs are typically designed for human developers and often consist of granular, atomic actions that, when chained together by an AI, can lead to excessive token usage, context pollution and poor agent performance. In addition, these APIs — especially internal ones — frequently expose sensitive data or allow destructive operations. For human developers, such risks are mitigated through architecture patterns and code reviews, but when APIs are naively exposed to agents via MCP, there’s no reliable, deterministic way to prevent an autonomous AI agent from misusing such endpoints. We recommend architecting a dedicated, secure MCP server specifically tailored for agentic workflows, built on top of your existing APIs.
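By way of contrast, a dedicated MCP server exposes a small number of coarse, task-shaped, read-only tools rather than mirroring every endpoint. A sketch using the official Python SDK, with hypothetical endpoints:

```python
import requests
from mcp.server.fastmcp import FastMCP

INTERNAL_API = "https://orders.internal.example.com"  # hypothetical existing API

mcp = FastMCP("order-support")

@mcp.tool()
def order_status_summary(order_id: str) -> str:
    """One task-oriented, read-only tool composing several granular API calls.

    The agent never sees the raw endpoints, so it cannot chain destructive operations.
    """
    order = requests.get(f"{INTERNAL_API}/orders/{order_id}", timeout=5).json()
    shipment = requests.get(f"{INTERNAL_API}/orders/{order_id}/shipment", timeout=5).json()
    return (
        f"Order {order_id}: {order['status']}, "
        f"shipment via {shipment['carrier']}, expected {shipment['eta']}"
    )

if __name__ == "__main__":
    mcp.run()
```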
-
30. Standalone data engineering teams
Organizing separate data engineering teams to develop and own data pipelines and products — separate from the stream-aligned business domains they serve — is an antipattern that leads to inefficiencies and weak business outcomes. This structure repeats past mistakes of isolating DevOps, testing or deployment functions, creating knowledge silos, bottlenecks and wasted effort. Without close collaboration, data engineers often lack the business and domain context needed to design meaningful data products, limiting both adoption and value. In contrast, data platform teams should focus on maintaining the shared infrastructure, while cross-functional business teams build and own their data products, following data mesh principles. We put this practice in Hold to strongly discourage siloed organizational patterns — especially as the need for domain-rich, AI-ready data continues to grow.
-
31. Text to SQL
Text to SQL uses LLMs to translate natural language into executable SQL, but its reliability often falls short of expectations. We’ve moved this blip to Hold to discourage its use in unsupervised workflows — for example, dynamically converting user-generated queries where the output is hidden or automated. In these cases, LLMs frequently hallucinate due to limited schema or domain understanding, risking incorrect data retrieval or unintended data modification. The non-deterministic nature of LLM outputs also makes debugging and auditing errors challenging.
We advise treating Text to SQL with caution: human review should be required for all generated queries. For agentic business intelligence, avoid direct database access and instead use a governed data abstraction such as a semantic layer — for example, Cube or dbt's semantic layer — or a semantically rich access layer like GraphQL or MCP.
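Where Text to SQL is used at all, guardrails along these lines help: restrict generated statements to a single read-only SELECT and surface them for human review before execution. In this sketch, `generate_sql` stands in for the LLM call and is hypothetical.

```python
import sqlparse

def generate_sql(question: str) -> str:
    """Hypothetical LLM call that turns a natural-language question into SQL."""
    raise NotImplementedError

def guarded_text_to_sql(question: str) -> str:
    sql = generate_sql(question)
    statements = sqlparse.parse(sql)
    # Reject anything other than a single read-only statement.
    if len(statements) != 1 or statements[0].get_type() != "SELECT":
        raise ValueError(f"Refusing non-SELECT or multi-statement SQL: {sql!r}")
    # Surface the query for human review instead of executing it silently.
    print(f"Proposed SQL (requires approval before execution):\n{sql}")
    return sql
```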
Unable to find something you expected to see?
The blips in each volume of the Technology Radar reflect our insights from the past six months, so the item you're looking for may have appeared in an earlier volume. Because we have so much we want to talk about, we sometimes have to cull blips that haven't changed in a long time. The Radar is based on our subjective experience rather than a comprehensive market analysis, so you may not find the particular technology you care about most.