Techniques
Adopt
-
1. Data product thinking
Organizations are actively adopting data product thinking as a standard practice for managing data assets. This approach treats data as a product with its own lifecycle, quality standards and a focus on meeting consumer needs. We now recommend it as default advice for data management, regardless of whether organizations choose architectures like data mesh or lakehouse.
We emphasize consumer-centricity in data product thinking to drive greater adoption and value realization. This means designing data products by working backward from use cases. We also focus on capturing and managing both business-relevant metadata and technical metadata using modern data catalogs like DataHub, Collibra, Atlan and Informatica. These practices improve data discoverability and usability. Additionally, we apply data product thinking to scale AI initiatives and create AI-ready data. This approach includes comprehensive lifecycle management, ensuring data is not only well-governed and high quality but also retired in compliance with legal and regulatory requirements when no longer needed.
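To make the product framing tangible, here's a minimal sketch of a data product descriptor; the fields and values are illustrative assumptions, not a schema from any particular catalog or standard.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str
    owner: str                    # accountable domain team
    consumers: list[str]          # use cases the product was designed backward from
    quality_slos: dict[str, str]  # published quality standards
    retention: str                # lifecycle/retirement policy
    tags: list[str] = field(default_factory=list)  # business metadata for discovery

orders_summary = DataProduct(
    name="orders.daily_summary",
    owner="order-fulfilment-team",
    consumers=["demand-forecasting", "finance-reporting"],
    quality_slos={"freshness": "< 24h", "completeness": "> 99.5%"},
    retention="delete 7 years after fiscal year close",
    tags=["sales", "pii:none"],
)
print(orders_summary)
```

In practice this metadata would live in a catalog such as DataHub or Collibra rather than in application code; the point is that ownership, consumers, quality targets and retirement policy become explicit, versioned attributes of the product.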
-
2. Fuzz testing
Fuzz testing, or simply fuzzing, is a testing technique that has been around for a long time, yet it remains one of the lesser-known techniques. The goal is to feed a software system all kinds of invalid input and observe its behavior. For an HTTP endpoint, for example, bad requests should result in 4xx errors, but fuzz testing often provokes 5xx errors or worse. Well-documented and supported by tools, fuzz testing is more relevant than ever as the volume of AI-generated code grows, along with complacency toward AI-generated code. That makes now a good time to adopt fuzz testing to ensure code remains robust and secure.
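As a minimal sketch of what this looks like in practice, here's a fuzz target written for Atheris, Google's coverage-guided fuzzer for Python; the JSON-parsing target is illustrative, and any function that accepts untrusted input can take its place.

```python
import sys

import atheris  # pip install atheris

# Instrument imports so the fuzzer can observe coverage inside them.
with atheris.instrument_imports():
    import json

def test_one_input(data: bytes) -> None:
    # Feed arbitrary bytes to the target. A robust parser should reject
    # invalid input with a ValueError; anything else (crash, hang,
    # unexpected exception) is a finding.
    try:
        json.loads(data)
    except ValueError:
        pass  # expected for malformed input

atheris.Setup(sys.argv, test_one_input)
atheris.Fuzz()
```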
-
3. Software Bill of Materials
Since our initial blip in 2021, Software Bill of Materials (SBOM) generation has transitioned from an emerging practice to a sensible default across our projects. The ecosystem has matured significantly, offering robust tooling and seamless CI/CD integration. Tools like Syft, Trivy and Snyk provide comprehensive SBOM generation from source code to container images, as well as vulnerability scanning. Platforms such as FOSSA and Chainloop enhance security risk management by integrating with development workflows and enforcing security policies. While a universal SBOM standard is still evolving, broad support for SPDX and CycloneDX has minimized adoption challenges. AI systems also require SBOMs, as evidenced by the UK Government's AI Cyber Security Code of Practice and CISA's AI Cybersecurity Collaboration Playbook. We'll continue to monitor developments in this space.
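As a small illustration of how SBOMs plug into automation, the sketch below reads a CycloneDX JSON document (for example, one generated with Syft) and lists its components; the file name is an assumption.

```python
import json

# Read a CycloneDX SBOM, e.g., produced by `syft <image> -o cyclonedx-json`.
with open("sbom.cdx.json") as f:
    sbom = json.load(f)

# Each component carries a package URL (purl), which scanners match
# against vulnerability databases.
for component in sbom.get("components", []):
    print(component.get("name"), component.get("version"), component.get("purl"))
```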
-
4. Threat modeling
In the rapidly evolving AI-driven landscape of software development, threat modeling is more crucial than ever for building secure software while maintaining agility and avoiding the security sandwich. Threat modeling — a set of techniques for identifying and classifying potential threats — applies across various contexts, including generative AI applications, which introduce unique security risks. To be effective, it must be performed regularly throughout the software lifecycle and works best alongside other security practices. These include defining cross-functional security requirements to address common risks in the project's technologies and leveraging automated security scanners for continuous monitoring.
Trial
-
5. API request collection as API product artifact
Treating APIs as a product means prioritizing developer experience, not only by putting sensible and standard design into the APIs themselves but also by providing comprehensive documentation and a smooth onboarding experience. While OpenAPI (Swagger) specifications can effectively document API interfaces, onboarding remains a challenge. Developers need rapid access to working examples, with preconfigured authentication and realistic test data. With the maturing of API client tools (such as Postman, Bruno and Insomnia), we recommend treating API request collections as API product artifacts. API request collections should be thoughtfully designed to guide developers through key workflows, helping them understand the API's domain language and functionality with minimal effort. To keep collections up to date, we recommend storing them in a repository and integrating them into the API's release pipeline.
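One way to wire collections into the release pipeline is a check that fails the build when the collection drifts from the API contract. The sketch below assumes a flat Postman v2.1 collection and an OpenAPI document stored side by side; file names and layout are illustrative.

```python
import json

with open("openapi.json") as f:
    spec_paths = set(json.load(f)["paths"])

with open("collection.json") as f:
    collection = json.load(f)

# In Postman v2.1 collections, a request URL's path is a list of segments.
# (Assumes URL objects; path parameters would need normalizing in a real check.)
collection_paths = {
    "/" + "/".join(item["request"]["url"].get("path", []))
    for item in collection["item"]
    if "request" in item
}

missing = spec_paths - collection_paths
if missing:
    raise SystemExit(f"No example requests for: {sorted(missing)}")
print("Every documented endpoint has at least one example request.")
```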
-
6. Architecture advice process
One of the persistent challenges in large software teams is determining who makes the architectural decisions that shape the evolution of systems. The State of DevOps report reveals that the traditional approach of Architecture Review Boards is counterproductive, often hindering workflow and correlating with low organizational performance. A compelling alternative is an architectural advice process — a decentralized approach where anyone can make any architectural decision, provided they seek advice from those affected and those with relevant expertise. This method enables teams to optimize for flow without compromising architectural quality, both at small and large scales. At first glance, this approach may seem controversial, but practices such as Architecture Decision Records and advisory forums ensure that decisions remain informed, while empowering those closest to the work to make decisions. We’ve seen this model succeed at scale in an increasing number of organizations, including those in highly regulated industries.
-
7. GraphRAG
In our last update on RAG, we introduced GraphRAG, originally described in Microsoft's article as a two-step approach: (1) chunking documents and using LLM-based analysis of the chunks to create a knowledge graph; (2) retrieving relevant chunks at query time via embeddings while following edges in the knowledge graph to discover additional related chunks, which are then added to the augmented prompt. In many cases this approach enhances LLM-generated responses. We've observed similar benefits when using GenAI to understand legacy codebases, where we use structural information — such as abstract syntax trees and dependencies — to build the knowledge graph. The GraphRAG pattern has gained traction, with tools and frameworks like Neo4j's GraphRAG Python package emerging to support it. We also see Graphiti as fitting a broader interpretation of GraphRAG as a pattern.
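The retrieval step can be sketched with a toy graph: an embedding search selects seed chunks, then graph edges pull in related chunks that pure vector similarity would miss. The graph content below is invented for illustration, and the embedding search is stubbed out.

```python
import networkx as nx  # pip install networkx

# Toy knowledge graph: nodes are document chunks, edges capture relations
# an LLM extracted during indexing.
g = nx.Graph()
g.add_node("c1", text="Invoices are issued by the billing service.")
g.add_node("c2", text="The billing service retries failed payments nightly.")
g.add_node("c3", text="Failed payments are written to the ledger module.")
g.add_edge("c1", "c2", relation="mentions billing service")
g.add_edge("c2", "c3", relation="mentions failed payments")

def expand(seed_chunks: list[str], hops: int = 1) -> list[str]:
    """Follow graph edges from embedding-search hits to related chunks."""
    selected = set(seed_chunks)
    for _ in range(hops):
        for node in list(selected):
            selected.update(g.neighbors(node))
    return [g.nodes[n]["text"] for n in sorted(selected)]

# Suppose the embedding search matched only c1; expansion adds its neighbors,
# and the combined chunks go into the augmented prompt.
print(expand(["c1"]))
```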
-
8. Just-in-time privileged access management
Least privilege ensures users and systems have only the minimum access required to perform their tasks. Privileged credential abuse is a major factor in security breaches, with privilege escalation being a common attack vector. Attackers often start with low-level access and exploit software vulnerabilities or misconfigurations to gain administrator privileges, especially when accounts have excessive or unnecessary rights. Another overlooked risk is standing privileges — continuously available privileged access that expands the attack surface. Just-in-time privileged access management (JIT PAM) mitigates this by granting access only when needed and revoking it immediately after, minimizing exposure. A true least-privilege security model ensures that users, applications and systems have only the necessary rights for the shortest required duration — a critical requirement for compliance and regulatory security. Our teams have implemented this through an automated workflow that triggers a lightweight approval process, assigns temporary roles with restricted access and enforces time to live (TTL) for each role, ensuring privileges expire automatically once the task is completed.
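In an AWS environment, one way to realize the time-boxed grant is through STS: the approval workflow hands out short-lived credentials for a pre-provisioned role instead of standing access. The role ARN and session name below are hypothetical.

```python
import boto3  # pip install boto3

sts = boto3.client("sts")

# Issue temporary credentials scoped to one task; they expire on their own,
# so no standing privilege is left behind to revoke.
response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/db-maintenance",  # hypothetical
    RoleSessionName="ticket-4711",                            # hypothetical
    DurationSeconds=900,  # TTL: 15 minutes
)
credentials = response["Credentials"]
print("Access expires at:", credentials["Expiration"])
```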
-
9. Model distillation
Scaling laws have been a key driver of the AI boom — the principle that larger models, datasets and compute resources lead to more powerful AI systems. However, consumer hardware and edge devices often lack the capacity to support large-scale models, creating the need for model distillation.
Model distillation transfers knowledge from a larger, more powerful model (teacher) to a smaller, cost-efficient model (student). The process typically involves generating a sample dataset from the teacher model and fine-tuning the student to capture its statistical properties. Unlike pruning or quantization, which focus on compressing models by removing parameters, distillation aims to retain domain-specific knowledge, minimizing accuracy loss. It can also be combined with quantization for further optimization.
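A minimal sketch of the classic distillation objective in PyTorch: the student is trained against a blend of hard labels and the teacher's temperature-softened distribution. Hyperparameter values are illustrative.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft target: match the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradients stay comparable across temperatures
    # Hard target: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```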
Originally proposed by Geoffrey Hinton et al., model distillation has gained widespread adoption. A notable example is the Qwen/Llama distilled version of DeepSeek R1, which preserves strong reasoning capabilities in smaller models. With its growing maturity, the technique is no longer confined to research labs; it’s now being applied to everything from industrial to personal projects. Providers such as OpenAI and Amazon Bedrock offer guides to help developers distill their own small language models (SLMs). We believe adopting model distillation can help organizations manage LLM deployment costs while unlocking the potential of on-device LLM inference.
-
10. Prompt engineering
Prompt engineering refers to the process of designing and refining prompts for generative AI models to produce high-quality, context-aware responses. This involves crafting clear, specific and relevant prompts tailored to the task or application to optimize the model’s output. As LLM capabilities evolve, particularly with the emergence of reasoning models, prompt engineering practices must also adapt. Based on our experience with AI code generation, we’ve observed that few-shot prompting may underperform compared to simple zero-shot prompting when working with reasoning models. Additionally, the widely used chain-of-thought (CoT) prompting can degrade reasoning model performance — likely because reinforcement learning has already fine-tuned their built-in CoT mechanism.
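To make the distinction concrete, here's a sketch using the OpenAI SDK; the model names are illustrative and will change, and the task is invented.

```python
from openai import OpenAI  # pip install openai

client = OpenAI()
task = "A train departs at 09:05 and arrives at 11:47. How long is the journey?"

# With a reasoning model, a plain zero-shot prompt is often the better default.
reasoning = client.chat.completions.create(
    model="o3-mini",  # illustrative reasoning model
    messages=[{"role": "user", "content": task}],
)

# Explicit chain-of-thought phrasing still helps general-purpose LLMs,
# but can be redundant or even harmful for reasoning models.
general = client.chat.completions.create(
    model="gpt-4o",  # illustrative general-purpose model
    messages=[{"role": "user", "content": task + "\nLet's think step by step."}],
)
print(reasoning.choices[0].message.content)
print(general.choices[0].message.content)
```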
Our hands-on experience aligns with academic research, which suggests "advanced models may eliminate the need for prompt engineering in software engineering." However, traditional prompt engineering techniques still play a crucial role in reducing hallucinations and improving output quality, especially given the differences in response time and token costs between reasoning models and general LLMs. When building agentic applications, we recommend choosing models strategically, based on your needs, while continuing to refine your prompt templates and corresponding techniques. Striking the right balance among performance, response time and token cost remains key to maximizing LLM effectiveness.
-
11. Small language models
The recent announcement of DeepSeek R1 is a great example of why small language models (SLMs) continue to be interesting. The full-size R1 has 671 billion parameters and requires about 1,342 GB of VRAM to run, achievable only with a "mini cluster" of eight state-of-the-art NVIDIA GPUs. But DeepSeek is also available "distilled" into Qwen and Llama — smaller, open-weight models — effectively transferring its abilities and allowing it to run on much more modest hardware. Though the model gives up some performance at those smaller sizes, it still represents a huge leap over previous SLMs. The SLM space continues to innovate elsewhere, too. Since the last Radar, Meta introduced Llama 3.2 at 1B and 3B sizes; Microsoft released Phi-4, offering high-quality results with a 14B model; and Google released PaliGemma 2, a vision-language model at 3B, 10B and 28B sizes. These are just a few of the models currently being released at smaller sizes, and this is definitely an important trend to keep watching.
-
12. Using GenAI to understand legacy codebases
In the past few months, using GenAI to understand legacy codebases has made some real progress. Mainstream tools such as GitHub Copilot are touted as able to help modernize legacy codebases. Tools such as Sourcegraph's Cody are making it easier for developers to navigate and understand entire codebases. These tools use a multitude of GenAI techniques to provide contextual help, simplifying work with complex legacy systems. On top of that, specialized frameworks like S3LLM are showing how LLMs can handle large-scale scientific software — such as that written in Fortran or Pascal — bringing GenAI-enhanced understanding to codebases outside of traditional enterprise IT. We think this technique will continue to gain traction given the sheer amount of legacy software in the world.
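As a small illustration of the structural information involved, the sketch below uses Python's ast module to pull a call graph out of source code, the kind of edges a knowledge graph over a legacy codebase can be built from. The sample source is invented.

```python
import ast

source = """
def load_accounts():
    return read_file("accounts.dat")

def month_end():
    accounts = load_accounts()
    post_interest(accounts)
"""

tree = ast.parse(source)
for node in ast.walk(tree):
    if isinstance(node, ast.FunctionDef):
        calls = [
            n.func.id
            for n in ast.walk(node)
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
        ]
        print(f"{node.name} -> {calls}")
# Output:
# load_accounts -> ['read_file']
# month_end -> ['load_accounts', 'post_interest']
```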
Assess
-
13. AI-friendly code design
Supervised software engineering agents are becoming increasingly capable: they can now identify needed updates and make larger changes to a codebase. At the same time, we're seeing growing complacency with AI-generated code, with developers reluctant to review large AI-created change sets. A common justification is that human-oriented code quality matters less since future changes can be left to AI. In fact, the opposite is true: AI coding assistants perform better on well-structured codebases, which makes AI-friendly code design critical for maintainability.
Fortunately, good software design for humans also serves AI well. Clear naming provides domain context and functional information; modularity and abstraction limit the scope of changes, keeping the AI's working context manageable; and the DRY (don't repeat yourself) principle reduces duplicated code, making it easier for AI to keep behavior consistent. So far, the design patterns that work best for AI remain closely aligned with established software design best practices. As AI continues to evolve, we expect more AI-specific design patterns to emerge, so thinking about code design through an AI-friendly lens now will pay dividends later.
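A small before-and-after sketch of what this means in practice; the domain is invented for illustration.

```python
# Opaque: a reader (human or AI) must infer intent from call sites.
def proc(d, t):
    return [x for x in d if x[2] > t]

# AI-friendly: names carry domain context, and the type bounds what a
# change can touch, shrinking the context an assistant needs to load.
from dataclasses import dataclass

@dataclass
class Invoice:
    number: str
    customer: str
    amount: float

def invoices_above(invoices: list[Invoice], minimum_amount: float) -> list[Invoice]:
    return [inv for inv in invoices if inv.amount > minimum_amount]
```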
-
14. AI-powered UI testing
AI adoption in software teams is moving beyond code generation, and new techniques are emerging. Among them, AI-powered UI testing is attracting growing interest, drawing on LLMs' ability to interpret graphical user interfaces. There are currently several approaches. One uses multimodal LLMs fine-tuned for UI snapshot processing; such tools let test scripts be written in natural language and navigate an application autonomously. QA.tech and LambdaTest's KaneAI fall into this category. Another approach, exemplified by Browser Use, combines multimodal foundation models with Playwright and tests through a deep understanding of the page structure rather than relying on specifically fine-tuned models.
When introducing AI-powered UI testing into a testing strategy, consider where it adds value. These methods can complement manual exploratory testing. While the nondeterminism of LLMs can make tests flaky, their fuzzy matching can also be a strength, particularly for legacy applications that lack selectors or for applications whose labels and click paths change frequently.
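As an example of the second approach, here's a sketch following Browser Use's documented usage at the time of writing; the task text and model choice are illustrative.

```python
import asyncio

from browser_use import Agent          # pip install browser-use
from langchain_openai import ChatOpenAI

async def main():
    # The agent drives a real browser via Playwright, guided by a
    # multimodal model's understanding of the page structure.
    agent = Agent(
        task="Log in as the demo user and verify the dashboard lists three reports",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    await agent.run()

asyncio.run(main())
```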
-
15. Competence envelope as a model for understanding system failures
The theory of graceful extensibility defines the basic rules of adaptive systems, including the socio-technical systems involved in building and operating software. A key concept in this theory is the competence envelope: the boundary within which a system can function robustly in the face of failure. When a system is pushed beyond its competence envelope, it becomes brittle and more likely to break down. This model offers a valuable lens for understanding system failures, such as the complex failures observed in the 2024 Canva outage.
Residuality theory, a recent development in software architecture thinking, offers a way to test a system's competence envelope by deliberately introducing stressors and analyzing how the system has adapted to historical stressors over time. These approaches align with the concepts of antifragility, resilience and robustness in socio-technical systems, and we look forward to seeing them applied in practice as new ways to improve the adaptability and reliability of systems.
-
16. Structured output from LLMs
Structured output from LLMs refers to constraining a language model's responses to a defined schema. This can be achieved either by instructing a general-purpose model to respond in a particular format or by fine-tuning a model so it outputs structured data such as JSON "natively". OpenAI now supports structured output, allowing developers to supply a JSON Schema, pydantic or Zod object to constrain model responses. This capability is especially valuable for function calling, API interactions and external integrations, where format accuracy and consistency are critical. Structured output not only improves how LLMs interact with code but also supports broader use cases, such as generating markup for rendering charts. In addition, structured output has been shown to reduce hallucinations in model output.
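A brief sketch of the pydantic variant with the OpenAI SDK, as documented at the time of writing; the schema and prompt are illustrative.

```python
from openai import OpenAI  # pip install openai
from pydantic import BaseModel

class Invoice(BaseModel):
    vendor: str
    total: float
    currency: str

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # a model version with structured output support
    messages=[{"role": "user", "content": "Extract: ACME Corp charged $42.50."}],
    response_format=Invoice,  # the response is constrained to this schema
)
print(completion.choices[0].message.parsed)  # Invoice(vendor='ACME Corp', ...)
```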
Hold
-
17. AI-accelerated shadow IT
AI is lowering the barrier for non-programmers to build and integrate software on their own, instead of waiting for the IT department to get around to their requirements. While we're excited about the potential this unlocks, we're also seeing the first signs of AI-accelerated shadow IT. No-code workflow automation platforms now support integration with AI APIs (such as OpenAI or Anthropic), which tempts users to employ AI as duct tape, cobbling together integrations that were previously impractical, for example, turning chat messages into ERP API calls via AI. Meanwhile, AI coding assistants are becoming increasingly agentic, enabling people with only basic training to build internal utility applications.
These signs have all the hallmarks of the spreadsheet sprawl of years past: quick fixes for critical business processes that accumulate into large-scale technical debt over the long run. Left unchecked, this new form of shadow IT will lead to a proliferation of ungoverned applications, heightened security risks and data scattered across disparate systems. We advise organizations to stay alert to these risks and to weigh carefully the trade-off between rapid problem-solving and long-term technical stability.
-
18. Complacency with AI-generated code
As AI coding assistants spread, a growing body of data and research highlights the problems of complacency with AI-generated code. GitClear's latest code quality research shows that in 2024 duplicated code and code churn increased even more than predicted, while refactoring activity in commit histories declined. Also reflecting AI complacency, Microsoft research shows that AI-driven confidence often comes at the expense of critical thinking, a pattern that becomes especially pronounced with prolonged use of coding assistants. The rise of supervised software engineering agents further amplifies the risk: as AI-generated change sets grow larger, developers face greater challenges in reviewing the results. The emergence of vibe coding, in which developers let AI generate code with minimal review, further illustrates the growing trust in AI-generated output. While this approach may be appropriate for prototypes or other throwaway code, we strongly caution against using it for production code.
-
19. Local coding assistants
Out of concern for code confidentiality, many organizations remain cautious about third-party AI coding assistants. As a result, many developers are considering local coding assistants: AI tools that run entirely on local machines without sending code to external servers. However, local assistants still lag behind cloud-based assistants that rely on larger, more capable models. Even on high-end developer machines, smaller models remain limited in capability. We've found that they struggle with complex prompts, lack the context window needed for larger problems and often cannot trigger tool integrations or function calls. These capabilities are especially important for agentic workflows, the current frontier of coding assistance.
So while we recommend keeping expectations low for local assistants, some capabilities do work locally. Several popular IDEs now embed smaller models into their core features, such as Xcode's predictive code completion and JetBrains' full-line code completion. And locally runnable LLMs such as Qwen Coder are a meaningful step toward local inline suggestions and handling simple coding queries. You can test these capabilities with Continue, which supports integrating local models via runtimes such as Ollama.
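For a taste of what simple local queries look like, here's a sketch using the Ollama Python library; the model tag is illustrative and must be pulled first (for example, `ollama pull qwen2.5-coder`).

```python
import ollama  # pip install ollama; assumes the Ollama runtime is running locally

# A simple coding query handled entirely on the local machine;
# no code or prompt leaves the device.
response = ollama.chat(
    model="qwen2.5-coder",
    messages=[{"role": "user", "content": "Write a Python function that slugifies a string."}],
)
print(response["message"]["content"])
```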
-
20. Replacing pair programming with AI
Whenever people talk about coding assistants, the topic of pair programming inevitably comes up. Our profession has a love-hate relationship with it: some swear by it, others can't stand it. Coding assistants now raise the same question in a new form: can a programmer pair with the AI instead of another human and achieve the same results for the team? GitHub Copilot even calls itself "your AI pair programmer". While we do believe a coding assistant can bring some of the benefits of pairing, we advise against fully replacing pair programming with AI. Framing coding assistants as pair programmers ignores one of pairing's key benefits: it makes the team, not just the individual contributors, better. Coding assistants can help with getting unstuck, learning a new technology, onboarding newcomers or making tactical work faster so the team can focus on strategic design. But they do nothing for team-level benefits such as keeping work-in-progress low, reducing handoffs and relearning, making continuous integration possible or improving collective code ownership.
-
21. Reverse ETL
We're seeing a proliferation of so-called Reverse ETL. Regular ETL is a staple of traditional data architectures, moving data from transaction-processing systems into centralized analytics systems such as data warehouses and data lakes. While this architecture has well-documented shortcomings, some of which are mitigated by the data mesh approach, it remains common in enterprises. In such an architecture, moving data back from the central analytics system into transactional systems makes sense in certain cases, for instance, when the central system aggregates data from multiple sources or as part of a transitional architecture during a migration toward data mesh. However, we're seeing a concerning trend of product vendors using Reverse ETL as an excuse to move ever more business logic into a centralized platform, namely their product. This approach exacerbates many of the problems caused by centralized data architectures; for example, over-reliance on embedding business logic in a monolithic central data platform increases data-management complexity and weakens domain teams' control over their data. We therefore advise extreme caution when introducing data flows from a sprawling, central data platform into transaction-processing systems.
-
22. SAFe™
We continue to see widespread adoption of SAFe™ (Scaled Agile Framework®). We also continue to observe that SAFe's over-standardized, phase-gated processes create friction, that they can promote silos, and that its top-down control generates waste in the value stream, discourages engineering talent and limits autonomy and experimentation in teams. A key reason for adoption is the complexity of making an organization agile, paired with the hope that a framework like SAFe offers a simple, process-based shortcut to agility. Given SAFe's widespread use, including among our clients, we've trained over 100 Thoughtworks consultants to better support them. Despite this in-depth knowledge and many attempts, we keep coming to the conclusion that sometimes there simply is no simple solution to a complex problem, and we continue to recommend leaner, value-driven approaches and governance, combined with a comprehensive change program.
Scaled Agile Framework® and SAFe™ are trademarks of Scaled Agile, Inc.
Unable to find something you expected to see?
Each edition of the Radar features blips reflecting what we came across during the previous six months. We might have covered what you're looking for on a previous Radar already. We sometimes cull things just because there are too many to talk about. A blip might also be missing because the Radar reflects our experience; it is not based on a comprehensive market analysis.
