Volume 30 | April 2024

Techniques

Adopt

  • 1. Retrieval-augmented generation (RAG)

    Retrieval-augmented generation (RAG) is the preferred pattern for our teams to improve the quality of responses generated by a large language model (LLM). We’ve successfully used it in several projects, including the popular Jugalbandi AI Platform. With RAG, information about relevant and trustworthy documents — in formats like HTML and PDF — is stored in databases that support a vector data type or efficient document search, such as pgvector, Qdrant or Elasticsearch Relevance Engine. For a given prompt, the database is queried to retrieve relevant documents, which are then combined with the prompt to provide richer context to the LLM. This results in higher quality output and greatly reduced hallucinations. The context window — which determines the maximum size of the LLM input — is limited, which means that selecting the most relevant documents is crucial. We improve the relevancy of the content that is added to the prompt by reranking. Similarly, the documents are usually too large to calculate an embedding, which means they must be split into smaller chunks. This is often a difficult problem, and one approach is to have the chunks overlap to a certain extent.
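
    As a rough illustration, the sketch below shows the retrieve-then-generate flow. The embed, search, rerank and complete helpers are hypothetical stand-ins for your embedding model, vector store (for example, pgvector or Qdrant), reranker and LLM client; they are not part of any specific library.

        # Minimal RAG sketch (illustrative only): retrieve relevant chunks for a
        # question, rerank them, and build an enriched prompt for the LLM.
        # `embed`, `search`, `rerank` and `complete` are hypothetical stand-ins
        # for your embedding model, vector store, reranker and LLM client.

        def answer(question, embed, search, rerank, complete, top_k=20, keep=5):
            query_vector = embed(question)                      # embed the user question
            candidates = search(query_vector, limit=top_k)      # nearest-neighbour lookup
            best_chunks = rerank(question, candidates)[:keep]   # keep the most relevant chunks
            context = "\n\n".join(chunk["text"] for chunk in best_chunks)
            prompt = (
                "Answer the question using only the context below.\n\n"
                f"Context:\n{context}\n\nQuestion: {question}"
            )
            return complete(prompt)                             # generate the final answer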

Trial

  • 2. Automatically generate Backstage entity descriptors

    Backstage from Spotify has been widely adopted across our client base as the preferred platform to host developer experience portals. Backstage, on its own, is just a shell that hosts plugins and provides an interface to manage the catalog of assets that make up a platform ecosystem. Any entity to be displayed or managed by Backstage is configured in the catalog-info file, which contains data such as status, lifecycle, dependencies and APIs among other details. By default, individual entity descriptors are written by hand and usually maintained and versioned by the team responsible for the component in question. Keeping the descriptors up to date can be tedious and create a barrier to developer adoption. Also, there is always the possibility that changes are overlooked or that some components are missed entirely. We've found it more efficient and less error-prone to automatically generate Backstage entity descriptors. Most organizations have existing sources of information that can jump-start the process of populating catalog entries. Good development practices, for example, putting appropriate tags on AWS resources or adding metadata to source files, can simplify entity discovery and descriptor generation. These automated processes can then be run on a regular basis — once a day, for example — to keep the catalog fresh and up to date.
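
    As a simple illustration of such a generator, the sketch below turns discovered metadata into a catalog-info entity descriptor. The discovered record and its field values are hypothetical; in practice they would come from cloud resource tags, repository metadata or a CMDB, and the script assumes PyYAML is available.

        # Illustrative sketch: generate a Backstage Component descriptor from
        # discovered metadata. The `discovered` record is a made-up example.
        import yaml  # PyYAML

        discovered = {
            "name": "payment-service",
            "owner": "team-payments",
            "lifecycle": "production",
            "depends_on": ["resource:payments-db"],
        }

        descriptor = {
            "apiVersion": "backstage.io/v1alpha1",
            "kind": "Component",
            "metadata": {"name": discovered["name"]},
            "spec": {
                "type": "service",
                "owner": discovered["owner"],
                "lifecycle": discovered["lifecycle"],
                "dependsOn": discovered["depends_on"],
            },
        }

        with open("catalog-info.yaml", "w") as f:
            yaml.safe_dump(descriptor, f, sort_keys=False)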

  • 3. Combining traditional NLP with LLMs

    Large language models (LLMs) are the Swiss Army knives of natural language processing (NLP). But they’re also quite expensive and not always the best tool for the job — sometimes it's more effective to use a proper corkscrew. Indeed, there’s a lot of potential in combining traditional NLP with LLMs: building multiple NLP approaches in conjunction with LLMs to implement use cases and reserving the LLMs for the steps where you actually need their capabilities. Traditional data science and NLP approaches for document clustering, topic identification, classification and even summarization are cheaper and can be more effective for solving a part of your use case problem. We then use LLMs when we need to generate and summarize longer texts, or combine multiple large documents, to take advantage of the LLM's superior attention span and memory. For example, we’ve successfully used this combination of techniques to generate a comprehensive trends report for a domain from a large corpus of individual trend documents, using traditional clustering alongside the generative power of LLMs.
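
    A minimal sketch of such a pipeline: cheap, traditional clustering groups the documents first, and the LLM is only invoked once per cluster for the generative summarization step. The summarize_with_llm helper is a hypothetical stand-in for your LLM client.

        # Illustrative pipeline: cluster documents with traditional NLP, then call
        # an LLM only for the generative step. `summarize_with_llm` is a
        # hypothetical stand-in for your LLM client.
        from sklearn.cluster import KMeans
        from sklearn.feature_extraction.text import TfidfVectorizer

        def summarize_corpus(documents, summarize_with_llm, n_clusters=5):
            vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)
            labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(vectors)
            summaries = []
            for cluster in range(n_clusters):
                cluster_docs = [d for d, label in zip(documents, labels) if label == cluster]
                summaries.append(summarize_with_llm("\n\n".join(cluster_docs)))
            return summaries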

  • 4. Continuous compliance

    Continuous compliance is the practice of ensuring that software development processes and technologies comply with industry regulations and security standards on an ongoing basis, by heavily leveraging automation. Manually checking for security vulnerabilities and adhering to regulations can slow down development and introduce errors. As an alternative, organizations can automate compliance checks and audits. They can integrate tools into software development pipelines, allowing teams to detect and address compliance issues early in the development process. Codifying compliance rules and best practices helps enforce policies and standards consistently across teams. It enables you to scan code changes for vulnerabilities, enforce coding standards and track infrastructure configuration changes to ensure they meet compliance requirements. Lastly, automated reporting of the above simplifies audits and provides clear evidence of compliance. We’ve already talked about techniques like publishing SBOMs and applying the recommendations from SLSA — they can be very good starting points. The benefits of this technique are multifold. First, automation leads to more secure software by identifying and mitigating vulnerabilities early and, second, development cycles accelerate as manual tasks are eliminated. Reduced costs and enhanced consistency are additional perks. For safety-critical industries like software-driven vehicles, automated continuous compliance can improve the efficiency and reliability of the certification process, ultimately leading to safer and more reliable vehicles on the road.
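
    As one small example of codifying a compliance rule, the sketch below could run in a pipeline against a Terraform plan exported as JSON and fail the build when an S3 bucket is created without a data-classification tag. The rule itself is an illustrative assumption standing in for your organization's policies, not a complete policy engine.

        # Illustrative compliance-as-code check for a CI pipeline: fail the build
        # if a Terraform plan creates an S3 bucket without a data_classification
        # tag. Generate the input with: terraform show -json plan.out > plan.json
        import json
        import sys

        with open("plan.json") as f:
            plan = json.load(f)

        violations = []
        for rc in plan.get("resource_changes", []):
            change = rc.get("change", {})
            if rc.get("type") == "aws_s3_bucket" and "create" in change.get("actions", []):
                tags = (change.get("after") or {}).get("tags") or {}
                if "data_classification" not in tags:
                    violations.append(rc["address"])

        if violations:
            print("Buckets missing a data_classification tag:", ", ".join(violations))
            sys.exit(1)  # fail the pipeline so the issue is caught early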

  • 5. Edge functions

    Although not a new concept, we've noticed the growing availability and use of decentralized code execution via content delivery networks (CDNs). Services such as Cloudflare Workers or Amazon CloudFront Edge Functions provide a mechanism to execute snippets of serverless code close to the client's geographic location. Edge functions not only offer lower latency if a response can be generated at the edge, they also present an opportunity to rewrite requests and responses in a location-specific way on their way to and from the regional server. For example, you might rewrite a request URL to route to a specific server that has local data relevant to a field found in the request body. This approach is best suited to short, fast-running stateless processes since the computational power at the edge is limited.
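
    A minimal sketch of that idea, written as a Python handler in the style of a Lambda@Edge origin-request trigger: it rewrites the request path based on the viewer's country. It assumes CloudFront has been configured to forward the viewer-country header, and the routing rule itself is a made-up example.

        # Illustrative edge function: rewrite the request path based on the
        # viewer's country so it is served from a region-specific backend path.
        # Assumes the CloudFront-Viewer-Country header is forwarded to the function.
        EU_COUNTRIES = {"DE", "FR", "ES", "IT", "NL"}

        def handler(event, context):
            request = event["Records"][0]["cf"]["request"]
            headers = request.get("headers", {})
            country = headers.get("cloudfront-viewer-country", [{}])[0].get("value", "")
            if country in EU_COUNTRIES:
                request["uri"] = "/eu" + request["uri"]  # route to EU-local content
            return request  # hand the (possibly rewritten) request back to the CDN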

  • 6. Security champions

    Security champions are team members who think critically about security repercussions of both technical and nontechnical delivery decisions. They raise these questions and concerns with team leadership and have a firm understanding of basic security guidelines and requirements. They help development teams approach all activities during software delivery with a security mindset, thus reducing the overall security risks for the systems they develop. A security champion is not a separate position but a responsibility assigned to an existing member of the team who is guided by appropriate training from security practitioners. Equipped with this training, security champions improve the security awareness of the team by spreading knowledge and acting as a bridge between the development and security teams. One great example of an activity security champions can help drive within the team is threat modeling, which helps teams think about security risks from the start. Appointing and training a security champion on a team is a great first step, but relying solely on champions without proper commitment from leaders can lead to problems. Building a security mindset, in our experience, requires commitment from the entire team and managers.

  • 7. Text to SQL

    Text to SQL is a technique that converts natural language queries into SQL queries that can be executed by a database. Although large language models (LLMs) can understand and transform natural language, creating accurate SQL for your own schema can be challenging. Enter Vanna, an open-source Python retrieval-augmented generation (RAG) framework for SQL generation. Vanna works in two steps: first you create embeddings with the data definition language statements (DDLs) and sample SQL queries for your schema, and then you ask questions in natural language. Although Vanna can work with any LLM, we encourage you to assess NSQL, a domain-specific LLM for text-to-SQL tasks.
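
    The sketch below follows Vanna's documented train-then-ask flow. Treat the import path, constructor arguments and DDL as assumptions to verify against the current Vanna documentation for your chosen LLM and vector store.

        # Sketch of Vanna's two-step flow; import path, constructor arguments and
        # the example schema are assumptions to check against the Vanna docs.
        from vanna.remote import VannaDefault

        vn = VannaDefault(model="my-model", api_key="my-vanna-api-key")

        # Step 1: train on your schema (DDL) and representative example queries.
        vn.train(ddl="CREATE TABLE orders (id INT, customer_id INT, total DECIMAL(10, 2))")
        vn.train(sql="SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id")

        # Step 2: ask questions in natural language; Vanna retrieves the relevant
        # training material and prompts the LLM to generate SQL.
        sql = vn.generate_sql("Who are our top ten customers by total spend?")
        print(sql)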

  • 8. Tracking health over debt

    We continue to see teams improve their ecosystems by treating the health rating the same as other service-level objectives (SLOs) and prioritizing enhancements accordingly, instead of focusing solely on tracking technical debt. By allocating resources efficiently to address the most impactful issues related to health, teams and organizations can reduce long-term maintenance costs and evolve products more efficiently. This approach also enhances communication between technical and nontechnical stakeholders, fostering a common understanding of the system's state. Although metrics may vary among organizations (see this blog post for examples), they ultimately contribute to long-term sustainability and ensure software remains adaptable and competitive. In a rapidly changing digital landscape, focusing on tracking the health rather than the debt of systems provides a structured and evidence-based strategy to maintain and enhance them.

Assess

  • 9. AI team assistants

    AI coding assistance tools like GitHub Copilot are currently mostly talked about in the context of assisting and enhancing an individual's work. However, software delivery is and will remain teamwork, so you should be looking for ways to create AI team assistants to help create the "10x team," as opposed to a bunch of siloed AI-assisted 10x engineers. We've started using a team assistance approach that can increase knowledge amplification, upskilling and alignment through a combination of prompts and knowledge sources. Standardized prompts facilitate the use of agreed-upon best practices in the team context, such as techniques and templates for user story writing or the implementation of practices like threat modeling. In addition to prompts, knowledge sources made available through retrieval-augmented generation provide contextually relevant information from organizational guidelines or industry-specific knowledge bases. This approach gives team members access to the knowledge and resources they need just in time.
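
    A minimal sketch of the idea: a standardized team prompt that encodes an agreed story-writing template, combined with guidance retrieved from organizational knowledge sources. The retrieve_guidelines and complete helpers are hypothetical stand-ins for your RAG retrieval step and LLM client.

        # Illustrative team-assistant prompt: a shared, standardized template plus
        # retrieved organizational guidance. `retrieve_guidelines` and `complete`
        # are hypothetical stand-ins for your RAG retrieval step and LLM client.
        STORY_TEMPLATE = """You are a team assistant. Draft a user story in our agreed
        format: title, narrative ("As a ... I want ... so that ..."), acceptance
        criteria (Given/When/Then) and open questions.

        Team guidelines:
        {guidelines}

        Feature idea:
        {idea}
        """

        def draft_story(idea, retrieve_guidelines, complete):
            guidelines = retrieve_guidelines(idea)  # RAG lookup in the org knowledge base
            return complete(STORY_TEMPLATE.format(guidelines=guidelines, idea=idea))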

  • 10. Graph analysis for LLM-backed chats

    Chatbots backed by large language models (LLMs) are gaining a lot of popularity right now, and we’re seeing emerging techniques around productionizing and productizing them. One such productization challenge is understanding how users are conversing with a chatbot that is driven by something as generic as an LLM, where the conversation can go in many directions. Understanding the reality of conversation flows is crucial to improving the product and increasing conversion rates. One technique to tackle this problem is to use graph analysis for LLM-backed chats. The agents that support a chat with a specific desired outcome — such as a shopping action or a successful resolution of a customer's problem — can usually be represented as a desired state machine. By loading all conversations into a graph, you can analyze actual patterns and look for discrepancies from the expected state machine. This helps find bugs and opportunities for product improvement.
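
    As a small sketch of this kind of analysis, the code below loads observed conversation transitions into a directed graph with networkx and reports any transitions that don't appear in the expected state machine. The states, transitions and example conversations are hypothetical.

        # Illustrative analysis: build a graph of observed chat-state transitions
        # and flag any that are missing from the expected state machine.
        # The states and conversations are made-up examples.
        import networkx as nx

        expected = nx.DiGraph([
            ("greeting", "identify_need"),
            ("identify_need", "recommend_product"),
            ("recommend_product", "checkout"),
        ])

        observed = nx.DiGraph()
        conversations = [
            ["greeting", "identify_need", "recommend_product", "checkout"],
            ["greeting", "recommend_product", "abandon"],  # unexpected shortcut and drop-off
        ]
        for convo in conversations:
            observed.add_edges_from(zip(convo, convo[1:]))

        unexpected = set(observed.edges()) - set(expected.edges())
        print("Transitions not in the expected flow:", unexpected)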

  • 11. LLM-backed ChatOps

    LLM-backed ChatOps is an emerging application of large language models through a chat platform (primarily Slack) that allows engineers to build, deploy and operate software via natural language. This has the potential to streamline engineering workflows by enhancing the discoverability and user-friendliness of platform services. At the time of writing, two early examples are PromptOps and Kubiya. However, considering the finesse needed for production environments, organizations should thoroughly evaluate these tools before allowing them anywhere near production.

  • 12. LLM-powered autonomous agents

    LLM-powered autonomous agents are evolving beyond single agents and static multi-agent systems with the emergence of frameworks like Autogen and CrewAI. These frameworks allow users to define agents with specific roles, assign tasks and enable agents to collaborate on completing those tasks through delegation or conversation. Similar to single-agent systems that emerged earlier, such as AutoGPT, individual agents can break down tasks, utilize preconfigured tools and request human input. Although this area is still in its early stages, it is developing rapidly and holds exciting potential for exploration.
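
    To give a flavour of what these frameworks look like, the sketch below defines two collaborating, role-based agents in the style of CrewAI. Treat the class names and parameters as assumptions to check against the current CrewAI documentation, since the API is evolving quickly.

        # Sketch in the style of CrewAI: two role-based agents collaborating on
        # tasks. Class names and parameters are assumptions to verify against the
        # current CrewAI docs; the framework changes frequently.
        from crewai import Agent, Task, Crew

        researcher = Agent(
            role="Researcher",
            goal="Collect recent material on a given topic",
            backstory="Thorough analyst who cites sources",
        )
        writer = Agent(
            role="Writer",
            goal="Turn research notes into a short briefing",
            backstory="Concise technical writer",
        )

        research = Task(
            description="Gather key points about edge computing trends",
            expected_output="A bullet list of findings",
            agent=researcher,
        )
        briefing = Task(
            description="Write a one-page briefing from the research findings",
            expected_output="A one-page briefing document",
            agent=writer,
        )

        result = Crew(agents=[researcher, writer], tasks=[research, briefing]).kickoff()
        print(result)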

  • 13. Using GenAI to understand legacy codebases

    Generative AI (GenAI) and large language models (LLMs) can help developers both write and understand code. In practical application, this is so far mostly limited to smaller code snippets, but more products and technology developments are emerging for using GenAI to understand legacy codebases. This is particularly useful in the case of legacy codebases that aren’t well-documented or where the documentation is outdated or misleading. For example, tools like Driver AI and bloop use RAG approaches that combine language intelligence and code search with LLMs to help users find their way around a codebase. Emerging models with larger and larger context windows will also help to make these techniques more viable for sizable codebases. Another promising application of GenAI for legacy code is in the space of mainframe modernization, where bottlenecks often form around reverse engineers who need to understand the existing codebase and turn that understanding into requirements for the modernization project. Using GenAI to assist those reverse engineers can help them get their work done faster.

  • 14. VISS

    Zoom recently open-sourced its Vulnerability Impact Scoring System, or VISS. This system is mainly focused on vulnerability scoring that prioritizes actual demonstrated security measures. VISS differs from the Common Vulnerability Scoring System (CVSS) by not focusing on worst-case scenarios and attempting to more objectively measure the impact of vulnerabilities from a defender's perspective. To this aim, VISS provides a web-based UI to calculate the vulnerability score based on several parameters — categorized into platform, infrastructure and data groups — including the impact on the platform, the number of tenants impacted, data impact and more. Although we don't have too much practical experience with this specific tool yet, we think this kind of priority-tailored assessment approach based on industry and context is worth practicing.

Hold

  • 15. Broad integration tests

    While we applaud a focus on automated testing, we continue to see numerous organizations over-invested in what we believe to be ineffective broad integration tests. As the term "integration test" is ambiguous, we've taken the broad classification from Martin Fowler's bliki entry on the subject which indicates a test that requires live versions of all run-time dependencies. Such a test is obviously expensive, because it requires a full-featured test environment with all the necessary infrastructure, data and services. Managing the right versions of all those dependencies requires significant coordination overhead, which tends to slow down release cycles. Finally, the tests themselves are often fragile and unhelpful. For example, it takes effort to determine if a test failed because of the new code, mismatched version dependencies or the environment, and the error message rarely helps pinpoint the source of the error. Those criticisms don't mean that we take issue with automated "black box" integration testing in general, but we find a more helpful approach is one that balances the need for confidence with release frequency. This can be done in two stages by first validating the behavior of the system under test assuming a certain set of responses from run-time dependencies, and then validating those assumptions. The first stage uses service virtualization to create test doubles of run-time dependencies and validates the behavior of the system under test. This simplifies test data management concerns and allows for deterministic tests. The second stage uses contract tests to validate those environmental assumptions with real dependencies.
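
    A minimal sketch of the two stages, using Python's unittest.mock for the service-virtualization stage and a simple response-shape check as a stand-in for a contract test. OrderService, the pricing client and the expected response shape are hypothetical examples, not a particular contract-testing tool.

        # Illustrative two-stage approach. `OrderService`, `real_pricing_client`
        # and the response shape are hypothetical examples.
        from unittest.mock import Mock

        # Stage 1 (service virtualization): validate the system's behaviour against
        # a test double that encodes our assumption about the pricing dependency.
        def test_order_total_applies_discount():
            pricing = Mock()
            pricing.get_price.return_value = {"amount": 100, "currency": "GBP"}
            service = OrderService(pricing_client=pricing)  # hypothetical system under test
            assert service.total("SKU-1", quantity=2, discount=0.1) == 180

        # Stage 2 (contract test): validate that assumption against the real
        # dependency, typically in a separate, less frequent pipeline stage.
        def test_pricing_contract():
            response = real_pricing_client().get_price("SKU-1")  # hypothetical client factory
            assert set(response) >= {"amount", "currency"}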

  • 16. Overenthusiastic LLM use

    In the rush to leverage the latest in AI, many organizations are quickly adopting large language models (LLMs) for a variety of applications, from content generation to complex decision-making processes. The allure of LLMs is undeniable; they offer a seemingly effortless solution to complex problems, and developers can often create such a solution quickly and without needing years of deep machine learning experience. It can be tempting to roll out an LLM-based solution as soon as it’s more or less working and then move on. Although these LLM-based proofs of value are useful, we advise teams to look carefully at what the technology is being used for and to consider whether an LLM is actually the right end-stage solution. Many problems that an LLM can solve — such as sentiment analysis or content classification — can be solved more cheaply and easily using traditional natural language processing (NLP). Analyzing what the LLM is doing and then analyzing other potential solutions not only mitigates the risks associated with overenthusiastic LLM use but also promotes a more nuanced understanding and application of AI technologies.
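
    For instance, a sentiment-analysis requirement that might tempt a team into an LLM call per record can often be handled by a classic lexicon-based analyzer such as NLTK's VADER, which runs locally and costs nothing per call, as sketched below.

        # Classic NLP alternative to an LLM for sentiment analysis: NLTK's
        # lexicon-based VADER analyzer.
        import nltk
        from nltk.sentiment import SentimentIntensityAnalyzer

        nltk.download("vader_lexicon")  # one-off download of the sentiment lexicon
        analyzer = SentimentIntensityAnalyzer()

        for review in ["Great product, arrived early!", "Terrible support, never again."]:
            scores = analyzer.polarity_scores(review)
            label = "positive" if scores["compound"] > 0 else "negative"
            print(label, scores["compound"], review)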

  • 17. Rush to fine-tune LLMs

    As organizations are looking for ways to make large language models (LLMs) work in the context of their product, domain or organizational knowledge, we're seeing a rush to fine-tune LLMs. While fine-tuning an LLM can be a powerful tool to gain more task-specificity for a use case, in many cases it’s not needed. One of the most common cases of a misguided rush to fine-tuning is about making an LLM-backed application aware of specific knowledge and facts or an organization's codebases. In the vast majority of these cases, using a form of retrieval-augmented generation (RAG) offers a better solution and a better cost-benefit ratio. Fine-tuning requires considerable computational resources and expertise and introduces even more challenges around sensitive and proprietary data than RAG. There is also a risk of underfitting, when you don't have enough data available for fine-tuning, or, less frequently, overfitting, when you have too much data and are therefore not hitting the right balance of task specificity that you need. Look closely at these trade-offs and consider the alternatives before you rush to fine-tune an LLM for your use case.

  • 18. Web components for SSR web apps

    With the adoption of frameworks like Next.js and htmx, we’re seeing more usage of server-side rendering (SSR). Because web components are a browser technology, it's not trivial to use them on the server. Frameworks have sprung up to make this easier, sometimes even using a browser engine, but the complexity is still there. Our developers find themselves needing workarounds and extra effort to order front-end components and server-side components. Worse than the developer experience is the user experience: page load performance is impacted when custom web components have to be loaded and hydrated in the browser, and even with pre-rendering and careful tweaking of the component, a "flash of unstyled content" or some layout shifting is all but unavoidable. As mentioned in the previous Radar, one of our teams had to move their design system away from the web components-based Stencil because of these issues. Recently, we received reports from another team that they ended up replacing server-side–generated components with browser-side components because of the development complexity. We caution against the use of web components for SSR web apps, even if supported by frameworks.

Unable to find something you expected to see?

 

Each edition of the Radar features blips reflecting what we came across during the previous six months. We might have covered what you are looking for on a previous Radar already. We sometimes cull things just because there are too many to talk about. A blip might also be missing because the Radar reflects our experience; it is not based on a comprehensive market analysis.
