
Techniques

Adopt

  • To measure software delivery performance, more and more organizations are defaulting to the four key metrics as defined by the DORA research program: change lead time, deployment frequency, mean time to restore (MTTR) and change fail percentage. This research and its statistical analysis have shown a clear link between high delivery performance and these metrics; they provide a great leading indicator for how a delivery organization as a whole is doing.

    We're still big proponents of these metrics, but we've also learned some lessons. We continue to observe misguided approaches with tools that help teams measure these metrics based purely on their continuous delivery (CD) pipelines. In particular, when it comes to the stability metrics (MTTR and change fail percentage), CD pipeline data alone doesn't provide enough information to determine what constitutes a deployment failure with real user impact. Stability metrics only make sense if they include data about real incidents that degrade service for the users; the sketch at the end of this blip illustrates the difference.

    We recommend always keeping in mind the ultimate intention behind a measurement and using it to reflect and learn. For example, before spending weeks building sophisticated dashboard tooling, consider regularly taking the DORA quick check in team retrospectives. This gives the team the opportunity to reflect on which capabilities they could work on to improve their metrics, which can be much more effective than overly detailed out-of-the-box tooling. Keep in mind that these four key metrics originated from organization-level research into high-performing teams; the use of these metrics at a team level should be a way for teams to reflect on their own behaviors, not just another set of metrics to add to the dashboard.
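    As a minimal sketch of that difference (our own illustration, not an official DORA tool; the record shapes are assumptions for the example), the stability metrics become meaningful once deployments are joined with real, user-impacting incident records rather than pipeline failures alone:

    ```python
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class Deployment:
        id: str
        deployed_at: datetime

    @dataclass
    class Incident:
        # A real, user-impacting incident from the incident-management
        # system -- not just a red CD pipeline run.
        caused_by_deployment: str  # id of the deployment that triggered it
        started_at: datetime
        restored_at: datetime

    # Both functions assume non-empty inputs.
    def change_fail_percentage(deployments: list[Deployment],
                               incidents: list[Incident]) -> float:
        """Share of deployments that caused a user-impacting incident."""
        failed = {i.caused_by_deployment for i in incidents}
        return 100 * sum(d.id in failed for d in deployments) / len(deployments)

    def mean_time_to_restore_hours(incidents: list[Incident]) -> float:
        """Average hours from service degradation to restoration."""
        seconds = sum((i.restored_at - i.started_at).total_seconds() for i in incidents)
        return seconds / len(incidents) / 3600
    ```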

  • A single team remote wall is a simple technique to reintroduce the team wall virtually. We recommend that distributed teams adopt this approach; one of the things we hear from teams that moved to remote working is that they miss having the physical team wall. This was a single place where all the various story cards, tasks, status and progress could be displayed, acting as an information radiator and hub for the team. The wall acted as an integration point, with the actual data stored in different systems. As teams have become remote, they've had to revert to looking into the individual source systems, and getting an "at a glance" view of a project has become very difficult. While there might be some overhead in keeping the remote wall up to date, we feel the benefits to the team are worth it. For some teams, updating the physical wall formed part of the daily "ceremonies" the team did together, and the same can be done with a remote wall.

Trial

  • Data mesh is a decentralized organizational and technical approach to sharing, accessing and managing data for analytics and ML. Its objective is to create a sociotechnical approach that scales out getting value from data as the organization's complexity grows, the use cases for data proliferate and the sources of data diversify. Essentially, it creates a responsible data-sharing model that is in step with organizational growth and continuous change. In our experience, interest in the application of data mesh has grown tremendously. The approach has inspired many organizations to embrace its adoption and technology providers to repurpose their existing technologies for a mesh deployment. Despite the great interest and growing experience in data mesh, its implementations face a high cost of integration. Moreover, its adoption remains limited to sections of larger organizations, and technology vendors are distracting organizations from the hard socio aspects of data mesh: decentralized data ownership and a federated governance operating model.

    These ideas are explored in Data Mesh: Delivering Data-Driven Value at Scale, which guides practitioners, architects, technical leaders and decision makers on their journey from traditional big data architecture to data mesh. It provides a complete introduction to data mesh principles and its constituents; it covers how to design a data mesh architecture, guide and execute a data mesh strategy and navigate organizational design toward a decentralized data ownership model. The goal of the book is to create a new framework for deeper conversations and lead data mesh to its next phase of maturity.

  • In an organization that practices the "you build it, you run it" principle, a definition of production readiness (DPR) is a useful technique to support teams in assessing and preparing the operational readiness of new services. Implemented as a checklist or a template, a DPR gives teams guidance on what to think about and consider before they bring a new service into production. While DPRs don't define specific service-level objectives (SLOs) to fulfill (those would be hard to define one-size-fits-all), they remind teams what categories of SLOs to think about, what organizational standards to comply with and what documentation is required. DPRs provide input that teams turn into product-specific requirements (around observability and reliability, for example) to feed into their product backlogs.

    DPRs are closely related to Google's concept of a production readiness review (PRR). In organizations that are too small to have a dedicated site reliability engineering team, or that are concerned a review board process could negatively impact a team's flow to go live, having a DPR can at least provide some guidance and document the agreed-upon criteria for the organization. For highly critical new services, extra scrutiny on fulfilling the DPR can be added via a PRR when needed.

  • Writing good documentation is an overlooked aspect of software development that is often left to the last minute and done in a haphazard way. Some of our teams have found documentation quadrants a handy way to ensure the right artifacts are being produced. This technique classifies artifacts along two axes: the first relates to the nature of the information, practical or theoretical; the second describes the context in which the artifact is used, studying or working. This defines four quadrants in which artifacts can be placed and understood: practical artifacts for studying are tutorials; practical artifacts for working are how-to guides; theoretical artifacts for studying are explanations; and theoretical artifacts for working are reference pages. This classification system not only ensures that critical artifacts aren't overlooked but also guides the presentation of the content. We've found this particularly useful for creating onboarding documentation that brings developers up to speed quickly when they join a new team.

  • The term standup originated from the idea of standing up during the daily sync meeting, with the goal of keeping it short. It's a common principle many teams try to abide by in their standups: keep it crisp and to the point. But we're now seeing teams challenge that principle and rethink remote standups. When colocated, there are lots of opportunities during the rest of the day to sync up with each other spontaneously, as a complement to the short standup. Remotely, some of our teams are now experimenting with a longer meeting format, similar to what the folks at Honeycomb call a “meandering team sync.”

    It's not about getting rid of a daily sync altogether; we still find that very important and valuable, especially in a remote setup. Instead, it's about extending the time blocked in everybody's calendars for the daily sync to up to an hour and using it in a way that makes some of the other team meetings obsolete and brings the team closer together. Activities can still include the well-tried walkthrough of the team board but are then extended by more detailed clarification discussions, quick decisions and time to socialize. The technique is considered successful if it reduces the overall meeting load and improves team bonding.

  • When putting together a new volume of the Radar, we're often overcome by a sense of déjà vu, and the technique of server-driven UI sparks a particularly strong case of it with the advent of frameworks that allow mobile developers to take advantage of faster change cycles while not falling foul of an app store's policies around revalidation of the mobile app itself. We've blipped about this before from the perspective of enabling mobile development to scale across teams. Server-driven UI separates the rendering into a generic container in the mobile app while the structure and data for each view are provided by the server; changes that once required a round trip to an app store can now be accomplished via simple changes to the responses the server sends (a minimal sketch follows). Note that we're not recommending this approach for all UI development; indeed, we've experienced some horrendous, overly configurable messes. But with the backing of behemoths such as Airbnb and Lyft, we suspect it's not only us at Thoughtworks getting tired of everything being done client side. Watch this space.
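    As a minimal sketch of the server half of the pattern (our own illustration; the endpoint, component types and fields are invented for the example), each view is described as data that a generic container in the app maps to native widgets:

    ```python
    from flask import Flask, jsonify

    app = Flask(__name__)

    # The mobile app ships only a generic renderer; what to render comes
    # from responses like this one, so layout changes don't require an
    # app store release.
    @app.route("/screens/home")
    def home_screen():
        return jsonify({
            "components": [
                {"type": "header", "title": "Welcome back"},
                {"type": "banner", "text": "Spring sale ends Friday"},
                {"type": "product_list", "category": "featured"},
            ]
        })

    if __name__ == "__main__":
        app.run()
    ```

    The client keeps a registry from each "type" to a native view, so new screens need no client release as long as they compose existing component types.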

  • With continued pressure to keep systems secure and no reduction in the general threat landscape, a machine-readable Software Bill of Materials (SBOM) may help teams stay on top of security problems in the libraries they rely on. The recent Log4Shell zero-day remote exploit was critical and widespread, and teams that had an SBOM ready could have scanned for the vulnerable dependency and fixed it quickly. We've now had production experience using SBOMs on projects ranging from small companies to large multinationals and even government departments, and we're convinced they provide a benefit. Tools such as Syft make it easy to use an SBOM for vulnerability detection.
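    As an illustration of the payoff, here's a minimal sketch (assuming a CycloneDX JSON SBOM, such as `syft -o cyclonedx-json` can produce) that checks whether the inventory records any log4j-core component, the library affected by Log4Shell:

    ```python
    import json

    # For simplicity this sketch reports every log4j-core component it
    # finds; a real scanner would match versions against an advisory feed.
    def find_log4j(sbom_path: str) -> list[str]:
        with open(sbom_path) as f:
            sbom = json.load(f)  # CycloneDX JSON document
        return [
            f"{c.get('name')} {c.get('version')}"
            for c in sbom.get("components", [])
            if c.get("name") == "log4j-core"
        ]

    print(find_log4j("sbom.json"))
    ```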

  • Tactical forking is a technique that can assist with restructuring or migrating from monolithic codebases to microservices. Specifically, this technique offers one possible alternative to the more common approach of fully modularizing the codebase first, which in many circumstances can take a very long time or be very challenging to achieve. With tactical forking, a team can create a new fork of the codebase and use it to address and extract one particular concern or area while deleting the unnecessary code. Use of this technique would likely be just one part of a longer-term plan for the overall monolith.

  • A system's architecture mimics its organizational structure and communication. It's not big news that we should be intentional about how teams interact — see, for instance, the Inverse Conway Maneuver. Team interaction is one of the variables that determines how fast and how easily teams can deliver value to their customers. We were happy to find a way to measure these interactions; we used the Team Topologies authors' assessment, which gives you an understanding of how easy or difficult the teams find it to build, test and maintain their services. By measuring team cognitive load, we could better advise our clients on how to change their teams' structure and evolve their interactions.

  • A transitional architecture is a useful practice used when replacing legacy systems. Much like scaffolding might be built, reconfigured and finally removed during construction or renovation of a building, you often need interim architectural steps during legacy displacement. Transitional architectures will be removed or replaced later on, but they're not just throwaway work, given the important role they play in reducing risk and allowing a difficult problem to be broken into smaller steps. They thus help you avoid the trap of defaulting to a "big bang" legacy replacement approach because you can't see a way to make smaller interim steps line up with the final architectural vision. Care is needed to make sure the architectural "scaffolding" is eventually removed, lest it just become technical debt later on.

Assess

  • How do you approach writing good code? How do you judge if you've written good code? As software developers, we're always looking for catchy rules, principles and patterns that we can use to share a language and values with each other when it comes to writing simple, easy-to-change code.

    Daniel Terhorst-North has recently made a new attempt at creating such a checklist for good code. He argues that instead of sticking to a set of rules like SOLID, using a set of properties to aim for is more generally applicable. He came up with what he calls the CUPID properties to describe what we should strive for to achieve "joyful" code: Code should be composable, follow the Unix philosophy and be predictable, idiomatic and domain based.
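    As a loose illustration of what "idiomatic and domain based" can mean in practice (our own example, not taken from Terhorst-North's writing), compare a generic helper with code that speaks the domain's language:

    ```python
    from dataclasses import dataclass

    @dataclass
    class Invoice:
        amount: float
        days_overdue: int

    # Generic: the reader must open the function to learn what it means.
    def process(items, flag):
        return [i for i in items if i.days_overdue > 0] if flag else items

    # Domain based and predictable: the names carry the business meaning,
    # and small single-purpose functions compose like Unix pipes.
    def overdue(invoices: list[Invoice]) -> list[Invoice]:
        return [inv for inv in invoices if inv.days_overdue > 0]

    def total_outstanding(invoices: list[Invoice]) -> float:
        return sum(inv.amount for inv in invoices)

    # Composable: total_outstanding(overdue(invoices))
    ```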

  • We recommend organizations assess inclusive design as a way of making sure accessibility is treated as a first-class requirement. All too often requirements around accessibility and inclusivity are ignored until just before, if not just after, the release of software. The cheapest and simplest way to accommodate these requirements, while also providing early feedback to teams, is to incorporate them fully into the development process. In the past, we've highlighted techniques that perform a "shift-left" for security and cross-functional requirements; one perspective on this technique is that it achieves the same goal for accessibility.

  • We're continuing to see increasing use of the Kubernetes Operator pattern for purposes other than managing applications deployed on the cluster. Using the Operator pattern for nonclustered resources takes advantage of custom resource definitions and the event-driven scheduling mechanism implemented in the Kubernetes control plane to manage activities that are related to yet outside of the cluster. This technique builds on the idea of Kube-managed cloud services and extends it to other activities, such as continuous deployment or reacting to changes in external repositories. One advantage of this technique over a purpose-built tool is that it opens up a wide range of tools that either come with Kubernetes or are part of the wider ecosystem. You can use commands such as diff, dry-run or apply to interact with the operator's custom resources. Kube's scheduling mechanism makes development easier by eliminating the need to orchestrate activities in the proper order. Open-source tools such as Crossplane, Flux and Argo CD take advantage of this technique, and we expect to see more of these emerge over time. Although these tools have their use cases, we're also starting to see the inevitable misuse and overuse of this technique and need to repeat some old advice: Just because you can do something with a tool doesn't mean you should. Be sure to rule out simpler, conventional approaches before creating a custom resource definition and taking on the complexity that comes with this approach.
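    As a minimal sketch of the pattern (using the open-source kopf framework for Python; the resource group, plural and fields are invented for the example), an operator can reconcile a custom resource that stands for something outside the cluster, such as a mirror of an external Git repository:

    ```python
    import kopf

    # Handlers fire when someone creates or deletes the custom resource
    # (e.g., via kubectl apply), letting Kubernetes' event-driven control
    # plane manage a resource that lives outside the cluster.

    @kopf.on.create("example.com", "v1", "externalrepos")
    def create_mirror(spec, name, logger, **kwargs):
        url = spec.get("url")
        logger.info(f"Setting up mirror of {url} for {name}")
        # ...call the external system's API here...
        return {"mirrored": True}  # recorded under the resource's status

    @kopf.on.delete("example.com", "v1", "externalrepos")
    def delete_mirror(name, logger, **kwargs):
        logger.info(f"Tearing down mirror for {name}")
        # ...clean up the external resource here...
    ```

    After a matching CustomResourceDefinition is registered, the operator runs with `kopf run handlers.py`, and the usual kubectl workflow (diff, dry-run, apply) works against the custom resources it watches.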

  • A service mesh is usually implemented as a reverse proxy, aka sidecar, deployed alongside each service instance. Although these sidecars are lightweight processes, the overall cost and operational complexity of adopting a service mesh increases with every new instance of a service requiring another sidecar. However, with the advancements in eBPF, we're observing a new approach, service mesh without sidecar, where the functionality of the mesh is safely pushed down into the OS kernel, enabling services in the same node to communicate transparently via sockets without the need for additional proxies. You can try this with Cilium service mesh and simplify the deployment from one proxy per service to one proxy per node. We're intrigued by the capabilities of eBPF and find this evolution of service mesh important to assess.

  • As software continues to grow in complexity, the threat vector of software dependencies becomes increasingly challenging to guard against. The recent Log4j vulnerability showed how difficult it can be to even know those dependencies — many companies that didn't use Log4j directly were unknowingly vulnerable simply because other software in their ecosystem relied on it. Supply-chain Levels for Software Artifacts, or SLSA (pronounced "salsa"), is a consortium-curated set of guidance for organizations to protect against supply chain attacks, evolved from internal guidance Google has been using for years. We appreciate that SLSA doesn't promise a "silver bullet," tools-only approach to securing the supply chain but instead provides a checklist of concrete threats and practices along a maturity model. The threat model is easy to follow, with real-world examples of attacks, and the requirements provide guidance to help organizations prioritize actions based on levels of increasing robustness to improve their supply chain security posture. We think SLSA provides applicable advice and look forward to more organizations learning from it.

  • The need to respond quickly to customer insights has driven increasing adoption of event-driven architectures and stream processing. Frameworks such as Spark, Flink or Kafka Streams offer a paradigm where simple event consumers and producers can cooperate in complex networks to deliver real-time insights. But this programming style takes time and effort to master, and when implemented as single-point applications, it lacks interoperability. Making stream processing work universally on a large scale can require a significant engineering investment. Now, a new crop of tools is emerging that offers the benefits of stream processing to a wider, established group of developers who are comfortable using SQL to implement analytics. Standardizing on SQL as the universal streaming language lowers the barrier to implementing streaming data applications. Tools like ksqlDB and Materialize help transform these separate applications into unified platforms. Taken together, a collection of SQL-based streaming applications across an enterprise might constitute a streaming data warehouse.
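    To give a flavor of SQL as the streaming language, here's a minimal sketch (the stream, topic and column names are invented; assumes a local ksqlDB server on its default port) that registers a continuously maintained aggregate over a Kafka topic via ksqlDB's REST API:

    ```python
    import requests

    # Each statement is ordinary-looking SQL, but the resulting table is
    # updated continuously as new events arrive on the Kafka topic.
    statements = """
        CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
          WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
        CREATE TABLE views_per_user AS
          SELECT user_id, COUNT(*) AS views
          FROM pageviews
          GROUP BY user_id
          EMIT CHANGES;
    """

    resp = requests.post(
        "http://localhost:8088/ksql",
        json={"ksql": statements, "streamsProperties": {}},
    )
    resp.raise_for_status()
    print(resp.json())
    ```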

  • Until recently, executing a machine-learning (ML) model was seen as computationally expensive and, in some cases, required special-purpose hardware. While creating the models still broadly sits within this classification, they can be created in a way that allows them to run on small, low-cost, low-power devices. This technique, called TinyML, has opened up the possibility of running ML models in situations many might assume infeasible. For example, on battery-powered devices, or in disconnected environments with limited or patchy connectivity, the model can be run locally without prohibitive cost. If you've been considering using ML but thought it unrealistic because of compute or network constraints, then this technique is worth assessing.
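    As a sketch of a typical route into TinyML (assuming an existing Keras model; the tiny architecture here is just a placeholder), TensorFlow's converter can shrink a model for on-device use:

    ```python
    import tensorflow as tf

    # Stand-in for a model you've already trained.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(1),
    ])

    # Convert to TensorFlow Lite with size/latency optimizations
    # (e.g., quantization) so the model fits small, low-power devices.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    with open("model.tflite", "wb") as f:
        f.write(tflite_model)

    # On the device itself, an interpreter such as TensorFlow Lite for
    # Microcontrollers runs the resulting file locally -- no network needed.
    ```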

Hold

  • For organizations using Azure as their primary cloud provider, Azure Data Factory is currently the default for orchestrating data-processing pipelines. It supports data ingestion, copying data from and to different storage types on prem or on Azure and executing transformation logic. Although we've had adequate experience with Azure Data Factory for simple migrations of data stores from on prem to the cloud and for moving data between systems, we discourage its use for orchestrating complex data-processing pipelines and workflows. There it still has its challenges, including poor debuggability and error reporting; limited observability, because Azure Data Factory's logging capabilities don't integrate with other products such as Azure Data Lake Storage or Databricks, making it difficult to get end-to-end observability in place; and data source-triggering mechanisms that are available only in certain regions. At this time, we encourage limiting Azure Data Factory to data copying or snapshotting and using other, more well-rounded open-source orchestration tools (e.g., Airflow) for complex data pipelines.
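    For contrast, here's a minimal sketch of the kind of explicit, inspectable pipeline definition a tool like Airflow gives you (the task names and callables are invented for the example):

    ```python
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull data from the source system")

    def transform():
        print("apply business logic")

    with DAG(
        dag_id="example_pipeline",
        start_date=datetime(2022, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        extract_task >> transform_task  # dependencies are explicit and debuggable
    ```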

  • We previously featured platform engineering product teams in Adopt as a good way for internal platform teams to operate, enabling delivery teams to self-service deploy and operate systems with reduced lead time and stack complexity. Unfortunately, we're seeing the "platform team" label applied to teams dedicated to projects that don't have clear outcomes or a well-defined set of customers. As a result, these miscellaneous platform teams, as we call them, struggle to deliver due to high cognitive loads and a lack of clearly aligned priorities, as they're dealing with a miscellaneous collection of unrelated systems. They effectively become just another general support team for things that don't fit or are unwanted elsewhere. We continue to believe that platform engineering product teams focused on a clear and well-defined (internal) product offer a better set of outcomes.

  • We continue to perceive production data in test environments as an area for concern. Firstly, many examples of this have resulted in reputational damage, for example, where an incorrect alert has been sent from a test system to an entire client population. Secondly, the level of security, specifically around protection of private data, tends to be lower for test systems. There is little point in having elaborate controls around access to production data if that data is copied to a test database that can be accessed by every developer and QA. Although you can obfuscate the data, this tends to be applied only to specific fields, for example, credit card numbers. Finally, copying production data to test systems can break privacy laws, for example, where test systems are hosted or accessed from a different country or region. This last scenario is especially problematic with complex cloud deployments. Fake data is a safer approach, and tools exist to help in its creation (see the sketch below). We do recognize there are reasons for specific elements of production data to be copied, for example, in the reproduction of bugs or for training of specific ML models. Here our advice is to proceed with caution.
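    As a small illustration of the safer alternative, here's a sketch using the open-source Faker library to generate realistic but entirely fabricated records (the field names are invented for the example):

    ```python
    from faker import Faker

    fake = Faker()
    Faker.seed(42)  # reproducible test data across runs

    def fake_customer() -> dict:
        # Realistic-looking but entirely invented -- no privacy exposure
        # and no production access controls to worry about.
        return {
            "name": fake.name(),
            "email": fake.email(),
            "address": fake.address(),
            "credit_card": fake.credit_card_number(),
        }

    test_customers = [fake_customer() for _ in range(100)]
    ```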

  • We generally avoid putting blips in Hold when we consider that advice too obvious, including blindly following an architectural style without paying attention to trade-offs. However, the sheer prevalence of teams choosing a single-page application (SPA) by default when they need a website has us concerned that people aren't even recognizing SPAs as an architectural style to begin with, instead immediately jumping into framework selection. SPAs incur complexity that simply doesn't exist with traditional server-based websites: search engine optimization, browser history management, web analytics, first page load time, etc. That complexity is often warranted for user experience reasons, and tooling continues to evolve to make those concerns easier to address (although the churn in the React community around state management hints at how hard it can be to get a generally applicable solution). Too often, though, we don't see teams making that trade-off analysis, blindly accepting the complexity of SPAs by default even when the business needs don't justify it. Indeed, we've started to notice that many newer developers aren't even aware of an alternative approach, as they've spent their entire career in a framework like React. We believe that many websites will benefit from the simplicity of server-side logic, and we're encouraged by techniques like Hotwire that help close the gap on user experience.

 

Unable to find something you expected to see?

Each edition of the Radar features blips reflecting what we came across during the previous six months. We might have covered what you're looking for on a previous Radar already. We sometimes cull things just because there are too many to talk about. A blip might also be missing because the Radar reflects our experience; it is not based on a comprehensive market analysis.

