The thorough State of DevOps reports have focused on data-driven and statistical analysis of high-performing organizations. The result of this multiyear research, published in Accelerate, demonstrates a direct link between organizational performance and software delivery performance. The researchers have determined that only four key metrics differentiate between low, medium and high performers: lead time, deployment frequency, mean time to restore (MTTR) and change fail percentage. Indeed, we've found that these four key metrics are a simple and yet powerful tool to help leaders and teams focus on measuring and improving what matters. A good place to start is to instrument the build pipelines so you can capture the four key metrics and make the software delivery value stream visible. GoCD pipelines, for example, provide the ability to measure these four key metrics as a first-class citizen of the GoCD analytics.
We've seen significant benefits from introducing microservices, which have allowed teams to scale the delivery of independently deployed and maintained services. Unfortunately, we've also seen many teams create a frontend monolith — a large, entangled browser application that sits on top of the backend services — largely neutralizing the benefits of microservices. Since we first described micro frontends as a technique to address this issue, we've had almost universally positive experiences with the approach and have found a number of patterns to use micro frontends even as more and more code shifts from the server to the web browser. So far, web components have been elusive in this field, though.
We put polyglot programming on Trial in one of our first Radars to suggest that choosing the right language for the job could significantly boost productivity, and there were new language entrants that were worthy of consideration. We want to reraise this suggestion because we're seeing a new push to standardize language stacks by both developers and enterprises. While we acknowledge that placing no restrictions on language uses can create more problems than it solves, promoting a few languages that support different ecosystems or language features is important for both enterprises to accelerate processes and go live more quickly and developers to have the right tools to solve the problem at hand.
Humans and machines use secrets throughout the value stream of building and operating software. The build pipelines need secrets to interface with secure infrastructures such as container registries, the applications use API keys as secrets to get access to business capabilities, and the service-to-service communications are secured using certificates and keys as secrets. You can set and retrieve these secrets in different ways. We've long cautioned developers about using source code management for storing secrets. We've recommended decoupling secret management from source code and using tools such as git-secrets and Talisman to avoid storing secrets in the source code. We've been using secrets as a service as a default technique for storing and accessing secrets. With this technique you can use tools such as Vault or AWS Key Management Service (KMS) to read/write secrets over an HTTPS endpoint with fine-grained levels of access control. Secrets as a service uses external identity providers such as AWS IAM to identify the actors who request access to secrets. Actors authenticate themselves with the secrets service. For this process to work, it's important to automate bootstrapping the identity of the actors, services and applications. Platforms based on SPIFFE have improved the automation of assigning identities to services.
In the last year we've seen Chaos Engineering move from a much talked-about idea to an accepted, mainstream approach to improving and assuring distributed system resilience. As organizations large and small begin to implement Chaos Engineering as an operational process, we're learning how to apply these techniques safely at scale. The approach is definitely not for everyone, and to be effective and safe, it requires organizational support at scale. Industry acceptance and available expertise will definitely increase with the appearance of commercial services such as Gremlin and deployment tools such as Spinnaker implementing some Chaos Engineering tools.
The container revolution around Docker has massively reduced the friction in moving applications between environments, fueling increased adoption of continuous delivery and continuous deployments. The latter, especially, has blown a rather large hole in the traditional controls over what can go to production. The technique of container security scanning is a necessary response to this threat vector. Tools in the build pipeline automatically check containers flowing through the pipeline against known vulnerabilities. Since our first mention of this technique, the tool landscape has matured and the technique has proven useful on development efforts with our clients.
Continuous delivery for machine learning (CD4ML) models apply continuous delivery practices to developing machine learning models so that they are always ready for production. This technique addresses two main problems of traditional machine learning model development: long cycle time between training models and deploying them to production, which often includes manually converting the model to production-ready code; and using production models that had been trained with stale data.
A continuous delivery pipeline of a machine learning model has two triggers: (1) changes to the structure of the model and (2) changes to the training and test data sets. For this to work we need to both version the data sets and the model's source code. The pipeline often includes steps such as testing the model against the test data set, applying automatic conversion of the model (if necessary) with tools such as H2O, and deploying the model to production to deliver value.
Maintaining proper control over sensitive data is difficult, especially when it's copied outside of a master system of record for backup and recovery purposes. Crypto shredding is the practice of rendering sensitive data unreadable by deliberately overwriting or deleting encryption keys used to secure that data. Considering there are systems, such as audit application or blockchain, that should not or could not delete historical records, this technique is quite useful for privacy protection and GDPR compliance.
For some time now we've recommended that delivery teams take ownership of their entire stack, including infrastructure. This means increased responsibility in the delivery team itself for configuring the infrastructure in a safe, secure and compliant way. When adopting cloud strategies, most organizations default to a tightly locked-down and centrally managed configuration to reduce risk, but this also creates substantial productivity bottlenecks. An alternative approach is to allow teams to manage their own configuration and use an infrastructure configuration scanner to ensure the configuration is safe and secure. Options include open-source scanners such as prowler for AWS and kube-bench for Kubernetes installations. For more continuous detection, take a look at cloud platforms such as AWS Config Rules among other commercial services.
Service mesh is an approach to operating a secure, fast and reliable microservices ecosystem. It has been an important stepping stone in making it easier to adopt microservices at scale. It offers discovery, security, tracing, monitoring and failure handling. It provides these cross-functional capabilities without the need for a shared asset such as an API gateway or baking libraries into each service. A typical implementation involves lightweight reverse-proxy processes, aka sidecars, deployed alongside each service process in a separate container. Sidecars intercept the inbound and outbound traffic of each service and provide cross-functional capabilities mentioned above. This approach has relieved the distributed service teams from building and updating the capabilities that the mesh offers as code in their services. This has lead to an even easier adoption of polyglot programming in a microservices ecosystem. Our teams have been successfully using this approach with open source projects such as Istio and we will continue to monitor other open service mesh implementations such as Linkerd closely.
As developers at ThoughtWorks we're acutely aware of the ethics of the work we do. As society becomes ever more reliant on technology, it's important that we consider ethics when making decisions as software development teams. Several toolkits have emerged that can help us think through some of the future implications of the software we're building. They include Tarot Cards of Tech and Ethical OS, which we've had good feedback on. Ethical OS is a thinking framework and a set of tools that drive discussions around the ethics of building software. The framework is a collaboration between the Institute for the Future and the Tech and Society Solutions Lab. It's based on a practical set of risk zones, such as addiction and the dopamine economy, plus a number of scenarios to drive conversation and discussion.
The more experience we gain with using distributed ledger technologies (DLTs), the more we encounter the rough edges around the current state of smart contracts. Committing automated, irrefutable, irreversible contracts on ledger sounds great in theory. The problems arise when you consider how to use modern software delivery techniques to developing them, as well as the differences between implementations. Immutable data is one thing, but immutable business logic is something else entirely! It's really important to think about whether to include logic in a smart contract. We've also found very different operational characteristics between different implementations. For example, even though contracts can evolve, different platforms support this evolution to a greater or lesser extent. Our advice is to think long and hard before committing business logic to a smart contract and to weigh the merits of the different platforms before you do.
Transfer learning has been quite effective within the field of computer vision, speeding the time to train a model by reusing existing models. Those of us who work in machine learning are excited that the same techniques can be applied to natural language processing (NLP) with the publication of ULMFiT and open source pretrained models and code examples. We think transfer learning for NLP will significantly reduce the effort to create systems dealing with text classification.
We're usually wary of covering diagrammatic techniques, but Wardley mapping is an interesting approach to start conversations around the evolution of an organization's software estate. At their simplest, they're used to visualize the value chains that exist within an organization, starting with customers' needs and progressively plotting the different capabilities and systems used to deliver on those needs along with the evolution of those capabilities and systems. The value of this technique is the process of collaborating to create the maps rather than the artefact itself. We recommend getting the right people in the room to produce them, and then treat them as living, evolving things rather than a complete artefact.
Jupyter Notebooks have gained in popularity among data scientists who use them for exploratory analyses, early-stage development and knowledge sharing. This rise in popularity has led to the trend of productionizing Jupyter Notebooks, by providing the tools and support to execute them at scale. Although we wouldn't want to discourage anyone from using their tools of choice, we don't recommend using Jupyter Notebooks for building scalable, maintainable and long-lived production code — they lack effective version control, error handling, modularity and extensibility among other basic capabilities required for building scalable, production-ready code. Instead, we encourage developers and data scientists to work together to find solutions that empower data scientists to build production-ready machine learning models using continuous delivery practices with the right programming frameworks. We caution against productionization of Jupyter Notebooks to overcome inefficiencies in continuous delivery pipelines for machine learning, or inadequate automated testing.
Change data capture (CDC) is a very powerful technique for pulling database changes out of a system and performing some actions on that data. One of the most popular ways of doing this is to use the database's transaction log to identify changes and then publish those changes directly onto an event bus that can be consumed by other services. This works very well for use cases such as breaking monoliths into microservices but when used for first-class integration between microservices, this leads to puncturing encapsulation and leaking the source service's data layer into the event contract. We've talked about domain scoped events and other techniques that emphasize the importance of having our events model our domain properly. We're seeing some projects use CDC for publishing row-level change events and directly consuming these events in other services. This puncturing of encapsulation with change data capture can be a slippery slope leading to fragile integrations and we would like to call this out with this blip.
We've seen organizations successfully move from very infrequent releases to a higher cadence by using the release train concept. The release train is a technique for coordinating releases across multiple teams or components that have runtime dependencies. All releases happen on a fixed and reliable schedule regardless of whether all expected features are ready (the train doesn't wait for you — if you miss it you wait for the next one). Although we wholeheartedly endorse discipline around regularly releasing and demoing working software, we've experienced serious drawbacks with the approach over the medium to long term as it reinforces temporal coupling around sequencing of changes and can degrade quality as teams rush to complete features. We prefer to focus on the architectural and organizational approaches necessary to support independent releases. Although the train can be a useful forcing function for speeding up slower teams, we've also seen it as imposing an upper limit on how quickly faster-moving teams can move. We believe that it is a technique that should be approached with a good degree of caution, if at all.
As infrastructures grow in complexity, so do the configuration files that define them. Tools such as AWS CloudFormation, Kubernetes and Helm expect configuration files in JSON or YAML syntax, presumably in an attempt to make them easy to write and process. However, in most cases, teams quickly reach the point where they have some parts that are similar but not quite the same, for example, when the same service must be deployed in different regions with a slightly different setup. For such cases tools offer templating in YAML (or JSON), which has caused a huge amount of frustration with practitioners. The problem is that the syntax of JSON and YAML requires all sorts of awkward compromises to graft templating features such as conditionals and loops into the files. We recommend using an API from a programming language instead or, when this is not an option, a templating system in a programming language, either a general-purpose language such as Python or something specialized such as Jsonnet.