Airflow remains our most widely used and favorite open-source workflow management tool for defining data-processing pipelines as directed acyclic graphs (DAGs). This is a growing space with open-source tools such as Luigi and Argo and vendor-specific tools such as Azure Data Factory or AWS Data Pipeline. However, Airflow differentiates itself with its programmatic definition of workflows over limited low-code configuration files, support for automated testing, open-source and multiplatform installation, rich set of integration points to the data ecosystem and large community support. In decentralized data architectures such as data mesh, however, Airflow currently falls short because it acts as a centralized workflow orchestrator.
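To make the idea of defining a workflow programmatically as a DAG concrete, here is a minimal sketch in plain Python. It deliberately does not use Airflow's API; the task names are hypothetical, and the ordering relies only on the standard library's `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}


def execution_order(dag):
    """Return one valid run order for the tasks.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is what makes the graph "acyclic" by construction.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)
```

Because the pipeline is ordinary code, it can be unit tested like any other module, for example by asserting that a cyclic definition is rejected.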
Bitrise, a domain-specific CD tool for mobile applications, continues to be a useful part of the mobile workflow, and teams really should be using it. Bitrise can build, test and deploy mobile applications all the way from developer laptop to app store publishing. It's easy to set up and provides a comprehensive set of prebuilt steps for most mobile development needs.
Among the available tools for keeping dependencies up to date, Dependabot is a solid default choice in our opinion. Dependabot's integration with GitHub is smooth and automatically sends you pull requests to update your dependencies to their latest versions. It can be enabled at the organization level, so it's very easy for teams to receive these pull requests. If you're not using GitHub, you can still use the Dependabot libraries within your build pipeline. If you're interested in an alternative tool, also consider Renovate, which supports a wider range of services, including GitLab, Bitbucket and Azure DevOps.
Helm is a package manager for Kubernetes. It comes with a repository of curated Kubernetes applications that are maintained in the official Charts repository. Since we last talked about Helm, Helm 3 has been released, and the most significant change is the removal of Tiller, the server-side component of Helm 2. The benefit of a design without Tiller is that you can only make changes to the Kubernetes cluster from the client side, that is, you can only modify the cluster according to the permissions you have as a user of the Helm command. We've used Helm in a number of client projects, and its dependency management, templating and hook mechanism have greatly simplified application lifecycle management in Kubernetes.
Build pipelines that create and deploy containers should include container security scanning. Our teams particularly like Trivy, a vulnerability scanner for containers. We've tried Clair and Anchore Engine among other good tools in this field. Unlike Clair, Trivy doesn’t only check containers but also dependencies in the codebase. Also, because Trivy ships as a stand-alone binary, it's easier to set up and run the scan locally. Other benefits of Trivy are that it's open-source software and that it supports distroless containers.
Implementing sustainable continuous delivery pipelines that can build and deploy production software across multiple environments requires a tool that treats build pipelines and artifacts as first-class citizens. When we first started assessing Concourse, we liked its simple and flexible model, the principle of container-based builds and the fact that it forces you to define pipelines as code. Since then, the usability has improved, and the simple model has stood the test of time. Many of our teams and clients have successfully been using Concourse for large pipeline setups over longer periods of time. We also often leverage Concourse's flexibility to run workers anywhere, for example, when hardware integration tests require a local setup.
This edition of the Radar introduces several new tools for creating web applications that help end users visualize and interact with data. These are more than simple visualization libraries such as D3. Instead, they reduce the effort necessary to build standalone analytic applications for manipulating existing data sets. Dash from Plotly is gaining popularity among data scientists for creating richly functional analytics applications in Python. Dash augments Python data libraries much like Shiny sits on top of R. These applications are sometimes referred to as dashboards, but the range of possible functionality is really much greater than the term implies. Dash is particularly suited to building scalable, production-ready applications, unlike Streamlit, another tool in this class. Consider using Dash when you need to present more sophisticated analyses to business users than a low- or no-code solution such as Tableau can provide.
Kustomize is a tool to manage and customize Kubernetes manifest files. It allows you to select and patch your base Kubernetes resources before applying them to different environments and is now natively supported by kubectl. We like it because it helps keep your code DRY. In contrast to Helm, which is trying to do many things (package management, version management and so on), we find Kustomize follows the Unix philosophy: do one thing well and expect the output of every program to be input to another.
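The base-plus-patch idea can be illustrated with a small Python sketch. This is not Kustomize's implementation or its patch semantics (real strategic merge patches treat lists specially, for instance); the manifest below is a simplified, hypothetical stand-in for a Deployment, and `apply_patch` is an illustrative helper.

```python
import copy


def apply_patch(base, patch):
    """Recursively merge `patch` into a copy of `base`.

    Nested dictionaries are merged key by key; any other value in the
    patch replaces the value in the base. The base is never mutated.
    """
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_patch(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Simplified, hypothetical base resource shared by all environments.
base = {
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {"replicas": 1, "template": {"image": "web:1.0"}},
}

# Environment-specific overlay: only the difference is written down.
production_patch = {"spec": {"replicas": 5}}

production = apply_patch(base, production_patch)
print(production["spec"])
```

The overlay stays small because everything it does not mention is inherited from the base, which is the property that keeps per-environment configuration DRY.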
MLflow is an open-source tool for machine-learning experiment tracking and lifecycle management. The workflow to develop and continuously evolve a machine-learning model includes a series of experiments (a collection of runs), tracking the performance of these experiments (a collection of metrics) and tracking and tweaking models (projects). MLflow facilitates this workflow nicely by supporting existing open standards and integrates well with many other tools in the ecosystem. MLflow as a managed service by Databricks on the cloud, available in AWS and Azure, is rapidly maturing and we've used it successfully in our projects. We find MLflow a great tool for model management and tracking, supporting both UI-based and API-based interaction models. Our only growing concern is that MLflow is attempting to deliver too many conflating concerns as a single platform, such as model serving and scoring.
Traditional testing approaches focus on evaluating whether our production code is doing what it's supposed to do. However, we can make mistakes in the testing code too, introducing incomplete or useless assertions that create a false sense of confidence. This is where mutation testing comes in; it assesses the quality of the tests themselves, finding corner cases that are hard to spot. Our teams have used Pitest for a while now, and we recommend its use in Java projects to measure the health of the test suite. In short, mutation testing introduces changes in the production code and executes the same tests a second time; if the tests are still green, it means that the tests are not good enough and need to improve. When you're using programming languages other than Java, Stryker is a good choice in this space.
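A toy example may help show what "still green after a mutation" means. The sketch below is hand-rolled Python, not Pitest or Stryker; the function names are hypothetical, and the single mutant (changing `>=` to `>`) is written out by hand, whereas a real tool would generate such mutants automatically.

```python
def is_adult(age):
    """Production code under test."""
    return age >= 18


def is_adult_mutant(age):
    """Mutant: the boundary operator >= has been changed to >."""
    return age > 18


def weak_suite(fn):
    """Test suite that only checks values far away from the boundary."""
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    """Test suite that also checks the boundary value 18."""
    return weak_suite(fn) and fn(18) is True


def mutant_killed(suite):
    """A mutant is killed when the suite passes on the original code
    but fails on the mutated code."""
    return suite(is_adult) and not suite(is_adult_mutant)


print(mutant_killed(weak_suite))    # False: the mutant survives
print(mutant_killed(strong_suite))  # True: the boundary test catches it
```

The surviving mutant points directly at the missing assertion: nothing in the weak suite pins down the behavior at exactly 18.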
Sentry is a cross-platform application monitoring tool with a focus on error reporting. Tools like Sentry distinguish themselves from traditional logging solutions such as the ELK Stack in their focus on discovering, investigating and fixing errors. Sentry has been around for a while and supports several languages and frameworks. We've used Sentry in many projects, and it has been really useful in tracking errors, finding out if a commit actually fixed an issue and alerting us if an issue resurfaces due to a regression.
Even though tooling has vastly improved in the infrastructure space, writing a shell script may make sense in some cases. Of course, the syntax of shell scripts can only be described as arcane, and as we've less practice writing shell scripts these days, we've come to like ShellCheck, a linter for shell scripts. ShellCheck can be used from the command line, as part of a build or, even better, as an extension in many popular IDEs. The wiki contains a detailed description of several hundred issues that ShellCheck can detect, and most tools and IDEs provide a way to conveniently access the respective wiki page when an issue is found.
We've used Terraform extensively to create and manage cloud infrastructure. In our experience with larger setups, where code is divided into modules that are included in different ways, teams eventually hit a wall of unavoidable repetition caused by a lack of flexibility. We've addressed this by using Terragrunt, a thin wrapper for Terraform that implements the practices advocated by Yevgeniy Brikman’s Terraform: Up and Running. We've found Terragrunt helpful because it encourages versioned modules and reusability for different environments. Lifecycle hooks are another useful feature providing additional flexibility. In terms of packaging, Terragrunt has the same limitations as Terraform: there is no proper way to define packages or dependencies between packages. As a workaround, you can use modules and specify a version associated with a Git tag.
Security is everyone's concern, and capturing risks early is always better than facing problems later on. In the infrastructure as code space, where Terraform has been an obvious choice to manage cloud environments, we now also have tfsec, a static analysis tool that scans Terraform templates to find potential security issues. Our teams have been using tfsec quite successfully. The tool is easy to set up and use, which makes it a great choice for any development team determined to mitigate security risks and prevent breaches before they happen. Its preset rules for different cloud providers, including AWS and Azure, complement the benefits that tfsec brings to the teams that use Terraform.
Yarn continues to be the package manager of choice for many teams. We're excited about Yarn 2, a major new release with a long list of changes and improvements. In addition to usability tweaks and improvements in the area of workspaces, Yarn 2 introduces the concept of zero-installs, which allows developers to run a project directly after cloning it. However, Yarn 2 includes some breaking changes, which make the upgrade nontrivial. It also defaults to plug'n'play (PnP) environments and at the same time doesn't support React Native in PnP environments. Teams can, of course, opt out of PnP or stay on Yarn 1. They should be aware, though, that Yarn 1 is now in maintenance mode.
We've included continuous delivery for machine learning as a technique in previous Radars, and in this edition we want to highlight a promising new tool called Continuous Machine Learning (or CML) from the people who made DVC. CML aims to bring the best engineering practices of CI and CD to AI and ML teams and can help to organize your MLOps infrastructure on top of a traditional software engineering stack, instead of creating separate AI platforms. We like that they've prioritized support for DVC and see this as a good sign for this burgeoning new tool.
We've long liked the idea of using static site generators to avoid complexity and improve performance, whenever the use case allows it. Although Eleventy has been around for a few years, it's recently caught our attention as it's matured and previous favorites such as Gatsby.js displayed some scalability problems. Eleventy is quick to learn and easy to build sites with. We also like the ease with which you can create semantic (and therefore more accessible) markup with its templating and its simple and robust support for pagination.
Service meshes and API gateways provide a convenient way to route traffic to a variety of microservices, all of which implement the same API interface. Flagger uses this feature to dynamically adjust the portion of traffic that is routed to a new version of a service. This is a common technique for canary releases or blue/green deployment. Flagger works in conjunction with a variety of popular proxies (including Envoy and Kong) to progressively ramp up requests to a service and report metrics on the load in order to provide fast feedback on a new release. We like that Flagger simplifies this valuable practice so that it can be more widely adopted. Although Flagger is sponsored by Weaveworks, it stands on its own with no obligation to use it in conjunction with Weaveworks' other tooling.
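The progressive ramp-up described above can be sketched as a small control loop. This is an illustration in plain Python, not Flagger's API or configuration format; the parameter names (`step`, `max_weight`, `min_success`) and their default values are assumptions chosen for the example.

```python
def run_canary(success_rates, step=10, max_weight=50, min_success=0.99):
    """Simulate a canary rollout over a series of analysis intervals.

    `success_rates` holds one observed request success rate per interval.
    Returns (outcome, history), where history is the percentage of traffic
    routed to the canary after each interval.
    """
    weight = 0
    history = []
    for rate in success_rates:
        if rate < min_success:
            # Metrics fell below the threshold: send all traffic back
            # to the stable version and stop the rollout.
            history.append(0)
            return "rolled_back", history
        # Metrics are healthy: shift another slice of traffic to the canary.
        weight = min(weight + step, max_weight)
        history.append(weight)
        if weight == max_weight:
            return "promoted", history
    return "in_progress", history


print(run_canary([1.0, 1.0, 1.0, 1.0, 1.0]))
print(run_canary([1.0, 0.95]))
```

Small steps limit how many users see a faulty release before the metrics trigger a rollback, which is where the fast feedback comes from.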
When connecting to server instances on AWS, it is recommended to go through a bastion host instead of a direct connection. However, provisioning a bastion host just for that purpose can be frustrating, which is why AWS Systems Manager's Session Manager provides tunneling to more comfortably connect to your servers. gossm is an open-source CLI tool that makes the use of the Session Manager even more convenient. gossm lets you leverage the security provided by Session Manager and IAM policies from your terminal using tools such as scp. It also has some capabilities that the AWS CLI is missing, including server discovery and SSH integration.
With the rise of CD4ML, operational aspects of data engineering and data science have received more attention. Automated data governance is one aspect of this development. Great Expectations is a framework that enables you to craft built-in controls that flag anomalies or quality issues in data pipelines. Just as unit tests run in a build pipeline, Great Expectations makes assertions during execution of a data pipeline. This is useful not only for implementing a sort of Andon for data pipelines but also for ensuring that model-based algorithms remain within the operating range determined by their training data. Automated controls like these can help distribute and democratize data access and custodianship. Great Expectations also ships with a profiler tool to help understand the qualities of a particular data set and to set appropriate limits.
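The analogy with unit tests can be made concrete with a short sketch of assertions that run against data during pipeline execution. This is plain Python written for illustration and does not use the Great Expectations API; the function names, columns and sample rows are all hypothetical.

```python
def expect_not_null(rows, column):
    """Check that `column` is present and not None in every row."""
    failed = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {"check": f"{column} is not null", "success": not failed, "failed_rows": failed}


def expect_between(rows, column, low, high):
    """Check that `column` lies within [low, high] in every row."""
    failed = [
        i for i, row in enumerate(rows)
        if row.get(column) is None or not (low <= row[column] <= high)
    ]
    return {"check": f"{column} in [{low}, {high}]", "success": not failed, "failed_rows": failed}


def validate(rows, checks):
    """Run every check and report whether the batch as a whole passed."""
    results = [check(rows) for check in checks]
    return all(r["success"] for r in results), results


# Hypothetical batch flowing through a pipeline step.
batch = [
    {"user_id": 1, "age": 34},
    {"user_id": 2, "age": -3},
    {"user_id": None, "age": 51},
]

passed, results = validate(batch, [
    lambda rows: expect_not_null(rows, "user_id"),
    lambda rows: expect_between(rows, "age", 0, 120),
])
print(passed)
for result in results:
    print(result)
```

A pipeline could stop, or quarantine the batch, when `passed` is false; the list of failing row indices is what makes the flagged anomaly actionable.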
Katran is a high-performance layer 4 load balancer. It's not for everyone, but if you need redundancy for layer 7 load balancers (such as HAProxy or NGINX) or need to scale load balancers to two or more servers, then we recommend assessing Katran. We see Katran as a flexible and efficient choice over techniques such as round-robin DNS over L7 load balancers or the IPVS kernel module that network engineers usually adopt to solve similar challenges.
Given the increased use of service mesh to deploy collections of containerized microservices, we can expect to see tools emerge that automate and simplify the administrative tasks associated with this architectural style. Kiali is one such tool. Kiali provides a graphical user interface to observe and control networks of services deployed with Istio. We've found Kiali useful for visualizing the topology of services in a network and understanding the traffic routed between them. For example, when used in conjunction with Flagger, Kiali can display requests that have been routed to a canary service release. We particularly like Kiali's ability to artificially inject network faults into a service mesh to test resilience in the face of network interruptions. This practice is all too often ignored due to the complexity of configuring and running failure tests in a complex mesh of microservices.
Litmus is a chaos engineering tool with a low barrier to entry. It allows you to inject various error scenarios into your Kubernetes cluster with minimal effort. We're particularly excited by the range of capabilities Litmus offers beyond your random pod kill, including simulating network, CPU, memory and I/O issues. Litmus also supports tailored experiments to simulate errors for Kafka and Cassandra among other common services.
The concept of differential privacy first appeared in the Radar in 2016. Although the problem of breaking privacy through systematic model inference queries was recognized at the time, it was largely a theoretical issue since remedies were few. The industry has lacked tools to prevent this from happening. Opacus is a new Python library that can be used in conjunction with PyTorch to help thwart one type of differential privacy attack. Although this is a promising development, finding the right model and data set to which it applies has been a challenge. The library is still quite new so we're looking forward to seeing how it'll be accepted going forward.
It's important for a development team to identify whether the dependencies of their application have known vulnerabilities. OSS Index can be used to achieve this goal. OSS Index is a free catalog of open-source components and scanning tools designed to help developers identify vulnerabilities, understand risk and keep their software safe. Our teams are already integrating this index into pipelines through tooling for different languages, including AuditJS and the Gradle plugin. Scans are fast, vulnerabilities are identified accurately and few false positives occur.
Web UI testing continues to be an active space. Some of the folks who built Puppeteer have since moved on to Microsoft and are now applying their learnings to Playwright, which allows you to write tests for Chromium and Firefox as well as WebKit, all through the same API. Playwright has gained some attention for its support of all the major browser engines, which it currently achieves by including patched versions of Firefox and WebKit. It remains to be seen how quickly other tools can catch up, with more and more support for the Chrome DevTools Protocol as a common API for automating browsers.
pnpm is an up-and-coming package manager for Node.js that we're looking at closely because of its higher speed and greater efficiency compared to other package managers. Dependencies are saved in a single place on the disk and are linked into the respective node_modules directories. pnpm also supports incremental optimization on file level, provides a solid API foundation to allow extension/customization and supports store server mode, which speeds up dependency download even more. If your organization has a large number of projects with the same dependencies, you may want to take a closer look at pnpm.
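The "saved in a single place and linked" idea can be demonstrated with a small Python sketch. It is not pnpm's code or its actual on-disk layout; using hard links and a SHA-256 keyed store are assumptions made for illustration, and the package name and file contents are hypothetical.

```python
import hashlib
import os
import tempfile


def install(store_dir, project_dir, package_name, content):
    """Store `content` once, keyed by its hash, and hard-link it into
    the project's node_modules directory."""
    digest = hashlib.sha256(content).hexdigest()
    stored = os.path.join(store_dir, digest)
    if not os.path.exists(stored):
        # First time this exact content is seen: write it to the store.
        with open(stored, "wb") as fh:
            fh.write(content)
    target_dir = os.path.join(project_dir, "node_modules", package_name)
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, "index.js")
    # Link instead of copy, so the bytes exist only once on disk.
    os.link(stored, target)
    return target


root = tempfile.mkdtemp()
store = os.path.join(root, "store")
os.makedirs(store)

content = b"module.exports = 42;\n"
a = install(store, os.path.join(root, "app-a"), "example-pkg", content)
b = install(store, os.path.join(root, "app-b"), "example-pkg", content)

print(os.path.samefile(a, b))   # both projects point at the same file
print(len(os.listdir(store)))   # the store holds a single copy
```

With many projects sharing the same dependencies, each additional install costs a link rather than a download and a copy, which is where the disk and time savings come from.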
Sensei from Secure Code Warrior is a Java IDE plugin that makes it easy to create and distribute secure code quality guidelines. At ThoughtWorks we often advocate for "tools over rules," that is, make it easy to do the right thing over applying checklist-like governance rules and procedures, and this tool fits this philosophy. Developers create recipes that can be easily shared with team members. These can be simple or complex and are implemented as queries targeting the Java AST. Examples include warnings for SQL injection, cryptographic weakness and many others. Another feature we like: Since it executes on code changes in the IDE, Sensei provides faster feedback than the more traditional static analysis tools.
Zola is a static site generator written in Rust. As such it comes as a single executable with no dependencies, is very fast and supports all the usual things you'd expect such as Sass, content in markdown and hot reloading. We've had success building static sites with Zola and appreciate how intuitive it is to use.