Commitizen is a simple tool that streamlines the commit process when using Git. It prompts you for any required fields and formats your commit message accordingly. It supports different conventions for describing the required check-in formats, and you can add your own via an adapter. It saves time and avoids later rejections from a commit hook.
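As a rough illustration of what formatting a message to a convention involves, the sketch below assembles a header in the style of the Conventional Commits convention. It is not Commitizen's own code: `format_commit` and `ALLOWED_TYPES` are hypothetical names, and the list of allowed types is an assumption.

```python
from typing import Optional

# Hypothetical set of commit types, loosely modeled on Conventional Commits.
ALLOWED_TYPES = {"feat", "fix", "docs", "refactor", "test", "chore"}


def format_commit(commit_type: str, subject: str,
                  scope: Optional[str] = None,
                  body: Optional[str] = None) -> str:
    """Validate the required fields and build a 'type(scope): subject' message."""
    if commit_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown commit type: {commit_type!r}")
    subject = subject.strip()
    if not subject:
        raise ValueError("subject is required")
    # The scope is optional; leave out the parentheses when it is absent.
    header = f"{commit_type}({scope}): {subject}" if scope else f"{commit_type}: {subject}"
    if body and body.strip():
        # A blank line separates the header from the body.
        return f"{header}\n\n{body.strip()}"
    return header
```

Rejecting an unknown type or an empty subject up front is the kind of check that would otherwise surface later in a commit hook.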
React Styleguidist is a development environment for React components. It includes a dev server with hot reloading capabilities and generates an HTML style guide for sharing with teams. The style guide shows a live version of all components in one place with documentation and a list of their props. We've mentioned React Styleguidist as a UI dev environment before, and over time it has become our default choice among similar tools in this space.
Building, testing and deploying mobile applications entails complex steps, especially when we consider a pipeline from source code repository to app stores. All of these steps can be automated with scripts and build pipelines in generic CI/CD tools. However, our teams have found Bitrise, a domain-specific CD tool for mobile applications, useful when there was no need to integrate with build pipelines for back-end systems. Bitrise is easy to set up and provides a comprehensive set of prebuilt steps for most mobile development needs.
Keeping dependencies up to date is a chore, but for security reasons it's important to respond to updates in a timely manner. You can use tools to make this process as painless and automated as possible. In practical use our teams have had good experiences with Dependabot. It integrates with GitHub repositories and automatically checks dependencies for new versions. When required, Dependabot will open a pull request with upgraded dependencies.
Detekt is a static code analysis tool for Kotlin. It provides code smell analysis and complexity reports based on highly configurable rule sets. It can be run from the command line and, using plugins, via Gradle, SonarQube and IntelliJ. Our teams have found great value in using Detekt to maintain high code quality. When analysis and report generation are integrated into a build pipeline, it's obviously important that the reports are checked on a regular basis and the team sets aside time to act on the findings.
One of the great pain points in interaction and visual design is the lack of tools built for collaboration. This is where Figma comes in. It offers the same functionality as design programs such as Sketch and InVision, but its real-time collaboration capabilities let you work with another person at the same time and discover new ideas together. Our teams find Figma very useful, especially for enabling and facilitating remote and distributed design work. In addition to its collaboration capabilities, Figma also offers an API that helps to improve the DesignOps process.
Building containerized applications can require complex configurations in development environments and on build agents. If you're building a Java application and use Docker, you might consider using Google's Jib. Jib is an open-source plugin supporting both Maven and Gradle. The Jib plugin uses information from your build config to build your application directly as a Docker image without requiring a Dockerfile or Docker daemon. Jib optimizes around image layering, promising to speed up subsequent builds.
Loki is a visual regression tool that works with Storybook, which we mentioned previously in the context of UI dev environments. With a few lines of configuration, Loki can be used to test all UI components. The preferred mode of operation is using Chrome in a Docker container as this avoids one-pixel differences when tests are run in nonidentical environments. Our experience has been that the tests are very stable, but updates to Storybook tend to cause tests to fail with minor differences. It also seems impossible to test components which use position: fixed, but you can work around that by wrapping the component with a
Build pipelines that create and deploy containers should include container security scanning. Our teams particularly like Trivy, a vulnerability scanner for containers, because it's easier to set up than other tools, thanks to it shipping as a stand-alone binary. Other benefits of Trivy are that it's open-source software and that it supports distroless containers.
Twistlock is a commercial product with build-time and run-time security vulnerability detection and prevention capabilities. These capabilities span protecting VMs, container schedulers and containers to various registries and repositories that applications rely on. Twistlock has helped our teams accelerate development of regulated applications, where application infrastructure and architecture require compliance with, for example, Payment Card Industry (PCI) standards and the Health Insurance Portability and Accountability Act (HIPAA). Our teams have enjoyed the developer experience that Twistlock provides: the ability to run provisioning as code, the easy integration with other common observability platforms, and the out-of-the-box benchmarks to measure the infrastructure against industry-consensus best practices. We run Twistlock with regular runtime scans over our cloud-native applications, particularly when regulatory compliance is required.
Increasingly we're seeing powerful Internet of Things devices that run Linux rather than a special embedded OS. In order to reduce resource usage and decrease the attack surface, it makes sense to build a custom Linux distribution that only contains the tools and dependencies needed to run the software on the device. In this context the Yocto Project has renewed relevance as a tool to create a Linux distribution tailored to the needs of a specific case. The learning curve is steep and, because of its flexibility, it's easy to do the wrong thing. However, over the many years of its existence, the Yocto Project has attracted an active community that can help. Compared to similar tools, it's easier to integrate into a CD workflow and, unlike Android Things or Ubuntu Core for example, it's not tied to a specific ecosystem.
It's often very difficult to get a handle on our software estates as they grow ever more complex. Aplas is a new software mapping tool that can be used to create visualizations of our software landscapes in the form of maps. The tool works by ingesting metadata about your existing systems and then displaying a map over which various views can be projected. Ingestion is either a manual process or one that can be automated via APIs. We're pretty excited to see this product evolve and to see what's possible with the automated collection of metadata. It should be possible, for example, to expose architectural fitness functions such as run cost to create visualizations of how much is being spent on cloud infrastructure. Understanding which systems talk to other systems via which technology is another problem we often face and Aplas can visualize it for us.
asdf-vm is a command-line tool to manage runtime versions of multiple languages, per project. It's similar to other command-line version management tools, such as RVM for Ruby and nvm for Node.js, with the advantage of an extensible plugin architecture to handle multiple languages. Its list of current plugins includes many languages as well as tools such as Bazel or tflint, whose runtime version you may need to manage per project.
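To make per-project version pinning concrete, here is a minimal sketch that reads a file in the style of asdf's `.tool-versions`, where each line names a tool followed by one or more versions. The parser is illustrative only and is not part of asdf; `parse_tool_versions` is a hypothetical name, and the comment handling is an assumption.

```python
from typing import Dict, List


def parse_tool_versions(text: str) -> Dict[str, List[str]]:
    """Map each tool name to the list of versions pinned for the project."""
    versions: Dict[str, List[str]] = {}
    for raw_line in text.splitlines():
        # Assumption: anything after '#' is a comment and can be dropped.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        name, *pinned = line.split()
        versions[name] = pinned
    return versions
```

A project file containing `nodejs 12.13.0` and `ruby 2.6.5` on separate lines would yield one entry per tool, which is the information a version manager needs to pick the right runtime in that directory.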
AWSume is a convenient script to manage AWS session tokens and assume role credentials from the command line. We find AWSume quite handy when we deal with multiple AWS accounts at the same time. Instead of specifying profiles individually in every command, the script reads the credentials from the CLI cache and exports them to environment variables. As a result, both the commands and AWS SDKs pick up the right credentials.
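The idea of exporting temporary credentials to environment variables can be sketched as follows. This is not AWSume's implementation: `export_credentials` is a hypothetical helper, and the input dictionary is assumed to have the shape of the `Credentials` object returned by an STS assume-role call. The three variable names are the standard ones read by the AWS CLI and SDKs.

```python
import os
from typing import Dict, MutableMapping


def export_credentials(creds: Dict[str, str],
                       env: MutableMapping[str, str] = os.environ) -> None:
    """Copy temporary credentials into the environment variables AWS tools read."""
    env["AWS_ACCESS_KEY_ID"] = creds["AccessKeyId"]
    env["AWS_SECRET_ACCESS_KEY"] = creds["SecretAccessKey"]
    # The session token is what marks these as temporary, role-based credentials.
    env["AWS_SESSION_TOKEN"] = creds["SessionToken"]
```

Note that a Python process can only change its own environment and that of its children, so a real tool needs shell integration to affect the calling shell.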
Data transformation is an essential part of data-processing workflows: filtering, grouping or joining multiple sources into a format that is suitable for analyzing data or feeding machine-learning models. dbt is an open-source tool and a commercial SaaS product that provides simple and effective transformation capabilities for data analysts. The current frameworks and tooling for data transformation fall either into the group of powerful and flexible tools, which require an intimate understanding of the programming model and languages of the framework (Apache Spark, for example), or into the group of dumb drag-and-drop UI tools that don't lend themselves to reliable engineering practices such as automated testing and deployment. dbt fills a niche: it uses SQL, a widely understood interface, to model simple batch transformations, while providing command-line tooling that encourages good engineering practices such as versioning, automated testing and deployment; essentially it implements SQL-based transformation modeling as code. dbt currently supports multiple data sources, including Snowflake and Postgres, and provides various execution options, such as Airflow and dbt's own cloud offering. Its transformation capability is limited to what SQL offers, and it doesn't support real-time streaming transformations at the time of writing.
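The notion of SQL-based transformation modeling as code can be illustrated with a small sketch that uses Python's built-in `sqlite3` module rather than dbt itself. The "model" is a SELECT statement that could live in version control, and the two checks are loosely analogous to not-null and uniqueness tests. All names here (`CUSTOMER_TOTALS_SQL`, `build_model`, `count_nulls`, `count_duplicates`, `demo`, and the `orders` table) are hypothetical.

```python
import sqlite3

# Hypothetical model: a plain SELECT that describes the transformed table.
CUSTOMER_TOTALS_SQL = """
SELECT customer_id, SUM(amount) AS total_amount
FROM orders
WHERE status = 'completed'
GROUP BY customer_id
"""


def build_model(conn: sqlite3.Connection, name: str, select_sql: str) -> None:
    """Materialize a model as a table, replacing any previous version."""
    conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.execute(f"CREATE TABLE {name} AS {select_sql}")


def count_nulls(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Return how many rows have NULL in the given column."""
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(query).fetchone()[0]


def count_duplicates(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Return how many distinct values appear more than once in the column."""
    query = (
        f"SELECT COUNT(*) FROM (SELECT {column} FROM {table} "
        f"GROUP BY {column} HAVING COUNT(*) > 1)"
    )
    return conn.execute(query).fetchone()[0]


def demo() -> sqlite3.Connection:
    """Load a few source rows and build the model from them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 10.0, "completed"), (1, 5.0, "completed"),
         (2, 7.5, "cancelled"), (3, 2.5, "completed")],
    )
    build_model(conn, "customer_totals", CUSTOMER_TOTALS_SQL)
    return conn
```

Because the model and its checks are ordinary text and functions, they can be versioned and run in a build pipeline, which is the engineering practice the paragraph describes.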
Docker Notary is an OSS tool that enables signing of assets such as images, files and containers. This means that the provenance of assets can be asserted, which is very useful in regulated environments and better practice everywhere. As an example, when a container is created, it's signed with a private key, and a hash tied to the publisher's identity is stored as metadata. Once published, the provenance of the container (or other asset) can be checked using the image hash and the publisher's public key. There are publicly available, trusted registries such as the Docker Trusted Registry, but it's also possible to run your own. Our teams have reported some spiky edges running local Notary servers and suggest using a registry that includes Notary where possible.
Given the growing number of weighty decisions that are derived from large data sets, either directly or as training input for machine learning models, it's important to understand the gaps, flaws and potential biases in your data. Google's Facets project provides two helpful tools in this space: Facets Overview and Facets Dive. Facets Overview visualizes the distribution of values for features in a data set, can show training and validation set skew and can be used to compare multiple data sets; Facets Dive is for drilling down and visualizing individual data points in large data sets, using different visual dimensions to explore the relationships between attributes. They're both useful tools in carrying out ethical bias testing.
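To show what training and validation set skew means in practice, the sketch below compares the relative frequency of each value of a categorical feature across two sets. It is a simplified stand-in written for illustration and does not use Facets; `distribution` and `max_skew` are hypothetical names, and the largest absolute difference is just one of several possible skew measures.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def distribution(values: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Return the relative frequency of each distinct value."""
    if not values:
        raise ValueError("cannot compute a distribution over no values")
    counts = Counter(values)
    return {value: count / len(values) for value, count in counts.items()}


def max_skew(train: Sequence[Hashable], validation: Sequence[Hashable]) -> float:
    """Return the largest absolute frequency gap for any value across the two sets."""
    p, q = distribution(train), distribution(validation)
    # A value missing from one set counts as frequency 0.0 there.
    return max(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in set(p) | set(q))
```

A result near zero suggests the two sets are distributed similarly for that feature, while a large value is a prompt to look more closely at how the data was split or collected.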
With the increased adoption of Kubernetes as a container orchestrator, the security toolset around containers and Kubernetes is evolving rapidly. Falco is one such container-native tool aimed at addressing runtime security. Falco leverages Sysdig's Linux kernel instrumentation and system call profiling to give us deep insights into system behavior and help us detect abnormal activities in applications, containers, the underlying host or the Kubernetes orchestrator itself. We like Falco's capability to detect threats without injecting third-party code or sidecar containers.
We're seeing increased use of binary attestation for securing the software supply chain, particularly within regulated industries. The currently favored approaches seem to involve either building a custom system for implementing the binary verification or relying on a cloud vendor's service. We're encouraged to see the open-source in-toto enter this space. in-toto is a framework for cryptographically verifying every component and step along the path to production for a software artifact. The project includes a number of integrations into many widely used build, container auditing and deployment tools. A software supply chain tool can be a critical piece of an organization's security apparatus, so we like that as an open-source project, in-toto's behavior is transparent, and its own integrity and supply chain can be verified by the community. We'll have to wait and see if it'll gain a critical mass of users and contributors to compete in this space.
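The idea of verifying each step along the path to production can be sketched, in heavily simplified form, as a chain of hashes: every step records digests of what it consumed and what it produced, and verification checks that each step consumed exactly what the previous one produced. This sketch omits the cryptographic signing the paragraph describes and does not use in-toto's API; `digest`, `record_step` and `verify_chain` are hypothetical names.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of some artifact content."""
    return hashlib.sha256(data).hexdigest()


def record_step(name: str, materials: Dict[str, bytes],
                products: Dict[str, bytes]) -> dict:
    """Record the hashes of a step's inputs (materials) and outputs (products)."""
    return {
        "name": name,
        "materials": {path: digest(content) for path, content in materials.items()},
        "products": {path: digest(content) for path, content in products.items()},
    }


def verify_chain(steps: List[dict]) -> bool:
    """Check that every material of a step matches a product of the step before it."""
    for previous, current in zip(steps, steps[1:]):
        for path, expected_hash in current["materials"].items():
            if previous["products"].get(path) != expected_hash:
                return False
    return True
```

If an artifact is altered between a build step and a deploy step, the recorded hashes no longer line up and `verify_chain` returns `False`. A real system would also need to sign each record so that the records themselves cannot be forged.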
Kubeflow is interesting for two reasons. First, it is an innovative use of Kubernetes Operators, which we spotlighted in our April 2019 edition of the Radar. Second, it provides a way to encode and version machine-learning workflows so that they can be more easily ported from one execution environment to another. Kubeflow consists of several components, including Jupyter notebooks, data pipelines, and control tools. Several of these components are packaged as Kubernetes Operators to draw on Kubernetes's ability to react to events generated by pods implementing various stages of the workflow. By packaging the individual programs and data as containers, entire workflows can be ported from one environment to another. This can be valuable when moving a useful but computationally challenging workflow developed in the cloud to a custom supercomputer or tensor processing unit cluster.
If your application handles sensitive information (such as cryptographic keys) as plain text in memory, there's a high probability that someone could exploit it as an attack vector and compromise the information. Most cloud-based solutions use hardware security modules (HSMs) to avoid such attacks. However, if you need to do this in a self-hosted manner without access to HSMs, then we've found MemGuard to be quite useful. MemGuard acts as a secure software enclave for storing sensitive information in memory. Although MemGuard is not a replacement for HSMs, it does deploy a number of security tactics, such as protection against cold boot attacks, avoiding interference with garbage collection and fortifying with guard pages, to reduce the likelihood of sensitive data being exposed.
Defining and enforcing security policies uniformly across a diverse technology landscape is a challenge. Even for simple applications, you have to control access to their components — such as container orchestrators, services and data stores to keep the services' state — using their components' built-in security policy configuration and enforcement mechanisms.
We're excited about Open Policy Agent (OPA), an open-source technology that attempts to solve this problem. OPA lets you define fine-grained access control and flexible policies as code, using the Rego policy definition language. Policies written in Rego are enforced in a distributed and unobtrusive manner outside of the application code. At the time of this writing, OPA implements uniform and flexible policy definition and enforcement to secure access to Kubernetes APIs, microservices APIs through the Envoy sidecar, and Kafka. It can also be used as a sidecar to any service to verify access policies or filter response data. Styra, the company behind OPA, provides commercial solutions for centralized visibility to distributed policies. We'd like to see OPA mature through the CNCF incubation program and continue to build support for more challenging policy enforcement scenarios such as diverse data stores.
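The principle of keeping policy outside the application code can be sketched in a few lines. This example is written in Python for illustration only; it is neither Rego nor OPA's API. The rules are plain data that could be versioned and changed independently of the service, and the service only asks a yes-or-no question. `POLICY`, `allow` and the request fields are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical policy: declarative rules kept apart from the service logic.
POLICY: List[Dict[str, Any]] = [
    {"role": "admin", "methods": {"GET", "POST", "DELETE"}, "path_prefix": "/"},
    {"role": "viewer", "methods": {"GET"}, "path_prefix": "/reports"},
]


def allow(policy: List[Dict[str, Any]], request: Dict[str, str]) -> bool:
    """Return True if any rule permits the request; deny by default."""
    return any(
        request["role"] == rule["role"]
        and request["method"] in rule["methods"]
        and request["path"].startswith(rule["path_prefix"])
        for rule in policy
    )
```

Because the decision is a pure function of the policy and the request, the same rules can be evaluated next to many different services, and a policy change does not require touching their code.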
Pumba is a chaos testing and network emulation tool for Docker. Pumba can kill, stop, remove or pause Docker containers. Pumba can also emulate networks and simulate different network failures such as delays, packet loss and bandwidth rate limits. Pumba uses the tc tool for network emulation which means it needs to be available in our containers or we need to run Pumba in a sidecar container with tc. Pumba is particularly useful when we want to run some automated chaos tests against a distributed system running on a bunch of containers locally or in the build pipeline.
Google brings us Skaffold, an open-source tool to automate local development workflows, including deployment on Kubernetes. Skaffold detects changes in source code and triggers workflows to build, tag and deploy into a K8s cluster including capturing application logs back to the command line. The workflows are pluggable with different build and deployment tools, but this comes with an opinionated default configuration to make it easier to get started.
The machine learning world has shifted emphasis slightly from exploring what models are capable of understanding to how they do it. Concerns about introducing bias or overgeneralizing a model's applicability have resulted in interesting new tools such as What-If Tool (WIT). This tool helps data scientists to dig into a model's behavior and to visualize the impact various features and data sets have on the output. Introduced by Google and available either through TensorBoard or Jupyter notebooks, WIT simplifies the tasks of comparing models, slicing data sets, visualizing facets and editing individual data points. Although WIT makes it easier to perform these analyses, they still require a deep understanding of the mathematics and theory behind the models. It is a tool for data scientists to gain deeper insights into model behavior. Naive users shouldn't expect any tool to remove the risk or minimize the damage done by a misapplied or poorly trained algorithm.
Azure Data Factory (ADF) is currently Azure's default product for orchestrating data-processing pipelines. It supports data ingestion, copying data from and to different storage types on prem or on Azure and executing transformation logic. While we've had a reasonable experience with ADF for simple migrations of data stores from on prem to cloud, we discourage the use of Azure Data Factory for orchestration of complex data-processing pipelines. Our experience has been challenging due to several factors: limited coverage of capabilities that can be implemented through code, as ADF appears to prioritize low-code platform capabilities; poor debuggability and error reporting; limited observability, as ADF logging capabilities don't integrate with other products such as Azure Data Lake Storage or Databricks, making it difficult to get end-to-end observability in place; and availability of data source-triggering mechanisms only in certain regions. At this time, we encourage using open-source orchestration tools (e.g., Airflow) for complex data pipelines and limiting ADF to data copying or snapshotting. We're hoping that ADF will address these concerns to support more complex data-processing workflows and prioritize access to capabilities through code first.