Last updated: Oct 27, 2021
NOT ON THE CURRENT EDITION
This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar.
Oct 2021
Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk.

In recent years we've seen the rise of generic and domain-specific workflow management tools. The drivers behind this rise include the increased usage of data-processing pipelines and the automation of the machine-learning (ML) model development process. Airflow is one of the early open-source task orchestration tools that popularized defining directed acyclic graphs (DAGs) as code, an improvement over XML/YAML pipeline configuration. Although Airflow remains one of the most widely adopted orchestration tools, we encourage you to evaluate other tools based on your unique situation. For example, you may want to choose Prefect, which supports dynamic data-processing tasks as a first-class concern, with generic Python functions as tasks; Argo, if you prefer tight integration with Kubernetes; or Kubeflow or MLflow for ML-specific workflows. Given the rise of new tools, combined with some of the shortcomings of Airflow (such as its lack of native support for dynamic workflows and its centralized approach to scheduling pipelines), we no longer recommend Airflow as the default orchestration tool.
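To illustrate the DAGs-as-code style that Airflow popularized, here is a minimal sketch of an Airflow 2.x pipeline. The DAG name and the task callables are hypothetical stand-ins, not part of any real deployment:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


# Hypothetical task callables -- stand-ins for real extract/transform logic.
def extract():
    print("pulling raw records from a source system")


def transform():
    print("reshaping records for downstream consumers")


# The pipeline is plain Python: the DAG structure lives in version control
# and can be reviewed, diffed and tested like any other code.
with DAG(
    dag_id="example_daily_pipeline",  # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task  # declare the dependency edge
```

Because the graph is ordinary Python, dependencies can also be generated in loops or from configuration, which is what makes pipelines "dynamic" in the sense discussed above.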

We believe that with the increased use of streaming in analytics and data pipelines, as well as the management of data through a decentralized data mesh, the need for orchestration tools to define and manage complex data-processing pipelines is reduced.

Oct 2020
Adopt: We feel strongly that the industry should be adopting these items. We use them when appropriate on our projects.

Airflow remains our most widely used and favorite open-source tool for managing data-processing pipelines as directed acyclic graphs (DAGs). This is a growing space, with open-source tools such as Luigi and Argo and vendor-specific tools such as Azure Data Factory or AWS Data Pipeline. However, Airflow differentiates itself with its programmatic definition of workflows over limited low-code configuration files, its support for automated testing, its open-source, multiplatform installation, its rich set of integration points into the data ecosystem and its large community. In decentralized data architectures such as data mesh, however, Airflow currently falls short as a centralized workflow orchestrator.
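As an illustration of the automated-testing point: a common pattern is to unit-test that every DAG in a repository parses cleanly before deployment. This is a sketch only; the folder layout, test names and dag_id below are hypothetical:

```python
# Smoke tests for an Airflow repository: assert that all DAG files import
# without errors and that a known pipeline exposes the expected tasks.
from airflow.models import DagBag


def test_all_dags_import_cleanly():
    # "dags" is a hypothetical repository layout; point this at your DAG folder.
    dagbag = DagBag(dag_folder="dags", include_examples=False)
    assert dagbag.import_errors == {}, f"DAG import failures: {dagbag.import_errors}"


def test_pipeline_has_expected_tasks():
    dagbag = DagBag(dag_folder="dags", include_examples=False)
    dag = dagbag.get_dag("example_daily_pipeline")  # hypothetical dag_id
    assert dag is not None
    assert {"extract", "transform"} <= {task.task_id for task in dag.tasks}
```

Run with any test runner (for example, pytest) as part of CI, so a broken DAG definition fails the build rather than the scheduler.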

Mar 2017
Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk.

Airflow is a tool to programmatically create, schedule and monitor data pipelines. By treating directed acyclic graphs (DAGs) as code, it encourages maintainable, versionable and testable data pipelines. We've leveraged this approach in our projects to create dynamic pipelines that resulted in lean and explicit data workflows. Airflow makes it easy to define your own operators and executors and to extend the library so that it fits the level of abstraction that suits your environment.
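To illustrate that extensibility, here is a sketch of a custom operator, using the import path from current Airflow releases. The operator, its parameter and the target system are all hypothetical:

```python
from airflow.models import BaseOperator


class PublishReportOperator(BaseOperator):
    """Hypothetical operator that publishes a rendered report to some store."""

    def __init__(self, report_name: str, **kwargs):
        super().__init__(**kwargs)
        self.report_name = report_name

    def execute(self, context):
        # Real logic would call out to the target system here; we just log,
        # using the execution date ("ds") from the task context.
        self.log.info(
            "Publishing report %s for run %s", self.report_name, context["ds"]
        )
```

A custom operator like this can then be instantiated in any DAG alongside the built-in operators, keeping pipeline definitions at the level of abstraction your domain calls for.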

Published: Mar 29, 2017
