Enable javascript in your browser for better experience. Need to know to enable it? Go here.
radar blip
radar blip

Declarative data pipeline definition

Last updated : Oct 28, 2020
NOT ON THE CURRENT EDITION
This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar. Understand more
Oct 2020
Trial ? Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk.

Many data pipelines are defined in a large, more or less imperative script written in Python or Scala. The script contains the logic of the individual steps as well as the code chaining the steps together. When faced with a similar situation in Selenium tests, developers discovered the Page Object pattern, and later many behavior-driven development (BDD) frameworks implemented a split between step definitions and their composition. Some teams are now experimenting with bringing the same thinking to data engineering. A separate declarative data pipeline definition, maybe written in YAML, contains only the declaration and sequence of steps. It states input and output data sets but refers to scripts if and when more complex logic is needed. A La Mode is a relatively new tool that takes a DSL approach to defining pipelines, but airflow-declarative, a tool that turns directed acyclic graphs defined in YAML into Airflow task schedules, seems to have the most momentum in this space.

May 2020
Assess ? Worth exploring with the goal of understanding how it will affect your enterprise.

Many data pipelines are defined in a large, more or less imperative script written in Python or Scala. The script contains the logic of the individual steps as well as the code chaining the steps together. When faced with a similar situation in Selenium tests, developers discovered the Page Object pattern, and later many behavior-driven development (BDD) frameworks implemented a split between step definitions and their composition. Some teams are now experimenting with bringing the same thinking to data engineering. A separate declarative data pipeline definition, maybe written in YAML, contains only the declaration and sequence of steps. It states input and output data sets but refers to scripts if and when more complex logic is needed. With A La Mode, we're seeing the first open source tool appear in this space.

Published : May 19, 2020

Download Technology Radar Volume 29

English | Español | Português | 中文

Stay informed about technology

 

Subscribe now

Visit our archive to read previous volumes