Technology Radar

Apache Beam

Last updated : Apr 24, 2019

NOT ON THE CURRENT EDITION

This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar. Understand more

Apr 2019

Trial

Apache Beam is an open-source unified programming model for defining and executing both batch and streaming data parallel processing pipelines. The Beam model is based on the Dataflow model which allows us to express logic in an elegant way so that we can easily switch between batch, windowed batch or streaming. The big data-processing ecosystem has been evolving quite a lot which can make it difficult to choose the right data-processing engine. One of the key reasons to choose Beam is that it allows us to switch between different runners — a few months ago Apache Samza was added to the other runners it already supports, which include Apache Spark, Apache Flink and Google Cloud Dataflow. Different runners have different capabilities and providing a portable API is a difficult task. Beam tries to strike a delicate balance by actively pulling innovations from these runners into the Beam model and also working with the community to influence the roadmap of these runners. Beam has SDKs in multiple languages including Java, Python and Golang. We've also had success using Scio which provides a Scala wrapper around Beam.

Nov 2018

Assess

Apache Beam is an open source unified programming model for defining and executing both batch and streaming data-parallel processing pipelines. Beam provides a portable API layer for describing these pipelines independent of execution engines (or runners) such as Apache Spark, Apache Flink or Google Cloud Dataflow. Different runners have different capabilities and providing a portable API is a difficult task. Beam tries to strike a delicate balance by actively pulling innovations from these runners into the Beam model and also working with the community to influence the roadmap of these runners. Beam has a rich set of built-in I/O transformations that cover most of the data pipeline needs and it also provides a mechanism to implement custom transformations for specific use cases. The portable API and extensible IO transformations make a compelling case for assessing Apache Beam for data pipeline needs.

Published : Nov 14, 2018

Download the PDF

English | Português

Sign up for the Technology Radar newsletter

Subscribe now

Industries

Publications and Tools

All Insights

Apache Beam

Download the PDF

Sign up for the Technology Radar newsletter

Visit our archive to read previous volumes