Last updated: Nov 10, 2015
NOT ON THE CURRENT EDITION
This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar.
Nov 2015
Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk.

Apache Spark has been steadily gaining ground as a fast and general engine for large-scale data processing. The engine is written in Scala and is well suited to applications that reuse a working set of data across multiple parallel operations. It is designed to run as a standalone cluster or as part of a Hadoop YARN cluster, and it can access data from sources such as HDFS, Cassandra and S3. Spark also offers many higher-level operators that ease the development of data-parallel applications. As a general data processing platform it has enabled the development of higher-level tools such as interactive SQL (Spark SQL), real-time streaming (Spark Streaming), a machine learning library (MLlib) and R on Spark.
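
As a rough sketch of what those higher-level operators look like in practice, the Scala snippet below filters a log file held in HDFS, caches the resulting working set and then runs two parallel operations over it. The cluster URL, HDFS path and log layout are assumptions made purely for illustration.

    import org.apache.spark.{SparkConf, SparkContext}

    object LogAnalysis {
      def main(args: Array[String]): Unit = {
        // Point the driver at a standalone cluster; a YARN master would be configured the same way.
        val conf = new SparkConf().setAppName("log-analysis").setMaster("spark://master:7077")
        val sc = new SparkContext(conf)

        // Load the data once and cache the filtered working set so both actions below reuse it.
        val errors = sc.textFile("hdfs:///logs/app.log")
          .filter(_.contains("ERROR"))
          .cache()

        // Two parallel operations over the same cached working set.
        val errorCount = errors.count()
        val topCodes = errors
          .map(line => (line.split(" ")(1), 1))
          .reduceByKey(_ + _)
          .take(10)

        println(s"errors: $errorCount, top codes: ${topCodes.mkString(", ")}")
        sc.stop()
      }
    }

Both the count and the aggregation read from the cached RDD rather than going back to HDFS, which is the working-set reuse described above.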

May 2015
Trial: Worth pursuing. It is important to understand how to build up this capability. Enterprises should try this technology on a project that can handle the risk.
Jan 2015
Assess: Worth exploring with the goal of understanding how it will affect your enterprise.

For iterative processing such as machine learning and interactive analysis, Hadoop MapReduce does not work well because of its batch-oriented nature. Spark is a fast and general engine for large-scale data processing that aims to extend MapReduce to iterative algorithms and interactive, low-latency data mining. It also ships with a machine learning library.
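
A minimal sketch of the kind of iterative job this enables, using MLlib's k-means clustering; the HDFS path and the comma-separated feature format are assumptions for illustration only.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    object IterativeClustering {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("kmeans-sketch"))

        // Parse the input once and keep it in memory; every k-means pass reuses
        // this cached RDD instead of launching a fresh batch job per iteration.
        val points = sc.textFile("hdfs:///data/points.csv")
          .map(line => Vectors.dense(line.split(",").map(_.toDouble)))
          .cache()

        // 5 clusters, at most 20 iterations over the cached data.
        val model = KMeans.train(points, 5, 20)
        model.clusterCenters.foreach(println)

        sc.stop()
      }
    }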

Jul 2014
Assess: Worth exploring with the goal of understanding how it will affect your enterprise.
For iterative processing such as machine learning and interactive analysis, Hadoop MapReduce does not work well because of its batch-oriented nature. Spark is a fast and general engine for large-scale data processing that aims to extend MapReduce to iterative algorithms and interactive, low-latency data mining. It also ships with a machine learning library.
Published: Jul 08, 2014
