menu

The information in our interactive Radar is currently only available in English. To get information in your native language, please download the PDF here.

Platforms

Apache Spark

ARCHIVED BLIP
Please be aware that we have archived this blip and are no longer actively keeping the information updated. The current edition of the radar only features items that we feel are new or noteworthy.Understand more
TRIAL?

Apache Spark has been steadily gaining ground as a fast and general engine for large-scale data processing. The engine is written in Scala and is well suited for applications that reuse a working set of data across multiple parallel operations. It’s designed to work as a standalone cluster or as part of Hadoop YARN cluster. It can access data from sources such as HDFS, Cassandra, S3 etc. Spark also offers many higher level operators in order to ease the development of data parallel applications. As a generic data processing platform it has enabled development of many higher level tools such as interactive SQL (Spark SQL), real time streaming (Spark Streaming), machine learning library (MLib), R-on-Spark etc. 

History for Apache Spark

Nov 2015
Trial?

Apache Spark has been steadily gaining ground as a fast and general engine for large-scale data processing. The engine is written in Scala and is well suited for applications that reuse a working set of data across multiple parallel operations. It’s designed to work as a standalone cluster or as part of Hadoop YARN cluster. It can access data from sources such as HDFS, Cassandra, S3 etc. Spark also offers many higher level operators in order to ease the development of data parallel applications. As a generic data processing platform it has enabled development of many higher level tools such as interactive SQL (Spark SQL), real time streaming (Spark Streaming), machine learning library (MLib), R-on-Spark etc. 

May 2015
Trial?
Jan 2015
Assess?

For iterative processing such as machine learning and interactive analysis, Hadoop map-reduce does not work very well because of its batch-oriented nature. Spark is a fast and general engine for large-scale data processing. It aims to extend map-reduce for iterative algorithms and interactive low latency data mining. It also ships with a machine learning library.

Jul 2014
Assess?
For iterative processing such as machine learning and interactive analysis, Hadoop map-reduce does not work very well because of its batch-oriented nature. Spark is a fast and general engine for large-scale data processing. It aims to extend map-reduce for iterative algorithms and interactive low latency data mining. It also ships with a machine learning library.