
Scala Symposium: Big Data Pipeline Powered by Scala

Rahul Goma Phulore

Published: May 7, 2014

Big Data Pipeline powered by Scala

Session presented by Rohit Rai, tuplejump

Rohit is a founder and the CEO of Tuplejump Inc. He is a true polyglot with experience in a number of programming languages, and a prolific open source contributor. He has been working with Scala, Akka, Play and the surrounding ecosystem for over four years. Tuplejump is a startup with a vision to simplify data engineering by making data, and the tools to work with it, accessible to the people who need them. They have built a big data platform that uses Scala throughout.

Their big data pipeline comprises six stages: collect, transform, store, explore, predict, and visualize. The “collect” stage uses Hydra, a framework built atop Akka, to gather high-volume, high-velocity data from both push-based and pull-based sources. The collected data is streamed to the “transform” stage, which employs Spark to handle both structured and unstructured data. The “store” stage uses DStore, a Cassandra-based storage solution that offers scalability and high availability with high-performance reads and writes. Cassandra's support for replication across multiple data centers is best-in-class, providing lower latency and high fault tolerance. The “explore” stage uses the Shark analytics engine, Calliope, and Ubercube, a distributed OLAP cube engine developed by Tuplejump. For “predict”, they are building their own EA and ANN/DL frameworks, working towards what they refer to as “Machine Assisted Insights”. The “visualize” stage uses Pizzaro, a modern data visualization front-end with highly interactive and reactive capabilities.
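
As a rough illustration of how stages like these compose, the sketch below chains four of them as plain Scala functions. It is not Tuplejump's code: every name in it (RawEvent, Record, collect, transform, store, explore, run) is hypothetical, and in-memory collections stand in for Hydra, Spark, DStore and the exploration tools.

```scala
// Hypothetical sketch: none of these names come from Tuplejump's tools.
object PipelineSketch {
  // A raw event as it might arrive from a push-based or pull-based source.
  final case class RawEvent(source: String, payload: String)

  // A cleaned record produced by the transform stage.
  final case class Record(source: String, words: List[String])

  // "collect": merge events from push-based and pull-based sources.
  def collect(pushed: List[RawEvent], pulled: List[RawEvent]): List[RawEvent] =
    pushed ++ pulled

  // "transform": lower-case and tokenize each payload, dropping empty ones.
  def transform(events: List[RawEvent]): List[Record] =
    events
      .map { e =>
        val words = e.payload.toLowerCase.split("\\s+").filter(_.nonEmpty).toList
        Record(e.source, words)
      }
      .filter(_.words.nonEmpty)

  // "store": an in-memory map keyed by source, standing in for a real store.
  def store(records: List[Record]): Map[String, List[Record]] =
    records.groupBy(_.source)

  // "explore": a simple aggregate, the total word count per source.
  def explore(stored: Map[String, List[Record]]): Map[String, Int] =
    stored.map { case (source, recs) => source -> recs.map(_.words.size).sum }

  // The stages run in order; each consumes the previous stage's output.
  def run(pushed: List[RawEvent], pulled: List[RawEvent]): Map[String, Int] =
    explore(store(transform(collect(pushed, pulled))))
}
```

Because each stage is an ordinary function from one immutable value to another, a stage can be tested in isolation or swapped for a distributed implementation without touching its neighbours. That is one reading of the "unifies OOP and FP" appeal, not a description of how the real platform is wired.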

Tuplejump found Scala attractive for a number of reasons. It unifies OOP and FP, is modern and evolving, and runs on the JVM, the only VM worth putting in production according to Rohit. :) Rohit went into detail on how Akka’s actor concurrency works out in practice, including its supervision and clustering features. He spoke about Spark, the secret sauce in their batch processing system. He also touched upon Play, SBT, and ScalaTest.

Tuplejump has open sourced a number of tools, which you can find on their GitHub.
