Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.
Rohit is a founder and the CEO of Tuplejump Inc. He is a true polyglot with experience in a number of programming languages, and a prolific open source contributor. He has been working with Scala, Akka, Play, and their ecosystem for over four years. Tuplejump is a startup with a vision of simplifying data engineering by making data, and the tools to work with it, accessible to the people who need them. They have built a big data platform powered by Scala everywhere.
Their big data pipeline comprises six stages: collect, transform, store, explore, predict, and visualize. The “collect” stage uses Hydra, a framework built atop Akka, to gather high-volume, high-velocity data from both push-based and pull-based sources. The collected data is streamed to the “transform” stage, which employs Spark to handle both structured and unstructured data. The “store” stage uses DStore, a Cassandra-based storage solution that offers scalability and high availability along with high-performance reads and writes. Cassandra's support for replication across multiple data centers is best-in-class, providing lower latency and high fault tolerance. The “explore” stage uses the Shark analytics engine, Calliope, and Ubercube, a distributed OLAP cube engine developed by Tuplejump. For “predict”, they are building their own EA and ANN/DL frameworks, working toward what they refer to as “Machine Assisted Insights”. The “visualize” stage uses Pizzaro, a modern data visualization front end with highly interactive and reactive capabilities.
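To make the staged design concrete, here is a minimal sketch in plain Scala of how such stages might compose, with each stage consuming the previous stage's output. It is an illustration only: the names `Event`, `Record`, and `PipelineSketch`, along with every stage body, are hypothetical placeholders and not Tuplejump's code. The real stages are backed by Hydra, Spark, and DStore as described above.

```scala
// Hypothetical sketch: stage names mirror the article, but the bodies are
// in-memory placeholders, not Tuplejump's implementation.
object PipelineSketch {
  final case class Event(source: String, payload: String)
  final case class Record(source: String, words: Int)

  // "collect": gather raw events (here, a fixed in-memory sample).
  def collect(): List[Event] = List(
    Event("push", "sensor reading ok"),
    Event("pull", "daily report"),
    Event("push", "alert")
  )

  // "transform": turn unstructured payloads into structured records.
  val transform: List[Event] => List[Record] =
    events => events.map(e => Record(e.source, e.payload.split("\\s+").length))

  // "store": stand-in for a write to durable storage; passes records through.
  val store: List[Record] => List[Record] =
    records => records

  // "explore": aggregate stored records into a total word count per source.
  val explore: List[Record] => Map[String, Int] =
    records => records.groupBy(_.source).map { case (src, rs) => src -> rs.map(_.words).sum }

  // Stages compose left to right with Function1.andThen.
  val pipeline: List[Event] => Map[String, Int] =
    transform andThen store andThen explore

  def main(args: Array[String]): Unit =
    println(pipeline(collect()))
}
```

Modelling each stage as a function keeps the stages independently replaceable, which is the property that lets a different engine sit behind each step.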
Tuplejump found Scala attractive for a number of reasons: it unifies OOP and FP, is modern and evolving, and is hosted on the JVM, the only VM worth putting in production according to Rohit. :) Rohit went into detail on how Akka’s actor concurrency works out in practice, including its supervision and clustering features. He spoke about Spark, the secret sauce in their batch processing system. He also touched upon Play, SBT, and ScalaTest.
Tuplejump has open sourced a number of tools, which you can find on their GitHub.