- Session presented by Chris Stucchio, BayesianWitch
Chris is one of the founders of BayesianWitch, a web analytics company built almost entirely on Scala. He is currently focused on improving the scientific computing ecosystem in Scala.
The algorithms used at BayesianWitch have to solve coupled PDEs, minimize high-dimensional objective functions, and do some statistical sampling, all in under 400 ms. Scala and Akka excel at real-time streaming, concurrency, and fault tolerance. Python excels at solving PDEs and similar numerical work because it has excellent libraries like NumPy, SciPy, Matplotlib, and Bokeh. However, using the two together is not an attractive option, for multiple reasons. Chris talked about how Scala could replace Python in this domain, and what hurdles stand in its path.
Regular idiomatic Scala tends to be slow, and even though it is possible to hand-tune it, the result is almost always ugly. You end up trading away either performance or expressiveness. There are some advanced techniques, though, that let you stay expressive while retaining as much performance as possible. These include macros and carefully placed @specialized annotations, among others. Libraries in the number-crunching domain have to make use of these techniques.
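To make the trade-off concrete, here is a minimal sketch (not code from the talk) that sums an array three ways: idiomatically, hand-tuned, and with a `@specialized` type class as a middle ground. The names `SpecializedSketch`, `Add`, `DoubleAdd`, and the `sum*` methods are hypothetical and exist only for illustration. It assumes Scala 2, where `@specialized` asks the compiler to generate primitive variants; Scala 3 accepts the annotation but does not act on it.

```scala
object SpecializedSketch {
  // Idiomatic and generic. T is erased on the JVM, so elements are
  // typically boxed on their way through the Numeric type class.
  def sumGeneric[T](xs: Array[T])(implicit num: Numeric[T]): T =
    xs.foldLeft(num.zero)(num.plus)

  // Hand-tuned. A while loop over a primitive array avoids boxing,
  // but the code only works for Double and reads like Java.
  def sumHandTuned(xs: Array[Double]): Double = {
    var i = 0
    var acc = 0.0
    while (i < xs.length) {
      acc += xs(i)
      i += 1
    }
    acc
  }

  // Middle ground. A small type class (hypothetical) whose type
  // parameter is specialized for Int and Double.
  trait Add[@specialized(Int, Double) T] {
    def zero: T
    def plus(a: T, b: T): T
  }

  implicit object DoubleAdd extends Add[Double] {
    def zero: Double = 0.0
    def plus(a: Double, b: Double): Double = a + b
  }

  // Generic in T, yet the Scala 2 compiler can emit Int and Double
  // variants of this method that work on unboxed values.
  def sumSpecialized[@specialized(Int, Double) T](xs: Array[T])(implicit ev: Add[T]): T = {
    var i = 0
    var acc = ev.zero
    while (i < xs.length) {
      acc = ev.plus(acc, xs(i))
      i += 1
    }
    acc
  }
}
```

All three return the same result for `Array(1.0, 2.0, 3.0)`; they differ in how much boxing happens along the way and in how much of the generic interface survives. Any actual speed difference should be measured with a benchmark harness rather than assumed.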
He then mentioned some key libraries in the domain. Spire provides numeric type classes and primitives. Breeze aims to be NumPy for Scala, and has support for all the usual suspects: vectors, matrices, polynomials, statistics, and so on. It uses an interesting abstraction called UFunc to provide shape-polymorphic operations, à la NumPy. Saddle is like Breeze, but with some other interesting structures and nicer IO. For visualization, there are Breeze-Viz, Breeze-Bokeh, and JFreeChart. (Chris is a committer to Breeze and Breeze-Bokeh.) BayesianWitch also uses Scalding for dealing with big data sets. A big problem with these libraries is that they are mutually incompatible in a number of ways.
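The idea behind a shape-polymorphic operation can be sketched with a plain type class and no external dependencies. This is an illustration of the concept only, not Breeze's actual UFunc API: `UFuncSketch`, `MapShape`, `UFunc1`, and `square` are hypothetical names, and standard-library `Vector` stands in for a real numeric vector type.

```scala
object UFuncSketch {
  // Hypothetical type class: describes how to apply a scalar function
  // to every Double inside a value of shape A.
  trait MapShape[A] {
    def mapValues(a: A, f: Double => Double): A
  }

  object MapShape {
    // Base case: a bare scalar.
    implicit val scalar: MapShape[Double] = new MapShape[Double] {
      def mapValues(a: Double, f: Double => Double): Double = f(a)
    }

    // Inductive case: a Vector of anything that already has a shape.
    // This covers Vector[Double], Vector[Vector[Double]], and so on.
    implicit def vector[A](implicit inner: MapShape[A]): MapShape[Vector[A]] =
      new MapShape[Vector[A]] {
        def mapValues(a: Vector[A], f: Double => Double): Vector[A] =
          a.map(x => inner.mapValues(x, f))
      }
  }

  // A unary "universal function": defined once on scalars, usable on
  // any shape for which a MapShape instance can be found.
  final class UFunc1(f: Double => Double) {
    def apply[A](a: A)(implicit shape: MapShape[A]): A =
      shape.mapValues(a, f)
  }

  val square: UFunc1 = new UFunc1(x => x * x)
}
```

With these definitions, `UFuncSketch.square(3.0)`, `UFuncSketch.square(Vector(1.0, 2.0))`, and `UFuncSketch.square(Vector(Vector(1.0, 2.0), Vector(3.0)))` all compile and preserve the shape of their argument, which is the behaviour NumPy users expect from functions like `np.sqrt`. A production library would additionally need specialization or similar tricks to avoid boxing, which this sketch ignores.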
He concluded with the thought that, given some effort, and if all these libraries could be made to play nicely together, the Scala ecosystem could eventually get to where NumPy and SciPy are today.