Alumni blogs

Lots of our people have lots of opinions. Here are just a few of them.

ThoughtWorks embraces the individuality of the people in the organization and hence the opinions expressed in the blogs may contradict each other and also may not represent the opinions of ThoughtWorks.

Be an Ambassador!

You know how I keep banging on about attracting different types of people into programming?  You know how we say we need to get them young?

Typical Mathematician?
A little while back I decided to put my money where my mouth was, so I signed up to be a STEM Ambassador.  STEMNET is an organisation that aims to inspire children to go into Science, Technology, Engineering and Maths careers, and it does this by creating a network of mentors (people like you and me) and schools.  So schools can publicise the sorts of events they're running, and find mentors…

Blog post by Trisha Gee
23 May 2013

Original Link

Netflix delivers Java tools

Netflix continues as a Java technology powerhouse, delivering one open source tool or framework after another. The latest posting from their excellent blog is Brian Moore on Garbage Collection Visualization, a tool for turning gc.log into usable graphs.

The JVM heap teaser shot:

Blog post by Brian Oxley
22 May 2013

Original Link

My Summary of GeeCON, Krakow

Last week I was in Krakow, Poland for GeeCON.  Which was excellent!  I find it really interesting that conferences all have their own personalities, that they are not all the same.

GeeCON had its own distinct personality.  If you're a Java/JVM person based in Poland, I would highly recommend it - more than 90% of the attendees were Polish (probably the remainder were largely speakers) so this conference is very much for you.  The quality of the speakers was really good too; I learnt a lot from many of them.

From a speaker's point of view, there were some…

Blog post by Trisha Gee
22 May 2013

Original Link

Clojure at a Bank – Testing

This post is a continuation of my earlier ‘Clojure at a Bank’ posts. I’ve since left the bank and am working for a large newspaper company, fortunately for me still writing Clojure.

It’s an obvious point to make, that different projects can have very different testing demands. At the bank we managed a throughput of financial products so it was critical that we got no surprises. Prod deployments were often like moon-landings, staged well in advance with lots of people in mission control.

At the newspaper it’s a bit different. Whilst bugs are still not to be warmly…

Blog post by Jon Pither
21 May 2013

Original Link

Sinking Data to Neo4j from Hadoop with Cascading

Recently, I worked with a colleague (Paul Lam, aka @Quantisan) on building a connector library to let Cascading interoperate with Neo4j: cascading.neo4j. Paul had been experimenting with Neo4j and Cypher to explore our data through graphs and we wanted an easy way to flow our existing data on Hadoop into Neo4j.

The data processing pipeline we’ve been growing at uSwitch.com is built around Cascalog, Hive, Hadoop and Kafka.

Once the data has been aggregated and stored, a lot of our ETL is performed with Cascalog and, by extension, Cascading. Querying/analysis is a mix of Cascalog and Hive.…

Blog post by Paul Ingles
20 May 2013

Original Link

Compressing CloudFront Assets and dfl8.co

Amazon’s web services have made rebuilding uSwitch.com so much easier. We’re gradually moving more and more static assets to CloudFront (although most visitors are in the UK, responses have much lower latencies than direct from S3 or even our own nginx servers). However, CloudFront doesn’t support serving gzip’ed content direct from S3 out of the box.

Because of this, up until last week we were serving uncompressed assets, at least anything that wasn’t already compressed (such as images). Last week we put together a simple static assets nginx server to help compress things.

Whilst doing the work for uSwitch.com…

Blog post by Paul Ingles
20 May 2013

Original Link

Evaluating classifier results with R part 2

In a previous article I showed how to visualise the results of a classifier using ggplot2 in R. In the same article I mentioned that Alex, a colleague at Forward, had suggested looking further at R’s caret package, which would produce more detailed statistics about the overall performance of the classifier and within individual classes.

Confusion Matrix

Using ggplot2 we can produce a plot like the one below: a visual representation of a confusion matrix. It gives us a nice overview but doesn’t reveal much about the specific performance characteristics of our classifier.
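The per-class figures that a confusion-matrix plot hides can be computed directly from the counts. The following is a minimal sketch in Python rather than R, and it is not the caret output the article goes on to discuss. The function names, labels and data are all invented for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def per_class_stats(actual, predicted):
    """Return precision and recall for every label seen in either list."""
    matrix = confusion_matrix(actual, predicted)
    labels = sorted(set(actual) | set(predicted))
    stats = {}
    for label in labels:
        true_positives = matrix[(label, label)]
        # Column total: everything the classifier predicted as this label.
        predicted_as = sum(matrix[(a, label)] for a in labels)
        # Row total: everything that really is this label.
        actually = sum(matrix[(label, p)] for p in labels)
        stats[label] = {
            "precision": true_positives / predicted_as if predicted_as else 0.0,
            "recall": true_positives / actually if actually else 0.0,
        }
    return stats


# Hypothetical labels and predictions, purely for illustration.
actual = ["gas", "gas", "gas", "broadband", "broadband", "mobile"]
predicted = ["gas", "gas", "broadband", "broadband", "mobile", "mobile"]

for label, values in per_class_stats(actual, predicted).items():
    print(label, values)
```

Precision here is the share of predictions for a class that were right, and recall is the share of real members of a class that were found. A class can score well on one and badly on the other, which a single overall accuracy figure would not show.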

Blog post by Paul Ingles
20 May 2013

Original Link

Visualising classifier results with R and ggplot2

Earlier in the year, some colleagues and I started working on building better data processing tools for uSwitch.com. Part of the theory/reflection of this is captured in a presentation I was privileged to give at EuroClojure (titled Users as Data).

In the last few days, our data team (Thibaut, Paul and I) have been playing around with some of the data we collect and using it to build some classifiers. Precision and Recall provide quantitative measures but reading through Machine Learning for Hackers showed some nice ways to visualise results.

Binary Classifier

Our…

Blog post by Paul Ingles
20 May 2013

Original Link

Protocol Buffers with Clojure and Leiningen

This week I’ve been prototyping some data processing tools that will work across the platforms we use (Ruby, Clojure, .NET). Having not tried Protocol Buffers before I thought I’d spike it out and see how it might fit.

Protocol Buffers

The Google page obviously has a lot more detail but for anyone who’s not seen them: you define your messages in an intermediate language before compiling into your target language.

There’s a Ruby library that makes it trivially easy to generate Ruby code so you can create messages as follows:

Clojure and Leiningen

Blog post by Paul Ingles
20 May 2013

Original Link

Social Enterprise Development

When I read the transcript of Linus Torvalds’s talk on Git at Google, about 4 years ago, I was working at an investment bank in London. It was just as I’d started using GitHub for hosting my own side-projects and for doing some open-source work. Fast forward to today and I’ve just read an article about the fast rise of GitHub as the software repository of choice for open-source development and an interesting space for Enterprise hosting.

All the banks I worked in were extremely centrally controlled: you’d use approved libraries and tools only. However, the…

Blog post by Paul Ingles
20 May 2013

Original Link