Alumni blogs

Lots of our people have lots of opinions. Here are just a few of them.

ThoughtWorks embraces the individuality of the people in the organization, so the opinions expressed in these blogs may contradict each other and may not represent the opinions of ThoughtWorks.

Introducing Blueshift: Automated Amazon Redshift ingestion from Amazon S3

I’m very pleased to say I’ve just made public a repository for a tool we’ve built to make it easier to ingest data automatically into Amazon Redshift from Amazon S3:

Amazon Redshift is a wonderfully powerful product; if you’ve not tried it yet, you should definitely take a look. I’ve written before about the value of the analytical flow it enables.

However, as nice as it is to consume data from, ingesting data is a little less fun:

  1. Forget about writing raw INSERT statements: we saw individual inserts take on the order of 5 or 6 seconds…

Blog post by Paul Ingles
28 November 2014

Original Link

Getting to the so-called MVP (Minimum Viable Product)

An MVP, in Portuguese Produto Mínimo Viável, can be seen as the version of a product or service that will be put to the test with the community regarded as its target audience.

An MVP does not need to be finished software. Dropbox has the classic story of pitching the product with nothing more than a sign-up page and a video showing how the service “works” (at the time, nothing existed).

What do you want from this process? Validation. Understanding what is being built and being able to validate it with potential users. Gaining learning so you can adjust and test with…

Blog post by Daniel Wildt
28 November 2014

Original Link

Docker/Neo4j: Port forwarding on Mac OS X not working

Prompted by Ognjen Bubalo’s excellent blog post I thought it was about time I tried running Neo4j in a Docker container on my MacBook Pro to make it easier to play around with different data sets.

I got the container up and running by following Ognjen’s instructions and had the following ports forwarded to my host machine:

$ docker ps
CONTAINER ID        IMAGE                 COMMAND                CREATED             STATUS              PORTS                                              NAMES
c62f8601e557        tpires/neo4j:latest   "/bin/bash -c /launc   About an hour ago   Up About an hour>1337/tcp,>7474/tcp   neo4j

This should allow me to access Neo4j on port 49154 but when I…

Blog post by Mark Needham
27 November 2014

Original Link

Debugging Cucumber tests on Codeship

I’ve been running into a few problems recently where my Cucumber tests fail in the build but not on my local machine. It’s often related to the tests running slightly faster than the JavaScript on the build machine, but it can be really hard to catch.

A couple of things have worked well. First of all, I’ve added the capybara-screenshot gem to my Cucumber tests – setting it up was as simple as adding it to the Gemfile and putting one line in the env.rb for my tests:

require 'capybara-screenshot/cucumber'

Every time a test fails, it captures an HTML output and…

Blog post by Joanne Cranford
27 November 2014

Original Link

Software Haiku, a tribute to utter nonsense

A long time ago in a galaxy far, far away... the below was actually recited by a real person:

I did some things,
Still the same thing,
Let me check a few things,
Before we conclude anything.

Blog post by Ozgur Tumer
27 November 2014

Original Link

R: dplyr – Select ‘random’ rows from a data frame

Frequently I find myself wanting to take a sample of the rows in a data frame where just taking the head isn’t enough.

Let’s say we start with the following data frame:

data = data.frame(
    letter = sample(LETTERS, 50000, replace = TRUE),
    number = sample(1:10, 50000, replace = TRUE)
)

And we’d like to sample 10 rows to see what the data frame contains. We’ll start by generating 10 random numbers to represent row numbers using the sample function:

> randomRows = sample(1:length(data[,1]), 10, replace=T)
> randomRows
 [1]  8723 18772  4964 36134 27467 31890 16313 12841 49214 15621

We can then…
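As a minimal sketch of how sampled row numbers like these might be used (not necessarily what the original post goes on to do), the snippet below indexes the data frame with them in base R and, as an alternative, lets dplyr's sample_n pick the rows directly. It assumes the dplyr package is installed; the set.seed call and the names sampledByIndex and sampledByDplyr are illustrative additions, not taken from the post.

```r
library(dplyr)

set.seed(42)  # illustrative: fixed seed so the sketch is repeatable

data = data.frame(
    letter = sample(LETTERS, 50000, replace = TRUE),
    number = sample(1:10, 50000, replace = TRUE)
)

# Base R: index the data frame with the sampled row numbers
randomRows = sample(nrow(data), 10, replace = TRUE)
sampledByIndex = data[randomRows, ]

# dplyr: sample_n draws the rows directly (without replacement by default)
sampledByDplyr = sample_n(data, 10)
```

Both results are data frames with 10 rows and the original letter and number columns; which rows come back depends on the seed.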

Blog post by Mark Needham
26 November 2014

Original Link

Amazon Redshift + R: Analytics Flow

Ok, so it’s a slightly fanboy-ish title but I’m starting to really like the early experimentation we’ve been doing with Amazon’s Redshift service at uSwitch.

Our current data platform is a mix of Apache Kafka, Apache Hadoop/Hive and a set of heterogeneous data sources mixed across the organisation (given we’re fans of letting the right store find its place).

The data we ingest is reasonably sizeable (gigabytes a day); certainly enough to trouble the physical machines uSwitch used to host with. However, for nearly the last 3 years we’ve been breaking uSwitch’s infrastructure and systems…

Blog post by Paul Ingles
25 November 2014

Original Link

Learning to throw things away

New Blog Post: Learning to throw things away

(Remember, my new blog is at, and I don't always remember to post things on this old one)

Blog post by Trisha Gee
23 November 2014

Original Link

R: dplyr – “Variables not shown”

I recently ran into a problem where the result of applying some operations to a data frame wasn’t being output the way I wanted.

I started with this data frame:

# Build numberOfWords random lowercase words, each lengthOfWord letters long
words = function(numberOfWords, lengthOfWord) {
  w = c(1:numberOfWords)
  for(i in 1:numberOfWords) {
    w[i] = paste(sample(letters, lengthOfWord, replace=TRUE), collapse = "")
  }
  w
}

numberOfRows = 100
# a and b hold 10 values each, which data.frame recycles to match the 100 names
df = data.frame(a = sample(1:numberOfRows, 10, replace = TRUE),
                b = sample(1:numberOfRows, 10, replace = TRUE),
                name = words(numberOfRows, 10))

I wanted to group the data frame by a and b and output a comma separated list of the associated names…
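The grouping described above could be sketched roughly as follows. This is an illustration under assumptions rather than the post's own code: it assumes dplyr is installed, uses a tiny hand-made data frame in place of the random one so the result is easy to check, and the output column name names is hypothetical.

```r
library(dplyr)

# Small hand-made stand-in for the post's randomly generated data frame
df = data.frame(a = c(1, 1, 2),
                b = c(5, 5, 7),
                name = c("foo", "bar", "baz"),
                stringsAsFactors = FALSE)

# Group by a and b, then collapse each group's names into one comma separated string
grouped = df %>%
  group_by(a, b) %>%
  summarise(names = paste(name, collapse = ","))

# grouped has two rows: (1, 5) with "foo,bar" and (2, 7) with "baz"
```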

Blog post by Mark Needham
23 November 2014

Original Link

The Reactive Manifesto

Over the past couple of months I have been helping out some friends to update the Reactive Manifesto.

There are several reasons why I agreed to help. First, I was asked to by my old friend Martin Thompson. The most important reason, though, is that I think this is an important idea.

The Reactive Manifesto starts from a simple thought. 21st Century problems are not well-served by 20th Century assumptions of software architecture. The game is moving on!

There are lots of reasons for this: The problems that we are asked to tackle are growing in scale,

Blog post by Dave Farley
21 November 2014

Original Link