“I went to a conference and heard, ‘We’re making a significant shift towards more fact-based decision making.’ And I was kind of like, what do people usually use? Horoscopes?”
There’s now a big drive to be more data-analytical.
The first thing to know about big data, according to Dave Coombes, a former ThoughtWorker and data scientist, is that it’s an over-hyped term.
“There are a few apocryphal stories like, ‘We had this huge amount of data and then we shuffled it and, you know, it told us the answer to life, the universe and everything,’ but they are few and far between,” he says. “Generally, it’s not a valid approach.”
Contrary to popular belief, big data doesn’t begin when you reach gigabytes or terabytes. It refers to any data analysis that is beyond the existing capabilities of a particular business. The problem of analytical demands outstripping capability is not new.
“We’ve always been at the point where we don’t quite have enough capacity,” Dave says. “We’re always overstretching.”
He has a theory: that the traditional data requirements of transactional applications tend to grow linearly, but post-Internet boom companies such as Google or Facebook base their business on network effects. Those effects scale quadratically, meaning such companies were always going to outgrow their data handling capabilities.
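To make the linear-versus-quadratic contrast concrete, here is a minimal sketch. It assumes, purely for illustration, that a transactional application stores a fixed number of records per customer and that network effects can be approximated by the number of possible pairwise connections between users. Neither assumption comes from Dave; the function names and the figure of 10 records per customer are hypothetical.

```python
def transactional_records(customers: int, records_per_customer: int = 10) -> int:
    """Linear growth: each customer adds a fixed number of records (assumed: 10)."""
    return customers * records_per_customer


def possible_connections(users: int) -> int:
    """Quadratic growth: every pair of users is a potential connection."""
    return users * (users - 1) // 2


# Compare how the two quantities grow as the user base grows.
for n in (10, 100, 1000):
    print(f"{n:>5} users: {transactional_records(n):>6} records, "
          f"{possible_connections(n):>7} possible connections")
```

Under these assumptions, doubling the user base doubles the transactional records but roughly quadruples the possible connections, which is the sense in which capacity planned for linear growth would be outpaced.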
Often, too much data is not the problem. Although some businesses need to do high-volume analysis, such as understanding Twitter streams or access logs, that analysis is not usually of their core business data. Most core business analysis is retrospective – describing things that happened last week, last month or in the first quarter of last year. But where things really get interesting is in predictive analysis – such as recommendation engines or unsupervised machine learning.
“My background is in theoretical physics and so I still value the scientific method where you posit some hypothesis and then run an experiment to look at your data and work out if what you thought was true is actually true,” Dave says.
But when you need to move your business quickly, you need to test and evaluate these hypotheses quickly and that doesn’t always leave room to build a traditional warehouse of historical data. Dave’s latest project takes the same lean approaches used in application development and applies them to the areas of analytics, reporting and business intelligence to tailor the advertisements web customers see.
A strategic approach would be to consider everything known about customers and coerce all the available data into a single form that meets that requirement. But designing a data strategy that works now and in the future is difficult. It’s easier to take the problem we want to solve and work backwards.
Dave asks, “What is the next most important thing that if the business knew the answer to, they’d be able to make an informed decision?”
When you start with a question, you can pull in the information you need rather than pushing everything into a model and analysing it.
There's more! Read the rest of this article in the June 2013 issue of ThoughtWorks' P2 magazine.