ThoughtWorks
Data Science & Engineering | Machine Learning & Artificial Intelligence | Technology

Put Data Science Before Data Infrastructure

David Johnston

Published: Oct 13, 2015

“Big Data” and “Data Science” are today’s business buzzwords. Many companies are trying to modernize their data platforms and enable their employees to monetize their valuable data, but most are not seeing the benefits. Advanced data science may be driving some of the hottest startups, but most mature companies are struggling to get into gear. The key reason is an overemphasis on architecture over ideas and a tendency to ignore the agile practices that have proven so successful in other areas of software development.

While there are many tenets of Agile software development, it can be described succinctly: Don’t try to plan it all out up-front and then do it. Plan it lightly and adapt that plan while you do it. As my colleague Ken Collier argues in his book Agile Analytics, data infrastructure seems to have survived the big-upfront-investment extinction that happened in the rest of the software industry, and that has hamstrung Business Intelligence in the past and is now doing the same for Data Science.

Overcoming Data-Organization Paralysis

As a data science consultant working for many large corporations, I see a recurring pattern of failure at data science, and it’s not surprising that it happens mostly at large, mature companies. The starting problem is that their legacy data platform, put in place years ago, is not organized for effective data science. It is organized to enable efficient running of the business. While most companies recognize this problem and want to transform themselves to enable a more data-driven culture, they make the mistake of delaying data science until the data organization task is completed. This is the classic “waterfall” mentality that leads them into a kind of paralysis.

So often we hear: "We can’t do advanced data science yet. Our data is too disorganized."
Big Data product vendors are lined up at the door to take advantage of this mentality. But in many cases they sell a solution that is then imposed in a top-down fashion from executives to data scientists in a way that actually hinders progress.

Don’t Delay Data Science

I’m here to tell you that you can, and in fact you must, start doing data science before putting Big Data infrastructure in place. The biggest reason is that it’s only through solving data science problems that you learn how your data should be structured, or even whether it should be structured. It’s only when you run into problems of data unavailability or scaling that you really discover what type of solution is needed to solve them. And different problems will call for different solutions. You need to have solved enough problems with your current infrastructure to see the pattern emerge for what the new one should look like. These insights need to come in a bottom-up fashion from data science practitioners, not from executives or even data architects, and least of all from product vendor salespeople.

One myth about data scientists is that they need to have all the data available to get a global picture of the business, attain insights and suggest actions. In fact, data scientists, like everyone else, can only look at so much data before information overload sets in.
The key skill of a data scientist is in actually deciding what data NOT to look at.
They need to be able to see the emerging picture amidst incomplete information. Furthermore, successful data science applications are nearly always built from a relatively small fraction of the available data. More data fields or more data volume should be brought in to improve already successful models in an iterative fashion, which is standard agile practice.

Scaling is an Overrated Problem

Hadoop, NoSQL databases and other Big Data technologies have helped some prominent companies build data science applications at full-scale. However, all of those successful companies share one thing in common: they were already successful doing data science at a smaller scale. Perhaps they computed things in batch, rather than in real-time. Perhaps they ran algorithms on a subsample of data or utilized far fewer data fields. But they solved their problem in a simpler way before trying to solve it in a better or faster way.
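
As one illustration of running on a subsample first, here is a minimal Python sketch. It is not taken from any of the companies mentioned: the function names, the synthetic order values and the sample size are all assumptions made for the example. It draws a uniform random sample in a single pass over a data source that is too large to hold in memory (reservoir sampling, often called Algorithm R) and then computes a simple statistic on the sample.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return a uniform random sample of up to k records in a single pass."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = record
    return sample


def synthetic_orders(n: int) -> Iterator[float]:
    """Hypothetical stand-in for a large table: n order values, streamed."""
    rng = random.Random(42)
    for _ in range(n):
        yield rng.uniform(5.0, 500.0)


# Estimate the average order value from 10,000 of 200,000 streamed records.
sample = reservoir_sample(synthetic_orders(200_000), k=10_000)
estimated_mean = sum(sample) / len(sample)
```

A prototype model or report built on such a sample runs on a laptop with the existing infrastructure. If it proves useful, that is the point at which its data and scaling needs become concrete enough to guide a platform investment.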

Scaling is never what prevents a data science team from arriving at their first successful models. Investing in scalable platforms is not what it takes to get started.

Invest in Data Science First

Investing in data science talent before data infrastructure is the key to becoming a data-driven company. Kapow Software, in a research paper on Big Data, concludes:

“Big Data projects are taking far too long, costing too much and not delivering on anticipated ROI because it’s really difficult to pinpoint and surgically extract critical insights without hiring expensive consultants or data scientists in short demand.”

While data scientists may be difficult to hire, it’s an investment you must make. The best-structured data and the most advanced data science tools are simply not effective in the hands of people without the required background. Being ready for data scientists is not a question of data organization. They can help you achieve that organization. Rather, it’s a commitment to removing the barriers of business-as-usual and allowing your data and your data professionals to truly influence your business strategy.