Getting Smart: Applying Continuous Delivery to Data Science to Drive Car Sales

Arif Wider
Christian Deger

Published: Mar 1, 2017

Pricing second-hand cars is a complex procedure: there are many factors that affect a vehicle’s worth and customers’ tastes change quickly.

AutoScout24, Europe’s largest online car marketplace, wanted to get ahead of the field by developing an accurate price evaluation tool that updated continuously. Many companies use these kinds of predictive analytics capabilities internally but shy away from using them in customer-facing services because of the complexity.

Working together, AutoScout24 and ThoughtWorks took a Continuous Delivery approach to predictive analytics and built a constantly updated price evaluation tool with excellent performance and scalability. We implemented automated verification against live test data sets in a continuous delivery pipeline, which lets us release model improvements with confidence at any time.

Keeping price evaluations up to speed

AutoScout24 has listings for more than 2.4 million vehicles across Europe, which gives it a huge trove of current and historical data. But how could it use that data to help sellers set a fair price and help buyers make good decisions?

It had previously developed a price evaluation tool that based its price recommendations on currently active listings. This pricing engine used a machine learning approach that modeled the vehicle price as a largely linear function of factors such as the vehicle’s age or mileage.
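
The article does not say exactly how that earlier engine was fitted. Purely as an illustration of what a linear relationship between price and one factor means, the sketch below fits an ordinary least-squares line of price against vehicle age; the function names and the numbers are hypothetical.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(slope, intercept, age):
    return intercept + slope * age


# Hypothetical listings: vehicle age in years, asking price in euros.
ages = [1, 2, 3, 4]
prices = [18000, 16000, 14000, 12000]
slope, intercept = fit_linear(ages, prices)
print(slope, intercept)                    # -2000.0 20000.0
print(predict_price(slope, intercept, 5))  # 10000.0
```

With a linear model, every extra year lowers the predicted price by the same amount (here 2,000 euros), whatever the car.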

But this tool couldn’t take real-world prices into account. AutoScout24 needed a pricing tool that could make accurate price predictions based on constantly changing information.

Data science and continuous delivery

Before the price evaluation program, AutoScout24 had used predictive analytics primarily for internal decision-making, answering questions based on historical data.

But now, for the price evaluation tool, we needed a prediction model that would be continuously integrated into live operations. This posed a significant challenge for our data science team: we needed to ensure that the system could meet its performance requirements without manual performance tuning and without sacrificing prediction accuracy.

To achieve that, we realized there was an opportunity to take a Continuous Delivery approach to predictive analytics. Practices such as Continuous Delivery, Test-Driven Development and Consumer-Driven Contracts are increasingly common in software engineering, but they’re almost unheard of in data science.

Accelerating delivery through service teams

We had the opportunity to try something new because AutoScout24 had begun a large-scale migration of its technical infrastructure from its previous self-hosted, .NET-based monolithic system to a cloud-hosted, JVM-based microservices architecture. The aim of this tech stack migration was to enable innovations to be released more quickly.

The first stage of this transition was to divide the monolith into verticals, each covering specific functions and each managed by an autonomous development team. These verticals are known as self-contained systems.

The price evaluation tool was implemented as a single microservice running on Amazon Web Services (AWS) and built with the Play Framework, to give the greatest flexibility.

[German homepage of the new price evaluation tool, December 2016]


Using Random Forest to make better predictions

To improve the quality of its price evaluations, AutoScout24’s data science team evaluated various machine learning approaches. Given the challenges they faced, they decided that a Random Forest approach would work best.

Random Forest is a supervised machine learning approach that combines many decision trees, which effectively counteracts the overfitting that single decision trees are prone to. One big benefit is that it reduces the risk of a prediction model that only produces good results for inputs very similar to its training data.


[An example price evaluation decision tree]
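
Production implementations such as those in R and H2O grow much deeper trees and sample features at each split. The sketch below is a deliberately simplified illustration of the two ideas that curb overfitting: each tree is trained on a bootstrap sample of the data, and the forest averages the trees’ predictions. It uses depth-one trees and one random feature per tree; all names and data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature):
    """Fit a depth-one regression tree: one threshold on one feature, two leaves."""
    best = None
    values = sorted({row[feature] for row in rows})
    # Try each distinct value except the largest as a split, so both leaves are non-empty.
    for threshold in values[:-1]:
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = sum((t - left_mean) ** 2 for t in left) + sum((t - right_mean) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # Every sampled row has the same value for this feature: predict a constant.
        constant = mean(targets)
        return lambda row: constant
    _, threshold, left_mean, right_mean = best
    return lambda row: left_mean if row[feature] <= threshold else right_mean


def fit_forest(rows, targets, n_trees=25, seed=0):
    """Bagging plus a random feature per tree; the forest averages its trees."""
    rng = random.Random(seed)
    features = sorted(rows[0])
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw as many rows as we have, with replacement.
        picks = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in picks]
        sample_targets = [targets[i] for i in picks]
        # Each tree only sees one randomly chosen feature.
        feature = rng.choice(features)
        trees.append(fit_stump(sample_rows, sample_targets, feature))
    return lambda row: mean(tree(row) for tree in trees)


# Hypothetical training data: age in years, mileage in km, price in euros.
rows = [{"age": a, "mileage": 15000 * a} for a in range(1, 11)]
prices = [20000 - 1500 * a for a in range(1, 11)]
forest = fit_forest(rows, prices)
young = forest({"age": 1, "mileage": 15000})
old = forest({"age": 10, "mileage": 150000})
print(young > old)  # True
```

Because each tree sees a different sample, no single tree’s quirks dominate the averaged prediction.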

We used the statistical programming language R, which has random forest libraries available, to develop an initial model training script. This script processes and cleans the raw vehicle data from recent years and then generates a price prediction model.

From a price prediction model to a car evaluation product

This initial price prediction model only needed to provide accurate predictions. The final product would also need to deliver exceptional performance: it had to be responsive, highly available and able to support a high volume of users. What’s more, the predictions had to reflect the current market situation, so we needed to be able to integrate model improvements and new training data rapidly.

That meant a price model trained by our R script had to be transferred to the production system automatically, without any change in its behavior.

One obvious solution was to serve the price model directly from an R runtime, wrapped in a service that a Play Framework front-end application would call via a REST API. Unfortunately, the open source R runtime is single-threaded, which makes it virtually impossible to scale to many parallel user requests.

We therefore decided to use H2O, an open source, Java-based predictive analytics engine that integrates easily with Apache Hadoop and Apache Spark. It also connects to other programming languages popular in the big data field, such as Python and R.

Because H2O provides its own Random Forest implementation, training a random forest price prediction model with H2O was straightforward. The resulting model can then be executed in a cluster on the H2O engine and accessed via an API.

H2O also offers the option of exporting a fully trained prediction model as Java source code. For our random forest price models, however, the compiled JAR files are very large: several gigabytes per country.

This is because the exported decision trees encode both the model’s logic and its data. The model’s accuracy and its size are linked, because both are largely determined by the configured maximum depth of the decision trees.
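
A short counting argument shows why tree depth dominates the size of the exported code. Assume, for illustration, a forest of $T$ binary trees, each limited to a maximum depth $d$ with the root at depth 0; these symbols are ours, not H2O’s parameter names.

```latex
\text{leaves per tree} \le 2^{d}, \qquad
\text{nodes per tree} \le \sum_{k=0}^{d} 2^{k} = 2^{d+1} - 1, \qquad
\text{nodes in the forest} \le T\,\bigl(2^{d+1} - 1\bigr)
```

Each additional level can roughly double the number of branches the generated code must contain, while also allowing the trees to make finer distinctions between vehicles, which is why accuracy and size tend to grow together.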

Millisecond response times need the right approach

Overall, exporting the trained prediction models to Java source code had one key advantage: the compiled price model JAR could run together with the Play web application (itself deployable as a JAR) on the same Amazon EC2 machine, in a single JVM. This significantly reduced maintenance complexity, because only the memory usage of that one JVM had to be configured and monitored.

Furthermore, we could make full use of the scaling mechanisms already provided by both the JVM (thread pools, concurrency) and AWS. The Play Framework builds heavily on Java’s non-blocking I/O support. With Elastic Load Balancers (ELBs) and Auto Scaling groups (ASGs), AWS can automatically launch additional EC2 machines running the same web application as load increases, and distribute the load across them.


[How a price prediction model is trained, exported, and deployed together with the web application] 

This let us deliver price predictions within a few milliseconds, because the Java code generated from the prediction model consists almost entirely of very large if-else statements. As a result, no objects have to be created during the calculation, and heap usage stays consistently low.
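
To illustrate what such generated code looks like, the following Python sketch turns a small decision tree into source code made only of nested if-else statements and returned constants, then checks it against a direct walk of the tree. It is an analogy to H2O’s Java export, not a reproduction of it; the tree format, function names and values are hypothetical.

```python
def emit(node, depth=1):
    """Return source lines for one tree node as nested if-else statements."""
    pad = "    " * depth
    if "value" in node:
        # Leaf: the prediction is a literal constant baked into the code.
        return [f"{pad}return {node['value']!r}"]
    lines = [f"{pad}if row[{node['feature']!r}] <= {node['threshold']!r}:"]
    lines += emit(node["left"], depth + 1)
    lines.append(f"{pad}else:")
    lines += emit(node["right"], depth + 1)
    return lines


def generate_source(tree, name="predict"):
    return "\n".join([f"def {name}(row):"] + emit(tree))


def walk(node, row):
    """Reference implementation: interpret the tree data structure directly."""
    while "value" not in node:
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["value"]


# Hypothetical tree: split on age, then on mileage.
tree = {
    "feature": "age", "threshold": 4,
    "left": {
        "feature": "mileage", "threshold": 60000,
        "left": {"value": 18500.0},
        "right": {"value": 15200.0},
    },
    "right": {"value": 9900.0},
}

source = generate_source(tree)
print(source)
namespace = {}
exec(source, namespace)  # compile the generated code, as a build step would
predict = namespace["predict"]
car = {"age": 2, "mileage": 30000}
print(predict(car), walk(tree, car))  # 18500.0 18500.0
```

The generated function only compares inputs and returns constants; all of the model’s "data" lives in the code itself, which is also why a forest of deep trees produces such large artifacts. Using `exec` is acceptable here only because the source is generated by our own code.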

On the other hand, loading the unusually large number of big classes requires a lot of memory. But once they are fully loaded, memory usage barely changes during operation. That further reduces maintenance and monitoring complexity and simplifies the handling of load changes.

This approach let us quickly release a simple first version of the price evaluation product for one launch country and one user segment.

To build on that, we separated the service that implements the web front-end from the prediction model service, which we deployed as an independent web service with a REST interface.

The main reason for this split was that the web interface and the prediction model evolved at different speeds: the model was only updated occasionally, while improvements to the web interface were rolled out several times a day. Keeping the web interface and the price model in one service resulted in unnecessarily long deployment times.

The separation also allowed us to run one price model service per country and user segment, partitioning the prediction models across several machines. And because AWS autoscaling replicates each of these services together with its model, we could do without a clustered database system, even though all the price models combined exceeded the storage capacity of a single machine.
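
With one model service per country and user segment, the caller only has to route each request to the right partition. A hypothetical sketch of that lookup; the segment names, host names and the `/prediction` path are invented for illustration and are not AutoScout24’s actual endpoints.

```python
from urllib.parse import urlencode

# Hypothetical registry: one price model service per (country, user segment).
MODEL_SERVICES = {
    ("DE", "private"): "http://price-model-de-private.internal",
    ("DE", "dealer"): "http://price-model-de-dealer.internal",
    ("IT", "private"): "http://price-model-it-private.internal",
}


def prediction_url(country, segment, features):
    """Pick the model service for this partition and build its REST query URL."""
    try:
        base = MODEL_SERVICES[(country, segment)]
    except KeyError:
        raise LookupError(f"no price model deployed for {country}/{segment}") from None
    # Sort parameters so identical requests always produce identical URLs.
    return f"{base}/prediction?{urlencode(sorted(features.items()))}"


print(prediction_url("DE", "private", {"mileage": 30000, "age": 2}))
# http://price-model-de-private.internal/prediction?age=2&mileage=30000
```

Because no partition holds more than one country’s model for one segment, each service only needs enough memory for its own model, and the services can scale independently.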

The price for Continuous Delivery: extensive end-to-end testing

Initially, the interface of our prediction model frequently changed without warning, for instance when model parameters were renamed. That meant the model couldn’t be transferred to production automatically.

Together with our data science team, we therefore developed test suites for the R model generation script, as well as Consumer-Driven Contracts (CDCs) that automatically verify, before every deployment, that the price model behaves as the price evaluation service expects.
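
A consumer-driven contract records what the consumer, here the price evaluation service, relies on, and the provider’s pipeline checks it before deploying. The sketch below is a minimal, framework-free illustration under assumed details: the input names `age`, `mileage` and `make`, the output field `price`, and the class and function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceModelContract:
    """What the price evaluation service (the consumer) relies on."""
    required_inputs: frozenset
    output_field: str


def verify_contract(contract, model_inputs, sample_response):
    """Return contract violations; an empty list means the model may be deployed."""
    violations = []
    missing = contract.required_inputs - set(model_inputs)
    if missing:
        violations.append(f"model no longer accepts inputs: {sorted(missing)}")
    value = sample_response.get(contract.output_field)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        violations.append(f"response lacks a numeric {contract.output_field!r} field")
    return violations


# Hypothetical contract and two candidate model interfaces.
contract = PriceModelContract(frozenset({"age", "mileage", "make"}), "price")
print(verify_contract(contract, ["age", "mileage", "make", "fuel"], {"price": 12345.0}))
# []
print(verify_contract(contract, ["vehicle_age", "mileage", "make"], {"price": 12345.0}))
# ["model no longer accepts inputs: ['age']"]
```

Renaming a parameter, the failure mode that previously blocked automatic transfer to production, now shows up as a violation and can stop the deployment before it reaches users. Extra inputs such as `fuel` are allowed, since the consumer does not depend on them.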

We also introduced extensive end-to-end tests, which ensure that the web application returns the same prices as the originally generated price model. To do this, we first query the H2O-generated price model with a large number of test price evaluations. The input data for these evaluations comes from a test data set that was not used to train the price model.

The results of these test price evaluations serve two purposes. First, the model quality can be determined by comparing the predicted prices with the actual prices in the test data set. Second, the results can be compared with those obtained from the price model that was converted to Java bytecode. Together, these steps allow us to release model improvements to production in a fully automated way, so that our users benefit from them immediately.

[Test and validation steps during model deployment]
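
A minimal sketch of such an end-to-end check, assuming both model variants can be called as plain functions and using mean absolute error as the quality measure (the article does not name a metric). The function names, tolerance and data are hypothetical.

```python
import math


def end_to_end_check(holdout, reference_model, exported_model, abs_tol=1e-6):
    """Run held-out evaluations through both model variants.

    holdout: list of (features, actual_price) pairs not used for training.
    Returns the reference model's mean absolute error against actual prices
    and a list of cases where the exported model disagrees with the reference.
    """
    abs_errors = []
    mismatches = []
    for features, actual_price in holdout:
        expected = reference_model(features)  # e.g. the model as trained in H2O
        exported = exported_model(features)   # e.g. the model compiled to bytecode
        # Purpose 1: model quality, measured against actual prices.
        abs_errors.append(abs(expected - actual_price))
        # Purpose 2: the export must not change the model's behavior.
        if not math.isclose(expected, exported, rel_tol=0.0, abs_tol=abs_tol):
            mismatches.append((features, expected, exported))
    mae = sum(abs_errors) / len(abs_errors)
    return mae, mismatches


# Hypothetical stand-ins for the two model variants and a tiny held-out set.
def reference(features):
    return 20000 - 2000 * features["age"]


def exported(features):
    return 20000.0 - 2000.0 * features["age"]


holdout = [({"age": 1}, 17000), ({"age": 2}, 16500)]
print(end_to_end_check(holdout, reference, exported))  # (750.0, [])
```

A non-empty mismatch list means the conversion changed the model’s behavior and the release should stop; the error value can feed a separate accuracy threshold.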

Conclusions

Using Continuous Delivery practices such as automated end-to-end testing during deployment allowed us to automate the release of model improvements to production. As a result, our data scientists don’t need to wait for their improvements to be integrated into live operations, and users benefit from those improvements immediately.
 

Applying Continuous Delivery to data science accelerates its impact on your business.


In addition to ongoing improvements, the prediction model needs to be retrained at least once a month to reflect the market accurately. To encourage experimentation and improve the model further, we found it helped to validate prediction accuracy automatically before deployment.
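
One way to automate that validation is a pipeline gate that compares a retrained model with the one currently in production on the same held-out data. The sketch assumes mean absolute error as the metric and a hypothetical 2% tolerance for regressions; neither is specified in the article, and all names are illustrative.

```python
def mean_absolute_error(model, holdout):
    """Average absolute gap between predicted and actual prices."""
    return sum(abs(model(features) - actual) for features, actual in holdout) / len(holdout)


def accuracy_gate(candidate, production, holdout, max_relative_regression=0.02):
    """Approve a retrained model unless it is noticeably less accurate than production.

    Returns (approved, candidate_mae, production_mae).
    """
    candidate_mae = mean_absolute_error(candidate, holdout)
    production_mae = mean_absolute_error(production, holdout)
    approved = candidate_mae <= production_mae * (1 + max_relative_regression)
    return approved, candidate_mae, production_mae


# Hypothetical models: the current production model and a monthly retrain.
def production(features):
    return 20000 - 2000 * features["age"]


def retrained(features):
    return 20000 - 1800 * features["age"]


holdout = [({"age": 1}, 18100), ({"age": 2}, 16300)]
print(accuracy_gate(retrained, production, holdout))  # (True, 100.0, 200.0)
```

Comparing against the production model rather than a fixed threshold means the bar rises as the model improves, while a small tolerance leaves room for experiments that trade a little accuracy for other gains.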

We have since applied Continuous Delivery principles to other data science products. In some of those products, the prediction model grows stale quickly, for instance when it is trained on current user behavior data. There, applying Continuous Delivery to data science has delivered even better results.

© 2021 ThoughtWorks, Inc.