Brief summary
Join Thoughtworks’ global head of technology Mike Mason and his guests Christoph Windheuser, Thoughtworks’ global lead in artificial intelligence, and Sheroy Marker, head of technology at Thoughtworks Products, as they explore how to choose the right data and AI model to create complex models that are capable of continuously learning.
Podcast Transcript
Mike Mason:
My name is Mike Mason from Thoughtworks. I'm here with my two colleagues, Christoph Windheuser and Sheroy Marker and we're going to talk about continuous intelligence today. So Christoph, why don't you give us a little introduction of yourself and roughly what you do for Thoughtworks.
Christoph Windheuser:
Hello everybody. My name is Christoph Windheuser. I'm based in Germany, but right now I'm actually here in San Francisco in the Thoughtworks office. At Thoughtworks, I'm the global SME for the Intelligent Empowerment offering, so I'm looking after this offering and rolling it out to the different Thoughtworks countries. I'm supporting our sales teams, our CPs, working with them on different pursuits, and I'm also pushing this topic with our capability teams so that we build up these offerings and make it one of the really important offerings for our clients.
Mike Mason:
And Sheroy.
Sheroy Marker:
Hi. I'm Sheroy Marker. I currently work as the Head of Technology for Thoughtworks Products, and one of the products we build is GoCD, which is our continuous delivery server. I'm here to talk about some CD-related aspects of machine learning.
Mike Mason:
Yeah. Today's topic is broadly continuous intelligence. I think everybody realizes the importance of data in today's world: the increasing emphasis on the things that organizations can do with data, the importance of being data-driven as opposed to just the HiPPO model, the highest paid person's opinion, for decision making. More and more we're seeing this drive towards data being a decision-making tool.
Mike Mason:
But more than that, we're also seeing machine learning and other advanced techniques becoming much more powerful tools for building interesting software. Machine learning lets you build things that can better support someone's experience, better anticipate a consumer's needs and provide things to them in a more frictionless manner. With the rise of data and the rise of machine learning, what is the problem that we run into? Christoph.
Christoph Windheuser:
That is actually a very good point. The data today is there, we see it at our clients, they have tons of data. It usually sits in silos or in ERP systems or traditional databases, so it is not easily accessible for data scientists and machine learning; that is one of the problems. After doing some simple proofs of concept they approach us and say, "We want to scale this. We want to do it in a broader way. We want to use it in the whole company and not just in little proofs of concept." And then they suddenly need a data infrastructure, something like a data lake, maybe in a cloud environment or on premise, that the data scientists and the machine learning programs can really work on. That's the first issue.
Christoph Windheuser:
The other problem, and that is something we would like to talk about today in this podcast, is that with tutorials it is kind of easy to train TensorFlow on a problem, on a given data set. Maybe the client even has his own data and trains a decision, for example: this is fraud, this is not fraud. It seems to work, you get some good results in the learning curve. But what then? Because when you train on some data, the data is already outdated; data is continuously changing in our environment. Think of a recommender for a retailer: you can train it on what the clients are buying on the platform, so you can recommend things to other clients who have a similar buying behavior. That works fine, but it is changing all the time. You have different weather, different seasons, different [inaudible 00:03:53] and so on. So we have to continuously retrain this recommender.
Mike Mason:
Even the products being sold would be dynamic, right?
Christoph Windheuser:
Absolutely.
Mike Mason:
New products would be coming in and you wouldn't have any recommendation behavior for them if the model had not been trained on that...
Christoph Windheuser:
Exactly. So what the retailer has to do is retrain. The data is changing, the behavior is changing, so he has to retrain and get new parameters, new weights. But doing this by hand, changing the weights in a production platform and doing the testing, is very cumbersome and very error prone. Thoughtworks actually has very good experience in doing things in a continuous way, what we call continuous delivery and continuous integration, and we have GoCD, for example, as a great tool to do that. That is something we are bringing together, showing the customer how to do this retraining continuously, more or less automatically, with pipelines, with GoCD and so on, in excellent quality.
Sheroy Marker:
So we are talking about continuous intelligence in the context of data engineering pipelines. Typically you would develop machine learning models for inference purposes, train them initially with a training data set and then put them out in production. But the data that you used to train these models is changing rapidly. So how do you improve these machine learning models on a continuous basis? Can we borrow some of the CD concepts that we're used to for making improvements or rolling out changes to software frequently? Are the same concepts still applicable in this context? So Christoph, traditionally continuous delivery was all about making sure that any changes you made to software were tested and then rolled out in a sustainable sort of manner. Do those principles still apply to these machine learning models, or do you also now include things like adding new data to training sets before you actually make any changes to your models on a continuous basis?
Christoph Windheuser:
The principles are the same. Even if you add new data to the training set or you change the training set, that is part of the continuous change. You have new products, as we said for the recommender, so your training set is changing. But the principle of continuous delivery and continuous integration is that you are able to deliver into the production system at any time, in high quality. And that is the same thing we would like to do for machine learning. So not just, "Okay, oh shit, I've trained, now I have to copy the weights somewhere else. I change, I don't know, from Python to Java by hand, somebody reprograms it, it takes me weeks, I have to test again, and then I hit the button and make it productive." No, it should be, again, an automatic process. I should be able at any time to take the stuff I have retrained in the test and development system and put it into the production system.
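(A minimal sketch of what such an automated retraining step might look like, assuming a scikit-learn classifier; the CSV file, feature columns and is_fraud label are hypothetical placeholders.)

```python
# Sketch of an automated retraining step that a pipeline could run on a schedule.
# The file name, feature columns and "is_fraud" label are hypothetical placeholders.
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def retrain(data_path: str = "latest_transactions.csv") -> None:
    data = pd.read_csv(data_path)
    X = data.drop(columns=["is_fraud"])
    y = data["is_fraud"]

    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X, y)

    # Serialize the retrained model as an artifact for the next pipeline stage.
    joblib.dump(model, "model_candidate.joblib")


if __name__ == "__main__":
    retrain()
```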
Sheroy Marker:
So which aspects of this are on the critical path to production for these recommender systems, or any sort of machine learning algorithms mostly used for inference purposes or data engineering? Would you say all of this is on the critical path to production? And is there any difference between training a model per se and making changes to a model? Are both of those kinds of changes to be considered synonymous, or similar?
Christoph Windheuser:
Changing the training set of course is easier because you're not changing the model. If you change the model, that has a bigger impact.
Mike Mason:
Just for clarity, by changing the model you mean something like moving from, I don't know, I'm not an expert here, but say from linear regression to a random forest, to a neural network of some sort. That would be a model change?
Christoph Windheuser:
Yeah.
Mike Mason:
Whereas a training data change would be the data that you feed into that model in order to set it up ready for production.
Christoph Windheuser:
Yeah. And there are a lot of different levels of what you can change. You can completely change the learning model, as you said. You could also change the architecture of the model: if you have a neural network, a backpropagation network, you could add additional layers, and then you have a different architecture. You could put in convolutional networks or recurrent networks, and you'd change the architecture. You have hyperparameters, like learning rates, which you can change; with hyperparameters you also change the model in some way. Or you just change the trained parameters which came out of the training. So there are different levels of possible changes.
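(To make those levels concrete, here is a small illustrative sketch using scikit-learn; it is not tied to any particular client system, and the class and parameter choices are just examples.)

```python
# Illustrating the levels of change described above, using scikit-learn as an example.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# 1. Changing the learning model entirely: linear model -> random forest -> neural network.
model = LogisticRegression()
model = RandomForestClassifier()
model = MLPClassifier(hidden_layer_sizes=(64,))

# 2. Changing the architecture: add another layer to the neural network.
model = MLPClassifier(hidden_layer_sizes=(64, 32))

# 3. Changing a hyperparameter: a different learning rate for the same architecture.
model = MLPClassifier(hidden_layer_sizes=(64, 32), learning_rate_init=0.0005)

# 4. Changing only the trained parameters (the weights): retrain on new data.
#    X_new and y_new are placeholders for the latest training set.
# model.fit(X_new, y_new)
```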
Sheroy Marker:
So that's interesting. There are various things to consider in terms of what changes a model can undergo. When you talk about changing, say, the learning rate, that seems to be a small incremental change you might want to apply to the model and then test multiple times in an iterative manner to see if the model is actually improving or getting worse. So do you think we now need a specialized set of tooling to enable you to do these sorts of variations on a model and see how it behaves under certain stresses? Is the current set of tooling sufficient for these sorts of purposes, or do you think we need more specialized tooling?
Christoph Windheuser:
That is an excellent question that I would really play back to you, because you are developing these tools like GoCD, and we are in the process of using GoCD in the machine learning environment. We are trying that out and setting up architectures to see how far we can really use, for example, GoCD to do that, and what additional features we would need. For example, one point is really important, and I'm sure GoCD is able to do that: safeguarding the parameters, all of the parameters, so making the training experiments repeatable. We worked with a client who didn't do that. They had their machine learning experts working on their laptops, optimizing the models, optimizing the parameters, everything. Then they said everything was ready and they moved the stuff to production. But when this guy was ill or had an accident, the company couldn't repeat these experiments, because everything was on his laptop or in his brain, and this is something that should not happen.
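(A minimal sketch of the kind of record keeping that avoids the "everything on one laptop" problem: persist the hyperparameters and a fingerprint of the training data alongside the trained model so anyone can repeat the experiment. The synthetic data and file names are placeholders.)

```python
# Persist everything needed to repeat a training run: hyperparameters, data fingerprint, model.
import hashlib
import json

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

params = {"n_estimators": 100, "learning_rate": 0.1, "random_state": 42}
X, y = make_classification(n_samples=1000, random_state=0)  # stand-in for the real training set

model = GradientBoostingClassifier(**params)
model.fit(X, y)

experiment = {
    "hyperparameters": params,
    "training_data_sha256": hashlib.sha256(X.tobytes()).hexdigest(),
}
with open("experiment.json", "w") as f:
    json.dump(experiment, f, indent=2)
joblib.dump(model, "model.joblib")
# Version experiment.json, the data and the model artifact so the run is reproducible by anyone.
```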
Sheroy Marker:
That's an interesting point, because that's also a very core tenet of CD per se. When we look at a CD pipeline and generate artifacts upstream in our pipeline, it's very important that we get the same artifact through the pipeline in a repeatable manner. So if you trigger this pipeline multiple times, it's triggered with exactly the same artifacts, and artifacts don't vary anywhere from the time they were created to the final stages in the path to production. Similar concepts could certainly be applied to the parameters that go into training a machine learning model. I do think there are some opportunities for specialized tooling when it comes to the ways in which machine learning models are tested and retested before they're certified to be good, because that's a pattern that's very specific to machine learning.
Mike Mason:
But I think some of that also comes down to the nature of the data platform that you might have in house for managing this stuff. I read about, I think it's the Uber Michelangelo platform, and the interesting thing about that is that it's a tool at the Uber scale of data, which is obviously a lot of bits and bytes, but the platform is managing datasets in a kind of self-service way for your data scientists to be able to get the right data set, run models against it and all of that kind of thing, but then also to be able to take the output from those models and move it into production.
Mike Mason:
So I guess the question would be, what is the Michelangelo platform for the rest of us who are not at Uber, and is there something in between the magical laptop for the data scientist and doing every single model tweak in your CD pipeline? I think that also would not be a recommended solution, because that's not really what you do. You still do some experimental work, but the intent would be to do it in an environment that tracks the experiments you're doing so you can reproduce them.
Christoph Windheuser:
Yeah, that is actually the problem. You have different environments. You need an environment for your data scientists; they feel at home on their laptops with their Python notebooks, for example. They love R or Python, but these languages are not really well suited for high-performance, highly scalable production environments. We are not using Michelangelo, that's proprietary to Uber, but what we have used in several of our projects is the H2O environment, you might know it. It has excellent learning models included, so random forests, also neural networks, and what is really nice is that it can do an automatic migration or translation from Python to Java. That was something we used at AutoScout, which is a client in Europe, a used-car internet platform, and we worked on the client's price estimation engine, the machine learning algorithm to estimate prices for cars.
Christoph Windheuser:
The data scientists did this in the H2O environment. Then we translated that automatically into Java, a J-A-R, a JAR file. And then we used the pipelines in GoCD to test it, to test the results of our version: put a test data set into the algorithm, see the result, make the transition to Java, do the same test again and then compare the results. Only if the results are exactly the same do we know the translation was correct and the parameters have been correctly moved over to the production system. Then we could give it the green light and put it automatically into the production system.
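(The comparison step can be as simple as running the same test set through both versions and failing the pipeline if the outputs differ. A sketch, where the two prediction files are hypothetical outputs of the Python model and the exported Java model from earlier pipeline stages:)

```python
# Compare predictions from the original model and its Java translation on the same test set.
# Both CSV files are hypothetical outputs produced by earlier pipeline stages.
import sys

import numpy as np
import pandas as pd

python_preds = pd.read_csv("predictions_python.csv")["price"].to_numpy()
java_preds = pd.read_csv("predictions_java.csv")["price"].to_numpy()

if np.allclose(python_preds, java_preds, rtol=0, atol=1e-9):
    print("Translation verified: predictions match.")
else:
    print("Mismatch between Python and Java model outputs.")
    sys.exit(1)  # a non-zero exit fails the pipeline stage and blocks promotion
```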
Mike Mason:
So you mentioned testing there, and you were talking about testing the translation between the two languages for the model. But I'm actually curious about testing machine learning in general. ML models are often accused of being black boxes. People might ask, "Well, why was I not approved for my mortgage?" And unfortunately the model can only tell you, "Because I added up these two numbers and it was less than 0.4, so you don't get approved." Which isn't particularly comforting to me as someone who's seeking a mortgage. But the general question is: we're trying to provide high quality, but we have this slight black-box element to what we're doing. How do we test machine learning models?
Christoph Windheuser:
Usually you test the performance of machine learning algorithms with dedicated test sets. You usually have a training set, and of course you're not testing on the training set, because then you're not testing the generalization ability of the machine learning algorithm. So you have an extra set which the algorithm hasn't seen during training, it's new, and then you test the performance on that. This gives you some idea of how good your training has been and how good the performance of your algorithm is.
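(In code, that evaluation on unseen data is just a held-out split, something like the sketch below; synthetic data stands in for a real data set.)

```python
# Hold out a test set the model never sees during training, then measure generalization on it.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)  # stand-in for real data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("accuracy on unseen test data:", accuracy_score(y_test, model.predict(X_test)))
```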
Mike Mason:
So you might be able to use, say, a GoCD pipeline stage to run that testing against previously unseen test data and say it needs to be at least this good by some quality measure.
Christoph Windheuser:
Exactly.
Sheroy Marker:
By some threshold of some sort.
Christoph Windheuser:
Yeah. That is something that you can set up with GoCD: you automate this testing, you get the result, you compare the result with the threshold, and this then gives you a green light to transport this stuff to the production system.
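(As a sketch, that gate can be an ordinary script run as a pipeline task: it loads the candidate model and the held-out test set, and a non-zero exit code fails the stage when the score is below the agreed threshold. The file names, label column and threshold value are placeholders.)

```python
# Quality gate: fail the pipeline stage if the candidate model scores below the threshold.
import sys

import joblib
import pandas as pd
from sklearn.metrics import accuracy_score

THRESHOLD = 0.90  # agreed minimum quality; a placeholder value

model = joblib.load("model_candidate.joblib")
test_set = pd.read_csv("holdout_test_set.csv")
X_test = test_set.drop(columns=["label"])
y_test = test_set["label"]

score = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {score:.3f}")
sys.exit(0 if score >= THRESHOLD else 1)  # green light only above the threshold
```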
Mike Mason:
But then doesn't that run into the same issue as we had in the first place, which is that production data is continually evolving and new things will happen all the time? Doesn't that mean that we then have to evolve the test data, or is that part of it?
Christoph Windheuser:
Yeah, that is part of it. You have to evolve both the training data and the test data. So what you do is, you have a big bunch of data and you take some part away, maybe 20%, 10%, it depends. You don't show that part to the machine during training, but use it just for your own testing.
Mike Mason:
And that data set that you're showing to the machine, that must also be dynamic.
Christoph Windheuser:
Exactly.
Mike Mason:
Is that coming from yesterday's production dump or something like that?
Christoph Windheuser:
Yeah. We have to update this as well. Otherwise you would test against old data, which is not what you want.
Sheroy Marker:
And so how do some of these concepts map to unsupervised learning? It seems like a lot of the concepts we talked about were around supervised learning, fixed training sets and test sets and stuff like that. How does it work with unsupervised learning?
Christoph Windheuser:
To be honest, we don't have that many applications with clients for unsupervised learning. Unsupervised learning is something where you can get statistical knowledge out of data. You use unsupervised learning, for example, for dimension reduction: when you have data with big vectors and big data sets and you want to reduce that to smaller dimensionalities so that it's easier to handle, you can do this with unsupervised learning. Where unsupervised learning can also play a role is in the financial area with fraud detection, because you do not know exactly what the fraud is. Maybe you know some frauds from history; in your training set you might have seen, "Okay, this is a fraud." But usually fraud is something which is weird in some way, which is different, and that is something you can find with unsupervised learning, because it's just different from the others.
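(Both of those uses can be sketched with standard tooling, for example PCA for dimension reduction and an isolation forest flagging transactions that look different from the others; the synthetic data below stands in for real transactions.)

```python
# Unsupervised examples: reduce dimensionality with PCA, flag outliers with an isolation forest.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
transactions = rng.normal(size=(1000, 50))  # stand-in for high-dimensional transaction vectors

# Dimension reduction: 50 features down to 10, without any labels.
reduced = PCA(n_components=10).fit_transform(transactions)

# Anomaly detection: the model learns what "normal" looks like and flags the rest as -1.
detector = IsolationForest(contamination=0.01, random_state=0).fit(reduced)
flags = detector.predict(reduced)
print("suspicious transactions:", int((flags == -1).sum()))
```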
Mike Mason:
You gave an example earlier of AutoScout, the car retail company. I can give an example from a financial services company we worked with. They had a system where it would take up to six months to get a new machine learning model deployed into production; fraud detection is actually the topic area there as well. What would happen is their data scientists would work with their data and, over the course of months, validate the new model. Validation was a very slow process with lots of sign-offs and so on, and then they would put it into production. But you can imagine that a six-month-old model is actually not that great for keeping up with fraudsters.
Mike Mason:
So the team built an interesting piece of platform, which we've actually released as open source software, that allows you to easily promote machine learning models through a deployment pipeline and into production, and it lets you run multiple models in production. They have a concept of the current master model, which is the one that's actually making live decisions about fraud or not fraud, but then you can have other models running against the same production data, and if they start to perform better, you can even have the platform switch models and start running those instead. So I thought that was a good example of the continuous intelligence concept.
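(The master/challenger idea can be sketched very simply: score every candidate against the same recent, labelled production data and promote whichever does best. This is only an illustration of the concept, not the open source platform mentioned above; the file and column names are hypothetical.)

```python
# Champion/challenger sketch: score all models on the same recent production data,
# and promote the challenger only if it beats the current master model.
import joblib
import pandas as pd
from sklearn.metrics import roc_auc_score

recent = pd.read_csv("recent_labelled_transactions.csv")  # hypothetical labelled production data
X, y = recent.drop(columns=["is_fraud"]), recent["is_fraud"]

models = {
    "master": joblib.load("model_master.joblib"),
    "challenger": joblib.load("model_challenger.joblib"),
}
scores = {name: roc_auc_score(y, m.predict_proba(X)[:, 1]) for name, m in models.items()}
print(scores)

if scores["challenger"] > scores["master"]:
    print("Challenger outperforms master; switch live decisions to the challenger.")
```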
Christoph Windheuser:
I actually have a question for Sheroy. You and the development group for GoCD, do you get a lot of requests from Thoughtworkers about how to use GoCD in a machine learning environment, and are you planning to develop new features in that direction?
Sheroy Marker:
I think we've seen GoCD used quite substantially with data engineering pipelines in one instance. We've also seen other CD tools used in conjunction with data engineering pipeline tooling, but so far they've mostly been used as workflow orchestrators to build and deploy machine learning models into data engineering pipelines. There hasn't been a layer of abstraction on top of that workflow orchestration capability that additionally helps you with training or retraining models. So that is something that's on our backlog that we will dig into at some point, to see if that's tooling we should start building.
Christoph Windheuser:
Yeah, I think those will become really, really important features. Maybe one hint for those who are interested: Thoughtworks will be present at the World Summit AI, which is a big AI event in Amsterdam in October, actually the 10th and 11th of October. We will run a workshop there, a hands-on workshop on intelligent empowerment. At the moment we are building up an infrastructure with a GoCD server, a machine learning environment and tasks, and we will show live, on the laptops of the participants, how to change the training set, how to transport that through the pipelines, make the tests show a green light and then put it into a production environment.
Mike Mason:
So I'd like to thank both Christoph and Sheroy for joining me on today's podcast. And if you are interested in continuous intelligence, please look it up, have a look at thoughtworks.com and you can find out more there.
Mike Mason:
Thanks very much everyone.
Christoph Windheuser:
Thank you very much for listening. Bye bye.