MLOps on AWS: Five tips for successful adoption

Randy DeFauw and Eric Nagler

Published: February 04, 2022

With 87% of data science projects not making it into production, it’s time to break down the barriers between data scientists, developers and operations teams. MLOps can help. Here’s what it takes to adopt it successfully using AWS’ cloud services and Thoughtworks’ Continuous Delivery for Machine Learning (CD4ML) approach.

When you’re experimenting with data and leading machine learning (ML) innovation projects, you can’t realistically expect every model or capability you design to make it into the hands of users in live environments. In fact, only about one in 10 data science projects makes it into production.

One of the biggest drivers behind that figure is that across many organizations, data science and development teams still tend to operate in silos. Data scientists create new models to address ML use cases, then developers try to find ways of applying them, creating a cycle where neither reliably has their needs met, and ML ultimately fails to deliver a great deal of value to the business.

It’s a scenario we’ve seen before. It’s very reminiscent of the ‘dev versus ops’ challenges that led to the creation and widespread adoption of DevOps. So, naturally, high-performing teams are applying the same principles to help solve it — leading to the creation of MLOps.

MLOps aims to bring the best of DevOps into ML, reducing the friction between development, release, and operations — and creating a culture where all teams can work together to continuously improve machine learning systems and increase the value they deliver to the organization. Continuous delivery for machine learning (CD4ML) is the de facto approach applied by Thoughtworks to realize MLOps.

Just like DevOps, MLOps needs to be carefully implemented to deliver the right results. In a recent webinar, Thoughtworks Lead Data Engineer Eric Nagler and Amazon Web Services Principal Solutions Architect Randy DeFauw explored what it takes to enable MLOps success using AWS. Here are five key tips from their session.

#1) Harness automation to close the ML development loop

Building machine learning models and use cases is always a learning process. A huge amount of testing and experimentation goes into model building and model operationalization. The outcomes of a development process won’t always be perfect, but what’s critical is that everyone (and every system) involved can learn from those outcomes and use them to improve outputs next time around, turning the development process into a development cycle.

“That’s where automation becomes extremely valuable for MLOps,” said Eric. “If we can automate learning and feedback, and reduce the time it takes to start the development process over, we can run the cycle more times, helping us deliver models and use cases that are innovative, valuable and fit for purpose faster.”
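One way to picture that automated feedback loop is a cycle that retrains a candidate model, scores it against whatever is currently live, and promotes it only when it does better. The sketch below is a minimal illustration under stated assumptions, not the approach described in the session: the toy threshold "model", the `run_cycle` function, and the promotion rule are all hypothetical and stand in for a real training pipeline.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical toy data: (feature value, binary label).
Example = Tuple[float, int]


@dataclass(frozen=True)
class CycleResult:
    candidate_threshold: float
    candidate_accuracy: float
    production_accuracy: float
    promoted: bool


def evaluate(threshold: float, data: Sequence[Example]) -> float:
    """Fraction of examples where (feature >= threshold) matches the label."""
    correct = sum(1 for x, y in data if int(x >= threshold) == y)
    return correct / len(data)


def train(data: Sequence[Example]) -> float:
    """Stand-in for training: pick the observed feature value that works best as a threshold."""
    candidates = sorted({x for x, _ in data})
    return max(candidates, key=lambda t: evaluate(t, data))


def run_cycle(
    production_threshold: float,
    train_data: Sequence[Example],
    holdout: Sequence[Example],
) -> CycleResult:
    """One automated pass: train, evaluate on held-out data, promote only on improvement."""
    candidate = train(train_data)
    candidate_accuracy = evaluate(candidate, holdout)
    production_accuracy = evaluate(production_threshold, holdout)
    return CycleResult(
        candidate_threshold=candidate,
        candidate_accuracy=candidate_accuracy,
        production_accuracy=production_accuracy,
        promoted=candidate_accuracy > production_accuracy,
    )
```

Because each pass returns both scores alongside the promotion decision, the result can be logged and fed straight into the next run, which is what makes it cheap to repeat the cycle.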

#2) Eliminate duplicated effort with platform thinking

One of the main goals of MLOps is to make it easier for development and operations teams to build and deploy ML use cases quickly.

“With MLOps, the goal is to build capabilities on a platform that makes it easy for everyone to access and apply the work done by data scientists,” explained Eric. “We don’t need to reinvent the wheel every time a team wants to put out a new product, we just need to use platforms to make sure they can easily access the work that data scientists have already done and apply it themselves for their use case. That speeds everything up for everyone.”

By applying platform thinking, organizations can reduce data scientist workloads significantly, and make ML capabilities more accessible while also eliminating costly rework and duplicated effort.

#3) Work backwards together to identify the business problems you need to solve

Just like in DevOps and agile development, MLOps is all about ensuring that technology is applied in the right way to meet real business needs, and enabling stakeholders to work together towards those common goals.

“Working backwards from business challenges to understand how to apply technology is even more important in MLOps,” said Randy. “It’s moving from the art of the possible to the art of the practical. And it’s very likely the most important step of all in ensuring that machine learning delivers strong value for the business.”

Randy introduced three essential requirements teams need to have in place to do that effectively:

● Measurable results that can clearly be mapped to business value. If you can’t see how your ML use cases are contributing to business strategy, you can’t determine if they’re delivering value in the right way.

● Realistic goals that can feasibly be delivered with the capabilities and resources you have. A clear path to production is a must-have for any potential ML use case.

● High data availability to ensure that data can easily be operationalized and used to bring the desired use case to life.

#4) Leverage capabilities available to facilitate easy, effective experimentation

“During the iterative, rapid prototyping part of model development, data science teams are experimenting with a lot of different combinations of data inputs, parameters and algorithms,” said Randy. “Often, data scientists will record experiments and work within their own notebooks. But in MLOps, we need to record and track their work in a more open way, to help avoid the ‘it works in my notebook’ problem.”

For example, AWS has a wide range of Amazon SageMaker capabilities designed to help support rapid ML experimentation, including version control and tracking for notebooks and training scripts, as well as model artifacts and data sets stored across your data lakes.

Those tools are highly capable, but it’s important to recognize that adopting them represents an evolution in workflows for data science teams and other stakeholders too. Teams should be guided through the new experiment tracking and version control processes, so they understand how they need to work and what their role is in tracking, recording, and upholding strong version control for MLOps.
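To make the idea of open, shared experiment tracking concrete, here is a minimal sketch that appends each run's parameters, a fingerprint of the data it used, and its metrics to a shared log file. It is an illustration only and does not use the Amazon SageMaker APIs: the function names, the JSON Lines log format, and the accuracy figures in `demo` are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def fingerprint(data: bytes) -> str:
    """Short SHA-256 prefix identifying exactly which data a run used."""
    return hashlib.sha256(data).hexdigest()[:12]


def log_run(
    log_path: Path,
    params: Dict[str, Any],
    data: bytes,
    metrics: Dict[str, float],
) -> Dict[str, Any]:
    """Append one experiment record to a shared, append-only JSON Lines file."""
    record = {
        "params": params,
        "data_sha256_prefix": fingerprint(data),
        "metrics": metrics,
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def load_runs(log_path: Path) -> List[Dict[str, Any]]:
    """Read every recorded run back from the log."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(runs: List[Dict[str, Any]], metric: str) -> Dict[str, Any]:
    """Return the run with the highest value for the given metric."""
    return max(runs, key=lambda r: r["metrics"][metric])


def demo() -> Dict[str, Any]:
    """Log two runs with made-up metrics and return the better one."""
    data = b"feature,label\n0.1,0\n0.9,1\n"
    with tempfile.TemporaryDirectory() as tmp:
        log_path = Path(tmp) / "runs.jsonl"
        log_run(log_path, {"max_depth": 3}, data, {"accuracy": 0.81})
        log_run(log_path, {"max_depth": 5}, data, {"accuracy": 0.86})
        return best_run(load_runs(log_path), "accuracy")
```

Recording the data fingerprint next to the parameters is what lets someone else reproduce a result outside the original notebook: if the fingerprint differs, the run was not made on the same data.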

#5) Prioritize people and cultural transformation to help everyone get the most from ML

“Machine learning isn’t an esoteric skillset that a specific group of data scientists has. It’s a horizontal enabler that’s useful for every part of every type of organization,” said Randy. “It’s hard to think of an area of the business that couldn’t use ML to make some part of their processes more efficient and more productive.”

So, for MLOps to be successful, it needs to be disseminated to every level of the organization. Practically, that means helping every person and every team understand what ML can help them achieve, and giving them the means to influence ML decision-making, and ultimately drive the company towards high-value ML use cases.

Closing the session, Randy shared five quick tips to help make sure that MLOps is effective across your organization:

● Ensure high-level executive sponsorship, and lead from the top to create and champion an MLOps culture

● Work with teams from across the business to discover the best places to apply machine learning

● Get your data in order, and improve data management and governance at every level to help accelerate ML development and experimentation cycles

● Build interdisciplinary teams to consider multiple perspectives on what might work best for the business, and give everyone a voice in ML decision-making

● Utilize process integration and automation to give all stakeholders more time to focus their attention where they can create and deliver the most value, and contribute the most to MLOps

Eric and Randy’s full session is available to watch on demand at any time. If you’d like to learn more about how Thoughtworks and AWS are working together to help organizations of all kinds achieve successful outcomes with machine learning, visit this page.

Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.