Applying continuous delivery to machine learning projects (CD4ML) is hard, for a few reasons:
- Two worlds (software and data) have collided in recent years, and it takes time and experience for data practitioners to adopt continuous delivery principles and practices (and vice versa!)
- Cloud providers ship data tools and platforms quickly, and these often focus on storage and compute, leaving CI/CD practices (e.g. unit testing, test data management) as second-class considerations for teams to figure out on their own
- It’s easy to choose a tool or platform and then find ourselves locked in and limited by its API
In our experience, CD4ML sensible defaults serve as a north star that helps us navigate this chaotic environment. Instead of looking for a single data platform as a silver bullet, we’ve had greater success by:
- Composing implementations from first principles (such as automated testing, shifting quality left, post-deployment monitoring, etc.)
- Preferring composition over monolithic platforms
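To make the idea of composition a little more concrete, here is a minimal sketch of a model quality gate assembled from small, independently testable checks rather than from a platform-specific feature. Everything in it is an assumption made for illustration: the names (`CheckResult`, `min_accuracy`, `max_positive_rate_drift`, `run_quality_gate`, `gate_passed`), the choice of metrics and the thresholds are hypothetical, and it only assumes binary labels encoded as 0 and 1.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class CheckResult:
    """Outcome of a single quality check (hypothetical structure)."""
    name: str
    passed: bool
    detail: str


# A check is any function that takes true and predicted labels
# and returns a CheckResult. Labels are assumed to be 0 or 1.
Check = Callable[[Sequence[int], Sequence[int]], CheckResult]


def _validate(y_true: Sequence[int], y_pred: Sequence[int]) -> None:
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")


def min_accuracy(threshold: float) -> Check:
    """Build a check that fails when accuracy drops below `threshold`."""
    def check(y_true: Sequence[int], y_pred: Sequence[int]) -> CheckResult:
        _validate(y_true, y_pred)
        correct = sum(t == p for t, p in zip(y_true, y_pred))
        accuracy = correct / len(y_true)
        return CheckResult(
            "min_accuracy",
            accuracy >= threshold,
            f"accuracy={accuracy:.2f}, threshold={threshold:.2f}",
        )
    return check


def max_positive_rate_drift(baseline_rate: float, tolerance: float) -> Check:
    """Build a check that fails when the share of positive predictions
    moves more than `tolerance` away from `baseline_rate`."""
    def check(y_true: Sequence[int], y_pred: Sequence[int]) -> CheckResult:
        _validate(y_true, y_pred)
        rate = sum(y_pred) / len(y_pred)
        drift = abs(rate - baseline_rate)
        return CheckResult(
            "max_positive_rate_drift",
            drift <= tolerance,
            f"rate={rate:.2f}, baseline={baseline_rate:.2f}, tolerance={tolerance:.2f}",
        )
    return check


def run_quality_gate(
    checks: Sequence[Check], y_true: Sequence[int], y_pred: Sequence[int]
) -> List[CheckResult]:
    """Run every check and collect the results in order."""
    return [check(y_true, y_pred) for check in checks]


def gate_passed(results: Sequence[CheckResult]) -> bool:
    """The gate passes only if every individual check passed."""
    return all(result.passed for result in results)


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
    checks = [min_accuracy(0.7), max_positive_rate_drift(0.5, 0.1)]
    for result in run_quality_gate(checks, y_true, y_pred):
        print(result)
```

Because each check is a plain function, the same gate could run in a unit test, in a CI pipeline step, or against production predictions for post-deployment monitoring, without depending on any one platform’s API.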
If you’re interested in discussing how we could help you on this journey, or want to chat about how you’re tackling these challenges, we’d love to hear from you!
Disclaimer: The statements and opinions expressed in this article are those of the author(s) and do not necessarily reflect the positions of Thoughtworks.