Sensible defaults for CD4ML

Applying continuous delivery in machine learning (CD4ML) projects is hard, for a few reasons:

  • Two worlds (software and data) have collided in recent years, and it takes time and experience for data practitioners to adopt continuous delivery principles and practices (and vice versa!)
  • Cloud providers ship data tools and platforms at a rapid pace, and these often focus on storage and compute, leaving CI/CD practices (e.g. unit testing, test data management) as second-class considerations for teams to figure out on their own
  • It’s easy to choose a tool or platform and then find ourselves locked in and limited by its API


We use CD4ML sensible defaults as a north star to help us navigate this chaotic environment. Instead of looking for a single data platform as a silver bullet, we’ve had greater success by:

  • Composing implementations from first principles (such as automated testing, shifting quality left, post-deployment monitoring, etc.)
  • Preferring composition over monolithic platforms
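
To make the idea of composition concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a prescribed implementation: the names `compose`, `train`, `quality_gate` and `deploy` are hypothetical, the "model" is a stand-in that predicts the mean label, and the `max_mae` threshold is arbitrary. The point is only that each practice (training, an automated quality check before release, deployment) is a small, swappable step rather than a feature of one platform.

```python
from typing import Callable, Dict

# A step takes a context dict and returns an updated context dict.
Step = Callable[[Dict], Dict]


def compose(*steps: Step) -> Step:
    """Chain independent steps into one pipeline (hypothetical helper)."""
    def pipeline(context: Dict) -> Dict:
        for step in steps:
            context = step(context)
        return context
    return pipeline


def train(context: Dict) -> Dict:
    # Stand-in "model": always predicts the mean of the training labels.
    labels = context["train_labels"]
    mean = sum(labels) / len(labels)
    return {**context, "model": lambda _features: mean}


def quality_gate(max_mae: float) -> Step:
    """Shift quality left: fail the pipeline before deployment if the
    mean absolute error on held-out data exceeds an assumed threshold."""
    def gate(context: Dict) -> Dict:
        model = context["model"]
        errors = [abs(model(x) - y) for x, y in context["holdout"]]
        mae = sum(errors) / len(errors)
        if mae > max_mae:
            raise ValueError(f"MAE {mae:.3f} exceeds threshold {max_mae}")
        return {**context, "mae": mae}
    return gate


def deploy(context: Dict) -> Dict:
    # Placeholder for a real deployment step.
    return {**context, "deployed": True}


pipeline = compose(train, quality_gate(max_mae=1.0), deploy)
result = pipeline({
    "train_labels": [1.0, 2.0, 3.0],
    "holdout": [(0, 2.0), (1, 2.5)],
})
```

Because every step shares the same small interface, any one of them can be replaced (for example, swapping the placeholder `deploy` for a call into whichever serving tool a team has chosen) without rewriting the rest of the pipeline.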


