Continuous delivery for machine learning (CD4ML)
With an increased popularity of ML-based applications, and the technical complexity involved in building them, our teams rely heavily on continuous delivery for machine learning (CD4ML) to deliver such applications safely, quickly and in a sustainable manner. CD4ML is the discipline of bringing CD principles and practices to ML applications. It removes long cycle times between training models and deploying them to production. CD4ML removes manual handoffs between different teams, data engineers, data scientists and ML engineers in the end-to-end process of build and deployment of a model served by an application. Using CD4ML, our teams have successfully implemented the automated versioning, testing and deployment of all components of ML-based applications: data, model and code.
Continuous delivery for machine learning (CD4ML) models apply continuous delivery practices to developing machine learning models so that they are always ready for production. This technique addresses two main problems of traditional machine learning model development: long cycle time between training models and deploying them to production, which often includes manually converting the model to production-ready code; and using production models that had been trained with stale data.
A continuous delivery pipeline of a machine learning model has two triggers: (1) changes to the structure of the model and (2) changes to the training and test data sets. For this to work we need to both version the data sets and the model's source code. The pipeline often includes steps such as testing the model against the test data set, applying automatic conversion of the model (if necessary) with tools such as H2O, and deploying the model to production to deliver value.