Date: 2022-07-04

In the first part, we explained the phased approach to applying AI data models. This part will explore the challenges of each stage in greater detail.

Exploration phase

One of the main challenges of this phase is combinatorial explosion: multiple data processing steps combined with multiple candidate models produce many preprocessing and model combinations that need to be compared and verified.

Suppose we are training a classification model on a tabular dataset. We have to decide whether to include a certain column (e.g. Column A) and whether to use Random Forest or LightGBM as the classifier, so there are already four combinations to verify:

Has Column A, Random Forest model
Has Column A, LightGBM model
No Column A, Random Forest model
No Column A, LightGBM model
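To make the growth concrete, here is a minimal sketch that enumerates the same four combinations with Python's standard library. The variable names are illustrative assumptions, not part of any particular tool.

```python
from itertools import product

# Illustrative option lists; the names are assumptions made for this sketch.
column_options = [True, False]                 # include Column A or not
model_options = ["Random Forest", "LightGBM"]  # candidate classifiers

# The Cartesian product yields every (column choice, model) pair.
combinations = list(product(column_options, model_options))

for use_column_a, model_name in combinations:
    column_label = "Has Column A" if use_column_a else "No Column A"
    print(f"{column_label}, {model_name} model")

print(len(combinations))  # 2 column choices x 2 models = 4
```

Each additional choice multiplies the total: one more yes/no column decision would double the count to eight, and a third candidate model would raise it to twelve.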

In practice, there are usually more data processing options and more candidate models. As the number of data and model combinations explodes, we need to track several key pieces of information for each combination: the configuration of the data processing steps, the training data, the model hyperparameters, and the model metrics. This information is what makes an experiment reproducible.
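As a rough sketch of what tracking this information could look like, the snippet below stores one record per combination using only the standard library. The ExperimentRecord class, its field names, and the fingerprint helper are hypothetical names chosen for illustration. A real project would more likely rely on a dedicated experiment tracking tool.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, field


@dataclass
class ExperimentRecord:
    """One entry of an experiment log (field names are illustrative)."""
    preprocessing: dict      # configuration of the data processing steps
    data_fingerprint: str    # identifies the exact training data used
    model_name: str
    hyperparameters: dict
    metrics: dict = field(default_factory=dict)


def fingerprint(rows):
    """Hash the training rows so that any change in the data is detectable."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


# Toy training data, invented purely for this example.
train_rows = [{"column_a": 1.0, "label": 0}, {"column_a": 2.5, "label": 1}]

record = ExperimentRecord(
    preprocessing={"use_column_a": True},
    data_fingerprint=fingerprint(train_rows),
    model_name="Random Forest",
    hyperparameters={"n_estimators": 100},
    metrics={"accuracy": None},  # placeholder, to be filled in after evaluation
)

# One JSON line per run is easy to append to a log file and compare later.
log_line = json.dumps(asdict(record), sort_keys=True)
print(log_line)
```

Storing a fingerprint of the data rather than the data itself keeps the log small while still revealing when two runs were trained on different inputs.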

Mature tools can help with the tasks of the exploration phase. PyCaret ( https://pycaret.org/ ) provides a set of templates that are easy to understand and use, which helps us accelerate this phase.

To help find the most suitable model, PyCaret groups its models by the type of problem to be solved, such as Classification, Regression, and Anomaly Detection. For each problem type, it wraps the models in a set of easy-to-use APIs that allow users to train and evaluate multiple models at once.

For example, the following figure shows the result of a single compare_models call: PyCaret trained 14 different classification models and recorded a total of 8 metrics, such as Accuracy, AUC, and Recall, for each model.
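For readers who want to try this themselves, a minimal sketch of such a call might look like the following. The wrapper function compare_classifiers and its parameters are assumptions made for illustration; setup, compare_models, and pull come from the pycaret.classification module. The exact set of models and metrics depends on the installed PyCaret version and its optional dependencies.

```python
def compare_classifiers(data, target, seed=123):
    """Train and rank PyCaret's classifiers on `data` (a pandas DataFrame).

    `target` is the name of the label column. This wrapper is a
    hypothetical helper for this sketch, not part of PyCaret itself.
    """
    # Imported lazily so the function can be defined without PyCaret installed.
    from pycaret.classification import setup, compare_models, pull

    # setup() prepares the preprocessing pipeline and the train/test split.
    setup(data=data, target=target, session_id=seed)

    # compare_models() cross-validates the available models and returns the best one.
    best_model = compare_models()

    # pull() returns the most recently displayed score grid as a DataFrame.
    leaderboard = pull()
    return best_model, leaderboard
```

Calling it as best_model, leaderboard = compare_classifiers(df, target="label"), where df and "label" stand in for your own DataFrame and label column, would return the top-ranked model together with the metrics table for all models compared. Some PyCaret versions may ask for confirmation of the inferred column types during setup.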