Published: Nov 05, 2025
NOT ON THE CURRENT EDITION
This blip is not on the current edition of the Radar. If it was on one of the last few editions, it is likely that it is still relevant. If the blip is older, it might no longer be relevant and our assessment might be different today. Unfortunately, we simply don't have the bandwidth to continuously review blips from previous editions of the Radar.
Nov 2025
Trial

In the data-centric AI paradigm, improving data set quality often delivers greater performance gains than tuning the model itself. Cleanlab is an open-source Python library that addresses this by automatically identifying common data issues, such as mislabeling, outliers and duplicates, across text, image, tabular and audio data sets. Built on the principle of confident learning, Cleanlab uses model-predicted probabilities to estimate label noise and quantify data quality.
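The core idea of confident learning can be illustrated with a small pure-Python sketch. This is a simplified version of the idea, not Cleanlab's implementation. It computes a per-class threshold as the average predicted probability for that class over examples carrying that label. It then assigns each example to the most probable class that clears its own threshold, and flags examples where that class differs from the given label. The function names are hypothetical. In Cleanlab itself the comparable entry point is `cleanlab.filter.find_label_issues`, which takes `labels` and `pred_probs` and offers filtering strategies and issue rankings that this sketch omits.

```python
from typing import List, Sequence

# Hypothetical helper names for illustration; this is not Cleanlab's API.


def per_class_thresholds(labels: Sequence[int], pred_probs: Sequence[Sequence[float]]) -> List[float]:
    """Threshold t_j = mean predicted probability of class j over examples labeled j."""
    num_classes = len(pred_probs[0])
    totals = [0.0] * num_classes
    counts = [0] * num_classes
    for y, probs in zip(labels, pred_probs):
        totals[y] += probs[y]
        counts[y] += 1
    # Assumed fallback: a class with no labeled examples gets threshold 1.0.
    return [totals[j] / counts[j] if counts[j] else 1.0 for j in range(num_classes)]


def find_label_issues_sketch(labels: Sequence[int], pred_probs: Sequence[Sequence[float]]) -> List[int]:
    """Return indices whose given label disagrees with the confidently predicted class."""
    thresholds = per_class_thresholds(labels, pred_probs)
    issues: List[int] = []
    for i, (y, probs) in enumerate(zip(labels, pred_probs)):
        # Classes predicted at or above that class's own threshold.
        confident = [j for j, p in enumerate(probs) if p >= thresholds[j]]
        if not confident:
            continue  # no confident guess, so the example is not counted
        guess = max(confident, key=lambda j: probs[j])
        if guess != y:
            issues.append(i)  # off-diagonal entry of the "confident joint"
    return issues


# Toy data: examples 2 and 5 carry labels the model confidently contradicts.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.3, 0.7],
    [0.9, 0.1],
]
print(find_label_issues_sketch(labels, pred_probs))  # → [2, 5]
```

Using per-class thresholds rather than a single global cutoff is what lets the method cope with classes the model is systematically more or less confident about.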

This model-agnostic approach enables developers to diagnose and correct data set errors, then retrain models for improved robustness and accuracy. Our teams have used Cleanlab successfully in production, confirming its effectiveness in real-world settings. We recommend it as a valuable tool for promoting data standardization and improving data set quality in AI engineering projects.
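The approach is model-agnostic because the analysis consumes only predicted probabilities, not the model itself. Cleanlab's documentation recommends that these probabilities be out-of-sample, typically obtained through cross-validation, so that a model never scores an example it was trained on. The sketch below shows that step with any classifier wrapped as a "trainer" callable. The `nearest_centroid_trainer` model and all function names are hypothetical stand-ins for whatever model a team already uses.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]
PredictProba = Callable[[Vector], List[float]]
# A trainer fits on (features, labels) and returns a predict_proba function.
Trainer = Callable[[List[Vector], List[int]], PredictProba]


def out_of_sample_pred_probs(X: List[Vector], labels: List[int], train: Trainer, n_folds: int = 3) -> List[List[float]]:
    """One probability row per example, each from a model that never saw that example."""
    n = len(X)
    pred_probs: List[List[float]] = [[] for _ in range(n)]
    for fold in range(n_folds):
        train_idx = [i for i in range(n) if i % n_folds != fold]
        predict_proba = train([X[i] for i in train_idx], [labels[i] for i in train_idx])
        for i in range(fold, n, n_folds):  # held-out examples for this fold
            pred_probs[i] = predict_proba(X[i])
    return pred_probs


def nearest_centroid_trainer(num_classes: int) -> Trainer:
    """Hypothetical toy model: softmax over negative squared distances to class centroids."""
    def train(X_train: List[Vector], y_train: List[int]) -> PredictProba:
        dim = len(X_train[0])
        sums = [[0.0] * dim for _ in range(num_classes)]
        counts = [0] * num_classes
        for x, y in zip(X_train, y_train):
            counts[y] += 1
            for d in range(dim):
                sums[y][d] += x[d]
        centroids: List[Optional[List[float]]] = [
            [s / counts[c] for s in sums[c]] if counts[c] else None
            for c in range(num_classes)
        ]

        def predict_proba(x: Vector) -> List[float]:
            # Classes absent from this fold's training data get probability 0.
            scores = [
                -sum((a - b) ** 2 for a, b in zip(x, c)) if c is not None else -math.inf
                for c in centroids
            ]
            top = max(scores)  # subtract the max for a numerically stable softmax
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            return [e / total for e in exps]

        return predict_proba

    return train


X = [[0.0], [0.2], [0.1], [5.0], [5.2], [0.3]]
labels = [0, 0, 0, 1, 1, 1]  # the last example is deliberately mislabeled
pred_probs = out_of_sample_pred_probs(X, labels, nearest_centroid_trainer(num_classes=2))
```

Here the held-out model assigns the last example almost entirely to class 0 despite its label of 1. That disagreement is what a label-issue finder would flag. After reviewing or dropping flagged examples, the model is retrained on the cleaned data.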
