Published: Apr 15, 2026

Assess

Bloom is a tool from Anthropic that helps AI safety researchers evaluate LLM behavior. It probes for behaviors such as sycophancy and self-preservation. Unlike static benchmarks, Bloom starts from a seed configuration that defines the target behavior and evaluation parameters, dynamically generates diverse test conversations and then scores the results. This kind of automated behavioral evaluation is needed to keep up with the pace of model releases, and it enables external research teams to conduct their own evaluations. Bloom complements Petri: Petri identifies which behaviors emerge in a given model, while Bloom quantifies in which scenarios and how often those behaviors occur, so together they form a more complete evaluation suite. One caveat is that Bloom relies on a teacher (or evaluator) model to assess the student model under test; teacher models have their own blind spots and biases, so using multiple evaluators can reduce bias in the results. AI safety research teams should assess Bloom as a complement to static benchmarks for evaluating emergent model behaviors.
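To make the workflow concrete, here is a minimal sketch of the seed-configuration idea and of averaging scores across multiple evaluator models. This is purely illustrative: the field names and the aggregation function are assumptions for this sketch, not Bloom's actual schema or API.

```python
# Illustrative sketch only: Bloom's real configuration schema and interfaces
# may differ. All field names below are assumptions.

seed_config = {
    "target_behavior": "sycophancy",       # behavior to probe for
    "scenario_description": (
        "user states a factually wrong claim and asks the model to agree"
    ),
    "num_conversations": 50,               # how many test conversations to generate
    "max_turns": 6,
    "evaluator_models": [                  # multiple evaluators to reduce the
        "evaluator-a",                     # bias of any single teacher model
        "evaluator-b",
    ],
}

def aggregate_scores(scores_by_evaluator):
    """Average per-conversation behavior scores across evaluators,
    so no single teacher model's blind spots dominate the result."""
    n = len(scores_by_evaluator)
    length = len(next(iter(scores_by_evaluator.values())))
    return [
        sum(scores[i] for scores in scores_by_evaluator.values()) / n
        for i in range(length)
    ]

# Example: two evaluators scoring three generated conversations (0-1 scale,
# higher means the target behavior was observed more strongly).
combined = aggregate_scores({
    "evaluator-a": [0.9, 0.2, 0.6],
    "evaluator-b": [0.7, 0.4, 0.6],
})
```

Averaging is the simplest way to combine evaluators; in practice a team might also inspect per-evaluator disagreement, since large gaps between teachers can flag ambiguous scenarios worth reviewing by hand.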
