
LLM evaluation using semantic entropy

Published: Apr 15, 2026

Assess

Confabulation, a form of hallucination in LLM QA applications, is difficult to address with traditional evaluation methods. One approach uses information entropy as a measure of uncertainty by analyzing lexical variation in outputs for a given input. LLM evaluation using semantic entropy extends this idea by focusing on differences in meaning rather than surface-level variation.

This approach evaluates meaning rather than word sequences, making it applicable across datasets and tasks without requiring prior knowledge. It generalizes well to unseen tasks, helping identify prompts likely to cause confabulations and encouraging caution when needed. Results show that naive entropy often fails to detect confabulations, while semantic entropy is more effective at filtering false claims.
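The contrast between naive and semantic entropy can be sketched in a few lines: sample several answers to the same prompt, group them into meaning clusters, and compute entropy over cluster frequencies rather than over raw strings. This is a minimal illustration, not the published method; the `same_meaning` predicate is a placeholder standing in for a real meaning-equivalence check (such as an entailment model), which this sketch assumes is supplied by the caller.

```python
import math
from collections import Counter


def semantic_entropy(answers, same_meaning):
    """Entropy over meaning clusters of sampled answers.

    `same_meaning(a, b)` is assumed to decide whether two answers
    express the same claim (e.g., backed by an entailment model).
    """
    clusters = []  # each cluster holds answers sharing one meaning
    for a in answers:
        for c in clusters:
            if same_meaning(a, c[0]):
                c.append(a)
                break
        else:
            clusters.append([a])
    n = len(answers)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


def naive_entropy(answers):
    """Lexical entropy: every distinct string counts as its own outcome."""
    n = len(answers)
    return -sum((k / n) * math.log(k / n) for k in Counter(answers).values())
```

With four sampled answers such as `["Paris", "It is Paris", "Paris, France", "Lyon"]`, naive entropy is maximal (all strings differ), while semantic entropy stays low because three of the four share one meaning cluster. A low semantic entropy signals a consistent claim; a high one flags a prompt likely to produce confabulations.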
