HalluWorld: A Controlled Benchmark for Hallucination via Reference World Models
Emmy Liu, Varun Gangal, Michael Yu, Zhuofu Tao, Karan Singh, Sachin Kumar, Steven Y. Feng

TL;DR
HalluWorld introduces a standardized, reference-world benchmark to systematically study and measure hallucinations in language models across diverse controlled environments.
Contribution
HalluWorld provides an extensible, synthetic benchmark grounded in explicit reference worlds, used to analyze the causes of hallucination and to evaluate model performance.
Findings
Perceptual hallucinations are nearly solved in frontier models.
Multi-step state tracking and causal simulation remain challenging.
Models struggle to abstain in terminal tasks.
Abstract
Hallucination remains a central failure mode of large language models, but existing benchmarks operationalize it inconsistently across summarization, question answering, retrieval-augmented generation, and agentic interaction. This fragmentation makes it unclear whether a mitigation that works in one setting reduces hallucinations across contexts. Current benchmarks either require human annotation and fixed references that may be memorized, or rely on observations in settings that are difficult to reproduce. To study root causes, we introduce HalluWorld, an extensible benchmark grounded in an explicit reference-world formulation: a model hallucinates when it produces an observable claim that is false with respect to this world. Building on this view, we construct synthetic and semi-synthetic environments in which the reference world is fully specified, the model's view is controlled,…
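The reference-world definition in the abstract can be illustrated with a minimal sketch. Everything named here (`Claim`, `is_hallucination`, `hallucination_rate`, and the toy world) is hypothetical and not taken from the paper. The sketch also assumes a closed world, so a claim about a subject absent from the reference world is counted as false; the benchmark itself may handle that case differently.

```python
from dataclasses import dataclass

# Sentinel for subjects that the reference world does not define.
_MISSING = object()


@dataclass(frozen=True)
class Claim:
    """An observable claim: a subject and the value asserted for it (hypothetical type)."""
    subject: str
    value: object


def is_hallucination(claim: Claim, world: dict) -> bool:
    """Return True when the claim is false with respect to the reference world.

    Assumption: closed world, so a subject missing from `world` makes the claim false.
    """
    return world.get(claim.subject, _MISSING) != claim.value


def hallucination_rate(claims: list, world: dict) -> float:
    """Fraction of claims that are false with respect to the reference world."""
    if not claims:
        return 0.0
    return sum(is_hallucination(c, world) for c in claims) / len(claims)


# Toy, fully specified reference world (illustrative only).
world = {"door": "locked", "lamp": "on", "key_location": "drawer"}

# Claims a model might produce about that world.
claims = [
    Claim("door", "locked"),          # true in the world
    Claim("lamp", "off"),             # contradicts the world
    Claim("key_location", "drawer"),  # true in the world
    Claim("window", "open"),          # subject not in the world
]

rate = hallucination_rate(claims, world)
```

Because the world is fully specified, each claim can be scored mechanically, without human annotation.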
