HALoGEN: Fantastic LLM Hallucinations and Where to Find Them

Abhilasha Ravichander; Shrusti Ghela; David Wadden; Yejin Choi

arXiv:2501.08292 · cs.CL · January 15, 2025 · 3 cites

TL;DR

This paper introduces HALoGEN, a comprehensive benchmark with automatic verifiers to measure hallucinations in large language models across multiple domains, revealing high hallucination rates even in top models.

Contribution

The work provides a new benchmark and automatic verification framework for quantifying and analyzing hallucinations in LLMs, enabling more trustworthy model development.

Findings

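01. Even the best-performing models hallucinate, in some domains on up to 86% of generated atomic facts.

02. Automatic high-precision verifiers decompose each model output into atomic units and check every unit against a knowledge source.

03. Hallucinations are classified by likely origin: incorrect recollection of training data, incorrect knowledge in the training data, or fabrication.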

Abstract

Despite their impressive ability to generate high-quality and fluent text, generative large language models (LLMs) also produce hallucinations: statements that are misaligned with established world knowledge or provided input context. However, measuring hallucination can be challenging, as having humans verify model generations on-the-fly is both expensive and time-consuming. In this work, we release HALoGEN, a comprehensive hallucination benchmark consisting of: (1) 10,923 prompts for generative models spanning nine domains including programming, scientific attribution, and summarization, and (2) automatic high-precision verifiers for each use case that decompose LLM generations into atomic units, and verify each unit against a high-quality knowledge source. We use this framework to evaluate ~150,000 generations from 14 language models, finding that even the best-performing models are…
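The decompose-then-verify idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the HALoGEN implementation: the names `verify_generation`, `extract_imports`, and `KNOWN_PACKAGES` are hypothetical, the "knowledge source" is a tiny hard-coded set, and the toy task (checking imported package names in generated code) is only one plausible reading of the programming domain.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class VerificationResult:
    units: List[str]       # atomic units extracted from one generation
    supported: List[bool]  # True if the knowledge source supports the unit

    @property
    def hallucination_rate(self) -> float:
        # Fraction of atomic units that the knowledge source does not support.
        if not self.units:
            return 0.0
        return self.supported.count(False) / len(self.units)

def verify_generation(
    text: str,
    decompose: Callable[[str], List[str]],
    is_supported: Callable[[str], bool],
) -> VerificationResult:
    # Step 1: split the generation into atomic units.
    units = decompose(text)
    # Step 2: check each unit independently against the knowledge source.
    supported = [is_supported(unit) for unit in units]
    return VerificationResult(units, supported)

# Hypothetical knowledge source: a small set of package names assumed to exist.
KNOWN_PACKAGES = {"numpy", "pandas", "requests"}

def extract_imports(code: str) -> List[str]:
    # Toy decomposer: the top-level package named on each import line.
    pattern = r"^\s*(?:import|from)\s+([A-Za-z_]\w*)"
    return re.findall(pattern, code, flags=re.MULTILINE)

# "fastjsonx" is a made-up name standing in for a hallucinated package.
generation = "import numpy\nfrom pandas import DataFrame\nimport fastjsonx\n"
result = verify_generation(
    generation, extract_imports, lambda pkg: pkg in KNOWN_PACKAGES
)
print(result.units, result.supported, round(result.hallucination_rate, 3))
```

Passing the decomposer and the support check as separate callables mirrors the per-use-case structure the abstract describes: each domain would supply its own pair, while the scoring logic stays shared.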

Code & Models

Datasets

lasha-nlp/HALoGEN-prompts
dataset · 25 downloads

Videos

HALoGEN: Fantastic LLM Hallucinations and Where to Find Them · Underline

Taxonomy

Topics: Mental Health and Psychiatry · Biofield Effects and Biophysics · Hallucinations in medical conditions