A Survey of Automatic Hallucination Evaluation on Natural Language Generation

Siya Qi; Lin Gui; Yulan He; Zheng Yuan

arXiv:2404.12041 · cs.CL · October 22, 2025 · 2 cites

Open Access

TL;DR

This survey systematically analyzes 105 methods for automatic hallucination evaluation in natural language generation, highlighting current limitations and proposing a structured framework and future directions to improve model trustworthiness.

Contribution

It provides a comprehensive taxonomy and framework for evaluating hallucinations in LLMs, addressing fragmentation and guiding future research directions.

Findings

01 77.1% of the 105 surveyed methods specifically target LLMs

02 Identified fundamental limitations in current approaches

03 Proposed strategic directions for future evaluation systems

Abstract

The rapid advancement of Large Language Models (LLMs) has brought a pressing challenge: how to reliably assess hallucinations to guarantee model trustworthiness. Although Automatic Hallucination Evaluation (AHE) has become an indispensable component of this effort, the field remains fragmented in its methodologies, limiting both conceptual clarity and practical progress. This survey addresses this critical gap through a systematic analysis of 105 evaluation methods, revealing that 77.1% specifically target LLMs, a paradigm shift that demands new evaluation frameworks. We formulate a structured framework to organize the field, based on a survey of foundational datasets and benchmarks and a taxonomy of evaluation methodologies, which together systematically document the evolution from pre-LLM to post-LLM approaches. Beyond taxonomical organization, we identify fundamental limitations in…

Taxonomy

Topics: Misinformation and Its Impacts · Digital Mental Health Interventions