ANAH: Analytical Annotation of Hallucinations in Large Language Models

Ziwei Ji; Yuzhe Gu; Wenwei Zhang; Chengqi Lyu; Dahua Lin; Kai Chen

arXiv:2405.20315 · cs.CL · May 31, 2024 · 1 citation

Open Access · 1 Repo · 2 Models · 1 Dataset

TL;DR

ANAH is a bilingual dataset with detailed annotations of hallucinations in LLM-generated answers, enabling better measurement, training, and evaluation of hallucination detection and correction methods.

Contribution

The paper introduces ANAH, a comprehensive dataset with fine-grained hallucination annotations for LLM answers, and demonstrates its effectiveness in training and evaluating hallucination annotators.

Findings

01. Generative annotators trained on ANAH outperform open-source LLMs.

02. ANAH enables training models that approach GPT-4's performance.

03. Fine-grained annotations help understand hallucination accumulation in LLMs.

Abstract

Reducing the 'hallucination' problem of Large Language Models (LLMs) is crucial for their wide applications. A comprehensive and fine-grained measurement of the hallucination is the first key step for the governance of this issue but is under-explored in the community. Thus, we present ANAH, a bilingual dataset that offers ANalytical Annotation of Hallucinations in LLMs within Generative Question Answering. Each answer sentence in our dataset undergoes rigorous annotation, involving the retrieval of a reference fragment, the judgment of the hallucination type, and the correction of hallucinated content. ANAH consists of ~12k sentence-level annotations for ~4.3k LLM responses covering over 700 topics, constructed by a human-in-the-loop pipeline. Thanks to the fine granularity of the hallucination annotations, we can quantitatively…
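The per-sentence annotation described in the abstract (reference fragment, hallucination type, correction) could be modeled roughly as follows. This is a minimal sketch, not the dataset's actual schema: the class names, field names, the label strings "none" and "contradictory", and the example question are all hypothetical, chosen only to illustrate the three annotation steps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SentenceAnnotation:
    """One annotated answer sentence (field names are hypothetical)."""
    sentence: str
    reference_fragment: Optional[str]  # retrieved reference text, if any was found
    hallucination_type: str            # label set is an assumption, not from the paper
    correction: Optional[str]          # corrected sentence when one is needed


@dataclass
class AnnotatedResponse:
    """One LLM response, split into annotated sentences."""
    topic: str
    question: str
    sentences: List[SentenceAnnotation]

    def hallucination_rate(self, clean_label: str = "none") -> float:
        """Fraction of sentences whose label differs from the clean label."""
        if not self.sentences:
            return 0.0
        flagged = sum(1 for s in self.sentences
                      if s.hallucination_type != clean_label)
        return flagged / len(self.sentences)


# Invented example data, not taken from ANAH.
reference = "Mars has two moons, Phobos and Deimos."
example = AnnotatedResponse(
    topic="astronomy",
    question="How many moons does Mars have?",
    sentences=[
        SentenceAnnotation("Mars has two moons.", reference, "none", None),
        SentenceAnnotation("They are named Phobos and Titan.", reference,
                           "contradictory", "They are named Phobos and Deimos."),
    ],
)
print(example.hallucination_rate())  # 0.5 (one of two sentences is flagged)
```

Keeping the annotation at sentence level is what makes a per-response rate like this computable; a response-level label alone would not show where in an answer the errors occur.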

Code & Models

Repositories

open-compass/anah
PyTorch · Official

Models

Datasets

opencompass/anah
dataset · 164 downloads

Taxonomy

Topics: Machine Learning in Healthcare · Anomaly Detection Techniques and Applications

Methods: Attention Is All You Need · Label Smoothing · Adam · Position-Wise Feed-Forward Layer · Dropout · Dense Connections · Absolute Position Encodings · Softmax