RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models

Cheng Niu; Yuanhao Wu; Juno Zhu; Siliang Xu; Kashun Shum; Randy Zhong; Juntong Song; Tong Zhang

arXiv:2401.00396 · cs.CL · May 20, 2024 · 5 citations

Open Access

TL;DR

RAGTruth is a comprehensive dataset designed to analyze and measure hallucinations in retrieval-augmented language models, enabling better detection and mitigation strategies for more trustworthy LLM outputs.

Contribution

The paper introduces RAGTruth, a detailed corpus with manual annotations for word-level hallucination analysis in RAG-based LLMs, and demonstrates its utility in benchmarking and improving hallucination detection methods.

Findings

1. RAGTruth contains nearly 18,000 annotated responses from diverse LLMs.
2. Existing hallucination detection methods vary in effectiveness across models.
3. Fine-tuning small LLMs on RAGTruth can achieve competitive hallucination detection performance.
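Comparing hallucination detectors, as the second and third findings do, requires a scoring rule. The following is a minimal Python sketch of response-level precision, recall, and F1 computed separately per generating model. It assumes each record pairs a gold "hallucinated" label with a detector's prediction; the function names and record layout are hypothetical, and this is a generic evaluation recipe rather than the paper's exact protocol.

```python
from collections import defaultdict


def precision_recall_f1(gold: list[bool], pred: list[bool]) -> tuple[float, float, float]:
    """Binary P/R/F1 where True means 'response contains a hallucination'."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    # Define a metric as 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def per_model_scores(records):
    """Hypothetical helper: records are (model_name, gold_label, predicted_label)."""
    grouped = defaultdict(lambda: ([], []))
    for model, gold, pred in records:
        grouped[model][0].append(gold)
        grouped[model][1].append(pred)
    return {m: precision_recall_f1(g, p) for m, (g, p) in grouped.items()}
```

Breaking scores down by model, rather than pooling all responses, is what makes it visible when a detector that works well on one LLM's outputs performs poorly on another's.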

Abstract

Retrieval-augmented generation (RAG) has become a main technique for alleviating hallucinations in large language models (LLMs). Even with RAG, however, LLMs may still make claims that are unsupported by, or contradict, the retrieved content. To develop effective hallucination prevention strategies under RAG, it is important to create benchmark datasets that can measure the extent of hallucination. This paper presents RAGTruth, a corpus tailored for analyzing word-level hallucinations in various domains and tasks within standard RAG frameworks for LLM applications. RAGTruth comprises nearly 18,000 naturally generated responses from diverse LLMs using RAG. These responses have undergone meticulous manual annotation at both the individual-case level and the word level, including evaluations of hallucination intensity. We not only benchmark hallucination frequencies across…
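The abstract describes annotations at two granularities: whole responses and individual words. The following minimal Python sketch shows how the two could relate, assuming annotations are stored as character-offset spans over the response text and that a response counts as hallucinated when at least one span is annotated. The class and field names (`HallucinationSpan`, `AnnotatedResponse`, `label_type`) are hypothetical and are not the dataset's released schema.

```python
from dataclasses import dataclass, field


@dataclass
class HallucinationSpan:
    # Hypothetical schema: character offsets into the response, end exclusive.
    start: int
    end: int
    label_type: str  # hypothetical label name, e.g. "conflict" or "baseless"


@dataclass
class AnnotatedResponse:
    response: str
    spans: list[HallucinationSpan] = field(default_factory=list)

    def is_hallucinated(self) -> bool:
        # Assumed response-level rule: any annotated span flags the response.
        return len(self.spans) > 0

    def word_labels(self) -> list[tuple[str, bool]]:
        # Word-level labels: a whitespace-delimited word is flagged
        # if its character range overlaps any annotated span.
        labels = []
        pos = 0
        for word in self.response.split():
            start = self.response.index(word, pos)
            end = start + len(word)
            pos = end
            flagged = any(s.start < end and start < s.end for s in self.spans)
            labels.append((word, flagged))
        return labels
```

For example, annotating characters 24 to 31 of "Paris is the capital of Germany." as a conflict flags only the final word, `Germany.`, while the response as a whole is labeled hallucinated.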

Code & Models

Models

vectara/hallucination_evaluation_model (Hugging Face model · 72k downloads · 348 likes)

Datasets

lytang/LLM-AggreFact (dataset · 1.1k downloads)

Videos

RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models (Underline)

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Weight Decay · WordPiece · Softmax · Label Smoothing · Adam