RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models
Cheng Niu, Yuanhao Wu, Juno Zhu, Siliang Xu, Kashun Shum, Randy Zhong, Juntong Song, Tong Zhang

TL;DR
RAGTruth is a comprehensive dataset designed to analyze and measure hallucinations in retrieval-augmented language models, enabling better detection and mitigation strategies for more trustworthy LLM outputs.
Contribution
The paper introduces RAGTruth, a detailed corpus with manual annotations for word-level hallucination analysis in RAG-based LLMs, and demonstrates its utility in benchmarking and improving hallucination detection methods.
Findings
RAGTruth contains nearly 18,000 annotated responses from diverse LLMs.
Existing hallucination detection methods vary in effectiveness across models.
Fine-tuning small LLMs on RAGTruth can achieve competitive hallucination detection performance.
Abstract
Retrieval-augmented generation (RAG) has become a main technique for alleviating hallucinations in large language models (LLMs). Despite the integration of RAG, LLMs may still present unsupported or contradictory claims to the retrieved contents. In order to develop effective hallucination prevention strategies under RAG, it is important to create benchmark datasets that can measure the extent of hallucination. This paper presents RAGTruth, a corpus tailored for analyzing word-level hallucinations in various domains and tasks within the standard RAG frameworks for LLM applications. RAGTruth comprises nearly 18,000 naturally generated responses from diverse LLMs using RAG. These responses have undergone meticulous manual annotations at both the individual cases and word levels, incorporating evaluations of hallucination intensity. We not only benchmark hallucination frequencies across…
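To make the two annotation granularities concrete, here is a minimal sketch of how case-level and word-level labels could be scored against a detector's output. This summary does not describe the dataset schema or the paper's evaluation protocol, so everything here is an assumption for illustration: the `Span` type, the character-offset representation, the character-overlap metric, and the example sentence are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical annotation: a hallucinated region given as character offsets."""
    start: int  # inclusive
    end: int    # exclusive


def covered_chars(spans):
    """Return the set of character positions covered by any span."""
    covered = set()
    for s in spans:
        covered.update(range(s.start, s.end))
    return covered


def response_label(spans):
    """Case-level label (assumed rule): hallucinated if any span is annotated."""
    return len(spans) > 0


def span_prf(gold, pred):
    """Character-overlap precision, recall, and F1 between gold and predicted spans."""
    g, p = covered_chars(gold), covered_chars(pred)
    overlap = len(g & p)
    precision = overlap / len(p) if p else 0.0
    recall = overlap / len(g) if g else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up example, not taken from the corpus.
response = "The cafe opens at 9 am and was founded in 1990."
gold = [Span(42, 46)]  # annotator marked "1990"
pred = [Span(39, 46)]  # detector flagged "in 1990"

precision, recall, f1 = span_prf(gold, pred)
print(response_label(gold), round(precision, 3), round(recall, 3), round(f1, 3))
```

Scoring on character overlap rewards a detector for partially correct spans. A stricter exact-match criterion would be an equally reasonable choice under the same assumptions.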
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Weight Decay · WordPiece · Softmax · Label Smoothing · Adam
