Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers

Charlie George; Andreas Stuhlmüller

arXiv:2310.10627·cs.CL·October 17, 2023·2 cites

Open Access · 1 Repo

TL;DR

This paper evaluates Factored Verification, an automated method for detecting hallucinations in abstractive summaries, uses it to estimate how often language models hallucinate when summarizing academic papers, and shows that self-correction with Factored Critiques lowers that rate.

Contribution

The paper presents Factored Verification, which sets a new state of the art for hallucination detection on the summarization task of the HaluEval benchmark (76.2% accuracy), and shows that critiques built on it reduce hallucinations in summaries of academic papers.

Findings

01

Factored Verification achieves 76.2% accuracy on the summarization task of the HaluEval benchmark.

02

Models average between 0.62 and 1.55 hallucinations per summary: 0.62 for ChatGPT (16k), 0.84 for GPT-4, and 1.55 for Claude 2.

03

Self-correction with Factored Critiques lowers the average number of hallucinations per summary to 0.49 for ChatGPT, 0.46 for GPT-4, and 0.95 for Claude 2.

Abstract

Hallucination plagues even frontier LLMs--but how bad is it really for summarizing academic papers? We evaluate Factored Verification, a simple automated method for detecting hallucinations in abstractive summaries. This method sets a new SotA on hallucination detection in the summarization task of the HaluEval benchmark, achieving 76.2% accuracy. We then use this method to estimate how often language models hallucinate when summarizing across multiple academic papers and find 0.62 hallucinations in the average ChatGPT (16k) summary, 0.84 for GPT-4, and 1.55 for Claude 2. We ask models to self-correct using Factored Critiques and find that this lowers the number of hallucinations to 0.49 for ChatGPT, 0.46 for GPT-4, and 0.95 for Claude 2. The hallucinations we find are often subtle, so we advise caution when using models to synthesize academic papers.
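
To put the self-correction figures in proportion, the short sketch below computes the relative drop in hallucinations per summary for each model, using only the averages quoted in the abstract. The dictionary names and the `relative_reduction` helper are illustrative and not taken from the paper or its repository; the percentages are simple arithmetic on rounded averages, not results the authors report.

```python
# Average hallucinations per summary, copied from the abstract.
before = {"ChatGPT": 0.62, "GPT-4": 0.84, "Claude 2": 1.55}
after = {"ChatGPT": 0.49, "GPT-4": 0.46, "Claude 2": 0.95}


def relative_reduction(baseline: float, corrected: float) -> float:
    """Fraction of the baseline rate removed by self-correction (hypothetical helper)."""
    return (baseline - corrected) / baseline


# Relative reduction per model, as a fraction between 0 and 1.
reductions = {
    model: relative_reduction(before[model], after[model]) for model in before
}

for model, reduction in reductions.items():
    print(f"{model}: {before[model]:.2f} -> {after[model]:.2f} ({reduction:.1%} fewer)")
```

By this arithmetic the drop is roughly 21% for ChatGPT, 45% for GPT-4, and 39% for Claude 2, so every model still averages close to half a hallucination or more per summary after self-correction.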

Code & Models

Repositories

elicit/fave-dataset (Official)

Taxonomy

Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques

Methods: Attention Is All You Need · Softmax · Residual Connection · Absolute Position Encodings · Layer Normalization · Dense Connections · Linear Layer · Multi-Head Attention · Adam · Position-Wise Feed-Forward Layer