Localizing and Mitigating Errors in Long-form Question Answering

Rachneet Sachdeva; Yixiao Song; Mohit Iyyer; Iryna Gurevych

arXiv:2407.11930 · cs.CL · June 4, 2025

TL;DR

This paper introduces HaluQuestQA, a dataset with localized error annotations for long-form question answering, and develops models and methods to detect, analyze, and reduce errors, significantly improving answer quality.

Contribution

The work presents the first hallucination dataset with span-level error annotations for LFQA, along with a feedback model and a prompt-based refinement approach to enhance answer accuracy.

Findings

01. Error-informed refinement reduces hallucinations in generated answers.

02. Humans prefer answers refined with our method (84%).

03. The dataset enables detailed analysis of LFQA errors.

Abstract

Long-form question answering (LFQA) aims to provide thorough and in-depth answers to complex questions, enhancing comprehension. However, such detailed responses are prone to hallucinations and factual inconsistencies, challenging their faithful evaluation. This work introduces HaluQuestQA, the first hallucination dataset with localized error annotations for human-written and model-generated LFQA answers. HaluQuestQA comprises 698 QA pairs with 1.8k span-level error annotations for five different error types by expert annotators, along with preference judgments. Using our collected data, we thoroughly analyze the shortcomings of long-form answers and find that they lack comprehensiveness and provide unhelpful references. We train an automatic feedback model on this dataset that predicts error spans with incomplete information and provides associated explanations. Finally, we propose a…
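To make the idea of span-level feedback concrete, here is a minimal sketch of how localized error annotations might be represented and folded into a refinement prompt. It is an illustration under stated assumptions, not the authors' implementation: the `ErrorSpan` fields, the `build_refinement_prompt` helper, the prompt wording, and the error-type label used below are all hypothetical and are not taken from the paper or the HaluQuestQA schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorSpan:
    """One span-level annotation (hypothetical field names)."""

    start: int        # character offset where the span begins (inclusive)
    end: int          # character offset where the span ends (exclusive)
    error_type: str   # placeholder label, not the paper's taxonomy
    explanation: str  # free-text reason the span was flagged

    def text(self, answer: str) -> str:
        """Return the flagged substring of the answer."""
        return answer[self.start:self.end]


def build_refinement_prompt(question: str, answer: str, spans: list[ErrorSpan]) -> str:
    """Assemble a plain-text prompt asking a model to revise the flagged spans.

    The wording is an assumption made for illustration only.
    """
    lines = [f"Question: {question}", f"Answer: {answer}", "Feedback:"]
    # List the spans in reading order so the feedback follows the answer.
    for i, span in enumerate(sorted(spans, key=lambda s: s.start), start=1):
        lines.append(
            f'{i}. [{span.error_type}] "{span.text(answer)}": {span.explanation}'
        )
    if not spans:
        lines.append("(no errors flagged)")
    lines.append("Rewrite the answer so that every feedback item is addressed.")
    return "\n".join(lines)
```

In this sketch the feedback is rendered as a numbered list of quoted spans, so each explanation stays attached to the exact text it refers to. Character offsets are used only because they are simple to slice; a real dataset may store spans differently.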

Code & Models

Repositories

ukplab/arxiv2024-lfqa-hallucination
PyTorch · Official

Datasets

UKPLab/HaluQuestQA
dataset · 23 downloads

Videos

Localizing and Mitigating Errors in Long-form Question Answering · Underline

Taxonomy

Topics: Seismology and Earthquake Studies