Retrieving Supporting Evidence for Generative Question Answering

Siqing Huo; Negar Arabzadeh; Charles L. A. Clarke

arXiv:2309.11392 · cs.IR · September 29, 2023

TL;DR

This paper explores methods for automatically validating answers generated by large language models by retrieving evidence from an external corpus and checking the answers against it, with the aim of reducing hallucinations.

Contribution

It introduces two verification experiments that assess generated answers against retrieved evidence, demonstrating over 80% accuracy in correctness verification.

Findings

01. Verification accuracy exceeds 80%

02. Manual review shows some incorrect answers are missed

03. Verification reduces but does not eliminate hallucinations

Abstract

Current large language models (LLMs) can exhibit near-human levels of performance on many natural language-based tasks, including open-domain question answering. Unfortunately, at this time, they also convincingly hallucinate incorrect answers, so that responses to questions must be verified against external sources before they can be accepted at face value. In this paper, we report two simple experiments to automatically validate generated answers against a corpus. We base our experiments on questions and passages from the MS MARCO (V1) test collection, and a retrieval pipeline consisting of sparse retrieval, dense retrieval and neural rerankers. In the first experiment, we validate the generated answer in its entirety. After presenting a question to an LLM and receiving a generated answer, we query the corpus with the combination of the question + generated answer. We then present the…
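The first steps of the first experiment, as far as the abstract describes them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `toy_generate_answer` is a hypothetical stand-in for an LLM call, and `retrieve` is a toy term-overlap ranker standing in for the paper's sparse retrieval, dense retrieval and neural reranking stages. The sketch stops once evidence is retrieved, because the abstract text above is cut off at that point.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_generate_answer(question):
    """Hypothetical stand-in for an LLM call; returns a canned answer."""
    return "Ottawa is the capital of Canada."


def retrieve(query, corpus, k=1):
    """Toy sparse retrieval (an assumption, not the paper's pipeline):
    rank passages by the summed rarity weight of terms shared with the query."""
    n = len(corpus)
    doc_tokens = [set(tokenize(passage)) for passage in corpus]
    # Document frequency of each term across the corpus.
    df = Counter(term for tokens in doc_tokens for term in tokens)
    query_terms = set(tokenize(query))

    def score(tokens):
        return sum(math.log(1 + n / df[t]) for t in query_terms & tokens)

    ranked = sorted(range(n), key=lambda i: score(doc_tokens[i]), reverse=True)
    return [corpus[i] for i in ranked[:k]]


def evidence_for(question, corpus, generate=toy_generate_answer, k=1):
    """Generate an answer, then query the corpus with question + answer."""
    answer = generate(question)
    query = f"{question} {answer}"
    return answer, retrieve(query, corpus, k)


corpus = [
    "Ottawa is the capital city of Canada, located in Ontario.",
    "Toronto is the largest city in Canada.",
    "Paris is the capital of France.",
]
answer, passages = evidence_for("What is the capital of Canada?", corpus)
```

Querying with the question and the generated answer together means that terms appearing only in the answer (here, "Ottawa") also influence which passages are ranked first.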
