Retrieving Supporting Evidence for Generative Question Answering
Siqing Huo, Negar Arabzadeh, Charles L. A. Clarke

TL;DR
This paper explores methods for automatically validating the correctness of answers generated by large language models using external corpus retrieval and verification techniques, aiming to reduce hallucinations.
Contribution
The paper reports two experiments that verify generated answers against evidence retrieved from a corpus, with verification accuracy above 80%.
Findings
Verification accuracy exceeds 80%
Manual review shows some incorrect answers are missed
Verification reduces but does not eliminate hallucinations
Abstract
Current large language models (LLMs) can exhibit near-human levels of performance on many natural language-based tasks, including open-domain question answering. Unfortunately, at this time, they also convincingly hallucinate incorrect answers, so that responses to questions must be verified against external sources before they can be accepted at face value. In this paper, we report two simple experiments to automatically validate generated answers against a corpus. We base our experiments on questions and passages from the MS MARCO (V1) test collection, and a retrieval pipeline consisting of sparse retrieval, dense retrieval and neural rerankers. In the first experiment, we validate the generated answer in its entirety. After presenting a question to an LLM and receiving a generated answer, we query the corpus with the combination of the question + generated answer. We then present the…
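The query construction in the first experiment (question + generated answer, issued against the corpus) can be sketched as follows. This is only an illustrative toy, not the authors' pipeline: the function names `build_query` and `rank_passages`, the TF-IDF-style overlap score, and the three example passages are all assumptions introduced here, standing in for the sparse retrieval, dense retrieval and neural reranking stages the abstract mentions. The step that follows retrieval is cut off in the excerpt above, so it is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_query(question, generated_answer):
    """Combine the question and the LLM's answer into one retrieval query."""
    return f"{question} {generated_answer}"


def rank_passages(query, passages):
    """Rank passages by a simple TF-IDF overlap score.

    Hypothetical stand-in for a real retrieval stack; returns a list of
    (passage, score) pairs, best first.
    """
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    # Document frequency: in how many passages each term occurs.
    df = Counter()
    for d in docs:
        df.update(d.keys())
    q_terms = set(tokenize(query))
    scored = []
    for i, d in enumerate(docs):
        score = sum(d[t] * math.log(1 + n / df[t]) for t in q_terms if t in d)
        scored.append((score, i))
    # Highest score first; ties broken by original passage order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(passages[i], score) for score, i in scored]


# Made-up example data, for illustration only.
passages = [
    "The capital of France is Paris.",
    "Berlin is the capital of Germany.",
    "Bananas are rich in potassium.",
]
query = build_query("What is the capital of France?", "Paris")
best_passage, best_score = rank_passages(query, passages)[0]
print(best_passage)
```

Including the generated answer in the query pulls the ranking toward passages that mention the answer itself, which is what makes the retrieved passage usable as supporting (or contradicting) evidence in a later check.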
Taxonomy
Methods: Balanced Selection
