TRAQ: Trustworthy Retrieval Augmented Question Answering via Conformal Prediction
Shuo Li, Sangdon Park, Insup Lee, Osbert Bastani

TL;DR
TRAQ applies conformal prediction to retrieval-augmented question answering, producing prediction sets that contain a correct answer with high probability, and uses Bayesian optimization to keep those sets small.
Contribution
TRAQ is the first approach to provide end-to-end statistical correctness guarantees for retrieval-augmented question answering systems.
Findings
TRAQ's prediction sets contain a semantically correct response with high probability.
It reduces prediction set size by 16.2% on average.
An extensive experimental evaluation supports the correctness claim.
Abstract
When applied to open-domain question answering, large language models (LLMs) frequently generate incorrect responses based on made-up facts, which are called hallucinations. Retrieval augmented generation (RAG) is a promising strategy to avoid hallucinations, but it does not provide guarantees on its correctness. To address this challenge, we propose the Trustworthy Retrieval Augmented Question Answering, or TRAQ, which provides the first end-to-end statistical correctness guarantee for RAG. TRAQ uses conformal prediction, a statistical technique for constructing prediction sets that are guaranteed to contain the semantically correct response with high probability. Additionally, TRAQ leverages Bayesian optimization to minimize the size of the constructed sets. In an extensive experimental evaluation, we demonstrate that TRAQ provides the desired correctness…
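The conformal prediction step mentioned in the abstract can be illustrated with a minimal split conformal sketch. This is not the authors' implementation: the function names, the nonconformity score (assumed here to be any number where lower means the answer looks more plausible, such as 1 minus a model confidence), and the example data are all hypothetical. It covers only a single prediction set, not TRAQ's retrieval step or its Bayesian optimization.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold from calibration nonconformity scores.

    cal_scores: scores of the *correct* answers on held-out calibration
    questions (hypothetical; lower means more plausible).
    alpha: tolerated miscoverage rate, e.g. 0.2 for 80% coverage.
    """
    n = len(cal_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: accept everything.
        return math.inf
    return sorted(cal_scores)[k - 1]


def prediction_set(candidates, threshold):
    """Keep every candidate answer whose score does not exceed the threshold.

    candidates: list of (answer, nonconformity_score) pairs.
    """
    return [answer for answer, score in candidates if score <= threshold]


if __name__ == "__main__":
    # Hypothetical calibration scores and candidate answers.
    calibration = [i / 10 for i in range(1, 11)]
    tau = conformal_threshold(calibration, alpha=0.2)
    print(prediction_set([("Paris", 0.1), ("Lyon", 0.95)], tau))
```

If the calibration questions and the test question are exchangeable, the correct answer's score falls at or below the threshold with probability at least 1 - alpha, which is the sense in which such a set is "guaranteed" to contain it.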
