Retrieval Improvements Do Not Guarantee Better Answers: A Study of RAG for AI Policy QA

Saahil Mathur; Ryan David Rittner; Vedant Ajit Thakur; Daniel Stuart Schiff; Tunazzina Islam

arXiv:2603.24580·cs.CL·March 26, 2026

Open Access

TL;DR

This study shows that improving the retrieval component of a RAG system does not always yield better or more reliable answers in complex AI policy domains, which highlights the need for end-to-end system evaluation.

Contribution

The paper demonstrates that improving retrieval quality alone does not guarantee more accurate answers in policy-focused RAG systems, underscoring the importance of evaluating the whole system rather than individual components.

Findings

01. Domain-specific fine-tuning improves retrieval metrics.

02. Stronger retrieval can increase hallucinations when relevant documents are missing.

03. Component improvements do not necessarily enhance end-to-end answer quality.

Abstract

Retrieval-augmented generation (RAG) systems are increasingly used to analyze complex policy documents, but achieving sufficient reliability for expert usage remains challenging in domains characterized by dense legal language and evolving, overlapping regulatory frameworks. We study the application of RAG to AI governance and policy analysis using the AI Governance and Regulatory Archive (AGORA) corpus, a curated collection of 947 AI policy documents. Our system combines a ColBERT-based retriever fine-tuned with contrastive learning and a generator aligned to human preferences using Direct Preference Optimization (DPO). We construct synthetic queries and collect pairwise preferences to adapt the system to the policy domain. Through experiments evaluating retrieval quality, answer relevance, and faithfulness, we find that domain-specific fine-tuning improves retrieval metrics but does…
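
The two components named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows the standard ColBERT late-interaction (MaxSim) score and the standard per-pair DPO loss in plain Python. The function names, the toy embeddings, and the default `beta=0.1` are assumptions made for illustration; the paper's actual hyperparameters and training setup are not given on this page.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens: Sequence[Vector], doc_tokens: Sequence[Vector]) -> float:
    """ColBERT-style late interaction.

    For each query token embedding, take its best dot-product match among
    the document token embeddings, then sum these maxima over query tokens.
    """
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value, not taken from the paper
) -> float:
    """DPO loss for one preference pair: -log(sigmoid(beta * margin)).

    The margin is how much more the policy favors the chosen answer over
    the rejected one, relative to the frozen reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow.
    return math.log1p(math.exp(-abs(x))) + max(-x, 0.0)


if __name__ == "__main__":
    # Toy 2-dimensional token embeddings (hypothetical values).
    query = [[1.0, 0.0], [0.0, 1.0]]
    document = [[1.0, 0.0], [0.5, 0.5]]
    print("MaxSim score:", maxsim_score(query, document))

    # Policy prefers the chosen answer more than the reference does,
    # so the loss falls below log(2), the value at zero margin.
    print("DPO loss:", dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

In this sketch the retriever score and the generator loss are computed independently, which mirrors the point of the study: a higher MaxSim ranking quality says nothing by itself about whether the generated answer is faithful, so both have to be evaluated together.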

Taxonomy

Topics: Topic Modeling · Expert finding and Q&A systems · Multimodal Machine Learning Applications