Investigating the Robustness of Retrieval-Augmented Generation at the Query Level
Sezen Perçin, Xin Su, Qutub Sha Syed, Phillip Howard, Aleksei Kuvshinov, Leo Schwinn, Kay-Ulrich Scholl

TL;DR
This paper examines how small changes in input queries affect the performance of retrieval-augmented generation (RAG) systems, highlighting their sensitivity and proposing a framework for robustness evaluation.
Contribution
It introduces a systematic evaluation framework for query-level robustness in RAG systems and provides extensive experimental analysis with practical recommendations.
Findings
Retrievers' performance degrades significantly with minor query perturbations
End-to-end RAG systems are highly sensitive to input variations
Proposed evaluation framework enables systematic robustness assessment
Abstract
Large language models (LLMs) are very costly and inefficient to update with new information. To address this limitation, retrieval-augmented generation (RAG) has been proposed as a solution that dynamically incorporates external knowledge during inference, improving factual consistency and reducing hallucinations. Despite its promise, RAG systems face practical challenges, most notably a strong dependence on the quality of the input query for accurate retrieval. In this paper, we investigate the sensitivity of different components in the RAG pipeline to various types of query perturbations. Our analysis reveals that the performance of commonly used retrievers can degrade significantly even under minor query variations. We study each module in isolation as well as their combined effect in an end-to-end question answering setting, using both general-domain and domain-specific datasets.…
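To make "retriever sensitivity to minor query variations" concrete, here is a minimal sketch of how one might measure it for the retriever in isolation. It is not the authors' framework: the perturbation (a single adjacent-character swap, i.e. a typo), the metric (recall@k), the retriever (a plain Okapi BM25 implementation standing in for a "commonly used" lexical retriever), and the toy corpus and queries are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus (not from the paper): document id -> passage text.
CORPUS = {
    "d1": "the eiffel tower is located in paris france",
    "d2": "the great wall of china stretches across northern china",
    "d3": "mount everest is the highest mountain above sea level",
}
# Hypothetical (query, gold document id) pairs.
QUERIES = [
    ("where is the eiffel tower", "d1"),
    ("how long is the great wall of china", "d2"),
    ("what is the highest mountain", "d3"),
]


def tokenize(text):
    return text.lower().split()


def swap_chars(word, j):
    """Swap the characters at positions j and j + 1 (an assumed typo model)."""
    return word[:j] + word[j + 1] + word[j] + word[j + 2:]


def perturb_query(query, rng):
    """Inject one adjacent-character swap into a random word of length >= 4.

    The first and last characters stay fixed; the swap is a no-op when the
    two swapped characters happen to be identical.
    """
    words = query.split()
    candidates = [i for i, w in enumerate(words) if len(w) >= 4]
    if not candidates:
        return query
    i = rng.choice(candidates)
    j = rng.randrange(1, len(words[i]) - 2)
    words[i] = swap_chars(words[i], j)
    return " ".join(words)


def bm25_rank(query, corpus, k1=1.5, b=0.75):
    """Rank documents by Okapi BM25; documents matching no query term are dropped."""
    docs = {d: tokenize(t) for d, t in corpus.items()}
    n = len(docs)
    avgdl = sum(len(toks) for toks in docs.values()) / n
    df = Counter(term for toks in docs.values() for term in set(toks))
    scores = {}
    for d, toks in docs.items():
        tf = Counter(toks)
        score = 0.0
        for q in tokenize(query):
            if q in tf:
                idf = math.log(1 + (n - df[q] + 0.5) / (df[q] + 0.5))
                norm = tf[q] + k1 * (1 - b + b * len(toks) / avgdl)
                score += idf * tf[q] * (k1 + 1) / norm
        if score > 0:
            scores[d] = score
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(query, gold, corpus, k=1):
    """1.0 if the gold document appears in the top-k results, else 0.0."""
    return 1.0 if gold in bm25_rank(query, corpus)[:k] else 0.0


def robustness(queries, corpus, k=1, trials=20, seed=0):
    """Mean recall@k on clean queries vs. randomly perturbed copies of them."""
    rng = random.Random(seed)
    clean = sum(recall_at_k(q, g, corpus, k) for q, g in queries) / len(queries)
    perturbed = sum(
        recall_at_k(perturb_query(q, rng), g, corpus, k)
        for _ in range(trials)
        for q, g in queries
    ) / (trials * len(queries))
    return clean, perturbed
```

For example, `recall_at_k("eiffel tower", "d1", CORPUS)` is 1.0, while the typo-only variant `"eifefl twoer"` shares no token with any passage, so an exact-match lexical retriever returns nothing and recall drops to 0.0. On this tiny corpus a single typo usually leaves another distinctive term intact, so the gap between the two means returned by `robustness` is small; the point of the sketch is the measurement pattern (same gold labels, clean vs. perturbed query, compare the averaged metric), which would carry over to a real retriever and dataset.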
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Information Retrieval and Search Behavior
