Investigating the Robustness of Retrieval-Augmented Generation at the Query Level

Sezen Perçin; Xin Su; Qutub Sha Syed; Phillip Howard; Aleksei Kuvshinov; Leo Schwinn; Kay-Ulrich Scholl

arXiv:2507.06956·cs.CL·July 10, 2025

TL;DR

This paper examines how small changes in input queries affect the performance of retrieval-augmented generation systems, highlighting their sensitivity and proposing a framework for robustness evaluation.

Contribution

It introduces a systematic evaluation framework for query-level robustness in RAG systems and provides extensive experimental analysis with practical recommendations.

Findings

01

Retrievers' performance degrades significantly with minor query perturbations

02

End-to-end RAG systems are highly sensitive to input variations

03

Proposed evaluation framework enables systematic robustness assessment

Abstract

Large language models (LLMs) are very costly and inefficient to update with new information. To address this limitation, retrieval-augmented generation (RAG) has been proposed as a solution that dynamically incorporates external knowledge during inference, improving factual consistency and reducing hallucinations. Despite its promise, RAG systems face practical challenges, most notably a strong dependence on the quality of the input query for accurate retrieval. In this paper, we investigate the sensitivity of different components in the RAG pipeline to various types of query perturbations. Our analysis reveals that the performance of commonly used retrievers can degrade significantly even under minor query variations. We study each module in isolation as well as their combined effect in an end-to-end question answering setting, using both general-domain and domain-specific datasets.…
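
As an illustration only, the kind of query-level check the abstract describes can be sketched in Python. This is not the authors' framework: the toy token-overlap retriever, the CORPUS list, and the helper names retrieve, swap_typo, and topk_overlap are all assumptions introduced here to show how a small query perturbation can be compared against the clean query's top-k results.

```python
import random
from collections import Counter

# Hypothetical three-document corpus, used only for this sketch.
CORPUS = [
    "retrieval augmented generation adds external documents at inference",
    "dense retrievers embed queries and documents into vectors",
    "bananas are rich in potassium",
]

def score(query, doc):
    """Toy relevance score: number of lowercase tokens shared by query and doc."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return sum(min(q[t], d[t]) for t in q)

def retrieve(query, corpus, k=2):
    """Return the indices of the k best-scoring documents (ties broken by index)."""
    ranked = sorted(range(len(corpus)), key=lambda i: (-score(query, corpus[i]), i))
    return ranked[:k]

def swap_typo(query, rng):
    """Perturb the query by swapping two adjacent characters in one word.

    Only words with at least four characters are candidates. If the two
    swapped characters are identical, the query comes back unchanged.
    """
    words = query.split()
    candidates = [i for i, w in enumerate(words) if len(w) >= 4]
    if not candidates:
        return query
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(len(w) - 1)
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

def topk_overlap(clean_ids, perturbed_ids):
    """Fraction of the clean top-k that survives in the perturbed top-k."""
    return len(set(clean_ids) & set(perturbed_ids)) / len(clean_ids)

# Compare retrieval for a clean query against a few perturbed variants.
rng = random.Random(0)
clean_query = "how do dense retrievers embed queries"
clean_ids = retrieve(clean_query, CORPUS)
for _ in range(3):
    noisy_query = swap_typo(clean_query, rng)
    noisy_ids = retrieve(noisy_query, CORPUS)
    print(noisy_query, topk_overlap(clean_ids, noisy_ids))
```

An exact-match lexical scorer like this one loses a term entirely when it is misspelled, so it makes the effect easy to see; a real study would substitute actual retrievers, datasets, and perturbation types.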

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Information Retrieval and Search Behavior