How You Ask Matters! Adaptive RAG Robustness to Query Variations
Yunah Jang, Megha Sundriyal, Kyomin Jung, and Meeyoung Cha

TL;DR
This paper introduces a large-scale benchmark for evaluating how Adaptive RAG systems handle semantically identical query variations. It reveals a significant robustness gap: small surface-level changes alter retrieval behavior and accuracy, and robustness does not improve with model size even though overall performance does.
Contribution
The paper presents the first large-scale benchmark for assessing Adaptive RAG robustness to surface-level query variations, combining human-written and model-generated rewrites, and uses it to highlight critical vulnerabilities in current methods.
Findings
Small surface-level changes to a query significantly alter retrieval behavior and accuracy.
Larger models perform better overall, but their robustness does not improve accordingly.
Adaptive RAG systems are highly vulnerable to semantically identical query variations.
Abstract
Adaptive Retrieval-Augmented Generation (RAG) promises accuracy and efficiency by dynamically triggering retrieval only when needed and is widely used in practice. However, real-world queries vary in surface form even with the same intent, and their impact on Adaptive RAG remains under-explored. We introduce the first large-scale benchmark of diverse yet semantically identical query variations, combining human-written and model-generated rewrites. Our benchmark facilitates a systematic evaluation of Adaptive RAG robustness by examining its key components across three dimensions: answer quality, computational cost, and retrieval decisions. We discover a critical robustness gap, where small surface-level changes in queries dramatically alter retrieval behavior and accuracy. Although larger models show better performance, robustness does not improve accordingly. These findings reveal that…
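To make the three evaluation dimensions concrete, here is a minimal Python sketch of how one might compare a system's behavior on an original query against its semantically identical rewrites. The `VariantResult` record, the `robustness_summary` function, and the three flip/cost statistics are illustrative assumptions, not the metrics or code used in the paper.

```python
from dataclasses import dataclass


@dataclass
class VariantResult:
    """Outcome of running one query form through an Adaptive RAG system (hypothetical record)."""
    correct: bool          # answer quality: was the answer judged correct?
    retrieved: bool        # retrieval decision: did the system trigger retrieval?
    retrieval_calls: int   # computational cost: number of retrieval calls made


def robustness_summary(original: VariantResult, variants: list[VariantResult]) -> dict[str, float]:
    """Summarize how much behavior changes across rewrites of the same question.

    All three statistics are assumed, illustrative measures:
    - answer_flip_rate: share of rewrites whose correctness differs from the original
    - decision_flip_rate: share of rewrites whose retrieve/skip decision differs
    - mean_extra_calls: average change in retrieval calls relative to the original
    """
    if not variants:
        raise ValueError("at least one query variant is required")
    n = len(variants)
    answer_flips = sum(v.correct != original.correct for v in variants)
    decision_flips = sum(v.retrieved != original.retrieved for v in variants)
    extra_calls = sum(v.retrieval_calls - original.retrieval_calls for v in variants)
    return {
        "answer_flip_rate": answer_flips / n,
        "decision_flip_rate": decision_flips / n,
        "mean_extra_calls": extra_calls / n,
    }


# Made-up example: the original query is answered correctly without retrieval.
original = VariantResult(correct=True, retrieved=False, retrieval_calls=0)
variants = [
    VariantResult(correct=True, retrieved=False, retrieval_calls=0),
    VariantResult(correct=False, retrieved=True, retrieval_calls=2),
    VariantResult(correct=True, retrieved=True, retrieval_calls=1),
    VariantResult(correct=False, retrieved=False, retrieval_calls=0),
]
summary = robustness_summary(original, variants)
```

A perfectly robust system would score zero on all three statistics, since every rewrite carries the same intent as the original query.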
