TempPerturb-Eval: On the Joint Effects of Internal Temperature and External Perturbations in RAG Robustness
Yongxin Zhou, Philippe Mulhem, Didier Schwab

TL;DR
This paper systematically investigates how internal temperature settings and external text perturbations interact to affect the robustness of Retrieval-Augmented Generation systems, providing new diagnostic tools and guidelines.
Contribution
It introduces a comprehensive analysis framework and benchmark for studying how text perturbations and temperature settings interact in RAG systems, and derives diagnostic tools and practical guidelines from the results.
Findings
High-temperature settings increase vulnerability to perturbations.
Certain perturbation types show non-linear sensitivity across temperatures.
Performance degradation patterns vary with perturbation and temperature combinations.
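One way to build intuition for the first finding is to look at what temperature does mechanically. Temperature T rescales the next-token logits z before the softmax, p_i(T) = exp(z_i / T) / Σ_j exp(z_j / T). Raising T flattens the distribution and increases its entropy, which spreads probability mass onto lower-ranked tokens. A plausible (but not verified here) reading is that this gives noise in the retrieved context more room to steer generation. The sketch below uses hypothetical logits and illustrates only the sampling mechanism, not the paper's experiments.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature); temperature must be > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical next-token logits
    for t in (0.2, 0.7, 1.0, 1.5):
        probs = softmax_with_temperature(logits, t)
        print(f"T={t}: top-token p={probs[0]:.3f}, entropy={entropy(probs):.3f}")
```

As T grows, the top token's probability shrinks and the entropy rises. Many LLM APIs treat T = 0 as greedy decoding instead of dividing by zero, which is why the function rejects non-positive values rather than special-casing them.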
Abstract
The evaluation of Retrieval-Augmented Generation (RAG) systems typically examines retrieval quality and generation parameters like temperature in isolation, overlooking their interaction. This work presents a systematic investigation of how text perturbations (simulating noisy retrieval) interact with temperature settings across multiple LLM runs. We propose a comprehensive RAG Perturbation-Temperature Analysis Framework that subjects retrieved documents to three distinct perturbation types across varying temperature settings. Through extensive experiments on HotpotQA with both open-source and proprietary LLMs, we demonstrate that performance degradation follows distinct patterns: high-temperature settings consistently amplify vulnerability to perturbations, while certain perturbation types exhibit non-linear sensitivity across the temperature range. Our work yields three key…
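The framework described above crosses perturbation types with temperature settings and repeats each configuration over several LLM runs. The sketch below shows that experimental grid under loudly stated assumptions. The abstract does not name the three perturbation types, so `drop_words`, `swap_adjacent_chars` and `shuffle_sentences` are hypothetical stand-ins. `generate` is a placeholder for any LLM call, and exact match on lowercased answers is an illustrative metric, not necessarily the paper's.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL perturbations for illustration only; not the paper's three types.
def drop_words(text: str, rng: random.Random, rate: float = 0.2) -> str:
    """Delete each word independently with probability `rate`."""
    return " ".join(w for w in text.split() if rng.random() >= rate)

def swap_adjacent_chars(text: str, rng: random.Random, rate: float = 0.1) -> str:
    """Swap non-overlapping pairs of neighbouring characters (typo-like noise)."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so swaps never overlap
        else:
            i += 1
    return "".join(chars)

def shuffle_sentences(text: str, rng: random.Random) -> str:
    """Randomly reorder sentences split on '. '."""
    sentences = text.split(". ")
    rng.shuffle(sentences)
    return ". ".join(sentences)

PERTURBATIONS: Dict[str, Callable[[str, random.Random], str]] = {
    "none": lambda text, rng: text,  # unperturbed baseline
    "drop_words": drop_words,
    "swap_chars": swap_adjacent_chars,
    "shuffle_sentences": shuffle_sentences,
}

# Placeholder LLM interface: (question, documents, temperature, seed) -> answer.
Generator = Callable[[str, List[str], float, int], str]
Example = Tuple[str, List[str], str]  # (question, retrieved documents, gold answer)

def exact_match(prediction: str, gold: str) -> bool:
    return prediction.strip().lower() == gold.strip().lower()

def run_grid(examples: List[Example], generate: Generator,
             temperatures: List[float], runs: int = 3,
             seed: int = 0) -> Dict[Tuple[str, float], Tuple[float, float]]:
    """Map (perturbation, temperature) -> (mean EM, stdev of EM across runs)."""
    results = {}
    for name, perturb in PERTURBATIONS.items():
        for temperature in temperatures:
            run_scores = []
            for run in range(runs):
                # Seeded by run index, so every temperature sees identical
                # noise for a given perturbation and run.
                rng = random.Random(seed + run)
                hits = sum(
                    exact_match(generate(q, [perturb(d, rng) for d in docs],
                                         temperature, run), gold)
                    for q, docs, gold in examples
                )
                run_scores.append(hits / len(examples))
            spread = statistics.stdev(run_scores) if runs > 1 else 0.0
            results[(name, temperature)] = (statistics.mean(run_scores), spread)
    return results

# Usage with a toy generator that ignores temperature (stand-in for a real LLM).
def toy_generate(question: str, docs: List[str], temperature: float, seed: int) -> str:
    return "Paris" if "Paris" in " ".join(docs) else "unknown"

if __name__ == "__main__":
    examples = [("Capital of France?",
                 ["Paris is the capital of France. It is large."], "Paris")]
    scores = run_grid(examples, toy_generate, temperatures=[0.0, 1.0], runs=3)
    print(scores[("none", 0.0)])  # (1.0, 0.0)
```

Reporting the spread across runs alongside the mean matters here: at high temperature a real model's answers vary between runs, so a single run can hide or exaggerate the degradation caused by a perturbation.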
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Data Visualization and Analytics
