DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation
Jiashuo Sun, Xianrui Zhong, Sizhe Zhou, Jiawei Han

TL;DR
DynamicRAG introduces a reinforcement learning-based reranker that adaptively selects and orders retrieved documents for RAG systems, significantly improving performance on knowledge-intensive tasks by leveraging LLM feedback.
Contribution
It presents a novel dynamic reranking framework that adjusts retrieval based on query context, using RL and LLM response quality as supervisory signals.
Findings
Achieves state-of-the-art results on seven datasets
Demonstrates the effectiveness of adaptive reranking
Outperforms models with similar parameter sizes
Abstract
Retrieval-augmented generation (RAG) systems combine large language models (LLMs) with external knowledge retrieval, making them highly effective for knowledge-intensive tasks. A crucial but often under-explored component of these systems is the reranker. Since irrelevant documents in RAG systems can mislead the generator, the reranker plays a vital role in refining retrieved documents to enhance generation quality and explainability. However, it is challenging to determine the appropriate number of documents (k) that the reranker should select: too few may result in missing critical information, while too many introduce noise and inefficiencies. Although recent studies have explored LLM-based rerankers, they primarily leverage internal model knowledge and overlook the rich supervisory signals that LLMs can provide, such as using response quality as feedback for optimizing reranking…
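To make the trade-off in choosing k concrete, here is a minimal Python sketch of how response quality could serve as a feedback signal for document selection. It is not the authors' implementation: the token-F1 quality measure, the per-document cost `cost_per_doc`, the exhaustive search over prefixes in `best_k`, and the toy `echo_generator` are all assumptions made for illustration. The paper trains a reranker with reinforcement learning rather than searching at inference time.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between two strings (an assumed stand-in for response quality)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def selection_reward(
    docs: Sequence[str],
    generate: Callable[[Sequence[str]], str],
    reference: str,
    cost_per_doc: float = 0.05,
) -> float:
    """Hypothetical reward: answer quality minus a small cost for each selected document."""
    answer = generate(docs)
    return token_f1(answer, reference) - cost_per_doc * len(docs)


def best_k(
    ranked_docs: Sequence[str],
    generate: Callable[[Sequence[str]], str],
    reference: str,
    cost_per_doc: float = 0.05,
) -> Tuple[int, float]:
    """Score every prefix of the ranked list and return the (k, reward) pair with the highest reward."""
    if not ranked_docs:
        raise ValueError("ranked_docs must contain at least one document")
    scored: List[Tuple[int, float]] = [
        (k, selection_reward(ranked_docs[:k], generate, reference, cost_per_doc))
        for k in range(1, len(ranked_docs) + 1)
    ]
    return max(scored, key=lambda pair: pair[1])


def echo_generator(docs: Sequence[str]) -> str:
    """Toy generator that just concatenates its context; a real system would call an LLM."""
    return " ".join(docs)


if __name__ == "__main__":
    ranked = [
        "paris is the capital",   # relevant
        "of france",              # relevant
        "bananas are yellow",     # noise
        "cats sleep often",       # noise
    ]
    k, reward = best_k(ranked, echo_generator, "paris is the capital of france")
    print(k, round(reward, 3))
```

On this toy input the search settles on k = 2: one document misses part of the answer, while the third and fourth add only noise that lowers precision. A learned reranker would aim to predict such a selection without access to the reference answer.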
Peer Reviews
Decision: NeurIPS 2025 poster
**Strengths:** DynamicRAG adapts the passage budget *k* to the difficulty of each query, trimming irrelevant documents and spotlighting the most salient evidence. Its reinforcement-learning training optimises directly for answer quality, making the system more robust than fixed-threshold or static-score methods and less sensitive to hyperparameter tuning. **Weaknesses:** DynamicRAG’s reward is computed directly from the generator’s output, so the reward distribution itself shifts whenever the g
Strength: 1. Clear motivation: The authors highlight that relying on a fixed number of retrieved documents (k) inherently fails to balance the trade-off between information loss (when k is too small) and noise introduction (when k is too large). This is a valuable and often overlooked insight in existing research. 2. Targeted method design: The paper proposes a dynamic reranking mechanism built on a reinforcement learning framework, allowing the reranker to adaptively adjust both the number and
### Strengths
- The two-stage training idea is a straightforward way to improve RAG systems affected by irrelevant noise. The ablation studies verify the effectiveness of each component in DynamicRAG.
- Leveraging LLMs' feedback as supervision for reranking is interesting, and the use of DPO makes sense.
- The empirical results look good, with many popular datasets being included.
### Weaknesses
- Relying on relatively small LLMs (LLaMA2-7B, 13B, LLaMA3-8B) for evaluation limits generality to larger, more
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Dropout · Layer Normalization · Byte Pair Encoding · Attention Dropout · Softmax · Residual Connection · WordPiece
