SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback
Jingsheng Gao, Linxu Li, Weiyuan Li, Yuzhuo Fu, Bin Dai

TL;DR
SmartRAG introduces a jointly optimized pipeline for RAG systems, integrating modules like retrieval, query rewriting, and answer generation through reinforcement learning to enhance overall performance and efficiency.
Contribution
This work presents a novel joint optimization framework for RAG modules, improving their coordination and performance compared to traditional separately trained systems.
Findings
Jointly optimized SmartRAG outperforms separately trained systems.
Reinforcement learning effectively coordinates modules for better results.
System reduces retrieval costs while maintaining high accuracy.
Abstract
RAG systems consist of multiple modules that work together. However, these modules are usually trained separately. We argue that a system like RAG that incorporates multiple modules should be jointly optimized to achieve optimal performance. To demonstrate this, we design a specific pipeline called SmartRAG that includes a policy network and a retriever. The policy network can serve as 1) a decision maker that decides when to retrieve, 2) a query rewriter that generates the query best suited to the retriever, and 3) an answer generator that produces the final response with or without the retrieved observations. We then propose to jointly optimize the whole system using a reinforcement learning algorithm, with the reward designed to encourage the system to achieve the best performance with minimal retrieval cost. When jointly optimized, all the modules can be aware of how other modules are working…
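The control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Action`, `run_episode`, `reward`, `toy_policy`, and `toy_retriever` are hypothetical, the policy here is a hand-written rule rather than a trained network, and the reward form (exact-match score minus a fixed cost per retrieval) is an assumed stand-in for "best performance with minimal retrieval cost".

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    kind: str  # "retrieve" (text is a rewritten query) or "answer" (text is the answer)
    text: str


# Hypothetical interfaces: one policy plays all three roles named in the abstract.
Policy = Callable[[str, List[str]], Action]
Retriever = Callable[[str], List[str]]


def run_episode(question: str, policy: Policy, retriever: Retriever,
                max_retrievals: int = 1) -> Tuple[str, int]:
    """Roll out one question; return (final answer, number of retrievals used)."""
    observations: List[str] = []
    retrievals = 0
    while retrievals < max_retrievals:
        action = policy(question, observations)
        if action.kind == "answer":
            return action.text, retrievals
        observations.extend(retriever(action.text))
        retrievals += 1
    # Budget spent: whatever the policy emits next is taken as the answer.
    return policy(question, observations).text, retrievals


def reward(answer: str, gold: str, retrievals: int, cost: float = 0.2) -> float:
    """Assumed reward: exact-match correctness minus a per-retrieval penalty."""
    return float(answer.strip().lower() == gold.strip().lower()) - cost * retrievals


# Toy stand-ins so the sketch runs end to end.
KB = {"capital of france": ["Paris is the capital of France."]}


def toy_retriever(query: str) -> List[str]:
    return KB.get(query, [])


def toy_policy(question: str, observations: List[str]) -> Action:
    if observations:  # answer from the first retrieved passage
        return Action("answer", observations[0].split()[0])
    if question == "2+2?":  # "known" without retrieval
        return Action("answer", "4")
    return Action("retrieve", question.lower().rstrip("?"))  # rewrite the query
```

Under this assumed reward, a correct answer given without retrieval scores higher than the same answer obtained after a retrieval, which is the trade-off a reinforcement learning algorithm would be optimizing across the decision, rewriting, and answering roles.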
Peer Reviews
Decision: ICLR 2025 Poster
a) The paper introduces reinforcement learning to jointly optimize RAG components, integrating retrieval decisions, query rewriting, and answer generation as policy-driven actions, representing progress in RAG architectures. b) The introduction of reinforcement learning further demonstrates the flexibility of RAG and explores the potential of integrating RAG with other AI technologies. c) Evaluation across several datasets demonstrates the effectiveness of the approach.
a) Many existing works address the question of "when to retrieve" in RAG, but in Section 4.1 only one baseline was selected for comparison. Incorporating a broader set of baselines into these studies would enhance comparative and analytical insights. b) The current baselines primarily apply process-wide optimizations, such as fine-tuning the generator, without optimizing individual modules (e.g., rewriter, generator, and decision-maker) separately. This setup limits the ability to dem
1. The main contribution of this work is the end-to-end training of the RAG framework. The authors do a good job of analyzing the proposed framework (Sections 4.1-4.4). 2. The paper is generally well written.
The proposed method is not generalizable. 1. The proposed method is very domain specific and requires training the entire module for each dataset. 2. The generalizability of the approach is further curtailed by its dependence on datasets with true final answers, which are needed for SFT and PPO. 3. The end-to-end framework requires complete retraining if any part of the framework is changed.
1. The proposed SmartRAG is clear and easy to understand. Some of the designs are intuitive. 2. The paper writing is relatively smooth and easy to follow.
1. Some of the setups of SmartRAG are not well justified. Please take a look at the questions. 2. Certain experimental observations are not explained clearly. Please take a look at the questions. 3. The RAG baselines do not seem extensive compared with the references on this topic.
Taxonomy
Topics: Inertial Sensor and Navigation · Context-Aware Activity Recognition Systems · Robotics and Automated Systems
Methods: Attention Is All You Need · Linear Layer · Dropout · Dense Connections · Weight Decay · Byte Pair Encoding · BART · Layer Normalization · Residual Connection
