Curriculum Guided Reinforcement Learning for Efficient Multi Hop Retrieval Augmented Generation
Yuelyu Ji, Rui Meng, Zhuochun Li, Daqing He

TL;DR
EVO-RAG introduces a curriculum-guided reinforcement learning framework that improves multi-hop retrieval-augmented generation by optimizing query rewriting and search strategies, resulting in higher accuracy and efficiency.
Contribution
The paper proposes EVO-RAG, a novel reinforcement learning approach with curriculum guidance and dynamic rewards for more effective multi-hop retrieval in language models.
Findings
Boosts Exact Match by up to 4.6 points on multi-hop QA benchmarks.
Reduces average retrieval depth by 15%.
Enhances retrieval efficiency and answer accuracy.
Abstract
Retrieval-augmented generation (RAG) grounds large language models (LLMs) in up-to-date external evidence, yet existing multi-hop RAG pipelines still issue redundant subqueries, explore too shallowly, or wander through overly long search chains. We introduce EVO-RAG, a curriculum-guided reinforcement learning framework that evolves a query-rewriting agent from broad early-stage exploration to concise late-stage refinement. EVO-RAG couples a seven-factor, step-level reward vector (covering relevance, redundancy, efficiency, and answer correctness) with a time-varying scheduler that reweights these signals as the episode unfolds. The agent is trained with Direct Preference Optimization over a multi-head reward model, enabling it to learn when to search, backtrack, answer, or refuse. Across four multi-hop QA benchmarks (HotpotQA, 2WikiMultiHopQA, MuSiQue, and Bamboogle), EVO-RAG boosts…
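The time-varying scheduler described in the abstract can be pictured with a small sketch. This is not the authors' implementation. The four keys below are the reward categories named in the abstract; the paper's seven individual factors are not enumerated here. The weight values, the linear interpolation, and the names `EARLY_WEIGHTS`, `LATE_WEIGHTS`, `scheduled_weights`, and `scalar_reward` are all assumptions chosen only to illustrate shifting emphasis from broad exploration to concise refinement as an episode progresses.

```python
from typing import Dict

# Hypothetical weights per reward category (categories from the abstract;
# the numeric values are illustrative assumptions, not from the paper).
EARLY_WEIGHTS: Dict[str, float] = {
    "relevance": 1.0,           # early steps: favor finding relevant evidence
    "redundancy": 0.2,
    "efficiency": 0.1,
    "answer_correctness": 0.5,
}
LATE_WEIGHTS: Dict[str, float] = {
    "relevance": 0.5,
    "redundancy": 1.0,          # late steps: penalize repeated subqueries more
    "efficiency": 1.0,          # and long search chains
    "answer_correctness": 1.0,
}


def scheduled_weights(step: int, max_steps: int) -> Dict[str, float]:
    """Interpolate linearly from EARLY_WEIGHTS to LATE_WEIGHTS.

    The linear schedule is an assumption; the source only says the
    signals are reweighted as the episode unfolds.
    """
    if max_steps <= 1:
        progress = 1.0
    else:
        progress = min(max(step / (max_steps - 1), 0.0), 1.0)
    return {
        name: (1.0 - progress) * EARLY_WEIGHTS[name] + progress * LATE_WEIGHTS[name]
        for name in EARLY_WEIGHTS
    }


def scalar_reward(reward_vector: Dict[str, float], step: int, max_steps: int) -> float:
    """Collapse a step-level reward vector into one scalar using the schedule."""
    weights = scheduled_weights(step, max_steps)
    return sum(weights[name] * reward_vector[name] for name in weights)


# Example: a relevant but redundant subquery, scored early and late in an episode.
step_rewards = {
    "relevance": 1.0,
    "redundancy": -1.0,
    "efficiency": 0.0,
    "answer_correctness": 0.0,
}
early_score = scalar_reward(step_rewards, step=0, max_steps=5)
late_score = scalar_reward(step_rewards, step=4, max_steps=5)
```

Under these assumed weights, the same relevant but redundant subquery scores positively at the first step and negatively at the last, which is the kind of shift from exploration to refinement the abstract describes.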
