Curriculum Guided Reinforcement Learning for Efficient Multi Hop Retrieval Augmented Generation

Yuelyu Ji; Rui Meng; Zhuochun Li; Daqing He

arXiv:2505.17391·cs.CL·May 26, 2025

TL;DR

EVO-RAG introduces a curriculum-guided reinforcement learning framework that improves multi-hop retrieval-augmented generation by optimizing query rewriting and search strategies, resulting in higher accuracy and efficiency.

Contribution

The paper proposes EVO-RAG, a novel reinforcement learning approach with curriculum guidance and dynamic rewards for more effective multi-hop retrieval in language models.

Findings

01. Boosts Exact Match by up to 4.6 points on multi-hop QA benchmarks.

02. Reduces average retrieval depth by 15%.

03. Improves both retrieval efficiency and answer accuracy.

Abstract

Retrieval-augmented generation (RAG) grounds large language models (LLMs) in up-to-date external evidence, yet existing multi-hop RAG pipelines still issue redundant subqueries, explore too shallowly, or wander through overly long search chains. We introduce EVO-RAG, a curriculum-guided reinforcement learning framework that evolves a query-rewriting agent from broad early-stage exploration to concise late-stage refinement. EVO-RAG couples a seven-factor, step-level reward vector (covering relevance, redundancy, efficiency, and answer correctness) with a time-varying scheduler that reweights these signals as the episode unfolds. The agent is trained with Direct Preference Optimization over a multi-head reward model, enabling it to learn when to search, backtrack, answer, or refuse. Across four multi-hop QA benchmarks (HotpotQA, 2WikiMultiHopQA, MuSiQue, and Bamboogle), EVO-RAG boosts…
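
To make the reward scheduling idea concrete, here is a minimal sketch of a time-varying scheduler that reweights step-level reward factors over an episode, followed by the standard DPO loss for one preference pair. Everything specific in it is an assumption for illustration: the four factor names stand in for the seven factors the abstract mentions without listing, the EARLY and LATE weight values and the linear interpolation are invented, and the function names are hypothetical. It is not the paper's implementation.

```python
import math

# Hypothetical factor names and weights (assumptions, not taken from the paper).
# EARLY favors relevance (broad exploration); LATE favors efficiency (concise refinement).
EARLY = {"relevance": 0.5, "redundancy": 0.1, "efficiency": 0.1, "correctness": 0.3}
LATE = {"relevance": 0.2, "redundancy": 0.2, "efficiency": 0.3, "correctness": 0.3}


def scheduled_weights(step, max_steps):
    """Linearly interpolate from EARLY to LATE weights as the episode unfolds."""
    t = step / max(max_steps - 1, 1)
    t = min(max(t, 0.0), 1.0)  # clamp progress to [0, 1]
    return {k: (1.0 - t) * EARLY[k] + t * LATE[k] for k in EARLY}


def step_reward(factors, step, max_steps):
    """Collapse a per-step factor dict into one scalar using the scheduled weights."""
    weights = scheduled_weights(step, max_steps)
    return sum(weights[k] * factors[k] for k in weights)


def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(beta * log-ratio margin))."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return math.log1p(math.exp(-margin))


if __name__ == "__main__":
    # The same factor scores earn a different scalar reward early vs. late.
    scores = {"relevance": 0.9, "redundancy": 0.4, "efficiency": 0.2, "correctness": 0.0}
    print(step_reward(scores, step=0, max_steps=5))
    print(step_reward(scores, step=4, max_steps=5))
```

In a setup like this, scalar step rewards could be used to rank candidate actions (search, backtrack, answer, refuse) into chosen and rejected pairs, which then feed the DPO loss. How the paper actually builds its preference pairs from the multi-head reward model is not stated in the abstract.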
