Advancing Multi-Agent RAG Systems with Minimalist Reinforcement Learning

Yihong Wu; Liheng Ma; Muzhi Li; Jiaming Zhou; Lei Ding; Jianye Hao; Ho-fung Leung; Irwin King; Yingxue Zhang; Jian-Yun Nie

arXiv:2505.17086 · cs.CL · April 15, 2026

TL;DR

This paper introduces Mujica-MyGo, a framework combining multi-agent RAG workflows and a lightweight reinforcement learning algorithm to improve multi-turn reasoning in LLMs, addressing context-length limitations.

Contribution

The paper proposes Mujica-MyGo, which pairs a divide-and-conquer multi-agent RAG system with a minimalist reinforcement learning method to improve efficiency and performance on complex reasoning tasks.

Findings

01. Mujica-MyGo outperforms existing methods on diverse QA benchmarks.

02. The MyGO algorithm converges to the optimal policy with theoretical guarantees.

03. Empirical results show improved reasoning with reduced context length.

Abstract

Large Language Models (LLMs) equipped with modern Retrieval-Augmented Generation (RAG) systems often employ multi-turn interaction pipelines to interface with search engines for complex reasoning tasks. However, such multi-turn interactions inevitably produce long intermediate contexts, as context length grows exponentially with exploration depth. This leads to a well-known limitation of LLMs: their difficulty in effectively leveraging information from long contexts. This problem is further amplified in RAG systems that depend on in-context learning, where few-shot demonstrations must also be included in the prompt, compounding the context-length bottleneck. To address these challenges, we propose Mujica-MyGo, a unified framework for efficient multi-turn reasoning in RAG. Inspired by the divide-and-conquer principle, we introduce Mujica (Multi-hop Joint Intelligence for Complex Question…
