Advancing Multi-Agent RAG Systems with Minimalist Reinforcement Learning
Yihong Wu, Liheng Ma, Muzhi Li, Jiaming Zhou, Lei Ding, Jianye Hao, Ho-fung Leung, Irwin King, Yingxue Zhang, Jian-Yun Nie

TL;DR
This paper introduces Mujica-MyGo, a framework combining multi-agent RAG workflows and a lightweight reinforcement learning algorithm to improve multi-turn reasoning in LLMs, addressing context-length limitations.
Contribution
The paper proposes Mujica-MyGo, which pairs a divide-and-conquer multi-agent RAG system with a minimalist reinforcement learning method to improve efficiency and performance on complex reasoning tasks.
Findings
Mujica-MyGo outperforms existing methods on diverse QA benchmarks.
The MyGO algorithm converges to the optimal policy with theoretical guarantees.
Empirical results show improved reasoning while using shorter contexts.
Abstract
Large Language Models (LLMs) equipped with modern Retrieval-Augmented Generation (RAG) systems often employ multi-turn interaction pipelines to interface with search engines for complex reasoning tasks. However, such multi-turn interactions inevitably produce long intermediate contexts, as context length grows exponentially with exploration depth. This leads to a well-known limitation of LLMs: their difficulty in effectively leveraging information from long contexts. This problem is further amplified in RAG systems that depend on in-context learning, where few-shot demonstrations must also be included in the prompt, compounding the context-length bottleneck. To address these challenges, we propose Mujica-MyGo, a unified framework for efficient multi-turn reasoning in RAG. Inspired by the divide-and-conquer principle, we introduce Mujica (Multi-hop Joint Intelligence for Complex Question…
