MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
Shulin Liu, Dong Du, Tao Yang, Yang Li, Boyu Qiu

TL;DR
MarsRL is a reinforcement learning framework that enhances multi-agent reasoning systems by jointly optimizing agents with pipeline parallelism, significantly improving accuracy on reasoning benchmarks.
Contribution
Introduces MarsRL, a novel RL framework with agent-specific rewards and pipeline training to improve multi-agent reasoning in open-source models.
Findings
Improves AIME2025 accuracy from 86.5% to 93.3%.
Enhances BeyondAIME accuracy from 64.9% to 73.8%.
Surpasses larger models in reasoning tasks.
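As a quick arithmetic check on the two reported improvements, the sketch below computes the absolute gain in percentage points and the relative reduction in error rate. The relative error reduction is a derived figure and is not reported in the listing; the helper name `gains` is hypothetical.

```python
# Accuracy figures (percent) as reported in the Findings above.
results = {
    "AIME2025": (86.5, 93.3),
    "BeyondAIME": (64.9, 73.8),
}

def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative error reduction in percent)."""
    absolute = round(after - before, 1)
    # Error rate is 100 - accuracy; this is the share of prior errors removed.
    relative_error_reduction = round(100 * (after - before) / (100 - before), 1)
    return absolute, relative_error_reduction

for name, (before, after) in results.items():
    absolute, rel = gains(before, after)
    print(f"{name}: +{absolute} points, {rel}% fewer errors")
```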
Abstract
Recent progress in large language models (LLMs) has been propelled by reinforcement learning with verifiable rewards (RLVR) and test-time scaling. However, the limited output length of LLMs constrains the depth of reasoning attainable in a single inference process. Multi-agent reasoning systems offer a promising alternative by employing multiple agents, including a Solver, a Verifier, and a Corrector, to iteratively refine solutions. While effective with closed-source models like Gemini 2.5 Pro, these systems struggle to generalize to open-source models due to insufficient critic and correction capabilities. To address this, we propose MarsRL, a novel reinforcement learning framework with agentic pipeline parallelism, designed to jointly optimize all agents in the system. MarsRL introduces agent-specific reward mechanisms to mitigate reward noise and employs pipeline-inspired training to enhance…
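The Solver, Verifier, and Corrector loop described in the abstract could be sketched roughly as follows. This is a minimal illustration of iterative refinement at inference time, not the paper's implementation: the function `refine`, the `Verdict` type, the `max_rounds` parameter, and the toy agents are all assumptions made for this example, and the RL training procedure itself is not shown.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Verdict:
    accepted: bool
    feedback: str = ""

def refine(
    problem: Any,
    solver: Callable[[Any], Any],
    verifier: Callable[[Any, Any], Verdict],
    corrector: Callable[[Any, Any, str], Any],
    max_rounds: int = 3,
) -> tuple[Any, int]:
    """Return (solution, number of correction rounds used)."""
    solution = solver(problem)
    for round_idx in range(max_rounds):
        verdict = verifier(problem, solution)
        if verdict.accepted:
            return solution, round_idx
        # Hand the verifier's feedback to the corrector and try again.
        solution = corrector(problem, solution, verdict.feedback)
    # Budget exhausted: the last correction is returned without re-verification.
    return solution, max_rounds

# Toy stand-ins for the three agents (hypothetical; real agents would be LLM calls).
def toy_solver(problem):
    a, b = problem
    return a * b  # deliberately wrong: multiplies instead of adding

def toy_verifier(problem, solution):
    a, b = problem
    ok = solution == a + b
    return Verdict(ok, "" if ok else "expected the sum of the operands")

def toy_corrector(problem, solution, feedback):
    a, b = problem
    return a + b

print(refine((2, 3), toy_solver, toy_verifier, toy_corrector))
```

In this toy run the first answer is rejected, corrected once, and then accepted on the second verification pass.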
Taxonomy
Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Topic Modeling
