MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism

Shulin Liu; Dong Du; Tao Yang; Yang Li; Boyu Qiu

arXiv:2511.11373·cs.AI·November 17, 2025

TL;DR

MarsRL is a reinforcement learning framework that enhances multi-agent reasoning systems by jointly optimizing agents with pipeline parallelism, significantly improving accuracy on reasoning benchmarks.

Contribution

Introduces MarsRL, a novel RL framework with agent-specific rewards and pipeline training to improve multi-agent reasoning in open-source models.

Findings

01. Improves AIME2025 accuracy from 86.5% to 93.3%.

02. Enhances BeyondAIME accuracy from 64.9% to 73.8%.

03. Surpasses larger models in reasoning tasks.

Abstract

Recent progress in large language models (LLMs) has been propelled by reinforcement learning with verifiable rewards (RLVR) and test-time scaling. However, the limited output length of LLMs constrains the depth of reasoning attainable in a single inference process. Multi-agent reasoning systems offer a promising alternative by employing multiple agents, including a Solver, a Verifier, and a Corrector, to iteratively refine solutions. While such systems are effective with closed-source models like Gemini 2.5 Pro, they struggle to generalize to open-source models because of insufficient critique and correction capabilities. To address this, we propose MarsRL, a novel reinforcement learning framework with agentic pipeline parallelism, designed to jointly optimize all agents in the system. MarsRL introduces agent-specific reward mechanisms to mitigate reward noise and employs pipeline-inspired training to enhance…
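The Solver, Verifier, and Corrector roles named in the abstract can be illustrated with a minimal inference-time sketch. This is not the authors' code, and it does not cover the RL training, the agent-specific rewards, or the pipeline parallelism. The names `Verdict`, `refine`, `max_rounds`, and the three toy agents are hypothetical, chosen only for illustration; in a real system each agent would be an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    """Hypothetical Verifier output: a pass/fail flag plus free-text feedback."""
    is_correct: bool
    feedback: str = ""


# Hypothetical agent signatures; each would wrap a model call in practice.
Solver = Callable[[str], str]
Verifier = Callable[[str, str], Verdict]
Corrector = Callable[[str, str, str], str]


def refine(
    problem: str,
    solve: Solver,
    verify: Verifier,
    correct: Corrector,
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Run a solve -> verify -> correct loop for at most max_rounds rounds.

    Returns the final solution and the list of every candidate produced.
    """
    solution = solve(problem)
    history = [solution]
    for _ in range(max_rounds):
        verdict = verify(problem, solution)
        if verdict.is_correct:
            break  # the Verifier accepted the current candidate
        solution = correct(problem, solution, verdict.feedback)
        history.append(solution)
    return solution, history


# Toy agents (assumptions for the demo only): the problem is "a + b".
def toy_solve(problem: str) -> str:
    return "41"  # deliberately wrong first attempt


def toy_verify(problem: str, solution: str) -> Verdict:
    expected = str(sum(int(term) for term in problem.split("+")))
    if solution == expected:
        return Verdict(True)
    return Verdict(False, f"expected {expected}")


def toy_correct(problem: str, solution: str, feedback: str) -> str:
    # Reads the suggested value from the Verifier's feedback string.
    return feedback.split()[-1]


if __name__ == "__main__":
    final, attempts = refine("17 + 25", toy_solve, toy_verify, toy_correct)
    print(final, attempts)
```

In this sketch the loop stops as soon as the Verifier accepts a candidate, so the quality of the result depends on how reliable the Verifier and Corrector are, which is the weakness in open-source models that the abstract points to.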

Code & Models

Models

🤗 forestliutc/MarsRL (model · 6 dl · ♡ 5)

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Topic Modeling