MARTI-MARS$^2$: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation

Shijie Wang; Pengfei Li; Yikun Fu; Kaifeng Liu; Fangyuan Li; Yang Liu; Xiaowei Sun; Zonglin Li; Siyao Zhao; Jian Zhao; Kai Tian; Dong Li; Junqi Gao; Yutong Zhang; Yiqun Chen; Yuqiang Li; Zoe Li; Weinan Zhang; Peng Ye; Shuyue Hu; Lei Bai; Bowen Zhou; Kaiyan Zhang; and Biqing Qi

arXiv:2602.07848 · cs.LG · February 10, 2026

Open Access

TL;DR

This paper introduces MARTI-MARS$^2$, a multi-agent reinforcement learning framework for code generation that uses collaborative exploration and heterogeneous agent training to surpass single-agent performance limits, and reports significant improvements on code generation benchmarks.

Contribution

It presents a multi-agent reinforcement learning framework that formulates collaborative exploration as a dynamic, learnable environment and trains heterogeneous agents, enabling scalable and diverse code generation.

Findings

01

Achieves 77.7% on code generation benchmarks with 32B models.

02

Demonstrates that multi-agent collaboration outperforms single-agent approaches.

03

Reveals a scaling law where increasing agent heterogeneity enhances performance.

Abstract

While the complex reasoning capability of Large Language Models (LLMs) has attracted significant attention, single-agent systems often encounter inherent performance ceilings in complex tasks such as code generation. Multi-agent collaboration offers a promising avenue to transcend these boundaries. However, existing frameworks typically rely on prompt-based test-time interactions or multi-role configurations trained with homogeneous parameters, limiting error correction capabilities and strategic diversity. In this paper, we propose a Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS$^2$), which integrates policy learning with multi-agent tree search by formulating the multi-agent collaborative exploration process as a dynamic and learnable environment. By allowing agents to iteratively explore and refine within the environment, the framework…
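The abstract describes agents that iteratively explore and refine candidate programs inside a multi-agent tree search. As a rough, hypothetical illustration of only the inference-time search idea (no reinforcement learning and no LLMs), the Python sketch below assumes agents are plain functions with a made-up signature, scores each program by the fraction of unit tests it passes, and expands the highest-scoring unexpanded node by asking every agent for a refinement. The names `Agent`, `Node`, `run_tests`, `multi_agent_search`, `drafter`, and `fixer` are all hypothetical; this is not the MARTI-MARS$^2$ algorithm.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical agent interface (an assumption, not the paper's API):
# (task description, parent program or None, test feedback) -> new program text.
Agent = Callable[[str, Optional[str], str], str]


@dataclass
class Node:
    code: str
    score: float   # fraction of unit tests passed, in [0, 1]
    feedback: str  # description of the failing tests, fed back to agents
    agent_id: int  # index of the agent that proposed this program
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    expanded: bool = False


def run_tests(code: str, tests: List[Tuple[int, int]]) -> Tuple[float, str]:
    """Load `code` (which must define f(x)) and score it on (input, expected) pairs."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # runs arbitrary code; a real harness would sandbox this
        f = namespace["f"]
    except Exception as exc:  # syntax error, missing f, ...
        return 0.0, f"load error: {exc!r}"
    failures = []
    for x, expected in tests:
        try:
            got = f(x)
        except Exception as exc:
            got = repr(exc)
        if got != expected:
            failures.append(f"f({x}) = {got!r}, expected {expected!r}")
    return (len(tests) - len(failures)) / len(tests), "; ".join(failures)


def multi_agent_search(task: str, agents: List[Agent], tests: List[Tuple[int, int]],
                       max_expansions: int = 10) -> Node:
    """Best-first tree search: every agent proposes one child per expanded node."""
    nodes: List[Node] = []
    for i, agent in enumerate(agents):  # each agent drafts an independent root
        code = agent(task, None, "")
        score, feedback = run_tests(code, tests)
        nodes.append(Node(code, score, feedback, i))
    best = max(nodes, key=lambda n: n.score)
    for _ in range(max_expansions):
        open_nodes = [n for n in nodes if not n.expanded]
        if best.score == 1.0 or not open_nodes:
            break
        parent = max(open_nodes, key=lambda n: n.score)  # most promising leaf
        parent.expanded = True
        for i, agent in enumerate(agents):  # heterogeneous refinements of one parent
            code = agent(task, parent.code, parent.feedback)
            score, feedback = run_tests(code, tests)
            child = Node(code, score, feedback, i, parent)
            parent.children.append(child)
            nodes.append(child)
            if child.score > best.score:
                best = child
    return best


def drafter(task: str, parent: Optional[str], feedback: str) -> str:
    # Hypothetical agent: always proposes the same first guess.
    return "def f(x):\n    return x * x\n"


def fixer(task: str, parent: Optional[str], feedback: str) -> str:
    # Hypothetical agent: drafts a constant, then "repairs" a parent by adding 1.
    if parent is None:
        return "def f(x):\n    return 0\n"
    return parent.rstrip("\n") + " + 1\n"


tests = [(0, 1), (2, 5), (3, 10)]
best = multi_agent_search("f(x) = x**2 + 1", [drafter, fixer], tests)
print(best.code)  # the fixer's repair of the drafter's program: return x * x + 1
```

In this toy run neither agent's first draft passes any test, but the fixer's refinement of the drafter's program passes all of them, which loosely mirrors the motivation for combining heterogeneous agents. In a real system the agents would be LLM policies, and the abstract indicates the search process itself is treated as a learnable environment for policy training, which this sketch does not model.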

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications