TL;DR
MARS² introduces a multi-agent reinforcement learning framework that enhances code generation by enabling collaborative search within a shared tree-structured environment, improving performance over existing methods.
Contribution
The paper presents a multi-agent RL approach that models the search tree as a learnable environment, so that multiple agents can collaborate and explore in a structured way to produce better code generation results.
Findings
Consistently improves performance across code generation benchmarks.
Effectively leverages multi-agent collaboration within tree search.
Demonstrates robustness across diverse models and training settings.
Abstract
Reinforcement learning (RL) paradigms have demonstrated strong performance on reasoning-intensive tasks such as code generation. However, limited trajectory diversity often leads to diminishing returns, which constrains the achievable performance ceiling. Search-enhanced RL alleviates this issue by introducing structured exploration, but it remains constrained by single-agent policy priors. Meanwhile, leveraging multiple interacting policies can yield more diverse exploratory signals, but existing approaches are typically decoupled from structured search. We propose MARS (Multi-Agent Reinforced Tree-Search Scaling), a unified RL framework in which multiple independently optimized agents collaborate within a shared tree-structured search environment. MARS models the search tree as a learnable multi-agent interaction environment, enabling heterogeneous agents to…
