TL;DR
TMAS is a multi-agent framework for test-time scaling of large language models. Specialized agents reason collaboratively and share information through hierarchical memories, and the system is trained with hybrid-reward reinforcement learning.
Contribution
The paper proposes a multi-agent synergy approach that combines hierarchical memories with hybrid-reward training to improve test-time compute scaling.
Findings
TMAS outperforms existing test-time scaling methods on reasoning benchmarks.
Hierarchical memories improve information reuse and reasoning efficiency.
Hybrid reward reinforcement learning stabilizes and enhances scaling performance.
Abstract
Test-time scaling has become an effective paradigm for improving the reasoning ability of large language models by allocating additional computation during inference. Recent structured approaches have further advanced this paradigm by organizing inference across multiple trajectories, refinement rounds, and verification-based feedback. However, existing structured test-time scaling methods either weakly coordinate parallel reasoning trajectories or rely on noisy historical information without explicitly deciding what should be retained and reused, limiting their ability to balance exploration and exploitation. In this work, we propose TMAS, a framework for scaling test-time compute via multi-agent synergy. TMAS organizes inference as a collaborative process among specialized agents, enabling structured information flow across agents, trajectories, and refinement iterations. To support…
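To make the general idea in the abstract concrete, here is a minimal toy sketch of structured test-time scaling: several parallel trajectories per round, a verification score, and a memory that explicitly decides what to retain and reuse. This is not the TMAS algorithm. The task (guessing a hidden integer), the names `Memory`, `solver`, `verifier`, and `run`, and every parameter value are hypothetical stand-ins chosen for illustration, and the sketch covers neither hierarchical memories nor reinforcement learning.

```python
import random
from dataclasses import dataclass, field

TARGET = 37  # hypothetical hidden answer, used only by the toy verifier


@dataclass
class Memory:
    """Keeps only the top-k verified candidates instead of the full history."""
    capacity: int = 3
    items: list = field(default_factory=list)  # (score, candidate) pairs

    def update(self, scored):
        # Merge new results with retained ones, then keep the best `capacity`.
        merged = set(self.items) | set(scored)
        self.items = sorted(merged, reverse=True)[: self.capacity]


def verifier(candidate: int) -> float:
    """Toy verification score: higher is better, 0.0 is a perfect answer."""
    return -float(abs(candidate - TARGET))


def solver(rng: random.Random, memory: Memory) -> int:
    """Explore at random, or refine a candidate retained in memory."""
    if not memory.items or rng.random() < 0.2:
        return rng.randint(0, 100)  # exploration: fresh guess
    _, base = rng.choice(memory.items)
    return base + rng.randint(-5, 5)  # exploitation: local refinement


def run(num_trajectories: int = 4, num_rounds: int = 10, seed: int = 0):
    rng = random.Random(seed)
    memory = Memory()
    best_per_round = []
    for _ in range(num_rounds):
        # All trajectories in a round read the same memory snapshot.
        candidates = [solver(rng, memory) for _ in range(num_trajectories)]
        memory.update([(verifier(c), c) for c in candidates])
        best_per_round.append(memory.items[0][0])
    return memory, best_per_round


if __name__ == "__main__":
    memory, best = run()
    print("best score per round:", best)
    print("retained candidates:", memory.items)
```

Because the memory always keeps its best entry, the best score per round never decreases in this toy. The fixed 0.2 probability in `solver` is an arbitrary stand-in for the exploration/exploitation balance the abstract refers to.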
