NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search
Sizhe Tang, Zuyuan Zhang, Mahdi Imani, Tian Lan

TL;DR
NonZero keeps multi-agent Monte Carlo Tree Search tractable by running surrogate-guided selection over a low-dimensional nonlinear representation and proposing candidates with an interaction score that captures coordination benefits, instead of exploring the full joint-action space.
Contribution
It formalizes candidate proposal as a bandit problem over local deviations and derives a surrogate-guided proposal rule with a sublinear local-regret guarantee, alongside empirical improvements in multi-agent MCTS.
Findings
NonZero improves sample efficiency on MatGame, SMAC, and SMACv2.
It achieves better final performance than strong baselines.
The method scales better by avoiding enumeration of the joint-action space.
Abstract
Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint actions, severely limiting exploration under realistic search budgets. We propose NonZero, which keeps multi-agent MCTS tractable by running surrogate-guided selection over a low-dimensional nonlinear representation using an interaction-guided proposal rule, instead of directly exploring the full joint-action space. Our exploration uses an interaction score: single-agent deviations are ranked by predicted gain, while two-agent deviations are scored by a mixed-difference measure that reveals coordination benefits even when no single agent can improve alone. We formalize candidate proposal as a bandit problem over local deviations and derive a proposal rule, NonZero, with a sublinear local-regret guarantee for reaching approximate graph-local…
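To make the interaction score concrete, here is a minimal Python sketch on a toy two-agent matrix game. It assumes the mixed-difference measure is the standard second-order mixed difference, Q(a_i', a_j') - Q(a_i', a_j) - Q(a_i, a_j') + Q(a_i, a_j); the abstract does not spell out the exact formula, so treat this as an illustration only. The payoff table and the names `q_value`, `deviate`, `single_gain`, and `mixed_difference` are hypothetical and not taken from the paper.

```python
import numpy as np

# Hypothetical payoff table for a 2-agent, 2-action cooperative game.
# PAYOFF[a0, a1] is the shared team value of joint action (a0, a1).
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 3.0]])


def q_value(joint):
    """Stand-in for a learned value estimate of a joint action."""
    return float(PAYOFF[tuple(joint)])


def deviate(joint, changes):
    """Return a copy of `joint` with the {agent: new_action} changes applied."""
    new = list(joint)
    for agent, action in changes.items():
        new[agent] = action
    return tuple(new)


def single_gain(joint, agent, action):
    """Predicted gain when exactly one agent switches its action."""
    return q_value(deviate(joint, {agent: action})) - q_value(joint)


def mixed_difference(joint, i, a_i, j, a_j):
    """Assumed pair score: second-order mixed difference for agents i and j."""
    both = q_value(deviate(joint, {i: a_i, j: a_j}))
    only_i = q_value(deviate(joint, {i: a_i}))
    only_j = q_value(deviate(joint, {j: a_j}))
    return both - only_i - only_j + q_value(joint)


base = (0, 0)
gain_0 = single_gain(base, 0, 1)                  # agent 0 deviates alone
gain_1 = single_gain(base, 1, 1)                  # agent 1 deviates alone
pair_score = mixed_difference(base, 0, 1, 1, 1)   # both deviate together
joint_gain = q_value((1, 1)) - q_value(base)
```

In this toy game both single-agent deviations from (0, 0) lose value (a gain of -1 each), yet the pair score is 4 and the joint move gains 2, which equals the two single gains plus the mixed difference. This is the situation the abstract describes, where a coordination benefit exists even though no single agent can improve alone.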
