TL;DR
Taking a dynamical systems perspective, this paper analyzes non-monotonic value decomposition in multi-agent reinforcement learning and shows that it can reliably recover IGM-optimal solutions and outperform monotonic methods.
Contribution
It introduces a dynamical systems analysis of non-monotonic value decomposition, proving stability of IGM-consistent solutions and demonstrating empirical advantages over monotonic approaches.
Findings
Non-monotonic factorization reliably recovers IGM-optimal solutions.
Unconstrained, non-monotonic methods outperform monotonic baselines.
Stability analysis links learning dynamics to solution optimality.
Abstract
Value decomposition is a central approach in multi-agent reinforcement learning (MARL), enabling centralized training with decentralized execution by factorizing the global value function into local values. To ensure individual-global-max (IGM) consistency, existing methods either enforce monotonicity constraints, which limit expressive power, or adopt softer surrogates at the cost of algorithmic complexity. In this work, we present a dynamical systems analysis of non-monotonic value decomposition, modeling learning dynamics as continuous-time gradient flow. We prove that, under approximately greedy exploration, all zero-loss equilibria violating IGM consistency are unstable saddle points, while only IGM-consistent solutions are stable attractors of the learning dynamics. Extensive experiments on both synthetic matrix games and challenging MARL benchmarks demonstrate that unconstrained,…
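To make the IGM condition concrete, here is a minimal sketch of a consistency check for a two-agent matrix game. It is not the paper's code: the function names, the payoff table, and the local utility vectors are all hypothetical, chosen only for illustration, and ties are broken by taking the first maximizer.

```python
def argmax(values):
    """Index of the largest entry (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def greedy_joint_action(q_tot):
    """Joint action (row, col) that maximizes a two-agent joint value table."""
    joint_actions = [
        (i, j) for i in range(len(q_tot)) for j in range(len(q_tot[0]))
    ]
    return max(joint_actions, key=lambda a: q_tot[a[0]][a[1]])


def is_igm_consistent(q_tot, q1, q2):
    """True if each agent's greedy local action matches the greedy joint action."""
    return (argmax(q1), argmax(q2)) == greedy_joint_action(q_tot)


# Hypothetical non-monotonic payoff table: the optimum (0, 0) is surrounded
# by heavy penalties for miscoordination.
payoff = [
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
]

# Hypothetical local utilities. The first pair selects (0, 0) under
# decentralized greedy execution; the second selects (1, 0).
q1_good, q2_good = [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]
q1_bad = [0.0, 1.0, 0.5]

print(is_igm_consistent(payoff, q1_good, q2_good))
print(is_igm_consistent(payoff, q1_bad, q2_good))
```

The check only compares argmaxes, so it says nothing about how the local utilities were learned or whether the factorization reproduces the joint values exactly.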
