Territory Paint Wars: Diagnosing and Mitigating Failure Modes in Competitive Multi-Agent PPO
Diyansha Singh

TL;DR
This paper introduces Territory Paint Wars, a Unity-based environment for studying failure modes in competitive multi-agent PPO. It reveals implementation-level issues such as reward-scale imbalance, identifies competitive overfitting under self-play, and proposes opponent mixing as a mitigation strategy.
Contribution
It systematically diagnoses failure modes in competitive MARL, uncovers competitive overfitting, and offers a simple intervention to improve generalization, all within a new open-source benchmark.
Findings
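Identified five implementation-level failure modes affecting PPO under competitive self-play: reward-scale imbalance, missing terminal signal, ineffective long-horizon credit assignment, unnormalised observations, and incorrect win detection.
Discovered that competitive overfitting causes generalization win rate to collapse even while self-play performance stays stable.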
Opponent mixing mitigates overfitting and restores high win rates.
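The opponent-mixing idea in the last finding can be illustrated with a minimal sketch. This summary does not give the paper's mixing schedule or opponent pool, so everything here is an assumption for illustration: the names `make_random_policy` and `sample_opponent`, the `mix_prob` parameter, and the choice of a uniformly-random policy as the fixed opponent are all hypothetical.

```python
import random
from typing import Callable, Sequence

# A policy maps an observation vector to a discrete action index.
Policy = Callable[[Sequence[float]], int]


def make_random_policy(num_actions: int, rng: random.Random) -> Policy:
    """Build a uniformly-random policy (hypothetical stand-in for a fixed opponent)."""
    def policy(_obs: Sequence[float]) -> int:
        return rng.randrange(num_actions)
    return policy


def sample_opponent(self_play_policy: Policy,
                    fixed_policy: Policy,
                    mix_prob: float,
                    rng: random.Random) -> Policy:
    """Pick the opponent for the next episode.

    With probability mix_prob the learner faces fixed_policy; otherwise it
    faces the co-adapting self-play policy. mix_prob is an assumed
    hyperparameter, not a value taken from the paper.
    """
    if not 0.0 <= mix_prob <= 1.0:
        raise ValueError("mix_prob must lie in [0, 1]")
    return fixed_policy if rng.random() < mix_prob else self_play_policy
```

Calling `sample_opponent` once per episode keeps the learner exposed to an opponent that does not co-adapt, which is the intuition behind using mixing against competitive overfitting.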
Abstract
We present Territory Paint Wars, a minimal competitive multi-agent reinforcement learning environment implemented in Unity, and use it to systematically investigate failure modes of Proximal Policy Optimisation (PPO) under self-play. A first agent trained for episodes achieves only win rate against a uniformly-random opponent in a symmetric zero-sum game. Through controlled ablations we identify five implementation-level failure modes -- reward-scale imbalance, missing terminal signal, ineffective long-horizon credit assignment, unnormalised observations, and incorrect win detection -- each of which contributes critically to this failure in this setting. After correcting these issues, we uncover a distinct emergent pathology: competitive overfitting, where co-adapting agents maintain stable self-play performance while generalisation win rate collapses from …
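Of the five failure modes listed in the abstract, unnormalised observations has a well-known generic remedy: standardising each observation feature with running statistics. The sketch below shows that remedy using Welford's online algorithm. It is not taken from the paper; the class name `RunningNormalizer` and the `eps` and `clip` defaults are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


class RunningNormalizer:
    """Per-feature running mean/variance (Welford) for observation scaling.

    Hypothetical helper: eps and clip are illustrative defaults, not
    values reported by the paper.
    """

    def __init__(self, size: int, eps: float = 1e-8, clip: float = 5.0):
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations
        self.eps = eps
        self.clip = clip

    def update(self, obs: Sequence[float]) -> None:
        """Fold one observation into the running statistics."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs: Sequence[float]) -> List[float]:
        """Return the standardised observation, clipped to [-clip, clip]."""
        out = []
        for i, x in enumerate(obs):
            # Fall back to unit variance until at least two samples are seen.
            var = self.m2[i] / self.count if self.count > 1 else 1.0
            z = (x - self.mean[i]) / math.sqrt(var + self.eps)
            out.append(max(-self.clip, min(self.clip, z)))
        return out
```

In a training loop one would typically call `update` on each raw observation during rollouts and feed `normalize(obs)` to the policy, so that features on very different scales do not dominate the network input.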
