Territory Paint Wars: Diagnosing and Mitigating Failure Modes in Competitive Multi-Agent PPO

Diyansha Singh

arXiv:2604.04983·cs.LG·April 8, 2026

TL;DR

This paper introduces Territory Paint Wars, a Unity-based environment for studying failure modes of PPO in competitive multi-agent self-play. It identifies implementation-level failures such as reward-scale imbalance, reports a competitive overfitting pathology, and proposes opponent mixing as a mitigation strategy.

Contribution

The paper systematically diagnoses failure modes in competitive multi-agent reinforcement learning (MARL), identifies competitive overfitting, and offers a simple intervention to improve generalization, all within a new open-source benchmark.

Findings

01. Identified five implementation-level failure modes affecting PPO in competitive settings.

02. Found that competitive overfitting causes generalization to collapse even while self-play performance stays stable.

03. Showed that opponent mixing mitigates competitive overfitting and restores high win rates.
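The page does not spell out how opponent mixing is implemented, so the following is only a minimal sketch under one assumption: that mixing means swapping the co-adapting self-play opponent for a fixed uniformly-random policy in some fraction of training episodes. The names `make_random_policy`, `pick_opponent`, and `mix_prob` are hypothetical and do not come from the paper.

```python
import random
from typing import Callable, Sequence

# A policy maps an observation vector to a discrete action index.
Policy = Callable[[Sequence[float]], int]


def make_random_policy(n_actions: int, rng: random.Random) -> Policy:
    """Return a policy that ignores the observation and acts uniformly at random."""
    def policy(_obs: Sequence[float]) -> int:
        return rng.randrange(n_actions)
    return policy


def pick_opponent(
    self_play_opponent: Policy,
    random_opponent: Policy,
    mix_prob: float,
    rng: random.Random,
) -> Policy:
    """Choose the opponent for one episode.

    With probability mix_prob (a hypothetical hyperparameter) the fixed
    random opponent is used; otherwise the co-adapting self-play opponent.
    """
    if not 0.0 <= mix_prob <= 1.0:
        raise ValueError("mix_prob must lie in [0, 1]")
    return random_opponent if rng.random() < mix_prob else self_play_opponent
```

In a training loop, `pick_opponent` would be called once at the start of each episode, so the learner keeps seeing some behaviour that its self-play partner never produces.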

Abstract

We present Territory Paint Wars, a minimal competitive multi-agent reinforcement learning environment implemented in Unity, and use it to systematically investigate failure modes of Proximal Policy Optimisation (PPO) under self-play. A first agent trained for $84{,}000$ episodes achieves only $26.8\%$ win rate against a uniformly-random opponent in a symmetric zero-sum game. Through controlled ablations we identify five implementation-level failure modes -- reward-scale imbalance, missing terminal signal, ineffective long-horizon credit assignment, unnormalised observations, and incorrect win detection -- each of which contributes critically to this failure in this setting. After correcting these issues, we uncover a distinct emergent pathology: competitive overfitting, where co-adapting agents maintain stable self-play performance while generalisation win rate collapses from $73.5\%$ …
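Of the five failure modes listed, unnormalised observations has a widely used generic remedy: standardising each observation feature with running statistics. The sketch below illustrates that common technique using Welford's online update. It is not taken from the paper, and the class name `RunningNormalizer` and its parameters are hypothetical.

```python
import math
from typing import List, Sequence


class RunningNormalizer:
    """Per-feature running mean and variance (Welford's online algorithm)."""

    def __init__(self, size: int, eps: float = 1e-8) -> None:
        self.count = 0
        self.mean: List[float] = [0.0] * size
        self.m2: List[float] = [0.0] * size  # running sum of squared deviations
        self.eps = eps  # guards against division by zero for constant features

    def update(self, obs: Sequence[float]) -> None:
        """Fold one observation vector into the running statistics."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs: Sequence[float]) -> List[float]:
        """Return the observation centred and scaled by the running statistics."""
        if self.count < 2:
            # Too little data for a variance estimate: only centre the values.
            return [x - m for x, m in zip(obs, self.mean)]
        return [
            (x - m) / math.sqrt(m2 / self.count + self.eps)
            for x, m, m2 in zip(obs, self.mean, self.m2)
        ]
```

Features on very different raw scales, such as grid coordinates next to tile counts, end up on a comparable scale before they reach the policy network.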
