Robust Multi-Agent Path Finding under Observation Attacks: A Principled Adversarial-Plus-Smoothing Training Recipe
Riad Ahmed

TL;DR
This paper introduces two training methods, Adv-PPO and Adv-PPO+MACER, that make decentralized multi-agent path finding policies more robust to observation perturbations while keeping the same network and the same deployment loop.
Contribution
The paper proposes two training recipes, one adversarial and one that adds a smoothness fine-tuning stage, that improve the robustness of decentralized multi-agent path finding policies without changing the network or the deployment loop.
Findings
Adv-PPO improves worst-case success rate from 2.5% to 59.2%.
Adv-PPO+MACER achieves up to 77.5% success under attack.
Both methods have little impact on performance with clean observations.
Abstract
Decentralized multi-agent path finding (MAPF) routes a team of agents on a shared grid, each acting from its own local view. The standard solution trains one shared neural policy with Proximal Policy Optimization (PPO), a popular on-policy reinforcement learning algorithm. Such a policy works well on clean observations, but a small input perturbation on one agent often changes its action, which then blocks a neighbour, and the team jams. In this paper we present two training recipes that keep the same network and the same deployment loop, yet make the policy hold up under perturbed observations. The first recipe, Adv-PPO, trains the shared policy against worst-case perturbations of its own input and selects the checkpoint by performance under adversarial perturbation. The second recipe, Adv-PPO+MACER, fine-tunes that checkpoint with a small on-policy smoothness term whose gradient…
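The Adv-PPO idea described in the abstract (train against worst-case perturbations of the policy's own input, then pick the checkpoint by its performance under attack) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the linear softmax policy, the one-step sign-gradient (FGSM-style) attack, the L-infinity budget `eps`, the function names, and the checkpoint scores in the demo. The paper's actual attack, norm, and selection metric may differ.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_probs(weights, obs):
    """Toy linear softmax policy (hypothetical stand-in for the shared network).

    weights: one row of coefficients per action; obs: flat local observation.
    """
    logits = [sum(w * x for w, x in zip(row, obs)) for row in weights]
    return softmax(logits)

def sign(g):
    return (g > 0) - (g < 0)

def fgsm_observation_attack(weights, obs, eps):
    """One-step L-infinity attack on the observation (an assumed attack model).

    Moves each observation entry by at most eps in the direction that lowers
    the log-probability of the action the policy prefers on the clean input.
    """
    probs = policy_probs(weights, obs)
    greedy = max(range(len(probs)), key=probs.__getitem__)
    # For a linear softmax policy, d log p(greedy | obs) / d obs[j]
    # equals W[greedy][j] - sum_k p_k * W[k][j].
    grad = [
        weights[greedy][j] - sum(probs[k] * weights[k][j] for k in range(len(weights)))
        for j in range(len(obs))
    ]
    return [x - eps * sign(g) for x, g in zip(obs, grad)]

def select_checkpoint(attacked_success):
    """Return the checkpoint name with the highest success rate under attack."""
    return max(attacked_success, key=attacked_success.get)

# Demo with made-up numbers (not results from the paper).
W = [[1.0, 0.0], [0.0, 1.0]]
clean_obs = [0.6, 0.4]
adv_obs = fgsm_observation_attack(W, clean_obs, eps=0.2)
print(policy_probs(W, clean_obs), policy_probs(W, adv_obs))
print(select_checkpoint({"step_1000": 0.31, "step_2000": 0.59, "step_3000": 0.52}))
```

In an adversarial training loop of this kind, the perturbed observation would replace the clean one when collecting rollouts or computing the policy loss. Selecting checkpoints by attacked rather than clean success reflects the abstract's point that the checkpoint is chosen by performance under adversarial perturbation.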
