Multi-agent Reach-avoid MDP via Potential Games and Low-rank Policy Structure
Adam Casselman, Abraham P. Vinod, Sarah H.Q. Li

TL;DR
This paper solves finite-horizon multi-agent reach-avoid MDPs with local feedback policies, using potential game structure and low-rank policy representations to reduce communication, memory, and computation complexity while maintaining near-optimal performance.
Contribution
The paper shows that local feedback policies are rank-one factorizations of global feedback policies, and that the potential game structure of the problem makes iterative best response a tractable multi-agent learning scheme.
Findings
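Restricting the solution space to local feedback policies mitigates the exponential growth of memory and computation complexity with the number of agents.
Local feedback policies approximate global optimality effectively.
Iterative best response is guaranteed to converge to a deterministic Nash equilibrium.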
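The convergence finding can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a two-agent identical-interest matrix game (the simplest kind of potential game), and the function name `iterative_best_response` and the payoff table are hypothetical. Each agent in turn switches to an action that strictly raises the shared potential. Because the potential takes finitely many values and strictly increases with every switch, the loop must stop, and it stops at a deterministic (pure) Nash equilibrium.

```python
import numpy as np


def iterative_best_response(potential, start, max_rounds=100):
    """Run round-robin best response on an identical-interest game.

    potential: array with one axis per agent; every agent's utility is
               assumed equal to this shared potential (an assumption of
               this sketch, not a statement about the paper's model).
    start:     tuple with one initial action index per agent.
    """
    actions = list(start)
    for _ in range(max_rounds):
        changed = False
        for i in range(potential.ndim):
            # Potential as a function of agent i's action, others held fixed.
            index = tuple(
                slice(None) if j == i else actions[j]
                for j in range(potential.ndim)
            )
            values = potential[index]
            best = int(np.argmax(values))
            # Switch only on strict improvement so the potential strictly rises.
            if values[best] > values[actions[i]]:
                actions[i] = best
                changed = True
        if not changed:
            # No agent can improve unilaterally: a pure Nash equilibrium.
            return tuple(actions)
    return tuple(actions)


# Hypothetical 2x2 potential: rows are agent 0's actions, columns agent 1's.
potential = np.array([[3.0, 0.0],
                      [0.0, 5.0]])
equilibrium = iterative_best_response(potential, start=(0, 0))
```

Starting from (0, 0) the loop stays at (0, 0), where the potential is 3, even though (1, 1) attains 5. The fixed point is an equilibrium because no single agent can improve alone, but an equilibrium need not be the global optimum.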
Abstract
We optimize a finite-horizon multi-agent reach-avoid Markov decision process (MDP) via local feedback policies. The global feedback policy solution yields global optimality, but its communication complexity, memory usage, and computation complexity scale exponentially with the number of agents. We mitigate this exponential dependency by restricting the solution space to local feedback policies and show that local feedback policies are rank-one factorizations of global feedback policies, which provides a principled approach to reducing communication complexity and memory usage. Additionally, by demonstrating that multi-agent reach-avoid MDPs over local feedback policies have a potential game structure, we show that iterative best response is a tractable multi-agent learning scheme with guaranteed convergence to a deterministic Nash equilibrium, and derive each agent's best response via…
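One way to read the rank-one claim is sketched below for two agents. This is an illustration under stated assumptions, not the paper's construction: each agent is assumed to have a stochastic local policy stored as a matrix indexed by (local state, local action), the joint policy is assumed to be the product of the two, and the names `random_local_policy`, `pi1`, `pi2`, and the sizes `S` and `A` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_local_policy(num_states, num_actions, rng):
    """Return a row-stochastic matrix: one action distribution per local state."""
    p = rng.random((num_states, num_actions))
    return p / p.sum(axis=1, keepdims=True)


S, A = 3, 2  # hypothetical local state and action counts per agent
pi1 = random_local_policy(S, A, rng)
pi2 = random_local_policy(S, A, rng)

# Joint policy as a tensor: joint[s1, a1, s2, a2] = pi1[s1, a1] * pi2[s2, a2].
joint = np.einsum("ia,jb->iajb", pi1, pi2)

# Unfold with agent 1's indices as rows and agent 2's as columns.
# The result is an outer product of two vectors, hence rank one.
unfolded = joint.reshape(S * A, S * A)
rank = np.linalg.matrix_rank(unfolded)

# The same object viewed as a global feedback policy:
# rows are joint states (s1, s2), columns are joint actions (a1, a2).
global_policy = joint.transpose(0, 2, 1, 3).reshape(S * S, A * A)

# Storage comparison: the global table versus the two local tables.
global_entries = global_policy.size      # (S * S) * (A * A)
local_entries = pi1.size + pi2.size      # 2 * S * A
```

With N agents the global table in this sketch has (S·A)^N entries while the local tables have N·S·A in total, which is the exponential-versus-linear gap the abstract describes for memory usage.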
