Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty
Laixi Shi, Eric Mazumdar, Yuejie Chi, Adam Wierman

TL;DR
This paper introduces a sample-efficient, model-based algorithm for learning robust multi-agent policies in distributionally robust Markov games, with finite-sample guarantees and near-optimal sample complexity.
Contribution
It proposes DRNVI, a novel algorithm with finite-sample guarantees for learning robust equilibrium strategies in multi-agent settings under environmental uncertainty.
Findings
DRNVI achieves near-optimal sample complexity.
The algorithm provides finite-sample guarantees.
An information-theoretic lower bound confirms the efficiency of DRNVI.
Abstract
To overcome the sim-to-real gap in reinforcement learning (RL), learned policies must maintain robustness against environmental uncertainties. While robust RL has been widely studied in single-agent regimes, in multi-agent environments, the problem remains understudied -- despite the fact that the problems posed by environmental uncertainties are often exacerbated by strategic interactions. This work focuses on learning in distributionally robust Markov games (RMGs), a robust variant of standard Markov games, wherein each agent aims to learn a policy that maximizes its own worst-case performance when the deployed environment deviates within its own prescribed uncertainty set. This results in a set of robust equilibrium strategies for all agents that align with classic notions of game-theoretic equilibria. Assuming a non-adaptive sampling mechanism from a generative model, we propose a…
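The worst-case objective described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it only shows how one agent might evaluate a single robust backup when the uncertainty set is a total-variation ball of radius sigma around a nominal next-state distribution. The choice of total variation, the function names, and the parameters (`p0`, `v`, `sigma`, `gamma`) are all assumptions made for illustration; the excerpt does not specify them.

```python
def worst_case_expectation_tv(p0, v, sigma):
    """Smallest expected value of v over distributions within
    total-variation distance sigma of the nominal distribution p0.

    Assumption: a TV-ball uncertainty set, chosen only for illustration.
    The minimizer shifts up to sigma probability mass from the
    highest-value states onto the lowest-value state.
    """
    p = list(p0)
    lowest = min(range(len(v)), key=lambda s: v[s])
    budget = sigma
    # Visit states from highest to lowest value, removing mass greedily.
    for s in sorted(range(len(v)), key=lambda s: v[s], reverse=True):
        if budget <= 0:
            break
        if s == lowest:
            continue
        move = min(p[s], budget)
        p[s] -= move
        p[lowest] += move
        budget -= move
    return sum(ps * vs for ps, vs in zip(p, v))


def robust_q(reward, gamma, p0, v, sigma):
    """One robust backup for a single agent at a fixed state and joint
    action: immediate reward plus discounted worst-case next-state value.
    (Hypothetical helper, not taken from the paper.)
    """
    return reward + gamma * worst_case_expectation_tv(p0, v, sigma)


if __name__ == "__main__":
    nominal = [0.2, 0.3, 0.5]   # nominal next-state distribution
    values = [1.0, 0.0, 2.0]    # this agent's value at each next state
    print(worst_case_expectation_tv(nominal, values, 0.0))  # nominal expectation
    print(worst_case_expectation_tv(nominal, values, 0.1))  # pessimistic expectation
    print(robust_q(1.0, 0.9, nominal, values, 0.1))
```

With sigma set to 0 the backup reduces to the standard, non-robust one. Increasing sigma makes the agent more pessimistic about the deployed environment. In a robust Markov game, each agent would apply such a backup with its own uncertainty set.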
Taxonomy
Topics: Reinforcement Learning in Robotics · Fuzzy Logic and Control Systems · Data Stream Mining Techniques
Methods: Sparse Evolutionary Training · ALIGN
