Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty

Laixi Shi; Eric Mazumdar; Yuejie Chi; Adam Wierman

arXiv:2404.18909·cs.LG·May 10, 2024

Open Access

TL;DR

This paper introduces a sample-efficient model-based algorithm for learning robust multi-agent policies in uncertain environments, ensuring near-optimal performance guarantees in distributionally robust Markov games.

Contribution

It proposes DRNVI, a novel algorithm with finite-sample guarantees for robust equilibrium strategies in multi-agent settings under environmental uncertainty.

Findings

01. DRNVI achieves near-optimal sample complexity.

02. The algorithm provides finite-sample guarantees.

03. An information-theoretic lower bound confirms the efficiency of DRNVI.

Abstract

To overcome the sim-to-real gap in reinforcement learning (RL), learned policies must maintain robustness against environmental uncertainties. While robust RL has been widely studied in single-agent regimes, in multi-agent environments, the problem remains understudied -- despite the fact that the problems posed by environmental uncertainties are often exacerbated by strategic interactions. This work focuses on learning in distributionally robust Markov games (RMGs), a robust variant of standard Markov games, wherein each agent aims to learn a policy that maximizes its own worst-case performance when the deployed environment deviates within its own prescribed uncertainty set. This results in a set of robust equilibrium strategies for all agents that align with classic notions of game-theoretic equilibria. Assuming a non-adaptive sampling mechanism from a generative model, we propose a…
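To make "worst-case performance within a prescribed uncertainty set" concrete, the sketch below computes the worst-case expected value of a fixed value vector when the next-state distribution may deviate from a nominal one. It is an illustration only, not the paper's algorithm: the choice of a total-variation ball of radius `sigma` as the uncertainty set is an assumption (this excerpt does not specify the set), and the function name `worst_case_expectation` and the toy numbers are hypothetical.

```python
def worst_case_expectation(p0, v, sigma):
    """Minimize sum(p[i] * v[i]) over distributions p whose total-variation
    distance to the nominal distribution p0 is at most sigma.

    Assumption (not from the source): the uncertainty set is a TV ball.
    For such a set, the adversary's best move is to shift up to `sigma`
    probability mass from the highest-value states onto the lowest-value state.
    """
    p = list(p0)
    lowest = min(range(len(v)), key=lambda i: v[i])  # state that receives mass
    budget = sigma                                   # mass the adversary may move
    # Visit states from highest to lowest value and drain mass from them.
    for i in sorted(range(len(v)), key=lambda i: v[i], reverse=True):
        if budget <= 0:
            break
        if i == lowest:
            continue
        moved = min(p[i], budget)
        p[i] -= moved
        p[lowest] += moved
        budget -= moved
    return sum(pi * vi for pi, vi in zip(p, v))


# Toy example: three next states with values 1, 2, 3.
nominal = [0.5, 0.3, 0.2]
values = [1.0, 2.0, 3.0]
nominal_value = worst_case_expectation(nominal, values, 0.0)  # no uncertainty
robust_value = worst_case_expectation(nominal, values, 0.1)   # radius 0.1
```

In a robust Markov game, each agent would evaluate its policy against a pessimistic quantity of this kind under its own uncertainty set, which is why the robust value is never larger than the nominal one.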

Taxonomy

Topics: Reinforcement Learning in Robotics · Fuzzy Logic and Control Systems · Data Stream Mining Techniques

Methods: Sparse Evolutionary Training · ALIGN