Minimax-Optimal Multi-Agent Robust Reinforcement Learning

Yuchen Jiao; Gen Li

arXiv:2412.19873·cs.LG·December 31, 2024

TL;DR

This paper introduces a minimax-optimal algorithm for multi-agent robust reinforcement learning in finite-horizon Markov games, achieving near-optimal sample complexity for approximating equilibrium solutions under uncertainty.

Contribution

The paper extends the Q-FTRL algorithm to finite-horizon robust Markov games with access to a generative model, and proves a sample complexity bound for reaching an $ε$-robust coarse correlated equilibrium that closes the gap to the information-theoretic lower bound.

Findings

01. Achieves minimax-optimal sample complexity for robust equilibrium

02. Extends Q-FTRL algorithm to multi-agent finite-horizon RMGs

03. Proves optimality via information-theoretic lower bounds

Abstract

Multi-agent robust reinforcement learning, also known as multi-player robust Markov games (RMGs), is a crucial framework for modeling competitive interactions under environmental uncertainties, with wide applications in multi-agent systems. However, existing results on sample complexity in RMGs suffer from at least one of three obstacles: a restrictive range of uncertainty level or accuracy, the curse of multiple agents, and the barrier of long horizons, all of which cause existing results to significantly exceed the information-theoretic lower bound. To close this gap, we extend the Q-FTRL algorithm \citep{li2022minimax} to RMGs in the finite-horizon setting, assuming access to a generative model. We prove that the proposed algorithm achieves an $ε$-robust coarse correlated equilibrium (CCE) with a sample complexity (up to log factors) of…
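To make the terms "FTRL" and "CCE" in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a one-shot two-player matrix game with known payoffs in $[-1, 1]$, with no uncertainty set, no horizon, and no sampling from a generative model. Each player runs follow-the-regularized-leader with an entropy regularizer (an assumption here, which reduces to a softmax over cumulative payoffs), and the time-averaged joint play is checked for how far it is from a coarse correlated equilibrium. The names `ftrl_entropy`, `self_play`, and `cce_gap` are hypothetical and chosen only for illustration.

```python
import math

def ftrl_entropy(cum_payoff, eta):
    """Entropy-regularized FTRL step: softmax of eta * cumulative payoffs."""
    top = max(cum_payoff)  # subtract the max for numerical stability
    weights = [math.exp(eta * (q - top)) for q in cum_payoff]
    total = sum(weights)
    return [w / total for w in weights]

def self_play(payoff_a, payoff_b, rounds, eta):
    """Both players run FTRL; returns the time-averaged joint distribution."""
    n, m = len(payoff_a), len(payoff_a[0])
    cum_a, cum_b = [0.0] * n, [0.0] * m
    joint = [[0.0] * m for _ in range(n)]
    for _ in range(rounds):
        pa = ftrl_entropy(cum_a, eta)
        pb = ftrl_entropy(cum_b, eta)
        for i in range(n):
            for j in range(m):
                joint[i][j] += pa[i] * pb[j] / rounds
        # Accumulate each action's expected payoff against the opponent's mix.
        for i in range(n):
            cum_a[i] += sum(payoff_a[i][j] * pb[j] for j in range(m))
        for j in range(m):
            cum_b[j] += sum(payoff_b[i][j] * pa[i] for i in range(n))
    return joint

def cce_gap(joint, payoff_a, payoff_b):
    """Largest gain any player obtains by switching to one fixed action."""
    n, m = len(joint), len(joint[0])
    val_a = sum(joint[i][j] * payoff_a[i][j] for i in range(n) for j in range(m))
    val_b = sum(joint[i][j] * payoff_b[i][j] for i in range(n) for j in range(m))
    marg_a = [sum(joint[i]) for i in range(n)]
    marg_b = [sum(joint[i][j] for i in range(n)) for j in range(m)]
    dev_a = max(sum(marg_b[j] * payoff_a[i][j] for j in range(m)) for i in range(n))
    dev_b = max(sum(marg_a[i] * payoff_b[i][j] for i in range(n)) for j in range(m))
    return max(dev_a - val_a, dev_b - val_b)

# Illustrative zero-sum game; the payoff values are made up for this sketch.
payoff_a = [[1.0, -1.0], [-0.5, 0.5]]
payoff_b = [[-x for x in row] for row in payoff_a]
rounds = 2000
eta = math.sqrt(2 * math.log(2) / rounds)
joint = self_play(payoff_a, payoff_b, rounds, eta)
print("CCE gap of averaged play:", cce_gap(joint, payoff_a, payoff_b))
```

In this sketch the averaged joint distribution is an $ε$-CCE with $ε$ equal to the printed gap, which is the larger of the two players' average regrets and therefore shrinks as `rounds` grows. The robust, finite-horizon setting described in the abstract additionally requires estimating values from samples and taking a worst case over an uncertainty set, neither of which is modeled here.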

Taxonomy

Topics: Reinforcement Learning in Robotics