Learning in Markov Games with Adaptive Adversaries: Policy Regret, Fundamental Barriers, and Efficient Algorithms

Thanh Nguyen-Tang; Raman Arora

arXiv:2411.00707 · cs.LG · December 11, 2024

TL;DR

This paper studies learning in Markov games against adaptive opponents, argues that policy regret is a more suitable objective than external regret in this setting, and proposes algorithms that achieve sublinear policy regret against memory-bounded, stationary, and consistent adversaries.

Contribution

It introduces the notion of consistent adaptive adversaries and develops algorithms that achieve sublinear policy regret against such adversaries in Markov games.

Findings

1. Sample-efficient learning is impossible when the adversary has unbounded memory or is non-stationary.

2. Learning remains statistically hard for memory-bounded, stationary adversaries when the learner's strategy set is exponentially large.

3. The proposed algorithms achieve sublinear policy regret against memory-bounded, stationary, and consistent adversaries.
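For a rough sense of scale behind the second finding, consider one common way a learner's strategy set becomes exponentially large. In a finite-horizon game with S states, A actions, and horizon H, a deterministic non-stationary Markov policy chooses one action for every (state, step) pair, which gives A^(S·H) policies. Whether the paper's lower bound uses this exact policy class is an assumption here, and the helper `num_deterministic_markov_policies` is hypothetical.

```python
def num_deterministic_markov_policies(num_states: int, num_actions: int, horizon: int) -> int:
    """Count deterministic, non-stationary Markov policies: one action is chosen
    for each (state, step) pair, so the count is num_actions ** (num_states * horizon)."""
    return num_actions ** (num_states * horizon)


if __name__ == "__main__":
    # Python ints are arbitrary precision, so the exact count is computed;
    # print only its number of decimal digits rather than the full value.
    n = num_deterministic_markov_policies(num_states=10, num_actions=4, horizon=20)
    print(len(str(n)))
```

Even this small game (10 states, 4 actions, horizon 20) yields 4^200 = 2^400 deterministic policies, far too many to enumerate or to treat as independent experts.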

Abstract

We study learning in a dynamically evolving environment modeled as a Markov game between a learner and a strategic opponent that can adapt to the learner's strategies. While most existing work on Markov games uses external regret as the learning objective, external regret becomes inadequate when the adversaries are adaptive. In this work, we focus on policy regret, a counterfactual notion that aims to compete with the return the learner would have attained had it followed the best fixed sequence of policies in hindsight. We show that if the opponent has unbounded memory or is non-stationary, then sample-efficient learning is not possible. For memory-bounded and stationary opponents, we show that learning is still statistically hard if the set of feasible strategies for the learner is exponentially large. To guarantee learnability, we introduce a new notion of…
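To make the difference between external and policy regret concrete, here is a minimal Python sketch under simplifying assumptions that are not taken from the paper: the Markov game is collapsed to a repeated 2x2 matrix game (one state, one step per round), the comparator is a single fixed action repeated every round, and the hypothetical `copycat_opponent` is 1-memory-bounded and stationary, always replaying the learner's previous action. The names `REWARD`, `play`, `external_regret`, and `policy_regret` are illustrative only.

```python
from typing import Callable, List, Sequence

# Hypothetical payoff table: REWARD[a][b] is the learner's reward when the
# learner plays a and the opponent plays b (integers keep the arithmetic exact).
REWARD = [[5, 0],   # learner action 0
          [10, 1]]  # learner action 1

Opponent = Callable[[Sequence[int]], int]


def copycat_opponent(history: Sequence[int]) -> int:
    """1-memory-bounded, stationary opponent: repeats the learner's previous
    action, and plays 0 in the first round."""
    return history[-1] if history else 0


def play(learner_actions: Sequence[int], opponent: Opponent) -> List[int]:
    """Roll out the game; the opponent at round t sees the learner's actions before t."""
    return [opponent(learner_actions[:t]) for t in range(len(learner_actions))]


def realized_return(learner_actions: Sequence[int], opponent: Opponent) -> int:
    responses = play(learner_actions, opponent)
    return sum(REWARD[a][b] for a, b in zip(learner_actions, responses))


def external_regret(learner_actions: Sequence[int], opponent: Opponent) -> int:
    # The opponent's realized moves are held fixed; only the learner's action is swapped.
    responses = play(learner_actions, opponent)
    best_fixed = max(sum(REWARD[a][b] for b in responses) for a in (0, 1))
    return best_fixed - realized_return(learner_actions, opponent)


def policy_regret(learner_actions: Sequence[int], opponent: Opponent) -> int:
    # Counterfactual: re-run the opponent against each fixed action, so the
    # opponent adapts to the comparator instead of to the learner's actual play.
    horizon = len(learner_actions)
    best_fixed = max(realized_return([a] * horizon, opponent) for a in (0, 1))
    return best_fixed - realized_return(learner_actions, opponent)
```

For a learner that greedily plays action 1 for 100 rounds, `external_regret([1] * 100, copycat_opponent)` is 0, because action 1 is the best response to the opponent's realized moves, while `policy_regret([1] * 100, copycat_opponent)` is 391: committing to action 0 would have kept the opponent at action 0 and earned 500 instead of 109. External regret holds the opponent's moves fixed, so it cannot register how the learner's own choices shaped the adversary.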

