Learning in Markov Games with Adaptive Adversaries: Policy Regret, Fundamental Barriers, and Efficient Algorithms
Thanh Nguyen-Tang, Raman Arora

TL;DR
This paper investigates the challenges of learning in Markov games with adaptive opponents, introducing policy regret as a more suitable measure and proposing algorithms that achieve sublinear regret under certain conditions.
Contribution
It introduces the concept of consistent adaptive adversaries and develops algorithms that attain sublinear policy regret against such adversaries in Markov games.
Findings
Sample-efficient learning is impossible when the adversary has unbounded memory or is non-stationary.
Learning remains statistically hard when the learner's set of feasible strategies is exponentially large, even against memory-bounded, stationary adversaries.
Algorithms are proposed that achieve sublinear policy regret against adversaries that are memory-bounded, stationary, and consistent.
Abstract
We study learning in a dynamically evolving environment modeled as a Markov game between a learner and a strategic opponent that can adapt to the learner's strategies. While most existing works in Markov games focus on external regret as the learning objective, external regret becomes inadequate when the adversaries are adaptive. In this work, we focus on policy regret, a counterfactual notion that aims to compete with the return that would have been attained if the learner had followed the best fixed sequence of policies, in hindsight. We show that if the opponent has unbounded memory or if it is non-stationary, then sample-efficient learning is not possible. For memory-bounded and stationary adversaries, we show that learning is still statistically hard if the set of feasible strategies for the learner is exponentially large. To guarantee learnability, we introduce a new notion of…
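To make the gap between external regret and policy regret concrete, here is a minimal sketch on a toy problem. It is not the paper's setting or algorithm: it assumes a stateless repeated game rather than a Markov game, a comparator that is a single fixed action rather than a sequence of policies, and a hypothetical memory-1 stationary opponent that copies the learner's previous action (tit-for-tat). The reward values and all function names are illustrative assumptions.

```python
# Learner's reward REWARD[a][b]: a = learner action, b = opponent action.
# 0 = cooperate, 1 = defect. Values are illustrative (prisoner's-dilemma style).
REWARD = [[3.0, 0.0],
          [5.0, 1.0]]

def opponent_response(prev_learner_action):
    """Hypothetical memory-1 stationary adversary: copy the learner's last action."""
    return 0 if prev_learner_action is None else prev_learner_action

def play(learner_actions):
    """Roll out the game; return the learner's total reward and the opponent's actions."""
    prev = None
    opp_actions, total = [], 0.0
    for a in learner_actions:
        b = opponent_response(prev)
        opp_actions.append(b)
        total += REWARD[a][b]
        prev = a
    return total, opp_actions

def external_regret(learner_actions):
    """Compare against the best fixed action on the opponent sequence that was realized."""
    realized, opp_actions = play(learner_actions)
    best_fixed = max(sum(REWARD[a][b] for b in opp_actions) for a in (0, 1))
    return best_fixed - realized

def policy_regret(learner_actions):
    """Compare against the best fixed action, letting the opponent react to it."""
    realized, _ = play(learner_actions)
    horizon = len(learner_actions)
    best_fixed = max(play([a] * horizon)[0] for a in (0, 1))
    return best_fixed - realized

always_defect = [1] * 10
print(external_regret(always_defect))  # 0.0
print(policy_regret(always_defect))    # 16.0
```

In this toy run, always defecting has zero external regret because defection is the best reply to the opponent actions that actually occurred, yet its policy regret is 16 because always cooperating would have led the opponent to cooperate in every round. This is the sense in which external regret can be uninformative against an adaptive opponent.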
Taxonomy
Topics: Reinforcement Learning in Robotics · Bayesian Modeling and Causal Inference · Data Stream Mining Techniques
Methods: Sparse Evolutionary Training · Focus
