Online Learning for Uninformed Markov Games: Empirical Nash-Value Regret and Non-Stationarity Adaptation
Junyan Liu, Haipeng Luo, Zihan Zhang, Lillian J. Ratliff

TL;DR
This paper introduces a new regret measure and an adaptive, parameter-free algorithm for online learning in uninformed Markov games, effectively handling non-stationary opponents and unifying different regret regimes.
Contribution
It proposes empirical Nash-value regret and an adaptive algorithm that interpolates between external and Nash-value regret, addressing previous limitations in non-stationary settings.
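For readers unfamiliar with the two regret notions being interpolated, here is a hedged sketch of their standard definitions. The notation is an assumption and is not taken from this page: the learner is the max-player, $\pi^k$ and $\nu^k$ are the learner's and opponent's policies in episode $k$, $K$ is the number of episodes, $s_1$ is a fixed initial state, and $V^{\pi,\nu}(s_1)$ is the learner's expected return. The paper's own definitions, including that of empirical Nash-value regret, may differ in detail.

```latex
% Assumed notation (hypothetical, for illustration only).
% External regret: compare against the best fixed policy in hindsight.
\mathrm{Reg}_{\mathrm{ext}}(K)
  = \max_{\pi} \sum_{k=1}^{K}
    \left( V^{\pi,\nu^k}(s_1) - V^{\pi^k,\nu^k}(s_1) \right)

% Nash-value regret: compare against the minimax (Nash) value of the game.
V^{*}(s_1) = \max_{\pi} \min_{\nu} V^{\pi,\nu}(s_1),
\qquad
\mathrm{Reg}_{\mathrm{Nash}}(K)
  = \sum_{k=1}^{K}
    \left( V^{*}(s_1) - V^{\pi^k,\nu^k}(s_1) \right)

% Why Nash-value regret is the weaker notion: for a maximin policy \pi^{*},
% \min_{\nu} V^{\pi^{*},\nu}(s_1) \le V^{\pi^{*},\nu^k}(s_1) for every k, hence
K \, V^{*}(s_1)
  \le \sum_{k=1}^{K} V^{\pi^{*},\nu^k}(s_1)
  \le \max_{\pi} \sum_{k=1}^{K} V^{\pi,\nu^k}(s_1)
\quad\Longrightarrow\quad
\mathrm{Reg}_{\mathrm{Nash}}(K) \le \mathrm{Reg}_{\mathrm{ext}}(K)
```

Under these assumed definitions, any bound on external regret also bounds Nash-value regret, but not the other way round.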
Findings
Achieves regret bounds that adapt to opponent's non-stationarity
Recovers optimal regret rates for fixed and adversarial opponents
Provides a new analysis of an epoch-based V-learning algorithm
Abstract
We study online learning in two-player uninformed Markov games, where the opponent's actions and policies are unobserved. In this setting, Tian et al. (2021) show that achieving no-external-regret is impossible without incurring an exponential dependence on the episode length. They then turn to the weaker notion of Nash-value regret and propose a V-learning algorithm with regret that is sublinear in the number of episodes. However, their algorithm and guarantee do not adapt to the difficulty of the problem: even in the case where the opponent follows a fixed policy and thus external regret is well-known to be achievable, their result is still the worse rate on a weaker metric. In this work, we fully address both limitations. First, we introduce empirical Nash-value regret, a new regret notion that is strictly stronger than Nash-value regret and naturally reduces to…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization
