Online Learning for Uninformed Markov Games: Empirical Nash-Value Regret and Non-Stationarity Adaptation

Junyan Liu; Haipeng Luo; Zihan Zhang; Lillian J. Ratliff

arXiv:2602.07205·cs.LG·February 10, 2026

TL;DR

This paper introduces a new regret measure and an adaptive, parameter-free algorithm for online learning in uninformed Markov games, effectively handling non-stationary opponents and unifying different regret regimes.

Contribution

It proposes empirical Nash-value regret and an adaptive algorithm that interpolates between external and Nash-value regret, addressing previous limitations in non-stationary settings.

Findings

01. Achieves regret bounds that adapt to the opponent's non-stationarity

02. Recovers optimal regret rates for fixed and adversarial opponents

03. Provides a new analysis of an epoch-based V-learning algorithm

Abstract

We study online learning in two-player uninformed Markov games, where the opponent's actions and policies are unobserved. In this setting, Tian et al. (2021) show that achieving no-external-regret is impossible without incurring an exponential dependence on the episode length $H$. They then turn to the weaker notion of Nash-value regret and propose a V-learning algorithm with regret $O(K^{2/3})$ after $K$ episodes. However, their algorithm and guarantee do not adapt to the difficulty of the problem: even in the case where the opponent follows a fixed policy and thus $O(\sqrt{K})$ external regret is well-known to be achievable, their result is still the worse rate $O(K^{2/3})$ on a weaker metric. In this work, we fully address both limitations. First, we introduce empirical Nash-value regret, a new regret notion that is strictly stronger than Nash-value regret and naturally reduces to…
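
For reference, the two existing regret notions contrasted in the abstract can be sketched as follows. The notation is assumed here rather than taken from the paper: $\pi^k$ and $\nu^k$ denote the learner's and the opponent's policies in episode $k$, $V_1^{\pi,\nu}(s_1)$ is the learner's expected return from the initial state $s_1$, and $V_1^{*}(s_1) = \max_{\pi}\min_{\nu} V_1^{\pi,\nu}(s_1)$ is the Nash (minimax) value. The comparator class for external regret is assumed to be fixed policies in hindsight; the paper's exact definitions may differ in detail.

$$
\mathrm{Reg}_{\mathrm{ext}}(K) = \max_{\pi} \sum_{k=1}^{K} \left( V_1^{\pi,\nu^k}(s_1) - V_1^{\pi^k,\nu^k}(s_1) \right),
\qquad
\mathrm{Reg}_{\mathrm{Nash}}(K) = \sum_{k=1}^{K} \left( V_1^{*}(s_1) - V_1^{\pi^k,\nu^k}(s_1) \right).
$$

Under these assumed definitions, a maximin policy $\pi^{*}$ satisfies $V_1^{\pi^{*},\nu^k}(s_1) \ge V_1^{*}(s_1)$ for every $\nu^k$, so $\mathrm{Reg}_{\mathrm{Nash}}(K) \le \mathrm{Reg}_{\mathrm{ext}}(K)$. This is the sense in which Nash-value regret is the weaker of the two notions.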

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Age of Information Optimization