$O(T^{-1})$ Convergence of Optimistic-Follow-the-Regularized-Leader in Two-Player Zero-Sum Markov Games

Yuepeng Yang; Cong Ma

arXiv:2209.12430 · cs.LG · February 10, 2023 · 1 cites

TL;DR

This paper shows that optimistic-follow-the-regularized-leader (OFTRL) with smooth value updates finds an $O(T^{-1})$-approximate Nash equilibrium in $T$ iterations in two-player zero-sum Markov games with full information, improving the previous $\tilde{O}(T^{-5/6})$ rate.

Contribution

The paper contributes a refined analysis of OFTRL in Markov games, built on a bound on the second-order path lengths of the learning dynamics and a tighter inequality on the OFTRL weights, that establishes an $O(T^{-1})$ convergence rate.

Findings

01

Achieves $O(T^{-1})$ convergence rate in Markov games.

02

Key property: the sum of the two players' regrets is approximately non-negative in Markov games.

03

A tighter algebraic inequality on the OFTRL weights removes an extra $\log T$ factor.
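
For context on the second finding, the normal-form case can be worked out in a few lines. The notation here is assumed for illustration and is not taken from the paper: a payoff matrix $A$, a minimizing player with strategies $x_t$, a maximizing player with strategies $y_t$, and uniform averages $\bar{x} = \frac{1}{T}\sum_t x_t$ and $\bar{y} = \frac{1}{T}\sum_t y_t$.

$$
\begin{aligned}
\mathrm{Reg}_x(T) &= \sum_{t=1}^{T} x_t^\top A y_t - \min_{x} \sum_{t=1}^{T} x^\top A y_t, \\
\mathrm{Reg}_y(T) &= \max_{y} \sum_{t=1}^{T} x_t^\top A y - \sum_{t=1}^{T} x_t^\top A y_t, \\
\mathrm{Reg}_x(T) + \mathrm{Reg}_y(T) &= T \left( \max_{y} \bar{x}^\top A y - \min_{x} x^\top A \bar{y} \right) \ge 0.
\end{aligned}
$$

The inequality holds because $\max_{y} \bar{x}^\top A y \ge \bar{x}^\top A \bar{y} \ge \min_{x} x^\top A \bar{y}$, so in a matrix game the regret sum equals $T$ times the duality gap of the averaged strategies. Per the abstract, this exact non-negativity is not guaranteed in Markov games, where it holds only approximately.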

Abstract

We prove that optimistic-follow-the-regularized-leader (OFTRL), together with smooth value updates, finds an $O(T^{-1})$-approximate Nash equilibrium in $T$ iterations for two-player zero-sum Markov games with full information. This improves the $\tilde{O}(T^{-5/6})$ convergence rate recently shown by Zhang et al. (2022). The refined analysis hinges on two essential ingredients. First, the sum of the regrets of the two players, though not necessarily non-negative as in normal-form games, is approximately non-negative in Markov games. This property allows us to bound the second-order path lengths of the learning dynamics. Second, we prove a tighter algebraic inequality regarding the weights deployed by OFTRL that shaves an extra $\log T$ factor. This crucial improvement enables the inductive analysis that leads to the final $O(T^{-1})$ rate.
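
To make the OFTRL update concrete, here is a minimal sketch for the much simpler setting of a single zero-sum matrix game. It is not the paper's algorithm: it omits the Markov-game state structure and the smooth value updates, uses uniform rather than weighted averaging, and assumes an entropy regularizer (so the update is a softmax over cumulative losses with the latest loss counted twice). The function name `oftrl_matrix_game`, the step size `eta=0.1`, and the example matrix are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def oftrl_matrix_game(A, T, eta=0.1):
    """Entropy-regularized OFTRL for min_x max_y x^T A y (illustrative sketch).

    Returns the duality gap of the uniformly averaged strategies.
    """
    n, m = len(A), len(A[0])
    At = [list(col) for col in zip(*A)]  # transpose of A
    x, y = [1.0 / n] * n, [1.0 / m] * m  # start from uniform strategies
    cum_x, cum_y = [0.0] * n, [0.0] * m  # cumulative loss / gain vectors
    avg_x, avg_y = [0.0] * n, [0.0] * m
    for _ in range(T):
        avg_x = [a + xi / T for a, xi in zip(avg_x, x)]
        avg_y = [a + yi / T for a, yi in zip(avg_y, y)]
        loss_x = matvec(A, y)   # x wants these entries small
        gain_y = matvec(At, x)  # y wants these entries large
        cum_x = [c + g for c, g in zip(cum_x, loss_x)]
        cum_y = [c + g for c, g in zip(cum_y, gain_y)]
        # Optimistic step: the latest vector is reused as a prediction
        # of the next one, so it is counted twice.
        x = softmax([-eta * (c + g) for c, g in zip(cum_x, loss_x)])
        y = softmax([eta * (c + g) for c, g in zip(cum_y, gain_y)])
    # Duality gap: best response value against avg_x minus that against avg_y.
    return max(matvec(At, avg_x)) - min(matvec(A, avg_y))


# Hypothetical 2x2 game whose equilibrium is not the uniform strategy.
A = [[2.0, -1.0], [-1.0, 1.0]]
gap = oftrl_matrix_game(A, T=2000)
print(gap)
```

The returned duality gap measures how far the averaged strategies are from a Nash equilibrium of the matrix game, which is the normal-form analogue of the approximation error discussed in the abstract. The step size is kept small relative to the largest entry of `A`; a different payoff scale would likely need a different `eta`.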

Taxonomy

Topics: Advanced Bandit Algorithms Research · Markov Chains and Monte Carlo Methods · Reinforcement Learning in Robotics