Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions

Shuang Qiu; Xiaohan Wei; Jieping Ye; Zhaoran Wang; Zhuoran Yang

arXiv:2207.12463·cs.LG·July 27, 2022·1 cites


TL;DR

This paper introduces and analyzes new fictitious play algorithms for zero-sum Markov games with structured unknown transitions, achieving near-optimal regret bounds in competitive settings.

Contribution

It proposes and theoretically analyzes novel fictitious play policy optimization algorithms for structured zero-sum Markov games, providing tight regret bounds.

Findings

01

Achieves $\tilde{O}(\sqrt{K})$ regret bounds for both transition structures.

02

Demonstrates algorithms' effectiveness in competitive, non-stationary environments.

03

Shows an overall optimality gap of $\tilde{O}(\sqrt{K})$ when both players adopt the algorithms.

Abstract

While single-agent policy optimization in a fixed environment has attracted a lot of research attention recently in the reinforcement learning community, much less is known theoretically when there are multiple agents playing in a potentially competitive environment. We take steps forward by proposing and analyzing new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transition and single-controller transition. For both scenarios, we prove tight $\tilde{O}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario. The regret of each agent is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms feature a…
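The abstract names the two transition structures without defining them. As a minimal sketch, assuming the usual meaning of these terms, the Python below builds both kinds of joint transition kernel for a toy two-player game. In the factored independent case the state splits into one component per player and each component evolves under that player's action alone. In the single-controller case the next state depends on only one player's action. All function names, sizes, and the choice of the first player as the controller are illustrative assumptions, not taken from the paper, and the sketch ignores the episodic, step-dependent kernels and the unknown-transition estimation that the paper addresses.

```python
import itertools
import random


def random_kernel(rng, n_states, n_actions):
    """Return kernel[s][a] = list of next-state probabilities that sums to 1."""
    kernel = []
    for _ in range(n_states):
        row = []
        for _ in range(n_actions):
            weights = [rng.random() + 1e-9 for _ in range(n_states)]
            total = sum(weights)
            row.append([w / total for w in weights])
        kernel.append(row)
    return kernel


def factored_independent(p1, p2):
    """Joint kernel for state s = (s1, s2).

    P(s1', s2' | s1, s2, a, b) = P1(s1' | s1, a) * P2(s2' | s2, b)
    Returned as a dict: (s1, s2, a, b) -> {(s1', s2'): probability}.
    """
    joint = {}
    for s1, s2 in itertools.product(range(len(p1)), range(len(p2))):
        for a, b in itertools.product(range(len(p1[0])), range(len(p2[0]))):
            joint[(s1, s2, a, b)] = {
                (n1, n2): p1[s1][a][n1] * p2[s2][b][n2]
                for n1 in range(len(p1))
                for n2 in range(len(p2))
            }
    return joint


def single_controller(p_ctrl, n_other_actions):
    """Joint kernel P(s' | s, a, b) = P(s' | s, a).

    The other player's action b is ignored by construction.
    Assumption: the controller is the player choosing a.
    Returned as a dict: (s, a, b) -> list of next-state probabilities.
    """
    return {
        (s, a, b): list(p_ctrl[s][a])
        for s in range(len(p_ctrl))
        for a in range(len(p_ctrl[0]))
        for b in range(n_other_actions)
    }


# Toy sizes, chosen only for illustration.
rng = random.Random(0)
p1 = random_kernel(rng, n_states=3, n_actions=2)
p2 = random_kernel(rng, n_states=4, n_actions=2)

joint_factored = factored_independent(p1, p2)
joint_single = single_controller(p1, n_other_actions=5)
```

In both cases the joint kernel is fully determined by much smaller per-player kernels, which is the kind of structure the abstract's algorithms are stated to exploit.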


Videos

Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research