Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions
Shuang Qiu, Xiaohan Wei, Jieping Ye, Zhaoran Wang, Zhuoran Yang

TL;DR
This paper introduces and analyzes new fictitious play algorithms for zero-sum Markov games with structured unknown transitions, achieving near-optimal regret bounds in competitive settings.
Contribution
It proposes and theoretically analyzes novel fictitious play policy optimization algorithms for structured zero-sum Markov games, providing tight regret bounds.
Findings
Achieves $\tilde{O}(\sqrt{K})$ regret bounds for both transition structures.
Demonstrates algorithms' effectiveness in competitive, non-stationary environments.
Shows an overall optimality gap of $\tilde{O}(\sqrt{K})$ when both players adopt the algorithms.
Abstract
While single-agent policy optimization in a fixed environment has attracted a lot of research attention recently in the reinforcement learning community, much less is known theoretically when there are multiple agents playing in a potentially competitive environment. We take steps forward by proposing and analyzing new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transition and single-controller transition. For both scenarios, we prove tight $\tilde{O}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario. The regret of each agent is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms feature a…
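As a rough illustration of the fictitious play idea the abstract refers to, the sketch below runs classical fictitious play on a one-shot zero-sum matrix game. It is not the paper's algorithm, which performs policy optimization in Markov games with unknown transitions; the function names, the matching-pennies payoff matrix, and the round count are all assumptions chosen for the example. Each player best-responds to the opponent's empirical action frequencies, and the optimality gap of the averaged strategies shrinks as play continues.

```python
import numpy as np


def fictitious_play(payoff, num_rounds):
    """Classical fictitious play on a zero-sum matrix game (illustrative only).

    payoff[i, j] is the row player's reward and the column player's loss.
    Returns the empirical mixed strategy of each player.
    """
    n_rows, n_cols = payoff.shape
    row_counts = np.zeros(n_rows)
    col_counts = np.zeros(n_cols)
    # Arbitrary opening actions (an assumption of this sketch).
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(num_rounds - 1):
        # Each player best-responds to the opponent's action counts so far.
        row_action = np.argmax(payoff @ col_counts)
        col_action = np.argmin(row_counts @ payoff)
        row_counts[row_action] += 1
        col_counts[col_action] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()


def optimality_gap(payoff, row_strategy, col_strategy):
    """Best-response value against the column player minus that against the row player."""
    return np.max(payoff @ col_strategy) - np.min(row_strategy @ payoff)


# Matching pennies: the unique equilibrium mixes both actions equally.
payoff = np.array([[1.0, -1.0], [-1.0, 1.0]])
row_strategy, col_strategy = fictitious_play(payoff, 5000)
gap = optimality_gap(payoff, row_strategy, col_strategy)
print(row_strategy, col_strategy, gap)
```

The gap computed here is the matrix-game analogue of the optimality gap mentioned in the findings: it is zero exactly at a Nash equilibrium and is always non-negative.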
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research
