A Sharp Analysis of Model-based Reinforcement Learning with Self-Play

Qinghua Liu; Tiancheng Yu; Yu Bai; Chi Jin

arXiv:2010.01604 · cs.LG · February 9, 2021 · 6 cites

TL;DR

This paper provides a sharp analysis and improved sample complexity guarantees for model-based self-play algorithms in multi-agent Markov games, achieving near-optimal bounds and practical policy outputs.

Contribution

It introduces the Optimistic Nash Value Iteration (Nash-VI) algorithm for two-player zero-sum Markov games, with an improved sample complexity that matches the information-theoretic lower bound up to a small factor.

Findings

01

Achieves $\tilde{O}(H^3SAB/\epsilon^2)$ sample complexity

02

Improves over previous $\tilde{O}(H^4S^2AB/\epsilon^2)$ guarantees

03

First to match the information-theoretic lower bound $\Omega(H^3S(A+B)/\epsilon^2)$ up to a $\min\{A,B\}$ factor
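
For a sense of scale, the two bounds listed above can be divided to see how large the improvement is. This is only a worked ratio of the stated rates; it ignores the logarithmic factors hidden by $\tilde{O}$ and any constants:

$$
\frac{H^4 S^2 A B / \epsilon^2}{H^3 S A B / \epsilon^2} = H S
$$

As a purely illustrative case (these values are assumptions, not taken from the paper), a game with $H = 10$ and $S = 100$ would need roughly $HS = 1000$ times fewer episodes under the new bound.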

Abstract

Model-based algorithms -- algorithms that explore the environment by building and using an estimated model -- are widely used in reinforcement learning practice and have been shown theoretically to achieve optimal sample efficiency for single-agent reinforcement learning in Markov Decision Processes (MDPs). However, for multi-agent reinforcement learning in Markov games, the best known sample complexity for model-based algorithms is suboptimal and compares unfavorably with recent model-free approaches. In this paper, we present a sharp analysis of model-based self-play algorithms for multi-agent Markov games. We design an algorithm, Optimistic Nash Value Iteration (Nash-VI), for two-player zero-sum Markov games that outputs an $\epsilon$-approximate Nash policy in $\tilde{O}(H^3SAB/\epsilon^2)$ episodes of game playing, where $S$ is the number of…

Videos

A Sharp Analysis of Model-based Reinforcement Learning with Self-Play · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Scheduling and Optimization Algorithms