Minimax Sample Complexity for Turn-based Stochastic Game

Qiwen Cui; Lin F. Yang

arXiv:2011.14267·cs.LG·December 1, 2020·5 cites

TL;DR

This paper proves that the plug-in solver approach, a natural reinforcement learning algorithm, achieves minimax sample complexity for turn-based stochastic games, providing theoretical guarantees for learning near-optimal strategies.

Contribution

It establishes the first minimax sample complexity bounds for plug-in solvers in turn-based stochastic games, using absorbing TBSG and reward perturbation techniques to handle complex statistical dependence.

Findings

01

Empirical Nash strategies approximate true Nash equilibria with bounded error.

02

Both problem-dependent and problem-independent sample complexity bounds are given for TBSG.

03

Absorbing TBSG and reward perturbation techniques are introduced to handle the complex statistical dependence.

Abstract

The empirical success of multi-agent reinforcement learning is encouraging, yet few theoretical guarantees have been established. In this work, we prove that the plug-in solver approach, probably the most natural reinforcement learning algorithm, achieves minimax sample complexity for turn-based stochastic games (TBSG). Specifically, we plan in an empirical TBSG by utilizing a 'simulator' that allows sampling from an arbitrary state-action pair. We show that the empirical Nash equilibrium strategy is an approximate Nash equilibrium strategy in the true TBSG and give both problem-dependent and problem-independent bounds. We develop absorbing TBSG and reward perturbation techniques to tackle the complex statistical dependence. The key idea is to artificially introduce a suboptimality gap in the TBSG, so that the Nash equilibrium strategy lies in a finite set.
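
The plug-in pipeline described in the abstract (sample every state-action pair from a simulator, build an empirical TBSG, plan in it) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the function names estimate_model and solve_tbsg, the two-state toy game, the discount factor, and the use of plain value iteration as the planner.

```python
import random

def estimate_model(sample_next_state, n_states, n_actions, n_samples):
    """Empirical transition probabilities from n_samples simulator calls per (s, a)."""
    p_hat = []
    for s in range(n_states):
        rows = []
        for a in range(n_actions):
            counts = [0] * n_states
            for _ in range(n_samples):
                counts[sample_next_state(s, a)] += 1
            rows.append([c / n_samples for c in counts])
        p_hat.append(rows)
    return p_hat

def solve_tbsg(p, reward, max_states, gamma, n_iters=200):
    """Value iteration for a turn-based game (hypothetical planner choice).

    The maximizing player moves in max_states; the minimizing player
    moves in every other state.
    """
    n_states, n_actions = len(p), len(p[0])
    v = [0.0] * n_states
    policy = [0] * n_states
    for _ in range(n_iters):
        new_v = []
        for s in range(n_states):
            q = [reward[s][a] + gamma * sum(p[s][a][t] * v[t] for t in range(n_states))
                 for a in range(n_actions)]
            best = max(q) if s in max_states else min(q)
            policy[s] = q.index(best)
            new_v.append(best)
        v = new_v
    return v, policy

# Toy game (all numbers invented): state 0 belongs to the max player,
# state 1 to the min player.
true_p = [[[0.9, 0.1], [0.1, 0.9]],
          [[0.5, 0.5], [0.2, 0.8]]]
reward = [[1.0, 0.0], [0.0, 1.0]]
gamma = 0.5
rng = random.Random(0)

def simulator(s, a):
    # Stand-in for the generative model: draws a next state from true_p.
    return rng.choices(range(2), weights=true_p[s][a])[0]

p_hat = estimate_model(simulator, n_states=2, n_actions=2, n_samples=2000)
v_hat, pi_hat = solve_tbsg(p_hat, reward, max_states={0}, gamma=gamma)
v_true, pi_true = solve_tbsg(true_p, reward, max_states={0}, gamma=gamma)
```

Comparing v_hat with v_true shows how far the equilibrium value of the empirical game is from that of the true game at a given sample size. The sketch does not reproduce the paper's analysis or its bounds.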

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Auction Theory and Applications