Minimax Sample Complexity for Turn-based Stochastic Game
Qiwen Cui, Lin F. Yang

TL;DR
This paper proves that the plug-in solver approach, a natural model-based reinforcement learning algorithm, achieves minimax sample complexity for turn-based stochastic games, providing theoretical guarantees for learning near-optimal strategies.
Contribution
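It establishes the first minimax sample complexity bounds for plug-in solvers in turn-based stochastic games, with new techniques (absorbing TBSG and reward perturbation) to handle the complex statistical dependence that arises when planning in an empirical model.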
Findings
Empirical Nash strategies approximate true Nash equilibria with bounded error.
Both problem-dependent and problem-independent sample complexity bounds are given for TBSG.
Absorbing TBSG and reward perturbation techniques are introduced to handle statistical dependence in the analysis.
Abstract
The empirical success of multi-agent reinforcement learning is encouraging, but few theoretical guarantees have been established. In this work, we prove that the plug-in solver approach, probably the most natural reinforcement learning algorithm, achieves minimax sample complexity for turn-based stochastic games (TBSG). Specifically, we plan in an empirical TBSG built with a "simulator" that allows sampling from arbitrary state-action pairs. We show that the empirical Nash equilibrium strategy is an approximate Nash equilibrium strategy in the true TBSG, and we give both problem-dependent and problem-independent bounds. We develop absorbing TBSG and reward perturbation techniques to tackle the complex statistical dependence. The key idea is to artificially introduce a suboptimality gap in the TBSG so that the Nash equilibrium strategy lies in a finite set.
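The plug-in solver idea can be illustrated with a small sketch: draw a fixed number of next-state samples for every state-action pair from the simulator, form the empirical transition model, and solve the resulting empirical TBSG. This is a minimal sketch under assumptions not stated in the abstract: a discounted zero-sum game with known rewards, each state controlled by either the max player or the min player, and plain value iteration as the planner. The names `estimate_transitions`, `solve_tbsg`, `plug_in_solver`, `toy_simulator`, and the two-state toy game are hypothetical illustrations, not the authors' code.

```python
import numpy as np

# Hypothetical sketch of a plug-in solver for a discounted, zero-sum,
# turn-based stochastic game (TBSG). Assumes known rewards r[s, a] and
# a simulator that returns one next state per call.

def estimate_transitions(sample_next_state, n_states, n_actions, n_samples, rng):
    """Empirical transition tensor P_hat[s, a, s'] from simulator draws."""
    p_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                p_hat[s, a, sample_next_state(s, a, rng)] += 1.0
    return p_hat / n_samples

def solve_tbsg(p, r, is_max_state, gamma, tol=1e-10, max_iters=100_000):
    """Value iteration: max backup at max-player states, min backup at
    min-player states. Returns state values and a greedy (pure) strategy."""
    v = np.zeros(p.shape[0])
    for _ in range(max_iters):
        q = r + gamma * (p @ v)  # q[s, a] = r[s, a] + gamma * sum_s' p[s, a, s'] v[s']
        v_new = np.where(is_max_state, q.max(axis=1), q.min(axis=1))
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    q = r + gamma * (p @ v)
    strategy = np.where(is_max_state, q.argmax(axis=1), q.argmin(axis=1))
    return v, strategy

def plug_in_solver(sample_next_state, r, is_max_state, gamma, n_samples, seed=0):
    """Plan in the empirical TBSG built from n_samples draws per (s, a)."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = r.shape
    p_hat = estimate_transitions(sample_next_state, n_states, n_actions, n_samples, rng)
    return solve_tbsg(p_hat, r, is_max_state, gamma)

# Hypothetical toy game: state 0 belongs to the max player, state 1 to the min player.
TRUE_P = np.array([
    [[0.9, 0.1], [0.1, 0.9]],   # transitions from state 0 under actions 0 and 1
    [[0.5, 0.5], [0.5, 0.5]],   # transitions from state 1 under actions 0 and 1
])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
IS_MAX = np.array([True, False])

def toy_simulator(s, a, rng):
    """Generative model: sample one next state from the true transitions."""
    return rng.choice(TRUE_P.shape[2], p=TRUE_P[s, a])

v_true, pi_true = solve_tbsg(TRUE_P, R, IS_MAX, gamma=0.9)
v_hat, pi_hat = plug_in_solver(toy_simulator, R, IS_MAX, gamma=0.9, n_samples=10_000)
print("true game:     ", v_true, pi_true)
print("empirical game:", v_hat, pi_hat)
```

In this toy game the true values are 8.59375 and 7.03125 with strategy (0, 0), and with enough samples per pair the empirical game recovers the same strategy. This check only compares values and greedy strategies; the paper's actual guarantee concerns how well the empirical Nash equilibrium strategy performs in the true TBSG, which the sketch does not measure.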
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Auction Theory and Applications
