Feature-Based Q-Learning for Two-Player Stochastic Games

Zeyu Jia; Lin F. Yang; Mengdi Wang

arXiv:1906.00423·cs.LG·June 4, 2019·24 cites

TL;DR

This paper introduces a feature-based Q-learning algorithm for two-player stochastic games that efficiently approximates Nash equilibria with sample complexity independent of the game's original dimensions.

Contribution

It proposes a novel two-player Q-learning method leveraging feature embeddings and develops an accelerated version with proven sample efficiency guarantees.

Findings

01

The basic algorithm finds an $\epsilon$-optimal strategy using a number of samples that is linear in the number of features.

02

The accelerated algorithm achieves $\epsilon$-optimality with $\tilde{O}(K/(\epsilon^{2}(1-\gamma)^{4}))$ samples.

03

Sample, time, and space complexities are independent of the original game dimensions.
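
To get a feel for how the accelerated bound scales, the short sketch below evaluates only the leading term $K/(\epsilon^{2}(1-\gamma)^{4})$. It is an illustration, not part of the paper: the function name `leading_sample_term` is hypothetical, and the constants and logarithmic factors hidden by $\tilde{O}$ are ignored, so the result is a scaling guide rather than an actual sample count.

```python
def leading_sample_term(num_features, epsilon, gamma):
    """Leading term K / (eps^2 * (1 - gamma)^4) of the stated bound.

    Hypothetical helper: constants and log factors hidden by the
    O-tilde notation are deliberately left out.
    """
    if num_features <= 0:
        raise ValueError("num_features must be positive")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return num_features / (epsilon ** 2 * (1 - gamma) ** 4)


if __name__ == "__main__":
    base = leading_sample_term(num_features=10, epsilon=0.1, gamma=0.5)
    tighter = leading_sample_term(num_features=10, epsilon=0.05, gamma=0.5)
    # Halving epsilon multiplies the leading term by four.
    print(base, tighter / base)
```

Note that the number of states and actions does not appear among the arguments, which is the sense in which the bound is independent of the original game dimensions.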

Abstract

Consider a two-player zero-sum stochastic game where the transition function can be embedded in a given feature space. We propose a two-player Q-learning algorithm for approximating the Nash equilibrium strategy via sampling. The algorithm is shown to find an $\epsilon$-optimal strategy using a sample size that is linear in the number of features. To further improve its sample efficiency, we develop an accelerated algorithm by adopting techniques such as variance reduction, monotonicity preservation, and two-sided strategy approximation. We prove that the algorithm is guaranteed to find an $\epsilon$-optimal strategy using no more than $\tilde{O}(K / (\epsilon^{2} (1 - \gamma)^{4}))$ samples with high probability, where $K$ is the number of features and $\gamma$ is the discount factor. The sample, time, and space complexities of the algorithm are independent of the original dimensions of the game.
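
The abstract does not spell out the update rule, so the following is only a minimal sketch of what a sampling-based, feature-based Q iteration could look like, under assumptions that are not taken from the paper: the game is turn-based (each state belongs to either the maximizing or the minimizing player), the transition model factors as $P(s' \mid s, a) = \sum_k \phi_k(s, a)\,\psi_k(s')$, and each $\psi_k$ is itself a probability distribution that can be sampled. The names `sampled_feature_q_iteration`, `phi`, `psi`, and `is_max_state` are hypothetical. The sketch omits variance reduction, monotonicity preservation, and two-sided strategy approximation.

```python
import numpy as np


def sampled_feature_q_iteration(phi, psi, reward, is_max_state, gamma,
                                n_samples=100, iters=50, seed=0):
    """Sketch of sampled Q iteration with a K-dimensional parameter w.

    Assumed shapes (all hypothetical, for illustration only):
      phi          (S, A, K)  features of each state-action pair
      psi          (K, S)     each row is a distribution over next states
      reward       (S, A)     immediate rewards
      is_max_state (S,)       True where the maximizing player moves
    The approximate Q-function is reward + gamma * phi @ w.
    """
    rng = np.random.default_rng(seed)
    n_states, _, n_features = phi.shape
    w = np.zeros(n_features)

    def state_values(states, weights):
        # Q-values only at the sampled states, shape (n, A).
        q = reward[states] + gamma * phi[states] @ weights
        # The maximizer takes the max over actions, the minimizer the min.
        return np.where(is_max_state[states], q.max(axis=1), q.min(axis=1))

    for _ in range(iters):
        new_w = np.empty(n_features)
        for k in range(n_features):
            # Monte Carlo estimate of sum_s' psi_k(s') * V(s').
            nxt = rng.choice(n_states, size=n_samples, p=psi[k])
            new_w[k] = state_values(nxt, w).mean()
        w = new_w
    return w


if __name__ == "__main__":
    # Tiny tabular game encoded with one-hot features (K = S * A).
    S, A = 2, 2
    phi = np.zeros((S, A, S * A))
    for s in range(S):
        for a in range(A):
            phi[s, a, s * A + a] = 1.0
    psi = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    reward = np.array([[1.0, 0.0], [0.5, 0.2]])
    is_max_state = np.array([True, False])
    w = sampled_feature_q_iteration(phi, psi, reward, is_max_state, gamma=0.5)
    print(reward + 0.5 * phi @ w)  # approximate Q-values
```

In this sketch each iteration draws `n_samples` next states per feature and evaluates values only at those sampled states, so the per-iteration work scales with the number of features and actions rather than with the number of states.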

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control

Methods: Q-Learning