Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction

Gen Li; Yuting Wei; Yuejie Chi; Yuantao Gu; Yuxin Chen

arXiv:2006.03041·cs.LG·September 13, 2022·5 cites

TL;DR

This paper gives a sharper analysis of the sample complexity of asynchronous Q-learning, which learns the optimal Q-function of a Markov decision process from Markovian samples, and shows that variance reduction techniques improve the bound further.

Contribution

The paper derives a novel, tighter sample complexity bound for asynchronous Q-learning and introduces variance reduction to improve how the bound scales with the effective horizon.

Findings

01 Sample complexity bound is improved by a factor of at least |S||A| over previous results.

02 The bound accounts for the mixing time and minimum state-action occupancy probability.

03 Variance reduction techniques improve the scaling with respect to the effective horizon.
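
To get a feel for how the quantities in these findings interact, the bound quoted in the abstract can be evaluated numerically. The sketch below is only an order-of-magnitude illustration: it drops all constants and logarithmic factors, and the function name `q_learning_sample_bound` and the example inputs are hypothetical, not taken from the paper.

```python
def q_learning_sample_bound(mu_min, gamma, eps, t_mix):
    """Return the two terms of the abstract's bound for classical asynchronous
    Q-learning, with constants and logarithmic factors dropped (assumption).

    first term:  1 / (mu_min * (1 - gamma)^5 * eps^2)
    second term: t_mix / (mu_min * (1 - gamma))
    """
    if not (0.0 < mu_min <= 1.0):
        raise ValueError("mu_min must lie in (0, 1]")
    if not (0.0 <= gamma < 1.0):
        raise ValueError("gamma must lie in [0, 1)")
    if eps <= 0.0 or t_mix <= 0.0:
        raise ValueError("eps and t_mix must be positive")

    horizon = 1.0 / (1.0 - gamma)  # effective horizon 1 / (1 - gamma)
    first_term = horizon ** 5 / (mu_min * eps ** 2)
    second_term = t_mix * horizon / mu_min
    return first_term, second_term


# Hypothetical inputs, chosen only for illustration.
first, second = q_learning_sample_bound(mu_min=0.01, gamma=0.9, eps=0.1, t_mix=50)
print(f"first term ~ {first:.3g}, second term ~ {second:.3g}")
```

For small accuracy levels the first term dominates, since it grows with 1/ε² and with the fifth power of the effective horizon, while the second term does not depend on ε at all.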

Abstract

Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a $\gamma$-discounted MDP with state space $S$ and action space $A$, we demonstrate that the $\ell_{\infty}$-based sample complexity of classical asynchronous Q-learning --- namely, the number of samples needed to yield an entrywise $\epsilon$-accurate estimate of the Q-function --- is at most on the order of $\frac{1}{\mu_{\min} (1-\gamma)^{5} \epsilon^{2}} + \frac{t_{\mathrm{mix}}}{\mu_{\min} (1-\gamma)}$ up to some logarithmic factor, provided that a proper constant learning rate is adopted. Here, $t_{\mathrm{mix}}$ and $\mu_{\min}$ denote respectively the mixing time and the minimum state-action occupancy probability of the sample trajectory. The first term of this bound…
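
The setting described in the abstract (one Markovian trajectory, a behavior policy, a constant learning rate) can be sketched as follows. This is a minimal illustration of classical asynchronous Q-learning, not the paper's code: the uniformly random behavior policy, the toy MDP, the step count, and the names `async_q_learning`, `P`, `R`, and `eta` are all assumptions made for the example.

```python
import random


def async_q_learning(P, R, gamma, eta, num_steps, seed=0):
    """Asynchronous Q-learning along a single trajectory, constant learning rate.

    P[s][a] is a list of next-state probabilities and R[s][a] is the reward.
    The uniformly random behavior policy is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0  # arbitrary initial state
    for _ in range(num_steps):
        a = rng.randrange(n_actions)  # behavior policy picks the action
        s_next = rng.choices(range(n_states), weights=P[s][a])[0]
        # Only the visited entry (s, a) is updated; all others stay unchanged.
        target = R[s][a] + gamma * max(Q[s_next])
        Q[s][a] = (1.0 - eta) * Q[s][a] + eta * target
        s = s_next  # the next sample continues the same trajectory
    return Q


# Hypothetical 2-state, 2-action MDP with deterministic transitions.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]
Q = async_q_learning(P, R, gamma=0.5, eta=0.5, num_steps=5000)
print(Q)
```

For this toy MDP the optimal Q-function can be worked out by hand as [[0.5, 1.0], [2.0, 0.5]], which gives a reference to compare the learned table against. Because each step updates a single state-action pair, how often the least-visited pair appears on the trajectory governs progress, which is why the minimum occupancy probability enters the bound.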

Videos

Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction · SlidesLive

Taxonomy

Topics: Age of Information Optimization · Machine Learning and Algorithms · Bayesian Modeling and Causal Inference

Methods: Q-Learning