Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
Gen Li, Yuting Wei, Yuejie Chi, Yuantao Gu, Yuxin Chen

TL;DR
This paper gives a sharper analysis of the sample complexity of asynchronous Q-learning, which learns the optimal Q-function of a Markov decision process from a single trajectory of Markovian samples, and shows that variance reduction further improves the scaling with the effective horizon.
Contribution
It offers a novel, tighter sample complexity bound for asynchronous Q-learning and introduces variance reduction to improve the effective horizon scaling.
Findings
The sample complexity bound improves on previous results by a factor of at least |S||A|.
The bound depends explicitly on the mixing time and the minimum state-action occupancy probability of the sample trajectory.
Variance reduction improves the scaling with respect to the effective horizon 1/(1-γ).
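To make the minimum state-action occupancy probability concrete, the following is a minimal sketch that computes it for a toy problem. The MDP, the uniform behavior policy, and the helper names `state_action_chain` and `stationary_distribution` are illustrative assumptions, not taken from the paper; the occupancy is computed as the stationary distribution of the state-action chain induced by the behavior policy.

```python
import numpy as np

def state_action_chain(P, pi_b):
    """Transition matrix of the (state, action) chain under behavior policy pi_b.

    P has shape (S, A, S) with P[s, a, s'] the transition probability;
    pi_b has shape (S, A) with pi_b[s, a] the probability of action a in state s.
    """
    S, A, _ = P.shape
    # K[(s, a), (s', a')] = P[s, a, s'] * pi_b[s', a']
    K = P[:, :, :, None] * pi_b[None, None, :, :]
    return K.reshape(S * A, S * A)

def stationary_distribution(K, iters=10_000, tol=1e-12):
    """Power iteration; assumes the chain is ergodic so the limit exists."""
    mu = np.full(K.shape[0], 1.0 / K.shape[0])
    for _ in range(iters):
        nxt = mu @ K
        if np.abs(nxt - mu).sum() < tol:
            return nxt
        mu = nxt
    return mu

# Hypothetical 2-state, 2-action MDP with a uniform behavior policy.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
pi_b = np.full((2, 2), 0.5)

K = state_action_chain(P, pi_b)
mu = stationary_distribution(K)
mu_min = mu.min()  # minimum state-action occupancy probability
print(mu.reshape(2, 2), mu_min)
```

A small `mu_min` means some state-action pair is rarely visited along the trajectory, which is why this quantity appears in the sample complexity.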
Abstract
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a $\gamma$-discounted MDP with state space $\mathcal{S}$ and action space $\mathcal{A}$, we demonstrate that the $\ell_{\infty}$-based sample complexity of classical asynchronous Q-learning, namely the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function, is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor, provided that a proper constant learning rate is adopted. Here, $t_{\mathrm{mix}}$ and $\mu_{\min}$ denote respectively the mixing time and the minimum state-action occupancy probability of the sample trajectory. The first term of this bound…
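As a rough illustration of the setting described in the abstract, here is a minimal sketch of classical asynchronous Q-learning with a constant learning rate, run on a single trajectory generated by a behavior policy. The toy MDP, the rewards, the discount factor, the learning rate `eta`, and the trajectory length `T` are all assumptions chosen for illustration and are not the values prescribed by the paper's analysis. The variance-reduced variant is not shown.

```python
import numpy as np

def async_q_learning(P, R, pi_b, gamma, eta, T, seed=0):
    """Asynchronous Q-learning: one (s, a) entry is updated per sample."""
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    s = 0  # arbitrary initial state
    for _ in range(T):
        a = rng.choice(A, p=pi_b[s])       # action from the behavior policy
        s_next = rng.choice(S, p=P[s, a])  # Markovian transition
        target = R[s, a] + gamma * Q[s_next].max()
        Q[s, a] = (1 - eta) * Q[s, a] + eta * target  # constant learning rate
        s = s_next
    return Q

def q_value_iteration(P, R, gamma, iters=2000):
    """Reference optimal Q-function via Q-value iteration (needs the model)."""
    Q = np.zeros(R.shape)
    for _ in range(iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q

# Hypothetical 2-state, 2-action MDP with a uniform behavior policy.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])
pi_b = np.full((2, 2), 0.5)

Q_hat = async_q_learning(P, R, pi_b, gamma=0.5, eta=0.02, T=20_000)
Q_star = q_value_iteration(P, R, gamma=0.5)
err = np.abs(Q_hat - Q_star).max()  # entrywise (l-infinity) error
print(err)
```

The entrywise error `err` is the quantity the sample complexity bound controls: the bound states how large `T` must be for this error to fall below a target accuracy.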
Taxonomy
Topics: Age of Information Optimization · Machine Learning and Algorithms · Bayesian Modeling and Causal Inference
Methods: Q-Learning
