Variance-Reduced Cascade Q-learning: Algorithms and Sample Complexity
Mohammad Boveiri, Peyman Mohajerin Esfahani

TL;DR
This paper introduces Variance-Reduced Cascade Q-learning (VRCQ), a model-free algorithm that achieves minimax-optimal sample complexity for estimating the optimal Q-function of discounted MDPs in the synchronous (generative-model) setting, with theoretical guarantees supported by numerical experiments.
Contribution
The paper proposes VRCQ, which combines the established direct variance reduction technique with a newly proposed variance reduction scheme, Cascade Q-learning, to obtain minimax-optimal sample complexity for model-free Q-learning in discounted MDPs.
Findings
VRCQ achieves minimax optimal sample complexity.
VRCQ outperforms existing algorithms in $\ell_\infty$-norm guarantees.
Numerical experiments validate theoretical results.
Abstract
We study the problem of estimating the optimal Q-function of $\gamma$-discounted Markov decision processes (MDPs) under the synchronous setting, where independent samples for all state-action pairs are drawn from a generative model at each iteration. We introduce and analyze a novel model-free algorithm called Variance-Reduced Cascade Q-learning (VRCQ). VRCQ comprises two key building blocks: (i) the established direct variance reduction technique and (ii) our proposed variance reduction scheme, Cascade Q-learning. By leveraging these techniques, VRCQ provides superior guarantees in the $\ell_\infty$-norm compared with the existing model-free stochastic approximation-type algorithms. Specifically, we demonstrate that VRCQ is minimax optimal. Additionally, when the action set is a singleton (so that the Q-learning problem reduces to policy evaluation), it achieves non-asymptotic instance…
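To make the synchronous setting concrete, the following is a minimal sketch of plain synchronous Q-learning with a generative model. It is a baseline for illustration only and is not VRCQ: it contains neither the direct variance reduction step nor the Cascade scheme, whose details the abstract does not give. The function names, the tabular inputs `P` and `r`, and the rescaled linear step size are all assumptions introduced here.

```python
import numpy as np


def synchronous_q_learning(P, r, gamma, num_iters, rng):
    """Plain synchronous Q-learning (illustrative baseline, NOT VRCQ).

    P: transition tensor of shape (S, A, S); r: reward matrix of shape (S, A).
    The generative model is simulated by sampling next states from P.
    """
    S, A = r.shape
    Q = np.zeros((S, A))
    # Cumulative transition probabilities, used for inverse-CDF sampling.
    cdf = np.cumsum(P, axis=2)
    for k in range(1, num_iters + 1):
        # Draw one independent next-state sample for every (s, a) pair.
        u = rng.random((S, A, 1))
        next_states = np.minimum((u > cdf).sum(axis=2), S - 1)
        # Empirical Bellman optimality operator applied to the current Q.
        target = r + gamma * Q[next_states].max(axis=2)
        # Rescaled linear step size (an assumed, commonly used schedule).
        step = 1.0 / (1.0 + (1.0 - gamma) * k)
        Q = (1.0 - step) * Q + step * target
    return Q


def value_iteration(P, r, gamma, num_iters=2000):
    """Exact Q-value iteration, used only as a reference for the error."""
    Q = np.zeros_like(r, dtype=float)
    for _ in range(num_iters):
        Q = r + gamma * (P @ Q.max(axis=1))
    return Q


# Small random MDP (hypothetical sizes) to compare the estimate with Q*.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(5), size=(5, 3))  # shape (5, 3, 5)
r = rng.random((5, 3))
Q_hat = synchronous_q_learning(P, r, gamma=0.8, num_iters=20000, rng=rng)
Q_star = value_iteration(P, r, gamma=0.8)
print("l_inf error:", np.max(np.abs(Q_hat - Q_star)))
```

The printed quantity is the $\ell_\infty$-norm error of the estimate, the metric in which the abstract states its guarantees. Each iteration consumes one sample per state-action pair, so the total sample count is the number of iterations times the number of pairs.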
Taxonomy
Topics: Fault Detection and Control Systems · Data Stream Mining Techniques · Anomaly Detection Techniques and Applications
Methods: Sparse Evolutionary Training · Q-Learning
