Variance-Reduced Cascade Q-learning: Algorithms and Sample Complexity

Mohammad Boveiri; Peyman Mohajerin Esfahani

arXiv:2408.06544·stat.ML·May 27, 2025

TL;DR

This paper introduces Variance-Reduced Cascade Q-learning (VRCQ), a model-free algorithm that achieves minimax optimal sample complexity for estimating the optimal Q-function of $\gamma$-discounted MDPs in the synchronous setting, where a generative model supplies independent samples for all state-action pairs at each iteration.

Contribution

The paper proposes VRCQ, which combines the established direct variance reduction technique with a new variance reduction scheme, Cascade Q-learning, and shows that the resulting model-free algorithm is minimax optimal for discounted MDPs.

Findings

01

VRCQ achieves minimax optimal sample complexity.

02

VRCQ provides stronger $\ell_{\infty}$-norm guarantees than existing model-free stochastic approximation-type algorithms.

03

Numerical experiments validate theoretical results.

Abstract

We study the problem of estimating the optimal Q-function of $\gamma$-discounted Markov decision processes (MDPs) under the synchronous setting, where independent samples for all state-action pairs are drawn from a generative model at each iteration. We introduce and analyze a novel model-free algorithm called Variance-Reduced Cascade Q-learning (VRCQ). VRCQ comprises two key building blocks: (i) the established direct variance reduction technique and (ii) our proposed variance reduction scheme, Cascade Q-learning. By leveraging these techniques, VRCQ provides superior guarantees in the $\ell_{\infty}$-norm compared with existing model-free stochastic approximation-type algorithms. Specifically, we demonstrate that VRCQ is minimax optimal. Additionally, when the action set is a singleton (so that the Q-learning problem reduces to policy evaluation), it achieves non-asymptotic instance…
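
To make the synchronous setting concrete, the sketch below runs plain synchronous Q-learning with a generative model on a small random MDP and reports the $\ell_{\infty}$ error against the exact optimal Q-function. It is not VRCQ: the abstract does not specify the Cascade or variance-reduction updates, so only the baseline sampling scheme is illustrated. The function names, the toy MDP, and the rescaled linear step size $1/(1 + (1-\gamma)k)$ are assumptions chosen for illustration.

```python
import numpy as np

def optimal_q(P, r, gamma, iters=500):
    """Exact Q* via value iteration. Uses the model, so it serves only as a reference."""
    Q = np.zeros_like(r)
    for _ in range(iters):
        Q = r + gamma * P @ Q.max(axis=1)  # (S, A, S) @ (S,) -> (S, A)
    return Q

def synchronous_q_learning(P, r, gamma, num_iters, rng):
    """Plain synchronous Q-learning (baseline, not VRCQ).

    Each iteration draws one independent next-state sample for EVERY
    state-action pair, which is what the generative model provides.
    """
    S, A = r.shape
    Q = np.zeros((S, A))
    cdf = P.cumsum(axis=2)
    for k in range(1, num_iters + 1):
        # Inverse-CDF sampling: one next state per (s, a) pair.
        u = rng.random((S, A, 1))
        next_states = np.minimum((u > cdf).sum(axis=2), S - 1)
        # Empirical Bellman target built from the sampled next states.
        target = r + gamma * Q.max(axis=1)[next_states]
        # Assumed step-size schedule (rescaled linear), not taken from the paper.
        step = 1.0 / (1.0 + (1.0 - gamma) * k)
        Q = (1.0 - step) * Q + step * target
    return Q

# Hypothetical toy instance: 4 states, 3 actions, discount 0.5.
S, A, gamma = 4, 3, 0.5
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))  # transition kernel, shape (S, A, S)
r = rng.random((S, A))                      # rewards in [0, 1)

Q_star = optimal_q(P, r, gamma)
Q_hat = synchronous_q_learning(P, r, gamma, num_iters=20000, rng=rng)
err = np.abs(Q_hat - Q_star).max()          # l-infinity error
print(f"l_inf error after 20000 iterations: {err:.4f}")
```

Each iteration consumes one sample per state-action pair, so the total sample count is the number of iterations times $|S||A|$; sample-complexity statements in this setting bound how many such samples are needed to reach a target $\ell_{\infty}$ error.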

Taxonomy

Topics: Fault Detection and Control Systems · Data Stream Mining Techniques · Anomaly Detection Techniques and Applications

Methods: Sparse Evolutionary Training · Q-Learning