Gaussian Approximation for Asynchronous Q-learning

Artemy Rubtsov; Sergey Samsonov; Vladimir Ulyanov; Alexey Naumov

arXiv:2604.07323·stat.ML·April 9, 2026

TL;DR

This paper establishes rates of convergence in the high-dimensional central limit theorem (CLT) for Polyak-Ruppert averaged iterates of asynchronous Q-learning with polynomial stepsizes, assuming that the state-action-next-state triples form a uniformly geometrically ergodic Markov chain.

Contribution

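Under uniform geometric ergodicity of the sampled chain, it derives rates of convergence in a high-dimensional CLT for the Polyak-Ruppert averaged iterates of asynchronous Q-learning and bounds the high-order moments of the last iterate. As a tool, it proves a high-dimensional CLT for sums of martingale differences.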

Findings

01

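Rate of order up to $n^{-1/6} \log^{4}(n S A)$ in the high-dimensional CLT over the class of hyper-rectangles, where $n$ is the number of samples and $S$ and $A$ are the numbers of states and actions.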

02

High-dimensional CLT for sums of martingale differences.

03

Bounds for high-order moments of the last iterate.
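
For readers unfamiliar with the phrase "rate over the class of hyper-rectangles", the quantity being bounded can be written as follows. This is a sketch in standard notation, not the paper's own: the symbols $\bar{Q}_n$ (averaged iterate), $Q^\star$ (optimal Q-function), $\Sigma$ (limiting covariance), and $\mathcal{R}$ are assumptions introduced here for illustration, and the paper's exact normalization may differ.

```latex
% Assumed notation (not taken from the source):
%   \bar{Q}_n   Polyak-Ruppert average of the first n iterates, a vector in R^{SA}
%   Q^\star     optimal Q-function
%   \Sigma      limiting covariance matrix (left unspecified here)
%   \mathcal{R} class of hyper-rectangles in R^{SA}
\[
  \rho_n
  \;=\;
  \sup_{B \in \mathcal{R}}
  \Bigl|
    \mathbb{P}\bigl( \sqrt{n}\,(\bar{Q}_n - Q^\star) \in B \bigr)
    \;-\;
    \mathbb{P}\bigl( \xi \in B \bigr)
  \Bigr|,
  \qquad \xi \sim \mathcal{N}(0, \Sigma).
\]
% The first finding then corresponds to a bound on \rho_n of order
% up to n^{-1/6} \log^{4}(n S A).
```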

Abstract

In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak-Ruppert averaged iterates generated by the asynchronous Q-learning algorithm with a polynomial stepsize $k^{-\omega}$, $\omega \in (1/2, 1]$. Assuming that the sequence of state-action-next-state triples $(s_{k}, a_{k}, s_{k+1})_{k \geq 0}$ forms a uniformly geometrically ergodic Markov chain, we establish a rate of order up to $n^{-1/6} \log^{4}(n S A)$ over the class of hyper-rectangles, where $n$ is the number of samples used by the algorithm and $S$ and $A$ denote the numbers of states and actions, respectively. To obtain this result, we prove a high-dimensional central limit theorem for sums of martingale differences, which may be of independent interest. Finally, we present bounds for high-order moments for the algorithm's last iterate.
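
To make the setting concrete, here is a minimal sketch of asynchronous Q-learning with stepsize $k^{-\omega}$ and Polyak-Ruppert averaging along a single trajectory. Everything specific in it is an assumption made for illustration and does not come from the paper: the function name `async_q_learning`, the two-state, two-action MDP, its transition probabilities and rewards, the discount factor `gamma`, and the uniform behaviour policy.

```python
import random


def async_q_learning(n_steps, omega=0.75, gamma=0.5, seed=0):
    """Asynchronous Q-learning with stepsize k**(-omega) and a running average.

    The MDP below is hypothetical and only serves as a small demo.
    Returns (last iterate, Polyak-Ruppert average), each a 2x2 nested list.
    """
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    # p_next1[s][a]: probability of moving to state 1 (hypothetical numbers).
    p_next1 = [[0.2, 0.7], [0.6, 0.1]]
    # reward[s][a]: deterministic reward (hypothetical numbers).
    reward = [[1.0, 0.0], [0.5, 2.0]]

    q = [[0.0] * n_actions for _ in range(n_states)]
    q_bar = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for k in range(1, n_steps + 1):
        # One transition (s_k, a_k, s_{k+1}) from a single trajectory.
        a = rng.randrange(n_actions)  # uniform behaviour policy (assumed)
        s_next = 1 if rng.random() < p_next1[s][a] else 0

        # Polynomial stepsize with omega in (1/2, 1].
        alpha = k ** (-omega)

        # Asynchronous update: only the visited entry (s, a) changes.
        target = reward[s][a] + gamma * max(q[s_next])
        q[s][a] += alpha * (target - q[s][a])

        # Running mean of the iterates produced so far.
        for i in range(n_states):
            for j in range(n_actions):
                q_bar[i][j] += (q[i][j] - q_bar[i][j]) / k

        s = s_next
    return q, q_bar


if __name__ == "__main__":
    last, averaged = async_q_learning(50_000)
    print("last iterate:", last)
    print("averaged iterate:", averaged)
```

The averaged iterate `q_bar` is the object whose Gaussian approximation the abstract discusses, while `q` is the last iterate covered by the moment bounds. In this toy chain every transition probability lies strictly between 0 and 1 and the behaviour policy is uniform, so every state-action pair keeps being visited.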
