On Gaussian approximation for entropy-regularized Q-learning with function approximation
Artemy Rubtsov, Rahul Singh, Eric Moulines, Alexey Naumov, and Sergey Samsonov

TL;DR
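This paper establishes a high-dimensional Gaussian approximation rate of order n^{-1/4}, up to polylogarithmic factors, for Polyak-Ruppert averaged iterates of entropy-regularized asynchronous Q-learning with linear function approximation. The result assumes that the observed data form a uniformly geometrically ergodic Markov chain and that the projected soft Bellman equation satisfies suitable regularity conditions.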
Contribution
It derives the first high-dimensional Gaussian approximation bounds for entropy-regularized Q-learning with linear function approximation.
Findings
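Gaussian approximation bound in the convex distance with rate n^{-1/4}, up to polylogarithmic factors in the number of samples n
High-order moment bounds for the last iterate of the algorithm
Convergence analysis under uniform geometric ergodicity of the observed Markov chain and regularity conditions for the projected soft Bellman equation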
Abstract
In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak-Ruppert averaged iterates generated by entropy-regularized asynchronous Q-learning with linear function approximation and a polynomial stepsize. Assuming that the sequence of observed triples forms a uniformly geometrically ergodic Markov chain, and under suitable regularity conditions for the projected soft Bellman equation, we establish a Gaussian approximation bound in the convex distance with rate of order n^{-1/4}, up to polylogarithmic factors in n, where n is the number of samples used by the algorithm. To obtain this result, we combine a linearization of the soft Bellman recursion with a Gaussian approximation for the leading martingale term. Finally, we derive high-order moment bounds for the algorithm's…
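The kind of recursion the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard soft Bellman target r + gamma * lam * log sum_a' exp(q(s', a') / lam), a stepsize c0 / k^omega with omega between 1/2 and 1, a uniform behaviour policy, and a small random MDP with random unit-norm features. The names `soft_value`, `soft_q_learning`, `lam`, `c0`, and `omega` are hypothetical and do not come from the paper.

```python
import numpy as np


def soft_value(q_row, lam):
    """Soft (entropy-regularized) value lam * log sum_a exp(q_a / lam), computed stably."""
    m = q_row.max()
    return m + lam * np.log(np.exp((q_row - m) / lam).sum())


def soft_q_learning(n_steps=20000, n_states=5, n_actions=3, dim=4,
                    gamma=0.9, lam=1.0, c0=0.5, omega=0.75, seed=0):
    """Asynchronous soft Q-learning with linear features and Polyak-Ruppert averaging.

    Everything here (MDP, features, constants) is an illustrative assumption.
    Returns the last iterate theta and the running average theta_bar.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical random MDP: P[s, a] is a distribution over next states.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    # Hypothetical feature map phi(s, a) in R^dim, normalized to unit length.
    Phi = rng.normal(size=(n_states, n_actions, dim))
    Phi /= np.linalg.norm(Phi, axis=-1, keepdims=True)

    theta = np.zeros(dim)
    theta_bar = np.zeros(dim)
    s = 0
    for k in range(1, n_steps + 1):
        a = rng.integers(n_actions)               # uniform behaviour policy (assumption)
        s_next = rng.choice(n_states, p=P[s, a])  # one observed triple (s, a, s_next)
        q_next = Phi[s_next] @ theta              # approximate Q-values at s_next, shape (A,)
        target = R[s, a] + gamma * soft_value(q_next, lam)
        td_error = target - Phi[s, a] @ theta
        theta = theta + (c0 / k**omega) * td_error * Phi[s, a]  # polynomial stepsize
        theta_bar += (theta - theta_bar) / k      # running Polyak-Ruppert average
        s = s_next
    return theta, theta_bar
```

Calling `soft_q_learning()` returns the last iterate and its running average. The Gaussian approximation result in the abstract concerns the averaged iterate, while the moment bounds listed under Findings concern the last iterate. With random features this toy recursion is not guaranteed to satisfy the regularity conditions the paper assumes, so it should be read only as a description of the update.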
