Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation

Yixuan Zhang; Qiaomin Xie

arXiv:2401.13884·stat.ML·January 26, 2024·1 cites

TL;DR

This paper analyzes constant stepsize Q-learning, proving its distributional convergence, asymptotic normality, and explicit bias expansion, and introduces a Richardson-Romberg extrapolation method to improve estimation accuracy.

Contribution

The paper gives the first proof of distributional convergence for constant stepsize Q-learning and develops a bias-correction technique based on Richardson-Romberg extrapolation.

Findings

01

Proves distributional convergence of Q-learning iterates in Wasserstein distance.

02

Establishes asymptotic normality of averaged Q-learning iterates.

03

Derives an explicit formula for the bias of the averaged iterate.
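
The bias finding can be made concrete with a short worked expansion. This is a sketch under assumed notation, not the paper's exact statement: $\bar q_\alpha$ stands for the limit of the averaged iterate at stepsize $\alpha$, $q^*$ for the optimal Q-function, and $B$ for a vector that does not depend on $\alpha$. The stepsize pair $(\alpha, 2\alpha)$ is also an assumption chosen for illustration.

```latex
\begin{align*}
\mathbb{E}[\bar q_\alpha] &= q^* + \alpha B + O(\alpha^2), \\
\mathbb{E}[\bar q_{2\alpha}] &= q^* + 2\alpha B + O(\alpha^2), \\
2\,\mathbb{E}[\bar q_\alpha] - \mathbb{E}[\bar q_{2\alpha}] &= q^* + O(\alpha^2).
\end{align*}
```

If the leading bias term is linear in the stepsize, combining two runs in this way cancels it and leaves a bias of second order.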

Abstract

Stochastic Approximation (SA) is a widely used algorithmic approach in various fields, including optimization and reinforcement learning (RL). Among RL algorithms, Q-learning is particularly popular due to its empirical success. In this paper, we study asynchronous Q-learning with constant stepsize, which is commonly used in practice for its fast convergence. By connecting the constant stepsize Q-learning to a time-homogeneous Markov chain, we show the distributional convergence of the iterates in Wasserstein distance and establish its exponential convergence rate. We also establish a Central Limit Theory for Q-learning iterates, demonstrating the asymptotic normality of the averaged iterates. Moreover, we provide an explicit expansion of the asymptotic bias of the averaged iterate in stepsize. Specifically, the bias is proportional to the stepsize up to higher-order terms and we…
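
To illustrate the procedure the abstract describes, the following is a minimal sketch of tail-averaged asynchronous Q-learning with a constant stepsize, followed by a Richardson-Romberg combination of two runs. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy two-state, two-action MDP, the uniform behavior policy, the Gaussian reward noise, the function names `q_learning_average` and `rr_extrapolate`, and the stepsize pair (alpha, 2 * alpha).

```python
import random


def q_learning_average(alpha, steps, burn_in, seed, gamma=0.9):
    """Tail-averaged asynchronous Q-learning with constant stepsize alpha.

    Runs on a hypothetical 2-state, 2-action MDP (an assumption for this
    sketch) and returns the average of the iterates after burn_in steps.
    """
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    # Hypothetical model: probability of moving to state 1 from (s, a).
    p_next1 = [[0.2, 0.7], [0.6, 0.1]]
    # Hypothetical mean rewards for each (s, a).
    reward_mean = [[1.0, 0.0], [0.5, 2.0]]

    q = [[0.0] * n_actions for _ in range(n_states)]
    avg = [[0.0] * n_actions for _ in range(n_states)]
    state, count = 0, 0

    for k in range(steps):
        action = rng.randrange(n_actions)  # uniform behavior policy
        reward = reward_mean[state][action] + rng.gauss(0.0, 1.0)
        next_state = 1 if rng.random() < p_next1[state][action] else 0

        # Asynchronous update: only the visited (state, action) entry moves.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])

        # Running mean of the full Q-table over the post-burn-in iterates.
        if k >= burn_in:
            count += 1
            for i in range(n_states):
                for j in range(n_actions):
                    avg[i][j] += (q[i][j] - avg[i][j]) / count

        state = next_state
    return avg


def rr_extrapolate(avg_alpha, avg_2alpha):
    """Combine averages from stepsizes alpha and 2*alpha as 2*A - B.

    If the bias is c*alpha plus higher-order terms, the linear term cancels.
    """
    return [
        [2.0 * x - y for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(avg_alpha, avg_2alpha)
    ]
```

A possible usage is `rr_extrapolate(q_learning_average(0.05, 200000, 20000, seed=1), q_learning_average(0.10, 200000, 20000, seed=2))`. A single pair of runs is still noisy, so any improvement over the plain averages would need to be checked over many seeds.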

Taxonomy

Topics: Risk and Portfolio Optimization · Stochastic processes and financial applications

Methods: Q-Learning