Linear $Q$-Learning Does Not Diverge in $L^2$: Convergence Rates to a Bounded Set
Xinyu Liu, Zixuan Xie, Shangtong Zhang

TL;DR
This paper proves the first $L^2$ convergence rate of linear $Q$-learning iterates to a bounded set, without modifying the algorithm and without assuming Bellman completeness or a near-optimal behavior policy.
Contribution
It establishes the first $L^2$ convergence rate for linear $Q$-learning without modifications, Bellman completeness, or near-optimality assumptions, using an $\epsilon$-softmax behavior policy with an adaptive temperature.
Findings
Linear $Q$-learning converges in $L^2$ to a bounded set.
The convergence rate is established under Markovian noise with fast-changing transitions.
The analysis applies to tabular $Q$-learning with a novel pseudo-contraction property.
Abstract
$Q$-learning is one of the most fundamental reinforcement learning algorithms. It is widely believed that $Q$-learning with linear function approximation (i.e., linear $Q$-learning) suffers from possible divergence until the recent work Meyn (2024) which establishes the ultimate almost sure boundedness of the iterates of linear $Q$-learning. Building on this success, this paper further establishes the first $L^2$ convergence rate of linear $Q$-learning iterates (to a bounded set). Similar to Meyn (2024), we do not make any modification to the original linear $Q$-learning algorithm, do not make any Bellman completeness assumption, and do not make any near-optimality assumption on the behavior policy. All we need is an $\epsilon$-softmax behavior policy with an adaptive temperature. The key to our analysis is the general result of stochastic approximations under Markovian noise with…
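The setting named in the abstract can be illustrated with a minimal sketch of unmodified linear $Q$-learning paired with an $\epsilon$-softmax behavior policy. This is not the authors' code: the function names are hypothetical, and the temperature schedule `adaptive_temperature` (shrinking the temperature as the weight norm grows) is only one assumed way to make the temperature "adaptive", since the summary above does not specify it.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def eps_softmax_probs(q_values, eps, temperature):
    """Mix a uniform distribution (weight eps) with a softmax over q_values."""
    scaled = [temperature * q for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    n = len(q_values)
    return [eps / n + (1.0 - eps) * e / z for e in exps]


def adaptive_temperature(w, kappa0=1.0):
    """ASSUMED schedule (not from the source): kappa0 / max(1, ||w||)."""
    norm = math.sqrt(dot(w, w))
    return kappa0 / max(1.0, norm)


def linear_q_step(w, x_sa, reward, next_features, alpha, gamma):
    """One standard semi-gradient linear Q-learning update.

    x_sa is the feature vector of the current state-action pair;
    next_features holds one feature vector per action in the next state.
    """
    target = reward + gamma * max(dot(x, w) for x in next_features)
    td_error = target - dot(x_sa, w)
    return [wi + alpha * td_error * xi for wi, xi in zip(w, x_sa)]


# Usage: behavior probabilities for two actions, then a single update.
w = [0.0, 0.0]
features = [[1.0, 0.0], [0.0, 1.0]]
q_values = [dot(x, w) for x in features]
probs = eps_softmax_probs(q_values, eps=0.1, temperature=adaptive_temperature(w))
w = linear_q_step(w, features[0], reward=1.0, next_features=features,
                  alpha=0.5, gamma=0.9)
```

The uniform component guarantees that every action is sampled with probability at least $\epsilon / |\mathcal{A}|$, while the update itself is the usual linear $Q$-learning rule with no projection, target network, or regularization added.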
Taxonomy
Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
