Stability of Q-Learning Through Design and Optimism
Sean Meyn

TL;DR
This paper explores the stability of Q-learning algorithms, introduces new methods for ensuring stability and accelerated convergence, and presents the Zap Zero algorithm as a general stochastic approximation approach applicable to various settings.
Contribution
The paper demonstrates stability of Q-learning with linear function approximation under optimistic training, and introduces the Zap Zero algorithm, a stable, matrix-free approximation of the Newton-Raphson flow.
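For readers unfamiliar with the term, the Newton-Raphson flow can be written as follows. This is a sketch in generic root-finding notation: the symbols $\bar f$, $A$, and $\vartheta_t$ are assumptions for illustration and are not taken from this summary.

```latex
% Goal: find \theta^* such that \bar f(\theta^*) = 0.
% A(\theta) denotes the Jacobian of \bar f (assumed invertible along the path).
\[
  \frac{d}{dt}\vartheta_t = -\bigl[A(\vartheta_t)\bigr]^{-1}\,\bar f(\vartheta_t),
  \qquad
  A(\theta) = \partial_\theta \bar f(\theta).
\]
% By the chain rule, the vector field decays exponentially along the flow:
\[
  \frac{d}{dt}\bar f(\vartheta_t)
  = A(\vartheta_t)\,\frac{d}{dt}\vartheta_t
  = -\bar f(\vartheta_t)
  \quad\Longrightarrow\quad
  \bar f(\vartheta_t) = e^{-t}\,\bar f(\vartheta_0).
\]
```

Here "matrix-free" refers to approximating this flow without explicitly computing the inverse $[A(\vartheta_t)]^{-1}$.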
Findings
Stability of Q-learning with linear approximation established under optimistic training.
The Zap Zero algorithm is stable and convergent under mild assumptions.
Zap Zero is applicable to Q-learning with non-linear function approximation and oblivious training.
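To make the setting of the first finding concrete, here is a minimal sketch of Q-learning with a linear parameterization, trained with a plain Gibbs (softmax) exploration policy. It is not the paper's algorithm: the toy MDP, the reward-maximization convention, the one-hot basis, the step-size rule, and the names `gibbs_policy`, `features`, and `run_q_learning` are all assumptions made for illustration. The paper's modified Gibbs policy is not reproduced here.

```python
import math
import random


def gibbs_policy(q_values, kappa):
    """Gibbs (softmax) distribution over actions; larger kappa is greedier."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(kappa * (q - m)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def features(state, action, n_states, n_actions):
    """One-hot basis: the tabular special case of a linear parameterization."""
    psi = [0.0] * (n_states * n_actions)
    psi[state * n_actions + action] = 1.0
    return psi


def q_value(theta, psi):
    """Linear Q-function approximation: Q(x, u) = theta^T psi(x, u)."""
    return sum(t * p for t, p in zip(theta, psi))


def run_q_learning(n_steps=20000, gamma=0.9, kappa=2.0, seed=0):
    """Q-learning on a hypothetical 3-state chain (assumed for illustration)."""
    rng = random.Random(seed)
    n_states, n_actions = 3, 2
    theta = [0.0] * (n_states * n_actions)
    state = 0
    for n in range(1, n_steps + 1):
        # Sample the action from the Gibbs policy built on the current estimate.
        q_s = [q_value(theta, features(state, a, n_states, n_actions))
               for a in range(n_actions)]
        probs = gibbs_policy(q_s, kappa)
        action = rng.choices(range(n_actions), weights=probs)[0]

        # Toy dynamics: action 1 moves right (capped), action 0 resets to state 0.
        next_state = min(state + 1, n_states - 1) if action == 1 else 0
        reward = 1.0 if (state == n_states - 1 and action == 1) else 0.0

        # Temporal-difference update along the feature vector.
        q_next = max(q_value(theta, features(next_state, a, n_states, n_actions))
                     for a in range(n_actions))
        psi = features(state, action, n_states, n_actions)
        td_error = reward + gamma * q_next - q_value(theta, psi)
        alpha = 1.0 / (1.0 + n / 100.0)  # decaying step size, an arbitrary choice
        theta = [t + alpha * td_error * p for t, p in zip(theta, psi)]
        state = next_state
    return theta


if __name__ == "__main__":
    print(run_q_learning())
```

With a one-hot basis the recursion reduces to tabular Q-learning, for which convergence is classical. The stability question raised in the paper concerns general linear bases, where the choice of training policy matters.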
Abstract
Q-learning has become an important part of the reinforcement learning toolkit since its introduction in the dissertation of Chris Watkins in the 1980s. The purpose of this paper is in part a tutorial on stochastic approximation and Q-learning, providing details regarding the INFORMS APS inaugural Applied Probability Trust Plenary Lecture, presented in Nancy, France, June 2023. The paper also presents new approaches to ensure stability and potentially accelerated convergence for these algorithms, and stochastic approximation in other settings. Two contributions are entirely new: 1. Stability of Q-learning with linear function approximation has been an open topic for research for over three decades. It is shown that with appropriate optimistic training in the form of a modified Gibbs policy, there exists a solution to the projected Bellman equation, and the algorithm is stable (in…
Taxonomy
Topics: Stock Market Forecasting Methods · Stochastic processes and financial applications
Methods: Q-Learning
