An Elementary Proof that Q-learning Converges Almost Surely
Matthew T. Regehr, Alex Ayoub

TL;DR
This paper provides a simple, self-contained proof that Q-learning converges almost surely, making the complex theoretical results more accessible to students and researchers by relying on minimal external results.
Contribution
It offers an elementary, complete proof of Q-learning convergence that depends on only one external stochastic approximation result, so the algorithm's theoretical foundations can be followed without extensive background.
Findings
Q-learning converges almost surely under standard conditions
The proof is self-contained and accessible for learners
The argument relies on a single external stochastic approximation result
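For orientation, the "standard conditions" in the first finding are usually stated as below. This is the textbook formulation, given here as an assumption; the paper's exact hypotheses are not reproduced on this page. The symbols (step sizes \alpha_t, discount \gamma, reward r_t) are the conventional ones, not notation taken from the paper.

```latex
% Tabular Q-learning update after observing (s_t, a_t, r_t, s_{t+1}):
Q_{t+1}(s_t, a_t) = (1 - \alpha_t)\, Q_t(s_t, a_t)
  + \alpha_t \Bigl[ r_t + \gamma \max_{a'} Q_t(s_{t+1}, a') \Bigr]

% Commonly assumed conditions: bounded rewards, \gamma \in [0, 1), and,
% for every state-action pair (s, a), step sizes along its visits satisfying
\sum_{t} \alpha_t(s, a) = \infty,
\qquad
\sum_{t} \alpha_t(s, a)^2 < \infty .

% Conclusion of the convergence theorem:
Q_t \to Q^{*} \quad \text{almost surely as } t \to \infty .
```

The first sum forces every pair to be visited infinitely often with enough total step size to forget the initial estimate; the second keeps the accumulated sampling noise finite.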
Abstract
Watkins and Dayan's Q-learning is a model-free reinforcement learning algorithm that iteratively refines an estimate for the optimal action-value function of an MDP by stochastically "visiting" many state-action pairs [Watkins and Dayan, 1992]. Variants of the algorithm lie at the heart of numerous recent state-of-the-art achievements in reinforcement learning, including the superhuman Atari-playing deep Q-network [Mnih et al., 2015]. The goal of this paper is to reproduce a precise and (nearly) self-contained proof that Q-learning converges. Much of the available literature leverages powerful theory to obtain highly generalizable results in this vein. However, this approach requires the reader to be familiar with and make many deep connections to different research areas. A student seeking to deepen their understanding of Q-learning risks becoming caught in a vicious cycle of "RL-learning…
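The iterative refinement described in the abstract can be sketched as tabular Q-learning on a toy problem. This is a minimal illustration, not the paper's construction: the two-state MDP `P`, the discount `GAMMA`, the uniform sampling of state-action pairs, and the step size 1/(visit count) are all assumptions chosen so that the result can be checked against value iteration.

```python
import random

# Hypothetical 2-state, 2-action MDP (an assumption for illustration).
# P[s][a] is a list of (probability, next_state, reward) outcomes.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.5, 0, 1.0), (0.5, 1, 1.0)]},
    1: {0: [(1.0, 0, 2.0)], 1: [(1.0, 1, 0.5)]},
}
GAMMA = 0.5  # discount factor, assumed


def value_iteration(sweeps=200):
    """Compute Q* with the known model, used only as a reference."""
    q = {(s, a): 0.0 for s in P for a in P[s]}
    for _ in range(sweeps):
        q = {
            (s, a): sum(
                p * (r + GAMMA * max(q[(s2, b)] for b in P[s2]))
                for p, s2, r in P[s][a]
            )
            for s in P
            for a in P[s]
        }
    return q


def sample(s, a, rng):
    """Draw (next_state, reward) for the pair (s, a)."""
    u = rng.random()
    acc = 0.0
    for p, s2, r in P[s][a]:
        acc += p
        if u < acc:
            return s2, r
    return s2, r  # guard against floating-point rounding


def q_learning(steps=200_000, seed=0):
    """Model-free estimate of Q* from sampled transitions only."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in P for a in P[s]}
    visits = {pair: 0 for pair in q}
    pairs = list(q)
    for _ in range(steps):
        s, a = rng.choice(pairs)  # "visit" a state-action pair at random
        s2, r = sample(s, a, rng)
        visits[(s, a)] += 1
        alpha = 1.0 / visits[(s, a)]  # sum diverges, sum of squares converges
        target = r + GAMMA * max(q[(s2, b)] for b in P[s2])
        q[(s, a)] += alpha * (target - q[(s, a)])
    return q


if __name__ == "__main__":
    q_star = value_iteration()
    q_hat = q_learning()
    for pair in sorted(q_star):
        print(pair, round(q_star[pair], 3), round(q_hat[pair], 3))
```

Sampling pairs uniformly is the simplest way to guarantee that every pair is visited infinitely often; an exploring behaviour policy would serve the same purpose. A single run like this only illustrates the almost-sure convergence statement and does not substitute for the proof.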
Taxonomy
Topics: Reinforcement Learning in Robotics · Computability, Logic, AI Algorithms · Advanced Bandit Algorithms Research
Methods: Q-Learning
