Fastest Convergence for Q-learning
Adithya M. Devraj, Sean P. Meyn

TL;DR
The paper introduces Zap Q-learning, a matrix-gain variant of Watkins' Q-learning that is designed to have optimal asymptotic variance and converges quickly, supported by an ODE analysis and numerical experiments.
Contribution
It presents Zap Q-learning, a novel matrix-gain algorithm with optimal asymptotic variance and rapid convergence, along with a tutorial survey of reinforcement learning algorithms that focuses on minimum-variance methods.
Findings
Converges quickly in numerical experiments, even in non-ideal parameterized settings
Attains optimal asymptotic variance by design
ODE analysis suggests transient behavior close to a deterministic Newton-Raphson iteration, and stable, efficient computation
Abstract
The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that the transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two time-scale update equation for the matrix gain sequence. The analysis suggests that the approach will lead to stable and efficient computation even for non-ideal parameterized settings. Numerical experiments confirm the quick convergence, even in such non-ideal cases. A secondary goal of this paper is tutorial. The first half of the paper contains a survey on reinforcement learning algorithms, with a focus on minimum variance algorithms.
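The abstract describes the update only in words, so the following is a minimal sketch of how a two time-scale matrix-gain update might look in the tabular case, not the authors' implementation. It assumes a cost-minimizing MDP, an indicator (tabular) basis, no eligibility trace, uniformly random exploration, step sizes alpha_n = 1/n and gamma_n = n^(-rho) with rho in (0.5, 1), and a pseudo-inverse in place of the matrix inverse. The function name and all parameter names are hypothetical.

```python
import numpy as np

def zap_q_learning(P, c, beta=0.9, n_steps=20000, rho=0.85, seed=0):
    """Sketch of a tabular matrix-gain Q-learning update (assumptions in lead-in).

    P: array (S, U, S) of transition probabilities (hypothetical input format).
    c: array (S, U) of one-step costs.
    Returns the estimated Q-function as an (S, U) array.
    """
    rng = np.random.default_rng(seed)
    S, U = c.shape
    d = S * U
    theta = np.zeros(d)      # Q-function parameters, one per (state, action)
    A_hat = -np.eye(d)       # initial matrix-gain estimate (arbitrary choice)
    x = 0
    for n in range(1, n_steps + 1):
        u = rng.integers(U)                    # uniformly random exploration
        x_next = rng.choice(S, p=P[x, u])
        Q = theta.reshape(S, U)
        u_star = int(np.argmin(Q[x_next]))     # greedy (min-cost) action at next state

        # Indicator basis vectors for (x, u) and (x_next, u_star).
        psi = np.zeros(d)
        psi[x * U + u] = 1.0
        psi_next = np.zeros(d)
        psi_next[x_next * U + u_star] = 1.0

        # Temporal-difference term, computed before theta is updated.
        td = c[x, u] + beta * Q[x_next, u_star] - Q[x, u]

        # Fast time scale: running estimate of the linearization matrix.
        A = np.outer(psi, beta * psi_next - psi)
        gamma = n ** (-rho)
        A_hat += gamma * (A - A_hat)

        # Slow time scale: Newton-Raphson-like step with matrix gain -pinv(A_hat).
        alpha = 1.0 / n
        theta -= alpha * (np.linalg.pinv(A_hat) @ psi) * td

        x = x_next
    return theta.reshape(S, U)

if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP used only to exercise the sketch.
    P = np.array([[[0.7, 0.3], [0.2, 0.8]],
                  [[0.5, 0.5], [0.9, 0.1]]])
    c = np.array([[1.0, 2.0], [0.5, 1.5]])
    print(zap_q_learning(P, c, beta=0.5, n_steps=5000))
```

Because gamma_n decays more slowly than alpha_n, the gain estimate A_hat moves on the faster time scale, which is what lets the parameter update resemble a Newton-Raphson step. The pseudo-inverse is used here because A_hat is typically rank-deficient during the first iterations.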
Taxonomy
Topics: Adaptive Dynamic Programming Control · Iterative Learning Control Systems · Advancements in Semiconductor Devices and Circuit Design
Methods: Q-Learning
