Fastest Convergence for Q-learning

Adithya M. Devraj; Sean P. Meyn

arXiv:1707.03770 · cs.SY · March 23, 2018 · 27 cites

TL;DR

The paper introduces Zap Q-learning, an algorithm that achieves faster convergence and optimal asymptotic variance through a matrix-gain approach, supported by theoretical analysis and numerical experiments.

Contribution

It presents Zap Q-learning, a novel matrix-gain algorithm with optimal variance and rapid convergence, along with a tutorial survey on reinforcement learning algorithms.

Findings

01. Achieves quick convergence even in non-ideal settings

02. Provides an asymptotic variance that is optimal

03. Demonstrates stability and efficiency through numerical experiments

Abstract

The Zap Q-learning algorithm introduced in this paper is an improvement of Watkins' original algorithm and recent competitors in several respects. It is a matrix-gain algorithm designed so that its asymptotic variance is optimal. Moreover, an ODE analysis suggests that the transient behavior is a close match to a deterministic Newton-Raphson implementation. This is made possible by a two time-scale update equation for the matrix gain sequence. The analysis suggests that the approach will lead to stable and efficient computation even for non-ideal parameterized settings. Numerical experiments confirm the quick convergence, even in such non-ideal cases. A secondary goal of this paper is tutorial. The first half of the paper contains a survey on reinforcement learning algorithms, with a focus on minimum variance algorithms.
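
The two time-scale matrix-gain idea described in the abstract can be illustrated with a rough tabular sketch. It is not the authors' reference implementation: the function name `zap_q_sketch`, the step sizes (`1/n` for the parameter and `n**(-rho)` for the matrix gain), the uniform exploration policy, the use of a pseudo-inverse, and the cost-minimizing convention are all assumptions made for illustration.

```python
import numpy as np

def zap_q_sketch(P, cost, beta=0.9, n_steps=20000, rho=0.85, seed=0):
    """Illustrative two time-scale, matrix-gain Q-learning update (tabular).

    P[u] is the transition matrix under action u and cost[x, u] is the
    one-step cost. Names and defaults are hypothetical, chosen for the sketch.
    """
    rng = np.random.default_rng(seed)
    n_x, n_u = cost.shape
    d = n_x * n_u
    theta = np.zeros(d)        # Q(x, u) is stored at index x * n_u + u
    A_hat = -np.eye(d)         # running estimate of the linearization matrix
    x = 0
    for n in range(1, n_steps + 1):
        u = int(rng.integers(n_u))               # uniform exploration (assumption)
        x_next = int(rng.choice(n_x, p=P[u][x]))
        q = theta.reshape(n_x, n_u)
        u_star = int(np.argmin(q[x_next]))       # greedy (cost-minimizing) action
        i = x * n_u + u
        j = x_next * n_u + u_star
        td = cost[x, u] + beta * q[x_next, u_star] - q[x, u]

        # Rank-one sample of the linearization: e_i (beta * e_j - e_i)^T
        A_n = np.zeros((d, d))
        A_n[i, j] += beta
        A_n[i, i] -= 1.0

        alpha = 1.0 / n          # slow time scale: parameter update
        gamma = n ** (-rho)      # fast time scale: matrix-gain update
        A_hat += gamma * (A_n - A_hat)

        # Newton-Raphson-like step using the (pseudo-)inverse matrix gain
        g = np.zeros(d)
        g[i] = td
        theta -= alpha * (np.linalg.pinv(A_hat) @ g)
        x = x_next
    return theta.reshape(n_x, n_u)

if __name__ == "__main__":
    # Small hypothetical MDP: 3 states, 2 actions
    P = np.array([
        [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]],
        [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.9, 0.0, 0.1]],
    ])
    cost = np.array([[1.0, 0.5], [0.0, 0.5], [2.0, 0.5]])
    print(zap_q_sketch(P, cost, n_steps=5000))
```

Because `gamma` decays more slowly than `alpha`, the matrix estimate `A_hat` adapts faster than `theta`, which is what lets the parameter update behave roughly like a Newton-Raphson step. If `A_hat` is replaced by minus the identity, the update reduces to ordinary tabular Q-learning with step size `alpha`.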

Taxonomy

Topics: Adaptive Dynamic Programming Control · Iterative Learning Control Systems · Advancements in Semiconductor Devices and Circuit Design

Methods: Q-Learning