Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning

Andrea Zanette; Martin J. Wainwright

arXiv:2206.00796·cs.LG·June 3, 2022

TL;DR

This paper introduces a stabilized Q-learning algorithm with linear function approximation that achieves provably efficient learning with regret bounds, space efficiency, and robustness to approximation errors.

Contribution

It provides a modular analysis of key mechanisms like target networks and experience replay, establishing their roles in stabilizing linear Q-learning with theoretical guarantees.

Findings

01. Achieves state-of-the-art regret bounds for linear MDPs.

02. Maintains space complexity independent of the number of steps.

03. Performance degrades gracefully with approximation errors.

Abstract

The $Q$-learning algorithm is a simple and widely used stochastic approximation scheme for reinforcement learning, but the basic protocol can exhibit instability in conjunction with function approximation. Such instability can be observed even with linear function approximation. In practice, tools such as target networks and experience replay appear to be essential, but the individual contribution of each of these mechanisms is not well understood theoretically. This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation. Our modular analysis illustrates the role played by each algorithmic tool that we adopt: a second-order update rule, a set of target networks, and a mechanism akin to experience replay. Together, they enable state-of-the-art regret bounds on linear MDPs while preserving the most prominent feature of the algorithm,…
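To make two of the ingredients named in the abstract concrete, the following is a minimal sketch of a second-order update combined with a frozen target vector for linear Q-learning. It is not the paper's algorithm: it omits exploration and the replay-like mechanism, and it uses a generic recursive least-squares step as a stand-in for the second-order rule. The function name `second_order_q_update`, the parameter `gamma`, the regularizer `lam`, and the sync period of 10 steps are all assumptions made for illustration.

```python
import numpy as np

def second_order_q_update(theta, cov_inv, phi, reward, next_phis,
                          theta_target, gamma=1.0):
    """One recursive least-squares step toward a frozen-target backup.

    Illustrative only: names and the use of `gamma` are assumptions,
    not taken from the paper.
    """
    # The bootstrapped target is computed from the frozen target weights,
    # not from the weights currently being updated.
    target = reward + gamma * np.max(next_phis @ theta_target)
    # Sherman-Morrison rank-one update of the inverse regularized covariance.
    v = cov_inv @ phi
    cov_inv = cov_inv - np.outer(v, v) / (1.0 + phi @ v)
    # Covariance-preconditioned correction along the temporal-difference error.
    theta = theta + cov_inv @ phi * (target - phi @ theta)
    return theta, cov_inv

# Toy usage on random features (hypothetical data, no environment involved).
rng = np.random.default_rng(0)
d, lam = 4, 1.0
theta, theta_target = np.zeros(d), np.zeros(d)
cov_inv = np.eye(d) / lam
for step in range(100):
    phi = rng.normal(size=d)             # feature of the visited state-action
    next_phis = rng.normal(size=(3, d))  # features of the 3 next actions
    theta, cov_inv = second_order_q_update(
        theta, cov_inv, phi, rng.normal(), next_phis, theta_target, gamma=0.9)
    if (step + 1) % 10 == 0:
        theta_target = theta.copy()      # periodic target synchronization
```

Only a `d`-dimensional weight vector and a `d × d` matrix are stored, so the memory footprint of this sketch does not grow with the number of steps. When the target is held fixed, the recursion reproduces the ridge-regression solution on the data seen so far.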

Taxonomy

Topics: Mental Health Research Topics · Age of Information Optimization · Reinforcement Learning in Robotics

Methods: Experience Replay