Convergence Results For Q-Learning With Experience Replay

Liran Szlak; Ohad Shamir

arXiv:2112.04213 · cs.LG · December 9, 2021

TL;DR

This paper provides a theoretical analysis of experience replay in tabular Q-learning, establishing convergence rates and conditions under which replay improves learning performance, supported by experiments.

Contribution

It initiates a rigorous study of experience replay in tabular Q-learning, providing a convergence rate guarantee and theoretical evidence for when replay can strictly improve performance.

Findings

01 Experience replay can strictly improve performance on a simple class of MDPs that the authors introduce and analyze.

02 The convergence rate guarantee depends on the frequency and the number of replay iterations.

03 Experiments support the theoretical findings.

Abstract

A commonly used heuristic in RL is experience replay (e.g., Lin, 1993; Mnih et al., 2015), in which a learner stores and re-uses past trajectories as if they were sampled online. In this work, we initiate a rigorous study of this heuristic in the setting of tabular Q-learning. We provide a convergence rate guarantee, and discuss how it compares to the convergence of Q-learning depending on important parameters such as the frequency and number of replay iterations. We also provide theoretical evidence showing when we might expect this heuristic to strictly improve performance, by introducing and analyzing a simple class of MDPs. Finally, we provide some experiments to support our theoretical findings.
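To make the heuristic concrete, here is a minimal sketch of tabular Q-learning with experience replay. It is an illustration under stated assumptions, not the authors' algorithm or code: the replay schedule (every `replay_every` online steps, run `replay_iters` updates on transitions sampled uniformly from the buffer), the epsilon-greedy exploration, the constant step size, and the toy `chain_step` environment are all hypothetical choices made for this example.

```python
import random


def chain_step(state, action, rng):
    """Hypothetical toy MDP: a 5-state chain. Action 1 moves right, action 0
    moves left (state 0 is a wall). Reaching state 4 pays reward 1 and ends
    the episode. `rng` is unused because this chain is deterministic."""
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    done = next_state == 4
    return (1.0 if done else 0.0), next_state, done


def q_learning_with_replay(step, n_states, n_actions, start_state=0, *,
                           n_steps=5000, replay_every=10, replay_iters=20,
                           alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    buffer = []  # stored transitions: (s, a, r, s_next, done)

    def update(s, a, r, s_next, done):
        # Standard Q-learning update; terminal transitions do not bootstrap.
        target = r if done else r + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])

    s = start_state
    for t in range(1, n_steps + 1):
        # Epsilon-greedy action selection with random tie-breaking.
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            best = max(Q[s])
            a = rng.choice([i for i, q in enumerate(Q[s]) if q == best])

        r, s_next, done = step(s, a, rng)
        update(s, a, r, s_next, done)          # online update
        buffer.append((s, a, r, s_next, done))
        s = start_state if done else s_next

        # Replay phase: re-use stored transitions as if sampled online.
        if t % replay_every == 0:
            for _ in range(replay_iters):
                update(*rng.choice(buffer))
    return Q


Q = q_learning_with_replay(chain_step, n_states=5, n_actions=2)
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
print("greedy action per non-terminal state:", greedy)
```

Here `replay_every` and `replay_iters` play the role of the frequency and number of replay iterations mentioned in the abstract; setting `replay_iters=0` recovers plain Q-learning for comparison.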

Code & Models

Models

🤗 CarlosMN/CartPole (model)

Taxonomy

Topics: Cognitive Radio Networks and Spectrum Sensing · Advanced Bandit Algorithms Research · Analog and Mixed-Signal Circuit Design

Methods: Q-Learning · Experience Replay