Convergence Results For Q-Learning With Experience Replay
Liran Szlak, Ohad Shamir

TL;DR
This paper provides a theoretical analysis of experience replay in tabular Q-learning, establishing convergence rates and conditions under which replay improves learning performance, supported by experiments.
Contribution
It offers the first rigorous convergence guarantees for Q-learning with experience replay and analyzes when replay enhances learning efficiency.
Findings
Experience replay can strictly improve on plain Q-learning, as shown on a simple class of MDPs introduced in the paper.
The convergence rate guarantee depends on how often replay is performed and on the number of replay iterations.
Experiments support the theoretical findings.
Abstract
A commonly used heuristic in reinforcement learning (RL) is experience replay (e.g., Lin, 1993; Mnih et al., 2015), in which a learner stores and re-uses past trajectories as if they were sampled online. In this work, we initiate a rigorous study of this heuristic in the setting of tabular Q-learning. We provide a convergence rate guarantee, and discuss how it compares to the convergence of Q-learning depending on important parameters such as the frequency and number of replay iterations. We also provide theoretical evidence showing when we might expect this heuristic to strictly improve performance, by introducing and analyzing a simple class of MDPs. Finally, we provide some experiments to support our theoretical findings.
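To make the two parameters named in the abstract concrete, here is a minimal sketch of tabular Q-learning with experience replay. It is an illustration under stated assumptions, not the authors' exact algorithm: the names `replay_every` (replay frequency) and `replay_iters` (number of replay iterations), the uniformly random behavior policy, uniform sampling from the buffer, and the toy chain environment `chain_step` are all hypothetical choices made for this example.

```python
import random


def chain_step(state, action, rng):
    """Hypothetical 4-state chain: action 1 moves right, action 0 moves left.

    Reaching state 3 yields reward 1 and ends the episode.
    """
    next_state = min(state + 1, 3) if action == 1 else max(state - 1, 0)
    done = next_state == 3
    return next_state, (1.0 if done else 0.0), done


def q_learning_with_replay(step, n_states, n_actions, start_state=0,
                           num_steps=5000, replay_every=10, replay_iters=20,
                           alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    buffer = []  # stored transitions (s, a, r, s_next, done)

    def update(s, a, r, s_next, done):
        # Standard Q-learning update toward the one-step bootstrapped target.
        target = r if done else r + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])

    s = start_state
    for t in range(1, num_steps + 1):
        # Assumption: uniformly random behavior policy (Q-learning is off-policy).
        a = rng.randrange(n_actions)
        s_next, r, done = step(s, a, rng)
        update(s, a, r, s_next, done)          # online update
        buffer.append((s, a, r, s_next, done))

        # Every `replay_every` online steps, re-apply the same update to
        # `replay_iters` stored transitions, as if they were sampled online.
        if t % replay_every == 0:
            for _ in range(replay_iters):
                update(*rng.choice(buffer))

        s = start_state if done else s_next
    return Q


Q = q_learning_with_replay(chain_step, n_states=4, n_actions=2)
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(3)]
print("greedy action per non-terminal state:", greedy)
```

Setting `replay_iters=0` recovers plain Q-learning in this sketch, which gives a simple way to compare the two variants on the same environment and seed.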
Taxonomy
Topics: Cognitive Radio Networks and Spectrum Sensing · Advanced Bandit Algorithms Research · Analog and Mixed-Signal Circuit Design
Methods: Q-Learning · Experience Replay
