Finite-Time Analysis of Temporal Difference Learning with Experience Replay
Han-Dong Lim, Donghwan Lee

TL;DR
This paper provides finite-time error bounds for TD-learning with experience replay in reinforcement learning, analyzing how buffer size and mini-batch sampling influence learning accuracy under Markovian observations.
Contribution
It introduces a simple decomposition of the Markovian noise terms and derives finite-time error bounds for TD-learning with experience replay, advancing the theoretical understanding of deep RL algorithms.
Findings
Error bounds depend on replay buffer size and mini-batch sampling.
Finite-time bounds are established for both averaged and final iterates.
Theoretical insights connect experience replay mechanics with TD-learning performance.
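For orientation, a generic form of linear TD(0) with mini-batch experience replay and a constant step size is written below, together with the averaged iterate used alongside the final iterate. This is an assumed standard formulation, not necessarily the paper's exact scheme: the feature map $\phi$, the mini-batch size $M$, and the indexing of the average are illustrative choices, and the paper's notation may differ.

```latex
% (s_i, r_i, s_i') for i = 1..M: transitions sampled from the replay buffer at step k
% \alpha: constant step size, \gamma: discount factor, \phi: feature map (assumed linear)
\delta_i(\theta) = r_i + \gamma\,\phi(s_i')^\top \theta - \phi(s_i)^\top \theta,
\qquad
\theta_{k+1} = \theta_k + \frac{\alpha}{M} \sum_{i=1}^{M} \delta_i(\theta_k)\,\phi(s_i),
\qquad
\bar{\theta}_K = \frac{1}{K} \sum_{k=1}^{K} \theta_k .
```

The "final iterate" bound concerns $\theta_K$ itself, while the "averaged iterate" bound concerns $\bar{\theta}_K$.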
Abstract
Temporal-difference (TD) learning is widely regarded as one of the most popular algorithms in reinforcement learning (RL). Despite its widespread use, only recently have researchers begun to actively study its finite-time behavior, including finite-time bounds on the mean squared error and sample complexity. On the empirical side, experience replay has been a key ingredient in the success of deep RL algorithms, but its theoretical effects on RL have yet to be fully understood. In this paper, we present a simple decomposition of the Markovian noise terms and provide finite-time error bounds for TD-learning with experience replay. Specifically, under the Markovian observation model, we demonstrate that for both the averaged iterate and final iterate cases, the error term induced by a constant step-size can be effectively controlled by the size of the replay buffer and the…
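To make the setting in the abstract concrete, the sketch below runs linear TD(0) on a single Markovian trajectory, storing each transition in a FIFO replay buffer and updating from uniformly sampled mini-batches with a constant step size. It returns both the final iterate and the averaged iterate. It is a minimal illustration under stated assumptions, not the authors' algorithm: the function name `td_with_replay`, the parameters `buffer_size` and `batch_size`, the eviction rule, and the sampling without replacement are all hypothetical choices. The toy data come from a "sticky" two-state chain with reward 1 everywhere, so the true value of each state is 1 / (1 - 0.9) = 10.

```python
import random
from collections import deque


def td_with_replay(transitions, features, num_features, gamma=0.9, alpha=0.05,
                   buffer_size=100, batch_size=8, seed=0):
    """Hypothetical sketch: linear TD(0) with a FIFO replay buffer and
    uniform mini-batch sampling (assumed design, not the paper's exact scheme).

    transitions: (s, r, s_next) tuples in the order they were observed.
    features: maps a state to a list of num_features floats.
    Returns (final_iterate, averaged_iterate).
    """
    rng = random.Random(seed)
    buffer = deque(maxlen=buffer_size)  # oldest transition is evicted when full
    theta = [0.0] * num_features
    theta_sum = [0.0] * num_features
    num_updates = 0

    for s, r, s_next in transitions:
        buffer.append((s, r, s_next))
        if len(buffer) < batch_size:
            continue  # wait until a full mini-batch can be drawn
        batch = rng.sample(list(buffer), batch_size)  # uniform, without replacement

        # Mini-batch average of delta * phi(s), evaluated at the current theta.
        step = [0.0] * num_features
        for bs, br, bs_next in batch:
            phi, phi_next = features(bs), features(bs_next)
            v = sum(p * t for p, t in zip(phi, theta))
            v_next = sum(p * t for p, t in zip(phi_next, theta))
            delta = br + gamma * v_next - v  # TD error
            for j in range(num_features):
                step[j] += delta * phi[j] / batch_size

        theta = [t + alpha * g for t, g in zip(theta, step)]  # constant step size
        theta_sum = [a + t for a, t in zip(theta_sum, theta)]
        num_updates += 1

    averaged = [a / num_updates for a in theta_sum] if num_updates else list(theta)
    return theta, averaged


# Toy Markovian data: a "sticky" two-state chain (stay with prob. 0.9), reward 1
# everywhere, so the true value of both states is 1 / (1 - 0.9) = 10.
def one_hot(s):
    return [1.0, 0.0] if s == 0 else [0.0, 1.0]


chain_rng = random.Random(1)
s = 0
transitions = []
for _ in range(5000):
    s_next = s if chain_rng.random() < 0.9 else 1 - s
    transitions.append((s, 1.0, s_next))
    s = s_next

final, averaged = td_with_replay(transitions, one_hot, num_features=2)
print("final iterate:   ", [round(x, 3) for x in final])
print("averaged iterate:", [round(x, 3) for x in averaged])
```

Because consecutive observations from the sticky chain are strongly correlated, `buffer_size` and `batch_size` determine how much that correlation survives into each update. These are the quantities that, per the abstract, control the constant-step-size error term. In this toy run the averaged iterate lags the final one, because it still carries the early iterates that started from zero.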
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Experience Replay
