Finite-Time Analysis of Temporal Difference Learning with Experience Replay

Han-Dong Lim; Donghwan Lee

arXiv:2306.09746·cs.LG·April 16, 2025·1 cites

Open Access

TL;DR

This paper provides finite-time error bounds for TD-learning with experience replay in reinforcement learning, analyzing how buffer size and mini-batch sampling influence learning accuracy under Markovian observations.

Contribution

It introduces a simple noise decomposition and offers finite-time bounds for TD-learning with experience replay, advancing theoretical understanding of deep RL algorithms.

Findings

01. Error bounds depend on replay buffer size and mini-batch sampling.

02. Finite-time bounds are established for both averaged and final iterates.

03. Theoretical insights connect experience replay mechanics with TD-learning performance.

Abstract

Temporal-difference (TD) learning is widely regarded as one of the most popular algorithms in reinforcement learning (RL). Despite its widespread use, it has only been recently that researchers have begun to actively study its finite time behavior, including the finite time bound on mean squared error and sample complexity. On the empirical side, experience replay has been a key ingredient in the success of deep RL algorithms, but its theoretical effects on RL have yet to be fully understood. In this paper, we present a simple decomposition of the Markovian noise terms and provide finite-time error bounds for TD-learning with experience replay. Specifically, under the Markovian observation model, we demonstrate that for both the averaged iterate and final iterate cases, the error term induced by a constant step-size can be effectively controlled by the size of the replay buffer and the…
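
To make the setting in the abstract concrete, the following is a minimal sketch of TD-learning with experience replay, not the paper's exact algorithm. It assumes a tabular value function on a small Markov reward process, a first-in-first-out replay buffer, uniform mini-batch sampling with replacement, and a constant step-size. The function name `td_with_replay` and all of its parameters are hypothetical and chosen only for illustration.

```python
from collections import deque

import numpy as np


def td_with_replay(P, r, gamma, alpha, buffer_size, batch_size, num_steps, seed=0):
    """Tabular TD(0) with a FIFO replay buffer (illustrative sketch only).

    P: (n, n) transition matrix of an assumed Markov reward process.
    r: (n,) reward received when leaving each state.
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    V = np.zeros(n)                      # value estimates, one per state
    buffer = deque(maxlen=buffer_size)   # oldest transitions are evicted first
    s = 0
    for _ in range(num_steps):
        # Markovian observation: the next state depends on the current one.
        s_next = int(rng.choice(n, p=P[s]))
        buffer.append((s, r[s], s_next))
        s = s_next

        # Uniform mini-batch sampled with replacement from the buffer.
        idx = rng.integers(len(buffer), size=batch_size)
        update = np.zeros(n)
        for i in idx:
            si, ri, sn = buffer[int(i)]
            update[si] += ri + gamma * V[sn] - V[si]   # TD error of the sample

        # Constant step-size update with the averaged TD errors.
        V += alpha * update / batch_size
    return V


if __name__ == "__main__":
    P = np.array([[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]])
    r = np.array([1.0, 0.0, -1.0])
    V = td_with_replay(P, r, gamma=0.5, alpha=0.05,
                       buffer_size=200, batch_size=16, num_steps=5000)
    V_star = np.linalg.solve(np.eye(3) - 0.5 * P, r)   # exact values for comparison
    print("estimate:", V)
    print("exact:   ", V_star)
```

Varying `buffer_size` and `batch_size` in this sketch is one informal way to observe the dependence that the abstract describes, although the paper's bounds are stated under its own assumptions rather than for this toy setup.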

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Experience Replay