A Deeper Look at Experience Replay

Shangtong Zhang; Richard S. Sutton

arXiv:1712.01275 · cs.LG · May 1, 2018 · 186 cites


TL;DR

This paper systematically studies experience replay in deep reinforcement learning, revealing that large buffers can harm performance and proposing a simple method to mitigate this issue, validated across various domains.

Contribution

It provides a comprehensive empirical analysis of experience replay, highlighting the importance of buffer size and introducing an effective O(1) remedy.
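To make the buffer-size hyper-parameter concrete, here is a minimal sketch of a fixed-capacity FIFO replay buffer with uniform sampling, the standard setup the paper studies. The class name `ReplayBuffer`, the list-based ring layout, and sampling with replacement are illustrative assumptions, not the authors' code; `capacity` is the memory buffer size.

```python
import random
from typing import Any, List


class ReplayBuffer:
    """Fixed-capacity FIFO replay buffer with uniform sampling (illustrative sketch).

    `capacity` plays the role of the memory buffer size hyper-parameter:
    once the buffer is full, each new transition overwrites the oldest one.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.storage: List[Any] = []
        self.next_idx = 0  # slot that the next transition will occupy
        self.rng = random.Random(seed)

    def add(self, transition: Any) -> None:
        # O(1) insertion: append until full, then overwrite the oldest slot.
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_idx] = transition
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size: int) -> List[Any]:
        # Uniform sampling with replacement over everything currently stored.
        if not self.storage:
            raise ValueError("cannot sample from an empty buffer")
        return [self.storage[self.rng.randrange(len(self.storage))]
                for _ in range(batch_size)]

    def __len__(self) -> int:
        return len(self.storage)


# Usage: with capacity 3, only the three most recent transitions survive.
buf = ReplayBuffer(capacity=3)
for step in range(5):
    buf.add(step)  # integers stand in for (s, a, r, s', done) tuples
minibatch = buf.sample(batch_size=4)
```

Every insertion and every sampled element costs O(1), so the capacity changes only which (and how old) transitions the learner sees, not the per-step compute.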

Findings

1. Large replay buffers can significantly degrade performance.

2. A simple O(1) method effectively mitigates the negative effects of large buffers.

3. The proposed method improves results in both simple and complex RL environments.
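One way to build intuition for finding 1 is a back-of-the-envelope calculation (an illustration, not the paper's own analysis): with a full buffer of size N and B uniform draws per update, a freshly added transition takes roughly N/B updates to be replayed for the first time, and the average sampled transition is about (N - 1)/2 steps old. The sketch below assumes one update per environment step, sampling with replacement, and ignores eviction; all function names are hypothetical.

```python
def prob_replayed_per_step(buffer_size: int, batch_size: int) -> float:
    """Chance that one particular stored transition appears in a minibatch of
    `batch_size` draws made uniformly, with replacement, from a full buffer."""
    return 1.0 - (1.0 - 1.0 / buffer_size) ** batch_size


def expected_first_replay_delay(buffer_size: int, batch_size: int) -> float:
    """Expected number of updates until a new transition is first replayed.

    Treats each update as an independent trial (geometric distribution) and
    ignores eviction, so it is only an approximation; for buffer_size much
    larger than batch_size it is close to buffer_size / batch_size.
    """
    return 1.0 / prob_replayed_per_step(buffer_size, batch_size)


def expected_sample_age(buffer_size: int) -> float:
    """Mean age (in steps) of a uniformly sampled transition from a full FIFO
    buffer, whose stored ages are 0, 1, ..., buffer_size - 1."""
    return (buffer_size - 1) / 2


# Compare a small, a medium, and a very large buffer at batch size 32.
for n in (100, 10_000, 1_000_000):
    print(n, round(expected_first_replay_delay(n, 32), 1), expected_sample_age(n))
```

Both quantities grow linearly with N, which is consistent with the intuition that a very large buffer delays the effect of new experience on learning.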

Abstract

Experience replay is now widely used in deep reinforcement learning (RL) algorithms; in this paper we rethink its utility. Experience replay introduces a new hyper-parameter, the memory buffer size, which needs careful tuning. Unfortunately, the importance of this hyper-parameter has long been underestimated in the community. We present a systematic empirical study of experience replay under various function representations and show that a large replay buffer can significantly hurt performance. Moreover, we propose a simple O(1) method to remedy the negative influence of a large replay buffer, and we demonstrate its utility both in a simple grid world and in challenging domains such as Atari games.
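The abstract does not spell out the O(1) method. In the paper it is called combined experience replay (CER): the most recent transition is always included in the sampled minibatch, so new experience affects the very next update no matter how large the buffer is. Below is a minimal Python sketch of that idea; the class name `CombinedReplayBuffer` is hypothetical, and keeping the total batch size fixed (batch_size - 1 uniform draws plus the latest transition) is an assumption of this sketch.

```python
import random
from typing import Any, List, Optional


class CombinedReplayBuffer:
    """Replay buffer whose minibatches always contain the latest transition
    (a sketch of combined experience replay; class name is hypothetical)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.storage: List[Any] = []
        self.next_idx = 0
        self.latest: Optional[Any] = None  # O(1) extra state: one reference
        self.rng = random.Random(seed)

    def add(self, transition: Any) -> None:
        # Ordinary FIFO ring-buffer insertion, plus remembering the newest item.
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_idx] = transition
        self.next_idx = (self.next_idx + 1) % self.capacity
        self.latest = transition

    def sample(self, batch_size: int) -> List[Any]:
        # Assumption: keep the total batch size fixed by drawing
        # batch_size - 1 transitions uniformly and appending the latest one.
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if not self.storage:
            raise ValueError("cannot sample from an empty buffer")
        batch = [self.storage[self.rng.randrange(len(self.storage))]
                 for _ in range(batch_size - 1)]
        batch.append(self.latest)  # the O(1) correction
        return batch


buf = CombinedReplayBuffer(capacity=100_000)
for step in range(500):
    buf.add(step)  # integers stand in for (s, a, r, s', done) tuples
minibatch = buf.sample(batch_size=32)
```

The only extra cost over plain uniform replay is storing one reference and appending it to each batch, which is why the remedy is O(1) regardless of buffer size.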


Taxonomy

Topics: Reinforcement Learning in Robotics · Mind wandering and attention · Advanced Bandit Algorithms Research

Methods: Experience Replay