Distributed Prioritized Experience Replay

Dan Horgan; John Quan; David Budden; Gabriel Barth-Maron; Matteo Hessel; Hado van Hasselt; David Silver

arXiv:1803.00933 · cs.LG · March 5, 2018 · 412 cites

TL;DR

This paper introduces a distributed deep reinforcement learning architecture that uses prioritized experience replay to learn from orders of magnitude more data than previously possible, improving both final performance and wall-clock training time on the Arcade Learning Environment.

Contribution

It presents a scalable distributed architecture that decouples acting from learning: actors write experience to a shared replay memory, and the learner samples from that memory using prioritized experience replay, which improves data efficiency and performance.

Findings

01. Achieved better final performance on the Arcade Learning Environment than the previous state of the art

02. Reduced wall-clock training time to a fraction of that required by previous methods

03. Scaled reinforcement learning to orders of magnitude more data than previously possible

Abstract

We propose a distributed architecture for deep reinforcement learning at scale, that enables agents to learn effectively from orders of magnitude more data than previously possible. The algorithm decouples acting from learning: the actors interact with their own instances of the environment by selecting actions according to a shared neural network, and accumulate the resulting experience in a shared experience replay memory; the learner replays samples of experience and updates the neural network. The architecture relies on prioritized experience replay to focus only on the most significant data generated by the actors. Our architecture substantially improves the state of the art on the Arcade Learning Environment, achieving better final performance in a fraction of the wall-clock training time.
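The decoupling described in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the class `PrioritizedReplay`, the functions `actor_step`, `learner_step`, and `run`, the exponent `alpha=0.6`, the FIFO eviction rule, and the randomly drawn priorities are all assumptions made for illustration. In the real system the actors and the learner run concurrently and priorities come from the learning signal; here they take turns in one loop so the example stays deterministic and runnable.

```python
import random


class PrioritizedReplay:
    """Toy shared replay memory with proportional prioritized sampling."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha      # assumed exponent: 0 gives uniform sampling
        self.items = []         # stored transitions
        self.priorities = []    # one positive priority per transition

    def add(self, transition, priority):
        # Assumed eviction rule: drop the oldest transition when full.
        if len(self.items) >= self.capacity:
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(float(priority))

    def sample(self, batch_size, rng):
        # Draw indices with probability proportional to priority ** alpha.
        weights = [p ** self.alpha for p in self.priorities]
        indices = rng.choices(range(len(self.items)), weights=weights, k=batch_size)
        return indices, [self.items[i] for i in indices]

    def update_priorities(self, indices, new_priorities):
        # Valid only if nothing was added between sample() and this call,
        # because eviction shifts indices in this toy version.
        for i, p in zip(indices, new_priorities):
            self.priorities[i] = float(p)


def actor_step(actor_id, step, rng):
    # Hypothetical stand-in for acting in an environment instance.
    transition = {"actor": actor_id, "step": step}
    priority = rng.uniform(0.1, 1.0)  # placeholder initial priority
    return transition, priority


def learner_step(memory, batch_size, rng):
    indices, batch = memory.sample(batch_size, rng)
    # Placeholder: a real learner would update the network here and
    # derive new priorities from the resulting errors.
    new_priorities = [rng.uniform(0.1, 1.0) for _ in batch]
    memory.update_priorities(indices, new_priorities)
    return batch


def run(num_actors=4, steps=50, capacity=100, batch_size=8, seed=0):
    rng = random.Random(seed)
    memory = PrioritizedReplay(capacity)
    batches = 0
    for step in range(steps):
        # Acting: every actor writes to the shared memory.
        for actor_id in range(num_actors):
            memory.add(*actor_step(actor_id, step, rng))
        # Learning: the learner only reads from and re-prioritizes the memory.
        learner_step(memory, batch_size, rng)
        batches += 1
    return memory, batches
```

Calling `run()` fills the memory from four simulated actors while a single learner replays batches of eight. The only coupling between the two sides is the shared `PrioritizedReplay` object, which is the design point the abstract emphasizes.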

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Explainable Artificial Intelligence (XAI)

Methods: RMSProp · Adam · Batch Normalization · Convolution · Double Q-learning · Prioritized Experience Replay · Weight Decay · Deep Deterministic Policy Gradient · Dense Connections