Distributed Distributional Deterministic Policy Gradients

Gabriel Barth-Maron; Matthew W. Hoffman; David Budden; Will Dabney; Dan Horgan; Dhruva TB; Alistair Muldal; Nicolas Heess; Timothy Lillicrap

arXiv:1804.08617 · cs.LG · April 25, 2018 · 283 citations

TL;DR

This paper introduces D4PG, a distributed reinforcement learning algorithm that combines distributional value functions with off-policy learning, achieving state-of-the-art results in various continuous control tasks.
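The distributed part of D4PG separates experience collection from learning: several actors gather experience in parallel and write it to a shared replay memory, from which a learner samples training batches. The sketch below illustrates only that data flow, under loudly stated assumptions: Python threads stand in for separate actor processes, a toy one-dimensional environment with Gaussian exploration noise replaces the real environment and policy network, and `SharedReplay`, `actor`, and `run` are hypothetical names, not the authors' code.

```python
import random
import threading
from collections import deque


class SharedReplay:
    """Hypothetical thread-safe replay memory shared by all actors."""

    def __init__(self, capacity):
        self._data = deque(maxlen=capacity)  # oldest transitions are evicted first
        self._lock = threading.Lock()

    def add(self, transition):
        with self._lock:
            self._data.append(transition)

    def sample(self, batch_size):
        # Uniform sampling for simplicity; a prioritized scheme could be swapped in.
        with self._lock:
            k = min(batch_size, len(self._data))
            return random.sample(list(self._data), k)

    def __len__(self):
        with self._lock:
            return len(self._data)


def actor(actor_id, replay, num_steps, noise_scale):
    """Toy actor: a real one would run the current policy in an environment."""
    rng = random.Random(actor_id)  # independent noise stream per actor
    state = 0.0
    for _ in range(num_steps):
        action = rng.gauss(0.0, noise_scale)  # exploration noise only (no policy net)
        next_state = state + action
        reward = -abs(next_state)
        replay.add((state, action, reward, next_state))
        state = next_state


def run(num_actors=4, steps_per_actor=250, capacity=10_000, batch_size=32):
    replay = SharedReplay(capacity)
    threads = [
        threading.Thread(target=actor, args=(i, replay, steps_per_actor, 0.3))
        for i in range(num_actors)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # For simplicity the learner samples only after the actors finish; in a real
    # system it trains concurrently and periodically sends new weights to actors.
    batch = replay.sample(batch_size)
    return len(replay), batch
```

With the defaults, `run()` collects 4 × 250 = 1000 transitions and returns a batch of 32. Decoupling actors from the learner this way lets data collection scale with the number of actors without changing the off-policy update itself.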

Contribution

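The paper adapts the distributional perspective on reinforcement learning to continuous control and integrates it into a distributed off-policy framework, together with simple enhancements such as N-step returns and prioritized experience replay.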
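One way to parameterize a distributional critic is a categorical distribution (as in C51): the critic outputs probabilities over a fixed support of evenly spaced atoms between V_min and V_max. Because the Bellman-shifted atoms r + γz no longer lie on that support, the target distribution has to be projected back onto it before it is compared with the critic's prediction, typically through a cross-entropy loss. The NumPy sketch below shows that projection; the function name, argument layout, and evenly spaced support are assumptions for illustration, not the paper's implementation.

```python
import numpy as np


def categorical_projection(next_probs, rewards, discounts, v_min, v_max):
    """Project the distribution of r + discount * Z onto the fixed support.

    next_probs: (B, K) target-network probabilities at the bootstrap state
    rewards:    (B,)   (possibly N-step) rewards
    discounts:  (B,)   bootstrap discounts, 0 at terminal states
    Returns a (B, K) array of projected probabilities.
    """
    batch_size, num_atoms = next_probs.shape
    atoms = np.linspace(v_min, v_max, num_atoms)
    delta_z = (v_max - v_min) / (num_atoms - 1)

    # Apply the Bellman update to every atom, then clip to the support range.
    tz = rewards[:, None] + discounts[:, None] * atoms[None, :]
    tz = np.clip(tz, v_min, v_max)

    # Fractional index of each shifted atom; the clip guards against round-off.
    b = np.clip((tz - v_min) / delta_z, 0, num_atoms - 1)
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)

    # Split each atom's mass between its two neighbours in proportion to
    # distance; if b lands exactly on an atom, all mass goes to that atom.
    exact = lower == upper
    w_lower = np.where(exact, 1.0, upper - b)
    w_upper = np.where(exact, 0.0, b - lower)

    projected = np.zeros_like(next_probs, dtype=np.float64)
    rows = np.repeat(np.arange(batch_size), num_atoms)
    np.add.at(projected, (rows, lower.ravel()), (next_probs * w_lower).ravel())
    np.add.at(projected, (rows, upper.ravel()), (next_probs * w_upper).ravel())
    return projected
```

For example, with 21 atoms on [-10, 10], all mass on the atom at 0, a reward of 0.5 and a discount of 1, the projection places probability 0.5 on each of the atoms at 0 and 1, so total mass and the mean (0.5) are preserved.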

Findings

01

D4PG achieves state-of-the-art performance on simple control tasks, difficult manipulation tasks, and hard obstacle-based locomotion tasks.

02

The distributional approach improves learning stability and performance.

03

Combining N-step returns with prioritized experience replay further improves results.
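An N-step transition replaces the single reward with the discounted sum of the next N rewards and bootstraps from the state N steps ahead with discount γ^N, so the (non-distributional) target becomes r_t + γ r_{t+1} + … + γ^{N-1} r_{t+N-1} + γ^N Q(x_{t+N}, π(x_{t+N})); the distributional version shifts the whole value distribution the same way. The helper below is a minimal sketch that builds such transitions from one finished trajectory. Its name and tuple layout are hypothetical, and near the end of a trajectory it assumes the common convention of using fewer than N steps and zeroing the bootstrap discount if the episode actually terminated.

```python
def n_step_transitions(states, actions, rewards, n, gamma, terminal):
    """Turn one trajectory into N-step transitions.

    states:   len(rewards) + 1 entries (the last one is the final state)
    terminal: True if the trajectory ended in a true terminal state
    Returns tuples (s_t, a_t, n_step_return, bootstrap_discount, s_{t+k}).
    """
    horizon = len(rewards)
    transitions = []
    for t in range(horizon):
        k = min(n, horizon - t)  # fewer than n steps near the end
        ret = sum(gamma ** i * rewards[t + i] for i in range(k))
        hits_end = t + k == horizon
        # No bootstrapping past a true terminal state.
        discount = 0.0 if (hits_end and terminal) else gamma ** k
        transitions.append((states[t], actions[t], ret, discount, states[t + k]))
    return transitions
```

With rewards [1, 1, 1], N = 2 and γ = 0.5 on a terminating episode, the first transition carries a return of 1.5 and a bootstrap discount of 0.25, while the last two do not bootstrap at all. Longer N propagates reward information faster at the cost of a more off-policy (biased) target.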

Abstract

This work adopts the very successful distributional perspective on reinforcement learning and adapts it to the continuous control setting. We combine this within a distributed framework for off-policy learning in order to develop what we call the Distributed Distributional Deep Deterministic Policy Gradient algorithm, D4PG. We also combine this technique with a number of additional, simple improvements such as the use of $N$-step returns and prioritized experience replay. Experimentally we examine the contribution of each of these individual components, and show how they interact, as well as their combined contributions. Our results show that across a wide variety of simple control tasks, difficult manipulation tasks, and a set of hard obstacle-based locomotion tasks the D4PG algorithm achieves state-of-the-art performance.
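Prioritized experience replay (proportional variant) samples stored transition i with probability P(i) = p_i^α / Σ_k p_k^α and corrects the resulting bias with importance weights w_i = (M · P(i))^(-β), normalized by their maximum, where M is the number of stored transitions. The class below is a minimal sketch of that scheme. It uses a flat array instead of a sum-tree, so sampling costs O(M), and the priority signal (absolute TD error plus a small epsilon) and the values of `alpha`, `beta`, and `eps` are assumptions; with a distributional critic one might substitute a per-sample critic loss. `ProportionalReplay` is a hypothetical name.

```python
import numpy as np


class ProportionalReplay:
    """Hypothetical flat-array prioritized replay (no sum-tree)."""

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6, seed=0):
        self.capacity = capacity
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.data = [None] * capacity
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0   # next slot to overwrite (ring buffer)
        self.size = 0  # number of valid entries
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        # New data gets the current max priority so it is likely replayed soon.
        max_p = self.priorities[: self.size].max() if self.size > 0 else 1.0
        self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        scaled = self.priorities[: self.size] ** self.alpha
        probs = scaled / scaled.sum()
        idx = self.rng.choice(self.size, size=batch_size, p=probs)
        # Importance-sampling weights, normalized so the largest weight is 1.
        weights = (self.size * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # eps keeps every transition's sampling probability strictly positive.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

After the learner computes per-sample errors for a batch, it scales each sample's loss by its weight and calls `update_priorities` with the same indices. A sum-tree would make both sampling and updates O(log M), which matters at the replay sizes used in distributed training.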

Taxonomy

Topics: Reinforcement Learning in Robotics · Robotic Locomotion and Control

Methods: N-step Returns · Prioritized Experience Replay · Adam · Batch Normalization · Distributed Distributional DDPG