IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

Lasse Espeholt; Hubert Soyer; Remi Munos; Karen Simonyan; Volodymir Mnih; Tom Ward; Yotam Doron; Vlad Firoiu; Tim Harley; Iain Dunning; Shane Legg; Koray Kavukcuoglu

arXiv:1802.01561 · cs.LG · June 29, 2018 · 612 cites

TL;DR

IMPALA is a distributed reinforcement learning architecture that decouples acting from learning and corrects the resulting off-policy data with V-trace. This lets it scale to thousands of machines without sacrificing data efficiency, and it is evaluated in multi-task settings on DMLab-30 and Atari-57.

Contribution

The paper presents IMPALA, a new distributed RL agent that scales to thousands of machines, improves data efficiency, and enables effective multi-task learning with a novel off-policy correction method, V-trace.

Findings

01 IMPALA outperforms previous agents with less data.

02 It achieves stable, high-throughput learning.

03 It demonstrates positive transfer in multi-task settings.

Abstract

In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al.,…
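The abstract names V-trace but does not spell it out. As a rough illustration only, the sketch below computes V-trace value targets for a single trajectory using the recursive form given in the full paper: v_s = V(x_s) + delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1})), where delta_s = rho_s * (r_s + gamma * V(x_{s+1}) - V(x_s)) and rho_s, c_s are importance ratios truncated at rho_bar and c_bar. The function name, argument names, and default values are hypothetical choices for this example and are not taken from the authors' code.

```python
import math


def vtrace_targets(rewards, values, bootstrap_value, log_rhos,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Sketch of V-trace value targets v_s for one trajectory.

    rewards[t]      : reward received after acting at step t
    values[t]       : learner's value estimate V(x_t)
    bootstrap_value : V(x_n) for the state following the last step
    log_rhos[t]     : log(pi(a_t|x_t) / mu(a_t|x_t)), i.e. learner
                      policy over actor (behaviour) policy
    All names and defaults here are illustrative assumptions.
    """
    n = len(rewards)
    # V(x_{t+1}) for every step, using the bootstrap value at the end.
    next_values = list(values[1:]) + [bootstrap_value]
    targets = [0.0] * n
    acc = 0.0  # holds v_{t+1} - V(x_{t+1}) while iterating backwards
    for t in reversed(range(n)):
        ratio = math.exp(log_rhos[t])
        rho = min(rho_bar, ratio)  # truncated weight on the TD error
        c = min(c_bar, ratio)      # truncated "trace" coefficient
        delta = rho * (rewards[t] + gamma * next_values[t] - values[t])
        acc = delta + gamma * c * acc
        targets[t] = values[t] + acc
    return targets


if __name__ == "__main__":
    # On-policy data (all log ratios zero) with made-up numbers.
    print(vtrace_targets([1.0, 1.0, 1.0], [0.3, -0.2, 0.7], 2.0,
                         [0.0, 0.0, 0.0], gamma=0.5))
```

When the data is on-policy (all log ratios zero) and both thresholds are at least 1, these targets reduce to ordinary n-step bootstrapped returns. When the actor's policy lags the learner's, the truncation limits how much any single importance ratio can scale the update.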

Code & Models

Videos

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures · youtube

DeepMind's AI Masters Even More Atari Games | Two Minute Papers #238 · youtube

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control

Methods: Sigmoid Activation · Tanh Activation · Experience Replay · Entropy Regularization · Residual Connection · Gradient Clipping · RMSProp · Max Pooling · Convolution