IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures
Lasse Espeholt, Hubert Soyer, Remi Munos, Karen Simonyan, Volodymir Mnih, Tom Ward, Yotam Doron, Vlad Firoiu, Tim Harley, Iain Dunning, Shane Legg, Koray Kavukcuoglu

TL;DR
IMPALA is a scalable distributed reinforcement learning architecture built to cope with the larger data volumes and longer training times of multi-task learning. It decouples acting from learning, corrects the resulting off-policy data with a novel method called V-trace, and achieves superior multi-task performance while using resources efficiently.
Contribution
The paper presents IMPALA, a new distributed RL agent that scales to thousands of machines, improves data efficiency, and enables effective multi-task learning with a novel off-policy correction.
Findings
IMPALA outperforms previous agents with less data.
It achieves stable, high-throughput learning.
It demonstrates positive transfer between tasks in multi-task settings.
Abstract
In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al.,…
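The abstract names V-trace but does not define it. As a rough illustration, the sketch below computes V-trace value targets for a single trajectory using the backward recursion v_s - V(x_s) = delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1})), where delta_s = rho_s * (r_s + gamma * V(x_{s+1}) - V(x_s)) and rho_s, c_s are truncated importance weights. The function name, argument names, and default values (gamma, rho_bar, c_bar) are assumptions made for this example and are not taken from the authors' code.

```python
import math


def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Illustrative V-trace targets for one trajectory of length n.

    behaviour_logp[t]: log mu(a_t | x_t) under the actor's (behaviour) policy.
    target_logp[t]:    log pi(a_t | x_t) under the learner's (target) policy.
    values[t]:         V(x_t); bootstrap_value is V(x_n) for the state after
                       the last step. All names here are hypothetical.
    """
    n = len(rewards)
    # Importance ratios pi/mu, truncated at rho_bar and c_bar respectively.
    ratios = [math.exp(t - b) for t, b in zip(target_logp, behaviour_logp)]
    rhos = [min(rho_bar, r) for r in ratios]
    cs = [min(c_bar, r) for r in ratios]

    # Temporal-difference errors weighted by the truncated ratio rho_t.
    next_values = list(values[1:]) + [bootstrap_value]
    deltas = [rho * (r + gamma * nv - v)
              for rho, r, nv, v in zip(rhos, rewards, next_values, values)]

    # Backward recursion over the trajectory.
    targets = [0.0] * n
    acc = 0.0  # holds v_{s+1} - V(x_{s+1}); zero beyond the trajectory end
    for s in reversed(range(n)):
        acc = deltas[s] + gamma * cs[s] * acc
        targets[s] = values[s] + acc
    return targets


# On-policy data (identical log-probabilities) reduces to n-step returns.
on_policy = vtrace_targets([0.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                           rewards=[1.0, 1.0, 1.0], values=[0.0, 0.0, 0.0],
                           bootstrap_value=0.0, gamma=0.5)
print(on_policy)
```

When the actor and learner policies agree, every ratio is 1 and the targets coincide with ordinary n-step bootstrapped returns. When the learner assigns lower probability to an action than the actor did, the corresponding correction is scaled down.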
Videos
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures (YouTube)
DeepMind's AI Masters Even More Atari Games | Two Minute Papers #238 (YouTube)
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control
Methods: Sigmoid Activation · Tanh Activation · Experience Replay · Entropy Regularization · Residual Connection · Gradient Clipping · RMSProp · Max Pooling · Convolution
