Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU

Mohammad Babaeizadeh; Iuri Frosio; Stephen Tyree; Jason Clemons; Jan Kautz

arXiv:1611.06256 · cs.LG · March 8, 2017 · 28 cites

TL;DR

This paper presents a hybrid CPU/GPU implementation of the A3C reinforcement learning algorithm that trains significantly faster than a CPU implementation by running the neural-network computation on the GPU, and introduces a system of queues with a dynamic scheduling strategy.

Contribution

The paper introduces a novel hybrid CPU/GPU version of A3C with a dynamic scheduling strategy, enhancing computational efficiency and speed.

Findings

01. Achieved a significant speedup over a CPU implementation of A3C
02. Developed a system of queues and a dynamic scheduling strategy for asynchronous algorithms
03. Made the implementation publicly available for research use
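
The queue system in finding 02 can be illustrated with a minimal, CPU-only sketch. It assumes a split into agents, a predictor, and a trainer: agents push states onto a shared prediction queue, the predictor drains several pending requests so the network runs one batched forward pass, and agents push finished rollouts onto a training queue. That split, the linear-softmax `batched_policy`, the random states, the zero rewards, and the names `agent`, `predictor`, `trainer`, and `run` are all hypothetical stand-ins chosen for illustration; a real implementation would run the network on the GPU, and its trainer would batch rollouts and apply gradient updates.

```python
import queue
import threading

import numpy as np

# Hypothetical stand-in for the policy network: a fixed linear layer + softmax.
# In a GPU implementation this forward pass would run on the device.
STATE_DIM, NUM_ACTIONS = 4, 3
W = np.random.default_rng(0).standard_normal((STATE_DIM, NUM_ACTIONS))


def batched_policy(states: np.ndarray) -> np.ndarray:
    """One forward pass for a batch of states: [B, STATE_DIM] -> [B, NUM_ACTIONS]."""
    logits = states @ W
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def agent(agent_id, steps, pred_q, reply_q, train_q, seed):
    """Hypothetical agent: requests a policy every step, then ships its rollout."""
    rng = np.random.default_rng(seed)
    rollout = []
    for _ in range(steps):
        state = rng.standard_normal(STATE_DIM)  # stand-in for an emulator frame
        pred_q.put((agent_id, state))
        probs = reply_q.get()  # block until the predictor answers
        action = rng.choice(NUM_ACTIONS, p=probs)
        rollout.append((state, action, 0.0))  # reward is a placeholder
    train_q.put(rollout)


def predictor(pred_q, reply_qs, max_batch, batch_sizes):
    """Drain up to max_batch pending requests and answer them with one forward pass."""
    while (first := pred_q.get()) is not None:  # None is the shutdown sentinel
        reqs = [first]
        while len(reqs) < max_batch:
            try:
                item = pred_q.get_nowait()
            except queue.Empty:
                break
            if item is None:  # put the sentinel back for the outer loop
                pred_q.put(None)
                break
            reqs.append(item)
        probs = batched_policy(np.stack([state for _, state in reqs]))
        for (agent_id, _), p in zip(reqs, probs):
            reply_qs[agent_id].put(p)
        batch_sizes.append(len(reqs))


def trainer(train_q, trained):
    """Hypothetical trainer: counts experiences where a real one would update weights."""
    while (rollout := train_q.get()) is not None:
        trained.append(len(rollout))


def run(num_agents=8, steps=5, max_batch=4):
    pred_q, train_q = queue.Queue(), queue.Queue()
    reply_qs = [queue.Queue() for _ in range(num_agents)]
    batch_sizes, trained = [], []
    workers = [
        threading.Thread(target=predictor, args=(pred_q, reply_qs, max_batch, batch_sizes)),
        threading.Thread(target=trainer, args=(train_q, trained)),
    ]
    agents = [
        threading.Thread(target=agent, args=(i, steps, pred_q, reply_qs[i], train_q, i))
        for i in range(num_agents)
    ]
    for t in workers + agents:
        t.start()
    for t in agents:
        t.join()
    pred_q.put(None)  # every request has been answered once all agents finish
    train_q.put(None)
    for t in workers:
        t.join()
    return batch_sizes, sum(trained)


if __name__ == "__main__":
    sizes, n = run()
    print(f"{len(sizes)} forward passes served {sum(sizes)} requests; {n} experiences queued")
```

Batching is the point of this design: when many agents are waiting, one forward pass answers several of them, which is what makes offloading inference to a GPU worthwhile.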

Abstract

We introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We analyze its computational traits and concentrate on aspects critical to leveraging the GPU's computational power. We introduce a system of queues and a dynamic scheduling strategy, potentially helpful for other asynchronous algorithms as well. Our hybrid CPU/GPU version of A3C, based on TensorFlow, achieves a significant speedup compared to a CPU implementation; we make it publicly available to other researchers at https://github.com/NVlabs/GA3C.
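
The abstract does not state the scheduling rule. One simple way to realize dynamic scheduling, sketched here purely as an assumption, is hill climbing over worker counts: nudge the number of agents, predictors, or trainers by one, measure throughput, and keep the change only if throughput improved. `hill_climb_schedule`, `toy_throughput`, and the toy peak at 32 agents, 2 predictors, and 2 trainers are hypothetical. In a live system `measure` would time the running queues, and because such measurements are noisy, averaging over a time window would likely be needed before comparing scores.

```python
import random
from typing import Callable, Dict

Config = Dict[str, int]


def hill_climb_schedule(measure: Callable[[Config], float], config: Config,
                        rounds: int, seed: int = 0, lo: int = 1, hi: int = 64) -> Config:
    """Greedy scheduler: nudge one worker count by +/-1, keep it only if throughput rises."""
    rng = random.Random(seed)
    best_score = measure(config)
    for _ in range(rounds):
        key = rng.choice(sorted(config))  # which worker pool to resize
        trial = dict(config)  # copy, so the caller's dict is never mutated
        trial[key] = min(hi, max(lo, trial[key] + rng.choice((-1, 1))))
        score = measure(trial)
        if score > best_score:
            config, best_score = trial, score
    return config


def toy_throughput(c: Config) -> float:
    """Hypothetical throughput model that peaks at 32 agents, 2 predictors, 2 trainers."""
    return (1000.0 - (c["agents"] - 32) ** 2
            - 50 * (c["predictors"] - 2) ** 2
            - 50 * (c["trainers"] - 2) ** 2)


if __name__ == "__main__":
    start = {"agents": 16, "predictors": 1, "trainers": 1}
    print(hill_climb_schedule(toy_throughput, start, rounds=2000))
```

Because a change is kept only when it strictly improves the score, a clamped move at a bound or a step away from the peak is simply discarded; on the smooth toy model the search settles at the peak, while on real, noisy measurements it would only approximate it.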

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Parallel Computing and Optimization Techniques

Methods: Entropy Regularization · Convolution · Dense Connections · Softmax · A3C
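
The A3C, Softmax, and Entropy Regularization tags correspond to the per-rollout objective of advantage actor-critic. The sketch below follows the standard A3C formulation as commonly described: n-step discounted returns bootstrapped from the critic's value estimate, advantages A_t = R_t - V(s_t), a policy-gradient term on softmax log-probabilities, and an entropy bonus weighted by beta. It is not taken from the GA3C code; the 0.5 value-loss weight, the default `beta=0.01`, and the function names `nstep_returns` and `a3c_losses` are assumptions for illustration.

```python
import numpy as np


def nstep_returns(rewards, bootstrap_value, gamma):
    """R_t = r_t + gamma * R_{t+1}, seeded with V(s_{t+n}) (use 0 for a terminal state)."""
    returns = np.zeros(len(rewards))
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def a3c_losses(logits, actions, rewards, values, bootstrap_value, gamma=0.99, beta=0.01):
    """Policy and value losses for one n-step rollout (values computed, not differentiated).

    logits: [T, A] unnormalized action scores; actions: [T] ints; values: [T] = V(s_t).
    """
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    probs = np.exp(log_probs)
    returns = nstep_returns(rewards, bootstrap_value, gamma)
    advantages = returns - values  # treated as a constant in the policy gradient
    chosen = log_probs[np.arange(len(actions)), actions]
    entropy = -(probs * log_probs).sum(axis=1)  # per-step policy entropy
    policy_loss = -(chosen * advantages).sum() - beta * entropy.sum()
    value_loss = 0.5 * (advantages ** 2).sum()
    return policy_loss, value_loss, returns
```

As a worked example of the returns, rewards [1, 0, 1] with a bootstrap value of 0.5 and gamma = 0.5 give R_2 = 1 + 0.5 · 0.5 = 1.25, R_1 = 0 + 0.5 · 1.25 = 0.625, and R_0 = 1 + 0.5 · 0.625 = 1.3125. Subtracting the entropy term rewards policies that stay spread over actions, which discourages premature convergence to a deterministic policy.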