TL;DR
This paper demonstrates that distributed deep reinforcement learning can be scaled efficiently using BA3C with the Adam optimizer, cutting Atari training time from 10 hours on a single node to 21 minutes on 64 CPU nodes, a roughly 29-fold speedup.
Contribution
The study introduces a scalable distributed reinforcement learning setup with optimized hyperparameters, enabling rapid training of Atari agents on large CPU clusters.
Findings
Linear scaling achieved up to 64 CPU nodes.
Training time reduced from 10 hours to 21 minutes.
Adam remains effective with batch sizes of up to 2048.
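The headline numbers above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted on this page (10 hours, 21 minutes, 64 nodes, 768 cores); the function and variable names are illustrative and not taken from the paper.

```python
def speedup(baseline_minutes: float, distributed_minutes: float) -> float:
    """Ratio of baseline wall-clock time to distributed wall-clock time."""
    return baseline_minutes / distributed_minutes


# Figures quoted in the summary and abstract.
baseline_minutes = 10 * 60   # 10 hours on a single 24-core node
distributed_minutes = 21     # 21 minutes on the 64-node cluster
nodes = 64
total_cores = 768

s = speedup(baseline_minutes, distributed_minutes)
cores_per_node = total_cores // nodes

print(round(s, 1))      # 28.6
print(cores_per_node)   # 12
```

The wall-clock ratio is about 28.6, which is where the "roughly 30-fold" framing comes from. Note that the quoted core counts imply 12 cores per cluster node, whereas the single-node baseline uses 24 cores, so the wall-clock ratio and the ratio of core counts are not directly comparable.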
Abstract
We present a study in Distributed Deep Reinforcement Learning (DDRL) focused on the scalability of a state-of-the-art Deep Reinforcement Learning algorithm known as Batch Asynchronous Advantage Actor-Critic (BA3C). We show that using the Adam optimization algorithm with a batch size of up to 2048 is a viable choice for carrying out large-scale machine learning computations. This, combined with a careful reexamination of the optimizer's hyperparameters, synchronous training on the node level (while keeping the local, single-node part of the algorithm asynchronous), and minimizing the memory footprint of the model, allowed us to achieve linear scaling for up to 64 CPU nodes. This corresponds to a training time of 21 minutes on 768 CPU cores, as opposed to 10 hours for a baseline single-node implementation running on 24 cores.
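The node-level synchronous scheme described in the abstract can be illustrated with a toy example: each node computes a gradient on its own data, the gradients are averaged, and a single Adam step is applied to the shared parameters. This is a minimal sketch under stated assumptions, not the authors' BA3C implementation. The quadratic loss, the per-node targets, and all names (`average_gradients`, `Adam`, `node_targets`) are hypothetical, and the hyperparameters are illustrative rather than the tuned values from the paper.

```python
import math


def average_gradients(node_grads):
    """Synchronous step: elementwise mean of the per-node gradient vectors."""
    n = len(node_grads)
    size = len(node_grads[0])
    return [sum(g[i] for g in node_grads) / n for i in range(size)]


class Adam:
    """Plain Adam update with bias correction (standard formulation)."""

    def __init__(self, size, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * size  # first-moment estimate
        self.v = [0.0] * size  # second-moment estimate
        self.t = 0             # step counter

    def step(self, params, grad):
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grad)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


# Toy setup (assumption): each "node" minimizes (p - target)^2 on its own target.
node_targets = [1.0, 3.0]
params = [0.0]
optimizer = Adam(size=1, lr=0.1)  # illustrative learning rate for the toy problem

for _ in range(500):
    # Each node computes its local gradient of (p - target)^2.
    node_grads = [[2.0 * (params[0] - t)] for t in node_targets]
    # All nodes synchronize, then one shared Adam step is applied.
    params = optimizer.step(params, average_gradients(node_grads))

print(params[0])  # approaches the mean of the node targets
```

Averaging before the optimizer step means every node sees the same parameters after each update, which is the property that distinguishes synchronous training from fully asynchronous schemes. In the setup the abstract describes, the work inside each node remains asynchronous; that part is not modeled here.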
Taxonomy
Methods: Adam
