Parallel Actors and Learners: A Framework for Generating Scalable RL Implementations
Chi Zhang, Sanmukh Rao Kuppannagari, Viktor K Prasanna

TL;DR
This paper introduces a scalable reinforcement learning framework using parallel actors and learners, optimized data structures, and memory layouts to significantly improve training efficiency on multi-core CPU systems.
Contribution
It presents a novel framework with a new prioritized replay buffer, cache-efficient data layout, and lazy writing to enhance RL training scalability on multi-core architectures.
Findings
Achieved faster RL training on CPU+GPU platforms.
Supported multiple RL algorithms like DQN and DDPG.
Reduced synchronization overheads and cache misses.
Abstract
Reinforcement Learning (RL) has achieved significant success in application domains such as robotics, games and health care. However, training RL agents is very time consuming. Current implementations exhibit poor performance due to challenges such as irregular memory accesses and thread-level synchronization overheads on CPU. In this work, we propose a framework for generating scalable reinforcement learning implementations on multi-core systems. Replay Buffer is a key component of RL algorithms which facilitates storage of samples obtained from environmental interactions and data sampling for the learning process. We define a new data structure for Prioritized Replay Buffer based on a K-ary sum tree that supports asynchronous parallel insertions, sampling, and priority updates. To address the challenge of irregular memory accesses, we propose a novel data layout to store the nodes of…
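To make the sum-tree idea in the abstract concrete, here is a minimal single-threaded sketch of a K-ary sum tree for priority storage and sampling. It is not the paper's implementation: the asynchronous parallel operations, the cache-efficient node layout, and lazy writing are not reproduced. The class name `KArySumTree`, its method names, and the default `k=4` are assumptions chosen for illustration.

```python
import random


class KArySumTree:
    """Sequential K-ary sum tree over `capacity` priorities (illustrative sketch)."""

    def __init__(self, capacity, k=4):
        assert capacity > 0 and k >= 2
        self.k = k
        self.capacity = capacity
        # Round the leaf count up to a power of k so the tree is complete.
        self.num_leaves = 1
        while self.num_leaves < capacity:
            self.num_leaves *= k
        # A complete k-ary tree with n leaves has (n - 1) / (k - 1) internal nodes.
        self.leaf_offset = (self.num_leaves - 1) // (k - 1)
        # Flat array: node i has children k*i + 1 .. k*i + k.
        self.tree = [0.0] * (self.leaf_offset + self.num_leaves)

    def total(self):
        """Sum of all priorities (stored at the root)."""
        return self.tree[0]

    def update(self, index, priority):
        """Set the priority of leaf `index` and refresh the sums on its root path."""
        assert 0 <= index < self.capacity and priority >= 0
        node = self.leaf_offset + index
        self.tree[node] = priority
        while node > 0:
            parent = (node - 1) // self.k
            first = parent * self.k + 1
            # Recompute from the children to avoid accumulating rounding drift.
            self.tree[parent] = sum(self.tree[first:first + self.k])
            node = parent

    def find(self, value):
        """Return the leaf whose cumulative-priority interval contains `value`."""
        node = 0
        while node < self.leaf_offset:
            first = node * self.k + 1
            chosen = first
            for child in range(first, first + self.k):
                if self.tree[child] > 0:
                    chosen = child  # remember the last non-empty subtree
                    if value < self.tree[child]:
                        break
                    value -= self.tree[child]
            node = chosen
        return node - self.leaf_offset

    def sample(self, rng=random):
        """Draw one leaf index with probability proportional to its priority."""
        return self.find(rng.random() * self.total())


# Usage: store six priorities, then draw an index for a prioritized minibatch.
tree = KArySumTree(capacity=6, k=4)
for i, p in enumerate([1.0, 2.0, 3.0, 4.0, 0.0, 5.0]):
    tree.update(i, p)
drawn = tree.sample()
```

With a larger `k` the tree is shallower, so an update or a sampling descent touches fewer levels, at the cost of scanning up to `k` sibling sums per level. Because siblings are adjacent in the flat array, that scan reads contiguous memory, which is one plausible reason a K-ary tree suits the cache-related concerns the abstract raises.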
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques
Methods: Target Policy Smoothing · Clipped Double Q-learning · Dilated Convolution · Convolution · Adam · Experience Replay · Batch Normalization · Weight Decay · Dense Connections
