Parallel Actors and Learners: A Framework for Generating Scalable RL Implementations

Chi Zhang; Sanmukh Rao Kuppannagari; Viktor K Prasanna

arXiv:2110.01101·cs.LG·December 24, 2021

Open Access

TL;DR

This paper introduces a scalable reinforcement learning framework using parallel actors and learners, optimized data structures, and memory layouts to significantly improve training efficiency on multi-core CPU systems.

Contribution

It presents a novel framework with a new prioritized replay buffer, cache-efficient data layout, and lazy writing to enhance RL training scalability on multi-core architectures.

Findings

01. Achieved faster RL training on CPU+GPU platforms.

02. Supported multiple RL algorithms, such as DQN and DDPG.

03. Reduced synchronization overheads and cache misses.

Abstract

Reinforcement Learning (RL) has achieved significant success in application domains such as robotics, games, and health care. However, training RL agents is very time-consuming. Current implementations exhibit poor performance due to challenges such as irregular memory accesses and thread-level synchronization overheads on CPUs. In this work, we propose a framework for generating scalable reinforcement learning implementations on multi-core systems. The Replay Buffer is a key component of RL algorithms that stores samples obtained from environmental interactions and supplies sampled data to the learning process. We define a new data structure for the Prioritized Replay Buffer, based on a $K$-ary sum tree, that supports asynchronous parallel insertions, sampling, and priority updates. To address the challenge of irregular memory accesses, we propose a novel data layout to store the nodes of…
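To make the $K$-ary sum tree idea concrete, here is a minimal single-threaded sketch in Python. It is not the paper's implementation: the class name `KArySumTree`, its methods, and the implicit array layout (children of node `i` at `k*i + 1 .. k*i + k`) are assumptions chosen for illustration, and the asynchronous parallel operations and the paper's own data layout are not reproduced here. The sketch only shows the core property: each internal node stores the sum of its children's priorities, so a priority update and a proportional sample each walk one root-to-leaf path.

```python
import random


class KArySumTree:
    """Single-threaded K-ary sum tree over `capacity` priorities (illustrative only)."""

    def __init__(self, capacity, k=4):
        if capacity < 1 or k < 2:
            raise ValueError("capacity must be >= 1 and k must be >= 2")
        self.k = k
        self.capacity = capacity
        # Pad the leaf level up to the next power of k so the tree is complete.
        leaves = 1
        while leaves < capacity:
            leaves *= k
        # A complete k-ary tree with `leaves` leaves has (leaves - 1) / (k - 1)
        # internal nodes; the leaves are stored right after them.
        self.leaf_offset = (leaves - 1) // (k - 1)
        # Implicit layout (an assumption of this sketch): the children of
        # node i occupy the contiguous slots k*i + 1 .. k*i + k.
        self.nodes = [0.0] * (self.leaf_offset + leaves)

    def total(self):
        """Sum of all priorities, held at the root."""
        return self.nodes[0]

    def update(self, index, priority):
        """Set the priority of item `index` and fix the sums on the path to the root."""
        if not 0 <= index < self.capacity:
            raise IndexError("index out of range")
        if priority < 0:
            raise ValueError("priority must be non-negative")
        pos = self.leaf_offset + index
        delta = priority - self.nodes[pos]
        self.nodes[pos] = priority
        while pos > 0:
            pos = (pos - 1) // self.k  # parent in the implicit layout
            self.nodes[pos] += delta

    def find(self, mass):
        """Return the item whose cumulative-priority interval contains `mass`."""
        pos = 0
        while pos < self.leaf_offset:
            first = self.k * pos + 1
            last = first + self.k - 1
            for child in range(first, last + 1):
                # Falling through to the last child guards against float rounding.
                if mass < self.nodes[child] or child == last:
                    pos = child
                    break
                mass -= self.nodes[child]
        # Clamp so rounding can never return a padding leaf.
        return min(pos - self.leaf_offset, self.capacity - 1)

    def sample(self, rng=random):
        """Draw an item index with probability proportional to its priority."""
        if self.total() <= 0:
            raise ValueError("cannot sample from an empty tree")
        return self.find(rng.random() * self.total())


if __name__ == "__main__":
    tree = KArySumTree(capacity=5, k=4)
    for i, p in enumerate([1.0, 2.0, 3.0, 4.0, 5.0]):
        tree.update(i, p)
    print(tree.total(), tree.sample(random.Random(0)))
```

In this sketch a larger `k` gives a shallower tree, and the `k` children scanned at each level sit next to each other in the array. Whether that corresponds to the layout the authors propose cannot be confirmed from the truncated abstract.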

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques

Methods: Target Policy Smoothing · Clipped Double Q-learning · Dilated Convolution · Convolution · Adam · Experience Replay · Batch Normalization · Weight Decay · Dense Connections