GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models

Hanjing Wang; Man-Kit Sit; Congjie He; Ying Wen; Weinan Zhang; Jun Wang; Yaodong Yang; Luo Mai

arXiv:2310.05205 · cs.LG · October 10, 2023

TL;DR

GEAR is a GPU-centric experience replay system that enhances scalability and efficiency for large reinforcement learning models by optimizing memory use and communication, outperforming existing systems like Reverb.

Contribution

The paper introduces GEAR, a novel distributed GPU-centric replay system that improves memory management and communication for large RL models, addressing bottlenecks in existing solutions.

Findings

1. GEAR achieves up to 6x performance improvement over Reverb.
2. It effectively manages trajectory data across GPU memory resources.
3. GEAR enables scalable training of large RL models.

Abstract

This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers). With such models, existing systems such as Reverb face considerable bottlenecks in memory, computation, and communication. GEAR, however, optimizes memory efficiency by enabling the memory resources on GPU servers (including host memory and device memory) to manage trajectory data. Furthermore, it lets decentralized GPU devices expedite various trajectory selection strategies, circumventing computational bottlenecks. GEAR is equipped with GPU kernels capable of collecting trajectories using zero-copy access to host memory, along with remote direct memory access (RDMA) over InfiniBand, improving communication efficiency. Cluster experiments have shown that GEAR can achieve performance levels up to…
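
The idea of spreading trajectory storage and selection across several workers can be illustrated with a small CPU-only sketch. This is not GEAR's API: the names `TrajectoryShard` and `ShardedReplay`, the round-robin placement, and the priority-proportional sampling rule are all assumptions made for illustration, and the sketch does not model zero-copy host-memory access or RDMA.

```python
import random
from dataclasses import dataclass, field


@dataclass
class TrajectoryShard:
    """Hypothetical stand-in for the trajectory memory owned by one worker."""
    trajectories: list = field(default_factory=list)
    priorities: list = field(default_factory=list)

    def add(self, trajectory, priority=1.0):
        self.trajectories.append(trajectory)
        self.priorities.append(float(priority))

    def sample(self, k, rng):
        # Priority-proportional selection with replacement, local to this shard.
        # At least one priority must be positive, or random.choices raises ValueError.
        return rng.choices(self.trajectories, weights=self.priorities, k=k)


class ShardedReplay:
    """Hypothetical replay store that keeps trajectories in several shards."""

    def __init__(self, num_shards, seed=0):
        self.shards = [TrajectoryShard() for _ in range(num_shards)]
        self.rng = random.Random(seed)
        self._next = 0

    def add(self, trajectory, priority=1.0):
        # Round-robin placement (an assumption, chosen for simplicity).
        self.shards[self._next].add(trajectory, priority)
        self._next = (self._next + 1) % len(self.shards)

    def sample(self, per_shard):
        # Each non-empty shard selects its own trajectories; results are merged.
        batch = []
        for shard in self.shards:
            if shard.trajectories:
                batch.extend(shard.sample(per_shard, self.rng))
        return batch


# Usage: six toy trajectories of (state, action, reward) steps over two shards.
replay = ShardedReplay(num_shards=2)
stored = [[(i, 0, 1.0), (i + 1, 1, 0.0)] for i in range(6)]
for i, trajectory in enumerate(stored):
    replay.add(trajectory, priority=1.0 + i)

batch = replay.sample(per_shard=2)
print(len(batch))
```

Because each shard selects from its own data, the selection work grows with the number of shards rather than concentrating on a single server, which is the kind of bottleneck the abstract attributes to centralized designs.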

Code & Models

Repositories

bigrl-team/gear (PyTorch, official)

Videos

GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models · SlidesLive

Taxonomy

Topics: Opportunistic and Delay-Tolerant Networks · Reinforcement Learning in Robotics · Advanced MIMO Systems Optimization

Methods: Experience Replay