GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models
Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai

TL;DR
GEAR is a GPU-centric experience replay system that enhances scalability and efficiency for large reinforcement learning models by optimizing memory use and communication, outperforming existing systems like Reverb.
Contribution
The paper introduces GEAR, a novel distributed GPU-centric replay system that improves memory management and communication for large RL models, addressing bottlenecks in existing solutions.
Findings
GEAR achieves up to 6x performance improvement over Reverb.
It effectively manages trajectory data across GPU memory resources.
GEAR enables scalable training of large RL models.
Abstract
This paper introduces GEAR, a distributed, GPU-centric experience replay system designed for scalable reinforcement learning (RL) with large sequence models such as transformers. With such models, existing systems such as Reverb face considerable bottlenecks in memory, computation, and communication. GEAR improves memory efficiency by letting the memory resources on GPU servers (both host memory and device memory) manage trajectory data. It also enables decentralized GPU devices to accelerate various trajectory selection strategies, circumventing computational bottlenecks. GEAR is equipped with GPU kernels that collect trajectories using zero-copy access to host memory, along with remote direct memory access (RDMA) over InfiniBand, improving communication efficiency. Cluster experiments have shown that GEAR can achieve performance levels up to…
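To make the idea of decentralized trajectory selection more concrete, here is a minimal sketch, assuming proportional prioritized sampling as the selection strategy. It is not GEAR's implementation: the names `Shard`, `ShardedReplay`, `insert`, and `sample` are hypothetical, each shard merely stands in for the memory of one GPU server, and the code is plain CPU Python with no zero-copy kernels or RDMA. The point it illustrates is that each shard can perform its own local selection: picking a shard with probability T_s / T and then a trajectory within it with probability p_i / T_s yields the global probability p_i / T, so no single node has to gather every priority.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Shard:
    """Hypothetical per-server trajectory store (stand-in for one GPU server's memory)."""
    trajectories: List[list] = field(default_factory=list)
    priorities: List[float] = field(default_factory=list)

    def add(self, trajectory: list, priority: float) -> None:
        self.trajectories.append(trajectory)
        self.priorities.append(priority)

    def total_priority(self) -> float:
        return sum(self.priorities)

    def sample_local(self, rng: random.Random) -> int:
        # Local proportional selection, computed independently on each shard.
        return rng.choices(range(len(self.priorities)), weights=self.priorities, k=1)[0]


class ShardedReplay:
    """Hypothetical replay table whose trajectories are spread across shards."""

    def __init__(self, num_shards: int, seed: int = 0):
        self.shards = [Shard() for _ in range(num_shards)]
        self.rng = random.Random(seed)
        self._next = 0

    def insert(self, trajectory: list, priority: float) -> Tuple[int, int]:
        # Round-robin placement spreads trajectories over all shards' memory.
        shard_id = self._next % len(self.shards)
        self._next += 1
        shard = self.shards[shard_id]
        shard.add(trajectory, priority)
        return shard_id, len(shard.trajectories) - 1

    def sample(self, batch_size: int) -> List[Tuple[int, int]]:
        # Two-level sampling: choose a shard in proportion to its total priority,
        # then let that shard choose a trajectory in proportion to its local priority.
        # Raises ValueError if every priority is zero.
        totals = [s.total_priority() for s in self.shards]
        batch = []
        for _ in range(batch_size):
            shard_id = self.rng.choices(range(len(self.shards)), weights=totals, k=1)[0]
            batch.append((shard_id, self.shards[shard_id].sample_local(self.rng)))
        return batch


replay = ShardedReplay(num_shards=2, seed=42)
for step, prio in enumerate([1.0, 0.0, 3.0, 2.0]):
    replay.insert(trajectory=[step], priority=prio)
print(replay.sample(batch_size=4))  # list of (shard_id, local_index) pairs
```

In this toy run, the trajectory with priority 0.0 (shard 1, index 0) is never returned. In a real distributed setting, only the per-shard totals would need to be exchanged before each batch.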
Taxonomy
Topics: Opportunistic and Delay-Tolerant Networks · Reinforcement Learning in Robotics · Advanced MIMO Systems Optimization
Methods: Experience Replay
