GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing

Yuke Wang; Boyuan Feng; Zheng Wang; Tong Geng; Ang Li; Yufei Ding

arXiv:2206.08482·cs.DC·June 20, 2022

TL;DR

GMI-DRL introduces GPU spatial multiplexing with resource-adjustable instances and adaptive management, improving multi-GPU deep reinforcement learning training throughput by up to 2.81X over NVIDIA Isaac Gym with NCCL on DGX-A100 systems.

Contribution

GMI-DRL combines resource-adjustable GPU multiplexing instances (GMIs), an adaptive GMI management strategy, and efficient inter-GMI communication support to raise GPU utilization and throughput in multi-GPU DRL.

Findings

01. Up to 2.81X training throughput improvement over NVIDIA Isaac Gym with NCCL.

02. Up to 2.34X improvement over Horovod.

03. Effective handling of heterogeneous workloads with GPU spatial multiplexing.

Abstract

With the increasing popularity of robotics in industrial control and autonomous driving, deep reinforcement learning (DRL) has drawn attention from various fields. However, DRL computation on modern, powerful GPU platforms is still inefficient due to its heterogeneous workloads and interleaved execution paradigm. To this end, we propose GMI-DRL, a systematic design to accelerate multi-GPU DRL via GPU spatial multiplexing. We introduce a novel design of resource-adjustable GPU multiplexing instances (GMIs) to match the actual needs of DRL tasks, an adaptive GMI management strategy to simultaneously achieve high GPU utilization and computation throughput, and highly efficient inter-GMI communication support to meet the demands of various DRL communication patterns. Comprehensive experiments reveal that GMI-DRL outperforms state-of-the-art NVIDIA Isaac Gym with NCCL (up to 2.81X) and…

Taxonomy

Topics: Modular Robots and Swarm Intelligence · Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications