OpenGraphGym-MG: Using Reinforcement Learning to Solve Large Graph Optimization Problems on MultiGPU Systems
Weijian Zheng, Dali Wang, Fengguang Song

TL;DR
This paper introduces OpenGraphGym-MG, a reinforcement learning framework that runs on multi-GPU systems to solve large graph optimization problems, including graphs with over 30 million edges, and whose training and inference time drops as GPUs are added.
Contribution
The paper presents a novel, extensible framework combining deep RL and graph embedding techniques with advanced parallelization strategies for large-scale graph optimization on multi-GPU systems.
Findings
Efficient parallel RL training and inference algorithms demonstrated on up to six GPUs.
Significant reduction in training and inference time as GPU count increases.
Successful large graph experiments on real-world and generated graphs with over 30 million edges.
Abstract
Large scale graph optimization problems arise in many fields. This paper presents an extensible, high performance framework (named OpenGraphGym-MG) that uses deep reinforcement learning and graph embedding to solve large graph optimization problems with multiple GPUs. The paper uses a common RL algorithm (deep Q-learning) and a representative graph embedding (structure2vec) to demonstrate the extensibility of the framework and, most importantly, to illustrate the novel optimization techniques, such as spatial parallelism, graph-level and node-level batched processing, distributed sparse graph storage, efficient parallel RL training and inference algorithms, repeated gradient descent iterations, and adaptive multiple-node selections. This study performs a comprehensive performance analysis on parallel efficiency and memory cost that proves the parallel RL training and inference…
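The abstract lists "adaptive multiple-node selections" among the optimizations but, as excerpted here, does not say how it works. The sketch below is one plausible reading under stated assumptions, not the authors' algorithm: at each inference step several high-scoring nodes are added to the partial solution instead of one. Minimum vertex cover is used only as an illustrative problem, residual degree stands in for the learned Q-values, and the names `select_top_k`, `greedy_vertex_cover`, and `fraction` are hypothetical.

```python
def select_top_k(scores, k):
    """Return the k nodes with the highest scores (ties broken by node id)."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


def greedy_vertex_cover(edges, fraction=0.25):
    """Build a vertex cover by adding several nodes per step.

    `fraction` is a hypothetical knob: the share of remaining candidate
    nodes picked in one step (always at least one node).
    Assumes an undirected graph given as (u, v) pairs.
    """
    uncovered = set(edges)
    cover = []
    while uncovered:
        # Stand-in for Q(state, node): number of uncovered edges at the node.
        # A trained model would supply these scores instead.
        scores = {}
        for u, v in uncovered:
            scores[u] = scores.get(u, 0) + 1
            scores[v] = scores.get(v, 0) + 1

        # Pick more nodes per step while many candidates remain.
        k = max(1, int(len(scores) * fraction))
        for node in select_top_k(scores, k):
            # Skip nodes whose edges were covered earlier in this batch.
            if any(node in edge for edge in uncovered):
                cover.append(node)
                uncovered = {e for e in uncovered if node not in e}
    return cover
```

Calling `greedy_vertex_cover(edges, fraction=1.0)` takes every useful candidate in a single pass, while a small `fraction` approaches one-node-at-a-time selection. The trade-off in this sketch is fewer scoring passes against a possibly larger cover.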
Taxonomy
Topics: Advanced Graph Neural Networks · Ferroelectric and Negative Capacitance Devices · Caching and Content Delivery
