CaPGNN: Optimizing Parallel Graph Neural Network Training with Joint Caching and Resource-Aware Graph Partitioning

Xianfeng Song; Yi Zou; Zheng Shi

arXiv:2508.13716·cs.DC·February 10, 2026

TL;DR

CaPGNN is a framework that significantly improves the efficiency of parallel GNN training on single-server, multi-GPU systems by combining joint caching, resource-aware graph partitioning, and overlap of computation with communication.

Contribution

The paper introduces CaPGNN, a framework that combines adaptive caching with heuristic graph partitioning to improve the efficiency of parallel GNN training and reduce its communication costs.

Findings

01. Training efficiency improved by up to 18.98x

02. Communication costs reduced by up to 99%

03. Accuracy maintained or improved in various scenarios

Abstract

Graph-structured data is ubiquitous in the real world, and Graph Neural Networks (GNNs) have become increasingly popular in various fields due to their ability to process such irregular data directly. However, as data scales, GNNs become inefficient. Although parallel training offers performance improvements, increased communication costs often offset these advantages. To address this, this paper introduces CaPGNN, a novel parallel full-batch GNN training framework for single-server, multi-GPU systems. First, because the number of remote vertices in a partition is often greater than or equal to the number of local vertices, and many of those vertices may be duplicates, we propose a joint adaptive caching algorithm that leverages both CPU and GPU memory, integrating lightweight cache update and prefetch techniques to effectively reduce redundant communication costs.…
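The observation about remote and duplicate vertices can be made concrete with a small sketch. The code below is not the CaPGNN caching algorithm; it only counts, for a toy partitioned graph, the local vertices of each partition, the remote vertices each partition must fetch, and the vertices needed by more than one partition. The function name `halo_stats`, the toy graph, and the reading of "duplicate" as "needed by more than one partition" are assumptions made for illustration.

```python
from collections import Counter


def halo_stats(edges, part_of):
    """Count local, remote, and duplicated vertices for a partitioned graph.

    edges   : iterable of undirected (u, v) pairs
    part_of : dict mapping each vertex to its partition id
    (Hypothetical helper for illustration, not part of CaPGNN.)
    """
    parts = sorted(set(part_of.values()))
    local = {p: set() for p in parts}
    remote = {p: set() for p in parts}

    # Each vertex is local to exactly one partition.
    for v, p in part_of.items():
        local[p].add(v)

    # An edge that crosses partitions makes each endpoint a remote
    # vertex of the other endpoint's partition.
    for u, v in edges:
        pu, pv = part_of[u], part_of[v]
        if pu != pv:
            remote[pu].add(v)
            remote[pv].add(u)

    # A vertex is "duplicated" here if more than one partition needs it.
    need = Counter(v for p in parts for v in remote[p])
    duplicated = {v for v, count in need.items() if count > 1}
    return local, remote, duplicated


# Toy graph: 6 vertices split evenly over 3 partitions (assumed data).
part_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
edges = [(0, 2), (0, 4), (1, 2), (1, 3), (2, 4), (3, 5), (0, 1)]

local, remote, duplicated = halo_stats(edges, part_of)
for p in sorted(local):
    print(p, len(local[p]), len(remote[p]))
print(sorted(duplicated))
```

In this toy case every partition has more remote than local vertices, and four of the six vertices are needed by two partitions. Fetching each of those once and serving later requests from a cache is the kind of redundancy a shared cache could remove.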
