FuseSampleAgg: Fused Neighbor Sampling and Aggregation for Mini-batch GNNs

Aleksandar Stanković

arXiv:2511.13645 · cs.LG · November 18, 2025

TL;DR

FuseSampleAgg is a CUDA operator that fuses neighbor sampling and mean aggregation into a single pass for one- and two-hop GraphSAGE. On Reddit, ogbn-arxiv, and ogbn-products it cuts step time by up to 51x and peak GPU memory by up to 100x while preserving GraphSAGE mean semantics.

Contribution

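The paper presents a fused CUDA operator for GraphSAGE neighbor sampling and mean aggregation. By avoiding block materialization and extra kernel launches, the operator reduces memory traffic and overhead, which yields the reported step-time speedups and peak-memory savings.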

Findings

01 Up to 51x step-time speedup on ogbn-products

02 Peak GPU memory reduction of up to 100x on ogbn-products

03 Deterministic operator that integrates with standard PyTorch optimizers

Abstract

We present FuseSampleAgg, a CUDA operator that fuses neighbor sampling and mean aggregation into a single pass for one- and two-hop GraphSAGE. By eliminating block materialization and extra kernel launches, FuseSampleAgg reduces memory traffic and overhead while preserving GraphSAGE mean semantics via saved index replay. Across the Reddit, ogbn-arxiv, and ogbn-products benchmarks (batch size 1024, automatic mixed precision enabled), we observe step-time speedups up to 51x on ogbn-products, about 4x on Reddit with fanouts 10-10 and 15-10, and about 3.3x on ogbn-arxiv at larger fanouts, with peak GPU memory reductions up to 100x, 36x, and about 3.5x, respectively. The operator is deterministic, integrates with standard PyTorch optimizers, and ships with scripts that reproduce all tables and figures from CSV logs. Code and scripts are available at https://github.com/SV25-22/FuseSampleAgg.
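
To make the idea of fusing sampling with mean aggregation and replaying saved indices more concrete, here is a minimal CPU sketch in pure Python. It is not the paper's CUDA kernel. It covers only the one-hop case, and it assumes a CSR graph layout, uniform sampling without replacement, and a zero output for isolated nodes. The function names fused_sample_mean and replay_backward are hypothetical and do not come from the FuseSampleAgg repository.

```python
import random


def fused_sample_mean(indptr, indices, feats, seeds, fanout, rng):
    """One-hop sketch: sample and average in the same loop.

    No intermediate block (sampled subgraph) is built. Only the sampled
    neighbor ids are kept so the backward pass can replay them.
    Assumes a CSR graph (indptr, indices) and list-of-lists features.
    """
    dim = len(feats[0])
    out, saved = [], []
    for s in seeds:
        nbrs = indices[indptr[s]:indptr[s + 1]]
        # Assumption: uniform sampling without replacement, capped at fanout.
        picked = list(nbrs) if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
        acc = [0.0] * dim
        for n in picked:
            for d in range(dim):
                acc[d] += feats[n][d]
        if picked:  # isolated nodes keep a zero vector (assumption)
            acc = [a / len(picked) for a in acc]
        out.append(acc)
        saved.append(picked)
    return out, saved


def replay_backward(saved, grad_out, num_nodes):
    """Replay the saved indices to route gradients of the mean.

    Each sampled neighbor receives grad / (number of sampled neighbors),
    matching the forward mean over exactly the same sample.
    """
    dim = len(grad_out[0])
    grad_feats = [[0.0] * dim for _ in range(num_nodes)]
    for picked, g in zip(saved, grad_out):
        for n in picked:
            for d in range(dim):
                grad_feats[n][d] += g[d] / len(picked)
    return grad_feats


# Tiny example graph: node 0 -> {1, 2}, node 1 -> {2}, node 2 -> {}.
indptr = [0, 2, 3, 3]
indices = [1, 2, 2]
feats = [[1.0, 0.0], [2.0, 4.0], [4.0, 8.0]]

out, saved = fused_sample_mean(indptr, indices, feats, [0, 1, 2], 2, random.Random(0))
grads = replay_backward(saved, [[1.0, 1.0]] * 3, num_nodes=3)
```

Because the backward pass reuses the saved sample rather than drawing a new one, the gradient corresponds to the mean that was actually computed in the forward pass. This is one plausible reading of "saved index replay" in the abstract.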

Taxonomy

Topics: Graph Theory and Algorithms · Software System Performance and Reliability · Cloud Computing and Resource Management