AutoSAGE: Input-Aware CUDA Scheduling for Sparse GNN Aggregation (SpMM/SDDMM) and CSR Attention

Aleksandar Stankovic

arXiv:2511.17594 · cs.LG · November 25, 2025

TL;DR

AutoSAGE is an input-aware CUDA scheduler for sparse GNN aggregation (CSR SpMM/SDDMM) that selects tiling and mapping per input. It matches vendor kernels at bandwidth-bound feature widths, finds gains at small widths, and reaches up to 4.7x kernel-level speedups on synthetic stress tests.

Contribution

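The paper introduces AutoSAGE, a lightweight, input-aware CUDA scheduling framework for sparse GNN operations (SpMM and SDDMM). It chooses tiling and mapping per input from a lightweight estimate refined by on-device micro-probes, with a guardrail that falls back to vendor kernels and a persistent cache for deterministic replay.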
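The selection loop described in the abstract (estimate, micro-probe, guardrail, cache) can be pictured with a minimal Python sketch. Everything named here is an assumption for illustration and not AutoSAGE's API: the schedule names `warp_per_row` and `thread_per_row`, the `probe` callback, the cache key layout, and the 5% guardrail margin are all hypothetical. The shortlist is assumed to come from the lightweight estimate, and `probe` stands in for timing a candidate on the device.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Key = Tuple[int, int, int, int]


def degree_skew(row_ptr: Sequence[int]) -> float:
    """Max row degree divided by mean row degree (1.0 if there are no nonzeros)."""
    degrees = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    total = sum(degrees)
    if total == 0:
        return 1.0
    return max(degrees) / (total / len(degrees))


def input_key(row_ptr: Sequence[int], feat_width: int) -> Key:
    """Hypothetical cache key: rows, nonzeros, feature width, coarse skew bucket."""
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    return (n_rows, nnz, feat_width, int(degree_skew(row_ptr)))


def choose_schedule(
    key,
    shortlist: List[str],
    probe: Callable[[str], float],
    cache: Dict,
    margin: float = 0.95,
) -> str:
    """Return the name of the schedule to run for this input.

    probe(name) returns a measured time for the named schedule;
    the name "vendor" denotes the vendor-kernel baseline.
    """
    # Cached decisions are replayed without re-probing.
    if key in cache:
        return cache[key]

    vendor_time = probe("vendor")
    best_name, best_time = "vendor", vendor_time
    for name in shortlist:
        t = probe(name)
        if t < best_time:
            best_name, best_time = name, t

    # Guardrail (assumed policy): keep a custom schedule only if it beats
    # the vendor kernel by a clear margin; otherwise fall back.
    if best_name != "vendor" and best_time > margin * vendor_time:
        best_name = "vendor"

    cache[key] = best_name
    return best_name
```

In this sketch the cache is an in-memory dict; a persistent variant would serialize it to disk so that a later run replays the same decisions.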

Findings

01

Matches vendor baselines at bandwidth-bound feature widths on Reddit and OGBN-Products, with gains at small widths

02

Achieves up to 4.7x kernel-level speedups on synthetic sparsity and skew stress tests

03

Releases CUDA sources, Python bindings, a reproducible harness, and replayable cache logs

Abstract

Sparse GNN aggregations (CSR SpMM/SDDMM) vary widely in performance with degree skew, feature width, and GPU micro-architecture. We present AutoSAGE, an input-aware CUDA scheduler that chooses tiling and mapping per input using a lightweight estimate refined by on-device micro-probes, with a guardrail that safely falls back to vendor kernels and a persistent cache for deterministic replay. AutoSAGE covers SpMM and SDDMM and composes into a CSR attention pipeline (SDDMM -> row-softmax -> SpMM). On Reddit and OGBN-Products, it matches vendor baselines at bandwidth-bound feature widths and finds gains at small widths; on synthetic sparsity and skew stress tests it achieves up to 4.7x kernel-level speedups. We release CUDA sources, Python bindings, a reproducible harness, and replayable cache logs.
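To make the CSR attention pipeline (SDDMM -> row-softmax -> SpMM) concrete, here is a plain-Python reference sketch of what the three stages compute. It is a CPU illustration of the math only, not the paper's CUDA kernels; the function names are hypothetical, and any score scaling (such as dividing by the square root of the feature width) is left out because the abstract does not specify one.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def sddmm(row_ptr: Sequence[int], col_idx: Sequence[int], Q: Matrix, K: Matrix) -> List[float]:
    """For each stored entry (i, j) of the CSR pattern, compute dot(Q[i], K[j])."""
    scores = [0.0] * len(col_idx)
    for i in range(len(row_ptr) - 1):
        for e in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[e]
            scores[e] = sum(q * k for q, k in zip(Q[i], K[j]))
    return scores


def row_softmax(row_ptr: Sequence[int], scores: Sequence[float]) -> List[float]:
    """Softmax over the stored entries of each row; empty rows are skipped."""
    probs = [0.0] * len(scores)
    for i in range(len(row_ptr) - 1):
        lo, hi = row_ptr[i], row_ptr[i + 1]
        if lo == hi:
            continue
        row_max = max(scores[lo:hi])  # subtract the max for numerical stability
        exps = [math.exp(s - row_max) for s in scores[lo:hi]]
        denom = sum(exps)
        for offset, x in enumerate(exps):
            probs[lo + offset] = x / denom
    return probs


def spmm(row_ptr: Sequence[int], col_idx: Sequence[int], vals: Sequence[float], V: Matrix) -> List[List[float]]:
    """out[i] = sum over stored (i, j) of vals[e] * V[j]."""
    width = len(V[0]) if V else 0
    out = [[0.0] * width for _ in range(len(row_ptr) - 1)]
    for i in range(len(row_ptr) - 1):
        for e in range(row_ptr[i], row_ptr[i + 1]):
            j, w = col_idx[e], vals[e]
            for f in range(width):
                out[i][f] += w * V[j][f]
    return out


def csr_attention(row_ptr, col_idx, Q, K, V):
    """Compose the three stages: SDDMM -> row-softmax -> SpMM."""
    scores = sddmm(row_ptr, col_idx, Q, K)
    probs = row_softmax(row_ptr, scores)
    return spmm(row_ptr, col_idx, probs, V)
```

Each stage walks the same `row_ptr`/`col_idx` arrays, which is why per-row degree skew and feature width affect all three kernels.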

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Cloud Computing and Resource Management