AutoSAGE: Input-Aware CUDA Scheduling for Sparse GNN Aggregation (SpMM/SDDMM) and CSR Attention
Aleksandar Stankovic

TL;DR
AutoSAGE is an input-aware CUDA scheduler for sparse GNN aggregation that selects tiling and mapping strategies per input. It matches vendor kernels at bandwidth-bound feature widths, finds gains at small feature widths, and reaches up to 4.7x kernel-level speedups on synthetic stress tests.
Contribution
The paper introduces AutoSAGE, a lightweight, input-aware CUDA scheduling framework for sparse GNN operations (SpMM, SDDMM, and CSR attention) that chooses tiling and mapping per input, with a guardrail that falls back to vendor kernels.
Findings
Matches vendor baselines at bandwidth-bound feature widths on Reddit and OGBN-Products, with gains at small widths
Achieves up to 4.7x kernel-level speedups on synthetic sparsity and skew stress tests
Provides a reproducible framework with CUDA sources and Python bindings
Abstract
Sparse GNN aggregations (CSR SpMM/SDDMM) vary widely in performance with degree skew, feature width, and GPU micro-architecture. We present AutoSAGE, an input-aware CUDA scheduler that chooses tiling and mapping per input using a lightweight estimate refined by on-device micro-probes, with a guardrail that safely falls back to vendor kernels and a persistent cache for deterministic replay. AutoSAGE covers SpMM and SDDMM and composes into a CSR attention pipeline (SDDMM -> row-softmax -> SpMM). On Reddit and OGBN-Products, it matches vendor baselines at bandwidth-bound feature widths and finds gains at small widths; on synthetic sparsity and skew stress tests it achieves up to 4.7x kernel-level speedups. We release CUDA sources, Python bindings, a reproducible harness, and replayable cache logs.
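To make the CSR attention pipeline in the abstract concrete (SDDMM -> row-softmax -> SpMM), here is a minimal pure-Python reference sketch. It is not the AutoSAGE CUDA implementation: the function names, the list-based CSR layout (`indptr`, `indices`), and the absence of any score scaling are assumptions made for illustration, and the loops only show what each stage computes, not how the kernels are tiled or mapped.

```python
import math

def sddmm(indptr, indices, Q, K):
    """Sampled dense-dense matmul: one dot product per stored CSR entry."""
    scores = []
    for i in range(len(indptr) - 1):
        for e in range(indptr[i], indptr[i + 1]):
            j = indices[e]
            scores.append(sum(q * k for q, k in zip(Q[i], K[j])))
    return scores

def row_softmax(indptr, scores):
    """Numerically stable softmax over the stored entries of each row."""
    out = list(scores)
    for i in range(len(indptr) - 1):
        lo, hi = indptr[i], indptr[i + 1]
        if lo == hi:
            continue  # empty row: nothing to normalize
        m = max(scores[lo:hi])
        exps = [math.exp(s - m) for s in scores[lo:hi]]
        z = sum(exps)
        out[lo:hi] = [x / z for x in exps]
    return out

def spmm(indptr, indices, vals, V):
    """Sparse-dense matmul: weighted sum of neighbor feature rows."""
    width = len(V[0])
    out = [[0.0] * width for _ in range(len(indptr) - 1)]
    for i in range(len(indptr) - 1):
        for e in range(indptr[i], indptr[i + 1]):
            j = indices[e]
            for f in range(width):
                out[i][f] += vals[e] * V[j][f]
    return out

def csr_attention(indptr, indices, Q, K, V):
    """Compose the three stages in the order given in the abstract."""
    scores = sddmm(indptr, indices, Q, K)
    weights = row_softmax(indptr, scores)
    return spmm(indptr, indices, weights, V)

# Tiny hypothetical graph: row 0 has neighbors {0, 1}, row 1 has {2}, row 2 is empty.
indptr = [0, 2, 3, 3]
indices = [0, 1, 2]
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
out = csr_attention(indptr, indices, Q, K, V)
```

The inner loop over `width` in `spmm` is the feature-width dimension the abstract refers to. Row lengths (`indptr[i + 1] - indptr[i]`) capture the degree skew that an input-aware scheduler would have to account for when choosing a mapping.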
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Cloud Computing and Resource Management
