TL;DR
Fused3S introduces a fused algorithm for the three sparse matrix operations that make up sparse attention. It accelerates this computation on GPUs by maximizing tensor core utilization and reducing data movement, which benefits graph neural network models such as Graph Transformers.
Contribution
It is the first fused 3S algorithm to jointly maximize tensor core utilization and minimize data movement across the three sparse matrix operations, achieving substantial speedups over previous methods on modern GPUs.
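To make the idea of fusion concrete, here is a minimal CPU sketch, not the Fused3S GPU kernel. It assumes the sparsity mask is stored in CSR form (`indptr`, `indices`) and that the three operations are score computation (SDDMM), row-wise softmax, and weighted aggregation (SpMM). Each row is processed in one pass using the online-softmax recurrence popularized by FlashAttention, so the intermediate score and probability matrices are never written out. The function name `fused_sparse_attention` and the use of online softmax are our illustrative choices. The usual 1/sqrt(d) score scaling is omitted, and nothing here models tensor cores.

```python
import math


def fused_sparse_attention(indptr, indices, Q, K, V):
    """Fused SDDMM -> softmax -> SpMM, one pass per CSR row (illustrative sketch).

    No score or probability arrays are stored; each row keeps only a running
    max, a running softmax denominator, and an unnormalized output accumulator.
    """
    d = len(V[0])
    out = []
    for i in range(len(indptr) - 1):
        m = -math.inf       # running max of the scores seen so far in row i
        l = 0.0             # running sum of exp(score - m)
        acc = [0.0] * d     # running sum of exp(score - m) * V[j]
        for p in range(indptr[i], indptr[i + 1]):
            j = indices[p]
            s = sum(q * k for q, k in zip(Q[i], K[j]))  # one SDDMM entry
            m_new = max(m, s)
            scale = math.exp(m - m_new)  # rescales old sums; exp(-inf) is 0.0
            w = math.exp(s - m_new)      # unnormalized softmax weight
            l = l * scale + w
            acc = [a * scale + w * v for a, v in zip(acc, V[j])]
            m = m_new
        # Normalize once at the end; an empty row yields a zero vector.
        out.append([a / l for a in acc] if l > 0.0 else acc)
    return out
```

Because each row's result depends only on its own nonzeros, rows can be assigned to independent workers. The rescaling step keeps the result identical regardless of the order in which a row's nonzeros are visited.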
Findings
Achieves up to 16.3x speedup on H100 GPUs.
Accelerates Graph Transformer inference by up to 5.36x.
Outperforms existing sparse operation methods across multiple datasets and GPU architectures.
Abstract
Sparse attention is a core building block in many leading neural network models, from graph-structured learning to sparse sequence modeling. It can be decomposed into a sequence of three sparse matrix operations (3S): sampled dense-dense matrix multiplication (SDDMM), softmax normalization, and sparse matrix multiplication (SpMM). Efficiently executing the 3S computational pattern on modern GPUs remains challenging due to (a) the mismatch between unstructured sparsity and tensor cores optimized for dense operations, and (b) the high cost of data movement. Previous works have optimized these sparse operations individually or addressed one of these challenges. This paper introduces Fused3S, the first fused 3S algorithm that jointly maximizes tensor core utilization and minimizes data movement. Across real-world graph datasets, Fused3S achieves and speedup…
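The 3S decomposition described in the abstract can be written as three separate passes. The pure-Python reference below is a sketch under our own assumptions: the sparsity pattern is stored in CSR form (`indptr`, `indices`), Q, K, and V are lists of row vectors, and the 1/sqrt(d) scaling is omitted. All function names are hypothetical. The point of the sketch is that the unfused pipeline materializes two intermediate arrays, the raw scores `S` and the normalized weights `P`, which on a GPU means extra round trips to memory between kernels.

```python
import math


def sddmm(indptr, indices, Q, K):
    """SDDMM: compute dot(Q[i], K[j]) only for the stored (i, j) of the CSR mask."""
    vals = []
    for i in range(len(indptr) - 1):
        for p in range(indptr[i], indptr[i + 1]):
            vals.append(sum(q * k for q, k in zip(Q[i], K[indices[p]])))
    return vals


def sparse_row_softmax(indptr, vals):
    """Softmax over the stored entries of each row (max-subtracted for stability)."""
    out = [0.0] * len(vals)
    for i in range(len(indptr) - 1):
        lo, hi = indptr[i], indptr[i + 1]
        if lo == hi:
            continue  # empty row: nothing to normalize
        m = max(vals[lo:hi])
        exps = [math.exp(v - m) for v in vals[lo:hi]]
        total = sum(exps)
        for p, e in zip(range(lo, hi), exps):
            out[p] = e / total
    return out


def spmm(indptr, indices, vals, V):
    """SpMM: O[i] = sum over stored j of P[i, j] * V[j]."""
    d = len(V[0])
    out = []
    for i in range(len(indptr) - 1):
        row = [0.0] * d
        for p in range(indptr[i], indptr[i + 1]):
            w, vj = vals[p], V[indices[p]]
            for c in range(d):
                row[c] += w * vj[c]
        out.append(row)
    return out


def sparse_attention_3s(indptr, indices, Q, K, V):
    S = sddmm(indptr, indices, Q, K)     # intermediate 1: raw scores
    P = sparse_row_softmax(indptr, S)    # intermediate 2: normalized weights
    return spmm(indptr, indices, P, V)   # final output, one row per query
```

For example, with `indptr = [0, 2, 3, 5]` and `indices = [0, 1, 1, 0, 2]`, row 1 attends only to node 1, so its output equals `V[1]` exactly.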
Taxonomy
Methods: Attention Is All You Need · Laplacian EigenMap · Linear Layer · Laplacian Positional Encodings · Multi-Head Attention · Dense Connections · Graph Transformer · Dropout · Layer Normalization · Position-Wise Feed-Forward Layer
