SparStencil: Retargeting Sparse Tensor Cores to Scientific Stencil Computations via Structured Sparsity Transformation
Qi Li, Kun Li, Haozhi Han, Liang Yuan, Junshi Chen, Yunquan Zhang, Yifeng Chen, Hong An, Ting Cao, Mao Yang

TL;DR
SparStencil is a system that adapts sparse tensor cores for scientific stencil computations by transforming irregular sparsity patterns into structured formats, enabling significant performance improvements.
Contribution
It introduces a novel structured sparsity transformation and kernel generation approach to effectively utilize sparse tensor cores for scientific workloads.
Findings
Achieves up to 7.1x speedup over state-of-the-art frameworks.
Reduces code complexity while maintaining or exceeding expert-tuned performance.
Successfully applies to 79 diverse scientific stencil kernels.
Abstract
Sparse Tensor Cores offer exceptional performance gains for AI workloads by exploiting structured 2:4 sparsity. However, their potential remains untapped for core scientific workloads such as stencil computations, which exhibit irregular sparsity patterns. This paper presents SparStencil, the first system to retarget sparse TCUs for scientific stencil computations through structured sparsity transformation. SparStencil introduces three key techniques: (1) Adaptive Layout Morphing, which restructures stencil patterns into staircase-aligned sparse matrices via a flatten-and-crush pipeline; (2) Structured Sparsity Conversion, which formulates transformation as a graph matching problem to ensure compatibility with 2:4 sparsity constraints; (3) Automatic Kernel Generation, which compiles transformed stencils into optimized sparse MMA kernels via layout search and table-driven memory mapping.…
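To make the 2:4 constraint concrete, the following is a minimal Python sketch, not taken from the paper. It assumes the usual reading of 2:4 structured sparsity (each aligned group of four consecutive entries holds at most two nonzeros) and uses a hypothetical 3-point 1D stencil written as a banded matrix. The helper names `satisfies_2_4` and `stencil_matrix` are illustrative only. The sketch shows why a stencil operand is "irregular" with respect to the constraint: some rows comply and others do not, depending on where the band falls relative to the group boundaries.

```python
def satisfies_2_4(row):
    """Return True if every aligned group of 4 entries has at most 2 nonzeros."""
    return all(
        sum(1 for v in row[i:i + 4] if v != 0) <= 2
        for i in range(0, len(row), 4)
    )


def stencil_matrix(weights, n_out):
    """Banded matrix whose row i applies a 1D stencil at output position i."""
    width = n_out + len(weights) - 1
    rows = []
    for i in range(n_out):
        row = [0.0] * width
        # Place the stencil weights on the band starting at column i.
        row[i:i + len(weights)] = weights
        rows.append(row)
    return rows


# Hypothetical 3-point stencil; 6 outputs give a 6 x 8 operand matrix.
A = stencil_matrix([0.25, 0.5, 0.25], n_out=6)

# Rows whose three consecutive nonzeros land inside one aligned group of 4
# break the constraint; rows that straddle a group boundary do not.
violations = [i for i, row in enumerate(A) if not satisfies_2_4(row)]
print(violations)
```

In this toy case only the rows whose band straddles a group boundary pass the check, so the matrix cannot be handed to a 2:4 sparse MMA as is. The abstract's Structured Sparsity Conversion step addresses this mismatch; the sketch above only detects it and does not reproduce that transformation.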
Taxonomy
Topics: Graph Theory and Algorithms · Parallel Computing and Optimization Techniques · Tensor decomposition and applications
