SPIDER: Unleashing Sparse Tensor Cores for Stencil Computation via Strided Swapping
Qiqi GU, Chenpeng Wu, Heng Shi, Jianguo Yao

TL;DR
SPIDER transforms stencil computations so that they can run on Sparse Tensor Cores. It turns the sparsity introduced by the stencil-to-matrix-multiplication transformation into an optimization opportunity and outperforms existing Tensor Core methods.
Contribution
It introduces a rule-based transformation that combines an ahead-of-time strided swapping of kernel matrices with on-the-fly row swapping of inputs, so that stencil computations can use Sparse Tensor Cores efficiently.
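Sparse Tensor Cores on NVIDIA Ampere and later GPUs accelerate matrices with 2:4 structured sparsity, meaning at most two nonzeros in every group of four consecutive elements. The swapping idea relies on the identity A B = (A P)(Pᵀ B), which holds for any permutation matrix P. Reordering the columns of the kernel matrix ahead of time can therefore be compensated by reordering the rows of the input. The sketch below shows this on a toy 1D three-point stencil. The stride-2 column order in `strided_permutation` is a hypothetical stand-in rather than SPIDER's actual swapping rule, and all function names are illustrative.

```python
# Illustrative sketch (not SPIDER's actual code): a column permutation of the
# kernel matrix, undone by the matching row permutation of the input, can turn
# a banded stencil matrix into one that satisfies 2:4 structured sparsity.

def banded_kernel_matrix(w, n):
    """Kernel matrix A with A[i][i + k] = w[k], so (A @ x)[i] = sum_k w[k] * x[i + k]."""
    m = n - len(w) + 1
    A = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for k, wk in enumerate(w):
            A[i][i + k] = wk
    return A


def is_2_4_sparse(A):
    """True if each row has at most 2 nonzeros in every aligned group of 4 columns."""
    for row in A:
        for g in range(0, len(row), 4):
            if sum(1 for v in row[g:g + 4] if v != 0) > 2:
                return False
    return True


def strided_permutation(n, stride):
    """Hypothetical column order: 0, s, 2s, ..., then 1, 1 + s, ..., and so on."""
    return [j for start in range(stride) for j in range(start, n, stride)]


def permute_columns(A, perm):
    """Ahead of time: reorder kernel-matrix columns (computes A P)."""
    return [[row[j] for j in perm] for row in A]


def permute_rows(B, perm):
    """On the fly: reorder input rows the same way (computes P^T B)."""
    return [B[j] for j in perm]


def matmul(A, B):
    """Plain dense matrix product, used only to check the identity."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


if __name__ == "__main__":
    A = banded_kernel_matrix([1.0, -2.0, 1.0], 8)        # 6 x 8, three nonzeros per row
    B = [[float(i * i * (j + 1)) for j in range(2)] for i in range(8)]  # 8 x 2 input
    perm = strided_permutation(8, 2)                      # [0, 2, 4, 6, 1, 3, 5, 7]
    A_p = permute_columns(A, perm)
    print(is_2_4_sparse(A), is_2_4_sparse(A_p))           # False True
    print(matmul(A_p, permute_rows(B, perm)) == matmul(A, B))  # True
```

In the original banded layout, row 0 has three nonzeros in its first group of four, which violates 2:4. After the stride-2 reordering, every group holds at most two nonzeros, and the product is unchanged because the input rows are reordered to match.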
Findings
SPIDER achieves a 6.20× speedup over cuDNN.
SPIDER is 2.00× faster than state-of-the-art Tensor Core approaches.
The method introduces minimal compile-time overhead and no runtime overhead.
Abstract
Recent research has focused on accelerating stencil computations by exploiting emerging hardware like Tensor Cores. To leverage these accelerators, the stencil operation must be transformed into matrix multiplications. However, this transformation introduces undesired sparsity into the kernel matrix, leading to significant redundant computation. In this paper, we present SPIDER, the first system to turn this unresolved sparsity into an optimization opportunity by exploring the potential of Sparse Tensor Cores (SpTCs) for stencil acceleration. Specifically, SPIDER introduces an efficient and elegant transformation method that integrates two cooperative techniques: an ahead-of-time strided swapping transformation for kernel matrices and an on-the-fly row-swapping mechanism for inputs. This rule-based approach effectively transforms stencil computation into operations compatible with…
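To see where the redundant computation comes from, consider the simplest way to express a stencil as a matrix product: a banded, Toeplitz-like kernel matrix multiplied by the input. The sketch below assumes a toy 1D three-point stencil and uses illustrative function names. It is not the matrix layout used by SPIDER or by the prior Tensor Core systems it compares against.

```python
# Illustrative sketch: a 1D three-point stencil rewritten as a matrix-vector
# product. Names and the toy stencil are assumptions for illustration only.

def stencil_direct(x, w):
    """Direct stencil: y[i] = sum_k w[k] * x[i + k] over the valid region."""
    r = len(w)
    return [sum(w[k] * x[i + k] for k in range(r)) for i in range(len(x) - r + 1)]


def stencil_as_matrix(w, n):
    """Banded kernel matrix A of shape (n - r + 1) x n with A[i][i + k] = w[k]."""
    r = len(w)
    A = [[0.0] * n for _ in range(n - r + 1)]
    for i in range(n - r + 1):
        for k in range(r):
            A[i][i + k] = w[k]
    return A


def matvec(A, x):
    """Dense product: every zero in A still costs a multiply-add here."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def density(A):
    """Fraction of nonzero entries in A."""
    nnz = sum(1 for row in A for v in row if v != 0)
    return nnz / sum(len(row) for row in A)


x = [float(i * i) for i in range(8)]
w = [1.0, -2.0, 1.0]
A = stencil_as_matrix(w, len(x))
print(stencil_direct(x, w))                   # [2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
print(matvec(A, x) == stencil_direct(x, w))   # True
print(density(A))                             # 0.375
```

Only 3 of the 8 entries in each row are nonzero, so a dense Tensor Core would spend most of its multiply-adds on zeros. The density is below the 50% that 2:4 structured sparsity retains, which makes the matrix a candidate for Sparse Tensor Cores. However, its nonzeros are not yet arranged in a 2:4 pattern.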
