S4: a High-sparsity, High-performance AI Accelerator
Ian En-Hsu Yen, Zhibin Xiao, Dongkuan Xu

TL;DR
This paper introduces S4, a commercial AI accelerator that leverages high sparsity to significantly improve inference speed and efficiency, outperforming mainstream platforms like Nvidia T4.
Contribution
The paper presents S4, described by the authors as the first commercial hardware platform to support high-degree sparsity acceleration of up to 32 times, enabling practical speedups and improved accuracy with larger sparse models.
Findings
Combined with state-of-the-art sparse pruning techniques, S4 achieves a several-times inference speedup over the Nvidia T4.
Sparse models on S4 can outperform smaller dense models in accuracy and throughput.
High-degree sparsity acceleration is feasible in commercial hardware.
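To make the "up to 32 times" figure concrete, here is a minimal sketch of the arithmetic behind sparsity acceleration. It is an illustration under assumptions, not the paper's method or S4's hardware behavior. Simple magnitude pruning stands in for the pruning techniques the authors use, which this summary does not specify. The helper names `magnitude_prune` and `sparse_dot` are hypothetical. The computed ratio is an idealized upper bound on saved multiply-accumulates, not a measured speedup.

```python
import random


def magnitude_prune(weights, sparsity_factor):
    """Keep the largest-magnitude 1/sparsity_factor of the weights; zero the rest.

    Assumption: plain magnitude pruning, chosen only for illustration.
    """
    keep = max(1, len(weights) // sparsity_factor)
    # Indices of the `keep` weights with the largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    top = set(order[:keep])
    return [w if i in top else 0.0 for i, w in enumerate(weights)]


def sparse_dot(weights, activations):
    """Dot product that skips zero weights.

    Returns the result and the number of multiply-accumulates performed.
    """
    total, macs = 0.0, 0
    for w, x in zip(weights, activations):
        if w != 0.0:
            total += w * x
            macs += 1
    return total, macs


random.seed(0)
dense = [random.gauss(0.0, 1.0) for _ in range(1024)]
acts = [random.gauss(0.0, 1.0) for _ in range(1024)]

pruned = magnitude_prune(dense, 32)          # 32x sparsity: keep 1 weight in 32
value, macs = sparse_dot(pruned, acts)
nonzero = sum(1 for w in pruned if w != 0.0)
ideal_reduction = len(dense) / macs          # idealized compute reduction

print(f"nonzero weights: {nonzero} of {len(dense)}")
print(f"multiply-accumulates: {macs} (ideal reduction {ideal_reduction:.0f}x)")
```

The same accounting suggests why a larger sparse model can compete with a smaller dense one: a model with 32 times more parameters, pruned to 32 times sparsity, stores and multiplies roughly as many nonzero weights as the dense baseline. Whether that translates into real throughput depends on hardware support for skipping zeros, which is the gap S4 is said to address.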
Abstract
Exploiting the sparsity underlying neural networks has become one of the most promising approaches to reducing the memory footprint, I/O cost, and computation workload during inference. The degree of sparsity one can exploit has grown as larger models have been considered, following the trend of pre-training giant models. On the other hand, compared with quantization, which is a widely supported option, acceleration through high-degree sparsity is not supported on most computing platforms. In this work, we introduce S4, the first commercial hardware platform supporting high-degree sparsity acceleration of up to 32 times. Combined with state-of-the-art sparse pruning techniques, we demonstrate a several-times practical inference speedup on S4 over mainstream inference platforms such as the Nvidia T4. We also show that in practice a sparse model of larger size can achieve both…
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Pruning
