SWIS -- Shared Weight bIt Sparsity for Efficient Neural Network Acceleration
Shurui Li, Wojciech Romaszkan, Alexander Graening, Puneet Gupta

TL;DR
SWIS introduces a quantization framework built on shared weight bit sparsity. It improves the accuracy of low-bit quantized networks, and its dedicated accelerator delivers faster inference and better storage compression than prior bit-serial designs.
Contribution
The paper proposes SWIS, a quantization method that exploits bit sparsity shared across groups of weights. It pairs this with an offline weight decomposition and scheduling algorithm that accelerates neural network inference.
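The following is a minimal sketch of the "shared bit sparsity" idea, contrasted with the weight-truncation baseline. It rests on several assumptions that are not taken from the paper's exact algorithm. Weights are stored as 8-bit sign-magnitude values, with only magnitudes shown. A group of weights shares one set of `keep` bit positions, and each weight may use any subset of those positions. Truncation is assumed to keep the `keep` most significant bits. The helper names `truncate`, `best_fit` and `shared_positions` are hypothetical. The position search is exhaustive and the objective is squared error, chosen for clarity rather than efficiency.

```python
from itertools import combinations

BITS = 8  # assumed magnitude width of the unquantized (8-bit) weights


def truncate(mag: int, keep: int) -> int:
    """Baseline (assumed): keep only the `keep` most significant bit positions."""
    drop = BITS - keep
    return (mag >> drop) << drop


def best_fit(mag: int, positions: tuple) -> int:
    """Closest value to `mag` expressible with some subset of `positions`."""
    best = 0
    for r in range(len(positions) + 1):
        for subset in combinations(positions, r):
            val = sum(1 << p for p in subset)
            if abs(val - mag) < abs(best - mag):
                best = val
    return best


def shared_positions(group: list, keep: int) -> tuple:
    """Exhaustively pick `keep` bit positions, shared by the whole group,
    that minimize the group's total squared reconstruction error."""
    best_pos, best_vals, best_err = (), [], float("inf")
    for pos in combinations(range(BITS), keep):
        vals = [best_fit(m, pos) for m in group]
        err = sum((v - m) ** 2 for v, m in zip(vals, group))
        if err < best_err:
            best_pos, best_vals, best_err = pos, vals, err
    return best_pos, best_vals


group = [3, 5, 6, 7]  # small magnitudes, so all top bits are zero
print([truncate(m, 2) for m in group])  # truncation to 2 bits erases them
print(shared_positions(group, 2))       # shared low-order positions keep them
```

In this toy group, truncating to two bits maps every weight to 0. Letting the group share bit positions 1 and 2 instead reconstructs the weights as `[2, 4, 6, 6]`. This illustrates why choosing positions per group can preserve far more information than a fixed most-significant-bit cut. Because the search runs offline, its cost does not affect inference.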
Findings
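Achieves up to 54.3 percentage points higher accuracy than weight truncation when quantizing MobileNet-v2 to 4 bits post-training, and up to 19.8 percentage points higher at 2 bits with retraining.
Provides up to 6x speedup and 1.9x energy improvement over state-of-the-art bit-serial architectures.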
Enables effective quantization of MobileNet-v2 to 2-4 bits.
Abstract
Quantization is spearheading the increase in performance and efficiency of neural network computing systems and is making headway into commodity hardware. We present SWIS (Shared Weight bIt Sparsity), a quantization framework for efficient neural network inference acceleration. It delivers improved performance and storage compression through an offline weight decomposition and scheduling algorithm. Compared to weight truncation, SWIS achieves up to a 54.3 (19.8) percentage-point accuracy improvement when quantizing MobileNet-v2 to 4 (2) bits post-training (with retraining), showing the strength of leveraging shared bit sparsity in weights. The SWIS accelerator gives up to 6x speedup and 1.9x energy improvement over state-of-the-art bit-serial architectures.
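The abstract compares against bit-serial architectures. The sketch below shows why shared bit positions can reduce work in such a datapath. It assumes a simplified cost model of one cycle per processed weight bit position for the whole group. It also assumes sign-magnitude weights whose magnitudes use only the group's shared positions. The function `bit_serial_dot` and its arguments are hypothetical and are not the SWIS accelerator's actual dataflow. Under this model, a dense 8-bit bit-serial unit would spend 8 cycles, while the shared-position version spends only `len(positions)`.

```python
def bit_serial_dot(acts: list, mags: list, signs: list, positions: tuple) -> tuple:
    """Bit-serial dot product visiting only the group's shared bit positions.

    Returns (result, cycles); one cycle = one bit position for the whole group
    (an assumed, simplified cost model)."""
    acc = 0
    cycles = 0
    for p in positions:
        # Sum the signed activations whose weight has bit p set.
        partial = sum(s * a for a, m, s in zip(acts, mags, signs) if (m >> p) & 1)
        acc += partial << p  # weight the partial sum by the bit's significance
        cycles += 1
    return acc, cycles


acts = [1, 2, 3, 4]
mags = [2, 4, 6, 6]    # magnitudes built only from bit positions 1 and 2
signs = [1, -1, 1, 1]
result, cycles = bit_serial_dot(acts, mags, signs, (1, 2))
reference = sum(a * s * m for a, s, m in zip(acts, signs, mags))
print(result, reference, cycles)  # same dot product in 2 cycles instead of 8
```

Both paths compute the same dot product. The only difference is how many bit positions the serial loop visits, which is where this illustrative model attributes the speedup.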
