Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch
Aojun Zhou, Yukun Ma, Junnan Zhu, Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, Hongsheng Li

TL;DR
This paper introduces a novel N:M structured sparsity training method for neural networks that combines the benefits of unstructured and structured sparsity, achieving significant speed-ups on GPUs without performance loss.
Contribution
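The authors are the first to train N:M fine-grained structured sparse networks from scratch. They propose SR-STE (Sparse-Refined Straight-Through Estimator) to improve gradient approximation during sparse training, and introduce SAD (Sparse Architecture Divergence) to analyze how the sparse topology changes during training.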
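To make the N:M pattern concrete, here is a minimal NumPy sketch, not the authors' code. It assumes N:M sparsity means keeping the N largest-magnitude weights in every group of M consecutive weights, with 2:4 as the default. The function name nm_mask and the choice to group along the flattened last axis are illustrative assumptions.

```python
import numpy as np

def nm_mask(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Return a 0/1 mask that keeps the n largest-magnitude entries
    in each group of m consecutive weights (hypothetical helper)."""
    if weights.size % m != 0:
        raise ValueError("number of weights must be divisible by m")
    groups = weights.reshape(-1, m)
    # Column indices of the (m - n) smallest magnitudes in each group.
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    mask = np.ones_like(groups)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return mask.reshape(weights.shape)

# Example: eight weights form two groups of four; each group keeps two.
w = np.array([[0.1, -0.9, 0.4, 0.05, 0.7, -0.2, 0.3, -0.8]])
mask = nm_mask(w)    # 2:4 pattern
sparse_w = w * mask  # pruned entries become zero
```

Every group of four ends up with exactly two nonzero entries, so the sparsity is fine-grained (individual weights) yet regular enough for hardware to exploit.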
Findings
2x speed-up on Nvidia A100 GPUs with no performance drop
SR-STE outperforms vanilla STE in training sparse networks
SAD effectively measures topology changes during training
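The two ideas named in the findings can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's implementation. It assumes SR-STE adds a decay term, applied only to currently pruned weights, on top of the straight-through gradient, and that SAD counts how many mask entries differ between two training steps. The names sr_ste_step and sad, and the values of lr and lam, are hypothetical.

```python
import numpy as np

def sr_ste_step(w, grad, mask, lr=0.1, lam=0.5):
    """One SR-STE-style update of the dense weights (illustrative).

    grad is the gradient computed with the masked (sparse) weights and
    passed straight through to the dense weights, as in vanilla STE.
    The extra term lam * (1 - mask) * w shrinks only the pruned weights.
    """
    return w - lr * (grad + lam * (1.0 - mask) * w)

def sad(mask_a, mask_b):
    """Count the positions where two 0/1 masks disagree (assumed SAD reading)."""
    return int(np.sum(mask_a != mask_b))

# A kept weight (mask 1) and a pruned weight (mask 0), with zero gradient.
w0 = np.array([1.0, 2.0])
m0 = np.array([1.0, 0.0])
w1 = sr_ste_step(w0, grad=np.zeros(2), mask=m0)

# Two masks that differ in two positions.
changes = sad(np.array([1, 0, 1, 0]), np.array([1, 1, 0, 0]))
```

With zero gradient, only the pruned weight moves, and it moves toward zero. Under this reading, that makes it less likely to re-enter the mask at a later step, so fewer mask entries flip and the SAD count between steps stays lower.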
Abstract
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments. It can be generally categorized into unstructured fine-grained sparsity that zeroes out multiple individual weights distributed across the neural network, and structured coarse-grained sparsity which prunes blocks of sub-networks of a neural network. Fine-grained sparsity can achieve a high compression ratio but is not hardware friendly and hence receives limited speed gains. On the other hand, coarse-grained sparsity cannot concurrently achieve both apparent acceleration on modern GPUs and decent performance. In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network, which can maintain the advantages of both unstructured fine-grained sparsity and structured coarse-grained sparsity simultaneously on…
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
