Load-balanced Gather-scatter Patterns for Sparse Deep Neural Networks
Fei Sun, Minghai Qin, Tianyun Zhang, Xiaolong Ma, Haoran Li, Junwen Luo, Zihao Zhao, Yen-Kuang Chen, Yuan Xie

TL;DR
This paper introduces gather-scatter sparse patterns and a new pruning method to improve the efficiency of sparse deep neural networks, achieving near-unstructured accuracy with structured-like efficiency on modern hardware.
Contribution
It proposes novel gather-scatter sparse patterns and a pruning methodology that balance accuracy and computational efficiency on hardware with gather/scatter capabilities.
Findings
GS patterns improve accuracy-efficiency trade-offs.
Models with GS patterns run 2-3 times faster at similar accuracy.
Validated on machine translation, image recognition, and speech recognition.
Abstract
Deep neural networks (DNNs) have proven effective in solving many real-life problems, but their high computation cost prohibits those models from being deployed to edge devices. Pruning, a method that introduces zeros into model weights, has been shown to provide good trade-offs between model accuracy and computation efficiency, and is widely used to generate compressed models. However, the granularity of pruning involves important trade-offs. At the same sparsity level, a coarse-grained structured sparse pattern is more efficient on conventional hardware but results in worse accuracy, while a fine-grained unstructured sparse pattern can achieve better accuracy but is inefficient on existing hardware. On the other hand, some modern processors are equipped with fast on-chip scratchpad memories and gather/scatter engines that perform indirect load and…
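The granularity trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's gather-scatter pattern or its pruning method: whole-row pruning stands in for a coarse-grained structured pattern, per-weight magnitude pruning stands in for a fine-grained unstructured one, and every name here (`prune_unstructured`, `prune_rows`, `nonzeros`, `kept_magnitude`) is hypothetical. Retained weight magnitude is used only as a rough proxy for how much of the model survives, not as a measure of accuracy.

```python
import random


def prune_unstructured(w, sparsity):
    """Fine-grained: zero the individual weights with the smallest magnitude."""
    rows, cols = len(w), len(w[0])
    k = round(sparsity * rows * cols)  # number of weights to zero
    ranked = sorted((abs(w[i][j]), i, j) for i in range(rows) for j in range(cols))
    drop = {(i, j) for _, i, j in ranked[:k]}
    return [[0.0 if (i, j) in drop else w[i][j] for j in range(cols)]
            for i in range(rows)]


def prune_rows(w, sparsity):
    """Coarse-grained: zero whole rows with the smallest squared L2 norm."""
    k = round(sparsity * len(w))  # number of rows to zero
    ranked = sorted(range(len(w)), key=lambda i: sum(x * x for x in w[i]))
    drop = set(ranked[:k])
    return [[0.0] * len(row) if i in drop else list(row)
            for i, row in enumerate(w)]


def nonzeros(w):
    """Count the weights that survived pruning."""
    return sum(x != 0.0 for row in w for x in row)


def kept_magnitude(w):
    """Sum of absolute values of the surviving weights."""
    return sum(abs(x) for row in w for x in row)


random.seed(0)
w = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(8)]

fine = prune_unstructured(w, 0.5)   # zeros scattered irregularly across rows
coarse = prune_rows(w, 0.5)         # zeros grouped into whole rows

print("nonzeros:", nonzeros(fine), nonzeros(coarse))
print("kept magnitude:", kept_magnitude(fine), kept_magnitude(coarse))
```

At equal sparsity both variants keep the same number of weights, but the unstructured variant always retains at least as much total magnitude, because it is free to keep the largest weights wherever they sit. The row-pruned matrix is the easier one to execute efficiently, since entire rows can be skipped.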
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning
Methods: Pruning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
