Load-balanced Gather-scatter Patterns for Sparse Deep Neural Networks

Fei Sun; Minghai Qin; Tianyun Zhang; Xiaolong Ma; Haoran Li; Junwen Luo; Zihao Zhao; Yen-Kuang Chen; Yuan Xie

arXiv:2112.10898·cs.LG·December 22, 2021·1 cites

Open Access

TL;DR

This paper introduces gather-scatter sparse patterns and a new pruning method to improve the efficiency of sparse deep neural networks, achieving near-unstructured accuracy with structured-like efficiency on modern hardware.

Contribution

It proposes novel gather-scatter sparse patterns and a pruning methodology that balance accuracy and computational efficiency on hardware with gather/scatter capabilities.

Findings

01. GS patterns improve accuracy-efficiency trade-offs.

02. Models with GS patterns run 2-3 times faster at similar accuracy.

03. Validated on machine translation, image recognition, and speech recognition.

Abstract

Deep neural networks (DNNs) have proven effective in solving many real-life problems, but their high computation cost prohibits these models from being deployed to edge devices. Pruning, which introduces zeros into model weights, has been shown to provide good trade-offs between model accuracy and computation efficiency, and is widely used to generate compressed models. However, the granularity of pruning involves important trade-offs. At the same sparsity level, a coarse-grained structured sparse pattern is more efficient on conventional hardware but results in worse accuracy, while a fine-grained unstructured sparse pattern can achieve better accuracy but is inefficient on existing hardware. On the other hand, some modern processors are equipped with fast on-chip scratchpad memories and gather/scatter engines that perform indirect load and…
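To make the idea of indirect loads concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's exact GS pattern definition or pruning method: it assumes a simple balanced pattern that keeps the same number of largest-magnitude weights in every row, then computes a matrix-vector product by gathering only the needed input elements through an index list. The function names `prune_balanced` and `gather_matvec` are hypothetical.

```python
from typing import List, Tuple


def prune_balanced(
    weights: List[List[float]], keep: int
) -> Tuple[List[List[float]], List[List[int]]]:
    """Keep the `keep` largest-magnitude weights in every row.

    Every row ends up with the same number of nonzeros, so each row
    does the same amount of work (an assumed, simplified notion of
    load balance).
    """
    values, indices = [], []
    for row in weights:
        # Column indices ranked by weight magnitude, largest first.
        top = sorted(range(len(row)), key=lambda j: abs(row[j]), reverse=True)[:keep]
        top.sort()  # store the surviving columns in ascending order
        indices.append(top)
        values.append([row[j] for j in top])
    return values, indices


def gather_matvec(
    values: List[List[float]], indices: List[List[int]], x: List[float]
) -> List[float]:
    """Sparse matrix-vector product using indirect loads."""
    out = []
    for vals, idx in zip(values, indices):
        gathered = [x[j] for j in idx]  # gather: load x through an index list
        out.append(sum(v * g for v, g in zip(vals, gathered)))
    return out


weights = [
    [0.1, -2.0, 0.0, 3.0],
    [1.5, 0.2, -0.3, 0.0],
]
x = [1.0, 2.0, 3.0, 4.0]

values, indices = prune_balanced(weights, keep=2)
y = gather_matvec(values, indices, x)
print(indices)
print(y)
```

Because every row keeps the same number of weights, the inner loop has a fixed trip count, which is the property that makes this kind of pattern friendly to hardware that processes several rows in lockstep.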

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning

Methods: Pruning · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings