Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity
Matteo Grimaldi, Darshan C. Ganji, Ivan Lazarevich, Sudhakar Sah

TL;DR
This paper introduces a semi-structured activation sparsity method for deep neural networks that speeds up inference with minimal accuracy loss, making it suitable for embedded devices.
Contribution
The authors propose a semi-structured activation sparsity technique that needs only minor runtime modifications, together with a sparse training procedure that is aware of where the activations end up in the General Matrix Multiplication (GEMM).
Findings
Achieves a 1.25x speedup on ResNet18 with a 1.1% accuracy drop.
Effective across image classification and object detection models.
Enhances structured pruning methods for better latency-accuracy trade-offs.
Abstract
The demand for efficient processing of deep neural networks (DNNs) on embedded devices is a significant challenge limiting their deployment. Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency. It is known that unstructured sparsity results in lower accuracy degradation with respect to structured sparsity but the former needs extensive inference engine changes to get latency benefits. To tackle this challenge, we propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications. To attain high speedup levels at inference time, we design a sparse training procedure with awareness of the final position of the activations while computing the General Matrix Multiplication (GEMM). We extensively evaluate the proposed solution across various models for image classification and object detection…
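To make the GEMM argument in the abstract concrete, here is a minimal sketch of why a position-aware sparsity pattern can be exploited cheaply at runtime. It assumes, purely for illustration, that the zeros cover whole rows of the activation matrix; the abstract does not state the paper's exact pattern, and the names `mask_rows`, `sparse_gemm`, and `keep` are hypothetical.

```python
def dense_gemm(W, A):
    """Reference GEMM: W is M x K, A is K x N, result is M x N."""
    M, K, N = len(W), len(A), len(A[0])
    return [[sum(W[m][k] * A[k][n] for k in range(K)) for n in range(N)]
            for m in range(M)]


def mask_rows(A, keep):
    """Zero every row of A whose index is not in `keep` (assumed pattern)."""
    return [row[:] if k in keep else [0.0] * len(row)
            for k, row in enumerate(A)]


def sparse_gemm(W, A, keep):
    """GEMM that visits only the rows of A listed in `keep`.

    Rows outside `keep` are known to be zero, so their multiply-adds
    are skipped entirely instead of being computed and discarded.
    """
    M, N = len(W), len(A[0])
    out = [[0.0] * N for _ in range(M)]
    for k in sorted(keep):
        row = A[k]
        for m in range(M):
            w = W[m][k]
            for n in range(N):
                out[m][n] += w * row[n]
    return out


# Toy example: 2 x 4 weights, 4 x 2 activations, half the rows kept.
W = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]]
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
keep = {0, 2}

A_sparse = mask_rows(A, keep)
result = sparse_gemm(W, A, keep)
assert result == dense_gemm(W, A_sparse)
```

Because the set of zero rows is known before the multiplication starts, the loop shrinks along the shared dimension without any per-element zero checks. This is the kind of saving that an unstructured pattern cannot offer without deeper changes to the inference engine.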
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Pruning
