Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity

Matteo Grimaldi; Darshan C. Ganji; Ivan Lazarevich; Sudhakar Sah

arXiv:2309.06626·cs.CV·September 28, 2023·1 cites

TL;DR

This paper introduces a semi-structured activation sparsity method for deep neural networks. It delivers inference speedups with minimal accuracy loss and needs only minor runtime modifications, which makes it suitable for embedded devices.

Contribution

The authors propose a semi-structured activation sparsity technique together with a sparse training procedure that accounts for the final position of the activations in the General Matrix Multiplication (GEMM), improving inference speed with minimal accuracy degradation.

Findings

01  Achieves a 1.25x speedup on ResNet18 with a 1.1% accuracy drop.

02  Is effective across image classification and object detection models.

03  Enhances structured pruning methods, giving better latency-accuracy trade-offs.

Abstract

The demand for efficient processing of deep neural networks (DNNs) on embedded devices is a significant challenge limiting their deployment. Exploiting sparsity in the network's feature maps is one way to reduce inference latency. Unstructured sparsity is known to cause less accuracy degradation than structured sparsity, but it needs extensive inference engine changes to deliver latency benefits. To tackle this challenge, we propose a solution to induce semi-structured activation sparsity that is exploitable through minor runtime modifications. To attain high speedup levels at inference time, we design a sparse training procedure with awareness of the final position of the activations while computing the General Matrix Multiplication (GEMM). We extensively evaluate the proposed solution across various models for image classification and object detection…
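The abstract does not spell out the sparsity pattern, so the following is only an illustrative sketch of why knowing where zeroed activations land in the GEMM can reduce work. It assumes a convolution lowered to a GEMM in which each column of the activation matrix holds one output position, and it assumes the sparsity pattern zeroes whole columns. The names `matmul`, `column_sparse_gemm`, `kept_cols`, and `zeroed_cols` are hypothetical and are not taken from the paper.

```python
import random

def matmul(a, b):
    """Dense GEMM on nested lists: (m x k) times (k x n) gives (m x n)."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(inner)) for j in range(cols)]
            for row in a]

def column_sparse_gemm(weights, patches, kept_cols):
    """GEMM that multiplies only the columns listed in kept_cols.

    Assumption: every column not in kept_cols is entirely zero, so its
    output is zero and no multiplication is needed for it.
    """
    # Compact the activation matrix down to the surviving columns.
    compact = [[row[j] for j in kept_cols] for row in patches]
    small = matmul(weights, compact)
    # Scatter the smaller result back into a full-width output.
    out = [[0.0] * len(patches[0]) for _ in weights]
    for i, small_row in enumerate(small):
        for c, j in enumerate(kept_cols):
            out[i][j] = small_row[c]
    return out

random.seed(0)
out_channels, patch_size, positions = 4, 6, 8
weights = [[random.uniform(-1, 1) for _ in range(patch_size)]
           for _ in range(out_channels)]
patches = [[random.uniform(0, 1) for _ in range(positions)]
           for _ in range(patch_size)]

# Hypothetical mask: these output positions are zeroed across all rows.
zeroed_cols = {1, 3, 6}
kept_cols = [j for j in range(positions) if j not in zeroed_cols]
masked = [[0.0 if j in zeroed_cols else v for j, v in enumerate(row)]
          for row in patches]

dense = matmul(weights, masked)
sparse = column_sparse_gemm(weights, masked, kept_cols)
max_diff = max(abs(d - s) for dr, sr in zip(dense, sparse)
               for d, s in zip(dr, sr))
print("max difference:", max_diff)
print("columns multiplied:", len(kept_cols), "of", positions)
```

In this toy setup the compacted GEMM touches 5 of the 8 columns, so the multiply-accumulate count shrinks in the same proportion. Whether that translates into wall-clock speedup depends on the runtime and hardware, which this sketch does not model.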

Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Domain Adaptation and Few-Shot Learning

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Pruning