ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration

Ning-Chi Huang; Chi-Chih Chang; Wei-Cheng Lin; Endri Taka; Diana Marculescu; Kai-Chiang Wu

arXiv:2409.09708 · cs.CV · September 17, 2024

TL;DR

This paper introduces ELSA, a method for customizing layer-wise N:M sparsity configurations in vision transformers to optimize inference speed and memory use on specialized accelerators, with minimal accuracy loss.
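As a rough illustration of why N:M sparsity also reduces memory, the sketch below counts storage bits under a simple assumed format: each group of M weights keeps N values, and every kept value carries a ceil(log2 M)-bit index of its position in the group (similar in spirit to the metadata used by 2:4 sparse tensor cores). The storage format, the 16-bit value width, and the function name `nm_storage_bits` are assumptions for illustration; real accelerators may lay out metadata differently.

```python
import math


def nm_storage_bits(num_weights: int, n: int, m: int, value_bits: int = 16) -> int:
    """Bits needed to store a weight tensor under a simple, assumed N:M format."""
    if num_weights % m != 0:
        raise ValueError("num_weights must be a multiple of m")
    if not 1 <= n <= m:
        raise ValueError(f"invalid N:M config {n}:{m}")
    groups = num_weights // m
    if n == m:
        # Dense layer: every value is stored and no position metadata is needed.
        return groups * m * value_bits
    # Each kept value stores its payload plus a ceil(log2 M)-bit position index.
    index_bits = math.ceil(math.log2(m))
    return groups * n * (value_bits + index_bits)


# Hypothetical 768x768 projection matrix with 16-bit weights.
numel = 768 * 768
dense = nm_storage_bits(numel, 4, 4)
print(nm_storage_bits(numel, 2, 4) / dense)  # 2:4 keeps 56.25% of the dense bits
print(nm_storage_bits(numel, 1, 8) / dense)  # 1:8 keeps about 14.8%
```

Under this model the metadata overhead means 2:4 saves somewhat less than half of the memory, while sparser levels such as 1:8 save considerably more.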

Contribution

ELSA is the first approach to optimize layer-wise N:M sparsity for ViTs while accounting for both the sparsity levels an accelerator supports and the resulting throughput, enabling substantial FLOPs reduction with negligible accuracy impact.
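To make the layer-wise selection problem concrete, here is a toy brute-force sketch; it is not ELSA's algorithm. It assumes a hypothetical accelerator that supports 1:4, 2:4 and 4:4 (dense), hypothetical per-layer FLOPs, and hand-made per-layer accuracy-penalty scores, then picks one supported level per layer that fits a FLOPs budget with the smallest total penalty. The names `SUPPORTED` and `search_layerwise` and all numbers are invented for illustration.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

NM = Tuple[int, int]

# Hypothetical set of N:M levels a target accelerator might support (assumption).
SUPPORTED: List[NM] = [(1, 4), (2, 4), (4, 4)]


def search_layerwise(
    dense_flops: Sequence[int],
    penalty: Sequence[Dict[NM, float]],
    flops_budget: Fraction,
    supported: Sequence[NM] = SUPPORTED,
) -> Optional[Tuple[NM, ...]]:
    """Exhaustively choose one supported N:M level per layer so that the remaining
    FLOPs fit the budget and the summed (hypothetical) penalty is minimal."""
    if len(dense_flops) != len(penalty):
        raise ValueError("need one penalty table per layer")
    best: Optional[Tuple[NM, ...]] = None
    best_cost = float("inf")
    # K supported levels and L layers give K**L candidates: fine only for toy sizes.
    for choice in product(supported, repeat=len(dense_flops)):
        flops = sum(Fraction(f * n, m) for f, (n, m) in zip(dense_flops, choice))
        if flops > flops_budget:
            continue
        cost = sum(p[c] for p, c in zip(penalty, choice))
        if cost < best_cost:
            best, best_cost = choice, cost
    return best


# Three hypothetical layers of 100 FLOPs each; layer 0 is the most sensitive.
penalties = [
    {(1, 4): 5.0, (2, 4): 1.0, (4, 4): 0.0},
    {(1, 4): 0.5, (2, 4): 0.2, (4, 4): 0.0},
    {(1, 4): 0.4, (2, 4): 0.1, (4, 4): 0.0},
]
print(search_layerwise([100, 100, 100], penalties, Fraction(120)))
```

With a 120-FLOP budget the search keeps the sensitive first layer at 2:4 and pushes the other two to 1:4. Because the candidate count grows exponentially with depth, a practical method for a full ViT needs a much smarter search than this enumeration.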

Findings

01

Achieves up to 2.9× FLOPs reduction on both Swin-B and DeiT-B.

02

Incurs only marginal accuracy degradation on ImageNet.

03

Exploits accelerator support for mixing different N:M sparsity levels across layers.
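The arithmetic behind a layer-wise FLOPs-reduction figure can be sketched as follows, assuming each N:M layer executes only N/M of its dense multiply-accumulates and ignoring operations that cannot be pruned (for example softmax or activation-by-activation attention products). The per-layer FLOPs values and the function name `flops_reduction` are hypothetical and are not numbers from the paper.

```python
from fractions import Fraction
from typing import Sequence, Tuple


def flops_reduction(dense_flops: Sequence[int], configs: Sequence[Tuple[int, int]]) -> Fraction:
    """Ratio of dense FLOPs to N:M-sparse FLOPs over the prunable layers."""
    if len(dense_flops) != len(configs):
        raise ValueError("need one N:M config per layer")
    for n, m in configs:
        if not 1 <= n <= m:
            raise ValueError(f"invalid N:M config {n}:{m}")
    dense_total = sum(dense_flops)
    # Assumption: an N:M layer performs exactly N/M of its dense multiply-accumulates.
    sparse_total = sum(Fraction(f * n, m) for f, (n, m) in zip(dense_flops, configs))
    return Fraction(dense_total) / sparse_total


layers = [100, 100, 100, 100]  # hypothetical FLOPs per layer
print(flops_reduction(layers, [(2, 4)] * 4))                      # uniform 2:4
print(flops_reduction(layers, [(2, 4), (1, 4), (1, 8), (4, 4)]))  # mixed levels
```

A uniform 2:4 setting is capped at exactly 2×, while the mixed assignment gives 32/15 (about 2.13×) even though one layer stays dense; reductions beyond 2× therefore require some layers to use sparser levels such as 1:4 or 1:8.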

Abstract

$N : M$ sparsity is an emerging model compression method supported by a growing number of accelerators to speed up sparse matrix multiplication in deep neural networks. Most existing $N : M$ sparsity methods compress neural networks with a uniform setting for all layers in a network or heuristically determine the layer-wise configuration by considering the number of parameters in each layer. However, very few methods have been designed to obtain a layer-wise customized $N : M$ sparse configuration for vision transformers (ViTs), which usually consist of transformer blocks with the same number of parameters. In this work, to address the challenge of selecting a suitable sparse configuration for ViTs on $N : M$ sparsity-supporting accelerators, we propose ELSA, Exploiting Layer-wise $N : M$ Sparsity for ViTs. Considering not only all $N : M$ sparsity levels supported by a given…
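For readers new to the constraint itself: an $N : M$ pattern allows at most N nonzero weights in every group of M consecutive weights. The sketch below builds such a pattern with plain magnitude pruning (keep the N largest-magnitude weights per group). This is a generic illustration of the sparsity format, not ELSA's pruning or training procedure; the helper names `nm_prune_row` and `satisfies_nm` are hypothetical.

```python
from typing import List


def nm_prune_row(row: List[float], n: int, m: int) -> List[float]:
    """Zero all but the n largest-magnitude weights in each group of m consecutive weights."""
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    if not 0 <= n <= m:
        raise ValueError(f"invalid N:M config {n}:{m}")
    pruned = list(row)
    for start in range(0, len(row), m):
        group = range(start, start + m)
        # Sort positions by |w| (largest first) and keep the first n of them.
        keep = set(sorted(group, key=lambda i: abs(row[i]), reverse=True)[:n])
        for i in group:
            if i not in keep:
                pruned[i] = 0.0
    return pruned


def satisfies_nm(row: List[float], n: int, m: int) -> bool:
    """Check that every group of m consecutive weights has at most n nonzeros."""
    return all(
        sum(1 for w in row[s:s + m] if w != 0) <= n for s in range(0, len(row), m)
    )


weights = [0.1, -0.9, 0.3, 0.05, 0.7, -0.2, 0.0, 0.4]
sparse = nm_prune_row(weights, 2, 4)
print(sparse)  # [0.0, -0.9, 0.3, 0.0, 0.7, 0.0, 0.0, 0.4]
print(satisfies_nm(sparse, 2, 4))  # True
```

A uniform configuration applies the same (n, m) to every layer; a layer-wise configuration, as discussed above, lets each layer use a different level among those the accelerator supports.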

Code & Models

Repositories

ningchihuang/ELSA (PyTorch)

