ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration
Ning-Chi Huang, Chi-Chih Chang, Wei-Cheng Lin, Endri Taka, Diana Marculescu, Kai-Chiang Wu

TL;DR
This paper introduces ELSA, a method for customizing layer-wise N:M sparsity configurations in vision transformers to optimize inference speed and memory use on specialized accelerators, with minimal accuracy loss.
Contribution
ELSA is the first approach to optimize layer-wise N:M sparsity for ViTs, considering accelerator support and throughput, enabling significant FLOP reduction with negligible accuracy impact.
Findings
Achieves 2.9× FLOPs reduction on Swin-B and DeiT-B models.
Maintains high accuracy with minimal degradation on ImageNet.
Effectively leverages mixed sparsity support in accelerators.
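To make the FLOPs figure concrete, the sketch below estimates the overall reduction from a layer-wise N:M configuration. It is illustrative only: the function name `flops_reduction`, the per-layer FLOP counts, and the example configuration are hypothetical and not taken from the paper, and it assumes a layer pruned to N:M costs exactly N/M of its dense FLOPs.

```python
def flops_reduction(layer_flops, layer_configs):
    """Estimate the dense-to-sparse FLOPs ratio for a layer-wise N:M setting.

    layer_flops   : dense FLOPs of each sparsifiable layer (hypothetical values)
    layer_configs : one (n, m) pair per layer; n of every m weights are kept
    Assumption: a layer with n:m sparsity costs n/m of its dense FLOPs.
    """
    if len(layer_flops) != len(layer_configs):
        raise ValueError("need exactly one (n, m) pair per layer")
    dense = sum(layer_flops)
    sparse = sum(f * n / m for f, (n, m) in zip(layer_flops, layer_configs))
    return dense / sparse


# Hypothetical example: four equally sized layers with mixed 2:4 and 1:4 sparsity.
example_flops = [100, 100, 100, 100]
example_configs = [(2, 4), (1, 4), (1, 4), (2, 4)]
ratio = flops_reduction(example_flops, example_configs)
print(f"estimated FLOPs reduction: {ratio:.2f}x")
```

Under this idealized model a uniform 2:4 setting can never exceed a 2× reduction, so reaching a higher overall ratio requires giving some layers a sparser setting than 2:4.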
Abstract
N:M sparsity is an emerging model compression method supported by a growing number of accelerators to speed up sparse matrix multiplication in deep neural networks. Most existing N:M sparsity methods compress neural networks with a uniform setting for all layers in a network or heuristically determine the layer-wise configuration by considering the number of parameters in each layer. However, very few methods have been designed for obtaining a layer-wise customized N:M sparse configuration for vision transformers (ViTs), which usually consist of transformer blocks involving the same number of parameters. In this work, to address the challenge of selecting a suitable sparse configuration for ViTs on N:M sparsity-supporting accelerators, we propose ELSA, Exploiting Layer-wise N:M Sparsity for ViTs. Considering not only all N:M sparsity levels supported by a given…
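For readers unfamiliar with the N:M pattern the abstract refers to, here is a minimal NumPy sketch of magnitude-based N:M pruning: in every group of M consecutive weights, only the N with the largest magnitude are kept. This is a generic illustration, not ELSA's method; the function name `nm_prune_mask` and the random weight matrix are assumptions made for the example.

```python
import numpy as np


def nm_prune_mask(weights, n, m):
    """Return a boolean mask keeping the n largest-magnitude weights
    in every group of m consecutive entries along the last axis.

    Generic magnitude-based N:M pruning, shown for illustration only.
    """
    rows, cols = weights.shape
    if cols % m != 0:
        raise ValueError("number of columns must be divisible by m")
    if not 1 <= n <= m:
        raise ValueError("n must satisfy 1 <= n <= m")
    # Split each row into groups of m consecutive weights.
    groups = np.abs(weights).reshape(rows, cols // m, m)
    # argsort is ascending, so the last n indices are the largest magnitudes.
    keep = np.argsort(groups, axis=-1)[..., m - n:]
    mask = np.zeros(groups.shape, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=-1)
    return mask.reshape(rows, cols)


# Hypothetical 4x8 weight matrix pruned to 2:4 sparsity.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))
mask = nm_prune_mask(w, n=2, m=4)
sparse_w = w * mask  # exactly 2 nonzeros in every group of 4
```

A layer-wise scheme would call such a routine with a different `(n, m)` pair per layer, which is the configuration choice the abstract describes.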
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Infrared Target Detection Methodologies · Image Processing Techniques and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
