ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
Haoran You, Huihong Shi, Yipin Guo, Yingyan Celine Lin

TL;DR
This paper introduces ShiftAddViT, a novel approach that reparameterizes Vision Transformers with multiplication primitives like shifts and additions, significantly improving GPU inference speed and energy efficiency while maintaining accuracy.
Contribution
The paper proposes a reparameterization method that uses multiplication primitives and a mixture-of-experts (MoE) framework to reduce computation in pre-trained Vision Transformers without training from scratch.
Findings
Achieves up to 5.18× latency reduction on GPUs.
Reduces energy consumption by up to 42.9%.
Maintains comparable accuracy to original ViTs.
Abstract
Vision Transformers (ViTs) have shown impressive performance and have become a unified backbone for multiple vision tasks. However, both the attention mechanism and multi-layer perceptrons (MLPs) in ViTs are not sufficiently efficient due to dense multiplications, leading to costly training and inference. To this end, we propose to reparameterize pre-trained ViTs with a mixture of multiplication primitives, e.g., bitwise shifts and additions, towards a new type of multiplication-reduced model, dubbed ShiftAddViT, which aims to achieve end-to-end inference speedups on GPUs without requiring training from scratch. Specifically, all MatMuls among queries, keys, and values are reparameterized using additive kernels, after mapping queries and keys to binary codes in Hamming space. The remaining MLPs or linear layers are then reparameterized with shift kernels. We…
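The two primitives named in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and every function name below is hypothetical. It assumes integer (fixed-point) activations, weights rounded to a signed power of two for the shift case, and sign-based binarization to {-1, +1} codes for the additive case. The paper's actual kernels target GPUs and may differ in all of these details.

```python
import math

def to_power_of_two(w):
    """Round a nonzero weight to sign * 2**p, with p an integer (assumed scheme)."""
    sign = -1 if w < 0 else 1
    p = round(math.log2(abs(w)))
    return sign, p

def shift_mul(x, sign, p):
    """Multiply integer x by sign * 2**p using only a bit shift and a negation."""
    shifted = x << p if p >= 0 else x >> -p
    return -shifted if sign < 0 else shifted

def shift_linear(x, weights):
    """Linear layer y[j] = sum_i x[i] * W[j][i] with W replaced by powers of two."""
    out = []
    for row in weights:
        acc = 0
        for xi, w in zip(x, row):
            sign, p = to_power_of_two(w)
            acc += shift_mul(xi, sign, p)  # shift + add, no multiplication
        out.append(acc)
    return out

def binarize(v):
    """Map a real vector to a {-1, +1} code by sign (assumed binarization)."""
    return [1 if a >= 0 else -1 for a in v]

def additive_dot(code, v):
    """Dot product of a {-1, +1} code with v using only additions/subtractions."""
    acc = 0
    for c, a in zip(code, v):
        acc = acc + a if c > 0 else acc - a
    return acc

if __name__ == "__main__":
    # Weights that are already exact powers of two, so the result is exact.
    print(shift_linear([8, 4], [[2.0, -0.5], [0.25, 4.0]]))
    # A binary code applied to a value vector: 5 - 2 + 3.
    print(additive_dot(binarize([0.3, -1.2, 0.0]), [5, 2, 3]))
```

When a weight is not an exact power of two, the rounding in `to_power_of_two` introduces approximation error, which is one reason a reparameterized model would typically need fine-tuning to recover accuracy.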
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Advanced Image and Video Retrieval Techniques
