ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer

Haoran You; Huihong Shi; Yipin Guo; Yingyan Celine Lin

arXiv:2306.06446 · cs.LG · July 26, 2024 · 6 cites

TL;DR

This paper introduces ShiftAddViT, which reparameterizes pre-trained Vision Transformers with a mixture of multiplication primitives, such as bitwise shifts and additions, to speed up GPU inference and cut energy use while keeping accuracy comparable.

Contribution

It proposes a reparameterization method that combines multiplication primitives with a mixture-of-experts framework to reduce computation in Vision Transformers without training from scratch.

Findings

01. Achieves up to 5.18x latency reduction on GPUs.

02. Saves up to 42.9% energy consumption.

03. Maintains comparable accuracy to original ViTs.

Abstract

Vision Transformers (ViTs) have shown impressive performance and have become a unified backbone for multiple vision tasks. However, both the attention mechanism and multi-layer perceptrons (MLPs) in ViTs are not sufficiently efficient due to dense multiplications, leading to costly training and inference. To this end, we propose to reparameterize pre-trained ViTs with a mixture of multiplication primitives, e.g., bitwise shifts and additions, towards a new type of multiplication-reduced model, dubbed ShiftAddViT, which aims to achieve end-to-end inference speedups on GPUs without requiring training from scratch. Specifically, all MatMuls among queries, keys, and values are reparameterized using additive kernels, after mapping queries and keys to binary codes in Hamming space. The remaining MLPs or linear layers are then reparameterized with shift kernels. We…
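
The two reparameterizations named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the zero-threshold binarization, and the nearest-power-of-two rounding are assumptions made for illustration, and `np.ldexp` stands in for a hardware bit shift. The first part rewrites a linear layer so each weight is a signed power of two; the second shows that, once queries and keys are binary codes, their similarity follows from the Hamming distance and a product with a binary matrix reduces to selecting and adding rows.

```python
import numpy as np

def to_power_of_two(w):
    """Approximate each nonzero weight as sign * 2**exp (nearest exponent in log2 space)."""
    sign = np.sign(w).astype(np.int32)
    safe = np.where(w == 0, 1.0, np.abs(w))  # avoid log2(0); sign == 0 masks these later
    exp = np.round(np.log2(safe)).astype(np.int32)
    return sign, exp

def shift_linear(x, sign, exp):
    """y = x @ W.T for W = sign * 2**exp, using exponent shifts and additions only."""
    # np.ldexp(a, e) returns a * 2**e by adjusting the floating-point exponent,
    # which plays the role of a bitwise shift on integer hardware.
    shifted = np.ldexp(x[:, None, :], exp[None, :, :])  # shape (n, d_out, d_in)
    signed = np.where(sign[None] < 0, -shifted, shifted)
    signed = np.where(sign[None] == 0, 0.0, signed)
    return signed.sum(axis=-1)

def binarize(t):
    """Map real features to {0, 1} codes by thresholding at zero (assumed scheme)."""
    return (t > 0).astype(np.uint8)

def hamming_scores(qb, kb):
    """Dot product of the equivalent {-1, +1} codes: d - 2 * Hamming distance."""
    d = qb.shape[-1]
    dist = (qb[:, None, :] != kb[None, :, :]).sum(axis=-1)
    return d - 2 * dist  # the factor 2 is itself a one-bit shift

def additive_matmul(b, v):
    """b @ v for a {0, 1} matrix b: select rows of v and add them, no multiplies."""
    mask = b[:, :, None].astype(bool)
    return np.where(mask, v[None, :, :], 0.0).sum(axis=1)
```

When the weights are already exact powers of two, `shift_linear` reproduces the dense product exactly; for general weights the rounding introduces an approximation error, which is presumably why the paper starts from pre-trained ViTs rather than applying the substitution blindly.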

Code & Models

Repositories

gatech-eic/shiftaddvit (PyTorch, official)

Videos

ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer (SlidesLive)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Advanced Image and Video Retrieval Techniques