Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer

Huihong Shi, Haikuo Shao, Wendong Mao, and Zhongfeng Wang

arXiv:2405.03882·cs.CV·October 1, 2024

TL;DR

Trio-ViT pairs a tailored post-training quantization method with a dedicated hardware accelerator for Softmax-free efficient Vision Transformers, reaching up to 3.6x higher FPS than state-of-the-art ViT accelerators and easing deployment on embedded devices.

Contribution

The paper proposes a post-training quantization method tailored to Softmax-free efficient ViTs, together with a custom hardware accelerator, to address the accuracy drops and hardware costs that arise when quantizing standard ViTs.

Findings

01. Achieves up to 3.6x FPS improvement over state-of-the-art ViT accelerators.

02. Enhances DSP efficiency by up to 2.1x.

03. Demonstrates effective quantization accuracy with the proposed engine.

Abstract

Motivated by the huge success of Transformers in natural language processing (NLP), Vision Transformers (ViTs) have developed rapidly and achieved remarkable performance in various computer vision tasks. However, their huge model sizes and intensive computations hinder their deployment on embedded devices, calling for effective model compression methods such as quantization. Unfortunately, because of hardware-unfriendly and quantization-sensitive non-linear operations, particularly Softmax, it is non-trivial to quantize all operations in ViTs, and attempts to do so yield either significant accuracy drops or non-negligible hardware costs. In response to these challenges with standard ViTs, we turn our attention to the quantization and acceleration of efficient ViTs, which not only eliminate the troublesome Softmax but also integrate…
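
For readers unfamiliar with the term, the following is a minimal sketch of generic post-training quantization: a scale is calibrated from already-trained values, which are then rounded to 8-bit integers with no retraining. It is not the paper's method. The max-abs calibration rule, the symmetric per-tensor scheme, and the function names (`calibrate_scale`, `quantize`, `dequantize`) are all assumptions chosen for illustration.

```python
def calibrate_scale(values, num_bits=8):
    """Pick a symmetric per-tensor scale with a max-abs rule (illustrative assumption)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Map floats to signed integers in [-qmax, qmax] by scaling, rounding, and clamping."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in q_values]


# Hypothetical "trained" weights; no retraining happens at any point.
weights = [0.5, -1.27, 0.0, 0.64]
scale = calibrate_scale(weights)
q_weights = quantize(weights, scale)
approx = dequantize(q_weights, scale)

# The per-element rounding error is bounded by half the scale.
max_error = max(abs(a - w) for a, w in zip(approx, weights))
```

Linear layers tolerate this kind of uniform rounding comparatively well. As the abstract notes, non-linear operations such as Softmax are far more sensitive to it, which is the motivation for targeting Softmax-free models.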

Code & Models

Repositories

shihuihong214/trio-vit (PyTorch, official)

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Infrared Target Detection Methodologies · Image Processing Techniques and Applications

Methods: Softmax · Focus