Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT Systems

Jemin Lee; Yongin Kwon; Sihyeong Park; Misun Yu; Jeman Park; Hwanjun Song

arXiv:2303.12557·cs.CV·May 20, 2024·5 cites

TL;DR

This paper introduces Q-HyViT, a post-training quantization method designed for efficient hybrid vision transformers. It substantially improves the accuracy of the quantized models, making them better suited to IoT devices.

Contribution

The paper presents what it describes as the first successful application of post-training quantization to efficient hybrid vision transformers.

Findings

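01. Achieved a 17.73% average accuracy improvement at 8-bit quantization over existing post-training quantization methods.

02. Achieved a 29.75% average accuracy improvement at 6-bit quantization over existing post-training quantization methods.

03. Demonstrated effectiveness on multiple hybrid ViT architectures.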

Abstract

Recently, vision transformers (ViTs) have superseded convolutional neural networks in numerous applications, including classification, detection, and segmentation. However, the high computational requirements of ViTs hinder their widespread implementation. To address this issue, researchers have proposed efficient hybrid transformer architectures that combine convolutional and transformer layers with optimized attention computation of linear complexity. Additionally, post-training quantization has been proposed as a means of mitigating computational demands. For mobile devices, achieving optimal acceleration for ViTs necessitates the strategic integration of quantization techniques and efficient hybrid transformer structures. However, no prior investigation has applied quantization to efficient hybrid transformers. In this paper, we discover that applying existing post-training…
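To make the 8-bit and 6-bit settings concrete, the following is a minimal sketch of plain min-max uniform quantization applied after training. It is not the Q-HyViT method and is not taken from the paper or its repository; the function names (`quantize_uniform`, `dequantize`, `max_abs_error`) and the sample weights are hypothetical and chosen only for illustration.

```python
# Illustrative only: generic asymmetric min-max quantization, NOT Q-HyViT.
# All names and values below are assumptions made for this sketch.

def quantize_uniform(values, num_bits):
    """Map floats to integer codes in [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    # Step size between adjacent quantization levels.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    # Clamp to guard against floating-point overshoot at the range ends.
    codes = [min(max(c, 0), qmax) for c in codes]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Map integer codes back to approximate float values."""
    return [c * scale + lo for c in codes]


def max_abs_error(values, num_bits):
    """Largest round-trip error introduced by quantizing `values`."""
    codes, scale, lo = quantize_uniform(values, num_bits)
    restored = dequantize(codes, scale, lo)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical weight values, standing in for one tensor of a trained model.
weights = [-1.0, -0.35, 0.0, 0.12, 0.8, 1.0]
err8 = max_abs_error(weights, 8)
err6 = max_abs_error(weights, 6)
print(f"8-bit max error: {err8:.5f}")
print(f"6-bit max error: {err6:.5f}")
```

The round-trip error is bounded by half the step size, so dropping from 8 to 6 bits roughly quadruples the worst-case error for the same value range. This is one reason lower bit widths tend to be harder to quantize without accuracy loss.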

Code & Models

Repositories

https://gitlab.com/ones-ai/q-hyvit
PyTorch · Official

Taxonomy

Topics: CCD and CMOS Imaging Sensors · Infrared Target Detection Methodologies · Advanced Memory and Neural Computing

Methods: MobileViTv2