Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT Systems
Jemin Lee, Yongin Kwon, Sihyeong Park, Misun Yu, Jeman Park, Hwanjun Song

TL;DR
This paper introduces Q-HyViT, a post-training quantization method designed for efficient hybrid vision transformers. It substantially improves their accuracy after quantization, making them better suited to IoT devices.
Contribution
It is the first to successfully apply post-training quantization to efficient hybrid vision transformers, overcoming key challenges and enhancing performance.
Findings
Improved average accuracy by 17.73% at 8-bit quantization compared with existing post-training quantization methods.
Improved average accuracy by 29.75% at 6-bit quantization compared with existing post-training quantization methods.
Demonstrated effectiveness on multiple hybrid ViT architectures.
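To make the 8-bit and 6-bit settings above concrete, the following minimal sketch applies generic uniform min-max quantization to a list of values and compares the rounding error at both bit widths. It is not the Q-HyViT method: the function names (`quantize_dequantize`, `max_abs_error`) and the synthetic `weights` list are hypothetical and used only for illustration.

```python
def quantize_dequantize(values, num_bits):
    """Uniform min-max quantization followed by dequantization.

    Generic illustration only; this is NOT the Q-HyViT algorithm.
    """
    levels = 2 ** num_bits - 1           # highest integer code (255 for 8-bit, 63 for 6-bit)
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    out = []
    for v in values:
        q = round((v - lo) / scale)      # map the float to an integer code
        q = max(0, min(levels, q))       # clamp to the representable range
        out.append(lo + q * scale)       # map the code back to a float
    return out


def max_abs_error(values, num_bits):
    """Largest absolute difference between the original and dequantized values."""
    dequantized = quantize_dequantize(values, num_bits)
    return max(abs(a - b) for a, b in zip(values, dequantized))


# Hypothetical "weights": evenly spaced values in roughly [-0.5, 0.5).
weights = [i / 1000.0 - 0.5 for i in range(0, 1001, 7)]

err8 = max_abs_error(weights, 8)
err6 = max_abs_error(weights, 6)
print(f"max error at 8 bits: {err8:.6f}")
print(f"max error at 6 bits: {err6:.6f}")
```

With fewer bits there are fewer representable levels, so the step between levels grows and so does the worst-case rounding error. This is one generic reason lower bit widths tend to be harder to quantize without losing accuracy.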
Abstract
Recently, vision transformers (ViTs) have superseded convolutional neural networks in numerous applications, including classification, detection, and segmentation. However, the high computational requirements of ViTs hinder their widespread implementation. To address this issue, researchers have proposed efficient hybrid transformer architectures that combine convolutional and transformer layers with optimized attention computation of linear complexity. Additionally, post-training quantization has been proposed as a means of mitigating computational demands. For mobile devices, achieving optimal acceleration for ViTs necessitates the strategic integration of quantization techniques and efficient hybrid transformer structures. However, no prior investigation has applied quantization to efficient hybrid transformers. In this paper, we discover that applying existing post-training…
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Infrared Target Detection Methodologies · Advanced Memory and Neural Computing
Methods: MobileViTv2
