Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey
Dayou Du, Gu Gong, Xiaowen Chu

TL;DR
This survey reviews the recent advances in quantization and hardware acceleration techniques for Vision Transformers, emphasizing the importance of co-design to optimize performance on resource-limited devices.
Contribution
It provides a comprehensive analysis of ViT architecture, quantization methods, and hardware acceleration strategies, highlighting challenges and future directions in the field.
Findings
Quantization reduces ViT model size and computational demands.
Hardware-aware quantization techniques improve efficiency.
Open-source resources facilitate further research.
Abstract
Vision Transformers (ViTs) have recently garnered considerable attention, emerging as a promising alternative to convolutional neural networks (CNNs) in several vision-related applications. However, their large model sizes and high computational and memory demands hinder deployment, especially on resource-constrained devices. This underscores the necessity of algorithm-hardware co-design specific to ViTs, which aims to optimize performance by tailoring the algorithmic structure and the underlying hardware accelerator to each other's strengths. Model quantization converts high-precision values to lower-precision ones, reducing the computational and memory demands of ViTs and allowing hardware to be designed specifically for the quantized algorithms, which further boosts efficiency. This article provides a comprehensive survey of ViT quantization and its hardware…
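To make the abstract's description of quantization concrete, here is a minimal sketch of uniform 8-bit quantization applied to a single weight matrix. It is an illustration under stated assumptions, not a method taken from the survey: the function names `quantize_uniform` and `dequantize` are hypothetical, the weights are random stand-ins rather than trained ViT parameters, and the 768 x 768 shape is chosen only to match the hidden size of ViT-Base.

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Asymmetric uniform quantization of a float array (hypothetical helper)."""
    assert 1 <= num_bits <= 8, "this sketch stores results in uint8"
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    # Guard against a constant tensor, which would otherwise give a zero scale.
    scale = max(x_max - x_min, 1e-8) / (qmax - qmin)
    # Integer offset that maps x_min to (approximately) qmin.
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map the integers back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale

# Random stand-in for one ViT linear layer (assumption: not real model weights).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=(768, 768)).astype(np.float32)

q, scale, zero_point = quantize_uniform(weights)
recovered = dequantize(q, scale, zero_point)

max_err = float(np.abs(weights - recovered).max())
print("FP32 bytes:", weights.nbytes)
print("INT8 bytes:", q.nbytes)
print("max abs error:", max_err, "quantization step:", scale)
```

Storing each value in 8 bits instead of 32 cuts the memory for this matrix by a factor of four, at the cost of a rounding error of at most about half a quantization step per element. Hardware designed for the quantized format can then use integer arithmetic units in place of floating-point ones.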
Taxonomy
Topics: CCD and CMOS Imaging Sensors · Advanced Vision and Imaging · Image Processing Techniques and Applications
