GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers
Guang Liang, Xinyao Liu, Jianxin Wu

TL;DR
GPLQ is a novel quantization framework for Vision Transformers that significantly reduces training time and memory usage while maintaining high accuracy and generalization across various vision tasks.
Contribution
The paper introduces GPLQ, a practical and efficient quantization method that preserves model generalization and achieves competitive accuracy with minimal training overhead.
Findings
GPLQ is 100x faster than existing QAT methods.
GPLQ reduces the training memory footprint to below FP32 training levels.
GPLQ maintains high accuracy and generalization across multiple vision tasks.
Abstract
Vision Transformers (ViTs) are essential in computer vision but are also computationally intensive. Model quantization, particularly to low bit-widths such as 4-bit, aims to alleviate this cost, yet existing Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT) methods exhibit significant limitations. PTQ often incurs a substantial accuracy drop, while QAT achieves high accuracy but suffers from prohibitive computational costs, limited generalization to downstream tasks, training instability, and a lack of open-source codebases. To address these challenges, this paper introduces General, Practical, and Lightning Quantization (GPLQ), a novel framework designed for efficient and effective ViT quantization. GPLQ is founded on two key empirical insights: the paramount importance of activation quantization and the necessity of preserving the model's original optimization…
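To make "quantization to 4-bit" concrete, here is a minimal sketch of generic uniform quantize-dequantize ("fake quantization") in plain Python. It is not GPLQ's algorithm: the function name `quantize_dequantize`, the per-tensor max-abs scale, and round-to-nearest are illustrative assumptions. PTQ methods typically calibrate such scales on a small data sample after training, while QAT methods insert an operation like this into the forward pass during training, commonly passing gradients through it with a straight-through estimator.

```python
def quantize_dequantize(values, num_bits=4, signed=True):
    """Generic uniform fake quantization (illustrative sketch, not GPLQ).

    Maps floats onto a (2**num_bits)-level integer grid and back, using one
    per-tensor scale derived from the max absolute value (a simple min-max
    style calibration). Assumes `values` is a non-empty list of floats.
    Returns (dequantized floats, integer codes, scale).
    """
    if signed:
        # e.g. weights: 4 bits -> integer range [-8, 7]
        qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    else:
        # e.g. non-negative activations: 4 bits -> integer range [0, 15]
        qmin, qmax = 0, 2 ** num_bits - 1

    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        # All-zero tensor: any scale works; avoid division by zero.
        return [0.0 for _ in values], [0 for _ in values], 1.0

    scale = max_abs / qmax
    # Round to the nearest grid point (Python's round() breaks ties to even),
    # then clamp into the representable integer range.
    codes = [min(max(round(v / scale), qmin), qmax) for v in values]
    dequantized = [c * scale for c in codes]
    return dequantized, codes, scale


weights = [0.5, -1.2, 0.05, 3.5, -3.5]
dq, q, scale = quantize_dequantize(weights, num_bits=4, signed=True)
print(q)   # [1, -2, 0, 7, -7]
print(dq)  # [0.5, -1.0, 0.0, 3.5, -3.5]
```

With only 16 levels available, -1.2 snaps to -1.0 and 0.05 collapses to 0.0. Rounding error of this kind, applied to both weights and activations, is what low-bit PTQ and QAT methods must keep from degrading accuracy.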
