QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization
Xiuying Wei, Ruihao Gong, Yuhang Li, Xianglong Liu, Fengwei Yu

TL;DR
This paper introduces QDROP, a novel method for post-training quantization that randomly drops activation quantization, significantly improving accuracy in extremely low-bit neural network quantization, including 2-bit activation scenarios.
Contribution
The paper proposes QDROP, a simple yet effective technique that enhances PTQ by randomly dropping activation quantization, enabling 2-bit activation quantization with high accuracy.
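The idea of randomly dropping activation quantization can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the min/max uniform quantizer, the element-wise drop granularity, the default drop probability of 0.5, and the names `fake_quantize` and `qdrop_activation` are all assumptions chosen for illustration. During calibration each activation element is either left in full precision or replaced by its quantized value; at inference every element is quantized.

```python
import numpy as np

def fake_quantize(x, num_bits=2):
    """Illustrative uniform fake quantization: quantize, then dequantize.

    Uses the tensor's own min/max as the range (an assumption for this sketch).
    """
    levels = 2 ** num_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, levels)
    return q * scale + lo

def qdrop_activation(x, num_bits=2, drop_prob=0.5, rng=None, training=True):
    """Randomly drop activation quantization element-wise (hypothetical helper).

    With probability `drop_prob` an element keeps its full-precision value;
    otherwise it takes the fake-quantized value. When `training` is False,
    all elements are quantized.
    """
    x_q = fake_quantize(x, num_bits)
    if not training:
        return x_q
    rng = np.random.default_rng() if rng is None else rng
    drop = rng.random(x.shape) < drop_prob  # True -> quantization is dropped
    return np.where(drop, x, x_q)

# Usage: mix full-precision and 2-bit activations during calibration.
acts = np.linspace(-1.0, 1.0, 8)
mixed = qdrop_activation(acts, num_bits=2, rng=np.random.default_rng(0))
```

In a reconstruction-based PTQ loop, a function like this would be applied to the inputs of the block being calibrated, so the weight rounding is optimized against a mixture of quantized and unquantized activations.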
Findings
QDROP pushes PTQ to 2-bit activation for the first time.
QDROP achieves up to 51.49% accuracy improvement.
QDROP establishes new state-of-the-art results for PTQ.
Abstract
Recently, post-training quantization (PTQ) has drawn much attention as a way to produce efficient neural networks without lengthy retraining. Despite its low cost, current PTQ methods tend to fail under extremely low-bit settings. In this study, we confirm for the first time that properly incorporating activation quantization into the PTQ reconstruction benefits the final accuracy. To understand the underlying reason, a theoretical framework is established, indicating that the flatness of the optimized low-bit model on calibration and test data is crucial. Based on this conclusion, a simple yet effective approach dubbed QDROP is proposed, which randomly drops the quantization of activations during PTQ. Extensive experiments on various tasks including computer vision (image classification, object detection) and natural language processing (text classification and question answering) prove…
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · COVID-19 diagnosis using AI
