Quantized Feature Distillation for Network Quantization
Ke Zhu, Yin-Yin He, Jianxin Wu

TL;DR
This paper introduces Quantized Feature Distillation (QFD), a novel knowledge distillation-based quantization method that simplifies training and improves performance of low-bit neural networks across various vision tasks.
Contribution
QFD is a new quantization-aware training (QAT) method that first trains a quantized feature representation as the teacher and then quantizes the network through knowledge distillation. It also extends quantization of vision transformers to detection and segmentation.
Findings
QFD outperforms previous quantization methods on image classification and object detection.
QFD effectively quantizes ViT and Swin-Transformer models.
First successful quantization of vision transformers for object detection and segmentation.
Abstract
Neural network quantization aims to accelerate and trim full-precision neural network models by using low-bit approximations. Methods adopting the quantization-aware training (QAT) paradigm have recently seen rapid growth, but are often conceptually complicated. This paper proposes a novel and highly effective QAT method, quantized feature distillation (QFD). QFD first trains a quantized (or binarized) representation as the teacher, then quantizes the network using knowledge distillation (KD). Quantitative results show that QFD is more flexible and effective (i.e., quantization friendly) than previous quantization methods. QFD surpasses existing methods by a noticeable margin on not only image classification but also object detection, despite being much simpler. Furthermore, QFD quantizes ViT and Swin-Transformer on MS-COCO detection and segmentation, which verifies its potential in…
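The two-stage idea in the abstract (a quantized feature as the teacher, then distillation into the quantized network) can be illustrated with a small NumPy sketch. The abstract does not specify the quantizer or the distillation loss, so the uniform quantizer, the mean-squared-error loss, and the names `quantize_uniform` and `feature_distillation_loss` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def quantize_uniform(x, bits, lo=0.0, hi=1.0):
    """Clip x to [lo, hi] and round it onto 2**bits evenly spaced levels.

    Assumption: a plain uniform quantizer; the paper may use a different one.
    """
    steps = 2 ** bits - 1          # number of intervals between levels
    scale = (hi - lo) / steps      # width of one interval
    x = np.clip(x, lo, hi)
    return np.round((x - lo) / scale) * scale + lo


def feature_distillation_loss(student_feat, teacher_feat_q):
    """Mean squared error between student features and the quantized teacher
    features (assumed loss; the abstract only says 'knowledge distillation')."""
    return float(np.mean((student_feat - teacher_feat_q) ** 2))


# Stage 1 (illustrative): the teacher's feature vector is quantized to 2 bits.
teacher_feat = np.array([0.05, 0.40, 0.74, 1.30])
teacher_feat_q = quantize_uniform(teacher_feat, bits=2)

# Stage 2 (illustrative): the student is penalized for deviating from it.
student_feat = np.array([0.0, 0.5, 0.5, 1.0])
loss = feature_distillation_loss(student_feat, teacher_feat_q)
print(teacher_feat_q, loss)
```

Because rounding has zero gradient almost everywhere, an actual training loop would also need some form of gradient approximation through the quantizer, which this sketch leaves out.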
Taxonomy
Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Advanced Image and Video Retrieval Techniques
Methods: Knowledge Distillation · Attentive Walk-Aggregating Graph Neural Network
