Self-Supervised Quantization-Aware Knowledge Distillation
Kaiqi Zhao, Ming Zhao

TL;DR
This paper introduces SQAKD, a self-supervised framework that combines quantization-aware training and knowledge distillation, eliminating the need for labeled data and extensive hyper-parameter tuning, leading to superior low-bit model performance.
Contribution
It unifies the forward and backward dynamics of various quantization functions and formulates QAT as a co-optimization problem, so that knowledge can be distilled from the full-precision model into the low-bit model without labeled data.
Findings
Outperforms state-of-the-art QAT and KD methods across various architectures.
Eliminates the need for labeled data and complex hyper-parameter tuning.
Provides a flexible framework compatible with different quantization techniques.
Abstract
Quantization-aware training (QAT) and Knowledge Distillation (KD) are combined to achieve competitive performance in creating low-bit deep learning models. However, existing works applying KD to QAT require tedious hyper-parameter tuning to balance the weights of different loss terms, assume the availability of labeled training data, and require complex, computationally intensive training procedures for good performance. To address these limitations, this paper proposes a novel Self-Supervised Quantization-Aware Knowledge Distillation (SQAKD) framework. SQAKD first unifies the forward and backward dynamics of various quantization functions, making it flexible for incorporating various QAT works. Then it formulates QAT as a co-optimization problem that simultaneously minimizes the KL-Loss between the full-precision and low-bit models for KD and the discretization error for quantization,…
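The two quantities named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the uniform quantizer, the clipping range, the temperature parameter, and the KL direction (teacher to student) are all assumptions made for illustration. It only shows that both terms can be computed without any ground-truth labels.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax; the temperature is an assumed knob."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def uniform_quantize(x, bits, lo=-1.0, hi=1.0):
    """Assumed quantizer: clip to [lo, hi], then snap to 2**bits evenly spaced levels."""
    steps = 2 ** bits - 1
    step = (hi - lo) / steps
    clipped = min(max(x, lo), hi)
    return lo + round((clipped - lo) / step) * step

def discretization_error(weights, bits, lo=-1.0, hi=1.0):
    """Mean squared gap between full-precision and quantized weights."""
    return sum((w - uniform_quantize(w, bits, lo, hi)) ** 2 for w in weights) / len(weights)

# Hypothetical outputs for one unlabeled input: the full-precision model
# acts as teacher, the low-bit model as student. No label is involved.
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [1.5, 0.7, -0.8]
weights = [0.31, -0.77, 0.05, 0.92]  # hypothetical full-precision weights

kd_loss = kl_divergence(softmax(teacher_logits), softmax(student_logits))
quant_err_2bit = discretization_error(weights, bits=2)
quant_err_8bit = discretization_error(weights, bits=8)
```

In this sketch the distillation term depends only on the two models' outputs, which is why no labeled data is required, and the discretization error shrinks as the bit width grows. How the paper couples the two terms during training is not specified in the text above, so the sketch keeps them separate.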
Taxonomy
Topics: Neural Networks and Applications · Image Processing Techniques and Applications · Advanced Algorithms and Applications
Methods: Knowledge Distillation
