TL;DR
This paper introduces SPEQ, a novel stochastic ensemble knowledge distillation method for quantized deep neural networks that improves performance by using a dynamically changing teacher formed from the student model itself.
Contribution
SPEQ forms its teacher from the student's own parameters by stochastically varying the activation bit precision at each layer of the forward pass. This yields a self-knowledge distillation scheme that needs no separate teacher network and improves quantized model accuracy.
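The parameter-sharing idea can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform quantizer, the clipping range, the candidate bit widths `(2, 4, 8)`, and the helper names `quantize_activation`, `forward`, and `sample_teacher_bits` are all assumptions made for illustration. The point it shows is that student and teacher run the same weights and differ only in the activation precision assigned to each layer.

```python
import random

def quantize_activation(x, bits, clip=1.0):
    """Clip x to [0, clip] (a clipped ReLU) and round it to one of 2**bits uniform levels."""
    levels = 2 ** bits - 1
    x = min(max(x, 0.0), clip)
    return round(x / clip * levels) / levels * clip

def forward(x, weights, hidden_bits):
    """Tiny MLP forward pass. hidden_bits[i] is the activation precision of hidden layer i.
    The final layer's outputs (logits) are left unquantized."""
    h = x
    for i, W in enumerate(weights):
        h = [sum(w * v for w, v in zip(row, h)) for row in W]
        if i < len(hidden_bits):
            h = [quantize_activation(v, hidden_bits[i]) for v in h]
    return h

def sample_teacher_bits(num_hidden, choices=(2, 4, 8), rng=random):
    """Draw an independent activation precision for every hidden layer (assumed candidate set)."""
    return [rng.choice(choices) for _ in range(num_hidden)]

# Hypothetical toy network: one hidden layer, two outputs.
rng = random.Random(0)
weights = [[[0.5, -0.2], [0.1, 0.4]], [[1.0, -1.0], [0.3, 0.7]]]
x = [0.8, 0.6]

student_logits = forward(x, weights, hidden_bits=[2])        # fixed low precision
teacher_bits = sample_teacher_bits(1, rng=rng)               # resampled every step
teacher_logits = forward(x, weights, hidden_bits=teacher_bits)  # same weights as the student
```

Because `teacher_bits` is redrawn at every training step, the teacher is a different member of an implicit ensemble each time, while no second set of weights is ever stored.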
Findings
Outperforms existing quantization methods in image classification.
Effective in question-answering and transfer learning tasks.
Reduces activation quantization noise through stochastic soft labels.
Abstract
Quantized deep neural networks (QDNNs) have been actively studied for deployment on edge devices. Recent studies employ knowledge distillation (KD) to improve the performance of quantized networks. In this study, we propose stochastic precision ensemble training for QDNNs (SPEQ). SPEQ is a knowledge distillation training scheme; however, the teacher is formed by sharing the model parameters of the student network. We obtain the soft labels of the teacher by changing the bit precision of the activation stochastically at each layer of the forward-pass computation. The student model is trained with these soft labels to reduce the activation quantization noise. The cosine similarity loss is employed, instead of the KL-divergence, for KD training. As the teacher model changes continuously by random bit-precision assignment, it exploits the effect of stochastic ensemble…
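To make the loss choice concrete, the following is a rough sketch of a cosine similarity distillation loss next to the KL-divergence it replaces. Applying the cosine similarity to softmax probabilities (rather than raw logits), the `1 - cos` form, and the function names are assumptions for illustration; the abstract does not specify these details.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine_similarity_loss(student_logits, teacher_logits):
    """1 - cos(p_student, p_teacher); zero when the two distributions point the same way."""
    p = softmax(student_logits)
    q = softmax(teacher_logits)
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm

def kl_divergence(teacher_logits, student_logits, eps=1e-12):
    """KL(teacher || student), the conventional KD loss, shown only for comparison."""
    q = softmax(teacher_logits)
    p = softmax(student_logits)
    return sum(qi * math.log((qi + eps) / (pi + eps)) for qi, pi in zip(q, p))

# Hypothetical logits from a low-precision student and a stochastic-precision teacher.
student = [2.0, 0.5, -1.0]
teacher = [2.5, 0.3, -1.2]
loss_cos = cosine_similarity_loss(student, teacher)
loss_kl = kl_divergence(teacher, student)
```

One property visible in the sketch is that the cosine loss is bounded (between 0 and 1 for probability vectors), whereas the KL term is unbounded as the student assigns vanishing probability to a class the teacher favors.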
Taxonomy
Methods: Knowledge Distillation
