PQK: Model Compression via Pruning, Quantization, and Knowledge Distillation
Jangho Kim, Simyung Chang, Nojun Kwak

TL;DR
This paper introduces PQK, a novel model compression technique combining pruning, quantization, and knowledge distillation that creates an efficient, lightweight DNN suitable for edge devices without requiring pre-trained teacher models.
Contribution
PQK integrates pruning, quantization, and knowledge distillation in a single pipeline. The teacher network is built by adding the weights discarded during pruning back to the pruned network, so no separately pre-trained teacher is needed.
Findings
Effective in keyword spotting tasks
Reduces model size and computational cost
Maintains high accuracy on image recognition
Abstract
As edge devices become prevalent, deploying Deep Neural Networks (DNNs) on edge devices has become a critical issue. However, DNNs require substantial computational resources, which are rarely available on edge devices. To handle this, we propose a novel model compression method for devices with limited computational resources, called PQK, consisting of pruning, quantization, and knowledge distillation (KD) processes. Unlike traditional pruning and KD, PQK makes use of unimportant weights pruned in the pruning process to make a teacher network for training a better student network without pre-training the teacher model. PQK has two phases. Phase 1 exploits iterative pruning and quantization-aware training to make a lightweight and power-efficient model. In phase 2, we make a teacher network by adding unimportant weights unused in phase 1 to a pruned network. By using this teacher network, we…
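The two phases described in the abstract can be illustrated with a minimal NumPy sketch on a single weight tensor. This is not the authors' implementation. The helper names (`magnitude_mask`, `fake_quantize`, `student_and_teacher`, `kd_loss`), the magnitude-based importance criterion, the symmetric 8-bit quantizer, and the temperature-scaled KL distillation loss are all assumptions chosen for illustration, since the abstract does not specify these details.

```python
import numpy as np

def magnitude_mask(w, sparsity):
    """0/1 mask that drops the `sparsity` fraction of smallest-|w| entries.
    Magnitude is an assumed importance criterion, not taken from the paper."""
    k = int(round(sparsity * w.size))
    mask = np.ones(w.size, dtype=w.dtype)
    if k > 0:
        smallest = np.argsort(np.abs(w), axis=None)[:k]
        mask[smallest] = 0
    return mask.reshape(w.shape)

def fake_quantize(w, num_bits=8):
    """Symmetric uniform quantize-then-dequantize, as commonly used in
    quantization-aware training (assumed scheme)."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()
    scale = max_abs / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def student_and_teacher(w, sparsity, num_bits=8):
    """Phase 1: pruned, quantized student weights.
    Phase 2: teacher = student plus the unimportant weights added back."""
    mask = magnitude_mask(w, sparsity)
    student = fake_quantize(w * mask, num_bits)
    teacher = student + w * (1 - mask)  # hypothetical way to combine the two
    return student, teacher, mask

def log_softmax(z, t=1.0):
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kd_loss(student_logits, teacher_logits, t=2.0):
    """Generic temperature-scaled KL(teacher || student); a standard KD loss,
    assumed here rather than confirmed by the source."""
    log_p_t = log_softmax(teacher_logits, t)
    log_p_s = log_softmax(student_logits, t)
    kl = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1)
    return float(np.mean(kl) * t * t)

# Usage: one layer's weights, 50% sparsity
w = np.array([[0.1, -2.0], [0.05, 1.0]])
student, teacher, mask = student_and_teacher(w, sparsity=0.5)
```

In this sketch the student and teacher share the important weights, so the teacher costs no extra pre-training. Only the weights that pruning would otherwise discard are added on top.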
Taxonomy
Methods: Pruning · Knowledge Distillation
