Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy
Asit Mishra, Debbie Marr

TL;DR
This paper introduces Apprentice, a method that combines knowledge distillation with low-precision numerics to significantly enhance the accuracy of resource-efficient neural networks, achieving state-of-the-art results on ImageNet.
Contribution
The paper demonstrates how knowledge distillation can be integrated with low-precision networks to improve their accuracy, and provides three practical schemes for doing so.
Findings
Achieves state-of-the-art accuracy with ternary and 4-bit ResNet models.
Shows significant performance improvements over baseline low-precision networks.
Provides versatile schemes for applying knowledge distillation in training and deployment.
Abstract
Deep learning networks have achieved state-of-the-art accuracies on computer vision workloads like image classification and object detection. The performant systems, however, typically involve big models with numerous parameters. Once trained, a challenging aspect for such top-performing models is deployment on resource-constrained inference systems: the models (often deep networks or wide networks or both) are compute and memory intensive. Low-precision numerics and model compression using knowledge distillation are popular techniques to lower both the compute requirements and memory footprint of these deployed models. In this paper, we study the combination of these two techniques and show that the performance of low-precision networks can be significantly improved by using knowledge distillation techniques. Our approach, Apprentice, achieves state-of-the-art accuracies using ternary…
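The combination described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a generic distillation loss (a weighted sum of a hard-label term and a teacher soft-label term) and a simple threshold-based ternary quantizer. The names `ternarize`, `distillation_loss`, `threshold_ratio`, `alpha`, and `temperature` are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(target_probs, logits, temperature=1.0):
    """Mean cross-entropy between target probabilities and softmax(logits)."""
    log_p = np.log(softmax(logits, temperature) + 1e-12)
    return float(-(target_probs * log_p).sum(axis=-1).mean())

def ternarize(weights, threshold_ratio=0.05):
    """Map weights to {-scale, 0, +scale}.

    Assumption: the threshold is a fixed fraction of the largest magnitude,
    and the scale is the mean magnitude of the weights that survive it.
    """
    threshold = threshold_ratio * np.abs(weights).max()
    mask = np.abs(weights) > threshold
    scale = np.abs(weights[mask]).mean() if mask.any() else 0.0
    return scale * np.sign(weights) * mask

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, temperature=1.0):
    """Weighted sum of a hard-label loss and a teacher soft-label loss.

    Assumption: alpha balances the two terms; the paper's own weighting
    may differ.
    """
    num_classes = student_logits.shape[-1]
    one_hot = np.eye(num_classes)[labels]
    hard = cross_entropy(one_hot, student_logits)
    soft = cross_entropy(softmax(teacher_logits, temperature),
                         student_logits, temperature)
    return alpha * hard + (1.0 - alpha) * soft

# Toy usage: a low-precision "student" layer and a full-precision "teacher" layer.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(4, 8))
teacher_weights = rng.normal(size=(8, 3))
student_weights = ternarize(teacher_weights + 0.1 * rng.normal(size=(8, 3)))
labels = np.array([0, 1, 2, 1])
loss = distillation_loss(inputs @ student_weights, inputs @ teacher_weights, labels)
```

In an actual training loop the student would keep full-precision shadow weights and quantize them on the forward pass; that detail is omitted here to keep the sketch short.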
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Knowledge Distillation · Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling
