Model compression using knowledge distillation with integrated gradients
David E. Hernandez, Jose Chang, Torbjörn E. M. Nordling

TL;DR
This paper presents a novel model compression method that enhances knowledge distillation with integrated gradients, significantly improving accuracy and reducing inference time for deployment on resource-limited devices.
Contribution
The paper introduces a new IG-augmented knowledge distillation technique that precomputes integrated gradients to improve model compression efficiency and accuracy.
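For orientation, the knowledge distillation objective that such a technique builds on can be sketched as follows. This is a minimal NumPy sketch of the standard soft-target formulation, not the authors' training code: the function names and the `temperature` and `alpha` hyperparameters (with their default values) are assumptions for illustration and are not given in this summary.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature, computed stably."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a soft-target KL term and a hard-label cross-entropy.

    temperature and alpha are illustrative defaults (assumptions),
    not values reported in the paper summary.
    """
    eps = 1e-12
    # Soft targets: KL(teacher || student) at temperature T, scaled by T^2.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)),
                axis=-1)
    # Hard labels: ordinary cross-entropy at temperature 1.
    p_hard = softmax(student_logits)
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + eps)
    return float(np.mean(alpha * temperature**2 * kl + (1.0 - alpha) * ce))

# Toy batch: 4 samples, 10 classes (CIFAR-10 has 10 classes).
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 10))
student_logits = rng.normal(size=(4, 10))
labels = rng.integers(0, 10, size=4)
loss = distillation_loss(student_logits, teacher_logits, labels)
```

In an IG-augmented variant, the student would presumably see IG-overlaid inputs while being trained against a loss of this kind; how the two are combined in the paper is not specified here.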
Findings
Achieves 92.6% accuracy with 4.1x compression on CIFAR-10.
Reduces inference time from 140 ms to 13 ms.
Outperforms conventional methods and is validated on an ImageNet subset.
Abstract
Model compression is critical for deploying deep learning models on resource-constrained devices. We introduce a novel method enhancing knowledge distillation with integrated gradients (IG) as a data augmentation strategy. Our approach overlays IG maps onto input images during training, providing student models with deeper insights into teacher models' decision-making processes. Extensive evaluation on CIFAR-10 demonstrates that our IG-augmented knowledge distillation achieves 92.6% testing accuracy with a 4.1x compression factor, a significant 1.1 percentage point improvement over non-distilled models (91.5%). This compression reduces inference time from 140 ms to 13 ms. Our method precomputes IG maps before training, transforming substantial runtime costs into a one-time preprocessing step. Our comprehensive experiments include: (1) comparisons with attention transfer,…
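The precompute-then-overlay idea described in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy linear softmax "teacher", the all-zero baseline, the midpoint Riemann approximation, and the `overlay_ig` blending rule with its `strength` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def class_prob_and_grad(x, W, b, target):
    """Toy stand-in teacher: softmax(W @ x + b).

    Returns the target-class probability and its gradient w.r.t. x.
    """
    p = softmax(W @ x + b)
    grad = p[target] * (W[target] - p @ W)
    return p[target], grad

def integrated_gradients(x, W, b, target, baseline=None, steps=64):
    """Approximate IG along the straight path from baseline to x."""
    if baseline is None:
        baseline = np.zeros_like(x)  # assumed baseline: all-zero input
    alphas = (np.arange(steps) + 0.5) / steps  # midpoint Riemann rule
    grads = np.stack([
        class_prob_and_grad(baseline + a * (x - baseline), W, b, target)[1]
        for a in alphas
    ])
    return (x - baseline) * grads.mean(axis=0)

def overlay_ig(images, ig_maps, strength=0.5):
    """Hypothetical overlay: blend normalised |IG| into images in [0, 1]."""
    saliency = np.abs(ig_maps)
    saliency = saliency / (saliency.max(axis=-1, keepdims=True) + 1e-12)
    return np.clip((1.0 - strength) * images + strength * saliency, 0.0, 1.0)

# One-time preprocessing: compute and cache IG maps before any training loop.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 16))          # 3 classes, 16-pixel flattened images
b = np.zeros(3)
images = rng.uniform(size=(5, 16))    # toy dataset with pixels in [0, 1]
labels = rng.integers(0, 3, size=5)
ig_cache = np.stack([
    integrated_gradients(x, W, b, y) for x, y in zip(images, labels)
])

# During training, the cached maps are reused instead of being recomputed.
augmented = overlay_ig(images, ig_cache)
```

Caching `ig_cache` once is what moves the cost out of the training loop: each IG map needs `steps` gradient evaluations of the teacher, so recomputing it every epoch would multiply that cost by the number of epochs.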
Taxonomy
Topics: Neural Networks and Applications
Methods: Knowledge Distillation
