Towards Optimal Trade-offs in Knowledge Distillation for CNNs and Vision Transformers at the Edge
John Violos, Symeon Papadopoulos, Ioannis Kompatsiaris

TL;DR
This paper investigates how to optimize knowledge distillation for CNNs and Vision Transformers on edge devices by analyzing architecture choices, model size, image resolution, and fine-tuning effects.
Contribution
It provides a comprehensive analysis of factors affecting KD effectiveness on edge devices, including architecture, model size, image resolution, and post-distillation fine-tuning.
Findings
CNNs outperform ViTs in low-resource settings
Larger student models improve accuracy but reduce inference speed
Higher-resolution images increase both accuracy and memory usage
Abstract
This paper discusses four facets of the Knowledge Distillation (KD) process for Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) architectures, particularly when executed on edge devices with constrained processing capabilities. First, we conduct a comparative analysis of the KD process between CNNs and ViT architectures, aiming to elucidate the feasibility and efficacy of employing different architectural configurations for the teacher and student, while assessing their performance and efficiency. Second, we explore the impact of varying the size of the student model on accuracy and inference speed, while maintaining a constant KD duration. Third, we examine the effects of employing higher resolution images on the accuracy, memory footprint and computational workload. Last, we examine the performance improvements obtained by fine-tuning the student model after KD to…
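For readers unfamiliar with KD, a minimal sketch of the classic soft-target distillation loss may help. The abstract excerpt above does not state which loss the authors use, so this follows the widely used formulation (temperature-softened KL divergence plus hard-label cross-entropy) and should not be read as the paper's method. The function names and the `temperature` and `alpha` defaults are illustrative assumptions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Soft-target KD loss for one example (illustrative, assumed hyperparameters).

    alpha weights the distillation term; (1 - alpha) weights the
    ordinary cross-entropy against the ground-truth label.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL(teacher || student) on the temperature-softened distributions.
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )

    # Standard cross-entropy of the student at temperature 1.
    ce = -log_softmax(student_logits)[label]

    # The T^2 factor keeps the soft-target gradients on a comparable scale.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce
```

In this sketch, a student whose logits match the teacher's incurs no distillation penalty, so with `alpha=1.0` the loss is zero; lowering `alpha` shifts weight toward the ground-truth label.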
Taxonomy
Topics: Advanced Neural Network Applications · CCD and CMOS Imaging Sensors · Neural Networks and Applications
Methods: Attention Is All You Need · Residual Connection · Adam · Dropout · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Vision Transformer · Softmax
