Towards Optimal Trade-offs in Knowledge Distillation for CNNs and Vision Transformers at the Edge

John Violos; Symeon Papadopoulos; Ioannis Kompatsiaris

arXiv:2407.12808 · cs.CV · July 19, 2024

Open Access

TL;DR

This paper investigates how to optimize knowledge distillation for CNNs and Vision Transformers on edge devices by analyzing architecture choices, model size, image resolution, and fine-tuning effects.

Contribution

It provides a comprehensive analysis of factors affecting KD effectiveness on edge devices, including architecture, model size, image resolution, and post-distillation fine-tuning.

Findings

01 CNNs outperform ViTs in low-resource settings

02 Larger student models improve accuracy but reduce speed

03 Higher resolution images increase accuracy and memory usage
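One way to see why resolution drives memory use in a ViT is to count patch tokens. The sketch below is illustrative only: the patch size of 16 and the 224 and 448 pixel resolutions are assumptions, not settings taken from the paper, and the function names are hypothetical. It ignores the class token and counts only the entries of a single attention matrix per head.

```python
def vit_token_count(height: int, width: int, patch_size: int = 16) -> int:
    """Number of patch tokens a ViT produces for one image (class token excluded)."""
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by the patch size")
    return (height // patch_size) * (width // patch_size)


def attention_entries(num_tokens: int) -> int:
    """Entries in one self-attention matrix: every token attends to every token."""
    return num_tokens * num_tokens


if __name__ == "__main__":
    # Assumed resolutions for illustration; doubling each side quadruples the tokens.
    for side in (224, 448):
        tokens = vit_token_count(side, side)
        print(side, tokens, attention_entries(tokens))
```

Under these assumptions, doubling the side length multiplies the token count by 4 and the attention matrix size by 16, whereas the activation maps of a CNN grow roughly in proportion to the pixel count.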

Abstract

This paper discusses four facets of the Knowledge Distillation (KD) process for Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) architectures, particularly when executed on edge devices with constrained processing capabilities. First, we conduct a comparative analysis of the KD process between CNNs and ViT architectures, aiming to elucidate the feasibility and efficacy of employing different architectural configurations for the teacher and student, while assessing their performance and efficiency. Second, we explore the impact of varying the size of the student model on accuracy and inference speed, while maintaining a constant KD duration. Third, we examine the effects of employing higher resolution images on the accuracy, memory footprint and computational workload. Last, we examine the performance improvements obtained by fine-tuning the student model after KD to…
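The abstract does not say which distillation objective the authors use. As a point of reference, the sketch below shows the widely used soft-target formulation: a temperature-softened KL divergence between teacher and student outputs, blended with cross-entropy on the true label. The function names and the defaults for `temperature` and `alpha` are assumptions for illustration, not values from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Soft-target distillation loss for a single example (illustrative sketch).

    alpha weights the teacher-matching term; (1 - alpha) weights the
    ordinary cross-entropy against the ground-truth label.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))
    # Cross-entropy of the student at temperature 1 against the hard label.
    ce = -log_softmax(student_logits)[label]
    # The temperature**2 factor keeps gradient magnitudes comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


if __name__ == "__main__":
    print(kd_loss([2.0, 0.5, -1.0], [3.0, 0.2, -2.0], label=0))
```

Because the loss depends only on logits, the same function applies whether the teacher and student are both CNNs, both ViTs, or a mix of the two.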

Taxonomy

Topics: Advanced Neural Network Applications · CCD and CMOS Imaging Sensors · Neural Networks and Applications

Methods: Attention Is All You Need · Residual Connection · Adam · Dropout · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Vision Transformer · Softmax