Representative Teacher Keys for Knowledge Distillation Model Compression Based on Attention Mechanism for Image Classification

Jun-Teng Yang; Sheng-Che Kao; Scott C.-H. Huang

arXiv:2206.12788·cs.CV·October 21, 2022



TL;DR

This paper introduces RTK, a knowledge distillation method that uses attention mechanisms to filter useful features from teacher models, improving student model accuracy in image classification tasks.

Contribution

The paper proposes RTK, a new KD approach that considers feature map similarity and filters out useless information using attention, enhancing model compression effectiveness.

Findings

01 RTK outperforms existing attention-based KD methods on multiple datasets.

02 RTK improves classification accuracy across various backbone networks.

03 Experimental results validate RTK's effectiveness in model compression.

Abstract

With improvements in AI chips (e.g., GPU, TPU, and NPU) and the fast development of the Internet of Things (IoT), robust deep neural networks (DNNs) are now commonly composed of millions or even hundreds of millions of parameters. Such a large model may not be suitable for direct deployment on low-computation, low-capacity units (e.g., edge devices). Knowledge distillation (KD) has recently been recognized as a powerful model compression method that effectively reduces the number of model parameters. The central concept of KD is to extract useful information from the feature maps of a large model (i.e., the teacher model) as a reference for training a small model (i.e., the student model) whose size is much smaller than the teacher's. Although many KD methods have been proposed to utilize the information from the feature maps of intermediate layers in the teacher model, most…


Taxonomy

Topics: Advanced Neural Network Applications · Brain Tumor Detection and Classification · Advanced Image and Video Retrieval Techniques

Methods: 1x1 Convolution · Bottleneck Residual Block · Average Pooling · Convolution · Max Pooling · Knowledge Distillation · Batch Normalization · Residual Connection · Kaiming Initialization