Attention-based Knowledge Distillation in Multi-attention Tasks: The Impact of a DCT-driven Loss
Alejandro López-Cifuentes, Marcos Escudero-Viñolo, Jesús Bescós, Juan C. SanMiguel

TL;DR
This paper introduces a knowledge distillation method that applies a 2D Discrete Cosine Transform (DCT) to activation maps before transfer, improving scene recognition performance by emphasizing global image cues over pixel-level details.
Contribution
It proposes a DCT-driven loss for feature-based knowledge distillation, enhancing transferability by leveraging global frequency information in scene recognition tasks.
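One way to write such a DCT-driven loss is given below. The formulation is an assumption for illustration and is not taken from the paper. $A^{T}$ and $A^{S}$ are hypothetical $H \times W$ attention maps of the teacher and the student, for example channel-reduced activations of equal spatial size. $\mathcal{D}$ is the orthonormal 2D DCT-II. The loss compares the L2-normalized coefficient maps.

```latex
\mathcal{D}(A)_{u,v} = \alpha_u^{(H)}\,\alpha_v^{(W)} \sum_{x=0}^{H-1}\sum_{y=0}^{W-1} A_{x,y}\,
  \cos\!\left[\frac{\pi(2x+1)u}{2H}\right]\cos\!\left[\frac{\pi(2y+1)v}{2W}\right],
\qquad
\alpha_k^{(N)} = \begin{cases} \sqrt{1/N}, & k = 0 \\ \sqrt{2/N}, & k > 0 \end{cases}

\mathcal{L}_{\mathrm{DCT}} = \left\lVert
  \frac{\mathcal{D}(A^{T})}{\lVert \mathcal{D}(A^{T}) \rVert_2}
  - \frac{\mathcal{D}(A^{S})}{\lVert \mathcal{D}(A^{S}) \rVert_2}
\right\rVert_2^2
```

Each coefficient $\mathcal{D}(A)_{u,v}$ depends on every position of the map. Low-index coefficients ($u, v$ near 0) describe its coarse spatial layout, which is one reading of the "global frequency information" mentioned above.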
Findings
Outperforms state-of-the-art methods in scene recognition accuracy
Enables student networks to focus on relevant image regions
Improves feature descriptiveness and transfer performance
Abstract
Knowledge Distillation (KD) is a strategy for defining a set of transferability gangways to improve the efficiency of Convolutional Neural Networks. Feature-based Knowledge Distillation is a subfield of KD that relies on intermediate network representations, either unaltered or depth-reduced via maximum activation maps, as the source knowledge. In this paper, we propose and analyse the use of a 2D frequency transform of the activation maps before transferring them. We posit that, by using global image cues rather than pixel estimates, this strategy enhances knowledge transferability in tasks such as scene recognition, which are defined by strong spatial and contextual relationships between multiple and varied concepts. To validate the proposed method, an extensive evaluation of the state of the art in scene recognition is presented. Experimental results provide strong evidence…
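The abstract describes a pipeline with three steps. Each activation tensor is depth-reduced into a map, the map goes through a 2D frequency transform, and teacher and student are then compared in the frequency domain. These steps can be sketched in NumPy as shown below.

This is a minimal sketch under stated assumptions, not the authors' implementation. The following choices are all hypothetical:

- the depth reduction, taken as the channel-wise maximum of absolute activations;
- the L2 normalization;
- the squared Euclidean distance;
- the function names `dct_matrix`, `dct2`, `attention_map` and `dct_distillation_loss`.

The DCT-II is built explicitly as an orthonormal matrix so that the example needs only NumPy.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis of size n x n; row k holds frequency k."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    x = np.arange(n)[None, :]  # spatial index (columns)
    basis = np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    scale = np.full((n, 1), np.sqrt(2.0 / n))
    scale[0, 0] = np.sqrt(1.0 / n)  # the DC row gets the smaller factor
    return scale * basis


def dct2(a: np.ndarray) -> np.ndarray:
    """Separable 2D DCT-II of an H x W map: transform along H, then along W."""
    h, w = a.shape
    return dct_matrix(h) @ a @ dct_matrix(w).T


def attention_map(features: np.ndarray) -> np.ndarray:
    """Depth-reduce a C x H x W activation tensor to an H x W map.

    Assumption: channel-wise maximum of absolute activations; other
    reductions (e.g. mean of squared activations) are equally plausible.
    """
    return np.abs(features).max(axis=0)


def dct_distillation_loss(teacher: np.ndarray, student: np.ndarray,
                          eps: float = 1e-12) -> float:
    """Squared L2 distance between L2-normalized DCT coefficients of the
    teacher and student attention maps (hypothetical formulation).

    Channel counts may differ, but both tensors must share the same H x W.
    """
    t = dct2(attention_map(teacher)).ravel()
    s = dct2(attention_map(student)).ravel()
    t = t / (np.linalg.norm(t) + eps)
    s = s / (np.linalg.norm(s) + eps)
    return float(np.sum((t - s) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_feats = rng.standard_normal((512, 14, 14))  # e.g. a wide teacher layer
    student_feats = rng.standard_normal((128, 14, 14))  # a narrower student layer
    print(dct_distillation_loss(teacher_feats, student_feats))
    print(dct_distillation_loss(teacher_feats, teacher_feats))  # identical maps -> ~0
```

One design caveat applies to this particular loss. The orthonormal DCT is an orthogonal linear map and therefore preserves Euclidean distances. As a result, the loss equals the squared L2 distance between the L2-normalized spatial maps. A frequency-domain loss behaves differently from a spatial one only when the coefficients are normalized, weighted, or compared in some non-isometric way. This sketch does not attempt to reproduce whatever choice the paper makes there.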
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Image Processing Techniques and Applications
Methods: Knowledge Distillation
