CanKD: Cross-Attention-based Non-local operation for Feature-based Knowledge Distillation

Shizhe Sun; Wataru Ohyama

arXiv:2511.21503·cs.CV·November 27, 2025

TL;DR

CanKD is a cross-attention-based non-local approach to feature-based knowledge distillation in which each student pixel attends to every teacher pixel. It outperforms state-of-the-art feature and hybrid distillation methods on object detection and image segmentation.

Contribution

The paper presents a cross-attention mechanism for feature distillation that captures pixel-wise relationships more thoroughly than self-attention-based distillation methods, which handle the teacher and student feature maps independently.

Findings

01. Outperforms state-of-the-art distillation methods in object detection.

02. Improves feature representation in image segmentation.

03. Requires only an additional loss function for implementation.

Abstract

We propose Cross-Attention-based Non-local Knowledge Distillation (CanKD), a novel feature-based knowledge distillation framework that leverages cross-attention mechanisms to enhance the knowledge transfer process. Unlike traditional self-attention-based distillation methods that align teacher and student feature maps independently, CanKD enables each pixel in the student feature map to dynamically consider all pixels in the teacher feature map. This non-local knowledge transfer more thoroughly captures pixel-wise relationships, improving feature representation learning. Our method introduces only an additional loss function to achieve superior performance compared with existing attention-guided distillation methods. Extensive experiments on object detection and image segmentation tasks demonstrate that CanKD outperforms state-of-the-art feature and hybrid distillation methods. These…
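The idea described in the abstract, where each student pixel attends to all teacher pixels, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `cross_attention_distill_loss`, the residual fusion, the scaling by the square root of the channel count, and the mean-squared-error target are all assumptions made for illustration, and the learned projection layers that a non-local block would normally use are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_distill_loss(student, teacher):
    """Hypothetical cross-attention distillation loss (illustrative only).

    student, teacher: arrays of shape (C, H, W) with identical shapes.
    Returns (loss, attn), where attn has shape (H*W, H*W).
    """
    c, h, w = student.shape
    n = h * w

    # Flatten spatial dimensions: one row per pixel, one column per channel.
    q = student.reshape(c, n).T  # queries come from the student, (N, C)
    k = teacher.reshape(c, n).T  # keys come from the teacher, (N, C)
    v = k                        # values also come from the teacher

    # Row i holds the weights student pixel i places on every teacher pixel.
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)  # (N, N)

    # Assumed fusion: add the attended teacher content to the student feature.
    fused = q + attn @ v  # (N, C)

    # Assumed objective: mean squared error between fused and teacher features.
    loss = float(np.mean((fused - k) ** 2))
    return loss, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.standard_normal((4, 3, 5))
    t = rng.standard_normal((4, 3, 5))
    loss, attn = cross_attention_distill_loss(s, t)
    print(attn.shape, loss)
```

In a training setup, a term like this would be added to the student's task loss, which is in keeping with the abstract's statement that the method introduces only an additional loss function. The attention matrix grows quadratically with the number of pixels, so a practical version would need to account for memory on large feature maps.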

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning