CrossKD: Cross-Head Knowledge Distillation for Object Detection
Jiabao Wang, Yuming Chen, Zhaohui Zheng, Xiang Li, Ming-Ming Cheng, Qibin Hou

TL;DR
CrossKD is a prediction mimicking distillation scheme for object detection. The intermediate features of the student's detection head are passed through the teacher's detection head, and the resulting cross-head predictions are trained to mimic the teacher's predictions. On MS COCO, this raises the AP of GFL ResNet-50 from 40.2 to 43.7.
Contribution
The paper proposes CrossKD, a general distillation approach that uses cross-head prediction mimicking, providing more task-oriented supervision and outperforming feature imitation methods.
Findings
Boosts AP of GFL ResNet-50 from 40.2 to 43.7 on MS COCO
Outperforms all existing KD methods for object detection
Effective with heterogeneous backbones
Abstract
Knowledge Distillation (KD) has been validated as an effective model compression technique for learning compact object detectors. Existing state-of-the-art KD methods for object detection are mostly based on feature imitation. In this paper, we present a general and effective prediction mimicking distillation scheme, called CrossKD, which delivers the intermediate features of the student's detection head to the teacher's detection head. The resulting cross-head predictions are then forced to mimic the teacher's predictions. This manner relieves the student's head from receiving contradictory supervision signals from the annotations and the teacher's predictions, greatly improving the student's detection performance. Moreover, as mimicking the teacher's predictions is the target of KD, CrossKD offers more task-oriented information in contrast with feature imitation. On MS COCO, with only…
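The cross-head forward pass described in the abstract can be illustrated with a minimal, forward-only sketch. Everything here is an assumption made for illustration, not the authors' implementation: the heads are tiny bias-free linear stacks on a single feature vector rather than convolutional detection heads, both heads share the same layer count and hidden width, `split` is a hypothetical name for the layer index where the student's intermediate feature is handed to the teacher's head, and a plain KL divergence stands in for whatever distillation losses the paper actually uses.

```python
import math

def linear(x, weight):
    """Bias-free linear layer; `weight` is a list of output rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]

def relu(x):
    return [max(0.0, v) for v in x]

def run_head(x, layers, start=0):
    """Run layers[start:], with ReLU after every layer except the last."""
    for i in range(start, len(layers)):
        x = linear(x, layers[i])
        if i < len(layers) - 1:
            x = relu(x)
    return x

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(cross_logits, teacher_logits, eps=1e-12):
    """KL(teacher || cross-head). Assumed stand-in for the paper's KD losses."""
    p = softmax(teacher_logits)
    q = softmax(cross_logits)
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def crosskd_predictions(feat_s, feat_t, student_head, teacher_head, split):
    """Return (student, cross-head, teacher) logits for one location.

    `split` is the number of student layers run before the handoff
    (hypothetical parameter name; 1 <= split < number of layers).
    """
    if len(student_head) != len(teacher_head):
        raise ValueError("this sketch assumes heads of equal depth")
    if not 1 <= split < len(student_head):
        raise ValueError("split must leave at least one layer on each side")
    # Student's intermediate head feature after its first `split` layers.
    inter = feat_s
    for layer in student_head[:split]:
        inter = relu(linear(inter, layer))
    # Student's own prediction: compared against the annotations only.
    student_logits = run_head(inter, student_head, start=split)
    # Cross-head prediction: student feature through the teacher's later layers.
    cross_logits = run_head(inter, teacher_head, start=split)
    # Teacher's prediction from its own feature: the mimicking target.
    teacher_logits = run_head(feat_t, teacher_head)
    return student_logits, cross_logits, teacher_logits
```

In this sketch the distillation term is computed between `cross_logits` and `teacher_logits`, while `student_logits` would be scored only against the ground-truth labels, which is how the abstract describes separating the two supervision signals. In a real training setup the teacher's layers would presumably be kept fixed, with the distillation signal reaching the student through the shared intermediate feature; this forward-only sketch does not model gradients.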
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
