Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-guided Feature Imitation
Gang Li, Xiang Li, Yujie Wang, Shanshan Zhang, Yichao Wu, Ding Liang

TL;DR
This paper introduces novel knowledge distillation techniques for object detection, focusing on rank mimicking and prediction-guided feature imitation, leading to significant accuracy improvements over existing methods.
Contribution
The paper proposes two new knowledge distillation (KD) methods, Rank Mimicking and Prediction-guided Feature Imitation, designed specifically for one-stage object detectors; both outperform traditional distillation approaches.
Findings
Rank Mimicking outperforms soft label distillation.
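Prediction-guided Feature Imitation improves accuracy by correlating feature differences with prediction differences, so that imitation concentrates where the student's predictions diverge from the teacher's instead of treating all feature maps equally.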
Achieved 40.4% mAP on MS COCO with RetinaNet, surpassing previous KD methods.
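To make the rank-mimicking idea concrete, here is a minimal sketch in plain Python, under stated assumptions. It assumes that, for one ground-truth object, the teacher and the student each produce classification logits (for that object's class) on the same set of positive candidate boxes. Softmax turns each set of logits into a "rank distribution", and the student is penalized by the KL divergence from the teacher's distribution. The function names and the `temperature` parameter are hypothetical illustrations, not the authors' code, and the paper's exact formulation (candidate selection, normalization, loss weight) may differ.

```python
import math
from typing import List, Sequence


def log_softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Numerically stable log-softmax across one object's candidate boxes.
    # `temperature` is a hypothetical knob, not taken from the paper.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_sum = math.log(sum(math.exp(s - m) for s in scaled))
    return [(s - m) - log_sum for s in scaled]


def rank_mimicking_loss(teacher_scores: Sequence[float],
                        student_scores: Sequence[float],
                        temperature: float = 1.0) -> float:
    """KL(teacher || student) between rank distributions over one object's candidates.

    Each sequence holds the logits that teacher and student assign, for the
    object's ground-truth class, to the same positive candidate boxes (assumption).
    """
    if len(teacher_scores) != len(student_scores) or not teacher_scores:
        raise ValueError("teacher and student must score the same, non-empty candidate set")
    log_p_t = log_softmax(teacher_scores, temperature)
    log_p_s = log_softmax(student_scores, temperature)
    # KL divergence: sum_i p_t(i) * (log p_t(i) - log p_s(i)).
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))


# The teacher prefers box 0, the student prefers box 2: the rankings disagree.
teacher = [3.0, 1.0, 0.5]
student = [0.5, 1.0, 3.0]
print(rank_mimicking_loss(teacher, student) > 0)                 # True
# A constant shift of all logits keeps the ranking, so the loss is zero.
print(rank_mimicking_loss(teacher, [s + 5.0 for s in teacher]))  # 0.0
```

The shift-invariance check illustrates the design choice: per-box soft-label distillation would penalize the shifted student for its different absolute scores, whereas a rank distribution normalized across one object's candidates only cares about their relative ordering.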
Abstract
Knowledge Distillation (KD) is a widely used technique for transferring information from cumbersome teacher models to compact student models, thereby enabling model compression and acceleration. Compared with image classification, object detection is a more complex task, and designing specific KD methods for object detection is non-trivial. In this work, we carefully study the behavioural differences between the teacher and student detection models, and obtain two intriguing observations: First, the teacher and student rank their detected candidate boxes quite differently, which results in their precision discrepancy. Second, there is a considerable gap between the feature response differences and prediction differences between teacher and student, indicating that equally imitating all the feature maps of the teacher is a sub-optimal choice for improving the student's accuracy. Based…
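The second observation motivates weighting feature imitation by how much teacher and student predictions disagree. Below is a minimal sketch in plain Python of that idea, under stated assumptions: feature maps and class-score maps are flattened into one vector per spatial location, a 1x1 convolution (here a plain matrix, bias omitted) adapts student channels to teacher channels, and each location's weight is the mean squared difference of class scores, normalized by the total weight. These choices and all names are hypothetical illustrations; the paper's exact weighting and normalization may differ.

```python
from typing import List, Sequence

Vector = List[float]


def conv1x1(features: Sequence[Vector], weight: Sequence[Vector]) -> List[Vector]:
    # A 1x1 convolution applies the same linear map at every spatial location:
    # out[o] = sum_c weight[o][c] * x[c]  (bias omitted for brevity).
    return [[sum(w * x for w, x in zip(row, feat)) for row in weight] for feat in features]


def prediction_guided_imitation_loss(student_feats: Sequence[Vector],
                                     teacher_feats: Sequence[Vector],
                                     student_preds: Sequence[Vector],
                                     teacher_preds: Sequence[Vector],
                                     adapter_weight: Sequence[Vector]) -> float:
    """Feature imitation weighted by per-location teacher/student prediction gaps.

    Inputs are flattened over locations: one feature vector and one class-score
    vector per location. adapter_weight has shape teacher_channels x student_channels.
    """
    adapted = conv1x1(student_feats, adapter_weight)
    # Hypothetical weight per location: mean squared difference of class scores.
    weights = [sum((t - s) ** 2 for t, s in zip(tp, sp)) / len(tp)
               for tp, sp in zip(teacher_preds, student_preds)]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # predictions agree everywhere, so no location is emphasized
    loss = sum(w * sum((a - b) ** 2 for a, b in zip(fs, ft))
               for w, fs, ft in zip(weights, adapted, teacher_feats))
    return loss / total


identity = [[1.0, 0.0], [0.0, 1.0]]
student_f = [[0.0, 0.0], [0.0, 0.0]]
teacher_f = [[1.0, 1.0], [3.0, 0.0]]
# Location 0: predictions agree; location 1: they disagree.
student_p = [[0.5], [0.0]]
teacher_p = [[0.5], [1.0]]
print(prediction_guided_imitation_loss(student_f, teacher_f, student_p, teacher_p, identity))  # 9.0
```

In this toy case the feature gap at location 0 is ignored because the predictions already agree there, while uniform imitation (a plain mean over locations) would give (2 + 9) / 2 = 5.5 and spend effort on a location that does not affect the output.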
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: 1x1 Convolution · Convolution · Feature Pyramid Network · Focal Loss · RetinaNet
