Knowledge Distillation for Object Detection via Rank Mimicking and Prediction-guided Feature Imitation

Gang Li; Xiang Li; Yujie Wang; Shanshan Zhang; Yichao Wu; Ding Liang

arXiv:2112.04840·cs.CV·December 10, 2021

TL;DR

This paper introduces two knowledge distillation techniques for object detection, Rank Mimicking and Prediction-guided Feature Imitation, reaching 40.4% mAP on MS COCO with RetinaNet and surpassing previous KD methods.

Contribution

The paper proposes two new KD methods, Rank Mimicking and Prediction-guided Feature Imitation, designed specifically for one-stage object detectors. Rank Mimicking is reported to outperform conventional soft label distillation.

Findings

01. Rank Mimicking outperforms soft label distillation.

02. Prediction-guided Feature Imitation improves accuracy by using the prediction differences between teacher and student to guide which features the student imitates.

03. The method achieves 40.4% mAP on MS COCO with RetinaNet, surpassing previous KD methods.
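
To make the first finding concrete, here is a minimal sketch of what a rank-mimicking loss could look like. It is not the authors' implementation. It assumes that the candidate boxes assigned to one ground-truth object are scored by both teacher and student, that each score list is turned into a distribution with a softmax over the candidates, and that the student is penalized by the KL divergence from the teacher's distribution. The function names and the `temperature` parameter are hypothetical.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn the candidate scores of one object into a distribution over candidates."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_mimicking_loss(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over the candidate boxes of a single object.

    Assumption: both lists score the same candidates in the same order.
    """
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical scores for three candidate boxes of one object.
teacher = [2.0, 1.0, 0.1]
agrees = [7.0, 6.0, 5.1]    # same relative scores, shifted by a constant
reversed_order = [0.1, 1.0, 2.0]

loss_agree = rank_mimicking_loss(teacher, agrees)
loss_reversed = rank_mimicking_loss(teacher, reversed_order)
```

Under these assumptions the loss depends only on how the scores of the candidates relate to each other, not on their absolute values, so `loss_agree` is essentially zero while `loss_reversed` is clearly positive. That is one way to read the contrast with soft label distillation, which matches each box's class scores independently.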

Abstract

Knowledge Distillation (KD) is a widely-used technology to inherit information from cumbersome teacher models to compact student models, consequently realizing model compression and acceleration. Compared with image classification, object detection is a more complex task, and designing specific KD methods for object detection is non-trivial. In this work, we elaborately study the behaviour difference between the teacher and student detection models, and obtain two intriguing observations: First, the teacher and student rank their detected candidate boxes quite differently, which results in their precision discrepancy. Second, there is a considerable gap between the feature response differences and prediction differences between teacher and student, indicating that equally imitating all the feature maps of the teacher is the sub-optimal choice for improving the student's accuracy. Based…
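
The second observation suggests weighting feature imitation by where teacher and student predictions disagree, rather than imitating all locations equally. The sketch below illustrates that idea and is not the authors' implementation. It assumes one class-probability vector and one feature vector per spatial location, uses the mean squared prediction difference at each location as that location's weight, and assumes teacher and student features already have the same number of channels (in practice an adaptation layer may be needed). All function names are hypothetical.

```python
def prediction_difference(teacher_probs, student_probs):
    """Per-location weight: mean squared difference of class probabilities."""
    return [
        sum((t - s) ** 2 for t, s in zip(tp, sp)) / len(tp)
        for tp, sp in zip(teacher_probs, student_probs)
    ]


def prediction_guided_imitation_loss(teacher_feats, student_feats,
                                     teacher_probs, student_probs):
    """Feature imitation loss weighted by prediction disagreement.

    Each argument is a list indexed by spatial location. Feature entries are
    channel vectors; probability entries are per-class vectors.
    """
    weights = prediction_difference(teacher_probs, student_probs)
    total = 0.0
    for w, tf, sf in zip(weights, teacher_feats, student_feats):
        # Mean squared feature error at this location.
        feat_err = sum((t - s) ** 2 for t, s in zip(tf, sf)) / len(tf)
        total += w * feat_err
    return total / len(weights)


# Two locations, two classes, two feature channels (hypothetical values).
teacher_probs = [[0.9, 0.1], [0.8, 0.2]]
student_probs = [[0.9, 0.1], [0.2, 0.8]]   # disagreement only at location 1
teacher_feats = [[1.0, 1.0], [1.0, 1.0]]
student_feats = [[0.0, 0.0], [0.0, 0.0]]

loss = prediction_guided_imitation_loss(
    teacher_feats, student_feats, teacher_probs, student_probs
)
```

In this example the feature error is the same at both locations, but only location 1 contributes to the loss because the predictions at location 0 already match. A uniform imitation loss would spend equal effort on both.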

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: 1x1 Convolution · Convolution · Feature Pyramid Network · Focal Loss · RetinaNet