PKD: General Distillation Framework for Object Detectors via Pearson Correlation Coefficient
Weihan Cao, Yifan Zhang, Jianfei Gao, Anda Cheng, Ke Cheng, Jian Cheng

TL;DR
This paper introduces a knowledge distillation method for object detectors that uses the Pearson Correlation Coefficient to transfer relational feature information from a teacher to a student more effectively, improving both accuracy and convergence speed.
Contribution
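The paper proposes a Pearson Correlation Coefficient-based feature imitation method for distillation between heterogeneous detectors. It addresses two issues in feature alignment: the magnitude mismatch between teacher and student features, and the dominance of large-magnitude FPN stages and channels in the distillation loss.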
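A short derivation shows how a squared-error loss on standardized features relates to the Pearson Correlation Coefficient. It assumes, beyond what this summary states, that a student feature vector s and a teacher feature vector t of length N are each standardized with their own mean and population standard deviation:

$$\hat{s}_i = \frac{s_i - \mu_s}{\sigma_s}, \qquad \hat{t}_i = \frac{t_i - \mu_t}{\sigma_t}, \qquad r = \frac{1}{N}\sum_{i=1}^{N} \hat{s}_i \hat{t}_i$$

$$\frac{1}{N}\sum_{i=1}^{N} (\hat{s}_i - \hat{t}_i)^2 = \frac{1}{N}\sum_{i=1}^{N} \hat{s}_i^2 + \frac{1}{N}\sum_{i=1}^{N} \hat{t}_i^2 - \frac{2}{N}\sum_{i=1}^{N} \hat{s}_i \hat{t}_i = 1 + 1 - 2r = 2(1 - r)$$

Because r does not change when t is multiplied by a positive constant, a loss of this form ignores the teacher's absolute feature magnitude. It penalizes only differences in the pattern of activations, which is consistent with the magnitude-mismatch issue described above.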
Findings
Outperforms existing detection KD methods.
Achieves higher mAP on COCO2017 with various student detectors.
Converges faster than previous methods.
Abstract
Knowledge distillation (KD) is a widely used technique to train compact models in object detection. However, there is still a lack of study on how to distill between heterogeneous detectors. In this paper, we empirically find that better FPN features from a heterogeneous teacher detector can help the student although their detection heads and label assignments are different. However, directly aligning the feature maps to distill detectors suffers from two problems. First, the difference in feature magnitude between the teacher and the student could enforce overly strict constraints on the student. Second, the FPN stages and channels with large feature magnitude from the teacher model could dominate the gradient of the distillation loss, which will overwhelm the effects of other features in KD and introduce considerable noise. To address the above issues, we propose to imitate features with Pearson…
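The two problems named in the abstract can be illustrated with a toy example. This sketch is not the authors' implementation. It assumes that each FPN channel is flattened into a list, standardized with its own mean and standard deviation, and then compared with mean squared error. The helper names (`standardize`, `pcc_feature_loss`, `plain_feature_loss`) and the `eps` value are hypothetical, and plain Python lists stand in for real feature tensors.

```python
import math


def mean_squared_error(a, b):
    """Average squared difference between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def standardize(v, eps=1e-6):
    """Shift to zero mean and scale to unit (population) standard deviation.

    eps is a hypothetical stabilizer that avoids division by zero for flat channels.
    """
    mu = sum(v) / len(v)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in v) / len(v))
    return [(x - mu) / (sigma + eps) for x in v]


def plain_feature_loss(student_channels, teacher_channels):
    """Direct feature alignment: MSE on raw values, averaged over channels."""
    losses = [mean_squared_error(s, t) for s, t in zip(student_channels, teacher_channels)]
    return sum(losses) / len(losses)


def pcc_feature_loss(student_channels, teacher_channels):
    """Hypothetical PCC-style imitation: MSE on standardized channels.

    For each channel this equals roughly 2 * (1 - r), where r is the Pearson
    correlation between the student and teacher activations.
    """
    losses = [
        mean_squared_error(standardize(s), standardize(t))
        for s, t in zip(student_channels, teacher_channels)
    ]
    return sum(losses) / len(losses)


# Toy features: two channels with four spatial positions each.
# The teacher's first channel is 10x larger but perfectly correlated with the student's.
student = [[1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 0.5, 1.0]]
teacher = [[10.0, 20.0, 30.0, 40.0], [0.5, 1.0, 0.5, 1.0]]

print(plain_feature_loss(student, teacher))  # 303.75
print(pcc_feature_loss(student, teacher))    # close to 0 (tiny residual from eps)
```

The plain loss of 303.75 comes entirely from the large-magnitude first channel, so it would dominate the gradient even though the student already matches the teacher's activation pattern. The standardized loss is near zero for the same input. In a real detector, both losses would operate on batched FPN tensors. The list-based version only keeps the example dependency-free.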
Taxonomy
Topics: Advanced Neural Network Applications · Remote-Sensing Image Classification · Advanced Image and Video Retrieval Techniques
Methods: Feature Pyramid Network · Focal Loss · 1x1 Convolution · Non Maximum Suppression · FCOS · Convolution · RetinaNet
