Distilling Object Detectors With Global Knowledge
Sanli Tang, Zhongyu Zhang, Zhanzhan Cheng, Jing Lu, Yunlu Xu, Yi Niu, and Fan He

TL;DR
This paper introduces a global knowledge distillation method for object detectors that reduces noise and improves performance by aligning representations with common basis vectors, surpassing teacher models in some cases.
Contribution
It proposes a novel global knowledge distillation framework using prototype generation and robust filtering, enhancing object detector compression beyond local knowledge methods.
Findings
Achieves state-of-the-art results on PASCAL and COCO datasets.
Surpasses teacher model performance in some cases.
Can be combined with existing methods for further gains.
Abstract
Knowledge distillation learns a lightweight student model that mimics a cumbersome teacher. Existing methods treat the knowledge as the features of individual instances or the relations between them, which is instance-level knowledge drawn only from the teacher model, i.e., local knowledge. However, empirical studies show that local knowledge is very noisy in object detection tasks, especially for blurred, occluded, or small instances. Thus, a more intrinsic approach is to measure the representations of instances w.r.t. a group of common basis vectors in the two feature spaces of the teacher and the student detectors, i.e., global knowledge. The distilling algorithm can then be applied as space alignment. To this end, a novel prototype generation module (PGM) is proposed to find the common basis vectors, dubbed prototypes, in the two feature spaces. Then, a robust distilling module (RDM) is…
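The idea of measuring instances against common basis vectors can be illustrated with a small NumPy sketch. This is not the paper's PGM or RDM: the function names, the least-squares projection, and the mean-squared-error loss are all assumptions chosen for illustration, and the prototypes here are simply given rather than generated. The sketch shows only that coefficients with respect to k prototypes are comparable across the teacher and student spaces even when the two feature dimensions differ.

```python
import numpy as np


def project_onto_prototypes(features, prototypes):
    """Least-squares coefficients of each feature w.r.t. the prototype rows.

    features:   (n, d) instance features
    prototypes: (k, d) basis vectors (hypothetical; supplied by the caller)
    returns:    (n, k) coefficients c such that features ~= c @ prototypes
    """
    # Solve prototypes.T @ c.T ~= features.T in the least-squares sense.
    coeffs_t, *_ = np.linalg.lstsq(prototypes.T, features.T, rcond=None)
    return coeffs_t.T


def global_alignment_loss(teacher_feats, student_feats,
                          teacher_protos, student_protos):
    """Mean squared difference between teacher and student coefficients.

    An assumed stand-in for "space alignment": both sets of instances are
    expressed in their own prototype basis, then the coefficients are compared.
    """
    c_teacher = project_onto_prototypes(teacher_feats, teacher_protos)
    c_student = project_onto_prototypes(student_feats, student_protos)
    return float(np.mean((c_teacher - c_student) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coeffs = rng.normal(size=(5, 2))      # 5 instances, 2 prototypes
    teacher_protos = rng.normal(size=(2, 8))  # teacher feature dim 8
    student_protos = rng.normal(size=(2, 4))  # student feature dim 4
    teacher_feats = coeffs @ teacher_protos
    student_feats = coeffs @ student_protos
    # Same coefficients in both spaces, so the loss is ~0 up to float error.
    print(global_alignment_loss(teacher_feats, student_feats,
                                teacher_protos, student_protos))
```

In this toy setup the student is perfectly aligned because both feature sets are built from the same coefficients; perturbing the student features makes the loss positive, which is the signal a distillation objective of this kind would minimize.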
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: 1x1 Convolution · Convolution · Feature Pyramid Network · Focal Loss · RetinaNet
