Instance-Conditional Knowledge Distillation for Object Detection
Zijian Kang, Peizhen Zhang, Xiangyu Zhang, Jian Sun, Nanning Zheng

TL;DR
This paper introduces an instance-conditional knowledge distillation framework for object detection that adaptively selects, for each detection instance, the teacher knowledge worth transferring. It raises RetinaNet with a ResNet-50 backbone from 37.4 to 40.7 mAP.
Contribution
It proposes a learnable conditional decoding module that dynamically retrieves instance-specific knowledge, enhancing detection accuracy beyond existing methods.
Findings
Boosts RetinaNet ResNet-50 from 37.4 to 40.7 mAP
The distilled ResNet-50 student surpasses its teacher, which uses a ResNet-101 backbone
Demonstrates effectiveness across various settings
Abstract
Knowledge distillation has shown great success in classification; however, it remains challenging for detection. In a typical image for detection, representations from different locations may contribute differently to the detection targets, making the distillation hard to balance. In this paper, we propose a conditional distillation framework to distill the desired knowledge, namely knowledge that is beneficial in terms of both classification and localization for every instance. The framework introduces a learnable conditional decoding module, which retrieves information given each target instance as a query. Specifically, we encode the condition information as the query and use the teacher's representations as the key. The attention between query and key is used to measure the contribution of different features, guided by a localization-recognition-sensitive auxiliary task. Extensive…
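The query-key weighting described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes single-head scaled dot-product attention, a softmax over spatial locations, and a weighted mean-squared-error mimicking loss, none of which the abstract specifies. The names `instance_attention` and `weighted_distill_loss` are hypothetical, and random arrays stand in for real teacher and student features.

```python
import numpy as np

def instance_attention(query, teacher_feats):
    """Weight each teacher location by its relevance to one instance.

    query:         (d,)   encoded instance condition (hypothetical encoding)
    teacher_feats: (n, d) teacher features at n spatial locations, used as keys
    Returns a (n,) vector of non-negative weights that sums to 1.
    Assumption: scaled dot-product scores followed by a softmax.
    """
    d = query.shape[-1]
    scores = teacher_feats @ query / np.sqrt(d)
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()

def weighted_distill_loss(weights, teacher_feats, student_feats):
    """Attention-weighted feature mimicking loss (assumed form: weighted MSE)."""
    per_location = ((teacher_feats - student_feats) ** 2).mean(axis=1)
    return float((weights * per_location).sum())

# Toy data: 6 spatial locations, 4 channels.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(6, 4))
student = teacher + 0.1 * rng.normal(size=(6, 4))
query = rng.normal(size=4)

w = instance_attention(query, teacher)
loss = weighted_distill_loss(w, teacher, student)
```

In this sketch, locations whose teacher features align with the instance query receive larger weights, so the student is pushed hardest to match the teacher where the attention judges the features most relevant to that instance. The auxiliary task that guides the attention in the paper is not modelled here.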
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
Methods: Feature Pyramid Network · Knowledge Distillation · Convolution · 1x1 Convolution · Focal Loss · RetinaNet
