Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection

Junfei Yi; Jianxu Mao; Tengfei Liu; Mingjie Li; Hanyu Gu; Hui Zhang; Xiaojun Chang; Yaonan Wang

arXiv:2406.06999·cs.CV·March 24, 2026

TL;DR

This paper introduces a novel knowledge distillation method for object detection that incorporates uncertainty estimation, enabling the student model to better learn latent knowledge from the teacher, leading to improved detection performance.

Contribution

The paper proposes a new distillation paradigm with uncertainty estimation using Monte Carlo dropout, enhancing knowledge transfer without complex structures or high computational costs.
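
As a rough illustration of the Monte Carlo dropout idea mentioned above, the sketch below draws several stochastic copies of a teacher feature and summarizes their spread. It is only a minimal sketch under stated assumptions: the function names (`mc_dropout_samples`, `per_element_stats`), the dropout rate, the number of samples, the use of a flat list in place of a real feature map, and the use of per-element variance as an uncertainty proxy are all hypothetical and are not taken from the paper.

```python
import random
from statistics import mean, pvariance


def mc_dropout_samples(feature, p=0.1, num_samples=5, seed=0):
    """Return `num_samples` stochastic copies of `feature` under inverted dropout.

    Hypothetical helper: each element is zeroed with probability `p`;
    surviving elements are rescaled by 1 / (1 - p).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    keep = 1.0 - p
    samples = []
    for _ in range(num_samples):
        samples.append([x / keep if rng.random() < keep else 0.0 for x in feature])
    return samples


def per_element_stats(samples):
    """Per-element mean and population variance across the sampled copies."""
    columns = list(zip(*samples))
    means = [mean(c) for c in columns]
    variances = [pvariance(c) for c in columns]
    return means, variances


# Toy stand-in for a teacher feature map (assumption: flattened to a list).
teacher_feature = [0.5, -1.2, 2.0, 0.0]
samples = mc_dropout_samples(teacher_feature, p=0.25, num_samples=8)
means, variances = per_element_stats(samples)
```

In this toy setup, elements with larger magnitude vary more across samples, while an element that is already zero has zero variance. In a real detector the same idea would presumably be applied to feature tensors with a framework's dropout layer kept active at sampling time.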

Findings

01

Achieves state-of-the-art performance on COCO with ResNet50 GFL.

02

Surpasses baseline methods by 3.9% mAP.

03

Effective across various distillation strategies and architectures.

Abstract

Knowledge distillation (KD) is a widely adopted and effective method for compressing models in object detection tasks. Particularly, feature-based distillation methods have shown remarkable performance. Existing approaches often ignore the uncertainty in the teacher model's knowledge, which stems from data noise and imperfect training. This limits the student model's ability to learn latent knowledge, as it may overly rely on the teacher's imperfect guidance. In this paper, we propose a novel feature-based distillation paradigm with knowledge uncertainty for object detection, termed "Uncertainty Estimation-Discriminative Knowledge Extraction-Knowledge Transfer (UET)", which can seamlessly integrate with existing distillation methods. By leveraging the Monte Carlo dropout technique, we introduce knowledge uncertainty into the training process of the student model, facilitating deeper…
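
One way to picture how such a paradigm could plug into an existing feature-based distillation loss is sketched below. This is an assumption-laden illustration, not the paper's formulation: the function name `uncertainty_aware_loss`, the equal weighting of all targets, and the use of mean squared error as the base loss are hypothetical. The only detail taken from this page is that the original teacher feature is kept alongside the dropout-perturbed ones, as one reviewer notes.

```python
def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def uncertainty_aware_loss(student, teacher, perturbed_teachers, base_loss=mse):
    """Average a base distillation loss over the original teacher feature
    and its dropout-perturbed copies.

    Assumption: all targets are weighted equally. `base_loss` is pluggable,
    standing in for whatever feature-based distillation loss is already used.
    """
    targets = [teacher] + list(perturbed_teachers)
    return sum(base_loss(student, t) for t in targets) / len(targets)


# Toy values (hypothetical): the student matches the clean teacher exactly,
# so all of the loss comes from the two perturbed copies.
student = [1.0, 2.0]
teacher = [1.0, 2.0]
perturbed = [[0.0, 2.0], [1.0, 0.0]]
loss = uncertainty_aware_loss(student, teacher, perturbed)
```

With no perturbed copies the function reduces to the base loss, which is the sense in which a wrapper like this leaves the underlying distillation method unchanged.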

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 5 · Confidence 4

Strengths

1. The motivation is convincing, i.e., multiple teachers can provide more diverse and informative supervision to the student.
2. UET can be seamlessly integrated with existing KD methods and achieves a new state-of-the-art result with FGD [1] on COCO.
3. UET is successfully extended to classification and semantic segmentation tasks.

[1] Focal and Global Knowledge Distillation for Detectors, 2022, CVPR.

Weaknesses

1. The idea is similar to [1], which weakens the overall novelty of this paper. The major difference from [1] is that UET also includes the original teacher in the distillation process. However, an ablation study with and without the original teacher is not given, and the direct comparison with [1] is limited (only Table 3).
[1] Avatar Knowledge Distillation: Self-ensemble Teacher Paradigm with Uncertainty, 2023, MM.
2. The superior results in Table 1 and Table 2 are obtained with FGD, which is

Reviewer 02 · Rating 6 · Confidence 4

Strengths

I think the paper has the following strengths:
1) The idea is extremely simple and seems to work okay in practice. In general, I tend to like papers that find gaps in the literature and are obvious in hindsight. So, while it can look like A (knowledge distillation) + B (MC-dropout) = C (better results), I actually think that this is a strength rather than a weakness of the paper, and I am quite happy to see that at times simple ideas can outperform more complex ones.
2) The paper is relatively well writ

Weaknesses

I think the paper has the following weaknesses:
1) While the paper is relatively well-written, it still has some writing issues:
a) I feel like the writing in the Method part is slightly obfuscated and gives the feeling that the method is more complex than it is. For example, I find Equations 2 and 3 quite superfluous, and the writer could have gone directly to Equation 4 with minimal changes. I think that the authors should be proud of their simple idea that works well, instead of making it

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- The paper introduces uncertainty estimation into object detection KD, which is kind of novel in my view.
- The proposed UET achieves SOTA performance on the COCO dataset.

Weaknesses

- A minor weakness, I think, is that Monte Carlo dropout needs to sample several times, which may slow down the training process.

Taxonomy

Methods: Monte Carlo Dropout · Dropout