TL;DR
This paper introduces a novel knowledge distillation method for object detection that incorporates uncertainty estimation, enabling the student model to better learn latent knowledge from the teacher, leading to improved detection performance.
Contribution
The paper proposes a new distillation paradigm with uncertainty estimation using Monte Carlo dropout, enhancing knowledge transfer without complex structures or high computational costs.
Findings
Achieves state-of-the-art performance on COCO with ResNet50 GFL.
Surpasses baseline methods by 3.9% mAP.
Effective across various distillation strategies and architectures.
Abstract
Knowledge distillation (KD) is a widely adopted and effective method for compressing models in object detection tasks. Particularly, feature-based distillation methods have shown remarkable performance. Existing approaches often ignore the uncertainty in the teacher model's knowledge, which stems from data noise and imperfect training. This limits the student model's ability to learn latent knowledge, as it may overly rely on the teacher's imperfect guidance. In this paper, we propose a novel feature-based distillation paradigm with knowledge uncertainty for object detection, termed "Uncertainty Estimation-Discriminative Knowledge Extraction-Knowledge Transfer (UET)", which can seamlessly integrate with existing distillation methods. By leveraging the Monte Carlo dropout technique, we introduce knowledge uncertainty into the training process of the student model, facilitating deeper…
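The abstract is cut off before it describes the method, so the following is only a minimal sketch of the general idea and not the authors' implementation. It applies Monte Carlo dropout to a teacher feature vector several times, estimates a per-element mean and variance from the perturbed copies, and averages a feature-imitation loss over the original teacher features and each copy. All function names, the dropout rate `p`, the sample count `num_samples`, and the choice of plain MSE as the imitation loss are assumptions made for illustration.

```python
import random
from statistics import fmean, pvariance


def dropout(features, p, rng):
    """Inverted dropout: zero each element with probability p, rescale survivors."""
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in features]


def mc_dropout_samples(teacher_features, p=0.1, num_samples=5, seed=0):
    """Draw several stochastic copies of the teacher features (hypothetical helper)."""
    rng = random.Random(seed)
    return [dropout(teacher_features, p, rng) for _ in range(num_samples)]


def mean_and_variance(samples):
    """Per-element mean and variance across samples; the variance serves as an uncertainty proxy."""
    columns = list(zip(*samples))
    return [fmean(c) for c in columns], [pvariance(c) for c in columns]


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return fmean((x - y) ** 2 for x, y in zip(a, b))


def distillation_loss(student_features, teacher_features, samples):
    """Average MSE against the original teacher features and every perturbed copy (assumed loss)."""
    targets = [teacher_features] + samples
    return fmean(mse(student_features, t) for t in targets)


if __name__ == "__main__":
    teacher = [0.5, 1.0, -2.0, 3.0]
    student = [0.4, 1.1, -1.8, 2.5]
    samples = mc_dropout_samples(teacher, p=0.2, num_samples=8)
    mean, var = mean_and_variance(samples)
    print("mean:", mean)
    print("variance:", var)
    print("loss:", distillation_loss(student, teacher, samples))
```

In this sketch the extra cost grows linearly with `num_samples`, since each sample adds one more perturbed target to compare against.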
Peer Reviews
Decision · Submitted to ICLR 2025
1. The motivation is convincing, i.e., multiple teachers can provide more diverse and informative supervision to the student. 2. UET can be seamlessly integrated with existing KD methods and achieves a new state-of-the-art result with FGD [1] on COCO. 3. UET is successfully extended to classification and semantic segmentation tasks. [1] Focal and Global Knowledge Distillation for Detectors, 2022, CVPR.
1. The idea is similar to [1], which weakens the overall novelty of this paper; the major difference from [1] is that UET also includes the original teacher in the distillation process. However, an ablation study with and without the original teacher is not given, and the direct comparison with [1] is limited (only Table 3). [1] Avatar Knowledge Distillation: Self-ensemble Teacher Paradigm with Uncertainty, 2023, MM. 2. The superior results in Table 1 and Table 2 are obtained with FGD, which is
I think the paper has the following strengths: 1) The idea is extremely simple, and seems to work okay in practice. In general, I tend to like papers that find gaps in the literature and are obvious in hindsight. So, while it can look like A (knowledge distillation) + B (MC-dropout) = C (better results), I actually think that this is a strength rather than a weakness of the paper, and am quite happy to see that at times simple ideas can outperform more complex ones. 2) The paper is relatively well writ
I think the paper has the following weaknesses: 1) While the paper is relatively well-written, it still has some writing issues: a) I feel like the writing in the Method part is slightly obfuscated and gives the feeling that the method is more complex than it is. For example, I find equations 2 and 3 quite superfluous, and the authors could have gone directly to equation 4 with minimal changes. I think that the authors should be proud of their simple idea that works well, instead of making it
- The paper introduces uncertainty estimation into object detection KD, which is kind of novel in my view.
- The proposed UET achieves SOTA performance on the COCO dataset.
- A minor weakness, I think, is that Monte Carlo dropout needs to sample several times, which may slow down the training process.
Taxonomy
Methods: Monte Carlo Dropout · Dropout
