D$^3$ETR: Decoder Distillation for Detection Transformer
Xiaokang Chen, Jiahui Chen, Yan Liu, Gang Zeng

TL;DR
This paper introduces D$^3$ETR, a knowledge distillation method for DETR-based detectors that uses MixMatcher to align teacher and student decoder outputs, substantially improving the student detector's accuracy.
Contribution
The paper proposes MixMatcher, which aligns the decoder outputs of DETR-based teachers and students. On top of it, the paper builds D$^3$ETR, a framework that distills both decoder predictions and attention maps to improve the student's detection accuracy.
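To make the prediction-and-attention distillation more concrete, here is a minimal NumPy sketch of a per-layer loss over teacher-student query pairs that have already been matched. The specific terms are assumptions, not the paper's stated losses. It uses the KL divergence from the teacher's class distribution to the student's, an L1 distance between boxes, and a mean-squared error between the matched queries' cross-attention maps (assumed to be flattened to shape `(N, HW)`). The function name `decoder_kd_loss`, the dict keys, and the weights are all hypothetical.

```python
import numpy as np


def log_softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable row-wise log-softmax over the class dimension."""
    z = x - x.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def decoder_kd_loss(s, t, s_idx, t_idx, w_cls=1.0, w_box=5.0, w_attn=1.0):
    """Hypothetical distillation loss for ONE decoder layer.

    s, t: dicts with 'logits' (N, C), 'boxes' (N, 4) and 'cross_attn' (N, HW)
    for the student and the teacher. s_idx[k] and t_idx[k] form the k-th
    matched (student query, teacher query) pair.
    """
    # Classification: KL(teacher || student) on matched class distributions.
    log_p_s = log_softmax(s["logits"][s_idx])
    log_p_t = log_softmax(t["logits"][t_idx])
    kl = (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=-1).mean()

    # Boxes: L1 distance per matched pair, averaged over pairs.
    box = np.abs(s["boxes"][s_idx] - t["boxes"][t_idx]).sum(axis=-1).mean()

    # Attention maps: MSE between the matched queries' cross-attention rows.
    attn = ((s["cross_attn"][s_idx] - t["cross_attn"][t_idx]) ** 2).mean()

    return w_cls * kl + w_box * box + w_attn * attn
```

Because the loss only indexes rows by the supplied pairs, the same function works with any matching strategy that produces student and teacher query indices.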
Findings
D$^3$ETR improves Conditional DETR-R50-C5 by 7.8 mAP under a 12-epoch training schedule.
D$^3$ETR enhances performance across various DETR-based detectors.
The method effectively distills knowledge from teacher to student in transformer decoders.
Abstract
While various knowledge distillation (KD) methods in CNN-based detectors show their effectiveness in improving small students, the baselines and recipes for DETR-based detectors are yet to be built. In this paper, we focus on the transformer decoder of DETR-based detectors and explore KD methods for them. The outputs of the transformer decoder lie in random order, which gives no direct correspondence between the predictions of the teacher and the student, thus posing a challenge for knowledge distillation. To this end, we propose MixMatcher to align the decoder outputs of DETR-based teachers and students, which mixes two teacher-student matching strategies, i.e., Adaptive Matching and Fixed Matching. Specifically, Adaptive Matching applies bipartite matching to adaptively match the outputs of the teacher and the student in each decoder layer, while Fixed Matching fixes the…
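The Adaptive Matching step described in the abstract can be sketched as follows. This is a minimal NumPy/SciPy illustration, not the authors' implementation. The matching cost (negative class-probability agreement plus a weighted L1 box distance), the weights `w_cls` and `w_box`, and the helper names `adaptive_match` and `adaptive_matching_all_layers` are all hypothetical. The bipartite matching itself is solved with SciPy's `linear_sum_assignment` and is run independently in each decoder layer. Fixed Matching is not sketched because its description is cut off above.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable row-wise softmax over the class dimension."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def adaptive_match(student_logits, student_boxes, teacher_logits, teacher_boxes,
                   w_cls=1.0, w_box=5.0):
    """Bipartite-match student queries to teacher queries for ONE decoder layer.

    Hypothetical cost: -(p_s . p_t) plus a weighted L1 box distance.
    Returns (student_idx, teacher_idx), the index pairs of matched queries.
    """
    p_s = softmax(student_logits)                      # (Ns, C)
    p_t = softmax(teacher_logits)                      # (Nt, C)
    cls_cost = -p_s @ p_t.T                            # (Ns, Nt): more agreement -> lower cost
    box_cost = np.abs(
        student_boxes[:, None, :] - teacher_boxes[None, :, :]
    ).sum(axis=-1)                                     # (Ns, Nt): L1 over 4 box coords
    cost = w_cls * cls_cost + w_box * box_cost
    return linear_sum_assignment(cost)                 # minimum-total-cost assignment


def adaptive_matching_all_layers(student_layers, teacher_layers):
    """Match independently in every decoder layer (one dict per layer)."""
    return [
        adaptive_match(s["logits"], s["boxes"], t["logits"], t["boxes"])
        for s, t in zip(student_layers, teacher_layers)
    ]
```

For example, if the student's queries are a reordering of the teacher's, the matching recovers that reordering, which is exactly the correspondence that the random output order of the decoder otherwise hides. Re-matching in every layer lets each student layer pair with whichever teacher query currently predicts the most similar object.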
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications
Methods: Knowledge Distillation · ALIGN
