D$^3$ETR: Decoder Distillation for Detection Transformer

Xiaokang Chen; Jiahui Chen; Yan Liu; Gang Zeng

arXiv:2211.09768 · cs.CV · November 18, 2022 · 5 cites

TL;DR

This paper introduces D$^3$ETR, a novel knowledge distillation method for DETR-based detectors that aligns decoder outputs using MixMatcher, significantly enhancing student detector performance.

Contribution

It proposes MixMatcher for aligning DETR decoder outputs and introduces D$^3$ETR, a distillation framework for decoder predictions and attention maps, improving detection accuracy.

Findings

01 D$^3$ETR improves Conditional DETR-R50-C5 by 7.8 mAP under a 12-epoch training schedule.

02 D$^3$ETR improves performance across various DETR-based detectors.

03 The method distills knowledge from teacher to student at the transformer decoder, covering both decoder predictions and attention maps.

Abstract

While various knowledge distillation (KD) methods in CNN-based detectors show their effectiveness in improving small students, the baselines and recipes for DETR-based detectors are yet to be built. In this paper, we focus on the transformer decoder of DETR-based detectors and explore KD methods for them. The outputs of the transformer decoder lie in random order, which gives no direct correspondence between the predictions of the teacher and the student, thus posing a challenge for knowledge distillation. To this end, we propose MixMatcher to align the decoder outputs of DETR-based teachers and students, which mixes two teacher-student matching strategies, i.e., Adaptive Matching and Fixed Matching. Specifically, Adaptive Matching applies bipartite matching to adaptively match the outputs of the teacher and the student in each decoder layer, while Fixed Matching fixes the…
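The Adaptive Matching idea described above, bipartite matching between teacher and student decoder outputs, can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and not taken from the paper: the helper names (`l1_distance`, `adaptive_match`), the use of a plain L1 box distance as the matching cost, the equal number of teacher and student queries, and the brute-force search over permutations (practical only for a handful of queries; a Hungarian solver such as `scipy.optimize.linear_sum_assignment` would normally be used instead). Fixed Matching is not sketched.

```python
from itertools import permutations


def l1_distance(a, b):
    """L1 distance between two boxes given as (cx, cy, w, h) tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def adaptive_match(teacher_boxes, student_boxes):
    """Brute-force bipartite matching (illustrative, hypothetical helper).

    Returns a list `match` where match[i] is the index of the student
    prediction assigned to teacher prediction i, chosen so that the
    total L1 cost over all pairs is minimal.
    Assumes both lists have the same, small length.
    """
    n = len(teacher_boxes)
    # cost[i][j]: cost of pairing teacher output i with student output j
    cost = [[l1_distance(t, s) for s in student_boxes] for t in teacher_boxes]
    # Enumerate every one-to-one assignment and keep the cheapest one.
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    return list(best)


# Toy decoder outputs: the student predicts roughly the same boxes as the
# teacher, but in a different order, so index-wise comparison would be wrong.
teacher_boxes = [
    (0.10, 0.10, 0.20, 0.20),
    (0.50, 0.50, 0.30, 0.30),
    (0.80, 0.20, 0.10, 0.40),
]
student_boxes = [
    (0.52, 0.48, 0.30, 0.28),
    (0.79, 0.22, 0.10, 0.40),
    (0.12, 0.10, 0.20, 0.20),
]

match = adaptive_match(teacher_boxes, student_boxes)
for t_idx, s_idx in enumerate(match):
    print(f"teacher {t_idx} <-> student {s_idx}")
```

Once the pairs are known, a distillation loss can be computed between each teacher output and its matched student output rather than between outputs that merely share an index.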

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications

Methods: Knowledge Distillation · ALIGN