Knowledge Distillation via Query Selection for Detection Transformer
Yi Liu, Luting Wang, Zongheng Tang, Yue Liao, Yifan Sun, Lijun Zhang, and Si Liu

TL;DR
This paper proposes a knowledge distillation method for DETR object detectors that adds hard-negative query selection and specialized distillation components, improving the accuracy of smaller student detectors distilled from larger teachers.
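To make "query selection" concrete, here is a minimal Python sketch of a distillation loss restricted to a chosen subset of decoder queries. It is not the paper's AGFD or LAPD loss. The plain mean-squared error, the function name `query_distill_loss`, and the assumption that student and teacher query outputs are already aligned index by index are all hypothetical simplifications.

```python
from typing import Sequence


def query_distill_loss(student: Sequence[Sequence[float]],
                       teacher: Sequence[Sequence[float]],
                       selected: Sequence[int]) -> float:
    """Mean squared error between student and teacher query outputs,
    averaged over the selected query indices only.

    Assumption (hypothetical): query i of the student corresponds to
    query i of the teacher, and each output is a flat feature vector.
    """
    if not selected:
        return 0.0  # nothing selected, nothing to distill
    total = 0.0
    for q in selected:
        s, t = student[q], teacher[q]
        # per-query MSE over the feature dimension
        total += sum((si - ti) ** 2 for si, ti in zip(s, t)) / len(s)
    return total / len(selected)
```

Passing only the bipartite-matched indices in `selected` corresponds to positive-only distillation; adding hard-negative indices is the kind of extension the paper argues for.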
Contribution
It introduces a Group Query Selection strategy and the QSKD framework with AGFD and LAPD components for effective DETR model compression.
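The Contribution line does not spell out how Group Query Selection forms its groups. One plausible reading, sketched below in Python as an assumption rather than the paper's exact procedure, is to bucket every decoder query by its best Generalized IoU (GIoU) against the ground-truth boxes. This separates queries that overlap objects from queries that sit on background. The thresholds, the `(x1, y1, x2, y2)` box format, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), positive area assumed


def giou(a: Box, b: Box) -> float:
    """Generalized IoU of two axis-aligned boxes, in the range [-1, 1]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    # area of the smallest box enclosing both a and b
    enclose = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (enclose - union) / enclose


def group_queries(pred_boxes: Sequence[Box], gt_boxes: Sequence[Box],
                  thresholds: Sequence[float] = (0.5, 0.0)) -> List[List[int]]:
    """Assign each query index to the first group whose (descending,
    hypothetical) GIoU threshold its best GT overlap reaches; queries
    below every threshold go to the last group."""
    groups: List[List[int]] = [[] for _ in range(len(thresholds) + 1)]
    for qi, pb in enumerate(pred_boxes):
        # -1.0 is the lowest possible GIoU, used when there are no GT boxes
        best = max((giou(pb, gb) for gb in gt_boxes), default=-1.0)
        for gi, t in enumerate(thresholds):
            if best >= t:
                groups[gi].append(qi)
                break
        else:
            groups[-1].append(qi)
    return groups
```

With a single 10×10 ground-truth box, an exactly aligned prediction lands in the first group, a half-overlapping box (GIoU = 1/3) in the middle group, and a distant box (GIoU = -7/9) in the last group.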
Findings
Significant AP improvement on the MS-COCO dataset.
Enhanced performance of DETR architectures with minimal computational overhead.
AP of Conditional DETR ResNet-18 increased from 35.8 to 39.9.
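For scale, the Conditional DETR ResNet-18 result corresponds to the following absolute and relative gains (simple arithmetic on the two reported AP values):

```latex
\Delta\mathrm{AP} = 39.9 - 35.8 = 4.1,
\qquad
\frac{\Delta\mathrm{AP}}{35.8} = \frac{4.1}{35.8} \approx 11.5\%
```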
Abstract
Transformers have revolutionized the object detection landscape by introducing DETRs, acclaimed for their simplicity and efficacy. Despite their advantages, the substantial size of these models poses significant challenges for practical deployment, particularly in resource-constrained environments. This paper addresses the challenge of compressing DETR by leveraging knowledge distillation, a technique that holds promise for maintaining model performance while reducing size. A critical aspect of DETRs' performance is their reliance on queries to interpret object representations accurately. Traditional distillation methods often focus exclusively on positive queries, identified through bipartite matching, neglecting the rich information present in hard-negative queries. Our visual analysis indicates that hard-negative queries, focusing on foreground elements, are crucial for enhancing…
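The abstract notes that conventional DETR distillation keeps only the positive queries chosen by bipartite matching. The Python sketch below illustrates that split on a tiny, made-up cost matrix (rows are ground-truth objects, columns are queries). DETR solves this matching with the Hungarian algorithm, for example via `scipy.optimize.linear_sum_assignment`; here a brute-force search over permutations stands in for it, which is only practical for toy sizes. The function names and cost values are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def match_queries(cost: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Minimum-cost one-to-one assignment of GT objects (rows) to queries
    (columns). Brute force over permutations: illustration only."""
    num_gt = len(cost)
    if num_gt == 0:
        return []
    num_q = len(cost[0])
    if num_gt > num_q:
        raise ValueError("DETR-style matching assumes at least as many queries as objects")
    best_total, best = float("inf"), ()
    # each candidate picks a distinct query for every GT object
    for qs in permutations(range(num_q), num_gt):
        total = sum(cost[g][q] for g, q in enumerate(qs))
        if total < best_total:
            best_total, best = total, qs
    return [(g, q) for g, q in enumerate(best)]


def split_queries(cost: Sequence[Sequence[float]],
                  num_queries: int) -> Tuple[List[int], List[int]]:
    """Return (positive, negative) query indices: positives are the matched
    queries, negatives are everything a positive-only method discards."""
    positives = sorted(q for _, q in match_queries(cost))
    negatives = [q for q in range(num_queries) if q not in positives]
    return positives, negatives


# 2 ground-truth objects, 4 queries (hypothetical matching costs)
cost = [[0.9, 0.1, 0.8, 0.7],
        [0.2, 0.3, 0.9, 0.6]]
print(split_queries(cost, 4))
```

Queries 2 and 3 end up as negatives here; the paper's argument is that some such unmatched queries, the hard negatives that still focus on foreground, carry information that positive-only distillation discards.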
Taxonomy
Topics: Neural Networks and Applications
Methods: Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Layer Normalization · Dropout · Attention Is All You Need · Position-Wise Feed-Forward Layer · Residual Connection · Linear Layer
