Knowledge Distillation via Query Selection for Detection Transformer

Yi Liu, Luting Wang, Zongheng Tang, Yue Liao, Yifan Sun, Lijun Zhang, and Si Liu

arXiv:2409.06443·cs.CV·September 11, 2024

TL;DR

This paper proposes a knowledge distillation method for DETR object detectors that uses hard-negative query selection and specialized distillation techniques to improve the accuracy of compressed, smaller models.

Contribution

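The paper introduces a Group Query Selection strategy and the QSKD (Knowledge Distillation via Query Selection for DETR) framework, whose Attention-Guided Feature Distillation (AGFD) and Local Alignment Prediction Distillation (LAPD) components target effective DETR model compression.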

Findings

01

Significant AP improvement on the MS-COCO dataset.

02

Enhanced performance of DETR architectures with minimal computational overhead.

03

AP of Conditional DETR ResNet-18 increased from 35.8 to 39.9.

Abstract

Transformers have revolutionized the object detection landscape by introducing DETRs, acclaimed for their simplicity and efficacy. Despite their advantages, the substantial size of these models poses significant challenges for practical deployment, particularly in resource-constrained environments. This paper addresses the challenge of compressing DETR by leveraging knowledge distillation, a technique that holds promise for maintaining model performance while reducing size. A critical aspect of DETRs' performance is their reliance on queries to interpret object representations accurately. Traditional distillation methods often focus exclusively on positive queries, identified through bipartite matching, neglecting the rich information present in hard-negative queries. Our visual analysis indicates that hard-negative queries, focusing on foreground elements, are crucial for enhancing…
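To make the distinction between positive and hard-negative queries concrete, here is a minimal sketch. The abstract is truncated above, so the selection rule used here (an unmatched query counts as a hard negative when its box overlaps some ground-truth box above an IoU threshold) is an assumption for illustration, not the paper's Group Query Selection procedure. The names `box_iou`, `split_queries`, and `iou_thresh` are hypothetical, and the positive indices are assumed to come from a bipartite matching step computed elsewhere.

```python
# Illustrative sketch only: the IoU-threshold rule and all names below are
# assumptions, not the paper's Group Query Selection procedure.

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def split_queries(query_boxes, gt_boxes, positive_ids, iou_thresh=0.5):
    """Split query indices into positive, hard-negative, and easy-negative.

    positive_ids: indices already matched to ground truth by bipartite
    matching (assumed to be computed elsewhere).
    """
    positives = set(positive_ids)
    hard_neg, easy_neg = [], []
    for i, q in enumerate(query_boxes):
        if i in positives:
            continue
        # Best overlap of this unmatched query with any ground-truth box.
        best = max((box_iou(q, g) for g in gt_boxes), default=0.0)
        (hard_neg if best >= iou_thresh else easy_neg).append(i)
    return sorted(positives), hard_neg, easy_neg


gt = [(10.0, 10.0, 50.0, 50.0)]
queries = [
    (10.0, 10.0, 50.0, 50.0),  # matched prediction (positive)
    (12.0, 12.0, 52.0, 52.0),  # unmatched, but overlaps the object
    (80.0, 80.0, 90.0, 90.0),  # unmatched, background only
]
pos, hard, easy = split_queries(queries, gt, positive_ids=[0])
```

In this toy case query 0 is the positive, query 1 lands in the hard-negative group because it still covers the foreground object, and query 2 is an easy negative. A distillation method that uses only positives would discard query 1, which is the gap the abstract points to.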

Taxonomy

Topics: Neural Networks and Applications

Methods: Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Layer Normalization · Dropout · Attention Is All You Need · Position-Wise Feed-Forward Layer · Residual Connection · Linear Layer