OD-DETR: Online Distillation for Stabilizing Training of Detection Transformer

Shengjian Wu; Li Sun; Qingli Li

arXiv:2406.05791·cs.CV·June 11, 2024

Open Access

TL;DR

OD-DETR introduces an online distillation method that stabilizes training and enhances performance of DETR models by leveraging a teacher model's knowledge without increasing parameters.

Contribution

The paper proposes a novel online distillation approach that improves DETR training stability and accuracy by utilizing teacher-guided query matching and multi-stage query distillation.

Findings

1. Training stability of DETR is significantly improved.
2. Model performance increases without additional parameters.
3. Training convergence is accelerated with the proposed method.

Abstract

DEtection TRansformer (DETR) has become a dominant paradigm, mainly due to its common architecture with high accuracy and no post-processing. However, DETR suffers from unstable training dynamics: it consumes more data and epochs to converge than CNN-based detectors. This paper aims to stabilize DETR training through online distillation. It utilizes a teacher model, accumulated by Exponential Moving Average (EMA), and distills its knowledge into the online model in the following three aspects. First, the matching relation between object queries and ground truth (GT) boxes in the teacher is employed to guide the student, so queries within the student are not only assigned labels based on their own predictions, but also refer to the matching results from the teacher. Second, the teacher's initial query is given to the online student, and its prediction is directly constrained by the…
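
The abstract describes the teacher as a model "accumulated by Exponential Moving Average (EMA)" of the online student. A minimal sketch of such an update follows, assuming parameters are stored as a plain dictionary of floats. The function name `ema_update`, the parameter names, and the decay values are illustrative assumptions, not taken from the paper.

```python
def ema_update(teacher, student, decay=0.999):
    """Blend the student's parameters into the teacher in place.

    Hypothetical helper: `teacher` and `student` are dicts mapping a
    parameter name to a float. The default decay is an assumed value,
    not one reported by the paper.
    """
    for name, student_value in student.items():
        # The teacher keeps `decay` of its old value and takes the
        # remaining (1 - decay) from the current student value.
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student_value
    return teacher


# Toy usage: the student is held fixed so the teacher's drift is easy to follow.
teacher = {"w": 0.0, "b": 1.0}
student = {"w": 1.0, "b": 1.0}
for _ in range(3):
    ema_update(teacher, student, decay=0.5)

print(teacher)  # {'w': 0.875, 'b': 1.0}
```

Because the teacher is only a running average of the student's weights, it adds no trainable parameters, which is consistent with the "without increasing parameters" claim in the TL;DR. In practice the decay is usually set close to 1 so that the teacher changes slowly.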

Taxonomy

Topics: Fault Detection and Control Systems

Methods: Attention Is All You Need · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Adam · Linear Layer · Multi-Head Attention · Position-Wise Feed-Forward Layer