Align-DETR: Enhancing End-to-end Object Detection with Aligned Loss

Zhi Cai; Songtao Liu; Guodong Wang; Zheng Ge; Xiangyu Zhang; Di Huang

arXiv:2304.07527 · cs.CV · December 24, 2024 · 20 cites

TL;DR

Align-DETR introduces Align Loss, a loss function that addresses classification-regression misalignment and cross-layer target misalignment in DETR, improving convergence and detection accuracy in end-to-end object detection.

Contribution

The paper proposes Align Loss to resolve classification-regression and cross-layer misalignments, enhancing DETR's performance and robustness with a joint quality metric and intermediate layer supervision.

Findings

01 Achieves 50.5% AP in the 1x setting, surpassing previous methods.

02 Sets a new state of the art in object detection.

03 Improves convergence stability and detection accuracy.

Abstract

DETR has set up a simple end-to-end pipeline for object detection by formulating this task as a set prediction problem, showing promising potential. Despite its notable advancements, this paper identifies two key forms of misalignment within the model: classification-regression misalignment and cross-layer target misalignment. Both issues impede DETR's convergence and degrade its overall performance. To tackle both issues simultaneously, we introduce a novel loss function, termed Align Loss, designed to resolve the discrepancy between the two tasks. Align Loss guides the optimization of DETR through a joint quality metric, strengthening the connection between classification and regression. Furthermore, it incorporates an exponential down-weighting term to facilitate a smooth transition from positive to negative samples. Align-DETR also employs many-to-one matching for supervision of…
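The abstract names two ingredients of Align Loss, a joint quality metric that ties classification to regression and an exponential down-weighting term, but gives no formulas. The following is a minimal Python sketch of how such a loss could look for a single matched prediction, not the paper's implementation. The geometric-mean form of the target, the rank-based decay, and every name and default value (`joint_quality`, `rank_weight`, `aligned_bce`, `alpha`, `tau`) are assumptions made for illustration.

```python
import math


def joint_quality(score: float, iou: float, alpha: float = 0.25) -> float:
    """Hypothetical joint quality metric: a weighted geometric mean of the
    classification score and the box IoU. The form is an assumption."""
    return (score ** alpha) * (iou ** (1.0 - alpha))


def rank_weight(rank: int, tau: float = 1.5) -> float:
    """Hypothetical exponential down-weighting. rank 0 is the best match for
    a ground-truth object; later ranks decay smoothly toward 0."""
    return math.exp(-rank / tau)


def aligned_bce(score: float, iou: float, rank: int,
                alpha: float = 0.25, tau: float = 1.5,
                eps: float = 1e-8) -> float:
    """Binary cross-entropy against a soft target built from the joint
    quality metric and scaled by the rank weight (assumed combination)."""
    target = rank_weight(rank, tau) * joint_quality(score, iou, alpha)
    # Clamp the score so both logarithms stay finite.
    score = min(max(score, eps), 1.0 - eps)
    return -(target * math.log(score)
             + (1.0 - target) * math.log(1.0 - score))


if __name__ == "__main__":
    # Three candidates matched to the same object, ordered by rank.
    for rank in range(3):
        target = rank_weight(rank) * joint_quality(0.9, 0.8)
        loss = aligned_bce(0.9, 0.8, rank)
        print(f"rank={rank} target={target:.3f} loss={loss:.3f}")
```

In this sketch the soft target shrinks as the rank grows, so lower-ranked candidates are treated progressively more like negatives instead of switching abruptly from positive to negative.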

Code & Models

Repositories

felixcaae/aligndetr (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Attention Is All You Need · Vision Transformer · Linear Layer · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing · Dropout · Absolute Position Encodings · Residual Connection