D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement

Yansong Peng; Hebei Li; Peixi Wu; Yueyi Zhang; Xiaoyan Sun; Feng Wu

arXiv:2410.13842 · cs.CV · October 18, 2024 · 22 cites

TL;DR

D-FINE recasts bounding box regression in DETR models as iterative refinement of probability distributions, improving localization accuracy while retaining real-time speed.

Contribution

The paper proposes D-FINE, which redefines bounding box regression as fine-grained distribution refinement (FDR) and adds a localization self-distillation strategy (GO-LSD), achieving state-of-the-art accuracy among real-time detectors.

Findings

01. D-FINE-L / X achieves 54.0% / 55.8% AP on COCO at 124 / 78 FPS on an NVIDIA T4 GPU.

02. Pretraining on Objects365 raises AP to 57.1% / 59.3%.

03. The method improves a range of DETR models by up to 5.3% AP with minimal extra cost.

Abstract

We introduce D-FINE, a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. D-FINE comprises two key components: Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD). FDR transforms the regression process from predicting fixed coordinates to iteratively refining probability distributions, providing a fine-grained intermediate representation that significantly enhances localization accuracy. GO-LSD is a bidirectional optimization strategy that transfers localization knowledge from refined distributions to shallower layers through self-distillation, while also simplifying the residual prediction tasks for deeper layers. Additionally, D-FINE incorporates lightweight optimizations in computationally intensive modules and operations, achieving a…
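The two ideas named in the abstract can be illustrated with a toy sketch. It is not the paper's implementation: the five uniformly spaced offset bins, the residual logit values, and all function names below are assumptions chosen for illustration. Each decoder layer adds residual logits to the previous layer's logits for one box edge, the edge offset is decoded as the expectation over bins, and a KL term pulls earlier-layer distributions toward the final layer's distribution in the spirit of self-distillation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_offset(probs, bin_values):
    """Decode an edge offset as the expectation over discrete bins."""
    return sum(p * v for p, v in zip(probs, bin_values))

def refine(logits, residual_logits):
    """One refinement step: add a layer's residual logits to the running logits."""
    return [a + b for a, b in zip(logits, residual_logits)]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical setup: 5 uniformly spaced offset bins for a single box edge.
bin_values = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Hypothetical residual logits predicted by two successive decoder layers.
residuals = [
    [0.0, 0.5, 1.0, 0.5, 0.0],
    [0.0, 0.0, 0.5, 1.0, 0.0],
]

logits = [0.0] * len(bin_values)  # start from a uniform distribution
per_layer_probs = []
for r in residuals:
    logits = refine(logits, r)
    per_layer_probs.append(softmax(logits))

# Decoded offset after each layer.
offsets = [expected_offset(p, bin_values) for p in per_layer_probs]

# Self-distillation sketch: the last layer's distribution acts as the teacher
# for every earlier layer.
teacher = per_layer_probs[-1]
distill_losses = [kl_divergence(teacher, p) for p in per_layer_probs[:-1]]

print(offsets, distill_losses)
```

Working with logits rather than a single coordinate means each layer only has to predict a correction to a distribution, and the final distribution gives the earlier layers a soft target to match.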
