D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement
Yansong Peng, Hebei Li, Peixi Wu, Yueyi Zhang, Xiaoyan Sun, Feng Wu

TL;DR
D-FINE recasts bounding box regression in DETR models as the iterative refinement of probability distributions, which significantly improves localization accuracy while keeping real-time speed.
Contribution
The paper proposes D-FINE, which redefines bounding box regression as Fine-grained Distribution Refinement (FDR) and adds Global Optimal Localization Self-Distillation (GO-LSD), achieving state-of-the-art accuracy among real-time detectors.
Findings
D-FINE-L / X achieve 54.0% / 55.8% AP on COCO at 124 / 78 FPS on an NVIDIA T4 GPU.
With Objects365 pretraining, D-FINE-L / X reach 57.1% / 59.3% AP.
The method improves a range of existing DETR models by up to 5.3% AP with negligible extra parameters and training cost.
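For readers who think in latency rather than throughput, the FPS figures above can be converted to a per-image time. This is only a rough sketch: it assumes the reported FPS is measured one image at a time with no batching or pipelining, so latency is simply the reciprocal of throughput. The helper name `latency_ms` is illustrative and not taken from the paper.

```python
def latency_ms(fps: float) -> float:
    """Per-image latency in milliseconds, assuming latency = 1 / throughput."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# FPS values as listed in the Findings section.
reported_fps = {"D-FINE-L": 124, "D-FINE-X": 78}

for name, fps in reported_fps.items():
    print(f"{name}: {latency_ms(fps):.2f} ms per image")
```

Under that assumption, the two models sit at roughly 8 ms and 13 ms per image.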
Abstract
We introduce D-FINE, a powerful real-time object detector that achieves outstanding localization precision by redefining the bounding box regression task in DETR models. D-FINE comprises two key components: Fine-grained Distribution Refinement (FDR) and Global Optimal Localization Self-Distillation (GO-LSD). FDR transforms the regression process from predicting fixed coordinates to iteratively refining probability distributions, providing a fine-grained intermediate representation that significantly enhances localization accuracy. GO-LSD is a bidirectional optimization strategy that transfers localization knowledge from refined distributions to shallower layers through self-distillation, while also simplifying the residual prediction tasks for deeper layers. Additionally, D-FINE incorporates lightweight optimizations in computationally intensive modules and operations, achieving a…
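The two ideas in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation and makes several simplifying assumptions: each box edge gets a distribution over a handful of uniformly spaced offset bins (`bin_values` is a hypothetical helper), each decoder layer adds residual logits to the previous layer's logits, and the self-distillation term is a plain temperature-scaled KL divergence from the final layer's distribution to a shallower layer's. The actual bin weighting and distillation loss used by D-FINE may differ.

```python
import numpy as np


def log_softmax(x, axis=-1):
    """Numerically stable log-softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def bin_values(num_bins, max_offset=0.5):
    """Hypothetical uniform offsets in [-max_offset, +max_offset] (an assumption)."""
    return np.linspace(-max_offset, max_offset, num_bins)


def refine_edges(initial_edges, residual_logits_per_layer, edge_scale, weights):
    """Refine 4 edge distances by accumulating residual logits layer by layer.

    initial_edges: shape (4,), starting edge distances
    residual_logits_per_layer: list of arrays with shape (4, num_bins)
    edge_scale: shape (4,), scale applied to each edge's expected offset
    weights: shape (num_bins,), offset value represented by each bin
    """
    logits = np.zeros_like(residual_logits_per_layer[0], dtype=float)
    edges_per_layer, logits_per_layer = [], []
    for delta in residual_logits_per_layer:
        logits = logits + delta                 # residual update of the distribution
        probs = np.exp(log_softmax(logits))     # one distribution per edge
        expected_offset = probs @ weights       # weighted sum over bins, shape (4,)
        edges_per_layer.append(initial_edges + edge_scale * expected_offset)
        logits_per_layer.append(logits.copy())
    return edges_per_layer, logits_per_layer


def distill_kl(teacher_logits, student_logits, temperature=1.0):
    """Mean KL(teacher || student) over the 4 edges, with temperature scaling."""
    log_p = log_softmax(teacher_logits / temperature)
    log_q = log_softmax(student_logits / temperature)
    return float((np.exp(log_p) * (log_p - log_q)).sum(axis=-1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = bin_values(num_bins=9)
    init = np.array([0.20, 0.20, 0.30, 0.30])
    scale = np.array([0.40, 0.40, 0.60, 0.60])
    deltas = [rng.normal(size=(4, 9)) for _ in range(3)]  # three mock decoder layers

    edges, logits = refine_edges(init, deltas, scale, weights)
    for i, e in enumerate(edges):
        print(f"layer {i}: edges = {np.round(e, 3)}")
    # Distill the final layer's distribution into the first layer's.
    print("KL(final || first):", distill_kl(logits[-1], logits[0]))
```

Because each layer only adds residual logits, a layer that outputs zeros leaves the box unchanged, which is one way to read the abstract's point about simplifying the residual prediction task for deeper layers.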
Code & Models
- 🤗 Peterande/D-FINE (model) · ♡ 9
- 🤗 ustc-community/dfine-xlarge-coco (model) · 1.5k dl · ♡ 9
- 🤗 ustc-community/dfine-small-coco (model) · 7.7k dl · ♡ 12
- 🤗 ustc-community/dfine-large-coco (model) · 3.9k dl
- 🤗 ustc-community/dfine-medium-coco (model) · 672 dl · ♡ 3
- 🤗 ustc-community/dfine-xlarge-obj2coco (model) · 136 dl · ♡ 6
- 🤗 ustc-community/dfine-large-obj2coco-e25 (model) · 59 dl · ♡ 4
- 🤗 ustc-community/dfine-medium-obj2coco (model) · 593 dl · ♡ 4
- 🤗 ustc-community/dfine-small-obj2coco (model) · 294 dl
- 🤗 ustc-community/dfine-nano-coco (model) · 1.8k dl · ♡ 7
Taxonomy
Topics: Speech Recognition and Synthesis
Methods: Attention Is All You Need · Dropout · Layer Normalization · Adam · Residual Connection · Convolution · Position-Wise Feed-Forward Layer · Linear Layer · Byte Pair Encoding · Absolute Position Encodings
