DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

Hao Zhang; Feng Li; Shilong Liu; Lei Zhang; Hang Su; Jun Zhu; Lionel M. Ni; Heung-Yeung Shum

arXiv:2203.03605 · cs.CV · July 12, 2022 · 754 citations

TL;DR

DINO is an end-to-end object detector that improves on earlier DETR-like models in both accuracy and training efficiency. It does so through contrastive denoising training, mixed query selection for anchor initialization, and a look-forward-twice scheme for box prediction, reaching 49.4 AP on COCO in 12 epochs with a ResNet-50 backbone.

Contribution

The paper introduces DINO, a DETR-like object detector that combines contrastive denoising training, mixed query selection for anchor initialization, and a look-forward-twice scheme for box prediction, and that scales well in both model size and data size.

Findings

01. Achieves 49.4 AP in 12 epochs on COCO with ResNet-50.

02. Attains 63.2 AP on COCO val2017 after pre-training on Objects365.

03. Outperforms previous DETR-like models with fewer resources.

Abstract

We present DINO (DETR with Improved deNoising anchOr boxes), a state-of-the-art end-to-end object detector. DINO improves over previous DETR-like models in performance and efficiency by using a contrastive way for denoising training, a mixed query selection method for anchor initialization, and a look forward twice scheme for box prediction. DINO achieves 49.4 AP in 12 epochs and 51.3 AP in 24 epochs on COCO with a ResNet-50 backbone and multi-scale features, yielding a significant improvement of +6.0 AP and +2.7 AP, respectively, compared to DN-DETR, the previous best DETR-like model. DINO scales well in both model size and data size. Without bells and whistles, after pre-training on the Objects365 dataset with a SwinL backbone, DINO obtains the best results on both COCO val2017…
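The abstract gives DINO's AP and its gain over DN-DETR but not the DN-DETR scores themselves. As a small worked example, the sketch below backs out the baseline AP that those two figures imply. The dictionary and function names are illustrative only, and the result is just what the abstract's arithmetic yields; the abstract does not say which DN-DETR configuration or training schedule the comparison uses.

```python
# Figures copied from the abstract: DINO AP on COCO with a ResNet-50
# backbone and multi-scale features, keyed by number of training epochs.
dino_ap = {12: 49.4, 24: 51.3}

# Gains over DN-DETR stated in the abstract for the same two settings.
gain_over_dn_detr = {12: 6.0, 24: 2.7}


def implied_baseline_ap(epochs: int) -> float:
    """Return the DN-DETR AP implied by DINO's AP minus its stated gain.

    This is plain subtraction on the abstract's numbers, rounded to one
    decimal place to match how AP is reported there.
    """
    return round(dino_ap[epochs] - gain_over_dn_detr[epochs], 1)


if __name__ == "__main__":
    for epochs in sorted(dino_ap):
        print(
            f"{epochs} epochs: DINO {dino_ap[epochs]} AP, "
            f"implied DN-DETR {implied_baseline_ap(epochs)} AP"
        )
```

The subtraction gives an implied DN-DETR baseline of 43.4 AP for the 12-epoch comparison and 48.6 AP for the 24-epoch comparison.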

Code & Models

Models

seglinglin/Historical-Diagram-Vectorization (Hugging Face model)

Videos

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection · SlidesLive

Taxonomy

Topics: Advanced Neural Network Applications · Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications