DEYOv2: Rank Feature with Greedy Matching for End-to-End Object Detection

Haodong Ouyang

arXiv:2306.09165 · cs.CV · July 4, 2023 · 5 cites

Open Access

TL;DR

DEYOv2 is a new end-to-end object detection model that improves training speed and accuracy by using Rank Feature and Greedy Matching, outperforming existing query-based detectors on COCO.

Contribution

It introduces Rank Feature and Greedy Matching to address one-to-one matching limitations, enabling fully end-to-end optimization in object detection.

Findings

01

Achieves 51.1 AP on COCO with ResNet-50 backbone.

02

Outperforms DINO by 2.1 AP in 12 epochs.

03

First fully end-to-end detector combining classical and query-based strengths.

Abstract

This paper presents DEYOv2, a novel object detector and an improved version of the first-generation DEYO (DETR with YOLO) model. Like its predecessor, DEYOv2 employs a progressive reasoning approach to accelerate model training and enhance performance. The study examines the limitations of one-to-one matching in optimization and proposes Rank Feature and Greedy Matching to address them. This approach enables the third stage of DEYOv2 to maximize information acquisition from the first and second stages without needing NMS, achieving end-to-end optimization. By combining dense queries, sparse queries, one-to-many matching, and one-to-one matching, DEYOv2 leverages the advantages of each method. It outperforms all existing query-based end-to-end detectors under the same settings. When using ResNet-50 as the backbone and multi-scale…
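
The abstract contrasts one-to-one and one-to-many matching but does not spell out how either is computed. As a rough illustration of the terminology only, the sketch below pairs predictions with targets from a cost matrix in two ways. It is not the paper's Greedy Matching or Rank Feature: the function names `greedy_one_to_one` and `one_to_many`, the cost values, and the parameter `k` are all hypothetical, and the greedy rule shown (repeatedly taking the cheapest unused pair) is a generic textbook heuristic assumed here for clarity.

```python
from typing import Dict, List, Tuple


def greedy_one_to_one(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Pair predictions (rows) with targets (columns), cheapest pair first.

    Generic greedy heuristic (an assumption for illustration, not the
    paper's method): each prediction and each target is used at most once.
    """
    # Flatten the matrix into (cost, prediction, target) triples, sorted by cost.
    triples = sorted(
        (c, i, j) for i, row in enumerate(cost) for j, c in enumerate(row)
    )
    used_preds, used_targets = set(), set()
    matches: List[Tuple[int, int]] = []
    for _, i, j in triples:
        if i in used_preds or j in used_targets:
            continue  # one side is already matched; skip this pair
        used_preds.add(i)
        used_targets.add(j)
        matches.append((i, j))
    return matches


def one_to_many(cost: List[List[float]], k: int) -> Dict[int, List[int]]:
    """Give each target its k lowest-cost predictions.

    Unlike the one-to-one case, a prediction may be assigned to several targets.
    """
    n_targets = len(cost[0]) if cost else 0
    assignment: Dict[int, List[int]] = {}
    for j in range(n_targets):
        ranked = sorted(range(len(cost)), key=lambda i: cost[i][j])
        assignment[j] = ranked[:k]
    return assignment


# Hypothetical costs: 3 predictions (rows) against 2 targets (columns).
example_cost = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.3, 0.7],
]
one_to_one_pairs = greedy_one_to_one(example_cost)
one_to_many_pairs = one_to_many(example_cost, k=2)
```

With these made-up costs, the one-to-one result leaves prediction 2 unmatched, while the one-to-many result assigns it to both targets. That difference in how many predictions receive a positive training signal is the trade-off the abstract alludes to.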

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Vision Transformer