DEYOv2: Rank Feature with Greedy Matching for End-to-End Object Detection
Haodong Ouyang

TL;DR
DEYOv2 is a new end-to-end object detection model that improves training speed and accuracy by using Rank Feature and Greedy Matching, outperforming existing query-based detectors on COCO.
Contribution
It introduces Rank Feature and Greedy Matching to address one-to-one matching limitations, enabling fully end-to-end optimization in object detection.
Findings
Achieves 51.1 AP on COCO with a ResNet-50 backbone.
Outperforms DINO by 2.1 AP at 12 epochs.
First fully end-to-end detector to combine the strengths of classical and query-based detectors.
Abstract
This paper presents a novel object detector called DEYOv2, an improved version of the first-generation DEYO (DETR with YOLO) model. Like its predecessor, DEYOv2 employs a progressive reasoning approach to accelerate model training and enhance performance. The study examines the limitations of one-to-one matching in optimization and proposes Rank Feature and Greedy Matching to address them. This approach enables the third stage of DEYOv2 to maximize information acquisition from the first and second stages without needing NMS, achieving end-to-end optimization. By combining dense queries, sparse queries, one-to-many matching, and one-to-one matching, DEYOv2 leverages the advantages of each method. It outperforms all existing query-based end-to-end detectors under the same settings. When using ResNet-50 as the backbone and multi-scale…
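The abstract names Greedy Matching without giving details, so the following is only a generic illustration of what a greedy one-to-one assignment can look like, not the paper's actual procedure. It assumes a hypothetical cost matrix with one row per prediction and one column per ground-truth box (lower cost means a better match); the function name `greedy_match` and the example costs are invented for this sketch.

```python
from typing import List, Tuple


def greedy_match(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Greedily pair predictions (rows) with ground-truth boxes (columns).

    Hypothetical helper: repeatedly takes the cheapest remaining
    (prediction, ground-truth) pair, so each row and each column is
    used at most once. Unlike Hungarian matching, the result is not
    guaranteed to minimize the total cost.
    """
    # Flatten the matrix to (cost, row, col) and visit cheapest pairs first.
    candidates = sorted(
        (c, r, g) for r, row in enumerate(cost) for g, c in enumerate(row)
    )
    used_rows, used_cols = set(), set()
    pairs: List[Tuple[int, int]] = []
    for _, r, g in candidates:
        if r in used_rows or g in used_cols:
            continue  # this prediction or ground truth is already matched
        used_rows.add(r)
        used_cols.add(g)
        pairs.append((r, g))
    return sorted(pairs)


# Three predictions, two ground-truth boxes (made-up costs).
example_cost = [
    [0.9, 0.2],
    [0.1, 0.8],
    [0.5, 0.6],
]
matches = greedy_match(example_cost)
print(matches)
```

In this toy case prediction 1 is paired with ground truth 0 and prediction 0 with ground truth 1, while prediction 2 stays unmatched. A greedy pass like this is simpler than solving the full assignment problem, at the price of possibly accepting a higher total cost.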
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Vision Transformer
