DEYOv3: DETR with YOLO for Real-time Object Detection

Haodong Ouyang

arXiv:2309.11851 · cs.CV · September 26, 2023 · 2 cites

Open Access

TL;DR

DEYOv3 introduces a step-by-step training approach for real-time object detection that eliminates the need for ImageNet pretraining, reducing costs and improving accuracy and speed over existing methods.

Contribution

The paper proposes a novel step-by-step training method for DETR-like models, enabling flexible backbone design and higher accuracy without additional datasets.

Findings

01

DEYOv3-N achieves 41.1% AP at 270 FPS on COCO.

02

DEYOv3-L achieves 51.3% AP at 102 FPS.

03

Training can be completed on a single RTX3090 GPU.

Abstract

Recently, end-to-end object detectors have gained significant attention from the research community due to their outstanding performance. However, DETR typically relies on supervised pretraining of the backbone on ImageNet, which limits its practical application and constrains backbone design, affecting the model's potential generalization ability. In this paper, we propose a new training method called step-by-step training. Specifically, in the first stage, a YOLO detector pre-trained with one-to-many matching is used to initialize the end-to-end detector. In the second stage, the backbone and encoder are consistent with the DETR-like model, and only the detector needs to be trained from scratch. Because of this training method, the object detector does not need an additional dataset (ImageNet) to train the backbone, which makes the design of the backbone more flexible and dramatically…
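The two-stage hand-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `init_stage_two`, the parameter prefixes `backbone.` and `neck.`, and the toy checkpoints are all hypothetical, and plain Python lists stand in for weight tensors. The sketch also assumes that the reused weights are kept fixed in the second stage, which the abstract does not state outright.

```python
def init_stage_two(yolo_state, detr_state, shared_prefixes=("backbone.", "neck.")):
    """Build a stage-two starting point from a stage-one checkpoint.

    Returns (state, trainable): the merged parameter dict and the set of
    parameter names that stage two would still optimize.
    Assumption: parameters under `shared_prefixes` are reused and frozen.
    """
    state = dict(detr_state)  # start from the freshly initialized end-to-end model
    trainable = set()
    for name in state:
        if name.startswith(shared_prefixes):
            if name not in yolo_state:
                raise KeyError(f"stage-one checkpoint lacks {name!r}")
            state[name] = yolo_state[name]  # reuse stage-one weights
        else:
            trainable.add(name)  # trained from scratch in stage two
    return state, trainable


# Toy checkpoints (hypothetical names and values).
yolo_state = {"backbone.conv1": [0.5, -0.2], "neck.fuse": [0.1], "head.cls": [0.9]}
detr_state = {"backbone.conv1": [0.0, 0.0], "neck.fuse": [0.0], "decoder.query": [0.0]}

state, trainable = init_stage_two(yolo_state, detr_state)
print(state["backbone.conv1"])  # [0.5, -0.2]
print(sorted(trainable))        # ['decoder.query']
```

The one-to-many YOLO head (`head.cls` here) is discarded because the end-to-end model has no matching parameter, so only the new decoder-side parameters remain to be trained.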

Taxonomy

Topics: Advanced Neural Network Applications · COVID-19 diagnosis using AI · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Residual Connection · Adam · Feedforward Network · Dropout · Linear Layer · Layer Normalization · Label Smoothing · Multi-Head Attention · Byte Pair Encoding