Revisiting DETR Pre-training for Object Detection

Yan Ma; Weicong Liang; Bohan Chen; Yiduo Hao; Bojian Hou; Xiangyu Yue; Chao Zhang; Yuhui Yuan

arXiv:2308.01300 · cs.CV · December 4, 2023 · 1 citation

TL;DR

This paper critically evaluates self-supervised pre-training methods for DETR-based object detection, introduces an improved self-training approach, and demonstrates significant performance gains on COCO and PASCAL VOC benchmarks.

Contribution

The paper identifies limitations of existing pre-training methods such as DETReg, proposes Simple Self-training, and shows that synthetic datasets further boost detection accuracy.
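Self-training in general means training a model on targets produced by another model's predictions rather than on human annotations. The snippet below is a minimal sketch of one common pseudo-labeling step, assuming a teacher detector that outputs scored boxes. The `Detection` type, `make_pseudo_labels`, the 0.5 score threshold, and the per-image cap are hypothetical illustrations, not the authors' implementation or settings.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    # Hypothetical container for one teacher prediction.
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    label: int                               # predicted class index
    score: float                             # teacher confidence in [0, 1]


def make_pseudo_labels(
    dets: List[Detection],
    score_thresh: float = 0.5,   # hypothetical threshold, not from the paper
    max_per_image: int = 100,    # hypothetical cap on targets per image
) -> List[Detection]:
    """Keep confident teacher detections to use as pre-training targets."""
    # Drop low-confidence predictions, which are more likely to be noise.
    kept = [d for d in dets if d.score >= score_thresh]
    # Highest-confidence targets first, then truncate to the cap.
    kept.sort(key=lambda d: d.score, reverse=True)
    return kept[:max_per_image]


# Usage with made-up teacher output for a single image.
teacher_output = [
    Detection((10, 10, 50, 60), label=1, score=0.92),
    Detection((5, 5, 20, 20), label=3, score=0.31),
    Detection((30, 40, 90, 120), label=1, score=0.67),
]
targets = make_pseudo_labels(teacher_output, score_thresh=0.5)
print([(d.label, d.score) for d in targets])  # [(1, 0.92), (1, 0.67)]
```

Thresholding trades target recall for precision; the right balance for DETR pre-training is an empirical question that this sketch does not settle.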

Findings

1. Simple Self-training improves AP on COCO.
2. Synthetic datasets built with image-to-text and text-to-image models further enhance detection.
3. The approach achieves 59.3% AP on the COCO val set, surpassing previous methods.
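The synthetic-data finding suggests a pipeline in which an image-to-text model describes a seed image, a text-to-image model renders new images from that description, and, assuming the new images are used for self-training-style pre-training, each one receives pseudo-labels. The sketch below only wires those stages together. `build_synthetic_set` and its three callables are hypothetical placeholders for whatever captioner, generator, and pseudo-labeler one plugs in; the specific models and settings used in the paper are not stated in this summary.

```python
from typing import Any, Callable, Dict, List


def build_synthetic_set(
    seed_images: List[Any],
    caption_fn: Callable[[Any], str],         # hypothetical image-to-text model
    generate_fn: Callable[[str], Any],        # hypothetical text-to-image model
    pseudo_label_fn: Callable[[Any], list],   # hypothetical teacher detector + filtering
    images_per_caption: int = 1,              # hypothetical number of renders per caption
) -> List[Dict[str, Any]]:
    """Turn seed images into captioned, pseudo-labeled synthetic samples."""
    samples: List[Dict[str, Any]] = []
    for img in seed_images:
        caption = caption_fn(img)              # describe the seed image in text
        for _ in range(images_per_caption):
            synth = generate_fn(caption)       # render a new image from the text
            samples.append({
                "caption": caption,
                "image": synth,
                "targets": pseudo_label_fn(synth),  # boxes/labels for pre-training
            })
    return samples


# Usage with string stubs standing in for real models.
data = build_synthetic_set(
    seed_images=["img_a", "img_b"],
    caption_fn=lambda img: f"a photo described from {img}",
    generate_fn=lambda text: f"synthetic<{text}>",
    pseudo_label_fn=lambda im: [],
    images_per_caption=2,
)
print(len(data))  # 4
```

Passing the models in as callables keeps the data flow visible and lets each stage be swapped independently.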

Abstract

Motivated by the remarkable achievements of DETR-based approaches on COCO object detection and segmentation benchmarks, recent endeavors have been directed towards elevating their performance through self-supervised pre-training of Transformers while preserving a frozen backbone. Noteworthy advancements in accuracy have been documented in certain studies. Our investigation delved deeply into a representative approach, DETReg, and its performance assessment in the context of emerging models like $\mathcal{H}$-Deformable-DETR. Regrettably, DETReg proves inadequate in enhancing the performance of robust DETR-based models under full data conditions. To dissect the underlying causes, we conduct extensive experiments on COCO and PASCAL VOC probing elements such as the selection of pre-training datasets and strategies for pre-training target generation. By contrast, we employ an optimized…

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Position-Wise Feed-Forward Layer · Layer Normalization · Softmax · Linear Layer · Adam · Dense Connections · Label Smoothing · Dropout