UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Zhigang Dai; Bolun Cai; Yugeng Lin; Junying Chen

arXiv:2011.09094·cs.CV·July 25, 2023·64 cites

TL;DR

UP-DETR introduces an unsupervised pre-training method for object detection using transformers, significantly improving performance and convergence speed on various detection tasks by employing a novel patch detection pretext task.

Contribution

The paper proposes UP-DETR, a novel unsupervised pre-training approach for transformers in object detection, utilizing random query patch detection and addressing multi-task and multi-query localization issues.

Findings

01 Boosts DETR performance with faster convergence and higher accuracy

02 Effective pre-training reduces training data and time requirements

03 Unifies fine-tuning for object detection and one-shot detection

Abstract

DEtection TRansformer (DETR) reaches performance competitive with Faster R-CNN on object detection via a transformer encoder-decoder architecture. However, when its transformers are trained from scratch, DETR needs large-scale training data and an extremely long training schedule, even on the COCO dataset. Inspired by the great success of pre-training transformers in natural language processing, we propose a novel pretext task named random query patch detection in Unsupervised Pre-training DETR (UP-DETR). Specifically, we randomly crop patches from the given image and then feed them as queries to the decoder. The model is pre-trained to detect these query patches in the input image. During pre-training, we address two critical issues: multi-task learning and multi-query localization. (1) To trade off classification and localization preferences in the pretext task, we find that freezing the…
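The cropping step of random query patch detection can be illustrated with a minimal sketch. The abstract states only that patches are cropped at random and that the model is trained to locate them in the input image; the function name `sample_query_patches`, the size bounds `min_frac` and `max_frac`, the list-of-rows image representation, and the normalized (cx, cy, w, h) box format are assumptions made for illustration, not details taken from the paper.

```python
import random


def sample_query_patches(image, num_patches, min_frac=0.1, max_frac=0.5, seed=None):
    """Crop random patches and return each with its normalized box.

    image: list of rows (H x W), each row a list of pixel values.
    Returns a list of (patch, box) pairs, where box = (cx, cy, w, h)
    is expressed as fractions of the image size (an assumed convention).
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    samples = []
    for _ in range(num_patches):
        # Patch size in pixels; the fraction bounds are hypothetical defaults.
        ph = max(1, int(height * rng.uniform(min_frac, max_frac)))
        pw = max(1, int(width * rng.uniform(min_frac, max_frac)))
        # Top-left corner chosen so the patch stays inside the image.
        top = rng.randint(0, height - ph)
        left = rng.randint(0, width - pw)
        patch = [row[left:left + pw] for row in image[top:top + ph]]
        # The box is the regression target the detector would be trained to predict.
        box = ((left + pw / 2) / width, (top + ph / 2) / height,
               pw / width, ph / height)
        samples.append((patch, box))
    return samples


# Toy 20 x 30 "image" whose pixel values encode their own position.
toy_image = [[y * 30 + x for x in range(30)] for y in range(20)]
pairs = sample_query_patches(toy_image, num_patches=3, seed=0)
```

In a pre-training loop of this kind, each cropped patch would be encoded and supplied to the decoder as a query, while the paired box serves as the localization target. No labels are needed because the targets come from the crop coordinates themselves.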

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Residual Connection · Dense Connections · Label Smoothing · Adam · Attention Is All You Need · Byte Pair Encoding