UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
Zhigang Dai, Bolun Cai, Yugeng Lin, Junying Chen

TL;DR
UP-DETR introduces an unsupervised pre-training method for object detection using transformers, significantly improving performance and convergence speed on various detection tasks by employing a novel patch detection pretext task.
Contribution
The paper proposes UP-DETR, an unsupervised pre-training approach for transformers in object detection. It is built on a random query patch detection pretext task and addresses two issues that arise during pre-training: multi-task learning and multi-query localization.
Findings
Boosts DETR performance with faster convergence and higher accuracy
Effective pre-training reduces training data and time requirements
Unifies fine-tuning for object detection and one-shot detection
Abstract
DEtection TRansformer (DETR) for object detection reaches competitive performance compared with Faster R-CNN via a transformer encoder-decoder architecture. However, with transformers trained from scratch, DETR needs large-scale training data and an extremely long training schedule even on the COCO dataset. Inspired by the great success of pre-training transformers in natural language processing, we propose a novel pretext task named random query patch detection in Unsupervised Pre-training DETR (UP-DETR). Specifically, we randomly crop patches from the given image and then feed them as queries to the decoder. The model is pre-trained to detect these query patches from the input image. During pre-training, we address two critical issues: multi-task learning and multi-query localization. (1) To trade off classification and localization preferences in the pretext task, we find that freezing the…
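The patch-sampling step of the pretext task described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the patch-size range (`min_frac`, `max_frac`), and the normalized (cx, cy, w, h) box format are assumptions made for the example. It only shows how random crops and their ground-truth boxes could be produced; feeding the crops to the decoder as queries is not covered.

```python
import random


def sample_query_boxes(height, width, num_patches=10,
                       min_frac=0.1, max_frac=0.5, seed=None):
    """Sample random patch locations as (top, left, patch_h, patch_w) in pixels.

    The size range is an assumed hyperparameter, not a value from the paper.
    """
    rng = random.Random(seed)
    boxes = []
    for _ in range(num_patches):
        # Patch size is a random fraction of the image size.
        patch_h = max(1, int(height * rng.uniform(min_frac, max_frac)))
        patch_w = max(1, int(width * rng.uniform(min_frac, max_frac)))
        # randint is inclusive on both ends, so the patch stays inside the image.
        top = rng.randint(0, height - patch_h)
        left = rng.randint(0, width - patch_w)
        boxes.append((top, left, patch_h, patch_w))
    return boxes


def crop(image, box):
    """Cut the patch out of an image stored as a list of pixel rows."""
    top, left, patch_h, patch_w = box
    return [row[left:left + patch_w] for row in image[top:top + patch_h]]


def to_normalized_cxcywh(box, height, width):
    """Convert a pixel box to (cx, cy, w, h) relative to the image size.

    This target format is assumed here for illustration.
    """
    top, left, patch_h, patch_w = box
    return ((left + patch_w / 2) / width,
            (top + patch_h / 2) / height,
            patch_w / width,
            patch_h / height)


if __name__ == "__main__":
    # Toy 64x96 "image" whose pixels record their own coordinates.
    image = [[(r, c) for c in range(96)] for r in range(64)]
    for box in sample_query_boxes(64, 96, num_patches=3, seed=0):
        patch = crop(image, box)
        print(len(patch), len(patch[0]), to_normalized_cxcywh(box, 64, 96))
```

Each crop serves as a query and its box as the localization target, so no human annotation is needed to build the training signal.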
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Residual Connection · Dense Connections · Label Smoothing · Adam · Attention Is All You Need · Byte Pair Encoding
