An Empirical Study Of Self-supervised Learning Approaches For Object Detection With Transformers

Gokul Karthik Kumar; Sahal Shaji Mullappilly; Abhishek Singh Gehlot

arXiv:2205.05543 · cs.CV · May 12, 2022

TL;DR

This paper investigates self-supervised learning methods for training object detection transformers. It reports faster early-epoch convergence for DETR and explores several pretext tasks, including image reconstruction and jigsaw puzzles.
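The TL;DR names jigsaw puzzles as one pretext task. As a rough illustration, assume the puzzle is built by cutting the (C, H, W) CNN feature map into a 2x2 grid of tiles, shuffling them with a random permutation, and asking the model to classify which permutation was applied. This is a common jigsaw formulation, not necessarily the authors' exact setup, and the helper names `make_jigsaw_sample` and `unshuffle` are hypothetical. The NumPy sketch below covers only the data side:

```python
import itertools

import numpy as np


def make_jigsaw_sample(feature_map, grid=2, rng=None):
    """Hypothetical helper: shuffle grid x grid tiles of a (C, H, W) feature map.

    Returns the shuffled map, the index of the permutation used (the
    classification target), and the list of all candidate permutations.
    H and W are assumed divisible by `grid`; any remainder is cropped off.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = feature_map.shape
    th, tw = h // grid, w // grid
    # Cut the map into tiles in row-major order.
    tiles = [feature_map[:, i * th:(i + 1) * th, j * tw:(j + 1) * tw]
             for i in range(grid) for j in range(grid)]
    # All n! orderings: fine for grid=2 (24 classes), far too many for large grids.
    perms = list(itertools.permutations(range(grid * grid)))
    label = int(rng.integers(len(perms)))
    order = perms[label]
    shuffled = [tiles[k] for k in order]  # position p now holds tile order[p]
    rows = [np.concatenate(shuffled[r * grid:(r + 1) * grid], axis=2)
            for r in range(grid)]
    return np.concatenate(rows, axis=1), label, perms


def unshuffle(shuffled_map, label, perms, grid=2):
    """Invert make_jigsaw_sample, e.g. to sanity-check the permutation labels."""
    _, h, w = shuffled_map.shape
    th, tw = h // grid, w // grid
    tiles = [shuffled_map[:, i * th:(i + 1) * th, j * tw:(j + 1) * tw]
             for i in range(grid) for j in range(grid)]
    restored = [None] * (grid * grid)
    for p, k in enumerate(perms[label]):
        restored[k] = tiles[p]  # shuffled position p held original tile k
    rows = [np.concatenate(restored[r * grid:(r + 1) * grid], axis=2)
            for r in range(grid)]
    return np.concatenate(rows, axis=1)


fm = np.arange(2 * 4 * 6, dtype=float).reshape(2, 4, 6)
shuffled, label, perms = make_jigsaw_sample(fm, grid=2, rng=np.random.default_rng(0))
assert np.array_equal(unshuffle(shuffled, label, perms, grid=2), fm)
```

In a DETR-style model the shuffled map would presumably pass through the transformer encoder, with a small classification head predicting `label`. That wiring is an assumption and is not shown here.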

Contribution

It introduces self-supervised training approaches for object detection transformers that operate on CNN feature maps rather than on raw image patches, a setting that has not been extensively studied.

Findings

01. Faster convergence of DETR in early training epochs
02. No similar improvement observed with Deformable DETR in multi-task learning
03. Exploration of multiple self-supervised methods for object detection transformers
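In the multi-task setting mentioned in the findings, the detection objective and a self-supervised objective are trained together. One common way to write such an objective is a weighted sum. Both the form and the balancing weight λ are assumptions here and are not taken from this page:

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{det}} + \lambda \, \mathcal{L}_{\text{ssl}}, \qquad \lambda > 0
```

Here L_det stands for the usual DETR set-prediction loss and L_ssl for whichever pretext loss is used. In the pretraining setting, by contrast, the encoder would first be trained with the self-supervised loss and then fine-tuned on detection.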

Abstract

Self-supervised learning (SSL) methods such as masked language modeling have shown massive performance gains by pretraining transformer models for a variety of natural language processing tasks. Follow-up research adapted similar methods, such as masked image modeling, to vision transformers and demonstrated improvements on the image classification task. Such simple self-supervised methods have not been exhaustively studied for object detection transformers (DETR, Deformable DETR), because their transformer encoders take input in a feature space extracted by a convolutional neural network (CNN) rather than in the image space used by general vision transformers. However, CNN feature maps still preserve spatial relationships, and we use this property to design self-supervised learning approaches that train the encoder of object detection transformers in pretraining and multi-task learning…
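The abstract's key observation is that CNN feature maps keep their spatial layout, so masked-image-modeling-style objectives can move from pixels to feature-map positions. The NumPy sketch below illustrates one such objective under stated assumptions. Random spatial positions of a (C, H, W) feature map are zeroed out, and the loss is mean squared error on those positions only. The helper names, the zero mask value, and the 25% mask ratio are hypothetical choices, not the authors' reported configuration. The encoder that would produce `pred` is omitted.

```python
import numpy as np


def mask_feature_map(feature_map, mask_ratio=0.5, rng=None, mask_value=0.0):
    """Hypothetical helper: hide a random subset of spatial positions.

    Takes a (C, H, W) CNN feature map and returns the masked copy plus a
    boolean (H, W) mask that is True at hidden positions. All channels at a
    hidden position are overwritten with `mask_value`.
    """
    rng = np.random.default_rng() if rng is None else rng
    _, h, w = feature_map.shape
    n_masked = int(round(mask_ratio * h * w))
    flat = np.zeros(h * w, dtype=bool)
    flat[rng.choice(h * w, size=n_masked, replace=False)] = True
    mask = flat.reshape(h, w)
    masked = feature_map.copy()
    masked[:, mask] = mask_value
    return masked, mask


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error over hidden positions only (all channels)."""
    diff = (pred - target)[:, mask]  # shape (C, number of hidden positions)
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(1)
fm = rng.standard_normal((4, 8, 8))
masked, mask = mask_feature_map(fm, mask_ratio=0.25, rng=rng)
# A perfect reconstruction scores zero; echoing the masked input does not.
perfect = masked_reconstruction_loss(fm, fm, mask)
naive = masked_reconstruction_loss(masked, fm, mask)
print(perfect, naive)
```

Scoring only the hidden positions follows the usual masked-image-modeling rationale: the encoder gains nothing from copying visible features through unchanged and has to infer the missing ones from spatial context.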

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Adam · Byte Pair Encoding · Absolute Position Encodings · Label Smoothing · Convolution · Dropout · Layer Normalization · Softmax