Representation Learning by Detecting Incorrect Location Embeddings

Sepehr Sameni; Simon Jenni; Paolo Favaro

arXiv:2204.04788 · cs.CV · March 14, 2023 · 1 citation

TL;DR

This paper introduces DILEMMA, a self-supervised learning method that detects artificially misplaced object parts to improve image representation learning, leading to better performance on shape-dependent tasks.

Contribution

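DILEMMA is a self-supervised loss that trains a ViT to detect which image tokens have been combined with an incorrect positional embedding, using sparse (masked) inputs for robustness to occlusions and faster training. Applied to existing methods (MoCoV3, DINO, SimCLR), it improves their performance, especially on shape-reliant tasks.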

Findings

1. Improves MoCoV3, DINO, and SimCLR by 4.41%, 3.97%, and 0.5%, respectively, under the same training time.

2. Enhances fine-tuning results on ImageNet-100.

3. Significantly benefits shape-dependent downstream tasks.

Abstract

In this paper, we introduce a novel self-supervised learning (SSL) loss for image representation learning. There is a growing belief that generalization in deep neural networks is linked to their ability to discriminate object shapes. Since object shape is related to the location of its parts, we propose to detect those that have been artificially misplaced. We represent object parts with image tokens and train a ViT to detect which token has been combined with an incorrect positional embedding. We then introduce sparsity in the inputs to make the model more robust to occlusions and to speed up the training. We call our method DILEMMA, which stands for Detection of Incorrect Location EMbeddings with MAsked inputs. We apply DILEMMA to MoCoV3, DINO and SimCLR and show an improvement in their performance of respectively 4.41%, 3.97%, and 0.5% under the same training time and with a linear…
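
The input construction described in the abstract (sparse tokens, some paired with a wrong position, and a per-token detection target) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `make_dilemma_inputs`, the `keep_ratio` and `corrupt_ratio` values, and the uniform sampling of wrong positions are all assumptions chosen for illustration. Only the index bookkeeping is shown; in practice the position indices would select positional embeddings that are added to the patch tokens before the ViT.

```python
import random


def make_dilemma_inputs(num_patches, keep_ratio=0.5, corrupt_ratio=0.2, seed=None):
    """Build one training example for a misplaced-position detection task.

    Returns (kept_patch_ids, position_ids, labels):
      kept_patch_ids: indices of patches that survive masking (input sparsity).
      position_ids:   position index paired with each kept patch; some are wrong.
      labels:         1 where the paired position is incorrect, 0 otherwise.

    All ratios are hypothetical defaults, not values taken from the paper.
    """
    rng = random.Random(seed)

    # Sparsity: keep only a random subset of the patches.
    num_keep = max(1, int(num_patches * keep_ratio))
    kept = sorted(rng.sample(range(num_patches), num_keep))

    # Choose which of the kept tokens receive an incorrect position.
    num_corrupt = int(num_keep * corrupt_ratio)
    corrupt_slots = set(rng.sample(range(num_keep), num_corrupt))

    position_ids, labels = [], []
    for slot, patch_id in enumerate(kept):
        if slot in corrupt_slots:
            # Draw uniformly from all positions except the true one.
            wrong = rng.randrange(num_patches - 1)
            if wrong >= patch_id:
                wrong += 1
            position_ids.append(wrong)
            labels.append(1)
        else:
            position_ids.append(patch_id)
            labels.append(0)
    return kept, position_ids, labels


if __name__ == "__main__":
    # 14 x 14 = 196 patches, as for a 224 x 224 image with 16 x 16 patches.
    kept, positions, labels = make_dilemma_inputs(196, seed=0)
    print(len(kept), "tokens kept,", sum(labels), "with a wrong position")
```

A model trained on such examples would output one logit per kept token and be optimized with a binary classification loss against `labels`; that choice of loss is likewise an assumption here.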

Code & Models

Repositories

separius/dilemma (PyTorch, official)

Videos

Representation Learning by Detecting Incorrect Location Embeddings · Underline

Taxonomy

Topics: Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Vision Transformer · Masked autoencoder · 1x1 Convolution · Dense Connections