Location-Aware Self-Supervised Transformers for Semantic Segmentation

Mathilde Caron; Neil Houlsby; Cordelia Schmid

arXiv:2212.02400·cs.CV·March 17, 2023

TL;DR

This paper introduces a location-aware self-supervised pretraining method for transformers that enhances dense feature learning and improves semantic segmentation performance by incorporating spatial information during pretraining.

Contribution

It proposes a novel pretraining approach combining patch-level clustering and relative location prediction to better model spatial information for segmentation tasks.
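As a rough illustration of what "patch-level clustering" to obtain dense pseudo-labels can look like, the sketch below assigns each patch feature to its most similar prototype (cluster centre). The hard nearest-prototype rule, the dot-product similarity, and the names `assign_pseudo_labels`, `patch_features`, and `prototypes` are all assumptions made for illustration; this page does not describe the clustering scheme the authors actually use.

```python
def assign_pseudo_labels(patch_features, prototypes):
    """Return, for each patch feature, the index of its most similar prototype.

    Hypothetical helper: similarity is a plain dot product and the
    assignment is a hard argmax, both chosen only to keep the sketch small.
    """
    labels = []
    for feat in patch_features:
        # Dot-product similarity between this patch and every prototype.
        scores = [sum(f * p for f, p in zip(feat, proto)) for proto in prototypes]
        # Index of the highest-scoring prototype becomes the pseudo-label.
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Three toy 2-D patch features and two prototypes.
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
protos = [[1.0, 0.0], [0.0, 1.0]]
pseudo_labels = assign_pseudo_labels(patches, protos)
```

Each patch then carries a discrete label that can serve as a per-patch classification target, which is the general idea behind a dense pseudo-label.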

Findings

01 LOCA pretraining improves transferability to semantic segmentation datasets.

02 The method fosters the emergence of strong dense features.

03 Results show competitive performance on diverse datasets.

Abstract

Pixel-level labels are particularly expensive to acquire. Hence, pretraining is a critical step to improve models on a task like semantic segmentation. However, prominent algorithms for pretraining neural networks use image-level objectives, e.g., image classification, image-text alignment à la CLIP, or self-supervised contrastive learning. These objectives do not model spatial information, which might be sub-optimal when finetuning on downstream tasks that require spatial reasoning. In this work, we pretrain networks with a location-aware (LOCA) self-supervised method that fosters the emergence of strong dense features. Specifically, we use both a patch-level clustering scheme to mine dense pseudo-labels and a relative location prediction task to encourage learning about object parts and their spatial arrangements. Our experiments show that LOCA pretraining leads to representations that…
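One way to picture a relative location prediction task is as per-patch classification: a query view is cropped from a reference view, and each query patch must predict which cell of the reference patch grid it falls in. The sketch below builds those targets and a matching cross-entropy loss. It is a simplified illustration under stated assumptions, not the authors' implementation: the normalized `(top, left, height, width)` box format, the square grids, and the function names are hypothetical.

```python
import math


def relative_location_targets(box, query_grid, ref_grid):
    """Map each query patch to the index of the reference-grid cell under its centre.

    box: (top, left, height, width) of the query crop, in normalized [0, 1]
    coordinates of the reference view (an assumed convention).
    """
    top, left, height, width = box
    targets = []
    for i in range(query_grid):
        for j in range(query_grid):
            # Centre of query patch (i, j), expressed in reference coordinates.
            cy = top + (i + 0.5) / query_grid * height
            cx = left + (j + 0.5) / query_grid * width
            # Reference cell containing that centre, clamped to the grid.
            row = min(int(cy * ref_grid), ref_grid - 1)
            col = min(int(cx * ref_grid), ref_grid - 1)
            targets.append(row * ref_grid + col)
    return targets


def cross_entropy(logits, target):
    """Numerically stable cross-entropy for one patch's location logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


# Query view = bottom-right quarter of the reference, split into 2x2 patches;
# the reference view is split into a 4x4 grid (16 possible locations).
targets = relative_location_targets((0.5, 0.5, 0.5, 0.5), query_grid=2, ref_grid=4)
# A model with no location knowledge (uniform logits) pays log(16) per patch.
uniform_loss = cross_entropy([0.0] * 16, targets[0])
```

Framing the task as classification over reference positions means the loss only falls when patch features encode where a part sits relative to the rest of the scene, which is the kind of spatial information the abstract says image-level objectives do not model.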

Code & Models

Repositories

google-research/scenic
JAX · Official

Models

🤗 fcxfcx/owlv2
model · ♡ 1

Videos

Location-Aware Self-Supervised Transformers for Semantic Segmentation · YouTube

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications

Methods: Contrastive Language-Image Pre-training