Location-Aware Self-Supervised Transformers for Semantic Segmentation
Mathilde Caron, Neil Houlsby, Cordelia Schmid

TL;DR
This paper introduces a location-aware self-supervised pretraining method for transformers that enhances dense feature learning and improves semantic segmentation performance by incorporating spatial information during pretraining.
Contribution
It proposes a novel pretraining approach combining patch-level clustering and relative location prediction to better model spatial information for segmentation tasks.
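To make the patch-level clustering idea concrete, the sketch below assigns each patch feature to its closest prototype and uses the resulting index as a dense pseudo-label. This is only an illustration: the function name `assign_pseudo_labels`, the use of fixed prototypes, and the hard dot-product assignment are assumptions made here, since the summary does not say how the clusters are computed.

```python
def assign_pseudo_labels(patch_features, prototypes):
    """Return, for each patch feature, the index of the best-matching prototype.

    Hypothetical helper: similarity is a plain dot product and the assignment
    is a hard argmax, which is an assumption and not taken from the paper.
    """
    labels = []
    for feat in patch_features:
        # Similarity of this patch to every prototype (cluster center).
        scores = [sum(f * p for f, p in zip(feat, proto)) for proto in prototypes]
        # The pseudo-label is the index of the most similar prototype.
        labels.append(max(range(len(prototypes)), key=scores.__getitem__))
    return labels


# Toy example: three 2-D patch features and two prototypes.
patch_features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
pseudo_labels = assign_pseudo_labels(patch_features, prototypes)
```

In a setup like this, the per-patch labels would serve as segmentation-style targets without any human annotation.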
Findings
LOCA pretraining improves transferability to semantic segmentation datasets.
The method fosters the emergence of strong dense features.
Results show competitive performance on diverse datasets.
Abstract
Pixel-level labels are particularly expensive to acquire. Hence, pretraining is a critical step to improve models on a task like semantic segmentation. However, prominent algorithms for pretraining neural networks use image-level objectives, e.g. image classification, image-text alignment à la CLIP, or self-supervised contrastive learning. These objectives do not model spatial information, which might be sub-optimal when finetuning on downstream tasks with spatial reasoning. In this work, we pretrain networks with a location-aware (LOCA) self-supervised method which fosters the emergence of strong dense features. Specifically, we use both a patch-level clustering scheme to mine dense pseudo-labels and a relative location prediction task to encourage learning about object parts and their spatial arrangements. Our experiments show that LOCA pretraining leads to representations that…
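One plausible way to build targets for a relative location prediction task is sketched below: given a reference view and a query crop of the same image, each query patch is labeled with the index of the reference patch that contains its center. The function name `location_targets`, the box format `(x0, y0, x1, y1)`, the square patch grids, and the use of `-1` for patches outside the reference view are all assumptions made for illustration; the abstract does not give these details.

```python
def location_targets(ref_box, query_box, ref_grid, query_grid):
    """Label each query patch with the reference patch containing its center.

    Hypothetical sketch: boxes are (x0, y0, x1, y1) in image coordinates and
    each view is split into a square grid of patches. Returns a row-major list
    of reference patch indices, with -1 where the query patch center lies
    outside the reference view.
    """
    rx0, ry0, rx1, ry1 = ref_box
    qx0, qy0, qx1, qy1 = query_box
    ref_pw = (rx1 - rx0) / ref_grid      # reference patch width
    ref_ph = (ry1 - ry0) / ref_grid      # reference patch height
    q_pw = (qx1 - qx0) / query_grid      # query patch width
    q_ph = (qy1 - qy0) / query_grid      # query patch height

    targets = []
    for row in range(query_grid):
        for col in range(query_grid):
            # Center of this query patch in image coordinates.
            cx = qx0 + (col + 0.5) * q_pw
            cy = qy0 + (row + 0.5) * q_ph
            if not (rx0 <= cx < rx1 and ry0 <= cy < ry1):
                targets.append(-1)       # no overlap with the reference view
                continue
            ref_col = int((cx - rx0) / ref_pw)
            ref_row = int((cy - ry0) / ref_ph)
            targets.append(ref_row * ref_grid + ref_col)
    return targets


# Reference view: full 224x224 image split into 14x14 patches.
# Query view: a 64x64 crop split into 2x2 patches.
targets = location_targets((0, 0, 224, 224), (32, 32, 96, 96), 14, 2)
```

A model trained on such targets would classify each query patch into one of the reference positions, which requires reasoning about where object parts sit relative to one another.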
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
Methods: Contrastive Language-Image Pre-training
