SAMCLR: Contrastive pre-training on complex scenes using SAM for view sampling

Benjamin Missaoui; Chongbin Yuan

arXiv:2310.14736 · cs.CV · October 31, 2023 · 1 citation


TL;DR

SAMCLR improves contrastive pre-training on complex scenes by using SAM to segment each image into semantic regions and then sampling both views from the same region, which leads to better downstream classification performance.

Contribution

Introduces SAMCLR, a novel extension of SimCLR that uses the Segment Anything Model (SAM) to segment images into semantic regions, so that the two views used for contrastive learning on complex scenes are drawn from the same region.
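The abstract does not say exactly how crops are drawn inside a region, so the following is only a minimal sketch of region-constrained view sampling under stated assumptions: each SAM region is given as a 2-D boolean mask (`mask[y][x]`), a region is picked with probability proportional to its area, and each view is a square crop that contains a randomly drawn pixel of that region. The names `sample_two_views`, `crop_around`, `region_pixels` and the `min_frac`/`max_frac` crop-size range are hypothetical and do not come from the paper.

```python
import random
from typing import List, Optional, Sequence, Tuple

Mask = Sequence[Sequence[bool]]  # mask[y][x] is True for pixels inside the region
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def region_pixels(mask: Mask) -> List[Tuple[int, int]]:
    """Return the (x, y) coordinates of every pixel inside a binary mask."""
    return [(x, y) for y, row in enumerate(mask) for x, inside in enumerate(row) if inside]


def crop_around(cx: int, cy: int, size: int, width: int, height: int) -> Box:
    """Square crop of side `size` around (cx, cy), shifted to fit inside the image.

    The shift never pushes (cx, cy) out of the crop, so the crop always contains it.
    """
    size = max(1, min(size, width, height))
    x0 = min(max(cx - size // 2, 0), width - size)
    y0 = min(max(cy - size // 2, 0), height - size)
    return (x0, y0, x0 + size, y0 + size)


def sample_two_views(
    masks: List[Mask],
    min_frac: float = 0.3,
    max_frac: float = 0.6,
    rng: Optional[random.Random] = None,
) -> Tuple[Box, Box, int]:
    """Hypothetical SAMCLR-style sampler: both crops contain a pixel of one shared region."""
    if rng is None:
        rng = random.Random()
    height, width = len(masks[0]), len(masks[0][0])
    pixels = [region_pixels(m) for m in masks]
    # Assumption: pick a region with probability proportional to its area.
    idx = rng.choices(range(len(masks)), weights=[len(p) for p in pixels])[0]
    views = []
    for _ in range(2):
        cx, cy = rng.choice(pixels[idx])  # anchor pixel inside the chosen region
        size = int(rng.uniform(min_frac, max_frac) * min(width, height))
        views.append(crop_around(cx, cy, size, width, height))
    return views[0], views[1], idx
```

Calling `v1, v2, idx = sample_two_views(masks, rng=random.Random(0))` returns two crop boxes that both overlap region `idx`; presumably the crops would then go through SimCLR's usual augmentations. Weighting regions by area is one design choice; a uniform choice over regions would sample small segments more often.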

Findings

01

SAMCLR performs at least on par with, and most often significantly outperforms, SimCLR, DINO, and MoCo on classification on CIFAR-10, STL10, and ImageNette.

02

When pre-training on complex-scene datasets (Cityscapes and ADE20K), SAMCLR improves downstream classification accuracy over the baselines.

03

Sampling both views from the same semantic region makes contrastive pre-training more effective on scenes with multiple objects.

Abstract

In Computer Vision, self-supervised contrastive learning enforces similar representations between different views of the same image. The pre-training is most often performed on image classification datasets, like ImageNet, where images mainly contain a single class of objects. However, when dealing with complex scenes with multiple items, it becomes very unlikely for several views of the same image to represent the same object category. In this setting, we propose SAMCLR, an add-on to SimCLR which uses SAM to segment the image into semantic regions, then samples the two views from the same region. Preliminary results show empirically that when pre-training on Cityscapes and ADE20K, then evaluating on classification on CIFAR-10, STL10 and ImageNette, SAMCLR performs at least on par with, and most often significantly outperforms, not only SimCLR, but also DINO and MoCo.
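SimCLR, which SAMCLR extends as an add-on, trains with the NT-Xent (normalized temperature-scaled cross-entropy) loss: for each of the 2N views in a batch, the other view of the same image is the positive and the remaining 2N - 2 views are negatives. Because the abstract describes SAMCLR as changing how the two views are sampled, this sketch assumes the loss itself is SimCLR's, unchanged. It is plain Python for readability; a real implementation would operate on batched tensors, and the temperature of 0.5 is only an illustrative default.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # leave all-zero vectors unchanged
    return [x / norm for x in v]


def nt_xent(z1: Sequence[Vector], z2: Sequence[Vector], temperature: float = 0.5) -> float:
    """NT-Xent loss for N positive pairs (z1[k], z2[k]), averaged over all 2N anchors."""
    if len(z1) != len(z2) or not z1:
        raise ValueError("z1 and z2 must be non-empty and of equal length")
    z = [_normalize(v) for v in list(z1) + list(z2)]  # 2N unit-norm embeddings
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # the other view of the same image: the positive
        sims = [sum(a * b for a, b in zip(z[i], z[k])) / temperature for k in range(2 * n)]
        # The denominator runs over every k != i: the positive plus 2N - 2 negatives.
        others = [s for k, s in enumerate(sims) if k != i]
        m = max(others)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        total += log_denom - sims[j]  # equals -log(exp(sims[j]) / sum of exp(others))
    return total / (2 * n)
```

A batch whose paired embeddings agree (for example `[[1, 0], [0, 1]]` for both views) yields a lower loss than the same batch with the pairs swapped, which is the pressure toward similar representations of the two views that the abstract describes.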


Taxonomy

Topics: Remote-Sensing Image Classification · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Segment Anything Model · 1x1 Convolution · Max Pooling · Convolution · Bottleneck Residual Block · Residual Block