Exploring Localization for Self-supervised Fine-grained Contrastive Learning

Di Wu; Siyuan Li; Zelin Zang; Stan Z. Li

arXiv:2106.15788·cs.CV·October 12, 2022·1 cites

Open Access 1 Repo

TL;DR

This paper introduces CVSA, a contrastive learning framework that enhances fine-grained visual representation by improving object localization through saliency region manipulation and cross-view alignment.

Contribution

The paper proposes a novel saliency-based view generation method and a cross-view alignment loss to improve localization in self-supervised fine-grained learning.

Findings

01. CVSA significantly improves fine-grained classification accuracy.
02. The method enhances the model's ability to localize foreground objects.
03. Experiments show superior performance on multiple benchmarks.

Abstract

Self-supervised contrastive learning has demonstrated great potential in learning visual representations. Despite its success in various downstream tasks such as image classification and object detection, self-supervised pre-training for fine-grained scenarios has not been fully explored. We point out that current contrastive methods are prone to memorizing background/foreground texture and are therefore limited in their ability to localize the foreground object. Analysis suggests that learning to extract discriminative texture information and learning to localize are equally crucial for fine-grained self-supervised pre-training. Based on our findings, we introduce cross-view saliency alignment (CVSA), a contrastive learning framework that first crops and swaps saliency regions of images as a novel form of view generation and then guides the model to localize foreground objects via a cross-view alignment loss.…
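The abstract describes the view generation step only at a high level (crop a saliency region and swap it into another image). The toy sketch below shows one possible reading of that step, not the authors' implementation. It assumes a precomputed saliency map per image, a simple threshold to obtain a bounding box, and two same-size single-channel images stored as nested lists. All names (`saliency_bbox`, `make_swapped_view`, the `threshold` parameter) are hypothetical and are not taken from the paper or from the Westlake-AI/openmixup repository. The cross-view alignment loss is not sketched, since the abstract does not specify its form.

```python
from typing import List, Tuple

Grid = List[List[float]]          # toy single-channel image or saliency map
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), bottom/right exclusive


def saliency_bbox(saliency: Grid, threshold: float = 0.5) -> Box:
    """Tightest box around all cells whose saliency exceeds `threshold` (assumed rule)."""
    rows = [r for r, row in enumerate(saliency) if any(v > threshold for v in row)]
    cols = [c for c in range(len(saliency[0]))
            if any(row[c] > threshold for row in saliency)]
    if not rows:
        raise ValueError("no cell exceeds the saliency threshold")
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(image: Grid, box: Box) -> Grid:
    """Copy the pixels inside `box`."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def make_swapped_view(fg_img: Grid, fg_sal: Grid,
                      bg_img: Grid, bg_sal: Grid) -> Tuple[Grid, Box]:
    """Paste the salient crop of `fg_img` over the salient location of `bg_img`.

    Returns the mixed view and the box the pasted foreground occupies in it.
    Assumes both images have the same height and width.
    """
    patch = crop(fg_img, saliency_bbox(fg_sal))
    patch_h, patch_w = len(patch), len(patch[0])
    height, width = len(bg_img), len(bg_img[0])

    # Anchor at the background's salient corner, shifted so the patch stays in bounds.
    bg_top, bg_left, _, _ = saliency_bbox(bg_sal)
    top = min(bg_top, height - patch_h)
    left = min(bg_left, width - patch_w)

    view = [row[:] for row in bg_img]  # copy so the input image is not mutated
    for dr, patch_row in enumerate(patch):
        view[top + dr][left:left + patch_w] = patch_row
    return view, (top, left, top + patch_h, left + patch_w)
```

Calling `make_swapped_view(img_a, sal_a, img_b, sal_b)` and then `make_swapped_view(img_b, sal_b, img_a, sal_a)` would give a swapped pair of views. The returned box records where the foreground ends up, which is the kind of signal a localization-oriented objective could use; how CVSA actually uses it is not stated in the abstract.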

Code & Models

Repositories

Westlake-AI/openmixup
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection · Advanced Neural Network Applications

Methods: Contrastive Learning