Visual Representation Learning with Self-Supervised Attention for Low-Label High-data Regime

Prarthana Bhattacharyya; Chenge Li; Xiaonan Zhao; István Fehérvári; Jason Sun

arXiv:2201.08951·cs.CV·February 1, 2022

TL;DR

This paper explores the adaptation of self-supervised vision transformers to low-label, high-data scenarios, demonstrating improved performance in few-shot classification and zero-shot retrieval without requiring manual annotations.

Contribution

It introduces a novel approach using self-supervised vision transformers for low-label regimes, achieving state-of-the-art results in few-shot and zero-shot tasks.

Findings

01. Outperforms state-of-the-art on miniImageNet and CUB200 for few-shot classification.

02. Achieves up to 11% improvement on zero-shot image retrieval benchmarks.

03. Demonstrates effectiveness without manual annotations in low-label settings.

Abstract

Self-supervision has shown outstanding results for natural language processing and, more recently, for image recognition. Simultaneously, vision transformers and their variants have emerged as a promising and scalable alternative to convolutions on various computer vision tasks. In this paper, we are the first to ask whether self-supervised vision transformers (SSL-ViTs) can be adapted to two important computer vision tasks in the low-label, high-data regime: few-shot image classification and zero-shot image retrieval. The motivation is to reduce the number of manual annotations required to train a visual embedder, and to produce generalizable and semantically meaningful embeddings. For few-shot image classification, we train SSL-ViTs without any supervision on external data, and use this trained embedder to adapt quickly to novel classes with a limited number of labels. For zero-shot image…
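
The few-shot step described in the abstract (a frozen embedder adapting to novel classes from a handful of labels) can be illustrated with a small sketch. The excerpt does not say which classifier the authors place on top of the embeddings, so the nearest-prototype rule with cosine similarity used here is an assumption, not the paper's method. The function names `l2_normalize`, `class_prototypes`, and `classify` are hypothetical, and the short lists of floats stand in for embeddings that a trained SSL-ViT would produce.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_prototypes(support_embeddings, support_labels):
    """Average the normalized support embeddings of each class.

    Assumption: one prototype per class, as in nearest-centroid classifiers.
    """
    sums, counts = {}, {}
    for emb, label in zip(support_embeddings, support_labels):
        emb = l2_normalize(emb)
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + e for s, e in zip(sums[label], emb)]
        counts[label] += 1
    return {
        label: l2_normalize([s / counts[label] for s in sums[label]])
        for label in sums
    }


def classify(query_embedding, prototypes):
    """Return the label whose prototype has the highest cosine similarity."""
    q = l2_normalize(query_embedding)
    return max(
        prototypes,
        key=lambda label: sum(a * b for a, b in zip(q, prototypes[label])),
    )


# Toy 2-way 2-shot episode with made-up 2-D "embeddings".
support = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.8]]
labels = ["cat", "cat", "dog", "dog"]
prototypes = class_prototypes(support, labels)
prediction = classify([0.8, 0.2], prototypes)
```

In this toy episode the query `[0.8, 0.2]` lies closest to the "cat" prototype. No gradient updates are needed at adaptation time under this assumed rule; only the labeled support embeddings are averaged.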

Code & Models

Repositories

AutoVision-cloud/SSL-ViT-lowlabel-highdata (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI