SERE: Exploring Feature Self-relation for Self-supervised Transformer

Zhong-Yu Li; Shanghua Gao; Ming-Ming Cheng

arXiv:2206.05184·cs.CV·September 19, 2023

TL;DR

This paper introduces SERE, a self-supervised learning method for vision transformers that leverages feature self-relations across spatial and channel dimensions to improve representation quality for various vision tasks.

Contribution

The paper proposes a self-supervised learning approach for ViT that uses feature self-relations on the spatial and channel dimensions. Rather than relying only on strategies designed for CNN, such as instance-level discrimination, it targets the relation modeling ability that distinguishes ViT.

Findings

01 Improved downstream task performance with SERE

02 Enhanced relation modeling in ViT

03 Stable and stronger representations

Abstract

Learning representations with self-supervision for convolutional networks (CNN) has been validated to be effective for vision tasks. As an alternative to CNN, vision transformers (ViT) have strong representation ability with spatial self-attention and channel-level feedforward networks. Recent works reveal that self-supervised learning helps unleash the great potential of ViT. Still, most works follow self-supervised strategies designed for CNN, e.g., instance-level discrimination of samples, but they ignore the properties of ViT. We observe that relational modeling on spatial and channel dimensions distinguishes ViT from other networks. To enforce this property, we explore the feature SElf-RElation (SERE) for training self-supervised ViT. Specifically, instead of conducting self-supervised learning solely on feature embeddings from multiple views, we utilize the feature self-relations,…
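The abstract is cut off before it describes how the self-relations are used, so the following is only a minimal NumPy sketch of the general idea, not the authors' implementation. It assumes that a spatial self-relation is a softmax over patch-to-patch cosine similarities, that a channel self-relation is the analogous softmax over channel-to-channel similarities, and that relations from two views are compared with a cross-entropy term. The function names, the temperature value, and the assumption that the two views' patch tokens are already spatially aligned are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x, axis, eps=1e-8):
    # Scale vectors along `axis` to (approximately) unit length.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def spatial_self_relation(feats, temperature=0.1):
    # feats: (N, C) patch tokens. Returns an (N, N) matrix whose rows are
    # distributions over how each patch relates to every other patch.
    f = l2_normalize(feats, axis=1)
    return softmax(f @ f.T / temperature, axis=-1)

def channel_self_relation(feats, temperature=0.1):
    # feats: (N, C) patch tokens. Returns a (C, C) matrix whose rows are
    # distributions over how each channel relates to every other channel.
    f = l2_normalize(feats, axis=0)
    return softmax(f.T @ f / temperature, axis=-1)

def relation_alignment_loss(target, pred, eps=1e-8):
    # Row-wise cross-entropy between two relation matrices, averaged over rows.
    # Assumption: this stands in for whatever objective the paper actually uses.
    return float(-(target * np.log(pred + eps)).sum(axis=-1).mean())

# Toy stand-ins for ViT patch features from two augmented views of one image.
# Assumption: tokens in both views are already matched position by position.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(16, 32))                   # 16 patches, 32 channels
view_b = view_a + 0.05 * rng.normal(size=(16, 32))   # slightly perturbed view

spatial_a = spatial_self_relation(view_a)
spatial_b = spatial_self_relation(view_b)
channel_a = channel_self_relation(view_a)
channel_b = channel_self_relation(view_b)

loss_spatial = relation_alignment_loss(spatial_a, spatial_b)
loss_channel = relation_alignment_loss(channel_a, channel_b)
print("spatial relation loss:", loss_spatial)
print("channel relation loss:", loss_channel)
```

In a real training setup one side would typically come from a separate target network with gradients stopped, and the relation terms would be combined with an embedding-level loss; how SERE does this is not stated in the truncated abstract, so the official repository is the place to check.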

Code & Models

Repositories

MCG-NKU/SERE (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Visual Attention and Saliency Detection