Exploring the Diversity and Invariance in Yourself for Visual Pre-Training Task

Longhui Wei; Lingxi Xie; Wengang Zhou; Houqiang Li; Qi Tian

arXiv:2106.00537 · cs.CV · June 2, 2021

Open Access

TL;DR

This paper introduces E-DIY, a self-supervised learning mechanism that enhances visual pre-training by promoting diversity and invariance in region-level features, leading to improved transfer performance.

Contribution

E-DIY is a novel approach that explicitly encourages diversity among different regions and invariance across augmented views, addressing representation collapse in self-supervised learning.
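The contribution combines two pressures: the same region should look the same across augmented views (invariance), and different regions should not look the same (diversity). The Python sketch below shows why invariance alone is not enough. It rests on loudly stated assumptions: this page does not give E-DIY's actual loss, so the BYOL-style invariance term (2 - 2 times the cosine similarity), the hypothetical `diversity_loss` (mean pairwise cosine similarity between regions of one view), and the hypothetical weight `lam` are illustrative choices, not the authors' formulation. Region features are assumed to be non-zero vectors already matched across views (view1[i] and view2[i] describe the same region), with at least two regions per view.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def invariance_loss(view1: List[Vector], view2: List[Vector]) -> float:
    """BYOL-style term: mean of 2 - 2*cos over matched regions of two views.

    Assumption: view1[i] and view2[i] describe the same image region.
    """
    terms = [2.0 - 2.0 * cosine(p, z) for p, z in zip(view1, view2)]
    return sum(terms) / len(terms)


def diversity_loss(regions: List[Vector]) -> float:
    """Hypothetical term: mean cosine similarity over all pairs of regions.

    A collapsed encoder (every region mapped to one direction) scores 1.0;
    lower values mean the regions are more diverse.
    """
    n = len(regions)
    sims = [cosine(regions[i], regions[j]) for i in range(n) for j in range(i + 1, n)]
    return sum(sims) / len(sims)


def combined_loss(view1: List[Vector], view2: List[Vector], lam: float = 1.0) -> float:
    """Invariance across views plus a weighted diversity penalty within each view.

    `lam` is a hypothetical weighting hyperparameter, not taken from the paper.
    """
    diversity = 0.5 * (diversity_loss(view1) + diversity_loss(view2))
    return invariance_loss(view1, view2) + lam * diversity


# Usage: a collapsed encoder vs. one whose region features point in different directions.
collapsed = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(invariance_loss(collapsed, collapsed))  # 0.0: invariance alone cannot detect collapse
print(combined_loss(collapsed, collapsed))    # 1.0: the diversity term penalizes it
print(combined_loss(diverse, diverse))        # about -0.333
```

In this toy setup the collapsed encoder is perfectly invariant yet is penalized by the diversity term, while spreading regions apart lowers the combined objective; whether E-DIY balances the two terms this way is not stated on this page.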

Findings

01. Achieves a 2.1% improvement over BYOL on COCO object detection.
02. Effectively preserves multi-grained visual information inside images.
03. Demonstrates superior transfer performance on downstream tasks.

Abstract

Recently, self-supervised learning methods have achieved remarkable success on the visual pre-training task. By simply pulling different augmented views of each image together, or through other novel mechanisms, they can learn a great deal of knowledge without supervision and significantly improve the transfer performance of pre-trained models. However, these works still cannot avoid the representation collapse problem: they either focus only on limited regions, or the features extracted from totally different regions inside each image are nearly the same. In general, this problem prevents pre-trained models from sufficiently describing the multi-grained information inside images, which further limits the upper bound of their transfer performance. To alleviate this issue, this paper introduces a simple but effective mechanism, called Exploring the Diversity and Invariance in Yourself (E-DIY). By simply pushing the…
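The second failure mode in the abstract, features from totally different regions becoming nearly identical, can be monitored numerically. The sketch below adapts a collapse diagnostic used in SimSiam (Chen and He, 2021), which tracks the standard deviation of L2-normalized outputs; here it is computed across the regions of a single image rather than across a batch. This is an assumed monitoring aid, not part of E-DIY, and `region_feature_std` is a hypothetical helper; region features are assumed to be non-zero vectors of equal length.

```python
import math
import statistics
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def region_feature_std(regions: List[Sequence[float]]) -> float:
    """Hypothetical diagnostic: mean per-dimension population std of
    L2-normalized region features.

    Near 0.0: every region maps to (almost) the same direction, i.e. collapse.
    Roughly 1/sqrt(d): features are spread over the unit sphere.
    """
    normed = [l2_normalize(r) for r in regions]
    dim = len(normed[0])
    stds = [statistics.pstdev([vec[k] for vec in normed]) for k in range(dim)]
    return sum(stds) / dim


# Usage: three regions collapsed onto one direction vs. four spread-out regions.
collapsed = [[3.0, 4.0], [6.0, 8.0], [1.5, 2.0]]
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
print(region_feature_std(collapsed))  # 0.0
print(region_feature_std(spread))     # about 0.707, i.e. 1/sqrt(2)
```

A value drifting toward 0.0 during training would flag the collapse the abstract describes; the 1/sqrt(d) reference point holds only approximately, for roughly isotropic features.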

Taxonomy

Topics: Image Processing Techniques and Applications · Domain Adaptation and Few-Shot Learning · Advanced Vision and Imaging

Methods: Region Proposal Network · Convolution · Softmax · RoIAlign · Bootstrap Your Own Latent · Mask R-CNN