Exploring the Diversity and Invariance in Yourself for Visual Pre-Training Task
Longhui Wei, Lingxi Xie, Wengang Zhou, Houqiang Li, Qi Tian

TL;DR
This paper introduces E-DIY, a self-supervised learning mechanism that enhances visual pre-training by promoting diversity and invariance in region-level features, leading to improved transfer performance.
Contribution
E-DIY is a novel approach that explicitly encourages diversity among different regions and invariance across augmented views, addressing representation collapse in self-supervised learning.
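As a rough illustration of the two goals named above, the sketch below combines an invariance term (matching regions should agree across two augmented views) with a diversity term (different regions within one view should not look alike). This is not the paper's actual loss: the function name `region_diversity_invariance_loss`, the cosine-similarity formulation, the `diversity_weight` parameter, and the assumption that row i of both inputs describes the same region are all hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def region_diversity_invariance_loss(regions_a, regions_b, diversity_weight=1.0):
    """Illustrative loss over region-level features (hypothetical, not from the paper).

    regions_a, regions_b: arrays of shape (R, D) with R >= 2. Row i in both
    arrays is assumed to describe the same image region under two augmentations.
    Returns (total, invariance, diversity).
    """
    za = l2_normalize(np.asarray(regions_a, dtype=float))
    zb = l2_normalize(np.asarray(regions_b, dtype=float))

    # Invariance: 1 - cosine similarity between the same region in both views.
    invariance = np.mean(1.0 - np.sum(za * zb, axis=1))

    # Diversity: mean cosine similarity between *different* regions of one view.
    # A high value means the region features have collapsed onto each other.
    off_diag = ~np.eye(za.shape[0], dtype=bool)
    sim_a = za @ za.T
    sim_b = zb @ zb.T
    diversity = 0.5 * (sim_a[off_diag].mean() + sim_b[off_diag].mean())

    total = invariance + diversity_weight * diversity
    return total, invariance, diversity


# Collapsed features: every region identical, so the diversity penalty is high.
collapsed = np.ones((4, 8))
print(region_diversity_invariance_loss(collapsed, collapsed))

# Mutually orthogonal regions, identical across views: both terms near zero.
distinct = np.eye(4)
print(region_diversity_invariance_loss(distinct, distinct))
```

Under these assumptions, collapsed region features are penalized by the diversity term even when the two views agree perfectly, which is the failure mode the contribution statement describes.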
Findings
Achieves a 2.1% improvement over BYOL on COCO object detection.
Effectively preserves multi-grained visual information inside images.
Demonstrates superior transfer performance on downstream tasks.
Abstract
Recently, self-supervised learning methods have achieved remarkable success in visual pre-training tasks. By simply pulling the different augmented views of each image together, or through other novel mechanisms, they can learn much unsupervised knowledge and significantly improve the transfer performance of pre-trained models. However, these works still cannot avoid the representation collapse problem, i.e., they focus only on limited regions, or the features extracted from totally different regions inside each image are nearly the same. Generally, this problem prevents the pre-trained models from sufficiently describing the multi-grained information inside images, which further limits the upper bound of their transfer performance. To alleviate this issue, this paper introduces a simple but effective mechanism, called Exploring the Diversity and Invariance in Yourself (E-DIY). By simply pushing the…
Taxonomy
Topics: Image Processing Techniques and Applications · Domain Adaptation and Few-Shot Learning · Advanced Vision and Imaging
Methods: Region Proposal Network · Convolution · Softmax · RoIAlign · Bootstrap Your Own Latent · Mask R-CNN
