MSR: Making Self-supervised learning Robust to Aggressive Augmentations
Yingbin Bai, Erkun Yang, Zhaoqing Wang, Yuxuan Du, Bo Han, Cheng Deng, Dadong Wang, Tongliang Liu

TL;DR
This paper introduces a novel self-supervised learning approach that balances weak and aggressive augmentations to mitigate semantic shift issues, leading to improved image representation and transfer performance.
Contribution
It proposes a training paradigm that dynamically adjusts the importance of aggressive augmentations to counteract semantic shift in SSL.
Findings
Achieves 73.1% top-1 accuracy on ImageNet-1K with ResNet-50.
Outperforms BYOL by 2.5 percentage points in top-1 accuracy.
Learned representations transfer well to downstream tasks.
Abstract
Most recent self-supervised learning (SSL) methods learn visual representations by contrasting different augmented views of images. Compared with supervised learning, more aggressive augmentations have been introduced to further improve the diversity of training pairs. However, aggressive augmentations may distort images' structures, leading to a severe semantic shift problem: augmented views of the same image may not share the same semantics, which degrades transfer performance. To address this problem, we propose a new SSL paradigm that counteracts the impact of semantic shift by balancing the role of weak and aggressively augmented pairs. Specifically, semantically inconsistent pairs are in the minority, and we treat them as noisy pairs. Note that deep neural networks (DNNs) have a crucial memorization effect: DNNs tend to first memorize clean (majority) examples before overfitting…
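The balancing idea in the abstract can be illustrated with a minimal sketch. It assumes a BYOL-style loss (2 minus twice the cosine similarity) and a hypothetical cosine decay for the weight on aggressively augmented pairs, so that those potentially noisy pairs matter less late in training, when overfitting to them is more likely. The function names, the schedule shape, and the starting weight of 0.5 are illustrative assumptions, not the authors' formulation, which the excerpt above does not specify.

```python
import math


def byol_loss(p, z):
    """BYOL-style regression loss between two vectors: 2 - 2 * cosine similarity."""
    dot = sum(a * b for a, b in zip(p, z))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in z))
    return 2.0 - 2.0 * dot / norm


def aggressive_weight(step, total_steps, start=0.5, end=0.0):
    """Hypothetical schedule (an assumption, not from the paper):
    cosine decay of the aggressive-pair weight from `start` to `end`."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * progress))


def balanced_loss(weak_pair, aggressive_pair, step, total_steps):
    """Convex combination of the weak-pair and aggressive-pair losses.

    Each pair is a (prediction, target) tuple of equal-length vectors.
    """
    beta = aggressive_weight(step, total_steps)
    return (1.0 - beta) * byol_loss(*weak_pair) + beta * byol_loss(*aggressive_pair)
```

In this sketch, a semantically inconsistent aggressive pair contributes up to half of the loss at the start of training and nothing at the end, while the weak pair's share grows correspondingly.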
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
Methods: Bootstrap Your Own Latent
