TL;DR
This paper introduces a mean-shift-based self-supervised learning algorithm that groups images by shifting each image's embedding toward the mean of its nearest neighbors, achieving competitive results without explicit contrastive or clustering objectives.
Contribution
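The paper proposes a simple mean-shift algorithm for SSL that does not rely on contrastive learning or clustering. Because the closest neighbor of an image is always another augmentation of the same image, the method becomes identical to BYOL when only one nearest neighbor is used; the experiments use 5.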
Findings
Achieves 72.4% on ImageNet linear evaluation with ResNet50 at 200 epochs
Outperforms BYOL on ImageNet linear evaluation
Provides open-source code for reproducibility
Abstract
Most recent self-supervised learning (SSL) algorithms learn features by contrasting between instances of images or by clustering the images and then contrasting between the image clusters. We introduce a simple mean-shift algorithm that learns representations by grouping images together without contrasting between them or adopting a strong prior on the structure of the clusters. We simply "shift" the embedding of each image to be close to the "mean" of its neighbors. Since, in our setting, the closest neighbor is always another augmentation of the same image, our model will be identical to BYOL when using only one nearest neighbor instead of the 5 used in our experiments. Our model achieves 72.4% on ImageNet linear evaluation with ResNet50 at 200 epochs, outperforming BYOL. Our code is available here: https://github.com/UMBCvision/MSF
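The shift-toward-neighbors idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that); it is a minimal illustration under stated assumptions. The names `mean_shift_loss`, `byol_loss`, `query`, `target`, and `memory_bank` are hypothetical, the distance is assumed to be squared Euclidean distance between L2-normalized embeddings, and the memory bank is assumed to hold embeddings of other images, with the image's own second augmentation always placed first among the neighbors. Gradients and encoder updates are omitted.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def byol_loss(query, target):
    """BYOL-style regression loss: 2 - 2 * cosine(query, target), averaged."""
    q, t = l2_normalize(query), l2_normalize(target)
    return float(np.mean(2.0 - 2.0 * np.sum(q * t, axis=1)))


def mean_shift_loss(query, target, memory_bank, k=5):
    """Pull each query toward the k nearest neighbors of its target.

    query:       (B, D) embeddings of one augmentation
    target:      (B, D) embeddings of another augmentation of the same images
    memory_bank: (M, D) embeddings of other images (assumed, hypothetical)
    """
    q = l2_normalize(query)
    t = l2_normalize(target)
    bank = l2_normalize(memory_bank)

    # Assumption: the target itself is always the closest neighbor, so the
    # remaining k - 1 neighbors are taken from the memory bank.
    sim_bank = t @ bank.T                                  # (B, M) cosine similarity
    nn_idx = np.argsort(-sim_bank, axis=1)[:, : k - 1]     # (B, k - 1)
    neighbors = np.concatenate([t[:, None, :], bank[nn_idx]], axis=1)  # (B, k, D)

    # Squared distance between unit vectors is 2 - 2 * cosine similarity.
    dist = 2.0 - 2.0 * np.einsum("bd,bkd->bk", q, neighbors)
    return float(dist.mean())


rng = np.random.default_rng(0)
query = rng.normal(size=(8, 16))
target = query + 0.1 * rng.normal(size=(8, 16))  # stand-in for a second augmentation
memory_bank = rng.normal(size=(64, 16))

loss_k5 = mean_shift_loss(query, target, memory_bank, k=5)
loss_k1 = mean_shift_loss(query, target, memory_bank, k=1)
```

With `k=1` the only neighbor is the other augmentation of the same image, so `loss_k1` coincides with `byol_loss(query, target)`, which mirrors the abstract's remark about BYOL.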
Taxonomy
Methods: Bootstrap Your Own Latent
