Exploring Simple Siamese Representation Learning
Xinlei Chen, Kaiming He

TL;DR
This paper shows that simple Siamese networks can learn meaningful visual representations without negative sample pairs, large batches, or momentum encoders. A stop-gradient operation is what prevents the network from collapsing, and the resulting method, SimSiam, achieves competitive results on ImageNet and downstream tasks.
Contribution
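The paper introduces SimSiam, a simple Siamese network baseline for unsupervised visual representation learning. It works without negative sample pairs, large batches, or momentum encoders, challenging the assumption that such components are required to avoid collapse.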
Findings
Simple Siamese networks can learn meaningful representations.
A stop-gradient operation is crucial to prevent collapse.
SimSiam achieves competitive results on ImageNet.
Abstract
Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions for avoiding collapsing solutions. In this paper, we report surprising empirical results that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders. Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapsing. We provide a hypothesis on the implication of stop-gradient, and further show proof-of-concept experiments verifying it. Our "SimSiam" method achieves competitive results on ImageNet and downstream tasks. We hope this simple baseline will…
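As a minimal sketch of the objective the abstract refers to, the block below computes a symmetrized negative cosine similarity between two views in plain Python. The names `neg_cosine` and `simsiam_loss`, the toy vectors, and the split into "predictions" (`p1`, `p2`) and "targets" (`z1`, `z2`) are illustrative assumptions, not the authors' code. Plain Python has no gradients, so the stop-gradient appears only as a comment: in an autodiff framework the targets would be wrapped in the framework's stop-gradient operation (for example `.detach()` in PyTorch or `jax.lax.stop_gradient` in JAX). The sketch also illustrates why collapse is a real risk: a network that maps every view to the same constant vector already attains the minimum of this loss.

```python
import math


def neg_cosine(p, z):
    """Negative cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return -dot / (norm_p * norm_z)


def simsiam_loss(p1, z1, p2, z2):
    """Symmetrized loss over two augmented views of one image.

    p1, p2: outputs for view 1 and view 2 that receive gradients.
    z1, z2: targets for view 1 and view 2. In an autodiff framework these
            would be treated as constants (stop-gradient); here that is
            only a convention, since no gradients are computed.
    """
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)


# Collapsed solution: every view maps to the same constant vector.
constant = [1.0, 2.0, 3.0]
collapsed = simsiam_loss(constant, constant, constant, constant)

# Mismatched views: the two views land on orthogonal vectors.
view1 = [1.0, 0.0]
view2 = [0.0, 1.0]
mismatched = simsiam_loss(view1, view1, view2, view2)

print("collapsed loss: ", collapsed)   # close to the minimum of -1
print("mismatched loss:", mismatched)  # close to 0
```

The loss value alone cannot distinguish a useful representation from a collapsed one, since both can sit near the minimum. That is why the training dynamics, and in particular how gradients flow through the two branches, matter for avoiding collapse.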
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
