Understanding Self-supervised Learning with Dual Deep Networks
Yuandong Tian, Lantao Yu, Xinlei Chen, Surya Ganguli

TL;DR
This paper develops a theoretical account of contrastive self-supervised learning with dual deep networks. It shows that weight updates are governed by a covariance operator and, under a hierarchical latent tree model of the data, that deep ReLU networks learn hierarchical features. Numerical experiments support the analysis.
Contribution
It introduces a novel theoretical framework explaining how contrastive SSL amplifies initial random selectivities to learn hierarchical features without direct supervision.
Findings
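In each SGD update, the weights at every layer are updated by a covariance operator that amplifies initial random selectivities which vary across data samples but survive averaging over data augmentations.
The hidden neurons of deep ReLU networks can learn the latent variables of a hierarchical latent tree model (HLTM) without direct supervision.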
Numerical studies support the theoretical insights.
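One way to picture the first finding is as a covariance, taken across data samples, of features that have first been averaged over augmentations. The sketch below illustrates that reading on synthetic data; it is not the paper's exact operator, and the function name augmentation_averaged_covariance, the three toy features, and all sizes are assumptions made for the example.

```python
import numpy as np


def augmentation_averaged_covariance(features):
    """Covariance across samples of augmentation-averaged features.

    features: array of shape (n_samples, n_augs, dim), holding the feature
    vector of every augmented view of every sample (hypothetical layout).
    """
    # Average over augmentations: whatever an augmentation perturbs is washed out.
    mean_over_augs = features.mean(axis=1)  # (n_samples, dim)
    # Center across samples, then form the (dim, dim) covariance matrix.
    centered = mean_over_augs - mean_over_augs.mean(axis=0, keepdims=True)
    return centered.T @ centered / features.shape[0]


rng = np.random.default_rng(0)
n_samples, n_augs = 512, 32

feats = np.empty((n_samples, n_augs, 3))
# Feature 0: differs between samples, identical across a sample's augmentations.
feats[:, :, 0] = rng.normal(size=(n_samples, 1))
# Feature 1: pure augmentation noise, no sample-specific content.
feats[:, :, 1] = rng.normal(size=(n_samples, n_augs))
# Feature 2: constant for every sample and every augmentation.
feats[:, :, 2] = 1.0

cov = augmentation_averaged_covariance(feats)
print(np.round(np.diag(cov), 3))
```

In this toy setting only the first feature keeps a large diagonal entry: it varies across samples and survives the average over augmentations, whereas augmentation noise is averaged down and the constant feature has no variance across samples.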
Abstract
We propose a novel theoretical framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks (e.g., SimCLR). First, we prove that in each SGD update of SimCLR with various loss functions, including simple contrastive loss, soft Triplet loss and InfoNCE loss, the weights at each layer are updated by a covariance operator that specifically amplifies initial random selectivities that vary across data samples but survive averages over data augmentations. To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM) and prove that the hidden neurons of deep ReLU networks can learn the latent variables in HLTM, despite the fact that the network receives no direct…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Topic Modeling
Methods: Triplet Loss · InfoNCE · 1x1 Convolution · Batch Normalization · Residual Connection · Residual Block · Convolution · Bottleneck Residual Block · Average Pooling · Global Average Pooling
