Understanding Self-supervised Learning with Dual Deep Networks

Yuandong Tian; Lantao Yu; Xinlei Chen; Surya Ganguli

arXiv:2010.00578 · cs.LG · February 16, 2021 · 35 cites

TL;DR

This paper provides a theoretical understanding of contrastive self-supervised learning with dual deep networks, showing how hierarchical features emerge through covariance operators and latent variable models, supported by numerical validation.

Contribution

The paper introduces a theoretical framework explaining how contrastive SSL amplifies initial random selectivities to learn hierarchical features without direct supervision.

Findings

01. Weights are updated by a covariance operator that amplifies initial selectivities.

02. Deep ReLU networks can learn latent variables in hierarchical models without supervision.

03. Numerical studies support the theoretical insights.
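
To make the first finding more concrete, here is a toy NumPy sketch of the statistic the summary describes in words: the covariance, across data samples, of features that have first been averaged over augmentations. It is not the covariance operator as defined in the paper. The function name `aug_averaged_covariance`, the synthetic features, and all sizes are assumptions chosen for illustration.

```python
import numpy as np

def aug_averaged_covariance(features):
    """features: array of shape (n_samples, n_augs, dim).

    Returns the (dim, dim) covariance across samples of the
    per-sample mean over augmentations.
    """
    mean_over_augs = features.mean(axis=1)        # (n_samples, dim)
    return np.cov(mean_over_augs, rowvar=False)   # columns are variables

rng = np.random.default_rng(0)
n_samples, n_augs = 200, 50

# Hypothetical feature 0: differs across samples, untouched by augmentation.
sample_signal = rng.normal(size=(n_samples, 1))
# Hypothetical feature 1: pure augmentation jitter, no per-sample identity.
aug_jitter = rng.normal(size=(n_samples, n_augs))

features = np.stack(
    [np.broadcast_to(sample_signal, (n_samples, n_augs)), aug_jitter],
    axis=-1,
)                                                  # (n_samples, n_augs, 2)

cov = aug_averaged_covariance(features)
print(cov[0, 0], cov[1, 1])
```

In this toy setup the variance of feature 0 stays close to 1, while the variance of feature 1 shrinks roughly as 1 / n_augs, so only the component that varies across samples and survives augmentation averaging carries weight.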

Abstract

We propose a novel theoretical framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks (e.g., SimCLR). First, we prove that in each SGD update of SimCLR with various loss functions, including simple contrastive loss, soft Triplet loss and InfoNCE loss, the weights at each layer are updated by a covariance operator that specifically amplifies initial random selectivities that vary across data samples but survive averages over data augmentations. To further study what role the covariance operator plays and which features are learned in such a process, we model data generation and augmentation processes through a hierarchical latent tree model (HLTM) and prove that the hidden neurons of deep ReLU networks can learn the latent variables in HLTM, despite the fact that the network receives no direct…
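
For readers unfamiliar with the InfoNCE loss named in the abstract, the following is a minimal NumPy sketch of a simplified, one-directional variant: each row of `z1` is matched against every row of `z2`, with the same-index row as the positive. It is not the paper's or SimCLR's exact formulation (SimCLR also draws negatives from within each view). The function name `info_nce_loss`, the temperature value, and the random embeddings are assumptions.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """Simplified InfoNCE: z1[i] and z2[i] embed two views of sample i."""
    # Normalize rows so the dot product is cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature               # (N, N), diagonal = positives
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the positive pair as the target class.
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))                       # hypothetical embeddings

# Second view is a slightly perturbed copy, so positives line up.
aligned = info_nce_loss(z, z + 0.01 * rng.normal(size=z.shape))
# Second view is shifted by one row, so every "positive" is a wrong match.
mismatched = info_nce_loss(z, np.roll(z, 1, axis=0))
print(aligned, mismatched)
```

The loss is small when the two views of each sample agree and large when the pairing is broken, which is the signal that drives the weight updates the abstract analyzes.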

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Topic Modeling

Methods: Triplet Loss · InfoNCE · 1x1 Convolution · Batch Normalization · Residual Connection · Residual Block · Convolution · Bottleneck Residual Block · Average Pooling · Global Average Pooling