Predicting What You Already Know Helps: Provable Self-Supervised Learning

Jason D. Lee; Qi Lei; Nikunj Saunshi; Jiacheng Zhuo

arXiv:2008.01064 · cs.LG · November 16, 2021 · 51 citations

TL;DR

This paper provides a theoretical framework for self-supervised learning, demonstrating how predicting known information can lead to effective representations and reduce labeled data requirements, with guarantees for linear and nonlinear CCA methods.

Contribution

It introduces a formal analysis of reconstruction-based pretext tasks, showing their ability to learn useful representations with provable guarantees and reduced sample complexity.

Findings

01. Guarantees effective downstream task performance using simple linear classifiers

02. Proves small approximation error for complex functions with learned representations

03. Extends analysis to nonlinear CCA similar to SimSiam with comparable guarantees

Abstract

Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data to learn useful semantic representations. These pretext tasks are created solely using the input features, such as predicting a missing image patch, recovering the color channels of an image from context, or predicting missing words in text; yet predicting this known information helps in learning representations effective for downstream prediction tasks. We posit a mechanism exploiting the statistical connections between certain reconstruction-based pretext tasks that guarantee to learn a good representation. Formally, we quantify how the approximate independence between the components of the pretext task (conditional on the label and latent variables) allows us to learn representations that can solve the downstream task by just training a…
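The mechanism described in the abstract can be illustrated with a small synthetic experiment. The sketch below is not the authors' code; it assumes a toy jointly Gaussian setting in which two views X1 and X2 are conditionally independent given a low-dimensional label Y, uses linear regression of X2 on X1 as the pretext task, and then fits a linear model on the learned representation with only a few labels. All names (`make_data`, `fit_linear`, `run_demo`) and all dimensions and sample sizes are hypothetical choices made for illustration.

```python
import numpy as np


def make_data(n, rng, A1, A2):
    """Sample (X1, X2, Y) where X1 and X2 are independent given Y (toy assumption)."""
    k = A1.shape[1]
    Y = rng.normal(size=(n, k))
    X1 = Y @ A1.T + rng.normal(size=(n, A1.shape[0]))
    X2 = Y @ A2.T + rng.normal(size=(n, A2.shape[0]))
    return X1, X2, Y


def fit_linear(X, T):
    """Least-squares weights W such that X @ W approximates T."""
    return np.linalg.lstsq(X, T, rcond=None)[0]


def run_demo(seed=0, d1=50, d2=5, k=2,
             n_unlabeled=20000, n_labeled=60, n_test=5000):
    rng = np.random.default_rng(seed)
    A1 = rng.normal(size=(d1, k))
    A2 = rng.normal(size=(d2, k))

    # Pretext task: predict X2 from X1 using unlabeled data only.
    X1_u, X2_u, _ = make_data(n_unlabeled, rng, A1, A2)
    W_pre = fit_linear(X1_u, X2_u)

    def psi(X1):
        # Learned representation: a linear estimate of E[X2 | X1].
        return X1 @ W_pre

    # Downstream task: a small labeled set and a linear model on top.
    X1_l, _, Y_l = make_data(n_labeled, rng, A1, A2)
    X1_t, _, Y_t = make_data(n_test, rng, A1, A2)

    W_ssl = fit_linear(psi(X1_l), Y_l)   # linear model on the representation
    W_raw = fit_linear(X1_l, Y_l)        # baseline: linear model on raw X1

    mse_ssl = float(np.mean((psi(X1_t) @ W_ssl - Y_t) ** 2))
    mse_raw = float(np.mean((X1_t @ W_raw - Y_t) ** 2))
    return mse_ssl, mse_raw


if __name__ == "__main__":
    mse_ssl, mse_raw = run_demo()
    print(f"test MSE, linear model on pretext representation: {mse_ssl:.4f}")
    print(f"test MSE, linear model on raw X1:                 {mse_raw:.4f}")
```

In this toy setting the representation has only `d2` coordinates, so the downstream linear fit estimates far fewer parameters than a fit on the raw `d1`-dimensional input, which is one way to see why fewer labels may suffice. The comparison is only a sanity check under the stated assumptions, not a reproduction of the paper's results.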

Videos

Predicting What You Already Know Helps: Provable Self-Supervised Learning · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Model Reduction and Neural Networks

Methods: Linear Layer