Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression

Runtian Zhai; Bingbin Liu; Andrej Risteski; Zico Kolter; Pradeep Ravikumar

arXiv:2306.00788 · cs.LG · January 19, 2024 · 1 citation

TL;DR

This paper provides a theoretical framework for understanding how data augmentation influences self-supervised learning, using RKHS approximation and regression to analyze generalization and the effects of different augmentations.

Contribution

The paper gives a geometric and statistical analysis of augmentation-based pretraining, derives bounds that separate the effects of the model from those of the augmentation, and introduces augmentation complexity as a key quantity.

Findings

01 Generalization bounds for augmentation-based learning that are free of model complexity

02 Decomposition of prediction error into estimation and approximation errors

03 Quantitative comparison of augmentations via augmentation complexity

Abstract

Data augmentation is critical to the empirical success of modern self-supervised representation learning, such as contrastive learning and masked language modeling. However, a theoretical understanding of the exact role of augmentation remains limited. Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator, suggesting that learning a linear probe atop such representation can be connected to RKHS regression. Building on this insight, this work delves into a statistical analysis of augmentation-based pretraining. Starting from the isometry property, a geometric characterization of the target function given by the augmentation, we disentangle the effects of the model and the augmentation, and prove two generalization bounds that are free of model complexity. Our first bound works for an arbitrary…

Videos

Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression · SlidesLive

Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning