How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks

Etai Littwin, Omid Saremi, Madhu Advani, Vimal Thilak, Preetum Nakkiran, Chen Huang, Joshua Susskind

arXiv:2407.03475 · cs.LG · July 8, 2024

TL;DR

This paper analyzes the training dynamics of deep linear self-distillation networks, revealing an implicit bias towards high-influence features in JEPA that explains its effectiveness in learning abstract representations.

Contribution

It uncovers the implicit bias of JEPA towards high-influence features through a simplified linear model analysis, explaining its empirical success.

Findings

01. JEPA biases learning towards high-influence features

02. Linear analysis reveals the mechanism behind JEPA's feature selection

03. JEPA's implicit bias favors abstract, high-impact features

Abstract

Two competing paradigms exist for self-supervised learning of data representations. Joint Embedding Predictive Architecture (JEPA) is a class of architectures in which semantically similar inputs are encoded into representations that are predictive of each other. A recent successful approach that falls under the JEPA framework is self-distillation, where an online encoder is trained to predict the output of the target encoder, sometimes using a lightweight predictor network. This is contrasted with the Masked AutoEncoder (MAE) paradigm, where an encoder and decoder are trained to reconstruct missing parts of the input in the data space rather than in its latent representation. A common motivation for using the JEPA approach over MAE is that the JEPA objective prioritizes abstract features over fine-grained pixel information (which can be unpredictable and uninformative). In this work, we…
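The contrast the abstract draws between the two objectives can be sketched for linear encoders as follows. This is a minimal illustration, not the authors' code: the names `W` (encoder), `V` (decoder), `P` (predictor), the squared-error form of both losses, the noisy second view `y`, and the choice to let the target encoder share the online encoder's weights are all assumptions made for the example.

```python
import numpy as np

def mae_loss(W, V, x, y):
    """MAE-style loss: encode x, decode back to data space, compare with y."""
    pred = x @ W.T @ V.T              # (n, d) reconstruction in data space
    return 0.5 * np.mean(np.sum((pred - y) ** 2, axis=1))

def jepa_loss(W, P, x, y):
    """JEPA-style loss: predict the latent encoding of y from the encoding of x."""
    # Target encoder output. In training this branch would be held constant
    # (stop-gradient); here we only evaluate the loss, so no gradient is taken.
    target = y @ W.T                  # (n, k) latent target
    pred = x @ W.T @ P.T              # (n, k) prediction in latent space
    return 0.5 * np.mean(np.sum((pred - target) ** 2, axis=1))

# Hypothetical toy data: y is a slightly perturbed view of x.
rng = np.random.default_rng(0)
n, d, k = 256, 8, 4
x = rng.normal(size=(n, d))
y = x + 0.1 * rng.normal(size=(n, d))

W = 0.1 * rng.normal(size=(k, d))     # linear encoder, d -> k
V = 0.1 * rng.normal(size=(d, k))     # linear decoder, k -> d (MAE only)
P = np.eye(k)                         # lightweight linear predictor (JEPA only)

print("MAE loss: ", mae_loss(W, V, x, y))
print("JEPA loss:", jepa_loss(W, P, x, y))
```

In this sketch the MAE error is measured in the `d`-dimensional data space, while the JEPA error is measured in the `k`-dimensional latent space. Setting `W` to zero drives the sketched JEPA loss to zero, which is one way to see why the target branch is usually excluded from the gradient.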

Taxonomy

Topics: Neural Networks and Applications

Methods: Masked autoencoder