Why Self-Supervised Encoders Want to Be Normal

Yuval Domb

arXiv:2604.27743 · cs.IT · May 5, 2026

TL;DR

This paper argues that self-supervised encoders tend to produce normally distributed representations as a consequence of the Information Bottleneck (IB) principle. It provides a theoretical foundation for this behavior and derives practical loss functions for learning better representations.

Contribution

The paper connects the IB principle to the normality of learned representations and introduces practical loss objectives based on that connection.

Findings

01. Latent representations tend toward isotropic Gaussian states.

02. The framework unifies various supervised and self-supervised objectives.

03. New loss functions improve embedding quality on benchmarks.
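
As a rough illustration of what "tending toward an isotropic Gaussian" can mean in practice, the sketch below fits a per-dimension Gaussian to a batch of embeddings and measures its KL divergence from a standard normal. This is not the paper's loss: the function name `diagonal_gaussian_kl`, the diagonal-covariance simplification (which ignores correlations between dimensions), and the synthetic data are all assumptions made for the example.

```python
import math
import random


def diagonal_gaussian_kl(embeddings):
    """KL( N(mu, diag(var)) || N(0, I) ) for a Gaussian fitted per dimension.

    Hypothetical helper for illustration only. It ignores off-diagonal
    covariance, so it cannot detect correlated dimensions, and it assumes
    every dimension has non-zero variance.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    kl = 0.0
    for j in range(dim):
        column = [row[j] for row in embeddings]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        # Closed-form KL between two univariate Gaussians, with target N(0, 1).
        kl += 0.5 * (var + mean ** 2 - 1.0 - math.log(var))
    return kl


# Synthetic stand-ins for encoder outputs (not real embeddings).
rng = random.Random(0)
isotropic = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2000)]
# Same samples, but with the first dimension stretched by a factor of 3.
stretched = [[3.0 * row[0]] + row[1:] for row in isotropic]

kl_isotropic = diagonal_gaussian_kl(isotropic)
kl_stretched = diagonal_gaussian_kl(stretched)
print(f"isotropic batch: {kl_isotropic:.4f}")
print(f"stretched batch: {kl_stretched:.4f}")
```

The isotropic batch should score close to zero and the stretched batch noticeably higher, which is the kind of gap a normality-encouraging regularizer would penalize.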

Abstract

Self-supervised learning has achieved remarkable empirical success in learning robust representations without explicit labels, most recently demonstrated within the framework of Joint-Embedding Predictive Architectures (JEPA). However, a fundamental question remains: what analytical principles drive these encoders toward specific distributional states? In this paper, we demonstrate that the preference for normal distributions in self-supervised encoders is a direct consequence of the Information Bottleneck (IB) principle. By recasting the IB objective as a rate-distortion problem over the predictive manifold, we provide a theoretical basis for why optimal, target-neutral, latent representations should tend towards isotropic Gaussian states. Under this framework, we show that latent representations correspond to soft clustering of inputs sharing similar predictive distributions,…
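
For reference, the standard form of the IB objective that the abstract builds on can be written as follows. The symbols are assumptions, since the abstract does not fix any notation: $X$ is the input, $Y$ the prediction target, $Z$ the latent representation, and $\beta > 0$ a trade-off parameter. The paper's own formulation over the "predictive manifold" may differ in detail.

```latex
% Standard IB Lagrangian: compress X into Z while keeping information about Y.
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)

% Under the Markov chain Y - X - Z, the information lost about Y is an
% expected KL divergence between predictive distributions:
I(X; Y) - I(Z; Y)
  \;=\; \mathbb{E}_{p(x, z)}\!\left[ D_{\mathrm{KL}}\!\big( p(y \mid x) \,\|\, p(y \mid z) \big) \right]

% Since I(X; Y) does not depend on the encoder, the IB objective is
% equivalent to a rate-distortion problem with
% d(x, z) = D_KL( p(y|x) || p(y|z) ) as the distortion measure:
\min_{p(z \mid x)} \; I(X; Z) \;+\; \beta \, \mathbb{E}_{p(x, z)}\!\left[ d(x, z) \right]
```

Read this way, inputs with similar predictive distributions $p(y \mid x)$ incur little distortion when mapped to the same latent code, which matches the abstract's description of latents as soft clusters of such inputs.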
