Information theoretic underpinning of self-supervised learning by clustering

Josef Kittler; Sara Atito; Muhammad Awais

arXiv:2605.11870·cs.LG·May 13, 2026

TL;DR

This paper develops a theoretical framework for clustering-based self-supervised learning, using information theory and K-L divergence optimization to explain common heuristics such as centering and distillation.

Contribution

The paper formulates SSL as K-L divergence optimization, an information-theoretic view that gives a theoretical basis for practices such as centering and distillation.

Findings

01. Normalization via inverse cluster priors simplifies to batch centering.

02. The model supports existing SSL methods and guides future research.

03. Theoretical underpinnings explain heuristics used in SSL.

Abstract

Self-supervised learning (SSL) is recognized as an essential tool for building foundation models for Artificial Intelligence applications. The advances in SSL have been made thanks to vigorous arguments about the principles of SSL and through extensive empirical research. The aim of this paper is to contribute to the development of the underpinning theory of SSL, focusing on the deep clustering approach. By analogy to supervised learning, we formulate SSL as K-L divergence optimization. Mode collapse is prevented by imposing an optimization constraint on the teacher distribution. This leads to normalization using inverse cluster priors. We show that, by Jensen's inequality, this normalization simplifies to the popular batch centering procedure. Distillation and centering are common heuristics-based practices in SSL, but our work underpins them theoretically. The theoretical…
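
The link between inverse cluster priors and batch centering can be illustrated with a small numerical sketch. This is one plausible reading of the abstract, not the paper's own derivation or code: the toy logits, the function names (`cluster_priors`, `normalize_by_inverse_prior`, `center_logits`), and the use of a plain softmax teacher are all assumptions made for illustration. The idea is that dividing teacher probabilities by the batch-level cluster prior subtracts the log prior from each logit, and Jensen's inequality bounds that log prior from below by the batch mean of the log-probabilities, which differs from the batch mean of the logits only by a per-cluster constant that cancels when the softmax is renormalized.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one vector of cluster logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cluster_priors(batch_probs):
    # Empirical cluster prior: batch average of the assignment probabilities.
    n, k = len(batch_probs), len(batch_probs[0])
    return [sum(p[j] for p in batch_probs) / n for j in range(k)]

def normalize_by_inverse_prior(probs, priors):
    # Weight each cluster probability by 1 / prior, then renormalize.
    weighted = [p / q for p, q in zip(probs, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]

def center_logits(batch_logits):
    # Batch centering: subtract the per-cluster batch mean of the logits.
    n, k = len(batch_logits), len(batch_logits[0])
    center = [sum(z[j] for z in batch_logits) / n for j in range(k)]
    return [[z[j] - center[j] for j in range(k)] for z in batch_logits]

# Hypothetical teacher logits: every sample favours cluster 0 (near collapse).
teacher_logits = [
    [2.0, 0.5, -1.0],
    [1.5, 0.2, -0.5],
    [2.2, -0.3, 0.1],
    [1.8, 0.4, -0.8],
]
teacher_probs = [softmax(z) for z in teacher_logits]
priors = cluster_priors(teacher_probs)

# Route 1: normalize the teacher probabilities by the inverse cluster priors.
prior_targets = [normalize_by_inverse_prior(p, priors) for p in teacher_probs]

# Route 2: centre the logits over the batch, then apply the softmax.
centered_targets = [softmax(z) for z in center_logits(teacher_logits)]

# Jensen's inequality: log of the mean probability >= mean of the log probability.
n = len(teacher_probs)
mean_log_probs = [sum(math.log(p[j]) for p in teacher_probs) / n for j in range(3)]
jensen_gaps = [math.log(priors[j]) - mean_log_probs[j] for j in range(3)]

print("max prior before:", max(priors))
print("max prior, inverse-prior route:", max(cluster_priors(prior_targets)))
print("max prior, centering route:", max(cluster_priors(centered_targets)))
print("Jensen gaps per cluster:", jensen_gaps)
```

On this toy batch both routes flatten the dominant cluster, so the batch-level cluster usage moves toward uniform. The two sets of targets are close but not identical, because centering replaces the log prior with its Jensen lower bound rather than the exact value.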
