Information theoretic underpinning of self-supervised learning by clustering
Josef Kittler, Sara Atito, Muhammad Awais

TL;DR
This paper develops a theoretical framework for self-supervised learning based on clustering, explaining common heuristics like centering and distillation through information theory and divergence optimization.
Contribution
It introduces a novel information-theoretic formulation of SSL as K-L divergence optimization, providing a theoretical basis for practices like centering and distillation.
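To make the K-L formulation concrete, here is a minimal NumPy sketch of a teacher-student loss over K cluster assignments. It assumes that teacher and student each produce logits over the same K clusters, turned into distributions by a temperature softmax. The helper names and the temperature values are hypothetical illustrations, not the paper's settings.

```python
import numpy as np

def softmax(logits, temperature):
    """Row-wise softmax of logits / temperature, shifted for numerical stability."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    expd = np.exp(scaled)
    return expd / expd.sum(axis=1, keepdims=True)

def kl_teacher_student(teacher_logits, student_logits,
                       t_teacher=0.04, t_student=0.1, eps=1e-12):
    """Batch-mean KL(p_teacher || p_student) over K cluster assignments.

    Hypothetical temperatures; the teacher distribution is treated as a fixed target.
    """
    p_t = softmax(teacher_logits, t_teacher)
    p_s = softmax(student_logits, t_student)
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=1)
    return kl.mean()

# Synthetic example: batch of 8 samples, K = 16 clusters
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))
student = rng.normal(size=(8, 16))
print(kl_teacher_student(teacher, student))
```

Because KL(p_t || p_s) equals the cross-entropy minus the teacher's entropy, and the teacher's entropy does not depend on the student, minimizing this KL with respect to the student is equivalent to minimizing the cross-entropy loss that distillation-style SSL methods commonly use.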
Findings
Normalization via inverse cluster priors simplifies to batch centering.
The model supports existing SSL methods and guides future research.
Theoretical underpinnings explain heuristics used in SSL.
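The centering finding can be checked numerically. Below is a minimal NumPy sketch under our own assumptions. The prior over K clusters is estimated as the batch mean of the teacher posteriors. We then compare two things: replacing the logarithm of that prior by the batch mean of the log posteriors (a Jensen lower bound), and subtracting the per-cluster batch mean of the logits before the softmax. The helper names, the temperature and the synthetic logits are hypothetical.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Row-wise softmax of logits / temperature, shifted for numerical stability."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)
    e = np.exp(scaled)
    return e / e.sum(axis=1, keepdims=True)

def normalize_by_prior(p, log_prior):
    """q_i(k) proportional to p_i(k) / prior(k), renormalized over clusters k."""
    log_q = np.log(p) - log_prior[None, :]
    log_q -= log_q.max(axis=1, keepdims=True)
    q = np.exp(log_q)
    return q / q.sum(axis=1, keepdims=True)

tau = 0.5
rng = np.random.default_rng(0)
# Hypothetical teacher logits: batch of 32, K = 10 clusters, cluster 0 favoured
logits = rng.normal(size=(32, 10))
logits[:, 0] += 3.0

p = softmax(logits, tau)

# Exact inverse-prior normalization: prior = batch mean of posteriors (arithmetic mean)
q_prior = normalize_by_prior(p, np.log(p.mean(axis=0)))

# Jensen lower bound on the log prior: batch mean of log posteriors (geometric mean)
q_jensen = normalize_by_prior(p, np.log(p).mean(axis=0))

# Batch centering: subtract the per-cluster batch mean of the logits, then softmax
q_center = softmax(logits - logits.mean(axis=0, keepdims=True), tau)

print(np.abs(q_jensen - q_center).max())
print(np.abs(q_prior - q_center).max())
```

The first printed value should be at floating-point rounding level. This shows that centering coincides with normalizing by the geometric-mean (Jensen-bounded) prior. The second value is generally non-zero, because the exact arithmetic-mean prior differs from its bound. Methods such as DINO keep an exponential moving average of the centre across batches. This sketch uses the plain batch mean for clarity.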
Abstract
Self-supervised learning (SSL) is recognized as an essential tool for building foundation models for Artificial Intelligence applications. Advances in SSL have been driven by vigorous debate about its principles and by extensive empirical research. The aim of this paper is to contribute to the development of the underpinning theory of SSL, focusing on the deep clustering approach. By analogy to supervised learning, we formulate SSL as K-L divergence optimization. Mode collapse is prevented by imposing an optimization constraint on the teacher distribution. This leads to normalization using inverse cluster priors. We show that, using Jensen's inequality, this normalization simplifies to the popular batch centering procedure. Distillation and centering are common heuristics-based practices in SSL, but our work gives them a theoretical underpinning. The theoretical…
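One way to reconstruct the Jensen step mentioned in the abstract is sketched below, under our own assumptions. The teacher posteriors p_i(k) are a temperature-τ softmax over logits z_{ik}, for B samples and K clusters. The prior π(k) is estimated as the batch mean of these posteriors. The symbols are ours, and the paper's own derivation may differ in detail.

```latex
% Teacher posterior for sample i and cluster k, and batch estimate of the prior
p_i(k) = \frac{\exp(z_{ik}/\tau)}{\sum_{j=1}^{K} \exp(z_{ij}/\tau)},
\qquad
\pi(k) = \frac{1}{B}\sum_{i=1}^{B} p_i(k)

% Normalization by the inverse prior (const_i does not depend on k)
q_i(k) \propto \frac{p_i(k)}{\pi(k)}
\quad\Longrightarrow\quad
\log q_i(k) = \frac{z_{ik}}{\tau} - \log \pi(k) + \mathrm{const}_i

% Jensen's inequality (log is concave), with c_k the batch mean of the logits
\log \pi(k) \ge \frac{1}{B}\sum_{i=1}^{B} \log p_i(k)
 = \frac{c_k}{\tau} - \frac{1}{B}\sum_{i=1}^{B} \log \sum_{j=1}^{K} \exp(z_{ij}/\tau),
\qquad
c_k = \frac{1}{B}\sum_{i=1}^{B} z_{ik}

% The last term is the same for every k, so substituting the bound for log pi(k)
q_i(k) \propto \exp\!\left(\frac{z_{ik} - c_k}{\tau}\right)
```

The final expression is a softmax over logits from which the per-cluster batch mean c_k has been subtracted, which is batch centering. The Jensen step replaces log π(k) with a lower bound. Centering therefore approximates the exact inverse-prior normalization rather than reproducing it.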
