The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning
Borja Rodríguez-Gálvez, Arno Blaas, Pau Rodríguez, Adam Goliński, Xavier Suau, Jason Ramapuram, Dan Busbridge, Luca Zappella

TL;DR
This paper analyzes multi-view self-supervised learning (MVSSL) through a lower bound on the mutual information (MI) that consists of an entropy term and a reconstruction term (the ER bound). It shows which MVSSL families maximize the MI and which explicitly maximize only the reconstruction term, and reports that training with the ER bound improves stability and performance.
Contribution
It uses an ER-based lower bound on the MI to unify and reinterpret MVSSL methods, showing that clustering-based and distillation-based approaches optimize different components of the bound.
Findings
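Clustering-based methods such as DeepCluster and SwAV maximize the MI, as seen through the ER bound.
Distillation-based methods such as BYOL and DINO explicitly maximize the reconstruction term and implicitly encourage a stable entropy.
Replacing the objectives of common MVSSL methods with the ER bound improves stability and performance.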
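To make the two terms concrete, here is a minimal sketch of a plug-in estimate of the ER bound for discrete (cluster-assignment) representations. It is an illustration under stated assumptions, not the paper's implementation: the function names, the use of batch-averaged assignments to estimate the entropy, and the EPS clamp are all hypothetical choices made for this example.

```python
import math

EPS = 1e-12  # guards against log(0); an arbitrary choice for this sketch


def marginal_entropy(targets):
    """Plug-in estimate of H(Z2): entropy (in nats) of the batch-averaged assignments."""
    n = len(targets)
    k = len(targets[0])
    marginal = [sum(t[j] for t in targets) / n for j in range(k)]
    return -sum(p * math.log(p) for p in marginal if p > 0.0)


def reconstruction(targets, preds):
    """Estimate of E[log q(Z2 | Z1)]: the negative cross-entropy, averaged over the batch."""
    total = 0.0
    for t, q in zip(targets, preds):
        total += sum(tj * math.log(max(qj, EPS)) for tj, qj in zip(t, q) if tj > 0.0)
    return total / len(targets)


def er_bound(targets, preds):
    """Entropy term plus reconstruction term."""
    return marginal_entropy(targets) + reconstruction(targets, preds)


# Hypothetical toy batches: each row is a distribution over 2 clusters.
balanced = [[1.0, 0.0], [0.0, 1.0]]   # both clusters are used
collapsed = [[1.0, 0.0], [1.0, 0.0]]  # every sample lands in one cluster

print(er_bound(balanced, balanced))    # perfect prediction, entropy log(2)
print(er_bound(collapsed, collapsed))  # perfect prediction, zero entropy
```

In both toy batches the predictions match the targets exactly, so the reconstruction term is zero in each case. Only the entropy term separates the balanced batch from the collapsed one, which is why maximizing reconstruction alone does not rule out collapse.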
Abstract
The mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound of the Mutual Information (MI). However, the relation between other MVSSL methods and MI remains unclear. We consider a different lower bound on the MI consisting of an entropy and a reconstruction term (ER), and analyze the main MVSSL families through its lens. Through this ER bound, we show that clustering-based methods such as DeepCluster and SwAV maximize the MI. We also re-interpret the mechanisms of distillation-based approaches such as BYOL and DINO, showing that they explicitly maximize the reconstruction term and implicitly encourage a stable entropy, and we confirm this empirically. We show that replacing the objectives of common MVSSL methods with this ER bound achieves competitive…
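The entropy and reconstruction bound mentioned in the abstract can be written out with a standard variational argument. The notation here is an assumption made for illustration and may differ from the paper's: Z_1 and Z_2 denote the representations of the two views, and q(z_2 | z_1) is any predictive (reconstruction) distribution.

```latex
\begin{align}
I(Z_1; Z_2) &= H(Z_2) - H(Z_2 \mid Z_1) \\
            &= H(Z_2) + \mathbb{E}_{p(z_1, z_2)}\left[\log p(z_2 \mid z_1)\right] \\
            &\geq H(Z_2) + \mathbb{E}_{p(z_1, z_2)}\left[\log q(z_2 \mid z_1)\right]
\end{align}
```

The inequality holds because the expected Kullback-Leibler divergence between p(z_2 | z_1) and q(z_2 | z_1) is non-negative. The first term on the right is the entropy term and the second is the reconstruction term.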
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Computing and Algorithms · Video Surveillance and Tracking Methods
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Residual Connection · Vision Transformer · k-Means Clustering · LARS
