On Partial Prototype Collapse in the DINO Family of Self-Supervised Methods
Hariprasath Govindarajan, Per Sidén, Jacob Roll, Fredrik Lindsten

TL;DR
This paper identifies a partial prototype collapse issue in DINO self-supervised methods, which causes redundancies and limits representation diversity, and proposes encouraging diverse prototypes to improve clustering and representation quality.
Contribution
The paper reveals the partial prototype collapse problem in DINO methods and introduces a strategy to promote prototype diversity, enhancing learned representations.
Findings
Encouraging prototype diversity reduces redundancies.
More fine-grained clusters are learned with diverse prototypes.
Improved representations are especially evident on long-tailed datasets.
Abstract
A prominent self-supervised learning paradigm is to model the representations as clusters, or more generally as a mixture model. Learning to map the data samples to compact representations and fitting the mixture model simultaneously leads to the representation collapse problem. Regularizing the distribution of data points over the clusters is the prevalent strategy to avoid this issue. While this is sufficient to prevent full representation collapse, we show that a partial prototype collapse problem still exists in the DINO family of methods, that leads to significant redundancies in the prototypes. Such prototype redundancies serve as shortcuts for the method to achieve a marginal latent class distribution that matches the prescribed prior. We show that by encouraging the model to use diverse prototypes, the partial prototype collapse can be mitigated. Effective utilization of the…
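One way to make the notion of redundant prototypes concrete is to count how many prototype vectors remain after near-duplicates are merged. The sketch below is illustrative only and is not the authors' procedure: the function names, the greedy merging rule, and the cosine-similarity threshold of 0.99 are all assumptions chosen for the example, and the "prototypes" are synthetic vectors rather than weights from a trained DINO head.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def count_unique_prototypes(prototypes, threshold=0.99):
    """Greedily keep a prototype only if it is not a near-duplicate
    (cosine similarity >= threshold) of one that was already kept.
    The threshold is a hypothetical choice, not a value from the paper."""
    kept = []
    for p in prototypes:
        if all(cosine(p, q) < threshold for q in kept):
            kept.append(p)
    return len(kept)


# Synthetic stand-in for a prototype matrix: 4 random directions in 16-D.
random.seed(0)
base = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(4)]

# Simulate partial collapse: 8 prototypes, but the last 4 are rescaled
# copies of the first 4 and therefore point in the same directions.
collapsed = base + [[2.0 * x for x in p] for p in base]

n_unique = count_unique_prototypes(collapsed)
print(f"{n_unique} unique directions among {len(collapsed)} prototypes")
```

A large gap between the number of initialized prototypes and the number of unique directions would be a symptom of the redundancy the abstract describes; the count depends on the threshold, so it is best read as a rough diagnostic.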
Taxonomy
Topics: Advanced Control Systems Optimization · Neural Networks and Applications · Fault Detection and Control Systems
Methods: Dense Connections · Layer Normalization · Residual Connection · Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Vision Transformer · self-DIstillation with NO labels
