Why Prototypes Collapse: Diagnosing and Preventing Partial Collapse in Prototypical Self-Supervised Learning

Gabriel Y. Arteaga; Marius Aasan; Rwiddhi Chakraborty; Martine Hjelkrem-Tan; Thalles Silva; Michael Kampffmeyer; Adín Ramírez Rivera

arXiv:2510.20108·cs.LG·February 13, 2026

Open Access · 3 Reviews

TL;DR

This paper identifies the root cause of partial prototype collapse in self-supervised learning and proposes a decoupled training method that maintains diverse prototypes and improves downstream results.

Contribution

It introduces a novel decoupled training strategy that separates prototype and encoder optimization, preventing collapse without extra regularizers.

Findings

01

Decoupled training prevents prototype collapse effectively.

02

Diverse prototypes lead to better downstream performance.

03

The method outperforms existing regularization techniques.

Abstract

Prototypical self-supervised learning methods consistently suffer from partial prototype collapse, where multiple prototypes converge to nearly identical representations. This undermines their central purpose -- providing diverse and informative targets to guide encoders toward rich representations -- and has led practitioners to over-parameterize prototype sets or add ad-hoc regularizers, which mitigate symptoms rather than address the root cause. We empirically trace the collapse to the joint optimization of encoders and prototypes, which encourages a type of shortcut learning: early in training prototypes drift toward redundant representations that minimize loss without necessarily enhancing representation diversity. To break the joint optimization, we introduce a fully decoupled training strategy that learns prototypes and encoders under separate objectives. Concretely, we model…
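To make "partial prototype collapse" concrete, the sketch below counts how many prototypes remain distinct once near-duplicates are merged. It is an illustration only and not the collapse measure used in the paper: the function name `unique_prototype_fraction`, the cosine-similarity criterion, and the `threshold` value of 0.99 are all assumptions chosen for this example.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Guard against division by zero for all-zero vectors.
    return dot / max(norm_u * norm_v, 1e-12)


def unique_prototype_fraction(prototypes, threshold=0.99):
    """Fraction of prototypes left after greedily merging near-duplicates.

    Hypothetical diagnostic: a prototype counts as a duplicate if its cosine
    similarity to any already-kept prototype is at least `threshold`.
    A value near 1.0 means diverse prototypes; lower values mean more
    prototypes have converged to nearly identical vectors.
    """
    kept = []
    for p in prototypes:
        if all(cosine(p, q) < threshold for q in kept):
            kept.append(p)
    return len(kept) / len(prototypes)


# Toy data (not from the paper): 8 random prototypes in 64 dimensions.
random.seed(0)
diverse = [[random.gauss(0.0, 1.0) for _ in range(64)] for _ in range(8)]

# Simulate partial collapse: the last 4 prototypes become tiny
# perturbations of the first one.
collapsed = [list(p) for p in diverse]
for i in range(4, 8):
    collapsed[i] = [x + 1e-4 * random.gauss(0.0, 1.0) for x in diverse[0]]

print(unique_prototype_fraction(diverse))
print(unique_prototype_fraction(collapsed))
```

In this toy setup the diverse set keeps all of its prototypes, while the partially collapsed set keeps only half. The greedy merge depends on the threshold, so the number is best read as a rough indicator rather than an exact count.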

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

- The finding in Section 2 that not all methods suffer from partial prototype collapse is novel and nicely motivates the proposed decoupled training method.
- The authors connect this collapse to recent findings regarding the muted effect of data scaling in SSL methods (line 388). This is a fundamental open question that is carefully studied here through the lens of prototypes.
- Examining the training dynamics of prototype collapse in Section 4.3 is an insightful approach that goes beyond simply…

Weaknesses

- Define "prototype" in the introduction, and provide a clear, accessible explanation of the partial prototype collapse you are aiming to solve.
- As described in Section 2.1, the setting is focused on the image domain. Make sure this is stated clearly early in the paper so readers have the right expectations about the setting you are exploring.
- Similarly, technical terms such as KP regularization need at least a brief explanation to make this work accessible to a broader audience.

Reviewer 02 · Rating 4 · Confidence 5

Strengths

- Prototypical SSL methods such as DINO have become widely used. Better understanding of the partial prototype collapse issue sheds light on the inner workings of these methods and provides insights into how to improve them further.
- The development of the fully decoupled training methodology, which uses an online Gaussian Mixture Model to update the prototypes, is interesting. This is a valuable contribution, as it involves specific choices of techniques such as responsibility based forgett…

Weaknesses

**Major:**
- Partial prototype collapse occurs in many prototypical learning methods. The decoupled training is presented as a more general contribution, but the paper only experiments with CARP, so it is unclear whether the proposed method is indeed general.
- In Table 3, there is a direct comparison between CARP and CARP+Decoupling only for the ResNet-50 model. Can the authors produce similar results for the other backbones (ViT-S, ViT-B)? Otherwise, it is hard to identify the impact of the pr…

Reviewer 03 · Rating 4 · Confidence 3

Strengths

- The article offers a practically valuable observation about the potential cause of the prototype collapse problem in relation to the joint optimization of encoder parameters and prototype points.
- It also provides a loss separation approach (for encoder and prototype points) that is demonstrated to be successful in avoiding the collapse problem.
- A clear measure of collapse is also introduced in the article.

Weaknesses

The main weakness of the article is that it is entirely based on an empirical approach, lacking a satisfactory analytical foundation for both the cause and the proposed solution of the prototype collapse problem. Although the article is useful in terms of practical implementations, a stronger theoretical grounding is needed to make its arguments more convincing and provide deeper insights for a potential ICLR submission.

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Gaussian Processes and Bayesian Inference