Generalized Neural Collapse for a Large Number of Classes

Jiachen Jiang; Jinxin Zhou; Peng Wang; Qing Qu; Dustin Mixon; Chong You; Zhihui Zhu

arXiv:2310.05351·cs.LG·October 30, 2023

TL;DR

This paper extends neural collapse to settings where the number of classes exceeds the feature dimension, providing both empirical evidence and a theoretical proof of the phenomenon in deep models with a large number of classes.

Contribution

It introduces generalized neural collapse for settings with a large number of classes and proves that it occurs when features and classifier weights are constrained to the sphere, extending the understanding of neural collapse beyond the small-class regime.

Findings

01 Generalized neural collapse observed in practical deep networks

02 Maximization of one-vs-rest margins during collapse

03 Theoretical proof under spherical feature constraints

Abstract

Neural collapse provides an elegant mathematical characterization of learned last-layer representations (a.k.a. features) and classifier weights in deep classification models. Such results not only provide insights but also motivate new techniques for improving practical deep models. However, most existing empirical and theoretical studies of neural collapse focus on the case where the number of classes is small relative to the dimension of the feature space. This paper extends neural collapse to cases where the number of classes is much larger than the dimension of the feature space, which occurs broadly in language models, retrieval systems, and face recognition applications. We show that the features and classifier exhibit a generalized neural collapse phenomenon, where the minimum one-vs-rest margin is maximized. We provide an empirical study to verify the occurrence of generalized…
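
To make "minimum one-vs-rest margin" concrete, here is a minimal NumPy sketch. It is not the authors' code. It assumes the margin for class k is the smallest gap (w_k - w_j)^T h_k over j != k, with unit-norm classifier weights w and one unit-norm feature h_k per class; the paper's exact definition may differ. The helper names `circle_points`, `min_one_vs_rest_margin`, and `ce_loss`, and the temperature parameter `tau`, are hypothetical and chosen for illustration only.

```python
import numpy as np

def circle_points(K):
    """Return K unit vectors equally spaced on the unit circle (d = 2)."""
    angles = 2 * np.pi * np.arange(K) / K
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)  # shape (K, 2)

def min_one_vs_rest_margin(W, H):
    """Smallest value of (w_k - w_j)^T h_k over all k and all j != k.

    W: (K, d) classifier weights; H: (K, d) one feature per class.
    This margin definition is an assumption made for illustration.
    """
    logits = H @ W.T                      # logits[k, j] = w_j^T h_k
    correct = np.diag(logits)[:, None]    # w_k^T h_k as a column
    gaps = correct - logits               # gaps[k, j] = (w_k - w_j)^T h_k
    np.fill_diagonal(gaps, np.inf)        # exclude j == k
    return gaps.min()

def ce_loss(W, H, tau):
    """Mean cross-entropy with temperature tau, feature k labelled as class k."""
    logits = (H @ W.T) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

if __name__ == "__main__":
    # K = 6 classes in d = 2 dimensions, so K > d as in the large-class regime.
    W = circle_points(6)
    H = W.copy()  # assume each feature aligns with its own classifier
    print("min one-vs-rest margin:", min_one_vs_rest_margin(W, H))
    for tau in (1.0, 0.2, 0.05):
        print("tau =", tau, "CE loss =", ce_loss(W, H, tau))
```

For equally spaced points on the circle with features aligned to their classifiers, the margin under this definition is 1 - cos(2π/K). Whenever that margin is positive, lowering `tau` drives the cross-entropy toward zero, which is the regime the paper's asymptotic analysis considers.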

Peer Reviews

Decision·ICML 2024 Poster

Reviewer 01 · Rating 8 · accept, good paper · Confidence 3

Strengths

* important problem / open question: NC geometry for large K
* technical results appear solid, although I did not have time to fully check the proofs
* result is, to my knowledge, novel

Weaknesses

* the requirement for feature/weight normalization might seem restrictive. The authors mention that this is standard practice, but looking at the three references, only one of them is on CE loss.
* the description of the geometry is elegant, but is in general not explicit; it is instead given as the solution to a non-convex problem. It nevertheless gives an implicit way of thinking about the classifier's geometry. For d=2 it is nice to have the closed form, but perhaps uniformity on the sphere is not very sur

Reviewer 02 · Rating 6 · marginally above the acceptance threshold · Confidence 4

Strengths

+ The paper is well-organized, allowing readers to easily follow the authors' ideas. The writing is clear and simple, so readers from different backgrounds can understand the main points.
+ The authors expanded the "Neural Collapse" (NC) idea to the "General Neural Collapse" (GNC), considering cases where the number of categories is greater than the feature size.

Weaknesses

+ **Too strong assumption in the theoretical analysis**: $\tau \rightarrow 0$ means the norm of the feature goes to infinity, which means that even a small angle between every two classes can achieve a CE loss close to $0$. Therefore, in practice, GNC cannot be achieved when $\tau$ is very small.
+ **Insufficient Discussions**: The Tammes problem is the limit case of the Thomson-P problem [1]; the authors should also consider it and provide more discussion of it, especially the Thomson-1 problem.

[1] Numerical

Reviewer 03 · Rating 8 · accept, good paper · Confidence 5

Strengths

- This paper provides a clear view of the global optimality conditions for generalized neural collapse, especially for a large number of classes, while existing work often considers a closed-form case (i.e., a simplex ETF) in which the number of classes is smaller than the feature dimensionality. Therefore, the contribution of the work undoubtedly broadens the horizon of Neural Collapse.
- The paper's crucial contribution lies in its demonstration that minimizing the asymptotic CE loss is eq

Weaknesses

- The theoretical results primarily emphasize the asymptotic CE loss rather than the original CE loss, potentially limiting the direct applicability of the findings to real-world scenarios.
- The features and class weights are constrained to the unit sphere.
- There are some minor typos, such as "where the number of classes are ... which occur", which should be corrected to "where the number of classes is ... which occurs".
- Incorrect cite command in the paragraph 'Related Work' and others.

Taxonomy

Topics: Neural Networks and Applications · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
