Linguistic Collapse: Neural Collapse in (Large) Language Models
Robert Wu, Vardan Papyan

TL;DR
This paper investigates neural collapse in large language models. It finds that neural collapse properties develop with scale and regularization and are linked to better generalization, extending the study of neural collapse to language modeling.
Contribution
The paper empirically studies neural collapse in language models, showing that it emerges even though the standard conditions for it are not met, and that it relates to model generalization.
Findings
Neural collapse properties develop with scale and regularization.
Neural collapse is linked to improved generalization in language models.
Neural collapse extends to the complex setting of language modeling.
Abstract
Neural collapse (NC) is a phenomenon observed in classification tasks where top-layer representations collapse into their class means, which become equinorm, equiangular and aligned with the classifiers. These behaviours, associated with generalization and robustness, would manifest under specific conditions: models are trained towards zero loss, with noise-free labels belonging to balanced classes, which do not outnumber the model's hidden dimension. Recent studies have explored NC in the absence of one or more of these conditions to extend and capitalize on the associated benefits of ideal geometries. Language modelling presents a curious frontier, as training by token prediction constitutes a classification task where none of the conditions exist: the vocabulary is imbalanced and exceeds the embedding dimension; different tokens might correspond…
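The "equinorm" and "equiangular" properties named in the abstract can be made concrete with a small sketch. The code below is an illustration only, not the paper's measurement code: the function names, the choice of summary statistics (coefficient of variation of norms, standard deviation of pairwise cosines), and the toy data are all assumptions made for this example. It also assumes balanced classes, so that the global feature mean coincides with the mean of the class means.

```python
import math
from collections import defaultdict


def class_means(features, labels):
    """Average the feature vectors belonging to each class label."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def collapse_geometry(features, labels):
    """Return (norm_cv, cosine_std, mean_cosine) for globally centred class means.

    norm_cv     approaches 0 when the centred means are equinorm
    cosine_std  approaches 0 when the centred means are equiangular
    mean_cosine equals -1/(C-1) for a simplex arrangement of C classes
    """
    means = class_means(features, labels)
    dim, n = len(features[0]), len(features)
    global_mean = [sum(x[d] for x in features) / n for d in range(dim)]
    centred = [[m[d] - global_mean[d] for d in range(dim)] for m in means.values()]

    # Equinorm: spread of the centred class-mean norms, relative to their average.
    norms = [math.sqrt(sum(v * v for v in c)) for c in centred]
    mean_norm = sum(norms) / len(norms)
    norm_cv = math.sqrt(sum((r - mean_norm) ** 2 for r in norms) / len(norms)) / mean_norm

    # Equiangular: spread of the pairwise cosines between centred class means.
    cosines = []
    for i in range(len(centred)):
        for j in range(i + 1, len(centred)):
            dot = sum(a * b for a, b in zip(centred[i], centred[j]))
            cosines.append(dot / (norms[i] * norms[j]))
    mean_cos = sum(cosines) / len(cosines)
    cos_std = math.sqrt(sum((c - mean_cos) ** 2 for c in cosines) / len(cosines))
    return norm_cv, cos_std, mean_cos


# Toy data (hypothetical): three balanced classes whose means sit on the
# vertices of an equilateral triangle, i.e. an ideal collapsed geometry in 2D.
h = math.sqrt(3) / 2
vertices = [(1.0, 0.0), (-0.5, h), (-0.5, -h)]
features, labels = [], []
for label, (vx, vy) in enumerate(vertices):
    for dx in (-0.01, 0.01):
        features.append([vx + dx, vy])
        labels.append(label)

norm_cv, cos_std, mean_cos = collapse_geometry(features, labels)
print(norm_cv, cos_std, mean_cos)
```

On this toy input both spread statistics are zero up to floating-point error and the mean cosine is close to -1/2, the value expected for three classes. When the classes outnumber the hidden dimension, as the abstract notes is the case for language-model vocabularies, this simplex arrangement cannot be realized exactly, so these quantities are better read as measures of distance from the ideal than as targets.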
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
