Linguistic Collapse: Neural Collapse in (Large) Language Models

Robert Wu; Vardan Papyan

arXiv:2405.17767 · cs.LG · November 27, 2024


TL;DR

This paper investigates neural collapse in large language models. It finds that neural collapse properties develop with scale and regularization and that they are linked to better generalization, extending the study of neural collapse to language modeling.

Contribution

The paper empirically explores neural collapse in language models, showing that it emerges even though the conditions usually assumed for it do not hold, and relating it to model generalization.

Findings

01  Neural collapse properties develop with scale and regularization.

02  Neural collapse is linked to improved generalization in language models.

03  Neural collapse extends to the complex setting of language modeling.

Abstract

Neural collapse ($\mathcal{NC}$) is a phenomenon observed in classification tasks where top-layer representations collapse into their class means, which become equinorm, equiangular and aligned with the classifiers. These behaviours -- associated with generalization and robustness -- would manifest under specific conditions: models are trained towards zero loss, with noise-free labels belonging to balanced classes, which do not outnumber the model's hidden dimension. Recent studies have explored $\mathcal{NC}$ in the absence of one or more of these conditions to extend and capitalize on the associated benefits of ideal geometries. Language modelling presents a curious frontier, as training by token prediction constitutes a classification task where none of the conditions exist: the vocabulary is imbalanced and exceeds the embedding dimension; different tokens might correspond…
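
As a rough illustration of the four geometric properties named in the abstract (collapse to class means, equinorm means, equiangular means, and alignment with the classifiers), the following Python sketch computes simplified proxies from top-layer features, their labels, and classifier weights. The function name `nc_proxies`, the particular statistics, and the toy data are assumptions made here for illustration. They are not the paper's exact metrics, and the sketch ignores the imbalance and scale of a real vocabulary.

```python
import numpy as np

def nc_proxies(features, labels, classifier):
    """Simplified neural-collapse proxies (illustrative assumptions only).

    features:   (N, d) top-layer representations
    labels:     (N,) integer class ids, assumed to index rows of `classifier`
    classifier: (K, d) classifier weight vectors
    """
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Centre the class means on their own average (hypothetical choice).
    centered = means - means.mean(axis=0)

    # Collapse to class means: within-class spread relative to between-class spread.
    within = np.mean([
        np.sum((features[labels == c] - means[i]) ** 2, axis=1).mean()
        for i, c in enumerate(classes)
    ])
    between = np.sum(centered ** 2, axis=1).mean()

    # Equinorm: coefficient of variation of the centred mean norms.
    norms = np.linalg.norm(centered, axis=1)

    # Equiangular: spread of pairwise cosines between centred means.
    unit = centered / norms[:, None]
    cosines = (unit @ unit.T)[~np.eye(len(classes), dtype=bool)]

    # Alignment: average cosine between each classifier and its class mean.
    w = classifier[classes]
    w_unit = w / np.linalg.norm(w, axis=1, keepdims=True)

    return {
        "variability": within / between,
        "equinorm_cv": norms.std() / norms.mean(),
        "mean_cosine": cosines.mean(),
        "equiangular_std": cosines.std(),
        "alignment": np.mean(np.sum(w_unit * unit, axis=1)),
    }

# Toy data: a perfectly collapsed, balanced 4-class configuration
# whose class means form a simplex equiangular tight frame.
K = 4
simplex = np.eye(K) - np.ones((K, K)) / K
features = np.repeat(simplex, 5, axis=0)   # every sample sits on its class mean
labels = np.repeat(np.arange(K), 5)
result = nc_proxies(features, labels, classifier=simplex)
print(result)
```

On this idealized toy input the variability, norm spread, and cosine spread are all zero up to floating-point error, and the alignment is one. Values further from those indicate a geometry further from collapse.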

Code & Models

Models

rhubarbwu/TinyStories-12x1024_10L (Hugging Face model)

Videos

Linguistic Collapse: Neural Collapse in (Large) Language Models · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling