TL;DR
This paper investigates Neural Collapse, a phenomenon that emerges during the terminal phase of deep neural network training and produces a highly symmetric geometric structure associated with improved generalization, robustness, and interpretability.
Contribution
It provides the first comprehensive empirical measurement of Neural Collapse across multiple architectures and datasets, elucidating its geometric properties and benefits.
Findings
Neural Collapse occurs consistently during the terminal phase of training.
Class activations and classifiers form a Simplex ETF geometry.
Neural Collapse enhances model generalization and robustness.
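The Simplex ETF geometry named in the findings can be made concrete with a short NumPy sketch. It assumes the standard construction: the C columns of M = sqrt(C/(C-1)) (I - (1/C) 1 1^T) are unit vectors whose pairwise cosines all equal -1/(C-1), the most negative common cosine that C unit vectors can share. The helper names `simplex_etf` and `gram` are hypothetical, and the paper's definition also permits an arbitrary rotation and global scale, which this sketch omits.

```python
import numpy as np


def simplex_etf(num_classes: int) -> np.ndarray:
    """Hypothetical helper: (C x C) matrix whose columns form a standard Simplex ETF.

    M = sqrt(C / (C - 1)) * (I - (1/C) * 1 1^T); rotation and global scale are omitted.
    """
    c = num_classes
    centring = np.eye(c) - np.ones((c, c)) / c  # projects out the all-ones direction
    return np.sqrt(c / (c - 1)) * centring


def gram(m: np.ndarray) -> np.ndarray:
    """Gram matrix of the columns of m: entry (i, j) is <m_i, m_j>."""
    return m.T @ m


C = 4
M = simplex_etf(C)
G = gram(M)

norms = np.sqrt(np.diag(G))           # length of each vertex
off_diag = G[~np.eye(C, dtype=bool)]  # cosines between distinct vertices (norms are 1)
print("norms:", np.round(norms, 4))
print("pairwise cosines:", np.round(off_diag, 4))
print("sum of vertices:", np.round(M.sum(axis=1), 4))
```

Every norm comes out as 1, every pairwise cosine as -1/3 for C = 4, and the vertices sum to the zero vector, so they span only a (C-1)-dimensional subspace.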
Abstract
Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT), which begins at the epoch where training error first vanishes; during TPT, the training error stays effectively zero while training loss is pushed towards zero. Direct measurements of TPT, for three prototypical deepnet architectures and across seven canonical classification datasets, expose a pervasive inductive bias we call Neural Collapse, involving four deeply interconnected phenomena: (NC1) Cross-example within-class variability of last-layer training activations collapses to zero, as the individual activations themselves collapse to their class-means; (NC2) The class-means collapse to the vertices of a Simplex Equiangular Tight Frame (ETF); (NC3) Up to rescaling, the last-layer classifiers collapse to the class-means, or in other words to the Simplex ETF, i.e. to a self-dual…
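NC1 and NC3 can be quantified directly from last-layer activations. The sketch below assumes the metrics used in the full paper (they are not shown in the truncated abstract): NC1 as Tr(Sigma_W Sigma_B^+)/C, where Sigma_W and Sigma_B are the within-class and between-class covariances of the activations and ^+ is the pseudo-inverse, and NC3 as the squared Frobenius distance between the normalized classifier W^T and the normalized matrix of globally centred class means. The function names, the (C, p) classifier shape, the eigenvalue cutoff in `pinv_psd`, and the synthetic data are illustrative assumptions, not the authors' code.

```python
import numpy as np


def pinv_psd(a: np.ndarray, rel_tol: float = 1e-10) -> np.ndarray:
    """Pseudo-inverse of a symmetric PSD matrix, dropping near-zero eigenvalues (assumed cutoff)."""
    eigvals, eigvecs = np.linalg.eigh(a)
    keep = eigvals > rel_tol * eigvals.max()
    v = eigvecs[:, keep]
    return (v / eigvals[keep]) @ v.T  # V diag(1/lambda) V^T on the kept eigenspace


def class_means(features: np.ndarray, labels: np.ndarray):
    """Return (classes, global mean, (p, C) matrix of globally centred class means)."""
    classes = np.unique(labels)
    global_mean = features.mean(axis=0)
    centred = np.stack(
        [features[labels == c].mean(axis=0) - global_mean for c in classes], axis=1
    )
    return classes, global_mean, centred


def nc1(features: np.ndarray, labels: np.ndarray) -> float:
    """Tr(Sigma_W Sigma_B^+) / C; tends to 0 as activations collapse to their class means."""
    classes, global_mean, centred_means = class_means(features, labels)
    num_classes = len(classes)
    p = features.shape[1]
    sigma_w = np.zeros((p, p))
    for idx, c in enumerate(classes):
        dev = features[labels == c] - (centred_means[:, idx] + global_mean)
        sigma_w += dev.T @ dev
    sigma_w /= features.shape[0]  # average over all training examples
    sigma_b = centred_means @ centred_means.T / num_classes  # average over classes
    return float(np.trace(sigma_w @ pinv_psd(sigma_b)) / num_classes)


def nc3(classifier: np.ndarray, features: np.ndarray, labels: np.ndarray) -> float:
    """|| W^T/||W^T||_F - M/||M||_F ||_F^2 for W of shape (C, p); tends to 0 under NC3."""
    _, _, centred_means = class_means(features, labels)
    w_t = classifier.T
    gap = w_t / np.linalg.norm(w_t) - centred_means / np.linalg.norm(centred_means)
    return float(np.linalg.norm(gap) ** 2)


# Hypothetical synthetic "collapsed" data: class means on a Simplex ETF plus tiny noise.
rng = np.random.default_rng(0)
C, p, n_per_class = 4, 8, 50
etf = np.sqrt(C / (C - 1)) * (np.eye(C) - np.ones((C, C)) / C)
true_means = np.zeros((p, C))
true_means[:C, :] = 5.0 * etf
labels = np.repeat(np.arange(C), n_per_class)
features = true_means[:, labels].T + 0.01 * rng.standard_normal((C * n_per_class, p))
W = true_means.T  # classifier rows set equal to the class means

print(f"NC1 = {nc1(features, labels):.2e}")
print(f"NC3 = {nc3(W, features, labels):.2e}")
```

On this synthetically collapsed data both values should be close to zero. Increasing the noise scale from 0.01 to 1.0 raises NC1 by several orders of magnitude; the abstract's NC1 claim is that this quantity shrinks toward zero as training proceeds through TPT.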
