Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
Arthur Jacot, Peter Súkeník, Zihan Wang, Marco Mondelli

TL;DR
This paper proves that wide neural networks trained with weight decay exhibit neural collapse, a highly symmetric geometric structure in the last-layer representation of the training data, addressing a gap in the theory of end-to-end trained DNNs.
Contribution
It moves beyond the unconstrained features model to prove neural collapse in end-to-end trained DNNs that end with at least two linear layers and are trained with weight decay.
Findings
Neural collapse occurs in wide neural networks trained with weight decay.
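Generic guarantees are given under (i) low training error and balancedness of the linear layers, which yield within-class variability collapse, and (ii) bounded conditioning of the features before the linear part.
The analysis covers DNNs that end with at least two linear layers, rather than treating the penultimate-layer features as free variables.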
Abstract
Deep neural networks (DNNs) at convergence consistently represent the training data in the last layer via a highly symmetric geometric structure referred to as neural collapse. This empirical evidence has spurred a line of theoretical research aimed at proving the emergence of neural collapse, mostly focusing on the unconstrained features model. Here, the features of the penultimate layer are free variables, which makes the model data-agnostic and, hence, puts into question its ability to capture DNN training. Our work addresses the issue, moving away from unconstrained features and studying DNNs that end with at least two linear layers. We first prove generic guarantees on neural collapse that assume (i) low training error and balancedness of the linear layers (for within-class variability collapse), and (ii) bounded conditioning of the features before the linear part (for…
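To make "within-class variability collapse" concrete, here is a minimal NumPy sketch of one common way to quantify it: the ratio of within-class to between-class scatter of the last-layer features. The function name `within_class_variability` and this particular ratio are illustrative assumptions, not necessarily the metric used in the paper; a value near zero means each class's features have concentrated around their class mean.

```python
import numpy as np


def within_class_variability(features, labels):
    """Ratio of within-class to between-class scatter (illustrative metric).

    features: array of shape (n_samples, dim), e.g. last-layer features.
    labels:   array of shape (n_samples,), integer class labels.
    Assumes at least two classes with distinct means, otherwise the
    between-class scatter is zero and the ratio is undefined.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    global_mean = features.mean(axis=0)

    within = 0.0   # sum of squared distances to each sample's class mean
    between = 0.0  # class-size-weighted squared distances of class means to the global mean
    for c in np.unique(labels):
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        within += np.sum((class_feats - class_mean) ** 2)
        between += len(class_feats) * np.sum((class_mean - global_mean) ** 2)

    return within / between


# Fully collapsed features: every sample sits exactly on its class mean.
collapsed = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
labels = np.array([0, 0, 1, 1])
collapsed_ratio = within_class_variability(collapsed, labels)

# The same features with small perturbations give a strictly positive ratio.
rng = np.random.default_rng(0)
noisy_ratio = within_class_variability(
    collapsed + 0.1 * rng.standard_normal(collapsed.shape), labels
)
```

Tracking this ratio on training features over the course of training is one way to observe collapse empirically; the scalar ratio is used here only because it avoids inverting a covariance matrix.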
