Neural Collapse under Gradient Flow on Shallow ReLU Networks for Orthogonally Separable Data
Hancheng Min, Zhihui Zhu, René Vidal

TL;DR
This paper proves that gradient flow on shallow ReLU networks for orthogonally separable data naturally leads to Neural Collapse, highlighting the influence of data structure, nonlinear activations, and training dynamics on this phenomenon.
Contribution
It proves that Neural Collapse emerges under gradient flow without assuming unconstrained features, emphasizing the roles of data structure, nonlinear activations, and the implicit bias of the training dynamics.
Findings
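Gradient flow on a two-layer ReLU network trained on orthogonally separable data provably exhibits Neural Collapse.
Once the unconstrained-features assumption is relaxed, data structure and nonlinear activations shape how Neural Collapse is characterized.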
Implicit bias of training dynamics facilitates Neural Collapse.
Abstract
Among many mysteries behind the success of deep networks lies the exceptional discriminative power of their learned representations as manifested by the intriguing Neural Collapse (NC) phenomenon, where simple feature structures emerge at the last layer of a trained neural network. Prior works on the theoretical understandings of NC have focused on analyzing the optimization landscape of matrix-factorization-like problems by considering the last-layer features as unconstrained free optimization variables and showing that their global minima exhibit NC. In this paper, we show that gradient flow on a two-layer ReLU network for classifying orthogonally separable data provably exhibits NC, thereby advancing prior results in two ways: First, we relax the assumption of unconstrained features, showing the effect of data structure and nonlinear activations on NC characterizations. Second, we…
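The setting described in the abstract can be explored numerically. The sketch below is not the authors' code. It assumes binary labels, takes orthogonal separability to mean positive inner products within a class and non-positive inner products across classes, and uses logistic loss, a small balanced initialization, and small-step gradient descent as a stand-in for gradient flow. All function names are hypothetical, and the cosine-based measure is only a rough directional proxy, not the NC metrics analyzed in the paper.

```python
import numpy as np


def make_orthogonally_separable_data():
    """Toy 2-D data: each class lies in a narrow cone, the cones point in opposite directions."""
    pos_angles = np.deg2rad([-20.0, -5.0, 10.0, 20.0])
    neg_angles = np.deg2rad([165.0, 180.0, 190.0, 200.0])
    radii = np.array([1.0, 1.5, 0.8, 1.2])
    X_pos = radii[:, None] * np.stack([np.cos(pos_angles), np.sin(pos_angles)], axis=1)
    X_neg = radii[:, None] * np.stack([np.cos(neg_angles), np.sin(neg_angles)], axis=1)
    X = np.vstack([X_pos, X_neg])
    y = np.array([1.0] * 4 + [-1.0] * 4)
    return X, y


def is_orthogonally_separable(X, y):
    """Assumed definition: <x_i, x_j> > 0 for equal labels, <= 0 for different labels."""
    gram = X @ X.T
    same = y[:, None] == y[None, :]
    return bool(np.all(gram[same] > 0) and np.all(gram[~same] <= 0))


def train_two_layer_relu(X, y, hidden=20, init_scale=1e-3, lr=0.05, steps=5000, seed=0):
    """Small-step gradient descent on f(x) = sum_j v_j * relu(w_j . x) with logistic loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = init_scale * rng.standard_normal((hidden, d))
    # Balanced initialization (|v_j| = ||w_j||), half of the output weights negative.
    signs = np.where(np.arange(hidden) < hidden // 2, 1.0, -1.0)
    v = signs * np.linalg.norm(W, axis=1)
    for _ in range(steps):
        pre = X @ W.T                          # pre-activations, shape (n, hidden)
        H = np.maximum(pre, 0.0)               # hidden-layer (last-layer) features
        f = H @ v                              # network outputs, shape (n,)
        g = -y / (1.0 + np.exp(y * f)) / n     # d(loss)/d(f) for mean logistic loss
        grad_v = H.T @ g
        grad_W = ((g[:, None] * (pre > 0)) * v[None, :]).T @ X
        W -= lr * grad_W
        v -= lr * grad_v
    return W, v


def feature_cosines(W, X, y):
    """Mean cosine similarity of hidden features within a class and across classes."""
    H = np.maximum(X @ W.T, 0.0)
    norms = np.maximum(np.linalg.norm(H, axis=1, keepdims=True), 1e-12)
    C = (H / norms) @ (H / norms).T
    same = y[:, None] == y[None, :]
    off_diag = ~np.eye(len(y), dtype=bool)
    return C[same & off_diag].mean(), C[~same].mean()


if __name__ == "__main__":
    X, y = make_orthogonally_separable_data()
    print("orthogonally separable:", is_orthogonally_separable(X, y))
    W0, _ = train_two_layer_relu(X, y, steps=0)   # features at initialization
    W, _ = train_two_layer_relu(X, y)             # features after training
    print("init    (within, across):", feature_cosines(W0, X, y))
    print("trained (within, across):", feature_cosines(W, X, y))
```

Comparing the values at `steps=0` with the trained ones gives a rough sense of how the hidden features reorganize: in this toy setup one would expect same-class features to become nearly collinear and different-class features nearly orthogonal. The data, loss, and step size are illustrative choices and may differ from the assumptions under which the paper's theorems hold.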
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Reservoir Computing · Neural Networks and Applications
