Neural Collapse Dynamics: Depth, Activation, Regularisation, and Feature Norm Threshold

Anamika Paul Rupa

arXiv:2604.00230·cs.LG·April 2, 2026

TL;DR

This paper identifies a critical mean feature norm, specific to each model-dataset pair, whose crossing predicts the onset of neural collapse in deep networks, and examines how architecture, activation, and regularization influence that process.

Contribution

It uncovers a simple, predictive regularity linking the mean feature norm to neural collapse, and characterizes how network structure and training conditions affect these dynamics.

Findings

01

Neural collapse occurs when the mean feature norm reaches a critical value specific to each (model, dataset) pair.

02

The feature norm crossing below this threshold consistently precedes neural collapse onset, with a mean lead time of 62 epochs (MAE 24 epochs).

03

Depth, activation, and regularization each significantly influence collapse speed and the feature norm threshold.
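The threshold-crossing idea in the findings above can be illustrated with a minimal Python sketch. It is not the paper's code: the function names, the toy trajectory, and the threshold value 3.0 are all hypothetical, and it assumes the mean feature norm is the average L2 norm of penultimate-layer feature vectors and that the norm approaches the critical value from above.

```python
import math


def mean_feature_norm(features):
    """Average L2 norm over a list of feature vectors (one list of floats per sample)."""
    norms = [math.sqrt(sum(x * x for x in row)) for row in features]
    return sum(norms) / len(norms)


def first_crossing_below(fn_per_epoch, fn_star):
    """Return the first epoch index at which fn is at or below fn_star, or None.

    Assumption: fn decreases toward the critical value, so a crossing
    is detected from above.
    """
    for epoch, fn in enumerate(fn_per_epoch):
        if fn <= fn_star:
            return epoch
    return None


# Hypothetical per-epoch mean feature norms and a hypothetical critical value.
fn_history = [5.0, 4.2, 3.6, 3.1, 2.9, 2.8]
crossing_epoch = first_crossing_below(fn_history, fn_star=3.0)
print(crossing_epoch)
```

In practice the critical value would first have to be estimated for the given (model, dataset) pair, for example from earlier runs; the sketch only shows the detection step once such an estimate exists.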

Abstract

Neural collapse (NC) -- the convergence of penultimate-layer features to a simplex equiangular tight frame -- is well understood at equilibrium, but the dynamics governing its onset remain poorly characterised. We identify a simple and predictive regularity: NC occurs when the mean feature norm reaches a model-dataset-specific critical value, fn*, that is largely invariant to training conditions. This value concentrates tightly within each (model, dataset) pair (CV < 8%); training dynamics primarily affect the rate at which fn approaches fn*, rather than the value itself. In standard training trajectories, the crossing of fn below fn* consistently precedes NC onset, providing a practical predictor with a mean lead time of 62 epochs (MAE 24 epochs). A direct intervention experiment confirms fn* is a stable attractor of the gradient flow -- perturbations to feature scale are…
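As a rough illustration of the concentration claim (CV < 8%), the coefficient of variation of critical values measured across runs can be computed as below. The numbers are invented for illustration, and using the sample standard deviation is an assumption; the abstract does not say which estimator the paper uses.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumes a nonzero mean)."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical critical values from five training runs of one (model, dataset) pair.
fn_star_runs = [10.0, 10.5, 9.5, 10.2, 9.8]
cv = coefficient_of_variation(fn_star_runs)
print(f"CV = {cv:.1%}")
```

A CV well under 0.08 for such a set of runs would match the kind of concentration the abstract describes.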
