Progressive Feedforward Collapse of ResNet Training
Sicong Wang, Kuo Gai, Shihua Zhang

TL;DR
This paper introduces the concept of progressive feedforward collapse (PFC) in ResNet training: the degree of feature collapse increases from layer to layer during forward propagation. It characterizes the geometry of intermediate-layer features and proposes models to explain the phenomenon.
Contribution
The paper extends neural collapse (NC) to intermediate layers through the PFC conjecture and develops the MUFM model, which connects the features of successive layers via optimal transport, providing new theoretical insights.
Findings
The PFC metrics decrease monotonically across depth.
ResNet with weight decay approximates a geodesic curve in Wasserstein space at the terminal phase of training.
The MUFM model produces features that are more concentrated than the input data.
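To make the first finding concrete, the following is a minimal sketch of one way a collapse metric could be tracked across depth. It is not the paper's exact metric: `variability_ratio` is a hypothetical helper that divides total within-class scatter by total between-class scatter, in the spirit of the within-class variability used in the neural collapse literature. The "layers" here are synthetic; each one is produced by the hypothetical `contract_toward_class_means`, which simply pulls every point halfway toward its class mean to imitate progressive collapse. The code is kept dependency-free on purpose.

```python
from collections import defaultdict


def mean_vector(rows):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def group_by_class(features, labels):
    """Map each label to the list of feature vectors carrying it."""
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    return by_class


def variability_ratio(features, labels):
    """Within-class scatter divided by between-class scatter (traces only).

    Smaller values mean the features sit closer to their class means
    relative to how far apart the class means are.
    """
    global_mean = mean_vector(features)
    within = 0.0
    between = 0.0
    for rows in group_by_class(features, labels).values():
        mu = mean_vector(rows)
        within += sum(sq_dist(x, mu) for x in rows)
        between += len(rows) * sq_dist(mu, global_mean)
    return within / between


def contract_toward_class_means(features, labels, factor=0.5):
    """Toy stand-in for one layer: shrink each point toward its class mean."""
    means = {y: mean_vector(rows)
             for y, rows in group_by_class(features, labels).items()}
    return [
        [m + factor * (a - m) for a, m in zip(x, means[y])]
        for x, y in zip(features, labels)
    ]


# Synthetic two-class "input data" in 2-D (an assumption for illustration).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
inputs = [[-4, -1], [-4, 1], [-2, -1], [-2, 1],
          [2, -1], [2, 1], [4, -1], [4, 1]]

# Build a stack of toy "layers" by repeated contraction.
layers = [inputs]
for _ in range(4):
    layers.append(contract_toward_class_means(layers[-1], labels))

ratios = [variability_ratio(layer, labels) for layer in layers]
for depth, r in enumerate(ratios):
    print(f"depth {depth}: variability ratio = {r:.6f}")
```

With real networks, the same ratio would be evaluated on the features extracted after each residual block rather than on synthetic contractions; whether it decreases monotonically is then an empirical question of the kind the paper investigates.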
Abstract
Neural collapse (NC) is a simple and symmetric phenomenon for deep neural networks (DNNs) at the terminal phase of training, where the last-layer features collapse to their class means and form a simplex equiangular tight frame aligning with the classifier vectors. However, the relationship of the last-layer features to the data and intermediate layers during training remains unexplored. To this end, we characterize the geometry of intermediate layers of ResNet and propose a novel conjecture, progressive feedforward collapse (PFC), claiming that the degree of collapse increases during the forward propagation of DNNs. We derive a transparent model for the well-trained ResNet based on the observation that ResNet with weight decay approximates the geodesic curve in Wasserstein space at the terminal phase. The metrics of PFC indeed monotonically decrease across depth on various datasets. We propose a new…
Taxonomy
Topics: Neural Networks and Applications
Methods: Average Pooling · Global Average Pooling · Kaiming Initialization · Max Pooling · Weight Decay · Convolution
