On the Optimization Landscape of Neural Collapse under MSE Loss: Global Optimality with Unconstrained Features

Jinxin Zhou; Xiao Li; Tianyu Ding; Chong You; Qing Qu; Zhihui Zhu

arXiv:2203.01238·cs.LG·March 15, 2022·6 cites

TL;DR

This paper analyzes the optimization landscape of neural collapse under MSE loss, proving that global minima correspond to neural collapse solutions and that all other critical points are saddles, with experimental validation on neural networks.

Contribution

The paper provides the first global landscape analysis of the MSE loss in the neural collapse setting, showing that the only global minimizers are neural collapse solutions and that all other critical points are strict saddles.

Findings

01. Global minimizers are neural collapse solutions.

02. All other critical points are strict saddles with negative curvature.

03. Rescaling hyperparameters can improve the landscape around NC solutions.
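The first finding can be illustrated with a small numerical sketch. The objective below is an assumed, simplified form of an unconstrained feature model (bias-free, with one hypothetical weight-decay parameter `lam` shared by the classifier `W` and the free features `H`); it is not taken from the paper, and the function names and hyperparameters are illustrative only. The sketch runs plain gradient descent and measures only within-class variability; it does not check the geometry of the class means.

```python
import numpy as np


def ufm_mse_loss(W, H, Y, lam):
    """Assumed objective: 0.5*||WH - Y||_F^2 + 0.5*lam*(||W||_F^2 + ||H||_F^2)."""
    residual = W @ H - Y
    return 0.5 * np.sum(residual ** 2) + 0.5 * lam * (np.sum(W ** 2) + np.sum(H ** 2))


def train_ufm(num_classes=3, per_class=4, dim=5, lam=0.1, lr=0.05, steps=5000, seed=0):
    """Gradient descent on W (classifier) and H (free features, one column per sample)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(num_classes), per_class)
    Y = np.eye(num_classes)[:, labels]              # one-hot targets, shape (K, N)
    W = 0.1 * rng.standard_normal((num_classes, dim))
    H = 0.1 * rng.standard_normal((dim, labels.size))
    initial_loss = ufm_mse_loss(W, H, Y, lam)
    for _ in range(steps):
        residual = W @ H - Y
        grad_W = residual @ H.T + lam * W           # gradient with respect to W
        grad_H = W.T @ residual + lam * H           # gradient with respect to H
        W -= lr * grad_W
        H -= lr * grad_H
    return W, H, labels, initial_loss, ufm_mse_loss(W, H, Y, lam)


def within_class_variability(H, labels):
    """Mean squared distance of each feature column from its class mean."""
    total = 0.0
    for c in np.unique(labels):
        cols = H[:, labels == c]
        total += np.sum((cols - cols.mean(axis=1, keepdims=True)) ** 2)
    return total / labels.size
```

Calling `train_ufm()` and passing the returned `H` and `labels` to `within_class_variability` should give a value close to zero, since for fixed `W` this objective is strictly convex in `H` and samples of the same class share a target.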

Abstract

When training deep neural networks for classification tasks, an intriguing empirical phenomenon has been widely observed in the last-layer classifiers and features, where (i) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and (ii) cross-example within-class variability of last-layer activations collapses to zero. This phenomenon is called Neural Collapse (NC), and it seems to take place regardless of the choice of loss function. In this work, we justify NC under the mean squared error (MSE) loss, where recent empirical evidence shows that it performs comparably to or even better than the de facto cross-entropy loss. Under a simplified unconstrained feature model, we provide the first global landscape analysis for vanilla nonconvex MSE loss and show that the (only!) global minimizers are neural collapse…
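The two properties named in the abstract can be made concrete with a short sketch. It builds the standard K-vertex simplex ETF, M = sqrt(K/(K-1)) (I - (1/K) 1 1^T), whose columns have unit norm and pairwise inner product -1/(K-1), and defines a simple within-class variability measure. The function names and the particular variability measure are illustrative assumptions, not the paper's definitions.

```python
import numpy as np


def simplex_etf(num_classes):
    """Columns are the K vertices of a simplex ETF embedded in R^K."""
    K = num_classes
    return np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)


def within_class_variability(features, labels):
    """Mean squared distance of each feature row from its class mean."""
    total = 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        total += np.sum((class_feats - class_feats.mean(axis=0)) ** 2)
    return total / len(features)


def collapsed_features(num_classes, per_class):
    """Idealized NC features: every sample sits exactly on its class vertex."""
    M = simplex_etf(num_classes)
    labels = np.repeat(np.arange(num_classes), per_class)
    return M[:, labels].T, labels                   # shape (N, K), one row per sample
```

For features produced by `collapsed_features`, `within_class_variability` returns exactly zero, and the Gram matrix `M.T @ M` of `simplex_etf(K)` has ones on the diagonal and -1/(K-1) elsewhere, which is the equal-norm, equal-angle structure in property (i).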

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning