Stable Minima of ReLU Neural Networks Suffer from the Curse of Dimensionality: The Neural Shattering Phenomenon
Tongtong Liang, Dan Qiao, Yu-Xiang Wang, Rahul Parhi

TL;DR
This paper shows that, in overparameterized two-layer ReLU networks, flat (stable) minima do generalize, but their rates of convergence deteriorate exponentially as the input dimension grows. The authors attribute this to a phenomenon they call neural shattering.
Contribution
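The paper provides the first systematic theoretical analysis of how flatness affects generalization in two-layer ReLU networks with multivariate inputs, proving upper and lower bounds on both the generalization gap of flat solutions and the MSE of stable minima in nonparametric function estimation.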
Findings
Flat solutions generalize poorly as input dimension increases.
Convergence rates for flat minima deteriorate exponentially as the input dimension grows.
Neural shattering explains the failure of flat minima to generalize in high-dimensional spaces.
Abstract
We study the implicit bias of flatness / low (loss) curvature and its effects on generalization in two-layer overparameterized ReLU networks with multivariate inputs -- a problem well motivated by the minima stability and edge-of-stability phenomena in gradient-descent training. Existing work either requires interpolation or focuses only on univariate inputs. This paper presents new and somewhat surprising theoretical results for multivariate inputs. On two natural settings (1) generalization gap for flat solutions, and (2) mean-squared error (MSE) in nonparametric function estimation by stable minima, we prove upper and lower bounds, which establish that while flatness does imply generalization, the resulting rates of convergence necessarily deteriorate exponentially as the input dimension grows. This gives an exponential separation between the flat solutions compared to low-norm…
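To make the link between flatness and gradient-descent stability concrete, here is a minimal sketch, not the authors' code. It measures the sharpness (largest Hessian eigenvalue of the training loss) of a small two-layer ReLU network and checks the standard linear-stability condition for gradient descent with step size eta, namely lambda_max <= 2/eta. The parameterization f(x) = sum_k v_k ReLU(w_k . x - b_k), the loss scaling 1/(2n), the finite-difference Hessian, and all function names are assumptions made for illustration.

```python
import numpy as np

def unpack(theta, d, K):
    # Split the flat parameter vector into hidden weights W, biases b, output weights v.
    W = theta[:K * d].reshape(K, d)
    b = theta[K * d:K * d + K]
    v = theta[K * d + K:K * d + 2 * K]
    return W, b, v

def loss(theta, X, y, K):
    # Squared loss L = 1/(2n) * sum_i (f(x_i) - y_i)^2 (assumed scaling).
    W, b, v = unpack(theta, X.shape[1], K)
    pred = np.maximum(X @ W.T - b, 0.0) @ v
    return 0.5 * np.mean((pred - y) ** 2)

def grad(theta, X, y, K):
    # Analytic gradient of the loss with respect to all parameters.
    n, d = X.shape
    W, b, v = unpack(theta, d, K)
    pre = X @ W.T - b                    # pre-activations, shape (n, K)
    act = np.maximum(pre, 0.0)
    r = (act @ v - y) / n                # scaled residuals, shape (n,)
    back = r[:, None] * (pre > 0) * v    # dL/dpre, shape (n, K)
    gW = back.T @ X
    gb = -back.sum(axis=0)
    gv = act.T @ r
    return np.concatenate([gW.ravel(), gb, gv])

def sharpness(theta, X, y, K, eps=1e-5):
    # Largest Hessian eigenvalue, with the Hessian built from central
    # differences of the gradient (only practical for tiny networks).
    p = theta.size
    H = np.zeros((p, p))
    for i in range(p):
        e = np.zeros(p)
        e[i] = eps
        H[:, i] = (grad(theta + e, X, y, K) - grad(theta - e, X, y, K)) / (2 * eps)
    H = 0.5 * (H + H.T)                  # symmetrize away numerical noise
    return np.linalg.eigvalsh(H)[-1]

def is_linearly_stable(lam_max, eta):
    # Gradient descent with step size eta is linearly stable only if lam_max <= 2/eta.
    return lam_max <= 2.0 / eta

# Toy usage on random data (illustrative only, not a trained minimum).
rng = np.random.default_rng(0)
d, K, n = 3, 4, 20
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
theta = rng.normal(size=K * d + 2 * K)
lam = sharpness(theta, X, y, K)
print("sharpness:", lam, "stable at eta=0.1:", is_linearly_stable(lam, 0.1))
```

Read this way, a larger step size eta forces gradient descent toward flatter solutions, which is the sense in which stable minima carry an implicit flatness constraint.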
Taxonomy
Topics: Neural Networks and Applications
