Linear Stability Hypothesis and Rank Stratification for Nonlinear Models

Yaoyu Zhang; Zhongwang Zhang; Leyang Zhang; Zhiwei Bai; Tao Luo; Zhi-Qin John Xu

arXiv:2211.11623 · cs.LG · November 22, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a rank stratification and a linear stability theory to explain how nonlinear models such as DNNs can recover target functions even when heavily overparameterized, and to predict the minimal training data size that recovery requires.

Contribution

It proposes a rank stratification framework and a linear stability hypothesis that together give a unified account of target recovery by overparameterized nonlinear models.

Findings

01. Model rank predicts the minimal training data size needed for target recovery.

02. A target function almost surely becomes linearly stable when the training data size equals its model rank.

03. In DNNs, model rank can be much lower than the total number of parameters.
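
The page does not reproduce the paper's formal definitions. As a rough, hedged illustration of finding 03, the sketch below uses a hypothetical two-layer tanh network f(x) = sum_k a_k tanh(w_k . x + b_k) (our choice, not necessarily an architecture from the paper) and computes the numerical rank of the matrix of parameter gradients ("tangent features") at one parameter point. We assume this tangent rank is the kind of "effective number of parameters" that model rank formalizes; since the paper defines model rank through rank stratification, the value at a single parameter point should be read as an illustration (at most an upper bound), not as the paper's definition. The helper names tangent_features and numerical_rank are hypothetical.

```python
import numpy as np

def tangent_features(x, a, w, b):
    """Jacobian of f(x) = sum_k a_k * tanh(w_k . x + b_k) w.r.t. (a, w, b).

    x: (n, d) inputs; a: (m,); w: (m, d); b: (m,).
    Returns an (n, m*(d+2)) matrix whose row i is grad_theta f(x_i).
    """
    z = x @ w.T + b                       # (n, m) pre-activations
    t = np.tanh(z)                        # df/da_k
    s = a * (1.0 - t**2)                  # df/db_k = a_k * tanh'(z_k)
    dw = s[:, :, None] * x[:, None, :]    # (n, m, d): df/dw_k
    return np.concatenate([t, dw.reshape(len(x), -1), s], axis=1)

def numerical_rank(J, rtol=1e-8):
    """Count singular values above rtol times the largest singular value."""
    sv = np.linalg.svd(J, compute_uv=False)  # returned in descending order
    return int(np.sum(sv > rtol * sv[0]))

rng = np.random.default_rng(0)
d, m, n = 2, 10, 200
n_params = m * (d + 2)                    # 40 parameters in total
x = rng.normal(size=(n, d))

# Hypothetical target: a single active neuron embedded in a width-10 network.
# The 9 unused neurons have a_k = 0, w_k = 0, b_k = 0, so all their gradients vanish.
a = np.zeros(m); a[0] = 1.0
w = np.zeros((m, d)); w[0] = rng.normal(size=d)
b = np.zeros(m); b[0] = rng.normal()
r_one = numerical_rank(tangent_features(x, a, w, b))  # d + 2 = 4 tangent directions

# Generic parameters with all 10 neurons active, for comparison.
r_full = numerical_rank(tangent_features(
    x, rng.normal(size=m), rng.normal(size=(m, d)), rng.normal(size=m)))

print(n_params, r_one, r_full)
```

The single-neuron target spans only tanh(z), x_1 sech^2(z), x_2 sech^2(z) and sech^2(z), so its tangent rank is 4 out of 40 parameters, while generic parameters give a much larger rank. This mirrors, in a toy setting, the gap between model rank and parameter count that finding 03 describes.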

Abstract

Models with nonlinear architectures/parameterizations such as deep neural networks (DNNs) are well known for their mysteriously good generalization performance at overparameterization. In this work, we tackle this mystery from a novel perspective focusing on the transition of the target recovery/fitting accuracy as a function of the training data size. We propose a rank stratification for general nonlinear models to uncover a model rank as an "effective size of parameters" for each function in the function space of the corresponding model. Moreover, we establish a linear stability theory proving that a target function almost surely becomes linearly stable when the training data size equals its model rank. Supported by our experiments, we propose a linear stability hypothesis that linearly stable functions are preferred by nonlinear training. By these results, the model rank of a target…
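
The abstract states that a target function almost surely becomes linearly stable once the training data size equals its model rank. The page does not give the formal definition of linear stability, so the sketch below uses an assumed, simplified reading: a target parameter point counts as "linearly stable" for a training set if every first-order parameter perturbation that leaves the training outputs unchanged also leaves the model function unchanged. Equivalently, the training-set Jacobian must have the same rank as the Jacobian on a dense probe set. The two-layer tanh network, the probe set, and the helper is_linearly_stable are all hypothetical and are not taken from the paper.

```python
import numpy as np

def tangent_features(x, a, w, b):
    """Rows are grad_theta f(x_i) for f(x) = sum_k a_k * tanh(w_k . x + b_k)."""
    z = x @ w.T + b                       # (n, m) pre-activations
    t = np.tanh(z)                        # df/da_k
    s = a * (1.0 - t**2)                  # df/db_k
    dw = s[:, :, None] * x[:, None, :]    # df/dw_k, shape (n, m, d)
    return np.concatenate([t, dw.reshape(len(x), -1), s], axis=1)

def numerical_rank(J, rtol=1e-8):
    sv = np.linalg.svd(J, compute_uv=False)
    return int(np.sum(sv > rtol * sv[0]))

rng = np.random.default_rng(1)
d, m = 2, 10

# Hypothetical target: one active neuron inside a width-10 network (40 parameters).
a = np.zeros(m); a[0] = 1.0
w = np.zeros((m, d)); w[0] = rng.normal(size=d)
b = np.zeros(m); b[0] = rng.normal()

# A dense probe set stands in for "all inputs"; its Jacobian rank is the tangent rank.
x_probe = rng.normal(size=(500, d))
tangent_rank = numerical_rank(tangent_features(x_probe, a, w, b))

def is_linearly_stable(n_train):
    """Assumed criterion: n_train random training points constrain every tangent direction.

    Perturbations that leave the whole function unchanged always leave the training
    outputs unchanged; equal ranks mean the converse also holds (to first order).
    """
    x_train = rng.normal(size=(n_train, d))
    return numerical_rank(tangent_features(x_train, a, w, b)) == tangent_rank

for n_train in range(1, 7):
    print(n_train, is_linearly_stable(n_train))
```

With one active neuron the tangent rank is d + 2 = 4, so the check must fail for fewer than 4 training points (a matrix with n rows has rank at most n) and should, almost surely, pass from 4 points on. This is only a toy analogue of the threshold the abstract describes, not a reproduction of the paper's theory or experiments.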

Taxonomy

Topics: Neural Networks and Applications · Optical Polarization and Ellipsometry · Stochastic Gradient Optimization Techniques