Nonasymptotic theory for two-layer neural networks: Beyond the bias-variance trade-off
Huiyuan Wang, Wei Lin

TL;DR
This paper develops a nonasymptotic generalization theory for two-layer ReLU networks. Its new prediction bounds help explain why such networks remain effective in the overparametrized regime, and they reproduce the double descent phenomenon.
Contribution
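It analyzes two-layer ReLU networks under scaled variation regularization, a regularizer that is equivalent to ridge regression from the angle of gradient-based optimization yet controls model complexity much like the group lasso. Exploiting this "ridge-lasso duality" yields prediction bounds that hold for all network widths.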
Findings
When the signal is strong, overparametrized networks can outperform underparametrized ones.
The theory reproduces the double descent phenomenon.
Random feature models are shown to be suboptimal due to the curse of dimensionality.
Abstract
Large neural networks have proved remarkably effective in modern deep learning practice, even in the overparametrized regime where the number of active parameters is large relative to the sample size. This contradicts the classical perspective that a machine learning model must trade off bias and variance for optimal generalization. To resolve this conflict, we present a nonasymptotic generalization theory for two-layer neural networks with ReLU activation function by incorporating scaled variation regularization. Interestingly, the regularizer is equivalent to ridge regression from the angle of gradient-based optimization, but plays a similar role to the group lasso in controlling the model complexity. By exploiting this "ridge-lasso duality," we obtain new prediction bounds for all network widths, which reproduce the double descent phenomenon. Moreover, the overparametrized minimum…
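The "ridge-lasso duality" mentioned in the abstract can be illustrated with a standard identity for ReLU networks. This is a minimal sketch, not the paper's construction: the exact definition of scaled variation is not given in this summary, so the sketch assumes a bias-free network f(x) = sum_k a_k relu(w_k^T x) and compares a ridge (weight decay) penalty with a group-lasso-type penalty sum_k |a_k| ||w_k||_2. Because ReLU is positively homogeneous, rescaling a unit as (c w_k, a_k / c) with c > 0 leaves f unchanged, and by the AM-GM inequality the ridge penalty is never smaller than the group penalty, with equality when each unit is balanced. All function names below are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def two_layer_relu(X, W, a):
    """f(x) = sum_k a_k * relu(w_k^T x). Bias terms are omitted (an assumption)."""
    return relu(X @ W.T) @ a

def ridge_penalty(W, a):
    """Weight-decay (ridge) penalty: 0.5 * sum_k (a_k^2 + ||w_k||_2^2)."""
    return 0.5 * (np.sum(a ** 2) + np.sum(W ** 2))

def group_penalty(W, a):
    """Group-lasso-type penalty: sum_k |a_k| * ||w_k||_2 (assumed form)."""
    return np.sum(np.abs(a) * np.linalg.norm(W, axis=1))

def balance(W, a):
    """Rescale each unit so that |a_k| == ||w_k||_2 without changing f.

    Assumes every a_k and every row of W is nonzero.
    """
    norms = np.linalg.norm(W, axis=1)
    c = np.sqrt(np.abs(a) / norms)      # per-unit positive scale factor
    return W * c[:, None], a / c

rng = np.random.default_rng(0)
n, d, m = 50, 5, 8                      # samples, input dimension, width
X = rng.normal(size=(n, d))
W = rng.normal(size=(m, d))             # hidden-layer weights, one row per unit
a = rng.normal(size=m)                  # output-layer weights

W_bal, a_bal = balance(W, a)

# The rescaled network computes the same function on the sample.
same_function = np.allclose(two_layer_relu(X, W, a),
                            two_layer_relu(X, W_bal, a_bal))

# Ridge penalty minus group penalty: nonnegative in general, zero when balanced.
gap_before = ridge_penalty(W, a) - group_penalty(W, a)
gap_after = ridge_penalty(W_bal, a_bal) - group_penalty(W_bal, a_bal)

print(same_function, gap_before >= 0.0, abs(gap_after) < 1e-9)
```

Under these assumptions, minimizing a ridge penalty over all equivalent parametrizations of the same function yields the group penalty, which is one way to read the claim that the regularizer looks like ridge regression to the optimizer while acting like a group lasso on model complexity.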
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Gaussian Processes and Bayesian Inference
