Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes
Dan Qiao, Kaiqi Zhang, Esha Singh, Daniel Soudry, Yu-Xiang Wang

TL;DR
This paper develops a new theory showing that gradient descent with large step sizes in univariate ReLU networks converges to smooth, stable minima that generalize well, even in noisy, nonparametric regression settings.
Contribution
It introduces a novel generalization framework based on minima stability for univariate ReLU networks trained with large step sizes, demonstrating near-optimal rates without regularization.
Findings
Gradient descent with fixed large step size converges to smooth minima.
Stable minima have bounded weighted first order total variation.
Achieves a near-optimal MSE bound of $\widetilde{O}(n^{-4/5})$ from $n$ data points.
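To make the total-variation finding concrete, the sketch below computes the unweighted first-order total variation of the derivative of a univariate two-layer ReLU network. It assumes the parameterization f(x) = sum_i v_i * relu(w_i * x + b_i) + c; the names `relu_net` and `derivative_total_variation` are hypothetical, and the paper's specific weighting is not reproduced because this summary does not define it.

```python
import numpy as np

def relu_net(x, w, b, v, c=0.0):
    """Evaluate f(x) = sum_i v_i * relu(w_i * x + b_i) + c on a 1-D array x."""
    pre = np.outer(x, w) + b              # shape (len(x), n_neurons)
    return np.maximum(pre, 0.0) @ v + c

def derivative_total_variation(w, b, v):
    """Unweighted total variation of f' over the real line (assumed setup).

    Neuron i puts a knot at -b_i / w_i, where f' jumps by v_i * |w_i|.
    """
    w, b, v = (np.asarray(a, dtype=float) for a in (w, b, v))
    active = w != 0                       # w_i = 0 gives a constant, so no knot
    knots = -b[active] / w[active]
    jumps = v[active] * np.abs(w[active])
    # Neurons that share a knot (exact float equality) add their jumps first.
    uniq, inverse = np.unique(knots, return_inverse=True)
    summed = np.zeros(len(uniq))
    np.add.at(summed, inverse, jumps)
    return float(np.abs(summed).sum())

# Hypothetical 3-neuron network: two neurons share the knot at x = 0.
w = np.array([1.0, -2.0, 1.0])
b = np.array([0.0, 1.0, 0.0])
v = np.array([1.0, 0.5, -0.25])
tv = derivative_total_variation(w, b, v)
```

When all knots are distinct, the value reduces to sum_i |v_i| * |w_i|, so a small total variation means few or small kinks, which is the sense in which such a fit is smooth.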
Abstract
We study the generalization of two-layer ReLU neural networks in a univariate nonparametric regression problem with noisy labels. This is a problem where kernels (\emph{e.g.} NTK) are provably sub-optimal and benign overfitting does not happen, thus disqualifying existing theory for interpolating (0-loss, global optimal) solutions. We present a new theory of generalization for local minima that gradient descent with a constant learning rate can \emph{stably} converge to. We show that gradient descent with a fixed learning rate $\eta$ can only find local minima that represent smooth functions with a certain weighted \emph{first order total variation} bounded by $1/\eta - 1/2 + \widetilde{O}(\sigma + \sqrt{\mathrm{MSE}})$, where $\sigma$ is the label noise level, $\mathrm{MSE}$ is short for mean squared error against the ground truth, and $\widetilde{O}(\cdot)$ hides a logarithmic factor.…
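The idea of a minimum that gradient descent can stably converge to can be illustrated with a toy example. The sketch below assumes the standard linear-stability criterion, which this excerpt does not state: with step size η, gradient descent stays near a minimum only if the curvature there is at most 2/η. It uses a hypothetical one-dimensional quadratic loss rather than a ReLU network, and the function name and numeric values are illustrative only.

```python
def gd_on_quadratic(curvature, step_size, x0=1.0, n_steps=200):
    """Run gradient descent on L(x) = 0.5 * curvature * x**2; return the final |x|."""
    x = x0
    for _ in range(n_steps):
        x = x - step_size * curvature * x   # the gradient of L is curvature * x
    return abs(x)

eta = 0.5   # hypothetical step size, so the stability threshold is 2 / eta = 4
flat = gd_on_quadratic(curvature=3.0, step_size=eta)    # below threshold: iterates contract
sharp = gd_on_quadratic(curvature=5.0, step_size=eta)   # above threshold: iterates blow up
```

Each step multiplies x by (1 - η · curvature), so the iterates shrink exactly when the curvature is below 2/η. A larger step size therefore rules out sharper minima, which is the mechanism behind the 1/η term in the bound.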
