Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes

Dan Qiao; Kaiqi Zhang; Esha Singh; Daniel Soudry; Yu-Xiang Wang

arXiv:2406.06838 · cs.LG · June 12, 2024

TL;DR

This paper develops a new theory showing that gradient descent with large step sizes in univariate ReLU networks converges to smooth, stable minima that generalize well, even in noisy, nonparametric regression settings.

Contribution

It introduces a novel generalization framework based on minima stability for univariate ReLU networks trained with large step sizes, demonstrating near-optimal rates without regularization.

Findings

01

Gradient descent with a fixed large step size converges only to minima that represent smooth functions.

02

Stable minima have bounded weighted first order total variation.

03

Achieves a near-optimal MSE bound of $\widetilde{O}(n^{-4/5})$ away from the boundary of the data.
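
The findings rest on minima stability. In the standard linear-stability view, gradient descent with step size η can remain at a minimum only if the curvature there (the top Hessian eigenvalue, or sharpness) is at most 2/η. The sketch below illustrates that criterion on a one-dimensional quadratic loss; it is a toy example under that assumption, not the paper's analysis of ReLU networks, and the function names are hypothetical.

```python
def gd_on_quadratic(curvature: float, eta: float, theta0: float = 1.0, steps: int = 100) -> float:
    """Run gradient descent on L(theta) = curvature * theta**2 / 2; return |theta| afterwards."""
    theta = theta0
    for _ in range(steps):
        grad = curvature * theta      # dL/dtheta
        theta = theta - eta * grad    # equivalently theta <- (1 - eta * curvature) * theta
    return abs(theta)


def is_linearly_stable(curvature: float, eta: float) -> bool:
    """The minimum at theta = 0 is stable iff |1 - eta * curvature| <= 1,
    i.e. (for curvature >= 0) iff curvature <= 2 / eta."""
    return abs(1.0 - eta * curvature) <= 1.0


eta = 0.1  # step size, so the stability threshold is 2 / eta = 20
for lam in (5.0, 19.0, 21.0):
    # Curvatures below 20 contract toward the minimum; 21 is repelled from it.
    print(lam, is_linearly_stable(lam, eta), gd_on_quadratic(lam, eta))
```

A larger step size lowers the threshold 2/η, so fewer (flatter) minima remain reachable; the paper's contribution is relating this flatness to the smoothness of the learned function.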

Abstract

We study the generalization of two-layer ReLU neural networks in a univariate nonparametric regression problem with noisy labels. This is a problem where kernels (e.g. NTK) are provably sub-optimal and benign overfitting does not happen, thus disqualifying existing theory for interpolating (0-loss, global optimal) solutions. We present a new theory of generalization for local minima that gradient descent with a constant learning rate can stably converge to. We show that gradient descent with a fixed learning rate $\eta$ can only find local minima that represent smooth functions with a certain weighted first order total variation bounded by $1/\eta - 1/2 + \widetilde{O}(\sigma + \sqrt{\mathrm{MSE}})$, where $\sigma$ is the label noise level, $\mathrm{MSE}$ is short for mean squared error against the ground truth, and $\widetilde{O}(\cdot)$ hides a logarithmic factor.…
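
The bounded quantity can be made concrete for a two-layer univariate ReLU network. The sketch below assumes the parameterization f(x) = Σᵢ vᵢ·ReLU(wᵢx + bᵢ) + c. Each neuron with wᵢ ≠ 0 has a kink at xᵢ = −bᵢ/wᵢ, where f′ jumps by vᵢ|wᵢ|. The total variation of f′ is the sum of the absolute net jumps. The paper weights this sum with a specific data-dependent weight that is not reproduced here. The `weight` argument is a hypothetical stand-in, and it defaults to 1, which gives the unweighted TV(f′). `leading_tv_bound` evaluates only the 1/η − 1/2 term and omits the $\widetilde{O}(\cdot)$ term.

```python
from collections import defaultdict
from typing import Callable, Dict, Sequence


def weighted_first_order_tv(
    v: Sequence[float],
    w: Sequence[float],
    b: Sequence[float],
    weight: Callable[[float], float] = lambda x: 1.0,  # hypothetical weight; 1 = unweighted
) -> float:
    """Weighted TV of f' for f(x) = sum_i v_i * relu(w_i * x + b_i) + c.

    Jumps of f' that fall on the same knot are summed before taking absolute
    values, since they can cancel.
    """
    jumps: Dict[float, float] = defaultdict(float)
    for vi, wi, bi in zip(v, w, b):
        if wi == 0.0:
            continue  # constant neuron: no kink, no contribution to TV(f')
        jumps[-bi / wi] += vi * abs(wi)  # f' jumps by v_i * |w_i| at x = -b_i / w_i
    return sum(weight(knot) * abs(jump) for knot, jump in jumps.items())


def leading_tv_bound(eta: float) -> float:
    """Leading term 1/eta - 1/2 of the stability bound (log/noise term omitted)."""
    return 1.0 / eta - 0.5


# Three neurons: one kink at x = 0 (jump +1) and two at x = 0.5 (jumps -4 and +0.5).
v, w, b = [1.0, -2.0, 0.5], [1.0, 2.0, -1.0], [0.0, -1.0, 0.5]
print(weighted_first_order_tv(v, w, b))                                  # unweighted: 1 + 3.5
print(weighted_first_order_tv(v, w, b, weight=lambda x: 1.0 - abs(x)))  # toy weight, illustration only
print(leading_tv_bound(0.1))
```

Grouping jumps by knot matters. Two neurons can place opposite-signed kinks at the same point, and summing |vᵢwᵢ| per neuron would then overstate the variation of the function the network actually represents.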

Videos

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes (SlidesLive)