On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias
Itay Safran, Gal Vardi, Jason D. Lee

TL;DR
This paper analyzes the convergence and implicit bias of gradient flow in shallow univariate ReLU networks. It shows that the trained network has a number of linear regions at most proportional to the number of neurons in the target network, which has implications for generalization.
Contribution
It provides the first convergence guarantees for gradient flow in shallow ReLU networks with implicit bias towards networks with limited linear regions, even under mild over-parameterization.
Findings
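Gradient flow converges in direction to a network with at most O(r) linear regions, where r is the number of neurons in the target network.
The result holds with high probability over the initialization and the sampling of the dataset.
The bound on the number of linear regions implies a generalization bound for shallow neural networks.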
Abstract
We study the dynamics and implicit bias of gradient flow (GF) on univariate ReLU neural networks with a single hidden layer in a binary classification setting. We show that when the labels are determined by the sign of a target network with r neurons, with high probability over the initialization of the network and the sampling of the dataset, GF converges in direction (suitably defined) to a network achieving perfect training accuracy and having at most O(r) linear regions, implying a generalization bound. Unlike many other results in the literature, under an additional assumption on the distribution of the data, our result holds even for mild over-parameterization, where the required width is independent of the sample size.
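To make the notion of linear regions concrete, the following is a minimal sketch (not the paper's analysis) that counts the linear regions of a univariate one-hidden-layer ReLU network x -> sum_j v[j] * max(0, w[j] * x + b[j]). The function name `count_linear_regions`, the parameter names `w`, `b`, `v`, and the tolerance `tol` are all hypothetical choices for illustration. It assumes no output bias, which would not change the count. The sketch shows why the number of regions can be much smaller than the width: neurons whose kinks coincide can cancel each other's slope changes.

```python
def count_linear_regions(w, b, v, tol=1e-9):
    """Count linear regions of x -> sum_j v[j] * max(0, w[j] * x + b[j])."""
    # Each neuron with w[j] != 0 has a kink at x = -b[j] / w[j], where the
    # slope of the network output jumps by v[j] * |w[j]|.
    kinks = sorted(
        (-bj / wj, vj * abs(wj))
        for wj, bj, vj in zip(w, b, v)
        if wj != 0
    )
    effective = 0
    i = 0
    while i < len(kinks):
        location, jump = kinks[i]
        i += 1
        # Merge neurons whose kinks coincide; their slope jumps may cancel.
        while i < len(kinks) and kinks[i][0] - location <= tol:
            jump += kinks[i][1]
            i += 1
        # Only a nonzero net slope change creates a new linear region.
        if abs(jump) > tol:
            effective += 1
    return effective + 1


# Four neurons whose sum simplifies to |x|, which has two linear regions.
regions = count_linear_regions(
    w=[1, 1, -1, 2], b=[0, -1, 0, -2], v=[1, -1, 1, 0.5]
)
print(regions)
```

In this example the two neurons with a kink at x = 1 cancel exactly, so a width-4 network has only two linear regions. Two neurons with distinct kinks and nonzero output weights would instead give three.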
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques
