How Implicit Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part I: the 1-D Case of Two Layers with Random First Layer
Jakob Heiss, Josef Teichmann, Hanna Wutte

TL;DR
This paper analyzes how implicit regularization shapes the function learned by shallow one-dimensional ReLU networks whose first-layer weights are random and only the terminal layer is trained, linking it to smoothing splines and to regularization of the function's second derivative.
Contribution
It establishes a mathematical link between L2 regularization of the trained weights in such networks and regularization of the learned function's second derivative, and it relates early-stopped gradient descent to smoothing spline regression.
Findings
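For least squares regression, the trained network converges to the smooth spline interpolation of the training data as the number of hidden nodes tends to infinity
L2-regularized regression corresponds, in function space, to regularizing the estimate's second derivative
Early-stopped gradient descent (without explicit weight regularization) corresponds to smoothing spline regression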
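As a rough orientation, the two objectives these findings relate can be written side by side. The symbols below ($RN_w$, $v_k$, $b_k$, $\tilde\lambda$, $\lambda$) are notation assumed here for illustration and do not come from this page; the paper's precise statement may include additional weighting or scaling terms that this sketch leaves out.

```latex
% Network with n hidden nodes; (v_k, b_k) are random and fixed, only w is trained.
RN_w(x) = \sum_{k=1}^{n} w_k \max(0,\, b_k + v_k x)

% L2-regularized (ridge) least squares in weight space:
\min_{w \in \mathbb{R}^n} \; \sum_{i=1}^{N} \bigl(RN_w(x_i) - y_i\bigr)^2 + \tilde\lambda \, \|w\|_2^2

% Classical smoothing spline objective in function space:
\min_{f} \; \sum_{i=1}^{N} \bigl(f(x_i) - y_i\bigr)^2 + \lambda \int \bigl(f''(x)\bigr)^2 \, dx
```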
Abstract
In this paper, we consider one dimensional (shallow) ReLU neural networks in which weights are chosen randomly and only the terminal layer is trained. First, we mathematically show that for such networks L2-regularized regression corresponds in function space to regularizing the estimate's second derivative for fairly general loss functionals. For least squares regression, we show that the trained network converges to the smooth spline interpolation of the training data as the number of hidden nodes tends to infinity. Moreover, we derive a novel correspondence between the early stopped gradient descent (without any explicit regularization of the weights) and the smoothing spline regression.
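The setting described in the abstract can be illustrated with a minimal NumPy sketch: a one-dimensional ReLU network whose hidden weights are sampled once and frozen, with only the terminal layer fitted by L2-regularized least squares. The Gaussian sampling, the function names, the toy data, and the values of `lam` are assumptions made for this illustration, not details taken from the paper.

```python
import numpy as np

def random_relu_features(x, v, b):
    # Hidden layer output max(0, b_k + v_k * x) for each fixed random node k.
    return np.maximum(0.0, np.outer(x, v) + b)

def fit_last_layer(x, y, v, b, lam):
    # Ridge regression on the terminal-layer weights only.
    # Uses the dual form w = Phi^T (Phi Phi^T + lam I)^{-1} y, which is
    # equivalent to the usual ridge solution and well conditioned when
    # there are far more hidden nodes than training points.
    phi = random_relu_features(x, v, b)
    gram = phi @ phi.T + lam * np.eye(len(x))
    return phi.T @ np.linalg.solve(gram, y)

def predict(x, w, v, b):
    return random_relu_features(x, v, b) @ w

# Assumed setup: Gaussian first-layer weights and biases, drawn once and frozen.
rng = np.random.default_rng(0)
n_hidden = 200
v = rng.normal(size=n_hidden)
b = rng.normal(size=n_hidden)

# Toy training data (hypothetical).
x_train = np.linspace(-1.0, 1.0, 8)
y_train = np.sin(3.0 * x_train)

w_small = fit_last_layer(x_train, y_train, v, b, lam=1e-6)  # near-interpolation
w_large = fit_last_layer(x_train, y_train, v, b, lam=1.0)   # stronger smoothing
```

Evaluating `predict` on a fine grid for both weight vectors gives a quick visual check: the small-`lam` fit passes close to the training points, while the large-`lam` fit is flatter, which is the qualitative behaviour one would expect from a penalty on the second derivative.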
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Numerical methods in inverse problems · Model Reduction and Neural Networks
