Implicit Regularization in ReLU Networks with the Square Loss
Gal Vardi, Ohad Shamir

TL;DR
This paper investigates the implicit regularization effects of gradient descent in ReLU neural networks with square loss, revealing fundamental limitations in characterizing these effects explicitly and suggesting the need for new theoretical frameworks.
Contribution
The paper proves that, in simple ReLU models trained with the square loss, implicit regularization cannot be fully characterized by any explicit function of the model parameters, highlighting how hard it is to describe regularization in nonlinear neural networks.
Findings
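For a single ReLU neuron with the square loss, implicit regularization cannot be characterized by any explicit function of the model parameters, although it can be characterized approximately.
For one-hidden-layer networks, the "balancedness" property identified in Du et al. [2018] is the only implicit regularization property that can be characterized explicitly in this manner.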
Results indicate the need for more general frameworks to understand implicit regularization in nonlinear models.
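The balancedness property mentioned above can be checked numerically. The sketch below is not the authors' code. It assumes a bias-free one-hidden-layer ReLU network f(x) = sum_j v_j * relu(w_j . x), random synthetic data, and hypothetical sizes and step size chosen only for illustration. Under gradient flow the per-neuron quantity ||w_j||^2 - v_j^2 is conserved; with a small gradient descent step it should drift only slightly.

```python
import numpy as np

def train_and_track_balance(steps=2000, lr=1e-3, seed=0):
    """Run plain gradient descent on the square loss and return the
    per-neuron balance ||w_j||^2 - v_j^2 before and after training,
    together with the initial and final loss.
    All sizes and hyperparameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n, d, k = 20, 5, 4                      # samples, input dim, hidden width
    X = rng.normal(size=(n, d))             # synthetic inputs
    y = rng.normal(size=n)                  # synthetic targets
    W = 0.5 * rng.normal(size=(k, d))       # incoming weights w_j (rows)
    v = 0.5 * rng.normal(size=k)            # outgoing weights v_j

    def balance(W, v):
        return np.sum(W ** 2, axis=1) - v ** 2

    def loss(W, v):
        pred = np.maximum(X @ W.T, 0.0) @ v
        return 0.5 * np.mean((pred - y) ** 2)

    b_start, l_start = balance(W, v), loss(W, v)
    for _ in range(steps):
        pre = X @ W.T                       # pre-activations, shape (n, k)
        act = np.maximum(pre, 0.0)          # ReLU outputs
        resid = act @ v - y                 # residuals, shape (n,)
        grad_v = act.T @ resid / n
        # Chain rule through ReLU: the indicator (pre > 0) gates each sample.
        grad_W = (resid[:, None] * (pre > 0) * v[None, :]).T @ X / n
        W -= lr * grad_W
        v -= lr * grad_v
    return b_start, balance(W, v), l_start, loss(W, v)

if __name__ == "__main__":
    b0, b1, l0, l1 = train_and_track_balance()
    print("max balance drift:", np.max(np.abs(b1 - b0)))
    print("loss before/after:", l0, l1)
```

The first-order changes in ||w_j||^2 and v_j^2 cancel exactly at each step, so the drift comes only from terms of order lr^2 and shrinks as the step size is reduced. This illustrates the one property that, per the findings above, does admit an explicit characterization.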
Abstract
Understanding the implicit regularization (or implicit bias) of gradient descent has recently been a very active research area. However, the implicit regularization in nonlinear neural networks is still poorly understood, especially for regression losses such as the square loss. Perhaps surprisingly, we prove that even for a single ReLU neuron, it is impossible to characterize the implicit regularization with the square loss by any explicit function of the model parameters (although on the positive side, we show it can be characterized approximately). For one hidden-layer networks, we prove a similar result, where in general it is impossible to characterize implicit regularization properties in this manner, except for the "balancedness" property identified in Du et al. [2018]. Our results suggest that a more general framework than the one considered so far may be needed to understand…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Machine Learning and ELM
