Linear regression with overparameterized linear neural networks: Tight upper and lower bounds for implicit $\ell^1$-regularization
Hannes Matt, Dominik Stöger

TL;DR
This paper analyzes how overparameterized diagonal linear neural networks implicitly regularize towards sparse solutions, deriving tight bounds on approximation errors that depend on network depth and initialization scale.
Contribution
It provides the first precise bounds on the approximation error of implicit $\ell^1$-regularization in deep diagonal linear neural networks, revealing depth-dependent behaviors.
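A minimal sketch of the setting, written in a parameterization that is common in the literature on diagonal linear networks. The symbols $A$, $y$, $u$, $v$, $D$, $\alpha$, and $x_\infty(\alpha)$ are assumptions introduced here for illustration; the paper's own notation and its scaling of the initialization may differ.

$$
\begin{aligned}
&\text{Model:} && x = u^{\odot D} - v^{\odot D}, \quad u, v \in \mathbb{R}^d, \\
&\text{Loss:} && L(u, v) = \tfrac{1}{2}\,\lVert A(u^{\odot D} - v^{\odot D}) - y \rVert_2^2, \quad A \in \mathbb{R}^{n \times d},\ n < d, \\
&\text{Gradient flow:} && \dot{u} = -\nabla_u L, \quad \dot{v} = -\nabla_v L, \quad u(0) = v(0) = \alpha \mathbf{1}, \\
&\text{Reference problem:} && \min_{x \in \mathbb{R}^d} \lVert x \rVert_1 \quad \text{subject to} \quad Ax = y.
\end{aligned}
$$

In this notation, the approximation error compares the limit point $x_\infty(\alpha) = \lim_{t \to \infty} x(t)$ with a solution of the reference problem, as a function of the initialization scale $\alpha$.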
Findings
For implicit $\ell^1$-regularization, the approximation error decreases linearly with the initialization scale $\alpha$ for depth 3 or more.
For depth 2, the error decreases at the rate $\alpha^{1-\rho}$, where $\rho$ is a bounded parameter.
Deeper networks (depth 3 or more) may generalize better in practice.
Abstract
Modern machine learning models are often trained in a setting where the number of parameters exceeds the number of training samples. To understand the implicit bias of gradient descent in such overparameterized models, prior work has studied diagonal linear neural networks in the regression setting. These studies have shown that, when initialized with small weights, gradient descent tends to favor solutions with minimal $\ell^1$-norm, an effect known as implicit regularization. In this paper, we investigate implicit regularization in diagonal linear neural networks of depth $D \geq 2$ for overparameterized linear regression problems. We focus on analyzing the approximation error between the limit point of gradient flow trajectories and the solution to the $\ell^1$-minimization problem. By deriving tight upper and lower bounds on the approximation error, we precisely characterize how the…
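The effect described in the abstract can be observed on a toy problem. The following is a minimal sketch, not the authors' code: it replaces gradient flow with plain gradient descent at a small step size, and it assumes the parameterization x = u**depth - v**depth with balanced initialization u = v = alpha. The function name `train_diag_linear_net`, the step size, the step count, and the toy system are all hypothetical choices made for illustration.

```python
import numpy as np


def train_diag_linear_net(A, y, depth=2, alpha=1e-2, lr=5e-3, steps=20000):
    """Gradient descent on 0.5 * ||A (u**depth - v**depth) - y||^2.

    Illustrative stand-in for gradient flow; the parameterization and
    the initialization u = v = alpha are assumptions, not the paper's setup.
    """
    d = A.shape[1]
    # Balanced initialization, so the network starts at x = 0.
    u = np.full(d, float(alpha))
    v = np.full(d, float(alpha))
    for _ in range(steps):
        x = u**depth - v**depth
        g = A.T @ (A @ x - y)  # gradient of the loss with respect to x
        # Chain rule through x = u**depth - v**depth (simultaneous update).
        u, v = (u - lr * depth * u**(depth - 1) * g,
                v + lr * depth * v**(depth - 1) * g)
    return u**depth - v**depth


# Toy underdetermined system x1 + 2*x2 = 2.
# Its minimal l1-norm solution is (0, 1), with l1-norm 1.
A = np.array([[1.0, 2.0]])
y = np.array([2.0])

for depth in (2, 3):
    for alpha in (1.0, 0.1, 0.01):
        x = train_diag_linear_net(A, y, depth=depth, alpha=alpha)
        print(f"depth={depth} alpha={alpha:<5} x={x} l1-norm={np.abs(x).sum():.6f}")
```

On this toy system, the printed l1-norm can be compared against the value 1 attained by the minimal l1-norm solution; shrinking `alpha` should move the limit point toward it. Very small values of `alpha` need more steps, particularly for depth 3, because the iterates leave the neighborhood of zero more slowly.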
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Numerical methods in inverse problems · Neural Networks and Applications
Methods: Focus · Linear Regression
