Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks
Gauthier Gidel, Francis Bach, Simon Lacoste-Julien

TL;DR
This paper investigates how the discrete gradient dynamics in training two-layer linear neural networks implicitly regularize the solutions, leading to a sequential learning of reduced-rank regression solutions as training progresses.
Contribution
It provides a theoretical analysis of the implicit regularization effect in discrete gradient descent for linear networks, revealing a rank-increasing learning process.
Findings
Gradient dynamics induce a sequential rank-increasing solution
With a vanishing initialization and a small enough step size, training sequentially recovers the solutions of reduced-rank regression
Implicit regularization guides the solution towards low-rank structures
Abstract
When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces biases that will lead to convergence to specific minimizers of the objective. Consequently, this choice can be considered as an implicit regularization for the training of over-parameterized models. In this work, we push this idea further by studying the discrete gradient dynamics of the training of a two-layer linear network with the least-squares loss. Using a time rescaling, we show that, with a vanishing initialization and a small enough step size, this dynamics sequentially learns the solutions of a reduced-rank regression with a gradually increasing rank.
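The rank-increasing behaviour described in the abstract can be illustrated with a small numerical sketch. This is not the authors' code: it assumes whitened inputs, so the least-squares loss reduces to fitting a target matrix M directly, and the function name train_two_layer_linear, the singular values (4, 2, 1), the hidden width, the step size, and the 0.5 threshold used to count learned directions are all hypothetical choices made for illustration.

```python
import numpy as np

def train_two_layer_linear(M, hidden=5, init_scale=1e-4, step=0.01,
                           iters=3000, seed=0):
    """Plain gradient descent on 0.5 * ||W1 @ W2 - M||_F^2.

    Assumption: inputs are whitened, so the least-squares loss of the
    two-layer linear network reduces to this matrix factorization form.
    """
    rng = np.random.default_rng(seed)
    d, p = M.shape
    # Small ("vanishing") initialization of both layers.
    W1 = init_scale * rng.standard_normal((d, hidden))
    W2 = init_scale * rng.standard_normal((hidden, p))
    ranks = []
    for t in range(iters):
        residual = W1 @ W2 - M
        # Both gradients are computed before either layer is updated.
        grad_W1 = residual @ W2.T
        grad_W2 = W1.T @ residual
        W1 -= step * grad_W1
        W2 -= step * grad_W2
        if t % 10 == 0:
            # Count singular values of the product above an illustrative
            # threshold (0.5) as a crude "effective rank".
            s = np.linalg.svd(W1 @ W2, compute_uv=False)
            ranks.append(int(np.sum(s > 0.5)))
    return W1 @ W2, ranks

# Hypothetical rank-3 target with well-separated singular values 4, 2, 1.
rng = np.random.default_rng(1)
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
M = U[:, :3] @ np.diag([4.0, 2.0, 1.0]) @ V[:, :3].T

product, ranks = train_two_layer_linear(M)
print(sorted(set(ranks)))  # effective ranks visited during training
```

In this toy setting the effective rank of the learned product passes through intermediate values before reaching the rank of M, with the larger singular directions picked up first. Shrinking init_scale lengthens the plateaus between successive rank increases, which is the regime the abstract refers to.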
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and ELM
