Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks

Gauthier Gidel; Francis Bach; Simon Lacoste-Julien

arXiv:1904.13262 · cs.LG · December 6, 2019 · 27 cites

TL;DR

This paper studies the discrete gradient dynamics of training a two-layer linear neural network with the least-squares loss and shows that they act as an implicit regularizer: with a vanishing initialization and a small enough step size, training sequentially learns reduced-rank regression solutions of gradually increasing rank.

Contribution

The paper gives a theoretical analysis of implicit regularization in discrete-time gradient descent for two-layer linear networks, showing that the rank of the learned solution increases step by step during training.

Findings

01 Gradient dynamics induce a sequence of solutions of increasing rank

02 With a vanishing initialization and a small enough step size, training sequentially recovers reduced-rank regression solutions

03 Implicit regularization guides the solution towards low-rank structures

Abstract

When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces biases that will lead to convergence to specific minimizers of the objective. Consequently, this choice can be considered as an implicit regularization for the training of over-parameterized models. In this work, we push this idea further by studying the discrete gradient dynamics of the training of a two-layer linear network with the least-squares loss. Using a time rescaling, we show that, with a vanishing initialization and a small enough step size, these dynamics sequentially learn the solutions of a reduced-rank regression with a gradually increasing rank.
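The behaviour described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' code: it runs plain gradient descent on a two-layer linear network with the least-squares loss, starting from a tiny random initialization, and tracks the rank of the product of the two weight matrices. Every name and value here (`train_two_layer_linear`, `best_rank_k`, the toy singular values 5, 2 and 0.7, the threshold 0.1, the step size and initialization scale) is an assumption chosen for illustration. The inputs are whitened so that the rank-k reduced-rank regression solution coincides with the truncated SVD of the least-squares solution `W_star`.

```python
import numpy as np

def train_two_layer_linear(X, Y, hidden, init_scale=1e-6, step=1e-2,
                           n_steps=5000, seed=0):
    """Gradient descent on 1/(2n) * ||Y - X W1 W2||_F^2 from a tiny random init.

    Returns the list of products W1 @ W2, one per step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    p = Y.shape[1]
    W1 = init_scale * rng.standard_normal((d, hidden))
    W2 = init_scale * rng.standard_normal((hidden, p))
    products = []
    for _ in range(n_steps):
        # Gradient of the loss with respect to the product W = W1 @ W2.
        G = X.T @ (X @ W1 @ W2 - Y) / n
        # Simultaneous update: both right-hand sides use the old W1 and W2.
        W1, W2 = W1 - step * G @ W2.T, W2 - step * W1.T @ G
        products.append(W1 @ W2)
    return products

def best_rank_k(W, k):
    """Truncated SVD: the best rank-k approximation of W in Frobenius norm."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

# Toy problem (assumed values): whitened inputs, noiseless rank-3 target.
rng = np.random.default_rng(1)
n, d, p = 50, 5, 4
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
X = np.sqrt(n) * Q                      # X.T @ X / n is the identity
U, _ = np.linalg.qr(rng.standard_normal((d, 3)))
V, _ = np.linalg.qr(rng.standard_normal((p, 3)))
W_star = (U * np.array([5.0, 2.0, 0.7])) @ V.T
Y = X @ W_star

products = train_two_layer_linear(X, Y, hidden=4)

# Effective rank per step: number of singular values above a fixed threshold.
ranks = [int(np.sum(np.linalg.svd(W, compute_uv=False) > 0.1)) for W in products]

# Distance from the trajectory to each rank-k target (k = 1, 2, 3).
targets = {k: best_rank_k(W_star, k) for k in (1, 2, 3)}
closest = {k: min(np.linalg.norm(W - T) for W in products)
           for k, T in targets.items()}

for k in (1, 2, 3):
    print(f"rank {k} first reached at step {ranks.index(k)}, "
          f"closest distance to rank-{k} target: {closest[k]:.2e}")
```

In this sketch the trajectory passes close to the rank-1 and rank-2 targets before converging to the full rank-3 solution, and the plateaus lengthen as the initialization scale shrinks. For inputs that are not whitened, the reduced-rank regression solution is no longer a plain truncated SVD of `W_star`, so `best_rank_k` would have to be replaced accordingly.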

Code & Models

Repositories

GauthierGidel/Implicit-Regularization-of-Discrete-Gradient-Dynamics-in-Linear-Neural-Networks (Official)

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Machine Learning and ELM