Implicit bias of SGD in $L_{2}$-regularized linear DNNs: One-way jumps from high to low rank

Zihan Wang; Arthur Jacot

arXiv:2305.16038·cs.LG·October 2, 2023·1 cites

TL;DR

This paper studies the implicit bias of stochastic gradient descent (SGD) in deep linear networks with $L_2$ regularization. It shows that SGD always has some probability of jumping from a higher-rank local minimum to a lower-rank one, while the probability of jumping back is zero.

Contribution

It shows that in $L_2$-regularized deep linear networks SGD moves from higher-rank to lower-rank minima but not back, and formalizes this one-way behavior with a nested sequence of absorbing sets.

Findings

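01 SGD always has some probability of jumping from a higher-rank minimum to a lower-rank one.

02 The probability of jumping back from a lower-rank minimum to a higher-rank one is zero.

03 This behavior is described by nested sets $B_{1} \subset B_{2} \subset \dots \subset B_{R}$, one per rank, that are absorbing for small enough $λ$ and $η$.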

Abstract

The $L_{2}$-regularized loss of Deep Linear Networks (DLNs) with more than one hidden layer has multiple local minima, corresponding to matrices with different ranks. In tasks such as matrix completion, the goal is to converge to the local minimum with the smallest rank that still fits the training data. While rank-underestimating minima can be avoided since they do not fit the data, GD might get stuck at rank-overestimating minima. We show that with SGD, there is always a probability to jump from a higher rank minimum to a lower rank one, but the probability of jumping back is zero. More precisely, we define a sequence of sets $B_{1} \subset B_{2} \subset \dots \subset B_{R}$ so that $B_{r}$ contains all minima of rank $r$ or less (and not more) that are absorbing for small enough ridge parameters $λ$ and learning rates $η$: SGD has probability 0 of leaving $B_{r}$, and from any…
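The setting described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code, and it makes several assumptions: all layers are square $d \times d$ matrices, the data term is the squared error on a single sampled entry of a matrix-completion target, and the ridge term is $λ$ times the sum of squared Frobenius norms of the layers. The function names (`product`, `sample_loss`, `sample_grads`, `sgd_step`, `numerical_rank`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def product(Ws):
    """End-to-end matrix W_L @ ... @ W_1 for Ws = [W_1, ..., W_L]."""
    A = Ws[0]
    for W in Ws[1:]:
        A = W @ A
    return A


def sample_loss(Ws, target, i, j, lam):
    """Squared error on one observed entry (i, j) plus the ridge term (assumed form)."""
    err = product(Ws)[i, j] - target[i, j]
    return err ** 2 + lam * sum(np.sum(W ** 2) for W in Ws)


def sample_grads(Ws, target, i, j, lam):
    """Gradient of sample_loss with respect to each layer (square layers assumed)."""
    A = product(Ws)
    G = np.zeros_like(A)
    G[i, j] = 2.0 * (A[i, j] - target[i, j])  # derivative of err^2 w.r.t. A
    d = A.shape[0]
    grads = []
    for k, W in enumerate(Ws):
        # A = left @ W @ right, so d(loss)/dW = left.T @ G @ right.T
        left = product(Ws[k + 1:]) if k + 1 < len(Ws) else np.eye(d)
        right = product(Ws[:k]) if k > 0 else np.eye(d)
        grads.append(left.T @ G @ right.T + 2.0 * lam * W)
    return grads


def sgd_step(Ws, target, i, j, lam, eta):
    """One SGD step with learning rate eta; returns new layer matrices."""
    grads = sample_grads(Ws, target, i, j, lam)
    return [W - eta * g for W, g in zip(Ws, grads)]


def numerical_rank(A, tol=1e-3):
    """Number of singular values of A above tol."""
    return int(np.linalg.matrix_rank(A, tol=tol))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, depth = 4, 3  # three layers, i.e. two hidden layers
    target = rng.standard_normal((d, 1)) @ rng.standard_normal((1, d))  # rank-1 target
    Ws = [0.5 * rng.standard_normal((d, d)) for _ in range(depth)]
    for _ in range(200):
        i, j = rng.integers(d), rng.integers(d)  # sample one entry per step
        Ws = sgd_step(Ws, target, i, j, lam=0.01, eta=0.01)
    print("numerical rank of end-to-end matrix:", numerical_rank(product(Ws)))
```

Tracking `numerical_rank(product(Ws))` over a long run is one way to watch for rank changes of the kind the abstract describes. Whether and when a jump appears depends on $λ$, $η$, the initialization, and the tolerance, so this sketch does not reproduce any specific result from the paper.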

Videos

Implicit bias of SGD in $L_2$-regularized linear DNNs: One-way jumps from high to low rank (SlidesLive)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Sparse and Compressive Sensing Techniques · Face and Expression Recognition

Methods: Stochastic Gradient Descent