Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced

Simon S. Du; Wei Hu; Jason D. Lee

arXiv:1806.00900 · cs.LG · November 1, 2018 · 22 cites

Open Access

TL;DR

This paper shows that gradient flow keeps the differences between squared layer norms invariant in deep homogeneous neural networks, so layers that start small stay balanced, and that gradient descent on low-rank asymmetric matrix factorization balances the two factors and converges to a global optimum without explicit regularization.

Contribution

It gives rigorous proofs that gradient flow keeps the differences between squared layer norms invariant and that gradient descent keeps low-rank factors balanced, offering new insight into implicit regularization in deep learning models.
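The invariance claim can be checked by hand in the simplest setting. The derivation below is a sketch for a two-layer linear model only, not the paper's general proof for ReLU or Leaky ReLU networks; the symbols $W_1$, $W_2$, $\ell$ and $G$ are notation assumed here for illustration.

```latex
% Illustrative two-layer linear case; W_1, W_2, \ell, G are assumed notation for this sketch.
\begin{align*}
L(W_1, W_2) &= \ell(W_2 W_1), \qquad G := \nabla \ell(W_2 W_1), \\
\dot W_1 &= -\frac{\partial L}{\partial W_1} = -W_2^\top G, \qquad
\dot W_2 = -\frac{\partial L}{\partial W_2} = -G W_1^\top, \\
\frac{d}{dt}\|W_1\|_F^2 &= 2\,\mathrm{tr}\!\left(W_1^\top \dot W_1\right)
  = -2\,\mathrm{tr}\!\left(W_1^\top W_2^\top G\right), \\
\frac{d}{dt}\|W_2\|_F^2 &= 2\,\mathrm{tr}\!\left(W_2^\top \dot W_2\right)
  = -2\,\mathrm{tr}\!\left(W_2^\top G W_1^\top\right)
  = -2\,\mathrm{tr}\!\left(W_1^\top W_2^\top G\right), \\
\frac{d}{dt}\left(\|W_1\|_F^2 - \|W_2\|_F^2\right) &= 0.
\end{align*}
```

The last equality in the fourth line uses the cyclic property of the trace. Both squared norms change at the same rate, so their difference stays at its initial value along the gradient flow trajectory.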

Findings

01

Gradient flow keeps the differences between the squared norms of different layers invariant.

02

Gradient descent with positive step size automatically balances the two low-rank factors in asymmetric matrix factorization.

03

Gradient descent with a constant step size converges linearly to the global minimum in rank-1 matrix factorization.
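The first two findings can be probed numerically. The following is a minimal sketch, assuming a two-layer linear model trained by plain gradient descent on a squared-error factorization loss; the function name `balance_gap_trajectory`, the dimensions, the initialization scales, and the step size are all illustrative choices, not values from the paper. It tracks the gap between the squared Frobenius norms of the two layers, which gradient flow would keep exactly constant and which small-step gradient descent changes only at second order in the step size.

```python
import numpy as np


def balance_gap_trajectory(steps=2000, lr=1e-3, seed=0):
    """Run gradient descent on 0.5 * ||W2 @ W1 - target||_F^2 and record
    the gap ||W1||_F^2 - ||W2||_F^2 and the loss before every update.

    All sizes and scales below are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    d_in, d_hidden, d_out = 5, 4, 3
    target = rng.standard_normal((d_out, d_in))

    # Deliberately unbalanced start: W1 is larger than W2.
    W1 = 0.5 * rng.standard_normal((d_hidden, d_in))
    W2 = 0.1 * rng.standard_normal((d_out, d_hidden))

    gaps, losses = [], []
    for _ in range(steps):
        residual = W2 @ W1 - target  # gradient of the loss w.r.t. the product
        losses.append(0.5 * np.sum(residual ** 2))
        gaps.append(np.sum(W1 ** 2) - np.sum(W2 ** 2))

        grad_W1 = W2.T @ residual
        grad_W2 = residual @ W1.T
        # Simultaneous update: both gradients use the old weights.
        W1 = W1 - lr * grad_W1
        W2 = W2 - lr * grad_W2

    return np.array(gaps), np.array(losses)


if __name__ == "__main__":
    gaps, losses = balance_gap_trajectory()
    print("initial gap:", gaps[0])
    print("max drift of gap:", np.max(np.abs(gaps - gaps[0])))
    print("loss start / end:", losses[0], losses[-1])
```

With a small step size the loss falls while the gap barely moves; increasing `lr` makes the drift grow, which reflects the discretization error that the gradient descent analysis has to control.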

Abstract

We study the implicit regularization imposed by gradient descent for learning multi-layer homogeneous functions including feed-forward fully connected and convolutional deep neural networks with linear, ReLU or Leaky ReLU activation. We rigorously prove that gradient flow (i.e. gradient descent with infinitesimal step size) effectively enforces the differences between squared norms across different layers to remain invariant without any explicit regularization. This result implies that if the weights are initially small, gradient flow automatically balances the magnitudes of all layers. Using a discretization argument, we analyze gradient descent with positive step size for the non-convex low-rank asymmetric matrix factorization problem without any regularization. Inspired by our findings for gradient flow, we prove that gradient descent with step sizes $\eta_t = O\left(t^{-\left(…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Matrix Theory and Algorithms