Depth Without the Magic: Inductive Bias of Natural Gradient Descent

Anna Kerekes; Anna Mészáros; Ferenc Huszár

arXiv:2111.11542 · stat.ML · November 24, 2021

Open Access

TL;DR

This paper investigates how natural gradient descent's invariance to reparameterization affects the solutions it finds in deep linear networks and matrix factorization, revealing limitations in generalization compared to standard gradient descent.

Contribution

The paper characterizes the behavior of natural gradient flow in deep linear models and identifies conditions under which it fails to generalize, in contrast to standard gradient descent.

Findings

01. Natural gradient flow follows the same trajectory regardless of parameterization.

02. Natural gradient descent can fail to generalize in certain problems.

03. Standard gradient descent with an appropriate architecture can outperform natural gradient descent in generalization.
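
The first finding can be illustrated on a toy problem. The sketch below is not the paper's setup (which concerns deep linear networks under logistic loss and deep matrix factorization); it assumes a scalar Gaussian regression model y ~ N(w·x, 1) and compares two hypothetical parameterizations, w = θ and w = θ³. The function name `run` and all constants are illustrative assumptions. With a small step size, the natural gradient trajectories in w nearly coincide, while the plain gradient descent trajectories differ visibly.

```python
import numpy as np

def run(param, natural, steps=2000, lr=1e-3):
    """Fit a scalar weight w on toy data and return the trajectory of w.

    param:   "direct" (w = theta) or "cubic" (w = theta**3); both hypothetical.
    natural: if True, precondition the gradient by the inverse Fisher information.
    """
    x = np.array([1.0, 2.0, 3.0])
    y = 2.0 * x                                   # toy targets, true w = 2
    # Both parameterizations start at the same function, w = 0.5.
    theta = 0.5 if param == "direct" else 0.5 ** (1.0 / 3.0)
    traj = []
    for _ in range(steps):
        w = theta if param == "direct" else theta ** 3
        traj.append(w)
        jac = 1.0 if param == "direct" else 3.0 * theta ** 2   # dw/dtheta
        grad_w = np.mean((w * x - y) * x)         # gradient of 0.5*mean((w*x - y)^2) in w
        grad_theta = jac * grad_w                 # chain rule
        if natural:
            # Fisher information of N(w*x, 1) in theta: (dw/dtheta)^2 * E[x^2]
            fisher_theta = jac ** 2 * np.mean(x ** 2)
            grad_theta /= fisher_theta
        theta -= lr * grad_theta
    return np.array(traj)

# Largest gap between the two parameterizations' trajectories in w.
ngd_gap = np.max(np.abs(run("direct", True) - run("cubic", True)))
gd_gap = np.max(np.abs(run("direct", False) - run("cubic", False)))
print(f"natural gradient gap: {ngd_gap:.2e}")
print(f"plain gradient gap:   {gd_gap:.2e}")
```

The residual gap under natural gradient comes from the finite step size: the invariance is exact only for the continuous-time flow, which matches the abstract's description of natural gradient descent as approximately invariant.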

Abstract

In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories, giving rise to a surprising range of meaningful inductive biases: identifying sparse classifiers or reconstructing low-rank matrices without explicit regularization. This implicit regularization has been hypothesised to be a contributing factor to good generalization in deep learning. However, natural gradient descent is approximately invariant to reparameterization: it always follows the same trajectory and finds the same optimum. The question naturally arises: what happens if we eliminate the role of parameterization? Which solution will be found, and what new properties occur? We characterize the behaviour of natural gradient flow in deep linear networks for separable classification under logistic loss and deep matrix factorization. Some of our findings extend to…

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Neural Networks and Applications

Methods: Natural Gradient Descent