Depth Without the Magic: Inductive Bias of Natural Gradient Descent
Anna Kerekes, Anna Mészáros, Ferenc Huszár

TL;DR
This paper investigates how natural gradient descent's invariance to reparameterization affects the solutions it finds in deep linear networks and matrix factorization, revealing limitations in generalization compared to standard gradient descent.
Contribution
It characterizes the behavior of natural gradient flow in deep linear models and shows conditions where it fails to generalize, contrasting with standard gradient descent.
Findings
Natural gradient flow follows the same trajectory regardless of parameterization.
Natural gradient descent can fail to generalize in certain problems.
Standard gradient descent with appropriate architecture can outperform natural gradient in generalization.
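The first and third findings can be illustrated with a small numerical sketch. This is not the paper's code. It assumes a random positive-definite matrix `F_w` standing in for the Fisher matrix in function space, a random vector `g_w` standing in for the loss gradient, and a hypothetical elementwise factorization w = u * v as the "deep" parameterization. It compares the instantaneous velocity of w (the flow direction) under plain and natural gradient for the two parameterizations.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# Assumption: any symmetric positive-definite matrix stands in for the
# Fisher matrix expressed in function (w) space.
A = rng.normal(size=(d, d))
F_w = A @ A.T + d * np.eye(d)

# Assumption: a random vector stands in for the loss gradient w.r.t. w.
g_w = rng.normal(size=d)

def velocities(J):
    """Return the w-space velocity under plain and natural gradient flow
    for a parameterization with Jacobian J = dw/dtheta (shape d x p)."""
    g_theta = J.T @ g_w          # gradient w.r.t. theta via the chain rule
    F_theta = J.T @ F_w @ J      # Fisher matrix pulled back to theta space
    plain = -J @ g_theta
    # F_theta is rank-deficient when p > d, so use a pseudo-inverse.
    natural = -J @ np.linalg.pinv(F_theta, rcond=1e-10) @ g_theta
    return plain, natural

# Parameterization 1: theta = w (Jacobian is the identity).
J_direct = np.eye(d)

# Parameterization 2 (hypothetical): w = u * v elementwise, theta = (u, v).
u = rng.uniform(0.5, 2.0, size=d)
v = rng.uniform(0.5, 2.0, size=d)
J_factored = np.hstack([np.diag(v), np.diag(u)])

plain_direct, nat_direct = velocities(J_direct)
plain_factored, nat_factored = velocities(J_factored)

print("natural gradient velocities agree:", np.allclose(nat_direct, nat_factored))
print("plain gradient velocities agree:  ", np.allclose(plain_direct, plain_factored))
```

Under these assumptions the natural gradient velocity is the same for both parameterizations and equals `-solve(F_w, g_w)`, while the plain gradient velocity is rescaled elementwise by u² + v² in the factored case. This concerns the continuous-time flow direction only. It does not reproduce the paper's generalization results.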
Abstract
In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories, giving rise to a surprising range of meaningful inductive biases: identifying sparse classifiers or reconstructing low-rank matrices without explicit regularization. This implicit regularization has been hypothesised to be a contributing factor to good generalization in deep learning. However, natural gradient descent is approximately invariant to reparameterization: it always follows the same trajectory and finds the same optimum. The question naturally arises: what happens if we eliminate the role of parameterization? Which solution will be found, and what new properties occur? We characterize the behaviour of natural gradient flow in deep linear networks for separable classification under logistic loss and deep matrix factorization. Some of our findings extend to…
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Neural Networks and Applications
Methods: Natural Gradient Descent
