Rethinking Gauss-Newton for learning over-parameterized models
Michael Arbel, Romain Menegaux, Pierre Wolinski

TL;DR
This paper analyzes the convergence and implicit bias of Gauss-Newton (GN) optimization in over-parameterized one-hidden-layer neural networks in the mean-field regime. It reveals a trade-off between convergence speed and generalization ability.
Contribution
It provides the first global convergence analysis of Gauss-Newton in the mean-field regime and explores its implicit bias through empirical studies.
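For reference, the block below shows the standard way a one-hidden-layer network is written in the mean-field regime. The network is an average over M units, and as M → ∞ it becomes an integral against a distribution μ of neuron parameters. The activation σ and the (a, w) parameterization are illustrative assumptions; the paper's exact feature map and scaling may differ.

```latex
f(x;\theta) = \frac{1}{M}\sum_{i=1}^{M} a_i\,\sigma\!\left(w_i^\top x\right),
\qquad \theta = (a_i, w_i)_{i=1}^{M}
\quad\xrightarrow{\;M\to\infty\;}\quad
f_\mu(x) = \int a\,\sigma\!\left(w^\top x\right)\,\mathrm{d}\mu(a, w)
```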
Findings
GN converges faster than gradient descent due to better conditioning.
When started from small-variance random weights and run with a small step size, GN finds solutions that generalize well.
A trade-off exists between convergence speed and generalization in GN.
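The conditioning argument behind the first finding can be illustrated with a textbook least-squares calculation. This sketch assumes a finite-dimensional over-parameterized model whose Jacobian J has full row rank, and it uses the minimum-norm (pseudo-inverse) form of GN. It is not the paper's mean-field theorem.

```latex
r = f(\theta) - y \in \mathbb{R}^N, \qquad J = \partial r / \partial \theta, \qquad K = J J^\top

\text{GD flow: } \dot\theta = -J^\top r \;\Rightarrow\; \dot r = J\dot\theta = -K r
  \;\Rightarrow\; \frac{\mathrm{d}}{\mathrm{d}t}\,\tfrac{1}{2}\|r\|^2 = -r^\top K r \le -\lambda_{\min}(K)\,\|r\|^2

\text{GN flow: } \dot\theta = -J^\top K^{-1} r \;\Rightarrow\; \dot r = -K K^{-1} r = -r
  \;\Rightarrow\; r(t) = e^{-t}\, r(0)
```

Under GD, the guaranteed decay rate is set by λ_min(K), which can be tiny when K is ill-conditioned. The guarantee also holds only while λ_min stays bounded away from zero along the trajectory. Under GN, the Gram matrix cancels, and every residual component decays at the same rate for as long as K remains invertible. With damping, the flow becomes θ̇ = −Jᵀ(K + λI)⁻¹r. Each eigencomponent of the residual then decays at the instantaneous rate κ/(κ + λ), where κ is the corresponding eigenvalue of K.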
Abstract
This work studies the global convergence and implicit bias of Gauss-Newton (GN) when optimizing over-parameterized one-hidden-layer networks in the mean-field regime. We first establish a global convergence result for GN in the continuous-time limit, which exhibits a faster convergence rate than gradient descent (GD) due to improved conditioning. We then perform an empirical study on a synthetic regression task to investigate the implicit bias of GN. While GN is consistently faster than GD at finding a global optimum, the learned model generalizes well on test data when training starts from random initial weights with a small variance and uses a small step size to slow down convergence. Specifically, our study shows that such a setting results in a hidden learning phenomenon, where the dynamics are able to recover features with good generalization properties despite the model having sub-optimal…
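Below is a minimal NumPy sketch of the GD versus GN comparison described in the abstract, run on a toy problem. It fits a mean-field-scaled one-hidden-layer tanh network to a hypothetical 1-D target (y = sin(3x)), which is not the paper's synthetic task. It trains once with plain GD and once with a damped Gauss-Newton step.

The following are all assumptions made for illustration:
- the helper names (`init_params`, `forward_and_jacobian`, `run_gd`, `run_gn`);
- the step sizes;
- the Levenberg-Marquardt-style damping schedule that keeps GN steps safe.

The paper analyzes a continuous-time, mean-field version of GN, which this discrete sketch does not reproduce.

```python
import numpy as np


def init_params(m, d, scale, rng):
    """Hypothetical helper: random weights for m hidden units and d inputs."""
    a = scale * rng.standard_normal(m)           # output weights a_i
    W = scale * rng.standard_normal((m, d))      # hidden weights w_i
    return np.concatenate([a, W.ravel()])


def forward(theta, X, m):
    """Mean-field network f(x) = (1/m) * sum_i a_i * tanh(w_i . x)."""
    a, W = theta[:m], theta[m:].reshape(m, X.shape[1])
    return np.tanh(X @ W.T) @ a / m


def forward_and_jacobian(theta, X, m):
    n, d = X.shape
    a, W = theta[:m], theta[m:].reshape(m, d)
    h = np.tanh(X @ W.T)                                       # (n, m) hidden activations
    J_a = h / m                                                # df/da_i, shape (n, m)
    J_W = (a * (1.0 - h**2))[:, :, None] * X[:, None, :] / m   # df/dw_i, shape (n, m, d)
    J = np.concatenate([J_a, J_W.reshape(n, m * d)], axis=1)   # columns match theta's layout
    return h @ a / m, J


def loss(theta, X, y, m):
    return 0.5 * np.mean((forward(theta, X, m) - y) ** 2)


def run_gd(theta, X, y, m, lr, steps):
    for _ in range(steps):
        f, J = forward_and_jacobian(theta, X, m)
        theta = theta - lr * J.T @ (f - y) / len(y)            # gradient of the mean squared loss
    return theta


def run_gn(theta, X, y, m, lr, steps, damping=1e-3):
    """Damped GN with Levenberg-Marquardt-style step acceptance (an assumption)."""
    history = [loss(theta, X, y, m)]
    for _ in range(steps):
        f, J = forward_and_jacobian(theta, X, m)
        K = J @ J.T                                            # (n, n) Gram matrix; n < #params
        direction = J.T @ np.linalg.solve(K + damping * np.eye(len(y)), f - y)
        candidate = theta - lr * direction
        new_loss = loss(candidate, X, y, m)
        if new_loss < history[-1]:                             # accept: trust the GN model more
            theta, damping = candidate, max(damping * 0.5, 1e-8)
            history.append(new_loss)
        else:                                                  # reject: shrink toward a small gradient step
            damping = min(damping * 10.0, 1e6)
            history.append(history[-1])
    return theta, history


rng = np.random.default_rng(0)
n, m = 20, 100
x = np.linspace(-1.0, 1.0, n)
X = np.stack([x, np.ones(n)], axis=1)   # constant second input plays the role of a bias
y = np.sin(3.0 * x)                     # hypothetical target, not the paper's task

theta0 = init_params(m, X.shape[1], scale=1.0, rng=rng)
theta_gd = run_gd(theta0, X, y, m, lr=10.0, steps=100)
theta_gn, gn_history = run_gn(theta0, X, y, m, lr=1.0, steps=100)

loss0, loss_gd, loss_gn = loss(theta0, X, y, m), loss(theta_gd, X, y, m), gn_history[-1]
print(f"initial {loss0:.2e} | GD {loss_gd:.2e} | GN {loss_gn:.2e}")
```

The accept/reject rule guarantees that the recorded GN loss never increases. Because of the 1/m output scaling, the GD step size is scaled up accordingly. Lowering `scale` and `lr` would mimic the small-variance, small-step setting that the abstract associates with good generalization. However, this sketch only measures training loss and makes no claim about test error.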
Taxonomy
Topics: Neural Networks and Applications · Face and Expression Recognition · Stochastic Gradient Optimization Techniques
