Natural gradient descent with momentum
Anthony Nouy, Agustín Somacal

TL;DR
This paper introduces a natural gradient descent method with momentum to improve optimization in nonlinear models like neural networks, addressing local minima and conditioning issues.
Contribution
It proposes an inertia-based variant of natural gradient descent, combining natural gradient steps with momentum to improve the training of nonlinear approximation models.
Findings
The method improves convergence in nonlinear model training.
It addresses issues of local minima and poor conditioning.
Experimental results demonstrate enhanced optimization performance.
Abstract
We consider the problem of approximating a function by an element of a nonlinear manifold which admits a differentiable parametrization, typical examples being neural networks with differentiable activation functions or tensor networks. Natural gradient descent (NGD) for the optimization of a loss function can be seen as a preconditioned gradient descent where updates in the parameter space are driven by a functional perspective. In a spirit similar to Newton's method, an NGD step uses, instead of the Hessian, the Gram matrix of the generating system of the tangent space to the approximation manifold at the current iterate, with respect to a suitable metric. This corresponds to a locally optimal update in function space, following a projected gradient onto the tangent space to the manifold. Still, both gradient and natural gradient descent methods get stuck in local minima. Furthermore,…
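To make the abstract's description of an NGD step concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' algorithm: the two-parameter model a * exp(b * x), the target function, the discrete L2 metric on 20 sample points, the small regularization eps, and all function names are hypothetical choices. The momentum term is a generic heavy-ball update applied to the natural gradient direction, which may differ from the inertial scheme the paper proposes.

```python
import math

# Toy setup; every name below is illustrative and not taken from the paper.
XS = [i / 19 for i in range(20)]            # sample points in [0, 1]
TARGET = [2.0 * math.exp(-x) for x in XS]   # function to approximate


def model(theta, x):
    """Nonlinear model f(x; a, b) = a * exp(b * x)."""
    a, b = theta
    return a * math.exp(b * x)


def tangent_generators(theta, x):
    """Partial derivatives of the model with respect to (a, b) at x.

    These functions span the tangent space to the model manifold at theta.
    """
    a, b = theta
    e = math.exp(b * x)
    return (e, a * x * e)


def loss(theta):
    """Half mean squared error over the sample points."""
    n = len(XS)
    return 0.5 * sum((model(theta, x) - u) ** 2 for x, u in zip(XS, TARGET)) / n


def natural_direction(theta, eps=1e-10):
    """Solve G d = g, with G the Gram matrix of the tangent generators
    (discrete L2 inner product) and g the Euclidean gradient of the loss."""
    n = len(XS)
    g = [0.0, 0.0]
    G = [[0.0, 0.0], [0.0, 0.0]]
    for x, u in zip(XS, TARGET):
        phi = tangent_generators(theta, x)
        r = model(theta, x) - u
        for i in range(2):
            g[i] += r * phi[i] / n
            for j in range(2):
                G[i][j] += phi[i] * phi[j] / n
    # Assumed safeguard: tiny diagonal shift in case G is near singular.
    G[0][0] += eps
    G[1][1] += eps
    # 2x2 solve by Cramer's rule.
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    d0 = (g[0] * G[1][1] - G[0][1] * g[1]) / det
    d1 = (G[0][0] * g[1] - G[1][0] * g[0]) / det
    return (d0, d1)


def fit(theta, steps=100, lr=0.5, beta=0.0):
    """NGD with optional heavy-ball momentum (beta=0 gives plain NGD)."""
    v = (0.0, 0.0)
    for _ in range(steps):
        d = natural_direction(theta)
        v = tuple(beta * vi - lr * di for vi, di in zip(v, d))
        theta = tuple(ti + vi for ti, vi in zip(theta, v))
    return theta
```

Calling fit((1.0, 0.0)) runs plain NGD from the starting point a = 1, b = 0, and fit((1.0, 0.0), beta=0.3) adds momentum. Because the loss here is a squared L2 error and the metric is the matching L2 inner product, the natural direction coincides with a Gauss-Newton direction; other losses or metrics would change both g and G.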
