Natural gradient descent with momentum

Anthony Nouy; Agustín Somacal

arXiv:2604.15554 · cs.LG · April 20, 2026

TL;DR

This paper introduces a natural gradient descent method with momentum to improve optimization in nonlinear models like neural networks, addressing local minima and conditioning issues.

Contribution

It proposes a natural inertial approach that combines natural gradient descent with momentum techniques to improve optimization in nonlinear approximation models.
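The summary does not state the update rule. The sketch below is therefore only an assumption. It applies plain heavy-ball (Polyak) momentum to the natural-gradient direction and should not be read as the authors' scheme. The setup is also assumed: a least-squares loss, the empirical L2 metric on sample points, and a hypothetical two-parameter model `a * exp(b * x)`. The names `ngd_heavy_ball`, `lr` and `beta` are illustrative.

```python
import numpy as np

def ngd_heavy_ball(theta0, residual_and_jacobian, lr=1.0, beta=0.3, n_steps=200):
    """Heavy-ball momentum on the natural-gradient direction (illustrative sketch only).

    residual_and_jacobian(theta) returns (r, J): r = f(theta) - y at the sample points
    and J = df/dtheta there. Loss assumed: L = ||r||^2 / (2n), empirical L2 metric.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    v = np.zeros_like(theta)
    for _ in range(n_steps):
        r, J = residual_and_jacobian(theta)
        # Natural gradient G^+ grad L = pinv(J^T J) J^T r = pinv(J) r,
        # i.e. the minimum-norm least-squares solution of J d = r.
        d = np.linalg.lstsq(J, r, rcond=None)[0]
        v = beta * v + d          # accumulate past natural-gradient directions
        theta = theta - lr * v    # momentum step in parameter space
    return theta

# Hypothetical test problem: recover (a, b) in f(x) = a * exp(b * x).
x = np.linspace(0.0, 2.0, 50)
theta_true = np.array([2.0, -1.5])
y = theta_true[0] * np.exp(theta_true[1] * x)

def residual_and_jacobian(theta):
    a, b = theta
    e = np.exp(b * x)
    r = a * e - y
    J = np.stack([e, a * x * e], axis=1)   # columns: df/da, df/db
    return r, J

theta = ngd_heavy_ball([1.5, -1.0], residual_and_jacobian)
print(theta)   # close to theta_true = [2.0, -1.5]
```

With `beta=0` the loop reduces to plain natural gradient descent. For this least-squares loss and L2 metric, that coincides with a (damped) Gauss-Newton iteration.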

Findings

01

The method improves convergence in nonlinear model training.

02

It addresses issues of local minima and poor conditioning.

03

Experimental results demonstrate enhanced optimization performance.

Abstract

We consider the problem of approximating a function by an element of a nonlinear manifold that admits a differentiable parametrization, typical examples being neural networks with differentiable activation functions or tensor networks. Natural gradient descent (NGD) for the optimization of a loss function can be seen as a preconditioned gradient descent where updates in the parameter space are driven by a functional perspective. In a spirit similar to Newton's method, an NGD step uses, instead of the Hessian, the Gram matrix of the generating system of the tangent space to the approximation manifold at the current iterate, with respect to a suitable metric. This corresponds to a locally optimal update in function space, following the projection of the gradient onto the tangent space to the manifold. Still, both gradient and natural gradient descent methods get stuck in local minima. Furthermore,…
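The sketch below illustrates the NGD step described in the abstract. It rests on several assumptions: a least-squares loss, the empirical L2 inner product on sample points as the "suitable metric", and a hypothetical one-neuron model `a * tanh(w * x + c)`. The function names are illustrative. The Gram matrix `G = J^T J / n` of the tangent-space generators (the columns of `J`) replaces the Hessian. Under this inner product the functional gradient of the loss is the residual `r` itself. The check at the end confirms that the first-order change in function space equals minus the orthogonal projection of `r` onto the tangent space.

```python
import numpy as np

def model_and_jacobian(theta, x):
    # Hypothetical one-neuron model f(x) = a * tanh(w * x + c), theta = (a, w, c).
    a, w, c = theta
    t = np.tanh(w * x + c)
    s = 1.0 - t**2                      # tanh'(w * x + c)
    # Columns: df/da, df/dw, df/dc at the sample points; they generate the
    # tangent space to the model manifold at theta.
    J = np.stack([t, a * s * x, a * s], axis=1)
    return a * t, J

def natural_gradient_step(theta, x, y, lr=1.0):
    n = x.size
    f, J = model_and_jacobian(theta, x)
    r = f - y                           # residual; loss L = ||r||^2 / (2n)
    grad = J.T @ r / n                  # Euclidean gradient in parameter space
    G = J.T @ J / n                     # Gram matrix w.r.t. the empirical L2 inner product
    d = np.linalg.pinv(G) @ grad        # natural gradient: G^+ used instead of the Hessian
    return theta - lr * d, J, r

x = np.linspace(-2.0, 2.0, 40)
y = np.sin(x)                           # hypothetical target function
theta = np.array([1.0, 0.5, 0.1])
new_theta, J, r = natural_gradient_step(theta, x, y)

df = J @ (new_theta - theta)                    # first-order change of f in function space
P_r = J @ np.linalg.lstsq(J, r, rcond=None)[0]  # orthogonal projection of r onto span(J)
print(np.allclose(df, -P_r))                    # True
```

For this loss and metric the step is exactly a Gauss-Newton step. A different metric would only change how `G` (and the functional gradient) is assembled.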
