Riemannian Natural Gradient Methods
Jiang Hu, Ruicheng Ao, Anthony Man-Cho So, Minghan Yang, and Zaiwen Wen

TL;DR
This paper introduces a Riemannian natural gradient method for large-scale optimization on manifolds, establishing its convergence properties and demonstrating its effectiveness in neural network training with batch normalization.
Contribution
It extends the natural gradient method to Riemannian manifolds, providing convergence analysis and practical validation in neural network applications.
Findings
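The proposed method converges globally, almost surely, under standard assumptions.
Under certain convexity and smoothness conditions on the loss and a Riemannian Jacobian stability condition on the input-output map, it attains a local linear convergence rate, or a quadratic rate under a further Lipschitz continuity condition on the Riemannian Jacobian.
The Riemannian Jacobian stability condition holds with high probability for wide neural networks.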
Abstract
This paper studies large-scale optimization problems on Riemannian manifolds whose objective function is a finite sum of negative log-probability losses. Such problems arise in various machine learning and signal processing applications. By introducing the notion of Fisher information matrix in the manifold setting, we propose a novel Riemannian natural gradient method, which can be viewed as a natural extension of the natural gradient method from the Euclidean setting to the manifold setting. We establish the almost-sure global convergence of our proposed method under standard assumptions. Moreover, we show that if the loss function satisfies certain convexity and smoothness conditions and the input-output map satisfies a Riemannian Jacobian stability condition, then our proposed method enjoys a local linear -- or, under the Lipschitz continuity of the Riemannian Jacobian of the…
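To make the idea in the abstract concrete, here is a minimal sketch of a natural-gradient-style step on a manifold. It is an illustration under stated assumptions, not the authors' algorithm: the manifold is taken to be the unit sphere, the loss is a Gaussian negative log-likelihood (least squares), the Fisher matrix is restricted to the tangent space by projection, and the damping term, the normalization retraction, and the simple backtracking rule are all choices made here for the example. The names `rng_step` and `run` are hypothetical.

```python
import numpy as np

def loss(x, A, b):
    # Average Gaussian negative log-likelihood (up to constants), unit variance.
    r = A @ x - b
    return 0.5 * np.mean(r ** 2)

def riemannian_grad(x, A, b):
    # Euclidean gradient projected onto the tangent space of the sphere at x.
    egrad = A.T @ (A @ x - b) / len(b)
    return egrad - (x @ egrad) * x

def rng_step(x, A, b, damping=1e-3, step=1.0, max_halvings=30):
    # One damped natural-gradient-style step on the unit sphere (assumed setup).
    n, d = A.shape
    P = np.eye(d) - np.outer(x, x)       # tangent-space projector at x
    fisher = P @ (A.T @ A / n) @ P       # Fisher matrix restricted to the tangent space
    g = riemannian_grad(x, A, b)
    direction = -np.linalg.solve(fisher + damping * np.eye(d), g)
    direction = P @ direction            # remove any round-off normal component
    f0 = loss(x, A, b)
    for _ in range(max_halvings):
        y = x + step * direction
        y = y / np.linalg.norm(y)        # retraction: renormalize onto the sphere
        if loss(y, A, b) < f0:           # accept only if the loss decreases
            return y
        step *= 0.5
    return x                             # no decrease found: keep the current point

def run(num_iters=20, seed=0):
    # Synthetic data; all sizes and noise levels are arbitrary choices.
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((200, 5))
    x_true = rng.standard_normal(5)
    x_true /= np.linalg.norm(x_true)
    b = A @ x_true + 0.01 * rng.standard_normal(200)
    x = rng.standard_normal(5)
    x /= np.linalg.norm(x)
    history = [loss(x, A, b)]
    for _ in range(num_iters):
        x = rng_step(x, A, b, )
        history.append(loss(x, A, b))
    return x, history

if __name__ == "__main__":
    x, history = run()
    print("initial loss:", history[0], "final loss:", history[-1])
```

Because a step is accepted only when it lowers the loss, the recorded loss values never increase, and the renormalization keeps every iterate on the sphere. The paper's setting is more general (arbitrary Riemannian manifolds and stochastic finite-sum estimates), which this sketch does not attempt to reproduce.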
Taxonomy
Topics: 3D Shape Modeling and Analysis · Neural Networks and Applications · Advanced Numerical Analysis Techniques
Methods: Batch Normalization
