Spectral-factorized Positive-definite Curvature Learning for NN Training
Wu Lin, Felix Dangel, Runa Eschenhagen, Juhan Bae, Richard E. Turner, Roger B. Grosse

TL;DR
This paper introduces a Riemannian optimization method for neural network training that efficiently learns and applies spectral-factorized positive-definite curvature matrices, improving computational efficiency and versatility over existing methods.
Contribution
The paper proposes a spectral-factorized Riemannian approach to curvature learning in neural network training that makes computing arbitrary matrix roots efficient.
Findings
Effective curvature learning for neural networks.
Improved efficiency over traditional matrix decomposition methods.
Versatile application in covariance adaptation and gradient-free optimization.
Abstract
Many training methods, such as Adam(W) and Shampoo, learn a positive-definite curvature matrix and apply an inverse root before preconditioning. Recently, non-diagonal training methods, such as Shampoo, have gained significant attention; however, they remain computationally inefficient and are limited to specific types of curvature information due to the costly matrix root computation via matrix decomposition. To address this, we propose a Riemannian optimization approach that dynamically adapts spectral-factorized positive-definite curvature estimates, enabling the efficient application of arbitrary matrix roots and generic curvature learning. We demonstrate the efficacy and versatility of our approach in positive-definite matrix optimization and covariance adaptation for gradient-free optimization, as well as its efficiency in curvature learning for neural net training.
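The key computational idea in the abstract is this: once a curvature estimate is kept in spectral-factorized form S = Q diag(d) Q^T, with Q orthogonal and d positive, any inverse root S^(-1/p) can be applied using an elementwise power on d instead of a fresh matrix decomposition. The NumPy sketch below illustrates only that point under those assumptions. It does not implement the paper's Riemannian update for learning Q and d, and the function names are hypothetical.

```python
import numpy as np


def precondition_via_eigh(S, g, p):
    """Decomposition-based baseline: apply S^(-1/p) to g by
    eigendecomposing the SPD matrix S on every call (O(n^3) work)."""
    d, Q = np.linalg.eigh(S)
    return Q @ ((d ** (-1.0 / p)) * (Q.T @ g))


def precondition_factorized(Q, d, g, p):
    """Apply S^(-1/p) to g when S = Q diag(d) Q^T is already stored in
    factorized form (hypothetical helper). No decomposition is needed:
    only two matrix-vector products and an elementwise power on d."""
    return Q @ ((d ** (-1.0 / p)) * (Q.T @ g))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthogonal eigenbasis
    d = rng.uniform(0.5, 2.0, size=n)                 # positive eigenvalues
    S = Q @ np.diag(d) @ Q.T                          # SPD curvature estimate
    g = rng.standard_normal(n)                        # a gradient vector
    for p in (1, 2, 4):                               # any root, same cost
        diff = np.max(np.abs(precondition_factorized(Q, d, g, p)
                             - precondition_via_eigh(S, g, p)))
        print(p, diff)
```

Both functions compute the same vector. The difference is that a decomposition-based method pays for `eigh` whenever it needs the root, while a method that maintains Q and d directly only needs matrix-vector work, and switching the root order p costs nothing extra.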
Taxonomy
Topics: Medical Imaging and Analysis · Neural Networks and Applications · Domain Adaptation and Few-Shot Learning
