Eigenvalue-corrected Natural Gradient Based on a New Approximation
Kai-Xin Gao, Xiao-Lei Liu, Zheng-Hai Huang, Min Wang, Shuangling Wang, Zidong Wang, Dachuan Xu, Fan Yu

TL;DR
This paper introduces TEKFAC, a novel second-order optimization method for deep neural networks that combines eigenvalue correction, a new Fisher information matrix approximation, and damping techniques, leading to improved training performance.
Contribution
The paper proposes TEKFAC, integrating eigenvalue correction with a new Fisher approximation and damping, advancing second-order optimization methods for DNN training.
Findings
TEKFAC outperforms SGD with momentum, Adam, EKFAC, and TKFAC in experiments.
The method effectively corrects re-scaling factors in the eigenbasis.
Empirical results show improved convergence and training efficiency.
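To make the second finding concrete, here is a minimal NumPy sketch of an EKFAC-style eigenvalue correction for a single fully connected layer. It is an illustration under stated assumptions, not the authors' implementation: the layer sizes, the random stand-in data, the `precondition` helper, and the damping value are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 256, 4, 3           # hypothetical batch and layer sizes
a = rng.normal(size=(n, d_in))       # stand-in layer inputs, one row per example
g = rng.normal(size=(n, d_out))      # stand-in back-propagated output gradients

# Kronecker factors: second moments of the inputs and of the gradients.
A = a.T @ a / n
G = g.T @ g / n

# Kronecker-factored eigenbasis: eigenvectors of each factor.
eval_A, U_A = np.linalg.eigh(A)
eval_G, U_G = np.linalg.eigh(G)

# Per-example weight gradients g_i a_i^T, shape (n, d_out, d_in).
grads = np.einsum("no,np->nop", g, a)

# Project every per-example gradient into the eigenbasis: U_G^T (g_i a_i^T) U_A.
proj = np.einsum("oi,nop,pj->nij", U_G, grads, U_A)

# Corrected re-scaling factors: second moment of each projected coordinate.
s_corrected = (proj ** 2).mean(axis=0)

# Uncorrected Kronecker scaling for comparison: products of factor eigenvalues.
s_kronecker = np.outer(eval_G, eval_A)


def precondition(grad_W, scaling, damping=1e-3):
    """Rescale a (d_out, d_in) gradient coordinate-wise in the eigenbasis.

    The damping value is an arbitrary placeholder, not a recommended setting.
    """
    projected = U_G.T @ grad_W @ U_A
    return U_G @ (projected / (scaling + damping)) @ U_A.T


update = precondition(grads.mean(axis=0), s_corrected)
```

Because the change of basis is orthogonal, the corrected factors in `s_corrected` sum to the average squared Frobenius norm of the per-example gradients. The uncorrected products in `s_kronecker` sum to tr(G)·tr(A), which generally differs.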
Abstract
Second-order optimization methods for training deep neural networks (DNNs) have attracted considerable research interest. A recently proposed method, Eigenvalue-corrected Kronecker Factorization (EKFAC) (George et al., 2018), interprets the natural gradient update as a diagonal method and corrects the inaccurate re-scaling factor in the Kronecker-factored eigenbasis. Gao et al. (2020) consider a new approximation to the natural gradient, which approximates the Fisher information matrix (FIM) as a constant multiplied by the Kronecker product of two matrices and keeps the trace equal before and after the approximation. In this work, we combine the ideas of these two methods and propose Trace-restricted Eigenvalue-corrected Kronecker Factorization (TEKFAC). The proposed method not only corrects the inexact re-scaling factor under the Kronecker-factored eigenbasis, but also…
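The trace-restricted approximation described in the abstract (a constant times a Kronecker product, with the trace preserved) can be sketched for one small layer as follows. This is an assumption-laden illustration: normalizing each factor to unit trace is one simple way to meet the trace constraint and may not match the exact construction in Gao et al. (2020). The names `left`, `right`, and `scale` are hypothetical, and the data are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_in, d_out = 512, 3, 2           # hypothetical batch and layer sizes
a = rng.normal(size=(n, d_in))       # stand-in layer inputs
g = rng.normal(size=(n, d_out))      # stand-in back-propagated output gradients

# Flattened per-example weight gradients; the row-major flatten of g_i a_i^T
# matches the index ordering of np.kron(g_i, a_i).
v = np.einsum("no,np->nop", g, a).reshape(n, -1)

# Full second-moment matrix for this layer; feasible only because it is tiny.
F = v.T @ v / n

# Plain Kronecker factors.
A = a.T @ a / n
G = g.T @ g / n

# Assumed trace-restricted form: unit-trace factors times a scalar equal to tr(F).
left = G / np.trace(G)
right = A / np.trace(A)
scale = np.trace(F)
F_trace_restricted = scale * np.kron(left, right)

# Plain Kronecker approximation, whose trace is tr(G) * tr(A).
F_kronecker = np.kron(G, A)
```

Since the trace of a Kronecker product is the product of the traces, `F_trace_restricted` has exactly the trace of `F`, while `F_kronecker` does not in general.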
Taxonomy
Methods: Stochastic Gradient Descent · Adam
