A Trace-restricted Kronecker-Factored Approximation to Natural Gradient
Kai-Xin Gao, Xiao-Lei Liu, Zheng-Hai Huang, Min Wang, Zidong Wang, Dachuan Xu, Fan Yu

TL;DR
This paper introduces TKFAC, a new second-order optimization method that approximates the Fisher information matrix with trace constraints, improving training efficiency for deep neural networks.
Contribution
We propose TKFAC, a trace-restricted Kronecker-factored approximation to the Fisher matrix, with theoretical error bounds and a novel damping technique for CNNs.
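The following is a minimal NumPy sketch of a trace-restricted Kronecker-factored approximation for a single fully connected layer. It rests on assumptions that this summary does not spell out. The two Kronecker factors are taken to be the KFAC input and gradient covariances, each normalized to unit trace. The scaling coefficient is taken to be the trace of the exact block, so the approximate and exact blocks have equal traces. The names `tkfac_block` and `exact_fisher_block` are hypothetical. This is an illustration, not the authors' implementation.

```python
import numpy as np

def tkfac_block(a, g):
    """Sketch (assumed form) of a trace-restricted Kronecker-factored block.

    a: (n, d_in) layer inputs, one row per sample.
    g: (n, d_out) back-propagated gradients w.r.t. the layer outputs.
    Returns (delta, Phi, Psi) such that F_approx = delta * kron(Phi, Psi).
    """
    n = a.shape[0]
    A = a.T @ a / n          # E[a a^T], the KFAC input factor
    G = g.T @ g / n          # E[g g^T], the KFAC gradient factor
    Phi = A / np.trace(A)    # assumption: normalize each factor to unit trace
    Psi = G / np.trace(G)
    # Assumption: delta is the trace of the exact block,
    # tr(E[kron(a a^T, g g^T)]) = E[||a||^2 * ||g||^2].
    delta = np.mean(np.sum(a**2, axis=1) * np.sum(g**2, axis=1))
    return delta, Phi, Psi

def exact_fisher_block(a, g):
    # E[kron(a a^T, g g^T)]: the weight gradient is g a^T, and its column-major
    # vectorization is kron(a, g). Built per sample, so only viable for tiny layers.
    return np.mean([np.kron(np.outer(ai, ai), np.outer(gi, gi))
                    for ai, gi in zip(a, g)], axis=0)

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 3))
g = rng.standard_normal((64, 2))
delta, Phi, Psi = tkfac_block(a, g)
F_tkfac = delta * np.kron(Phi, Psi)
F_exact = exact_fisher_block(a, g)
print(np.isclose(np.trace(F_tkfac), np.trace(F_exact)))  # True
```

Because tr(kron(Phi, Psi)) = tr(Phi) * tr(Psi) = 1, the trace of the approximation reduces to the scalar coefficient. Normalizing the factors is what makes a trace constraint of this kind cheap to enforce.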
Findings
TKFAC outperforms state-of-the-art algorithms in deep network training.
Theoretical analysis provides an upper bound on approximation error.
A new damping method enhances second-order optimization stability.
Abstract
Second-order optimization methods can accelerate convergence by modifying the gradient through the curvature matrix, and there have been many attempts to apply them to training deep neural networks. Inspired by diagonal approximations and by factored approximations such as Kronecker-Factored Approximate Curvature (KFAC), in this work we propose a new approximation to the Fisher information matrix (FIM), called Trace-restricted Kronecker-factored Approximate Curvature (TKFAC), which preserves a certain trace relationship between the exact and the approximate FIM. In TKFAC, we decompose each block of the approximate FIM as a Kronecker product of two smaller matrices, scaled by a trace-related coefficient. We theoretically analyze TKFAC's approximation error and derive an upper bound on it. We also propose a new damping technique for TKFAC on…
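Modifying the gradient through a block of the form delta * kron(Phi, Psi) never requires forming or inverting the full block. For symmetric factors and column-major vectorization, (delta * kron(Phi, Psi))^(-1) vec(V) = vec(Psi^(-1) V Phi^(-1)) / delta. The sketch below assumes unit-trace factors Phi, Psi and a trace-based coefficient delta. These are assumptions about the construction, not details confirmed here. It adds plain Tikhonov damping to each factor as a generic stand-in, which is not the new damping technique the paper proposes. `tkfac_precondition` and its `damping` parameter are hypothetical names.

```python
import numpy as np

def tkfac_precondition(grad_W, delta, Phi, Psi, damping=1e-3):
    """Apply the inverse of delta * kron(Phi + damping*I, Psi + damping*I)
    to a (d_out, d_in) weight gradient without forming the Kronecker product.

    Uses vec(Psi^-1 V Phi^-1) / delta, valid for symmetric Phi and Psi.
    The Tikhonov damping is a generic assumption, not TKFAC's damping scheme.
    """
    Phi_d = Phi + damping * np.eye(Phi.shape[0])
    Psi_d = Psi + damping * np.eye(Psi.shape[0])
    right = np.linalg.solve(Phi_d, grad_W.T).T   # grad_W @ Phi_d^{-1}
    return np.linalg.solve(Psi_d, right) / delta  # Psi_d^{-1} @ (...) / delta

# Usage: build assumed factors for a 4-input, 3-output layer from random data.
rng = np.random.default_rng(1)
a = rng.standard_normal((32, 4))
g = rng.standard_normal((32, 3))
A, G = a.T @ a / 32, g.T @ g / 32
Phi, Psi = A / np.trace(A), G / np.trace(G)
delta = np.mean(np.sum(a**2, axis=1) * np.sum(g**2, axis=1))
grad_W = g.T @ a / 32                            # mean weight gradient, (3, 4)
step = tkfac_precondition(grad_W, delta, Phi, Psi)

# Dense check against the explicitly formed (damped) block.
F = delta * np.kron(Phi + 1e-3 * np.eye(4), Psi + 1e-3 * np.eye(3))
dense = np.linalg.solve(F, grad_W.flatten(order="F")).reshape(3, 4, order="F")
print(np.allclose(step, dense))  # True
```

The factor-wise solve costs two small solves of sizes d_in and d_out instead of one solve of size d_in * d_out. This saving is the usual motivation for Kronecker-factored curvature approximations.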
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Face and Expression Recognition · Blind Source Separation Techniques
