An Improved Empirical Fisher Approximation for Natural Gradient Descent
Xiaodong Wu, Wenyi Yu, Chao Zhang, Philip Woodland

TL;DR
This paper introduces an improved empirical Fisher (iEF) method for natural gradient descent that enhances approximation quality, convergence, and robustness in deep learning optimization compared to existing EF and sampled Fisher methods.
Contribution
The paper proposes iEF, a novel method that addresses EF's inversely-scaled projection issue, yielding a better Fisher approximation and improved optimization performance in deep learning.
Findings
iEF outperforms EF and sampled Fisher in approximation quality.
Applying exact iEF as an optimizer yields strong convergence and generalization.
iEF demonstrates robustness to damping choices across tasks and training stages.
Abstract
Approximate Natural Gradient Descent (NGD) methods are an important family of optimisers for deep learning models, which use approximate Fisher information matrices to pre-condition gradients during training. The empirical Fisher (EF) method approximates the Fisher information matrix empirically by reusing the per-sample gradients collected during back-propagation. Despite its ease of implementation, the EF approximation has its theoretical and practical limitations. This paper investigates the inversely-scaled projection issue of EF, which is shown to be a major cause of its poor empirical approximation quality. An improved empirical Fisher (iEF) method is proposed to address this issue, which is motivated as a generalised NGD method from a loss reduction perspective, meanwhile retaining the practical convenience of EF. The exact iEF and EF methods are experimentally evaluated using…
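To make the EF construction described in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It builds the EF matrix from per-sample gradients and uses it, with damping, to pre-condition the mean gradient. The function name `ef_update`, the toy sizes, and the random gradients are all assumptions made for illustration. The sketch covers plain EF only, since the text here does not give the iEF formula.

```python
import numpy as np


def ef_update(per_sample_grads: np.ndarray, damping: float) -> np.ndarray:
    """Return the damped EF-preconditioned direction (EF + damping * I)^{-1} g.

    per_sample_grads has shape (N, P): one gradient row per training sample.
    """
    n, p = per_sample_grads.shape
    # EF matrix: average outer product of the per-sample gradients, shape (P, P).
    ef = per_sample_grads.T @ per_sample_grads / n
    # Mean gradient over the batch, shape (P,).
    mean_grad = per_sample_grads.mean(axis=0)
    # Solve the damped linear system instead of forming an explicit inverse.
    return np.linalg.solve(ef + damping * np.eye(p), mean_grad)


# Hypothetical toy setup: 4 samples, 10 parameters, random per-sample
# gradients whose norms are deliberately made to differ across samples.
rng = np.random.default_rng(0)
scales = np.array([0.5, 1.0, 2.0, 4.0])
G = rng.standard_normal((4, 10)) * scales[:, None]

u = ef_update(G, damping=1e-8)

grad_norms = np.linalg.norm(G, axis=1)
# Inner product of the direction with each per-sample gradient
# (the first-order per-sample loss change for a unit step along -u).
inner = G @ u
# Length of the direction's projection onto each per-sample gradient direction.
projections = inner / grad_norms
```

In this over-parameterised toy case with near-zero damping, every entry of `inner` comes out close to 1, so each entry of `projections` is close to the reciprocal of the corresponding gradient norm. Samples with the smallest gradients therefore receive the largest projection, which appears to be the behaviour the abstract calls the inversely-scaled projection issue.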
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Advanced Measurement and Metrology Techniques · Optical Systems and Laser Technology
Methods: Natural Gradient Descent
