Component-Wise Natural Gradient Descent -- An Efficient Neural Network Optimization
Tran Van Sang, Mhd Irvan, Rie Shigetomi Yamaguchi, Toshiyuki Nakata

TL;DR
This paper introduces CW-NGD, an efficient second-order optimization method for neural networks. It approximates the Fisher Information Matrix in a form that is simple to invert, which leads to faster convergence.
Contribution
The paper proposes a new layer-structure-aware decomposition of the Fisher Information Matrix for efficient natural gradient descent in neural networks.
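For orientation, the standard natural-gradient update and the layer-wise block-diagonal approximation that this contribution builds on can be written as below. The notation (parameters \theta, learning rate \eta, loss \mathcal{L}, L layers, K_l components in layer l) is chosen here for illustration and is not taken from the paper.

```latex
% Natural gradient step: the gradient is preconditioned by the inverse FIM.
\theta_{t+1} = \theta_t - \eta \, F(\theta_t)^{-1} \nabla_\theta \mathcal{L}(\theta_t),
\qquad
F(\theta) = \mathbb{E}\!\left[
  \nabla_\theta \log p(y \mid x; \theta)\,
  \nabla_\theta \log p(y \mid x; \theta)^{\top}
\right]

% Layer-wise block-diagonal approximation: each block is inverted separately.
F \approx \operatorname{diag}(F_1, \dots, F_L)
\quad\Longrightarrow\quad
F^{-1} \approx \operatorname{diag}(F_1^{-1}, \dots, F_L^{-1})

% Component-wise refinement (sketch): each layer block is itself split
% into smaller diagonal blocks, one per component of the layer.
F_l \approx \operatorname{diag}(F_{l,1}, \dots, F_{l,K_l})
```

Since the inverse of a block-diagonal matrix is the block-diagonal matrix of the inverted blocks, the cost of inversion is governed by the size of the largest block rather than by the total number of parameters.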
Findings
Requires fewer iterations to converge compared to existing methods.
Effective for networks with dense and convolutional layers.
Supports practical second-order optimization in neural network training.
Abstract
Natural Gradient Descent (NGD) is a second-order neural network training method that preconditions gradient descent with the inverse of the Fisher Information Matrix (FIM). Although NGD provides an efficient preconditioner, it is impractical because inverting the FIM is computationally expensive. This paper proposes a new NGD variant named Component-Wise Natural Gradient Descent (CW-NGD). CW-NGD consists of two steps. As in several existing works, the first step treats the FIM as a block-diagonal matrix whose diagonal blocks correspond to the FIM of each layer's weights. In the second step, unique to CW-NGD, we analyze the layer's structure and further decompose the layer's FIM into smaller segments whose derivatives are approximately independent. As a result, individual layers' FIMs are approximated in a block-diagonal form that trivially…
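The two-step idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the empirical Fisher estimate (mean outer product of per-sample gradients), the damping term, the function names, and the choice to group a dense layer's weights by output unit are all assumptions made here for illustration.

```python
import numpy as np


def empirical_fisher(grads):
    """Empirical Fisher estimate: mean outer product of per-sample gradients.

    grads has shape (n_samples, n_params).
    """
    return grads.T @ grads / grads.shape[0]


def blockwise_natural_gradient(grads, groups, damping=1e-3):
    """Precondition the mean gradient with a block-diagonal Fisher approximation.

    groups is a list of index arrays that partition range(n_params); each
    group is treated as an independent block, so only small systems are solved.
    The damping value is a hypothetical default added for numerical stability.
    """
    mean_grad = grads.mean(axis=0)
    step = np.zeros_like(mean_grad)
    for idx in groups:
        # Fisher block for this component, plus damping on the diagonal.
        block = empirical_fisher(grads[:, idx]) + damping * np.eye(len(idx))
        # Solve block @ x = g instead of forming an explicit inverse.
        step[idx] = np.linalg.solve(block, mean_grad[idx])
    return step


# Toy dense layer with 3 inputs and 2 outputs: 6 weights, flattened row by row.
# Assumption for illustration: one component per output unit.
rng = np.random.default_rng(0)
per_sample_grads = rng.normal(size=(32, 6))
components = [np.arange(0, 3), np.arange(3, 6)]

step = blockwise_natural_gradient(per_sample_grads, components)
```

In this toy case, two 3x3 systems are solved instead of one 6x6 system. The saving grows quickly with layer size, because the cost of a dense solve scales cubically with the block dimension.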
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and ELM · Neural Networks and Applications
Methods: Natural Gradient Descent
