TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion
Saeed Soori, Bugra Can, Baourun Mu, Mert Gürbüzbalaban, Maryam Mehri Dehnavi

TL;DR
TENGraD introduces a time-efficient natural gradient descent method that computes Fisher block inverses exactly, achieving faster convergence and improved performance over existing NGD methods in image classification tasks.
Contribution
It presents a Fisher block inversion technique based on the Woodbury matrix identity, which computes each block inverse exactly at low cost and makes NGD more practical.
Findings
TENGraD outperforms existing NGD methods in wall-clock time.
It comes with linear convergence guarantees.
It improves accuracy and convergence speed on CIFAR and Fashion-MNIST datasets.
Abstract
This work proposes a time-efficient Natural Gradient Descent method, called TENGraD, with linear convergence guarantees. Computing the inverse of the neural network's Fisher information matrix is expensive in NGD because the Fisher matrix is large. Approximate NGD methods such as KFAC attempt to improve NGD's running time and practical application by reducing the Fisher matrix inversion cost with approximation. However, the approximations do not reduce the overall time significantly and lead to less accurate parameter updates and loss of curvature information. TENGraD improves the time efficiency of NGD by computing Fisher block inverses with a computationally efficient covariance factorization and reuse method. It computes the inverse of each block exactly using the Woodbury matrix identity to preserve curvature information while admitting (linear) fast convergence rates. Our…
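The role of the Woodbury matrix identity described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and omits TENGraD's covariance factorization and reuse. It assumes a damped Fisher block of the form `damping * I + U.T @ U / n`, where `U` is a hypothetical n x d matrix of per-sample gradients for one layer, and `woodbury_solve` and `damping` are names chosen here for illustration. When the batch size n is much smaller than the parameter count d, the exact inverse can be applied by solving only an n x n system.

```python
import numpy as np

def woodbury_solve(U, g, damping):
    """Return x solving (damping * I + U.T @ U / n) x = g exactly.

    U is an (n, d) matrix of per-sample gradients (an assumption of this
    sketch), g is a length-d gradient vector, and damping > 0.
    """
    n = U.shape[0]
    # n x n Gram matrix: much smaller than the d x d block when n << d.
    K = U @ U.T / n + damping * np.eye(n)
    # Woodbury identity:
    # (damping*I + U^T U / n)^{-1} g = (g - U^T K^{-1} U g / n) / damping
    return (g - U.T @ np.linalg.solve(K, U @ g) / n) / damping

# Compare against a direct d x d solve on random data.
rng = np.random.default_rng(0)
U = rng.standard_normal((8, 50))   # n = 8 samples, d = 50 parameters
g = rng.standard_normal(50)
x_woodbury = woodbury_solve(U, g, damping=0.1)
x_direct = np.linalg.solve(0.1 * np.eye(50) + U.T @ U / 8, g)
print(np.allclose(x_woodbury, x_direct))
```

The direct solve costs on the order of d^3 operations, while the Woodbury form costs on the order of n^2 d plus n^3, which is the source of the savings when n is small relative to d.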
Taxonomy
Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Medical Image Segmentation Techniques
Methods: Natural Gradient Descent
