Beyond the Mean: Fisher-Orthogonal Projection for Natural Gradient Descent in Large Batch Training
Yishun Lu, Wesley Armour

TL;DR
This paper introduces Fisher-Orthogonal Projection (FOP), a new method that enhances natural gradient descent at large batch sizes, leading to better scalability, convergence, and generalization in training neural networks.
Contribution
FOP restores the effectiveness of second-order natural gradient methods at large batch sizes by constructing variance-aware updates using orthogonal gradient components.
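This summary does not spell out how the orthogonal components are built, so the following is only a generic sketch of what orthogonality under a Fisher metric can look like. It assumes, hypothetically, that the variance signal is the difference between two half-batch gradients and that "orthogonal" means orthogonal under the Fisher inner product u^T F v. The matrix, the gradients, and every function name are illustrative and are not taken from the paper.

```python
def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def fisher_inner(fisher, u, v):
    """Inner product u^T F v induced by the Fisher matrix."""
    return sum(ui * wi for ui, wi in zip(u, mat_vec(fisher, v)))


def fisher_orthogonal_component(fisher, vec, ref):
    """Remove from `vec` its projection onto `ref` under the Fisher metric."""
    coeff = fisher_inner(fisher, vec, ref) / fisher_inner(fisher, ref, ref)
    return [vi - coeff * ri for vi, ri in zip(vec, ref)]


# Toy symmetric positive-definite Fisher matrix (assumed for illustration).
fisher = [[4.0, 1.0], [1.0, 2.0]]

# Hypothetical gradients from two halves of one large batch.
g1 = [1.0, 0.5]
g2 = [0.2, 1.1]

g_mean = [(a + b) / 2 for a, b in zip(g1, g2)]  # usual averaged gradient
g_diff = [a - b for a, b in zip(g1, g2)]        # assumed variance signal

# Part of the difference that carries no component along the mean
# when lengths and angles are measured with the Fisher matrix.
g_orth = fisher_orthogonal_component(fisher, g_diff, g_mean)

print("Fisher inner product with mean:", fisher_inner(fisher, g_orth, g_mean))
```

Under these assumptions, the projected component adds information the mean gradient does not already contain, which is one plausible reading of "variance-aware" here.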
Findings
FOP improves training stability and convergence speed at large batch sizes.
FOP enhances model generalization compared to standard methods.
FOP maintains curvature information with lower damping requirements.
Abstract
Modern GPUs are equipped with large amounts of high-bandwidth memory, enabling them to support mini-batch sizes of up to tens of thousands of training samples. However, most existing optimizers struggle to perform effectively at such a large batch size. As batch size increases, gradient noise decreases due to averaging over many samples, limiting the ability of first-order methods to escape sharp or suboptimal minima and reach the global minimum. Meanwhile, second-order methods like the natural gradient with Kronecker-Factored Approximate Curvature (KFAC) often require excessively high damping to remain stable at large batch sizes. This high damping effectively washes out the curvature information that gives these methods their advantage, reducing their performance to that of simple gradient descent. In this paper, we introduce Fisher-Orthogonal Projection (FOP), a novel technique that…
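The washing-out effect of high damping can be illustrated with a toy example. This is a minimal sketch, assuming a 2x2 diagonal Fisher matrix and the standard damped update (F + λI)^{-1} g. The matrix, gradient, and function names are illustrative and not from the paper. As λ grows, the update approaches g/λ, so its direction approaches that of the plain gradient.

```python
import math


def solve_2x2(a, b):
    """Solve the 2x2 linear system a x = b with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - b[0] * a[1][0]) / det
    return [x0, x1]


def damped_natural_gradient(fisher, grad, damping):
    """Return (F + damping * I)^{-1} g for a 2x2 Fisher matrix."""
    damped = [
        [fisher[0][0] + damping, fisher[0][1]],
        [fisher[1][0], fisher[1][1] + damping],
    ]
    return solve_2x2(damped, grad)


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy curvature: one steep direction and one flat direction (assumed values).
fisher = [[100.0, 0.0], [0.0, 1.0]]
grad = [1.0, 1.0]

# Alignment between the damped natural-gradient step and the raw gradient.
alignment = {}
for damping in (1e-3, 1.0, 1e3):
    step = damped_natural_gradient(fisher, grad, damping)
    alignment[damping] = cosine(step, grad)
    print(f"damping={damping:g}  cosine(step, grad)={alignment[damping]:.4f}")
```

With small damping the step is rescaled by the curvature and points away from the raw gradient. With large damping the two directions nearly coincide, which is the sense in which curvature information is lost.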
