A Computationally Efficient Sparsified Online Newton Method
Fnu Devvrit, Sai Surya Duvvuri, Rohan Anil, Vineet Gupta, Cho-Jui Hsieh, Inderjit Dhillon

TL;DR
The paper introduces SONew, a scalable, memory-efficient second-order optimization method that employs sparsity and the LogDet divergence to accelerate training of large neural networks with minimal overhead.
Contribution
SONew is a novel sparsified second-order method that achieves faster convergence and better performance on large-scale benchmarks with low computational overhead.
Findings
Up to 30% faster convergence compared to first-order optimizers.
3.4% relative improvement in validation performance.
80% relative reduction in training loss.
Abstract
Second-order methods hold significant promise for enhancing the convergence of deep neural network training; however, their large memory and computational demands have limited their practicality. Thus there is a need for scalable second-order methods that can efficiently train large models. In this paper, we introduce the Sparsified Online Newton (SONew) method, a memory-efficient second-order algorithm that yields a sparsified yet effective preconditioner. The algorithm emerges from a novel use of the LogDet matrix divergence measure; we combine it with sparsity constraints to minimize regret in the online convex optimization framework. Empirically, we test our method on large-scale benchmarks of up to 1B parameters. We achieve up to 30% faster convergence, 3.4% relative improvement in validation performance, and 80% relative improvement in training loss, in comparison to memory…
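The abstract names the LogDet matrix divergence without defining it. As a minimal sketch, assuming the standard definition for symmetric positive definite n × n matrices, D(X, Y) = tr(X Y⁻¹) − log det(X Y⁻¹) − n, the quantity can be computed as below. The function name `logdet_divergence` is illustrative and does not come from the paper, and this is not the SONew update itself, only the divergence it builds on.

```python
import numpy as np

def logdet_divergence(X, Y):
    """LogDet divergence D(X, Y) = tr(X Y^-1) - log det(X Y^-1) - n.

    Assumes X and Y are symmetric positive definite matrices of equal shape.
    The name and signature are hypothetical, chosen for this illustration.
    """
    n = X.shape[0]
    # tr(X Y^-1) equals tr(Y^-1 X); solve a linear system instead of inverting Y.
    trace_term = np.trace(np.linalg.solve(Y, X))
    # log det(X Y^-1) = log det X - log det Y; slogdet avoids overflow.
    _, logdet_x = np.linalg.slogdet(X)
    _, logdet_y = np.linalg.slogdet(Y)
    return trace_term - (logdet_x - logdet_y) - n
```

The divergence is zero when X equals Y and positive otherwise, which is what makes it usable as a closeness measure between a candidate preconditioner and a reference matrix.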
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM
