A Computationally Efficient Sparsified Online Newton Method

Fnu Devvrit; Sai Surya Duvvuri; Rohan Anil; Vineet Gupta; Cho-Jui Hsieh; Inderjit Dhillon

arXiv:2311.10085·cs.LG·November 17, 2023·1 cites

Open Access

TL;DR

The paper introduces SONew, a scalable, memory-efficient second-order optimization method that employs sparsity and the LogDet divergence to accelerate training of large neural networks with minimal overhead.

Contribution

SONew is a novel sparsified second-order method that converges faster and performs better on large-scale benchmarks (up to 1B parameters) while adding little computational overhead.

Findings

01  Up to 30% faster convergence compared to first-order optimizers.

02  3.4% relative improvement in validation performance.

03  80% relative reduction in training loss.

Abstract

Second-order methods hold significant promise for enhancing the convergence of deep neural network training; however, their large memory and computational demands have limited their practicality. Thus, there is a need for scalable second-order methods that can efficiently train large models. In this paper, we introduce the Sparsified Online Newton (SONew) method, a memory-efficient second-order algorithm that yields a sparsified yet effective preconditioner. The algorithm emerges from a novel use of the LogDet matrix divergence measure; we combine it with sparsity constraints to minimize regret in the online convex optimization framework. Empirically, we test our method on large-scale benchmarks of up to 1B parameters. We achieve up to 30% faster convergence, 3.4% relative improvement in validation performance, and 80% relative improvement in training loss, in comparison to memory…
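
The abstract names the LogDet matrix divergence without defining it. As an illustration only, the standard definition for n x n symmetric positive definite matrices X and Y is D(X, Y) = tr(X Y^-1) - log det(X Y^-1) - n, which is zero exactly when X = Y. The sketch below computes that quantity with NumPy; it is not the SONew algorithm, and the function names `logdet_divergence` and `logdet_divergence_diag` are hypothetical, chosen for this example.

```python
import numpy as np


def logdet_divergence(X, Y):
    """LogDet divergence tr(X Y^-1) - log det(X Y^-1) - n for SPD matrices X, Y."""
    n = X.shape[0]
    # Solve Y Z = X rather than inverting Y; tr(Y^-1 X) equals tr(X Y^-1).
    Z = np.linalg.solve(Y, X)
    sign_x, logdet_x = np.linalg.slogdet(X)
    sign_y, logdet_y = np.linalg.slogdet(Y)
    # Cheap sanity check: a positive determinant is necessary (not sufficient)
    # for positive definiteness.
    if sign_x <= 0 or sign_y <= 0:
        raise ValueError("both matrices must be positive definite")
    # log det(X Y^-1) = log det(X) - log det(Y)
    return float(np.trace(Z) - (logdet_x - logdet_y) - n)


def logdet_divergence_diag(x, y):
    """Same divergence when X = diag(x) and Y = diag(y), with x, y > 0 elementwise."""
    ratio = np.asarray(x, dtype=float) / np.asarray(y, dtype=float)
    # For diagonal matrices the trace and log-determinant split per coordinate.
    return float(np.sum(ratio - np.log(ratio) - 1.0))


if __name__ == "__main__":
    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 2.0, 1.0])
    print(logdet_divergence(np.diag(x), np.diag(y)))
    print(logdet_divergence_diag(x, y))
```

The diagonal variant hints at why a sparsity constraint can matter for cost: when both arguments are diagonal, the divergence needs only O(n) work instead of a dense solve. How SONew itself exploits sparsity is described in the paper, not in this sketch.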

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM