Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update

Michał Dereziński; Jonathan Lacotte; Mert Pilanci; Michael W. Mahoney

arXiv:2107.07480 · math.OC · July 16, 2021 · 6 cites

TL;DR

Newton-LESS introduces a sparsification technique for sketching matrices in second-order optimization, significantly reducing computational costs while maintaining near-optimal convergence guarantees.

Contribution

The paper proposes Newton-LESS, a sparsified sketching method that retains the convergence properties of dense Gaussian sketches, enabling more efficient second-order optimization.

Findings

01 Newton-LESS achieves convergence rates similar to those of dense Gaussian sketches.

02 Sparsified embeddings substantially reduce the computational cost of sketching.

03 The method performs well in numerical experiments.

Abstract

In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration. Randomized sketching has emerged as a powerful technique for constructing estimates of the Hessian which can be used to perform approximate Newton steps. This involves multiplication by a random sketching matrix, which introduces a trade-off between the computational cost of sketching and the convergence rate of the optimization algorithm. A theoretically desirable but practically much too expensive choice is to use a dense Gaussian sketching matrix, which produces unbiased estimates of the exact Newton step and which offers strong problem-independent convergence guarantees. We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching, without substantially affecting its convergence…
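To make the sketching step in the abstract concrete, the following is a minimal sketch of a sketched Newton iteration on a least-squares objective f(x) = 0.5 * ||Ax - b||^2, whose exact Hessian is A^T A. It rests on several assumptions that are not taken from the paper: the function names `sparse_sign_sketch` and `sketched_newton_least_squares` are hypothetical, the sparse sketch uses uniformly sampled positions with random signs as a stand-in for the paper's LESS construction, and a unit step size is used for simplicity.

```python
import numpy as np


def sparse_sign_sketch(A, m, s, rng):
    """Return S @ A for a random m x n sketch S with s nonzeros per row.

    Assumption: nonzero positions are drawn uniformly at random (with
    replacement) and carry random signs. This is an illustrative stand-in,
    not the paper's exact LESS construction.
    """
    n = A.shape[0]
    idx = rng.integers(0, n, size=(m, s))         # nonzero positions per row
    signs = rng.choice([-1.0, 1.0], size=(m, s))  # random +/-1 entries
    scale = np.sqrt(n / (m * s))                  # scaling so E[S^T S] = I
    # Row i of S @ A is a signed sum of s rows of A, so forming the whole
    # product costs O(m * s * d) instead of O(m * n * d) for a dense S.
    return scale * np.einsum("ms,msd->md", signs, A[idx])


def sketched_newton_least_squares(A, b, m, s, iters, rng):
    """Approximate Newton iterations using a sketched Hessian estimate."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)               # exact gradient
        SA = sparse_sign_sketch(A, m, s, rng)  # fresh sketch every iteration
        H_hat = SA.T @ SA                      # estimate of the Hessian A^T A
        x = x - np.linalg.solve(H_hat, grad)   # approximate Newton step
    return x
```

Calling `sketched_newton_least_squares(A, b, m=200, s=10, iters=10, rng=np.random.default_rng(0))` on a tall matrix `A` with 10 columns would run ten approximate Newton steps. The sketch size `m` controls the trade-off described above: a larger `m` gives a more accurate Hessian estimate at a higher per-iteration cost, while the number of nonzeros `s` per row governs how cheap the sketch itself is to apply.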

Code & Models

Repositories

lessketching/newtonsketch · PyTorch · Official

Videos

Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update · SlidesLive

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Tensor decomposition and applications