Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update
Michał Dereziński, Jonathan Lacotte, Mert Pilanci, Michael W. Mahoney

TL;DR
Newton-LESS introduces a sparsification technique for sketching matrices in second-order optimization, significantly reducing computational costs while maintaining near-optimal convergence guarantees.
Contribution
The paper proposes Newton-LESS, a sparsified sketching method that retains the convergence properties of dense Gaussian sketches, enabling more efficient second-order optimization.
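For reference, the sketched Newton update can be written in the standard Newton-sketch form below. The notation ($A_t$, $S$, $m$, $\mu_t$) is ours and may differ from the paper's. It assumes the Hessian factors as $\nabla^2 f(x_t) = A_t^\top A_t$, where $A_t \in \mathbb{R}^{n \times d}$ has full column rank, and that $S \in \mathbb{R}^{m \times n}$ is a random sketching matrix.

```latex
% Sketched Newton step: the exact Hessian A_t^T A_t is replaced by A_t^T S^T S A_t
x_{t+1} = x_t - \mu_t \left( A_t^\top S^\top S A_t \right)^{-1} \nabla f(x_t)

% Dense Gaussian sketch: S_{ij} \sim \mathcal{N}(0, 1/m) i.i.d., with m > d + 1.
% Then A_t^T S^T S A_t is a scaled Wishart matrix, and the inverse-Wishart mean gives
\mathbb{E}\left[ \left( A_t^\top S^\top S A_t \right)^{-1} \right]
  = \frac{m}{m - d - 1} \left( A_t^\top A_t \right)^{-1}

% so the step size \mu_t = (m - d - 1)/m makes the sketched step unbiased:
\mathbb{E}\left[ x_t - x_{t+1} \right] = \left( A_t^\top A_t \right)^{-1} \nabla f(x_t)
```

For a sparsified sketch this identity is not exact; the paper's claim is that convergence nonetheless stays close to that of the dense Gaussian case.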
Findings
Newton-LESS achieves convergence rates similar to those of dense Gaussian sketches.
Sparsified embeddings substantially reduce the computational cost of sketching.
The method performs well in numerical experiments.
Abstract
In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration. Randomized sketching has emerged as a powerful technique for constructing estimates of the Hessian which can be used to perform approximate Newton steps. This involves multiplication by a random sketching matrix, which introduces a trade-off between the computational cost of sketching and the convergence rate of the optimization algorithm. A theoretically desirable but practically much too expensive choice is to use a dense Gaussian sketching matrix, which produces unbiased estimates of the exact Newton step and which offers strong problem-independent convergence guarantees. We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching, without substantially affecting its convergence…
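The sketch-and-solve loop described in the abstract can be illustrated with a minimal pure-Python example. It assumes the least-squares objective f(x) = ½‖Ax − b‖², whose Hessian AᵀA has the convenient square root A. The sketching matrix is a uniformly sparsified ±1 matrix with s nonzeros per row, scaled so that E[SᵀS] = I. This is a simplified stand-in for illustration and not the paper's leverage-score-based LESS construction. All function names (`sparse_sign_sketch`, `apply_sketch`, `sketched_newton`, and so on) are hypothetical.

```python
import math
import random


def sparse_sign_sketch(m, n, s, rng):
    """Hypothetical sparsified sketch: m rows, each with s nonzeros.

    Positions are drawn uniformly (not by leverage scores) and values are
    random signs scaled by sqrt(n / (m * s)), so that E[S^T S] = I.
    Each row is stored as a list of (column, value) pairs.
    """
    scale = math.sqrt(n / (m * s))
    rows = []
    for _ in range(m):
        cols = rng.sample(range(n), s)  # s distinct column positions
        rows.append([(j, scale * rng.choice((-1.0, 1.0))) for j in cols])
    return rows


def apply_sketch(rows, A):
    """Compute S @ A in O(m * s * d) time by visiting only the nonzeros of S."""
    d = len(A[0])
    SA = []
    for row in rows:
        out = [0.0] * d
        for j, v in row:
            for k in range(d):
                out[k] += v * A[j][k]
        SA.append(out)
    return SA


def gram(M):
    """Return M^T M for a matrix stored as a list of rows."""
    d = len(M[0])
    return [[sum(r[a] * r[b] for r in M) for b in range(d)] for a in range(d)]


def solve(H, g):
    """Solve H x = g by Gaussian elimination with partial pivoting."""
    d = len(g)
    M = [H[i][:] + [g[i]] for i in range(d)]  # augmented matrix [H | g]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, d):
            f = M[r][c] / M[c][c]
            for k in range(c, d + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * d
    for r in range(d - 1, -1, -1):  # back substitution
        x[r] = (M[r][d] - sum(M[r][k] * x[k] for k in range(r + 1, d))) / M[r][r]
    return x


def gradient(A, b, x):
    """Gradient A^T (A x - b) of f(x) = 0.5 * ||A x - b||^2."""
    resid = [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]
    return [sum(A[i][k] * resid[i] for i in range(len(A))) for k in range(len(x))]


def sketched_newton(A, b, m, s, iters, seed=0):
    """Newton sketch for least squares, drawing a fresh sparse sketch each step."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    for _ in range(iters):
        g = gradient(A, b, x)
        SA = apply_sketch(sparse_sign_sketch(m, n, s, rng), A)
        # Hessian estimate (SA)^T (SA) replaces the exact A^T A; unit step size.
        step = solve(gram(SA), g)
        x = [xi - si for xi, si in zip(x, step)]
    return x


if __name__ == "__main__":
    rng = random.Random(1)
    n, d = 400, 5
    A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    x_true = [1.0, -2.0, 0.5, 3.0, -1.0]
    b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
    x_hat = sketched_newton(A, b, m=100, s=8, iters=20)
    print(max(abs(u - v) for u, v in zip(x_hat, x_true)))
```

Applying S touches only m·s rows of A, so forming SA costs O(msd) instead of the O(mnd) needed for a dense Gaussian S. The unit step size is an assumption that behaves well when m is comfortably larger than d. Drawing a fresh sketch at every iteration follows the usual Newton-sketch setup.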
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Tensor decomposition and applications
