Fast Unconstrained Optimization via Hessian Averaging and Adaptive Gradient Sampling Methods
Thomas O'Leary-Roseberry, Raghu Bollapragada

TL;DR
This paper introduces Hessian-averaged methods with adaptive sampling for efficient unconstrained optimization, achieving improved convergence rates and practical scalability for both finite-sum and expectation problems, including deep learning tasks.
Contribution
It develops Hessian-averaged algorithms that incorporate gradient inexactness and adaptive sampling, providing new convergence guarantees and scalable variants like the diagonally-averaged Newton method.
Findings
Achieves a local superlinear convergence rate of O(1/k) for strongly convex functions.
Provides global linear and sublinear convergence rates for finite-sum problems.
Demonstrates state-of-the-art performance on CIFAR100 classification with ResNets.
Abstract
We consider minimizing finite-sum and expectation objective functions via Hessian-averaging based subsampled Newton methods. These methods allow for gradient inexactness and have fixed per-iteration Hessian approximation costs. The recent work (Na et al. 2023) demonstrated that Hessian averaging can be utilized to achieve fast local superlinear convergence for strongly convex functions in high probability, while maintaining fixed per-iteration Hessian costs. These methods, however, require gradient exactness and strong convexity, which poses challenges for their practical implementation. To address this concern we consider Hessian-averaged methods that allow gradient inexactness via norm condition based adaptive-sampling strategies. For the finite-sum problem we utilize deterministic sampling techniques which lead to global linear and…
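The two ingredients named in the abstract, a running average of fixed-cost subsampled Hessians and a norm-condition test that grows the gradient sample only when needed, can be sketched as follows. This is a minimal illustration on a ridge-regularized least-squares finite sum, not the authors' algorithm: the uniform averaging weights, the variance-based norm test, the batch-doubling rule, the unit step size, and every name and default (`hessian_averaged_newton`, `hess_batch`, `grad_batch`, `theta`) are assumptions made for the example.

```python
import numpy as np

def hessian_averaged_newton(A, b, lam=1e-2, hess_batch=64, grad_batch=16,
                            theta=0.5, iters=60, seed=0):
    """Sketch: minimize (1/2n) * sum_i (a_i^T w - b_i)^2 + (lam/2) * ||w||^2.

    All parameter names and defaults are hypothetical, chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    grad_batch = min(grad_batch, n)
    w = np.zeros(d)
    H_avg = np.zeros((d, d))
    for k in range(iters):
        # Fixed-cost subsampled Hessian, folded into a uniform running average
        # (assumed weighting: every past Hessian estimate counts equally).
        idx = rng.choice(n, size=min(hess_batch, n), replace=False)
        H_k = A[idx].T @ A[idx] / len(idx) + lam * np.eye(d)
        H_avg += (H_k - H_avg) / (k + 1)

        # Adaptive gradient sampling: double the batch until the estimated
        # variance of the sample-mean gradient is small relative to its norm.
        while True:
            idx = rng.choice(n, size=grad_batch, replace=False)
            residual = A[idx] @ w - b[idx]
            G = A[idx] * residual[:, None] + lam * w  # per-sample gradients
            g = G.mean(axis=0)
            if grad_batch >= n:
                break  # full batch: the gradient is exact
            variance = G.var(axis=0, ddof=1).sum()
            if variance / grad_batch <= theta**2 * (g @ g):
                break  # norm test passed
            grad_batch = min(n, 2 * grad_batch)

        # Newton-type step with the averaged Hessian (unit step, assumed).
        w = w - np.linalg.solve(H_avg, g)
    return w, grad_batch

# Small synthetic problem to exercise the sketch.
rng = np.random.default_rng(1)
A = rng.standard_normal((400, 5))
w_true = rng.standard_normal(5)
b = A @ w_true + 0.1 * rng.standard_normal(400)

w_hat, final_batch = hessian_averaged_newton(A, b)
w_star = np.linalg.solve(A.T @ A / 400 + 1e-2 * np.eye(5), A.T @ b / 400)
print(np.linalg.norm(w_hat - w_star), final_batch)
```

The per-iteration Hessian cost stays fixed at `hess_batch` samples, while the averaged matrix becomes a steadily better estimate of the full Hessian. The gradient batch grows only when the norm test fails, which for a finite sum caps out at the full data set.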
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Image and Signal Denoising Methods · Stochastic Gradient Optimization Techniques
