Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering
Ricky T. Q. Chen, Dami Choi, Lukas Balles, David Duvenaud, Philipp Hennig

TL;DR
This paper introduces a curvature-aware, self-tuning stochastic optimizer that leverages Hessian-vector products to adaptively improve gradient estimates, aiming to reduce hyperparameter tuning and enhance convergence.
Contribution
It proposes a novel optimizer using exact Hessian-vector products for curvature correction, enabling hyperparameter-free, noise-adaptive gradient updates with proven convergence in quadratic settings.
Findings
Matches performance of well-tuned optimizers in deep learning tasks
Converges in noisy quadratic settings
Provides a step towards self-tuning optimization algorithms
Abstract
Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities such as the curvature can help de-sensitize common hyperparameters. Based on this intuition, we explore the use of exact per-sample Hessian-vector products and gradients to construct optimizers that are self-tuning and hyperparameter-free. Based on a dynamics model of the gradient, we derive a process which leads to a curvature-corrected, noise-adaptive online gradient estimate. The smoothness of our updates makes them more amenable to simple step size selection schemes, which we also base on our estimated quantities. We prove that our model-based procedure converges in the noisy quadratic setting. Though we do not see similar gains in deep learning tasks, we can match the performance of well-tuned optimizers…
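To make the idea of a curvature-corrected, noise-adaptive gradient estimate concrete, here is a minimal scalar sketch on a noisy quadratic. It is an illustration under stated assumptions, not the paper's algorithm: the function name `run_filtered_descent`, the known constant curvature `h`, the known noise variance, the vague prior, and the fixed step size `0.5 / h` are all hypothetical choices made for this example. The previous gradient estimate is shifted by the curvature times the last parameter step (the one-dimensional analogue of a Hessian-vector product), then blended with the new noisy gradient using a Kalman-style gain.

```python
import random


def run_filtered_descent(h=2.0, noise_std=1.0, steps=200, x0=5.0, seed=0):
    """Minimize f(x) = 0.5 * h * x**2 from noisy gradients using a filtered estimate.

    All parameters are illustrative assumptions; the curvature h and the noise
    variance are taken as known here, which a real optimizer would not have.
    """
    rng = random.Random(seed)
    obs_var = noise_std ** 2      # variance of the gradient noise (assumed known)
    x = x0
    mean, var = 0.0, 1e6          # vague prior over the true gradient
    delta = 0.0                   # last parameter step
    sq_err_filtered = 0.0
    sq_err_raw = 0.0

    for _ in range(steps):
        true_grad = h * x
        noisy_grad = true_grad + rng.gauss(0.0, noise_std)

        # Predict: move the old estimate along the step using curvature.
        # h * delta plays the role of a Hessian-vector product in 1D.
        mean_pred = mean + h * delta
        var_pred = var            # no process noise: the model is exact for a quadratic

        # Update: blend prediction and observation according to their variances.
        gain = var_pred / (var_pred + obs_var)
        mean = mean_pred + gain * (noisy_grad - mean_pred)
        var = (1.0 - gain) * var_pred

        sq_err_filtered += (mean - true_grad) ** 2
        sq_err_raw += (noisy_grad - true_grad) ** 2

        # Descend along the filtered gradient with a fixed, hand-picked step size.
        delta = -(0.5 / h) * mean
        x += delta

    return x, sq_err_filtered / steps, sq_err_raw / steps


if __name__ == "__main__":
    x_final, mse_filtered, mse_raw = run_filtered_descent()
    print("final x:", x_final)
    print("mean squared gradient error, filtered:", mse_filtered)
    print("mean squared gradient error, raw:", mse_raw)
```

Because the curvature correction is exact on a quadratic, the error of the filtered estimate behaves like a running average of the noise and shrinks over time, while the raw mini-batch gradient keeps a constant noise level. In a non-quadratic problem the prediction step would be only approximate, which is one reason to treat this as a sketch rather than a recipe.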
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Advanced Neural Network Applications
