Efficient Curvature-Aware Hypergradient Approximation for Bilevel Optimization
Youran Dong, Junfeng Yang, Wei Yao, Jin Zhang

TL;DR
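This paper proposes a computationally efficient way to incorporate curvature information into hypergradient approximation for bilevel optimization. The resulting algorithmic framework has convergence rate guarantees in both deterministic and stochastic settings, and it improves computational complexity over popular gradient-based methods in the deterministic setting.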
Contribution
It presents a curvature-aware hypergradient approximation technique, together with an algorithmic framework built on it, and proves convergence rate guarantees for that framework.
Findings
Convergence rate guarantees in both deterministic and stochastic bilevel optimization.
Improved computational complexity over popular gradient-based methods in the deterministic setting.
Numerical experiments show significant practical performance improvements.
Abstract
Bilevel optimization is a powerful tool for many machine learning problems, such as hyperparameter optimization and meta-learning. Estimating hypergradients (also known as implicit gradients) is crucial for developing gradient-based methods for bilevel optimization. In this work, we propose a computationally efficient technique for incorporating curvature information into the approximation of hypergradients and present a novel algorithmic framework based on the resulting enhanced hypergradient computation. We provide convergence rate guarantees for the proposed framework in both deterministic and stochastic scenarios, particularly showing improved computational complexity over popular gradient-based methods in the deterministic setting. This improvement in complexity arises from a careful exploitation of the hypergradient structure and the inexact Newton method. In addition to the…
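To make the term "hypergradient" concrete, the sketch below computes it for a toy quadratic bilevel problem using the standard implicit-function formula, grad F(x) = grad_x f - (d2g/dxdy) (d2g/dy2)^{-1} grad_y f, evaluated at the lower-level solution y*(x). This is not the curvature-aware scheme proposed in the paper; it is only a minimal illustration of the quantity that such methods approximate. The problem data (`A`, `B`, `t`, `lam`) and all function names are assumptions made for this example.

```python
import numpy as np

# Toy bilevel problem (all names and data here are illustrative assumptions):
#   lower level: g(x, y) = 0.5 * y^T A y - y^T B x, with A symmetric positive definite
#   upper level: f(x, y) = 0.5 * ||y - t||^2 + 0.5 * lam * ||x||^2
# The lower-level solution is y*(x) = A^{-1} B x.

def lower_solution(x, A, B):
    """Solve the lower-level problem exactly: A y = B x."""
    return np.linalg.solve(A, B @ x)

def hypergradient(x, A, B, t, lam):
    """Implicit-function hypergradient of F(x) = f(x, y*(x))."""
    y_star = lower_solution(x, A, B)
    grad_x_f = lam * x                    # partial gradient of f w.r.t. x
    grad_y_f = y_star - t                 # partial gradient of f w.r.t. y
    v = np.linalg.solve(A, grad_y_f)      # Hessian-inverse-vector product
    cross = -B.T                          # mixed second derivative of g (m x n)
    return grad_x_f - cross @ v

def upper_objective(x, A, B, t, lam):
    """F(x) = f(x, y*(x)), used only to check the hypergradient."""
    y_star = lower_solution(x, A, B)
    return 0.5 * np.sum((y_star - t) ** 2) + 0.5 * lam * np.sum(x ** 2)

def finite_difference_gradient(x, A, B, t, lam, eps=1e-6):
    """Central finite differences of F, as an independent reference."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        f_plus = upper_objective(x + step, A, B, t, lam)
        f_minus = upper_objective(x - step, A, B, t, lam)
        grad[i] = (f_plus - f_minus) / (2.0 * eps)
    return grad
```

In this toy case the lower-level Hessian `A` is small and the linear system is solved exactly. In practical bilevel problems that solve is the expensive step, which is why approximation schemes for it, such as the one the abstract describes, matter.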
Taxonomy
Topics: Stochastic processes and financial applications · Optimization and Variational Analysis · Advanced Optimization Algorithms Research
