Convergence Properties of Stochastic Hypergradients
Riccardo Grazzi, Massimiliano Pontil, Saverio Salzo

TL;DR
This paper analyzes stochastic approximation schemes for the hypergradient in bilevel optimization. It proves bounds on the mean square error of the approximation and demonstrates practical benefits on large-scale machine learning tasks.
Contribution
It introduces a stochastic variant of approximate implicit differentiation for computing hypergradients. It proves a mean square error bound that is agnostic to the choice of the stochastic solvers, and it validates the approach with numerical experiments.
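For readers unfamiliar with approximate implicit differentiation, the identity it rests on can be sketched as follows. The notation is ours and may differ from the paper's. We assume the lower-level solution w(λ) is the fixed point of a map Φ(·, λ) that is a contraction, and we write E for the upper-level objective. The subscripts 1 and 2 denote derivatives with respect to the first and second argument.

```latex
\begin{aligned}
&\min_{\lambda}\; f(\lambda) := E\big(w(\lambda), \lambda\big),
  \qquad w(\lambda) = \Phi\big(w(\lambda), \lambda\big),\\
&\nabla f(\lambda) = \nabla_2 E\big(w(\lambda), \lambda\big)
  + \partial_2 \Phi\big(w(\lambda), \lambda\big)^{\top} v(\lambda),\\
&v(\lambda) = \partial_1 \Phi\big(w(\lambda), \lambda\big)^{\top} v(\lambda)
  + \nabla_1 E\big(w(\lambda), \lambda\big).
\end{aligned}
```

The second line follows from the implicit function theorem: w'(λ) = (I − ∂₁Φ)⁻¹ ∂₂Φ. When Φ(·, λ) is a contraction, ∂₁Φ has norm below 1, so the equation for v(λ) is itself a contractive fixed-point problem. In a stochastic variant, Φ is only available through random estimates, and both w(λ) and v(λ) are approximated by stochastic solvers before being plugged into the formula for ∇f(λ).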
Findings
The proposed method approximates the hypergradient with bounded mean square error.
Stochastic hypergradients outperform deterministic methods in large-scale settings.
Numerical results confirm theoretical error bounds and practical efficiency.
Abstract
Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step to tackle these problems is the efficient computation of the gradient of the upper-level objective (hypergradient). In this work, we study stochastic approximation schemes for the hypergradient, which are important when the lower-level problem is empirical risk minimization on a large dataset. The method that we propose is a stochastic variant of the approximate implicit differentiation approach in (Pedregosa, 2016). We provide bounds for the mean square error of the hypergradient approximation, under the assumption that the lower-level problem is accessible only through a stochastic mapping which is a contraction in expectation. In particular, our main bound is agnostic to the choice of the two stochastic…
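A minimal sketch of a stochastic implicit-differentiation scheme is given below. It rests on several assumptions of our own, not the paper's:

- The lower-level problem is ridge regression with a scalar weight `lam`.
- The upper-level objective is a validation squared loss.
- The fixed-point map is one gradient step, Φ(w, λ) = w − η∇_w L(w, λ).
- Both the lower-level solver and the solver for the linear system in v are plain minibatch fixed-point iterations with decreasing weights.

The function names `make_data`, `exact_hypergradient` and `stochastic_aid`, and all parameter values, are hypothetical. The closed-form hypergradient is there only so the stochastic estimate can be checked.

```python
import numpy as np


def make_data(n, m, d, noise, rng):
    """Toy data y = X @ w_true + noise; w_true = ones is an illustrative assumption."""
    w_true = np.ones(d)
    X = rng.standard_normal((n, d))
    y = X @ w_true + noise * rng.standard_normal(n)
    X_val = rng.standard_normal((m, d))
    y_val = X_val @ w_true + noise * rng.standard_normal(m)
    return X, y, X_val, y_val


def exact_hypergradient(lam, X, y, X_val, y_val):
    """Closed-form d/d(lam) of the validation loss at the ridge solution w(lam)."""
    n, d = X.shape
    A = X.T @ X / n + lam * np.eye(d)                 # Hessian of the lower-level loss
    w = np.linalg.solve(A, X.T @ y / n)               # ridge solution w(lam)
    g = X_val.T @ (X_val @ w - y_val) / len(y_val)    # gradient of E(w) = mean sq. error / 2
    # Implicit function theorem: dw/dlam = -A^{-1} w, hence dE/dlam = -w^T A^{-1} g
    return -w @ np.linalg.solve(A, g)


def stochastic_aid(lam, X, y, X_val, y_val, batch, steps, k0, rng):
    """Hypothetical stochastic AID sketch: minibatch fixed-point solvers for w and v."""
    n, d = X.shape
    # Step of Phi(w) = w - eta * grad_w L(w, lam). With this eta every minibatch map
    # is a contraction with constant 1 - eta * lam (stronger than "in expectation").
    eta = 1.0 / (np.max(np.sum(X**2, axis=1)) + lam)

    def minibatch():
        idx = rng.choice(n, size=batch, replace=False)
        return X[idx], y[idx]

    # Stage 1: w_{k+1} = (1 - gamma_k) w_k + gamma_k * Phi_hat(w_k)
    w = np.zeros(d)
    for k in range(steps):
        Xb, yb = minibatch()
        gamma = 1.0 / (1.0 + k / k0)                  # decreasing averaging weights
        grad = Xb.T @ (Xb @ w - yb) / batch + lam * w
        w = w - gamma * eta * grad

    # Stage 2: solve v = dPhi/dw^T v + grad_w E(w) with the same kind of iteration
    g = X_val.T @ (X_val @ w - y_val) / len(y_val)
    v = np.zeros(d)
    for k in range(steps):
        Xb, _ = minibatch()
        gamma = 1.0 / (1.0 + k / k0)
        jvp = v - eta * (Xb.T @ (Xb @ v) / batch + lam * v)   # minibatch dPhi/dw^T v
        v = (1.0 - gamma) * v + gamma * (jvp + g)

    # grad_lam E is zero here; dPhi/dlam = -eta * w, so the hypergradient is -eta * w^T v
    return -eta * (w @ v)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y, X_val, y_val = make_data(n=500, m=200, d=5, noise=0.1, rng=rng)
    lam = 0.1
    exact = exact_hypergradient(lam, X, y, X_val, y_val)
    approx = stochastic_aid(lam, X, y, X_val, y_val, batch=100, steps=4000, k0=400, rng=rng)
    print(f"exact={exact:.4f}  stochastic={approx:.4f}")
```

The ridge lower level has a closed-form solution, so the exact value is available for comparison. On a genuinely large-scale problem it would not be, and that is the regime where touching only a minibatch per step pays off.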
