On the Iteration Complexity of Hypergradient Computation
Riccardo Grazzi, Luca Franceschi, Massimiliano Pontil, Saverio Salzo

TL;DR
This paper analyzes the iteration complexity of methods for computing hypergradients in bilevel problems. It derives theoretical bounds and compares the methods empirically, finding approximate implicit differentiation with conjugate gradient to be the most efficient.
Contribution
It offers a unified theoretical framework for comparing hypergradient computation methods in bilevel problems, with explicit iteration-complexity bounds and a resulting efficiency hierarchy among the methods.
Findings
Approximate implicit differentiation with conjugate gradient is the most efficient of the methods compared.
Theoretical bounds for iteration complexity are established for various methods.
Experimental results confirm the theoretical efficiency hierarchy.
Abstract
We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of the upper-level objective (hypergradient) is hard or even impossible to compute exactly, which has raised the interest in approximation methods. We investigate some popular approaches to compute the hypergradient, based on reverse-mode iterative differentiation and approximate implicit differentiation. Under the hypothesis that the fixed-point equation is defined by a contraction mapping, we present a unified analysis which allows, for the first time, a quantitative comparison of these methods, providing explicit bounds for their iteration…
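The two families of methods named in the abstract can be illustrated on a toy problem. What follows is a minimal NumPy sketch, not the authors' code. It assumes a linear contraction Φ(w, λ) = A w + B λ, with A symmetric and of spectral norm q < 1, and an upper-level objective E(w) = ½‖w − target‖². Under these assumptions the implicit function theorem gives the hypergradient Bᵀ (I − Aᵀ)⁻¹ (w(λ) − target). All function names (make_problem, itd, aid_fp, aid_cg, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np

def make_problem(d=5, p=3, q=0.5, seed=0):
    """Toy instance (an assumption of this sketch): symmetric A with ||A||_2 = q < 1."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((d, d))
    S = (M + M.T) / 2
    A = q * S / np.linalg.norm(S, 2)   # rescale so the map is a q-contraction
    B = rng.standard_normal((d, p))
    target = rng.standard_normal(d)
    return A, B, target

def inner_solve(A, B, lam, t):
    """Run t fixed-point iterations w <- A w + B lam, starting from w = 0."""
    w = np.zeros(A.shape[0])
    for _ in range(t):
        w = A @ w + B @ lam
    return w

def exact_hypergradient(A, B, target, lam):
    """Closed form B^T (I - A^T)^{-1} (w(lam) - target), for reference only."""
    eye = np.eye(A.shape[0])
    w = np.linalg.solve(eye - A, B @ lam)
    return B.T @ np.linalg.solve(eye - A.T, w - target)

def itd(A, B, target, lam, t):
    """Reverse-mode iterative differentiation through the t inner iterations."""
    alpha = inner_solve(A, B, lam, t) - target   # gradient of E at w_t
    grad = np.zeros(B.shape[1])
    for _ in range(t):                           # walk the iterations backwards
        grad += B.T @ alpha
        alpha = A.T @ alpha
    return grad

def aid_fp(A, B, target, lam, t, k):
    """Approximate implicit differentiation; linear system solved by k fixed-point steps."""
    b = inner_solve(A, B, lam, t) - target
    v = np.zeros_like(b)
    for _ in range(k):
        v = A.T @ v + b                          # converges to (I - A^T)^{-1} b
    return B.T @ v

def conjugate_gradient(matvec, b, k):
    """Plain conjugate gradient, at most k steps, for a symmetric positive definite operator."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(k):
        if rs < 1e-30:                           # already converged
            break
        Ap = matvec(p)
        step = rs / (p @ Ap)
        x += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def aid_cg(A, B, target, lam, t, k):
    """Approximate implicit differentiation; linear system solved by k CG steps.
    Valid here because A is symmetric with norm < 1, so I - A^T is positive definite."""
    b = inner_solve(A, B, lam, t) - target
    v = conjugate_gradient(lambda x: x - A.T @ x, b, k)
    return B.T @ v
```

A possible way to use the sketch is to build one instance with `A, B, target = make_problem()`, pick `lam = np.ones(3)`, and compare `np.linalg.norm(method(...) - exact_hypergradient(A, B, target, lam))` for each method as the iteration counts t and k vary. In this linear toy case, `itd` with t steps and `aid_fp` with k = t compute the same sum, so they differ only once Φ is nonlinear in w.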
Taxonomy
Topics: Model Reduction and Neural Networks · Medical Imaging Techniques and Applications · Tensor decomposition and applications
