Efficient gradient-based methods for bilevel learning via recycling Krylov subspaces
Matthias J. Ehrhardt, Silvia Gazzola, Sebastian J. Scott

TL;DR
This paper introduces a novel recycling Krylov subspace method using Ritz generalized singular vectors to efficiently compute hypergradients in bilevel learning, significantly reducing computational costs in inverse imaging problems.
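Here is a minimal Python/NumPy sketch of what "recycling" means. It rests on simplifying assumptions of our own. It replaces the Ritz generalized singular vectors used in the paper with ordinary Ritz vectors (approximate eigenvectors) of a symmetric positive definite matrix. It also reuses those vectors only to build a Galerkin initial guess for the next, nearby system, rather than deflating the whole Krylov iteration. The helper names `krylov_basis`, `ritz_pairs` and `recycled_initial_guess`, and the shifted test matrices, are hypothetical.

```python
import numpy as np

def krylov_basis(H, b, k):
    """Orthonormal basis of span{b, Hb, ..., H^(k-1) b} (Lanczos-style, fully reorthogonalized)."""
    V = np.zeros((b.size, k))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(1, k):
        v = H @ V[:, j - 1]
        for _ in range(2):  # classical Gram-Schmidt, applied twice for numerical stability
            v -= V[:, :j] @ (V[:, :j].T @ v)
        V[:, j] = v / np.linalg.norm(v)
    return V

def ritz_pairs(H, V, p):
    """The p smallest Ritz values and Ritz vectors of symmetric H from the basis V."""
    T = V.T @ H @ V                # k x k projection of H onto the Krylov space
    T = 0.5 * (T + T.T)            # remove rounding asymmetry before eigh
    vals, Y = np.linalg.eigh(T)    # eigenvalues in ascending order
    return vals[:p], V @ Y[:, :p]  # Ritz vectors have orthonormal columns

def recycled_initial_guess(H_new, U, b_new):
    """Galerkin projection of H_new x = b_new onto the recycled space span(U)."""
    HU = H_new @ U
    c = np.linalg.solve(U.T @ HU, U.T @ b_new)
    return U @ c

# Two nearby SPD systems, standing in for consecutive hypergradient solves (toy data).
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 40))
H1 = A.T @ A + 1.0 * np.eye(40)
H2 = A.T @ A + 1.1 * np.eye(40)
b1 = rng.standard_normal(40)
b2 = b1 + 0.05 * rng.standard_normal(40)

V = krylov_basis(H1, b1, k=15)          # information gathered while solving the first system
ritz_vals, U = ritz_pairs(H1, V, p=5)   # keep 5 Ritz vectors to recycle
x0 = recycled_initial_guess(H2, U, b2)  # starting point for the second solve
r0 = b2 - H2 @ x0                       # initial residual, orthogonal to span(U)
```

Keeping the Ritz vectors for the smallest Ritz values is a common choice when the goal is to speed up conjugate gradients, because removing those directions reduces the effective condition number. The Galerkin guess never has a larger energy-norm error than a zero initial guess.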
Contribution
It proposes a new recycling strategy based on Ritz generalized singular vectors, together with a stopping criterion for the linear solves that directly estimates the hypergradient error.
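The paper's error estimate is not reproduced here. As a simpler stand-in, the sketch below shows how a stopping rule can target the hypergradient rather than the residual. It assumes a Tikhonov lower-level problem min_x ½‖Ax − y‖² + ½e^θ‖x‖² with α = e^θ and an upper-level loss ½‖x̂ − x_true‖². In that setting the hypergradient is g = −α x̂ᵀw with (AᵀA + αI)w = x̂ − x_true. If conjugate gradients stops at w_k with residual r_k, then |g − g_k| = α|x̂ᵀH⁻¹r_k| ≤ ‖x̂‖‖r_k‖, because λ_min(H) ≥ α. The function name and the test problem are hypothetical.

```python
import numpy as np

def adjoint_cg_with_hypergrad_bound(A, alpha, x_hat, v, tau=1e-8, maxiter=1000):
    """Conjugate gradients on (A^T A + alpha I) w = v, stopped once a bound on
    the hypergradient error |g - g_k| falls below tau (illustrative rule)."""
    H = A.T @ A + alpha * np.eye(A.shape[1])
    w = np.zeros_like(v)
    r = v.copy()                      # residual v - H w for w = 0
    p = r.copy()
    rs = r @ r
    xnorm = np.linalg.norm(x_hat)
    # |g - g_k| = alpha |x_hat^T H^{-1} r| <= ||x_hat|| ||r||, since lambda_min(H) >= alpha.
    bound = xnorm * np.sqrt(rs)
    its = 0
    while bound > tau and its < maxiter:
        Hp = H @ p
        step = rs / (p @ Hp)
        w += step * p
        r -= step * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
        its += 1
        bound = xnorm * np.sqrt(rs)
    g_k = -alpha * (x_hat @ w)        # hypergradient with respect to theta = log(alpha)
    return g_k, bound, its

rng = np.random.default_rng(1)
A = rng.standard_normal((60, 40))
x_true = rng.standard_normal(40)
y = A @ x_true + 0.5 * rng.standard_normal(60)
alpha = 2.0
x_hat = np.linalg.solve(A.T @ A + alpha * np.eye(40), A.T @ y)  # lower-level solution
v = x_hat - x_true                    # gradient of the upper-level loss at x_hat
g_k, bound, its = adjoint_cg_with_hypergrad_bound(A, alpha, x_hat, v, tau=1e-6)
```

The bound is an upper bound, so this rule may iterate longer than strictly necessary. It only illustrates stopping on the quantity of interest instead of on the residual norm.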
Findings
Reduces the computational cost of computing hypergradients in bilevel learning.
Improves convergence and accuracy with the new recycling strategy.
Validated through extensive inverse imaging experiments.
Abstract
Many optimization problems require hyperparameters, i.e., parameters that must be specified in advance, such as regularization parameters and parametric regularizers in variational regularization methods for inverse problems, and dictionaries in compressed sensing. A data-driven approach to determining appropriate hyperparameter values is a nested optimization framework known as bilevel learning. Even when it is possible to apply a gradient-based solver to the bilevel optimization problem, constructing the gradients, known as hypergradients, is computationally challenging, since each one requires both the solution of a minimization problem and a linear system solve. These systems do not change much during the iterations, which motivates us to apply recycling Krylov subspace methods, wherein information from one linear system solve is re-used to solve the next linear system. Existing…
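The two solves behind each hypergradient can be seen in a minimal sketch. For illustration only, it assumes a Tikhonov-regularized least-squares lower-level problem x̂(θ) = argmin_x ½‖Ax − y‖² + ½e^θ‖x‖² and an upper-level loss ½‖x̂(θ) − x_true‖². Implicitly differentiating the optimality condition (AᵀA + e^θI)x̂ = Aᵀy gives dx̂/dθ = −e^θH⁻¹x̂. The hypergradient therefore needs the lower-level solution x̂ and one adjoint solve Hw = x̂ − x_true. Dense direct solves stand in for the iterative (and, in the paper, recycled Krylov) solvers, and the function names are hypothetical.

```python
import numpy as np

def lower_level(theta, A, y):
    """Solve argmin_x 0.5*||A x - y||^2 + 0.5*exp(theta)*||x||^2 (hypothetical toy problem)."""
    alpha = np.exp(theta)
    H = A.T @ A + alpha * np.eye(A.shape[1])   # Hessian of the lower-level objective
    return np.linalg.solve(H, A.T @ y), H, alpha

def upper_loss(theta, A, y, x_true):
    """Upper-level loss 0.5*||x_hat(theta) - x_true||^2."""
    x_hat, _, _ = lower_level(theta, A, y)
    return 0.5 * np.sum((x_hat - x_true) ** 2)

def hypergradient(theta, A, y, x_true):
    # Solve 1: the lower-level minimization (closed form for this quadratic toy).
    x_hat, H, alpha = lower_level(theta, A, y)
    # Solve 2: the adjoint linear system H w = gradient of the upper-level loss.
    w = np.linalg.solve(H, x_hat - x_true)
    # Chain rule with d x_hat / d theta = -alpha * H^{-1} x_hat and H symmetric.
    return -alpha * (x_hat @ w)

rng = np.random.default_rng(2)
A = rng.standard_normal((60, 40))
x_true = rng.standard_normal(40)
y = A @ x_true + 0.5 * rng.standard_normal(60)

theta = 0.0
g = hypergradient(theta, A, y, x_true)
h = 1e-5   # central finite-difference check of the formula
g_fd = (upper_loss(theta + h, A, y, x_true) - upper_loss(theta - h, A, y, x_true)) / (2 * h)
```

The finite-difference comparison is only a sanity check of the formula. In this toy, each outer update of θ changes H only through the diagonal shift e^θI, so consecutive adjoint systems are close to one another. That is the situation in which reusing Krylov information pays off.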
Taxonomy
Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
