Bilevel Learning with Inexact Stochastic Gradients
Mohammad Sadegh Salehi, Subhadip Mukherjee, Lindon Roberts, Matthias J. Ehrhardt

TL;DR
This paper introduces a stochastic bilevel learning framework with inexact hypergradients, demonstrating convergence and efficiency improvements for large-scale nonconvex problems in imaging applications.
Contribution
It develops a stochastic bilevel optimization method with convergence guarantees for inexact hypergradients in nonconvex settings, addressing practical limitations of existing approaches.
Findings
Significant speed-ups in image denoising and deblurring tasks.
Improved generalization over deterministic bilevel methods.
Theoretical convergence under mild assumptions.
Abstract
Bilevel learning has gained prominence in machine learning, inverse problems, and imaging applications, including hyperparameter optimization, learning data-adaptive regularizers, and optimizing forward operators. The large-scale nature of these problems has led to the development of inexact and computationally efficient methods. Existing adaptive methods predominantly rely on deterministic formulations, while stochastic approaches often adopt a doubly-stochastic framework with impractical variance assumptions, enforce a fixed number of lower-level iterations, and require extensive tuning. In this work, we focus on bilevel learning with strongly convex lower-level problems and a nonconvex sum-of-functions in the upper level. Stochasticity arises from data sampling in the upper level, which leads to inexact stochastic hypergradients. We establish their connection to state-of-the-art…
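To make the setting in the abstract concrete, the following is a minimal sketch of an inexact stochastic hypergradient on a toy 1D denoising problem. Everything in it is an illustrative assumption rather than the paper's method: the lower-level problem is a quadratic h_i(x, theta) = 0.5||x - y_i||^2 + 0.5 exp(theta) ||D x||^2 (strongly convex in x), the upper-level loss is the mean of g_i(x) = 0.5||x - x_i_true||^2 over the training pairs, the lower-level solve and the linear system from the implicit function theorem are both stopped at a fixed tolerance `tol`, and the upper-level update is plain mini-batch SGD. The function names (`lower_solve`, `cg`, `inexact_hypergrad`, `stochastic_bilevel`) and all parameter values are hypothetical.

```python
import numpy as np

def lower_solve(y, lam, L, tol, max_iter=10_000):
    """Gradient descent on h(x) = 0.5||x - y||^2 + 0.5*lam*x^T L x, stopped at ||grad h|| <= tol."""
    step = 1.0 / (1.0 + 4.0 * lam)  # eigenvalues of L = D^T D lie in [0, 4)
    x = y.copy()
    for _ in range(max_iter):
        grad = x - y + lam * (L @ x)
        if np.linalg.norm(grad) <= tol:
            break
        x = x - step * grad
    return x

def cg(A, b, tol, max_iter=1000):
    """Conjugate gradient for A q = b (A symmetric positive definite), stopped at ||residual|| <= tol."""
    q = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        q = q + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return q

def inexact_hypergrad(y, x_true, theta, L, tol):
    """Approximate d/dtheta of g(x(theta)) via the implicit function theorem."""
    lam = np.exp(theta)
    x = lower_solve(y, lam, L, tol)      # inexact lower-level solution
    H = np.eye(y.size) + lam * L         # Hessian of h with respect to x
    q = cg(H, x - x_true, tol)           # inexact solve of H q = grad g(x)
    return -(lam * (L @ x)) @ q          # -(d/dtheta grad_x h)^T q

def upper_loss(Y, X_true, theta, L):
    """Full upper-level objective with exact lower-level solves (monitoring only)."""
    H = np.eye(Y.shape[1]) + np.exp(theta) * L
    X = np.linalg.solve(H, Y.T).T
    return 0.5 * np.mean(np.sum((X - X_true) ** 2, axis=1))

def stochastic_bilevel(Y, X_true, L, theta=-3.0, steps=100, batch=4, lr=0.3, tol=1e-3, seed=0):
    """Plain mini-batch SGD on theta using inexact hypergradients (assumed update rule)."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        idx = rng.choice(Y.shape[0], size=batch, replace=False)  # sample training pairs
        g = np.mean([inexact_hypergrad(Y[i], X_true[i], theta, L, tol) for i in idx])
        theta = theta - lr * g
    return theta

# Synthetic data: smooth 1D signals plus Gaussian noise (hypothetical test problem).
rng = np.random.default_rng(1)
n, d = 16, 32
t = np.arange(d) / d
X_true = np.stack([np.sin(2 * np.pi * (t + rng.random())) for _ in range(n)])
Y = X_true + 0.3 * rng.standard_normal((n, d))
D = np.diff(np.eye(d), axis=0)           # forward-difference matrix, shape (d-1, d)
L = D.T @ D
theta_hat = stochastic_bilevel(Y, X_true, L)
print(upper_loss(Y, X_true, -3.0, L), upper_loss(Y, X_true, theta_hat, L))
```

The two sources of error the abstract refers to appear separately here: sampling `idx` makes the hypergradient stochastic, and the tolerance `tol` makes it inexact. Tightening `tol` trades computation for accuracy; this sketch keeps it fixed for simplicity.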
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Field-Flow Fractionation Techniques
Methods: ADaptive gradient method with the OPTimal convergence rate · Focus
