Bilevel Learning via Inexact Stochastic Gradient Descent
Mohammad Sadegh Salehi, Subhadip Mukherjee, Lindon Roberts, Matthias J. Ehrhardt

TL;DR
This paper develops a theoretical framework for inexact stochastic bilevel optimization, proving convergence rates under decaying accuracy and step size schedules, with practical experiments in image processing confirming the results.
Contribution
The paper establishes convergence guarantees for inexact stochastic bilevel optimization under decaying accuracy and step size schedules, narrowing the gap between existing theory and large-scale applications.
Findings
Decreasing step sizes improve stability in practice.
Accuracy scheduling is more critical than step size strategy.
Adaptive preconditioning enhances performance.
Abstract
Bilevel optimization is a central tool in machine learning for high-dimensional hyperparameter tuning. Its applications are vast; for instance, in imaging it can be used for learning data-adaptive regularizers and optimizing forward operators in variational regularization. These problems are large in many ways: a lot of data is usually available to train a large number of parameters, calling for stochastic gradient-based algorithms. However, exact gradients with respect to parameters (so-called hypergradients) are not available, and their precision is usually linearly related to computational cost. Hence, algorithms must solve the problem efficiently without unnecessary precision. The design of such methods is still not fully understood, especially regarding how accuracy requirements and step size schedules affect theoretical guarantees and practical performance. Existing approaches…
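To make the idea of an inexact stochastic hypergradient step concrete, here is a minimal sketch on a toy problem. It is not the paper's algorithm or experimental setup. The toy lower-level problem (a quadratic denoiser with a single regularization weight exp(theta)), the function names, the 1/sqrt(k+1) step size schedule, and the 1/(k+1) accuracy schedule are all assumptions chosen for illustration. In this toy the lower-level Hessian is a multiple of the identity, so the only source of inexactness is the early-stopped lower-level solve.

```python
import numpy as np

def lower_level_solve(y, theta, eps, max_iter=1000):
    """Approximately minimize g(x) = 0.5*||x - y||^2 + 0.5*exp(theta)*||x||^2
    by gradient descent, stopping once ||grad g(x)|| <= eps."""
    lam = np.exp(theta)
    step = 0.5 / (1.0 + lam)  # deliberately damped so the solve is not exact in one step
    x = y.copy()
    for _ in range(max_iter):
        grad = (x - y) + lam * x
        if np.linalg.norm(grad) <= eps:
            break
        x = x - step * grad
    return x

def inexact_hypergradient(y, x_true, theta, eps):
    """Hypergradient of f(x) = 0.5*||x - x_true||^2 at an approximate lower-level solution.
    Implicit function theorem: dF/dtheta = -(d/dtheta grad_x g)^T H^{-1} grad_x f,
    with H = (1 + lam) * I and d/dtheta grad_x g = lam * x for this toy problem."""
    lam = np.exp(theta)
    x = lower_level_solve(y, theta, eps)
    return -lam * (x @ (x - x_true)) / (1.0 + lam)

def upper_loss(Y, X_true, theta):
    """Full-data upper-level loss, using the closed-form lower-level solution."""
    X = Y / (1.0 + np.exp(theta))
    return 0.5 * np.mean(np.sum((X - X_true) ** 2, axis=1))

def run(num_iters=1000, alpha0=0.2, eps0=1.0, seed=0):
    rng = np.random.default_rng(seed)
    X_true = rng.standard_normal((200, 10))            # synthetic clean signals
    Y = X_true + 0.5 * rng.standard_normal((200, 10))  # noisy observations
    theta = 0.0
    initial_loss = upper_loss(Y, X_true, theta)
    for k in range(num_iters):
        i = rng.integers(len(Y))            # sample one training pair
        alpha_k = alpha0 / np.sqrt(k + 1)   # assumed decaying step size
        eps_k = eps0 / (k + 1)              # assumed decaying accuracy tolerance
        theta -= alpha_k * inexact_hypergradient(Y[i], X_true[i], theta, eps_k)
    return theta, initial_loss, upper_loss(Y, X_true, theta)

if __name__ == "__main__":
    theta, loss_before, loss_after = run()
    print(f"theta = {theta:.3f}, loss before = {loss_before:.4f}, loss after = {loss_after:.4f}")
```

Loose tolerances early on keep the first iterations cheap, and tightening eps_k over time reduces the hypergradient error as the step size shrinks. Changing the two schedules in run is a simple way to see how accuracy and step size interact on this toy problem.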
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Domain Adaptation and Few-Shot Learning
