Greedy Learning to Optimize with Convergence Guarantees

Patrick Fahy; Mohammad Golbabaee; Matthias J. Ehrhardt

arXiv:2406.00260·math.OC·July 15, 2025·1 cites

Open Access · 3 Reviews

TL;DR

This paper introduces a greedy learning method for optimization that allows training over many iterations with guaranteed convergence and improved empirical performance on inverse problems.

Contribution

A novel greedy strategy for learning iteration-specific parameters that comes with convergence guarantees and enables training over many more iterations with constant memory.

Findings

01

Convergence guarantees for learned algorithms on training and unseen functions.

02

Improved empirical performance on image deblurring and CT reconstruction.

03

Effective convolutional preconditioners outperform classical methods.

Abstract

Learning to optimize (L2O) is an approach that leverages training data to accelerate the solution of optimization problems. Many approaches use unrolling to parameterize the update step and learn optimal parameters. Although L2O has shown empirical advantages over classical optimization algorithms, memory restrictions often greatly limit the unroll length, and learned algorithms usually do not provide convergence guarantees. In contrast, we introduce a novel method employing a greedy strategy that learns iteration-specific parameters by minimizing the function value at the next iteration. This enables training over significantly more iterations while maintaining constant device memory usage. We parameterize the update such that parameter learning is convex when the objective function is convex. In particular, we explore preconditioned gradient descent and an extension of Polyak's Heavy Ball…
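The greedy idea in the abstract can be illustrated with a minimal sketch. It assumes least-squares training objectives f_i(x) = 0.5 * ||A_i x - b_i||^2 and a diagonal preconditioner shared across the training problems; neither assumption comes from the paper. The function name `greedy_diagonal_preconditioners` and all variable names are hypothetical. Under these assumptions, the next-iterate function value is a linear least-squares problem in the preconditioner entries, so each greedy step is convex and only the current iterates are held in memory. The sketch includes no safeguard or step-size bound and should not be read as the authors' implementation.

```python
import numpy as np


def greedy_diagonal_preconditioners(As, bs, num_iters=20):
    """Greedily learn one diagonal preconditioner per iteration.

    Assumed setup (illustrative only): training objectives are
    f_i(x) = 0.5 * ||A_i x - b_i||^2 and the update is
    x_i <- x_i - p_k * grad f_i(x_i), with p_k shared by all problems.
    """
    n = As[0].shape[1]
    xs = [np.zeros(n) for _ in As]

    def total_loss(points):
        return sum(0.5 * np.sum((A @ x - b) ** 2)
                   for A, x, b in zip(As, points, bs))

    learned = []
    losses = [total_loss(xs)]  # losses[0] is the value at the initial point
    for _ in range(num_iters):
        residuals = [A @ x - b for A, x, b in zip(As, xs, bs)]
        grads = [A.T @ r for A, r in zip(As, residuals)]
        # f_i(x_i - p * g_i) = 0.5 * ||r_i - (A_i diag(g_i)) p||^2, so the
        # greedy problem is one stacked linear least-squares solve in p.
        # A * g scales column j of A by g[j], i.e. it equals A @ diag(g).
        M = np.vstack([A * g for A, g in zip(As, grads)])
        r = np.concatenate(residuals)
        p, *_ = np.linalg.lstsq(M, r, rcond=None)
        # Only the current iterates are kept; nothing is unrolled.
        xs = [x - p * g for x, g in zip(xs, grads)]
        learned.append(p)
        losses.append(total_loss(xs))
    return learned, losses
```

Because p = 0 is always a feasible choice, the summed training loss cannot increase from one iteration to the next in this sketch, up to numerical error in the least-squares solve.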

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The theoretical part of the paper is well-written and provides a very clear story. The theoretical framework also provides a good starting point for further generalising analysis of L2O schemes, especially when moving to non-convex settings. The proposed preconditioner learning is memory-efficient, fast, with convergence guarantees and empirical evidence on small problems.

Weaknesses

Major: Main weaknesses appear in the numerical section of this paper. Overall, the numerical section is very difficult to read, as it seems to mix the exact parameter choices with the rest of the explanations, further obfuscating everything. The various details seemed to have been mixed into a single soup of information - this should be summarised better. The numerical comparison seems to be missing the main evaluation - it is unclear whether the method actually generalises. Only a small exa

Reviewer 02 · Rating 5 · Confidence 3

Strengths

The learning to optimize literature is still in development, and does not yet have a well-defined formalism or benchmark problems. As such, the paper's goal of bringing formal guarantees to this setting is a new and relevant contribution to the community. The proposed algorithms show promising results on the experimental setup.

Weaknesses

**Safeguards.** The proposed algorithm still relies on hand-crafted a priori knowledge of the optimization problem in the form of the maximum step size that would work for all training problems, $\tau$. As such, it does not significantly differ from alternative approaches that require safeguarding or search within a predefined set that guarantees convergence. **Experimental validation.** The definition of train and test problems in the numerical experiments is not sufficiently transparent. If I

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The greedy training approach is a novel and interesting scheme for L2O. Because current L2O methods mostly depend on unrolling several iterations, the proposed scheme is very timely for the L2O area, as it makes training the optimizer scalable. The numerical performance of the proposed scheme on inverse problems is very impressive.

Weaknesses

The theoretical part of the paper seems to be weak. The convergence analysis relies on an unrealistic assumption, named the BGD (better than gradient descent) assumption, at each iteration: you can't simply assume what you wish to prove. Corollary 1 seems to make a very strong claim, but no explicit proof is given (it is unclear how simply applying Lemma 2 can lead to such a claim). The reviewer believes that whether or not the learned linear preconditioner is BGD should be non-trivial to

Taxonomy

Topics: Numerical methods in inverse problems · Neural Networks and Applications