L-SR1: Learned Symmetric-Rank-One Preconditioning
Gal Lifshitz, Shahar Zuler, Ori Fouks, Dan Raviv

TL;DR
This paper introduces L-SR1, a learned second-order optimizer with a trainable preconditioning unit that improves convergence and generalization in optimization tasks, demonstrated on Monocular Human Mesh Recovery (HMR).
Contribution
The paper presents a novel learned second-order optimizer that integrates a trainable preconditioning unit into the classical SR1 algorithm, improving its efficiency and applicability.
Findings
Outperforms existing learned optimizers on the HMR task
Requires no annotated data or fine-tuning
Offers strong generalization and lightweight design
Abstract
End-to-end deep learning has achieved impressive results but remains limited by its reliance on large labeled datasets, poor generalization to unseen scenarios, and growing computational demands. In contrast, classical optimization methods are data-efficient and lightweight but often suffer from slow convergence. While learned optimizers offer a promising fusion of both worlds, most focus on first-order methods, leaving learned second-order approaches largely unexplored. We propose a novel learned second-order optimizer that introduces a trainable preconditioning unit to enhance the classical Symmetric-Rank-One (SR1) algorithm. This unit generates data-driven vectors used to construct positive semi-definite rank-one matrices, aligned with the secant constraint via a learned projection. Our method is evaluated through analytic experiments and on the real-world task of Monocular Human…
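For reference, the classical (non-learned) SR1 update that the method builds on can be sketched as follows. This is a minimal pure-Python illustration on small dense matrices, assuming the usual notation s = x_{k+1} - x_k and y = grad f(x_{k+1}) - grad f(x_k); it is not the paper's learned preconditioning unit. It shows why a rank-one correction along r = y - B s enforces the secant condition B_{k+1} s = y, and why that correction is positive semi-definite only when r^T s > 0, which classical SR1 does not guarantee.

```python
# Minimal sketch of the classical SR1 update (not the learned L-SR1 variant),
# using lists of lists so it runs without third-party packages.

def matvec(B, v):
    """Return the matrix-vector product B @ v."""
    return [sum(b_ij * v_j for b_ij, v_j in zip(row, v)) for row in B]


def dot(a, b):
    """Return the inner product a^T b."""
    return sum(x * y for x, y in zip(a, b))


def sr1_update(B, s, y, eps=1e-8):
    """Classical SR1 update of a Hessian approximation B.

    s: step x_{k+1} - x_k; y: gradient change grad_{k+1} - grad_k.
    Returns (B_next, applied). The correction r r^T / (r^T s), with
    r = y - B s, makes B_next satisfy the secant condition B_next s = y.
    The correction is PSD only when r^T s > 0; classical SR1 does not
    enforce this sign.
    """
    r = [y_i - bs_i for y_i, bs_i in zip(y, matvec(B, s))]
    denom = dot(r, s)
    # Standard safeguard: skip the update when |r^T s| is tiny relative to
    # ||r|| ||s|| (this also covers r = 0, where B already satisfies the secant).
    if abs(denom) <= eps * dot(r, r) ** 0.5 * dot(s, s) ** 0.5:
        return [row[:] for row in B], False
    n = len(s)
    B_next = [[B[i][j] + r[i] * r[j] / denom for j in range(n)] for i in range(n)]
    return B_next, True
```

For example, on the quadratic f(x) = 0.5 * x^T diag(2, 3) x with B = I and s = (1, 1), the gradient change is y = (2, 3); one update gives r = (1, 2) and r^T s = 3 > 0, so the correction is PSD and the updated B maps s back to y up to floating-point rounding.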
Peer Reviews
Decision: Submitted to ICLR 2026
The paper introduces a lightweight, self-supervised learned optimizer that integrates a trainable preconditioning unit into the SR1 framework, together with a learned projection mechanism that enforces both the secant condition and positive semi-definiteness, preserving core Quasi-Newton properties within a learned architecture. Experiments show that the proposed method works well on HMR. The paper is well written and easy to read.
The claimed generalization of the proposed Learned-SR1 is not effectively validated. Currently, there is only a simple evaluation on the HMR task, and the compared baselines do not represent the current state of the art for HMR. The paper lacks comparisons with other optimization algorithms, e.g., AdamW and AdaHessian. Moreover, the theoretical analysis of the learned projection mechanism is insufficient. Currently, the evaluation is conducted only on a single dataset (3DPW), which fails to demonstrate…
The theoretical foundation of this work is presented with clarity.
The method is only tested on HMR, a specific application in image processing. It is not known whether this method is applicable to other tasks.
- Principled design bridging QN and learning. The method is explicitly grounded in the QN update, the secant condition, and the need for PSD preconditioners for descent directions; the learned projection aims to satisfy both simultaneously.
- Lightweight, limited-memory, dimension-invariant formulation. L-SR1 uses rank-one outer products with a fixed-size buffer and element-wise modules to generalize across problem sizes without retraining.
- Learned projection and per-coordinate step sizes
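The limited-memory, dimension-invariant formulation mentioned in this review can be illustrated with a hypothetical matrix-free sketch. The class name `RankOneBuffer`, the positive diagonal standing in for per-coordinate step sizes, and the nonnegative rank-one coefficients are illustrative assumptions, not the authors' implementation. The sketch only shows how a fixed-size buffer of rank-one outer products can be applied to a vector without forming an n x n matrix, so the same code runs unchanged for any problem size n.

```python
from collections import deque


class RankOneBuffer:
    """Hypothetical limited-memory preconditioner H = diag(d) + sum_i c_i u_i u_i^T.

    Only the `memory` most recent rank-one terms are kept, so storage is
    O(memory * n) and H is never materialized as an n x n matrix.
    """

    def __init__(self, diag, memory=5):
        if any(d <= 0 for d in diag):
            raise ValueError("diagonal entries must be positive")
        # Positive per-coordinate scales: an assumed stand-in for learned step sizes.
        self.diag = list(diag)
        # deque(maxlen=...) silently drops the oldest term once the buffer is full.
        self.terms = deque(maxlen=memory)

    def push(self, u, c=1.0):
        """Store the rank-one term c * u u^T; requiring c >= 0 keeps it PSD."""
        if c < 0:
            raise ValueError("a negative coefficient could break positive semi-definiteness")
        self.terms.append((c, list(u)))

    def apply(self, v):
        """Return H @ v using only vector operations."""
        out = [d * x for d, x in zip(self.diag, v)]
        for c, u in self.terms:
            coef = c * sum(ui * vi for ui, vi in zip(u, v))
            out = [o + coef * ui for o, ui in zip(out, u)]
        return out

    def direction(self, grad):
        """Preconditioned search direction -H g."""
        return [-x for x in self.apply(grad)]
```

Because the diagonal is positive and every stored term has a nonnegative coefficient, H stays positive definite, so g^T(-H g) < 0 for any nonzero gradient g and -H g is a descent direction. This is the role the PSD requirement plays in the review's first bullet.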
- **Insufficient Comparative Analysis**: The paper should compare against a wider set of HMR methods (both optimization-based and modern learned/learnable refiners beyond LGD/SPIN) and report more metrics (e.g., MPJPE, PA-MPJPE, PVE, jitter/contact, temporal stability) on more datasets (e.g., Human3.6M, EHF, AGORA) under matched settings. As written, the main HMR table includes only a few baselines. Moreover, the paper does not analyze the computational cost and runtime of the pro
