Transformer-Based Learned Optimization
Erik Gärtner, Luke Metz, Mykhaylo Andriluka, C. Daniel Freeman, Cristian Sminchisescu

TL;DR
This paper introduces Optimus, a Transformer-based learned optimizer. Inspired by BFGS, its neural network predicts preconditioner updates jointly with the step length and direction, and it applies to optimization tasks of varying dimensionality without retraining.
Contribution
The paper presents a novel neural network architecture, Optimus, that improves learned optimization by conditioning across parameter dimensions and handling variable problem sizes.
Findings
Outperforms existing learned optimizers on benchmark functions.
Successfully applied to physics-based 3D human motion reconstruction.
Demonstrates adaptability to different optimization problem sizes.
Abstract
We propose a new approach to learned optimization where we represent the computation of an optimizer's update step using a neural network. The parameters of the optimizer are then learned by training on a set of optimization tasks with the objective to perform minimization efficiently. Our innovation is a new neural network architecture, Optimus, for the learned optimizer inspired by the classic BFGS algorithm. As in BFGS, we estimate a preconditioning matrix as a sum of rank-one updates but use a Transformer-based neural network to predict these updates jointly with the step length and direction. In contrast to several recent learned optimization-based approaches, our formulation allows for conditioning across the dimensions of the parameter space of the target problem while remaining applicable to optimization tasks of variable dimensionality without retraining. We demonstrate the…
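For readers unfamiliar with the BFGS reference in the abstract, the classical update can be sketched as below. This is a minimal sketch of standard BFGS on a small quadratic, not of Optimus itself: the Transformer that predicts the updates is not shown, and the function names, the quadratic test problem, and the exact line search are assumptions chosen for illustration.

```python
import numpy as np


def bfgs_inverse_update(H, s, y):
    """Classical BFGS update of the inverse-Hessian estimate H.

    s = x_new - x_old (parameter step), y = grad_new - grad_old.
    The result satisfies the secant condition H_new @ y == s.
    """
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    # Low-rank correction of H built from outer products of s and y.
    return V @ H @ V.T + rho * np.outer(s, s)


def minimize_quadratic(A, b, x0, steps=20, tol=1e-8):
    """Minimize f(x) = 0.5 x^T A x - b^T x (A symmetric positive definite).

    Hypothetical helper for illustration: uses the preconditioned
    direction d = -H g and an exact line search, which is only
    available in closed form because f is quadratic.
    """
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))           # initial preconditioner: identity
    g = A @ x - b                # gradient of f at x
    for _ in range(steps):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g               # preconditioned descent direction
        alpha = -(g @ d) / (d @ A @ d)  # exact step length along d
        s = alpha * d
        x_new = x + s
        g_new = A @ x_new - b
        H = bfgs_inverse_update(H, s, g_new - g)
        x, g = x_new, g_new
    return x, H


if __name__ == "__main__":
    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 0.5],
                  [0.0, 0.5, 2.0]])
    b = np.array([1.0, 2.0, 3.0])
    x, H = minimize_quadratic(A, b, np.zeros(3))
    print("minimizer:", x)
    print("residual norm:", np.linalg.norm(A @ x - b))
```

In classical BFGS the correction terms are fixed formulas of the observed step and gradient change. According to the abstract, Optimus instead has a Transformer predict the rank-one updates together with the step length and direction.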
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Generative Adversarial Networks and Image Synthesis
