Learning to Optimize Quasi-Newton Methods

Isaac Liao; Rumen R. Dangovski; Jakob N. Foerster; Marin Soljačić

arXiv:2210.06171 · cs.LG · September 12, 2023 · 1 citation

TL;DR

LODO is a novel meta-learning optimizer that dynamically learns preconditioners during training, combining L2O and quasi-Newton methods to adapt to the loss landscape and improve optimization efficiency.

Contribution

The paper introduces LODO, a meta-learning optimizer that learns preconditioners on the fly without prior meta-training, merging L2O with quasi-Newton techniques for flexible inverse Hessian approximation.
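
A minimal way to write this kind of online scheme, assuming a one-step-lookahead meta-objective. The notation (parameters x_t, loss f, preconditioner G_φ with meta-parameters φ, meta learning rate η) is ours and may not match the paper's exact formulation:

```latex
\begin{aligned}
x_{t+1} &= x_t - G_{\phi_t}\,\nabla f(x_t) \\
\phi_{t+1} &= \phi_t - \eta\,\nabla_{\phi}\, f\!\big(x_t - G_{\phi}\,\nabla f(x_t)\big)\Big|_{\phi=\phi_t}
\end{aligned}
```

The first line is an ordinary preconditioned gradient step; the second adjusts the preconditioner so that the step just taken would have lowered the loss more. Classical quasi-Newton methods such as BFGS instead update a fixed-form inverse-Hessian estimate from differences of successive iterates and gradients.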

Findings

01

LODO effectively optimizes in noisy loss landscapes.

02

Simpler inverse Hessian representations reduce performance.

03

LODO trains a semi-realistic deep neural network with 95k parameters at speeds comparable to standard neural network optimizers.
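
Finding 02 concerns how the inverse Hessian is represented. As standard background (not a result from the paper), the inverse Hessian is the ideal preconditioner on a quadratic f(x) = 0.5 xᵀAx - bᵀx: multiplying the gradient by A⁻¹ reaches the minimizer A⁻¹b in a single step, while a scaled identity does not. The matrix A, vector b, and helper names below are illustrative assumptions.

```python
# Background sketch (not from the paper): on a quadratic f(x) = 0.5 x^T A x - b^T x,
# preconditioning the gradient with the inverse Hessian A^{-1} is a Newton step.

def matvec(M, v):
    """Multiply a 2x2 matrix (list of rows) by a length-2 vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def inverse_2x2(M):
    """Closed-form inverse of a nonsingular 2x2 matrix."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def grad(A, b, x):
    """Gradient of the quadratic, A x - b (A assumed symmetric)."""
    Ax = matvec(A, x)
    return [Ax[0] - b[0], Ax[1] - b[1]]

def preconditioned_step(G, A, b, x):
    """One step x <- x - G * grad f(x) with preconditioner G."""
    g = grad(A, b, x)
    Gg = matvec(G, g)
    return [x[0] - Gg[0], x[1] - Gg[1]]

A = [[4.0, 1.0], [1.0, 3.0]]  # Hessian of the illustrative quadratic
b = [1.0, 2.0]
x0 = [0.0, 0.0]

# G = A^{-1}: lands exactly on the minimizer A^{-1} b = (1/11, 7/11).
x_newton = preconditioned_step(inverse_2x2(A), A, b, x0)
# G = 0.1 * I: plain gradient descent, still far from the minimizer after one step.
x_plain = preconditioned_step([[0.1, 0.0], [0.0, 0.1]], A, b, x0)
```

A learned preconditioner sits between these two extremes: it tries to approach the inverse Hessian without ever forming or inverting it explicitly.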

Abstract

Fast gradient-based optimization algorithms have become increasingly essential for the computationally efficient training of machine learning models. One technique is to multiply the gradient by a preconditioner matrix to produce a step, but it is unclear what the best preconditioner matrix is. This paper introduces a novel machine learning optimizer called LODO, which tries to online meta-learn the best preconditioner during optimization. Specifically, our optimizer merges Learning to Optimize (L2O) techniques with quasi-Newton methods to learn preconditioners parameterized as neural networks; they are more flexible than preconditioners in other quasi-Newton methods. Unlike other L2O methods, LODO does not require any meta-training on a training task distribution, and instead learns to optimize on the fly while optimizing on the test task, adapting to the local characteristics of the…
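
A minimal, hypothetical sketch of learning a preconditioner on the fly, in pure Python. It substitutes a dense 2×2 matrix G for LODO's neural-network preconditioner and uses a hypergradient-descent-style meta-objective (the derivative of f(x_t) with respect to the G used at the previous step). The quadratic, `meta_lr`, the step count, and all function names are illustrative assumptions, not the paper's setup.

```python
# Hypothetical sketch (not the paper's code): online meta-learning of a dense
# preconditioner G while minimizing f(x) = 0.5 x^T A x - b^T x.

A = [[4.0, 1.0], [1.0, 3.0]]  # symmetric positive-definite Hessian (illustrative)
b = [1.0, 2.0]

def matvec(M, v):
    """Multiply a 2x2 matrix (list of rows) by a length-2 vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def grad(x):
    """Gradient of the quadratic: A x - b."""
    Ax = matvec(A, x)
    return [Ax[0] - b[0], Ax[1] - b[1]]

def loss(x):
    """f(x) = 0.5 x^T A x - b^T x."""
    Ax = matvec(A, x)
    return 0.5 * (x[0] * Ax[0] + x[1] * Ax[1]) - (b[0] * x[0] + b[1] * x[1])

def online_preconditioned_descent(x, G, meta_lr=0.01, steps=200):
    """Take preconditioned steps while meta-learning G from consecutive gradients."""
    G = [row[:] for row in G]  # copy so the caller's matrix is not mutated
    prev_g = None
    for _ in range(steps):
        g = grad(x)
        if prev_g is not None:
            # x_t = x_{t-1} - G g_{t-1}, so d f(x_t) / dG = -g_t g_{t-1}^T.
            # Gradient descent on G therefore adds meta_lr * g_t g_{t-1}^T.
            for i in range(2):
                for j in range(2):
                    G[i][j] += meta_lr * g[i] * prev_g[j]
        step = matvec(G, g)  # preconditioned step G * grad f(x_t)
        x = [x[0] - step[0], x[1] - step[1]]
        prev_g = g
    return x, G

x_final, G_learned = online_preconditioned_descent([0.0, 0.0], [[0.1, 0.0], [0.0, 0.1]])
```

Because the meta-gradient reuses the two most recent gradients, this update costs no extra gradient evaluations. On this quadratic the iterates converge to x* = A⁻¹b = (1/11, 7/11), and G grows away from its initial value 0.1·I toward larger steps along the slowly converging direction.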

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Machine Learning and ELM

Methods: Test