Layerwise LQR for Geometry-Aware Optimization of Deep Networks

Simon Dufort-Labbé; Pierre-Luc Bacon; Razvan Pascanu; Simon Lacoste-Julien; Aristide Baratin

arXiv:2605.04230 · cs.LG · May 7, 2026

TL;DR

The paper introduces Layerwise LQR, a scalable framework for learning structured inverse preconditioners that improve deep network optimization by leveraging second-order geometry without global matrix inversion.

Contribution

It formulates layerwise preconditioning as a finite-horizon Linear Quadratic Regulator (LQR) problem, which enables scalable learning of structured inverse preconditioners that improve optimization in deep networks.
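To make the LQR side of this formulation concrete, here is a toy sketch. It is not the paper's construction: the time-varying dynamics A[t], B[t] and costs Q[t], R[t], QT are random, hypothetical stand-ins, and the paper's layerwise dynamics and cost matrices are not reproduced. The sketch solves one finite-horizon LQR twice, once by the backward Riccati recursion (one small solve per stage) and once as a single dense quadratic program over all stacked controls, and checks that both give the same minimizer.

```python
import numpy as np


def riccati_lqr(A, B, Q, R, QT, x0):
    """Finite-horizon LQR via the backward Riccati recursion.

    Dynamics: x_{t+1} = A[t] x_t + B[t] u_t, t = 0..T-1.
    Cost: sum_t (x_t' Q[t] x_t + u_t' R[t] u_t) + x_T' QT x_T.
    Returns the stacked optimal controls [u_0; ...; u_{T-1}].
    """
    T = len(A)
    P = QT
    gains = [None] * T
    for t in reversed(range(T)):
        # Only an (m x m) system is solved at each stage.
        S = R[t] + B[t].T @ P @ B[t]
        K = np.linalg.solve(S, B[t].T @ P @ A[t])
        P = Q[t] + A[t].T @ P @ (A[t] - B[t] @ K)
        gains[t] = K
    # Roll the feedback law u_t = -K_t x_t forward from x0.
    x, us = x0, []
    for t in range(T):
        u = -gains[t] @ x
        us.append(u)
        x = A[t] @ x + B[t] @ u
    return np.concatenate(us)


def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out


def dense_lqr(A, B, Q, R, QT, x0):
    """Same problem, solved as one dense quadratic program in all controls."""
    T = len(A)
    n, m = B[0].shape
    # Stack [x_1; ...; x_T] = Sx x0 + Su u.
    Sx = np.zeros((T * n, n))
    Su = np.zeros((T * n, T * m))
    Mx, Mu = np.eye(n), np.zeros((n, T * m))
    for t in range(T):
        Mx = A[t] @ Mx
        Mu = A[t] @ Mu
        Mu[:, t * m:(t + 1) * m] += B[t]
        Sx[t * n:(t + 1) * n] = Mx
        Su[t * n:(t + 1) * n] = Mu
    # x_0 is fixed, so its cost term does not depend on u and is dropped.
    Qbar = block_diag(list(Q[1:]) + [QT])
    Rbar = block_diag(list(R))
    H = Su.T @ Qbar @ Su + Rbar
    g = Su.T @ Qbar @ Sx @ x0
    return np.linalg.solve(H, -g)


rng = np.random.default_rng(0)
T, n, m = 4, 3, 2  # horizon, state dim, control dim (toy sizes)


def random_spd(k):
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)


A = [rng.standard_normal((n, n)) for _ in range(T)]
B = [rng.standard_normal((n, m)) for _ in range(T)]
Q = [random_spd(n) for _ in range(T)]
R = [random_spd(m) for _ in range(T)]
QT = random_spd(n)
x0 = rng.standard_normal(n)

u_ric = riccati_lqr(A, B, Q, R, QT, x0)
u_dense = dense_lqr(A, B, Q, R, QT, x0)
print(np.allclose(u_ric, u_dense))
```

In this toy setting the stagewise recursion reaches the same minimizer as the dense solve while only ever factoring matrices the size of one stage's control, which is the flavor of "no global matrix inversion" the TL;DR describes.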

Findings

01. LLQR improves optimization dynamics in ResNets and Transformers.
02. It often leads to better final test performance.
03. The method adds modest computational overhead.

Abstract

Geometry-aware optimizers such as Newton and natural gradient can improve conditioning in deep learning, but scalable variants such as K-FAC, Shampoo, and related preconditioners usually impose structural approximations early, often discarding cross-layer interactions induced by the network computation. We introduce Layerwise LQR (LLQR), a framework for learning structured inverse preconditioners under a global layerwise optimal-control objective. The starting point is an exact equivalence: the steepest-descent step under a broad class of divergence-induced quadratic models (including Newton, Gauss-Newton, Fisher/natural-gradient, and intermediate-layer metrics) can be written as a finite-horizon Linear Quadratic Regulator (LQR) problem. This formulation serves as a reference that exposes the layerwise dynamics and cost matrices encoding the original dense geometry. We then derive a…
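For contrast with LLQR, the sketch below illustrates the kind of early structural approximation the abstract attributes to K-FAC-style methods; it is not LLQR. For a single linear layer, the curvature block E[(a aᵀ) ⊗ (g gᵀ)] is approximated by the Kronecker product A ⊗ G of an input second-moment factor A and an output-gradient factor G, so the preconditioned step G⁻¹ ∇W A⁻¹ needs only two small solves. The data are random, the function names are hypothetical, and the fixed ridge term is a simple stand-in for the damping and running averages practical K-FAC implementations use. Because each layer gets its own block, cross-layer interactions are dropped entirely.

```python
import numpy as np


def kron_precondition(grad_W, A, G):
    """Return G^{-1} grad_W A^{-1}, i.e. (A kron G)^{-1} vec(grad_W).

    grad_W: (d_out, d_in) weight gradient of one linear layer.
    A: (d_in, d_in) input second-moment factor (symmetric positive definite).
    G: (d_out, d_out) output-gradient second-moment factor (SPD).
    With column-major vec, (A kron G) vec(V) = vec(G V A^T).
    """
    left = np.linalg.solve(G, grad_W)      # G^{-1} grad_W
    return np.linalg.solve(A, left.T).T    # left A^{-1}, using A = A^T


def vec(M):
    """Column-major (Fortran-order) vectorization."""
    return M.reshape(-1, order="F")


rng = np.random.default_rng(0)
d_in, d_out, N = 5, 3, 64                  # toy layer sizes and batch size
acts = rng.standard_normal((N, d_in))      # layer inputs a_i
back = rng.standard_normal((N, d_out))     # backpropagated output gradients g_i
ridge = 1e-3                               # small ridge so both factors are safely invertible
A = acts.T @ acts / N + ridge * np.eye(d_in)
G = back.T @ back / N + ridge * np.eye(d_out)
grad_W = back.T @ acts / N                 # batch mean of g_i a_i^T

step_fact = kron_precondition(grad_W, A, G)

# Dense check: build the full (d_in*d_out)^2 matrix only to verify the identity.
step_dense = np.linalg.solve(np.kron(A, G), vec(grad_W))
print(np.allclose(vec(step_fact), step_dense))
```

The factored step never forms the (d_in·d_out) × (d_in·d_out) block; the dense solve is there only to confirm the Kronecker identity on a tiny layer.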
