Understanding Optimization in Deep Learning with Central Flows

Jeremy M. Cohen; Alex Damian; Ameet Talwalkar; J. Zico Kolter; Jason D. Lee

arXiv:2410.24206·cs.LG·September 26, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces central flows: differential equations that model the time-averaged (smoothed) trajectories of optimizers in deep learning, providing new insight into their dynamics in the edge-of-stability regime.

Contribution

It develops a novel theoretical framework called central flows to analyze the long-term behavior of optimizers in deep learning, especially in oscillatory regimes.

Findings

1. Central flows accurately predict optimization trajectories.
2. They explain how gradient descent progresses despite loss oscillations.
3. They reveal how adaptive optimizers navigate the loss landscape.

Abstract

Traditional theories of optimization cannot describe the dynamics of optimization in deep learning, even in the simple setting of deterministic training. The challenge is that optimizers typically operate in a complex, oscillatory regime called the "edge of stability." In this paper, we develop theory that can describe the dynamics of optimization in this regime. Our key insight is that while the *exact* trajectory of an oscillatory optimizer may be challenging to analyze, the *time-averaged* (i.e. smoothed) trajectory is often much more tractable. To analyze an optimizer, we derive a differential equation called a "central flow" that characterizes this time-averaged trajectory. We empirically show that these central flows can predict long-term optimization trajectories for generic neural networks with a high degree of numerical accuracy. By interpreting these central flows, we are able…
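To make the distinction between the exact and the time-averaged trajectory concrete, here is a minimal toy sketch. It is not the paper's central flow, which is a differential equation derived for a given optimizer. It only runs gradient descent on a hypothetical two-dimensional quadratic whose sharp direction has curvature exactly 2 / learning rate, the classical stability threshold for gradient descent on a quadratic, and compares the raw iterates with a two-step moving average. The loss, the constants, and the function names are all assumptions chosen for illustration.

```python
def gradient_descent(x0, y0, lr, sharp, flat, steps):
    """Run GD on the toy loss f(x, y) = 0.5*sharp*x**2 + 0.5*flat*y**2.

    Hypothetical example: x is the high-curvature direction, y the
    low-curvature one. Returns the full list of iterates per coordinate.
    """
    xs, ys = [x0], [y0]
    for _ in range(steps):
        x, y = xs[-1], ys[-1]
        xs.append(x - lr * sharp * x)  # gradient of 0.5*sharp*x**2 is sharp*x
        ys.append(y - lr * flat * y)   # gradient of 0.5*flat*y**2 is flat*y
    return xs, ys


def moving_average(values, window):
    """Average each run of `window` consecutive iterates."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


LR = 0.5
SHARP = 2.0 / LR  # curvature at the stability threshold: x flips sign each step
FLAT = 0.2        # well below the threshold: y decays smoothly

xs, ys = gradient_descent(1.0, 1.0, LR, SHARP, FLAT, steps=20)
smooth_xs = moving_average(xs, window=2)
smooth_ys = moving_average(ys, window=2)

print("raw x (first 6):     ", xs[:6])
print("smoothed x (first 6):", smooth_xs[:6])
print("smoothed y (first 6):", [round(v, 4) for v in smooth_ys[:6]])
```

In this toy the raw x coordinate oscillates forever without shrinking or growing, while its two-step average sits still and the y coordinate makes steady progress. A pure quadratic cannot reproduce the actual edge-of-stability dynamics the abstract refers to, so this should be read only as an illustration of why a smoothed trajectory can be easier to describe than the exact one.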

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 8 · Confidence 3

Strengths

1. The article is clearly written and reader-friendly, with insightful discussions on the oscillatory behavior of optimization methods.
2. The time-averaging and Taylor expansion framework is intuitively sound and appears to be broadly applicable, capable of handling various adaptive optimizers, and could potentially be extended to more general optimization methods.
3. Extensive empirical results are presented, supporting the validity of the central flow.

Overall, I think this is a nice paper

Weaknesses

The central flow derivations are grounded in empirical observations, and several mathematical steps rely on informal reasoning. For example, it is not clear how to rigorously justify the time averaging step and quantify the high-order error in the Taylor expansion. Rigorous proofs or formal analysis would enhance the robustness of the claims.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. This work was well written and well structured, making it easy to follow along.
2. The authors analyzed a wide range of neural networks, including modern architectures such as ViT.
3. The authors developed their approach in settings where the interpretation was straightforward to make their point more clear.
4. The figures were well done and easy to read.
5. The authors convincingly show that their central flow approach accurately captures the general behavior of neural network model t

Weaknesses

1. My main reservation about this work as it stands (which I hope the authors can easily clarify for me) is the claim that the central flow analysis of RMSProp-Norm demonstrates how the optimizer regularizes against sharpness and moves towards regions of parameter space with lower sharpness. While I understand the argument mathematically, and the analysis of the central flow (Fig. 5) illustrates this point, it does not seem consistent with the results of training ViT (Fig. 4). In particular, in Fi

Reviewer 03 · Rating 5 · Confidence 4

Strengths

The work presents novel contributions. Additionally, it represents an advancement in the study of locally analyzed optimizers and suggests progress for other optimizers.

Weaknesses

1. In the abstract, the introduction, and the related work, the authors repeat the same phrase, 'Optimization in deep learning remains poorly understood' (lines 11, 26 and 68). Maybe the authors can use phrases like: 'How optimization works in deep learning continues to be largely unclear' or 'The optimization process in deep learning is still not well understood.'
2. The contributions of the work are unclear in the introduction, which is somewhat difficult to follow in that sense. Maybe the authors can

Taxonomy

Topics: Simulation Techniques and Applications

Methods: RMSProp