Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures

Yedi Zhang; Andrew Saxe; Peter E. Latham

arXiv:2512.20607·cs.LG·March 12, 2026

TL;DR

This paper introduces a unifying theoretical framework explaining how saddle-to-saddle dynamics during gradient descent lead neural networks to progressively learn more complex solutions across various architectures.

Contribution

It provides a comprehensive theory linking saddle-to-saddle dynamics to the simplicity bias in neural networks, applicable to fully-connected, convolutional, and attention-based models.

Findings

01

Linear networks learn solutions with increasing rank.

02

ReLU networks learn solutions with an increasing number of kinks.

03

Convolutional networks learn solutions with an increasing number of convolutional kernels.
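Finding 01 can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes a two-layer linear network with whitened inputs, a squared-error loss, a small random initialization, and plain gradient descent standing in for gradient flow. The function name train_linear_network, the target matrix, the rank threshold of 0.5, and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np

def train_linear_network(target, hidden=8, init_scale=1e-4,
                         lr=0.01, steps=3000, seed=0):
    """Gradient descent on 0.5 * ||W2 @ W1 - target||_F^2 from a small init.

    Returns the per-step effective rank of W2 @ W1 and the per-step loss.
    All names and hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    d_out, d_in = target.shape
    w1 = init_scale * rng.standard_normal((hidden, d_in))
    w2 = init_scale * rng.standard_normal((d_out, hidden))
    ranks, losses = [], []
    for _ in range(steps):
        product = w2 @ w1
        err = product - target
        losses.append(0.5 * np.sum(err ** 2))
        # Effective rank: number of singular values above a fixed threshold.
        sv = np.linalg.svd(product, compute_uv=False)
        ranks.append(int(np.sum(sv > 0.5)))
        # Gradients of the squared-error loss with respect to each layer.
        grad_w2 = err @ w1.T
        grad_w1 = w2.T @ err
        w2 = w2 - lr * grad_w2
        w1 = w1 - lr * grad_w1
    return ranks, losses

if __name__ == "__main__":
    # Hypothetical rank-3 target with well-separated singular values.
    ranks, losses = train_linear_network(np.diag([4.0, 2.0, 1.0]))
    # Report the first step at which each effective rank is reached.
    for r in sorted(set(ranks)):
        print(f"rank {r} first reached at step {ranks.index(r)}")
```

Under these assumptions, the effective rank of the learned map should rise one step at a time, with the largest singular direction picked up first and long plateaus in between, which is the stage-like behavior the finding describes.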

Abstract

Neural networks trained with gradient descent often learn solutions of increasing complexity over time, a phenomenon known as simplicity bias. Despite being widely observed across architectures, existing theoretical treatments lack a unifying framework. We present a theoretical framework that explains a simplicity bias arising from saddle-to-saddle learning dynamics for a general class of neural networks, incorporating fully-connected, convolutional, and attention-based architectures. Here, simple means expressible with few hidden units, i.e., hidden neurons, convolutional kernels, or attention heads. Specifically, we show that linear networks learn solutions of increasing rank, ReLU networks learn solutions with an increasing number of kinks, convolutional networks learn solutions with an increasing number of convolutional kernels, and self-attention models learn solutions with an…

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 6·Confidence 3

Strengths

1. The paper offers a broad, architecture-agnostic framework that connects dynamical simplicity bias phenomena seen across many model types. The idea of interpreting learning stages as saddle-to-saddle transitions is elegant and theoretically grounded.
2. The analysis of fixed points, invariant manifolds, and timescale separation (linear vs. quadratic cases) is mathematically sound, and the simulations (Figure 1 & 2) convincingly demonstrate the predicted dynamics and validate the theory’s imp

Weaknesses

1. The simulations, while well chosen, are limited to small synthetic examples. It would strengthen the paper to show whether the predicted stage transitions are visible in real-world or larger-scale training runs.
2. Some proofs depend on idealized gradient flow dynamics and homogeneity assumptions that may not hold under practical stochastic optimization with large step sizes.

Reviewer 02·Rating 6·Confidence 3

Strengths

The paper presents a unifying perspective on the widely assumed simplicity bias underlying saddle-to-saddle dynamics across various architectures and, importantly, elucidates its connection to permutation symmetry. This formalization and unification constitute the core contribution of the work. Although the analysis is based on a heuristic scenario, it is consistent with both experimental observations and existing understanding of saddle-to-saddle dynamics. The analysis further reveals an unexp

Weaknesses

While I fully agree with the reasoning presented in Section 5 and consider it a sufficient contribution, several steps remain heuristic. For instance, the transition between invariant manifolds is not thoroughly explored: while the paper studies the existence of invariant manifolds and the dynamics within them, it offers limited explanation for why the dynamics should adhere to these manifolds in the first place. Proposition 5 states that if one variable takes a larger value, the others must be

Reviewer 03·Rating 4·Confidence 4

Strengths

The structure of the paper is good and related literature is thoroughly documented. Equations (6) and (7) in Theorem 1 provide new examples of embedded fixed points which may be of independent interest. Theorem 3 shows that several properties of units are preserved under gradient flow: being zero, having equal weights, having proportional weights (when $\phi$ is homogeneous in $u$), and having linearly dependent weights (when $\phi$ is linear in $u$). I found these results to be interesting. In Ap
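Two of the preserved properties named in this review (units being zero, and units having equal weights) can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's construction: it uses a two-layer ReLU network, full-batch gradient descent in place of gradient flow, the convention that the ReLU derivative at zero is zero, and a made-up regression target. The function name train_with_structured_init and all hyperparameters are hypothetical.

```python
import numpy as np

def train_with_structured_init(steps=300, lr=0.05, seed=0):
    """Train f(x) = sum_i a_i * relu(w_i . x) from a structured initialization.

    Units 0 and 1 start with identical weights, unit 2 starts at zero,
    and unit 3 is unconstrained. Returns initial and final parameters.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((64, 3))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1]  # arbitrary target (assumption)
    W = 0.5 * rng.standard_normal((4, 3))
    a = 0.5 * rng.standard_normal(4)
    W[1], a[1] = W[0], a[0]   # two units with equal weights
    W[2], a[2] = 0.0, 0.0     # one unit that is exactly zero
    W_init, a_init = W.copy(), a.copy()
    n = len(y)
    for _ in range(steps):
        pre = X @ W.T                        # pre-activations, shape (n, 4)
        h = np.maximum(pre, 0.0)             # ReLU activations
        err = h @ a - y                      # residuals of the mean-squared loss
        active = (pre > 0).astype(float)     # ReLU derivative, taken as 0 at 0
        grad_a = h.T @ err / n
        grad_W = (active * err[:, None] * a[None, :]).T @ X / n
        a = a - lr * grad_a
        W = W - lr * grad_W
    return W_init, a_init, W, a

if __name__ == "__main__":
    W_init, a_init, W, a = train_with_structured_init()
    print("equal units still equal:", np.allclose(W[0], W[1], atol=1e-8))
    print("zero unit still zero:", bool(np.all(W[2] == 0.0) and a[2] == 0.0))
    print("unit 0 moved during training:", not np.allclose(W[0], W_init[0]))
```

The equal units receive identical gradients at every step, and the zero unit receives a zero gradient, so both structures persist even though the remaining weights change. Discrete gradient descent is used here only as a stand-in; the statement in the review concerns gradient flow.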

Weaknesses

I am concerned that Theorem 4 is not novel. As the authors note in Appendix A.1, the incremental learning dynamics of linear networks are quite well understood. [12] and [8] seem to cover the same ground as Theorem 4, and closely related work includes [1, 9, 10, 6, 13]. Could the authors be more explicit in explaining the novelty of this theorem? It is noted that one of the conditions for saddle-to-saddle dynamics is that escape paths closely follow invariant manifolds. The authors do not discu

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning