Feature Learning Dynamics in Infinite-Depth Neural Networks

Zihan Yao; Ruoyu Wu; Tianxiang Gao

arXiv:2512.21075·cs.LG·May 14, 2026


TL;DR

This paper rigorously analyzes the feature learning dynamics in infinite-depth ResNets, revealing how weight reuse affects training and proposing a simplified Neural Feature Dynamics model.

Contribution

It introduces a novel analysis of weight reuse effects in deep ResNets and derives a Neural Feature Dynamics model capturing feature-gradient interactions.

Findings

1. Coupling effects at initialization vanish at rate $O(n^{-1})$.

2. SGD induces nontrivial correlations that survive the infinite-width limit.

3. Depth scaling suppresses coupling effects, enabling a simplified feature dynamics model.

Abstract

Deep neural networks have achieved remarkable success in practice, yet a mechanistic understanding of how features evolve during training remains incomplete, especially in the large-depth limit. For ResNets under depth-$\mu$P scaling, prior work treats the layer index $\ell$ as a continuous time $t_{\ell} = \ell / L$, yielding SDE descriptions of the training dynamics. A key unresolved issue is that backpropagation reuses each forward weight matrix $W_{\ell}$ through its transpose $W_{\ell}^{\top}$, creating correlations between forward features and backward gradients whose behavior and role in feature learning remain unclear. We study this reused-weight forward-backward coupling in one-layer ResNets under depth-$\mu$P. Using conditional Gaussian representations, we explicitly separate the coupling terms induced by weight reuse from decoupled Gaussian fluctuations before taking any network…
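The weight reuse described in the abstract can be made concrete with a small numerical sketch. The code below is an illustration under stated assumptions, not the paper's construction: it assumes a residual update of the form $h_{\ell+1} = h_{\ell} + L^{-1/2} W_{\ell}\,\tanh(h_{\ell})$, Gaussian weights with variance $1/n$, and a linear readout `v`. The function names `forward` and `backward`, the activation, and all sizes are hypothetical choices made for the example. The point it shows is that the backward recursion multiplies by $W_{\ell}^{\top}$, the transpose of the same matrix used in the forward pass.

```python
import numpy as np


def forward(h0, weights):
    """Residual forward pass; returns all hidden states h_0, ..., h_L.

    Assumed update (illustrative): h_{l+1} = h_l + W_l tanh(h_l) / sqrt(L).
    """
    L = len(weights)
    hs = [h0]
    for W in weights:
        h = hs[-1]
        hs.append(h + (W @ np.tanh(h)) / np.sqrt(L))
    return hs


def backward(hs, weights, g_out):
    """Backpropagate g_out = df/dh_L down to df/dh_0.

    Each step reuses the forward matrix W_l through its transpose W_l.T.
    """
    L = len(weights)
    g = g_out
    # Walk layers from l = L-1 down to l = 0, pairing W_l with h_l.
    for W, h in zip(reversed(weights), reversed(hs[:-1])):
        dphi = 1.0 - np.tanh(h) ** 2          # derivative of tanh at h_l
        g = g + dphi * (W.T @ g) / np.sqrt(L)
    return g


rng = np.random.default_rng(0)
n, L = 64, 32                                  # width and depth (arbitrary)
weights = [rng.standard_normal((n, n)) / np.sqrt(n) for _ in range(L)]
h0 = rng.standard_normal(n)
v = rng.standard_normal(n) / np.sqrt(n)        # hypothetical linear readout

t = np.arange(L + 1) / L                       # continuous-time grid t_l = l / L
hs = forward(h0, weights)
grad_h0 = backward(hs, weights, v)             # gradient of v . h_L w.r.t. h_0
```

Passing an independently sampled list of matrices as the `weights` argument of `backward` would give a decoupled backward pass for comparison. That pass no longer computes the true gradient, and the gap between the two is one informal way to look at the coupling that weight reuse introduces.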
