Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets

Arthur Jacot; Alexandre Kaiser

arXiv:2405.17573·stat.ML·March 26, 2026

Open Access · 4 Reviews

TL;DR

This paper models Leaky ResNets using Hamiltonian mechanics to explain the emergence of bottleneck structures in feature learning, revealing a balance between kinetic and potential energies that governs representation dynamics.

Contribution

It introduces a Hamiltonian framework for analyzing Leaky ResNets, providing new insights into the feature learning process and the formation of bottleneck structures.

Findings

01 · Hamiltonian reformulation highlights key forces in feature learning

02 · Bottleneck structure explained by separation of timescales

03 · Adaptive layer step-size improves training efficiency

Abstract

We study Leaky ResNets, which interpolate between ResNets and Fully-Connected nets depending on an 'effective depth' hyper-parameter $\tilde{L}$. In the infinite depth limit, we study 'representation geodesics' $A_{p}$: continuous paths in representation space (similar to NeuralODEs) from input $p = 0$ to output $p = 1$ that minimize the parameter norm of the network. We give a Lagrangian and Hamiltonian reformulation, which highlight the importance of two terms: a kinetic energy which favors small layer derivatives $\partial_{p} A_{p}$ and a potential energy that favors low-dimensional representations, as measured by the 'Cost of Identity'. The balance between these two forces offers an intuitive understanding of feature learning in ResNets. We leverage this intuition to explain the emergence of a bottleneck structure, as observed in previous work: for large $\tilde{L}$ the potential energy…
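
As a rough illustration of the setup the abstract describes, the sketch below integrates a leaky residual flow with explicit Euler steps and approximates its parameter norm. The excerpt does not state the dynamics, so the form $\partial_{p} \alpha_{p} = -\tilde{L} \alpha_{p} + W_{p} \sigma(\alpha_{p})$ with a ReLU nonlinearity is an assumption here, and the names `leaky_resnet_forward`, `parameter_norm`, and `eff_depth` are hypothetical. It is a minimal sketch of the idea, not the authors' implementation.

```python
import math


def relu(v):
    """Elementwise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def leaky_resnet_forward(x, weights, eff_depth):
    """Explicit Euler integration over p in [0, 1] of the ASSUMED dynamics
    d(alpha)/dp = -eff_depth * alpha + W_p relu(alpha).

    weights: one square matrix per Euler step (the discretized W_p).
    eff_depth: stands in for the 'effective depth' hyper-parameter.
    """
    dp = 1.0 / len(weights)
    alpha = list(x)
    for W in weights:
        drive = matvec(W, relu(alpha))
        # The leak term pulls alpha toward zero; the weights push it away.
        alpha = [a + dp * (-eff_depth * a + d) for a, d in zip(alpha, drive)]
    return alpha


def parameter_norm(weights):
    """Riemann-sum approximation of the integral over p of ||W_p||_F^2."""
    dp = 1.0 / len(weights)
    return dp * sum(w * w for W in weights for row in W for w in row)


if __name__ == "__main__":
    # With zero weights only the leak acts, so the state decays roughly
    # like exp(-eff_depth); the two printed numbers should be close.
    zero_weights = [[[0.0]] for _ in range(1000)]
    out = leaky_resnet_forward([1.0], zero_weights, eff_depth=2.0)
    print(out[0], math.exp(-2.0))
```

Under this assumed form, a larger `eff_depth` makes it more expensive in weight norm to merely preserve a representation across layers, which is the kind of trade-off the abstract attributes to the potential energy.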

Peer Reviews

Decision·Submitted to NeurIPS 2024

Reviewer 01 · Rating 4 · Confidence 4

Strengths

- This paper offers a novel approach to understanding feature learning by applying Hamiltonian mechanics to Leaky ResNets, bridging a gap between theoretical physics and machine learning.
- This paper conducts experiments to validate the findings. The experiments yield some interesting observations, which may give new insights for future work.
- The insights gained from this study have the potential to influence future research in neural network optimization and feature learn

Weaknesses

1. There are multiple typos in the article, which affect readability. Below are several obvious typos, and it is recommended that the authors carefully polish the language of the article.
   - The third word in line 24, "phenomenon" $\rightarrow$ "phenomena".
   - In line 27, "determines" $\rightarrow$ "determine".
   - In line 40, "lead" $\rightarrow$ "leads".
   - In line 68, the preposition "in" should be added after "interested".
   - The formula at the end of line 78 should be $$\alpha_{q}^{'}

Reviewer 02 · Rating 5 · Confidence 2

Strengths

1. Studies the problem of understanding feature learning in NNs, which is of broader interest in the NeurIPS community.
2. The paper identifies the effect of "effective depth" in Leaky ResNets on the previously observed Bottleneck structure, through a Hamiltonian decomposition into kinetic and potential energy.
3. In particular, the authors provide a nice intuition that the potential energy is minimised at large effective depths, which corresponds to low rank solutions.

Weaknesses

1. The paper is unclear at several important points, which compromises readability. For example, is the leakage parameter $\tilde{L}$ supposed to lie in $[0,1]$ (as suggested by line 80) or in $[0,\infty)$ (as is necessary for the "separation of timescales" arguments in section 2.1)? Moreover, in line 224 the authors write closed forms for the Hamiltonian, but it is not clear how they obtain this object from the previously stated Hamiltonian on line 195.
2. Theory seems tied to several simplifying

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The paper addresses a timely and important topic: feature learning in DNNs.
2. The introduction provides a good connection to previous work.
3. The mapping to a Hamiltonian formulation is interesting and provides a valuable intuition.
4. The propositions and theorems are mostly clearly stated and the proofs seem sound.

Weaknesses

1. Numerical experiments:
   a. Many of the figures are poorly explained and have missing labels, e.g. in Figs. 1b and 2b, what is the color code?
   b. I failed to find a mention of what data the models were trained / evaluated on.
   c. Fig. 2c: what is the projection onto?
2. It is sometimes hard to follow the rationale and motivation for the "storyline" of the paper, and its different sections could be better connected to each other.
3. Novelty wrt previous works - in lines 206-208 a difference

Reviewer 04 · Rating 6 · Confidence 3

Strengths

* The idea of showing that neural networks have a certain property by constructing a model where trajectories spend most of their time in regions with that property is interesting.
* The authors explain their intuition as well as the underlying assumptions of their derivations and highlight the limitations of their analysis.
* The COI seems to be novel and to reflect some interesting properties of ODE models for neural networks.

Weaknesses

* The main results are not clearly stated in the abstract or introduction. The authors' stated goal is to study Leaky ResNets; it would be nice to have a rigorous statement of the results of that study at the beginning of the paper.
* In the abstract, the authors state that the paper explains the emergence of a bottleneck structure in ResNets. It is not clear how this claim can be derived from the results in the paper.
* There is no rigorous justification that the results from the study apply

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning