Hamiltonian Mechanics of Feature Learning: Bottleneck Structure in Leaky ResNets
Arthur Jacot, Alexandre Kaiser

TL;DR
This paper models Leaky ResNets using Hamiltonian mechanics to explain the emergence of bottleneck structures in feature learning, revealing a balance between kinetic and potential energies that governs representation dynamics.
Contribution
It introduces a Hamiltonian framework for analyzing Leaky ResNets, providing new insights into the feature learning process and the formation of bottleneck structures.
Findings
Hamiltonian reformulation highlights key forces in feature learning
Bottleneck structure explained by separation of timescales
Adaptive layer step-size improves training efficiency
Abstract
We study Leaky ResNets, which interpolate between ResNets and Fully-Connected nets depending on an 'effective depth' hyper-parameter $\tilde{L}$. In the infinite depth limit, we study 'representation geodesics': continuous paths in representation space (similar to NeuralODEs) from input to output that minimize the parameter norm of the network. We give a Lagrangian and Hamiltonian reformulation, which highlight the importance of two terms: a kinetic energy which favors small layer derivatives and a potential energy that favors low-dimensional representations, as measured by the 'Cost of Identity'. The balance between these two forces offers an intuitive understanding of feature learning in ResNets. We leverage this intuition to explain the emergence of a bottleneck structure, as observed in previous work: for large $\tilde{L}$ the potential energy…
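The abstract does not spell out the network update or the Cost of Identity, so the following is only a minimal pure-Python sketch under stated assumptions, not the authors' code. It assumes the discrete Leaky ResNet update $\alpha_{\ell+1} = (1 - \tilde{L}/L)\,\alpha_\ell + \frac{1}{L} W_\ell \sigma(\alpha_\ell)$ with ReLU $\sigma$ and $L$ layers, i.e. an Euler step of $\partial_p \alpha_p = -\tilde{L}\alpha_p + W_p\sigma(\alpha_p)$ for $p \in [0,1]$, and it takes the Cost of Identity of a representation $A$ (columns are inputs) to be the smallest $\|W\|_F^2$ with $W\sigma(A) = A$. Under that assumption $\tilde{L} = 0$ gives a plain residual update and $\tilde{L} = L$ gives a fully-connected layer (with weights scaled by $1/L$). All function names (`leaky_resnet_forward`, `parameter_norm`, `cost_of_identity`, `solve`) are hypothetical.

```python
# Hypothetical sketch, not the authors' implementation.
# Assumed update: alpha_{l+1} = (1 - Lt/L) * alpha_l + (1/L) * W_l relu(alpha_l),
# where Lt is the effective depth and L the number of layers.


def relu(v):
    """Elementwise ReLU on a list of numbers."""
    return [max(0.0, x) for x in v]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def leaky_resnet_forward(x, weights, eff_depth):
    """Forward pass: eff_depth = 0 is a ResNet, eff_depth = len(weights) is fully connected."""
    L = len(weights)
    alpha = list(x)
    for W in weights:
        update = matvec(W, relu(alpha))
        alpha = [(1.0 - eff_depth / L) * a + u / L for a, u in zip(alpha, update)]
    return alpha


def parameter_norm(weights):
    """Riemann sum (1/L) * sum_l ||W_l||_F^2, approximating the integral of ||W_p||_F^2 over [0, 1]."""
    L = len(weights)
    return sum(w * w for W in weights for row in W for w in row) / L


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting (M square, invertible)."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x


def cost_of_identity(A):
    """Min ||W||_F^2 subject to W relu(A) = A, for a d x N matrix A (columns are inputs).

    Assumes relu(A) has full column rank, so the minimizer is W = A relu(A)^+ and the
    cost equals trace(A K^{-1} A^T) with the N x N Gram matrix K = relu(A)^T relu(A).
    """
    S = [relu(row) for row in A]  # relu(A), still d x N
    N = len(A[0])
    K = [[sum(S[j][n] * S[j][m] for j in range(len(S))) for m in range(N)] for n in range(N)]
    # Each row r of A contributes r^T K^{-1} r to the trace.
    return sum(sum(r * x for r, x in zip(row, solve(K, row))) for row in A)
```

As a sanity check on the "low-dimensional" intuition: when every entry of $A$ is nonnegative, ReLU acts as the identity on $A$, the minimizer $W = AA^+$ is the orthogonal projection onto the column space of $A$, and the cost equals $\operatorname{rank}(A)$. For example, `cost_of_identity([[1.0, 2.0], [3.0, 1.0], [2.0, 2.0]])` is 2 up to rounding.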
Peer Reviews
Decision · Submitted to NeurIPS 2024
- This paper offers a novel approach for understanding feature learning by applying Hamiltonian mechanics to Leaky ResNets, bridging a gap between theoretical physics and machine learning. - This paper conducts experiments to validate the findings. Based on experiments, some interesting observations are obtained, which may give some new insights for future works. - The insights gained from this study have the potential to influence future research in neural network optimization and feature learn
1. There are multiple typos in the article, which affect readability. Below are several obvious typos, and it is recommended that the authors carefully polish the language of the article. - The third word in line 24, "phenomenon" $\rightarrow$ "phenomena". - In line 27, "determines" $\rightarrow$ "determine". - In line 40, "lead" $\rightarrow$ "leads". - In line 68, the preposition "in" should be added after "interested". - The formula at the end of line 78 should be $\alpha_{q}^{'}$…
1. Studies the problem of understanding feature learning in NNs, which is of broader interest in the NeurIPS community. 2. The paper identifies the effect of “effective depth” in Leaky ResNets on the previously observed Bottleneck structure, through a Hamiltonian decomposition into kinetic and potential energy. 3. In particular, the authors provide a nice intuition that the potential energy is minimised at large effective depths, which corresponds to low rank solutions.
1. The paper is unclear at several important points, which compromises readability. For example, is the leakage parameter $\tilde{L}$ supposed to lie in $[0,1]$ (as suggested by line 80) or in $[0,\infty]$ (as is necessary for the “separation of timescales” arguments in section 2.1)? Moreover, in line 224 the authors write closed forms for the Hamiltonian, but it is not clear how they obtain this object from the previously stated Hamiltonian on line 195. 2. Theory seems tied to several simplifying
1. The paper addresses a timely and important topic: feature learning in DNNs. 2. The introduction provides a good connection to previous work. 3. The mapping to a Hamiltonian formulation is interesting and provides a valuable intuition. 4. The propositions and theorems are mostly clearly stated and the proofs seem sound.
1. Numerical experiments: a. Many of the figures are poorly explained and have missing labels, e.g. in Figs 1b and 2b, what is the color code? b. I failed to find a mention of what data the models were trained / evaluated on. c. Fig 2c: what is the projection onto? 2. It is sometimes hard to follow the rationale and motivation for the "storyline" of the paper, and its different sections could be better connected to each other. 3. Novelty with respect to previous works: in lines 206-208 a difference
* The idea of showing that neural networks have a certain property by constructing a model where trajectories spend most of their time in regions with that property is interesting. * The authors explain their intuition as well as the underlying assumptions of their derivations and highlight the limitations of their analysis. * The COI (Cost of Identity) seems to be novel and to reflect some interesting properties of ODE models for neural networks.
* The main results are not clearly stated in the abstract or introduction. The authors' stated goal is to study Leaky ResNets; it would be nice to have a rigorous statement of the results of that study at the beginning of the paper. * In the abstract, the authors state that the paper explains the emergence of a bottleneck structure in ResNets. It is not clear how this claim can be derived from the results in the paper. * There is no rigorous justification that the results from the study apply
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
