Separation of Scales and a Thermodynamic Description of Feature Learning in Some CNNs
Inbar Seroussi, Gadi Naveh, Zohar Ringel

TL;DR
This paper develops a thermodynamic framework for understanding feature learning in deep CNNs and FCNs by identifying scale separation and kernel dynamics, enabling accurate predictions and new insights into DNN behavior.
Contribution
It introduces a thermodynamic theory based on kernel dynamics and Gaussian fluctuations, providing a novel analytical approach for trained deep neural networks.
Findings
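Fully trained deep CNNs and FCNs exhibit a separation of scales: their layers couple only through the second moments (kernels) of their activations and pre-activations.
Finite-width networks adapt their kernels to the data, whereas the kernels of infinite-width networks are inert.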
The thermodynamic theory accurately predicts network behavior across various settings.
Abstract
Deep neural networks (DNNs) are powerful tools for compressing and distilling information. Their scale and complexity, often involving billions of inter-dependent parameters, render direct microscopic analysis difficult. Under such circumstances, a common strategy is to identify slow variables that average the erratic behavior of the fast microscopic variables. Here, we identify a similar separation of scales occurring in fully trained finitely over-parameterized deep convolutional neural networks (CNNs) and fully connected networks (FCNs). Specifically, we show that DNN layers couple only through the second moment (kernels) of their activations and pre-activations. Moreover, the latter fluctuates in a nearly Gaussian manner. For infinite width DNNs, these kernels are inert, while for finite ones they adapt to the data and yield a tractable data-aware Gaussian Process. The resulting…
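To make the "inert kernel" statement concrete, the following is a minimal sketch of the infinite-width baseline only, not the paper's finite-width theory. It assumes a single hidden ReLU layer with Gaussian weights of variance 1/d, for which the kernel has a known closed form (the first-order arc-cosine kernel), and uses that fixed kernel in standard Gaussian Process regression. All function names (`relu_nngp_kernel`, `empirical_kernel`, `gp_posterior_mean`) are hypothetical and chosen for illustration.

```python
import numpy as np


def relu_nngp_kernel(X, Y):
    """Closed-form infinite-width kernel E_w[relu(w.x) relu(w.y)], w ~ N(0, I/d).

    Assumption: one hidden ReLU layer, no biases, nonzero inputs.
    """
    d = X.shape[1]
    norms = np.outer(np.linalg.norm(X, axis=1), np.linalg.norm(Y, axis=1))
    cos = np.clip((X @ Y.T) / norms, -1.0, 1.0)
    theta = np.arccos(cos)
    return norms * (np.sin(theta) + (np.pi - theta) * cos) / (2.0 * np.pi * d)


def empirical_kernel(X, Y, width, rng):
    """Kernel of a randomly initialized hidden layer with `width` ReLU units.

    It fluctuates around relu_nngp_kernel and converges to it as width grows.
    """
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, width))
    return np.maximum(X @ W, 0.0) @ np.maximum(Y @ W, 0.0).T / width


def gp_posterior_mean(K_train, K_test_train, y, noise_var):
    """Standard GP regression mean: K_* (K + noise_var * I)^{-1} y."""
    n = K_train.shape[0]
    alpha = np.linalg.solve(K_train + noise_var * np.eye(n), y)
    return K_test_train @ alpha


# Toy data (purely illustrative, not from the paper).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 8))
y_train = np.sin(X_train[:, 0])
X_test = rng.normal(size=(5, 8))

K = relu_nngp_kernel(X_train, X_train)
K_star = relu_nngp_kernel(X_test, X_train)
prediction = gp_posterior_mean(K, K_star, y_train, noise_var=0.1)

# Deviation of a finite random layer from the fixed infinite-width kernel.
for width in (100, 10_000):
    gap = np.abs(empirical_kernel(X_train, X_train, width, rng) - K).max()
    print(f"width={width}: max |K_empirical - K_infinite| = {gap:.4f}")
```

In this baseline the kernel depends only on the inputs and never on the targets. The abstract's claim is that, at finite width, the trained network's kernels additionally adapt to the data; that adaptation is not modeled in the sketch above.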
Taxonomy
Topics: Machine Learning in Materials Science · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
Methods: Gaussian Process
