Separation of Scales and a Thermodynamic Description of Feature Learning in Some CNNs

Inbar Seroussi; Gadi Naveh; Zohar Ringel

arXiv:2112.15383·stat.ML·September 26, 2022

TL;DR

This paper develops a thermodynamic theory of feature learning in fully trained, finite-width deep CNNs and FCNs. The theory rests on a separation of scales in which layers couple only through the kernels (second moments) of their activations and pre-activations, and it yields accurate predictions of network behavior.

Contribution

It introduces a thermodynamic theory built on layer kernels that adapt to the data and on nearly Gaussian fluctuations, giving a tractable, data-aware Gaussian Process description of trained finite-width deep neural networks.

Findings

01. Deep CNNs and FCNs exhibit a separation of scales: layers couple only through the kernels (second moments) of their activations and pre-activations.

02. Finite-width networks adapt their kernels to the data, whereas the kernels of infinite-width networks are inert.

03. The thermodynamic theory accurately predicts network behavior across various settings.

Abstract

Deep neural networks (DNNs) are powerful tools for compressing and distilling information. Their scale and complexity, often involving billions of inter-dependent parameters, render direct microscopic analysis difficult. Under such circumstances, a common strategy is to identify slow variables that average the erratic behavior of the fast microscopic variables. Here, we identify a similar separation of scales occurring in fully trained finitely over-parameterized deep convolutional neural networks (CNNs) and fully connected networks (FCNs). Specifically, we show that DNN layers couple only through the second moment (kernels) of their activations and pre-activations. Moreover, the latter fluctuates in a nearly Gaussian manner. For infinite width DNNs, these kernels are inert, while for finite ones they adapt to the data and yield a tractable data-aware Gaussian Process. The resulting…
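To make the notion of a layer kernel concrete, here is a minimal sketch that is not taken from the paper. It draws one random linear layer, forms the second moment of its pre-activations over the hidden units, and compares it with the kernel obtained in the infinite-width limit. The function names, the two input points, and the weight variance `sigma_w**2 / d` are assumptions chosen for illustration. The sketch covers only an untrained layer at initialization; it does not reproduce the data-dependent kernel adaptation that the abstract attributes to trained finite-width networks.

```python
import random


def empirical_kernel(xs, width, sigma_w=1.0, seed=0):
    """Second moment of pre-activations h = W x, averaged over hidden units."""
    rng = random.Random(seed)
    d = len(xs[0])
    std = sigma_w / d ** 0.5  # assumed scaling: W_ij ~ N(0, sigma_w^2 / d)
    weights = [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(width)]
    # pre[p][i] is the pre-activation of hidden unit i on input p.
    pre = [[sum(w * v for w, v in zip(row, x)) for row in weights] for x in xs]
    n = len(xs)
    return [[sum(a * b for a, b in zip(pre[p], pre[q])) / width
             for q in range(n)] for p in range(n)]


def infinite_width_kernel(xs, sigma_w=1.0):
    """Limit of the empirical kernel as width grows: sigma_w^2 * (x . y) / d."""
    d = len(xs[0])
    return [[sigma_w ** 2 * sum(a * b for a, b in zip(x, y)) / d for y in xs]
            for x in xs]


def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two square matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


if __name__ == "__main__":
    xs = [[1.0, 0.0, 1.0], [0.5, -1.0, 2.0]]  # two hypothetical inputs
    K_inf = infinite_width_kernel(xs)
    for width in (10, 1000, 20000):
        K = empirical_kernel(xs, width)
        print(width, max_abs_diff(K, K_inf))
```

At small width the empirical kernel is a noisy random object, which is the regime where it can carry information beyond the fixed infinite-width kernel; at large width it concentrates on the fixed limit.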

Taxonomy

Topics: Machine Learning in Materials Science · Neural Networks and Applications · Stochastic Gradient Optimization Techniques

Methods: Gaussian Process