Liquid and solid layers in a thermal deep learning machine
Gang Huang, Lai Shun Chan, Hajime Yoshino, Ge Zhang, Yuliang Jin

TL;DR
This paper introduces a thermal deep learning machine, a model inspired by statistical mechanics, and uses it to show a solid-liquid-solid layer structure in deep neural networks: parameters near the input and output boundaries are strongly constrained by the training data, while parameters in the central layers remain largely free.
Contribution
It establishes a physical model of deep learning with a Hamiltonian, demonstrating phase transitions and layer-specific dynamics through numerical experiments.
Findings
Identification of liquid and solid phases in neural network parameters
Layer-specific dynamics show hierarchical and structureless behavior
Phase diagram correlates network depth with phase states
Abstract
Based on deep neural networks (DNNs), deep learning has been successfully applied to many problems, but its mechanism is still not well understood, especially the reason why over-parametrized DNNs can generalize. A recent statistical mechanics theory of supervised learning by a prototypical multi-layer perceptron (MLP) in some artificial learning scenarios predicts that the adjustable parameters of over-parametrized MLPs become strongly constrained by the training data close to the input/output boundaries, while the parameters in the center remain largely free, giving rise to a solid-liquid-solid structure. Here we establish this picture through numerical experiments on benchmark real-world data, using a thermal deep learning machine that explores the phase space of the synaptic weights and neurons. The supervised training is implemented by a GPU-accelerated molecular dynamics algorithm,…
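As a rough illustration of what "thermal" training can mean, the sketch below treats a squared training error as an energy and lets the weights of a tiny network move under that energy plus thermal noise. It is a minimal sketch under stated assumptions, not the authors' method: it swaps the GPU-accelerated molecular dynamics mentioned in the abstract for plain overdamped Langevin dynamics on the CPU, and the XOR data set, the 2-to-3-to-1 tanh network, the quadratic energy, and all names (`DATA`, `init_params`, `energy_and_grad`, `langevin_step`, `run`) are hypothetical choices made for illustration.

```python
import math
import random

# Toy data set (hypothetical): XOR on two binary inputs.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
        ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def init_params(n_hidden, rng):
    """Random weights for a 2 -> n_hidden -> 1 network without biases."""
    W1 = [[rng.gauss(0.0, 0.5) for _ in range(2)] for _ in range(n_hidden)]
    W2 = [rng.gauss(0.0, 0.5) for _ in range(n_hidden)]
    return W1, W2


def energy_and_grad(params, data):
    """Assumed energy: H = 1/2 * sum over samples of (output - target)^2.

    Returns H and its gradient with respect to both weight layers.
    """
    W1, W2 = params
    energy = 0.0
    gW1 = [[0.0, 0.0] for _ in W1]
    gW2 = [0.0 for _ in W2]
    for x, target in data:
        h = [math.tanh(row[0] * x[0] + row[1] * x[1]) for row in W1]
        err = sum(w * hj for w, hj in zip(W2, h)) - target
        energy += 0.5 * err * err
        for j, hj in enumerate(h):
            gW2[j] += err * hj
            back = err * W2[j] * (1.0 - hj * hj)  # tanh'(a) = 1 - tanh(a)^2
            gW1[j][0] += back * x[0]
            gW1[j][1] += back * x[1]
    return energy, (gW1, gW2)


def langevin_step(params, data, dt, temperature, rng):
    """One overdamped Langevin update: w <- w - dt*dH/dw + sqrt(2*T*dt)*noise."""
    _, (gW1, gW2) = energy_and_grad(params, data)
    W1, W2 = params
    amp = math.sqrt(2.0 * temperature * dt)
    new_W1 = [[w - dt * g + amp * rng.gauss(0.0, 1.0)
               for w, g in zip(row, grow)]
              for row, grow in zip(W1, gW1)]
    new_W2 = [w - dt * g + amp * rng.gauss(0.0, 1.0)
              for w, g in zip(W2, gW2)]
    return new_W1, new_W2


def run(temperature, steps=500, dt=0.01, n_hidden=3, seed=0):
    """Evolve the weights at a fixed temperature; return (start, end) energy."""
    rng = random.Random(seed)
    params = init_params(n_hidden, rng)
    e_start, _ = energy_and_grad(params, DATA)
    for _ in range(steps):
        params = langevin_step(params, DATA, dt, temperature, rng)
    e_end, _ = energy_and_grad(params, DATA)
    return e_start, e_end


if __name__ == "__main__":
    for T in (0.0, 0.01, 0.1):
        print(T, run(T))
```

At zero temperature the update reduces to plain gradient descent on the energy. At finite temperature the weights keep fluctuating, and how strongly a given layer fluctuates is the kind of quantity one would compare across layers to separate tightly constrained ("solid") parameters from loosely constrained ("liquid") ones.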
Taxonomy
Topics: Machine Learning in Materials Science · Quantum many-body systems · Model Reduction and Neural Networks
