Extended critical regimes of deep neural networks
Cheng Kevin Qu, Asem Wardak, Pulin Gong

TL;DR
This paper introduces a new mean field theory for deep neural networks that incorporates heavy-tailed weight distributions, revealing an extended critical regime that enhances computational capabilities and training efficiency.
Contribution
The paper develops a mean field theory that combines heavy-tailed random matrix theory with non-equilibrium statistical physics, explaining how an extended critical regime emerges in DNNs without fine-tuning parameters.
Findings
Heavy-tailed weights lead to an extended critical regime in DNNs.
In the extended critical regime, DNNs exhibit rich and complex propagation dynamics across layers, which the authors link to computational advantages.
The theory guides the design of more effective neural architectures.
Abstract
Deep neural networks (DNNs) have been successfully applied to many real-world problems, but a complete understanding of their dynamical and computational principles is still lacking. Conventional theoretical frameworks for analysing DNNs often assume random networks with coupling weights obeying Gaussian statistics. However, non-Gaussian, heavy-tailed coupling is a ubiquitous phenomenon in DNNs. Here, by weaving together theories of heavy-tailed random matrices and non-equilibrium statistical physics, we develop a new type of mean field theory for DNNs which predicts that heavy-tailed weights enable the emergence of an extended critical regime without fine-tuning parameters. In this extended critical regime, DNNs exhibit rich and complex propagation dynamics across layers. We further elucidate that the extended criticality endows DNNs with profound computational advantages: balancing…
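To make the contrast between Gaussian and heavy-tailed coupling concrete, here is a minimal sketch of drawing a random layer's weights from a symmetric alpha-stable distribution. It is an illustration under stated assumptions, not the authors' code: the function names, the Chambers-Mallows-Stuck sampler, and the scale convention `sigma_w * (2 * n_in) ** (-1 / alpha)` are choices made here. That scale is picked so that `alpha = 2` reduces to the usual Gaussian initialisation with variance `sigma_w**2 / n_in`.

```python
import numpy as np


def sample_symmetric_stable(alpha, size, rng):
    """Draw symmetric alpha-stable samples (unit scale) via Chambers-Mallows-Stuck."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform angle
    w = rng.exponential(1.0, size)                # unit-mean exponential
    if alpha == 1.0:
        return np.tan(u)                          # Cauchy special case
    return (np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def heavy_tailed_weights(n_in, n_out, alpha, sigma_w, rng):
    """Hypothetical helper: an (n_out, n_in) weight matrix with stable entries.

    Assumed scale convention (not taken from the source): at alpha = 2 the
    entries are Gaussian with variance sigma_w**2 / n_in.
    """
    scale = sigma_w * (2.0 * n_in) ** (-1.0 / alpha)
    return scale * sample_symmetric_stable(alpha, (n_out, n_in), rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for alpha in (2.0, 1.5):
        W = heavy_tailed_weights(500, 500, alpha, 1.5, rng)
        # Heavy tails show up as a few entries far larger than the typical one.
        print(alpha, np.abs(W).max(), np.median(np.abs(W)))
```

For `alpha < 2` the entries have infinite variance, so a handful of very large weights dominate each row, which is the sense in which the coupling is heavy-tailed.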
Taxonomy
Topics: Statistical Mechanics and Entropy · Neural Networks and Applications · Model Reduction and Neural Networks
