Traditional and Heavy-Tailed Self Regularization in Neural Network Models
Charles H. Martin, Michael W. Mahoney

TL;DR
This paper applies Random Matrix Theory to analyze neural network weight matrices, revealing implicit self-regularization phenomena, including heavy-tailed behaviors, that influence training dynamics and generalization.
Contribution
It introduces a theory of 5+1 phases of training based on spectral properties, linking implicit regularization to heavy-tailed matrix behaviors in neural networks.
Findings
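The empirical spectral density (ESD) of DNN layer weight matrices shows signatures of traditionally regularized models, even without explicit regularization such as Dropout.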
Heavy-tailed self-regularization emerges in state-of-the-art models.
Training phases can be manipulated via batch size.
Abstract
Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet. Empirical and theoretical results clearly indicate that the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of regularization, such as Dropout or Weight Norm constraints. Building on recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of Implicit Self-Regularization. For smaller and/or older DNNs, this Implicit Self-Regularization is…
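To make the notion of an empirical spectral density concrete, here is a minimal sketch in Python with NumPy. It is not the authors' code. It assumes the common RMT convention of forming the correlation matrix X = WᵀW / N for an N x M matrix W with aspect ratio Q = N / M ≥ 1, and it compares the eigenvalues of X with the Marchenko-Pastur bulk edges σ²(1 ± 1/√Q)². The function names `esd` and `mp_edges` are hypothetical, and the random matrices below stand in for real layer weights, so nothing here reproduces results from the paper.

```python
import numpy as np


def esd(W):
    """Eigenvalues of X = W^T W / N for an N x M matrix W (N >= M).

    The eigenvalues are obtained from the singular values of W,
    which is numerically more stable than forming W^T W explicitly.
    """
    W = np.asarray(W, dtype=float)
    if W.shape[0] < W.shape[1]:
        W = W.T  # ensure N >= M so that Q = N / M >= 1
    N = W.shape[0]
    singular_values = np.linalg.svd(W, compute_uv=False)
    return np.sort(singular_values ** 2 / N)


def mp_edges(N, M, sigma2=1.0):
    """Marchenko-Pastur bulk edges for i.i.d. entries with variance sigma2."""
    Q = N / M
    lam_minus = sigma2 * (1.0 - 1.0 / np.sqrt(Q)) ** 2
    lam_plus = sigma2 * (1.0 + 1.0 / np.sqrt(Q)) ** 2
    return lam_minus, lam_plus


rng = np.random.default_rng(0)
N, M = 1000, 250

# Gaussian baseline: the ESD should sit close to the Marchenko-Pastur bulk.
W_gauss = rng.normal(size=(N, M))

# Synthetic heavy-tailed stand-in (an assumption for illustration only):
# Pareto-distributed magnitudes with random signs.
signs = rng.choice([-1.0, 1.0], size=(N, M))
W_heavy = signs * rng.pareto(1.5, size=(N, M))

for name, W in [("gaussian", W_gauss), ("heavy-tailed", W_heavy)]:
    eigenvalues = esd(W)
    # Use the empirical entry variance as sigma2 for the comparison.
    lam_minus, lam_plus = mp_edges(N, M, sigma2=W.var())
    outside = np.mean((eigenvalues < lam_minus) | (eigenvalues > lam_plus))
    print(f"{name}: max eigenvalue = {eigenvalues.max():.3g}, "
          f"MP upper edge = {lam_plus:.3g}, "
          f"fraction outside bulk = {outside:.3f}")
```

For the Gaussian matrix the largest eigenvalue should land near the upper edge, while the heavy-tailed stand-in should produce eigenvalues far beyond it. To inspect a trained network instead, one would pass a layer's weight matrix to `esd` in place of the synthetic matrices.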
Taxonomy
Topics: Statistical Mechanics and Entropy · Neural Networks and Applications · Sparse and Compressive Sensing Techniques
Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dense Connections · Max Pooling · Softmax · Dropout
