Traditional and Heavy-Tailed Self Regularization in Neural Network Models

Charles H. Martin; Michael W. Mahoney

arXiv:1901.08276·cs.LG·January 25, 2019·40 cites

TL;DR

This paper applies Random Matrix Theory to analyze neural network weight matrices, revealing implicit self-regularization phenomena, including heavy-tailed behaviors, that influence training dynamics and generalization.

Contribution

It introduces a theory of 5+1 phases of training based on spectral properties, linking implicit regularization to heavy-tailed matrix behaviors in neural networks.

Findings

01. Spectral density of DNN layers shows regularization signatures.

02. Heavy-tailed self-regularization emerges in state-of-the-art models.

03. Training phases can be manipulated via batch size.

Abstract

Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet. Empirical and theoretical results clearly indicate that the empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of regularization, such as Dropout or Weight Norm constraints. Building on recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of Implicit Self-Regularization. For smaller and/or older DNNs, this Implicit Self-Regularization is…
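As a rough illustration of what an ESD is, the sketch below computes the eigenvalues of the correlation matrix X = WᵀW / N for a purely random Gaussian matrix and compares their range with the Marchenko-Pastur bulk edges that standard RMT predicts for such a matrix. The normalization, the matrix sizes, and the helper names (`esd_eigenvalues`, `marchenko_pastur_edges`) are assumptions made for this example and are not taken from this page; a trained layer's weight matrix would be substituted for the random `W`.

```python
import numpy as np


def esd_eigenvalues(W):
    """Eigenvalues of X = W^T W / N for an N x M matrix W (transposed if N < M)."""
    N, M = W.shape
    if N < M:
        W = W.T
        N, M = M, N
    X = W.T @ W / N                # M x M correlation matrix (assumed normalization)
    return np.linalg.eigvalsh(X)   # real eigenvalues in ascending order


def marchenko_pastur_edges(Q, sigma2=1.0):
    """Bulk edges of the Marchenko-Pastur law for aspect ratio Q = N / M >= 1."""
    inv_sqrt_q = 1.0 / np.sqrt(Q)
    lam_minus = sigma2 * (1.0 - inv_sqrt_q) ** 2
    lam_plus = sigma2 * (1.0 + inv_sqrt_q) ** 2
    return lam_minus, lam_plus


# Hypothetical stand-in for a layer weight matrix: i.i.d. standard normal entries.
rng = np.random.default_rng(0)
N, M = 600, 300
W = rng.normal(size=(N, M))

eigs = esd_eigenvalues(W)
lam_minus, lam_plus = marchenko_pastur_edges(N / M)

# A histogram of `eigs` is the empirical spectral density.
density, bin_edges = np.histogram(eigs, bins=50, density=True)

print("smallest / largest eigenvalue:", eigs.min(), eigs.max())
print("Marchenko-Pastur bulk edges:  ", lam_minus, lam_plus)
```

For a random matrix like this one, the eigenvalues should fall close to the interval between the two edges. Eigenvalue mass extending well beyond the upper edge is the kind of departure from the random baseline that a heavy-tailed spectrum would show.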

Taxonomy

Topics: Statistical Mechanics and Entropy · Neural Networks and Applications · Sparse and Compressive Sensing Techniques

Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dense Connections · Max Pooling · Softmax · Dropout