Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning

Charles H. Martin; Michael W. Mahoney

arXiv:1810.01075·cs.LG·October 3, 2018·74 cites

TL;DR

This paper uses Random Matrix Theory to analyze the spectral properties of DNN weight matrices, revealing that training inherently induces implicit self-regularization, which varies with training parameters like batch size and impacts generalization.

Contribution

It introduces a theory identifying 5+1 phases of implicit self-regularization in DNNs, linking spectral properties to training dynamics and generalization phenomena.

Findings

01. The empirical spectral density of DNN weight matrices shows signatures of regularization even when no explicit regularization is applied.

02. Training itself induces implicit self-regularization, which can be observed as distinct spectral phases.

03. Batch size influences the degree of implicit regularization and the generalization gap.

Abstract

Random Matrix Theory (RMT) is applied to analyze weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet. Empirical and theoretical results clearly indicate that the DNN training process itself implicitly implements a form of Self-Regularization. The empirical spectral density (ESD) of DNN layer matrices displays signatures of traditionally-regularized statistical models, even in the absence of exogenously specifying traditional forms of explicit regularization. Building on relatively recent results in RMT, most notably its extension to Universality classes of Heavy-Tailed matrices, we develop a theory to identify 5+1 Phases of Training, corresponding to increasing amounts of Implicit Self-Regularization. These phases can be…
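As a rough illustration of the empirical spectral density (ESD) mentioned in the abstract, the sketch below computes the eigenvalues of a layer correlation matrix and compares them with the bulk edges of the Marchenko-Pastur law. It assumes the convention X = WᵀW / N for an N × M weight matrix W with aspect ratio Q = N / M ≥ 1. The function names are hypothetical, and a Gaussian random matrix stands in for real DNN weights, so this is only a baseline, not the authors' code.

```python
import numpy as np


def empirical_spectral_density(W):
    """Eigenvalues of X = W^T W / N for an N x M matrix W (assumed convention)."""
    N, _ = W.shape
    # Squared singular values of W are the eigenvalues of W^T W,
    # so X never has to be formed explicitly.
    singular_values = np.linalg.svd(W, compute_uv=False)
    return singular_values**2 / N


def marchenko_pastur_edges(N, M, sigma=1.0):
    """Bulk edges of the Marchenko-Pastur law for Q = N / M >= 1."""
    Q = N / M
    lam_minus = sigma**2 * (1.0 - 1.0 / np.sqrt(Q)) ** 2
    lam_plus = sigma**2 * (1.0 + 1.0 / np.sqrt(Q)) ** 2
    return lam_minus, lam_plus


# Stand-in for a layer weight matrix: i.i.d. Gaussian entries (an assumption;
# a trained layer would be loaded from a model instead).
rng = np.random.default_rng(0)
N, M = 1000, 500
W = rng.normal(size=(N, M))

eigs = empirical_spectral_density(W)
lam_minus, lam_plus = marchenko_pastur_edges(N, M)

print("smallest / largest eigenvalue:", eigs.min(), eigs.max())
print("Marchenko-Pastur bulk edges:  ", lam_minus, lam_plus)
```

For a purely random matrix the eigenvalues should fall close to the Marchenko-Pastur bulk. Eigenvalues of a trained layer that lie well outside the upper edge, or a heavy-tailed ESD, are the kind of departure the paper interprets as a sign of self-regularization.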

Taxonomy

Topics: Statistical Mechanics and Entropy · Face and Expression Recognition · Random Matrices and Applications

Methods: 1x1 Convolution · Convolution · Local Response Normalization · Grouped Convolution · Dropout · Dense Connections · Max Pooling · Softmax