Deep learning generalizes because the parameter-function map is biased towards simple functions

Guillermo Valle-Pérez; Chico Q. Camargo; Ard A. Louis

arXiv:1805.08522 · stat.ML · April 23, 2019 · 86 cites

Open Access

TL;DR

This paper explains why deep neural networks generalize well by showing their parameter-function map is biased towards simple functions, supported by theoretical bounds and empirical evidence across various architectures and datasets.

Contribution

It introduces a new explanation for DNN generalization based on a bias towards simple functions, supported by algorithmic information theory and PAC-Bayes bounds.
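
The probability-complexity bound behind this contribution can be written as below, in the form we understand it from the algorithmic-information-theory work on input-output maps. Here P(f) is the probability that randomly sampled parameters map to the function f, K̃(f) is a computable approximation to the Kolmogorov complexity of f (for example an LZ-based estimate), and a > 0 and b are constants that depend on the map but not on f. The notation is our assumption, not copied from the paper.

```latex
P(f) \le 2^{-a\,\tilde{K}(f) - b}
\quad\Longleftrightarrow\quad
\log_2 P(f) \le -a\,\tilde{K}(f) - b
```

Because this is only an upper bound, a simple function can still be improbable. What it rules out is a function that is both complex and highly probable, which is the sense in which the map is exponentially biased towards simple functions.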

Findings

01  The parameter-function map is exponentially biased towards simple functions.

02  Empirical evidence of simplicity bias in various neural network architectures.

03  PAC-Bayes bounds correlate with the actual generalization error on datasets such as MNIST and CIFAR10.
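
A minimal sketch of how a PAC-Bayes bound turns a prior probability into an error bound. It assumes the realizable-case bound in the form -ln(1 - ε) ≤ [ln(1/P(U)) + ln(2m/δ)] / (m - 1). Here U is the set of functions consistent with the m training examples, P(U) is its probability under randomly sampled parameters, and δ is the confidence parameter. We believe this matches the McAllester-style bound the paper uses, but the exact constants are an assumption. The helper name `pac_bayes_error_bound` is hypothetical, and ln P(U) is simply taken as an input. As we understand it, the paper estimates this quantity with a Gaussian-process approximation.

```python
import math


def pac_bayes_error_bound(log_p_u: float, m: int, delta: float = 0.05) -> float:
    """Hypothetical helper: upper bound on the expected error eps of the posterior.

    Assumes the realizable-case PAC-Bayes bound
        -ln(1 - eps) <= (ln(1 / P(U)) + ln(2m / delta)) / (m - 1),
    where log_p_u = ln P(U) is supplied by the caller (natural log, so <= 0).
    """
    if m < 2:
        raise ValueError("need at least two training examples")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if log_p_u > 0.0:
        raise ValueError("ln P(U) cannot be positive")
    rhs = (-log_p_u + math.log(2 * m / delta)) / (m - 1)
    # Solve -ln(1 - eps) <= rhs for eps; expm1 keeps precision when rhs is small.
    return -math.expm1(-rhs)


# Illustrative numbers only: 10,000 examples, ln P(U) = -2000 nats.
print(pac_bayes_error_bound(-2000.0, 10_000))
```

With these made-up inputs (not results from the paper), the bound evaluates to roughly 0.18. A larger P(U) gives a tighter bound, so a parameter-function map that puts most prior mass on simple functions fitting the data leads to lower bounded error.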

Abstract

Deep neural networks (DNNs) generalize remarkably well without explicit regularization even in the strongly over-parametrized regime where classical learning theory would instead predict that they would severely overfit. While many proposals for some kind of implicit regularization have been made to rationalise this success, there is no consensus on the fundamental reason why DNNs do not strongly overfit. In this paper, we provide a new explanation. By applying a very general probability-complexity bound recently derived from algorithmic information theory (AIT), we argue that the parameter-function map of many DNNs should be exponentially biased towards simple functions. We then provide clear evidence for this strong simplicity bias in a model DNN for Boolean functions, as well as in much larger fully connected and convolutional networks applied to CIFAR10 and MNIST. As the target…
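
The Boolean-function experiment mentioned in the abstract can be sketched as a sampling procedure. Draw many random parameter sets for a small fully connected ReLU network, record the Boolean function each one computes on all 2^7 = 128 inputs, and compare how often each function appears with a complexity estimate. Several details are illustrative assumptions, not necessarily the paper's exact setup: the 7-input network with two hidden layers of width 40, Gaussian weights scaled by 1/sqrt(fan-in), the bias scale, and a plain LZ76 phrase count (Kaspar-Schuster algorithm) used as a simpler stand-in for the paper's LZ-based complexity measure. All function names are hypothetical.

```python
import itertools
from collections import Counter

import numpy as np


def all_boolean_inputs(n: int) -> np.ndarray:
    """Return all 2**n binary input vectors as rows, in lexicographic order."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)


def sample_function(x, rng, widths=(40, 40), sigma_w=1.0, sigma_b=1.0) -> str:
    """Draw one random ReLU network and return its Boolean function as a bit string.

    Hypothetical setup: i.i.d. Gaussian weights with std sigma_w / sqrt(fan_in)
    and Gaussian biases with std sigma_b.
    """
    h = x
    fan_in = x.shape[1]
    for width in widths:
        w = rng.normal(0.0, sigma_w / np.sqrt(fan_in), size=(fan_in, width))
        b = rng.normal(0.0, sigma_b, size=width)
        h = np.maximum(h @ w + b, 0.0)  # ReLU hidden layer
        fan_in = width
    w_out = rng.normal(0.0, sigma_w / np.sqrt(fan_in), size=fan_in)
    b_out = rng.normal(0.0, sigma_b)
    # Threshold the single output unit at zero to get one bit per input.
    bits = (h @ w_out + b_out) > 0
    return "".join("1" if v else "0" for v in bits)


def lz76_complexity(s: str) -> int:
    """Number of phrases in the LZ76 parsing of s (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n <= 1:
        return n
    i, k, l = 0, 1, 1  # match start, match length, start of current phrase
    c, k_max = 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1  # extend the current match
            if l + k > n:  # match runs to the end of the string
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:  # no earlier start extends further: close the phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def simplicity_bias_experiment(n_inputs=7, n_samples=2000, seed=0) -> Counter:
    """Count how often each Boolean function appears under random parameters."""
    rng = np.random.default_rng(seed)
    x = all_boolean_inputs(n_inputs)
    return Counter(sample_function(x, rng) for _ in range(n_samples))


if __name__ == "__main__":
    counts = simplicity_bias_experiment()
    total = sum(counts.values())
    # Most frequent functions first, with their estimated probability and LZ76 count.
    for f, c in counts.most_common(5):
        print(f"P ~ {c / total:.3f}  LZ76 = {lz76_complexity(f)}  f = {f[:16]}...")
```

Under simplicity bias one would expect the most frequent functions to have low LZ76 counts. The sketch only prints frequencies and complexities; it does not itself test that claim, and a proper comparison would need far more samples and a complexity measure matched to the paper's.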

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis