Deep learning generalizes because the parameter-function map is biased towards simple functions
Guillermo Valle-Pérez, Chico Q. Camargo, Ard A. Louis

TL;DR
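This paper argues that deep neural networks generalize well because their parameter-function map is biased towards simple functions. The argument rests on a probability-complexity bound from algorithmic information theory and on empirical evidence from a model network for Boolean functions and from larger fully connected and convolutional networks applied to MNIST and CIFAR10.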
Contribution
It introduces a new explanation for DNN generalization based on a bias towards simple functions, supported by algorithmic information theory and PAC-Bayes bounds.
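For orientation, probability-complexity bounds of the kind referred to here are usually written in the form below. This is a sketch of the general shape rather than a quotation from this page: P(f) is assumed to denote the probability that randomly sampled parameters produce the function f, \tilde{K}(f) a computable approximation to the Kolmogorov complexity of f, and a > 0 and b constants that depend on the map but not on f (the sign convention for b varies between presentations).

```latex
% Assumed notation: P(f), \tilde{K}(f), a, b are as described in the lead-in.
\[
  P(f) \;\le\; 2^{-a\,\tilde{K}(f) + b}
\]
```

Read this way, each unit increase in the complexity of f lowers the upper bound on its probability by a constant factor, which is the sense in which the bias is called exponential.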
Findings
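The parameter-function map of many DNNs is argued to be exponentially biased towards simple functions.
Empirical evidence of this simplicity bias is reported for a model DNN for Boolean functions and for larger fully connected and convolutional networks.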
PAC-Bayes bounds correlate with actual error on datasets like MNIST and CIFAR10.
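The simplicity bias in the findings above can be probed with a small sampling experiment. The Python sketch below is illustrative only and is not the authors' code: the network size, the Gaussian parameter distribution, and the use of zlib-compressed length as a complexity proxy are all assumptions chosen for brevity. It samples random parameters for a tiny ReLU network on Boolean inputs, records which Boolean function each sample computes, and counts how often each function appears.

```python
import zlib
from collections import Counter

import numpy as np


def boolean_inputs(n_bits):
    """Return all 2**n_bits binary input vectors, one per row."""
    idx = np.arange(2 ** n_bits)
    return ((idx[:, None] >> np.arange(n_bits)) & 1).astype(np.float64)


def sample_functions(n_bits=5, hidden=16, n_samples=5000, seed=0):
    """Sample random parameters and count the Boolean functions they produce.

    Assumption: one hidden ReLU layer with i.i.d. Gaussian weights and biases.
    Each function is encoded as a string of 2**n_bits output bits.
    """
    rng = np.random.default_rng(seed)
    x = boolean_inputs(n_bits)
    counts = Counter()
    for _ in range(n_samples):
        w1 = rng.normal(size=(n_bits, hidden)) / np.sqrt(n_bits)
        b1 = rng.normal(size=hidden)
        w2 = rng.normal(size=(hidden, 1)) / np.sqrt(hidden)
        b2 = rng.normal(size=1)
        h = np.maximum(x @ w1 + b1, 0.0)      # hidden ReLU activations
        out = (h @ w2 + b2).ravel() > 0.0     # threshold the output at zero
        counts["".join("1" if o else "0" for o in out)] += 1
    return counts


def complexity_proxy(bitstring):
    """Crude complexity estimate: length in bytes of the zlib-compressed string."""
    return len(zlib.compress(bitstring.encode("ascii"), 9))


if __name__ == "__main__":
    counts = sample_functions()
    print("distinct functions seen:", len(counts))
    for func, count in counts.most_common(5):
        print(func, count, complexity_proxy(func))
```

If the parameter-function map were unbiased, almost every sample would yield a distinct function, since there are 2**32 candidates for 5 inputs. Comparing the frequency of each function with its compressed length gives a rough, small-scale analogue of the probability-versus-complexity relation described in the findings.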
Abstract
Deep neural networks (DNNs) generalize remarkably well without explicit regularization even in the strongly over-parametrized regime where classical learning theory would instead predict that they would severely overfit. While many proposals for some kind of implicit regularization have been made to rationalise this success, there is no consensus for the fundamental reason why DNNs do not strongly overfit. In this paper, we provide a new explanation. By applying a very general probability-complexity bound recently derived from algorithmic information theory (AIT), we argue that the parameter-function map of many DNNs should be exponentially biased towards simple functions. We then provide clear evidence for this strong simplicity bias in a model DNN for Boolean functions, as well as in much larger fully connected and convolutional networks applied to CIFAR10 and MNIST. As the target…
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis
