The smooth output assumption, and why deep networks are better than wide ones
Luis Sa-Couto, Jose Miguel Ramos, Andreas Wichert

TL;DR
This paper introduces an unsupervised measure, output sharpness, for predicting how well a neural network will generalize. It argues that deep networks have a built-in bias against sharp outputs, which would explain why they outperform wide, shallow networks.
Contribution
The paper proposes output sharpness as a measure for predicting generalization and gives a theoretical argument linking network depth to that measure, which supports the case for deep networks over wide ones.
Findings
Output sharpness correlates strongly with test performance.
Deep networks are biased against sharp outputs.
The measure can guide model selection and regularization.
Abstract
When several models have similar training scores, classical model selection heuristics follow Occam's razor and advise choosing the ones with the least capacity. Yet, modern practice with large neural networks has often led to situations where two networks with exactly the same number of parameters score similarly on the training set, but the deeper one generalizes better to unseen examples. With this in mind, it is well accepted that deep networks are superior to shallow wide ones. However, theoretically there is no difference between the two. In fact, they are both universal approximators. In this work we propose a new unsupervised measure that predicts how well a model will generalize. We call it the output sharpness, and it is based on the fact that, in reality, boundaries between concepts are generally unsharp. We test this new measure on several neural network settings, and…
Taxonomy
Topics: Machine Learning and Data Classification · Face and Expression Recognition · Domain Adaptation and Few-Shot Learning
Methods: Test
