The smooth output assumption, and why deep networks are better than wide ones

Luis Sa-Couto; Jose Miguel Ramos; Andreas Wichert

arXiv:2211.14347 · cs.LG · November 29, 2022

Open Access

TL;DR

This paper introduces an unsupervised measure, output sharpness, that predicts how well a neural network will generalize. It argues that deep networks have a built-in bias against sharp outputs, which would explain why they outperform wide, shallow networks with the same number of parameters.

Contribution

The paper proposes output sharpness, an unsupervised measure for predicting generalization, and gives a theoretical argument linking network depth to this measure, which supports the observed advantage of deep networks.

Findings

01 Output sharpness correlates strongly with test performance.

02 Deep networks are biased against output sharpness.

03 The measure can guide model selection and regularization.

Abstract

When several models have similar training scores, classical model selection heuristics follow Occam's razor and advise choosing the ones with the least capacity. Yet, modern practice with large neural networks has often led to situations where two networks with exactly the same number of parameters score similarly on the training set, but the deeper one generalizes better to unseen examples. With this in mind, it is well accepted that deep networks are superior to shallow wide ones. However, theoretically there is no difference between the two. In fact, they are both universal approximators. In this work we propose a new unsupervised measure that predicts how well a model will generalize. We call it the output sharpness, and it is based on the fact that, in reality, boundaries between concepts are generally unsharp. We test this new measure on several neural network settings, and…

Taxonomy

Topics: Machine Learning and Data Classification · Face and Expression Recognition · Domain Adaptation and Few-Shot Learning

Methods: Test