Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited

Wesley J. Maddox; Gregory Benton; Andrew Gordon Wilson

arXiv:2003.02139 · cs.LG · May 26, 2020 · 28 citations

Open Access · 1 Repo

TL;DR

This paper revisits effective dimensionality to better understand neural network generalization, using it to explain phenomena such as double descent and relating it to posterior contraction in Bayesian deep learning and to model selection.

Contribution

It revisits effective dimensionality as a lens for interpreting neural network generalization and connects it to posterior contraction, model selection, width-depth tradeoffs, double descent, and functional diversity in loss surfaces.

Findings

01

Effective dimensionality explains double descent behaviour.

02

It correlates with Bayesian posterior contraction and model selection.

03

Effective dimensionality compares favourably to norm- and flatness-based measures in predicting generalization.

Abstract

Neural networks appear to have mysterious generalization properties when using parameter counting as a proxy for complexity. Indeed, neural networks often have many more parameters than there are data points, yet still provide good generalization performance. Moreover, when we measure generalization as a function of parameters, we see double descent behaviour, where the test error decreases, increases, and then again decreases. We show that many of these properties become understandable when viewed through the lens of effective dimensionality, which measures the dimensionality of the parameter space determined by the data. We relate effective dimensionality to posterior contraction in Bayesian deep learning, model selection, width-depth tradeoffs, double descent, and functional diversity in loss surfaces, leading to a richer understanding of the interplay between parameters and…
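The abstract on this page does not spell out how effective dimensionality is computed. A minimal sketch follows, assuming the commonly used definition N_eff(H, z) = sum_i lambda_i / (lambda_i + z), where the lambda_i are eigenvalues of the Hessian of the loss and z > 0 is a regularization constant. The function name and the example spectrum are hypothetical and are not taken from the linked repository.

```python
from typing import Sequence


def effective_dimensionality(eigenvalues: Sequence[float], z: float) -> float:
    """Assumed definition: N_eff = sum_i lambda_i / (lambda_i + z).

    Each term lies in [0, 1): directions with lambda_i >> z count as
    roughly one dimension, directions with lambda_i << z count as roughly zero.
    """
    if z <= 0:
        raise ValueError("z must be positive")
    total = 0.0
    for lam in eigenvalues:
        lam = max(lam, 0.0)  # clip small negative values from numerical noise
        total += lam / (lam + z)
    return total


# Hypothetical Hessian spectrum for a 6-parameter model:
# two directions strongly determined by the data, one weakly, three not at all.
eigs = [100.0, 100.0, 1.0, 0.0, 0.0, 0.0]
n_eff = effective_dimensionality(eigs, z=1.0)
print(f"parameters: {len(eigs)}, effective dimensionality: {n_eff:.3f}")
```

In this toy spectrum the model has six parameters but an effective dimensionality of about 2.48, which illustrates why a raw parameter count can overstate complexity when the data determine only a few directions in parameter space.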

Code & Models

Repositories

g-benton/hessian-eff-dim (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks