Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited
Wesley J. Maddox, Gregory Benton, Andrew Gordon Wilson

TL;DR
This paper revisits effective dimensionality as a way to understand neural network generalization. It uses the concept to explain phenomena such as double descent and relates it to Bayesian posterior contraction and model selection.
Contribution
It revisits effective dimensionality as a key lens for interpreting neural network generalization and connects it to posterior contraction, model selection, width-depth tradeoffs, double descent, and functional diversity in loss surfaces.
Findings
Effective dimensionality explains double descent behaviour.
Effective dimensionality correlates with Bayesian posterior contraction and model selection.
Effective dimensionality outperforms norm- and flatness-based measures in predicting generalization.
Abstract
Neural networks appear to have mysterious generalization properties when using parameter counting as a proxy for complexity. Indeed, neural networks often have many more parameters than there are data points, yet still provide good generalization performance. Moreover, when we measure generalization as a function of parameters, we see double descent behaviour, where the test error decreases, increases, and then again decreases. We show that many of these properties become understandable when viewed through the lens of effective dimensionality, which measures the dimensionality of the parameter space determined by the data. We relate effective dimensionality to posterior contraction in Bayesian deep learning, model selection, width-depth tradeoffs, double descent, and functional diversity in loss surfaces, leading to a richer understanding of the interplay between parameters and…
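As a rough illustration of the quantity the abstract describes, the sketch below assumes the standard definition of effective dimensionality, N_eff(H, z) = sum_i lambda_i / (lambda_i + z), where lambda_i are the eigenvalues of the Hessian H of the loss and z > 0 is a regularization constant. The function name, the toy linear-regression setup, and the clipping of negative eigenvalues are assumptions made for this example and are not taken from the paper's code.

```python
import numpy as np


def effective_dimensionality(eigenvalues, z=1.0):
    """Return sum_i lambda_i / (lambda_i + z) for Hessian eigenvalues lambda_i.

    Assumption: tiny negative eigenvalues are numerical noise and are
    clipped to zero before the sum is taken.
    """
    lam = np.clip(np.asarray(eigenvalues, dtype=float), 0.0, None)
    return float(np.sum(lam / (lam + z)))


# Hypothetical toy problem: linear regression with far more parameters
# (p = 100) than data points (n = 20).
rng = np.random.default_rng(0)
n, p = 20, 100
X = rng.normal(size=(n, p))

# For the loss 0.5 * ||X w - y||^2, the Hessian with respect to w is X^T X.
hessian = X.T @ X
eigs = np.linalg.eigvalsh(hessian)  # symmetric matrix, real eigenvalues

n_eff = effective_dimensionality(eigs, z=1.0)
print(f"parameters: {p}, data points: {n}, effective dimensionality: {n_eff:.2f}")
```

In this toy case the Hessian has rank at most n, so at most n eigenvalues are meaningfully non-zero and the effective dimensionality stays below the number of data points even though the parameter count is five times larger. That is the sense in which the data, rather than the raw parameter count, determines how many directions in parameter space matter.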
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Gaussian Processes and Bayesian Inference · Model Reduction and Neural Networks
