Learning Capacity: A Measure of the Effective Dimensionality of a Model
Daiwei Chen, Wei-Kai Chang, Pratik Chaudhari

TL;DR
This paper introduces learning capacity, a measure of a model's effective dimensionality. It links this measure to generalization and data requirements, and applies it across various model types.
Contribution
It formalizes learning capacity using a correspondence between thermodynamics and inference, shows that it correlates with test loss, and demonstrates its utility in guiding data and architecture choices.
Findings
Learning capacity correlates with test loss.
Learning capacity is a tiny fraction of the number of parameters for many deep networks trained on typical datasets.
Learning capacity saturates at very small and very large sample sizes.
Abstract
We use a formal correspondence between thermodynamics and inference, where the number of samples can be thought of as the inverse temperature, to study a quantity called "learning capacity" which is a measure of the effective dimensionality of a model. We show that the learning capacity is a useful notion of the complexity because (a) it correlates well with the test loss and it is a tiny fraction of the number of parameters for many deep networks trained on typical datasets, (b) it depends upon the number of samples used for training, (c) it is numerically consistent with notions of capacity obtained from PAC-Bayes generalization bounds, and (d) the test loss as a function of the learning capacity does not exhibit double descent. We show that the learning capacity saturates at very small and very large sample sizes; the threshold that characterizes the transition between these two…
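The analogy in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' code, and the abstract does not state the definition used in the paper. It assumes the textbook heat-capacity formula carries over with the sample count N standing in for the inverse temperature: U(N) = -d/dN log Z(N) and C(N) = -N^2 dU/dN. The quadratic energy, the flat prior, and the function names `log_partition`, `average_energy`, and `capacity` are all hypothetical choices made for illustration. For this toy model with k parameters, the capacity comes out to about k/2, which is why such a quantity can be read as an effective dimensionality.

```python
import math


def log_partition(n: float, k: int) -> float:
    """log Z(N) for the toy energy H(w) = ||w||^2 / 2, w in R^k, flat prior.

    Closed form (assumed toy model): Z(N) = (2 * pi / N) ** (k / 2).
    """
    return 0.5 * k * math.log(2.0 * math.pi / n)


def average_energy(n: float, k: int, h: float = 1e-2) -> float:
    """U(N) = -d/dN log Z(N), estimated with a central difference."""
    return -(log_partition(n + h, k) - log_partition(n - h, k)) / (2.0 * h)


def capacity(n: float, k: int, h: float = 1e-2) -> float:
    """C(N) = -N^2 dU/dN, by analogy with heat capacity C = -beta^2 dU/dbeta.

    This definition is an assumption taken from the thermodynamic analogy,
    with the number of samples N playing the role of inverse temperature.
    """
    du_dn = (average_energy(n + h, k) - average_energy(n - h, k)) / (2.0 * h)
    return -(n ** 2) * du_dn


if __name__ == "__main__":
    for k in (2, 10, 50):
        # Each parameter of the quadratic toy model contributes about 1/2.
        print(k, capacity(100.0, k))
```

In this toy setting the capacity does not depend on N. The abstract reports that for real models the learning capacity does depend on the number of training samples, so the constant value here is a property of the quadratic example only.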
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Machine Learning in Materials Science · Neural Networks and Applications
Methods: Test
