Estimating the Probability of Sampling a Trained Neural Network at Random
Adam Scherlis, Nora Belrose

TL;DR
This paper introduces an improved algorithm for estimating local volume: the size of the neighborhood in neural network parameter space whose behavior is similar to that of a given trained network. Local volume tracks generalization ability and network complexity, aiding interpretability.
Contribution
The paper adapts an existing basin-volume estimator, which is very fast but often yields only a lower bound, and tightens that bound with importance sampling driven by gradient information that popular optimizers already provide.
Findings
Smaller local volumes are associated with overfitting and poor generalization.
The negative logarithm of local volume increases during language model training (that is, the local volume itself shrinks), indicating growing complexity.
The proposed estimator offers a practical metric for network complexity and inductive bias.
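To make the link between volume and complexity concrete, the relationship described in the abstract can be written out as below. The symbols are notation chosen here for illustration, not taken from the paper: \mu is the Gaussian or uniform measure over parameters, \theta_0 is the anchor, and N(\theta_0) is the neighborhood of parameters that behave similarly to it.

```latex
V(\theta_0) = \Pr_{\theta \sim \mu}\bigl[\theta \in N(\theta_0)\bigr],
\qquad
I(\theta_0) = -\log_2 V(\theta_0) \ \text{bits}
```

Under this reading, halving the local volume adds one bit of information content, so a smaller neighborhood corresponds to a more complex learned behavior.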
Abstract
We present and analyze an algorithm for estimating the size, under a Gaussian or uniform measure, of a localized neighborhood in neural network parameter space with behavior similar to an "anchor" point. We refer to this as the "local volume" of the anchor. We adapt an existing basin-volume estimator, which is very fast but in many cases only provides a lower bound. We show that this lower bound can be improved with an importance-sampling method using gradient information that is already provided by popular optimizers. The negative logarithm of local volume can also be interpreted as a measure of the anchor network's information content. As expected for a measure of complexity, this quantity increases during language model training. We find that overfit, badly-generalizing neighborhoods are smaller, indicating a more complex learned behavior. This smaller volume can also be…
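For readers who want a feel for what a basin-volume estimate involves, here is a minimal sketch of a plain Monte Carlo volume estimate along random directions. It is not the authors' estimator and contains no importance sampling. The following are assumptions made for illustration: a toy diagonal quadratic loss, a uniform (Lebesgue) measure, a loss `cutoff` that defines the neighborhood, a loss that increases along every ray from the anchor, and all function names. The sketch relies on the identity that the volume of a star-shaped region equals the unit-ball volume times the average of r(u)^n over uniformly random unit directions u, where r(u) is the distance to the boundary.

```python
import math
import numpy as np

def log_unit_ball_volume(n):
    # log of pi^(n/2) / Gamma(n/2 + 1), the volume of the unit ball in R^n
    return 0.5 * n * math.log(math.pi) - math.lgamma(0.5 * n + 1.0)

def radii_by_bisection(loss_fn, anchor, directions, cutoff, r_max=1e3, iters=60):
    # For each unit direction u, bisect for the largest r such that
    # loss(anchor + r * u) <= cutoff. Assumes the loss increases along each ray
    # and that the boundary lies within r_max.
    lo = np.zeros(len(directions))
    hi = np.full(len(directions), r_max)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        inside = loss_fn(anchor + mid[:, None] * directions) <= cutoff
        lo = np.where(inside, mid, lo)
        hi = np.where(inside, hi, mid)
    return lo

def log_volume_estimate(loss_fn, anchor, cutoff, num_samples, rng):
    n = anchor.shape[0]
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere.
    g = rng.standard_normal((num_samples, n))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    r = radii_by_bisection(loss_fn, anchor, directions, cutoff)
    # Stable log-mean-exp of n * log r(u), i.e. log of the average of r(u)^n.
    log_terms = n * np.log(r)
    m = log_terms.max()
    log_mean = m + np.log(np.mean(np.exp(log_terms - m)))
    return log_unit_ball_volume(n) + log_mean

def toy_demo(num_samples=50_000, seed=0):
    # Hypothetical toy problem: loss(theta) = 0.5 * sum_i h_i * theta_i^2.
    h_diag = np.array([1.0, 4.0, 25.0])
    loss_fn = lambda thetas: 0.5 * np.sum(h_diag * thetas**2, axis=1)
    anchor = np.zeros(3)
    cutoff = 0.5
    rng = np.random.default_rng(seed)
    estimate = log_volume_estimate(loss_fn, anchor, cutoff, num_samples, rng)
    # The region {loss <= cutoff} is an ellipsoid with a closed-form volume.
    n = anchor.shape[0]
    exact = (log_unit_ball_volume(n) + 0.5 * n * math.log(2.0 * cutoff)
             - 0.5 * np.sum(np.log(h_diag)))
    return estimate, exact

if __name__ == "__main__":
    est, exact = toy_demo()
    print(f"estimated log-volume: {est:.3f}, exact log-volume: {exact:.3f}")
```

In three dimensions the average of r(u)^n is well behaved and the estimate lands close to the closed-form ellipsoid volume. In very high dimension the average is dominated by rare directions with unusually large radius, which is one plausible reason a finite-sample estimate of this kind tends to come out too low and why reweighting the sampled directions can help.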
Taxonomy
Topics: Neural Networks and Applications
