Rethinking Neural Networks With Benford's Law
Surya Kant Sahu, Abhinav Java, Arshad Shaikh, Yannic Kilcher

TL;DR
This paper explores the relationship between neural network parameter distributions and generalization, introducing a metric based on Benford's Law that predicts accuracy and enables validation-free early stopping.
Contribution
The paper introduces MLH (Model Enthalpy), a metric that measures how closely a network's parameters follow Benford's Law. It links parameter distributions to generalization and proposes validation-free early stopping based on that metric.
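As a rough illustration of what a Benford-based closeness score could look like, the sketch below computes the leading-digit distribution of a list of numbers and compares it with Benford's Law, P(d) = log10(1 + 1/d) for d = 1..9. This summary does not state how MLH itself is computed, so the use of Pearson correlation here is an assumption, and the function names (`leading_digit`, `first_digit_distribution`, `benford_closeness`) are hypothetical.

```python
import math

# Benford's Law: probability that the first significant digit equals d (1..9).
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def leading_digit(x):
    """Return the first significant digit (1-9) of a nonzero finite number."""
    # Scientific notation with 17 fractional digits avoids rounding a value
    # such as 9.999999 up to 1.0e+01, which would give the wrong digit.
    return int(f"{abs(x):.17e}"[0])


def first_digit_distribution(values):
    """Relative frequency of each leading digit 1..9, skipping zeros/inf/nan."""
    counts = [0] * 9
    for v in values:
        if v == 0 or not math.isfinite(v):
            continue
        counts[leading_digit(v) - 1] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no nonzero finite values to analyse")
    return [c / total for c in counts]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / math.sqrt(vx * vy)


def benford_closeness(values):
    """ASSUMPTION: closeness to Benford's Law measured by Pearson correlation."""
    return pearson(first_digit_distribution(values), BENFORD)


# Powers of two are a classic sequence whose leading digits follow Benford's Law.
powers_of_two = [2.0 ** k for k in range(1, 200)]
print(benford_closeness(powers_of_two))
```

For a neural network, `values` would be the flattened list of its parameters. A score near 1 means the leading digits are distributed much as Benford's Law predicts; a low or negative score means they are not.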
Findings
MLH strongly predicts validation accuracy.
Validation-free early stopping can outperform traditional methods.
MLH relates to thermodynamic principles in learning systems.
Abstract
Benford's Law (BL) or the Significant Digit Law defines the probability distribution of the first digit of numerical values in a data sample. This Law is observed in many naturally occurring datasets. It can be seen as a measure of naturalness of a given distribution and finds its application in areas like anomaly and fraud detection. In this work, we address the following question: Is the distribution of the Neural Network parameters related to the network's generalization capability? To that end, we first define a metric, MLH (Model Enthalpy), that measures the closeness of a set of numbers to Benford's Law and we show empirically that it is a strong predictor of Validation Accuracy. Second, we use MLH as an alternative to Validation Accuracy for Early Stopping, removing the need for a Validation set. We provide experimental evidence that even if the optimal size of the validation set…
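The idea of stopping on a parameter-based score instead of validation accuracy can be sketched as a generic patience loop. This is an assumed, simplified procedure and not necessarily the one used in the paper. The callables `train_one_epoch`, `get_parameters`, and `score_fn` are hypothetical placeholders supplied by the caller, and the sketch assumes that a higher score indicates better generalization.

```python
def train_with_score_based_stopping(train_one_epoch, get_parameters, score_fn,
                                    max_epochs=100, patience=5):
    """Train until score_fn(parameters) stops improving for `patience` epochs.

    No validation set is consulted: the stopping signal is computed from the
    model parameters alone. Returns (best_epoch, best_score).
    """
    best_score = float("-inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch()
        score = score_fn(get_parameters())

        if score > best_score:
            # New best: remember it and reset the patience counter.
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # the score has plateaued or degraded

    return best_epoch, best_score


# Toy usage with a scripted score curve standing in for a real training run.
scripted_scores = [0.1, 0.3, 0.5, 0.4, 0.45, 0.2, 0.9]
state = {"epoch": 0}


def fake_train_one_epoch():
    state["epoch"] += 1


def fake_get_parameters():
    return state["epoch"]


def fake_score(epoch_count):
    return scripted_scores[epoch_count - 1]


result = train_with_score_based_stopping(
    fake_train_one_epoch, fake_get_parameters, fake_score,
    max_epochs=len(scripted_scores), patience=2,
)
print(result)
```

In a real run the caller would typically also checkpoint the weights whenever a new best score is recorded, so that the model from `best_epoch` can be restored after the loop exits.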
Taxonomy
Topics: Benford’s Law and Fraud Detection · Statistical Mechanics and Entropy · Explainable Artificial Intelligence (XAI)
Methods: Early Stopping · Average Pooling · Batch Normalization · Kaiming Initialization · Global Average Pooling · Residual Connection · ResNeXt Block · Convolution · Grouped Convolution
