The Randomness of Input Data Spaces is an A Priori Predictor for Generalization
Martin Briesch, Dominik Sobania, Franz Rothlauf

TL;DR
This paper investigates how the inherent randomness of an input data space, measured with Maurer's universal test, predicts the generalization performance of deep neural networks on synthetic and real-world datasets.
Contribution
It introduces a method to quantify input data space randomness and demonstrates its strong correlation with neural network generalization error.
Findings
Higher data space randomness correlates with increased generalization error.
The proposed measure predicts generalization performance across multiple datasets.
Results are consistent for synthetic and real-world image classification tasks.
Abstract
Over-parameterized models can perfectly learn various types of data distributions; however, generalization error is usually lower for real data than for artificial data. This suggests that the properties of data distributions have an impact on generalization capability. This work focuses on the search space defined by the input data and assumes that the correlation between labels of neighboring input values influences generalization. If correlation is low, the randomness of the input data space is high, leading to high generalization error. We suggest measuring the randomness of an input data space using Maurer's universal test. Results for synthetic classification tasks and common image classification benchmarks (MNIST, CIFAR10, and Microsoft's cats vs. dogs data set) find a high correlation between the randomness of input data spaces and the generalization error of deep neural…
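To make the randomness measure more concrete, the following is a minimal sketch of the statistic behind Maurer's universal test, computed on a plain binary sequence. It is not the authors' implementation: the abstract does not say how the labels of neighboring inputs are turned into a bit sequence, so that step is left out, and the function name maurer_statistic and its parameters block_len and init_blocks are hypothetical names chosen for illustration. The sketch follows the usual formulation of the test: split the sequence into fixed-length blocks, use the first blocks to initialize a table of last-seen positions, then average the base-2 logarithm of the distance to the previous occurrence of each remaining block.

```python
import math


def maurer_statistic(bits, block_len, init_blocks):
    """Average log2 distance between repeated blocks (Maurer's statistic).

    bits        -- sequence of 0/1 integers (how to derive it from a dataset
                   is an assumption left to the caller)
    block_len   -- number of bits per block (hypothetical parameter name)
    init_blocks -- number of leading blocks used only to fill the table
    """
    n_blocks = len(bits) // block_len
    test_blocks = n_blocks - init_blocks
    if init_blocks < 1 or test_blocks < 1:
        raise ValueError("sequence too short for the chosen block settings")

    # last_seen[v] holds the 1-based index of the latest block with value v;
    # 0 means "not seen yet", so the first distance equals the block index.
    last_seen = [0] * (2 ** block_len)
    total = 0.0

    for i in range(1, n_blocks + 1):
        chunk = bits[(i - 1) * block_len : i * block_len]
        value = 0
        for b in chunk:
            value = (value << 1) | b  # read the block as a binary number

        if i > init_blocks:
            # Test segment: accumulate log2 of the gap to the last occurrence.
            total += math.log2(i - last_seen[value])
        last_seen[value] = i

    return total / test_blocks
```

A highly regular sequence yields a small value because each block recurs after a short gap, while a sequence with little structure yields a larger one. For example, maurer_statistic([0] * 40, 2, 4) evaluates to 0.0, since every block repeats immediately.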
Taxonomy
Topics: Machine Learning and Data Classification · Neural Networks and Applications · Machine Learning and Algorithms
