A Constructive Prediction of the Generalization Error Across Scales
Jonathan S. Rosenfeld, Amir Rosenfeld, Yonatan Belinkov, Nir Shavit

TL;DR
This paper proposes a functional form to predict the generalization error of neural networks across different model and dataset sizes, validated through extensive empirical observations in vision and language tasks.
Contribution
It introduces a functional form that predicts generalization error across model and dataset scales and, by building on model scaling (e.g., width, depth), specifies the exact models that can attain it. The construction is based on empirical observations.
Findings
The proposed form accurately fits observed generalization errors across various scales.
Fitted on small-scale models and datasets, it gives accurate predictions of the generalization error at larger scales.
The approach is validated on diverse model types and tasks in vision and language domains.
Abstract
The dependency of the generalization error of neural networks on model and dataset size is of critical importance both in practice and for understanding the theory of neural networks. Nevertheless, the functional form of this dependency remains elusive. In this work, we present a functional form which approximates well the generalization error in practice. Capitalizing on the successful concept of model scaling (e.g., width, depth), we are able to simultaneously construct such a form and specify the exact models which can attain it across model/data scales. Our construction follows insights obtained from observations conducted over a range of model/data scales, in various model types and datasets, in vision and language tasks. We show that the form both fits the observations well across scales, and provides accurate predictions from small- to large-scale models and data.
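The abstract does not state the functional form itself, so the Python sketch below only illustrates the workflow it describes: fit a form on small-scale observations, then predict the error at a larger scale. It assumes a generic ansatz of two power-law terms plus a constant, err(m, n) = a·n^(-alpha) + b·m^(-beta) + c, where m is model size and n is dataset size. This ansatz may differ from the form the paper actually proposes. The function names (`predicted_error`, `solve3`, `fit`), the exponent grid, and all numbers are hypothetical, and the observations are synthetic rather than results from the paper.

```python
import itertools


def predicted_error(m, n, p):
    """Assumed ansatz (not taken from the paper): data term + model term + constant."""
    return p["a"] * n ** (-p["alpha"]) + p["b"] * m ** (-p["beta"]) + p["c"]


def solve3(A, y):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= factor * M[col][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, 3))
        x[r] = (M[r][3] - tail) / M[r][r]
    return x


def fit(points, exponent_grid):
    """Grid-search the exponents; for each pair, solve for a, b, c by least squares.

    points is a list of (model_size, dataset_size, observed_error) triples.
    """
    best_sse, best_params = None, None
    for alpha, beta in itertools.product(exponent_grid, repeat=2):
        feats = [(n ** (-alpha), m ** (-beta), 1.0) for m, n, _ in points]
        errs = [e for _, _, e in points]
        # Normal equations (F^T F) x = F^T e for the three linear coefficients.
        A = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
        y = [sum(f[i] * e for f, e in zip(feats, errs)) for i in range(3)]
        a, b, c = solve3(A, y)
        params = {"a": a, "alpha": alpha, "b": b, "beta": beta, "c": c}
        sse = sum((predicted_error(m, n, params) - e) ** 2 for m, n, e in points)
        if best_sse is None or sse < best_sse:
            best_sse, best_params = sse, params
    return best_params


# Synthetic, noise-free "small-scale" observations generated from made-up parameters.
true_params = {"a": 2.0, "alpha": 0.5, "b": 1.5, "beta": 0.3, "c": 0.05}
small_m = [1e4, 3e4, 1e5, 3e5]  # hypothetical model sizes
small_n = [1e3, 3e3, 1e4, 3e4]  # hypothetical dataset sizes
observations = [
    (m, n, predicted_error(m, n, true_params)) for m in small_m for n in small_n
]

exponent_grid = [k / 10 for k in range(1, 11)]  # 0.1 .. 1.0; zero is excluded
fitted = fit(observations, exponent_grid)

# Extrapolate to a scale well outside the fitted range.
large_m, large_n = 1e7, 1e6
extrapolated = predicted_error(large_m, large_n, fitted)
print(fitted, extrapolated)
```

Separating the grid search over exponents from the linear solve for a, b, and c keeps the sketch free of external dependencies. With noisy real measurements, a nonlinear least-squares routine would be a more natural choice, and the quality of the extrapolation would have to be checked against held-out large-scale runs.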
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Neural Networks and Applications
