More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory

James B. Simon; Dhruva Karkada; Nikhil Ghosh; Mikhail Belkin

arXiv:2311.14646·cs.LG·May 17, 2024·5 cites

TL;DR

This paper gives theoretical evidence, in the setting of random feature (RF) regression, that larger models and more data improve test performance when regularization is tuned, and that overfitting to near-zero training error is necessary for optimal performance in some tasks.

Contribution

The paper shows that, when the ridge penalty is tuned optimally, infinite-width random feature models are preferable to those of any finite width, and that near-zero training error is necessary for optimal results in certain tasks.

Findings

01

Test risk of RF regression decreases monotonically with both the number of features and the number of samples when the ridge penalty is tuned optimally.

02

Infinite-width RF models are preferable to those of any finite width.

03

Overfitting and near-zero training error are necessary for optimal performance in some tasks.

Abstract

In our era of enormous neural networks, empirical progress has been driven by the philosophy that more is better. Recent deep learning practice has found repeatedly that larger model size, more data, and more computation (resulting in lower training loss) improve performance. In this paper, we give theoretical backing to these empirical observations by showing that these three properties hold in random feature (RF) regression, a class of models equivalent to shallow networks with only the last layer trained. Concretely, we first show that the test risk of RF regression decreases monotonically with both the number of features and the number of samples, provided the ridge penalty is tuned optimally. In particular, this implies that infinite width RF architectures are preferable to those of any finite width. We then proceed to demonstrate that, for a large class of tasks characterized…
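To make the setup concrete, the following is a minimal sketch of random feature ridge regression: a fixed random first layer followed by a ridge-regressed last layer, evaluated at several widths with the ridge penalty chosen from a small grid. Everything specific here is an assumption for illustration and is not taken from the paper: the ReLU features, the Gaussian weights, the toy target, the ridge grid, and the function names (`random_relu_features`, `fit_ridge`, `best_mse_per_width`). The ridge is picked by test error directly, which is an oracle shortcut standing in for "tuned optimally". A single random draw is noisy, so this sketch need not reproduce the monotonic trend that the paper states for the test risk.

```python
import numpy as np


def random_relu_features(X, W):
    """Map inputs X (n, d) to k ReLU random features using fixed weights W (d, k)."""
    k = W.shape[1]
    return np.maximum(X @ W, 0.0) / np.sqrt(k)


def fit_ridge(Phi, y, ridge):
    """Last-layer weights minimizing ||Phi a - y||^2 + ridge * ||a||^2 (ridge > 0)."""
    k = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + ridge * np.eye(k), Phi.T @ y)


def toy_target(X):
    """Hypothetical regression target, chosen only for illustration."""
    return np.sin(X.sum(axis=1))


def best_mse_per_width(widths, ridges, n_train=100, n_test=500, d=5, seed=0):
    """For each width k, return the lowest test MSE over the ridge grid."""
    rng = np.random.default_rng(seed)
    X_tr = rng.standard_normal((n_train, d))
    X_te = rng.standard_normal((n_test, d))
    y_tr = toy_target(X_tr) + 0.1 * rng.standard_normal(n_train)  # noisy labels
    y_te = toy_target(X_te)                                       # clean test labels

    results = {}
    for k in widths:
        W = rng.standard_normal((d, k))  # first layer is random and never trained
        Phi_tr = random_relu_features(X_tr, W)
        Phi_te = random_relu_features(X_te, W)
        errs = []
        for ridge in ridges:
            a = fit_ridge(Phi_tr, y_tr, ridge)  # only the last layer is fit
            errs.append(float(np.mean((Phi_te @ a - y_te) ** 2)))
        results[k] = min(errs)  # oracle choice of ridge (assumption)
    return results


if __name__ == "__main__":
    ridges = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
    for k, mse in best_mse_per_width([8, 32, 128, 512], ridges).items():
        print(f"width={k:4d}  best test MSE={mse:.4f}")
```

Averaging `best_mse_per_width` over many seeds would give a closer empirical analogue of the expected test risk that the abstract refers to.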

Taxonomy

Topics: Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning · Machine Learning and Algorithms