No Free Lunch From Random Feature Ensembles: Scaling Laws and Near-Optimality Conditions
Benjamin S. Ruben, William L. Tong, Hamza Tahir Chaudhry, Cengiz Pehlevan

TL;DR
This paper analyzes the trade-off between a single large model and an ensemble of smaller models in random-feature ridge regression under a fixed total parameter budget. With the ridge optimally tuned, the single large model is optimal, while ensembles can come close to that optimum, notably in the overparameterized regime.
Contribution
It provides theoretical scaling laws and optimality conditions for ensembles versus single models in high-dimensional ridge regression, highlighting when ensembles can be near-optimal.
Findings
With a fixed total parameter count and an optimally tuned ridge, a single large model outperforms an ensemble of smaller models.
Overparameterized ensembles can achieve near-optimal performance.
Scaling laws depend on kernel and task eigenstructure.
Abstract
Given a fixed budget for total model size, one must choose between training a single large model or combining the predictions of multiple smaller models. We investigate this trade-off for ensembles of random-feature ridge regression models in both the overparameterized and underparameterized regimes. Using deterministic equivalent risk estimates, we prove that when a fixed number of parameters is distributed among $K$ independently trained models, the ridge-optimized test risk increases with $K$. Consequently, a single large model achieves optimal performance. We then ask when ensembles can achieve near-optimal performance. In the overparameterized regime, we show that, to leading order, the test error depends on ensemble size and model size only through the total feature count, so that overparameterized ensembles consistently achieve near-optimal performance. To understand…
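The comparison described in the abstract can be sketched numerically. The following is a minimal illustration, not the authors' experimental setup: the ReLU random features, Gaussian inputs, noisy linear target, problem sizes, ridge grid, and the function names `fit_rf_ridge`, `ensemble_test_risk`, and `demo` are all assumptions made for this example. It splits a fixed total feature count among K independently sampled random-feature ridge models, averages their predictions, and reports the best test risk over the ridge grid for each K.

```python
import numpy as np

def fit_rf_ridge(X, y, n_features, ridge, rng):
    """Fit one random-feature ridge model and return its predict function."""
    d = X.shape[1]
    # Frozen random projection; only the readout weights w are trained.
    W = rng.standard_normal((d, n_features)) / np.sqrt(d)
    feats = lambda Z: np.maximum(Z @ W, 0.0)  # ReLU features (assumed choice)
    Phi = feats(X)
    A = Phi.T @ Phi + ridge * np.eye(n_features)
    w = np.linalg.solve(A, Phi.T @ y)
    return lambda Z: feats(Z) @ w

def ensemble_test_risk(X, y, X_test, y_test, total_features, K, ridge, seed=0):
    """Mean squared test error of K averaged models sharing a feature budget."""
    rng = np.random.default_rng(seed)
    n_per_model = total_features // K
    preds = [fit_rf_ridge(X, y, n_per_model, ridge, rng)(X_test) for _ in range(K)]
    avg_pred = np.mean(preds, axis=0)
    return float(np.mean((avg_pred - y_test) ** 2))

def demo(seed=0):
    """Ridge-optimized test risk for several ensemble sizes K (toy data)."""
    rng = np.random.default_rng(seed)
    d, n_train, n_test, total_features = 20, 200, 500, 64
    beta = rng.standard_normal(d) / np.sqrt(d)  # hypothetical linear teacher
    X = rng.standard_normal((n_train, d))
    X_test = rng.standard_normal((n_test, d))
    y = X @ beta + 0.1 * rng.standard_normal(n_train)
    y_test = X_test @ beta
    ridges = np.logspace(-4, 2, 13)
    results = {}
    for K in (1, 2, 4, 8):
        # Take the best risk over the ridge grid for this ensemble size.
        results[K] = min(
            ensemble_test_risk(X, y, X_test, y_test, total_features, K, r, seed + 1)
            for r in ridges
        )
    return results

if __name__ == "__main__":
    for K, risk in demo().items():
        print(f"K={K}: ridge-optimized test risk = {risk:.4f}")
```

Because this is a single finite-sample draw with a coarse ridge grid, the printed risks need not be exactly monotone in K; the paper's statement concerns deterministic equivalent risk estimates, and averaging over several seeds would give a fairer picture.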
Taxonomy
Topics: Algorithms and Data Compression · Data Mining Algorithms and Applications · Data Management and Algorithms
Methods: Weight Decay
