Classical Statistical (In-Sample) Intuitions Don't Generalize Well: A Note on Bias-Variance Tradeoffs, Overfitting and Moving from Fixed to Random Designs
Alicia Curth

TL;DR
This paper explains why classical statistical intuitions about bias-variance tradeoffs and overfitting do not always apply to modern machine learning, and identifies the move from fixed design (in-sample) to random design (out-of-sample) evaluation as a key reason.
Contribution
It clarifies how shifting from fixed to random design settings fundamentally alters classical intuitions about overfitting and generalization in machine learning.
Findings
Fixed design intuitions do not generalize to random designs.
Double descent is harder to observe in fixed design settings.
Bias-variance tradeoff behaviors differ significantly between fixed and random designs.
Abstract
The sudden appearance of modern machine learning (ML) phenomena like double descent and benign overfitting may leave many classically trained statisticians feeling uneasy -- these phenomena appear to go against the very core of statistical intuitions conveyed in any introductory class on learning from data. The historical lack of earlier observation of such phenomena is usually attributed to today's reliance on more complex ML methods, overparameterization, interpolation and/or higher data dimensionality. In this note, we show that there is another reason why we observe behaviors today that appear at odds with intuitions taught in classical statistics textbooks, which is much simpler to understand yet rarely discussed explicitly. In particular, many intuitions originate in fixed design settings, in which in-sample prediction error (under resampling of noisy outcomes) is of interest,…
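The distinction the abstract draws between in-sample and out-of-sample error can be illustrated with a small simulation. The sketch below is not taken from the paper: the 1-nearest-neighbor regressor, the sine-shaped regression function, the noise level, and the name compare_designs are all assumptions chosen for illustration. It estimates the prediction error of an interpolating model twice: once on the same inputs with freshly resampled noisy outcomes (fixed design), and once on newly drawn inputs (random design).

```python
import numpy as np


def one_nn_predict(x_train, y_train, x_query):
    """Predict with the outcome of the closest training input (1-D inputs)."""
    nearest = np.abs(x_query[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[nearest]


def compare_designs(n=10, sigma=0.3, reps=2000, seed=0):
    """Return (fixed-design error, random-design error), averaged over reps.

    All settings here are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)

    def f(x):
        # Hypothetical true regression function.
        return np.sin(2 * np.pi * x)

    fixed_err, random_err = [], []
    for _ in range(reps):
        x = rng.uniform(0, 1, n)
        y = f(x) + sigma * rng.normal(size=n)

        # Fixed design: same inputs, outcomes resampled with fresh noise.
        y_new = f(x) + sigma * rng.normal(size=n)
        fixed_err.append(np.mean((y_new - one_nn_predict(x, y, x)) ** 2))

        # Random design: new inputs drawn from the same distribution.
        x_test = rng.uniform(0, 1, n)
        y_test = f(x_test) + sigma * rng.normal(size=n)
        random_err.append(np.mean((y_test - one_nn_predict(x, y, x_test)) ** 2))

    return float(np.mean(fixed_err)), float(np.mean(random_err))


if __name__ == "__main__":
    fixed, random_ = compare_designs()
    print(f"fixed design (in-sample) error:      {fixed:.3f}")
    print(f"random design (out-of-sample) error: {random_:.3f}")
```

In the fixed design case, an interpolating predictor returns the old noisy outcome at each training input, so its expected squared error against a resampled outcome is twice the noise variance and involves no bias. In the random design case, the predictor must also bridge the gap between a new input and its nearest training input, which adds an error component that has no in-sample counterpart.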
Taxonomy
Topics: Statistical Methods in Clinical Trials · Bayesian Modeling and Causal Inference · Statistics Education and Methodologies
