Classical Statistical (In-Sample) Intuitions Don't Generalize Well: A Note on Bias-Variance Tradeoffs, Overfitting and Moving from Fixed to Random Designs

Alicia Curth

arXiv:2409.18842 · stat.ML · September 30, 2024

TL;DR

This paper explains why classical statistical intuitions about bias-variance tradeoffs and overfitting do not always apply to modern machine learning, emphasizing the importance of fixed versus random design considerations.

Contribution

It clarifies how shifting from fixed to random design settings fundamentally alters classical intuitions about overfitting and generalization in machine learning.

Findings

01 Fixed design intuitions do not generalize to random designs.

02 Double descent phenomena are less observable in fixed design settings.

03 Bias-variance tradeoff behaviors differ significantly between fixed and random designs.

Abstract

The sudden appearance of modern machine learning (ML) phenomena like double descent and benign overfitting may leave many classically trained statisticians feeling uneasy -- these phenomena appear to go against the very core of statistical intuitions conveyed in any introductory class on learning from data. The historical lack of earlier observation of such phenomena is usually attributed to today's reliance on more complex ML methods, overparameterization, interpolation and/or higher data dimensionality. In this note, we show that there is another reason why we observe behaviors today that appear at odds with intuitions taught in classical statistics textbooks, which is much simpler to understand yet rarely discussed explicitly. In particular, many intuitions originate in fixed design settings, in which in-sample prediction error (under resampling of noisy outcomes) is of interest,…
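The fixed versus random design distinction in the abstract can be made concrete with a small simulation. What follows is a minimal sketch, not the paper's experiment: the k-nearest-neighbour regressor, the regression function `true_f`, the sample size, the dimension and the noise level are all assumptions chosen for illustration. It estimates in-sample prediction error (same inputs, freshly resampled noisy outcomes) and out-of-sample prediction error (freshly drawn inputs) for the same fitted predictor.

```python
import numpy as np


def true_f(x):
    # Hypothetical smooth regression function, chosen only for illustration.
    return np.sin(2 * np.pi * x).sum(axis=1)


def knn_predict(x_train, y_train, x_query, k):
    # Squared Euclidean distance from every query point to every training point.
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k closest training points for each query point.
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def simulate(k, n=100, d=5, sigma=1.0, n_rep=200, seed=0):
    """Return (in-sample error, out-of-sample error), averaged over noise draws."""
    rng = np.random.default_rng(seed)
    x_train = rng.uniform(size=(n, d))  # training inputs, held fixed throughout
    in_sample, out_of_sample = [], []
    for _ in range(n_rep):
        y_train = true_f(x_train) + sigma * rng.normal(size=n)

        # Fixed design: evaluate at the SAME inputs with freshly drawn noise.
        y_fresh = true_f(x_train) + sigma * rng.normal(size=n)
        pred_in = knn_predict(x_train, y_train, x_train, k)
        in_sample.append(np.mean((y_fresh - pred_in) ** 2))

        # Random design: evaluate at NEW inputs drawn from the same distribution.
        x_new = rng.uniform(size=(n, d))
        y_new = true_f(x_new) + sigma * rng.normal(size=n)
        pred_out = knn_predict(x_train, y_train, x_new, k)
        out_of_sample.append(np.mean((y_new - pred_out) ** 2))
    return float(np.mean(in_sample)), float(np.mean(out_of_sample))


if __name__ == "__main__":
    for k in (1, 5, 20):
        err_in, err_out = simulate(k)
        print(f"k={k:2d}  in-sample={err_in:.3f}  out-of-sample={err_out:.3f}")
```

With k = 1 the predictor returns each training point's own label when queried at a training input, so the in-sample error is roughly 2σ² (noise in the fresh outcome plus noise in the training outcome) with no bias contribution. At new inputs the nearest neighbour is some distance away, so a bias term is added on top. Under these assumptions, the same interpolating predictor therefore looks different depending on which error is measured.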

Taxonomy

Topics: Statistical Methods in Clinical Trials · Bayesian Modeling and Causal Inference · Statistics Education and Methodologies