Double descent for least-squares interpolation on contaminated data: A simulation study
Tino Werner

TL;DR
This simulation study investigates the double descent phenomenon in overparametrized linear regression models with contaminated data, showing that least-squares interpolation can outperform robust methods in generalization.
Contribution
It demonstrates that double descent occurs in contaminated data settings and that least-squares interpolation can outperform robust alternatives in overparametrized linear regression.
Findings
Double descent observed in contaminated data with overparametrized models.
Least-squares interpolation can surpass robust methods in generalization performance.
Overparametrization enables good generalization despite data contamination.
Abstract
Overparametrized models can generalize excellently, even though classical statistical theory predicts that they should be prone to overfitting. The discovery of the "double descent", in which the generalization error decreases again after a certain model complexity has been reached, opened a new line of research. Robust statistics considers statistical estimation on contaminated data: because the modeling assumptions do not hold on real data, some data points appear as outliers w.r.t. the assumed "ideal" distribution and can severely distort any classical estimator. We address the question of whether a double descent phenomenon can be observed in a linear regression setting with contaminated training data. We compare the performance of the highly non-robust least-squares interpolation estimator with several robust alternatives. It turns out that large…
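As a rough illustration of the kind of experiment the abstract describes, the following minimal sketch fits the minimum-norm least-squares interpolator on training data with contaminated responses and records the test error as the number of used features grows past the sample size. It is not the paper's simulation design: the Gaussian features, the contamination scheme (large additive noise on a random fraction of responses), the dimensions, and the names `min_norm_least_squares` and `simulate` are all assumptions made for illustration.

```python
import numpy as np


def min_norm_least_squares(X, y):
    """Minimum-l2-norm least-squares solution via the pseudoinverse.

    When X has more columns than rows and full row rank, this solution
    interpolates the training data (X @ b equals y up to rounding).
    """
    return np.linalg.pinv(X) @ y


def simulate(n=40, p_total=400, n_test=500, outlier_frac=0.1, seed=0):
    """Test MSE of the interpolator for a growing number of features.

    All settings here are illustrative assumptions, not the paper's setup.
    """
    rng = np.random.default_rng(seed)

    # Assumed "ideal" model: linear in Gaussian features with small noise.
    beta = rng.normal(size=p_total) / np.sqrt(p_total)
    X_train = rng.normal(size=(n, p_total))
    X_test = rng.normal(size=(n_test, p_total))
    y_train = X_train @ beta + 0.1 * rng.normal(size=n)
    y_test = X_test @ beta  # clean test responses

    # Assumed contamination: a fraction of training responses receives
    # large additive noise, so those points act as outliers.
    n_out = int(outlier_frac * n)
    idx = rng.choice(n, size=n_out, replace=False)
    y_train[idx] += 10.0 * rng.normal(size=n_out)

    errors = {}
    for p in (5, 10, 20, 40, 80, 200, 400):
        # Use only the first p features, mimicking growing model complexity.
        b = min_norm_least_squares(X_train[:, :p], y_train)
        pred = X_test[:, :p] @ b
        errors[p] = float(np.mean((pred - y_test) ** 2))
    return errors


if __name__ == "__main__":
    for p, err in simulate().items():
        print(f"p = {p:4d}  test MSE = {err:.3f}")
```

Plotting the returned errors against `p` is one way to look for a peak near `p = n` followed by a second descent. Whether and how clearly that shape appears depends on the assumed settings, so a single run should not be read as reproducing the paper's results.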
