Correction of overfitting bias in regression models
Emanuele Massa, Marianne Jonker, Kit Roes, Anthony Coolen

TL;DR
This paper introduces a jackknife-based method to quantify and correct overfitting bias in maximum likelihood (ML) regression estimators when the number of covariates is comparable to the number of observations, improving the accuracy of inference.
Contribution
It develops a new set of non-linear equations for the statistical properties of ML estimators in high-dimensional settings, enabling bias correction without relying on the replica method.
Findings
The equations accurately predict overfitting bias in simulations.
Shrinkage factors effectively remove bias in various regression models.
The method offers transparent bias correction with minimal assumptions.
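To make "overfitting bias" and "shrinkage factor" concrete, the following is a minimal simulation sketch in Python with NumPy. It is not the authors' method. It fits ordinary ML logistic regression by Newton's method with p/n = 0.2 and measures how much the estimated coefficients are inflated relative to the true ones. The settings (n = 2000, p = 400, signal strength gamma = 2) and the helper names fit_logistic_ml and inflation_factor are hypothetical choices for illustration. The resulting shrinkage factor is an oracle computed from the known true coefficients, which is only possible in simulation. The paper derives its correction from its own equations, which this sketch does not implement.

```python
import numpy as np


def fit_logistic_ml(X, y, max_iter=100, tol=1e-10):
    """ML logistic regression without intercept, via Newton steps with step halving."""
    n, p = X.shape
    beta = np.zeros(p)

    def loglik(b):
        eta = X @ b
        # sum_i [y_i * eta_i - log(1 + exp(eta_i))], computed stably
        return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

    current = loglik(beta)
    for _ in range(max_iter):
        eta = X @ beta
        prob = np.exp(-np.logaddexp(0.0, -eta))  # stable 1 / (1 + exp(-eta))
        grad = X.T @ (y - prob)
        hess = X.T @ (X * (prob * (1.0 - prob))[:, None])
        step = np.linalg.solve(hess, grad)
        # halve the Newton step until the log-likelihood does not decrease
        t = 1.0
        while loglik(beta + t * step) < current and t > 1e-8:
            t /= 2.0
        beta = beta + t * step
        new = loglik(beta)
        if new - current < tol:  # negligible improvement: treat as converged
            break
        current = new
    return beta


def inflation_factor(n=2000, p=400, gamma=2.0, seed=0):
    """Slope of the ML estimates on the true coefficients (1 means no systematic bias)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    # hypothetical true coefficients, scaled so that Var(x . beta_true) = gamma**2
    beta_true = rng.standard_normal(p)
    beta_true *= gamma / np.linalg.norm(beta_true)
    prob = np.exp(-np.logaddexp(0.0, -(X @ beta_true)))
    y = (rng.random(n) < prob).astype(float)
    beta_hat = fit_logistic_ml(X, y)
    return float(beta_hat @ beta_true / (beta_true @ beta_true))


factor = inflation_factor()
print(f"ML coefficients are inflated by a factor of about {factor:.2f}")
```

A factor clearly above 1 shows the systematic overestimation of effect sizes. Dividing the ML estimates by such a factor is the kind of shrinkage the findings refer to, although in practice the factor must be estimated without access to the true coefficients.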
Abstract
Regression analysis based on many covariates is becoming increasingly common. However, when the number of covariates is of the same order as the number of observations, maximum likelihood (ML) regression becomes unreliable due to overfitting. This typically leads to systematic estimation biases and increased estimator variances. It is crucial for inference and prediction to quantify these effects correctly. Several methods have been proposed in the literature to overcome overfitting bias or adjust estimates. The vast majority of these focus on the regression parameters. However, failing to also estimate the nuisance parameters correctly may lead to significant errors in confidence statements and outcome prediction. In this paper we present a jackknife method for deriving a compact set of non-linear equations which describe the statistical properties of the ML estimator in the regime where…
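The point about nuisance parameters can be illustrated with the simplest case, Gaussian linear regression, where the bias is known exactly: the ML noise-variance estimate RSS/n has expectation sigma^2 (n - p)/n, so with p = n/2 it underestimates sigma^2 by about half, while rescaling by n/(n - p) removes the bias. The sketch below (Python with NumPy; the function name ml_noise_variance and the simulation settings are hypothetical) checks this by simulation. It only illustrates the phenomenon and is not the paper's jackknife method.

```python
import numpy as np


def ml_noise_variance(n, p, sigma=1.0, reps=200, seed=0):
    """Average ML and rescaled noise-variance estimates over repeated simulations."""
    rng = np.random.default_rng(seed)
    ml, corrected = [], []
    for _ in range(reps):
        X = rng.standard_normal((n, p))
        beta_true = rng.standard_normal(p)
        y = X @ beta_true + sigma * rng.standard_normal(n)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # ML (least-squares) estimate of beta
        rss = float(np.sum((y - X @ beta_hat) ** 2))
        ml.append(rss / n)               # ML estimate of sigma^2, biased downwards
        corrected.append(rss / (n - p))  # rescaled by n / (n - p), unbiased
    return float(np.mean(ml)), float(np.mean(corrected))


ml, corrected = ml_noise_variance(n=400, p=200)
print(f"ML: {ml:.3f}  corrected: {corrected:.3f}  (true sigma^2 = 1)")
```

Here the regression coefficients themselves are unbiased, yet the nuisance parameter is badly underestimated, which would make confidence intervals far too narrow. In non-linear models such as logistic or survival regression, no such closed-form correction is available, which is the gap the paper's equations address.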
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Statistical Methods and Inference · Advanced Statistical Methods and Models
