On resampling methods for model assessment in penalized and unpenalized logistic regression
Angelika Geroldinger, Lara Lusa, Mariana Nold, and Georg Heinze

TL;DR
This paper evaluates various resampling methods for assessing logistic regression models, revealing biases in leave-one-out crossvalidation and recommending alternative techniques for more accurate performance estimation.
Contribution
It compares how different resampling techniques affect estimates of the c-statistic, the discrimination slope and the Brier score in penalized and unpenalized logistic regression, identifying which techniques are biased and recommending less biased alternatives.
Findings
Leave-one-out crossvalidation biases c-statistics towards zero.
This bias is more severe for ridge regression estimators.
Leave-pair-out and five-fold crossvalidation provide more accurate estimates than leave-one-out crossvalidation.
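The direction of the leave-one-out bias can be seen in a small toy case. The sketch below is an illustration only, not the paper's simulation: it assumes an intercept-only "model" (the hypothetical helper `fit_null`) whose prediction is simply the event fraction in the training data, and a made-up outcome vector with 5 events among 20 observations. Leaving out an event lowers the training event fraction, so every event receives a lower pooled prediction than every non-event, and the pooled leave-one-out c-statistic collapses to 0 even though the model has no discriminative ability (true c-statistic 0.5). Leave-pair-out crossvalidation compares the two held-out observations under the same fitted model, which avoids this artifact.

```python
import numpy as np

def c_statistic(y, p):
    """Share of event/non-event pairs in which the event has the
    higher prediction; tied pairs count as 1/2."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    diff = p[y == 1][:, None] - p[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def fit_null(y_train):
    """Hypothetical intercept-only model: the predicted probability
    is the event fraction of the training data."""
    return float(np.mean(y_train))

# Toy outcome vector (assumption): 5 events, 15 non-events.
y = np.array([1] * 5 + [0] * 15)
n = len(y)

# Leave-one-out: refit without observation i, predict i, pool all predictions.
loo_pred = np.array([fit_null(np.delete(y, i)) for i in range(n)])
c_loo = c_statistic(y, loo_pred)

# Leave-pair-out: for each event/non-event pair, refit without the pair
# and compare the two held-out predictions from that single fit.
events = np.flatnonzero(y == 1)
nonevents = np.flatnonzero(y == 0)
scores = []
for i in events:
    for j in nonevents:
        p_hat = fit_null(np.delete(y, [i, j]))
        p_i, p_j = p_hat, p_hat  # the null model predicts the same value for both
        scores.append(1.0 if p_i > p_j else 0.5 if p_i == p_j else 0.0)
c_lpo = float(np.mean(scores))

print(c_loo, c_lpo)  # 0.0 0.5
```

With a real logistic regression fit in place of `fit_null`, the same loop structure applies, but the size of the bias would depend on the data and the estimator.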
Abstract
Penalized logistic regression methods are frequently used to investigate the relationship between a binary outcome and a set of explanatory variables. The model performance can be assessed by measures such as the concordance statistic (c-statistic), the discrimination slope and the Brier score. Often, data resampling techniques, e.g. crossvalidation, are employed to correct for optimism in these model performance criteria. Especially with small samples or a rare binary outcome variable, leave-one-out crossvalidation is a popular choice. Using simulations and a real data example, we compared the effect of different resampling techniques on the estimation of c-statistics, discrimination slopes and Brier scores for three estimators of logistic regression models, including the maximum likelihood and two maximum penalized-likelihood estimators. Our simulation study confirms earlier studies…
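As a minimal sketch of the three performance measures named in the abstract, the following uses their usual textbook definitions: the c-statistic as the share of event/non-event pairs ranked correctly (ties counted as 1/2), the discrimination slope as the difference in mean predicted probability between events and non-events, and the Brier score as the mean squared difference between predicted probability and outcome. The function names and the toy data are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def c_statistic(y, p):
    """Share of event/non-event pairs where the event has the higher
    predicted probability; tied pairs count as 1/2."""
    diff = p[y == 1][:, None] - p[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def discrimination_slope(y, p):
    """Mean predicted probability among events minus that among non-events."""
    return float(p[y == 1].mean() - p[y == 0].mean())

def brier_score(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return float(np.mean((p - y) ** 2))

# Toy outcomes and predicted probabilities (made up for illustration).
y = np.array([1, 1, 0, 0, 0])
p = np.array([0.8, 0.4, 0.5, 0.2, 0.1])

print(round(c_statistic(y, p), 3))           # 0.833
print(round(discrimination_slope(y, p), 3))  # 0.333
print(round(brier_score(y, p), 3))           # 0.14
```

Computed on the data used to fit the model, these measures tend to be optimistic, which is why resampling techniques such as crossvalidation are used to correct them.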
Taxonomy
Topics: Advanced Statistical Methods and Models · Statistical Methods and Inference · Optimal Experimental Design Methods
