On resampling methods for model assessment in penalized and unpenalized logistic regression

Angelika Geroldinger, Lara Lusa, Mariana Nold, and Georg Heinze

arXiv:2101.07640 · stat.ME · January 20, 2021 · 1 cites

TL;DR

This paper compares resampling methods for assessing the performance of penalized and unpenalized logistic regression models. It shows that leave-one-out crossvalidation yields biased c-statistics and recommends alternatives such as leave-pair-out and five-fold crossvalidation for more accurate performance estimation.

Contribution

It compares how different resampling techniques affect estimates of the c-statistic, the discrimination slope and the Brier score in penalized and unpenalized logistic regression, and identifies which techniques give biased estimates and which are more accurate.

Findings

01

Leave-one-out crossvalidation biases c-statistics towards zero.

02

This leave-one-out bias is more severe for the ridge regression estimator.

03

Leave-pair-out and five-fold crossvalidation provide more accurate estimates than leave-one-out crossvalidation.
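The first and third findings can be illustrated with a deliberately simplified sketch. It assumes an intercept-only model, whose predicted probability is just the event rate in the training data; this is an illustrative assumption, not the simulation design of the paper, and the function names are hypothetical. Under leave-one-out, removing an event lowers the training event rate and removing a non-event raises it, so every event receives a lower prediction than every non-event and the pooled c-statistic is pushed to zero. Leave-pair-out removes one event and one non-event together, so both are scored by the same fitted model.

```python
import numpy as np

def loo_c_intercept_only(y):
    """Pooled leave-one-out c-statistic for an intercept-only model (illustrative)."""
    y = np.asarray(y)
    n = len(y)
    # Prediction for observation i = event rate among the other n - 1 observations.
    p = (y.sum() - y) / (n - 1)
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    # Concordant pairs count 1, ties count 1/2.
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def lpo_c_intercept_only(y):
    """Leave-pair-out c-statistic for an intercept-only model (illustrative)."""
    y = np.asarray(y)
    events = np.flatnonzero(y == 1)
    nonevents = np.flatnonzero(y == 0)
    score = 0.0
    for i in events:
        for j in nonevents:
            keep = np.ones(len(y), dtype=bool)
            keep[[i, j]] = False          # leave the whole pair out
            rate = y[keep].mean()         # "fit" the intercept-only model
            p_i, p_j = rate, rate         # same model predicts both left-out cases
            score += 1.0 if p_i > p_j else 0.5 if p_i == p_j else 0.0
    return score / (len(events) * len(nonevents))

y = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
loo_c = loo_c_intercept_only(y)   # every event is ranked below every non-event
lpo_c = lpo_c_intercept_only(y)   # every pair is tied
```

In this toy setting the model carries no information, so a c-statistic of 0.5 is the sensible answer; leave-pair-out returns it, while pooled leave-one-out does not. With real covariates the effect is less extreme.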

Abstract

Penalized logistic regression methods are frequently used to investigate the relationship between a binary outcome and a set of explanatory variables. The model performance can be assessed by measures such as the concordance statistic (c-statistic), the discrimination slope and the Brier score. Often, data resampling techniques, e.g. crossvalidation, are employed to correct for optimism in these model performance criteria. Especially with small samples or a rare binary outcome variable, leave-one-out crossvalidation is a popular choice. Using simulations and a real data example, we compared the effect of different resampling techniques on the estimation of c-statistics, discrimination slopes and Brier scores for three estimators of logistic regression models, including the maximum likelihood and two maximum penalized-likelihood estimators. Our simulation study confirms earlier studies…
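As a rough sketch of the three performance measures named in the abstract, the following uses their common textbook definitions; the function names and the toy data are assumptions for illustration and are not taken from the paper. The c-statistic is the proportion of event/non-event pairs in which the event has the higher predicted probability (ties counted as one half), the discrimination slope is the difference in mean predicted probability between events and non-events, and the Brier score is the mean squared difference between predicted probability and outcome.

```python
import numpy as np

def c_statistic(y, p):
    """Concordance over all event/non-event pairs; ties count 1/2."""
    y, p = np.asarray(y), np.asarray(p, dtype=float)
    diff = p[y == 1][:, None] - p[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def discrimination_slope(y, p):
    """Mean prediction among events minus mean prediction among non-events."""
    y, p = np.asarray(y), np.asarray(p, dtype=float)
    return float(p[y == 1].mean() - p[y == 0].mean())

def brier_score(y, p):
    """Mean squared error of the predicted probabilities."""
    y, p = np.asarray(y), np.asarray(p, dtype=float)
    return float(np.mean((p - y) ** 2))

# Hypothetical outcomes and predicted probabilities, for illustration only.
y = np.array([0, 0, 1, 1])
p = np.array([0.10, 0.40, 0.35, 0.80])

c = c_statistic(y, p)
slope = discrimination_slope(y, p)
brier = brier_score(y, p)
```

Computed on the data used to fit the model, these measures tend to be optimistic, which is why the abstract turns to resampling to correct them.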

Taxonomy

Topics: Advanced Statistical Methods and Models · Statistical Methods and Inference · Optimal Experimental Design Methods