A modern maximum-likelihood theory for high-dimensional logistic regression

Pragya Sur; Emmanuel J. Candes

arXiv:1803.06964 · math.ST · June 8, 2022

TL;DR

This paper develops a new theoretical framework for high-dimensional logistic regression, revealing that classical inference methods are unreliable when the number of variables grows proportionally with the sample size, and proposes adjustments for accurate inference.

Contribution

The paper introduces a modern maximum-likelihood theory that accurately predicts the bias, variance, and distribution of the maximum-likelihood estimate (MLE) in high-dimensional logistic regression, which makes inference more reliable.

Findings

01 The MLE is biased and more variable than classical theory predicts.

02 The likelihood-ratio statistic does not follow a chi-square distribution in high dimensions.

03 The paper proposes a procedure that adjusts inference based on a single scalar parameter.

Abstract

Every student in statistics or data science learns early on that when the sample size largely exceeds the number of variables, fitting a logistic model produces estimates that are approximately unbiased. Every student also learns that there are formulas to predict the variability of these estimates which are used for the purpose of statistical inference; for instance, to produce p-values for testing the significance of regression coefficients. Although these formulas come from large sample asymptotics, we are often told that we are on reasonably safe grounds when $n$ is large in such a way that $n \geq 5 p$ or $n \geq 10 p$. This paper shows that this is far from the case, and consequently, inferences routinely produced by common software packages are often unreliable. Consider a logistic model with independent features in which $n$ and $p$ become increasingly large in a fixed ratio. Then…
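The bias described in the abstract can be checked with a small simulation. The sketch below is illustrative only and is not taken from the paper's code: it assumes independent Gaussian features with variance $1/n$, a ratio $n = 5p$, and a hypothetical coefficient vector with 100 nonzero entries of magnitude 10. The helper names (`fit_logistic_mle`, `neg_log_lik`) are made up for this example. It fits the unpenalized MLE with a plain Newton iteration and then regresses the fitted coefficients on the true ones; a slope near 1 would indicate approximate unbiasedness.

```python
import numpy as np


def sigmoid(eta):
    # Numerically stable logistic function via tanh.
    return 0.5 * (1.0 + np.tanh(0.5 * eta))


def neg_log_lik(X, y, beta):
    # Negative logistic log-likelihood: sum of log(1 + exp(eta)) - y * eta.
    eta = X @ beta
    return np.sum(np.logaddexp(0.0, eta) - y * eta)


def fit_logistic_mle(X, y, max_iter=100, tol=1e-6):
    """Unpenalized logistic MLE via Newton's method with step halving."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_iter):
        mu = sigmoid(X @ beta)
        grad = X.T @ (mu - y)
        if np.linalg.norm(grad) < tol:
            break
        w = mu * (1.0 - mu)
        hess = X.T @ (X * w[:, None])
        step = np.linalg.solve(hess, grad)
        # Halve the step until the objective does not increase.
        t = 1.0
        f0 = neg_log_lik(X, y, beta)
        while neg_log_lik(X, y, beta - t * step) > f0 and t > 1e-8:
            t *= 0.5
        beta = beta - t * step
    return beta


# Assumed simulation design (not from the source): n = 5p, Gaussian features.
rng = np.random.default_rng(0)
n, p = 2000, 400
X = rng.normal(scale=1.0 / np.sqrt(n), size=(n, p))

# Hypothetical true coefficients: 50 at +10, 50 at -10, the rest zero.
beta = np.zeros(p)
beta[:50] = 10.0
beta[50:100] = -10.0

y = (rng.random(n) < sigmoid(X @ beta)).astype(float)
beta_hat = fit_logistic_mle(X, y)

# Least-squares slope of the fitted coefficients on the true nonzero ones.
nz = beta != 0
slope = (beta_hat[nz] @ beta[nz]) / (beta[nz] @ beta[nz])
print(f"slope of beta_hat on beta over nonzero coordinates: {slope:.2f}")
```

Even though $n = 5p$ sits inside the rule of thumb quoted in the abstract, the printed slope in a run like this should land noticeably above 1, meaning the fitted coefficients overstate the true effect sizes. The exact value depends on the seed and on the assumed design.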
