The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression
Emmanuel J. Candès, Pragya Sur

TL;DR
This paper characterizes a sharp phase transition in high-dimensional logistic regression, determining when the maximum likelihood estimate exists based on the ratio of features to samples and the magnitude of coefficients.
Contribution
It introduces an explicit boundary curve that predicts the existence of the MLE in high-dimensional settings with Gaussian covariates.
Findings
The existence of the MLE undergoes a sharp transition at the boundary curve
If the feature-to-sample ratio exceeds the boundary, the MLE fails to exist with probability approaching one
If the ratio lies below the boundary, the MLE exists with probability approaching one
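The simplest case has an exact answer: labels that are independent fair coin flips, with the model fitted without an intercept. The sketch below uses Cover's (1965) function-counting theorem, a classical result and not the method of this paper. It states that n points in general position in R^p, with independent random labels, are separable by a hyperplane through the origin with probability 2^(1-n) Σ_{k<p} C(n-1, k). Separable data let the log-likelihood increase without bound along the separating direction, so in that event the MLE cannot exist. The function name prob_separable is our own.

```python
from math import comb


def prob_separable(n, p):
    """Cover (1965): probability that n points in general position in R^p,
    with labels drawn as independent fair coin flips, can be separated by a
    hyperplane through the origin: 2^(1-n) * sum_{k<p} C(n-1, k)."""
    # comb(n - 1, k) is 0 once k > n - 1, so p >= n gives probability 1
    count = sum(comb(n - 1, k) for k in range(p))
    return count / 2 ** (n - 1)


# the transition at p/n = 1/2 sharpens as n grows
for n in (100, 1000, 2000):
    below = prob_separable(n, int(0.45 * n))
    above = prob_separable(n, int(0.55 * n))
    print(f"n={n:5d}  P(sep | p/n=0.45)={below:.3g}  P(sep | p/n=0.55)={above:.3g}")
```

For even n the probability at p = n/2 is exactly 1/2, while the values at p/n = 0.45 and 0.55 move toward 0 and 1 as n grows: the sharp transition in its pure-noise form.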
Abstract
This paper rigorously establishes that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp "phase transition". We introduce an explicit boundary curve $h_{\mathrm{MLE}}$, parameterized by two scalars measuring the overall magnitude of the unknown sequence of regression coefficients, with the following property: in the limit of large sample sizes $n$ and number of features $p$ proportioned in such a way that $p/n \to \kappa$, we show that if the problem is sufficiently high dimensional in the sense that $\kappa > h_{\mathrm{MLE}}$, then the MLE does not exist with probability one. Conversely, if $\kappa < h_{\mathrm{MLE}}$, the MLE asymptotically exists with probability one.
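Evaluating the boundary needs a formula for the curve, which the abstract does not spell out. The sketch below assumes the two scalars are an intercept β0 and a signal strength γ0 (the standard deviation of the linear predictor x'β). It takes h_MLE(β0, γ0) = min over (t0, t1) of E[(Z − t0·Y − t1·Y·V)_+^2], with V and Z independent N(0, 1) and P(Y = +1 | V) = σ(β0 + γ0·V). This form is our reading of the characterization and should be checked against the theorem in the paper; the names h_mle and mle_exists are hypothetical. The inner expectation over Z has the closed form (1 + u²)Φ(−u) − uφ(u). The expectation over V uses the trapezoid rule, and the minimization uses damped Newton steps, which suits an objective that is convex in (t0, t1).

```python
import math

INV_SQRT2PI = 1.0 / math.sqrt(2.0 * math.pi)


def _pdf(x):
    return INV_SQRT2PI * math.exp(-0.5 * x * x)  # standard normal density


def _sf(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))  # P(Z > x) for Z ~ N(0, 1)


def _sigmoid(x):
    # logistic function, written to avoid overflow for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _moments(t0, t1, beta0, gamma0, nodes, weights):
    """E[(Z - u)_+^2] with u = Y*(t0 + t1*V), plus its gradient and Hessian."""
    f = g0 = g1 = h00 = h01 = h11 = 0.0
    for v, w in zip(nodes, weights):
        p = _sigmoid(beta0 + gamma0 * v)  # assumed P(Y = +1 | V = v)
        for y, py in ((1.0, p), (-1.0, 1.0 - p)):
            u = y * (t0 + t1 * v)
            c = w * py
            sf, pdf = _sf(u), _pdf(u)
            f += c * ((1.0 + u * u) * sf - u * pdf)  # closed form over Z
            d = c * 2.0 * (u * sf - pdf) * y  # first derivative; du/dt0 = y
            g0 += d
            g1 += d * v  # du/dt1 = y * v
            s = c * 2.0 * sf  # second derivative (y^2 = 1), never negative
            h00 += s
            h01 += s * v
            h11 += s * v * v
    return f, (g0, g1), (h00, h01, h11)


def h_mle(beta0, gamma0, n_nodes=801, lim=8.0, max_iter=100, tol=1e-12):
    # trapezoid rule on [-lim, lim] against the N(0, 1) density of V
    dv = 2.0 * lim / (n_nodes - 1)
    nodes = [-lim + i * dv for i in range(n_nodes)]
    weights = [dv * _pdf(v) for v in nodes]
    weights[0] *= 0.5
    weights[-1] *= 0.5

    t0 = t1 = 0.0
    f, g, H = _moments(t0, t1, beta0, gamma0, nodes, weights)
    for _ in range(max_iter):
        h00, h01, h11 = H
        det = h00 * h11 - h01 * h01
        if det > 1e-14:  # Newton direction -H^{-1} g for the 2x2 Hessian
            s0 = -(h11 * g[0] - h01 * g[1]) / det
            s1 = -(h00 * g[1] - h01 * g[0]) / det
        else:  # fall back to steepest descent
            s0, s1 = -g[0], -g[1]
        step = 1.0
        while True:  # backtracking: halve the step until f does not increase
            nt0, nt1 = t0 + step * s0, t1 + step * s1
            nf, ng, nH = _moments(nt0, nt1, beta0, gamma0, nodes, weights)
            if nf <= f or step < 1e-10:
                break
            step *= 0.5
        if nf > f:
            break  # no further decrease possible
        decrease = f - nf
        t0, t1, f, g, H = nt0, nt1, nf, ng, nH
        if decrease < tol:
            break
    return f


def mle_exists(kappa, beta0, gamma0):
    # asymptotic prediction: the MLE exists iff kappa = p/n is below the curve
    return kappa < h_mle(beta0, gamma0)
```

Under these assumptions the sketch returns 1/2 at β0 = γ0 = 0, the pure-noise threshold, and smaller values as γ0 grows, so a stronger signal makes the MLE break down at a lower ratio p/n. For example, `mle_exists(0.3, 0.0, 1.0)` compares κ = 0.3 against the curve at γ0 = 1.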
