High-dimensional logistic regression with missing data: Imputation, regularization, and universality
Kabir Aladin Verchand, Andrea Montanari

TL;DR
This paper provides exact and universal characterizations of prediction and estimation errors in high-dimensional ridge-regularized logistic regression with missing or noisy covariates, and compares imputation strategies with the conjectured performance of the Bayes optimal procedure.
Contribution
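It derives exact characterizations of the prediction and estimation errors when the covariates and additive corruptions are independent and Gaussian, shows that these characterizations remain valid whenever the entries of the data matrix satisfy a set of independence and moment conditions, and uses this universality to analyze imputation strategies combined with ridge regularization when covariates are missing completely at random.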
Findings
Ridge regularization improves the performance of imputation-based logistic regression.
Single imputation combined with ridge regularization approaches the conjectured Bayes optimal error.
The error characterizations are universal: they hold whenever the entries of the data matrix satisfy a set of independence and moment conditions.
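For reference, a standard way to write the ridge-regularized logistic regression estimator is sketched below. The normalization (the 1/n factor, the λ/2 penalty scaling) and the label convention y_i ∈ {-1, +1} are assumptions made here for illustration; the paper's exact scaling may differ. Here x_i denotes the i-th covariate vector after any imputation or corruption.

```latex
\hat{\theta}_{\lambda} \in \arg\min_{\theta \in \mathbb{R}^{d}}
  \frac{1}{n} \sum_{i=1}^{n} \log\!\bigl(1 + e^{-y_i \langle x_i, \theta \rangle}\bigr)
  + \frac{\lambda}{2} \lVert \theta \rVert_2^2,
\qquad y_i \in \{-1, +1\}.
```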
Abstract
We study high-dimensional, ridge-regularized logistic regression in a setting in which the covariates may be missing or corrupted by additive noise. When both the covariates and the additive corruptions are independent and normally distributed, we provide exact characterizations of both the prediction error as well as the estimation error. Moreover, we show that these characterizations are universal: as long as the entries of the data matrix satisfy a set of independence and moment conditions, our guarantees continue to hold. Universality, in turn, enables the detailed study of several imputation-based strategies when the covariates are missing completely at random. We ground our study by comparing the performance of these strategies with the conjectured performance -- stemming from replica theory in statistical physics -- of the Bayes optimal procedure. Our analysis yields several…
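The setting described in the abstract can be illustrated with a minimal simulation sketch: Gaussian covariates, entries missing completely at random, single (column-mean) imputation, and a ridge-regularized logistic fit. This is not the authors' code. The function names, the choice of mean imputation, the 1/n loss scaling, the gradient-descent solver, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def simulate(n, theta, p_missing, rng):
    """Gaussian covariates, logistic labels in {0, 1}, and an MCAR mask."""
    d = theta.shape[0]
    X = rng.standard_normal((n, d))
    prob = 1.0 / (1.0 + np.exp(-X @ theta))
    y = (rng.random(n) < prob).astype(float)
    mask = rng.random((n, d)) >= p_missing  # True where the entry is observed
    return X, y, mask

def column_means(X, mask):
    """Per-column mean over observed entries only (X itself holds no NaNs here)."""
    counts = np.maximum(mask.sum(axis=0), 1)
    return (X * mask).sum(axis=0) / counts

def impute(X, mask, fill):
    """Single imputation: keep observed entries, fill missing ones column-wise."""
    return np.where(mask, X, fill)

def fit_ridge_logistic(X, y, lam, n_iter=500):
    """Gradient descent on mean logistic loss + (lam / 2) * ||theta||^2."""
    n, d = X.shape
    # Step size 1/L, where L upper-bounds the curvature of the objective.
    L = 0.25 * np.linalg.norm(X, 2) ** 2 / n + lam
    theta = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (p - y) / n + lam * theta
        theta -= grad / L
    return theta

def classification_error(theta_hat, X, y):
    """Fraction of labels mispredicted by the sign of the linear score."""
    pred = (X @ theta_hat > 0).astype(float)
    return float(np.mean(pred != y))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, p_missing = 400, 100, 0.3           # illustrative values only
    theta_star = rng.standard_normal(d)        # hypothetical ground-truth signal
    X_tr, y_tr, m_tr = simulate(n, theta_star, p_missing, rng)
    X_te, y_te, m_te = simulate(n, theta_star, p_missing, rng)
    fill = column_means(X_tr, m_tr)            # means estimated on training data
    for lam in (0.001, 0.1, 1.0):
        theta_hat = fit_ridge_logistic(impute(X_tr, m_tr, fill), y_tr, lam)
        err = classification_error(theta_hat, impute(X_te, m_te, fill), y_te)
        print(f"lambda={lam}: test error={err:.3f}")
```

Running the script prints the held-out classification error for a few values of the regularization strength, which gives a rough empirical feel for how ridge regularization interacts with imputation. It does not reproduce the exact asymptotic characterizations or the Bayes optimal comparison discussed in the paper.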
Taxonomy
Topics: Advanced Statistical Methods and Models · Statistical Methods and Inference · Face and Expression Recognition
Methods: Sparse Evolutionary Training · Logistic Regression
