A Conditional Randomization Test for Sparse Logistic Regression in High-Dimension
Binh T. Nguyen, Bertrand Thirion, Sylvain Arlot

TL;DR
This paper introduces CRT-logit, an efficient and theoretically grounded method for variable selection in high-dimensional sparse logistic regression, improving inference accuracy and computational speed.
Contribution
It develops CRT-logit, a novel algorithm combining variable distillation and decorrelation tailored for high-dimensional logistic regression, with theoretical guarantees.
Findings
CRT-logit outperforms existing methods in simulation studies.
It achieves accurate variable selection with controlled error rates.
It demonstrates effectiveness on large-scale brain-imaging and genomics datasets.
Abstract
Identifying the relevant variables for a classification model with correct confidence levels is a central but difficult task in high-dimension. Despite the core role of sparse logistic regression in statistics and machine learning, it still lacks a good solution for accurate inference in the regime where the number of features is as large as or larger than the number of samples. Here, we tackle this problem by improving the Conditional Randomization Test (CRT). The original CRT algorithm shows promise as a way to output p-values while making few assumptions on the distribution of the test statistics. As it comes with a prohibitive computational cost even in mildly high-dimensional problems, faster solutions based on distillation have been proposed. Yet, they rely on unrealistic hypotheses and result in low-power solutions. To improve this, we propose CRT-logit, an…
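For context, the resampling idea behind the original CRT mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the CRT-logit procedure. It assumes the features are Gaussian with a known covariance `Sigma`, and it swaps in a cheap marginal statistic where a model-based statistic would normally need one refit per resample. All function and variable names are hypothetical.

```python
import numpy as np

def gaussian_conditional(X, Sigma, j):
    """Mean and std of X[:, j] given the other columns, assuming rows ~ N(0, Sigma)."""
    idx = np.delete(np.arange(Sigma.shape[0]), j)
    # Population regression coefficients of X_j on the remaining features.
    w = np.linalg.solve(Sigma[np.ix_(idx, idx)], Sigma[idx, j])
    mean = X[:, idx] @ w
    var = Sigma[j, j] - Sigma[idx, j] @ w
    return mean, np.sqrt(var)

def crt_pvalue(X, y, Sigma, j, n_resamples=200, seed=None):
    """Conditional randomization p-value for feature j (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    y_centered = y - y.mean()

    # Stand-in statistic (an assumption): absolute inner product with centered labels.
    def stat(xj):
        return abs(xj @ y_centered)

    mean, std = gaussian_conditional(X, Sigma, j)
    t_obs = stat(X[:, j])
    # Redraw X_j from its conditional law given the other features; y is held fixed.
    t_null = np.array([
        stat(mean + std * rng.standard_normal(len(y)))
        for _ in range(n_resamples)
    ])
    # Finite-sample p-value with the usual +1 correction.
    return (1 + np.sum(t_null >= t_obs)) / (n_resamples + 1)

# Toy usage: only feature 0 influences the binary label.
rng = np.random.default_rng(0)
n, p, rho = 300, 5, 0.5
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-2.0 * X[:, 0])))

p_signal = crt_pvalue(X, y, Sigma, j=0, seed=1)
p_null = crt_pvalue(X, y, Sigma, j=4, seed=1)
```

In this toy setting `p_signal` should come out small while `p_null` has no reason to. Replacing the stand-in statistic with a fitted sparse-model coefficient would require one fit per resample and per feature, which illustrates the computational cost the abstract refers to.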
Taxonomy
Topics: Machine Learning and Algorithms · Statistical Methods and Inference · Machine Learning and Data Classification
Methods: Logistic Regression
