PUlasso: High-dimensional variable selection with presence-only data
Hyebin Song, Garvesh Raskutti

TL;DR
This paper introduces PUlasso, an algorithm for variable selection and classification with high-dimensional presence-only (positive and unlabeled) data. It addresses the statistical and computational challenges of this setting, comes with theoretical guarantees, outperforms existing algorithms in simulations, and performs well on real data.
Contribution
We develop PUlasso, a scalable algorithm based on the majorization-minimization (MM) framework for variable selection with presence-only data, with convergence guarantees and minimax optimal error bounds.
Findings
PUlasso outperforms existing algorithms in settings with moderate p.
The algorithm provably converges to a stationary point, with optimal error bounds.
PUlasso is effective in an analysis of real biochemistry data.
Abstract
In various real-world problems, we are presented with classification problems with positive and unlabeled data, referred to as presence-only responses. In this paper, we study variable selection in the context of presence-only responses where the number of features or covariates p is large. The combination of presence-only responses and high dimensionality presents both statistical and computational challenges. In this paper, we develop the PUlasso algorithm for variable selection and classification with positive and unlabeled responses. Our algorithm uses the majorization-minimization (MM) framework, which is a generalization of the well-known expectation-maximization (EM) algorithm. In particular, to make our algorithm scalable, we provide two computational speed-ups to the standard EM algorithm. We provide a theoretical guarantee where we first show that our algorithm…
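As a rough illustration of the MM idea mentioned in the abstract, the sketch below applies it to ordinary l1-penalized logistic regression: each iteration replaces the smooth loss with a quadratic upper bound (a majorizer) that touches it at the current iterate, then minimizes that bound exactly. This is not the authors' presence-only likelihood or their implementation. The function names (`soft_threshold`, `objective`, `mm_l1_logistic`), the use of fully labeled responses `y` in {0, 1}, and the choice of a global curvature bound are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(X, y, beta, lam):
    """Mean logistic loss plus an l1 penalty; labels y are in {0, 1}."""
    z = X @ beta
    return np.mean(np.logaddexp(0.0, z) - y * z) + lam * np.sum(np.abs(beta))

def mm_l1_logistic(X, y, lam, n_iter=200):
    """Hypothetical MM loop: minimize a quadratic majorizer at each step."""
    n, p = X.shape
    # The Hessian of the mean logistic loss is bounded by X^T X / (4n), so
    # its largest eigenvalue L yields a global quadratic upper bound.
    L = np.linalg.norm(X, 2) ** 2 / (4.0 * n)
    beta = np.zeros(p)
    history = [objective(X, y, beta, lam)]
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (prob - y) / n
        # Exact minimizer of the majorizer plus the l1 penalty:
        # a gradient step followed by soft-thresholding.
        beta = soft_threshold(beta - grad / L, lam / L)
        history.append(objective(X, y, beta, lam))
    return beta, history

# Usage on synthetic data (assumed, not from the paper).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
y_demo = (rng.random(200) < 0.5).astype(float)
beta_hat, values = mm_l1_logistic(X_demo, y_demo, lam=0.05, n_iter=100)
```

Because the surrogate lies above the objective and agrees with it at the current iterate, the recorded objective values in `values` never increase, which is the descent property that MM and EM share.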
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and Algorithms · Advanced Statistical Methods and Models
