Surrogate Aided Unsupervised Recovery of Sparse Signals in Single Index Models for Binary Outcomes
Abhishek Chakrabortty, Matey Neykov, Raymond Carroll, Tianxi Cai

TL;DR
This paper introduces a novel method for recovering sparse regression coefficients in single index models for binary outcomes using surrogate variables, applicable in large unlabeled datasets like electronic medical records, with theoretical guarantees and empirical validation.
Contribution
It proposes a surrogate-aided approach that leverages extreme values of a surrogate variable to recover coefficients without observing the binary outcome, under sparsity assumptions.
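The idea of labelling only the extremes of the surrogate can be illustrated with a small simulation. The sketch below is not the paper's estimator: it swaps in a soft-thresholded difference of covariate means between the upper and lower tails of the surrogate, which points along the coefficient direction only under the assumptions made here (independent standard Gaussian covariates, a logistic link, and a surrogate that depends on the covariates only through the outcome). The function names, the surrogate model `s = 2*y + noise`, the tail fraction `q` and the threshold `lam` are all hypothetical choices for illustration.

```python
import math
import random


def simulate(n, beta, rng):
    """Draw n pairs (x, s). The outcome y is generated internally and discarded."""
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in beta]
        index = sum(b * xj for b, xj in zip(beta, x))
        # Assumed logistic single index model for the unobserved binary outcome.
        y = 1 if rng.random() < 1.0 / (1.0 + math.exp(-index)) else 0
        # Hypothetical surrogate: noisy overall, informative mainly in its tails.
        s = 2.0 * y + rng.gauss(0.0, 1.0)
        rows.append((x, s))
    return rows


def recover_direction(rows, q=0.1, lam=0.15):
    """Estimate the coefficient direction from the surrogate's extreme sets only."""
    ordered = sorted(rows, key=lambda r: r[1])
    k = max(1, int(q * len(ordered)))
    low, high = ordered[:k], ordered[-k:]  # pseudo-labels 0 and 1
    p = len(ordered[0][0])
    est = []
    for j in range(p):
        diff = sum(x[j] for x, _ in high) / k - sum(x[j] for x, _ in low) / k
        # Soft thresholding sets small coordinates to zero (sparsity).
        est.append(math.copysign(max(abs(diff) - lam, 0.0), diff))
    return est


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


if __name__ == "__main__":
    rng = random.Random(0)
    beta = [1.5, -1.0, 0.5] + [0.0] * 7
    est = recover_direction(simulate(20000, beta, rng))
    print("estimated direction:", [round(e, 3) for e in est])
    print("cosine with true beta:", round(cosine(est, beta), 3))
```

Only the direction of the coefficient vector is compared, since a single index model with an unspecified link identifies the coefficients at most up to scale. Observations with middling surrogate values are simply discarded, which trades sample size for cleaner pseudo-labels.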
Findings
Effective coefficient recovery demonstrated in simulations.
Finite sample performance bounds established.
Successful application to real EMR data.
Abstract
We consider the recovery of regression coefficients, denoted by $\beta_0$, for a single index model (SIM) relating a binary outcome $Y$ to a set of possibly high dimensional covariates $X$, based on a large but 'unlabeled' dataset $\mathcal{U}$, with $Y$ never observed. On $\mathcal{U}$, we fully observe $X$ and additionally, a surrogate $S$ which, while not being strongly predictive of $Y$ throughout the entirety of its support, can forecast it with high accuracy when it assumes extreme values. Such datasets arise naturally in modern studies involving large databases such as electronic medical records (EMR) where $Y$, unlike $(X, S)$, is difficult and/or expensive to obtain. In EMR studies, an example of $Y$ and $S$ would be the true disease phenotype and the count of the associated diagnostic codes respectively. Assuming another SIM…
Taxonomy
Topics: Machine Learning and Data Classification · Sparse and Compressive Sensing Techniques · Face and Expression Recognition
