Mapping beyond diseases: Controlled variable selection for secondary phenotypes using tilted knockoffs

Qian Zhao; Susan Service; Carrie E. Bearden; Carlos Lopez-Jaramillo; Nelson Freimer; Chiara Sabatti

arXiv:2508.18548 · stat.ME · October 1, 2025

TL;DR

This paper introduces tilted knockoffs, a method that controls the false discovery rate (FDR) when selecting important variables from biased samples such as case-control studies, so that secondary phenotype analyses remain valid.

Contribution

It develops a novel tilted knockoff approach that accounts for biased sampling, enabling reliable variable selection with FDR control in complex biomedical studies.

Findings

01

Tilted knockoffs effectively control FDR in biased sampling scenarios.

02

The method demonstrates good power in simulated examples.

03

Application to genetic data identifies variables associated with secondary phenotypes.

Abstract

Researchers in biomedical studies often work with samples that are not selected uniformly at random from the population of interest, a major example being a case-control study. While these designs are motivated by specific scientific questions, it is often of interest to use the collected data to pursue secondary lines of investigation. In these cases, ignoring the fact that observations are not sampled uniformly at random can lead to spurious results. For example, in a case-control study, one might identify a spurious association between an exposure and a secondary phenotype when both affect the case-control status. This phenomenon is known as collider bias in the causal inference literature. While tests of independence under biased sampling are available, these methods typically do not apply when the number of variables is large. Here, we are interested in using the biased sample…
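The collider-bias example in the abstract can be illustrated with a small simulation. This is only a sketch under assumed settings, not the paper's method or data: the logistic disease model, its coefficients, and the sample size are hypothetical choices made for illustration. An exposure `x` and a secondary phenotype `y` are generated independently, both raise the probability of being a case, and a case-control sample is drawn with equal numbers of cases and controls.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400_000  # hypothetical population size

# Exposure and secondary phenotype, independent in the population.
x = rng.standard_normal(n)
y = rng.standard_normal(n)

# Hypothetical disease model: both x and y raise the odds of being a case.
logit = -6.0 + x + y
d = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

# Case-control design: keep every case, plus an equal number of random controls.
cases = np.flatnonzero(d)
controls = rng.choice(np.flatnonzero(~d), size=cases.size, replace=False)
sample = np.concatenate([cases, controls])

# Compare the x-y correlation in the population and in the biased sample.
pop_corr = np.corrcoef(x, y)[0, 1]
sample_corr = np.corrcoef(x[sample], y[sample])[0, 1]

print(f"population correlation:   {pop_corr:.3f}")
print(f"case-control correlation: {sample_corr:.3f}")
```

In the full population the correlation is close to zero, while in the pooled case-control sample it is clearly positive, because cases are enriched for high values of both variables. A naive analysis of the secondary phenotype in such a sample would report an association that does not exist in the population.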
