Searching for subgroup-specific associations while controlling the false discovery rate
Matteo Sesia, Tianshu Sun

TL;DR
This paper presents a new method for subgroup-specific association discovery in high-dimensional data, controlling false discoveries and leveraging machine learning models without sample splitting.
Contribution
It extends the model-X knockoff filter to enable automated, subgroup-aware hypothesis testing with rigorous FDR control, improving interpretability and efficiency.
Findings
Effective in simulations and real data experiments
Controls false discovery rate in subgroup analyses
Leverages machine learning models for hypothesis generation
Abstract
This paper introduces an innovative method for conducting conditional independence testing in high-dimensional data, facilitating the automated discovery of significant associations within distinct subgroups of a population, all while controlling the false discovery rate. This is achieved by expanding upon the model-X knockoff filter to provide more informative inferences. Our enhanced inferences can help explain sample heterogeneity and uncover interactions, making better use of the capabilities offered by modern machine learning models. Specifically, our method is able to leverage any model for the identification of data-driven hypotheses pertaining to interesting population subgroups. Then, it rigorously tests these hypotheses without succumbing to selection bias. Importantly, our approach is efficient and does not require sample splitting. We demonstrate the effectiveness of our…
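For readers unfamiliar with the model-X knockoff filter that the abstract builds on, the selection step of the standard (non-subgroup) filter can be sketched as follows. This is a minimal illustration of the generic knockoff+ threshold only, not the subgroup-specific procedure proposed in the paper. It assumes the feature statistics `W` have already been computed from the data and their knockoff copies, with large positive values suggesting a real association; the function names `knockoff_threshold` and `knockoff_select` are hypothetical and chosen for this example.

```python
import numpy as np

def knockoff_threshold(W, q=0.1, offset=1):
    """Data-dependent threshold for the knockoff filter.

    offset=1 gives the knockoff+ rule; W holds one statistic per variable
    (assumed precomputed, e.g. a difference of importance scores between
    each variable and its knockoff copy).
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        # Negative statistics serve as a proxy count of false positives.
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold meets the target level: select nothing

def knockoff_select(W, q=0.1):
    """Indices of variables whose statistic reaches the threshold."""
    t = knockoff_threshold(W, q)
    return np.flatnonzero(np.asarray(W, dtype=float) >= t)

# Toy statistics (made up for illustration, not taken from the paper).
W_toy = [5, 4, 3, 2.5, -1, 2, 1.5, 0.5, 3.5, 4.5]
selected = knockoff_select(W_toy, q=0.2)
```

The estimated false discovery proportion uses the count of large negative statistics as a stand-in for the unknown number of false positives, which is what lets the filter control the FDR without p-values. According to the abstract, the paper extends this filter so that the tested hypotheses can refer to data-driven subgroups; that extension is not reproduced here.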
Taxonomy
Topics: Statistical Methods and Inference · Mental Health Research Topics · Statistical Methods and Bayesian Inference
