Controlling FDR in selecting group-level simultaneous signals from multiple data sources with application to the National Covid Collaborative Cohort data
Runqiu Wang, Ran Dai, Hongying Dai, Evan French, Cheng Zheng, on behalf of the N3C consortium

TL;DR
This paper introduces a knockoff-based method for controlling the false discovery rate (FDR) when identifying group-level simultaneous signals across multiple data sources, demonstrated on National COVID Collaborative Cohort (N3C) data.
Contribution
It develops a knockoff-based FDR-controlling algorithm for unions of group-level conditional independence tests that applies to heterogeneous multi-source data and comes with theoretical guarantees.
Findings
Exact FDR control achieved in simulations
Effective identification of true signals in N3C data
Method robust to heterogeneity across sources
Abstract
One challenge in exploratory association studies using observational data is that the associations between the predictors and the outcome are potentially weak and rare, and the candidate predictors have complex correlation structures. False discovery rate (FDR) controlling procedures can provide important statistical guarantees for replicability in predictor identification in exploratory research. In the recently established National COVID Collaborative Cohort (N3C), electronic health record (EHR) data on the same set of candidate predictors are independently collected in multiple different sites, offering opportunities to identify true associations by combining information from different sources. This paper presents a general knockoff-based variable selection algorithm to identify associations from unions of group-level conditional independence tests (simultaneous signals) with exact…
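As a rough illustration of the knockoff machinery the abstract refers to, the sketch below implements the standard knockoff+ selection step (Barber and Candès) on precomputed group-level statistics W_g, where a large positive W_g is evidence that group g is a signal and null groups have symmetric signs. It is not the authors' algorithm: generating knockoffs, computing W_g, and combining evidence across sites into simultaneous signals are all omitted, and the function names are hypothetical.

```python
from typing import List, Sequence


def knockoff_plus_threshold(W: Sequence[float], q: float) -> float:
    """Knockoff+ threshold: the smallest t among the nonzero |W_g| whose
    estimated FDP, (1 + #{W_g <= -t}) / max(1, #{W_g >= t}), is at most q.
    Returns infinity (select nothing) if no such t exists."""
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        neg = sum(1 for w in W if w <= -t)  # proxies for false discoveries
        pos = sum(1 for w in W if w >= t)   # groups that would be selected
        if (1 + neg) / max(1, pos) <= q:
            return t
    return float("inf")


def knockoff_select(W: Sequence[float], q: float) -> List[int]:
    """Indices of groups whose statistic clears the knockoff+ threshold."""
    t = knockoff_plus_threshold(W, q)
    return [g for g, w in enumerate(W) if w >= t]


# Hypothetical group-level statistics for four candidate predictor groups.
print(knockoff_select([3.0, 2.0, -1.0, 0.5], q=0.5))
```

The "+1" in the numerator is what turns the usual knockoff estimate into a procedure with finite-sample FDR control at level q, at the cost of selecting nothing when too few groups have large positive statistics.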
Taxonomy
Topics: Mental Health Research Topics · Advanced Causal Inference Techniques · Health, Environment, Cognitive Aging
