Synthetic-Powered Multiple Testing with FDR Control
Yonghoon Lee, Meshi Bashari, Edgar Dobriban, Yaniv Romano

TL;DR
SynthBH is a multiple testing procedure that leverages synthetic data to improve power while guaranteeing FDR control, with applications including genomics, drug screening, and outlier detection.
Contribution
The paper introduces SynthBH, a multiple testing procedure that adapts to the unknown quality of the synthetic data and provides finite-sample, distribution-free FDR control without requiring the pooled-data p-values to be valid under the null.
Findings
SynthBH controls FDR in finite samples under a mild PRDS-type positive dependence condition.
It improves power when synthetic data quality is high.
Empirical results show effectiveness on real and simulated data.
Abstract
Multiple hypothesis testing with false discovery rate (FDR) control is a fundamental problem in statistical inference, with broad applications in genomics, drug screening, and outlier detection. In many such settings, researchers may have access not only to real experimental observations but also to auxiliary or synthetic data -- from past, related experiments or generated by generative models -- that can provide additional evidence about the hypotheses of interest. We introduce SynthBH, a synthetic-powered multiple testing procedure that safely leverages such synthetic data. We prove that SynthBH guarantees finite-sample, distribution-free FDR control under a mild PRDS-type positive dependence condition, without requiring the pooled-data p-values to be valid under the null. The proposed method adapts to the (unknown) quality of the synthetic data: it enhances the sample efficiency and…
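For readers unfamiliar with FDR control, the classical Benjamini-Hochberg (BH) step-up procedure is the standard baseline for this problem. The sketch below is only that baseline, applied to p-values assumed to be valid. It is not the SynthBH procedure, whose details the abstract does not give. The function name `benjamini_hochberg` and the default level `alpha=0.1` are illustrative choices, not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.1):
    """Classical BH step-up procedure (baseline sketch, not SynthBH).

    Returns a list of booleans, one per hypothesis, where True means
    the hypothesis is rejected at target FDR level alpha.
    """
    m = len(p_values)
    # Indices of the hypotheses, sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank

    # Reject the k hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    print(benjamini_hochberg([0.04, 0.045, 0.9], alpha=0.1))
```

Because the procedure is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank threshold, as long as a larger-ranked p-value falls below its threshold.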
Taxonomy
Topics: Statistical Methods in Clinical Trials · Advanced Causal Inference Techniques · Statistical Methods and Inference
