Semi-supervised multiple testing
David Mary, Etienne Roquain

TL;DR
This paper develops a null-distribution-free, semi-supervised multiple testing method that controls the false discovery rate (FDR) using a null training sample. It provides theoretical FDR bounds and a power analysis, and demonstrates practical applications.
Contribution
It introduces a semi-supervised approach to multiple testing that does not require a known null distribution, providing theoretical bounds and demonstrating practical relevance.
Findings
Upper and lower bounds on the FDR of the BH procedure based on empirical p-values
Power analysis shows that the cost of not knowing the null distribution is low when the null training sample is large
Negative result indicating the optimality of the empirical BH method in the semi-supervised setting
Abstract
An important limitation of standard multiple testing procedures is that the null distribution should be known. Here, we consider a null distribution-free approach for multiple testing in the following semi-supervised setting: the user does not know the null distribution, but has at hand a sample drawn from this null distribution. In practical situations, this null training sample (NTS) can come from previous experiments, from a part of the data under test, from specific simulations, or from a sampling process. In this work, we present theoretical results that handle such a framework, with a focus on the false discovery rate (FDR) control and the Benjamini-Hochberg (BH) procedure. First, we provide upper and lower bounds for the FDR of the BH procedure based on empirical $p$-values. These bounds match when $\alpha(n+1)/m$ is an integer, where $n$ is the NTS sample size and $m$ is the…
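The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes that larger test statistics are more extreme, and it uses the common empirical p-value (1 + number of null-sample values at least as large as the test statistic) / (n + 1), which is an assumption about the exact definition. The function names `empirical_p_values` and `benjamini_hochberg` are hypothetical.

```python
import numpy as np


def empirical_p_values(null_sample, test_stats):
    """Empirical p-values computed from a null training sample (NTS).

    Assumed definition: p_i = (1 + #{j : Y_j >= X_i}) / (n + 1),
    where Y is the NTS of size n and X_i is the i-th test statistic.
    """
    null_sorted = np.sort(np.asarray(null_sample, dtype=float))
    test_stats = np.asarray(test_stats, dtype=float)
    n = null_sorted.size
    # searchsorted(side="left") gives the count of null values strictly
    # below each test statistic, so n minus that is the count >= it.
    n_geq = n - np.searchsorted(null_sorted, test_stats, side="left")
    return (1.0 + n_geq) / (n + 1.0)


def benjamini_hochberg(p_values, alpha):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest rank k whose sorted
        # p-value falls under its threshold alpha * k / m.
        k = np.max(np.nonzero(below)[0])
        rejected[order[: k + 1]] = True
    return rejected
```

Calling `benjamini_hochberg(empirical_p_values(nts, stats), alpha)` chains the two steps. Under this definition the smallest attainable p-value is 1/(n + 1), so a small NTS can make rejections impossible regardless of how extreme the test statistics are.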
Taxonomy
Topics: Statistical Methods in Clinical Trials
