Semi-supervised multiple testing

David Mary; Etienne Roquain

arXiv:2106.13501 · math.ST · December 8, 2022 · 1 citation

TL;DR

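This paper develops a null distribution-free approach to multiple testing in a semi-supervised setting, where the null distribution is unknown but a null training sample is available. It analyzes the false discovery rate of the Benjamini-Hochberg procedure applied to empirical p-values, with theoretical bounds, a power analysis, and practical applications.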

Contribution

It introduces a semi-supervised approach to multiple testing that does not require the null distribution to be known, provides theoretical bounds, and demonstrates its practical relevance.

Findings

01

Upper and lower bounds for the FDR of the BH procedure based on empirical p-values

02

A power analysis showing that the cost of ignoring the null distribution is low when the null training sample is large

03

A negative result indicating that the empirical BH method is optimal in the semi-supervised setting
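Finding 01 concerns the BH procedure applied to empirical p-values. The following is a minimal NumPy sketch of that pipeline, assuming the common definition $p_i = (1 + \#\{j : Y_j \ge X_i\})/(n+1)$, where a test score $X_i$ is compared with the $n$ null training scores $Y_1, \dots, Y_n$ and larger scores are taken to be more significant. The function names `empirical_pvalues` and `benjamini_hochberg` and the Gaussian toy data are illustrative assumptions, not the authors' code.

```python
import numpy as np

def empirical_pvalues(test_scores, null_sample):
    """Empirical p-values of test scores against a null training sample (NTS).

    Assumed definition: p_i = (1 + #{j : Y_j >= X_i}) / (n + 1),
    with larger scores treated as more significant.
    """
    null_sorted = np.sort(np.asarray(null_sample, dtype=float))
    n = null_sorted.size
    x = np.asarray(test_scores, dtype=float)
    # side="left" gives the index of the first null value >= x_i,
    # so n minus that index counts the null values >= x_i.
    n_ge = n - np.searchsorted(null_sorted, x, side="left")
    return (1.0 + n_ge) / (n + 1.0)

def benjamini_hochberg(pvalues, alpha):
    """Sorted indices rejected by the BH step-up procedure at level alpha."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p, kind="stable")
    thresholds = alpha * np.arange(1, m + 1) / m
    below = np.nonzero(p[order] <= thresholds)[0]
    if below.size == 0:
        return np.array([], dtype=int)
    k = below[-1] + 1  # largest k with p_(k) <= alpha * k / m
    return np.sort(order[:k])

# Hypothetical toy data: standard Gaussian nulls, mean-shifted alternatives.
rng = np.random.default_rng(0)
null_sample = rng.standard_normal(1000)           # NTS of size n = 1000
test_scores = np.concatenate([
    rng.standard_normal(90),                      # 90 true nulls
    rng.standard_normal(10) + 4.0,                # 10 alternatives
])
p = empirical_pvalues(test_scores, null_sample)
rejected = benjamini_hochberg(p, alpha=0.1)
```

BH is a step-up procedure: it rejects the $k$ smallest p-values for the largest $k$ with $p_{(k)} \le \alpha k / m$, so a p-value lying above its own threshold can still be rejected if a larger p-value passes. Note also that an empirical p-value can never fall below $1/(n+1)$, which is one reason the NTS size matters for power.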

Abstract

An important limitation of standard multiple testing procedures is that the null distribution should be known. Here, we consider a null distribution-free approach for multiple testing in the following semi-supervised setting: the user does not know the null distribution, but has at hand a sample drawn from this null distribution. In practical situations, this null training sample (NTS) can come from previous experiments, from a part of the data under test, from specific simulations, or from a sampling process. In this work, we present theoretical results that handle such a framework, with a focus on the false discovery rate (FDR) control and the Benjamini-Hochberg (BH) procedure. First, we provide upper and lower bounds for the FDR of the BH procedure based on empirical $p$-values. These bounds match when $\alpha(n+1)/m$ is an integer, where $n$ is the NTS sample size and $m$ is the…
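The abstract states that the upper and lower FDR bounds coincide when $\alpha(n+1)/m$ is an integer. The sketch below checks that condition with exact rational arithmetic, so that a level such as $\alpha = 0.1$ is treated as exactly $1/10$ rather than as a rounded binary float. The helper name `bh_bounds_match` is hypothetical.

```python
from fractions import Fraction

def bh_bounds_match(alpha, n, m):
    """Return True when alpha * (n + 1) / m is an integer.

    Hypothetical helper: alpha is converted through its decimal string,
    so 0.1 becomes exactly Fraction(1, 10).
    """
    ratio = Fraction(str(alpha)) * (n + 1) / m
    return ratio.denominator == 1

print(bh_bounds_match(0.1, 999, 100))   # True: 0.1 * 1000 / 100 = 1
print(bh_bounds_match(0.1, 1000, 100))  # False: 0.1 * 1001 / 100 = 1001/1000
```

For a fixed level and number of tests, this suggests choosing the NTS size $n$ so that $n + 1$ is a multiple of $m/\alpha$ whenever that quantity is an integer.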

Taxonomy

Topics: Statistical Methods in Clinical Trials