A more practical approach for the Benjamini-Hochberg FDR controlling procedure for huge-scale testing problems

Vered Madar; Sandra Batista

arXiv:1501.05225·stat.ME·January 22, 2015

TL;DR

This paper introduces a linear-time, memory-efficient algorithm for controlling the false discovery rate in large-scale hypothesis testing, enabling practical analysis of massive datasets without p-value ordering.

Contribution

It presents a novel algorithm that divides huge testing problems into manageable chunks, ensuring accurate FDR control with reduced computational and memory requirements.

Findings

01 The algorithm runs in linear time.

02 No p-value sorting is needed, which simplifies large-scale testing.

03 FDR control is maintained when the tests are divided into separate sets.

Abstract

We address a common problem in large-scale data analysis, especially in the field of genetics: the huge-scale testing problem, where millions to billions of hypotheses are tested together, making multiple hypothesis testing procedures computationally challenging. As a solution we propose an alternative algorithm to the widely used Linear Step Up procedure of Benjamini and Hochberg (1995). Our algorithm requires linear time and does not require any p-value ordering. It permits separating huge-scale testing problems arbitrarily into computationally feasible sets or chunks. Results from the chunks are combined by our algorithm to produce the same results as the controlling procedure on the entire set of tests, thus controlling the global false discovery rate even when p-values are arbitrarily divided. The practical memory usage may also be determined arbitrarily by the size of…
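To make the idea of a sort-free, chunk-combinable step-up procedure concrete, here is a minimal Python sketch. It is not the authors' algorithm, whose steps the abstract does not spell out. It is a simple counting variant that assumes the total number of tests `m` and the FDR level `q` are known in advance. It relies on the fact that the Benjamini-Hochberg procedure rejects the `k` smallest p-values, where `k` is the largest `i` with at least `i` p-values at or below `i*q/m`. The function name `bh_from_chunks` and the example data are hypothetical.

```python
import math
from typing import Iterable, List, Tuple


def bh_from_chunks(chunks: Iterable[List[float]], m: int, q: float) -> Tuple[int, float]:
    """Return (k, threshold): the number of BH rejections and the cutoff k*q/m.

    Illustrative sketch only. Each p-value is visited once and never sorted;
    chunks can arrive in any order because bin counts simply add up.
    """
    # counts[i] = number of p-values falling in the interval ((i-1)*q/m, i*q/m]
    counts = [0] * (m + 1)
    for chunk in chunks:
        for p in chunk:
            i = max(1, math.ceil(p * m / q))  # smallest i with p <= i*q/m
            if i <= m:  # p-values above q can never be rejected
                counts[i] += 1

    k = 0
    running = 0  # running = number of p-values <= i*q/m
    for i in range(1, m + 1):
        running += counts[i]
        if running >= i:  # equivalent to p_(i) <= i*q/m
            k = i
    return k, k * q / m


# Hypothetical example: 10 p-values, FDR level q = 0.1, split two different ways.
pvals = [0.001, 0.012, 0.9, 0.025, 0.5, 0.033, 0.7, 0.3, 0.048, 0.6]
k_whole, thr_whole = bh_from_chunks([pvals], m=10, q=0.1)
k_split, thr_split = bh_from_chunks([pvals[:3], pvals[3:7], pvals[7:]], m=10, q=0.1)
```

Both calls give the same `k`, since only per-bin counts are combined across chunks. This sketch keeps a counter array of length `m + 1`, so it does not reproduce the memory behavior described in the abstract, and p-values lying exactly on a bin boundary may be affected by floating-point rounding.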

Taxonomy

Topics: VLSI and Analog Circuit Testing · Statistical Methods in Clinical Trials · Optimal Experimental Design Methods