Statistical tests for the intersection of independent lists of genes: Sensitivity, FDR, and type I error control
Loki Natarajan, Minya Pu, Karen Messer

TL;DR
This paper develops a formal statistical framework for testing the intersection of gene lists from multiple genomic studies, controlling the false discovery rate and type I error while optimizing sensitivity.
Contribution
It introduces a method for computing p-values for list-intersection tests via a closed-form Poisson approximation, along with practical guidelines for study design.
Findings
Provides a closed-form p-value for gene list intersection significance.
Analyzes the trade-off between FDR control and sensitivity.
Demonstrates application on prostate cancer gene-expression data.
Abstract
Public data repositories have enabled researchers to compare results across multiple genomic studies in order to replicate findings. A common approach is to first rank genes according to a hypothesis of interest within each study. Then, lists of the top-ranked genes within each study are compared across studies. Genes recaptured as highly ranked (usually above some threshold) in multiple studies are considered to be significant. However, this comparison strategy often remains informal, in that type I error and false discovery rate (FDR) are usually uncontrolled. In this paper, we formalize an inferential strategy for this kind of list-intersection discovery test. We show how to compute a p-value associated with a "recaptured" set of genes, using a closed-form Poisson approximation to the distribution of the size of the recaptured set. We investigate operating characteristics of the…
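A minimal sketch of the kind of calculation the abstract describes, under the simplest null model we can assume: each study's top-n_i list is an independent, uniformly random subset of the G genes. A given gene then lands in all k lists with probability equal to the product of n_i/G, so the recaptured-set size is approximately Poisson with mean lambda = G times that product, and the p-value is the Poisson upper tail P(X >= x_observed). The function names, this null model, and the plug-in FDR estimate lambda/x (expected chance recaptures over observed recaptures) are illustrative assumptions, not necessarily the authors' exact formulation.

```python
import math


def expected_overlap(num_genes, list_sizes):
    """Expected size of the recaptured set under the ASSUMED null that each
    study's top list is an independent, uniformly random subset of genes."""
    prob_in_all = 1.0
    for n in list_sizes:
        prob_in_all *= n / num_genes  # chance a fixed gene is in this list
    return num_genes * prob_in_all


def poisson_upper_tail(x, lam):
    """P(X >= x) for X ~ Poisson(lam), summed directly in log space so that
    very small p-values are not lost to 1 - cdf cancellation."""
    if x <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # Stop at least 40 standard deviations above the mean (and above x);
    # the remaining terms are negligible in double precision.
    upper = int(max(x, lam) + 40 * math.sqrt(lam) + 50)
    log_lam = math.log(lam)
    terms = (
        math.exp(-lam + j * log_lam - math.lgamma(j + 1))
        for j in range(x, upper + 1)
    )
    return min(1.0, math.fsum(terms))


def intersection_p_value(num_genes, list_sizes, observed):
    """Poisson-approximation p-value for observing `observed` recaptured genes."""
    lam = expected_overlap(num_genes, list_sizes)
    return poisson_upper_tail(observed, lam)


def estimated_fdr(num_genes, list_sizes, observed):
    """Hypothetical plug-in FDR: expected chance recaptures / observed recaptures."""
    if observed <= 0:
        return 0.0
    return min(1.0, expected_overlap(num_genes, list_sizes) / observed)


if __name__ == "__main__":
    sizes = [500, 500, 500]  # top-500 genes from each of three studies
    print(expected_overlap(10000, sizes))
    print(intersection_p_value(10000, sizes, 6))
    print(estimated_fdr(10000, sizes, 6))
```

For example, with 10,000 genes and three studies each contributing its top 500 genes, lambda = 1.25, so observing 6 recaptured genes gives a p-value of roughly 0.0018 under this assumed null. Summing the upper tail directly, rather than computing one minus the CDF, keeps tiny p-values accurate when many genes are recaptured.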
