Empirical Bayes estimation of posterior probabilities of enrichment
Zhenyu Yang, Zuojing Li, David R. Bickel

TL;DR
This paper compares different estimators for the local false discovery rate in gene enrichment analysis, recommending specific methods based on the number of categories to improve biological interpretation.
Contribution
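It identifies and evaluates three estimators of the local false discovery rate (LFDR): a semiparametric estimator (SPE), a normalized maximum likelihood estimator (NMLE), and a maximum likelihood estimator (MLE). It gives practical guidance on which to use in gene enrichment studies.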
Findings
MLE performs at least as well as SPE with on the order of 100 categories, even when the ideal number of mixture components is unknown
SPE is more reliable with around 10 categories
NMLE is suitable for very few categories (~1)
Abstract
To interpret differentially expressed genes or other discovered features, researchers conduct hypothesis tests to determine which biological categories, such as those of the Gene Ontology (GO), are enriched in the sense of having differential representation among the discovered features. We study the application of better estimators of the local false discovery rate (LFDR), a probability that the biological category has equivalent representation among the preselected features. We identified three promising estimators of the LFDR for detecting differential representation: a semiparametric estimator (SPE), a normalized maximum likelihood estimator (NMLE), and a maximum likelihood estimator (MLE). We found that the MLE performs at least as well as the SPE for on the order of 100 GO categories, even when the ideal number of components in its underlying mixture model is unknown. However, the…
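The LFDR described in the abstract is a posterior probability that a category is not enriched. The sketch below illustrates that idea with the generic two-group mixture, LFDR(z) = π0·f0(z) / (π0·f0(z) + (1 − π0)·f1(z)). It is not the paper's SPE, NMLE, or MLE. The normal densities, the fixed null proportion `pi0`, the alternative mean `alt_mean`, and the category names and z-scores are all assumptions chosen for illustration, whereas an empirical Bayes method would estimate such quantities from the data.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def lfdr(z, pi0, alt_mean, alt_sd=1.0):
    """Posterior probability of the null under a two-group mixture.

    Assumed model (illustrative only): the null density is N(0, 1) and
    the alternative density is N(alt_mean, alt_sd**2). pi0 is the prior
    probability that a category has equivalent representation.
    """
    f0 = normal_pdf(z)
    f1 = normal_pdf(z, alt_mean, alt_sd)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)


# Hypothetical test statistics for three categories.
z_scores = {"category_A": 0.2, "category_B": 2.0, "category_C": 4.0}

for name, z in z_scores.items():
    print(name, round(lfdr(z, pi0=0.9, alt_mean=3.0), 3))
```

A small LFDR means the category is probably enriched, and a value near 1 means it is probably not. With these assumed parameters the LFDR falls as the statistic moves toward the alternative mean.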
