ProbCD: enrichment analysis accounting for categorization uncertainty
Ricardo Vêncio, Ilya Shmulevich

TL;DR
ProbCD is an open-source R package that advances enrichment analysis by incorporating probabilistic annotations and uncertainty, moving beyond traditional static contingency tables and Fisher's Exact Test.
Contribution
It introduces a novel method for probabilistic categorical data analysis in enrichment studies, including an accessible online interface for broader use.
Findings
Enables analysis with probabilistic gene annotations.
Accounts for uncertainty in high-throughput data.
Provides a flexible framework for enrichment analysis.
Abstract
As in many other areas of science, systems biology makes extensive use of statistical association and significance estimates in contingency tables, a type of categorical data analysis known in this field as enrichment (also over-representation or enhancement) analysis. Despite efforts to create probabilistic annotations, especially in the Gene Ontology context, or to deal with uncertainty in high-throughput datasets, current enrichment methods largely ignore this probabilistic information because they are mainly based on variants of Fisher's Exact Test. We developed an open-source R package, ProbCD, for probabilistic categorical data analysis that does not require a static contingency table. The contingency table for the enrichment problem is built using the expectation of a Bernoulli Scheme stochastic process given the categorization probabilities. An on-line…
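To make the idea of an expected contingency table concrete, the following minimal Python sketch builds a 2x2 table from per-gene membership probabilities instead of hard 0/1 labels. It is an illustration only, not ProbCD's code or API: the function name `expected_contingency_table`, the argument names, and the assumption that a gene's category membership and its membership in the selected list are independent are all introduced here for the example.

```python
from typing import Sequence, Tuple

Table = Tuple[Tuple[float, float], Tuple[float, float]]


def expected_contingency_table(
    p_category: Sequence[float],
    p_selected: Sequence[float],
) -> Table:
    """Expected 2x2 table from per-gene membership probabilities.

    p_category[i]: probability that gene i belongs to the category
                   (hypothetical input, e.g. a probabilistic annotation).
    p_selected[i]: probability that gene i belongs to the gene list of
                   interest (hypothetical input).

    Assumption made for this sketch: the two memberships of each gene are
    treated as independent Bernoulli variables, so each expected cell count
    is a sum of products of probabilities.
    """
    if len(p_category) != len(p_selected):
        raise ValueError("both inputs must have one probability per gene")

    n11 = n12 = n21 = n22 = 0.0
    for p, q in zip(p_category, p_selected):
        if not (0.0 <= p <= 1.0 and 0.0 <= q <= 1.0):
            raise ValueError("probabilities must lie in [0, 1]")
        n11 += p * q                  # in category, selected
        n12 += p * (1.0 - q)          # in category, not selected
        n21 += (1.0 - p) * q          # not in category, selected
        n22 += (1.0 - p) * (1.0 - q)  # not in category, not selected
    return ((n11, n12), (n21, n22))


if __name__ == "__main__":
    # Three genes with uncertain annotation and uncertain selection.
    table = expected_contingency_table([0.9, 0.2, 0.5], [0.8, 0.1, 1.0])
    for row in table:
        print(row)
```

When every probability is exactly 0 or 1, the four sums reduce to ordinary counts, so the sketch recovers the static contingency table used by classical enrichment tests. The cell values always add up to the number of genes, but they are generally not integers.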
