Robust and accurate data enrichment statistics via distribution function of sum of weights
Aleksandar Stojmirović, Yi-Kuo Yu

TL;DR
SaddleSum is a novel, statistically rigorous method for term enrichment analysis. It accurately assesses significance for both small and large sets of entities by computing the distribution function of the sum of entity weights.
Contribution
The paper introduces SaddleSum, a universal enrichment analysis method that overcomes limitations of existing approaches by applying the Lugannani-Rice formula, an asymptotic (saddlepoint) approximation to the tail of the distribution of a sum of weights.
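For reference, the Lugannani-Rice tail approximation can be written out for a single term. The notation is illustrative and assumed rather than taken from this summary: w_1 to w_N are the weights of all N entities, and the score S of a term annotating m entities is modeled as the sum of m independent draws from the empirical distribution of those weights. K is the cumulant generating function of one draw, \hat t is the saddlepoint, and \Phi and \phi are the standard normal CDF and density.

```latex
\begin{aligned}
K(t) &= \ln\Big(\tfrac{1}{N}\textstyle\sum_{i=1}^{N} e^{t w_i}\Big), && K'(\hat t) = s/m,\\
\hat w &= \operatorname{sgn}(\hat t)\sqrt{2\,[\hat t s - m K(\hat t)]}, && \hat u = \hat t\sqrt{m K''(\hat t)},\\
P(S \ge s) &\approx 1 - \Phi(\hat w) + \phi(\hat w)\Big(\frac{1}{\hat u} - \frac{1}{\hat w}\Big).
\end{aligned}
```

Only K and its first two derivatives are needed, so under this model the tail probability is obtained without simulation. When s equals m times the mean weight, both \hat w and \hat u vanish and the expression must be read as a limit.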
Findings
SaddleSum provides stable significance scores regardless of the number of entities.
The method accurately assesses significance both for terms annotating few entities and for terms annotating many entities.
SaddleSum outperforms existing methods in stability and accuracy.
Abstract
Term enrichment analysis facilitates biological interpretation by assigning to experimentally or computationally obtained data the annotations associated with terms from controlled vocabularies. This process usually involves computing the statistical significance of each vocabulary term and using the most significant terms to describe a given set of biological entities, which are often associated with weights. Many existing enrichment methods require selecting an (arbitrary) number of the most significant entities and/or do not account for entity weights. Others either require extensive simulations to obtain statistics or assume a normal weight distribution. In addition, most methods have difficulty assigning correct statistical significance to terms with few entities. Implementing the well-known Lugannani-Rice formula, we have developed a novel approach, called SaddleSum, that is free from all the…
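The abstract contrasts a saddlepoint approach with methods that assume normally distributed weights. The Python sketch below illustrates that contrast under an assumed null model: a term's score S is the sum of m independent draws from the empirical distribution of all entity weights. All function names are hypothetical, and this is not the authors' implementation; it computes a Lugannani-Rice tail probability next to the plain normal approximation.

```python
import math

# Hypothetical helpers. Assumed null model: S = sum of m i.i.d. draws
# from the empirical distribution of `weights`.


def empirical_cgf(weights, t):
    """Return K(t), K'(t), K''(t) for one draw from the empirical distribution."""
    # Subtract the largest exponent before exponentiating to avoid overflow.
    shift = max(t * w for w in weights)
    e = [math.exp(t * w - shift) for w in weights]
    z = sum(e)
    k = shift + math.log(z / len(weights))
    mean = sum(w * ei for w, ei in zip(weights, e)) / z
    var = sum((w - mean) ** 2 * ei for w, ei in zip(weights, e)) / z
    return k, mean, var


def solve_saddlepoint(weights, x, iterations=200):
    """Find t with K'(t) = x by bisection; K' is increasing in t."""
    lo, hi = -1.0, 1.0
    while empirical_cgf(weights, lo)[1] > x:  # widen bracket to the left
        lo *= 2.0
    while empirical_cgf(weights, hi)[1] < x:  # widen bracket to the right
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if empirical_cgf(weights, mid)[1] < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def normal_pvalue(weights, m, s):
    """Normal approximation to P(S >= s), shown for comparison."""
    n = len(weights)
    mu = sum(weights) / n
    var = sum((w - mu) ** 2 for w in weights) / n
    z = (s - m * mu) / math.sqrt(m * var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def lugannani_rice_pvalue(weights, m, s):
    """Continuous Lugannani-Rice approximation to P(S >= s)."""
    x = s / m
    if not min(weights) < x < max(weights):
        raise ValueError("s/m must lie strictly between the smallest and largest weight")
    t = solve_saddlepoint(weights, x)
    if abs(t) < 1e-6:
        # The formula is 0/0 at the mean; this sketch falls back to the normal approximation.
        return normal_pvalue(weights, m, s)
    k, _, k2 = empirical_cgf(weights, t)
    w_hat = math.copysign(math.sqrt(max(2.0 * (t * s - m * k), 0.0)), t)
    u_hat = t * math.sqrt(m * k2)
    phi = math.exp(-0.5 * w_hat * w_hat) / math.sqrt(2.0 * math.pi)
    return 0.5 * math.erfc(w_hat / math.sqrt(2.0)) + phi * (1.0 / u_hat - 1.0 / w_hat)


if __name__ == "__main__":
    weights = [0.0, 1.0, 2.0, 3.0, 4.0]
    print(lugannani_rice_pvalue(weights, 10, 30.0), normal_pvalue(weights, 10, 30.0))
```

The bisection solver trades speed for robustness, since K' is monotone and the bracket always contains the saddlepoint when s/m lies inside the range of the weights. Because the example weights are integers, a lattice-corrected version of the saddlepoint formula would be more accurate there; the sketch uses the continuous form throughout.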
