Mutual Enrichment in Ranked Lists and the Statistical Assessment of Position Weight Matrix Motifs
Limor Leibovich, Zohar Yakhini

TL;DR
This paper introduces a statistical framework for assessing the enrichment of position weight matrix (PWM) motifs in ranked lists, such as the ranked lists of genomic sequences produced by ChIP-seq. Existing methods for ranked lists handle only fixed motifs.
Contribution
It develops tight upper bounds on the tail distribution of the size of the intersection of the tops of two uniformly and independently drawn permutations, and implements a software tool, mmHG-Finder, for PWM enrichment analysis.
Findings
Effective bounds on permutation tail distributions
Successful application to biological datasets
Enhanced PWM motif detection capabilities
Abstract
Statistics in ranked lists is important in analyzing molecular biology measurement data, such as ChIP-seq, which yields ranked lists of genomic sequences. State-of-the-art methods study fixed motifs in ranked lists. More flexible models, such as position weight matrix (PWM) motifs, are not addressed in this context. To assess the enrichment of a PWM motif in a ranked list, we use a PWM-induced second ranking on the same set of elements. Possible orders of one ranked list relative to the other are modeled by permutations. Due to sample space complexity, it is difficult to characterize tail distributions in the group of permutations. In this paper we develop tight upper bounds on tail distributions of the size of the intersection of the top of two uniformly and independently drawn permutations and demonstrate advantages of this approach using our software implementation, mmHG-Finder, to…
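To make the quantity in the abstract concrete, the following minimal Python sketch computes the size of the intersection between the top of two rankings of the same elements, together with its tail probability at fixed cutoffs. It is an illustration only, not the paper's bounds and not the mmHG-Finder implementation. The function names, the cutoffs `n1` and `n2`, and the toy sequence labels are all assumptions made for this example. For fixed cutoffs, the intersection size under two uniformly and independently drawn permutations follows a hypergeometric distribution; the harder problem the paper addresses arises when the cutoffs are not fixed in advance.

```python
from math import comb


def top_intersection_size(rank_a, rank_b, n1, n2):
    """Number of elements shared by the top n1 of rank_a and the top n2 of rank_b."""
    return len(set(rank_a[:n1]) & set(rank_b[:n2]))


def hypergeom_tail(N, n1, n2, b):
    """P(X >= b), where X is the size of the intersection of the top n1 of one
    ranking and the top n2 of another, when both rankings of the same N elements
    are uniformly random and independent. Cutoffs n1 and n2 are fixed."""
    total = comb(N, n2)
    hits = sum(
        comb(n1, k) * comb(N - n1, n2 - k)
        for k in range(max(b, 0), min(n1, n2) + 1)
    )
    return hits / total


# Hypothetical toy data: six sequences ranked by a measurement and by a PWM score.
by_measurement = ["s1", "s2", "s3", "s4", "s5", "s6"]
by_pwm_score = ["s2", "s1", "s6", "s3", "s5", "s4"]

b = top_intersection_size(by_measurement, by_pwm_score, 3, 3)
p = hypergeom_tail(len(by_measurement), 3, 3, b)
print(b, p)  # prints: 2 0.5
```

Scanning over all pairs of cutoffs and keeping the smallest tail probability gives a statistic that is no longer hypergeometric, because the minimum is taken over many dependent tests. That is why bounds on the tail distribution are needed.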
Taxonomy
Topics: Gene expression and cancer classification · Genomics and Chromatin Dynamics · Algorithms and Data Compression
