Expected Density of Random Minimizers
Shay Golan, Arseny M. Shur

TL;DR
This paper analyzes the expected density of random minimizers, a sampling primitive for biological strings. It gives formulas and algorithms for computing the density efficiently and describes how the density behaves as the parameters k and w vary.
Contribution
The paper introduces new formulas and algorithms for computing the expected density of random minimizers efficiently, and studies the theoretical properties of that density.
Findings
Expected density is close to 2/(w+1) unless w is much larger than k
New algorithms compute the density in O(kσ^{k+w}) time in general and in O(w log w) time when w ≤ k
For w ≤ k, the density is slightly less than 2/(w+1) unless w is small
Abstract
Minimizer schemes, or just minimizers, are a very important computational primitive in sampling and sketching biological strings. Assuming a fixed alphabet of size σ, a minimizer is defined by two integers k, w ≥ 2 and a total order ρ on strings of length k (also called k-mers). A string is processed by a sliding window algorithm that chooses, in each window of length w+k−1, its minimal k-mer with respect to ρ. A key characteristic of the minimizer is the expected density of chosen k-mers among all k-mers in a random infinite σ-ary string. Random minimizers, in which the order ρ is chosen uniformly at random, are often used in applications. However, little is known about their expected density DR_σ(k,w) besides the fact that it is close to 2/(w+1) unless w ≫ k. We first show that DR_σ(k,w) can be…
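The sliding-window definition in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's algorithm; it is a brute-force estimate under stated assumptions: a window covers w consecutive k-mers (w + k - 1 characters), ties between equal k-mers are broken by the leftmost position, and the random order is obtained by shuffling all σ^k k-mers. The names `minimizer_positions` and `estimate_density` are hypothetical.

```python
import itertools
import random


def minimizer_positions(s, k, w, order):
    """Return start positions of the k-mers chosen by a minimizer.

    `order` maps each k-mer to its rank; lower rank means smaller.
    Assumption: ties are broken by the leftmost position.
    """
    n_kmers = len(s) - k + 1
    picked = set()
    for start in range(n_kmers - w + 1):
        # One window = w consecutive k-mers = w + k - 1 characters.
        best = min(
            range(start, start + w),
            key=lambda i: (order[s[i:i + k]], i),
        )
        picked.add(best)
    return picked


def estimate_density(sigma, k, w, length, seed=0):
    """Empirical density of one random minimizer on one random string."""
    rng = random.Random(seed)
    alphabet = [chr(ord("a") + i) for i in range(sigma)]

    # A uniformly random total order on all sigma**k k-mers.
    kmers = ["".join(p) for p in itertools.product(alphabet, repeat=k)]
    rng.shuffle(kmers)
    order = {kmer: rank for rank, kmer in enumerate(kmers)}

    # A random sigma-ary string standing in for the infinite one.
    s = "".join(rng.choice(alphabet) for _ in range(length))

    picked = minimizer_positions(s, k, w, order)
    return len(picked) / (len(s) - k + 1)


if __name__ == "__main__":
    sigma, k, w = 4, 5, 5
    d = estimate_density(sigma, k, w, length=20000)
    print(f"estimated density: {d:.4f}, 2/(w+1) = {2 / (w + 1):.4f}")
```

For parameters such as σ = 4 and k = w = 5, the estimate should land near 2/(w+1), though a single random order and a finite string only approximate the expectation the paper studies. The enumeration of all σ^k k-mers limits this sketch to small k.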
Taxonomy
Topics: Bayesian Methods and Mixture Models
