Expected Density of Random Minimizers

Shay Golan; Arseny M. Shur

arXiv:2410.16968 · math.CO · November 27, 2024 · FSE · 2 cites

Open Access

TL;DR

This paper analyzes the expected density of random minimizers, a primitive for sampling and sketching biological strings. It gives formulas and algorithms for computing this density efficiently and studies how the density behaves as the parameters k and w vary.

Contribution

The paper introduces new formulas and algorithms for efficiently computing the expected density of random minimizers and explores the theoretical properties of that density.

Findings

01

Expected density is close to 2/(w+1) unless w ≫ k

02

New algorithms for computing density in O(kσ^{k+w}) and O(w log w) time

03

Density is slightly less than 2/(w+1) unless w is small

Abstract

Minimizer schemes, or just minimizers, are a very important computational primitive in sampling and sketching biological strings. Assuming a fixed alphabet of size $\sigma$, a minimizer is defined by two integers $k, w \geq 2$ and a total order $\rho$ on strings of length $k$ (also called $k$-mers). A string is processed by a sliding window algorithm that chooses, in each window of length $w + k - 1$, its minimal $k$-mer with respect to $\rho$. A key characteristic of the minimizer is the expected density of chosen $k$-mers among all $k$-mers in a random infinite $\sigma$-ary string. Random minimizers, in which the order $\rho$ is chosen uniformly at random, are often used in applications. However, little is known about their expected density $DR_{\sigma}(k, w)$ besides the fact that it is close to $\frac{2}{w + 1}$ unless $w \gg k$. We first show that $DR_{\sigma}(k, w)$ can be…
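The sliding window scheme described in the abstract can be illustrated with a short simulation. The sketch below is not the paper's algorithm: it draws one random order on $k$-mers, runs the window over one finite random string, and reports the fraction of $k$-mers chosen, which is only a rough empirical estimate of the expected density. The function names, the integer encoding of the alphabet as 0..σ-1, and the leftmost tie-breaking rule are all assumptions made for illustration.

```python
import random
from itertools import product


def random_order(sigma, k, rng):
    """Map every k-mer over {0, ..., sigma-1} to its rank in a uniformly random order."""
    kmers = list(product(range(sigma), repeat=k))
    rng.shuffle(kmers)
    return {kmer: rank for rank, kmer in enumerate(kmers)}


def minimizer_positions(s, k, w, rank):
    """Return start positions of the k-mers chosen by the sliding window scheme.

    Each window of length w + k - 1 holds w consecutive k-mers; the one with
    the smallest rank is chosen. Ties (the same k-mer occurring twice in a
    window) are broken toward the leftmost occurrence, which is an assumption.
    """
    chosen = set()
    n_kmers = len(s) - k + 1
    for start in range(n_kmers - w + 1):
        best = min(
            range(start, start + w),
            key=lambda i: (rank[tuple(s[i:i + k])], i),
        )
        chosen.add(best)
    return chosen


def empirical_density(sigma, k, w, length, seed=0):
    """Fraction of k-mers chosen in one random string under one random order."""
    rng = random.Random(seed)
    rank = random_order(sigma, k, rng)
    s = [rng.randrange(sigma) for _ in range(length)]
    chosen = minimizer_positions(s, k, w, rank)
    return len(chosen) / (length - k + 1)


if __name__ == "__main__":
    sigma, k, w = 4, 4, 5
    print("empirical density:", empirical_density(sigma, k, w, length=20000))
    print("2 / (w + 1):      ", 2 / (w + 1))
```

Averaging `empirical_density` over many seeds would approximate the expectation over both the order and the string more closely; a single run only samples one order and one string.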

Taxonomy

Topics: Bayesian Methods and Mixture Models