On the first k moments of the random count of a pattern in a multi-states sequence generated by a Markov source
Grégory Nuel (MAP5)

TL;DR
This paper provides explicit formulas and efficient algorithms for calculating the first k moments of pattern counts in Markov-generated sequences, improving distribution approximations in genomic data analysis.
Contribution
It introduces a novel explicit formula and algorithms for moments of pattern counts in Markov sequences, enhancing distribution approximation methods.
Findings
Explicit formulas for moments of pattern counts
Efficient algorithms for various Markov models
Improved distribution approximations in genomic sequences
Abstract
In this paper, we develop an explicit formula for computing the first k moments of the random count of a pattern in a multi-states sequence generated by a Markov source. We derive efficient algorithms that handle both low- and high-complexity patterns and both homogeneous and heterogeneous Markov models. We then apply these results to the distribution of DNA patterns in genomic sequences, where we show that moment-based developments (namely, Edgeworth's expansion and Gram-Charlier type B series) improve the reliability of common asymptotic approximations such as the Gaussian or Poisson approximations.
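To make the quantity in the abstract concrete, here is a minimal Python sketch that computes the first k raw moments of an overlapping pattern count under a homogeneous first-order Markov chain. It is not the paper's explicit formula or its algorithms: it is a brute-force dynamic program over a pattern-matching automaton, useful only as a reference for small cases. The function names (`kmp_automaton`, `count_moments`), the integer letter encoding, and the example parameters are all assumptions made for illustration.

```python
def kmp_automaton(pattern, alphabet_size):
    """delta[q][a] = length of the longest prefix of `pattern` that is a
    suffix of pattern[:q] followed by letter a (naive construction)."""
    m = len(pattern)
    pat = list(pattern)
    delta = [[0] * alphabet_size for _ in range(m + 1)]
    for q in range(m + 1):
        for a in range(alphabet_size):
            s = pat[:q] + [a]
            k = min(m, len(s))
            while k > 0 and s[len(s) - k:] != pat[:k]:
                k -= 1
            delta[q][a] = k
    return delta


def count_moments(pattern, n, mu, P, k):
    """Raw moments E[N^j], j = 1..k, of the number N of (possibly
    overlapping) occurrences of `pattern` in X_1..X_n, where X_1 ~ mu and
    P[a][b] is the transition probability from letter a to letter b."""
    if n < 1:
        raise ValueError("n must be at least 1")
    s, m = len(mu), len(pattern)
    delta = kmp_automaton(pattern, s)
    max_c = max(n - m + 1, 0)  # largest possible count
    # dist[(last letter, automaton state)] = probabilities indexed by count
    dist = {}
    for a in range(s):
        q = delta[0][a]
        vec = dist.setdefault((a, q), [0.0] * (max_c + 1))
        vec[1 if q == m else 0] += mu[a]
    for _ in range(n - 1):
        new = {}
        for (a, q), vec in dist.items():
            for b in range(s):
                p = P[a][b]
                if p == 0.0:
                    continue
                q2 = delta[q][b]
                hit = 1 if q2 == m else 0  # an occurrence ends at this letter
                tgt = new.setdefault((b, q2), [0.0] * (max_c + 1))
                for c, w in enumerate(vec):
                    if w:
                        tgt[c + hit] += w * p
        dist = new
    # Marginal distribution of the count, then its raw moments.
    pmf = [sum(vec[c] for vec in dist.values()) for c in range(max_c + 1)]
    return [sum((c ** j) * pc for c, pc in enumerate(pmf))
            for j in range(1, k + 1)]


# Example: pattern "01" in 4 fair, independent binary letters.
moments = count_moments((0, 1), 4, [0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], 2)
```

In this example the mean is 3 × 1/4 = 0.75, and the second raw moment is 0.875, since only the occurrences at positions 1 and 3 can coexist. The cost of this sketch grows with the sequence length times the maximal count, which is the kind of overhead that a direct moment computation avoids.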
Taxonomy
Topics: Fractal and DNA sequence analysis · Genomics and Chromatin Dynamics · DNA and Biological Computing
