On the first k moments of the random count of a pattern in a multi-states sequence generated by a Markov source

Grégory Nuel (MAP5)

arXiv:0909.4071·math.PR·January 24, 2012

TL;DR

This paper provides explicit formulas and efficient algorithms for calculating the first k moments of pattern counts in Markov-generated sequences, improving distribution approximations in genomic data analysis.

Contribution

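It introduces an explicit formula and efficient algorithms for the first k moments of pattern counts in Markov sequences. These moments feed moment-based approximations of the count distribution (Edgeworth's expansion and Gram-Charlier type B series).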

Findings

1. Explicit formulas for moments of pattern counts
2. Efficient algorithms for various Markov models
3. Improved distribution approximations in genomic sequences

Abstract

In this paper, we develop an explicit formula for computing the first k moments of the random count of a pattern in a multi-states sequence generated by a Markov source. We derive efficient algorithms that handle both low- and high-complexity patterns and both homogeneous and heterogeneous Markov models. We then apply these results to the distribution of DNA patterns in genomic sequences, where we show that moment-based expansions (namely Edgeworth's expansion and Gram-Charlier type B series) improve the reliability of common asymptotic approximations such as the Gaussian and Poisson approximations.
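
To make the quantity in the abstract concrete, here is a minimal Python sketch that computes the first k moments of the overlapping count of a single word in a sequence generated by a homogeneous order-1 Markov chain. It is not the paper's explicit formula or algorithm. It assumes a simpler, slower route: a dynamic program over a pattern-matching automaton that builds the full count distribution and then reads the moments off it. The names `kmp_automaton`, `count_distribution`, `first_k_moments`, and `brute_force_moments` are hypothetical, as are the encoding of letters as integers and the choice to count overlapping occurrences.

```python
from itertools import product


def kmp_automaton(pattern, alphabet_size):
    """delta[q][a] = length of the longest prefix of `pattern` that is a
    suffix of pattern[:q] + (a,). State len(pattern) means "just matched"."""
    m = len(pattern)
    delta = [[0] * alphabet_size for _ in range(m + 1)]
    for q in range(m + 1):
        for a in range(alphabet_size):
            text = pattern[:q] + (a,)
            k = min(m, q + 1)
            while k > 0 and text[len(text) - k:] != pattern[:k]:
                k -= 1
            delta[q][a] = k
    return delta


def count_distribution(pattern, n, mu0, P):
    """Exact pmf {count: probability} of the number of (overlapping)
    occurrences of `pattern` in X_1..X_n, where X_1 ~ mu0 and
    P[a][b] = Prob(X_{i+1} = b | X_i = a)."""
    pattern = tuple(pattern)
    s, m = len(mu0), len(pattern)
    delta = kmp_automaton(pattern, s)
    # dist[(a, q, c)]: last letter a, automaton state q, c occurrences so far
    dist = {}
    for a in range(s):
        q = delta[0][a]
        key = (a, q, 1 if q == m else 0)
        dist[key] = dist.get(key, 0.0) + mu0[a]
    for _ in range(n - 1):
        new = {}
        for (a, q, c), p in dist.items():
            for b in range(s):
                q2 = delta[q][b]
                key = (b, q2, c + (1 if q2 == m else 0))
                new[key] = new.get(key, 0.0) + p * P[a][b]
        dist = new
    pmf = {}
    for (_, _, c), p in dist.items():
        pmf[c] = pmf.get(c, 0.0) + p
    return pmf


def first_k_moments(pmf, k):
    """Raw moments E[N], E[N^2], ..., E[N^k] of a count pmf."""
    return [sum(p * c ** j for c, p in pmf.items()) for j in range(1, k + 1)]


def brute_force_moments(pattern, n, mu0, P, k):
    """Reference check by enumerating all sequences (tiny n only)."""
    pattern = tuple(pattern)
    m, s = len(pattern), len(mu0)
    moments = [0.0] * k
    for seq in product(range(s), repeat=n):
        prob = mu0[seq[0]]
        for a, b in zip(seq, seq[1:]):
            prob *= P[a][b]
        count = sum(seq[i:i + m] == pattern for i in range(n - m + 1))
        for j in range(k):
            moments[j] += prob * count ** (j + 1)
    return moments


if __name__ == "__main__":
    # Illustrative 4-letter alphabet (0..3 standing for A, C, G, T);
    # the probabilities below are made up for the example.
    mu0 = [0.25, 0.25, 0.25, 0.25]
    P = [[0.4, 0.2, 0.2, 0.2],
         [0.2, 0.4, 0.2, 0.2],
         [0.2, 0.2, 0.4, 0.2],
         [0.2, 0.2, 0.2, 0.4]]
    pmf = count_distribution((0, 1, 0), 50, mu0, P)
    print(first_k_moments(pmf, 4))
```

The state space of this sketch grows with the maximum possible count, so it only suits short sequences and single words. The algorithms announced in the abstract target the moments directly, which is what makes them practical at genomic scale and for heterogeneous models.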

Taxonomy

Topics: Fractal and DNA sequence analysis · Genomics and Chromatin Dynamics · DNA and Biological Computing