An automaton approach for waiting times in DNA evolution
S. Behrens, C. Nicaud, P. Nicodeme

TL;DR
This paper introduces an automata-based method to accurately compute waiting times for the emergence of specific k-mers, including highly autocorrelated ones, during DNA evolution, improving predictions for transcription factor binding site appearance.
Contribution
It relaxes the earlier assumption that occurrences of the words of interest do not overlap, using automata to account for overlapping occurrences. It also extends the analysis to promoters of any size and compares Bernoulli and Markov models.
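The automata idea can be illustrated in a much simpler setting than the paper's. A KMP-style automaton tracks the longest prefix of a word that ends at the current position, so overlapping (self-correlated) occurrences are handled without any extra bookkeeping. The sketch below is a hypothetical illustration, not the authors' construction. It assumes an i.i.d. (Bernoulli) letter model and a single sequence of fixed length n, with no mutation process over generations. The names `build_word_automaton` and `prob_no_occurrence` are introduced here for illustration only.

```python
def build_word_automaton(word, alphabet):
    """KMP-style DFA: state s = length of the longest prefix of `word` that is
    a suffix of the letters read so far; state len(word) means a full match."""
    k = len(word)
    # fail[s]: length of the longest proper border of word[:s]
    fail = [0] * (k + 1)
    j = 0
    for i in range(1, k):
        while j > 0 and word[i] != word[j]:
            j = fail[j]
        if word[i] == word[j]:
            j += 1
        fail[i + 1] = j
    delta = [{} for _ in range(k + 1)]
    for s in range(k + 1):
        for a in alphabet:
            if s < k and word[s] == a:
                delta[s][a] = s + 1              # extend the current partial match
            elif s == 0:
                delta[s][a] = 0                  # nothing matched: stay at the start
            else:
                delta[s][a] = delta[fail[s]][a]  # fall back to the longest border
    return delta


def prob_no_occurrence(word, n, probs):
    """Probability that `word` does not occur in a length-n sequence of
    i.i.d. letters drawn with probabilities `probs` (Bernoulli model)."""
    k = len(word)
    delta = build_word_automaton(word, tuple(probs))
    dist = [0.0] * (k + 1)
    dist[0] = 1.0
    for _ in range(n):
        new = [0.0] * (k + 1)
        for s in range(k):                       # only non-matching states carry mass
            for a, p in probs.items():
                t = delta[s][a]
                if t < k:                        # paths that complete a match are dropped
                    new[t] += dist[s] * p
        dist = new
    return sum(dist)


uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(prob_no_occurrence("AA", 3, uniform))  # 0.890625 (= 57/64)
print(prob_no_occurrence("AC", 3, uniform))  # 0.875    (= 56/64)
```

Both words have the same expected number of occurrences, yet the self-overlapping word AA is avoided more often, i.e. it is less likely to appear at all. This is a toy analogue of the effect the paper studies for autocorrelated k-mers.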
Findings
The automata approach improves accuracy for autocorrelated k-mers
Highly autocorrelated k-mers have waiting times increased by up to 40%
The probability of k-mer appearance at generation 1 scales linearly with promoter length
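A position-based analogue of the waiting-time finding (not the paper's generation-based waiting time) is the classical result for a uniform i.i.d. sequence over an alphabet of size q. There, the expected number of letters read until the first occurrence of a word w of length k is the sum of q^(k-j) over every shift j at which w coincides with itself. The minimal sketch below uses hypothetical function names and assumes q = 4.

```python
def autocorrelation(word):
    """c[j] = 1 if `word` coincides with itself shifted by j positions."""
    k = len(word)
    return [1 if word[j:] == word[:k - j] else 0 for j in range(k)]


def expected_waiting_time(word, q=4):
    """Expected position of the first occurrence of `word` in a uniform
    i.i.d. sequence over an alphabet of size q (classical overlap formula)."""
    k = len(word)
    return sum(q ** (k - j) for j, c in enumerate(autocorrelation(word)) if c)


for w in ("ACGT", "ACGA", "AAAA"):
    print(w, autocorrelation(w), expected_waiting_time(w))
# ACGT [1, 0, 0, 0] 256
# ACGA [1, 0, 0, 1] 260
# AAAA [1, 1, 1, 1] 340
```

Every additional self-overlap adds a term, so the fully self-overlapping AAAA waits longer on average than the non-overlapping ACGT, while a word that overlaps only in its last letter (ACGA) is barely affected.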
Abstract
In a recent article, Behrens and Vingron (JCB 17, 12, 2010) compute waiting times for k-mers to appear during DNA evolution under the assumption that the considered k-mers do not occur in the initial DNA sequence, an issue that arises when studying the evolution of regulatory DNA sequences with regard to transcription factor (TF) binding site emergence. The mathematical analysis underlying their computation assumes that occurrences of the words of interest do not overlap. Here we relax this assumption by using an automata approach. In an alphabet of size 4, such as the DNA alphabet, most words have no or low autocorrelation; therefore, globally, our results confirm those of Behrens and Vingron. The outcome is quite different when considering highly autocorrelated k-mers; in this case, the autocorrelation pushes down the probability of occurrence of these k-mers at generation 1 and,…
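The claim that most DNA words have no or low autocorrelation can be checked by brute force. The sketch counts the k-mers over {A, C, G, T} whose only self-overlap is the trivial one at shift 0. It covers only the "no autocorrelation" case; words that overlap themselves only in their last letter, for instance, fall under "low". The function names are hypothetical and not taken from the paper.

```python
from itertools import product


def has_trivial_autocorrelation(word):
    """True if `word` coincides with itself only at shift 0 (no proper border)."""
    k = len(word)
    return all(word[j:] != word[:k - j] for j in range(1, k))


def count_trivial(k, alphabet="ACGT"):
    """Number of k-mers over `alphabet` with trivial autocorrelation."""
    words = ("".join(t) for t in product(alphabet, repeat=k))
    return sum(1 for w in words if has_trivial_autocorrelation(w))


for k in range(1, 7):
    print(k, count_trivial(k), 4 ** k)
```

For k = 4, for example, 180 of the 256 words (about 70%) have no self-overlap at all.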
Taxonomy
Topics: RNA and protein synthesis mechanisms · Genomics and Chromatin Dynamics · DNA and Biological Computing
