Revisiting Waiting Times in DNA evolution
Pierre Nicodeme

TL;DR
This paper investigates the waiting times for specific DNA motifs to appear under evolution, comparing existing models and introducing a new approach using clump analysis and automata to improve accuracy, especially for overlapping words.
Contribution
It introduces a novel clump analysis and automaton-based method to better estimate waiting times for overlapping DNA motifs, extending previous models.
Findings
Automata approach confirms previous models for non-overlapping words.
Provides up to 44% correction for highly overlapping words.
Proves quasi-linear behavior of motif appearance probability over large ranges.
Abstract
Transcription factors are short stretches of DNA (or k-mers) mainly located in promoter sequences that enhance or repress gene expression. With respect to an initial distribution of letters on the DNA alphabet, Behrens and Vingron consider a random sequence of length n that does not contain a given k-mer or word of size k. Under an evolution model of the DNA, they compute the probability p_n that this k-mer appears after a unit time of 20 years. They prove that the waiting time for the first appearance of the k-mer is well approximated by T_n = 1/p_n. Their work relies on the simplifying assumption that the k-mer is not self-overlapping. They observe in particular that the waiting time is mostly driven by the initial distribution of letters. Behrens et al. use an approach by automata that relaxes the assumption related to word overlaps. Their…
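To illustrate why self-overlap matters, here is a minimal Python sketch. It is not the paper's method: it ignores the evolution model and the clump analysis entirely. It only builds a word-matching automaton and computes the probability that a random sequence of length n does not contain a given word. The uniform i.i.d. letter distribution and the function names `build_dfa` and `prob_avoid` are assumptions made for the example.

```python
def build_dfa(word, alphabet="ACGT"):
    """Transitions delta[state][letter] of the automaton recognising `word`.

    State j means "the last j letters read match the first j letters of word".
    State len(word) (word found) is left without outgoing transitions.
    """
    k = len(word)
    delta = [dict() for _ in range(k)]
    for c in alphabet:
        delta[0][c] = 1 if c == word[0] else 0
    x = 0  # state reached after reading word[1:j], used as fallback on mismatch
    for j in range(1, k):
        for c in alphabet:
            delta[j][c] = delta[x][c]   # mismatch: behave like the fallback state
        delta[j][word[j]] = j + 1       # match: advance
        x = delta[x][word[j]]
    return delta


def prob_avoid(word, n, freqs=None):
    """Probability that an i.i.d. random sequence of length n avoids `word`."""
    if freqs is None:
        # Assumption for the sketch: uniform letter distribution.
        freqs = {c: 0.25 for c in "ACGT"}
    k = len(word)
    delta = build_dfa(word, alphabet="".join(freqs))
    dist = [0.0] * k   # probability mass on each non-accepting state
    dist[0] = 1.0
    for _ in range(n):
        new = [0.0] * k
        for state, p in enumerate(dist):
            if p == 0.0:
                continue
            for c, f in freqs.items():
                target = delta[state][c]
                if target < k:          # mass reaching state k (word found) is dropped
                    new[target] += p * f
        dist = new
    return sum(dist)


if __name__ == "__main__":
    for w in ("AAAAA", "ACGTC"):
        print(w, prob_avoid(w, 1000))
```

Under these assumptions, the highly self-overlapping word AAAAA is avoided with higher probability than the non-self-overlapping word ACGTC of the same length, because occurrences of AAAAA tend to come in clumps. A model that ignores overlaps therefore misjudges the starting point of the waiting-time computation for such words.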
Taxonomy
Topics: DNA and Biological Computing · Algorithms and Data Compression · Semigroups and Automata Theory
