Revisiting Waiting Times in DNA evolution

Pierre Nicodeme

arXiv:1205.6420·cs.DM·May 30, 2012·2 cites

Open Access

TL;DR

This paper investigates the waiting times for specific DNA motifs to appear under evolution, comparing existing models and introducing a new approach using clump analysis and automata to improve accuracy, especially for overlapping words.

Contribution

It introduces a novel clump analysis and automaton-based method to better estimate waiting times for overlapping DNA motifs, extending previous models.

Findings

01

Automata approach confirms previous models for non-overlapping words.

02

Provides up to 44% correction for highly overlapping words.

03

Proves quasi-linear behavior of motif appearance probability over large ranges.

Abstract

Transcription factors are short stretches of DNA (or $k$-mers), mainly located in promoter sequences, that enhance or repress gene expression. With respect to an initial distribution of letters on the DNA alphabet, Behrens and Vingron consider a random sequence of length $n$ that does not contain a given $k$-mer, or word of size $k$. Under an evolution model of the DNA, they compute the probability $p_{n}$ that this $k$-mer appears after a unit time of 20 years. They prove that the waiting time for the first appearance of the $k$-mer is well approximated by $T_{n} = 1/p_{n}$. Their work relies on the simplifying assumption that the $k$-mer is not self-overlapping. They observe in particular that the waiting time is mostly driven by the initial distribution of letters. Behrens et al. use an automaton-based approach that relaxes the assumption on word overlaps. Their…
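Two notions in the abstract can be made concrete with a small sketch: whether a $k$-mer is self-overlapping, and the approximation $T_{n} = 1/p_{n}$. The Python below is illustrative only and is not taken from the paper. The function names `overlap_periods`, `is_self_overlapping`, and `waiting_time` are hypothetical, and the value of $p_{n}$ in the usage lines is an arbitrary placeholder, not a result. It does not implement the evolution model, the clump analysis, or the automaton construction.

```python
def overlap_periods(word: str) -> list[int]:
    """Return every shift p (0 < p < k) at which `word` overlaps itself,
    i.e. the suffix starting at p equals the prefix of the same length."""
    k = len(word)
    return [p for p in range(1, k) if word[p:] == word[: k - p]]


def is_self_overlapping(word: str) -> bool:
    """A k-mer is self-overlapping if it has at least one overlap shift."""
    return bool(overlap_periods(word))


def waiting_time(p_n: float, unit_years: float = 20.0) -> float:
    """Approximate waiting time in years as unit_years / p_n.

    Assumption for this sketch: each unit time step (20 years in the
    abstract) is treated as a trial with success probability p_n, so the
    expected number of steps is 1 / p_n.
    """
    if not 0.0 < p_n <= 1.0:
        raise ValueError("p_n must be in (0, 1]")
    return unit_years / p_n


# Usage with placeholder inputs (not values from the paper).
print(overlap_periods("AAAAA"))      # highly overlapping word
print(is_self_overlapping("ACGTG"))  # no suffix matches a prefix
print(waiting_time(1e-6))            # years, for a hypothetical p_n
```

A word such as `AAAAA` overlaps itself at every shift, which is the kind of case the non-overlap assumption excludes.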

Taxonomy

Topics: DNA and Biological Computing · Algorithms and Data Compression · Semigroups and Automata Theory