Inverted and mirror repeats in model nucleotide sequences

Fabrizio Lillo; Marco Spanò

arXiv:0705.2143·q-bio.GN·November 13, 2009

TL;DR

This paper studies the probabilistic properties of inverted and mirror repeats in model nucleic acid sequences. It covers both perfect and imperfect repeats (with mismatches and gaps) across i.i.d., Markov, and long range sequence models, and finds that correlated sequences contain more repeats than i.i.d. sequences.

Contribution

It provides a comprehensive analytical and numerical study of repeat properties in different sequence models, highlighting the impact of sequence correlations on repeat frequency.

Findings

01

The number of repeats is significantly larger in correlated sequences than in i.i.d. sequences.

02

For long range sequences, the discrepancy with i.i.d. sequences increases exponentially with repeat length.

03

Long range sequences contain significantly more repeats than i.i.d. sequences.

Abstract

We analytically and numerically study the probabilistic properties of inverted and mirror repeats in model sequences of nucleic acids. We consider both perfect and non-perfect repeats, i.e. repeats with mismatches and gaps. The considered sequence models are independent identically distributed (i.i.d.) sequences, Markov processes and long range sequences. We show that the number of repeats in correlated sequences is significantly larger than in i.i.d. sequences and that this discrepancy increases exponentially with the repeat length for long range sequences.
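
To make the objects of study concrete, the following is a minimal Python sketch of how perfect inverted and mirror repeats might be detected and counted. It assumes the usual definitions: an inverted repeat is a stem followed, after an optional spacer, by its reverse complement, and a mirror repeat is a stem followed by its plain reversal. The function names, the fixed-stem counting convention, and the uniform i.i.d. baseline are illustrative assumptions, not the paper's own definitions or formulas, which may differ (for example by counting only maximal repeats or by allowing mismatches and gaps).

```python
# Illustrative sketch only: names and counting convention are assumptions,
# not taken from the paper.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def _arms(seq, start, stem, spacer):
    """Return the left and right arms of a candidate repeat, or None if out of range."""
    right_start = start + stem + spacer
    if start < 0 or right_start + stem > len(seq):
        return None
    return seq[start:start + stem], seq[right_start:right_start + stem]


def is_inverted_repeat(seq, start, stem, spacer=0):
    """True if the right arm is the reverse complement of the left arm."""
    arms = _arms(seq, start, stem, spacer)
    if arms is None:
        return False
    left, right = arms
    return all(COMPLEMENT[a] == b for a, b in zip(left, reversed(right)))


def is_mirror_repeat(seq, start, stem, spacer=0):
    """True if the right arm is the plain reversal of the left arm."""
    arms = _arms(seq, start, stem, spacer)
    if arms is None:
        return False
    left, right = arms
    return left == right[::-1]


def count_repeats(seq, stem, spacer=0, kind="inverted"):
    """Count start positions with a perfect repeat of fixed stem and spacer length."""
    check = is_inverted_repeat if kind == "inverted" else is_mirror_repeat
    last_start = len(seq) - (2 * stem + spacer)
    return sum(check(seq, i, stem, spacer) for i in range(last_start + 1))


def expected_count_uniform_iid(n, stem, spacer=0):
    """Expected count for a uniform i.i.d. sequence of length n over {A, C, G, T}.

    Assumption: each of the `stem` paired positions matches independently
    with probability 1/4, so a window is a repeat with probability 4**(-stem).
    """
    windows = max(n - (2 * stem + spacer) + 1, 0)
    return windows * 0.25 ** stem
```

Under these assumptions, comparing `count_repeats` on a simulated correlated sequence with `expected_count_uniform_iid` gives a rough feel for the kind of excess the abstract describes, although the sketch covers only perfect repeats and only the uniform i.i.d. baseline.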
