Inverted and mirror repeats in model nucleotide sequences
Fabrizio Lillo, Marco Spanò

TL;DR
This paper investigates the probabilistic properties of inverted and mirror repeats in nucleic acid sequences. It analyzes perfect and imperfect repeats across several sequence models and shows that repeats occur more often in correlated sequences.
Contribution
It provides a comprehensive analytical and numerical study of repeat properties in different sequence models, highlighting the impact of sequence correlations on repeat frequency.
Findings
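The number of repeats is larger in correlated sequences than in i.i.d. sequences.
For long-range correlated sequences, this discrepancy increases exponentially with repeat length.
Long-range correlated sequences show significantly more repeats than i.i.d. sequences.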
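The i.i.d. baseline for these comparisons can be derived in a few lines. This derivation rests on our own simplifying assumptions, which are not necessarily the paper's definitions:

- A perfect repeat of arm length L is a contiguous window x_i … x_{i+2L-1} with no spacer.
- In a mirror repeat, the second arm is the reverse of the first. In an inverted repeat, it is the reverse complement.
- Bases are i.i.d. with probabilities p_A, p_C, p_G, p_T.

Each of the L base pairs (x_{i+k}, x_{i+2L-1-k}) involves two distinct independent positions, so the pair-match probabilities multiply. For a sequence of length n, this gives:

```latex
P_{\mathrm{mirror}}(L) = \Big(\sum_{a \in \{A,C,G,T\}} p_a^2\Big)^{L},
\qquad
P_{\mathrm{inv}}(L) = \big(2 p_A p_T + 2 p_C p_G\big)^{L},
\qquad
\mathbb{E}[N_{\mathrm{rep}}(L)] = (n - 2L + 1)\, P(L).
```

With uniform base probabilities p_a = 1/4, both expressions reduce to (1/4)^L. The expected i.i.d. count therefore decays as 4^{-L}, and correlated models are measured against this baseline.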
Abstract
We analytically and numerically study the probabilistic properties of inverted and mirror repeats in model sequences of nucleic acids. We consider both perfect and non-perfect repeats, i.e. repeats with mismatches and gaps. The considered sequence models are independent identically distributed (i.i.d.) sequences, Markov processes and long range sequences. We show that the number of repeats in correlated sequences is significantly larger than in i.i.d. sequences and that this discrepancy increases exponentially with the repeat length for long range sequences.
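Below is a minimal Python sketch of how perfect repeats might be counted numerically in i.i.d. and Markov sequences. It rests on these assumptions, which are ours and not the paper's:

- Repeats are contiguous with no spacer.
- Repeats are perfect. Mismatches and gaps are not handled.
- Overlapping windows are counted separately.
- The Markov model is a hypothetical toy first-order chain (`markov_sequence`) that repeats the previous base with probability `stay`.
- Long-range sequences are not modeled.

All function names are illustrative.

```python
import random

# Watson-Crick complement, used to test inverted (reverse-complement) repeats.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_mirror(window: str) -> bool:
    """True if the window reads the same backwards (arm followed by reversed arm)."""
    return window == window[::-1]


def is_inverted(window: str) -> bool:
    """True if the window equals its reverse complement (arm followed by reverse-complemented arm)."""
    return window == "".join(COMPLEMENT[b] for b in reversed(window))


def count_repeats(seq: str, arm: int) -> tuple[int, int]:
    """Count windows of length 2*arm that are perfect (mirror, inverted) repeats.

    Assumption: contiguous repeats with no spacer, no mismatches or gaps,
    and overlapping windows counted separately.
    """
    width = 2 * arm
    mirror = inverted = 0
    for i in range(len(seq) - width + 1):
        w = seq[i:i + width]
        mirror += is_mirror(w)
        inverted += is_inverted(w)
    return mirror, inverted


def iid_sequence(n: int, rng: random.Random) -> str:
    """Uniform i.i.d. sequence over {A, C, G, T}."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def markov_sequence(n: int, stay: float, rng: random.Random) -> str:
    """Hypothetical toy first-order Markov chain: keep the previous base with
    probability `stay`, otherwise pick one of the other three bases uniformly."""
    seq = [rng.choice("ACGT")]
    for _ in range(n - 1):
        prev = seq[-1]
        if rng.random() < stay:
            seq.append(prev)
        else:
            seq.append(rng.choice([b for b in "ACGT" if b != prev]))
    return "".join(seq)


def iid_expected(n: int, arm: int) -> float:
    """Expected count for uniform i.i.d. bases: each of the `arm` base pairs matches with probability 1/4."""
    return (n - 2 * arm + 1) * 0.25 ** arm
```

As a usage example, compare three numbers:

- `count_repeats(markov_sequence(200_000, 0.7, random.Random(0)), 4)`
- `count_repeats(iid_sequence(200_000, random.Random(0)), 4)`
- `iid_expected(200_000, 4)`

Because the toy chain tends to repeat bases, runs such as AAAAAAAA are common. This pushes the mirror-repeat count well above the i.i.d. value, a simple instance of correlations increasing repeat counts.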
