Quantifying Local Randomness in Human DNA and RNA Sequences Using Erdos Motifs
Wentian Li, Dimitrios Thanos, Astero Provata

TL;DR
This study investigates the local randomness of human DNA and RNA sequences by analyzing Erdos motifs, revealing distinct patterns of under- and overrepresentation related to nucleotide properties.
Contribution
It introduces a novel application of Erdos motifs to quantify local randomness in biological sequences, linking mathematical concepts with genomic data analysis.
Findings
Purine/pyrimidine Erdos motifs are underrepresented in human DNA.
Strong/weak base pair Erdos motifs are slightly overrepresented.
Erdos motif densities are negatively correlated in DNA sequences.
Abstract
In 1932, Paul Erdos asked whether a random walk constructed from a binary sequence can achieve the lowest possible deviation (lowest discrepancy), for the sequence itself and for all its subsequences formed by homogeneous arithmetic progressions. Although keeping the discrepancy low is impossible for infinite sequences, as recently proven by Terence Tao, attempts were made to construct such sequences with finite lengths. We recognize that such constructed sequences (we call these "Erdos sequences") exhibit certain hallmarks of randomness at the local level: they show roughly equal frequencies of subsequences, and at the same time exclude trivial periodic patterns. For human DNA, we examine the frequency of a set of Erdos motifs of length 10 using three nucleotide-to-binary mappings. The particular length-10 Erdos sequence is derived from the length-11 Mathias sequence and is…
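The discrepancy notion in the abstract can be made concrete with a minimal Python sketch. It computes the largest absolute partial sum of a ±1 sequence over all homogeneous arithmetic progressions (positions d, 2d, 3d, ...), and shows how a DNA string could be turned into such a sequence. Everything named here is an assumption made for illustration: the function names, the sign conventions of the mappings, and the choice of amino/keto as the third mapping (the summary names only purine/pyrimidine and strong/weak). EXAMPLE is a length-11 sequence of discrepancy 1 chosen by hand; it is not claimed to be the Mathias sequence or the motif used in the paper.

```python
def discrepancy(seq):
    """Largest |x_d + x_2d + ... + x_kd| over all step sizes d and lengths k.

    seq is a list of +1/-1 values; positions are 1-based in the math,
    so position p corresponds to seq[p - 1].
    """
    n = len(seq)
    worst = 0
    for d in range(1, n + 1):            # step of the arithmetic progression
        partial = 0
        for pos in range(d, n + 1, d):   # positions d, 2d, 3d, ...
            partial += seq[pos - 1]
            worst = max(worst, abs(partial))
    return worst


# Illustrative nucleotide-to-binary mappings. The sign assignments are
# arbitrary, and the amino/keto entry is an assumption, not from the source.
MAPPINGS = {
    "purine_pyrimidine": {"A": 1, "G": 1, "C": -1, "T": -1},
    "strong_weak": {"G": 1, "C": 1, "A": -1, "T": -1},
    "amino_keto": {"A": 1, "C": 1, "G": -1, "T": -1},
}


def to_binary(dna, mapping):
    """Translate a DNA string into a list of +1/-1 values."""
    return [mapping[base] for base in dna.upper()]


# Hand-picked length-11 sequence with discrepancy 1 (illustration only).
EXAMPLE = [1, -1, -1, 1, -1, 1, 1, -1, -1, 1, 1]

# A trivially periodic sequence: balanced overall, but every even position
# is -1, so the step-2 progression drifts steadily away from zero.
ALTERNATING = [1, -1] * 6

print(discrepancy(EXAMPLE))
print(discrepancy(ALTERNATING))
print(to_binary("ACGT", MAPPINGS["purine_pyrimidine"]))
```

The alternating sequence looks balanced along the full walk but fails badly along the step-2 progression, which is the sense in which low-discrepancy sequences exclude trivial periodic patterns.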
