Information-theoretic View of Sequence Organization in a Genome
Liaofu Luo, Yang Gao, Jun Lu

TL;DR
This paper explores how informational redundancy and k-mer frequency statistics reveal the emergence of order in genomes, identifying critical sequence lengths and evolutionary factors influencing genome organization.
Contribution
It introduces a new theoretical framework linking informational correlation and k-mer statistics to genome organization and evolution, with a sum rule relating complexity and sequence length.
Findings
Order emerges from fluctuation at sequence lengths of about 200-300 thousand bases in the human and E. coli genomes.
The sum rule Q(k,N) correlates with evolutionary complexity.
Functional selection of k-mers leads to genome ordering.
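To make "k-mer frequency deviations from randomness" concrete, the following is a minimal sketch that counts overlapping k-mers and sums their squared deviations from a uniform random baseline. The function names and the chi-square-style statistic are assumptions chosen for illustration; the paper's actual definition of Q(k,N) is not given in this summary and may differ.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq (assumed to contain only A, C, G, T)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def squared_deviation_sum(seq, k):
    """Illustrative deviation statistic (NOT the paper's Q(k,N)).

    Sums (observed - expected)**2 / expected over all 4**k possible k-mers,
    where expected is the count under a uniform random model.
    """
    total = len(seq) - k + 1          # number of overlapping k-mer positions
    expected = total / 4 ** k         # uniform baseline: every k-mer equally likely
    counts = kmer_counts(seq, k)
    return sum(
        (counts.get("".join(p), 0) - expected) ** 2 / expected
        for p in product("ACGT", repeat=k)
    )
```

A sequence with perfectly balanced base usage gives zero at k = 1, while a homopolymer such as "AAAA" gives a large value; a uniform baseline is the simplest choice, and a composition-corrected baseline would be a natural refinement.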
Abstract
Sequence organization is viewed from two standpoints: informational redundancy, or informational correlation (IC), and k-mer frequency statistics. Two problems are investigated. The first is how the ICs exceed the fluctuation bound, so that order emerges from fluctuation in a genome once the sequence length reaches a critical value. We demonstrated that the transition from fluctuation to order takes place at a sequence length of about 200-300 thousand bases for the human and E. coli genomes. This means that life emerges in a region between the macroscopic and the microscopic. The second concerns the statistical law of k-mer organization in a genome under evolutionary pressure and functional selection. We deduced a sum rule Q(k,N) on the deviations of k-mer frequencies from randomness in an N-long genome sequence, and deduced the relations of Q(k,N) with k and N. We…
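The first problem, correlation rising above a fluctuation bound, can be illustrated with a small sketch. It uses the mutual information between bases a fixed distance apart as a stand-in correlation measure and estimates a fluctuation level from shuffled copies of the same sequence. Both choices, and all names below, are assumptions for illustration; the paper's own IC definition and fluctuation bound are not given in this summary.

```python
import math
import random
from collections import Counter


def mutual_information(seq, d):
    """Mutual information (in bits) between bases separated by d positions."""
    pairs = Counter(zip(seq, seq[d:]))
    n = sum(pairs.values())
    left, right = Counter(), Counter()
    for (a, b), c in pairs.items():
        left[a] += c      # marginal count of the first base
        right[b] += c     # marginal count of the second base
    return sum(
        c / n * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in pairs.items()
    )


def shuffled_baseline(seq, d, trials=20, seed=0):
    """Largest mutual information seen over shuffled copies of seq.

    Shuffling keeps base composition but destroys order, so this serves
    as a rough empirical fluctuation level (an assumed proxy).
    """
    rng = random.Random(seed)
    bases = list(seq)
    best = 0.0
    for _ in range(trials):
        rng.shuffle(bases)
        best = max(best, mutual_information("".join(bases), d))
    return best
```

Comparing `mutual_information(seq, d)` against `shuffled_baseline(seq, d)` for windows of increasing length is one way to look for a length at which the measured correlation clears the noise floor.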
Taxonomy
Topics: Fractal and DNA sequence analysis · Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms
