Topological Entropy of DNA Sequences
David Koslicki

TL;DR
This paper introduces a new approximation to topological entropy for DNA sequences. Applied to the human genome, it shows that introns have higher entropy than exons and that chromosome Y has atypically low, bi-modal entropy.
Contribution
A novel approximation of topological entropy for DNA sequences that avoids the finite sample effects and high-dimensionality problems of earlier implementations and provides new insights into genomic sequence randomness.
Findings
Intron entropy is significantly higher than exon entropy in the human genome.
Chromosome Y exhibits atypically low and bi-modal entropy patterns.
Intron entropy is lower than its computed expected value, so introns are less random than expected.
Abstract
Topological entropy has been one of the most difficult to implement of all the entropy-theoretic notions. This is primarily due to finite sample effects and high-dimensionality problems. In particular, topological entropy has been implemented in previous literature to conclude that the entropy of exons is higher than that of introns, thus implying that exons are more "random" than introns. We define a new approximation to topological entropy free from the aforementioned difficulties. We compute its expected value and apply this definition to the intron and exon regions of the human genome to observe that, as expected, the entropy of introns is significantly higher than that of exons. Surprisingly, though, we find that introns are less random than expected: their entropy is lower than the computed expected value. We observe the perplexing phenomenon that chromosome Y has atypically low and bi-modal…
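The abstract excerpt above does not state the definition itself, so the following is only a minimal sketch of one way such a finite-sequence approximation could be computed. It assumes the form: pick the largest subword length n with 4^n + n - 1 <= |w|, truncate the sequence w to its first 4^n + n - 1 letters, count the distinct length-n subwords in that window, and return log base 4 of the count divided by n. The function name `topological_entropy` and the `alphabet_size` parameter are hypothetical, chosen for illustration.

```python
import math


def topological_entropy(seq, alphabet_size=4):
    """Sketch of a finite-sequence topological entropy (assumed definition).

    Assumption: n is the largest integer with
    alphabet_size**n + n - 1 <= len(seq), and only the first
    alphabet_size**n + n - 1 letters of seq are used.
    """
    length = len(seq)

    # Find the largest subword length n the sequence can support.
    n = 0
    while alphabet_size ** (n + 1) + (n + 1) - 1 <= length:
        n += 1
    if n == 0:
        raise ValueError("sequence too short for subword length 1")

    # A window of this length holds exactly alphabet_size**n subwords of
    # length n, so the result is normalized to lie between 0 and 1.
    window = seq[: alphabet_size ** n + n - 1]
    subwords = {window[i:i + n] for i in range(len(window) - n + 1)}

    return math.log(len(subwords), alphabet_size) / n
```

Under this assumed definition, a constant sequence such as `"A" * 100` scores 0, and a window that contains every possible length-n subword scores 1. Fixing the window length to 4^n + n - 1 is what makes values comparable across sequences of different lengths, which is one way to sidestep the finite sample effects the abstract mentions.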
Taxonomy
Topics: RNA and protein synthesis mechanisms · Machine Learning in Bioinformatics · Fractal and DNA sequence analysis
