Information Analysis of DNA Sequences
Riyazuddin Mohammed

TL;DR
This study uses a modified entropy measure to analyze DNA sequences, revealing that non-coding introns carry nearly as much information as coding exons, challenging the idea that introns are 'junk DNA.'
Contribution
It introduces a length-normalized entropy approach to compare informational content of exons and introns, providing new insights into non-coding DNA.
Findings
Introns carry nearly as much information as exons.
Normalized entropy analysis argues against the 'junk DNA' hypothesis.
Method may aid in understanding genetic code symmetry models.
Abstract
The problem of differentiating the informational content of coding (exons) and non-coding (introns) regions of a DNA sequence is one of the central problems of genomics. The introns are estimated to be nearly 95% of the DNA and since they do not seem to participate in the process of transcription of amino-acids, they have been termed "junk DNA." Although it is believed that the non-coding regions in genomes have no role in cell growth and evolution, demonstration that these regions carry useful information would tend to falsify this belief. In this paper, we consider entropy as a measure of information by modifying the entropy expression to take into account the varying length of these sequences. Exons are usually much shorter in length than introns; therefore the comparison of the entropy values needs to be normalized. A length correction strategy was employed using randomly generated…
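The abstract is truncated before it describes the length correction, so the paper's actual procedure is not reproduced here. As a minimal sketch of the general idea, assuming the correction divides a sequence's Shannon entropy by the mean entropy of uniformly random sequences of the same length, one could write the following. The function names `kmer_entropy` and `normalized_entropy`, the choice of overlapping k-mers, and the parameters `k`, `trials`, and `seed` are all illustrative assumptions, not taken from the paper.

```python
import math
import random
from collections import Counter


def kmer_entropy(seq: str, k: int = 3) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq."""
    n = len(seq) - k + 1  # number of overlapping k-mers
    if n <= 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def normalized_entropy(seq: str, k: int = 3, trials: int = 200, seed: int = 0) -> float:
    """Entropy of seq relative to random sequences of the same length.

    ASSUMPTION: the baseline is the mean k-mer entropy of `trials` uniformly
    random A/C/G/T sequences; the paper's own correction may differ.
    """
    rng = random.Random(seed)
    baseline = 0.0
    for _ in range(trials):
        rand_seq = "".join(rng.choice("ACGT") for _ in range(len(seq)))
        baseline += kmer_entropy(rand_seq, k)
    baseline /= trials
    # A zero baseline means the sequence is too short to estimate anything.
    return kmer_entropy(seq, k) / baseline if baseline > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy sequences of different lengths, for illustration only.
    short_seq = "ATGGCCATTGTAATGGGCCGC"
    long_seq = short_seq * 10
    print(normalized_entropy(short_seq), normalized_entropy(long_seq))
```

Dividing by a same-length random baseline is one way to keep a short exon from looking less informative than a long intron simply because it contains fewer k-mers; other normalizations are equally plausible.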
Taxonomy
Topics: RNA and protein synthesis mechanisms · Fractal and DNA sequence analysis · DNA and Biological Computing
