Statistical distributions and entropy considerations in gene codes
Krystyna Lukierska-Walasek, Krzysztof Topolski, Krzysztof Trojanowski

TL;DR
This paper explores the statistical and information-theoretic properties of gene codes, highlighting hyperbolic distributions, entropy considerations, and similarities between gene length histograms and language words, supported by genomic data.
Contribution
It introduces a theoretical framework linking hyperbolic distributions to entropy stability in gene codes and correlates these with empirical genomic data.
Findings
Gene length histograms resemble language word distributions
Hyperbolic distributions can lead to entropy loss and stability in genomes
Empirical data from multiple species support the theory
Abstract
In this paper we consider selected linguistic features of genomes in order to study the statistics of gene codes. We present an information-theoretic argument from which it follows that a system described by distributions of hyperbolic type can exhibit entropy loss and stability. We show that the histograms of gene lengths are similar to those of language words. We show the correspondence between the presented theory and results for the number of replicated genes and replicated gene fragments in the genomes of Borrelia burgdorferi, Escherichia coli and Saccharomyces cerevisiae S288c.
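As a rough illustration of the entropy argument, the sketch below compares the Shannon entropy of a uniform distribution with that of a hyperbolic one. The abstract does not give the exact form of the distribution, so the power law p(k) ∝ k^(-a), the helper names `hyperbolic_pmf` and `shannon_entropy`, and the choice of 1000 outcomes are all assumptions made for illustration, not the authors' method.

```python
import math


def hyperbolic_pmf(n, exponent):
    """Normalised power-law weights p(k) ~ k**(-exponent) for k = 1..n.

    Assumed form of a "hyperbolic" distribution; the paper may use another.
    """
    weights = [k ** -exponent for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def shannon_entropy(pmf):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


n = 1000  # hypothetical number of distinct outcomes (e.g. length classes)
uniform = [1 / n] * n
zipf_like = hyperbolic_pmf(n, 1.0)
steeper = hyperbolic_pmf(n, 2.0)

for name, pmf in [("uniform", uniform), ("a = 1", zipf_like), ("a = 2", steeper)]:
    print(f"{name:8s} entropy = {shannon_entropy(pmf):.3f} bits")
```

Under these assumptions the hyperbolic distributions have lower entropy than the uniform one over the same outcomes, and the entropy falls further as the exponent grows.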
Taxonomy
Topics: Diffusion and Search Dynamics · Gene Regulatory Network Analysis · RNA and protein synthesis mechanisms
