Universality and Shannon entropy of codon usage
L. Frappat, A. Sciarrino, P. Sorba

TL;DR
This paper analyzes rank-ordered codon usage probabilities for 40 eukaryotic species and 5 chloroplasts, finds that they are best fitted by the sum of a constant, an exponential, and a linear function of the usage rank rather than by a Zipf law, and proposes a quantum-mechanics-inspired model to describe this behaviour.
Contribution
It proposes a quantum-mechanics-inspired model to describe codon usage distributions and shows that the fitted parameters depend strongly on the total GC content of each species' coding regions.
Findings
Codon usage probabilities do not follow Zipf's law.
Distribution functions are best fitted by a sum of constant, exponential, and linear terms.
The Shannon entropy of codon usage is computed and depends strongly on the exonic GC content.
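The fitted form and the entropy named in these findings can be written out as follows. This is a sketch under stated assumptions: the parameter symbols \alpha, \eta, \beta, \gamma, the rank index n running over the 64 codons, and the base-2 logarithm are illustrative choices, not notation confirmed by the summary above, and the sign of the linear coefficient is left to the fit.

```latex
% Rank-ordered codon usage probability: exponential + linear + constant (assumed symbols)
f(n) = \alpha\, e^{-\eta n} + \beta\, n + \gamma , \qquad n = 1, \dots, 64

% f is a probability distribution over the rank n
\sum_{n=1}^{64} f(n) = 1

% Shannon entropy of the codon usage (in bits, assuming log base 2)
S = - \sum_{n=1}^{64} f(n) \log_2 f(n)
```

For comparison, a Zipf law would have f(n) proportional to a power of 1/n, which is the form the findings rule out.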
Abstract
The distribution functions of the codon usage probabilities, computed over all the available GenBank data, for 40 eukaryotic biological species and 5 chloroplasts, do not follow a Zipf law, but are best fitted by the sum of a constant, an exponential and a linear function in the rank of usage. For mitochondria the analysis is not conclusive. A quantum-mechanics-inspired model is proposed to describe the observed behaviour. These functions are characterized by parameters that strongly depend on the total GC content of the coding regions of biological species. It is predicted that the codon usage is the same in all exonic genes with the same GC content. The Shannon entropy for codons, also strongly depending on the exonic GC content, is computed.
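As a rough illustration of the quantities the abstract refers to, the sketch below turns raw codon counts into rank-ordered usage probabilities, evaluates a constant-plus-exponential-plus-linear function of the rank, and computes a Shannon entropy. The function names, the parameter names (alpha, eta, beta, gamma), and the choice of base-2 logarithms are hypothetical and chosen only for this example; it is not the authors' code and performs no fitting.

```python
import math


def usage_probabilities(counts):
    """Convert raw codon counts into probabilities that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def rank_ordered(probabilities):
    """Sort probabilities from most used (rank 1) to least used."""
    return sorted(probabilities, reverse=True)


def model(n, alpha, eta, beta, gamma):
    """Hypothetical fit form: exponential + linear + constant in the rank n."""
    return alpha * math.exp(-eta * n) + beta * n + gamma


def shannon_entropy(probabilities):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    # Made-up counts for illustration only (not GenBank data).
    counts = [50, 30, 15, 5]
    ranked = rank_ordered(usage_probabilities(counts))
    print(ranked)
    print(shannon_entropy(ranked))
```

A uniform usage of all 64 codons gives the maximum entropy of log2(64) = 6 bits, so any bias in codon usage shows up as a value below 6.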
