Of Protein Size and Genomes
N.S. Santos-Magalhaes, H.M. de Oliveira

TL;DR
This paper introduces a method for estimating the number of genes in a genome from the average protein size expected for the species. It examines viral, bacterial and eukaryotic genomes, proposes average protein size as an indicator of complexity, and analyzes how human genes are distributed across the 23 chromosomes.
Contribution
The paper presents an approximate method for calculating gene numbers that accounts for species-specific average protein length, and it analyzes the human gene distribution in terms of genomic rate, mean exon length, and mean exons per gene.
Findings
Average protein size correlates with genome complexity.
Storing all genes of a single human requires less than 12 MB.
Genome figures support protein size as a complexity criterion.
Abstract
An approach for approximately calculating the number of genes in a genome is presented, which takes into account the average protein length expected for the species. A number of virus, bacterial and eukaryotic genomes are scrutinized. Genome figures are presented, which support the average protein size of a species as a criterion for assessing life complexity. The human gene distribution in the 23 chromosomes is investigated emphasizing the genomic rate, the mean 'exon' length, and the mean 'exons per gene'. It is shown that storing all genes of a single human definitely requires less than 12 MB.
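The abstract does not spell out the formula, so the following is only a minimal sketch of how such an estimate could work, not the authors' method. It assumes that a gene's coding length is roughly three nucleotides per amino acid of an average protein, that a known fraction of the genome is coding, and that each nucleotide can be stored in 2 bits. The function names, parameters, and input values are all hypothetical.

```python
def estimate_gene_count(genome_bp, coding_fraction, mean_protein_aa):
    """Rough gene count: coding bases divided by bases per average gene.

    Assumption (not from the paper): each gene codes for one protein of
    average length, at 3 nucleotides per amino acid.
    """
    if mean_protein_aa <= 0:
        raise ValueError("mean_protein_aa must be positive")
    if not 0.0 <= coding_fraction <= 1.0:
        raise ValueError("coding_fraction must lie in [0, 1]")
    bases_per_gene = 3 * mean_protein_aa  # one codon per amino acid
    return genome_bp * coding_fraction / bases_per_gene


def storage_megabytes(gene_count, mean_protein_aa):
    """Storage for the coding bases of all genes, in MB (10**6 bytes).

    Assumption (not from the paper): 2 bits per nucleotide (A, C, G, T).
    """
    coding_bases = gene_count * 3 * mean_protein_aa
    return coding_bases * 2 / 8 / 1e6


# Illustrative placeholder inputs, not figures taken from the paper.
genes = estimate_gene_count(genome_bp=1_000_000,
                            coding_fraction=0.9,
                            mean_protein_aa=300)
print(f"estimated genes: {genes:.0f}")
print(f"storage: {storage_megabytes(genes, 300):.3f} MB")
```

Under the same 2-bit assumption, a 12 MB budget corresponds to 48 million nucleotides, which gives a sense of the scale of the bound quoted in the abstract.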
Taxonomy
Topics: Fractal and DNA sequence analysis · Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms
