Prediction of genomic properties and classification of life by protein length distributions
Dirson Jian Li, Shengli Zhang

TL;DR
This paper argues that protein length distributions encode evolutionary information: genome size and non-coding DNA content can be predicted from them, and their correlations and quasi-periodicity separate life into three domains.
Contribution
It links protein length distributions to genome size, non-coding DNA content, and phylogenetic classification, and reports an intrinsic relationship between coding and non-coding DNA size as well as order in the structure of the distributions.
Findings
Genome size and non-coding DNA can be predicted from protein length distributions.
Protein length distributions exhibit correlations and quasi-periodicity.
Life can be classified into three domains based on the correlations and quasi-periodicity of the distributions.
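To make the second finding concrete, here is a minimal sketch of how a protein length distribution could be built and how two such distributions could be compared. This summary does not give the authors' exact procedure, so the helper names `length_distribution` and `pearson`, the choice of Pearson correlation, and the toy length lists are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def length_distribution(lengths, max_len):
    """Normalized frequency of each protein length 1..max_len (hypothetical helper)."""
    counts = Counter(l for l in lengths if 1 <= l <= max_len)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no protein lengths fall in the range 1..max_len")
    return [counts.get(l, 0) / total for l in range(1, max_len + 1)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy protein lengths (in amino acids) for two made-up species; not real data.
species_a = [100, 100, 150, 200, 200, 200, 300]
species_b = [100, 150, 150, 200, 200, 300, 300]

dist_a = length_distribution(species_a, max_len=300)
dist_b = length_distribution(species_b, max_len=300)
r = pearson(dist_a, dist_b)
print(f"correlation between the two toy distributions: {r:.3f}")
```

With real proteomes, `lengths` would come from the annotated protein sequences of each species, and a matrix of pairwise correlations could then serve as input to a grouping step.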
Abstract
Much evolutionary information is stored in the fluctuations of protein length distributions. Genome size and non-coding DNA content can be calculated from the protein length distributions alone, so there is an intrinsic relationship between coding DNA size and non-coding DNA size. Based on the correlations and quasi-periodicity of protein length distributions, we can classify life into three domains. Strong evidence is found to support the order in the structures of protein length distributions.
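One common way to look for quasi-periodicity in a fluctuating series is to inspect its power spectrum. The sketch below assumes that approach; the abstract does not state which method the authors used. The function names `power_spectrum` and `dominant_period` are hypothetical, and the input is a synthetic cosine with a known period rather than a real protein length distribution.

```python
import cmath
import math


def power_spectrum(series):
    """Squared DFT magnitude for frequency indices 1..n//2 after removing the mean."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points")
    mean = sum(series) / n
    centered = [v - mean for v in series]
    spectrum = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * i / n)
            for i, v in enumerate(centered)
        )
        spectrum[k] = abs(coeff) ** 2
    return spectrum


def dominant_period(series):
    """Period (in index units) of the strongest spectral peak."""
    spectrum = power_spectrum(series)
    k_peak = max(spectrum, key=spectrum.get)
    return len(series) / k_peak


# Synthetic fluctuation with a period of 10 length units; not real data.
toy_fluctuation = [math.cos(2 * math.pi * i / 10) for i in range(100)]
period = dominant_period(toy_fluctuation)
print(f"dominant period of the toy series: {period}")
```

For a real distribution, the series would be the fluctuation of the distribution around a smoothed trend, and a quasi-periodic signal would appear as a broad peak rather than a single sharp line.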
Taxonomy
Topics: Genomics and Phylogenetic Studies · Machine Learning in Bioinformatics · Evolution and Genetic Dynamics
