Microbial genome as a fluctuating system: Distribution and correlation of coding sequence lengths
V. V. Morariu

TL;DR
This study treats the series of coding sequence lengths in microbial genomes as a fluctuating system and analyzes it with methods from statistical physics. The lengths do not follow Zipf's law, unlike natural languages; their distribution is closer to exponential, and the series show short-range correlations.
Contribution
It introduces a novel application of statistical physics methods to characterize microbial genome coding sequence length distributions and correlations.
Findings
Coding sequence lengths do not follow Zipf's law.
Distribution is closer to exponential.
Series exhibit short-range memory properties.
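To illustrate the first two findings, here is a minimal sketch of a rank-size comparison. It is not the author's code: the function names, the synthetic data, and the use of R² as a fit measure are all assumptions made for illustration. Under Zipf's law, log(size) is linear in log(rank); for an exponential distribution, size itself is approximately linear in log(rank).

```python
import numpy as np

def rank_size(lengths):
    """Sort lengths in descending order and pair them with ranks 1..N."""
    sizes = np.sort(np.asarray(lengths, dtype=float))[::-1]
    ranks = np.arange(1, sizes.size + 1)
    return ranks, sizes

def r_squared(x, y):
    """Coefficient of determination of a least-squares straight-line fit."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - residuals.var() / y.var()

def compare_zipf_exponential(lengths):
    """Compare the two straight-line hypotheses on a rank-size plot."""
    ranks, sizes = rank_size(lengths)
    log_rank = np.log(ranks)
    return {
        # Zipf: log(size) linear in log(rank)
        "zipf_r2": r_squared(log_rank, np.log(sizes)),
        # Exponential: size linear in log(rank)
        "exponential_r2": r_squared(log_rank, sizes),
    }

# Synthetic stand-in for one genome (assumed values, not data from the paper):
# 3000 exponentially distributed lengths with a 90-nucleotide floor.
rng = np.random.default_rng(0)
synthetic_lengths = rng.exponential(scale=950.0, size=3000) + 90.0
fit_quality = compare_zipf_exponential(synthetic_lengths)
print(fit_quality)
```

For real data, `synthetic_lengths` would be replaced by the coding sequence lengths taken in genome order from an annotation file. A higher R² for one form only indicates which straight-line description fits better; it is not a formal goodness-of-fit test.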
Abstract
The series of coding sequence lengths in microbial genomes was regarded as a fluctuating system and characterized by the methods of statistical physics. The distribution and the correlation properties of 50 genomes, including bacteria and several archaea, were investigated. The distribution was investigated by rank-size analysis (Zipf's law). We found that coding sequence length series do not obey Zipf's law, contrary to natural languages. The distribution was found to be closer to an exponential distribution. The correlation appeared to be similar to that of natural languages. Segmentation analysis showed the series to be short-range memory systems.
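The abstract does not spell out the correlation or segmentation procedure, so the following is only a generic sketch of what short-range memory looks like in a series. It uses the sample autocorrelation function, which is a swapped-in, simpler technique and not the paper's method. The AR(1) test series and all names are assumptions for illustration.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([
        np.dot(x[: x.size - lag], x[lag:]) / denom
        for lag in range(max_lag + 1)
    ])

# Synthetic short-memory series (assumed, not from the paper):
# an AR(1) process whose correlation decays as phi**lag.
rng = np.random.default_rng(1)
phi = 0.5
noise = rng.normal(size=5000)
ar1 = np.empty_like(noise)
ar1[0] = noise[0]
for i in range(1, noise.size):
    ar1[i] = phi * ar1[i - 1] + noise[i]

acf = autocorrelation(ar1, max_lag=20)
print(np.round(acf[:6], 3))
```

In a short-range memory series the autocorrelation falls off quickly and is indistinguishable from zero after a few lags, whereas a long-range correlated series would show a slow, power-law-like decay. Applied to a genome, `ar1` would be replaced by the coding sequence lengths in their order along the chromosome.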
Taxonomy
Topics: Fractal and DNA sequence analysis · Genomics and Phylogenetic Studies · Complex Systems and Time Series Analysis
