Information and order of genomic sequences within chromosomes as identified by complexity theory. An integrated methodology
L. P. Karakatsanis, E. G. Pavlos, G. Tsoulouhas, G. L. Stamokostas, T. L. Mosbruger, J. L. Duke, G. P. Pavlos, and D. S. Monos

TL;DR
This study introduces an integrated methodology combining complexity metrics, machine learning, and non-extensive statistical theory to analyze the size distribution and information order of genomic sequences within human chromosomes, revealing non-random, patterned complexity behaviors.
Contribution
The paper presents a novel integrated approach using complexity theory, Tsallis statistics, and machine learning to uncover the non-random, patterned complexity of genomic sequence distributions within chromosomes.
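The non-extensive (Tsallis) statistics referred to here rest on the Tsallis entropy, S_q = (1 − Σ p_i^q)/(q − 1), which reduces to the Shannon/Boltzmann-Gibbs entropy as q → 1. The minimal Python sketch below computes S_q for an empirical distribution of sequence lengths. It assumes the lengths are binned into a histogram with a hypothetical bin width; the helper names `tsallis_entropy` and `length_distribution` are illustrative and are not taken from the paper, whose actual Tsallis analysis (for example, how q is estimated) may differ.

```python
import math
from collections import Counter

def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum p_i^q) / (q - 1), with the constant k set to 1."""
    probs = [p for p in probs if p > 0]  # zero-probability bins contribute nothing
    if abs(q - 1.0) < 1e-12:
        # q -> 1 limit recovers the Shannon (Boltzmann-Gibbs) entropy
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

def length_distribution(lengths, bin_width):
    """Empirical probabilities of binned sequence lengths (hypothetical binning scheme)."""
    counts = Counter(length // bin_width for length in lengths)
    total = sum(counts.values())
    return [c / total for c in counts.values()]
```

For example, `tsallis_entropy(length_distribution(lengths, 50), q=1.5)` would give S_q for a list of exon or intron lengths binned in 50 bp steps; q ≠ 1 weights rare (q < 1) or common (q > 1) length classes differently from the Shannon case.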
Findings
Intron regions exhibit higher complexity and longer-range correlations than exons.
Genomic size distributions follow specific, non-random patterns with characteristic complexity features.
DNA sequences show multifractal characteristics and long-range correlations, indicating complex dynamics.
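Long-range correlations in a series such as consecutive sequence lengths are commonly quantified with detrended fluctuation analysis (DFA). The Python sketch below is a first-order DFA, offered as one standard estimator and not necessarily the one used by the authors. A scaling exponent α ≈ 0.5 indicates uncorrelated data, and α > 0.5 indicates persistent long-range correlations. The function names and the choice of scales are assumptions for illustration; multifractal characteristics would require a generalization such as multifractal DFA, which this sketch does not implement.

```python
import numpy as np

def dfa(series, scales):
    """First-order DFA: return the fluctuation function F(n) for each window size n."""
    x = np.asarray(series, dtype=float)
    profile = np.cumsum(x - x.mean())  # integrated, mean-removed series
    fluct = []
    for n in scales:
        n_seg = len(profile) // n
        if n_seg < 1:
            raise ValueError("window size larger than the series")
        t = np.arange(n)
        sq_resid = []
        for k in range(n_seg):
            seg = profile[k * n:(k + 1) * n]
            coeffs = np.polyfit(t, seg, 1)  # local linear trend in this window
            resid = seg - np.polyval(coeffs, t)
            sq_resid.append(np.mean(resid ** 2))
        fluct.append(np.sqrt(np.mean(sq_resid)))
    return np.array(fluct)

def dfa_exponent(series, scales):
    """Scaling exponent alpha: slope of log F(n) against log n."""
    f = dfa(series, scales)
    slope, _ = np.polyfit(np.log(scales), np.log(f), 1)
    return slope
```

As a sanity check, white noise should give α near 0.5 and its cumulative sum (a random walk) α near 1.5.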
Abstract
Complexity metrics and machine learning (ML) models have been used to analyze the lengths of segmental genomic entities, such as exons, introns, intergenic sequences, and repeat/unique DNA sequences, in each of the 22 human chromosomes. The purpose of the study was to assess the information and order that may be concealed within the size distributions of these sequences. For this purpose, we developed an innovative integrated methodology. Our analysis is based on the reconstructed phase space theorem, the non-extensive statistical theory of Tsallis, ML techniques, and a new technical index that integrates the generated information, which we introduce and name the Complexity Factor (COFA). The low-dimensional, deterministic, nonlinear chaotic and non-extensive statistical character of the DNA sequences was verified, with strong multifractal characteristics and long-range correlations with significant…
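Phase-space reconstruction from a single series is usually done by time-delay embedding (Takens' theorem): each point becomes the vector [x_j, x_{j+τ}, …, x_{j+(m−1)τ}]. A low-dimensional attractor can then be probed with the Grassberger-Procaccia correlation sum C(r), whose log-log slope at small r estimates the correlation dimension. The Python sketch below shows both steps under stated assumptions: the input is a hypothetical ordered series of sequence lengths along a chromosome, the embedding dimension `dim` and delay `tau` are chosen by the user, and distances are Euclidean. The paper's own parameter selection and its COFA index are not reproduced here.

```python
import numpy as np

def delay_embed(series, dim, tau):
    """Time-delay embedding: row j is [x_j, x_{j+tau}, ..., x_{j+(dim-1)*tau}]."""
    x = np.asarray(series, dtype=float)
    n_vectors = len(x) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the chosen dim and tau")
    return np.column_stack([x[i * tau:i * tau + n_vectors] for i in range(dim)])

def correlation_sum(vectors, r):
    """Grassberger-Procaccia C(r): fraction of distinct vector pairs closer than r."""
    v = np.asarray(vectors, dtype=float)
    diffs = v[:, None, :] - v[None, :, :]  # all pairwise differences, O(N^2) memory
    dists = np.sqrt((diffs ** 2).sum(axis=-1))  # Euclidean distances
    upper = np.triu_indices(len(v), k=1)  # count each unordered pair once
    return float(np.mean(dists[upper] < r))
```

Repeating `correlation_sum(delay_embed(lengths, m, tau), r)` over a range of r and increasing m, and checking whether the slope of log C(r) versus log r saturates, is the usual test for low-dimensional dynamics; the quadratic memory cost limits this sketch to modest series lengths.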
Taxonomy
Topics: Fractal and DNA sequence analysis · Machine Learning in Bioinformatics · Complex Systems and Time Series Analysis
