Geometric approach to string analysis: deviation from linearity and its use for biosequence classification
Boris Brimkov, Valentin E. Brimkov

TL;DR
This paper introduces simple geometric criteria based on string linearity for analyzing DNA sequences. The criteria distinguish biological sequences from random ones and show a general trend of increasing deviation from linearity from primitive to biologically complex organisms.
Contribution
It presents novel geometric criteria for sequence analysis and demonstrates their effectiveness in biosequence classification and in detecting evolutionary trends.
Findings
Biosequences show higher deviation from linearity than random sequences.
Deviation from linearity generally increases from primitive to biologically complex organisms.
The method effectively differentiates between biological and random sequences.
Abstract
Tools that effectively analyze and compare sequences are of great importance in various areas of applied computational research, especially in the framework of molecular biology. In the present paper, we introduce simple geometric criteria based on the notion of string linearity and use them to compare DNA sequences of various organisms, as well as to distinguish them from random sequences. Our experiments reveal a significant difference between biosequences and random sequences - the former having much higher deviation from linearity than the latter - as well as a general trend of increasing deviation from linearity between primitive and biologically complex organisms.
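The abstract does not spell out the criteria themselves, so the following is only a minimal sketch of one plausible proxy for "deviation from linearity", not the authors' definition. It assumes a DNA string is first reduced to a binary word (here by a hypothetical purine/pyrimidine mapping), reads that word as a lattice path whose height is the running count of 1s, and reports the largest vertical gap between the path and the straight chord joining its endpoints. The function names `to_binary` and `max_deviation` and the choice of mapping are illustrative assumptions.

```python
from fractions import Fraction


def to_binary(dna, ones="AG"):
    """Map a DNA string to a list of 0/1 values.

    Assumption for illustration: purines (A, G) become 1 and every
    other character becomes 0. The paper may use a different encoding.
    """
    return [1 if ch in ones else 0 for ch in dna.upper()]


def max_deviation(bits):
    """Largest vertical gap between the cumulative-count path of `bits`
    and the straight chord from (0, 0) to (n, number of 1s).

    Exact rational arithmetic avoids floating-point noise.
    """
    n = len(bits)
    if n == 0:
        return Fraction(0)
    slope = Fraction(sum(bits), n)  # slope of the chord
    height = 0                      # running count of 1s
    worst = Fraction(0)
    for i, b in enumerate(bits, start=1):
        height += b
        worst = max(worst, abs(height - slope * i))
    return worst
```

Under this proxy, an evenly interleaved word such as `[0, 1, 0, 1]` stays close to its chord, while a clustered word such as `[0, 0, 1, 1]` strays further from it. Comparing the value for a biosequence against shuffled copies of the same sequence would be one way to mimic the biosequence-versus-random comparison described in the abstract.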
Taxonomy
Topics: Algorithms and Data Compression · Fractal and DNA sequence analysis · Genomics and Phylogenetic Studies
