A complexity measure for symbolic sequences and applications to DNA
Ana P. Majtey, Ramon Roman-Roldan, Pedro W. Lamberti

TL;DR
This paper introduces a complexity measure for symbolic sequences, defined as the entropy of the distribution of domain lengths obtained by segmenting the sequence. It shows that the measure has the properties expected of a good complexity measure and applies it to the complexity profiles of genetic sequences.
Contribution
A novel complexity measure for symbolic sequences based on segmentation and entropy, with proven properties and applications to DNA analysis.
Findings
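The measure satisfies the properties usually required of a good complexity measure: it has the one-hump property and is super-additive.
Its value depends on the level of detail at which the sequence is analyzed.
Applied to genetic sequences, the measure is used to evaluate their complexity profiles.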
Abstract
We introduce a complexity measure for symbolic sequences. Starting from a segmentation procedure of the sequence, we define its complexity as the entropy of the distribution of lengths of the domains of relatively uniform composition into which the sequence is decomposed. We show that this quantity satisfies the properties usually required of a "good" complexity measure. In particular, it satisfies the one-hump property, is super-additive, and has the important property of depending on the level of detail at which the sequence is analyzed. Finally, we apply it to the evaluation of the complexity profile of some genetic sequences.
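As a rough illustration of the definition in the abstract, the sketch below computes a Shannon entropy over domain lengths once a segmentation is already available. It is not the authors' implementation: the abstract does not specify the segmentation procedure or the exact form of the length distribution, so the cut positions are taken as given input, and the distribution is assumed to be the empirical frequency of each distinct domain length. The function names `domain_lengths` and `length_entropy` are hypothetical.

```python
import math
from collections import Counter


def domain_lengths(boundaries, sequence_length):
    """Return the lengths of the domains delimited by the given cut positions.

    `boundaries` are interior cut positions produced by some segmentation
    procedure (assumed to be available; not specified in the abstract).
    """
    cuts = [0] + sorted(boundaries) + [sequence_length]
    return [end - start for start, end in zip(cuts, cuts[1:])]


def length_entropy(lengths):
    """Shannon entropy, in bits, of the empirical distribution of domain lengths.

    Assumption: p(l) is the fraction of domains that have length l.
    """
    counts = Counter(lengths)
    total = len(lengths)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# A sequence with no cuts is a single domain, so the entropy is zero.
single = length_entropy(domain_lengths([], 40))

# Cuts at 5, 10, and 20 in a sequence of length 40 give domains of
# lengths 5, 5, 10, and 20.
mixed = length_entropy(domain_lengths([5, 10, 20], 40))
```

Under this reading, a sequence that the segmentation leaves as one domain has zero complexity, whether it is perfectly uniform or has no detectable compositional structure, which is in the spirit of the one-hump property mentioned in the abstract.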
Taxonomy
Topics: Fractal and DNA sequence analysis · DNA and Biological Computing · RNA and protein synthesis mechanisms
