A complexity measure for symbolic sequences and applications to DNA

Ana P. Majtey; Ramon Roman-Roldan; Pedro W. Lamberti

arXiv:physics/0606113 · physics.class-ph · May 23, 2007 · 3 cites

Open Access

TL;DR

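This paper introduces a complexity measure for symbolic sequences, defined as the entropy of the distribution of domain lengths obtained by segmenting the sequence. It shows that the measure has the properties usually required of a complexity measure and uses it to evaluate the complexity profiles of genetic sequences.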

Contribution

A complexity measure for symbolic sequences based on segmentation and entropy, shown to satisfy the one-hump property and super-additivity, with applications to DNA analysis.

Findings

01

The measure satisfies the properties usually required of a good complexity measure, including the one-hump property and super-additivity.

02

It depends on the level of detail at which the sequence is analyzed.

03

Applied to genetic sequences, the measure is used to evaluate their complexity profiles.

Abstract

We introduce a complexity measure for symbolic sequences. Starting from a segmentation procedure of the sequence, we define its complexity as the entropy of the distribution of lengths of the domains of relatively uniform composition into which the sequence is decomposed. We show that this quantity satisfies the properties usually required of a "good" complexity measure. In particular, it satisfies the one-hump property, is super-additive, and has the important property of depending on the level of detail at which the sequence is analyzed. Finally, we apply it to the evaluation of the complexity profile of some genetic sequences.
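
The definition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the segmentation procedure is not described on this page, so the cut positions are taken as given input, and the "distribution of lengths" is assumed here to be the empirical frequency of each distinct domain length. The function names `domain_lengths` and `length_entropy` are hypothetical.

```python
from collections import Counter
from math import log2


def domain_lengths(boundaries, n):
    """Lengths of the domains defined by cut positions in a sequence of length n.

    Assumption: `boundaries` are interior cut positions produced by some
    segmentation procedure that is not specified here.
    """
    edges = [0, *sorted(boundaries), n]
    return [right - left for left, right in zip(edges, edges[1:])]


def length_entropy(lengths):
    """Shannon entropy (in bits) of the empirical distribution of domain lengths.

    Assumption: each distinct length is weighted by the fraction of domains
    that have that length.
    """
    counts = Counter(lengths)
    total = len(lengths)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical example: a sequence of length 10 cut at positions 3 and 6.
lengths = domain_lengths([3, 6], 10)
complexity = length_entropy(lengths)
```

Under these assumptions, a sequence that is never cut forms a single domain and has zero entropy, while a segmentation into domains of many different lengths gives a larger value. Changing the segmentation threshold changes the cut positions, which is one way to read the dependence on the level of detail mentioned in the abstract.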

Taxonomy

Topics: Fractal and DNA sequence analysis · DNA and Biological Computing · RNA and protein synthesis mechanisms