Mapping Equivalence for Symbolic Sequences: Theory and Applications
Liming Wang, Dan Schonfeld

TL;DR
This paper introduces a theoretical framework to analyze when different numerical mappings of symbolic sequences are equivalent, ensuring that results in symbolic signal processing reflect inherent data properties rather than artifacts of the mapping.
Contribution
It develops a novel algebraic and correlation-based framework for assessing the equivalence of symbolic sequence mappings, with theoretical conditions and practical DNA analysis applications.
Findings
Established conditions for mapping consistency
Introduced an algebraic framework for equivalence
Applied to DNA sequence analysis
Abstract
Symbolic sequences are commonly processed by mapping the symbolic data into numerical signals. This approach is particularly popular in genomic and proteomic sequence analysis, and numerous mappings of symbolic sequences have been proposed for various applications. It is unclear, however, whether the results of processing symbolic data are an artifact of the numerical mapping or an inherent property of the symbolic data. This issue has long been ignored in the engineering and scientific literature. Many of the results obtained in symbolic signal processing could be a byproduct of the mapping and might not shed any light on the underlying properties embedded in the data. Moreover, in many applications, conflicting conclusions may arise from the choice of mapping used for the numerical representation of symbolic data. In this…
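To make the concern concrete, the following Python sketch (not taken from the paper) maps a toy DNA string under three hypothetical numerical mappings and compares a normalized lag-1 autocorrelation. The sequence, the mappings, and the function names are assumptions chosen for illustration. The statistic is an ordinary Pearson-style correlation, not the paper's equivalence criterion.

```python
from statistics import fmean


def map_sequence(seq, mapping):
    """Replace each symbol with its assigned number."""
    return [mapping[s] for s in seq]


def normalized_autocorrelation(x, lag):
    """Pearson-style correlation between x[t] and x[t + lag]."""
    n = len(x) - lag
    a, b = x[:n], x[lag:]
    mean_a, mean_b = fmean(a), fmean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical toy sequence and mappings (assumptions, not from the paper).
seq = "AACGTTGCAG"
map_a = {"A": 0, "C": 1, "G": 2, "T": 3}
map_b = {s: 2 * v + 5 for s, v in map_a.items()}  # affine rescaling of map_a
map_c = {"A": 0, "G": 1, "C": 2, "T": 3}          # values of C and G swapped

r_a = normalized_autocorrelation(map_sequence(seq, map_a), 1)
r_b = normalized_autocorrelation(map_sequence(seq, map_b), 1)
r_c = normalized_autocorrelation(map_sequence(seq, map_c), 1)

print(f"map_a: {r_a:.4f}")
print(f"map_b: {r_b:.4f}")
print(f"map_c: {r_c:.4f}")
```

In this sketch the affine rescaling leaves the statistic unchanged, while swapping the values assigned to C and G changes it. The same symbolic sequence therefore yields different numerical conclusions depending on the mapping, which is the kind of dependence the abstract warns about.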
