Topological Sequence Analysis of Genomes: Category Theory Approaches
Jian Liu, Li Shen, Mushal Zia, and Guo-Wei Wei

TL;DR
This paper introduces a category-theoretic method for topological sequence analysis of genomes. The method captures hierarchical structure in a sequence and extracts multi-scale features for tasks such as phylogenetics and binding prediction.
Contribution
It presents a new categorical topological framework for genome analysis that moves beyond traditional alignment-free methods by incorporating structured mathematical formalisms.
Findings
CTSA (category-based topological sequence analysis) outperforms six state-of-the-art methods in key tasks.
The approach provides consistent and robust topological signatures.
It demonstrates the potential of category theory in biological sequence analysis.
Abstract
Sequence data, such as DNA, RNA, and protein sequences, exhibit intricate, multi-scale structures that pose significant challenges for conventional analysis methods, particularly those relying on alignment or purely statistical representations. In this work, we introduce category-based topological sequence analysis (CTSA) of genomes. CTSA models a sequence as a resolution category, capturing its hierarchical structure through a categorical construction. Substructure complexes are then derived from this categorical representation, and their persistent homology is computed to extract multi-scale topological features. Our models depart from traditional alignment-free approaches by incorporating structured mathematical formalisms rooted in sequence topology. The resulting topological signatures provide informative representations across a variety of tasks, including the phylogenetic…
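To make the "sequence to persistent homology to feature vector" idea in the abstract concrete, here is a minimal Python sketch. It is not the paper's resolution-category construction, which the abstract does not spell out. As a deliberately simple stand-in, it treats the positions of each nucleotide as points on a line and computes only 0-dimensional persistence of the Vietoris-Rips filtration on those points. All function names (`kmer_positions`, `h0_death_scales`, `betti0_curve`, `featurize`) and the choice of scales are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def kmer_positions(seq, k=1):
    """Map each k-mer to the increasing list of positions where it occurs."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return dict(positions)


def h0_death_scales(points):
    """Finite H0 death scales of the Rips filtration on points of a line.

    Every point is its own component at scale 0. Two neighbouring points
    merge once the scale reaches the gap between them, so the finite
    death scales are exactly the gaps between consecutive sorted points.
    """
    pts = sorted(points)
    return sorted(b - a for a, b in zip(pts, pts[1:]))


def betti0_curve(points, scales):
    """Number of connected components at each scale r (edge if gap <= r)."""
    deaths = h0_death_scales(points)
    n = len(points)
    return [n - sum(d <= r for d in deaths) for r in scales]


def featurize(seq, alphabet="ACGT", scales=(1, 2, 4, 8)):
    """Concatenate the Betti-0 curves of all letters into one vector.

    Assumption: one letter at a time (k=1) and a fixed, arbitrary set of
    scales; the paper's actual features are richer than this.
    """
    pos = kmer_positions(seq, k=1)
    features = []
    for letter in alphabet:
        features.extend(betti0_curve(pos.get(letter, []), scales))
    return features


if __name__ == "__main__":
    print(featurize("ACGTACGGA"))
```

Vectors produced this way have a fixed length for any input sequence, so they can be compared with an ordinary distance without aligning the sequences, which is the general property the abstract attributes to topological signatures.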
Taxonomy
Topics: Fractal and DNA sequence analysis · Genomics and Phylogenetic Studies · Genome Rearrangement Algorithms
