Topological Sequence Analysis of Genomes: Delta Complex approaches
Jian Liu, Li Shen, Dong Chen, and Guo-Wei Wei

TL;DR
This paper introduces topological sequence analysis (TSA) methods that use algebraic topology techniques such as $\Delta$-complexes and persistent homology to analyze genome sequences, demonstrating applications in phylogenetics and potential for other sequential data.
Contribution
It presents novel TSA techniques based on $\Delta$-complexes and persistent Laplacians for genome analysis, improving efficiency over previous models.
Findings
Effective in phylogenetic analysis of Ebola virus and bacterial genomes
More efficient than the earlier TSA model, k-mer topology
Potential applications in linguistics, music, and social data analysis
Abstract
Algebraic topology has been widely applied to point cloud data to capture geometric shapes and topological structures. However, its application to genome sequence analysis remains rare. In this work, we propose topological sequence analysis (TSA) techniques by constructing $\Delta$-complexes and classifying spaces, leading to persistent homology and persistent path homology on genome sequences. We also develop $\Delta$-complex-based persistent Laplacians to facilitate the topological spectral analysis of genome sequences. Finally, we demonstrate the utility of the proposed TSA approaches in phylogenetic analysis using Ebola virus sequences and whole bacterial genomes. The present TSA methods are more efficient than the earlier TSA model, k-mer topology, and thus have the potential to be applied to other time-consuming sequential data analyses, such as those in linguistics, literature, music,…
