Detecting phylogenetic relations out from sparse context trees
Florencia Leonardi, Sergio R. Matioli, Hugo A. Armelin, Antonio Galves

TL;DR
This paper introduces a novel method for measuring sequence similarity based on sparse context trees, enabling phylogenetic analysis through structure-aware distances, demonstrated on protein sequences.
Contribution
It proposes a new distance measure between sparse context trees derived from sequences, and implements a tool for phylogenetic reconstruction using this approach.
Findings
Successfully reconstructed a phylogenetic tree of globin proteins
The method compares favorably with PAM distance in phylogenetic accuracy
Provides a structure-based alternative for sequence similarity measurement
Abstract
The goal of this paper is to study the similarity between sequences using a distance between the context trees associated with the sequences. These trees are defined in the framework of Sparse Probabilistic Suffix Trees (SPST), and can be estimated using the SPST algorithm. We implement the Phyl-SPST package to compute the distance between the sparse context trees estimated with the SPST algorithm. The distance takes into account the structure of the trees, and indirectly the transition probabilities. We apply this approach to reconstruct a phylogenetic tree of protein sequences in the globin family of vertebrates. We compare this tree with the one obtained using the well-known PAM distance.
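To make the idea of a structure-based distance between context trees concrete, here is a minimal Python sketch. It is not the Phyl-SPST distance, whose definition is not given in this summary. As an illustrative stand-in it uses a normalized symmetric difference (Jaccard distance) between node sets. The names `tree_nodes` and `structural_distance` are hypothetical, each tree is assumed to be given as a plain set of context strings, and the sketch ignores the sparse (symbol-set) nodes that SPST allows.

```python
def tree_nodes(contexts):
    """Return every non-empty suffix of each context.

    Assumption: a context tree is given as a set of context strings, and
    each suffix of a context is a node on the path from the root to it.
    """
    return {c[i:] for c in contexts for i in range(len(c))}


def structural_distance(contexts_a, contexts_b):
    """Illustrative stand-in distance (NOT the Phyl-SPST definition).

    Fraction of nodes that appear in exactly one of the two trees,
    ranging from 0.0 (identical structure) to 1.0 (no shared nodes).
    """
    nodes_a = tree_nodes(contexts_a)
    nodes_b = tree_nodes(contexts_b)
    union = nodes_a | nodes_b
    if not union:
        return 0.0  # two empty trees are treated as identical
    return len(nodes_a ^ nodes_b) / len(union)


# Toy trees over an amino-acid alphabet (made-up contexts, not real data).
tree_a = {"A", "LA", "KV"}
tree_b = {"A", "GA", "KV"}
d_ab = structural_distance(tree_a, tree_b)
```

A matrix of such pairwise distances could then be passed to a standard distance-based tree-building method. Because this stand-in compares structure only, transition probabilities affect it only through which contexts the estimation step keeps.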
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Machine Learning in Bioinformatics
