Computing Phylo-k-mers
Nikolai Romashchenko (MAB), Benjamin Linard (MAB), Fabio Pardi (MAB), Eric Rivals (MAB)

TL;DR
This paper introduces efficient algorithms for computing phylo-k-mers, which are k-mers predicted, with an associated probability score, to occur at specific nodes of a phylogenetic tree. Phylo-k-mers enable alignment-free phylogenetic classification.
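To make the notion of a probability score concrete, here is a minimal Python sketch of scoring one k-mer at one tree node. It assumes, for illustration only, that the phylogenetic model has already produced a probability for each nucleotide at each of the k alignment columns of a window, and that sites are independent, so a k-mer's score is the product of its per-column probabilities. The names `Window` and `kmer_score` and the probability values are hypothetical, not taken from the paper.

```python
import math

# Hypothetical per-column probabilities for one window at one tree node:
# each entry maps a nucleotide to its probability at that alignment column.
# Assumption: sites are independent, so a k-mer's score is a product.
Window = list[dict[str, float]]


def kmer_score(kmer: str, window: Window) -> float:
    """Score of `kmer` over `window` under the site-independence assumption."""
    if len(kmer) != len(window):
        raise ValueError("k-mer length must match window length")
    return math.prod(column[ch] for ch, column in zip(kmer, window))


window = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.6, "G": 0.2, "T": 0.1},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
print(kmer_score("ACG", window))  # ≈ 0.105 (0.7 * 0.6 * 0.25)
```

With 4^k possible k-mers per window and node, scoring every candidate this way quickly becomes expensive, which is what motivates dedicated enumeration algorithms.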
Contribution
The paper presents novel divide-and-conquer algorithms for phylo-k-mer computation, improving efficiency over existing branch-and-bound methods.
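The divide-and-conquer idea can be sketched in Python as follows, assuming the same hypothetical product-of-columns scoring and a threshold ε. The window is split in half. Because the right half's score can never exceed the product of its column maxima, the left half of any qualifying k-mer must score at least ε divided by that maximum (and symmetrically for the right half). Each half is enumerated recursively with its relaxed threshold, and the two candidate lists are then merged, scanning right halves in decreasing score order so the inner loop can stop early. This is an illustrative reconstruction of the general technique; `enumerate_dc` and `Column` are hypothetical names, and the paper's algorithms may differ in detail.

```python
import math

# Hypothetical: one alignment column, mapping nucleotide -> probability.
Column = dict[str, float]


def _max_score(cols: list[Column]) -> float:
    # Upper bound on the score of any string over these columns.
    return math.prod(max(col.values()) for col in cols)


def enumerate_dc(cols: list[Column], eps: float) -> list[tuple[str, float]]:
    """All strings over `cols` whose product score is >= eps (divide and conquer)."""
    if len(cols) == 1:
        return [(ch, p) for ch, p in cols[0].items() if p >= eps]
    mid = len(cols) // 2
    left_cols, right_cols = cols[:mid], cols[mid:]
    # A qualifying k-mer needs left * right >= eps with right <= max(right),
    # so its left half must score at least eps / max(right), and vice versa.
    lefts = enumerate_dc(left_cols, eps / _max_score(right_cols))
    rights = enumerate_dc(right_cols, eps / _max_score(left_cols))
    rights.sort(key=lambda item: item[1], reverse=True)
    result = []
    for l_str, l_score in lefts:
        for r_str, r_score in rights:
            if l_score * r_score < eps:
                break  # rights are sorted, so no later right half can qualify
            result.append((l_str + r_str, l_score * r_score))
    return result
```

A call such as `enumerate_dc(cols, 0.01)` returns `(k-mer, score)` pairs in no particular order.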
Findings
Divide-and-conquer algorithms outperform branch-and-bound in large-scale scenarios.
Algorithms effectively handle large numbers of k-mers with high probability thresholds.
Empirical evaluation demonstrates improved computational performance on real and simulated data.
Abstract
Phylogenetically informed k-mers, or phylo-k-mers for short, are k-mers that are predicted to appear within a given genomic region at predefined locations of a fixed phylogeny. Given a reference alignment for this genomic region and assuming a phylogenetic model of sequence evolution, we can compute a probability score for any given k-mer at any given tree node. The k-mers with sufficiently high probabilities can later be used to perform alignment-free phylogenetic classification of new sequences, a procedure recently proposed for the phylogenetic placement of metabarcoding reads and the detection of novel virus recombinants. While computing phylo-k-mers, we need to consider large numbers of k-mers at each tree node, which warrants the development of efficient enumeration algorithms. We consider a formal definition of the problem of phylo-k-mer computation: How to efficiently find all…
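The enumeration problem stated in the abstract can be made concrete as: given one window of k alignment columns at one node, list every k-mer whose score is at least a threshold ε. A common baseline is a depth-first branch-and-bound over prefixes, sketched below in Python under the hypothetical assumption that a k-mer's score is the product of independent per-column probabilities. A prefix is abandoned as soon as its score, multiplied by the best achievable score of the remaining columns, falls below ε. `enumerate_bb` and `Column` are hypothetical names; this is a generic version of the technique, not necessarily the exact variant used in earlier tools.

```python
import math

# Hypothetical: one alignment column, mapping nucleotide -> probability.
Column = dict[str, float]


def enumerate_bb(cols: list[Column], eps: float) -> list[tuple[str, float]]:
    """All strings over `cols` whose product score is >= eps (branch and bound)."""
    k = len(cols)
    # best_suffix[i] = highest score achievable on columns i..k-1.
    best_suffix = [1.0] * (k + 1)
    for i in range(k - 1, -1, -1):
        best_suffix[i] = best_suffix[i + 1] * max(cols[i].values())
    result: list[tuple[str, float]] = []

    def extend(prefix: str, score: float) -> None:
        i = len(prefix)
        if i == k:
            result.append((prefix, score))
            return
        for ch, p in cols[i].items():
            new_score = score * p
            # Bound: prune if even the best completion cannot reach eps.
            if new_score * best_suffix[i + 1] >= eps:
                extend(prefix + ch, new_score)

    extend("", 1.0)
    return result
```

Every reported k-mer satisfies the threshold, because at the last column `best_suffix[k]` is 1.0 and the bound reduces to the k-mer's own score.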
Taxonomy
Topics: Genomics and Phylogenetic Studies · Chromosomal and Genetic Variations · Gene expression and cancer classification
