KATKA: A KRAKEN-like tool with $k$ given at query time

Travis Gagie; Sana Kashgouli; Ben Langmead

arXiv:2206.06053 · cs.DS · August 23, 2022

TL;DR

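KATKA stores a phylogenetic tree so that, given a pattern and an integer $k$ at query time, it quickly returns, for each $k$-mer of the pattern, the root of the smallest subtree containing all the genomes in which that $k$-mer occurs. KRAKEN offers similar functionality but requires $k$ to be fixed at construction time.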

Contribution

It introduces a data structure over a phylogenetic tree of genomes that answers KRAKEN-style $k$-mer queries with $k$ supplied at query time rather than fixed at construction time.

Findings

01 Quickly returns the root of the smallest subtree containing all the genomes in which a given $k$-mer occurs

02 Accepts $k$ as a query-time parameter

03 Serves queries for any $k$ from one stored tree, whereas KRAKEN fixes $k$ at construction time

Abstract

We describe a new tool, KATKA, that stores a phylogenetic tree $T$ such that later, given a pattern $P[1..m]$ and an integer $k$, it can quickly return the root of the smallest subtree of $T$ containing all the genomes in which the $k$-mer $P[i..i+k-1]$ occurs, for $1 \leq i \leq m - k + 1$. This is similar to KRAKEN's functionality but with $k$ given at query time instead of at construction time.
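
To make the query semantics concrete, here is a minimal brute-force sketch in Python. It is not KATKA's data structure: it scans every genome for every $k$-mer, which is exactly the cost an index is meant to avoid. The names `naive_query`, `lca_of_leaves`, `PARENT`, and `GENOMES`, the parent-pointer tree representation, and the convention of returning `None` when a $k$-mer occurs in no genome are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy tree: root has children A and g3; A has children g1 and g2.
# Each node maps to its parent; the root maps to None.
PARENT: Dict[str, Optional[str]] = {
    "root": None, "A": "root", "g1": "A", "g2": "A", "g3": "root",
}
# Hypothetical genomes stored at the leaves.
GENOMES: Dict[str, str] = {"g1": "ACGTAC", "g2": "ACGTTT", "g3": "TTTGGA"}


def lca_of_leaves(parent: Dict[str, Optional[str]],
                  leaves: List[str]) -> Optional[str]:
    """Return the root of the smallest subtree containing all given leaves."""
    if not leaves:
        return None  # assumed convention: no occurrence, no subtree

    def path_to_root(node: Optional[str]) -> List[str]:
        path = []
        while node is not None:
            path.append(node)
            node = parent[node]
        return path  # ordered from the node up to the root

    # Keep only the ancestors shared by every leaf, deepest first.
    common = path_to_root(leaves[0])
    for leaf in leaves[1:]:
        ancestors = set(path_to_root(leaf))
        common = [v for v in common if v in ancestors]
    return common[0]


def naive_query(parent: Dict[str, Optional[str]], genomes: Dict[str, str],
                pattern: str, k: int) -> List[Optional[str]]:
    """For each k-mer P[i..i+k-1], report the smallest enclosing subtree root."""
    results = []
    for i in range(len(pattern) - k + 1):
        kmer = pattern[i:i + k]
        # Brute force: test every genome for an occurrence of this k-mer.
        hits = [name for name, seq in genomes.items() if kmer in seq]
        results.append(lca_of_leaves(parent, hits))
    return results


# The same stored tree answers queries for different k.
print(naive_query(PARENT, GENOMES, "ACGTT", 3))  # ['A', 'A', 'g2']
print(naive_query(PARENT, GENOMES, "ACGTT", 2))  # ['A', 'A', 'A', 'root']
```

With $k = 3$ the last $k$-mer, GTT, occurs only in `g2`, so the answer is that single leaf. With $k = 2$ the last $k$-mer, TT, also occurs in `g3`, which pushes the answer up to the root. Shorter $k$-mers tend to match more genomes and therefore resolve to shallower nodes, which is one reason choosing $k$ per query can be useful.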

Taxonomy

Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression · Biochemical and Structural Characterization