KeBaB: $k$-mer based breaking for finding long MEMs
Nathaniel K. Brown, Lore Depuydt, Mohsen Zakeri, Anas Alhadi, Nour Allam, Dove Begleiter, Nithin Bharathi Kabilan Karpagavalli, Suchith Sridhar Khajjayam, Hamza Wahed, Travis Gagie, Ben Langmead

TL;DR
This paper introduces KeBaB, a $k$-mer-based filtering method that uses a Bloom filter to speed up the detection of long maximal exact matches (MEMs) in genomic sequences, with applications such as metagenomic classification.
Contribution
The paper presents a fast, space-efficient $k$-mer filtration step using a Bloom filter that lets MEM-finding tools such as ropebwt3 ignore more of their input, speeding them up further.
Findings
Speeds up long-MEM finding with tools such as ropebwt3
Lets MEM-finders skip more of their input at the cost of a small, space-efficient Bloom filter
Accelerates metagenomic classification without significantly hurting accuracy
Abstract
Long maximal exact matches (MEMs) are used in many genomics applications such as read classification and sequence alignment. Li's ropebwt3 finds long MEMs quickly because it can often ignore much of its input. In this paper we show that a fast and space-efficient $k$-mer filtration step using a Bloom filter speeds up MEM-finders such as ropebwt3 even further by letting them ignore even more. We also show experimentally that our approach can accelerate metagenomic classification without significantly hurting accuracy.
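The abstract states the idea only at a high level, so the following is a minimal sketch of how a $k$-mer filter can let a MEM-finder ignore input; it is not the authors' implementation. It assumes the filter stores every $k$-mer of the reference and that only matches of at least some minimum length are wanted. The names `BloomFilter`, `build_kmer_filter`, and `break_query`, the BLAKE2b-based hashing, and the parameters `num_bits`, `num_hashes`, and `min_len` are all hypothetical. The reasoning it relies on: every $k$-mer inside an exact match occurs in the reference, and a Bloom filter has no false negatives, so any query $k$-mer the filter rejects cannot lie inside a match. Breaking the query at such $k$-mers and discarding pieces shorter than `min_len` therefore never loses a long match.

```python
import hashlib
from collections.abc import Container


class BloomFilter:
    """Minimal Bloom filter over strings (hypothetical sketch): a stored item is
    never reported absent (no false negatives), but an unstored item may be
    reported present (false positive)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive each hash function by prefixing the item with a different seed.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item: str) -> bool:
        return all((self.bits[p >> 3] >> (p & 7)) & 1 for p in self._positions(item))


def build_kmer_filter(reference: str, k: int, num_bits: int = 1 << 16,
                      num_hashes: int = 3) -> BloomFilter:
    """Insert every k-mer of the reference into a Bloom filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for i in range(len(reference) - k + 1):
        bf.add(reference[i:i + k])
    return bf


def break_query(query: str, kmers: Container[str], k: int,
                min_len: int) -> list[tuple[int, int]]:
    """Split the query at k-mers rejected by the filter.

    Returns half-open intervals [start, end) such that every exact match of
    length >= min_len lies inside one of them; shorter pieces are discarded
    and would never be handed to the MEM-finder.
    """
    fragments = []
    start = 0  # leftmost position a surviving match could start at
    for i in range(len(query) - k + 1):
        if query[i:i + k] not in kmers:
            # A match may overlap this k-mer but cannot contain all of it:
            # it either stops short of position i + k - 1 or starts after i.
            end = i + k - 1
            if end - start >= min_len:
                fragments.append((start, end))
            start = i + 1
    if len(query) - start >= min_len:
        fragments.append((start, len(query)))
    return fragments
```

A false positive only leaves a fragment longer than necessary; it never causes a long match to be dropped, which is why a compact probabilistic filter is acceptable here. Passing a plain Python `set` of reference $k$-mers gives an exact but larger filter: with reference `ACGTTGCAACGT`, query `ACGTTGGGGCAAC`, `k=3`, and `min_len=4`, `break_query` returns `[(0, 6), (8, 13)]`, so in this sketch only those two fragments would be searched for MEMs.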
Taxonomy
Topics: Algorithms and Data Compression · Genomics and Phylogenetic Studies · Gene expression and cancer classification
