Parse indexing for choosing pseudo-MEMs

Travis Gagie

arXiv:2605.17574 · cs.DS · May 20, 2026

TL;DR

This paper introduces a parse-indexing method for selecting pseudo-MEMs efficiently and safely, removing the need to choose the parameter k and thereby speeding up searches for maximal exact matches (MEMs) in repetitive texts.

Contribution

It presents a parse-indexing approach that selects pseudo-MEMs safely without requiring the parameter k, improving on the k-mer based breaking (KeBaB) method.

Findings

01 Parse indexing guarantees safe pseudo-MEM selection.

02 Eliminates the need to choose the parameter k.

03 Improves search efficiency for MEMs.

Abstract

Brown et al. (2025) recently proposed a pre-processing step, called $k$-mer based breaking (KeBaB), to speed up searches for long maximal exact matches (MEMs) between patterns and an indexed repetitive text. They fix a parameter $k$ and build a Bloom filter for the distinct $k$-mers in the text. When given a pattern, they quickly separate the $k$-mers in it into those that probably occur in the text and those that certainly do not. They call the maximal substrings of the pattern consisting only of the former $k$-mers pseudo-MEMs. These pseudo-MEMs are guaranteed to contain all the MEMs of length at least $k$ of the pattern with respect to the text, and it is usually much faster to find the pseudo-MEMs and then find the MEMs in them than to find the MEMs in the pattern directly. KeBaB is particularly effective when we choose a threshold $L > k$ and discard the pseudo-MEMs of…
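The pre-processing step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Brown et al.'s implementation and not the parse-indexing method of this paper: the Bloom filter is hand-rolled, its size, number of hash functions and SHA-256-based hashing are arbitrary choices, and the names `BloomFilter`, `build_kmer_filter` and `pseudo_mems` are hypothetical. It also leaves out the threshold $L$, since the abstract is cut off before saying exactly how it is applied.

```python
import hashlib
from typing import Callable, Iterator, List, Tuple


class BloomFilter:
    """Minimal Bloom filter (hypothetical helper; sizes and hashing are illustrative)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by prefixing the hash index before SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False means "certainly not inserted"; True means "probably inserted".
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))


def build_kmer_filter(text: str, k: int, num_bits: int = 1 << 16,
                      num_hashes: int = 3) -> BloomFilter:
    """Insert the distinct k-mers of the text into a Bloom filter (hypothetical helper)."""
    bf = BloomFilter(num_bits, num_hashes)
    for kmer in {text[i:i + k] for i in range(len(text) - k + 1)}:
        bf.add(kmer)
    return bf


def pseudo_mems(pattern: str, k: int,
                maybe_in_text: Callable[[str], bool]) -> List[Tuple[int, int]]:
    """Return half-open intervals [start, end) of the maximal substrings of
    `pattern` (length >= k) all of whose k-mers pass `maybe_in_text`."""
    runs: List[Tuple[int, int]] = []
    start, end = None, 0
    for i in range(len(pattern) - k + 1):
        if maybe_in_text(pattern[i:i + k]):
            if start is None:
                start = i              # first k-mer of a new run
            end = i + k                # the run now extends to the end of this k-mer
        elif start is not None:
            runs.append((start, end))  # a rejected k-mer closes the current run
            start = None
    if start is not None:
        runs.append((start, end))
    return runs


text = "ACGTACGGT"
k = 3
bf = build_kmer_filter(text, k)
print(pseudo_mems("TTACGTTT", k, bf.might_contain))

# With an exact k-mer set instead of a Bloom filter the result is deterministic:
exact = {text[i:i + k] for i in range(len(text) - k + 1)}
print(pseudo_mems("TTACGTTT", k, exact.__contains__))  # [(1, 6)]
```

Because a Bloom filter never gives false negatives, every $k$-mer of a MEM of length at least $k$ passes the test, so each such MEM lies inside one of the returned intervals; false positives can only add, lengthen or merge intervals. The example also shows that a pseudo-MEM need not itself occur in the text: `TACGT` (interval (1, 6)) is not a substring of `ACGTACGGT`, although the MEMs `TACG` and `ACGT` inside it are.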
