Space-Efficient k-Mismatch Text Indexes
Tomasz Kociumaka, Jakub Radoszewski

TL;DR
This paper introduces space-efficient $k$-mismatch text indexes that use less space than previous methods while preserving query efficiency, for both general and constant-size alphabets.
Contribution
The paper presents the first $k$-mismatch index that improves on the space of $k$-errata trees in the general case, and it develops specialized indexes for short patterns.
Findings
Achieves $O(n \log^{k-1} n)$ space for a $k$-mismatch index with the same query time as $k$-errata trees.
Obtains a smaller index of size $O(n \log^{k-1.5+\varepsilon} n)$ for constant-size alphabets.
Develops improved indexes for short patterns.
Abstract
A central task in string processing is text indexing, where the goal is to preprocess a text (a string of length $n$) into an efficient index (a data structure) supporting queries about the text. Cole, Gottlieb, and Lewenstein (STOC 2004) proposed $k$-errata trees, a family of text indexes supporting approximate pattern matching queries of several types. In particular, $k$-errata trees yield an elegant solution to $k$-mismatch queries, where we are to report all substrings of the text with Hamming distance at most $k$ to the query pattern. The resulting $k$-mismatch index uses $O(n \log^k n)$ space and answers a query for a length-$m$ pattern in $O(\log^k n \log\log n + m + occ)$ time, where $occ$ is the number of approximate occurrences. In retrospect, $k$-errata trees appear very well optimized: even though a large body of work has adapted $k$-errata trees to various settings…
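To make the query semantics concrete, here is a minimal Python sketch of what a $k$-mismatch query is expected to return. It is a naive scan with no preprocessing, not the paper's index or the $k$-errata tree; the function names `hamming_distance_at_most` and `k_mismatch_occurrences` are illustrative assumptions.

```python
from typing import List


def hamming_distance_at_most(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                return False  # stop early once the budget is exceeded
    return True


def k_mismatch_occurrences(text: str, pattern: str, k: int) -> List[int]:
    """Report every start position i such that text[i:i+m] is within
    Hamming distance k of pattern.

    Naive baseline (assumption: no index is built); runs in O(n * m) time.
    """
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if hamming_distance_at_most(text[i:i + m], pattern, k)
    ]
```

For example, `k_mismatch_occurrences("banana", "nan", 1)` returns `[0, 2]`, since `"ban"` differs from `"nan"` in one position and `"nan"` matches exactly. An index such as the one summarized above aims to answer the same query in time that depends on the pattern length and the number of occurrences rather than on a full scan of the text.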
