Space-Efficient k-Mismatch Text Indexes

Tomasz Kociumaka; Jakub Radoszewski

arXiv:2510.26264·cs.DS·October 31, 2025

TL;DR

This paper introduces a $k$-mismatch text index that uses $O(n \log^{k-1} n)$ space, a logarithmic factor below the $O(n \log^{k} n)$ space of $k$-errata trees, while maintaining query efficiency. A further space reduction is obtained for constant-size alphabets.

Contribution

It presents the first $k$-mismatch index for general alphabets that improves on the space of $k$-errata trees, and it develops specialized indexes for short patterns.

Findings

01

Achieved $O(n \log^{k-1} n)$ space for a $k$-mismatch index with optimal query time.

02

Obtained a smaller index size of $O(n \log^{k-1.5+\varepsilon} n)$ for constant-size alphabets.

03

Developed improved indexes for short pattern matching.
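
To give a rough sense of scale for the bounds above, the following back-of-the-envelope comparison plugs in sample values. The choices $n = 2^{20}$, $k = 2$, base-2 logarithms, and $\varepsilon \to 0$ are assumptions made only for illustration; asymptotic notation hides constant factors, so these numbers indicate growth rates, not actual index sizes.

$$
\begin{aligned}
\log n &= 20, \\
n \log^{k} n &= n \cdot 20^{2} = 400\,n && \text{($k$-errata trees)}, \\
n \log^{k-1} n &= n \cdot 20^{1} = 20\,n && \text{(general alphabets)}, \\
n \log^{k-1.5} n &= n \cdot 20^{0.5} \approx 4.47\,n && \text{(constant-size alphabets, } \varepsilon \to 0\text{)}.
\end{aligned}
$$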

Abstract

A central task in string processing is text indexing, where the goal is to preprocess a text (a string of length $n$) into an efficient index (a data structure) supporting queries about the text. Cole, Gottlieb, and Lewenstein (STOC 2004) proposed $k$-errata trees, a family of text indexes supporting approximate pattern matching queries of several types. In particular, $k$-errata trees yield an elegant solution to $k$-mismatch queries, where we are to report all substrings of the text with Hamming distance at most $k$ to the query pattern. The resulting $k$-mismatch index uses $O(n \log^{k} n)$ space and answers a query for a length-$m$ pattern in $O(\log^{k} n \log\log n + m + occ)$ time, where $occ$ is the number of approximate occurrences. In retrospect, $k$-errata trees appear very well optimized: even though a large body of work has adapted $k$-errata trees to various settings…
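
To make the query semantics concrete, here is a minimal sketch of what a $k$-mismatch query is expected to return. It is a naive baseline that scans the text in $O(nm)$ time with no preprocessing, not the index from the paper or a $k$-errata tree; the function names `hamming_within` and `k_mismatch_occurrences` are hypothetical and chosen only for this illustration.

```python
def hamming_within(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                # Stop early once the mismatch budget is exceeded.
                return False
    return True


def k_mismatch_occurrences(text: str, pattern: str, k: int) -> list[int]:
    """Naive baseline: list every start position i such that the substring
    text[i:i+m] has Hamming distance at most k to the pattern (m = len(pattern))."""
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if hamming_within(text[i:i + m], pattern, k)
    ]


if __name__ == "__main__":
    # With k = 0 only the exact occurrence is reported; with k = 1 the
    # substring "bana" at position 0 also qualifies (one mismatch against "nana").
    print(k_mismatch_occurrences("banana", "nana", 0))
    print(k_mismatch_occurrences("banana", "nana", 1))
```

An index replaces the linear scan with a preprocessed structure, so that the query time depends on $m$, $occ$, and polylogarithmic factors in $n$ rather than on $n$ itself.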
