Space-Efficient Text Indexing with Mismatches using Function Inversion

Jackson Bibbens; Levi Borevitz; Samuel McCauley

arXiv:2604.01307·cs.DS·April 3, 2026

TL;DR

This paper introduces a space-efficient text indexing data structure that supports approximate pattern matching with mismatches, achieving improved query times and space tradeoffs, including the first sublinear-space solutions.

Contribution

It presents the first linear-space data structure with improved query times for approximate pattern matching, and introduces the first sublinear-space succinct data structure.

Findings

01

Achieves $O(n)$ space with query time $\tilde{O}(|q| + \text{polylog}(n))$ for $k \neq 2$

02

Provides the first sublinear-space data structure for this problem

03

Improves performance of both the CGL tree and Fiat-Naor data structures

Abstract

A classic data structure problem is to preprocess a string $T$ of length $n$ so that, given a query $q$, we can quickly find all substrings of $T$ with Hamming distance at most $k$ from the query string. Variants of this problem have seen significant research both in theory and in practice. For a wide parameter range, the best worst-case bounds are achieved by the "CGL tree" (Cole, Gottlieb, Lewenstein 2004), which achieves query time roughly $\tilde{O}(|q| + \log^{k} n + \#\mathrm{occ})$, where $\#\mathrm{occ}$ is the size of the output, and space $O(n \log^{k} n)$. The CGL tree space was recently improved to $O(n \log^{k-1} n)$ (Kociumaka, Radoszewski 2026). A natural question is whether a high space bound is necessary. How efficient can we make queries when the data structure is constrained to $O(n)$ space? While this question has seen extensive research, all known results have query time with unfavorable…
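To make the problem statement concrete, here is a minimal brute-force sketch of the query the abstract describes: report every starting position in $T$ whose length-$|q|$ substring is within Hamming distance $k$ of $q$. This is only an index-free baseline for illustration, not the paper's data structure or the CGL tree; the function names `hamming_at_most` and `naive_k_mismatch_search` are hypothetical. It uses no extra space beyond the output but takes $O(n \cdot |q|)$ time per query, which is the cost that indexing aims to avoid.

```python
def hamming_at_most(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:
                # Stop early once the mismatch budget is exceeded.
                return False
    return True


def naive_k_mismatch_search(text: str, query: str, k: int) -> list[int]:
    """Brute-force baseline (hypothetical helper, not from the paper).

    Returns the start index of every substring of `text` with the same
    length as `query` and Hamming distance at most k from it.
    """
    m = len(query)
    return [
        i
        for i in range(len(text) - m + 1)
        if hamming_at_most(text[i:i + m], query, k)
    ]


if __name__ == "__main__":
    # With one allowed mismatch, "abd" also matches the query "abc".
    print(naive_k_mismatch_search("abcabdabc", "abc", 1))  # [0, 3, 6]
    print(naive_k_mismatch_search("abcabdabc", "abc", 0))  # [0, 6]
```

The number of reported positions corresponds to $\#\mathrm{occ}$ in the query-time bounds above.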
