Improved space-time tradeoffs for approximate full-text indexing with one edit error

Djamal Belazzougui

arXiv:1103.2167 · cs.DS · August 25, 2014

TL;DR

This paper introduces new data structures for approximate full-text indexing with one edit error, achieving improved space and time tradeoffs for both unbounded and constant alphabets.

Contribution

The paper presents two novel indexing methods that improve space and query time efficiency for approximate substring matching with one edit error.

Findings

01

Unbounded alphabet index uses O(n log^ε n) words of space with O(m+occ) query time.

02

The constant-alphabet index relies on compressed text indices and comes in two variants; the first uses O(n log^ε n) bits of space.

03

Both results improve on prior work; the unbounded-alphabet index improves on the result of Cole et al. in space and time simultaneously.

Abstract

In this paper we are interested in indexing texts for substring matching queries with one edit error. That is, given a text $T$ of $n$ characters over an alphabet of size $\sigma$, we are asked to build a data structure that answers the following query: find all the $occ$ substrings of the text that are at edit distance at most $1$ from a given string $q$ of length $m$. We show two new results for this problem. The first result, suitable for an unbounded alphabet, uses $O(n \log^{\epsilon} n)$ words of space (where $\epsilon$ is any constant such that $0 < \epsilon < 1$) and answers queries in time $O(m + occ)$. This improves simultaneously in space and time over the result of Cole et al. The second result, suitable only for a constant alphabet, relies on compressed text indices and comes in two variants: the first variant uses $O(n \log^{\epsilon} n)$ bits of space and answers…
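To make the query in the abstract concrete, here is a minimal brute-force sketch of what such an index has to answer. It is not the paper's data structure: it scans the text directly with no index, and the function names `within_one_edit` and `naive_one_error_search` are hypothetical, introduced only for illustration. It also assumes that each matching substring is reported as a half-open `(start, end)` interval, which may differ from how the paper counts $occ$.

```python
def within_one_edit(a: str, b: str) -> bool:
    """Return True if the edit distance between a and b is at most 1."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # ensure len(a) <= len(b)
    # Skip the longest common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # Zero or one substitution: everything after position i must match.
        return a[i + 1:] == b[i + 1:]
    # b is one character longer: drop b[i], the rest must match.
    return a[i:] == b[i + 1:]


def naive_one_error_search(text: str, q: str) -> list[tuple[int, int]]:
    """Illustrative baseline (an assumption, not the paper's method):
    list every substring text[start:end] within edit distance 1 of q."""
    n, m = len(text), len(q)
    hits = []
    # A match can only have length m - 1, m, or m + 1.
    for length in (m - 1, m, m + 1):
        if length < 1:
            continue
        for start in range(n - length + 1):
            if within_one_edit(text[start:start + length], q):
                hits.append((start, start + length))
    return sorted(hits)


if __name__ == "__main__":
    print(naive_one_error_search("abcde", "bxd"))
```

This baseline takes roughly $O(nm)$ time per query, since it compares $q$ against every candidate substring. The results summarized above avoid that scan by preprocessing the text so that query time depends on $m$ and $occ$ rather than on $n$.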

Taxonomy

Topics: Algorithms and Data Compression · DNA and Biological Computing · Network Packet Processing and Optimization