Binary Jumbled String Matching for Highly Run-Length Compressible Texts

Golnaz Badkobeh; Gabriele Fici; Steve Kroon; Zsuzsanna Lipták

arXiv:1206.2523·cs.DS·June 3, 2013

TL;DR

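This paper introduces an index for binary jumbled string matching that is built directly from the run-length encoding of the text. For texts with high run-length compressibility, construction is faster than for earlier indexes, at the cost of $O(\log \rho)$ rather than constant query time, where $\rho$ is the length of the run-length encoding. The construction is also simple to implement.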

Contribution

It proposes constructing the index directly from the run-length encoding of the text, which lowers construction time and may reduce index size for highly compressible texts.

Findings

01

Index construction time is $O(n + \rho^2 \log \rho)$.

02

Query time is $O(\log \rho)$.

03

The index size is often close to the length of the run-length encoding.

Abstract

The Binary Jumbled String Matching problem is defined as: Given a string $s$ over $\{a, b\}$ of length $n$ and a query $(x, y)$, with $x, y$ non-negative integers, decide whether $s$ has a substring $t$ with exactly $x$ $a$'s and $y$ $b$'s. Previous solutions created an index of size $O(n)$ in a pre-processing step, which was then used to answer queries in constant time. The fastest algorithms for construction of this index have running time $O(n^2/\log n)$ [Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or $O(n^2/\log^2 n)$ in the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index constructed directly from the run-length encoding of $s$. The construction time of our index is $O(n + \rho^2 \log \rho)$, where $O(n)$ is the time for computing the run-length encoding of $s$ and $\rho$ is the length of this encoding; this is no worse than previous solutions if $\rho = O(n/\log…
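To make the problem statement concrete, here is a minimal Python sketch of the kind of linear-size index the abstract attributes to previous solutions. It is not the run-length-based construction proposed in the paper. It relies on the well-known interval property of binary strings: a query $(x, y)$ has a match exactly when $x$ lies between the minimum and maximum number of $a$'s over all substrings of length $x + y$. The construction shown is a naive quadratic one, chosen for readability, and the function names `run_length_encode`, `build_naive_index`, and `has_jumbled_match` are hypothetical.

```python
from itertools import groupby


def run_length_encode(s):
    """Return the run-length encoding of s as (character, run length) pairs.

    The number of pairs is the quantity called rho in the abstract.
    """
    return [(ch, len(list(run))) for ch, run in groupby(s)]


def build_naive_index(s):
    """Naive baseline index (NOT the paper's method).

    index[m] holds the (min, max) number of 'a's over all substrings
    of s of length m. Takes O(n^2) time and O(n) space.
    """
    n = len(s)
    # prefix[i] = number of 'a's in s[:i]
    prefix = [0]
    for ch in s:
        prefix.append(prefix[-1] + (ch == "a"))
    index = [(0, 0)]  # length 0: the empty substring
    for m in range(1, n + 1):
        counts = [prefix[i + m] - prefix[i] for i in range(n - m + 1)]
        index.append((min(counts), max(counts)))
    return index


def has_jumbled_match(index, x, y):
    """Decide whether some substring has exactly x 'a's and y 'b's."""
    m = x + y
    if m >= len(index):  # window longer than the text
        return False
    lo, hi = index[m]
    return lo <= x <= hi


s = "abbaaab"
index = build_naive_index(s)
print(run_length_encode(s))            # four runs, so rho = 4
print(has_jumbled_match(index, 3, 1))  # True: "aaab" occurs in s
print(has_jumbled_match(index, 0, 3))  # False: s has no substring "bbb"
```

Once the index is built, each query is a single table lookup, which matches the constant query time of the earlier solutions. The expensive part is the construction, and that is the step the paper targets.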
