Binary Jumbled Indexing: Suffix tree histogram

Luís Cunha; Mário Medina

arXiv:2501.00111 · cs.DS · January 3, 2025

TL;DR

This paper investigates the Binary Jumbled Indexing Problem, analyzes its average-case complexity, and introduces a suffix-tree-based algorithm that improves practical performance at similar theoretical complexity.

Contribution

It presents a new suffix tree-based algorithm, SFTree, which reduces memory access overhead and improves practical performance for binary jumbled indexing.

Findings

01

The average number of runs is n/4, confirming that the quadratic behavior also holds in the average case.

02

SFTree outperforms existing algorithms in practical scenarios.

03

Both algorithms have similar theoretical complexity, but SFTree is more efficient in practice.

Abstract

Given a binary string $ω$ over the alphabet $\{0, 1\}$, a vector $(a, b)$ is a Parikh vector if and only if a factor of $ω$ contains exactly $a$ occurrences of $0$ and $b$ occurrences of $1$. Answering whether a vector is a Parikh vector of $ω$ is known as the Binary Jumbled Indexing Problem (BJPMP) or the Histogram Indexing Problem. Most solutions to this problem rely on an $O(n)$ word-space index to answer queries in constant time, encoding the Parikh set of $ω$, i.e., all its Parikh vectors. Cunha et al. (Combinatorial Pattern Matching, 2017) introduced an algorithm (JBM2017), which computes the index table in $O(n + ρ^{2})$ time, where $ρ$ is the number of runs of identical digits in $ω$, leading to $O(n^{2})$ in the worst case. We prove that the average number of runs $ρ$ is $n/4$, confirming the quadratic behavior also in the average-case. We propose a…
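To make the definition of a Parikh vector and the $O(n)$-word index concrete, here is a minimal Python sketch. It relies on the standard interval property of binary strings: for each factor length $k$, the possible counts of $1$ form a contiguous range, so storing the minimum and maximum per length is enough to answer a query in constant time. The construction shown is a naive $O(n^{2})$ scan, not JBM2017 and not the suffix-tree algorithm of the paper. The names `build_index` and `is_parikh_vector` are hypothetical, and accepting $(0, 0)$ via the empty factor is an assumed convention.

```python
from typing import List, Tuple

Index = Tuple[List[int], List[int]]


def build_index(w: str) -> Index:
    """Naive O(n^2) construction of the min/max table (illustrative only).

    lo[k] and hi[k] hold the fewest and the most 1s found in any
    factor of w of length k. This is NOT the paper's algorithm.
    """
    n = len(w)
    # prefix[i] = number of 1s in w[:i]
    prefix = [0]
    for ch in w:
        prefix.append(prefix[-1] + (ch == "1"))
    lo = [0] * (n + 1)
    hi = [0] * (n + 1)
    for k in range(1, n + 1):
        counts = [prefix[i + k] - prefix[i] for i in range(n - k + 1)]
        lo[k], hi[k] = min(counts), max(counts)
    return lo, hi


def is_parikh_vector(index: Index, a: int, b: int) -> bool:
    """Constant-time query: does some factor have a zeros and b ones?"""
    lo, hi = index
    k = a + b  # length of the factor being asked about
    if k < 0 or k >= len(lo):
        return False
    # Interval property: every count between lo[k] and hi[k] is attained.
    return lo[k] <= b <= hi[k]


index = build_index("0110")
print(is_parikh_vector(index, 1, 1))  # factor "01"
print(is_parikh_vector(index, 2, 0))  # would need the factor "00"
```

For $ω = 0110$, the query $(1, 1)$ succeeds because the factor $01$ exists, while $(2, 0)$ fails because no factor of length 2 contains two zeros.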

Taxonomy

Topics: Algorithms and Data Compression · Video Analysis and Summarization · Music and Audio Processing