Binary Jumbled Indexing: Suffix tree histogram
Luís Cunha, Mário Medina

TL;DR
This paper investigates the Binary Jumbled Indexing Problem, analyzes its average-case complexity, and introduces a suffix tree-based algorithm that improves practical performance despite similar theoretical complexity.
Contribution
It presents a new suffix tree-based algorithm, SFTree, which reduces memory access overhead and improves practical performance for binary jumbled indexing.
Findings
The average number of runs is n/4, confirming that the quadratic behavior also holds in the average case.
SFTree outperforms existing algorithms in practical scenarios.
Both algorithms have similar theoretical complexity, but SFTree is more efficient in practice.
Abstract
Given a binary string w over the alphabet {0, 1}, a vector (a, b) is a Parikh vector of w if and only if a factor of w contains exactly a occurrences of 0 and b occurrences of 1. Answering whether a vector is a Parikh vector of w is known as the Binary Jumbled Indexing Problem (BJPMP) or the Histogram Indexing Problem. Most solutions to this problem rely on an O(n) word-space index to answer queries in constant time, encoding the Parikh set of w, i.e., all its Parikh vectors. Cunha et al. (Combinatorial Pattern Matching, 2017) introduced an algorithm (JBM2017), which computes the index table in O(n + ρ^2) time, where ρ is the number of runs of identical digits in w, leading to O(n^2) in the worst case. We prove that the average number of runs is n/4, confirming the quadratic behavior also in the average-case. We propose a…
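To make the index idea concrete, here is a minimal Python sketch of a naive baseline. It is neither JBM2017 nor SFTree. It assumes the string is given as a Python str `w` of '0' and '1' characters, and it relies on the well-known interval property of binary strings: among all factors of a fixed length k, the number of 1s takes every value between its minimum and its maximum. Storing those two values per length therefore gives a linear-size table with constant-time queries. The function names `build_index`, `is_parikh_vector`, and `count_runs` are illustrative, not taken from the paper.

```python
from itertools import groupby


def build_index(w):
    """Naive O(n^2) construction: for every factor length k, record the
    minimum and maximum number of 1s over all factors of that length."""
    n = len(w)
    # prefix[i] = number of 1s in w[:i]
    prefix = [0]
    for ch in w:
        prefix.append(prefix[-1] + (ch == "1"))
    lo = [0] * (n + 1)
    hi = [0] * (n + 1)
    for k in range(1, n + 1):
        counts = [prefix[i + k] - prefix[i] for i in range(n - k + 1)]
        lo[k], hi[k] = min(counts), max(counts)
    return lo, hi


def is_parikh_vector(index, a, b):
    """O(1) query: does some factor have exactly a zeros and b ones?
    The empty factor is counted, so (0, 0) is accepted."""
    lo, hi = index
    k = a + b
    if a < 0 or b < 0 or k >= len(lo):
        return False
    return lo[k] <= b <= hi[k]


def count_runs(w):
    """Number of maximal blocks of identical digits in w."""
    return sum(1 for _ in groupby(w))
```

For example, with `w = "0110"` the factor "01" witnesses the vector (1, 1), while (2, 0) is rejected because "00" never occurs. The quadratic loop in `build_index` is the cost that run-based and suffix-tree-based constructions aim to reduce.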
Taxonomy
Topics: Algorithms and Data Compression · Video Analysis and Summarization · Music and Audio Processing
