A Searchable Compressed Edit-Sensitive Parsing
Naoya Kishiue, Masaya Nakahara, Shirou Maruyama, Hiroshi Sakamoto

TL;DR
This paper introduces a practical, compressed data structure for edit-sensitive parsing (ESP) that indexes a string and supports fast counting of substring occurrences; it is evaluated experimentally on benchmarks against other compressed indexes.
Contribution
It proposes a succinct representation of ESP trees that decomposes the equivalent grammar into two LOUDS bit strings and a single array, enabling indexing and substring search in compressed space.
Findings
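Uses (1+ε)n log n + 4n + o(n) bits for the ESP representation, for any 0 < ε < 1, where n is the number of variables in the grammar.
Supports substring occurrence counting in O((1/ε)(m log n + occ_c log m log u)) time, where m = |P| and u = |S|.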
Experimental results show competitive performance on benchmark datasets.
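To get a feel for the space bound, the leading terms can be evaluated numerically. The sketch below is illustrative only: the function name `esp_space_bits` is hypothetical, logarithms are assumed to be base 2, and the o(n) term is dropped because its exact form is not given here.

```python
import math


def esp_space_bits(n: int, eps: float) -> float:
    """Leading terms of the bound (1+eps) * n * log2(n) + 4n, in bits.

    Assumptions (not from the paper): log is base 2, and the o(n)
    lower-order term is ignored.
    """
    if n < 1:
        raise ValueError("n must be a positive number of grammar variables")
    if not 0 < eps < 1:
        raise ValueError("eps must satisfy 0 < eps < 1")
    return (1 + eps) * n * math.log2(n) + 4 * n


# Example: 1024 variables with eps = 0.5
bits = esp_space_bits(1024, 0.5)
```

A smaller ε shrinks the space but, through the 1/ε factor in the query bound, increases the counting time.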
Abstract
Practical data structures for edit-sensitive parsing (ESP) are proposed. Given a string S, its ESP tree is equivalent to a context-free grammar G generating just S, which is represented by a DAG. Using succinct data structures for trees and permutations, G is decomposed into two LOUDS bit strings and a single array in (1+ε)n log n + 4n + o(n) bits for any 0 < ε < 1, where n is the number of variables in G. The time to count occurrences of a pattern P in S is O((1/ε)(m log n + occ_c log m log u)), where m = |P|, u = |S|, and occ_c is the number of occurrences of a maximal common subtree in the ESPs of P and S. The efficiency of the proposed index is evaluated by experiments on several benchmarks, in comparison with other compressed indexes.
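For readers unfamiliar with LOUDS (level-order unary degree sequence), the following minimal sketch shows the standard encoding of an ordinal tree as a bit string: nodes are visited in breadth-first order and each node with d children contributes d ones followed by a zero. This is a generic illustration, not the authors' implementation; the function name `louds_encode` and the dictionary-based tree are assumptions made for the example, and the rank/select structures needed for navigation are omitted.

```python
from collections import deque


def louds_encode(children: dict, root) -> str:
    """Return the LOUDS bit string of the tree rooted at `root`.

    `children` maps a node to the ordered list of its children;
    nodes missing from the mapping are treated as leaves.
    """
    bits = ["10"]  # conventional super-root with a single child (the root)
    queue = deque([root])
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        # Unary degree: one '1' per child, terminated by a '0'.
        bits.append("1" * len(kids) + "0")
        queue.extend(kids)
    return "".join(bits)


# Tree: a has children b and c; b has child d.
tree = {"a": ["b", "c"], "b": ["d"]}
encoded = louds_encode(tree, "a")  # "101101000"
```

A tree with k nodes yields 2k + 1 bits under this convention, which is why LOUDS is attractive as a building block for a compressed index.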
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Advanced Image and Video Retrieval Techniques
