The CDAWG Index and Pattern Matching on Grammar-Compressed Strings
Alan M. Cleary, Joseph Winjum, Jordan Dood, Shunsuke Inenaga

TL;DR
This paper proposes using the CDAWG as an index for grammar-compressed strings. Pattern matching runs in O(ra(m)+occ) time with only O(er(T)) additional space, and the approach is evaluated experimentally.
Contribution
It extends the CDAWG index to grammar-compressed strings, enabling efficient pattern matching with theoretical guarantees and practical performance improvements.
Findings
Pattern matching on grammar-compressed strings in O(ra(m)+occ) time.
The CDAWG index requires only O(er(T)) additional space, matching theoretical bounds.
Experiments show state-of-the-art performance even with naive random access.
Abstract
The compact directed acyclic word graph (CDAWG) is the minimal compact automaton that recognizes all the suffixes of a string. Classically, the CDAWG has been implemented as an index of the string it recognizes, requiring space for a copy of the string T being indexed. In this work, we propose using the CDAWG as an index for grammar-compressed strings. While this enables all analyses supported by the CDAWG on any grammar-compressed string, in this work we specifically consider pattern matching. Using the CDAWG index, pattern matching can be performed on any grammar-compressed string in O(ra(m)+occ) time while requiring only O(er(T)) additional space, where m is the length of the pattern, ra(m) is the grammar random access time, occ is the number of occurrences of the pattern in T, and…
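To make the ra(m) term concrete, here is a minimal sketch of naive random access on a grammar-compressed string. It is not the authors' implementation. It assumes the grammar is a straight-line program in which every rule is either a single character or a pair of symbols, and the names RULES, expansion_lengths, access, and extract are hypothetical. A CDAWG typically stores its edge labels as references into the text, so when the text is available only as a grammar, reading a label would go through a routine like extract.

```python
from typing import Dict, Tuple, Union

# Hypothetical toy grammar: each rule is a terminal character or a pair of symbols.
Rule = Union[str, Tuple[str, str]]
RULES: Dict[str, Rule] = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),    # ab
    "X2": ("X1", "A"),   # aba
    "X3": ("X2", "X1"),  # abaab
    "X4": ("X3", "X2"),  # abaababa
}


def expansion_lengths(rules: Dict[str, Rule]) -> Dict[str, int]:
    """Compute the length of the string each symbol expands to (memoized)."""
    memo: Dict[str, int] = {}

    def length(sym: str) -> int:
        if sym not in memo:
            rule = rules[sym]
            if isinstance(rule, str):
                memo[sym] = 1
            else:
                memo[sym] = length(rule[0]) + length(rule[1])
        return memo[sym]

    for sym in rules:
        length(sym)
    return memo


def access(rules: Dict[str, Rule], lens: Dict[str, int], start: str, i: int) -> str:
    """Return character i of the expansion of `start` by walking down the parse tree."""
    if not 0 <= i < lens[start]:
        raise IndexError(i)
    sym = start
    while not isinstance(rules[sym], str):
        left, right = rules[sym]
        if i < lens[left]:
            sym = left           # position falls in the left child
        else:
            i -= lens[left]      # skip the left child's expansion
            sym = right
    return rules[sym]


def extract(rules: Dict[str, Rule], lens: Dict[str, int], start: str, i: int, m: int) -> str:
    """Naively extract m consecutive characters, one root-to-leaf walk per character."""
    return "".join(access(rules, lens, start, i + k) for k in range(m))


LENS = expansion_lengths(RULES)
```

Each call to access costs time proportional to the height of the grammar, so this naive extract spends roughly m times the grammar height reading m characters. Faster random-access structures for grammars would lower that cost without changing the interface.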
Taxonomy
Topics: Algorithms and Data Compression · Speech Recognition and Synthesis · Natural Language Processing Techniques
