Pattern Matching on Grammar-Compressed Strings in Linear Time
Moses Ganardi, Paweł Gawrychowski

TL;DR
This paper introduces a linear-time algorithm for pattern matching directly on grammar-compressed strings, significantly improving efficiency for highly repetitive data such as biological sequences.
Contribution
The authors develop the first linear-time algorithm for pattern matching on grammar-compressed strings, solving an open problem in compressed pattern matching.
Findings
Achieved $O(n+m)$ pattern matching time on grammar-compressed strings, where $n$ is the length of the compressed representation and $m$ is the length of the pattern.
Improved solutions for weighted ancestor and substring concatenation problems.
Targets the scenario most relevant to highly repetitive data, such as biological sequences.
Abstract
The most fundamental problem considered in algorithms for text processing is pattern matching: given a pattern $p$ of length $m$ and a text $t$ of length $n$, does $p$ occur in $t$? Multiple versions of this basic question have been considered, and by now we know algorithms that are fast both in practice and in theory. However, the rapid increase in the amount of generated and stored data brings the need of designing algorithms that operate directly on compressed representations of data. In the compressed pattern matching problem we are given a compressed representation of the text, with $n$ being the length of the compressed representation and $N$ being the length of the text, and an uncompressed pattern of length $m$. The most challenging (and yet relevant when working with highly repetitive data, say biological information) scenario is when the chosen compression method is capable of…
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Network Packet Processing and Optimization
