Pattern Matching on Grammar-Compressed Strings in Linear Time

Moses Ganardi; Paweł Gawrychowski

arXiv:2111.05016·cs.DS·November 10, 2021

Open Access

TL;DR

This paper presents a linear-time algorithm for pattern matching directly on grammar-compressed text, a setting motivated by highly repetitive data such as biological sequences.

Contribution

The authors develop the first linear-time algorithm for pattern matching on grammar-compressed strings, solving an open problem in compressed pattern matching.

Findings

01

Achieved $O(n+m)$ pattern matching time on grammar-compressed strings, where $n$ is the size of the compressed text and $m$ is the length of the uncompressed pattern.

02

Improved solutions for the weighted ancestor and substring concatenation problems.

03

Targeted highly repetitive data, such as biological sequences, as the motivating setting.

Abstract

The most fundamental problem considered in algorithms for text processing is pattern matching: given a pattern $p$ of length $m$ and a text $t$ of length $n$, does $p$ occur in $t$? Multiple versions of this basic question have been considered, and by now we know algorithms that are fast both in practice and in theory. However, the rapid increase in the amount of generated and stored data creates the need for algorithms that operate directly on compressed representations of data. In the compressed pattern matching problem we are given a compressed representation of the text, with $n$ being the length of the compressed representation and $N$ being the length of the text, and an uncompressed pattern of length $m$. The most challenging (and yet relevant when working with highly repetitive data, say biological information) scenario is when the chosen compression method is capable of…
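To make the problem statement concrete, the following is a minimal sketch of a grammar-compressed text and a naive baseline that decompresses before searching. It is not the paper's $O(n+m)$ algorithm: the baseline takes time at least proportional to $N$, which is exactly the cost compressed pattern matching aims to avoid. The rule representation and the names `expanded_length`, `expand`, `naive_occurs`, and `doubling_grammar` are assumptions made for illustration only.

```python
from typing import Dict, Tuple, Union

# Hypothetical representation: each nonterminal maps either to a single
# character or to a pair of nonterminals, so the grammar derives one string.
Rule = Union[str, Tuple[str, str]]
Grammar = Dict[str, Rule]


def expanded_length(grammar: Grammar, start: str) -> int:
    """Return N, the length of the derived text, without expanding it."""
    memo: Dict[str, int] = {}

    def length(symbol: str) -> int:
        if symbol not in memo:
            rule = grammar[symbol]
            if isinstance(rule, str):
                memo[symbol] = len(rule)
            else:
                memo[symbol] = length(rule[0]) + length(rule[1])
        return memo[symbol]

    return length(start)


def expand(grammar: Grammar, start: str) -> str:
    """Fully decompress the text (time and space proportional to N)."""
    memo: Dict[str, str] = {}

    def text(symbol: str) -> str:
        if symbol not in memo:
            rule = grammar[symbol]
            if isinstance(rule, str):
                memo[symbol] = rule
            else:
                memo[symbol] = text(rule[0]) + text(rule[1])
        return memo[symbol]

    return text(start)


def naive_occurs(grammar: Grammar, start: str, pattern: str) -> bool:
    """Baseline only: decompress, then run an ordinary substring search."""
    return pattern in expand(grammar, start)


def doubling_grammar(k: int) -> Grammar:
    """k + 1 rules deriving a text of length 2**k (start symbol f'X{k}')."""
    grammar: Grammar = {"X0": "a"}
    for i in range(1, k + 1):
        grammar[f"X{i}"] = (f"X{i - 1}", f"X{i - 1}")
    return grammar
```

With `doubling_grammar(30)`, the compressed representation has 31 rules while `expanded_length` reports a text of more than a billion characters, which illustrates why a running time that depends on $n$ rather than $N$ matters.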

Taxonomy

Topics: Algorithms and Data Compression · DNA and Biological Computing · Network Packet Processing and Optimization