Contextual Pattern Matching

Gonzalo Navarro

arXiv:2010.07076·cs.DS·October 15, 2020·1 cites

Open Access

TL;DR

This paper introduces a new type of pattern matching query called contextual pattern matching for repetitive string collections, providing efficient solutions that leverage the text's repetitiveness to optimize space and time.

Contribution

It presents the first solution for contextual pattern matching that uses space related to text repetitiveness and achieves optimal query time.

Findings

01

Uses $O(\bar{r}\log(n/\bar{r}))$ space for the index.

02

Finds all contextual occurrences in $O(|P| + c \log n)$ time.

03

Provides various space/time tradeoffs for compressed and uncompressed indexes.

Abstract

The research on indexing repetitive string collections has focused on the same search problems used for regular string collections, though they can make little sense in this scenario. For example, the basic pattern matching query "list all the positions where pattern $P$ appears" can produce huge outputs when $P$ appears in an area shared by many documents. All those occurrences are essentially the same. In this paper we propose a new query that can be more appropriate in these collections, which we call contextual pattern matching. The basic query of this type gives, in addition to $P$, a context length $\ell$, and asks to report the occurrences of all distinct strings $XPY$, with $|X| = |Y| = \ell$. While this query is easily solved in optimal time and linear space, we focus on using space related to the repetitiveness of the text collection and present the first solution…
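
To make the query definition concrete, here is a minimal brute-force sketch in Python. It is not the paper's index: it scans the text directly, uses linear extra space, and ignores repetitiveness entirely. The function name `contextual_occurrences` and the choice to skip occurrences that lack a full context of length $\ell$ on either side are assumptions made for illustration only.

```python
def contextual_occurrences(text, pattern, ell):
    """Naive contextual pattern matching (illustrative baseline only).

    Returns a dict mapping each distinct string X + pattern + Y, with
    len(X) == len(Y) == ell, to the position of the first occurrence
    of `pattern` that has that context.
    Assumption: occurrences too close to either end of `text` to have
    a full context are skipped.
    """
    seen = {}
    m = len(pattern)
    start = text.find(pattern)
    while start != -1:
        lo, hi = start - ell, start + m + ell
        if lo >= 0 and hi <= len(text):
            # Keep only the first occurrence of each distinct context.
            seen.setdefault(text[lo:hi], start)
        # Advance by one so overlapping occurrences are also found.
        start = text.find(pattern, start + 1)
    return seen


if __name__ == "__main__":
    # "abc" occurs three times, but only two distinct contexts exist.
    print(contextual_occurrences("xabcy_xabcy_zabcy", "abc", 1))
    # {'xabcy': 1, 'zabcy': 13}
```

In the example, a plain pattern matching query would report three positions, while the contextual query reports two, one per distinct string $XPY$. The gap between the two counts grows with the amount of shared material in the collection.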

Taxonomy

Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · DNA and Biological Computing