Faster subsequence recognition in compressed strings

Alexander Tiskin

arXiv:0707.3407·cs.DS·November 10, 2011

Open Access

TL;DR

This paper presents an improved algorithm for local subsequence recognition in SLP-compressed strings, reducing the running time from O(𝑚̄ n^2 log n) to O(𝑚̄ n^{1.5}), where 𝑚̄ is the length of the compressed text and n is the length of the uncompressed pattern.

Contribution

The author develops a faster algorithm for local subsequence recognition on SLP-compressed strings, improving the running time from O(𝑚̄ n^2 log n) to O(𝑚̄ n^{1.5}), and shows that the same algorithm computes the longest common subsequence of a compressed text and an uncompressed pattern.
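
To see the size of the improvement, divide the earlier bound by the new one. The ratio below ignores constant factors, and the sample value $n = 2^{16}$ with a base-2 logarithm is an assumption chosen only for illustration.

$$\frac{\bar{m}\, n^{2} \log n}{\bar{m}\, n^{1.5}} = \sqrt{n}\,\log n, \qquad n = 2^{16}:\ \sqrt{n}\,\log_2 n = 256 \cdot 16 = 4096.$$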

Findings

01

Algorithm runs in O(𝑚̄ n^{1.5}) time for subsequence recognition.

02

Extension to longest common subsequence computation between a compressed text and an uncompressed pattern, in the same O(𝑚̄ n^{1.5}) time.

03

Improves on the previous O(𝑚̄ n^2 log n) bound of Cégielski et al. for local subsequence recognition.

Abstract

Computation on compressed strings is one of the key approaches to processing massive data sets. We consider local subsequence recognition problems on strings compressed by straight-line programs (SLP), which is closely related to Lempel-Ziv compression. For an SLP-compressed text of length $\bar{m}$, and an uncompressed pattern of length $n$, Cégielski et al. gave an algorithm for local subsequence recognition running in time $O(\bar{m} n^{2} \log n)$. We improve the running time to $O(\bar{m} n^{1.5})$. Our algorithm can also be used to compute the longest common subsequence between a compressed text and an uncompressed pattern in time $O(\bar{m} n^{1.5})$; the same problem with a compressed pattern is known to be NP-hard.
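
To make the setting concrete, here is a minimal Python sketch of an SLP and of the simpler global question: is the pattern a subsequence of the whole compressed text? It is not the paper's algorithm and it does not solve the local (window-based) problems the abstract refers to. It is the straightforward greedy-table technique, which takes time proportional to the number of rules times $n + 1$. The rule encoding, the function names, and the example grammar `FIB_SLP` are all assumptions made for illustration.

```python
from typing import List, Tuple, Union

# Assumed encoding: rule i is either a single character (terminal) or a
# pair (a, b) of indices of earlier rules whose expansions are concatenated.
# The last rule is taken to be the start symbol.
Rule = Union[str, Tuple[int, int]]


def expand(slp: List[Rule]) -> str:
    """Decompress the SLP. For checking only: output can be exponentially long."""
    memo: List[str] = []
    for rule in slp:
        if isinstance(rule, str):
            memo.append(rule)
        else:
            memo.append(memo[rule[0]] + memo[rule[1]])
    return memo[-1]


def is_subsequence_of_slp(slp: List[Rule], pattern: str) -> bool:
    """Test whether `pattern` is a subsequence of the text generated by `slp`.

    For each rule we store a table t where t[j] is the number of pattern
    characters matched after greedily scanning the rule's expansion,
    given that j characters were matched before it.
    """
    n = len(pattern)
    tables: List[List[int]] = []
    for rule in slp:
        if isinstance(rule, str):
            # A terminal advances the match by one if it is the next needed character.
            t = [j + 1 if j < n and pattern[j] == rule else j
                 for j in range(n + 1)]
        else:
            # A concatenation composes the tables of its two children.
            left, right = tables[rule[0]], tables[rule[1]]
            t = [right[left[j]] for j in range(n + 1)]
        tables.append(t)
    return tables[-1][0] == n


# Hypothetical example: Fibonacci-style rules generating "abaababa".
FIB_SLP: List[Rule] = ["b", "a", (1, 0), (2, 1), (3, 2), (4, 3)]

if __name__ == "__main__":
    print(expand(FIB_SLP))
    print(is_subsequence_of_slp(FIB_SLP, "bbb"))
    print(is_subsequence_of_slp(FIB_SLP, "bbbb"))
```

The table for a concatenation is the composition of its children's tables, so the text is never decompressed. Only `expand` materialises the full string, and it is included solely to verify small examples.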

Taxonomy

Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Parallel Computing and Optimization Techniques