Does Preprocessing help in Fast Sequence Comparisons?

Elazar Goldenberg; Aviad Rubinstein; Barna Saha

arXiv:2108.09115 · cs.DS · August 23, 2021

TL;DR

This paper studies whether preprocessing each string separately can speed up the computation of exact and approximate edit distances, which matters when many pairs from the same pool of strings must be compared, and gives new algorithms in this model.

Contribution

It introduces preprocessing-based algorithms for exact permutation-LCS, exact edit distance when the distance is small, and $(7+o(1))$-approximate edit distance, so that each comparison in a large pool of strings only runs a query algorithm on inputs that were already preprocessed.

Findings

01

Exact permutation-LCS computation with $O(n \log n)$ preprocessing and $O(k \log n)$ query time, when the LCS has length $n - k$

02

Exact edit distance (when it is at most $k$) with $O(n \log n)$ preprocessing and $O(k^2 \log n)$ query time

03

Approximate edit distance within factor $(7+o(1))$ with $\tilde{O}(n^2)$ preprocessing and subquadratic query time
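The first finding concerns the LCS of two permutations. As a point of reference only, the sketch below shows how that problem fits the two-phase model using the classical reduction to longest increasing subsequence, not the paper's algorithm: preprocessing stores each permutation's position array independently, and the query maps one permutation through the other's position array and runs patience sorting. This query costs $O(n \log n)$, not the paper's $O(k \log n)$. The helper names `preprocess_permutation` and `permutation_lcs` are hypothetical, and both inputs are assumed to be permutations of $0, \dots, n-1$.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical preprocessed form: (values in order, inverse positions).
PermPrep = Tuple[List[int], List[int]]


def preprocess_permutation(p: List[int]) -> PermPrep:
    """Acts on one permutation alone: pos[v] = index of value v in p.
    Assumes p is a permutation of 0..n-1."""
    pos = [0] * len(p)
    for i, v in enumerate(p):
        pos[v] = i
    return list(p), pos


def permutation_lcs(pa: PermPrep, pb: PermPrep) -> int:
    """Query via the classical reduction (not the paper's O(k log n) query):
    LCS(a, b) = length of the longest increasing subsequence of
    [pos_b[v] for v in a], computed by patience sorting in O(n log n)."""
    seq_a, _ = pa
    _, pos_b = pb
    tails: List[int] = []  # tails[L] = smallest tail of an increasing run of length L + 1
    for v in seq_a:
        x = pos_b[v]
        idx = bisect_left(tails, x)
        if idx == len(tails):
            tails.append(x)
        else:
            tails[idx] = x
    return len(tails)
```

For example, `permutation_lcs(preprocess_permutation([0, 1, 2, 3, 4]), preprocess_permutation([1, 0, 2, 4, 3]))` returns 3, so $k = 2$ in the notation above. A common subsequence of two permutations is exactly a set of values whose positions increase in both, which is why the mapped sequence's longest increasing subsequence gives the LCS.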

Abstract

We study edit distance computation with preprocessing: the preprocessing algorithm acts on each string separately, and then the query algorithm takes as input the two preprocessed strings. This model is inspired by scenarios where we would like to compute edit distance between many pairs in the same pool of strings. Our results include:

Permutation-LCS: If the LCS between two permutations has length $n - k$, we can compute it exactly with $O(n \log n)$ preprocessing and $O(k \log n)$ query time.

Small edit distance: For general strings, if their edit distance is at most $k$, we can compute it exactly with $O(n \log n)$ preprocessing and $O(k^2 \log n)$ query time.

Approximate edit distance: For the most general input, we can approximate the edit distance to within factor $(7 + o(1))$ with preprocessing time $\tilde{O}(n^2)$ and query time…
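For the small-edit-distance result, one natural way to get a query bound of the same $O(k^2 \log n)$ shape, offered as an illustrative sketch and not as the authors' construction, is Landau-Vishkin diagonal extension (which needs $O(k^2)$ longest-common-extension queries) with each LCE answered by binary search over prefix hashes that every string computes on its own. Assumptions: a Karp-Rabin style hash modulo $2^{61}-1$ with one random base shared by the whole pool, so answers are correct only with high probability; preprocessing here is $O(n)$ per string. The names `Prep`, `preprocess`, `lce`, and `bounded_edit_distance` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: one modulus and one random base shared by every string in the
# pool, so hashes of separately preprocessed strings are comparable.
MOD = (1 << 61) - 1
BASE = random.randrange(1 << 20, MOD - 1)


@dataclass
class Prep:
    """Hypothetical per-string preprocessed form."""
    n: int
    h: List[int]   # h[i] = hash of s[:i]
    pw: List[int]  # pw[i] = BASE**i mod MOD


def preprocess(s: str) -> Prep:
    """Acts on one string alone, in O(n) time."""
    h = [0] * (len(s) + 1)
    pw = [1] * (len(s) + 1)
    for i, ch in enumerate(s):
        h[i + 1] = (h[i] * BASE + ord(ch)) % MOD
        pw[i + 1] = (pw[i] * BASE) % MOD
    return Prep(len(s), h, pw)


def _sub_hash(p: Prep, i: int, length: int) -> int:
    """Hash of s[i:i+length], read off the prefix hashes in O(1)."""
    return (p.h[i + length] - p.h[i] * p.pw[length]) % MOD


def lce(a: Prep, i: int, b: Prep, j: int) -> int:
    """Longest common extension of a[i:] and b[j:] via binary search (O(log n) hash checks)."""
    lo, hi = 0, min(a.n - i, b.n - j)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if _sub_hash(a, i, mid) == _sub_hash(b, j, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def bounded_edit_distance(a: Prep, b: Prep, k: int) -> Optional[int]:
    """Landau-Vishkin: returns ed(a, b) if it is at most k, else None.
    Makes O(k^2) LCE calls, hence O(k^2 log n) time with hash-based LCE."""
    n, m = a.n, b.n
    target = m - n  # diagonal d = j - i on which a full alignment ends
    if abs(target) > k:
        return None
    # prev[d] = furthest row i reached on diagonal d with the previous edit budget
    prev: Dict[int, int] = {0: lce(a, 0, b, 0)}
    if target == 0 and prev[0] == n:
        return 0
    for e in range(1, k + 1):
        cur: Dict[int, int] = {}
        for d in range(max(-e, -n), min(e, m) + 1):
            cands = []
            if d in prev:
                cands.append(prev[d] + 1)      # substitute a[i] by b[j]
            if d + 1 in prev:
                cands.append(prev[d + 1] + 1)  # delete a[i]
            if d - 1 in prev:
                cands.append(prev[d - 1])      # insert b[j]
            if not cands:
                continue
            i = min(max(cands), n, m - d)      # stay inside the DP grid
            i += lce(a, i, b, i + d)           # slide along matching characters for free
            cur[d] = i
            if d == target and i == n:
                return e
        prev = cur
    return None
```

For example, `bounded_edit_distance(preprocess("kitten"), preprocess("sitting"), 3)` returns 3, and the same call with `k = 2` returns `None`. The design choice is that nothing in the query touches the raw characters: every comparison is a hash lookup on data each string prepared independently, which is what the preprocessing model permits.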


Taxonomy

Topics: Algorithms and Data Compression · Complexity and Algorithms in Graphs · Semigroups and Automata Theory