Unified Compression-Based Acceleration of Edit-Distance Computation
Danny Hermelin, Gad M. Landau, Shir Landau, Oren Weimann

TL;DR
This paper presents a single edit-distance algorithm that operates on straight-line programs (SLPs), a representation shared by several compression schemes. It speeds up the computation for highly compressible strings and stays within the quadratic bound in the worst case.
Contribution
The paper presents a compression-agnostic edit-distance algorithm based on straight-line programs. It applies to multiple compression schemes and runs faster the more compressible the input strings are.
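To make the term concrete: a straight-line program is a context-free grammar that derives exactly one string, where each rule is either a single character or the concatenation of two earlier variables. The sketch below is illustrative only. The rule encoding and the helper names `expand` and `length` are assumptions made for this example, not the paper's notation.

```python
from functools import lru_cache

# Hypothetical encoding: a rule is either a terminal character (str)
# or a pair of previously defined variables (tuple), meaning concatenation.
RULES = {
    "X1": "b",
    "X2": "a",
    "X3": ("X2", "X1"),  # ab
    "X4": ("X3", "X2"),  # aba
    "X5": ("X4", "X3"),  # abaab
    "X6": ("X5", "X4"),  # abaababa
}


def expand(var: str) -> str:
    """Decompress the string derived by `var` (time linear in its length)."""
    rule = RULES[var]
    if isinstance(rule, str):
        return rule
    left, right = rule
    return expand(left) + expand(right)


@lru_cache(maxsize=None)
def length(var: str) -> int:
    """Length of the derived string, computed without decompressing it."""
    rule = RULES[var]
    if isinstance(rule, str):
        return len(rule)
    left, right = rule
    return length(left) + length(right)
```

Here six rules derive a string of length eight. Continuing the same pattern, each new rule lengthens the string by a constant factor, so the number of rules n can be far smaller than the string length N.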
Findings
Achieves O(nN log(N/n)) time under rational scoring functions, for two strings of total length N whose SLP representations have total size n.
Provides an O(n^{2/3}N^{4/3}) time algorithm for arbitrary scoring functions.
Stays within the quadratic time bound even in the worst case.
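As a rough illustration of how the bounds listed above compare with N^2, the sketch below evaluates the three expressions for sample values. It ignores the constant factors and lower-order terms hidden by the O-notation, and the base-2 logarithm is an assumption, so the numbers show growth trends only and are not running-time predictions. The function names are hypothetical.

```python
import math


def slp_bound(n: int, N: int) -> float:
    """n * N * log2(N / n); only meaningful when n is well below N."""
    return n * N * math.log2(N / n)


def arbitrary_scoring_bound(n: int, N: int) -> float:
    """n^(2/3) * N^(4/3)."""
    return n ** (2 / 3) * N ** (4 / 3)


def quadratic_bound(N: int) -> float:
    """N^2, the cost of the standard dynamic program."""
    return float(N) ** 2


if __name__ == "__main__":
    N = 2 ** 20
    for n in (2 ** 8, 2 ** 10, 2 ** 14):
        print(n, slp_bound(n, N), arbitrary_scoring_bound(n, N), quadratic_bound(N))
```

With n = N, the expression n^{2/3}N^{4/3} equals N^2, which matches the statement that the quadratic bound is not exceeded for incompressible inputs.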
Abstract
The edit distance problem is a classical fundamental problem in computer science in general, and in combinatorial pattern matching in particular. The standard dynamic programming solution for this problem computes the edit-distance between a pair of strings of total length O(N) in O(N^2) time. To this date, this quadratic upper-bound has never been substantially improved for general strings. However, there are known techniques for breaking this bound in case the strings are known to compress well under a particular compression scheme. The basic idea is to first compress the strings, and then to compute the edit distance between the compressed strings. As it turns out, practically all known o(N^2) edit-distance algorithms work, in some sense, under the same paradigm described above. It is therefore natural to ask whether there is a single edit-distance algorithm that works for strings…
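For reference, the standard dynamic programming solution mentioned in the abstract can be sketched as follows. This is the textbook algorithm on uncompressed strings, not the paper's SLP-based method, and it assumes unit costs for insertion, deletion, and substitution (the paper also covers more general scoring functions). The function name is an assumption for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance in O(len(a) * len(b)) time, O(len(b)) space."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]
```

For example, `edit_distance("kitten", "sitting")` returns 3. Every cell of the table is filled exactly once, which is where the quadratic running time comes from.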
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and Automata Theory · Complexity and Algorithms in Graphs
