How Compression and Approximation Affect Efficiency in String Distance Measures
Arun Ganesh, Tomasz Kociumaka, Andrea Lincoln, Barna Saha

TL;DR
This paper investigates how compression affects the efficiency of string distance computations, showing that approximation algorithms combined with compression can significantly improve runtime for sequence comparison problems.
Contribution
It introduces new approximation algorithms that leverage compression to achieve faster runtimes for median edit distance and related problems, surpassing previous uncompressed bounds.
Findings
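Approximation algorithms benefit from compression: running directly on the compressed input reduces their runtime.
Lower bounds show that exact algorithms cannot gain significantly from compression beyond certain limits.
The paper presents new fully polynomial-time approximation schemes (FPTAS) for median edit distance and related measures.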
Abstract
Real-world data often comes in compressed form. Analyzing compressed data directly (without decompressing it) can save space and time by orders of magnitude. In this work, we focus on fundamental sequence comparison problems and try to quantify the gain in time complexity when the underlying data is highly compressible. We consider grammar compression, which unifies many practically relevant compression schemes. For two strings of total length $N$ and total compressed size $n$, it is known that the edit distance and a longest common subsequence (LCS) can be computed exactly in time $\tilde{O}(nN)$, as opposed to $O(N^2)$ for the uncompressed setting. Many applications need to align multiple sequences simultaneously, and the fastest known exact algorithms for median edit distance and LCS of $k$ strings run in $O(N^k)$ time. This naturally raises the question of whether compression can…
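To make the two settings in the abstract concrete, the following minimal Python sketch contrasts a grammar-compressed string with its uncompressed form. It is an illustration only, not the paper's algorithm: the toy grammar `GRAMMAR` and the helpers `expanded_length`, `expand`, and `edit_distance` are hypothetical names chosen here. The grammar is a straight-line program in which each rule is either a single character or a pair of earlier symbols, and the edit distance routine is the textbook quadratic-time dynamic program that works on fully decompressed strings.

```python
from functools import lru_cache

# Hypothetical toy straight-line program (SLP): each symbol maps either to a
# single character or to a pair of previously defined symbols.
# Six rules derive the 16-character string "abababababababab".
GRAMMAR = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),    # "ab"
    "X2": ("X1", "X1"),  # "abab"
    "X3": ("X2", "X2"),  # "ab" repeated 4 times
    "X4": ("X3", "X3"),  # "ab" repeated 8 times
}


def expanded_length(grammar, start):
    """Length of the derived string, computed without decompressing it."""

    @lru_cache(maxsize=None)
    def length(symbol):
        rule = grammar[symbol]
        if isinstance(rule, str):  # terminal rule: one character
            return 1
        left, right = rule
        return length(left) + length(right)

    return length(start)


def expand(grammar, start):
    """Fully decompress a symbol (time and space linear in the output)."""
    rule = grammar[start]
    if isinstance(rule, str):
        return rule
    left, right = rule
    return expand(grammar, left) + expand(grammar, right)


def edit_distance(s, t):
    """Textbook dynamic program for unit-cost edit distance, O(|s| * |t|)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # match or substitute
            ))
        prev = curr
    return prev[-1]
```

Calling `expanded_length(GRAMMAR, "X4")` touches each of the six rules once, whereas `edit_distance(expand(GRAMMAR, "X3"), expand(GRAMMAR, "X4"))` first materializes both strings and then fills a table proportional to the product of their lengths. The gap between those two costs is what algorithms operating directly on the grammar aim to exploit.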
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and Automata Theory · DNA and Biological Computing
