Almost Linear Size Edit Distance Sketch
Michal Koucký, Michael Saks

TL;DR
This paper introduces an almost linear-size sketching scheme for computing edit distance exactly up to a given threshold, with optimal dependence on the threshold and a smaller sketch than prior quadratic-size schemes.
Contribution
It presents a novel sketching and recovery scheme for edit distance with nearly linear size, improving over previous quadratic-size schemes and achieving optimal dependence on the threshold.
Findings
Sketch size is $k 2^{O(\sqrt{\log(n)\log\log(n)})}$, nearly linear in $k$.
Recovery algorithm outputs the edit distance and an optimal sequence of edits, with high probability, whenever the distance is at most the threshold $k$.
Scheme runs in polynomial time and improves over prior quadratic-size schemes.
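To see why the stated sketch size counts as "almost linear" in $k$, the multiplicative factor can be rewritten as a power of $n$. This is a short derivation rather than a statement from the paper, and it assumes $n$ denotes the length of the input strings, which this summary does not define:

$$
2^{O(\sqrt{\log n \,\log\log n})} \;=\; 2^{\log n \cdot O\left(\sqrt{\log\log n / \log n}\right)} \;=\; n^{O\left(\sqrt{\log\log n / \log n}\right)} \;=\; n^{o(1)}.
$$

Under that assumption the factor grows more slowly than any fixed power $n^{\varepsilon}$ with $\varepsilon > 0$, so the sketch size is $k \cdot n^{o(1)}$.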
Abstract
Edit distance is an important measure of string similarity. It counts the number of insertions, deletions and substitutions one has to make to a string $x$ to get a string $y$. In this paper we design an almost linear-size sketching scheme for computing edit distance up to a given threshold $k$. The scheme consists of two algorithms, a sketching algorithm and a recovery algorithm. The sketching algorithm depends on the parameter $k$ and takes as input a string $x$ and a public random string $\rho$ and computes a sketch $sk_{\rho}(x;k)$, which is a digested version of $x$. The recovery algorithm is given two sketches $sk_{\rho}(x;k)$ and $sk_{\rho}(y;k)$ as well as the public random string $\rho$ used to create the two sketches, and (with high probability) if the edit distance $ED(x,y)$ between $x$ and $y$ is at most $k$, will output $ED(x,y)$ together with an optimal sequence of edit…
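As a point of reference for what "edit distance up to a given threshold" means, the quantity can be computed directly when both full strings are available. The sketch below is a standard banded dynamic program and is not the paper's sketching or recovery algorithm: it reads both strings in full, whereas the scheme works only from the two sketches. The function name `bounded_edit_distance` and the choice to return `None` when the distance exceeds `k` are assumptions made for illustration.

```python
from typing import Optional


def bounded_edit_distance(x: str, y: str, k: int) -> Optional[int]:
    """Return the edit distance of x and y if it is at most k, else None.

    Hypothetical helper for illustration only; it uses both full strings
    and is unrelated to how the paper's sketches are built or decoded.
    """
    n, m = len(x), len(y)
    # The length difference is a lower bound on the edit distance.
    if abs(n - m) > k:
        return None
    cap = k + 1  # any value above k is treated as "too large"
    # prev[j] = distance between the empty prefix of x and y[:j], capped.
    prev = [j if j <= k else cap for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [cap] * (m + 1)
        # Only cells with |i - j| <= k can hold a distance of at most k.
        lo = max(0, i - k)
        hi = min(m, i + k)
        if lo == 0:
            cur[0] = i  # delete the first i characters of x
        for j in range(max(1, lo), hi + 1):
            substitute = prev[j - 1] + (x[i - 1] != y[j - 1])
            delete = prev[j] + 1
            insert = cur[j - 1] + 1
            cur[j] = min(substitute, delete, insert, cap)
        prev = cur
    return prev[m] if prev[m] <= k else None


if __name__ == "__main__":
    print(bounded_edit_distance("kitten", "sitting", 3))
    print(bounded_edit_distance("kitten", "sitting", 2))
```

The band of width $2k+1$ around the diagonal keeps the work at roughly $O(nk)$ cell updates for strings of length about $n$, which is why a threshold makes the problem cheaper than the unrestricted quadratic computation.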
Taxonomy
Topics: Algorithms and Data Compression · Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques
