On Estimating Edit Distance: Alignment, Dimension Reduction, and Embeddings
Moses Charikar, Ofir Geri, Michael P. Kim, William Kuszmaul

TL;DR
This paper shows how algorithms that estimate edit distance can be used to produce approximate alignments, introduces new metric embeddings built on min-hash techniques, and gives a dimension-reduction map with improved distortion bounds.
Contribution
It shows that any edit-distance estimation algorithm can be used as a black box to produce approximate alignments, with modest loss in approximation factor and small loss in run time, and it introduces new embeddings and dimension-reduction techniques for edit distance.
Findings
Estimation algorithms can produce approximate alignments with modest loss.
New embeddings for Ulam distance match the best known distortion of O(log n).
Improved dimension-reduction map with near-optimal expected distortion.
Abstract
Edit distance is a fundamental measure of distance between strings and has been widely studied in computer science. While the problem of estimating edit distance has been studied extensively, the equally important question of actually producing an alignment (i.e., the sequence of edits) has received far less attention. Somewhat surprisingly, we show that any algorithm to estimate edit distance can be used in a black-box fashion to produce an approximate alignment of strings, with modest loss in approximation factor and small loss in run time. Plugging in the result of Andoni, Krauthgamer, and Onak, we obtain an alignment that is a $(\log n)^{O(1/\varepsilon^2)}$ approximation in time $\tilde{O}(n^{1+\varepsilon})$. Closely related to the study of approximation algorithms is the study of metric embeddings for edit distance. We show that min-hash techniques can be useful in designing…
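To make the difference between estimating edit distance and producing an alignment concrete, here is a minimal sketch of the textbook quadratic-time dynamic program, which returns both the distance and one optimal sequence of edits. It is not the paper's black-box reduction and has none of its running-time guarantees. The function name `edit_distance_alignment` and the tuple format of the edits are illustrative assumptions.

```python
from typing import List, Tuple


def edit_distance_alignment(a: str, b: str) -> Tuple[int, List[tuple]]:
    """Return (distance, edits) for turning a into b.

    Textbook O(len(a) * len(b)) dynamic program with traceback.
    The edit tuple format is a hypothetical choice made for this sketch.
    """
    n, m = len(a), len(b)
    # dp[i][j] = edit distance between the prefixes a[:i] and b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i  # delete all i characters of a[:i]
    for j in range(m + 1):
        dp[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # delete a[i-1]
                dp[i][j - 1] + 1,         # insert b[j-1]
                dp[i - 1][j - 1] + cost,  # match or substitute
            )

    # Walk back from dp[n][m] to dp[0][0] to recover one optimal alignment.
    edits = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            i, j = i - 1, j - 1  # characters match, no edit needed
        elif i > 0 and j > 0 and a[i - 1] != b[j - 1] and dp[i][j] == dp[i - 1][j - 1] + 1:
            edits.append(("substitute", i - 1, a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            edits.append(("delete", i - 1, a[i - 1]))
            i -= 1
        else:
            edits.append(("insert", i, b[j - 1]))
            j -= 1
    edits.reverse()
    return dp[n][m], edits


if __name__ == "__main__":
    distance, edits = edit_distance_alignment("kitten", "sitting")
    print(distance)  # 3
    for edit in edits:
        print(edit)
```

An estimation algorithm would return only a number close to `distance`. The abstract's first result is that the list of edits can be recovered approximately from such an estimator used as a black box.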
