On Practical Accuracy of Edit Distance Approximation Algorithms
Hiroyuki Hanada, Mineichi Kudo, Atsuyoshi Nakamura

TL;DR
This paper evaluates the practical accuracy of six approximation algorithms for edit distance through theoretical refinement and experimental testing on artificial and real data sets.
Contribution
It provides a detailed comparison of six edit distance approximation algorithms, combining refined theoretical analysis with empirical experiments to assess their real-world performance.
Findings
[Batu 2006] has the best theoretical accuracy for long strings (length n >= 300).
[Andoni 2010] performs best experimentally on most data sets.
Some algorithms with moderate theoretical performance perform well in practice, especially with large alphabet sizes.
Abstract
The edit distance is a basic string similarity measure used in many applications such as text mining, signal processing, and bioinformatics. However, its computational cost can become a problem when many distance calculations are repeated, as in real-life search scenarios. A promising way to cope with this problem is to approximate the edit distance by another distance with a lower computational cost. Indeed, many distances have been proposed for approximating the edit distance. However, their approximation accuracies have been evaluated only theoretically: many of them are evaluated only in big-O (asymptotic) notation, without experimental analysis. It is therefore beneficial to know their actual performance in real applications. In this study we compared six existing approximation distances using two approaches: (i) we refined their theoretical approximation…
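To make the trade-off described in the abstract concrete, the following is a minimal sketch that compares the exact edit distance with a much cheaper surrogate. The exact distance uses the standard dynamic-programming recurrence. The surrogate is the bag (multiset) distance, a well-known lower bound on edit distance; it is chosen here purely for illustration and is an assumption of this example, not necessarily one of the six distances evaluated in the paper. The function names are hypothetical.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Exact Levenshtein distance via dynamic programming, O(len(a) * len(b)) time."""
    # Keep the shorter string as the row dimension to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def bag_distance(a: str, b: str) -> int:
    """Illustrative cheap surrogate (an assumption of this sketch):
    compares character multisets in O(len(a) + len(b)) time and
    never exceeds the exact edit distance."""
    ca, cb = Counter(a), Counter(b)
    return max(sum((ca - cb).values()), sum((cb - ca).values()))


if __name__ == "__main__":
    for x, y in [("kitten", "sitting"), ("abc", "cba")]:
        print(x, y, edit_distance(x, y), bag_distance(x, y))
```

The pair ("abc", "cba") shows how loose such a surrogate can be: the two strings have identical character multisets, so the bag distance is 0 even though two edits are required. Measuring this kind of gap on realistic data is the sort of question the paper's experiments address.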
Taxonomy
Topics: Algorithms and Data Compression · Network Packet Processing and Optimization · Web Data Mining and Analysis
