Computable Bounds and Monte Carlo Estimates of the Expected Edit Distance
Gianfranco Bilardi, Michele Schimd

TL;DR
This paper develops methods to compute and estimate the expected edit distance between random strings, providing bounds, algorithms, and statistical techniques to evaluate it efficiently for large string lengths and alphabet sizes.
Contribution
The paper introduces new bounds, algorithms, and statistical estimation methods for the expected edit distance, improving accuracy and efficiency over previous approaches.
Findings
Bounds on the limit of normalized expected edit distance are established.
A computationally intensive algorithm for exact values is presented.
Statistical estimates with high confidence are feasible for large string lengths.
Abstract
The edit distance is a metric of dissimilarity between strings, widely applied in computational biology, speech recognition, and machine learning. Let $e_k(n)$ denote the average edit distance between random, independent strings of $n$ characters from an alphabet of size $k$. For $k \geq 2$, it is an open problem how to efficiently compute the exact value of $\alpha_k(n) = e_k(n)/n$ as well as of $\alpha_k = \lim_{n \to \infty} \alpha_k(n)$, a limit known to exist. This paper shows that $\alpha_k(n) - Q(n) \leq \alpha_k \leq \alpha_k(n)$, for a specific $Q(n) = \Theta(\sqrt{\log n / n})$, a result which implies that $\alpha_k$ is computable. The exact computation of $\alpha_k(n)$ is explored, leading to an algorithm running in time $T = O(n^2 k \min(3^n, k^n))$, a complexity that makes it of limited practical use. An analysis of statistical estimates is proposed, based on…
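To make the quantities in the abstract concrete, the following Python sketch computes the normalized expected edit distance (the expected edit distance between two uniformly random strings of length n over an alphabet of size k, divided by n) in two ways: by brute-force enumeration for tiny n, and by plain Monte Carlo sampling. This is an illustration only, not the algorithm or the statistical analysis from the paper. The function names `edit_distance`, `exact_normalized`, and `monte_carlo_normalized` are hypothetical, and the reported standard error is the ordinary sample standard error, an assumption made here in place of the paper's own confidence analysis.

```python
import itertools
import random
import statistics


def edit_distance(x, y):
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def exact_normalized(n, k):
    """Brute-force average over all k**(2n) string pairs; only for tiny n."""
    strings = list(itertools.product(range(k), repeat=n))
    total = sum(edit_distance(x, y) for x in strings for y in strings)
    return total / (len(strings) ** 2 * n)


def monte_carlo_normalized(n, k, samples, seed=0):
    """Sample mean and standard error of edit_distance / n over random pairs.

    Requires samples >= 2 so that the sample standard deviation is defined.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(samples):
        x = [rng.randrange(k) for _ in range(n)]
        y = [rng.randrange(k) for _ in range(n)]
        values.append(edit_distance(x, y) / n)
    mean = statistics.fmean(values)
    stderr = statistics.stdev(values) / samples ** 0.5
    return mean, stderr
```

For example, `exact_normalized(2, 2)` enumerates all 16 pairs of binary strings of length 2, while `monte_carlo_normalized(1000, 4, 100)` returns an estimate and its standard error for much longer strings. The enumeration grows as k^(2n) pairs, which is why sampling is the practical route beyond very small n.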
Taxonomy
Topics: Algorithms and Data Compression · Machine Learning and Algorithms · Semigroups and Automata Theory
