Near-Linear Time Edit Distance for Indel Channels
Arun Ganesh, Aaron Sy

TL;DR
This paper presents an efficient algorithm to compute the edit distance between two related strings in near-linear time, providing theoretical support for practical alignment heuristics like BLAST and FASTA.
Contribution
The authors introduce a simple $O(n \ln n)$-time algorithm for edit distance under a probabilistic model, together with a novel analysis that partitions alignments and helps justify heuristic methods used in practice.
Findings
Algorithm runs in $O(n \ln n)$ time with high probability.
Provides theoretical justification for practical alignment heuristics.
Techniques may apply to average-case analysis of dynamic programming problems.
Abstract
We consider the following model for sampling pairs of strings: $x$ is a uniformly random bitstring of length $n$, and $y$ is the bitstring arrived at by applying substitutions, insertions, and deletions to each bit of $x$ with some probability. We show that the edit distance between $x$ and $y$ can be computed in $O(n \ln n)$ time with high probability, as long as each bit of $x$ has a mutation applied to it with probability at most a small constant. The algorithm is simple and only uses the textbook dynamic programming algorithm as a primitive, first computing an approximate alignment between the two strings, and then running the dynamic programming algorithm restricted to entries close to the approximate alignment. The analysis of our algorithm provides theoretical justification for alignment heuristics used in practice such as BLAST, FASTA, and MAFFT, which also start by…
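The second step described in the abstract, running the textbook dynamic program only on entries close to an approximate alignment, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes the approximate alignment is simply the main diagonal and uses a fixed half-width `band`, whereas the paper first computes an approximate alignment. The function names `edit_distance` and `banded_edit_distance` and the parameter `band` are hypothetical, chosen for illustration.

```python
def edit_distance(x: str, y: str) -> int:
    """Textbook dynamic program, O(len(x) * len(y)) time."""
    prev = list(range(len(y) + 1))
    for i in range(1, len(x) + 1):
        cur = [i] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = min(
                prev[j] + 1,                           # delete x[i-1]
                cur[j - 1] + 1,                        # insert y[j-1]
                prev[j - 1] + (x[i - 1] != y[j - 1]),  # match or substitute
            )
        prev = cur
    return prev[len(y)]


def banded_edit_distance(x: str, y: str, band: int) -> int:
    """Same recurrence, restricted to cells (i, j) with |i - j| <= band.

    Assumption for illustration: the "approximate alignment" is the main
    diagonal. The result is the cost of the best alignment that stays
    inside the band, so it is an upper bound on the true edit distance,
    and it is exact whenever the returned value is at most `band`.
    Returns len(x) + len(y) + 1 if no alignment fits inside the band.
    """
    n, m = len(x), len(y)
    inf = n + m + 1  # larger than any possible edit distance
    if abs(n - m) > band:
        return inf  # the final cell (n, m) lies outside the band
    # Rows are kept at full length for clarity; only O(band) cells per row
    # are computed, and a tuned version would also store only those cells.
    prev = [j if j <= band else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        lo, hi = max(0, i - band), min(m, i + band)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i  # delete the first i characters of x
                continue
            cur[j] = min(
                prev[j] + 1,
                cur[j - 1] + 1,
                prev[j - 1] + (x[i - 1] != y[j - 1]),
            )
        prev = cur
    return prev[m]
```

With a band of half-width $k$, the inner loop touches $O(nk)$ cells rather than $O(n^2)$, which is why a good approximate alignment combined with a narrow band yields a fast exact computation when the two strings are closely related.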
