siEDM: an efficient string index and search algorithm for edit distance with moves
Yoshimasa Takabatake, Kenta Nakashima, Tetsuji Kuboyama, Yasuo Tabei, Hiroshi Sakamoto

TL;DR
siEDM is an index and search algorithm for approximate string retrieval under the edit distance with moves (EDM), aimed at fast searches in large, repetitive text collections.
Contribution
This paper introduces siEDM, the first efficient index and search algorithm designed specifically for EDM. It builds on edit sensitive parsing (ESP) to enable fast approximate string retrieval.
Findings
siEDM achieves efficient indexing and searching on benchmark datasets.
siEDM provides guaranteed bounds for approximate EDM computation.
Experimental results demonstrate siEDM's superior speed and accuracy.
Abstract
Although several self-indexes for highly repetitive text collections exist, developing an index and search algorithm with editing operations remains a challenge. Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinary editing operations to turn one string into another. Although the problem of computing EDM is intractable, it has a wide range of potential applications, especially in approximate string retrieval. Despite the importance of computing EDM, there has been no efficient method for indexing and searching large text collections based on the EDM measure. We propose the first algorithm, named string index for edit distance with moves (siEDM), for indexing and searching strings with EDM. The siEDM algorithm builds an index structure by leveraging the idea behind the edit sensitive parsing (ESP), an efficient…
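To make the EDM measure described in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the siEDM algorithm and does not use ESP. It assumes the usual unit-cost operations (insert, delete, or replace a single character, or move a substring) and simply searches breadth-first for the cheapest sequence. The function names `edm_bruteforce` and `neighbours` and the `max_cost` cut-off are hypothetical, chosen for illustration. Restricting insertions and replacements to characters that occur in the two inputs is also an assumption of this sketch. Because computing EDM is intractable in general, this is only usable on very short strings.

```python
from collections import deque


def neighbours(x, alphabet):
    """Yield every string reachable from x with one unit-cost operation."""
    n = len(x)
    # Insert one character at any position.
    for i in range(n + 1):
        for c in alphabet:
            yield x[:i] + c + x[i:]
    for i in range(n):
        # Delete the character at position i.
        yield x[:i] + x[i + 1:]
        # Replace the character at position i with a different one.
        for c in alphabet:
            if c != x[i]:
                yield x[:i] + c + x[i + 1:]
    # Move the substring x[i:j] to another position.
    for i in range(n):
        for j in range(i + 1, n + 1):
            block, rest = x[i:j], x[:i] + x[j:]
            for k in range(len(rest) + 1):
                if k != i:  # k == i would put the block back where it was
                    yield rest[:k] + block + rest[k:]


def edm_bruteforce(s, t, max_cost=3):
    """Return the EDM between s and t, or None if it exceeds max_cost."""
    alphabet = sorted(set(s) | set(t))  # assumption: no other characters needed
    seen = {s}
    queue = deque([(s, 0)])
    while queue:
        cur, cost = queue.popleft()
        if cur == t:
            return cost
        if cost == max_cost:
            continue
        for nxt in neighbours(cur, alphabet):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, cost + 1))
    return None


if __name__ == "__main__":
    # A single substring move ("de" to the front) turns one string into the other.
    print(edm_bruteforce("abcde", "deabc"))
```

The search space grows very quickly with string length and `max_cost`, which is the practical reason an approximate, index-based approach such as the one the abstract proposes is needed for large collections.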
Taxonomy
Topics: Algorithms and Data Compression · Natural Language Processing Techniques · Genomics and Phylogenetic Studies
