Locally consistent decomposition of strings with applications to edit distance sketching
Sudatta Bhattacharya, Michal Koucký

TL;DR
This paper introduces a locally consistent string decomposition that yields compact sketches for edit distance, including a rolling sketch that can be updated when a symbol is appended to the string or removed from its beginning.
Contribution
The authors develop a locally consistent string decomposition technique that supports efficient edit distance sketching and dynamic updates.
Findings
Decomposition uses grammars of size ~O(k)
Sketch size is ~O(k^2) for edit distance
Supports dynamic updates with rolling sketches
Abstract
In this paper we provide a new locally consistent decomposition of strings. Each string $x$ is decomposed into blocks that can be described by grammars of size $\widetilde{O}(k)$ (using some amount of randomness). If we take two strings $x$ and $y$ of edit distance at most $k$, then their block decompositions use the same number of grammars, and the $i$-th grammar of $x$ is the same as the $i$-th grammar of $y$ except for at most $k$ indexes $i$. The edit distance of $x$ and $y$ equals the sum of the edit distances of the pairs of blocks where $x$ and $y$ differ. Our decomposition can be used to design a sketch of size $\widetilde{O}(k^2)$ for edit distance, and also a rolling sketch for edit distance of size $\widetilde{O}(k^2)$. The rolling sketch allows updating the sketched string by appending a symbol or removing a symbol from the beginning of the string.
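The block-sum property stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' decomposition: the blocks below are hand-picked (an assumption made purely for illustration) so that both strings split into the same number of blocks, and the helper names `edit_distance` and `blockwise_edit_distance` are hypothetical. The sketch also compares raw blocks directly, whereas the paper compares the grammars that describe them.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming (Levenshtein) edit distance."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def blockwise_edit_distance(x_blocks, y_blocks):
    """Sum edit distances over the aligned block pairs that differ."""
    if len(x_blocks) != len(y_blocks):
        raise ValueError("decompositions must have the same number of blocks")
    return sum(
        edit_distance(bx, by)
        for bx, by in zip(x_blocks, y_blocks)
        if bx != by  # identical blocks contribute nothing
    )


# Hand-picked, hypothetical decompositions of two strings:
# block 2 loses the symbol 'h', block 3 has 'n' replaced by 'X'.
x_blocks = ["abcde", "fghij", "klmno"]
y_blocks = ["abcde", "fgij", "klmXo"]

block_sum = blockwise_edit_distance(x_blocks, y_blocks)
full = edit_distance("".join(x_blocks), "".join(y_blocks))
print(block_sum, full)
```

For these hand-picked blocks the two printed values agree. For an arbitrary split of two strings the block sum is only an upper bound on the edit distance; the abstract's claim is that the paper's decomposition makes it an equality for strings of edit distance at most $k$.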
Taxonomy
Topics: Algorithms and Data Compression · DNA and Biological Computing · Semigroups and Automata Theory
