Improved Fast Similarity Search in Dictionaries
Daniel Karch, Dennis Luxen, Peter Sanders

TL;DR
This paper engineers an index-based algorithm and data structures for approximate dictionary matching, retrieving all words within a fixed edit distance of a query in microseconds, even in lists of hundreds of thousands of words.
Contribution
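The authors present index data structures that support fault-tolerant dictionary queries, together with a generalization of the method that significantly reduces memory consumption and preprocessing time while leaving query times virtually unaffected.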
Findings
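Supports fault-tolerant queries through a precomputed index
The generalized method significantly reduces memory consumption and preprocessing time, with query times virtually unaffected
Achieves microsecond query times on lists of hundreds of thousands of words for reasonable distances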
Abstract
We engineer an algorithm to solve the approximate dictionary matching problem. Given a list of words W, maximum distance d fixed at preprocessing time and a query word q, we would like to retrieve all words from W that can be transformed into q with d or less edit operations. We present data structures that support fault tolerant queries by generating an index. On top of that, we present a generalization of the method that eases memory consumption and preprocessing time significantly. At the same time, running times of queries are virtually unaffected. We are able to match in lists of hundreds of thousands of words and beyond within microseconds for reasonable distances.
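To make the problem statement concrete, here is a minimal Python sketch of one well-known way to build such an index: store every string obtainable from a dictionary word by deleting at most d characters, then look up the same "deletion neighborhood" of the query and verify the candidates with an exact edit-distance check. The abstract does not describe the index layout, so treat this as an illustration under that assumption rather than the authors' data structure. The function names (edit_distance, deletion_neighborhood, build_index, query) and the sample word list are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance (insert, delete, substitute), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def deletion_neighborhood(word, d):
    """All strings obtained from `word` by deleting at most d characters."""
    result = set()
    for k in range(min(d, len(word)) + 1):
        for kept in combinations(range(len(word)), len(word) - k):
            result.add("".join(word[i] for i in kept))
    return result


def build_index(words, d):
    """Preprocessing: map each deletion variant to the words that produce it."""
    index = defaultdict(set)
    for w in words:
        for key in deletion_neighborhood(w, d):
            index[key].add(w)
    return index


def query(index, q, d):
    """Return all indexed words within edit distance d of q, sorted."""
    candidates = set()
    for key in deletion_neighborhood(q, d):
        candidates |= index.get(key, set())
    # Sharing a deletion variant is necessary but not sufficient,
    # so every candidate is verified with the exact distance.
    return sorted(w for w in candidates if edit_distance(w, q) <= d)


# Hypothetical sample data, not taken from the paper.
words = ["test", "text", "tent", "toast", "rest", "best", "taste"]
index = build_index(words, d=1)
matches = query(index, "test", d=1)
```

The sketch also shows why memory and preprocessing time are a concern: the number of deletion variants per word grows quickly with d and with word length, so an unrestricted index of this kind becomes large for long words.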
Taxonomy
Topics: Data Management and Algorithms · Algorithms and Data Compression · Video Analysis and Summarization
