Top-k String Auto-Completion with Synonyms
Pengfei Xu, Jiaheng Lu

TL;DR
This paper introduces trie-based algorithms for top-k auto-completion that incorporate synonyms and abbreviations, enabling efficient retrieval of suggestions with minimal space overhead and microsecond response times.
Contribution
It presents three novel trie-based algorithms for synonym-aware auto-completion, balancing space and time efficiency, and demonstrates their effectiveness on large-scale datasets.
Findings
Supports completion over a million strings with thousands of synonym rules at about a microsecond per completion.
Incurs a small space overhead of 160-200 bytes per string.
Achieves effective and efficient synonym-based auto-completion in large datasets.
Abstract
Auto-completion is one of the most prominent features of modern information systems. Existing auto-completion solutions provide suggestions based on the beginning of the character sequence typed so far (i.e., the prefix). However, in many real applications, an entity often has synonyms or abbreviations. For example, "DBMS" is an abbreviation of "Database Management Systems". In this paper, we study a novel type of auto-completion that uses synonyms and abbreviations. We propose three trie-based algorithms to solve top-k auto-completion with synonyms, each with different space and time complexity trade-offs. Experiments on large-scale datasets show that it is possible to support effective and efficient synonym-based retrieval of completions of a million strings with thousands of synonym rules at about a microsecond per completion, while taking small space overhead (i.e.…
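To make the problem setting concrete, here is a minimal Python sketch of synonym-aware top-k completion. It is a naive baseline and not one of the paper's three algorithms: it expands every dictionary string with its abbreviated variants up front and stores them all in a single trie. The names `SynonymTrie`, `TrieNode`, and `complete`, the (abbreviation, full form) rule format, and the example scores are assumptions made for illustration only.

```python
import heapq


class TrieNode:
    def __init__(self):
        self.children = {}
        # Ids of dictionary strings reachable by a key ending at this node.
        self.terminal_ids = set()


class SynonymTrie:
    """Naive baseline: index each string plus its single-rule variants."""

    def __init__(self, scored_strings, rules):
        # scored_strings: list of (string, score); rules: list of (short, full).
        self.strings = [s for s, _ in scored_strings]
        self.scores = [score for _, score in scored_strings]
        self.root = TrieNode()
        for sid, s in enumerate(self.strings):
            for variant in self._variants(s, rules):
                self._insert(variant, sid)

    @staticmethod
    def _variants(s, rules):
        # Apply one rule at one position at a time (no rule combinations).
        variants = {s}
        for short, full in rules:
            start = s.find(full)
            while start != -1:
                variants.add(s[:start] + short + s[start + len(full):])
                start = s.find(full, start + 1)
        return variants

    def _insert(self, key, sid):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.terminal_ids.add(sid)

    def complete(self, prefix, k):
        # Walk down to the node for the prefix, if it exists.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Collect every string id in the subtree; the set removes duplicates
        # that arise when several variants map to the same original string.
        found = set()
        stack = [node]
        while stack:
            cur = stack.pop()
            found.update(cur.terminal_ids)
            stack.extend(cur.children.values())
        best = heapq.nlargest(k, found, key=lambda sid: (self.scores[sid], -sid))
        return [self.strings[sid] for sid in best]


demo = SynonymTrie(
    [("Database Management Systems", 10), ("Database Design", 7), ("DB2 Manual", 3)],
    [("DBMS", "Database Management Systems")],
)
print(demo.complete("DB", 2))
```

Typing "DB" here reaches "Database Management Systems" through the abbreviation "DBMS", which a purely prefix-based completer would miss. The sketch scans the whole subtree under the prefix and duplicates strings per variant, so it makes no attempt to match the space or time figures reported in the abstract.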
Taxonomy
Topics: Data Quality and Management · Advanced Database Systems and Queries · Data Mining Algorithms and Applications
