An Improved Algorithm for Fast K-Word Proximity Search Based on Multi-Component Key Indexes
Alexander B. Veretennikov

TL;DR
This paper introduces an improved algorithm for fast proximity search of multiple words in documents, utilizing multi-component key indexes to significantly reduce query times, especially for high-frequency words.
Contribution
The paper presents a new search algorithm that overcomes limitations of previous methods, achieving greater performance gains in proximity full-text search using multi-component key indexes.
Findings
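Additional indexes improved average query execution time by up to 130 times for queries of frequently occurring words (a result from the author's previous work)
A new search algorithm that overcomes some limitations of the previous algorithm and provides a further performance gain
Multi-component key indexes used to speed up proximity full-text search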
Abstract
A search query consists of several words. In a proximity full-text search, we want to find documents that contain these words near each other. This task requires much time when the query consists of frequently occurring words. If we cannot avoid this task by declaring frequently occurring words as stop words and excluding them from consideration, then we can optimize our solution by introducing additional indexes for faster execution. In a previous work, we discussed how to decrease the search time with multi-component key indexes. We showed that additional indexes can improve the average query execution time by up to 130 times when queries consist of frequently occurring words. In this paper, we present another search algorithm that overcomes some limitations of our previous algorithm and provides even more performance gain. This is a pre-print of a…
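To make the idea in the abstract concrete, here is a minimal Python sketch that contrasts an ordinary positional index with an additional index keyed by word pairs. It is an illustration under stated assumptions, not the paper's algorithm: the function names, the two-component key `(w1, w2)`, the whitespace tokenizer, and the `max_distance` window are all hypothetical choices made for this example.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Ordinary inverted index: word -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def build_pair_index(docs, max_distance):
    """Additional index (assumed form): sorted word pair -> set of doc_ids.

    A pair is recorded only when the two words occur at distinct positions
    no more than max_distance apart, so proximity is checked at build time.
    """
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for i, first in enumerate(words):
            for second in words[i + 1 : i + 1 + max_distance]:
                index[tuple(sorted((first, second)))].add(doc_id)
    return index


def search_positional(index, w1, w2, max_distance):
    """Baseline: scan both posting lists and compare positions at query time."""
    p1 = index.get(w1, {})
    p2 = index.get(w2, {})
    hits = set()
    for doc_id in p1.keys() & p2.keys():
        if any(
            0 < abs(a - b) <= max_distance
            for a in p1[doc_id]
            for b in p2[doc_id]
        ):
            hits.add(doc_id)
    return hits


def search_pair(pair_index, w1, w2):
    """Pair-key lookup: one dictionary access instead of a position scan."""
    return set(pair_index.get(tuple(sorted((w1, w2))), set()))


docs = [
    "to be or not to be",
    "who are you and who am i",
    "you may be right to ask who",
]
positional = build_positional_index(docs)
pairs = build_pair_index(docs, max_distance=2)
print(search_positional(positional, "to", "be", 2))
print(search_pair(pairs, "to", "be"))
```

The baseline cost grows with the length of the posting lists, which is why frequently occurring words are expensive. The pair index moves that work to indexing time in exchange for a larger index, which is the general trade-off the abstract describes.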
