Selection of Optimal Parameters in the Fast K-Word Proximity Search Based on Multi-component Key Indexes
Alexander B. Veretennikov

TL;DR
This paper studies how to select the parameters of multi-component key indexes used for proximity full-text search. Through experimental analysis and a new index schema, it aims to improve query performance and search quality, especially for queries that consist of high-frequency words.
Contribution
It introduces a new index schema and an experimentally grounded method for selecting index parameters such as MaxDistance, with the aim of making proximity search faster and more effective.
Findings
Optimal MaxDistance values significantly improve search speed.
The new index schema outperforms previous models in experiments.
The additional indexes were shown in the author's previous works to improve average query execution time by up to 130 times.
Abstract
Proximity full-text search is commonly implemented in contemporary full-text search systems. Let us assume that the search query is a list of words. It is natural to consider a document as relevant if the queried words are near each other in the document. The proximity factor is even more significant for the case where the query consists of frequently occurring words. Proximity full-text search requires the storage of information for every occurrence in documents of every word that the user can search. For every occurrence of every word in a document, we employ additional indexes to store information about nearby words, that is, the words that occur in the document at distances from the given word of less than or equal to the MaxDistance parameter. We showed in previous works that these indexes can be used to improve the average query execution time by up to 130 times for queries that…
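The idea of storing, for every word occurrence, the words that lie within MaxDistance of it can be illustrated with a small sketch. This is a toy illustration only, not the paper's index schema: the function name `build_nearby_index`, the two-word key layout, and the posting format `(doc_id, position, offset)` are all assumptions made for the example.

```python
from collections import defaultdict


def build_nearby_index(docs, max_distance):
    """Toy index (hypothetical layout, not the paper's schema).

    Maps a key (word, nearby_word) to a list of postings
    (doc_id, position_of_word, offset_to_nearby_word), keeping only
    pairs whose distance is less than or equal to max_distance.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for pos, word in enumerate(words):
            # Window of positions within max_distance of the current word.
            lo = max(0, pos - max_distance)
            hi = min(len(words), pos + max_distance + 1)
            for other in range(lo, hi):
                if other != pos:
                    key = (word, words[other])
                    index[key].append((doc_id, pos, other - pos))
    return index


docs = ["to be or not to be", "who are you"]
index = build_nearby_index(docs, max_distance=2)

# A two-word proximity query becomes a single key lookup.
print(index[("who", "you")])
print(index[("not", "be")])
```

In this sketch a larger `max_distance` lets more queries be answered by a direct lookup but makes every posting list longer, which is the kind of trade-off that parameter selection has to balance.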
Taxonomy
Topics: Data Management and Algorithms · Web Data Mining and Analysis · Algorithms and Data Compression
