An Improved Algorithm for Fast K-Word Proximity Search Based on Multi-Component Key Indexes

Alexander B. Veretennikov

arXiv:2009.02684·cs.IR·September 8, 2020

TL;DR

This paper introduces an improved algorithm for proximity full-text search over multi-word queries. It uses multi-component key indexes to reduce query times, especially when the query consists of frequently occurring words.

Contribution

The paper presents a new search algorithm that overcomes limitations of previous methods, achieving greater performance gains in proximity full-text search using multi-component key indexes.

Findings

01

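Additional multi-component key indexes improved average query execution time by up to 130 times for queries of frequently occurring words (a result from the author's previous work)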

02

The new algorithm overcomes some limitations of the previous algorithm and provides a further performance gain

03

Effective use of multi-component key indexes for proximity search

Abstract

A search query consists of several words. In a proximity full-text search, we want to find documents that contain these words near each other. This task is time-consuming when the query consists of frequently occurring words. If we cannot avoid it by declaring frequently occurring words stop words and excluding them from consideration, we can optimize the solution by introducing additional indexes for faster execution. In a previous work, we discussed how to decrease the search time with multi-component key indexes. We showed that additional indexes can improve the average query execution time by up to 130 times for queries consisting of frequently occurring words. In this paper, we present another search algorithm that overcomes some limitations of our previous algorithm and provides an even greater performance gain. This is a pre-print of a…
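The general idea of trading extra index space for faster proximity queries can be sketched in a few lines. This is an illustration only, not the paper's algorithm, which the abstract does not describe. The function names, the two-word key, and the `max_distance` window are all assumptions made for the example. An ordinary inverted index has to scan every position of both words, while an index keyed by word pairs that occur near each other answers the same two-word query with a single lookup.

```python
from collections import defaultdict

def tokenize(text):
    return text.lower().split()

def build_word_index(docs):
    # Ordinary inverted index: word -> {doc_id: [positions]}.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def build_pair_index(docs, max_distance):
    # Hypothetical two-component key index: sorted (w1, w2) -> doc_ids
    # where the two words occur within max_distance positions.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = tokenize(text)
        for i, w in enumerate(words):
            for v in words[i + 1 : i + 1 + max_distance]:
                if v != w:
                    index[tuple(sorted((w, v)))].add(doc_id)
    return index

def search_with_word_index(index, w1, w2, max_distance):
    # Compares every position of w1 with every position of w2,
    # which is costly when both words are frequent.
    hits = set()
    for doc_id in index.get(w1, {}).keys() & index.get(w2, {}).keys():
        if any(abs(p - q) <= max_distance
               for p in index[w1][doc_id] for q in index[w2][doc_id]):
            hits.add(doc_id)
    return hits

def search_with_pair_index(pair_index, w1, w2):
    # One lookup; the distance check was done at indexing time.
    return set(pair_index.get(tuple(sorted((w1, w2))), set()))

docs = {
    1: "to be or not to be that is the question",
    2: "to know who you really are is to let it all just be",
}
word_index = build_word_index(docs)
pair_index = build_pair_index(docs, max_distance=2)
print(search_with_word_index(word_index, "to", "be", 2))
print(search_with_pair_index(pair_index, "to", "be"))
```

Both searches return the same documents for a two-word query. The pair index is larger and fixes the window at indexing time, which is the kind of trade-off that makes extra indexes attractive mainly for frequently occurring words.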
