An efficient algorithm for three-component key index construction

Alexander B. Veretennikov

arXiv:2006.07954 · cs.IR · June 28, 2020

TL;DR

This paper introduces a new algorithm for constructing three-component key indexes that significantly accelerate proximity full-text searches in large text collections, especially for queries with frequent words.

Contribution

The paper presents a new algorithm for building three-component key indexes for efficient proximity search, shows that the algorithm is correct, and evaluates it experimentally for several values of the MaxDistance parameter.
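This page does not reproduce the paper's exact key definition, posting format, or construction algorithm. The following is therefore only a minimal sketch under stated assumptions. A key is an ordered triple (f, s, t) of frequently occurring words in which s and t follow f within MaxDistance positions. Each posting records the document ID, the position of f, and the distances to s and t. The names `build_three_component_index` and `frequent` are hypothetical. The nested loops are a naive construction that only illustrates what such an index contains; they are not the paper's efficient algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str, str]
Posting = Tuple[int, int, int, int]  # (doc_id, position of f, d1, d2)


def build_three_component_index(
    docs: Dict[int, List[str]],
    frequent: Set[str],
    max_distance: int,
) -> Dict[Key, List[Posting]]:
    """Hypothetical naive sketch, not the paper's algorithm.

    Indexes every triple (f, s, t) of frequent words where s is at p + d1,
    t is at p + d2, and 0 < d1 < d2 <= max_distance (assumed key definition).
    """
    index: Dict[Key, List[Posting]] = defaultdict(list)
    for doc_id, words in docs.items():
        n = len(words)
        for p, f in enumerate(words):
            if f not in frequent:
                continue
            # Second component: look ahead at most max_distance positions.
            for d1 in range(1, max_distance + 1):
                if p + d1 >= n:
                    break
                if words[p + d1] not in frequent:
                    continue
                # Third component: after s, still within max_distance of f.
                for d2 in range(d1 + 1, max_distance + 1):
                    if p + d2 >= n:
                        break
                    if words[p + d2] not in frequent:
                        continue
                    key = (f, words[p + d1], words[p + d2])
                    index[key].append((doc_id, p, d1, d2))
    return dict(index)


docs = {1: ["to", "be", "or", "not", "to", "be"]}
idx = build_three_component_index(docs, {"to", "be", "or", "not"}, max_distance=2)
print(idx[("to", "be", "or")])  # [(1, 0, 1, 2)]
```

With such an index, a three-word query made of frequent words can be answered from the postings of a single key rather than from the full occurrence lists of each word. The sketch also shows why the number of keys, and so the index size, grows as MaxDistance increases.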

Findings

1. Using three-component key indexes reduces query time for queries with frequent words by a factor of more than 94.
2. Experimental results confirm the algorithm's correctness and efficiency.
3. Index performance varies with the MaxDistance parameter.

Abstract

In this paper, proximity full-text searches in large text arrays are considered. A search query consists of several words, and the search result is a list of documents containing these words. In a modern search system, documents in which the query words occur near each other are more relevant than documents in which they do not. To support this, the index must store a record for each word in each indexed document. With such an index, the query execution time is proportional to the number of occurrences of the queried words in the indexed documents. Consequently, search systems typically evaluate queries containing frequently occurring words much more slowly than queries containing only ordinary, less frequent words. For each word in the text, we use additional indexes to store information about nearby words at distances from the given word of less…
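To make the cost argument in the abstract concrete, here is a minimal sketch of the baseline it describes: an ordinary positional inverted index with one record per word occurrence, plus a proximity check that must read every posting of every query word. The function names and the proximity rule are assumptions for illustration only. The rule used here is that every other query word must lie within `max_distance` of some occurrence of the first query word. The paper's own relevance criteria may differ.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Postings = Dict[str, List[Tuple[int, int]]]  # word -> [(doc_id, position), ...]


def build_positional_index(docs: Dict[int, List[str]]) -> Postings:
    """Store one (doc_id, position) record for every word occurrence."""
    index: Postings = defaultdict(list)
    for doc_id, words in docs.items():
        for pos, word in enumerate(words):
            index[word].append((doc_id, pos))
    return dict(index)


def proximity_search(index: Postings, query: List[str], max_distance: int) -> Tuple[Set[int], int]:
    """Return (matching doc IDs, number of postings read).

    Simplified proximity rule (an assumption of this sketch): a document matches
    if some occurrence of query[0] has every other query word within max_distance.
    """
    words = list(dict.fromkeys(query))  # drop duplicate query words, keep order
    if not words:
        return set(), 0
    # Every posting of every query word is read: this cost grows with word frequency.
    scanned = sum(len(index.get(w, [])) for w in words)
    positions: Dict[str, Dict[int, List[int]]] = {}
    for w in words:
        per_doc: Dict[int, List[int]] = defaultdict(list)
        for doc_id, pos in index.get(w, []):
            per_doc[doc_id].append(pos)
        positions[w] = per_doc
    result: Set[int] = set()
    for doc_id, anchors in positions[words[0]].items():
        for a in anchors:
            if all(
                any(abs(p - a) <= max_distance for p in positions[w].get(doc_id, []))
                for w in words[1:]
            ):
                result.add(doc_id)
                break
    return result, scanned


docs = {
    1: "the cat sat on the mat".split(),
    2: "the dog ran far away from the cat".split(),
}
index = build_positional_index(docs)
print(proximity_search(index, ["the", "cat"], 1))  # ({1, 2}, 6)
```

In a real collection, a word such as `the` occurs in nearly every document, so `scanned` grows with the size of the collection. This is the cost the additional indexes described above are meant to reduce.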