Faster Exact Search using Document Clustering

Jonathan Dimond; Peter Sanders

arXiv:1411.1220·cs.IR·November 6, 2014

Open Access

TL;DR

This paper shows that clustering the documents can speed up full-text search over inverted indices by up to four times, depending on the input, without losing any results.

Contribution

It develops a fast multilevel clustering algorithm that explicitly uses the cost of conjunctive queries as its objective function. The resulting clusters are also useful for data compression and for distributing the work over many machines.

Findings

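1. Clustered search is up to four times faster than non-clustered search, depending on the input.

2. The resulting clusters are also useful for data compression and for distributing the work over many machines.

3. The search is exact: clustering the documents does not lose any results.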

Abstract

We show how full-text search based on inverted indices can be accelerated by clustering the documents without losing results (SeCluD -- SEarch with CLUstered Documents). We develop a fast multilevel clustering algorithm that explicitly uses query cost for conjunctive queries as an objective function. Depending on the inputs we get up to four times faster than non-clustered search. The resulting clusters are also useful for data compression and for distributing the work over many machines.
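To make the idea concrete, here is a minimal Python sketch of one way a clustered inverted index can answer conjunctive (AND) queries without losing results. It is an illustration under stated assumptions, not the SeCluD implementation: the two-level layout (a term-to-clusters index plus per-cluster posting lists), the function names `build_clustered_index` and `conjunctive_query`, the whitespace tokenizer, and the hand-assigned clusters are all hypothetical. The multilevel clustering algorithm itself is not shown.

```python
from collections import defaultdict


def build_clustered_index(docs, cluster_of):
    """Build a two-level index (hypothetical layout, not taken from the paper).

    docs:       dict mapping doc id -> text
    cluster_of: dict mapping doc id -> cluster id (assumed to be given)
    """
    cluster_terms = defaultdict(set)                   # term -> clusters containing it
    postings = defaultdict(lambda: defaultdict(set))   # cluster -> term -> doc ids
    for doc_id, text in docs.items():
        cluster = cluster_of[doc_id]
        for term in set(text.lower().split()):         # naive whitespace tokenizer
            cluster_terms[term].add(cluster)
            postings[cluster][term].add(doc_id)
    return cluster_terms, postings


def conjunctive_query(terms, cluster_terms, postings):
    """Return the ids of all documents that contain every query term."""
    terms = [t.lower() for t in terms]
    if not terms:
        return set()
    # Step 1: keep only clusters in which every query term occurs.
    candidates = set.intersection(*(cluster_terms.get(t, set()) for t in terms))
    # Step 2: intersect posting lists inside the surviving clusters only.
    result = set()
    for cluster in candidates:
        lists = sorted((postings[cluster][t] for t in terms), key=len)
        result |= set.intersection(*lists)
    return result


# Toy data; the cluster assignment is made up for illustration.
docs = {
    1: "fast exact search",
    2: "exact search index",
    3: "document clustering",
    4: "clustering search documents",
}
cluster_of = {1: 0, 2: 0, 3: 1, 4: 1}
cluster_terms, postings = build_clustered_index(docs, cluster_of)

print(conjunctive_query(["exact", "search"], cluster_terms, postings))       # {1, 2}
print(conjunctive_query(["clustering", "search"], cluster_terms, postings))  # {4}
```

In this sketch, a cluster is skipped as soon as one query term is absent from it, so a clustering that groups documents with similar vocabulary leaves fewer and shorter posting lists to intersect. The result set is the same as for a flat index because every document belongs to exactly one cluster and no cluster that could contain a match is discarded.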

Taxonomy

Topics: Algorithms and Data Compression · Web Data Mining and Analysis · Advanced Image and Video Retrieval Techniques