Faster Exact Search using Document Clustering
Jonathan Dimond, Peter Sanders

TL;DR
This paper introduces SeCluD (SEarch with CLUstered Documents), a method that clusters documents to speed up full-text search over inverted indices. Depending on the input, it is up to four times faster than non-clustered search and loses no results.
Contribution
It presents a fast multilevel clustering algorithm that explicitly uses the cost of conjunctive queries as its objective function. The resulting clusters speed up search and are also useful for data compression and for distributing the work over many machines.
Findings
Up to four times faster than non-clustered search, depending on the input.
Clusters facilitate data compression and distributed search.
The search remains exact: clustering loses no results.
Abstract
We show how full-text search based on inverted indices can be accelerated by clustering the documents without losing results (SeCluD -- SEarch with CLUstered Documents). We develop a fast multilevel clustering algorithm that explicitly uses query cost for conjunctive queries as an objective function. Depending on the inputs we get up to four times faster than non-clustered search. The resulting clusters are also useful for data compression and for distributing the work over many machines.
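To make the idea concrete, here is a minimal sketch of how a cluster-level index can prune work for a conjunctive (AND) query while returning exactly the same results as a flat inverted index. It is an illustration under stated assumptions, not the SeCluD data structure or the paper's multilevel clustering algorithm: the two-level layout, the function names, and the hand-assigned `cluster_of` mapping are all hypothetical.

```python
from collections import defaultdict

def intersect(postings):
    """Intersect posting lists, starting from the shortest one."""
    if not postings:
        return []
    postings = sorted(postings, key=len)
    result = set(postings[0])
    for p in postings[1:]:
        result &= set(p)
    return sorted(result)

def build_flat(docs):
    """Plain inverted index: term -> sorted list of document ids."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].split()):
            index[term].append(doc_id)
    return dict(index)

def search_flat(index, terms):
    """Conjunctive query on the flat index."""
    if not terms or any(t not in index for t in terms):
        return []
    return intersect([index[t] for t in terms])

def build_clustered(docs, cluster_of):
    """Two-level index (an assumed layout, not the paper's):
    cluster_index: term -> set of cluster ids containing the term
    local: cluster id -> term -> sorted list of document ids
    """
    cluster_index = defaultdict(set)
    local = defaultdict(lambda: defaultdict(list))
    for doc_id in sorted(docs):
        c = cluster_of[doc_id]
        for term in set(docs[doc_id].split()):
            cluster_index[term].add(c)
            local[c][term].append(doc_id)
    return cluster_index, local

def search_clustered(cluster_index, local, terms):
    """Conjunctive query: only clusters containing every term are scanned."""
    if not terms or any(t not in cluster_index for t in terms):
        return []
    candidates = set.intersection(*(cluster_index[t] for t in terms))
    hits = []
    for c in candidates:
        hits.extend(intersect([local[c][t] for t in terms]))
    return sorted(hits)

# Toy corpus; the cluster assignment is chosen by hand for illustration.
docs = {
    0: "fast exact search",
    1: "exact search index",
    2: "image retrieval",
    3: "video retrieval index",
}
cluster_of = {0: 0, 1: 0, 2: 1, 3: 1}

flat = build_flat(docs)
cluster_index, local = build_clustered(docs, cluster_of)
```

For the query `["index", "retrieval"]`, only cluster 1 contains both terms, so cluster 0 is skipped without touching its posting lists, and both `search_flat` and `search_clustered` return the same document set. How much work is saved depends on how well the clustering groups documents that share query terms, which is why the paper optimizes the clustering against query cost directly.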
Taxonomy
Topics: Algorithms and Data Compression · Web Data Mining and Analysis · Advanced Image and Video Retrieval Techniques
