Efficient Document Indexing Using Pivot Tree

Gaurav Singh; Benjamin Piwowarski

arXiv:1605.06693 · cs.IR · May 24, 2016

TL;DR

This paper introduces a pivot tree-based indexing method for fast top-k document search in high-dimensional tf-idf space using cosine similarity, addressing the challenge of non-metric similarity measures.

Contribution

The paper proposes a novel pivot tree indexing technique tailored for cosine similarity, enabling efficient document retrieval in high-dimensional spaces.

Findings

01

The pivot tree method improves search efficiency over existing approaches.

02

The study analyzes the trade-off between precision and efficiency.

03

Comparison shows competitive performance with state-of-the-art methods.

Abstract

We present a novel method for efficiently searching for the top-k neighbors of documents represented in a high-dimensional term space, based on cosine similarity. Documents are most often stored in a bag-of-words tf-idf representation. One of the most widely used ways of computing the similarity between a pair of documents is the cosine similarity between their vector representations. However, cosine similarity is not a metric distance measure because it does not satisfy the triangle inequality, so most metric search methods cannot be applied directly. We propose an efficient method for indexing documents using a pivot tree that leads to efficient retrieval. We also study the relation between precision and efficiency for the proposed method and compare it with a state-of-the-art method for document search based on the inner product.
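The abstract's point about the triangle inequality can be checked on a small example. The sketch below is illustrative only and is not taken from the paper: it assumes the dissimilarity is defined as 1 minus the cosine similarity (a common convention, but an assumption here), and the function names and the three 2-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Assumed dissimilarity: 1 - cosine similarity (not confirmed by the paper)."""
    return 1.0 - cosine_similarity(u, v)


def violates_triangle_inequality(a, b, c, tol=1e-12):
    """True if d(a, c) > d(a, b) + d(b, c), i.e. the metric axiom fails."""
    return cosine_distance(a, c) > cosine_distance(a, b) + cosine_distance(b, c) + tol


# Hypothetical toy "documents" over a two-term vocabulary.
a = [1.0, 0.0]
b = [1.0, 1.0]
c = [0.0, 1.0]

d_ac = cosine_distance(a, c)  # orthogonal vectors, so the distance is 1
d_ab = cosine_distance(a, b)  # 45 degrees apart, so 1 - 1/sqrt(2)
d_bc = cosine_distance(b, c)  # 45 degrees apart, so 1 - 1/sqrt(2)

print(d_ac, d_ab + d_bc, violates_triangle_inequality(a, b, c))
```

Here the direct distance between `a` and `c` exceeds the sum of the two distances through `b`, so pruning rules that rely on the triangle inequality cannot be used unchanged with this dissimilarity.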

Taxonomy

Topics: Data Management and Algorithms · Image Retrieval and Classification Techniques · Algorithms and Data Compression