Fast Cosine Similarity Search in Binary Space with Angular Multi-index Hashing

Sepehr Eghbali; Ladan Tahvildari

arXiv:1610.00574 · cs.DB · April 19, 2018

TL;DR

This paper introduces angular multi-index hashing, an algorithm for exact cosine similarity search over large datasets of binary codes. It is reported to be orders of magnitude faster than linear scan and to outperform approximation methods.

Contribution

It proposes a novel multi-index hashing approach that enables exact cosine similarity search in binary space with sublinear query time.

Findings

01. Achieves orders of magnitude faster search than linear scan.

02. Provides exact nearest neighbor results in binary space.

03. Outperforms existing approximation methods in speed and accuracy.

Abstract

Given a large dataset of binary codes and a binary query point, we address how to efficiently find $K$ codes in the dataset that yield the largest cosine similarities to the query. The straightforward answer to this problem is to compare the query with all items in the dataset, but this is practical only for small datasets. One potential solution to enhance the search time and achieve sublinear cost is to use a hash table populated with binary codes of the dataset and then look up the nearby buckets to the query to retrieve the nearest neighbors. However, if codes are compared in terms of cosine similarity rather than the Hamming distance, then the main issue is that the order of buckets to probe is not evident. To examine this issue, we first elaborate on the connection between the Hamming distance and the cosine similarity. Doing this allows us to systematically find the probing…
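The connection between Hamming distance and cosine similarity that the abstract refers to can be illustrated with a minimal sketch. For binary codes $p$ and $q$, the inner product $\langle p, q \rangle$ counts the positions where both bits are 1, so the Hamming distance equals $\|p\|_1 + \|q\|_1 - 2\langle p, q \rangle$ and the cosine similarity can be rewritten in terms of it. The Python below is not the authors' implementation: the function names, the use of Python integers as bit strings, and the convention of returning 0 for all-zero codes are assumptions made for illustration. It covers only the linear-scan baseline, not the multi-index hashing scheme.

```python
import heapq
import math


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def cosine_direct(p: int, q: int) -> float:
    """Cosine similarity of two binary codes stored as Python ints."""
    norm_p, norm_q = popcount(p), popcount(q)
    if norm_p == 0 or norm_q == 0:
        return 0.0  # assumed convention for all-zero codes
    # For 0/1 vectors the inner product is the popcount of the bitwise AND,
    # and the squared Euclidean norm is the popcount of the code itself.
    return popcount(p & q) / math.sqrt(norm_p * norm_q)


def cosine_from_hamming(p: int, q: int) -> float:
    """Same quantity, expressed through the Hamming distance."""
    norm_p, norm_q = popcount(p), popcount(q)
    if norm_p == 0 or norm_q == 0:
        return 0.0  # assumed convention for all-zero codes
    hamming = popcount(p ^ q)
    # hamming = norm_p + norm_q - 2 * <p, q>, solved for <p, q>.
    return (norm_p + norm_q - hamming) / (2.0 * math.sqrt(norm_p * norm_q))


def top_k_linear_scan(codes, query, k):
    """Baseline: score every code and keep the k largest (similarity, index) pairs."""
    scored = ((cosine_direct(code, query), i) for i, code in enumerate(codes))
    return heapq.nlargest(k, scored)
```

With the query `0b1110`, for instance, the codes `0b1111` and `0b0110` are both at Hamming distance 1 yet have different cosine similarities, because the similarity also depends on how many bits are set in each code. This is one way to see why the order in which to probe buckets is not fixed by Hamming distance alone.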

Code & Models

Repositories

sepehr3pehr/AMIH (Official)
