Fast approximate furthest neighbors with data-dependent hashing
Ryan R. Curtin, Andrew B. Gardner

TL;DR
This paper introduces DrusillaHash, a data-dependent hashing method for approximate furthest neighbor search that runs up to an order of magnitude faster than existing strategies at comparable accuracy, with a variant providing an absolute approximation guarantee.
Contribution
The paper proposes a novel data-dependent hashing algorithm for approximate furthest neighbor search, along with a variant that is, to the authors' knowledge, the first hashing approach for this problem to give an absolute approximation guarantee.
Findings
DrusillaHash achieves up to an order of magnitude speedup over other approximate furthest neighbor algorithms.
It does so while reaching comparable levels of approximation.
The algorithm is implemented in the mlpack library.
Abstract
We present a novel hashing strategy for approximate furthest neighbor search that selects projection bases using the data distribution. This strategy leads to an algorithm, which we call DrusillaHash, that is able to outperform existing approximate furthest neighbor strategies. Our strategy is motivated by an empirical study of the behavior of the furthest neighbor search problem, which lends intuition for where our algorithm is most useful. We also present a variant of the algorithm that gives an absolute approximation guarantee; to our knowledge, this is the first such approximate furthest neighbor hashing approach to give such a guarantee. Performance studies indicate that DrusillaHash can achieve comparable levels of approximation to other algorithms while giving up to an order of magnitude speedup. An implementation is available in the mlpack machine learning library (found at…
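To make the idea of choosing projection bases from the data more concrete, the following is a minimal Python sketch. It is not the DrusillaHash algorithm as specified in the paper: the function names (`build_candidates`, `approx_furthest`), the parameters `num_bases` and `per_base`, and the specific rule of using the largest-norm centered point as each base are all assumptions made for illustration. The sketch only shows the general pattern of precomputing a small candidate set along data-dependent directions and scanning that set at query time instead of the full dataset.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_furthest(points, query):
    """Brute-force baseline: index of the point furthest from the query."""
    return max(range(len(points)), key=lambda i: dist(points[i], query))


def build_candidates(points, num_bases=2, per_base=3):
    """Hypothetical preprocessing step (an assumption, not the paper's spec).

    For each base, take the remaining centered point with the largest norm
    as a projection direction, then keep the `per_base` remaining points
    with the largest absolute projection onto that direction.
    """
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    centered = [[p[d] - mean[d] for d in range(dim)] for p in points]

    remaining = list(range(n))
    candidates = []
    for _ in range(num_bases):
        if not remaining:
            break
        # Data-dependent base: direction of the largest-norm remaining point.
        far = max(remaining, key=lambda i: sum(c * c for c in centered[i]))
        norm = math.sqrt(sum(c * c for c in centered[far]))
        if norm == 0.0:
            break  # all remaining points coincide with the mean
        base = [c / norm for c in centered[far]]

        def abs_projection(i):
            return abs(sum(a * b for a, b in zip(centered[i], base)))

        ranked = sorted(remaining, key=abs_projection, reverse=True)
        chosen = ranked[:per_base]
        candidates.extend(chosen)
        remaining = [i for i in remaining if i not in chosen]
    return candidates


def approx_furthest(points, candidates, query):
    """Query step: scan only the precomputed candidates."""
    return max(candidates, key=lambda i: dist(points[i], query))
```

A possible usage is to call `build_candidates(points)` once, then call `approx_furthest(points, candidates, query)` per query and compare against `exact_furthest(points, query)` to measure how often the smaller scan finds the true furthest point. This toy version carries no approximation guarantee.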
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Video Analysis and Summarization
