Learning to Hash Robustly, Guaranteed
Alexandr Andoni, Daniel Beaglehole

TL;DR
This paper introduces a robust nearest neighbor search algorithm in Hamming space that guarantees worst-case performance while adapting to dataset structure, outperforming existing methods both theoretically and practically.
Contribution
It presents a novel NNS algorithm with worst-case guarantees that is instance-optimal and adapts to dataset structure, bridging the gap between theoretical guarantees and practical performance.
Findings
Achieves worst-case guarantees essentially matching those of the best theoretical algorithms.
Demonstrates 1.8x and 2.1x better recall on MNIST and ImageNet, respectively.
Outperforms standard algorithms on structured datasets.
Abstract
The indexing algorithms for high-dimensional nearest neighbor search (NNS) with the best worst-case guarantees are based on randomized Locality Sensitive Hashing (LSH) and its derivatives. In practice, many heuristic approaches exist to "learn" the best indexing method in order to speed up NNS, crucially adapting to the structure of the given dataset. Oftentimes, these heuristics outperform the LSH-based algorithms on real datasets, but they almost always come at the cost of losing the guarantees of either correctness or robust performance on adversarial queries, or they apply only to datasets with an assumed extra structure/model. In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching those of theoretical algorithms, while optimizing the hashing to the structure of the dataset (think instance-optimal algorithms) for performance…
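For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of classic bit-sampling LSH for Hamming space, in which each hash table keys a point by a fixed random subset of its coordinates. It is not the algorithm proposed in the paper; the class name `BitSamplingLSH` and the parameters `bits_per_table` and `num_tables` are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


class BitSamplingLSH:
    """Baseline bit-sampling LSH index for Hamming space (illustrative only)."""

    def __init__(self, dim, bits_per_table, num_tables, seed=0):
        rng = random.Random(seed)
        # Each table hashes a point by reading a fixed, uniformly random
        # selection of coordinates (sampled with replacement).
        self.projections = [
            tuple(rng.randrange(dim) for _ in range(bits_per_table))
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    def _key(self, point, coords):
        return tuple(point[i] for i in coords)

    def add(self, point):
        idx = len(self.points)
        self.points.append(point)
        for coords, table in zip(self.projections, self.tables):
            table[self._key(point, coords)].append(idx)

    def query(self, q):
        """Return (index, distance) of the closest colliding point, or None."""
        candidates = set()
        for coords, table in zip(self.projections, self.tables):
            # .get avoids creating empty buckets in the defaultdict.
            candidates.update(table.get(self._key(q, coords), ()))
        if not candidates:
            return None
        best = min(candidates, key=lambda i: hamming(self.points[i], q))
        return best, hamming(self.points[best], q)
```

In this baseline the sampled coordinates are chosen uniformly at random, independent of the data. According to the abstract, the paper instead optimizes the hashing to the structure of the dataset while retaining worst-case guarantees; how it does so is not described in this summary.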
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Video Surveillance and Tracking Methods
