Fast Exact Search in Hamming Space with Multi-Index Hashing
Mohammad Norouzi, Ali Punjani, David J. Fleet

TL;DR
This paper presents a multi-index hashing method for fast, exact k-nearest neighbor search in Hamming space over long binary codes (64, 128, or 256 bits), with dramatic speedups over a linear scan on datasets of up to one billion codes.
Contribution
The paper introduces multi-index hashing: multiple hash tables built on substrings of the binary codes, which together support exact search in Hamming space even for long codes. The method is supported by theoretical analysis and empirical results.
Findings
Sub-linear runtime behavior for uniformly distributed codes
Dramatic speedups over linear scan on datasets up to one billion codes
Effective for binary codes of 64, 128, and 256 bits
Abstract
There is growing interest in representing image data and feature descriptors using compact binary codes for fast near neighbor search. Although binary codes are motivated by their use as direct indices (addresses) into a hash table, codes longer than 32 bits are not being used as such, as it was thought to be ineffective. We introduce a rigorous way to build multiple hash tables on binary code substrings that enables exact k-nearest neighbor search in Hamming space. The approach is storage efficient and straightforward to implement. Theoretical analysis shows that the algorithm exhibits sub-linear run-time behavior for uniformly distributed codes. Empirical results show dramatic speedups over a linear scan baseline for datasets of up to one billion codes of 64, 128, or 256 bits.
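The substring idea in the abstract can be illustrated with a small sketch. It rests on a pigeonhole argument: if two codes differ in at most r bits and each code is split into m disjoint substrings, then at least one pair of corresponding substrings differs in at most floor(r/m) bits. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The function names (split, build_index, search and so on) are hypothetical, the code length is assumed to be divisible by m, and the sketch answers fixed-radius queries only. The published method has refinements, such as answering k-nearest-neighbor queries by growing the radius, that are not shown here.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Hamming distance between two non-negative integers."""
    return bin(a ^ b).count("1")


def split(code, bits, m):
    """Split a `bits`-bit integer into m contiguous substrings.

    Assumption of this sketch: `bits` is divisible by m.
    """
    s = bits // m
    mask = (1 << s) - 1
    return [(code >> (i * s)) & mask for i in range(m)]


def build_index(codes, bits, m):
    """Build one hash table per substring position.

    Table j maps a substring value to the indices of all codes
    whose j-th substring equals that value.
    """
    tables = [defaultdict(list) for _ in range(m)]
    for idx, code in enumerate(codes):
        for table, sub in zip(tables, split(code, bits, m)):
            table[sub].append(idx)
    return tables


def substrings_within(sub, s, radius):
    """Yield every s-bit value within Hamming distance `radius` of `sub`."""
    for d in range(radius + 1):
        for positions in combinations(range(s), d):
            flipped = sub
            for p in positions:
                flipped ^= 1 << p
            yield flipped


def search(query, r, codes, tables, bits, m):
    """Return sorted indices of all codes within Hamming distance r of query.

    Each table is probed at substring radius floor(r / m). By the pigeonhole
    argument every true r-neighbor shows up as a candidate in at least one
    table. Candidates are then checked against the full code, so the result
    is exact.
    """
    s = bits // m
    sub_radius = r // m
    candidates = set()
    for table, sub in zip(tables, split(query, bits, m)):
        for probe in substrings_within(sub, s, sub_radius):
            # .get avoids creating empty buckets in the defaultdict
            candidates.update(table.get(probe, ()))
    return sorted(i for i in candidates if hamming(codes[i], query) <= r)
```

To use it, build the tables once with build_index(codes, bits, m) and then call search(query, r, codes, tables, bits, m) for each query. The result should match a linear scan over codes. Any speedup comes from probing a small number of buckets instead of comparing the query against every code.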
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Algorithms and Data Compression · Image Retrieval and Classification Techniques
