PM-LSH: a fast and accurate in-memory framework for high-dimensional approximate NN and closest pair search
Bolong Zheng, Xi Zhao, Lianggui Weng, Nguyen Quoc Viet Hung, Hang Liu, Christian S. Jensen

TL;DR
PM-LSH is a novel in-memory framework that combines a PM-tree index and confidence intervals to efficiently and accurately perform high-dimensional approximate nearest neighbor and closest pair searches, outperforming existing methods.
Contribution
The paper introduces PM-LSH, a new in-memory LSH framework with a PM-tree index and confidence intervals for improved accuracy and efficiency in high-dimensional NN and CP searches.
Findings
Outperforms existing methods in efficiency and accuracy
Effective for large-scale, high-dimensional datasets
Supports both NN and CP search with high result quality
Abstract
Nearest neighbor (NN) search is inherently computationally expensive in high-dimensional spaces due to the curse of dimensionality. As a well-known solution, locality-sensitive hashing (LSH) is able to answer c-approximate NN (c-ANN) queries in sublinear time with constant probability. Existing LSH methods focus mainly on building hash bucket-based indexing such that the candidate points can be retrieved quickly. However, existing coarse-grained structures fail to offer accurate distance estimation for candidate points, which translates into additional computational overhead when having to examine unnecessary points. This in turn reduces the performance of query processing. In contrast, we propose a fast and accurate in-memory LSH framework, called PM-LSH, that aims to compute c-ANN queries on large-scale, high-dimensional datasets. First, we adopt a simple yet effective PM-tree to…
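To make the abstract's idea of retrieving candidates cheaply and then verifying them concrete, here is a minimal Python sketch of projection-based approximate NN search. It is an illustration under stated assumptions, not the paper's implementation: points are projected with random Gaussian directions, a plain linear scan over the projected points stands in for the PM-tree, and a fixed candidate budget stands in for the confidence-interval-based selection. All function and parameter names (`make_projections`, `approx_nn`, `num_candidates`) are hypothetical.

```python
import math
import random


def make_projections(dim, m, rng):
    """Draw m random directions with i.i.d. N(0, 1) entries (assumed hash family)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(m)]


def project(point, projections):
    """Map a dim-dimensional point to m dimensions via dot products."""
    return [sum(a * x for a, x in zip(row, point)) for row in projections]


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def approx_nn(query, data, projections, num_candidates):
    """Return (index, distance) of an approximate nearest neighbor.

    Step 1: rank points by distance in the projected space (a linear scan
            here; an index over the projected points would replace it).
    Step 2: compute exact distances only for the closest num_candidates.
    """
    pq = project(query, projections)
    # In practice the projected points would be computed once at build time.
    projected = [project(p, projections) for p in data]
    order = sorted(range(len(data)), key=lambda i: euclidean(pq, projected[i]))
    candidates = order[:num_candidates]
    best = min(candidates, key=lambda i: euclidean(query, data[i]))
    return best, euclidean(query, data[best])


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.uniform(-1, 1) for _ in range(32)] for _ in range(200)]
    projections = make_projections(32, 8, rng)
    query = [rng.uniform(-1, 1) for _ in range(32)]
    print(approx_nn(query, data, projections, num_candidates=20))
```

The candidate budget trades accuracy for work: with `num_candidates` equal to the dataset size the search is exact, and smaller budgets verify fewer points at the risk of missing the true nearest neighbor.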
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Data Management and Algorithms · Video Surveillance and Tracking Methods
