PolyMinHash: Efficient Area-Based MinHashing of Polygons for Approximate Nearest Neighbor Search
Alima Subedi, Sankalpa Pokharel, Satish Puri

TL;DR
PolyMinHash is a 2D polygon-hashing method that approximates area-based polygon similarity for large-scale spatial data, cutting the number of candidates processed in nearest neighbor search by up to 98% compared to brute force.
Contribution
This paper presents PolyMinHash, a new area-based MinHashing scheme specifically designed for polygons, enabling efficient approximate similarity search in spatial databases.
Findings
Reduces candidate processing by up to 98% compared to brute-force methods.
Preserves area-based Jaccard similarity through a novel polygon hashing technique.
Characterizes the tradeoff between search accuracy and runtime.
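A short derivation may help show why such a hash can preserve area-based Jaccard similarity. It assumes that every polygon is hashed against the same sequence of points p_1, p_2, ... sampled uniformly from a region containing all polygons; this excerpt does not state that setup explicitly, and the symbols h, A, B, and p_i are notation introduced here for illustration.

```latex
% Hash of a polygon P: index of the first shared sample that falls inside P.
h(P) = \min \{\, i : p_i \in P \,\}

% Let k be the first index with p_k \in A \cup B. Then h(A) = h(B) exactly when
% p_k \in A \cap B. Given that p_k lies in A \cup B, it is uniform over A \cup B, so
\Pr[\, h(A) = h(B) \,]
  = \frac{\operatorname{area}(A \cap B)}{\operatorname{area}(A \cup B)}
  = J(A, B)
```

Under these assumptions, the fraction of matching positions in two signatures is an unbiased estimate of the area-based Jaccard similarity J(A, B).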
Abstract
Similarity searches are a critical task in data mining. As data sets grow larger, exact nearest neighbor searches quickly become infeasible, leading to the adoption of approximate nearest neighbor (ANN) searches. ANN has been studied for text data, images, and trajectories. However, there has been little effort to develop ANN systems for polygons in spatial database systems and geographic information systems. We present PolyMinHash, a system for approximate polygon similarity search that adapts MinHashing into a novel 2D polygon-hashing scheme to generate short, similarity-preserving signatures of input polygons. Each MinHash value is generated by counting the number of randomly sampled points drawn until a sampled point lands within the polygon's interior, yielding hash values that preserve area-based Jaccard similarity. We present the tradeoff between search accuracy and runtime of our…
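The sampling procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`point_in_polygon`, `poly_minhash`, `signature`, `estimate_jaccard`), the `bounds` and `max_samples` parameters, the use of one seeded random sequence per hash function shared by all polygons, and the ray-casting containment test are all hypothetical choices made here.

```python
import random

def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the simple polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def poly_minhash(vertices, bounds, seed, max_samples=100_000):
    """Count the samples drawn until one lands inside the polygon."""
    xmin, ymin, xmax, ymax = bounds
    # Assumption: the same seed yields the same point sequence for every polygon.
    rng = random.Random(seed)
    for count in range(1, max_samples + 1):
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        if point_in_polygon(x, y, vertices):
            return count
    return max_samples + 1  # sentinel: no sample hit the polygon within the budget

def signature(vertices, bounds, num_hashes=400):
    """One hash value per seed; seeds 0..num_hashes-1 act as the hash functions."""
    return [poly_minhash(vertices, bounds, seed) for seed in range(num_hashes)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two polygons agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

# Two overlapping squares inside the unit square; their true area-based
# Jaccard similarity is 0.24 / 0.48 = 0.5.
bounds = (0.0, 0.0, 1.0, 1.0)
square_a = [(0.1, 0.1), (0.7, 0.1), (0.7, 0.7), (0.1, 0.7)]
square_b = [(0.3, 0.1), (0.9, 0.1), (0.9, 0.7), (0.3, 0.7)]
sig_a = signature(square_a, bounds)
sig_b = signature(square_b, bounds)
est = estimate_jaccard(sig_a, sig_b)
print(f"estimated Jaccard: {est:.3f}")
```

The estimate should land near the true value of 0.5, with sampling noise that shrinks as `num_hashes` grows. A real system would also need a common sampling region for the whole data set and an index over the signatures (for example, locality-sensitive hashing buckets) to avoid comparing every pair; neither is shown here.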
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Data Management and Algorithms · Algorithms and Data Compression
