# Range-efficient consistent sampling and locality-sensitive hashing for polygons

**Authors:** Joachim Gudmundsson, Rasmus Pagh

arXiv: 1701.05290 · 2017-09-25

## TL;DR

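This paper introduces range-efficient consistent sampling and locality-sensitive hashing (LSH) for objects such as polygons that are represented as sets of grid points in one or two dimensions. This enables similarity search and intersection/union size estimation in time much lower than the number of points.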

## Contribution

It presents a new primitive for range-efficient consistent sampling and shows how to apply it to LSH and to estimating the size of the intersection or union of polygons.
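As a rough illustration of how consistent samples can be turned into LSH values, the sketch below uses threshold sampling on a 1D grid: a point x is kept if h(x) < t for one shared hash function h. The linear hash family, the modulus `P`, and the helpers `make_hash`, `consistent_sample`, and `lsh_value` are hypothetical choices for illustration, not the paper's construction. Two properties matter. First, membership depends only on x, so samples of overlapping sets agree on their overlap. Second, whenever a sample is nonempty it contains the point of minimum hash value, so the min-wise hash can be read off the sample. The enumeration here is a plain scan; avoiding that scan is what range-efficient sampling is for.

```python
P = 2_147_483_647  # the Mersenne prime 2**31 - 1, an assumed modulus


def make_hash(a: int, b: int):
    """Return h(x) = (a*x + b) mod P, an illustrative linear hash (assumption)."""
    def h(x: int) -> int:
        return (a * x + b) % P
    return h


def consistent_sample(lo: int, hi: int, h, t: int) -> list[int]:
    """Grid points x in [lo, hi] with h(x) < t.

    Membership depends only on x, so samples of overlapping intervals agree on
    their overlap. This naive version scans all hi - lo + 1 points.
    """
    return [x for x in range(lo, hi + 1) if h(x) < t]


def lsh_value(lo: int, hi: int, h, t: int):
    """Min-wise hash of the interval, read off its consistent sample.

    If any point has h(x) < t, the point minimizing h over the whole interval
    also does, so it is in the sample. Returns None when the sample is empty.
    """
    sample = consistent_sample(lo, hi, h, t)
    if not sample:
        return None
    return min(sample, key=h)


if __name__ == "__main__":
    h = make_hash(1_103_515_245, 11)  # hypothetical parameters
    t = P // 10                       # sampling rate of roughly 10%
    print(lsh_value(0, 999, h, t), lsh_value(200, 799, h, t))
```

How to handle an empty sample (for example, retrying with a larger threshold) is left open here. A linear hash is also only an approximation to the random permutation that min-wise hashing assumes.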

## Key findings

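- Computes LSH values for polygon point sets in time much lower than direct min-wise hashing, whose cost is proportional to the number of points.
- Provides a data structure for quickly estimating the size of the intersection or union of a set of preprocessed polygons.
- Obtains consistent samples by transforming the sampling task into a geometric problem.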
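The intersection/union estimation application can be illustrated with consistent samples of 2D grid point sets. This sketch assumes axis-aligned rectangles as stand-ins for general polygons and a seeded BLAKE2b hash; both are hypothetical simplifications, and the helper names are illustrative. Each set is sampled at rate q by keeping points whose hash falls below q. Because the same hash decides membership for every set, the union (or intersection) of the samples equals the sample of the union (or intersection). Dividing its size by q therefore gives an unbiased estimate. The samples are computed here by enumerating every point, whereas a range-efficient method would produce them directly.

```python
import hashlib


def point_hash(x: int, y: int, seed: bytes = b"demo-seed") -> float:
    """Map grid point (x, y) to a pseudo-random value in [0, 1) via seeded BLAKE2b."""
    digest = hashlib.blake2b(f"{x},{y}".encode(), key=seed, digest_size=8).digest()
    # Keep the top 53 bits so the quotient is exact and strictly below 1.0.
    return (int.from_bytes(digest, "big") >> 11) / 2**53


def rect_points(x0: int, y0: int, x1: int, y1: int) -> set[tuple[int, int]]:
    """Grid points of the closed rectangle [x0, x1] x [y0, y1] (stand-in for a polygon)."""
    return {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}


def consistent_sample(points: set[tuple[int, int]], q: float) -> set[tuple[int, int]]:
    """Keep a point iff its hash is below q; the decision depends only on the point."""
    return {p for p in points if point_hash(*p) < q}


def estimate_union_size(samples: list[set], q: float) -> float:
    """Unbiased estimate of |A_1 ∪ ... ∪ A_k| from consistent samples at rate q."""
    return len(set().union(*samples)) / q


def estimate_intersection_size(samples: list[set], q: float) -> float:
    """Unbiased estimate of |A_1 ∩ ... ∩ A_k|; requires at least one sample."""
    return len(set.intersection(*samples)) / q


if __name__ == "__main__":
    q = 0.2
    A = rect_points(0, 0, 99, 99)
    B = rect_points(50, 50, 149, 149)
    sA, sB = consistent_sample(A, q), consistent_sample(B, q)
    print("union:", len(A | B), "estimate:", estimate_union_size([sA, sB], q))
    print("intersection:", len(A & B), "estimate:", estimate_intersection_size([sA, sB], q))
```

Storing only the samples of preprocessed polygons and combining them at query time keeps the query cost proportional to the sample sizes rather than to the polygon areas.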

## Abstract

Locality-sensitive hashing (LSH) is a fundamental technique for similarity search and similarity estimation in high-dimensional spaces. The basic idea is that similar objects should produce hash collisions with significantly higher probability than objects with low similarity. We consider LSH for objects that can be represented as point sets in either one or two dimensions. To make the point sets finite, we consider the subset of points on a grid. Directly applying LSH (e.g., min-wise hashing) to these point sets would require time proportional to the number of points. We seek to achieve time much lower than that of direct approaches.

Technically, we introduce new primitives for range-efficient consistent sampling (of independent interest) and show how to turn such samples into LSH values. Another application of our technique is a data structure for quickly estimating the size of the intersection or union of a set of preprocessed polygons. Curiously, our consistent sampling method uses a transformation to a geometric problem.
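For contrast, the direct approach mentioned in the abstract can be sketched as follows: enumerate the grid points of a polygon, then compute min-wise hash values over them. This is a baseline, not the paper's algorithm. The even-odd ray-casting test, the seeded BLAKE2b hash family, and names such as `grid_points` and `minhash_signature` are illustrative assumptions. Both the enumeration and each min-wise hash value take time proportional to the number of grid points (the enumeration here even scans the whole bounding box). Under idealized random hashing, two polygons' signatures agree in each coordinate with probability equal to the Jaccard similarity of their grid point sets.

```python
import hashlib


def inside(px: float, py: float, poly: list[tuple[int, int]]) -> bool:
    """Even-odd ray-casting test; grid points on the boundary may fall on either side."""
    result = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the horizontal line y = py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                result = not result
    return result


def grid_points(poly: list[tuple[int, int]]) -> set[tuple[int, int]]:
    """Integer points of the bounding box that pass the inside test (O(box area) work)."""
    xs = [x for x, _ in poly]
    ys = [y for _, y in poly]
    return {
        (x, y)
        for x in range(min(xs), max(xs) + 1)
        for y in range(min(ys), max(ys) + 1)
        if inside(x, y, poly)
    }


def seeded_hash(seed: int, p: tuple[int, int]) -> int:
    """One member of an assumed hash family, indexed by seed."""
    digest = hashlib.blake2b(
        f"{p[0]},{p[1]}".encode(), key=str(seed).encode(), digest_size=8
    ).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(points: set[tuple[int, int]], k: int) -> list[tuple[int, int]]:
    """k min-wise hash values; each one scans every point."""
    return [min(points, key=lambda p: seeded_hash(s, p)) for s in range(k)]


def estimate_jaccard(sig_a, sig_b) -> float:
    """Fraction of signature coordinates on which the two sets collide."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    A = grid_points([(0, 0), (10, 0), (10, 10), (0, 10)])
    B = grid_points([(5, 0), (15, 0), (15, 10), (5, 10)])
    exact = len(A & B) / len(A | B)
    est = estimate_jaccard(minhash_signature(A, 400), minhash_signature(B, 400))
    print(f"exact Jaccard {exact:.3f}, min-hash estimate {est:.3f}")
```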

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1701.05290/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1701.05290/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/1701.05290/full.md

---
Source: https://tomesphere.com/paper/1701.05290