# The Effect of Points Dispersion on the $k$-nn Search in Random Projection Forests

**Authors:** Mashaan Alshammari, John Stavrakakis, Adel F. Ahmed, Masahiro Takatsuka

arXiv: 2302.13160 · 2023-02-28

## TL;DR

This paper investigates how two factors affect $k$-nearest neighbor search in random projection forests: the dispersion of points along the random split direction and the number of trees. It concludes that adding more trees diminishes the impact of dispersion.

## Contribution

The study shows that the dispersion of points has very limited influence on $k$-nn search once the number of trees is large, which supports using the original rpTree algorithm with a single random direction per split.

## Key findings

- More trees reduce the effect of point dispersion on $k$-nn search.
- Using the original rpTree algorithm with random directions is recommended.
- Point dispersion becomes less critical with larger rpForest collections.

## Abstract

Partitioning trees are efficient data structures for $k$-nearest neighbor search. Machine learning libraries commonly use a special type of partitioning tree called the $k$d-tree to perform $k$-nn search. Unfortunately, $k$d-trees can be ineffective in high dimensions because they need more tree levels to decrease the vector quantization (VQ) error. Random projection trees (rpTrees) solve this scalability problem by using random directions to split the data. A collection of rpTrees is called an rpForest. $k$-nn search in an rpForest is influenced by two factors: 1) the dispersion of points along the random direction and 2) the number of rpTrees in the rpForest. In this study, we investigate how these two factors affect the $k$-nn search with varying $k$ values and different datasets. We found that with a larger number of trees, the dispersion of points has a very limited effect on the $k$-nn search. One should use the original rpTree algorithm by picking a random direction regardless of the dispersion of points.
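
The mechanics described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it (`build_rptree`, `n_candidates`, `leaf_size`, and so on) is hypothetical. It assumes three simplifications: dispersion is measured as the variance of the projected points, each node splits at the median projection, and a query only inspects the single leaf it falls into in each tree. With `n_candidates=1` the tree takes one random direction per split regardless of dispersion; a larger value keeps the candidate direction with the highest dispersion.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a random direction by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def project(point, direction):
    """Scalar projection of a point onto a direction."""
    return sum(p * d for p, d in zip(point, direction))


def dispersion(points, indices, direction):
    """Variance of the projections (an assumed stand-in for dispersion)."""
    proj = [project(points[i], direction) for i in indices]
    mean = sum(proj) / len(proj)
    return sum((p - mean) ** 2 for p in proj) / len(proj)


def build_rptree(points, indices, rng, leaf_size=10, n_candidates=1):
    """Recursively split the indexed points along a random direction."""
    if len(indices) <= leaf_size:
        return {"leaf": indices}
    dim = len(points[0])
    candidates = [random_unit_vector(dim, rng) for _ in range(n_candidates)]
    # Keep the candidate with the largest dispersion; with one candidate
    # this is simply the single random direction that was drawn.
    direction = max(candidates, key=lambda d: dispersion(points, indices, d))
    proj = {i: project(points[i], direction) for i in indices}
    threshold = sorted(proj.values())[len(indices) // 2]  # median split
    left = [i for i in indices if proj[i] < threshold]
    right = [i for i in indices if proj[i] >= threshold]
    if not left:  # degenerate split (duplicate projections): stop here
        return {"leaf": indices}
    return {
        "direction": direction,
        "threshold": threshold,
        "left": build_rptree(points, left, rng, leaf_size, n_candidates),
        "right": build_rptree(points, right, rng, leaf_size, n_candidates),
    }


def build_rpforest(points, n_trees, rng, leaf_size=10, n_candidates=1):
    """An rpForest is a list of independently built rpTrees."""
    all_indices = list(range(len(points)))
    return [
        build_rptree(points, all_indices, rng, leaf_size, n_candidates)
        for _ in range(n_trees)
    ]


def query_leaf(tree, query):
    """Descend one tree to the leaf that contains the query."""
    node = tree
    while "leaf" not in node:
        side = "left" if project(query, node["direction"]) < node["threshold"] else "right"
        node = node[side]
    return node["leaf"]


def knn_search(forest, points, query, k):
    """Union the query's leaves over all trees, then rank by true distance."""
    candidates = set()
    for tree in forest:
        candidates.update(query_leaf(tree, query))
    ranked = sorted(candidates, key=lambda i: math.dist(points[i], query))
    return ranked[:k]


# Toy usage on synthetic Gaussian data (not one of the paper's datasets).
rng = random.Random(0)
points = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]
forest = build_rpforest(points, n_trees=10, rng=rng)
neighbors = knn_search(forest, points, points[0], k=5)
```

In this sketch each additional tree contributes another leaf to the candidate set, so the pool of candidates grows with the number of trees whichever direction each split used. That is one intuition for why the choice of direction could matter less in larger forests; the paper's experiments are the authority on whether and when it holds.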

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.13160/full.md

## Figures

26 figures with captions in the complete paper: https://tomesphere.com/paper/2302.13160/full.md

## References

31 references — full list in the complete paper: https://tomesphere.com/paper/2302.13160/full.md

---
Source: https://tomesphere.com/paper/2302.13160