# Near neighbor preserving dimension reduction for doubling subsets of $\ell_1$

**Authors:** Ioannis Z. Emiris, Vasilis Margonis, Ioannis Psarros

arXiv: 1902.08815 · 2019-09-10

## TL;DR

This paper introduces a dimension reduction technique for doubling subsets of $\ell_1$ that preserves near neighbor relationships, settling a case left open by earlier work on nearest neighbor-preserving embeddings.

## Contribution

It presents the first near neighbor-preserving embedding for doubling subsets of $\ell_1$: the point set is represented by a covering set, which is then randomly projected using Cauchy variables.

## Key findings

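- Provides a near neighbor-preserving dimension reduction for doubling subsets of $\ell_1$.
- Analyzes the tradeoff between two covering set constructions, $c$-approximate $r$-nets and randomly shifted grids, in terms of preprocessing time and target dimension.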
- Employs concentration bounds for Cauchy variables that are of independent interest.
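
For context on the last point, Cauchy variables pair naturally with $\ell_1$ because of 1-stability. This is a standard fact about the Cauchy distribution, stated here as background rather than as a result of the paper. For any fixed vector $a \in \mathbb{R}^d$:

$$
\sum_{i=1}^{d} a_i X_i \;\sim\; \|a\|_1 \cdot X, \qquad X_1, \dots, X_d, X \ \text{i.i.d. standard Cauchy}.
$$

Taking $a = p - q$, each coordinate of a Cauchy random projection of $p - q$ is distributed as $\|p - q\|_1$ times a standard Cauchy variable. Since the Cauchy distribution has no finite mean, concentration arguments based on averaging do not apply directly, which is presumably why tailored concentration bounds are needed.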

## Abstract

Randomized dimensionality reduction has been recognized as one of the fundamental techniques in handling high-dimensional data. Starting with the celebrated Johnson-Lindenstrauss Lemma, such reductions have been studied in depth for the Euclidean ($\ell_2$) metric, but much less for the Manhattan ($\ell_1$) metric. Our primary motivation is the approximate nearest neighbor problem in $\ell_1$. We exploit its reduction to the decision-with-witness version, called approximate *near* neighbor, which incurs a roughly logarithmic overhead. In 2007, Indyk and Naor, in the context of approximate nearest neighbors, introduced the notion of nearest neighbor-preserving embeddings. These are randomized embeddings between two metric spaces with guaranteed bounded distortion only for the distances between a query point and a point set. Such embeddings are known to exist for both $\ell_2$ and $\ell_1$ metrics, as well as for doubling subsets of $\ell_2$. The case that remained open was that of doubling subsets of $\ell_1$. In this paper, we propose a dimension reduction by means of a *near* neighbor-preserving embedding for doubling subsets of $\ell_1$. Our approach is to represent the point set with a carefully chosen covering set, then randomly project the latter. We study two types of covering sets: $c$-approximate $r$-nets and randomly shifted grids, and we discuss the tradeoff between them in terms of preprocessing time and target dimension. We employ Cauchy variables: certain concentration bounds derived should be of independent interest.
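
The two-step idea in the abstract (cover the point set, then randomly project the cover) can be illustrated with a minimal Python sketch. It is not the paper's construction. The function names, the grid cell size, the target dimension `k`, and the median-based distance estimate are all assumptions chosen for illustration. The median estimator is the classical way to read $\ell_1$ distances off Cauchy projections and is used here only to make the stability property visible.

```python
import math
import random
import statistics


def cauchy_matrix(k, d, rng):
    """k x d matrix of i.i.d. standard Cauchy entries (inverse-CDF sampling)."""
    return [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(d)]
            for _ in range(k)]


def snap_to_shifted_grid(p, cell, shift):
    """Replace p by the nearest point of a grid with side `cell`,
    translated by the vector `shift` (one offset per coordinate)."""
    return [cell * round((x - s) / cell) + s for x, s in zip(p, shift)]


def project(A, p):
    """Linear map p -> A p."""
    return [sum(a * x for a, x in zip(row, p)) for row in A]


def l1(p, q):
    """Manhattan distance."""
    return sum(abs(x - y) for x, y in zip(p, q))


def median_estimate(u, v):
    """Median of |u_i - v_i|. Because the median of |standard Cauchy| is 1,
    this estimates the l1 distance between the original points."""
    return statistics.median(abs(a - b) for a, b in zip(u, v))
```

A possible usage, under the same assumptions: draw `shift` uniformly from `[0, cell)` per coordinate, snap every data point with `snap_to_shifted_grid`, project the snapped points and the query with one shared `cauchy_matrix`, and compare `median_estimate` values against the near neighbor radius. Snapping moves a point by at most `cell / 2` per coordinate, so `cell` controls how much $\ell_1$ error the covering step introduces.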

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.08815/full.md

## References

16 references — full list in the complete paper: https://tomesphere.com/paper/1902.08815/full.md

---
Source: https://tomesphere.com/paper/1902.08815