Linear Dimension Reduction Approximately Preserving a Function of the 1-Norm

Michael P. Casey

arXiv:1906.03536·math.PR·November 9, 2020

TL;DR

This paper introduces a novel random linear embedding method for finite point sets in high-dimensional 1-norm space, preserving a transformed distance function with high probability using Cauchy matrices.

Contribution

It presents a new dimension reduction technique that preserves a concave increasing function of the original distances, up to multiplicative error, with a target dimension only quadratic in the logarithm of the number of points.

Findings

1. The embedding dimension is quadratic in the logarithm of the point set size.

2. The embeddings are random matrices of standard Cauchy entries.

3. Distance preservation holds with high probability.

Abstract

For any finite point set in $D$ -dimensional space equipped with the 1-norm, we present random linear embeddings to $k$ -dimensional space, with a new metric, having the following properties. For any pair of points from the point set that are not too close, the distance between their images is a strictly concave increasing function of their original distance, up to multiplicative error. The target dimension $k$ need only be quadratic in the logarithm of the size of the point set to ensure the result holds with high probability. The linear embeddings are random matrices composed of standard Cauchy random variables, and the proofs rely on Chernoff bounds for sums of iid random variables. The new metric is translation invariant, but is not induced by a norm.

Equations

$$(1-\epsilon)\left\lVert x-y\right\rVert_{2}\leq\left\lVert F(x)-F(y)\right\rVert_{2}\leq(1+\epsilon)\left\lVert x-y\right\rVert_{2}$$

$$\left\lVert x-y\right\rVert_{1}\leq\left\lVert F(x)-F(y)\right\rVert_{1}\leq c\left\lVert x-y\right\rVert_{1}.$$

$$(1-\epsilon)\mu(\left\lVert x-y\right\rVert_{1})\leq\rho(F(x),F(y))\leq(1+\epsilon)\mu(\left\lVert x-y\right\rVert_{1})$$

$$\mu(\left\lVert x-y\right\rVert_{1})\leq\mu(\left\lVert x-z\right\rVert_{1})+\mu(\left\lVert z-y\right\rVert_{1})\quad\text{for any }x,y,z\in\mathbb{R}^{D},$$

$$\rho(x,y):=\frac{1}{k}\sum_{i=1}^{k}\xi(\left\lvert x_{i}-y_{i}\right\rvert)$$

$$\xi(\lambda):=\ln(1+\sqrt{\lambda})+\frac{1}{2}\ln(1+\lambda)\quad\text{and}\quad\mu(\lambda):=\mathbb{E}\xi(\lambda\left\lvert F_{11}\right\rvert)$$

$$\mu\left(\frac{\left\lVert x-y\right\rVert_{1}}{1+\epsilon}\right)\leq\rho(F(x),F(y))\leq\mu\big((1+\epsilon)\left\lVert x-y\right\rVert_{1}\big)$$

$$k=\frac{C}{\epsilon^{2}(1-\epsilon)^{2}}\ln N.$$

$$\mu\left(\frac{\left\lVert x-y\right\rVert_{p}}{1+\epsilon}\right)\leq\rho(F(x),F(y))\leq\mu\big((1+\epsilon)\left\lVert x-y\right\rVert_{p}\big)$$

$$\rho(F(v),0)=\frac{1}{k}\sum_{j=1}^{k}\xi(\left\lVert v\right\rVert_{1}\left\lvert X_{j}\right\rvert),$$

$$\mathbb{E}\rho(F(v),0)=\mathbb{E}\xi(\left\lVert v\right\rVert_{1}\left\lvert X\right\rvert)=:\mu(\left\lVert v\right\rVert_{1})\quad\text{for }X\sim\text{Cauchy}\left(1\right).$$

$$\mu(\lambda)\approx\frac{1}{k}\sum_{j=1}^{k}\xi(\lambda\left\lvert X_{j}\right\rvert)$$

$$\mathbb{P}\left\{\frac{1}{k}\sum_{j=1}^{k}\xi(\lambda\left\lvert X_{j}\right\rvert)-\mu(\lambda)>t\right\}\leq\left(e^{-st}\mathbb{E}e^{s\big(\xi(\lambda\left\lvert X\right\rvert)-\mu(\lambda)\big)}\right)^{k}$$

$$\mathbb{E}\left\lvert X\right\rvert^{b}<\infty\quad\text{only for }\left\lvert b\right\rvert<1,$$

$$\xi(\lambda)=\ln(1+\sqrt{\lambda})+\frac{1}{2}\ln(1+\lambda)\leq 2\ln(1+\sqrt{\lambda}),$$

$$\xi(\left\lvert x_{i}-y_{i}\right\rvert)\leq\xi(\left\lvert x_{i}-z_{i}\right\rvert)+\xi(\left\lvert z_{i}-y_{i}\right\rvert),$$

$$\min_{s}e^{-s\Delta}\mathbb{E}e^{s\big(\xi(\lambda\left\lvert X\right\rvert)-\mu(\lambda)\big)}\leq\exp\left(-\frac{\Delta^{2}}{4(V^{2}+A)}\right)$$

$$\mathbb{E}\exp(sY)\mathbb{I}\{sY\leq 1\}\leq 1+s^{2}\mathbb{E}Y^{2}\leq 1+s^{2}V^{2}$$

$$\mathbb{E}\exp(sY)\mathbb{I}\{sY>1\}=e\,\mathbb{P}\{Y>1/s\}+\int_{1}^{\infty}e^{t}\,\mathbb{P}\{Y>t/s\}\,dt\leq s^{2}A(\lambda).$$

$$\mathbb{P}\{Y>t/s\}\leq\mathbb{P}\left\{2\ln(1+\sqrt{\lambda\left\lvert X\right\rvert})>\mu(\lambda)+t/s\right\}\leq C(\lambda)e^{-t/s}$$

$$\min_{s}e^{-s\Delta}\cdots$$

$$\mathbb{P}\left\{\frac{1}{k}\sum_{j=1}^{k}\xi(\lambda\left\lvert X_{j}\right\rvert)-\mu(\lambda)>\Delta\right\}\leq 2\exp\left(-k\frac{\Delta^{2}}{4(V^{2}+A(\lambda))}\right)$$

$$k=(c+2)\ln(N)\frac{4(V^{2}+A(\lambda))}{\Delta^{2}}$$

$$\frac{1}{k}\sum_{j=1}^{k}\xi(\lambda\left\lvert X_{j}\right\rvert)-\mu(\lambda)\leq\Delta$$

$$\mu(\lambda):=\frac{1}{2}\ln(1+\lambda^{2})+\operatorname{atanh}\left(\frac{\sqrt{2\lambda}}{1+\lambda}\right)\quad\text{with}\quad\operatorname{atanh}(x):=\sum_{j=0}^{\infty}\frac{x^{2j+1}}{2j+1}$$

$$\operatorname{Var}(\xi(\lambda\left\lvert X\right\rvert))\leq\min\left\{\frac{\pi^{2}}{2},\,2\,\mathbb{E}\ln(1+\lambda\left\lvert X\right\rvert)\right\}.$$

$$\mu((1+\epsilon)\lambda)-\mu(\lambda)\quad\text{and}\quad\mu(\lambda)-\mu((1+\epsilon)^{-1}\lambda),$$

$$s^{\ast}=\Delta/(2(V^{2}+A(\lambda)))<1/2.$$

$$k\geq\frac{C}{\epsilon^{2}(1-\epsilon)^{2}}\ln(N^{c+2})\quad\text{with}\quad C=64\left(\frac{\pi^{2}}{2}+\frac{16\sqrt{2}}{e\pi}e^{\operatorname{atanh}(1/\sqrt{2})}\right).$$

Full text

Linear Dimension Reduction Approximately Preserving a Function of the 1-Norm

Michael P. Casey

United States Air Force Research Laboratory

MSC subject classifications: 60, 46B09, 46B85, 60E07, 60G50.

Keywords: dimension reduction, embeddings of finite metric spaces, random projection, metric preserving function, Cauchy random variables, Cauchy projections, stable distributions, concentration of measure.

1 Introduction

The Johnson-Lindenstrauss lemma [8] states that for a finite set of points $P\subset\mathbb{R}^{D}$ and $0<\epsilon<1$ , there are random linear maps $F:\mathbb{R}^{D}\to\mathbb{R}^{k}$ satisfying, for any $x,y\in P$ ,

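$$(1-\epsilon)\left\lVert x-y\right\rVert_{2}\leq\left\lVert F(x)-F(y)\right\rVert_{2}\leq(1+\epsilon)\left\lVert x-y\right\rVert_{2}$$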

with high probability, provided $k=\Theta(\epsilon^{-2}\ln\left\lvert P\right\rvert)$ . It is sufficient to draw the entries of $F$ i.i.d. sub-Gaussian [13]. These random linear projections have provided improved worst case performance bounds for many problems in theoretical computer science, machine learning, and numerical linear algebra. Ailon and Chazelle [1] show how $F$ may be computed quickly and apply it to the approximate nearest-neighbor problem, working on the projected points $F(P)$ . Vempala [19] gives a review of problems that may be reduced to analyzing a set of points $P\subset\mathbb{R}^{D}$ , so that after the random projection $F:\mathbb{R}^{D}\to\mathbb{R}^{k}$ is applied, the recovery of approximate solutions is possible with time and space bounds depending on $k$ , the target dimension, instead of $D$ , the ambient dimension.

In numerical linear algebra, Drineas et al. [5] use the lemma to approximate the leverage scores of a given matrix $A$ ; such scores are used to inform subsampling schemes for $A$ , resulting in sketches $\tilde{A}$ of smaller dimensions that preserve desired properties of $A$ . Drineas and Mahoney [6] give a further review of using randomness in numerical linear algebra.

The Johnson-Lindenstrauss lemma is a metric embedding result; the map $F$ sends the finite metric space $P\subset\mathbb{R}^{D}$ induced by the 2-norm to a corresponding metric space $F(P)\subset\mathbb{R}^{k}$, also induced by the 2-norm, such that distances are preserved well. Ailon and Chazelle [1] show that equipping the target space $\mathbb{R}^{k}$ with the 1-norm is also possible; the target dimension is still proportional to $\ln\left\lvert P\right\rvert$, but the dependence on $\epsilon$ may be a bit worse. However, analogous results using the 1-norm on the domain do not hold. For example, in [2] and [10], specific $N$-point subsets of $\mathbb{R}^{D}$ equipped with the $1$-norm are shown to embed only in $\mathbb{R}^{k}$ with $k=N^{1/c^{2}}$ if one requires

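$$\left\lVert x-y\right\rVert_{1}\leq\left\lVert F(x)-F(y)\right\rVert_{1}\leq c\left\lVert x-y\right\rVert_{1}.$$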

In particular, Brinkman and Charikar [2] show the target dimension $k$ must be at least $N^{1/2-O(\epsilon\ln(1/\epsilon))}$ if one wants $c=1+\epsilon$ .

In light of these negative results, people have tried estimating $\left\lVert x-y\right\rVert_{1}$ from the coordinates of $F(x)-F(y)$ . When the entries of $F$ are i.i.d. standard Cauchy random variables, the coordinates are distributed i.i.d. like $\left\lVert x-y\right\rVert_{1}X$ with $X\sim\text{Cauchy}\left(1\right)$ . The median of $\left\lVert x-y\right\rVert_{1}\left\lvert X\right\rvert$ is $\left\lVert x-y\right\rVert_{1}$ , so estimating the median from the coordinates of $F(x)-F(y)$ would estimate the distance this way. Indyk [7] considers the sample median as an estimator, while Li, Hastie, and Church [12] consider 1-homogeneous functions of these coordinates for estimators. None of the estimators considered are metrics on $\mathbb{R}^{k}$ . For nearest neighbor methods, we should like to have a metric on the target space $\mathbb{R}^{k}$ and prefer a low number of coordinates for each point.
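The median-based estimate described above can be illustrated with a short simulation. This is a minimal sketch using only the Python standard library; the helper names `cauchy`, `project`, and `median_estimate`, the test vector, and the choice $k=20001$ are assumptions made for illustration and do not come from [7] or [12].

```python
import math
import random
import statistics


def cauchy(rng):
    """Draw one standard Cauchy variate by inverting the CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def project(v, k, rng):
    """Apply a fresh k x D matrix of i.i.d. Cauchy(1) entries to v."""
    return [sum(cauchy(rng) * vi for vi in v) for _ in range(k)]


def median_estimate(coords):
    """Sample median of |coordinates|; an estimator, not a metric on R^k."""
    return statistics.median(abs(c) for c in coords)


rng = random.Random(0)
v = [3.0, -1.5, 0.0, 2.0, -0.5]        # plays the role of x - y
one_norm = sum(abs(vi) for vi in v)    # the 1-norm of v
estimate = median_estimate(project(v, 20001, rng))
print(one_norm, estimate)
```

By 1-stability each projected coordinate is distributed like $\left\lVert v\right\rVert_{1}X$, so the sample median of the absolute values should land near $\left\lVert v\right\rVert_{1}$ once $k$ is large.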

Relaxing the problem as follows, we wish to find linear maps $F:\mathbb{R}^{D}\to\mathbb{R}^{k}$ satisfying, for any $x,y\in P$ ,

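$$(1-\epsilon)\mu(\left\lVert x-y\right\rVert_{1})\leq\rho(F(x),F(y))\leq(1+\epsilon)\mu(\left\lVert x-y\right\rVert_{1})$$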

with high probability. We have changed the metric on $\mathbb{R}^{k}$ to $\rho$ instead of the one induced by the 1-norm, and we have introduced a nonlinear function $\mu$ in place of the identity function. We want $k=\Theta(\epsilon^{-2}\ln^{c}\left\lvert P\right\rvert)$ , with $c<4$ or better.

Here, $\mu:\mathbb{R}_{+}\to\mathbb{R}_{+}$ is a concave increasing function with $\mu(0)=0$ . Such $\mu$ are called “metric preserving” by Corazza [4], for the following reason:

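$$\mu(\left\lVert x-y\right\rVert_{1})\leq\mu(\left\lVert x-z\right\rVert_{1})+\mu(\left\lVert z-y\right\rVert_{1})\quad\text{for any }x,y,z\in\mathbb{R}^{D},$$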

that is, they admit a new metric on the space that is “compatible” with the old one. In particular, spheres for the new metric about a particular point $y\in\mathbb{R}^{D}$ , that is, the level sets $\left\{x\in\mathbb{R}^{D}\;\lvert\;\mu\circ\left\lVert x-y\right\rVert_{1}=t\right\},$ look like scaled versions of spheres for the 1-norm (crosspolytopes) about that point; the scaling however is nonlinear. The 1-norm is used here as an example, but any other input metric will still satisfy the triangle inequality under such $\mu$ . Not all metric preserving functions are concave increasing, but such a choice ensures the new metric generates the same topology as the old one.

For us, the linear map $F:\mathbb{R}^{D}\to\mathbb{R}^{k}$ will have entries $F_{ij}\overset{\text{i.i.d.}}{\sim}\text{Cauchy}\left(1\right)$ , and we introduce the metric $\rho$ on $\mathbb{R}^{k}$ using an auxiliary function $\xi$ :

$$\rho(x,y):=\frac{1}{k}\sum_{i=1}^{k}\xi(\left\lvert x_{i}-y_{i}\right\rvert)$$

with

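$$\xi(\lambda):=\ln(1+\sqrt{\lambda})+\frac{1}{2}\ln(1+\lambda)\quad\text{and}\quad\mu(\lambda):=\mathbb{E}\xi(\lambda\left\lvert F_{11}\right\rvert)$$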

for $\lambda>0$ . Our main theorem has several regimes depending on how big $\left\lVert x-y\right\rVert_{1}$ can be. (See theorems 3.0.1, 3.0.3, and 3.0.9.) However, the primary result is as follows.

Theorem 1.0.1.

Let $F$ , $\rho$ , and $\mu$ be as above. Given $N$ points $P\subset\mathbb{R}^{D}$ and $\epsilon\in(0,1)$ ,

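$$\mu\left(\frac{\left\lVert x-y\right\rVert_{1}}{1+\epsilon}\right)\leq\rho(F(x),F(y))\leq\mu\big((1+\epsilon)\left\lVert x-y\right\rVert_{1}\big)$$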

for all $x,y\in P$ with $\left\lVert x-y\right\rVert_{1}\geq\sqrt{1+\epsilon}$ , provided

$$k=\frac{C}{\epsilon^{2}(1-\epsilon)^{2}}\ln N.$$
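A numerical illustration of the theorem for a single pair of points may help. The sketch below is not the paper's code: it assumes $\xi(\lambda)=\ln(1+\sqrt{\lambda})+\frac{1}{2}\ln(1+\lambda)$ and the closed form for $\mu$ quoted in section 2, and the values $D=10$, $k=10000$, $\epsilon=0.2$ are illustrative choices rather than the constants required by the theorem. All function names are hypothetical.

```python
import math
import random


def xi(lam):
    """Assumed form: xi(lam) = ln(1 + sqrt(lam)) + (1/2) ln(1 + lam), lam >= 0."""
    return math.log1p(math.sqrt(lam)) + 0.5 * math.log1p(lam)


def mu(lam):
    """Closed form for E xi(lam |X|), X ~ Cauchy(1), as quoted in section 2."""
    return 0.5 * math.log1p(lam * lam) + math.atanh(math.sqrt(2.0 * lam) / (1.0 + lam))


def rho(u, w):
    """Average of xi over the coordinatewise absolute differences."""
    return sum(xi(abs(a - b)) for a, b in zip(u, w)) / len(u)


def cauchy_matrix(k, D, rng):
    """k x D matrix of i.i.d. standard Cauchy entries."""
    return [[math.tan(math.pi * (rng.random() - 0.5)) for _ in range(D)]
            for _ in range(k)]


def apply(F, x):
    """Matrix-vector product F x."""
    return [sum(f * xj for f, xj in zip(row, x)) for row in F]


rng = random.Random(1)
D, k, eps = 10, 10000, 0.2             # illustrative sizes only
x = [rng.uniform(-1.0, 1.0) for _ in range(D)]
y = [xj + 0.5 for xj in x]             # so the 1-norm of x - y is 5
F = cauchy_matrix(k, D, rng)
lam = sum(abs(a - b) for a, b in zip(x, y))
dist = rho(apply(F, x), apply(F, y))
lower, upper = mu(lam / (1 + eps)), mu((1 + eps) * lam)
print(lower, dist, upper)
```

With these sizes the printed distance should sit between the two bounds; a single pair at a fixed seed is of course only a sanity check, not evidence for the uniform statement over all pairs.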

Independent of its interest as an analog of the Johnson-Lindenstrauss lemma, theorem 1.0.1 also contributes to the study of $p$-stable projections. In fact, we make the following conjecture for $1<p<2$ upon replacing the entries $F_{ij}$ of $F$ by i.i.d. standard $p$-stable random variables and setting $\mu(\lambda)=\mathbb{E}\xi(\lambda\left\lvert F_{11}\right\rvert)$. Just as with theorem 1.0.1, the conjecture could have several parts based on how large $\left\lVert x-y\right\rVert_{p}$ is, but the primary conjecture is as follows.

Conjecture 1.0.2.

With $F$ and $\mu$ modified as above, and $\rho$ , $\epsilon$ , and $k$ as in theorem 1.0.1, the following bound holds

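$$\mu\left(\frac{\left\lVert x-y\right\rVert_{p}}{1+\epsilon}\right)\leq\rho(F(x),F(y))\leq\mu\big((1+\epsilon)\left\lVert x-y\right\rVert_{p}\big)$$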

for all $x,y\in P$ with $\left\lVert x-y\right\rVert_{p}=\Omega(1)$ .

The setup for the proof would be the same as for theorem 1.0.1, relying on 1st and 2nd moment estimates for $\xi(\lambda\left\lvert W\right\rvert)$ ; however, because the density for a $p$ -stable random variable $W$ is only implicitly defined, the needed 1st and 2nd moment estimates are not so straightforward, but could be empirically found on the computer using methods such as [3] to draw the $p$ -stable random variables. This approach, in which we directly project the points from $\mathbb{R}^{D}$ , may be contrasted to embedding $\ell_{p}^{D}\hookrightarrow\ell_{1}^{n}$ and applying theorem 1.0.1 there. Pisier [17] (see also [15, chapter 8] and [9, chapter 9]) shows that such embeddings exist with distortion $(1+\epsilon)$ , with $n$ proportional to $D$ and depending on $p$ and $\epsilon$ .
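The empirical route mentioned above can be sketched as follows. The sampler is the Chambers-Mallows-Stuck formula for symmetric stable variates, which we assume is the kind of method meant by [3]; its scale convention (variance 2 at $p=2$) may differ from the normalization intended for "standard" $p$-stable variables. The form of $\xi$ and the names `symmetric_stable` and `mu_estimate` are likewise assumptions for illustration.

```python
import math
import random


def symmetric_stable(p, rng):
    """Chambers-Mallows-Stuck draw of a symmetric p-stable variate, 0 < p <= 2."""
    u = math.pi * (rng.random() - 0.5)   # uniform on (-pi/2, pi/2)
    w = rng.expovariate(1.0)             # standard exponential
    if p == 1.0:
        return math.tan(u)               # reduces to Cauchy(1)
    return (math.sin(p * u) / math.cos(u) ** (1.0 / p)
            * (math.cos((1.0 - p) * u) / w) ** ((1.0 - p) / p))


def xi(lam):
    """Assumed form: xi(lam) = ln(1 + sqrt(lam)) + (1/2) ln(1 + lam)."""
    return math.log1p(math.sqrt(lam)) + 0.5 * math.log1p(lam)


def mu_estimate(lam, p, n, rng):
    """Monte Carlo estimate of E xi(lam |W|) for W symmetric p-stable."""
    return sum(xi(lam * abs(symmetric_stable(p, rng))) for _ in range(n)) / n


rng = random.Random(3)
for p in (1.0, 1.25, 1.5, 1.75):
    print(p, mu_estimate(1.0, p, 20000, rng))
```

At $p=1$ the estimate can be compared against the closed form for $\mu$ in the Cauchy case; for $1<p<2$ no closed form is claimed here, which is exactly why a simulation of this kind would be needed.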

2 Overview of the Proof

In this section, we explain the choices for the function $\xi$ and the metric $\rho$ , as well as the use of Cauchy random variables, outlining the proof along the way.

Consider a point $v\in\mathbb{R}^{D}$ . The 1-stability of the Cauchy distribution dictates that the coordinates of the projected point $F(v)$ are Cauchy distributed: $F(v)_{j}\sim\left\lVert v\right\rVert_{1}X_{j}$ with $X_{j}\overset{\text{i.i.d.}}{\sim}\text{Cauchy}\left(1\right)$ . The metric $\rho$ is then an empirical mean:

$$\rho(F(v),0)=\frac{1}{k}\sum_{j=1}^{k}\xi(\left\lVert v\right\rVert_{1}\left\lvert X_{j}\right\rvert),$$

and if we marginalize out the Cauchy dependence, we recover the deterministic function $\mu$ of $\left\lVert v\right\rVert_{1}$ :

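$$\mathbb{E}\rho(F(v),0)=\mathbb{E}\xi(\left\lVert v\right\rVert_{1}\left\lvert X\right\rvert)=:\mu(\left\lVert v\right\rVert_{1})\quad\text{for }X\sim\text{Cauchy}\left(1\right).$$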

We can now outline the proof as follows: let $x-y=v\in\mathbb{R}^{D}$ . The projection map $F:\mathbb{R}^{D}\to\mathbb{R}^{k}$ is linear and the metric $\rho$ is translation invariant, so our goal is to show $\mu(\left\lVert v\right\rVert_{1})\approx\rho(F(v),0)$ or upon setting $\left\lVert v\right\rVert_{1}=\lambda$ ,

$$\mu(\lambda)\approx\frac{1}{k}\sum_{j=1}^{k}\xi(\lambda\left\lvert X_{j}\right\rvert)$$

with high probability. As usual, we use the exponential Markov inequality and the i.i.d. assumption to estimate

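$$\mathbb{P}\left\{\frac{1}{k}\sum_{j=1}^{k}\xi(\lambda\left\lvert X_{j}\right\rvert)-\mu(\lambda)>t\right\}\leq\left(e^{-st}\mathbb{E}e^{s\big(\xi(\lambda\left\lvert X\right\rvert)-\mu(\lambda)\big)}\right)^{k}$$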

with a similar setup for the lower tail. However, Cauchy random variables $X$ only have finite fractional moments,

$$\mathbb{E}\left\lvert X\right\rvert^{b}<\infty\quad\text{only for }\left\lvert b\right\rvert<1,$$

so the presence of $\xi(\lambda\left\lvert X\right\rvert)$ in the exponential requires $\xi(\lambda)=c\ln(o(\lambda))$ when $\lambda$ is large. Our choice of $\xi$ ensures this behavior:

$$\xi(\lambda)=\ln(1+\sqrt{\lambda})+\frac{1}{2}\ln(1+\lambda)\leq 2\ln(1+\sqrt{\lambda}),$$

while the presence of the “1+” in the logarithms ensures $\xi$ is nonnegative, increasing, and sends 0 to 0. The function $\xi$ is thus subadditive and preserves the triangle inequality:

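$$\xi(\left\lvert x_{i}-y_{i}\right\rvert)\leq\xi(\left\lvert x_{i}-z_{i}\right\rvert)+\xi(\left\lvert z_{i}-y_{i}\right\rvert),$$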

ensuring $\rho$ is a metric on $\mathbb{R}^{k}$ . Because $\mu$ is the expectation of $\xi$ , it inherits these properties, so that $\mu\circ\left\lVert\right\rVert_{1}$ induces a metric on the original space $\mathbb{R}^{D}$ .
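The metric properties claimed for $\rho$ can be spot-checked numerically. The sketch below assumes $\xi(\lambda)=\ln(1+\sqrt{\lambda})+\frac{1}{2}\ln(1+\lambda)$; the helpers `triangle_violations` and `is_norm_like` are hypothetical names introduced here. It counts violations of the triangle inequality on random triples and tests whether $\rho$ scales like a norm.

```python
import math
import random


def xi(lam):
    """Assumed form: xi(lam) = ln(1 + sqrt(lam)) + (1/2) ln(1 + lam)."""
    return math.log1p(math.sqrt(lam)) + 0.5 * math.log1p(lam)


def rho(u, w):
    """Average of xi over the coordinatewise absolute differences."""
    return sum(xi(abs(a - b)) for a, b in zip(u, w)) / len(u)


def triangle_violations(trials, k, rng, tol=1e-12):
    """Count random triples in R^k with rho(x, y) > rho(x, z) + rho(z, y)."""
    bad = 0
    for _ in range(trials):
        x, y, z = ([rng.gauss(0.0, 10.0) for _ in range(k)] for _ in range(3))
        if rho(x, y) > rho(x, z) + rho(z, y) + tol:
            bad += 1
    return bad


def is_norm_like(x, t):
    """A norm would give rho(t x, 0) == |t| rho(x, 0); rho is not homogeneous."""
    zero = [0.0] * len(x)
    return math.isclose(rho([t * a for a in x], zero), abs(t) * rho(x, zero))


print(triangle_violations(2000, 8, random.Random(2)))
print(is_norm_like([1.0, 2.0], 3.0))
```

Random triples cannot prove the triangle inequality, but a nonzero count would expose an error in the assumed $\xi$; the failed homogeneity test is consistent with the abstract's remark that the metric is not induced by a norm.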

We show in sections 4 and 5 that our tail bounds take the following form. To be concrete, we state the upper tail case; the lower tail cases are similar:

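$$\min_{s}e^{-s\Delta}\mathbb{E}e^{s\big(\xi(\lambda\left\lvert X\right\rvert)-\mu(\lambda)\big)}\leq\exp\left(-\frac{\Delta^{2}}{4(V^{2}+A)}\right)\tag{$\diamond$}$$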

with $\Delta$ depending on $\mu(\lambda)$, the function $V^{2}$ giving an upper bound for the 2nd moment or the variance of $\xi(\lambda\left\lvert X\right\rvert)$, and the auxiliary function $A(\lambda)$, derived from tail estimates for $\xi(\lambda\left\lvert X\right\rvert)$. The particular form of $\xi$ was chosen to give explicit control over all these quantities as $\lambda$ varies, allowing us to obtain bounds on equation ($\diamond$) that only weakly depend on $\lambda$.

We arrive at the particular form ($\diamond$) for the tail bounds by estimating the moment generating function as follows, taking the upper tail as an example: with $Y=\xi(\lambda\left\lvert X\right\rvert)-\mu(\lambda)$, we split $\mathbb{E}\exp(sY)$ into two terms and desire each to be bounded by something quadratic in $s$: for the 1st term, using a 2nd order Taylor expansion for the exponential,

$$\mathbb{E}\exp(sY)\mathbb{I}\{sY\leq 1\}\leq 1+s^{2}\mathbb{E}Y^{2}\leq 1+s^{2}V^{2}$$

while for the 2nd term, we use integration by parts, eventually showing

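$$\mathbb{E}\exp(sY)\mathbb{I}\{sY>1\}=e\,\mathbb{P}\{Y>1/s\}+\int_{1}^{\infty}e^{t}\,\mathbb{P}\{Y>t/s\}\,dt\leq s^{2}A(\lambda).$$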

We can show the integrand decays exponentially in $t$ using our choice of $\xi$ and the explicit density for the Cauchy distribution:

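$$\mathbb{P}\{Y>t/s\}\leq\mathbb{P}\left\{2\ln(1+\sqrt{\lambda\left\lvert X\right\rvert})>\mu(\lambda)+t/s\right\}\leq C(\lambda)e^{-t/s}$$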

with $C$ depending on $\lambda$ and $\mu(\lambda)$ . We can then combine these estimates and optimize in $s$ :

[TABLE]

using $s=\Delta/(2(V^{2}+A(\lambda)))$ .
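The optimization in $s$ can be written out as a short worked computation. This is a sketch under the assumption that the two pieces of the moment generating function combine to $\mathbb{E}e^{sY}\leq 1+s^{2}(V^{2}+A(\lambda))$ on the admissible range of $s$; it uses only $1+u\leq e^{u}$ and the minimization of a quadratic.

```latex
% Assumption: E e^{sY} \le 1 + s^2 (V^2 + A(\lambda)) for the admissible s.
\begin{align*}
e^{-s\Delta}\,\mathbb{E}e^{sY}
  &\le e^{-s\Delta}\left(1 + s^{2}(V^{2}+A(\lambda))\right)
   \le \exp\left(-s\Delta + s^{2}(V^{2}+A(\lambda))\right), \\
\frac{d}{ds}\left(-s\Delta + s^{2}(V^{2}+A(\lambda))\right) = 0
  &\iff s^{\ast} = \frac{\Delta}{2(V^{2}+A(\lambda))}, \\
-s^{\ast}\Delta + (s^{\ast})^{2}(V^{2}+A(\lambda))
  &= -\frac{\Delta^{2}}{2(V^{2}+A(\lambda))} + \frac{\Delta^{2}}{4(V^{2}+A(\lambda))}
   = -\frac{\Delta^{2}}{4(V^{2}+A(\lambda))}.
\end{align*}
```

The factor $4$ in the exponent of the tail bound is thus the usual outcome of balancing the linear term $-s\Delta$ against the quadratic term.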

The tail probabilities now have the form

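$$\mathbb{P}\left\{\frac{1}{k}\sum_{j=1}^{k}\xi(\lambda\left\lvert X_{j}\right\rvert)-\mu(\lambda)>\Delta\right\}\leq 2\exp\left(-k\frac{\Delta^{2}}{4(V^{2}+A(\lambda))}\right)$$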

for a single $\lambda$ corresponding to a single vector $v=x-y\in\mathbb{R}^{D}$ . There are at most $\binom{N}{2}$ pairs of points from $P$ , so we would want to choose the target dimension as

$$k=(c+2)\ln(N)\frac{4(V^{2}+A(\lambda))}{\Delta^{2}}$$

to ensure with probability at least $1-N^{-c}$ ,

$$\frac{1}{k}\sum_{j=1}^{k}\xi(\lambda\left\lvert X_{j}\right\rvert)-\mu(\lambda)\leq\Delta$$

for all pairs of points simultaneously. However, the error $\Delta$ and the target dimension $k$ both depend on $\lambda$, so we require uniform estimates for these quantities. We find these by breaking up the possible values for $\lambda$ into three regimes: small, medium, and big.
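The union bound arithmetic behind this choice of $k$ can be checked directly. In the sketch below, the function names and the numerical inputs are assumptions: in particular `A = 3.0` is a made-up placeholder for $A(\lambda)$, while `V2` uses the constant variance bound $\pi^{2}/2$ quoted later in this section.

```python
import math


def target_dimension(N, c, delta, V2, A):
    """k = (c + 2) ln(N) * 4 (V^2 + A) / Delta^2, rounded up to an integer."""
    return math.ceil((c + 2) * math.log(N) * 4.0 * (V2 + A) / delta ** 2)


def union_failure_bound(N, k, delta, V2, A):
    """binom(N, 2) pairs times the per-pair bound 2 exp(-k Delta^2 / (4 (V^2 + A)))."""
    per_pair = 2.0 * math.exp(-k * delta ** 2 / (4.0 * (V2 + A)))
    return math.comb(N, 2) * per_pair


N, c = 1000, 1.0
delta, V2, A = 0.1, math.pi ** 2 / 2, 3.0   # A = 3.0 is a hypothetical value
k = target_dimension(N, c, delta, V2, A)
print(k, union_failure_bound(N, k, delta, V2, A), N ** (-c))
```

With this $k$ each pair fails with probability at most $2N^{-(c+2)}$, and multiplying by $\binom{N}{2}$ gives at most $N^{-c}$, which is the claimed success probability.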

Our choice of $\xi$ provides an explicit formula for $\mu(\lambda):=\mathbb{E}\xi(\lambda\left\lvert X\right\rvert)$ (lemma A.1.1):

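$$\mu(\lambda):=\frac{1}{2}\ln(1+\lambda^{2})+\operatorname{atanh}\left(\frac{\sqrt{2\lambda}}{1+\lambda}\right)\quad\text{with}\quad\operatorname{atanh}(x):=\sum_{j=0}^{\infty}\frac{x^{2j+1}}{2j+1}$$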

for $\left\lvert x\right\rvert<1$ . The big regime has $\mu$ behaving like the log term, while the medium and small regimes have it behaving like $\Theta(\sqrt{2\lambda})$ . The choice of $\xi$ also gives us a bound on the variance (corollary A.3.2)

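$$\operatorname{Var}(\xi(\lambda\left\lvert X\right\rvert))\leq\min\left\{\frac{\pi^{2}}{2},\,2\,\mathbb{E}\ln(1+\lambda\left\lvert X\right\rvert)\right\}.$$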

The constant bound, independent of $\lambda$ , is used for the big regime, while the expectation bound provides finer control on the variance when $\lambda$ is small, via another explicit function, lemma A.3.3, of $\lambda$ .
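The closed form for $\mu$ and the constant variance bound can be sanity-checked numerically. The sketch assumes $\xi(\lambda)=\ln(1+\sqrt{\lambda})+\frac{1}{2}\ln(1+\lambda)$ and $\mu(\lambda)=\frac{1}{2}\ln(1+\lambda^{2})+\operatorname{atanh}(\sqrt{2\lambda}/(1+\lambda))$. The equivalent expression in `mu_log_form` is not stated in the text; it follows from $\operatorname{atanh}(u)=\frac{1}{2}\ln\frac{1+u}{1-u}$ and $1+\lambda^{2}=(1+\lambda)^{2}-2\lambda$.

```python
import math
import random


def xi(lam):
    """Assumed form: xi(lam) = ln(1 + sqrt(lam)) + (1/2) ln(1 + lam)."""
    return math.log1p(math.sqrt(lam)) + 0.5 * math.log1p(lam)


def mu(lam):
    """mu(lam) = (1/2) ln(1 + lam^2) + atanh(sqrt(2 lam) / (1 + lam))."""
    return 0.5 * math.log1p(lam * lam) + math.atanh(math.sqrt(2.0 * lam) / (1.0 + lam))


def mu_log_form(lam):
    """Equivalent form ln(1 + sqrt(2 lam) + lam), derived from the atanh identity."""
    return math.log1p(math.sqrt(2.0 * lam) + lam)


def sample_mean_and_variance(lam, n, rng):
    """Monte Carlo mean and unbiased variance of xi(lam |X|), X ~ Cauchy(1)."""
    values = [xi(lam * abs(math.tan(math.pi * (rng.random() - 0.5))))
              for _ in range(n)]
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


mean, var = sample_mean_and_variance(2.0, 100000, random.Random(6))
print(mean, mu(2.0), var, math.pi ** 2 / 2)
print(mu(1e-4) / math.sqrt(2e-4))      # close to 1 in the small regime
```

The logarithmic form makes both regimes visible at a glance: $\mu(\lambda)\approx\sqrt{2\lambda}$ for small $\lambda$, and $\mu(\lambda)\approx\ln\lambda$ for large $\lambda$.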

For the big regime, taking $\Delta$ as

$$\mu((1+\epsilon)\lambda)-\mu(\lambda)\quad\text{and}\quad\mu(\lambda)-\mu((1+\epsilon)^{-1}\lambda),$$

both of which are bounded by lemma A.2.1, together with the constant bound for the variance give theorem 3.0.1, as $A(\lambda)$ is bounded here.

For the medium and small regimes, we take $\Delta=\epsilon\mu(\lambda)$ and use corollary A.3.4 to bound $V^{2}/\mu(\lambda)^{2}$ . The split between medium and small regimes occurs because of the $\ln(\lambda)$ term in that ratio: the target dimension has $\Delta^{2}\sim\lambda$ on the bottom, while the upper tail bound 4.0.1 required $s^{\ast}$ to only have $\Delta$ on top

$$s^{\ast}=\Delta/(2(V^{2}+A(\lambda)))<1/2.$$

This mismatch in powers of $\Delta$ forces us to choose a cutoff $\lambda$ ; because $A(\lambda)$ (and $V^{2}$ ) have terms proportional to $\lambda$ , the above inequality can only hold for $\lambda$ not too small. This gives the $\ln(\epsilon^{2}/3)$ term in theorem 3.0.3 for the medium regime.

For the small regime, there is no such restriction on $s$ for the lower tail bound 5.0.2, but the target dimension still grows like $\ln(1/\lambda)$ as $\lambda$ decreases (see lemma 3.0.5). We stop that growth by fixing a particular $\lambda_{0}$, showing that for all smaller $\lambda$, the $(1-\epsilon)$ error has a suitable replacement in theorem 3.0.9. The key is lemma 3.0.7: we choose $\lambda_{0}$ so that $\lambda_{0}\max_{i}\left\lvert X_{i}\right\rvert<1/6$ with high probability, making both $\xi(\lambda\left\lvert X_{i}\right\rvert)$ and $\mu(\lambda)$ behave like $\Theta(\sqrt{\lambda})=\sqrt{\eta}\Theta(\sqrt{\lambda_{0}})$ for $\lambda=\eta\lambda_{0}$ with $\eta\in(0,1)$. Because $\lambda_{0}$ turns out to be $\Theta(1/(N^{c+2}k))$, the $-\ln(\lambda_{0})$ in the target dimension forces $k=\Theta(\epsilon^{-2}\ln^{2}(N^{c+2}))$, a quadratic dependence on $\ln(N)$.

We finish the proofs in the next section, while the upper and lower tail estimates are provided in sections 4 and 5. We collect the estimates on the 1st and 2nd moments in appendix A, and ancillary identities for those estimates in appendix B.

3 Finishing the Proof

We now tie down the target dimension $k$ . Recall $P$ is a set of $N$ points in $\mathbb{R}^{D}$ , and $F:\mathbb{R}^{D}\to\mathbb{R}^{k}$ is a matrix of i.i.d. $\text{Cauchy}\left(1\right)$ entries. In what follows, the estimates are not sharp.

Theorem 3.0.1 (Big Regime).

For $\epsilon\in(0,1)$ and $\left\lVert x-y\right\rVert_{1}\geq\sqrt{1+\epsilon}$ ,

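$$\mu\left(\frac{\left\lVert x-y\right\rVert_{1}}{1+\epsilon}\right)\leq\rho\big(F(x),F(y)\big)\leq\mu\big((1+\epsilon)\left\lVert x-y\right\rVert_{1}\big)$$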

for all $x,y\in P$ with probability at least $1-N^{-c}$ provided

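$$k\geq\frac{C}{\epsilon^{2}(1-\epsilon)^{2}}\ln(N^{c+2})\quad\text{with}\quad C=64\left(\frac{\pi^{2}}{2}+\frac{16\sqrt{2}}{e\pi}e^{\operatorname{atanh}(1/\sqrt{2})}\right).$$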

Remark 3.0.2.

The constants are not expected to be sharp; $C$ is computed so that $k$ is uniformly bounded with respect to $\left\lVert x-y\right\rVert_{1}\geq\sqrt{1+\epsilon}$ .

Proof.

Let $\lambda=\left\lVert x-y\right\rVert_{1}$ . We want to use the lower and upper tail estimates from lemmas 5.0.1 and 4.0.1, so it remains to verify

[TABLE]

with $\Delta$ either

[TABLE]

By lemma A.2.1, the differences $\Delta$ are at most $\epsilon$ for $\lambda\geq\sqrt{1+\epsilon}$ , while the upper bound for the variance of $\xi(\lambda\left\lvert X\right\rvert)$ is $V^{2}=\pi^{2}/2$ by corollary A.3.2. Because $\epsilon<1$ , we then certainly have $s^{\ast}<1/2$ .

As explained in section 2, the target dimension $k$ is chosen to ensure the union bound is at most $N^{-c}$ for both tails combined. The choice of $C$ comes from the lower bound for the $\Delta$ ’s from lemma A.2.1 and the larger of the two $A(\lambda)$ functions in lemmas 5.0.1 and 4.0.1. ∎
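To get a feel for the size of the constants, the big regime bound can be evaluated numerically. The sketch assumes the constant reads $C=64\left(\frac{\pi^{2}}{2}+\frac{16\sqrt{2}}{e\pi}e^{\operatorname{atanh}(1/\sqrt{2})}\right)$, which is our reading of the theorem's statement; the function names are hypothetical.

```python
import math


def big_regime_constant():
    """C = 64 (pi^2 / 2 + (16 sqrt(2) / (e pi)) exp(atanh(1 / sqrt(2)))), as assumed above."""
    return 64.0 * (math.pi ** 2 / 2.0
                   + 16.0 * math.sqrt(2.0) / (math.e * math.pi)
                   * math.exp(math.atanh(1.0 / math.sqrt(2.0))))


def big_regime_dimension(N, eps, c):
    """Smallest integer k with k >= C ln(N^(c+2)) / (eps^2 (1 - eps)^2)."""
    C = big_regime_constant()
    return math.ceil(C * (c + 2) * math.log(N) / (eps ** 2 * (1.0 - eps) ** 2))


print(big_regime_constant())
print(big_regime_dimension(1000, 0.5, 1.0))
```

Under this reading $C$ is a little over 700, so the resulting $k$ is large for moderate $N$; this fits the remark that the constants are not expected to be sharp.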

Theorem 3.0.3 (Medium Regime).

For $\left\lVert x-y\right\rVert_{1}\in[\epsilon^{2}/3,\sqrt{1+\epsilon}]$ and $\epsilon\in(0,1)$ ,

[TABLE]

for all $x,y\in P$ with probability at least $1-N^{-c}$ provided

[TABLE]

Remark 3.0.4.

We have not been able to establish an upper bound result

[TABLE]

with high probability when $\left\lVert x-y\right\rVert_{1}<\epsilon^{2}/3$. Our proofs break down or require a much higher estimate for the target dimension $k$. We conjecture that $k=O(\ln^{2}(N^{c})/\epsilon^{2})$ still suffices, in light of theorem 3.0.9 for the small regime.

Proof.

With $\lambda=\left\lVert x-y\right\rVert_{1}$ , we take $\Delta=\epsilon\mu(\lambda)$ . By lemma 3.0.5, the lower bound for $\rho(F(x),F(y))$ requires an initial estimate for the target dimension of $\tilde{k}=2\ln(N^{c+2})\epsilon^{-2}C(1-\ln(\lambda^{\ast}))$ with $\lambda^{\ast}$ the smallest $\lambda$ we wish to consider. The upper bound will force our choice of $\lambda^{\ast}$ .

We now want to use the upper tail estimate from lemma 4.0.1. It remains to check

[TABLE]

and it suffices to show $\epsilon\mu(\lambda)/A(\lambda)\leq 1$ . With $b=16e/(\pi(e-1)^{2})\approx 4.7$ from $A(\lambda)$ , we use lemma A.1.2 for the upper bound for $\mu(\lambda)$ to find, after some estimation,

[TABLE]

recalling $\epsilon\in(0,1)$ .

Using the expression for $A(\lambda)$ , we now have the following estimate for the target dimension. Because $V^{2}$ is an estimate on the variance now, we can remove the $1+$ ’s from corollary A.3.4 to find

[TABLE]

using $\mu(\lambda)\geq\sqrt{2\lambda}/(1+\lambda)$ from remark A.1.3. The $b$ dependent term here is enough to ensure $k\geq\tilde{k}$ , so both sides of the inequality for $\rho(F(x),F(y))$ hold with high probability and this dimension $k$ . ∎

The following two lemmas lead to theorem 3.0.9, which shows that a lower bound for $\rho(F(x),F(y))$ continues to hold for all $\left\lVert x-y\right\rVert_{1}<2$.

Lemma 3.0.5.

For $\epsilon\in(0,1)$, $\lambda^{\ast}>0$, and $\left\lVert x-y\right\rVert_{1}\in[\lambda^{\ast},2)$,

[TABLE]

for all $x,y\in P$ with probability at least $1-N^{-c}$ provided

[TABLE]

Remark 3.0.6.

The estimates are not sharp.

Proof.

With $\lambda=\left\lVert x-y\right\rVert_{1}$, we take $\Delta=\epsilon\mu(\lambda)$. Using corollary A.3.4 and the lower tail estimate from lemma 5.0.2, the target dimension is

[TABLE]

to ensure the bound holds with probability at least $1-N^{-c}$ for all pairs of points. ∎

Lemma 3.0.7.

For $1\leq i\leq k$ , let $X_{i}\overset{\text{i.i.d.}}{\sim}\text{Cauchy}\left(1\right)$ . For $0<\epsilon<1$ and $0<\lambda_{0}\leq 1$ , suppose

[TABLE]

and $\lambda_{0}\max_{i}\left\lvert X_{i}\right\rvert\leq c_{0}\leq 1/6$ .

Then if $0<\eta<1$ , the same $X_{i}$ also satisfy

[TABLE]

with $\epsilon^{\prime}$ depending on $\epsilon$ , $c_{0}$ , and $\lambda_{0}$ . If $\lambda_{0}\leq\epsilon^{2}$ , then we can have

[TABLE]

Remark 3.0.8.

Analogous upper bounds are also possible, with a similar proof.

Proof.

A fourth order Taylor expansion with Lagrange remainder shows

[TABLE]

Because $\lambda_{0}\max_{i}\left\lvert X_{i}\right\rvert\leq c_{0}\leq 1/6$ and $0<\eta<1$, we invoke the above inequality twice to find

[TABLE]

By assumption, summing over $i$ and dividing by $k$ yields

[TABLE]

We finish by using remark A.1.3 (twice) to “absorb” $\sqrt{\eta}$ into $\mu$ ,

[TABLE]

∎

Theorem 3.0.9 (Small Regime).

For $\epsilon\in[N^{-(c+2)/2},1]$ and all $\left\lVert x-y\right\rVert_{1}<2$ , the following bound holds:

[TABLE]

with probability at least $1-N^{-c}$ , provided

[TABLE]

Proof.

We can use lemma 3.0.5 with $\lambda^{\ast}=\lambda_{0}$ to cover all distances $\left\lVert x-y\right\rVert_{1}$ down to $\lambda_{0}$ . We then choose $\lambda_{0}$ in order to extend the lower bound to distances smaller than $\lambda_{0}$ , using lemma 3.0.7.

Concretely, recall from section 2 that because $F$ is a linear map of i.i.d. Cauchy entries,

[TABLE]

with the same $X_{i}\overset{\text{i.i.d.}}{\sim}\text{Cauchy}\left(1\right)$ . Let $Z=\max_{1\leq i\leq k}\left\lvert X_{i}\right\rvert$ . To use lemma 3.0.7, we just need to ensure $\lambda_{0}Z\leq 1/6$ with high probability. By the independence of the $X_{i}$ ,

[TABLE]

So set $\lambda_{0}=\pi/(12N^{c+2}k)$ . Choosing $k$ according to lemma 3.0.5 with $\lambda_{0}=\lambda^{\ast}$ , we have the following inequality for the target dimension

[TABLE]

Taking $k=C\ln^{2}(N^{c+2})/\epsilon^{2}$ satisfies the above, provided $\epsilon^{2}>N^{-c-2}$ , say. We now have the conditions of lemma 3.0.7 satisfied for all $\binom{N}{2}$ pairs of points, with probability at least $1-N^{-c}$ , and $\lambda_{0}<\epsilon^{2}$ . ∎
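The choice $\lambda_{0}=\pi/(12N^{c+2}k)$ can be checked against the exact Cauchy tail. The sketch below uses $\mathbb{P}\{\left\lvert X\right\rvert>t\}=\frac{2}{\pi}\arctan(1/t)\leq\frac{2}{\pi t}$ and a union bound over the $k$ coordinates; the function names and the sample values of $N$, $c$, $k$ are assumptions for illustration.

```python
import math


def prob_abs_cauchy_exceeds(t):
    """P{|X| > t} = (2 / pi) atan(1 / t) for X ~ Cauchy(1) and t > 0."""
    return (2.0 / math.pi) * math.atan(1.0 / t)


def lambda0(N, c, k):
    """The cutoff lambda_0 = pi / (12 N^(c+2) k) chosen in the proof."""
    return math.pi / (12.0 * N ** (c + 2) * k)


def prob_max_too_large(N, c, k):
    """Union bound on P{lambda_0 max_i |X_i| > 1/6} over the k coordinates."""
    t = 1.0 / (6.0 * lambda0(N, c, k))
    return k * prob_abs_cauchy_exceeds(t)


def all_pairs_failure(N, c, k):
    """Further union bound over all binom(N, 2) pairs of points."""
    return math.comb(N, 2) * prob_max_too_large(N, c, k)


N, c, k = 100, 1.0, 5000               # illustrative values only
print(prob_max_too_large(N, c, k), N ** -(c + 2))
print(all_pairs_failure(N, c, k), N ** -c)
```

Since $\arctan(u)\leq u$, the per-pair failure probability is at most $12k\lambda_{0}/\pi=N^{-(c+2)}$, leaving room for the union over pairs.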

4 Upper Tails

In the following lemmas, the estimates are not sharp.

Lemma 4.0.1 (General Upper Tail).

With $Y=\xi(\lambda\left\lvert X\right\rvert)-\mu(\lambda)$ and $V^{2}\geq\operatorname{Var}(\xi(\lambda\left\lvert X\right\rvert))$,

[TABLE]

and is minimized at $s^{\ast}$ with

[TABLE]

Proof.

From the discussion in section 2, we just need to establish the $A(\lambda)$ function for the integration by parts terms. To ensure $\mathbb{E}\exp(sY)$ is finite, we require $s<1$ . For $1>s>0$ and $t>1$ , we then estimate, with $w=\mu+t/s$ ,

[TABLE]

We shall choose $w_{0}$ and hence $C(\lambda)$ a bit later; note that $C(\lambda)$ contains the factor $e^{-\mu}$ .

With $Y=\xi(\lambda\left\lvert X\right\rvert)-\mu(\lambda)$ , we can now estimate the integration by parts terms

[TABLE]

as at most

[TABLE]

using $e^{-1/s}\leq(2/e)^{2}s^{2}$ and $s\in(0,1)$ . Assuming $s\leq 1/2$ , we can now write, for a suitable upper bound $A(\lambda)$ ,

[TABLE]

and we may optimize in $s$

[TABLE]

at

[TABLE]

It remains to choose $w_{0}$ and hence $A(\lambda)$ . Recalling the formula for $\mu$ either from section 2 or directly from lemma A.1.1, we can lower bound $w=\mu(\lambda)+t/s\geq(1/2)\ln(1+\lambda^{2})+2\geq 2$ provided $s\leq 1/2$ . Choosing $w_{0}=2$ ,

[TABLE]

∎

5 Lower Tails

Unlike for the upper tails, we can control the lower tails for the full range of $\lambda$ . We address $\lambda$ bounded away from 0 using the same techniques as for the upper tail. The lower tail proof for small $\lambda$ simplifies because $\xi(\lambda\left\lvert X\right\rvert)$ is nonnegative, so that there is no restriction on optimizing $s$ in the moment generating function.

In the following lemmas, the estimates are not sharp.

Lemma 5.0.1 (Lower Tail, Big Regime).

With $Y=\xi(\lambda\left\lvert X\right\rvert)-\mu(\lambda)$ and $V^{2}\geq\operatorname{Var}(\xi(\lambda\left\lvert X\right\rvert))$,

[TABLE]

and is minimized at $s^{\ast}$ with

[TABLE]

Proof.

Just as in the upper tail computations,

[TABLE]

and

[TABLE]

We shall again determine the function $A(\lambda)$ by estimating a tail, but now it is the lower tail

[TABLE]

By subadditivity of $\sqrt{a}$ ,

[TABLE]

We now can estimate

[TABLE]

We can then upper bound the integration by parts terms just like in the proof for the upper tail lemma 4.0.1. Assuming $s\leq 1/2$ , we choose an upper bound $A(\lambda)$ for $(8/e)C(\lambda)$ and arrive at

[TABLE]

To find $A(\lambda)$ , note that $\mu(\lambda)\leq\operatorname{atanh}(1/\sqrt{2})+(1/2)\ln(1+\lambda^{2})$ , so that

[TABLE]

which is bounded for $\lambda$ away from 0. ∎

Lemma 5.0.2 (Lower Tail, Small Regimes).

With $Y=\xi(\lambda\left\lvert X\right\rvert)-\mu(\lambda)$ and $V^{2}\geq\mathbb{E}\xi(\lambda\left\lvert X\right\rvert)^{2}$,

[TABLE]

Proof.

Because $\xi(\lambda\left\lvert X\right\rvert)$ is nonnegative, we can use the 2nd order Taylor expansion of $\exp(-x)$ to write

[TABLE]

and we can then optimize $s$ in the usual way. ∎
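For orientation, "the usual way" can be spelled out as follows. This is a sketch of the standard argument under the stated hypotheses ($s>0$, $\xi=\xi(\lambda\left\lvert X\right\rvert)\geq 0$, $V^{2}\geq\mathbb{E}\xi^{2}$), using $e^{-x}\leq 1-x+x^{2}/2$ for $x\geq 0$; the constants in the lemma's own statement may differ.

```latex
% Assumptions: s > 0, \xi = \xi(\lambda |X|) \ge 0, V^2 \ge E \xi^2.
\begin{align*}
\mathbb{E}e^{-s\xi}
  &\le 1 - s\mu(\lambda) + \tfrac{s^{2}}{2}\mathbb{E}\xi^{2}
   \le \exp\left(-s\mu(\lambda) + \tfrac{s^{2}}{2}V^{2}\right), \\
\mathbb{P}\left\{\frac{1}{k}\sum_{j=1}^{k}\xi(\lambda\lvert X_{j}\rvert) < \mu(\lambda) - \Delta\right\}
  &\le \left(e^{s(\mu(\lambda)-\Delta)}\,\mathbb{E}e^{-s\xi}\right)^{k}
   \le \exp\left(k\left(-s\Delta + \tfrac{s^{2}}{2}V^{2}\right)\right), \\
\text{and at } s = \Delta/V^{2}:\quad
  &\le \exp\left(-\frac{k\Delta^{2}}{2V^{2}}\right).
\end{align*}
```

No upper limit on $s$ enters, because $e^{-s\xi}\leq 1$ is integrable for every $s>0$; this is the simplification noted at the start of the section.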

Acknowledgments

This work was supported in part by Duke University while completing my Ph.D. thesis. I should like to thank my advisor Professor Sayan Mukherjee for encouraging me in completing this work. I should also like to thank the anonymous referee, whose comments helped greatly streamline the paper. I should like to thank Mom, Dad, Katie, and everyone who has been praying for me throughout my time at Duke. I should finally like to thank the Blessed Virgin Mary, Saint Joseph, and the Holy Trinity for helping me be patient throughout this work.

Appendix A The First and Second Moments

Here we derive the explicit formula for $\mu(\lambda)=\mathbb{E}\xi(\lambda\left\lvert X\right\rvert)$ in lemma A.1.1 and the upper bounds $V^{2}$ for $\operatorname{Var}(\xi(\lambda\left\lvert X\right\rvert))$ in corollary A.3.2. Some of this work is a bit tedious, but it will allow us to give explicit upper bounds on the target dimension

[TABLE]

We need bounds for $\mu(\lambda)$ when $\lambda$ is small (lemma A.1.3) as well as lower bounds for the $\Delta$ ’s

[TABLE]

when $\lambda$ is “large” (lemma A.2.1).

Because the Cauchy density has particularly simple behavior when extended to the complex plane, we heavily rely on complex analysis techniques. We chose $\xi$ to be the linear combination

[TABLE]

as it will simplify the estimates as well as be easy to compute using a pair of contour integrals. For both moments, the contour integral setup below will greatly facilitate computations; in particular, it will allow us to avoid estimating

[TABLE]

individually, which, while possible, is not necessary for our results.

Proposition A.0.1 (Contour Integral Setup).

For $\lambda>0$ , $b>0$ , and $X\sim\text{Cauchy}\left(1\right)$ ,

[TABLE]

*Remark A.0.2**.*

The task is then to simplify the complex logarithms on the right hand side when particular values of $b$ are chosen. We shall choose $b=1$ and $b=2$ in the next sections.

Proof.

We want to compute

[TABLE]

via contour integration. Extending to $z\in\mathbb{C}-(-\infty,0]$ , let

[TABLE]

which has simple poles at $z=\pm i$ .

We shall compute $I(\lambda)$ by using two different contours that both traverse the interval $[0,R]$ in the positive direction. Specifically, $\mathcal{C}^{+}$ is oriented counterclockwise, while $\mathcal{C}^{-}$ is oriented clockwise, setting

[TABLE]

with “large” arcs

[TABLE]

and segments rotating as $\epsilon\to 0$ to the negative real axis

[TABLE]

Check that

[TABLE]

Keeping in mind the orientations of the contours, the residue theorem dictates for $R>1$ ,

[TABLE]

and similarly

[TABLE]

It remains to show that

[TABLE]

For these $\mathcal{C}^{\pm}_{\epsilon}(R)$ integrals, note that

[TABLE]

which approaches $\pm i\sqrt{r}$ when $\epsilon\to 0$ . Consequently, when $z=re^{i(\pi-\epsilon)}=-re^{-i\epsilon}$ , we can use the dominated convergence theorem to conclude

[TABLE]

checking that the integrand is bounded by a summable one when $\epsilon<\pi/8$ say. Sending $R\to\infty$ recovers

[TABLE]

Similar reasoning applies to the $\mathcal{C}^{-}_{\epsilon}(R)$ integral to yield

[TABLE]

Putting everything together, we have

[TABLE]

as claimed. ∎
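The limiting behavior of the square root on the rotating segments can be checked numerically. The sketch below assumes the principal branch of the square root, as implemented by Python's `cmath.sqrt`; the helper name `sqrt_near_negative_axis` is hypothetical.

```python
import cmath
import math


def sqrt_near_negative_axis(r: float, eps: float, upper: bool = True) -> complex:
    """Principal square root of z = r * exp(+-i(pi - eps))."""
    angle = (math.pi - eps) if upper else -(math.pi - eps)
    return cmath.sqrt(r * cmath.exp(1j * angle))


if __name__ == "__main__":
    r = 4.0
    for eps in (1e-1, 1e-3, 1e-6):
        up = sqrt_near_negative_axis(r, eps, upper=True)
        down = sqrt_near_negative_axis(r, eps, upper=False)
        # As eps shrinks, up tends to +i*sqrt(r) and down tends to -i*sqrt(r).
        print(eps, up, down)
```

The two one-sided limits differ in sign, which is why the two contours contribute conjugate integrands along the negative real axis.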

A.1 1st Moment

Recall from definition B.1.6 that $\operatorname{atanh}(x)$ may be defined by the power series

[TABLE]

Lemma A.1.1.

If $\lambda>0$ and $X\sim\text{Cauchy}\left(1\right)$ , then

[TABLE]

that is,

[TABLE]

Proof.

Starting from proposition A.0.1 with $b=1$ ,

[TABLE]

By lemma B.1.7 and the atanh addition formula B.1.8,

[TABLE]

By remark B.0.10,

[TABLE]

Consequently,

[TABLE]

as claimed. ∎
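A numerical sanity check of the first moment can be sketched as follows. Two explicit forms are assumed here for illustration and should be treated as hypotheses rather than as the statement of the lemma: $\xi(x)=\ln(1+\sqrt{x})+\tfrac{1}{2}\ln(1+x)$ and $\mu(\lambda)=\operatorname{atanh}(\sqrt{2\lambda}/(1+\lambda))+\tfrac{1}{2}\ln(1+\lambda^{2})$. They are chosen to be consistent with the bounds on $\mu$ quoted in this appendix. Under these assumptions the closed form also equals $\ln(1+\sqrt{2\lambda}+\lambda)$.

```python
import math


def xi(x: float) -> float:
    """Assumed (hypothetical) form of xi: ln(1 + sqrt(x)) + 0.5 * ln(1 + x)."""
    return math.log1p(math.sqrt(x)) + 0.5 * math.log1p(x)


def mu_closed_form(lam: float) -> float:
    """Assumed closed form: atanh(sqrt(2 lam)/(1 + lam)) + 0.5 * ln(1 + lam^2)."""
    return math.atanh(math.sqrt(2.0 * lam) / (1.0 + lam)) + 0.5 * math.log1p(lam * lam)


def mu_quadrature(lam: float, n: int = 100_000) -> float:
    """Midpoint rule for E xi(lam |X|), X standard Cauchy, via x = tan(theta).

    E g(|X|) = (2/pi) * integral over theta in (0, pi/2) of g(tan(theta)).
    """
    h = (math.pi / 2.0) / n
    total = 0.0
    for j in range(n):
        theta = (j + 0.5) * h
        total += xi(lam * math.tan(theta))
    return (2.0 / math.pi) * total * h


if __name__ == "__main__":
    for lam in (0.1, 1.0, 10.0):
        print(lam, mu_quadrature(lam), mu_closed_form(lam))
```

The substitution $x=\tan\theta$ turns the heavy Cauchy tail into an integrable logarithmic singularity at $\theta=\pi/2$, so a plain midpoint rule suffices for a rough check.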

We use the following lemma to show that $\mu(\lambda)=\Theta(\sqrt{\lambda})$ as well when $\lambda$ is small.

Lemma A.1.2.

For $\lambda>0$ ,

[TABLE]

and approaches 0 as $\lambda\to\infty$ . Further, for any $\lambda\leq\lambda_{0}\leq 1$ ,

[TABLE]

*Remark A.1.3**.*

By lemma A.1.1, we now also have the bound

[TABLE]

using $\ln(1+x)\leq x$ twice.

Proof.

The limit for large $\lambda$ is immediate. From the power series for $\operatorname{atanh}$ , conclude $\operatorname{atanh}(x)>x$ for $x>0$ . We can also give the upper bound

[TABLE]

So,

[TABLE]

Noting that $\lambda/(1+\lambda^{2})$ is strictly increasing for $\lambda\in(0,1)$ , we can fix the $\lambda^{2}$ term at a particular constant. ∎

A.2 Estimating Deviations of the Mean

We derive the estimates used in the large scale concentration proofs given above. Both differences

[TABLE]

are controlled by lemma A.2.1 by requiring $\lambda\geq\sqrt{1+\epsilon}$ . Because

[TABLE]

both deviations will be sums of two terms, an $\operatorname{atanh}$ term and a $\ln$ term.

Lemma A.2.1.

For $1\leq a$ and $1/\sqrt{a}\leq\lambda$ ,

[TABLE]

Proof.

We shall show that for $\lambda\geq 1/\sqrt{a}$ , the difference in the $\operatorname{atanh}$ terms is nonpositive. We then immediately have the upper bound

[TABLE]

On the other hand, because $\lambda\geq 1/\sqrt{a}$ , the $\ln$ contribution also has the lower bound

[TABLE]

using a 2nd order Taylor series with Lagrange remainder in the last line, recalling $a\geq 1$ here.

For the lower bound for $\mu(a\lambda)-\mu(\lambda)$ , it remains to control how negative the $\operatorname{atanh}$ contribution is. With

[TABLE]

we can use the atanh addition formula B.1.8,

[TABLE]

for $u,v\in(-1,1)$ , which is the case for us here. After some simplification, we recover

[TABLE]

which is negative for $\lambda\geq 1/\sqrt{a}$ . Because atanh is an odd function, taking it of the above gives a negative contribution for such $\lambda$ . Use the AM-GM inequality to upper bound

[TABLE]

then use the estimate

[TABLE]

as the remaining factor is seen to be decreasing for $a\geq 1$ upon taking logarithms. Using $\sqrt{a}\leq 1+(a-1)/2$ , we finally have

[TABLE]

∎
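The sign claim for the $\operatorname{atanh}$ contribution and the tangent-line bound $\sqrt{a}\leq 1+(a-1)/2$ can be probed on a grid. The sketch assumes, as a labeled hypothesis, that the $\operatorname{atanh}$ part of $\mu(\lambda)$ is $\operatorname{atanh}(\sqrt{2\lambda}/(1+\lambda))$; the function names are hypothetical.

```python
import math


def atanh_term(lam: float) -> float:
    """Assumed atanh part of mu(lam): atanh(sqrt(2*lam) / (1 + lam))."""
    return math.atanh(math.sqrt(2.0 * lam) / (1.0 + lam))


def atanh_difference(a: float, lam: float) -> float:
    """Difference of the assumed atanh terms at a*lam and at lam."""
    return atanh_term(a * lam) - atanh_term(lam)


def sqrt_tangent_gap(a: float) -> float:
    """1 + (a - 1)/2 - sqrt(a); nonnegative because sqrt is concave."""
    return 1.0 + (a - 1.0) / 2.0 - math.sqrt(a)


if __name__ == "__main__":
    a = 4.0
    lam0 = 1.0 / math.sqrt(a)
    for lam in (lam0, 2 * lam0, 10 * lam0):
        print(lam, atanh_difference(a, lam), sqrt_tangent_gap(a))
```

Under the assumed form, the difference vanishes at $\lambda=1/\sqrt{a}$ and is negative beyond it, matching the threshold in the lemma.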

A.3 2nd Moment

To estimate the 2nd moment $\mathbb{E}\xi^{2}(\lambda\left\lvert X\right\rvert)$ , note that for any $a,b>0$ , the AM-GM inequality gives $(a+b)^{2}\leq 2(a^{2}+b^{2})$ , so that

[TABLE]

It turns out this last expression also arises from a contour integral.

Lemma A.3.1.

If $\lambda>0$ and $X\sim\text{Cauchy}\left(1\right)$ , then

[TABLE]

with

[TABLE]

Proof.

The computations will be a bit more involved than those for the first moment. Starting from proposition A.0.1 with $b=2$ ,

[TABLE]

that is,

[TABLE]

By lemma A.3.5,

[TABLE]

For the residue terms, we use lemma A.3.6:

[TABLE]

with

[TABLE]

Recalling our computation of $\mu(\lambda)$ in lemma A.1.1, we can further simplify:

[TABLE]

Putting everything together we may conclude

[TABLE]

∎

Corollary A.3.2 (The Variance Is Bounded).

For $\lambda>0$ and $X\sim\text{Cauchy}\left(1\right)$ ,

[TABLE]

Proof.

Just note that for $\nu>0$ ,

[TABLE]

The constant follows from $\arctan(x)\leq\pi/2$ for all $x\in\mathbb{R}$ , while the $\ln(1+\nu)$ bound follows from comparing derivatives, noting that both functions take 0 when $\nu=0$ . ∎

For quantitative estimates for the 2nd moment and the variance, we make the $\mathbb{E}\ln(1+\lambda\left\lvert X\right\rvert)$ term explicit in the above bound.

Lemma A.3.3.

For $\lambda\geq 0$ and $X\sim\text{Cauchy}\left(1\right)$ ,

[TABLE]

Proof.

From lemma B.0.1

[TABLE]

We use the reflection formula B.2.1 to expand the dilogarithm terms.

Recall from lemma B.2.1, for $z\in(\mathbb{C}-\mathbb{R})\cup(0,1)$ ,

[TABLE]

Consequently, using definition B.1.3 for $\operatorname{Ti}_{2}$ ,

[TABLE]

By lemma B.0.9 (really the remark there) and the definition of arctan,

[TABLE]

Thus,

[TABLE]

∎

Corollary A.3.4.

For $0<\lambda<2$

[TABLE]

Proof.

By corollary A.3.2 and lemma A.3.3, we have

[TABLE]

because $\operatorname{Ti}_{2}(\lambda)$ is an alternating series with terms of decreasing magnitude for $\lambda<2$ , and because $-\ln(\lambda)$ is nonnegative for $\lambda\leq 1$ . For $\lambda\in(1,2)$ , we can drop the $\ln(\lambda)$ term for an upper bound. Consequently, using $\mu(\lambda)\geq\sqrt{2\lambda}/(1+\lambda)$ from remark A.1.3,

[TABLE]

for $\lambda\leq 1$ , and

[TABLE]

for $\lambda\in(1,2)$ . ∎

Lemma A.3.5.

For $r>0$ ,

[TABLE]

Proof.

We are adding complex conjugates, so the left hand side is

[TABLE]

∎

Lemma A.3.6.

For $\nu>0$ ,

[TABLE]

with

[TABLE]

Proof.

Using lemma B.1.7,

[TABLE]

and similarly

[TABLE]

Adding yields several terms:

[TABLE]

From lemma A.3.5,

[TABLE]

We also have

[TABLE]

Let

[TABLE]

by the atanh addition formula B.1.8, as $\sqrt{\pm i}=(1\pm i)/\sqrt{2}$ are conjugates of each other.

Let

[TABLE]

Then

[TABLE]

So we are left to understand $h(\nu)$ . By lemma A.3.7, it is

[TABLE]

∎

Lemma A.3.7.

For $\nu>0$ ,

[TABLE]

*Remark A.3.8**.*

For $\nu<1$ , we can rewrite the above as

[TABLE]

Proof.

We cannot directly use the atanh addition formula because there is a singularity when $\nu$ crosses 1. However, by definition of atanh B.1.6, we can convert $h(\nu)$ as follows, using $\sqrt{-i}=-i\sqrt{i}$

[TABLE]

We now use the inversion formula B.3.1 for $\arctan$ .

[TABLE]

The following identity holds

[TABLE]

because both analytic expressions are 0 at $\nu=1$ , and their derivatives match for $\nu>0$ . ∎
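The square-root identities used in the last two proofs, $\sqrt{\pm i}=(1\pm i)/\sqrt{2}$ and $\sqrt{-i}=-i\sqrt{i}$ , are easy to confirm with the principal branch. A minimal check, assuming Python's `cmath.sqrt` as the principal square root (the helper name is hypothetical):

```python
import cmath
import math


def sqrt_identity_residuals() -> dict:
    """Absolute residuals of the three square-root identities."""
    sqrt_i = cmath.sqrt(1j)
    sqrt_minus_i = cmath.sqrt(-1j)
    return {
        "sqrt(i) = (1+i)/sqrt(2)": abs(sqrt_i - (1 + 1j) / math.sqrt(2)),
        "sqrt(-i) = (1-i)/sqrt(2)": abs(sqrt_minus_i - (1 - 1j) / math.sqrt(2)),
        "sqrt(-i) = -i*sqrt(i)": abs(sqrt_minus_i - (-1j) * sqrt_i),
    }


if __name__ == "__main__":
    for name, residual in sqrt_identity_residuals().items():
        print(name, residual)
```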

Appendix B Polylogarithms and Their Friends

The polylogarithms $\operatorname{Li}_{b}(z)$ arise when we compute or estimate the first and second moments of the coordinate projections; they will help us give quantitative bounds which are needed in some of the proofs. References for polylogarithms are [11] and [14].

As initial motivation for studying such functions, we have the following lemma.

Lemma B.0.1.

Let $X\sim\text{Cauchy}\left(1\right)$ and $\nu>0$ . Then for $b>-1$ ,

[TABLE]

Proof.

We have

[TABLE]

Change variables $u=1+\nu x$ and then $t=\ln(u)$ to find

[TABLE]

Using partial fractions, we may write

[TABLE]

by definition B.0.7. The polylogarithms are defined because $\nu>0$ , and if $b>0$ , the value at $\nu=0$ is also defined. ∎

General references for complex analysis are [18] for proofs and [16] for intuition. If $z=x+iy\in\mathbb{C}$ with $x,y\in\mathbb{R}$ , then $\Re(z):=x$ and $\Im(z):=y$ . If $z=re^{i\theta}=x+iy\in\mathbb{C}$ , denote $z^{\ast}=re^{-i\theta}=x-iy$ for the complex conjugate. Further $\left\lvert z\right\rvert^{2}=zz^{\ast}=x^{2}+y^{2}$ . Thus, if $w=se^{i\phi}$ , we have

[TABLE]

Further, if $w\neq 0$ ,

[TABLE]

For us, analytic functions are synonymous with holomorphic ones. We shall be using two theorems from complex analysis repeatedly. Cf. [18, pages 52, 96].

Theorem B.0.2 (Analytic Continuation).

Let $f$ and $g$ be analytic functions in a connected open subset $\Omega$ of $\mathbb{C}$ . If $f(z)=g(z)$ for all $z$ in a non-empty open subset of $\Omega$ , then $f(z)=g(z)$ throughout $\Omega$ .

Theorem B.0.3 (Primitives).

Let $f$ be an analytic function in a simply connected subset $\Omega$ of $\mathbb{C}$ . Then for $z_{0},z\in\Omega$ , the function

[TABLE]

is analytic too, with $\gamma$ any path from $z_{0}$ to $z$ lying in $\Omega$ .

Definition B.0.4 (The Logarithm).

For $z=re^{i\theta}\in\mathbb{C}-(-\infty,0]$ , define (the principal branch of) the logarithm of $z$ , $\ln(z)$ as

[TABLE]

for any path from 1 to $z$ in $\mathbb{C}-(-\infty,0]$ .

*Remark B.0.5**.*

Note that $\ln(z^{\ast})=\ln(r)-i\theta=\ln(z)^{\ast}$ . The map $w\mapsto 1/w$ takes $\mathbb{C}-(-\infty,0]$ to itself; for if $w=se^{i\phi}$ , with $\left\lvert\phi\right\rvert<\pi$ , then $1/w=(1/s)e^{-i\phi}$ which also lives in $\mathbb{C}-(-\infty,0]$ . With this choice of principal branch, the logarithm still satisfies $-\ln(1/w)=\ln(w)$ via

[TABLE]

Similarly, note that if $\Re(z),\Re(w)>0$ , then $zw=rse^{i(\theta+\phi)}$ with $\left\lvert\theta+\phi\right\rvert<\pi$ so $\arg(zw)=\theta+\phi$ and

[TABLE]

in this case. However, the general identity $\ln(z_{1}z_{2})=\ln(z_{1})+\ln(z_{2})$ does not hold.
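A short numerical illustration of these branch facts, assuming Python's `cmath.log` as the principal branch (the helper name `log_product_defect` is hypothetical):

```python
import cmath
import math


def log_product_defect(z: complex, w: complex) -> complex:
    """ln(zw) - ln(z) - ln(w) on the principal branch; a multiple of 2*pi*i."""
    return cmath.log(z * w) - cmath.log(z) - cmath.log(w)


if __name__ == "__main__":
    # Both factors in the right half-plane: the product rule holds.
    print(log_product_defect(1 + 2j, 3 - 1j))
    # Arguments add up to 3*pi/2 > pi: the rule fails by -2*pi*i.
    print(log_product_defect(-1 + 1j, -1 + 1j), -2j * math.pi)
    # Conjugation and inversion symmetries of the principal branch.
    w = -1 + 1j
    print(cmath.log(w.conjugate()) - cmath.log(w).conjugate())
    print(cmath.log(1 / w) + cmath.log(w))
```

The second case shows why the general product identity cannot hold: the arguments of the two factors sum to a value outside $(-\pi,\pi)$.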

Definition B.0.6 (The Polylogarithm of Order 1).

Define the polylogarithm of order 1, $\operatorname{Li}_{1}(z)$ as

[TABLE]

and

[TABLE]

For general $z$ , the domain makes sense, as $1-z=-(z-1)\in\mathbb{C}-(-\infty,0]$ for the $z$ in question. Recall when $\left\lvert z\right\rvert<1$ ,

[TABLE]

noting that both sides agree when $z=0$ , and upon differentiating,

[TABLE]

which means $-\ln(1-z)$ and the sum differ by a constant, namely 0.
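The agreement between $-\ln(1-z)$ and the power series inside the unit disc can be checked directly. A minimal sketch, with the hypothetical helper `li1_series` truncating the series $\sum_{n\geq 1}z^{n}/n$ :

```python
import cmath


def li1_series(z: complex, terms: int = 200) -> complex:
    """Partial sum of sum_{n>=1} z^n / n, valid for |z| < 1."""
    return sum(z ** n / n for n in range(1, terms + 1))


if __name__ == "__main__":
    for z in (0.5, -0.5, 0.3 + 0.4j):
        print(z, li1_series(z), -cmath.log(1 - z))
```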

The order of the polylogarithms may be extended; the general integral form below will be useful for some of the computations later.

Definition B.0.7.

For $b>0$ , define the polylogarithm of order $b$ as

[TABLE]

and

[TABLE]

for $z\in\mathbb{C}-[1,\infty)$ .

The nonintegral order polylogarithms also extend to the unit circle when the order is greater than 1.

Lemma B.0.8.

For $b>1$ and $z\in\mathbb{C}$ with $\left\lvert z\right\rvert=1$ ,

[TABLE]

Proof.

By definition,

[TABLE]

The series is finite because $b>1$ ; concretely, by the integral test (because $1/x^{b}$ is convex),

[TABLE]

∎
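The integral-test estimate can be illustrated numerically. The sketch below assumes the series definition $\operatorname{Li}_{b}(z)=\sum_{n\geq 1}z^{n}/n^{b}$ and compares partial sums on the unit circle against the standard bound $1+\int_{1}^{\infty}x^{-b}\,dx=b/(b-1)$ ; this particular constant is an assumption and may differ from the one displayed in the lemma.

```python
import cmath
import math


def polylog_partial(z: complex, b: float, terms: int) -> complex:
    """Partial sum of sum_{n>=1} z^n / n^b."""
    return sum(z ** n / n ** b for n in range(1, terms + 1))


def integral_test_bound(b: float) -> float:
    """1 + integral_1^infty x^(-b) dx = b / (b - 1), for b > 1."""
    return b / (b - 1.0)


if __name__ == "__main__":
    for b in (1.5, 2.0, 3.0):
        for theta in (0.0, 1.0, math.pi):
            z = cmath.exp(1j * theta)
            print(b, theta, abs(polylog_partial(z, b, 2000)), integral_test_bound(b))
```

The worst case is $z=1$ , where all terms are positive and the partial sums increase to $\zeta(b)$ .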

Lemma B.0.9.

For $z\in(\mathbb{C}-\mathbb{R})\cup(-1,1)$ and $b>0$ ,

[TABLE]

If $b>1$ , the equality also holds when $z=\pm 1$ .

*Remark B.0.10**.*

When $b=1$ , recover

[TABLE]

Proof.

First assume $\left\lvert z\right\rvert<1$ . From the power series,

[TABLE]

Both sides are analytic functions on $(\mathbb{C}-\mathbb{R})\cup(-1,1)$ , so by analytic continuation, the identity continues to hold there. If $b>1$ , the power series are also defined at $z=\pm 1$ . ∎

A useful property of the polylogarithms and the logarithm that we shall use repeatedly in computations is that they are all symmetric about the real axis, that is, $\operatorname{Li}_{b}(z^{\ast})^{\ast}=\operatorname{Li}_{b}(z)$ or concretely

[TABLE]

Powers and polynomials of such functions also have this property. Intuitively this symmetry follows from the real coefficients in their power series expansions, so that $\operatorname{Li}_{b}(x)\in\mathbb{R}$ when $x<1$ . Rigorously, we use the Schwarz reflection principle; because $\operatorname{Li}_{b}(z)$ is analytic in $\mathbb{C}-[1,\infty)$ when $0\leq\arg(z)<\pi$ and real valued on $(-\infty,1)$ , $\operatorname{Li}_{b}(z)$ may be extended to the rest of $\mathbb{C}-[1,\infty)$ in an analytic fashion. Analytic continuation then dictates that this extension coincides with the original definition of $\operatorname{Li}_{b}(z)$ . See [18] pages 57-59 for the Schwarz reflection principle, page 56 for showing the integral definitions of $\operatorname{Li}_{b}(z)$ are analytic, and page 52 for the principle of analytic continuation.

B.1 Arctan and the Inverse Tangent Integrals

The function $t\mapsto\arctan(t)$ is proportional to the distribution function of $\left\lvert X\right\rvert$ with $X\sim\text{Cauchy}\left(1\right)$ . It is then perhaps not surprising that $\arctan$ and its relatives arise in working with functions of Cauchy random variables. We outline the properties we shall be using here.
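Concretely, for $t\geq 0$ the distribution function is $\mathbb{P}(\left\lvert X\right\rvert\leq t)=(2/\pi)\arctan(t)$ . A small seeded Monte Carlo sketch illustrates this; the sampler uses the inverse-CDF representation $X=\tan(\pi(U-1/2))$ with $U$ uniform on $(0,1)$ , and the helper names are hypothetical.

```python
import math
import random


def sample_abs_cauchy(rng: random.Random) -> float:
    """|X| for X standard Cauchy, via X = tan(pi * (U - 1/2))."""
    return abs(math.tan(math.pi * (rng.random() - 0.5)))


def empirical_cdf(t: float, n: int = 100_000, seed: int = 0) -> float:
    """Fraction of n seeded samples of |X| that are at most t."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if sample_abs_cauchy(rng) <= t)
    return hits / n


def exact_cdf(t: float) -> float:
    """P(|X| <= t) = (2/pi) * arctan(t) for t >= 0."""
    return (2.0 / math.pi) * math.atan(t)


if __name__ == "__main__":
    for t in (0.5, 1.0, 3.0):
        print(t, empirical_cdf(t), exact_cdf(t))
```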

The following definition is opaque but most useful to us.

Definition B.1.1.

Define $\arctan(z)$ as

[TABLE]

and

[TABLE]

Equivalently,

[TABLE]

*Remark B.1.2**.*

From the integral formulation, we also immediately have, with $v=-w$ ,

[TABLE]

The last definition for $\arctan(z)$ follows from

[TABLE]

and that $\arctan(0)=0$ .

We can generalize.

Definition B.1.3.

For $z\in(\mathbb{C}-i\mathbb{R})\cup(-i,i)$ and $b>0$ , define the inverse tangent integral of order $b$ as

[TABLE]

and

[TABLE]

*Remark B.1.4**.*

Note if $\left\lvert y\right\rvert<1$ , we find

[TABLE]

Hence,

[TABLE]

when $\left\lvert y\right\rvert<1$ and $b>0$ . The right hand side continues to make sense for $y\in(\mathbb{C}-i\mathbb{R})\cup(-i,i)$ , so we may define

[TABLE]

as an analytic function on $z\in(\mathbb{C}-i\mathbb{R})\cup(-i,i)$ that agrees with the power series on the interior of the unit circle.

*Remark B.1.5**.*

In particular, we have $\operatorname{Ti}_{1}(z)=\arctan(z)$ .
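As a quick check, the sketch below assumes the usual power series $\operatorname{Ti}_{b}(z)=\sum_{k\geq 0}(-1)^{k}z^{2k+1}/(2k+1)^{b}$ inside the unit disc (an assumption about the displayed definition) and compares order 1 with `cmath.atan`. It also evaluates $\operatorname{Ti}_{2}(1)$ , which under this series is Catalan's constant.

```python
import cmath


def ti_series(z: complex, b: float, terms: int = 200) -> complex:
    """Partial sum of sum_{k>=0} (-1)^k z^(2k+1) / (2k+1)^b."""
    return sum((-1) ** k * z ** (2 * k + 1) / (2 * k + 1) ** b for k in range(terms))


CATALAN = 0.915965594177219  # Catalan's constant, the value of Ti_2(1)


if __name__ == "__main__":
    for z in (0.5, -0.25, 0.3 + 0.2j):
        print(z, ti_series(z, 1.0), cmath.atan(z))
    print(ti_series(1.0, 2.0, terms=10_000), CATALAN)
```

At $z=1$ the order 2 series is alternating with decreasing terms, so the truncation error is at most the first omitted term.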

To focus on the behavior of $\arctan$ on $(-i,i)$ which was not addressed in the inversion formula B.3.1, we change points of view through a rotation of the complex plane.

Definition B.1.6.

Define the function $\operatorname{atanh}$ as

[TABLE]

and as

[TABLE]

or equivalently as

[TABLE]

To see that the definitions are consistent, note first from the power series, $\operatorname{atanh}(0)=0=\arctan(0)$ , while on the other hand,

[TABLE]
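The consistency of the descriptions of $\operatorname{atanh}$ can also be probed numerically. The sketch assumes the two standard closed forms $\operatorname{atanh}(z)=\tfrac{1}{2}\ln((1+z)/(1-z))$ and $\operatorname{atanh}(z)=-i\arctan(iz)$ , with principal branches from Python's `cmath`; whether these match the displayed definitions symbol for symbol is an assumption.

```python
import cmath


def atanh_via_log(z: complex) -> complex:
    """0.5 * ln((1 + z) / (1 - z)) with the principal logarithm."""
    return 0.5 * cmath.log((1 + z) / (1 - z))


def atanh_via_arctan(z: complex) -> complex:
    """-i * arctan(i z), the rotation relating atanh to arctan."""
    return -1j * cmath.atan(1j * z)


if __name__ == "__main__":
    for z in (0.4, 0.3 + 2j, -2 + 0.5j):
        print(z, atanh_via_log(z), atanh_via_arctan(z), cmath.atanh(z))
```

The test points lie in $(\mathbb{C}-\mathbb{R})\cup(-1,1)$ , away from the branch cuts on the real axis outside $(-1,1)$ .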

Lemma B.1.7.

Let $z\in(\mathbb{C}-\mathbb{R})\cup(-1,1)$ then

[TABLE]

Proof.

Just split into even and odd degree terms.

[TABLE]

The equality extends to $(\mathbb{C}-\mathbb{R})\cup(-1,1)$ as both sides are analytic there. We now have

[TABLE]

as desired. ∎

Here is the addition formula.

Lemma B.1.8 (Atanh Addition Formula).

If $-1<x,y<1$ ,

[TABLE]

If $z\in\mathbb{C}-\mathbb{R}$ ,

[TABLE]

Proof.

Because $\operatorname{atanh}$ is odd, the addition formula also covers subtraction. Check that

[TABLE]

So

[TABLE]

with $c$ a constant. Taking $z=0$ forces $c=\operatorname{atanh}(w)$ as desired.

For $z,w\in(\mathbb{C}-\mathbb{R})\cup(-1,1)$ , let

[TABLE]

We want to know when $f(z,w)$ also lies in the domain of atanh. When $w=z^{\ast}$ ,

[TABLE]

by the AM-GM inequality. The equality case occurs just if $\left\lvert z\right\rvert=1$ , but in that case, $\left\lvert\Re(z)\right\rvert/\left\lvert z\right\rvert<1$ as $z=\pm 1$ is not allowed for $\operatorname{atanh}$ . We are thus ok for all $z\in(\mathbb{C}-\mathbb{R})\cup(-1,1)$ in this $w=z^{\ast}$ case.

When $x,y\in(-1,1)$ , we may consider

[TABLE]

and by symmetry, $\partial_{y}f(x,y)>0$ . So $f$ is increasing in each of the individual coordinates. In particular, when $-1<x<y<1$ ,

[TABLE]

For each permutation of $x$ , $y$ , and [math], check that

[TABLE]

by the AM-GM inequality, with strict inequality because $\left\lvert x\right\rvert,\left\lvert y\right\rvert<1$ .

∎
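Both cases of the addition formula can be checked numerically. The sketch assumes the standard real form $\operatorname{atanh}(x)+\operatorname{atanh}(y)=\operatorname{atanh}((x+y)/(1+xy))$ and, for the conjugate case, our reading $\operatorname{atanh}(z)+\operatorname{atanh}(z^{\ast})=\operatorname{atanh}(2\Re(z)/(1+\left\lvert z\right\rvert^{2}))$ ; the function names are hypothetical.

```python
import cmath
import math


def atanh_sum_real(x: float, y: float) -> float:
    """atanh((x + y) / (1 + x*y)) for x, y in (-1, 1)."""
    return math.atanh((x + y) / (1.0 + x * y))


def atanh_sum_conjugate(z: complex) -> float:
    """atanh(2 Re(z) / (1 + |z|^2)), the assumed value of atanh(z) + atanh(z*)."""
    return math.atanh(2.0 * z.real / (1.0 + abs(z) ** 2))


if __name__ == "__main__":
    print(math.atanh(0.3) + math.atanh(-0.8), atanh_sum_real(0.3, -0.8))
    z = 2 + 3j
    print(cmath.atanh(z) + cmath.atanh(z.conjugate()), atanh_sum_conjugate(z))
```

For nonreal $z$ the argument $2\Re(z)/(1+\left\lvert z\right\rvert^{2})$ lies strictly inside $(-1,1)$ , in line with the AM-GM step in the proof.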

B.2 Dilogarithm Properties

The dilogarithm is the polylogarithm of order 2.

Lemma B.2.1 (Reflection Formula).

For $z\in(\mathbb{C}-\mathbb{R})\cup(0,1)$ ,

[TABLE]

Proof.

(Compare to [11, page 5].) Consider

[TABLE]

On the other hand,

[TABLE]

Because the domain $(\mathbb{C}-\mathbb{R})\cup(0,1)$ is simply connected and the derivative above is analytic there, we have

[TABLE]

for some $z_{0}$ which we may take to lie on $(0,1)$ . Taking the limit as $z_{0}\to 0$ is safe, as the Taylor series for $\ln(1-z_{0})$ ensures $\ln(z_{0})\ln(1-z_{0})\to 0$ , while the dilogarithm is continuous on $(-\infty,1]$ . Hence,

[TABLE]

as desired. Note that proving the identity via integration by parts has to make this same limiting argument. ∎
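The reflection formula in its classical form, $\operatorname{Li}_{2}(z)+\operatorname{Li}_{2}(1-z)=\pi^{2}/6-\ln(z)\ln(1-z)$ , can be tested at points where both $z$ and $1-z$ lie inside the unit disc, so that the power series applies to both terms. The sketch assumes this classical form and the series $\operatorname{Li}_{2}(z)=\sum_{n\geq 1}z^{n}/n^{2}$ ; the helper names are hypothetical.

```python
import cmath
import math


def li2_series(z: complex, terms: int = 400) -> complex:
    """Partial sum of sum_{n>=1} z^n / n^2, valid for |z| < 1."""
    return sum(z ** n / n ** 2 for n in range(1, terms + 1))


def reflection_defect(z: complex) -> complex:
    """Li_2(z) + Li_2(1 - z) - (pi^2/6 - ln(z) ln(1 - z)); requires |z|, |1 - z| < 1."""
    rhs = math.pi ** 2 / 6.0 - cmath.log(z) * cmath.log(1 - z)
    return li2_series(z) + li2_series(1 - z) - rhs


if __name__ == "__main__":
    for z in (0.3, 0.5, 0.5 + 0.3j):
        print(z, abs(reflection_defect(z)))
```

At $z=1/2$ the identity reduces to the known value $\operatorname{Li}_{2}(1/2)=\pi^{2}/12-\ln^{2}(2)/2$ .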

B.3 Inversion Formulas

The following lemma allows us to describe the survival function of $\left\lvert X\right\rvert$ with $X\sim\text{Cauchy}\left(1\right)$ in a convenient way. Note that the survival function for $\left\lvert X\right\rvert$ will only consider $z=x>0$ .

Lemma B.3.1.

For $z\in\mathbb{C}-i\mathbb{R}$ ,

[TABLE]

*Remark B.3.2**.*

On the imaginary axis, $\arctan(ir)=i\operatorname{atanh}(r)$ and $\operatorname{atanh}$ is only defined for $r\in(-1,1)$ so $1/r$ does not make sense there. Consequently the domain in question has two connected components, so different constants should not be unexpected.

Proof.

First note that the left hand side is a constant

[TABLE]

The constant is determined by representative points $z=\pm 1$ in the right and left half-planes respectively. ∎
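The two constants can be observed numerically. The sketch assumes the classical form of the inversion formula, $\arctan(z)+\arctan(1/z)=\pm\pi/2$ with the sign of $\Re(z)$ , and uses Python's `cmath.atan`, whose branch cuts lie on the imaginary axis outside $(-i,i)$ .

```python
import cmath
import math


def inversion_sum(z: complex) -> complex:
    """arctan(z) + arctan(1/z) for z off the imaginary axis."""
    return cmath.atan(z) + cmath.atan(1 / z)


if __name__ == "__main__":
    for z in (1, 2 + 1j, 0.1 - 5j):
        print(z, inversion_sum(z))   # right half-plane: close to +pi/2
    for z in (-1, -0.5 + 3j):
        print(z, inversion_sum(z))   # left half-plane: close to -pi/2
```

The sum is locally constant on each half-plane, which is why one representative point per component fixes the value.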

Bibliography

[1] Ailon, N. and Chazelle, B. (2009). The Fast Johnson-Lindenstrauss Transform and Approximate Nearest Neighbors. SIAM Journal on Computing 39, 302–322.
[2] Brinkman, B. and Charikar, M. (2005). On the Impossibility of Dimension Reduction in $L_{1}$. J. ACM 52, 766–788. doi:10.1145/1089023.1089026
[3] Chambers, J. M., Mallows, C. L. and Stuck, B. W. (1976). A Method for Simulating Stable Random Variables. Journal of the American Statistical Association 71, 340–344. doi:10.2307/2285309
[4] Corazza, P. (1999). Introduction to Metric-Preserving Functions. The American Mathematical Monthly 106, 309–323. doi:10.2307/2589554
[5] Drineas, P., Magdon-Ismail, M., Mahoney, M. W. and Woodruff, D. P. (2012). Fast Approximation of Matrix Coherence and Statistical Leverage. Journal of Machine Learning Research 13, 32. arXiv:1109.3843.
[6] Drineas, P. and Mahoney, M. W. (2016). RandNLA: Randomized Numerical Linear Algebra. Communications of the ACM 59, 80–90. doi:10.1145/2842602
[7] Indyk, P. (2006). Stable Distributions, Pseudorandom Generators, Embeddings, and Data Stream Computation. J. ACM 53, 307–323. doi:10.1145/1147954.1147955
[8] Johnson, W. B. and Lindenstrauss, J. (1984). Extensions of Lipschitz Mappings into a Hilbert Space. Contemporary Mathematics 26, 189–206.
