Linear Dimension Reduction Approximately Preserving a Function of the 1-Norm
Michael P. Casey

TL;DR
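This paper introduces a random linear embedding for finite point sets in high-dimensional space equipped with the 1-norm. Using random matrices with i.i.d. standard Cauchy entries, the embedding preserves a concave increasing function of the original distances, up to multiplicative error, with high probability.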
Contribution
It presents a dimension reduction technique that preserves a concave increasing function of the original 1-norm distances, with a target dimension that need only be quadratic in the logarithm of the number of points.
Findings
Embedding dimension is quadratic in log of point set size.
Embeddings are random matrices with i.i.d. standard Cauchy entries.
Distance preservation holds with high probability.
Abstract
For any finite point set in $D$-dimensional space equipped with the 1-norm, we present random linear embeddings to $k$-dimensional space, with a new metric, having the following properties. For any pair of points from the point set that are not too close, the distance between their images is a strictly concave increasing function of their original distance, up to multiplicative error. The target dimension need only be quadratic in the logarithm of the size of the point set to ensure the result holds with high probability. The linear embeddings are random matrices composed of standard Cauchy random variables, and the proofs rely on Chernoff bounds for sums of iid random variables. The new metric is translation invariant, but is not induced by a norm.
United States Air Force Research Laboratory
MSC classification: 60, 46B09, 46B85, 60E07, 60G50
Keywords: dimension reduction, embeddings of finite metric spaces, random projection, metric preserving function, Cauchy random variables, Cauchy projections, stable distributions, concentration of measure
1 Introduction
The Johnson-Lindenstrauss lemma [8] states that for a finite set of points $X \subset \mathbb{R}^D$ and $0 < \epsilon < 1$, there are random linear maps $A : \mathbb{R}^D \to \mathbb{R}^k$ satisfying, for any $x, y \in X$,
$$(1-\epsilon)\,\|x-y\|_2 \;\le\; \|Ax-Ay\|_2 \;\le\; (1+\epsilon)\,\|x-y\|_2$$
with high probability, provided $k = O(\epsilon^{-2} \log |X|)$. It is sufficient to draw the entries of $A$ i.i.d. sub-Gaussian [13]. These random linear projections have provided improved worst case performance bounds for many problems in theoretical computer science, machine learning, and numerical linear algebra. Ailon and Chazelle [1] show how $Ax$ may be computed quickly and apply it to the approximate nearest-neighbor problem, working on the projected points $AX$. Vempala [19] gives a review of problems that may be reduced to analyzing a set of points $X \subset \mathbb{R}^D$, so that after the random projection is applied, the recovery of approximate solutions is possible with time and space bounds depending on $k$, the target dimension, instead of $D$, the ambient dimension.
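As a rough illustration of the lemma, the following sketch projects a small point set with a Gaussian matrix and measures the worst relative distortion over all pairs. The dimensions (500 and 200), the number of points, the seed, and the function names are assumptions chosen for the example; the entries are scaled to variance 1/k so that squared lengths are preserved in expectation.

```python
import math
import random


def gaussian_projection(points, k, rng):
    """Apply one k x D matrix of i.i.d. N(0, 1/k) entries to every point."""
    d = len(points[0])
    scale = 1.0 / math.sqrt(k)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in matrix] for p in points]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def worst_distortion(points, images):
    """Largest value of | ||Ax - Ay||_2 / ||x - y||_2 - 1 | over all pairs."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratio = euclidean(images[i], images[j]) / euclidean(points[i], points[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst


rng = random.Random(0)
points = [[rng.uniform(-1.0, 1.0) for _ in range(500)] for _ in range(20)]
images = gaussian_projection(points, k=200, rng=rng)
distortion = worst_distortion(points, images)
```

The value of `distortion` plays the role of the error parameter in the lemma; increasing `k` should shrink it roughly like the inverse square root of `k`.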
In numerical linear algebra, Drineas et al. [5] use the lemma to approximate the leverage scores of a given matrix ; such scores are used to inform subsampling schemes for , resulting in sketches of smaller dimensions that preserve desired properties of . Drineas and Mahoney [6] give a further review of using randomness in numerical linear algebra.
The Johnson-Lindenstrauss lemma is a metric embedding result; the map sends the finite metric space induced by the 2-norm to a corresponding metric space, also induced by the 2-norm, such that distances are preserved well. Ailon and Chazelle [1] show that equipping the target space with the 1-norm is also possible; the target dimension is still proportional to the logarithm of the number of points, but the dependence on the error parameter may be a bit worse. However, analogous results using the 1-norm on the domain do not hold. For example, in [2] and [10], specific -point subsets of equipped with the -norm are shown to embed only in with if one requires
[TABLE]
In particular, Brinkman and Charikar [2] show the target dimension must be at least if one wants .
In light of these negative results, people have tried estimating the 1-norm distance between two points from the coordinates of the projection of their difference. When the entries of the projection matrix are i.i.d. standard Cauchy random variables, these coordinates are i.i.d., each distributed like the 1-norm distance times a standard Cauchy random variable $C$. The median of $|C|$ is 1, so estimating the median of the absolute values of the coordinates estimates the distance. Indyk [7] considers the sample median as an estimator, while Li, Hastie, and Church [12] consider 1-homogeneous functions of these coordinates for estimators. None of the estimators considered are metrics on the target space. For nearest neighbor methods, we should like to have a metric on the target space and prefer a low number of coordinates for each point.
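The sample median estimator can be sketched in a few lines. The vector `x` below stands for the difference of two points, and the helper names, seed, and number of coordinates are assumptions made for the illustration; this is a sketch of the idea, not the estimator analyzed in [7] in full detail.

```python
import math
import random
import statistics


def standard_cauchy(rng):
    # The tangent of a uniform angle on (-pi/2, pi/2) is standard Cauchy.
    return math.tan(math.pi * (rng.random() - 0.5))


def cauchy_project(x, k, rng):
    """Return the k coordinates of Ax for a k x D matrix A of i.i.d. Cauchy entries."""
    return [sum(standard_cauchy(rng) * xj for xj in x) for _ in range(k)]


def median_estimate(projected):
    """Sample median of the absolute coordinates, an estimate of the 1-norm."""
    return statistics.median(abs(c) for c in projected)


rng = random.Random(1)
x = [3.0, -1.0, 0.5, 2.5, -4.0]
one_norm = sum(abs(v) for v in x)
estimate = median_estimate(cauchy_project(x, k=20001, rng=rng))
```

The estimate is close to `one_norm` for large `k`, but the median of absolute coordinates does not satisfy the triangle inequality in general, which is the shortcoming noted above.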
Relaxing the problem as follows, we wish to find linear maps satisfying, for any ,
[TABLE]
with high probability. We have changed the metric on to instead of the one induced by the 1-norm, and we have introduced a nonlinear function in place of the identity function. We want , with or better.
Here, is a concave increasing function with . Such are called “metric preserving” by Corazza [4], for the following reason:
[TABLE]
that is, they admit a new metric on the space that is “compatible” with the old one. In particular, spheres for the new metric about a particular point , that is, the level sets look like scaled versions of spheres for the 1-norm (crosspolytopes) about that point; the scaling however is nonlinear. The 1-norm is used here as an example, but any other input metric will still satisfy the triangle inequality under such . Not all metric preserving functions are concave increasing, but such a choice ensures the new metric generates the same topology as the old one.
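A quick numerical check of the metric preserving property: composing the 1-norm distance with a concave increasing function that vanishes at 0 keeps the triangle inequality, while a convex function such as squaring can break it. The function `g` below is `log(1 + t)`, chosen purely as an illustration and not claimed to be the function used in this paper; the helper names are likewise assumptions.

```python
import itertools
import math
import random


def one_norm_distance(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


def g(t):
    # Illustrative concave increasing function with g(0) = 0 (an assumption).
    return math.log1p(t)


def new_metric(x, y):
    return g(one_norm_distance(x, y))


def violates_triangle(points, dist):
    """True if some ordered triple breaks dist(x, z) <= dist(x, y) + dist(y, z)."""
    for x, y, z in itertools.permutations(points, 3):
        if dist(x, z) > dist(x, y) + dist(y, z) + 1e-12:
            return True
    return False


rng = random.Random(2)
points = [tuple(rng.uniform(-5.0, 5.0) for _ in range(4)) for _ in range(12)]
concave_ok = not violates_triangle(points, new_metric)

# Squaring the distance is not metric preserving: 0, 1, 2 on the line fail.
line = [(0.0,), (1.0,), (2.0,)]
square_fails = violates_triangle(line, lambda x, y: one_norm_distance(x, y) ** 2)
```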
For us, the linear map will have entries , and we introduce the metric on using an auxiliary function :
[TABLE]
with
[TABLE]
for . Our main theorem has several regimes depending on how big can be. (See theorems 3.0.1, 3.0.3, and 3.0.9.) However, the primary result is as follows.
Theorem 1.0.1.
Let , , and be as above. Given points and ,
[TABLE]
for all with , provided
[TABLE]
Independent of its interest as an analog of the Johnson-Lindenstrauss lemma, theorem 1.0.1 also contributes to the study of -stable projections. In fact, we make the following conjecture for upon replacing the entries of by i.i.d. standard -stable random variables and setting . Just like theorem 1.0.1, the conjecture could have several parts based on how large is, but the primary conjecture is as follows.
Conjecture 1.0.2.
With and modified as above, and , , and as in theorem 1.0.1, the following bound holds
[TABLE]
for all with .
The setup for the proof would be the same as for theorem 1.0.1, relying on 1st and 2nd moment estimates for ; however, because the density for a -stable random variable is only implicitly defined, the needed 1st and 2nd moment estimates are not so straightforward, but could be empirically found on the computer using methods such as [3] to draw the -stable random variables. This approach, in which we directly project the points from , may be contrasted to embedding and applying theorem 1.0.1 there. Pisier [17] (see also [15, chapter 8] and [9, chapter 9]) shows that such embeddings exist with distortion , with proportional to and depending on and .
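For empirical experiments of this kind, symmetric stable variables can be drawn with the method of Chambers, Mallows, and Stuck [3]. The sketch below covers only the symmetric case, normalized so that the characteristic function is $\exp(-|t|^{\alpha})$; the function name and the parameter name `alpha` are assumptions for the example.

```python
import math
import random


def symmetric_stable(alpha, rng):
    """One draw from the standard symmetric alpha-stable law, 0 < alpha <= 2."""
    u = math.pi * (rng.random() - 0.5)  # uniform angle on (-pi/2, pi/2)
    w = rng.expovariate(1.0)            # independent Exp(1) variable
    # For alpha = 1 the second factor has exponent 0 and the draw is tan(u),
    # a standard Cauchy variable; for alpha = 2 the draw is N(0, 2).
    first = math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
    second = (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha)
    return first * second
```

Averaging a candidate auxiliary function over many such draws would give Monte Carlo estimates of the 1st and 2nd moments mentioned above.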
2 Overview of the Proof
In this section, we explain the choices for the function and the metric , as well as the use of Cauchy random variables, outlining the proof along the way.
Consider a point . The 1-stability of the Cauchy distribution dictates that the coordinates of the projected point are Cauchy distributed: with . The metric is then an empirical mean:
[TABLE]
and if we marginalize out the Cauchy dependence, we recover the deterministic function of :
[TABLE]
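The passage from an empirical mean to a deterministic function of the 1-norm can be seen numerically. The sketch below uses the stand-in function $f(t) = \tfrac{1}{2}\log(1+t^2)$, which is an assumption made only because its Cauchy expectation has the simple closed form $\log(1+r)$ with $r$ the 1-norm of the point; it is not the auxiliary function of this paper, and it is not subadditive near 0, so it does not by itself define a metric.

```python
import math
import random


def standard_cauchy(rng):
    return math.tan(math.pi * (rng.random() - 0.5))


def f(t):
    # Stand-in auxiliary function (an assumption), NOT the paper's choice.
    return 0.5 * math.log1p(t * t)


def empirical_mean_of_f(x, k, rng):
    """(1/k) * sum_i f((Ax)_i) for a k x D matrix A of i.i.d. Cauchy entries."""
    total = 0.0
    for _ in range(k):
        coordinate = sum(standard_cauchy(rng) * xj for xj in x)
        total += f(coordinate)
    return total / k


def marginalized(r):
    """E f(r * C) for standard Cauchy C and the stand-in f: equals log(1 + r)."""
    return math.log1p(r)
```

For example, with `x = [1.0, -2.0, 0.5]` the 1-norm is 3.5, and `empirical_mean_of_f(x, 20000, rng)` should be close to `marginalized(3.5)`, a concave increasing function of the 1-norm.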
We can now outline the proof as follows: let . The projection map is linear and the metric is translation invariant, so our goal is to show or upon setting ,
[TABLE]
with high probability. As usual, we use the exponential Markov inequality and the i.i.d. assumption to estimate
[TABLE]
with a similar setup for the lower tail. However, Cauchy random variables only have finite fractional moments,
[TABLE]
so the presence of in the exponential requires when is large. Our choice of ensures this behavior:
[TABLE]
while the presence of the “1+” in the logarithms ensures is nonnegative, increasing, and sends 0 to 0. The function is thus subadditive and preserves the triangle inequality:
[TABLE]
ensuring is a metric on . Because is the expectation of , it inherits these properties, so that induces a metric on the original space .
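To see why logarithmic growth is forced, the following standard facts about a standard Cauchy variable $C$ may help; the symbols $p$, $q$, and $\lambda$ are generic placeholders here and need not match the notation of the displayed equations above.

```latex
\begin{aligned}
\mathbb{E}\,|C|^{p} &= \frac{2}{\pi}\int_{0}^{\infty}\frac{t^{p}}{1+t^{2}}\,dt
  = \frac{1}{\cos(\pi p/2)} \quad (0 \le p < 1),
  \qquad \mathbb{E}\,|C|^{p} = \infty \quad (p \ge 1), \\
\mathbb{E}\,e^{\lambda \log(1+|C|)} &= \mathbb{E}\,(1+|C|)^{\lambda} < \infty
  \iff \lambda < 1, \\
\mathbb{E}\,e^{\lambda |C|^{q}} &= \infty
  \quad \text{for every } \lambda > 0 \text{ and } q > 0 .
\end{aligned}
```

So a function growing like a power of its argument has no finite moment generating function under the Cauchy law, while one growing like a logarithm has a finite one for small enough $\lambda$.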
We show in sections 4 and 5 that our tail bounds take the following form. To be concrete, we state the upper tail case; the lower tail cases are similar:
[TABLE]
with depending on , the function giving an upper bound for the 2nd moment or the variance of , and the auxiliary function , derived from tail estimates for . The particular form of was chosen to give explicit control over all these quantities as varies, allowing us to obtain bounds on equation ( ‣ 2) that only weakly depend on .
We arrive at the particular form ( ‣ 2) for the tail bounds by estimating the moment generating function as follows, taking the upper tail as an example: with , we split into two terms and desire each to be bounded by something quadratic in : for the 1st term, using a 2nd order Taylor expansion for the exponential,
[TABLE]
while for the 2nd term, we use integration by parts, eventually showing
[TABLE]
We can show the integrand decays exponentially in using our choice of and the explicit density for the Cauchy distribution:
[TABLE]
with depending on and . We can then combine these estimates and optimize in :
[TABLE]
using .
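The optimization step follows the usual pattern for Chernoff bounds. As a generic sketch, suppose $Y_1, \dots, Y_k$ are i.i.d. with mean $\mu$ and satisfy a quadratic bound on the centered moment generating function for $0 \le \lambda \le \lambda_{\max}$; the symbols $v$, $t$, and $\lambda_{\max}$ are placeholders introduced here and are not the paper's constants.

```latex
\begin{aligned}
\mathbb{E}\,e^{\lambda (Y_1 - \mu)} &\le e^{v\lambda^{2}/2}
  \quad (0 \le \lambda \le \lambda_{\max}) \\
\Longrightarrow\quad
\mathbb{P}\Big(\tfrac{1}{k}\textstyle\sum_{i=1}^{k} Y_i - \mu \ge t\Big)
  &\le e^{-k\lambda t}\,\big(\mathbb{E}\,e^{\lambda (Y_1-\mu)}\big)^{k}
  \le \exp\!\big(-k\lambda t + k v \lambda^{2}/2\big), \\
\lambda^{*} = \frac{t}{v} \le \lambda_{\max}
\quad\Longrightarrow\quad
\mathbb{P}\Big(\tfrac{1}{k}\textstyle\sum_{i=1}^{k} Y_i - \mu \ge t\Big)
  &\le \exp\!\Big(-\frac{k\,t^{2}}{2v}\Big).
\end{aligned}
```

The constraint $\lambda^{*} \le \lambda_{\max}$ is where a restriction on the admissible range of $\lambda$ turns into a restriction on the deviation $t$.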
The tail probabilities now have the form
[TABLE]
for a single corresponding to a single vector . There are at most pairs of points from , so we would want to choose the target dimension as
[TABLE]
to ensure with probability at least ,
[TABLE]
for all pairs of points simultaneously. However, the error and the target dimension both depend on , so we require uniform estimates for these quantities. We find these by breaking up the possible values for into three regimes: small, medium, and big.
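The union bound arithmetic can be sketched as follows. Assume, hypothetically, that for a single pair the two-sided failure probability is at most `2 * exp(-k * rate)`, where `rate` stands in for whatever per-pair exponent the tail bounds provide. With at most `n_points**2 / 2` pairs, the total failure probability is at most `n_points**2 * exp(-k * rate)`. The function name and all three parameter names are assumptions.

```python
import math


def target_dimension(n_points, delta, rate):
    """Smallest integer k with n_points**2 * exp(-k * rate) <= delta."""
    return math.ceil(math.log(n_points ** 2 / delta) / rate)


k = target_dimension(n_points=1000, delta=0.01, rate=0.05)
```

In this simplified picture the target dimension is linear in the logarithm of the number of points for a fixed `rate`; the quadratic dependence in the main result arises because the admissible error, and hence the exponent, must itself be tied to the number of points.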
Our choice of provides an explicit function for , (lemma A.1.1)
[TABLE]
for . The big regime has behaving like the log term, while the medium and small regimes have it behaving like . The choice of also gives us a bound on the variance (corollary A.3.2)
[TABLE]
The constant bound, independent of , is used for the big regime, while the expectation bound provides finer control on the variance when is small, via another explicit function, lemma A.3.3, of .
For the big regime, taking as
[TABLE]
both of which are bounded by lemma A.2.1, together with the constant bound for the variance give theorem 3.0.1, as is bounded here.
For the medium and small regimes, we take and use corollary A.3.4 to bound . The split between medium and small regimes occurs because of the term in that ratio: the target dimension has on the bottom, while the upper tail bound 4.0.1 required to only have on top
[TABLE]
This mismatch in powers of forces us to choose a cutoff ; because (and ) have terms proportional to , the above inequality can only hold for not too small. This gives the term in theorem 3.0.3 for the medium regime.
For the small regime, there is no such restriction on for the lower tail bound 5.0.2, but the target dimension still grows like as decreases (see lemma 3.0.5). We stop that growth by fixing a particular , showing that for all smaller , the error has a suitable replacement in theorem 3.0.9. The key is lemma 3.0.7: we choose so that with high probability, making both and behave like for with . Because turns out to be , the in the target dimension forces , a quadratic dependence on .
We finish the proofs in the next section, while the upper and lower tail estimates are provided in sections 4 and 5. We collect the estimates on the 1st and 2nd moments in appendix A, and ancillary identities for those estimates in appendix B.
3 Finishing the Proof
We now tie down the target dimension . Recall is a set of points in , and is a matrix of i.i.d. entries. In what follows, the estimates are not sharp.
Theorem 3.0.1 (Big Regime).
For and ,
[TABLE]
for all with probability at least provided
[TABLE]
Remark 3.0.2.
The constants are not expected to be sharp; is computed so that is uniformly bounded with respect to .
Proof.
Let . We want to use the lower and upper tail estimates from lemmas 5.0.1 and 4.0.1, so it remains to verify
[TABLE]
with either
[TABLE]
By lemma A.2.1, the differences are at most for , while the upper bound for the variance of is by corollary A.3.2. Because , we then certainly have .
As explained in section 2, the target dimension is chosen to ensure the union bound is at most for both tails combined. The choice of comes from the lower bound for the ’s from lemma A.2.1 and the larger of the two functions in lemmas 5.0.1 and 4.0.1. ∎
Theorem 3.0.3 (Medium Regime).
For and ,
[TABLE]
for all with probability at least provided
[TABLE]
Remark 3.0.4.
We have not been able to establish an upper bound result
[TABLE]
with high probability when . Our proofs break down or require a much higher estimate for the target dimension . We conjecture that still suffices, in light of theorem 3.0.9 for the small regime.
Proof.
With , we take . By lemma 3.0.5, the lower bound for requires an initial estimate for the target dimension of with the smallest we wish to consider. The upper bound will force our choice of .
We now want to use the upper tail estimate from lemma 4.0.1. It remains to check
[TABLE]
and it suffices to show . With from , we use lemma A.1.2 for the upper bound for to find, after some estimation,
[TABLE]
recalling .
Using the expression for , we now have the following estimate for the target dimension. Because is an estimate on the variance now, we can remove the ’s from corollary A.3.4 to find
[TABLE]
using from remark A.1.3. The dependent term here is enough to ensure , so both sides of the inequality for hold with high probability and this dimension . ∎
The following two lemmas lead to theorem 3.0.9, which shows that a lower bound for continues to hold for all .
Lemma 3.0.5.
For and ,
[TABLE]
for all with probability at least provided
[TABLE]
Remark 3.0.6.
The estimates are not sharp.
Proof.
With , we take . Using corollary A.3.4 and the lower tail estimate from lemma 5.0.2, the target dimension is
[TABLE]
to ensure the bound holds with probability at least for all pairs of points. ∎
Lemma 3.0.7.
For , let . For and , suppose
[TABLE]
and .
Then if , the same also satisfy
[TABLE]
with depending on , , and . If , then we can have
[TABLE]
Remark 3.0.8.
Analogous upper bounds are also possible, with a similar proof.
Proof.
A fourth order Taylor expansion with Lagrange remainder shows
[TABLE]
Because and , we invoke the above inequality twice to find
[TABLE]
By assumption, summing over and dividing by yields
[TABLE]
We finish by using remark A.1.3 (twice) to “absorb” into ,
[TABLE]
∎
Theorem 3.0.9 (Small Regime).
For and all , the following bound holds:
[TABLE]
with probability at least , provided
[TABLE]
Proof.
We can use lemma 3.0.5 with to cover all distances down to . We then choose in order to extend the lower bound to distances smaller than , using lemma 3.0.7.
Concretely, recall from section 2 that because is a linear map of i.i.d. Cauchy entries,
[TABLE]
with the same . Let . To use lemma 3.0.7, we just need to ensure with high probability. By the independence of the ,
[TABLE]
So set . Choosing according to lemma 3.0.5 with , we have the following inequality for the target dimension
[TABLE]
Taking satisfies the above, provided , say. We now have the conditions of lemma 3.0.7 satisfied for all pairs of points, with probability at least , and . ∎
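The independence step in this proof rests on the distribution of the largest absolute value among independent standard Cauchy variables. The sketch below computes that probability exactly and checks it by simulation; the threshold `m`, the count `k`, and the function names are assumptions for the illustration, not the specific choices made in the proof.

```python
import math
import random


def prob_all_within(m, k):
    """P(max_i |C_i| <= m) for k independent standard Cauchy variables."""
    return (2.0 / math.pi * math.atan(m)) ** k


def simulate(m, k, trials, rng):
    """Monte Carlo estimate of the same probability."""
    hits = 0
    for _ in range(trials):
        if all(abs(math.tan(math.pi * (rng.random() - 0.5))) <= m for _ in range(k)):
            hits += 1
    return hits / trials
```

Since each variable exceeds `m` in absolute value with probability at most `2 / (pi * m)`, the failure probability is at most `2 * k / (pi * m)`, so a threshold proportional to `k` divided by the allowed failure probability suffices.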
4 Upper Tails
In the following lemmas, the estimates are not sharp.
Lemma 4.0.1 (General Upper Tail).
With and ,
[TABLE]
and is minimized at with
[TABLE]
Proof.
From the discussion in section 2, we just need to establish the function for the integration by parts terms. To ensure is finite, we require . For and , we then estimate, with ,
[TABLE]
We shall choose and hence a bit later; note that contains the factor .
With , we can now estimate the integration by parts terms
[TABLE]
as at most
[TABLE]
using and . Assuming , we can now write, for a suitable upper bound ,
[TABLE]
and we may optimize in
[TABLE]
at
[TABLE]
It remains to choose and hence . Recalling the formula for either from section 2 or directly from lemma A.1.1, we can lower bound provided . Choosing ,
[TABLE]
∎
5 Lower Tails
Unlike for the upper tails, we can control the lower tails for the full range of . We address bounded away from 0 using the same techniques as for the upper tail. The lower tail proof for small simplifies because is nonnegative, so that there is no restriction on optimizing in the moment generating function.
In the following lemmas, the estimates are not sharp.
Lemma 5.0.1 (Lower Tail, Big Regime).
With and ,
[TABLE]
and is minimized at with
[TABLE]
Proof.
Just as in the upper tail computations,
[TABLE]
and
[TABLE]
We shall again determine the function by estimating a tail, but now it is the lower tail
[TABLE]
By subadditivity of ,
[TABLE]
We now can estimate
[TABLE]
We can then upper bound the integration by parts terms just like in the proof for the upper tail lemma 4.0.1. Assuming , we choose an upper bound for and arrive at
[TABLE]
To find , note that , so that
[TABLE]
which is bounded for away from 0. ∎
Lemma 5.0.2 (Lower Tail, Small Regimes).
With and ,
[TABLE]
Proof.
Because is nonnegative, we can use the 2nd order Taylor expansion of to write
[TABLE]
and we can then optimize in the usual way. ∎
Acknowledgments
This work was supported in part by Duke University while completing my Ph.D. thesis. I should like to thank my advisor Professor Sayan Mukherjee for encouraging me in completing this work. I should also like to thank the anonymous referee, whose comments helped greatly streamline the paper. I should like to thank Mom, Dad, Katie, and everyone who has been praying for me throughout my time at Duke. I should finally like to thank the Blessed Virgin Mary, Saint Joseph, and the Holy Trinity for helping me be patient throughout this work.
Appendix A The First and Second Moments
Here we derive the explicit formula for in lemma A.1.1 and the upper bounds for in corollary A.3.2. Some of this work is a bit tedious, but it will allow us to give explicit upper bounds on the target dimension
[TABLE]
We need bounds for when is small (remark A.1.3) as well as lower bounds for the ’s
[TABLE]
when is “large” (lemma A.2.1).
Because the Cauchy density has particularly simple behavior when extended to the complex plane, we heavily rely on complex analysis techniques. We chose to be the linear combination
[TABLE]
as it will simplify the estimates as well as be easy to compute using a pair of contour integrals. For both moments, the contour integral setup below will greatly facilitate computations; in particular, it will allow us to avoid estimating
[TABLE]
individually, which while possible, is not necessary for our results.
Proposition A.0.1 (Contour Integral Setup).
For , , and ,
[TABLE]
Remark A.0.2.
The task is then to simplify the complex logarithms on the right hand side when particular values of are chosen. We shall choose and in the next sections.
Proof.
We want to compute
[TABLE]
via contour integration. Extending to , let
[TABLE]
which has simple poles at .
We shall compute by using two different contours that both traverse the interval in the positive direction. Specifically, is oriented counterclockwise, while is oriented clockwise, setting
[TABLE]
with “large” arcs
[TABLE]
and segments rotating as to the negative real axis
[TABLE]
Check that
[TABLE]
Keeping in mind the orientations of the contours, the residue theorem dictates for ,
[TABLE]
and similarly
[TABLE]
It remains to show that
[TABLE]
For these integrals, note that
[TABLE]
which approaches when . Consequently, when , we can use the dominated convergence theorem to conclude
[TABLE]
checking that the integrand is bounded by a summable one when say. Sending recovers
[TABLE]
Similar reasoning applies to the integral to yield
[TABLE]
Putting everything together, we have
[TABLE]
as claimed. ∎
A.1 1st Moment
Recall from definition B.1.6 that may be defined by the power series
[TABLE]
Lemma A.1.1.
If and , then
[TABLE]
that is,
[TABLE]
Proof.
Starting from proposition A.0.1 with ,
[TABLE]
By lemma B.1.7 and the atanh addition formula B.1.8,
[TABLE]
By remark B.0.10,
[TABLE]
Consequently,
[TABLE]
as claimed. ∎
We use the following lemma to show that as well when is small.
Lemma A.1.2.
For ,
[TABLE]
and approaches 0 as . Further, for any ,
[TABLE]
Remark A.1.3.
By lemma A.1.1, we now also have the bound
[TABLE]
using twice.
Proof.
The limit for large is immediate. From the power series for , conclude for . We can also give the upper bound
[TABLE]
So,
[TABLE]
Noting that is strictly increasing for , we can fix the term at a particular constant. ∎
A.2 Estimating Deviations of the Mean
We derive the estimates used in the large scale concentration proofs given above. Both differences
[TABLE]
are controlled by lemma A.2.1 by requiring . Because
[TABLE]
both deviations will be sums of two terms, an term and a term.
Lemma A.2.1.
For and ,
[TABLE]
Proof.
We shall show that for , the difference in the terms is nonpositive. We then immediately have the upper bound
[TABLE]
On the other hand, because , the contribution also has the lower bound
[TABLE]
using a 2nd order Taylor series with Lagrange remainder in the last line, recalling here.
For the lower bound for , it remains to control how negative the contribution is. With
[TABLE]
we can use the atanh addition formula B.1.8,
[TABLE]
for , which is the case for us here. After some simplification, we recover
[TABLE]
which is negative for . Because atanh is an odd function, taking it of the above gives a negative contribution for such . Use the AM-GM inequality to upper bound
[TABLE]
then use the estimate
[TABLE]
as the remaining factor is seen to be decreasing for upon taking logarithms. Using , we finally have
[TABLE]
∎
A.3 2nd Moment
To estimate the 2nd moment , note that for any , the AM-GM inequality gives , so that
[TABLE]
It turns out this last expression also arises from a contour integral.
Lemma A.3.1.
If and , then
[TABLE]
with
[TABLE]
Proof.
The computations will be a bit more involved than those for the first moment. Starting from proposition A.0.1 with ,
[TABLE]
that is,
[TABLE]
By lemma A.3.5,
[TABLE]
For the residue terms, we use lemma A.3.6:
[TABLE]
with
[TABLE]
Recalling our computation of in lemma A.1.1, we can further simplify:
[TABLE]
Putting everything together we may conclude
[TABLE]
∎
Corollary A.3.2 (The Variance Is Bounded).
For and ,
[TABLE]
Proof.
Just note that for ,
[TABLE]
The constant follows from for all , while the bound follows from comparing derivatives, noting that both functions take 0 when . ∎
For quantitative estimates for the 2nd moment and the variance, we make the term explicit in the above bound.
Lemma A.3.3.
For and ,
[TABLE]
Proof.
From lemma B.0.1
[TABLE]
We use the reflection formula B.2.1 to expand the dilogarithm terms.
Recall from lemma B.2.1, for ,
[TABLE]
Consequently, using definition B.1.3 for ,
[TABLE]
By lemma B.0.9 (really the remark there) and the definition of arctan,
[TABLE]
Thus,
[TABLE]
∎
Corollary A.3.4.
For
[TABLE]
Proof.
By corollary A.3.2 and lemma A.3.3, we have
[TABLE]
because is an alternating series with terms of decreasing magnitude for and that for , is nonnegative. For , we can drop the term for an upper bound. Consequently, using from remark A.1.3,
[TABLE]
for , and
[TABLE]
for . ∎
Lemma A.3.5.
For ,
[TABLE]
Proof.
We are adding complex conjugates, so the left hand side is
[TABLE]
∎
Lemma A.3.6.
For ,
[TABLE]
with
[TABLE]
Proof.
Using lemma B.1.7,
[TABLE]
and similarly
[TABLE]
Adding yields several terms:
[TABLE]
From lemma A.3.5,
[TABLE]
We also have
[TABLE]
Let
[TABLE]
by the atanh addition formula B.1.8, as are conjugates of each other.
Let
[TABLE]
Then
[TABLE]
So we are left to understand . By lemma A.3.7, it is
[TABLE]
∎
Lemma A.3.7.
For ,
[TABLE]
Remark A.3.8.
For , we can rewrite the above as
[TABLE]
Proof.
We cannot directly use the atanh addition formula because there is a singularity when crosses 1. However, by definition of atanh B.1.6, we can convert as follows, using
[TABLE]
We now use the inversion formula B.3.1 for .
[TABLE]
The following identity holds
[TABLE]
because both analytic expressions are 0 at , and their derivatives match for . ∎
Appendix B Polylogarithms and Their Friends
The polylogarithms arise when we compute or estimate the first and second moments of the coordinate projections; they will help us give quantitative bounds which are needed in some of the proofs. References for polylogarithms are [11] and [14].
As initial motivation for studying such functions, we have the following lemma.
Lemma B.0.1.
Let and . Then for ,
[TABLE]
Proof.
We have
[TABLE]
Change variables and then to find
[TABLE]
Using partial fractions, we may write
[TABLE]
by definition B.0.7. The polylogarithms are defined because , and if , the value at is also defined. ∎
General references for complex analysis are [18] for proofs and [16] for intuition. If with , then and . If , denote for the complex conjugate. Further . Thus, if , we have
[TABLE]
Further, if ,
[TABLE]
For us, analytic functions are synonymous with holomorphic ones. We shall be using two theorems from complex analysis repeatedly. Cf. [18, page 52,96].
Theorem B.0.2 (Analytic Continuation).
Let and be analytic functions in a connected open subset of . If for all in a non-empty open subset of , then throughout .
Theorem B.0.3 (Primitives).
Let be an analytic function in a simply connected subset of . Then for , the function
[TABLE]
is analytic too, with any path from to lying in .
Definition B.0.4 (The Logarithm).
For , define (the principal branch of) the logarithm of , as
[TABLE]
for any path from 1 to in .
Remark B.0.5.
Note that . The map takes to itself; for if , with , then which also lives in . With this choice of principal branch, the logarithm still satisfies via
[TABLE]
Similarly, note that if , then with so and
[TABLE]
in this case. However, the general identity does not hold.
Definition B.0.6 (The Polylogarithm of Order 1).
Define the polylogarithm of order 1, as
[TABLE]
and
[TABLE]
For general , the domain makes sense, as for the in question. Recall when ,
[TABLE]
noting that both sides agree when , and upon differentiating,
[TABLE]
which means and the sum differ by a constant, namely 0.
The order of the polylogarithms may be extended; the general integral form below will be useful for some of the computations later.
Definition B.0.7.
For , define the polylogarithm of order as
[TABLE]
and
[TABLE]
for .
The nonintegral order polylogarithms also extend to the unit circle when the order is greater than 1.
Lemma B.0.8.
For and with ,
[TABLE]
Proof.
By definition,
[TABLE]
The series is finite because ; concretely, by the integral test (because is convex),
[TABLE]
∎
Lemma B.0.9.
For and ,
[TABLE]
If , the equality also holds when .
Remark B.0.10.
When , recover
[TABLE]
Proof.
First assume . From the power series,
[TABLE]
Both sides are analytic functions on , so by analytic continuation, the identity continues to hold there. If , the power series are also defined at . ∎
A useful property of the polylogarithms and the logarithm that we shall use repeatedly in computations is that they are all symmetric about the real axis, that is, or concretely
[TABLE]
Powers and polynomials of such functions also have this property. Intuitively this symmetry follows from the real coefficients in their power series expansions, so that when . Rigorously, we use the Schwarz reflection principle; because is analytic in when and real valued on , may be extended to the rest of in an analytic fashion. Analytic continuation then dictates that this extension coincides with the original definition of . See [18] pages 57-59 for the Schwarz reflection principle, page 56 for showing the integral definitions of are analytic, and page 52 for the principle of analytic continuation.
B.1 Arctan and the Inverse Tangent Integrals
The function is proportional to the distribution function of with . It is then perhaps not surprising that and its relatives arise in working with functions of Cauchy random variables. We outline the properties we shall be using here.
The following definition is opaque but most useful to us.
Definition B.1.1.
Define as
[TABLE]
and
[TABLE]
Equivalently,
[TABLE]
Remark B.1.2.
From the integral formulation, we also immediately have, with ,
[TABLE]
The last definition for follows from
[TABLE]
and that .
We can generalize.
Definition B.1.3.
For and , define the inverse tangent integral of order as
[TABLE]
and
[TABLE]
Remark B.1.4.
Note if , we find
[TABLE]
Hence,
[TABLE]
when and . The right hand side continues to make sense for , so we may define
[TABLE]
as an analytic function on that agrees with the power series on the interior of the unit circle.
Remark B.1.5.
In particular, we have .
To focus on the behavior of on which was not addressed in the inversion formula B.3.1, we change points of view through a rotation of the complex plane.
Definition B.1.6.
Define the function as
[TABLE]
and as
[TABLE]
or equivalently as
[TABLE]
To see that the definitions are consistent, note first from the power series, , while on the other hand,
[TABLE]
Lemma B.1.7.
Let then
[TABLE]
Proof.
Just split into even and odd degree terms.
[TABLE]
The equality extends to as both sides are analytic there. We now have
[TABLE]
as desired. ∎
Here is the addition formula.
Lemma B.1.8 (Atanh Addition Formula).
If ,
[TABLE]
If ,
[TABLE]
Proof.
Because is odd, the addition formula also covers subtraction. Check that
[TABLE]
So
[TABLE]
with a constant. Taking forces as desired.
For , let
[TABLE]
We want to know when also lies in the domain of atanh. When ,
[TABLE]
by the AM-GM inequality. The equality case occurs just if , but in that case, as is not allowed for . We are thus ok for all in this case.
When , we may consider
[TABLE]
and by symmetry, . So is increasing in each of the individual coordinates. In particular, when ,
[TABLE]
For each permutation of , , and [math], check that
[TABLE]
by the AM-GM inequality, with strict inequality because .
∎
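For real arguments strictly between -1 and 1, the classical two-term addition formula $\operatorname{atanh}(a) + \operatorname{atanh}(b) = \operatorname{atanh}\big((a+b)/(1+ab)\big)$ is easy to check numerically. The sketch below covers only this real case, which we assume is the first case of the lemma; the function name is hypothetical.

```python
import math


def atanh_of_combined(a, b):
    """atanh((a + b) / (1 + a * b)) for real a, b in (-1, 1)."""
    return math.atanh((a + b) / (1.0 + a * b))


pairs = [(0.3, 0.5), (-0.9, 0.4), (0.0, 0.7), (-0.2, -0.6)]
gaps = [abs(math.atanh(a) + math.atanh(b) - atanh_of_combined(a, b)) for a, b in pairs]
```

Every entry of `gaps` should be at the level of floating point rounding error.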
B.2 Dilogarithm Properties
The dilogarithm is the polylogarithm of order 2.
Lemma B.2.1 (Reflection Formula).
For ,
[TABLE]
Proof.
(Compare to [11, page 5].) Consider
[TABLE]
On the other hand,
[TABLE]
Because the domain is simply connected and the derivative above is analytic there, we have
[TABLE]
for some which we may take to lie on . Taking the limit as is safe, as the Taylor series for ensures , while the dilogarithm is continuous on . Hence,
[TABLE]
as desired. Note that proving the identity via integration by parts has to make this same limiting argument. ∎
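Assuming the lemma is the classical reflection formula $\mathrm{Li}_2(x) + \mathrm{Li}_2(1-x) = \pi^2/6 - \log(x)\log(1-x)$, it can be checked numerically for real $x$ in $(0,1)$ with a truncated power series. The function names and the truncation length are assumptions made for the example.

```python
import math


def dilog(x, terms=2000):
    """Partial sum of sum_{n >= 1} x**n / n**2, adequate for 0 <= x < 1."""
    return sum(x ** n / (n * n) for n in range(1, terms + 1))


def reflection_gap(x):
    """Absolute difference between the two sides of the reflection formula."""
    lhs = dilog(x) + dilog(1.0 - x)
    rhs = math.pi ** 2 / 6.0 - math.log(x) * math.log(1.0 - x)
    return abs(lhs - rhs)
```

For instance, `reflection_gap(0.3)` and `reflection_gap(0.5)` should both be negligibly small.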
B.3 Inversion Formulas
The following lemma allows us to describe the survival function of with in a convenient way. Note that the survival function for will only consider .
Lemma B.3.1.
For ,
[TABLE]
Remark B.3.2.
On the imaginary axis, and is only defined for so does not make sense there. Consequently the domain in question has two connected components, so different constants should not be unexpected.
Proof.
First note that the left hand side is a constant
[TABLE]
The constant is determined by representative points in the right and left hand planes respectively. ∎
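On the real line, an inversion formula of this type reads $\arctan(t) + \arctan(1/t) = \pi/2$ for $t > 0$ and $-\pi/2$ for $t < 0$, which is what lets the survival function of the absolute value of a standard Cauchy variable be written with $\arctan(1/t)$. The sketch below checks this real case only, on the assumption that it is the special case of the lemma relevant here; the function names are hypothetical.

```python
import math


def survival_abs_cauchy(t):
    """P(|C| > t) for a standard Cauchy variable C and t > 0."""
    return 1.0 - 2.0 / math.pi * math.atan(t)


def survival_via_inversion(t):
    """The same probability rewritten through arctan(1 / t)."""
    return 2.0 / math.pi * math.atan(1.0 / t)


checks = [abs(survival_abs_cauchy(t) - survival_via_inversion(t)) for t in (0.1, 1.0, 7.0)]
```

The two constants, one for each sign of `t`, mirror the two connected components mentioned in the remark.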
References
[1] Ailon, N. and Chazelle, B. (2009). The Fast Johnson-Lindenstrauss Transform and Approximate Nearest Neighbors. SIAM Journal on Computing 39, 302–322.
[2] Brinkman, B. and Charikar, M. (2005). On the Impossibility of Dimension Reduction in $L_1$. J. ACM 52, 766–788. doi:10.1145/1089023.1089026
[3] Chambers, J. M., Mallows, C. L. and Stuck, B. W. (1976). A Method for Simulating Stable Random Variables. Journal of the American Statistical Association 71, 340–344. doi:10.2307/2285309
[4] Corazza, P. (1999). Introduction to Metric-Preserving Functions. The American Mathematical Monthly 106, 309–323. doi:10.2307/2589554
[5] Drineas, P., Magdon-Ismail, M., Mahoney, M. W. and Woodruff, D. P. (2012). Fast Approximation of Matrix Coherence and Statistical Leverage. Journal of Machine Learning Research 13, 32. arXiv:1109.3843.
[6] Drineas, P. and Mahoney, M. W. (2016). RandNLA: Randomized Numerical Linear Algebra. Communications of the ACM 59, 80–90. doi:10.1145/2842602
[7] Indyk, P. (2006). Stable Distributions, Pseudorandom Generators, Embeddings, and Data Stream Computation. J. ACM 53, 307–323. doi:10.1145/1147954.1147955
[8] Johnson, W. B. and Lindenstrauss, J. (1984). Extensions of Lipschitz Mappings into a Hilbert Space. Contemporary Mathematics 26, 189–206.
