Moments of the distance between independent random vectors
Assaf Naor, Krzysztof Oleszkiewicz

Abstract
We derive various sharp bounds on moments of the distance between two independent random vectors taking values in a Banach space.
A.N. was supported by the Packard Foundation and the Simons Foundation. The research that is presented here was conducted under the auspices of the Simons Algorithms and Geometry (A&G) Think Tank. K.O. was partially supported by the National Science Centre, Poland, project number 2012/05/B/ST1/00412.
1. Introduction
Throughout what follows, all Banach spaces are tacitly assumed to be separable. This assumption removes the need to discuss measurability side-issues; alternatively one could consider throughout only the special case of finitely-supported random variables, which captures all of the key ideas. We will also tacitly assume that all Banach spaces are over the complex scalars. This assumption is convenient for the ensuing proofs, but the main statements (namely, those that do not mention complex scalars explicitly) hold over the real scalars as well, through a standard complexification procedure. All the notation and terminology from Banach space theory that occurs below is basic and standard, as in e.g. [15].
Our starting point is the following question. What is the smallest such that for every Banach space and every two independent -valued integrable random vectors we have
[TABLE]
We will reason that (1) holds with , and that is the sharp constant here. More generally, we have the following theorem.
Theorem 1.1.
Suppose that and is a Banach space. Let be two independent -valued -integrable random vectors. Then
[TABLE]
The constant in (2) cannot be improved.
The Banach space that exhibits this sharpness of (2) is, of course, a subspace of , but we do not know what is the optimal constant in (2) when itself. More generally, understanding the meaning of the optimal constant in (2) for specific Banach spaces is an interesting question, which we investigate in the rest of the present work for certain special classes of Banach spaces but do not fully resolve.
1.1. Geometric motivation
Our interest in (1) arose from investigations of [1] in the context of Riemannian/Alexandrov geometry. It is well established throughout an extensive geometric literature that a range of useful quadratic distance inequalities for a metric space arise if one imposes bounds on its curvature in the sense of Alexandrov. The term “quadratic” here indicates that these inequalities involve squares of distances between finite point configurations in . A phenomenon that was established in [1] is that any such quadratic metric inequality that holds for every Alexandrov space of nonnegative curvature becomes valid in any metric space whatsoever if one removes the squaring of the distances, i.e., in essence upon “linearization” of the inequality; see [1] for a precise formulation. This led naturally to the question whether the same phenomenon holds for Hadamard spaces (complete simply connected spaces whose Alexandrov curvature is nonpositive); see [1] for an extensive discussion as well as the recent negative resolution of this question in [11]. In the context of a Hadamard space , the analogue of (1) is that independent finitely-supported -valued random variables satisfy
[TABLE]
See [1] for a standard derivation of (3), where is an appropriate “geometric barycenter,” namely it is obtained as the minimizer of the expected squared distance from to . As explained in [1], by using (3) iteratively one can obtain quadratic metric inequalities that hold in any Hadamard space and serve as obstructions for certain geometric embeddings. The “linearized” version of (3), in the case of Banach spaces and allowing for a loss of a factor , is precisely (1). So, in the spirit of [1] it is natural to ask what is the smallest for which it holds. This is what we address here, leading to analytic questions about Banach spaces that are interesting in their own right from the probabilistic and geometric perspective. We note that there are questions along these lines that [1] raises and remain open; see e.g. [1, Question 32].
1.2. Probabilistic discussion
The inequality which reverses (1) holds trivially as a consequence of the triangle inequality, even when and are not necessarily independent. Namely, any satisfy
[TABLE]
So, the above discussion is about the extent to which this use of the triangle inequality can be reversed.
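The triangle-inequality bound discussed above can be checked numerically on finitely supported laws. The following is a minimal sketch, assuming the inequality in question has the form E d(X,Y) ≤ E d(X,z) + E d(Y,z) for an arbitrary point z; the Euclidean plane, the sample points and the helper names (`dist`, `expected_pair_distance`, `expected_distance_to`) are illustrative choices that do not come from the text. Independence is used only to make the expectation of d(X,Y) easy to enumerate; the bound itself is point-wise.

```python
import math

def dist(u, v):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def expected_pair_distance(xs, ys):
    """E d(X, Y) for independent X, Y uniform on the finite lists xs, ys."""
    return sum(dist(x, y) for x in xs for y in ys) / (len(xs) * len(ys))

def expected_distance_to(points, z):
    """E d(X, z) for X uniform on the finite list `points`."""
    return sum(dist(x, z) for x in points) / len(points)

# Hypothetical supports and base point, chosen only for illustration.
xs = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
ys = [(1.0, 1.0), (-1.0, 4.0)]
z = (0.5, 0.5)

lhs = expected_pair_distance(xs, ys)
rhs = expected_distance_to(xs, z) + expected_distance_to(ys, z)
print(lhs, rhs, lhs <= rhs)
```

Since d(x,y) ≤ d(x,z) + d(y,z) holds for every pair of points, the comparison holds for any choice of supports and of z, so the check cannot fail.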
Since the upper bound that we seek is in terms of the distance in between independent copies of and , this can be further used to control from above expressions such as for and not necessarily independent in terms of , where and are independent, has the same distribution as , and has the same distribution as .
In order to analyse the inequality (2) in a specific Banach space , we consider the following geometric moduli. Given let , or simply if the norm is clear from the context, be the infimum over those such that every independent -valued random variables satisfy
[TABLE]
Thus, is precisely the best possible constant in the -analogue of the aforementioned barycentric inequality (3). The use of the letter “” in this notation is in reference to the word “barycentric.” Theorem 1.1 asserts that , and that this bound cannot be improved in general.
Let , or simply if the norm is clear from the context, be the infimum over those such that every independent -valued random variables satisfy
[TABLE]
The use of the letter “” in this notation is in reference to the word “mixture,” since the left-hand side of (5) is equal to , where distributed according to the mixture of the laws of and , namely is the -valued random vector such that for every Borel set ,
[TABLE]
Obviously , because (5) corresponds to choosing in (4).
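For finitely supported laws the mixture construction is easy to make concrete. The sketch below assumes that the mixture in (6) gives weight 1/2 to each of the two laws; the dictionary representation and the names `mixture` and `mean` are hypothetical and used only for illustration. Under that assumption the mean of the mixture is the midpoint of the two means.

```python
from fractions import Fraction

def mixture(law_x, law_y):
    """Equal-weight mixture of two finitely supported laws given as {atom: prob}."""
    half = Fraction(1, 2)  # assumption: each law receives weight 1/2
    atoms = set(law_x) | set(law_y)
    return {a: half * law_x.get(a, 0) + half * law_y.get(a, 0) for a in atoms}

def mean(law):
    """Expectation of a real-valued finitely supported law."""
    return sum(a * p for a, p in law.items())

# Hypothetical real-valued laws, chosen only for illustration.
law_x = {0: Fraction(1, 2), 2: Fraction(1, 2)}
law_y = {2: Fraction(1, 4), 6: Fraction(3, 4)}
law_z = mixture(law_x, law_y)

print(law_z)
print(mean(law_z), (mean(law_x) + mean(law_y)) / 2)
```

Exact rational arithmetic is used so that the probabilities and the two means can be compared without rounding.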
While we sometimes bound directly, it is beneficial to refine the considerations through the study of two further moduli that are natural in their own right and, as we shall see later, their use can lead to better bounds. Firstly, let , or simply if the norm is clear from the context, be the infimum over those such that every independent -valued random variables satisfy
[TABLE]
where are independent copies of and , respectively. The use of the letter “” in this notation is in reference to the word “roundness,” as we shall next explain.
Observe also that (7) is a purely metric condition, i.e., it involves only distances between points. So, it makes sense to investigate (7) in any metric space , namely to study the inequality
[TABLE]
One requires (8) to hold for -valued independent random variables (say, finitely-supported, to avoid measurability assumptions) such that each of the pairs and is identically distributed.
To the best of our knowledge, condition (8) was first studied systematically by Enflo [10], who defined a metric space to have generalized roundness if it satisfies (8) with . He proved that has generalized roundness for , and ingeniously used this notion to answer an old question of Smirnov. See [9] for a relatively recent example of substantial impact of Enflo’s approach. By combining [14] with [19], a metric space has generalized roundness if and only if embeds isometrically into a Hilbert space. The case of (8) arose in [2] in the context of metric embeddings.
The final geometric modulus that we consider here is a quantity , or simply if the norm is clear from the context, that is defined to be the infimum over those such that every independent and identically distributed -valued random variables satisfy
[TABLE]
Note that (9) holds with by Jensen’s inequality, so we are asking here for an improvement of (this use of) Jensen’s inequality by a definite factor; the letter “” in this notation is in reference to “Jensen.”
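The Jensen step can be illustrated on real-valued, finitely supported random variables. The following sketch assumes the relevant comparison is E|X − EX|^p ≤ E|X − Y|^p for i.i.d. X, Y and p ≥ 1, which follows from Jensen's inequality applied to the average over Y; the function name `jensen_sides` and the sample supports are hypothetical.

```python
def jensen_sides(points, p):
    """Return (E|X - EX|^p, E|X - Y|^p) for X, Y i.i.d. uniform on real `points`."""
    n = len(points)
    mean = sum(points) / n
    lhs = sum(abs(x - mean) ** p for x in points) / n
    # Y is an independent copy of X, so all n*n ordered pairs are equally likely.
    rhs = sum(abs(x - y) ** p for x in points for y in points) / (n * n)
    return lhs, rhs

print(jensen_sides([-1.0, 1.0], 2))  # (1.0, 2.0)
print(jensen_sides([-1.0, 1.0], 1))  # (1.0, 1.0)
print(jensen_sides([0.0, 1.0, 5.0, 7.0], 1))
```

For the symmetric two-point law, the two sides coincide when p = 1 and differ by a factor of 2 when p = 2, which illustrates how the room for improving on Jensen's inequality depends on p.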
We have the following general bounds, which hold for every Banach space and every .
[TABLE]
Indeed, we already observed the first inequality in (10), and the second inequality in (10) is justified by taking independent random variables , considering their mixture as defined in (6), letting be independent copies of , respectively, and proceeding as follows.
[TABLE]
Recalling the definition (5) of , this implies (10).
Here we prove the following bounds on for .
Theorem 1.2.
For every we have , where
[TABLE]
We also have , where
[TABLE]
In fact, if , then , if , then , and if . Namely, the above bound on is sharp in the first, second and fifth ranges in (12).
Furthermore, if . More generally, we have the bound
[TABLE]
The upper bound on in (13) improves over (2) when for all values of . It would be interesting to find the exact value of in the entire range . Note that the second quantity in the minimum in the right hand side of (13) corresponds to using (10) together with the bounds on and that Theorem 1.2 provides; when, say, , this quantity is smaller than the first quantity in the minimum in the right hand side of (13) if and only if .
Theorem 1.2 states that the constant is sharp in the first, second and fifth ranges in (12). The following conjecture formulates what we expect to be the sharp values of for all .
Conjecture 1.3.
For all we have , where
[TABLE]
We will prove later that , so Conjecture 1.3 is about improving our upper bounds on in the remaining third and fourth ranges that appear in (12).
Question 1.4.
Below we will obtain improvements over (2) for other spaces besides , including e.g. the Schatten–von Neumann trace classes (see e.g. [20]) . However, parts of Theorem 1.2 rely on “commutative” properties of which are not valid for , thus leading to even better bounds in the commutative setting. It would be especially interesting to obtain sharp bounds in noncommutative probabilistic inequalities such as the roundness inequality (7) when . In particular, we ask what is the value of ? At present, we know (as was already shown by Enflo [10]) that while the only bound that we have for is . Note that is a trivial upper bound here, which holds for every Banach space. Interestingly, it follows from [7] that , as explained in Remark 3.1 below. So, there is a genuine difference between the commutative and noncommutative settings of and , respectively. As a more modest question, is strictly less than ?
1.3. Complex interpolation
We will use basic terminology, notation and results of complex interpolation of Banach spaces; the relevant background appears in [8, 4]. Theorem 1.2 is a special case of the following more general result about interpolation spaces. As such, it applies also to random variables that take values in certain spaces other than , including, for example, Schatten–von Neumann trace classes (see e.g. [20]) and, by an extrapolation theorem of Pisier [18], Banach lattices of nontrivial type.
Theorem 1.5.
Fix and . Let be a compatible pair of Banach spaces such that is a Hilbert space. Then the following estimates hold true.
[TABLE]
Additionally, we have
[TABLE]
(Note that if the first range of values of in the right hand side of (16) is nonempty, then necessarily .)
The deduction of Theorem 1.2 from Theorem 1.5 appears in Section 3 below; in most cases this deduction is nothing more than a direct substitution into Theorem 1.5, but in some cases a further argument is needed. Theorem 1.5 itself is a special case of the following theorem.
Theorem 1.6.
Fix and that satisfy . Let be a compatible pair of Banach spaces such that is a Hilbert space. Suppose that and are probability spaces. Then, for every we have
[TABLE]
and
[TABLE]
Furthermore, if , then
[TABLE]
Proof of Theorem 1.5 assuming Theorem 1.6.
Let and be independent -integrable -valued random vectors. Due to the independence assumption, without loss of generality there are probability spaces and such that and are elements of that depend only on the first variable and second variable, respectively. Then (17) and (18) applied to become
[TABLE]
and
[TABLE]
We therefore established the first inequality in (15) as well as the upper bound on that corresponds to the first term in the minimum that appears in (16).
Similarly, due to the fact that and are i.i.d., without loss of generality there is a probability space such that and are elements of that depend only on the first variable and second variable, respectively. Then, (19) applied to simplifies to give
[TABLE]
This establishes the second inequality in (15), as well as the upper bound on that corresponds to the second term in the minimum that appears in (16), due to (10). ∎
The first and third inequalities of Theorem 1.6 are generalizations of results that appeared in the literature. Specifically, (17) generalizes Lemma 6 of [2], and (19) generalizes Lemma 5 of [17], which is itself inspired by a step within the proof of Theorem 2 of [21]. The proof of Theorem 1.6, which appears in Section 3 below, differs from the proofs of [21, 17, 2], but relies on the same ideas.
2. Proof of Theorem 1.1
Let be a Banach space. Fix . Theorem 1.1 asserts that . In fact, , which is stronger by (10). To see this, let be independent random vectors and observe that
[TABLE]
where the penultimate step holds due to the convexity of and the final step holds because, by Jensen’s inequality, both and are at most . The symmetric reasoning with replaced by now gives
[TABLE]
This shows that . It remains to prove that the bound is optimal for general .
Fix an integer and consider
[TABLE]
equipped with supremum norm inherited from . We will prove that
[TABLE]
Denote by the standard coordinate basis of . Define two -element sets by
[TABLE]
and
[TABLE]
Note that and are indeed subsets of because . Let be independent and uniformly distributed on , respectively. One checks that for any and . So, . The desired bound (20) will follow if we demonstrate that
[TABLE]
The proof of (21) proceeds via symmetrization. For permutations , define by
[TABLE]
is a linear isometry of and the sets and are -invariant. Hence, for any ,
[TABLE]
Denoting , it follows from (22) that , because one of the first coordinates of any member of the support of equals . The same argument with replaced by gives that , because now one of the last coordinates of any member of the support of equals . We conclude with the following application of the convexity of .
[TABLE]
Remark 2.1.
It is worthwhile to examine what the above argument gives if we take the norm on to be the norm inherited from . One computes that for every and . So,
[TABLE]
Also, it follows from the same reasoning that led to (22) that for every ,
[TABLE]
and
[TABLE]
Hence, using the convexity of the ’th power of the norm on , we see that
[TABLE]
By contrasting (23) with (24) we conclude that
[TABLE]
In particular, if we take and , then we conclude that for some universal constant . So, there is very little potential asymptotic gain (as ) if we know that the Banach space of Theorem 1.1 admits an isometric embedding into .
Above, and in what follows, we stated that a normed space admits an isometric embedding into without specifying whether the embedding is linear or not. Later we will need such embeddings to be linear, so we recall that for any , by a classical differentiation argument (see [3, Chapter 7] for a thorough treatment of such reductions to the linear setting), a normed space embeds isometrically into as a metric space if and only if it admits a linear isometric embedding into .
Note that the phenomenon of Remark 2.1 is special to random variables that have different expectations. Namely, if , then by Jensen’s inequality the ratio that defines is at most rather than the aforementioned exponential growth as . The following proposition shows that if is a subspace of for , then when this ratio is at most , which is easily seen to be best possible (consider any nontrivial symmetric random variable , and take to be identically [math]).
Proposition 2.2.
Let be a Banach space that admits an isometric embedding into for some . Then, for any pair of independent -valued random vectors with ,
[TABLE]
Proof.
over embeds isometrically into over (indeed, complex is, as a real Banach space, the same as , so this follows from the fact that Hilbert space is isometric to a subspace of ). So, in Proposition 2.2 we may assume that embeds isometrically into over , and therefore by integration/Fubini it suffices to prove (25) for real-valued random variables. So, our goal is to show that if are independent mean-zero real random variables with , , then
[TABLE]
The bound (25) would then follow by applying (26) to the mean-zero variables and .
Note in passing that the assumption is crucial here, i.e. (26) fails if . Indeed, if and and , then but
[TABLE]
If , then the right hand side of (27) equals for . If , then the right hand side of (27) equals , which is less than for small since for .
To prove (26), for every and , denote Observe that
[TABLE]
Once (28) is proved, (26) would follow because
[TABLE]
where the penultimate step uses the independence of and the last step uses .
It suffices to prove (28) when ; the case follows by passing to the limit. One checks that
[TABLE]
where the last step holds because is increasing. Hence, is decreasing for and increasing for . One checks that for all , so . Thus is convex for every fixed . But for any , i.e. the tangent to the graph of at is the -axis. Convexity implies that the graph of lies above the -axis, as required. ∎
We end this section with the following simpler metric space counterpart of Theorem 1.1.
Proposition 2.3.
Fix and let and be independent finitely supported random variables taking values in a metric space . Then
[TABLE]
The constant in (29) is optimal.
Proof.
Let have the same distribution as and be independent of and . The point-wise inequality
[TABLE]
is a consequence of the triangle inequality and the convexity of . By taking expectations, we obtain , so that
[TABLE]
To see that the constant is optimal, fix and let be the complete bipartite graph , equipped with its shortest-path metric. Equivalently, can be partitioned into two -point subsets , and for distinct we have if or , while otherwise. Let be uniformly distributed over and be uniformly distributed over . Then point-wise. If , then point-wise, while and . Consequently,
[TABLE]
By symmetry, the same holds if . ∎
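The complete bipartite example can be evaluated exactly by enumeration. The sketch below uses only the shortest-path metric of the complete bipartite graph with n vertices on each side, in which vertices on opposite sides are at distance 1 and distinct vertices on the same side are at distance 2. It assumes that the quantity of interest is the minimum over base points z of E d(X,z)^p + E d(Y,z)^p, divided by E d(X,Y)^p; the names `bipartite_distance` and `barycentric_ratio` and the vertex encoding are hypothetical.

```python
def bipartite_distance(u, v):
    """Shortest-path distance in K_{n,n}; a vertex is a pair (side, index)."""
    if u == v:
        return 0
    return 1 if u[0] != v[0] else 2

def barycentric_ratio(n, p):
    """min_z [E d(X,z)^p + E d(Y,z)^p] / E d(X,Y)^p, X uniform on L, Y uniform on R."""
    left = [("L", i) for i in range(n)]
    right = [("R", i) for i in range(n)]

    def moment(points, z):
        return sum(bipartite_distance(x, z) ** p for x in points) / len(points)

    denom = sum(bipartite_distance(x, y) ** p for x in left for y in right) / (n * n)
    numer = min(moment(left, z) + moment(right, z) for z in left + right)
    return numer / denom

for n in (2, 4, 100):
    print(n, barycentric_ratio(n, 2))
```

For every base point the enumeration gives (1 − 1/n)·2^p + 1, so the ratio increases to 2^p + 1 as n grows; this limit should be compared with the constant in the statement of the proposition.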
3. Proof of Theorem 1.6 and its consequences
Here we prove Theorem 1.6 and deduce Theorem 1.2.
Proof of Theorem 1.6.
The assumption implies that for some (unique) . We will fix this value of for the rest of the proof of Theorem 1.6. All of the desired bounds (17), (18), (19) hold true when , namely for every Banach space and every we have
[TABLE]
and
[TABLE]
Furthermore, if , then
[TABLE]
Indeed, (30), (31), (32) are direct consequences of the triangle inequality in and and Jensen’s inequality, with the appropriate interpretation when .
By complex interpolation theory (specifically, by combining [4, Theorem 4.1.2] and [4, Theorem 5.1.2]), Theorem 1.6 will follow if we prove the case of (17), (18), (19). To this end, as is a Hilbert space and the inequalities in question are quadratic, it suffices to prove them coordinate-wise (with respect to any orthonormal basis of ), i.e., it suffices to show that for every (-valued) and ,
[TABLE]
and
[TABLE]
and
[TABLE]
The following derivation of the quadratic scalar inequalities (33), (34), (35) is an exercise in linear algebra.
Let and be any orthonormal bases of and , respectively, for which and . Then and are orthonormal bases of and , respectively, where for and one defines (as usual) by setting for . We therefore have the following expansions, in the sense of convergence in and , respectively.
[TABLE]
In particular, by Parseval we have
[TABLE]
Define by
[TABLE]
So, -almost surely $R_{\mathcal{X}}f(x,\chi)=\int_{\mathcal{Y}}\big(f(x,y)-f(\chi,y)\big)\,\mathrm{d}\nu(y)$. Also, define by
[TABLE]
So, -almost surely $R_{\mathcal{Y}}f(y,\upupsilon)=\int_{\mathcal{X}}\big(f(x,y)-f(x,\upupsilon)\big)\,\mathrm{d}\mu(x)$. By Parseval in ,
[TABLE]
This is precisely (33).
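On finite probability spaces the two averaging operators used above reduce to weighted sums, which makes their structure easy to inspect. The following is a minimal sketch, assuming that the first space carries weights `mu`, the second carries weights `nu`, and a function f of two variables is stored as a matrix indexed `[x][y]`; the names `r_x` and `r_y` and the sample data are hypothetical. It implements only the integral formulas for the two operators and makes no claim about the constants in (33).

```python
from fractions import Fraction

def r_x(f, nu):
    """R_X f(x, chi) = sum_y (f[x][y] - f[chi][y]) * nu[y]."""
    n = len(f)
    return [[sum((f[x][y] - f[chi][y]) * nu[y] for y in range(len(nu)))
             for chi in range(n)] for x in range(n)]

def r_y(f, mu):
    """R_Y f(y, ups) = sum_x (f[x][y] - f[x][ups]) * mu[x]."""
    m = len(f[0])
    return [[sum((f[x][y] - f[x][ups]) * mu[x] for x in range(len(mu)))
             for ups in range(m)] for y in range(m)]

# Hypothetical data: a 2-point first space and a 3-point second space.
f = [[1, 2, 0],
     [3, -1, 4]]
mu = [Fraction(1, 2), Fraction(1, 2)]
nu = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]

print(r_x(f, nu))
print(r_y(f, mu))
```

Both outputs are antisymmetric with zero diagonal, since each operator compares the average of f at one point with its average at another.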
Next, for every define and by
[TABLE]
and
[TABLE]
In other words, we have the following identities -almost surely and -almost surely, respectively.
[TABLE]
and
[TABLE]
By Parseval in ,
[TABLE]
The case of this inequality is precisely (34). It is worthwhile to note in passing that this reasoning (substituted into the above interpolation argument) yields the following generalization of (18).
[TABLE]
For the justification of the remaining inequality (35), define by
[TABLE]
In other words, -almost surely $Tg(\chi)=\int_{\mathcal{X}}\big(g(x,\chi)-g(\chi,x)\big)\,\mathrm{d}\mu(x)$. By Parseval in ,
[TABLE]
where in the penultimate step we used the convexity of . This is precisely (35). ∎
We will next deduce Theorem 1.2 from the special case of Theorem 1.6 that we stated as Theorem 1.5.
Proof of Theorem 1.2.
The largest for which and also for some is
[TABLE]
We then have . Note that the quantity that is defined in (11) is equal to .
By (15) with and we have . The matching upper bound holds due to the following quick examples. If is uniformly distributed on , then and . So, . If and and , then for ,
[TABLE]
If and is uniformly distributed over , where is the standard basis of , then
[TABLE]
If are i.i.d. symmetric Bernoulli random variables viewed as elements of , e.g. they can be the coordinate functions in , then let be uniformly distributed over . Then,
[TABLE]
This completes the proof that .
Next, an application of (15) with and gives . In other words,
[TABLE]
for every -integrable independent -valued random variables such that and are identically distributed. The bound (38) coincides with (15), where is as in (12), only in the first two ranges that appear in (12), namely when or when . For the remaining ranges that appear in (12), the bound (38) is inferior to (15), so we reason as follows.
For every satisfying , by [16, Remark 5.10] (the case is an older result [6]) there exists an embedding (given by an explicit formula) such that
[TABLE]
Apply (38) to the -valued random vectors with replaced by and replaced with . The resulting estimate is
[TABLE]
It is in our interest to choose so as to minimize the right hand side of (40). If , then is the optimal choice in (40), and therefore we return to (38). But, if , then is the optimal choice in (40) and we arrive at the following estimate which is better than (38) in the stated range
[TABLE]
The bound (41) covers the third and fourth ranges that appear in (12), as well as the case of the fifth range that appears in (12). However, (41) is inferior to (12) when . When this occurs, use the fact [12] that is isometric to a subspace of and apply the already established case to the -valued random variables , where is any isometric embedding.
We will next prove that , where is given in (14). In particular, this will justify the second sharpness assertion of Theorem 1.2, namely that (38) is sharp when belong to the first, second or fifth ranges that appear in (12). Firstly, by considering the special case of (7) in which are i.i.d., we see that for any Banach space . Next, fix and let be such that and each form a sequence of i.i.d. symmetric Bernoulli random variables, and the supports of are disjoint from the supports of . For example, one could consider them as the elements of that are given by and for each . Let be uniformly distributed over and be uniformly distributed over . Due to the disjointness of the supports, we have point-wise. At the same time, . By letting , this shows that necessarily . Finally, if (7) holds, then in particular it holds for scalar-valued random variables. By integrating, we see that for any Banach space . But, the case of the above discussion gives , as required.
The bound (13) of Theorem 1.2 coincides with (16). When , we have , and thus . It therefore remains to check that when . In fact, for every and every Banach space . Indeed, fix distinct . Let be independent and uniformly distributed over . Then
[TABLE]
where the penultimate step is an application of the convexity of . ∎
Remark 3.1.
Fix . Following [7], for denote by and the vectors of real parts and imaginary parts of the entries of , respectively. Let be the area of the parallelogram that is generated by and , i.e.,
[TABLE]
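The parallelogram area can be computed from the Gram determinant of the real and imaginary parts. The sketch below assumes this standard formula, namely the square root of |a|²|b|² − ⟨a,b⟩² for real vectors a and b, agrees with the display above; the function name `parallelogram_area` is hypothetical.

```python
import math

def parallelogram_area(z):
    """Area of the parallelogram spanned by Re(z) and Im(z) for a complex vector z."""
    re = [c.real for c in z]
    im = [c.imag for c in z]
    rr = sum(a * a for a in re)               # |Re z|^2
    ii = sum(b * b for b in im)               # |Im z|^2
    ri = sum(a * b for a, b in zip(re, im))   # <Re z, Im z>
    # The Gram determinant is nonnegative; clamp to guard against rounding.
    return math.sqrt(max(rr * ii - ri * ri, 0.0))

print(parallelogram_area([1 + 0j, 1j]))      # orthogonal unit vectors
print(parallelogram_area([1 + 1j, 0j]))      # parallel real and imaginary parts
print(parallelogram_area([2 + 0j, 3j]))
```

The area vanishes exactly when the real and imaginary parts are linearly dependent, as in the second example.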
By [7, Lemma 5.2] there is a linear operator from to the space of by complex matrices, such that for any the Schatten-1 norm of the matrix satisfies
[TABLE]
Let be the standard basis of and define matrices by and for . By (42) we have for distinct , while for all . Hence, if we let and be independent and distributed uniformly over and , respectively, and are independent copies of , respectively, then for every we have
[TABLE]
By letting , this implies that . In particular, .
Remark 3.2.
Fix . Let be a Banach space. Assume that has a linear subspace that is isometric to (or the Schatten–von Neumann trace class ). If are i.i.d. random variables taking values in , then for as in (11), by Theorem 1.2 we have
[TABLE]
We note that this inequality is optimal despite the fact that the infimum is now taken over in the larger super-space . Indeed, in the proof of Theorem 1.2 the random variables that established optimality of were symmetric when belong to the first three ranges that appear in (11). In these cases, by the convexity of , the infimum in the right hand side of (43) is attained at . The fact that the term in the right hand side of (43) cannot be replaced by any value greater than needs the following separate treatment. If and for some with , then . Next, for any we have
[TABLE]
where the final step follows by elementary calculus. Therefore,
[TABLE]
Remark 3.3.
An extrapolation theorem of Pisier [18] asserts that if is a Banach lattice that is both -convex with constant and -concave with constant , where , then there exists a Banach lattice , a Hilbert space , and such that is isometric to the complex interpolation space . Hence, Theorem 1.5 applies in this setting, implying in particular that there is , namely , such that every i.i.d. -valued random variables satisfy
[TABLE]
We will conclude by discussing further bounds in the non-convex range , as well as their limit when . When , the topological vector space is not a normed space. Despite this, when we say that a normed space admits a linear isometric embedding into we mean (as usual) that there exists a linear mapping such that for all . This of course forces the quasi-norm to induce a metric on the image of , so the use of the term “isometric” is not out of place here, though note that it is inconsistent with the standard metric on , which is given by for all . The following proposition treats the case , though later we will mainly be interested in the non-convex range . Note that the case implies the stated inequalities for, say, any two-dimensional normed space, since any such space admits [5] an isometric embedding into .
Proposition 3.4.
Let be a Banach space that admits an isometric linear embedding into for some . Let be independent -valued random vectors such that has the same distribution as and has the same distribution as . Then,
[TABLE]
and
[TABLE]
The constants and in (44) and (45), respectively, cannot be improved.
Proof.
By [19, 6] there is a mapping such that for all . By the (trivial) Hilbertian case of Theorem 1.2 applied to the -valued random vectors ,
[TABLE]
This substantiates (44). When we cannot proceed from here to prove (45) by considering the analogue of the mixture constant , namely by bounding the left hand side of (5) as we did in the Introduction, since the present integrability assumption on does not imply that and are well-defined elements of . Instead, let be independent of and distributed according to the mixture of the laws of and , as in (6). The point will be chosen randomly according to , i.e.,
[TABLE]
For we have by (44), and by Theorem 1.2, so , by (10).
The sharpness of (44) is seen by taking and to be identically distributed. When , we already saw in the proof of Theorem 1.2 that for any Banach space ; thus (45) is sharp in this range. The same reasoning as in the proof of Theorem 1.2 shows that the factor in (45) cannot be improved in the non-convex range as well. Indeed, fix with and let and be uniformly distributed over . Then, for every , while . ∎
Proposition 3.5 below is the limit of Proposition 3.4 as . While it is possible to deduce it formally from Proposition 3.4 by passing to the limit, a justification of this fact is quite complicated due to the singularity of the logarithm at zero. We will instead proceed via a shorter alternative approach.
Following [13], a real Banach space is said to admit a linear isometric embedding into if there exists a probability space and a linear operator , where denotes the space of (equivalence classes of) real-valued -measurable functions on , such that
[TABLE]
As shown in [13], every three-dimensional real normed space admits a linear isometric embedding into , so in particular the following proposition applies to any such space.
Proposition 3.5.
Let be a real Banach space that admits a linear isometric embedding into . Let be independent -valued random vectors such that has the same distribution as and has the same distribution as . Assume that and . Then,
[TABLE]
and
[TABLE]
The multiplicative constant in both of these inequalities is optimal.
Proof.
(49) is a consequence of (48) by reasoning analogously to (46). Due to the assumed representation (47), by Fubini’s theorem it suffices to prove (48) for real-valued random variables.
So, suppose that are independent real-valued random variables such that and . Note that every nonnegative random variable with satisfies
[TABLE]
Indeed, for every with we have
[TABLE]
so that (50) follows by applying this identity and the Fubini theorem separately on each of the events and , taking advantage of the fact that is of constant sign on both events.
Let be independent random variables whose law is the mixture of the laws of as in (6). The desired inequality (48) is equivalent to the assertion that . By two applications of (50), once with and once with , it suffices to prove that
[TABLE]
This is so because, using the formula for the Fourier transform of the Gaussian density, we have
[TABLE]
where (51) uses Fubini and the independence of and , (52) uses the fact that for all we have , the first step of (53) uses the independence of and , and the last step of (53) uses once more the formula for the Fourier transform of the Gaussian density.
The fact that (48) is sharp follows by considering the case when are i.i.d. and non-atomic. Note that when both and have an atom at the same point, both sides of (49) equal [math]. The example considered in the proof of Proposition 3.4 when is therefore of no use for establishing the optimality of (49), due to the atomic nature of the distributions under consideration. Instead, for an arbitrary such that , let us consider random vectors and , where and are independent random variables uniformly distributed on .
Observe that for every we have
[TABLE]
where the last step of (54) holds because, by periodicity, has the same distribution as .
The case of (54) simplifies to give . Hence, (54) becomes
[TABLE]
Consequently,
[TABLE]
Indeed, if , then one can write for some , so that by (55) the inequality in (56) holds as equality. If , then for all , thus implying (56). It also follows from (55) that
[TABLE]
Next, by the Hahn–Banach theorem, take such that and . For any ,
[TABLE]
This implies the asserted sharpness of (49). Note that the above argument that (49) cannot hold with a multiplicative constant less than in the right hand side worked for any Banach space whatsoever. ∎
Acknowledgements
We are grateful to Oded Regev for pointing us to [7, Lemma 5.2] and for significantly simplifying our initial reasoning for the statement that is proved in Remark 3.1.
- [1] A. Andoni, A. Naor, and O. Neiman. Snowflake universality of Wasserstein spaces. Ann. Sci. Éc. Norm. Supér. (4), 51(3):657–700, 2018.
- [2] Y. Bartal, N. Linial, M. Mendel, and A. Naor. Some low distortion metric Ramsey problems. Discrete Comput. Geom., 33(1):27–41, 2005.
- [3] Y. Benyamini and J. Lindenstrauss. Geometric nonlinear functional analysis. Vol. 1, volume 48 of American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, RI, 2000.
- [4] J. Bergh and J. Löfström. Interpolation spaces. An introduction. Springer-Verlag, Berlin-New York, 1976. Grundlehren der Mathematischen Wissenschaften, No. 223.
- [5] E. D. Bolker. A class of convex bodies. Trans. Amer. Math. Soc., 145:323–345, 1969.
- [6] J. Bretagnolle, D. Dacunha-Castelle, and J.-L. Krivine. Lois stables et espaces $L^{p}$. Ann. Inst. H. Poincaré Sect. B (N.S.), 2:231–259, 1965/1966.
- [7] J. Briët, O. Regev, and R. Saket. Tight hardness of the non-commutative Grothendieck problem. Theory Comput., 13:Paper No. 15, 24, 2017.
- [8] A.-P. Calderón. Intermediate spaces and interpolation, the complex method. Studia Math., 24:113–190, 1964.
