Moments of the distance between independent random vectors

Assaf Naor; Krzysztof Oleszkiewicz

arXiv:1905.01274 · math.FA · May 6, 2019

TL;DR

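This paper derives sharp bounds on moments of the distance between two independent random vectors taking values in a Banach space. Its main general result is that, for every $p\geqslant 1$ and all independent $X,Y\in L_{p}(F)$, some point $z\in F$ satisfies ${\mathbb{E}}[\|X-z\|_{\!F}^{p}+\|Y-z\|_{\!F}^{p}]\leqslant\frac{3^{p}}{2^{p-1}}{\mathbb{E}}[\|X-Y\|_{\!F}^{p}]$ up to an arbitrarily small error, and that the constant $\frac{3^{p}}{2^{p-1}}$ cannot be improved in general.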

Contribution

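It proves the sharp general bound of Theorem 1.1 and improves it for $L_{q}$ spaces and for complex interpolation spaces $[F,H]_{\theta}$ with $H$ a Hilbert space, by studying four moduli: the barycentric modulus $\boldsymbol{\mathcal{b}}_{p}$, the mixture modulus $\boldsymbol{\mathcal{m}}_{p}$, the roundness modulus $\boldsymbol{\mathcal{r}}_{p}$ and the Jensen modulus $\boldsymbol{\mathcal{j}}_{p}$.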

Findings

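01

In every Banach space the barycentric inequality holds with the constant $\frac{3^{p}}{2^{p-1}}$, and this constant is sharp in general (Theorem 1.1)

02

Improved bounds hold for $L_{q}$ and for complex interpolation spaces $[F,H]_{\theta}$ with $H$ a Hilbert space (Theorems 1.2 and 1.5)

03

The Jensen modulus of $L_{q}$ is computed exactly, and the roundness modulus of $L_{q}$ is bounded, with its sharp value conjectured in the remaining ranges (Conjecture 1.3)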

Abstract

We derive various sharp bounds on moments of the distance between two independent random vectors taking values in a Banach space.

Equations

\inf_{z\in F}{\mathbb{E}}\left[\|X-z\|_{\!F}+\|Y-z\|_{\!F}\right]\leqslant C\,{\mathbb{E}}\left[\|X-Y\|_{\!F}\right]?

\inf_{z\in F}{\mathbb{E}}\left[\|X-z\|_{\!F}^{p}+\|Y-z\|_{\!F}^{p}\right]\leqslant\frac{3^{p}}{2^{p-1}}{\mathbb{E}}\left[\|X-Y\|_{\!F}^{p}\right].

\inf_{z\in\mathcal{M}}{\mathbb{E}}\left[d_{\mathcal{M}}(X,z)^{2}+d_{\mathcal{M}}(Y,z)^{2}\right]\leqslant{\mathbb{E}}\left[d_{\mathcal{M}}(X,Y)^{2}\right].

{\mathbb{E}}\left[\|X-Y\|_{\!F}\right]\leqslant\inf_{z\in F}{\mathbb{E}}\left[\|X-z\|_{\!F}+\|Y-z\|_{\!F}\right].

\inf_{z\in F}{\mathbb{E}}\left[\|X-z\|_{\!F}^{p}+\|Y-z\|_{\!F}^{p}\right]\leqslant\boldsymbol{\mathcal{b}}\,{\mathbb{E}}\left[\|X-Y\|_{\!F}^{p}\right].

{\mathbb{E}}\left[\bigg{\|}X-\frac{1}{2}{\mathbb{E}}[X]-\frac{1}{2}{\mathbb{E}}[Y]\bigg{\|}_{\!F}^{p}+\bigg{\|}Y-\frac{1}{2}{\mathbb{E}}[X]-\frac{1}{2}{\mathbb{E}}[Y]\bigg{\|}_{\!F}^{p}\right]\leqslant\boldsymbol{\mathcal{m}}{\mathbb{E}}\left[\|X-Y\|_{\!F}^{p}\right].

P [Z \in A] = \frac{1}{2} P [X \in A] + \frac{1}{2} P [Y \in A]

{\mathbb{E}}\left[\|X-X^{\prime}\|_{\!F}^{p}\right]+{\mathbb{E}}\left[\|Y-Y^{\prime}\|_{\!F}^{p}\right]\leqslant\boldsymbol{\mathcal{r}}\,{\mathbb{E}}\left[\|X-Y\|_{\!F}^{p}\right],

{\mathbb{E}}\left[d_{\mathcal{M}}(X,X^{\prime})^{p}\right]+{\mathbb{E}}\left[d_{\mathcal{M}}(Y,Y^{\prime})^{p}\right]\leqslant\boldsymbol{\mathcal{r}}\,{\mathbb{E}}\left[d_{\mathcal{M}}(X,Y)^{p}\right].

\boldsymbol{\mathcal{j}}\,{\mathbb{E}}\left[\|Z-{\mathbb{E}}[Z]\|_{\!F}^{p}\right]\leqslant{\mathbb{E}}\left[\|Z-Z^{\prime}\|_{\!F}^{p}\right],

\boldsymbol{\mathcal{b}}_{p}(F)\leqslant\boldsymbol{\mathcal{m}}_{p}(F)\leqslant\frac{2+\boldsymbol{\mathcal{r}}_{p}(F)}{2\boldsymbol{\mathcal{j}}_{p}(F)}.

\begin{aligned}{\mathbb{E}}\left[\bigg\|X-\frac{1}{2}{\mathbb{E}}[X]-\frac{1}{2}{\mathbb{E}}[Y]\bigg\|_{\!F}^{p}+\bigg\|Y-\frac{1}{2}{\mathbb{E}}[X]-\frac{1}{2}{\mathbb{E}}[Y]\bigg\|_{\!F}^{p}\right]&\overset{(6)}{=}2{\mathbb{E}}\left[\|Z-{\mathbb{E}}[Z]\|_{\!F}^{p}\right]\overset{(9)}{\leqslant}\frac{2}{\boldsymbol{\mathcal{j}}_{p}(F)}{\mathbb{E}}\left[\|Z-Z^{\prime}\|_{\!F}^{p}\right]\\ &\overset{(6)}{=}\frac{2}{\boldsymbol{\mathcal{j}}_{p}(F)}\left(\frac{1}{2}{\mathbb{E}}\left[\|X-Y\|_{\!F}^{p}\right]+\frac{1}{4}{\mathbb{E}}\left[\|X-X^{\prime}\|_{\!F}^{p}\right]+\frac{1}{4}{\mathbb{E}}\left[\|Y-Y^{\prime}\|_{\!F}^{p}\right]\right)\\ &\leqslant\frac{2}{\boldsymbol{\mathcal{j}}_{p}(F)}\left(\frac{1}{2}+\frac{1}{4}\boldsymbol{\mathcal{r}}_{p}(F)\right){\mathbb{E}}\left[\|X-Y\|_{\!F}^{p}\right].\end{aligned}

c(p,q)\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\min\left\{1,p-1,\frac{p}{q},\frac{p(q-1)}{q}\right\}=\left\{\begin{array}[]{ll}p-1&\mathrm{if}\ 1\leqslant p\leqslant q\leqslant 2\ \mathrm{or}\ 1\leqslant p\leqslant\frac{q}{q-1}\leqslant 2,\\ \frac{p(q-1)}{q}&\mathrm{if}\ q\leqslant p\leqslant\frac{q}{q-1},\\ \frac{p}{q}&\mathrm{if}\ \frac{q}{q-1}\leqslant p\leqslant q,\\ 1&\mathrm{if}\ p\geqslant\frac{q}{q-1}\geqslant 2\ \mathrm{or}\ p\geqslant q\geqslant 2.\end{array}\right.

C(p,q)\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\left\{\begin{array}[]{ll}p-1&\mathrm{if}\ \frac{p}{p-1}\leqslant q\leqslant p,\\ \frac{p(q-2)}{q}+1&\mathrm{if}\ \frac{q}{q-1}\leqslant p\leqslant q,\\ 2-\frac{p}{q}&\mathrm{if}\ q\geqslant 2\ \mathrm{and}\ 1\leqslant p\leqslant\frac{q}{q-1},\\ \frac{p}{q}&\mathrm{if}\ q\leqslant 2\ \mathrm{and}\ q\leqslant p\leqslant\frac{q}{q-1},\\ 1&\mathrm{if}\ 1\leqslant p\leqslant q\leqslant 2.\end{array}\right.

\boldsymbol{\mathcal{b}}_{p}(L_{q})\leqslant\boldsymbol{\mathcal{m}}_{p}(L_{q})\leqslant\min\left\{\frac{3^{p}}{2^{p-1}}\left(\frac{\sqrt{2}}{3}\right)^{2c(p,q)},\frac{2^{C(p,q)}+2}{2^{c(p,q)+1}}\right\}.

C_{\mathrm{opt}}(p,q)\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\max\left\{1,p-1,\frac{p(q-2)}{q}+1\right\}=\left\{\begin{array}[]{ll}p-1&\mathrm{if}\ p\geqslant 2\ \mathrm{and}\ 1\leqslant q\leqslant p,\\ \frac{p(q-2)}{q}+1&\mathrm{if}\ q\geqslant 2\ \mathrm{and}\ 1\leqslant p\leqslant q,\\ 1&\mathrm{if}\ p,q\in[1,2].\end{array}\right.

\boldsymbol{\mathcal{r}}_{p}([F,H]_{\theta})\leqslant 2^{1+(1-\theta)p}\qquad\mathrm{and}\qquad\boldsymbol{\mathcal{j}}_{p}([F,H]_{\theta})\geqslant 2^{\frac{\theta p}{2}}.

\boldsymbol{\mathcal{b}}_{p}([F,H]_{\theta})\leqslant\boldsymbol{\mathcal{m}}_{p}([F,H]_{\theta})\leqslant\min\left\{\frac{3^{p}}{2^{p-1}}\left(\frac{\sqrt{2}}{3}\right)^{p\theta},\frac{1+2^{(1-\theta)p}}{2^{\frac{\theta p}{2}}}\right\}=\left\{\begin{array}[]{ll}\frac{3^{p}}{2^{p-1}}\left(\frac{\sqrt{2}}{3}\right)^{p\theta}&\mathrm{if\ }\frac{1}{1-\theta}\leqslant p\leqslant\frac{2}{\theta},\\ \frac{1+2^{(1-\theta)p}}{2^{\frac{\theta p}{2}}}&\mathrm{if\ }\frac{2}{2-\theta}\leqslant p\leqslant\frac{1}{1-\theta}.\end{array}\right.

\displaystyle\begin{split}2^{1+(1-\theta)p}\iint_{\mathcal{X}\times\mathcal{Y}}\|f(x,y)\|_{\![F,H]_{\theta}}^{p}{\mathrm{d}}\mu(x)&{\mathrm{d}}\nu(y)\geqslant\iint_{\mathcal{X}\times\mathcal{X}}\bigg{\|}\int_{\mathcal{Y}}\big{(}f(x,y)-f(\chi,y)\big{)}{\mathrm{d}}\nu(y)\bigg{\|}_{\![F,H]_{\theta}}^{p}{\mathrm{d}}\mu(x){\mathrm{d}}\mu(\chi)\\ &\qquad\quad\ \ +\iint_{\mathcal{Y}\times\mathcal{Y}}\bigg{\|}\int_{\mathcal{X}}\big{(}f(x,y)-f(x,\upupsilon)\big{)}{\mathrm{d}}\mu(x)\bigg{\|}_{\![F,H]_{\theta}}^{p}{\mathrm{d}}\nu(y){\mathrm{d}}\nu(\upupsilon),\end{split}

\displaystyle\begin{split}\frac{3^{p}}{2^{p-1}}&\left(\frac{\sqrt{2}}{3}\right)^{p\theta}\iint_{\mathcal{X}\times\mathcal{Y}}\|f(x,y)\|_{\![F,H]_{\theta}}^{p}{\mathrm{d}}\mu(x){\mathrm{d}}\nu(y)\\ &\geqslant\int_{\mathcal{X}}\bigg{\|}\int_{\mathcal{Y}}f(x,y){\mathrm{d}}\nu(y)-\frac{1}{2}\iint_{\mathcal{X}\times\mathcal{Y}}f(\chi,\upupsilon){\mathrm{d}}\mu(\chi){\mathrm{d}}\nu(\upupsilon)\bigg{\|}_{\![F,H]_{\theta}}^{p}{\mathrm{d}}\mu(x)\\ &\qquad+\int_{\mathcal{Y}}\bigg{\|}\int_{\mathcal{X}}f(x,y){\mathrm{d}}\mu(x)-\frac{1}{2}\iint_{\mathcal{X}\times\mathcal{Y}}f(\chi,\upupsilon){\mathrm{d}}\mu(\chi){\mathrm{d}}\nu(\upupsilon)\bigg{\|}_{\![F,H]_{\theta}}^{p}{\mathrm{d}}\nu(y).\end{split}

2^{\left(1-\frac{\theta}{2}\right)p}\iint_{\mathcal{X}\times\mathcal{X}}\|g(x,\chi)\|_{\![F,H]_{\theta}}^{p}{\mathrm{d}}\mu(x){\mathrm{d}}\mu(\chi)\geqslant\int_{\mathcal{X}}\bigg{\|}\int_{\mathcal{X}}\big{(}g(x,\chi)-g(\chi,x)\big{)}{\mathrm{d}}\mu(x)\bigg{\|}_{\![F,H]_{\theta}}^{p}{\mathrm{d}}\mu(\chi).

{\mathbb{E}}\left[\|X-X^{\prime}\|_{\![F,H]_{\theta}}^{p}\right]+{\mathbb{E}}\left[\|Y-Y^{\prime}\|_{\![F,H]_{\theta}}^{p}\right]\leqslant 2^{1+(1-\theta)p}{\mathbb{E}}\left[\|X-Y\|_{\![F,H]_{\theta}}^{p}\right],

{\mathbb{E}}\left[\bigg{\|}X-\frac{1}{2}{\mathbb{E}}[X]-\frac{1}{2}{\mathbb{E}}[Y]\bigg{\|}^{p}_{\![F,H]_{\theta}}\right]+{\mathbb{E}}\left[\bigg{\|}Y-\frac{1}{2}{\mathbb{E}}[X]-\frac{1}{2}{\mathbb{E}}[Y]\bigg{\|}^{p}_{\![F,H]_{\theta}}\right]\leqslant\frac{3^{p}}{2^{p-1}}\left(\frac{\sqrt{2}}{3}\right)^{p\theta}{\mathbb{E}}\left[\left\|X-Y\right\|^{p}_{\![F,H]_{\theta}}\right].

{\mathbb{E}}\left[\|X-X^{\prime}\|_{\![F,H]_{\theta}}^{p}\right]\geqslant 2^{\frac{\theta p}{2}}{\mathbb{E}}\left[\|X-{\mathbb{E}}[X]\|_{\![F,H]_{\theta}}^{p}\right].

\displaystyle\begin{split}{\mathbb{E}}\left[\bigg{\|}X-\frac{1}{2}{\mathbb{E}}[X]-\frac{1}{2}{\mathbb{E}}[Y]\bigg{\|}_{\!F}^{p}\right]&=\frac{3^{p}}{2^{p}}{\mathbb{E}}\left[\bigg{\|}\frac{2}{3}\left(X-{\mathbb{E}}[Y]\right)+\frac{1}{3}\left({\mathbb{E}}[Y]-{\mathbb{E}}[X]\right)\bigg{\|}_{\!F}^{p}\right]\\ &\leqslant\frac{3^{p}}{2^{p}}\left(\frac{2}{3}{\mathbb{E}}\left[\left\|X-{\mathbb{E}}[Y]\right\|_{\!F}^{p}\right]+\frac{1}{3}{\mathbb{E}}\left[\left\|{\mathbb{E}}[Y]-{\mathbb{E}}[X]\right\|_{\!F}^{p}\right]\right)\leqslant\frac{3^{p}}{2^{p}}{\mathbb{E}}\left[\left\|X-Y\right\|_{\!F}^{p}\right],\end{split}

{\mathbb{E}}\left[\bigg{\|}X-\frac{1}{2}{\mathbb{E}}[X]-\frac{1}{2}{\mathbb{E}}[Y]\bigg{\|}_{\!F}^{p}\right]+{\mathbb{E}}\left[\bigg{\|}Y-\frac{1}{2}{\mathbb{E}}[X]-\frac{1}{2}{\mathbb{E}}[Y]\bigg{\|}_{\!F}^{p}\right]\\ \leqslant 2\max\left\{{\mathbb{E}}\left[\bigg{\|}X-\frac{1}{2}{\mathbb{E}}[X]-\frac{1}{2}{\mathbb{E}}[Y]\bigg{\|}_{\!F}^{p}\right],{\mathbb{E}}\left[\bigg{\|}Y-\frac{1}{2}{\mathbb{E}}[X]-\frac{1}{2}{\mathbb{E}}[Y]\bigg{\|}_{\!F}^{p}\right]\right\}\leqslant\frac{3^{p}}{2^{p-1}}{\mathbb{E}}\left[\left\|X-Y\right\|_{\!F}^{p}\right].

F_{n}\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\bigg{\{}x\in{\mathbb{C}}^{2n}:\sum_{k=1}^{2n}x_{k}=0\bigg{\}},

\boldsymbol{\mathcal{b}}_{p}(F_{n})\geqslant 2\left(\frac{3}{2}-\frac{1}{n}\right)^{p}\xrightarrow[n\to\infty]{}\frac{3^{p}}{2^{p-1}}.

A_{n}\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\bigg{\{}(3n-2)e_{j}-(n+2)\sum_{k\in\{1,\ldots,n\}\smallsetminus\{j\}}e_{k}+(n-2)\sum_{k=n+1}^{2n}e_{k}:\,j\in\{1,\ldots,n\}\bigg{\}},

B_{n}\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\bigg{\{}(n-2)\sum_{k=1}^{n}e_{k}+(3n-2)e_{j}-(n+2)\sum_{k\in\{n+1,\ldots,2n\}\smallsetminus\{j\}}e_{k}:\ j\in\{n+1,\ldots,2n\}\bigg{\}}.

Taxonomy

Topics: Probability and Risk Models · Fuzzy Systems and Optimization · Point processes and geometric inequalities

Full text

Moments of the distance between independent random vectors

Assaf Naor and Krzysztof Oleszkiewicz

Abstract.

We derive various sharp bounds on moments of the distance between two independent random vectors taking values in a Banach space.

A.N. was supported by the Packard Foundation and the Simons Foundation. The research that is presented here was conducted under the auspices of the Simons Algorithms and Geometry (A&G) Think Tank. K.O. was partially supported by the National Science Centre, Poland, project number 2012/05/B/ST1/00412.

1. Introduction

Throughout what follows, all Banach spaces are tacitly assumed to be separable. This assumption removes the need to discuss measurability side-issues; alternatively one could consider throughout only the special case of finitely-supported random variables, which captures all of the key ideas. We will also tacitly assume that all Banach spaces are over the complex scalars ${\mathbb{C}}$ . This assumption is convenient for the ensuing proofs, but the main statements (namely, those that do not mention complex scalars explicitly) hold over the real scalars as well, through a standard complexification procedure. All the notation and terminology from Banach space theory that occurs below is basic and standard, as in e.g. [15].

Our starting point is the following question. What is the smallest $C>0$ such that for every Banach space $(F,\|\cdot\|_{\!F})$ and every two independent $F$ -valued integrable random vectors $X,Y\in L_{1}(F)$ we have

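$$\inf_{z\in F}{\mathbb{E}}\left[\|X-z\|_{\!F}+\|Y-z\|_{\!F}\right]\leqslant C\,{\mathbb{E}}\left[\|X-Y\|_{\!F}\right]?\tag{1}$$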

We will show that (1) holds with $C=3$, and that $C=3$ is the sharp constant here. More generally, we have the following theorem.

Theorem 1.1.

Suppose that $p\geqslant 1$ and $(F,\|\cdot\|_{\!F})$ is a Banach space. Let $X,Y\in L_{p}(F)$ be two independent $F$ -valued $p$ -integrable random vectors. Then

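$$\inf_{z\in F}{\mathbb{E}}\left[\|X-z\|_{\!F}^{p}+\|Y-z\|_{\!F}^{p}\right]\leqslant\frac{3^{p}}{2^{p-1}}{\mathbb{E}}\left[\|X-Y\|_{\!F}^{p}\right].\tag{2}$$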

The constant $\frac{3^{p}}{2^{p-1}}$ in (2) cannot be improved.
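As a numerical illustration of Theorem 1.1, the following Python sketch evaluates both sides of (2) for finitely supported random vectors. It is only a sanity check under stated assumptions: the space is real $\ell_{\infty}^{d}$, the laws of $X$ and $Y$ are uniform on finite lists, and the infimum over $z$ is replaced by the single candidate $z=\frac{1}{2}{\mathbb{E}}[X]+\frac{1}{2}{\mathbb{E}}[Y]$, which can only increase the left-hand side. The function names are hypothetical and not taken from the paper.

```python
import itertools


def sup_norm(v):
    """Norm of the real space l_infinity^d (an illustrative choice of Banach space)."""
    return max(abs(t) for t in v)


def barycentric_sides(xs, ys, p):
    """Return (lhs, rhs) of inequality (2) for X uniform on xs and Y uniform on ys.

    The infimum over z is replaced by z = (E[X] + E[Y]) / 2, so lhs is an
    upper bound for the true left-hand side of (2).
    """
    d = len(xs[0])
    mean_x = [sum(x[i] for x in xs) / len(xs) for i in range(d)]
    mean_y = [sum(y[i] for y in ys) / len(ys) for i in range(d)]
    z = [(a + b) / 2 for a, b in zip(mean_x, mean_y)]

    def dist_to_z(v):
        return sup_norm([vi - zi for vi, zi in zip(v, z)])

    lhs = (sum(dist_to_z(x) ** p for x in xs) / len(xs)
           + sum(dist_to_z(y) ** p for y in ys) / len(ys))
    # X and Y are independent, so E||X - Y||^p averages over all pairs.
    mean_dist = sum(sup_norm([a - b for a, b in zip(x, y)]) ** p
                    for x, y in itertools.product(xs, ys)) / (len(xs) * len(ys))
    rhs = 3 ** p / 2 ** (p - 1) * mean_dist
    return lhs, rhs


if __name__ == "__main__":
    xs = [(0.0, 1.0), (2.0, -1.0), (1.0, 3.0)]
    ys = [(-2.0, 0.0), (1.0, 1.0)]
    for p in (1, 2, 3.5):
        print(p, barycentric_sides(xs, ys, p))
```

Any choice of supports should give `lhs <= rhs`; how close the two sides can get depends on the space, which is the subject of the rest of the paper.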

The Banach space $F$ that exhibits this sharpness of (2) is, of course, a subspace of $\ell_{\infty}$ , but we do not know what is the optimal constant in (2) when $F=\ell_{\infty}$ itself. More generally, understanding the meaning of the optimal constant in (2) for specific Banach spaces is an interesting question, which we investigate in the rest of the present work for certain special classes of Banach spaces but do not fully resolve.

1.1. Geometric motivation

Our interest in (1) arose from investigations of [1] in the context of Riemannian/Alexandrov geometry. It is well established throughout an extensive geometric literature that a range of useful quadratic distance inequalities for a metric space $(\mathcal{M},d_{\mathcal{M}})$ arise if one imposes bounds on its curvature in the sense of Alexandrov. The term “quadratic” here indicates that these inequalities involve squares of distances between finite point configurations in $\mathcal{M}$. A phenomenon that was established in [1] is that any such quadratic metric inequality that holds for every Alexandrov space of nonnegative curvature becomes valid in any metric space whatsoever if one removes the squaring of the distances, i.e., in essence upon “linearization” of the inequality; see [1] for a precise formulation. This led naturally to the question whether the same phenomenon holds for Hadamard spaces (complete simply connected spaces whose Alexandrov curvature is nonpositive); see [1] for an extensive discussion as well as the recent negative resolution of this question in [11]. In the context of a Hadamard space $(\mathcal{M},d_{\mathcal{M}})$, the analogue of (1) is that independent finitely-supported $\mathcal{M}$-valued random variables $X,Y$ satisfy

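$$\inf_{z\in\mathcal{M}}{\mathbb{E}}\left[d_{\mathcal{M}}(X,z)^{2}+d_{\mathcal{M}}(Y,z)^{2}\right]\leqslant{\mathbb{E}}\left[d_{\mathcal{M}}(X,Y)^{2}\right].\tag{3}$$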

See [1] for a standard derivation of (3), where $z\in\mathcal{M}$ is an appropriate “geometric barycenter,” namely it is obtained as the minimizer of the expected squared distance from $X$ to $z$ . As explained in [1], by using (3) iteratively one can obtain quadratic metric inequalities that hold in any Hadamard space and serve as obstructions for certain geometric embeddings. The “linearized” version of (3), in the case of Banach spaces and allowing for a loss of a factor $C$ , is precisely (1). So, in the spirit of [1] it is natural to ask what is the smallest $C$ for which it holds. This is what we address here, leading to analytic questions about Banach spaces that are interesting in their own right from the probabilistic and geometric perspective. We note that there are questions along these lines that [1] raises and remain open; see e.g. [1, Question 32].

1.2. Probabilistic discussion

The inequality which reverses (1) holds trivially as a consequence of the triangle inequality, even when $X$ and $Y$ are not necessarily independent. Namely, any $X,Y\in L_{1}(F)$ satisfy

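$${\mathbb{E}}\left[\|X-Y\|_{\!F}\right]\leqslant\inf_{z\in F}{\mathbb{E}}\left[\|X-z\|_{\!F}+\|Y-z\|_{\!F}\right].$$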

So, the above discussion is about the extent to which this use of the triangle inequality can be reversed.

Since the upper bound that we seek is in terms of the distance in $L_{p}(F)$ between independent copies of $X$ and $Y$ , this can be further used to control from above expressions such as ${\mathbb{E}}[\|X-Y\|_{\!F}^{p}]$ for $X$ and $Y$ not necessarily independent in terms of ${\mathbb{E}}[\|X^{\prime}-Y^{\prime}\|_{\!F}^{p}]$ , where $X^{\prime}$ and $Y^{\prime}$ are independent, $X^{\prime}$ has the same distribution as $X$ , and $Y^{\prime}$ has the same distribution as $Y$ .

In order to analyse the inequality (2) in a specific Banach space $(F,\|\cdot\|_{F})$, we consider the following geometric moduli. Given $p\geqslant 1$ let $\boldsymbol{\mathcal{b}}_{p}(F,\|\cdot\|_{F})$, or simply $\boldsymbol{\mathcal{b}}_{p}(F)$ if the norm is clear from the context, be the infimum over those $\boldsymbol{\mathcal{b}}>0$ such that every two independent $F$-valued random variables $X,Y\in L_{p}(F)$ satisfy

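$$\inf_{z\in F}{\mathbb{E}}\left[\|X-z\|_{\!F}^{p}+\|Y-z\|_{\!F}^{p}\right]\leqslant\boldsymbol{\mathcal{b}}\,{\mathbb{E}}\left[\|X-Y\|_{\!F}^{p}\right].\tag{4}$$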

Thus, $\boldsymbol{\mathcal{b}}_{p}(F)$ is precisely the best possible constant in the $L_{p}(F)$ -analogue of the aforementioned barycentric inequality (3). The use of the letter “ $\boldsymbol{\mathcal{b}}$ ” in this notation is in reference to the word “barycentric.” Theorem 1.1 asserts that $\boldsymbol{\mathcal{b}}_{p}(F)\leqslant 3^{p}/2^{p-1}$ , and that this bound cannot be improved in general.

Let $\boldsymbol{\mathcal{m}}_{p}(F,\|\cdot\|_{F})$, or simply $\boldsymbol{\mathcal{m}}_{p}(F)$ if the norm is clear from the context, be the infimum over those $\boldsymbol{\mathcal{m}}>0$ such that every two independent $F$-valued random variables $X,Y\in L_{p}(F)$ satisfy

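$${\mathbb{E}}\left[\bigg\|X-\frac{1}{2}{\mathbb{E}}[X]-\frac{1}{2}{\mathbb{E}}[Y]\bigg\|_{\!F}^{p}+\bigg\|Y-\frac{1}{2}{\mathbb{E}}[X]-\frac{1}{2}{\mathbb{E}}[Y]\bigg\|_{\!F}^{p}\right]\leqslant\boldsymbol{\mathcal{m}}\,{\mathbb{E}}\left[\|X-Y\|_{\!F}^{p}\right].\tag{5}$$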

The use of the letter “$\boldsymbol{\mathcal{m}}$” in this notation is in reference to the word “mixture,” since the left-hand side of (5) is equal to $2{\mathbb{E}}[\|Z-{\mathbb{E}}[Z]\|_{F}^{p}]$, where $Z\in L_{p}(F)$ is distributed according to the mixture of the laws of $X$ and $Y$, namely $Z$ is the $F$-valued random vector such that for every Borel set $A\subseteq F$,

$${\mathbb{P}}[Z\in A]=\frac{1}{2}{\mathbb{P}}[X\in A]+\frac{1}{2}{\mathbb{P}}[Y\in A].\tag{6}$$

Obviously $\boldsymbol{\mathcal{b}}_{p}(F)\leqslant\boldsymbol{\mathcal{m}}_{p}(F)$ , because (5) corresponds to choosing $z=\frac{1}{2}{\mathbb{E}}[X]+\frac{1}{2}{\mathbb{E}}[Y]\in F$ in (4).
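The identification of the left-hand side of (5) with $2{\mathbb{E}}[\|Z-{\mathbb{E}}[Z]\|_{F}^{p}]$ can be checked numerically. The Python sketch below does so in the simplest setting $F={\mathbb{C}}$ with the absolute value as norm, assuming $X$ and $Y$ are uniform on finite lists; the function names are hypothetical.

```python
def mixture_law(xs, ys):
    """Law of Z as in (6): half the uniform law on xs plus half the uniform law on ys."""
    return ([(x, 0.5 / len(xs)) for x in xs]
            + [(y, 0.5 / len(ys)) for y in ys])


def left_side_of_5(xs, ys, p):
    """E|X - m|^p + E|Y - m|^p with m = (E[X] + E[Y]) / 2, for scalar-valued X and Y."""
    m = (sum(xs) / len(xs) + sum(ys) / len(ys)) / 2
    return (sum(abs(x - m) ** p for x in xs) / len(xs)
            + sum(abs(y - m) ** p for y in ys) / len(ys))


def twice_centered_moment(xs, ys, p):
    """2 E|Z - E[Z]|^p for the mixture Z of the two laws."""
    law = mixture_law(xs, ys)
    mean_z = sum(w * z for z, w in law)
    return 2 * sum(w * abs(z - mean_z) ** p for z, w in law)


if __name__ == "__main__":
    xs = [1 + 2j, -1j, 0.5]
    ys = [2, -3 + 1j]
    print(left_side_of_5(xs, ys, 1.5), twice_centered_moment(xs, ys, 1.5))
```

The two printed numbers should agree up to rounding, because ${\mathbb{E}}[Z]=\frac{1}{2}{\mathbb{E}}[X]+\frac{1}{2}{\mathbb{E}}[Y]$ and the expectation of any function of $Z$ is the average of the corresponding expectations for $X$ and $Y$.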

While we sometimes bound $\boldsymbol{\mathcal{m}}_{p}(F)$ directly, it is beneficial to refine the considerations through the study of two further moduli that are natural in their own right and, as we shall see later, their use can lead to better bounds. Firstly, let $\boldsymbol{\mathcal{r}}_{p}(F,\|\cdot\|_{F})$ , or simply $\boldsymbol{\mathcal{r}}_{p}(F)$ if the norm is clear from the context, be the infimum over those $\boldsymbol{\mathcal{r}}>0$ such that every independent $F$ -valued random variables $X,Y\in L_{p}(F)$ satisfy

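$${\mathbb{E}}\left[\|X-X^{\prime}\|_{\!F}^{p}\right]+{\mathbb{E}}\left[\|Y-Y^{\prime}\|_{\!F}^{p}\right]\leqslant\boldsymbol{\mathcal{r}}\,{\mathbb{E}}\left[\|X-Y\|_{\!F}^{p}\right],\tag{7}$$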

where $X^{\prime},Y^{\prime}$ are independent copies of $X$ and $Y$, respectively. The use of the letter “$\boldsymbol{\mathcal{r}}$” in this notation is in reference to the word “roundness,” as we shall next explain.

Observe also that (7) is a purely metric condition, i.e., it involves only distances between points. So, it makes sense to investigate (7) in any metric space $(\mathcal{M},d_{\mathcal{M}})$ , namely to study the inequality

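$${\mathbb{E}}\left[d_{\mathcal{M}}(X,X^{\prime})^{p}\right]+{\mathbb{E}}\left[d_{\mathcal{M}}(Y,Y^{\prime})^{p}\right]\leqslant\boldsymbol{\mathcal{r}}\,{\mathbb{E}}\left[d_{\mathcal{M}}(X,Y)^{p}\right].\tag{8}$$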

One requires (8) to hold for $\mathcal{M}$ -valued independent random variables $X,X^{\prime},Y,Y^{\prime}$ (say, finitely-supported, to avoid measurability assumptions) such that each of the pairs $X,X^{\prime}$ and $Y,Y^{\prime}$ is identically distributed.

To the best of our knowledge, condition (8) was first studied systematically by Enflo [10], who defined a metric space $(\mathcal{M},d_{\mathcal{M}})$ to have generalized roundness $p$ if it satisfies (8) with $\boldsymbol{\mathcal{r}}=2$. He proved that $L_{p}$ has generalized roundness $p$ for $p\in[1,2]$, and ingeniously used this notion to answer an old question of Smirnov. See [9] for a relatively recent example of substantial impact of Enflo’s approach. By combining [14] with [19], a metric space $(\mathcal{M},d_{\mathcal{M}})$ has generalized roundness $p$ if and only if $(\mathcal{M},d_{\mathcal{M}}^{p/2})$ embeds isometrically into a Hilbert space. The case $\boldsymbol{\mathcal{r}}>1$ of (8) arose in [2] in the context of metric embeddings.
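To make the roundness condition (8) concrete, the following Python sketch computes the ratio of its two sides on the real line with the metric $|x-y|$, for $X$ and $Y$ uniform on finite lists. It is an illustration under those assumptions, with hypothetical function names. On the real line the ratio stays at most $2$ for $p\in[1,2]$, a classical fact consistent with Enflo's result, while the two-point example $X\in\{-1,1\}$, $Y=0$ gives the ratio $2^{p-1}$, which exceeds $2$ once $p>2$.

```python
import itertools


def mean_pth_distance(us, vs, p):
    """E|U - V|^p for independent U, V uniform on the finite lists us and vs."""
    return (sum(abs(u - v) ** p for u, v in itertools.product(us, vs))
            / (len(us) * len(vs)))


def roundness_ratio(xs, ys, p):
    """(E|X - X'|^p + E|Y - Y'|^p) / E|X - Y|^p.

    Any constant r for which (8) holds must be at least this ratio.
    Assumes E|X - Y|^p > 0.
    """
    numerator = mean_pth_distance(xs, xs, p) + mean_pth_distance(ys, ys, p)
    return numerator / mean_pth_distance(xs, ys, p)


if __name__ == "__main__":
    xs = [0.3, -1.2, 2.5, 0.9]
    ys = [1.0, 4.0, -0.5]
    for p in (1, 1.5, 2):
        print(p, roundness_ratio(xs, ys, p))
    # Two-point example: the ratio equals 2 ** (p - 1).
    print(roundness_ratio([-1, 1], [0], 3))
```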

The final geometric modulus that we consider here is a quantity $\boldsymbol{\mathcal{j}}_{p}(F,\|\cdot\|_{F})$, or simply $\boldsymbol{\mathcal{j}}_{p}(F)$ if the norm is clear from the context, that is defined to be the supremum over those $\boldsymbol{\mathcal{j}}\geqslant 1$ such that every two independent and identically distributed $F$-valued random variables $Z,Z^{\prime}\in L_{p}(F)$ satisfy

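$$\boldsymbol{\mathcal{j}}\,{\mathbb{E}}\left[\|Z-{\mathbb{E}}[Z]\|_{\!F}^{p}\right]\leqslant{\mathbb{E}}\left[\|Z-Z^{\prime}\|_{\!F}^{p}\right].\tag{9}$$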

Note that (9) holds with $\boldsymbol{\mathcal{j}}=1$ by Jensen’s inequality, so we are asking here for an improvement of (this use of) Jensen’s inequality by a definite factor; the letter “ $\boldsymbol{\mathcal{j}}$ ” in this notation is in reference to “Jensen.”
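The Jensen-type inequality (9) can be explored numerically. The Python sketch below computes the ratio ${\mathbb{E}}[\|Z-Z^{\prime}\|^{p}]/{\mathbb{E}}[\|Z-{\mathbb{E}}[Z]\|^{p}]$ for $Z$ uniform on a finite subset of Euclidean space, which is an assumption made only for illustration, as are the function names. The ratio is always at least $1$, and for $p=2$ in a Hilbert space it equals exactly $2$ by the usual variance identity.

```python
import itertools
import math


def euclidean(u, v):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def jensen_ratio(zs, p):
    """E||Z - Z'||^p / E||Z - E[Z]||^p for Z, Z' i.i.d. uniform on the list zs.

    Assumes zs contains at least two distinct points.
    """
    d = len(zs[0])
    mean = tuple(sum(z[i] for z in zs) / len(zs) for i in range(d))
    centered = sum(euclidean(z, mean) ** p for z in zs) / len(zs)
    paired = (sum(euclidean(z, w) ** p for z, w in itertools.product(zs, zs))
              / len(zs) ** 2)
    return paired / centered


if __name__ == "__main__":
    zs = [(0.0, 1.0, 2.0), (1.5, -1.0, 0.0), (3.0, 3.0, -2.0), (-1.0, 0.5, 0.5)]
    for p in (1, 1.5, 2, 3):
        print(p, jensen_ratio(zs, p))
```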

We have the following general bounds, which hold for every Banach space $(F,\|\cdot\|_{F})$ and every $p\geqslant 1$ .

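$$\boldsymbol{\mathcal{b}}_{p}(F)\leqslant\boldsymbol{\mathcal{m}}_{p}(F)\leqslant\frac{2+\boldsymbol{\mathcal{r}}_{p}(F)}{2\boldsymbol{\mathcal{j}}_{p}(F)}.\tag{10}$$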

Indeed, we already observed the first inequality in (10), and the second inequality in (10) is justified by taking independent random variables $X,Y\in L_{p}(F)$ , considering their mixture $Z\in L_{p}(F)$ as defined in (6), letting $X^{\prime},Y^{\prime},Z^{\prime}$ be independent copies of $X,Y,Z$ , respectively, and proceeding as follows.

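$$\begin{aligned}{\mathbb{E}}\left[\bigg\|X-\frac{1}{2}{\mathbb{E}}[X]-\frac{1}{2}{\mathbb{E}}[Y]\bigg\|_{\!F}^{p}+\bigg\|Y-\frac{1}{2}{\mathbb{E}}[X]-\frac{1}{2}{\mathbb{E}}[Y]\bigg\|_{\!F}^{p}\right]&\overset{(6)}{=}2{\mathbb{E}}\left[\|Z-{\mathbb{E}}[Z]\|_{\!F}^{p}\right]\overset{(9)}{\leqslant}\frac{2}{\boldsymbol{\mathcal{j}}_{p}(F)}{\mathbb{E}}\left[\|Z-Z^{\prime}\|_{\!F}^{p}\right]\\ &\overset{(6)}{=}\frac{2}{\boldsymbol{\mathcal{j}}_{p}(F)}\left(\frac{1}{2}{\mathbb{E}}\left[\|X-Y\|_{\!F}^{p}\right]+\frac{1}{4}{\mathbb{E}}\left[\|X-X^{\prime}\|_{\!F}^{p}\right]+\frac{1}{4}{\mathbb{E}}\left[\|Y-Y^{\prime}\|_{\!F}^{p}\right]\right)\\ &\leqslant\frac{2}{\boldsymbol{\mathcal{j}}_{p}(F)}\left(\frac{1}{2}+\frac{1}{4}\boldsymbol{\mathcal{r}}_{p}(F)\right){\mathbb{E}}\left[\|X-Y\|_{\!F}^{p}\right].\end{aligned}$$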

Recalling the definition (5) of $\boldsymbol{\mathcal{m}}_{p}(F)$ , this implies (10).
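The one step of this argument that is not a direct use of a definition is the expansion of ${\mathbb{E}}[\|Z-Z^{\prime}\|_{F}^{p}]$ for the mixture $Z$ into the terms $\frac{1}{2}{\mathbb{E}}[\|X-Y\|_{F}^{p}]+\frac{1}{4}{\mathbb{E}}[\|X-X^{\prime}\|_{F}^{p}]+\frac{1}{4}{\mathbb{E}}[\|Y-Y^{\prime}\|_{F}^{p}]$. The Python sketch below checks it numerically for scalar-valued, finitely supported $X$ and $Y$ (an illustrative assumption, with hypothetical function names).

```python
import itertools


def mean_pth_distance(us, vs, p):
    """E|U - V|^p for independent U, V uniform on the finite lists us and vs."""
    return (sum(abs(u - v) ** p for u, v in itertools.product(us, vs))
            / (len(us) * len(vs)))


def mixture_pair_moment(xs, ys, p):
    """E|Z - Z'|^p, where Z, Z' are independent copies of the mixture defined in (6)."""
    law = ([(x, 0.5 / len(xs)) for x in xs]
           + [(y, 0.5 / len(ys)) for y in ys])
    return sum(w1 * w2 * abs(z1 - z2) ** p
               for (z1, w1), (z2, w2) in itertools.product(law, law))


def decomposed_moment(xs, ys, p):
    """(1/2) E|X - Y|^p + (1/4) E|X - X'|^p + (1/4) E|Y - Y'|^p."""
    return (0.5 * mean_pth_distance(xs, ys, p)
            + 0.25 * mean_pth_distance(xs, xs, p)
            + 0.25 * mean_pth_distance(ys, ys, p))


if __name__ == "__main__":
    xs = [1 + 2j, -1j, 0.5]
    ys = [2, -3 + 1j]
    print(mixture_pair_moment(xs, ys, 1.5), decomposed_moment(xs, ys, 1.5))
```

The weights arise because the pair $(Z,Z^{\prime})$ is drawn from the laws of $(X,X^{\prime})$, $(X,Y^{\prime})$, $(Y,X^{\prime})$ and $(Y,Y^{\prime})$ with probability $\frac{1}{4}$ each, and the two mixed cases have the same expected distance.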

Here we prove the following bounds on $\boldsymbol{\mathcal{b}}_{p}(L_{q}),\boldsymbol{\mathcal{m}}_{p}(L_{q}),\boldsymbol{\mathcal{r}}_{p}(L_{q}),\boldsymbol{\mathcal{j}}_{p}(L_{q})$ for $p,q\in[1,\infty)$ .

Theorem 1.2.

For every $p,q\in[1,\infty)$ we have $\boldsymbol{\mathcal{j}}_{p}(L_{q})=2^{c(p,q)}$ , where

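$$c(p,q)\stackrel{\mathrm{def}}{=}\min\left\{1,p-1,\frac{p}{q},\frac{p(q-1)}{q}\right\}=\begin{cases}p-1&\mathrm{if}\ 1\leqslant p\leqslant q\leqslant 2\ \mathrm{or}\ 1\leqslant p\leqslant\frac{q}{q-1}\leqslant 2,\\ \frac{p(q-1)}{q}&\mathrm{if}\ q\leqslant p\leqslant\frac{q}{q-1},\\ \frac{p}{q}&\mathrm{if}\ \frac{q}{q-1}\leqslant p\leqslant q,\\ 1&\mathrm{if}\ p\geqslant\frac{q}{q-1}\geqslant 2\ \mathrm{or}\ p\geqslant q\geqslant 2.\end{cases}\tag{11}$$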

We also have $\boldsymbol{\mathcal{r}}_{p}(L_{q})\leqslant 2^{C(p,q)}$ , where

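$$C(p,q)\stackrel{\mathrm{def}}{=}\begin{cases}p-1&\mathrm{if}\ \frac{p}{p-1}\leqslant q\leqslant p,\\ \frac{p(q-2)}{q}+1&\mathrm{if}\ \frac{q}{q-1}\leqslant p\leqslant q,\\ 2-\frac{p}{q}&\mathrm{if}\ q\geqslant 2\ \mathrm{and}\ 1\leqslant p\leqslant\frac{q}{q-1},\\ \frac{p}{q}&\mathrm{if}\ q\leqslant 2\ \mathrm{and}\ q\leqslant p\leqslant\frac{q}{q-1},\\ 1&\mathrm{if}\ 1\leqslant p\leqslant q\leqslant 2.\end{cases}\tag{12}$$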

In fact, if $\frac{p}{p-1}\leqslant q\leqslant p$ , then $\boldsymbol{\mathcal{r}}_{p}(L_{q})=2^{p-1}$ , if $\frac{q}{q-1}\leqslant p\leqslant q$ , then $\boldsymbol{\mathcal{r}}_{p}(L_{q})=2^{\frac{p(q-2)}{q}+1}$ , and $\boldsymbol{\mathcal{r}}_{p}(L_{q})=2$ if $1\leqslant p\leqslant q\leqslant 2$ . Namely, the above bound on $\boldsymbol{\mathcal{r}}_{p}(L_{q})$ is sharp in the first, second and fifth ranges in (12).

Furthermore, $\boldsymbol{\mathcal{b}}_{p}(L_{q})=\boldsymbol{\mathcal{m}}_{p}(L_{q})=2^{2-p}$ if $p\leqslant q\leqslant 2$ . More generally, we have the bound

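$$\boldsymbol{\mathcal{b}}_{p}(L_{q})\leqslant\boldsymbol{\mathcal{m}}_{p}(L_{q})\leqslant\min\left\{\frac{3^{p}}{2^{p-1}}\left(\frac{\sqrt{2}}{3}\right)^{2c(p,q)},\frac{2^{C(p,q)}+2}{2^{c(p,q)+1}}\right\}.\tag{13}$$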

The upper bound on $\boldsymbol{\mathcal{b}}_{p}(L_{q})$ in (13) improves over (2) when $F=L_{q}$ for all values of $p,q\in[1,\infty)$ . It would be interesting to find the exact value of $\boldsymbol{\mathcal{b}}_{p}(L_{q})$ in the entire range $p,q\in[1,\infty)$ . Note that the second quantity in the minimum in the right hand side of (13) corresponds to using (10) together with the bounds on $\boldsymbol{\mathcal{j}}_{p}(L_{q})$ and $\boldsymbol{\mathcal{r}}_{p}(L_{q})$ that Theorem 1.2 provides; when, say, $p=q$ , this quantity is smaller than the first quantity in the minimum in the right hand side of (13) if and only if $1\leqslant p<3$ .
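The exponent $c(p,q)$ and the comparison of the two terms in (13) on the diagonal $p=q$ lend themselves to a quick numerical check. The Python sketch below rests on assumptions that should be read as such: the first term of the minimum is taken to be $\frac{3^{p}}{2^{p-1}}(\frac{\sqrt{2}}{3})^{2c(p,q)}$, which is what the interpolation bound of Theorem 1.5 yields when $2^{c(p,q)}$ plays the role of $2^{\theta p/2}$, and on the diagonal $C(p,p)$ is read off from the first and fifth ranges of (12). The function names are hypothetical.

```python
import math


def conjugate(q):
    """The exponent q / (q - 1), with the convention that it is infinite when q = 1."""
    return math.inf if q == 1 else q / (q - 1)


def c_min(p, q):
    """c(p, q) as the minimum in (11)."""
    return min(1.0, p - 1, p / q, p * (q - 1) / q)


def c_piecewise(p, q):
    """c(p, q) via the four ranges listed in (11)."""
    qc = conjugate(q)
    if p <= q <= 2 or p <= qc <= 2:
        return p - 1
    if q <= p <= qc:
        return p * (q - 1) / q
    if qc <= p <= q:
        return p / q
    if p >= qc >= 2 or p >= q >= 2:
        return 1.0
    raise ValueError("expected p, q >= 1")


def diagonal_bounds(p):
    """Both terms of the minimum in (13) when q = p (assumed form, see the text)."""
    c = c_min(p, p)
    big_c = p - 1 if p >= 2 else 1.0  # C(p, p): first range if p >= 2, fifth if p <= 2
    first = 3 ** p / 2 ** (p - 1) * (math.sqrt(2) / 3) ** (2 * c)
    second = (2 ** big_c + 2) / 2 ** (c + 1)
    return first, second


if __name__ == "__main__":
    for p in (1.5, 2.5, 3.0, 3.5):
        print(p, diagonal_bounds(p))
```

Under these assumptions the second term is the smaller one at $p=2.5$, the two terms coincide at $p=3$, and the first term is smaller at $p=3.5$, in line with the threshold $p<3$ stated above (the two terms also coincide at $p=2$). For $p=q\leqslant 2$ the second term reduces to $2^{2-p}$.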

Theorem 1.2 states that the constant $C(p,q)$ is sharp in the first, second and fifth ranges in (12). The following conjecture formulates what we expect to be the sharp values of $\boldsymbol{\mathcal{r}}_{p}(L_{q})$ for all $p,q\in[1,\infty)$.

Conjecture 1.3.

For all $p,q\in[1,\infty)$ we have $\boldsymbol{\mathcal{r}}_{p}(L_{q})=2^{C_{\mathrm{opt}}(p,q)}$ , where

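$$C_{\mathrm{opt}}(p,q)\stackrel{\mathrm{def}}{=}\max\left\{1,p-1,\frac{p(q-2)}{q}+1\right\}=\begin{cases}p-1&\mathrm{if}\ p\geqslant 2\ \mathrm{and}\ 1\leqslant q\leqslant p,\\ \frac{p(q-2)}{q}+1&\mathrm{if}\ q\geqslant 2\ \mathrm{and}\ 1\leqslant p\leqslant q,\\ 1&\mathrm{if}\ p,q\in[1,2].\end{cases}\tag{14}$$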

We will prove later that $\boldsymbol{\mathcal{r}}_{p}(L_{q})\geqslant 2^{C_{\mathrm{opt}}(p,q)}$, so Conjecture 1.3 is about improving our upper bounds on $\boldsymbol{\mathcal{r}}_{p}(L_{q})$ in the remaining third and fourth ranges that appear in (12).
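The two descriptions of $C_{\mathrm{opt}}(p,q)$ in (14), as a maximum and as a three-case formula, can be compared on a grid. The Python sketch below is a consistency check only, with hypothetical function names; it does not touch the conjecture itself.

```python
import math


def c_opt_max(p, q):
    """C_opt(p, q) as the maximum in (14)."""
    return max(1.0, p - 1, p * (q - 2) / q + 1)


def c_opt_piecewise(p, q):
    """C_opt(p, q) via the three ranges listed in (14)."""
    if p >= 2 and q <= p:
        return p - 1
    if q >= 2 and p <= q:
        return p * (q - 2) / q + 1
    if p <= 2 and q <= 2:
        return 1.0
    raise ValueError("expected p, q >= 1")


def max_grid_gap(step=0.25, count=17):
    """Largest discrepancy between the two formulas over a grid in [1, 5]^2."""
    gap = 0.0
    for i in range(count):
        for j in range(count):
            p, q = 1 + i * step, 1 + j * step
            gap = max(gap, abs(c_opt_max(p, q) - c_opt_piecewise(p, q)))
    return gap


if __name__ == "__main__":
    print(max_grid_gap())
```

The agreement follows from two elementary comparisons: $\frac{p(q-2)}{q}+1\geqslant p-1$ exactly when $q\geqslant p$, and $\frac{p(q-2)}{q}+1\geqslant 1$ exactly when $q\geqslant 2$.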

Question 1.4.

Below we will obtain improvements over (2) for other spaces besides $\{L_{q}:\ q\in[1,\infty)\}$ , including e.g. the Schatten–von Neumann trace classes (see e.g. [20]) $\{\mathsf{S}_{q}:\ q\in(1,\infty)\}$ . However, parts of Theorem 1.2 rely on “commutative” properties of $L_{q}$ which are not valid for $\mathsf{S}_{q}$ , thus leading to even better bounds in the commutative setting. It would be especially interesting to obtain sharp bounds in noncommutative probabilistic inequalities such as the roundness inequality (7) when $F=\mathsf{S}_{q}$ . In particular, we ask what is the value of $\boldsymbol{\mathcal{r}}_{1}(\mathsf{S}_{1})$ ? At present, we know (as was already shown by Enflo [10]) that $\boldsymbol{\mathcal{r}}_{1}(L_{1})=2$ while the only bound that we have for $\mathsf{S}_{1}$ is $\boldsymbol{\mathcal{r}}_{1}(\mathsf{S}_{1})\leqslant 4$ . Note that $4$ is a trivial upper bound here, which holds for every Banach space. Interestingly, it follows from [7] that $\boldsymbol{\mathcal{r}}_{1}(\mathsf{S}_{1})\geqslant 2\sqrt{2}$ , as explained in Remark 3.1 below. So, there is a genuine difference between the commutative and noncommutative settings of $L_{1}$ and $\mathsf{S}_{1}$ , respectively. As a more modest question, is $\boldsymbol{\mathcal{r}}_{1}(\mathsf{S}_{1})$ strictly less than $4$ ?

1.3. Complex interpolation

We will use basic terminology, notation and results of complex interpolation of Banach spaces; the relevant background appears in [8, 4]. Theorem 1.2 is a special case of the following more general result about interpolation spaces. As such, it applies also to random variables that take values in certain spaces other than $L_{q}$, including, for example, Schatten–von Neumann trace classes (see e.g. [20]) and, by an extrapolation theorem of Pisier [18], Banach lattices of nontrivial type.

Theorem 1.5.

Fix $\theta\in[0,1]$ and $\frac{2}{2-\theta}\leqslant p\leqslant\frac{2}{\theta}$ . Let $(F,\|\cdot\|_{\!F}),(H,\|\cdot\|_{\!H})$ be a compatible pair of Banach spaces such that $(H,\|\cdot\|_{\!H})$ is a Hilbert space. Then the following estimates hold true.

[TABLE]

Additionally, we have

[TABLE]

(Note that if the first range of values of $p$ in the right hand side of (16) is nonempty, then necessarily $\theta\leqslant\frac{2}{3}$ .)

The deduction of Theorem 1.2 from Theorem 1.5 appears in Section 3 below; in most cases this deduction is nothing more than a direct substitution into Theorem 1.5, but in some cases a further argument is needed. Theorem 1.5 itself is a special case of the following theorem.

Theorem 1.6.

Fix $\theta\in[0,1]$ and $p\in[1,\infty]$ that satisfy $\frac{2}{2-\theta}\leqslant p\leqslant\frac{2}{\theta}$ . Let $(F,\|\cdot\|_{\!F}),(H,\|\cdot\|_{\!H})$ be a compatible pair of Banach spaces such that $(H,\|\cdot\|_{\!H})$ is a Hilbert space. Suppose that $(\mathcal{X},\mu)$ and $(\mathcal{Y},\nu)$ are probability spaces. Then, for every $f\in L_{p}(\mu\times\nu;[F,H]_{\theta})$ we have

[TABLE]

and

[TABLE]

Furthermore, if $g\in L_{p}(\mu\times\mu;[F,H]_{\theta})$ , then

[TABLE]

Proof of Theorem 1.5 assuming Theorem 1.6.

Let $X$ and $Y$ be independent $p$ -integrable $[F,H]_{\theta}$ -valued random vectors. Due to the independence assumption, without loss of generality there are probability spaces $(\mathcal{X},\mu)$ and $(\mathcal{Y},\nu)$ such that $X$ and $Y$ are elements of $L_{p}(\mu\times\nu;[F,H]_{\theta})$ that depend only on the first variable and second variable, respectively. Then (17) and (18) applied to $f=X-Y$ become

[TABLE]

and

[TABLE]

We have therefore established the first inequality in (15) as well as the upper bound on $\boldsymbol{\mathcal{m}}_{p}([F,H]_{\theta})$ that corresponds to the first term in the minimum that appears in (16).

Similarly, due to the fact that $X$ and $X^{\prime}$ are i.i.d., without loss of generality there is a probability space $(\mathcal{X},\mu)$ such that $X$ and $X^{\prime}$ are elements of $L_{p}(\mu\times\mu;[F,H]_{\theta})$ that depend only on the first variable and second variable, respectively. Then, (19) applied to $g=X-X^{\prime}$ simplifies to give

[TABLE]

This establishes the second inequality in (15), as well as the upper bound on $\boldsymbol{\mathcal{m}}_{p}([F,H]_{\theta})$ that corresponds to the second term in the minimum that appears in (16), due to (10). ∎

The first and third inequalities of Theorem 1.6 are generalizations of results that appeared in the literature. Specifically, (17) generalizes Lemma 6 of [2], and (19) generalizes Lemma 5 of [17], which is itself inspired by a step within the proof of Theorem 2 of [21]. The proof of Theorem 1.6, which appears in Section 3 below, differs from the proofs of [21, 17, 2], but relies on the same ideas.

2. Proof of Theorem 1.1

Let $(F,\|\cdot\|_{F})$ be a Banach space. Fix $p\geqslant 1$ . Theorem 1.1 asserts that $\boldsymbol{\mathcal{b}}_{p}(F)\leqslant\frac{3^{p}}{2^{p-1}}$ . In fact, $\boldsymbol{\mathcal{m}}_{p}(F)\leqslant\frac{3^{p}}{2^{p-1}}$ , which is stronger by (10). To see this, let $X,Y\in L_{p}(F)$ be independent random vectors and observe that

[TABLE]

where the penultimate step holds due to the convexity of $\|\cdot\|_{\!F}^{p}$ and the final step holds because, by Jensen’s inequality, both ${\mathbb{E}}\left[\left\|X-{\mathbb{E}}[Y]\right\|_{\!F}^{p}\right]={\mathbb{E}}\left[\left\|{\mathbb{E}}_{Y}[X-Y]\right\|_{\!F}^{p}\right]$ and ${\mathbb{E}}\left[\left\|{\mathbb{E}}[Y-X]\right\|_{\!F}^{p}\right]$ are at most ${\mathbb{E}}\left[\left\|X-Y\right\|_{\!F}^{p}\right]$ . The symmetric reasoning with $X$ replaced by $Y$ now gives

[TABLE]

This shows that $\boldsymbol{\mathcal{m}}_{p}(F)\leqslant\frac{3^{p}}{2^{p-1}}$ . It remains to prove that the bound $\boldsymbol{\mathcal{b}}_{p}(F)\leqslant\frac{3^{p}}{2^{p-1}}$ is optimal for general $F$ .
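As a numerical sanity check of the bound $\boldsymbol{\mathcal{m}}_{p}(F)\leqslant\frac{3^{p}}{2^{p-1}}$ obtained above, the following Python sketch draws random finitely supported independent laws and evaluates the ratio that defines $\boldsymbol{\mathcal{m}}_{p}$ exactly. The choice of $\ell_{\infty}^{3}$ as the Banach space, the helper names, the number of atoms and the seed are illustrative assumptions and play no role in the argument.

```python
import random

def sup_norm(v):
    """Supremum norm on R^d, standing in for a general Banach norm."""
    return max(abs(c) for c in v)

def random_law(rng, size, dim):
    """A finitely supported law: `size` atoms in R^dim with random weights."""
    atoms = [tuple(rng.uniform(-1, 1) for _ in range(dim)) for _ in range(size)]
    weights = [rng.random() + 0.01 for _ in range(size)]  # offset keeps every weight positive
    total = sum(weights)
    return atoms, [w / total for w in weights]

def mean(law):
    """Expectation of a finitely supported law, coordinate by coordinate."""
    atoms, probs = law
    return tuple(sum(w * a[i] for a, w in zip(atoms, probs)) for i in range(len(atoms[0])))

def mixture_ratio(law_x, law_y, p):
    """(E||X - m||^p + E||Y - m||^p) / E||X - Y||^p with m = (E[X] + E[Y]) / 2."""
    m = tuple((a + b) / 2 for a, b in zip(mean(law_x), mean(law_y)))

    def moment_about(law, z):
        atoms, probs = law
        return sum(w * sup_norm([c - d for c, d in zip(a, z)]) ** p
                   for a, w in zip(atoms, probs))

    (ax, px), (ay, py) = law_x, law_y
    denom = sum(wx * wy * sup_norm([c - d for c, d in zip(a, b)]) ** p
                for a, wx in zip(ax, px) for b, wy in zip(ay, py))
    return (moment_about(law_x, m) + moment_about(law_y, m)) / denom

def worst_ratio(p, trials=200, seed=0):
    """Largest ratio seen over `trials` random pairs of independent laws."""
    rng = random.Random(seed)
    return max(mixture_ratio(random_law(rng, 4, 3), random_law(rng, 4, 3), p)
               for _ in range(trials))
```

A random search of this kind can only fail to contradict the bound; it is not expected to come close to the extremal constant.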

Fix an integer $n\geqslant 2$ and consider

[TABLE]

equipped with the supremum norm inherited from $\ell_{\infty}^{2n}$ . We will prove that

[TABLE]

Denote by $\{e_{k}\}_{k=1}^{2n}$ the standard coordinate basis of $\ell_{\infty}^{2n}$ . Define two $n$ -element sets $A_{n},B_{n}\subseteq F_{n}$ by

[TABLE]

and

[TABLE]

Note that $A_{n}$ and $B_{n}$ are indeed subsets of $F_{n}$ because $3n-2-(n-1)(n+2)+n(n-2)=0$ . Let $X,Y$ be independent and uniformly distributed on $A_{n},B_{n}$ , respectively. One checks that $\|a-b\|_{\infty}=2n$ for any $a\in A_{n}$ and $b\in B_{n}$ . So, ${\mathbb{E}}\left[\|X-Y\|_{\infty}^{p}\right]=(2n)^{p}$ . The desired bound (20) will follow if we demonstrate that

[TABLE]
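The construction can be explored numerically. The sketch below assumes one explicit form of $A_{n}$ and $B_{n}$ that is consistent with the stated identity $3n-2-(n-1)(n+2)+n(n-2)=0$: a member of $A_{n}$ has one of its first $n$ coordinates equal to $3n-2$, the other $n-1$ of them equal to $-(n+2)$, and its last $n$ coordinates equal to $n-2$, while $B_{n}$ is obtained by exchanging the two blocks. This explicit form and all function names are assumptions made for illustration.

```python
def build_sets(n):
    """Candidate A_n, B_n in R^{2n}; an assumed reconstruction consistent with the text."""
    A, B = [], []
    for j in range(n):
        a = [-(n + 2)] * n + [n - 2] * n
        a[j] = 3 * n - 2
        b = [n - 2] * n + [-(n + 2)] * n
        b[n + j] = 3 * n - 2
        A.append(a)
        B.append(b)
    return A, B

def lq_norm(v, q=None):
    """l_q norm of a vector; q=None means the supremum norm."""
    if q is None:
        return max(abs(c) for c in v)
    return sum(abs(c) ** q for c in v) ** (1.0 / q)

def ratio_at_origin(n, p):
    """(E||X||^p + E||Y||^p) / E||X - Y||^p in the supremum norm, i.e. the choice z = 0."""
    A, B = build_sets(n)
    num = sum(lq_norm(a) ** p for a in A) / n + sum(lq_norm(b) ** p for b in B) / n
    den = sum(lq_norm([s - t for s, t in zip(a, b)]) ** p for a in A for b in B) / n ** 2
    return num / den
```

Under these assumptions every coordinate of $a-b$ equals $\pm 2n$, so the supremum distance is $2n$ and the $\ell_{q}$ distance is $(2n)^{1+1/q}$, while `ratio_at_origin(n, p)` equals $2\big(\frac{3n-2}{2n}\big)^{p}$, which tends to $\frac{3^{p}}{2^{p-1}}$ as $n\to\infty$.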

The proof of (21) proceeds via symmetrization. For permutations $\sigma,\rho\in S_{n}$ , define $T_{\sigma,\rho}:F_{n}\rightarrow F_{n}$ by

[TABLE]

$T_{\sigma,\rho}$ is a linear isometry of $F_{n}$ and the sets $A_{n}$ and $B_{n}$ are $T_{\sigma,\rho}$ -invariant. Hence, for any $z\in F_{n}$ ,

[TABLE]

Denoting $u=(z_{1}+\ldots+z_{n})/n$ , it follows from (22) that ${\mathbb{E}}\left[\|X-z\|_{\infty}^{p}\right]\geqslant|3n-2-u|^{p}$ , because one of the first $n$ coordinates of any member of the support of $X$ equals $3n-2$ . The same argument with $X$ replaced by $Y$ gives that ${\mathbb{E}}\left[\|Y-z\|_{\infty}^{p}\right]\geqslant|3n-2+u|^{p}$ , because now one of the last $n$ coordinates of any member of the support of $Y$ equals $3n-2$ . We conclude with the following application of the convexity of $|\cdot|^{p}$ .

[TABLE]

Remark 2.1.

It is worthwhile to examine what the above argument gives if we take the norm on $F_{n}$ to be the norm inherited from $\ell_{q}^{2n}$ . One computes that $\|a-b\|_{q}=(2n)^{1+1/q}$ for every $a\in A_{n}$ and $b\in B_{n}$ . So,

[TABLE]

Also, it follows from the same reasoning that led to (22) that for every $z\in F_{n}$ ,

[TABLE]

and

[TABLE]

Hence, using the convexity of the $p$ ’th power of the $\ell_{q}$ norm on ${\mathbb{R}}^{3}$ , we see that

[TABLE]

By contrasting (23) with (24) we conclude that

[TABLE]

In particular, if we take $p=q\geqslant 2$ and $n=\lceil q\rceil$ , then we conclude that $\boldsymbol{\mathcal{b}}_{q}(F_{n},\|\cdot\|_{q})\geqslant\frac{c}{q}\left(\frac{3}{2}\right)^{q}$ for some universal constant $c>0$ . So, there is very little potential asymptotic gain (as $q\to\infty$ ) if we know that the Banach space of Theorem 1.1 admits an isometric embedding into $L_{q}$ .

Above, and in what follows, we stated that a normed space admits an isometric embedding into $L_{q}$ without specifying whether the embedding is linear or not. Later we will need such embeddings to be linear, so we recall that for any $q\geqslant 1$ , by a classical differentiation argument (see [3, Chapter 7] for a thorough treatment of such reductions to the linear setting), a normed space embeds isometrically into $L_{q}$ as a metric space if and only if it admits a linear isometric embedding into $L_{q}$ .

Note that the phenomenon of Remark 2.1 is special to random variables that have different expectations. Namely, if ${\mathbb{E}}[X]={\mathbb{E}}[Y]$ , then by Jensen’s inequality the ratio that defines $\boldsymbol{\mathcal{b}}_{q}(F)$ is at most $2$ rather than the aforementioned exponential growth as $q\to\infty$ . The following proposition shows that if $F$ is a subspace of $L_{q}$ for $q\geqslant 3$ , then when ${\mathbb{E}}[X]={\mathbb{E}}[Y]$ this ratio is at most $1$ , which is easily seen to be best possible (consider any nontrivial symmetric random variable $X$ , and take $Y$ to be identically $0$ ).

Proposition 2.2.

Let $(F,\|\cdot\|_{F})$ be a Banach space that admits an isometric embedding into $L_{q}$ for some $q\in[3,\infty)$ . Then, for any pair of independent $F$ -valued random vectors $X,Y\in L_{q}(F)$ with ${\mathbb{E}}[X]={\mathbb{E}}[Y]$ ,

[TABLE]

Proof.

$L_{q}$ over ${\mathbb{C}}$ embeds isometrically into $L_{q}$ over ${\mathbb{R}}$ (indeed, complex $L_{q}$ is, as a real Banach space, the same as $L_{q}(\ell_{2}^{2})$ , so this follows from the fact that Hilbert space is isometric to a subspace of $L_{q}$ ). So, in Proposition 2.2 we may assume that $F$ embeds isometrically into $L_{q}$ over ${\mathbb{R}}$ , and therefore by integration/Fubini it suffices to prove (25) for real-valued random variables. So, our goal is to show that if $X,Y$ are independent mean-zero real random variables with ${\mathbb{E}}\left[|X|^{q}\right]$ , ${\mathbb{E}}\left[|Y|^{q}\right]<\infty$ , then

[TABLE]

The bound (25) would then follow by applying (26) to the mean-zero variables $X-{\mathbb{E}}[X]$ and ${\mathbb{E}}[X]-Y$ .

Note in passing that the assumption $q\geqslant 3$ is crucial here, i.e. (26) fails if $q\in(0,3)\smallsetminus\{2\}$ . Indeed, if $\beta\in(0,\frac{1}{2})$ and ${\mathbb{P}}[X=1-\beta]={\mathbb{P}}[Y=1-\beta]=\beta$ and ${\mathbb{P}}[X=-\beta]={\mathbb{P}}[Y=-\beta]=1-\beta$ , then ${\mathbb{E}}[X]={\mathbb{E}}[Y]=0$ but

[TABLE]

If $q\in(0,2)$ , then the right hand side of (27) equals $2^{q-2}<1$ for $\beta=\frac{1}{2}$ . If $q\in(2,3)$ , then the right hand side of (27) equals $1+(2^{q-1}-q-1)\beta+o(\beta)$ , which is less than $1$ for small $\beta$ since $2^{q-1}-q-1<0$ for $q\in(2,3)$ .
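The two-point example above can be evaluated exactly. The sketch below computes the ratio ${\mathbb{E}}[|X+Y|^{q}]/({\mathbb{E}}[|X|^{q}]+{\mathbb{E}}[|Y|^{q}])$ for the stated law; reading (26) as the assertion that this ratio is at least $1$ is an assumption here, though it agrees with the values $2^{q-2}$ and $1+(2^{q-1}-q-1)\beta+o(\beta)$ quoted in the text. The function name is hypothetical.

```python
def moment_ratio(q, beta):
    """E|X+Y|^q / (E|X|^q + E|Y|^q) for i.i.d. X, Y with
    P[X = 1 - beta] = beta and P[X = -beta] = 1 - beta, so that E[X] = 0."""
    values = [(1.0 - beta, beta), (-beta, 1.0 - beta)]
    numerator = sum(px * py * abs(x + y) ** q
                    for x, px in values for y, py in values)
    single = sum(px * abs(x) ** q for x, px in values)  # E|X|^q = E|Y|^q
    return numerator / (2.0 * single)
```

For instance, `moment_ratio(1.5, 0.5)` reproduces $2^{q-2}$ with $q=\frac{3}{2}$, and `moment_ratio(2.5, 0.01)` falls slightly below $1$, whereas for $q\geqslant 3$ the ratio stays at or above $1$ for every $\beta$ tried.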

To prove (26), for every $s>0$ and $x\in{\mathbb{R}}$ , denote $\phi_{s}(x)={\mathrm{sign}}(x)\cdot|x|^{s}.$ Observe that

[TABLE]

Once (28) is proved, (26) would follow because

[TABLE]

where the penultimate step uses the independence of $X,Y$ and the last step uses ${\mathbb{E}}[X]={\mathbb{E}}[Y]=0$ .

It suffices to prove (28) when $q>3$ ; the case $q=3$ follows by passing to the limit. One checks that

[TABLE]

where the last step holds because $\phi_{q-3}$ is increasing. Hence, $y\mapsto\frac{\partial^{2}\alpha}{\partial x^{2}}(x,y)$ is decreasing for $y<0$ and increasing for $y>0$ . One checks that $\frac{\partial^{2}\alpha}{\partial x^{2}}(x,0)=0$ for all $x\in{\mathbb{R}}$ , so $\frac{\partial^{2}\alpha}{\partial x^{2}}(x,y)\geqslant 0$ . Thus $x\mapsto\alpha(x,y)$ is convex for every fixed $y\in{\mathbb{R}}$ . But $\alpha(0,y)=\frac{\partial\alpha}{\partial x}(0,y)=0$ for any $y\in{\mathbb{R}}$ , i.e. the tangent to the graph of $x\mapsto\alpha(x,y)$ at $x=0$ is the $x$ -axis. Convexity implies that the graph of $x\mapsto\alpha(x,y)$ lies above the $x$ -axis, as required. ∎
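The pointwise inequality behind this proof can be probed on a grid. The sketch below assumes that $\alpha(x,y)=|x+y|^{q}-|x|^{q}-|y|^{q}-q\phi_{q-1}(x)y-qx\phi_{q-1}(y)$; this form is a reconstruction, chosen because it satisfies $\alpha(0,y)=\frac{\partial\alpha}{\partial x}(0,y)=0$ and $\frac{\partial^{2}\alpha}{\partial x^{2}}(x,0)=0$ and yields (26) after taking expectations of independent mean-zero variables. The function names and grid parameters are likewise illustrative.

```python
import math

def phi(s, x):
    """phi_s(x) = sign(x) * |x|^s, as defined in the text."""
    return math.copysign(abs(x) ** s, x)

def alpha(q, x, y):
    """Assumed gap function: |x+y|^q - |x|^q - |y|^q - q*phi_{q-1}(x)*y - q*x*phi_{q-1}(y)."""
    return (abs(x + y) ** q - abs(x) ** q - abs(y) ** q
            - q * phi(q - 1, x) * y - q * x * phi(q - 1, y))

def min_alpha_on_grid(q, half_width=3.0, steps=121):
    """Smallest value of alpha(q, ., .) on a square grid centered at the origin."""
    grid = [-half_width + 2 * half_width * i / (steps - 1) for i in range(steps)]
    return min(alpha(q, x, y) for x in grid for y in grid)
```

On this grid the minimum is nonnegative up to rounding for $q=3$ and $q=4.5$, while for $q=2.5$ it is negative (already at $x=y=1$), in line with the role of the assumption $q\geqslant 3$.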

We end this section with the following simpler metric space counterpart of Theorem 1.1.

Proposition 2.3.

Fix $p\geqslant 1$ and let $X$ and $Y$ be independent finitely supported random variables taking values in a metric space $(\mathcal{M},d_{\mathcal{M}})$ . Then

[TABLE]

The constant $2^{p}+1$ in (29) is optimal.

Proof.

Let $X^{\prime}$ have the same distribution as $X$ and be independent of $X$ and $Y$ . The point-wise inequality

[TABLE]

is a consequence of the triangle inequality and the convexity of $(u>0)\mapsto u^{p}$ . By taking expectations, we obtain ${\mathbb{E}}\left[d_{\mathcal{M}}(X,X^{\prime})^{p}\right]\leqslant 2^{p}{\mathbb{E}}\left[d_{\mathcal{M}}(X,Y)^{p}\right]$ , so that

[TABLE]

To see that the constant $2^{p}+1$ is optimal, fix $n\in\mathbb{N}$ and let $\mathcal{M}$ be the complete bipartite graph $\mathsf{K}_{n,n}$ , equipped with its shortest-path metric. Equivalently, $\mathcal{M}$ can be partitioned into two $n$ -point subsets $L,R$ , and for distinct $x,y\in\mathcal{M}$ we have $d_{\mathcal{M}}(x,y)=2$ if $\{x,y\}\subseteq L$ or $\{x,y\}\subseteq R$ , while $d_{\mathcal{M}}(x,y)=1$ otherwise. Let $X$ be uniformly distributed over $L$ and $Y$ be uniformly distributed over $R$ . Then $d_{\mathcal{M}}(X,Y)=1$ point-wise. If $z\in L$ , then $d_{\mathcal{M}}(Y,z)=1$ point-wise, while ${\mathbb{P}}\left[d_{\mathcal{M}}(X,z)=2\right]=\frac{n-1}{n}$ and ${\mathbb{P}}\left[d_{\mathcal{M}}(X,z)=0\right]=\frac{1}{n}$ . Consequently,

[TABLE]

By symmetry, the same holds if $z\in R$ . ∎
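The optimality example on $\mathsf{K}_{n,n}$ is small enough to enumerate. The following sketch, with hypothetical function names and with vertices encoded as pairs (side, index), computes the best center $z\in\mathcal{M}$ by brute force and compares it with ${\mathbb{E}}\left[d_{\mathcal{M}}(X,Y)^{p}\right]$.

```python
def bipartite_distance(x, y):
    """Shortest-path metric on K_{n,n}; a vertex is a pair (side, index)."""
    if x == y:
        return 0
    return 2 if x[0] == y[0] else 1

def best_center_ratio(n, p):
    """min over z of E[d(X,z)^p + d(Y,z)^p], divided by E[d(X,Y)^p],
    for X uniform on the left side and Y uniform on the right side."""
    left = [("L", i) for i in range(n)]
    right = [("R", i) for i in range(n)]
    denom = sum(bipartite_distance(x, y) ** p for x in left for y in right) / n ** 2

    def cost(z):
        return (sum(bipartite_distance(x, z) ** p for x in left)
                + sum(bipartite_distance(y, z) ** p for y in right)) / n

    return min(cost(z) for z in left + right) / denom
```

The returned value equals $1+2^{p}\frac{n-1}{n}$, which increases to $2^{p}+1$ as $n\to\infty$.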

3. Proof of Theorem 1.6 and its consequences

Here we prove Theorem 1.6 and deduce Theorem 1.2.

Proof of Theorem 1.6.

The assumption $\frac{2}{2-\theta}\leqslant p\leqslant\frac{2}{\theta}$ implies that $\frac{1}{p}=\frac{1-\theta}{q}+\frac{\theta}{2}$ for some (unique) $q\in[1,\infty]$ . We will fix this value of $q$ for the rest of the proof of Theorem 1.6. All of the desired bounds (17), (18), (19) hold true when $\theta=0$ , namely for every Banach space $(F,\|\cdot\|_{F})$ and every $f\in L_{q}(\mu\times\nu;F)$ we have

[TABLE]

and

[TABLE]

Furthermore, if $g\in L_{q}(\mu\times\mu;F)$ , then

[TABLE]

Indeed, (30), (31), (32) are direct consequences of the triangle inequality in $L_{q}(\mu\times\nu;F)$ and $L_{q}(\mu\times\mu;F)$ and Jensen’s inequality, with the appropriate interpretation when $q=\infty$ .

By complex interpolation theory (specifically, by combining [4, Theorem 4.1.2] and [4, Theorem 5.1.2]), Theorem 1.6 will follow if we prove the $\theta=1$ case of (17), (18), (19). To this end, as $H$ is a Hilbert space and the inequalities in question are quadratic, it suffices to prove them coordinate-wise (with respect to any orthonormal basis of $H$ ), i.e., it suffices to show that for every ( ${\mathbb{C}}$ -valued) $f\in L_{2}(\mu\times\nu)$ and $g\in L_{2}(\mu\times\mu)$ ,

[TABLE]

and

[TABLE]

and

[TABLE]

The following derivation of the quadratic scalar inequalities (33), (34), (35) is an exercise in linear algebra.

Let $\{\varphi_{j}\}_{j=0}^{\infty}\subseteq L_{2}(\mu)$ and $\{\psi_{k}\}_{k=0}^{\infty}\subseteq L_{2}(\nu)$ be any orthonormal bases of $L_{2}(\mu)$ and $L_{2}(\nu)$ , respectively, for which $\varphi_{0}={\mathbf{1}}_{\mathcal{X}}$ and $\psi_{0}={\mathbf{1}}_{\mathcal{Y}}$ . Then $\{\varphi_{j}\otimes\psi_{k}\}_{j,k=0}^{\infty},\{\varphi_{j}\otimes\varphi_{k}\}_{j,k=0}^{\infty}$ and $\{\psi_{j}\otimes\psi_{k}\}_{j,k=0}^{\infty}$ are orthonormal bases of $L_{2}(\mu\times\nu),L_{2}(\mu\times\mu)$ and $L_{2}(\nu\times\nu)$ , respectively, where for $\varphi\in L_{2}(\mu)$ and $\psi\in L_{2}(\nu)$ one defines (as usual) $\varphi\otimes\psi:\mathcal{X}\times\mathcal{Y}\to{\mathbb{C}}$ by setting $\varphi\otimes\psi(x,y)=\varphi(x)\psi(y)$ for $(x,y)\in\mathcal{X}\times\mathcal{Y}$ . We therefore have the following expansions, in the sense of convergence in $L_{2}(\mu\times\nu)$ and $L_{2}(\mu\times\mu)$ , respectively.

[TABLE]

In particular, by Parseval we have

[TABLE]

Define $R_{\mathcal{X}}f\in L_{2}(\mu\times\mu)$ by

[TABLE]

So, $(\mu\times\mu)$ -almost surely $R_{\mathcal{X}}f(x,\chi)=\int_{\mathcal{Y}}\big{(}f(x,y)-f(\chi,y)\big{)}{\mathrm{d}}\nu(y)$ . Also, define $R_{\mathcal{Y}}f\in L_{2}(\nu\times\nu)$ by

[TABLE]

So, $(\nu\times\nu)$ -almost surely $R_{\mathcal{Y}}f(y,\upupsilon)=\int_{\mathcal{X}}\big{(}f(x,y)-f(x,\upupsilon)\big{)}{\mathrm{d}}\mu(x)$ . By Parseval in $L_{2}(\mu\times\mu),L_{2}(\nu\times\nu),L_{2}(\mu\times\nu)$ ,

[TABLE]

This is precisely (33).
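On finite probability spaces the quadratic inequality reduces to a statement about matrices, which can be checked directly. The sketch below assumes that (33) reads $\|R_{\mathcal{X}}f\|_{L_{2}(\mu\times\mu)}^{2}+\|R_{\mathcal{Y}}f\|_{L_{2}(\nu\times\nu)}^{2}\leqslant 2\|f\|_{L_{2}(\mu\times\nu)}^{2}$ (the displayed formula is not reproduced here, so this reading is an assumption), takes $\mu,\nu$ to be uniform measures, and uses hypothetical function names.

```python
import random

def roundness_sides(f):
    """For a matrix f[x][y] on uniform finite probability spaces, return
    (||R_X f||^2 + ||R_Y f||^2, ||f||^2) in the respective L_2 norms."""
    m, k = len(f), len(f[0])
    row_mean = [sum(f[x]) / k for x in range(m)]                       # integrate out y
    col_mean = [sum(f[x][y] for x in range(m)) / m for y in range(k)]  # integrate out x
    rx = sum(abs(a - b) ** 2 for a in row_mean for b in row_mean) / m ** 2
    ry = sum(abs(a - b) ** 2 for a in col_mean for b in col_mean) / k ** 2
    norm_sq = sum(abs(v) ** 2 for row in f for v in row) / (m * k)
    return rx + ry, norm_sq

def largest_violation(trials=50, seed=1):
    """Max of lhs - 2*||f||^2 over random complex 3x4 matrices (expected to be <= 0)."""
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(trials):
        f = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)] for _ in range(3)]
        lhs, norm_sq = roundness_sides(f)
        worst = max(worst, lhs - 2 * norm_sq)
    return worst
```

Equality occurs, for example, for $f(x,y)=a(x)-b(y)$ with $a,b$ of mean zero, such as the matrix `[[-1, 3], [-3, 1]]`.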

Next, for every $\alpha,\beta\in{\mathbb{C}}$ define $S^{\alpha}_{\mathcal{X}}f\in L_{2}(\mu)$ and $S^{\beta}_{\mathcal{Y}}f\in L_{2}(\nu)$ by

[TABLE]

and

[TABLE]

In other words, we have the following identities $\mu$ -almost surely and $\nu$ -almost surely, respectively.

[TABLE]

and

[TABLE]

By Parseval in $L_{2}(\mu),L_{2}(\nu),L_{2}(\mu\times\nu)$ ,

[TABLE]

The case $\alpha=\beta=\frac{1}{2}$ of this inequality is precisely (34). It is worthwhile to note in passing that this reasoning (substituted into the above interpolation argument) yields the following generalization of (18).

[TABLE]

For the justification of the remaining inequality (35), define $Tg\in L_{2}(\mu)$ by

[TABLE]

In other words, $\mu$ -almost surely $Tg(\chi)=\int_{\mathcal{X}}\big{(}g(x,\chi)-g(\chi,x)\big{)}{\mathrm{d}}\mu(x)$ . By Parseval in $L_{2}(\mu),L_{2}(\mu\times\mu)$ ,

[TABLE]

where in the penultimate step we used the convexity of $(\zeta\in{\mathbb{C}})\mapsto|\zeta|^{2}$ . This is precisely (35). ∎

We will next deduce Theorem 1.2 from the special case of Theorem 1.6 that we stated as Theorem 1.5.

Proof of Theorem 1.2.

The largest $\theta\in[0,1]$ for which $\frac{2}{2-\theta}\leqslant p\leqslant\frac{2}{\theta}$ and also $\frac{1}{q}=\frac{1-\theta}{r}+\frac{\theta}{2}$ for some $r\geqslant 1$ is

[TABLE]

We then have $L_{q}=[L_{r},L_{2}]_{\theta_{\max}}$ . Note that the quantity $c(p,q)$ that is defined in (11) is equal to $\frac{p}{2}\theta_{\max}$ .

By (15) with $\theta=\theta_{\max}$ and $F=L_{r}$ we have $\boldsymbol{\mathcal{j}}_{p}(L_{q})\geqslant 2^{c(p,q)}$ . The matching upper bound $\boldsymbol{\mathcal{j}}_{p}(L_{q})\leqslant 2^{c(p,q)}$ holds due to the following quick examples. If $X$ is uniformly distributed on $\{-1,1\}$ , then ${\mathbb{E}}[|X-{\mathbb{E}}[X]|^{p}]=1$ and ${\mathbb{E}}|X-X^{\prime}|^{p}=2^{p-1}$ . So, $\boldsymbol{\mathcal{j}}_{p}({\mathbb{R}})\leqslant 2^{p-1}$ . If $\varepsilon\in(0,1)$ and ${\mathbb{P}}[X_{\varepsilon}=0]=1-\varepsilon$ and ${\mathbb{P}}[X_{\varepsilon}=1]=\varepsilon$ , then for $p>1$ ,

[TABLE]

If $n\in\mathbb{N}$ and $X_{n}$ is uniformly distributed over $\{\pm e_{1},\ldots,\pm e_{n}\}$ , where $\{e_{j}\}_{j=1}^{\infty}$ is the standard basis of $\ell_{p}$ , then

[TABLE]

If $r_{1},\ldots,r_{n}$ are i.i.d. symmetric Bernoulli random variables viewed as elements of $L_{q}$ , e.g. they can be the coordinate functions in $L_{q}(\{-1,1\}^{n})$ , then let $R_{n}$ be uniformly distributed over $\{\pm r_{1},\ldots,\pm r_{n}\}$ . Then,

[TABLE]

This completes the proof that $\boldsymbol{\mathcal{j}}_{p}(L_{q})=2^{c(p,q)}$ .

Next, an application of (15) with $\theta=\theta_{\max}$ and $F=L_{r}$ gives $\boldsymbol{\mathcal{r}}_{p}(L_{q})\leqslant 2^{1+(1-\theta_{\max})p}$ . In other words,

[TABLE]

for every $p$ -integrable independent $L_{q}$ -valued random variables $X,X^{\prime},Y,Y^{\prime}$ such that $(X,Y)$ and $(X^{\prime},Y^{\prime})$ are identically distributed. The bound (38) coincides with (15), where $C(p,q)$ is as in (12), only in the first two ranges that appear in (12), namely when $\frac{p}{p-1}\leqslant q\leqslant p$ or when $\frac{q}{q-1}\leqslant p\leqslant q$ . For the remaining ranges that appear in (12), the bound (38) is inferior to (15), so we reason as follows.

For every $q,Q\in[1,\infty]$ satisfying $Q\geqslant q$ , by [16, Remark 5.10] (the case $Q\in[1,2]$ is an older result [6]) there exists an embedding $\mathfrak{s}=\mathfrak{s}_{q,Q}:L_{q}\to L_{Q}$ (given by an explicit formula) such that

[TABLE]

Apply (38) to the $L_{Q}$ -valued random vectors $\mathfrak{s}(X),\mathfrak{s}(X^{\prime}),\mathfrak{s}(Y),\mathfrak{s}(Y^{\prime})$ with $q$ replaced by $Q$ and $p$ replaced with $\frac{pQ}{q}$ . The resulting estimate is

[TABLE]

It is in our interest to choose $Q\geqslant q$ so as to minimize the right hand side of (40). If $\frac{1}{p}+\frac{1}{q}\leqslant 1$ , then $Q=q$ is the optimal choice in (40), and therefore we return to (38). But, if $\frac{1}{p}+\frac{1}{q}\geqslant 1$ , then $Q=1+\frac{q}{p}\geqslant q$ is the optimal choice in (40) and we arrive at the following estimate which is better than (38) in the stated range

[TABLE]

The bound (41) covers the third and fourth ranges that appear in (12), as well as the case $p=q\in[1,2]$ of the fifth range that appears in (12). However, (41) is inferior to (12) when $1\leqslant p<q\leqslant 2$ . When this occurs, use the fact [12] that $L_{q}$ is isometric to a subspace of $L_{p}$ and apply the already established case $p=q$ to the $L_{p}$ -valued random variables $\mathcal{i}(X),\mathcal{i}(X^{\prime}),\mathcal{i}(Y),\mathcal{i}(Y^{\prime})$ , where $\mathcal{i}:L_{q}\to L_{p}$ is any isometric embedding.

We will next prove that $\boldsymbol{\mathcal{r}}_{p}(L_{q})\geqslant 2^{C_{\mathrm{opt}}(p,q)}$ , where $C_{\mathrm{opt}}(p,q)$ is given in (14). In particular, this will justify the second sharpness assertion of Theorem 1.2, namely that (38) is sharp when $p,q$ belong to the first, second or fifth ranges that appear in (12). Firstly, by considering the special case of (7) in which $X,Y$ are i.i.d., we see that $\boldsymbol{\mathcal{r}}_{p}(F)\geqslant 1$ for any Banach space $F$ . Next, fix $n\in\mathbb{N}$ and let $r_{1},\ldots,r_{n},\rho_{1},\ldots,\rho_{n}\in L_{q}$ be such that $r_{1},\ldots,r_{n}$ and $\rho_{1},\ldots,\rho_{n}$ each form a sequence of i.i.d. symmetric Bernoulli random variables, and the supports of $r_{1},\ldots,r_{n}$ are disjoint from the supports of $\rho_{1},\ldots,\rho_{n}$ . For example, one could consider them as the elements of $L_{q}(\{-1,1\}^{n})\oplus_{q}L_{q}(\{-1,1\}^{n})$ that are given by $r_{i}=(\omega\mapsto\omega_{i},0)$ and $\rho_{i}=(0,\omega\mapsto\omega_{i})$ for each $i\in\{1,\ldots,n\}$ . Let $X$ be uniformly distributed over $\{r_{1},\ldots,r_{n}\}$ and $Y$ be uniformly distributed over $\{\rho_{1},\ldots,\rho_{n}\}$ . Due to the disjointness of the supports, we have $\|X-Y\|^{p}_{q}=(\|X\|_{q}^{q}+\|Y\|_{q}^{q})^{p/q}=2^{p/q}$ point-wise. At the same time, ${\mathbb{E}}[\|X-X^{\prime}\|_{q}^{p}]+{\mathbb{E}}[\|Y-Y^{\prime}\|_{q}^{p}]=2(1-1/n)(2^{q}/2)^{p/q}=(1-1/n)2^{1+p(q-1)/q}$ . By letting $n\to\infty$ , this shows that necessarily $\boldsymbol{\mathcal{r}}_{p}(L_{q})\geqslant 2^{1+p(q-2)/q}$ . Finally, if (7) holds, then in particular it holds for scalar-valued random variables. By integrating, we see that $\boldsymbol{\mathcal{r}}_{p}(F)\geqslant\boldsymbol{\mathcal{r}}_{p}(L_{p})$ for any Banach space $F$ . But, the case $p=q$ of the above discussion gives $\boldsymbol{\mathcal{r}}_{p}(L_{p})\geqslant 2^{1+p(p-2)/p}=2^{p-1}$ , as required.
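The two moment computations for the disjointly supported Bernoulli families can be reproduced by enumerating the cube $\{-1,1\}^{n}$ for small $n$. The sketch below realizes $L_{q}(\{-1,1\}^{n})\oplus_{q}L_{q}(\{-1,1\}^{n})$ with the uniform probability measure on each copy of the cube; the function name and the data layout are illustrative assumptions.

```python
import itertools

def bernoulli_example(n, p, q):
    """Return (E||X-Y||^p, E||X-X'||^p + E||Y-Y'||^p) for the disjointly
    supported families r_i = (w -> w_i, 0) and rho_i = (0, w -> w_i)."""
    cube = list(itertools.product((-1, 1), repeat=n))
    zero = [0] * len(cube)
    r = [([w[i] for w in cube], zero) for i in range(n)]
    rho = [(zero, [w[i] for w in cube]) for i in range(n)]

    def dist(u, v):
        # Norm of u - v in the l_q direct sum of two copies of L_q(cube).
        total = sum(sum(abs(a - b) ** q for a, b in zip(uc, vc)) / len(cube)
                    for uc, vc in zip(u, v))
        return total ** (1.0 / q)

    cross = sum(dist(x, y) ** p for x in r for y in rho) / n ** 2
    within = (sum(dist(x, xp) ** p for x in r for xp in r)
              + sum(dist(y, yp) ** p for y in rho for yp in rho)) / n ** 2
    return cross, within
```

The first returned value is $2^{p/q}$ and the second is $(1-1/n)2^{1+p(q-1)/q}$, so their quotient $(1-1/n)2^{1+p(q-2)/q}$ is a lower bound for $\boldsymbol{\mathcal{r}}_{p}(L_{q})$.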

The bound (13) of Theorem 1.2 coincides with (16). When $p\leqslant q\leqslant 2$ , we have $C(p,q)=1$ , $c(p,q)=p-1$ and thus $\boldsymbol{\mathcal{m}}_{p}(L_{q})\leqslant 2^{2-p}$ . It therefore remains to check that $\boldsymbol{\mathcal{b}}_{p}(L_{q})\geqslant 2^{2-p}$ when $p\leqslant q\leqslant 2$ . In fact, $\boldsymbol{\mathcal{b}}_{p}(F)\geqslant 2^{2-p}$ for every $p\geqslant 1$ and every Banach space $(F,\|\cdot\|_{F})$ . Indeed, fix distinct $a,b\in F$ . Let $X,Y$ be independent and uniformly distributed over $\{a,b\}$ . Then

[TABLE]

where the penultimate step is an application of the convexity of $\|\cdot\|_{F}^{p}$ . ∎
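For a concrete instance of the two-point lower bound, take $F={\mathbb{R}}$ , $a=0$ and $b=1$. The sketch below approximates the infimum over $z$ by a grid search; the grid, its range and the function name are assumptions made for illustration.

```python
def two_point_ratio(p, grid=2001):
    """Approximate inf_z (|z|^p + |1-z|^p) / E|X-Y|^p on the real line,
    for X, Y independent and uniform on {0, 1}; z ranges over a grid in [-1, 2]."""
    candidates = [-1.0 + 3.0 * i / (grid - 1) for i in range(grid)]
    best = min(abs(z) ** p + abs(1.0 - z) ** p for z in candidates)
    return best / 0.5  # E|X-Y|^p = 1/2 since |X-Y| is 0 or 1 with equal probability
```

For $p\geqslant 1$ the minimum is attained at $z=\frac{1}{2}$, which lies on the grid, and the function returns $2^{2-p}$ up to rounding.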

Remark 3.1.

Fix $n\in\mathbb{N}$ . Following [7], for $a=(a_{1},\ldots,a_{2n})\in{\mathbb{C}}^{2n}$ denote by $\Re(a)=(\Re(a_{1}),\ldots,\Re(a_{2n}))\in{\mathbb{R}}^{2n}$ and $\Im(a)=(\Im(a_{1}),\ldots,\Im(a_{2n}))\in{\mathbb{R}}^{2n}$ the vectors of real parts and imaginary parts of the entries of $a$ , respectively. Let $\Lambda(a)\in[0,\infty)$ be the area of the parallelogram that is generated by $\Re(a)$ and $\Im(a)$ , i.e.,

[TABLE]

By [7, Lemma 5.2] there is a linear operator $\mathcal{C}:{\mathbb{C}}^{2n}\to\mathsf{M}_{2^{n}}({\mathbb{C}})$ from ${\mathbb{C}}^{2n}$ to the space of $2^{n}$ by $2^{n}$ complex matrices, such that for any $a\in{\mathbb{C}}^{2n}$ the Schatten-1 norm of the matrix $\mathcal{C}(a)$ satisfies

[TABLE]

Let $e_{1},\ldots,e_{2n}\in{\mathbb{C}}^{2n}$ be the standard basis of ${\mathbb{C}}^{2n}$ and define $2n$ matrices $x_{1},\ldots,x_{n},y_{1},\ldots,y_{n}\in\mathsf{M}_{2^{n}}({\mathbb{C}})$ by $x_{k}=\mathcal{C}(e_{k})$ and $y_{k}=\mathcal{C}(ie_{n+k})$ for $k\in\{1,\ldots,n\}$ . By (42) we have $\|x_{j}-x_{k}\|_{\mathsf{S}_{1}}=\|y_{j}-y_{k}\|_{\mathsf{S}_{1}}=\sqrt{2}$ for distinct $j,k\in\{1,\ldots,n\}$ , while $\|x_{j}-y_{k}\|_{\mathsf{S}_{1}}=1$ for all $j,k\in\{1,\ldots,n\}$ . Hence, if we let $X$ and $Y$ be independent and distributed uniformly over $\{x_{1},\ldots,x_{n}\}$ and $\{y_{1},\ldots,y_{n}\}$ , respectively, and $X^{\prime},Y^{\prime}$ are independent copies of $X,Y$ , respectively, then for every $p\geqslant 1$ we have

[TABLE]

By letting $n\to\infty$ , this implies that $\boldsymbol{\mathcal{r}}_{p}(\mathsf{S}_{1})\geqslant 2^{\frac{p}{2}+1}$ . In particular, $\boldsymbol{\mathcal{r}}_{1}(\mathsf{S}_{1})\geqslant 2\sqrt{2}$ .

Remark 3.2.

Fix $q\geqslant 1$ . Let $(F,\|\cdot\|_{\!F})$ be a Banach space. Assume that $F$ has a linear subspace $G\subseteq F$ that is isometric to $L_{q}$ (or the Schatten–von Neumann trace class $\mathsf{S}_{q}$ ). If $X,Y\in L_{p}(G)$ are i.i.d. random variables taking values in $G$ , then for $c(p,q)$ as in (11), by Theorem 1.2 we have

[TABLE]

We note that this inequality is optimal despite the fact that the infimum is now taken over $z$ in the larger super-space $F$ . Indeed, in the proof of Theorem 1.2 the random variables that established optimality of $c(p,q)$ were symmetric when $p,q$ belong to the first three ranges that appear in (11). In these cases, by the convexity of $\|\cdot\|_{\!F}^{p}$ , the infimum in the right hand side of (43) is attained at $z=0\in G$ . The fact that the term $2^{c(p,q)}$ in the right hand side of (43) cannot be replaced by any value greater than $2$ needs the following separate treatment. If $\varepsilon\in(0,1)$ and ${\mathbb{P}}[X=v]=\varepsilon=1-{\mathbb{P}}[X=0]$ for some $v\in G$ with $\|v\|_{\!F}=1$ , then ${\mathbb{E}}\left[\|X-Y\|_{\!F}^{p}\right]=2\varepsilon(1-\varepsilon)$ . Next, for any $z\in F$ we have

[TABLE]

where the final step follows by elementary calculus. Therefore,

[TABLE]

Remark 3.3.

An extrapolation theorem of Pisier [18] asserts that if $(F,\|\cdot\|_{\!F})$ is a Banach lattice that is both $p$ -convex with constant $1$ and $q$ -concave with constant $1$ , where $\frac{1}{p}+\frac{1}{q}=1$ , then there exists a Banach lattice $W$ , a Hilbert space $H$ , and $\theta\in(0,1]$ such that $F$ is isometric to the complex interpolation space $[W,H]_{\theta}$ . Hence, Theorem 1.5 applies in this setting, implying in particular that there is $r\in[1,\infty)$ , namely $r=\frac{2}{\theta}$ , such that every i.i.d. $F$ -valued random variables $X,Y\in L_{r}(F)$ satisfy

[TABLE]

We will conclude by discussing further bounds in the non-convex range $p<1$ , as well as their limit when $p\to 0^{+}$ . When $p\in(0,1)$ , the topological vector space $L_{p}$ is not a normed space. Despite this, when we say that a normed space $(F,\|\cdot\|_{\!F})$ admits a linear isometric embedding into $L_{p}$ we mean (as usual) that there exists a linear mapping $T:F\to L_{p}$ such that $\|Tx\|_{p}=\|x\|_{\!F}$ for all $x\in F$ . This of course forces the $L_{p}$ quasi-norm to induce a metric on the image of $T$ , so the use of the term “isometric” is not out of place here, though note that it is inconsistent with the standard metric on $L_{p}$ , which is given by $\|f-g\|_{p}^{p}$ for all $f,g\in L_{p}$ . The following proposition treats the case $p\in(0,2]$ , though later we will mainly be interested in the non-convex range $p\in(0,1)$ . Note that the case $p=1$ implies the stated inequalities for, say, any two-dimensional normed space, since any such space admits [5] an isometric embedding into $L_{1}$ .

Proposition 3.4.

Let $(F,\|\cdot\|_{\!F})$ be a Banach space that admits an isometric linear embedding into $L_{p}$ for some $p\in(0,2]$ . Let $X,X^{\prime},Y,Y^{\prime}\in L_{p}(F)$ be independent $F$ -valued random vectors such that $X^{\prime}$ has the same distribution as $X$ and $Y^{\prime}$ has the same distribution as $Y$ . Then,

[TABLE]

and

[TABLE]

The constants $2$ and $\min\left\{2,2^{2-p}\right\}$ in (44) and (45), respectively, cannot be improved.

Proof.

By [19, 6] there is a mapping $\mathfrak{s}:F\to L_{2}$ such that $\|\mathfrak{s}(x)-\mathfrak{s}(y)\|_{2}=\|x-y\|_{\!F}^{\!\frac{p}{2}}$ for all $x,y\in F$ . By the (trivial) Hilbertian case $p=q=2$ of Theorem 1.2 applied to the $L_{2}$ -valued random vectors $\mathfrak{s}(X),\mathfrak{s}(Y)$ ,

[TABLE]

This substantiates (44). When $p<1$ we cannot proceed from here to prove (45) by considering the analogue of the mixture constant $\boldsymbol{\mathcal{m}}(\cdot)$ , namely by bounding the left hand side of (5) as we did in the Introduction, since the present $L_{p}$ integrability assumption on $X,Y$ does not imply that ${\mathbb{E}}[X]$ and ${\mathbb{E}}[Y]$ are well-defined elements of $F$ . Instead, let $Z^{\prime}$ be independent of $X,Y$ and distributed according to the mixture of the laws of $X$ and $Y$ , as in (6). The point $z\in F$ will be chosen randomly according to $Z^{\prime}$ , i.e.,

[TABLE]

For $p\geqslant 1$ we have $\boldsymbol{\mathcal{r}}_{p}(F)\leqslant 2$ by (44), and $\boldsymbol{\mathcal{j}}_{p}(F)\geqslant\boldsymbol{\mathcal{j}}_{p}(L_{p})=2^{p-1}$ by Theorem 1.2, so $\boldsymbol{\mathcal{b}}_{p}(F)\leqslant 2^{2-p}$ , by (10).

The sharpness of (44) is seen by taking $X$ and $Y$ to be identically distributed. When $p\geqslant 1$ , we already saw in the proof of Theorem 1.2 that $\boldsymbol{\mathcal{b}}_{p}(F)\geqslant 2^{2-p}$ for any Banach space $F$ ; thus (45) is sharp in this range. The same reasoning as in the proof of Theorem 1.2 shows that the factor $2$ in (45) cannot be improved in the non-convex range $p\in(0,1)$ as well. Indeed, fix $v$ with $\|v\|_{\!F}=1$ and let $X$ and $Y$ be uniformly distributed over $\{0,v\}$ . Then, ${\mathbb{E}}\left[\|X-z\|_{\!F}^{p}+{\mathbb{E}}\|Y-z\|_{\!F}^{p}\right]=\|z\|_{\!F}^{p}+\|v-z\|_{\!F}^{p}\geqslant(\|z\|_{\!F}+\|v-z\|_{\!F})^{p}\geqslant\|v\|_{\!F}^{p}=1$ for every $z\in F$ , while ${\mathbb{E}}\left[\|X-Y\|_{\!F}^{p}\right]=\frac{1}{2}\|v\|_{\!F}^{p}=\frac{1}{2}$ . ∎

Proposition 3.5 below is the limit of Proposition 3.4 as $p\to 0^{+}$ . While it is possible to deduce it formally from Proposition 3.4 by passing to the limit, a justification of this fact is quite complicated due to the singularity of the logarithm at zero. We will instead proceed via a shorter alternative approach.

Following [13], a real Banach space $(F,\|\cdot\|_{\!F})$ is said to admit a linear isometric embedding into $L_{0}$ if there exists a probability space $(\Omega,\mu)$ and a linear operator $T:F\to\mathsf{Meas}(\Omega,\mu)$ , where $\mathsf{Meas}(\Omega,\mu)$ denotes the space of (equivalence classes of) real-valued $\mu$ -measurable functions on $\Omega$ , such that

[TABLE]

As shown in [13], every three-dimensional real normed space admits a linear isometric embedding into $L_{0}$ , so in particular the following proposition applies to any such space.

Proposition 3.5.

Let $(F,\|\cdot\|_{\!F})$ be a real Banach space that admits a linear isometric embedding into $L_{0}$ . Let $X,X^{\prime},Y,Y^{\prime}$ be independent $F$ -valued random vectors such that $X^{\prime}$ has the same distribution as $X$ and $Y^{\prime}$ has the same distribution as $Y$ . Assume that ${\mathbb{E}}\left[\log(1+\|X\|_{\!F})\right]<\infty$ and ${\mathbb{E}}\left[\log(1+\|Y\|_{\!F})\right]<\infty$ . Then,

[TABLE]

and

[TABLE]

The multiplicative constant $1$ in both of these inequalities is optimal.

Proof.

(49) is a consequence of (48) by reasoning analogously to (46). Due to the assumed representation (47), by Fubini’s theorem it suffices to prove (48) for real-valued random variables.

So, suppose that $X,Y$ are independent real-valued random variables such that ${\mathbb{E}}\left[\log(1+|X|)\right]<\infty$ and ${\mathbb{E}}\left[\log(1+|Y|)\right]<\infty$ . Note that every nonnegative random variable $W$ with ${\mathbb{E}}\left[\log(1+W)\right]<\infty$ satisfies

[TABLE]

Indeed, for every $a,b\in[0,\infty)$ with $a\leqslant b$ we have

[TABLE]

so that (50) follows by applying this identity and the Fubini theorem separately on each of the events $\{W\geqslant 1\}$ and $\{W<1\}$ , taking advantage of the fact that $e^{-s}-e^{-sW}$ is of constant sign on both events.
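The representation of the logarithm used here appears to be of Frullani type. Assuming that (50) states ${\mathbb{E}}[\log W]=\int_{0}^{\infty}\big({e^{-s}-{\mathbb{E}}[e^{-sW}]}\big)\frac{{\mathrm{d}}s}{s}$ (the displayed formula is not reproduced here, so this form is an assumption), it can be checked numerically for a finitely supported positive $W$. The sketch below substitutes $s=e^{t}$ and applies the trapezoid rule; the function name, the truncation range and the step count are illustrative choices.

```python
import math

def log_moment_via_laplace(law, t_min=-40.0, t_max=6.0, steps=46000):
    """Trapezoid rule for int_0^inf (exp(-s) - E[exp(-s*W)]) ds / s after the
    substitution s = exp(t); `law` is a list of (value, probability) pairs
    describing a finitely supported W with strictly positive values."""
    h = (t_max - t_min) / steps

    def integrand(t):
        s = math.exp(t)
        return math.exp(-s) - sum(prob * math.exp(-s * w) for w, prob in law)

    total = 0.5 * (integrand(t_min) + integrand(t_max))
    total += sum(integrand(t_min + i * h) for i in range(1, steps))
    return total * h
```

For a point mass at $b>0$ this reduces to the classical identity $\log b=\int_{0}^{\infty}\frac{e^{-s}-e^{-bs}}{s}\,{\mathrm{d}}s$ .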

Let $Z,Z^{\prime}$ be independent random variables whose law is the mixture of the laws of $X,Y$ as in (6). The desired inequality (48) is equivalent to the assertion that ${\mathbb{E}}\left[\log(Z-Z^{\prime})^{2}\right]\leqslant{\mathbb{E}}\left[\log(X-Y)^{2}\right]$ . By two applications of (50), once with $W=(X-Y)^{2}$ and once with $W=(Z-Z^{\prime})^{2}$ , it suffices to prove that

[TABLE]

This is so because, using the formula for the Fourier transform of the Gaussian density, we have

[TABLE]

where (51) uses Fubini and the independence of $Z$ and $Z^{\prime}$ , (52) uses the fact that for all $a,b\in{\mathbb{C}}$ we have $|(a+b)/2|^{2}=|(a-b)/2|^{2}+\Re(a\overline{b})\geqslant\Re(a\overline{b})$ , the first step of (53) uses the independence of $X$ and $Y$ , and the last step of (53) uses once more the formula for the Fourier transform of the Gaussian density.

The fact that (48) is sharp follows by considering the case when $X,Y$ are i.i.d. and non-atomic. Note that when both $X$ and $Y$ have an atom at the same point, both sides of (49) equal [math]. The example considered in the proof of Proposition 3.4 when $p>0$ is therefore of no use for establishing the optimality of (49), due to the atomic nature of the distributions under consideration. Instead, for an arbitrary $v\in F$ such that $\|v\|_{\!F}=1$, let us consider random vectors $X=(\cos\Theta)v$ and $Y=(\cos\Theta^{\prime})v$, where $\Theta$ and $\Theta^{\prime}$ are independent random variables uniformly distributed on $[0,2\pi]$.

Observe that for every $\alpha\in{\mathbb{R}}$ we have

[TABLE]

where the last step of (54) holds because, by periodicity, $\left|\sin\left(\frac{\Theta\pm\alpha}{2}\right)\right|$ has the same distribution as $\left|\cos\Theta\right|$.
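The computation behind a display of this kind can be sketched as follows, assuming that (54) evaluates ${\mathbb{E}}\left[\log\left|\cos\Theta-\cos\alpha\right|\right]$ through the sum-to-product formula; the original display may be arranged differently.

```latex
\begin{align*}
\cos\Theta-\cos\alpha
  &=-2\sin\left(\frac{\Theta+\alpha}{2}\right)\sin\left(\frac{\Theta-\alpha}{2}\right),\\
{\mathbb{E}}\left[\log\left|\cos\Theta-\cos\alpha\right|\right]
  &=\log 2
   +{\mathbb{E}}\left[\log\left|\sin\left(\frac{\Theta+\alpha}{2}\right)\right|\right]
   +{\mathbb{E}}\left[\log\left|\sin\left(\frac{\Theta-\alpha}{2}\right)\right|\right]
   =\log 2+2\,{\mathbb{E}}\left[\log\left|\cos\Theta\right|\right].
\end{align*}
```

All expectations here are finite because $\log|\sin|$ is integrable near its zeros.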

The case $\alpha=\frac{\pi}{2}$ of (54) simplifies to give ${\mathbb{E}}\left[\log\left|\cos\Theta\right|\right]=-\log 2$. Hence, (54) becomes

[TABLE]

Consequently,

[TABLE]

Indeed, if $t\in[-1,1]$, then one can write $t=\cos\alpha$ for some $\alpha\in{\mathbb{R}}$, so that by (55) the inequality in (56) holds as equality. If $|t|>1$, then $\left|\cos\theta-t\right|\geqslant\left|\cos\theta-\mathrm{sign}(t)\right|$ for all $\theta\in[0,2\pi]$, thus implying (56). It also follows from (55) that

[TABLE]
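The values discussed above can be checked numerically: ${\mathbb{E}}\left[\log\left|\cos\Theta-t\right|\right]$ should be close to $-\log 2$ for $t\in[-1,1]$ and at least $-\log 2$ for $|t|>1$. The following is a rough sketch; the function name, the grid size and the test points are illustrative assumptions, and the midpoint rule is used only because it avoids evaluating the integrand exactly at its logarithmic singularities.

```python
import math

def mean_log_abs_cos_minus(t, n=200_000):
    """Midpoint-rule approximation of E[log|cos(Theta) - t|],
    where Theta is uniform on [0, 2*pi].

    For |t| <= 1 the integrand has integrable logarithmic singularities,
    so this is only a rough numerical check, not a precise quadrature.
    """
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h  # midpoint of the k-th cell
        total += math.log(abs(math.cos(theta) - t))
    return total / n

target = -math.log(2.0)
for t in (0.0, 0.5, -0.3, 1.0, 1.5, -2.0):
    value = mean_log_abs_cos_minus(t)
    print(f"t={t:+.1f}: approx {value:+.5f}  (-log 2 = {target:+.5f})")
```

For $|t|>1$ the integrand is smooth, so the printed values there are accurate to many digits, while for $|t|\leqslant 1$ agreement with $-\log 2$ should be expected only up to a small discretization error.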

Next, by the Hahn–Banach theorem, take $\varphi\in F^{*}$ such that $\|\varphi\|_{F^{*}}=1$ and $\varphi(v)=\|v\|_{\!F}=1$. For any $z\in F$,

[TABLE]

This implies the asserted sharpness of (49). Note that the above argument, which shows that (49) cannot hold with a multiplicative constant less than $1$ on the right-hand side, works for any Banach space $F$ whatsoever. ∎

Acknowledgements

We are grateful to Oded Regev for pointing us to [7, Lemma 5.2] and for significantly simplifying our initial reasoning for the statement that is proved in Remark 3.1.

Bibliography


[1] A. Andoni, A. Naor, and O. Neiman. Snowflake universality of Wasserstein spaces. Ann. Sci. Éc. Norm. Supér. (4), 51(3):657–700, 2018.
[2] Y. Bartal, N. Linial, M. Mendel, and A. Naor. Some low distortion metric Ramsey problems. Discrete Comput. Geom., 33(1):27–41, 2005.
[3] Y. Benyamini and J. Lindenstrauss. Geometric nonlinear functional analysis. Vol. 1, volume 48 of American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, RI, 2000.
[4] J. Bergh and J. Löfström. Interpolation spaces. An introduction. Springer-Verlag, Berlin-New York, 1976. Grundlehren der Mathematischen Wissenschaften, No. 223.
[5] E. D. Bolker. A class of convex bodies. Trans. Amer. Math. Soc., 145:323–345, 1969.
[6] J. Bretagnolle, D. Dacunha-Castelle, and J.-L. Krivine. Lois stables et espaces $L^{p}$. Ann. Inst. H. Poincaré Sect. B (N.S.), 2:231–259, 1965/1966.
[7] J. Briët, O. Regev, and R. Saket. Tight hardness of the non-commutative Grothendieck problem. Theory Comput., 13:Paper No. 15, 24, 2017.
[8] A.-P. Calderón. Intermediate spaces and interpolation, the complex method. Studia Math., 24:113–190, 1964.
