"The Capacity of the Relay Channel": Solution to Cover's Problem in the   Gaussian Case

Xiugang Wu; Leighton Pate Barnes; Ayfer Ozgur

arXiv:1701.02043·cs.IT·October 9, 2018


TL;DR

This paper solves Cover's long-standing open problem in the Gaussian case: the relay channel capacity $C(C_{0})$ cannot reach $C(\infty)$ at any finite relay-link capacity $C_{0}$. The proof uses a new high-dimensional geometric approach.

Contribution

The paper introduces a novel geometric method to bound the capacity of Gaussian relay channels, providing a definitive answer to Cover's problem.

Findings

1. In the Gaussian case, $C(C_{0})$ cannot equal $C(\infty)$ at any finite relay-link capacity $C_{0}$.

2. Develops an extension of the classical isoperimetric inequality on a high-dimensional sphere.

3. Provides a new upper bound on the capacity of the relay channel.

Abstract

Consider a memoryless relay channel, where the relay is connected to the destination with an isolated bit pipe of capacity $C_{0}$. Let $C(C_{0})$ denote the capacity of this channel as a function of $C_{0}$. What is the critical value of $C_{0}$ such that $C(C_{0})$ first equals $C(\infty)$? This is a long-standing open problem posed by Cover and named "The Capacity of the Relay Channel," in Open Problems in Communication and Computation, Springer-Verlag, 1987. In this paper, we answer this question in the Gaussian case and show that $C(C_{0})$ cannot equal $C(\infty)$ unless $C_{0}=\infty$, regardless of the SNR of the Gaussian channels. This result follows as a corollary to a new upper bound we develop on the capacity of this channel. Instead of "single-letterizing" expressions involving information measures in a high-dimensional space as is typically done in converse results in…

Equations

$$P_{e}^{(n)}=\Pr\left(g_{n}(Y^{n},f_{n}(Z^{n}))\neq M\right),$$

$$p(m,y^{n},z^{n})=2^{-nR}\prod_{i=1}^{n}p(y_{i}|x_{i}(m))\prod_{i=1}^{n}p(z_{i}|x_{i}(m)).$$

$$Z=X+W_{1}$$

$$Y=X+W_{2}$$

$$\|x^{n}(m)\|^{2}\leq nP,\quad\forall m\in[1:2^{nR}],$$

$$C(\infty)=\frac{1}{2}\log\left(1+\frac{2P}{N}\right).$$

$$C_{0}^{*}:=\inf\{C_{0}:C(C_{0})=C(\infty)\}.$$

$$C_{0}^{*}\leq\infty.$$

$$C_{0}^{*}\geq\frac{1}{2}\log\left(1+\frac{2P}{N}\right)-\frac{1}{2}\log\left(1+\frac{P}{N}\right).$$

$$C(C_{0})\leq\frac{1}{2}\log\left(1+\frac{P}{N}\right)+\sup_{\theta\in\left[\arcsin(2^{-C_{0}}),\frac{\pi}{2}\right]}\min\left\{C_{0}+\log\sin\theta,\ \min_{\omega\in\left(\frac{\pi}{2}-\theta,\frac{\pi}{2}\right]}h_{\theta}(\omega)\right\}$$

$$h_{\theta}(\omega):=\frac{1}{2}\log\left(\frac{4\sin^{2}\frac{\omega}{2}\left(P+N-N\sin^{2}\frac{\omega}{2}\right)\sin^{2}\theta}{(P+N)\left(\sin^{2}\theta-\cos^{2}\omega\right)}\right).$$

$$\mathbb{S}^{m-1}=\{\mathbf{z}\in\mathbb{R}^{m}:\|\mathbf{z}\|=R\},$$

$$\mu(\mathbb{S}^{m-1})=\frac{2\pi^{\frac{m}{2}}}{\Gamma\left(\frac{m}{2}\right)}R^{m-1},$$

$$\text{Cap}(\mathbf{z}_{0},\theta)=\{\mathbf{z}\in\mathbb{S}^{m-1}:\angle(\mathbf{z}_{0},\mathbf{z})\leq\theta\}.$$

$$\mu(A_{t})\geq\mu(C_{t}),\quad\forall t\geq 0,$$

$$A_{t}=\left\{\mathbf{z}\in\mathbb{S}^{m-1}:\min_{\mathbf{z}^{\prime}\in A}\angle(\mathbf{z},\mathbf{z}^{\prime})\leq t\right\},$$

$$C_{t}=\left\{\mathbf{z}\in\mathbb{S}^{m-1}:\min_{\mathbf{z}^{\prime}\in C}\angle(\mathbf{z},\mathbf{z}^{\prime})\leq t\right\}=\text{Cap}(\mathbf{z}_{0},\theta+t).$$

$$\mathbb{P}\left(\angle(\mathbf{z},\mathbf{Y})\in[\pi/2-\epsilon,\pi/2+\epsilon]\right)\geq 1-\delta,$$

$$\mathbb{P}\left(|\langle\mathbf{e}_{1}/R,\mathbf{Y}/R\rangle|\geq\mu\right)\leq\frac{1}{m\mu^{2}}.$$

$$\mathbb{P}\left(A_{\frac{\pi}{2}-\theta+\epsilon}\right)\geq 1-\epsilon.$$

$$\mathbb{P}\left(\mu\left(A\cap\text{Cap}\left(\mathbf{Y},\tfrac{\pi}{2}-\theta+\epsilon\right)\right)>0\right)>1-\epsilon,$$

$$\mathbb{P}\left(\mu\left(A\cap\text{Cap}(\mathbf{Y},\omega+\epsilon)\right)>(1-\epsilon)V\right)\geq 1-\epsilon,$$

$$\mathbb{L}^{m}=\{\mathbf{y}\in\mathbb{R}^{m}:R_{L}\leq\|\mathbf{y}\|\leq R_{U}\}$$

$$\angle(\mathbf{y},\mathbf{z})=\arccos\left(\frac{\mathbf{y}\cdot\mathbf{z}}{\|\mathbf{y}\|\|\mathbf{z}\|}\right)$$

$$\text{ShellCap}(\mathbf{z}_{0},\theta)=\{\mathbf{z}\in\mathbb{L}^{m}:\angle(\mathbf{z}_{0},\mathbf{z})\leq\theta\}.$$

$$\mathbb{P}\left(|A\cap\text{ShellCap}(\mathbf{Y},\omega+\epsilon)|>(1-\epsilon)V\right)\geq 1-\epsilon,$$

$$H(I_{n}|Y^{n})\leq n\cdot\min_{\omega\in\left(\frac{\pi}{2}-\theta_{n},\frac{\pi}{2}\right]}\frac{1}{2}\log\left(\frac{4\sin^{2}\frac{\omega}{2}\left(P+N-N\sin^{2}\frac{\omega}{2}\right)}{(P+N)\left(\sin^{2}\theta_{n}-\cos^{2}\omega\right)}\right).$$

$$\{(I_{n}(b),Z^{n}(b),X^{n}(b),Y^{n}(b))\}_{b=1}^{B},$$

$$\text{Shell}(\mathbf{c},r_{1},r_{2}):=\{\mathbf{a}\in\mathbb{R}^{nB}:r_{1}\leq\|\mathbf{a}-\mathbf{c}\|\leq r_{2}\},$$

$$\text{Ball}(\mathbf{c},r):=\{\mathbf{a}\in\mathbb{R}^{nB}:\|\mathbf{a}-\mathbf{c}\|\leq r\}.$$


Full text

“The Capacity of the Relay Channel”:

Solution to Cover’s Problem in the Gaussian Case

Xiugang Wu, Leighton Pate Barnes, and Ayfer Özgür

The work was supported in part by NSF award CCF-1704624 and by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-0939370. This paper was presented in part at the 2016 Allerton Conference on Communication, Control, and Computing [1].

X. Wu is with the Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA (e-mail: [email protected]). The work of X. Wu was done when he was with Stanford University.

L. P. Barnes and A. Özgür are with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA (e-mail: [email protected]; [email protected]).

Abstract

Consider a memoryless relay channel, where the relay is connected to the destination with an isolated bit pipe of capacity $C_{0}$. Let $C(C_{0})$ denote the capacity of this channel as a function of $C_{0}$. What is the critical value of $C_{0}$ such that $C(C_{0})$ first equals $C(\infty)$? This is a long-standing open problem posed by Cover and named “The Capacity of the Relay Channel,” in Open Problems in Communication and Computation, Springer-Verlag, 1987. In this paper, we answer this question in the Gaussian case and show that $C(C_{0})$ cannot equal $C(\infty)$ unless $C_{0}=\infty$, regardless of the SNR of the Gaussian channels. This result follows as a corollary to a new upper bound we develop on the capacity of this channel. Instead of “single-letterizing” expressions involving information measures in a high-dimensional space as is typically done in converse results in information theory, our proof directly quantifies the tension between the pertinent $n$-letter forms. This is done by translating the information tension problem to a problem in high-dimensional geometry. As an intermediate result, we develop an extension of the classical isoperimetric inequality on a high-dimensional sphere, which can be of interest in its own right.

Index Terms:

Relay channel, capacity, information inequality, geometry, isoperimetric inequality, concentration of measure

I Problem Setup and Main Result

In 1987, Thomas M. Cover formulated a seemingly simple question in Open Problems in Communication and Computation, Springer-Verlag [2], which he called “The Capacity of the Relay Channel”. This problem, not much longer than a single page in [2], remains open to date. His problem statement, taken verbatim from [2] with only a few minor notation changes, is as follows:

The Capacity of the Relay Channel

Consider the following seemingly simple discrete memoryless relay channel:

Here $Z$ and $Y$ are conditionally independent and conditionally identically distributed given $X$ , that is, $p(z,y|x)=p(z|x)p(y|x)$ . Also, the channel from $Z$ to $Y$ does not interfere with $Y$ . A $(2^{nR},n)$ code for this channel is a map $X^{n}:[1:2^{nR}]\to\mathcal{X}^{n}$ , a relay function $f_{n}:\mathcal{Z}^{n}\to[1:2^{nC_{0}}]$ and a decoding function $g_{n}:\mathcal{Y}^{n}\times[1:2^{nC_{0}}]\to[1:2^{nR}]$ . The probability of error is given by

$$P_{e}^{(n)}=\Pr\left(g_{n}(Y^{n},f_{n}(Z^{n}))\neq M\right),$$

where the message $M$ is uniformly distributed over $[1:2^{nR}]$ and

$$p(m,y^{n},z^{n})=2^{-nR}\prod_{i=1}^{n}p(y_{i}|x_{i}(m))\prod_{i=1}^{n}p(z_{i}|x_{i}(m)).$$

Let $C(C_{0})$ be the supremum of achievable rates $R$ for a given $C_{0}$ , that is, the supremum of the rates $R$ for which $P_{e}^{(n)}$ can be made to tend to zero. We note the following facts:

1. $C(0)=\sup_{p(x)}I(X;Y).$

2. $C(\infty)=\sup_{p(x)}I(X;Y,Z).$

3. $C(C_{0})$ is a nondecreasing function of $C_{0}$.

What is the critical value of $C_{0}$ such that $C(C_{0})$ first equals $C(\infty)$?

I-A Main Result

As is customary in network information theory, Cover formulates the problem for discrete memoryless channels. However, the same question clearly applies to channels with continuous input and output alphabets, and in particular when the channels from the source to the relay and the destination are Gaussian, which is the canonical model for wireless relay channels. More formally, assume

$$Z=X+W_{1},\qquad Y=X+W_{2},$$

with the transmitted signal being constrained to average power $P$ , i.e.,

$$\|x^{n}(m)\|^{2}\leq nP,\quad\forall m\in[1:2^{nR}],$$

and $W_{1},W_{2}\sim\mathcal{N}(0,N)$ representing Gaussian noises that are independent of each other and $X$ . See Fig. 1.

For this Gaussian relay channel, it is easy to observe that (all logarithms throughout the paper are to base two)

$$C(\infty)=\frac{1}{2}\log\left(1+\frac{2P}{N}\right).$$

Let $C_{0}^{*}$ denote the threshold in Cover’s problem, i.e.

$$C_{0}^{*}:=\inf\{C_{0}:C(C_{0})=C(\infty)\}.$$

For the Gaussian model, no known scheme achieves $C(\infty)$ at a finite $C_{0}$, regardless of the parameters of the channels, i.e. the signal-to-noise power ratio (SNR) $P/N$. Therefore, from an achievability perspective we only have the trivial bound

$$C_{0}^{*}\leq\infty.$$

On the converse side, any upper bound on the capacity of this channel can be used to establish a lower bound on $C_{0}^{*}$ . The only upper bound on the capacity of this channel (prior to our work in [5]–[6] preceding the current paper) was the celebrated cut-set bound developed by Cover and El Gamal in 1979 [10]. It yields the following lower bound on $C_{0}^{*}$ :

$$C_{0}^{*}\geq\frac{1}{2}\log\left(1+\frac{2P}{N}\right)-\frac{1}{2}\log\left(1+\frac{P}{N}\right).$$

Note that the cut-set bound does not preclude achieving $C(\infty)$ at finite $C_{0}$. Moreover, it is interesting to note that as $P/N$ decreases to zero, this lower bound decreases to zero. This implies a sharp dichotomy between the current achievability and converse results for this problem, which becomes even more apparent in the limit when SNR goes to zero: the cut-set bound does not preclude achieving $C(\infty)$ at diminishing $C_{0}$ if $C(\infty)$ itself is diminishing, while from an achievability perspective we need $C_{0}=\infty$ regardless of the SNRs of the channels (apart from the trivial case when $P/N$ is exactly equal to $0$). The main result of our paper is to show that $C_{0}^{*}=\infty$ regardless of the parameters of the problem, answering Cover’s long-standing question for the canonical Gaussian model.
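To make the behavior of the cut-set lower bound concrete, the following minimal Python sketch evaluates it for a few SNR values. The function names are illustrative and not from the paper; $C(0)$ is taken to be the standard point-to-point Gaussian capacity $\frac{1}{2}\log(1+P/N)$, which is the second term of the cut-set expression above.

```python
import math

def capacity_no_relay(P, N):
    """C(0) in bits: point-to-point Gaussian capacity 0.5*log2(1 + P/N)."""
    return 0.5 * math.log2(1 + P / N)

def capacity_infinite_link(P, N):
    """C(inf) in bits: the destination effectively sees both Y and Z."""
    return 0.5 * math.log2(1 + 2 * P / N)

def cutset_threshold(P, N):
    """Cut-set lower bound on C_0^*: the gap C(inf) - C(0)."""
    return capacity_infinite_link(P, N) - capacity_no_relay(P, N)

if __name__ == "__main__":
    N = 1.0
    for snr in (100.0, 1.0, 0.01):
        gap = cutset_threshold(snr * N, N)
        print(f"P/N = {snr:>6}: cut-set lower bound on C_0^* = {gap:.4f} bits")
```

The gap equals $\frac{1}{2}\log\frac{1+2P/N}{1+P/N}$, so it shrinks to zero as $P/N\to 0$ and never exceeds half a bit, which is why the cut-set bound alone cannot rule out a finite threshold.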

Theorem I.1

For the symmetric Gaussian relay channel depicted in Fig. 1, $C^{*}_{0}=\infty$ .

This theorem follows immediately from the following theorem which establishes a new upper bound on the capacity of this channel for any $C_{0}$ .

Theorem I.2

For the symmetric Gaussian relay channel depicted in Fig. 1, the capacity $C(C_{0})$ satisfies

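$$C(C_{0})\leq\frac{1}{2}\log\left(1+\frac{P}{N}\right)+\sup_{\theta\in\left[\arcsin(2^{-C_{0}}),\frac{\pi}{2}\right]}\min\left\{C_{0}+\log\sin\theta,\ \min_{\omega\in\left(\frac{\pi}{2}-\theta,\frac{\pi}{2}\right]}h_{\theta}(\omega)\right\}$$

where

$$h_{\theta}(\omega):=\frac{1}{2}\log\left(\frac{4\sin^{2}\frac{\omega}{2}\left(P+N-N\sin^{2}\frac{\omega}{2}\right)\sin^{2}\theta}{(P+N)\left(\sin^{2}\theta-\cos^{2}\omega\right)}\right).$$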

In Fig. 2 we plot this upper bound (label: New bound) under three different SNR values of the Gaussian channels, together with the cut-set bound [10] and an upper bound on the capacity of this channel we have previously derived in [6] (label: Old bound). For reference, we also provide the rate achieved by a compress-and-forward relay strategy (label: C-F), which employs Gaussian input distribution at the source combined with Gaussian quantization and Wyner-Ziv binning at the relay. (In the low SNR regime, we can achieve higher rates using bursty compress-and-forward [21], as demonstrated in the left-most plot of Fig. 2. Note that since we still impose the Gaussian restriction on the input and quantization distributions for bursty compress-and-forward, the resultant rates are not concave in $C_{0}$ and can be further improved by time sharing.) The flat levels at which the cut-set bound and our old bound saturate in these plots precisely correspond to $C(\infty)$. Note that while these earlier bounds reach $C(\infty)$ at finite $C_{0}$ values, hence leading to finite lower bounds on $C_{0}^{*}$, our new bound remains bounded away from $C(\infty)$ in all three plots. Indeed, the new bound remains bounded away from $C(\infty)$ (the flat level in the plots) at any finite $C_{0}$ value; we show this formally in the proof of Theorem I.1.
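The bound of Theorem I.2 can be explored numerically. The sketch below is a coarse grid search, not the authors' code: it assumes the leading term of the bound is the direct-link capacity $\frac{1}{2}\log(1+P/N)$, and the names `h_theta` and `new_upper_bound` as well as the grid resolution are illustrative choices. Because the supremum and the inner minimum are both approximated on a grid, the returned value is only an approximation of the bound.

```python
import math

def h_theta(theta, omega, P, N):
    """h_theta(omega) from Theorem I.2, in bits; inf outside its domain."""
    s = math.sin(omega / 2) ** 2
    t = math.sin(theta) ** 2
    den = (P + N) * (t - math.cos(omega) ** 2)
    if den <= 0.0:  # requires omega > pi/2 - theta
        return math.inf
    num = 4 * s * (P + N - N * s) * t
    return 0.5 * math.log2(num / den)

def new_upper_bound(C0, P, N, grid=200):
    """Grid approximation of the Theorem I.2 upper bound on C(C0), in bits."""
    theta_lo = math.asin(2.0 ** (-C0))
    best = -math.inf
    for i in range(grid + 1):
        theta = theta_lo + (math.pi / 2 - theta_lo) * i / grid
        # omega ranges over (pi/2 - theta, pi/2]; k starts at 1 to stay inside.
        inner = min(
            h_theta(theta, (math.pi / 2 - theta) + theta * k / grid, P, N)
            for k in range(1, grid + 1)
        )
        best = max(best, min(C0 + math.log2(math.sin(theta)), inner))
    return 0.5 * math.log2(1 + P / N) + best

if __name__ == "__main__":
    P, N = 1.0, 1.0
    c_inf = 0.5 * math.log2(1 + 2 * P / N)
    for C0 in (0.5, 1.0, 2.0):
        b = new_upper_bound(C0, P, N)
        print(f"C0 = {C0}: bound = {b:.4f}, C(inf) = {c_inf:.4f}")
```

For moderate $C_{0}$ the computed value stays strictly below $C(\infty)$; the gap narrows as $C_{0}$ grows, so a finer grid is needed to resolve it for large $C_{0}$.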

While in this paper we restrict our attention to the symmetric case, an assumption imposed by Cover in his original formulation of the problem given above, our methods and results also extend to the asymmetric case. In [8], we show that when the relay’s and the destination’s observations are corrupted by independent Gaussian noises of different variances, it is still true that $C_{0}^{*}=\infty$ regardless of the channel parameters. The extension to this asymmetric case heavily builds on the methods and results we develop in this paper for the symmetric case. Interestingly, the symmetric case, which Cover seems to somewhat arbitrarily assume in his problem formulation, turns out to be the canonical case for our proof technique. We also provide a solution to Cover’s problem for binary symmetric channels in [9] using a similar approach.

I-B Technical Approach

There are two basic aspects in an information-theoretic characterization of an operational problem: the so-called achievability result and converse result. An achievability result establishes what is possible in a given setting, while the converse result distinguishes what is impossible. The ideal situation is when these two results match, in which case an information limit is born. The most famous example goes back to Shannon and the inception of the field: Reliable communication is possible over a noisy channel if, and only if, the rate of transmission does not exceed the capacity of the channel [18].

Over the last two decades, there has been a significant leap forward in developing achievable schemes for multi-user problems, ranging from schemes based on interference alignment and distributed MIMO, to lattice-based techniques, to strategies inspired by network coding and linear deterministic models. This stands in fairly stark contrast to the set of converse arguments in the information theorist’s toolkit. Almost all converse arguments rely on a few fundamental tools that go back to the early years of the field: information measure calculus (e.g., chain rules, non-negativity of divergence), Fano’s inequality, and the entropy power inequality. The typical converse program follows from a clever application of these tools to “single-letterize” an expression involving information measures in a high-dimensional space (so-called $n$-letter forms), with the possible introduction of auxiliary random variables as needed.

In this paper, we take a different approach. Instead of focusing on single-letterizing pertinent $n$-letter forms, we aim to directly quantify the tension between them. To do this, we lift the problem to an even higher dimensional space and study the geometry of the typical sequences generated independently and identically (i.i.d.) from these $n$-dimensional distributions. We establish non-trivial geometric properties satisfied by these typical sequences, which are then translated to inequalities satisfied by the original $n$-dimensional information measures. This notion of “typicality”, connecting information measures associated with a distribution to probabilities of long i.i.d. sequences generated from this distribution, is a standard tool in establishing achievability results in information theory, but to the best of our knowledge it has rarely been used in proving converse results in network information theory, with only a few examples such as the work of Zhang [11] from 1988 and our recent works [3]–[7].

To study the geometry of the typical sequences, we use classical tools from high-dimensional geometry, such as the isoperimetric inequality [14], measure concentration [12], and rearrangement and symmetrization theory [13, 25]. We also prove a new geometric result which can be regarded as an extension of the classical isoperimetric inequality on a high-dimensional sphere and can be of interest in its own right. Note that the classical isoperimetric inequality on the sphere states that among all sets on the sphere with a given measure (area), the spherical cap has the smallest boundary or more generally the smallest neighborhood [16]. As an intermediate result in this paper, we show that the spherical cap not only minimizes the measure of its neighborhood, but roughly speaking, also minimizes the measure of its intersection with the neighborhood of a randomly chosen point on the sphere.

The incorporation of geometric insight in information theory is not new. Formulating the problem of determining the communication capacity of channels as a problem in high-dimensional geometry is indeed one of Shannon’s most important insights that has led to the conception of the field. In his classical paper “Communication in the presence of noise”, 1949 [17], Shannon develops a geometric representation of any point-to-point communication system, and then uses this geometric representation to derive the capacity formula for the AWGN channel. His converse proof is based on a sphere-packing argument, which relies on the notion of sphere hardening (i.e. measure concentration) in high-dimensional space. Our approach resembles Shannon’s approach in [17] in that the main argument in our proof is also a packing argument; however, instead of packing smaller spheres in a larger sphere, we pack (quantization) regions of some minimal measure (and unknown shape) inside a spherical cap. The key ingredient in our packing argument is the extended isoperimetric inequality we develop, which guarantees that each of these quantization regions has some minimal intersection with the spherical cap. Also, note that we do not directly study the geometry of the codewords as in [17], but rather use geometry in an indirect way to solve an $n$ -letter information tension problem.

I-C Organization of The Paper

The remainder of the paper is organized as follows. In Section II, we review some basic definitions and results for high-dimensional spheres, and state our main geometric result in Theorem II.2, which can be regarded as an extension of the classical isoperimetric inequality on the sphere. In Section III, we introduce some typicality lemmas and combine them with Theorem II.2 to prove a key information inequality stated in Theorem III.1. The proofs of our main theorems, Theorem I.1 and I.2, are almost immediate given Theorem III.1 and are provided in Section IV.

Appendices A and B are then devoted to the proof of Theorem II.2 and the proofs of the typicality lemmas introduced in Section III, respectively. The proofs of these typicality lemmas require us to derive formulas and exponential characterizations for the area/volume of various high dimensional sets including balls, spherical caps, shell caps, and intersections of such sets. We derive these characterizations in Appendix C.

II Geometry of High-Dimensional Spheres

In this section, we summarize some basic definitions and results for high-dimensional spheres and present our main geometric result which can be regarded as an extension of the classical isoperimetric inequality on high-dimensional spheres. This result is the key to proving the information inequality we present in the next section, which in turn is the key to proving Theorems I.1 and I.2.

II-A Basic Results on High-Dimensional Spheres

We now summarize some basic results on high-dimensional spheres that will be referred to later in the paper.

(i) Isoperimetric Inequality: Let $\mathbb{S}^{m-1}\subseteq\mathbb{R}^{m}$ denote the $(m-1)$-sphere of radius $R$, i.e.,

$$\mathbb{S}^{m-1}=\{\mathbf{z}\in\mathbb{R}^{m}:\|\mathbf{z}\|=R\},$$

equipped with the rotation invariant (Haar) measure $\mu=\mu_{m-1}$ that is normalized such that

$$\mu(\mathbb{S}^{m-1})=\frac{2\pi^{\frac{m}{2}}}{\Gamma\left(\frac{m}{2}\right)}R^{m-1},$$

i.e. the usual surface area. Let $\mathbb{P}(A)$ denote the probability of a set or event $A$ with respect to the corresponding Haar probability measure, i.e. the normalized Haar measure such that $\mathbb{P}(\mathbb{S}^{m-1})=1$ . A spherical cap is defined as a ball on $\mathbb{S}^{m-1}$ in the geodesic metric (or simply the angle) $\angle(\mathbf{z},\mathbf{y})=\arccos(\langle\mathbf{z}/R,\mathbf{y}/R\rangle)$ , i.e.,

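$$\text{Cap}(\mathbf{z}_{0},\theta)=\{\mathbf{z}\in\mathbb{S}^{m-1}:\angle(\mathbf{z}_{0},\mathbf{z})\leq\theta\}.$$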

See Fig. 3. We will often say that an arbitrary set $A\subseteq\mathbb{S}^{m-1}$ has an effective angle $\theta$ if $\mu(A)=\mu(C)$ , where $C=\text{Cap}(\mathbf{z}_{0},\theta)$ for some arbitrary $\mathbf{z}_{0}\in\mathbb{S}^{m-1}$ .

The following proposition is the so-called isoperimetric inequality, which was first proved by Lévy in 1951 [14]. (See also [16].) It states the intuitive fact that among all sets on the sphere with a given measure, the spherical cap has the smallest boundary, or more generally the smallest neighborhood. This is formalized as follows:

Proposition II.1

For any arbitrary set $A\subseteq\mathbb{S}^{m-1}$ such that $\mu(A)=\mu(C)$ , where $C=\text{Cap}(\mathbf{z}_{0},\theta)\subseteq\mathbb{S}^{m-1}$ is a spherical cap, it holds that

$$\mu(A_{t})\geq\mu(C_{t}),\quad\forall t\geq 0,$$

where $A_{t}$ is the $t$ -neighborhood of $A$ , defined as

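$$A_{t}=\left\{\mathbf{z}\in\mathbb{S}^{m-1}:\min_{\mathbf{z}^{\prime}\in A}\angle(\mathbf{z},\mathbf{z}^{\prime})\leq t\right\},$$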

and similarly

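$$C_{t}=\left\{\mathbf{z}\in\mathbb{S}^{m-1}:\min_{\mathbf{z}^{\prime}\in C}\angle(\mathbf{z},\mathbf{z}^{\prime})\leq t\right\}=\text{Cap}(\mathbf{z}_{0},\theta+t).$$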

(ii) Measure Concentration: Measure concentration on the sphere refers to the fact that most of the measure of a high-dimensional sphere is concentrated around any equator. The following elementary result capturing this phenomenon will be used later in the paper when we prove the extended isoperimetric inequality.

Proposition II.2

Given any $\epsilon,\delta>0$, there exists some $M(\epsilon,\delta)$ such that for any $m\geq M(\epsilon,\delta)$ and any $\mathbf{z}\in\mathbb{S}^{m-1}$,

$$\mathbb{P}\left(\angle(\mathbf{z},\mathbf{Y})\in[\pi/2-\epsilon,\pi/2+\epsilon]\right)\geq 1-\delta,$$

where $\mathbf{Y}\in\mathbb{S}^{m-1}$ is distributed according to the Haar probability measure.

Proof:

Let $\mathbf{e}_{1}=(R,0,\ldots,0)$ . Note for any $\mathbf{z}\in\mathbb{S}^{m-1}$ , the distribution of $\angle(\mathbf{z},\mathbf{Y})$ is the same as the distribution of $\angle(\mathbf{e}_{1},\mathbf{Y})$ , since $\mathbf{z}$ can be written in the form $\mathbf{z}=U\mathbf{e}_{1}$ , where $U$ is an orthogonal matrix, and the distribution of $\mathbf{Y}$ is rotation-invariant. Therefore, without loss of generality, we can assume $\mathbf{z}=\mathbf{e}_{1}$ . Since $\langle\mathbf{e}_{1}/R,\mathbf{Y}/R\rangle=Y_{1}/R$ , we have $E[\langle\mathbf{e}_{1}/R,\mathbf{Y}/R\rangle]=E[Y_{1}]/R=0$ ; we also have $E[\langle\mathbf{e}_{1}/R,\mathbf{Y}/R\rangle^{2}]=E[Y_{1}^{2}]/R^{2}=1/m$ because $E[Y_{1}^{2}]=\cdots=E[Y^{2}_{m}]$ and $E[Y_{1}^{2}]+\cdots+E[Y^{2}_{m}]=R^{2}$ . Therefore by Chebyshev’s inequality, for any $\mu>0$ ,

$$\mathbb{P}\left(|\langle\mathbf{e}_{1}/R,\mathbf{Y}/R\rangle|\geq\mu\right)\leq\frac{1}{m\mu^{2}}.$$

Recalling that $\angle(\mathbf{e}_{1},\mathbf{Y})=\arccos(\langle\mathbf{e}_{1}/R,\mathbf{Y}/R\rangle)$ and noting that the R.H.S. of the above inequality can be made arbitrarily small by choosing $m$ to be sufficiently large, we have proved the proposition. ∎
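The two moment facts used in this proof can be checked empirically. The sketch below is an illustration only, with hypothetical names and parameter choices: it draws points uniformly on the sphere by normalizing i.i.d. Gaussian vectors (a standard sampling technique, not taken from the paper), then estimates $E[\langle\mathbf{e}_{1}/R,\mathbf{Y}/R\rangle^{2}]$ and the tail probability bounded by Chebyshev's inequality.

```python
import math
import random

def sample_on_sphere(m, R, rng):
    """Uniform point on the (m-1)-sphere of radius R via a normalized Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(x * x for x in g))
    return [R * x / norm for x in g]

def concentration_check(m, R=1.0, mu=0.2, trials=2000, seed=0):
    """Return (empirical second moment, empirical tail frequency, Chebyshev bound)."""
    rng = random.Random(seed)
    second_moment = 0.0
    tail_hits = 0
    for _ in range(trials):
        y = sample_on_sphere(m, R, rng)
        c = y[0] / R  # equals <e_1/R, Y/R> for e_1 = (R, 0, ..., 0)
        second_moment += c * c
        if abs(c) >= mu:
            tail_hits += 1
    return second_moment / trials, tail_hits / trials, 1.0 / (m * mu * mu)

if __name__ == "__main__":
    for m in (20, 200):
        m2, tail, bound = concentration_check(m)
        print(f"m = {m}: E[c^2] ~ {m2:.5f} (1/m = {1/m:.5f}), "
              f"tail ~ {tail:.4f}, Chebyshev bound = {bound:.4f}")
```

In such runs the empirical tail frequency typically sits well below the Chebyshev bound, in line with the later remark that the true concentration is exponential in $m$ rather than polynomial.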

(iii) Blowing-Up Lemma: The above measure concentration result combined with the isoperimetric inequality immediately yields the following result:

Proposition II.3

Let $A\subseteq\mathbb{S}^{m-1}$ be an arbitrary set and $C=\text{Cap}(\mathbf{z}_{0},\theta)\subseteq\mathbb{S}^{m-1}$ be a spherical cap such that $\mu(A)=\mu(C)$ , i.e. $A$ has an effective angle of $\theta$ . Then for any $\epsilon>0$ and $m$ sufficiently large,

$$\mathbb{P}\left(A_{\frac{\pi}{2}-\theta+\epsilon}\right)\geq 1-\epsilon.\tag{4}$$

Proof:

If $A=\text{Cap}(\mathbf{z}_{0},\theta)$, then $\mathbb{P}(A_{\frac{\pi}{2}-\theta+\epsilon})\geq 1-\epsilon$ due to Proposition II.2. If $A$ is not a spherical cap, then $\mathbb{P}(A_{\frac{\pi}{2}-\theta+\epsilon})\geq\mathbb{P}(C_{\frac{\pi}{2}-\theta+\epsilon})$ where $C=\text{Cap}(\mathbf{z}_{0},\theta)$, due to the isoperimetric inequality in Proposition II.1. ∎

If we take $A$ to be a half sphere, this result says that most of the measure of the sphere is concentrated around the boundary of this half-sphere, i.e. an equator, which is the result in Proposition II.2. However, due to the isoperimetric inequality, Proposition II.3 allows us to make the stronger statement that the measure is concentrated around the boundary of any set with probability $1/2$ . While the elementary results we establish above suggest that this concentration takes place at a polynomial speed in the dimension $m$ , it can be shown that the measure concentrates around the boundary of any set with probability $1/2$ exponentially fast in the dimension $m$ ; see [15].

II-B Extended Isoperimetry on the Sphere and the Shell

An almost equivalent way to state the blowing-up lemma in Proposition II.3 is the following: Let $A\subseteq\mathbb{S}^{m-1}$ be an arbitrary set with effective angle $\theta>0$ . Then for any $\epsilon>0$ and sufficiently large $m$ ,

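$$\mathbb{P}\left(\mu\left(A\cap\text{Cap}\left(\mathbf{Y},\tfrac{\pi}{2}-\theta+\epsilon\right)\right)>0\right)>1-\epsilon,\tag{5}$$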

where $\mathbf{Y}$ is distributed according to the normalized Haar measure on $\mathbb{S}^{m-1}$ . In words, if we take a $\mathbf{y}$ uniformly at random on the sphere and draw a spherical cap of angle slightly larger than $\frac{\pi}{2}-\theta$ around it, this cap will intersect the set $A$ with high probability. This statement is almost equivalent to (4) since the $\mathbf{y}$ ’s for which the intersection has non-zero measure lie in the $\frac{\pi}{2}-\theta+\epsilon$ -neighborhood of $A$ . Note that similarly to Proposition II.3, this statement would trivially follow from measure concentration on the sphere (Proposition II.2) if $A$ were known to be a spherical cap, and it holds for any $A$ due to the isoperimetric inequality in Proposition II.1. By building on the Riesz rearrangement inequality [25], we prove the following extended result:

Theorem II.1

Let $A\subseteq\mathbb{S}^{m-1}$ be any arbitrary subset of $\mathbb{S}^{m-1}$ with effective angle $\theta>0$ , and let $V=\mu(\text{Cap}(\mathbf{z}_{0},\theta)\cap\text{Cap}(\mathbf{y}_{0},\omega))$ where $\mathbf{z}_{0},\mathbf{y}_{0}\in\mathbb{S}^{m-1}$ with $\angle(\mathbf{z}_{0},\mathbf{y}_{0})=\pi/2$ and $\theta+\omega>\pi/2$ . (See Fig. 4.) Then for any $\epsilon>0$ , there exists an $M(\epsilon)$ such that for $m>M(\epsilon)$ ,

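$$\mathbb{P}\left(\mu\left(A\cap\text{Cap}(\mathbf{Y},\omega+\epsilon)\right)>(1-\epsilon)V\right)\geq 1-\epsilon,$$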

where $\mathbf{Y}$ is a random vector on $\mathbb{S}^{m-1}$ distributed according to the normalized Haar measure.

If $A$ itself is a cap, then the statement in Theorem II.1 is straightforward and follows from the fact that $\mathbf{Y}$ with high probability will be concentrated around the equator at angle $\pi/2$ from the pole of $A$ (Proposition II.2). Therefore, as $m$ gets large for almost all $\mathbf{Y}$ , the intersection of the two spherical caps will be given by $V$ . See Fig. 4. The statement, however, is stronger than this and holds for any arbitrary set $A$ , analogous to the isoperimetric inequality in (5). It states that no matter what the set $A$ is, if we take a random point on the sphere and draw a cap of angle slightly larger than $\omega$ centered at this point, for any $\omega>\pi/2-\theta$ , then with high probability the intersection of the cap with the set $A$ would be at least as large as the intersection we would get if $A$ were a spherical cap. In this sense, Theorem II.1 can be regarded as an extension of the isoperimetric inequality in Proposition II.1, even though the latter can be stated purely geometrically and implies the weaker probabilistic statement in (5), while our result is inherently probabilistic.

Theorem II.1 is in fact a special case of a more general theorem that is true for subsets on a spherical shell. Let

$$\mathbb{L}^{m}=\{\mathbf{y}\in\mathbb{R}^{m}:R_{L}\leq\|\mathbf{y}\|\leq R_{U}\}$$

be this shell, where $0\leq R_{L}\leq R_{U}$ . A cap on this shell with pole $\mathbf{z}_{0}$ and angle $\theta$ can be defined as a ball in terms of the angle:

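$$\angle(\mathbf{y},\mathbf{z})=\arccos\left(\frac{\mathbf{y}\cdot\mathbf{z}}{\|\mathbf{y}\|\|\mathbf{z}\|}\right)$$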

on the shell, i.e.,

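$$\text{ShellCap}(\mathbf{z}_{0},\theta)=\{\mathbf{z}\in\mathbb{L}^{m}:\angle(\mathbf{z}_{0},\mathbf{z})\leq\theta\}.$$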

Let $|A|$ denote the standard $m$ -dimensional Euclidean measure of a subset $A\subseteq\mathbb{L}^{m}$ . We will say that an arbitrary set $A\subseteq\mathbb{L}^{m}$ has effective angle $\theta>0$ if its measure is equal to that of a shell cap of angle $\theta$ , i.e. $|A|=|\text{ShellCap}(\mathbf{z}_{0},\theta)|$ for some $\mathbf{z}_{0}\in\mathbb{L}^{m}$ . We will also say that a probability measure $\mathbb{P}$ for subsets of $\mathbb{L}^{m}$ is rotationally invariant if $\mathbb{P}(A)=\mathbb{P}(UA)$ for any orthogonal matrix $U$ , where $UA$ denotes the image of the set $A$ under the linear transformation $U$ . The following more general theorem holds in the shell setting.

Theorem II.2

Let $A\subseteq\mathbb{L}^{m}$ be any arbitrary subset of $\mathbb{L}^{m}$ with effective angle $\theta>0$ , and let $V=|\text{ShellCap}(\mathbf{z}_{0},\theta)\cap\text{ShellCap}(\mathbf{y}_{0},\omega)|$ where $\mathbf{z}_{0},\mathbf{y}_{0}\in\mathbb{L}^{m}$ with $\angle(\mathbf{z}_{0},\mathbf{y}_{0})=\pi/2$ and $\theta+\omega>\pi/2$ . Then for any $\epsilon>0$ , there exists an $M(\epsilon)$ such that for $m>M(\epsilon)$ ,

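$$\mathbb{P}\left(|A\cap\text{ShellCap}(\mathbf{Y},\omega+\epsilon)|>(1-\epsilon)V\right)\geq 1-\epsilon,$$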

where $\mathbf{Y}$ is a random vector drawn from any rotationally invariant probability measure on $\mathbb{L}^{m}$ .

We prove Theorems II.1 and II.2 in Appendix A. Note that $M(\epsilon)$ in these two results depends only on $\epsilon$; in particular, it does not depend on the radius parameters for $\mathbb{L}^{m}$ and $\mathbb{S}^{m-1}$, respectively, which means that these two results also apply if the radius parameters depend on the dimension $m$. In the following section, we will be mainly interested in the case when the radius parameters scale as the square root of the dimension.

III Information Tension in A Symmetric Markov Chain

In this section, we prove an inequality between information measures in a certain type of Markov chain, which can be of interest in its own right. The proof of this inequality builds on Theorem II.2 from the previous section. As we will see in Section IV, the main theorems in this paper, i.e. Theorems I.1 and I.2, are almost immediate given this result. We now state this result in the following theorem.

Theorem III.1

Consider a Markov chain $I_{n}-Z^{n}-X^{n}-Y^{n}$ where $X^{n}$, $Y^{n}$ and $Z^{n}$ are $n$-length random vectors and $I_{n}=f_{n}(Z^{n})$ is a deterministic mapping of $Z^{n}$ to a set of integers. Assume moreover that $Z^{n}$ and $Y^{n}$ are i.i.d. white Gaussian vectors given $X^{n}$, i.e. $Z^{n},Y^{n}\sim\mathcal{N}(X^{n},N\,I_{n\times n})$ where $I_{n\times n}$ denotes the identity matrix, $E[\|X^{n}\|^{2}]=nP$, and $H(I_{n}|X^{n})=-n\log\sin\theta_{n}$ for some $\theta_{n}\in[0,\pi/2]$. Then the following inequality holds for any $n$,

[TABLE]

Note that $H(I_{n}|Y^{n})$ is trivially lower bounded by $H(I_{n}|X^{n})$ for any Markov chain $I_{n}-Z^{n}-X^{n}-Y^{n}$ . The above theorem says that if $I_{n}-Z^{n}-X^{n}-Y^{n}$ satisfies the conditions of the theorem, then $H(I_{n}|Y^{n})$ can also be upper bounded in terms of $H(I_{n}|X^{n})$ . In particular, it provides an upper bound on $H(I_{n}|Y^{n})$ in terms of $\theta_{n}=\arcsin 2^{-\frac{1}{n}H(I_{n}|X^{n})}$ . It can be easily verified that this upper bound on $H(I_{n}|Y^{n})$ is decreasing with increasing $\theta_{n}$ , or equivalently decreasing with decreasing $H(I_{n}|X^{n})$ , and implies that $H(I_{n}|Y^{n})\rightarrow 0$ as $H(I_{n}|X^{n})\rightarrow 0$ .
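As a small illustration of the parametrization used above, the following sketch converts between $H(I_{n}|X^{n})$ (in bits) and the angle $\theta_{n}=\arcsin 2^{-\frac{1}{n}H(I_{n}|X^{n})}$. The function names are hypothetical and chosen only for this example; the sketch does not evaluate the bound of Theorem III.1 itself.

```python
import math

def theta_from_entropy(h_bits: float, n: int) -> float:
    """Return theta_n = arcsin(2^(-H/n)) in radians, with H = H(I_n | X^n) in bits."""
    if h_bits < 0 or n <= 0:
        raise ValueError("need h_bits >= 0 and n >= 1")
    return math.asin(2.0 ** (-h_bits / n))

def entropy_from_theta(theta: float, n: int) -> float:
    """Inverse map: H(I_n | X^n) = -n * log2(sin(theta_n)), for theta in (0, pi/2]."""
    if not (0.0 < theta <= math.pi / 2):
        raise ValueError("theta must lie in (0, pi/2]")
    return -n * math.log2(math.sin(theta))

# Zero conditional entropy corresponds to theta_n = pi/2;
# larger conditional entropy pushes theta_n toward 0.
angles = [theta_from_entropy(h, 10) for h in (0.0, 5.0, 10.0, 50.0)]
```

The list `angles` is decreasing, which matches the statement that decreasing $H(I_{n}|X^{n})$ corresponds to increasing $\theta_{n}$.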

We next turn to proving Theorem III.1. The reader who is interested in seeing how this theorem leads to Theorems I.1 and I.2, without seeing its own proof, can jump to Section IV. In order to prove Theorem III.1, we will first establish some properties that are satisfied with high probability by long i.i.d. sequences generated from the source distribution $(I_{n},Z^{n},X^{n},Y^{n})$ satisfying the assumptions of the theorem. We now state and discuss these properties in Section III-A and then use them to prove Theorem III.1 in Section III-B.

III-A Typicality Lemmas

Assume $(I_{n},Z^{n},X^{n},Y^{n})$ satisfy the assumptions of Theorem III.1. Consider the $B$ -length i.i.d. sequence

[TABLE]

where for any $b\in[1:B]$ , $(I_{n}(b),Z^{n}(b),X^{n}(b),Y^{n}(b))$ has the same distribution as $(I_{n},Z^{n},X^{n},Y^{n})$ . For notational convenience, in the sequel we write the $B$ -length sequence $[X^{n}(1),X^{n}(2),\ldots,X^{n}(B)]$ as $\mathbf{X}$ and similarly define $\mathbf{Y},\mathbf{Z}$ and $\mathbf{I}$ ; note that we have $\mathbf{I}=[f_{n}(Z^{n}(1)),f_{n}(Z^{n}(2)),\ldots,f_{n}(Z^{n}(B))]=:f(\mathbf{Z})$ . Also let $\text{Shell}\left(\mathbf{c},r_{1},r_{2}\right)$ denote the spherical shell

[TABLE]

and let $\mbox{Ball}(\mathbf{c},r)$ denote the Euclidean ball

[TABLE]

We next state several properties that $\mathbf{X},\mathbf{Y},\mathbf{Z},\mathbf{I}$ satisfy with high probability when $B$ is large. The proofs of these properties are given in Appendix B.

Lemma III.1

For any $\delta>0$ and $B$ sufficiently large, we have

[TABLE]

where $E_{1}$ and $E_{2}$ are defined to be the following two events respectively:

[TABLE]

and

[TABLE]

The proof of this lemma is a simple application of the law of large numbers and is included in Appendix B-A. The lemma simply states that when $B$ is large, $\mathbf{Y}$ and $\mathbf{Z}$ will concentrate in a thin $nB$ -dimensional shell of radius $\sqrt{nB(P+N)}$ .
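The concentration described by Lemma III.1 can be observed numerically. The sketch below is a Monte Carlo illustration under a simplifying assumption that is not required by the lemma: every coordinate of the input is drawn i.i.d. from $\mathcal{N}(0,P)$, so that the power constraint holds in expectation. The function name and the parameter `m` (playing the role of $nB$) are hypothetical.

```python
import math
import random

def normalized_sq_norm(m: int, P: float, N: float, seed: int = 0) -> float:
    """Return ||Z||^2 / m for Z = X + W in R^m.

    Assumption for illustration only: X has i.i.d. N(0, P) coordinates.
    W has i.i.d. N(0, N) coordinates, as in the Gaussian channel model.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(m):
        x = rng.gauss(0.0, math.sqrt(P))      # hypothetical Gaussian input symbol
        z = x + rng.gauss(0.0, math.sqrt(N))  # channel output at the relay
        total += z * z
    return total / m

# For large m the ratio should be close to P + N, i.e. ||Z|| is close to sqrt(m (P + N)).
ratio = normalized_sq_norm(200_000, P=1.0, N=1.0)
```

The same experiment applies to $\mathbf{Y}$, since $\mathbf{Y}$ and $\mathbf{Z}$ are identically distributed.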

Lemma III.2

Given any $\epsilon>0$ and a pair $(\mathbf{x},\mathbf{i})$ , let $S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ be the set of $\mathbf{z}$ ’s defined in (12) below. (Footnote 3: Under this definition of $S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ , if a pair $(\mathbf{x},\mathbf{i})$ does not satisfy $2^{nB(\log\text{sin}\theta_{n}-\epsilon)}\leq p(\mathbf{i}|\mathbf{x})\leq 2^{nB(\log\text{sin}\theta_{n}+\epsilon)},$ then the set $S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ is empty because no $\mathbf{z}$ can satisfy the condition in (12).)

[TABLE]

where $\theta_{n}=\arcsin 2^{-\frac{1}{n}H(I_{n}|X^{n})}$ as in Theorem III.1. Then for $B$ sufficiently large, there exists a set $S_{\epsilon}(X^{n},I_{n})$ of $(\mathbf{x},\mathbf{i})$ pairs, such that

[TABLE]

and for any $(\mathbf{x},\mathbf{i})\in S_{\epsilon}(X^{n},I_{n})$ ,

[TABLE]

This lemma establishes the existence of a high probability set $S_{\epsilon}(X^{n},I_{n})$ of $(\mathbf{x},\mathbf{i})$ sequences, and a conditional typical set $S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ for each $(\mathbf{x},\mathbf{i})\in S_{\epsilon}(X^{n},I_{n})$ such that $\mathbf{z}\in S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ satisfies some natural properties. Note that all properties in the definition of $S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ as well as (14) are analogous to properties of strongly typical sets as stated in [21, Ch. 2]. However, the notion of strong typicality does not apply to the current case since $Z^{n}$ and $Y^{n}$ are continuous random vectors and $X^{n}$ may or may not be continuous. Nevertheless, analogous properties can still be proved in this case; see the proof of this lemma in Appendix B-B.

The following result has a slightly different flavor from the previous two lemmas in that it is simply a corollary of Theorem II.2 from Section II.

Corollary III.1

For any $N,\epsilon$ such that $N>\epsilon>0$ , consider the spherical shell in $\mathbb{R}^{m}$

[TABLE]

Let $A\subseteq\mbox{Shell}\left(\mathbf{0},\sqrt{m(N-\epsilon)},\sqrt{m(N+\epsilon)}\right)$ be an arbitrary subset on this shell with volume

[TABLE]

where $\theta\in(0,\pi/2)$ . For any $\omega\in(\pi/2-\theta,\pi/2]$ and $m$ sufficiently large, we have

[TABLE]

where $\mathbf{Y}$ is drawn from any rotationally invariant distribution on the $\mbox{Shell}\left(\mathbf{0},\sqrt{m(N-\epsilon)},\sqrt{m(N+\epsilon)}\right)$ .

This is a simple corollary of Theorem II.2 when applied to a specific shell and a subset $A$ of this shell with measure prescribed by (15). The prescribed measure means that $A$ has an effective angle (asymptotically) greater than or equal to $\theta$ . The corollary follows by observing that due to the triangle inequality (see also Fig. 5), for any $\mathbf{y}$ in the shell, $\text{ShellCap}(\mathbf{y},\omega+\epsilon)$ considered in Theorem II.2 is contained in the Euclidean ball

[TABLE]

The lower bound on the intersection volume in (16) follows from an explicit characterization of

[TABLE]

in Theorem II.2, where $\angle(\mathbf{z}_{0},\mathbf{y}_{0})=\pi/2$ and $\theta+\omega>\pi/2$ ; see Appendix C-B, and in particular Lemma C.2, for this characterization. A formal proof of Corollary III.1 is given in Appendix B-C.
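The containment of a shell cap in a Euclidean ball rests on elementary chord geometry: two points on a sphere of radius $r$ separated by angle $\omega$ are at Euclidean distance $2r\sin(\omega/2)$. With $r=\sqrt{mN}$ this gives a squared distance of $mN\cdot 4\sin^{2}(\omega/2)$. The sketch below checks this identity in a plane; the helper names are hypothetical, and the extra slack needed for the shell thickness is not modeled.

```python
import math

def chord_length(radius: float, omega: float) -> float:
    """Distance between two points on a sphere of the given radius at angle omega."""
    return 2.0 * radius * math.sin(omega / 2.0)

def planar_distance(radius: float, omega: float) -> float:
    """Same distance computed directly from coordinates in the plane spanned by the two points."""
    ux, uy = radius, 0.0
    vx, vy = radius * math.cos(omega), radius * math.sin(omega)
    return math.hypot(ux - vx, uy - vy)

def squared_chord_per_dimension(N: float, omega: float) -> float:
    """Squared chord length divided by m when the sphere radius is sqrt(m N)."""
    return 4.0 * N * math.sin(omega / 2.0) ** 2
```

For example, at $\omega=\pi/2$ the normalized squared chord is $2N$, the squared distance between two orthogonal vectors of squared norm $N$ per dimension.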

The above corollary together with Lemma III.2 leads to the following lemma.

Lemma III.3

For any $\delta>0$ and $B$ sufficiently large, we have

[TABLE]

where $E_{3}$ is defined to be the following event:

[TABLE]

in which $f^{-1}(\mathbf{I}):=\{\mathbf{a}\in\mathbb{R}^{nB}:f(\mathbf{a})=\mathbf{I}\}$ and $\omega\in(\pi/2-\theta_{n}+\delta,\pi/2]$ .

This lemma can also be regarded as a typicality lemma, as it states a property satisfied by the pair $(\mathbf{I},\mathbf{Y})$ with high probability when $B$ is large. However, this is a non-trivial property. The lemma follows by first fixing a pair $(\mathbf{x},\mathbf{i})\in S_{\epsilon}(X^{n},I_{n})$ and showing that the volume of the set $S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ defined in Lemma III.2 can be lower bounded by

[TABLE]

up to the first order term in the exponent. Since by definition $S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ is a subset of the shell

[TABLE]

and given $\mathbf{X}=\mathbf{x}$ , $\mathbf{Y}$ is isotropic Gaussian (therefore rotationally invariant around $\mathbf{x}$ when constrained to this shell), we can apply Corollary III.1 to the above shell by choosing the set $A$ to be $S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ . This allows us to conclude that

[TABLE]

The conclusion of Lemma III.3 then follows by observing that by definition

[TABLE]

and removing the conditioning with respect to $\mathbf{X}$ in (III-A). The formal proof of Lemma III.3 is given in Appendix B-D.

III-B Proof of Theorem III.1

We are now ready to prove Theorem III.1, which mainly builds on Lemma III.3. Consider a $\mathbf{Y}$ that with high probability lies in the ball with center $\mathbf{0}$ and approximate radius $\sqrt{nB(P+N)}$ , and draw another ball around $\mathbf{Y}$ of approximate radius $\sqrt{nBN4\mbox{sin}\,^{2}\frac{\omega}{2}}$ and intersect this ball with the original ball; equivalently, this corresponds to considering a cap around $\mathbf{Y}$ of angle $\phi$ on the original ball (see Fig. 6). Lemma III.3 asserts that this cap around $\mathbf{Y}$ will have a certain minimal intersection volume with $f^{-1}(\mathbf{I})$ . In other words, there is a subset of this cap with certain minimal volume that is mapped to $\mathbf{I}$ . This naturally lends itself to a packing argument: the number of distinct $\mathbf{I}$ values plausible under a given $\mathbf{Y}$ can be upper bounded by the ratio between the volume of the cap around $\mathbf{Y}$ and the minimal intersection volume occupied for each distinct $\mathbf{I}$ . This in turn leads to a bound on $H(\mathbf{I}|\mathbf{Y})$ .
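The counting step behind the packing argument can be sketched as follows: if disjoint pieces each occupy at least a given volume inside a common container, their number is at most the ratio of the two volumes, and the entropy of a label ranging over those pieces is at most the logarithm of that ratio. The function names below are hypothetical; the second function works with base-2 logarithms of volumes, since volumes in dimension $nB$ would overflow ordinary floating point.

```python
import math

def max_disjoint_pieces(container_volume: float, min_piece_volume: float) -> int:
    """Upper bound on the number of disjoint pieces of volume >= min_piece_volume in the container."""
    if container_volume < 0 or min_piece_volume <= 0:
        raise ValueError("volumes must be positive")
    return math.floor(container_volume / min_piece_volume)

def entropy_bound_bits(log2_container: float, log2_min_piece: float) -> float:
    """Entropy bound in bits, log2(container) - log2(min piece), clipped at zero."""
    return max(0.0, log2_container - log2_min_piece)
```

For instance, `max_disjoint_pieces(10.0, 3.0)` is 3, and exponents of 100 and 60 bits give an entropy bound of 40 bits.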

We now proceed with the formal proof. Consider the indicator function

[TABLE]

where $\mathbb{I}(\cdot)$ is defined as

[TABLE]

and the events $E_{1},E_{2}$ and $E_{3}$ are as given by (8), (9) and (17) respectively. Obviously, by the union bound, we have

[TABLE]

for any $\delta>0$ and $B$ sufficiently large, and therefore

[TABLE]
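Bounds that condition on an indicator such as $F$ typically rest on the chain-rule inequality $H(\mathbf{I}|\mathbf{Y})\leq H(F)+H(\mathbf{I}|\mathbf{Y},F)$ with $H(F)\leq 1$ bit. The sketch below checks this generic inequality on a small, entirely hypothetical joint distribution of $(I,Y,F)$; it is not the derivation of (19), only a numerical sanity check of the underlying step.

```python
import math
from collections import defaultdict

def entropy(pmf: dict) -> float:
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(joint: dict, keep: tuple) -> dict:
    """Marginalize a joint pmf over tuples onto the coordinate positions in `keep`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[k] for k in keep)] += p
    return dict(out)

def indicator_split(joint: dict) -> tuple:
    """Return (H(I|Y), H(F), H(I|Y,F)) for a joint pmf over outcomes (i, y, f)."""
    h_i_given_y = entropy(marginal(joint, (0, 1))) - entropy(marginal(joint, (1,)))
    h_f = entropy(marginal(joint, (2,)))
    h_i_given_yf = entropy(joint) - entropy(marginal(joint, (1, 2)))
    return h_i_given_y, h_f, h_i_given_yf

# Hypothetical toy distribution over (i, y, f); the probabilities sum to one.
toy = {(0, 0, 1): 0.30, (1, 0, 1): 0.20, (0, 1, 1): 0.10,
       (1, 1, 0): 0.25, (2, 1, 0): 0.15}
lhs, h_f, h_cond = indicator_split(toy)
```

Here `lhs <= h_f + h_cond` holds, as it must for any joint distribution with a binary `f`.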

To bound $H(\mathbf{I}|\mathbf{Y},F=1)$ , it suffices to bound $H(\mathbf{I}|\mathbf{Y}=\mathbf{y},F=1)$ for any

[TABLE]

For this, we apply a packing argument as follows. Consider a ball centered at any $\mathbf{y}$ satisfying (20) and of radius $\sqrt{nBN\left(4\mbox{sin}\,^{2}\frac{\omega}{2}+\delta\right)}$ , i.e.,

[TABLE]

where $\omega$ satisfies

[TABLE]

We now use the following lemma (whose proof is included in Appendix C-C) to upper bound the volume of the intersection between this ball and $\text{Ball}\left(\mathbf{0},\sqrt{nB(P+N+\delta)}\right)$ , i.e.,

[TABLE]

Lemma III.4

Let $\text{Ball}(\mathbf{c}_{1},\sqrt{mR_{1}})$ and $\text{Ball}(\mathbf{c}_{2},\sqrt{mR_{2}})$ be two balls in $\mathbb{R}^{m}$ with $\|\mathbf{c}_{1}-\mathbf{c}_{2}\|=\sqrt{mD}$ , where $D$ satisfies $(\sqrt{R_{1}}-\sqrt{R_{2}})^{2}<D<(\sqrt{R_{1}}+\sqrt{R_{2}})^{2}$ . Then for any $\epsilon>0$ and $m$ sufficiently large, we have

[TABLE]

where

[TABLE]

Using the above lemma, we have for $B$ sufficiently large,

[TABLE]

for some $\delta_{1}\to 0$ as $\delta\to 0$ , where the first inequality is an immediate application of Lemma III.4, the first equality follows from the fact that

[TABLE]

and the continuity of the function $\lambda(R_{1},R_{2},D)$ in its arguments, and the second equality follows from a simple evaluation of $\lambda\left(P+N,4N\text{sin}^{2}\frac{\omega}{2},P+N\right)$ .

On the other hand, the condition $F=1$ (cf. the definition of $E_{3}$ in Lemma III.3) also ensures that

[TABLE]

Since $f^{-1}(\mathbf{i})$ are disjoint sets for different $\mathbf{i}$ , given $F=1$ and $\mathbf{Y}=\mathbf{y}$ , the number of different possible values for $\mathbf{I}$ can be upper bounded by the ratio between

[TABLE]

and

[TABLE]

which can be further upper bounded by

[TABLE]

where $\delta_{2}\to 0$ as $\delta\to 0$ . This immediately implies the following upper bound on $H(\mathbf{I}|\mathbf{Y}=\mathbf{y},F=1)$ and therefore $H(\mathbf{I}|\mathbf{Y},F=1)$ ,

[TABLE]

which combined with (19) yields that

[TABLE]

Dividing both sides of the above inequality by $B$ and noting that

[TABLE]

we have

[TABLE]

which holds for any

[TABLE]

Since $\delta,\delta_{2}$ and $\frac{1}{nB}$ in (21)–(22) can all be made arbitrarily small by choosing $B$ sufficiently large, we obtain

[TABLE]

for any $\omega\in\left(\frac{\pi}{2}-\theta_{n},\frac{\pi}{2}\right]$ . This completes the proof of Theorem III.1.

IV Proofs of Theorems I.1 and I.2

We now prove Theorem I.2 by using Theorem III.1, and use Theorem I.2 to prove Theorem I.1.

IV-A Proof of Theorem I.2

Suppose a rate $R$ is achievable. Then there exists a sequence of $(2^{nR},n)$ codes such that the average probability of error $P_{e}^{(n)}\to 0$ as $n\to\infty$ . Let the relay’s transmission be denoted by $I_{n}=f_{n}(Z^{n})$ . By standard information theoretic arguments, for this sequence of codes we have

[TABLE]

for any $\mu>0$ and $n$ sufficiently large. In the above, (24) follows from applying the data processing inequality to the Markov chain $M-X^{n}-(Y^{n},I_{n})$ and Fano’s inequality, (25) uses the fact that $I_{n}-X^{n}-Y^{n}$ form a Markov chain and thus $H(I_{n}|X^{n},Y^{n})=H(I_{n}|X^{n})$ , (26) follows by defining the time sharing random variable $Q$ to be uniformly distributed over $[1:n]$ , and (27) follows because

[TABLE]

Given (27), the standard way to proceed would be to upper bound the first entropy term by $H(I_{n}|Y^{n})\leq H(I_{n})\leq nC_{0}$ and lower bound the second entropy term $H(I_{n}|X^{n})$ simply by $0$ . This would lead to the so-called multiple-access bound in the well-known cut-set bound on the capacity of this channel [10]. However, as we have already pointed out in our previous works [3]–[7], this leads to a loose bound since it does not capture the inherent tension between how large the first entropy term can be and how small the second one can be. Instead, we can use Theorem III.1 to more tightly upper bound the difference $H(I_{n}|Y^{n})-H(I_{n}|X^{n})$ in (27).
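For reference, the cut-set bound mentioned here can be evaluated numerically. The sketch below assumes the usual form of the two cuts for the symmetric Gaussian setting of this paper, namely a broadcast term $\frac{1}{2}\log\left(1+\frac{2P}{N}\right)$ and a multiple-access term $\frac{1}{2}\log\left(1+\frac{P}{N}\right)+C_{0}$; these expressions are not spelled out in this section and are stated as an assumption, and the function names are hypothetical.

```python
import math

def cutset_bound(P: float, N: float, C0: float) -> float:
    """Cut-set upper bound in bits per channel use (assumed standard form)."""
    broadcast = 0.5 * math.log2(1.0 + 2.0 * P / N)        # destination sees both Y and Z
    multiple_access = 0.5 * math.log2(1.0 + P / N) + C0   # direct link plus relay bit pipe
    return min(broadcast, multiple_access)

def cutset_saturation_c0(P: float, N: float) -> float:
    """Value of C0 at which the two cut-set terms coincide."""
    return 0.5 * math.log2(1.0 + 2.0 * P / N) - 0.5 * math.log2(1.0 + P / N)
```

Under this bound alone, the broadcast term would be reached at the finite value `cutset_saturation_c0(P, N)`. The tighter bound developed in this section rules this out for every finite $C_{0}$.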

We start by verifying that the random variables $I_{n},X^{n},Z^{n}$ and $Y^{n}$ associated with a code of blocklength $n$ satisfy the conditions in Theorem III.1. It is trivial to observe that they satisfy the required Markov chain condition and $Z^{n}$ and $Y^{n}$ are i.i.d. Gaussian given $X^{n}$ due to the channel structure. Also assume that

[TABLE]

with $P^{\prime}\leq P$ , and assume that $H(I_{n}|X^{n})=-n\log\mbox{sin}\,\theta_{n}$ . Then, applying Theorem III.1 to the random variables associated with a code for the relay channel, we have

[TABLE]

and therefore,

[TABLE]

where $h_{\theta_{n}}(\omega)$ is defined as

[TABLE]

in which $\theta_{n}=\arcsin 2^{-\frac{1}{n}H(I_{n}|X^{n})}$ satisfies

[TABLE]

Plugging (28) into (27), we conclude that for any achievable rate $R$ ,

[TABLE]

At the same time, for any achievable rate $R$ , we also have

[TABLE]

which simply follows from (27) by upper bounding $H(I_{n}|Y^{n})$ with $nC_{0}$ and plugging in the definition of $\theta_{n}$ . Therefore, if a rate $R$ is achievable, then for any $\mu>0$ and $n$ sufficiently large it should simultaneously satisfy both (31) and (32) for some $\theta_{n}$ that satisfies the condition in (30). This concludes the proof of the theorem.

IV-B Proof of Theorem I.1

In order to show that Theorem I.1 follows from Theorem I.2, consider the following bound on $C(C_{0})$ implied by Theorem I.2:

[TABLE]

With $\theta_{0}$ defined as $\arcsin(2^{-C_{0}})$ , we can upper bound the right-hand side of (33) to obtain

[TABLE]

Also because given any fixed $\omega\in\left(\frac{\pi}{2}-\theta_{0},\frac{\pi}{2}\right]$ , $h_{\theta}(\omega)\leq h_{\theta_{0}}(\omega)$ for any $\theta\in[\theta_{0},\pi/2]$ , we further have

[TABLE]

The significance of the function $h_{\theta_{0}}(\omega)$ is that for any $\theta_{0}>0$ ,

[TABLE]

and $h_{\theta_{0}}(\omega)$ is increasing at $\omega=\frac{\pi}{2}$ , or more precisely,

[TABLE]

Therefore, as long as $\theta_{0}>0$ , which is the case when $C_{0}$ is finite, the minimization of $h_{\theta_{0}}(\omega)$ with respect to $\omega$ in (34) yields a value strictly smaller than $h_{\theta_{0}}\left(\frac{\pi}{2}\right)$ in (35). This would allow us to conclude that the capacity $C(C_{0})$ for any finite $C_{0}$ is strictly smaller than $\frac{1}{2}\log\left(1+\frac{2P}{N}\right)$ .

We now formalize the above argument. Using the definition of the derivative, one obtains

[TABLE]

Therefore, there exists a sufficiently small $\Delta_{1}>0$ such that $0<\Delta_{1}<\theta_{0}$ and

[TABLE]

For such $\Delta_{1}$ we have

[TABLE]

which further implies that

[TABLE]

Combining (34) and (36) we obtain that for any finite $C_{0}$ , there exists some $\Delta_{1}>0$ such that

[TABLE]

This proves Theorem I.1.

V Conclusion

We have proved a new upper bound on the capacity of the Gaussian relay channel and solved a problem posed by Cover in [2], which has remained open since 1987. The derivation of our upper bound focuses on directly characterizing the tension between information measures of pertinent $n$ -letter random variables. In particular, this is done via the following steps:

•

we first use “typicality” to translate the information tension problem to a problem regarding the geometry of the typical sets of these $n$ -letter random variables;

•

we then use results and tools in the (broadly defined) field of concentration of measure, in particular rearrangement theory, to establish non-trivial geometric properties for these typical sets;

•

we finally use these geometric properties to construct a packing argument, which leads to an inequality between the original $n$ -letter information measures.

In contrast, the typical program for proving converses in network information theory focuses on “single-letterizing” $n$ -letter information measures. This makes it difficult to invoke tools from geometry and concentration of measure, which in retrospect appear well-suited for quantifying information tensions that lie at the heart of network problems. Indeed, to the best of our knowledge, the use of concentration of measure in information theory has been mostly limited to establishing strong converses for problems whose capacity is already known (cf., e.g., [26, 12]), and it has rarely been used to derive first-order results, i.e. bounds on the capacity of multi-user networks. Our proof suggests that measure concentration, in particular geometric inequalities and their functional counterparts, can have a bigger role to play in network information theory. It would be interesting to better understand this role and see if the program developed in this paper can be used to prove converses for other open problems in network information theory.

Appendix A Proofs of Extended Isoperimetric Inequalities

In this appendix, we prove the extended isoperimetric inequalities on the sphere and on the shell, as stated in Theorems II.1 and II.2 respectively. In particular, we will first prove the shell case and then show that the sphere case follows as a corollary.

A-A Preliminaries

We begin with some preliminaries that will be used in the proofs. Our main tool for proving Theorems II.1 and II.2 is the symmetric decreasing rearrangement of functions on the sphere, along with a version of the Riesz rearrangement inequality on the sphere due to Baernstein and Taylor [25].

For any measurable function $f:\mathbb{S}^{m-1}\to\mathbb{R}$ and pole $\mathbf{z}_{0}$ , the symmetric decreasing rearrangement of $f$ about $\mathbf{z}_{0}$ is defined to be the function $f^{*}:\mathbb{S}^{m-1}\to\mathbb{R}$ such that $f^{*}(\mathbf{y})$ depends only on the angle $\angle(\mathbf{y},\mathbf{z}_{0})$ , is nonincreasing in $\angle(\mathbf{y},\mathbf{z}_{0})$ , and has super-level sets of the same Haar measure as $f$ , i.e.

[TABLE]

for all $d$ . The function $f^{*}$ is unique up to its value on sets of measure zero.

One important special case is when the function $f=1_{A}$ is the characteristic function for a subset $A$ . The function $1_{A}$ is just the function such that

[TABLE]

In this case, $1_{A}^{*}$ is equal to the characteristic function associated with a spherical cap of the same size as $A$ . In other words, if $A^{*}$ is a spherical cap about the pole $\mathbf{z}_{0}$ such that $\mu(A^{*})=\mu(A)$ , then $1_{A}^{*}=1_{A^{*}}$ .
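A discrete analogue may help fix ideas. Suppose, purely as an illustrative assumption, that the sphere is split into cells of equal measure, listed in order of increasing angle from the pole $\mathbf{z}_{0}$. Then rearranging a function amounts to sorting its cell values in nonincreasing order, and the indicator of a set becomes the indicator of the cells closest to the pole. The function names below are hypothetical.

```python
def symmetric_decreasing_rearrangement(values: list) -> list:
    """Rearrange cell values so they are nonincreasing in the angle from the pole.

    Assumes cells of equal measure, indexed by increasing angle from the pole.
    """
    return sorted(values, reverse=True)

def superlevel_measure(values: list, d: float) -> int:
    """Number of equal-measure cells on which the function exceeds the threshold d."""
    return sum(1 for v in values if v > d)

indicator_A = [0, 1, 0, 1, 1, 0]   # indicator of a scattered set A
indicator_cap = symmetric_decreasing_rearrangement(indicator_A)
```

In this example `indicator_cap` is `[1, 1, 1, 0, 0, 0]`, the indicator of a "cap" of three cells around the pole, and every super-level set keeps its measure.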

Lemma A.1 (Baernstein and Taylor [25])

Let $K$ be a nondecreasing bounded measurable function on the interval $[-1,1]$ . Then for all functions $f,g\in L^{1}(\mathbb{S}^{m-1})$ ,

[TABLE]

For any $f\in L^{1}(\mathbb{S}^{m-1})$ , define

[TABLE]

to be the inner integral in Lemma A.1. When applying Lemma A.1 we will use test functions $g$ that are characteristic functions. Let $g=1_{C}$ where $C=\{\mathbf{y}:\psi(\mathbf{y})>d\}$ for some $d$ (i.e. $C$ is a super-level set of $\psi$ ). For a fixed measure $\mu(C)$ , the left-hand side of the inequality from Lemma A.1 will be maximized by this choice of $C$ . With this choice we have the following equality:

[TABLE]

This follows from the layer-cake decomposition for any non-negative and measurable function $\psi$ in that

[TABLE]
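The layer-cake decomposition states that a non-negative value equals the integral over thresholds $d\geq 0$ of the indicator that the value exceeds $d$. The following sketch checks this pointwise with a midpoint Riemann sum; the function name and the discretization parameters are hypothetical.

```python
def layer_cake_integral(value: float, upper: float, steps: int = 20_000) -> float:
    """Approximate the integral over d in [0, upper] of the indicator 1{value > d}.

    For 0 <= value <= upper this should return approximately `value`.
    """
    if value < 0 or upper < value:
        raise ValueError("need 0 <= value <= upper")
    width = upper / steps
    total = 0.0
    for k in range(steps):
        d = (k + 0.5) * width   # midpoint of the k-th threshold interval
        if value > d:
            total += width
    return total
```

Applying this identity at every point $\mathbf{y}$ and exchanging the order of integration turns an integral of $\psi$ into an integral over the measures of its super-level sets.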

Using this equality and our choice for $g$ we will rewrite the inequality from Lemma A.1 as

[TABLE]

where

[TABLE]

Note that both $\psi^{*}(\mathbf{y})$ and $\bar{\psi}(\mathbf{y})$ are spherically symmetric. More concretely, they both depend only on the angle $\angle(\mathbf{y},\mathbf{z}_{0})$ , so in an abuse of notation we will write $\bar{\psi}(\alpha)$ and $\psi^{*}(\alpha)$ where $\alpha=\angle(\mathbf{y},\mathbf{z}_{0})$ .

For convenience we will define a measure $\nu$ by

[TABLE]

where $A_{m}(R)$ denotes the Haar measure of the $m$ -sphere with radius $R$ . We do this so that an integral like

[TABLE]

can be expressed as

[TABLE]

A-B Proof of Theorem II.2 (The Shell Case)

Let $A\subseteq\mathbb{L}^{m}$ be a given subset with effective angle $\theta$ . In order to apply Lemma A.1, note that

[TABLE]

by using spherical coordinates, so that if we define

[TABLE]

for $A\subseteq\mathbb{L}^{m}$ and

[TABLE]

then

[TABLE]

Both $\psi$ and $f_{A}$ can be thought of as functions on the sphere $\mathbb{S}^{m-1}$ . Let $\psi^{*},f_{A}^{*}$ be their respective symmetric decreasing rearrangements about a pole $\mathbf{z}_{0}$ . Define

[TABLE]

so that by definition we have (39).

The inequality (39) allows us to compare $\psi$ and $\bar{\psi}$ , but we require a way to compare $\psi$ with the function arising from a shell cap of angle $\theta$ . Let

[TABLE]

and

[TABLE]

We will show that

[TABLE]

so that along with (39),

[TABLE]

To show the inequality (41) note

[TABLE]

The term inside the parentheses is the measure of the intersection between the cap $C^{*}$ centered at $\mathbf{z}_{0}$ and a cap of angle $\omega+\epsilon$ centered at $\mathbf{z}$ . This intersection measure is a function only of the angle $\angle(\mathbf{z}_{0},\mathbf{z})$ and is nonincreasing in that angle. Consider functions $f:\mathbb{S}^{m-1}\to\mathbb{R}$ with $0\leq f(\mathbf{z})\leq\int_{R_{L}}^{R_{U}}\left(\frac{r}{R}\right)^{m-1}dr$ and $\int f(\mathbf{z})d\mathbf{z}=|A|$ . Both $f_{A}^{*}$ and $f_{A^{\prime}}$ satisfy these properties and moreover $f_{A^{\prime}}$ is extremal in the sense that $f_{A^{\prime}}(\mathbf{z})=\int_{R_{L}}^{R_{U}}\left(\frac{r}{R}\right)^{m-1}dr$ when $\angle(\mathbf{z}_{0},\mathbf{z})\leq\theta$ and $f_{A^{\prime}}(\mathbf{z})=0$ when $\angle(\mathbf{z}_{0},\mathbf{z})>\theta$ . Therefore (A-B) is maximized by replacing $f_{A}^{*}$ with $f_{A^{\prime}}$ , and

[TABLE]

Equipped with (42), we are now ready to finish the proof of Theorem II.2. Proposition II.2 implies that for any $0<\epsilon<1$ , there exists an $M(\epsilon)$ such that for $m>M(\epsilon)$ we have

[TABLE]

where $\mathbf{Y}$ is drawn from any rotationally invariant distribution on $\mathbb{L}^{m}$ . Because the random quantity $|A\cap\text{ShellCap}(\mathbf{Y},\omega+\epsilon)|$ depends only on the direction of $\mathbf{Y}$ , and not on its magnitude, we can instead consider $\mathbf{Y}$ to be distributed according to the Haar measure on $\mathbb{S}^{m-1}$ . The constant $M(\epsilon)$ is determined only by the concentration of measure phenomenon cited above, and it does not depend on any parameters in the problem other than $\epsilon$ . From now on, let us restrict our attention to dimensions $m>M(\epsilon)$ . Due to the triangle inequality for the geodesic metric, for $\mathbf{y}$ such that $\angle(\mathbf{z}_{0},\mathbf{y})\in[\pi/2-\epsilon,\pi/2+\epsilon]$ we have

[TABLE]

where $\mathbf{y}_{0}$ is such that $\angle(\mathbf{z}_{0},\mathbf{y}_{0})=\pi/2$ . Therefore,

[TABLE]

for all $\mathbf{y}$ such that $\angle(\mathbf{z}_{0},\mathbf{y})\in[\pi/2-\epsilon,\pi/2+\epsilon]$ and

[TABLE]

To prove the theorem, we need to show that

[TABLE]

for an arbitrary set $A\subseteq\mathbb{L}^{m}$ . Recall that by the definition of a decreasing symmetric rearrangement, we have

[TABLE]

for any threshold $d$ and this implies

[TABLE]

Therefore, the desired statement in (A-B) can be equivalently written as

[TABLE]

Turning to proving (49), recall that by the definition of a decreasing symmetric rearrangement, $\psi^{*}(\alpha)$ is nonincreasing in the angle $\alpha=\angle(\mathbf{y},\mathbf{z}_{0})$ over the interval $0\leq\alpha\leq\pi$ . Let $\beta$ be the smallest value such that $\psi^{*}(\beta)=(1-\epsilon)V$ , or more explicitly,

[TABLE]

If $\beta\geq\pi/2+\epsilon$ , then (49) would follow trivially from (44) and the fact that $\psi^{*}(\alpha)$ would be greater than $(1-\epsilon)V$ for all $0<\alpha<\pi/2+\epsilon$ . We will therefore assume that $0<\beta<\pi/2+\epsilon$ . It remains to show that even if this is the case, we have (49).

By the definition of $\beta$ and the fact that $\psi^{*}$ is nonincreasing,

[TABLE]

To bound the first and third terms of (50) note that

[TABLE]

as a consequence of (44). In order to bound the second term, we establish the following chain of (in)equalities which will be justified below.

[TABLE]

Combining (57) with (51) reveals that the second term in (50) is also bounded by $\epsilon/2$ , therefore

[TABLE]

must be bounded by $\epsilon$ , which proves Theorem II.2.

The first inequality (53) is a consequence of the fact that over the range of the integral, $\psi^{*}$ is less than or equal to $(1-\epsilon)V$ and $\bar{\bar{\psi}}$ is non-negative. The equality in (54) follows from

[TABLE]

which is itself a consequence of (38) with $C=\mathbb{S}^{m-1}$ and

[TABLE]

Next we have (55) which is due to the rearrangement inequality (42) when $C$ is the super-level set $\{{\mathbf{y}}:\psi(\mathbf{y})>(1-\epsilon)V\}$ . By the definition of a symmetric decreasing rearrangement, $\mu(\{{\mathbf{y}}:\psi(\mathbf{y})>(1-\epsilon)V\})=\mu(\{{\mathbf{y}}:\psi^{*}(\mathbf{y})>(1-\epsilon)V\})$ , and the set on the right-hand side is an open or closed spherical cap of angle $\beta$ . Thus $C^{*}$ is a spherical cap with angle $\beta$ and the rearrangement inequality (42) gives

[TABLE]

Finally, for the inequality (56), we first replace the lower integral limit with $\max\{\beta,\pi/2-\epsilon\}\geq\beta$ . Then $\bar{\bar{\psi}}\geq V$ over the range of the integral due to (45). Additionally, $\psi^{*}\leq(1-\epsilon)V$ over the range of the integral, and the inequality follows.

A-C Proof of Theorem II.1 (The Sphere Case)

Given any $A\subseteq\mathbb{S}^{m-1}$ with effective angle $\theta>0$ , construct a corresponding

[TABLE]

The set $A_{\text{shell}}$ also has effective angle $\theta$ as a subset of $\mathbb{L}^{m}$ since

[TABLE]

For any $\epsilon>0$ , we can apply Theorem II.2 to find an $M(\epsilon)$ such that for $m>M(\epsilon)$ ,

[TABLE]

where $V_{\text{shell}}=|\text{ShellCap}(\mathbf{z}_{0},\theta)\cap\text{ShellCap}(\mathbf{y}_{0},\omega)|$ with $\angle(\mathbf{z}_{0},\mathbf{y}_{0})=\pi/2$ . Because the set $\text{ShellCap}(\mathbf{y},\omega)$ depends only on the direction of $\mathbf{y}$ , and not on its magnitude, the probability in (59) is the same whether we consider $\mathbf{Y}$ to be uniformly distributed on $\mathbb{S}^{m-1}$ or from some rotationally invariant probability distribution on $\mathbb{L}^{m}$ . Using spherical coordinates, we have

[TABLE]

and similarly,

[TABLE]

By dividing out the $\int_{R_{L}}^{R_{U}}\left(\frac{r}{R}\right)^{m-1}dr$ term, (59) implies

[TABLE]

where $V=\mu(\text{Cap}(\mathbf{z}_{0},\theta)\cap\text{Cap}(\mathbf{y}_{0},\omega))$ as desired.

Appendix B Proofs of Typicality Lemmas

Here we prove the typicality lemmas presented in Section III-A.

B-A Proof of Lemma III.1

Recalling that $\mathbf{Z}=[Z^{n}(1),Z^{n}(2),\ldots,Z^{n}(B)]$ , we have

[TABLE]

Therefore by the weak law of large numbers, for any $\delta>0$ and $B$ sufficiently large we have

[TABLE]

i.e.,

[TABLE]

since by assumption $E[\|X^{n}\|^{2}]=nP$ and thus $E[\|Z^{n}\|^{2}]=n(P+N)$ . Because $\mathbf{Z}$ and $\mathbf{Y}$ are identically distributed, the above relation also holds with $\|\mathbf{Z}\|^{2}$ replaced by $\|\mathbf{Y}\|^{2}$ . This completes the proof of the lemma.

B-B Proof of Lemma III.2

We now present the proof of Lemma III.2. By the law of large numbers and Lemma III.1, we have for any $\epsilon>0$ and sufficiently large $B$ ,

[TABLE]

where

[TABLE]

Note that in terms of $S_{\epsilon}(X^{n},Z^{n})$ , the set $S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ in Lemma III.2 can be simply written as

[TABLE]

Therefore, for $B$ sufficiently large, we have

[TABLE]

On the other hand, defining $S_{\epsilon}(X^{n},I_{n}):=\{(\mathbf{x},\mathbf{i}):\mbox{Pr}(\mathbf{Z}\in S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})|\mathbf{x},\mathbf{i})\geq 1-\sqrt{\epsilon}\}$ , we have

[TABLE]

Therefore, we have for $B$ sufficiently large,

[TABLE]

and thus

[TABLE]

which proves (13).
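The passage from a bound on the overall probability to a bound on the probability of $S_{\epsilon}(X^{n},I_{n})$ is an averaging (Markov-type) step: if the conditional probabilities average to at least $1-\epsilon$, then the pairs whose conditional probability is at least $1-\sqrt{\epsilon}$ carry total mass at least $1-\sqrt{\epsilon}$. The sketch below checks this on a hypothetical finite example; the weights stand in for the probabilities of $(\mathbf{x},\mathbf{i})$ pairs and are not taken from the paper.

```python
import math

def average_probability(weights: list, cond_probs: list) -> float:
    """Overall probability: sum over pairs of weight times conditional probability."""
    return sum(w * q for w, q in zip(weights, cond_probs))

def good_set_mass(weights: list, cond_probs: list, eps: float) -> float:
    """Total weight of the pairs whose conditional probability is >= 1 - sqrt(eps)."""
    threshold = 1.0 - math.sqrt(eps)
    return sum(w for w, q in zip(weights, cond_probs) if q >= threshold)

# Hypothetical example with eps = 0.04, so the threshold is 1 - sqrt(eps) = 0.8.
eps = 0.04
weights = [0.50, 0.30, 0.15, 0.05]
cond_probs = [1.00, 0.99, 0.95, 0.50]
avg = average_probability(weights, cond_probs)
mass = good_set_mass(weights, cond_probs, eps)
```

Here the average is above $1-\epsilon=0.96$ and the mass of the good set is above $1-\sqrt{\epsilon}=0.8$, as the averaging argument guarantees.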

To prove (14), consider any $(\mathbf{x},\mathbf{i})\in S_{\epsilon}(X^{n},I_{n})$ . From the definition of $S_{\epsilon}(X^{n},I_{n})$ , $\mbox{Pr}(S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})|\mathbf{x},\mathbf{i})\geq 1-\sqrt{\epsilon}$ . Therefore, $S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ must be nonempty, i.e., there exists at least one $\mathbf{z}\in S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ . Consider any $\mathbf{z}\in S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ . By the definition of $S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ , we have $f(\mathbf{z})=\mathbf{i}$ and $(\mathbf{x},\mathbf{z})\in S_{\epsilon}(X^{n},Z^{n})$ . Then, it follows from the definition of $S_{\epsilon}(X^{n},Z^{n})$ that

[TABLE]

This further implies that

[TABLE]

for sufficiently large $B$ , which concludes the proof of (14) and Lemma III.2.

B-C Proof of Corollary III.1

Let the effective angle of $A$ be denoted by $\theta^{\prime}$ , i.e.,

[TABLE]

for some

[TABLE]

where

[TABLE]

Then using the formula for the volume of a shell cap (cf. Appendix C-A and in particular (66)), we have

[TABLE]

for some $\epsilon_{1}\to 0$ as $m\to\infty$ . Recall that by assumption

[TABLE]

and we hence have

[TABLE]

for some $\epsilon_{2}\to 0$ as $m\to\infty$ .

We now apply Theorem II.2 to this specific shell and subset $A$ . First, using the formula for the intersection volume of two shell caps (cf. Appendix C-B and in particular Lemma C.2), we have

[TABLE]

for some $\epsilon_{3},\epsilon_{4}\to 0$ as $m\to\infty$ , where $\angle(\mathbf{z}_{0},\mathbf{y}_{0})=\pi/2$ and $\theta^{\prime}+\omega>\pi/2$ . Then Theorem II.2 asserts that for any $\omega\in(\pi/2-\theta^{\prime},\pi/2]$ and $m$ sufficiently large,

[TABLE]

where $\mathbf{Y}$ is a random vector drawn from any rotationally invariant distribution on the shell. Since $\pi/2-\theta^{\prime}\leq\pi/2-\theta+\epsilon_{2}$ , the condition $\omega\in(\pi/2-\theta^{\prime},\pi/2]$ in the above can be replaced with the weaker condition $\omega\in(\pi/2-\theta+\epsilon_{2},\pi/2]$ . Now by choosing $m$ sufficiently large we can make $\epsilon_{2},\epsilon_{4}$ and $\frac{2}{m}\log(1-\epsilon)$ as small as desired, so we have

[TABLE]

for any $\omega\in(\pi/2-\theta,\pi/2]$ and $m$ sufficiently large. Finally, observe that for any $\mathbf{y}$ in the considered shell,

[TABLE]

This simply follows from the geometry illustrated in Fig. 5 combined with the triangle inequality and the fact that the thickness of the shell can be trivially bounded by $2\sqrt{m\epsilon}$ . Therefore, we can conclude that

[TABLE]

for any $\omega\in(\pi/2-\theta,\pi/2]$ and $m$ sufficiently large. This completes the proof of Corollary III.1.
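The bound $2\sqrt{m\epsilon}$ on the shell thickness used in this proof follows from $\sqrt{a}-\sqrt{b}\leq\sqrt{a-b}$ for $a\geq b\geq 0$. The short sketch below checks it numerically for a few parameter values; the function name is hypothetical.

```python
import math

def shell_thickness(m: int, N: float, eps: float) -> float:
    """Outer radius minus inner radius of Shell(0, sqrt(m (N - eps)), sqrt(m (N + eps)))."""
    if not (0.0 < eps < N):
        raise ValueError("need 0 < eps < N")
    return math.sqrt(m * (N + eps)) - math.sqrt(m * (N - eps))

# Check thickness <= 2 * sqrt(m * eps) on a small grid of parameters.
checks = [
    shell_thickness(m, N, eps) <= 2.0 * math.sqrt(m * eps)
    for m in (10, 1000, 100_000)
    for N in (0.5, 1.0, 4.0)
    for eps in (0.01, 0.1, 0.4)
]
```

Every entry of `checks` is `True`, consistent with the bound being loose but sufficient for the argument.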

B-D Proof of Lemma III.3

Fix $\epsilon>0$ and consider a pair $(\mathbf{x},\mathbf{i})\in S_{\epsilon}(X^{n},I_{n})$ . From Lemma III.2, we have

[TABLE]

for $B$ sufficiently large. We also have

[TABLE]

for some $\epsilon_{1}\to 0$ as $\epsilon\to 0$ , where $p(\mathbf{z}|\mathbf{x})$ refers to the conditional density of $\mathbf{z}$ given $\mathbf{x}$ . The second inequality in the above follows because for any $\mathbf{z}\in S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ , we have

[TABLE]

and therefore using the fact that $\mathbf{Z}$ is Gaussian distributed given $\mathbf{x}$ , we have for any $\mathbf{z}\in S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ ,

[TABLE]

where $\epsilon_{1}\to 0$ as $\epsilon\to 0$ . Therefore, for $B$ sufficiently large, the volume of $S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ can be lower bounded by

[TABLE]

Let $\theta^{\prime}_{n}$ be defined such that

[TABLE]

Obviously, we have $\theta^{\prime}_{n}\leq\theta_{n}$ and $\theta^{\prime}_{n}\to\theta_{n}$ as $\epsilon\to 0$ . Noting that $S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ is a subset of

[TABLE]

by Corollary III.1, for any $\omega\in(\pi/2-\theta^{\prime}_{n},\pi/2]$ we have

[TABLE]

for any $\mathbf{U}$ drawn from a rotationally invariant distribution around $\mathbf{x}$ on $\mbox{Shell}\left(\mathbf{x},\sqrt{nB(N-\epsilon)},\sqrt{nB(N+\epsilon)}\right)$ , where $\epsilon_{2}$ is defined such that

[TABLE]

and $\epsilon_{3}$ is defined such that

[TABLE]

and both $\epsilon_{2}$ and $\epsilon_{3}$ tend to zero as $\epsilon$ goes to zero.

We now translate the bound (61), which involves a vector $\mathbf{U}$ with a rotationally invariant distribution on the shell, into a bound on the probability involving $\mathbf{Y}$ . Define $\mathcal{Y}_{(\mathbf{x},\mathbf{i})}$ to be the following set of $\mathbf{y}$ :

[TABLE]

Then we have for $(\mathbf{x},\mathbf{i})\in S_{\epsilon}(X^{n},I_{n})$ and $B$ sufficiently large,

[TABLE]

where the second inequality simply follows by applying the law of large numbers in a manner similar to the proof of Lemma III.1, and the last inequality follows from combining (61) and the fact that if $\mathbf{x}$ is known and $\mathbf{Y}$ is restricted to $\mbox{Shell}\left(\mathbf{x},\sqrt{nB(N-\epsilon)},\sqrt{nB(N+\epsilon)}\right)$ then $\mathbf{Y}$ is rotationally invariant around $\mathbf{x}$ on this shell.

Since by definition $S_{\epsilon}(Z^{n}|\mathbf{x},\mathbf{i})$ is a subset of $f^{-1}(\mathbf{i})\cap\text{Ball}\left(\mathbf{0},\sqrt{nB(P+N+\epsilon)}\right),$ we have

[TABLE]

for any $\mathbf{y}\in\mathcal{Y}_{(\mathbf{x},\mathbf{i})}$ , and therefore for $B$ sufficiently large,

[TABLE]

for any $\omega\in(\pi/2-\theta^{\prime}_{n},\pi/2]$ . Finally, choosing $\delta=\max\{4\sqrt{\epsilon},\epsilon_{2},\epsilon_{3},\theta_{n}-\theta^{\prime}_{n}\}$ concludes the proof of Lemma III.3. Note that by choosing $B$ sufficiently large, $\epsilon$ and therefore $\delta$ can be made arbitrarily small.

Appendix C Miscellaneous Results in High-Dimensional Geometry

This appendix derives some miscellaneous results in high-dimensional geometry, including the surface area (volume) of a spherical (shell) cap, the surface area (volume) of the intersection of two spherical (shell) caps, and the volume of the intersection of two balls.

C-A Surface Area (Volume) of A Spherical (Shell) Cap

We first derive the surface area (volume) formula for a spherical (shell) cap. See also [23].

Let $C\subseteq\mathbb{S}^{m-1}$ be a spherical cap with angle $\theta$ on the $(m-1)$ -sphere of radius $R=\sqrt{mN}$ . The area $\mu(C)$ of $C$ can be written as

[TABLE]

where $A_{m-2}(R\sin\rho)$ is the total surface area of the $(m-2)$ -sphere of radius $R\sin\rho$ . Plugging in the expression for the surface area of an $(m-2)$ -sphere leads to

[TABLE]

We now characterize the exponent of $\mu(C)$ . First, by Stirling’s approximation, $\frac{2\pi^{\frac{m-1}{2}}}{\Gamma\left(\frac{m-1}{2}\right)}(mN)^{\frac{m-2}{2}}$ in the above can be bounded as

[TABLE]

for some $\epsilon_{1}\to 0$ as $m\to\infty$ . Also for $m$ sufficiently large, we have

[TABLE]

and

[TABLE]

for some $\epsilon_{2}\to 0$ as $m\to\infty$ . Therefore, the area $\mu(C)$ can be bounded as

[TABLE]

for some $\epsilon\to 0$ as $m\to\infty$ .
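The exponent characterization above can be checked numerically. The following is a minimal sketch, not part of the original derivation: it evaluates $\mu(C)=\int_{0}^{\theta}A_{m-2}(R\sin\rho)R\,d\rho$ in the log domain and compares $\frac{1}{m}\log_{2}\mu(C)$ with $\frac{1}{2}\log_{2}(2\pi eN\sin^{2}\theta)$ . The target exponent is an assumption about the form of the bound (the display itself is not reproduced here), and the function names and the midpoint-rule integration are illustrative choices.

```python
import math

def log2_cap_area(m, N, theta, steps=20000):
    """log2 of the area of a cap of angle theta on the (m-1)-sphere of radius sqrt(m*N)."""
    R = math.sqrt(m * N)
    # log of 2*pi^((m-1)/2) / Gamma((m-1)/2), the area of the unit (m-2)-sphere
    log_unit_area = (math.log(2) + 0.5 * (m - 1) * math.log(math.pi)
                     - math.lgamma(0.5 * (m - 1)))
    # midpoint rule for the integral of sin^(m-2)(rho) over [0, theta], in the log domain
    h = theta / steps
    logs = [(m - 2) * math.log(math.sin((k + 0.5) * h)) for k in range(steps)]
    top = max(logs)
    log_integral = top + math.log(h * sum(math.exp(v - top) for v in logs))
    # mu(C) = unit_area * R^(m-2) * R * integral
    return (log_unit_area + (m - 1) * math.log(R) + log_integral) / math.log(2)

def cap_exponent_gap(m, N, theta):
    """Difference between (1/m) log2 mu(C) and the assumed limit exponent."""
    target = 0.5 * math.log2(2 * math.pi * math.e * N * math.sin(theta) ** 2)
    return log2_cap_area(m, N, theta) / m - target

if __name__ == "__main__":
    for m in (500, 4000):
        print(m, cap_exponent_gap(m, 1.0, math.pi / 3))
```

The gap should shrink roughly like $(\log m)/m$ , which is the role played by the vanishing $\epsilon$ in the bound.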

Now suppose that $C=\text{ShellCap}(\mathbf{z}_{0},\theta)$ is a shell cap on

[TABLE]

where $\|\mathbf{z}_{0}\|=\sqrt{m(N-\delta)}$ . Let $R_{L}=\sqrt{m(N-\delta)}$ , $R_{U}=\sqrt{m(N+\delta)}$ and define $\mathbb{S}_{R_{L}}^{m-1}$ to be the $(m-1)$ -sphere of radius $R_{L}$ with Haar measure $\mu_{R_{L}}$ . We use spherical coordinates to integrate over the surface areas of the individual caps that make up the shell cap,

[TABLE]

where the integral term on the right is bounded as

[TABLE]

Together with (C-A), (63) and (65) imply

[TABLE]

for sufficiently large $m$ . In a similar way,

[TABLE]

and therefore

[TABLE]

where $\epsilon\to 0$ as $m\to\infty$ .

C-B Surface Area (Volume) of the Intersection of Two Spherical (Shell) Caps

Recall $\mathbb{S}^{m-1}\subset\mathbb{R}^{m}$ is the $(m-1)$ -sphere of radius $R=\sqrt{mN}$ . Let

[TABLE]

be two spherical caps on $\mathbb{S}^{m-1}$ such that $\angle({\mathbf{v}_{1}},{\mathbf{v}_{2}})=\frac{\pi}{2}$ , $\theta_{i}\leq\frac{\pi}{2}$ , and $\theta_{1}+\theta_{2}>\frac{\pi}{2}$ . We have the following lemma that characterizes the intersection measure $\mu(C_{1}\cap C_{2})$ of these two caps.

Lemma C.1

For any $\epsilon>0$ there exists an $M(\epsilon)$ such that for $m>M(\epsilon)$ ,

[TABLE]

and

[TABLE]

Proof:

To prove this lemma, we will first derive the surface area formula for the intersection of the above two caps (see also [24]), and then characterize the exponent of this area.

Deriving the Surface Area Formula: Consider the points ${\bf v}\in\mathbb{S}^{m-1}$ such that

[TABLE]

and

[TABLE]

These points satisfy the linear relations

[TABLE]

and

[TABLE]

and therefore all such ${\bf v}$ lie in the unique $m-1$ dimensional subspace $H$ defined by

[TABLE]

The angle between the hyperplane $H$ and the vector ${\mathbf{v}_{2}}$ is

[TABLE]

and because ${\mathbf{v}_{1}}$ and ${\mathbf{v}_{2}}$ are orthogonal and $\|{\mathbf{v}_{2}}\|=R$ ,

[TABLE]

The approach will be as follows. Divide the intersection $C_{1}\cap C_{2}$ into two parts $C^{+}$ and $C^{-}$ that are on either side of the hyperplane $H$ . More concretely,

[TABLE]

and

[TABLE]

Each part $C^{+}$ and $C^{-}$ can be written as a union of lower dimensional spherical caps. We will find the measure of each part by integrating the measures of these lower dimensional caps.

The measure of the cap $C_{2}$ can be expressed as the integral

[TABLE]

where $A_{m-2}(R\sin\rho)$ is the surface area of the $(m-2)$ -sphere with radius $R\sin\rho$ . If we consider a single $(m-2)$ -sphere at some angle $\rho$ , then the hyperplane $H$ divides that $(m-2)$ -sphere into two spherical caps. The claim is that each of these $(m-2)$ -dimensional caps that is on the side of $H$ with ${\mathbf{v}_{1}}$ is contained in $C^{+}$ (and those on the side with ${\mathbf{v}_{2}}$ are contained in $C^{-}$ ). Furthermore, all points in $C^{+}$ are in one of these $(m-2)$ -dimensional caps. The claim follows because

[TABLE]

implies

[TABLE]

and since $\angle({\bf v},{\mathbf{v}_{2}})\leq\theta_{2}$ and $\cos\big(\angle({\bf v},{\mathbf{v}_{2}})\big)\geq\cos\theta_{2}$ , this implies

[TABLE]

Finally, this implies $\angle({\bf v},{\mathbf{v}_{1}})\leq\theta_{1}$ , ${\bf v}\in C_{1}$ , and ${\bf v}\in C^{+}$ .

Note that for $\rho<\phi$ , the $(m-2)$ -sphere at angle $\rho$ is entirely on the ${\mathbf{v}_{2}}$ side of $H$ , and does not need to be included when computing the measure of $C^{+}$ . This establishes the fact that

[TABLE]

where $C_{m-2}^{\theta_{\rho}}(R\sin\rho)$ is the surface area of an $(m-2)$ -dimensional spherical cap defined by angle $\theta_{\rho}$ on the $(m-2)$ -sphere of radius $R\sin\rho$ . Writing

[TABLE]

note that $h$ is the distance from the center of the $(m-2)$ -sphere at angle $\rho$ to the $m-2$ dimensional hyperplane that divides the sphere into two caps. Furthermore, since the $(m-2)$ -sphere has center $(R\cos\rho){\mathbf{v}_{2}}$ , we have

[TABLE]

Therefore,

[TABLE]

Combining this with the corresponding result for $\mu(C^{-})$ yields

[TABLE]

This expression can be rewritten using known expressions for the area of a spherical cap in terms of the regularized incomplete beta function as

[TABLE]

where $J(\phi,\theta_{2})$ is defined as

[TABLE]

and $J(\pi/2-\phi,\theta_{1})$ is defined similarly. Here in (67), $I_{x}(a,b)$ is the regularized incomplete beta function, given by

[TABLE]

where $B(x;a,b)$ and $B(a,b)$ are the incomplete beta function and the complete beta function respectively:

[TABLE]
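As a small sanity check of the beta-function representation of cap areas, the sketch below compares two ways of computing the fraction of the unit sphere in $\mathbb{R}^{m}$ covered by a cap of angle $\theta\leq\pi/2$ : direct integration of $\sin^{m-2}\rho$ , and the standard identity $\frac{1}{2}I_{\sin^{2}\theta}\left(\frac{m-1}{2},\frac{1}{2}\right)$ . This identity is the usual single-cap formula and is shown here for illustration; it is not the intersection formula (67) itself. The helper names and the midpoint rule are assumptions of this sketch.

```python
import math

def midpoint(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def reg_inc_beta(x, a, b):
    """I_x(a, b) = B(x; a, b) / B(a, b), evaluated numerically for a >= 1, 0 <= x < 1."""
    incomplete = midpoint(lambda t: t ** (a - 1) * (1 - t) ** (b - 1), 0.0, x)
    complete = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return incomplete / complete

def cap_fraction_beta(m, theta):
    """Cap fraction via the regularized incomplete beta function (theta <= pi/2)."""
    return 0.5 * reg_inc_beta(math.sin(theta) ** 2, 0.5 * (m - 1), 0.5)

def cap_fraction_direct(m, theta):
    """Cap fraction via the ratio of integrals of sin^(m-2)."""
    f = lambda rho: math.sin(rho) ** (m - 2)
    return midpoint(f, 0.0, theta) / midpoint(f, 0.0, math.pi)

if __name__ == "__main__":
    for m in (3, 10):
        print(m, cap_fraction_beta(m, math.pi / 3), cap_fraction_direct(m, math.pi / 3))
```

For $m=3$ both expressions reduce to the familiar $(1-\cos\theta)/2$ .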

Characterizing the Exponent: We now lower and upper bound $J(\phi,\theta_{2})$ with exponential functions. First, using Stirling’s approximation, $\frac{{(\pi mN)}^{\frac{m-1}{2}}}{\Gamma\left(\frac{m-1}{2}\right)}$ on the R.H.S. of (67) can be bounded as

[TABLE]

for some $\epsilon_{1}\to 0$ as $m\to\infty$ .

Now consider

[TABLE]

inside the integral on the R.H.S. of (67). In light of (68), it can be written as

[TABLE]

For the denominator in (70), by Stirling’s approximation, we have

[TABLE]

For the numerator in (70), we have

[TABLE]

for some $\epsilon_{2}\to 0$ as $m\to\infty$ , and

[TABLE]

for some $\epsilon_{3}\to 0$ as $m\to\infty$ . Also noting that

[TABLE]

with $\rho\in[\phi,\theta_{2}]$ , we can bound the integrand in (67) as

[TABLE]

and

[TABLE]

for some $\epsilon_{4}\to 0$ as $m\to\infty$ . For sufficiently large $m$ ,

[TABLE]

and

[TABLE]

for some $\epsilon_{5}\to 0$ as $m\to\infty$ .

Combining this with (69), we can bound $J(\phi,\theta_{2})$ as

[TABLE]

for some $\epsilon_{6}\to 0$ as $m\to\infty$ .

Due to symmetry, we can also bound $J(\pi/2-\phi,\theta_{1})$ as

[TABLE]

Noting that $\sin^{2}\theta_{2}-\cos^{2}\theta_{1}=\sin^{2}\theta_{1}-\cos^{2}\theta_{2}$ , we have

[TABLE]

and

[TABLE]

for some $\epsilon\to 0$ as $m\to\infty$ . This completes the proof of the lemma. ∎
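The exponent in Lemma C.1 can also be checked numerically by a route different from the proof above. The sketch below is an illustration under stated assumptions, not the authors' method: it uses the standard fact that the first two coordinates of a uniform point on the unit sphere in $\mathbb{R}^{m}$ have joint density $\frac{m-2}{2\pi}(1-x_{1}^{2}-x_{2}^{2})^{(m-4)/2}$ on the unit disc, so the fraction of the sphere in $C_{1}\cap C_{2}$ is the mass of the region $x_{1}\geq\cos\theta_{1}$ , $x_{2}\geq\cos\theta_{2}$ . It assumes $\theta_{1},\theta_{2}<\pi/2$ strictly, truncates the integration to a small box near the corner $(\cos\theta_{1},\cos\theta_{2})$ where the mass concentrates, and assumes the target exponent of the fraction is $\frac{1}{2}\log_{2}(\sin^{2}\theta_{1}-\cos^{2}\theta_{2})$ . All names and grid parameters are hypothetical.

```python
import math

def log2_intersection_fraction(m, theta1, theta2, grid=400, span=40.0):
    """Approximate log2 of the fraction of the sphere in R^m inside both caps."""
    c1, c2 = math.cos(theta1), math.cos(theta2)
    a = 0.5 * (m - 4)      # exponent of the two-coordinate marginal density
    side = span / m        # mass concentrates within O(1/m) of the corner (c1, c2)
    h = side / grid
    logs = []
    for i in range(grid):
        x1 = c1 + (i + 0.5) * h
        for j in range(grid):
            x2 = c2 + (j + 0.5) * h
            s = 1.0 - x1 * x1 - x2 * x2
            if s > 0.0:    # stay inside the unit disc
                logs.append(a * math.log(s))
    top = max(logs)
    total = h * h * sum(math.exp(v - top) for v in logs)
    log_fraction = math.log((m - 2) / (2 * math.pi)) + top + math.log(total)
    return log_fraction / math.log(2)

def intersection_exponent_gap(m, theta1, theta2):
    """Difference between (1/m) log2 of the fraction and the assumed limit exponent."""
    target = 0.5 * math.log2(math.sin(theta1) ** 2 - math.cos(theta2) ** 2)
    return log2_intersection_fraction(m, theta1, theta2) / m - target

if __name__ == "__main__":
    for m in (500, 2000):
        print(m, intersection_exponent_gap(m, math.pi / 3, math.pi / 3))
```

The quantity $\sin^{2}\theta_{1}-\cos^{2}\theta_{2}$ is positive exactly when $\theta_{1}+\theta_{2}>\pi/2$ for angles in $[0,\pi/2]$ , which matches the condition under which the two caps overlap.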

We now utilize Lemma C.1 to characterize the volume of the intersection of two shell caps. Consider a spherical shell

[TABLE]

with $R_{L}=\sqrt{m(N-\delta)}$ , $R_{U}=\sqrt{m(N+\delta)}$ and two caps on this shell, i.e. $S_{1}=\text{ShellCap}(\mathbf{z}_{0},\theta)$ and $S_{2}=\text{ShellCap}(\mathbf{y}_{0},\omega)$ , where $\angle(\mathbf{z}_{0},\mathbf{y}_{0})=\pi/2$ and $\theta+\omega>\pi/2$ . The following lemma bounds the intersection volume $|S_{1}\cap S_{2}|$ of these two shell caps.

Lemma C.2

For any $\epsilon>0$ there exists an $M(\epsilon)$ such that for $m>M(\epsilon)$ ,

[TABLE]

and

[TABLE]

Proof:

Using spherical coordinates, we have

[TABLE]

where the integral term on the right is bounded as

[TABLE]

Given $\epsilon>0$ , set $M=\max\{M_{1},M_{2}\}$ where $M_{1}$ is given by Lemma C.1 to ensure

[TABLE]

and $M_{2}$ is chosen to be sufficiently large so that the right-hand side of (C-B) satisfies

[TABLE]

Together with (C-B), this implies

[TABLE]

for $m>M$ .

For the inequality in the other direction, define $\mathbb{S}_{R_{U}}^{m-1}$ to be the $(m-1)$ -sphere of radius $R_{U}$ with Haar measure $\mu_{R_{U}}$ . Then

[TABLE]

where the integral term on the right is bounded as

[TABLE]

Given $\epsilon>0$ , set $M=\max\{M_{1},M_{2}\}$ where $M_{1}$ is given by Lemma C.1 to ensure

[TABLE]

and $M_{2}$ is chosen to be sufficiently large so that the right-hand side of (74) satisfies

[TABLE]

Together with (C-B), this implies

[TABLE]

for $m>M$ . ∎

C-C Volume of the Intersection of Two Balls

Proof:

The intersection of $\text{Ball}(\mathbf{c}_{1},\sqrt{mR_{1}})$ and $\text{Ball}(\mathbf{c}_{2},\sqrt{mR_{2}})$ consists of two caps, $C_{1}$ and $C_{2}$ , as depicted in Fig. 7. To bound the volume of $\text{Ball}(\mathbf{c}_{1},\sqrt{mR_{1}})\cap\text{Ball}(\mathbf{c}_{2},\sqrt{mR_{2}})$ , we will bound $|C_{1}|$ and $|C_{2}|$ separately.

We first bound $|C_{1}|$ . By the cosine formula, we have

[TABLE]

and therefore

[TABLE]

From Appendix C-A, we have for any $\epsilon>0$ and $m$ sufficiently large,

[TABLE]

where

[TABLE]

Similarly, we have

[TABLE]

and therefore

[TABLE]

Combining the above, we obtain

[TABLE]

for any $\epsilon>0$ and $m$ sufficiently large. ∎
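The cosine-formula step can be illustrated with a short computation. This is a sketch under assumptions: the balls have squared radii $mR_{1}$ and $mR_{2}$ , and the squared distance between their centers is written $mD$ , where $D$ is a name introduced here for illustration. The code computes the half-angle of each cap from the triangle formed by the two centers and a point on both spheres, and confirms that both caps share the same normalized squared base radius $R_{i}\sin^{2}\theta_{i}$ .

```python
import math

def cap_half_angle(R_own, R_other, D):
    """Half-angle of the cap cut from the ball of squared radius m*R_own.

    Law of cosines in the triangle (c_own, c_other, point on both spheres);
    the common factor m cancels. D is the normalized squared center distance
    (hypothetical name).
    """
    cos_theta = (R_own + D - R_other) / (2.0 * math.sqrt(R_own * D))
    return math.acos(cos_theta)

def base_radius_sq(R_own, R_other, D):
    """Normalized squared radius R_own * sin^2(theta) of the cap's base."""
    theta = cap_half_angle(R_own, R_other, D)
    return R_own * math.sin(theta) ** 2

def base_radius_sq_closed_form(R1, R2, D):
    """Symmetric closed form of the same quantity."""
    return (2.0 * D * (R1 + R2) - (R1 - R2) ** 2 - D ** 2) / (4.0 * D)

if __name__ == "__main__":
    R1, R2, D = 2.0, 1.0, 2.0
    print(base_radius_sq(R1, R2, D), base_radius_sq(R2, R1, D),
          base_radius_sq_closed_form(R1, R2, D))
```

Because the two caps meet along the same lower-dimensional sphere, the value is symmetric in $R_{1}$ and $R_{2}$ ; the sketch only verifies this geometric fact and does not reproduce the volume bound itself.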

Acknowledgement

The authors would like to acknowledge inspiring discussions with Liang-Liang Xie within a preceding collaboration [4]. They would also like to thank the anonymous reviewers and the Associate Editor for many valuable comments that helped improve the presentation of this paper.

Bibliography


[1] X. Wu, L. Barnes, and A. Ozgur, “Cover’s open problem: ‘The capacity of the relay channel’,” in Proc. of 54th Annual Allerton Conference on Communication, Control, and Computing, Allerton Retreat Center, Monticello, Illinois, 2016.
[2] T. M. Cover, “The capacity of the relay channel,” in Open Problems in Communication and Computation, T. M. Cover and B. Gopinath, Eds. New York: Springer-Verlag, 1987, pp. 72–73.
[3] X. Wu and A. Ozgur, “Improving on the cut-set bound via geometric analysis of typical sets,” in Proc. of 2016 International Zurich Seminar on Communications.
[4] X. Wu, A. Ozgur, and L.-L. Xie, “Improving on the cut-set bound via geometric analysis of typical sets,” IEEE Trans. Inform. Theory, vol. 63, pp. 2254–2277, April 2017.
[5] X. Wu and A. Ozgur, “Cut-set bound is loose for Gaussian relay networks,” in Proc. of 53rd Annual Allerton Conference on Communication, Control, and Computing, Allerton Retreat Center, Monticello, Illinois, Sept. 29–Oct. 1, 2015.
[6] X. Wu and A. Ozgur, “Cut-set bound is loose for Gaussian relay networks,” IEEE Trans. Inform. Theory, vol. 64, pp. 1023–1037, February 2018.
[7] X. Wu and A. Ozgur, “Improving on the cut-set bound for general primitive relay channels,” in Proc. of IEEE Int. Symposium on Information Theory, Barcelona, Spain, Jul. 2016.
[8] X. Wu, L. Barnes, and A. Ozgur, “The geometry of the relay channel,” in Proc. of IEEE Int. Symposium on Information Theory, Aachen, Germany, June 2017.

