"The Capacity of the Relay Channel": Solution to Cover's Problem in the Gaussian Case
Xiugang Wu, Leighton Pate Barnes, Ayfer Ozgur

TL;DR
This paper solves a long-standing open problem posed by Cover: for the Gaussian relay channel, the capacity with a finite-rate relay-to-destination link never reaches the capacity obtained with an unlimited link. The proof uses a new high-dimensional geometric approach.
Contribution
The paper introduces a novel geometric method to bound the capacity of Gaussian relay channels, providing a definitive answer to Cover's problem.
Findings
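For any finite relay-link capacity, the capacity of the Gaussian relay channel is strictly below its capacity with an unlimited relay link, regardless of the SNR.
Develops an extension of the classical isoperimetric inequality on a high-dimensional sphere.
Provides a new upper bound on the capacity of the Gaussian relay channel.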
Xiugang Wu, Leighton Pate Barnes, and Ayfer Özgür
The work was supported in part by NSF award CCF-1704624 and by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-0939370. This paper was presented in part at the 2016 Allerton Conference on Communication, Control, and Computing [1].
X. Wu is with the Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA. The work of X. Wu was done when he was with Stanford University.
L. P. Barnes and A. Özgür are with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA.
Abstract
Consider a memoryless relay channel, where the relay is connected to the destination with an isolated bit pipe of capacity $C_0$. Let $C(C_0)$ denote the capacity of this channel as a function of $C_0$. What is the critical value of $C_0$ such that $C(C_0)$ first equals $C(\infty)$? This is a long-standing open problem posed by Cover and named “The Capacity of the Relay Channel,” in Open Problems in Communication and Computation, Springer-Verlag, 1987. In this paper, we answer this question in the Gaussian case and show that $C(C_0)$ cannot equal $C(\infty)$ unless $C_0 = \infty$, regardless of the SNR of the Gaussian channels. This result follows as a corollary to a new upper bound we develop on the capacity of this channel. Instead of “single-letterizing” expressions involving information measures in a high-dimensional space as is typically done in converse results in information theory, our proof directly quantifies the tension between the pertinent $n$-letter forms. This is done by translating the information tension problem to a problem in high-dimensional geometry. As an intermediate result, we develop an extension of the classical isoperimetric inequality on a high-dimensional sphere, which can be of interest in its own right.
Index Terms:
Relay channel, capacity, information inequality, geometry, isoperimetric inequality, concentration of measure
I Problem Setup and Main Result
In 1987, Thomas M. Cover formulated a seemingly simple question in Open Problems in Communication and Computation, Springer-Verlag [2], which he called “The Capacity of the Relay Channel”. This problem, not much longer than a single page in [2], remains open to date. His problem statement, taken verbatim from [2] with only a few minor notation changes, is as follows:
The Capacity of the Relay Channel
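Consider the following seemingly simple discrete memoryless relay channel:
Here $Z$ and $Y$ are conditionally independent and conditionally identically distributed given $X$, that is, $p(z,y|x) = p(z|x)p(y|x)$. Also, the channel from $Z$ to $Y$ does not interfere with $Y$. A $(2^{nR}, n)$ code for this channel is a map $x^n : [1:2^{nR}] \to \mathcal{X}^n$, a relay function $f_n : \mathcal{Z}^n \to [1:2^{nC_0}]$ and a decoding function $g_n : \mathcal{Y}^n \times [1:2^{nC_0}] \to [1:2^{nR}]$. The probability of error is given by
$$P_e^{(n)} = \Pr\big(g_n(Y^n, f_n(Z^n)) \neq M\big),$$
where the message $M$ is uniformly distributed over $[1:2^{nR}]$ and
$$p(m, y^n, z^n) = 2^{-nR} \prod_{i=1}^{n} p(y_i|x_i(m)) \prod_{i=1}^{n} p(z_i|x_i(m)).$$
Let $C(C_0)$ be the supremum of achievable rates $R$ for a given $C_0$, that is, the supremum of the rates $R$ for which $P_e^{(n)}$ can be made to tend to zero. We note the following facts:
- 1.
$C(0) = \sup_{p(x)} I(X;Y)$.
- 2.
$C(\infty) = \sup_{p(x)} I(X;Y,Z)$.
- 3.
$C(C_0)$ is a nondecreasing function of $C_0$.
What is the critical value of $C_0$ such that $C(C_0)$ first equals $C(\infty)$?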
I-A Main Result
As is customary in network information theory, Cover formulates the problem for discrete memoryless channels. However, the same question clearly applies to channels with continuous input and output alphabets, and in particular when the channels from the source to the relay and the destination are Gaussian, which is the canonical model for wireless relay channels. More formally, assume
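$$Z = X + W_1, \qquad Y = X + W_2,$$
with the transmitted signal being constrained to average power $P$, i.e.,
$$\frac{1}{n}\sum_{i=1}^{n} x_i^2(m) \le P, \qquad \forall m \in [1:2^{nR}],$$
and $W_1, W_2 \sim \mathcal{N}(0, N)$ representing Gaussian noises that are independent of each other and $X$. See Fig. 1.
For this Gaussian relay channel, it is easy to observe that (all logarithms throughout the paper are to base two)
$$C(0) = \frac{1}{2}\log\left(1 + \frac{P}{N}\right), \qquad C(\infty) = \frac{1}{2}\log\left(1 + \frac{2P}{N}\right).$$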
Let $C_0^*$ denote the threshold in Cover’s problem, i.e.
$$C_0^* := \inf\{C_0 : C(C_0) = C(\infty)\}.$$
For the Gaussian model, there is no known scheme that achieves $C(\infty)$ at a finite $C_0$, regardless of the parameters of the channels, i.e. the signal to noise power ratio (SNR) $P/N$. Therefore, from an achievability perspective we only have the trivial bound
$$C_0^* \le \infty.$$
On the converse side, any upper bound on the capacity of this channel can be used to establish a lower bound on $C_0^*$. The only upper bound on the capacity of this channel (prior to our work in [5]–[6] preceding the current paper) was the celebrated cut-set bound developed by Cover and El Gamal in 1979 [10]. It yields the following lower bound on $C_0^*$:
$$C_0^* \ge \frac{1}{2}\log\left(1 + \frac{2P}{N}\right) - \frac{1}{2}\log\left(1 + \frac{P}{N}\right).$$
Note that the cut-set bound does not preclude achieving $C(\infty)$ at finite $C_0$. Moreover, it is interesting to note that as $P/N$ decreases to zero, this lower bound decreases to zero. This implies a sharp dichotomy between the current achievability and converse results for this problem, which becomes even more apparent in the limit when SNR goes to zero: the cut-set bound does not preclude achieving $C(\infty)$ at diminishing $C_0$ if $P/N$ itself is diminishing, while from an achievability perspective we need $C_0 = \infty$ regardless of the SNRs of the channels (apart from the trivial case when $P/N$ is exactly equal to $0$). The main result of our paper is to show that $C_0^* = \infty$ regardless of the parameters of the problem, answering Cover’s long-standing question for the canonical Gaussian model.
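The behavior of the cut-set threshold can be checked numerically. The following is a minimal sketch, assuming the standard Gaussian expressions $C(0) = \frac{1}{2}\log(1+P/N)$ and $C(\infty) = \frac{1}{2}\log(1+2P/N)$ and the cut-set bound in the form $C(C_0) \le \min\{C(\infty), C(0) + C_0\}$; the function names are illustrative and not taken from the paper.

```python
import math


def c_no_relay(snr: float) -> float:
    """C(0) in bits: (1/2) log2(1 + P/N), the destination uses Y only."""
    return 0.5 * math.log2(1.0 + snr)


def c_full_relay(snr: float) -> float:
    """C(inf) in bits: (1/2) log2(1 + 2P/N), the destination sees both Y and Z."""
    return 0.5 * math.log2(1.0 + 2.0 * snr)


def cutset_bound(snr: float, c0: float) -> float:
    """Assumed form of the cut-set upper bound on C(C_0): min{C(inf), C(0) + C_0}."""
    return min(c_full_relay(snr), c_no_relay(snr) + c0)


def cutset_threshold(snr: float) -> float:
    """Smallest C_0 at which the cut-set bound reaches C(inf)."""
    return c_full_relay(snr) - c_no_relay(snr)


if __name__ == "__main__":
    # The threshold shrinks toward 0 at low SNR and approaches 1/2 bit at high SNR.
    for snr_db in (-20, -10, 0, 10, 20):
        snr = 10.0 ** (snr_db / 10.0)
        print(f"SNR {snr_db:+3d} dB: cut-set threshold = {cutset_threshold(snr):.4f} bits")
```

The threshold returned here is only a lower bound on $C_0^*$ implied by the cut-set bound; it says nothing about achievability.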
Theorem I.1
For the symmetric Gaussian relay channel depicted in Fig. 1, $C_0^* = \infty$.
This theorem follows immediately from the following theorem, which establishes a new upper bound on the capacity of this channel for any $C_0$.
Theorem I.2
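For the symmetric Gaussian relay channel depicted in Fig. 1, the capacity $C(C_0)$ satisfies
$$C(C_0) \le \frac{1}{2}\log\left(1 + \frac{P}{N}\right) + \sup_{\theta \in \left[\arcsin(2^{-C_0}),\, \frac{\pi}{2}\right]} \min\left\{ C_0 + \log\sin\theta,\ \min_{\omega \in \left(\frac{\pi}{2} - \theta,\, \frac{\pi}{2}\right]} h_\theta(\omega) \right\},$$
where
$$h_\theta(\omega) = \frac{1}{2}\log\left( \frac{4\sin^2\frac{\omega}{2}\left(P + N - N\sin^2\frac{\omega}{2}\right)\sin^2\theta}{(P+N)\left(\sin^2\theta - \cos^2\omega\right)} \right).$$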
In Fig. 2 we plot this upper bound (label: New bound) under three different SNR values of the Gaussian channels, together with the cut-set bound [10] and an upper bound on the capacity of this channel we have previously derived in [6] (label: Old bound). For reference, we also provide the rate achieved by a compress-and-forward relay strategy (label: C-F), which employs Gaussian input distribution at the source combined with Gaussian quantization and Wyner-Ziv binning at the relay. (In the low SNR regime, we can achieve higher rates using bursty compress-and-forward [21], as demonstrated in the left-most plot of Fig. 2. Note that since we still impose the Gaussian restriction on the input and quantization distributions for bursty compress-and-forward, the resultant rates are not concave in $C_0$ and can be further improved by time sharing.) The flat levels at which the cut-set bound and our old bound saturate in these plots precisely correspond to $C(\infty)$. Note that while these earlier bounds reach $C(\infty)$ at finite $C_0$ values, hence leading to finite lower bounds on $C_0^*$, our new bound remains bounded away from $C(\infty)$ in all three plots. Indeed, the new bound remains bounded away from $C(\infty)$ (the flat level in the plots) at any finite $C_0$ value; we show this formally in the proof of Theorem I.1.
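To make the compress-and-forward reference curve concrete, the sketch below evaluates one common textbook form of the Gaussian compress-and-forward rate for this symmetric channel. It assumes a Gaussian input of power $P$, a relay description $\hat{Z} = Z + Q$ with $Q \sim \mathcal{N}(0, N_q)$, and the Wyner-Ziv constraint $I(Z;\hat{Z}|Y) = C_0$; these modeling choices and the function name are assumptions for illustration and may differ from the exact scheme plotted in Fig. 2.

```python
import math


def compress_forward_rate(p: float, n: float, c0: float) -> float:
    """Rate I(X; Y, Zhat) in bits under the assumed Gaussian compress-and-forward scheme.

    p  : transmit power P
    n  : noise variance N (same at relay and destination)
    c0 : relay-to-destination link capacity C_0 in bits (moderate values only,
         since 2 ** (2 * c0) overflows for very large c0)
    """
    if c0 <= 0.0:
        # No relay link: the destination decodes from Y alone.
        return 0.5 * math.log2(1.0 + p / n)
    # Var(Z | Y) for jointly Gaussian Z = X + W1, Y = X + W2.
    var_z_given_y = n * (2.0 * p + n) / (p + n)
    # Quantization noise chosen so that I(Z; Zhat | Y) = C_0.
    n_q = var_z_given_y / (2.0 ** (2.0 * c0) - 1.0)
    # Two independent looks at X with noise variances N and N + N_q.
    return 0.5 * math.log2(1.0 + p / n + p / (n + n_q))


if __name__ == "__main__":
    p, n = 1.0, 1.0
    c_inf = 0.5 * math.log2(1.0 + 2.0 * p / n)
    for c0 in (0.0, 0.25, 0.5, 1.0, 2.0, 4.0):
        rate = compress_forward_rate(p, n, c0)
        print(f"C0 = {c0:4.2f}: C-F rate = {rate:.4f}, gap to C(inf) = {c_inf - rate:.2e}")
```

Under these assumptions the rate approaches $C(\infty)$ as $C_0$ grows but does not reach it at any finite $C_0$, which is consistent with the achievability picture described above.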
While in this paper we restrict our attention to the symmetric case, an assumption imposed by Cover in his original formulation of the problem given above, our methods and results also extend to the asymmetric case. In [8], we show that when the relay’s and the destination’s observations are corrupted by independent Gaussian noises of different variances, it is still true that $C_0^* = \infty$ regardless of the channel parameters. The extension to this asymmetric case heavily builds on the methods and results we develop in this paper for the symmetric case. Interestingly, the symmetric case, which Cover seems to somewhat arbitrarily assume in his problem formulation, turns out to be the canonical case for our proof technique. We also provide a solution to Cover’s problem for binary symmetric channels in [9] using a similar approach.
I-B Technical Approach
There are two basic aspects in an information-theoretic characterization of an operational problem: the so-called achievability result and converse result. An achievability result establishes what is possible in a given setting, while the converse result distinguishes what is impossible. The ideal situation is when these two results match, in which case an information limit is born. The most famous example goes back to Shannon and the inception of the field: Reliable communication is possible over a noisy channel if, and only if, the rate of transmission does not exceed the capacity of the channel [18].
Over the last two decades, there has been a significant leap forward in developing achievable schemes for multi-user problems, ranging from schemes based on interference alignment and distributed MIMO, to lattice-based techniques, to strategies inspired by network coding and linear deterministic models. This stands in fairly stark contrast to the set of converse arguments in the information theorist’s toolkit. Almost all converse arguments rely on a few fundamental tools that go back to the early years of the field: information measure calculus (e.g., chain rules, non-negativity of divergence), Fano’s inequality, and the entropy power inequality. The typical converse program follows from a clever application of these tools to “single-letterize” an expression involving information measures in a high-dimensional space (so-called $n$-letter forms), with the possible introduction of auxiliary random variables as needed.
In this paper, we take a different approach. Instead of focusing on single-letterizing pertinent $n$-letter forms, we aim to directly quantify the tension between them. To do this, we lift the problem to an even higher dimensional space and study the geometry of the typical sequences generated independently and identically (i.i.d.) from these $n$-dimensional distributions. We establish non-trivial geometric properties satisfied by these typical sequences, which are then translated to inequalities satisfied by the original $n$-dimensional information measures. This notion of “typicality”, connecting information measures associated with a distribution to probabilities of long i.i.d. sequences generated from this distribution, is a standard tool in establishing achievability results in information theory but, to the best of our knowledge, has rarely been used in proving converse results in network information theory, with only a few examples such as the work of Zhang [11] from 1988 and our recent works [3]–[7].
To study the geometry of the typical sequences, we use classical tools from high-dimensional geometry, such as the isoperimetric inequality [14], measure concentration [12], and rearrangement and symmetrization theory [13, 25]. We also prove a new geometric result which can be regarded as an extension of the classical isoperimetric inequality on a high-dimensional sphere and can be of interest in its own right. Note that the classical isoperimetric inequality on the sphere states that among all sets on the sphere with a given measure (area), the spherical cap has the smallest boundary or more generally the smallest neighborhood [16]. As an intermediate result in this paper, we show that the spherical cap not only minimizes the measure of its neighborhood, but roughly speaking, also minimizes the measure of its intersection with the neighborhood of a randomly chosen point on the sphere.
The incorporation of geometric insight in information theory is not new. Formulating the problem of determining the communication capacity of channels as a problem in high-dimensional geometry is indeed one of Shannon’s most important insights that has led to the conception of the field. In his classical paper “Communication in the presence of noise”, 1949 [17], Shannon develops a geometric representation of any point-to-point communication system, and then uses this geometric representation to derive the capacity formula for the AWGN channel. His converse proof is based on a sphere-packing argument, which relies on the notion of sphere hardening (i.e. measure concentration) in high-dimensional space. Our approach resembles Shannon’s approach in [17] in that the main argument in our proof is also a packing argument; however, instead of packing smaller spheres in a larger sphere, we pack (quantization) regions of some minimal measure (and unknown shape) inside a spherical cap. The key ingredient in our packing argument is the extended isoperimetric inequality we develop, which guarantees that each of these quantization regions has some minimal intersection with the spherical cap. Also, note that we do not directly study the geometry of the codewords as in [17], but rather use geometry in an indirect way to solve an $n$-letter information tension problem.
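Shannon’s sphere-packing count for the AWGN channel can be reproduced in a few lines. The sketch below assumes the usual heuristic: received vectors lie in a ball of radius $\sqrt{n(P+N)}$, each codeword occupies a noise ball of radius $\sqrt{nN}$, and the number of codewords is at most the volume ratio. The function names are illustrative.

```python
import math


def max_codewords_log2(n: int, p: float, noise: float) -> float:
    """log2 of the volume ratio between the received-signal ball and one noise ball.

    Volumes of n-dimensional balls scale as radius ** n, so the dimension-dependent
    constant cancels and only the ratio of radii matters.
    """
    r_received = math.sqrt(n * (p + noise))
    r_noise = math.sqrt(n * noise)
    return n * math.log2(r_received / r_noise)


def sphere_packing_rate(n: int, p: float, noise: float) -> float:
    """Bits per channel use implied by the packing count."""
    return max_codewords_log2(n, p, noise) / n


if __name__ == "__main__":
    # The rate does not depend on n and equals (1/2) log2(1 + P/N).
    for n in (10, 100, 1000):
        print(n, sphere_packing_rate(n, p=3.0, noise=1.0))
```

The packing argument in this paper follows the same counting idea, but the packed regions have unknown shape, which is why a lower bound on their intersection with a cap is needed.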
I-C Organization of The Paper
The remainder of the paper is organized as follows. In Section II, we review some basic definitions and results for high-dimensional spheres, and state our main geometric result in Theorem II.2, which can be regarded as an extension of the classical isoperimetric inequality on the sphere. In Section III, we introduce some typicality lemmas and combine them with Theorem II.2 to prove a key information inequality stated in Theorem III.1. The proofs of our main theorems, Theorem I.1 and I.2, are almost immediate given Theorem III.1 and are provided in Section IV.
Appendices A and B are then devoted to the proof of Theorem II.2 and the proofs of the typicality lemmas introduced in Section III, respectively. The proofs of these typicality lemmas require us to derive formulas and exponential characterizations for the area/volume of various high dimensional sets including balls, spherical caps, shell caps, and intersections of such sets. We derive these characterizations in Appendix C.
II Geometry of High-Dimensional Spheres
In this section, we summarize some basic definitions and results for high-dimensional spheres and present our main geometric result which can be regarded as an extension of the classical isoperimetric inequality on high-dimensional spheres. This result is the key to proving the information inequality we present in the next section, which in turn is the key to proving Theorems I.1 and I.2.
II-A Basic Results on High-Dimensional Spheres
We now summarize some basic results on high-dimensional spheres that will be referred to later in the paper.
- (i)
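Isoperimetric Inequality: Let $\mathbb{S}^{m-1} \subseteq \mathbb{R}^m$ denote the $(m-1)$-sphere of radius $R$, i.e.,
$$\mathbb{S}^{m-1} = \{z \in \mathbb{R}^m : \|z\| = R\},$$
equipped with the rotation invariant (Haar) measure $\mu$ that is normalized such that
$$\mu(\mathbb{S}^{m-1}) = \frac{2\pi^{m/2}}{\Gamma(m/2)} R^{m-1},$$
i.e. the usual surface area. Let $\mathbb{P}(A)$ denote the probability of a set or event $A$ with respect to the corresponding Haar probability measure, i.e. the normalized Haar measure such that $\mathbb{P}(\mathbb{S}^{m-1}) = 1$. A spherical cap is defined as a ball on $\mathbb{S}^{m-1}$ in the geodesic metric (or simply the angle) $\angle(z, y) = \arccos(\langle z/R, y/R \rangle)$, i.e.,
$$\mathrm{Cap}(z_0, \theta) = \{z \in \mathbb{S}^{m-1} : \angle(z_0, z) \le \theta\}.$$
See Fig. 3. We will often say that an arbitrary set $A \subseteq \mathbb{S}^{m-1}$ has an effective angle $\theta$ if $\mu(A) = \mu(C)$, where $C = \mathrm{Cap}(z_0, \theta)$ for some arbitrary $z_0 \in \mathbb{S}^{m-1}$.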
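The notion of effective angle can be made concrete numerically. The sketch below relies on the standard fact that, under the Haar probability measure on the $(m-1)$-sphere, the angle to a fixed pole has density proportional to $\sin^{m-2}\varphi$ on $[0, \pi]$. The helper names, the midpoint-rule integration, and the bisection tolerance are illustrative choices, not part of the paper.

```python
import math


def _sin_power_integral(m: int, upper: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of sin(phi)**(m-2) over [0, upper]."""
    h = upper / steps
    return h * sum(math.sin((k + 0.5) * h) ** (m - 2) for k in range(steps))


def cap_probability(m: int, theta: float) -> float:
    """Haar probability of a spherical cap of angle theta on the sphere in R^m (m >= 2)."""
    return _sin_power_integral(m, theta) / _sin_power_integral(m, math.pi)


def effective_angle(m: int, prob: float, tol: float = 1e-6) -> float:
    """Angle theta whose cap has the given probability, found by bisection."""
    lo, hi = 0.0, math.pi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cap_probability(m, mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # A set covering 1% of the sphere has an effective angle close to pi/2 in high dimension.
    for m in (3, 30, 300):
        print(m, effective_angle(m, 0.01))
```

In high dimension even sets of small probability have an effective angle close to $\pi/2$, which is the regime exploited by the concentration results that follow.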
The following proposition is the so-called isoperimetric inequality, which was first proved by Levy in 1951 [14]. (See also [16].) It states the intuitive fact that among all sets on the sphere with a given measure, the spherical cap has the smallest boundary, or more generally the smallest neighborhood. This is formalized as follows:
Proposition II.1
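For any arbitrary set $A \subseteq \mathbb{S}^{m-1}$ such that $\mu(A) = \mu(C)$, where $C = \mathrm{Cap}(z_0, \theta) \subseteq \mathbb{S}^{m-1}$ is a spherical cap, it holds that
$$\mu(A_t) \ge \mu(C_t), \qquad \forall t \ge 0,$$
where $A_t$ is the $t$-neighborhood of $A$, defined as
$$A_t = \Big\{z \in \mathbb{S}^{m-1} : \min_{z' \in A} \angle(z, z') \le t\Big\},$$
and similarly
$$C_t = \Big\{z \in \mathbb{S}^{m-1} : \min_{z' \in C} \angle(z, z') \le t\Big\} = \mathrm{Cap}(z_0, \theta + t).$$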
- (ii)
Measure Concentration: Measure concentration on the sphere refers to the fact that most of the measure of a high-dimensional sphere is concentrated around any equator. The following elementary result capturing this phenomenon will be used later in the paper when we prove the extended isoperimetric inequality.
Proposition II.2
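Given any $\epsilon, \delta > 0$, there exists some $M(\epsilon, \delta)$ such that for any $m \ge M(\epsilon, \delta)$ and any $z \in \mathbb{S}^{m-1}$,
$$\mathbb{P}\left(\angle(z, Y) \in \left[\tfrac{\pi}{2} - \epsilon,\ \tfrac{\pi}{2} + \epsilon\right]\right) \ge 1 - \delta,$$
where $Y \in \mathbb{S}^{m-1}$ is distributed according to the Haar probability measure.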
Proof:
Let . Note for any , the distribution of is the same as the distribution of , since can be written in the form , where is an orthogonal matrix, and the distribution of is rotation-invariant. Therefore, without loss of generality, we can assume . Since , we have ; we also have because and . Therefore by Chebyshev’s inequality, for any ,
[TABLE]
Recalling that and noting that the R.H.S. of the above inequality can be made arbitrarily small by choosing to be sufficiently large, we have proved the proposition. ∎
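The concentration around the equator can be observed with a small Monte Carlo experiment. The sketch below samples uniform points on the unit sphere by normalizing i.i.d. Gaussian vectors and compares the empirical fraction near the equator with a Chebyshev-type bound. The bound used, $\mathbb{P}(|\angle(e_1, Y) - \pi/2| > \epsilon) \le 1/(m \sin^2\epsilon)$, is our own restatement, derived from $E[Y_1^2] = R^2/m$, and is an assumption about the form of the elided inequality; all names are illustrative.

```python
import math
import random


def angle_to_pole(m: int, rng: random.Random) -> float:
    """Angle between a Haar-uniform point on the unit sphere in R^m and the pole e_1."""
    g = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(x * x for x in g))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, g[0] / norm)))


def fraction_near_equator(m: int, eps: float, trials: int, seed: int = 0) -> float:
    """Empirical probability that the angle lies within eps of pi/2."""
    rng = random.Random(seed)
    hits = sum(abs(angle_to_pole(m, rng) - math.pi / 2) <= eps for _ in range(trials))
    return hits / trials


def chebyshev_tail_bound(m: int, eps: float) -> float:
    """Assumed Chebyshev-type bound on the probability of being farther than eps from pi/2."""
    return min(1.0, 1.0 / (m * math.sin(eps) ** 2))


if __name__ == "__main__":
    eps = 0.2
    for m in (3, 30, 300, 3000):
        emp = fraction_near_equator(m, eps, trials=200)
        print(f"m = {m:5d}: empirical = {emp:.3f}, Chebyshev lower bound = {1 - chebyshev_tail_bound(m, eps):.3f}")
```

The Chebyshev bound only decays like $1/m$; the empirical fractions typically approach one much faster, in line with the exponential concentration mentioned later in this section.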
- (iii)
Blowing-Up Lemma: The above measure concentration result combined with the isoperimetric inequality immediately yields the following result:
Proposition II.3
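Let $A \subseteq \mathbb{S}^{m-1}$ be an arbitrary set and $C = \mathrm{Cap}(z_0, \theta)$ be a spherical cap such that $\mu(A) = \mu(C)$, i.e. $A$ has an effective angle of $\theta$. Then for any $\epsilon > 0$ and $m$ sufficiently large,
$$\mathbb{P}\left(A_{\frac{\pi}{2} - \theta + \epsilon}\right) \ge 1 - \epsilon. \tag{4}$$
Proof:
If $A = C$, then $\mathbb{P}(A_{\frac{\pi}{2} - \theta + \epsilon}) = \mathbb{P}(\mathrm{Cap}(z_0, \frac{\pi}{2} + \epsilon)) \ge 1 - \epsilon$ due to Proposition II.2. If $A$ is not a spherical cap, then $\mathbb{P}(A_{\frac{\pi}{2} - \theta + \epsilon}) \ge \mathbb{P}(C_{\frac{\pi}{2} - \theta + \epsilon})$ where $C = \mathrm{Cap}(z_0, \theta)$, due to the isoperimetric inequality in Proposition II.1. ∎
If we take $A$ to be a half sphere, this result says that most of the measure of the sphere is concentrated around the boundary of this half-sphere, i.e. an equator, which is the result in Proposition II.2. However, due to the isoperimetric inequality, Proposition II.3 allows us to make the stronger statement that the measure is concentrated around the boundary of any set with probability $1/2$. While the elementary results we establish above suggest that this concentration takes place at a polynomial speed in the dimension $m$, it can be shown that the measure concentrates around the boundary of any set with probability $1/2$ exponentially fast in the dimension $m$; see [15].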
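For the half-sphere case, the blowing-up effect can be tabulated directly: enlarging a half sphere by a small angle $\epsilon$ gives a cap of angle $\pi/2 + \epsilon$, whose Haar probability tends to one as the dimension grows. The sketch below computes this probability by numerical integration of the angle density, which is proportional to $\sin^{m-2}\varphi$; the function name and step count are illustrative assumptions.

```python
import math


def cap_mass(m: int, theta: float, steps: int = 20000) -> float:
    """Haar probability of a cap of angle theta on the sphere in R^m, via the midpoint rule."""
    def integral(upper: float) -> float:
        h = upper / steps
        return h * sum(math.sin((k + 0.5) * h) ** (m - 2) for k in range(steps))
    return integral(theta) / integral(math.pi)


if __name__ == "__main__":
    eps = 0.1
    # Mass of the eps-neighborhood of a half sphere for increasing dimension.
    for m in (10, 100, 1000):
        print(f"m = {m:4d}: P(Cap(pi/2 + {eps})) = {cap_mass(m, math.pi / 2 + eps):.6f}")
```

By the isoperimetric inequality, any other set of probability $1/2$ has a neighborhood at least this large, so the half sphere is the slowest case to blow up.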
II-B Extended Isoperimetry on the Sphere and the Shell
An almost equivalent way to state the blowing-up lemma in Proposition II.3 is the following: Let $A \subseteq \mathbb{S}^{m-1}$ be an arbitrary set with effective angle $\theta > 0$. Then for any $\epsilon > 0$ and sufficiently large $m$,
$$\mathbb{P}\left(\mu\left(A \cap \mathrm{Cap}\left(Y, \tfrac{\pi}{2} - \theta + \epsilon\right)\right) > 0\right) \ge 1 - \epsilon, \tag{5}$$
where $Y$ is distributed according to the normalized Haar measure on $\mathbb{S}^{m-1}$. In words, if we take a $Y$ uniformly at random on the sphere and draw a spherical cap of angle slightly larger than $\frac{\pi}{2} - \theta$ around it, this cap will intersect the set $A$ with high probability. This statement is almost equivalent to (4) since the $Y$’s for which the intersection has non-zero measure lie in the $(\frac{\pi}{2} - \theta + \epsilon)$-neighborhood of $A$. Note that similarly to Proposition II.3, this statement would trivially follow from measure concentration on the sphere (Proposition II.2) if $A$ were known to be a spherical cap, and it holds for any $A$ due to the isoperimetric inequality in Proposition II.1. By building on the Riesz rearrangement inequality [25], we prove the following extended result:
Theorem II.1
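Let $A \subseteq \mathbb{S}^{m-1}$ be any arbitrary subset of $\mathbb{S}^{m-1}$ with effective angle $\theta > 0$, and let $V = \mu(\mathrm{Cap}(z_0, \theta) \cap \mathrm{Cap}(y_0, \omega))$ where $z_0, y_0 \in \mathbb{S}^{m-1}$ with $\angle(z_0, y_0) = \pi/2$ and $\theta + \omega > \pi/2$. (See Fig. 4.) Then for any $\epsilon > 0$, there exists an $M(\epsilon)$ such that for $m > M(\epsilon)$,
$$\mathbb{P}\big(\mu(A \cap \mathrm{Cap}(Y, \omega + \epsilon)) > (1 - \epsilon)V\big) \ge 1 - \epsilon,$$
where $Y$ is a random vector on $\mathbb{S}^{m-1}$ distributed according to the normalized Haar measure.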
If itself is a cap, then the statement in Theorem II.1 is straightforward and follows from the fact that with high probability will be concentrated around the equator at angle from the pole of (Proposition II.2). Therefore, as gets large for almost all , the intersection of the two spherical caps will be given by . See Fig. 4. The statement, however, is stronger than this and holds for any arbitrary set , analogous to the isoperimetric inequality in (5). It states that no matter what the set is, if we take a random point on the sphere and draw a cap of angle slightly larger than centered at this point, for any , then with high probability the intersection of the cap with the set would be at least as large as the intersection we would get if were a spherical cap. In this sense, Theorem II.1 can be regarded as an extension of the isoperimetric inequality in Proposition II.1, even though the latter can be stated purely geometrically and implies the weaker probabilistic statement in (5), while our result is inherently probabilistic.
Theorem II.1 is in fact a special case of a more general theorem that is true for subsets on a spherical shell. Let
[TABLE]
be this shell, where . A cap on this shell with pole and angle can be defined as a ball in terms of the angle:
[TABLE]
on the shell, i.e.,
[TABLE]
Let denote the standard -dimensional Euclidean measure of a subset . We will say that an arbitrary set has effective angle if its measure is equal to that of a shell cap of angle , i.e. for some . We will also say that a probability measure for subsets of is rotationally invariant if for any orthogonal matrix , where denotes the image of the set under the linear transformation . The following more general theorem holds in the shell setting.
Theorem II.2
Let be any arbitrary subset of with effective angle , and let where with and . Then for any , there exists an such that for ,
[TABLE]
where is a random vector drawn from any rotationally invariant probability measure on .
We prove Theorems II.1 and II.2 in Appendix A. Note that in these two results depends only on —in particular it does not depend on the radius parameters for and , respectively, which means that these two results also apply if the radius parameters depend on the dimension . In the following section, we will be mainly interested in the case when the radius parameters scale in the square-root of the dimension.
III Information Tension in a Symmetric Markov Chain
In this section, we prove an inequality between information measures in a certain type of Markov chain, which can be of interest in its own right. The proof of this inequality builds on Theorem II.2 from the previous section. As we will see in Section IV, the main theorems in this paper, i.e. Theorems I.1 and I.2, are almost immediate given this result. We now state this result in the following theorem.
Theorem III.1
Consider a Markov chain where , and are -length random vectors and is a deterministic mapping of to a set of integers. Assume moreover that and are i.i.d. white Gaussian vectors given , i.e. where denotes the identity matrix, , and for some . Then the following inequality holds for any ,
[TABLE]
Note that is trivially lower bounded by for any Markov chain . The above theorem says that if satisfies the conditions of the theorem, then can also be upper bounded in terms of . In particular, it provides an upper bound on in terms of . It can be easily verified that this upper bound on is decreasing with increasing , or equivalently decreasing with decreasing , and implies that as .
We next turn to proving Theorem III.1. The reader who is interested in seeing how this theorem leads to Theorems I.1 and I.2, without seeing its own proof, can jump to Section IV. In order to prove Theorem III.1, we will first establish some properties that are satisfied with high probability by long i.i.d. sequences generated from the source distribution satisfying the assumptions of the theorem. We now state and discuss these properties in Section III-A and then use them to prove Theorem III.1 in Section III-B.
III-A Typicality Lemmas
Assume satisfy the assumptions of Theorem III.1. Consider the -length i.i.d. sequence
[TABLE]
where for any , has the same distribution as . For notational convenience, in the sequel we write the -length sequence as and similarly define and ; note that we have . Also let denote the spherical shell
[TABLE]
and let denote the Euclidean ball
[TABLE]
We next state several properties that satisfy with high probability when is large. The proofs of these properties are given in Appendix B.
Lemma III.1
For any and sufficiently large, we have
[TABLE]
where and are defined to be the following two events respectively:
[TABLE]
and
[TABLE]
The proof of this lemma is a simple application of the law of large numbers and is included in Appendix B-A. The lemma simply states that when is large, and will concentrate in a thin -dimensional shell of radius .
Lemma III.2
Given any and a pair of , let be a set of ’s defined as333Note that under this definition of , if a pair doesn’t satisfy then the set is empty because no can satisfy the condition in (12).
[TABLE]
where as in Theorem III.1. Then for sufficiently large, there exists a set of pairs, such that
[TABLE]
and for any ,
[TABLE]
This lemma establishes the existence of a high probability set of sequences, and a conditional typical set for each such that satisfies some natural properties. Note that all properties in the definition of as well as (14) are analogous to properties of strongly typical sets as stated in [21, Ch. 2]. However, the notion of strong typicality does not apply to the current case since and are continuous random vectors and may or may not be continuous. Nevertheless, analogous properties can still be proved in this case; see the proof of this lemma in Appendix B-B.
The following result has a slightly different flavor from the previous two lemmas in that it is simply a corollary of Theorem II.2 from Section II.
Corollary III.1
For any such that , consider the spherical shell in
[TABLE]
Let be an arbitrary subset on this shell with volume
[TABLE]
where . For any and sufficiently large, we have
[TABLE]
where is drawn from any rotationally invariant distribution on the .
This is a simple corollary of Theorem II.2 when applied to a specific shell and a subset of this shell with measure prescribed by (15). The prescribed measure means that has an effective angle (asymptotically) greater than or equal to . The corollary follows by observing that due to the triangle inequality (see also Fig. 5), for any in the shell, considered in Theorem II.2 is contained in the Euclidean ball
[TABLE]
The lower bound on the intersection volume in (16) follows from an explicit characterization of
[TABLE]
in Theorem II.2, where and ; see Appendix C-B, and in particular Lemma C.2, for this characterization. A formal proof of Corollary III.1 is given in Appendix B-C.
The above corollary together with Lemma III.2 leads to the following lemma.
Lemma III.3
For any and sufficiently large, we have
[TABLE]
where is defined to be the following event:
[TABLE]
in which and .
This lemma can also be regarded as a typicality lemma as it states a property satisfied by pair with high probability when is large. However, this is a non-trivial property. The lemma follows by first fixing a pair and showing that the volume of the set defined in Lemma III.2 can be lower bounded by
[TABLE]
up to the first order term in the exponent. Since by definition is a subset of the shell
[TABLE]
and given , is isotropic Gaussian (therefore rotationally invariant around when constrained to this shell), we can apply Corollary III.1 to the above shell by choosing the set to be . This allows us to conclude that
[TABLE]
The conclusion of Lemma III.3 then follows by observing that by definition
[TABLE]
and removing the conditioning with respect to in (III-A). The formal proof of Lemma III.3 is given in Appendix B-D.
III-B Proof of Theorem III.1
We are now ready to prove Theorem III.1, which mainly builds on Lemma III.3. Consider a that with high probability lies in the ball with center and approximate radius , and draw another ball around of approximate radius and intersect this ball with the original ball; equivalently, this corresponds to considering a cap around of angle on the original ball (see Fig. 6). Lemma III.3 asserts that this cap around will have a certain minimal intersection volume with . In other words, there is a subset of this cap with certain minimal volume that is mapped to . This naturally lends itself to a packing argument: the number of distinct values plausible under a given can be upper bounded by the ratio between the volume of the cap around and the minimal intersection volume occupied for each distinct . This in turn leads to a bound on .
We now proceed with the formal proof. Consider the indicator function
[TABLE]
where is defined as
[TABLE]
and the events and are as given by (8), (9) and (17) respectively. Obviously, by the union bound, we have
[TABLE]
for any and sufficiently large, and therefore
[TABLE]
To bound , it suffices to bound for any
[TABLE]
For this, we apply a packing argument as follows. Consider a ball centered at any satisfying (20) and of radius , i.e.,
[TABLE]
where satisfies
[TABLE]
We now use the following lemma (whose proof is included in Appendix C-C) to upper bound the volume of the intersection between this ball and , i.e.,
[TABLE]
Lemma III.4
Let and be two balls in with , where satisfies . Then for any and sufficiently large, we have
[TABLE]
where
[TABLE]
Using the above lemma, we have for sufficiently large,
[TABLE]
for some as , where the first inequality is an immediate application of Lemma III.4, the first equality follows from the fact that
[TABLE]
and the continuity of the function in its arguments, and the second equality follows from a simple evaluation of .
On the other hand, the condition (c.f. the definition of in Lemma III.3) also ensures that
[TABLE]
Since are disjoint sets for different , given and , the number of different possible values for can be upper bounded by the ratio between
[TABLE]
and
[TABLE]
which can be further upper bounded by
[TABLE]
where as . This immediately implies the following upper bound on and therefore ,
[TABLE]
which combined with (19) yields that
[TABLE]
Dividing both sides of the above inequality by and noting that
[TABLE]
we have
[TABLE]
which holds for any
[TABLE]
Since and in (21)–(22) can all be made arbitrarily small by choosing sufficiently large, we obtain
[TABLE]
for any . This completes the proof of Theorem III.1.
IV Proofs of Theorems I.1 and I.2
We now prove Theorem I.2 by using Theorem III.1, and use Theorem I.2 to prove Theorem I.1.
IV-A Proof of Theorem I.2
Suppose a rate $R$ is achievable. Then there exists a sequence of $(2^{nR}, n)$ codes such that the average probability of error $P_e^{(n)} \to 0$ as $n \to \infty$. Let the relay’s transmission be denoted by $I_n = f_n(Z^n)$. By standard information theoretic arguments, for this sequence of codes we have
[TABLE]
for any and sufficiently large. In the above, (24) follows from applying the data processing inequality to the Markov chain and Fano’s inequality, (25) uses the fact that form a Markov chain and thus , (26) follows by defining the time sharing random variable to be uniformly distributed over , and (27) follows because
[TABLE]
Given (27), the standard way to proceed would be to upper bound the first entropy term by $nC_0$ and lower bound the second entropy term simply by $0$. This would lead to the so-called multiple-access bound in the well-known cut-set bound on the capacity of this channel [10]. However, as we already pointed out in our previous works [3]–[7], this leads to a loose bound since it does not capture the inherent tension between how large the first entropy term can be and how small the second one can be. Instead, we can use Theorem III.1 to more tightly upper bound the difference in (27).
We start by verifying that the random variables and associated with a code of blocklength satisfy the conditions in Theorem III.1. It is trivial to observe that they satisfy the required Markov chain condition and and are i.i.d. Gaussian given due to the channel structure. Also assume that
[TABLE]
with , and assume that . Then, applying Theorem III.1 to the random variables associated with a code for the relay channel, we have
[TABLE]
and therefore,
[TABLE]
where is defined as
[TABLE]
in which satisfies
[TABLE]
Plugging (28) into (27), we conclude that for any achievable rate ,
[TABLE]
At the same time, for any achievable rate , we also have
[TABLE]
which simply follows from (27) by upper bounding with and plugging in the definition of . Therefore, if a rate is achievable, then for any and sufficiently large it should simultaneously satisfy both (31) and (32) for some that satisfies the condition in (30). This concludes the proof of the theorem.
IV-B Proof of Theorem I.1
In order to show that Theorem I.1 follows from Theorem I.2, consider the following bound on implied by Theorem I.2:
[TABLE]
With defined as , we can upper bound the right-hand side of (33) to obtain
[TABLE]
Also because given any fixed , for any , we further have
[TABLE]
The significance of the function is that for any ,
[TABLE]
and is increasing at , or more precisely,
[TABLE]
Therefore, as long as , which is the case when is finite, the minimization of with respect to in (34) yields a value strictly smaller than in (35). This would allow us to conclude that the capacity for any finite is strictly smaller than .
We now formalize the above argument. Using the definition of the derivative, one obtains
[TABLE]
Therefore, there exists a sufficiently small such that and
[TABLE]
For such we have
[TABLE]
which further implies that
[TABLE]
Combining (34) and (36) we obtain that for any finite , there exists some such that
[TABLE]
This proves Theorem I.1.
V Conclusion
We have proved a new upper bound on the capacity of the Gaussian relay channel and solved a problem posed by Cover in [2], which has remained open since 1987. The derivation of our upper bound focuses on directly characterizing the tension between information measures of pertinent -letter random variables. In particular, this is done via the following steps:
- we first use “typicality” to translate the information tension problem to a problem regarding the geometry of the typical sets of these -letter random variables;
- we then use results and tools in the (broadly defined) field of concentration of measure, in particular rearrangement theory, to establish non-trivial geometric properties for these typical sets;
- we finally use these geometric properties to construct a packing argument, which leads to an inequality between the original -letter information measures.
In contrast, the typical program for proving converses in network information theory focuses on “single-letterizing” -letter information measures. This makes it difficult to invoke tools from geometry and concentration of measure, which in retrospect appear well-suited for quantifying information tensions that lie at the heart of network problems. Indeed, to the best of our knowledge, the use of concentration of measure in information theory has been mostly limited to establishing strong converses for problems whose capacity is already known (cf., e.g., [26, 12]), and it has rarely been used to derive first-order results, i.e. bounds on the capacity of multi-user networks. Our proof suggests that measure concentration, in particular geometric inequalities and their functional counterparts, can have a bigger role to play in network information theory. It would be interesting to better understand this role and see if the program developed in this paper can be used to prove converses for other open problems in network information theory.
Appendix A Proofs of Extended Isoperimetric Inequalities
In this appendix, we prove the extended isoperimetric inequalities on the sphere and on the shell, as stated in Theorems II.1 and II.2 respectively. In particular, we will first prove the shell case and then show that the sphere case follows as a corollary.
A-A Preliminaries
We begin with some preliminaries that will be used in the proofs. Our main tool for proving Theorems II.1 and II.2 is the symmetric decreasing rearrangement of functions on the sphere, along with a version of the Riesz rearrangement inequality on the sphere due to Baernstein and Taylor [25].
For any measurable function and pole , the symmetric decreasing rearrangement of about is defined to be the function such that depends only on the angle , is nonincreasing in , and has super-level sets of the same Haar measure as , i.e.
[TABLE]
for all . The function is unique up to its value on sets of measure zero.
One important special case is when the function is the characteristic function for a subset . The function is just the function such that
[TABLE]
In this case, is equal to the characteristic function associated with a spherical cap of the same size as . In other words, if is a spherical cap about the pole such that , then .
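To make the definition concrete, here is a toy discretization, offered as a sketch rather than as the construction used in the proofs. It assumes the simplest case of a circle sampled on a uniform grid with the pole at grid index 0 (the function name and the grid are our own illustrative choices). The rearrangement keeps the multiset of values, so all super-level sets keep their measure, and places larger values closer to the pole. On a grid the result is symmetric only up to tie-breaking among equidistant cells.

```python
def symmetric_decreasing_rearrangement(values):
    """Rearrange samples of f on a uniform circular grid about the pole at index 0.

    values[k] is f at angle 2*pi*k/N. The returned list has the same multiset
    of values and is nonincreasing in the geodesic distance from the pole.
    """
    n = len(values)

    def distance_to_pole(k):
        # Geodesic distance on the grid, measured in grid cells.
        return min(k, n - k)

    cells_by_distance = sorted(range(n), key=distance_to_pole)  # nearest cells first
    values_descending = sorted(values, reverse=True)
    rearranged = [0] * n
    for cell, value in zip(cells_by_distance, values_descending):
        rearranged[cell] = value
    return rearranged


if __name__ == "__main__":
    n = 64
    subset = {10, 11, 12, 30, 31}                      # a hypothetical set A on the grid
    indicator = [1 if k in subset else 0 for k in range(n)]
    cap = symmetric_decreasing_rearrangement(indicator)
    print(sorted(k for k in range(n) if cap[k] == 1))   # cells of the resulting arc
```

Applied to the characteristic function of a set, the output is the characteristic function of a single arc centered at the pole with the same number of cells, which is the one-dimensional analogue of the spherical cap described above.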
Lemma A.1 (Baernstein and Taylor [25])
Let be a nondecreasing bounded measurable function on the interval . Then for all functions ,
[TABLE]
For any , define
[TABLE]
to be the inner integral in Lemma A.1. When applying Lemma A.1 we will use test functions that are characteristic functions. Let where for some (i.e. is a super-level set of ). For a fixed measure , the left-hand side of the inequality from Lemma A.1 will be maximized by this choice of . With this choice we have the following equality:
[TABLE]
This follows from the layer-cake decomposition for any non-negative and measurable function in that
[TABLE]
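The layer-cake decomposition can be checked numerically on a small example. The sketch below assumes a non-negative function given by a few sample values under the uniform probability measure (the sample values, step size, and function names are arbitrary illustrative choices) and compares the direct integral with the integral over thresholds of the measure of the super-level sets.

```python
def integral_direct(values):
    """Integral of f under the uniform probability measure on the sample points."""
    return sum(values) / len(values)


def integral_layer_cake(values, dt=1e-3):
    """Riemann sum over thresholds t >= 0 of the measure of the super-level set {f > t}."""
    steps = int(max(values) / dt) + 1
    total = 0.0
    for k in range(steps):
        threshold = k * dt
        measure = sum(1 for v in values if v > threshold) / len(values)
        total += measure * dt
    return total


if __name__ == "__main__":
    samples = [0.2, 1.5, 0.7, 3.0]
    print(integral_direct(samples), integral_layer_cake(samples))
```

The two numbers agree up to the threshold step `dt`, which is the discretization error of the Riemann sum.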
Using this equality and our choice for we will rewrite the inequality from Lemma A.1 as
[TABLE]
where
[TABLE]
Note that both and are spherically symmetric. More concretely, they both depend only on the angle , so in an abuse of notation we will write and where .
For convenience we will define a measure by
[TABLE]
where denotes the Haar measure of the -sphere with radius . We do this so that an integral like
[TABLE]
can be expressed as
[TABLE]
A-B Proof of Theorem II.2 (The Shell Case)
Let be a given subset with effective angle . In order to apply Lemma A.1, note that
[TABLE]
by using spherical coordinates, so that if we define
[TABLE]
for and
[TABLE]
then
[TABLE]
Both and can be thought of as functions on the sphere . Let be their respective symmetric decreasing rearrangements about a pole . Define
[TABLE]
so that by definition we have (39).
The inequality (39) allows us to compare and , but we require a way to compare with the function arising from a shell cap of angle . Let
[TABLE]
and
[TABLE]
We will show that
[TABLE]
so that along with (39),
[TABLE]
To show the inequality (41) note
[TABLE]
The term inside the parentheses is the measure of the intersection between the cap centered at and a cap of angle centered at . This intersection measure is a function only of the angle and is nonincreasing in that angle. Consider functions with and . Both and satisfy these properties and moreover is extremal in the sense that when and [math] when . Therefore (A-B) is maximized by replacing with , and
[TABLE]
Equipped with (42), we are now ready to finish the proof of Theorem II.2. Proposition II.2 implies that for any , there exists an such that for we have
[TABLE]
where is drawn from any rotationally invariant distribution on . Because the random quantity depends only on the direction of , and not on its magnitude, we can instead consider to be distributed according to the Haar measure on . The constant is determined only by the concentration of measure phenomenon cited above, and it does not depend on any parameters in the problem other than . From now on, let us restrict our attention to dimensions . Due to the triangle inequality for the geodesic metric, for such that we have
[TABLE]
where is such that . Therefore,
[TABLE]
for all for such that and
[TABLE]
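The concentration of measure phenomenon invoked above can be observed empirically. The following Monte Carlo sketch illustrates one commonly cited form of it, namely that the angle between a uniformly random direction and a fixed pole concentrates around π/2 as the dimension grows. It is not a restatement of Proposition II.2; the tolerance, trial count, and seed are arbitrary assumptions.

```python
import math
import random


def angle_to_pole(n: int, rng: random.Random) -> float:
    """Angle between a uniformly random direction in R^n and the first coordinate axis."""
    # A normalized i.i.d. Gaussian vector is uniformly distributed on the sphere.
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    return math.acos(g[0] / norm)


def fraction_near_equator(n: int, eps: float, trials: int = 200, seed: int = 0) -> float:
    """Empirical probability that the angle lies within eps of pi/2."""
    rng = random.Random(seed)
    hits = sum(abs(angle_to_pole(n, rng) - math.pi / 2) <= eps for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (3, 50, 2000):
        print(n, fraction_near_equator(n, eps=0.1))
```

Because the angle depends only on the direction of the vector, the same experiment applies to any rotationally invariant distribution, which is the reduction to the Haar measure used in the argument above.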
To prove the theorem, we need to show that
[TABLE]
for any arbitrary set . Recall that by the definition of a decreasing symmetric rearrangement, we have
[TABLE]
for any threshold and this implies
[TABLE]
Therefore, the desired statement in (A-B) can be equivalently written as
[TABLE]
Turning to proving (49), recall that by the definition of a decreasing symmetric rearrangement, is nonincreasing in the angle over the interval . Let be the smallest value such that , or more explicitly,
[TABLE]
If , then (49) would follow trivially from (44) and the fact that would be greater than for all . We will therefore assume that . It remains to show that even if this is the case, we have (49).
By the definition of and the fact that is nonincreasing,
[TABLE]
To bound the first and third terms of (50) note that
[TABLE]
as a consequence of (44). In order to bound the second term, we establish the following chain of (in)equalities which will be justified below.
[TABLE]
Combining (57) with (51) reveals that the second term in (50) is also bounded by , therefore
[TABLE]
must be bounded by , which proves Theorem II.2.
The first inequality (53) is a consequence of the fact that over the range of the integral, is less than or equal to and is non-negative. The equality in (54) follows from
[TABLE]
which is itself a consequence of (38) with and
[TABLE]
Next we have (55) which is due to the rearrangement inequality (42) when is the super-level set . By the definition of a symmetric decreasing rearrangement, , and the set on the right-hand side is an open or closed spherical cap of angle . Thus is a spherical cap with angle and the rearrangement inequality (42) gives
[TABLE]
Finally, for the inequality (56), we first replace the lower integral limit with . Then over the range of the integral due to (45). Additionally, over the range of the integral, and the inequality follows.
A-C Proof of Theorem II.1 (The Sphere Case)
Given any with effective angle , construct a corresponding
[TABLE]
The set also has effective angle as a subset of since
[TABLE]
For any , we can apply Theorem II.2 to find an such that for ,
[TABLE]
where with . Because the set depends only on the direction of , and not on its magnitude, the probability in (59) is the same whether we consider to be uniformly distributed on or from some rotationally invariant probability distribution on . Using spherical coordinates, we have
[TABLE]
and similarly,
[TABLE]
By dividing out the term, (59) implies
[TABLE]
where as desired.
Appendix B Proofs of Typicality Lemmas
Here we prove the typicality lemmas presented in Section III-A.
B-A Proof of Lemma III.1
Recalling that , we have
[TABLE]
Therefore by the weak law of large numbers, for any and sufficiently large we have
[TABLE]
i.e.,
[TABLE]
since by assumption and thus . Because and are identically distributed, the above relation also holds with replaced by . This completes the proof of the lemma.
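The law-of-large-numbers step can be illustrated with a short simulation. The sketch below is a generic illustration and not the exact statement of Lemma III.1: it assumes an i.i.d. zero-mean Gaussian vector with a hypothetical per-coordinate variance and shows that its normalized squared norm concentrates around that variance, so the vector lies in a thin spherical shell with high probability.

```python
import random


def normalized_squared_norm(n: int, variance: float, rng: random.Random) -> float:
    """(1/n) * ||V||^2 for a vector V of n i.i.d. N(0, variance) coordinates."""
    sigma = variance ** 0.5
    return sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n)) / n


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 1000, 20000):
        # Hypothetical variance 3.0; the printed values approach it as n grows.
        print(n, normalized_squared_norm(n, variance=3.0, rng=rng))
```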
B-B Proof of Lemma III.2
We now present the proof of Lemma III.2. By the law of large numbers and Lemma III.1, we have for any and sufficiently large ,
[TABLE]
where
[TABLE]
Note that in terms of , the set in Lemma III.2 can be simply written as
[TABLE]
Therefore, for sufficiently large, we have
[TABLE]
On the other hand, defining , we have
[TABLE]
Therefore, we have for sufficiently large,
[TABLE]
and thus
[TABLE]
which proves (13).
To prove (14), consider any . From the definition of , . Therefore, must be nonempty, i.e., there exists at least one . Consider any . By the definition of , we have and . Then, it follows from the definition of that
[TABLE]
This further implies that
[TABLE]
for sufficiently large , which concludes the proof of (14) and Lemma III.2.
B-C Proof of Corollary III.1
Let the effective angle of be denoted by , i.e.,
[TABLE]
for some
[TABLE]
where
[TABLE]
Then using the formula for the volume of a shell cap (cf. Appendix C-A and in particular (66)), we have
[TABLE]
for some as . Recall that by assumption
[TABLE]
and we hence have
[TABLE]
for some as .
We now apply Theorem II.2 to this specific shell and subset . First, using the formula for the intersection volume of two shell caps (cf. Appendix C-B and in particular Lemma C.2), we have
[TABLE]
for some as , where and . Then Theorem II.2 asserts that for any and sufficiently large,
[TABLE]
where is a random vector drawn from any rotationally invariant distribution on the shell. Since , the condition in the above can be replaced with the weaker condition . Now by choosing sufficiently large we can make and as small as desired, so we have
[TABLE]
for any and sufficiently large. Finally, observe that for any in the considered shell,
[TABLE]
This simply follows from the geometry illustrated in Fig. 5 combined with the triangle inequality and the fact that the thickness of the shell can be trivially bounded by . Therefore, we can conclude that
[TABLE]
for any and sufficiently large. This completes the proof of Corollary III.1.
B-D Proof of Lemma III.3
Fix and consider a pair . From Lemma III.2, we have
[TABLE]
for sufficiently large. We also have
[TABLE]
for some as , where refers to the conditional density of given . The second inequality in the above follows because for any , we have
[TABLE]
and therefore using the fact that is Gaussian distributed given , we have for any ,
[TABLE]
where as . Therefore, for sufficiently large, the volume of can be lower bounded by
[TABLE]
Let be defined such that
[TABLE]
Obviously, we have and as . Noting that is a subset of
[TABLE]
by Corollary III.1, for any we have
[TABLE]
for any drawn from a rotationally invariant distribution around on , where is defined such that
[TABLE]
and is defined such that
[TABLE]
and both and tend to zero as goes to zero.
We now translate the bound (61) on the probability involving a rotationally invariantly distributed on the shell to a bound on the probability involving . Define to be the following set of :
[TABLE]
Then we have for and sufficiently large,
[TABLE]
where the second inequality simply follows by applying the law of large numbers in a manner similar to the proof of Lemma III.1, and the last inequality follows from combining (61) and the fact that if is known and is restricted to then is rotationally invariant around on this shell.
Since by definition is a subset of we have
[TABLE]
for any , and therefore for sufficiently large,
[TABLE]
for any . Finally, choosing concludes the proof of Lemma III.3. Note that by choosing sufficiently large, and therefore can be made arbitrarily small.
Appendix C Miscellaneous Results in High-Dimensional Geometry
This appendix derives some miscellaneous results in high-dimensional geometry, including the surface area (volume) of a spherical (shell) cap, the surface area (volume) of the intersection of two spherical (shell) caps, and the volume of the intersection of two balls.
C-A Surface Area (Volume) of A Spherical (Shell) Cap
We first derive the surface area (volume) formula for a spherical (shell) cap. See also [23].
Let be a spherical cap with angle on the -sphere of radius . The area of can be written as
[TABLE]
where is the total surface area of the -sphere of radius . Plugging in the expression for the surface area of an -sphere leads to
[TABLE]
We now characterize the exponent of . First, by Stirling’s approximation, in the above can be bounded as
[TABLE]
for some as . Also for sufficiently large, we have
[TABLE]
and
[TABLE]
for some as . Therefore, the area can be bounded as
[TABLE]
for some as .
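The exponential behavior of the cap area can be checked numerically. The sketch below assumes the unit sphere in R^m with the cap fraction written as a ratio of integrals of sin^(m-2); this dimension convention and the function name are our own assumptions and may differ from the indexing used in this appendix. The integrals are evaluated in the log domain to avoid underflow, and the per-dimension exponent is compared with log sin θ.

```python
import math


def log_cap_fraction(m: int, theta: float, steps: int = 20000) -> float:
    """Log of the fraction of the unit sphere in R^m covered by a cap of angle theta.

    fraction = int_0^theta sin^(m-2)(phi) dphi / int_0^pi sin^(m-2)(phi) dphi,
    with each integral computed by a midpoint rule in the log domain.
    """
    def log_integral(upper: float) -> float:
        h = upper / steps
        logs = [(m - 2) * math.log(math.sin((k + 0.5) * h)) for k in range(steps)]
        peak = max(logs)
        return peak + math.log(sum(math.exp(v - peak) for v in logs) * h)

    return log_integral(theta) - log_integral(math.pi)


if __name__ == "__main__":
    theta = math.pi / 3
    for m in (50, 500, 5000):
        # The normalized log-fraction should approach log(sin(theta)) as m grows.
        print(m, log_cap_fraction(m, theta) / m, math.log(math.sin(theta)))
```

The gap between the two printed columns shrinks with the dimension, which is the role played by the vanishing error terms in the bounds above.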
Now suppose that is a shell cap on
[TABLE]
where . Let , and define to be the sphere of radius with Haar measure . We use spherical coordinates to integrate over the surface areas of the individual caps that make up the shell cap,
[TABLE]
where the integral term on the right is bounded as
[TABLE]
Together with (C-A), (63) and (65) imply
[TABLE]
for sufficiently large . In a similar way,
[TABLE]
and therefore
[TABLE]
where as .
C-B Surface Area (Volume) of the Intersection of Two Spherical (Shell) Caps
Recall is the -sphere of radius . Let
[TABLE]
be two spherical caps on such that , , and . We have the following lemma that characterizes the intersection measure of these two caps.
Lemma C.1
For any there exists an such that for ,
[TABLE]
and
[TABLE]
Proof:
To prove this lemma, we will first derive the surface area formula for the intersection of the above two caps (see also [24]), and then characterize the exponent of this area.
Deriving the Surface Area Formula: Consider the points such that
[TABLE]
and
[TABLE]
These points satisfy the linear relations
[TABLE]
and
[TABLE]
and therefore all such lie in the unique dimensional subspace defined by
[TABLE]
The angle between the hyperplane and the vector is
[TABLE]
and because and are orthogonal and ,
[TABLE]
The approach will be as follows. Divide the intersection into two parts and that are on either side of the hyperplane . More concretely,
[TABLE]
and
[TABLE]
Each part and can be written as a union of lower dimensional spherical caps. We will find the measure of each part by integrating the measures of these lower dimensional caps.
The measure of the cap can be expressed as the integral
[TABLE]
where is the surface area of the -sphere with radius . If we consider a single -sphere at some angle , then the hyperplane divides that -sphere into two spherical caps. The claim is that each of these dimensional caps that is on the side of with is contained in (and those on the side with are contained in ). Furthermore, all points in are in one of these dimensional caps. The claim follows because
[TABLE]
implies
[TABLE]
and since and $\cos\big(\angle(\mathbf{v},\mathbf{v}_{2})\big)\geq\cos\theta_{2}$, this implies
[TABLE]
Finally, this implies , , and .
Note that for , the -sphere at angle is entirely on the side of , and does not need to be included when computing the measure of . This establishes the fact that
[TABLE]
where is the surface area of an dimensional spherical cap defined by angle on the -sphere of radius . Writing
[TABLE]
note that is the distance from the center of the -sphere at angle to the dimensional hyperplane that divides the sphere into two caps. Furthermore, since the -sphere has center , we have
[TABLE]
Therefore,
[TABLE]
Combining this with the corresponding result for yields
[TABLE]
This expression can be rewritten using known expressions for the area of a spherical cap in terms of the regularized incomplete beta function as
[TABLE]
where is defined as
[TABLE]
and is defined similarly. Here in (67), is the regularized incomplete beta function, given by
[TABLE]
where and are the incomplete beta function and the complete beta function respectively:
[TABLE]
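As a sanity check on the beta-function representation of cap areas, the sketch below evaluates the regularized incomplete beta function by direct numerical integration and uses the known closed form for a single cap of angle at most π/2 on the unit sphere in R^m, namely fraction = (1/2) I_{sin²θ}((m-1)/2, 1/2). The dimension convention, function names, and the crude midpoint rule are our own assumptions. The integration is intended only for first parameter at least 1 and argument strictly below 1.

```python
import math


def regularized_incomplete_beta(x: float, a: float, b: float, steps: int = 20000) -> float:
    """I_x(a, b) = B(x; a, b) / B(a, b), using a midpoint rule for the numerator.

    Intended for a >= 1 and 0 <= x < 1, where the integrand is bounded.
    """
    h = x / steps
    incomplete = h * sum(
        ((k + 0.5) * h) ** (a - 1) * (1 - (k + 0.5) * h) ** (b - 1)
        for k in range(steps)
    )
    complete = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    return incomplete / complete


def cap_fraction(m: int, theta: float) -> float:
    """Fraction of the unit sphere in R^m inside a cap of angle theta < pi/2."""
    return 0.5 * regularized_incomplete_beta(math.sin(theta) ** 2, (m - 1) / 2, 0.5)


if __name__ == "__main__":
    theta = math.pi / 3
    # For the ordinary sphere (m = 3) the cap fraction is (1 - cos(theta)) / 2.
    print(cap_fraction(3, theta), (1 - math.cos(theta)) / 2)
```

In practice a library routine for the regularized incomplete beta function would replace the midpoint rule; the hand-rolled version is used here only to keep the example dependency-free.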
Characterizing the Exponent: We now lower and upper bound with exponential functions. First, using Stirling’s approximation, on the R.H.S. of (67) can be bounded as
[TABLE]
for some as .
Now consider
[TABLE]
inside the integral on the R.H.S. of (67). In light of (68), it can be written as
[TABLE]
For the denominator in (70), by Stirling’s approximation, we have
[TABLE]
For the numerator in (70), we have
[TABLE]
for some as , and
[TABLE]
for some as . Also noting that
[TABLE]
with , we can bound the integrand in (67) as
[TABLE]
and
[TABLE]
for some as . For sufficiently large ,
[TABLE]
and
[TABLE]
for some as .
Combining this with (69), we can bound as
[TABLE]
for some as .
Due to symmetry, we can also bound as
[TABLE]
Noting that , we have
[TABLE]
and
[TABLE]
for some as . This completes the proof of the lemma. ∎
We now utilize Lemma C.1 to characterize the volume of the intersection of two shell caps. Consider a spherical shell
[TABLE]
with , and two caps on this shell, i.e. and , where and . The following lemma bounds the intersection volume of these two shell caps.
Lemma C.2
For any there exists an such that for ,
[TABLE]
and
[TABLE]
Proof:
Using spherical coordinates, we have
[TABLE]
where the integral term on the right is bounded as
[TABLE]
Given , set where is given by Lemma C.1 to ensure
[TABLE]
and is chosen to be sufficiently large so that the right-hand side of (C-B) satisfies
[TABLE]
Together with (C-B), this implies
[TABLE]
for .
For the inequality in the other direction, define to be the sphere of radius with Haar measure . Then
[TABLE]
where the integral term on the right is bounded as
[TABLE]
Given , set where is given by Lemma C.1 to ensure
[TABLE]
and is chosen to be sufficiently large so that the right-hand side of (74) satisfies
[TABLE]
Together with (C-B), this implies
[TABLE]
for . ∎
C-C Volume of the Intersection of Two Balls
Proof:
The intersection of and consists of two caps: and , as depicted in Fig. 7. To bound the volume of , we will bound and respectively.
We first bound . By the cosine formula, we have
[TABLE]
and therefore
[TABLE]
From Appendix C-A, we have for any and sufficiently large,
[TABLE]
where
[TABLE]
Similarly, we have
[TABLE]
and therefore
[TABLE]
Combining the above, we obtain
[TABLE]
for any and sufficiently large. ∎
Acknowledgement
The authors would like to acknowledge inspiring discussions with Liang-Liang Xie within a preceding collaboration [4]. They would also like to thank the anonymous reviewers and the Associate Editor for many valuable comments that helped improve the presentation of this paper.
References
- [1] X. Wu, L. Barnes, and A. Ozgur, “Cover’s open problem: ‘The capacity of the relay channel’,” Proc. of 54th Annual Allerton Conference on Communication, Control, and Computing, Allerton Retreat Center, Monticello, Illinois, 2016.
- [2] T. M. Cover, “The capacity of the relay channel,” Open Problems in Communication and Computation, T. M. Cover and B. Gopinath, Eds. New York: Springer-Verlag, 1987, pp. 72–73.
- [3] X. Wu and A. Ozgur, “Improving on the cut-set bound via geometric analysis of typical sets,” in Proc. of 2016 International Zurich Seminar on Communications.
- [4] X. Wu, A. Ozgur, and L.-L. Xie, “Improving on the cut-set bound via geometric analysis of typical sets,” IEEE Trans. Inform. Theory, vol. 63, pp. 2254–2277, April 2017.
- [5] X. Wu and A. Ozgur, “Cut-set bound is loose for Gaussian relay networks,” in Proc. of 53rd Annual Allerton Conference on Communication, Control, and Computing, Allerton Retreat Center, Monticello, Illinois, Sept. 29–Oct. 1, 2015.
- [6] X. Wu and A. Ozgur, “Cut-set bound is loose for Gaussian relay networks,” IEEE Trans. Inform. Theory, vol. 64, pp. 1023–1037, February 2018.
- [7] X. Wu and A. Ozgur, “Improving on the cut-set bound for general primitive relay channels,” in Proc. of IEEE Int. Symposium on Information Theory, Barcelona, Spain, Jul. 2016.
- [8] X. Wu, L. Barnes, and A. Ozgur, “The geometry of the relay channel,” in Proc. of IEEE Int. Symposium on Information Theory, Aachen, Germany, June 2017.
