Coherent multiple-antenna block-fading channels at finite blocklength
Austin Collins, Yury Polyanskiy

TL;DR
This paper derives finite blocklength limits for multi-antenna block-fading channels, revealing how antenna configuration impacts coding delay and highlighting the importance of orthogonal designs like Alamouti's scheme for optimal coding.
Contribution
It provides a formula for channel dispersion in multi-antenna block-fading channels and uncovers the significance of orthogonal designs in achieving dispersion-optimal coding schemes.
Findings
Capacity equivalence for $n_t\times n_r$ and $n_r \times n_t$ configurations at fixed SNR
Coding delay varies significantly with antenna configuration, e.g., 60% difference at 20 dB SNR
Orthogonal designs like Alamouti's scheme are dispersion-optimal for MISO channels.
| $n_t \backslash T$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| 2 | | 8 | | 16 | 18 | 24 | 26 | 32 |
| 3 | | | | 36 | [39,45] | [46,54] | [57,63] | 72 |
| 4 | | | | 64 | [68,80] | [80,96] | [100,112] | 128 |
| 5 | | | | | [89,125] | [118,150] | [155,175] | 200 |
| 6 | | | | | | [168,216] | [222,252] | 288 |
| 7 | | | | | | | [301,343] | 392 |
| 8 | | | | | | | | 512 |
The authors are with the Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139 USA (e-mail: {austinc,yp}@mit.edu). This material is based upon work supported by the National Science Foundation CAREER award under grant agreement CCF-12-53205, by the NSF grant CCF-17-17842 and by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-09-39370.
Abstract
In this paper we consider a channel model that is often used to describe the mobile wireless scenario: multiple-antenna additive white Gaussian noise channels subject to random (fading) gain with full channel state information at the receiver. Dynamics of the fading process are approximated by a piecewise-constant process (frequency non-selective isotropic block fading). This work addresses the finite blocklength fundamental limits of this channel model. Specifically, we give a formula for the channel dispersion – a quantity governing the delay required to achieve capacity. The multiplicative nature of the fading disturbance leads to a number of interesting technical difficulties that required us to enhance traditional methods for finding the channel dispersion. Alas, one difficulty remains: the converse (impossibility) part of our result holds under an extra constraint on the growth of the peak-power with blocklength.
Our results demonstrate, for example, that while capacities of $n_t\times n_r$ and $n_r\times n_t$ antenna configurations coincide (under fixed received power), the coding delay can be sensitive to this switch. For example, at the received SNR of 20 dB the $16\times 100$ system achieves capacity with codes of length (delay) which is only 60% of the length required for the $100\times 16$ system. Another interesting implication is that for the MISO channel, the dispersion-optimal coding schemes require employing orthogonal designs such as Alamouti’s scheme, a surprising observation considering that Alamouti’s scheme was designed for reducing demodulation errors, not improving coding rate. Finding these dispersion-optimal coding schemes naturally gives a criterion for producing orthogonal design-like inputs in dimensions where orthogonal designs do not exist.
I Introduction
Given a noisy communication channel, the maximal cardinality of a codebook of blocklength $n$ which can be decoded with block error probability no greater than $\epsilon$ is denoted as $M^*(n,\epsilon)$. The evaluation of this function, the fundamental performance limit of block coding, is alas computationally impossible for most channels of interest. To resolve this difficulty, [1] proposed a closed-form normal approximation, based on the asymptotic expansion:
$$\log M^*(n,\epsilon) = nC - \sqrt{nV}\,Q^{-1}(\epsilon) + o(\sqrt{n}), \tag{1}$$
where the capacity $C$ and dispersion $V$ are two intrinsic characteristics of the channel and $Q^{-1}(\epsilon)$ is the inverse of the $Q$-function (as usual, $Q(x) = \int_x^\infty \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\,dt$). One immediate consequence of the normal approximation is an estimate for the minimal blocklength (delay) required to achieve a given fraction $\eta$ of the channel capacity:
$$n \gtrsim \left(\frac{Q^{-1}(\epsilon)}{1-\eta}\right)^2 \frac{V}{C^2}. \tag{2}$$
Asymptotic expansions such as (1) are rooted in the central-limit theorem and have been known classically for discrete memoryless channels [2, 3] and later extended in a wide variety of directions; see the surveys in [4, 5].
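As a rough numerical illustration of the normal approximation and the resulting delay estimate, the sketch below evaluates $nC - \sqrt{nV}\,Q^{-1}(\epsilon)$ and $\left(Q^{-1}(\epsilon)/(1-\eta)\right)^2 V/C^2$. It assumes the standard form of the expansion with the remainder term dropped; the function names and the sample values of $C$ and $V$ are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def q_inv(eps: float) -> float:
    """Inverse of the Gaussian Q-function, where Q(x) = P[N(0,1) > x]."""
    return NormalDist().inv_cdf(1.0 - eps)


def normal_approx_log_m(n: float, capacity: float, dispersion: float, eps: float) -> float:
    """Normal approximation n*C - sqrt(n*V)*Q^{-1}(eps), remainder term dropped."""
    return n * capacity - sqrt(n * dispersion) * q_inv(eps)


def min_blocklength(capacity: float, dispersion: float, eps: float, eta: float) -> float:
    """Blocklength at which the approximation reaches the fraction eta of capacity."""
    return (q_inv(eps) / (1.0 - eta)) ** 2 * dispersion / capacity ** 2


# Hypothetical channel with C = 1 and V = 0.5 (units per channel use).
n_star = min_blocklength(capacity=1.0, dispersion=0.5, eps=1e-3, eta=0.9)
rate_fraction = normal_approx_log_m(n_star, 1.0, 0.5, 1e-3) / (n_star * 1.0)
```

By construction `rate_fraction` equals `eta`, and the estimate grows linearly in the dispersion: doubling $V$ at fixed $C$ doubles the estimated delay.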
The fading channel is the centerpiece of the theory and practice of wireless communication, and hence there are many slightly different variations of the model: differing assumptions on the dynamics and distribution of the fading process, antenna configurations, and channel state knowledge. The capacity of the fading channel was found independently by Telatar [6] and Foschini and Gans [7] for the case of Rayleigh fading and channel state information available at the receiver only (CSIR) and at both the transmitter and receiver (CSIRT). Motivated by the linear gains promised by capacity results, space-time codes were introduced to exploit multiple antennas, most notably Alamouti’s ingenious orthogonal scheme [8] along with a generalization by Tarokh, Jafarkhani and Calderbank [9]. Motivated by a recent surge of orthogonal frequency division multiplexing (OFDM) technology, this paper focuses on an isotropic channel gain distribution, which is piecewise independent (“block-fading”), and assumes full channel state information available at the receiver (CSIR). This work describes finite blocklength effects incurred by the fading on the fundamental communication limits.
Some of the prior work on similar questions is as follows. Single antenna channel dispersion was computed in [10] for a more general stationary channel gain process with memory. In [11] finite-blocklength effects are explored for the non-coherent block fading setup. Quasi-static fading channels in the general MIMO setting have been thoroughly investigated in [12], showing that the expansion (1) changes dramatically (in particular the channel dispersion term becomes zero); see also [13] for evaluation of the bounds. The coherent quasi-static channel has been studied in the limit of infinitely many antennas in [14], appealing to concentration properties of random matrices. Dispersion for lattices (infinite constellations) in fading channels has been investigated in a sequence of works, see [15] and references therein. Note also that there are some very fine differences between stationary and block-fading channel models, cf. [16, Section 4]. The minimum energy to send $k$ bits over a MIMO channel for both the coherent and non-coherent case was studied in [17], showing that the latter requires orders of magnitude larger latencies. [18] investigates the problem of power control with an average power constraint on the codebook in the quasi-static fading channel with perfect CSIRT. A novel achievability bound was found and evaluated for the fading channel with CSIR in [19]. Parts of this work have previously appeared in [20, 21].
The paper is organized as follows. In Section II we describe the channel model and state all our main results formally. Section III characterizes capacity achieving input/output distributions (caid/caod, resp.) and evaluates moments of the information density. Then in Sections IV and V we prove the achievability and converse parts of our (non rank-1) results, respectively. Section VI focuses on the special case of when the matrix of channel gains has rank 1. Finally, Section VII contains a discussion of numerical results and the behavior of channel dispersion as a function of the number of antennas.
The numerical software used to compute the achievability bounds, dispersion and normal approximation in this work can be found online under the Spectre project [22].
II Main Results
II-A Channel Model
The channel model considered in this paper is the frequency-nonselective coherent real block fading (BF) discrete-time channel with multiple transmit and receive antennas (MIMO) (see [23, Section II] for background on this model). We will simply refer to it as the MIMO-BF channel, which we formally define here. The parameters are as follows: let $n_t \ge 1$ be the number of transmit antennas, $n_r \ge 1$ be the number of receive antennas, and $T \ge 1$ be the coherence time of the channel. The input-output relation at block $j$ (spanning time instants $(j-1)T+1$ to $jT$) with $j = 1,\ldots,n$ is given by
$$Y_j = H_j X_j + Z_j, \tag{3}$$
where $\{H_j,\ j = 1,\ldots\}$ is an $n_r\times n_t$ matrix-valued random fading process, $X_j$ is an $n_t\times T$ matrix channel input, $Z_j$ is an $n_r\times T$ Gaussian random real-valued matrix with independent entries of zero mean and unit variance, and $Y_j$ is the $n_r\times T$ matrix-valued channel output. The process $H_j$ is assumed to be i.i.d. with isotropic distribution $P_H$, i.e. for any orthogonal matrices $U \in \mathbb{R}^{n_r\times n_r}$ and $V \in \mathbb{R}^{n_t\times n_t}$, both $UH$ and $HV$ are equal in distribution to $H$. We also assume
$$\mathbb{P}[H \neq 0] > 0 \tag{4}$$
to avoid trivialities. Note that due to merging channel inputs at time instants $(j-1)T+1,\ldots,jT$ into one matrix-input, the block-fading channel becomes memoryless. We assume coherent demodulation so that the channel state information (CSI) $H_j$ is fully known to the receiver (CSIR).
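An $(nT, M, \epsilon, P)_{\mathrm{CSIR}}$ code of blocklength $nT$, probability of error $\epsilon$ and power-constraint $P$ is a pair of maps: the encoder $f: [M] \to (\mathbb{R}^{n_t\times T})^n$ and the decoder $g: (\mathbb{R}^{n_r\times T})^n \times (\mathbb{R}^{n_r\times n_t})^n \to [M]$ satisfying the probability of error constraint
$$\mathbb{P}[W \neq \hat W] \le \epsilon$$
on the probability space
$$W \to X^n \to (Y^n, H^n) \to \hat W,$$
where the message $W$ is uniformly distributed on $[M]$, $X^n = f(W)$, $X^n \to (Y^n, H^n)$ is as described in (3), and $\hat W = g(Y^n, H^n)$. In addition the input sequences are required to satisfy the power constraint:
$$\sum_{j=1}^n \|X_j\|_F^2 \le nTP \quad \mathbb{P}\text{-a.s.},$$
where $\|A\|_F \triangleq \sqrt{\sum_{i,j} A_{ij}^2}$ is the Frobenius norm of the matrix $A$.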
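To make the model concrete, the following sketch simulates $n$ coherence blocks of a real block-fading MIMO channel, $Y_j = H_j X_j + Z_j$, after rescaling the input to meet a total-energy constraint $\sum_j \|X_j\|_F^2 = nTP$. Two things are assumptions made for illustration only: the fading matrices are drawn with i.i.d. standard Gaussian entries (one particular isotropic law), and the helper names `scale_to_power` and `mimo_bf_channel` are hypothetical.

```python
import numpy as np


def scale_to_power(x_blocks: np.ndarray, P: float) -> np.ndarray:
    """Rescale n blocks of shape (n, n_t, T) so that sum_j ||X_j||_F^2 = n*T*P."""
    n, _, T = x_blocks.shape
    energy = np.sum(x_blocks ** 2)
    return x_blocks * np.sqrt(n * T * P / energy)


def mimo_bf_channel(x_blocks: np.ndarray, n_r: int, rng: np.random.Generator):
    """Pass n input blocks through Y_j = H_j X_j + Z_j with independent H_j per block."""
    n, n_t, T = x_blocks.shape
    # Assumption: i.i.d. N(0,1) fading entries, which gives an isotropic H.
    H = rng.standard_normal((n, n_r, n_t))
    # Unit-variance additive Gaussian noise, independent across entries.
    Z = rng.standard_normal((n, n_r, T))
    Y = H @ x_blocks + Z  # batched matrix product, one product per block
    return Y, H  # the receiver observes both Y and H (CSIR)


rng = np.random.default_rng(0)
x = scale_to_power(rng.standard_normal((50, 2, 4)), P=10.0)
Y, H = mimo_bf_channel(x, n_r=3, rng=rng)
```

Stacking the $T$ channel uses of a block into one matrix input is what makes the simulated channel memoryless across the first array axis.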
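Under the isotropy assumption on $P_H$, the capacity $C$ appearing in (1) of this channel is given by [6]
$$C(P) = \frac{1}{2}\,\mathbb{E}\left[\log\det\left(I_{n_r} + \frac{P}{n_t} HH^T\right)\right] \tag{6}$$
$$\phantom{C(P)} = \sum_{i=1}^{n_{\min}} \mathbb{E}\left[C_{AWGN}\left(\frac{P}{n_t}\Lambda_i^2\right)\right], \tag{7}$$
where $C_{AWGN}(P) = \frac{1}{2}\log(1+P)$ is the capacity of the additive white Gaussian noise (AWGN) channel with SNR $P$, $n_{\min} = \min(n_t, n_r)$ is the minimum of the transmit and receive antennas, and $\{\Lambda_i^2\}$ are eigenvalues of $HH^T$. Note that it is common to think that as $P \to \infty$ the expression (7) scales as $\frac{n_{\min}}{2}\log P$, but this is only true if $\mathbb{P}[\mathrm{rank}\,H = n_{\min}] = 1$.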
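The capacity expression can be checked numerically. The sketch below estimates it by Monte Carlo in two equivalent ways: through $\frac{1}{2}\log\det(I + \frac{P}{n_t}HH^T)$ and through the sum of $\frac{1}{2}\log(1+\frac{P}{n_t}\Lambda_i^2)$ over eigenvalues of $HH^T$. It assumes i.i.d. standard Gaussian fading entries (an illustrative choice of isotropic law, not something fixed by the text), works in nats, and uses a hypothetical function name.

```python
import numpy as np


def capacity_estimates(n_t: int, n_r: int, P: float,
                       num_samples: int = 20000, seed: int = 0):
    """Monte Carlo estimates of the coherent capacity (nats per channel use)."""
    rng = np.random.default_rng(seed)
    # Assumption: Rayleigh-type fading with i.i.d. N(0,1) entries.
    H = rng.standard_normal((num_samples, n_r, n_t))
    gram = H @ np.swapaxes(H, 1, 2)  # H H^T for every sample
    # Eigenvalue form: zero eigenvalues contribute log(1 + 0) = 0.
    eig = np.clip(np.linalg.eigvalsh(gram), 0.0, None)
    via_eig = np.sum(0.5 * np.log1p(P / n_t * eig), axis=1)
    # Log-determinant form.
    _, logdet = np.linalg.slogdet(np.eye(n_r) + P / n_t * gram)
    return float(np.mean(via_eig)), float(np.mean(0.5 * logdet))


c_eig, c_det = capacity_estimates(n_t=2, n_r=3, P=10.0, num_samples=2000)
```

The two estimates agree up to floating-point error because they are the same quantity sample by sample; only their common value carries Monte Carlo noise.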
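The goal of this line of work is to characterize the dispersion of the present channel. Since the channel is memoryless it is natural to expect, given the results in [1, 10], that dispersion (for $\epsilon < 1/2$) is given by
$$V(P) \triangleq \inf_{P_X:\ \text{caid}} \frac{1}{T}\,\mathbb{E}\left[\mathrm{Var}\left(i(X;Y,H)\,\middle|\,X\right)\right], \tag{8}$$
where we denoted the (single $T$-block) information density by
$$i(x;y,h) \triangleq \log\frac{dP_{Y,H|X=x}}{dP^*_{Y,H}}(y,h), \tag{9}$$
and $P^*_{Y,H}$ is the capacity achieving output distribution (caod). Justification of (8) as the actual (operational) dispersion, appearing in the expansion of $\log M^*(n,\epsilon)$, is by no means trivial and is the subject of this work.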
II-B Statement of Main Theorems
Here we formally state the main results, then go into more detail in the following sections. Our first result is an achievability and partial converse bound for the MIMO-BF channel for fixed parameters $(n_t, n_r, T)$.
Theorem 1.
For the MIMO-BF channel, there exists an maximal probability of error code with satisfying
[TABLE]
Furthermore, for any there exists so that every code with extra constraint that , must satisfy
[TABLE]
where the capacity $C(P)$ is given by (6) and the dispersion $V(P)$ by (8). (Footnote 2: for the explicit expression, see (III-C) below.)
Proof.
This follows from Theorem 16 and Theorem 19 below. ∎
Remark 1.
Note that the converse has an extra constraint . Mathematically, this constraint is needed so that the -fold information density behaves Gaussian-like, via the Berry-Esseen theorem. For example, if had and zeroes in all other coordinates, then one term in the information density would be while the rest would be , and hence no asymptotic structure would emerge. All known bounds to obtain the channel dispersion rely on approximating the information density by a Gaussian, and hence a fundamentally different method of analysis is needed to handle the situation where .
Note that to violate this constraint, a significant portion of the power budget must be poured into a single coherent block, which 1) creates a very large peak-to-average power ratio (PAPR) – an illegal (for regulating bodies) or impractical (for power amplifiers) situation, and 2) does a poor job of exploiting the diversity gain from coding over multiple independent coherent blocks. Therefore, our converse results are sufficient from the point of view of any practical system.
In addition, the random codebook used for the achievability (uniform on the power sphere) can be expurgated with a rate loss of so that it entirely consists of codewords satisfying . This is easiest to see by noticing that a standard Gaussian vector satisfies . This observation shows that our analysis of the random coding bound (with spherical codebook) is tight in terms of the dispersion term.
Remark 2.
The remainder term in (11) depends on the system parameters in a complicated way, which we do not attempt to study here.
The behavior of dispersion found in Theorem 1 turns out to depend crucially on whether $\mathrm{rank}\,H \le 1$ a.s. or not. When $\mathbb{P}[\mathrm{rank}\,H > 1] > 0$, all capacity achieving input distributions (caids) yield the same conditional variance (8), yet when $\mathbb{P}[\mathrm{rank}\,H \le 1] = 1$, the conditional variance varies over the set of caids. The following theorem treats the case $\mathbb{P}[\mathrm{rank}\,H > 1] > 0$, in which the dispersion (8) can be calculated for the simplest Telatar caid (i.i.d. Gaussian matrix $X$).
Theorem 2.
Assume that , then , where
[TABLE]
where are eigenvalues of , , and
[TABLE]
Proof.
This is proved in Proposition 11 below. ∎
Remark 3.
Each of the three terms in (12) is non-negative, see Remark 7 below for more details.
In the case where the fading process has rank 1 (e.g. for MISO systems), there is a multitude of caids, and the minimization problem in (8) is non-trivial. Quite surprisingly, for some values of $(n_t, T)$, we show that the (essentially unique) minimizer is a full-rate orthogonal design. The latter were introduced into the field of communications by Alamouti [8] and Tarokh et al. [9]. This shows a somewhat unexpected connection between schemes optimal from modulation-theoretic and information-theoretic points of view. The precise results are as follows.
Theorem 3.
When , we have
[TABLE]
where is the non-zero eigenvalues of , and
[TABLE]
Proof.
This is the content of Proposition 12 below.∎
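The quantity $v^*(n_t, T)$ is defined separately in Theorem 3 because it isolates how the dispersion depends on the input distribution. Unfortunately, $v^*(n_t, T)$ is generally unknown, since the maximization in (18) is over a manifold of matrix-valued random variables. However, for many dimensions, the maximum can be found by invoking the Hurwitz-Radon theorem [24]. We state this below to introduce the notation, and expand on it in Section VI.
Theorem 4 (Hurwitz-Radon).
There exists a family of $n\times n$ real matrices $V_1,\ldots,V_k$ satisfying
$$V_i^T V_i = I_n,\quad i = 1,\ldots,k; \qquad V_i^T V_j + V_j^T V_i = 0,\quad i \neq j,$$
if and only if $k \le \rho(n)$, where
$$\rho(2^a b) = 8\left\lfloor \frac{a}{4} \right\rfloor + 2^{a \bmod 4}, \qquad a, b \in \mathbb{Z},\ b \text{ odd}.$$
In particular, $\rho(n) \le n$ and $\rho(n) = n$ only for $n = 1, 2, 4, 8$.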
For a concrete example, note that Alamouti’s scheme is created from a Hurwitz-Radon family for . Indeed, take the matrices
[TABLE]
then Alamouti’s orthogonal design can be formed by taking . It turns out that “maximal” Hurwitz-Radon families give capacity achieving input distributions for the MIMO-BF channel, see Proposition 22 for the details.
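The Hurwitz-Radon conditions are easy to check numerically. The sketch below computes the Hurwitz-Radon number from the standard formula $\rho(2^a b) = 8\lfloor a/4\rfloor + 2^{a \bmod 4}$ ($b$ odd), verifies the family conditions $V_i^T V_i = I$ and $V_i^T V_j + V_j^T V_i = 0$ for a pair of $2\times 2$ matrices, and builds a real Alamouti-type design from them. The particular matrices `V1`, `V2` and the column-stacking convention `X = [V1 xi, V2 xi]` are assumptions chosen for illustration and may differ from the ones used in the text.

```python
import numpy as np


def hurwitz_radon_number(n: int) -> int:
    """rho(n) for n = 2^a * b with b odd: 8*floor(a/4) + 2^(a mod 4)."""
    a = 0
    while n % 2 == 0:
        n //= 2
        a += 1
    return 8 * (a // 4) + 2 ** (a % 4)


def is_hurwitz_radon_family(mats, tol: float = 1e-12) -> bool:
    """Check V_i^T V_i = I and V_i^T V_j + V_j^T V_i = 0 for all i != j."""
    n = mats[0].shape[0]
    for i, Vi in enumerate(mats):
        if not np.allclose(Vi.T @ Vi, np.eye(n), atol=tol):
            return False
        for Vj in mats[i + 1:]:
            if not np.allclose(Vi.T @ Vj + Vj.T @ Vi, 0.0, atol=tol):
                return False
    return True


# Assumed example family for n = 2 (identity and a 90-degree rotation).
V1 = np.eye(2)
V2 = np.array([[0.0, 1.0], [-1.0, 0.0]])

xi = np.array([0.3, -1.2])                # two real symbols
X = np.column_stack([V1 @ xi, V2 @ xi])   # rows are (xi1, xi2) and (xi2, -xi1)
```

The rows of `X` are orthogonal with equal norm, so `X @ X.T` is a multiple of the identity for every choice of `xi`; this is the defining property of an orthogonal design.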
The following theorem summarizes our current knowledge of $v^*(n_t, T)$.
Theorem 5.
For any pair of positive integers we have
[TABLE]
If or then a full-rate orthogonal design is dispersion-optimal and
[TABLE]
If instead and then for a jointly-Gaussian capacity-achieving input we have (footnote 3: in these cases the bound (22) is either non-tight, or is achieved by a non-jointly-Gaussian caid)
[TABLE]
Finally, if and (23) holds, then for any (and similarly with the roles of and switched).
Note that the function is monotonic in even values of (and is for odd), and along even . Therefore, for any number of transmit antennas , there is a large enough such that , in which case an full rate orthogonal design achieves the optimal .
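To see how long the coherence time must be before a full-rate orthogonal design becomes available for a given number of transmit antennas, one can search for the smallest $T$ whose Hurwitz-Radon number is at least $n_t$. The sketch below assumes the standard formula $\rho(2^a b) = 8\lfloor a/4\rfloor + 2^{a \bmod 4}$ for odd $b$; since $\rho$ depends only on the power of two dividing its argument, it suffices to search over powers of two. The function names are hypothetical.

```python
def hurwitz_radon_number(n: int) -> int:
    """rho(n) for n = 2^a * b with b odd: 8*floor(a/4) + 2^(a mod 4)."""
    a = 0
    while n % 2 == 0:
        n //= 2
        a += 1
    return 8 * (a // 4) + 2 ** (a % 4)


def smallest_coherence_time(n_t: int) -> int:
    """Smallest T with rho(T) >= n_t; always a power of two."""
    a = 0
    while hurwitz_radon_number(2 ** a) < n_t:
        a += 1  # rho(2^a) is non-decreasing and unbounded in a, so this terminates
    return 2 ** a


required = {n_t: smallest_coherence_time(n_t) for n_t in range(1, 13)}
```

The required $T$ grows quickly: up to eight antennas it never exceeds 8, but nine antennas already need $T = 16$ and ten need $T = 32$.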
III Preliminary results
This section gives some results that will be useful for the achievability and converse proofs (Theorem 16 and Theorem 19, respectively), along with generally aiding our understanding of the MIMO-BF channel at finite blocklength. The results in this section, and where they are used, are summarized as follows:
- Theorem 6 gives a characterization of the caids for the MIMO-BF channel. While all caids give the same capacity (by definition), when the channel matrix is rank 1, they do not all yield the same dispersion. This characterization is needed to reason about the minimizers in (8), especially in the rank 1 case.
- Proposition 8 computes the variance of the information density conditioned on the channel input. A key characteristic of the fading channel is that this conditional variance changes as the input moves around the input space, which does not happen in DMCs or the AWGN channel. This variation poses additional challenges in the converse proof, where we partition the codebook based on thresholding the conditional variance (see the proof of Theorem 19 for details). Knowledge of the conditional variance will also allow us to understand when the information density can be well approximated by a Gaussian (see Lemma 13).
- Propositions 11 and 12 explicitly give the expression for the dispersion found from the achievability and converse proofs for the $\mathrm{rank}\,H > 1$ and $\mathrm{rank}\,H \le 1$ cases, respectively. These expressions show how the dispersion depends on the system parameters, and are the contents of Theorems 2 and 3 above.
III-A Known results: capacity and capacity achieving output distribution
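First we review a few known results on the MIMO-BF channel. Since the channel is memoryless, the capacity is given by
$$C = \frac{1}{T}\max_{P_X:\ \mathbb{E}[\|X\|_F^2] \le TP} I(X;Y\mid H).$$
It was shown by Telatar [6] that whenever the distribution of $H$ is isotropic, the input $X \in \mathbb{R}^{n_t\times T}$ with entries given by
$$X_{ij} \overset{\text{i.i.d.}}{\sim} \mathcal{N}\left(0, \frac{P}{n_t}\right) \tag{26}$$
is a maximizer, resulting in the capacity formula (6). The distribution induced by a caid at the channel output is called the capacity achieving output distribution (caod). A classical fact is that, while there may be many caids, the caod is unique, e.g. [25, Section 4.4]. Thus, from (26) we infer that the caod is given by
$$P^*_{Y,H} = P_H\,P^*_{Y|H}, \qquad P^*_{Y|H=h} = \prod_{j=1}^{T} \mathcal{N}\left(0,\ I_{n_r} + \frac{P}{n_t} hh^T\right), \tag{27}$$
where the $j$-th factor is the distribution of $y_j$, the $j$-th column of $y$, which, as we specified in (3), is an $n_r\times T$ matrix.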
III-B Capacity achieving input distributions
A key feature of the MIMO-BF channel is that it has many caids, whereas many commonly studied channels (e.g. BSC, BEC, AWGN) have a unique caid. Understanding the set of distributions that achieve capacity is essential for reasoning about the minimizer of the conditional variance in (8). The following theorem characterizes the set of caids for the MIMO-BF channel. Somewhat surprisingly, for the case of rank-1 (e.g. for MISO) there are multiple non-trivial jointly Gaussian caids with different correlation structures. For example, space-time block codes can achieve the capacity in the rank 1 case, but do not achieve capacity when the rank is 2 or greater, e.g. [26].
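Theorem 6.
1. Every caid $X$ satisfies
[TABLE]
If $\mathrm{rank}\,H \le 1$ a.s., then condition (30) is also sufficient for $X$ to be a caid.
2. Let $X$ be decomposed into rows $R_1,\ldots,R_{n_t}$. If $X$ is a caid, then each $R_i \sim \mathcal{N}(0, \frac{P}{n_t} I_T)$ (i.i.d. Gaussian) and
[TABLE]
If $X$ is jointly zero-mean Gaussian and $\mathrm{rank}\,H \le 1$ a.s., then (31)-(32) are sufficient for $X$ to be a caid.
3. Let $X$ be decomposed into columns $C_1,\ldots,C_T$. If $X$ is a caid, then each $C_j \sim \mathcal{N}(0, \frac{P}{n_t} I_{n_t})$ (i.i.d. Gaussian) and
[TABLE]
If $X$ is jointly zero-mean Gaussian and $\mathrm{rank}\,H \le 1$ a.s., then (33)-(34) are sufficient for $X$ to be a caid.
4. When $\mathbb{P}[\mathrm{rank}\,H > 1] > 0$, any caid has pairwise independent rows:
[TABLE]
and in particular
[TABLE]
Therefore, among jointly Gaussian $X$ the i.i.d. $X_{ij}$ is the unique caid.
5. There exist non-Gaussian caids if and only if $\mathrm{rank}\,H \le 1$ a.s.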
Remark 4.
(Special case of rank-1 ) In the MISO case when and (or more generally, a.s.), there is not only a multitude of caids, but in fact they can have non-trivial correlations between entries of (and this is ruled out by (36) for all other cases). As an example, for the case, any of the following random matrix-inputs (parameterized by ) is a Gaussian caid:
[TABLE]
where i.i.d.. In particular, there are caids for which not all entries of are pairwise independent.
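For jointly Gaussian zero-mean inputs in the rank-1 case, the caid property reduces to a second-moment check: for every vector $a \in \mathbb{R}^{n_t}$, the row combination $a^T X$ should have covariance $\frac{P}{n_t}\|a\|^2 I_T$ (this is our reading of the rank-1 condition in Theorem 6). The sketch below verifies this exactly, with no sampling, for one $2\times 2$ family parameterized by $\rho \in [-1,1]$ and built from four i.i.d. $\mathcal{N}(0, P/2)$ variables. The specific family is an assumption chosen for illustration and need not coincide with the one displayed in the remark.

```python
import numpy as np


def coefficient_tensor(rho: float) -> np.ndarray:
    """A[i, t, :] holds the coefficients of X[i, t] as a linear form in (xi1..xi4).

    Assumed illustrative family:
        X = [[xi1, -rho*xi2 + s*xi3],
             [xi2,  rho*xi1 + s*xi4]],   s = sqrt(1 - rho^2).
    """
    s = np.sqrt(1.0 - rho ** 2)
    A = np.zeros((2, 2, 4))
    A[0, 0] = [1.0, 0.0, 0.0, 0.0]
    A[0, 1] = [0.0, -rho, s, 0.0]
    A[1, 0] = [0.0, 1.0, 0.0, 0.0]
    A[1, 1] = [rho, 0.0, 0.0, s]
    return A


def row_combination_covariance(a: np.ndarray, rho: float, P: float = 1.0) -> np.ndarray:
    """Exact covariance of the 1 x T vector a^T X when xi_k are i.i.d. N(0, P/2)."""
    A = coefficient_tensor(rho)
    B = np.einsum("i,itk->tk", a, A)  # a^T X = B @ xi, with B of shape (T, 4)
    return (P / 2.0) * (B @ B.T)


def entry_correlation(rho: float, P: float = 1.0) -> float:
    """Covariance between X[0, 0] and X[1, 1]."""
    A = coefficient_tensor(rho)
    return float((P / 2.0) * (A[0, 0] @ A[1, 1]))
```

For any `a` and `rho` the returned covariance is `(P / 2) * (a @ a) * I`, while `entry_correlation(rho, P)` equals `P * rho / 2`, so the entries on the diagonal of $X$ are correlated whenever $\rho \neq 0$.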
Remark 5.
Another way to state conditions (31)-(32) is: all elements in a row (resp. column) are pairwise independent and each minor has antipodal correlation for the two diagonals. In particular, if is a caid, then and any submatrix of are caids too (for different and ).
Proof.
We will rely repeatedly on the following observations:
1. If are two random vectors in then for any we have
[TABLE]
This is easy to show by computing characteristic functions.
2. If are two random vectors in independent of , then
[TABLE]
This follows from the fact that the characteristic function of is nowhere zero.
3. For two matrices we have
[TABLE]
This follows from the fact that a quadratic form that is zero everywhere on must have all coefficients equal to zero.
Part 1 (necessity). Recall that the caod is unique and given by (27). Thus an input is a caid iff for -almost every we have
[TABLE]
where is an matrix with i.i.d. entries (for sufficiency, just write with denoting differential entropy). We will argue next that (43) implies (under isotropy assumption on ) that
[TABLE]
From (40), (44) is equivalent to for all .
Let be a -almost sure subset of for which (43) holds. Let denote the group of orthogonal matrices, with the topology inherited from . Let and for be countable dense subsets of and , respectively. (These exist since is a second-countable topological space). By isotropy of we have and therefore
[TABLE]
is also almost sure: , since is the intersection of countably many almost sure sets. Here, denotes the image of under . By assumption (4), must contain a non-zero element , for otherwise we would have , contradicting (4). Consequently, for all , and so for all . Since for , the map is a bijective continuous transformation of , we have that and are also countable dense subsets of and , respectively. From (41) and (43) along with the definition of , we conclude that
[TABLE]
Arguing by continuity and using the density of and , this implies also
[TABLE]
In particular, for any there must exist a choice of such that has the top row equal to for some constant . Choosing these in (46) and comparing distributions of top rows, we conclude (44) after scaling by .
Part 1 (sufficiency). Suppose . Then our goal is to show that (44) implies that is a caid. To that end, it is sufficient to show for all rank-1 . In the special case
[TABLE]
the claim follows directly from (44). Every other rank-1 can be decomposed as for some matrix , and thus again we get , concluding the proof.
Parts 2 and 3 (necessity). From part 1 we have that for every we must have . Computing expected square we get
[TABLE]
Thus, expressing the left-hand side in terms of rows as we get
[TABLE]
and thus by (42) we conclude that for all :
[TABLE]
Each entry of the matrices above is a quadratic form in and thus again by (42) we conclude (31)-(32). Part 3 is argued similarly with roles of and interchanged.
Parts 2 and 3 (sufficiency). When is (at most) rank-1, we have from part 1 that it is sufficient to show that . When is jointly zero-mean Gaussian, we have is zero-mean Gaussian and so we only need to check its second moment satisfies (47). But as we just argued, (47) is equivalent to either (31)-(32) or (33)-(34).
Part 4. As in Part 1, there must exist such that (46) holds and . Thus, by choosing we can diagonalize and thus we conclude any pair of rows must be independent.
Part 5. This part is never used in subsequent parts of the paper, so we only sketch the argument and move the most technical part of the proof to Appendix A. Let . Then arguing as for (46) we conclude that is a caid if and only if for any with we have
[TABLE]
In other words, we have
[TABLE]
If , then rank condition on is not active and hence, we conclude by (40) that . So assume . Note that (48) is equivalent to the condition on characteristic function of as follows:
[TABLE]
It is easy to find polynomial (in ) that vanishes on all matrices of rank (e.g. take the product of all minors). Then Proposition 24 in Appendix A constructs non-Gaussian satisfying (49) and hence (48). ∎
III-C Information density and its moments
In finite blocklength analysis, a key object of study is the information density, along with its first and second moments. In this section we’ll find expressions for these moments, along with showing when the information density is asymptotically normal.
It will be convenient to assume that the matrix $H$ is represented as
$$H = U\Lambda V^T,$$
where $U$ and $V$ are uniformly distributed on $O(n_r)$ and $O(n_t)$, respectively (which follows from the isotropic assumption on $H$), and $\Lambda$ is the $n_r\times n_t$ diagonal matrix with diagonal entries $\{\Lambda_i,\ i = 1,\ldots,n_{\min}\}$. (Footnote 4: recall that $O(m)$ is the space of all $m\times m$ orthogonal matrices. This space is compact in a natural topology and admits a Haar probability measure.) The joint distribution of $\{\Lambda_i\}$ depends on the fading model. It does not matter for our analysis whether the $\Lambda_i$’s are sorted in some way, or permutation-invariant.
For the MIMO-BF channel, let denote the caod (27). To compute the information density with respect to (for a single -block of symbols) as defined in (9), denote and write an SVD decomposition for matrix as
[TABLE]
where , and is an matrix which is zero except for the diagonal entries, which are equal to . Note that this representation is unique up to permutation of , but the choice of this permutation will not affect any of the expressions below. With this decomposition we have:
[TABLE]
where we denoted by the -th column of , and have set , with representing the -th row of . The definition naturally extends to blocks of length additively:
[TABLE]
We compute the (conditional) mean of information density to get
[TABLE]
where we used the following simple fact:
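Lemma 7.
Let $U \in \mathbb{R}^{n_t}$ be uniformly distributed on the unit sphere, and $x \in \mathbb{R}^{n_t\times T}$ be a fixed matrix, then
$$\mathbb{E}\left[\|U^T x\|^2\right] = \frac{\|x\|_F^2}{n_t}.$$
Proof.
Note that by additivity of $\|U^T x\|^2$ across columns of $x$, it is sufficient to consider the case $T = 1$, for which the statement is clear from symmetry. ∎
Remark 6.
A simple consequence of Lemma 7 is $\mathbb{E}[\|Hx\|_F^2] = \frac{\mathbb{E}[\|H\|_F^2]}{n_t}\|x\|_F^2$, which follows from considering the SVD of $H$.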
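The identity behind Lemma 7, namely that a uniformly random unit vector $U \in \mathbb{R}^{n_t}$ captures on average a fraction $1/n_t$ of the energy of a fixed matrix, $\mathbb{E}\|U^T x\|^2 = \|x\|_F^2/n_t$, can be checked by simulation. The sketch below is a Monte Carlo check under that reading of the lemma; the function name, sample size and test matrix are arbitrary choices.

```python
import numpy as np


def mean_projected_energy(x: np.ndarray, num_samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E||U^T x||^2 for U uniform on the unit sphere in R^{n_t}."""
    rng = np.random.default_rng(seed)
    n_t = x.shape[0]
    # A normalized standard Gaussian vector is uniform on the unit sphere.
    g = rng.standard_normal((num_samples, n_t))
    u = g / np.linalg.norm(g, axis=1, keepdims=True)
    proj = u @ x  # each row is U^T x, a vector of length T
    return float(np.mean(np.sum(proj ** 2, axis=1)))


x = np.arange(6.0).reshape(3, 2)      # fixed 3 x 2 matrix
estimate = mean_projected_energy(x)
exact = np.sum(x ** 2) / x.shape[0]   # ||x||_F^2 / n_t
```

The estimate fluctuates around `exact` with an error that shrinks like the inverse square root of the number of samples.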
Proposition 8.
Let , then we have
[TABLE]
where the function defined as is given by
[TABLE]
where was defined in (13) and
[TABLE]
Remark 7.
Every term in the definition of (except the one with ) is non-negative (for -term, see (90)). The -term will not be important because for inputs satisfying power-constraint with equality it vanishes. Note also that the first term in (65) can alternatively be given as
[TABLE]
Proof.
From (III-C), we have the form of the information density. First note that the information density over channel uses decomposes into a sum of independent terms,
[TABLE]
As such, the variance conditioned on also decomposes as
[TABLE]
from which (56) follows. Because the variance decomposes as a sum in (67), we focus on only computing for a single coherent block. Define
[TABLE]
so that in notation from (III-C). With this, the quantity of interest is
[TABLE]
where (71) follows from the identity
[TABLE]
Below we show that and corresponds to (59), corresponds to (57), corresponds to (58), and corresponds to (60) and (61). We evaluate each term separately.
[TABLE]
where (75) follows from noting that
[TABLE]
Now, since is independent from by the rotational invariance assumption, we have that is independent from , since only depends on through its eigenvalues, see (62). We are only concerned with the expectation over in (74), which reduces to
[TABLE]
giving (75).
Next, in (71) becomes
[TABLE]
For in (71), we obtain
[TABLE]
where
- (82) follows from taking the variance over (recall in (III-C)).
- (83) follows from Lemma 7 applied to , and adding and subtracting the term
[TABLE]
Continuing with from (71),
[TABLE]
where
- (87) follows from taking the expectation over ,
- (88) follows from applying the variance identity (72) with respect to and , as well as recalling (63).
We are left to show that the term (88) equals (61). To that end, define
[TABLE]
We will finish the proof by showing
[TABLE]
To that end, we first compute moments of drawn from the Haar measure on the orthogonal group.
Lemma 9.
Let be drawn from the Haar measure on , then for all unique,
[TABLE]
Proof of this Lemma is given below.
First, note that the variance does not depend on , since the marginal distribution of each is uniform on the unit sphere. Hence below we only consider . We obtain
[TABLE]
where denotes the -th row of . Now it is a matter of counting similar terms.
[TABLE]
where
- (100) follows from collecting like terms from the summation in (99).
- (101) uses Lemma 9 to compute each expectation.
- (102) follows from realizing that
[TABLE]
Plugging this back into (97) yields the variance term,
[TABLE]
Now we compute the covariance term from (90) in a similar way. By symmetry of the columns of , we can consider only the covariance between and , i.e.
[TABLE]
Expanding the expectation, we get
[TABLE]
With this, we obtain from (106),
[TABLE]
where the steps follow just as in the variance computation (100)-(102).
Finally, returning to (90), using the variance (105) and covariance (112), we obtain
[TABLE]
Plugging this into (88) finishes the proof. ∎
Proof of Lemma 9.
We first note that all entries of have identical distribution, since permutations of rows and columns leave the distribution invariant. Because of this, we can WLOG only consider to prove the lemma.
- (91) follows immediately from a.s.
- Let be any two distinct columns of , then (92) follows from
[TABLE]
- For (93) and (96), let and . The following relations between hold,
[TABLE]
  and, noticing that multiplication of by the matrix
[TABLE]
  where is the identity matrix. This is an orthogonal matrix, so we obtain the relation
[TABLE]
  from which we obtain . With this and (116), we obtain
[TABLE]
- For (94), take
[TABLE]
  Solving for yields (94).
- For (96), let denote the first and second column of respectively, and let , then (96) follows from
[TABLE]
  Using from (124) and solving for gives (96).
∎
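Low-order moments of Haar-distributed orthogonal matrices of the kind used in this proof can be sanity-checked by simulation. The sketch below samples Haar matrices via the QR decomposition of a Gaussian matrix (with the usual sign correction) and compares empirical moments against the standard closed forms $\mathbb{E}[V_{11}^2] = \frac{1}{n}$, $\mathbb{E}[V_{11}^4] = \frac{3}{n(n+2)}$, $\mathbb{E}[V_{11}^2 V_{12}^2] = \frac{1}{n(n+2)}$, $\mathbb{E}[V_{11}^2 V_{22}^2] = \frac{n+1}{n(n-1)(n+2)}$ and $\mathbb{E}[V_{11}V_{12}V_{21}V_{22}] = -\frac{1}{n(n-1)(n+2)}$. These are well-known identities; that they match the six displays of Lemma 9 one-to-one is an assumption, and all function names are hypothetical.

```python
import numpy as np


def sample_haar_orthogonal(n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw an n x n orthogonal matrix from the Haar measure."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Fix the sign ambiguity of QR so that the distribution is exactly Haar.
    return q * np.sign(np.diag(r))


def empirical_moments(n: int, num_samples: int = 40_000, seed: int = 0) -> np.ndarray:
    """Estimate E[a^2], E[a^4], E[a^2 b^2], E[a^2 d^2], E[a b c d] for the top-left 2x2 block."""
    rng = np.random.default_rng(seed)
    acc = np.zeros(5)
    for _ in range(num_samples):
        v = sample_haar_orthogonal(n, rng)
        a, b, c, d = v[0, 0], v[0, 1], v[1, 0], v[1, 1]
        acc += [a ** 2, a ** 4, a ** 2 * b ** 2, a ** 2 * d ** 2, a * b * c * d]
    return acc / num_samples


def exact_moments(n: int) -> np.ndarray:
    """Closed-form values of the same five moments (requires n >= 2)."""
    return np.array([
        1.0 / n,
        3.0 / (n * (n + 2)),
        1.0 / (n * (n + 2)),
        (n + 1.0) / (n * (n - 1) * (n + 2)),
        -1.0 / (n * (n - 1) * (n + 2)),
    ])


estimated = empirical_moments(4)
expected = exact_moments(4)
```

Two consistency relations make the closed forms easy to remember: rows have unit norm, so $\mathbb{E}[V_{11}^2 V_{21}^2] + (n-1)\,\mathbb{E}[V_{11}^2 V_{22}^2] = 1/n$, and distinct rows are orthogonal, which forces the mixed moment to be negative.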
The following proposition gives the value of the conditional variance of the information density when the input distribution has i.i.d. entries. This will turn out to be the operational dispersion in the case where $\mathbb{P}[\mathrm{rank}\,H > 1] > 0$.
Proposition 10.
Let be i.i.d. with Telatar distribution (26) for each entry. Then
[TABLE]
where is the right-hand side of (12).
Proof.
To show this, we take the expectation of the expression given in Proposition 8 when has i.i.d. entries. The terms (57) and (58) do not depend on , and these give us the first two terms in (12). (59) vanishes immediately, since by the power constraint. It is left to compute the expectation over (60) and (61) from the expression in Proposition 8. Using identities for distributed random variables (namely, , ), we get:
[TABLE]
Hence, the sum of terms in (60) + (61) after taking expectation over yields
[TABLE]
Introducing random variables the expression in the square brackets equals
[TABLE]
At the same time, the third term in expression (12) is
[TABLE]
One easily checks that (135) and (136) are equal. ∎
The next proposition shows that, when the rank of $H$ is larger than $1$ with positive probability, the conditional variance in (8) is constant over the set of caids. Thus we can compute the conditional variance for the i.i.d. caid, and conclude that this expression is the minimizer in (8).
Proposition 11.
If , then for any caid we have
[TABLE]
In particular, the defined as infimum over all caids (8) satisfies .
Proof.
For any caid the term (59) vanishes. Let be Telatar distributed. To analyze (60) we recall that from (36) we have
[TABLE]
For the term (61) we notice that
[TABLE]
where is the -th row of . By (35) from Theorem 6 we then also have
[TABLE]
To conclude, . ∎
In the case where , it turns out that the conditional variance does vary over the set of caids. The following proposition gives the expression for the conditional variance in this case, as a function of the caid.
Proposition 12**.**
If , then for any capacity achieving input we have
[TABLE]
where are defined in (14)-(15).
Proof.
As in Prop. 10 we need to evaluate the expectation of terms in (59)-(61). Any caid should satisfy and thus the term (59) is zero. The term (60) can be expressed in terms of , but the term (61) presents a non-trivial complication due to the presence of , whose expectation is possible (but rather tedious) to compute by invoking properties of caids established in Theorem 6. Instead, we recall that the sum (60)+(61) equals (88). Evaluation of the latter can be simplified in this case due to the constraint on the rank of . Overall, we get
[TABLE]
where is from (13). The last term in (140) can be written as
[TABLE]
which follows from the identity for independent . The second term in (141) is easily handled since from Lemma 7 we have . To compute the first term in (141) recall from Theorem 6 that for any fixed unit-norm and caid we must have . Therefore, we have
[TABLE]
Putting everything together we get that (141) equals
[TABLE]
concluding the proof. ∎
The question at hand is: which input distribution that achieves capacity minimizes (137)? Proposition 12 reduces this problem to maximizing over the set of capacity achieving input distributions. This will be analyzed in Section VI.
Finally, the following lemma computes the Berry-Esseen constant. This is a technical result that will be needed for both the achievability and converse proofs.
Lemma 13**.**
Fix and let , where are distributed as the output of channel (3) with input . Define the Berry-Esseen ratio
[TABLE]
Then whenever and we have
[TABLE]
where are constants which only depend on channel parameters but not or .
The proof of Lemma 13 can be found in Appendix B.
III-D Hypothesis testing
Many finite blocklength results are derived by considering an optimal hypothesis test between appropriate distributions. We define to be the minimum error probability of all statistical tests between distributions and , given that the test chooses when is correct with probability at least . Formally:
[TABLE]
The classical Neyman-Pearson lemma shows that the optimal test achieves
[TABLE]
where denotes the Radon-Nikodym derivative of with respect to , and is chosen to satisfy
[TABLE]
We recall a simple bound on following from the data-processing inequality (see [1, (154)-(156)] or, in different notation, [27, (10.21)]):
[TABLE]
A more precise bound [1, (102)] is
[TABLE]
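As a sanity check on these definitions, the following sketch evaluates $\beta_\alpha(P,Q)$ in closed form for a toy pair $P=\mathcal{N}(\mu,1)$, $Q=\mathcal{N}(0,1)$ and compares it with the data-processing lower bound written as $\beta_\alpha \ge \exp\left(-\frac{D(P\|Q)+h(\alpha)}{\alpha}\right)$. The Gaussian pair, the function names, and the exact form of the bound are our assumptions for illustration; the form is the standard consequence of $d(\alpha\|\beta_\alpha)\le D(P\|Q)$ and is assumed to agree with the bound quoted above up to notation.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def beta_alpha_gaussian(alpha: float, mu: float) -> float:
    """beta_alpha(P, Q) for the toy pair P = N(mu, 1), Q = N(0, 1), mu >= 0.

    The log-likelihood ratio is increasing in x, so the Neyman-Pearson test
    declares P when x > gamma, with gamma chosen so that P[X > gamma] = alpha.
    """
    gamma = mu + STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(gamma)  # Q[X > gamma]


def binary_entropy(alpha: float) -> float:
    """Binary entropy h(alpha) in nats."""
    return -alpha * log(alpha) - (1.0 - alpha) * log(1.0 - alpha)


def data_processing_bound(alpha: float, mu: float) -> float:
    """Lower bound exp(-(D(P||Q) + h(alpha)) / alpha), with D = mu^2 / 2 nats."""
    divergence = mu * mu / 2.0
    return exp(-(divergence + binary_entropy(alpha)) / alpha)


if __name__ == "__main__":
    for mu in (0.5, 1.0, 3.0):
        exact = beta_alpha_gaussian(0.9, mu)
        bound = data_processing_bound(0.9, mu)
        print(f"mu={mu}: beta_alpha={exact:.4g}, lower bound={bound:.4g}")
```

In this toy case the exact value always sits above the bound, and the gap illustrates why a more precise estimate is useful at finite blocklength.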
We will also need to define the performance of composite hypothesis tests. To this end, let and be a random transformation. We define
[TABLE]
We can lower bound the error in a composite hypothesis test by the error in an appropriately chosen binary hypothesis test as follows:
Lemma 14**.**
For any on we have
[TABLE]
Proof.
Let be any test satisfying conditions in the definition (149). We have the chain
[TABLE]
where (151) is from Fubini and (152) from constraints on the test. Thus is also a test satisfying conditions in the definition of . Optimizing over the tests completes the proof. ∎
IV Achievability
In this section, we prove the achievability side of the coding theorem for the MIMO-BF channel. We will rely on the κβ bound [1, Theorem 25], quoted here:
Theorem 15** (κβ bound).**
Given a channel with input alphabet and output alphabet , for any distribution on , any non-empty set , and such that , there exists an -max code satisfying
[TABLE]
The art of applying this theorem is in choosing and appropriately. The intuition in choosing these is as follows: although we know the distributions in the collection , we do not know which is actually true in the composite, so if is in the “center” of the collection, then the two hypotheses can be difficult to distinguish, making the numerator large. However, for a given , vs may still be easy to distinguish, making the denominator small. The main principle for applying the κβ-bound is thus: Choose and such that vs is easy to distinguish for any given , yet the composite hypothesis is hard to distinguish from a simple one .
The main theorem of this section gives achievable rates for the MIMO-BF channel, as follows:
Theorem 16**.**
Fix an arbitrary caid on and let
[TABLE]
where is introduced in Proposition 8. Then we have
[TABLE]
with given by (6).
Proof.
Let be a small constant (it will be taken to zero at the end). We apply the κβ bound (153) with auxiliary distribution , where is the caod (27), and the set is to be specified shortly. Recall notation , and from (53), (56) and (143). For any such that , we have from [28, Lemma 14],
[TABLE]
where is a constant that only depends on channel parameters. We mention that obtaining (156) from [28, Lemma 14] also requires that be bounded away from zero by a constant, which holds since in the expression for in Proposition 8, the term (58) is strictly positive, term (59) will vanish, and terms (60) and (61) are both non-negative.
Considering (156), our choice of the set should not be surprising:
[TABLE]
where is chosen so that Lemma 13 implies for any . Under this choice from (156), (54) and Lemma 13 we conclude
[TABLE]
where .
To lower bound the numerator we first state two auxiliary lemmas, whose proofs follow. The first, Lemma 17, shows that the output distribution induced by an input distribution that is uniform on the sphere is “similar” (in the sense of divergence) to the -fold product of the caod.
Lemma 17**.**
Fix an arbitrary caid and let have i.i.d. components . Let
[TABLE]
where . Then
[TABLE]
where is the -fold product of the caod (27).
The second, Lemma 18, shows that a uniform distribution on the sphere has nearly all of its mass in as .
Lemma 18**.**
With as in Lemma 17 and set defined as in (157) (with arbitrary and ) we have as ,
[TABLE]
Denote the right-hand side of (160) by and consider the following chain:
[TABLE]
where (161) follows from Lemma 14 and (147) with as in Lemma 17, (162) is from Lemma 17, (163) is from Lemma 18, and in (164) we introduced a -dependent constant .
Putting (158) and (164) into the κβ-bound we obtain
[TABLE]
Taking and then completes the proof. ∎
Now we prove the two lemmas used in the Theorem.
Proof of Lemma 17.
In the case of no-fading () and SISO, this Lemma follows from [29, Proposition 2]. Here we prove the general case. Let us introduce an auxiliary channel acting on as follows:
[TABLE]
With this notation, consider the following chain:
[TABLE]
where (166) is clear from (165), (167) follows since is a caid, (168)-(169) are standard identities for divergence, (170) follows since both and are unit-variance Gaussians and , (171) is from Lemma 7 (see Remark 6) and (172) is just algebra along with the assumption that .
It remains to lower bound the expectation . Notice that for any uncorrelated random variables with mean 1 and variance 2 we have
[TABLE]
which follows from for all and simple computations. Next consider the chain:
[TABLE]
where in (176) we used the fact that for any caid, i.i.d. (from Theorem 6) and applied (173) with . Putting together (172) and (176) completes the proof. ∎
Proof of Lemma 18.
Note that since is a sum of i.i.d. random variables, we have almost surely. In addition we have
[TABLE]
where we used the fact (Theorem 6) that ’s entries are Gaussian. Then we have from independence of ’s and Chebyshev’s inequality,
[TABLE]
as . Consequently,
[TABLE]
as .
Next we analyze the behavior of . From Proposition 8 we see that, due to , the term (59) vanishes, while (60) simplifies. Overall, we have
[TABLE]
where we replaced the terms that do not depend on with . Note that the first term in parentheses (premultiplying the sum) converges almost-surely to 1, by the strong law of large numbers. Similarly, the normalized sum converges to the expectation (also by the strong law of large numbers). Overall, applying the SLLN in the limit as , we obtain:
[TABLE]
In particular, . This concludes the proof of . ∎
V Converse
Here we state and prove the converse part of Theorem 1. There are two challenges in proving the converse relative to other finite blocklength proofs. First, behavior of the information density (III-C) varies widely as varies over the power-sphere
[TABLE]
Indeed, when the distribution of information density ceases to be Gaussian. In contrast, the information density for the AWGN channel is constant over .
Second, assuming asymptotic normality, we have for any :
[TABLE]
However, the problem is that is also non-constant. In fact there exist regions of where is abnormally small. Thus we need to also show that no capacity-achieving codebook can live on those abnormal sets.
The main theorem of the section is the following:
Theorem 19**.**
For any there exists such that any -max code with and codewords satisfying has size bounded by
[TABLE]
where and are defined in (7) and (8), respectively.
Proof.
As usual, without loss of generality we may assume that all codewords belong to as defined in (180), see [1, Lemma 39]. The maximal probability of error code size is bounded by a meta-converse theorem [1, Theorem 31], which states that for any code and distribution on the output space of the channel,
[TABLE]
where infimum is taken over all codewords. The main problem is to select appropriately. We do this separately for the two subcodes defined as follows. Fix arbitrary (it will be taken to 0 at the end) and introduce:
[TABLE]
To bound the cardinality of , we select to be the -product of the caod (27), then apply the following estimate from [28, Lemma 14], quoted here: for any we have
[TABLE]
where , and are given by (54), (56) and (143), respectively. We choose and then from Lemma 13 (which relies on the assumption that ) we get that for some constants we have for all :
[TABLE]
From (182) and (185) we therefore obtain
[TABLE]
where as .
Next we proceed to bounding . To that end, we first state two lemmas. Lemma 20 shows that, if in addition to the power constraint , we also required , then the capacity of this variance-constrained channel is strictly less than without the latter constraint.
Lemma 20**.**
Consider the following constrained capacity:
[TABLE]
where is from (8) and is from (57). For any there exists such that .
Remark 8**.**
Curiously, if we used constraint instead of in (187), then the resulting capacity equals regardless of .
The following Lemma shows that, with the appropriate choice of an auxiliary distribution , the expected size of the normalized log likelihood ratio is strictly smaller than capacity, while the variance of that same ratio is upper bounded by a constant (i.e. does not scale with ).
Lemma 21**.**
Define the auxiliary distribution
[TABLE]
where is a constant, is the caod for the MIMO-BF channel, and is the caod for the variance-constrained channel in (187). Let , and . Then there exist constants such that for all ,
[TABLE]
where , is the joint distribution.
Remark 9**.**
The reason we let take on two distributions depending on the value of is because we do not know the form of , hence we do not explicitly know how it depends on . This choice of ensures that expectations involving are finite.
Choose as in Lemma 21, so that the bounds on , from (189), (190) respectively, hold. Applying [28, Lemma 15] with (the statement of this lemma is the contents of (191)), we obtain
[TABLE]
Therefore, from (182) we conclude that for all we have
[TABLE]
Overall, from (186) and (193) we get (due to arbitrariness of ) the statement (181). ∎
Proof of Lemma 20.
Introduce the following set of distributions:
[TABLE]
By Prokhorov’s criterion (e.g. [30, Theorem 5.1], tightness implies relative compactness), the norm constraint implies that this set is relatively compact in the topology of weak convergence. So there must exist a sequence of distributions s.t. and where . By Skorokhod representation [30, Theorem 6.7], we may assume , i.e. there exists random variable that is the pointwise limit of the ’s. Notice that for any continuous bounded function we have
[TABLE]
and therefore . Assume (to arrive at a contradiction) that , then by the golden formula, cf. [25, Theorem 3.3], we have
[TABLE]
where is from (54). Therefore, we have
[TABLE]
From weak lower-semicontinuity of divergence [25, Theorem 3.6] we have . In particular, if we denote to have Telatar distribution (26), we must have
[TABLE]
From Lemma 7 (see Remark 6) we have
[TABLE]
and hence from the independence of from we get
[TABLE]
and similarly for the right-hand side of (198). We conclude that
[TABLE]
Finally, plugging this fact into the expression for in (54) and (196) we obtain
[TABLE]
That is, is a caid. But from Fatou’s lemma we have (recall that since it is a variance)
[TABLE]
where the last step follows from . A caid achieving conditional variance strictly less than contradicts the definition of , cf. (8), as the infimum of over all caids. ∎
Proof of Lemma 21.
First we analyze from (189). Denote
[TABLE]
Here, is the information density given by (III-C), while instead has the caod for the variance-constrained channel (187) in the denominator. Since takes on one of two distributions based on the value of , conditioning on in two ways yields
[TABLE]
The ’s are i.i.d. according to , so we define . Using capacity saddle point, (203) is bounded by
[TABLE]
where denotes the capacity of the MIMO-BF channel with fading distribution , and denotes the distribution of conditioned on (similarly, will denote conditioned on ). (205) follows from the fact that the information density, i.e. , is not a function of , hence changing the distribution does not affect the form of . Similarly, using Lemma 20, (204) is bounded by
[TABLE]
where is a positive constant, and denotes the solution to the optimization problem (187) when the fading distribution is . Putting together (205) and (207), we obtain an upper bound on ,
[TABLE]
Note that , so the capacity only depends on through the expectation – the expression inside is not a function of because the i.i.d. Gaussian caid achieves capacity for all isotropic ’s. Hence, by the law of total expectation, (208) simplifies to
[TABLE]
Finally, we can upper bound using Markov’s inequality as
[TABLE]
since . Applying this bound to (209), we obtain
[TABLE]
Defining completes the proof of (189).
Next we analyze from (190). The strategy will be to decompose (190) into two terms depending on the value of , then show that each term is upper bounded by , where are constants not depending on . Finally, we will show that when . To this end,
[TABLE]
where (214) follows from the independence of the terms, and (215) is from the bound . Again we condition on in two ways,
[TABLE]
For the first term, (216), we know the expression for from (III-C), so we simply upper bound . To this end,
[TABLE]
where are non-negative constants, and are functions of only that have bounded moments. This follows from:
- Bounding the first term via
[TABLE]
which can be derived from the basic inequality .
- Noting that the second term is bounded in , since for all ,
[TABLE]
- Noting that all moments of are finite because this is the norm of a standard normal vector.
Therefore, after taking the expectation of (219) and summing over all , we obtain
[TABLE]
for some non-negative constants .
To bound the second term, (217), first we split the logarithm as
[TABLE]
The first term in (225) is simple to handle, since its expression is given by the definition of the channel,
[TABLE]
i.e. we have a constant upper bound. For the second term in (225), notice that is inducible through the channel, i.e. there exists an input distribution such that . Using this fact, we obtain the bound
[TABLE]
where (230) follows from Jensen’s inequality, (231) is from the definition of the channel, and (232) follows from applying the inequality along with , then noting that satisfies . Using this, we can bound the second term in (225) via
[TABLE]
where are non-negative constants which do not depend on , (234) is from the above bound (232), and (236) follows from applying the bound
[TABLE]
Putting together (236) and (228), we obtain an upper bound on (217),
[TABLE]
Now, since by assumption, we can control the quantity via
[TABLE]
where the first inequality follows from the non-negativity of the terms in given in Proposition 8, and the second inequality is from the definition of . Hence the sum of fourth powers of the ’s is on . All together, combining (240) and (223) yields the following bound on ,
[TABLE]
which completes the proof of (190). ∎
VI The rank 1 case
When is rank 1, for example in the MISO case, i.e. , the MIMO-BF channel has multiple input distributions that achieve capacity, as shown in Theorem 6. Theorem 1 proved that the dispersion in the general MIMO-BF channel is given by (8), where we minimize the conditional variance of the information density over the set of caids. In this section, we analyze those minimizers for the rank 1 case, which turns out to be non-trivial.
From Theorem 3, when is rank 1, the conditional variance takes the form
[TABLE]
where are constants that depend on the channel parameters but not the input distribution. From (18), computing requires us to maximize the variance of the squared Frobenius norm of the input distribution over the set of caids. Intuitively, this says that minimizing the dispersion is equivalent to maximizing the amount of correlation amongst the entries of when is jointly Gaussian. In a sense, this asks for the capacity achieving input distribution having the least amount of randomness.
Here we characterize . The manifold of caids is not easy to optimize over, since one must account for all the independence constraints on the rows and columns, the covariance constraints on the minors, positive definite constraints, etc. as described in Theorem 6. Our strategy instead will be to give an upper bound on , then show that for certain pairs , the upper bound is tight. Before stating the main theorem of the section, we review orthogonal designs, which will play a large role in the solution to this problem.
VI-A Orthogonal designs
Definition 1** (Orthogonal Design).**
A real orthogonal design of size is defined to be an matrix with entries given by linear forms in and coefficients in satisfying
[TABLE]
In other words, all columns of have squared Euclidean norm , and all columns are pairwise orthogonal. A common representation for an orthogonal design is the sum where is a collection of real matrices satisfying Hurwitz-Radon conditions (19)-(20). Such collection is called a Hurwitz-Radon family. Theorem 4 shows that the maximal cardinality of a Hurwitz-Radon family is the Hurwitz-Radon number , cf. (21).
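To make the definition concrete, the sketch below checks numerically that the columns of a candidate matrix have squared norm $\sum_i \xi_i^2$ and are pairwise orthogonal. The two matrices used are the classical real $2\times 2$ (Alamouti-type) and $4\times 4$ designs from the space-time coding literature; they are our illustrative choices and are not claimed to coincide entry-by-entry with the designs displayed elsewhere in this paper. All function names are hypothetical.

```python
def design_2x2(x):
    """Classical real 2x2 orthogonal design (real Alamouti-type block)."""
    x1, x2 = x
    return [[x1, x2],
            [-x2, x1]]


def design_4x4(x):
    """Classical real 4x4 orthogonal design."""
    x1, x2, x3, x4 = x
    return [[x1, x2, x3, x4],
            [-x2, x1, -x4, x3],
            [-x3, x4, x1, -x2],
            [-x4, -x3, x2, x1]]


def gram(X):
    """Return the Gram matrix X^T X of the columns of X."""
    n_rows, n_cols = len(X), len(X[0])
    return [[sum(X[r][i] * X[r][j] for r in range(n_rows))
             for j in range(n_cols)]
            for i in range(n_cols)]


def is_orthogonal_design(X, x, tol=1e-9):
    """Check X^T X = (sum_i x_i^2) * I for the given numeric values x."""
    target = sum(v * v for v in x)
    G = gram(X)
    n = len(G)
    return all(abs(G[i][j] - (target if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    x = (0.5, -1.5, 2.0, 7.0)
    print(is_orthogonal_design(design_4x4(x), x))
    print(is_orthogonal_design(design_2x2(x[:2]), x[:2]))
```

A numerical check at a few points is of course weaker than the polynomial identity in the definition, but it catches sign errors quickly when writing designs by hand.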
The definition of orthogonal designs can be generalized to rectangular matrices [9], as follows:
Definition 2** (Generalized Orthogonal Design).**
A generalized orthogonal design is a matrix with with entries as linear forms of the indeterminates satisfying (246).
The quantity is often called the rate of the generalized orthogonal design. This term is justified by noticing that if represents the number of channel uses and represents the number of data symbols, then represents sending data symbols in channel uses. In this work, we are only interested in the case (i.e. ), called full-rate orthogonal designs. A full-rate orthogonal design can be constructed from a Hurwitz-Radon family , each by forming the matrix
[TABLE]
where is the vector of indeterminates. It follows immediately from this construction that (246) is satisfied. Theorem 4 allows us to conclude that a generalized full rate orthogonal design exists if and only if .
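The existence condition can be evaluated mechanically. The sketch below uses the classical closed form of the Hurwitz-Radon number: writing $n = 2^{a} b$ with $b$ odd and $a = 4c + d$, $0 \le d \le 3$, one has $\rho(n) = 8c + 2^{d}$. We assume this agrees with (21). The helper `full_rate_design_exists` is a hypothetical name that encodes the condition $n_t \le \rho(T)$, under the convention $n_t \le T$.

```python
def hurwitz_radon(n: int) -> int:
    """Hurwitz-Radon number rho(n) via the classical closed form.

    Write n = 2^a * b with b odd and a = 4c + d, 0 <= d <= 3.
    Then rho(n) = 8c + 2^d.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    a = 0
    while n % 2 == 0:
        n //= 2
        a += 1
    c, d = divmod(a, 4)
    return 8 * c + 2 ** d


def full_rate_design_exists(n_t: int, T: int) -> bool:
    """Assumed existence test for an n_t x T full-rate design (n_t <= T)."""
    return n_t <= hurwitz_radon(T)


if __name__ == "__main__":
    for T in range(1, 9):
        print(T, hurwitz_radon(T))
```

In particular $\rho(T) = T$ only for $T \in \{1, 2, 4, 8\}$, which is why square full-rate designs are so scarce.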
The following proposition shows that full rate orthogonal designs correspond to caids in the MIMO-BF channel.
Proposition 22**.**
Take and a maximal Hurwitz-Radon family of matrices (cf. Theorem 4). Let be an i.i.d. row-vector. Then the input distribution
[TABLE]
achieves capacity for any MIMO-BF channel provided .
Proof.
Since is a Hurwitz-Radon family, they satisfy (19)-(20). Form as in (248). Then each row and column is jointly Gaussian, and applying the caid conditions (31) and (32) from Theorem 6 shows,
[TABLE]
Therefore satisfies the caid conditions, and hence achieves capacity. ∎
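The link between a design and a Hurwitz-Radon family can be checked by hand on a small case. The sketch below starts from the classical $4\times 4$ real design, encoded as signed symbol indices (our encoding, for illustration only), extracts matrices $V_i$ such that the $i$-th row equals $\xi V_i$, and verifies the conditions in their standard form $V_i^T V_i = I$ and $V_i^T V_j + V_j^T V_i = 0$ for $i \ne j$. We assume this is the form taken by (19)-(20).

```python
# Entry +k (resp. -k) stands for +xi_k (resp. -xi_k).
DESIGN_4X4 = [[1, 2, 3, 4],
              [-2, 1, -4, 3],
              [-3, 4, 1, -2],
              [-4, -3, 2, 1]]


def family_from_design(design):
    """Return matrices V_i such that row i of the design equals xi @ V_i."""
    T = len(design[0])
    family = []
    for row in design:
        V = [[0] * T for _ in range(T)]
        for j, entry in enumerate(row):
            V[abs(entry) - 1][j] = 1 if entry > 0 else -1
        family.append(V)
    return family


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def satisfies_hurwitz_radon(family):
    """Check V_i^T V_i = I and V_i^T V_j + V_j^T V_i = 0 for i != j."""
    n = len(family[0])
    for i, Vi in enumerate(family):
        if matmul(transpose(Vi), Vi) != identity(n):
            return False
        for Vj in family[i + 1:]:
            A = matmul(transpose(Vi), Vj)
            B = matmul(transpose(Vj), Vi)
            if any(A[r][c] + B[r][c] != 0 for r in range(n) for c in range(n)):
                return False
    return True


def rows_from_family(xi, family):
    """Stack the row vectors xi @ V_i into a matrix."""
    T = len(xi)
    return [[sum(xi[k] * V[k][j] for k in range(T)) for j in range(T)]
            for V in family]
```

Dropping the last matrix of the family leaves the conditions intact, which mirrors the observation that removing a row of the input preserves the capacity-achieving property.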
Remark 10**.**
The above argument implies that if is constructed as above, then removing the last row of gives an input distribution that also achieves capacity.
VI-B Proof of Theorem 5
Theorem 5 states that for dimensions where orthogonal designs exist, the conditional variance (8) is minimized if and only if the input is constructed from an orthogonal design as in Proposition 22. The approach is first to prove an upper bound on , then show that conditions for tightness of the upper bound correspond to conditions of the Hurwitz-Radon theorem.
We start with a simple lemma, which will be applied with equal to the rows of the capacity achieving input .
Lemma 23**.**
Let and each be i.i.d. random vectors from the same distribution with finite second moment . While and are i.i.d. individually, they may have arbitrary correlation between them. Then
[TABLE]
with equality iff almost surely.
Proof.
Simply use the fact that covariance is a bilinear function, and apply the Cauchy-Schwarz inequality as follows:
[TABLE]
We have equality in Cauchy-Schwarz when and are proportional, and since these sums have the same distribution, the constant of proportionality must be equal to 1, so we have equality in (251) iff almost surely. ∎
Proof of Theorem 5.
First, we rewrite defined in (18) as
[TABLE]
From here, follows from the symmetry to transposition of the caid-conditions on (see Theorem 6) and symmetry to transposition of (256). From now on, without loss of generality we assume .
For the upper bound, since the rows and columns of are i.i.d., we can apply Lemma 23 with and (and hence ) to get
[TABLE]
which together with (256) yields the upper bound (22) (recall that ).
Equation (257) implies that if achieves the bound (22), then removing the last row of achieves (22) as an design. In other words, if (22) is tight for then it is tight for all .
Notice that for any such that any pair , is jointly Gaussian, we have
[TABLE]
where
[TABLE]
Take as constructed in (248). By Proposition 22, is capacity achieving and identity (258) clearly holds. In the representation (248), the matrix contains the correlation coefficients between rows and of , since , so
[TABLE]
Therefore we can represent the sum of squared correlation coefficients as
[TABLE]
Line (264) follows since the ’s are orthogonal by the Hurwitz-Radon condition, so each in the summation in (263). Hence the constructed in (248) achieves the upper bound in (257) and (22).
Next we prove (24). Suppose is a jointly-Gaussian caid saturating the bound (257). From Lemma 23, the condition for equality in (251) implies that for all ,
[TABLE]
where is the -th row of for . In particular, this means that every is a linear function of . Consequently, we may represent in terms of a row-vector as in (248), that is for some matrices . We clearly have
[TABLE]
But then the caid constraints (31)-(32) imply that the matrix in (247) constructed using indeterminates and family satisfies Definition 2. Therefore, from Theorem 4, (see also [31, Proposition 4]), we must have . ∎
Remark 11**.**
In the case it is easy to show that for any non-jointly-Gaussian caid, there exists a jointly-Gaussian caid achieving the same . Indeed, consider (39) with . If this phenomenon held in general, we would conclude that (23) holds if and only if or . As a step towards the proof of the latter, we notice that any caid achieving equality in (257) satisfies
[TABLE]
which is equivalent to saying for . The latter follows from applying (265) to rows of , where is an arbitrary orthogonal matrix. Identity (266) could be informally stated as “any caid saturating (257) is a random full-rate orthogonal design”.
In summary, the full-rate orthogonal designs (when those exist) achieve the optimal channel dispersion . Some examples ( are i.i.d. ) for and , respectively, are as follows:
[TABLE]
VI-C Beyond full-rate orthogonal designs
For pairs where , full-rate orthogonal designs do not exist. For example , so no full-rate orthogonal design exists for , . Which caids are minimizers for (8) in this case? In general, we do not know the answer and do not even know whether one can restrict the search to jointly-Gaussian caids. But one thing is certain: it is definitely not an i.i.d. Gaussian (Telatar) caid. To show this claim, we will give a method for constructing improved caids.
To that end, suppose that consists of entries , , where . Then we have:
[TABLE]
where is the number of times appears in the description of . By this observation and the remark after Theorem 6 (any submatrix of a caid is also a caid), we can obtain lower bounds on for via the following truncation construction:
1. Take such that and let be a corresponding full-rate orthogonal design with entries .
2. Choose an submatrix of maximizing the sum of squares of the number of occurrences of each of , cf. (277).
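The two steps above can be sketched as an exhaustive search. The code below encodes the classical $4\times 4$ real design by signed symbol indices (an illustrative encoding of ours), enumerates all row and column subsets of the requested size, and keeps a submatrix maximizing $\sum_i t_i^2$, where $t_i$ counts the occurrences of $\pm\xi_i$. For unit-variance Gaussian symbols the variance of the squared Frobenius norm equals $2\sum_i t_i^2$, since $\mathrm{Var}(\xi_i^2)=2$; we assume this is the normalization behind the displayed identity. Brute force is only practical for small designs.

```python
from collections import Counter
from itertools import combinations

# Entry +k (resp. -k) stands for +xi_k (resp. -xi_k).
DESIGN_4X4 = [[1, 2, 3, 4],
              [-2, 1, -4, 3],
              [-3, 4, 1, -2],
              [-4, -3, 2, 1]]


def occurrence_score(submatrix):
    """Sum over symbols of (number of occurrences)^2, ignoring signs."""
    counts = Counter(abs(entry) for row in submatrix for entry in row)
    return sum(t * t for t in counts.values())


def best_truncation(design, n_rows, n_cols):
    """Return (score, submatrix) maximizing occurrence_score over all
    n_rows x n_cols submatrices obtained by deleting rows and columns."""
    best_score, best_sub = -1, None
    for rows in combinations(range(len(design)), n_rows):
        for cols in combinations(range(len(design[0])), n_cols):
            sub = [[design[r][c] for c in cols] for r in rows]
            score = occurrence_score(sub)
            if score > best_score:
                best_score, best_sub = score, sub
    return best_score, best_sub


if __name__ == "__main__":
    for shape in ((2, 3), (3, 3), (3, 4)):
        score, sub = best_truncation(DESIGN_4X4, *shape)
        # An i.i.d. matrix of the same shape uses each symbol once.
        print(shape, "truncated:", score, "i.i.d.:", shape[0] * shape[1])
```

For this particular design every $3\times 3$ truncation scores 21 against 9 for an i.i.d. matrix of the same size, so the improvement over the Telatar input is visible even without optimizing the choice of submatrix.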
As an example of this method, by truncating a design (271) we obtain the following and submatrices:
[TABLE]
By independent methods we were able to show that designs (283) are dispersion-optimal out of all jointly Gaussian caids. Note that in these cases (23) does not hold, illustrating (24).
Our current knowledge about is summarized in Table I. The lower bounds for cases not handled by Theorem 5 were computed by truncating the 8x8 orthogonal design [9, (5)]. Based on the evidence from and we conjecture this construction to be optimal.
From the proof of Theorem 5 it is clear that Telatar’s i.i.d. Gaussian is never dispersion optimal, unless or . Indeed, for Telatar’s input unless . Thus embedding even a single Alamouti block into an otherwise i.i.d. matrix strictly improves the sum (256). We note that the value of entering (2) can be quite sensitive to the suboptimal choice of the design. For example, for and estimate (2) shows that one needs
- around 600 channel inputs (that is 600/8 blocks) for the optimal orthogonal design, or
- around 850 channel inputs for Telatar’s i.i.d. Gaussian design
in order to achieve 90% of capacity. This translates into a 40% longer delay or battery spent in running the decoder.
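The comparison above can be reproduced in spirit with a short calculation. The sketch below assumes that estimate (2) has the usual normal-approximation form $n \approx \left(\frac{Q^{-1}(\epsilon)}{1-\eta}\right)^2 \frac{V}{C^2}$ for reaching a fraction $\eta$ of capacity at error probability $\epsilon$. The function names and the numerical inputs in the usage lines are hypothetical; only the blocklengths 600 and 850 are taken from the text.

```python
from statistics import NormalDist


def blocklength_for_fraction(dispersion_over_c2: float, eps: float, eta: float) -> float:
    """Blocklength needed to reach a fraction eta of capacity.

    Assumes n ~ (Q^{-1}(eps) / (1 - eta))^2 * V / C^2, where
    dispersion_over_c2 is the normalized dispersion V / C^2.
    """
    if not (0.0 < eps < 0.5 and 0.0 < eta < 1.0):
        raise ValueError("need 0 < eps < 1/2 and 0 < eta < 1")
    q_inv = NormalDist().inv_cdf(1.0 - eps)  # Q^{-1}(eps)
    return (q_inv / (1.0 - eta)) ** 2 * dispersion_over_c2


def relative_delay_penalty(n_reference: float, n_other: float) -> float:
    """Fractional increase in blocklength of n_other over n_reference."""
    return n_other / n_reference - 1.0


if __name__ == "__main__":
    # Blocklengths quoted in the text for the two designs.
    print(f"{relative_delay_penalty(600, 850):.0%}")
    # Hypothetical normalized dispersions, for illustration only.
    print(blocklength_for_fraction(1.0, 1e-3, 0.9))
```

Since the required blocklength is linear in $V/C^2$ under this approximation, the ratio of blocklengths for two designs at the same $\epsilon$ and $\eta$ is simply the ratio of their dispersions.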
Thus, curiously even in cases where pure multiplexing (that is maximizing transmission rate) is needed – as is often the case in modern cellular networks – transmit diversity enters the picture by enhancing the finite blocklength fundamental limits. Remember, however, that our discussion pertains only to cases when the transmitter (base-station) is equipped with more antennas than the receiver (user equipment), or when the channel does not have more than one diversity branch.
In cases when full-rate designs do not exist, there have been various suggestions as to what could be the best solution, e.g. [31]. Thus for non full-rate designs the property of minimizing dispersion (such as (283)) could be used for selecting the best design for cases .
VII Discussion
Figure 1 plots the capacity, normal approximation, and achievability bound for the MIMO channel with for the complex case. The details of this computation are given in [19]. The bound was developed by Yang et al. [19] and is often more computationally friendly than the bound. This figure illustrates the gap between achievability and the normal approximation, as well as the gap to capacity. For example, at blocklength 400, we can achieve about 88% of capacity, and at blocklength 1000 we can achieve about 92% of capacity, given dB and tolerating an error probability of .
Figure 2 shows the dependence of the rate on the coherence time for the MIMO channel. The normal approximation for is plotted. From (6) and (12), we know the capacity does not depend on , but the dispersion depends on in an affine relationship. Hence, from the dispersion we see that a larger coherence time reduces the maximum transmission rate when the other channel parameters are held fixed. Intuitively, when the coherence time is lower, we are able to average over independent realizations of the fading coefficients in fewer channel uses. Note that the CSIR assumption implies that we know the channel coefficient perfectly, which may be unrealistic at short coherence times for a practical channel.
We now ask: how does the dispersion depend on the number of transmit and receive antennas? Figures 3 and 4 depict the normalized dispersion , cf. (2), as a function of the number of antennas. The fading process is chosen to be i.i.d. . Each plot has two curves: one curve with fixed and growing, and the other curve with fixed and growing. In both plots, coherence time is . The difference is that in Fig. 3 the received power is held fixed (at 20 dB, i.e. is chosen so that ), whereas in Fig. 4 it is the transmit power that is held fixed (also at , i.e. ). The relation between and is as follows:
[TABLE]
These figures also display the asymptotic limiting values of computed via random-matrix theory:
1. When is fixed and under fixed received power we have
[TABLE]
2. When is fixed and under fixed received power we have
[TABLE]
3. When is fixed and under fixed transmit power we have
[TABLE]
4. When is fixed and under fixed transmit power we have
[TABLE]
Note that when the received power is fixed, reciprocity holds: the capacity of the channel is the same as the capacity of the one. Having information about dispersion, we may ask the more refined question: although capacities of the channels are the same, which one has better dispersion (i.e. causes smaller coding latency)?
From approximations (286) and (288), we can see that the channel dispersion is not symmetric in . For example, in the setting of Fig. 3 we see that the delay penalty in the regime is of the penalty in the regime. Hence, in a two-user channel, if user 1 has antennas and user 2 has antennas, then the asymptotic analysis suggests that the channel from user 1 to user 2 can support higher rates than the channel from user 2 to user 1 at finite blocklength.
Figure 4 shows the scenario where the transmit power is fixed. In this case, the capacity approaches a finite limit when is held fixed and , but grows logarithmically when is fixed and , as shown in equations (289) and (291). In this setting, the normalized dispersion approaches a finite limit when is fixed and , yet it vanishes when is fixed and . Consequently in this regime, we can always choose the number of receive antennas large enough so that our system can achieve a given fraction of capacity using blocklength . The normalized dispersion in this case is proportional to .
Appendix A Existence of non-Gaussian caids
Proposition 24**.**
Let be such that a) and b) there exists a non-zero polynomial in variables with real coefficients vanishing on . Then there exists a random variable taking values in with the property that its characteristic function satisfies
[TABLE]
but there exists a such that (i.e. ).
Remark 12**.**
The simplest application of this proposition is the following. Suppose that three random vectors in have the property that projection onto any (2-dimensional) plane has the joint distribution . Does it imply that the joint distribution of them is ? Note that it is easy to argue that joint distribution of any pair of them is indeed and thus the only jointly Gaussian distribution that satisfies the requirements is indeed the i.i.d. triplet. However, the above proposition shows that the general answer is still negative. Here is a subset of all with determinant zero.
Proof.
We will slightly extend the argument of [32]. We will assume familiarity with basic commutative algebra at the level of [33]. Consider an identity expressing the well-known computation of the Gaussian characteristic function:
[TABLE]
Setting , changing sign of we get
[TABLE]
Differentiating this in and setting we get
[TABLE]
where is some polynomial of degree with real coefficients (and involving only even powers of ). For later convenience, we also interchange and to get
[TABLE]
(Identity (293) also follows from the fact that Hermite polynomials times Gaussian density are eigenfunctions of the Fourier transform.)
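The parenthetical remark can be checked numerically in a closely related form, which is not necessarily the exact statement of (293): for $Z \sim \mathcal{N}(0,1)$ and the probabilists' Hermite polynomial $He_n$, one has $\mathbb{E}[He_n(Z) e^{itZ}] = (it)^n e^{-t^2/2}$. The sketch below verifies this with Gauss-Hermite quadrature. The function names and the node count are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import hermite_e as herm_e

def hermite_fourier_numeric(n, t, nodes=60):
    """E[He_n(Z) exp(i t Z)] for Z ~ N(0,1), via Gauss-HermiteE quadrature."""
    x, w = herm_e.hermegauss(nodes)      # nodes/weights for weight exp(-x^2 / 2)
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                      # selects He_n in the HermiteE basis
    he_n = herm_e.hermeval(x, coeffs)
    return np.sum(w * he_n * np.exp(1j * t * x)) / np.sqrt(2 * np.pi)

def hermite_fourier_closed_form(n, t):
    """Closed form (i t)^n exp(-t^2 / 2)."""
    return (1j * t) ** n * np.exp(-t ** 2 / 2)

max_err = max(
    abs(hermite_fourier_numeric(n, t) - hermite_fourier_closed_form(n, t))
    for n in range(6)
    for t in (0.0, 0.5, 1.0, 2.0)
)
print(max_err)
```

In words, a polynomial times a Gaussian density has a Fourier transform that is a monomial times a Gaussian, which is the mechanism the proof relies on.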
Next, suppose that there is a polynomial such that vanishes on and each monomial in has all even. Then, define the characteristic function
[TABLE]
We will argue that for sufficiently small, is a characteristic function of some (obviously non-Gaussian) probability density function on . By taking the inverse Fourier transform we get that
[TABLE]
where is the inverse Fourier transform of the second term in (294). Since is even in each , we conclude that is real. Since (recall that ) we have , and thus . So to prove that is a valid density function for small we only need to show that
[TABLE]
To that end, notice that applying (293) to each monomial we get
[TABLE]
Multiplying the right-hand side by we conclude that contribution of each monomial of to is bounded by
[TABLE]
Since there are finitely many monomials in , the proof of (295) and of validity of is done.
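A toy two-dimensional analogue of this construction may help fix ideas. It is a sketch under our own assumptions and not the construction of the proof itself: we take the polynomial $t_1^2 t_2^2$, which vanishes on the coordinate axes, attach it to a narrower Gaussian factor of variance parameter `SIGMA2 = 0.5`, and use the perturbation size `EPS = 0.05`. All three choices and all function names are hypothetical. The resulting function is a valid density with standard Gaussian marginals, and its characteristic function agrees with the Gaussian one on the axes but not elsewhere.

```python
import numpy as np

SIGMA2 = 0.5   # assumed variance parameter of the perturbation term (< 1)
EPS = 0.05     # assumed perturbation size, small enough for positivity

def phi(x, var=1.0):
    """Density of N(0, var)."""
    return np.exp(-x ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def g1(x):
    """Minus the second derivative of phi(., SIGMA2).

    Its Fourier transform is t^2 * exp(-SIGMA2 * t^2 / 2).
    """
    return (SIGMA2 - x ** 2) / SIGMA2 ** 2 * phi(x, SIGMA2)

def density(x, y):
    """Gaussian density plus a small polynomial-times-Gaussian perturbation."""
    return phi(x) * phi(y) + EPS * g1(x) * g1(y)

def char_fn(t1, t2):
    """Characteristic function of `density` in closed form."""
    gauss = np.exp(-(t1 ** 2 + t2 ** 2) / 2)
    bump = t1 ** 2 * t2 ** 2 * np.exp(-SIGMA2 * (t1 ** 2 + t2 ** 2) / 2)
    return gauss + EPS * bump

grid = np.linspace(-10.0, 10.0, 801)
dx = grid[1] - grid[0]
X, Y = np.meshgrid(grid, grid, indexing="ij")
F = density(X, Y)

min_density = F.min()                                    # positivity check
total_mass = F.sum() * dx * dx                           # should be close to 1
marginal_error = np.abs(F.sum(axis=1) * dx - phi(grid)).max()
cf_numeric = np.sum(F * np.cos(1.0 * X + 1.0 * Y)) * dx * dx
print(min_density, total_mass, marginal_error, cf_numeric, char_fn(1.0, 1.0))
```

The narrower Gaussian in the perturbation is what keeps the correction dominated by the unperturbed density, so that a small enough `EPS` preserves non-negativity.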
We are left to argue that there must exist a polynomial with the required properties. By assumption there exists some other polynomial vanishing on . Consider an inclusion of rings
[TABLE]
where denotes the ring of polynomials with variables and coefficients in , and denotes an inclusion map. This morphism of rings is obviously finite. Consider ideal of and denote as usual by its contraction. We argue that . Assume otherwise, then we have and (since as is an integral domain). Take all minimal primes of , call these , then the radical of is the intersection of all prime ideals that contain it, i.e. . Then, denoting we get that in . By “prime-avoidance”, cf. [33, Prop. 1.11], we know implies that for some , hence is the zero ideal for some . This contradicts the “going-up theorem”, cf. [33, Corollary 5.9], so we must have , and hence we may take as an arbitrary non-zero element of . ∎
Appendix B Analysis of the Berry-Esseen constant
Proof of Lemma 13.
We begin by upper bounding the numerator in (143), i.e.
[TABLE]
The information density is given by
[TABLE]
where
[TABLE]
Define under the distribution . (298) reduces to
[TABLE]
where the scalar random variable
[TABLE]
is the sum of all the terms that do not depend on . Note that
[TABLE]
Therefore, the “centered” information density is
[TABLE]
where
[TABLE]
Hence we can upper bound the centered third moment as
[TABLE]
We now proceed to upper bound each term individually. First ,
[TABLE]
where
- (313) follows since is PSD, and is also PSD as a non-negative combination of PSD matrices, so that both and are non-negative.
- (314) follows since where
[TABLE]
and in the PSD ordering, so
[TABLE]
and
[TABLE]
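The positive semidefinite (PSD) facts invoked in the two items above are of the following general kind: a non-negative combination of PSD matrices is PSD, its quadratic forms are non-negative, and the trace of a product of two PSD matrices is non-negative. The sketch below checks these on random matrices. The matrices, weights, and names are illustrative and do not correspond to specific quantities in (313) or (314).

```python
import numpy as np

rng = np.random.default_rng(1)

def random_psd(n):
    """Random real PSD matrix of the form G G^T."""
    g = rng.standard_normal((n, n))
    return g @ g.T

a = random_psd(4)
b = random_psd(4)

# Non-negative combination of PSD matrices.
combo = 0.3 * a + 2.0 * b
min_eig_combo = np.linalg.eigvalsh(combo).min()

# Quadratic form against a PSD matrix.
v = rng.standard_normal(4)
quad_form = float(v @ combo @ v)

# Trace of a product of two PSD matrices.
trace_of_product = float(np.trace(a @ b))
print(min_eig_combo, quad_form, trace_of_product)
```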
Now we bound from (310),
[TABLE]
where
- In (321), define and expand the trace.
- (322) follows from the triangle inequality, along with .
- In (323) we have used for along with the bound
[TABLE]
Now notice that
[TABLE]
which can be viewed as the norm inequality for . Finally, we use for any orthogonal matrix .
For the denominator in (143), the expression for is given in (57)-(61). Note that the final term (61) is non-negative, so we have the lower bound
[TABLE]
where
[TABLE]
Hence whenever . Note that we use the assumption freely here, as stated before. Using the lower bound on the variance (327), we obtain the upper bound
[TABLE]
where all constants are non-negative. There are two cases based on which term achieves the max in the denominator. First, suppose
[TABLE]
Expanding the square yields
[TABLE]
Thus the terms in the numerator are bounded by
[TABLE]
where (333) uses the assumption . Applying this to in (330), we see that in this case,
[TABLE]
where the constants are non-negative.
Now take the case when
[TABLE]
Note that since , in the case we must also have for the above inequality to hold. Let be defined as follows
[TABLE]
Here since . Applying (336) yields
[TABLE]
With this, from (330) we obtain the following upper bound
[TABLE]
where (341) uses (339). Now, we can upper bound each term in (341) as
[TABLE]
where in (344) we have used (easily obtained from p-norm inequalities), and both (342) and (346) use the assumption . Using these bounds in (341), we obtain
[TABLE]
where are non-negative constants.
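The p-norm inequalities referred to above are, in their standard form, $\|v\|_q \le \|v\|_p \le n^{1/p - 1/q}\,\|v\|_q$ for $1 \le p \le q$ and $v \in \mathbb{R}^n$. Which instance is used in (344) is not spelled out here, so the following is only a numerical illustration of the general statement, with a hypothetical vector and exponents.

```python
import numpy as np

def p_norm(v, p):
    """The l_p norm of a vector for p >= 1."""
    return float(np.sum(np.abs(v) ** p) ** (1.0 / p))

rng = np.random.default_rng(2)
v = rng.standard_normal(6)      # hypothetical test vector
n = v.size
p, q = 2.0, 3.0                 # any exponents with 1 <= p <= q

# Monotonicity: the q-norm is at most the p-norm.
lower_ok = p_norm(v, q) <= p_norm(v, p) + 1e-12
# Reverse direction, at the cost of a dimension-dependent factor.
upper_ok = p_norm(v, p) <= n ** (1.0 / p - 1.0 / q) * p_norm(v, q) + 1e-12
print(lower_ok, upper_ok)
```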
From (335) and (347), we conclude that
[TABLE]
∎
References
- [1] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
- [2] R. L. Dobrushin, “Mathematical problems in the Shannon theory of optimal coding of information,” in Proc. 4th Berkeley Symp. Mathematics, Statistics, and Probability, vol. 1, Berkeley, CA, USA, 1961, pp. 211–252.
- [3] V. Strassen, “Asymptotische Abschätzungen in Shannon’s Informationstheorie,” in Trans. 3d Prague Conf. Inf. Theory, Prague, 1962, pp. 689–723.
- [4] Y. Polyanskiy and S. Verdú, “Finite blocklength methods in information theory (tutorial),” in 2013 IEEE Int. Symp. Inf. Theory (ISIT), Istanbul, Turkey, Jul. 2013. [Online]. Available: http://people.lids.mit.edu/yp/homepage/data/ISIT 13_tutorial.pdf
- [5] V. Y. Tan et al., “Asymptotic estimates in information theory with non-vanishing error probabilities,” Foundations and Trends® in Communications and Information Theory, vol. 11, no. 1-2, pp. 1–184, 2014.
- [6] E. Telatar, “Capacity of multi-antenna Gaussian channels,” European Trans. Telecom., vol. 10, no. 6, pp. 585–595, 1999.
- [7] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Wireless Personal Communications, vol. 6, no. 3, pp. 311–335, 1998.
- [8] S. M. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451–1458, 1998.
