A Tight Upper Bound on the Second-Order Coding Rate of the Parallel Gaussian Channel with Feedback
Silas L. Fong, Vincent Y. F. Tan

TL;DR
This paper provides a precise upper bound on the second-order coding rate for parallel Gaussian channels with feedback, demonstrating that feedback does not enhance the second-order asymptotics under certain constraints.
Contribution
It offers a self-contained proof of the second-order upper bound for the channel, using advanced probabilistic techniques to show feedback does not improve second-order performance.
Findings
Feedback does not improve the second-order asymptotics.
The proof employs an information spectrum bound and Curtiss' theorem.
The results match existing achievability bounds, confirming the second-order limit.
S. L. Fong and V. Y. F. Tan were supported by the NUS Young Investigator Award under Grant R-263-000-B37-133. S. L. Fong is with the Department of Electrical and Computer Engineering, NUS, Singapore 117583. V. Y. F. Tan is with the Department of Electrical and Computer Engineering, NUS, Singapore 117583, and also with the Department of Mathematics, NUS, Singapore 119076.
Abstract
This paper investigates the asymptotic expansion for the maximum rate of fixed-length codes over a parallel Gaussian channel with feedback under the following setting: A peak power constraint is imposed on every transmitted codeword, and the average error probabilities of decoding the transmitted message are non-vanishing as the blocklength increases. The main contribution of this paper is a self-contained proof of an upper bound on the first- and second-order asymptotics of the parallel Gaussian channel with feedback. The proof techniques involve developing an information spectrum bound followed by using Curtiss’ theorem to show that a sum of dependent random variables associated with the information spectrum bound converges in distribution to a sum of independent random variables, thus facilitating the use of the usual central limit theorem. Combined with existing achievability results, our result implies that the presence of feedback does not improve the first- and second-order asymptotics.
Index Terms:
Curtiss’ theorem, feedback, fixed-length codes, parallel Gaussian channel, second-order asymptotics
I Introduction
This paper considers a point-to-point communication scenario where a source wants to transmit a message to a destination through a set of independent additive white Gaussian noise (AWGN) channels. The set of independent AWGN channels is referred to as the parallel Gaussian channel [1, Sec. 9.4] (also called the Gaussian product channel in [2, Sec. 3.4.3]). The parallel Gaussian channel has been used to model the multiple-input multiple-output (MIMO) channel [3, Sec. 7.1], an essential channel model in wireless communications. Suppose the parallel Gaussian channel consists of $L$ independent AWGN channels, and let $\mathcal{L} \triangleq \{1, 2, \ldots, L\}$ be the index set of the channels. For the $k^{\text{th}}$ channel use, the relation between the input signal $X_{\ell,k}$ and the output signal $Y_{\ell,k}$ of the $\ell^{\text{th}}$ channel is
$$Y_{\ell,k} = X_{\ell,k} + Z_{\ell,k}, \qquad (1)$$
where $Z_{1,k}, Z_{2,k}, \ldots, Z_{L,k}$ are independent Gaussian noises. For each $\ell \in \mathcal{L}$, the variance of the noise induced by the $\ell^{\text{th}}$ channel is assumed to be some positive number $N_\ell$ for all channel uses, i.e., $\mathrm{Var}[Z_{\ell,k}] = N_\ell$ for all $k \in \mathbb{N}$. To keep notation compact, let $\mathbf{X}_k$, $\mathbf{Y}_k$ and $\mathbf{Z}_k$ denote the random column vectors $[X_{1,k}\ X_{2,k}\ \ldots\ X_{L,k}]^t$, $[Y_{1,k}\ Y_{2,k}\ \ldots\ Y_{L,k}]^t$ and $[Z_{1,k}\ Z_{2,k}\ \ldots\ Z_{L,k}]^t$ respectively. Then, the channel law (1) can be written as
$$\mathbf{Y}_k = \mathbf{X}_k + \mathbf{Z}_k. \qquad (2)$$
Throughout this paper, we consider fixed-length codes over the parallel Gaussian channel, where the blocklength is denoted by $n$ unless specified otherwise. Every codeword $\mathbf{X}^n$ transmitted by the source over $n$ channel uses is subject to the following peak power constraint, where $P > 0$ denotes the permissible power for $\mathbf{X}^n$:
$$\mathbb{P}\Big\{\sum_{\ell=1}^{L}\sum_{k=1}^{n} X_{\ell,k}^2 \le nP\Big\} = 1. \qquad (3)$$
If we would like to transmit a uniformly distributed message over this channel where the error probabilities are required to vanish as the blocklength approaches infinity, it was shown by Shannon [4] that the maximum rate of communication converges to a certain limit called capacity. The closed-form expression of the capacity can be obtained by finding the optimal power allocation among the channels, which is described as follows. Define the mapping $\mathrm{C}: \mathbb{R}_+^L \to \mathbb{R}_+$ as
$$\mathrm{C}(\mathbf{s}) \triangleq \sum_{\ell=1}^{L} \frac{1}{2}\log\Big(1 + \frac{s_\ell}{N_\ell}\Big), \qquad (4)$$
where $s_\ell$ can be viewed as the power allocated to channel $\ell$. If we let $P_1^*, P_2^*, \ldots, P_L^*$ denote the real numbers yielded from the water-filling algorithm [1, Sec. 9.4], where
$$\sum_{\ell=1}^{L} P_\ell^* = P \qquad (5)$$
and
$$P_\ell^* = \max\{0, \Lambda - N_\ell\} \qquad (6)$$
for each $\ell \in \mathcal{L}$, with $\Lambda$ denoting the water level, and let
$$\mathbf{P}^* \triangleq [P_1^*\ P_2^*\ \ldots\ P_L^*]^t \qquad (7)$$
be the optimal power allocation vector, then the capacity of the parallel Gaussian channel was shown in [4] to be $\mathrm{C}(\mathbf{P}^*)$ bits per channel use. More specifically, if $M^*(n, \varepsilon, P)$ denotes the maximum number of messages that can be transmitted over $n$ channel uses with permissible power $P$ and average error probability $\varepsilon$, one has
[TABLE]
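The water-filling step described above can be sketched numerically. This is a minimal illustration rather than the authors' procedure: the function names `water_filling` and `capacity_nats` are hypothetical, the water level is found by bisection, and natural logarithms are assumed so that the rate is reported in nats (change the logarithm base for other units).

```python
import math


def water_filling(noise_vars, total_power, iterations=200):
    """Return (water_level, allocation) for noise variances N_l and total power P.

    The allocation satisfies allocation[l] = max(0, water_level - N_l) and
    sum(allocation) is equal to total_power up to floating-point accuracy.
    """
    if total_power <= 0 or any(nv <= 0 for nv in noise_vars):
        raise ValueError("total_power and all noise variances must be positive")
    lo = min(noise_vars)                 # this level allocates no power at all
    hi = max(noise_vars) + total_power   # this level allocates at least total_power
    for _ in range(iterations):          # plain bisection on the water level
        mid = 0.5 * (lo + hi)
        used = sum(max(0.0, mid - nv) for nv in noise_vars)
        if used > total_power:
            hi = mid
        else:
            lo = mid
    level = 0.5 * (lo + hi)
    return level, [max(0.0, level - nv) for nv in noise_vars]


def capacity_nats(powers, noise_vars):
    """Sum over channels of (1/2) * ln(1 + power / noise variance)."""
    return sum(0.5 * math.log1p(p / nv) for p, nv in zip(powers, noise_vars))


if __name__ == "__main__":
    level, alloc = water_filling([1.0, 2.0, 5.0], total_power=3.0)
    print("water level:", level)
    print("allocation:", alloc)
    print("capacity (nats per channel use):", capacity_nats(alloc, [1.0, 2.0, 5.0]))
```

In this example the noisiest channel (variance 5) lies above the water level and receives no power, which is the usual behaviour of water-filling when the total power is small relative to the noise spread.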
The capacity result (8) has been strengthened by Polyanskiy-Poor-Verdú [5, Th. 78] and Tan-Tomamichel [6, Appendix A] for each $\varepsilon \in (0, 1)$ as
[TABLE]
where $\mathrm{V}$ is the Gaussian dispersion function defined as
[TABLE]
and $\Phi$ is the cumulative distribution function (cdf) of the standard normal distribution.
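To get a feel for the first- and second-order terms, the following sketch evaluates the normal approximation $n\mathrm{C} + \sqrt{n\mathrm{V}}\,\Phi^{-1}(\varepsilon)$ and ignores the remainder term. It rests on assumptions that are not taken from the text above: natural logarithms (nats), and a dispersion equal to the sum of the standard per-channel AWGN dispersions $s(s+2N)/(2(s+N)^2)$. The function names are hypothetical. The allocation $(2, 1, 0)$ used in the example is the water-filling solution for noise variances $(1, 2, 5)$ and total power $3$.

```python
import math
from statistics import NormalDist


def capacity_nats(powers, noise_vars):
    """Sum over channels of (1/2) * ln(1 + power / noise variance)."""
    return sum(0.5 * math.log1p(p / nv) for p, nv in zip(powers, noise_vars))


def dispersion_nats2(powers, noise_vars):
    """Assumed form: sum of per-channel AWGN dispersions, in nats squared."""
    return sum(p * (p + 2.0 * nv) / (2.0 * (p + nv) ** 2)
               for p, nv in zip(powers, noise_vars))


def normal_approx_log_m(n, eps, powers, noise_vars):
    """First- and second-order terms only: n*C + sqrt(n*V) * Phi^{-1}(eps)."""
    c = capacity_nats(powers, noise_vars)
    v = dispersion_nats2(powers, noise_vars)
    return n * c + math.sqrt(n * v) * NormalDist().inv_cdf(eps)


if __name__ == "__main__":
    noise = [1.0, 2.0, 5.0]
    alloc = [2.0, 1.0, 0.0]  # water-filling allocation for total power 3
    for n in (100, 1000, 10000):
        approx = normal_approx_log_m(n, 0.01, alloc, noise)
        print(n, approx / n, capacity_nats(alloc, noise))
```

For a small error probability the second-order term is negative, so the approximate rate approaches the capacity from below at speed $1/\sqrt{n}$.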
Feedback, which is the focus of the current paper, can simplify coding schemes and improve the performance of communication systems in many scenarios. See [2, Ch. 17] for a thorough discussion on the benefits of feedback in single- and multi-user information theory. When feedback is allowed, each input symbol depends on not only the transmitted message but also all the previous channel outputs up to the channel use, i.e., the symbols . In the presence of noiseless feedback, let denote the maximum number of messages that can be transmitted over channel uses with permissible power and average error probability . It was shown by Shannon [7] that the presence of noiseless feedback does not increase the capacity of point-to-point memoryless channels, which together with (8) implies that
[TABLE]
In view of (9), we conclude that
[TABLE]
The main contribution of this paper is a conceptually simple, concise and self-contained proof that, in the presence of feedback, the first- and second-order terms in the asymptotic expansion in (9) remain unchanged, i.e.,
[TABLE]
I-A Related Work
Our work is inspired by the recent study of the fundamental limits of communication over discrete memoryless channels (DMCs) with feedback [8]. It was shown by Altuğ and Wagner [8, Th. 1] that for some classes of DMCs whose capacity-achieving input distributions are not unique (in particular, the minimum and maximum conditional information variances differ), coding schemes with feedback achieve a better second-order asymptotics compared to those without feedback. They also showed [8, Th. 2] that feedback does not improve the second-order asymptotics of DMCs if the conditional variance of the log-likelihood ratio , where is the unique capacity-achieving output distribution, does not depend on the input . Such DMCs include the class of weakly-input symmetric DMCs initially studied by Polyanskiy-Poor-Verdú [9].
However, we note that the proof technique used by Altuğ and Wagner requires the use of a Berry-Esséen-type result for bounded martingale difference sequences [10], and their technique cannot be extended to the parallel Gaussian channel with feedback because each input symbol belongs to an interval that grows without bound as increases. Instead, our proof uses Curtiss’ theorem to show that a sum of dependent random variables that naturally appears in the non-asymptotic analysis converges in distribution to a sum of independent random variables, thus facilitating the use of the usual central limit theorem [11].
For $L = 1$, the parallel Gaussian channel with feedback reduces to the AWGN channel with feedback, whose second-order coding rate is identical to that of the same channel without feedback by the following symmetry argument: The log-likelihood ratios for all codewords on the power sphere with radius $\sqrt{nP}$ are the same. See [12] for a rigorous but simple proof. In contrast, for $L > 1$, this symmetry argument no longer holds due to the flexible power allocation among the channels, and hence the simple proof suggested in [12] cannot be extended to the parallel Gaussian channel with feedback.
If the peak power constraint in (3) is replaced with the expected power constraint , the first-order coding rate of the AWGN channel with feedback is improved from to [13, Sec. II] (the exact improvement holds for the non-feedback case as well [5, Sec. 4.3.3]) where denotes the tolerable error probability. For the general case , the proof in [13, Sec. II] can be easily extended to show that the first-order coding rate of the parallel Gaussian channel with feedback can be improved from to , and hence (13) no longer holds.
I-B Paper Outline
This paper is organized as follows. The next subsection summarizes the notation used in this paper. Section II provides the problem setup of the parallel Gaussian channel with feedback under the peak power constraint and presents our main theorem. Section III contains the preliminaries required for the proof of our main theorem. The preliminaries include the following: (i) Important properties of non-asymptotic binary hypothesis testing quantities; (ii) Modification of power allocation among the parallel channels; (iii) Curtiss’ theorem. Section IV presents the proof of our main theorem. Section V concludes this paper by explaining the novel ingredients in the proof of the main theorem and the major difficulty in strengthening the main theorem.
I-C Notation
The sets of natural numbers, non-negative integers, real numbers and non-negative real numbers are denoted by , , and respectively. An -dimensional column vector is denoted by where denote the element of . The Euclidean norm of a vector is denoted by . We will take all logarithms to base throughout this paper.
We use to represent the probability of an event , and we let be the indicator function of . Every random variable is denoted by a capital letter (e.g., ), and the realization and the alphabet of the random variable are denoted by the corresponding small letter (e.g., ) and calligraphic letter (e.g., ) respectively. We use to denote a random tuple , where all the elements have the same alphabet . We let be the probability distribution of a random variable . More specifically, is the Radon-Nikodym derivative of a measure with respect to the Lebesgue measure in an appropriate Euclidean space. We let denote the conditional probability distribution of given for any random variables and . We let denote the joint distribution of , i.e., for all and . For any random variable and any real-valued function whose domain includes , we let denote for any real constant where The expectation and the variance of are denoted as and respectively. For simplicity, we drop the subscript of a notation if there is no ambiguity. For any real-valued Gaussian random variable whose mean and variance are and respectively, we let
[TABLE]
be the corresponding probability density function.
II Parallel Gaussian Channel with Feedback
Let and denote the source and the destination respectively. Suppose node transmits a message to node over channel uses through the independent AWGN channels. Before any transmission begins, node chooses message destined for node where is uniformly distributed on the message alphabet
[TABLE]
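whose size is denoted by $M$. For the $k^{\text{th}}$ channel use, the source node transmits $\mathbf{X}_k$ and the corresponding channel output $\mathbf{Y}_k$ satisfies (2). We assume that a noiseless feedback link from the destination node to the source node exists so that the past channel outputs $\mathbf{Y}^{k-1}$ are available for encoding $\mathbf{X}_k$ for each $k \in \{1, 2, \ldots, n\}$. In addition, the codewords transmitted by the source node are subject to the peak power constraint (3). Upon receiving $\mathbf{Y}^n$, the destination node declares $\hat{W}$ to be the transmitted message.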
Definition 1
An -feedback code consists of the following:
1) A message set at node as defined in (15). Message is uniform on .
2) An encoding function
[TABLE]
for each and each , where is the encoding function at node for encoding such that
[TABLE]
and the peak power constraint (3) holds.
3) A decoding function
[TABLE]
where is the decoding function for at node such that
[TABLE]
Definition 2
Let and denote the random vectors and respectively, and let and be their realizations respectively. The parallel Gaussian channel with feedback is characterized by the conditional probability density distribution satisfying
[TABLE]
such that the following holds for any -feedback code: For each ,
[TABLE]
where
[TABLE]
for all .
For any -feedback code, let be the joint distribution induced by the code. We can use Definition 1, (17) and (18) to factorize as follows:
[TABLE]
Definition 3
For an -feedback code, we can calculate according to (19) the average probability of decoding error defined as $\mathbb{P}\{\hat{W} \neq W\}$. We call an -feedback code with average probability of decoding error no larger than an -feedback code.
Define
[TABLE]
Definition 4
Let . The -capacity of the parallel Gaussian channel with feedback, denoted by , is defined to be
[TABLE]
The capacity is defined to be
[TABLE]
Definition 5
Let . The -second-order coding rate of the parallel Gaussian channel with feedback, denoted by , is defined to be
[TABLE]
Recall how and are determined through (4), (5), (6), (7) and (10). Since the capacity of the parallel Gaussian channel without feedback is (see, e.g., [4] and [2, Sec. 3.4.3]) and an introduction of an extra noiseless feedback link does not increase the capacity (see, e.g., [7] and [1, Sec. 9.6]), it follows that
[TABLE]
Before stating our main result, recall that $\Phi$ is the cdf of the standard normal distribution. Since $\Phi$ is strictly increasing on $\mathbb{R}$, the inverse of $\Phi$ is well-defined and is denoted by $\Phi^{-1}$. The following theorem is the main result in this paper.
Theorem 1
Fix an . Then,
[TABLE]
and the -second-order coding rate satisfies
[TABLE]
Combining (9) and Theorem 1, we complete the characterizations of the first- and second-order asymptotics of the parallel Gaussian channel with feedback as shown in (13).
III Preliminaries for the Proof of Theorem 1
III-A Binary Hypothesis Testing
The following definition concerning the non-asymptotic fundamental limits of a simple binary hypothesis test is standard. See for example [5, Section 2.3].
Definition 6
Let and be two probability distributions on some common alphabet . Let
[TABLE]
be the set of randomized binary hypothesis tests between and where indicates the test chooses , and let be a real number. The minimum type-II error in a simple binary hypothesis test between and with type-I error less than is defined as
[TABLE]
The existence of a minimizing test is guaranteed by the Neyman-Pearson lemma.
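For finite alphabets, the minimum type-II error of Definition 6 can be computed directly from the Neyman-Pearson construction. The sketch below is illustrative only: the function name `min_type2_error` and its argument `max_type1` (the allowed type-I error) are hypothetical, and the subscript convention used in the definition is deliberately avoided by passing the type-I budget explicitly.

```python
import math


def min_type2_error(p, q, max_type1):
    """Minimum Q-probability of accepting P over randomized tests whose
    P-probability of rejecting P is at most max_type1 (finite alphabet).

    p and q are lists of probabilities over the same outcomes.
    """
    if not 0.0 <= max_type1 <= 1.0:
        raise ValueError("max_type1 must lie in [0, 1]")
    need = 1.0 - max_type1  # P-mass that the test must accept

    def ratio(i):
        # Likelihood ratio p/q, treated as infinite when q assigns zero mass.
        return p[i] / q[i] if q[i] > 0 else math.inf

    beta = 0.0
    # Accept outcomes in decreasing order of likelihood ratio, randomizing
    # on the boundary outcome so that the type-I budget is met exactly.
    for i in sorted(range(len(p)), key=ratio, reverse=True):
        if need <= 0.0 or p[i] <= 0.0:
            continue
        frac = min(1.0, need / p[i])
        beta += frac * q[i]
        need -= frac * p[i]
    return beta


if __name__ == "__main__":
    print(min_type2_error([0.5, 0.5], [0.9, 0.1], max_type1=0.25))
```

When the two distributions coincide, no test can do better than guessing, and the function returns one minus the type-I budget.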
We state in the following lemma and proposition some important properties of , which are crucial for the proof of Theorem 1. The proof of the following lemma can be found in, for example, [14, Lemma 1].
Lemma 1
Let and be two probability distributions on some , and let be a function whose domain contains . Then, the following two statements hold:
1) (Data processing inequality (DPI)) .
2) For all , .
The proof of the following proposition can be found in [14, Lemma 3] (see also [15, Th. 27]).
Proposition 2
Let be a probability distribution defined on for some finite alphabet , and let be the marginal distribution of . In addition, let be a distribution defined on . Suppose is the uniform distribution, and let
[TABLE]
be a real number in where is distributed according to . Then,
[TABLE]
III-B Modification of Power Allocation among the Parallel Channels
For each transmitted codeword , we can view as the power allocated to the channel for each . In the proof of Theorem 1, an early step is to discretize the power allocated to the channels. To this end, we need the following definition which defines the power allocation vector of a sequence .
Definition 7
The power allocation mapping is defined as
[TABLE]
We call the power type of .
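As a concrete illustration of the power type, the sketch below computes the per-channel average power of a codeword stored as a list of rows, one row per channel. The normalization by the blocklength $n$ and the function names are assumptions made for this example; the defining equation of Definition 7 is not reproduced here.

```python
def power_type(codeword):
    """Per-channel average power of a codeword.

    codeword[l][k] is the symbol sent over channel l at channel use k.
    Assumption: the power type is normalized by the blocklength n.
    """
    n = len(codeword[0])
    if n == 0 or any(len(row) != n for row in codeword):
        raise ValueError("all channels must carry the same positive number of symbols")
    return [sum(x * x for x in row) / n for row in codeword]


def satisfies_peak_power(codeword, total_power, tol=1e-9):
    """Check that the total energy is at most n times total_power."""
    return sum(power_type(codeword)) <= total_power + tol


if __name__ == "__main__":
    x = [[1.0, -1.0, 1.0, -1.0],   # channel 1
         [2.0, 0.0, 0.0, 0.0]]     # channel 2
    print(power_type(x), satisfies_peak_power(x, total_power=2.0))
```

Two codewords with very different symbol patterns can share the same power type, which is why power types are coarser than the empirical distributions used for discrete channels.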
The proof of Theorem 1 involves modifying a given length-$n$ code so that useful bounds on the performance of the given code can be obtained by analyzing the modified code. More specifically, the encoding functions of the given code are modified so that the power type of the random codeword generated by the modified code always falls into some small bounding box. The specific modification of the encoding functions is described in the following definition.
Definition 8
Given an -feedback code, let , and be the corresponding message alphabet, encoding functions and decoding function respectively. In addition, let and such that . Then, the -modified code based on the -feedback code consists of the following message alphabet, encoding functions and decoding function which are denoted by , and respectively:
1) A message set at node . Message is uniform on .
2) An encoding function
[TABLE]
for each and each , which is defined as follows. For each and each , define in a recursive manner in this order as follows: For each , define recursively for as
[TABLE]
It follows from (27) that
[TABLE]
and
[TABLE]
In addition, in view of (28), we define recursively for as follows:
[TABLE]
Combining (27) and (30), we conclude that
[TABLE]
On the other hand, it follows from (29), (30), the fact and the assumption that
[TABLE]
Finally, in view of (32), we define as
[TABLE]
Combining (31), (33) and the assumption that , we have
[TABLE]
3) A decoding function
[TABLE]
for providing an estimate of at node .
Remark 1
The basic idea behind transforming a code in Definition 8 is simple. Suppose we are given an -feedback code, a and an such that . Then, the -modified code is formed by
(i) truncating a transmitted codeword if the power transmitted over the channel exceeds , which can be seen from (27) and the third clause of (30);
(ii) boosting the power of the transmitted codeword if the power transmitted over the channel falls below , which can be seen from the second clause of (30);
(iii) adjusting the last symbol transmitted over the channel (i.e., ) so that the total transmitted power is exactly equal to , which can be seen from the second clause of (33).
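The flavour of these three operations can be conveyed by a deliberately simplified sketch for a single channel. It is not the recursion of Definition 8: the bounding-box bookkeeping is omitted, the energy target `energy` and the function name `fit_to_energy` are hypothetical, and truncation is modelled by sending a zero symbol. The rule is causal, so it could be applied symbol by symbol alongside a feedback encoder.

```python
import math


def fit_to_energy(symbols, energy):
    """Causally reshape one channel's symbols so that their total energy
    (sum of squares) equals `energy` exactly.

    A symbol is kept if it fits in the remaining energy budget and is
    replaced by zero otherwise; the last symbol is overwritten so that it
    uses up whatever budget is left.
    """
    if not symbols:
        raise ValueError("need at least one symbol")
    if energy < 0:
        raise ValueError("energy must be non-negative")
    out = []
    used = 0.0
    for x in symbols[:-1]:
        if used + x * x <= energy:
            out.append(x)          # within budget: transmit unchanged
            used += x * x
        else:
            out.append(0.0)        # would exceed the budget: truncate
    out.append(math.sqrt(max(0.0, energy - used)))  # top up with the last symbol
    return out


if __name__ == "__main__":
    print(fit_to_energy([1.0, 2.0, 0.5, 0.0], energy=4.0))
```

A codeword that already stays within the budget is changed only in its last symbol, which loosely mirrors the idea that the modified code behaves like the original code whenever the original power type lies in the target region.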
Given an -feedback code, we consider the corresponding -modified code constructed in Definition 8 and let be the distribution induced by the modified code. By (34), we see that
[TABLE]
Define the -bounding box
[TABLE]
for each and each . It then follows from (35) that
[TABLE]
The following lemma is a natural consequence of Definition 8, and the proof is deferred to Appendix A.
Lemma 3
Given an -feedback code, let be the distribution induced by the code. Fix any and any such that , and let be the distribution induced by the -modified code based on the -feedback code. Then, we have
[TABLE]
for all Borel measurable .
III-C Curtiss’ Theorem
Curtiss’ theorem [16, Th. 3] states that convergence of moment generating functions leads to convergence in distribution. The formal statement is reproduced below.
Theorem 2 (Curtiss’ theorem)
Let be a sequence of real-valued random variables. Suppose there exists a random variable such that
[TABLE]
for all . Then,
[TABLE]
for every at which is continuous.
In contrast to the more well-known Lévy’s continuity theorem [17, Sec. 18.1], (39) of Theorem 2 is required to be true for all real rather than purely imaginary .
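A small numerical example may help to see what convergence of moment generating functions looks like. The example is chosen for convenience and is unrelated to the random variables in the proof: for i.i.d. symmetric $\pm 1$ variables $B_1, \ldots, B_n$, the moment generating function of $(B_1 + \cdots + B_n)/\sqrt{n}$ at $t$ is $\cosh(t/\sqrt{n})^n$, which tends to $e^{t^2/2}$, the moment generating function of the standard normal distribution. The function names are hypothetical.

```python
import math


def mgf_scaled_sign_sum(t, n):
    """MGF at t of (B_1 + ... + B_n) / sqrt(n), with B_i i.i.d. uniform on {-1, +1}."""
    return math.cosh(t / math.sqrt(n)) ** n


def mgf_standard_normal(t):
    """MGF at t of the standard normal distribution."""
    return math.exp(t * t / 2.0)


def max_gap(n, points=(-2.0, -1.0, 0.5, 1.0, 2.0)):
    """Largest absolute MGF difference over a few real arguments."""
    return max(abs(mgf_scaled_sign_sum(t, n) - mgf_standard_normal(t)) for t in points)


if __name__ == "__main__":
    for n in (1, 10, 100, 10000):
        print(n, max_gap(n))
```

By Curtiss’ theorem, this pointwise convergence on the real line is enough to conclude that the scaled sum converges in distribution to a standard normal random variable.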
IV Proof of Theorem 1
Fix an and choose an arbitrary sequence of -feedback codes. Since
[TABLE]
by (20), it suffices to show that
[TABLE]
for all . To this end, fix an arbitrary .
IV-A Discretizing the Power Allocation Vectors by Appending Symbols
Using Definition 1, we have
[TABLE]
for the chosen -feedback code for each . Given the chosen -feedback code, we can always construct an -feedback code by appending a carefully chosen tuple to each transmitted codeword generated by the -feedback code such that
[TABLE]
which implies that
[TABLE]
In addition, given the -feedback code, we can always construct an -feedback code by appending a carefully chosen to each transmitted codeword generated by the -feedback code such that
[TABLE]
To simplify notation, we let
[TABLE]
Construct the set of power allocation vectors
[TABLE]
which can be viewed as a set of quantized power allocation vectors with quantization level that satisfy the equality power constraint
[TABLE]
It follows from (47), (45) and Definition 7 that
[TABLE]
and
[TABLE]
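The key property of a quantized set of power allocation vectors is that its size grows only polynomially with the number of quantization cells. The sketch below uses a simple grid as a stand-in and does not reproduce the exact set in (47): each allocation is a tuple of non-negative integers summing to `total_units`, to be read as multiples of a hypothetical grid step. The function names are assumptions made for this illustration.

```python
from math import comb


def quantized_allocations(num_channels, total_units):
    """Yield all tuples of non-negative integers of length num_channels
    that sum to total_units (each unit stands for one grid step of power)."""
    if num_channels < 1 or total_units < 0:
        raise ValueError("need num_channels >= 1 and total_units >= 0")
    if num_channels == 1:
        yield (total_units,)
        return
    for first in range(total_units + 1):
        for rest in quantized_allocations(num_channels - 1, total_units - first):
            yield (first,) + rest


def count_allocations(num_channels, total_units):
    """Stars-and-bars count of the tuples produced above."""
    return comb(total_units + num_channels - 1, num_channels - 1)


if __name__ == "__main__":
    L, m = 3, 4
    allocations = list(quantized_allocations(L, m))
    print(len(allocations), count_allocations(L, m), (m + 1) ** (L - 1))
```

The count is at most $(m+1)^{L-1}$ for $m$ grid units and $L$ channels, so a grid whose resolution scales polynomially in $n$ yields polynomially many quantized vectors.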
IV-B Obtaining a Lower Bound on the Error Probability in Terms of the Type-II Error of a Hypothesis Test
Let be the probability distribution induced by the -feedback code constructed above for each , where is obtained according to (19). Fix an and the corresponding -feedback code. Recall the definition of for each in (6) and define the distribution
[TABLE]
where
[TABLE]
(Footnote 1: We note that even if we exclude the set of power types in the set which is defined later in (58), this leads to another valid definition of .)
The choice of in (51) is motivated by the choice of the auxiliary output distribution in [18, Sec. X-A] where DMCs are considered. Then, it follows from Proposition 2 and Definition 1 with the identifications , , , , and that
[TABLE]
IV-C Obtaining a Non-Asymptotic Bound from Simplifying the Type-II Error of the Binary Hypothesis Test
Using the DPI of by introducing and , we have
[TABLE]
where
[TABLE]
by (19). Combining (53), (54) and (50), we have
[TABLE]
Fix any constant to be specified later. Using Lemma 1, (55) and (18), we have
[TABLE]
which together with (52) implies that
[TABLE]
IV-D Splitting the Probability Term into Multiple Terms Corresponding to Different Power Types of
Define
[TABLE]
to be the set of power allocation vectors in that are close to the optimal power allocation vector (cf. (7)). (Footnote 2: The conclusion of this proof remains unchanged if the term in (58) is replaced by for any .) Following (57), we use (49) to obtain
[TABLE]
In order to bound the first term in (59), we let
[TABLE]
and define to be the distribution induced by the -modified code based on the -feedback code defined in Definition 8. Then, consider the following chain of inequalities:
[TABLE]
where
- (61) is due to Lemma 3 and the fact that (cf. (47) and (49)).
- (62) is due to the definition of in (51).
Similarly, in order to bound the second term in (59), we let be the distribution induced by the -modified code and consider the following chain of inequalities for each :
[TABLE]
where
- (64) is due to the definition of in (51).
Combining (59), (62), (64) and the definition of in (16) followed by letting
[TABLE]
for each and each , we obtain
[TABLE]
where is as defined in (4). In order to simplify the RHS of (66), we define such that
[TABLE]
In addition, for each , let
[TABLE]
for each . By using (66), (67) and (68) together with the facts by (37) that
[TABLE]
and
[TABLE]
for each , we can express (66) as
[TABLE]
IV-E Applying Curtiss’ Theorem When is Close to
In order to simplify the first term in (71), we define
[TABLE]
for each and want to show that
[TABLE]
for all where
[TABLE]
To this end, recall the following statements due to the channel law:
(i) for all and all ;
(ii) are independent;
(iii) and are independent for all .
For any and any such that , since
[TABLE]
by (69) and for all , we have
[TABLE]
In order to simplify the above chain of inequalities, we need the following lemma, whose proof is deferred to Appendix B because it involves straightforward calculations based on (68), (72) and the channel law.
Lemma 4
For any , we have
[TABLE]
Lemma 4, which forms the crux of the proof of Theorem 1, is important because it establishes the equivalence in distribution between the sum , which contains dependent random variables, and the sum , which contains independent random variables. The former is intractable to analyze while the latter can be analyzed in a straightforward manner by invoking the central limit theorem.
Using Lemma 4, we can simplify (77) through the identification and obtain
[TABLE]
Combining (80) and (60), we conclude that (73) holds for each . Since the moment generating functions of and converge to the same function, it follows from Curtiss’ theorem [16, Th. 3] (as stated in Theorem 2) that
[TABLE]
Recognizing that $\{V_k^{(\mathbf{P}^*)}\}_{k=1}^{\infty}$ are independent zero-mean Gaussian random variables with variance by the definition of in (72) and the definition of in (10), we apply the central limit theorem [11] and obtain
[TABLE]
which together with (81) implies that
[TABLE]
IV-F Applying Large Deviation Bounds When is Far from
In order to bound the second term in (71), we consider a fixed and want to show that there exists some such that
[TABLE]
for all . To this end, we first define the Lagrangian function as
[TABLE]
where is the unique number that satisfies (5) and (6) and is defined for each as
[TABLE]
Define . Then for all , we use Taylor’s theorem to obtain
[TABLE]
for some that lies on the line that connects and , where denotes the gradient which satisfies
[TABLE]
and denotes the Hessian matrix that satisfies
[TABLE]
For the sake of completeness, the derivations of (88) and (89) are contained in Appendix C. Combining (87), (88) and (89), we have for all
[TABLE]
which together with the definitions of and in (85) and (86) respectively implies that
[TABLE]
Consequently, (84) holds by setting
[TABLE]
Following (71), we consider for each
[TABLE]
where
- (95) follows from the definition of in (58).
Following the standard approach for obtaining large deviation bounds, we apply Markov’s inequality on the RHS of (95) and obtain for each
[TABLE]
In order to bound the RHS of (96), consider the following chain of inequalities for each :
[TABLE]
where
- •
(97) follows from straightforward calculations based on the definition of in (68), the property of in (70) and the channel law, which are elaborated in Appendix D for the sake of completeness.
- •
(99) is due to the fact that for all .
Combining (96) and (101), we have the following large deviation bound for each :
[TABLE]
Following (71), we use (102) and (48) to obtain
[TABLE]
IV-G Combining Earlier Bounds to Obtain an Upper Bound on
Combining (57), (67), (71), (83), (103) and (48), we have
[TABLE]
for all sufficiently large , which together with (46) implies that
[TABLE]
Since is arbitrary, it follows from (105) and Definition 5 that
[TABLE]
V Concluding Remarks
V-A Novel Ingredients in Proof of Theorem 1
As mentioned in Section I-A, the proof of [8, Th. 2] which obtains upper bounds on the second-order asymptotics of DMCs with feedback cannot be generalized to the parallel Gaussian channel with feedback. Indeed, the proof of Theorem 1 follows the standard procedures for obtaining the second-order asymptotics of DMCs without feedback (see, e.g., [5, proof of Th. 50] and [19, Sec. III]) except the following three novel ingredients:
1) Instead of classifying transmitted codewords into polynomially many type classes based on their empirical distributions, which is generally not possible for channels with continuous input alphabets, we discretize the transmitted power and classify the codewords into polynomially many type classes based on their discretized power types. In particular, the collection of power type classes in (47) plays a key role in our analysis, and there are polynomially many power type classes by (48). The details can be found in Section IV-A.
2) Curtiss’ theorem rather than the Berry-Esséen theorem is invoked to bound the information spectrum term (the first term in (71)) related to transmitted codewords whose types are close to the optimal power allocation. In particular, the Berry-Esséen theorem for bounded martingale difference sequences cannot be used to bound the information spectrum term in the presence of feedback because each input symbol belongs to an interval that grows without bound as $n$ increases. Instead, we apply Curtiss’ theorem to show that the distribution of the sum of random variables in the information spectrum term converges to some distribution generated from a sum of i.i.d. random variables (i.e., (73)), thus facilitating the use of the usual central limit theorem [11]. The details can be found in Section IV-E.
3) In order to bound the information spectrum term related to transmitted codewords whose types are far from the optimal power allocation (the second term in (71)), the usual approach is to bound the information spectrum term by an average of exponentially many upper bounds, where each upper bound is then further simplified by invoking Chebyshev’s inequality [18, Sec. X-A]. However, due to the presence of feedback, the information spectrum term can be expressed only as a sum (instead of an average) of polynomially many upper bounds, as shown in the second term in (71). In order to control the sum of polynomially many upper bounds, we have to resort to large deviation bounds as shown in (102) rather than the weaker Chebyshev’s inequality. The details can be found in Section IV-F.
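The third point can be made concrete with a toy calculation. The setting is a stand-in chosen for simplicity and is not the information spectrum term of the proof: let $S_n$ be a sum of $n$ i.i.d. standard normal variables and consider the event $S_n \ge \delta n$. Chebyshev’s inequality gives a bound of order $1/n$, whereas applying Markov’s inequality to $e^{tS_n}$ and optimizing over $t$ gives $e^{-n\delta^2/2}$. Only the latter survives multiplication by a polynomial number of terms. The function names and the polynomial degree are assumptions made for this example.

```python
import math


def chebyshev_tail(n, delta):
    """Chebyshev bound on P(S_n >= delta * n) for a sum of n i.i.d. N(0, 1) terms."""
    return min(1.0, 1.0 / (n * delta ** 2))


def chernoff_tail(n, delta):
    """Large deviation (Chernoff) bound on the same event: exp(-n * delta^2 / 2)."""
    return math.exp(-n * delta ** 2 / 2.0)


def summed_bound(n, degree, tail):
    """Sum of n**degree identical tail bounds (a sum, not an average)."""
    return (n ** degree) * tail


if __name__ == "__main__":
    delta, degree = 0.1, 2
    for n in (10 ** 3, 10 ** 4, 10 ** 5):
        print(n,
              summed_bound(n, degree, chebyshev_tail(n, delta)),
              summed_bound(n, degree, chernoff_tail(n, delta)))
```

With these numbers the summed Chebyshev bound grows with $n$ while the summed exponential bound vanishes, which is the reason a large deviation bound is needed once the average is replaced by a sum.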
V-B Major Difficulties in Tightening the Third-Order Term
If the feedback link is absent, the third-order term of the optimal finite blocklength rate is $\Theta\big(\frac{\log n}{n}\big)$ as shown in (9) in Section I. The third-order term can be obtained by applying the Berry-Esséen theorem to bound an information spectrum term (analogous to the first term in (71)).
In the presence of feedback, Theorem 1 shows that the third-order term is $o\big(\frac{1}{\sqrt{n}}\big)$. If we want to compute an explicit upper bound on the third-order term using the current proof technique, an intuitive way is to invoke a non-asymptotic version of Curtiss’ theorem that can measure the proximity between two distributions based on the proximity between their moment generating functions. However, to the best of our knowledge, such a non-asymptotic version of Curtiss’ theorem does not exist, which prevents us from strengthening the current $o\big(\frac{1}{\sqrt{n}}\big)$ bound on the third-order term. It is worth noting that (77) and (80) in our proof break down if the moment generating functions are replaced with characteristic functions. If one can find a way to make characteristic functions amenable to our proof approach, then a non-asymptotic version of Lévy’s continuity theorem known as Esséen’s smoothing lemma (see, e.g., [20, Th. 1.5.2]) may be invoked to tighten the third-order term herein.
V-C Future Work
The techniques presented herein may be used to analyze the fixed-error asymptotics of fixed-length codes over parallel DMCs with cost constraint and with feedback. We envision that there will be an added layer of complexity as the method of types [21] is typically used to analyze the fixed-error asymptotics of DMCs with and without cost constraint [22]. Hence, we anticipate that two applications of the method of types will be needed: one for handling power types that specify the power allocation among the parallel channels (as was done in Section IV-A) and another for handling codewords of the same power type but of different empirical distributions (the usual types). In the present work, the latter situation is ameliorated by the fact that the maximum rate achievable by a coding scheme over an AWGN channel is completely determined by the power allocated for the coding scheme.
Appendix A Proof of Lemma 3
Proof:
Let and be the encoding functions of the -feedback code and the -modified code respectively for each and each . For any and any such that
[TABLE]
and
[TABLE]
it follows from (27), (30) and (33) in Definition 8 that
[TABLE]
Since (109) holds for any and that satisfy (107) and (108), it follows that (38) holds for all Borel measurable . ∎
Appendix B Proof of Lemma 4
The proof for the AWGN channel case (i.e., ) is contained in [12, Sec. E]. For a general , consider the following chain of equalities for each :
[TABLE]
where
- (111) is due to the fact that and are independent.
- (112) is due to the definition of in (68).
Applying (112) recursively from to , we have
[TABLE]
On the other hand, straightforward calculations based on the definition of in (72) and the fact that are independent imply that
[TABLE]
Combining (113) and (114), we obtain (78).
Appendix C Derivations of (88) and (89)
Straightforward calculations based on (85) reveal that for all , we obtain that
[TABLE]
and is a diagonal matrix that satisfies
[TABLE]
Combining (115), (5), (6) and (86), we have . In addition, for any such that , it follows from (116) that for all , which then implies that (89) holds for all .
Appendix D Derivation of (97)
Let . Fix any . Due to (70), it suffices to show that
[TABLE]
Replacing with in the steps leading to (112) and (113), we obtain (117).
Acknowledgments
The authors would like to thank the Associate Editor Prof. Shun Watanabe and the two anonymous reviewers for the useful comments that improved the presentation of this paper.
References
- [1] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: John Wiley and Sons, 2006.
- [2] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge, U.K.: Cambridge University Press, 2012.
- [3] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge, U.K.: Cambridge University Press, 2005.
- [4] C. E. Shannon, “Communication in the presence of noise,” Proceedings of IRE, vol. 37, no. 1, pp. 10–21, 1949.
- [5] Y. Polyanskiy, “Channel coding: Non-asymptotic fundamental limits,” Ph.D. dissertation, Princeton University, 2010.
- [6] V. Y. F. Tan and M. Tomamichel, “The third-order term in the normal approximation for the AWGN channel,” IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2430–2438, 2015.
- [7] C. E. Shannon, “The zero error capacity of a noisy channel,” IRE Transactions on Information Theory, vol. 2, no. 3, pp. 8–19, 1956.
- [8] Y. Altuğ and A. B. Wagner, “Feedback can improve the second-order coding performance in discrete memoryless channels,” in Proc. IEEE Intl. Symp. Inf. Theory, Honolulu, HI, Jul 2014, pp. 2361–2365.
