A Tight Upper Bound on the Second-Order Coding Rate of the Parallel Gaussian Channel with Feedback

Silas L. Fong; Vincent Y. F. Tan

arXiv:1703.05932·cs.IT·July 11, 2017

TL;DR

This paper proves a tight upper bound on the second-order coding rate of the parallel Gaussian channel with feedback, showing that feedback does not improve the second-order asymptotics under a peak power constraint and a non-vanishing average error probability.

Contribution

It offers a self-contained proof of the second-order upper bound for the channel, using an information spectrum bound and Curtiss' theorem to show that feedback does not improve second-order performance.

Findings

1) Feedback does not improve the second-order asymptotics.

2) The proof employs an information spectrum bound and Curtiss' theorem.

3) The results match existing achievability bounds, confirming the second-order limit.

Abstract

This paper investigates the asymptotic expansion for the maximum rate of fixed-length codes over a parallel Gaussian channel with feedback under the following setting: A peak power constraint is imposed on every transmitted codeword, and the average error probability of decoding the transmitted message is non-vanishing as the blocklength increases. It is well known that the presence of feedback does not increase the first-order asymptotics of the channel, i.e., capacity, in the asymptotic expansion, and the closed-form expression of the capacity can be obtained by the well-known water-filling algorithm. The main contribution of this paper is a self-contained proof of an upper bound on the second-order asymptotics of the parallel Gaussian channel with feedback. The proof techniques involve developing an information spectrum bound followed by using Curtiss' theorem to show that a sum of…

Equations

$$Y_{\ell,k}=X_{\ell,k}+Z_{\ell,k}$$

$$\boldsymbol{Y}_{k}=\boldsymbol{X}_{k}+\boldsymbol{Z}_{k}.$$

$${\mathbb{P}}\left\{\frac{1}{n}\sum_{\ell=1}^{L}\sum_{k=1}^{n}X_{\ell,k}^{2}\leq P\right\}=1.$$

$$\mathrm{C}(\mathbf{s})=\sum_{\ell=1}^{L}\frac{1}{2}\log\left(1+\frac{s_{\ell}}{N_{\ell}}\right)$$

$$\sum_{\ell=1}^{L}P_{\ell}=P$$

$$P_{\ell}=\max\{0,\Lambda-N_{\ell}\}$$

$$\mathbf{P}^{*}\stackrel{\text{def}}{=}[P_{1}\ P_{2}\ \ldots\ P_{L}]^{t}$$

$$\lim_{\varepsilon\to 0}\liminf_{n\to\infty}\frac{1}{n}\log M^{*}(n,\varepsilon,P)=\mathrm{C}(\mathbf{P}^{*}).$$

$$\frac{1}{n}\log M^{*}(n,\varepsilon,P)=\mathrm{C}(\mathbf{P}^{*})+\sqrt{\frac{\mathrm{V}(\mathbf{P}^{*})}{n}}\,\Phi^{-1}(\varepsilon)+\Theta\Big(\frac{\log n}{n}\Big),$$

$$\mathrm{V}(\mathbf{s})=\sum_{\ell=1}^{L}\frac{\frac{s_{\ell}}{N_{\ell}}\left(\frac{s_{\ell}}{N_{\ell}}+2\right)}{2\left(\frac{s_{\ell}}{N_{\ell}}+1\right)^{2}}$$

$$\lim_{\varepsilon\to 0}\liminf_{n\to\infty}\frac{1}{n}\log M_{\text{fb}}^{*}(n,\varepsilon,P)=\mathrm{C}(\mathbf{P}^{*}).$$

$$\frac{1}{n}\log M_{\text{fb}}^{*}(n,\varepsilon,P)\geq\mathrm{C}(\mathbf{P}^{*})+\sqrt{\frac{\mathrm{V}(\mathbf{P}^{*})}{n}}\,\Phi^{-1}(\varepsilon)+\Theta\Big(\frac{\log n}{n}\Big).$$

$$\frac{1}{n}\log M_{\text{fb}}^{*}(n,\varepsilon,P)=\mathrm{C}(\mathbf{P}^{*})+\sqrt{\frac{\mathrm{V}(\mathbf{P}^{*})}{n}}\,\Phi^{-1}(\varepsilon)+o\Big(\frac{1}{\sqrt{n}}\Big).$$

$$\mathcal{N}(z;\mu,\sigma^{2})\stackrel{\text{def}}{=}\frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{(z-\mu)^{2}}{2\sigma^{2}}}$$

$$\mathcal{W}\stackrel{\text{def}}{=}\{1,2,\ldots,M\}$$

$$f_{\ell,k}:\mathcal{W}\times\mathbb{R}^{L\times(k-1)}\rightarrow\mathbb{R}$$

$$X_{\ell,k}=f_{\ell,k}(W,\boldsymbol{Y}^{k-1})$$

$$\varphi:\mathbb{R}^{L\times n}\rightarrow\mathcal{W},$$

$$\hat{W}=\varphi(\boldsymbol{Y}^{n}).$$

$$q_{\boldsymbol{Y}|\boldsymbol{X}}(\mathbf{y}|\mathbf{x})=\prod_{\ell=1}^{L}\mathcal{N}(y_{\ell};x_{\ell},N_{\ell})$$

$$p_{W,\boldsymbol{X}^{k},\boldsymbol{Y}^{k}}=p_{W,\boldsymbol{X}^{k},\boldsymbol{Y}^{k-1}}\,p_{\boldsymbol{Y}_{k}|\boldsymbol{X}_{k}}$$

$$p_{\boldsymbol{Y}_{k}|\boldsymbol{X}_{k}}(\mathbf{y}_{k}|\mathbf{x}_{k})=q_{\boldsymbol{Y}|\boldsymbol{X}}(\mathbf{y}_{k}|\mathbf{x}_{k})$$

$$p_{W,\boldsymbol{X}^{n},\boldsymbol{Y}^{n},\hat{W}}$$

$$M_{\text{fb}}^{*}(n,\varepsilon,P)\stackrel{\text{def}}{=}\max\left\{M\in\mathbb{N}\,\middle|\,\text{there exists an }(n,M,P,\varepsilon)\text{-feedback code}\right\}.$$

$$C_{\varepsilon}^{\text{fb}}\stackrel{\text{def}}{=}\liminf_{n\to\infty}\frac{1}{n}\log M_{\text{fb}}^{*}(n,\varepsilon,P).$$

$$C_{0}^{\text{fb}}\stackrel{\text{def}}{=}\inf_{\varepsilon>0}C_{\varepsilon}^{\text{fb}}.$$

$$\mathrm{L}_{\varepsilon}^{\text{fb}}\stackrel{\text{def}}{=}\liminf_{n\to\infty}\frac{1}{\sqrt{n}}\left(\log M_{\text{fb}}^{*}(n,\varepsilon,P)-nC_{\varepsilon}^{\text{fb}}\right).$$

$$C_{0}^{\text{fb}}=\mathrm{C}(\mathbf{P}^{*}).$$

$$C_{\varepsilon}^{\text{fb}}=\mathrm{C}(\mathbf{P}^{*})$$

$$\mathrm{L}_{\varepsilon}^{\text{fb}}\leq\sqrt{\mathrm{V}(\mathbf{P}^{*})}\,\Phi^{-1}(\varepsilon).$$

Taxonomy

Topics: Wireless Communication Security Techniques · Cooperative Communication and Network Coding · DNA and Biological Computing

Full text

A Tight Upper Bound on the Second-Order Coding Rate of the Parallel Gaussian Channel with Feedback

Silas L. Fong and Vincent Y. F. Tan

S. L. Fong and V. Y. F. Tan were supported by NUS Young Investigator Award under Grant R-263-000-B37-133. S. L. Fong is with the Department of Electrical and Computer Engineering, NUS, Singapore 117583 (e-mail: [email protected]). V. Y. F. Tan is with the Department of Electrical and Computer Engineering, NUS, Singapore 117583, and also with the Department of Mathematics, NUS, Singapore 119076 (e-mail: [email protected]).

Abstract

This paper investigates the asymptotic expansion for the maximum rate of fixed-length codes over a parallel Gaussian channel with feedback under the following setting: A peak power constraint is imposed on every transmitted codeword, and the average error probabilities of decoding the transmitted message are non-vanishing as the blocklength increases. The main contribution of this paper is a self-contained proof of an upper bound on the first- and second-order asymptotics of the parallel Gaussian channel with feedback. The proof techniques involve developing an information spectrum bound followed by using Curtiss’ theorem to show that a sum of dependent random variables associated with the information spectrum bound converges in distribution to a sum of independent random variables, thus facilitating the use of the usual central limit theorem. Combined with existing achievability results, our result implies that the presence of feedback does not improve the first- and second-order asymptotics.

Index Terms: Curtiss’ theorem, feedback, fixed-length codes, parallel Gaussian channel, second-order asymptotics

I Introduction

This paper considers a point-to-point communication scenario where a source wants to transmit a message to a destination through a set of independent additive white Gaussian noise (AWGN) channels. The set of independent AWGN channels is referred to as the parallel Gaussian channel [1, Sec. 9.4] (also called the Gaussian product channel in [2, Sec. 3.4.3]). The parallel Gaussian channel has been used to model the multiple-input multiple-output (MIMO) channel [3, Sec. 7.1] — an essential channel model in wireless communications. Suppose the parallel Gaussian channel consists of $L$ independent AWGN channels, and let $\mathcal{L}\stackrel{{\scriptstyle\text{def}}}{{=}}\{1,2,\ldots,L\}$ be the index set of the $L$ channels. For the $k^{\text{th}}$ channel use, the relation for the $\ell^{\text{th}}$ channel between the input signal $X_{\ell,k}$ and output signal $Y_{\ell,k}$ is

$$Y_{\ell,k}=X_{\ell,k}+Z_{\ell,k}\tag{1}$$

where $\{Z_{\ell,k}\}_{\ell\in\mathcal{L}}$ are independent Gaussian noises. For each $\ell\in\mathcal{L}$ , the variance of the noise induced by the $\ell^{\text{th}}$ channel is assumed to be some positive number $N_{\ell}>0$ for all channel uses, i.e., ${\rm{Var}}[Z_{\ell,k}]=N_{\ell}$ for all $k\in\mathbb{N}$ . To keep notation compact, let $\boldsymbol{X}_{k}$ , $\boldsymbol{Y}_{k}$ and $\boldsymbol{Z}_{k}$ denote the random column vectors $[X_{1,k}\ X_{2,k}\ \ldots\ X_{L,k}]^{t}$ , $[Y_{1,k}\ Y_{2,k}\ \ldots\ Y_{L,k}]^{t}$ and $[Z_{1,k}\ Z_{2,k}\ \ldots\ Z_{L,k}]^{t}$ respectively. Then, the channel law (1) can be written as

$$\boldsymbol{Y}_{k}=\boldsymbol{X}_{k}+\boldsymbol{Z}_{k}.\tag{2}$$

Throughout this paper, we consider fixed-length codes over the parallel Gaussian channel, where the block length is denoted by $n$ unless specified otherwise. Every codeword $\boldsymbol{X}^{n}$ transmitted by the source over $n$ channel uses is subject to the following peak power constraint where $P>0$ denotes the permissible power for $\boldsymbol{X}^{n}$ :

$${\mathbb{P}}\left\{\frac{1}{n}\sum_{\ell=1}^{L}\sum_{k=1}^{n}X_{\ell,k}^{2}\leq P\right\}=1.\tag{3}$$

If we would like to transmit a uniformly distributed message $W\in\{1,2,\ldots,\lceil 2^{nR}\rceil\}$ over this channel where the error probabilities are required to vanish as the blocklength $n$ approaches infinity, it was shown by Shannon [4] that the maximum rate of communication $R$ converges to a certain limit called capacity. The closed-form expression of the capacity can be obtained by finding the optimal power allocation among the $L$ channels, which is described as follows. Define the mapping $\mathrm{C}(\mathbf{s}):\mathbb{R}_{+}^{L}\rightarrow\mathbb{R}_{+}$ as

$$\mathrm{C}(\mathbf{s})=\sum_{\ell=1}^{L}\frac{1}{2}\log\left(1+\frac{s_{\ell}}{N_{\ell}}\right)\tag{4}$$

where $s_{\ell}$ can be viewed as the power allocated to channel $\ell$. If we let $\Lambda$, $P_{1}$, $P_{2}$, $\ldots$, $P_{L}$ denote the $L+1$ real numbers produced by the water-filling algorithm [1, Sec. 9.4] where

$$\sum_{\ell=1}^{L}P_{\ell}=P\tag{5}$$

and

$$P_{\ell}=\max\{0,\Lambda-N_{\ell}\}\tag{6}$$

for each $\ell\in\mathcal{L}$ and let

$$\mathbf{P}^{*}\stackrel{\text{def}}{=}[P_{1}\ P_{2}\ \ldots\ P_{L}]^{t}\tag{7}$$

be the optimal power allocation vector, then the capacity of the parallel Gaussian channel was shown in [4] to be $\mathrm{C}(\mathbf{P}^{*})$ bits per channel use. More specifically, if $M^{*}(n,\varepsilon,P)$ denotes the maximum number of messages that can be transmitted over $n$ channel uses with permissible power $P$ and average error probability $\varepsilon$ , one has

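$$\lim_{\varepsilon\to 0}\liminf_{n\to\infty}\frac{1}{n}\log M^{*}(n,\varepsilon,P)=\mathrm{C}(\mathbf{P}^{*}).\tag{8}$$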
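The water-filling step described above can be sketched numerically. The following Python snippet is a minimal illustration, not the authors' code: it finds a water level $\Lambda$ by bisection so that the allocated powers $\max\{0,\Lambda-N_{\ell}\}$ sum to $P$, and then evaluates $\mathrm{C}(\mathbf{P}^{*})$ with natural logarithms. The function names, the bisection tolerance and the example noise variances are assumptions chosen for illustration.

```python
import math


def water_filling(noise, total_power, tol=1e-12):
    """Return (water_level, powers) with powers[l] = max(0, level - noise[l])
    and sum(powers) equal to total_power up to the bisection tolerance."""
    lo = min(noise)                 # at this level no power is allocated
    hi = max(noise) + total_power   # at this level at least total_power is used
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        used = sum(max(0.0, mid - n) for n in noise)
        if used > total_power:
            hi = mid
        else:
            lo = mid
    level = (lo + hi) / 2.0
    return level, [max(0.0, level - n) for n in noise]


def capacity(powers, noise):
    """C(s) = sum over channels of 0.5 * log(1 + s_l / N_l), in nats."""
    return sum(0.5 * math.log1p(s / n) for s, n in zip(powers, noise))


# Hypothetical example: three channels with noise variances 1, 2 and 5.
noise = [1.0, 2.0, 5.0]
level, p_star = water_filling(noise, total_power=3.0)
c_star = capacity(p_star, noise)
print(level, p_star, c_star)
```

In this example the noisiest channel lies above the water level and receives no power, which is the qualitative behaviour the $\max\{0,\cdot\}$ rule is meant to capture.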

The capacity result (8) has been strengthened by Polyanskiy-Poor-Verdú [5, Th. 78] and Tan-Tomamichel [6, Appendix A] for each $\varepsilon\in(0,1)$ as

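$$\frac{1}{n}\log M^{*}(n,\varepsilon,P)=\mathrm{C}(\mathbf{P}^{*})+\sqrt{\frac{\mathrm{V}(\mathbf{P}^{*})}{n}}\,\Phi^{-1}(\varepsilon)+\Theta\Big(\frac{\log n}{n}\Big),\tag{9}$$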

where $\mathrm{V}:\mathbb{R}_{+}^{L}\rightarrow\mathbb{R}_{+}$ is the Gaussian dispersion function defined as

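$$\mathrm{V}(\mathbf{s})=\sum_{\ell=1}^{L}\frac{\frac{s_{\ell}}{N_{\ell}}\left(\frac{s_{\ell}}{N_{\ell}}+2\right)}{2\left(\frac{s_{\ell}}{N_{\ell}}+1\right)^{2}}\tag{10}$$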

and $\Phi$ is the cumulative distribution function (cdf) of the standard normal distribution.
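To make the roles of the capacity and dispersion terms concrete, the sketch below evaluates the first two terms of the expansion, $\mathrm{C}(\mathbf{s})+\sqrt{\mathrm{V}(\mathbf{s})/n}\,\Phi^{-1}(\varepsilon)$, for a given power allocation. It is only an illustration under stated assumptions: the function names and the example allocation are hypothetical, the third-order term is ignored, and rates are in nats.

```python
import math
from statistics import NormalDist


def capacity(powers, noise):
    """C(s) = sum over channels of 0.5 * log(1 + s_l / N_l), in nats."""
    return sum(0.5 * math.log1p(s / n) for s, n in zip(powers, noise))


def dispersion(powers, noise):
    """Gaussian dispersion V(s) = sum of snr * (snr + 2) / (2 * (snr + 1)^2)."""
    total = 0.0
    for s, n in zip(powers, noise):
        snr = s / n
        total += snr * (snr + 2.0) / (2.0 * (snr + 1.0) ** 2)
    return total


def normal_approx_rate(powers, noise, n, eps):
    """First- and second-order terms only: C + sqrt(V / n) * Phi^{-1}(eps)."""
    c = capacity(powers, noise)
    v = dispersion(powers, noise)
    return c + math.sqrt(v / n) * NormalDist().inv_cdf(eps)


# Hypothetical allocation over three channels (powers sum to P = 3).
powers = [2.0, 1.0, 0.0]
noise = [1.0, 2.0, 5.0]
for n in (100, 1000, 10000):
    print(n, normal_approx_rate(powers, noise, n, eps=0.01))
```

For $\varepsilon<1/2$ the quantile $\Phi^{-1}(\varepsilon)$ is negative, so the approximation sits below capacity and approaches it at rate $1/\sqrt{n}$.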

Feedback, which is the focus of the current paper, can simplify coding schemes and improve the performance of communication systems in many scenarios. See [2, Ch. 17] for a thorough discussion on the benefits of feedback in single- and multi-user information theory. When feedback is allowed, each input symbol $\boldsymbol{X}_{k}$ depends on not only the transmitted message $W$ but also all the previous channel outputs up to the $(k-1)^{\text{th}}$ channel use, i.e., the symbols $(\boldsymbol{Y}_{1},\boldsymbol{Y}_{2},\ldots,\boldsymbol{Y}_{k-1})$ . In the presence of noiseless feedback, let $M_{\text{fb}}^{*}(n,\varepsilon,P)$ denote the maximum number of messages that can be transmitted over $n$ channel uses with permissible power $P$ and average error probability $\varepsilon$ . It was shown by Shannon [7] that the presence of noiseless feedback does not increase the capacity of point-to-point memoryless channels, which together with (8) implies that

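$$\lim_{\varepsilon\to 0}\liminf_{n\to\infty}\frac{1}{n}\log M_{\text{fb}}^{*}(n,\varepsilon,P)=\mathrm{C}(\mathbf{P}^{*}).\tag{11}$$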

In view of (9), we conclude that

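$$\frac{1}{n}\log M_{\text{fb}}^{*}(n,\varepsilon,P)\geq\mathrm{C}(\mathbf{P}^{*})+\sqrt{\frac{\mathrm{V}(\mathbf{P}^{*})}{n}}\,\Phi^{-1}(\varepsilon)+\Theta\Big(\frac{\log n}{n}\Big).\tag{12}$$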

In this paper, the main contribution is a conceptually simple, concise and self-contained proof that in the presence of feedback, the first- and second-order terms in the asymptotic expansion in (9) remain unchanged, i.e.,

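$$\frac{1}{n}\log M_{\text{fb}}^{*}(n,\varepsilon,P)=\mathrm{C}(\mathbf{P}^{*})+\sqrt{\frac{\mathrm{V}(\mathbf{P}^{*})}{n}}\,\Phi^{-1}(\varepsilon)+o\Big(\frac{1}{\sqrt{n}}\Big).\tag{13}$$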

I-A Related Work

Our work is inspired by the recent study of the fundamental limits of communication over discrete memoryless channels (DMCs) with feedback [8]. It was shown by Altuğ and Wagner [8, Th. 1] that for some classes of DMCs whose capacity-achieving input distributions are not unique (in particular, the minimum and maximum conditional information variances differ), coding schemes with feedback achieve a better second-order asymptotics compared to those without feedback. They also showed [8, Th. 2] that feedback does not improve the second-order asymptotics of DMCs $q_{Y|X}$ if the conditional variance of the log-likelihood ratio $\log\frac{q_{Y|X}(Y|x)}{p^{*}(Y)}$ , where $p^{*}$ is the unique capacity-achieving output distribution, does not depend on the input $x$ . Such DMCs include the class of weakly-input symmetric DMCs initially studied by Polyanskiy-Poor-Verdú [9].

However, we note that the proof technique used by Altuğ and Wagner requires the use of a Berry-Esséen-type result for bounded martingale difference sequences [10], and their technique cannot be extended to the parallel Gaussian channel with feedback because each input symbol $X_{\ell,k}$ belongs to an interval $[-\sqrt{nP},\sqrt{nP}]$ that grows without bound as $n$ increases. Instead, our proof uses Curtiss’ theorem to show that a sum of dependent random variables that naturally appears in the non-asymptotic analysis converges in distribution to a sum of independent random variables, thus facilitating the use of the usual central limit theorem [11].

For $L=1$ , the parallel Gaussian channel with feedback reduces to the AWGN channel with feedback, whose second-order coding rate is identical to the same channel without feedback by the following symmetry argument: The log-likelihood ratios $\log\frac{q_{Y|X}(Y|x)}{p^{*}(Y)}$ for all $x$ on the power sphere with radius $\sqrt{nP}$ are the same. See [12] for a rigorous but simple proof. In contrast, for $L>1$ , this symmetry argument no longer holds due to the flexible power allocation among the $L$ channels, and hence the simple proof suggested in [12] cannot be extended to the parallel Gaussian channel with feedback.

If the peak power constraint in (3) is replaced with the expected power constraint ${\mathbb{E}}\left[\frac{1}{n}\sum_{\ell=1}^{L}\sum_{k=1}^{n}X_{\ell,k}^{2}\right]\leq P$ , the first-order coding rate of the AWGN channel with feedback is improved from $\mathrm{C}(P)$ to $\mathrm{C}(\frac{P}{1-\varepsilon})$ [13, Sec. II] (the exact improvement holds for the non-feedback case as well [5, Sec. 4.3.3]) where $\varepsilon$ denotes the tolerable error probability. For the general case $L>1$ , the proof in [13, Sec. II] can be easily extended to show that the first-order coding rate of the parallel Gaussian channel with feedback can be improved from $\mathrm{C}(\mathbf{P}^{*})$ to $\mathrm{C}(\frac{\mathbf{P}^{*}}{1-\varepsilon})$ , and hence (13) no longer holds.

I-B Paper Outline

This paper is organized as follows. The next subsection summarizes the notation used in this paper. Section II provides the problem setup of the parallel Gaussian channel with feedback under the peak power constraint and presents our main theorem. Section III contains the preliminaries required for the proof of our main theorem. The preliminaries include the following: (i) Important properties of non-asymptotic binary hypothesis testing quantities; (ii) Modification of power allocation among the parallel channels; (iii) Curtiss’ theorem. Section IV presents the proof of our main theorem. Section V concludes this paper by explaining the novel ingredients in the proof of the main theorem and the major difficulty in strengthening the main theorem.

I-C Notation

The sets of natural numbers, non-negative integers, real numbers and non-negative real numbers are denoted by $\mathbb{N}$, $\mathbb{Z}_{+}$, $\mathbb{R}$ and $\mathbb{R}_{+}$ respectively. An $L$-dimensional column vector is denoted by $\mathbf{a}\stackrel{{\scriptstyle\text{def}}}{{=}}[a_{1}\ a_{2}\ \ldots\ a_{L}]^{t}$ where $a_{\ell}$ denotes the $\ell^{\text{th}}$ element of $\mathbf{a}$. The Euclidean norm of a vector $\mathbf{a}\in\mathbb{R}^{L}$ is denoted by $\|\mathbf{a}\|_{2}\stackrel{{\scriptstyle\text{def}}}{{=}}\sqrt{\sum_{\ell=1}^{L}a_{\ell}^{2}}$. We will take all logarithms to base $e$ throughout this paper.

We use ${\mathbb{P}}\{\mathcal{E}\}$ to represent the probability of an event $\mathcal{E}$, and we let $\mathbf{1}\{\mathcal{E}\}$ be the indicator function of $\mathcal{E}$. Every random variable is denoted by a capital letter (e.g., $X$), and the realization and the alphabet of the random variable are denoted by the corresponding small letter (e.g., $x$) and calligraphic letter (e.g., $\mathcal{X}$) respectively. We use $X^{n}$ to denote a random tuple $(X_{1},X_{2},\ldots,X_{n})$, where all the elements $X_{k}$ have the same alphabet $\mathcal{X}$. We let $p_{X}$ be the probability distribution of a random variable $X$. More specifically, $p_{X}$ is the Radon-Nikodym derivative of a measure with respect to the Lebesgue measure in an appropriate Euclidean space. We let $p_{Y|X}$ denote the conditional probability distribution of $Y$ given $X$ for any random variables $X$ and $Y$. We let $p_{X}p_{Y|X}$ denote the joint distribution of $(X,Y)$, i.e., $p_{X}p_{Y|X}(x,y)=p_{X}(x)p_{Y|X}(y|x)$ for all $x$ and $y$. For any random variable $X\sim p_{X}$ and any real-valued function $g$ whose domain includes $\mathcal{X}$, we let ${\mathbb{P}}_{p_{X}}\{g(X)\geq\xi\}$ denote $\int_{\mathcal{X}}p_{X}(x)\mathbf{1}\{g(x)\geq\xi\}\,\mathrm{d}x$ for any real constant $\xi$. The expectation and the variance of $g(X)$ are denoted as ${\mathbb{E}}_{p_{X}}[g(X)]$ and ${\rm{Var}}_{p_{X}}[g(X)]$ respectively. For simplicity, we drop the subscript of a notation if there is no ambiguity. For any real-valued Gaussian random variable $Z$ whose mean and variance are $\mu$ and $\sigma^{2}$ respectively, we let

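$$\mathcal{N}(z;\mu,\sigma^{2})\stackrel{\text{def}}{=}\frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-\frac{(z-\mu)^{2}}{2\sigma^{2}}}\tag{14}$$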

be the corresponding probability density function.

II Parallel Gaussian Channel with Feedback

Let $\mathrm{s}$ and $\mathrm{d}$ denote the source and the destination respectively. Suppose node $\mathrm{s}$ transmits a message to node $\mathrm{d}$ over $n$ channel uses through the $L$ independent AWGN channels. Before any transmission begins, node $\mathrm{s}$ chooses message $W$ destined for node $\mathrm{d}$ where $W$ is uniformly distributed on the message alphabet

$$\mathcal{W}\stackrel{\text{def}}{=}\{1,2,\ldots,M\}\tag{15}$$

whose size is denoted by $M$. For the $k^{\text{th}}$ channel use, node $\mathrm{s}$ transmits $\boldsymbol{X}_{k}$ and the corresponding channel output $\boldsymbol{Y}_{k}$ satisfies (2). We assume that a noiseless feedback link from the destination node $\mathrm{d}$ to the source node $\mathrm{s}$ exists so that $(W,\boldsymbol{Y}^{k-1})$ is available for encoding $\boldsymbol{X}_{k}$ for each $k\in\{1,2,\ldots,n\}$. In addition, the codeword $\boldsymbol{X}^{n}$ transmitted by $\mathrm{s}$ is subject to the peak power constraint (3). Upon receiving $\boldsymbol{Y}^{n}$, node $\mathrm{d}$ declares $\hat{W}$ to be the transmitted message.

Definition 1

An $(n,M,P)$ -feedback code consists of the following:

1) A message set $\mathcal{W}$ at node $\mathrm{s}$ as defined in (15). Message $W$ is uniform on $\mathcal{W}$.

2) An encoding function

$$f_{\ell,k}:\mathcal{W}\times\mathbb{R}^{L\times(k-1)}\rightarrow\mathbb{R}$$

for each $\ell\in\mathcal{L}$ and each $k\in\{1,2,\ldots,n\}$ , where $f_{\ell,k}$ is the encoding function at node $\mathrm{s}$ for encoding $X_{\ell,k}$ such that

$$X_{\ell,k}=f_{\ell,k}(W,\boldsymbol{Y}^{k-1})$$

and the peak power constraint (3) holds.

3) A decoding function

$$\varphi:\mathbb{R}^{L\times n}\rightarrow\mathcal{W},$$

where $\varphi$ is the decoding function for $W$ at node $\mathrm{d}$ such that

$$\hat{W}=\varphi(\boldsymbol{Y}^{n}).$$
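The structure of a feedback code in Definition 1 can be illustrated with a short simulation loop in which each symbol is computed from the message and the past channel outputs. The Python sketch below is purely illustrative: `toy_encoder_factory` is a hypothetical constant-amplitude encoder invented for this example (it is not a good code and is not taken from the paper), and the helper names and parameters are assumptions.

```python
import math
import random


def run_feedback_code(encoder, message, noise_vars, n, rng):
    """Simulate n channel uses; encoder(l, k, w, past_ys) plays the role of f_{l,k}."""
    xs, ys = [], []
    for k in range(n):
        # X_{l,k} = f_{l,k}(W, Y^{k-1}): only outputs from earlier uses are visible.
        x_k = [encoder(l, k, message, ys) for l in range(len(noise_vars))]
        # Y_{l,k} = X_{l,k} + Z_{l,k} with Var[Z_{l,k}] = N_l.
        y_k = [x + rng.gauss(0.0, math.sqrt(nv)) for x, nv in zip(x_k, noise_vars)]
        xs.append(x_k)
        ys.append(y_k)
    return xs, ys


def toy_encoder_factory(P, L):
    """Hypothetical encoder: fixed amplitude sqrt(P / L), sign depends on feedback."""
    amp = math.sqrt(P / L)

    def encoder(l, k, w, past_ys):
        fb_bit = 1 if (past_ys and past_ys[-1][l] > 0.0) else 0
        sign = 1.0 if (w + k + l + fb_bit) % 2 == 0 else -1.0
        return sign * amp

    return encoder


def satisfies_peak_power(xs, P, slack=1e-9):
    """Check (1 / n) * sum over l and k of x_{l,k}^2 <= P for one realized codeword."""
    n = len(xs)
    return sum(x * x for x_k in xs for x in x_k) / n <= P + slack


rng = random.Random(0)
noise_vars = [1.0, 2.0, 5.0]
xs, ys = run_feedback_code(toy_encoder_factory(3.0, 3), message=1,
                           noise_vars=noise_vars, n=50, rng=rng)
print(satisfies_peak_power(xs, 3.0))
```

Because the peak power constraint must hold with probability one, it has to be satisfied by every realized codeword, which is why the check is applied to a single sample path rather than to an average over runs.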

Definition 2

Let $\boldsymbol{X}$ and $\boldsymbol{Y}$ denote the random vectors $[X_{1}\ X_{2}\ \ldots\ X_{L}]^{t}$ and $[Y_{1}\ Y_{2}\ \ldots\ Y_{L}]^{t}$ respectively, and let $\mathbf{x}$ and $\mathbf{y}$ be their realizations respectively. The parallel Gaussian channel with feedback is characterized by the conditional probability density function $q_{\boldsymbol{Y}|\boldsymbol{X}}$ satisfying

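$$q_{\boldsymbol{Y}|\boldsymbol{X}}(\mathbf{y}|\mathbf{x})=\prod_{\ell=1}^{L}\mathcal{N}(y_{\ell};x_{\ell},N_{\ell})$$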

such that the following holds for any $(n,M,P)$ -feedback code: For each $k\in\{1,2,\ldots,n\}$ ,

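$$p_{W,\boldsymbol{X}^{k},\boldsymbol{Y}^{k}}=p_{W,\boldsymbol{X}^{k},\boldsymbol{Y}^{k-1}}\,p_{\boldsymbol{Y}_{k}|\boldsymbol{X}_{k}}\tag{17}$$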

where

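$$p_{\boldsymbol{Y}_{k}|\boldsymbol{X}_{k}}(\mathbf{y}_{k}|\mathbf{x}_{k})=q_{\boldsymbol{Y}|\boldsymbol{X}}(\mathbf{y}_{k}|\mathbf{x}_{k})\tag{18}$$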

for all $(\mathbf{x}^{n},\mathbf{y}^{n})\in\mathbb{R}^{L\times n}\times\mathbb{R}^{L\times n}$.

For any $(n,M,P)$ -feedback code, let $p_{W,\boldsymbol{X}^{n},\boldsymbol{Y}^{n},\hat{W}}$ be the joint distribution induced by the code. We can use Definition 1, (17) and (18) to factorize $p_{W,\boldsymbol{X}^{n},\boldsymbol{Y}^{n},\hat{W}}$ as follows:

[TABLE]

Definition 3

For an $(n,M,P)$ -feedback code, we can calculate according to (19) the average probability of decoding error defined as ${\mathbb{P}}\big{\{}\hat{W}\neq W\big{\}}$ . We call an $(n,M,P)$ -feedback code with average probability of decoding error no larger than $\varepsilon$ an $(n,M,P,\varepsilon)$ -feedback code.

Define

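$$M_{\text{fb}}^{*}(n,\varepsilon,P)\stackrel{\text{def}}{=}\max\left\{M\in\mathbb{N}\,\middle|\,\text{there exists an }(n,M,P,\varepsilon)\text{-feedback code}\right\}.$$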

Definition 4

Let $\varepsilon\in(0,1)$ . The $\varepsilon$ -capacity of the parallel Gaussian channel with feedback, denoted by $C_{\varepsilon}^{\text{fb}}$ , is defined to be

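$$C_{\varepsilon}^{\text{fb}}\stackrel{\text{def}}{=}\liminf_{n\to\infty}\frac{1}{n}\log M_{\text{fb}}^{*}(n,\varepsilon,P).$$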

The capacity is defined to be

$$C_{0}^{\text{fb}}\stackrel{\text{def}}{=}\inf_{\varepsilon>0}C_{\varepsilon}^{\text{fb}}.$$

Definition 5

Let $\varepsilon\in(0,1)$ . The $\varepsilon$ -second-order coding rate of the parallel Gaussian channel with feedback, denoted by $\mathrm{L}_{\varepsilon}^{\text{fb}}$ , is defined to be

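$$\mathrm{L}_{\varepsilon}^{\text{fb}}\stackrel{\text{def}}{=}\liminf_{n\to\infty}\frac{1}{\sqrt{n}}\left(\log M_{\text{fb}}^{*}(n,\varepsilon,P)-nC_{\varepsilon}^{\text{fb}}\right).$$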

Recall how $\mathrm{C}(\mathbf{P}^{*})$ and $\mathrm{V}(\mathbf{P}^{*})$ are determined through (4), (5), (6), (7) and (10). Since the capacity of the parallel Gaussian channel without feedback is $\mathrm{C}(\mathbf{P}^{*})$ (see, e.g., [4] and [2, Sec. 3.4.3]) and an introduction of an extra noiseless feedback link does not increase the capacity (see, e.g., [7] and [1, Sec. 9.6]), it follows that

$$C_{0}^{\text{fb}}=\mathrm{C}(\mathbf{P}^{*}).$$

Before stating our main result, recall that $\Phi:(-\infty,\infty)\rightarrow(0,1)$ is the cdf of the standard normal distribution. Since $\Phi$ is strictly increasing on $(-\infty,\infty)$ , the inverse of $\Phi$ is well-defined and is denoted by $\Phi^{-1}$ . The following theorem is the main result in this paper.

Theorem 1

Fix an $\varepsilon\in(0,1)$ . Then,

$$C_{\varepsilon}^{\text{fb}}=\mathrm{C}(\mathbf{P}^{*})$$

and the $\varepsilon$ -second-order coding rate satisfies

$$\mathrm{L}_{\varepsilon}^{\text{fb}}\leq\sqrt{\mathrm{V}(\mathbf{P}^{*})}\,\Phi^{-1}(\varepsilon).$$

Combining (9) and Theorem 1, we complete the characterizations of the first- and second-order asymptotics of the parallel Gaussian channel with feedback as shown in (13).

III Preliminaries for the Proof of Theorem 1

III-A Binary Hypothesis Testing

The following definition concerning the non-asymptotic fundamental limits of a simple binary hypothesis test is standard. See for example [5, Section 2.3].

Definition 6

Let $p_{X}$ and $q_{X}$ be two probability distributions on some common alphabet $\mathcal{X}$ . Let

[TABLE]

be the set of randomized binary hypothesis tests between $p_{X}$ and $q_{X}$ where $\{Z=0\}$ indicates the test chooses $q_{X}$ , and let $\delta\in[0,1]$ be a real number. The minimum type-II error in a simple binary hypothesis test between $p_{X}$ and $q_{X}$ with type-I error less than $1-\delta$ is defined as

[TABLE]

The existence of a minimizing test $r_{Z|X}$ is guaranteed by the Neyman-Pearson lemma.

We state in the following lemma and proposition some important properties of $\beta_{\delta}(p_{X}\|q_{X})$ , which are crucial for the proof of Theorem 1. The proof of the following lemma can be found in, for example, [14, Lemma 1].

Lemma 1

Let $p_{X}$ and $q_{X}$ be two probability distributions on some $\mathcal{X}$ , and let $g$ be a function whose domain contains $\mathcal{X}$ . Then, the following two statements hold:

1) (Data processing inequality (DPI)) $\beta_{\delta}(p_{X}\|q_{X})\leq\beta_{\delta}(p_{g(X)}\|q_{g(X)})$.

2) For all $\xi>0$, $\beta_{\delta}(p_{X}\|q_{X})\geq\frac{1}{\xi}\left(\delta-\int_{\mathcal{X}}p_{X}(x)\boldsymbol{1}\left\{\frac{p_{X}(x)}{q_{X}(x)}\geq\xi\right\}\,\mathrm{d}x\right)$.
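The second statement of Lemma 1 can be checked numerically on a small discrete example. The Python sketch below is an illustration under stated assumptions, not part of the paper: it takes $\beta_{\delta}(p_{X}\|q_{X})$ to be the smallest $q_{X}$-probability of deciding in favour of $p_{X}$ among randomized tests that decide in favour of $p_{X}$ with $p_{X}$-probability at least $\delta$, it computes this quantity with a Neyman-Pearson test, and it assumes strictly positive probability mass functions on a finite alphabet (with sums replacing the integral). The function names and the example distributions are hypothetical.

```python
def beta(delta, p, q):
    """Neyman-Pearson value: minimum q-mass of a randomized acceptance region
    whose p-mass is at least delta. Assumes p[i] > 0 and q[i] > 0 for all i."""
    # Accept outcomes in decreasing order of the likelihood ratio p / q.
    order = sorted(range(len(p)), key=lambda i: p[i] / q[i], reverse=True)
    need, total_q = delta, 0.0
    for i in order:
        if need <= 0.0:
            break
        frac = min(1.0, need / p[i])   # randomize on the boundary outcome
        total_q += frac * q[i]
        need -= frac * p[i]
    return total_q


def lemma1_lower_bound(delta, xi, p, q):
    """(1 / xi) * (delta - P_p{ p(X) / q(X) >= xi })."""
    tail = sum(pi for pi, qi in zip(p, q) if pi / qi >= xi)
    return (delta - tail) / xi


# Hypothetical distributions on a three-letter alphabet.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
for delta in (0.1, 0.5, 0.9):
    for xi in (0.25, 0.5, 1.0, 2.0, 3.0):
        assert beta(delta, p, q) >= lemma1_lower_bound(delta, xi, p, q) - 1e-12
print(beta(0.9, p, q))
```

The bound is informative only when $\xi$ is large enough that the likelihood-ratio tail probability falls below $\delta$; for smaller $\xi$ the right-hand side is negative and the inequality holds trivially.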

The proof of the following proposition can be found in [14, Lemma 3] (see also [15, Th. 27]).

Proposition 2

Let $p_{U,V}$ be a probability distribution defined on $\mathcal{W}\times\mathcal{W}$ for some finite alphabet $\mathcal{W}$ , and let $p_{U}$ be the marginal distribution of $p_{U,V}$ . In addition, let $q_{V}$ be a distribution defined on $\mathcal{W}$ . Suppose $p_{U}$ is the uniform distribution, and let

[TABLE]

be a real number in $[0,1]$ where $(U,V)$ is distributed according to $p_{U,V}$ . Then,

[TABLE]

III-B Modification of Power Allocation among the Parallel Channels

For each transmitted codeword $\mathbf{x}^{n}\in\mathbb{R}^{L\times n}$ , we can view $\sum_{k=1}^{n}x_{\ell,k}^{2}$ as the power allocated to the $\ell^{\text{th}}$ channel for each $\ell\in\mathcal{L}$ . In the proof of Theorem 1, an early step is to discretize the power allocated to the $L$ channels. To this end, we need the following definition which defines the power allocation vector of a sequence $\mathbf{x}^{n}\in\mathbb{R}^{L\times n}$ .

Definition 7

The power allocation mapping $\phi:\mathbb{R}^{L\times n}\rightarrow\mathbb{R}_{+}^{L}$ is defined as

[TABLE]

We call $\phi(\mathbf{x}^{n})$ the power type of $\mathbf{x}^{n}$ .
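As a small illustration of the idea of a power type, the Python sketch below computes per-channel powers of a codeword and checks the total against $P$. The displayed definition of $\phi$ is not reproduced here, so the normalization by $1/n$ used in the code is an assumption (chosen so that the entries are comparable to an allocation $\mathbf{s}$ with $\sum_{\ell}s_{\ell}=P$); the function names and the example codeword are likewise hypothetical.

```python
def power_type(codeword):
    """codeword[k][l] is the symbol sent on channel l at channel use k.
    Returns per-channel average powers (ASSUMED normalization: divide by n)."""
    n = len(codeword)
    num_channels = len(codeword[0])
    return [sum(codeword[k][l] ** 2 for k in range(n)) / n
            for l in range(num_channels)]


def meets_peak_power(codeword, P, slack=1e-12):
    """True if the entries of the (assumed) power type sum to at most P."""
    return sum(power_type(codeword)) <= P + slack


# Hypothetical codeword: n = 4 channel uses over L = 2 channels.
x = [[1.0, 0.0], [-1.0, 2.0], [1.0, 0.0], [-1.0, 0.0]]
print(power_type(x), meets_peak_power(x, 2.0))
```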

The proof of Theorem 1 involves modifying a given length-$n$ code so that useful bounds on the performance of the given code can be obtained by analyzing the modified code. More specifically, the encoding functions of the given code are modified so that the power type of the random codeword generated by the modified code always falls into some small bounding box. The specific modification of the encoding functions is described in the following definition.

Definition 8

Given an $(n,M,P)$-feedback code, let $\mathcal{W}$, $\{f_{\ell,k}|1\leq\ell\leq L,1\leq k\leq n\}$ and $\varphi$ be the corresponding message alphabet, encoding functions and decoding function respectively. In addition, let $\gamma\geq 0$ and $\mathbf{s}=[s_{1}\ s_{2}\ \ldots\ s_{L}]\in\mathbb{R}_{+}^{L}$ such that $\sum_{\ell=1}^{L}s_{\ell}=P$. Then, the $(\gamma,\mathbf{s})$-modified code based on the $(n,M,P)$-feedback code consists of the following message alphabet, encoding functions and decoding function which are denoted by $\tilde{\mathcal{W}}$, $\{\tilde{f}_{\ell,k}|1\leq\ell\leq L,1\leq k\leq n\}$ and $\tilde{\varphi}$ respectively:

1) A message set $\tilde{\mathcal{W}}=\mathcal{W}$ at node $\mathrm{s}$. Message $W$ is uniform on $\tilde{\mathcal{W}}$.

2) An encoding function

[TABLE]

for each $\ell\in\mathcal{L}$ and each $k\in\{1,2,\ldots,n\}$ , which is defined as follows. For each $w\in\mathcal{W}$ and each $\mathbf{y}^{k-1}\in\mathbb{R}^{L\times(k-1)}$ , define $\tilde{f}_{\ell,k}$ in a recursive manner in this order $\tilde{f}_{1,1},\tilde{f}_{2,1},\ldots,\tilde{f}_{L,1},\ldots,\tilde{f}_{1,n},\tilde{f}_{2,n},\ldots,\tilde{f}_{L,n}$ as follows: For each $k=1,2,\ldots,n-1$ , define $\tilde{f}_{\ell,k}$ recursively for $\ell=1,2,\ldots,L$ as

[TABLE]

It follows from (27) that

[TABLE]

and

[TABLE]

In addition, in view of (28), we define $\tilde{f}_{\ell,n}$ recursively for $\ell=1,2,\ldots,L-1$ as follows:

[TABLE]

Combining (27) and (30), we conclude that

[TABLE]

On the other hand, it follows from (29), (30), the fact ${\mathbb{P}}\left\{\sum_{\ell=1}^{L}\sum_{k=1}^{n}f_{\ell,k}(W,\boldsymbol{Y}^{k-1})^{2}\leq nP\right\}=1$ and the assumption $\sum_{\ell=1}^{L}s_{\ell}=P$ that

[TABLE]

Finally, in view of (32), we define $\tilde{f}_{L,n}$ as

[TABLE]

Combining (31), (33) and the assumption that $\sum_{\ell=1}^{L}s_{\ell}=P$ , we have

[TABLE]

3) A decoding function

[TABLE]

for providing an estimate of $W$ at node $\mathrm{d}$ . $\blacksquare$

Remark 1

The basic idea behind transforming a code in Definition 8 is simple. Suppose we are given an $(n,M,P)$ -feedback code, a $\gamma\geq 0$ and an $\mathbf{s}=[s_{1}\ s_{2}\ \ldots\ s_{L}]\in\mathbb{R}_{+}^{L}$ such that $\sum_{\ell=1}^{L}s_{\ell}=P$ . Then, the $(\gamma,\mathbf{s})$ -modified code is formed by

(i) truncating a transmitted codeword if the power transmitted over the $\ell^{\text{th}}$ channel exceeds $n(s_{\ell}+\gamma)$, which can be seen from (27) and the third clause of (30);

(ii) boosting the power of the transmitted codeword if the power transmitted over the $\ell^{\text{th}}$ channel falls below $n(s_{\ell}-L\gamma)$, which can be seen from the second clause of (30);

(iii) adjusting the last symbol transmitted over the $L^{\text{th}}$ channel (i.e., $X_{L,n}$) so that the total transmitted power is exactly equal to $nP$, which can be seen from the second clause of (33).

Given an $(n,M,P)$ -feedback code, we consider the corresponding $(\gamma,\mathbf{s})$ -modified code constructed in Definition 8 and let $\tilde{p}_{W,\boldsymbol{X}^{n},\boldsymbol{Y}^{n},\hat{W}}$ be the distribution induced by the modified code. By (34), we see that

[TABLE]

Define the $\Delta$ -bounding box

[TABLE]

for each $\gamma\geq 0$ and each $\mathbf{s}\in\mathbb{R}_{+}^{L}$ . It then follows from (35) that

[TABLE]

The following lemma is a natural consequence of Definition 8, and the proof is deferred to Appendix A.

Lemma 3

Given an $(n,M,P)$ -feedback code, let $p_{\boldsymbol{X}^{n},\boldsymbol{Y}^{n}}$ be the distribution induced by the code. Fix any $\gamma\geq 0$ and any $\mathbf{s}\in\mathbb{R}_{+}^{L}$ such that $\sum_{\ell=1}^{L}s_{\ell}=P$ , and let $\tilde{p}_{\boldsymbol{X}^{n},\boldsymbol{Y}^{n}}$ be the distribution induced by the $(\gamma,\mathbf{s})$ -modified code based on the $(n,M,P)$ -feedback code. Then, we have

[TABLE]

for all Borel measurable $\mathcal{A}\subseteq\mathbb{R}^{L\times n}\times\mathbb{R}^{L\times n}$ .

III-C Curtiss’ Theorem

Curtiss’ theorem [16, Th. 3] states that convergence of moment generating functions leads to convergence in distribution. The formal statement is reproduced below.

Theorem 2 (Curtiss’ theorem)

Let $U^{(n)}$ be a sequence of real-valued random variables. Suppose there exists a random variable $V$ such that

[TABLE]

for all $t\in\mathbb{R}$ . Then,

[TABLE]

for every $a\in\mathbb{R}$ at which $a\mapsto{\mathbb{P}}\{V\leq a\}$ is continuous.

In contrast to the more well-known Lévy’s continuity theorem [17, Sec. 18.1], (39) of Theorem 2 is required to be true for all real rather than purely imaginary $t$ .
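A simple closed-form example may help to visualize the hypothesis of Curtiss' theorem. The Python sketch below is an illustration chosen for this purpose and is not the sequence used in the paper's proof: it takes $U^{(n)}=\frac{1}{\sqrt{2n}}\sum_{i=1}^{n}(Z_{i}^{2}-1)$ with independent standard normal $Z_{i}$, whose moment generating function is available in closed form for $t<\sqrt{n/2}$, and compares its logarithm with $t^{2}/2$, the log moment generating function of a standard normal $V$. The function names are hypothetical.

```python
import math


def log_mgf_standardized_chi2(t, n):
    """log E[exp(t * U_n)] for U_n = sum_{i<=n} (Z_i^2 - 1) / sqrt(2 n),
    with Z_i i.i.d. standard normal. Finite only for t < sqrt(n / 2)."""
    a = t * math.sqrt(2.0 / n)
    if a >= 1.0:
        raise ValueError("MGF is infinite for t >= sqrt(n / 2)")
    # E[exp(s * Z^2)] = (1 - 2 s)^(-1/2) for s < 1/2, applied with s = t / sqrt(2 n).
    return -t * math.sqrt(n / 2.0) - 0.5 * n * math.log1p(-a)


def log_mgf_standard_normal(t):
    """log E[exp(t * V)] for V standard normal."""
    return 0.5 * t * t


for n in (10, 100, 10_000, 1_000_000):
    gap = abs(log_mgf_standardized_chi2(1.0, n) - log_mgf_standard_normal(1.0))
    print(n, gap)
```

For each fixed real $t$ the gap shrinks as $n$ grows, which is the kind of pointwise convergence of moment generating functions that the theorem converts into convergence of the distribution functions.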

IV Proof of Theorem 1

Fix an $\varepsilon\in(0,1)$ and choose an arbitrary sequence of $(\bar{n},M_{\mathrm{fb}}^{*}(\bar{n},\varepsilon,P),P,\varepsilon)$ -feedback codes. Since

[TABLE]

by (20), it suffices to show that

[TABLE]

for all $\tau>0$ . To this end, fix an arbitrary $\tau>0$ .

IV-A Discretizing the Power Allocation Vectors by Appending Symbols

Using Definition 1, we have

[TABLE]

for the chosen $(\bar{n},M_{\mathrm{fb}}^{*}(\bar{n},\varepsilon,P),P,\varepsilon)$ -feedback code for each $\bar{n}\in\mathbb{N}$ . Given the chosen $(\bar{n},M_{\mathrm{fb}}^{*}(\bar{n},\varepsilon,P),P,\varepsilon)$ -feedback code, we can always construct an $(\bar{n}+L,M_{\mathrm{fb}}^{*}(\bar{n},\varepsilon,P),P,\varepsilon)$ -feedback code by appending a carefully chosen tuple $(\boldsymbol{X}_{\bar{n}+1},\boldsymbol{X}_{\bar{n}+2},\ldots,\boldsymbol{X}_{\bar{n}+L})$ to each transmitted codeword $\boldsymbol{X}^{\bar{n}}$ generated by the $(\bar{n},M_{\mathrm{fb}}^{*}(\bar{n},\varepsilon,P),P,\varepsilon)$ -feedback code such that

[TABLE]

which implies that

[TABLE]

In addition, given the $(\bar{n}+L,M_{\mathrm{fb}}^{*}(\bar{n},\varepsilon,P),P,\varepsilon)$ -feedback code, we can always construct an $(\bar{n}+L+1,M_{\mathrm{fb}}^{*}(\bar{n},\varepsilon,P),P,\varepsilon)$ -feedback code by appending a carefully chosen $\boldsymbol{X}_{\bar{n}+L+1}$ to each transmitted codeword $\boldsymbol{X}^{\bar{n}+L}$ generated by the $(\bar{n}+L,M_{\mathrm{fb}}^{*}(\bar{n},\varepsilon,P),P,\varepsilon)$ -feedback code such that

[TABLE]

To simplify notation, we let

[TABLE]

Construct the set of power allocation vectors

[TABLE]

which can be viewed as a set of quantized power allocation vectors $\mathbf{s}$ with quantization level $P/n$ that satisfy the equality power constraint

[TABLE]

It follows from (47), (45) and Definition 7 that

[TABLE]

and

[TABLE]
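To make the counting behind the quantized power allocation vectors concrete, the following Python sketch enumerates a toy version of such a set. The exact definition of $\mathcal{S}^{(n)}$ in (47) is not reproduced here, so the sketch assumes that each component of $\mathbf{s}$ is a nonnegative integer multiple of $P/n$ and that the components sum to $P$; the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def quantized_allocations(n, L, P):
    """Enumerate s = (P/n) * (m_1, ..., m_L) with integers m_l >= 0, sum m_l = n.

    Assumed toy model of a quantized power allocation set; uses the
    stars-and-bars bijection (choose L-1 bar positions among n+L-1 slots).
    """
    step = P / n
    allocations = []
    for bars in combinations(range(n + L - 1), L - 1):
        counts = []
        prev = -1
        for b in bars:
            counts.append(b - prev - 1)  # stars between consecutive bars
            prev = b
        counts.append(n + L - 2 - prev)  # stars after the last bar
        allocations.append(tuple(step * m for m in counts))
    return allocations


def count_allocations(n, L):
    """Closed-form size of the toy set: C(n+L-1, L-1) <= (n+1)^(L-1)."""
    return comb(n + L - 1, L - 1)


if __name__ == "__main__":
    n, L, P = 4, 3, 2.0
    allocs = quantized_allocations(n, L, P)
    print(len(allocs), count_allocations(n, L), (n + 1) ** (L - 1))
```

Under this assumed model the number of power types grows only polynomially in $n$ for fixed $L$, which is the property the later union-type argument over power types relies on.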

IV-B Obtaining a Lower Bound on the Error Probability in Terms of the Type-II Error of a Hypothesis Test

Let $p_{W,\boldsymbol{X}^{n},\boldsymbol{Y}^{n},\hat{W}}$ be the probability distribution induced by the $(n,M_{\mathrm{fb}}^{*}(\bar{n},\varepsilon,P),P,\varepsilon)$ -feedback code constructed above for each $n\in\{L+2,L+3,\ldots\}$ , where $p_{W,\boldsymbol{X}^{n},\boldsymbol{Y}^{n},\hat{W}}$ is obtained according to (19). Fix an $n\in\{L+2,L+3,\ldots\}$ and the corresponding $(n,M_{\mathrm{fb}}^{*}(\bar{n},\varepsilon,P),P,\varepsilon)$ -feedback code. Recall the definition of $P_{\ell}$ for each $\ell\in\mathcal{L}$ in (6) and define the distribution

[TABLE]

where (Footnote 1: We note that even if we exclude the power types in the set $\Pi^{(n)}$, which is defined later in (58), this leads to another valid definition of $r_{\boldsymbol{Y}^{n}}(\mathbf{y}^{n})$.)

[TABLE]

The choice of $r_{\boldsymbol{Y}^{n}}$ in (51) is motivated by the choice of the auxiliary output distribution in [18, Sec. X-A] where DMCs are considered. Then, it follows from Proposition 2 and Definition 1 with the identifications $U\equiv W$ , $V\equiv\hat{W}$ , $p_{U,V}\equiv p_{W,\hat{W}}$ , $q_{V}\equiv r_{\hat{W}}$ , $|\mathcal{W}|\equiv M_{\mathrm{fb}}^{*}(\bar{n},\varepsilon,P)$ and $\alpha\equiv{\mathbb{P}}\{\hat{W}\neq W\}\leq\varepsilon$ that

[TABLE]

IV-C Obtaining a Non-Asymptotic Bound from Simplifying the Type-II Error of the Binary Hypothesis Test

Using the DPI of $\beta_{1-\varepsilon}$ by introducing $\boldsymbol{X}^{n}$ and $\boldsymbol{Y}^{n}$ , we have

[TABLE]

where

[TABLE]

by (19). Combining (53), (54) and (50), we have

[TABLE]

Fix any constant $\xi_{n}>0$ to be specified later. Using Lemma 1, (55) and (18), we have

[TABLE]

which together with (52) implies that

[TABLE]

IV-D Splitting the Probability Term into Multiple Terms Corresponding to Different Power Types of $\boldsymbol{X}^{n}$

Define (Footnote 2: The conclusion of this proof remains unchanged if the $n^{1/6}$ term in (58) is replaced by $n^{a}$ for any $a\in(0,1/2)$.)

[TABLE]

to be the set of power allocation vectors in $\mathcal{S}^{(n)}$ that are close to the optimal power allocation vector $\mathbf{P}^{*}$ (cf. (7)). Following (57), we use (49) to obtain

[TABLE]

In order to bound the first term in (59), we let

[TABLE]

and let $p_{\boldsymbol{X}^{n},\boldsymbol{Y}^{n}}^{*}$ be the distribution induced by the $(\gamma,\mathbf{P}^{*})$ -modified code based on the $(n,M_{\mathrm{fb}}^{*}(\bar{n},\varepsilon,P),P,\varepsilon)$ -feedback code defined in Definition 8. Then, consider the following chain of inequalities:

[TABLE]

where

• (61) is due to Lemma 3 and the fact that ${\mathbb{P}}_{p_{\boldsymbol{X}^{n},\boldsymbol{Y}^{n}}}\left\{\sum_{\ell=1}^{L}\sum_{k=1}^{n}X_{\ell,k}^{2}=nP\right\}=1$ (cf. (47) and (49)).

• (62) is due to the definition of $r_{\boldsymbol{Y}^{n}}$ in (51).

Similarly, in order to bound the second term in (59), we let $p_{\boldsymbol{X}^{n},\boldsymbol{Y}^{n}}^{(\mathbf{s})}$ be the distribution induced by the $(0,\mathbf{s})$ -modified code and consider the following chain of inequalities for each $\mathbf{s}\in\mathcal{S}^{(n)}\setminus\Pi^{(n)}$ :

[TABLE]

where

• (63) is due to Lemma 3.

• (64) is due to the definition of $r_{\boldsymbol{Y}^{n}}$ in (51).

Combining (59), (62), (64) and the definition of $q_{\boldsymbol{Y}|\boldsymbol{X}}$ in (16) followed by letting

[TABLE]

for each $\ell\in\mathcal{L}$ and each $k\in\{1,2,\ldots,n\}$ , we obtain

[TABLE]

where $\mathrm{C}(\cdot)$ is as defined in (4). In order to simplify the RHS of (66), we define $\xi_{n}>0$ such that

[TABLE]

In addition, for each $\mathbf{d}\in\mathbb{R}_{+}^{L}$ , let

[TABLE]

for each $k\in\{1,2,\ldots,n\}$ . By using (66), (67) and (68) together with the facts by (37) that

[TABLE]

and

[TABLE]

for each $\mathbf{s}\in\mathcal{S}^{(n)}$ , we can express (66) as

[TABLE]

IV-E Applying Curtiss’ Theorem When $\phi(\boldsymbol{X}^{n})$ is Close to $\mathbf{P}^{*}$

In order to simplify the first term in (71), we define

[TABLE]

for each $k\in\{1,2,\ldots,n\}$ and want to show that

[TABLE]

for all $t\in\mathbb{R}$ where

[TABLE]

To this end, recall the following statements due to the channel law:

(i) $Z_{\ell,k}\sim\mathcal{N}(z_{\ell,k};0,N_{\ell})$ for all $\ell\in\mathcal{L}$ and all $k\in\{1,2,\ldots,n\}$ ;

(ii) $\{Z_{\ell,k}|\ell\in\mathcal{L},k\in\{1,2,\ldots,n\}\}$ are independent;

(iii) $\boldsymbol{Z}_{k}$ and $(\boldsymbol{X}^{k},\boldsymbol{Y}^{k-1},\boldsymbol{Z}^{k-1})$ are independent for all $k\in\{1,2,\ldots,n\}$ .

For any $t\in\mathbb{R}$ and any $n\in\{L+2,L+3,\ldots\}$ such that $n\geq t^{2}$ , since

[TABLE]

by (69) and $P_{\ell}+N_{\ell}+\frac{tP_{\ell}}{\sqrt{n}}>0$ for all $\ell\in\mathcal{L}$ , we have

[TABLE]

In order to simplify the above chain of inequalities, we need the following lemma, whose proof is deferred to Appendix B because it involves straightforward calculations based on (68), (72) and the channel law.

Lemma 4

For any $\lambda\in\mathbb{R}$ , we have

[TABLE]

Lemma 4, which forms the crux of the proof of Theorem 1, is important because it establishes the equivalence in distribution between the sum $\sum_{k=1}^{n}U_{k}^{(\mathbf{P}^{*})}$ , which contains dependent random variables, and the sum $\sum_{k=1}^{n}V_{k}^{(\mathbf{P}^{*})}$ , which contains independent random variables. The former is intractable to analyze while the latter can be analyzed in a straightforward manner by invoking the central limit theorem.

Using Lemma 4, we can simplify (77) through the identification $\lambda\equiv\frac{t}{\sqrt{n}}$ and obtain

[TABLE]

Combining (80) and (60), we conclude that (73) holds for each $t\in\mathbb{R}$ . Since the moment generating functions of $\frac{1}{\sqrt{n}}\sum_{k=1}^{n}V_{k}^{(\mathbf{P}^{*})}$ and $\frac{1}{\sqrt{n}}\sum_{k=1}^{n}U_{k}^{(\mathbf{P}^{*})}$ converge to the same function, it follows from Curtiss’ theorem [16, Th. 3] (as stated in Theorem 2) that

[TABLE]

Recognizing that $\big{\{}V_{k}^{(\mathbf{P}^{*})}\big{\}}_{k=1}^{\infty}$ are independent and identically distributed zero-mean random variables with variance $\mathrm{V}(\mathbf{P}^{*})$ by the definition of $V_{k}^{(\mathbf{P}^{*})}$ in (72) and the definition of $\mathrm{V}(\mathbf{P}^{*})$ in (10), we apply the central limit theorem [11] and obtain

[TABLE]

which together with (81) implies that

[TABLE]

IV-F Applying Large Deviation Bounds When $\phi(\boldsymbol{X}^{n})$ is Far from $\mathbf{P}^{*}$

In order to bound the second term in (71), we consider a fixed $n\in\{L+2,L+3,\ldots\}$ and want to show that there exists some $\kappa>0$ such that

[TABLE]

for all $\mathbf{s}\in\mathcal{S}^{(n)}$ . To this end, we first define the Lagrangian function $f:\mathbb{R}^{L}\rightarrow\mathbb{R}$ as

[TABLE]

where $\Lambda\geq 0$ is the unique number that satisfies (5) and (6) and $\mu_{\ell}\geq 0$ is defined for each $\ell\in\mathcal{L}$ as

[TABLE]
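For readers who want to compute $\Lambda$ and $\mathbf{P}^{*}$ numerically, the following Python sketch finds a water level by bisection. Equations (5) and (6) are not reproduced in this section, so the sketch assumes the standard water-filling form $P_{\ell}^{*}=\max\{\Lambda-N_{\ell},0\}$ with $\sum_{\ell=1}^{L}P_{\ell}^{*}=P$; the function name and the iteration count are hypothetical choices.

```python
def water_filling(noise, P, iterations=100):
    """Bisection for the water level under the assumed standard water-filling.

    noise: list of positive noise variances N_1, ..., N_L.
    P: total power budget (P > 0).
    Returns (level, powers) with powers[l] = max(level - noise[l], 0)
    and sum(powers) approximately equal to P.
    """
    lo = min(noise)          # at this level no power is used
    hi = max(noise) + P      # at this level at least P is used
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        used = sum(max(mid - N, 0.0) for N in noise)
        if used > P:
            hi = mid         # too much water: lower the level
        else:
            lo = mid         # not enough water: raise the level
    level = 0.5 * (lo + hi)
    return level, [max(level - N, 0.0) for N in noise]


if __name__ == "__main__":
    level, powers = water_filling([1.0, 2.0, 5.0], 3.0)
    print(level, powers)
```

In the example, the two least noisy channels share the power and the noisiest channel is left unused, which is the situation in which a nonnegative multiplier such as $\mu_{\ell}$ is needed for the inactive channel.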

Define $N_{\text{max}}\stackrel{{\scriptstyle\text{def}}}{{=}}\max\limits_{\ell\in\mathcal{L}}N_{\ell}$ . Then for all $\mathbf{s}\in\mathcal{S}^{(n)}$ , we use Taylor’s theorem to obtain

[TABLE]

for some $\bar{\mathbf{s}}$ that lies on the line segment that connects $\mathbf{s}$ and $\mathbf{P}^{*}$ , where $\triangledown f(\mathbf{P}^{*})$ denotes the gradient which satisfies

[TABLE]

and $\triangledown^{2}f(\bar{\mathbf{s}})$ denotes the Hessian matrix that satisfies

[TABLE]

For the sake of completeness, the derivations of (88) and (89) are contained in Appendix C. Combining (87), (88) and (89), we have for all $\mathbf{s}\in\mathcal{S}^{(n)}$

[TABLE]

which together with the definitions of $f$ and $\mu_{\ell}$ in (85) and (86) respectively implies that

[TABLE]

Consequently, (84) holds by setting

[TABLE]

Following (71), we consider for each $\mathbf{s}\in\mathcal{S}^{(n)}\setminus\Pi^{(n)}$

[TABLE]

where

• (94) is due to (84).

• (95) follows from the definition of $\Pi^{(n)}$ in (58).

Following the standard approach for obtaining large deviation bounds, we apply Markov’s inequality on the RHS of (95) and obtain for each $\mathbf{s}\in\mathcal{S}^{(n)}\setminus\Pi^{(n)}$

[TABLE]

In order to bound the RHS of (96), consider the following chain of inequalities for each $\mathbf{s}\in\mathcal{S}^{(n)}\setminus\Pi^{(n)}$ :

[TABLE]

where

• (97) follows from straightforward calculations based on the definition of $U_{k}^{(\mathbf{s})}$ in (68), the property of $p_{\boldsymbol{X}^{n},\boldsymbol{Y}^{n}}^{(\mathbf{s})}$ in (70) and the channel law, which are elaborated in Appendix D for the sake of completeness.

• (99) is due to the fact that $(1-\frac{1}{x})^{x}\leq e^{-1}$ for all $x>1$ .

Combining (96) and (101), we have the following large deviation bound for each $\mathbf{s}\in\mathcal{S}^{(n)}\setminus\Pi^{(n)}$ :

[TABLE]
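The following Python sketch illustrates, on a toy example, why a bound obtained by applying Markov's inequality to an exponential moment can survive multiplication by polynomially many terms while a Chebyshev-type bound cannot. The toy sum $W_{n}=\frac{1}{\sqrt{n}}\sum_{k=1}^{n}(Z_{k}^{2}-1)/\sqrt{2}$ with i.i.d. standard normal $Z_{k}$, the threshold $n^{1/6}$, the count $(n+1)^{L-1}$ and the function names are assumptions for illustration; they are not the quantities appearing in (102).

```python
import math


def log_mgf_normalized_sum(t, n):
    """log E[exp(t * W_n)] for the assumed toy sum W_n (unit variance)."""
    s = t / math.sqrt(2.0 * n)
    if s >= 0.5:
        raise ValueError("the MGF is finite only for t < sqrt(n / 2)")
    return n * (-s - 0.5 * math.log1p(-2.0 * s))


def chernoff_tail_bound(a, n):
    """Markov's inequality applied to exp(t * W_n) with the choice t = a:
    P{W_n >= a} <= exp(-a * a) * E[exp(a * W_n)].
    """
    return math.exp(-a * a + log_mgf_normalized_sum(a, n))


def chebyshev_tail_bound(a):
    """Chebyshev's inequality for a zero-mean, unit-variance variable."""
    return min(1.0, 1.0 / (a * a))


if __name__ == "__main__":
    n, L = 10**6, 3
    a = n ** (1.0 / 6.0)           # threshold growing like n^(1/6)
    count = (n + 1) ** (L - 1)     # polynomially many terms (assumed)
    print(count * chernoff_tail_bound(a, n))
    print(count * chebyshev_tail_bound(a))
```

In this toy setting the exponential bound decays roughly like $e^{-a^{2}/2}$, so its product with the polynomial count is negligible, whereas the Chebyshev bound decays only like $a^{-2}$ and the product exceeds one.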

Following (71), we use (102) and (48) to obtain

[TABLE]

IV-G Combining Earlier Bounds to Obtain an Upper Bound on $\mathrm{L}_{\varepsilon}^{\text{fb}}$

Combining (57), (67), (71), (83), (103) and (48), we have

[TABLE]

for all sufficiently large $n$ , which together with (46) implies that

[TABLE]

Since $\tau>0$ is arbitrary, it follows from (105) and Definition 5 that

[TABLE]

V Concluding Remarks

V-A Novel Ingredients in Proof of Theorem 1

As mentioned in Section I-A, the proof of [8, Th. 2] which obtains upper bounds on the second-order asymptotics of DMCs with feedback cannot be generalized to the parallel Gaussian channel with feedback. Indeed, the proof of Theorem 1 follows the standard procedures for obtaining the second-order asymptotics of DMCs without feedback (see, e.g., [5, proof of Th. 50] and [19, Sec. III]) except for the following three novel ingredients:

1. Instead of classifying transmitted codewords into polynomially many type classes based on their empirical distributions, which is generally not possible for channels with continuous input alphabets, we discretize the transmitted power and classify the codewords into polynomially many type classes based on their discretized power types. In particular, the collection of power type classes $\mathcal{S}^{(n)}$ in (47) plays a key role in our analysis, and there are polynomially many power type classes by (48). The details can be found in Section IV-A.

2. Curtiss’ theorem rather than the Berry-Esséen theorem is invoked to bound the information spectrum term (the first term in (71)) related to transmitted codewords whose types are close to the optimal power allocation. In particular, the Berry-Esséen theorem for bounded martingale difference sequences cannot be used to bound the information spectrum term in the presence of feedback because each input symbol $X_{\ell,k}$ belongs to an interval $[-\sqrt{nP},\sqrt{nP}]$ that grows without bound as $n$ increases. Instead, we apply Curtiss’ theorem to show that the distribution of the sum of random variables in the information spectrum term converges to some distribution generated from a sum of i.i.d. random variables (i.e., (73)), thus facilitating the use of the usual central limit theorem [11]. The details can be found in Section IV-E.

3. In order to bound the information spectrum term related to transmitted codewords whose types are far from the optimal power allocation (the second term in (71)), the usual approach is to bound the information spectrum term by an *average* of exponentially many upper bounds where each upper bound is then further simplified by invoking Chebyshev’s inequality [18, Sec. X-A]. However, due to the presence of feedback, the information spectrum term can be expressed only as a sum (instead of an average) of polynomially many upper bounds, as shown in the second term in (71). In order to control the sum of polynomially many upper bounds, we have to resort to large deviation bounds as shown in (102) rather than the weaker Chebyshev’s inequality. The details can be found in Section IV-F.

V-B Major Difficulties in Tightening the Third-Order Term

If the feedback link is absent, the third-order term of the optimal finite blocklength rate $\frac{1}{n}\log M_{\text{fb}}^{*}(n,\varepsilon,P)$ is $\Theta\Big{(}\frac{\log n}{n}\Big{)}$ as shown in (9) in Section I. The third-order term can be obtained by applying the Berry-Esséen theorem to bound an information spectrum term (analogous to the first term in (71)).

In the presence of feedback, Theorem 1 shows that the third-order term is $o\Big{(}\frac{1}{\sqrt{n}}\Big{)}$ . If we want to compute an explicit upper bound on the third-order term using the current proof technique, an intuitive way is to invoke a non-asymptotic version of Curtiss’ theorem that can measure the proximity between two distributions based on the proximity between their moment generating functions. However, such a non-asymptotic version of Curtiss’ theorem does not exist to the best of our knowledge, which prohibits us from strengthening the current $o\Big{(}\frac{1}{\sqrt{n}}\Big{)}$ bound on the third-order term. It is worth noting that (77) and (80) in our proof break down if the moment generating functions are replaced with characteristic functions. If one can find a way to make characteristic functions amenable to our proof approach, then a non-asymptotic version of Lévy’s continuity theorem known as Esséen’s smoothing lemma (see, e.g., [20, Th. 1.5.2]) may be invoked to tighten the third-order term herein.

V-C Future Work

The techniques presented herein may be used to analyze the fixed-error asymptotics of fixed-length codes over parallel DMCs with cost constraint and with feedback. We envision that there will be an added layer of complexity as the method of types [21] is typically used to analyze the fixed-error asymptotics of DMCs with and without cost constraint [22]. Hence, we anticipate that the method of types will need to be applied twice: once for handling power types that specify the power allocation among the parallel channels (as was done in Section IV-A) and once for handling codewords of the same power type but of different empirical distributions (the usual types). In the present work, the latter situation is ameliorated by the fact that the maximum rate achievable by a coding scheme over an AWGN channel is completely determined by the power allocated for the coding scheme.

Appendix A Proof of Lemma 3

Proof:

Let $f_{\ell,k}$ and $\tilde{f}_{\ell,k}$ be the encoding functions of the $(n,M,P)$ -feedback code and the $(\gamma,\mathbf{s})$ -modified code respectively for each $\ell\in\mathcal{L}$ and each $k\in\{1,2,\ldots,n\}$ . For any $w\in\mathcal{W}$ and any $\mathbf{y}^{n}\in\mathbb{R}^{L\times n}$ such that

[TABLE]

and

[TABLE]

it follows from (27), (30) and (33) in Definition 8 that

[TABLE]

Since (109) holds for any $w\in\mathcal{W}$ and $\mathbf{y}^{n}\in\mathbb{R}^{L\times n}$ that satisfy (107) and (108), it follows that (38) holds for all Borel measurable $\mathcal{A}\subseteq\mathbb{R}^{L\times n}\times\mathbb{R}^{L\times n}$ . ∎

Appendix B Proof of Lemma 4

The proof for the AWGN channel case (i.e., $L=1$ ) is contained in [12, Sec. E]. For a general $L\in\mathbb{N}$ , consider the following chain of equalities for each $m=n,n-1,\ldots,1$ :

[TABLE]

where

• (111) is due to the fact that $\boldsymbol{Z}_{m}$ and $(\boldsymbol{X}^{m},\boldsymbol{Z}^{m-1})$ are independent.

• (112) is due to the definition of $U_{k}^{(\mathbf{P}^{*})}$ in (68).

Applying (112) recursively from $m=n$ to $m=1$ , we have

[TABLE]

On the other hand, straightforward calculations based on the definition of $V_{k}^{(\mathbf{P}^{*})}$ in (72) and the fact that $\{\boldsymbol{Z}_{k}\}_{k=1}^{n}$ are independent imply that

[TABLE]

Combining (113) and (114), we obtain (78).

Appendix C Derivations of (88) and (89)

Straightforward calculations based on (85) reveal that, for all $\mathbf{s}\in\mathbb{R}_{+}^{L}$ ,

[TABLE]

and $\triangledown^{2}f(\mathbf{s})$ is a diagonal matrix that satisfies

[TABLE]

Combining (115), (5), (6) and (86), we have $\triangledown f(\mathbf{P}^{*})=0$ . In addition, for any $\mathbf{s}$ such that $\sum_{\ell=1}^{L}s_{\ell}\leq P$ , it follows from (116) that $N_{\ell}+s_{\ell}\leq N_{\max}+P$ for all $\ell\in\mathcal{L}$ , which then implies that (89) holds for all $\mathbf{s}\in\mathcal{S}^{(n)}$ .

Appendix D Derivation of (97)

Let $t=\frac{1}{\sqrt{n}}$ . Fix any $\mathbf{s}\in\mathcal{S}^{(n)}\setminus\Pi^{(n)}$ . Due to (70), it suffices to show that

[TABLE]

Replacing $\mathbf{P}^{*}$ with $\mathbf{s}$ in the steps leading to (112) and (113), we obtain (117).

Acknowledgments

The authors would like to thank the Associate Editor Prof. Shun Watanabe and the two anonymous reviewers for the useful comments that improved the presentation of this paper.

References

[1] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: John Wiley and Sons, 2006.
[2] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge, U.K.: Cambridge University Press, 2012.
[3] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge, U.K.: Cambridge University Press, 2005.
[4] C. E. Shannon, “Communication in the presence of noise,” Proceedings of IRE, vol. 37, no. 1, pp. 10–21, 1949.
[5] Y. Polyanskiy, “Channel coding: Non-asymptotic fundamental limits,” Ph.D. dissertation, Princeton University, 2010.
[6] V. Y. F. Tan and M. Tomamichel, “The third-order term in the normal approximation for the AWGN channel,” IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2430–2438, 2015.
[7] C. E. Shannon, “The zero error capacity of a noisy channel,” IRE Transactions on Information Theory, vol. 2, no. 3, pp. 8–19, 1956.
[8] Y. Altuğ and A. B. Wagner, “Feedback can improve the second-order coding performance in discrete memoryless channels,” in Proc. IEEE Intl. Symp. Inf. Theory, Honolulu, HI, Jul 2014, pp. 2361–2365.
