Feedback Capacity of Stationary Gaussian Channels Further Examined

Tao Liu; Guangyue Han

arXiv:1702.03059·cs.IT·January 10, 2018

TL;DR

This paper investigates the feedback capacity of stationary Gaussian channels, proving the uniqueness of optimal solutions for non-white noise and providing algorithms and explicit formulas for calculating feedback capacity in autoregressive moving-average noise models.

Contribution

It establishes the uniqueness of the optimal solution for the feedback capacity problem when noise is not white and introduces an efficient recursive algorithm for its computation.

Findings

01. Optimal solution is unique for non-white Gaussian noise.

02. Proposed recursive algorithm converges and is computationally efficient.

03. Explicit formulas for feedback capacity in ARMA noise models for k=1,2 cases.

Abstract

It is well known that the problem of computing the feedback capacity of a stationary Gaussian channel can be recast as an infinite-dimensional optimization problem; moreover, necessary and sufficient conditions for the optimality of a solution to this optimization problem have been characterized, and based on this characterization, an explicit formula for the feedback capacity has been given for the case that the noise is a first-order autoregressive moving-average Gaussian process. In this paper, we further examine the above-mentioned infinite-dimensional optimization problem. We prove that unless the Gaussian noise is white, its optimal solution is unique, and we propose an algorithm to recursively compute the unique optimal solution, which is guaranteed to converge in theory and features an efficient implementation for a suboptimal solution in practice. Furthermore, for the case that…

Equations

$$Y_{i}=X_{i}(M,Y_{1}^{i-1})+Z_{i},\quad i=1,2,\dots$$

$$\frac{1}{n}\sum_{i=1}^{n}E\left[(X_{i}(M,Y_{1}^{i-1}))^{2}\right]\leq P.$$

$$C_{FB,n}=\max_{\mathrm{tr}(K_{X,n})\leq nP}\frac{1}{2n}\log\frac{\det(K_{Y,n})}{\det(K_{Z,n})},$$

$$C_{FB,n}=\max_{B_{n},K_{V,n}}\frac{1}{2n}\log\frac{\det((B_{n}+I)K_{Z,n}(B_{n}+I)^{T}+K_{V,n})}{\det(K_{Z,n})},$$

$$\mathrm{tr}(B_{n}K_{Z,n}B_{n}^{T}+K_{V,n})\leq nP,$$

$$C_{FB}=\lim_{n\to\infty}C_{FB,n}.$$

$$C_{FB}=\max_{B}\frac{1}{2}\int_{-\pi}^{\pi}\log\left(|1+B(e^{i\theta})|^{2}S_{Z}(e^{i\theta})\right)\frac{d\theta}{2\pi},$$

$$\int_{-\pi}^{\pi}|B(e^{i\theta})|^{2}S_{Z}(e^{i\theta})\frac{d\theta}{2\pi}\leq P.$$

$$\int_{-\pi}^{\pi}|B^{\star}(e^{i\theta})|^{2}S_{Z}(e^{i\theta})\frac{d\theta}{2\pi}=P;$$

$$\eta:=\operatorname*{ess\,inf}_{\theta\in[-\pi,\pi)}|1+B^{\star}(e^{i\theta})|^{2}S_{Z}(e^{i\theta})>0;$$

$$\frac{\lambda}{1+B^{\star}(e^{i\theta})}-B^{\star}(e^{-i\theta})S_{Z}(e^{i\theta})$$

$$S_{Z}(e^{i\theta})=|H_{Z}(e^{i\theta})|^{2}=\left|\frac{P(e^{i\theta})}{Q(e^{i\theta})}\right|^{2}=\left|\frac{\prod_{i=1}^{k}(1+\alpha_{i}e^{i\theta})}{\prod_{i=1}^{k}(1+\beta_{i}e^{i\theta})}\right|^{2}.$$

$$B(e^{i\theta})=b(e^{i\theta})\frac{R(e^{i\theta})}{P(e^{i\theta})}-1,$$

$$b(z)=\frac{A(z)}{A^{\#}(z)}=\frac{\prod_{n}(1-\gamma_{n}^{-1}z)}{\prod_{n}(1-\gamma_{n}z)}$$

$$\int_{-\pi}^{\pi}|B^{\star}(e^{i\theta})|^{2}S_{Z}(e^{i\theta})\frac{d\theta}{2\pi}=P;$$

$$0<S_{Y}^{\star}(\gamma_{n})=\lambda\leq\min_{\theta\in[-\pi,\pi)}S_{Y}^{\star}(e^{i\theta});$$

$$P(z)A^{\#}(z)-R(z)A(z)$$

$$C_{FB}=-\frac{1}{2}\log x^{2},$$

$$Px^{2}=\frac{(1-x^{2})(1+\alpha x)^{2}}{(1+\beta x)^{2}},$$

$$x\in\begin{cases}(-1,0)&\text{if }\alpha\geq\beta,\\(0,1)&\text{if }\alpha<\beta.\end{cases}$$

$$C(e^{i\theta})\triangleq B(e^{i\theta})H_{Z}(e^{i\theta});$$

$$S_{Y}(e^{i\theta})=|C(e^{i\theta})+H(e^{i\theta})|^{2},$$

$$C_{FB}=\max_{C}\frac{1}{2}\int_{-\pi}^{\pi}\log|C(e^{i\theta})+H(e^{i\theta})|^{2}\frac{d\theta}{2\pi},$$

$$\int_{-\pi}^{\pi}|C(e^{i\theta})|^{2}\frac{d\theta}{2\pi}\leq P.$$

$$\int_{-\pi}^{\pi}|C^{\star}(e^{i\theta})|^{2}\frac{d\theta}{2\pi}=P;$$

$$\eta:=\operatorname*{ess\,inf}_{\theta\in[-\pi,\pi)}|C^{\star}(e^{i\theta})+H(e^{i\theta})|^{2}>0;$$

$$\frac{\lambda}{C^{\star}(e^{i\theta})+H(e^{i\theta})}-C^{\star}(e^{-i\theta})$$

$$\mathbb{D}=\{z\in\mathbb{C}:|z|<1\},$$

$$\partial\mathbb{D}=\{z\in\mathbb{C}:|z|=1\},\quad\overline{\mathbb{D}}=\{z\in\mathbb{C}:|z|\leq 1\}.$$

$$\oint_{\partial\mathbb{D}}\frac{f(z)}{(z-z_{0})^{n+1}}\frac{dz}{2\pi i}=\frac{f^{(n)}(z_{0})}{n!},$$

Taxonomy

Topics: Polynomial and algebraic computation · Markov Chains and Monte Carlo Methods · Advanced Optimization Algorithms Research

Full text

Feedback Capacity of Stationary Gaussian Channels Further Examined

Results in this paper have been partially presented in the 2017 IEEE ISIT [14].

Tao Liu, The University of Hong Kong

Guangyue Han, The University of Hong Kong

Abstract

It is well known that the problem of computing the feedback capacity of a stationary Gaussian channel can be recast as an infinite-dimensional optimization problem; moreover, necessary and sufficient conditions for the optimality of a solution to this optimization problem have been characterized, and based on this characterization, an explicit formula for the feedback capacity has been given for the case that the noise is a first-order autoregressive moving-average Gaussian process. In this paper, we further examine the above-mentioned infinite-dimensional optimization problem. We prove that unless the Gaussian noise is white, its optimal solution is unique, and we propose an algorithm to recursively compute the unique optimal solution, which is guaranteed to converge in theory and features an efficient implementation for a suboptimal solution in practice. Furthermore, for the case that the noise is a $k$ -th order autoregressive moving-average Gaussian process, we give a relatively more explicit formula for the feedback capacity; more specifically, the feedback capacity is expressed as a simple function evaluated at a solution to a system of polynomial equations, which is amenable to numerical computation for the cases $k=1,2$ and possibly beyond.

1 Introduction

We consider the following additive Gaussian channel with feedback

$$Y_{i}=X_{i}(M,Y_{1}^{i-1})+Z_{i},\quad i=1,2,\dots, \tag{1}$$

where $M$ denotes the message to be communicated through the channel, the noise $\{Z_{i}\}$ , which is independent of $M$ , is a zero mean stationary Gaussian process, and $X_{i}$ , the channel input at time $i$ , may depend on $M$ and previous channel outputs $Y_{1}^{i-1}$ . We assume that the channel input $\{X_{i}\}$ satisfies the following average power constraint: there is $P>0$ such that for all $n$ ,

$$\frac{1}{n}\sum_{i=1}^{n}E\left[(X_{i}(M,Y_{1}^{i-1}))^{2}\right]\leq P.$$

Let $C_{FB}$ denote the capacity of the channel (1), which is often referred to as Gaussian feedback capacity in the literature.

It is well known that the non-feedback capacity of (1) can be obtained through the power spectral density (PSD) water-filling method [22]. When the channel noise is white (i.e., $\{Z_{i}\}$ is i.i.d.), Shannon [23] showed that feedback does not increase capacity, which means that, like its non-feedback counterpart, the feedback capacity features an explicit and simple formula (we note that in [8], [9], Kadota, Zakai and Ziv also proved this statement for continuous-time white Gaussian channels). On the other hand, if the noise is not white, feedback may increase capacity (see [15], [16]), and little has been known about the feedback capacity despite a number of papers [4], [17], [6], [3] relating the two capacities. Computing $C_{FB}$ has been a long-standing open problem of fundamental importance in information theory.

A prominent approach to tackling Gaussian feedback capacity can be found in the pioneering work [3], where Cover and Pombra characterized the capacity through the sequence of the so-called “ $n$ -block feedback capacity”:

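$$C_{FB,n}=\max_{\mathrm{tr}(K_{X,n})\leq nP}\frac{1}{2n}\log\frac{\det(K_{Y,n})}{\det(K_{Z,n})}, \tag{2}$$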

where $K_{X,n}$ , $K_{Y,n}$ , $K_{Z,n}$ stand for the covariance matrices of $X^{n}$ , $Y^{n}$ and $Z^{n}$ , respectively. It is also shown that the maximization can be taken over $X^{n}$ of the special form $X^{n}=B_{n}Z^{n}+V^{n}$ , where $B_{n}$ is a strictly lower-triangular $n\times n$ matrix and the Gaussian vector $V^{n}$ is independent of $Z^{n}$ . So, (2) can be rewritten as

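$$C_{FB,n}=\max_{B_{n},K_{V,n}}\frac{1}{2n}\log\frac{\det((B_{n}+I)K_{Z,n}(B_{n}+I)^{T}+K_{V,n})}{\det(K_{Z,n})}, \tag{3}$$

subject to the constraint

$$\mathrm{tr}(B_{n}K_{Z,n}B_{n}^{T}+K_{V,n})\leq nP,$$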

where $K_{V,n}$ is a positive semi-definite $n\times n$ matrix. Then, using the asymptotic equipartition property for arbitrary (non-stationary non-ergodic) Gaussian processes, a coding theorem can be proved to characterize the Gaussian feedback capacity as the limiting expression below:

$$C_{FB}=\lim_{n\to\infty}C_{FB,n}.$$
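As a rough numerical illustration of the $n$ -block formulation above, the following Python sketch evaluates the objective $\frac{1}{2n}\log\frac{\det((B_{n}+I)K_{Z,n}(B_{n}+I)^{T}+K_{V,n})}{\det(K_{Z,n})}$ and the consumed power $\mathrm{tr}(B_{n}K_{Z,n}B_{n}^{T}+K_{V,n})$ for one candidate pair $(B_{n},K_{V,n})$ . It does not perform the maximization, so any feasible pair only yields a lower bound on $C_{FB,n}$ . The MA( $1$ ) noise model, the function names `ma1_covariance` and `block_objective`, and the use of natural logarithms (nats) are assumptions made for the example and are not taken from the paper.

```python
import numpy as np


def ma1_covariance(n, alpha):
    """Covariance K_{Z,n} of the illustrative MA(1) noise Z_i = U_i + alpha * U_{i-1},
    where {U_i} is white with unit variance (hypothetical example noise)."""
    off_diagonal = alpha * np.ones(n - 1)
    return (1.0 + alpha**2) * np.eye(n) + np.diag(off_diagonal, 1) + np.diag(off_diagonal, -1)


def block_objective(B, K_V, K_Z):
    """Return (objective in nats, consumed power) for a candidate pair (B_n, K_{V,n})."""
    n = K_Z.shape[0]
    # B_n must be strictly lower triangular, so X_i only uses past noise samples.
    if not np.allclose(B, np.tril(B, k=-1)):
        raise ValueError("B must be strictly lower triangular")
    identity = np.eye(n)
    K_Y = (B + identity) @ K_Z @ (B + identity).T + K_V
    # slogdet is numerically safer than log(det(...)) for larger n.
    _, logdet_Y = np.linalg.slogdet(K_Y)
    _, logdet_Z = np.linalg.slogdet(K_Z)
    objective = (logdet_Y - logdet_Z) / (2.0 * n)
    power = np.trace(B @ K_Z @ B.T + K_V)
    return objective, power


# No feedback (B_n = 0) and white input K_{V,n} = P * I over white noise (alpha = 0):
# the objective equals 0.5 * log(1 + P) and the power equals n * P.
n, P = 8, 2.0
objective, power = block_objective(np.zeros((n, n)), P * np.eye(n), ma1_covariance(n, 0.0))
```

A candidate is feasible only if the returned power is at most $nP$ ; searching over $(B_{n},K_{V,n})$ is a separate optimization step that this sketch leaves out.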

Though considerable efforts have been devoted to following up on the Cover-Pombra formulation, a “computable” formula for the Gaussian feedback capacity does not seem to be within sight: it is already difficult to find the sequence of the optimal $\{B_{n},K_{V,n}\}$ achieving $\{C_{FB,n}\}$ , and its limiting behavior seems to be equally elusive.

Another prominent approach came along in a recent work of Kim [11], which led to a number of breakthroughs deepening our understanding of Gaussian feedback capacity. Roughly speaking, instead of examining the channel (1) over a finite time window, Kim justifies certain interchanges between limits and integrals when evaluating (3) and (4) and recasts the problem of computing $C_{FB}$ as an infinite-dimensional optimization problem. Below, we state one of the theorems in [11] that is relevant to our results.

Theorem 1.1 (Theorem $4.1$ of [11]).

Suppose that the power spectral density $S_{Z}(e^{i\theta})$ of the Gaussian noise process $\{Z_{i}\}_{i=1}^{\infty}$ is bounded away from 0, and has a canonical spectral factorization $S_{Z}(e^{i\theta})=|H_{Z}(e^{i\theta})|^{2}$ , where $H_{Z}(e^{i\theta})\in\mathcal{H}_{2}$ . Then the feedback capacity $C_{FB}$ is given by

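$$C_{FB}=\max_{B}\frac{1}{2}\int_{-\pi}^{\pi}\log\left(|1+B(e^{i\theta})|^{2}S_{Z}(e^{i\theta})\right)\frac{d\theta}{2\pi}, \tag{5}$$

where the maximum is taken over all strictly causal $B(e^{i\theta})$ satisfying the power constraint

$$\int_{-\pi}^{\pi}|B(e^{i\theta})|^{2}S_{Z}(e^{i\theta})\frac{d\theta}{2\pi}\leq P. \tag{6}$$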

Furthermore, a filter $B^{\star}(e^{i\theta})$ attains the maximum in (5) if and only if

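i) Power:

$$\int_{-\pi}^{\pi}|B^{\star}(e^{i\theta})|^{2}S_{Z}(e^{i\theta})\frac{d\theta}{2\pi}=P;$$

ii) Output spectrum:

$$\eta:=\operatorname*{ess\,inf}_{\theta\in[-\pi,\pi)}|1+B^{\star}(e^{i\theta})|^{2}S_{Z}(e^{i\theta})>0;$$

iii) Strong orthogonality: For some $0<\lambda\leq\eta$ ,

$$\frac{\lambda}{1+B^{\star}(e^{i\theta})}-B^{\star}(e^{-i\theta})S_{Z}(e^{i\theta})$$

is causal.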

Using Theorem 1.1 and relevant tools from the theory of Hardy spaces, Kim further characterized the capacity achieving $B(e^{i\theta})$ for the special case that $\{Z_{i}\}$ is a $k$ -th order autoregressive moving-average (ARMA( $k$ )) Gaussian process. Roughly speaking, the following theorem says that the optimal $B$ must be rational satisfying three conditions corresponding to those in Theorem 1.1.

Theorem 1.2 (Proposition $5.1$ of [11]).

Suppose the noise $\{Z_{i}\}$ is not white and is an ARMA( $k$ ) Gaussian process with parameters $\alpha_{i},\beta_{i}$ , $|\alpha_{i}|<1$ , $|\beta_{i}|<1$ for all $i=1,2,\dots,k$ , namely, it has the power spectral density

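$$S_{Z}(e^{i\theta})=|H_{Z}(e^{i\theta})|^{2}=\left|\frac{P(e^{i\theta})}{Q(e^{i\theta})}\right|^{2}=\left|\frac{\prod_{i=1}^{k}(1+\alpha_{i}e^{i\theta})}{\prod_{i=1}^{k}(1+\beta_{i}e^{i\theta})}\right|^{2}. \tag{7}$$

Then the feedback capacity $C_{FB}$ in (5) is necessarily achieved by a filter $B$ of the form

$$B(e^{i\theta})=b(e^{i\theta})\frac{R(e^{i\theta})}{P(e^{i\theta})}-1, \tag{8}$$

where $R(z)$ is a stable polynomial whose degree is at most $k$ , and

$$b(z)=\frac{A(z)}{A^{\#}(z)}=\frac{\prod_{n}(1-\gamma_{n}^{-1}z)}{\prod_{n}(1-\gamma_{n}z)}$$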

is a normalized Blaschke product of at most $k$ zeros. Furthermore, a filter $B^{\star}(e^{i\theta})$ of the form (8) is optimal if and only if the following hold:

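i) Power:

$$\int_{-\pi}^{\pi}|B^{\star}(e^{i\theta})|^{2}S_{Z}(e^{i\theta})\frac{d\theta}{2\pi}=P;$$

ii) Output spectrum: For all zeros $\gamma_{n}$ of $b(z)$ ,

$$0<S_{Y}^{\star}(\gamma_{n})=\lambda\leq\min_{\theta\in[-\pi,\pi)}S_{Y}^{\star}(e^{i\theta});$$

iii) Factorization:

$$P(z)A^{\#}(z)-R(z)A(z)$$

has a factor $Q(z)$ .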

When applied to the case $k=1$ , Theorem 1.2 readily yields a rather tractable expression for the capacity achieving $B$ and gives a simple and explicit formula for $C_{FB}$ , as detailed in the following theorem.

Theorem 1.3 (Theorem $5.3$ in [11]).

Suppose the noise process $\{Z_{i}\}$ is an ARMA( $1$ ) Gaussian process with parameters $\alpha$ and $\beta$ , $|\alpha|<1$ , $|\beta|<1$ . Then, the Gaussian feedback capacity is given by

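$$C_{FB}=-\frac{1}{2}\log x^{2}, \tag{9}$$

where $x$ is the unique root of the following fourth-order polynomial

$$Px^{2}=\frac{(1-x^{2})(1+\alpha x)^{2}}{(1+\beta x)^{2}}, \tag{10}$$

satisfying

$$x\in\begin{cases}(-1,0)&\text{if }\alpha\geq\beta,\\(0,1)&\text{if }\alpha<\beta.\end{cases} \tag{11}$$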
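The formula in Theorem 1.3 lends itself to direct numerical evaluation. The sketch below, offered only as an illustration, clears the denominator of $Px^{2}=\frac{(1-x^{2})(1+\alpha x)^{2}}{(1+\beta x)^{2}}$ to obtain the quartic $Px^{2}(1+\beta x)^{2}-(1-x^{2})(1+\alpha x)^{2}=0$ , finds its roots with `numpy.roots`, keeps the real root in $(-1,0)$ when $\alpha\geq\beta$ or in $(0,1)$ when $\alpha<\beta$ , and returns $-\frac{1}{2}\log x^{2}$ . The function name `arma1_feedback_capacity`, the tolerance `tol`, and the choice of natural logarithms (nats) are assumptions of this example.

```python
import numpy as np


def arma1_feedback_capacity(P, alpha, beta, tol=1e-9):
    """Evaluate C_FB = -(1/2) log x^2 (in nats) for ARMA(1) noise.

    x is the real root of P x^2 (1 + beta x)^2 = (1 - x^2)(1 + alpha x)^2
    lying in (-1, 0) if alpha >= beta and in (0, 1) if alpha < beta.
    Returns the pair (capacity, x).
    """
    # Coefficient lists are ordered from the highest degree down (NumPy convention).
    lhs = P * np.polymul([1.0, 0.0, 0.0], np.polymul([beta, 1.0], [beta, 1.0]))
    rhs = np.polymul([-1.0, 0.0, 1.0], np.polymul([alpha, 1.0], [alpha, 1.0]))
    roots = np.roots(np.polysub(lhs, rhs))

    low, high = (-1.0, 0.0) if alpha >= beta else (0.0, 1.0)
    candidates = [r.real for r in roots if abs(r.imag) < tol and low < r.real < high]
    if len(candidates) != 1:
        raise ValueError("expected exactly one admissible root, found %d" % len(candidates))

    x = candidates[0]
    return -0.5 * np.log(x**2), x


# White-noise sanity check (alpha == beta): the equation reduces to
# (1 + P) x^2 = 1, so the capacity is 0.5 * log(1 + P).
capacity_white, _ = arma1_feedback_capacity(P=3.0, alpha=0.3, beta=0.3)

# AR(1) noise (alpha = 0) with beta = 0.5 and P = 1.
capacity_ar1, x_ar1 = arma1_feedback_capacity(P=1.0, alpha=0.0, beta=0.5)
```

The white-noise case is a convenient check because feedback does not increase capacity there, so the result must match the classical value $\frac{1}{2}\log(1+P)$ .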

We now digress a bit to briefly mention related results on the ARMA( $1$ ) Gaussian feedback capacity in the literature: Generalizing the celebrated Schalkwijk-Kailath scheme [20], [21], Butman [2] obtained a lower bound on the feedback capacity of the AR( $1$ ) channel (a special ARMA( $1$ ) channel with $\alpha=0$ ). Butman’s bound was shown to be optimal under some cases of linear feedback schemes by Wolfowitz [27] and Tiernan [25]. Tiernan and Schalkwijk [26] also found an upper bound on the AR( $1$ ) Gaussian channel capacity, which is equal to Butman’s lower bound at very low and very high signal-to-noise ratio. It was shown in [10] that Butman’s lower bound is indeed the capacity, and the capacity of the MA( $1$ ) channel (a special ARMA( $1$ ) channel with $\beta=0$ ) was also derived in the same paper. More recently, Yang, Kavčić and Tatikonda [28] studied the ARMA( $k$ ) Gaussian channel by analyzing the structure of the optimal input distribution and reformulating the problem as a stochastic control optimization problem. Based on a speculation about the limiting behavior of the optimal input distribution, they derived the formula (9) and conjectured that it gives the ARMA( $1$ ) Gaussian feedback capacity.

As mentioned above, the power of the variational formulation as in Theorem 1.1 has been showcased in Theorem 1.3, where the conjecture of [28] has been confirmed and the ARMA( $1$ ) Gaussian feedback capacity is given as an explicit and simple formula. To the best of our knowledge, the ARMA( $1$ ) Gaussian feedback channel is the only non-trivial scenario whose Gaussian feedback capacity is “explicit”. The success of the variational formulation approach, in contrast to all the other above-mentioned approaches, which have struggled to deal with special cases of an ARMA( $1$ ) channel, naturally poses the question of whether it can be extended to deal with more general channels, for instance, ARMA( $k$ ) Gaussian feedback channels. Attempts in this direction, however, have encountered certain technical barriers, due to the fact that the form in (8) is “less manageable” (see Page $78$ in [11]). As a matter of fact, instead of following the variational formulation framework, an alternative state-space representation approach has been proposed in [11] to deal with the ARMA( $k$ ) Gaussian feedback capacity, only to yield an intractable optimization problem (see Theorem $6.1$ in [11]). Here we remark that prior to [11], a result of a similar nature was also derived in Theorem $6$ of [28], which however appears to be equally intractable.

In this paper, we will position ourselves within Kim’s framework [11] and further examine feedback capacity of a stationary Gaussian channel as in (1). Our starting point is precisely Theorem 1.1, but instead of considering the filter $B(e^{i\theta})$ , we use the method of “change of variables” and consider

$$C(e^{i\theta})\triangleq B(e^{i\theta})H_{Z}(e^{i\theta}); \tag{12}$$

here we note that since $B(e^{i\theta})$ is strictly causal and $H_{Z}(e^{i\theta})\in\mathcal{H}_{2}$ , it is obvious that $C(e^{i\theta})$ is also strictly causal, and thereby can be written as $C(e^{i\theta})=\sum_{k=1}^{\infty}c_{k}e^{ik\theta}$ for some $c_{1},c_{2},\dots\in\mathbb{R}$ . Apparently, (12) can be used to reformulate other quantities, such as the PSD of the channel output

$$S_{Y}(e^{i\theta})=|C(e^{i\theta})+H(e^{i\theta})|^{2}, \tag{13}$$

and eventually reformulate Theorem 1.1 as follows:

Theorem 1.4 (Theorem $4.1$ of [11] reformulated).

Suppose that the power spectral density $S_{Z}(e^{i\theta})$ of the Gaussian noise process $\{Z_{i}\}_{i=1}^{\infty}$ is bounded away from 0, and has a canonical spectral factorization $S_{Z}(e^{i\theta})=|H_{Z}(e^{i\theta})|^{2}$ , where $H_{Z}(e^{i\theta})\in\mathcal{H}_{2}$ . Then the feedback capacity $C_{FB}$ is given by

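$$C_{FB}=\max_{C}\frac{1}{2}\int_{-\pi}^{\pi}\log|C(e^{i\theta})+H(e^{i\theta})|^{2}\frac{d\theta}{2\pi}, \tag{14}$$

where the maximum is taken over all strictly causal $C(e^{i\theta})$ satisfying the power constraint

$$\int_{-\pi}^{\pi}|C(e^{i\theta})|^{2}\frac{d\theta}{2\pi}\leq P. \tag{15}$$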

Furthermore, a $C^{\star}(e^{i\theta})$ attains the maximum in (14) if and only if

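i) Power:

$$\int_{-\pi}^{\pi}|C^{\star}(e^{i\theta})|^{2}\frac{d\theta}{2\pi}=P; \tag{16}$$

ii) Output spectrum:

$$\eta:=\operatorname*{ess\,inf}_{\theta\in[-\pi,\pi)}|C^{\star}(e^{i\theta})+H(e^{i\theta})|^{2}>0; \tag{17}$$

iii) Strong orthogonality: For some $0<\lambda\leq\eta$ ,

$$\frac{\lambda}{C^{\star}(e^{i\theta})+H(e^{i\theta})}-C^{\star}(e^{-i\theta}) \tag{18}$$

is causal.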
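To make the reformulated problem concrete, the following sketch evaluates, for a given strictly causal candidate $C(e^{i\theta})=\sum_{k\geq 1}c_{k}e^{ik\theta}$ , the objective $\frac{1}{2}\int_{-\pi}^{\pi}\log|C+H|^{2}\frac{d\theta}{2\pi}$ and the power $\int_{-\pi}^{\pi}|C|^{2}\frac{d\theta}{2\pi}$ on a uniform grid. It only scores a candidate and does not solve the optimization. The assumptions here are that $H$ stands for the canonical factor $H_{Z}$ , that the noise is ARMA( $1$ ) with $H_{Z}(z)=(1+\alpha z)/(1+\beta z)$ , and that the integrals are approximated by grid averages; the name `evaluate_candidate` is hypothetical.

```python
import numpy as np


def evaluate_candidate(c, alpha, beta, num_points=4096):
    """Score a strictly causal candidate C(e^{i theta}) = sum_{k>=1} c[k-1] e^{i k theta}.

    Assumes ARMA(1) noise with H(z) = (1 + alpha z) / (1 + beta z).
    Returns (objective in nats, power), both approximated by uniform-grid averages.
    """
    theta = -np.pi + 2.0 * np.pi * np.arange(num_points) / num_points
    z = np.exp(1j * theta)
    H = (1.0 + alpha * z) / (1.0 + beta * z)

    coefficients = np.asarray(c, dtype=float)
    exponents = np.arange(1, len(coefficients) + 1)
    # Rows are the terms c_k z^k; summing over rows gives C on the grid.
    C = (coefficients[:, None] * z[None, :] ** exponents[:, None]).sum(axis=0)

    objective = 0.5 * np.mean(np.log(np.abs(C + H) ** 2))
    power = np.mean(np.abs(C) ** 2)
    return objective, power


# With C = 0 the objective is (1/2) * integral of log S_Z, which vanishes for this
# normalized H (H(0) = 1, no zeros or poles in the closed unit disk).
objective_zero, _ = evaluate_candidate([0.0], alpha=0.5, beta=-0.3)

# By Parseval, the power of C equals the sum of squared coefficients: 0.36 + 0.64.
_, power = evaluate_candidate([0.6, 0.8], alpha=0.5, beta=-0.3)
```

A candidate is admissible for a power budget $P$ only when the returned power does not exceed $P$ ; the grid average of the logarithm becomes unreliable if $|C+H|$ comes close to zero on the circle.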

The remainder of the paper is organized as follows. In Section 2, we review relevant results from complex analysis and the theory of Hardy spaces as mathematical preliminaries that will be used in our proofs. Section 3 contains the main results of this paper, which can be roughly summarized as follows:

• We prove in Section 3.1 that unless the noise $\{Z_{n}\}$ is white, the optimal solution to the optimization problem (14) is unique; see Theorem 3.2.

• In Section 3.2, we propose an algorithm to recursively compute the optimal solution, which is guaranteed to converge to the unique optimal solution in theory and features an efficient implementation for a suboptimal solution in practice; see Algorithm 3.5.

• In Section 3.3, we will establish Theorem 3.9, a “more manageable” version of Theorem 1.2 and a natural extension to Theorem 1.3 combined, and derive a relatively more explicit formula for the ARMA( $k$ ) Gaussian feedback capacity as a simple function evaluated at a solution to a system of equations, which is amenable to numerical computation for the cases $k=1,2$ and possibly beyond.

Several examples are given in Section 4. More specifically, Example 4.1 details the fact that Theorem 3.9 naturally extends Theorem 1.3, and Example 4.2 uses Theorem 3.9 to numerically compute the feedback capacity of ARMA( $k$ ) Gaussian channels. Focusing on the application of Algorithm 3.5 to ARMA( $k$ ) Gaussian channels, we discuss its efficient implementation and numerically compute lower bounds on the feedback capacity of ARMA( $3$ ) Gaussian channels.

2 Mathematical Preliminaries

In this section, we review a number of important theorems in complex analysis and the theory of Hardy spaces, which will be used in our proofs and may not be stated in the most general form.

Let $\mathbb{D}$ denote the open unit disk on the complex plane $\mathbb{C}$ , that is,

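$$\mathbb{D}=\{z\in\mathbb{C}:|z|<1\},$$

and let $\partial\mathbb{D}$ and $\overline{\mathbb{D}}$ denote its boundary and closure, respectively, that is,

$$\partial\mathbb{D}=\{z\in\mathbb{C}:|z|=1\},\quad\overline{\mathbb{D}}=\{z\in\mathbb{C}:|z|\leq 1\}.$$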

We first review two fundamental theorems in complex analysis, which are relatively better-known yet still included for self-containedness.

The following theorem gives the classical Cauchy’s integral formula for an analytic function on $\overline{\mathbb{D}}$ .

Theorem 2.1 (Cauchy’s integral formula).

Let $U$ be an open subset of the complex plane $\mathbb{C}$ which contains $\overline{\mathbb{D}}$ , and let $f:U\to\mathbb{C}$ be an analytic function. Then for any $n\geq 0$ and any $z_{0}\in\mathbb{D}$ , we have

$$\oint_{\partial\mathbb{D}}\frac{f(z)}{(z-z_{0})^{n+1}}\frac{dz}{2\pi i}=\frac{f^{(n)}(z_{0})}{n!},$$

where the contour integral is taken counter-clockwise, and the superscript $(n)$ denotes the $n$ -th order complex derivative.
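Cauchy's integral formula is easy to sanity-check numerically. The sketch below parametrizes the unit circle as $z=e^{i\theta}$ , so that $\frac{dz}{2\pi i}=z\frac{d\theta}{2\pi}$ , and approximates the contour integral $\oint_{\partial\mathbb{D}}\frac{f(z)}{(z-z_{0})^{n+1}}\frac{dz}{2\pi i}$ by a uniform-grid average, which is then compared with $f^{(n)}(z_{0})/n!$ . The test function $f=\exp$ , the point $z_{0}$ , and the helper name `cauchy_integral` are choices made for this example.

```python
import numpy as np


def cauchy_integral(f, z0, n, num_points=2048):
    """Approximate the contour integral of f(z) / (z - z0)^(n+1) dz / (2 pi i)
    over the unit circle, traversed counter-clockwise."""
    theta = 2.0 * np.pi * np.arange(num_points) / num_points
    z = np.exp(1j * theta)
    # With z = e^{i theta}: dz / (2 pi i) = z d(theta) / (2 pi), so the integral
    # becomes the average over theta of f(z) * z / (z - z0)^(n+1).
    return np.mean(f(z) * z / (z - z0) ** (n + 1))


# For f = exp, every derivative is exp itself, so the formula predicts exp(z0) / n!.
z0 = 0.3 + 0.2j
approximation = cauchy_integral(np.exp, z0, n=2)
prediction = np.exp(z0) / 2.0  # 2! = 2
```

For integrands that are analytic in a neighborhood of the circle, a uniform grid converges very quickly, so a few thousand points are more than enough here.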

The Cauchy integral formula can be used to establish the following Jensen’s formula.

Theorem 2.2 (Jensen’s formula).

Let $U$ be an open subset of the complex plane $\mathbb{C}$ which contains $\overline{\mathbb{D}}$ . Let $f:U\to\mathbb{C}$ be an analytic function, and let $z_{1},z_{2},\dots,z_{n}$ denote the zeros of $f$ in $\mathbb{D}$ repeated according to multiplicity. Suppose that $f(0)\neq 0$ . Then, we have

[TABLE]

Next, we will review some basic notions, terminology and needed results from the theory of Hardy spaces.

Let $1\leq p<\infty$ and let $f(z)$ be an analytic function on $\mathbb{D}$ . The function $f(z)$ is said to be of class $\mathcal{H}_{p}=\mathcal{H}_{p}(\mathbb{D})$ if

[TABLE]

It is well known that by taking the pointwise radial limit, any $f(z)\in\mathcal{H}_{p}$ can be extended to a function $f(e^{i\theta})\in\mathcal{L}_{p}=\mathcal{L}_{p}(\partial\mathbb{D})$ , where

[TABLE]

When there is no risk of confusion, we will follow the usual convention and identify $f(z)$ and $f(e^{i\theta})$ , which we may oftentimes simply denote by $f$ . Then, $\mathcal{H}_{p}$ can be viewed as a closed vector subspace of $\mathcal{L}_{p}$ .

For any $f\in\mathcal{H}_{p}$ , we say that $f$ is causal (or strictly causal) if its Fourier coefficients $c_{n}$ are equal to $0$ for all $n<0$ (or $n\leq 0$ ), where

[TABLE]

It is well known that $\mathcal{H}_{p}$ is precisely the subset of causal functions in $\mathcal{L}_{p}$ . For a quick example, we note that $\mathcal{H}_{2}$ , represented by infinite sequences indexed by $\mathbb{N}\cup\{0\}$ as

[TABLE]

sits naturally inside the space $\mathcal{L}_{2}$ , which can be represented by bi-infinite sequences indexed by $\mathbb{Z}$ as

[TABLE]
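Causality in the above sense can be inspected numerically through the discrete Fourier transform. The sketch below assumes the usual convention $c_{n}=\int_{-\pi}^{\pi}f(e^{i\theta})e^{-in\theta}\frac{d\theta}{2\pi}$ , approximates these coefficients with `numpy.fft.fft` on a uniform grid, and applies it to the illustrative function $f(z)=(1+\alpha z)/(1+\beta z)$ with $|\alpha|<1$ , $|\beta|<1$ , whose power series is $1+(\alpha-\beta)z-\beta(\alpha-\beta)z^{2}+\cdots$ . The helper names `fourier_coefficients` and `coefficient` are hypothetical.

```python
import numpy as np


def fourier_coefficients(f, num_points=1024):
    """Approximate c_n = integral of f(e^{i theta}) e^{-i n theta} d(theta) / (2 pi)
    for all n, returned in FFT order (index n modulo num_points)."""
    theta = 2.0 * np.pi * np.arange(num_points) / num_points
    return np.fft.fft(f(np.exp(1j * theta))) / num_points


def coefficient(coefficients, n):
    """Look up c_n; negative n wraps around to the end of the FFT output."""
    return coefficients[n % len(coefficients)]


alpha, beta = 0.5, -0.3
coeffs = fourier_coefficients(lambda z: (1.0 + alpha * z) / (1.0 + beta * z))

c_0 = coefficient(coeffs, 0)         # constant term, equal to 1
c_1 = coefficient(coeffs, 1)         # alpha - beta = 0.8
c_minus_1 = coefficient(coeffs, -1)  # vanishes (up to rounding) for a causal function
```

Because $c_{0}=1\neq 0$ , this function is causal but not strictly causal; subtracting the constant term would give a strictly causal function.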

Now, we recall the inner-outer decomposition theorem in the theory of Hardy spaces.

Theorem 2.3 (Theorem 2.8 in [5]).

Every function $f(z)\not\equiv 0$ in $\mathcal{H}_{p}$ has a unique factorization of the form $f(z)=B(z)S(z)F(z)$ , where

• $B(z)$ is a Blaschke product taking the following form:

[TABLE]

where $m$ is a nonnegative integer and $\{z_{n}\}$ is the set of all the zeros of $f(z)$ in $\mathbb{D}$ ,

• $S(z)$ is a singular inner function, which can be represented by the following Poisson-Stieltjes integral:

[TABLE]

where $\mu(\theta)$ is a bounded nondecreasing singular function with $\mu^{\prime}(t)=0$ a.e.,

• $F(z)$ is an outer function taking the following form:

[TABLE]

where $\gamma$ is a real constant.

Remark 2.4.

Note that it can be shown that $B(z)$ as in (19) is analytic on $\mathbb{D}$ with the same set of zeros as $f(z)$ , and $S(z)$ and $F(z)$ are also analytic without any zeros in $\mathbb{D}$ . Furthermore, it is well known (see, e.g., Page $84$ of [12]) that $S(z)\equiv 1$ if and only if

[TABLE]

Roughly speaking, the following theorem says that a function in $\mathcal{H}_{p}$ is uniquely determined by its boundary values on any set of positive measure.

Theorem 2.5 (Theorem $2.2$ in [5]).

Let $f(e^{i\theta})\in\mathcal{H}_{p}$ be not identically $0$ . Then $\{e^{i\theta}|f(e^{i\theta})=0\}$ has measure $0$ (with respect to the Lebesgue measure on $\partial\mathbb{D}$ ). Furthermore, if $f(e^{i\theta}),g(e^{i\theta})\in\mathcal{H}_{p}$ and $f(e^{i\theta})=g(e^{i\theta})$ for all $\theta$ in a positive measure subset $T\subset[-\pi,\pi)$ , then $f(e^{i\theta})=g(e^{i\theta})$ almost everywhere.

3 Main Results

3.1 Uniqueness of Optimal $C(e^{i\theta})$

Recall that $C(e^{i\theta})$ is defined as in (12), and we say $C(e^{i\theta})$ is an optimal solution if it solves the optimization problem (14), namely, it satisfies (15) and achieves the maximum in (14). In this section, we will establish the uniqueness of the optimal $C(e^{i\theta})$ .

We will first need the following lemma.

Lemma 3.1.

Let $C^{\star}(e^{i\theta})$ be an optimal solution to (14). Then, for any $C(e^{i\theta})$ satisfying (15), we have

[TABLE]

Proof.

Note that

[TABLE]

where in deriving (a) we have used the easily verifiable fact that

[TABLE]

Moreover, by (18), we have for almost all $\theta$ ,

[TABLE]

and

[TABLE]

It then follows that for any $C(e^{i\theta})$ satisfying (15),

[TABLE]

where we have used (16) in deriving (b). ∎

The following theorem first shows that all optimal $C(e^{i\theta})$ give rise to the same $S_{Y}(e^{i\theta})$ , the corresponding channel output PSD, and then establishes the uniqueness of optimal $C(e^{i\theta})$ when the channel noise is not white.

Theorem 3.2.

a) For any two optimal $C^{\star}(e^{i\theta})$ and $C^{\star\star}(e^{i\theta})$ , we have, almost everywhere,

[TABLE]

b) Suppose that $\{Z_{n}\}$ is not white, that is, $S_{Z}(e^{i\theta})$ is not a constant function. Then, for any two optimal $C^{\star}(e^{i\theta})$ and $C^{\star\star}(e^{i\theta})$ , we have, almost everywhere,

[TABLE]

Proof.

a) Using the well-known fact that for any $x>0$ ,

[TABLE]

we deduce that for all $\theta$ ,

[TABLE]

and thereby

[TABLE]

where (a) follows from Lemma 3.1 and (b) follows from the fact that the optimal solutions $C^{\star}(e^{i\theta})$ and $C^{\star\star}(e^{i\theta})$ give rise to the same optimal value. It then follows that the first inequality in (24) is in fact an equality, or equivalently,

[TABLE]

which, together with (23), immediately implies that almost everywhere,

[TABLE]

Now, using the fact that $\log x=x-1$ if and only if $x=1$ , we deduce that for almost all $\theta$

[TABLE]

which immediately implies a), as desired.

b) We first consider the optimal solution $C^{\star}$ , which satisfies i), ii) and iii) in Theorem 1.4, which can be alternatively stated below:

• [TABLE]

• For some $\lambda^{\star}>0$

[TABLE]

is causal;

• For almost all $\theta\in[-\pi,\pi)$ ,

[TABLE]

where $\lambda^{\star}$ is as in (26).

From (26), straightforward computations yield that

[TABLE]

Now, we consider the optimal solution $C^{\star\star}$ , which similarly satisfies:

• [TABLE]

• For some $\lambda^{\star\star}>0$

[TABLE]

is causal;

• For almost all $\theta\in[-\pi,\pi)$ ,

[TABLE]

where $\lambda^{\star\star}$ is as in (31).

And parallel to (28) and (29), we have

[TABLE]

Note that, by a), we have almost everywhere,

[TABLE]

Now, using (25), (30) and (35), we deduce that (28)-(29)+(33)-(34) can be simplified as

[TABLE]

or equivalently,

[TABLE]

Note that, by (27), (32) and (35), we have, for almost all $\theta$ ,

[TABLE]

which means the integrand in (37) is non-negative, and thereby must be $0$ , that is,

[TABLE]

for almost all $\theta\in[-\pi,\pi)$ .

We now claim that there exists a positive measure set $T\subset\partial\mathbb{D}$ such that on $T$

[TABLE]

To see this, by way of contradiction, we suppose the opposite is true, that is, almost everywhere,

[TABLE]

which, together with (38), immediately implies that almost everywhere

[TABLE]

Some straightforward computations employing this yield

[TABLE]

which, together with (26), immediately implies that $\overline{H_{Z}}(e^{i\theta})$ is causal. Since $H_{Z}(e^{i\theta})$ is causal, we deduce that $H_{Z}(e^{i\theta})$ is a constant, and thereby $S_{Z}(e^{i\theta})$ is also a constant, a contradiction to the assumption that $\{Z_{n}\}$ is not white.

Now, with the claim in (40), we infer from (39) that on the positive measure set $T\subset\partial\mathbb{D}$ ,

[TABLE]

which, by Theorem 2.5, immediately implies b). ∎

3.2 Computation of Optimal $C(e^{i\theta})$

Assuming $\{Z_{n}\}$ is not white, we give in this section a recursive algorithm to compute the unique optimal solution $C(e^{i\theta})$ .

We will first consider the following optimization problem and establish the uniqueness of its optimal solution:

[TABLE]

where $C^{\star}(e^{i\theta})$ is the unique optimal solution to (14).

Theorem 3.3.

A solution $C^{\star\star}(e^{i\theta})$ to (41) is optimal if and only if the following conditions are satisfied:

i) [TABLE]

ii) For some $\lambda>0$

[TABLE]

is causal;

iii) For almost all $\theta\in[-\pi,\pi)$ ,

[TABLE]

where $\lambda$ is as in (43).

Proof.

The proof is very similar to that of Theorem 1.1, and thus postponed to Appendix A. ∎

Theorem 3.4.

Assume that $\{Z_{n}\}$ is not white. Then the optimal solution to (41) is unique.

Proof.

Note that by Lemma 3.1, we have for any $C(e^{i\theta})$ satisfying (15),

[TABLE]

In other words, other than being the unique optimal solution to (14), $C^{\star}(e^{i\theta})$ is also one of the optimal solutions to (41). Let $C^{\star\star}(e^{i\theta})$ be another optimal solution to (41). Then, by Theorem 3.3, $C^{\star}(e^{i\theta})$ and $C^{\star\star}(e^{i\theta})$ satisfy (42), (43) and (44) with $\lambda^{\star}$ and $\lambda^{\star\star}$ , respectively. Now, an argument completely parallel to that in the proof of Theorem 3.2 will yield

[TABLE]

which will collectively imply

[TABLE]

and furthermore

[TABLE]

for almost all $\theta\in[-\pi,\pi)$ . The remainder of the proof then uses exactly the same argument as in the proof of Theorem 3.2 to establish

[TABLE]

almost everywhere and thereby the uniqueness of the optimal solution to (41). ∎

Now, we consider the following algorithm to compute the optimal $C(e^{i\theta})$ via recursively solving a sequence of optimization problems:

Algorithm 3.5.

1) Arbitrarily choose $C^{(0)}(e^{i\theta})\in\mathcal{H}_{2}$ satisfying

[TABLE]

2) For $n=0,1,\dots$ , solve the following optimization problem

[TABLE]

and then set $C^{(n+1)}(e^{i\theta})$ to be one of the optimal solutions.

3) Set $n=n+1$ and repeat 2).

Obviously, the above recursive procedure yields a sequence of functions $\{C^{(n)}(e^{i\theta})\}$ in $\mathcal{H}_{2}$ . The following theorem discusses the convergence behavior of this sequence.

Theorem 3.6.

Assume that $\{Z_{n}\}$ is not white. If there is a pointwise convergent subsequence $\{C^{(n_{k})}(e^{i\theta})\}$ such that

[TABLE]

then $\{C^{(n_{k})}(e^{i\theta})\}$ must converge to $C^{\star}(e^{i\theta})$ , the unique optimal solution to (14), almost everywhere.

Proof.

First of all, we will show that

[TABLE]

Apparently, we have, for all $i=0,1,\dots$ ,

[TABLE]

which immediately implies that

[TABLE]

So, to show (49), we only need to prove

[TABLE]

To show this, suppose, by way of contradiction, that

[TABLE]

Then, there exist $\delta>0$ and a subsequence $\{C^{(n_{j})}(e^{i\theta})\}_{j=0}^{\infty}$ such that

[TABLE]

for all $j\in\mathbb{N}$ . It then follows from

[TABLE]

that

[TABLE]

But this would imply that the optimal value of the optimization problem is infinity, a contradiction. Therefore, we have established (50) and thereby (49).

Now, let $C^{\infty}(e^{i\theta})$ denote the pointwise limit of the subsequence $\{C^{(n_{k})}(e^{i\theta})\}_{k=0}^{\infty}$ . Applying (27), (48) and (49), we deduce that

[TABLE]

On the other hand, by Lemma 3.1, we have

[TABLE]

for any $C(e^{i\theta})$ satisfying (15). Therefore,

[TABLE]

in other words, $C^{\infty}(e^{i\theta})$ is an optimal solution to the optimization problem (41). Now, by Theorem 3.4, we conclude that almost everywhere

[TABLE]

which completes the proof of the theorem. ∎

Remark 3.7.

Roughly speaking, Theorem 3.6 says that any convergent subsequence produced by Algorithm 3.5 will converge to the optimal solution to (14). Algorithm 3.5 will practically compute the Gaussian feedback capacity if the global minimum of the optimization problem (47) can be computed. Although this is a feasible task for certain special families of channels, we are not aware of any efficient way to solve the optimization problem in (47) for a general stationary Gaussian channel, which is a great impediment to implementing Algorithm 3.5. One effective way to circumvent this issue is to find a local minimum in lieu of the global minimum of (47). Obviously, with such a replacement, the performance of the algorithm is compromised in the sense that it will only produce a suboptimal solution. On the other hand, we have observed that the recursive update in Step 2) provides an effective means to prevent the produced sequence from getting stuck at a locally optimal solution. As a matter of fact, for many practical channels for which we know the capacity (see Section 3.3), the compromised algorithm appears to converge quickly to the true optimal solution; see Example 4.3.

3.3 Optimal $C(e^{i\theta})$ for ARMA( $k$ ) Gaussian Channels

In this section, we generalize Theorem 1.3 and give a more explicit characterization of the optimal solution $C^{\star}(e^{i\theta})$ for the case that $\{Z_{n}\}$ is an ARMA( $k$ ) Gaussian process.

The proof of our main result in this section will use the following lemma, whose proof closely follows that of Proposition $4.2$ in [11] and is included for completeness.

Lemma 3.8.

Suppose that the assumptions of Theorem 1.4 are satisfied. If $C^{\star}$ is an optimal solution to (14), then $\overline{C^{\star}}(C^{\star}+H_{Z})$ is causal.

Proof.

Suppose, by way of contradiction, that $\overline{C^{\star}}(C^{\star}+H_{Z})$ is not causal. Then, for some $n\geq 1$ , we have

[TABLE]

Let $A(e^{i\theta})=xe^{in\theta}$ with $|x|<1$ . Then, for $C^{\star\star}\triangleq(1+A)(C^{\star}+H_{Z})-H_{Z}$ , one verifies that it is also strictly causal, and furthermore,

[TABLE]

By Jensen’s formula, the entropy rate of $S_{Y}^{\star\star}$ is the same as that of $S_{Y}^{\star}$ . On the other hand, the power of $C^{\star\star}$ can be computed as follows:

[TABLE]

where $P_{Y}=\int S_{Y}^{\star}d\theta/2\pi>0$ . Therefore, we can choose a suitable $x$ such that $P^{\star\star}(x)<P$ , i.e., we can achieve the same information rate using less power, which contradicts Condition i) of Theorem 1.4. ∎
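The entropy-rate step in the proof can be written out explicitly. The display below is a sketch under the assumption that the output spectrum is $S_{Y}=|C+H_{Z}|^{2}$ on the unit circle, so that $S_{Y}^{\star\star}=|1+A|^{2}S_{Y}^{\star}$; this normalization is inferred from the construction of $C^{\star\star}$ rather than quoted from the source.

```latex
\begin{align*}
\frac{1}{2\pi}\int_{-\pi}^{\pi}\log S_{Y}^{\star\star}(e^{i\theta})\,d\theta
 &= \frac{1}{2\pi}\int_{-\pi}^{\pi}\log\bigl|1+xe^{in\theta}\bigr|^{2}\,d\theta
  + \frac{1}{2\pi}\int_{-\pi}^{\pi}\log S_{Y}^{\star}(e^{i\theta})\,d\theta \\
 &= 2\log\bigl|1+x\cdot 0^{n}\bigr|
  + \frac{1}{2\pi}\int_{-\pi}^{\pi}\log S_{Y}^{\star}(e^{i\theta})\,d\theta \\
 &= \frac{1}{2\pi}\int_{-\pi}^{\pi}\log S_{Y}^{\star}(e^{i\theta})\,d\theta .
\end{align*}
```

The second equality is Jensen’s formula applied to $f(z)=1+xz^{n}$ , which has no zeros in the closed unit disk when $|x|<1$ and $n\geq 1$ , so only the term $\log|f(0)|=0$ survives.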

We are now ready to state the main result of this section.

Theorem 3.9.

Suppose the noise $\{Z_{i}\}$ is not white with the power spectral density $S_{Z}(e^{i\theta})$ taking the form as in (7). Then, the feedback capacity $C_{FB}$ can be achieved by $C(z)$ taking the following form:

[TABLE]

where $m_{i}$ are positive integers for all $i=1,2,\dots,l$ and $\sum_{i=1}^{l}m_{i}\leq k$ , $x_{i}\in\mathbb{C}$ are all distinct and $|x_{i}|<1$ for all $i=1,2,\dots,l$ , $y_{ij}\in\mathbb{C}$ for all $i$ and $j$ . Furthermore, $C(z)$ is optimal yielding the capacity

[TABLE]

if and only if all $x_{i}$ , $m_{i}$ and $y_{ij}$ satisfy the following four conditions:

i) Power:

[TABLE]

where, as elsewhere in this paper, the parenthesized superscript means the derivative with respect to $z$ ;

ii) Roots: $x_{1},x_{2},\dots,x_{l}$ are the roots of the function

[TABLE]

that are strictly inside the unit circle, while the other roots $r_{1}^{-1},r_{2}^{-1},\dots,r_{k}^{-1}$ are all strictly outside the unit circle;

iii) Strong orthogonality: there exists a real number $\lambda>0$ such that for all $i=1,2,\dots,l$ and $j=1,2,\dots,m_{i}$ ,

[TABLE]

where

[TABLE]

iv) Output spectrum: For almost all $\theta\in[-\pi,\pi)$ ,

[TABLE]

Proof.

By an argument similar to that in the proof of Theorem 1.2, we first show that any capacity achieving $C^{\star}(z)\triangleq\sum_{k=1}^{\infty}c^{\star}_{k}z^{k}$ must take the form in (51). To this end, we consider $\hat{S}_{Y}^{*}(e^{i\theta})\triangleq|Q(e^{i\theta})|^{2}S_{Y}^{\star}(e^{i\theta})$ , which, by straightforward computations, can be rewritten as follows:

[TABLE]

Now, it follows from Lemma 3.8, (53) and the fact that $P(z)$ and $Q(z)$ are both polynomials of degree at most $k$ that $\hat{S}_{Y}^{*}(e^{i\theta})$ must be of the following form:

[TABLE]

Then, by the fact that $\hat{S}_{Y}^{*}(e^{i\theta})$ is symmetric, we deduce that on $\partial\mathbb{D}$ , $\hat{S}_{Y}^{*}$ can be written as

[TABLE]

or alternatively, on $\mathbb{D}$ ,

[TABLE]

Note that $\hat{S}_{Y}^{*}(e^{i\theta})$ has a canonical factorization (see pages 733 and 734 of [18]), namely, it can be written as

[TABLE]

where $\sigma$ is a positive constant and $R(z)$ is a $k$ -th order stable polynomial with $R(0)=1$ . Now, we consider

[TABLE]

Since $C^{\star}(z)+H_{Z}(z)$ is an $\mathcal{H}_{2}$ function and $Q(z),R(z)$ are both stable polynomials, $T(z)$ is an $\mathcal{H}_{2}$ function. It then follows from (55) and (56) that

[TABLE]

which, by (21), implies that the outer function in the inner-outer decomposition of $T(z)$ is the constant function $1$ . Now, by (54) and (56), we have

[TABLE]

It then follows from (54) and the fact that $R(z)$ is a stable polynomial that

[TABLE]

which, by Remark 2.4, implies that $T(z)$ is nothing but a Blaschke product, and furthermore, $C^{\star}(z)+H_{Z}(z)$ must take the following form:

[TABLE]

for some complex numbers $x_{1},x_{2},\dots$ with $|x_{j}|<1$ for all $j$ and $\prod_{j}|x_{j}|^{2}=1/\sigma^{2}$ . By Condition iii) of Theorem 1.4,

[TABLE]

is causal, which means that

[TABLE]

is analytic on $\mathbb{D}$ , which, together with the fact that $C^{\star}(z)+H_{Z}(z)$ has the factor $\prod_{i=1}^{\infty}(1-x_{i}^{-1}z)$ (for this, see (58)), implies that $1-\lambda S_{Y}^{\star}(z)$ must also have the same factor. By symmetry, $1-\lambda S_{Y}^{\star}(z)$ must also have the factor $\prod_{i=1}^{\infty}(1-x_{i}^{-1}z^{-1})$ , which means that all $x_{i}$ and $x_{i}^{-1}$ are zeros of $1-\lambda S_{Y}^{\star}(z)$ . Since $1-\lambda S_{Y}^{\star}(z)$ is a rational spectrum with degree at most $2k$ , it has at most $2k$ zeros. Therefore, we conclude that

[TABLE]

where all $x_{i}$ are distinct with $|x_{i}|<1$ , all $m_{i}$ are positive integers with $\sum_{i=1}^{l}m_{i}\leq k$ .
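The argument above relies on $T(z)$ being a Blaschke product, i.e., an inner function whose modulus equals $1$ on the unit circle. The following is a minimal numerical sketch of that property for a finite product; the function name `blaschke`, the sample zeros and the omission of the usual unimodular phase factors are assumptions made for illustration only.

```python
import cmath
import math

def blaschke(zeros, z):
    """Finite Blaschke product prod_j (z - a_j) / (1 - conj(a_j) z).

    Phase normalisation factors are omitted; they do not affect the modulus.
    """
    prod = 1.0 + 0.0j
    for a in zeros:
        a = complex(a)
        prod *= (z - a) / (1 - a.conjugate() * z)
    return prod

# Hypothetical zeros strictly inside the unit disk.
ZEROS = [0.5, -0.3 + 0.4j]

# Largest deviation of |B(e^{i theta})| from 1 over a uniform grid.
max_deviation = max(
    abs(abs(blaschke(ZEROS, cmath.exp(2j * math.pi * m / 1000))) - 1.0)
    for m in range(1000)
)

# At the origin the modulus is the product of the moduli of the zeros.
modulus_at_origin = abs(blaschke(ZEROS, 0.0))
```

On the circle each factor satisfies $|z-a|=|1-\bar{a}z|$ , which is why the deviation is at the level of rounding error, whereas inside the disk the modulus is strictly smaller than $1$ .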

The causality of

[TABLE]

implies that for any $k=1,2,\dots$ ,

[TABLE]

which, together with (60), yields

[TABLE]

Rewriting the above integral as a line integral, we have

[TABLE]

where $\gamma$ is the unit circle. Denote

[TABLE]

It is easy to check that $h(z)$ is an analytic function on the unit disk since $R(z)$ is stable. Via the Heaviside cover-up method, the integrand on the left-hand side of (61) can be decomposed as

[TABLE]

where $\tilde{h}_{ij}(z)=a_{ij}h(z)$ and

[TABLE]

is a constant depending on $x_{i}$ and $m_{i}$ . Thus $\tilde{h}_{ij}(z)$ is also an analytic function on the unit disk for all $i,j$ . Applying Cauchy’s integral formula, we deduce that for any $k$ ,

[TABLE]

or equivalently,

[TABLE]

Hence, each $c^{\star}_{k}$ takes the following form

[TABLE]

where $\tilde{y}_{ij}$ is a constant independent of $k$ , which immediately implies that

[TABLE]

where $y_{ij}\triangleq\tilde{y}_{ij}/(j-1)!$ . Hence, together with (60),

[TABLE]

where for the last equality, all $\bar{x}_{i}$ are replaced by $x_{i}$ , which can be justified by the fact that $\{x_{i}\}=\{\bar{x}_{i}\}$ , thanks to the fact that $C^{\star}(z)$ has only real-valued coefficients.

We next prove that Conditions i)-iv) are necessary and sufficient for the optimality of $C^{\star}(z)$ , which, given (64), readily follows from Theorem 1.1 and some technical computations.

First of all, Condition i) follows from (64) and Condition i) in Theorem 1.1:

[TABLE]

where for (a), we have replaced $\bar{y}_{pq}$ by $y_{pq}$ , which can be justified by the fact that $\{y_{pq}\}=\{\bar{y}_{pq}\}$ , again due to the fact that $C^{\star}(z)$ has only real-valued coefficients.

Second, it follows from (60) and (64) that

[TABLE]

which immediately implies Condition ii).

Condition iii) follows from the fact that the coefficients of each $x_{i}^{k-j}$ on both sides of (61) are equal. More precisely, by (63), the coefficient of $x_{i}^{k-j}$ on the right-hand side is $(j-1)!(k-1)\cdots(k-j+1)\lambda y_{ij}$ . On the other hand, via (62), the coefficient of $x_{i}^{k-j}$ on the left-hand side of (61) is as follows:

[TABLE]

Condition iii) then immediately follows.

Last, Condition iv) follows from Condition iii) of Theorem 1.1 and some technical computations.

Finally, noting the uniqueness of the output PSD $S^{\star}_{Y}$ corresponding to the optimal $C^{\star}(z)$ (Theorem 3.2) and applying Jensen’s formula, we obtain

[TABLE]

The proof of Theorem 3.9 is then complete. ∎

Remark 3.10.

By Theorem 3.9, to compute the ARMA( $k$ ) Gaussian feedback capacity, one needs to first find a solution to one of the following systems of rational equations: for some positive integers $m_{1},m_{2},\dots,m_{l}$ with $\sum_{i=1}^{l}m_{i}\leq k$ ,

[TABLE]

such that $|x_{i}|<1$ for all $i$ and Condition iv) in Theorem 3.9 is also satisfied; the capacity can then be computed via (52).

4 Examples and Numerical Results

In this section, we give a couple of examples and some numerical results.

Example 4.1.

When $k=1$ , both $l$ and $m_{l}$ are necessarily $1$ , and the corresponding system of equations is:

[TABLE]

which immediately gives rise to (10). An elementary analysis (see, e.g., [11] or [13]) will show that Condition iv) of Theorem 3.9 translates to (11), an extra condition $x$ has to satisfy. It turns out that for this case, $x_{1}$ is unique, which, by (52), yields

[TABLE]

So, Theorem 3.9 recovers Theorem 1.3 as a special case.
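For a concrete feel of the $k=1$ case, the following sketch computes the capacity numerically. It assumes the ARMA(1) characterization in the form commonly quoted from [11]: the noise spectrum is $|1+\alpha e^{i\theta}|^{2}/|1+\beta e^{i\theta}|^{2}$ and $C_{FB}=-\log x_{0}$ , where $x_{0}$ is the unique positive root of $Px^{2}=(1-x^{2})(1+\sigma\alpha x)^{2}/(1+\sigma\beta x)^{2}$ with $\sigma=\mathrm{sgn}(\beta-\alpha)$ . Equations (10) and (11) are not reproduced in this section, so this form, the function names, and the use of natural logarithms are all assumptions of the sketch.

```python
import math

def arma1_residual(x, P, alpha, beta):
    """P x^2 (1 + s*beta*x)^2 - (1 - x^2)(1 + s*alpha*x)^2, s = sgn(beta - alpha)."""
    s = (beta > alpha) - (beta < alpha)
    return P * x * x * (1 + s * beta * x) ** 2 - (1 - x * x) * (1 + s * alpha * x) ** 2

def arma1_feedback_capacity(P, alpha, beta, iters=200):
    """Bisection on (0, 1); returns (x0, -log x0), the capacity being in nats.

    Assumes P > 0 and |beta| < 1, so the residual is -1 at x = 0 and
    positive at x = 1, which brackets a root.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if arma1_residual(mid, P, alpha, beta) < 0:
            lo = mid
        else:
            hi = mid
    x0 = 0.5 * (lo + hi)
    return x0, -math.log(x0)

# Sanity check: white noise (alpha = beta) should give (1/2) log(1 + P).
x_white, c_white = arma1_feedback_capacity(1.0, 0.0, 0.0)

# An autoregressive example with hypothetical parameters.
x_ar, c_ar = arma1_feedback_capacity(1.0, 0.0, 0.5)
```

In the white-noise case the equation reduces to $Px^{2}=1-x^{2}$ , so $x_{0}=(1+P)^{-1/2}$ and the sketch returns $\frac{1}{2}\log(1+P)$ , the capacity of the white Gaussian channel.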

Example 4.2.

When $k=2$ , by Theorem 3.9, we have three cases to deal with:

1. $l=1$ and $m_{1}=1$ : We need to find $x_{1}$ with $|x_{1}|<1$ and $y_{11}\neq 0$ such that

[TABLE]

and for all $\theta\in[-\pi,\pi)$ ,

[TABLE]

where $r_{1}+r_{2}=x_{1}-x_{1}^{-1}-\alpha_{1}-\alpha_{2}-y_{11}$ and $r_{1}r_{2}=\alpha_{1}\alpha_{2}x_{1}^{2}-\beta_{1}\beta_{2}x_{1}y_{11}$ . If such $x_{1},y_{11}$ exist, then we have

[TABLE]

2. $l=1$ and $m_{1}=2$ : We need to find $x_{1}$ with $|x_{1}|<1$ and $y_{11},y_{12}\neq 0$ such that

[TABLE]

and for all $\theta\in[-\pi,\pi)$ ,

[TABLE]

where

[TABLE]

and $r_{1}+r_{2}=2x_{1}-2x_{1}^{-1}-\alpha_{1}-\alpha_{2}-y_{11}$ and $r_{1}r_{2}=\alpha_{1}\alpha_{2}x_{1}^{4}-\beta_{1}\beta_{2}x_{1}^{3}y_{11}-\beta_{1}\beta_{2}x_{1}^{2}y_{12}$ . If such $x_{1},y_{11},y_{12}$ exist, then we have

[TABLE]

3. $l=2$ and $m_{1}=1$ , $m_{2}=1$ : We need to find distinct $x_{1},x_{2}$ with $|x_{1}|,|x_{2}|<1$ and $y_{11},y_{21}\neq 0$ such that

[TABLE]

and for all $\theta\in[-\pi,\pi)$ ,

[TABLE]

where $r_{1}+r_{2}=x_{1}+x_{2}-x_{1}^{-1}-x_{2}^{-1}-\alpha_{1}-\alpha_{2}-y_{11}-y_{21}$ and $r_{1}r_{2}=\alpha_{1}\alpha_{2}x_{1}^{2}x_{2}^{2}-\beta_{1}\beta_{2}x_{1}^{2}x_{2}y_{21}-\beta_{1}\beta_{2}x_{1}x_{2}^{2}y_{11}$ . If such $x_{1},x_{2},y_{11},y_{21}$ exist, then we have

[TABLE]

Complicated as they may look, the systems of equations in (67), (68) and (69) all have finitely many solutions for generic $\alpha_{1},\alpha_{2},\beta_{1},\beta_{2}$ and therefore can be numerically solved (for instance, Bertini [1], a numerical algebraic geometry package, can be used to efficiently find their zero-dimensional roots). Below, fixing $P=1$ , $\alpha_{2}=0.1$ , and $\beta_{2}=0$ , assuming different values for $\beta_{1}$ , we have plotted the values of $C_{FB}$ against the values of $\alpha_{1}$ .

Example 4.3.

As evidenced in Example 4.2, solving the polynomial system in (66) will yield the ARMA( $k$ ) Gaussian feedback capacity. Nevertheless, the computational complexity drastically increases as $k$ gets larger. Our observation is that with this approach, the computation can be measured in minutes (for moderate computing power) for $k=2$ , but it will be measured in days for $k=3$ . In this example, we demonstrate the effectiveness of Algorithm 3.5 in computing or estimating the Gaussian feedback capacity. This algorithm applies to much more general settings, but for the purpose of comparison, we will focus on applying it to compute the feedback capacity of ARMA( $k$ ) Gaussian channels.

We first discuss a couple of technical issues for the implementation of Algorithm 3.5.

The first issue is about the form that $C(z)$ should take for implementing the algorithm. Note that, albeit explicit, the expression as in (51) gives different forms for different $l$ and $m_{1},m_{2},\dots,m_{l}$ , which will create technical problems for Step 2), where the recursive computation of $\{C^{(n)}(e^{i\theta})\}$ is conducted. One way to circumvent this issue is to adopt the following unified form:

[TABLE]

where $\hat{y}_{n}$ are complex numbers and $\hat{x}_{n}$ are complex numbers inside the unit circle. One verifies that the above form encompasses all the possible cases in (51).

As noted in Remark 3.7, since there does not seem to exist an effective way to find the global minimum of (47), we instead update the sequence $\{C^{(n)}(e^{i\theta})\}$ with a local minimum of (47) found via some gradient-descent-like method. This, however, creates another problem in choosing the initial $C^{(0)}(e^{i\theta})$ ; more specifically, if $C^{(0)}(e^{i\theta})$ is chosen such that $C^{(0)}(e^{i\theta})+H_{Z}(e^{i\theta})$ has no zeros inside the unit circle, then for any $C(e^{i\theta})$ “close” to $C^{(0)}(e^{i\theta})$ , $C(e^{i\theta})+H_{Z}(e^{i\theta})$ will likely not have zeros inside the unit circle either. Then, by Jensen’s formula,

[TABLE]

Therefore, it is difficult to use a gradient-like method to find a feasible $C^{(1)}(e^{i\theta})$ such that

[TABLE]

let alone a local minimum point $C^{(1)}(e^{i\theta})$ . To overcome this issue, one can further assume that $C^{(0)}(e^{i\theta})$ is chosen such that $C^{(0)}(e^{i\theta})+H_{Z}(e^{i\theta})$ has at least one zero (denoted by $s$ below) inside the unit circle, that is,

[TABLE]

where $|s|<1$ , $\gamma_{1},\gamma_{2},\dots,\gamma_{2k-1}$ are appropriately chosen complex numbers.
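The role of the zero $s$ can be illustrated numerically through Jensen’s formula (Theorem 2.2). The sketch below compares a function with no zeros inside the unit circle against one with a single zero at $s$ ; both are normalized so that $f(0)=1$ . The helper name `mean_log_sq_modulus` and the two test functions are hypothetical stand-ins chosen for illustration, not the actual $C^{(0)}+H_{Z}$ .

```python
import cmath
import math

def mean_log_sq_modulus(f, num_points=4096):
    """Approximate (1/2pi) * integral over one period of log|f(e^{i theta})|^2
    by averaging over a uniform grid on the unit circle."""
    total = 0.0
    for m in range(num_points):
        z = cmath.exp(2j * math.pi * m / num_points)
        total += math.log(abs(f(z)) ** 2)
    return total / num_points

# No zero inside the unit circle: the only zero of 1 + 0.5 z is at z = -2.
no_zero_inside = mean_log_sq_modulus(lambda z: 1 + 0.5 * z)

# One zero inside the unit circle, at z = s.
s = 0.4
zero_inside = mean_log_sq_modulus(lambda z: 1 - z / s)

# Jensen's formula predicts 0 in the first case and -2 log|s| in the second.
predicted_zero_inside = -2.0 * math.log(abs(s))
```

The first average vanishes regardless of the coefficient as long as the zero stays outside the circle, which is the flat behaviour that hampers a local search started from such a point; a zero inside the circle makes the average strictly positive.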

With these two issues addressed, Algorithm 3.5 can be efficiently implemented to yield a lower bound (denoted by $C_{FB}^{(low)}$ ) on the Gaussian feedback capacity. We observe that for the ARMA( $k$ ) channels, $k=1,2$ , the implemented algorithm actually quickly converges to the true capacity; moreover, it can also handle larger $k$ ’s within a reasonably short time (measured in hours with moderate computing power). Below, fixing $P=10$ , $\alpha_{1}=0.3$ , $\alpha_{2}=0.4$ , $\beta_{1}=-0.3$ , $\beta_{2}=0.7$ , assuming different values for $\alpha_{3}$ , we have plotted the values of $C_{FB}^{(low)}$ against the values of $\beta_{3}$ .

Appendices

Appendix A Proof of Theorem 3.3

For the necessity part, we directly use the method of Lagrange multipliers. Consider the Lagrangian of (41)

[TABLE]

Apparently $C^{\star\star}(e^{i\theta})$ satisfies the KKT condition, that is,

[TABLE]

and for any $k=1,2,\dots$ ,

[TABLE]

which yield (30) and (31), respectively. Furthermore, the infinite-dimensional Hessian matrix $H$ of $L(c,\lambda)$ can be computed as

[TABLE]

for all feasible $k$ , and

[TABLE]

for all feasible $j\neq k$ . Note that $H$ can be decomposed as $2\lambda A-2I$ , where

[TABLE]

for all feasible $j,k$ . Now, at the global maximum solution $C^{\star\star}(e^{i\theta})=\sum_{j=1}^{\infty}c^{\star\star}_{j}e^{ij\theta}$ , $H$ must satisfy: for any $n$ and any $z=(z_{1},z_{2},\dots,z_{n})\neq\mathbf{0}$ with $\sum_{i=1}^{n}c^{\star\star}_{i}z_{i}=0$ ,

[TABLE]

where $H^{(n)}$ is the leading principal $n\times n$ submatrix of $H$ , i.e., $H^{(n)}=(H_{j,k})_{j,k=1}^{n}$ . It then follows that at most one eigenvalue of $H^{(n)}$ is positive, or equivalently, at most one eigenvalue of $A^{(n)}$ is larger than $1/\lambda$ , where $A^{(n)}$ is the leading principal $n\times n$ submatrix of $A$ . Denote by $\lambda^{(n)}_{2}$ the second largest eigenvalue of $A^{(n)}$ ; then $\lambda^{(n)}_{2}\leq 1/\lambda$ for all $n$ . By the well-known fact on the eigenvalue distribution of Toeplitz forms (see page 63 of [7]), $\lambda^{(n)}_{2}$ converges to $\mathop{esssup}\limits_{\theta\in[-\pi,\pi)}|C^{\star}(e^{i\theta})+H_{Z}(e^{i\theta})|^{-2}$ as $n$ tends to infinity. Therefore, we conclude that

[TABLE]

for almost all $\theta\in[-\pi,\pi)$ .
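The Toeplitz eigenvalue fact invoked above can be illustrated on a case with closed-form eigenvalues. The sketch below uses the symmetric tridiagonal Toeplitz matrix with symbol $f(\theta)=2+2\cos\theta$ , whose essential supremum is $4$ ; this symbol and the function names are assumptions chosen for illustration and are unrelated to the matrix $A$ of the proof.

```python
import math

def tridiag_toeplitz_eigs(n, a, b):
    """Eigenvalues a + 2 b cos(k pi / (n + 1)), k = 1..n, of the n x n symmetric
    tridiagonal Toeplitz matrix (diagonal a, off-diagonals b), in descending order."""
    return sorted((a + 2 * b * math.cos(k * math.pi / (n + 1))
                   for k in range(1, n + 1)), reverse=True)

def eigenpair_residual(n, a, b, k):
    """Max-norm of T v - lambda v for the k-th closed-form eigenpair,
    where v_j = sin(j k pi / (n + 1))."""
    phi = k * math.pi / (n + 1)
    lam = a + 2 * b * math.cos(phi)
    v = [math.sin(j * phi) for j in range(1, n + 1)]
    worst = 0.0
    for j in range(n):
        left = v[j - 1] if j > 0 else 0.0
        right = v[j + 1] if j < n - 1 else 0.0
        worst = max(worst, abs(a * v[j] + b * (left + right) - lam * v[j]))
    return worst

SUP_SYMBOL = 4.0  # ess sup of f(theta) = 2 + 2 cos(theta)

# Second largest eigenvalue of the n x n section for growing n.
second_largest = {n: tridiag_toeplitz_eigs(n, 2.0, 1.0)[1] for n in (10, 100, 1000)}
```

The second largest eigenvalue increases with $n$ and approaches the essential supremum of the symbol from below, which is the behaviour the necessity argument uses for $\lambda^{(n)}_{2}$ .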

For the sufficiency part, we use the same idea as in the proof of Theorem 4.1 in [11]. More precisely, we need to prove that for any $C(e^{i\theta})$ satisfying (15),

[TABLE]

To see this, note that

[TABLE]

Note that by (31), we have for almost all $\theta$ ,

[TABLE]

and

[TABLE]

It then follows that for any $C(e^{i\theta})$ satisfying (15), we have

[TABLE]

The proof of the theorem is then complete.

Bibliography

[1] D. J. Bates, J. D. Hauenstein, A. J. Sommese and C. W. Wampler. Bertini: Software for Numerical Algebraic Geometry. Available at bertini.nd.edu with permanent doi: dx.doi.org/10.7274/R0H41PB5.
[2] S. Butman. A general formulation of linear feedback communication systems with solutions. IEEE Trans. Info. Theory, vol. 15, no. 3, pp. 392-400, 1969.
[3] T. M. Cover and S. Pombra. Gaussian feedback capacity. IEEE Trans. Info. Theory, vol. 35, no. 1, pp. 37-43, 1989.
[4] A. Dembo. On Gaussian feedback capacity. IEEE Trans. Info. Theory, vol. 35, no. 5, pp. 1072-1076, 1989.
[5] P. Duren. Theory of $H_{p}$ Spaces, New York: Academic Press, 1970.
[6] P. Ebert. The capacity of the Gaussian channel with feedback. Bell Syst. Tech. J., vol. 49, pp. 1705-1712, 1970.
[7] U. Grenander, G. Szegö. Toeplitz forms and their applications, Second Edition, New York, 1958.
[8] T. T. Kadota, M. Zakai and J. Ziv. Mutual information of the white Gaussian channel with and without feedback. IEEE Trans. Info. Theory, vol. 17, pp. 368-371, 1971.

Footnote 1: Results in this paper have been partially presented in the 2017 IEEE ISIT [14].
