Feedback Capacity of Stationary Gaussian Channels Further Examined
Tao Liu, Guangyue Han

TL;DR
This paper investigates the feedback capacity of stationary Gaussian channels, proving the uniqueness of optimal solutions for non-white noise and providing algorithms and explicit formulas for calculating feedback capacity in autoregressive moving-average noise models.
Contribution
It establishes the uniqueness of the optimal solution for the feedback capacity problem when noise is not white and introduces an efficient recursive algorithm for its computation.
Findings
Optimal solution is unique for non-white Gaussian noise.
Proposed recursive algorithm converges and is computationally efficient.
Explicit formulas for feedback capacity in ARMA noise models for k=1,2 cases.
Results in this paper have been partially presented at the 2017 IEEE ISIT [14].
Tao Liu and Guangyue Han, The University of Hong Kong
Abstract
It is well known that the problem of computing the feedback capacity of a stationary Gaussian channel can be recast as an infinite-dimensional optimization problem; moreover, necessary and sufficient conditions for the optimality of a solution to this optimization problem have been characterized, and based on this characterization, an explicit formula for the feedback capacity has been given for the case that the noise is a first-order autoregressive moving-average Gaussian process. In this paper, we further examine the above-mentioned infinite-dimensional optimization problem. We prove that unless the Gaussian noise is white, its optimal solution is unique, and we propose an algorithm to recursively compute the unique optimal solution, which is guaranteed to converge in theory and features an efficient implementation for a suboptimal solution in practice. Furthermore, for the case that the noise is a $k$-th order autoregressive moving-average Gaussian process, we give a relatively more explicit formula for the feedback capacity; more specifically, the feedback capacity is expressed as a simple function evaluated at a solution to a system of polynomial equations, which is amenable to numerical computation for the cases $k = 1, 2$ and possibly beyond.
1 Introduction
We consider the following additive Gaussian channel with feedback
[TABLE]
where denotes the message to be communicated through the channel, the noise , which is independent of , is a zero mean stationary Gaussian process, and , the channel input at time , may depend on and previous channel outputs . We assume that the channel input satisfies the following average power constraint: there is such that for all ,
[TABLE]
Let denote the capacity of the channel (1), which is often referred to as Gaussian feedback capacity in the literature.
It is well known that the non-feedback capacity of (1) can be obtained through the power spectral density (PSD) water-filling method [22]. When the channel noise is white (i.e., is i.i.d.), Shannon [23] showed that feedback does not increase capacity, which means that, like its non-feedback counterpart, the feedback capacity admits a simple explicit formula (we note that Kadota, Zakai and Ziv [8], [9] also proved this statement for continuous-time white Gaussian channels). On the other hand, if the noise is not white, feedback may increase capacity (see [15], [16]), and little is known about the feedback capacity despite a number of papers [4], [17], [6], [3] relating the two capacities. Computing the feedback capacity has been a long-standing open problem of fundamental importance in information theory.
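The water-filling computation for the non-feedback capacity can be sketched numerically. The following is a minimal sketch, assuming the noise PSD is available as a Python function of the angular frequency; the name `waterfilling_capacity`, the grid size and the use of natural logarithms (nats) are illustrative choices and do not come from the paper.

```python
import math


def waterfilling_capacity(noise_psd, power, grid=4096):
    """Non-feedback capacity in nats per channel use via PSD water-filling.

    noise_psd: function theta -> S_Z(e^{i theta}) > 0 (assumed interface).
    power:     average power budget P > 0.
    """
    thetas = [-math.pi + 2.0 * math.pi * (j + 0.5) / grid for j in range(grid)]
    s = [noise_psd(t) for t in thetas]

    def used_power(level):
        # Power spent when the noise spectrum is filled up to `level`.
        return sum(max(level - v, 0.0) for v in s) / grid

    # used_power(lo) = 0 <= P and used_power(hi) >= P, so bisection applies.
    lo, hi = min(s), max(s) + power
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if used_power(mid) < power:
            lo = mid
        else:
            hi = mid
    level = 0.5 * (lo + hi)
    # C = (1/2pi) * integral of (1/2) log(max(level, S_Z) / S_Z) d theta.
    return sum(0.5 * math.log(max(level, v) / v) for v in s) / grid
```

For white noise with unit variance and unit power, the water level is 2 and the sketch returns the familiar value of one half of log(1 + P).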
A prominent approach to tackling Gaussian feedback capacity can be found in the pioneering work [3], where Cover and Pombra characterized the capacity through the sequence of the so-called “$n$-block feedback capacity”:
[TABLE]
where , , stand for the covariance matrices of , and , respectively. It is also shown that the maximization can be taken over of the special form , where is a strictly lower-triangular matrix and the Gaussian vector is independent of . So, (2) can be rewritten as
[TABLE]
subject to the constraint
[TABLE]
where is a positive semi-definite matrix. Using the asymptotic equipartition property for arbitrary (non-stationary, non-ergodic) Gaussian processes, a coding theorem can then be proved to characterize the Gaussian feedback capacity as the limiting expression below:
[TABLE]
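To make the finite-block objective concrete, the sketch below evaluates the rate and the power for a given pair of matrices. It assumes the standard Cover-Pombra form, namely the rate $\frac{1}{2n}\log\frac{\det((B+I)K_Z(B+I)^T+K_V)}{\det K_Z}$ under the constraint $\operatorname{tr}(BK_ZB^T+K_V)\le nP$; the symbols $B$, $K_Z$, $K_V$ and all function names are assumptions made for illustration, since the displays are not reproduced in this text.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def block_rate(B, K_Z, K_V):
    """(1/2n) log det((B+I) K_Z (B+I)^T + K_V) / det(K_Z), in nats."""
    n = len(K_Z)
    BI = [[B[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    M = matmul(matmul(BI, K_Z), transpose(BI))
    K_Y = [[M[i][j] + K_V[i][j] for j in range(n)] for i in range(n)]
    return math.log(det(K_Y) / det(K_Z)) / (2.0 * n)


def block_power(B, K_Z, K_V):
    """Average input power (1/n) tr(B K_Z B^T + K_V)."""
    n = len(K_Z)
    M = matmul(matmul(B, K_Z), transpose(B))
    return sum(M[i][i] + K_V[i][i] for i in range(n)) / n
```

With white noise, no feedback (`B` equal to zero) and `K_V` equal to the identity, the rate reduces to one half of log 2 at unit power, which matches the white-noise capacity.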
Though considerable effort has been devoted to following up on the Cover-Pombra formulation, a “computable” formula for the Gaussian feedback capacity does not seem to be within sight: it is already difficult to find the sequence of the optimal achieving , and its limiting behavior seems equally elusive.
Another prominent approach came along in a recent work of Kim [11], which led to a number of breakthroughs deepening our understanding of Gaussian feedback capacity. Roughly speaking, instead of examining the channel (1) over a finite time window, Kim justifies certain interchanges between limits and integrals when evaluating (3) and (4) and recasts the problem of computing the feedback capacity as an infinite-dimensional optimization problem. Below, we state one of the theorems in [11] that is relevant to our results.
Theorem 1.1 (Theorem of [11]).
Suppose that the power spectral density of the Gaussian noise process is bounded away from 0, and has a canonical spectral factorization , where . Then the feedback capacity is given by
[TABLE]
where the maximum is taken over all strictly causal satisfying the power constraint
[TABLE]
Furthermore, a filter attains the maximum in (5) if and only if
- i) Power:
[TABLE]
- ii) Output spectrum:
[TABLE]
- iii) Strong orthogonality: For some
[TABLE]
is causal.
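The variational problem in Theorem 1.1 can be probed numerically for a candidate filter. The sketch below assumes that (5) and (6) read $\max_B \int \frac{1}{2}\log|1+B(e^{i\theta})|^2\,\frac{d\theta}{2\pi}$ subject to $\int |B(e^{i\theta})|^2 S_Z(e^{i\theta})\,\frac{d\theta}{2\pi}\le P$, which is how the result is usually stated; the displays themselves are not shown here, so this form, the symbol names and the helper `white_noise_filter` are assumptions. For white noise the chosen filter satisfies $1+B(z)=(1-z/x)/(1-xz)$ with $x=(1+P)^{-1/2}$, and it attains the rate $\frac{1}{2}\log(1+P)$ with power exactly $P$.

```python
import cmath
import math


def rate_and_power(B, noise_psd, grid=4096):
    """Return (rate in nats, power) of a filter B on a uniform frequency grid."""
    rate = 0.0
    power = 0.0
    for k in range(grid):
        theta = 2.0 * math.pi * k / grid
        b = B(cmath.exp(1j * theta))
        rate += 0.5 * math.log(abs(1.0 + b) ** 2)
        power += abs(b) ** 2 * noise_psd(theta)
    return rate / grid, power / grid


def white_noise_filter(P):
    """Strictly causal B with 1 + B(z) = (1 - z/x) / (1 - x z), x = (1+P)^(-1/2)."""
    x = 1.0 / math.sqrt(1.0 + P)
    return lambda z: (x - 1.0 / x) * z / (1.0 - x * z)
```

The filter vanishes at the origin, so it is strictly causal, and the single zero of $1+B$ at $z=x$ inside the unit disk is what makes the rate positive.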
Using Theorem 1.1 and relevant tools from the theory of Hardy spaces, Kim further characterized the capacity-achieving for the special case that is a $k$-th order autoregressive moving-average (ARMA($k$)) Gaussian process. Roughly speaking, the following theorem says that the optimal must be rational and satisfy three conditions corresponding to those in Theorem 1.1.
Theorem 1.2 (Proposition of [11]).
Suppose the noise is not white and is an ARMA($k$) Gaussian process with parameters , , for all , namely, it has the power spectral density
[TABLE]
Then the feedback capacity in (5) is necessarily achieved by a filter of the form
[TABLE]
where is a stable polynomial whose degree is at most , and
[TABLE]
is a normalized Blaschke product of at most zeros. Furthermore, a filter of the form (8) is optimal if and only if the following hold:
- i) Power:
[TABLE]
- ii) Output spectrum: For all zeros of b(z)
[TABLE]
- iii) Factorization:
[TABLE]
has a factor .
When applied to the case $k = 1$, Theorem 1.2 readily yields a rather tractable expression for the capacity-achieving and gives a simple and explicit formula for , as detailed in the following theorem.
Theorem 1.3 (Theorem in [11]).
Suppose the noise process is an ARMA(1) Gaussian process with parameters and , , . Then, the Gaussian feedback capacity is given by
[TABLE]
where is the unique root of the following fourth-order polynomial
[TABLE]
satisfying
[TABLE]
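The ARMA(1) formula lends itself to a short numerical sketch. The code below assumes the form in which this result is commonly stated: for noise PSD $|1+\alpha e^{i\theta}|^2/|1+\beta e^{i\theta}|^2$ with $|\alpha|\le 1$, $|\beta|<1$, the capacity is $-\log x_0$, where $x_0$ is the unique positive root of $Px^2(1+\sigma\beta x)^2=(1-x^2)(1+\sigma\alpha x)^2$ and $\sigma=\operatorname{sgn}(\beta-\alpha)$. Since the displays (9)-(11) are not reproduced in this text, this parametrization, the symbol names and the function name are assumptions.

```python
import math


def arma1_feedback_capacity(alpha, beta, power):
    """Feedback capacity (nats) of the ARMA(1) Gaussian channel, assumed form."""
    sigma = (beta > alpha) - (beta < alpha)  # sgn(beta - alpha)

    def g(x):
        return (power * x * x * (1.0 + sigma * beta * x) ** 2
                - (1.0 - x * x) * (1.0 + sigma * alpha * x) ** 2)

    # g(0) = -1 < 0 and g(1) = P (1 + sigma*beta)^2 > 0 for |beta| < 1, P > 0.
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return -math.log(0.5 * (lo + hi))
```

In the white case (`alpha == beta`) the equation collapses to $Px^2 = 1-x^2$, which recovers $\frac{1}{2}\log(1+P)$; this is a convenient sanity check on the sketch.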
We now digress briefly to mention related results on the ARMA(1) Gaussian feedback capacity in the literature. Generalizing the celebrated Schalkwijk-Kailath scheme [20], [21], Butman [2] obtained a lower bound on the feedback capacity of the AR(1) channel (a special ARMA(1) channel with ). Butman's bound was shown to be optimal under some cases of linear feedback schemes by Wolfowitz [27] and Tiernan [25]. Tiernan and Schalkwijk [26] also found an upper bound on the AR(1) Gaussian channel capacity, which is equal to Butman's lower bound at very low and very high signal-to-noise ratios. It was shown in [10] that Butman's lower bound is indeed the capacity, and the capacity of the MA(1) channel (a special ARMA(1) channel with ) was also derived in the same paper. More recently, Yang, Kavčić and Tatikonda [28] studied the ARMA(1) Gaussian channel by analyzing the structure of the optimal input distribution and reformulating the problem as a stochastic control optimization problem. Based on a speculation about the limiting behavior of the optimal input distribution, they derived the formula (9) and conjectured that it gives the ARMA(1) Gaussian feedback capacity.
As mentioned above, the power of the variational formulation in Theorem 1.1 is showcased by Theorem 1.3, where the conjecture of [28] is confirmed and the ARMA(1) Gaussian feedback capacity is given by a simple explicit formula. To the best of our knowledge, the ARMA(1) Gaussian feedback channel is the only non-trivial scenario whose Gaussian feedback capacity is “explicit”. The success of the variational formulation approach, in contrast to the other approaches mentioned above, which have struggled even with special cases of an ARMA(1) channel, naturally raises the question of whether it can be extended to more general channels, for instance, ARMA($k$) Gaussian feedback channels. Attempts in this direction, however, have encountered technical barriers, due to the fact that the form in (8) is “less manageable” (see Page in [11]). As a matter of fact, instead of following the variational formulation framework, an alternative state-space representation approach was proposed in [11] to deal with the ARMA($k$) Gaussian feedback capacity, only to yield an intractable optimization problem (see Theorem in [11]). Here we remark that prior to [11], a result of a similar nature was derived in Theorem of [28], which, however, appears to be equally intractable.
In this paper, we position ourselves within Kim's framework [11] and further examine the feedback capacity of a stationary Gaussian channel as in (1). Our starting point is precisely Theorem 1.1, but instead of considering the filter , we use the method of “change of variables” and consider
[TABLE]
Here we note that since is strictly causal and , it is obvious that is also strictly causal, and thereby can be written as for some . Evidently, (12) can be used to reformulate other quantities, such as the PSD of the channel output
[TABLE]
and eventually reformulate Theorem 1.1 as follows:
Theorem 1.4 (Theorem of [11] reformulated).
Suppose that the power spectral density of the Gaussian noise process is bounded away from 0, and has a canonical spectral factorization , where . Then the feedback capacity is given by
[TABLE]
where the maximum is taken over all strictly causal satisfying the power constraint
[TABLE]
Furthermore, a attains the maximum in (14) if and only if
- i) Power:
[TABLE]
- ii) Output spectrum:
[TABLE]
- iii) Strong orthogonality: For some
[TABLE]
is causal.
The remainder of the paper is organized as follows. In Section 2, we review relevant results from complex analysis and the theory of Hardy spaces as mathematical preliminaries that will be used in our proofs. Section 3 contains the main results of this paper, which can be roughly summarized as follows:
- We prove in Section 3.1 that unless the noise is white, the optimal solution to the optimization problem (14) is unique; see Theorem 3.2.
- In Section 3.2, we propose an algorithm to recursively compute the optimal solution, which is guaranteed to converge to the unique optimal solution in theory and features an efficient implementation for a suboptimal solution in practice; see Algorithm 3.5.
- In Section 3.3, we establish Theorem 3.9, a “more manageable” version of Theorem 1.2 and a natural extension of Theorem 1.3 combined, and derive a relatively more explicit formula for the ARMA($k$) Gaussian feedback capacity as a simple function evaluated at a solution to a system of equations, which is amenable to numerical computation for the cases $k = 1, 2$ and possibly beyond.
Several examples are given in Section 4. More specifically, Example 4.1 details the fact that Theorem 3.9 naturally extends Theorem 1.3, and Example 4.2 uses Theorem 3.9 to numerically compute the feedback capacity of ARMA(2) Gaussian channels. In Example 4.3, focusing on the application of Algorithm 3.5 to ARMA($k$) Gaussian channels, we discuss its efficient implementation and numerically compute lower bounds on the feedback capacity of ARMA($k$) Gaussian channels.
2 Mathematical Preliminaries
In this section, we review a number of important theorems in complex analysis and the theory of Hardy spaces, which will be used in our proofs and may not be stated in the most general form.
Let denote the open unit disk on the complex plane , that is,
[TABLE]
and let and denote its boundary and closure, respectively, that is,
[TABLE]
We first review two fundamental theorems in complex analysis, which are well known but included for completeness.
The following theorem gives the classical Cauchy's integral formula for an analytic function on .
Theorem 2.1 (Cauchy's integral formula).
Let be an open subset of the complex plane which contains , and let be an analytic function. Then for any and any , we have
[TABLE]
where the contour integral is taken counter-clockwise, and the superscript denotes the -th order complex derivative.
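Cauchy's integral formula is easy to check numerically. The sketch below approximates $f^{(n)}(a)=\frac{n!}{2\pi i}\oint \frac{f(z)}{(z-a)^{n+1}}\,dz$ over the unit circle by a uniform Riemann sum, using $dz = iz\,d\theta$; the function name and the grid size are illustrative assumptions.

```python
import cmath
import math


def cauchy_derivative(f, a, order, grid=512):
    """n-th derivative of f at a (|a| < 1) from the Cauchy integral on |z| = 1."""
    total = 0j
    for k in range(grid):
        z = cmath.exp(2j * math.pi * k / grid)
        # With dz = i z d(theta), (1/(2 pi i)) * integral becomes a plain average.
        total += f(z) * z / (z - a) ** (order + 1)
    return math.factorial(order) * total / grid
```

For example, `cauchy_derivative(cmath.exp, 0.3, 2)` approximates the second derivative of the exponential at 0.3, which is again `exp(0.3)`.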
The Cauchy integral formula can be used to establish the following Jensen's formula.
Theorem 2.2 (Jensen's formula).
Let be an open subset of the complex plane which contains . Let be an analytic function, and let denote the zeros of in repeated according to multiplicity. Suppose that . Then, we have
[TABLE]
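Jensen's formula can be verified on a small example. The sketch below uses the standard statement $\frac{1}{2\pi}\int_{-\pi}^{\pi}\log|f(e^{i\theta})|\,d\theta=\log|f(0)|+\sum_k\log\frac{1}{|a_k|}$, where the sum runs over the zeros inside the unit disk; the test function $f(z)=(z-0.5)(z-2)$ and the helper names are chosen purely for illustration.

```python
import cmath
import math


def mean_log_modulus(f, grid=4096):
    """Average of log|f| over the unit circle (uniform Riemann sum)."""
    return sum(math.log(abs(f(cmath.exp(2j * math.pi * k / grid))))
               for k in range(grid)) / grid


def jensen_right_hand_side(f, zeros_in_disk):
    """log|f(0)| plus the sum of log(1/|a|) over the zeros a inside the disk."""
    return math.log(abs(f(0.0))) + sum(math.log(1.0 / abs(a)) for a in zeros_in_disk)


def example_function(z):
    # One zero inside the unit disk (0.5) and one outside (2).
    return (z - 0.5) * (z - 2.0)
```

Here `example_function(0)` equals 1, so only the zero at 0.5 contributes and both sides equal log 2. In particular, a function with value 1 at the origin and no zeros in the disk gives an average of zero.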
Next, we will review some basic notions, terminology and needed results from the theory of Hardy spaces.
Let and let be an analytic function on . The function is said to be of class if
[TABLE]
It is well known that by taking the pointwise radial limit, any can be extended to a function , where
[TABLE]
When there is no risk of confusion, we will follow the usual convention and identify and , which we may oftentimes simply denote by . Then, can be viewed as a closed vector subspace of .
For any , we say that is causal (or strictly causal) if its Fourier coefficients are equal to 0 for all negative indices (or for all non-positive indices), where
[TABLE]
It is well known that is precisely the subset of causal functions in . For a quick example, we note that , represented by infinite sequences indexed by as
[TABLE]
sits naturally inside the space , which can be represented by bi-infinite sequences indexed by as
[TABLE]
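Causality in the above sense can be inspected numerically through the Fourier coefficients. The sketch below assumes the usual convention $\hat f(n)=\frac{1}{2\pi}\int_{-\pi}^{\pi}f(e^{i\theta})e^{-in\theta}\,d\theta$ and approximates it by a uniform Riemann sum; the function names and the two example functions are illustrative assumptions.

```python
import cmath
import math


def fourier_coefficient(f, n, grid=256):
    """Approximate the n-th Fourier coefficient of f on the unit circle."""
    total = 0j
    for k in range(grid):
        theta = 2.0 * math.pi * k / grid
        total += f(cmath.exp(1j * theta)) * cmath.exp(-1j * n * theta)
    return total / grid


def causal_example(z):
    # 1 / (1 - z/2) = sum_{n >= 0} (1/2)^n z^n: coefficients vanish for n < 0.
    return 1.0 / (1.0 - 0.5 * z)


def strictly_causal_example(z):
    # (z/2) / (1 - z/2): the constant term vanishes as well.
    return 0.5 * z / (1.0 - 0.5 * z)
```

Evaluating `fourier_coefficient(causal_example, -1)` gives a value that is zero up to rounding, while the coefficient at index 2 is close to 0.25.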
Now, we recall the inner-outer decomposition theorem in the theory of Hardy spaces.
Theorem 2.3 (Theorem 2.8 in [5]).
Every function in has a unique factorization of the form , where
- is a Blaschke product taking the following form:
[TABLE]
where is a nonnegative integer and is the set of all the zeros of in ,
- is a singular inner function, which can be represented by the following Poisson-Stieltjes integral:
[TABLE]
where is a bounded nondecreasing singular function with a.e.,
- is an outer function taking the following form:
[TABLE]
where is a real constant.
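The defining property of a Blaschke product, unit modulus on the unit circle, can be checked with a few lines of code. The sketch assumes the standard finite form $b(z)=z^m\prod_n\frac{|a_n|}{a_n}\,\frac{a_n-z}{1-\bar a_n z}$ with nonzero zeros $a_n$ inside the disk; the display in the theorem is not reproduced here, so this form and the function name are assumptions.

```python
import cmath


def blaschke(z, zeros, m=0):
    """Finite Blaschke product with a zero of order m at 0 and nonzero zeros `zeros`."""
    value = complex(z) ** m
    for a in zeros:
        a = complex(a)
        # Each factor has modulus 1 on |z| = 1 and vanishes at z = a.
        value *= (abs(a) / a) * (a - z) / (1.0 - a.conjugate() * z)
    return value
```

On the circle the modulus equals 1 up to rounding, and strictly inside the disk it is smaller than 1, which is why an inner factor does not change the boundary spectrum.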
Remark 2.4.
Note that it can be shown that as in (19) is analytic on with the same set of zeros as , and and are also analytic without any zeros in . Furthermore, it is well known (see, e.g., Page of [12]) that if and only if
[TABLE]
Roughly speaking, the following theorem says that a function in is uniquely determined by its boundary values on any set of positive measure.
Theorem 2.5 (Theorem in [5]).
Let be not identically 0. Then has measure 0 (with respect to the Lebesgue measure on ). Furthermore, if and for all in a positive measure subset , then almost everywhere.
3 Main Results
3.1 Uniqueness of Optimal
Recall that is defined as in (12), and we say is an optimal solution if it solves the optimization problem (14), namely, it satisfies (15) and achieves the maximum in (14). In this section, we will establish the uniqueness of optimal .
We will first need the following lemma.
Lemma 3.1.
Let be an optimal solution to (14). Then, for any satisfying (15), we have
[TABLE]
Proof.
Note that
[TABLE]
where in deriving (a) we have used the easily verifiable fact that
[TABLE]
Moreover, by (18), we have for almost all ,
[TABLE]
and
[TABLE]
It then follows that for any satisfying (15),
[TABLE]
where we have used (16) in deriving (b). ∎
The following theorem first shows that all optimal give rise to the same , the corresponding channel output PSD, and then establishes the uniqueness of optimal when the channel noise is not white.
Theorem 3.2.
a) For any two optimal and , we have, almost everywhere,
[TABLE]
b) Suppose that is not white, that is, is not a constant function. Then, for any two optimal and , we have, almost everywhere,
[TABLE]
Proof.
a) Using the well-known fact that for any ,
[TABLE]
we deduce that for all ,
[TABLE]
and thereby
[TABLE]
where (a) follows from Lemma 3.1 and (b) follows from the fact that the optimal solutions and give rise to the same optimal value. It then follows that the first inequality in (24) is in fact an equality, or equivalently,
[TABLE]
which, together with (23), immediately implies that almost everywhere,
[TABLE]
Now, using the fact that if and only if , we deduce that for almost all
[TABLE]
which immediately implies a), as desired.
b) We first consider the optimal solution , which satisfies i), ii) and iii) in Theorem 1.4, which can be alternatively stated as follows:
- [TABLE]
- For some
[TABLE]
is causal;
- For almost all ,
[TABLE]
where is as in (26).
From (26), straightforward computations yield that
[TABLE]
[TABLE]
Now, we consider the optimal solution , which similarly satisfies:
- [TABLE]
- For some
[TABLE]
is causal;
- For almost all ,
[TABLE]
where is as in (26).
And parallel to (28) and (29), we have
[TABLE]
[TABLE]
Note that, by a), we have almost everywhere,
[TABLE]
Now, using (25), (30) and (35), we deduce that (28)-(29)+(33)-(34) can be simplified as
[TABLE]
or equivalently,
[TABLE]
Note that, by (27), (32) and (35), we have, for almost all ,
[TABLE]
which means the integrand in (37) is non-negative, and thereby must be 0, that is,
[TABLE]
for almost all .
We now claim that there exists a positive measure set such that on
[TABLE]
To see this, by way of contradiction, we suppose the opposite is true, that is, almost everywhere,
[TABLE]
which, together with (38), immediately implies that almost everywhere
[TABLE]
Some straightforward computations employing this yield
[TABLE]
which, together with (26), immediately implies that is causal. Since is causal, we deduce that is a constant, and thereby is also a constant, a contradiction to the assumption that is not white.
Now, with the claim in (40), we infer from (39) that on the positive measure set ,
[TABLE]
which, by Theorem 2.5, immediately implies b). ∎
3.2 Computation of Optimal
Assuming is not white, we give in this section a recursive algorithm to compute the unique optimal solution .
We will first consider the following optimization problem and establish the uniqueness of its optimal solution:
[TABLE]
where is the unique optimal solution to (14).
Theorem 3.3.
A solution to (41) is optimal if and only if the following conditions are satisfied:
- i) [TABLE]
- ii) For some
[TABLE]
is causal;
- iii) For almost all ,
[TABLE]
where is as in (43).
Proof.
The proof is very similar to that of Theorem 1.1, and is thus postponed to Appendix A. ∎
Theorem 3.4.
Assume that is not white. Then the optimal solution to (41) is unique.
Proof.
Note that by Lemma 3.1, we have for any satisfying (15),
[TABLE]
In other words, besides being the unique optimal solution to (14), is also one of the optimal solutions to (41). Let be another optimal solution to (41). Then, by Theorem 3.3, and satisfy (42), (43) and (44) with and , respectively. Now, an argument completely parallel to that in the proof of Theorem 3.2 will yield
[TABLE]
[TABLE]
[TABLE]
[TABLE]
which will collectively imply
[TABLE]
and furthermore
[TABLE]
for almost all . The remainder of the proof then uses exactly the same argument as in the proof of Theorem 3.2 to establish
[TABLE]
almost everywhere and thereby the uniqueness of the optimal solution to (41). ∎
Now, we consider the following algorithm to compute the optimal via recursively solving a sequence of optimization problems:
Algorithm 3.5.
1) Arbitrarily choose satisfying
[TABLE]
2) For , solve the following optimization problem
[TABLE]
and then set to be one of the optimal solutions.
3) Set and repeat 2).
Obviously, the above recursive procedure yields a sequence of functions in . The following theorem discusses the convergence behavior of this sequence.
Theorem 3.6.
Assume that is not white. If there is a pointwise convergent subsequence such that
[TABLE]
then must converge to , the unique optimal solution to (14), almost everywhere.
Proof.
First of all, we will show that
[TABLE]
Apparently, we have, for all ,
[TABLE]
which immediately implies that
[TABLE]
So, to show (49), we only need to prove
[TABLE]
To show this, suppose, by way of contradiction, that
[TABLE]
Then, there exist and a subsequence such that
[TABLE]
for all . It then follows from
[TABLE]
that
[TABLE]
But this would imply that the optimal value of the optimization problem is infinite, a contradiction. Therefore, we have established (50) and thereby (49).
Now, let denote the pointwise limit of the subsequence . Applying (27), (48) and (49), we deduce that
[TABLE]
On the other hand, by Lemma 3.1, we have
[TABLE]
for any satisfying (15). Therefore,
[TABLE]
in other words, is an optimal solution to the optimization problem (41). Now, by Theorem 3.4, we conclude that almost everywhere
[TABLE]
thereby completing the proof of the theorem. ∎
Remark 3.7.
Roughly speaking, Theorem 3.6 says that any convergent subsequence produced by Algorithm 3.5 will converge to the optimal solution to (14). Algorithm 3.5 will in practice compute the Gaussian feedback capacity if the global minimum of the optimization problem (47) can be computed. Although this is a feasible task for certain special families of channels, we are not aware of any efficient way to solve the optimization problem in (47) for a general stationary Gaussian channel, which is a great impediment to implementing Algorithm 3.5. One effective way to circumvent this issue is to find a local minimum in lieu of the global minimum of (47). Obviously, with such a replacement, the performance of the algorithm is compromised in the sense that it will only produce a suboptimal solution. On the other hand, we have observed that the recursive update in Step 2) provides an effective means to prevent the produced sequence from getting stuck at a local optimum. As a matter of fact, for many practical channels for which we know the capacity (see Section 3.3), the compromised algorithm appears to converge quickly to the true optimal solution; see Example 4.3.
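The control flow of such a recursive scheme, in which each iterate is obtained by solving a subproblem parameterized by the previous iterate, can be sketched as follows. This is only a generic skeleton: the subproblem (47) is not reproduced in this text, so `solve_subproblem`, `distance` and the stopping rule are hypothetical placeholders to be supplied by the user, for instance a local gradient-based minimizer as discussed in Remark 3.7.

```python
def recursive_scheme(initial, solve_subproblem, distance, tol=1e-10, max_iter=1000):
    """Iterate x_{n+1} = solve_subproblem(x_n) until successive iterates are close.

    solve_subproblem: hypothetical callable returning a (local) minimizer of the
                      subproblem built from the current iterate.
    distance:         hypothetical callable measuring the change between iterates.
    Returns the last iterate and the number of subproblems solved.
    """
    current = initial
    for iteration in range(1, max_iter + 1):
        candidate = solve_subproblem(current)
        if distance(candidate, current) < tol:
            return candidate, iteration
        current = candidate
    return current, max_iter
```

As a toy usage, taking the subproblem "minimize $(y - x/2 - 1)^2$ over $y$" (solved in closed form by $y = x/2 + 1$) drives the iterates to the fixed point 2; this toy problem is unrelated to the channel and only exercises the loop.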
3.3 Optimal for ARMA($k$) Gaussian Channels
In this section, we generalize Theorem 1.3 and give a more explicit characterization of the optimal solution for the case that is an ARMA($k$) Gaussian process.
The proof of our main result in this section will use the following lemma, whose proof closely follows that of Proposition in [11] and is included for completeness.
Lemma 3.8.
Suppose that the assumptions of Theorem 1.4 are satisfied. If is an optimal solution to (14), then is causal.
Proof.
Suppose, by way of contradiction, that is not causal, then for some , we have
[TABLE]
Let with . Then, for , one verifies that it is also strictly causal, and furthermore,
[TABLE]
By Jensen's formula, the entropy rate of is the same as that of . On the other hand, the power of can be computed as follows:
[TABLE]
where . Therefore, we can choose certain such that , i.e., we can achieve the same information rate using less power, which contradicts Condition i) of Theorem 1.4. ∎
We are now ready to state the main result of this section.
Theorem 3.9.
Suppose the noise is not white with the power spectral density taking the form as in (7). Then, the feedback capacity can be achieved by taking the following form:
[TABLE]
where are positive integers for all and , are all distinct and for all , for all and . Furthermore, is optimal yielding the capacity
[TABLE]
if and only if all , and satisfy the following four conditions:
- i) Power:
[TABLE]
where, as elsewhere in this paper, the parenthesized superscript means the derivative with respect to ;
- ii) Roots: are the roots of the function
[TABLE]
that are strictly inside the unit circle, while the other roots are all strictly outside the unit circle;
- iii) Strong orthogonality: there exists a real number such that for all and ,
[TABLE]
where
[TABLE]
- iv) Output spectrum: For almost all ,
[TABLE]
Proof.
Through a similar argument as in the proof of Theorem 1.2, we first show that any capacity achieving must take the form in (51). To this end, we consider , which, by straightforward computations, can be rewritten as follows:
[TABLE]
Now, it follows from Lemma 3.8, (53) and the fact that and are both polynomials of degree at most that must be of the following form:
[TABLE]
Then, by the fact that is symmetric, we deduce that on , can be written as
[TABLE]
or alternatively, on ,
[TABLE]
Note that has a canonical factorization (see Page of [18]), namely, it can be written as
[TABLE]
where is a positive constant and is a -th order stable polynomial with . Now, we consider
[TABLE]
Since is an function and are both stable polynomials, is an function. It then follows from (55) and (56) that
[TABLE]
which, by (21), implies that the outer function in the inner-outer decomposition of is the constant function . Now, by (54) and (56), we have
[TABLE]
It then follows from (54) and the fact that is a stable polynomial that
[TABLE]
which, by Remark 2.4, implies that is nothing but a Blaschke product, and furthermore, must take the following form:
[TABLE]
for some complex numbers with for all and . By Condition iii) of Theorem 1.4,
[TABLE]
is causal, which means that
[TABLE]
is analytic on , which, together with the fact that has the factor of (for this, see (58)), implies that must also have the same factor. By symmetry, must also have the factor , which means that all and are zeros of . Since is a rational spectrum with degree at most , it has at most zeros. Therefore, we conclude that
[TABLE]
where all are distinct with , all are positive integers with .
The causality of
[TABLE]
implies that for any ,
[TABLE]
which, together with (60), yields
[TABLE]
Rewriting the above integral as a line integral, we have
[TABLE]
[TABLE]
where is the unit circle. Denote
[TABLE]
It is easy to check that is an analytic function on the unit disk since is stable. Via the Heaviside cover-up method, the integrand of the LHS of (61) can be decomposed as
[TABLE]
where and
[TABLE]
is a constant depending on and . Thus is also an analytic function on the unit disk for all . Applying Cauchy's integral formula, we deduce that for any ,
[TABLE]
or equivalently,
[TABLE]
Hence, each takes the following form
[TABLE]
where is a constant independent of , which immediately implies that
[TABLE]
where . Hence, together with (60),
[TABLE]
where for the last equality, all are replaced by , which can be justified by the fact that , thanks to the fact that has only real-valued coefficients.
We next prove that Conditions i)-iv) are necessary and sufficient for the optimality of , which, given (64), readily follows from Theorem 1.1 and some technical computations.
First of all, Condition i) follows from (64) and Condition i) in Theorem 1.1:
[TABLE]
where for (a), we have replaced by , which can be justified by the fact that , again due to the fact that has only real-valued coefficients.
Second, it follows from (60) and (64) that
[TABLE]
which immediately implies Condition ii).
Condition iii) follows from the fact that the coefficients of each at both sides of (61) are equal. More precisely, by (63), the coefficient of on the right hand side is . On the other hand, via (62), the coefficient of on the LHS of (61) is as follows:
[TABLE]
Condition iii) then immediately follows.
Last, Condition iv) follows from Condition iii) of Theorem 1.1 and some technical computations.
Finally, noting the uniqueness of the output PSD corresponding to the optimal (Theorem 3.2) and applying Jensen's formula, we obtain
[TABLE]
The proof of Theorem 3.9 is then complete. ∎
Remark 3.10.
By Theorem 3.9, to compute the ARMA($k$) Gaussian feedback capacity, one needs to first find a solution to one of the following systems of rational equations: for some positive with ,
[TABLE]
such that for all and it also satisfies Condition iv) in Theorem 3.9 to compute the capacity with (52).
4 Examples and Numerical Results
In this section, we give a couple of examples and some numerical results.
Example 4.1.
When $k = 1$, both and are necessarily , and the corresponding system of equations is:
[TABLE]
which immediately gives rise to (10). An elementary analysis (see, e.g., [11] or [13]) will show that Condition iv) of Theorem 3.9 translates to (11), an extra condition has to satisfy. It turns out that for this case, is unique, which, by (52), yields
[TABLE]
So, Theorem 3.9 recovers Theorem 1.3 as a special case.
Example 4.2.
When $k = 2$, by Theorem 3.9, we have three cases to deal with:
1. and : We need to find , such that
[TABLE]
and for all ,
[TABLE]
where and . If such exists, we have
[TABLE]
2. and : We need to find and such that
[TABLE]
and for all
[TABLE]
where
[TABLE]
and and . If such exist, then we have
[TABLE]
3. and , : We need to find distinct and such that
[TABLE]
and for all ,
[TABLE]
where and . If such exist, then we have
[TABLE]
Complicated as they may look, the systems of equations in (67), (68) and (69) all have finitely many solutions for generic and therefore can be numerically solved (for instance, Bertini [1], a numerical algebraic geometry package, can be used to efficiently find their zero-dimensional roots). Below, fixing , , and , assuming different values for , we have plotted the values of against the values of .
Example 4.3.
As evidenced in Example 4.2, solving the polynomial system in (66) will yield the ARMA($k$) Gaussian feedback capacity. Nevertheless, the computational complexity increases drastically as $k$ gets larger. Our observation is that with this approach, the computation takes minutes (with moderate computing power) for , but days for . In this example, we demonstrate the effectiveness of Algorithm 3.5 in computing or estimating the Gaussian feedback capacity. Evidently, this algorithm works in much more general settings, but for the purpose of comparison, we will focus on applying the algorithm to compute the feedback capacity of ARMA($k$) Gaussian channels.
We first discuss a couple of technical issues for the implementation of Algorithm 3.5.
The first issue is about the form that should take for implementing the algorithm. Note that, albeit explicit, the expression as in (51) gives different forms for different and , which will create technical problems for Step 2), where the recursive computation of is conducted. One way to circumvent this issue is to adopt the following unified form:
[TABLE]
where are complex numbers and are complex numbers inside the unit circle. One verifies that the above form encompasses all the possible cases in (51).
As in Remark 3.7, since there does not seem to exist an effective way to find the global minimum for (47), we instead update the sequence by a local minimum in (47) via some gradient-descent-like method. This, however, creates another problem for choosing the initial ; more specifically, if is chosen such that has no zeros inside the unit circle, then any "close" to will likely not have zeros inside the unit circle either. Then by Jensen's formula,
[TABLE]
Therefore, it is difficult to use a gradient-like method to find a feasible such that
[TABLE]
let alone a local minimum point . To overcome this issue, one can further assume that is chosen such that has at least one zero (denoted by below) inside the unit circle, that is,
[TABLE]
where , are appropriately chosen complex numbers.
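The role of Jensen's formula here can be checked numerically. For $f$ analytic on the closed unit disk with $f(0)\neq 0$ and no zeros on the unit circle, the formula reads $\frac{1}{2\pi}\int_{-\pi}^{\pi}\log|f(e^{i\theta})|\,d\theta = \log|f(0)| + \sum_{|a_k|<1}\log\frac{1}{|a_k|}$, where the sum runs over the zeros inside the unit circle. A zero-free $f$ therefore pins the log-integral to $\log|f(0)|$, while each interior zero adds a strictly positive term. The sketch below verifies this for polynomials normalized so that $f(0)=1$; the function names and the test polynomials are illustrative assumptions, not objects from the paper.

```python
import numpy as np


def mean_log_modulus(roots, n=1 << 14):
    """Approximate (1/2pi) * integral of log|f(e^{i theta})| over [-pi, pi]
    for the hypothetical test function f(z) = prod_j (1 - z / r_j), f(0) = 1.
    Roots must stay away from the unit circle for the quadrature to be accurate."""
    theta = 2.0 * np.pi * np.arange(n) / n
    z = np.exp(1j * theta)
    vals = np.ones_like(z)
    for r in roots:
        vals = vals * (1.0 - z / r)
    # Equispaced (trapezoid) rule; very accurate for smooth periodic integrands.
    return float(np.mean(np.log(np.abs(vals))))


def jensen_rhs(roots):
    """Right-hand side of Jensen's formula when f(0) = 1:
    the sum of log(1/|a|) over the zeros a strictly inside the unit circle."""
    return float(sum(-np.log(abs(r)) for r in roots if abs(r) < 1))


# No zero inside the unit circle: the log-integral equals log|f(0)| = 0.
no_inner_zero = [2.0, -1.5 + 1.0j]
# One zero inside the unit circle: the log-integral becomes strictly positive.
one_inner_zero = [2.0, 0.5j]
```

Comparing `mean_log_modulus(roots)` with `jensen_rhs(roots)` for the two root lists shows the log-integral jumping from zero to $\log 2$ once a zero at $0.5i$ is placed inside the unit circle, which is the degree of freedom the initialization above is designed to provide.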
With these two issues addressed, Algorithm 3.5 can be efficiently implemented to yield a lower bound (denoted by ) on the Gaussian feedback capacity. We observe that for the ARMA() channels, , the implemented algorithm actually converges quickly to the true capacity; moreover, it can also handle larger values of within a reasonably short time (measured in hours with moderate computing power). Below, fixing , , , , , assuming different values for , we have plotted the values of against the values of .
Appendices
Appendix A Proof of Theorem 3.3
For the necessity part, we directly use the method of Lagrange multipliers. Consider the Lagrangian of (41)
[TABLE]
Apparently satisfies the KKT condition, that is,
[TABLE]
and for any ,
[TABLE]
which yield (30) and (31), respectively. Furthermore, the infinite-dimensional Hessian matrix of can be computed as
[TABLE]
for all feasible , and
[TABLE]
for all feasible . Note that can be decomposed as , where
[TABLE]
for all feasible . Now, at the global maximum solution , must satisfy: for any and any with ,
[TABLE]
where is the leading principal submatrix of , i.e., . It then follows that at most eigenvalue of is positive, or equivalently, at most eigenvalue of is larger than , where is the leading principal submatrix of . Denote by the second largest eigenvalue of ; then for all . It then follows from the well-known fact on the eigenvalue distribution of Toeplitz forms (see page 63 of [7]) that converges to as tends to infinity. Therefore, we conclude that
[TABLE]
for almost all .
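The Toeplitz eigenvalue fact invoked above can be illustrated on a toy example: for a real bounded symbol, the $k$-th largest eigenvalue of the $n\times n$ Toeplitz section (for any fixed $k$) converges to the essential supremum of the symbol as $n$ grows. The sketch below uses the hypothetical symbol $f(\theta) = 1 + 0.8\cos\theta$, which is not taken from the paper and whose Toeplitz sections are tridiagonal with known eigenvalues $1 + 0.8\cos\frac{j\pi}{n+1}$, $j = 1,\dots,n$; the helper names are likewise illustrative.

```python
import numpy as np


def toeplitz_from_symbol(coeffs, n):
    """n x n real symmetric Toeplitz matrix with entry (i, j) = coeffs[|i - j|],
    where coeffs[k] is the k-th Fourier coefficient of an even real symbol
    (coefficients beyond len(coeffs) are treated as zero)."""
    c = np.zeros(n)
    m = min(n, len(coeffs))
    c[:m] = coeffs[:m]
    idx = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    return c[idx]


def second_largest_eigenvalue(coeffs, n):
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(toeplitz_from_symbol(coeffs, n))[-2])


# Hypothetical symbol f(theta) = 1 + 0.8 cos(theta): c_0 = 1, c_1 = c_{-1} = 0.4.
coeffs = [1.0, 0.4]
sup_f = 1.8  # maximum of f, attained at theta = 0
gaps = [sup_f - second_largest_eigenvalue(coeffs, n) for n in (25, 100, 400)]
```

The entries of `gaps` are positive and shrink as `n` increases, consistent with the second largest eigenvalue approaching the supremum of the symbol from below.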
For the sufficiency part, we use the same idea as in the proof of Theorem 4.1 in [11]. More precisely, we need to prove that for any satisfying (15),
[TABLE]
To see this, note that
[TABLE]
Note that by (31), we have for almost all ,
[TABLE]
and
[TABLE]
It then follows that for any satisfying (15), we have
[TABLE]
The proof of the theorem is then complete.
- [1] D. J. Bates, J. D. Hauenstein, A. J. Sommese and C. W. Wampler. Bertini: Software for Numerical Algebraic Geometry. Available at bertini.nd.edu with permanent doi: dx.doi.org/10.7274/R0H41PB5.
- [2] S. Butman. A general formulation of linear feedback communication systems with solutions. IEEE Trans. Info. Theory, vol. 15, no. 3, pp. 392-400, 1969.
- [3] T. M. Cover and S. Pombra. Gaussian feedback capacity. IEEE Trans. Info. Theory, vol. 35, no. 1, pp. 37-43, 1989.
- [4] A. Dembo. On Gaussian feedback capacity. IEEE Trans. Info. Theory, vol. 35, no. 5, pp. 1072-1076, 1989.
- [5] P. Duren. Theory of $H_{p}$ Spaces, New York: Academic Press, 1970.
- [6] P. Ebert. The capacity of the Gaussian channel with feedback. Bell Syst. Tech. J., vol. 49, pp. 1705-1712, 1970.
- [7] U. Grenander, G. Szegő. Toeplitz forms and their applications, Second Edition, New York, 1958.
- [8] T. T. Kadota, M. Zakai and J. Ziv. Mutual information of the white Gaussian channel with and without feedback. IEEE Trans. Info. Theory, vol. 17, pp. 368-371, 1971.
