Feedback Capacity of the Continuous-Time ARMA(1,1) Gaussian Channel

Jun Su; Guangyue Han; Shlomo Shamai (Shitz)

arXiv:2302.13073·cs.IT·April 11, 2024

TL;DR

This paper derives a closed-form expression for the feedback capacity of the continuous-time ARMA(1,1) Gaussian channel, revealing conditions under which feedback increases capacity and challenging existing bounds and conjectures.

Contribution

The paper provides the first explicit formula for feedback capacity of the continuous-time ARMA(1,1) Gaussian channel, showing feedback may not always increase capacity.

Findings

01. For $-2\kappa<\lambda<0$, the feedback capacity is the unique positive root of $P(x+\kappa)^{2}=2x(x+|\kappa+\lambda|)^{2}$.

02. Feedback may not increase the capacity of continuous-time Gaussian channels with colored noise.

03. Disproves analogues of the half-bit bound and Cover's 2P conjecture in the continuous-time setting.

Abstract

We consider the continuous-time ARMA(1,1) Gaussian channel and derive its feedback capacity in closed form. More specifically, the channel is given by $y(t)=x(t)+z(t)$, where the channel input $\{x(t)\}$ satisfies the average power constraint $P$ and the noise $\{z(t)\}$ is a first-order autoregressive moving average (ARMA(1,1)) Gaussian process satisfying $z^{\prime}(t)+\kappa z(t)=(\kappa+\lambda)w(t)+w^{\prime}(t)$, where $\kappa>0$, $\lambda\in\mathbb{R}$ and $\{w(t)\}$ is a white Gaussian process with unit double-sided spectral density. We show that the feedback capacity of this channel is equal to the unique positive root of the equation $P(x+\kappa)^{2}=2x(x+|\kappa+\lambda|)^{2}$ when $-2\kappa<\lambda<0$ and is equal to $P/2$ …

Equations (380)

$$y(t)=x(t)+w(t),\quad -\infty<t<+\infty,$$

$$Y(t)=\int_{0}^{t}X(u)\,du+B(t),\quad t\geq 0,$$

$$y(t)=x(t)+z(t),\quad -\infty<t<+\infty,$$

$$Y(t)=\int_{0}^{t}X(u)\,du+Z(t),\quad t\geq 0,$$

$$\frac{1}{T}\int_{0}^{T}\mathbb{E}[|X(u)|^{2}]\,du\leq P.$$

$$dY(t)=g_{t}(W,Y_{0}^{t})\,dt+dZ(t).$$

$$\pi^{(T)}=\mathbb{P}(\hat{g}(Y_{0}^{T})\neq W).$$

$$C_{nfb}(P)=\frac{1}{4\pi}\int_{-\infty}^{\infty}\log\left[\max\left(\frac{A}{S_{z}(x)},1\right)\right]dx,$$

$$P=\int_{\{x:\,S_{z}(x)\leq A\}}(A-S_{z}(x))\,dx.$$

$$Y(t)=\int_{0}^{t}X(u)\,du+B(t)+\lambda\int_{0}^{t}\int_{-\infty}^{s}e^{-\kappa(s-u)}\,dB(u)\,ds,\quad t\geq 0,$$

$$y(t)=x(t)+w(t)+\lambda u(t),\quad -\infty<t<\infty,$$

$$Y(t)=\int_{0}^{t}X(u)\,du+B(t)+\int_{0}^{t}\int_{0}^{s}h(s,u)\,dB(u)\,ds,\quad t\geq 0,$$

$$\bar{I}_{\text{SK}}(\Theta;Y)=Pr_{P}^{2},$$

$$Z(t)=B(t)+\int_{0}^{t}\int_{0}^{s}h(s,u)\,dB(u)\,ds,$$

$$-h(s,u)=l(s,u)+\int_{u}^{s}l(s,v)h(v,u)\,dv$$

$$B(t)=Z(t)+\int_{0}^{t}\int_{0}^{s}l(s,u)\,dZ(u)\,ds.$$

$$I(X;Y)=\mathbb{E}\left[\log\frac{f_{X,Y}(X,Y)}{f_{X}(X)f_{Y}(Y)}\right],$$

$$I(X_{0}^{T};Y_{0}^{T})=\begin{cases}\mathbb{E}\left[\log\frac{d\mu_{X_{0}^{T},Y_{0}^{T}}}{d\mu_{X_{0}^{T}}\times\mu_{Y_{0}^{T}}}(X_{0}^{T},Y_{0}^{T})\right],&\text{if }\frac{d\mu_{X_{0}^{T},Y_{0}^{T}}}{d\mu_{X_{0}^{T}}\times\mu_{Y_{0}^{T}}}\text{ exists},\\ \infty,&\text{otherwise},\end{cases}$$

$$I(X;Y)=\sup I(X(\phi_{1}),X(\phi_{2}),\dots,X(\phi_{m});Y(\psi_{1}),Y(\psi_{2}),\dots,Y(\psi_{n})),$$

$$X(\phi_{i})=\int X(t)\phi_{i}(t)\,dt,\quad i=1,2,\dots,m,$$

$$Y(\psi_{j})=\int Y(t)\psi_{j}(t)\,dt,\quad j=1,2,\dots,n.$$

$$I(X;Y)=\sup I(X(\phi_{1}),X(\phi_{2}),\dots,X(\phi_{m});Y),$$

$$dY(t)=g_{t}(\Theta(t),Y_{0}^{t})\,dt+dZ(t),$$

$$C_{fb,T}(P)=\sup_{(\Theta,X)}\frac{1}{T}I(\Theta_{0}^{T};Y_{0}^{T})$$

$$\frac{1}{T}\int_{0}^{T}\mathbb{E}[X^{2}(t)]\,dt\leq P.$$

$$\bar{I}(\Theta;Y)=\limsup_{T\to\infty}\frac{1}{T}I(\Theta_{0}^{T};Y_{0}^{T}),$$

$$C_{fb,\infty}(P)=\sup_{(\Theta,X)}\overline{I}(\Theta;Y),$$

$$\limsup_{T\to\infty}\frac{1}{T}\int_{0}^{T}\mathbb{E}[X^{2}(t)]\,dt\leq P.$$

$$\lim_{T\to\infty}\frac{1}{T}C_{fb,T}(P)=0.$$

$$I(\Theta_{0}^{T};Y_{0}^{T})=\frac{1}{2}\int_{0}^{T}\mathbb{E}[|X_{l}(t)-\widehat{X_{l}}(t)|^{2}]\,dt,$$


Taxonomy

Topics: Bluetooth and Wireless Communication Technologies

Full text

Feedback Capacity of OU-Colored AWGN Channels

Jun Su (The University of Hong Kong), Guangyue Han (The University of Hong Kong), Shlomo Shamai (Shitz) (Technion-Israel Institute of Technology)

Abstract

We derive an explicit feedback capacity formula for the OU-Colored AWGN channel. Among other things, this result shows that, at least in some cases, the continuous-time Schalkwijk-Kailath coding scheme achieves the feedback capacity of such a channel, and that feedback may not increase the capacity of a continuous-time ACGN channel even if the noise process is colored.

1 Introduction

We start with the following continuous-time additive white Gaussian noise (AWGN) channel

$$\bm{y}(t)=\bm{x}(t)+\bm{w}(t),\quad -\infty<t<+\infty, \tag{1}$$

where the channel noise $\{\bm{w}(t)\}$ is a white Gaussian process with unit double-sided spectral density, $\{\bm{x}(t)\}$ is the channel input and $\{\bm{y}(t)\}$ is the channel output. Since $\{\bm{w}(t)\}$ can be regarded as the derivative $\{\dot{B}(t)\}$ of the standard Brownian motion $\{B(t)\}$ in the generalized sense [1, 2], or equivalently, $\{B(t)\}$ the integral of $\{\bm{w}(t)\}$ , the AWGN channel as in (1) can be alternatively characterized by

$$Y(t)=\int_{0}^{t}X(u)\,du+B(t),\quad t\geq 0, \tag{2}$$

where $X=\{X(t)\}$ is the channel input and $Y=\{Y(t)\}$ is the channel output. Unlike white Gaussian noise, which is a generalized stochastic process in the sense of Schwartz’s distribution [3], Brownian motion is an ordinary stochastic process that has been extensively studied in stochastic calculus. Evidently, the two formulations as in (1) and (2) allow us to examine an AWGN channel from different perspectives; in particular, the use of Brownian motion equips us with a wide range of established tools and techniques in stochastic calculus (see, e.g., [4, 5] and references therein).

This paper is concerned with the following continuous-time additive colored Gaussian noise (ACGN) channel

$$\bm{y}(t)=\bm{x}(t)+\bm{z}(t),\quad -\infty<t<+\infty, \tag{3}$$

where the channel noise $\bm{z}=\{\bm{z}(t)\}$ is a (possibly colored and generalized) stationary Gaussian process. Evidently, AWGN channels are a degenerate case of ACGN channels. As above, the ACGN channel in (3) can be alternatively characterized by

$$Y(t)=\int_{0}^{t}X(u)\,du+Z(t),\quad t\geq 0, \tag{4}$$

where $\{Z(t)\}$ is the (generalized) integral of $\{\bm{z}(t)\}$ . Following [4], the treatment of ACGN channels in this work is mainly based on the formulation in (4).

For any $M\in\mathbb{N}$ and $T>0$ , an $(M,T)$ *code* for the ACGN channel (4) consists of the following:

(a) A message index $W$ independent of $\{Z(t);t\in[0,T]\}$ and uniformly distributed over $\{1,2,...,M\}$ .

(b) For the non-feedback case, an encoding function $g_{u}:\{1,2,...,M\}\rightarrow\mathbb{R},\ u\in[0,T]$ , yielding codewords $X(u)=g_{u}(W)$ ; for the feedback case, an encoding function $g_{u}:\{1,2,...,M\}\times C[0,u]\rightarrow\mathbb{R},\ u\in[0,T]$ , yielding codewords $X(u)=g_{u}(W,Y_{0}^{u-})$ . For both cases, the classical average power constraint is satisfied:

$$\frac{1}{T}\int_{0}^{T}\mathbb{E}[|X(u)|^{2}]\,du\leq P.$$

(c) A decoding functional $\hat{g}:C[0,T]\rightarrow\{1,2,...,M\}$ .

Here we remark that for the feedback case, it follows from the pathwise continuity of $\{Y(t)\}$ that $X(t)=g_{t}(W,Y_{0}^{t})$ , and therefore the channel output $\{Y(t)\}$ is in fact the unique solution to the following stochastic functional differential equation:

$$dY(t)=g_{t}(W,Y_{0}^{t})\,dt+dZ(t). \tag{5}$$

The error probability $\pi^{(T)}$ for the $(M,T)$ code as above is defined as

$$\pi^{(T)}=\mathbb{P}(\hat{g}(Y_{0}^{T})\neq W).$$

A rate $R$ is achievable if there exists a sequence of $([e^{TR}],T)$ codes with $\lim_{T\to\infty}\pi^{(T)}=0$ . The channel capacity is defined as the supremum of all achievable rates, denoted by $C_{nfb}(P)$ for the non-feedback case and $C_{fb}(P)$ for the feedback case.

The literature on ACGN channels is vast, so below we survey only the results most relevant to this work. It has been shown by Huang and Johnson [6, 7] that $C_{nfb}(P)$ can be achieved by a Gaussian input. For a special family of ACGN channels, Hitsuda [8] has applied a canonical representation method to derive a fundamental formula for the channel mutual information (see Lemma 3.2); based on this result, Ihara [9] showed that $C_{fb}(P)$ can be achieved by a Gaussian input with an additive feedback term. As in the discrete-time case, the property that feedback can at most double the capacity of an ACGN channel, i.e., $C_{fb}(P)\leq 2C_{nfb}(P)$ , is established by examining a discrete-time approximation of $\{Z(t)\}$ (see [10, 11, 12]). Employing a Hilbert space approach [13, 14], Baker [15, 16] has derived a theoretical formula for $C_{nfb}(P)$ , which however is somewhat difficult to evaluate. When it comes to effective computation of $C_{nfb}(P)$ or $C_{fb}(P)$ , to the best of our knowledge, there are only a few results featuring an “explicit” and “computable” formula, detailed below. Here, we remark that Baker, Ihara and Hitsuda have studied the capacity of some families of ACGN channels, yet under different types of power constraints (see [16, 15, 17, 8]).

1. For the ACGN channel formulated as in (3), when $\bm{z}$ is an ordinary stationary Gaussian process with rational spectrum, $C_{nfb}(P)$ can be determined by the water-filling method (see, e.g., [18, 14, 19, 17]). More specifically,

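$$C_{nfb}(P)=\frac{1}{4\pi}\int_{-\infty}^{\infty}\log\left[\max\left(\frac{A}{S_{\bm{z}}(x)},1\right)\right]dx,$$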

where $S_{\bm{z}}(x)$ is the spectral density function (SDF) of the noise process $\bm{z}$ and the water level $A$ is a constant determined by

$$P=\int_{\{x:\,S_{\bm{z}}(x)\leq A\}}(A-S_{\bm{z}}(x))\,dx.$$

2. For the AWGN channel as in (1) or (2), it is a classical result that $C_{nfb}(P)=P/2$ and feedback does not increase the channel capacity, that is to say, $C_{fb}(P)=P/2$ (see, e.g., [4, 5, 20]). Moreover, $C_{fb}(P)$ can be achieved by a linear feedback coding scheme that maximizes the channel mutual information and minimizes the filtering error simultaneously [21, 22, 23].

In this paper, we will focus our attention on a special family of ACGN channels, which is characterized as

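$$Y(t)=\int_{0}^{t}X(u)\,du+B(t)+\lambda\int_{0}^{t}\int_{-\infty}^{s}e^{-\kappa(s-u)}\,dB(u)\,ds,\quad t\geq 0, \tag{6}$$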

where $\lambda\in\mathbb{R},\kappa>0$ . Note that the channel above can be alternatively characterized by

$$\bm{y}(t)=\bm{x}(t)+\bm{w}(t)+\lambda\bm{u}(t),\quad -\infty<t<\infty, \tag{7}$$

where, as before, $\bm{w}(t)=\dot{B}(t)$ is a white Gaussian process, and $\bm{u}(t)=\int_{-\infty}^{t}e^{-\kappa(t-u)}dB(u)$ is a stationary Ornstein-Uhlenbeck (OU) process, arguably the simplest nontrivial continuous-time stationary Gaussian process. Evidently, when $\lambda=0$ , the equation (7) boils down to (1), and when $\lambda\neq 0$ , the channel input, after going through an AWGN channel, will be further corrupted by an OU noise. For this reason, we will henceforth refer to the channel (6) as an OU-Colored AWGN channel.

The main contribution of this work is an explicit characterization of the feedback capacity of an OU-Colored AWGN channel. Before this work, no “explicit” and “computable” formula was known for any nontrivial stationary ACGN channel (3). Throughout the remainder of this paper, the notations $C_{nfb}(P)$ and $C_{fb}(P)$ will be reserved for the OU-Colored AWGN channel (6).

We will first derive a lower bound on $C_{fb}(P)$ , which turns out to be tight for some cases. To achieve this, we will examine the following ACGN channel

$$Y(t)=\int_{0}^{t}X(u)\,du+B(t)+\int_{0}^{t}\int_{0}^{s}h(s,u)\,dB(u)\,ds,\quad t\geq 0, \tag{8}$$

where $h(s,u)$ is a Volterra kernel function on $L^{2}([0,T]^{2})$ for any $T>0$ . Here we emphasize that the channel (8) may not correspond to a stationary ACGN channel as in (3). However, it can be shown that $\{B(t)+\int_{0}^{t}\int_{0}^{s}h(s,u)dB(u)ds\}$ is equivalent to the Brownian motion $\{B(t)\}$ [24], which renders the channel (8) more amenable to in-depth mathematical analysis, as evidenced by relevant results in the literature (see, e.g., [8, 9, 25]).

More specifically, let $\{\Theta(t)\}$ be the message process, and let $\bar{I}_{\text{SK}}(\Theta;Y)$ denote the mutual information rate between $\{\Theta(t)\}$ and $\{Y(t)\}$ under the so-called continuous-time Schalkwijk-Kailath (SK) coding scheme. We will show (Theorem 4.2) that

$$\bar{I}_{\text{SK}}(\Theta;Y)=Pr_{P}^{2},$$

where $r_{P}$ is the limit of the unique solution to an ordinary differential equation, and moreover, one of the real roots of a third-order polynomial. It turns out that an OU-colored AWGN channel can be regarded as a special case of (8), and therefore $\bar{I}_{\text{SK}}(\Theta;Y)$ can help provide a lower bound on $C_{fb}(P)$ .

With the aforementioned lower bound, we are ready to derive an explicit expression of $C_{fb}(P)$ . More specifically, by examining a discrete-time approximation of the channel (6), we prove (Theorem 5.1) that for the case $-2\kappa<\lambda<0$ , $C_{fb}(P)$ is upper bounded by $\bar{I}_{\text{SK}}(\Theta;Y)$ , which means $C_{fb}(P)=\bar{I}_{\text{SK}}(\Theta;Y)$ ; for the other cases, we show $C_{fb}(P)=C_{nfb}(P)=P/2$ . As a byproduct, this result shows that feedback may not increase the capacity of a continuous-time ACGN channel even if the noise process is colored. By contrast, for a discrete-time ACGN channel, feedback does not increase the capacity if and only if the noise spectrum is white (see [26, Corollary 4.3]).

The remainder of the paper is organized as follows. In Section 2, we review the necessary notation and terminology. We review the coding theorem for the feedback capacity and introduce the continuous-time SK coding scheme in Section 3. Section 4 provides an asymptotic characterization of $\overline{I}_{\text{SK}}(\Theta;Y)$ for a subclass of ACGN channels, which represents a lower bound on $C_{fb}(P)$ . In Section 5, we derive an explicit formula for $C_{fb}(P)$ .

2 Notation and Terminology

We use $(\Omega,\mathcal{F},{\mathbb{P}})$ to denote the underlying probability space, and ${\mathbb{E}}$ to denote the expectation with respect to the probability measure ${\mathbb{P}}$ . As is typical in the theory of stochastic calculus, we assume the probability space is equipped with a filtration $\{\mathcal{F}_{t}:0\leq t<\infty\}$ , which satisfies the usual conditions [27] and is rich enough to accommodate a standard Brownian motion. Throughout the paper, we will mostly use uppercase letters (e.g., $X$ , $Y$ ) to denote random variables, and their lowercase counterparts (e.g., $x$ , $y$ ) to denote their realizations.

Let $C[0,\infty)$ denote the space of all continuous functions over $[0,\infty)$ , and let $C^{1}[0,\infty)$ be the space of all functions in $C[0,\infty)$ that have continuous derivatives on $[0,\infty)$ . For any $T>0$ , let $C[0,T]$ denote the space of all continuous functions over $[0,T]$ . Let $X,Y$ be random variables defined on the probability space $(\Omega,\mathcal{F},{\mathbb{P}})$ , which will be used to illustrate most of the notions and facts in this section (note that the same notations may have different connotations in other sections). Note that in this paper, a random variable can be real-valued with a probability density function, or path-valued (more precisely, $C[0,\infty)$ - or $C[0,T]$ -valued).

For any two path-valued random variables $X_{0}^{T}=\{X(t);0\leq t\leq T\}$ and $Y_{0}^{T}=\{Y(t);0\leq t\leq T\}$ , we use $\mu_{X_{0}^{T}}$ and $\mu_{Y_{0}^{T}}$ to denote the probability distributions on $C[0,T]$ induced by $X_{0}^{T}$ and $Y_{0}^{T}$ , respectively, and $\mu_{X_{0}^{T}}\times\mu_{Y_{0}^{T}}$ the product distribution of $\mu_{X_{0}^{T}}$ and $\mu_{Y_{0}^{T}}$ ; moreover, we will use $\mu_{X_{0}^{T},Y_{0}^{T}}$ to denote their joint probability distribution on $C[0,T]\times C[0,T]$ . Besides, we use $\mathcal{F}_{t}(Y)$ to denote the $\sigma$ -field generated by $Y_{0}^{t}$ .

For any two probability measures $\mu$ and $\nu$ , we write $\mu\sim\nu$ to mean they are equivalent, namely, $\mu$ is absolutely continuous with respect to $\nu$ and vice versa. By Hitsuda [24], if a Gaussian process $\{Z(t)\}$ is equivalent to a given Brownian motion, then there exists a (possibly different) Brownian motion $\{B(t)\}$ such that $Z(t)$ can be uniquely represented by

$$Z(t)=B(t)+\int_{0}^{t}\int_{0}^{s}h(s,u)\,dB(u)\,ds, \tag{9}$$

where $h(s,u)$ is a Volterra kernel function on $L^{2}([0,T]^{2})$ for any $T>0$ , i.e., $h(s,u)=0$ if $s<u$ and $\int_{0}^{T}\int_{0}^{T}h(s,u)^{2}dsdu<\infty$ for any $T>0$ . Conversely, for a given Brownian motion $\{B(t)\}$ , if $\{Z(t)\}$ has a representation in the form (9), then $\{Z(t)\}$ is equivalent to $\{B(t)\}$ . Note that, for any $T>0$ , there exists a Volterra kernel function $l(s,u)\in L^{2}([0,T]^{2})$ , referred to as the resolvent kernel of $h(s,u)$ , such that

$$-h(s,u)=l(s,u)+\int_{u}^{s}l(s,v)h(v,u)\,dv \tag{10}$$

for any $s,u\in[0,T]$ (see [28, Chapter 2]). Therefore, the Brownian motion $\{B(t)\}$ can be also uniquely determined in terms of $\{Z(t)\}$ as

$$B(t)=Z(t)+\int_{0}^{t}\int_{0}^{s}l(s,u)\,dZ(u)\,ds. \tag{11}$$
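The resolvent relation $-h(s,u)=l(s,u)+\int_{u}^{s}l(s,v)h(v,u)\,dv$ can be checked numerically on a simple example. The sketch below assumes an illustrative exponential kernel $h(s,u)=\lambda e^{-\kappa(s-u)}$ for $s\geq u$, suggested by the exponential weight in (6) but ignoring the contribution from $u<0$; for this kernel a direct calculation gives the candidate resolvent $l(s,u)=-\lambda e^{-(\kappa+\lambda)(s-u)}$. The function names and the trapezoid quadrature are choices made here for illustration, not part of the paper.

```python
import math


def h(s, u, lam, kappa):
    """Illustrative Volterra kernel (an assumption): lam * exp(-kappa (s-u)) for s >= u."""
    return lam * math.exp(-kappa * (s - u)) if s >= u else 0.0


def l(s, u, lam, kappa):
    """Candidate resolvent kernel of h for the exponential example above."""
    return -lam * math.exp(-(kappa + lam) * (s - u)) if s >= u else 0.0


def resolvent_residual(s, u, lam, kappa, n=2000):
    """Return h(s,u) + l(s,u) + int_u^s l(s,v) h(v,u) dv (trapezoid rule).

    The resolvent relation holds exactly when this residual is zero.
    """
    if s <= u:
        return h(s, u, lam, kappa) + l(s, u, lam, kappa)
    step = (s - u) / n

    def integrand(v):
        return l(s, v, lam, kappa) * h(v, u, lam, kappa)

    total = 0.5 * (integrand(u) + integrand(s))
    for k in range(1, n):
        total += integrand(u + k * step)
    return h(s, u, lam, kappa) + l(s, u, lam, kappa) + total * step


if __name__ == "__main__":
    print(resolvent_residual(2.0, 0.5, -0.5, 1.0))
```

The residual vanishes up to quadrature error, which is consistent with the pair $(h,l)$ above satisfying the resolvent relation for this particular kernel.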

The mutual information $I(X;Y)$ between two real-valued random variables $X,Y$ is defined as

$$I(X;Y)=\mathbb{E}\left[\log\frac{f_{X,Y}(X,Y)}{f_{X}(X)f_{Y}(Y)}\right], \tag{12}$$

where $f_{X},f_{Y}$ denote the probability density functions of $X,Y$ , respectively, and $f_{X,Y}$ their joint probability density function. More generally, for two $C[0,T]$ -valued random variables $X_{0}^{T},Y_{0}^{T}$ , we define

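$$I(X_{0}^{T};Y_{0}^{T})=\begin{cases}\mathbb{E}\left[\log\frac{d\mu_{X_{0}^{T},Y_{0}^{T}}}{d\mu_{X_{0}^{T}}\times\mu_{Y_{0}^{T}}}(X_{0}^{T},Y_{0}^{T})\right],&\text{if }\frac{d\mu_{X_{0}^{T},Y_{0}^{T}}}{d\mu_{X_{0}^{T}}\times\mu_{Y_{0}^{T}}}\text{ exists},\\ \infty,&\text{otherwise},\end{cases} \tag{13}$$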

where $\frac{d\mu_{X_{0}^{T},{Y_{0}^{T}}}}{d\mu_{X_{0}^{T}}\times\mu_{Y_{0}^{T}}}$ denotes the Radon-Nikodym derivative of $\mu_{X_{0}^{T},{Y_{0}^{T}}}$ with respect to $\mu_{X_{0}^{T}}\times\mu_{Y_{0}^{T}}$ .

The notion of mutual information can be further extended to generalized random processes. We describe this extension only briefly and refer the reader to [13] for a more comprehensive exposition.

The mutual information between two generalized random processes $X=\{X(t)\}$ and $Y=\{Y(t)\}$ is defined as

$$I(X;Y)=\sup I(X(\phi_{1}),X(\phi_{2}),\dots,X(\phi_{m});Y(\psi_{1}),Y(\psi_{2}),\dots,Y(\psi_{n})), \tag{14}$$

where the supremum is over all possible $n,m\in\mathbb{N}$ and all possible testing functions $\phi_{1},\phi_{2},\dots,\phi_{m}$ and $\psi_{1},\psi_{2},\dots,\psi_{n}$ , and we have defined

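$$X(\phi_{i})=\int X(t)\phi_{i}(t)\,dt,\quad i=1,2,\dots,m,$$

$$Y(\psi_{j})=\int Y(t)\psi_{j}(t)\,dt,\quad j=1,2,\dots,n.$$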

It can be verified that the general definition of mutual information as in (14) includes (12) and (13) as special cases; moreover, when one of $X$ and $Y$ , say, $Y$ , is a random variable, the general definition boils down to

$$I(X;Y)=\sup I(X(\phi_{1}),X(\phi_{2}),\dots,X(\phi_{m});Y),$$

where the supremum is over all possible $m\in\mathbb{N}$ and all possible testing functions $\phi_{1},\phi_{2},\dots,\phi_{m}$ .

3 Continuous-Time SK Coding

In this section, we shall examine the continuous-time ACGN channel (8). Throughout this section, let $Z(t)=B(t)+\int_{0}^{t}\int_{0}^{s}h(s,u)dB(u)ds$ .

The celebrated channel coding theorem by Shannon [20] states, roughly speaking, that for a discrete memoryless channel, the capacity can be written as a supremum of the mutual information between the channel input and output. This classical result has been extensively extended and generalized to various channel models. Not surprisingly, under some mild assumptions, similar results hold for the non-feedback and feedback capacity of our channel. We will present the coding theorem for the feedback capacity below, while that for the non-feedback capacity can be found in Section 5.2.

For the purpose of presenting a coding theorem for the feedback capacity, instead of transmitting a message index $W$ , a random variable taking values from a finite alphabet, we will transmit a message process $\Theta=\{\Theta(t)\}$ , a real-valued random process. Then, compared to (5), the associated stochastic functional differential equation will take the following form:

$$dY(t)=g_{t}(\Theta(t),Y_{0}^{t})\,dt+dZ(t),$$

where we have set $X(t)=g_{t}(\Theta(t),Y_{0}^{t})$ . Following [4], we consider the so-called $T$ -block feedback capacity

$$C_{fb,T}(P)=\sup_{(\Theta,X)}\frac{1}{T}I(\Theta_{0}^{T};Y_{0}^{T}),$$

where the supremum is taken over all pairs $(\Theta,X)$ satisfying the following constraint

$$\frac{1}{T}\int_{0}^{T}\mathbb{E}[X^{2}(t)]\,dt\leq P. \tag{15}$$

Now, we define

$$\bar{I}(\Theta;Y)=\limsup_{T\to\infty}\frac{1}{T}I(\Theta_{0}^{T};Y_{0}^{T}),$$

provided the limit exists, and furthermore define

$$C_{fb,\infty}(P)=\sup_{(\Theta,X)}\overline{I}(\Theta;Y),$$

where the supremum is taken for all pairs $(\Theta,X)$ satisfying the constraint

$$\limsup_{T\to\infty}\frac{1}{T}\int_{0}^{T}\mathbb{E}[X^{2}(t)]\,dt\leq P.$$

Then, the aforementioned coding theorem for the feedback capacity is stated below.

Theorem 3.1 ([29, Theorem 1]).

Assume that

$$\lim_{T\to\infty}\frac{1}{T}C_{fb,T}(P)=0.$$

If $R<C_{fb,\infty}(P)$ and $P$ is a continuity point of $C_{fb,\infty}(P)$ , then the rate $R$ is achievable. Conversely, if a rate $R$ is achievable, then $R\leq C_{fb,\infty}(P)$ .

The following lemma generalizes the classical I-CMMSE relationship in [5, 30].

Lemma 3.2 ([12, Theorem 1]).

Suppose $\int_{0}^{T}\mathbb{E}[X^{2}(t)]dt<\infty$ . Then, we have

$$I(\Theta_{0}^{T};Y_{0}^{T})=\frac{1}{2}\int_{0}^{T}\mathbb{E}[|X_{l}(t)-\widehat{X_{l}}(t)|^{2}]\,dt,$$

where $X_{l}=\{X_{l}(t);t\in[0,T]\}$ is a random process defined by

[TABLE]

and $l=l(s,u)$ is the resolvent kernel of $h$ in $L^{2}([0,T]^{2})$ and $\widehat{X_{l}}(t)\triangleq\text{E}[X_{l}(t)|\mathcal{F}_{t}(Y)]$ .

When it comes to the $T$ -block feedback capacity of the channel (8), we remark that the so-called additive feedback coding scheme can achieve $C_{fb,T}(P)$ (see, e.g., [31, 9]). This coding scheme is formulated as follows. Consider the additive feedback coding scheme $(\Theta,X)=(\{\Theta(t)\},\{X(t)\})$ with $X(t)=\Theta(t)-\zeta(t)$ , where $\zeta=\{\zeta(t)\}$ represents the feedback term, causally dependent on the output $Y=\{Y(t)\}$ , and is appropriately chosen such that the stochastic functional differential equation

[TABLE]

admits a unique solution. Obviously, if there is no feedback, (17) becomes

[TABLE]

Slightly extending the result [4, Theorem 6.2.3], we can prove the following lemma in the same manner.

Lemma 3.3.

Suppose that

[TABLE]

Then, for any $t\in[0,T]$ , we have

[TABLE]

Note that (18) means that for the channel (8) under this scheme, additive feedback will not provide the receiver with any new information. However, feedback can be used as a means to save transmission energy, since, for a fixed message $\Theta$ , we can lower $\mathbb{E}[|\Theta(t)-\zeta(t)|^{2}]$ by appropriately choosing $\zeta$ . This observation suggests an effective way to design a coding scheme to maximize $I(\Theta_{0}^{T};Y_{0}^{T})$ for the channel (8) in which $X$ satisfies (15). Indeed, Ihara proved the following result, for which a relatively more direct proof is provided in Appendix A.

Theorem 3.4 ([9, Theorem 3] Reformulated).

For the continuous-time ACGN channel (8) under the constraint (15), $C_{fb,T}(P)$ can be achieved by a Gaussian pair $(\Theta,X)$ of the following form

[TABLE]

where

[TABLE]

Moreover, $\mathcal{F}_{t}(Y^{\ast})=\mathcal{F}_{t}(Y)$ , and so the pair $(\Theta,X)$ characterizes an additive feedback coding scheme of the form (17) where $\zeta(t)=\mathbb{E}[\Theta(t)|\mathcal{F}_{t}(Y)]$ .

The essence of the above theorem is that we can restrict our attention to coding schemes of the form (19). In the spirit of the classical Schalkwijk-Kailath (SK) coding scheme, we formulate in our notation the continuous-time version of the celebrated SK coding scheme $(\Theta,X)$ in the form of

[TABLE]

satisfying

[TABLE]

where $\Theta_{0}$ is a standard Gaussian random variable and $A(t)$ is some function.

In general, the above continuous-time SK coding scheme can be invalid in the sense that $A(t)$ may not exist. However, in Sections 4 and 5, we will show that the continuous-time SK coding scheme is valid for a subclass of ACGN channels (8) and is also optimal for some special families of ACGN channels.

4 Mutual Information Rate

In this section, we narrow our attention to the special family of ACGN channels (8) in which the resolvent kernel $l(t,s)$ of $h(t,s)$ can be written as

[TABLE]

where $l_{u}(t)\in C[0,+\infty)$ and $l_{d}(t)\in C^{1}[0,+\infty)$ .

We first prove a lemma characterizing the asymptotics of the solution $g$ to the following ordinary differential equation (ODE)

[TABLE]

where $p(t),q(t)\in C[0,\infty)$ satisfying $\lim_{t\to\infty}p(t)=p$ and $\lim_{t\to\infty}q(t)=q$ for two constants $p,q\in\mathbb{R}$ .

Lemma 4.1.

For every $P>0$ , the ODE (22) admits a unique solution $g(t)\in C^{1}[0,\infty)$ . Moreover, $\lim_{t\to\infty}g(t)$ exists, which is one of the real roots of the following cubic equation:

[TABLE]

Equipped with Lemma 4.1, we can prove the following theorem.

Theorem 4.2.

Assume the resolvent kernel $l(t,s)$ of $h(t,s)$ in (8) can be written in the form (21) with

[TABLE]

where $\alpha,\beta\in\mathbb{R}$ . Then, we have

[TABLE]

where $r_{P}=\lim_{t\to\infty}g(t)$ and $g$ is the solution of the ODE (22) with $p(t)=-l^{\prime}_{d}(t)/l_{d}(t)$ and $q(t)=(l_{u}(t)+l^{\prime}_{d}(t))/l_{d}(t)$ . Moreover, $r_{P}$ is one of the real roots of the following cubic equation

[TABLE]

Proof.

We shall employ a continuous-time SK coding scheme $(\Theta,X)$ . Let $A(t)$ be a function defined by

[TABLE]

where the function $g(t)$ is defined to be a solution of the following Abel equation of the first kind:

[TABLE]

It then follows from (23) and Lemma 4.1 that $\lim_{t\to\infty}g(t)$ exists (denoted by $r_{P}$ ) and $r_{P}$ is one of the real roots of the cubic equation (25).

Next, we shall prove that the continuous-time SK coding scheme defined by (26) and (20) is valid, that is, for any $t\geq 0$

[TABLE]

Indeed, since $g$ satisfies (27), it holds that for all $t$

[TABLE]

Multiplying both sides of (29) by $l_{d}(t)A(t)$ , we obtain

[TABLE]

and

[TABLE]

Therefore, (29) leads to

[TABLE]

which is equivalent to

[TABLE]

where

[TABLE]

Therefore, noting the initial condition $A^{2}(0)=P$ , it holds that

[TABLE]

By [32, Theorem 12.2], we can readily establish

[TABLE]

which, together with (32), immediately implies (28), as desired.

Now we are ready to prove (24). From Lemma 3.2 and (33), it follows that for a fixed $T$ ,

[TABLE]

Thus, we have

[TABLE]

where (a) follows from (31) and (32), (b) follows from Lemma 4.1. Thus, (24) is established and then the proof is complete. ∎

Remark 4.3.

It follows from the proof of Theorem 4.2 that $r_{P}$ is uniquely determined by $l(t,s)$ , rather than by the particular choice of $l_{u}(s),l_{d}(t)$ .

To illustrate the application of the above theorem, we give the following two examples.

Example 4.4.

When $l(t,s)\equiv 0$ , the channel (8) boils down to the AWGN channel (2). Apparently, one can choose $l_{u}\equiv 0$ and $l_{d}\equiv 1$ , yielding $\overline{I}_{\text{SK}}(\Theta;Y)=P/2$ , which is widely known as the capacity of (2).
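The value $P/2$ for the AWGN case can be reproduced with a short numerical sketch. It relies on the classical linear-filtering analysis of the SK scheme over white noise, which is assumed here rather than taken from the general kernel formulas of this section: with $X(t)=A(t)(\Theta_{0}-\mathbb{E}[\Theta_{0}|\mathcal{F}_{t}(Y)])$ and a standard Gaussian $\Theta_{0}$, the error variance $\varepsilon(t)$ obeys $\varepsilon^{\prime}(t)=-A^{2}(t)\varepsilon^{2}(t)$ with $\varepsilon(0)=1$, and the gain is chosen so that $A^{2}(t)\varepsilon(t)=P$. The function name `sk_awgn` and the Euler discretization are illustrative.

```python
import math


def sk_awgn(P, T, steps=100_000):
    """Euler integration of the SK error variance over an AWGN channel.

    Returns (information rate, average input power, final error variance).
    """
    dt = T / steps
    eps = 1.0      # Var(Theta_0) = 1 for a standard Gaussian message
    energy = 0.0   # accumulates the integral of E[X(t)^2]
    for _ in range(steps):
        a_sq = P / eps                 # gain chosen so that A^2 * eps = P
        energy += a_sq * eps * dt      # E[X(t)^2] = A(t)^2 * eps(t)
        eps -= a_sq * eps * eps * dt   # eps' = -A^2 * eps^2
    # For jointly Gaussian variables, I = 0.5 * log(prior var / posterior var).
    rate = 0.5 * math.log(1.0 / eps) / T
    return rate, energy / T, eps


if __name__ == "__main__":
    print(sk_awgn(2.0, 3.0))
```

Under these assumptions the error variance decays like $e^{-Pt}$, so the information rate approaches $P/2$ while the average power stays at $P$.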

Example 4.5.

When $l(t,s)\equiv 1$ , it turns out that the channel (8) boils down to

[TABLE]

It can be verified that $l_{u}\equiv l_{d}\equiv c$ , where $c$ is a non-zero constant. Thus, we have $\alpha=1,\beta=0$ , yielding that $\bar{I}_{\text{SK}}(\Theta;Y)$ is the unique positive root of the cubic equation $P(x+1)^{2}=2x^{3}$ . This recovers Proposition 1 in [12].

To conclude this section, although Theorem 4.2 provides a lower bound on feedback capacity of a subclass of ACGN channels, this lower bound is somewhat implicit. In Section 5, we find more detailed answers by narrowing our attention to a special class of channel models.

5 Capacity of OU-Colored AWGN Channels

In this section, we focus on the following OU-Colored AWGN channel

[TABLE]

where

[TABLE]

The following theorem is our main result in which we derive an explicit formula for $C_{fb}(P)$ .

Theorem 5.1.

$C_{fb}(P)$ is determined in the following two cases:

(1) if $\lambda\leq-2\kappa$ or $\lambda\geq 0$ , then $C_{fb}(P)=P/2$ ;

(2) if $-2\kappa<\lambda<0$ , then $C_{fb}(P)$ is the unique positive root of the third-order polynomial

[TABLE]
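As a numerical companion to Theorem 5.1, the following sketch evaluates both cases. It assumes that the cubic in case (2) takes the form $P(x+\kappa)^{2}=2x(x+|\kappa+\lambda|)^{2}$ quoted in the abstract of the arXiv listing above; the function name `feedback_capacity` and the bisection solver are illustrative choices, not part of the paper.

```python
def feedback_capacity(P, kappa, lam, iterations=200):
    """Evaluate the two cases of Theorem 5.1 (cubic form assumed from the abstract)."""
    if P <= 0 or kappa <= 0:
        raise ValueError("P and kappa must be positive")
    if lam <= -2.0 * kappa or lam >= 0.0:
        return P / 2.0                      # case (1): feedback does not help
    a = abs(kappa + lam)

    def f(x):
        # f(x) = 0 is equivalent to P (x + kappa)^2 = 2 x (x + a)^2
        return 2.0 * x * (x + a) ** 2 - P * (x + kappa) ** 2

    lo, hi = 0.0, max(P, 1.0)
    while f(hi) <= 0.0:                     # f(0) < 0 and f grows like 2 x^3
        hi *= 2.0
    for _ in range(iterations):             # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(feedback_capacity(2.0, 1.0, -1.0))   # kappa = 1, lam = -1
    print(feedback_capacity(2.0, 1.0, 0.5))    # case (1)
```

For $\kappa=1$ and $\lambda=-1$ the assumed equation reduces to $P(x+1)^{2}=2x^{3}$, the cubic of Example 4.5. In case (2) we have $|\kappa+\lambda|<\kappa$, so the function `f` is negative at $x=P/2$ and the root lies strictly above $P/2$.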

Before the proof, we introduce two auxiliary random processes $Z_{0}=\{Z_{0}(t);t\in[0,T]\}$ , $Z_{\ast}=\{Z_{\ast}(t);t\in[0,T]\}$ by

[TABLE]

respectively. Let $\zeta_{0}$ be a Gaussian random variable defined by

[TABLE]

Note that $Z_{0}$ solves the stochastic differential equation

[TABLE]

Thus, we obtain

[TABLE]

Moreover, it holds that

[TABLE]

5.1 Proof of the Converse Part (Upper Bound)

In this subsection, we prove the converse part of Theorem 5.1, which relies on some existing results on the feedback capacity of discrete-time ARMA(1,1) Gaussian channels under the average power constraint (see detailed definitions in [33]). For such channels, Yang et al. [34, Theorem 7] derived a relatively explicit formula for feedback capacity under the assumption that stationary inputs can achieve feedback capacity, which has been confirmed by Kim in the proof of [26, Theorem 3.1]. Thus, feedback capacity for the ARMA(1,1) noise channels is known, as reformulated below.

Theorem 5.2 ([34], [26]) (see Footnote 1).

Footnote 1. Theorem 5.2 has been stated and proved in [26, Theorem 5.3]. However, a recent paper [35] pointed out that the proof of a key result [26, Corollary 4.4] is incorrect, and as a consequence, the proof of Theorem 5.3 in [26] is invalid.

Suppose the noise process $\{Z_{i}\}$ is an ARMA(1,1) Gaussian process satisfying

[TABLE]

where $\{U_{i}\}$ is a white Gaussian process with zero mean and unit variance. Then, under the average power constraint

[TABLE]

the feedback capacity of additive Gaussian channel $Y_{i}=X_{i}+Z_{i},i=1,2,...$ is given by

[TABLE]

where $x_{0}$ is the unique positive root of the fourth-order polynomial

[TABLE]

Remark 5.3.

Then, we can derive an upper bound for the $T$ -block feedback capacity $C_{fb,T}(P)$ in the following lemma.

Lemma 5.4.

For any $T>0$ , the $T$ -block feedback capacity of the OU-Colored AWGN channel (34) is upper bounded by

[TABLE]

where $x_{0}(P;\lambda,\kappa)$ is the unique positive root of polynomial (35).

Proof.

By Theorem 3.4, we can prove (39) by considering any Gaussian pair $(\Theta,X)$ of the form (19) in which $X$ satisfies the constraint (15). Thus, without loss of generality, the message process $\Theta=\{\Theta(t);t\in[0,T]\}$ is assumed to be Gaussian such that $\int_{0}^{T}\mathbb{E}[\Theta^{2}(t)]dt<\infty$ . If there is no feedback, the channel output $Y^{\ast}=\{Y^{\ast}(t)\}$ is given by

[TABLE]

The channel input $X(t)$ is assumed in the form $\Theta(t)-\mathbb{E}[\Theta(t)|\mathcal{F}_{t}(Y^{\ast})]$ . Then, the channel output $Y=\{Y(t)\}$ is given by

[TABLE]

Moreover, it is known [9] that there exists a Volterra kernel $K(t,s)$ on $L^{2}([0,T]^{2})$ such that $\mathbb{E}[\Theta(t)|\mathcal{F}_{t}(Y^{\ast})]=\int_{0}^{t}K(t,s)dY^{\ast}(s)$ . The remainder of the proof is divided into three steps. In Steps 1 & 2, we assume that the following condition:

(C.0)

The Volterra kernel $K(t,s)$ is continuous on the set $\{(t,s)\in[0,T]^{2};t\geq s\}$

is satisfied.

Step 1. In this step, we shall introduce a sequence of ARMA(1,1) Gaussian channels constructed from the OU-Colored AWGN channel (34) by using a discrete-time approximation method.

For any $n\in\mathbb{N}$ , we consider a partition $\{t_{k}^{(n)};k=0,1,...,n\}$ of $[0,T]$ satisfying $t^{(n)}_{k+1}-t^{(n)}_{k}=\delta_{n}$ for all $k$ , where $\delta_{n}=T/n$ . Define $\{B^{(n)}_{k};k=0,1,...,n-1\},\{Z_{k}^{(n)};k=0,1,...,n-1\}$ by

[TABLE]

respectively, where $d_{k}^{(n)}\triangleq\int_{t^{(n)}_{k}}^{t^{(n)}_{k+1}}e^{-\kappa s}ds$ . Then, it is shown that $\{Z^{(n)}_{k}/\sqrt{\delta_{n}};k=0,1,...,n-1\}$ is an ARMA(1,1) Gaussian process satisfying

[TABLE]

which, however, is not stationary. It turns out that we can modify (41) to guarantee stationarity. Specifically, we redefine $\{\widetilde{Z}^{(n)}_{k};k=0,1,...,n-1\}$ as follows:

[TABLE]

where $m(x)\triangleq\sqrt{2\kappa x/(1-e^{-2\kappa x})}$ . It is straightforward to verify that $\{\widetilde{Z}_{k}^{(n)}/\sqrt{\delta_{n}};k=0,..,n-1\}$ is a stationary ARMA(1,1) process of the following form

[TABLE]
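As a quick numerical sanity check on the two quantities introduced in this step, the following minimal sketch evaluates $d_{k}^{(n)}$ in closed form and the scaling factor $m(x)$. The function names `d_k` and `m` and the parameter values are illustrative choices, not part of the paper. The sketch only uses the definitions stated above: $d_{k}^{(n)}=\int_{t_{k}^{(n)}}^{t_{k+1}^{(n)}}e^{-\kappa s}ds$ with $t_{k}^{(n)}=kT/n$, and $m(x)=\sqrt{2\kappa x/(1-e^{-2\kappa x})}$.

```python
import math


def d_k(k: int, n: int, T: float, kappa: float) -> float:
    """Closed form of the integral of exp(-kappa*s) over [t_k, t_{k+1}], t_k = k*T/n."""
    delta = T / n
    return (math.exp(-kappa * k * delta) - math.exp(-kappa * (k + 1) * delta)) / kappa


def m(x: float, kappa: float) -> float:
    """Scaling factor m(x) = sqrt(2*kappa*x / (1 - exp(-2*kappa*x))) for x > 0."""
    # -expm1(-y) equals 1 - exp(-y) but is more accurate for small y.
    return math.sqrt(2.0 * kappa * x / (-math.expm1(-2.0 * kappa * x)))


if __name__ == "__main__":
    T, kappa = 1.0, 0.5  # illustrative values only
    # The d_k over one partition add up to the integral over [0, T].
    print(sum(d_k(k, 8, T, kappa) for k in range(8)), (1 - math.exp(-kappa * T)) / kappa)
    # m(delta_n) approaches 1 as the mesh delta_n = T/n shrinks.
    for n in (10, 100, 1000):
        print(n, m(T / n, kappa))
```

Since $m(\delta_{n})\to 1$ as $\delta_{n}\to 0$, the rescaling that enforces stationarity becomes negligible in the fine-partition limit.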

Furthermore, we define $\{Y^{\ast,(n)}_{k}\}$ and $\{Y^{(n)}_{k}\}$ as follows:

[TABLE]

where $\{\Theta^{(n)}_{k}\}$ and $\{\zeta^{(n)}_{k}\}$ are defined by

[TABLE]

respectively. Note that (45) and (44) correspond to $n$-block discrete-time ARMA(1,1) Gaussian channels with feedback and without feedback, respectively.

Step 2. This step will be devoted to approximating $P/2$ and $x_{0}(P;\lambda,\kappa)$ by feedback capacities of the sequence of ARMA(1,1) Gaussian channels (45).

We have the following chain of inequalities:

[TABLE]

where $e(\delta_{n})$ is some function (to be specified later) that depends on $\{\Theta(t)\}$ and satisfies $\lim_{n\to\infty}e(\delta_{n})=0$, $C_{FB,n}(P)$ denotes the $n$-block feedback capacity [33] of the channel (45) under the constraint that the average power of the channel input is bounded by $P$ (see [26]), and $C_{FB}(P)$ denotes the feedback capacity. Now, with (a)-(f) verified (proofs can be found in Appendix C), (39) immediately follows from (46) and Theorem 3.4.

Step 3. We will prove that the continuity assumption (C.0) can be dropped. Indeed, there exists a sequence of Volterra kernels $\{K_{(m)};m=1,2,...\}$ satisfying (C.0) and

[TABLE]

Set

[TABLE]

Then, we have

[TABLE]

where we have used the fact

[TABLE]

Note that (c)-(f) in Step 2 hold true for any continuous Volterra kernel function $K$. Thus, replacing $K$ in (46) by $K_{(m)}$ in the derivation of (c)-(f), we obtain

[TABLE]

which, together with (47), establishes the same inequality (46). ∎

The following corollary is an immediate consequence of Lemma 5.4.

Corollary 5.5.

It holds that

[TABLE]

Proof.

For any input $(\Theta,X)$ satisfying (16), there exists a function $e_{P}(T)$ with $\lim_{T\to\infty}e_{P}(T)=0$ such that

[TABLE]

for all $T>0$ . By the definition of $C_{fb,T}(P)$ , we obtain

[TABLE]

Thus, (48) immediately follows from (39) and the continuity of $P/2$ and $x_{0}(P;\lambda,\kappa)$ in $P$. ∎

Proof of the Converse Part.

The converse part immediately follows from Theorem 3.1, Lemma 5.4 and Corollary 5.5. ∎

5.2 Proof of the Achievability Part (Lower Bound)

We will first prove the case (2) in Theorem 5.1, which relies on Theorem 4.2.

Case (2).

Note that $Z$ can be regarded as the solution of the following stochastic differential equation

[TABLE]

Set $\eta(t)=\lambda(Z_{0}(t)+\zeta_{0}e^{-\kappa t})$ . Then, the covariance function of $\{\eta(t)\}$ is

[TABLE]

which is continuous at $s=t$ . By [36, Theorem 7.15], it holds that $\mu_{Z}\sim\mu_{B}$ . It then follows from (9)-(11) that there exists a standard Brownian motion $\{V(t)\}$ on $(\Omega,\{\mathcal{F}_{t}(Z)\},\mathbb{P})$ such that

[TABLE]

where the Volterra kernel function $l_{\text{OU}}(s,u)\in L^{2}([0,T]^{2})$ for any $T>0$ .

We now evaluate the resolvent kernel $l_{\text{OU}}(s,u)$ and prove that $l_{\text{OU}}(s,u)$ fulfills all the conditions in Theorem 4.2. It follows from (36) and (37) that

[TABLE]

here $F_{\ast}(t,u)$ is a Volterra kernel function satisfying $F_{\ast}(t,u)=1+\int_{u}^{t}f_{\ast}(s,u)ds$ for any $t\geq u$ , where the Volterra kernel function $f_{\ast}(s,u)=\lambda e^{-\kappa(s-u)}$ for any $s\geq u$ . Since the resolvent kernel $g_{\ast}(s,u)$ of $f_{\ast}(s,u)$ is calculated by

[TABLE]

by (9)-(11), we obtain

[TABLE]

where $G_{\ast}(t,u)$ is the Volterra kernel function satisfying $G_{\ast}(t,u)=1+\int_{u}^{t}g_{\ast}(s,u)ds$ for any $t\geq u$ . Therefore, by (50), we have

[TABLE]

where

[TABLE]

Then, since $g_{\ast}(s,u)$ is the resolvent kernel of $f_{\ast}(s,u)$ , it holds that

[TABLE]

By [37, Lemma 6.2.6], the innovation process $\{V(t)\}$ defined by

[TABLE]

is a standard Brownian motion. The one-dimensional Kalman-Bucy filter [37, Theorem 6.2.8] is applied to estimate $\zeta_{0}$ from the observation equations (51) to yield the following estimate:

[TABLE]

Substituting (52) and (54) into (53), we obtain (49) by a series of elementary calculations. Specifically, $l_{\text{OU}}(s,u)$ is calculated by

[TABLE]

It is easy to see that $l_{\text{OU}}$ satisfies all the conditions in Theorem 4.2. Then, the corresponding $\alpha_{\text{OU}},\beta_{\text{OU}}$ are given by

[TABLE]

and

[TABLE]

respectively. By Theorem 4.2, we have $\overline{I}_{\text{SK}}(\Theta;Y)=Pr_{\text{OU}}^{2}$ , where $r_{\text{OU}}$ is one of the real roots of the following cubic equation

[TABLE]

It is not difficult to see that the equation (55) has a unique positive root for all $-2\kappa\leq\lambda\leq 0$. Then, substituting $y=\sqrt{x/P}$ into (55), we are able to prove that $\overline{I}_{\text{SK}}(\Theta;Y)$ is the unique positive root of the third-order polynomial (35), which implies

[TABLE]

This, together with Corollary 5.5 and Theorem 3.1, immediately yields

[TABLE]

as desired. ∎

Remark 5.6.

Note that, for a fixed $\kappa>0$, $C_{fb}(P)$ depends on $\lambda$, so we write it as $C_{fb}(P,\lambda;\kappa)$. It follows from (56) that $C_{fb}(P,\lambda;\kappa)>C_{fb}(P,0;\kappa)=P/2$ if $-2\kappa<\lambda<0$, where $C_{fb}(P,0;\kappa)$ is the feedback capacity of the AWGN channel (2). In other words, “coloring” may increase capacity.
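The comparison with $P/2$ can be checked numerically. The sketch below assumes that polynomial (35) is the equation stated in the abstract, $P(x+\kappa)^{2}=2x(x+|\kappa+\lambda|)^{2}$, and locates its positive root by bisection. The function name `capacity_root` and the sample parameters are hypothetical. The sketch is intended only for $-2\kappa\leq\lambda\leq 0$, where $2x$ is increasing and $P((x+\kappa)/(x+|\kappa+\lambda|))^{2}$ is non-increasing in $x$, so the positive root is unique.

```python
def capacity_root(P: float, lam: float, kappa: float, tol: float = 1e-12) -> float:
    """Positive root of 2x(x+a)^2 - P(x+kappa)^2 = 0 with a = |kappa + lam|.

    Assumes P > 0, kappa > 0 and -2*kappa <= lam <= 0 (unique positive root).
    """
    a = abs(kappa + lam)

    def g(x: float) -> float:
        return 2.0 * x * (x + a) ** 2 - P * (x + kappa) ** 2

    lo, hi = 0.0, 1.0
    # g(0) = -P*kappa^2 < 0; enlarge hi until the sign changes.
    while g(hi) <= 0.0:
        hi *= 2.0
    # Standard bisection on the bracket [lo, hi].
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    P, kappa = 2.0, 1.0  # illustrative values only
    for lam in (0.0, -0.5, -1.0, -1.5):
        print(lam, capacity_root(P, lam, kappa), P / 2)
```

For $\lambda=0$ the equation reduces to $2x=P$, so the root is exactly $P/2$; for $-2\kappa<\lambda<0$ the computed root exceeds $P/2$, in line with the remark.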

However, the condition that $\lambda>0$ or $\lambda<-2\kappa$ in Case (1) may invalidate the uniqueness of the real root of the cubic equation (55). As a result, it is challenging to determine $\overline{I}_{\text{SK}}(\Theta;Y)$ explicitly, despite the fact that it must be one of the real roots of the polynomial (55). Nevertheless, all real roots of this polynomial must be in $(0,1/\sqrt{2})$ . As a result, $\overline{I}_{\text{SK}}(\Theta;Y)<P/2$ , which suggests that the continuous-time SK coding scheme fails to achieve the capacity $P/2$ in Case (1).

Next, to prove the achievability of Case (1), let us turn our attention to the OU-Colored AWGN channel (7) in the generalized sense. Let

[TABLE]

where we have defined that $\bm{w}(t)=\dot{B}(t)$ , $\bm{u}(t)=\int_{-\infty}^{t}e^{-\kappa(t-u)}dB(u)$ . Then, in the most rigorous terms, the channel (7) should be interpreted as

[TABLE]

where $\mathcal{D}$ is the space of test functions over $\mathbb{R}$ , i.e., all infinitely differentiable real functions with bounded support. Now, for any $T>0$ , let $\mathcal{D}_{T}=\{\phi\in\mathcal{D}:\text{supp}(\phi)\subset[0,T]\}$ , and define

[TABLE]

where the supremum is taken over all positive integers $m,n$ and all test functions $\phi_{1}^{m},\varphi_{1}^{n}\in\mathcal{D}_{T}$ . Then, we consider the so-called $T$ -block non-feedback capacity

[TABLE]

where the supremum is taken over all $\{\bm{x}(t),t\in[0,T]\}$ independent of $Z$ and satisfying the average power constraint

[TABLE]

Furthermore, we define

[TABLE]

provided the limit exists, and define

[TABLE]

where the supremum is taken over all $\bm{x}$ independent of $Z$ and satisfying

[TABLE]

Then, we present the aforementioned coding theorem for $C_{0}(P)$ below.

Theorem 5.7 ([29, Theorem 1]).

Assume that

[TABLE]

If $R<C_{nfb,\infty}(P)$ and $P$ is a continuity point of $C_{nfb,\infty}(P)$, then the rate $R$ is achievable. Conversely, if a rate $R$ is achievable, then $R\leq C_{nfb,\infty}(P)$.

Then, the proof of achievability of Case (1) will use the following corollary, which gives the explicit formula for $C_{0}(P)$ .

Corollary 5.8.

It holds that

[TABLE]

for all $\lambda\geq 0$ or $\lambda\leq-2\kappa,$ $\kappa>0$ .

The proof relies on the following result, whose proof is very similar to that of [12, Lemma 4] and thus omitted.

Lemma 5.9.

$\{\bm{z}(t)\}$ is a generalized stationary Gaussian process with spectral density function

[TABLE]

where $\lambda\in\mathbb{R},\kappa>0$ .
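The two properties of $S_{\bm{z}}$ that the proof of Corollary 5.8 relies on (the limit $1/(2\pi)$ at high frequencies and monotonicity on each half-line) can be illustrated with a short sketch. The explicit formula used here is an assumption: it is the squared magnitude of the transfer function $(ix+\kappa+\lambda)/(ix+\kappa)$ of the ARMA(1,1) equation $z^{\prime}(t)+\kappa z(t)=(\kappa+\lambda)w(t)+w^{\prime}(t)$, with the white-noise density normalized to $1/(2\pi)$ so that it matches the limit quoted in that proof. The function name is hypothetical.

```python
import math


def noise_spectral_density(x: float, lam: float, kappa: float) -> float:
    """Assumed form: S_z(x) = (x^2 + (kappa+lam)^2) / (2*pi*(x^2 + kappa^2))."""
    return (x * x + (kappa + lam) ** 2) / (2.0 * math.pi * (x * x + kappa * kappa))


if __name__ == "__main__":
    kappa, lam = 1.0, 1.0  # illustrative Case (1) parameters: lam > 0
    for x in (0.0, 1.0, 10.0, 100.0, 1e6):
        print(x, noise_spectral_density(x, lam, kappa), 1.0 / (2.0 * math.pi))
```

Under this assumed form, $S_{\bm{z}}$ is strictly decreasing on $[0,+\infty)$ exactly when $(\kappa+\lambda)^{2}>\kappa^{2}$, that is, when $\lambda>0$ or $\lambda<-2\kappa$, so the noise is weakest at high frequencies in Case (1).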

Proof of Corollary 5.8.

Note that it follows from Lemma 5.4 and $C_{nfb,T}(P)\leq C_{fb,T}(P)$ that the condition (57) is fulfilled. Now, we claim that

[TABLE]

which, together with Theorem 5.7, immediately implies (58). To prove (59), it suffices to show that

[TABLE]

since $C_{nfb,\infty}(P)\leq P/2$ follows from Corollary 5.5 and $C_{nfb,\infty}(P)\leq C_{fb,\infty}(P)$ . For each $k\in\mathbb{N}$ , define a function

[TABLE]

Consider a series of zero-mean stationary Gaussian inputs $\{\bm{x}_{k}(t)\}$ with spectral density functions $\{S_{\bm{x},k}(x)\}$ , respectively. Since both $S_{\bm{z}}(x)$ and $S_{\bm{x},k}(x)$ are rational, by [14, Theorem 10.3.1], we have

[TABLE]

Note that $\lim_{x\to\infty}S_{\bm{z}}(x)=1/(2\pi)$ and $S_{\bm{z}}(x)$ is a strictly decreasing (resp., increasing) function on $[0,+\infty)$ (resp., $(-\infty,0]$ ). Thus, by the monotone convergence theorem, we deduce that

[TABLE]

Next, we consider a series of zero-mean stationary Gaussian inputs $\{\bm{x}_{n,k}(t)\}$ with spectral density functions

[TABLE]

Similarly, for a fixed $n$ , the above argument yields

[TABLE]

Then, (60) is immediately derived by letting $n\to\infty$ . The proof is then complete. ∎

Now we can prove the achievability of Case (1).

Case (1).

The achievability part follows immediately from $C_{nfb}(P)\leq C_{fb}(P)$ together with Corollary 5.8. ∎

Appendix A Proof of Theorem 3.4

It is known that $C_{fb,T}(P)$ is achieved by transmitting a Gaussian message process $\Theta$ with an additive feedback term $\zeta$ given by (17). Thus, it suffices to consider any additive feedback coding scheme $(\Theta,X)$ with $X(t)=\Theta(t)-\zeta(t)$ , where $\{\Theta(t)\}$ is Gaussian and $X$ satisfies (15). Note that $\mathbb{E}[\Theta(t)|\mathcal{F}_{t}(Y^{\ast})]$ can be written as

[TABLE]

where $h(t,s)$ is an $L^{2}$ -Volterra kernel. Let

[TABLE]

Substituting (61) into (62) and by (10), the stochastic equation

[TABLE]

has the unique solution $U(t)=Y^{\ast}(t)$ , which implies $\{Y^{\ast}(t)\}$ is uniquely determined by $\{Y(t)\}$ and thus $\mathcal{F}_{t}(Y^{\ast})=\mathcal{F}_{t}(Y)$ whenever $t\leq T$ . Now, let $Y_{\zeta}(t)=\int_{0}^{t}\Theta(s)ds-\int_{0}^{t}\zeta(s)ds+Z(t)$ . It then follows from Lemma 3.3 that

[TABLE]

and

[TABLE]

where $(a)$ follows from Lemma 3.3 and $(b)$ holds true since $\zeta(t)$ is $\mathcal{F}_{t}(Y_{\zeta})$ measurable. The proof is then complete.

Appendix B Proof of Lemma 4.1

We first prove the existence and uniqueness of the solution $g(t)$. Let $P_{3}(y;t,P)$ denote the polynomial (in $y$): $-Py^{3}+(P/\sqrt{2})y^{2}+p(t)y+q(t)/\sqrt{2}$. Since $p(t),q(t)$ are continuous, [38, Theorem (7.6)] gives rise to a unique nonextendible solution $g(t)$, which is either defined for all $t\geq 0$ or blows up at some finite $t>0$. In fact, the domain of $g(t)$ extends to infinity, since $g(t)$ cannot blow up in finite time. Indeed, by way of contradiction, suppose that there exists $T_{0}<\infty$ such that

[TABLE]

Then, it follows from the continuity of $p(t),q(t)$ at $T_{0}$ that there exists $\epsilon>0$ such that $P_{3}(g(t);t,P)<0$ for all $t\geq T_{0}-\epsilon$. Hence, by (22), it holds that $g^{\prime}(t)<0$ for $t\geq T_{0}-\epsilon$, which contradicts (63), as desired.

Next, we shall prove the “moreover” part. To achieve this, let

[TABLE]

Since

[TABLE]

we have that $\lim_{t\to\infty}P_{3}(y;t,P)=P_{3}(y;P)$ . Next, we deal with the following three cases:

(I)

The cubic $P_{3}(y;P)$ has one real root $(r_{11})$ and two non-real complex conjugate roots $(r_{12}=\bar{r}_{13})$ ;

(II)

The cubic $P_{3}(y;P)$ has three distinct real roots ( $r_{21}<r_{22}<r_{23}$ );

(III)

The cubic $P_{3}(y;P)$ has a simple root $(r_{31})$ and a double root $(r_{32}=r_{33}$ ).

We shall prove that the solution $g(t)$ converges to some real root $r_{ij}$ as $t\to\infty$ for $i,j\in\{1,2,3\}$ case by case. For $x\in\mathbb{R},M>0$ , let $B(x,M)\triangleq(x-M,x+M)$ .

Case (I). Let $\varepsilon>0$ be a sufficiently small constant. It then follows immediately from the continuity of the roots of a polynomial [39, Theorem B] and (64) that there exists $T_{\varepsilon}>0$ such that for any $t\geq T_{\varepsilon}$, $P_{3}(y;t,P)$ admits a unique real root $r_{11}(t)$ satisfying

[TABLE]

It then remains to show that there exists $T^{\ast}_{\varepsilon}\geq T_{\varepsilon}$ such that

[TABLE]

Indeed, it then immediately follows from (65) and (66) that $\lim_{t\to\infty}g(t)=r_{11}$ , as desired.

Note that by ODE (22), we have

[TABLE]

Clearly, if $g(T_{\varepsilon})\in\overline{B(r_{11},\varepsilon)}$ , then (66) holds true with $T_{\varepsilon}^{\ast}=T_{\varepsilon}$ . WLOG, we assume in the following that $g(T_{\varepsilon})\notin\overline{B(r_{11},\varepsilon)}$ and $g(T_{\varepsilon})>r_{11}+\varepsilon$ since the proof is similar if $g(T_{\varepsilon})\notin\overline{B(r_{11},\varepsilon)}$ and $g(T_{\varepsilon})<r_{11}-\varepsilon$ . We now claim that there exists $t^{\ast}\geq T_{\varepsilon}$ such that

[TABLE]

To see this, by way of contradiction, we suppose the opposite is true, that is,

[TABLE]

It then follows from (67) that for all $t\geq T_{\varepsilon}$

[TABLE]

which, together with (65), implies that

[TABLE]

Hence, both $\lim_{t\to\infty}g(t)$ and $\lim_{t\to\infty}g^{\prime}(t)$ exist, which implies $\lim_{t\to\infty}g^{\prime}(t)$ $=0$ . Then, by the ODE (22), we have $\lim_{t\to\infty}g(t)=r_{11}$ , which contradicts (68). Consequently, (66) immediately follows from (67) with $T_{\varepsilon}^{\ast}=t^{\ast}$ , as desired.

Case (II). The proof of this case is largely similar to that of Case (I), except that $g(t)$ may converge to the middle root $r_{22}$ as $t\to\infty$. Indeed, let $\varepsilon>0$ be such that $B(r_{2i},\varepsilon)\cap B(r_{2j},\varepsilon)=\varnothing$ for $i\neq j$. Then, there exists $T_{\varepsilon}>0$ such that the polynomial $P_{3}(y;t,P)$ admits three real roots $\{r_{2j}(t),j=1,2,3\}$ satisfying $r_{2j}(t)\in B(r_{2j},\varepsilon),j=1,2,3,$ for all $t\geq T_{\varepsilon}$. Consider seven disjoint subintervals of $\mathbb{R}$: $(-\infty,r_{21}-\varepsilon),\overline{B(r_{21},\varepsilon)},(r_{21}+\varepsilon,r_{22}-\varepsilon),\overline{B(r_{22},\varepsilon)},(r_{22}+\varepsilon,r_{23}-\varepsilon),\overline{B(r_{23},\varepsilon)}$ and $(r_{23}+\varepsilon,+\infty)$. On the one hand, the same argument as in Case (I) yields $\lim_{t\to\infty}g(t)=r_{21}$ if $g(T_{\varepsilon})\in(-\infty,r_{21}-\varepsilon)\cup\overline{B(r_{21},\varepsilon)}\cup(r_{21}+\varepsilon,r_{22}-\varepsilon)$, or $\lim_{t\to\infty}g(t)=r_{23}$ if $g(T_{\varepsilon})\in(r_{22}+\varepsilon,r_{23}-\varepsilon)\cup\overline{B(r_{23},\varepsilon)}\cup(r_{23}+\varepsilon,+\infty)$. On the other hand, if $g(T_{\varepsilon})\in\overline{B(r_{22},\varepsilon)}$, then there are only two subcases for $\{g(t),t\geq T_{\varepsilon}\}$: either $g(t)\in\overline{B(r_{22},\varepsilon)}$ for all $t\geq T_{\varepsilon}$, or $g(t^{\prime})\notin\overline{B(r_{22},\varepsilon)}$ for some $t^{\prime}>T_{\varepsilon}$. The latter subcase can be handled in the same way as before. In the former subcase, we have $\lim_{t\to\infty}g(t)=r_{22}$, as desired.

Case (III). WLOG, we assume that $r_{31}<r_{32}=r_{33}$. Let $\varepsilon>0$ be given such that $B(r_{31},\varepsilon)\cap B(r_{3j},\varepsilon)=\varnothing$ for $j=2,3$. As in Case (II), it suffices to consider the subcase $g(T_{\varepsilon})\in\overline{B(r_{32},\varepsilon)}$. By (67), there are two subcases for $g(t)$, $t\geq T_{\varepsilon}$: either $g(t)\in\overline{B(r_{32},\varepsilon)}$ for all $t\geq T_{\varepsilon}$, or $g(t)\in(r_{31}+\varepsilon,r_{32}-\varepsilon)$ for some $t=T^{\ast}_{\varepsilon}\geq T_{\varepsilon}$. In the former subcase, $\lim_{t\to\infty}g(t)=r_{32}$; in the latter subcase, $g(t)$ converges to $r_{31}$. The proof is then complete.
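The convergence established above can be illustrated numerically. The sketch below assumes that ODE (22) has the form $g^{\prime}(t)=P_{3}(g(t);t,P)$, which is inferred from the way the proof ties the sign of $g^{\prime}$ to the sign of $P_{3}$; the actual ODE may differ by a positive factor. The coefficient functions $p(t)=-1+e^{-t}$ and $q(t)=e^{-t}$, the initial value and the forward-Euler step are all hypothetical choices. With $P=1$ the limiting cubic is $-y^{3}+y^{2}/\sqrt{2}-y$, whose only real root is $0$, so this is an instance of Case (I).

```python
import math
from typing import Callable


def cubic(y: float, p: float, q: float, P: float) -> float:
    """P_3(y; t, P) = -P*y^3 + (P/sqrt(2))*y^2 + p*y + q/sqrt(2), with p = p(t), q = q(t)."""
    return -P * y ** 3 + (P / math.sqrt(2.0)) * y ** 2 + p * y + q / math.sqrt(2.0)


def integrate(g0: float, P: float,
              p_fn: Callable[[float], float], q_fn: Callable[[float], float],
              t_end: float = 30.0, h: float = 1e-3) -> float:
    """Forward-Euler integration of the assumed ODE g'(t) = P_3(g(t); t, P) on [0, t_end]."""
    g, t = g0, 0.0
    for _ in range(int(round(t_end / h))):
        g += h * cubic(g, p_fn(t), q_fn(t), P)
        t += h
    return g


if __name__ == "__main__":
    # Hypothetical coefficients with limits p = -1 and q = 0 as t grows.
    g_end = integrate(0.5, 1.0, lambda t: -1.0 + math.exp(-t), lambda t: math.exp(-t))
    print(g_end)  # close to the unique real root 0 of the limiting cubic
```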

Appendix C Proofs of (a)-(f)

We shall first give the proofs of (a), (c) and (e) as follows.

Proof of (a).

The equality (a) follows from Lemma 3.3. ∎

Proof of (c).

It is easy to show that $Y^{(n)}_{k}$ is a linear combination of $Y^{\ast,(n)}_{i},i=0,1,...,k$ , and vice versa, which implies (c). ∎

Proof of (e).

From the stationarity of the ARMA(1,1) process (43), it follows that $C_{FB,n}$ is super-additive [26]:

[TABLE]

As a consequence, $C_{FB,n}(P)\leq C_{FB}(P)$ for any $n\in\mathbb{N}$ , which implies (e). ∎
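The step from super-additivity to the bound can be spelled out as follows. This is a sketch under two assumptions: that the $n$-block capacity is normalized per channel use, so that the super-additive quantity is $nC_{FB,n}(P)$, and that $C_{FB}(P)=\lim_{n\to\infty}C_{FB,n}(P)$, with the limit existing by Fekete's lemma.

```latex
% Assumed normalized form of super-additivity:
(m+n)\,C_{FB,m+n}(P) \;\ge\; m\,C_{FB,m}(P) + n\,C_{FB,n}(P), \qquad m,n\in\mathbb{N}.
% Taking m = (k-1)n and arguing by induction on k:
kn\,C_{FB,kn}(P) \;\ge\; (k-1)n\,C_{FB,(k-1)n}(P) + n\,C_{FB,n}(P) \;\ge\; kn\,C_{FB,n}(P),
\qquad\text{hence}\qquad C_{FB,kn}(P) \;\ge\; C_{FB,n}(P), \quad k=1,2,\dots
% Letting k tend to infinity along the subsequence (kn):
C_{FB}(P) \;=\; \lim_{k\to\infty} C_{FB,kn}(P) \;\ge\; C_{FB,n}(P).
```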

We are now in a position to give the proofs of (b) and (d).

Proof of (b).

Let $\widetilde{\Theta}(t)=\int_{0}^{t}\Theta(s)ds$ for $t\in[0,T]$ . Define $\widetilde{\Theta}^{(n)}=\{\widetilde{\Theta}^{(n)}(t);0\leq t\leq T\}$ and $Y^{\ast,(n)}=\left\{Y^{\ast,(n)}(t);0\leq t\leq T\right\}$ as follows:

[TABLE]

where $\Delta_{n}(t)\triangleq\max\{t^{(n)}_{k}|t\geq t^{(n)}_{k},k\leq n,k\in\mathbb{N}\}$ and $N_{\Delta_{n}}(t)\triangleq\max\{k|t\geq t^{(n)}_{k},k\leq n,k\in\mathbb{N}\}$ . We further define an approximation process $\{\hat{Z}^{(n)}_{0}(t)\}$ of $Z_{0}$ as

[TABLE]

Let $\hat{Z}_{0,k}^{(n)}\triangleq\hat{Z}^{(n)}_{0}(t^{(n)}_{k+1})-\hat{Z}^{(n)}_{0}(t^{(n)}_{k})$ . Then, we have

[TABLE]

Hence, $\{\widetilde{Z}_{k}^{(n)}\}$ defined in (42) can be equivalently written as

[TABLE]

It follows from (36) and (38) that $\{Z(t)\}$ can similarly be expressed as:

[TABLE]

Therefore, for any $t\in[0,T]$ , we have

[TABLE]

Hence, we can readily prove that $\{\widetilde{\Theta}^{(n)},Y^{\ast,(n)}\}$ converges in distribution to $\{\widetilde{\Theta},Y^{\ast}\}$ . By the lower semi-continuity of mutual information [13], we obtain

[TABLE]

This, together with $I(\Theta_{0}^{T};Y_{0}^{\ast,T})=I(\widetilde{\Theta}_{0}^{T};Y_{0}^{\ast,T})$ and $I(\widetilde{\Theta}_{0}^{(n),T};Y_{0}^{\ast,(n),T})=I(\{\Theta_{k}^{(n)}\};\{Y_{k}^{\ast,(n)}\})$ , implies (b). ∎

Proof of (d).

Recall that we have constructed an $n$ -block discrete-time ARMA(1,1) Gaussian channel with feedback

[TABLE]

The energy $E(\delta_{n})$ and average power $P(\delta_{n})$ for such a channel can be computed as

[TABLE]

Define a Volterra kernel $K^{(n)}(t,s)\in L^{2}([0,T]^{2})$ by

[TABLE]

and a random process $\{\zeta^{(n)}(t)\}$ by

[TABLE]

respectively. By the assumption (C.0), we have that $\lim_{n\to\infty}\|K^{(n)}-K\|_{2}=0$ , where $\|\cdot\|_{2}$ denotes the usual norm on $L^{2}([0,T]^{2})$ . Thus, it is clear from (40) and (70) that

[TABLE]

Furthermore, set $\Delta Y^{\ast}_{i}=Y^{\ast}(t^{(n)}_{i+1})-Y^{\ast}(t^{(n)}_{i})$ and $\Delta Z_{0,i}=Z_{0}(t^{(n)}_{i+1})-Z_{0}(t^{(n)}_{i})$ respectively. It then follows from (69) and (42) that

[TABLE]

and

[TABLE]

for all $i$ . Therefore, we have

[TABLE]

where (a) follows from the general Lebesgue dominated convergence theorem, and where in (b) we have used the result derived from the assumption (C.0) that

[TABLE]

and another result derived from (73) and (74) that

[TABLE]

Thus, we conclude that

[TABLE]

which follows from (72), (75) and

[TABLE]

Now, by Hölder’s inequality, we have

[TABLE]

which, together with (76), implies that there exists an error function $e(\delta_{n})$ such that

[TABLE]

and

[TABLE]

Then, (d) immediately follows from the definition of $n$ -block capacity. ∎

Note that $\{\widetilde{Z}^{(n)}_{k+1}/\sqrt{\delta_{n}},k=0,1,...,n-1\}$ defined by (42) satisfies

[TABLE]

where $\phi(\delta_{n})=-e^{-\kappa\delta_{n}}$ and $\theta(\delta_{n})=\lambda/\kappa-(\lambda/\kappa+1)e^{-\kappa\delta_{n}}$ .
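The small-$\delta_{n}$ behaviour of these two coefficients can be checked with a short sketch. It uses only the formulas for $\phi(\delta_{n})$ and $\theta(\delta_{n})$ stated above; the function names and the sample parameters are illustrative. To first order, $\phi(\delta)\approx-1+\kappa\delta$ and $\theta(\delta)\approx-1+(\kappa+\lambda)\delta$, and for $-\kappa\leq\lambda<0$ the value $\theta(\delta)$ stays in $(-1,\lambda/\kappa]$, which is consistent with the condition $|\theta(\delta_{n})|\leq 1$ used in Case 1 of the proof of (f).

```python
import math


def phi(delta: float, kappa: float) -> float:
    """phi(delta) = -exp(-kappa*delta)."""
    return -math.exp(-kappa * delta)


def theta(delta: float, lam: float, kappa: float) -> float:
    """theta(delta) = lam/kappa - (lam/kappa + 1) * exp(-kappa*delta)."""
    r = lam / kappa
    return r - (r + 1.0) * math.exp(-kappa * delta)


if __name__ == "__main__":
    kappa, lam = 1.0, -0.5  # illustrative values with -kappa <= lam < 0
    for delta in (1.0, 0.1, 0.01, 0.001):
        print(delta, phi(delta, kappa), theta(delta, lam, kappa),
              -1.0 + (kappa + lam) * delta)  # last column: first-order approximation of theta
```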

Proof of (f).

In the following, we deal with the case $\lambda/\kappa\geq-1$ only, since the case $\lambda/\kappa<-1$ can be proved in a parallel manner.

First of all, it is clear that

[TABLE]

Next, we complete the proof by considering the following three cases:

Case 1: $-\kappa\leq\lambda<0$ . For any arbitrarily small $\epsilon>0$ there exists a sufficiently large $N$ such that for $n\geq N$ , $P\delta_{n}+e(\delta_{n})/n\leq(P+\epsilon)\delta_{n}\triangleq P_{\delta_{n}}(\epsilon)$ and $|\theta(\delta_{n})|\leq 1$ . Thus, by Theorem 5.2, we obtain $C_{FB}(P_{\delta_{n}}(\epsilon))=-\log x(\delta_{n})$ , where $x(\delta_{n})$ is the unique positive root of the following polynomial

[TABLE]

By the continuity of roots of polynomial [39, Theorem B], we infer that $\lim_{\delta_{n}\to 0^{+}}x(\delta_{n})=1$ . Moreover, by elementary calculus, it holds that $x(\delta_{n})$ is differentiable in $\delta_{n}$ over $(0,\delta_{N})$ and $\lim_{\delta_{n}\to 0^{+}}x^{\prime}(\delta_{n})$ exists. Since

[TABLE]

$\lim_{n\to\infty}C_{FB}(P_{\delta_{n}}(\epsilon))/\delta_{n}$ exists, which is denoted by $\beta_{P+\epsilon}$ . Then, it holds that

[TABLE]

for $n$ large enough. Now, substituting (78) into (77) and letting $n\to\infty$ , we establish the equation

[TABLE]

Thus, we have

[TABLE]

Letting $\epsilon\to 0$ , we conclude $\lim_{\epsilon\to 0^{+}}\beta_{P+\epsilon}=x_{0}(P;\lambda,\kappa)$ . Thus, we complete the proof of (f) in this case.

Case 2: $\lambda>0$ . By Theorem 5.2 again, the polynomial (77) in Case 1 becomes

[TABLE]

Similarly, we can also obtain that $\beta_{P+\epsilon}=(P+\epsilon)/2$ , as desired.

Case 3: $\lambda=0$ . In this case, the OU-Colored AWGN channel (34) boils down to a white Gaussian channel. Indeed, similarly as above, we can readily show that $\beta_{P+\epsilon}=(P+\epsilon)/2$ , which is our desired result. ∎

Bibliography

[1] L. Koralov and Y. G. Sinai, Theory of Probability and Random Processes. Springer Science & Business Media, 2007.
[2] I. M. Gel’fand and N. Y. Vilenkin, Generalized Functions: Applications of Harmonic Analysis, vol. 4. Academic Press, 2014.
[3] N. Obata, White Noise Calculus and Fock Space. Springer, 2006.
[4] S. Ihara, Information Theory for Continuous Systems, vol. 2. World Scientific, 1993.
[5] T. Kadota, M. Zakai, and J. Ziv, “Mutual information of the white Gaussian channel with and without feedback,” IEEE Transactions on Information Theory, vol. 17, no. 4, pp. 368–371, 1971.
[6] R. Huang and R. Johnson, “Information capacity of time-continuous channels,” IRE Transactions on Information Theory, vol. 8, no. 5, pp. 191–198, 1962.
[7] R. Huang and R. Johnson, “Information transmission with time-continuous random processes,” IEEE Transactions on Information Theory, vol. 9, no. 2, pp. 84–94, 1963.
[8] M. Hitsuda and S. Ihara, “Gaussian channels and the optimal coding,” Journal of Multivariate Analysis, vol. 5, no. 1, pp. 106–118, 1975.
