The Arbitrarily Varying Channel with Colored Gaussian Noise

Uzi Pereg; Yossef Steinberg

arXiv:1901.00929·cs.IT·April 20, 2020


TL;DR

This paper investigates the capacity of arbitrarily varying channels with colored Gaussian noise, extending classical models by incorporating frequency domain analysis and game-theoretic insights, and demonstrating the suboptimality of scalar coding.

Contribution

It introduces capacity results for AVCs with colored Gaussian noise using double water filling in the frequency domain, connecting game theory and capacity analysis.

Findings

01. Deterministic and random code capacities are characterized for various AVC models.

02. Double water filling in the frequency domain is optimal for the AVC with colored Gaussian noise.

03. Scalar coding is suboptimal for the arbitrarily varying Gaussian product channel.




Uzi Pereg 1 and Yossef Steinberg 2

1 Institute for Communications Engineering, Technical University of Munich

2 Department of Electrical Engineering, Technion


Abstract

We address the arbitrarily varying channel (AVC) with colored Gaussian noise. The work consists of three parts. First, we study the general discrete AVC with fixed parameters, where the channel depends on two state sequences, one arbitrary and the other fixed and known. This model can be viewed as a combination of the AVC and the time-varying channel. We determine both the deterministic code capacity and the random code capacity. Super-additivity is demonstrated, showing that the deterministic code capacity can be strictly larger than the weighted sum of the parametric capacities.

In the second part, we consider the arbitrarily varying Gaussian product channel (AVGPC). Hughes and Narayan characterized the random code capacity through min-max optimization leading to a “double” water filling solution. Here, we establish the deterministic code capacity and also discuss the game-theoretic meaning and the connection between double water filling and Nash equilibrium. As in the case of the standard Gaussian AVC, the deterministic code capacity is discontinuous in the input constraint, and depends on which of the input or state constraint is higher. As opposed to Shannon’s classic water filling solution, it is observed that deterministic coding using independent scalar codes is suboptimal for the AVGPC.

Finally, we establish the capacity of the AVC with colored Gaussian noise, where double water filling is performed in the frequency domain. The analysis relies on our preceding results, on the AVC with fixed parameters and the AVGPC.

Index Terms:

Arbitrarily varying channel, water filling, colored Gaussian noise, time varying channel, Gaussian product channel, deterministic code, random code.

This work was supported by the Israel Science Foundation (grant No. 1285/16).

I Introduction

A channel with colored Gaussian noise was first studied by Shannon [94], who introduced the optimal water filling power allocation. This channel is the spectral counterpart of the Gaussian product channel (see e.g. [27, Section 9.5]). Those results led to useful algorithms for DSL and OFDM systems, and were generalized to multiple-input multiple-output (MIMO) wireless communication systems as well (see e.g. [99, 38, 12, 11, 93, 41]). Furthermore, for some networks, water filling is performed in multiple stages [26, 111, 113, 114, 71, 105]. A limit formula for the capacity of the general time-varying channel (TVC) is given in [102] (see also [29, 47, 3, 33, 10, 76, 87, 112]). Another relevant setting is that of a finite-state channel, where the state evolves as a Markov chain [110, 74, 14, 73, 46, 100, 98]. In practice, there is often uncertainty regarding channel statistics, due to a variety of causes such as fading in wireless communication [95, 92, 1, 80, 42, 25, 59, 57], memory faults in storage [68, 51, 69, 66], malicious attacks on identification systems [45, 62], and cyber-physical warfare [97, 72, 104]. The arbitrarily varying channel (AVC) is an appropriate model to describe such a situation [16, 73].

Blackwell et al. [16] determined the random code capacity of the general AVC, i.e. the capacity achieved with shared randomness between the encoder and the decoder. It was also demonstrated in [16] that the random code capacity is not necessarily achievable using deterministic codes. A well-known result by Ahlswede [5] is the dichotomy property of the AVC, i.e. the deterministic code capacity, also referred to as ‘capacity’, either equals the random code capacity or else is zero. Subsequently, Ericson [37] and Csiszár and Narayan [30] established a simple single-letter condition, namely non-symmetrizability, which is both necessary and sufficient for the capacity to be positive. Schaefer et al. [91] demonstrated the super-additivity phenomenon, i.e. that the capacity of a product of orthogonal AVCs can be strictly larger than the sum of the capacities of the components. Csiszár and Narayan [31, 30] also considered the AVC when input and state constraints are imposed on the user and the jammer, respectively, due to their power limitations. Not only does the constrained setting pose serious technical difficulties in the analysis, but, as shown in [30], the constraints also have a significant effect on the behavior of the capacity. Specifically, it is shown in [30] that dichotomy in the sense of [5] no longer holds when state constraints are imposed on the jammer. That is, the deterministic code capacity of the general AVC can be lower than the random code capacity, and yet non-zero.

The Gaussian AVC is specified by the relation $\mathbf{Y}=\mathbf{X}+\mathbf{S}+\mathbf{Z}$ , where $\mathbf{X}$ and $\mathbf{Y}$ are the input and output sequences, respectively; $\mathbf{S}$ is a state sequence of unknown joint distribution $F_{\mathbf{S}}$ , not necessarily independent nor stationary; and the noise sequence $\mathbf{Z}$ is i.i.d. $\sim\mathcal{N}(0,\sigma^{2})$ . The state sequence can be thought of as if generated by an adversary, or a jammer, who randomizes the channel states arbitrarily in an attempt to disrupt communication. It is also possible for $\mathbf{S}$ to be a deterministic unknown state sequence. It is assumed that the user and the jammer have power limitations, and are subject to input and state constraints, $\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}\leq\Omega$ and $\frac{1}{n}\sum_{i=1}^{n}S_{i}^{2}\leq\Lambda$ , respectively, where $n$ is the transmission length. In [60], Hughes and Narayan showed that the random code capacity is given by $\mathsf{C}^{\star}_{1}=\frac{1}{2}\log(1+\frac{\Omega}{\sigma^{2}+\Lambda})$ . Subsequently, Csiszár and Narayan [32] showed that the deterministic code capacity is given by

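$$\mathsf{C}_{1}=\begin{cases}\mathsf{C}^{\star}_{1}&\text{if }\Lambda<\Omega\,,\\ 0&\text{if }\Lambda\geq\Omega\,.\end{cases}$$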

It is noted in [32] that this result is not a straightforward consequence of the elegant Elimination Technique [5], used by Ahlswede to establish dichotomy for the AVC without constraints. Hosseinigoki and Kosut [57] determined the capacity in multiple side information scenarios for the Gaussian AVC with fast fading. Hughes and Narayan [61] determined the random code capacity of the arbitrarily varying Gaussian product channel (AVGPC), and showed that it is obtained as a “double” water filling solution to a min-max optimization problem, maximizing over the input power allocation and minimizing over the state power allocation. In the solution, the jammer performs water filling first, attempting to whiten the overall noise as much as possible, and then the user performs water filling taking into account the total interference power, contributed by both the channel noise and the jamming signal [61]. The Gaussian AVC is also considered in [4, 101, 70, 88, 90, 56, 59].
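As a rough numerical illustration of the two-stage procedure described above, the following Python sketch lets the jammer water-fill over the noise variances first and then lets the user water-fill over the resulting noise-plus-jamming levels. It is a sketch under stated assumptions and not the formulation of [61]: the function names, the total (summed) power budgets across the parallel channels, the bisection search, and the use of base-2 logarithms are all choices made here for illustration. The scalar helper evaluates the closed-form expressions for the standard Gaussian AVC quoted above.

```python
import math

def water_level(floor, budget, iters=200):
    """Bisection for the level w such that sum_j max(w - floor[j], 0) = budget."""
    lo, hi = min(floor), max(floor) + budget
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(max(mid - f, 0.0) for f in floor) > budget:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def double_water_filling(noise_var, omega, lam):
    """Two-stage allocation: jammer first, then user (illustrative sketch).

    noise_var: noise variances of the parallel channels
    omega:     total input power budget (assumed normalization)
    lam:       total jamming power budget (assumed normalization)
    Returns (input powers, jamming powers, rate in bits).
    """
    # Stage 1: the jammer fills the lowest noise levels to whiten the noise.
    beta = water_level(noise_var, lam)
    jam = [max(beta - v, 0.0) for v in noise_var]
    total = [v + j for v, j in zip(noise_var, jam)]
    # Stage 2: the user water-fills over noise plus jamming power.
    alpha = water_level(total, omega)
    power = [max(alpha - t, 0.0) for t in total]
    rate = sum(0.5 * math.log2(1 + p / t) for p, t in zip(power, total))
    return power, jam, rate

def scalar_capacities(sigma2, omega, lam):
    """Random and deterministic code capacities of the scalar Gaussian AVC."""
    c_star = 0.5 * math.log2(1 + omega / (sigma2 + lam))
    c_det = c_star if lam < omega else 0.0
    return c_star, c_det
```

With a single channel the two-stage allocation is trivial, and the resulting rate coincides with the scalar random code capacity $\frac{1}{2}\log(1+\frac{\Omega}{\sigma^{2}+\Lambda})$ , which serves as a basic sanity check of the sketch.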

Extensive research has been conducted on other AVC models as well, of which we name a few. Recently, the arbitrarily varying wiretap channel has been extensively studied, as e.g. in [77, 17, 9, 18, 19, 78, 48, 2], including input and state constraints in [13, 64, 40]. The capacity region of the arbitrarily varying multiple access channel (MAC) with and without constraints is characterized in [85, 63, 7, 8]; capacity bounds for the arbitrarily varying broadcast channel are derived in [63, 52]; and for the arbitrarily varying relay channel in [83, 81]. Additional results on arbitrarily varying multi-user channels and constraints are derived e.g. in [108, 24, 50, 106, 84, 65]. Transmission of an arbitrarily varying Wyner-Ziv source over a Gel’fand-Pinsker channel is considered in [109, 107], and related problems were recently presented in [24, 22, 21]. Various Gaussian AVC networks are studied e.g. in [89, 49, 23, 54, 55, 82, 83, 85, 58].

In this paper, we address the AVC with colored Gaussian noise. The body of this manuscript consists of three parts, of which the first and the second can also be viewed as milestones on our path to the main result. First, we study the general discrete AVC with fixed parameters. This model is a combination of the TVC and the AVC, as the channel depends on two state sequences, one arbitrary and the other fixed. We determine both the deterministic code capacity and the random code capacity. Deterministic code super-additivity is demonstrated, showing that the capacity can be strictly larger than the weighted sum of the parametric capacities. In the second part of this paper, we establish the deterministic code capacity of the AVGPC, where there is white Gaussian noise and no parameters. We also give observations and discuss the game-theoretic interpretation of Hughes and Narayan’s random code characterization [61], and the connection between the double water filling solution and the idea of Nash equilibrium in game theory. We further examine the connection between the AVGPC and the product MAC [26, 71] (without a state), pointing out the similarities and differences between the models, results, and interpretation. As in the case of the standard Gaussian AVC, the deterministic code capacity is discontinuous in the input constraint, and depends on which of the input or state constraint is higher. As opposed to Shannon’s classic water filling solution [94], it is observed that deterministic coding using independent scalar codes is suboptimal for the AVGPC. Finally, we establish the capacity of the AVC with colored Gaussian noise, where double water filling is performed in the frequency domain.

While the results on the AVC with fixed parameters and on the AVGPC stand in their own right, they also play a key role in our proof of the main capacity theorem for the AVC with colored Gaussian noise. In the random code analysis for the AVC with fixed parameters, we modify Ahlswede’s Robustification Technique (RT) [6]. Essentially, the RT uses a reliable code for the compound channel to construct a random code for the AVC by applying random permutations to the codeword symbols. A straightforward application of Ahlswede’s RT does not work here, since the user cannot apply permutations to the parameter sequence. Hence, we give a modified RT which is restricted to permutations that do not affect the parameter sequence, i.e. such that the parameter sequence is an eigenvector of all of our permutation matrices. The second part of the paper builds on identifying the symmetrizing jamming strategies and minimal symmetrizability costs for the AVGPC. Finally, we use the results on the AVC with fixed parameters and the AVGPC in our proof of the capacity theorem for the AVC with colored Gaussian noise. By orthogonalization of the noise covariance, the AVC with colored Gaussian noise is transformed into an AVC with fixed parameters, which are determined by the spectral representation of the noise covariance matrix. This in turn yields a double water filling optimization in analogy to the AVGPC.

II Channels with Fixed Parameters

In this section we consider the AVC with fixed parameters. The results in this section will be used to analyze the AVC with colored Gaussian noise.

II-A Notation

We use the following notation. Calligraphic letters $\mathcal{X},\mathcal{S},\mathcal{T},\mathcal{Y},...$ are used for finite sets. Lowercase letters $x,s,t,y,\ldots$ stand for constants and values of random variables, and uppercase letters $X,S,T,Y,\ldots$ stand for random variables. The distribution of a random variable $X$ is specified by a probability mass function (pmf) $P_{X}(x)=p(x)$ over a finite set $\mathcal{X}$ . The set of all pmfs over $\mathcal{X}$ is denoted by $\mathcal{P}(\mathcal{X})$ . The set of all probability kernels $p(x|t)$ is denoted by $\mathcal{P}(\mathcal{X}|\mathcal{T})$ . We use $x^{j}=(x_{1},x_{2},\ldots,x_{j})$ to denote a sequence of letters from $\mathcal{X}$ . A random sequence $X^{n}$ and its distribution $P_{X^{n}}(x^{n})=p(x^{n})$ are defined accordingly. For a pair of integers $i$ and $j$ , $1\leq i\leq j$ , we define the discrete interval $[i:j]=\{i,i+1,\ldots,j\}$ .

The type $\hat{P}_{x^{n}}$ of a given sequence $x^{n}$ is defined as the empirical distribution $\hat{P}_{x^{n}}(a)=N(a|x^{n})/n$ for $a\in\mathcal{X}$ , where $N(a|x^{n})$ is the number of occurrences of the symbol $a$ in the sequence $x^{n}$ . A type class is denoted by $\mathcal{T}^{n}(\hat{P})=\{x^{n}\,:\;\hat{P}_{x^{n}}=\hat{P}\}$ . Similarly, define the joint type $\hat{P}_{x^{n},y^{n}}(a,b)=N(a,b|x^{n},y^{n})/n$ for $a\in\mathcal{X}$ , $b\in\mathcal{Y}$ , where $N(a,b|x^{n},y^{n})$ is the number of occurrences of the symbol pair $(a,b)$ in the sequence $(x_{i},y_{i})_{i=1}^{n}$ . Then, a conditional type is defined as $\hat{P}_{x^{n}|y^{n}}(a|b)=\hat{P}_{x^{n},y^{n}}(a,b)/\hat{P}_{y^{n}}(b)$ . Furthermore, we define the $\delta$ -typical set $\mathcal{A}^{(n)}_{\delta}(p)$ with respect to a distribution $p(x)$ by

[TABLE]
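The type, joint type, and conditional type defined above are plain empirical frequencies, so they can be computed directly from symbol counts. The sketch below is only an illustration; the function names and the dictionary representation are assumptions made here, and the conditional type is left undefined for symbols $b$ that do not occur in $y^{n}$ .

```python
from collections import Counter

def empirical_type(seq, alphabet):
    """Type of seq: relative frequency N(a|seq)/n of each symbol a."""
    n = len(seq)
    counts = Counter(seq)
    return {a: counts[a] / n for a in alphabet}

def joint_type(xs, ys, x_alphabet, y_alphabet):
    """Joint type: relative frequency of each pair (a, b) in (x_i, y_i)."""
    n = len(xs)
    counts = Counter(zip(xs, ys))
    return {(a, b): counts[(a, b)] / n for a in x_alphabet for b in y_alphabet}

def conditional_type(xs, ys, x_alphabet, y_alphabet):
    """Conditional type of xs given ys, keyed by (a, b) and read as a given b.

    Pairs whose conditioning symbol b never occurs in ys are omitted.
    """
    p_xy = joint_type(xs, ys, x_alphabet, y_alphabet)
    p_y = empirical_type(ys, y_alphabet)
    return {
        (a, b): p_xy[(a, b)] / p_y[b]
        for a in x_alphabet
        for b in y_alphabet
        if p_y[b] > 0
    }
```

For instance, for $x^{4}=(0,0,1,1)$ and $y^{4}=(0,1,1,1)$ the joint type assigns $1/4$ to $(0,0)$ and $(0,1)$ and $1/2$ to $(1,1)$ , so the conditional type gives $\hat{P}_{x^{n}|y^{n}}(0|1)=1/3$ .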

The distribution of a real random variable $Z\in\mathbb{R}$ is represented by a cumulative distribution function (cdf) $F_{Z}(z)=\Pr\left(Z\leq z\right)$ over the real line, or alternatively, the probability density function (pdf) $f_{Z}(z)$ , when it exists. The notation $\mathbf{z}=(z_{1},z_{2},\ldots,z_{n})$ is used when it is understood from the context that the length of the sequence is $n$ , and the $\ell^{2}$ -norm of $\mathbf{z}$ is denoted by $\left\lVert\mathbf{z}\right\rVert$ . The trace of a matrix $A\in\mathbb{R}^{m\times n}$ is denoted by $\mathrm{tr}(A)$ .

II-B Channel Description

A state-dependent discrete memoryless channel (DMC) with parameters $(\mathcal{X}\times\mathcal{S}\times\mathcal{T},W_{Y|X,S,T},\mathcal{Y})$ consists of finite input alphabet $\mathcal{X}$ , state alphabet $\mathcal{S}$ , parameters alphabet $\mathcal{T}$ , output alphabet $\mathcal{Y}$ , and a conditional pmf $W_{Y|X,S,T}$ over $\mathcal{Y}$ . The channel is without feedback, and it is memoryless when conditioned on the state and parameter sequences, i.e.

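$$W_{Y^{n}|X^{n},S^{n},T^{n}}(y^{n}|x^{n},s^{n},t^{n})=\prod_{i=1}^{n}W_{Y|X,S,T}(y_{i}|x_{i},s_{i},t_{i})\,.$$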
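The memoryless property above says that the probability of an output sequence is a product of single-letter transition probabilities, evaluated at the matching input, state, and parameter symbols. A minimal Python sketch follows; the binary toy channel (an XOR of input and state, followed by a flip whose probability depends on the parameter) is a hypothetical example introduced here and does not come from the paper.

```python
from itertools import product

def toy_channel(y, x, s, t):
    """Hypothetical W(y|x,s,t): y = x XOR s, flipped w.p. 0.1 (t=0) or 0.2 (t=1)."""
    flip = {0: 0.1, 1: 0.2}[t]
    return 1.0 - flip if y == (x ^ s) else flip

def sequence_prob(W, y_seq, x_seq, s_seq, t_seq):
    """Product form: W^n(y^n|x^n,s^n,t^n) = prod_i W(y_i|x_i,s_i,t_i)."""
    prob = 1.0
    for y, x, s, t in zip(y_seq, x_seq, s_seq, t_seq):
        prob *= W(y, x, s, t)
    return prob

def total_mass(W, x_seq, s_seq, t_seq, y_alphabet=(0, 1)):
    """Sum of the sequence probabilities over all output sequences (should be 1)."""
    n = len(x_seq)
    return sum(
        sequence_prob(W, y_seq, x_seq, s_seq, t_seq)
        for y_seq in product(y_alphabet, repeat=n)
    )
```

For the input $(1,0)$ , state $(0,0)$ and parameters $(0,1)$ , the undisturbed output $(1,0)$ has probability $0.9\cdot 0.8=0.72$ under this toy channel.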

The AVC with fixed parameters is a DMC $W_{Y|X,S,T}$ where the parameter sequence is fixed, while the state sequence has an unknown distribution, not necessarily independent nor stationary. That is, the parameter sequence is given by

$$T^{n}=\theta^{n}\,,$$

where $\theta_{1},\theta_{2},\ldots$ is a given sequence of letters from $\mathcal{T}$ , known to the encoder, decoder, and jammer. The state sequence, in contrast, is distributed according to $S^{n}\sim q(s^{n}|\theta^{n})$ , with an unknown joint pmf $q(s^{n}|\theta^{n})$ over $\mathcal{S}^{n}$ . In particular, $q(s^{n}|\theta^{n})$ could give mass $1$ to some state sequence $s^{n}$ . The AVC with fixed parameters is denoted by $\mathcal{W}=\{W_{Y|X,S,T},\theta^{\infty}\}$ , where $\theta^{\infty}$ is a short notation for the sequence $(\theta_{i})_{i=1}^{\infty}$ .

The compound channel with fixed parameters is used as a tool in the analysis. Different models of compound channels are described in the literature [29]. Here, the compound channel with fixed parameters is a DMC $W_{Y|X,S,T}$ where the state has a conditional product distribution $q(s|t)$ that is not known exactly, but rather belongs to a family of conditional distributions $\mathcal{Q}$ , with $\mathcal{Q}\subseteq\mathcal{P}(\mathcal{S}|\mathcal{T})$ . That is,

$$S^{n}\sim\prod_{i=1}^{n}q(s_{i}|\theta_{i})\,,$$

with an unknown conditional pmf $q(s|t)\in\mathcal{Q}$ . We note that this differs from the classical definition of the compound channel, as in [29], where the state is fixed throughout the transmission.

***Remark 1**.*

Note that the special case of a channel $W_{Y|X,S,T=t}$ , with a constant parameter $\theta_{i}=t$ for $i=1,2,\ldots$ , reduces to the standard state-dependent DMC. Thereby, the AVC $\mathcal{W}_{t}=\{W_{Y|X,S,T=t}\}$ with a constant parameter can be regarded as the traditional AVC, as introduced by Blackwell et al. [16]. On the other hand, the special case of a channel $W_{Y|X,S,T}=W_{Y|X,T}$ , which does not depend on the state $S$ , reduces to a TVC [102].

***Remark 2**.*

The AVC with colored Gaussian noise does not fit the description above. Nevertheless, the fixed parameters model is a crucial tool for our final goal, i.e. to determine the capacity of the AVC with colored Gaussian noise.

II-C Coding

We introduce some preliminary definitions.

***Definition 1** (Code).*

A $(2^{nR},n)$ code for the AVC $\mathcal{W}$ with fixed parameters consists of the following: a message set $[1:2^{nR}]$ , where $2^{nR}$ is assumed to be an integer, an encoding function $\mathrm{f}:[1:2^{nR}]\times\mathcal{T}^{n}\rightarrow\mathcal{X}^{n}$ , and a decoding function $g:\mathcal{Y}^{n}\times\mathcal{T}^{n}\rightarrow[1:2^{nR}]$ .

Given a message $m\in[1:2^{nR}]$ and a parameter sequence $\theta^{n}$ , the encoder transmits the codeword $x^{n}=\mathrm{f}(m,\theta^{n})$ . The decoder receives the channel output $y^{n}$ , and finds an estimate of the message $\hat{m}=g(y^{n},\theta^{n})$ . We denote the code by $\mathscr{C}=\left(\mathrm{f}(\cdot,\cdot),g(\cdot,\cdot)\right)$ .

We proceed now to coding schemes when using stochastic-encoder stochastic-decoder pairs with common randomness.

***Definition 2** (Random code).*

A $(2^{nR},n)$ random code for the AVC $\mathcal{W}$ with fixed parameters consists of a collection of $(2^{nR},n)$ codes $\{\mathscr{C}_{\gamma}=(\mathrm{f}_{\gamma},g_{\gamma})\}_{\gamma\in\Gamma}$ , along with a probability distribution $\mu(\gamma)$ over the code collection $\Gamma$ . We denote such a code by $\mathscr{C}^{\,\Gamma}=(\mu,\Gamma,\{\mathscr{C}_{\gamma}\}_{\gamma\in\Gamma})$ .

II-D Input and State Constraints

Next, we consider an input constraint and a state constraint, imposed on the encoder and the jammer, respectively. We note that the constraint specifications are known to both the user and the jammer in this model. Let $\phi:\mathcal{X}\rightarrow[0,\infty)$ and $l:\mathcal{S}\rightarrow[0,\infty)$ be some given bounded functions, and define

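$$\phi^{n}(x^{n})=\frac{1}{n}\sum_{i=1}^{n}\phi(x_{i})\,,\qquad l^{n}(s^{n})=\frac{1}{n}\sum_{i=1}^{n}l(s_{i})\,.$$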

Let $\Omega>0$ and $\Lambda>0$ . Below, we specify the input constraint $\Omega$ and state constraint $\Lambda$ , corresponding to the functions $\phi^{n}(x^{n})$ and $l^{n}(s^{n})$ , respectively. It is assumed that for some $a\in\mathcal{X}$ and $b\in\mathcal{S}$ , $\phi(a)=l(b)=0$ .

As the parameter sequence $\theta^{\infty}\equiv(\theta_{i})_{i=1}^{\infty}$ is fixed and known to the encoder, the decoder and the jammer, the input and state constraints below are specified for a particular sequence. Given an input constraint $\Omega$ , the encoding function needs to satisfy

$$\phi^{n}(\mathrm{f}(m,\theta^{n}))\leq\Omega\,,\;\text{ for all $m\in[1:2^{nR}]$}\,.\tag{8}$$

That is, the input sequence satisfies $\phi^{n}(X^{n})\leq\Omega$ with probability $1$ .

Moving to the state constraint $\Lambda$ , we have different definitions for the AVC and for the compound channel. The compound channel has a constraint on average, where the state sequence satisfies $\mathbb{E}_{q}l^{n}(S^{n})\leq\Lambda$ , while the AVC has an almost-sure constraint, $l^{n}(S^{n})\leq\Lambda$ with probability (w.p.) $1$ . Explicitly, we say that a compound channel is under a state constraint $\Lambda$ if $\mathcal{Q}\subseteq\overline{\mathcal{P}}_{\Lambda}(\mathcal{S}|\theta^{\infty})$ , where

[TABLE]

This includes the case of a deterministic unknown state sequence, i.e. when $q$ gives probability $1$ to a particular $s^{n}\in\mathcal{S}^{n}$ with $l^{n}(s^{n})\leq\Lambda$ .

II-E Capacity Under Constraints

We move to the definition of an achievable rate and the capacity of the AVC $\mathcal{W}$ with fixed parameters under input and state constraints. Codes over the AVC $\mathcal{W}$ with fixed parameters are defined as in Definition 1, with the additional constraint (8) on the codebook.

Define the conditional probability of error of a code $\mathscr{C}$ given a state sequence $s^{n}\in\mathcal{S}^{n}$ by

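$$P_{e}^{(n)}(\mathscr{C}|s^{n},\theta^{n})\triangleq\frac{1}{2^{nR}}\sum_{m=1}^{2^{nR}}\;\sum_{y^{n}\,:\;g(y^{n},\theta^{n})\neq m}W_{Y^{n}|X^{n},S^{n},T^{n}}(y^{n}|\mathrm{f}(m,\theta^{n}),s^{n},\theta^{n})\,,$$

and the probability of error under a state distribution $q(s^{n}|\theta^{n})$ by

$$P_{e}^{(n)}(q,\theta^{n},\mathscr{C})\triangleq\sum_{s^{n}\in\mathcal{S}^{n}}q(s^{n}|\theta^{n})\,P_{e}^{(n)}(\mathscr{C}|s^{n},\theta^{n})\,.$$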

***Definition 3** (Achievable rate and capacity under constraints).*

A code $\mathscr{C}=(\mathrm{f},g)$ is called a $(2^{nR},n,\varepsilon)$ code for the AVC $\mathcal{W}$ with fixed parameters under input constraint $\Omega$ and state constraint $\Lambda$ , when (8) is satisfied and

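$$P_{e}^{(n)}(q,\theta^{n},\mathscr{C})\leq\varepsilon\,,\;\text{ for all $q\in\mathcal{P}_{\Lambda}(\mathcal{S}^{n}|\theta^{n})$}\,,\tag{12}$$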

or, equivalently, $P_{e}^{(n)}(\mathscr{C}|s^{n},\theta^{n})\leq\varepsilon$ for all $s^{n}\in\mathcal{S}^{n}$ with $l^{n}(s^{n})\leq\Lambda$ .

We say that a rate $R\geq 0$ is achievable under constraints if for every $\varepsilon>0$ and sufficiently large $n$ , there exists a $(2^{nR},n,\varepsilon)$ code for the AVC $\mathcal{W}$ with fixed parameters under input constraint $\Omega$ and state constraint $\Lambda$ . The operational capacity is defined as the supremum of achievable rates, and it is denoted by $\mathbb{C}(\mathcal{W})$ . We use the term ‘capacity’ referring to this operational meaning, and in some places we call it the deterministic code capacity in order to emphasize that achievability is measured with respect to deterministic codes.
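For very short blocklengths, the requirements in Definition 3 can be checked by brute force, using the equivalent formulation in terms of deterministic state sequences $s^{n}$ with $l^{n}(s^{n})\leq\Lambda$ . The Python sketch below does this for a hypothetical toy setup that is not taken from the paper: binary alphabets, a length-3 repetition code with majority decoding, a channel $y=x\oplus s$ that adds a random flip only when the parameter equals $1$ , and cost functions $\phi(x)=x$ and $l(s)=s$ averaged over the block (the per-letter averaging is an assumption made here).

```python
from itertools import product

S_ALPHABET = Y_ALPHABET = (0, 1)  # toy binary alphabets (assumption)

def toy_channel(y, x, s, t):
    """Hypothetical W(y|x,s,t): y = x XOR s, flipped w.p. 0 (t=0) or 0.1 (t=1)."""
    flip = 0.0 if t == 0 else 0.1
    return 1.0 - flip if y == (x ^ s) else flip

def encode(m, theta):
    """Repetition encoder f(m, theta^n); it ignores theta in this toy example."""
    return (m,) * len(theta)

def decode(y_seq, theta):
    """Majority decoder g(y^n, theta^n)."""
    return 1 if 2 * sum(y_seq) > len(y_seq) else 0

def average_cost(cost, seq):
    """Per-letter average cost of a sequence (assumed additive form)."""
    return sum(cost(a) for a in seq) / len(seq)

def satisfies_input_constraint(messages, theta, phi, omega):
    """Input constraint: every codeword has average cost at most omega."""
    return all(average_cost(phi, encode(m, theta)) <= omega for m in messages)

def error_prob(messages, s_seq, theta):
    """Conditional error probability given s^n, averaged over the messages."""
    total = 0.0
    for m in messages:
        x_seq = encode(m, theta)
        for y_seq in product(Y_ALPHABET, repeat=len(theta)):
            if decode(y_seq, theta) != m:
                p = 1.0
                for y, x, s, t in zip(y_seq, x_seq, s_seq, theta):
                    p *= toy_channel(y, x, s, t)
                total += p
    return total / len(messages)

def worst_case_error(messages, theta, l, lam):
    """Maximum error probability over state sequences with l^n(s^n) <= lam."""
    feasible = (
        s for s in product(S_ALPHABET, repeat=len(theta))
        if average_cost(l, s) <= lam
    )
    return max(error_prob(messages, s, theta) for s in feasible)
```

With $\theta^{3}=(0,0,0)$ the toy channel is noiseless, so a jammer limited to $\Lambda=1/3$ (one flipped symbol) causes no errors, whereas $\Lambda=2/3$ lets the jammer flip two symbols and defeat the majority decoder for both messages.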

Analogously to the deterministic case, a $(2^{nR},n,\varepsilon)$ random code $\mathscr{C}^{\Gamma}$ satisfies the requirements

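$$\begin{aligned}&\sum_{\gamma\in\Gamma}\mu(\gamma)\phi^{n}(\mathrm{f}_{\gamma}(m,\theta^{n}))\leq\Omega\,,\;\text{ for all $m\in[1:2^{nR}]$}\,,\\ &P_{e}^{(n)}(q,\mathscr{C}^{\Gamma})\triangleq\sum_{\gamma\in\Gamma}\mu(\gamma)P_{e}^{(n)}(q,\theta^{n},\mathscr{C}_{\gamma})\leq\varepsilon\,,\;\text{ for all $q\in\mathcal{P}_{\Lambda}(\mathcal{S}^{n}|\theta^{n})$}\,.\end{aligned}\tag{13}$$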

The capacity achieved by random codes is then denoted by $\mathbb{C}^{\star}(\mathcal{W})$ , and it is referred to as the random code capacity.

The definitions above are naturally extended to the compound channel with fixed parameters, under input constraint $\Omega$ and state constraint $\Lambda$ , by limiting the requirements (8), (12) and (13) to conditionally memoryless state distributions $q\in\mathcal{Q}$ . The respective deterministic code capacity $\mathbb{C}(\mathcal{W}^{\mathcal{Q}})$ and random code capacity $\mathbb{C}^{\star}(\mathcal{W}^{\mathcal{Q}})$ are defined accordingly.

III Main Results – Channels with Fixed Parameters

In this section, we establish the random code capacity of the AVC with fixed parameters. To this end, we first give an auxiliary result on the compound channel.

III-A The Compound Channel with Fixed Parameters

We begin with the capacity theorem for the compound channel $\mathcal{W}^{\mathcal{Q}}=\{W_{Y|X,S,T},\mathcal{Q},\theta^{\infty}\}$ . This is an auxiliary result, obtained by a simple extension of [29, Exercise 6.8]. A similar result appears in [74] as well. Given a parameter sequence $\theta^{n}$ of a fixed length, define

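$$\mathsf{C}_{n}(\mathcal{W}^{\mathcal{Q}})=\max_{p(x|t)\,:\;\mathbb{E}\phi(X)\leq\Omega}\;\inf_{q(s|t)\in\mathcal{Q}}I_{q}(X;Y|T)\,,$$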

with $(T,S,X)\sim P_{T}(t)p(x|t)q(s|t)$ , where $P_{T}$ is the type of the parameter sequence $\theta^{n}$ .
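To make the objective in the definition above concrete, the following Python sketch evaluates $I_{q}(X;Y|T)$ for one fixed pair $p(x|t)$ and $q(s|t)$ , with $P_{T}$ taken as the type of $\theta^{n}$ . Since $X$ and $S$ are conditionally independent given $T$ , the state can be averaged out for each parameter value. The function names, the nested-dictionary representation, and the noiseless XOR toy channel are assumptions made here; the sketch does not carry out the maximization over $p$ or the infimum over $q$ .

```python
import math
from collections import Counter

def xor_channel(y, x, s, t):
    """Hypothetical noiseless W(y|x,s,t): y = x XOR s for every parameter t."""
    return 1.0 if y == (x ^ s) else 0.0

def conditional_mutual_information(theta, p, q, W, X, S, Y):
    """I_q(X;Y|T) in bits for (T,S,X) ~ P_T(t) p(x|t) q(s|t).

    theta: parameter sequence; P_T is its type
    p[t][x], q[t][s]: conditional pmfs given as nested dictionaries
    W(y, x, s, t): single-letter channel transition probability
    """
    n = len(theta)
    p_t = {t: c / n for t, c in Counter(theta).items()}
    total = 0.0
    for t, weight in p_t.items():
        # Average the state out: V(y|x) = sum_s q(s|t) W(y|x,s,t).
        v = {(x, y): sum(q[t][s] * W(y, x, s, t) for s in S) for x in X for y in Y}
        p_y = {y: sum(p[t][x] * v[(x, y)] for x in X) for y in Y}
        for x in X:
            for y in Y:
                joint = p[t][x] * v[(x, y)]
                if joint > 0:
                    total += weight * joint * math.log2(v[(x, y)] / p_y[y])
    return total
```

In this toy example, a uniform input together with a state that is always $0$ gives one bit per channel use, while a uniformly distributed state makes the output independent of the input and drives the mutual information to zero.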

***Lemma 1**.*

The capacity of the compound channel $\mathcal{W}^{\mathcal{Q}}$ with fixed parameters, under input constraint $\Omega$ and state constraint $\Lambda$ , is given by

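$$\mathbb{C}(\mathcal{W}^{\mathcal{Q}})=\liminf_{n\rightarrow\infty}\mathsf{C}_{n}(\mathcal{W}^{\mathcal{Q}})\,,$$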

and it is identical to the random code capacity, i.e. $\mathbb{C}^{\star}(\mathcal{W}^{\mathcal{Q}})=\mathbb{C}(\mathcal{W}^{\mathcal{Q}})$ .

The proof of Lemma 1 is given in Appendix A.

III-B The AVC with Fixed Parameters – Random Code Capacity

We determine the random code capacity of the AVC with fixed parameters, $\mathcal{W}=\{W_{Y|X,S,T},\theta^{\infty}\}$ , under input constraint $\Omega$ and state constraint $\Lambda$ . The random code derivation is based on our result on the compound channel with fixed parameters and a variation of Ahlswede’s Robustification Technique (RT). Define

[TABLE]

We begin with a lemma, based on Ahlswede’s RT [6] (see also [82, Lemma 9]). We modify it here to include the parameter sequence $\theta^{n}$ and the constraint on the family of conditional state distributions $q(s|t)$ .

***Lemma 2** (Modified RT).*

Let $h:\mathcal{S}^{n}\times\mathcal{T}^{n}\rightarrow[0,1]$ be a given function. If, for some fixed $\alpha_{n}\in(0,1)$ , and for all $q^{n}(s^{n}|\theta^{n})=\prod_{i=1}^{n}q(s_{i}|\theta_{i})$ , with $q\in\overline{\mathcal{P}}_{\Lambda}(\mathcal{S}|\theta^{\infty})$ ,

$$\sum_{s^{n}\in\mathcal{S}^{n}}q^{n}(s^{n}|\theta^{n})\,h(s^{n},\theta^{n})\leq\alpha_{n}\,,\tag{17}$$

then,

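$$\frac{1}{|\Pi(\theta^{n})|}\sum_{\pi\in\Pi(\theta^{n})}h(\pi s^{n},\theta^{n})\leq\beta_{n}\,,\;\text{ for all $s^{n}\in\mathcal{S}^{n}$ such that $l^{n}(s^{n})\leq\Lambda$}\,,$$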

where $\Pi(\theta^{n})$ is the set of all $n$ -tuple permutations $\pi:\mathcal{S}^{n}\rightarrow\mathcal{S}^{n}$ such that $\pi\theta^{n}=\theta^{n}$ , and $\beta_{n}=(n+1)^{|\mathcal{S}||\mathcal{T}|}\alpha_{n}$ .
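The restricted permutation set $\Pi(\theta^{n})$ can be enumerated explicitly for short sequences. The Python sketch below lists the index permutations that leave $\theta^{n}$ unchanged, applies one to a state sequence, and evaluates $\beta_{n}$ from the expression in Lemma 2. The function names and the convention $(\pi s^{n})_{i}=s_{\pi(i)}$ are assumptions made here for illustration. A permutation fixes $\theta^{n}$ exactly when it only shuffles positions that carry the same parameter value, so $|\Pi(\theta^{n})|$ is the product of the factorials of the symbol counts of $\theta^{n}$ .

```python
from collections import Counter
from itertools import permutations
from math import factorial

def parameter_preserving_permutations(theta):
    """All index permutations pi with theta[pi[i]] == theta[i] for every i."""
    n = len(theta)
    return [
        pi for pi in permutations(range(n))
        if all(theta[pi[i]] == theta[i] for i in range(n))
    ]

def apply_permutation(pi, seq):
    """Permuted sequence, using the convention (pi seq)_i = seq[pi[i]]."""
    return tuple(seq[pi[i]] for i in range(len(seq)))

def count_by_formula(theta):
    """Product over parameter values t of (number of occurrences of t)!."""
    result = 1
    for count in Counter(theta).values():
        result *= factorial(count)
    return result

def beta_n(alpha_n, n, num_states, num_params):
    """beta_n = (n + 1)^(|S| |T|) * alpha_n, as stated in Lemma 2."""
    return (n + 1) ** (num_states * num_params) * alpha_n
```

For $\theta^{5}=(0,0,1,1,1)$ this yields $2!\cdot 3!=12$ permutations out of the $5!=120$ unrestricted ones, which illustrates how much smaller the averaging set is than in the original RT.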

Originally, Ahlswede’s RT is stated so that (17) holds for any $q(s)\in\mathcal{P}(\mathcal{S})$ , without state constraint (see [6]), and without conditioning on the parameter sequence $\theta^{n}$ . We give the proof of Lemma 2 in Appendix B. Next, we give our random code capacity theorem.

*Theorem 3**.*

The random code capacity of the AVC $\mathcal{W}$ with fixed parameters, under input constraint $\Omega$ and state constraint $\Lambda$ , is given by

[TABLE]

The proof of Theorem 3 is given in Appendix C. The proof is based on our extension of Ahlswede’s RT above. Essentially, we use a reliable code for the compound channel to construct a random code for the AVC by applying random permutations to the codeword symbols. However, here, we only use permutations that do not affect the parameter sequence $\theta^{n}$ . The result above plays a central role in the proof of the capacity theorem in Section V, where the AVC with colored Gaussian noise is considered.
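The restriction to permutations in $\Pi(\theta^{n})$ can be illustrated with a short Python sketch. It assumes that a permutation is represented as a list of indices, and that $\pi\theta^{n}=\theta^{n}$ means that positions are shuffled only among time indices that share the same parameter value. The function names are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def random_parameter_preserving_permutation(theta, rng):
    """Draw a permutation of [0, n) that leaves the sequence theta unchanged.

    Positions are shuffled only within each class of equal parameter values.
    """
    positions = defaultdict(list)
    for i, t in enumerate(theta):
        positions[t].append(i)

    perm = list(range(len(theta)))
    for idx in positions.values():
        shuffled = idx[:]
        rng.shuffle(shuffled)
        for src, dst in zip(idx, shuffled):
            perm[src] = dst
    return perm


def apply_permutation(perm, seq):
    """Return the permuted sequence (seq[perm[0]], ..., seq[perm[n-1]])."""
    return [seq[perm[i]] for i in range(len(seq))]


# Illustrative parameter sequence and codeword (arbitrary values).
rng = random.Random(1)
theta = [0, 1, 0, 1, 1, 0, 2, 2]
codeword = ["a", "b", "c", "d", "e", "f", "g", "h"]
perm = random_parameter_preserving_permutation(theta, rng)
permuted_codeword = apply_permutation(perm, codeword)
```

Applying such a permutation to the codeword symbols moves each symbol only to a time instance with the same parameter, so the parameter sequence seen by the permuted code is unchanged.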

We also give an equivalent formulation in terms of the random code capacity of the traditional AVC. As mentioned in Remark 1, the case of an AVC $\{W_{Y|X,S,T=t}\}$ with a constant parameter $\theta_{i}=t$ reduces to the traditional AVC under input and state constraints. For this channel, Csiszár and Narayan [31] showed that the random code capacity is given by

[TABLE]

where the last equality is due to the minimax theorem [96]. Then, define

[TABLE]

*Lemma 4**.*

[TABLE]

The proof of Lemma 4 is given in Appendix D. Theorem 3 and Lemma 4 yield the following consequence.

*Corollary 5**.*

The random code capacity of the AVC $\mathcal{W}$ with fixed parameters, under input constraint $\Omega$ and state constraint $\Lambda$ , is given by

[TABLE]

The corollary will also be useful in our analysis of the AVC with colored Gaussian noise.

III-C The AVC with Fixed Parameters – Deterministic Code Capacity

We move to the deterministic code capacity of the AVC with fixed parameters, $\mathcal{W}=\{W_{Y|X,S,T},\theta^{\infty}\}$ , under input constraint $\Omega$ and state constraint $\Lambda$ .

III-C1 Capacity Theorem

Before we state the capacity theorem, we give a few definitions. We begin with symmetrizability of a channel without parameters.

*Definition 4** (see [30]).*

A state-dependent DMC $V_{Y|X,S}$ is said to be symmetrizable if for some conditional distribution $J(s|x)$ ,

[TABLE]

Equivalently, the channel $\widetilde{V}(y|x_{1},x_{2})=\sum_{s\in\mathcal{S}}V_{Y|X,S}(y|x_{1},s)J(s|x_{2})$ is symmetric, i.e. $\widetilde{V}(y|x_{1},x_{2})=\widetilde{V}(y|x_{2},x_{1})$ , for all $x_{1},x_{2}\in\mathcal{X}$ and $y\in\mathcal{Y}$ . We say that such a $J:\mathcal{X}\rightarrow\mathcal{S}$ symmetrizes $V_{Y|X,S}$ .

Intuitively, symmetrizability identifies a poor channel, where the jammer can impair the communication scheme by randomizing the state sequence $S^{n}$ according to $J^{n}(s^{n}|x_{2}^{n})=\prod_{i=1}^{n}J(s_{i}|x_{2,i})$ , for some codeword $x_{2}^{n}$ . Suppose that the transmitted codeword is $x_{1}^{n}$ . The codeword $x_{2}^{n}$ can be thought of as an impostor sent by the jammer. Now, since the “average channel” $\widetilde{V}$ is symmetric with respect to $x_{1}^{n}$ and $x_{2}^{n}$ , the two codewords appear to the receiver as equally likely. Indeed, by [37], if the AVC $\{V_{Y|X,S}\}$ without parameters and free of constraints is symmetrizable, then its capacity is zero.
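The symmetry condition on $\widetilde{V}$ can be checked numerically for finite alphabets. The following Python sketch is an illustration under stated assumptions: the channel and the jammer kernel are stored as dictionaries, and the noiseless binary channel $Y=X\oplus S$ is a toy example chosen here, not one taken from the paper.

```python
from itertools import product


def is_symmetrized(V, J, X, S, Y, tol=1e-12):
    """Check whether the kernel J(s|x) symmetrizes the channel V(y|x,s).

    V maps (y, x, s) to a probability and J maps (s, x) to a probability.
    The test compares sum_s V(y|x1,s) J(s|x2) with sum_s V(y|x2,s) J(s|x1).
    """
    for x1, x2, y in product(X, X, Y):
        lhs = sum(V[(y, x1, s)] * J[(s, x2)] for s in S)
        rhs = sum(V[(y, x2, s)] * J[(s, x1)] for s in S)
        if abs(lhs - rhs) > tol:
            return False
    return True


# Toy example (assumption): noiseless binary channel Y = X xor S.
X = S = Y = (0, 1)
V_xor = {(y, x, s): 1.0 if y == (x ^ s) else 0.0
         for y, x, s in product(Y, X, S)}

# Jammer copies an impostor input: J(s|x) = 1 if s == x.
J_copy = {(s, x): 1.0 if s == x else 0.0 for s, x in product(S, X)}
# Jammer always sends s = 0, regardless of x.
J_zero = {(s, x): 1.0 if s == 0 else 0.0 for s, x in product(S, X)}

copy_symmetrizes = is_symmetrized(V_xor, J_copy, X, S, Y)
zero_symmetrizes = is_symmetrized(V_xor, J_zero, X, S, Y)
```

In this toy case the copying kernel yields an output that depends on $x_{1}\oplus x_{2}$ only, so it symmetrizes the channel, whereas the constant kernel does not.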

We will assume that either the channels $W_{Y|X,S,T}(\cdot|\cdot,\cdot,\theta_{i})$ are all symmetrizable, or the number of non-symmetrizable channels grows linearly with $n$ . That is,

[TABLE]

The asymptotic notation $f(n)=\mathbf{\Omega}(n)$ means that there exist $n_{0}>0$ and $0<\alpha\leq 1$ such that $f(n)\geq\alpha n$ for all $n\geq n_{0}$ . An intuitive explanation for this assumption is given in Remark 3 below. Next, we define a symmetrizability cost and threshold for the AVC with fixed parameters. For every $n$ and $p(x|t)$ with

[TABLE]

define the minimal symmetrizability cost by

[TABLE]

where the minimization is over the conditional distributions $J_{t}(s|x)$ that symmetrize $W_{Y|X,S,T}(\cdot|\cdot,\cdot,t)$ , for $t\in\mathcal{T}$ (see Definition 4). We use the convention that a minimum value over an empty set is $+\infty$ . Note that the last equality in (27) holds since $P_{T}$ is defined as the type of the parameter sequence $\theta^{n}$ , hence averaging over time is the same as averaging according to $P_{T}$ . In addition, define the symmetrizability threshold

[TABLE]

Intuitively, $\widetilde{\Lambda}_{n}(p)$ is the minimal average state cost which the jammer has to pay to symmetrize the channel at each time instance, for a given conditional input distribution $p(x|t)$ . If this minimal state cost violates the state constraint $\Lambda$ , then the jammer is prohibited from symmetrizing the channel. Indeed, we will show that if there exists an input distribution $p(x|t)$ with $\frac{1}{n}\sum_{i=1}^{n}\sum_{x\in\mathcal{X}}p(x|\theta_{i})\phi(x)\leq\Omega$ and $\widetilde{\Lambda}_{n}(p)>\Lambda$ for large $n$ , then the deterministic code capacity is positive. The symmetrizability threshold $L_{n}^{*}$ is the worst symmetrizability cost from the jammer’s perspective.
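The observation that averaging over time coincides with averaging according to the type $P_{T}$ can be verified on a small example. The Python sketch below uses exact rational arithmetic; the per-parameter cost values are hypothetical placeholders standing in for any quantity that depends on the time index $i$ only through $\theta_{i}$.

```python
from collections import Counter
from fractions import Fraction


def type_of(theta):
    """Empirical distribution (type) P_T of the parameter sequence."""
    n = len(theta)
    return {t: Fraction(c, n) for t, c in Counter(theta).items()}


def time_average(theta, cost):
    """(1/n) * sum_i cost[theta_i]."""
    return sum(Fraction(cost[t]) for t in theta) / len(theta)


def type_average(theta, cost):
    """sum_t P_T(t) * cost[t]."""
    return sum(p * cost[t] for t, p in type_of(theta).items())


# Alternating parameter sequence with hypothetical per-parameter costs.
theta = [1, 0] * 4
cost = {0: Fraction(1, 4), 1: Fraction(3, 4)}
avg_time = time_average(theta, cost)
avg_type = type_average(theta, cost)
```

Both averages are equal because each value $t$ contributes its cost exactly $nP_{T}(t)$ times to the sum over $i$.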

Our capacity result is stated below. Let

[TABLE]

with $(T,S,X)\sim P_{T}(t)p(x|t)q(s|t)$ , where $P_{T}$ is the type of the parameter sequence $\theta^{n}$ with a fixed length $n$ .

*Theorem 6**.*

Assume that $L_{n}^{*}\neq\Lambda$ for sufficiently large $n$ and that (25) holds. The capacity of an AVC $\mathcal{W}$ with fixed parameters, under input constraint $\Omega$ and state constraint $\Lambda$ , is given by

[TABLE]

In particular, if the channels $W_{Y|X,S,T}(\cdot|\cdot,\cdot,t)$ , $t\in\mathcal{T}$ , are non-symmetrizable, then $\mathbb{C}(\mathcal{W})=\mathbb{C}^{\,\star}(\mathcal{W})=\liminf\limits_{n\rightarrow\infty}\mathsf{C}_{n}^{\,\star}(\mathcal{W})$ . That is, the deterministic code capacity coincides with the random code capacity.

The proof of Theorem 6 is given in Appendix G. The theorem will also play a central role in the proof of the capacity theorem in Section V.

*Remark 3**.*

Observe that the second part of the theorem implies that for the case where there are no constraints, i.e. $\Omega=\phi_{max}$ and $\Lambda=l_{max}$ , non-symmetrizability is a sufficient condition for positive capacity. Specifically, according to the definition of $\widetilde{\Lambda}_{n}(p)$ , $L_{n}^{*}$ in (27)-(28), if some of the channels $W_{Y|X,S,T}(\cdot|\cdot,\cdot,\theta_{i})$ are non-symmetrizable, then the symmetrizability threshold is $L_{n}^{*}=\infty$ , hence the capacity is positive. Intuitively, if the number of such channels is constant, i.e. $|\mathcal{I}(n)|=c$ for all $n$ , it seems that this assignment of $L_{n}^{*}$ does not make sense, since the user cannot achieve positive rates by coding over a negligible fraction of the block. Yet, our assumption in (25) excludes this scenario. In particular, if $|\mathcal{I}(n)|$ is non-zero, then we assume that $|\mathcal{I}(n)|$ grows linearly in $n$ , in which case positive rates can be achieved by coding over the part of the block that lies within $\mathcal{I}(n)$ . Furthermore, without constraints, we may replace the linear growth assumption with a poly-logarithmic one, i.e. $|\mathcal{I}(n)|=\mathbf{\Omega}((\log n)^{a})$ , with $a>1$ . Indeed, based on Ahlswede’s elimination technique [5], the random code capacity can be achieved with a code collection of polynomial size, $|\Gamma|=n^{2}$ . Therefore, without state constraints, the random element $\gamma\in\Gamma$ can be reliably sent to the receiver over the sub-block $\mathcal{I}(n)$ , at rate $\rho_{n}=\frac{\log|\Gamma|}{(\log n)^{a}}=2(\log n)^{-(a-1)}$ , which tends to zero as $n\rightarrow\infty$ , hence the decrease in the overall rate is negligible as well. We deduce that if $|\mathcal{I}(n)|=\mathbf{\Omega}((\log n)^{a})$ , then the deterministic code capacity of the AVC with fixed parameters without constraints is the same as the random code capacity, i.e.

[TABLE]
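The arithmetic behind the rate $\rho_{n}$ in the remark above can be checked with a few lines of Python. The sketch assumes base-2 logarithms and a sub-block of length exactly $(\log n)^{a}$; the function name is hypothetical.

```python
import math


def overhead_rate(n, a):
    """rho_n = log|Gamma| / (log n)^a for a code collection of size |Gamma| = n^2."""
    log_n = math.log2(n)
    # log|Gamma| = log(n^2) = 2 log n, hence rho_n = 2 (log n)^(-(a-1)).
    return 2 * log_n / log_n ** a


# With a = 2, the rate is 2 / log n, which shrinks as n grows.
rho_small = overhead_rate(2 ** 16, 2)
rho_large = overhead_rate(2 ** 32, 2)
```

For any $a>1$ the exponent $-(a-1)$ is negative, which is why the rate needed to convey $\gamma$ vanishes as $n\rightarrow\infty$.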

*Remark 4**.*

Even in the case where there are no parameters, the boundary case where $L_{n}^{*}=\Lambda$ is an open problem. For the traditional AVC, however, it is conjectured in [30] that the capacity is zero in this case. Similarly, we conjecture that the capacity of the AVC with fixed parameters is given by $\mathbb{C}(\mathcal{W})=\liminf\limits_{n\rightarrow\infty}\mathsf{C}_{n}(\mathcal{W})$ for all values of $\{L_{n}^{*}\}_{n\geq 1}$ , provided that (25) holds. There are special cases where we know that this holds, given in the corollary below. The corollary is based on the remark following Theorem 3 in [30].

*Corollary 7**.*

Let $\mathcal{W}$ be an AVC with fixed parameters such that all channels $W_{Y|X,S,T}(\cdot|\cdot,\cdot,t)$ , $t\in\mathcal{T}$ , are symmetrizable. If the minimum in (27) is attained by a $0$-$1$ law, for every $n$ and $p(x|t)$ with $\frac{1}{n}\sum_{i=1}^{n}\sum_{x\in\mathcal{X}}p(x|\theta_{i})\phi(x)\leq\Omega$ , then

[TABLE]

The proof of Corollary 7 is given in Appendix H. In particular, we note that the condition of a $0$-$1$ law in Corollary 7 holds when the output $Y$ is a deterministic function of $X$ , $S$ , and $T$ . As opposed to Theorem 6, the statement in Corollary 7 holds for all values of $\{L_{n}^{*}\}_{n\geq 1}$ .

III-C2 Decoding Rule

We specify the decoding rule and state the corresponding properties, which are used in the analysis. To specify the decoding rule, we define the decoding sets $\mathcal{D}(m)\subseteq\mathcal{Y}^{n}\times\mathcal{T}^{n}$ , for $m\in[1:2^{nR}]$ , such that $g(y^{n},\theta^{n})=m$ iff $(y^{n},\theta^{n})\in\mathcal{D}(m)$ .

*Definition 5** (Decoder).*

Given the codebook $\{\mathrm{f}(m,\theta^{n})\}_{m\in[1:2^{nR}]}$ , declare that $(y^{n},\theta^{n})\in\mathcal{D}(m)$ if there exists $s^{n}\in\mathcal{S}^{n}$ with $l^{n}(s^{n})\leq\Lambda$ such that the following hold.

1) For $(T,X,S,Y)$ that is distributed according to the joint type $\hat{P}_{\theta^{n},\mathrm{f}(m,\theta^{n}),s^{n},y^{n}}$ , we have that

[TABLE]

2) For every $\widetilde{m}\neq m$ such that for some $\widetilde{s}^{n}\in\mathcal{S}^{n}$ with $l^{n}(\widetilde{s}^{n})\leq\Lambda$ ,

[TABLE]

where $(T,\widetilde{X},\widetilde{S},Y)\sim\hat{P}_{\theta^{n},\mathrm{f}(\widetilde{m},\theta^{n}),\widetilde{s}^{n},y^{n}}$ , we have that

[TABLE]

We note that in Definition 5, the variables $T,X,\widetilde{X},S,\widetilde{S},Y$ are dummy random variables, distributed according to the joint type of $(\theta^{n},\mathrm{f}(m,\theta^{n}),\mathrm{f}(\widetilde{m},\theta^{n}),s^{n},\widetilde{s}^{n},y^{n})$ , where $\mathrm{f}(m,\theta^{n})$ is a “tested” codeword, $\mathrm{f}(\widetilde{m},\theta^{n})$ is a competing codeword, $s^{n}$ is a “tested” state sequence, $\widetilde{s}^{n}$ is a competing state sequence, and $y^{n}$ is the received sequence. None of the sequences are random here. We may have that the conditional type $P_{Y|X,S,T}$ differs from the actual channel $W_{Y|X,S,T}$ . Therefore, the divergences and mutual informations in Definition 5 could be positive.

For the definition above to be proper, the decoding sets need to be disjoint, as stated in the following lemma.

*Lemma 8** (Decoding Disambiguity).*

Suppose that in each codebook, all codewords have the same conditional type, i.e. $\hat{P}_{\mathrm{f}(m,\theta^{n})|\theta^{n}}=p$ for all $m\in[1:2^{nR}]$ . Assume (25) holds, that for some $\delta_{0},\delta_{1}>0$ , $P_{T}(t)\geq\delta_{0}$ , $p(x|t)\geq\delta_{1}$ , $\forall x\in\mathcal{X}$ , $t\in\mathcal{T}$ , and also

[TABLE]

Then, for sufficiently small $\eta>0$ ,

[TABLE]

The proof of Lemma 8 is given in Appendix E.

III-C3 Codebook Generation

We now extend Csiszár and Narayan’s lemma for the codebook generation [30].

*Lemma 9** (Codebooks Generation).*

For every $\varepsilon>0$ , sufficiently large $n$ , rate $R\geq\varepsilon$ and conditional type $p(x|t)$ , there exists a set of codewords $\{x^{n}(m,\theta^{n})\}_{m\in[1:2^{nR}]}$ of conditional type $p$ , such that for every $a^{n}\in\mathcal{X}^{n}$ and $s^{n}\in\mathcal{S}^{n}$ with $l^{n}(s^{n})\leq\Lambda$ , and every joint type $P_{T,X,\widetilde{X},S}$ with $P_{X|T}=P_{\widetilde{X}|T}=p$ , the following hold.

[TABLE]

and

[TABLE]

The proof of Lemma 9 is given in Appendix F.

III-D Super-Additivity

We also give an equivalent formulation with a sum over $i\in[1:n]$ . Here, as opposed to the previous section, the formula cannot be expressed in terms of the capacities of the constant-parameter AVCs $\{W_{Y|X,S,T=\theta_{i}}\}$ . Considering the AVC without constraints, Schaefer et al. [91] showed that the capacity of any product AVC that is composed of a symmetrizable channel and a non-symmetrizable channel is larger than the sum of the individual capacities (see Theorem 6 in [91]). Similarly, we give an example at the end of this section where the capacity of the AVC with fixed parameters is larger than the weighted sum of the capacities of the constant-parameter AVCs $\{W_{Y|X,S,T=\theta_{i}}\}$ . This phenomenon can be viewed as an instance of the super-additivity property in [91].

We begin with constant-parameter definitions, i.e. for a fixed $T=t$ . For every input distribution $p(x)$ with $\mathbb{E}\phi(X)\leq\Omega$ , define the constant-parameter minimal symmetrizability cost by

[TABLE]

where the minimization is over the distributions $J(s|x)$ that symmetrize $W_{Y|X,S,T}(\cdot|\cdot,\cdot,t)$ , where $t\in\mathcal{T}$ is fixed (see Definition 4). Then, we can write the minimal symmetrizability cost defined in (27) as

[TABLE]

Let

[TABLE]

We note that based on Csiszár and Narayan’s result in [30], the capacity of the constant-parameter AVC $\{W_{Y|X,S,T=t}\}$ is given by $\mathsf{C}_{t}(\Omega,\Delta,\Lambda)$ with $\Delta=\Lambda$ .

*Lemma 10**.*

[TABLE]

The proof of Lemma 10 is given in Appendix I. Theorem 6, Corollary 7, and Lemma 10 yield the following consequence.

*Corollary 11**.*

The deterministic code capacity of the AVC $\mathcal{W}$ with fixed parameters, under input constraint $\Omega$ and state constraint $\Lambda$ , is given by

[TABLE]

Furthermore, if the minimum in (41) is attained by a $0$-$1$ law, for every $p(x)$ with $\mathbb{E}\phi(X)\leq\Omega$ , and for all $t\in\mathcal{T}$ , then

[TABLE]

for all values of $\{L_{n}^{*}\}_{n\geq 1}$ .

The corollary will also be useful in our analysis of the AVC with colored Gaussian noise.

*Example 1**.*

Consider the arbitrarily varying binary symmetric channel (BSC) with fixed parameters,

[TABLE]

with $\mathcal{X}=\mathcal{S}=\mathcal{T}=\{0,1\}$ , where $Z_{t}\sim\text{Bernoulli}(\varepsilon_{t})$ , for $t=0,1$ , $\varepsilon_{0}<\varepsilon_{1}<\frac{1}{2}$ . Consider a parameter sequence with an empirical distribution $P_{T}(0)=P_{T}(1)=\frac{1}{2}$ , say $\theta_{2i}=0$ and $\theta_{2i-1}=1$ for $i=1,2,\ldots$ . Suppose that the user and the jammer are subject to input constraint $\Omega$ and state constraint $\Lambda$ , respectively, with Hamming weight cost functions, i.e. $\phi(x)=x$ and $l(s)=s$ .

For the constant-parameter AVC, we have by Definition 4 that $W_{Y|X,S,T=t}$ is symmetrized by any symmetric distribution, i.e. with $J(s|1)=1-J(s|0)$ . Denoting $\zeta=J(1|1)=1-J(1|0)$ , we have that

[TABLE]

Based on the analysis by Csiszár and Narayan [30, Example 1], the capacity of the constant-parameter AVC under input constraint $\omega$ and state constraint $\lambda$ is given by

[TABLE]

where $h(x)=-x\log x-(1-x)\log(1-x)$ is the binary entropy function and $a*b=(1-a)b+a(1-b)$ .
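For numerical evaluation of expressions of this kind, the two helper operations can be sketched in Python as follows. The base-2 logarithm and the function names are assumptions made for this illustration.

```python
import math


def binary_entropy(x):
    """h(x) = -x log x - (1 - x) log(1 - x), with h(0) = h(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def binary_convolution(a, b):
    """a * b = (1 - a) b + a (1 - b), the crossover of two cascaded BSCs."""
    return (1 - a) * b + a * (1 - b)


# Sanity checks on boundary values.
h_half = binary_entropy(0.5)
conv_with_zero = binary_convolution(0.3, 0.0)
conv_with_half = binary_convolution(0.5, 0.1)
```

The convolution $a*b$ is the probability that exactly one of two independent Bernoulli variables with parameters $a$ and $b$ equals one, so it equals $a$ when $b=0$ and equals $\frac{1}{2}$ whenever $a=\frac{1}{2}$.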

Suppose that

[TABLE]

For those values, we have that

[TABLE]

Thus, by Corollary 11, the capacity is given by

[TABLE]

with $\omega_{0}=\omega_{1}=\frac{5}{16}$ , $\lambda_{0}=\frac{3}{8}$ and $\lambda_{1}=\frac{1}{8}$ . In contrast, using two separate codes for $W_{Y|X,S,T=0}$ and $W_{Y|X,S,T=1}$ independently, the achieved rate is

[TABLE]

This can be viewed as an instance of the more general phenomenon of super-additivity, that holds for any product AVC which is composed of a symmetrizable AVC and a non-symmetrizable AVC [91, Theorem 6].

III-E Example: Channel with Fadings

To illustrate our results, we give another example.

*Example 2**.*

Consider an arbitrarily varying fading channel,

[TABLE]

with a Gaussian noise sequence $Z^{n}$ that is i.i.d. $\sim\mathcal{N}(0,\sigma^{2})$ , where $\theta_{1},\theta_{2},\ldots$ is a sequence of fixed fading coefficients. Recently, Hosseinigoki and Kosut [57] considered this channel with a random memoryless sequence of fading coefficients. Here, in contrast, we assume that the fading coefficients are fixed, and belong to a finite set $\mathcal{T}$ . Intuitively, the jammer would like to confuse the decoder by sending a state sequence that simulates the sequence $\theta^{n}X^{n}\equiv(\theta_{i}X_{i})_{i=1}^{n}$ . Indeed, as seen below, the deterministic code capacity is positive only if there exists an input distribution such that $\frac{1}{n}\sum_{i=1}^{n}\theta_{i}^{2}\mathbb{E}X_{i}^{2}>\Lambda$ , in which case the jammer cannot simulate $\theta^{n}X^{n}$ without violating the state constraint.

Although we previously assumed that the alphabets are finite, our results can be extended to the continuous case as well, using standard discretization techniques [15, 5] [36, Section 3.4.1]. By Theorem 3, the random code capacity is given by

[TABLE]

Then, we show that

[TABLE]

with expectation over $T\sim P_{T}$ , where $P_{T}$ is the type of the sequence $\theta^{n}$ .

As for the deterministic code capacity, we show that the minimum in (27) is attained by a $0$-$1$ law that gives probability $1$ to $s=\theta_{i}x$ , hence we can determine the capacity using Corollary 7. We show that the minimal symmetrizability cost is given by

[TABLE]

and deduce that the capacity of the AVC with fixed fading coefficients is given by

[TABLE]

with

[TABLE]

The derivation is given in Appendix J. We note that the last expression has the same form as the capacity formula established by Hosseinigoki and Kosut [57] for a random memoryless sequence of fading coefficients.

Next, we extend the result above to continuous fading coefficients, where $\mathcal{T}=[-t_{0},t_{0}]\subset\mathbb{R}$ . First, we observe that the formulas above can also be written as

[TABLE]

and

[TABLE]

This follows from the same considerations as in the proofs of Lemma 4 and Lemma 10. Now, if the fading coefficients are continuous, then one may perform the discretization procedure in [36, Section 3.4.1]. Hence, the deterministic and random code capacities in the continuous case are also given by the limit inferior of the formulas (61) and (62), respectively.

IV The Arbitrarily Varying Gaussian Product Channel

From this point on, we consider Gaussian AVCs, without parameters. In this section, we consider the Gaussian product channel. Our results on the AVC with colored Gaussian noise, in the next section, are based on the capacity theorems of the AVC with fixed parameters, in the previous section, and on the analysis in the current section.

IV-A Channel Description

The state-dependent Gaussian product channel consists of a set of $d$ parallel channels,

[TABLE]

where $j$ is the channel index, $d$ is the dimension (number of channels), and $Z^{d}$ is a Gaussian vector with zero mean and covariance matrix $K_{Z}$ . Let $\mathbf{X}_{j}=(X_{j,i})_{i=1}^{n}$ , $\mathbf{S}_{j}=(S_{j,i})_{i=1}^{n}$ and $\mathbf{Z}_{j}=(Z_{j,i})_{i=1}^{n}$ denote the input, state and noise sequences associated with the $j$ th channel, respectively, where $i\in[1:n]$ is the time index, and let $\mathbf{X}^{d}=(\mathbf{X}_{j})_{j=1}^{d}$ , $\mathbf{S}^{d}=(\mathbf{S}_{j})_{j=1}^{d}$ and $\mathbf{Z}^{d}=(\mathbf{Z}_{j})_{j=1}^{d}$ . The corresponding output of the product channel is the vector sequence $\mathbf{Y}^{d}=\mathbf{X}^{d}+\mathbf{S}^{d}+\mathbf{Z}^{d}$ .

The arbitrarily varying Gaussian product channel (AVGPC) is a state-dependent Gaussian product channel with $d$ state sequences $(\mathbf{S}_{1},\ldots,\mathbf{S}_{d})$ of unknown distribution, not necessarily independent nor stationary. That is, $(\mathbf{S}_{1},\ldots,\mathbf{S}_{d})\sim F_{\mathbf{S}_{1},\ldots,\mathbf{S}_{d}}$ , where $F_{\mathbf{S}_{1},\ldots,\mathbf{S}_{d}}$ is an unknown joint cumulative distribution function (cdf) over $\mathbb{R}^{nd}$ . In particular, $F_{\mathbf{S}_{1},\ldots,\mathbf{S}_{d}}$ could give probability mass $1$ to a particular sequence of state vectors $(\mathbf{s}_{1},\ldots,\mathbf{s}_{d})\in\mathbb{R}^{nd}$ . The channel is subject to input constraint $\Omega>0$ and state constraint $\Lambda>0$ ,

[TABLE]

IV-B Coding

We introduce preliminary definitions for the AVGPC.

*Definition 6** (Code).*

A $(2^{nR},n)$ code for the AVGPC consists of the following: a message set $[1:2^{nR}]$ , where it is assumed throughout that $2^{nR}$ is an integer, a sequence of $d$ encoding functions $\mathbf{f}_{j}:[1:2^{nR}]\rightarrow\mathbb{R}^{n}$ , for $j\in[1:d]$ , such that

[TABLE]

and a decoding function $g:\mathbb{R}^{nd}\rightarrow[1:2^{nR}]$ . Given a message $m\in[1:2^{nR}]$ , the encoder transmits $\mathbf{x}_{j}=\mathbf{f}_{j}(m)$ , for $j\in[1:d]$ . The codeword is then given by $\mathbf{x}^{d}=\mathbf{f}^{d}(m)\triangleq\left(\mathbf{f}_{1}(m),\mathbf{f}_{2}(m),\ldots,\mathbf{f}_{d}(m)\right)$ . The decoder receives the channel outputs $\mathbf{y}^{d}=(\mathbf{y}_{1},\ldots,\mathbf{y}_{d})$ , and finds an estimate of the message $\hat{m}=g(\mathbf{y}^{d})$ . We denote the code by $\mathscr{C}=\left(\mathbf{f}^{d},g\right)$ .

Define the conditional probability of error of a code $\mathscr{C}$ given the sequence $\mathbf{s}^{d}=(\mathbf{s}_{1},\ldots,\mathbf{s}_{d})$ by

[TABLE]

where $f_{\mathbf{Y}^{d}|m,\mathbf{s}^{d}}(\mathbf{y}^{d})=\prod_{i=1}^{n}f_{Z^{d}}(y^{d}_{i}-\mathrm{f}^{d}_{i}(m)-s^{d}_{i})$ , with

[TABLE]

A code $\mathscr{C}=(\mathbf{f}^{d},g)$ is called a $(2^{nR},n,\varepsilon)$ code for the AVGPC if

[TABLE]

We say that a rate $R$ is achievable if for every $\varepsilon>0$ and sufficiently large $n$ , there exists a $(2^{nR},n,\varepsilon)$ code for the AVGPC. The operational capacity is defined as the supremum of all achievable rates, and it is denoted by $\mathbb{C}(K_{Z})$ . We use the term ‘capacity’ referring to this operational meaning, and in some places we call it the deterministic code capacity to emphasize that achievability is measured with respect to deterministic codes.

We proceed now to coding schemes when using stochastic-encoder stochastic-decoder pairs with common randomness.

*Definition 7** (Random code).*

A $(2^{nR},n)$ random code for the AVGPC consists of a collection of $(2^{nR},n)$ codes $\{\mathscr{C}_{\gamma}=(\mathbf{f}_{\gamma}^{d},g_{\gamma})\}_{\gamma\in\Gamma}$ , along with a pmf $\mu(\gamma)$ over the code collection $\Gamma$ . We denote such a code by $\mathscr{C}^{\,\Gamma}=(\mu,\Gamma,\{\mathscr{C}_{\gamma}\}_{\gamma\in\Gamma})$ . Analogously to the deterministic case, a $(2^{nR},n,\varepsilon)$ random code for the AVGPC satisfies

[TABLE]

The capacity achieved by random codes is denoted by $\mathbb{C}^{\,\star}(K_{Z})$ , and it is referred to as the random code capacity.

IV-C Related Work

Consider the AVGPC with parallel Gaussian channels, where the covariance matrix of the additive noise is

[TABLE]

i.e. $Z_{1},\ldots,Z_{d}$ are independent and $Z_{j}\sim\mathcal{N}(0,\sigma_{j}^{2})$ . Denote the random code capacity of the AVGPC with parallel channels by $\mathbb{C}^{\,\star}(\Sigma)$ . Hughes and Narayan [61] have shown that the solution for the random code capacity is given by “double” water filling, where the jammer performs water filling first, attempting to whiten the overall noise as much as possible, and then the user performs water filling taking into account the total noise power, which is contributed by both the channel and the jammer. The formal definitions are given below. Let

[TABLE]

with $[t]_{+}=\max\{0,t\}$ , where $\beta\geq 0$ is chosen to satisfy

[TABLE]

Next, let

[TABLE]

where $\alpha\geq 0$ is chosen to satisfy

[TABLE]

We can now define Hughes and Narayan’s capacity formula [61],

[TABLE]

*Theorem 12** (see [61]).*

The random code capacity of the AVGPC is given by

[TABLE]

IV-D Observations on The Water Filling Game

We give further observations on the results by Hughes and Narayan [61], which will be useful in the sequel.

IV-D1 Game Theoretic Interpretation

By [61, Theorem 3], the random code capacity is the solution of the following optimization problem,

[TABLE]

where the minimization is over the simplex $\mathcal{F}_{\text{state}}=\{(N_{1},\ldots,N_{d})\,:\;\sum_{j=1}^{d}N_{j}\leq\Lambda\}$ , and the maximization is over the simplex $\mathcal{F}_{\text{input}}=\{(P_{1},\ldots,P_{d})\,:\;\sum_{j=1}^{d}P_{j}\leq\Omega\}$ .

The optimization problem is thus interpreted as a two-player zero-sum simultaneous game, played by the user and the jammer, where $\mathcal{F}_{\text{input}}$ and $\mathcal{F}_{\text{state}}$ are the respective action sets. The payoff function $v:\mathcal{F}_{\text{input}}\times\mathcal{F}_{\text{state}}\rightarrow\mathbb{R}$ is defined such that, given a profile $(P_{1},\ldots,P_{d},N_{1},\ldots,N_{d})$ ,

[TABLE]

We have defined a game with pure strategies, i.e. the players’ actions are deterministic. In the communication model, the optimal coding and jamming schemes are random in general, yet the capacity can be achieved with deterministic power allocations, as in the game.

The optimal power allocation has a water filling analogy (see e.g. [27, Section 9.4]), where the jammer pours water of volume $\Lambda$ into a vessel, and then the encoder pours more water of volume $\Omega$ . The shape of the bottom of the vessel is determined by the noise variances $\sigma_{1}^{2},\ldots,\sigma_{d}^{2}$ . The jammer brings the water level to $\beta$ , and then the encoder brings the water level to $\alpha$ . Water filling for the AVGPC is illustrated in Figure 1, for $\Omega=13$ , $\Lambda=8$ , $d=10$ , $(\sigma_{j}^{2})_{j=1}^{10}=(5,8,3,1.5,2.5,1.8,3.2,9,4.5,5.5)$ . The light shade “fluid” is the jammer’s water filling and the dark shade “fluid” is the transmitter’s. The resulting “water levels” are $\beta=4$ and $\alpha=6$ . Then, substituting into (72) and (74) yields the power allocations $(N_{j}^{*})_{j=1}^{10}=(0,0,1,2.5,1.5,2.2,0.8,0,0,0)$ for the jammer and $(P_{j}^{*})_{j=1}^{10}=(1,0,2,2,2,2,2,0,1.5,0.5)$ for the transmitter.
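The numerical example can be reproduced with a short Python sketch of the two-stage procedure. It assumes that (72) and (74) take the standard water filling forms $N_{j}^{*}=[\beta-\sigma_{j}^{2}]_{+}$ and $P_{j}^{*}=[\alpha-(N_{j}^{*}+\sigma_{j}^{2})]_{+}$ , with $\beta$ and $\alpha$ chosen so that the allocations sum to $\Lambda$ and $\Omega$ , respectively. The bisection search and the function names are choices made for this illustration.

```python
def water_level(floor, volume, tol=1e-12):
    """Find the level L with sum_j max(0, L - floor_j) = volume by bisection."""
    lo, hi = min(floor), max(floor) + volume
    while hi - lo > tol:
        mid = (lo + hi) / 2
        poured = sum(max(0.0, mid - f) for f in floor)
        if poured < volume:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def double_water_filling(sigma2, omega, lam):
    """Jammer fills first (level beta), then the user fills on top (level alpha)."""
    beta = water_level(sigma2, lam)
    jam = [max(0.0, beta - s) for s in sigma2]
    # Total noise seen by the user: channel noise plus jamming power.
    total_noise = [s + n for s, n in zip(sigma2, jam)]
    alpha = water_level(total_noise, omega)
    tx = [max(0.0, alpha - v) for v in total_noise]
    return beta, jam, alpha, tx


# Parameters of the example illustrated in Figure 1.
sigma2 = [5, 8, 3, 1.5, 2.5, 1.8, 3.2, 9, 4.5, 5.5]
beta, jam, alpha, tx = double_water_filling(sigma2, omega=13, lam=8)
```

Under these assumptions, the computed levels agree with $\beta=4$ and $\alpha=6$ up to the bisection tolerance, and channels 2 and 8, whose noise variances exceed $\alpha$ , receive no power from either player.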

One can easily prove the following properties of the random code capacity characterization.

*Lemma 13**.*

The quantities defined by (72)-(76) satisfy

[TABLE]

For completeness, the proof of Lemma 13 is given in Appendix K. Based on the water filling analogy of the power allocation above, part 1 of Lemma 13 is natural, since $\beta$ is interpreted as the water level after the jammer pours his share, and $\alpha$ is interpreted as the water level after the user pours additional water after that (see Figure 1). Part 3 and part 4 are not surprising either since, as can be seen in Figure 1, the variance of the combined interference $(Z_{j}+S_{j})$ is $\max(\beta,\sigma_{j}^{2})$ and the variance of the channel output $Y_{j}$ is $\max(\alpha,\sigma_{j}^{2})$ .

Observe that an equivalent statement of part 2 is the following. If the user discards a channel, i.e. assigns $P_{j}^{*}=0$ to the $j$ th channel, then the jammer does not invest power in this channel either, i.e. $N_{j}^{*}=0$ . This claim is also intuitive, and from a game theoretic perspective, it is an aspect of the jammer’s rationality, as explained below. As mentioned above, the optimization problem is interpreted as a two-player zero-sum simultaneous game between the user and the jammer. The value of such a game is attained by a pair of strategies which forms a Nash equilibrium [103] (see also [79][75, Theorem 3.1.4]). That is, if the user and the jammer were to agree to use the power allocation strategies $(P_{j}^{*})_{j=1}^{d}$ and $(N_{j}^{*})_{j=1}^{d}$ , then neither player could profit by deviating from his original strategy, provided that the other player respects the agreement. Now, suppose that for some $j\in[1:d]$ , $P_{j}^{*}=0$ and $N_{j}^{*}>0$ . Then, the jammer is wasting energy, and can surely profit from diverting this energy to some other channel $j^{\prime}$ with $P_{j^{\prime}}^{*}>0$ . Thus, such a strategy profile is irrational and cannot be a Nash equilibrium.
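The equilibrium property can be probed numerically. The Python sketch below assumes that the payoff function is $v(P,N)=\sum_{j=1}^{d}\frac{1}{2}\log\big(1+\frac{P_{j}}{N_{j}+\sigma_{j}^{2}}\big)$ and uses the parameters $\Omega=13$ , $\Lambda=8$ with the allocations that follow from the water levels $\beta=4$ and $\alpha=6$ . It samples random unilateral deviations that spend the full budget; this is a sanity check under these assumptions, not a proof, and all names are hypothetical.

```python
import math
import random

SIGMA2 = [5, 8, 3, 1.5, 2.5, 1.8, 3.2, 9, 4.5, 5.5]
N_STAR = [0, 0, 1, 2.5, 1.5, 2.2, 0.8, 0, 0, 0]   # jammer, sums to 8
P_STAR = [1, 0, 2, 2, 2, 2, 2, 0, 1.5, 0.5]       # user, sums to 13


def payoff(p, n_alloc):
    """Assumed payoff: sum of per-channel rates 0.5 * log2(1 + P/(N + sigma^2))."""
    return sum(0.5 * math.log2(1 + pj / (nj + sj))
               for pj, nj, sj in zip(p, n_alloc, SIGMA2))


def random_allocation(budget, dim, rng):
    """Random nonnegative allocation that spends the whole budget."""
    weights = [rng.random() for _ in range(dim)]
    total = sum(weights)
    return [budget * w / total for w in weights]


def equilibrium_holds(trials=1000, seed=0, tol=1e-9):
    """Check that no sampled unilateral deviation improves the deviator's payoff."""
    rng = random.Random(seed)
    value = payoff(P_STAR, N_STAR)
    for _ in range(trials):
        p_dev = random_allocation(13, len(SIGMA2), rng)
        n_dev = random_allocation(8, len(SIGMA2), rng)
        if payoff(p_dev, N_STAR) > value + tol:   # user gains by deviating
            return False
        if payoff(P_STAR, n_dev) < value - tol:   # jammer gains by deviating
            return False
    return True


no_profitable_deviation = equilibrium_holds()
```

The user maximizes the payoff and the jammer minimizes it, so a saddle point requires that user deviations never raise the value and jammer deviations never lower it.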

For a general AVC, a coding scheme which assumes that the jammer is using his optimal strategy would typically fail. The code needs to be robust against any state sequence that satisfies the state constraint. For example, consider a scalar Gaussian AVC [60], specified by $\mathbf{Y}=\mathbf{X}+\mathbf{S}+\mathbf{Z}$ , under input constraint $\left\lVert\mathbf{X}\right\rVert^{2}\leq n\Omega$ and state constraint $\left\lVert\mathbf{S}\right\rVert^{2}\leq n\Lambda$ , where the noise sequence $\mathbf{Z}$ is i.i.d. $\sim\mathcal{N}(0,\sigma^{2})$ . Suppose that the receiver is using joint typicality decoding for a Gaussian channel $\mathbf{Y}=\mathbf{X}+\mathbf{V}$ , where $\mathbf{V}$ is i.i.d. $\sim\mathcal{N}(0,\Lambda+\sigma^{2})$ (see [27, Section 9.1]), corresponding to the optimal jamming strategy. Then, the jammer can cause the decoder to fail by selecting a state sequence such that $\left\lVert\mathbf{S}\right\rVert^{2}=\frac{n\Lambda}{2}$ , for instance. As a result, there is a high probability that the square norm of the output sequence is below $n(\Lambda+\sigma^{2}-\delta)$ , for small $\delta>0$ , in which case the decoder cannot establish joint typicality and declares an error. The same principle holds in our problem. The user cannot assume that the jammer is using his optimal power allocation, and a reliable code must be robust against any power allocation of the jammer.

IV-D2 Multiple Access Channel Analogy

Water filling in two (or more) stages appears in other settings in the literature, e.g. [26, 71, 111, 113]. Consider a Gaussian product multiple access channel (MAC), where $Y_{j}=X_{1,j}+X_{2,j}+Z_{j}$ , $j\in[1:d]$ , under the input constraints $\left\lVert\mathbf{X}_{1}^{d}\right\rVert^{2}\leq n\Omega$ and $\left\lVert\mathbf{X}_{2}^{d}\right\rVert^{2}\leq n\Lambda$ . This can be viewed as a different variation of the AVGPC where a second transmitter replaces the jammer. By [26], a corner point of the capacity region can be achieved by applying water filling to the total power in the first step, and then to the power of User 2 in the second step. Specifically, by [26, Section III.B.], the optimal power allocations $(P_{j}^{*})_{j=1}^{d}$ and $(N_{j}^{*})_{j=1}^{d}$ , for Encoder 1 and Encoder 2, respectively, which achieve a corner point of the capacity region, satisfy

[TABLE]

such that $\sum_{j=1}^{d}N_{j}^{*}=\Lambda$ . Following part 3 of Lemma 13, it can be seen that the strategy above is equivalent to (72)-(75). The total power allocation in (83) seems natural in order to maximize the sum rate. Our presentation in (72)-(75), however, is intuitive for the Gaussian product MAC as well. Indeed, using successive cancellation decoding, the receiver estimates the transmission of User 1 while treating the transmission of User 2 as noise, and then subtracts the estimated sequence from the received sequence to decode the transmission of User 2. Hence, decoding for User 1 is analogous to the decoder in our problem. Nevertheless, in the next section, we show that the deterministic code capacity in our adversarial problem has a different behavior.

Another water filling game is described by Lai and El Gamal in [71], who considered the flat fading MAC $Y=h_{1}X_{1}+h_{2}X_{2}+Z$ with selfish users, where the fading coefficients are continuous random variables, distributed according to $(h_{1},h_{2})\sim\mu$ . Suppose that the users are subject to average input constraints, $\mathbb{E}_{\mu}\left\lVert\mathbf{X}_{1}\right\rVert^{2}\leq n\Omega$ and $\mathbb{E}_{\mu}\left\lVert\mathbf{X}_{2}\right\rVert^{2}\leq n\Lambda$ . As shown in [71], a maximum sum-rate point on the capacity region boundary is achieved if the users perform water filling treating each other’s transmission as noise. It is further shown that opportunistic communication is optimal, where User 1 only transmits if his water level times fading coefficient is at least as high as that of User 2, and vice versa. That is, the power allocations of the users are given by

[TABLE]

where $\beta_{1}$ and $\beta_{2}$ are chosen such that $\mathbb{E}P_{h_{1},h_{2}}^{*}=\Omega$ and $\mathbb{E}N_{h_{1},h_{2}}^{*}=\Lambda$ . This threshold operation resembles the result in the next section, on the deterministic code capacity of the AVGPC, except that the phase transition of the AVGPC depends only on the “water volumes” $\Omega$ and $\Lambda$ (see Subsection IV-F).

IV-E Results

We give our result on the AVGPC with parallel Gaussian channels, where the covariance matrix of the additive noise is $\Sigma=\mathrm{diag}\{\sigma_{1}^{2},\ldots,\sigma_{d}^{2}\}$ , i.e. $Z_{1},\ldots,Z_{d}$ are independent and $Z_{j}\sim\mathcal{N}(0,\sigma_{j}^{2})$ . The deterministic code capacity of the AVGPC with parallel channels is denoted by $\mathbb{C}(\Sigma)$ .

We establish the capacity of the AVGPC. Based on Csiszár and Narayan’s result in [30], the deterministic code capacity of an AVC under input and state constraints is given in terms of channel symmetrizability and the minimal state cost for the jammer to symmetrize the channel (see also [73], [82, Definition 5 and Theorem 5]). By [30, Definition 2], an AVGPC is symmetrized by a conditional pdf $\varphi(s^{d}|x^{d})$ if

[TABLE]

where $f_{Z^{d}}(z^{d})=\prod_{j=1}^{d}\frac{1}{\sqrt{2\pi\sigma_{j}^{2}}}e^{-z_{j}^{2}/2\sigma_{j}^{2}}$ . In particular, observe that (86) holds for $\varphi(s^{d}|x^{d})=\delta(s^{d}-x^{d})$ , where $\delta(\cdot)$ is the Dirac delta function. In other words, the channel is symmetrized by a distribution $\varphi(s^{d}|x^{d})$ which gives probability $1$ to $S^{d}=x^{d}$ . For the AVGPC, the minimal state cost for the jammer to symmetrize the channel, for an input distribution $f_{X^{d}}$ , is given by

[TABLE]

where the minimization is over all conditional pdfs $\varphi(s^{d}|x^{d})$ that symmetrize the channel, that is, satisfy (86). The following lemma states that the minimal state cost for symmetrizability is the same as the input power. The lemma will be used in the achievability proof of the capacity theorem.

*Lemma 14**.*

For a zero mean Gaussian vector $X^{d}\sim\mathcal{N}(\mathbf{0},K_{X})$ ,

[TABLE]
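As a quick check of the claim that the point mass $\delta(s^{d}-x^{d})$ symmetrizes the channel and costs exactly the input power, the following worked computation may help. It is only a sketch under two assumptions, since neither display is visible here: that (86) has the usual Csiszár-Narayan form $\int\varphi(s^{d}|x_{2}^{d})f_{Z^{d}}(y^{d}-x_{1}^{d}-s^{d})\,ds^{d}=\int\varphi(s^{d}|x_{1}^{d})f_{Z^{d}}(y^{d}-x_{2}^{d}-s^{d})\,ds^{d}$ for all $x_{1}^{d},x_{2}^{d},y^{d}$ , and that the state cost in (87) is the expected squared norm of $S^{d}$ .

```latex
\begin{align*}
\int \delta(s^{d}-x_{2}^{d})\, f_{Z^{d}}(y^{d}-x_{1}^{d}-s^{d})\,ds^{d}
  &= f_{Z^{d}}(y^{d}-x_{1}^{d}-x_{2}^{d})
   = \int \delta(s^{d}-x_{1}^{d})\, f_{Z^{d}}(y^{d}-x_{2}^{d}-s^{d})\,ds^{d}, \\
\int f_{X^{d}}(x^{d}) \int \delta(s^{d}-x^{d})\,\lVert s^{d}\rVert^{2}\,ds^{d}\,dx^{d}
  &= \mathbb{E}\lVert X^{d}\rVert^{2} = \mathrm{tr}(K_{X}).
\end{align*}
```

The first line is symmetric in $(x_{1}^{d},x_{2}^{d})$ , so the point mass is a valid symmetrizing distribution. The second line shows only that the minimal symmetrizing cost is at most $\mathrm{tr}(K_{X})$ for a zero mean input with covariance $K_{X}$ ; the matching lower bound is the part that requires the argument of Appendix L.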

The proof of Lemma 14 is given in Appendix L. The proof builds on our observation that (86) holds if and only if $\varphi(s^{d}|x^{d})=\varphi(s^{d}-x^{d}|0)$ . This in turn leads to the conclusion that the minimum in (87) is attained by $\varphi_{x^{d}}(s^{d})=\delta(s^{d}-x^{d})$ . Moving to the capacity theorem, define

[TABLE]

*Theorem 15**.*

The deterministic code capacity of the AVGPC is given by

[TABLE]

The proof of Theorem 15 is given in Appendix M. Considering the scalar case, Csiszár and Narayan showed the direct part by providing a coding scheme for the Gaussian AVC [32]. While the receiver in their coding scheme uses simple minimum-distance decoding, the analysis is fairly complicated. Here, on the other hand, we treat the AVGPC using a much simpler approach. To prove the direct part, we consider the optimization problem based on the capacity formula of the general AVC under input and state constraints, which is given in terms of symmetrizing state distributions. We use Lemma 14 to show that if $\Omega>\Lambda$ , then the transmitter’s water filling strategy in (74) guarantees that $\widetilde{\Lambda}(F_{x^{d}})>\Lambda$ . Intuitively, this means that the jammer cannot symmetrize the channel without violating the state constraint. In this scenario, the random code capacity can be achieved with deterministic codes as well.
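The two-stage allocation and the resulting phase transition can be illustrated numerically. The sketch below is not taken from the paper: it assumes that (72)-(75) have the standard double water filling form $N_{j}^{*}=[\beta-\sigma_{j}^{2}]_{+}$ with $\sum_{j}N_{j}^{*}=\Lambda$ and $P_{j}^{*}=[\alpha-(N_{j}^{*}+\sigma_{j}^{2})]_{+}$ with $\sum_{j}P_{j}^{*}=\Omega$ , and that the random code capacity is $\sum_{j}\frac{1}{2}\log_{2}\big(1+P_{j}^{*}/(N_{j}^{*}+\sigma_{j}^{2})\big)$ . The function names and the bisection routine are illustrative choices.

```python
import math

def water_level(floor, volume, iters=200):
    """Bisection for the level w with sum_j max(w - floor_j, 0) = volume."""
    lo, hi = min(floor), max(floor) + volume
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(max(mid - f, 0.0) for f in floor) > volume:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def double_water_filling(sigma2, input_power, jam_power):
    """Return (P, N) under the assumed forms
    N_j = [beta - sigma_j^2]_+ and P_j = [alpha - (N_j + sigma_j^2)]_+."""
    beta = water_level(sigma2, jam_power)        # the jammer pours first
    N = [max(beta - s, 0.0) for s in sigma2]
    floor = [n + s for n, s in zip(N, sigma2)]   # noise plus jamming
    alpha = water_level(floor, input_power)      # the encoder pours on top
    P = [max(alpha - f, 0.0) for f in floor]
    return P, N

def random_code_capacity(sigma2, input_power, jam_power):
    """Sum of (1/2) log2(1 + P_j / (N_j + sigma_j^2)), in bits."""
    P, N = double_water_filling(sigma2, input_power, jam_power)
    return sum(0.5 * math.log2(1.0 + p / (n + s))
               for p, n, s in zip(P, N, sigma2))

def deterministic_code_capacity(sigma2, input_power, jam_power):
    """Phase transition: positive only if Omega > Lambda."""
    if input_power > jam_power:
        return random_code_capacity(sigma2, input_power, jam_power)
    return 0.0

if __name__ == "__main__":
    sigma2 = [1.0, 2.0, 4.0]
    P, N = double_water_filling(sigma2, input_power=5.0, jam_power=3.0)
    print([round(p, 6) for p in P], [round(n, 6) for n in N])
    print(deterministic_code_capacity(sigma2, 5.0, 3.0))
    print(deterministic_code_capacity(sigma2, 3.0, 3.0))
```

For the hypothetical variances $(1,2,4)$ with $\Omega=5$ and $\Lambda=3$ , the levels are $\beta=3$ and $\alpha=5$ , so the jammer allocates $(2,1,0)$ and the encoder allocates $(2,2,1)$ . Lowering $\Omega$ to $3$ makes the deterministic value drop to zero, while the random code value stays positive.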

IV-F Discussion

We give a couple of remarks on our result in Theorem 15. As in the case of the Gaussian scalar AVC [32], the capacity is discontinuous in the input constraint, and has a phase transition behavior, depending on whether $\Omega>\Lambda$ or $\Omega\leq\Lambda$ . We give an intuitive explanation below. For the classic Gaussian AVC, reliable communication requires the power of the transmitted signal to be higher than the power of the jamming signal, otherwise the jammer can confuse the receiver by making the state sequence $\mathbf{S}$ “look like” the input sequence $\mathbf{X}$ [32]. At first glance at our problem, one might have expected that the input power $P_{j}$ of the $j$ th channel also needs to be higher than the jamming power $N_{j}$ , in order for the output $\mathbf{Y}_{j}$ to be useful. This is not the case. Since the decoder has the vector of outputs $(\mathbf{Y}_{1},\ldots,\mathbf{Y}_{d})$ , even if $\mathbf{S}_{j}$ looks like $\mathbf{X}_{j}$ , the receiver could still gain information from $\mathbf{Y}_{j}$ as the other outputs may “break the symmetry”.

Based on Shannon’s classic water filling result [94], the capacity of the Gaussian product channel, $Y_{j}=X_{j}+V_{j}$ , $j\in[1:d]$ , can be achieved by combining $d$ independent encoder-decoder pairs, where the $j$ th pair is associated with a capacity achieving code for the scalar Gaussian channel under input constraint $P_{j}^{*}$ . However, based on Csiszár and Narayan’s result on the scalar Gaussian AVC [32], the capacity of the $j$ th AVC, $Y_{j}=X_{j}+S_{j}+Z_{j}$ , is zero under input constraint $P_{j}^{*}$ and state constraint $N_{j}^{*}$ for $P_{j}^{*}\leq N_{j}^{*}$ . This means that, in contrast to Shannon’s Gaussian product channel [94], using $d$ independent encoder-decoder pairs over the AVGPC is suboptimal in general. This can be viewed as a constrained version of the super-additivity phenomenon in [91].
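A small numerical comparison makes the gap concrete. The sketch below takes a hypothetical allocation $P=(2,2,1)$ , $N=(2,1,0)$ over noise variances $(1,2,4)$ , so that $\Omega=5>\Lambda=3$ , and compares joint coding with $d$ independent encoder-decoder pairs. The per-channel rule (zero whenever $P_{j}\leq N_{j}$ , and $\frac{1}{2}\log_{2}(1+P_{j}/(N_{j}+\sigma_{j}^{2}))$ otherwise) is the scalar Gaussian AVC behavior described in the text; the function names and the fixed power split are assumptions made for illustration only.

```python
import math

def joint_coding_rate(P, N, sigma2):
    """Sum of (1/2) log2(1 + P_j / (N_j + sigma_j^2)) over the d channels."""
    return sum(0.5 * math.log2(1.0 + p / (n + s))
               for p, n, s in zip(P, N, sigma2))

def scalar_avc_capacity(p, n, s):
    """Scalar Gaussian AVC: zero unless input power exceeds jamming power."""
    return 0.5 * math.log2(1.0 + p / (n + s)) if p > n else 0.0

def independent_pairs_rate(P, N, sigma2):
    """Rate of d separate encoder-decoder pairs, one per channel."""
    return sum(scalar_avc_capacity(p, n, s) for p, n, s in zip(P, N, sigma2))

if __name__ == "__main__":
    P, N, sigma2 = [2.0, 2.0, 1.0], [2.0, 1.0, 0.0], [1.0, 2.0, 4.0]
    print("joint:      ", joint_coding_rate(P, N, sigma2))
    print("independent:", independent_pairs_rate(P, N, sigma2))
```

In this example the first channel has $P_{1}=N_{1}$ , so a stand-alone scalar code on that channel contributes nothing, whereas the joint expression still counts its term. The comparison is for one fixed power split and is not a claim about the best possible scheme built from independent pairs.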

V Main Results – AVC with Colored Gaussian Noise

We consider an AVC with colored Gaussian noise, i.e.

[TABLE]

where $\mathbf{Z}$ is a zero mean stationary Gaussian process, with power spectral density $\Psi_{Z}(\omega)$ . Assume that the power spectral density is bounded and integrable. We denote the random code capacity and the deterministic code capacity of this channel by $\mathbb{C}^{\star}(\Psi_{Z})$ and $\mathbb{C}(\Psi_{Z})$ , respectively.

We show that the optimal power allocations of the user and the jammer are given by “double” water filling in the frequency domain. Define

[TABLE]

where $\beta\geq 0$ is chosen to satisfy

[TABLE]

Next, define

[TABLE]

where $\alpha\geq 0$ is chosen to satisfy

[TABLE]

Now, let

[TABLE]

*Theorem 16**.*

The random code capacity of the AVC with colored Gaussian noise is given by

[TABLE]

and the deterministic code capacity is given by

[TABLE]

The proof of Theorem 16 is given in Appendix N, combining our previous results on the AVC with fixed parameters and the AVGPC. Despite the common belief that the characterization for a channel with colored Gaussian noise easily follows from the results for the product channel setting, the analysis is more involved. While standard orthogonalization transforms the channel into an equivalent one with statistically independent noise instances, the noise in the transformed channel is not necessarily white. As the noise variance may change over time, we observe that the transformed channel is in fact an AVC with fixed parameters which represent the sequence of noise variances. Using Corollary 5 and Corollary 11, we obtain deterministic and random capacity formulas that are analogous to those of the AVGPC, and use Toeplitz matrix properties to express the formulas as integrals in the frequency domain.
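The passage from finite-blocklength formulas to frequency-domain integrals can be illustrated with a toy case. The sketch below assumes that the Toeplitz property in question is a Szegő-type limit, under which $\frac{1}{n}\ln\det K_{n}$ approaches $\frac{1}{2\pi}\int_{-\pi}^{\pi}\ln\Psi_{Z}(\omega)\,d\omega$ for the $n\times n$ covariance matrix $K_{n}$ . It uses a hypothetical MA(1) noise process $Z_{i}=W_{i}+\theta W_{i-1}$ with unit-variance white $W_{i}$ , whose covariance is tridiagonal Toeplitz and whose power spectral density is $1+\theta^{2}+2\theta\cos\omega$ . None of these choices come from the paper.

```python
import math

def ma1_logdet_per_sample(theta, n):
    """(1/n) ln det of the n x n covariance of Z_i = W_i + theta * W_{i-1}.

    The matrix is tridiagonal Toeplitz with diagonal a = 1 + theta^2 and
    off-diagonal b = theta; its pivots obey r_k = a - b^2 / r_{k-1}.
    """
    a, b = 1.0 + theta ** 2, theta
    pivot = a
    total = math.log(pivot)
    for _ in range(n - 1):
        pivot = a - b * b / pivot
        total += math.log(pivot)
    return total / n

def ma1_psd(theta, omega):
    """Power spectral density of the MA(1) process at frequency omega."""
    return 1.0 + theta ** 2 + 2.0 * theta * math.cos(omega)

def spectral_log_integral(theta, grid=4096):
    """(1/2pi) * integral over [-pi, pi] of ln Psi(omega), midpoint rule."""
    h = 2.0 * math.pi / grid
    acc = sum(math.log(ma1_psd(theta, -math.pi + (k + 0.5) * h))
              for k in range(grid))
    return acc * h / (2.0 * math.pi)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, ma1_logdet_per_sample(0.5, n), spectral_log_integral(0.5))
```

For $|\theta|<1$ the integral is zero, and the normalized log-determinant decays toward it at rate $O(1/n)$ , which is the kind of convergence that lets a sum over transformed channel uses be written as an integral over $\omega$ .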

The optimal power allocation has a water filling analogy in the frequency domain (see e.g. [27, Section 9.5]), where the jammer pours water of volume $\Lambda$ on top of the power spectral density $\Psi_{Z}(\omega)$ , and then the encoder pours more water of volume $\Omega$ . The jammer brings the water level to $\beta$ , and then the encoder brings the water level to $\alpha$ . The process is illustrated in Figure 2.
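The water filling picture can be reproduced on a discretized frequency axis. The sketch below is an illustration under stated assumptions rather than the paper's formulas: it takes the jammer's allocation to be $[\beta-\Psi_{Z}(\omega)]_{+}$ and the encoder's to be $[\alpha-(\text{jammer}+\Psi_{Z}(\omega))]_{+}$ , normalizes both constraints as $\frac{1}{2\pi}\int_{-\pi}^{\pi}(\cdot)\,d\omega$ , and assumes that the deterministic code capacity follows the same $\Omega>\Lambda$ threshold as in Theorem 15. The grid size, the function names, and the example spectrum are hypothetical.

```python
import math

def water_level(floor, avg_volume):
    """Exact level w with mean(max(w - f, 0)) = avg_volume, via sorting."""
    f = sorted(floor)
    m = len(f)
    total = avg_volume * m
    prefix = 0.0
    for k in range(1, m + 1):
        prefix += f[k - 1]
        level = (total + prefix) / k       # level if only k lowest bins fill
        if k == m or level <= f[k]:
            return level

def colored_capacities(psd, input_power, jam_power, grid=2048):
    """Return (random_capacity, deterministic_capacity, a, b) in bits.

    psd must be strictly positive on [-pi, pi]; a and b are the encoder
    and jammer allocations sampled on the midpoint grid.
    """
    step = 2.0 * math.pi / grid
    psi = [psd(-math.pi + (k + 0.5) * step) for k in range(grid)]
    beta = water_level(psi, jam_power)              # jammer's level
    b = [max(beta - p, 0.0) for p in psi]
    floor = [bi + p for bi, p in zip(b, psi)]
    alpha = water_level(floor, input_power)         # encoder's level
    a = [max(alpha - f, 0.0) for f in floor]
    c_random = sum(0.5 * math.log2(1.0 + ai / fi)
                   for ai, fi in zip(a, floor)) / grid
    c_det = c_random if input_power > jam_power else 0.0
    return c_random, c_det, a, b

if __name__ == "__main__":
    flat = lambda w: 1.0
    print(colored_capacities(flat, 2.0, 1.0)[:2])
    colored = lambda w: 1.25 + math.cos(w)
    print(colored_capacities(colored, 1.0, 0.5)[:2])
```

With a flat spectrum the routine reduces to the scalar formula $\frac{1}{2}\log_{2}(1+\Omega/(\Lambda+\sigma^{2}))$ , which is a convenient sanity check before trying a non-flat $\Psi_{Z}$ .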

Appendix A Proof of Theorem 1

Consider the compound channel $\mathcal{W}^{\mathcal{Q}}$ with fixed parameters under input constraint $\Omega$ and state constraint $\Lambda$ .

A-A Achievability Proof

To show achievability, we construct a code based on conditional typicality decoding with respect to a channel state type, which is “close” to one of the state distributions in $\mathcal{Q}$ .

Denote the type of the parameter sequence by $P_{T}=\hat{P}_{\theta^{n}}$ . Define a set $\hat{\mathcal{Q}}_{n}$ of conditional state types,

[TABLE]

with $(P_{T}\times q)(t,s)=P_{T}(t)q(s|t)$ , and

[TABLE]

where $\delta>0$ is arbitrarily small. In words, $\hat{\mathcal{Q}}_{n}$ is the set of conditional types $q^{\prime}(s|t)$ , given a parameter sequence $\theta^{n}$ , such that the joint type is $\delta_{1}$ -close to $P_{T}(t)q(s|t)$ , for some conditional state distribution $q(s|t)$ in $\mathcal{Q}$ . We note that the sets $\mathcal{Q}$ and $\hat{\mathcal{Q}}_{n}$ could be disjoint, since $\mathcal{Q}$ is not limited to conditional empirical distributions. Nevertheless, for a fixed $\delta>0$ and sufficiently large $n$ , every $q\in\mathcal{Q}$ can be approximated by some $q^{\prime}\in\hat{\mathcal{Q}}_{n}$ . Indeed, for sufficiently large $n$ , there exists a joint type $P^{\prime}_{T}(t)q^{\prime}(s|t)$ such that $|P^{\prime}_{T}(t)q^{\prime}(s|t)-P_{T}(t)q(s|t)|\leq\delta_{1}/|\mathcal{S}|$ , hence $|P^{\prime}_{T}(t)-P_{T}(t)|\leq\delta_{1}$ and $|P_{T}(t)q^{\prime}(s|t)-P_{T}(t)q(s|t)|\leq\delta_{1}q^{\prime}(s|t)\leq\delta_{1}$ . Now, a code is constructed as follows.

Codebook Generation: Fix $P_{X|T}$ such that $\mathbb{E}\phi(X)\leq\Omega-\varepsilon$ , where

[TABLE]

Generate $2^{nR}$ independent sequences at random, $x^{n}(m,\theta^{n})\sim\prod_{i=1}^{n}P_{X|T}(x_{i}|\theta_{i})$ , for $m\in[1:2^{nR}]$ .

Encoding: To send a message $m$ , if $\phi^{n}(x^{n}(m,\theta^{n}))\leq\Omega$ , transmit $x^{n}(m,\theta^{n})$ . Otherwise, transmit an idle sequence $x^{n}=(a,a,\ldots,a)$ with $\phi(a)=0$ .

Decoding: Find a unique $\hat{m}\in[1:2^{nR}]$ for which there exists $q\in\hat{\mathcal{Q}}_{n}$ such that $(\theta^{n},x^{n}(\hat{m},\theta^{n}),y^{n})\in\mathcal{A}^{(n)}_{\delta}(P_{T}P^{q}_{X,Y|T})$ , where

[TABLE]

If there is none, or more than one such $\hat{m}$ , declare an error. We note that using the set of types $\hat{\mathcal{Q}}_{n}$ instead of the original set of state distributions $\mathcal{Q}$ simplifies the analysis, since $\mathcal{Q}$ is not necessarily finite nor countable.
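The codebook generation and encoding steps can be sketched in a few lines. This is only an illustration: the dictionary representation of $P_{X|T}$ , the function names, and the toy binary alphabet are assumptions, and $\phi^{n}$ is taken to be the per-symbol average of $\phi$ . The typicality decoder is omitted.

```python
import random

def generate_codebook(theta, p_x_given_t, num_messages, rng):
    """Draw x^n(m, theta^n) with independent symbols x_i ~ P_{X|T}(. | theta_i)."""
    codebook = []
    for _ in range(num_messages):
        word = [rng.choices(list(p_x_given_t[t].keys()),
                            weights=list(p_x_given_t[t].values()))[0]
                for t in theta]
        codebook.append(word)
    return codebook

def normalized_cost(x, cost):
    """phi^n(x^n), assumed here to be the average of phi over the block."""
    return sum(cost(xi) for xi in x) / len(x)

def encode(m, codebook, cost, budget, idle_symbol):
    """Send the codeword if it meets the input constraint, else an idle word."""
    x = codebook[m]
    if normalized_cost(x, cost) <= budget:
        return list(x)
    return [idle_symbol] * len(x)

if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0, 1] * 10                        # hypothetical parameter sequence
    p_x_given_t = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}}
    book = generate_codebook(theta, p_x_given_t, num_messages=16, rng=rng)
    sent = [encode(m, book, cost=float, budget=0.4, idle_symbol=0)
            for m in range(16)]
    print(sum(1 for m in range(16) if sent[m] != book[m]), "idle words sent")
```

Replacing over-budget codewords by the idle sequence is what guarantees that every transmitted word satisfies the input constraint, at the price of an error event whose probability is handled by the first term of the analysis.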

Analysis of Probability of Error: Assume without loss of generality that the user sent $M=1$ . By the union of events bound, we have that $\Pr\left(\hat{M}\neq 1\right)\leq\Pr\left(\mathcal{E}_{1}\right)+\Pr\left(\mathcal{E}_{2}\mid\mathcal{E}_{1}^{c}\right)+\Pr\left(\mathcal{E}_{3}\mid\mathcal{E}_{1}^{c}\right)$ , where

[TABLE]

The first term tends to zero exponentially by the law of large numbers and Chernoff’s bound (see e.g. [67, Theorem 1.2]). Now, suppose that the event $\mathcal{E}_{1}^{c}$ occurs. Then, for sufficiently small $\delta$ , we have that $\phi^{n}(X^{n}(1,\theta^{n}))\leq\Omega$ , since $\mathbb{E}\phi(X)\leq\Omega-\varepsilon$ . Hence, $X^{n}(1,\theta^{n})$ is the channel input.

Next, we claim that the second error event implies that $(\theta^{n},X^{n}(1,\theta^{n}),Y^{n})\notin\mathcal{A}^{(n)}_{\nicefrac{{\delta}}{{2}}}(P_{T}P_{X|T}P^{q}_{Y|X,T})$ , where $q(s|t)$ is the actual state distribution chosen by the jammer. Assume to the contrary that $\mathcal{E}_{2}$ holds, but $(\theta^{n},X^{n}(1,\theta^{n}),Y^{n})\in\mathcal{A}^{(n)}_{\nicefrac{{\delta}}{{2}}}(P_{T}P_{X|T}P^{q}_{Y|X,T})$ . For sufficiently large $n$ , there exists a conditional type $q^{\prime}\in\hat{\mathcal{Q}}_{n}$ that approximates $q$ in the sense that $|P_{T}(t)q^{\prime}(s|t)-P_{T}(t)q(s|t)|\leq\delta_{1}$ for all $s\in\mathcal{S}$ and $t\in\mathcal{T}$ , hence

[TABLE]

for all $x\in\mathcal{X}$ , $t\in\mathcal{T}$ , $y\in\mathcal{Y}$ (see (100)-(102)). To show $\delta$ -typicality with respect to $q^{\prime}(s|t)$ , we observe that

[TABLE]

where the first inequality is due to the triangle inequality, and the second inequality follows from (104) and the assumption that $(\theta^{n},X^{n}(1,\theta^{n}),Y^{n})\in\mathcal{A}^{(n)}_{\nicefrac{{\delta}}{{2}}}(P_{T}P_{X|T}P^{q}_{Y|X,T})$ . It follows that $(\theta^{n},X^{n}(1,\theta^{n}),Y^{n})\in\mathcal{A}^{(n)}_{\delta}(P_{T}P_{X|T}P^{q^{\prime}}_{Y|X,T})$ , and $\mathcal{E}_{2}$ does not hold. Thus,

[TABLE]

This tends to zero exponentially as $n\rightarrow\infty$ by the law of large numbers and Chernoff’s bound (see e.g. [67, Theorem 1.2]).

Moving to the third error event, as the number of type classes in $\mathcal{S}^{n}$ is bounded by $(n+1)^{|\mathcal{S}|}$ , we have that

[TABLE]

For every $m\neq 1$ , $X^{n}(m,\theta^{n})$ is independent of $Y^{n}$ , hence

[TABLE]

Let $(\theta^{n},x^{n},y^{n})\in\mathcal{A}^{(n)}_{\delta}(P_{T}P_{X|T}P^{q^{\prime}}_{Y|X,T})$ . Then, $\,(\theta^{n},y^{n})\in\mathcal{A}^{(n)}_{\delta_{2}}(P_{T}P_{Y|T}^{q^{\prime}})$ with $\delta_{2}\triangleq|\mathcal{X}|\cdot\delta$ . By Lemmas 2.6-2.7 in [29],

[TABLE]

where $\varepsilon_{1}(\delta)\rightarrow 0$ as $\delta\rightarrow 0$ . Therefore, by (107)-(109),

[TABLE]

with $\varepsilon_{2}(\delta)\rightarrow 0$ as $\delta\rightarrow 0$ , where the last inequality is due to [29, Lemma 2.13]. The RHS of (110) tends to zero exponentially as $n\rightarrow\infty$ , provided that $R<I_{q^{\prime}}(X;Y|T)-\varepsilon_{2}(\delta)$ . The probability of error, averaged over the class of codebooks, exponentially decays to zero as $n\rightarrow\infty$ . Therefore, there must exist a $(2^{nR},n,e^{-an})$ deterministic code, for a sufficiently large $n$ . This completes the proof of the direct part.
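The union bound over types works because the number of types grows only polynomially in $n$ , while each term decays exponentially. A minimal numerical sketch follows; the name `gap` is a hypothetical stand-in for the positive margin $I_{q^{\prime}}(X;Y|T)-\varepsilon_{2}(\delta)-R$ , and the alphabet size is arbitrary.

```python
import math

def num_types(n, alphabet_size):
    """Exact number of types with denominator n: C(n + k - 1, k - 1)."""
    return math.comb(n + alphabet_size - 1, alphabet_size - 1)

def union_bound_exponent(n, alphabet_size, gap):
    """(1/n) log2 of (n + 1)^k * 2^(-n * gap); negative means decay."""
    return alphabet_size * math.log2(n + 1) / n - gap

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, num_types(n, 3), (n + 1) ** 3,
              union_bound_exponent(n, 3, gap=0.05))
```

The exact count never exceeds $(n+1)^{|\mathcal{S}|}$ , and the normalized exponent becomes negative once $n$ is large enough, however small the margin.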

A-B Converse Proof

Since the deterministic code capacity is always upper bounded by the random code capacity, we consider a sequence of $(2^{nR},n,\alpha_{n})$ random codes, where $\alpha_{n}\rightarrow 0$ as $n\rightarrow\infty$ . Then, let $X^{n}=f_{\gamma}^{n}(M,\theta^{n})$ be the channel input sequence, and $Y^{n}$ be the corresponding output sequence, where $\gamma\in\Gamma$ is the random element shared between the encoder and the decoder. For every $q\in\mathcal{Q}$ , we have by Fano’s inequality that $H_{q}(M|Y^{n},T^{n}=\theta^{n},\gamma)\leq n\varepsilon_{n}$ , hence

[TABLE]

where $\varepsilon_{n}\rightarrow 0$ as $n\rightarrow\infty$ . The third equality holds since $X^{n}$ is a deterministic function of $(M,\gamma,\theta^{n})$ , and the last equality since $(M,\gamma)\mathrel{-\!\!\circ\!\!-}(X^{n},T^{n})\mathrel{-\!\!\circ\!\!-}Y^{n}$ form a Markov chain. It follows that

[TABLE]

for all $q\in\mathcal{Q}$ , with $X\equiv X_{K}$ , $Y\equiv Y_{K}$ , $T\equiv T_{K}=\theta_{K}$ , where the random variable $K$ is uniformly distributed over $[1:n]$ , and $\varepsilon_{n}\rightarrow 0$ as $n\rightarrow\infty$ . Observe that the random variable $T$ is distributed according to

[TABLE]

where $N(t|\theta^{n})$ is the number of occurrences of the symbol $t\in\mathcal{T}$ in the sequence $\theta^{n}$ . Since $K\mathrel{-\!\!\circ\!\!-}(T,X)\mathrel{-\!\!\circ\!\!-}Y$ form a Markov chain, we have that

[TABLE]

∎

Appendix B Proof of Lemma 2

We state the proof of our modified version of Ahlswede’s RT [6]. The proof follows the lines of [6, Subsection IV-B], which we modify here to include a constraint on the family of state distributions $q(s)$ and the parameter sequence $\theta^{n}$ . Let $\widetilde{s}^{\;n}\in\mathcal{S}^{n}$ be such that $l^{n}(\widetilde{s}^{\;n})\leq\Lambda$ . Denote the conditional type of $\widetilde{s}^{\;n}\in\mathcal{S}^{n}$ given $\theta^{n}$ by $\widehat{q}(s|t)$ . Observe that $\widehat{q}\in\overline{\mathcal{P}}_{\Lambda}(\mathcal{S}|\theta^{\infty})$ (see (9)), since $\frac{1}{n}\sum_{i=1}^{n}\sum_{s\in\mathcal{S}}\widehat{q}(s|\theta_{i})l(s)=l^{n}(\widetilde{s}^{\;n})$ .

Given a permutation $\pi\in\Pi(\theta^{n})$ ,

[TABLE]

where the first equality holds since $\pi$ is a bijection, the second equality holds since $\pi\theta^{n}=\theta^{n}$ for every $\pi\in\Pi(\theta^{n})$ , and the last equality holds due to the product form of the conditional distribution $q^{n}(s^{n}|t^{n})=\prod_{i=1}^{n}q(s_{i}|t_{i})$ . Hence, taking $q=\widehat{q}$ ,

[TABLE]

and by (17),

[TABLE]

Thus,

[TABLE]

As the expression in the square brackets is identical for all sequences $s^{n}$ of conditional type $\widehat{q}$ , we have that

[TABLE]

The second sum is the probability of the conditional type class of $\widehat{q}$ , hence

[TABLE]

by [27, Theorem 11.1.4]. The proof follows from (119) and (120). ∎
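The role of the restricted permutation set $\Pi(\theta^{n})$ can be checked by brute force on a short sequence. The sketch below enumerates the permutations that leave a hypothetical parameter sequence unchanged and confirms that applying them to a state sequence sweeps out sequences of one and the same conditional type. The helper names and the toy sequences are illustrative assumptions.

```python
import itertools
from collections import Counter

def parameter_preserving_permutations(theta):
    """All permutations pi of the positions with (pi theta)_i = theta_i."""
    n = len(theta)
    return [pi for pi in itertools.permutations(range(n))
            if all(theta[pi[i]] == theta[i] for i in range(n))]

def permute(pi, seq):
    """Apply a position permutation to a sequence."""
    return tuple(seq[pi[i]] for i in range(len(seq)))

def conditional_type(s, theta):
    """Empirical conditional distribution of s_i given theta_i."""
    pair_counts = Counter(zip(theta, s))
    param_counts = Counter(theta)
    return {(t, sym): c / param_counts[t]
            for (t, sym), c in pair_counts.items()}

if __name__ == "__main__":
    theta = (0, 0, 1, 1, 1)
    s = ("a", "b", "a", "a", "b")
    perms = parameter_preserving_permutations(theta)
    orbit = {permute(pi, s) for pi in perms}
    print(len(perms), "permutations,", len(orbit), "distinct sequences")
    print(all(conditional_type(x, theta) == conditional_type(s, theta)
              for x in orbit))
```

For a parameter sequence with $N(t|\theta^{n})$ occurrences of each symbol, the set has $\prod_{t}N(t|\theta^{n})!$ elements, and the orbit of a state sequence under it is its conditional type class. This is why averaging over $\Pi(\theta^{n})$ behaves like averaging over that class.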

Appendix C Proof of Theorem 3

Consider the AVC $\mathcal{W}$ with fixed parameters under input constraint $\Omega$ and state constraint $\Lambda$ .

C-A Achievability Proof

To prove the random code capacity theorem for the AVC with fixed parameters, we use our result on the compound channel along with our modified Robustification Technique (RT), i.e. Lemma 2.

Let $R<\mathsf{C}^{\star}$ . At first, we consider the compound channel under input constraint $\Omega$ , with $\mathcal{Q}=\overline{\mathcal{P}}_{\Lambda}(\mathcal{S}|\theta^{\infty})$ . According to Lemma 1, for some $\delta>0$ and sufficiently large $n$ , there exists a $(2^{nR},n)$ code $\mathscr{C}=(\mathrm{f}(m,\theta^{n}),g(y^{n},\theta^{n}))$ for the compound channel $\mathcal{W}^{\overline{\mathcal{P}}_{\Lambda}(\mathcal{S}|\theta^{\infty})}$ with fixed parameters such that

[TABLE]

and

[TABLE]

for all product state distributions $q(s^{n}|\theta^{n})=\prod_{i=1}^{n}q(s_{i}|\theta_{i})$ , with $q\in\overline{\mathcal{P}}_{\Lambda}(\mathcal{S}|\theta^{\infty})$ .

Therefore, by Lemma 2, taking $h_{0}(s^{n},\theta^{n})=P_{e}^{(n)}(\mathscr{C}|s^{n},\theta^{n})$ and $\alpha_{n}=e^{-2\delta n}$ , we have that for a sufficiently large $n$ ,

[TABLE]

for all $s^{n}\in\mathcal{S}^{n}$ with $l^{n}(s^{n})\leq\Lambda$ , where the sum is over the set of all $n$ -tuple permutations such that $\pi\theta^{n}=\theta^{n}$ .

On the other hand, for every $\pi\in\Pi(\theta^{n})$ ,

[TABLE]

where $(a)$ is obtained by plugging $\pi s^{n}$ in (11a); in $(b)$ we substitute $\pi y^{n}$ instead of $y^{n}$ ; and $(c)$ holds because the channel is memoryless. Since $\pi\theta^{n}=\theta^{n}$ for every $\pi\in\Pi(\theta^{n})$ , it follows that

[TABLE]

Then, consider the $(2^{nR},n)$ random code $\mathscr{C}^{\Pi(\theta^{n})}$ , specified by

[TABLE]

with a uniform distribution $\mu(\pi)=\frac{1}{|\Pi(\theta^{n})|}$ for $\pi\in\Pi(\theta^{n})$ . As the input cost is additive (see (6)), the permutation does not affect the costs of the codewords, hence the random code satisfies the input constraint $\Omega$ . From (125), we see that $P_{e}^{(n)}(\mathscr{C}^{\Pi(\theta^{n})}|s^{n},\theta^{n})=\sum_{\pi\in\Pi(\theta^{n})}\mu(\pi)\cdot P_{e}^{(n)}(\mathscr{C}|\pi s^{n},\theta^{n})$ , for all $s^{n}\in\mathcal{S}^{n}$ with $l^{n}(s^{n})\leq\Lambda$ . Therefore, together with (123), we have that the probability of error of the random code $\mathscr{C}^{\Pi(\theta^{n})}$ is bounded by $P_{e}^{(n)}(q,\theta^{n},\mathscr{C}^{\Pi(\theta^{n})})\leq e^{-\delta n}$ , for every $q(s^{n}|\theta^{n})\in\mathcal{P}_{\Lambda}(\mathcal{S}^{n}|\theta^{n})$ . It follows that $\mathscr{C}^{\Pi(\theta^{n})}$ is a $(2^{nR},n,e^{-\delta n})$ random code for the AVC $\mathcal{W}$ with fixed parameters under input constraint $\Omega$ and state constraint $\Lambda$ . ∎

C-B Converse Proof

Assume to the contrary that there exists an achievable rate

[TABLE]

using random codes over the AVC $\mathcal{W}$ under input constraint $\Omega$ and state constraint $\Lambda$ , where $\delta>0$ is arbitrarily small. That is, for every $\varepsilon>0$ and sufficiently large $n$ , there exists a $(2^{nR},n)$ random code $\mathscr{C}^{\Gamma}=(\mu,\Gamma,\{\mathscr{C}_{\gamma}\}_{\gamma\in\Gamma})$ for the AVC $\mathcal{W}$ , such that $\sum_{\gamma\in\Gamma}\mu(\gamma)\phi^{n}(\mathrm{f}_{\gamma}(m,\theta^{n}))\leq\Omega$ , and

[TABLE]

for all $m\in[1:2^{nR}]$ and $q(s^{n}|\theta^{n})\in\mathcal{P}_{\Lambda}(\mathcal{S}^{n}|\theta^{n})$ . In particular, for distributions $q(\cdot|\theta^{n})$ that give mass $1$ to some sequence $s^{n}\in\mathcal{S}^{n}$ with $l^{n}(s^{n})\leq\Lambda$ , we have that $P_{e}^{(n)}(\mathscr{C}^{\Gamma}|s^{n},\theta^{n})\leq\varepsilon$ .

Consider using the random code $\mathscr{C}^{\Gamma}$ over the compound channel $\mathcal{W}^{\overline{\mathcal{P}}_{\Lambda-\delta}(\mathcal{S})}$ with fixed parameters under input constraint $\Omega$ . Let $\overline{q}(s|t)\in\overline{\mathcal{P}}_{\Lambda-\delta}(\mathcal{S})$ be a given state distribution. Then, define a sequence of conditionally independent random variables $\overline{S}_{1},\ldots,\overline{S}_{n}\sim\overline{q}(s|t)$ . Letting $\overline{q}^{n}(s^{n}|\theta^{n})\triangleq\prod_{i=1}^{n}\overline{q}(s_{i}|\theta_{i})$ , the probability of error is bounded by

[TABLE]

The first sum is bounded by (128), and the second term vanishes by the law of large numbers, since $\overline{q}\in\overline{\mathcal{P}}_{\Lambda-\delta}(\mathcal{S}|\theta^{\infty})$ . It follows that the random code $\mathscr{C}^{\Gamma}$ achieves a rate $R$ as in (127) over the compound channel $\mathcal{W}^{\overline{\mathcal{P}}_{\Lambda-\delta}(\mathcal{S})}$ with fixed parameters under input constraint $\Omega$ , for an arbitrarily small $\delta>0$ , in contradiction to Lemma 1. We deduce that the assumption is false, and $\mathbb{C}^{\star}(\mathcal{W})\leq\mathsf{C}(\mathcal{W}^{\mathcal{Q}})\big|_{\mathcal{Q}=\overline{\mathcal{P}}_{\Lambda}(\mathcal{S}|\theta^{\infty})}=\mathsf{C}_{n}^{\star}(\mathcal{W})$ . ∎

Appendix D Proof of Lemma 4

To prove that $\mathsf{R}_{n}^{\star}(\mathcal{W})=\mathsf{C}_{n}^{\star}(\mathcal{W})$ , we begin with the property in the lemma below.

*Lemma 17**.*

Let $\omega_{i}^{*}$ , $\lambda_{i}^{*}$ , $i\in[1:n]$ , be the parameters that achieve the saddle point in (21), i.e.

[TABLE]

Then, for every $i,j\in[1:n]$ such that $\theta_{i}=\theta_{j}$ , we have that $\omega_{i}^{*}=\omega_{j}^{*}$ and $\lambda_{i}^{*}=\lambda_{j}^{*}$ .

Proof of Lemma 17.

For every $i\in[1:n]$ , let $p_{i},q_{i}$ denote input and state distributions such that $\mathbb{E}\phi(X_{i})\leq\omega_{i}^{*}$ , $\mathbb{E}l(S_{i})\leq\lambda_{i}^{*}$ for $X_{i}\sim p_{i}$ , $S_{i}\sim q_{i}$ . Now, suppose that $\theta_{i}=\theta_{j}=t$ , and define

[TABLE]

Then, $\mathbb{E}\phi(X^{\prime})=\frac{1}{2}[\mathbb{E}\phi(X_{i})+\mathbb{E}\phi(X_{j})]$ and $\mathbb{E}l(S^{\prime})=\frac{1}{2}[\mathbb{E}l(S_{i})+\mathbb{E}l(S_{j})]$ for $X^{\prime}\sim p^{\prime}$ , $S^{\prime}\sim q^{\prime}$ . Furthermore, since the mutual information is concave- $\cap$ in the input distribution and convex- $\cup$ in the state distribution, we have that

[TABLE]

Therefore, the saddle point distributions must satisfy $p_{i}=p_{j}=p^{\prime}$ and $q_{i}=q_{j}=q^{\prime}$ , hence $\omega_{i}^{*}=\omega_{j}^{*}$ and $\lambda_{i}^{*}=\lambda_{j}^{*}$ . ∎
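The averaging argument rests on two standard properties: mutual information is concave in the input distribution and convex in the state distribution, since the averaged channel is linear in $q$ . The sketch below checks both inequalities numerically on a hypothetical binary channel $Y=X\oplus S$ observed through a binary symmetric channel with crossover $0.1$ . The channel, the distributions, and the function names are all illustrative assumptions.

```python
import math

def mutual_information(p, q, W):
    """I(X;Y) in bits for X ~ p and S ~ q independent, channel W[x][s][y]."""
    nx, ns, ny = len(p), len(q), len(W[0][0])
    # Averaged channel V(y|x) = sum_s q(s) W(y|x,s), linear in q.
    V = [[sum(q[s] * W[x][s][y] for s in range(ns)) for y in range(ny)]
         for x in range(nx)]
    py = [sum(p[x] * V[x][y] for x in range(nx)) for y in range(ny)]
    total = 0.0
    for x in range(nx):
        for y in range(ny):
            if p[x] > 0 and V[x][y] > 0:
                total += p[x] * V[x][y] * math.log2(V[x][y] / py[y])
    return total

def mix(u, v):
    """Equal-weight mixture of two distributions, as in the proof."""
    return [0.5 * (a + b) for a, b in zip(u, v)]

def xor_channel(eps):
    """W[x][s][y]: y equals x XOR s, flipped with probability eps."""
    return [[[1 - eps, eps] if (x ^ s) == 0 else [eps, 1 - eps]
             for s in (0, 1)] for x in (0, 1)]

if __name__ == "__main__":
    W = xor_channel(0.1)
    p_i, p_j, q = [0.9, 0.1], [0.3, 0.7], [0.8, 0.2]
    print(mutual_information(mix(p_i, p_j), q, W),
          0.5 * (mutual_information(p_i, q, W) + mutual_information(p_j, q, W)))
    q_i, q_j, p = [0.9, 0.1], [0.4, 0.6], [0.5, 0.5]
    print(mutual_information(p, mix(q_i, q_j), W),
          0.5 * (mutual_information(p, q_i, W) + mutual_information(p, q_j, W)))
```

In the first printed pair the mixed input does at least as well as the average of the two inputs; in the second, the mixed state does at most as well as the average of the two states. Together these are the two inequalities that let the saddle point be taken with equal distributions at positions sharing the same parameter value.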

Next, it can be inferred from Lemma 17 that

[TABLE]

where $P_{T}$ is the type of the parameter sequence $\theta^{n}$ . The second equality follows from the definition of $\mathsf{C}_{t}^{\star}(\omega_{t},\lambda_{t})$ in (20), using the minimax theorem [96] to switch between the order of the minimum and maximum. In the third line, we eliminate the slack variables $\lambda_{i}$ and $\omega_{i}$ replacing $\mathbb{E}_{q}l(S_{i})$ and $\mathbb{E}\phi(X_{i})$ , respectively. The last equality holds by the definition of $\mathsf{C}_{n}^{\star}(\mathcal{W})$ in (16). ∎

Appendix E Proof of Lemma 8

Consider the AVC $\mathcal{W}$ with fixed parameters under input constraint $\Omega$ and state constraint $\Lambda$ . Let $\theta^{n}$ be a sequence of fixed parameters for a given blocklength. Recall that $T$ is a random variable that is distributed as the type of $\theta^{n}$ . We extend the proof in [30]. First, we give an auxiliary lemma, which we also used in [85].

*Lemma 18** *(see [30], [85, Lemma 11]).*

For every pair of conditional state distributions $Q(s|x,t)$ and $Q^{\prime}(s|x,t)$ such that

[TABLE]

there exists $\xi>0$ such that

[TABLE]

Proof of Lemma 18.

Assume to the contrary that the LHS in (135) is zero, and define

[TABLE]

Using the symmetry between $Q$ and $Q^{\prime}$ , we have that

[TABLE]

Since we have assumed that $P_{T}(t)>\delta_{0}$ for all $t\in\mathcal{T}$ , it follows that

[TABLE]

for all $t\in\mathcal{T}$ , $x,\tilde{x}\in\mathcal{X}$ and $y\in\mathcal{Y}$ . In other words, $Q_{A}(\cdot|\cdot,t)$ symmetrizes the channel $W_{Y|X,S,T}(\cdot|\cdot,\cdot,t)$ for all $t\in\mathcal{T}$ . Therefore, by the definition of $\widetilde{\Lambda}_{n}(p)$ in (27), we have that

[TABLE]

in contradiction to (134). The equality above holds because $T$ is distributed as the type of the parameter sequence $\theta^{n}$ , hence averaging over time is the same as averaging according to $P_{T}$ . It follows that the LHS of (135) must be positive. This completes the proof of the auxiliary lemma. ∎

We move to the main part of the proof. To show that (37) holds for sufficiently small $\eta$ , assume to the contrary that $\mathcal{D}(m)\cap\mathcal{D}(\widetilde{m})\neq\emptyset$ , i.e. there exists $y^{n}$ such that $(y^{n},\theta^{n})\in\mathcal{D}(m)\cap\mathcal{D}(\widetilde{m})$ . By the assumption in the lemma, the codewords $\{\mathrm{f}(m,\theta^{n})\}_{m\in[1:2^{nR}]}$ have the same conditional type. In particular, $P_{\widetilde{X}|T}=P_{X|T}=p$ .

By Condition 1) of the decoding rule,

[TABLE]

and by Condition 2) of the decoding rule,

[TABLE]

where $T,X,\widetilde{X},S,Y$ are distributed according to the joint type of $\theta^{n}$ , $f^{n}(m,\theta^{n})$ , $f^{n}(\widetilde{m},\theta^{n})$ , $s^{n}$ , and $y^{n}$ . Adding (140) and (141) yields

[TABLE]

That is, $D(P_{T,X,\widetilde{X},S,Y}||P_{T}\times p\times p\times P_{S|\widetilde{X},T}\times W_{Y|X,S,T})\leq 2\eta$ . Therefore, by the log-sum inequality (see e.g. [27, Theorem 2.7.1]),

[TABLE]

where $V_{Y|X,\widetilde{X},T}(y|x,\tilde{x},t)=\sum_{s\in\mathcal{S}}W_{Y|X,S,T}(y|x,s,t)P_{S|\widetilde{X},T}(s|\tilde{x},t)$ . Then, by Pinsker’s inequality (see e.g. [29, Problem 3.18]),

[TABLE]

where $c>0$ is a constant. By the same arguments, (34) implies that

[TABLE]

where $V_{Y|X,\widetilde{X},T}^{\prime}(y|x,\tilde{x},t)=\sum_{s\in\mathcal{S}}W_{Y|X,S,T}(y|\tilde{x},s,t)P_{\widetilde{S}|X,T}(s|x,t)$ . Now, observe that inserting the sum over $t\in\mathcal{T}$ into the absolute value maintains the inequality, by the triangle inequality. Furthermore, since $p(x|t)>\delta_{1}$ , for $x\in\mathcal{X}$ , $t\in\mathcal{T}$ , we have that

[TABLE]

Equivalently, the above can be expressed as

[TABLE]

Now, we show that the state distributions $Q=P_{S|\widetilde{X},T}$ and $Q^{\prime}=P_{\widetilde{S}|X,T}$ satisfy the conditions of Lemma 18. Indeed,

[TABLE]

where the last inequality is due to (36). Thus, there exists $\xi>0$ such that (135) holds with $Q=P_{S|\widetilde{X},T}$ and $Q^{\prime}=P_{\widetilde{S}|X,T}$ , which contradicts (147), if $\eta$ is sufficiently small such that $\frac{2c\sqrt{2\eta}}{\delta^{2}}<\xi$ . ∎

Appendix F Proof of Lemma 9

Let $Z^{n}(m,\theta^{n})$ , $m\in[1:2^{nR}]$ , be statistically independent sequences, uniformly distributed over the conditional type class $\mathcal{T}^{n}(p)$ . Fix $a^{n}\in\mathcal{X}^{n}$ and $s^{n}\in\mathcal{S}^{n}$ , and consider a joint type $P_{T,X,\widetilde{X},S}$ , such that $P_{X|T}=P_{\widetilde{X}|T}=p$ . We intend to show that $\{Z^{n}(m,\theta^{n})\}$ satisfy each of the desired properties with double exponentially high probability $(1-e^{-2^{\mathsf{E}n}})$ , $\mathsf{E}>0$ , implying that there exists a deterministic codebook that satisfies (38)-(40) simultaneously. We begin with the following large deviations result by Csiszár and Narayan [30].

*Lemma 19** (see [30, Lemma A1]).*

Let $\alpha,\beta\in[0,1]$ , and consider a sequence of random vectors $U^{n}(m)$ , and functions $\varphi_{m}:\mathcal{X}^{nm}\rightarrow[0,1]$ , for $m\in[1:\mathsf{M}]$ . If

[TABLE]

then

[TABLE]

To show that (38) holds, consider the indicator

[TABLE]

By standard type class considerations (see e.g. [67, Theorem 1.3]), we have that

[TABLE]

where the last inequality holds since $I(\widetilde{X};T,X,S)\geq I(\widetilde{X};X,S|T)$ .

Next, we use Lemma 19, and plug

[TABLE]

For sufficiently large $n$ , we have that $\mathsf{M}(\beta-\alpha\log e)\geq 2^{n\varepsilon/2}$ . Hence, by Lemma 19,

[TABLE]

By the symmetry between $m$ and $\widetilde{m}$ in the derivation above, the double exponential decay of the probability in (154) implies that there exists a codebook that satisfies (38).

Similarly, to show (39), we replace the indicator of the type $P_{X,\widetilde{X},S|T}$ in (151) by an indicator of the type $P_{\widetilde{X},S|T}$ , and rewrite (152) with $I(\widetilde{X};S|T)$ , to obtain

[TABLE]

where $\varepsilon_{1}>0$ is arbitrarily small. If $I(\widetilde{X};S|T)>\varepsilon$ and $R\geq\varepsilon$ , then choosing $\varepsilon_{1}=\frac{\varepsilon}{2}$ , we have that

[TABLE]

hence,

[TABLE]

It remains to show that (40) holds. Assume that

[TABLE]

Let $\mathcal{J}_{m}$ denote the set of indices $\widetilde{m}<m$ such that $(\theta^{n},Z^{n}(\widetilde{m},\theta^{n}),s^{n})\in\mathcal{T}^{n}(P_{T,\widetilde{X},S})$ , provided that their number does not exceed $2^{n\left(\left[R-I(\widetilde{X};S|T)\right]_{+}+\frac{\varepsilon}{8}\right)}$ ; else, let $\mathcal{J}_{m}=\emptyset$ . Also, let

[TABLE]

Then, choosing $\varepsilon_{1}=\frac{\varepsilon}{8}$ in (155) yields

[TABLE]

Therefore, instead of bounding the set of messages, it is sufficient to consider the sum $\sum\psi_{m}(Z^{n}(1,\theta^{n}),\ldots,Z^{n}(m,\theta^{n}))$ . Furthermore, by standard type class considerations (see e.g. [67, Theorem 1.3]), we have that

[TABLE]

where the last inequality is due to (158). Thus, by Lemma 19,

[TABLE]

as we have assumed that $R\geq\varepsilon$ . Equations (160) and (162) imply that the property in (40) holds with double exponential probability $1-e^{-2^{\mathsf{E}_{1}n}}$ , where $\mathsf{E}_{1}>0$ . ∎

Appendix G Proof of Theorem 6

G-A Achievability Proof

Suppose that $L_{n}^{*}>\Lambda$ for sufficiently large $n$ . Let $\varepsilon>0$ be chosen later, and let $P_{X|T}$ be a conditional type over $\mathcal{X}$ , for which $P_{X|T}(x|t)>0$ $\forall x\in\mathcal{X}$ , $t\in\mathcal{T}$ , and $\mathbb{E}\phi(X)\leq\Omega$ , with

[TABLE]

As explained below, we may assume without loss of generality that for some $\delta_{0}>0$ that does not depend on $n$ , we have that $P_{T}(t)>\delta_{0}$ for all $t\in\mathcal{T}$ . Indeed, following our assumption in (25), the asymptotic capacity formula $\liminf\mathsf{C}_{n}(\mathcal{W})$ does not change when we remove parameter values $t\in\mathcal{T}$ such that $P_{T}(t)\rightarrow 0$ . Hence, coding can be limited to the rest of the block with negligible rate decrease, thus removing those parameters from consideration. Then, choose $\eta>0$ to be sufficiently small such that Lemma 8 guarantees that the decoder in Definition 5 is well defined. Now, Lemma 9 assures that there is a codebook $\{x^{n}(m,\theta^{n})\}_{m\in[1:2^{nR}]}$ of conditional type $p$ that satisfies (38)-(40). Consider the following coding scheme.

Encoding: To send $m\in[1:2^{nR}]$ , transmit $x^{n}(m,\theta^{n})$ .

Decoding: Find a unique message $\hat{m}$ such that $(y^{n},\theta^{n})$ belongs to $\mathcal{D}(\hat{m})$ , as in Definition 5. If there is none, declare an error. Lemma 8 guarantees that there cannot be two messages for which this holds.

Analysis of Probability of Error: Fix $s^{n}\in\mathcal{S}^{n}$ with $l^{n}(s^{n})\leq\Lambda$ , let $q=P_{S|T}$ denote the conditional type of $s^{n}$ given $\theta^{n}$ , and let $M$ denote the transmitted message. Consider the error events

[TABLE]

and

[TABLE]

where $(T,X,\widetilde{X},S)$ are dummy random variables, which are distributed as the joint type of $(\theta^{n},x^{n}(M,\theta^{n}),x^{n}(\widetilde{m},\theta^{n}),$ $s^{n})$ . By the union of events bound,

[TABLE]

where the conditioning on $S^{n}=s^{n}$ and $T^{n}=\theta^{n}$ is omitted for convenience of notation. Based on Lemma 9, the probabilities of the events $\mathcal{F}_{1}$ and $\mathcal{F}_{2}$ tend to zero as $n\rightarrow\infty$ , by (39) and (40), respectively.

Now, suppose that Condition 1) of the decoding rule is violated. Observe that the event $\mathcal{E}_{1}\cap\mathcal{F}_{1}^{c}$ implies that

[TABLE]

Then, by standard large deviations considerations (see e.g. [27, pp. 362–364]),

[TABLE]

which tends to zero as $n\rightarrow\infty$ , for sufficiently small $\varepsilon>0$ , with $\varepsilon<\frac{1}{2}\eta$ .

Moving to Condition 2) of the decoding rule, let $\mathcal{D}_{2}$ denote the set of joint types $P_{T,X,\widetilde{X},S}$ such that

[TABLE]

Then, by standard type class considerations (see e.g. [67, Theorem 1.3]),

[TABLE]

for every given $m\in[1:2^{nR}]$ . Hence, by (38),

[TABLE]

To further bound $\Pr\left(\mathcal{E}_{2}\cap\mathcal{F}_{2}^{c}\right)$ , consider the following cases. Suppose that $R\leq I_{q}(\widetilde{X};S|T)$ . Then, given $\mathcal{F}_{2}^{c}$ , we have that

[TABLE]

By (173), it then follows that

[TABLE]

Returning to (175), we note that since the number of types is polynomial in $n$ , the cardinality of the set of types $\mathcal{D}_{2}$ can be bounded by $2^{n\varepsilon}$ , for sufficiently large $n$ . Hence, by (175) and (177), we have that $\Pr\left(\mathcal{E}_{2}\cap\mathcal{F}_{2}^{c}\right)\leq 2^{-n(\eta-4\varepsilon)}$ , which tends to zero as $n\rightarrow\infty$ , for $\varepsilon<\frac{1}{4}\eta$ .

Otherwise, if $R>I_{q}(\widetilde{X};S|T)$ , then given $\mathcal{F}_{2}^{c}$ ,

[TABLE]

Thus,

[TABLE]

Hence, by (175) we have that

[TABLE]

For $P_{T,X,\widetilde{X},S}\in\mathcal{D}_{2}$ , we have by (172) that $P_{T,\widetilde{X},\widetilde{S},Y}$ is arbitrarily close to some $P_{T,X,\widetilde{S},\widetilde{Y}}$ , where

[TABLE]

if $\eta>0$ is sufficiently small. In that case,

[TABLE]

where $\delta>0$ is arbitrarily small. Therefore, provided that

[TABLE]

we have that $\Pr\left(\mathcal{E}_{2}\cap\mathcal{F}_{2}^{c}\right)\leq 2^{-n(I_{q}(\widetilde{X};Y|T)-R-4\varepsilon)}$ tends to zero as $n\rightarrow\infty$ . ∎

G-B Converse Proof

We will use the following lemma, based on the observations of Ericson [37].

*Lemma 20**.*

Consider the AVC with fixed parameters free of state constraints, and let $\mathscr{C}=(f,g)$ be a $(2^{nR},n)$ deterministic code. Suppose that the channels $W_{Y|X,S,T}(\cdot|\cdot,\cdot,\theta_{i})$ are symmetrizable for all $i\in[1:n]$ , and let $J_{t}(s|x)$ , $t\in\mathcal{T}$ , be a set of conditional state distributions that satisfy (4). If $R>0$ , then

[TABLE]

where $J_{\theta^{n}}(s^{n}|x^{n})=\prod_{i=1}^{n}J_{\theta_{i}}(s_{i}|x_{i})$ .

For completeness, we give the proof below.

Proof of Lemma 20.

Denote the codebook size by $\mathsf{M}=2^{nR}$ , and the codewords by $x^{n}(m,\theta^{n})=f^{n}(m,\theta^{n})$ .

Under the conditions of the lemma,

[TABLE]

where we have defined $W^{n}\equiv W_{Y^{n}|X^{n},S^{n},T^{n}}$ for short notation. By switching between the summation indices $m$ and $\widetilde{m}$ , we obtain

[TABLE]

Now, as the channel is memoryless,

[TABLE]

where the second equality is due to (4). Therefore,

[TABLE]

Assuming that the rate is positive, we have that $\mathsf{M}\geq 2$ , hence $P_{e}^{(n)}(\widetilde{q},\theta^{n},\mathscr{C})\geq\frac{1}{4}$ . ∎
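The pairing argument behind Lemma 20 can be checked exhaustively on a toy example. The sketch below is only an illustration under stated assumptions: it uses a hypothetical noiseless adder channel $Y_i=X_i+S_i$ (not a channel from this paper), which is symmetrized by the $0$-$1$ law $J(s|x)=1\{s=x\}$, and a jammer that transmits a uniformly chosen codeword. The function name `min_error_under_codeword_jamming` is introduced here for illustration. Since the pairs $(m,\widetilde{m})$ and $(\widetilde{m},m)$ produce the same output, every decoder errs on at least one of them, so the average error is at least $\frac{\mathsf{M}-1}{2\mathsf{M}}\geq\frac{1}{4}$.

```python
from fractions import Fraction
from itertools import product

def min_error_under_codeword_jamming(codebook):
    """Smallest average error over all decoders when S^n is a uniform codeword.

    Toy channel (an assumption for illustration): Y_i = X_i + S_i over the
    integers, which is symmetrized by the 0-1 law J(s|x) = 1{s = x}.
    """
    M = len(codebook)
    # Channel output for each (transmitted message, jammer's codeword index).
    output = {(m, mt): tuple(a + b for a, b in zip(codebook[m], codebook[mt]))
              for m in range(M) for mt in range(M)}
    outputs = sorted(set(output.values()))
    best = None
    # Enumerate every deterministic decoder on the reachable outputs.
    for decisions in product(range(M), repeat=len(outputs)):
        g = dict(zip(outputs, decisions))
        errors = sum(1 for (m, mt), y in output.items() if g[y] != m)
        p_err = Fraction(errors, M * M)
        best = p_err if best is None else min(best, p_err)
    return best

# Two hypothetical codebooks of blocklength 2.
p_min_two = min_error_under_codeword_jamming([(0, 0), (1, 1)])
p_min_three = min_error_under_codeword_jamming([(0, 0), (0, 1), (1, 0)])
```

In both toy codebooks the best decoder meets the counting bound $\frac{\mathsf{M}-1}{2\mathsf{M}}$ with equality, which is no smaller than $\frac{1}{4}$ whenever $\mathsf{M}\geq 2$.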

We are now in a position to prove the converse part of Theorem 6. Consider a sequence of $(2^{nR},n,\alpha_{n})$ deterministic codes $\mathscr{C}_{n}$ over the AVC with fixed parameters under input constraint $\Omega$ and state constraint $\Lambda$ , where $\alpha_{n}\rightarrow 0$ as $n\rightarrow\infty$ . In particular, the conditional probability of error given a state sequence $s^{n}$ is bounded by

[TABLE]

Let $X^{n}=\mathrm{f}(M,\theta^{n})$ be the channel input sequence, and let $Y^{n}$ be the corresponding output.

Consider using the same code over the compound channel with fixed parameters, i.e. where the jammer selects a state sequence at random according to a product distribution, $\overline{S}^{n}\sim\prod_{i=1}^{n}q(\overline{s}_{i}|\theta_{i})$ , under the average state constraint $\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_{q}l(S_{i})\leq\Lambda-\delta$ . Here, there is no state constraint with probability $1$ , as the jammer may select a sequence $\overline{S}^{n}$ with $l^{n}(\overline{S}^{n})>\Lambda$ . Yet, the probability of error is bounded by

[TABLE]

The first sum is bounded by (190), and the second term vanishes by the law of large numbers, since $\overline{q}\in\overline{\mathcal{P}}_{\Lambda-\delta}(\mathcal{S}|\theta^{\infty})$ . It follows that the code sequence of the constrained AVC achieves the same rate $R$ over the compound channel $W_{Y|X,\overline{S},T}$ . As in Appendix A, Fano’s inequality implies that for every jamming strategy $\overline{q}^{n}(s^{n}|\theta^{n})$ ,

[TABLE]

with $X\triangleq X_{K}$ , $T\equiv\theta_{K}$ , $Y\triangleq Y_{K}$ , where $K$ is uniformly distributed over $[1:n]$ . Hence, $T$ is distributed according to the type of the parameter sequence $\theta^{n}$ (see (113)).

Returning to the original AVC, suppose that $L_{n}^{*}>\Lambda$ . It remains to show that $R>0$ implies that $\widetilde{\Lambda}_{n}(P_{X|T})\geq\Lambda$ . If the channel $W_{Y|X,S,T}(\cdot|\cdot,\cdot,\theta_{i})$ is non-symmetrizable for some $i\in[1:n]$ , then $\widetilde{\Lambda}_{n}(P_{X|T})=+\infty$ , and there is nothing to show. Hence, consider the case where $W_{Y|X,S,T}(\cdot|\cdot,\cdot,\theta_{i})$ are symmetrizable for all $i\in[1:n]$ . Assume to the contrary that $R>0$ and $\widetilde{\Lambda}_{n}(P_{X|T})<\Lambda$ . Hence, there exist conditional state distributions $J_{\theta_{i}}(s|x)$ that symmetrize $W_{Y|X,S,T}(\cdot|\cdot,\cdot,\theta_{i})$ , such that

[TABLE]

Now, consider the following jamming strategy. First, the jammer selects a codeword $\widetilde{X}^{n}$ from the codebook uniformly at random. Then, the jammer selects a sequence $\widetilde{S}^{n}$ at random, according to the conditional distribution

[TABLE]

Finally, if $l^{n}(\widetilde{S}^{n})\leq\Lambda$ , the jammer chooses the state sequence to be $S^{n}=\widetilde{S}^{n}$ . Otherwise, the jammer chooses $S^{n}$ to be some sequence of zero cost. Such a jamming strategy satisfies the state constraint $\Lambda$ with probability $1$ .
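The three steps of this jamming strategy can be sketched in a few lines. This is a minimal illustration, assuming that the conditional distribution above is the per-letter product $\prod_{i}J_{\theta_{i}}(s_{i}|\widetilde{x}_{i})$ (as in Lemma 20) and that $l^{n}(s^{n})$ is the per-letter average cost. The function `constrained_jammer`, the binary alphabets, and the specific laws in `J` are hypothetical placeholders and are not claimed to symmetrize any particular channel.

```python
import random

def constrained_jammer(codebook, theta, J, cost, Lambda, zero_state, rng):
    """Draw a state sequence that meets the cost constraint with probability 1."""
    x_tilde = rng.choice(codebook)            # step 1: uniform codeword
    s_tilde = []
    for x_i, t_i in zip(x_tilde, theta):      # step 2: per-letter draw from J_t(.|x)
        law = J[t_i][x_i]                     # dict: state -> probability
        states = list(law)
        weights = [law[s] for s in states]
        s_tilde.append(rng.choices(states, weights=weights)[0])
    avg_cost = sum(cost(s) for s in s_tilde) / len(s_tilde)
    # Step 3: keep the draw if it is admissible, else fall back to zero cost.
    return s_tilde if avg_cost <= Lambda else [zero_state] * len(s_tilde)

# Hypothetical toy setup: binary inputs and states, two parameter values.
codebook = [(0, 1, 1, 0), (1, 1, 0, 0), (1, 0, 1, 1)]
theta = (0, 1, 0, 1)
J = {0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}},
     1: {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}}
cost = lambda s: float(s)
Lambda = 0.5
rng = random.Random(0)
draws = [constrained_jammer(codebook, theta, J, cost, Lambda, 0, rng)
         for _ in range(500)]
```

The fallback branch is what makes the constraint hold surely rather than only on average; the proof then argues that this branch is taken with small probability.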

To contradict our assumption that $\widetilde{\Lambda}_{n}(P_{X|T})<\Lambda$ , we first show that $\mathbb{E}l^{n}(\widetilde{S}^{n})=\widetilde{\Lambda}_{n}(P_{X|T})$ . Observe that for every $x^{n}\in\mathcal{X}^{n}$ ,

[TABLE]

Since $\widetilde{X}^{n}$ is distributed as $X^{n}$ , we obtain

[TABLE]

Thus, by Chebyshev’s inequality we have that for sufficiently large $n$ ,

[TABLE]

where $\delta_{0}>0$ is arbitrarily small. Now, on the one hand, the probability of error is bounded by

[TABLE]

where $\widetilde{q}(s^{n}|\theta^{n})$ is as defined in (185). On the other hand, the sequence $\widetilde{S}^{n}$ can be thought of as the state sequence of an AVC without a state constraint, hence, by Lemma 20,

[TABLE]

Thus, by (198)-(199), the probability of error is bounded by $P_{e}^{(n)}(q,\theta^{n},\mathscr{C}_{n})\geq\frac{1}{4}-\delta_{0}$ . As this cannot be the case for a code with vanishing probability of error, we deduce that the assumption is false, i.e. $R>0$ implies that $\widetilde{\Lambda}_{n}(P_{X|T})\geq\Lambda$ .

If $L_{n}^{*}<\Lambda$ , then $\widetilde{\Lambda}_{n}(P_{X|T})<\Lambda$ for all $P_{X|T}$ with $\mathbb{E}\phi(X)\leq\Omega$ , and a positive rate cannot be achieved. This completes the converse proof. ∎

Appendix H Proof of Corollary 7

Assume that the AVC $\mathcal{W}$ with fixed parameters satisfies the conditions of Corollary 7. Looking into the converse proof above, the following addition suffices. We show that for every code $\mathscr{C}_{n}$ as in the converse proof above, $\widetilde{\Lambda}_{n}(P_{X|T})=\Lambda$ implies that $R=0$ . Since there is only a polynomial number of types, we may consider $P_{X|T}(x|t)$ to be the conditional type of $f^{n}(m,\theta^{n})$ given $\theta^{n}$ , for all $m\in[1:2^{nR}]$ (see [29, Problem 6.19]).

Suppose that $\widetilde{\Lambda}_{n}(P_{X|T})=\Lambda$ , assume to the contrary that $R>0$ , and let $J_{i}(s|x)$ be distributions that achieve the minimum in (27), i.e.

[TABLE]

Based on the condition of the corollary, we may assume that $J_{i}(s|x)$ is a $0$-$1$ law, i.e.

[TABLE]

for some deterministic function $G_{i}:\mathcal{X}\rightarrow\mathcal{S}$ .

Recall that we have defined $X=X_{K}$ , $Y=Y_{K}$ in the converse proof, where $K$ is a uniformly distributed variable over $[1:n]$ . Thus, by (200),

[TABLE]

Now, consider the following jamming strategy. First, the jammer selects a codeword $\widetilde{X}^{n}$ from the codebook uniformly at random. Then, given $\widetilde{X}^{n}=x^{n}$ , the jammer chooses the state sequence $S^{n}=\left(G_{i}(x_{i})\right)_{i=1}^{n}$ . Observe that

[TABLE]

where the last equality is due to (202). Thus, the state sequence satisfies the state constraint. Now, observe that the jamming strategy $S^{n}=\left(G_{i}(\widetilde{X}_{i})\right)_{i=1}^{n}$ is equivalent to $S^{n}\sim\widetilde{q}(s^{n}|\theta^{n})$ as in (185). Thus, by Lemma 20, we have that $P_{e}^{(n)}(\widetilde{q},\theta^{n},\mathscr{C}_{n})\geq\frac{1}{4}$ , hence a positive rate cannot be achieved. ∎

Appendix I Proof of Lemma 10

Suppose that $L_{n}^{*}>\Lambda$ . The proof is similar to that of Lemma 4. We begin with the property in the lemma below.

*Lemma 21**.*

Let $\omega_{i}^{*}$ , $\lambda_{i}^{*}$ , $\widetilde{\lambda}_{i}^{*}$ , $i\in[1:n]$ , be the parameters that achieve the saddle point in (43), i.e.

[TABLE]

Then, for every $i,j\in[1:n]$ such that $\theta_{i}=\theta_{j}$ , we have that $\omega_{i}^{*}=\omega_{j}^{*}$ , $\widetilde{\lambda}_{i}^{*}=\widetilde{\lambda}_{j}^{*}$ , and $\lambda_{i}^{*}=\lambda_{j}^{*}$ .

Proof of Lemma 21.

For every $i\in[1:n]$ , let $p_{i},q_{i}$ denote input and state distributions such that $\mathbb{E}\phi(X_{i})\leq\omega_{i}^{*}$ , $\widetilde{\Lambda}_{\theta_{i}}(p_{i})\geq\widetilde{\lambda}_{i}^{*}$ , $\mathbb{E}l(S_{i})\leq\lambda_{i}^{*}$ for $X_{i}\sim p_{i}$ , $S_{i}\sim q_{i}$ . Now, suppose that $\theta_{i}=\theta_{j}=t$ , and define

[TABLE]

Then, $\mathbb{E}\phi(X^{\prime})=\frac{1}{2}[\mathbb{E}\phi(X_{i})+\mathbb{E}\phi(X_{j})]$ , $\Lambda_{t}(p^{\prime})=\frac{1}{2}[\Lambda_{t}(p_{i})+\Lambda_{t}(p_{j})]$ , and $\mathbb{E}l(S^{\prime})=\frac{1}{2}[\mathbb{E}l(S_{i})+\mathbb{E}l(S_{j})]$ for $X^{\prime}\sim p^{\prime}$ , $S^{\prime}\sim q^{\prime}$ . Furthermore, since the mutual information is concave- $\cap$ in the input distribution and convex- $\cup$ in the state distribution, we have that

[TABLE]

Therefore, the saddle point distributions must satisfy $p_{i}=p_{j}=p^{\prime}$ and $q_{i}=q_{j}=q^{\prime}$ , hence $\omega_{i}^{*}=\omega_{j}^{*}$ , $\widetilde{\lambda}_{i}^{*}=\widetilde{\lambda}_{j}^{*}$ , and $\lambda_{i}^{*}=\lambda_{j}^{*}$ . ∎

Next, it can be inferred from Lemma 21 that

[TABLE]

where $P_{T}$ is the type of the parameter sequence $\theta^{n}$ . The second equality follows from the definition of $\mathsf{C}_{t}(\omega_{t},\lambda_{t},\widetilde{\lambda}_{t})$ in (44), using the minimax theorem [96] to switch the order of the minimum and maximum. In the third line, we eliminate the slack variables $\lambda_{i}$ , $\omega_{i}$ , and $\widetilde{\lambda}_{i}$ , replacing $\mathbb{E}_{q}l(S_{i})$ , $\mathbb{E}\phi(X_{i})$ , and $\widetilde{\Lambda}(p,\theta_{i})$ , respectively. The last equality holds by the definition of $\mathsf{C}_{n}(\mathcal{W})$ in (29). ∎

Appendix J Analysis of Example 2

Consider the fading AVC in Example 2. To show the direct part with random codes, set the conditional input distribution $X\sim\mathcal{N}(0,\omega(t))$ given $T=t$ in (21). Then, for every $t\in\mathcal{T}$ ,

[TABLE]

where we have denoted $\lambda^{\prime}(t)\triangleq\mathbb{E}(S^{2}|T=t)$ . The last inequality holds since Gaussian noise is known to be the worst additive noise under variance constraint [34, Lemma II.2]. The direct part follows. As for the converse part, consider a jamming scheme where the state is drawn according to the conditional distribution $S\sim\mathcal{N}(0,\lambda(t))$ given $T=t$ . Then, the proof follows from Shannon’s classic result on the Gaussian channel $Y=tX+V$ with $V\sim\mathcal{N}(0,\lambda(t)+\sigma^{2})$ .

We move to the deterministic code capacity. By Definition 4, the constant-parameter channel $W_{Y|X,S,T=t}$ is symmetrized by a conditional pdf $\varphi(s|x)$ if

[TABLE]

where $f_{Z}(z)=\frac{1}{\sqrt{2\pi\sigma^{2}}}e^{-z^{2}/2\sigma^{2}}$ . Equivalently, the constant-parameter channel is symmetrized by $\varphi_{x}(s)\equiv\varphi(s|x)$ if

[TABLE]

for all $x,y\in\mathbb{R}$ . By substituting $z=y-tx-s$ in the LHS, and $\bar{z}=y-s$ in the RHS, we have

[TABLE]

For every $x\in\mathbb{R}$ , define the random variable $\overline{S}(x)\sim\varphi_{x}$ . We note that the RHS is the convolution of the pdfs of the random variables $Z$ and $\overline{S}(x)$ , while the LHS is the convolution of the pdfs of the random variables $Z$ and $\overline{S}(0)+x$ . This is not surprising since the channel output $Y$ is a sum of independent random variables, and thus the pdf of $Y$ is a convolution of pdfs. It follows that $\varphi_{0}(y-tx)=\varphi_{x}(y)$ , and by plugging $s$ instead of $y$ , we have that $\varphi_{x}$ symmetrizes the constant-parameter channel $W_{Y|X,S,T=t}$ if and only if

[TABLE]

Then, the corresponding state cost satisfies

[TABLE]

where the second equality follows by the integral substitution of $a=s-tx$ . Observe that the bracketed integral can be expressed as

[TABLE]

Thus, by (213),

[TABLE]

Note that the last inequality holds for any $\varphi_{x}$ which symmetrizes the channel, and in particular for $\hat{\varphi}_{x}(s)=\delta(s-tx)$ , where $\delta(\cdot)$ is the Dirac delta function. In addition, since $\hat{\varphi}_{0}$ gives probability $1$ to $S=0$ , we have that (215) holds with equality for $\hat{\varphi}_{x}$ , and thus,

[TABLE]

with $\omega(t)\equiv\mathbb{E}[X^{2}|T=t]$ . Hence,

[TABLE]

Having shown that the minimum in (27) is attained by a $0$-$1$ law, we have by Corollary 7 that the capacity of the fading AVC is $\mathbb{C}(\mathcal{W})=\liminf\mathsf{C}_{n}(\mathcal{W})$ , with

[TABLE]

To show the direct part, we only need to consider the case where $\max\limits_{\omega(t)\,:\;\mathbb{E}\omega(T)\leq\Omega}\mathbb{E}(T^{2}\omega(T))>\Lambda$ . Then, set the conditional input distribution $X\sim\mathcal{N}(0,\omega(t))$ given $T=t$ in (218). As in the direct part with random codes,

[TABLE]

with $\lambda^{\prime}(t)\triangleq\mathbb{E}(S^{2}|T=t)$ , since Gaussian noise is the worst additive noise under variance constraint [34, Lemma II.2]. The direct part follows. As for the converse part, for the conditional distribution $S\sim\mathcal{N}(0,\lambda(t))$ given $T=t$ , we have that

[TABLE]

with $\omega^{\prime}(t)\triangleq\mathbb{E}(X^{2}|T=t)$ , since the Gaussian distribution maximizes the differential entropy. The proof follows. ∎

Appendix K Proof of Lemma 13

Part 1

Since $\sum_{j^{\prime}=1}^{d}P_{j^{\prime}}^{*}=\Omega>0$ , there must be some $j\in[1:d]$ such that $P_{j}^{*}=\alpha-(N_{j}^{*}+\sigma_{j}^{2})>0$ , thus $\alpha>N_{j}^{*}+\sigma_{j}^{2}$ . If $N_{j}^{*}=0$ , then it follows that $\beta\leq\sigma_{j}^{2}$ , hence

[TABLE]

Otherwise, $N_{j}^{*}=\beta-\sigma_{j}^{2}>0$ , thus by the assumption $P_{j}^{*}>0$ , we have that

[TABLE]

Part 2

Assume to the contrary that $N_{j}^{*}=\beta-\sigma_{j}^{2}>0$ and $P_{j}^{*}=0$ . The assumption $P_{j}^{*}=0$ implies that $\alpha\leq N_{j}^{*}+\sigma_{j}^{2}=\beta$ , in contradiction to part 1 of the Lemma. Hence, the assumption is false, and $N_{j}^{*}>0$ implies that $P_{j}^{*}>0$ .

Part 3 and Part 4

By the definition of $N_{j}^{*}$ in (72), we have that $N_{j}^{*}+\sigma_{j}^{2}=\max(\beta,\sigma_{j}^{2})$ for all $j\in[1:d]$ . Thus,

[TABLE]

where the last equality is due to part 1. Part 4 immediately follows. ∎

Appendix L Proof of Lemma 14

Let $X^{d}$ be a zero mean random vector with the covariance matrix $K_{X}$ . Observe that by (86), the AVGPC is symmetrized by a conditional pdf $\varphi_{x^{d}}(s^{d})=\varphi(s^{d}|x^{d})$ if

[TABLE]

for all $x^{d},y^{d}\in\mathbb{R}^{d}$ . By substituting $z^{d}=y^{d}-x^{d}-s^{d}$ in the LHS, and $\bar{z}^{d}=y^{d}-s^{d}$ in the RHS, this is equivalent to

[TABLE]

For every $x^{d}\in\mathbb{R}^{d}$ , define the random vector $\overline{S}^{d}(x^{d})\sim\varphi_{x^{d}}$ . We note that the RHS is the convolution of the pdfs of the random vectors $Z^{d}$ and $\overline{S}^{d}(x^{d})$ , while the LHS is the convolution of the pdfs of the random vectors $Z^{d}$ and $\overline{S}^{d}(0)+x^{d}$ . This is not surprising since the channel output $Y^{d}$ is a sum of independent random vectors, and thus the pdf of $Y^{d}$ is a convolution of pdfs. It follows that $\varphi_{0}(y^{d}-x^{d})=\varphi_{x^{d}}(y^{d})$ , and by plugging $s^{d}$ instead of $y^{d}$ , we have that $\varphi_{x^{d}}$ symmetrizes the AVGPC if and only if

[TABLE]

Then, the corresponding state cost satisfies

[TABLE]

where the second equality follows by the integral substitution of $a^{d}=s^{d}-x^{d}$ . Observe that the bracketed integral can be expressed as

[TABLE]

Thus, by (227),

[TABLE]

Note that the last inequality holds for any $\varphi_{x^{d}}$ which symmetrizes the channel. Now, observe that (226) holds for $\hat{\varphi}_{x^{d}}(s^{d})=\delta(s^{d}-x^{d})$ , where $\delta(\cdot)$ is the Dirac delta function, hence $\hat{\varphi}_{x^{d}}$ symmetrizes the channel. In addition, since $\hat{\varphi}_{0}$ gives probability $1$ to $S^{d}=0$ , we have that (229) holds with equality for $\hat{\varphi}_{x^{d}}$ , and thus, $\widetilde{\Lambda}(F_{X^{d}})=\mathrm{tr}(K_{X})$ . ∎
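The cost bound can also be read off from a short second-moment computation. The following is a sketch under the shift form $\varphi_{x^{d}}(s^{d})=\varphi_{0}(s^{d}-x^{d})$ derived above, with the squared-norm state cost; the auxiliary vector $A^{d}\sim\varphi_{0}$, taken independent of $X^{d}$, is notation introduced here for illustration and does not appear in the original derivation.

```latex
\begin{align*}
\mathbb{E}\lVert S^{d}\rVert^{2}
  &= \mathbb{E}\lVert X^{d}+A^{d}\rVert^{2} \\
  &= \mathbb{E}\lVert X^{d}\rVert^{2}
     + 2\,(\mathbb{E}X^{d})^{T}\,\mathbb{E}A^{d}
     + \mathbb{E}\lVert A^{d}\rVert^{2} \\
  &= \mathrm{tr}(K_{X}) + \mathbb{E}\lVert A^{d}\rVert^{2}
  \;\geq\; \mathrm{tr}(K_{X}) .
\end{align*}
```

The cross term vanishes because $X^{d}$ has zero mean, and equality holds exactly when $A^{d}=0$ with probability $1$, i.e. for $\hat{\varphi}_{x^{d}}(s^{d})=\delta(s^{d}-x^{d})$.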

Appendix M Proof of Theorem 15

Consider the AVGPC under input constraint $\Omega$ and state constraint $\Lambda$ .

Achievability Proof

Assume that $\Omega>\Lambda$ . We show that $\mathbb{C}(\Sigma)\geq\mathsf{C}(\Sigma)=\mathsf{C}^{\star}(\Sigma)$ . By [28, Theorem 3], if there exists an input distribution $F_{X^{d}}$ such that $\widetilde{\Lambda}(F_{X^{d}})>\Lambda$ , then the capacity is given by

[TABLE]

where $P_{j}=\mathbb{E}X_{j}^{2}$ and $N_{j}=\mathbb{E}S_{j}^{2}$ .

Consider the input distribution $F_{X^{d}}$ of a Gaussian vector $X^{d}\sim\mathcal{N}(\mathbf{0},K_{X})$ , where the covariance matrix is given by $K_{X}=\mathrm{diag}(P_{1}^{*},\ldots,P_{d}^{*})$ . By Lemma 14, we have that

[TABLE]

Having assumed that $\Omega>\Lambda$ , it follows that $\widetilde{\Lambda}(F_{X^{d}})>\Lambda$ , hence (230) applies. Then, setting $X^{d}\sim\mathcal{N}(\mathbf{0},K_{X})$ yields

[TABLE]

where the second inequality holds as $X_{1},\ldots,X_{d}$ are independent and since conditioning reduces entropy, and the last inequality holds since Gaussian noise is known to be the worst additive noise under variance constraint [34, Lemma II.2].

From this point, we use the considerations given in [61]. To prove the direct part, it remains to show that the assignment of $N_{j}=N_{j}^{*}$ , for $j\in[1:d]$ , is optimal in the RHS of (234), where $N_{j}^{*}$ are as defined in (72)-(73). An assignment of $N_{1},\ldots,N_{d}$ is optimal if and only if it satisfies the KKT optimality conditions [20, Section 5.5.3],

[TABLE]

for $j\in[1:d]$ , where $\theta>0$ is a Lagrange multiplier.

We claim that the conditions are met by

[TABLE]

Condition (235) is met by the definition of $N_{j}^{*}$ , $j\in[1:d]$ , in (72)-(73). Let $j\in[1:d]$ be a given channel index. We consider the following cases. Suppose that $N_{j}^{*}=0$ . Then, Condition (237) is clearly satisfied. Now, if $P_{j}^{*}=0$ , then Condition (236) is satisfied since $\alpha>\beta$ by part 1 of Lemma 13. Otherwise, $0<P_{j}^{*}=\alpha-(N_{j}^{*}+\sigma_{j}^{2})=\alpha-\sigma_{j}^{2}$ , and then

[TABLE]

where the last inequality holds since $N_{j}^{*}=0$ only if $\beta\leq\sigma_{j}^{2}$ . Thus, Condition (236) is satisfied.

Next, suppose that $N_{j}^{*}>0$ , hence $N_{j}^{*}+\sigma_{j}^{2}=\beta$ . By part 2 of Lemma 13, this implies that $P_{j}^{*}>0$ , i.e. $P_{j}^{*}=\alpha-(N_{j}^{*}+\sigma_{j}^{2})=\alpha-\beta$ . Thus,

[TABLE]

and thus Condition (236) is satisfied with equality, and Condition (237) is satisfied as well.

As the KKT conditions are satisfied under (238), we deduce that the assignment of $N_{j}=N_{j}^{*}$ , $j\in[1:d]$ , minimizes the RHS of (234). Together with (234), this implies that $\mathbb{C}(\Sigma)\geq\mathsf{C}^{\star}(\Sigma)$ for $\Omega>\Lambda$ .
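The double water filling allocation used in this argument is easy to compute numerically. The sketch below is a minimal illustration, assuming the forms $N_{j}^{*}=[\beta-\sigma_{j}^{2}]_{+}$ with $\sum_{j}N_{j}^{*}=\Lambda$ and $P_{j}^{*}=[\alpha-(N_{j}^{*}+\sigma_{j}^{2})]_{+}$ with $\sum_{j}P_{j}^{*}=\Omega$, and assuming that the resulting rate is $\sum_{j}\frac{1}{2}\log_{2}\big(1+\frac{P_{j}^{*}}{N_{j}^{*}+\sigma_{j}^{2}}\big)$ in bits. The function names `water_level` and `double_water_filling` and the numerical values are hypothetical.

```python
import math

def water_level(floors, budget):
    """Return v with sum(max(v - f, 0) for f in floors) == budget, budget > 0."""
    f = sorted(floors)
    running = 0.0
    for k in range(1, len(f) + 1):
        running += f[k - 1]
        v = (budget + running) / k      # level if only the k lowest floors are flooded
        if k == len(f) or v <= f[k]:    # stop once the next floor stays dry
            return v

def double_water_filling(sigma2, Omega, Lambda):
    """Jammer fills over the noise floors first, then the encoder fills on top."""
    beta = water_level(sigma2, Lambda)
    N = [max(beta - s, 0.0) for s in sigma2]
    alpha = water_level([n + s for n, s in zip(N, sigma2)], Omega)
    P = [max(alpha - (n + s), 0.0) for n, s in zip(N, sigma2)]
    # Assumed rate expression (bits per vector channel use); requires sigma2 > 0.
    rate = sum(0.5 * math.log2(1.0 + p / (n + s))
               for p, n, s in zip(P, N, sigma2))
    return alpha, beta, P, N, rate

# Hypothetical example: three sub-channels with noise variances 1, 2, 4.
alpha, beta, P, N, rate = double_water_filling([1.0, 2.0, 4.0],
                                               Omega=3.0, Lambda=1.0)
```

For these values the jammer's level is $\beta=2$ with $N^{*}=(1,0,0)$, and the encoder's level is $\alpha=3.5$ with $P^{*}=(1.5,1.5,0)$. This agrees with parts 1 and 2 of Lemma 13: $\alpha>\beta$, and every sub-channel with $N_{j}^{*}>0$ also has $P_{j}^{*}>0$.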

Converse Proof

We use a similar technique as in [32] (see also [37, 16]). In general, the deterministic code capacity is bounded by the random code capacity, hence $\mathbb{C}(\Sigma)\leq\mathbb{C}^{\star}(\Sigma)=\mathsf{C}^{\star}(\Sigma)$ , by Theorem 12. It remains to show that if $\Omega\leq\Lambda$ , then the capacity is zero. Suppose that $\Omega\leq\Lambda$ , and assume to the contrary that there exists an achievable rate $R>0$ . Then, there exists a sequence of $(2^{nR},n,\varepsilon_{n})$ codes $\mathscr{C}_{n}=(\mathbf{f}^{d},g)$ for the AVGPC such that $\varepsilon_{n}\rightarrow 0$ as $n\rightarrow\infty$ , where the size of the message set is at least $2$ , i.e. $\mathsf{M}\triangleq 2^{nR}\geq 2$ .

Consider a jammer who chooses the state sequence from the codebook uniformly at random, i.e. $\mathbf{S}^{d}=\mathbf{f}^{d}(M^{\prime})$ , where $M^{\prime}$ is uniformly distributed over $[1:\mathsf{M}]$ . This choice meets the state constraint, since the square norm of the state sequence is $\left\lVert\mathbf{S}^{d}\right\rVert^{2}\leq\Omega\leq\Lambda$ . The average probability of error is then bounded by

[TABLE]

where $f_{\mathbf{Z}^{d}}(\mathbf{z}^{d})=\prod_{j=1}^{d}\frac{1}{(2\pi\sigma_{j}^{2})^{n/2}}e^{-\left\lVert\mathbf{z}_{j}\right\rVert^{2}/2\sigma_{j}^{2}}$ , and

[TABLE]

By interchanging the summation variables $m$ and $m^{\prime}$ , we now have that

[TABLE]

Next, observe that for $m\neq m^{\prime}$ , $\mathcal{D}_{e}(m,m^{\prime})\cup\mathcal{D}_{e}(m^{\prime},m)=\mathbb{R}^{nd}$ , and thus the probability of error is lower bounded by

[TABLE]

where the last inequality holds since $\mathsf{M}\geq 2$ . Hence, the assumption is false and a positive rate cannot be achieved when $\Omega\leq\Lambda$ . This completes the proof of the converse part. ∎

Appendix N Proof of Theorem 16

Consider the AVC with colored Gaussian noise. First, we show that the problem can be transformed into that of an AVC with fixed parameters. Then, we derive a limit expression for the random code capacity, and prove the capacity characterization in Theorem 16 using the Toeplitz matrix properties in the auxiliary lemma below. To derive the deterministic code capacity, we use similar symmetrizability and optimization arguments as in our proofs for the Gaussian product channel.

*Lemma 22** (see [35, Section 2.3]; see also [43, 53], [39, Section 8.5]).*

Let $\Psi_{Z}(\omega)$ be the power spectral density of a zero mean stationary process $\{Z_{i}\}_{i=1}^{\infty}$ . Assume that $\Psi_{Z}:[-\pi,\pi]\rightarrow[0,\nu]$ is bounded and integrable, for some $\nu>0$ , and denote the auto-correlation function by

[TABLE]

with $j=\sqrt{-1}$ . For a sequence $\mathbf{Z}$ of length $n$ , let $\sigma_{1}^{2},\ldots,\sigma_{n}^{2}$ denote the eigenvalues of the $n\times n$ covariance matrix $K_{Z}$ , where $K_{Z}(i,j)=r_{Z}(|i-j|)$ for $i,j\in[1:n]$ . Then, for every real, monotone non-increasing, and bounded function $G:[0,\nu]\rightarrow[0,\eta]$ ,

[TABLE]

if the integral exists.
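A quick numerical check can make the lemma concrete. The sketch below assumes the usual form of the statement, namely that $\frac{1}{n}\sum_{i=1}^{n}G(\sigma_{i}^{2})$ converges to $\frac{1}{2\pi}\int_{-\pi}^{\pi}G(\Psi_{Z}(\omega))\,d\omega$. It uses a hypothetical process with $r_{Z}(0)=a$, $r_{Z}(1)=b$ and $r_{Z}(k)=0$ for $k\geq 2$, so that $\Psi_{Z}(\omega)=a+2b\cos\omega$ and $K_{Z}$ is tridiagonal Toeplitz with the known eigenvalues $a+2b\cos\frac{k\pi}{n+1}$, $k\in[1:n]$ (a standard fact, not taken from this paper). The test function $G(x)=[\beta-x]_{+}$ is non-increasing and bounded, and mirrors the jammer's water filling term.

```python
import math

def eigen_average(G, a, b, n):
    """(1/n) * sum of G over the eigenvalues of the n x n tridiagonal Toeplitz K_Z."""
    return sum(G(a + 2 * b * math.cos(k * math.pi / (n + 1)))
               for k in range(1, n + 1)) / n

def spectral_average(G, a, b, grid=100000):
    """Midpoint-rule estimate of (1/(2*pi)) * integral of G(Psi_Z) over [-pi, pi]."""
    h = 2 * math.pi / grid
    total = sum(G(a + 2 * b * math.cos(-math.pi + (i + 0.5) * h))
                for i in range(grid))
    return total * h / (2 * math.pi)

# Hypothetical parameters; |b| <= a/2 keeps the spectral density non-negative.
a, b, beta = 1.0, 0.4, 1.0
G = lambda x: max(beta - x, 0.0)

lhs = eigen_average(G, a, b, n=2000)   # eigenvalue side of the lemma
rhs = spectral_average(G, a, b)        # frequency-domain side
```

With these parameters the frequency-domain side equals $\frac{2b}{\pi}=\frac{0.8}{\pi}$ in closed form, and the eigenvalue average approaches it as $n$ grows.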

N-A Transformation to AVC with Fixed Parameters

Let $K_{Z}$ denote the $n\times n$ covariance matrix of the noise sequence $\mathbf{Z}$ . Consider the eigendecomposition of the covariance matrix $K_{Z}$ , and denote the eigenvector and eigenvalue matrices by $Q$ and $\Sigma$ , respectively, i.e.

[TABLE]

We claim that the capacity of the AVC with colored Gaussian noise is the same as the capacity of the following AVC,

[TABLE]

where $\mathbf{X}^{\prime}=Q^{T}\mathbf{X}$ , $\mathbf{Z}^{\prime}=Q^{T}\mathbf{Z}$ , and $\mathbf{S}^{\prime}=Q^{T}\mathbf{S}$ . Since $Q$ is a unitary matrix, i.e. $Q^{-1}=Q^{T}$ , the input and state constraints remain the same, as $\left\lVert\mathbf{X}^{\prime}\right\rVert^{2}=(\mathbf{X}^{\prime})^{T}\mathbf{X}^{\prime}=\mathbf{X}^{T}QQ^{T}\mathbf{X}=\mathbf{X}^{T}\mathbf{X}=\left\lVert\mathbf{X}\right\rVert^{2}\leq n\Omega$ , and similarly, $\left\lVert\mathbf{S}^{\prime}\right\rVert^{2}=\left\lVert\mathbf{S}\right\rVert^{2}\leq n\Lambda$ . Furthermore, the noise covariance matrix is now

[TABLE]

This transformation can be thought of as a linear system, which is not time invariant. Hence, the noise of the transformed channel is a Gaussian process, but it is non-stationary. Thereby, the input-output relation above specifies a time varying channel, $\{F_{Y_{1},\ldots,Y_{n}|X_{1},\ldots,X_{n},S_{1},\ldots,S_{n}}\}_{n=1}^{\infty}$ . From an operational perspective, if there exists a $(2^{nR},n,\varepsilon)$ code $\mathscr{C}=(\mathbf{f},g)$ for the original AVC with colored Gaussian noise, then the code $\mathscr{C}^{\prime}=(\mathbf{f}^{\prime},g^{\prime})$ , given by $\mathbf{f}^{\prime}(m)=Q^{T}\mathbf{f}(m)$ and $g^{\prime}(\mathbf{y}^{\prime})=g(Q\mathbf{y}^{\prime})$ , is a $(2^{nR},n,\varepsilon)$ code for the transformed AVC in (248). Similarly, if there exists a $(2^{nR},n,\varepsilon)$ code $\mathscr{C}^{\prime}=(\mathbf{f}^{\prime},g^{\prime})$ for the transformed AVC, then the code $\mathscr{C}=(\mathbf{f},g)$ , given by $\mathbf{f}(m)=Q\mathbf{f}^{\prime}(m)$ and $g(\mathbf{y})=g^{\prime}(Q^{T}\mathbf{y})$ , is a $(2^{nR},n,\varepsilon)$ code for the original AVC. Thus, the original AVC and the transformed AVC have the same operational capacity.
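The two facts used in this reduction, norm preservation and diagonalization of the noise covariance, can be verified on a small example. The sketch below is an illustration only: it assumes a hypothetical $2\times 2$ Toeplitz covariance with $r_{Z}(0)=1$ and $r_{Z}(1)=\rho$, for which the eigenvector matrix is known in closed form, and the helper names `matmul`, `transpose`, and `norm2` are introduced here.

```python
import math

def matmul(A, B):
    """Plain matrix product for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def norm2(v):
    """Squared Euclidean norm of a column vector."""
    return sum(row[0] ** 2 for row in v)

# Hypothetical 2 x 2 noise covariance with r_Z(0) = 1 and r_Z(1) = rho.
rho = 0.6
K_Z = [[1.0, rho], [rho, 1.0]]
c = 1.0 / math.sqrt(2.0)
Q = [[c, c], [c, -c]]                  # orthonormal eigenvectors of K_Z

# Covariance of Z' = Q^T Z is Q^T K_Z Q, which should be diag(1 + rho, 1 - rho).
Sigma = matmul(matmul(transpose(Q), K_Z), Q)

# The same rotation applied to an input (or state) vector keeps its norm.
x = [[3.0], [-1.0]]
x_prime = matmul(transpose(Q), x)
x_back = matmul(Q, x_prime)            # decoder side: Q maps back to the original
```

Since the rotation changes neither $\lVert\mathbf{X}\rVert^{2}$ nor $\lVert\mathbf{S}\rVert^{2}$, the input and state constraints carry over unchanged, while the transformed noise components become uncorrelated with variances given by the eigenvalues.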

Therefore, we can assume without loss of generality that the noise sequence has independent components $Z_{i}\sim\mathcal{N}(0,\sigma_{i}^{2})$ , $i\in[1:n]$ . Assume, at first, that $\sigma_{i}^{2}\in\mathcal{T}$ for $i\in[1:n]$ , with some set $\mathcal{T}$ of finite size, which does not grow with $n$ , and that $\sigma_{i}^{2}>\delta$ , where $\delta>0$ is arbitrarily small. Hence, observe that the channel in (248) is equivalent to a channel $W_{Y^{\prime\prime}|X^{\prime\prime},S^{\prime\prime},T^{\prime\prime}}$ with fixed parameters, specified by

[TABLE]

with the parameter sequence $\sigma_{1},\sigma_{2},\ldots$ . It is left to determine the random code capacity and deterministic code capacity of the Gaussian AVC with fixed parameters in (250). Although we previously assumed in Sections II and III that the input, state, and output alphabets are finite, our results can be extended to the continuous case as well, using standard discretization techniques [15, 5] [36, Section 3.4.1].

Now, consider the double water filling allocation,

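$$b_{i}^{*}=\left[\beta^{\prime}-\sigma_{i}^{2}\right]_{+}\,,\qquad a_{i}^{*}=\left[\alpha^{\prime}-(b_{i}^{*}+\sigma_{i}^{2})\right]_{+}\,,$$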

for $i\in[1:n]$ , where $\beta^{\prime}>0$ and $\alpha^{\prime}>0$ are chosen to satisfy $\frac{1}{n}\sum_{i=1}^{n}\left[\beta^{\prime}-\sigma_{i}^{2}\right]_{+}=\Lambda$ and $\frac{1}{n}\sum_{i=1}^{n}\left[\alpha^{\prime}-(b_{i}^{*}+\sigma_{i}^{2})\right]_{+}=\Omega$ , respectively. Define

[TABLE]
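The double water filling allocation can be computed numerically. The following is a minimal sketch, assuming the allocation $b_{i}^{*}=[\beta^{\prime}-\sigma_{i}^{2}]_{+}$, $a_{i}^{*}=[\alpha^{\prime}-(b_{i}^{*}+\sigma_{i}^{2})]_{+}$ with the two normalized power constraints stated above. The function names `water_level` and `double_water_filling`, the bisection search, and the reported rate (the average of $\frac{1}{2}\log_{2}(1+a_{i}^{*}/(b_{i}^{*}+\sigma_{i}^{2}))$ in bits) are illustrative choices of this sketch, not taken from the paper.

```python
import math


def water_level(floors, budget, iterations=200):
    """Bisection for the level L with mean(max(L - f, 0)) == budget."""
    lo, hi = min(floors), max(floors) + budget
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        poured = sum(max(mid - f, 0.0) for f in floors) / len(floors)
        if poured < budget:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def double_water_filling(variances, omega, lam):
    """Return (a, b, rate) for noise variances sigma_i^2 > 0.

    b: jammer allocation, poured over the noise variances (budget lam).
    a: encoder allocation, poured over b_i + sigma_i^2 (budget omega).
    rate: average of 0.5*log2(1 + a_i / (b_i + sigma_i^2)), an assumed
          figure of merit for this sketch.
    """
    beta = water_level(variances, lam)                 # level beta'
    b = [max(beta - v, 0.0) for v in variances]        # b_i^*
    floors = [bi + v for bi, v in zip(b, variances)]   # b_i^* + sigma_i^2
    alpha = water_level(floors, omega)                 # level alpha'
    a = [max(alpha - f, 0.0) for f in floors]          # a_i^*
    rate = sum(0.5 * math.log2(1.0 + ai / f)
               for ai, f in zip(a, floors)) / len(floors)
    return a, b, rate


if __name__ == "__main__":
    a, b, rate = double_water_filling([1.0, 2.0, 4.0], omega=2.0, lam=1.0)
    print(a, b, rate)
```

For the toy input $\sigma^{2}=(1,2,4)$, $\Omega=2$, $\Lambda=1$, solving the two constraints by hand gives $\beta^{\prime}=3$ and $\alpha^{\prime}=16/3$, hence $b^{*}=(2,1,0)$ and $a^{*}=(7/3,7/3,4/3)$: the jammer first levels the two quietest components, and the encoder then pours its power over the resulting floor.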

N-B Random Code Capacity

Now that we have shown that the problem reduces to that of an AVC with fixed parameters, we have by Corollary 5 that the random code capacity is given by

[TABLE]

where $\mathsf{C}_{\sigma}^{\star}(P,N)$ is the random code capacity of the traditional AVC under input constraint $P$ and state constraint $N$. Hughes and Narayan [60] showed that the random code capacity of such a channel, where the noise sequence is i.i.d. $\sim\mathcal{N}(0,\sigma^{2})$, is given by

$$\mathsf{C}_{\sigma}^{\star}(P,N)=\frac{1}{2}\log\left(1+\frac{P}{N+\sigma^{2}}\right)\,.$$

Hence, for the AVC with colored Gaussian noise,

[TABLE]

Next, observe that this is the same min-max optimization as for the AVGPC in (78), due to [61], with $d\leftarrow n$ , $\Omega\leftarrow(n\Omega)$ , $\Lambda\leftarrow(n\Lambda)$ . Therefore, by Theorem 12 [61] and (256),

[TABLE]

Given a bounded power spectral density $\Psi_{Z}:[-\pi,\pi]\rightarrow[0,\nu]$ , define a function $G:[0,\nu]\rightarrow[0,\eta]$ by

[TABLE]

and observe that

[TABLE]

As $G(x)$ is non-increasing and bounded by $\eta=\frac{1}{2}\log[1+\Omega/\delta]$ , we have by Lemma 22 that

[TABLE]

Observing that the function defined in (258) is also continuous, while $\Psi_{Z}(\omega)$ is bounded and integrable, it follows that the integral exists [86, Theorem 6.11]. Plugging (258) into the RHS of (260), we obtain

[TABLE]

where $\beta$ and $\alpha$ satisfy (93) and (95), respectively. Since the covariance matrix of the stationary noise process is Toeplitz (see e.g. [43]), the density of eigenvalues on the real line tends to the power spectral density [44]. Given that the power spectral density is bounded and integrable, we have that the sequence of eigenvalues $\sigma_{1}^{2},\sigma_{2}^{2},\ldots$ is summable [43, Theorem 4.2], and thus, bounded as well. Hence, we can remove the assumption that the set of noise variances has finite cardinality, by quantization of the variances. The random code characterization now follows from (257) and (261).
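The passage from eigenvalue averages to a spectral integral can be checked numerically on a simple case. The sketch below assumes a first-order moving-average noise process $Z_{t}=W_{t}+\theta W_{t-1}$ with unit-variance white $W_{t}$, which is our illustrative choice and not a model used in the paper. Its covariance matrix is tridiagonal Toeplitz with diagonal $1+\theta^{2}$ and off-diagonal $\theta$, whose eigenvalues are known in closed form, and its power spectral density is $\Psi_{Z}(\omega)=1+\theta^{2}+2\theta\cos\omega$. The function `g` is a hypothetical stand-in for a bounded, continuous, non-increasing function of the noise variance; it is not the function $G$ defined in (258).

```python
import math


def ma1_eigenvalues(n, theta):
    """Eigenvalues of the n x n tridiagonal Toeplitz covariance of MA(1)."""
    r0, r1 = 1.0 + theta ** 2, theta
    return [r0 + 2.0 * r1 * math.cos(k * math.pi / (n + 1))
            for k in range(1, n + 1)]


def ma1_psd(omega, theta):
    """Power spectral density of the MA(1) process at frequency omega."""
    return 1.0 + theta ** 2 + 2.0 * theta * math.cos(omega)


def g(x, power=1.0):
    """Stand-in test function (assumption): 0.5*log2(1 + power/x)."""
    return 0.5 * math.log2(1.0 + power / x)


def eigenvalue_average(n, theta):
    """(1/n) * sum of g over the eigenvalues of the covariance matrix."""
    return sum(g(s) for s in ma1_eigenvalues(n, theta)) / n


def spectral_integral(theta, grid=20000):
    """(1/2pi) * integral of g(PSD) over [-pi, pi], by the midpoint rule."""
    h = 2.0 * math.pi / grid
    total = sum(g(ma1_psd(-math.pi + (j + 0.5) * h, theta))
                for j in range(grid))
    return total * h / (2.0 * math.pi)


if __name__ == "__main__":
    limit = spectral_integral(0.5)
    for n in (10, 100, 1000):
        print(n, eigenvalue_average(n, 0.5), limit)
```

In this example the gap between the eigenvalue average and the spectral integral shrinks roughly like $1/n$, which is the behavior the Toeplitz argument relies on.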

N-C Deterministic Code Capacity

Moving to the deterministic code capacity, observe that for a constant-parameter Gaussian AVC, where the noise sequence is i.i.d. $\sim\mathcal{N}(0,\sigma^{2})$ , we have that $\widetilde{\Lambda}(F_{X},\sigma)=\mathbb{E}X^{2}$ , by Lemma 14, taking $d=1$ . Therefore, for the Gaussian AVC with a parameter sequence $\sigma_{1}^{2},\ldots,\sigma_{n}^{2}$ ,

[TABLE]

where the first equality holds by the definition of $L_{n}^{*}$ in (28) and by (42). It can further be seen from the proof of Lemma 14 in Appendix L that the Gaussian channel $Y=X+S+Z_{\sigma}$ is symmetrized by a distribution $\varphi(s|x)$ that gives probability $1$ to $S=x$ , and that the minimum in the formula of $\widetilde{\Lambda}(F_{X},\sigma)$ in (41) is attained with this distribution.
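The symmetrization claim can be verified directly. The following is a sketch in notation assumed here rather than taken from the paper: $W_{\sigma}(y|x,s)$ denotes the Gaussian density of $Y=x+s+Z_{\sigma}$, and $\varphi(\cdot|x)$ is the point mass at $s=x$. For every pair of inputs $x_{1},x_{2}$,

$$\int\varphi(s|x_{2})\,W_{\sigma}(y|x_{1},s)\,ds=\frac{1}{\sqrt{2\pi\sigma^{2}}}\,e^{-(y-x_{1}-x_{2})^{2}/2\sigma^{2}}=\int\varphi(s|x_{1})\,W_{\sigma}(y|x_{2},s)\,ds\,,$$

since both sides depend on the inputs only through $x_{1}+x_{2}$. The state cost of this symmetrizing distribution is

$$\int dF_{X}(x)\int\varphi(s|x)\,s^{2}\,ds=\int x^{2}\,dF_{X}(x)=\mathbb{E}X^{2}\,,$$

which agrees with the value $\widetilde{\Lambda}(F_{X},\sigma)=\mathbb{E}X^{2}$ stated above.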

Therefore, by Corollary 11, the capacity of the AVC with colored Gaussian noise is given by the limit inferior of

[TABLE]

Consider the direct part. Suppose that $\Omega>\Lambda$, hence $L_{n}^{*}>\Lambda$ (see (262)), and set $P_{i}=\widetilde{\lambda}_{i}=a_{i}^{*}$ for $i\in[1:n]$. This choice of parameters satisfies the optimization constraints in (263), as $\frac{1}{n}\sum_{i=1}^{n}P_{i}=\Omega$, and also $\frac{1}{n}\sum_{i=1}^{n}\widetilde{\lambda}_{i}=\Omega>\Lambda$. Therefore,

[TABLE]

where the last inequality holds since Gaussian noise is known to be the worst additive noise under a variance constraint [34, Lemma II.2]. Next, observe that this is the same minimization as in (234), in the proof of the direct part for the AVGPC, with $d\leftarrow n$, $\Omega\leftarrow(n\Omega)$, $\Lambda\leftarrow(n\Lambda)$ (see proof of Theorem 15 in Appendix M). Therefore, the minimum is attained with $N_{i}=b_{i}^{*}$, and the RHS of (257) is achievable with deterministic codes as well, provided that $\Omega>\Lambda$.

The converse part is straightforward. Since the deterministic code capacity is always bounded by the random code capacity, we have that $\mathbb{C}(\Psi_{Z})\leq\mathbb{C}^{\star}(\Psi_{Z})=\mathsf{C}^{\star}(\Psi_{Z})$. If $\Omega\leq\Lambda$, then $L_{n}^{*}\leq\Lambda$ by (262), hence $\mathbb{C}(K_{Z})=\liminf\mathsf{R}_{n}(\mathcal{W})=0$ by the second part of Corollary 11. ∎

Bibliography

[1] A. Abdul Salam, R. Sheriff, S. Al-Araji, K. Mezher, and Q. Nasir. Novel approach for modeling wireless fading channels using a finite state Markov chain. ETRI J., 39(5):718–728, October 2017.
[2] A. Ahlswede, I. Althöfer, C. Deppe, and U. Tamm. Probabilistic methods and distributed information. Springer, 2019.
[3] R. Ahlswede. The weak capacity of averaged channels. J. Prob. Theory and Related Areas, 11(1):61–73, 1968.
[4] R. Ahlswede. The capacity of a channel with arbitrarily varying additive Gaussian channel probability functions. In Trans. 6th Prague Conf. Inform. Theory, Statist. Decision Func., Random Processes, Prague, Czech Republic, Sep 1971.
[5] R. Ahlswede. Elimination of correlation in random codes for arbitrarily varying channels. Z. Wahrscheinlichkeitstheorie Verw. Gebiete, 44(2):159–175, Jun 1978.
[6] R. Ahlswede. Arbitrarily varying channels with states sequence known to the sender. IEEE Trans. Inform. Theory, 32(5):621–629, Sep 1986.
[7] R. Ahlswede and N. Cai. Arbitrarily varying multiple-access channels. Universität Bielefeld, 1996.
[8] R. Ahlswede and N. Cai. Arbitrarily varying multiple-access channels. I. Ericson’s symmetrizability is adequate, Gubner’s conjecture is true. IEEE Trans. Inform. Theory, 45(2):742–749, Mar 1999. ISSN 0018-9448.
