Coding for Arbitrarily Varying Remote Sources

Amitalok J. Budkuley; Bikash Kumar Dey; Vinod M. Prabhakaran

arXiv:1704.07693·cs.IT·April 26, 2017


TL;DR

This paper investigates the limits of lossy source coding over an arbitrarily varying channel controlled by an adversary, providing bounds and exact characterizations of the rate distortion function in certain cases.

Contribution

It introduces bounds and exact solutions for the adversarial rate distortion problem in remote source coding with side information.

Findings

01

Derived upper and lower bounds on the adversarial rate distortion function.

02

Identified special cases where bounds coincide, giving exact rate distortion characterizations.

Abstract

We study a lossy source coding problem for a memoryless remote source. The source data is broadcast over an arbitrarily varying channel (AVC) controlled by an adversary. One output of the AVC is received as input at the encoder, and another output is received as side information at the decoder. The adversary is assumed to know the source data non-causally, and can employ randomized jamming strategies arbitrarily correlated to the source data. The decoder reconstructs the source data from the encoded message and the side information. We prove upper and lower bounds on the adversarial rate distortion function for the source under randomized coding. Furthermore, we present some interesting special cases of our general setup where the above bounds coincide, and thus, provide their complete rate distortion function characterization.

Equations

$$P(\mathbf{Y}=\mathbf{y},\mathbf{Z}=\mathbf{z}\mid\mathbf{X}=\mathbf{x},\mathbf{J}=\mathbf{j})=\prod_{i=1}^{n}W_{Y,Z|X,J}(y_{i},z_{i}\mid x_{i},j_{i}).$$

$$D^{(n)}=\max_{Q_{\mathbf{J}|\mathbf{X}}}\mathbb{E}\left[d\big(\mathbf{X},\Phi(\Psi(\mathbf{Y}),\mathbf{Z})\big)\right],$$

$$R_{U}^{*}(D):=\begin{cases}\displaystyle\min_{P_{U|Y},\,\tilde{x}(\cdot,\cdot)}\ \max_{Q_{J|X}\in\mathscr{Q}}I(U;Y|Z), & \text{if }D\in[D_{0},D_{1}],\\ 0, & \text{if }D>D_{1},\end{cases}$$

$$R_{L}^{*}(D):=\begin{cases}\displaystyle\max_{Q_{J|X}\in\mathscr{Q}}\ \min_{P_{U|Y},\,\tilde{x}(\cdot,\cdot)}I(U;Y|Z), & \text{if }D\in[D_{0},D_{1}],\\ 0, & \text{if }D>D_{1},\end{cases}$$

$$R_{L}^{*}(D)\leq R(D)\leq R_{U}^{*}(D).$$

$$R^{(P_{U|Y},\tilde{x})}=\max_{Q_{J|X}\in\mathscr{Q}}\big(I_{P_{Y}}(U;Y)-I_{Q_{J|X}}(U;Z)\big)\geq\max_{Q_{J|X}\in\mathscr{Q}}\ \max_{\substack{P'_{Y}\in\mathcal{P}(\mathcal{Y})\\ P'_{Y}\stackrel{f(\epsilon)}{\approx}P_{Y}}}\big(I_{P'_{Y}}(U;Y)-I_{Q_{J|X}}(U;Z)\big)-\frac{\epsilon}{4},$$

$$R^{(P_{U|Y},\tilde{x})}\geq\max_{P'_{Y}\in\mathcal{P}(\mathcal{Y})}\ \max_{\substack{Q_{J|X}\in\mathscr{Q}\\ P_{Y}\stackrel{f(\epsilon)}{\approx}P'_{Y}}}\big(I_{P'_{Y}}(U;Y)-I_{Q_{J|X}}(U;Z)\big)-\frac{\epsilon}{4}\geq\max_{P'_{Y}\in\mathcal{P}(\mathcal{Y})}\Big[I_{P'_{Y}}(U;Y)-\min_{\substack{Q_{J|X}\in\mathscr{Q}\\ P_{Y}\stackrel{f(\epsilon)}{\approx}P'_{Y}}}I_{Q_{J|X}}(U;Z)\Big]-\frac{\epsilon}{4}.$$

$$R\leq R^{(P_{U|Y},\tilde{x})}+\epsilon.$$

$$\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\big[d(X_{i},\widetilde{X}_{i})\big]\leq D^{(n)}$$

$$Q_{\mathbf{J}|\mathbf{X}}(\mathbf{j}|\mathbf{x}):=\prod_{i=1}^{n}Q_{J|X}(j_{i}|x_{i}).$$

$$R\geq F(D^{(n)},Q_{J|X}).$$

$$R(D)\geq F(D,Q_{J|X}).$$

$$\|T_{\mathbf{U}_{m,l},\mathbf{Y}}-P_{U|Y}T_{\mathbf{Y}}\|_{\infty}\leq\delta_{2}(\delta).$$

$$\mathcal{L}_{\gamma(\delta)}(m,\mathbf{z})=\Big\{\mathbf{u}\in\mathcal{B}^{(T_{\mathbf{y}})}_{m}:\ \ldots\Big\}$$

$$P(E)\leq P(E_{\mathrm{enc}})+P(E_{\mathrm{dec},1}\mid E_{\mathrm{enc}}^{c})+P(E_{\mathrm{dec},2}\mid E_{\mathrm{enc}}^{c}).$$

$$P\big(\mathbf{U}_{M,l'}\in\mathcal{L}_{\gamma(\delta)}(M,\mathbf{Z})\text{ for some }l'\neq L\big)\leq 2^{-nf_{2}(\delta,\epsilon)}.$$

$$P\big((\mathbf{s},\mathbf{T})\in\mathcal{T}^{n}_{3\delta_{0}}(P_{S}W_{T|X})\big)\geq 1-|\mathcal{S}||\mathcal{T}|e^{-2n\delta_{0}^{3}}.$$


Full text


Amitalok J. Budkuley, Bikash Kumar Dey and Vinod M. Prabhakaran

(This work was done when Amitalok J. Budkuley was with the Dept. of Electrical Engineering at IIT Bombay, Mumbai, India.)



I Introduction

Consider the communication scenario depicted in Fig. 1.

A memoryless source outputs an independent and identically distributed (i.i.d.) data vector $\mathbf{X}$, which is broadcast over a memoryless channel $W_{Y,Z|X,J}$. Apart from the source, this channel has an input $\mathbf{J}$ from an adversary, and it has two outputs $\mathbf{Y}$ and $\mathbf{Z}$. The output $\mathbf{Y}$ is fed to the source encoder, which encodes it into a message $M$. The decoder receives the other output $\mathbf{Z}$ and the message $M$, and wants to reconstruct $\mathbf{X}$ under an average distortion criterion. The adversary knows $\mathbf{X}$ non-causally and may employ randomized vector jamming strategies arbitrarily correlated with it, thereby inducing an arbitrarily varying channel (AVC) [1]. As is common in AVC channel coding problems, we first study this setup under randomized coding, where the encoder and the decoder share an unbounded amount of randomness $\Theta$ that is unknown to the adversary [1]. We prove a maximin lower bound and a minimax upper bound on the rate distortion function of this arbitrarily varying remote source under randomized coding.

In standard source coding scenarios involving noisy observations (e.g., noisy source coding [2] or source coding with side information [3]), the noise statistics are known a priori. In our setup, however, the jamming signal of the malicious adversary renders these statistics arbitrary and unknown, which makes the analysis considerably more challenging. Furthermore, as depicted in Fig. 1, the jamming signal affects both observations $\mathbf{Y}$ and $\mathbf{Z}$. Thus, the adversary in our problem can jointly degrade the compression and the decoding/estimation phases of communication.

Lossy source coding has been studied extensively since the seminal work by Shannon [4], and the field has subsequently been advanced in many directions (cf. [5, 6]). Apart from noisy source coding [2] and source coding with side information [3], some of the other prominent directions related to this work include source coding under several distortion measures [7] and universal source coding [8]. Particularly relevant are the compound and universal coding formulations that have appeared for classical coding, noisy/indirect coding, coding under several distortion measures, and coding with side information (cf. [9, 7, 10] and some of the references therein). Our problem is also directly connected to universal noisy source coding problems, which present a wider set of challenges than their noise-free counterparts (cf. [7]). Another closely related model is the arbitrarily varying source (AVS) introduced in [11], which is further studied under variable rate codes in [12]. Berger [13] introduced a different AVS, inspired by an adversary capable of switching among different sources. In his problem, a multiplexer whose inputs are several memoryless sources with a common alphabet feeds its single output to the encoder, and the multiplexer is controlled by a strictly causal switching adversary. An extension of these results to adversaries with causal as well as non-causal knowledge of the data subsequently appeared in [14].

The rest of the paper is organized as follows. In Section II, we first introduce the notation and problem setup. We state our main result in Section III. The proof of our main result is presented in Section IV. Finally, we discuss some implications of our work, and make concluding remarks in Section V.

II Notation and Problem Setup

II-A Notation and Preliminaries

We denote random variables by upper case letters (e.g., $X$), the values they take by lower case letters (e.g., $x$) and their alphabets by calligraphic letters (e.g., $\mathcal{X}$). We use boldface notation to denote random vectors (e.g., $\mathbf{X}$) and their values (e.g., $\mathbf{x}$). Here, the vectors are of length $n$ (e.g., $\mathbf{X}=(X_{1},X_{2},\dots,X_{n})$), where $n$ is the block length of operation. Let us also denote $\mathbf{X}^{i}=(X_{1},X_{2},\dots,X_{i})$ and $\mathbf{x}^{i}=(x_{1},x_{2},\dots,x_{i})$ as well as $\mathbf{X}_{i}^{k}=(X_{i},X_{i+1},\dots,X_{k})$ and $\mathbf{x}_{i}^{k}=(x_{i},x_{i+1},\dots,x_{k})$. For vectors, we use the $\ell_{\infty}$ norm, denoted by $\|\cdot\|_{\infty}$. For a set $\mathcal{X}$, let $\mathcal{P}(\mathcal{X})$ be the set of all probability distributions on $\mathcal{X}$. Similarly, let $\mathcal{P}(\mathcal{X}|\mathcal{Y})$ be the set of all conditional distributions of a random variable with alphabet $\mathcal{X}$ conditioned on another random variable with alphabet $\mathcal{Y}$. For two random variables $X$ and $Y$, we denote the marginal distribution of $X$ obtained from the joint distribution $P_{X,Y}$ by $[P_{X,Y}]_{X}$. Distributions corresponding to strategies adopted by the adversary are denoted by $Q$ instead of $P$ for clarity. The set of all conditional distributions $\mathcal{P}(\mathcal{J}|\mathcal{X})$ is specifically denoted by $\mathscr{Q}$. In cases where the subscripts are clear from the context, we sometimes omit them to keep the notation simple.

Deterministic functions will be denoted in lowercase (e.g., $f$). We denote a type of $X$ by $T_{X}$. Given sequences $\mathbf{x}$, $\mathbf{y}$, we denote by $T_{\mathbf{x}}$ the type of $\mathbf{x}$, by $T_{\mathbf{x},\mathbf{y}}$ the joint type of $(\mathbf{x},\mathbf{y})$ and by $T_{\mathbf{x}|\mathbf{y}}$ the conditional type of $\mathbf{x}$ given $\mathbf{y}$. For $\epsilon\in(0,1)$, the set of $\epsilon$-typical $\mathbf{x}$ sequences for a distribution $P_{X}$ is $\mathcal{T}^{n}_{\epsilon}(P_{X})=\{\mathbf{x}:\|T_{\mathbf{x}}-P_{X}\|_{\infty}\leq\epsilon\}.$ In addition, for a joint distribution $P_{X,Y}$ and $\mathbf{x}\in\mathcal{X}^{n}$, the set of $\mathbf{y}$ sequences that are conditionally typical given $\mathbf{x}$ is defined as $\mathcal{T}^{n}_{\epsilon}(P_{X,Y}|\mathbf{x})=\{\mathbf{y}:\|T_{\mathbf{x},\mathbf{y}}-P_{X,Y}\|_{\infty}\leq\epsilon\}.$

II-B The Problem Setup

Refer to the communication setup depicted in Fig. 1. Let $\mathcal{X}$, $\mathcal{Y}$, $\mathcal{Z}$, $\mathcal{J}$ and $\mathcal{\widetilde{X}}$ denote finite sets. Consider an i.i.d. source with distribution $P_{X}$ and alphabet $\mathcal{X}$. We assume without loss of generality that $P_{X}(x)>0$, $\forall x\in\mathcal{X}$. A length-$n$ block of data $\mathbf{X}$ is sent over a noisy AVC. This channel has two inputs $X\in\mathcal{X}$ and $J\in\mathcal{J}$ and two outputs $Y\in\mathcal{Y}$ and $Z\in\mathcal{Z}$, and its behaviour is given by the memoryless conditional distribution $W_{Y,Z|X,J}$. In Fig. 1, the two inputs $\mathbf{X}$ and $\mathbf{J}$ come from the source and the jamming adversary, respectively. The output $\mathbf{Y}$ is available at the encoder and $\mathbf{Z}$ is available at the decoder. We assume that the adversary knows $\mathbf{X}$ non-causally. Given inputs $\mathbf{x}$ and $\mathbf{j}$, we observe $\mathbf{y}$ and $\mathbf{z}$ with probability given by

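$$P(\mathbf{Y}=\mathbf{y},\mathbf{Z}=\mathbf{z}\mid\mathbf{X}=\mathbf{x},\mathbf{J}=\mathbf{j})=\prod_{i=1}^{n}W_{Y,Z|X,J}(y_{i},z_{i}\mid x_{i},j_{i}).$$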
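The product form of the channel law can be checked with a short Python sketch. The binary alphabets and the toy channel `toy_W` (the jammer's symbol is XOR-ed into $Y$, which then passes through a BSC(0.1), while $Z$ is a noiseless copy of $X$) are hypothetical choices for illustration only, not taken from the paper.

```python
import itertools
import math


def toy_W(y, z, x, j):
    """Hypothetical W_{Y,Z|X,J}: Y = (X xor J) through a BSC(0.1); Z = X noiselessly."""
    p_y = 0.9 if y == (x ^ j) else 0.1
    p_z = 1.0 if z == x else 0.0
    return p_y * p_z


def channel_likelihood(y, z, x, j, W=toy_W):
    """P(Y=y, Z=z | X=x, J=j) = prod_i W(y_i, z_i | x_i, j_i) for a memoryless channel."""
    if not (len(y) == len(z) == len(x) == len(j)):
        raise ValueError("all sequences must have the same block length n")
    return math.prod(W(yi, zi, xi, ji) for yi, zi, xi, ji in zip(y, z, x, j))


x = (0, 1, 1, 0)
j = (1, 0, 0, 0)
y = (1, 1, 1, 1)  # agrees with x xor j in the first three positions only
z = (0, 1, 1, 0)
print(channel_likelihood(y, z, x, j))  # approximately 0.0729

# For fixed (x, j), the likelihoods over all (y, z) pairs sum to one.
n = len(x)
total = sum(channel_likelihood(yy, zz, x, j)
            for yy in itertools.product((0, 1), repeat=n)
            for zz in itertools.product((0, 1), repeat=n))
print(round(total, 12))
```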

The encoder compresses $\mathbf{Y}$ and transmits a message $M$ losslessly to the decoder. Using $M$ and the available side information $\mathbf{Z}$ , the decoder outputs an estimate $\mathbf{\widetilde{X}}$ . The quality of the estimate is measured in terms of the average per-letter distortion $d(\mathbf{X},\mathbf{\widetilde{X}})=\frac{1}{n}\sum_{i=1}^{n}d(X_{i},\widetilde{X}_{i}),$ where $d:\mathcal{X}\times\mathcal{\widetilde{X}}\rightarrow\mathbb{R}^{+}$ denotes a single-letter distortion measure with $d_{\max}=\max_{(x,\tilde{x})\in\mathcal{X}\times\mathcal{\widetilde{X}}}d(x,\tilde{x})<\infty$ .
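As a concrete instance of the average per-letter distortion and of $d_{\max}$, the Python sketch below uses the Hamming distortion; that choice of $d$ and the helper names are assumptions for illustration, since the paper allows any bounded single-letter measure.

```python
def hamming(a, b):
    """Hypothetical single-letter distortion: Hamming, d(a, b) = 1 if a != b else 0."""
    return 0.0 if a == b else 1.0


def avg_distortion(x, x_hat, d=hamming):
    """Average per-letter distortion d(x, x_hat) = (1/n) * sum_i d(x_i, x_hat_i)."""
    if len(x) != len(x_hat) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length n")
    return sum(d(a, b) for a, b in zip(x, x_hat)) / len(x)


def d_max(alphabet_x, alphabet_x_hat, d=hamming):
    """d_max = max d(x, x_hat) over the finite product alphabet; finite by construction."""
    return max(d(a, b) for a in alphabet_x for b in alphabet_x_hat)


print(avg_distortion((0, 1, 1, 0), (0, 1, 0, 0)))  # one mismatch in four symbols: 0.25
print(d_max((0, 1), (0, 1)))                       # 1.0
```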

An $(n,R)$ deterministic code of block length $n$ and rate $R$ is a pair $(\psi,\phi)$ of mappings, consisting of the encoder map $\psi:\mathcal{Y}^{n}\rightarrow\{1,2,\dots,2^{nR}\},$ and the decoder map $\phi:\{1,2,\dots,2^{nR}\}\times\mathcal{Z}^{n}\rightarrow\widetilde{\mathcal{X}}^{n}.$ The encoder sends the message $M=\psi(\mathbf{Y})$ to the decoder over an error free channel. An $(n,R)$ randomized code of block length $n$ and rate $R$ is a random variable which takes values in the set of $(n,R)$ deterministic codes. We denote by $(\Psi,\Phi)$ the encoder and decoder for this $(n,R)$ randomized code. This forms the shared randomness $\Theta$ . The message sent is $M=\Psi(\mathbf{Y})$ . For this $(n,R)$ randomized code, the average distortion $D^{(n)}$ is given by

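$$D^{(n)}=\max_{Q_{\mathbf{J}|\mathbf{X}}}\mathbb{E}\left[d\big(\mathbf{X},\Phi(\Psi(\mathbf{Y}),\mathbf{Z})\big)\right],$$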

where the expectation is over the shared randomness $\Theta=(\Psi,\Phi)$ , the source, the channel and the adversary’s jamming action. Given a target distortion $D$ , a rate $R$ is achievable if for any $\epsilon>0$ there exists an $n_{0}(\epsilon)$ such that for every $n\geq n_{0}(\epsilon)$ there exists an $(n,R)$ randomized code with the resulting average distortion $D^{(n)}\leq D+\epsilon$ . We define the rate distortion function $R(D)$ as the infimum of all achievable rates. Our aim is to determine the rate distortion function $R(D)$ .

III The Main Result

Recall that $\mathscr{Q}=\mathcal{P}(\mathcal{J}|\mathcal{X})$ denotes the set of all conditional distributions of $J$ given $X$ . For any distribution $Q_{J|X}\in\mathscr{Q}$ , the system model gives the single-letter joint distribution $P_{X}Q_{J|X}W_{Y,Z|X,J}$ . Let

[TABLE]

Here $D_{0}$ is the minimax average distortion when both $\mathbf{Y}$ and $\mathbf{Z}$ are available at the decoder, while $D_{1}$ is the minimax distortion when the decoder has access only to the side information $\mathbf{Z}$ (see the discussion in Sec. IV-A). Let $U$ be an auxiliary random variable with a finite alphabet $\mathcal{U}$ and conditional distribution $P_{U|Y}$, such that $(X,J,Z)\leftrightarrow Y\leftrightarrow U$ forms a Markov chain. The joint distribution of $(X,J,Y,Z,U)$ is then given by $P_{X}Q_{J|X}W_{Y,Z|X,J}P_{U|Y}$. We now define the following:

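$$R_{U}^{*}(D):=\begin{cases}\displaystyle\min_{P_{U|Y},\,\tilde{x}(\cdot,\cdot)}\ \max_{Q_{J|X}\in\mathscr{Q}}I(U;Y|Z), & \text{if }D\in[D_{0},D_{1}],\\ 0, & \text{if }D>D_{1},\end{cases}$$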

where the minimization is over $P_{U|Y}\in\mathcal{P}(\mathcal{U}|\mathcal{Y})$ and $\tilde{x}:\mathcal{U}\times\mathcal{Z}\rightarrow\widetilde{\mathcal{X}}$ such that $\mathbb{E}[d(X,\tilde{x}(U,Z))]\leq D,~\forall Q_{J|X}\in\mathscr{Q}$. We may restrict the cardinality of $U$ to $|\mathcal{U}|\leq|\widetilde{\mathcal{X}}|^{|\mathcal{Z}|}$, which is the number of possible functions from $\mathcal{Z}$ to $\widetilde{\mathcal{X}}$: merging values of $U$ that induce the same map $z\mapsto\tilde{x}(u,z)$ leaves the distortion unchanged under every $Q_{J|X}$ and does not increase $I(U;Y|Z)$.

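$$R_{L}^{*}(D):=\begin{cases}\displaystyle\max_{Q_{J|X}\in\mathscr{Q}}\ \min_{P_{U|Y},\,\tilde{x}(\cdot,\cdot)}I(U;Y|Z), & \text{if }D\in[D_{0},D_{1}],\\ 0, & \text{if }D>D_{1},\end{cases}$$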

where the minimization is over $P_{U|Y}\in\mathcal{P}(\mathcal{U}|\mathcal{Y})$ and $\tilde{x}:\mathcal{U}\times\mathcal{Z}\rightarrow\widetilde{\mathcal{X}}$ such that $\mathbb{E}[d(X,\tilde{x}(U,Z))]\leq D$ for the specified $Q_{J|X}$ . Here, we may restrict the cardinality of $U$ to $|\mathcal{U}|\leq|\mathcal{Y}|+1$ ; this cardinality bound follows in a manner similar to [3]. We next state our main result.

Theorem 1.

The adversarial rate distortion function $R(D)$ for the arbitrarily varying remote source problem in Fig. 1 under randomized coding satisfies

$$R_{L}^{*}(D)\leq R(D)\leq R_{U}^{*}(D).$$

Our setup can be viewed as an “arbitrarily varying remote” version of the Wyner-Ziv setup [3], in which both the input to the encoder and the side information are corrupted by the adversary. Limiting the adversary’s control to only one of these (i.e., $Y$ or $Z$) gives two interesting special cases. If the adversary controls only $Y$, i.e., $W_{Y,Z|X,J}=W_{Y|X,J}P_{Z|X}$, then the order of maximum and minimum can be interchanged. This is a consequence of the convexity-concavity properties of $I(U;Y)-I(U;Z)$: specifically, $I(U;Y)-I(U;Z)$ is concave in $Q_{J|X}$ and convex in $P_{U|Y}$ (to have a convex domain, the minimization must be rewritten as a minimization over $P_{U|Y}$ only, where the alphabet of $U$ is the set of Shannon strategies at the decoder; see [15] for details). We can then use the minimax theorem [16] to conclude that the minimax and the maximin are equal. Similarly, if the adversary controls only $Z$, that is, when only the side information is arbitrarily varying ($W_{Y,Z|X,J}=P_{Y|X}W_{Z|X,J}$), one can again show that $I(U;Y)-I(U;Z)$ is convex in $P_{U|Y}$ and concave in $Q_{J|X}$, so the maximum and minimum can again be interchanged. In both special cases, the upper and lower bounds in Theorem 1 match and thus characterize the optimal rate.
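The identity $I(U;Y|Z)=I(U;Y)-I(U;Z)$, used in the discussion above and again when the rate is rewritten in Section IV-A, relies on the Markov chain $(X,J,Z)\leftrightarrow Y\leftrightarrow U$. The Python sketch below checks it numerically for one fixed jamming strategy. The binary alphabets, `W`, `Q_JgX` and `P_UgY` are hypothetical choices, not taken from the paper.

```python
from itertools import product
from math import log2

# Hypothetical binary example: coordinates of the joint pmf are (X, J, Y, Z, U) = (0, 1, 2, 3, 4).
BIN = (0, 1)
P_X = {0: 0.5, 1: 0.5}
Q_JgX = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.6}       # Q_{J|X}(j|x), keyed (j, x)
P_UgY = {(u, y): 0.8 if u == y else 0.2 for u in BIN for y in BIN}  # P_{U|Y}(u|y), keyed (u, y)


def W(y, z, x, j):
    """Hypothetical W_{Y,Z|X,J}: Y = X xor J noiselessly, Z = X through a BSC(0.25)."""
    return (1.0 if y == x ^ j else 0.0) * (0.75 if z == x else 0.25)


# Joint pmf P_X Q_{J|X} W_{Y,Z|X,J} P_{U|Y}: U depends on (X, J, Z) only through Y.
joint = {}
for x, j, y, z, u in product(BIN, repeat=5):
    p = P_X[x] * Q_JgX[(j, x)] * W(y, z, x, j) * P_UgY[(u, y)]
    if p > 0:
        joint[(x, j, y, z, u)] = p


def marginal(coords):
    """Marginal pmf of the joint on the given coordinate indices."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[c] for c in coords)
        out[sub] = out.get(sub, 0.0) + p
    return out


def cond_mutual_info(a, b, c=()):
    """I(A;B|C) in bits, with A, B, C given as tuples of coordinate indices."""
    p_abc, p_ac, p_bc, p_c = marginal(a + b + c), marginal(a + c), marginal(b + c), marginal(c)
    total = 0.0
    for key, p in p_abc.items():
        ka, kb, kc = key[:len(a)], key[len(a):len(a) + len(b)], key[len(a) + len(b):]
        total += p * log2(p * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total


i_cond = cond_mutual_info((4,), (2,), (3,))                            # I(U;Y|Z)
i_diff = cond_mutual_info((4,), (2,)) - cond_mutual_info((4,), (3,))   # I(U;Y) - I(U;Z)
print(f"I(U;Y|Z) = {i_cond:.6f}, I(U;Y) - I(U;Z) = {i_diff:.6f}")
```

The two printed values agree because $I(U;Y,Z)=I(U;Y)$ under the Markov chain, and the chain rule gives $I(U;Y,Z)=I(U;Z)+I(U;Y|Z)$.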

IV Proof of Theorem 1

IV-A Achievability

We present an outline of the achievability proof. The detailed proof can be found in Appendix A. Observe that if $D>D_{1}$ , then we can estimate $\mathbf{X}$ using an estimator $\tilde{x}(z)$ based solely on the side information $\mathbf{Z}$ . Thus, for $D>D_{1}$ , $R(D)=0$ .

Let us now assume that $D_{1}\geq D\geq D_{0}$ . We fix an arbitrary $P_{U|Y}$ and $\tilde{x}(u,z)$ , and prove the achievability of the rate

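$$R^{(P_{U|Y},\tilde{x})}:=\max_{Q_{J|X}\in\mathscr{Q}}I(U;Y|Z)=\max_{Q_{J|X}\in\mathscr{Q}}\big(I(U;Y)-I(U;Z)\big),$$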

where the equality follows from the Markov chain $U\leftrightarrow Y\leftrightarrow Z$. Here and below, we write $I(U;Y)$ as a function of $P_{Y}$ only, since $P_{U|Y}$ is fixed in our discussion of achievability; for the same reason, we write $I(U;Z)$ as a function of $Q_{J|X}$ only, since $P_{X}$, $P_{U|Y}$ and $W_{Y,Z|X,J}$ are fixed in our discussion. With this notation, we rewrite this rate as

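$$R^{(P_{U|Y},\tilde{x})}=\max_{Q_{J|X}\in\mathscr{Q}}\big(I_{P_{Y}}(U;Y)-I_{Q_{J|X}}(U;Z)\big)\geq\max_{Q_{J|X}\in\mathscr{Q}}\ \max_{\substack{P'_{Y}\in\mathcal{P}(\mathcal{Y})\\ P'_{Y}\stackrel{f(\epsilon)}{\approx}P_{Y}}}\big(I_{P'_{Y}}(U;Y)-I_{Q_{J|X}}(U;Z)\big)-\frac{\epsilon}{4},$$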

where we have $P_{Y}:=\left[P_{X}Q_{J|X}W_{Y,Z|X,J}\right]_{Y}$ . Note that $P_{Y}$ is a function of $Q_{J|X}$ . Here, we write $P_{Y}^{\prime}\stackrel{{\scriptstyle f(\epsilon)}}{{\approx}}P_{Y}$ to mean that $||P_{Y}^{\prime}-P_{Y}||_{\infty}\leq f(\epsilon)$ . We have used a function $f(\cdot)$ such that $f(\epsilon)>0$ for $\epsilon>0$ , and $|I_{P_{Y}^{\prime}}(U;Y)-I_{P_{Y}}(U;Y)|\leq\epsilon/4$ if $P_{Y}^{\prime}\stackrel{{\scriptstyle f(\epsilon)}}{{\approx}}P_{Y}$ . The existence of such a function follows from the uniform continuity of $I(U;Y)$ as a function of $P_{Y}$ for fixed $P_{U|Y}$ . Now interchanging the maximizations, we get

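$$R^{(P_{U|Y},\tilde{x})}\geq\max_{P'_{Y}\in\mathcal{P}(\mathcal{Y})}\ \max_{\substack{Q_{J|X}\in\mathscr{Q}\\ P_{Y}\stackrel{f(\epsilon)}{\approx}P'_{Y}}}\big(I_{P'_{Y}}(U;Y)-I_{Q_{J|X}}(U;Z)\big)-\frac{\epsilon}{4}\geq\max_{P'_{Y}\in\mathcal{P}(\mathcal{Y})}\Big[I_{P'_{Y}}(U;Y)-\min_{\substack{Q_{J|X}\in\mathscr{Q}\\ P_{Y}\stackrel{f(\epsilon)}{\approx}P'_{Y}}}I_{Q_{J|X}}(U;Z)\Big]-\frac{\epsilon}{4}.$$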

Now for every type $T_{Y}\in\mathcal{P}(\mathcal{Y})$ , we define

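$$R_{U}(T_{Y}):=I_{T_{Y}P_{U|Y}}(U;Y)+\frac{\epsilon}{4},$$

$$\tilde{R}(T_{Y}):=\min_{\substack{Q_{J|X}\in\mathscr{Q}\\ [P_{X}Q_{J|X}W_{Y,Z|X,J}]_{Y}\stackrel{f(\epsilon)}{\approx}T_{Y}}}I_{Q_{J|X}}(U;Z)-\frac{\epsilon}{4}.$$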
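For intuition about $R_{U}(T_{Y})$ and $\tilde{R}(T_{Y})$, the following Python sketch evaluates both for a toy binary model in which the adversary controls only $Y$. Everything concrete here is hypothetical: the channel `W`, the fixed $P_{U|Y}$, the values of $\epsilon$ and $f(\epsilon)$, and especially the replacement of the minimum over $\mathscr{Q}$ by a finite grid of jamming strategies. It illustrates the definitions only and is not a procedure from the paper.

```python
from itertools import product
from math import log2

BIN = (0, 1)
P_X = {0: 0.5, 1: 0.5}
P_UgY = {(u, y): 0.9 if u == y else 0.1 for u in BIN for y in BIN}  # hypothetical fixed P_{U|Y}


def W(y, z, x, j):
    """Hypothetical W_{Y,Z|X,J}: J = 1 flips Y w.p. 0.3 (J = 0 leaves Y = X); Z = X through a BSC(0.2)."""
    flip = 0.3 if j == 1 else 0.0
    return (flip if y != x else 1.0 - flip) * (0.8 if z == x else 0.2)


def mutual_info(p_ab):
    """I(A;B) in bits for a joint pmf {(a, b): p}."""
    p_a, p_b = {}, {}
    for (a, b), p in p_ab.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return sum(p * log2(p / (p_a[a] * p_b[b])) for (a, b), p in p_ab.items() if p > 0)


def induced(q):
    """Y-marginal and (U, Z) pmf of P_X Q_{J|X} W P_{U|Y}, where Q_{J|X}(1|x) = q[x]."""
    p_y = {y: 0.0 for y in BIN}
    p_uz = {(u, z): 0.0 for u in BIN for z in BIN}
    for x, j, y, z, u in product(BIN, repeat=5):
        p = P_X[x] * (q[x] if j == 1 else 1.0 - q[x]) * W(y, z, x, j) * P_UgY[(u, y)]
        p_y[y] += p  # summing over u is harmless since sum_u P_{U|Y}(u|y) = 1
        p_uz[(u, z)] += p
    return p_y, p_uz


def rates(T_Y, eps, f_eps, grid=10):
    """R_U(T_Y) = I_{T_Y P_{U|Y}}(U;Y) + eps/4 and R~(T_Y) = min I_Q(U;Z) - eps/4 over the Q whose
    Y-marginal is within f_eps of T_Y in l_inf. The Q's form a grid with step 1/grid, an
    assumption standing in for the minimization in the definition."""
    R_U = mutual_info({(u, y): T_Y[y] * P_UgY[(u, y)] for u in BIN for y in BIN}) + eps / 4
    values = []
    for a, b in product(range(grid + 1), repeat=2):
        p_y, p_uz = induced({0: a / grid, 1: b / grid})
        if max(abs(p_y[y] - T_Y[y]) for y in BIN) <= f_eps:
            values.append(mutual_info(p_uz))
    R_tilde = min(values) - eps / 4 if values else float("inf")
    return R_U, R_tilde


R_U, R_tilde = rates({0: 0.5, 1: 0.5}, eps=0.04, f_eps=0.05)
print(f"R_U = {R_U:.4f}, R_tilde = {R_tilde:.4f}, bin-index rate R_U - R_tilde = {R_U - R_tilde:.4f}")
```

If no jamming strategy on the grid produces a $Y$-marginal close to $T_{Y}$, the sketch returns an infinite $\tilde{R}(T_{Y})$, mirroring a minimum over an empty set.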

Code Construction:

•

We will now describe the generation of a random code. We assume that both the encoder and decoder share the ensemble of all possible such codes, and they jointly select a code at random from this ensemble using their shared randomness $\Theta$ . This is equivalent to generating the code randomly and then sharing it between the encoder and the decoder.

•

For each type $T_{Y}\in\mathcal{P}(\mathcal{Y})$ , we generate $2^{nR_{U}(T_{Y})}$ vectors i.i.d. $\sim P_{U}$ , where $P_{U}:=\left[T_{Y}P_{U|Y}\right]_{U}$ , to form the codebook $\mathcal{C}(T_{Y})$ . The codebook $\mathcal{C}(T_{Y})$ is randomly partitioned into $2^{n(R_{U}(T_{Y})-\tilde{R}(T_{Y}))}$ bins.

•

The randomly generated code containing the list of binned codebooks for each $T_{Y}$ is shared between the encoder and the decoder.
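The code construction for a single type $T_{Y}$ can be sketched in Python as follows. The block length, the rates and $P_{U}$ are hypothetical small values chosen so that the codebook fits in memory, the rates are assumed to make $2^{nR_{U}}$ and $2^{n\tilde{R}}$ integers, and the outer loop over all types $T_{Y}$ (each with its own $P_{U}=[T_{Y}P_{U|Y}]_{U}$) is omitted. The helper name is illustrative.

```python
import random


def make_binned_codebook(n, R_U, R_tilde, P_U, rng):
    """Hypothetical helper: draw 2^{n R_U} codewords i.i.d. ~ P_U and split them at random
    into 2^{n (R_U - R_tilde)} bins of 2^{n R_tilde} codewords each."""
    num_codewords = round(2 ** (n * R_U))
    bin_size = round(2 ** (n * R_tilde))
    if bin_size < 1 or num_codewords % bin_size:
        raise ValueError("rates must give an integer number of equal-size bins")
    symbols, weights = zip(*P_U.items())
    codebook = [tuple(rng.choices(symbols, weights=weights, k=n)) for _ in range(num_codewords)]
    order = list(range(num_codewords))
    rng.shuffle(order)  # uniformly random assignment of codewords to bins
    return [[codebook[i] for i in order[b * bin_size:(b + 1) * bin_size]]
            for b in range(num_codewords // bin_size)]


bins = make_binned_codebook(n=8, R_U=0.75, R_tilde=0.25, P_U={0: 0.5, 1: 0.5},
                            rng=random.Random(0))
print(len(bins), len(bins[0]))  # 16 bins of 4 codewords each
```

Because the codewords are i.i.d., shuffling and cutting into consecutive blocks gives the same distribution as indexing the codewords directly by bin and position.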

Encoder operations:

•

The encoder, upon observing a vector $\mathbf{y}$ , computes its type $T_{\mathbf{y}}$ . It checks if there is at least one codeword in $\mathcal{C}(T_{\mathbf{y}})$ which is jointly typical with $\mathbf{y}$ with respect to (w.r.t.) the joint distribution $T_{\mathbf{y}}P_{U|Y}$ . The encoder then sends $T_{\mathbf{y}}$ and the bin index of such a codeword in $\mathcal{C}(T_{\mathbf{y}})$ , selecting one uniformly at random if there is more than one possibility.

•

Since there are at most a polynomial number of types, for large enough $n$ , the rate required to convey $T_{\mathbf{y}}$ is at most $\epsilon/4$ . So, the rate of the full message is bounded as

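$$R\leq R_{U}(T_{\mathbf{y}})-\tilde{R}(T_{\mathbf{y}})+\frac{\epsilon}{4}\leq R^{(P_{U|Y},\tilde{x})}+\epsilon.$$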
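The claim that conveying $T_{\mathbf{y}}$ costs a vanishing rate can be checked numerically. The Python sketch below counts the types of length-$n$ sequences over an alphabet of size $|\mathcal{Y}|$, namely $\binom{n+|\mathcal{Y}|-1}{|\mathcal{Y}|-1}\leq(n+1)^{|\mathcal{Y}|}$, and reports the per-symbol rate of a fixed-length index for the type; the alphabet size 3 and the function names are arbitrary illustrative choices.

```python
from math import comb


def num_types(n, alphabet_size):
    """Number of types of length-n sequences over an alphabet of the given size."""
    return comb(n + alphabet_size - 1, alphabet_size - 1)


def type_index_rate(n, alphabet_size):
    """Bits per symbol of a fixed-length index for the type: ceil(log2(#types)) / n."""
    count = num_types(n, alphabet_size)
    return (count - 1).bit_length() / n  # (count - 1).bit_length() == ceil(log2(count))


for n in (10, 100, 1000, 10000):
    assert num_types(n, 3) <= (n + 1) ** 3  # the polynomial bound on the number of types
    print(n, num_types(n, 3), type_index_rate(n, 3))
```

The printed rate shrinks roughly like $(|\mathcal{Y}|-1)\log_{2}n/n$, so it drops below $\epsilon/4$ once $n$ is large enough.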

Decoder operations:

•

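The decoder knows $T_{\mathbf{y}}$ and the bin index sent by the encoder; it also knows the side information $\mathbf{Z}=\mathbf{z}$. The decoder identifies the set of conditional types

$$\mathscr{Q}^{(n)}(T_{\mathbf{y}}):=\Big\{T_{J|X}\in\mathscr{T}^{(n)}(\mathcal{J}|\mathcal{X}):[P_{X}T_{J|X}W_{Y,Z|X,J}]_{Y}\stackrel{f(\epsilon)}{\approx}T_{\mathbf{y}}\Big\},$$

that is, the conditional types whose resulting $Y$-marginal distribution is close to $T_{\mathbf{y}}$.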

•

The decoder then checks within the bin if there is a codeword $\mathbf{u}$ such that $(\mathbf{u},\mathbf{z})$ is jointly typical w.r.t. the distribution $\left[P_{X}T_{J|X}W_{Y,Z|X,J}P_{U|Y}\right]_{U,Z}$ for some type $T_{J|X}\in\mathscr{Q}^{(n)}(T_{\mathbf{y}})$ . If there is a unique such codeword $\mathbf{u}$ , then it chooses that codeword, otherwise it chooses an arbitrary codeword $\mathbf{u}$ from the bin. Using this codeword $\mathbf{u}$ and $\mathbf{z}$ , it then outputs $\tilde{\mathbf{x}}$ , where $\tilde{x}_{i}=\tilde{x}(u_{i},z_{i})$ , $i=1,2,\dots,n$ .
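The typicality tests in the encoder and decoder use the $\ell_{\infty}$ notion of Sec. II-A. The following Python sketch implements that test on joint types together with the decoder's search within a bin. The function names are hypothetical, `candidate_P_UZ` stands in for the pmfs $[P_{X}T_{J|X}W_{Y,Z|X,J}P_{U|Y}]_{U,Z}$ for $T_{J|X}\in\mathscr{Q}^{(n)}(T_{\mathbf{y}})$ (assumed precomputed), and the toy sequences are illustrative.

```python
from collections import Counter


def joint_type(*seqs):
    """Joint type (empirical pmf) of equal-length sequences, as {(a, b, ...): frequency}."""
    n = len(seqs[0])
    if any(len(s) != n for s in seqs):
        raise ValueError("sequences must have equal length")
    return {k: c / n for k, c in Counter(zip(*seqs)).items()}


def is_jointly_typical(seqs, P, eps):
    """l_inf typicality as in Sec. II-A: ||T_seqs - P||_inf <= eps over the union of supports."""
    T = joint_type(*seqs)
    return max(abs(T.get(k, 0.0) - P.get(k, 0.0)) for k in set(T) | set(P)) <= eps


def decode_in_bin(bin_codewords, z, candidate_P_UZ, gamma):
    """Hypothetical decoder step: keep codewords u such that (u, z) is typical w.r.t. at least one
    candidate pmf on (U, Z); return the unique survivor, else fall back to the first codeword."""
    survivors = [u for u in bin_codewords
                 if any(is_jointly_typical((u, z), P, gamma) for P in candidate_P_UZ)]
    return survivors[0] if len(survivors) == 1 else bin_codewords[0]


P_UZ = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}  # hypothetical [P_X T W P_{U|Y}]_{U,Z}
z = (0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
u_right = (0, 0, 0, 0, 1, 1, 1, 1, 1, 0)  # joint type with z equals P_UZ exactly
u_wrong = (1, 1, 1, 1, 1, 0, 0, 0, 0, 0)
print(decode_in_bin([u_wrong, u_right], z, [P_UZ], gamma=0.1) == u_right)  # True
```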

Average distortion analysis:

•

We first analyse the error probability in decoding the right codeword $\mathbf{u}$. A decoding error can occur due to three possibilities:

1. The encoder does not find any codeword $\mathbf{u}\in\mathcal{C}(T_{\mathbf{y}})$ that is jointly typical with $\mathbf{y}$ w.r.t. $T_{\mathbf{y}}P_{U|Y}$. The probability that there is no such codeword in $\mathcal{C}(T_{\mathbf{y}})$ is exponentially small (by the covering lemma) since $R_{U}(T_{\mathbf{y}})=I_{T_{\mathbf{y}}P_{U|Y}}(U;Y)+\epsilon/4$.

2. Let us assume that the encoder succeeded in finding a suitable codeword $\mathbf{u}$. For this correct codeword $\mathbf{u}$ and the actual conditional type $T_{\mathbf{j}|\mathbf{x}}$ instantiated by the adversary, we argue that $\mathbf{u}$ satisfies the decoding condition with high probability (w.h.p.; all our w.h.p. statements hold except for an exponentially small probability). First, w.h.p. $\mathbf{y}$ is typical w.r.t. $[P_{X}T_{\mathbf{j}|\mathbf{x}}W_{Y,Z|X,J}]_{Y}$, i.e., $T_{\mathbf{y}}$ is “close” to $[P_{X}T_{\mathbf{j}|\mathbf{x}}W_{Y,Z|X,J}]_{Y}$. In that case, $T_{\mathbf{j}|\mathbf{x}}\in\mathscr{Q}^{(n)}(T_{\mathbf{y}})$ is one of the conditional types considered by the decoder for the code associated with $T_{\mathbf{y}}$. Second, w.h.p. $(\mathbf{y},\mathbf{u})$ is jointly typical w.r.t. $T_{\mathbf{y}}P_{U|Y}$, and so it is also jointly typical w.r.t. the distribution $[P_{X}T_{\mathbf{j}|\mathbf{x}}W_{Y,Z|X,J}]_{Y}P_{U|Y}$ (though with a bigger slack). Now, using a version of the refined Markov lemma [17, Lemma 5], it follows that w.h.p. $(\mathbf{x},\mathbf{j},\mathbf{y},\mathbf{u},\mathbf{z})$ is jointly typical w.r.t. $P_{X}T_{\mathbf{j}|\mathbf{x}}W_{Y,Z|X,J}P_{U|Y}$. In particular, $(\mathbf{u},\mathbf{z})$ is jointly typical w.r.t. $[P_{X}T_{\mathbf{j}|\mathbf{x}}W_{Y,Z|X,J}P_{U|Y}]_{U,Z}$.

3. Now consider all the wrong codewords in the bin. For any type $Q_{J|X}\in\mathscr{Q}^{(n)}(T_{\mathbf{y}})$, the probability that at least one of the wrong codewords is jointly typical with $\mathbf{z}$ w.r.t. $[P_{X}Q_{J|X}W_{Y,Z|X,J}P_{U|Y}]_{U,Z}$ is exponentially small due to the choice of $\tilde{R}(T_{\mathbf{y}})$ (by the packing lemma). By taking a union bound over all (at most polynomially many) types in $\mathscr{Q}^{(n)}(T_{\mathbf{y}})$, the probability that any wrong codeword is jointly typical with $\mathbf{z}$ w.r.t. $[P_{X}Q_{J|X}W_{Y,Z|X,J}P_{U|Y}]_{U,Z}$ for some such $Q_{J|X}$ is exponentially small.

•

We now note that if $(\mathbf{x},\mathbf{j},\mathbf{y},\mathbf{u},\mathbf{z})$ is jointly typical w.r.t. $P_{X}T_{\mathbf{j}|\mathbf{x}}W_{Y,Z|X,J}P_{U|Y}$, then $(\mathbf{x},\mathbf{j},\mathbf{y},\mathbf{u},\mathbf{z},\tilde{\mathbf{x}})$ is jointly typical w.r.t. $P_{X}T_{\mathbf{j}|\mathbf{x}}W_{Y,Z|X,J}P_{U|Y}\mathbf{1}_{\{\widetilde{X}=\tilde{x}(U,Z)\}}$, since $\tilde{\mathbf{x}}$ is obtained from $(\mathbf{u},\mathbf{z})$ symbol by symbol; thus, $(\mathbf{x},\tilde{\mathbf{x}})$ is jointly typical w.r.t. the induced marginal on $(X,\widetilde{X})$. Finally, the average distortion $\mathbb{E}[d(\mathbf{X},\mathbf{\widetilde{X}})]$ is bounded using the typical average lemma.
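The last step uses the typical average lemma. Under the $\ell_{\infty}$ typicality of Sec. II-A, one version of it reads: if $\|T_{\mathbf{x},\tilde{\mathbf{x}}}-P_{X,\widetilde{X}}\|_{\infty}\leq\epsilon$, then $|d(\mathbf{x},\tilde{\mathbf{x}})-\mathbb{E}_{P}[d(X,\widetilde{X})]|\leq|\mathcal{X}||\widetilde{\mathcal{X}}|\,\epsilon\,d_{\max}$, because the empirical distortion equals $\sum_{a,b}T_{\mathbf{x},\tilde{\mathbf{x}}}(a,b)\,d(a,b)$. The Python sketch below checks this on toy data; the pmf `P`, the sequences, the Hamming distortion and the helper name are hypothetical, and the paper's own version of the lemma may use a different slack.

```python
from collections import Counter


def hamming(a, b):
    """Hypothetical single-letter distortion."""
    return 0.0 if a == b else 1.0


def typical_average_bound(x, x_hat, P, d, X, X_hat):
    """Return (empirical distortion, E_P[d], eps, bound), where eps = ||T_{x,x_hat} - P||_inf and
    bound = |X| |X_hat| eps d_max upper-bounds |empirical - E_P[d]|."""
    n = len(x)
    T = {k: c / n for k, c in Counter(zip(x, x_hat)).items()}
    pairs = [(a, b) for a in X for b in X_hat]
    eps = max(abs(T.get(k, 0.0) - P.get(k, 0.0)) for k in pairs)
    empirical = sum(T.get(k, 0.0) * d(*k) for k in pairs)  # equals (1/n) sum_i d(x_i, x_hat_i)
    expected = sum(P.get(k, 0.0) * d(*k) for k in pairs)
    d_max = max(d(*k) for k in pairs)
    return empirical, expected, eps, len(X) * len(X_hat) * eps * d_max


P = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}  # hypothetical joint pmf of (X, X~)
x = [0] * 9 + [1] * 8 + [0] * 2 + [1] * 1
x_hat = [0] * 9 + [1] * 8 + [1] * 2 + [0] * 1
emp, exp_d, eps, bound = typical_average_bound(x, x_hat, P, hamming, (0, 1), (0, 1))
print(f"empirical {emp:.3f}, expected {exp_d:.3f}, eps {eps:.3f}, bound {bound:.3f}")
```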

Remark 2.

We have chosen the code and binning rates (see (11) and (14)) such that their difference exceeds the $\max$ term in (10). A crucial feature of our achievability scheme is the choice of $\tilde{R}(T_{Y})$ in (14), which motivated expressing $R^{(P_{U|Y},\tilde{x})}$ as in (10). We now explain the insight behind this choice of $\tilde{R}(T_{Y})$. Instead of taking the rate as the minimum value of $I_{Q_{J|X}}(U;Z)-\epsilon/4$ over all $Q_{J|X}$ such that $[P_{X}Q_{J|X}W_{Y,Z|X,J}]_{Y}=T_{Y}$, we have taken it to be the minimum over all $Q_{J|X}$ such that $[P_{X}Q_{J|X}W_{Y,Z|X,J}]_{Y}$ is “close” to $T_{Y}$. There are three reasons for this. Firstly, a part of our proof bounds the probability of error by a union bound over the (polynomially many) conditional types in $\mathscr{Q}^{(n)}(T_{\mathbf{y}})$ that the decoder considers. For the union bound to work, the decoder cannot consider every conditional distribution $Q_{J|X}$ that gives the right $P_{Y}$, as there can be infinitely many such distributions. Secondly, especially since the decoder searches only over conditional types and not over every conditional distribution $Q_{J|X}$, it may not find any conditional type that gives exactly $T_{\mathbf{y}}$ as the marginal on $Y$. Thirdly, our proof relies on the fact that the instantiated conditional type $T_{\mathbf{j}|\mathbf{x}}$ is considered by the decoder, i.e., that it is in $\mathscr{Q}^{(n)}(T_{\mathbf{y}})$. However, $[P_{X}T_{\mathbf{j}|\mathbf{x}}W_{Y,Z|X,J}]_{Y}$ is only guaranteed (w.h.p.) to be close to $T_{\mathbf{y}}$; this is why $\tilde{R}(T_{Y})$ and $\mathscr{Q}^{(n)}(T_{\mathbf{y}})$ are defined with a slack in the resulting marginal on $Y$.

IV-B The proof of the lower bound

We now prove that any achievable rate is lower bounded by the maximin lower bound in (4). We consider $D_{1}\geq D\geq D_{0}$. Consider an $(n,R)$ randomized code that achieves an average distortion of $D^{(n)}$, i.e., the code is such that

$$\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\big[d(X_{i},\widetilde{X}_{i})\big]\leq D^{(n)}$$

under any jamming distribution $Q_{\mathbf{J}|\mathbf{X}}$ . In particular, it satisfies the distortion constraint under the i.i.d. jamming distribution $Q_{J|X}$ with

$$Q_{\mathbf{J}|\mathbf{X}}(\mathbf{j}|\mathbf{x}):=\prod_{i=1}^{n}Q_{J|X}(j_{i}|x_{i}).$$

Under this jamming distribution, $(X_{i},Y_{i},Z_{i})$ , $\forall i$ , form an i.i.d. sequence with joint distribution given by $P_{X}P_{Y,Z|X}$ , where $P_{Y,Z|X}(y,z|x)=\sum_{j}W_{Y,Z|X,J}(y,z|x,j)Q_{J|X}(j|x)$ , $\forall(x,y,z)$ . Let us define

$$F(D,Q_{J|X}):=\min_{P_{U|Y},\,\tilde{x}(\cdot,\cdot)}I(U;Y|Z),$$

where the minimization is over $P_{U|Y}\in\mathcal{P}(\mathcal{U}|\mathcal{Y})$ , $\widetilde{x}:\mathcal{U}\times\mathcal{Z}\rightarrow\widetilde{\mathcal{X}}$ such that $\mathbb{E}[d(X,\tilde{x}(U,Z))]\leq D$ under the given $Q_{J|X}$ . It then follows using a similar argument as in the converse for the Wyner-Ziv problem [3] that (see Appendix C for details)

$$R\geq F(D^{(n)},Q_{J|X}).$$

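Since $F(D,Q_{J|X})$ is non-increasing in $D$ (the feasible set of the minimization only grows with $D$), and an achievable rate $R$ admits such codes with $D^{(n)}\leq D+\epsilon$ for every $\epsilon>0$ and all large enough $n$, we have $R\geq F(D+\epsilon,Q_{J|X})$ for every $\epsilon>0$. Hence, by the continuity of $F(D,Q_{J|X})$ in $D$ (Lemma 18 in Appendix C), and since $R(D)$ is the infimum of all achievable rates,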

$$R(D)\geq F(D,Q_{J|X}).$$

Since this holds for every $Q_{J|X}\in\mathscr{Q}$, maximizing over $Q_{J|X}$ gives $R(D)\geq R_{L}^{*}(D)$, which is the lower bound.
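The key step of the converse is that a memoryless jammer turns the AVC into an ordinary memoryless channel $P_{Y,Z|X}(y,z|x)=\sum_{j}W_{Y,Z|X,J}(y,z|x,j)Q_{J|X}(j|x)$, to which the Wyner-Ziv converse applies. The Python sketch below computes this induced channel; the binary alphabets, `W` and the jamming strategy `Q` are hypothetical.

```python
from itertools import product

BIN = (0, 1)


def W(y, z, x, j):
    """Hypothetical W_{Y,Z|X,J}: Y = X xor J through a BSC(0.1); Z = X through a BSC(0.2)."""
    return (0.9 if y == x ^ j else 0.1) * (0.8 if z == x else 0.2)


def Q(j, x):
    """Hypothetical memoryless jamming strategy Q_{J|X}(j|x): jam w.p. 0.25 if x = 0, 0.6 if x = 1."""
    p1 = 0.25 if x == 0 else 0.6
    return p1 if j == 1 else 1.0 - p1


def effective_channel(W, Q, X, J, Y, Z):
    """P_{Y,Z|X}(y,z|x) = sum_j W_{Y,Z|X,J}(y,z|x,j) Q_{J|X}(j|x), keyed (y, z, x)."""
    return {(y, z, x): sum(W(y, z, x, j) * Q(j, x) for j in J)
            for x, y, z in product(X, Y, Z)}


P_YZgX = effective_channel(W, Q, BIN, BIN, BIN, BIN)
for x in BIN:
    row = {(y, z): round(P_YZgX[(y, z, x)], 4) for y in BIN for z in BIN}
    print(f"x = {x}: {row}")
```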

V Conclusion

In this paper, we studied a setup of lossy source coding for an arbitrarily varying remote source with side-information. As a natural first step, we gave upper and lower bounds for the rate-distortion function for the randomized coding setup. The proof of achievability employed novel techniques. We also presented interesting special cases of our setup, and completely characterized their rate distortion function. The deterministic coding version is open and is under current investigation.

Acknowledgment

This work was supported in part by Bharti Centre for Communication, IIT Bombay and in part by Information Technology Research Academy (ITRA), Government of India under grant ITRA/15(64)/Mobile/USEAADWN/01. In addition, Amitalok J. Budkuley, Bikash K. Dey and Vinod M. Prabhakaran were supported in part by RGC’s GRF grants 14208315 and 14313116, the Department of Science & Technology, Government of India under a grant SB/S3/EECE/057/2013, and the Ramanujan Fellowship respectively.

Appendix A Proof of Achievability

In this detailed proof of achievability, we begin with the description of our randomized coding scheme.

Code Construction:

•

As discussed in the outline, the random code $\mathcal{C}$ is a list of individual codes $\mathcal{C}(T_{Y})$, one for every type $T_{Y}\in\mathscr{T}^{(n)}(\mathcal{Y})$. This list of codes is shared between the encoder and the decoder as the common randomness $\Theta$.

•

For a fixed type $T_{Y}\in\mathscr{T}^{(n)}(\mathcal{Y})$ , our code $\mathcal{C}(T_{Y})$ is a binned codebook comprising $2^{nR_{U}(T_{Y})}=2^{n(R(T_{Y})+\tilde{R}(T_{Y}))}$ vectors $\mathbf{U}_{j,k}$ , where $j=1,2,\dots,2^{nR(T_{Y})}$ and $k=1,2,\dots,2^{n\tilde{R}(T_{Y})}$ . Here $R_{U}(T_{Y})$ and $\tilde{R}(T_{Y})$ are as given in (11) and (14) respectively, and $R(T_{Y})=R_{U}(T_{Y})-\tilde{R}(T_{Y})$ . Every codeword $\mathbf{U}_{j,k}$ is chosen i.i.d. $\sim P_{U}$ , where $P_{U}:=[P_{U|Y}T_{Y}]_{U}$ . There are $2^{nR(T_{Y})}$ bins indexed by $j$ , with each bin containing $2^{n\tilde{R}(T_{Y})}$ codewords indexed by $k$ . Let $\mathcal{B}^{(T_{Y})}_{m}$ denote the bin with index $m$ . Thus, our code $\mathcal{C}$ is the list containing $\mathcal{C}(T_{Y});T_{Y}\in\mathscr{T}^{(n)}(\mathcal{Y})$ .

Encoding:

•

Given input $\mathbf{Y}$, the encoder determines its type $T_{\mathbf{Y}}$ to identify $\mathcal{C}(T_{\mathbf{Y}})$. In $\mathcal{C}(T_{\mathbf{Y}})$, it looks for a codeword $\mathbf{U}_{m,l}$, where $m\in\{1,2,\dots,2^{nR(T_{\mathbf{Y}})}\}$ and $l\in\{1,2,\dots,2^{n\tilde{R}(T_{\mathbf{Y}})}\}$, such that

$$\|T_{\mathbf{U}_{m,l},\mathbf{Y}}-P_{U|Y}T_{\mathbf{Y}}\|_{\infty}\leq\delta_{2}(\delta).$$

Here $\delta>0$ is a function of $\epsilon$ such that $\delta\rightarrow 0$ as $\epsilon\rightarrow 0$ and such that (A) holds, and $\delta_{2}(\delta)>0$ is a fixed constant whose choice is indicated in Lemma 4. Condition (18) means that $\mathbf{U}_{m,l}$ and $\mathbf{Y}$ are jointly typical according to the distribution $P_{U|Y}T_{\mathbf{Y}}$. If no such $\mathbf{U}_{m,l}$ is found, the encoder chooses $\mathbf{U}_{1,1}$. If more than one $\mathbf{U}_{m,l}$ satisfying (18) exists, the encoder chooses one of them uniformly at random. Let $\mathbf{U}=\mathbf{U}_{M,L}$ denote the chosen codeword.

•

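The encoder transmits the type $T_{\mathbf{Y}}$ and the bin index $M$ losslessly to the decoder. Since $|\mathscr{T}^{(n)}(\mathcal{Y})|\leq(n+1)^{|\mathcal{Y}|}$ , the type can be described with $O(\log n)$ bits, which does not affect the rate.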
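The encoding rule can be sketched in the same toy setting. The sketch assumes that joint typicality in (18) is measured in the sup-norm (the convention that appears in Claim 11), i.e., $|T_{\mathbf{u},\mathbf{y}}(a,b)-P_{U|Y}(a|b)T_{\mathbf{y}}(b)|\leq\delta_{2}$ for all $(a,b)$. The function names are hypothetical and indices are 0-based, so the fallback $(0,0)$ corresponds to $\mathbf{U}_{1,1}$.

```python
import random
from collections import Counter


def is_jointly_typical(u, y, p_u_given_y, t_y, delta):
    """Sup-norm typicality (assumed definition): for every pair (a, b),
    |T_{u,y}(a, b) - P_{U|Y}(a|b) T_Y(b)| <= delta."""
    n = len(y)
    t_uy = Counter(zip(u, y))
    pairs = set(t_uy) | {(a, b) for b in t_y for a in p_u_given_y[b]}
    return all(
        abs(t_uy[(a, b)] / n - p_u_given_y.get(b, {}).get(a, 0.0) * t_y.get(b, 0.0)) <= delta
        for a, b in pairs)


def encode(y, codebook, p_u_given_y, delta2, rng):
    """Return (m, l) of the chosen codeword U_{m,l} (0-based). Ties are broken
    uniformly at random; if no codeword is typical, fall back to (0, 0)."""
    n = len(y)
    t_y = {b: c / n for b, c in Counter(y).items()}
    matches = [(m, l)
               for m, bin_ in enumerate(codebook)
               for l, u in enumerate(bin_)
               if is_jointly_typical(u, y, p_u_given_y, t_y, delta2)]
    return rng.choice(matches) if matches else (0, 0)


# Toy example: two bins of two codewords, deterministic test channel U = Y.
codebook = [[(1, 1, 0, 0), (0, 0, 1, 1)], [(1, 0, 1, 0), (1, 1, 1, 1)]]
p_u_given_y = {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}
m, l = encode((0, 0, 1, 1), codebook, p_u_given_y, delta2=0.1, rng=random.Random(0))
```

Only the bin index `m` (together with the type of `y`) is sent; the within-bin index `l` is left for the decoder to resolve using its side information.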

Decoding:

•

Let the bin index $m$ and side information $\mathbf{z}$ be received at the decoder. In addition, the decoder knows the type $T_{\mathbf{y}}$ of the encoder’s input $\mathbf{y}$ , and hence also the code $\mathcal{C}(T_{\mathbf{y}})$ used by the encoder.

•

For some fixed parameter $\gamma(\delta)>0$ (the choice of $\gamma(\delta)$ is indicated in Lemma 5), the decoder determines the set of codewords

[TABLE]

Here $\mathscr{Q}(T_{\mathbf{y}}):=\{T_{J|X}\in\mathscr{T}^{(n)}(\mathcal{J}|\mathcal{X}):[P_{X}T_{J|X}W_{Y,Z|X,J}]_{Y}\stackrel{{\scriptstyle f(\epsilon)}}{{\approx}}T_{\mathbf{y}}\}$ .

•

If $\mathcal{L}_{\gamma(\delta)}(m,\mathbf{z})$ contains exactly one codeword, then the decoder chooses it. Otherwise it chooses $\mathbf{u}_{m,1}$ . Let the chosen codeword be $\mathbf{u}_{m,\tilde{l}}$ .

•

The decoder outputs $\tilde{\mathbf{x}}$ , where $\tilde{x}_{i}=\tilde{x}(u_{i}(m,\tilde{l}),z_{i})$ for $i=1,\dots,n$ .
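The decoding steps can be sketched in the same toy setting. Because the definition (19) of $\mathcal{L}_{\gamma(\delta)}(m,\mathbf{z})$ is not reproduced here, the sketch assumes, consistently with Claims 16 and 17, that a codeword in bin $m$ is kept when it is sup-norm $\gamma$-typical with $\mathbf{z}$ under $P_{U,Z}=[P_{X}T_{J|X}W_{Y,Z|X,J}P_{U|Y}]_{U,Z}$ for at least one $T_{J|X}\in\mathscr{Q}(T_{\mathbf{y}})$. The list `candidate_p_uz` of these joint distributions is taken as precomputed input, and all names are hypothetical.

```python
from collections import Counter


def typical_with(u, z, p_uz, gamma):
    """Sup-norm typicality of (u, z) w.r.t. a joint pmf p_uz[(a, c)]."""
    n = len(z)
    t_uz = Counter(zip(u, z))
    return all(abs(t_uz[k] / n - p_uz.get(k, 0.0)) <= gamma
               for k in set(t_uz) | set(p_uz))


def decode(m, z, codebook, candidate_p_uz, gamma, x_tilde):
    """Hypothetical decoder: keep codewords of bin m that are gamma-typical with z
    under at least one candidate P_{U,Z}; use the unique survivor if there is
    exactly one, otherwise fall back to index 0 (i.e. u_{m,1})."""
    kept = [l for l, u in enumerate(codebook[m])
            if any(typical_with(u, z, p, gamma) for p in candidate_p_uz)]
    l_hat = kept[0] if len(kept) == 1 else 0
    return tuple(x_tilde(ui, zi) for ui, zi in zip(codebook[m][l_hat], z))


# Toy example: one candidate P_{U,Z} under which U = Z, and reconstruction x~(u, z) = u.
codebook = [[(1, 1, 0, 0), (0, 0, 1, 1)], [(1, 0, 1, 0), (1, 1, 1, 1)]]
p_uz = {(0, 0): 0.5, (1, 1): 0.5}
x_hat = decode(0, (0, 0, 1, 1), codebook, [p_uz], gamma=0.1, x_tilde=lambda u, z: u)
```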

Average distortion analysis:

We first analyse the error in decoding the codeword $\mathbf{U}=\mathbf{U}_{M,L}$ chosen by the encoder. The decoder makes an error if one or more of the following events occur.

[TABLE]

Then, using the union bound we can express the probability of decoding error by

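$$\mathbb{P}(E)\leq\mathbb{P}(E_{enc})+\mathbb{P}(E_{dec,1}|E^{c}_{enc})+\mathbb{P}(E_{dec,2}|E^{c}_{enc}).\qquad(20)$$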

We will show that for every $\epsilon>0$ there exists a sufficiently small $\delta>0$ such that $\mathbb{P}(E)\rightarrow 0$ as $n\rightarrow\infty$ . We first make the following elementary claim.

Claim 3.

Let $\mathbf{U}$ be generated i.i.d. $\sim P_{U}$ . Then, with probability at least $(1-|\mathcal{U}|e^{-2n\delta^{2}})$ , $\mathbf{U}\in\mathcal{T}^{n}_{\delta}(P_{U})$ .
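Claim 3 can be checked numerically. The sketch below assumes sup-norm typicality, estimates $\mathbb{P}(\mathbf{U}\notin\mathcal{T}^{n}_{\delta}(P_{U}))$ by Monte Carlo simulation, and compares the estimate with $2|\mathcal{U}|e^{-2n\delta^{2}}$, the bound obtained from the two-sided Hoeffding inequality for each symbol and a union bound over $\mathcal{U}$. The parameter values and helper names are arbitrary illustrations.

```python
import math
import random


def is_typical(seq, p, delta):
    """Sup-norm typicality: |T_seq(a) - p[a]| <= delta for every symbol a."""
    n = len(seq)
    return all(abs(seq.count(a) / n - pa) <= delta for a, pa in p.items())


def atypical_fraction(p, n, delta, trials, seed=0):
    """Monte Carlo estimate of P(U not delta-typical) for U with i.i.d. ~p entries."""
    rng = random.Random(seed)
    symbols, weights = zip(*p.items())
    bad = sum(not is_typical(rng.choices(symbols, weights=weights, k=n), p, delta)
              for _ in range(trials))
    return bad / trials


def hoeffding_union_bound(alphabet_size, n, delta):
    """Two-sided Hoeffding bound per symbol, union over the alphabet."""
    return 2 * alphabet_size * math.exp(-2 * n * delta ** 2)


p_u = {0: 0.5, 1: 0.5}
est = atypical_fraction(p_u, n=200, delta=0.1, trials=2000)
bound = hoeffding_union_bound(len(p_u), n=200, delta=0.1)
```

The estimate is typically far below the bound, which is loose but decays exponentially in $n$; the exponential decay is all that the achievability argument needs.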

Let us define this “good” event as $A_{U}:=\{\mathbf{U}\in\mathcal{T}^{n}_{\delta}(P_{U})\}$ . We now state the following lemma, which guarantees that the first term on the RHS of (20) vanishes as $n\rightarrow\infty$ .

Lemma 4.

Under the event $A_{U}$ , there exist $\delta_{2}(\delta),f_{1}(\delta,\epsilon)>0$ , where $\delta_{2}(\delta),f_{1}(\delta,\epsilon)\rightarrow 0$ as $\delta,\epsilon\rightarrow 0$ , such that the encoder finds a codeword $\mathbf{U}$ with probability at least $1-2^{-nf_{1}(\delta,\epsilon)}$ such that $(\mathbf{Y},\mathbf{U})\in\mathcal{T}^{n}_{\delta_{2}}(P_{U|Y}T_{\mathbf{Y}})$ .

The proof of this lemma follows from the covering lemma [18, Lemma 3.3]. Note that this lemma specifies the $\delta_{2}(\delta)$ parameter which appears in the definition of the encoder in (18). This lemma implies $\mathbb{P}(E_{enc})\rightarrow 0$ as $n\rightarrow\infty$ . Our next lemma addresses the remaining two terms on the RHS of (20).

Lemma 5.

Let $\mathbf{U}$ be the chosen codeword (where $\mathbf{U}\in\mathcal{T}^{n}_{\delta}(P_{U})$ ), and let $(\mathbf{Y},\mathbf{Z})$ be the output of the channel $W_{Y,Z|X,J}$ . Then,

(a)

there exists $\gamma(\delta)>0$ , where $\gamma(\delta)\rightarrow 0$ as $\delta\rightarrow 0$ , such that except for an exponentially small probability, $\mathbf{U}\in\mathcal{L}_{\gamma(\delta)}(M,\mathbf{Z})$ ;

(b)

there exists $f_{2}(\delta,\epsilon)>0$ , where $f_{2}(\delta,\epsilon)\rightarrow 0$ as $\delta,\epsilon\rightarrow 0$ , such that

[TABLE]

The proof of this lemma can be found in Appendix B. This lemma specifies the parameter $\gamma(\delta)$ which appears in the decoder operation in (19). Lemma 5 implies that $\mathbb{P}(E_{dec,1}|E^{c}_{enc}),\mathbb{P}(E_{dec,2}|E^{c}_{enc})\rightarrow 0$ as $n\rightarrow\infty$ . Hence, we can conclude that $\mathbb{P}(E)\rightarrow 0$ as $n\rightarrow\infty$ .

We now bound the average distortion. To this end, we first make the following claim.

Claim 6.

There exist $r(\delta),f_{3}(\delta,\epsilon)>0$ , where $r(\delta),f_{3}(\delta,\epsilon)\rightarrow 0$ as $\delta,\epsilon\rightarrow 0$ , such that $\mathbb{P}\left((\mathbf{X},\mathbf{\widetilde{X}})\in\mathcal{T}^{n}_{r(\delta)}(P_{X,\widetilde{X}})\right)\geq 1-2^{-nf_{3}(\delta,\epsilon)}$ .

Proof:

By Claim 15 in App. B, with high probability, $(\mathbf{X},\mathbf{J},\mathbf{Y},\mathbf{Z},\mathbf{U})$ is $\delta_{4}$ -typical according to the joint distribution $P_{X}T_{\mathbf{J}|\mathbf{X}}W_{Y,Z|X,J}P_{U|Y}$ . As $\mathbf{\widetilde{X}}$ is a deterministic function of $(\mathbf{U},\mathbf{Z})$ , it follows by the conditional typicality lemma (see Lemma 9 in Appendix B) that with probability at least $(1-|\mathcal{X}||\mathcal{J}||\mathcal{Y}||\mathcal{Z}||\mathcal{U}||\widetilde{\mathcal{X}}|2^{-n\delta_{4}^{3}})$ , the tuple $(\mathbf{X},\mathbf{J},\mathbf{Y},\mathbf{Z},\mathbf{U},\mathbf{\widetilde{X}})$ is $3\delta_{4}$ -typical, and hence $(\mathbf{X},\mathbf{\widetilde{X}})$ is $r(\delta)$ -typical, where $r(\delta):=3|\mathcal{X}||\widetilde{\mathcal{X}}|\delta_{4}(\delta)$ . This completes the proof. ∎

We now show that the average distortion for the code $\mathcal{C}$ can be made arbitrarily close to $D$ . Let $\bar{E}:=\{(\mathbf{X},\mathbf{\widetilde{X}})\not\in\mathcal{T}^{n}_{r(\delta)}(P_{X,\widetilde{X}})\}$ . From Claim 6, we know that $\mathbb{P}(\bar{E})\rightarrow 0$ as $n\rightarrow\infty$ . Then,

[TABLE]

Recall that $d_{\max}<\infty$ . In addition, from the typical average lemma we know that $\mathbb{E}[d(\mathbf{X},\mathbf{\tilde{X}})|\bar{E}^{c}]\leq D+h(\delta)$ , where $h(\delta)>0$ and $h(\delta)\rightarrow 0$ as $\delta\rightarrow 0$ . Thus,

[TABLE]

Since $\mathbb{P}(\bar{E})\rightarrow 0$ as $n\rightarrow\infty$ , we can choose $n$ large enough and $\delta>0$ small enough to obtain $(a)$ . This implies that the average distortion can be made arbitrarily close to $D$ . We have thus shown that, for any $\epsilon>0$ , the rate $R=\max_{Q_{J|X}}(I(U;Y)-I(U;Z))+\epsilon$ is achievable. This completes the proof of achievability.
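To make the rate expression concrete, the sketch below evaluates $I(U;Y)-I(U;Z)$ for a fixed test channel $P_{U|Y}$ under a few candidate jamming distributions $Q_{J|X}$ and takes the maximum over them. It is a toy example under assumptions not taken from the paper: binary alphabets, a channel $W_{Y,Z|X,J}$ that delivers $Y=X$ to the encoder and $Z=J$ to the decoder, and $U=Y$. Since $U\rightarrow Y\rightarrow Z$ is a Markov chain, the difference also equals $I(U;Y|Z)$.

```python
import math


def mutual_info(p_ab):
    """I(A;B) in bits for a joint pmf given as a nested list p_ab[a][b]."""
    p_a = [sum(row) for row in p_ab]
    p_b = [sum(col) for col in zip(*p_ab)]
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for a, row in enumerate(p_ab)
               for b, p in enumerate(row) if p > 0)


def induced_p_yz(p_x, q_j_given_x, w):
    """P_{Y,Z}(y,z) = sum_{x,j} P_X(x) Q(j|x) W(y,z|x,j), with w[x][j][y][z]."""
    ny, nz = len(w[0][0]), len(w[0][0][0])
    p_yz = [[0.0] * nz for _ in range(ny)]
    for x, px in enumerate(p_x):
        for j, qj in enumerate(q_j_given_x[x]):
            for y in range(ny):
                for z in range(nz):
                    p_yz[y][z] += px * qj * w[x][j][y][z]
    return p_yz


def rate_gap(p_yz, p_u_given_y):
    """I(U;Y) - I(U;Z) when U is generated from Y through p_u_given_y[y][u]."""
    ny, nz, nu = len(p_yz), len(p_yz[0]), len(p_u_given_y[0])
    p_y = [sum(row) for row in p_yz]
    p_uy = [[p_y[y] * p_u_given_y[y][u] for y in range(ny)] for u in range(nu)]
    p_uz = [[sum(p_yz[y][z] * p_u_given_y[y][u] for y in range(ny))
             for z in range(nz)] for u in range(nu)]
    return mutual_info(p_uy) - mutual_info(p_uz)


# Toy channel (assumed): the encoder sees Y = X, the decoder sees Z = J.
p_x = [0.5, 0.5]
w = [[[[1.0 if (y == x and z == j) else 0.0 for z in range(2)] for y in range(2)]
      for j in range(2)] for x in range(2)]
identity = [[1.0, 0.0], [0.0, 1.0]]           # test channel U = Y
q_independent = [[0.5, 0.5], [0.5, 0.5]]      # jammer ignores the source
q_copy = [[1.0, 0.0], [0.0, 1.0]]             # jammer sets J = X
gaps = [rate_gap(induced_p_yz(p_x, q, w), identity) for q in (q_independent, q_copy)]
worst_case_rate = max(gaps)
```

In this example, copying the source into $J$ hands the decoder perfect side information (gap 0), whereas jamming independently of the source makes $Z$ useless (gap 1 bit), so the maximum over the two candidates is attained by the independent jammer.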

Appendix B Proof of Lemma 5

Let us define $\delta_{0}=\delta/2$ . Consider the “good” encoder event $E^{c}_{enc}=\{(\mathbf{Y},\mathbf{U})\in\mathcal{T}^{n}_{\delta_{2}}(P_{U|Y}T_{\mathbf{Y}})\}$ . We now state and prove some necessary claims.

Claim 7.

Let $\mathbf{X}$ be generated i.i.d. $\sim P_{X}$ . Then, with probability at least $(1-|\mathcal{X}|e^{-2n\delta_{0}^{2}})$ , $\mathbf{X}\in\mathcal{T}^{n}_{\delta_{0}}(P_{X})$ .

Let us define this “good” event as $A_{X}:=\{\mathbf{X}\in\mathcal{T}^{n}_{\delta_{0}}(P_{X})\}$ .

Claim 8.

Let $(\mathbf{x},\mathbf{j})$ be a pair of vectors where $\mathbf{x}\in\mathcal{T}_{\delta_{0}}^{n}(P_{X})$ . Then, $(\mathbf{x},\mathbf{j})\in\mathcal{T}_{\delta_{0}}^{n}(P_{X}T_{\mathbf{j}|\mathbf{x}})$ .
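The claim holds because the joint type of $(\mathbf{x},\mathbf{j})$ factors as $T_{\mathbf{x},\mathbf{j}}(a,b)=T_{\mathbf{x}}(a)T_{\mathbf{j}|\mathbf{x}}(b|a)$. Assuming sup-norm typicality (the convention that appears in Claim 11), a one-line derivation is:

```latex
\left|T_{\mathbf{x},\mathbf{j}}(a,b)-P_X(a)\,T_{\mathbf{j}|\mathbf{x}}(b|a)\right|
  = T_{\mathbf{j}|\mathbf{x}}(b|a)\,\left|T_{\mathbf{x}}(a)-P_X(a)\right|
  \le \left|T_{\mathbf{x}}(a)-P_X(a)\right| \le \delta_0
  \qquad \forall (a,b)\in\mathcal{X}\times\mathcal{J}.
```

When $T_{\mathbf{x}}(a)=0$, the conditional type $T_{\mathbf{j}|\mathbf{x}}(\cdot|a)$ may be fixed arbitrarily and the identity still holds.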

Let us denote the event that $(\mathbf{X},\mathbf{J})$ is jointly typical w.r.t. $P_{X}T_{\mathbf{J}|\mathbf{X}}$ as $A_{X,J}$ . By the above claim, we have $A_{X}\subseteq A_{X,J}$ .

Lemma 9 (Conditional typicality lemma).

Let $\mathbf{s}\in\mathcal{T}^{n}_{\delta_{0}}(P_{S})$ and $\mathbf{T}$ be generated from $\mathbf{s}$ using the memoryless distribution $W_{T|S}$ . Then,

[TABLE]

Proof:

We need to show that

[TABLE]

is exponentially small for all $s,t$ . We consider two cases.

Case I: $T_{\mathbf{s}}(s)\leq\delta_{0}$ . As $\mathbf{s}\in\mathcal{T}^{n}_{\delta_{0}}(P_{S})$ , this implies that $P_{S}(s)\leq T_{\mathbf{s}}(s)+\delta_{0}\leq 2\delta_{0}$ . Then, for every $t$ ,

[TABLE]

Thus, for such $s$ , $\mathbb{P}\left(\left|T_{\mathbf{s},\mathbf{T}}(s,t)-P_{S}(s)W_{T|S}(t|s)\right|>2\delta_{0}\right)=0$ .

Case II: $T_{\mathbf{s}}(s)>\delta_{0}$ . Using the Chernoff-Hoeffding theorem [19, Theorem 1] for each $t\in\mathcal{T}$ , we have

[TABLE]

Now, it is easy to check that $|W_{T|S}(t|s)-T_{\mathbf{T}|\mathbf{s}}(t|s)|\leq\delta_{0}$ and $|P_{S}(s)-T_{\mathbf{s}}(s)|\leq\delta_{0}$ together imply

[TABLE]

Hence, (22) follows by taking the union bound over all $s\in\mathcal{S}$ . ∎
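The conditional typicality lemma can also be illustrated by simulation. The sketch below fixes a sequence $\mathbf{s}$ whose type equals $P_{S}$ exactly, passes it through a memoryless channel $W_{T|S}$, and records how often the per-pair deviation $|T_{\mathbf{s},\mathbf{T}}(s,t)-P_{S}(s)W_{T|S}(t|s)|$ exceeds the $2\delta_{0}$ threshold used in the proof. The channel, block length and $\delta_{0}$ are arbitrary example values, and the helper names are hypothetical.

```python
import random
from collections import Counter


def max_joint_deviation(s, t, p_s, w):
    """max over (a, b) of |T_{s,t}(a, b) - P_S(a) W(b|a)|."""
    n = len(s)
    counts = Counter(zip(s, t))
    return max(abs(counts[(a, b)] / n - p_s[a] * w[a][b])
               for a in p_s for b in w[a])


def conditional_typicality_failures(s, p_s, w, delta0, trials, seed=0):
    """Fraction of trials in which t, drawn through the memoryless channel w
    from the fixed sequence s, violates the 2*delta0 per-pair bound."""
    rng = random.Random(seed)
    options = {a: (list(w[a]), list(w[a].values())) for a in w}
    fails = 0
    for _ in range(trials):
        t = [rng.choices(options[a][0], weights=options[a][1])[0] for a in s]
        if max_joint_deviation(s, t, p_s, w) > 2 * delta0:
            fails += 1
    return fails / trials


p_s = {0: 0.5, 1: 0.5}
w = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
s = [i % 2 for i in range(1000)]  # type exactly P_S, hence delta0-typical
rate = conditional_typicality_failures(s, p_s, w, delta0=0.05, trials=100)
```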

Claim 10.

With probability at least $(1-|\mathcal{X}||\mathcal{J}||\mathcal{Y}||\mathcal{Z}|e^{-2\delta_{0}^{3}n})$ , $(\mathbf{X},\mathbf{J},\mathbf{Y},\mathbf{Z})$ are jointly $3\delta_{0}$ -typical according to the distribution $P_{X}T_{\mathbf{J}|\mathbf{X}}W_{Y,Z|X,J}$ .

The proof of this result follows from Lemma 9. We now consider this “good” event $A_{X,J,Y,Z}$ , where $A_{X,J,Y,Z}:=\{(\mathbf{X},\mathbf{J},\mathbf{Y},\mathbf{Z})\in\mathcal{T}_{3\delta_{0}}^{n}(P_{X}T_{\mathbf{J}|\mathbf{X}}W_{Y,Z|X,J})\}$ .

Claim 11.

Under the event $A_{X,J,Y,Z}$ , $\mathbf{Y}$ is $\delta_{1}$ -typical w.r.t. $P_{Y}=[P_{X}T_{\mathbf{J}|\mathbf{X}}W_{Y,Z|X,J}]_{Y}$ , where $\delta_{1}(\delta)=3|\mathcal{X}||\mathcal{J}||\mathcal{Z}|\delta_{0}(\delta)\rightarrow 0$ as $\delta\rightarrow 0$ . That is, $\|T_{\mathbf{Y}}-P_{Y}\|_{\infty}\leq 3|\mathcal{X}||\mathcal{J}||\mathcal{Z}|\delta_{0}$ .

The proof is straightforward and hence omitted. The above claim implies that, except for an exponentially small probability, the decoder considers the actual conditional type $T_{\mathbf{J}|\mathbf{X}}$ for decoding, i.e., $T_{\mathbf{J}|\mathbf{X}}\in\mathscr{Q}(T_{\mathbf{Y}})$ .
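For completeness, the omitted step is a marginalization bound. Assuming sup-norm typicality, under $A_{X,J,Y,Z}$ every entry of the joint type is within $3\delta_{0}$ of $P_{X}T_{\mathbf{J}|\mathbf{X}}W_{Y,Z|X,J}$, so summing out $(x,j,z)$ and applying the triangle inequality gives:

```latex
\|T_{\mathbf{Y}}-P_Y\|_\infty
  = \max_{y}\Big|\sum_{x,j,z}\big(T_{\mathbf{X},\mathbf{J},\mathbf{Y},\mathbf{Z}}(x,j,y,z)
      - P_X(x)\,T_{\mathbf{J}|\mathbf{X}}(j|x)\,W_{Y,Z|X,J}(y,z|x,j)\big)\Big|
  \le |\mathcal{X}||\mathcal{J}||\mathcal{Z}|\cdot 3\delta_0 = \delta_1 .
```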

Claim 12.

Under $E^{c}_{enc}$ and $A_{X,J,Y,Z}$ , $(\mathbf{Y},\mathbf{U})$ are jointly $\delta_{3}$ -typical according to the distribution $P_{Y}P_{U|Y}$ , where $P_{Y}=[P_{X}T_{\mathbf{J}|\mathbf{X}}W_{Y,Z|X,J}]_{Y}$ and $\delta_{3}(\delta):=3|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|\delta_{0}(\delta)+\delta_{2}(\delta)\rightarrow 0$ as $\delta\rightarrow 0$ .

Proof:

Note that

[TABLE]

where $\delta_{3}=3|\mathcal{X}||\mathcal{Y}||\mathcal{Z}|\delta_{0}+\delta_{2}$ . ∎

Claim 13.

There exists $g(\delta)>0$ , where $g(\delta)\rightarrow 0$ as $\delta\rightarrow 0$ , such that $\forall\mathbf{u}\in\mathcal{T}^{n}_{\delta_{3}}(P_{U|Y}P_{Y}|\mathbf{y})$ ,

[TABLE]

where $H(U|Y)$ is computed with the distribution $P_{U|Y}P_{Y}$ .

Proof:

We have two cases.

Case I: When $\mathbf{u}\in\mathcal{T}^{n}_{\delta_{2}}(P_{U|Y}T_{\mathbf{y}}|\mathbf{y})\bigcap\mathcal{T}^{n}_{\delta_{3}}(P_{U|Y}P_{Y}|\mathbf{y})$ . Then we note that

[TABLE]

where $g_{1}(\delta_{3})\rightarrow 0$ as $\delta_{3}\rightarrow 0$ . Since $||P_{Y}-T_{\mathbf{y}}||_{1}\leq|\mathcal{Y}|\cdot||P_{Y}-T_{\mathbf{y}}||_{\infty}\leq 3|\mathcal{X}||\mathcal{J}||\mathcal{Z}||\mathcal{Y}|\delta_{0}$ , and $||P_{Y}P_{U|Y}-T_{\mathbf{y}}P_{U|Y}||_{1}\leq|\mathcal{U}||\mathcal{Y}|\delta_{3}$ , using [11, Lemma 2.7], we get

[TABLE]

Together, the above two equations imply

[TABLE]

By defining $g_{2}(\delta_{3}):=g_{1}(\delta_{3})+2|\mathcal{U}||\mathcal{Y}|\delta_{3}\cdot\log\left(\frac{1}{\delta_{3}}\right)$ , we get

[TABLE]

Case II: When $\mathbf{u}\not\in\mathcal{T}^{n}_{\delta_{2}}(P_{U|Y}T_{\mathbf{y}}|\mathbf{y})$ . For such a $\mathbf{u}$ , the encoder outputs it only if $\mathbf{U}_{1,1}=\mathbf{u}$ and there is no codeword which is jointly typical with $\mathbf{y}$ w.r.t. $P_{U|Y}T_{\mathbf{y}}$ . Thus,

[TABLE]

where $g_{4}(\delta_{2})=g_{3}(\delta_{2})+|\mathcal{U}|^{2}|\mathcal{Y}|\delta_{3}\cdot\log\left(\frac{1}{|\mathcal{U}||\mathcal{Y}|\delta_{3}}\right)$ .

Combining the two cases and taking $g(\delta)=\max(g_{2}(\delta_{3}),g_{4}(\delta_{3}))$ , the claim follows. ∎
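We take the continuity bound cited as [11, Lemma 2.7] in Case I to be the standard entropy-continuity inequality $|H(P)-H(Q)|\leq-\theta\log(\theta/|\mathcal{A}|)$ , valid whenever $\theta=\|P-Q\|_{1}\leq 1/2$ on a finite alphabet $\mathcal{A}$; this identification is an assumption. The sketch below checks the inequality on random pairs of distributions. It is a numerical illustration, not a proof, and the helper names are hypothetical.

```python
import math
import random


def entropy_bits(p):
    """Shannon entropy in bits of a pmf given as a list."""
    return -sum(v * math.log2(v) for v in p if v > 0)


def continuity_bound(p, q):
    """-theta * log2(theta / |alphabet|), valid when theta = ||p - q||_1 <= 1/2."""
    theta = sum(abs(a - b) for a, b in zip(p, q))
    if theta > 0.5:
        raise ValueError("the bound requires ||p - q||_1 <= 1/2")
    return 0.0 if theta == 0 else -theta * math.log2(theta / len(p))


def random_pmf(k, rng):
    w = [rng.random() for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]


def min_slack(trials=2000, k=4, mix=0.1, seed=1):
    """Smallest value of bound - |H(p) - H(q)| over random pairs with
    q = (1 - mix) p + mix r, so that ||p - q||_1 <= 2 * mix <= 1/2."""
    rng = random.Random(seed)
    slack = math.inf
    for _ in range(trials):
        p, r = random_pmf(k, rng), random_pmf(k, rng)
        q = [(1 - mix) * a + mix * b for a, b in zip(p, r)]
        gap = abs(entropy_bits(p) - entropy_bits(q))
        slack = min(slack, continuity_bound(p, q) - gap)
    return slack
```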

Lemma 14 (Refined Markov Lemma [17]).

[Footnote 5: In the refined Markov lemma presented in [17], condition (b) also has a lower bound on $P_{\mathbf{Z}}(\mathbf{z})$ . However, this lower bound is not used in the proof given in [17] and hence can be removed; we state the lemma here without it. We also note that condition (a), together with the upper bound on the probability of a typical sequence, implies that too many typical sequences cannot have too small a probability, so the essence of the lower bound in condition (b) is already implied by these conditions. It is therefore not surprising that the lower bound is not needed for the lemma to hold.]

Suppose $X\rightarrow Y\rightarrow Z$ is a Markov chain, i.e., $P_{X,Y,Z}=P_{Y}P_{X|Y}P_{Z|Y}$ . Let $(\mathbf{x},\mathbf{y})\in\mathcal{T}^{n}_{\delta_{0}}\left(P_{X,Y}\right)$ and $\mathbf{Z}\sim P_{\mathbf{Z}}$ be such that

(a)

$\mathbb{P}\left((\mathbf{y},\mathbf{Z})\not\in\mathcal{T}^{n}_{\delta_{0}}\left(P_{Y,Z}\right)\right)\leq\epsilon$ , where $\epsilon>0$ ;

(b)

for every $\mathbf{z}\in\mathcal{T}^{n}_{\delta_{0}}\left(P_{Y,Z}|\mathbf{y}\right)$ ,

[TABLE]

for some $g:\mathbb{R}^{+}\rightarrow\mathbb{R}^{+}$ , where $g(\delta_{0})\rightarrow 0$ as $\delta_{0}\rightarrow 0$ .

Then, there exists $\delta:\mathbb{R}^{+}\rightarrow\mathbb{R}^{+}$ , where $\delta(\delta_{0})\rightarrow 0$ as $\delta_{0}\rightarrow 0$ , such that

[TABLE]

Here $K>0$ does not depend on $n$ , $P_{X,Y}$ , $P_{\mathbf{Z}}$ or $(\mathbf{x},\mathbf{y})$ , but does depend on $\delta_{0}$ , $g$ and $P_{Z|Y}$ . Further, the function $\delta$ does not depend on $(\mathbf{x},\mathbf{y})$ , $P_{X,Y}$ or $P_{\mathbf{Z}}$ .

We now use the above lemma to prove the following claim.

Claim 15.

There exists $\delta_{4}(\delta)>0$ , where $\delta_{4}(\delta)\rightarrow 0$ as $\delta\rightarrow 0$ , such that except for a small probability, $(\mathbf{X},\mathbf{J},\mathbf{Z},\mathbf{Y},\mathbf{U})$ is jointly $\delta_{4}$ -typical w.r.t. $P_{X}T_{\mathbf{J}|\mathbf{X}}W_{Y,Z|X,J}P_{U|Y}$ .

Proof:

Let us assume that $A_{X,J,Y,Z}$ is true. Now we use the refined Markov lemma (Lemma 14) on the Markov chain $(X,J,Z)\rightarrow Y\rightarrow U$ . Then, by Claims 12 and 13, $\mathbf{U}$ is chosen such that both conditions (a) and (b) in Lemma 14 are satisfied. Thus, the claim follows. ∎

We define this “good” event as $A_{X,J,Y,Z,U}:=\{(\mathbf{X},\mathbf{J},\mathbf{Z},\mathbf{Y},\mathbf{U})\in\mathcal{T}^{n}_{\delta_{4}}(P_{X}T_{\mathbf{J}|\mathbf{X}}W_{Y,Z|X,J}P_{U|Y})\}$ .

Claim 16.

There exists $\gamma(\delta)>0$ , where $\gamma(\delta)\rightarrow 0$ as $\delta\rightarrow 0$ , such that except for an exponentially small probability, $\mathbf{U}\in\mathcal{L}_{\gamma(\delta)}(M,\mathbf{Z})$ .

Proof:

Consider the event $A_{X,J,Y,Z,U}$ . Under this event, $(\mathbf{U},\mathbf{Z})$ are $\gamma(\delta)$ -typical w.r.t. $P_{U,Z}=[P_{X}T_{\mathbf{J}|\mathbf{X}}W_{Y,Z|X,J}P_{U|Y}]_{U,Z}$ , where $\gamma(\delta)=|\mathcal{X}||\mathcal{J}||\mathcal{Y}|\delta_{4}$ . Thus, the claim follows from Claim 15. ∎

This completes the proof of part (a) of Lemma 5. Part (b) follows directly from the following claim.

Claim 17.

There exists $f_{2}(\delta,\epsilon)>0$ , where $f_{2}(\delta,\epsilon)\rightarrow 0$ as $\delta,\epsilon\rightarrow 0$ , such that

[TABLE]

Proof:

Note that the codewords $\{\mathbf{U}_{M,L^{\prime}}\}_{L^{\prime}\neq L}$ are independently generated, and hence, $\{\mathbf{U}_{M,L^{\prime}}\}_{L^{\prime}\neq L}$ and $\mathbf{Z}$ are independent. Consider a fixed conditional type $T_{J|X}\in\mathscr{Q}(T_{\mathbf{Y}})$ , and let $P_{U,Z}=[P_{X}T_{J|X}W_{Y,Z|X,J}P_{U|Y}]_{U,Z}$ denote the resulting distribution. Then,

[TABLE]

for some $\tilde{f}_{2}(\delta,\epsilon)$ , where $\tilde{f}_{2}(\delta,\epsilon)\rightarrow 0$ as $\delta,\epsilon\rightarrow 0$ . This follows from the packing lemma [18, Lemma 3.1]. By taking the union bound over all conditional types $T_{J|X}\in\mathscr{Q}(T_{\mathbf{Y}})$ (the number of such types is at most polynomial in $n$ ), we get

[TABLE]

∎

This completes the proof of the lemma.
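The union bound at the end of the proof of Claim 17 works because a polynomial number of conditional types cannot offset an exponentially small probability. The sketch below uses the standard bound $(n+1)^{|\mathcal{X}||\mathcal{J}|}$ on the number of conditional types and a hypothetical per-type bound $2^{-n\tilde{f}}$ with $\tilde{f}=0.01$, and shows, in the $\log_{2}$ domain, that the product eventually drops below 1 and keeps decreasing.

```python
import math


def log2_num_conditional_types(n, size_x, size_j):
    """log2 of the standard bound (n+1)^{|X||J|} on the number of
    conditional types T_{J|X} for length-n sequences."""
    return size_x * size_j * math.log2(n + 1)


def log2_union_bound(n, size_x, size_j, f_tilde):
    """log2[(n+1)^{|X||J|} * 2^{-n f_tilde}]; negative means the bound is below 1."""
    return log2_num_conditional_types(n, size_x, size_j) - n * f_tilde


# Hypothetical values: binary X and J, per-type exponent f_tilde = 0.01.
for n in (10**3, 10**4, 10**5):
    print(n, round(log2_union_bound(n, 2, 2, 0.01), 1))
```

For small $n$ the polynomial factor dominates and the bound is vacuous; once $n\tilde{f}$ outgrows $|\mathcal{X}||\mathcal{J}|\log_{2}(n+1)$, the bound decays exponentially.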

Appendix C

Here we prove (17). We first state the following useful lemma.

Lemma 18.

For a fixed $Q_{J|X}\in\mathscr{Q}$ , $F(D,Q_{J|X})$ is a non-increasing, convex and continuous function of $D$ .

Proof:

The proof of Lemma 18 uses arguments similar to those in the proof of the corresponding statement for the Wyner-Ziv rate-distortion function [3]. We provide it below for completeness.

To prove that $F(D,Q_{J|X})$ is a non-increasing function of $D$ , note that the minimization in the definition of $F(D,Q_{J|X})$ is over the set $\mathcal{S}(D)=\{(P_{U|Y},\tilde{x}(.,.)):\mathbb{E}[d(X,\widetilde{X})]\leq D\}$ . So, for $D^{\prime}_{2}>D^{\prime}_{1}$ , the corresponding domains of minimization $\mathcal{S}_{1}=\mathcal{S}(D^{\prime}_{1})$ and $\mathcal{S}_{2}=\mathcal{S}(D^{\prime}_{2})$ satisfy $\mathcal{S}_{2}\supseteq\mathcal{S}_{1}$ . Thus, $F(D^{\prime}_{2},Q_{J|X})\leq F(D^{\prime}_{1},Q_{J|X})$ .

To prove the convexity of $F(D,Q_{J|X})$ as a function of $D$ , note that the minimization over $P_{U|Y}$ and $\tilde{x}(.,.)$ can be rewritten as a minimization over $P_{U|Y}$ alone, in a manner similar to [15]; the alphabet of the auxiliary random variable $U$ is then the set of ‘Shannon strategies’. To establish the convexity, we first show that, for a given $Q_{J|X}\in\mathscr{Q}$ and fixed $P_{X}$ and $W_{Y,Z|X,J}$ , $I(U;Y|Z)$ is convex in $P_{U|Y}$ . Toward this, consider the joint distribution $P_{X,Y,Z,U}=[P_{X}Q_{J|X}W_{Y,Z|X,J}P_{U|Y}]_{X,Y,Z,U}$ . For fixed $P_{Y,Z}$ , $I(U;Y|Z)=\sum_{z}P_{Z}(z)I(U;Y|Z=z)$ is a convex function of $P_{U|Y,Z}$ , since each term $I(U;Y|Z=z)$ is convex in the channel $P_{U|Y,Z=z}$ for the fixed input distribution $P_{Y|Z=z}$ [20]. By the Markov chain $(X,J,Z)\rightarrow Y\rightarrow U$ , we have $P_{U|Y,Z}=P_{U|Y}$ , so that $P_{U,Y|Z}=P_{Y|Z}P_{U|Y}$ . As $P_{X}$ , $Q_{J|X}$ and $W_{Y,Z|X,J}$ are fixed, $P_{Y,Z}$ is fixed, and hence, $I(U;Y|Z)$ is a convex function of $P_{U|Y}$ .
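The convexity of $I(U;Y|Z)$ in $P_{U|Y}$ for a fixed $P_{Y,Z}$ can be spot-checked numerically. The sketch evaluates $I(U;Y|Z)$ for the chain $Z\rightarrow Y\rightarrow U$ and reports the largest observed violation of the convexity inequality over random instances, which should not exceed rounding error. Alphabet sizes, the random instances and the helper names are arbitrary choices; this is an illustration, not a proof.

```python
import math
import random


def cond_mutual_info(p_yz, p_u_given_y):
    """I(U;Y|Z) in bits for P(y,z) P(u|y), i.e. the chain Z - Y - U.
    p_yz[y][z] is a joint pmf; p_u_given_y[y][u] is a channel."""
    ny, nz, nu = len(p_yz), len(p_yz[0]), len(p_u_given_y[0])
    p_z = [sum(p_yz[y][z] for y in range(ny)) for z in range(nz)]
    total = 0.0
    for z in range(nz):
        for u in range(nu):
            p_uz = sum(p_yz[y][z] * p_u_given_y[y][u] for y in range(ny))
            for y in range(ny):
                p = p_yz[y][z] * p_u_given_y[y][u]
                if p > 0:
                    total += p * math.log2(p * p_z[z] / (p_yz[y][z] * p_uz))
    return total


def random_stochastic(rows, cols, rng):
    """Random row-stochastic matrix as a nested list."""
    out = []
    for _ in range(rows):
        w = [rng.random() for _ in range(cols)]
        s = sum(w)
        out.append([x / s for x in w])
    return out


def convexity_violation(trials=500, seed=0):
    """Largest observed I(mix) - [lam*I1 + (1-lam)*I2]; convexity means <= 0."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        flat = random_stochastic(1, 6, rng)[0]
        p_yz = [flat[0:2], flat[2:4], flat[4:6]]        # |Y| = 3, |Z| = 2
        c1, c2 = random_stochastic(3, 3, rng), random_stochastic(3, 3, rng)
        lam = rng.random()
        mix = [[lam * a + (1 - lam) * b for a, b in zip(r1, r2)]
               for r1, r2 in zip(c1, c2)]
        lhs = cond_mutual_info(p_yz, mix)
        rhs = lam * cond_mutual_info(p_yz, c1) + (1 - lam) * cond_mutual_info(p_yz, c2)
        worst = max(worst, lhs - rhs)
    return worst
```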

Now consider two distortion values $D^{\prime}_{1},D^{\prime}_{2}$ , and let $P^{(1)}_{U|Y}$ and $P^{(2)}_{U|Y}$ achieve the values $F(D^{\prime}_{1},Q_{J|X})$ and $F(D^{\prime}_{2},Q_{J|X})$ , respectively. For $\lambda\in[0,1]$ , define the convex combinations $D_{\lambda}=\lambda D^{\prime}_{1}+(1-\lambda)D^{\prime}_{2}$ and $P^{(\lambda)}_{U|Y}=\lambda P^{(1)}_{U|Y}+(1-\lambda)P^{(2)}_{U|Y}$ . Note that the other factors of the joint distribution $P_{X,J,Y,Z,U}=P_{X}Q_{J|X}W_{Y,Z|X,J}P_{U|Y}$ are fixed here. So the average distortion is linear in $P_{U|Y}$ , and thus, $P^{(\lambda)}_{U|Y}$ is a feasible distribution for $D_{\lambda}$ . Thus,

[TABLE]

Here $(a)$ follows from the convexity of $I(U;Y|Z)$ w.r.t. $P_{U|Y}$ . This proves the convexity of $F(D,Q_{J|X})$ w.r.t. $D$ .

Finally, the continuity of $F(D,Q_{J|X})$ follows from its convexity [21]. ∎

In the following, for the given code and the i.i.d. jamming distribution $Q_{J|X}$ , let $D_{i}:=E[d(X_{i},\widetilde{X}_{i})]$ . Now we have

[TABLE]

Under the memoryless jamming strategy of the adversary given in (15), $Y_{i}$ and $(\mathbf{Z}^{i-1},\mathbf{Z}_{i+1}^{n},\mathbf{Y}^{i-1},\Theta)$ are conditionally independent given $Z_{i}$ . This gives us $(a)$ . By defining $U_{i}=(M,\mathbf{Y}^{i-1},\mathbf{Z}^{i-1},\mathbf{Z}_{i+1}^{n},\Theta)$ , we have $(b)$ . For $(c)$ , note first that (15) implies the Markov chain $U_{i}\leftrightarrow Y_{i}\leftrightarrow Z_{i}$ for all $i$ . The minimization in $(c)$ can hence be taken over pairs $(P_{U_{i}|Y_{i}},P_{\widetilde{X}_{i}|U_{i},Z_{i}})$ such that $E[d(X_{i},\widetilde{X}_{i})]\leq D_{i}$ under $Q_{J|X}$ . Step $(d)$ holds because the distortion constraint is linear in $P_{\widetilde{X}_{i}|U_{i},Z_{i}}$ , which allows us to replace this conditional distribution by a deterministic function $\tilde{x}(.,.)$ of $(u,z)$ , as in the definition of $F(D,Q_{J|X})$ in (16). Finally, $(e)$ and $(f)$ follow, respectively, from the convexity and the non-increasing nature of $F(D,Q_{J|X})$ (Lemma 18), where we note that $(1/n)\sum_{i}D_{i}\leq D^{(n)}$ . This establishes (17).

Bibliography


[1] A. Lapidoth and P. Narayan, “Reliable communication under channel uncertainty,” IEEE Trans. Inform. Theory, vol. 44, pp. 2148–2177, 1998.
[2] R. Dobrushin and B. Tsybakov, “Information transmission with additional noise,” IRE Trans. Inform. Theory, vol. 8, pp. 293–304, September 1962.
[3] A. D. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,” IEEE Trans. Inform. Theory, vol. 22, pp. 1–10, January 1976.
[4] C. E. Shannon, “Coding theorems for a discrete source with a fidelity criterion,” IRE Nat. Conv. Rec., vol. 7, pp. 142–163, 1959.
[5] J. C. Kieffer, “A survey of the theory of source coding with a fidelity criterion,” IEEE Trans. Inform. Theory, vol. 39, pp. 1473–1490, September 1993.
[6] T. Berger and J. D. Gibson, “Lossy source coding,” IEEE Trans. Inform. Theory, vol. 44, pp. 2693–2723, October 1998.
[7] A. Dembo and T. Weissman, “The minimax distortion redundancy in noisy source coding,” IEEE Trans. Inform. Theory, vol. 49, pp. 3020–3030, November 2003.
[8] D. L. Neuhoff, R. M. Gray, and L. D. Davisson, “Fixed rate universal block source coding with a fidelity criterion,” IEEE Trans. Inform. Theory, vol. 21, pp. 511–523, September 1975.
