Second Order Analysis for Joint Source-Channel Coding with Markovian   Source

Ryo Yaguchi; Masahito Hayashi

arXiv:1701.03290·cs.IT·September 10, 2024

TL;DR

This paper derives second order rates for joint source-channel coding with Markovian sources, compares the joint scheme with the separation scheme, and introduces two new distribution families to facilitate the analysis.

Contribution

It extends second order analysis to general Markov sources and channels, and introduces new distribution families for this purpose.

Findings

01 Derived second order rates for Markovian sources over discrete memoryless channels (DMCs).

02 Compared joint source-channel coding with the separation scheme in the second order regime.

03 Introduced the switched Gaussian convolution and *-product distributions.

Abstract

We derive the second order rates of joint source-channel coding when the source obeys an irreducible and ergodic Markov process and the channel is discrete memoryless, whereas a previous study solved this problem only in a special case. We also compare the joint source-channel scheme with the separation scheme in the second order regime, whereas a previous study made such a comparison only by numerical calculation. To achieve these two results, we introduce two new distribution families, the switched Gaussian convolution distribution and the *-product distribution, which are defined by modifying the Gaussian distribution.

Tables

Table I: Cumulative distribution functions

$\Phi_{v}$	Gaussian distribution with variance $v$
$\Psi[v_{1},v_{2},v_{3}]$	Switched Gaussian convolution distribution (8)
$\tilde{\Phi}[v_{1},v_{2}]$	$*$ -product distribution (10)

Table II: Relation between general case and conditional additive case

	general case	conditional additive
message	$M$	$M$
input	$X$	$X$
output variable	$Y$	$(X,Z)$
channel	$W_{Y|X}$	$W_{XZ|X}$
encoder	$\mathsf{e}$	$\mathsf{e}$
decoder	$\mathsf{d}$	$\mathsf{d}$
distribution of message	$P_{M}$	$P_{M}$
decoding error probability	$\mathrm{P}_{\mathrm{js}}(\phi|P_{M},W_{Y|X})$	$\mathrm{P}_{\mathrm{js}}(\phi|P_{M},W_{XZ|X})$

Equations

$$\sum_{x}W(x,z|x^{\prime},z^{\prime})=W(z|z^{\prime})$$

$$W(x,z|x^{\prime},z^{\prime})^{1+\theta}W_{Z}(z|z^{\prime})^{-\theta}$$

$$H_{1+\theta}^{W,\downarrow}(X|Z):=-\frac{1}{\theta}\log\lambda_{\theta},$$

$$H^{W}(X|Z):=\lim_{\theta\to 0}H_{1+\theta}^{W}(X|Z).$$

$$V^{W}(X|Z):=\lim_{\theta\to 0}\frac{2[H^{W}(X|Z)-H_{1+\theta}^{W}(X|Z)]}{\theta}.$$

$$\varphi_{v_{1}+v_{2}}(x)=\int_{-\infty}^{\infty}\varphi_{v_{1}}(y)\varphi_{v_{2}}(x-y)dy.$$

$$\psi[v_{1},v_{2},v_{3}](x):=\cdots$$

$$\Psi[v_{1},v_{2},v_{3}](R)=\cdots$$

$$\tilde{\Phi}[v_{1},v_{2}](R):=\min_{a\in\mathbb{R}}\Phi_{v_{1}}(a)*\Phi_{v_{2}}(R-a),$$

$$a*b=a+b-ab.$$

$$\tilde{\Phi}[v_{1},v_{2}]^{-1}(\varepsilon)=\max_{\varepsilon=\varepsilon_{s}*\varepsilon_{c}}\left(\Phi_{v_{1}}^{-1}(\varepsilon_{s})+\Phi_{v_{2}}^{-1}(\varepsilon_{c})\right).$$

$$\Phi_{v_{1}+v_{2}}(R)\leq\tilde{\Phi}[v_{1},v_{2}](R)\leq 2\Phi_{2(v_{1}+v_{2})}(R)-\Phi_{2(v_{1}+v_{2})}(R)^{2}.$$

$$\mathrm{P}_{\mathrm{js}}[\phi|P_{M},W_{Y|X}]:=\sum_{m\in{\cal M}}P_{M}(m)W_{Y|X}(\{b:\mathsf{d}(b)\neq m\}|\mathsf{e}(m)).$$

$$\mathrm{P}_{\mathrm{js}}(P_{M},W_{Y|X}):=\inf_{\phi}\mathrm{P}_{\mathrm{js}}[\phi|P_{M},W_{Y|X}].$$

$$\mathrm{P}_{\mathrm{js}}[\phi|P_{M},W_{Y|X}]\leq(P_{M}\times P_{X}\times W_{Y|X})\Big\{(P_{M}\times P_{X}\times W_{Y|X})(M,X,Y)\leq c(P_{X}\times\bar{W}_{Y})(X,Y)\Big\}+\frac{1}{c},$$

$$\mathrm{P}_{\mathrm{js}}(P_{M},W_{Y|X})\leq(P_{M}\times P_{X}\times W_{Y|X})\Big\{(P_{M}\times P_{X}\times W_{Y|X})(M,X,Y)\leq c(P_{X}\times\bar{W}_{Y})(X,Y)\Big\}+\frac{1}{c}.$$

$$W_{XZ|X}(x,z|x^{\prime})=P_{XZ}(x-x^{\prime},z).$$

$$\mathrm{P}_{\mathrm{js}}(P_{M},W_{XZ|X})\leq P_{M}\times P_{XZ}\Big\{P_{M}(M)P_{X|Z}(X|Z)\leq c\frac{1}{|{\cal X}|}\Big\}+\frac{1}{c}.$$

$$(P_{M}\times P_{X^{\prime}}\times W_{XZ|X^{\prime}})\Big\{(P_{M}\times P_{X^{\prime}}\times W_{XZ|X})(M,X^{\prime},XZ)\leq cP_{X^{\prime}}\times\bar{W}_{XZ}(X^{\prime},X,Z)\Big\}=\cdots$$

$$\mathrm{P}_{\mathrm{js}}(\phi|P_{M},W_{Y|X})\leq P_{M}\times P_{XZ}\Big\{P_{M}(M)P_{X|Z}(X|Z)\leq c\frac{1}{|{\cal X}|}\Big\}+\frac{1}{c}.$$

$$\mathrm{P}_{\mathrm{js}}(P_{M},W_{Y|X})\geq\sum_{m}P_{M}(m)W_{Y|X=\mathsf{e}(m)}\Big\{P_{M}(m)W_{Y|X=\mathsf{e}(m)}(Y)\leq cQ_{Y}(Y)\Big\}-c.$$

$$\mathrm{P}_{\mathrm{js}}(P_{M},W_{X,Z|X})\geq P_{M}\times P_{XZ}\Big\{P_{M}(M)P_{X|Z}(X|Z)\leq c\frac{1}{|{\cal X}|}\Big\}-c$$

$$Q_{Y}(y)=Q_{XZ}(x,z)=\frac{1}{|{\cal X}|}P_{Z}(z)$$

$$\sum_{m}P_{M}(m)W_{Y|X=\mathsf{e}(m)}\Big\{P_{M}(m)W_{Y|X=\mathsf{e}(m)}(Y)\leq cQ_{Y}(Y)\Big\}=\cdots$$

$$\mathrm{P}_{\mathrm{j}}[\phi|k,n|W_{s},W_{Y^{n}|X^{n}}]:=\sum_{m^{k}\in{\cal M}^{k}}P_{M^{k}}(m^{k})W_{Y^{n}|X^{n}}(\{y^{n}:\mathsf{d}(y^{n})\neq m^{k}\}|\mathsf{e}(m^{k})).$$

Full text

Second Order Analysis for Joint Source-Channel Coding with Markovian Source

Ryo Yaguchi and Masahito Hayashi

The material in this paper will be presented in part at the 2017 IEEE International Symposium on Information Theory (ISIT 2017), Aachen (Germany), 25-30 June 2017.

Ryo Yaguchi was with the Graduate School of Mathematics, Nagoya University, Furocho, Chikusaku, Nagoya, 464-860, Japan.

Masahito Hayashi is with the Graduate School of Mathematics, Nagoya University, Furocho, Chikusaku, Nagoya, 464-860, Japan, and Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117542. (e-mail: [email protected])

Index Terms:

Markov chain, second order, joint source-channel coding, separation scheme

I Introduction

Second order analysis has recently attracted much attention in information theory [1, 2, 3, 4, 6]. In this type of analysis, we focus on the second leading term of the coding length, of order $\sqrt{n}$, in addition to the first leading term of order $n$, where $n$ is the block length. To account for the finiteness of the block length, we must treat the second leading term as carefully as the first. In many existing studies of the second order, except for the papers [13, 14], the coefficient of the order $\sqrt{n}$ is given as the inverse of the cumulative distribution function of the Gaussian distribution evaluated at the decoding error probability $\varepsilon$. This is because second order analysis is deeply rooted in the central limit theorem. In channel coding, the second order coefficient is given by the Gaussian distribution whose variance is the variance of the information density. Here, the information density is the logarithm of the likelihood ratio between the joint distribution of the input and output random variables and their product distribution, when the expectation of this logarithm achieves the channel capacity. However, the variance of the information density is not unique in general, because multiple input distributions may attain the channel capacity. In such a general case, the variance of the Gaussian distribution determining the second order coefficient is chosen depending on whether the decoding error probability $\varepsilon$ is smaller or larger than $1/2$. Recently, the two papers [5, 15] extended the second order analysis to the Markovian case, in which the Markovian version of the central limit theorem is employed instead of the conventional central limit theorem. In particular, the paper [5] discussed source coding for a Markovian source and channel coding for an additive channel whose additive noise is Markovian. Also, Kontoyiannis and Verdú [23] discussed variable-length source coding in a similar setting.

Usually, channel coding is discussed with the message subject to the uniform distribution. In real communication, however, the message is not necessarily uniformly distributed. To address this problem, we often consider channel coding with the message subject to a non-uniform distribution. Such a problem is called joint source-channel coding and has been actively studied by several researchers [12, 10, 11, 6, 9, 8]. As a simple case, we often assume that the message is independent and identically distributed. In this case, the capacity is given as the ratio of the conventional channel capacity to the entropy of the message. Several studies [12, 10, 11] derived the exponential decreasing rate of the decoding error probability in this setting. Recently, Wang-Ingber-Kochman [6] and Kostina-Verdú [9, 24] discussed the second-order coefficient in this problem, but two major open problems have remained. Wang-Ingber-Kochman [6] derived the second order coefficient only when the variance of the information density is unique. When the variance is not unique, Kostina-Verdú [24] extended it to the lossy case. Kostina-Verdú [9] extended the lower bound of the second-order coefficient by the same method as [6]. However, whether this bound can be improved has remained an open problem in the general case. Also, in the above special case, Wang-Ingber-Kochman [6] compared their second order coefficient of the joint scheme with that of the separation scheme. Based on their numerical calculation, they conjectured an inequality for the loss of the separation scheme [7], whose analytical proof has remained another open problem.

In this paper, we tackle both open problems. Firstly, we derive the second-order coefficient in this problem. The obtained coefficient is strictly larger than that by Kostina-Verdú [9] when the variance of the information density is not unique. To characterize the second-order coefficient, we introduce a new probability distribution as a generalization of the Gaussian distribution. That is, the second-order coefficient is given as the inverse of the cumulative distribution function of the new probability distribution. Further, we derive this result even when the distribution of the message is Markovian. Secondly, we discuss the second order coefficient with the separation scheme in the above general setting. Also, we analytically determine the range of the ratio between the error probabilities with the joint and separation schemes when the variance of the information density is unique. In this way, we resolve both open problems.

The remainder of this paper is organized as follows. In Section II, we prepare several information quantities for Markovian processes. Section III introduces two new distribution families. In Section IV, we discuss joint source-channel coding in the single shot setting. Then, Section V presents our results on the second order rate for the Markovian conditional additive channel. Section VI discusses the case of a discrete memoryless channel. In Section VII, we compare the joint source-channel scheme with the separation scheme.

II Notations and Information quantities

II-A Single shot

In this paper, we denote the random variable by a capital letter, e.g., $X$ . By ${\cal X}$ , we denote the set that the random variable $X$ takes values in. Then, we denote the distribution of the random variable $X$ by $P_{X}$ . When we have two distributions $P_{X}$ and $P_{Y}$ , we define their product distribution $P_{X}\times P_{Y}$ as $(P_{X}\times P_{Y})(x,y):=P_{X}(x)P_{Y}(y)$ .

When we have two different sets ${\cal X}$ and ${\cal Y}$, we denote a transition matrix from ${\cal X}$ to ${\cal Y}$ by $W_{Y|X}$. Then, we define the distribution $P_{X}\times W_{Y|X}$ as $(P_{X}\times W_{Y|X})(x,y)=P_{X}(x)W_{Y|X}(y|x)$. When ${\cal X}$ is the same set as ${\cal Y}$, we omit the subscript $Y|X$. In this case, we define the transition matrix $W^{n}$ on ${\cal X}$ as $W^{n}(x_{n}|x_{0}):=\sum_{x_{n-1},\ldots,x_{1}}W(x_{n}|x_{n-1})W(x_{n-1}|x_{n-2})\cdots W(x_{1}|x_{0})$. A transition matrix $W$ on ${\cal X}$ is called irreducible when for each $x,x^{\prime}\in{\cal X}$, there exists a natural number $n$ such that $W^{n}(x|x^{\prime})>0$. An irreducible matrix $W$ is called ergodic when there are no input $x^{\prime}$ and no integer $n^{\prime}\geq 2$ such that $W^{n}(x^{\prime}|x^{\prime})=0$ unless $n$ is divisible by $n^{\prime}$, that is, when $W$ is aperiodic.
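As an illustration, both properties can be checked mechanically for a finite transition matrix. The following Python sketch is not taken from the paper: the function names are hypothetical, `w[x][x_prev]` is assumed to store $W(x|x^{\prime})$, and ergodicity is read as the usual aperiodicity condition, tested through Wielandt's bound that a primitive $d\times d$ matrix has a strictly positive $((d-1)^{2}+1)$-th power.

```python
def is_irreducible(w):
    """True when every state is reachable from every state in >= 1 steps."""
    d = len(w)
    # reach[x][xp]: state x can be reached from state xp with positive probability
    reach = [[w[x][xp] > 0 for xp in range(d)] for x in range(d)]
    for mid in range(d):  # Warshall transitive closure
        for x in range(d):
            for xp in range(d):
                if reach[x][mid] and reach[mid][xp]:
                    reach[x][xp] = True
    return all(all(row) for row in reach)


def is_ergodic(w):
    """True when some power of w is entrywise positive (irreducible and aperiodic)."""
    d = len(w)
    pattern = [[w[x][xp] > 0 for xp in range(d)] for x in range(d)]
    power = pattern
    # Wielandt's bound: it suffices to inspect the ((d - 1)^2 + 1)-th power.
    for _ in range((d - 1) ** 2):
        power = [[any(power[x][m] and pattern[m][xp] for m in range(d))
                  for xp in range(d)] for x in range(d)]
    return all(all(row) for row in power)


flip = [[0.0, 1.0], [1.0, 0.0]]   # deterministic two-cycle, period 2
lazy = [[0.5, 1.0], [0.5, 0.0]]   # same cycle with a self-loop at state 0
print(is_irreducible(flip), is_ergodic(flip))
print(is_irreducible(lazy), is_ergodic(lazy))
```

The two-cycle is irreducible but periodic, while adding a single self-loop makes it ergodic. Only the zero pattern of $W$ matters for both tests, which is why the sketch works with Boolean matrices.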

II-B Markovian process

Since this paper addresses Markovian processes, we prepare several information measures given in [5] for an ergodic and irreducible transition matrix $W=\{W(x,z|x^{\prime},z^{\prime})\}_{(x,z),(x^{\prime},z^{\prime})\in({\cal X\times Z})^{2}}$ on $({\cal X}\times{\cal Z})$. For this purpose, we employ the following assumption on transition matrices, which was introduced in the paper [5].

Definition 1 (non-hidden).

When an ergodic and irreducible transition matrix $W$ satisfies the condition

$$\sum_{x}W(x,z|x^{\prime},z^{\prime})=W(z|z^{\prime})$$

for every $x^{\prime}\in{\cal X}$ and $z,z^{\prime}\in{\cal Z}$, it is called non-hidden (with respect to ${\cal Z}$ ).

For example, when the cardinality of ${\cal Z}$ is $1$ , the above non-hidden condition holds. For a non-hidden transition matrix $W$ on ${\cal X}\times{\cal Z}$ with respect to ${\cal Z}$ , we define the marginal $W_{Z}$ by $W_{Z}(z|z^{\prime}):=\sum_{x}W(x,z|x^{\prime},z^{\prime})$ . In the following, we assume the non-hidden condition. By $\lambda_{\theta}$ , we denote the Perron-Frobenius eigenvalue of

$$W(x,z|x^{\prime},z^{\prime})^{1+\theta}W_{Z}(z|z^{\prime})^{-\theta}$$

for a real number $\theta$ . Then, we define the conditional Rényi entropy for the transition matrix [5] as

$$H_{1+\theta}^{W,\downarrow}(X|Z):=-\frac{1}{\theta}\log\lambda_{\theta},$$

which is often called the lower type of conditional Rényi entropy and is denoted by $H_{1+\theta}^{W,\downarrow}(X|Z)$ in [5].

Taking the limit $\theta\to 0$ , we define the entropy for the transition matrix $W$ as

$$H^{W}(X|Z):=\lim_{\theta\to 0}H_{1+\theta}^{W}(X|Z).$$

To discuss the difference of $H_{1+\theta}^{W}(X|Z)$ from $H^{W}(X|Z)$, we introduce the varentropy for the transition matrix $W$ as

$$V^{W}(X|Z):=\lim_{\theta\to 0}\frac{2[H^{W}(X|Z)-H_{1+\theta}^{W}(X|Z)]}{\theta}.$$

So, we have the approximation $H_{1+\theta}^{W}(X|Z)=H^{W}(X|Z)-\frac{1}{2}V^{W}(X|Z)\theta+O(\theta^{2})$ as $\theta\to 0$. In these definitions, when the output distribution of $W$ does not depend on the input element, the quantities $H_{1+\theta}^{W}(X|Z)$, $H^{W}(X|Z)$, and $V^{W}(X|Z)$ are the same as the conventional definitions. Then, we have the following proposition.

Proposition 2 (Central limit theorem for Markovian process ([22], etc.)).

When $X^{n}=(X_{1},\ldots,X_{n})$ and $Z^{n}=(Z_{1},\ldots,Z_{n})$ are subject to the Markovian process generated by a non-hidden transition matrix $W$, the random variable $\frac{1}{\sqrt{n}}(-\log P_{X^{n}|Z^{n}}(X^{n}|Z^{n})-nH^{W}(X|Z))$ asymptotically obeys the Gaussian distribution with variance $V^{W}(X|Z)$.

Footnote 1: There is a large literature on the central limit theorem for Markovian processes. The paper [19, Corollary 7.2] gives a very elementary proof of it and also summarizes existing approaches to this statement.
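The quantities $H^{W}$ and $V^{W}$ entering Proposition 2 can be estimated numerically from the Perron-Frobenius eigenvalue $\lambda_{\theta}$. The Python sketch below is an illustration under stated assumptions, not the authors' procedure: it treats only the case where ${\cal Z}$ is a singleton (so $W_{Z}\equiv 1$ and the tilted matrix is $W(x|x^{\prime})^{1+\theta}$ ), it stores $W(x|x^{\prime})$ as `w[x][x_prev]`, it obtains $\lambda_{\theta}$ by plain power iteration, and it replaces the limits $\theta\to 0$ by central finite differences with a hypothetical step `h`.

```python
import math


def perron_eigenvalue(a, iters=5000):
    """Largest eigenvalue of a primitive nonnegative matrix by power iteration."""
    d = len(a)
    v = [1.0 / d] * d
    lam = 1.0
    for _ in range(iters):
        u = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
        lam = sum(u)                 # v is normalized so that sum(v) == 1
        v = [x / lam for x in u]
    return lam


def renyi_entropy(w, theta):
    """H_{1+theta}^W = -(1/theta) log lambda_theta, for theta != 0."""
    tilted = [[p ** (1.0 + theta) for p in row] for row in w]
    return -math.log(perron_eigenvalue(tilted)) / theta


def entropy_and_varentropy(w, h=1e-3):
    """Finite-difference estimates of H^W and V^W from H_{1-h}^W and H_{1+h}^W."""
    minus, plus = renyi_entropy(w, -h), renyi_entropy(w, h)
    # H_{1+theta} = H - V*theta/2 + O(theta^2): the average gives H, the difference gives V.
    return 0.5 * (minus + plus), (minus - plus) / h


w_s = [[0.9, 0.2], [0.1, 0.8]]   # hypothetical source, w_s[m][m_prev] = W_s(m|m_prev)
print(entropy_and_varentropy(w_s))
```

For a memoryless source, where every column of `w` equals the same distribution, the two estimates reduce to the Shannon entropy and the variance of $-\log P$, which gives a quick sanity check of the sketch.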

III New Probability Distribution Families

III-A Switched Gaussian convolution distribution

To describe the second order rate in joint source-channel coding, we introduce a new type of distribution family, which we call switched Gaussian convolution distributions. It is known that the convolution of two Gaussian distributions is also a Gaussian distribution, as follows. When $\varphi_{v}$ is the probability density function of the Gaussian distribution with average $0$ and variance $v$, we have

$$\varphi_{v_{1}+v_{2}}(x)=\int_{-\infty}^{\infty}\varphi_{v_{1}}(y)\varphi_{v_{2}}(x-y)dy.$$

Now, we consider the case when the variance of the second probability density function is switched at $y=x$ . So, we define the function $\psi[v_{1},v_{2},v_{3}](x)$ as

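$$\psi[v_{1},v_{2},v_{3}](x):=\int_{-\infty}^{x}\varphi_{v_{1}}(y)\varphi_{v_{+}}(x-y)dy+\int_{x}^{\infty}\varphi_{v_{1}}(y)\varphi_{v_{-}}(x-y)dy,$$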

where $v_{+}:=\max\{v_{2},v_{3}\}$ and $v_{-}:=\min\{v_{2},v_{3}\}$ . Taking the integral with respect to $x$ , we define the function ${\Psi}[v_{1},v_{2},v_{3}](R):=\int_{-\infty}^{R}\psi[v_{1},v_{2},v_{3}](x)dx$ , which satisfies

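$$\Psi[v_{1},v_{2},v_{3}](R)=\int_{-\infty}^{\infty}\varphi_{v_{1}}(y)\min\{\Phi_{v_{2}}(R-y),\Phi_{v_{3}}(R-y)\}dy,\tag{8}$$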

where $\Phi_{v}(R):=\int_{-\infty}^{R}\varphi_{v}(x)dx$ . We simplify $\Phi_{v}$ to $\Phi$ when $v=1$ .

Since the value $\min\{\Phi_{v_{2}}(R-y),\Phi_{v_{3}}(R-y)\}$ goes to $0$ ( $1$ ) as $R$ goes to $-\infty$ ( $\infty$ ), the RHS of (8) also goes to $0$ ( $1$ ) as $R$ goes to $-\infty$ ( $\infty$ ). Also, since the value $\min\{\Phi_{v_{2}}(R-y),\Phi_{v_{3}}(R-y)\}$ is monotonically increasing with respect to $R$, the RHS of (8) is also monotonically increasing with respect to $R$. These facts show that ${\Psi}[v_{1},v_{2},v_{3}](R)$ is the cumulative distribution function of a probability distribution. In the following, we call this distribution the switched Gaussian convolution distribution with $v_{1},v_{2}$, and $v_{3}$.

To see the behavior of the distribution function of the switched Gaussian convolution distribution, we set $v_{1}=v_{2}=1$ , and change the third parameter $v_{3}$ . Then, we obtain the graph given in Fig. 1. From the definition, we find that the maximum $\max_{v_{3}}{\Psi}[1,1,v_{3}](x)$ is realized when $v_{3}=1$ . Fig. 1 shows how much ${\Psi}[1,1,v_{3}](x)$ decreases unless $v_{3}=1$ .
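This behaviour can be reproduced with a few lines of numerical integration. The Python sketch below assumes that the right-hand side of (8) is $\int\varphi_{v_{1}}(y)\min\{\Phi_{v_{2}}(R-y),\Phi_{v_{3}}(R-y)\}dy$, as the discussion of its limits and monotonicity indicates; the function names, the trapezoidal rule, and the truncation of the integral to ten standard deviations are choices made here for illustration only.

```python
import math


def gauss_cdf(x, v):
    """Phi_v(x): CDF of the zero-mean Gaussian with variance v."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * v)))


def gauss_pdf(x, v):
    """varphi_v(x): density of the zero-mean Gaussian with variance v."""
    return math.exp(-x * x / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)


def switched_cdf(r, v1, v2, v3, steps=4000):
    """Trapezoidal estimate of int varphi_{v1}(y) min{Phi_{v2}(r-y), Phi_{v3}(r-y)} dy."""
    half_width = 10.0 * math.sqrt(v1)      # the Gaussian weight is negligible beyond this
    dy = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = -half_width + i * dy
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * gauss_pdf(y, v1) * min(gauss_cdf(r - y, v2),
                                                 gauss_cdf(r - y, v3))
    return total * dy


# Tabulate Psi[1, 1, v3](R) for a few values of v3 and R.
for v3 in (0.25, 1.0, 4.0):
    print(v3, [round(switched_cdf(r, 1.0, 1.0, v3), 4) for r in (-2.0, 0.0, 2.0)])
```

With $v_{3}=1$ the minimum is trivial and the estimate should agree with the ordinary Gaussian value $\Phi_{2}(R)$; any other $v_{3}$ can only lower it, because the minimum never exceeds $\Phi_{1}(R-y)$.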

III-B $*$ -product distribution

Now, given two parameters $v_{1},v_{2}>0$, we define another probability distribution. For this purpose, we define the function $\tilde{\Phi}[v_{1},v_{2}]$ as

$$\tilde{\Phi}[v_{1},v_{2}](R):=\min_{a\in\mathbb{R}}\Phi_{v_{1}}(a)*\Phi_{v_{2}}(R-a),$$

where the product $*$ is defined as

$$a*b=a+b-ab.$$

So, the inverse function $\tilde{\Phi}[v_{1},v_{2}]^{-1}$ is given as

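$$\tilde{\Phi}[v_{1},v_{2}]^{-1}(\varepsilon)=\max_{\varepsilon=\varepsilon_{s}*\varepsilon_{c}}\left(\Phi_{v_{1}}^{-1}(\varepsilon_{s})+\Phi_{v_{2}}^{-1}(\varepsilon_{c})\right).$$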

Since the function $\tilde{\Phi}[v_{1},v_{2}]$ satisfies the condition of the cumulative distribution function, it can be regarded as the cumulative distribution function of another probability distribution. We call it $*$ -product distribution because it is defined based on the $*$ product.

The cumulative distribution function $\tilde{\Phi}[v_{1},v_{2}]$ has the following property.

Lemma 1.

For any $v_{1},v_{2}>0$ , we have

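$$\Phi_{v_{1}+v_{2}}(R)\leq\tilde{\Phi}[v_{1},v_{2}](R)\leq 2\Phi_{2(v_{1}+v_{2})}(R)-\Phi_{2(v_{1}+v_{2})}(R)^{2}.\tag{11}$$

The equality in the first inequality is attained if and only if $\frac{v_{1}}{v_{2}}$ is $0$ or $\infty$. When $R\leq 0$, the equality of the second inequality is attained if and only if $v_{1}=v_{2}$.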

Lemma 1 is shown in Appendix A. The functions in Lemma 1 are numerically compared in Fig. 2. When $v_{1}=v_{2}$ , we also numerically checked that the equality of the second inequality holds even for $R>0$ . Overall, the cumulative distribution functions of this paper are summarized in Table I.
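A comparison of this kind can be reproduced with a simple grid search. The Python sketch below evaluates $\tilde{\Phi}[v_{1},v_{2}](R)=\min_{a}\Phi_{v_{1}}(a)*\Phi_{v_{2}}(R-a)$ with $a*b=a+b-ab$ and prints it between the two bounds of Lemma 1. The function names, the grid resolution, and the search window are assumptions made for illustration; a grid search can only overestimate the true minimum slightly.

```python
import math


def gauss_cdf(x, v):
    """Phi_v(x): CDF of the zero-mean Gaussian with variance v."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * v)))


def star(a, b):
    """The *-product a * b = a + b - ab."""
    return a + b - a * b


def star_product_cdf(r, v1, v2, steps=4000):
    """Grid-search estimate of the minimum over a of Phi_{v1}(a) * Phi_{v2}(r - a)."""
    span = 10.0 * math.sqrt(v1 + v2) + abs(r)
    da = span / steps
    # The grid is centred at r / 2, so the symmetric split a = r - a is always tested.
    grid = (0.5 * r + (i - steps) * da for i in range(2 * steps + 1))
    return min(star(gauss_cdf(a, v1), gauss_cdf(r - a, v2)) for a in grid)


def lemma1_bounds(r, v1, v2):
    """Lower and upper bounds of Lemma 1 at the point r."""
    p = gauss_cdf(r, 2.0 * (v1 + v2))
    return gauss_cdf(r, v1 + v2), 2.0 * p - p * p


for v1, v2 in ((1.0, 1.0), (1.0, 4.0)):
    lower, upper = lemma1_bounds(-1.0, v1, v2)
    print(v1, v2, lower, star_product_cdf(-1.0, v1, v2), upper)
```

For $v_{1}=v_{2}$ the printed middle value should coincide with the upper bound up to rounding, in line with the equality condition of the second inequality.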

Remark 3.

The paper [6, Section V] considered the function $\tilde{\Phi}[1,v_{2}]^{-1}(\varepsilon)/\sqrt{1+v_{2}}$, and gave the same statement as the second inequality in (11) with the condition $\tilde{\Phi}[v_{1},v_{2}](R)<\frac{3}{4}$ in a different form as a conjecture based on numerical calculations. This conjecture had been an open problem.

IV Single Shot Setting

IV-A Problem formulation

We first present the problem formulation in the single shot setting. Assume that the message $M$ takes values in ${\cal M}$ and is subject to the distribution $P_{M}$. For a channel $W_{Y|X}(y|x)$ with input alphabet ${\cal X}$ and output alphabet ${\cal Y}$, a channel code $\phi=(\mathsf{e},\mathsf{d})$ consists of one encoder $\mathsf{e}:{\cal M}\to{\cal X}$ and one decoder $\mathsf{d}:{\cal Y}\to{\cal M}$. The average decoding error probability is defined by

$$\mathrm{P}_{\mathrm{js}}[\phi|P_{M},W_{Y|X}]:=\sum_{m\in{\cal M}}P_{M}(m)W_{Y|X}(\{b:\mathsf{d}(b)\neq m\}|\mathsf{e}(m)).$$

For notational convenience, we introduce the smallest attainable decoding error probability under the above condition:

$$\mathrm{P}_{\mathrm{js}}(P_{M},W_{Y|X}):=\inf_{\phi}\mathrm{P}_{\mathrm{js}}[\phi|P_{M},W_{Y|X}].$$
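For small alphabets the average decoding error probability of a given code can be evaluated by direct enumeration. The Python sketch below is an illustration with hypothetical names and a hypothetical binary symmetric channel: `w[y][x]` is assumed to store $W_{Y|X}(y|x)$, and `enc` and `dec` are lookup tables for $\mathsf{e}$ and $\mathsf{d}$. The helper `best_decoder` optimizes only the decoder for a fixed encoder, so its error is an upper bound on the smallest attainable error probability rather than that quantity itself.

```python
def error_probability(p_m, w, enc, dec):
    """Average error of the code (enc, dec); w[y][x] stores W_{Y|X}(y|x)."""
    total = 0.0
    for m, prob in enumerate(p_m):
        x = enc[m]
        # Mass of the outputs that the decoder does not map back to m.
        total += prob * sum(w[y][x] for y in range(len(w)) if dec[y] != m)
    return total


def best_decoder(p_m, w, enc):
    """MAP decoder for a fixed encoder: pick m maximising P_M(m) W(y|enc[m])."""
    return [max(range(len(p_m)), key=lambda m: p_m[m] * w[y][enc[m]])
            for y in range(len(w))]


bsc = [[0.9, 0.1], [0.1, 0.9]]     # hypothetical binary symmetric channel
p_m = [0.95, 0.05]                 # strongly non-uniform message
enc = [0, 1]
print(error_probability(p_m, bsc, enc, [0, 1]))                       # identity decoder
print(error_probability(p_m, bsc, enc, best_decoder(p_m, bsc, enc)))  # MAP decoder
```

In this toy case the MAP decoder ignores the channel output and always guesses the likely message, which halves the error of the identity decoder. It shows in miniature why a non-uniform $P_{M}$ changes the coding problem.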

IV-B Direct part

IV-B1 General case

We introduce several lemmas for the case when ${\cal M}$ is the set of messages to be sent, $P_{M}$ is the distribution of the messages, and $W_{Y|X}$ is the channel from ${\cal X}$ to ${\cal Y}$ . We have the following single-shot lemma for the direct part.

Proposition 4.

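[16, Lemma 3.8.1] For any constant $c>0$ and for any $P_{X}\in{\cal P(X)}$, there exists a code $\phi=(\mathsf{e},\mathsf{d})$ such that

$$\mathrm{P}_{\mathrm{js}}[\phi|P_{M},W_{Y|X}]\leq(P_{M}\times P_{X}\times W_{Y|X})\Big\{(P_{M}\times P_{X}\times W_{Y|X})(M,X,Y)\leq c(P_{X}\times\bar{W}_{Y})(X,Y)\Big\}+\frac{1}{c},$$

where $\bar{W}_{Y}(y):=\sum_{x}P_{X}(x)W_{Y|X}(y|x)$ and $P_{X}\times W_{Y|X}(y,x):=P_{X}(x)W_{Y|X}(y|x)$.

From the above proposition, we immediately have the following corollary.

Corollary 1.

$$\mathrm{P}_{\mathrm{js}}(P_{M},W_{Y|X})\leq(P_{M}\times P_{X}\times W_{Y|X})\Big\{(P_{M}\times P_{X}\times W_{Y|X})(M,X,Y)\leq c(P_{X}\times\bar{W}_{Y})(X,Y)\Big\}+\frac{1}{c}.\tag{15}$$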

IV-B2 Conditional additive case

Now, we proceed to the case when the channel is conditional additive. Assume that ${\cal X}$ is a module and ${\cal Y}$ is given as ${\cal X}\times{\cal Z}$ . Here, $Z$ is called the internal state. Then, the channel $W$ is called conditional additive [5] when there exists a joint distribution $P_{XZ}$ such that

$$W_{XZ|X}(x,z|x^{\prime})=P_{XZ}(x-x^{\prime},z).\tag{16}$$

We summarize the relation between the general case and the conditional additive case in Table II.
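To make the definition concrete, the Python sketch below builds a conditional additive channel from a joint noise distribution $P_{XZ}$ via $W_{XZ|X}(x,z|x^{\prime})=P_{XZ}(x-x^{\prime},z)$. It is only an illustration: the module is taken to be the integers modulo `q`, and the function name, the dictionary layout, and the numerical values of the noise law are assumptions made here.

```python
def conditional_additive_channel(p_xz, q):
    """Return w with w[(x, z)][x_in] = P_XZ((x - x_in) mod q, z) on the module Z_q."""
    zs = sorted({z for (_, z) in p_xz})
    return {(x, z): [p_xz.get(((x - x_in) % q, z), 0.0) for x_in in range(q)]
            for x in range(q) for z in zs}


# Hypothetical noise law on Z_3 x {0, 1}: z flags a "good" or a "bad" internal state.
p_xz = {(0, 0): 0.63, (1, 0): 0.04, (2, 0): 0.03,
        (0, 1): 0.12, (1, 1): 0.09, (2, 1): 0.09}
w = conditional_additive_channel(p_xz, 3)
for x_in in range(3):
    # Every input sees the same noise law, shifted by x_in, so each column sums to one.
    print(x_in, sum(row[x_in] for row in w.values()))
```

Because every column is a shifted copy of the same distribution, the uniform input distribution produces the output marginal $\frac{1}{|{\cal X}|}P_{Z}(z)$, which is the structure the subsequent lemmas exploit.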

Then we simplify (15) of Corollary 1 to the following lemma.

Lemma 2.

A conditional additive channel $W_{XZ|X}$ satisfies the inequality

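$$\mathrm{P}_{\mathrm{js}}(P_{M},W_{XZ|X})\leq P_{M}\times P_{XZ}\Big\{P_{M}(M)P_{X|Z}(X|Z)\leq c\frac{1}{|{\cal X}|}\Big\}+\frac{1}{c}.\tag{17}$$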

Proof.

Setting $P_{X}$ to the uniform distribution and substituting the random variables $X^{\prime}$ for $X$ and $XZ$ for $Y$ in the right hand side of (15), we have

[TABLE]

where $P_{Z}(z):=\sum_{x}P_{XZ}(x,z)$ . Hence, (15) can be simplified to

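$$\mathrm{P}_{\mathrm{js}}(\phi|P_{M},W_{Y|X})\leq P_{M}\times P_{XZ}\Big\{P_{M}(M)P_{X|Z}(X|Z)\leq c\frac{1}{|{\cal X}|}\Big\}+\frac{1}{c}.$$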

∎

IV-C Converse part

IV-C1 General case

Firstly, combining the idea of meta converse [20][21, Lemma 4][4] and the general converse lemma for the joint source and channel coding [16, Lemma 3.8.2], we obtain the following lemma for the single shot setting. The following lemma is the same as [16, Lemma 3.8.2] when $Q_{Y}$ is $\bar{W}_{Y}$ .

Lemma 3.

For any constant $c>0$ , any code $\phi=(\mathsf{e},\mathsf{d})$ and any distribution $Q_{Y}$ on ${\cal Y}$ , we have

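$$\mathrm{P}_{\mathrm{js}}(P_{M},W_{Y|X})\geq\sum_{m}P_{M}(m)W_{Y|X=\mathsf{e}(m)}\Big\{P_{M}(m)W_{Y|X=\mathsf{e}(m)}(Y)\leq cQ_{Y}(Y)\Big\}-c.\tag{19}$$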

Remark 5.

The paper [24, Theorem 1] gives a similar statement with slightly different terminology. For the reader's convenience, we give its proof in Appendix D.

IV-C2 Conditional additive case

Now, we proceed to the conditional additive case given in (16), in which, ${\cal Y}$ is given as ${\cal X}\times{\cal Z}$ . Applying (19) to the conditional additive case, we obtain the following lemma.

Lemma 4.

The inequality

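$$\mathrm{P}_{\mathrm{js}}(P_{M},W_{X,Z|X})\geq P_{M}\times P_{XZ}\Big\{P_{M}(M)P_{X|Z}(X|Z)\leq c\frac{1}{|{\cal X}|}\Big\}-c\tag{20}$$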

holds for any $c>0$ .

Proof.

We substitute

$$Q_{Y}(y)=Q_{XZ}(x,z)=\frac{1}{|{\cal X}|}P_{Z}(z)$$

into (19). Then, the first term of the right hand side of (20) is

[TABLE]

So, we obtain (20). ∎

V $n$ -fold Markovian conditional additive channel

V-A Formulation for general case

Firstly, we give general notations for channel coding when the message obeys Markovian process. The formulation presented in this subsection will be applied even to the next section. We assume that the set of messages is ${\cal M}^{k}$ . Then, we assume that the message $M^{k}=(M_{1},\ldots,M_{k})\in{\cal M}^{k}$ is subject to the Markov process with the transition matrix $\{W_{s}(m|m^{\prime})\}_{m,m^{\prime}\in{\cal M}}$ . We denote the distribution for $M^{k}$ by $P_{M^{k}}$ .

Now, we consider a very general sequence of channels with the input alphabet ${\cal X}^{n}$ and the output alphabet ${\cal Y}^{n}$. In this case, we denote the transition matrix by $\{W_{Y^{n}|X^{n}}(y^{n}|x^{n})\}_{x^{n}\in{\cal X}^{n},y^{n}\in{\cal Y}^{n}}$. Then, a channel code $\phi=(\mathsf{e},\mathsf{d})$ consists of one encoder $\mathsf{e}:{\cal M}^{k}\to{\cal X}^{n}$ and one decoder $\mathsf{d}:{\cal Y}^{n}\to{\cal M}^{k}$. Then, the average decoding error probability is defined by

$$\mathrm{P}_{\mathrm{j}}[\phi|k,n|W_{s},W_{Y^{n}|X^{n}}]:=\sum_{m^{k}\in{\cal M}^{k}}P_{M^{k}}(m^{k})W_{Y^{n}|X^{n}}(\{y^{n}:\mathsf{d}(y^{n})\neq m^{k}\}|\mathsf{e}(m^{k})).$$

For notational convenience, we introduce the error probability under the above condition:

[TABLE]

When there is no possibility for confusion, we simplify it to $\mathrm{P}_{\mathrm{j}}(k,n)$. Instead of evaluating the error probability $\mathrm{P}_{\mathrm{j}}(k,n)$ for given $k,n$, we are also interested in evaluating

[TABLE]

for given $0\leq\varepsilon\leq 1$ .

V-B Formulation for Markovian conditional additive channel

In this section, we address an $n$ -fold Markovian conditional additive channel [5]. That is, we consider the case when the joint distribution for the additive noise obeys the Markov process. To formulate our channel, we prepare notations. Consider the joint Markovian process on ${\cal X}\times{\cal Z}$ . That is, the random variables $X^{n}=(X_{1},\ldots,X_{n})\in{\cal X}^{n}$ and $Z^{n}=(Z_{1},\ldots,Z_{n})\in{\cal Z}^{n}$ are assumed to be subject to the joint Markovian process defined by the transition matrix $\{W_{c}(x,z|x^{\prime},z^{\prime})\}_{x,x^{\prime}\in{\cal X},z,z^{\prime}\in{\cal Z}}$ . We denote the joint distribution for $X^{n}$ and $Z^{n}$ by $P_{X^{n},Z^{n}}$ . Now, we assume that ${\cal X}$ is a module, and consider the channel with the input alphabet ${\cal X}^{n}$ and the output alphabet $({\cal X}\times{\cal Z})^{n}$ . The transition matrix for the channel $W_{X^{n},Z^{n}|{X^{n}}^{\prime}}$ is given as

[TABLE]

for $z^{n}\in{\cal Z}^{n}$ and $x^{n},{x^{n}}^{\prime}\in{\cal X}^{n}$. Also, we denote $\log|{\cal X}|$ by $R$. In this case, we denote the average error probability $\mathrm{P}_{\mathrm{j}}[\phi|k,n|W_{s},W_{X^{n},Z^{n}|X^{n}}]$ and the minimum average error probability $\mathrm{P}_{\mathrm{j}}(k,n|W_{s},W_{X^{n},Z^{n}|X^{n}})$ by $\mathrm{P_{jca}}[\phi|k,n|W_{s},W_{c}]$ and $\mathrm{P_{jca}}(k,n|W_{s},W_{c})$, respectively. Then, we denote the maximum size $\mathrm{K}(n,\varepsilon|W_{s},W_{Y^{n}|X^{n}})$ by $\mathrm{K_{ca}}(n,\varepsilon|W_{s},W_{c})$. When there is no possibility for confusion, we simplify them to $\mathrm{P_{jca}}[\phi|k,n]$, $\mathrm{P_{jca}}(k,n)$, and $\mathrm{K_{ca}}(n,\varepsilon)$, respectively.

In the following discussion, we assume the non-hidden condition for the joint Markovian process described by the transition matrix $\{W_{c}(x,z|x^{\prime},z^{\prime})\}_{x,x^{\prime}\in{\cal X},z,z^{\prime}\in{\cal Z}}$. Under the non-hidden condition, the paper [5] shows the single-letterized channel capacity to be $C:=\log|{\cal X}|-H^{W_{c}}(X|Z)$. To the best of the authors' knowledge, the class of channels satisfying the non-hidden condition is the largest class of channels whose channel capacity is known. When ${\cal Z}$ is a singleton and the channel is the noiseless channel given by the identity transition matrix $I$, our problem becomes source coding with a Markovian source. In this case, the memory size is equal to the cardinality $|{\cal X}|^{k}$, and we simplify the smallest attainable decoding error probability $\mathrm{P_{jca}}(k,n|W_{s},I_{X|X})$ to $\mathrm{P}_{\mathrm{s}}(k,n|W_{s})$.

V-C Second order analysis

Theorem 1.

For any $0<\varepsilon<1$ , it holds that

[TABLE]

In other words,

[TABLE]

Theorem 1 yields the following corollary.

Corollary 2.

For $0<\varepsilon<1$ , we have

[TABLE]

Proof.

It is sufficient to show

[TABLE]

when $k$ is chosen as

[TABLE]

By choosing $c=e^{n^{1/4}}$ , (17) implies that

[TABLE]

Applying Proposition 2 to the random variables $-\log{P_{M^{k}}(M^{k})}$ and $-\log{P_{X^{n}|Z^{n}}(X^{n}|Z^{n})}$, we find that the random variable $\frac{1}{\sqrt{n}}(-\log{P_{M^{k}}(M^{k})}-\log{P_{X^{n}|Z^{n}}(X^{n}|Z^{n})}-kH^{W_{s}}(M)-nH^{W_{c}}(X|Z))$ converges to the Gaussian random variable with variance $\frac{C}{H^{W_{s}}(M)}V^{W_{s}}(M)+V^{W_{c}}(X|Z)$. Since $\frac{n^{1/4}}{\sqrt{n}}\to 0$ and $\frac{1}{\sqrt{n}}(n\log{|{\cal X}|})=\frac{1}{\sqrt{n}}(kH^{W_{s}}(M)+nH^{W_{c}}(X|Z)-\sqrt{n}R)$, we see that the RHS of (30) goes to $\Phi_{\frac{C}{H^{W_{s}}(M)}V^{W_{s}}(M)+V^{W_{c}}(X|Z)}(R)$, which implies that

[TABLE]
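The variance appearing in this limit can be traced as follows. This is a sketch reconstructed from the relation $n\log|{\cal X}|=kH^{W_{s}}(M)+nH^{W_{c}}(X|Z)-\sqrt{n}R$ used in the proof, together with the independence of the message $M^{k}$ and the noise $(X^{n},Z^{n})$; it is not quoted from the paper.

```latex
\begin{align*}
kH^{W_{s}}(M) &= n\bigl(\log|{\cal X}|-H^{W_{c}}(X|Z)\bigr)+\sqrt{n}R = nC+\sqrt{n}R
  \quad\Longrightarrow\quad \frac{k}{n}\to\frac{C}{H^{W_{s}}(M)},\\
\mathrm{Var}\Bigl[\tfrac{1}{\sqrt{n}}\bigl(-\log P_{M^{k}}(M^{k})\bigr)\Bigr]
  &\approx \frac{k}{n}V^{W_{s}}(M)\to\frac{C}{H^{W_{s}}(M)}V^{W_{s}}(M),\\
\mathrm{Var}\Bigl[\tfrac{1}{\sqrt{n}}\bigl(-\log P_{X^{n}|Z^{n}}(X^{n}|Z^{n})\bigr)\Bigr]
  &\to V^{W_{c}}(X|Z).
\end{align*}
```

Since the two terms are independent, their variances add, which gives the total variance $\frac{C}{H^{W_{s}}(M)}V^{W_{s}}(M)+V^{W_{c}}(X|Z)$ of the limiting Gaussian.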

By choosing $c=e^{-n^{1/4}}$ , (20) implies that

[TABLE]

Since $e^{-n^{1/4}}\to 0$ , the above application of Proposition 2 implies

[TABLE]

The combination of (31) and (33) implies (28). ∎

Similar to the above two cases, we can recover the result of data compression with the second order regime.

VI $n$ -fold Discrete Memoryless Channel (DMC) case

VI-A Formulation and notations

In this section, we address the $n$ -fold discrete memoryless channel with the input system ${\cal X}^{n}$ and the output system ${\cal Y}^{n}$. Here, we adopt the same assumptions as in Section V for the message source. The difference from Section V is the form of the channel. Given a transition matrix $\{W_{Y|X}(y|x)\}_{x\in{\cal X},y\in{\cal Y}}$, the transition matrix for the channel $W_{Y^{n}|{X^{n}}}$ is given as

[TABLE]

where $x^{n}=(x_{1},\ldots,x_{n})\in{\cal X}^{n}$ and $y^{n}=(y_{1},\ldots,y_{n})\in{\cal Y}^{n}$ .

In this case, we denote the average error probability $\mathrm{P}_{\mathrm{j}}[\phi|k,n|W_{s},W_{Y^{n}|X^{n}}]$ and the minimum average error probability $\mathrm{P}_{\mathrm{j}}(k,n|W_{s},W_{Y^{n}|X^{n}})$ by $\mathrm{P_{jdm}}[\phi|k,n|W_{s},W_{Y|X}]$ and $\mathrm{P_{jdm}}(k,n|W_{s},W_{Y|X})$, respectively. Then, we denote the maximum size $\mathrm{K}(n,\varepsilon|W_{s},W_{Y^{n}|X^{n}})$ by $\mathrm{K_{jdm}}(n,\varepsilon|W_{s},W_{Y|X})$. When there is no possibility for confusion, we simplify them to $\mathrm{P_{jdm}}[\phi|k,n]$, $\mathrm{P_{jdm}}(k,n)$, and $\mathrm{K_{jdm}}(n,\varepsilon)$, respectively.

For the later discussion, we prepare the mutual information as

[TABLE]

where $D(P\|Q):=\sum_{y\in{\cal Y}}P(y)\log\frac{P(y)}{Q(y)}$ . Then, we define its variance version as

[TABLE]

and we also define the channel capacity $C:=\max_{P_{X}\in{\cal P(X)}}I(P_{X},W_{Y|X})=\min_{Q}\max_{x\in{\cal X}}D(W_{Y|X=x}\|Q)$ . Also, we define the maximum and minimum variances

[TABLE]

and the distribution achieving above maximum and minimum as

[TABLE]

VI-B Second order analysis and comparison

Using the switched Gaussian convolution distribution $\Psi\left[\frac{C}{H^{W_{s}}(M)}V^{W_{s}}(M),V^{*}_{+}(W_{Y|X}),V^{*}_{-}(W_{Y|X})\right]$ , we derive the second order coding rate in the following Theorem.

Theorem 2.

For any $\varepsilon\in(0,1)$ , we have

[TABLE]

where

[TABLE]

In other words, we have

[TABLE]

The direct and converse parts will be shown in Subsections VI-C and VI-D. The paper [6] discussed the same problem when the message is independent and identically distributed and the relation $V^{*}_{+}(W_{Y|X})=V^{*}_{-}(W_{Y|X})$ holds. When this relation holds, $\Psi\Big[\frac{C}{H^{W_{s}}(M)}V^{W_{s}}(M),V^{*}_{+}(W_{Y|X}),V^{*}_{-}(W_{Y|X})\Big]^{-1}(\varepsilon)$ becomes $\sqrt{\frac{C}{H^{W_{s}}(M)}V^{W_{s}}(M)+V^{*}_{+}(W_{Y|X})}\,\Phi^{-1}(\varepsilon)$ .
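In the equal-variance case the inverse of the switched Gaussian convolution distribution reduces to a scaled Gaussian quantile, which is easy to evaluate numerically. The sketch below implements only this special case; the function name `second_order_quantile` and its argument names are hypothetical, with `source_term` standing for $\frac{C}{H^{W_{s}}(M)}V^{W_{s}}(M)$ and `channel_variance` for the common value $V^{*}_{+}(W_{Y|X})=V^{*}_{-}(W_{Y|X})$ .

```python
from math import sqrt
from statistics import NormalDist


def second_order_quantile(eps, source_term, channel_variance):
    """Return sqrt(source_term + channel_variance) * Phi^{-1}(eps).

    Valid only under the assumption V*_+ = V*_-; the general switched
    Gaussian convolution distribution is not implemented here.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return sqrt(source_term + channel_variance) * NormalDist().inv_cdf(eps)
```

The value is negative for $\varepsilon<\frac{1}{2}$ and positive for $\varepsilon>\frac{1}{2}$ , and its magnitude grows with the sum of the source and channel variances.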

When the message is independent and identically distributed, as a simple generalization of the direct part of [6], Kostina-Verdú [9] showed the inequality

[TABLE]

where $\varepsilon_{KV}(R)$ is defined as

[TABLE]

Hence, we call the bound $\varepsilon_{KV}(R)$ the Kostina-Verdú bound even for a general Markovian source with a transition matrix $W_{s}$ . As a comparison between our tight bound $\varepsilon(R)$ and the Kostina-Verdú bound $\varepsilon_{KV}(R)$ , we obtain the following lemma.

Lemma 5.

The ratio $\frac{\varepsilon_{KV}(R)}{\varepsilon(R)}$ is evaluated as

[TABLE]

Equality in the first inequality holds if and only if $V^{*}_{+}(W_{Y|X})=V^{*}_{-}(W_{Y|X})$ or $V(W_{s})=0$ . Equality in the second inequality is attained if and only if $V^{*}_{+}(W_{Y|X})$ and $V^{*}_{-}(W_{Y|X})$ go to $+\infty$ and $0$ , respectively.

This lemma shows that a gap between $V^{*}_{+}(W_{Y|X})$ and $V^{*}_{-}(W_{Y|X})$ produces a non-negligible effect for joint source-channel coding when the source is non-uniform. Fig. 3 gives a numerical calculation of the ratio $\frac{\varepsilon_{KV}(R)}{\varepsilon(R)}$ .

Proof.

The property (8) implies the first inequality. The equality condition for the first inequality follows from the form of the switched Gaussian convolution distribution given in (8).

To show the second inequality, we introduce the notation with variance $v$ as:

[TABLE]

For any $R$ , we find that $\alpha[v](R)$ is a monotonically increasing function of $v$ , and $\beta[v](R)$ is a monotonically decreasing function of $v$ . Additionally, we define

[TABLE]

For $R<0$ , we have

[TABLE]

where $(a)$ follows from $\beta[V^{*}_{+}(W_{Y|X})](R)\geq\beta_{min}(R)$ , $\beta[V^{*}_{-}(W_{Y|X})](R)\leq\beta_{max}(R)$ , and $\alpha[V^{*}_{-}(W_{Y|X})](R)\geq\alpha_{min}(R)$ .

For $R\geq 0$ , we have

[TABLE]

where $(b)$ follows from $\beta[V^{*}_{+}(W_{Y|X})](R)\geq\beta_{min}(R)$ , $\alpha[V^{*}_{-}(W_{Y|X})](R)\leq\alpha_{max}(R)$ , and $\alpha[V^{*}_{-}(W_{Y|X})](R)\geq\alpha_{min}(R)$ . The equality condition of the second inequality follows from the equality conditions of $(a)$ and $(b)$ . ∎

VI-C Direct part

To show the direct part of Theorem 2, we invent a novel random coding method because the existing random coding method cannot attain the bound $\varepsilon(R)$ . To attain the bound $\varepsilon(R)$ , the distribution on ${\cal X}^{n}$ that generates the random code must be chosen depending on the message to be sent. Hence, we divide the set of messages into two sets, and we choose our code depending on the set to which the message belongs. To realize this type of code, we employ a code composed of two parts. The first part informs the receiver which set the message belongs to. The second part sends the element of the chosen set. Using Proposition 4, we show that this code attains the bound $\varepsilon(R)$ .

Step (0): First, we prepare some notation, part of which is used throughout this proof, including the converse part. We simplify $W_{Y|X}(y|x)$ as $W_{x}(y)$ and $W_{Y^{n}|X^{n}}(y^{n}|x^{n})$ as $W_{x^{n}}^{n}(y^{n})$ . So, $W_{X^{n}}(Y^{n})$ is a random variable on ${\cal X}^{n}\times{\cal Y}^{n}$ . We choose the integer $k$ as

[TABLE]

Then, we define the following random variables.

[TABLE]

Step (i): In this step, we describe the code used in this proof. This code consists of two parts as follows. In the first part, the sender tells the receiver whether $S(m^{k})\leq R$ or $S(m^{k})>R$ . In the second part, they communicate with each other by using the code chosen according to the result of the first part.

Now, we give the first part, in which the message size is $2$ . So, we use only $n^{1/4}$ transmissions of the channel for the first part. That is, the first part is the code $\phi_{n}^{0}=(\mathsf{e}_{n}^{0},\mathsf{d}_{n}^{0})$ that tells whether $S(m^{k})\leq R$ or not. Assume that ${\cal X}$ contains the elements $0$ and $1$ . To give the first part, we define the encoder $\mathsf{e}_{n}^{0}:\{0,1\}\to{\cal X}^{n^{1/4}}$ as

[TABLE]

The decoder $\mathsf{d}_{n}^{0}:{\cal Y}^{n^{1/4}}\to\{0,1\}$ is defined as

[TABLE]

Then, we denote the error probability of the code $\phi_{n}^{0}$ by $\delta_{n}$ , which is represented as

[TABLE]

Note that $\delta_{n}\to 0$ because $n^{1/4}\to\infty$ .
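The vanishing of $\delta_{n}$ can be illustrated numerically. The sketch below is not the code $\phi_{n}^{0}$ of the proof, whose definition is not reproduced here; it assumes a repetition encoder with majority-vote decoding over a binary symmetric channel, and the helper names are hypothetical. It evaluates the exact error probability when $\lceil n^{1/4}\rceil$ channel uses are spent on the one-bit flag.

```python
from math import comb, isqrt


def repetition_error(length, crossover):
    """Exact majority-vote error probability of a repetition code over a BSC.

    Ties are counted as errors, which makes the value slightly pessimistic.
    """
    return sum(
        comb(length, flips) * crossover**flips * (1 - crossover) ** (length - flips)
        for flips in range(length + 1)
        if 2 * flips >= length
    )


def flag_block_length(n):
    """Return ceil(n^(1/4)) using integer arithmetic only."""
    root = isqrt(isqrt(n))  # floor of the fourth root
    return root if root**4 >= n else root + 1


# Error probability of the flag for growing overall block length n.
deltas = [repetition_error(flag_block_length(n), 0.1) for n in (10**2, 10**4, 10**6, 10**8)]
```

Although the flag occupies a vanishing fraction $n^{-3/4}$ of the block, its error probability decays quickly, so it does not affect the second order analysis.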

As the second part, we define the code to send the message $m^{k}$ based on the information transmitted in the first part. We use $N$ transmissions of the channel in the second part, where $N=n-n^{1/4}$ . Then, (58) implies that

[TABLE]

Using Proposition 4, we define the code $\phi_{N}^{+}:=(\mathsf{e}_{N}^{+},\mathsf{d}_{N}^{+})$ so that

[TABLE]

where $P_{M^{k}|S(M^{k})\leq R}$ is the conditional probability distribution of $P_{M^{k}}$ under the condition $S(M^{k})\leq R$ . On the other hand, using Proposition 4, we define a code $\phi_{N}^{-}=(\mathsf{e}_{N}^{-},\mathsf{d}_{N}^{-})$ so that

[TABLE]

where $P_{M^{k}|S(M^{k})>R}$ is the conditional probability distribution of $P_{M^{k}}$ under the condition of $S(M^{k})>R$ . In both cases, $c$ is chosen to be $e^{N^{1/4}}$ .

Using the above preparation, we define the code $\phi_{n}:=(\mathsf{e}_{n},\mathsf{d}_{n})$ for the whole protocol as follows. For the encoder, we define $\mathsf{e}_{n}:{\cal M}^{k}\to{\cal X}^{\lceil n^{\frac{1}{4}}\rceil}\times{\cal X}^{N}$ as

[TABLE]

Also, we define the decoder $\mathsf{d}_{n}:{\cal Y}^{\lceil n^{\frac{1}{4}}\rceil}\times{\cal Y}^{N}\to{\cal M}^{k}$ as

[TABLE]

Step (ii): In this step, we will prove that

[TABLE]

For the code $\phi_{n}$ , an error happens if an error occurs in the code $\phi_{n}^{0}$ , or if no error occurs in the code $\phi_{n}^{0}$ and an error occurs in the code $\phi_{N}^{\pm}$ . Since $\delta_{n}\to 0$ , the error probability of the code $\phi_{n}$ , i.e., $\mathrm{P}_{\mathrm{js}}[\phi|P_{M^{k}},W_{Y^{n}|X^{n}}]$ , is evaluated as

[TABLE]

When $S(m^{k})\leq R$ , $P_{M^{k}|S(M^{k})\leq R}(m^{k})=\frac{P_{M^{k}}(m^{k})}{P_{M^{k}}\{S(M^{k})\leq R\}}$ . So, applying the central limit theorem for a Markovian process (Proposition 2) to the random variable $-\log P_{M^{k}}(M^{k})$ , we have

[TABLE]

which implies $\log{P_{M^{k}|S(M^{k})\leq R}(M^{k})}=\log P_{M^{k}}(M^{k})+o(\sqrt{N}).$ Since $kH^{W_{s}}(M)=NC+\sqrt{N}R+o(\sqrt{N})$ and $\frac{1}{\sqrt{N}}\log c\to 0$ , due to (62), we can rewrite (63) as:

[TABLE]

On the other hand, when $S(m^{k})>R$ , we have $P_{M^{k}|S(M^{k})>R}(m^{k})=\frac{P_{M^{k}}(m^{k})}{P_{M^{k}}\{S(M^{k})>R\}}$ . So, applying the central limit theorem for a Markovian process to the random variable $-\log P_{M^{k}}(M^{k})$ , we obtain

[TABLE]

which implies $\log{P_{M^{k}|S(M^{k})>R}(m^{k})}=\log P_{M^{k}}(m^{k})+o(\sqrt{N}).$ So, we can rewrite (64) as:

[TABLE]

Combining (72), (73) and (74), we obtain (71).

Step (iii): In this step, we will prove that

[TABLE]

which implies

[TABLE]

for the integer $k$ given in (58).
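The role of the block length $k$ can be made concrete with a small sketch. Equation (58) is not reproduced here; the function below only assumes the relation $kH^{W_{s}}(M)=nC+\sqrt{n}R+o(\sqrt{n})$ used in the proof and resolves the remainder by rounding down, which is a choice made for illustration. The name `choose_block_length` and the numerical values are hypothetical.

```python
from math import floor, sqrt


def choose_block_length(n, capacity, entropy_rate, R):
    """Largest integer k with k * entropy_rate <= n * capacity + sqrt(n) * R.

    Rounding down is an assumption of this sketch; it changes k by less
    than one, which is o(sqrt(n)).
    """
    if entropy_rate <= 0:
        raise ValueError("entropy_rate must be positive")
    return floor((n * capacity + sqrt(n) * R) / entropy_rate)


# Illustrative values: C = 0.5 nats per channel use, H = 1 nat per source symbol.
k_small = choose_block_length(10**4, 0.5, 1.0, -1.0)
k_large = choose_block_length(10**8, 0.5, 1.0, -1.0)
```

The ratio $k/n$ approaches $\frac{C}{H^{W_{s}}(M)}$ at speed $O(1/\sqrt{n})$ , and the parameter $R$ controls the deviation at that scale.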

Applying the central limit theorem for a Markovian process (Proposition 2), we find the following facts. Under the distribution $P_{M^{k}}$ , the random variable $S(M^{k})$ asymptotically obeys the Gaussian distribution with mean $0$ and variance $\frac{C}{H^{W_{s}}(M)}V^{W_{s}}(M)$ . Under the distribution $P_{M^{k}|S(M^{k})\leq R}\times(P_{X}^{+})^{\times N}\times W_{Y^{N}|X^{N}}$ , the random variable $C(X^{N},Y^{N})$ asymptotically obeys the Gaussian distribution with mean $0$ and variance $V^{*}_{+}(W_{Y|X})$ . Under the distribution $P_{M^{k}|S(M^{k})>R}\times(P_{X}^{-})^{\times N}\times W_{Y^{N}|X^{N}}$ , the random variable $C(X^{N},Y^{N})$ asymptotically obeys the Gaussian distribution with mean $0$ and variance $V^{*}_{-}(W_{Y|X})$ . Hence, taking the limit $N\to\infty$ , we obtain

[TABLE]

which implies (75).

VI-D Converse part

To show the converse part, we apply (19) of Lemma 3 to the case with the distribution $Q^{n}_{U}$ given in Step (i), which can be regarded as an extension of the idea of the paper [2] to the joint scheme. Then, we apply the central limit theorem for Markovian process (Proposition 2) to the two random variables related to the dispersions of channel and source. Since we treat two Gaussian random variables, the asymptotic error probability is lower bounded by the convolution of two Gaussian distributions. However, since the variance of the dispersions of channel is not unique, in general, we need to take the minimum for the Gaussian distribution function. Hence, the asymptotic error probability is lower bounded by the switched Gaussian convolution distribution.

Step (i): In this step, to show the converse part, we prepare several notations. We choose the message block length $k$ so that

[TABLE]

We write $x^{n}:=\mathsf{e}(m^{k})$ . We focus on the set $T_{n}$ of empirical distributions with $n$ channel inputs. Its cardinality $|T_{n}|$ is evaluated as $|T_{n}|\leq(n+1)^{|{\cal X}|}$ . In this proof, we use the distribution

[TABLE]

where

[TABLE]

We also define the sets

[TABLE]

where ${\rm ep}(\mathsf{e}(m^{k}))$ of (82) is the empirical distribution of $\mathsf{e}(m^{k})\in{\cal X}^{n}$ .
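The two ingredients of Step (i), the empirical distribution of a codeword and the polynomial bound $|T_{n}|\leq(n+1)^{|{\cal X}|}$ , can be checked with a short sketch. The helper names are hypothetical; the exact count of empirical distributions is the standard stars-and-bars number $\binom{n+|{\cal X}|-1}{|{\cal X}|-1}$ .

```python
from collections import Counter
from math import comb


def empirical_distribution(sequence, alphabet):
    """Relative frequency of each alphabet symbol in the sequence."""
    counts = Counter(sequence)
    n = len(sequence)
    return tuple(counts[a] / n for a in alphabet)


def number_of_types(n, alphabet_size):
    """Exact number of empirical distributions of length-n sequences."""
    return comb(n + alphabet_size - 1, alphabet_size - 1)


ep = empirical_distribution((0, 1, 1, 0, 1), alphabet=(0, 1))
bound_holds = all(
    number_of_types(n, size) <= (n + 1) ** size
    for n in range(1, 50)
    for size in range(1, 6)
)
```

Because the number of empirical distributions grows only polynomially in $n$ , mixing over them costs a factor whose logarithm is $O(\log n)$ , which is negligible at the $\sqrt{n}$ scale of the second order analysis.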

Step (ii): We set the real number $c$ to be $e^{-n^{\frac{1}{4}}}$ . Since $\log c=nC+\sqrt{n}R-kH^{W_{s}}(M)$ , by substituting $Q_{Y}=Q^{n}_{U}$ , (19) of Lemma 3 implies that

[TABLE]

For arbitrary $L>0$ , the first term of the right-hand side is evaluated as

[TABLE]

Step (iii): For the second term of (84), we will show the following fact: Given an arbitrary small real number $\delta>0$ , there exists a sufficiently large $n_{1}$ such that

[TABLE]

for $n\geq n_{1}$ and $m^{k}\in\pi_{n,J,i}\cap\Omega_{n}^{c}$ .

When $m^{k}\in\Omega_{n}^{c}$ ,

[TABLE]

where $\mathrm{E}_{P}$ and $\mathrm{V}_{P}$ denote the expectation and the variance under the distribution $P$ . Thus, when $m^{k}\in\pi_{n,J,i}\cap\Omega_{n}^{c}$ , by using Chebyshev's inequality, we obtain

[TABLE]

For sufficiently large $n$ , we have

[TABLE]

Since the value

[TABLE]

asymptotically goes to $1$ , we obtain (85).

Step (iv): For the second term of (84), we will show the following fact:

Given an arbitrary small real number $\delta>0$ , there exists a sufficiently large $n_{2}$ such that

[TABLE]

for $n\geq n_{2}$ and $m^{k}\in\Omega_{n}$ .

Now, to evaluate the variance of some random variable later, we define the quantity

[TABLE]

When $m^{k}\in\Omega_{n}$ , the inequality

[TABLE]

holds. Since the random variable

[TABLE]

has the variance $nV^{\prime}_{{\rm ep}(\mathsf{e}(m^{k})),W}$ , applying the central limit theorem, we have

[TABLE]

for sufficiently large $n$ . Because $\Phi(\cdot)$ is a monotonically increasing function and the inequalities

[TABLE]

hold, the condition $R-\frac{i+1}{J}\geq 0$ implies

[TABLE]

and the other condition $R-\frac{i+1}{J}<0$ implies

[TABLE]

Hence, we obtain (90).

Step (v) : We will show the following fact: Given an arbitrary small real number $\delta>0$ , there exists a sufficiently large $n_{3}$ such that

[TABLE]

where $i_{0}:=\max\{i\in\mathbb{Z}|\frac{i+1}{J}\leq R\}$ , for $n\geq n_{3}$ and $m^{k}\in\Omega_{n}$ .

Combining (85) and (90), for sufficiently large $n$ , we obtain

[TABLE]

Step (vi): We will show the following fact: Given an arbitrary small real number $\delta^{\prime}>0$ , there exist sufficiently large numbers $n_{4},L$ , and $J$ such that

[TABLE]

for $n\geq n_{4}$ .

From the central limit theorem for a Markov sequence (Proposition 2), the random variable $S(M^{k})$ asymptotically obeys the Gaussian distribution with mean $0$ and variance $\frac{C}{H^{W_{s}}(M)}V^{W_{s}}(M)$ , i.e.,

[TABLE]

With the limit $n\to\infty$ , we have

[TABLE]

So, taking the limit $n\to\infty$ , we have

[TABLE]

When $J\to\infty$ , we can compute (102) as:

[TABLE]

Furthermore, when $L\to\infty$ ,

[TABLE]

So, we obtain (99).

Step (vii): Since $\delta,\delta^{\prime}>0$ are arbitrary, the combination of Steps (v) and (vi) yields

[TABLE]

VII The Comparison between Joint and Separation Scheme

VII-A Formulation for separation coding

In this section, we compare the performance of the joint scheme with the performance of the separation scheme. To discuss the separation scheme, we formulate a separation encoder and a separation decoder. First, we fix the input and output coding-lengths to be $k$ and $n$ . Then, we need to consider the encoded set $\{1,\cdots,A\}$ of source coding, which is also the message set of the channel coding. Since the channel encoder does not know the source distribution, it is natural to consider the average case with respect to the permutation on the set $\{1,\cdots,A\}$ . To handle such a permutation, we focus on the following triplet:

• A source encoder $\mathsf{e}_{s,k,A}:{\cal M}^{k}\to\{1,\cdots,A\}$ .

• A source-channel mapping $f_{U}:\{1,\cdots,A\}\to\{1,\cdots,A\}$ .

• A channel encoder $\mathsf{e}_{c,A,n}:\{1,\cdots,A\}\to{\cal X}^{n}$ .

Then, our separation encoder is given as $\mathsf{e}_{c,A,n}\circ f_{U}\circ\mathsf{e}_{s,k,A}$ . The source-channel mapping $f_{U}$ is a random variable subject to the uniform distribution on the set of permutations on the set $\{1,\cdots,A\}$ . To discuss the separation decoder, we consider

• A source decoder $\mathsf{d}_{s,A,k}:\{1,\cdots,A\}\to{\cal M}^{k}$ .

• The inverse of the source-channel mapping $f_{U}^{-1}:\{1,\cdots,A\}\to\{1,\cdots,A\}$ .

• A channel decoder $\mathsf{d}_{c,n,A}:{\cal Y}^{n}\to\{1,\cdots,A\}$ .

So, our separation decoder is given as $\mathsf{d}_{s,A,k}\circ f_{U}^{-1}\circ\mathsf{d}_{c,n,A}$ . That is, our separation code is composed of $(\mathsf{e}^{*}_{n},\mathsf{d}^{*}_{n}):=(\mathsf{e}_{c,A,n}\circ f_{U}\circ\mathsf{e}_{s,k,A},\mathsf{d}_{s,A,k}\circ f_{U}^{-1}\circ\mathsf{d}_{c,n,A})$ .

Here, the source code $(\mathsf{e}_{s,k,A},\mathsf{d}_{s,A,k})$ has the source coding rate

[TABLE]

and the channel code $(\mathsf{e}_{c,A,n},\mathsf{d}_{c,n,A})$ has the channel coding rate

[TABLE]

Then, the decoding error probability of the code $(\mathsf{e}^{*}_{n},\mathsf{d}^{*}_{n})$ is given as the probability that the error occurs in the source coding or the channel coding. Hence, the decoding error probability $\mathrm{P}_{\rm sep}(\mathsf{e}^{*}_{n},\mathsf{d}^{*}_{n})$ is defined as

[TABLE]

Since the source-channel mapping $f_{U}$ takes values in the set of permutations on $\{1,\cdots,A\}$ according to the uniform distribution, it is natural to take the average with respect to the choice of $f_{U}$ . Hence, the value $\mathrm{P}_{\rm sep}[(\mathsf{e}_{s,k,A},\mathsf{d}_{s,A,k}),(\mathsf{e}_{c,A,n},\mathsf{d}_{c,n,A})]$ is defined as the average of $\mathrm{P}_{\rm sep}(\mathsf{e}^{*}_{n},\mathsf{d}^{*}_{n})$ with respect to this choice:

[TABLE]

Let $\mathrm{P}_{s}(\mathsf{e}_{s,k,A},\mathsf{d}_{s,A,k})$ be the decoding error probability of the source code $(\mathsf{e}_{s,k,A},\mathsf{d}_{s,A,k})$ , and let $\mathrm{P}_{c}(\mathsf{e}_{c,A,n},\mathsf{d}_{c,n,A})$ be the decoding error probability of the channel code $(\mathsf{e}_{c,A,n},\mathsf{d}_{c,n,A})$ with the message subject to the uniform distribution. Then, we have the following lemma.

Lemma 6.

The average $\mathrm{P}_{\rm sep}[(\mathsf{e}_{s,k,A},\mathsf{d}_{s,A,k}),(\mathsf{e}_{c,A,n},\mathsf{d}_{c,n,A})]$ is calculated as

[TABLE]

Proof.

From (107), we have

[TABLE]

The second term of (109) can be calculated as follows.

[TABLE]

Combining (109) and (110), we have

[TABLE]

∎
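One way to see why the channel error probability under the uniform message distribution appears in Lemma 6 is that averaging over a uniformly random permutation $f_{U}$ removes any dependence on the distribution of the source code output. The sketch below checks this averaging identity exhaustively for $A=4$ with exact rational arithmetic. The distribution `Q` on $\{1,\cdots,A\}$ and the per-message channel error probabilities `p_c` are hypothetical values, and the identity shown is a sketch of the mechanism rather than a reproduction of (109) and (110).

```python
from fractions import Fraction
from itertools import permutations


def averaged_channel_error(index_dist, per_message_error):
    """Average over all permutations f of sum_a Q(a) * p_c(f(a))."""
    size = len(index_dist)
    perms = list(permutations(range(size)))
    total = sum(
        sum(index_dist[a] * per_message_error[f[a]] for a in range(size))
        for f in perms
    )
    return total / len(perms)


# Hypothetical non-uniform distribution of the source encoder output.
Q = [Fraction(7, 10), Fraction(1, 5), Fraction(1, 10), Fraction(0)]
# Hypothetical error probability of the channel code for each message.
p_c = [Fraction(1, 100), Fraction(1, 20), Fraction(1, 5), Fraction(2, 5)]

avg = averaged_channel_error(Q, p_c)
uniform_avg = sum(p_c) / len(p_c)
```

The average coincides with the error probability under the uniform message distribution, whatever the distribution `Q` is, because each message index is the image of each source index under exactly $(A-1)!$ permutations.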

Under the fixed input and output coding-lengths $k$ and $n$ , we minimize the above value $\mathrm{P}_{\rm sep}[(\mathsf{e}_{s,k,A},\mathsf{d}_{s,A,k}),(\mathsf{e}_{c,A,n},\mathsf{d}_{c,n,A})]$ as

[TABLE]

Here, since

[TABLE]

we have

[TABLE]

Note that for any two real numbers $\alpha$ and $\beta$ ,

[TABLE]

Considering the minimum with given value $A$ , we have

[TABLE]

Hereafter, we denote the coding rate of the separation scheme by $r_{n}:=\frac{k}{n}$ . Additionally, we define

[TABLE]

Remark 6.

Many existing papers [7, 11, 8] discussed the separation scheme, and they focused on the value $\mathrm{P}_{s}(\mathsf{e}_{s,k,A},\mathsf{d}_{s,A,k})*\mathrm{P}_{c}(\mathsf{e}_{c,A,n},\mathsf{d}_{c,n,A})$ . However, they did not give a rigorous derivation of this value. The contribution of this subsection is the derivation of this value from the formulation given here, which is rigorously shown as Lemma 6.

VII-B Second order analysis

VII-B1 Conditional additive channel case

In this section, we evaluate the second order rate of the separation scheme. Using the $*$ -product distribution $\tilde{\Phi}\left[\frac{C}{H^{W_{s}}(M)}V^{W_{s}}(M),V^{W_{c}}(X|Z)\right]$ , we have the following theorem for a conditional additive channel given by the transition matrix $W_{c}$ .

Theorem 3.

The optimal transmission length $\mathrm{K_{sep}}(n,\varepsilon)$ is asymptotically expanded as

[TABLE]

In other words,

[TABLE]

where

[TABLE]

Remark 7.

This theorem is an extension of the existing result [6, Section V] to the case with Markovian source and a conditional additive channel.

Proof.

We assume that $\lim_{n\to\infty}\mathrm{P}_{\rm sep}(k,n)=\varepsilon$ and the intermediate set size of the separation code is $A$ . If $\mathrm{P}_{\mathrm{s}}(A;P_{M^{k}})\to\varepsilon_{s}$ and $\mathrm{P}_{\mathrm{c}}(A;W_{Y^{n}|X^{n}})\to\varepsilon_{c}$ then $\varepsilon=\varepsilon_{s}*\varepsilon_{c}$ .

The channel and source coding theorems for the Markovian case with the second order [5, Theorems 10 and 21] guarantee the following relations

[TABLE]

Hence, we have

[TABLE]

Since $\frac{k}{n}=\frac{C}{H^{W_{s}}(M)}+o(1)$ ,

[TABLE]

Optimizing the choice of $A$ , we have

[TABLE]

Hence, we have

[TABLE]

∎

VII-B2 Discrete memoryless channel case

Using the $*$ -product distribution, we evaluate the second order rate of separation coding in the discrete memoryless channel case.

Theorem 4.

For the discrete memoryless channel given by a transition matrix $W_{Y|X}$ , we have

[TABLE]

where

[TABLE]

Remark 8.

The paper [6, Section V] showed the same statement under the assumptions that $V^{*}_{+}(W_{Y|X})=V^{*}_{-}(W_{Y|X})$ and that the source is independent and identically distributed. Our contribution is removing the first assumption and generalizing the result to Markovian sources.

Proof.

We find that

[TABLE]

We assume that $\lim_{n\to\infty}\mathrm{P}_{\rm sep}(k,n)=\varepsilon$ and that the intermediate set size of the separation code is $A$ . If $\mathrm{P}_{\mathrm{s}}(A;P_{M^{k}})\to\varepsilon_{s}$ and $\mathrm{P}_{\mathrm{c}}(A;W_{Y^{n}|X^{n}})\to\varepsilon_{c}$ , then $\varepsilon=\varepsilon_{s}*\varepsilon_{c}$ . The channel coding theorem with the second order [1, 3, 4] (Theorem 2 with a uniform message of size $A$ ) guarantees that

[TABLE]

Combining (121) and (130), we obtain

[TABLE]

because $\frac{k}{n}=\frac{C}{H^{W_{s}}(M)}+o(1)$ . So, we have

[TABLE]

∎

VII-C Comparison

Here, we compare the optimal error probability $\varepsilon(R)$ and the error probability $\varepsilon_{\rm sep}(R)$ of the separation scheme. Since this comparison is based on the capacity $C$ , the source entropy rate $H^{W_{s}}(M)$ , the source variance $V^{W_{s}}(M)$ , and the channel variance, the analysis of the conditional additive channel case can be done in the same way as the analysis of the discrete memoryless channel case. So, we discuss only the discrete memoryless channel case.

First, we compare the separation bound with the Kostina-Verdú bound $\varepsilon_{KV}(R)$ defined in (46), which is still not the tight bound in the joint source-channel scheme. The property (11) implies the inequality

[TABLE]

Here, the equality is attained if and only if $V^{W_{s}}(M)=0$ , $H^{W_{s}}(M)=0$ , or $C=0$ . When $H^{W_{s}}(M)=0$ , there is no information to be transmitted. When $C=0$ , no information can be transmitted. These two cases do not occur in realistic settings. When $V^{W_{s}}(M)=0$ , the distribution of the message source is uniform, which is not discussed in the joint source-channel coding. So, we conclude that the separation scheme always has a larger decoding error probability than the joint source-channel scheme.

As the opposite evaluation, we have the following lemma.

Lemma 7.

We have

[TABLE]

where

[TABLE]

Under the conditions $V^{*}_{+}(W_{Y|X})=V^{*}_{-}(W_{Y|X})$ and $R\leq 0$ , the equality holds if and only if $\frac{C}{H^{W_{s}}(M)}V^{W_{s}}(M)=V^{*}_{-}(W_{Y|X})$ .

Proof.

When $R\leq 0$ , we have

[TABLE]

So, the inequality (11) of Lemma 1 implies (133). We can show this inequality in the case of $R>0$ in the same way.

When $V^{*}_{+}(W_{Y|X})=V^{*}_{-}(W_{Y|X})$ and $R\leq 0$ , Lemma 1 guarantees that the equality holds if and only if $\frac{C}{H^{W_{s}}(M)}V^{W_{s}}(M)=V^{*}_{-}(W_{Y|X})$ . ∎

When the variance of the information density is unique, i.e., $V^{*}_{+}(W_{Y|X})=V^{*}_{-}(W_{Y|X})$ , Lemma 7 analytically determines the range of the ratio between the error probabilities of the joint and separation schemes. For the general case, combining Lemmas 5 and 7, we obtain the following lemma.

Lemma 8.

We have

[TABLE]

where $R^{*}:=\frac{R}{\sqrt{\frac{C}{H^{W_{s}}(M)}V^{W_{s}}(M)}}$ .

Remark 9.

The paper [7, Section V] discussed a comparison similar to Lemma 7 when the source is independent and identically distributed and $V^{*}_{-}(W_{Y|X})=V^{*}_{+}(W_{Y|X})$ . Although the paper [7, Section V] conjectured a statement similar to Lemma 1 via numerical calculation, it did not prove it. Hence, it could not analytically determine the range of the ratio between the error probabilities of the joint and separation schemes even when $V^{*}_{+}(W_{Y|X})=V^{*}_{-}(W_{Y|X})$ .

VIII Discussion

We have discussed joint source-channel coding in the second order regime. There were two open problems in this area. One is the complete derivation of the second order coding rate in the general discrete memoryless case. When the maximum and minimum variances have the same value, the second order coding rate was derived in the papers [6, 7]. However, the general case had remained an open problem, while a lower bound was obtained by Kostina and Verdú [9]. Our optimal rate is strictly better than the lower bound of [9]. To achieve this better rate, we have invented a new random coding method, in which the distribution on the input alphabet is chosen according to the generation probability of the message. Since the generation probability depends on the message in the joint coding regime, this improvement is very effective. This coding method can be expected to apply to other problems. The second contribution is the derivation of the range of the ratio between the second order error probabilities of the joint and separation schemes. The paper [7] derived an upper bound only by numerical calculation. We have shown this conjecture analytically. Further, we have given a rigorous formulation for the separation coding in Subsection VII-A, while the error probability given in the RHS of (107) was used in many previous studies without rigorous derivation.

To obtain both main contributions, we have newly introduced two distribution families in Section III. One is the switched Gaussian convolution distribution and the other is the $*$ -product distribution. Both are defined by modifying the Gaussian distribution. We have derived notable relations among the cumulative distribution functions of these distributions and the Gaussian distribution. The second contribution has been obtained from these relations. Since these new distributions have operational meaning in this way, we can expect that they will be applied to other topics in information theory and related areas.

Acknowledgments

MH is very grateful to Professor Vincent Y. F. Tan and Professor Shun Watanabe for helpful discussions and comments. The works reported here were supported in part by JSPS Grants-in-Aid for Scientific Research (B) No. 16KT0017 and (A) No.17H01280, the Okawa Research Grant and Kayamori Foundation of Informational Science Advancement.

Appendix A Proof of Lemma 1

Step (i): In this step, we prove the first inequality of (11). Assume that $0<v_{1},v_{2}<\infty$ . Let $X$ and $Y$ be Gaussian random variables with mean $0$ and variances $v_{1}$ and $v_{2}$ , respectively. They are assumed to be independent of each other. For a given real number $a$ , we have

[TABLE]

On the other hand, since $X+Y$ is a Gaussian random variable with variance $v_{1}+v_{2}$ , we have

[TABLE]

Because $\{X+Y\leq R\}\subsetneq\{X\leq a\ \mathrm{or}\ Y\leq R-a\}$ , we have

[TABLE]

which implies that

[TABLE]

Taking the maximum with respect to $a$ , we have

[TABLE]

Further, when $v_{1}$ or $v_{2}$ is zero, or $v_{1}$ or $v_{2}$ is infinity, the equality holds in (146).

Step (ii): In this step, we show the second inequality in (11), and its equality condition. For the proof, we define the new function $\tilde{\varepsilon}(\varepsilon,y)$ for $\varepsilon\in(0,1)$ and $y\geq 0$ as:

[TABLE]

Using this function, we can rewrite function $\tilde{\Phi}[v_{1},v_{2}]^{-1}(\varepsilon)$ as:

[TABLE]

where $y=\frac{v_{2}}{v_{1}}$ . Hence, the second inequality in (11) and its equality condition follow from the following two lemmas.

Lemma 9.

For any $y\geq 0$ , it holds that

[TABLE]

Hence, we obtain $R=\tilde{\Phi}[v_{1},v_{2}]^{-1}(\varepsilon)\geq\sqrt{2(v_{1}+v_{2})}\Phi^{-1}(1-\sqrt{1-\varepsilon})$ . So, we have $\Phi_{2(v_{1}+v_{2})}(R)\geq 1-\sqrt{1-\varepsilon}$ , which implies that

[TABLE]

Due to the equality condition in Lemma 9, the equality holds in (150) only when $v_{1}=v_{2}$ . Conversely, we have the following lemma.

Lemma 10.

When $R\leq 0$ , the equality

[TABLE]

holds.

Hence, we can see that the equality holds in (150) if and only if $v_{1}=v_{2}$ . Lemma 9 is shown in Appendix B, and Lemma 10 is shown in Appendix C.
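The inequalities of this appendix can be checked numerically. The definition (10) of the $*$ -product distribution is not reproduced in this part of the paper, so the sketch below assumes the form suggested by Step (i) and Appendix C, namely $\tilde{\Phi}[v_{1},v_{2}](R)=\min_{a}\Phi_{v_{1}}(a)*\Phi_{v_{2}}(R-a)$ with $\alpha*\beta=\alpha+\beta-\alpha\beta$ . The grid minimization and all function names are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist


def star(a, b):
    """Assumed * operation on probabilities: a * b = a + b - a b."""
    return a + b - a * b


def star_product_cdf(R, v1, v2, half_width=10.0, steps=20001):
    """Grid approximation of min_a Phi_{v1}(a) * Phi_{v2}(R - a) (assumed form)."""
    first = NormalDist(0.0, sqrt(v1))
    second = NormalDist(0.0, sqrt(v2))
    scale = sqrt(max(v1, v2))
    lo = R / 2 - half_width * scale  # the grid is centered at R / 2
    step = 2 * half_width * scale / (steps - 1)
    return min(
        star(first.cdf(lo + i * step), second.cdf(R - (lo + i * step)))
        for i in range(steps)
    )


R = -1.0
# Equal variances and R <= 0: Lemma 10 predicts the minimizer a = R / 2.
equal_case = star_product_cdf(R, 1.0, 1.0)
closed_form = star(NormalDist().cdf(R / 2), NormalDist().cdf(R / 2))
# Unequal variances: the Gaussian with variance v1 + v2 is a lower bound.
unequal_case = star_product_cdf(R, 1.0, 3.0)
gaussian_lower = NormalDist(0.0, sqrt(1.0 + 3.0)).cdf(R)
```

With $v_{1}=v_{2}=1$ and $R=-1$ the grid minimum agrees with $\Phi(\frac{R}{2})*\Phi(\frac{R}{2})$ , and in both cases the Gaussian distribution function with variance $v_{1}+v_{2}$ stays below the minimum, as the set inclusion of Step (i) requires.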

Appendix B Proof of Lemma 9

It is sufficient to show the following two statements. (1) For any $y>0$ , we have

[TABLE]

(2) The maximum in (152) is realized only when $\varepsilon_{s}=\varepsilon_{c}$ . Under this condition, the infimum $\inf_{y\geq 0}\frac{\Phi^{-1}(\varepsilon_{s})+\sqrt{y}\Phi^{-1}(\varepsilon_{c})}{\sqrt{1+y}}$ is realized only when $y=1$ .

The statement (1) implies (150), and the statement (2) implies the necessary condition for the equality in (150).

Step (i): In this step, we will show the following relation for $\varepsilon\leq\frac{3}{4}$ .

[TABLE]

Hence, it is sufficient to show that

[TABLE]

We rewrite the LHS of (160) as

[TABLE]

where $\cdot$ denotes the inner product of vectors. The inside of the RHS of (165) is calculated as

[TABLE]

Since $\varepsilon\leq\frac{3}{4}$ , either $\Phi^{-1}(\varepsilon_{s})$ or $\Phi^{-1}(\varepsilon_{c})$ is negative. Hereafter, we will consider the maximum value of (174) under the condition $\varepsilon=\varepsilon_{s}*\varepsilon_{c}$ .

When $\varepsilon\leq\frac{1}{2}$ , we have $\varepsilon_{s},\varepsilon_{c}\leq\frac{1}{2}$ , which implies (160). So, we consider the case when $\frac{1}{2}<\varepsilon\leq\frac{3}{4}$ , which has the above three cases. First, we consider the case when $\Phi^{-1}(\varepsilon_{s})\leq 0$ and $\Phi^{-1}(\varepsilon_{c})\geq 0$ . Then, we have

[TABLE]

That is, the maximum value is attained when $\Phi^{-1}(\varepsilon_{s})=\Phi^{-1}(2\varepsilon-1)$ and $\Phi^{-1}(\varepsilon_{c})=0$ .

We obtain the same equation in the case when $\Phi^{-1}(\varepsilon_{s})\geq 0$ and $\Phi^{-1}(\varepsilon_{c})\leq 0$ . Hence, we find that the maximum of the RHS of (174) equals $\max_{\varepsilon=\varepsilon_{s}*\varepsilon_{c},\varepsilon_{s}\leq\frac{1}{2},\varepsilon_{c}\leq\frac{1}{2}}-\sqrt{\Phi^{-1}(\varepsilon_{s})^{2}+\Phi^{-1}(\varepsilon_{c})^{2}}$ , which implies (154).

Step (ii): In this step, when $\varepsilon\leq\frac{3}{4}$ , we will show the following equation. We will also show that the following maximum is realized if and only if $\varepsilon_{s}=\varepsilon_{c}$ . The discussion of Step (i) shows that, under this condition, the infimum $\inf_{y\geq 0}\frac{\Phi^{-1}(\varepsilon_{s})+\sqrt{y}\Phi^{-1}(\varepsilon_{c})}{\sqrt{1+y}}$ is realized only when $y=1$ . These discussions show the desired statements (1) and (2) for $\varepsilon\leq\frac{3}{4}$ .

[TABLE]

For notation, we define the function for $\varepsilon_{s}$ and $\varepsilon_{c}$ as:

[TABLE]

Then, we can find that

[TABLE]

and $\max_{\varepsilon=\varepsilon_{s}*\varepsilon_{c},\varepsilon_{s}\leq\frac{1}{2},\varepsilon_{c}\leq\frac{1}{2}}A(\varepsilon_{s},\varepsilon_{c})$ is a monotonically decreasing function of $\varepsilon$ . Hence, the relation (176) is equivalent to

[TABLE]

Choosing $a:=-\sqrt{2}\Phi^{-1}(1-\sqrt{1-\varepsilon})>0$ , we write

[TABLE]

for certain $\pi\leq\theta\leq\frac{3}{2}\pi$ . Now, to regard $\varepsilon_{s}*\varepsilon_{c}$ as a function of $\theta$ , we define

[TABLE]

and hereafter we will find the value of $\theta$ that minimizes $f(\theta)$ . Calculating the derivative, we have

[TABLE]

Now, we define

[TABLE]

Because $\Phi^{\prime}(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^{2}}{2}}$ and $\Phi(x)$ is a monotonically increasing function for $x<0$ , we find that $g_{a}(x)$ is a monotonically increasing function for $x<0$ . Since

[TABLE]

the derivative test chart of $f(\theta)$ is given as follows.

[TABLE]

Hence, when $\theta=\frac{5}{4}\pi$ , i.e., $\varepsilon_{s}=\varepsilon_{c}$ , $f(\theta)$ is minimized. Therefore, when $(\varepsilon_{s},\varepsilon_{c})$ satisfies $\varepsilon=\varepsilon_{s}*\varepsilon_{c}$ and $\varepsilon_{s}=\varepsilon_{c}$ , the minimum (179) is attained. So, we have $\varepsilon_{s}=\varepsilon_{c}=1-\sqrt{1-\varepsilon}$ , which means (179).
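The minimization over $\theta$ can be checked numerically. The displayed parametrization is not reproduced here, so the sketch below assumes $\Phi^{-1}(\varepsilon_{s})=a\cos\theta$ and $\Phi^{-1}(\varepsilon_{c})=a\sin\theta$ with $f(\theta)=\varepsilon_{s}*\varepsilon_{c}$ and $\alpha*\beta=\alpha+\beta-\alpha\beta$ , which is consistent with the symmetric solution $\varepsilon_{s}=\varepsilon_{c}=1-\sqrt{1-\varepsilon}$ at $\theta=\frac{5}{4}\pi$ . The grid search and the function names are illustrative.

```python
from math import cos, pi, sin, sqrt
from statistics import NormalDist

PHI = NormalDist()


def star(a, b):
    """Assumed * operation on probabilities: a * b = a + b - a b."""
    return a + b - a * b


def f(theta, a):
    """eps_s * eps_c along the arc (a cos(theta), a sin(theta)) (assumed form)."""
    return star(PHI.cdf(a * cos(theta)), PHI.cdf(a * sin(theta)))


def arc_minimizer(eps, steps=2001):
    """Grid search for the minimizer of f over theta in [pi, 3 pi / 2]."""
    a = -sqrt(2) * PHI.inv_cdf(1 - sqrt(1 - eps))  # positive when eps < 3/4
    grid = [pi + i * (pi / 2) / (steps - 1) for i in range(steps)]
    theta_min = min(grid, key=lambda t: f(t, a))
    return theta_min, f(theta_min, a)


theta_min, f_min = arc_minimizer(0.3)
```

For $\varepsilon=0.3$ the grid minimizer lies at $\theta\approx\frac{5}{4}\pi$ and the minimum value equals $\varepsilon$ up to numerical precision, in agreement with the derivative test above.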

Step (iii): In this step, when $\varepsilon>\frac{3}{4}$ , we will show the following equation. We will also show that the following maximum is realized if and only if $\varepsilon_{s}=\varepsilon_{c}$ . The discussion of Step (i) shows that, under this condition, the infimum $\inf_{y\geq 0}\frac{\Phi^{-1}(\varepsilon_{s})+\sqrt{y}\Phi^{-1}(\varepsilon_{c})}{\sqrt{1+y}}$ is realized only when $y=1$ . These discussions show the desired statements (1) and (2) for $\varepsilon>\frac{3}{4}$ .

[TABLE]

Since $\varepsilon>\frac{3}{4}$ , we have four cases. (1) $\Phi^{-1}(\varepsilon_{s})\leq 0$ and $\Phi^{-1}(\varepsilon_{c})\leq 0$ . (2) $\Phi^{-1}(\varepsilon_{s})>0$ and $\Phi^{-1}(\varepsilon_{c})\leq 0$ . (3) $\Phi^{-1}(\varepsilon_{s})\leq 0$ and $\Phi^{-1}(\varepsilon_{c})>0$ . (4) $\Phi^{-1}(\varepsilon_{s})>0$ and $\Phi^{-1}(\varepsilon_{c})>0$ . The infimum $\inf_{y\geq 0}\frac{\Phi^{-1}(\varepsilon_{s})+\sqrt{y}\Phi^{-1}(\varepsilon_{c})}{\sqrt{1+y}}$ is negative except for the case (4). So, the maximum with respect to $\varepsilon_{s}$ and $\varepsilon_{c}$ under the condition $\varepsilon\geq\varepsilon_{s}*\varepsilon_{c}$ is realized in the case (4). In the case (4), we have

[TABLE]

The maximum of the RHS of (187) with the condition $\varepsilon\geq\varepsilon_{s}*\varepsilon_{c}$ is realized when $\varepsilon=\varepsilon_{s}*\varepsilon_{c}$ and $\varepsilon_{s}=\varepsilon_{c}$ . Solving the equation $\varepsilon=\varepsilon_{s}*\varepsilon_{s}$ , we have

[TABLE]

So, the combination of (187) and (188) yields (186).

Appendix C Proof of Lemma 10

It is sufficient to show the case with $v_{1}=1$ . We set the function

[TABLE]

That is, it is sufficient to show that the minimum $\min_{s}f(s)$ is realized when $s=\frac{R}{2}$ because $\Phi(\frac{R}{2})*\Phi(\frac{R}{2})$ equals the RHS of (151).

Calculating the derivative, we have

$$\frac{df(s)}{ds}=\Phi^{\prime}(s)\Phi(s-R)-\Phi^{\prime}(R-s)\Phi(-s).$$

The function $x\mapsto\Phi^{\prime}(x)\Phi(x-R)$ is a monotonically increasing function for $x<0$ . So, we find that $\frac{df(s)}{ds}\leq 0$ for $s\in[R,\frac{R}{2}]$ and $\frac{df(s)}{ds}\geq 0$ for $s\in[\frac{R}{2},0]$ . Further, when $s<R$ , $|s|>|R-s|$ , which implies that $\Phi^{\prime}(s)<\Phi^{\prime}(R-s)$ . In this case, we have $s-R<-s$ , which implies $\Phi(s-R)<\Phi(-s)$ . So, we obtain $\Phi^{\prime}(s)\Phi(s-R)-\Phi^{\prime}(R-s)\Phi(-s)<0$ , i.e., $\frac{df(s)}{ds}<0$ . Similarly, when $s>0$ , we can show the inequality $\frac{df(s)}{ds}>0$ . Therefore, the minimum $\min_{s}f(s)$ is realized when $s=\frac{R}{2}$ .
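The location of the minimum can be illustrated numerically. The sketch below rests on two assumptions that are not spelled out in this section: that $f(s)=\Phi(s)*\Phi(R-s)$ with $\Phi$ the standard normal distribution function (the case $v_{1}=1$), and that $a*b=a+b-ab$. The grid search and the names `star`, `f` and `grid_argmin` are illustrative only; a grid search supports the statement for the sampled values of $R<0$ but does not replace the proof.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def star(a: float, b: float) -> float:
    """Assumed *-operation on probabilities: a * b = 1 - (1 - a)(1 - b)."""
    return a + b - a * b


def f(s: float, R: float) -> float:
    """Assumed objective f(s) = Phi(s) * Phi(R - s)."""
    return star(PHI(s), PHI(R - s))


def grid_argmin(R: float, half_width: float = 6.0, steps: int = 2401) -> float:
    """Minimizer of f(., R) on a uniform grid centred at R / 2."""
    lo = R / 2.0 - half_width
    grid = [lo + 2.0 * half_width * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda s: f(s, R))


for R in (-0.5, -1.0, -3.0):
    s_min = grid_argmin(R)
    print(f"R={R:+.1f}  grid argmin={s_min:+.4f}  R/2={R / 2:+.4f}  "
          f"f(argmin)={f(s_min, R):.6f}")
```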

Appendix D Proof of Lemma 3

First, we set

[TABLE]

For each $(m,x)\in{\cal M}\times{\cal X}$, we define

[TABLE]

Also, for decoder $\varphi$ and each $m\in{\cal M}$ , we define

[TABLE]

In addition, we define $P_{X|M}$ so that

[TABLE]

Using this notation, we define

[TABLE]

Then,

[TABLE]

The last equality follows since the error probability can be written as

[TABLE]

We notice here that

[TABLE]

for $y\in{\cal B}(m,x)$ . By substituting this into (196), the first term of (196) is

[TABLE]

which implies (19).

Bibliography

[1] V. Strassen, “Asymptotische Abschätzungen in Shannon’s Informationstheorie,” in Transactions of the Third Prague Conference on Information Theory etc., Czechoslovak Academy of Sciences, Prague, pp. 689–723, 1962.
[2] M. Hayashi, “Second-Order Asymptotics in Fixed-Length Source Coding and Intrinsic Randomness,” IEEE Trans. Inf. Theory, vol. 54, no. 10, 4619–4637 (2008).
[3] M. Hayashi, “Information Spectrum Approach to Second-Order Coding Rate in Channel Coding,” IEEE Trans. Inf. Theory, vol. 55, no. 11, 4947–4966 (2009).
[4] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, 2307–2359 (2010).
[5] M. Hayashi and S. Watanabe, “Finite-Length Analyses for Source and Channel Coding on Markov Chains,” arXiv:1309.7528 (2013).
[6] D. Wang, A. Ingber, and Y. Kochman, “The Dispersion of Joint Source-Channel Coding,” Proc. 49th Annual Allerton Conf., Allerton House, Monticello, IL, USA, 2011, pp. 180–187.
[7] D. Wang, A. Ingber, and Y. Kochman, “The Dispersion of Joint Source-Channel Coding,” arXiv:1109.6310 (2011).
[8] V. Y. F. Tan, S. Watanabe, and M. Hayashi, “Moderate Deviations for Joint Source-Channel Coding of Systems With Markovian Memory,” Proceedings of 2014 IEEE International Symposium on Information Theory, June 29 – July 4, 2014, Honolulu, HI, USA, pp. 1687–1691.
