Second Order Analysis for Joint Source-Channel Coding with Markovian Source
Ryo Yaguchi, Masahito Hayashi

TL;DR
This paper derives second order rates for joint source-channel coding with Markovian sources, compares the joint scheme with the separation scheme, and introduces new distribution families to facilitate the analysis.
Contribution
It extends second order analysis to general Markov sources and channels, and introduces new distribution families for this purpose.
Findings
Derived second order rates for Markov sources and DMC channels.
Compared joint source-channel coding with separation scheme in second order regime.
Introduced switched Gaussian convolution and *-product distributions.
Abstract
We derive the second order rates of joint source-channel coding when the source obeys an irreducible and ergodic Markov process and the channel is discrete memoryless, whereas a previous study solved this problem only in a special case. We also compare the joint source-channel scheme with the separation scheme in the second order regime, whereas a previous study made such a comparison only by numerical calculation. To achieve these two advances, we introduce two new families of distributions, the switched Gaussian convolution distribution and the *-product distribution, which are defined by modifying the Gaussian distribution.
| quantity | general case | conditional additive case |
| --- | --- | --- |
| message | | |
| input | | |
| output variable | | |
| channel | | |
| encoder | | |
| decoder | | |
| distribution of message | | |
| decoding error probability | | |
The material in this paper will be presented in part at the 2017 IEEE International Symposium on Information Theory (ISIT 2017), Aachen (Germany), 25-30 June 2017.

Ryo Yaguchi was with the Graduate School of Mathematics, Nagoya University, Furocho, Chikusaku, Nagoya, 464-860, Japan.

Masahito Hayashi is with the Graduate School of Mathematics, Nagoya University, Furocho, Chikusaku, Nagoya, 464-860, Japan, and Centre for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117542. (e-mail: [email protected])
Index Terms:
Markov chain, second order, joint source-channel coding, separation scheme
I Introduction
Second order analysis has recently attracted much attention in information theory [1, 2, 3, 4, 6]. In this type of analysis, we focus on the second leading term, of order $\sqrt{n}$, in the coding length in addition to the first leading term, of order $n$, when the block length is $n$. To discuss finite block lengths, we need to treat the second leading term as carefully as the first leading term. In many existing studies of the second order, except for the papers [13, 14], the coefficient of the order $\sqrt{n}$ is given as the inverse of the cumulative distribution function of the Gaussian distribution, depending on the decoding error probability $\varepsilon$. This is because second order analysis is deeply rooted in the central limit theorem. In channel coding, the second order coefficient is given by the Gaussian distribution whose variance is the variance of the information density. Here, the information density is the logarithm of the likelihood ratio between the joint distribution of the input and output random variables and their product distribution, taken when the expectation of this logarithm achieves the channel capacity. However, the variance of the information density is not unique in general, because multiple input distributions can attain the channel capacity. In such a general case, the variance of the Gaussian determining the second order coefficient is chosen depending on the value of the decoding error probability $\varepsilon$. Recently, the two papers [5, 15] extended the second order analysis to the Markovian case, in which the Markovian version of the central limit theorem is employed instead of the conventional central limit theorem. In particular, the paper [5] discussed source coding for a Markovian source and channel coding for an additive channel whose additive noise is Markovian. Also, Kontoyiannis and Verdú [23] discussed variable-length source coding in a similar setting.
Usually, channel coding is discussed with the message subject to the uniform distribution. However, in real communication, the message is not necessarily uniformly distributed. To address this, we often consider channel coding with a message subject to a non-uniform distribution. Such a problem is called joint source-channel coding and has been actively studied by several researchers [12, 10, 11, 6, 9, 8]. As a simple case, we often assume that the message is subject to an independent and identical distribution. In this case, the capacity is given as the ratio of the conventional channel capacity to the entropy of the message. Several studies [12, 10, 11] derived the exponential decreasing rate of the decoding error probability in this setting. Recently, Wang-Ingber-Kochman [6] and Kostina-Verdú [9, 24] discussed the second-order coefficient in this problem, but two major open problems have remained in this topic, as follows. Wang-Ingber-Kochman [6] derived the second order coefficient only when the variance of the information density is unique. When the variance is not unique, Kostina-Verdú [24] extended it to the lossy case. Kostina-Verdú [9] extended the lower bound of the second-order coefficient by the same method as [6]. However, whether this bound can be improved has been an open problem in the general case. Also, in the above special case, Wang-Ingber-Kochman [6] compared their second order coefficient of the joint scheme with that of the separation scheme. Based on their numerical calculation, they conjectured an inequality for the loss of the separation scheme [7], whose analytical proof has remained another open problem.
In this paper, we tackle both open problems. Firstly, we derive the second-order coefficient in this problem. The obtained coefficient is strictly larger than that by Kostina-Verdú [9] when the variance of the information density is not unique. To characterize the second-order coefficient, we introduce a new probability distribution as a generalization of the Gaussian distribution. That is, the second-order coefficient is given as the inverse of the cumulative distribution function of the new probability distribution. Further, we derive this result even when the distribution of the message is Markovian. Secondly, we discuss the second order coefficient with the separation scheme in the above general setting. Also, we analytically determine the range of the ratio between the error probabilities with the joint and separation schemes when the variance of the information density is unique. In this way, we resolve both open problems.
The remaining part of this paper is organized as follows. In Section II, we prepare several information quantities for Markovian processes. Section III introduces two new distribution families. In Section IV, we discuss joint source-channel coding in the single shot setting. Section V presents our second order results for the Markovian conditional additive channel. Section VI discusses the case of the discrete memoryless channel. In Section VII, we compare the joint source-channel scheme with the separation scheme.
II Notations and Information quantities
II-A Single shot
In this paper, we denote the random variable by a capital letter, e.g., . By , we denote the set that the random variable takes values in. Then, we denote the distribution of the random variable by . When we have two distributions and , we define their product distribution as .
When we have two different sets and , we denote a transition matrix from to by . Then, we define the distribution as . When is the same set as , we do not describe the subscript . In this case, we define the transition matrix on as . A transition matrix on is called irreducible when for each , there exists a natural number such that . An irreducible matrix is called ergodic when there are no input and no integer such that unless is divisible by .
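The irreducibility and ergodicity conditions above can be checked mechanically for a small transition matrix. The following is a minimal sketch, assuming the convention that `W[x, x_prev]` is the probability of moving from `x_prev` to `x` (columns sum to 1); the function names are hypothetical and not taken from the paper. Ergodicity is tested as irreducibility plus period 1, which is the usual reading of the divisibility condition.

```python
import numpy as np
from math import gcd

def is_irreducible(W):
    """True if every state can be reached from every other state."""
    k = W.shape[0]
    A = (W > 0).astype(int)
    # (I + A)^(k-1) has a positive entry at (x, x_prev) iff x is reachable
    # from x_prev in at most k-1 steps.
    R = np.linalg.matrix_power(np.eye(k, dtype=int) + A, k - 1)
    return bool(np.all(R > 0))

def period(W, x=0):
    """gcd of the return times n with W^n(x|x) > 0, scanned up to n = 3k.

    For an irreducible matrix this bound is enough to capture the period.
    """
    k = W.shape[0]
    A = (W > 0).astype(int)
    P = np.eye(k, dtype=int)
    g = 0
    for n in range(1, 3 * k + 1):
        P = ((A @ P) > 0).astype(int)  # support of W^n
        if P[x, x]:
            g = gcd(g, n)
    return g

def is_ergodic(W):
    """Irreducible and aperiodic."""
    return is_irreducible(W) and period(W) == 1
```

For example, a deterministic cycle on three states is irreducible with period 3, hence not ergodic, while mixing it with the identity matrix removes the periodicity.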
II-B Markovian process
Since this paper addresses the Markovian processes, we prepare several information measures given in [5] for an ergodic and irreducible transition matrix on . For this purpose, we employ the following assumption on transition matrices, which were introduced by the paper [5].
Definition 1 (non-hidden).
When an ergodic and irreducible transition matrix satisfies the condition
[TABLE]
for every and , it is called non-hidden (with respect to ).
For example, when the cardinality of is , the above non-hidden condition holds. For a non-hidden transition matrix on with respect to , we define the marginal by . In the following, we assume the non-hidden condition. By , we denote the Perron-Frobenius eigenvalue of
[TABLE]
for a real number . Then, we define the conditional Rényi entropy for the transition matrix [5] as
[TABLE]
which is often called the lower type of conditional Rényi entropy and is denoted by in [5].
Taking the limit , we define the entropy for the transition matrix as
[TABLE]
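As a numerical illustration of this limit, the sketch below treats the simplest special case, assuming the conditioning alphabet is a singleton so that the matrix in the Perron-Frobenius step reduces to the entrywise power $W(x|x')^{1+\theta}$, and assuming the Rényi entropy takes the form $-\frac{1}{\theta}\log\lambda_\theta$. Both are assumptions made for illustration, since the exact displayed definitions are not reproduced here; the function names are hypothetical, and logarithms are natural.

```python
import numpy as np

def stationary(W):
    """Stationary distribution of a column-stochastic matrix W[x, x_prev]."""
    vals, vecs = np.linalg.eig(W)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

def entropy_rate(W):
    """Entropy rate sum_{x'} pi(x') H(W(.|x')) in nats."""
    pi = stationary(W)
    logW = np.where(W > 0, np.log(np.where(W > 0, W, 1.0)), 0.0)
    return float(-np.sum(pi[None, :] * W * logW))

def renyi_entropy(W, theta):
    """-(1/theta) log lambda_theta, with lambda_theta the Perron-Frobenius
    eigenvalue of the entrywise power W^(1+theta) (assumed form)."""
    W_theta = np.where(W > 0, W ** (1.0 + theta), 0.0)
    lam = np.max(np.real(np.linalg.eigvals(W_theta)))
    return float(-np.log(lam) / theta)

W = np.array([[0.9, 0.3],
              [0.1, 0.7]])
print(renyi_entropy(W, 1e-5), entropy_rate(W))
```

Under these assumptions the two printed values agree closely, and the Rényi quantity decreases as $\theta$ grows.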
To discuss the difference of from , we introduce the varentropy for the transition matrix as
[TABLE]
So, we have the approximation as as . In these definitions, when the output distribution of does not depend on the input element, the quantities , , and are the same as the conventional definitions. Then, we have the following proposition.
Proposition 2 (Central limit theorem for Markovian process ([22], etc.)).
When and are subject to the Markovian process generated by a non-hidden transition matrix , the random variable asymptotically obeys the Gaussian distribution with variance . (Footnote 1: The literature on the central limit theorem for Markovian processes is extensive. The paper [19, Corollary 7.2] gives a very elementary proof and also summarizes existing approaches to this statement.)
III New Probability Distribution Families
III-A Switched Gaussian convolution distribution
To describe the second order rate in joint source-channel coding, we introduce a new family of distributions, the so-called switched Gaussian convolution distributions. It is known that the convolution of two Gaussian distributions is again a Gaussian distribution. That is, when $\varphi_{v}$ is the probability density function of the Gaussian distribution with average $0$ and variance $v$, we have

$$\int_{-\infty}^{\infty} \varphi_{v_1}(x-y)\,\varphi_{v_2}(y)\,dy = \varphi_{v_1+v_2}(x). \tag{6}$$
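The Gaussian convolution identity can be checked numerically. The sketch below approximates the convolution integral of two zero-mean Gaussian densities by a Riemann sum and compares it with the zero-mean Gaussian density whose variance is the sum of the two variances. The function names, the grid width and the grid size are illustrative choices, not part of the paper.

```python
import numpy as np

def gauss_pdf(x, v):
    """Density of the zero-mean Gaussian distribution with variance v."""
    return np.exp(-x ** 2 / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)

def convolve_at(x, v1, v2, half_width=40.0, num=200001):
    """Riemann-sum approximation of the convolution of the two densities at x."""
    y = np.linspace(-half_width, half_width, num)
    dy = y[1] - y[0]
    return float(np.sum(gauss_pdf(x - y, v1) * gauss_pdf(y, v2)) * dy)

for x in (-1.0, 0.0, 2.5):
    print(x, convolve_at(x, 1.0, 2.0), float(gauss_pdf(x, 3.0)))
```

The switched variant defined next changes the variance of the second density depending on the region, so this closed form no longer applies and the integral has to be evaluated directly.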
Now, we consider the case when the variance of the second probability density function is switched at . So, we define the function as
[TABLE]
where and . Taking the integral with respect to , we define the function , which satisfies
[TABLE]
where . We simplify to when .
Since the value goes to $0$ ($1$) as goes to $-\infty$ ($\infty$), the RHS of (8) goes to $0$ ($1$) as goes to $-\infty$ ($\infty$). Also, since the value is monotonically increasing with respect to , the RHS of (8) is also monotonically increasing with respect to . These facts show that is the cumulative distribution function of a probability distribution. In the following, we call this distribution the switched Gaussian convolution distribution with , and .
To see the behavior of the distribution function of the switched Gaussian convolution distribution, we set , and change the third parameter . Then, we obtain the graph given in Fig. 1. From the definition, we find that the maximum is realized when . Fig. 1 shows how much decreases unless .
III-B *-product distribution
Now, given two parameters , we define another probability distribution. For this purpose, we define the function as
[TABLE]
where the product is defined as
[TABLE]
So, the inverse function is given as
[TABLE]
Since the function satisfies the conditions of a cumulative distribution function, it can be regarded as the cumulative distribution function of another probability distribution. We call it the *-product distribution because it is defined based on the * product.
The cumulative distribution function has the following property.
Lemma 1.
For any , we have
[TABLE]
The equality in the first inequality is attained if and only if is [math] or . When , the equality of the second inequality is attained if and only if .
Lemma 1 is shown in Appendix A. The functions in Lemma 1 are numerically compared in Fig. 2. When , we also numerically checked that the equality of the second inequality holds even for . Overall, the cumulative distribution functions of this paper are summarized in Table I.
Remark 3.
The paper [6, Section V] considered the function and stated, as a conjecture based on numerical calculations, the same statement as the second inequality in (11) under the condition , in a different form. This conjecture had remained an open problem.
IV Single Shot Setting
IV-A Problem formulation
We first present the problem formulation by the single shot setting. Assume that the message takes values in and is subject to the distribution . For a channel with input alphabet and output alphabet , a channel code consists of one encoder and one decoder . The average decoding error probability is defined by
[TABLE]
For notational convenience, we introduce the smallest attainable decoding error probability under the above condition:
[TABLE]
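For very small alphabets, the smallest attainable decoding error probability can be found by brute force. The sketch below assumes the standard form of the average error probability, namely the message-probability-weighted chance that the decoder output differs from the message, with `W[y, x]` denoting $W(y|x)$. It enumerates all deterministic encoders and pairs each with the maximum a posteriori decoder, which is optimal for a fixed encoder. The function names are hypothetical.

```python
import itertools
import numpy as np

def avg_error(P_M, W, enc, dec):
    """Average decoding error for message distribution P_M, channel W[y, x],
    encoder enc[m] -> x and decoder dec[y] -> m."""
    correct = 0.0
    for m, p in enumerate(P_M):
        x = enc[m]
        correct += p * sum(W[y, x] for y in range(W.shape[0]) if dec[y] == m)
    return 1.0 - correct

def map_decoder(P_M, W, enc):
    """Maximum a posteriori decoder for a fixed encoder."""
    n_m = len(P_M)
    return [int(np.argmax([P_M[m] * W[y, enc[m]] for m in range(n_m)]))
            for y in range(W.shape[0])]

def min_error(P_M, W):
    """Exhaustive search over deterministic encoders (tiny alphabets only)."""
    best = 1.0
    for enc in itertools.product(range(W.shape[1]), repeat=len(P_M)):
        best = min(best, avg_error(P_M, W, enc, map_decoder(P_M, W, enc)))
    return best

bsc = np.array([[0.9, 0.1],
                [0.1, 0.9]])
print(min_error([0.7, 0.3], bsc))
```

The search is exponential in the number of messages, so it only serves to make the single shot definitions concrete.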
IV-B Direct part
IV-B1 General case
We introduce several lemmas for the case when is the set of messages to be sent, is the distribution of the messages, and is the channel from to . We have the following single-shot lemma for the direct part.
Proposition 4 ([16, Lemma 3.8.1]).
For any constant and for any , there exists a code such that
[TABLE]
where and .
From the above proposition, we immediately obtain the following corollary.
Corollary 1.
[TABLE]
IV-B2 Conditional additive case
Now, we proceed to the case when the channel is conditional additive. Assume that is a module and is given as . Here, is called the internal state. Then, the channel is called conditional additive [5] when there exists a joint distribution such that
[TABLE]
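To make the structure concrete, the sketch below builds a channel over the cyclic group $\mathbb{Z}_q$ from a joint distribution of the additive noise and the internal state. It assumes the conditional additive form of [5] in which the channel output is the pair (input plus noise, internal state), that is, `W[b, s, a] = P_noise[(b - a) mod q, s]`. This form, the array layout and the function name are assumptions made for illustration.

```python
import numpy as np

def conditional_additive_channel(P_noise):
    """Channel W[b, s, a] = P_noise[(b - a) mod q, s] over Z_q.

    P_noise[e, s] is the joint distribution of the additive noise e in Z_q
    and the internal state s. The output is the pair (b, s), the input is a.
    """
    q, n_s = P_noise.shape
    W = np.zeros((q, n_s, q))
    for a in range(q):
        for b in range(q):
            W[b, :, a] = P_noise[(b - a) % q, :]
    return W

# Illustrative noise on Z_3 with two internal states: state 0 is "clean",
# state 1 is "noisy".
P_noise = np.array([[0.60, 0.10],
                    [0.05, 0.10],
                    [0.05, 0.10]])
W = conditional_additive_channel(P_noise)
print(W[:, :, 0].sum())
```

Shifting the input and the first output component by the same group element leaves the transition probability unchanged, which is the symmetry that allows a uniform input distribution in the proofs for this channel class.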
We summarize the relation between general case and conditional additive case as Table II.
Then we simplify (15) of Corollary 1 to the following lemma.
Lemma 2.
A conditional additive channel satisfies the inequality
[TABLE]
Proof.
By choosing to be the uniform distribution and substituting the random variables and into the right-hand side of (15), we have
[TABLE]
where . Hence, (15) can be simplified to
[TABLE]
∎
IV-C Converse part
IV-C1 General case
Firstly, combining the idea of meta converse [20][21, Lemma 4][4] and the general converse lemma for the joint source and channel coding [16, Lemma 3.8.2], we obtain the following lemma for the single shot setting. The following lemma is the same as [16, Lemma 3.8.2] when is .
Lemma 3.
For any constant , any code and any distribution on , we have
[TABLE]
Remark 5.
The paper [24, Theorem 1] gives a similar statement with slightly different terminology. For the reader's convenience, we give its proof in Appendix D.
IV-C2 Conditional additive case
Now, we proceed to the conditional additive case given in (16), in which, is given as . Applying (19) to the conditional additive case, we obtain the following lemma.
Lemma 4.
The inequality
[TABLE]
holds for any .
Proof.
We choose as
[TABLE]
to (19). Then, the first term of the right hand side of (20) is
[TABLE]
So, we obtain (20). ∎
V $n$-fold Markovian conditional additive channel
V-A Formulation for general case
Firstly, we give general notations for channel coding when the message obeys Markovian process. The formulation presented in this subsection will be applied even to the next section. We assume that the set of messages is . Then, we assume that the message is subject to the Markov process with the transition matrix . We denote the distribution for by .
Now, we consider a very general sequence of channels with the input alphabet and the output alphabet . In this case, we denote the transition matrix by . Then, a channel code consists of one encoder and one decoder . Then, the average decoding error probability is defined by
[TABLE]
For notational convenience, we introduce the error probability under the above condition:
[TABLE]
When there is no possibility for confusion, we simplify it to . Instead of evaluating the error probability for given , we are also interested in evaluating
[TABLE]
for given .
V-B Formulation for Markovian conditional additive channel
In this section, we address an -fold Markovian conditional additive channel [5]. That is, we consider the case when the joint distribution for the additive noise obeys the Markov process. To formulate our channel, we prepare notations. Consider the joint Markovian process on . That is, the random variables and are assumed to be subject to the joint Markovian process defined by the transition matrix . We denote the joint distribution for and by . Now, we assume that is a module, and consider the channel with the input alphabet and the output alphabet . The transition matrix for the channel is given as
[TABLE]
for and . Also, we denote by . In this case, we denote the average error probability and the minimum average error probability by and , respectively. Then, we denote the maximum size by . When there is no possibility of confusion, we simplify them to , , and , respectively.
In the following discussion, we assume the non-hidden condition for the joint Markovian process described by the transition matrix . Under the non-hidden condition, the paper [5] shows the single-letterized channel capacity to be . To the authors' knowledge, the class of channels satisfying the non-hidden condition is the largest class of channels whose channel capacity is known. When is a singleton and the channel is the noiseless channel given by the identity transition matrix , our problem becomes source coding with a Markovian source. In this case, the memory size is equal to the cardinality , and we simplify the smallest attainable decoding error probability to .
V-C Second order analysis
Theorem 1.
For any , it holds that
[TABLE]
In other words,
[TABLE]
Theorem 1 yields the following corollary.
Corollary 2.
For , we have
[TABLE]
Proof.
It is sufficient to show
[TABLE]
when is chosen as
[TABLE]
By choosing , (17) implies that
[TABLE]
Applying Proposition 2 to the random variables and , we find that
the random variable converges to the Gaussian random variable with variance . Since and , we see that the RHS of (30) goes to , which implies that
[TABLE]
By choosing , (20) implies that
[TABLE]
Since , the above application of Proposition 2 implies
[TABLE]
The combination of (31) and (33) implies (28). ∎
Similar to the above two cases, we can recover the result of data compression with the second order regime.
VI $n$-fold Discrete Memoryless Channel (DMC) case
VI-A Formulation and notations
In this section, we address the $n$-fold discrete memoryless channel with the input system and the output system . We adopt the same assumptions as in Section V for the message source. The difference from Section V is the form of the channel. Given a transition matrix , the transition matrix for the channel is given as
[TABLE]
where and .
In this case, we denote the average error probability and the minimum average error probability by and , respectively. Then, we denote the maximum size by . When there is no possibility of confusion, we simplify them to , , and , respectively.
For the later discussion, we prepare the mutual information as
[TABLE]
where . Then, we define its variance version as
[TABLE]
and we also define the channel capacity . Also, we define the maximum and minimum variances
[TABLE]
and the distributions achieving the above maximum and minimum as
[TABLE]
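The mutual information and its variance version can be computed directly for a small channel. The sketch below uses the conditional information variance, that is, the input-averaged variance of $\log\frac{W(y|x)}{(PW)(y)}$ around the relative entropy of each input. Whether this matches the paper's exact variance definition is an assumption, although the conditional and unconditional variances coincide at capacity-achieving input distributions. Quantities are in nats and the function name is hypothetical.

```python
import numpy as np

def info_quantities(P, W):
    """Mutual information I(P, W) and conditional information variance V(P, W).

    P[x] is the input distribution and W[y, x] = W(y|x).
    """
    Q = W @ P  # output distribution PW
    I, V = 0.0, 0.0
    for x, px in enumerate(P):
        if px == 0:
            continue
        mask = W[:, x] > 0
        llr = np.log(W[mask, x] / Q[mask])
        D = float(np.sum(W[mask, x] * llr))            # D(W_x || PW)
        var = float(np.sum(W[mask, x] * (llr - D) ** 2))
        I += px * D
        V += px * var
    return I, V

p = 0.11
bsc = np.array([[1 - p, p],
                [p, 1 - p]])
I, V = info_quantities(np.array([0.5, 0.5]), bsc)
print(I, V)
```

For the binary symmetric channel with uniform input, these values agree with the closed forms $\log 2 - h(p)$ and $p(1-p)\log^2\frac{1-p}{p}$. The capacity-achieving input distribution is unique here, so the maximum and minimum variances coincide.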
VI-B Second order analysis and comparison
Using the switched Gaussian convolution distribution , we derive the second order coding rate in the following theorem.
Theorem 2.
For any , we have
[TABLE]
where
[TABLE]
In other words, we have
[TABLE]
The direct and converse parts will be shown in Subsections VI-C and VI-D. The paper [6] discussed the same problem when the message is subject to the independent and identical distribution and the relation holds. When the condition holds, $\Psi\Big[\frac{C}{H^{W_{s}}(M)}V^{W_{s}}(M),V^{*}_{+}(W_{Y|X}),V^{*}_{-}(W_{Y|X})\Big]^{-1}(\varepsilon)$ becomes .
When the message is subject to the independent and identical distribution, as a simple generalization of the direct part of [6], Kostina-Verdú [9] showed the inequality
[TABLE]
where is defined as
[TABLE]
Hence, we call the bound Kostina-Verdú bound even for a general Markovian source with a transition matrix . As a comparison between our tight bound and Kostina-Verdú bound , we obtain the following lemma.
Lemma 5.
The ratio is evaluated as
[TABLE]
The equality of the first inequality is attained if and only if or . The equality of the second inequality is attained if and only if and go to and [math], respectively.
This lemma shows that a gap between and produces a non-negligible effect for joint source-channel coding when the source is non-uniform. Fig. 3 gives a numerical calculation of the ratio .
Proof.
The property (8) implies the first inequality. The equality condition for the first inequality follows from the form of the switched Gaussian convolution distribution given in (8).
To show the second inequality, we introduce the notation with variance as:
[TABLE]
For any , we find that is a monotonically increasing function of , and is a monotonically decreasing function of . Additionally, we define
[TABLE]
For , we have
[TABLE]
where follows from , , and .
For , we have
[TABLE]
where follows from , , and . The equality condition of the second inequality follows from the equality conditions of and . ∎
VI-C Direct part
To show the direct part of Theorem 2, we devise a novel random coding method because the existing random coding method cannot attain the bound . To attain the bound , we need to choose the distribution on used for the random coding depending on the message to be sent. Hence, we divide the set of messages into two sets, and we choose our code depending on the set to which the message belongs. To realize this type of code, we employ a code composed of two parts. The first part informs the receiver which set the message belongs to. The second part identifies which element of the chosen set is transmitted. Using Proposition 4, we show that this code attains the bound .
Step(0): First, we prepare several notations, some of which are used throughout this proof including the converse part. We simplify as and as . So, is a random variable on . We choose the integer as
[TABLE]
Then, we define the following random variables.
[TABLE]
Step (i): In this step, we describe our code used in this proof. This code consists of two parts as follows. In the first part, the sender tells the receiver whether or . In the second part, they communicate with each other by using the code that depends on the result of the first part.
Now, we give the first part, in which the message size is . So, we use only transmissions of the channel for the first part. That is, the first part is the code that tells whether or not. Assume that contains elements [math] and . To give the first part, we define the encoder as
[TABLE]
The decoder is defined as
[TABLE]
Then, we denote the error probability of the code by , which is represented as
[TABLE]
Note that because .
As the second part, we define the code to send the message based on the information transmitted in the first part. We use transmissions of the channel in the second part, where . Then, (58) implies that
[TABLE]
Using Proposition 4, we define the code so that
[TABLE]
where is the conditional probability distribution of under the condition of . On the other hand, from Proposition 4, we define a code so that
[TABLE]
where is the conditional probability distribution of under the condition of . In both cases, is chosen to be .
Using the above preparation, we define the code for the whole protocol as follows. For the encoder, we define as
[TABLE]
Also we define the decoder as
[TABLE]
Step (ii): In this step, we will prove that
[TABLE]
On the code , an error happens if an error occurs on the code , or an error doesn’t occur on the code and an error occurs on the code . Since , the error probability of the code , i.e., , is evaluated as
[TABLE]
When , . So, applying the central limit theorem for Markovian process (Proposition 2) to random variable , we have
[TABLE]
which implies Since and , due to (62), we can rewrite (63) as:
[TABLE]
On the other hand, when , we have . So, applying the central limit theorem for Markovian process to random variable , we obtain
[TABLE]
which implies So, we can rewrite (64) as:
[TABLE]
Combining (72), (73) and (74), we obtain (71).
Step (iii): In this step, we will prove that
[TABLE]
which implies
[TABLE]
for the integer given in (58).
Applying the central limit theorem for Markovian process (Proposition 2), we find the following facts. Under the distribution , the random variable asymptotically obeys the Gaussian distribution with mean $0$ and variance . Under the distribution , the random variable asymptotically obeys the Gaussian distribution with mean $0$ and variance . Under the distribution , the random variable asymptotically obeys the Gaussian distribution with mean $0$ and variance . Hence, taking the limit , we obtain
[TABLE]
which implies (75).
VI-D Converse part
To show the converse part, we apply (19) of Lemma 3 to the case with the distribution given in Step (i), which can be regarded as an extension of the idea of the paper [2] to the joint scheme. Then, we apply the central limit theorem for Markovian process (Proposition 2) to the two random variables related to the dispersions of channel and source. Since we treat two Gaussian random variables, the asymptotic error probability is lower bounded by the convolution of two Gaussian distributions. However, since the variance of the dispersions of channel is not unique, in general, we need to take the minimum for the Gaussian distribution function. Hence, the asymptotic error probability is lower bounded by the switched Gaussian convolution distribution.
Step (i): In this step, to show the converse part, we prepare several notations. We choose the message block length so that
[TABLE]
We denote that . We focus on the set of empirical distributions with channel inputs. Its cardinality is evaluated as . And in this proof, we use the distribution
[TABLE]
where
[TABLE]
We also define the sets
[TABLE]
where of (82) is the empirical distribution of .
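The set of empirical distributions used in this step grows only polynomially with the number of channel inputs, which is what makes the mixture over types affordable in the converse. The sketch below enumerates all empirical distributions (as count vectors) for `n` symbols over an alphabet of size `k` and compares their number with the standard polynomial bound $(n+1)^{k-1}$. The exact bound used in the paper is not reproduced here, so this bound is stated as a generic type-counting fact, and the function name is hypothetical.

```python
from itertools import product
from math import comb

def empirical_distributions(n, k):
    """All count vectors (n_1, ..., n_k) of non-negative integers summing to n.

    Dividing a count vector by n gives an empirical distribution of a
    length-n sequence over an alphabet of size k.
    """
    return [c for c in product(range(n + 1), repeat=k) if sum(c) == n]

n, k = 5, 3
types = empirical_distributions(n, k)
print(len(types), comb(n + k - 1, k - 1), (n + 1) ** (k - 1))
```

The exact count is the binomial coefficient $\binom{n+k-1}{k-1}$, which is at most $(n+1)^{k-1}$, so the logarithm of the cardinality is of order $\log n$ and does not affect the order $\sqrt{n}$ term.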
Step (ii): We set the real number to be . Since , by substituting , (19) of Lemma 3 implies that
[TABLE]
For arbitrary , the first term of right hand side is evaluated as
[TABLE]
Step (iii): For the second term of (84), we will show the following fact: Given an arbitrary small real number , there exists a sufficiently large such that
[TABLE]
for and .
When ,
[TABLE]
[TABLE]
where and denote the expectation and the variance under the distribution . Thus, when , by using Chebyshev's inequality, we obtain
[TABLE]
For sufficiently large , we have
[TABLE]
Since the value
[TABLE]
asymptotically goes to , we obtain (85).
Step (iv): For the second term of (84), we will show the following fact:
Given an arbitrary small real number , there exists a sufficiently large such that
[TABLE]
for and .
Now, to evaluate the variance of some random variable later, we define the quantity
[TABLE]
When , the inequality
[TABLE]
holds. Since the random variable
[TABLE]
has the variance , applying the central limit theorem, we have
[TABLE]
for sufficiently large . Because is a monotonically increasing function and the inequalities
[TABLE]
hold, the condition implies
[TABLE]
and the other condition implies
[TABLE]
Hence, we obtain (90).
Step (v) : We will show the following fact: Given an arbitrary small real number , there exists a sufficiently large such that
[TABLE]
where , for and .
Combining (85) and (90), for sufficiently large , we obtain
[TABLE]
Step (vi): We will show the following fact: Given an arbitrary small real number , there exist sufficiently large numbers , and such that
[TABLE]
for .
By the central limit theorem for Markovian processes (Proposition 2), the random variable asymptotically obeys the Gaussian distribution with mean $0$ and variance , i.e.,
[TABLE]
With the limit , we have
[TABLE]
So, taking the limit , we have
[TABLE]
When , we can compute (102) as:
[TABLE]
Furthermore, when ,
[TABLE]
So, we obtain (99).
Step (vii): Since are arbitrary, the combination of Steps (iv) and (v) yields
[TABLE]
VII The Comparison between Joint and Separation Scheme
VII-A Formulation for separation coding
In this section, we compare the performance of the joint scheme with the performance of the separation scheme. To discuss the separation scheme, we formulate a separation encoder and a separation decoder. Firstly, we fix the input and output coding-lengths to be and . Then, we need to consider the encoded set of source coding, which is also the message set of the channel coding. Since the channel encoder does not know the source distribution, it is natural to consider the average case with respect to the permutation on the set . To handle such a permutation, we focus on the following triplet:
- A source encoder .
- A source-channel mapping .
- A channel encoder .
Then, our separation encoder is given as . The source-channel mapping is a random variable subject to the uniform distribution on the set of permutations on the set . To discuss the separation decoder, we consider
- A source decoder .
- The inverse of the source-channel mapping .
- A channel decoder .
So, our separation decoder is given as . That is, our separation code is composed of .
Here, the source code has the source coding rate
[TABLE]
and the channel code has the channel coding rate
[TABLE]
Then, the decoding error probability of the code is given as the probability that the error occurs in the source coding or the channel coding. Hence, the decoding error probability is defined as
[TABLE]
Since the source-channel mapping takes values in the set of permutations on the set , subject to the uniform distribution, it is natural to take the average with respect to the choice of . Hence, the value is defined as the average of with respect to this choice:
[TABLE]
Let be the decoding error probability of the source code , and let be the decoding error probability of the channel code with the message subject to the uniform distribution. Then, we have the following lemma.
Lemma 6.
The average is calculated as
[TABLE]
Proof.
From (107), we have
[TABLE]
The second term of (109) can be calculated as follows.
[TABLE]
Combining (109) and (110), we have
[TABLE]
∎
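The key step in this calculation is that averaging over a uniformly random permutation makes the channel code behave as if its message were uniform, whatever the distribution of the compressed index. The sketch below verifies this by exact enumeration for a toy example. Here `q[i]` is the probability that the source encoder outputs index `i` and decodes it correctly, and `e[j]` is the channel decoding error probability for channel message `j`. Both lists, the function name and the composition of the overall error as "source failure, or channel failure on a correctly compressed index" are illustrative assumptions.

```python
from itertools import permutations

def avg_channel_error_over_permutations(q, e):
    """Average of sum_i q[i] * e[sigma(i)] over all permutations sigma."""
    n = len(q)
    total, count = 0.0, 0
    for sigma in permutations(range(n)):
        total += sum(q[i] * e[sigma[i]] for i in range(n))
        count += 1
    return total / count

q = [0.4, 0.3, 0.2, 0.05]   # sums to 0.95: source coding error 0.05
e = [0.0, 0.1, 0.2, 0.5]    # per-message channel errors, uniform average 0.2

P_s = 1.0 - sum(q)
P_c = sum(e) / len(e)
averaged = avg_channel_error_over_permutations(q, e)
print(averaged, (1.0 - P_s) * P_c)
print(P_s + averaged, 1.0 - (1.0 - P_s) * (1.0 - P_c))
```

In this toy case the permutation average equals the probability of correct compression times the uniform-message channel error, so the overall error depends on the two codes only through their individual error probabilities.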
Under the fixed input and output coding-lengths and , we minimize the above value as
[TABLE]
Here, since
[TABLE]
we have
[TABLE]
Note that for any two real numbers and ,
[TABLE]
Considering the minimum with given value , we have
[TABLE]
Hereafter, we denote the coding rate of the separation scheme as . Additionally, we define
[TABLE]
Remark 6.
Many existing papers [7, 11, 8] discussed the separation scheme, and they focused on the value . However, they did not give a rigorous derivation of this value. The contribution of this subsection is the derivation of this value from the formulation given here, which is rigorously shown as Lemma 6.
VII-B Second order analysis
VII-B1 Conditional additive channel case
In this section, we evaluate the second order rate of the separation scheme. Using the *-product distribution , we have the following theorem for a conditional additive channel given by the transition matrix .
Theorem 3.
The optimal transmission length is asymptotically expanded as
[TABLE]
In other words,
[TABLE]
where
[TABLE]
Remark 7.
This theorem is an extension of the existing result [6, Section V] to the case of a Markovian source and a conditional additive channel.
Proof.
We assume that and the intermediate set size of the separation code is . If and then .
The channel and source coding theorems for the Markovian case with the second order [5, Theorems 10 and 21] guarantee the following relations
[TABLE]
Hence, we have
[TABLE]
Since ,
[TABLE]
Optimizing the choice of , we have
[TABLE]
Hence, we have
[TABLE]
∎
VII-B2 Discrete memoryless channel case
Using the *-product distribution, we evaluate the second order rate of separation coding in the discrete memoryless channel case.
Theorem 4.
For the discrete memoryless channel given by a transition matrix W, we have
[TABLE]
where
[TABLE]
Remark 8.
The paper [6, Section V] showed the same statement under the assumption and for a source that is independent and identically distributed. Our contribution is to remove the first assumption and to generalize the statement to a Markovian source.
Proof.
We find that
[TABLE]
We assume that and the intermediate set size of the separation code is . If and then . The channel coding theorem with the second order [1, 3, 4] (Theorem 2 with uniform message of size ) guarantees that
[TABLE]
Combining (121) and (130), we obtain
[TABLE]
because . So, we have
[TABLE]
∎
VII-C Comparison
Here, we compare the optimal error probability and the error probability of the separation scheme. Since this comparison is based on the capacity , the source entropy rate , the source variance , and the channel variance, the analysis of the conditional additive channel case can be done in the same way as the analysis of the discrete memoryless channel case. So, we discuss only the discrete memoryless channel case.
First, we compare the separation bound with the Kostina-Verdú bound defined in (46), which is still not the tight bound in the joint source-channel scheme. The property (11) implies the inequality
[TABLE]
Here, the equality is attained if and only if , , or . When , there is no information to be transmitted. When , we cannot make any information transmission. These two cases do not occur in a realistic case. When , the distribution of the message source is uniform, which is not discussed in the joint source-channel coding. So, we conclude that the separation scheme always has a larger decoding error probability than the joint source-channel scheme.
As the opposite evaluation, we have the following lemma.
Lemma 7.
We have
[TABLE]
where
[TABLE]
Under the conditions and , the equality holds if and only if .
Proof.
When , we have
[TABLE]
So, the inequality (11) of Lemma 1 implies (133). We can show this inequality in the case of .
When and , Lemma 1 guarantees that the equality holds if and only if . ∎
When the variance of the information density is unique, i.e., , Lemma 7 analytically determines the range of the ratio between the error probabilities with the joint and separation schemes. For the general case, combining Lemmas 5 and 7, we obtain the following lemma.
Lemma 8.
We have
[TABLE]
where .
Remark 9.
The paper [7, Section V] discussed a comparison similar to Lemma 7 when the source is independent and identically distributed and . Although the paper [7, Section V] conjectured a statement similar to Lemma 1 via numerical calculation, it did not prove it. Hence, it could not analytically determine the range of the ratio between the error probabilities with the joint and separation schemes even when .
VIII Discussion
We have discussed joint source-channel coding in the second order regime, and we have addressed two open problems in this area. The first is the complete derivation of the second order coding rate in the general discrete memoryless case. In this case, when the maximum and minimum variances have the same value, the second order coding rate was derived by the papers [6, 7]. However, the general case had remained an open problem, while a lower bound was obtained by Kostina and Verdú [9]. Our optimal rate is strictly better than the lower bound by [9]. To achieve such a better rate, we have invented a new random coding method, in which the distribution on the input alphabet is chosen according to the generation probability of the message. Since the generation probability depends on the message in the joint coding regime, this improvement is very effective. This coding method can be expected to apply to other problems. The second contribution is the derivation of the range of the ratio between the second order error probabilities of the joint and separation schemes. The paper [7] derived an upper bound only by numerical calculation. We have proved this conjecture analytically. Further, we have given a rigorous formulation for the separation coding in Subsection VII-A, while the error probability given in the RHS of (107) was used in many previous studies without rigorous derivation.
To obtain both main contributions, we have newly introduced two distribution families in Section III. One is the family of switched Gaussian convolution distributions and the other is the family of *-product distributions. Both are defined by modifying the Gaussian distribution. We have derived notable relations among the cumulative distribution functions of these distributions and the Gaussian distribution. The second contribution has been obtained from relations of this kind. Since these new distributions have an operational meaning in this way, we can expect that they will be applied to topics in information theory and related areas.
Acknowledgments
MH is very grateful to Professor Vincent Y. F. Tan and Professor Shun Watanabe for helpful discussions and comments. The works reported here were supported in part by JSPS Grants-in-Aid for Scientific Research (B) No. 16KT0017 and (A) No. 17H01280, the Okawa Research Grant, and the Kayamori Foundation of Informational Science Advancement.
Appendix A Proof of Lemma 1
Step (i): In this step, we prove the first inequality of (11). Assume that . Let and be Gaussian random variables with mean [math] and variance and , respectively. They are assumed to be independent of each other. For a given real number , we have
[TABLE]
On the other hand, since is a Gaussian random variable with variance , we have
[TABLE]
Because , we have
[TABLE]
which implies that
[TABLE]
Taking the maximum with respect to , we have
[TABLE]
Further, when or is zero, or or is infinity, the equality holds in (146).
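The argument of Step (i) can be checked numerically. The sketch below assumes the inequality has the form that the proof describes in words: the product of two zero-mean Gaussian distribution functions, evaluated at a split of a threshold into two parts, never exceeds the distribution function of the sum of the two variables at the full threshold, because the event that both variables fall below their own parts is contained in the event that their sum falls below the total. The symbol names `x`, `a`, `v1`, and `v2` are chosen here, since the inline notation of the source is not available, and the grid search is only an approximation of the maximum from below.

```python
import math


def gaussian_cdf(x, variance):
    """Distribution function of a zero-mean Gaussian with the given variance."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0 * variance)))


def best_product(x, v1, v2, grid=2001, span=10.0):
    """Grid approximation of max over a of P(X <= a) * P(Y <= x - a)."""
    scale = span * math.sqrt(v1 + v2)
    best = 0.0
    for i in range(grid):
        a = -scale + 2.0 * scale * i / (grid - 1)
        best = max(best, gaussian_cdf(a, v1) * gaussian_cdf(x - a, v2))
    return best


def sum_cdf(x, v1, v2):
    """P(X + Y <= x) for independent zero-mean Gaussians X and Y."""
    return gaussian_cdf(x, v1 + v2)


for x in (-2.0, 0.0, 1.5):
    for v1, v2 in ((1.0, 1.0), (0.5, 2.0)):
        print(x, v1, v2, best_product(x, v1, v2), sum_cdf(x, v1, v2))
```

In every row the first printed probability stays below the second, which is consistent with the set inclusion used in the proof.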
Step (ii): In this step, we show the second inequality in (11), and its equality condition. For the proof, we define the new function for and as:
[TABLE]
Using this function, we can rewrite function as:
[TABLE]
where . Hence, the second inequality in (11) and its equality condition follow from the following two lemmas.
Lemma 9.
For any , it holds that
[TABLE]
Hence, we obtain . So, we have , which implies that
[TABLE]
Due to the equality condition in Lemma 9, the equality holds in (150) only when . Conversely, we have the following lemma.
Lemma 10.
When , the equality
[TABLE]
holds.
Hence, we can see that the equality holds in (150) if and only if . Lemma 9 is shown in Appendix B, and Lemma 10 is shown in Appendix C.
Appendix B Proof of Lemma 9
It is sufficient to show the following two statements. (1) For any , we have
[TABLE]
(2) The maximum in (152) is realized only when . Under this condition, the infimum is realized only when .
Statement (1) implies (150), and statement (2) implies the necessary condition for the equality in (150).
Step (i): In this step, we will show the following relation for .
[TABLE]
Hence, it is sufficient to show that
[TABLE]
We rewrite the LHS of (160) as
[TABLE]
where denotes the inner product of vectors. The expression inside the RHS of (165) is calculated as
[TABLE]
Since , either or is negative. Hereafter, we will consider the maximum value of (174) under the condition .
When , we have , which implies (160). So, we consider the case when , which has the above three cases. First, we consider the case when and . Then, we have
[TABLE]
That is, the maximum value is attained when and .
We obtain the same equation in the case when and . Hence, we find that the maximum of the RHS of (174) equals , which implies (154).
Step (ii): In this step, when , we will show the following equation. Also we will show that the following maximum is realized if and only if . Since the discussion of Step (i) shows that Under this condition, the infimum is realized only when . These discussions show the desired statements (1) and (2) with .
[TABLE]
For notation, we define the function for and as:
[TABLE]
Then, we can find that
[TABLE]
and is a monotonically decreasing function of . Hence, the relation (176) is equivalent to
[TABLE]
Choosing , we write
[TABLE]
for certain . Now, to regard as a function of , we define
[TABLE]
and hereafter we find the value of that minimizes . Calculating the derivative, we have
[TABLE]
Now, we define
[TABLE]
Because and is a monotonically increasing function for , we find that is a monotonically increasing function for . Since
[TABLE]
the derivative test chart of is given as follows.
[TABLE]
Hence, when i.e., , is minimized. Therefore, when satisfies and , the minimum (179) is attained. So, we have , which means (179).
Step (iii): In this step, when , we will show the following equation. Also we will show that the following maximum is realized if and only if . Since the discussion of Step (i) shows that Under this condition, the infimum is realized only when . These discussions show the desired statements (1) and (2) with .
[TABLE]
Since , we have four cases. (1) and . (2) and . (3) and . (4) and . The infimum is negative except for the case (4). So, the maximum with respect to and under the condition is realized in the case (4). In the case (4), we have
[TABLE]
The maximum of the RHS of (187) with the condition is realized when and . Solving the equation , we have
[TABLE]
So, the combination of (187) and (188) yields (186).
Appendix C Proof of Lemma 10
It is sufficient to show the case with . We set the function
[TABLE]
That is, it is sufficient to show that the minimum is realized when because equals the RHS of (151).
Calculating the derivative, we have
[TABLE]
The function is a monotonically increasing function for . So, we find that for and for . Further, when , , which implies that . In this case, we have , which implies . So, we obtain , i.e., . Similarly, when , we can show the inequality . Therefore, the minimum is realized when .
Appendix D Proof of Lemma 3
First, we set
[TABLE]
For each , we define
[TABLE]
Also, for decoder and each , we define
[TABLE]
In addition, we define so that
[TABLE]
Using this notation, we define
[TABLE]
Then,
[TABLE]
The last equality follows since the error probability can be written as
[TABLE]
We notice here that
[TABLE]
for . By substituting this into (196), the first term of (196) is
[TABLE]
which implies (19).
- [1] V. Strassen, “Asymptotische Abschätzungen in Shannon’s Informationstheorie,” in Transactions of the Third Prague Conference on Information Theory etc., Czechoslovak Academy of Sciences, Prague, pp. 689–723, 1962.
- [2] M. Hayashi, “Second-Order Asymptotics in Fixed-Length Source Coding and Intrinsic Randomness,” IEEE Trans. Inf. Theory, vol. 54, no. 10, pp. 4619–4637, 2008.
- [3] M. Hayashi, “Information Spectrum Approach to Second-Order Coding Rate in Channel Coding,” IEEE Trans. Inf. Theory, vol. 55, no. 11, pp. 4947–4966, 2009.
- [4] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, 2010.
- [5] M. Hayashi and S. Watanabe, “Finite-Length Analyses for Source and Channel Coding on Markov Chains,” arXiv:1309.7528, 2013.
- [6] D. Wang, A. Ingber, and Y. Kochman, “The Dispersion of Joint Source-Channel Coding,” Proc. 49th Annual Allerton Conf., Allerton House, Monticello, IL, USA, 2011, pp. 180–187.
- [7] D. Wang, A. Ingber, and Y. Kochman, “The Dispersion of Joint Source-Channel Coding,” arXiv:1109.6310, 2011.
- [8] V. Y. F. Tan, S. Watanabe, and M. Hayashi, “Moderate Deviations for Joint Source-Channel Coding of Systems With Markovian Memory,” Proceedings of 2014 IEEE International Symposium on Information Theory, June 29 – July 4, 2014, Honolulu, HI, USA, pp. 1687–1691.
