Variable-to-Fixed Length Homophonic Coding Suitable for Asymmetric Channel Coding

Junya Honda; Hirosuke Yamamoto

arXiv:1706.06775·cs.IT·June 30, 2017


TL;DR

This paper introduces a new variable-to-fixed length homophonic coding scheme for generating the nonuniform codewords needed on asymmetric channels; by allowing a one-symbol decoding delay, it eliminates the need to know the maximum relative probability gap in advance.

Contribution

It proposes a novel VF homophonic code that does not require prior knowledge of the maximum relative probability gap, which makes it practical for long block codes on asymmetric channels.

Findings

1. The new code achieves asymptotic optimality.
2. Theoretical analysis confirms the code's performance.
3. Experimental results validate the code's effectiveness.




Taxonomy

Topics: Advanced Wireless Communication Techniques · Error Correcting Code Techniques · Algorithms and Data Compression

Full text


Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa-shi, Chiba 277–8561, Japan


Abstract

In communication through asymmetric channels the capacity-achieving input distribution is not uniform in general. Homophonic coding is a framework to invertibly convert a (usually uniform) message into a sequence with some target distribution, and is a promising candidate to generate codewords with the nonuniform target distribution for asymmetric channels. In particular, a Variable-to-Fixed length (VF) homophonic code can be used as a suitable component for channel codes to avoid decoding error propagation. However, the existing VF homophonic code requires the knowledge of the maximum relative gap of probabilities between two adjacent sequences beforehand, which is an unrealistic assumption for long block codes. In this paper we propose a new VF homophonic code without such a requirement by allowing one-symbol decoding delay. We evaluate this code theoretically and experimentally to verify its asymptotic optimality. (This is the full version of the paper to appear in the IEEE International Symposium on Information Theory (ISIT2017) with some minor corrections.)

I Introduction

In communication through asymmetric channels, the capacity-achieving input distribution is in general not uniform. On the other hand, in most practical codes such as linear codes, all symbols appear with almost the same frequency, and some modification is necessary to use them as optimal codebooks. Although it is known that biased codewords can be generated from an auxiliary code over an extended alphabet based on Gallager’s nonlinear mapping [1, p. 208], its complexity becomes very large when the target distribution cannot be expressed with simple rational numbers.

A promising solution to this problem is to generate such biased codewords by homophonic coding. Homophonic coding is a framework to invertibly convert a message with distribution $P_{U}$ into another sequence with target distribution $P_{X}$. This framework can intuitively be viewed as the dual of lossless compression, where a biased source sequence is encoded into an almost uniform sequence. In fact, the inverse of a lossless code based on an LDPC matrix is used to construct a capacity-achieving channel code in [2], although no practical algorithm for this code is known.

A homophonic code is called perfect if the generated sequence exactly follows the target distribution. A perfect Fixed-to-Variable length (FV) homophonic code can be constructed [4] based on an interval algorithm similar to a random number generator [5]; a similar FV code that is asymptotically perfect was later proposed in [3]. This code can be applied to the generation of biased codewords for LDPC codes and polar codes [6][7] to achieve the capacity. These codes require a homophonic code applicable to non-i.i.d. sequences in order to generate codewords with some structure (such as parity-check constraints). On the other hand, a general channel coding framework is proposed in [8][9] in which any homophonic code for i.i.d. sequences can be used, at the cost that each codeword consists of several blocks and the entire code length becomes large.

Although the FV homophonic code in [4] is perfect and achieves the asymptotic bound on the coding rate, it is not appropriate as a component of a channel code. This is because an FV channel code is hard to run in parallel and suffers from decoding error propagation, which is a serious problem for channel coding. Therefore it is desirable to use a Fixed-to-Fixed length (FF) or Variable-to-Fixed length (VF) homophonic code to avoid such error propagation.

Since it is difficult to construct a perfect homophonic code in the VF and FF frameworks (a perfect FF homophonic code is also constructed in [4], but it suffers from decoding errors with asymptotically vanishing but positive probability), there are several studies on homophonic codes whose output distribution asymptotically matches the target distribution. Such an asymptotically matching distribution is sufficient in practice for the channel coding application, in contrast to the original motivation for homophonic coding [10][11], where an exactly matching random sequence is required for cryptographic applications.

Böcherer and Mathar considered homophonic coding under the name distribution matching and proposed a VF homophonic code based on Huffman coding with an approximation of the target distribution by a dyadic distribution [12][13]. This code requires $n$-symbol extension to achieve $\mathrm{O}(1/n)$ redundancy even though the complexity does not scale with $n$ as in Huffman coding. Schulte and Böcherer [14] proposed an FF homophonic code that outputs sequences in a fixed type class and has redundancy $\mathrm{O}((\log n)/n)$. Honda and Yamamoto [15] proposed a VF homophonic code with redundancy $\mathrm{O}(1/n)$ and linear complexity by combining the Shannon-Fano-Elias code and the Gray code. Whereas the code in [14] is easier to handle in real systems owing to the nature of FF codes, the VF code in [15] is easy to extend to non-i.i.d. processes. Thus this VF code can be applied to the coding framework in [8][9] even for channels with memory, where the capacity-achieving input distribution is not i.i.d. A drawback of the scheme in [15] is that an upper bound on the maximum relative probability gap

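$$\max_{x,x^{\prime}\in\mathcal{X},\,x^{n-1}\in\mathcal{X}^{n-1}}\frac{P_{X_{n}|X^{n-1}}(x|x^{n-1})}{P_{X_{n}|X^{n-1}}(x^{\prime}|x^{n-1})}$$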

has to be known beforehand. This value is easy to compute for Markov processes of (not very large) order $k$ but hard to compute for general block codes, which makes application to structured block codes difficult.
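For a Markov source the gap above has a simple form. The Python sketch below assumes a first-order, time-homogeneous chain whose transition matrix `transition` (rows indexed by the previous symbol; a hypothetical input) has strictly positive entries, so that the conditional probability depends on the past only through $x_{n-1}$ and the gap is the largest ratio of the maximum to the minimum entry within a row. The distribution of the first symbol is ignored.

```python
from typing import Sequence


def max_relative_gap(transition: Sequence[Sequence[float]]) -> float:
    """Maximum relative probability gap for a first-order Markov source.

    Assumes P(x_n | x^{n-1}) = transition[x_{n-1}][x_n] with strictly positive
    entries, so the max over x, x' and the past reduces to row-wise max/min.
    """
    gap = 1.0
    for row in transition:
        smallest = min(row)
        if smallest <= 0:
            raise ValueError("transition probabilities must be strictly positive")
        gap = max(gap, max(row) / smallest)
    return gap


# A sticky binary chain: the row of state 0 dominates with ratio 0.9 / 0.1.
print(max_relative_gap([[0.9, 0.1], [0.5, 0.5]]))
```

For a structured block code, no such row-by-row shortcut is available, since the conditional probabilities depend on the whole past.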

In this paper we propose a new VF homophonic coding scheme, which encodes a variable-length uniform sequence into an $n$-symbol sequence with some target probability distribution $P_{X^{n}}$. Here $X^{n}$ does not have to be i.i.d., and the only requirement is that, as in arithmetic coding, the conditional probability $P_{X_{k}|X^{k-1}}(x_{k}|x^{k-1})$ for each $k\in\mathbb{N}$ be computable for a given $x^{k}$. The price of this advantage is that the scheme requires a one-code-symbol decoding delay, but with very high probability a decoding error propagates to at most one block. We prove the asymptotic optimality of the scheme under some conditions and confirm its performance by simulations for an asymmetric channel coding application.

II Preliminaries

We consider a VF homophonic coding problem to encode a uniform input sequence $U^{\infty}=(U_{1},U_{2},\cdots)\in\{0,1\}^{\infty}$ into a sequence $x^{\infty}\in\mathcal{X}^{\infty}$ where $\mathcal{X}=\{0,1,\cdots,|\mathcal{X}|-1\}$ with target probability distribution $P_{X^{\infty}}$ . A random variable with distribution $P_{X^{\infty}}$ is denoted by $X^{\infty}=(X_{1},X_{2},\cdots)$ . In a VF homophonic coding scheme, variable-length subsequences $u_{1}^{l_{1}},u_{l_{1}+1}^{l_{2}},\cdots$ are invertibly encoded into fixed-length sequences $x_{1}^{n},x_{n+1}^{2n},\cdots$ , where a subsequence is denoted by, e.g., $x_{i}^{j}=(x_{i},x_{i+1},\cdots,x_{j})$ . We consider the case where blocks $X_{1}^{n},\,X_{n+1}^{2n},\cdots$ are i.i.d., whereas symbols $X_{1},\,X_{2},\cdots,\,X_{n}$ in $X_{1}^{n}$ may be non-i.i.d. For this reason, we often write $(X_{1,(1)}^{n},\allowbreak X_{1,(2)}^{n},\cdots)$ instead of $(X_{1}^{n},\,X_{n+1}^{2n},\cdots)$ . We discuss the extension to a general sequence $X^{\infty}=(X_{1},X_{2},\cdots)$ in Remark 2.

Let $\phi$ be a (possibly random) encoding function of a homophonic code. This code is called perfect if the generated sequence $\tilde{X}^{\infty}=\phi(U_{1}^{l_{1}})\phi(U_{l_{1}+1}^{l_{1}+l_{2}})\cdots$ exactly follows the distribution $P_{X^{\infty}}$ when the input sequence $U_{1}^{l_{1}}U_{l_{1}+1}^{l_{1}+l_{2}}\cdots$ is i.i.d. from $P_{U}$. We measure the gap between the generated sequence and $X^{\infty}$ by the max-divergence per block, defined by

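$$D_{\phi}=\sup_{m\geq 1}\frac{1}{m}\max_{x^{mn}\in\mathcal{X}^{mn}}\log\frac{P_{\tilde{X}^{mn}}(x^{mn})}{P_{X^{mn}}(x^{mn})}\geq 0. \tag{1}$$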

A homophonic code $\phi$ is perfect if and only if $D_{\phi}=0$. Note that under the homophonic code $\phi$, a codeword $x^{n}$ with decoding error probability $p_{\mathrm{e}}(x^{n})$ is generated with probability at most $2^{D_{\phi}}P_{X^{n}}(x^{n})$. As a result, if the block decoding error probability of some block code is $P_{\mathrm{e}}$ under the ideal distribution $P_{X^{n}}$, the block decoding error probability when its codewords are generated by the homophonic code is bounded by $2^{D_{\phi}}P_{\mathrm{e}}$.
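As a small numerical illustration of the definition of $D_{\phi}$ and of the $2^{D_{\phi}}P_{\mathrm{e}}$ bound, the following Python sketch evaluates only the $m=1$ term inside the supremum, which is a lower bound on $D_{\phi}$ rather than $D_{\phi}$ itself. The function names and the toy distributions over four blocks are hypothetical; logarithms are taken to base 2 to match the factor $2^{D_{\phi}}$.

```python
import math
from fractions import Fraction
from typing import Sequence


def single_block_max_divergence(p_generated: Sequence[Fraction],
                                p_target: Sequence[Fraction]) -> float:
    """The m = 1 term of D_phi: max over blocks x^n of log2(P~(x^n) / P(x^n)).

    D_phi takes a supremum over m >= 1, so it is at least this value.
    """
    worst = -math.inf
    for q, p in zip(p_generated, p_target):
        if q == 0:
            continue                      # never generated: cannot attain the max
        if p == 0:
            return math.inf               # generated but impossible under the target
        worst = max(worst, math.log2(q / p))
    return worst


def error_probability_bound(d_phi: float, p_e: float) -> float:
    """Bound 2^{D_phi} * P_e on the block error probability (capped at 1)."""
    return min(1.0, 2 ** d_phi * p_e)


target = [Fraction(9, 16), Fraction(3, 16), Fraction(3, 16), Fraction(1, 16)]
generated = [Fraction(9, 14), Fraction(3, 14), Fraction(1, 7), Fraction(0)]
d1 = single_block_max_divergence(generated, target)   # log2(8/7)
print(d1, error_probability_bound(d1, 1e-3))
```

Here the most over-represented blocks appear $8/7$ times more often than under the target, so an error probability of $10^{-3}$ under $P_{X^{n}}$ can grow by at most that factor (for this single-block term).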

Let $\lfloor r\rfloor_{l}$ denote the first $l\in\mathbb{N}$ bits of the binary expansion of $r\in[0,1)$. We sometimes identify $\lfloor r\rfloor_{l}\in\{0,1\}^{l}$ with the real number $2^{-l}\lfloor 2^{l}r\rfloor\in[0,1)$. The real number corresponding to the $(l+1)$-th, $(l+2)$-th, $\cdots$ bits is denoted by $\langle r\rangle_{l}=2^{l}r-\lfloor 2^{l}r\rfloor\in[0,1)$. Thus we have $r=\lfloor r\rfloor_{l}+2^{-l}\langle r\rangle_{l}$. We also define $\langle r\rangle=r-\lfloor r\rfloor\in[0,1)$ for any $r\in\mathbb{R}$. We define the cumulative distribution function given $X_{1}$ and its inverse by

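$$F_{x_{1}}(x_{2}^{n})=\sum_{y_{2}^{n}\preceq x_{2}^{n}}P_{X_{2}^{n}|X_{1}}(y_{2}^{n}|x_{1}),\qquad F_{x_{1}}^{-1}(r)=\min\{x_{2}^{n}\in\mathcal{X}^{n-1}:F_{x_{1}}(x_{2}^{n})>r\},$$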

where $\preceq$ denotes the lexicographic order and $\min$ is taken under this order. We write $x_{2}^{n}-1$ for the last sequence before $x_{2}^{n}$ , that is, the largest sequence $y_{2}^{n}$ such that $y_{2}^{n}\precneqq x_{2}^{n}$ .
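The definitions above translate directly into code for small examples. The following Python sketch uses exact rational arithmetic (`fractions.Fraction`) to implement $\lfloor r\rfloor_{l}$, $\langle r\rangle_{l}$, and the lexicographic conditional CDF with its inverse by brute-force enumeration of $\mathcal{X}^{n-1}$. The enumeration is exponential in $n$ and only meant for illustration; the text itself only assumes sequentially computable conditional probabilities, as in arithmetic coding. The helper `iid_cond_prob` is a hypothetical toy model that ignores $x_{1}$.

```python
import math
from fractions import Fraction
from itertools import product


def floor_bits(r: Fraction, l: int) -> Fraction:
    """⌊r⌋_l as a real number: 2^{-l} ⌊2^l r⌋ (the first l bits of r)."""
    return Fraction(math.floor(r * 2**l), 2**l)


def tail_bits(r: Fraction, l: int) -> Fraction:
    """⟨r⟩_l = 2^l r - ⌊2^l r⌋ (the bits from position l+1 onward)."""
    scaled = r * 2**l
    return scaled - math.floor(scaled)


def iid_cond_prob(p):
    """Hypothetical P_{X_2^n|X_1}(seq | x1) for i.i.d. symbols with pmf p (ignores x1)."""
    def prob(seq):
        out = Fraction(1)
        for x in seq:
            out *= p[x]
        return out
    return prob


def conditional_cdf(cond_prob, alphabet_size, length):
    """List of (x_2^n, F(x_2^n - 1), F(x_2^n)) in lexicographic order."""
    table, cum = [], Fraction(0)
    for seq in product(range(alphabet_size), repeat=length):  # lexicographic order
        lower = cum
        cum += cond_prob(seq)
        table.append((seq, lower, cum))
    return table


def inverse_cdf(table, r):
    """F^{-1}(r) = min{x_2^n : F(x_2^n) > r}, i.e. the x with F(x - 1) <= r < F(x)."""
    for seq, lower, upper in table:
        if lower <= r < upper:
            return seq
    raise ValueError("r must lie in [0, 1)")


table = conditional_cdf(iid_cond_prob([Fraction(3, 4), Fraction(1, 4)]), 2, 2)
print(floor_bits(Fraction(11, 16), 2), tail_bits(Fraction(11, 16), 2))  # 1/2 3/4
print(inverse_cdf(table, Fraction(13, 16)))                            # (1, 0)
```

Zero-probability sequences get an empty interval and are skipped by `inverse_cdf`, which matches taking the minimum over sequences with $F_{x_{1}}(x_{2}^{n})>r$.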

III Delayed VF Homophonic Code

In this section we propose a Delayed VF Homophonic (DVFH) code, which is based on interval partitioning, as in arithmetic coding and the homophonic code in [4].

First we give an intuition for the DVFH code before its description. In a DVFH code, we do not assign to $x^{n}$ the full information specifying the (variable-length) input sequence $u^{l}$; instead, we assign the information that would suffice to specify $u^{l}$ if $(\log|\mathcal{X}|)$ bits of information were obtained in addition to $x^{n}$. This additional information is assigned to the first element $x_{1}$ of the next encoding block, which causes a one-symbol decoding delay. As a result of this assignment of $(\log|\mathcal{X}|)$ bits of information, the distribution of $x_{1}$ differs from $P_{X_{1}}$, but the remaining sequence $x_{2}^{n}$ almost follows $P_{X_{2}^{n}|X_{1}}$.

Let $l(I)\in\mathbb{N}$ for an interval $I=[\underline{I},\overline{I})\subset[0,1)$ denote the largest integer $l$ such that

$$\overline{I}\leq\lfloor\underline{I}\rfloor_{l}+|\mathcal{X}|\cdot 2^{-l}. \tag{2}$$

In this way the lower and upper bounds of an interval $I$ are always denoted by $\underline{I}$ and $\overline{I}$ , respectively.

As detailed later, the encoder of a DVFH code sends $\tilde{x}_{2}^{n}$ such that $F_{\tilde{x}_{1}}(\tilde{x}_{2}^{n}-1)\leq 0.u_{1}u_{2}\cdots<F_{\tilde{x}_{1}}(\tilde{x}_{2}^{n})$. Therefore $\lfloor F_{\tilde{x}_{1}}(\tilde{x}_{2}^{n}-1)\rfloor_{l_{0}}\leq 0.u_{1}u_{2}\cdots<\lfloor F_{\tilde{x}_{1}}(\tilde{x}_{2}^{n}-1)\rfloor_{l_{0}}+|\mathcal{X}|\cdot 2^{-l_{0}}$ holds for $l_{0}=l([F_{\tilde{x}_{1}}(\tilde{x}_{2}^{n}-1),F_{\tilde{x}_{1}}(\tilde{x}_{2}^{n})))$. This range consists of $|\mathcal{X}|$ consecutive dyadic intervals of length $2^{-l_{0}}$, so if the decoder gets additional $(\log|\mathcal{X}|)$ bits of information specifying which of them contains $0.u_{1}u_{2}\cdots$, then $u_{1},u_{2},\cdots,u_{l_{0}}$ can be recovered.
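The recovery argument can be checked numerically. The Python sketch below computes $l(I)$ by increasing $l$ while condition (2) still holds at $l+1$, and then verifies that, for several points $r$ of $I$, the first $l_{0}$ bits of $r$ equal $\lfloor\underline{I}\rfloor_{l_{0}}$ plus an offset $k\in\{0,\dots,|\mathcal{X}|-1\}$, that is, one extra symbol's worth of information. It does not reproduce Algorithms 1 and 2; the interval and the function names are illustrative.

```python
import math
from fractions import Fraction


def floor_bits(r, l):
    """⌊r⌋_l = 2^{-l} ⌊2^l r⌋."""
    return Fraction(math.floor(r * 2**l), 2**l)


def interval_level(lo, hi, q):
    """l(I) for I = [lo, hi): the largest l with hi <= ⌊lo⌋_l + q * 2^{-l} (condition (2)).

    The right-hand side is nonincreasing in l and tends to lo < hi, so the loop stops.
    """
    if not 0 <= lo < hi <= 1:
        raise ValueError("need 0 <= lo < hi <= 1")
    l = 0
    while hi <= floor_bits(lo, l + 1) + Fraction(q, 2 ** (l + 1)):
        l += 1
    return l


def extra_symbol(lo, l0, r):
    """Index k in {0, ..., q-1} such that ⌊r⌋_{l0} = ⌊lo⌋_{l0} + k * 2^{-l0}."""
    return math.floor((r - floor_bits(lo, l0)) * 2**l0)


lo, hi, q = Fraction(9, 16), Fraction(12, 16), 2   # I = [F(x-1), F(x)), binary alphabet
l0 = interval_level(lo, hi, q)
for r in (Fraction(9, 16), Fraction(5, 8), Fraction(23, 32)):
    k = extra_symbol(lo, l0, r)          # the log2(q) = 1 extra bit
    assert floor_bits(r, l0) == floor_bits(lo, l0) + Fraction(k, 2**l0)
print(l0)  # 3
```

With one extra bit, the decoder of this toy example recovers three message bits from an interval of length $3/16$.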

Let $\sigma(\cdot;\mathcal{V})$ be a permutation of $\mathcal{X}=\{0,1,\cdots,|\mathcal{X}|-1\}$ representing the descending order of $\mathcal{V}=\{v_{i}\}\in\mathbb{R}^{|\mathcal{X}|}$, that is, $v_{\sigma(0;\mathcal{V})}\geq v_{\sigma(1;\mathcal{V})}\geq\cdots\geq v_{\sigma(|\mathcal{X}|-1;\mathcal{V})}$. Ties are broken arbitrarily but consistently between the encoder and decoder. The encoding and decoding algorithms of a DVFH code are given in Algorithms 1 and 2, where the input $u^{\infty}$ is decoded into $\hat{u}^{\infty}$. Here the length of an interval $I=[\underline{I},\,\overline{I})$ is expressed by $|I|=\overline{I}-\underline{I}$. Step 2 of the decoding algorithm is exception handling, and its condition is never satisfied if the decoder receives the generated sequence $\tilde{x}^{n}$ without error.
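A permutation $\sigma$ with a shared tie-breaking rule is straightforward to implement. In this Python sketch ties are broken toward the smaller symbol index; that rule is our assumption, and any rule agreed on by the encoder and decoder would do. The helper `ith_largest` corresponds to the notation $\max^{(i)}_{x\in\mathcal{X}}$ used in Theorem 1.

```python
from typing import List, Sequence


def descending_order(values: Sequence[float]) -> List[int]:
    """sigma as a list: sigma[i] is the symbol whose value is the i-th largest (i = 0, 1, ...).

    Ties go to the smaller symbol index, one deterministic rule that encoder and
    decoder can share (any agreed rule would do).
    """
    return sorted(range(len(values)), key=lambda x: (-values[x], x))


def ith_largest(values: Sequence[float], i: int) -> float:
    """max^{(i)}_x f(x) = f(sigma(i; {f(x)}_x))."""
    return values[descending_order(values)[i]]


sigma = descending_order([0.2, 0.5, 0.2, 0.1])
print(sigma)  # [1, 0, 2, 3]
```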

First we give a theorem on the unique decodability and the error propagation probability of a DVFH code.

Theorem 1.

(i) A DVFH code satisfies $u_{t_{j-1}+1}^{t_{j}}=\hat{u}_{\hat{t}_{j-1}+1}^{\hat{t}_{j}}$ for all $j=1,2,\cdots$. (ii) Fix $j_{0}\in\mathbb{N}$ and $y^{n}\in\mathcal{X}^{n}$ arbitrarily. If the sequence of codewords $\tilde{x}_{(1)}^{n},\,\tilde{x}_{(2)}^{n},\,\cdots$ is generated by the encoder from uniformly distributed $u^{\infty}\in\{0,1\}^{\infty}$ and the sequence $\tilde{x}_{(1)}^{n},\,\cdots,\tilde{x}_{(j_{0}-1)}^{n},\,y^{n},\,\tilde{x}_{(j_{0}+1)}^{n},\,\tilde{x}_{(j_{0}+2)}^{n},\cdots$ is parsed by the decoder, then

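$$\Pr\bigl[u_{t_{j_{0}+k}}^{\infty}\neq\hat{u}_{\hat{t}_{j_{0}+k}}^{\infty}\bigr]\leq(4p_{\max})^{k},\qquad\forall k\in\mathbb{N}, \tag{3}$$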

where $p_{\max}=\max_{x^{n}\in\mathcal{X}^{n}}P_{X_{2}^{n}|X_{1}}(x_{2}^{n}|x_{1})$ . (iii) The max-divergence between the target distribution and the distribution of the generated sequence satisfies

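$$D_{\phi}\leq\log\max_{i\in\{0,1,\dots,|\mathcal{X}|-1\}}\frac{2}{(i+1)\max^{(i)}_{x\in\mathcal{X}}P_{X_{1}}(x)}, \tag{4}$$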

where $\max_{x\in\mathcal{X}}^{(i)}f(x)=f(\sigma(i;\{f(x)\}_{x\in\mathcal{X}}))$ is the $i$ -th largest value in $\{f(x)\}_{x\in\mathcal{X}}$ .

The first part of this theorem shows that this code is uniquely decodable. The second part shows that if an error occurs in the $k$-th codeword block $\tilde{x}_{1,(k)}^{n}$, then errors may occur in the decoding of $\tilde{x}_{1,(k-1)}^{n}$ and $\tilde{x}_{1,(k)}^{n}$, but they propagate to the decoding of $\tilde{x}_{1,(k+1)}^{n},\,\tilde{x}_{1,(k+2)}^{n},\cdots$ only with exponentially small probability if $p_{\max}<1/4$. Note that $p_{\max}=\max_{x^{n}\in\mathcal{X}^{n}}P_{X_{2}^{n}|X_{1}}(x_{2}^{n}|x_{1})$ is itself exponentially small in $n$ for usual distributions $P_{X^{n}}$. The last part shows that the max-divergence is bounded by a constant independent of $n$. This means that when a DVFH code is applied to the generation of biased codewords, the block decoding error probability is at most a constant times the error probability under the ideal codeword distribution, as explained in the discussion around (1).
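To get a feel for the constants in (3) and (4), the Python sketch below evaluates their right-hand sides for a given first-symbol distribution $P_{X_{1}}$ and a given $p_{\max}$. It assumes base-2 logarithms, the function names are illustrative only, and it evaluates the bounds rather than running the code itself.

```python
import math
from typing import Sequence


def dvfh_divergence_bound(p_x1: Sequence[float]) -> float:
    """Right-hand side of (4) in bits: log2 max_i 2 / ((i + 1) * max^{(i)}_x P_{X_1}(x))."""
    ranked = sorted(p_x1, reverse=True)  # ranked[i] is the i-th largest probability
    if ranked[-1] <= 0:
        return math.inf  # some symbol has zero probability: the bound is vacuous
    return math.log2(max(2 / ((i + 1) * p) for i, p in enumerate(ranked)))


def propagation_bound(p_max: float, k: int) -> float:
    """Right-hand side of (3): (4 p_max)^k, informative only when p_max < 1/4."""
    return (4 * p_max) ** k


d = dvfh_divergence_bound([0.25, 0.25, 0.25, 0.25])
print(d, 2 ** d)  # 3.0 8.0: block error probability inflated by at most a factor 8
print(propagation_bound(1e-3, 2))
```

For a uniform quaternary first symbol the largest term comes from $i=0$; for a skewed binary first symbol such as $(0.8,0.2)$ it comes from the least likely symbol ($i=1$), giving $\log_{2}5$.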

Unfortunately, we do not have a theoretical guarantee on the average input length of a DVFH code. This is because $r$ at each iteration is uniformly distributed over an interval $I\subset[0,1)$ rather than $[0,1)$ . For example, if $I=[0,b)$ for some $b>0$ then the distribution of $\tilde{x}_{2}^{n}$ becomes

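$$P_{\tilde{X}_{2}^{n}|\tilde{X}_{1}}(\tilde{x}_{2}^{n}|\tilde{x}_{1})=\begin{cases}\frac{1}{b}P_{X_{2}^{n}|X_{1}}(\tilde{x}_{2}^{n}|\tilde{x}_{1}), & \tilde{x}_{2}^{n}\precneqq F_{\tilde{x}_{1}}^{-1}(b),\\ \frac{b-F_{\tilde{x}_{1}}(\tilde{x}_{2}^{n}-1)}{b}, & \tilde{x}_{2}^{n}=F_{\tilde{x}_{1}}^{-1}(b),\\ 0, & \text{otherwise}.\end{cases}$$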
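The bias can be seen on a toy example. The Python sketch below evaluates the three cases of this distribution for hypothetical conditional probabilities $(9/16,3/16,3/16,1/16)$ listed in lexicographic order and $b=7/8$: the most probable sequence receives probability $9/14$ instead of $9/16$, and the last sequence is never produced, so likely sequences (which carry few input bits) are over-represented.

```python
from fractions import Fraction
from typing import List, Sequence


def truncated_output_distribution(probs: Sequence[Fraction], b: Fraction) -> List[Fraction]:
    """Distribution of x~_2^n = F^{-1}(r) when r ~ Uniform[0, b), for 0 < b < 1.

    probs[k] = P_{X_2^n|X_1}(k-th sequence in lexicographic order | x~_1).
    """
    out, lower = [], Fraction(0)          # lower = F(x - 1)
    for p in probs:
        upper = lower + p                 # upper = F(x)
        if upper <= b:                    # x strictly precedes F^{-1}(b)
            out.append(p / b)
        elif lower <= b:                  # x = F^{-1}(b): F(x - 1) <= b < F(x)
            out.append((b - lower) / b)
        else:                             # x lies beyond F^{-1}(b)
            out.append(Fraction(0))
        lower = upper
    return out


probs = [Fraction(9, 16), Fraction(3, 16), Fraction(3, 16), Fraction(1, 16)]
print(truncated_output_distribution(probs, Fraction(7, 8)))  # values 9/14, 3/14, 1/7, 0
```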

This problem can be avoided, for example, by adding a shared common random number $r_{(j)}^{\prime}\in[0,1)$ to $r$ modulo 1 at each iteration, which makes $r$ uniformly distributed over $[0,1)$. The problem can also be avoided by replacing the cumulative conditional distribution $F_{\tilde{x}_{1}}(\tilde{x}_{2}^{n})$ with $\underline{I}+(\overline{I}-\underline{I})F_{\tilde{x}_{1}}(\tilde{x}_{2}^{n})$, which is the technique used in [15]. However, this second modification causes error propagation despite the use of a VF code, since $I$ for the current loop depends on all the sequences sent in the previous loops.

In this paper we propose a modification of the code to guarantee a lower bound on the average input length without introducing common randomness and causing error propagation. This uses a “shifted” cumulative distribution given $x_{1}$ where an appropriately fixed value $s_{x_{1}}\in[0,1)$ is added under the mod 1 operation (see (7) and (10) in the proof of Theorem 2 for the specific value of $s_{x_{1}}$ ), which is expressed as

[TABLE]

Theorem 2.

Consider the homophonic code where $F_{\tilde{x}_{1}}(\cdot)$ and $F_{\tilde{x}_{1}}^{-1}(\cdot)$ are replaced with $F_{\tilde{x}_{1}}(\cdot\,;s_{\tilde{x}_{1}})$ and $F_{\tilde{x}_{1}}^{-1}(\cdot\,;s_{\tilde{x}_{1}})$, respectively, Step 1 in the encoding is replaced with $\tilde{x}_{1}:=i^{*}$ (with an arbitrary initial value $i^{*}\in\mathcal{X}$), and Step 2 in the decoding is replaced with $\tilde{x}_{1}:=\tilde{x}_{1,(j)}$. Then (i) and (ii) of Theorem 1 still hold and the max-divergence satisfies

[TABLE]

Furthermore, there exists $\{s_{x_{1}}\}_{x_{1}}\in[0,1)^{|\mathcal{X}|}$ such that, for any $j\in\mathbb{N}$, the average of the input length $l_{1}=t_{(j+1)}-t_{(j)}$ satisfies

$$\mathrm{E}[l_{1}]>\min_{x\in\mathcal{X}}H(X_{2}^{n}|X_{1}=x)+\log\frac{|\mathcal{X}|-1}{2}. \tag{5}$$

The value $s_{x_{1}}$ that assures (5) is difficult to compute in practice (although it only has to be computed once as preprocessing). Nevertheless, this theorem may partly explain the near-optimal empirical performance of the DVFH code (without modification) shown in the next section.
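To see why $s_{x_{1}}$ is expensive to obtain, note that $f(r)=-\log P_{X_{2}^{n}|X_{1}}(F_{x_{1}}^{-1}(r)|x_{1})$, the function used in the proof of Theorem 2, is piecewise constant, so $G(s)=\int_{0}^{s}(f(r)-\bar{f})\,\mathrm{d}r$ is piecewise linear with $G(0)=G(1)=0$, and its maximizer and minimizer can be found among the breakpoints; the catch is that there are up to $|\mathcal{X}|^{n-1}$ of them. The Python sketch below follows the argmax and argmin choices in (7) and (10) for $x_{1}=0$ and $x_{1}=|\mathcal{X}|-1$; it assumes base-2 logarithms and takes the full conditional distribution as a hypothetical input list.

```python
import math
from typing import Sequence, Tuple


def shift_values(probs: Sequence[float]) -> Tuple[float, float]:
    """Return (s_first, s_last) = (argmax, argmin) over s in [0, 1) of G(s) = ∫_0^s (f(r) - f̄) dr.

    f(r) = -log2 P(x) on [F(x - 1), F(x)), so f̄ is the conditional entropy and G is
    piecewise linear with G(0) = G(1) = 0; its extrema are attained at breakpoints F(x).
    probs: P_{X_2^n|X_1}(x | x_1) for all x in lexicographic order (exponentially many).
    """
    support = [p for p in probs if p > 0]
    f_bar = -sum(p * math.log2(p) for p in support)
    points = [(0.0, 0.0)]                 # (s, G(s)) at the breakpoints inside [0, 1)
    s = g = 0.0
    for p in support[:-1]:                # the last breakpoint is s = 1, where G = G(0)
        s += p
        g += p * (-math.log2(p) - f_bar)  # G grows with slope f - f̄ on each piece
        points.append((s, g))
    s_first = max(points, key=lambda t: t[1])[0]   # argmax, as in (7)
    s_last = min(points, key=lambda t: t[1])[0]    # argmin, as in (10)
    return s_first, s_last


print(shift_values([0.5, 0.25, 0.25]))  # (0.0, 0.5)
```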

As shown in the above theorem, the average input length of the modified code is bounded in terms of $\min_{x\in\mathcal{X}}H(X_{2}^{n}|X_{1}=x)$. Note that this value can be much smaller than $H(X^{n})$ or $H(X_{2}^{n}|X_{1})$ for some $P_{X^{n}}$, although $X_{2}^{n}$ does not heavily depend on $X_{1}$ in most “good” block codes. An important direction for future work is to devise a VF homophonic code that provably achieves average input lengths $H(X^{n})-\mathrm{O}(1)$ for general distributions $P_{X^{n}}$.

Remark 1.

In a DVFH code, $(\log|\mathcal{X}|)$ bits of additional information for the $j$-th block are assigned to $X_{1,(j+1)}$. We can also consider a code in which the additional information is assigned to $X_{n,(j)}$ by moving the role of $X_{1,(j+1)}$ to $X_{n,(j)}$. Such a code does not suffer from decoding delay, but its theoretical guarantees hold in terms of $\inf_{x_{1}^{n-1}\in\mathcal{X}^{n-1}}P_{X_{n}|X_{1}^{n-1}}(x|x_{1}^{n-1})$ instead of $P_{X_{1}}(x)$ in, e.g., (4). Thus it is inappropriate to use this code to generate sequences that heavily depend on past symbols. However, it can be a promising candidate when $X_{1}^{n}$ is i.i.d. or a Markov process of order $k\ll n$, which is the case when a homophonic code is applied to the framework of channel coding in [9].

Remark 2.

In the case where $X_{1,(1)}^{n},X_{1,(2)}^{n},\cdots$ are not i.i.d., we can obtain a similar theoretical guarantee by replacing $P_{X_{2}^{n}|X_{1}}=P_{X_{2,(j)}^{n}|X_{1,(j)}}$ in the algorithms with $P_{X_{2,(j)}^{n}|X_{1,(j)},\{X_{1,(j^{\prime})}^{n}\}_{j^{\prime}=1}^{j-1}}$ , if

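$$\inf_{j\in\mathbb{N}}\;\min_{(x_{1,(j)},\{x_{1,(j^{\prime})}^{n}\}_{j^{\prime}=1}^{j-1})\in\mathcal{X}^{1+n(j-1)}}P_{X_{1,(j)}|\{X_{1,(j^{\prime})}^{n}\}_{j^{\prime}=1}^{j-1}}\bigl(x_{1,(j)}\big|\{x_{1,(j^{\prime})}^{n}\}_{j^{\prime}=1}^{j-1}\bigr)>0.$$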

However, error propagation inevitably occurs in the coding framework using probabilities depending on all the past generated blocks. Thus it is realistic to set the target probability distribution to be independent between blocks.

IV Numerical Results

In this section we first compare a DVFH code with other homophonic codes for i.i.d. target distributions. We next apply the DVFH code to polar coding for asymmetric channels. We used a DVFH code without the modification considered in Theorem 2, which means that the theoretical guarantee (5) on the coding rate does not apply.

Fig. 1 shows the upper bound on the max-divergence $D_{\phi}$ between the generated sequences and the target distribution; we consider the theoretical upper bound instead of empirical results here because estimating a distribution over $\mathcal{X}^{n}$ requires a prohibitively large number of samples. The plots of the FF code in [14] are exact values rather than upper bounds (they differ slightly from the ISIT version, which contained a bug). The sequences generated by the FV code in [4] exactly follow the target distribution, so its plot is not shown in the figure. The max-divergences per block of the code in [15] and of the proposed code are bounded independently of $n$, whereas that of the FF code in [14] is $\Theta(\log n)$. Fig. 2 shows the redundancy of the average coding rate, which is $H(X)-\mathrm{E}[L_{\mathrm{in}}]/n$ for VF and FF codes, where $L_{\mathrm{in}}$ is the input length. For the FV code in [4], the redundancy is given by $H(X)-m/\mathrm{E}[L_{\mathrm{out}}]$, where $m$ and $L_{\mathrm{out}}$ are the input and output lengths, respectively, and we set $m=\lfloor nH(X)\rfloor$ so that the output length is roughly the same as that of the FF and VF codes. Each plot is the average over 10,000 sequential encodings.

As we can see from these figures, the DVFH code achieves a comparable divergence for most target distributions, while its redundancy is almost zero or even negative, even for short block lengths. Here the negative redundancy of the DVFH code does not contradict the Shannon bound, since the distribution of the generated sequence is slightly different from the target distribution.

Next, Figs. 3 and 4 show the upper bound on the max-divergence $D_{\phi}$ and the redundancy of the average coding rate, respectively, for i.i.d. quaternary sequences with $P_{X}(1)=P_{X}(2)=P_{X}(3)=(1-P_{X}(0))/3$. A tendency similar to the binary case can be seen from these figures.

Finally we consider an application of homophonic coding to polar codes for asymmetric channels, for which capacity-achieving polar coding schemes are proposed in [16] and [7]. The former is an FF code that uses a polar code for lossless compression to realize the optimal input distribution, whereas the latter is an FV code based on the FV homophonic code in [4], which is practically unrealistic because of error propagation. Based on this observation, we compared the FF code in [16] with the scheme in [7] in which the FV homophonic code is replaced with the DVFH code. Note that the FF homophonic code in [14] and the VF homophonic code in [15] are hard to apply since the target distribution is not i.i.d.

We considered AWGN channels with 4ASK modulation, where the input points are given by $X\in\{-3a,-a,+a,+3a\}$ for $a>0$. The Signal-to-Noise Ratio (SNR) is set to 10 dB. The optimal input distribution is $(P_{X}(\pm a),P_{X}(\pm 3a))=(0.33,\,0.17)$. The mutual information $I(X;Y)$ between the input and output is 1.582 bits for the uniform input distribution and 1.628 bits for the above input distribution. Note that it is also possible to optimize the input points as well as the input distribution, but we only considered optimization of the latter for practicality. We used polar codes over $\mathcal{X}=\mathrm{GF}(4)$ with block lengths $2^{9},\,2^{11}$ and $2^{13}$.
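The mutual-information figures above can be approximated with a short numerical integration. The Python sketch below assumes that the SNR means $\mathrm{E}[X^{2}]/\sigma^{2}$ for a real-valued Gaussian channel, a convention the text does not state, so its outputs need not match 1.582 and 1.628 exactly; the function name, integration range, and step count are our own choices.

```python
import math
from typing import Sequence


def ask4_mutual_information(probs: Sequence[float], snr_db: float, steps: int = 20000) -> float:
    """I(X;Y) in bits for 4-ASK over a real AWGN channel, by midpoint-rule integration.

    Assumptions: X in {-3a, -a, +a, +3a} with probabilities `probs` (in that order),
    Y = X + N with N ~ N(0, 1), and a chosen so that E[X^2] = 10^(snr_db / 10).
    """
    levels = (-3.0, -1.0, 1.0, 3.0)
    power = sum(p * v * v for p, v in zip(probs, levels))
    a = math.sqrt(10 ** (snr_db / 10) / power)
    points = [a * v for v in levels]
    lo, hi = points[0] - 10.0, points[-1] + 10.0   # +-10 noise std covers the mass
    dy = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * dy
        dens = [math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2 * math.pi) for x in points]
        p_y = sum(p * w for p, w in zip(probs, dens))
        for p, w in zip(probs, dens):
            if p > 0 and w > 0:
                total += p * w * math.log2(w / p_y) * dy
    return total


print(ask4_mutual_information([0.25] * 4, 10.0))
print(ask4_mutual_information([0.17, 0.33, 0.33, 0.17], 10.0))
```

Under this convention both values stay below the Gaussian-input capacity $\frac{1}{2}\log_{2}(1+10)\approx 1.73$ bits, and the shaped distribution should exceed the uniform one, consistent with the figures quoted above.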

Fig. 5 shows the decoding error probabilities of the above two polar coding schemes and of the original polar code with the uniform input distribution. Note that there is some arbitrariness in the performance measure of the scheme using a DVFH code. First, a decoding error propagates to roughly $3/4$ of an additional block on average (in the 4-ary case); that is, one decoding error of this code corresponds roughly to $7/4$ block errors of the other codes. Second, the input length of each codeword is variable, and the message length is correlated with the error probability of the codeword. Based on these observations, we plotted twice the empirical decoding error probability to conservatively evaluate the scheme using a DVFH code.

As we can see from the figure the performance of the FF code in [16] is much worse than the performance of the uniform input polar code for moderate block lengths, although it is theoretically assured to be better than the uniform input asymptotically. On the other hand, the VF polar code using the DVFH code significantly outperforms the polar code with the uniform input.

V Proofs of Theorems 1 and 2

In this section we prove Theorems 1 and 2. We prove these theorems based on the following lemma.

Lemma 1.

At Step 1 of the encoding, $|I|>1/2$ holds and $r$ is uniformly distributed over $I$ given $\tilde{x}_{(1)}^{n},\,\tilde{x}_{(2)}^{n},\cdots,\tilde{x}_{(j-1)}^{n}$ .

Proof.

$|I|>1/2$ straightforwardly follows from Steps 1–1. The latter part is proved by induction. For $j=1$ this proposition holds from $I=[0,1)$ and the uniformity of $(u_{1},u_{2},\cdots)$ . Next, assume that the proposition holds for $j\leq j_{0}-1$ . Then, given $\tilde{x}_{(j_{0})}^{n}=\tilde{x}^{n}$ , $r$ is uniformly distributed over $I^{\prime}$ in Step 1. Thus, $0.u_{t_{(j)}+l_{1}}u_{t_{(j)}+l_{1}+1}\cdots$ is uniformly distributed over $I:=[\langle\underline{I}_{i^{*}}^{\prime}\rangle_{l_{1}},\langle\overline{I}_{i^{*}}^{\prime}\rangle_{l_{1}})$ , which implies that the proposition holds for $j=j_{0}$ . ∎

Proof of Theorem 1: (i) The construction of $I^{\prime},l_{0},\,\{I_{i}^{\prime}\},\,l_{1}$ and $I$ is the same in the encoding and the decoding, provided that $i^{*}$ is the same in both. In addition, from the relation between Steps 1 and 1 in the encoding and Steps 2 and 2 in the decoding, $i^{*}$ for each $j=j_{0}$ in the encoding is correctly recovered in the decoding at Step 2 for $j=j_{0}+1$. Since $u_{t_{(j-1)}}^{t_{(j-1)}+l_{1}-1}=\lfloor\underline{I}^{\prime}_{i^{*}}\rfloor_{l_{1}}$ holds from $r\in I_{i^{*}}^{\prime}$, $u_{t_{(j-1)}}^{t_{(j-1)}+l_{1}-1}$ is correctly recovered by Step 2 in the decoding.

(ii) A decoding error $u_{t_{j+k}}^{t_{j+k+1}-1}\neq\hat{u}_{\hat{t}_{j+k}}^{\hat{t}_{j+k+1}-1}$ occurs only if $I^{\prime}$ after Step 1 of the encoding and $I^{\prime}$ after Step 2 of the decoding are different for $j=j_{0}+k$. From Lemma 1, $I^{\prime}\neq[F_{\tilde{x}_{1}}(\tilde{x}_{2}^{n}-1),\,F_{\tilde{x}_{1}}(\tilde{x}_{2}^{n}))$ occurs in the encoding with probability at most $\max_{x_{2}^{n}}P_{X_{2}^{n}|X_{1}}(x_{2}^{n}|\tilde{x}_{1})/|I|\leq 2p_{\max}$, given that the two $I^{\prime}$ differ between the encoding and decoding for $j=j_{0}+k-1$. The same result holds for the decoding, and therefore the two $I^{\prime}$ differ from each other with probability at most $4p_{\max}$, which proves (3).

(iii) For each $i_{0}\in\mathcal{X}$ , $\tilde{x}_{1}=x_{\sigma(i_{0};\{P_{X}(i)\}_{i})}$ if and only if $r\in I_{\sigma(i_{0};\{I_{i}^{\prime}\}_{i})}^{\prime}$ from the encoding algorithm, which holds with probability $|I_{\sigma(i_{0};\{I_{i}^{\prime}\}_{i})}^{\prime}|/|I^{\prime}|$ from Lemma 1. Since $|I_{\sigma(0;\{I_{i}^{\prime}\}_{i})}^{\prime}|\geq|I_{\sigma(1;\{I_{i}^{\prime}\}_{i})}^{\prime}|\geq\cdots\geq|I_{\sigma(|\mathcal{X}|-1;\{I_{i}^{\prime}\}_{i})}^{\prime}|$ , we have

[TABLE]

We also have

[TABLE]

from Lemma 1. Thus we obtain (4) by

[TABLE]

Proof of Theorem 2.

As the former part is almost the same as the proof of Theorem 1, we only prove (5). Since $l(I)$ is the largest $l$ satisfying (2), we have

[TABLE]

which implies

[TABLE]

Therefore, the length of the assigned message is given by

[TABLE]

Now we consider the output distribution $P_{\tilde{X}_{2}^{n}|\tilde{X}_{1}}$ . Recall that $r$ is uniformly distributed over $I$ by Lemma 1. We have $I=[a,1)$ for some $a\in[0,1/2)$ if $\tilde{x}_{1,(j)}=0$ holds, $I=[0,b)$ for some $b\in[1/2,1)$ if $\tilde{x}_{1,(j)}=|\mathcal{X}|-1$ holds and $I=[0,1)$ otherwise. Thus, in the case where $\tilde{x}_{1,(j)}\notin\{0,|\mathcal{X}|-1\}$ we have

[TABLE]

for any $s_{\tilde{x}_{1,(j)}}$. We consider the two remaining cases below.
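For a symbol with $I=[0,1)$, the fact underlying this case is inverse-transform sampling. If $r$ is uniform over $[0,1)$, then $F_{x_1}^{-1}(r)$ has distribution $P_{X_2^n|X_1}(\cdot|x_1)$, so the average of $f(r)=-\log P_{X_{2}^{n}|X_{1}}(F_{x_{1}}^{-1}(r)|x_{1})$ over $[0,1)$ is exactly $H(X_2^n|X_1=x_1)$. The sketch below checks this numerically. It assumes, for illustration only, an i.i.d. binary toy source with a hypothetical marginal, lexicographic ordering of sequences, and logarithms to base 2. The names `sequence_table` and `average_self_information` are hypothetical. The second function also accepts a general $I=[a,b)$, which is the kind of average analyzed for the boundary symbols $0$ and $|\mathcal{X}|-1$ below.

```python
import math
from fractions import Fraction
from itertools import product


def sequence_table(probs, n):
    """List (low, high, p) for all length-n sequences in lexicographic order.

    [low, high) = [F(x - 1), F(x)) has length p = P(x); symbols are assumed
    i.i.d. with distribution `probs` purely for illustration.
    """
    table, low = [], Fraction(0)
    for seq in product(range(len(probs)), repeat=n):
        p = math.prod((probs[s] for s in seq), start=Fraction(1))
        table.append((low, low + p, p))
        low += p
    return table


def average_self_information(table, a, b):
    """E[f(r)] in bits for r uniform over [a, b), where f(r) = -log2 P(F^{-1}(r)).

    f is constant on each sequence interval, so the integral is a finite sum.
    """
    total = 0.0
    for low, high, p in table:
        overlap = max(Fraction(0), min(high, b) - max(low, a))
        total += float(overlap) * -math.log2(p)
    return total / float(b - a)


probs = [Fraction(3, 4), Fraction(1, 4)]            # hypothetical marginal
table = sequence_table(probs, n=3)
entropy = -sum(float(p) * math.log2(p) for _, _, p in table)
# With I = [0, 1) the average of f equals the conditional entropy.
print(average_self_information(table, 0, 1), entropy)
```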

Now consider the case $\tilde{x}_{1,(j)}=0$ . Let

[TABLE]

for $f(r)=-\log P_{X_{2}^{n}|X_{1}}(F_{x_{1}}^{-1}(r)|x_{1})$ . Then we have

[TABLE]

where (8) and (9) follow from $\int_{0}^{1}(f(r)-\bar{f})\mathrm{d}r=0$ and (7), respectively. In the case $\tilde{x}_{1,(j)}=|\mathcal{X}|-1$ we obtain $\mathrm{E}_{\tilde{X}_{2}^{n}}[-\log P_{X_{2}^{n}|X_{1}}(\tilde{X}_{2}^{n}|\tilde{x}_{1,(j)})]\geq H(X_{2}^{n}|X_{1}=\tilde{x}_{1,(j)})$ in the same way by letting

[TABLE]

We obtain (5) by combining this result with (6). ∎

VI Conclusion

In this paper we proposed a variable-to-fixed length homophonic code, the DVFH code, which is easily applied to channel coding for asymmetric channels. This code can decode each block with a decoding delay of one code symbol. The max-divergence of the generated sequence is bounded by a constant. In the simulation, the average input length of the code is very close to the entropy, and under a slight modification of the code it is shown to be asymptotically larger than the worst-case conditional entropy. An important direction for future work is to construct a code whose input length provably achieves the entropy rather than the conditional entropy.

Acknowledgment

This work was supported in part by JSPS KAKENHI Grant Number 16H00881,

