On Polar Coding for Binary Dirty Paper

Barak Beilin; David Burshtein

arXiv:1904.02450·cs.IT·April 5, 2019


TL;DR

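This paper proposes an improved nested polar coding scheme for binary dirty paper channels, focusing on low delay and short to moderate blocklengths. It analyzes the frozen channel-code bits that are not frozen in the source code and compares performance against the approximated best achievable rate of nested coding schemes.

Contribution

It introduces a new analysis of the set $F_{c}\cap F_{s}^{c}$ of channel-code frozen bits that are not frozen in the source code, for nested polar codes over binary DP. It also demonstrates an improved low-delay scheme that uses CRC-aided SCL decoding for channel coding and SCL encoding without CRC for source coding.

Findings

1. The set of frozen channel-code bits that are not frozen in the source code, which must be retransmitted in a second phase, is typically empty or small.

2. The scheme achieves rates close to the approximated best achievable rate of any nested coding scheme for binary DP.

3. For sufficiently small $p$ or sufficiently large $D$, the size of this set grows as $O(N^{\xi})$, where $\xi>0$ can be made arbitrarily small.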


Tables

Table I. LDPC vs. SCL binary DP performance, $p=0.1$, $D=0.3$

Description	Block size	DP Rate	Bit Error Rate
LDPC scheme	100K	0.36	$1.28 \times 10^{- 5}$
SCL, $L_{c} = 4$ , $L_{s} = 1$	130K	0.353	$5.9 \times 10^{- 6}$
SCL, $L_{c} = 8$ , $L_{s} = 8$	130K	0.357	$5.6 \times 10^{- 6}$
SCL, $L_{c} = 16$ , $L_{s} = 50$	130K	0.362	$5.0 \times 10^{- 6}$
SCL, $L_{c} = 16$ , $L_{s} = 50$	65K	0.356	$1.2 \times 10^{- 5}$
Capacity		0.42

Equations

Y = X \oplus S \oplus Z

C_{DP} = h_{2}(D) - h_{2}(p)

F_{c} \cap F_{s}^{c} = \{i : Z_{N}^{(i)}(p) \geq \delta_{N} \text{ and } Z_{N}^{(i)}(D) < 1 - \delta_{N}^{2}\}

\tilde{F}_{c} = \{i : Z_{N}^{(i)}(p) \geq 1 - \delta_{N}^{2}\}

|F_{c} \cap F_{s}^{c}| \leq |F_{c} \setminus \tilde{F}_{c}| = o(N)

|F_{c} \cap F_{s}^{c}| \leq |F_{c} \setminus \tilde{F}_{c}| = O(N^{1 - \alpha})

|F_{c} \cap F_{s}^{c}| \leq |\hat{F}_{c}|, \quad \hat{F}_{c} \triangleq \{i : \delta_{N}(p) \leq Z_{N}^{(i)}(p) < \delta_{N}(D)\}

\epsilon_{1,n} \triangleq Z_{n}(p), \quad \epsilon_{2,n} \triangleq 1 - Z_{n}(D)

(\epsilon_{1,n+1}, \epsilon_{2,n+1}) = (\epsilon_{1,n}^{2}, 1 - (Z_{n}(D))^{2}) = (\epsilon_{1,n}^{2}, 1 - (1 - \epsilon_{2,n})^{2}) = (\epsilon_{1,n}^{2}, 2\epsilon_{2,n} - \epsilon_{2,n}^{2})

(\tilde{\epsilon}_{1,n+1},\tilde{\epsilon}_{2,n+1})=\left\{\begin{array}[]{ll}\left(\tilde{\epsilon}_{1,n}^{2},2\tilde{\epsilon}_{2,n}-\tilde{\epsilon}_{2,n}^{2}\right)&\hbox{if $B_{n+1}=1$}\\ \left(2\tilde{\epsilon}_{1,n}-\tilde{\epsilon}_{1,n}^{2},\psi(\tilde{\epsilon}_{2,n})\right)&\hbox{if $B_{n+1}=0$}\end{array}\right.

\hat{\epsilon}_{1,n} \triangleq Z_{n}(p), \quad \hat{\epsilon}_{2,n} \triangleq 1 - (Z_{n}(D))^{2}

(\hat{\epsilon}_{1,n+1}, \hat{\epsilon}_{2,n+1}) = (\hat{\epsilon}_{1,n}^{2}, 1 - (Z_{n}(D))^{4}) = (\hat{\epsilon}_{1,n}^{2}, 1 - (1 - \hat{\epsilon}_{2,n})^{2}) = (\hat{\epsilon}_{1,n}^{2}, 2\hat{\epsilon}_{2,n} - \hat{\epsilon}_{2,n}^{2})

(\hat{\epsilon}_{1,n+1}, \hat{\epsilon}_{2,n+1}) \leq (2\hat{\epsilon}_{1,n} - \hat{\epsilon}_{1,n}^{2}, \hat{\epsilon}_{2,n}^{2})

(Z_{n+1}(D))^{2} \geq (Z_{n}(D))^{2} (2 - (Z_{n}(D))^{2})

(\hat{\epsilon}_{1,n+1},\hat{\epsilon}_{2,n+1})\left\{\begin{array}[]{ll}=\left(\hat{\epsilon}_{1,n}^{2},2\hat{\epsilon}_{2,n}-\hat{\epsilon}_{2,n}^{2}\right)&\hbox{if $B_{n+1}=1$}\\ \leq\left(2\hat{\epsilon}_{1,n}-\hat{\epsilon}_{1,n}^{2},\hat{\epsilon}_{2,n}^{2}\right)&\hbox{if $B_{n+1}=0$}\end{array}\right.

\hat{\epsilon}_{1,n} \hat{\epsilon}_{2,n} < \gamma \quad \forall n

R_{n} \triangleq \log\frac{\hat{\epsilon}_{1}}{1 - \hat{\epsilon}_{1}} + \log\frac{\hat{\epsilon}_{2}}{1 - \hat{\epsilon}_{2}}

R_{n+1}\left\{\begin{array}[]{ll}=\log\frac{\hat{\epsilon}_{1}^{2}}{1-\hat{\epsilon}_{1}^{2}}+\log\frac{2\hat{\epsilon}_{2}-\hat{\epsilon}_{2}^{2}}{(1-\hat{\epsilon}_{2})^{2}}&\hbox{if $B_{n+1}=1$}\\ \leq\log\frac{2\hat{\epsilon}_{1}-\hat{\epsilon}_{1}^{2}}{(1-\hat{\epsilon}_{1})^{2}}+\log\frac{\hat{\epsilon}_{2}^{2}}{1-\hat{\epsilon}_{2}^{2}}&\hbox{if $B_{n+1}=0$}\end{array}\right.

R_{n+1} = R_{n} + \log\frac{\hat{\epsilon}_{1}(2 - \hat{\epsilon}_{2})}{(1 + \hat{\epsilon}_{1})(1 - \hat{\epsilon}_{2})}

\frac{\hat{\epsilon}_{1}}{1 - \hat{\epsilon}_{1}} \cdot \frac{\hat{\epsilon}_{2}}{1 - \hat{\epsilon}_{2}} < 1

\frac{2 - \hat{\epsilon}_{2}}{1 - \hat{\epsilon}_{2}} = 1 + \frac{1}{1 - \hat{\epsilon}_{2}} < 1 + \frac{1}{\hat{\epsilon}_{1}} = \frac{\hat{\epsilon}_{1} + 1}{\hat{\epsilon}_{1}}

\log\hat{\epsilon}_{1,n} + \log\hat{\epsilon}_{2,n} < R_{n} \leq R_{0} = \log\frac{Z(p)}{1 - Z(p)} + \log\frac{1 - Z^{2}(D)}{Z^{2}(D)} \triangleq \log\gamma(Z(p), Z(D))

\hat{\epsilon}_{1,n+1}\hat{\epsilon}_{2,n+1}\leq\left\{\begin{array}[]{ll}\hat{\epsilon}_{1,n}^{2}\hat{\epsilon}_{2,n}(2-\hat{\epsilon}_{2,n})&\hbox{if $B_{n+1}=1$}\\ \hat{\epsilon}_{1,n}(2-\hat{\epsilon}_{1,n})\hat{\epsilon}_{2,n}^{2}&\hbox{if $B_{n+1}=0$}\end{array}\right.

\rho_{n} \triangleq \frac{\hat{\epsilon}_{1,n+1}\hat{\epsilon}_{2,n+1}}{\hat{\epsilon}_{1,n}\hat{\epsilon}_{2,n}}\leq\left\{\begin{array}[]{ll}\hat{\epsilon}_{1,n}(2-\hat{\epsilon}_{2,n})&\hbox{if $B_{n+1}=1$}\\ \hat{\epsilon}_{2,n}(2-\hat{\epsilon}_{1,n})&\hbox{if $B_{n+1}=0$}\end{array}\right.


Full text

On Polar Coding for Binary Dirty Paper

Barak Beilin and David Burshtein

School of Electrical Engineering

Tel-Aviv University

Tel-Aviv 6997801, Israel


Abstract

The problem of communication over binary dirty paper (DP) using nested polar codes is considered. An improved scheme, focusing on low delay, short to moderate blocklength communication is proposed. Successive cancellation list (SCL) decoding with properly defined CRC is used for channel coding, and SCL encoding without CRC is used for source coding. The performance is compared to the best achievable rate of any coding scheme for binary DP using nested codes. A well known problem with nested polar codes for binary DP is the existence of frozen channel code bits that are not frozen in the source code. These bits need to be retransmitted in a second phase of the scheme, thus reducing transmission rate. We observe that the number of these bits is typically either zero or a small number, and provide an improved analysis, compared to that presented in the literature, on the size of this set and on its scaling with respect to the blocklength when the power constraint parameter is sufficiently large or the channel crossover probability sufficiently small.

I Introduction

Consider the problem of transmission over a channel with non-causal side information at the encoder, also known as the Gelfand-Pinsker (GP) problem. Applications include watermarking codes, memories with defects, write once memories and transmission over broadcast channels. In the GP problem the encoder needs to send a message ${\bf m}$ reliably over some memoryless channel $W(y\>|\>x,s)$, where $x\in{\cal X}$ is the input to the channel, $y\in{\cal Y}$ is the output, and $s\in{\cal S}$ is the channel state. For each transmitted symbol $x$, the state $s$ is obtained by i.i.d. sampling of some given source random variable $S\in{\cal S}$. The encoder observes the channel state vector ${\bf s}=(s_{0},s_{1},\ldots,s_{N-1})$ non-causally, prior to transmission. It then constructs a codeword ${\bf x}=(x_{0},x_{1},\ldots,x_{N-1})$ which is a function of the message ${\bf m}$ and the state vector ${\bf s}$. The decoder observes only the vector of channel outputs, ${\bf y}=(y_{0},y_{1},\ldots,y_{N-1})$, and constructs the decoded message $\hat{{\bf m}}$ from ${\bf y}$.

The binary dirty paper (DP) problem depicted in Figure 1 is a side information problem with ${\cal X}={\cal S}={\cal Y}=\{0,1\}$ , $S\sim{\rm Ber}(1/2)$ ( ${\rm Ber}(q)$ denotes a Bernoulli $(0,1)$ random variable with probabilities $(1-q,q)$ ), and the channel $W(y\>|\>x,s)$ is defined by

$$Y = X \oplus S \oplus Z \tag{1}$$

where $\oplus$ denotes XOR, and $Z\sim{\rm Ber}(p)$ for $p\in(0,1/2)$ .

In this problem there is a “power constraint” which comes in one of two possible forms. The first form is an average power constraint: Given some $D\in(0,1/2)$ , on average a fraction at most $D$ of the channel input bits, $X$ , are ones. That is, ${\rm E}w_{H}({\bf X})/N\leq D$ where ${\bf X}$ is the codeword and $w_{H}()$ denotes Hamming weight. The second form is an individual codeword power constraint, which is stronger than the average power constraint. In this case, each codeword, ${\bf x}$ , needs to satisfy $w_{H}({\bf x})/N\leq D$ . An error event in the communication happens when either the receiver does not decode the correct transmitted message, or (under the individual codeword power constraint) when the encoder cannot obtain a codeword that satisfies the required power constraint. For $D>p$ the capacity of the binary DP problem is given by

$$C_{\rm DP} = h_{2}(D) - h_{2}(p) \tag{2}$$

where $h_{2}(\cdot)$ is the binary entropy function (base $2$ ). Following [1], we will assume in this paper that $D>p$ since for the other case, $D<p$ , the capacity as a function of $D$ can be achieved by time sharing with the point with $D=0$ and $R=0$ .
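As a quick numerical illustration, the sketch below evaluates $h_{2}(D)-h_{2}(p)$ at the $(p,D)$ pairs used later in the simulations. The function names are illustrative, and the code only covers the regime $0<p<D<1/2$ assumed in the text; it does not implement the time-sharing case.

```python
import math


def h2(x: float) -> float:
    """Binary entropy function (base 2), with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def dp_capacity(p: float, D: float) -> float:
    """h2(D) - h2(p), restricted to the regime 0 < p < D < 1/2 assumed in the text."""
    if not 0.0 < p < D < 0.5:
        raise ValueError("expected 0 < p < D < 1/2")
    return h2(D) - h2(p)


# (p, D) pairs taken from the simulation section, with p = 0.11
for D in (0.21, 0.31, 0.41):
    print(f"p = 0.11, D = {D}: h2(D) - h2(p) = {dp_capacity(0.11, D):.4f} bits/use")
```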

In [1, Section VIII], Arikan’s polar codes [2] were extended to the problem of binary DP using a nested polar code structure. Given some pair $(p,D)$ such that $0<p<D<1/2$ and blocklength $N$, one constructs two polar codes with blocklength $N$. The first is a standard binary polar code, ${\mathcal{C}}_{c}$, with frozen set $F_{c}$, designed for reliable communication over the binary symmetric channel, BSC($p$). The second code, ${\mathcal{C}}_{s}$, with frozen set $F_{s}$, is a binary polar code for lossy source coding (quantization) [1] of the source $S$ with distortion $D$. The design of ${\mathcal{C}}_{s}$ is very similar to the design of a polar channel code for a BSC($D$) channel, except for the threshold defining its frozen set. The polar coding scheme for the binary DP problem was formulated in [1] under an average power constraint, but it is also valid under an individual codeword power constraint [3]. To describe the method, first assume that $F_{c}\subseteq F_{s}$, so that ${\mathcal{C}}_{c}$ and ${\mathcal{C}}_{s}$ are nested polar codes. As usual, denote by ${\bf u}$ the message and frozen bits of a polar code, and by ${\bf c}={\bf u}G_{N}$ the corresponding polar codeword, where $G_{N}$ is the generator matrix defined in [2, Eq. (70)]. The encoder first sets ${\bf u}_{F_{c}}={\bf 0}$ and ${\bf u}_{F_{s}\setminus F_{c}}={\bf m}$, where ${\bf m}$ denotes the information bits that need to be transmitted. This defines a polar source code that we denote by ${\mathcal{C}}_{s}\left(F_{s},{\bf u}_{F_{s}}({\bf m})\right)$.
The encoder then observes ${\bf s}$ and obtains a polar codeword ${\bf s}^{\prime}\in{\mathcal{C}}_{s}\left(F_{s},{\bf u}_{F_{s}}({\bf m})\right)$ that satisfies the power constraint (e.g., under the individual codeword power constraint, $w_{H}({\bf s}\oplus{\bf s}^{\prime})/N\leq D$) using a successive cancellation (SC) encoding algorithm, which is a randomized version of the standard SC decoding algorithm [2] (as explained in [1], in practice the encoder can use the standard SC decoding algorithm without modification, but the proof requires the randomized version). The encoder then transmits ${\bf x}={\bf s}\oplus{\bf s}^{\prime}$, and the decoder receives ${\bf y}={\bf x}\oplus{\bf s}\oplus{\bf z}={\bf s}^{\prime}\oplus{\bf z}$. Since ${\bf s}^{\prime}\in{\mathcal{C}}_{c}\left(F_{c},{\bf u}_{F_{c}}={\bf 0}\right)$ and $F_{c}$ was designed to obtain a good channel code for the BSC($p$), the decoder can decode $\hat{{\bf u}}_{F_{c}^{c}}$ ($F_{c}^{c}$ denotes the complement of $F_{c}$) from ${\bf y}$ using the SC decoding algorithm. The decoded message is $\hat{{\bf m}}=\hat{{\bf u}}_{F_{s}\setminus F_{c}}$. The rate of this communication scheme is $R=(|F_{s}|-|F_{c}|)/N$. Since $|F_{s}|/N\rightarrow h_{2}(D)$ and $|F_{c}|/N\rightarrow h_{2}(p)$ as $N\rightarrow\infty$, we have $R\rightarrow C_{\rm DP}$; that is, the scheme approaches capacity.
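The algebra of the nested scheme can be checked on a toy example. This is a minimal illustration under several assumptions, not the scheme of [1]: it uses the polar transform $F^{\otimes n}$ in natural order (no bit-reversal permutation, so it is not exactly $G_{N}$ of [2]), hypothetical frozen sets with $F_{c}\subseteq F_{s}$, random placeholder bits in place of the SC encoder's choice on $F_{s}^{c}$ (so the power constraint is not enforced), and a noiseless channel (${\bf z}={\bf 0}$), so the decoder can invert the transform directly instead of running SC decoding.

```python
import random


def polar_transform(u):
    """Compute u F^{(x)n} over GF(2) with F = [[1, 0], [1, 1]], in natural (non bit-reversed) order."""
    x = list(u)
    step = 1
    while step < len(x):
        for i in range(0, len(x), 2 * step):
            for j in range(i, i + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


N = 8
F_c = {0, 1, 2}          # hypothetical channel-code frozen set
F_s = {0, 1, 2, 3, 4}    # hypothetical source-code frozen set, with F_c a subset of F_s
msg_pos = sorted(F_s - F_c)
rng = random.Random(0)

m = [rng.randint(0, 1) for _ in msg_pos]
u = [0] * N                          # u_{F_c} = 0
for pos, bit in zip(msg_pos, m):
    u[pos] = bit                     # u_{F_s minus F_c} = m
for pos in range(N):
    if pos not in F_s:
        u[pos] = rng.randint(0, 1)   # placeholder for the SC encoder's choice on F_s^c

s = [rng.randint(0, 1) for _ in range(N)]   # channel state
s_prime = polar_transform(u)                 # s' lies in C_s(F_s, u_{F_s}(m))
x = [a ^ b for a, b in zip(s, s_prime)]      # transmitted word x = s xor s'
y = [a ^ b for a, b in zip(x, s)]            # noiseless output (z = 0): y = s'

u_hat = polar_transform(y)                   # F^{(x)n} is its own inverse over GF(2)
m_hat = [u_hat[pos] for pos in msg_pos]
print("message:", m, "decoded:", m_hat)
```

With noise, the decoder would instead run SC (or SCL) decoding of ${\mathcal{C}}_{c}$ for the BSC($p$); the point of the sketch is only that ${\bf y}={\bf s}^{\prime}\oplus{\bf z}$ no longer depends on ${\bf s}$.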

When the assumption $F_{c}\subseteq F_{s}$ is violated, it was suggested [1] to use a two phase transmission scheme. The first phase is identical to the one described above. In the second phase, we transmit ${\bf u}_{F_{c}\cap F_{s}^{c}}$ (or accumulate the bits of ${\bf u}_{F_{c}\cap F_{s}^{c}}$ of many first phase transmissions and send them together). In the second phase the transmitter ignores the power constraint and transmits ${\bf x}={\bf c}\oplus{\bf s}$ such that the state noise is canceled. However, if $|F_{c}\cap F_{s}^{c}|$ is small then the damage to the power constraint will be negligible (we can compensate by decreasing the design distortion $D$ of ${\mathcal{C}}_{s}$ ). The decoder starts by decoding ${\bf u}_{F_{c}\cap F_{s}^{c}}$ from the second phase transmission, and then it can apply the decoding described above for the case $F_{c}\subseteq F_{s}$ . Another possibility is to transmit in frames using the chaining construction [4] as used in [5] for achieving Marton’s region for broadcast channels. When using a sufficiently large number of frames, the rate loss of the chaining construction becomes negligible. However, the use of $L$ frames increases latency by a factor of $L$ , and hence is not suitable for low delay communications.

In this paper we propose an improved, low delay communication scheme over binary DP using nested polar codes [1], with CRC-aided SC list (SCL) decoding [6] for channel coding, and SCL encoding without CRC for source coding. The performance is compared to the best achievable rate of any coding scheme for binary DP using nested codes. We observed that the set $F_{c}\cap F_{s}^{c}$, which needs to be retransmitted in the second phase of the scheme [1], is typically empty for $D-p$ larger than some small threshold. Our main theoretical contribution is an improved analysis, compared to that presented in [1], of $|F_{c}\cap F_{s}^{c}|$ and of its scaling with respect to the blocklength when $D$ is sufficiently large or $p$ sufficiently small.

II Polar SCL coding for binary dirty paper

We now discuss the design of an SCL coding scheme based on the SCL decoder [6] and the nested polar coding scheme of [1]. We first observe that lists improve lossy source coding, since the encoder can choose the least distorted codeword among several candidates. For example, using an SCL encoder with $L_{s}=50$ lists to encode a Ber($1/2$) source at rate 0.258 with a polar code of blocklength $N=1024$, the distortion was 0.217, compared to 0.226 when lists are not used. The theoretical minimum distortion for this rate is 0.21. Repeating the experiment with $N=4096$ yielded a distortion of 0.215 with $L_{s}=50$ lists, compared to 0.223 without lists. Although CRC verification significantly reduces the error rate in channel coding [6], it is not helpful for lossy source coding. In that problem we are only interested in the distortion between the source vector and the chosen codeword, so any codeword sufficiently close to the source vector can be used, rather than a single preferred codeword as in the channel coding problem; this makes the CRC rule irrelevant. Hence, we use an SCL encoder with $L_{s}$ lists and without CRC, and an SCL decoder with $L_{c}$ lists and with CRC.

The other important design consideration relates to the proper definition of the CRC code. In [6], $r$ CRC bits are computed from $N-|F_{c}|-r$ information bits. The $r$ CRC bits are then appended to the $N-|F_{c}|-r$ information bits. For polar coding over side information channels it would be problematic to compute the CRC this way since this would impose a difficult constraint on the encoder side (how to output a valid codeword for channel coding that satisfies the required distortion bound). To solve this problem, we compute the $r$ CRC bits only from the message bits that need to be transmitted to the decoder. Without using CRC there are $|F_{s}\cap F_{c}^{c}|$ message bits. When using CRC we reduce this number to $|F_{s}\cap F_{c}^{c}|-r$ and compute the $r$ CRC bits from these message bits. We then append the $r$ CRC bits to the $|F_{s}\cap F_{c}^{c}|-r$ message bits, and place them in $F_{s}\cap F_{c}^{c}$ (as in [1] we set zeros in $F_{c}$ ). At the decoder side, in the last decoding stage only those lists for which the CRC is satisfied are considered as in [6].
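A sketch of the CRC placement described above. The CRC length $r=8$ and the polynomial $x^{8}+x^{2}+x+1$ are hypothetical choices (the text does not specify a polynomial), as is the convention of filling $F_{s}\cap F_{c}^{c}=F_{s}\setminus F_{c}$ in increasing index order. `place_payload` and `crc_passes` are illustrative helpers, not part of any library.

```python
R_CRC = 8          # CRC length r (hypothetical choice)
CRC_POLY = 0x07    # x^8 + x^2 + x + 1 without the leading term (hypothetical choice)


def crc_bits(bits):
    """MSB-first CRC with zero initial value: remainder of M(x) x^8 modulo the polynomial."""
    reg = 0
    for b in bits:
        feedback = ((reg >> (R_CRC - 1)) & 1) ^ b
        reg = (reg << 1) & ((1 << R_CRC) - 1)
        if feedback:
            reg ^= CRC_POLY
    return [(reg >> (R_CRC - 1 - k)) & 1 for k in range(R_CRC)]


def place_payload(F_c, F_s, message):
    """Zeros on F_c; message bits followed by their CRC on F_s minus F_c (increasing index order)."""
    info_pos = sorted(set(F_s) - set(F_c))
    if len(message) != len(info_pos) - R_CRC:
        raise ValueError("message must have |F_s minus F_c| - r bits")
    payload = list(message) + crc_bits(message)
    u = {i: 0 for i in F_c}
    u.update(zip(info_pos, payload))
    return u  # indices outside F_s are left to the SCL source encoder


def crc_passes(u_hat, F_c, F_s):
    """Decoder-side check applied to each list candidate at the last decoding stage."""
    info_pos = sorted(set(F_s) - set(F_c))
    return crc_bits([u_hat[i] for i in info_pos]) == [0] * R_CRC
```

Because the CRC is computed only from the message bits, the source encoder remains free to choose ${\bf u}_{F_{s}^{c}}$ to meet the distortion constraint, which is the point of the design choice above.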

As in [1, Eqs. (18)-(19)], the frozen sets of the codes ${\mathcal{C}}_{s}$ and ${\mathcal{C}}_{c}$ are defined by

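$$F_{s}=\{i : Z_{N}^{(i)}(D)\geq\delta_{N}(D)\} \tag{3}$$

$$F_{c}=\{i : Z_{N}^{(i)}(p)\geq\delta_{N}(p)\} \tag{4}$$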

where $Z_{N}^{(i)}(D)$ ($Z_{N}^{(i)}(p)$, respectively) is the Bhattacharyya parameter of the $i$-th sub-channel after $n=\log N$ polarization steps of a BSC($D$) (BSC($p$)) channel (the base of all logarithms in this paper is 2). In [1], $\delta_{N}(D)=1-\delta_{N}^{2}$ and $\delta_{N}(p)=\delta_{N}$. Setting $\delta_{N}=\delta/N$ yields error probability at most $\delta$ and, by [1, Lemmas 5, 6 and 7], average distortion at most $D+\sqrt{2}\delta$. Thus, to meet a required distortion constraint $D$, we design $F_{s}$ using a BSC($D^{\prime}$) channel with $D^{\prime}=D-\sqrt{2}\delta$. A similar statement can also be made regarding the individual codeword power constraint [3, Theorem 2]. In practice we set the thresholds $\delta_{N}(p)$ ($\delta_{N}(D)$, respectively) such that a polar code with frozen set $F_{c}$ ($F_{s}$) under SCL decoding (encoding) achieves the required error rate (distortion).
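To make the frozen sets concrete, the sketch below builds $F_{c}$, $F_{s}$, $F_{c}\cap F_{s}^{c}$ and $\hat{F}_{c}$ for BSC($p$) and BSC($D$) with hypothetical thresholds. It is only an approximation: it propagates $Z\mapsto 2Z-Z^{2}$ and $Z\mapsto Z^{2}$, which is exact for the BEC but only an upper bound for BSC sub-channels, whereas the paper sets the thresholds from SCL simulations. The resulting set sizes should not be expected to match the numbers reported in the text.

```python
import math


def bsc_bhattacharyya(q):
    """Z(BSC(q)) = 2 sqrt(q (1 - q))."""
    return 2.0 * math.sqrt(q * (1.0 - q))


def approx_subchannel_z(z0, n):
    """Approximate Z_N^{(i)}, i = 0..N-1, with Z- = 2Z - Z^2 and Z+ = Z^2.

    Exact for the BEC; for other BMS channels 2Z - Z^2 only upper-bounds Z-.
    Index bits (MSB first) play the role of B_1..B_n, with 0 -> minus and 1 -> plus.
    """
    zs = [z0]
    for _ in range(n):
        zs = [v for z in zs for v in (2.0 * z - z * z, z * z)]
    return zs


def nested_sets(p, D, n, delta_p, delta_d):
    """Return F_c, F_s, F_c minus F_s, and hat F_c for the given (hypothetical) thresholds."""
    zp = approx_subchannel_z(bsc_bhattacharyya(p), n)
    zd = approx_subchannel_z(bsc_bhattacharyya(D), n)
    F_c = {i for i, z in enumerate(zp) if z >= delta_p}
    F_s = {i for i, z in enumerate(zd) if z >= delta_d}
    hat_F_c = {i for i, z in enumerate(zp) if delta_p <= z < delta_d}
    return F_c, F_s, F_c - F_s, hat_F_c


N_LOG = 10
N = 2 ** N_LOG
F_c, F_s, bad, hat_F_c = nested_sets(0.11, 0.31, N_LOG, delta_p=1e-3, delta_d=1 - 1e-3)
print(f"|F_c|/N = {len(F_c) / N:.3f}, |F_s|/N = {len(F_s) / N:.3f}, "
      f"|F_c & F_s^c| = {len(bad)}, |hat F_c|/N = {len(hat_F_c) / N:.3f}")
```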

III Analysis of $|F_{c}\cap F_{s}^{c}|$

As was explained above, for low delay communications with polar codes using our SCL coding variant of the method in [1], $|F_{c}\cap F_{s}^{c}|$ needs to be small (ideally zero).

Consider the definition of $F_{s}$ and $F_{c}$ in (3)-(4) and suppose that $\delta_{N}(D)=1-\delta_{N}^{2}$ and $\delta_{N}(p)=\delta_{N}$ as in [1]. Then

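$$F_{c}\cap F_{s}^{c}=\{i : Z_{N}^{(i)}(p)\geq\delta_{N} \text{ and } Z_{N}^{(i)}(D)<1-\delta_{N}^{2}\} \tag{5}$$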

Define

$$\tilde{F}_{c}\triangleq\{i : Z_{N}^{(i)}(p)\geq 1-\delta_{N}^{2}\} \tag{6}$$

The analysis in [1] asserts the following

$$|F_{c}\cap F_{s}^{c}|\leq|F_{c}\setminus\tilde{F}_{c}|=o(N) \tag{7}$$

where the inequality is due to the degradedness of the sub-channel $W_{N}^{(i)}(D)$ , corresponding to a BSC( $D$ ), with respect to $W_{N}^{(i)}(p)$ , corresponding to a BSC( $p$ ) [1]. The equality is due to the polarization of the BSC( $p$ ) channel. A more refined argument, using scaling results of polar codes [7, 8] shows

$$|F_{c}\cap F_{s}^{c}|\leq|F_{c}\setminus\tilde{F}_{c}|=O(N^{1-\alpha}) \tag{8}$$

for $\alpha=(1+1/0.2127)^{-1}\approx 0.175$ (this can be verified using the proofs of Theorems 1 and 2 in [8]).

However, we observed that for small to moderate values of $N$ , in the above bound of [1] (now formulated in terms of general threshold values, $\delta_{N}(D)$ and $\delta_{N}(p)$ ),

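$$|F_{c}\cap F_{s}^{c}|\leq|\hat{F}_{c}|,\qquad \hat{F}_{c}\triangleq\{i : \delta_{N}(p)\leq Z_{N}^{(i)}(p)<\delta_{N}(D)\} \tag{9}$$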

$|\hat{F}_{c}|$ is quite large for the thresholds $\delta_{N}(D)$ and $\delta_{N}(p)$ used in practice in the definitions of $F_{s}$ and $F_{c}$. Fortunately, we have observed empirically that even though $|\hat{F}_{c}|$ tends to be relatively large, $|F_{c}\cap F_{s}^{c}|$ tends to be much smaller, and it vanishes for sufficiently large $D$ or sufficiently small $p$. For example, consider the case where $L_{c}=L_{s}=8$ and the frozen set thresholds are designed for block error rate below $0.001$ for channel crossover probability $p$, and average distortion below $D$. Then for $N=1024$ and $p\in\{0.11,0.21,0.31\}$ we have $F_{c}\cap F_{s}^{c}=\emptyset$ for $D-p\geq 0.1$. For larger values of $N$, $|F_{c}\cap F_{s}^{c}|$ vanishes starting from even smaller values of $D-p$. On the other hand, $|\hat{F}_{c}|$ is much larger: e.g., for $p=0.11$ and $D=0.25,0.45$ we have $|\hat{F}_{c}|/N=0.175,0.207$ for blocklength $N=1024$, and $|\hat{F}_{c}|/N=0.157,0.18$ for $N=2048$.

We now study the behavior of the set $F_{c}\cap F_{s}^{c}$ , which represents the deviation from perfect code nestedness, and prove that for $p$ sufficiently small or $D$ sufficiently large, $|F_{c}\cap F_{s}^{c}|=O(N^{\xi})$ where $\xi>0$ can be chosen arbitrarily small, thus improving (8) significantly for the case of sufficiently small $p$ or sufficiently large $D$ . In fact, we prove this result for any pair of binary memoryless symmetric (BMS) channels, $W(p)$ and $W(D)$ , with Bhattacharyya parameters $Z(p)$ and $Z(D)$ , without requiring degradedness of $W(D)$ with respect to $W(p)$ , which is important for the generalization of the results for side information channels beyond binary DP.

Consider the random processes $Z_{n}(p)$ and $Z_{n}(D)$ , $n=0,1,\ldots,\log N$ . They both follow the same sequence of Arikan’s channel transformations, defined by [2, Eq. (22)] if $B_{n}=0$ and by [2, Eq. (23)] if $B_{n}=1$ , where $\left<B_{1},B_{2},\ldots,B_{\log N}\right>$ defines the index of some polar sub-channel. Initially $Z_{0}(p)=Z(p)$ and $Z_{0}(D)=Z(D)$ . Denote

$$\epsilon_{1,n}\triangleq Z_{n}(p),\qquad \epsilon_{2,n}\triangleq 1-Z_{n}(D) \tag{10}$$

In particular, $\epsilon_{1,0}=Z(p)$ and $\epsilon_{2,0}=1-Z(D)$ . Now, if $B_{n+1}=1$ then

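$$(\epsilon_{1,n+1},\epsilon_{2,n+1})=\left(\epsilon_{1,n}^{2},\,1-(Z_{n}(D))^{2}\right) \tag{11}$$

$$=\left(\epsilon_{1,n}^{2},\,1-(1-\epsilon_{2,n})^{2}\right)=\left(\epsilon_{1,n}^{2},\,2\epsilon_{2,n}-\epsilon_{2,n}^{2}\right) \tag{12}$$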

If $B_{n+1}=0$ then

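$$(\epsilon_{1,n+1},\epsilon_{2,n+1})\leq\left(2\epsilon_{1,n}-\epsilon_{1,n}^{2},\,\psi(\epsilon_{2,n})\right) \tag{13}$$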

where the inequality actually denotes two inequalities, one for each term. These inequalities follow from the following well known relations, e.g. [2], [7, Eq. (13)], for $B_{n+1}=0$ ,

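$$Z_{n+1}(p)\leq 2Z_{n}(p)-(Z_{n}(p))^{2} \tag{15}$$

$$Z_{n+1}(D)\geq Z_{n}(D)\sqrt{2-(Z_{n}(D))^{2}} \tag{16}$$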
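The relations used for $B_{n+1}=0$, together with $Z(W^{+})=Z(W)^{2}$ used for $B_{n+1}=1$, can be checked numerically by building the polar transforms of a small channel explicitly. In this sketch a channel is a dictionary mapping each output to the pair $(W(y|0),W(y|1))$; this representation is our own. For a BSC($q$), $W^{-}$ is equivalent to a BSC($2q(1-q)$), and the lower bound $Z(W^{-})\geq Z(W)\sqrt{2-Z(W)^{2}}$ then holds with equality; the last line also checks a BMS channel that is not a BSC.

```python
import itertools
import math


def bsc(q):
    """Channel as {output: (W(y|0), W(y|1))}."""
    return {0: (1.0 - q, q), 1: (q, 1.0 - q)}


def bhattacharyya(W):
    return sum(math.sqrt(w0 * w1) for w0, w1 in W.values())


def minus(W):
    """W^-(y1, y2 | u1) = 1/2 sum_{u2} W(y1 | u1 xor u2) W(y2 | u2)."""
    return {
        (y1, y2): tuple(0.5 * sum(W[y1][u1 ^ u2] * W[y2][u2] for u2 in (0, 1)) for u1 in (0, 1))
        for y1, y2 in itertools.product(W, repeat=2)
    }


def plus(W):
    """W^+(y1, y2, u1 | u2) = 1/2 W(y1 | u1 xor u2) W(y2 | u2)."""
    return {
        (y1, y2, u1): tuple(0.5 * W[y1][u1 ^ u2] * W[y2][u2] for u2 in (0, 1))
        for y1, y2 in itertools.product(W, repeat=2)
        for u1 in (0, 1)
    }


def check_bounds(W, tol=1e-12):
    z = bhattacharyya(W)
    z_minus, z_plus = bhattacharyya(minus(W)), bhattacharyya(plus(W))
    assert math.isclose(z_plus, z * z, rel_tol=1e-9)                           # B = 1
    assert z * math.sqrt(2 - z * z) - tol <= z_minus <= 2 * z - z * z + tol  # B = 0
    return z, z_minus


for q in (0.01, 0.11, 0.21, 0.31, 0.45):
    z, z_minus = check_bounds(bsc(q))
    print(f"BSC({q}): Z = {z:.4f}, Z- = {z_minus:.4f}, Z*sqrt(2-Z^2) = {z * math.sqrt(2 - z * z):.4f}")

check_bounds(plus(bsc(0.2)))   # a BMS channel that is not a BSC
```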

Lemma 1.

Consider the process $(\tilde{\epsilon}_{1,n},\tilde{\epsilon}_{2,n})$ defined by $\tilde{\epsilon}_{1,0}=Z(p)$ , $\tilde{\epsilon}_{2,0}=1-Z(D)$ , and, for $n=1,2,\ldots,\log N$ ,

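$$(\tilde{\epsilon}_{1,n+1},\tilde{\epsilon}_{2,n+1})=\begin{cases}\left(\tilde{\epsilon}_{1,n}^{2},\,2\tilde{\epsilon}_{2,n}-\tilde{\epsilon}_{2,n}^{2}\right) & \text{if } B_{n+1}=1\\ \left(2\tilde{\epsilon}_{1,n}-\tilde{\epsilon}_{1,n}^{2},\,\psi(\tilde{\epsilon}_{2,n})\right) & \text{if } B_{n+1}=0\end{cases} \tag{17}$$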

For $n=\log N$ we have $N$ possible realizations of the process corresponding to all possible sub-channels. The number of realizations for which both $\tilde{\epsilon}_{1,\log N}\geq\delta_{N}(p)$ and $\tilde{\epsilon}_{2,\log N}>1-\delta_{N}(D)$ is an upper bound on $|F_{c}\cap F_{s}^{c}|$ .

The proof follows from (12)-(13) and the fact that all the functions that appear in (17), including $\psi()$ , are monotonically increasing for $\tilde{\epsilon}_{i,n}\in(0,1)$ , $i=1,2$ .
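The function $\psi$ used in Lemma 1 is not defined in this copy of the text. The derivation below gives one form consistent with the argument, assuming the $B_{n+1}=0$ lower bound on $Z_{n+1}(D)$ is the standard relation $Z_{n+1}(D)\geq Z_{n}(D)\sqrt{2-(Z_{n}(D))^{2}}$; it also checks the monotonicity the proof relies on.

```latex
\begin{aligned}
\epsilon_{2,n+1} = 1 - Z_{n+1}(D)
  &\leq 1 - Z_{n}(D)\sqrt{2-(Z_{n}(D))^{2}} \\
  &= 1 - (1-\epsilon_{2,n})\sqrt{2-(1-\epsilon_{2,n})^{2}}
   \;\triangleq\; \psi(\epsilon_{2,n}), \\[4pt]
\text{with } t = (1-x)^{2}:\quad
\psi(x) &= 1 - \sqrt{t(2-t)} .
\end{aligned}
```

Since $t(2-t)$ is increasing for $t\in(0,1)$ and $t=(1-x)^{2}$ is decreasing in $x$, this $\psi$ is increasing on $(0,1)$, with $\psi(0)=0$ and $\psi(1)=1$.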

We note that the bound provided by Lemma 1 on $|F_{c}\cap F_{s}^{c}|$ is monotonically increasing in $Z(p)$ and monotonically decreasing in $Z(D)$. We used Lemma 1 to compute bounds on $|F_{c}\cap F_{s}^{c}|$ for several values of $p$ and $D$, and compared them with the actual value of $|F_{c}\cap F_{s}^{c}|$. The thresholds $\delta_{N}(p)$ and $\delta_{N}(D)$ were set to obtain a block error probability of $0.001$ and an average distortion $D$ with $L_{c}=8$ and $L_{s}=8$. As an example, for $N=1024$ and $p=0.11$ ($p=0.21$, respectively), $|F_{c}\cap F_{s}^{c}|$ vanishes for $D-p\geq 0.1$ ($D-p\geq 0.1$), while the bound requires $D-p\geq 0.16$ ($D-p\geq 0.14$). For $N=2048$ and $p=0.11$ (with the same results for $p=0.21$), $|F_{c}\cap F_{s}^{c}|$ vanishes for $D-p\geq 0.08$, while the bound requires $D-p\geq 0.14$.
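Lemma 1's bound can be evaluated by enumerating all $N=2^{n}$ realizations of $(\tilde{\epsilon}_{1,n},\tilde{\epsilon}_{2,n})$. This sketch assumes $\psi(x)=1-(1-x)\sqrt{2-(1-x)^{2}}$, which follows from the lower bound $Z_{n+1}(D)\geq Z_{n}(D)\sqrt{2-(Z_{n}(D))^{2}}$; if the paper uses a different (looser) $\psi$, the counts would change. The thresholds are hypothetical, whereas the paper sets $\delta_{N}(p)$ and $\delta_{N}(D)$ from SCL simulations.

```python
import math


def psi(x):
    """Assumed form of psi: 1 - (1 - x) sqrt(2 - (1 - x)^2), the bound on eps_2 for B = 0."""
    z = 1.0 - x
    return 1.0 - z * math.sqrt(2.0 - z * z)


def lemma1_bound(z_p, z_d, n, delta_p, delta_d):
    """Count the 2^n realizations with eps1 >= delta_p and eps2 > 1 - delta_d,
    i.e. Lemma 1's upper bound on |F_c cap F_s^c|."""
    pairs = [(z_p, 1.0 - z_d)]
    for _ in range(n):
        pairs = [q for e1, e2 in pairs
                 for q in ((2.0 * e1 - e1 * e1, psi(e2)),     # B = 0
                           (e1 * e1, 2.0 * e2 - e2 * e2))]    # B = 1
    return sum(1 for e1, e2 in pairs if e1 >= delta_p and e2 > 1.0 - delta_d)


def bsc_z(q):
    return 2.0 * math.sqrt(q * (1.0 - q))


# Hypothetical thresholds, for illustration only
for D in (0.21, 0.31, 0.41):
    print(D, lemma1_bound(bsc_z(0.11), bsc_z(D), 10, delta_p=1e-3, delta_d=1 - 1e-3))
```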

We proceed with the analysis by defining $\hat{\epsilon}_{1,n}$ and $\hat{\epsilon}_{2,n}$ by

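$$\hat{\epsilon}_{1,n}\triangleq Z_{n}(p),\qquad \hat{\epsilon}_{2,n}\triangleq 1-(Z_{n}(D))^{2} \tag{18}$$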

such that $\hat{\epsilon}_{1,n}=\epsilon_{1,n}$ (see (10)). Similarly to (12) we have that if $B_{n+1}=1$ then

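$$(\hat{\epsilon}_{1,n+1},\hat{\epsilon}_{2,n+1})=\left(\hat{\epsilon}_{1,n}^{2},\,1-(Z_{n}(D))^{4}\right) \tag{19}$$

$$=\left(\hat{\epsilon}_{1,n}^{2},\,1-(1-\hat{\epsilon}_{2,n})^{2}\right)=\left(\hat{\epsilon}_{1,n}^{2},\,2\hat{\epsilon}_{2,n}-\hat{\epsilon}_{2,n}^{2}\right) \tag{20}$$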

Similarly to (13), if $B_{n+1}=0$ then

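$$(\hat{\epsilon}_{1,n+1},\hat{\epsilon}_{2,n+1})\leq\left(2\hat{\epsilon}_{1,n}-\hat{\epsilon}_{1,n}^{2},\,\hat{\epsilon}_{2,n}^{2}\right) \tag{21}$$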

The inequality for the left terms is due to (15) and the inequality for the right terms is due to (16) that can be rewritten as

$$(Z_{n+1}(D))^{2}\geq(Z_{n}(D))^{2}\left(2-(Z_{n}(D))^{2}\right) \tag{22}$$

Thus we have

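$$(\hat{\epsilon}_{1,n+1},\hat{\epsilon}_{2,n+1})\begin{cases}=\left(\hat{\epsilon}_{1,n}^{2},\,2\hat{\epsilon}_{2,n}-\hat{\epsilon}_{2,n}^{2}\right) & \text{if } B_{n+1}=1\\ \leq\left(2\hat{\epsilon}_{1,n}-\hat{\epsilon}_{1,n}^{2},\,\hat{\epsilon}_{2,n}^{2}\right) & \text{if } B_{n+1}=0\end{cases} \tag{23}$$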

We now claim the following key lemma.

Lemma 2.

$$\hat{\epsilon}_{1,n}\hat{\epsilon}_{2,n}<\gamma\qquad\forall n \tag{24}$$

where $\gamma=\gamma(Z(p),Z(D))$ becomes arbitrarily small for $Z(p)$ sufficiently small or $Z(D)$ sufficiently large.

Proof.

For notational convenience, denote $\hat{\epsilon}_{1}\triangleq\hat{\epsilon}_{1,n}$, $\hat{\epsilon}_{2}\triangleq\hat{\epsilon}_{2,n}$, and

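$$R_{n}\triangleq\log\frac{\hat{\epsilon}_{1}}{1-\hat{\epsilon}_{1}}+\log\frac{\hat{\epsilon}_{2}}{1-\hat{\epsilon}_{2}} \tag{25}$$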

Hence,

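$$R_{n+1}\begin{cases}=\log\frac{\hat{\epsilon}_{1}^{2}}{1-\hat{\epsilon}_{1}^{2}}+\log\frac{2\hat{\epsilon}_{2}-\hat{\epsilon}_{2}^{2}}{(1-\hat{\epsilon}_{2})^{2}} & \text{if } B_{n+1}=1\\ \leq\log\frac{2\hat{\epsilon}_{1}-\hat{\epsilon}_{1}^{2}}{(1-\hat{\epsilon}_{1})^{2}}+\log\frac{\hat{\epsilon}_{2}^{2}}{1-\hat{\epsilon}_{2}^{2}} & \text{if } B_{n+1}=0\end{cases} \tag{26}$$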

We will first show that for all $n$ , if $R_{n}\leq A$ , where $A<0$ , then $R_{n+1}\leq A$ . For that, it is sufficient to consider the first case in (26) ( $B_{n+1}=1$ ), since the same proof holds for the other case, $B_{n+1}=0$ . Now, the first case in (26) can be written as

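$$R_{n+1}=R_{n}+\log\frac{\hat{\epsilon}_{1}(2-\hat{\epsilon}_{2})}{(1+\hat{\epsilon}_{1})(1-\hat{\epsilon}_{2})} \tag{27}$$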

Since $R_{n}\leq A<0$, we have

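$$\frac{\hat{\epsilon}_{1}}{1-\hat{\epsilon}_{1}}\cdot\frac{\hat{\epsilon}_{2}}{1-\hat{\epsilon}_{2}}<1 \tag{28}$$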

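Hence $\hat{\epsilon}_{1}\hat{\epsilon}_{2}<(1-\hat{\epsilon}_{1})(1-\hat{\epsilon}_{2})=1-\hat{\epsilon}_{1}-\hat{\epsilon}_{2}+\hat{\epsilon}_{1}\hat{\epsilon}_{2}$, that is, $\hat{\epsilon}_{1}+\hat{\epsilon}_{2}<1$, or equivalently $1-\hat{\epsilon}_{2}>\hat{\epsilon}_{1}$. Therefore,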

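$$\frac{2-\hat{\epsilon}_{2}}{1-\hat{\epsilon}_{2}}=1+\frac{1}{1-\hat{\epsilon}_{2}}<1+\frac{1}{\hat{\epsilon}_{1}}=\frac{\hat{\epsilon}_{1}+1}{\hat{\epsilon}_{1}} \tag{29}$$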

Using this inequality in (27) shows that the argument of the logarithm there is smaller than 1, which yields $R_{n+1}<R_{n}\leq A$ as claimed. We conclude that if $R_{0}=A<0$ then $R_{n}\leq R_{0}$ for all $n$, so that

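$$\log\hat{\epsilon}_{1,n}+\log\hat{\epsilon}_{2,n}<R_{n}\leq R_{0}=\log\frac{Z(p)}{1-Z(p)}+\log\frac{1-Z^{2}(D)}{Z^{2}(D)}\triangleq\log\gamma(Z(p),Z(D))$$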

where the first inequality follows from the fact that $1-\hat{\epsilon}_{i,n}<1$ , for $i=1,2$ . Note that $\gamma(Z(p),Z(D))$ can be made arbitrarily small by choosing $Z(p)$ sufficiently small or $Z(D)$ sufficiently large. ∎
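Lemma 2 can be sanity-checked numerically on the recursion obtained by taking the $B_{n+1}=0$ inequalities for $(\hat{\epsilon}_{1,n},\hat{\epsilon}_{2,n})$ with equality. Because every map involved is increasing, this recursion dominates the true process path by path, so its products should stay below $\gamma$ whenever $\gamma<1$. The values $Z(p)=0.2$ and $Z(D)=0.9$ are hypothetical.

```python
def gamma(z_p, z_d):
    """gamma(Z(p), Z(D)) = Z(p)/(1 - Z(p)) * (1 - Z(D)^2)/Z(D)^2, i.e. 2^{R_0}."""
    return z_p / (1.0 - z_p) * (1.0 - z_d ** 2) / z_d ** 2


def bounding_paths(z_p, z_d, n):
    """All 2^n realizations of the hat-eps recursion, with the B = 0 inequalities taken as equalities."""
    pairs = [(z_p, 1.0 - z_d ** 2)]
    for _ in range(n):
        pairs = [q for e1, e2 in pairs
                 for q in ((2.0 * e1 - e1 * e1, e2 * e2),   # B = 0 (upper bound)
                           (e1 * e1, 2.0 * e2 - e2 * e2))]  # B = 1 (exact)
    return pairs


z_p, z_d = 0.2, 0.9   # hypothetical Bhattacharyya parameters
g = gamma(z_p, z_d)
worst = max(e1 * e2 for e1, e2 in bounding_paths(z_p, z_d, 12))
print(f"gamma = {g:.4f}, largest product over 2^12 paths = {worst:.4f}")
```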

We can now state and prove our main result for the nested codes property.

Theorem 1.

Consider the case where $W(p)$ and $W(D)$ are BMS channels with Bhattacharyya parameters $Z(p)$ and $Z(D)$, where $0<Z(p)<Z(D)<1$, and suppose that either $Z(p)$ is sufficiently small or $Z(D)$ is sufficiently large. Then $|F_{c}\cap F_{s}^{c}|=O(N^{\xi})$, where $\xi>0$ can be set arbitrarily small.

Proof.

By (23) we have

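$$\hat{\epsilon}_{1,n+1}\hat{\epsilon}_{2,n+1}\leq\begin{cases}\hat{\epsilon}_{1,n}^{2}\hat{\epsilon}_{2,n}(2-\hat{\epsilon}_{2,n}) & \text{if } B_{n+1}=1\\ \hat{\epsilon}_{1,n}(2-\hat{\epsilon}_{1,n})\hat{\epsilon}_{2,n}^{2} & \text{if } B_{n+1}=0\end{cases}$$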

Hence,

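$$\rho_{n}\triangleq\frac{\hat{\epsilon}_{1,n+1}\hat{\epsilon}_{2,n+1}}{\hat{\epsilon}_{1,n}\hat{\epsilon}_{2,n}}\leq\begin{cases}\hat{\epsilon}_{1,n}(2-\hat{\epsilon}_{2,n}) & \text{if } B_{n+1}=1\\ \hat{\epsilon}_{2,n}(2-\hat{\epsilon}_{1,n}) & \text{if } B_{n+1}=0\end{cases}$$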

Now, since by Lemma 2 either $\hat{\epsilon}_{1,n}<\sqrt{\gamma}$ or $\hat{\epsilon}_{2,n}<\sqrt{\gamma}$ , we conclude that w.p. $1/2$ , $\rho_{n}\leq 2$ and w.p. $1/2$ , $\rho_{n}\leq 2\sqrt{\gamma}$ . That is,

[TABLE]

where, similarly to $\{B_{i}\}$, the random variables $\{\tilde{B}_{i}\}$ are independent, binary and uniformly distributed (i.e., $\tilde{B}_{i}\in\{0,1\}$ with probabilities $(1/2,1/2)$).
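The two-valued bound on $\rho_{n}$ can be spelled out case by case. It uses $0<\hat{\epsilon}_{i,n}<1$ (so $2-\hat{\epsilon}_{i,n}<2$) and the fact that Lemma 2 forces $\min(\hat{\epsilon}_{1,n},\hat{\epsilon}_{2,n})<\sqrt{\gamma}$:

```latex
\hat{\epsilon}_{1,n}<\sqrt{\gamma}:\quad
\rho_{n} \le \begin{cases}
  \hat{\epsilon}_{1,n}(2-\hat{\epsilon}_{2,n}) < 2\sqrt{\gamma}, & B_{n+1}=1,\\
  \hat{\epsilon}_{2,n}(2-\hat{\epsilon}_{1,n}) < 2, & B_{n+1}=0,
\end{cases}
\qquad
\hat{\epsilon}_{2,n}<\sqrt{\gamma}:\quad
\rho_{n} \le \begin{cases}
  \hat{\epsilon}_{1,n}(2-\hat{\epsilon}_{2,n}) < 2, & B_{n+1}=1,\\
  \hat{\epsilon}_{2,n}(2-\hat{\epsilon}_{1,n}) < 2\sqrt{\gamma}, & B_{n+1}=0.
\end{cases}
```

In both cases exactly one of the two equally likely values of $B_{n+1}$ yields the factor $2\sqrt{\gamma}$ (if both conditions hold, either case applies). Relabeling that value as $\tilde{B}_{n+1}=1$ gives one way to write the bound, $\rho_{n}\leq 2(\sqrt{\gamma})^{\tilde{B}_{n+1}}$; this compact form is our own notation and may differ from the expression used in the original.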

Following [2, Section IV.B], define, for $\eta\in(0,1/2)$ , the event

[TABLE]

Using the same argument as in [2, Section IV.B] we know that if the event ${\cal U}_{n}(\eta)$ holds then

[TABLE]

for $\zeta=2\sqrt{\gamma}$ . It is also known [2, Section IV.B], by Chernoff’s bound, that

[TABLE]

Now, (36) implies that

[TABLE]

Setting $n=\log N$ we obtain

[TABLE]

where $\xi=h_{2}(0.5-\eta)$ . Furthermore, $\xi>0$ can be made arbitrarily small by setting $\eta\rightarrow 0.5^{-}$ . Recall that if either $Z(p)$ is small enough or $Z(D)$ large enough, then $\gamma$ can be made as small as desired. Hence $a$ can be set as large as desired (any $a>2$ is sufficient to prove the theorem). Recalling the connection between the random process $Z_{n}(p)$ ( $Z_{n}(D)$ , respectively) for $n=\log N$ and the values of the sub-channels, $Z_{N}^{(i)}(p)$ ( $Z_{N}^{(i)}(D)$ ) [2], we obtain

[TABLE]

Combining this with (5) concludes the proof (since we can take $a>2$ ). ∎

IV Simulation Results

We now present results for our polar SCL scheme for the binary DP problem. All the results presented here were achieved without the need for retransmission, i.e., $F_{c}\cap F_{s}^{c}=\emptyset$ in all the reported cases.

Figure 2 presents our results for $p=0.11$ and $D\in\{0.21,0.31,0.41\}$. We used $L_{c}=8$ lists and an 8-bit CRC in the decoder, and $L_{s}\in\{1,50\}$ lists in the encoder. For each experiment, the figure shows the maximum polar SCL rate under the constraint of block error rate below $\epsilon_{p}=0.001$ and average distortion below $D$. Increasing the number of lists in the source encoder increases the achievable rate.

The figure also shows the approximated maximum achievable rate of any nested coding scheme for binary DP. It was obtained using the approximated maximum achievable channel coding rate and the approximated minimum achievable lossy compression rate in the finite blocklength regime, given in [9, Theorem 52] and [10, Eqs. (1), (11) and (93)], respectively. Since the code is nested, its approximated binary DP maximum achievable rate is obtained by subtracting the source coding approximation of [10] from the channel coding approximation of [9]. Let $N$ be the blocklength. Denote by $\epsilon_{p}$ the block error rate, and by $\epsilon_{D}$ the distortion constraint violation rate. Then

[TABLE]

where $Q^{-1}()$ is the inverse of the standard Gaussian complementary cumulative distribution function. The approximated bounds in Fig. 2 were obtained by setting $\epsilon_{p}=0.001$ and $\epsilon_{D}=0.5$ , which means that we are taking the standard rate distortion bound for the lossy source coding part (since in this experiment we only set an average distortion constraint, as in [1]). Repeating the same experiment with $\epsilon_{D}=0.01$ yields an even smaller gap between the achievable rates using the SCL polar coding scheme and the bounds.
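The approximation used for Fig. 2 is missing from this copy. The sketch below assumes the standard normal approximation for the BSC channel coding rate, $1-h_{2}(p)-\sqrt{V/N}\,Q^{-1}(\epsilon_{p})+\frac{\log N}{2N}$ with dispersion $V=p(1-p)\log^{2}\frac{1-p}{p}$, and, for $\epsilon_{D}=0.5$ (where $Q^{-1}(0.5)=0$), replaces the source coding term by the rate-distortion function $1-h_{2}(D)$, as described in the text. It is an approximation in the spirit of [9] and [10], not a transcription of the paper's formula, and lower-order terms may differ.

```python
import math
from statistics import NormalDist


def h2(x):
    """Binary entropy function (base 2)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def q_inv(eps):
    """Inverse of the Gaussian complementary CDF Q."""
    return NormalDist().inv_cdf(1.0 - eps)


def bsc_rate_normal_approx(N, p, eps_p):
    """Assumed normal approximation to the maximum channel coding rate over BSC(p)."""
    cap = 1.0 - h2(p)
    disp = p * (1.0 - p) * math.log2((1.0 - p) / p) ** 2
    return cap - math.sqrt(disp / N) * q_inv(eps_p) + math.log2(N) / (2 * N)


def dp_rate_approx(N, p, D, eps_p=1e-3):
    """Channel rate minus source rate; the source part is R(D) = 1 - h2(D), i.e. eps_D = 0.5."""
    return bsc_rate_normal_approx(N, p, eps_p) - (1.0 - h2(D))


for N in (1024, 2048, 4096):
    print(N, round(dp_rate_approx(N, 0.11, 0.31), 4))
```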

We have also compared our polar SCL coding scheme to the superposition coding scheme in [11] for long blocklength codes, which uses LDPC codes for channel coding and convolutional codes for source coding. In [11] the blocklength was $N$=100,000. In our experiments we used both $N=2^{17}=\mbox{131,072}$ and $N=2^{16}=\mbox{65,536}$. As shown in Table I, the two schemes achieve comparable results for long blocklength codes. As in [11], the SCL results were obtained from 100 runs. We note that the method in [11] required considerable computational resources (150–200 belief propagation iterations and 10–15 BCJR iterations with 1024 states in the decoding trellis). Results for shorter blocklengths are not reported in [11].

Acknowledgment

This research was supported by the Israel Science Foundation (grant no. 1868/18).

Bibliography


[1] S. B. Korada and R. L. Urbanke, “Polar codes are optimal for lossy source coding,” IEEE Transactions on Information Theory, vol. 56, no. 4, pp. 1751–1768, 2010.
[2] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, 2009.
[3] D. Burshtein and A. Strugatski, “Polar write once memory codes,” IEEE Transactions on Information Theory, vol. 59, no. 8, pp. 5088–5101, August 2013.
[4] S. H. Hassani and R. Urbanke, “Universal polar codes,” in IEEE International Symposium on Information Theory (ISIT), 2014, pp. 1451–1455.
[5] M. Mondelli, S. H. Hassani, I. Sason, and R. L. Urbanke, “Achieving Marton’s region for broadcast channels using polar codes,” IEEE Transactions on Information Theory, vol. 61, no. 2, pp. 783–800, February 2015.
[6] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Transactions on Information Theory, vol. 61, no. 5, pp. 2213–2226, 2015.
[7] S. H. Hassani, K. Alishahi, and R. Urbanke, “Finite-length scaling for polar codes,” IEEE Transactions on Information Theory, vol. 60, no. 10, pp. 5875–5898, 2014.
[8] D. Goldin and D. Burshtein, “Improved bounds on the finite length scaling of polar codes,” IEEE Transactions on Information Theory, vol. 60, no. 11, pp. 6966–6978, November 2014.
