On Polar Coding for Binary Dirty Paper
Barak Beilin, David Burshtein

TL;DR
This paper proposes an improved nested polar coding scheme for binary dirty paper channels, focusing on low delay and short to moderate blocklengths, with analysis on frozen bits and performance comparison.
Contribution
It introduces a new analysis of frozen bits in nested polar codes for binary DP and demonstrates an improved scheme with practical decoding strategies.
Findings
The frozen channel-code bits that are not frozen in the source code, which must be retransmitted, are typically few or absent, reducing retransmission needs.
The scheme achieves performance close to the best possible rates for binary DP.
Analysis shows how frozen bits scale with blocklength under certain conditions.
| Description | Block size | DP Rate | Bit Error Rate |
|---|---|---|---|
| LDPC scheme | 100K | 0.36 | |
| SCL, , | 130K | 0.353 | |
| SCL, , | 130K | 0.357 | |
| SCL, , | 130K | 0.362 | |
| SCL, , | 65K | 0.356 | |
| Capacity | | 0.42 | |
Barak Beilin and David Burshtein
School of Electrical Engineering
Tel-Aviv University
Tel-Aviv 6997801, Israel
Abstract
The problem of communication over binary dirty paper (DP) using nested polar codes is considered. An improved scheme, focusing on low delay, short to moderate blocklength communication is proposed. Successive cancellation list (SCL) decoding with properly defined CRC is used for channel coding, and SCL encoding without CRC is used for source coding. The performance is compared to the best achievable rate of any coding scheme for binary DP using nested codes. A well known problem with nested polar codes for binary DP is the existence of frozen channel code bits that are not frozen in the source code. These bits need to be retransmitted in a second phase of the scheme, thus reducing transmission rate. We observe that the number of these bits is typically either zero or a small number, and provide an improved analysis, compared to that presented in the literature, on the size of this set and on its scaling with respect to the blocklength when the power constraint parameter is sufficiently large or the channel crossover probability sufficiently small.
I Introduction
Consider the problem of transmission over a side information channel with non-causal side information, also known as the Gelfand-Pinsker (GP) problem. Applications include watermarking codes, memories with defects, write once memories and transmission over broadcast channels. In the GP problem the encoder needs to send a message reliably over some memoryless channel $W(y \mid x, s)$, where $x$ is the input to the channel, $y$ is the output, and $s$ is the channel state. For each transmitted symbol, $x_i$, the state $s_i$ is obtained by i.i.d. sampling of some given source random variable $S$. The encoder observes the channel state vector, $\mathbf{s}$, non-causally, prior to transmission. It then constructs a codeword $\mathbf{x}$ which is a function of the message and the state vector $\mathbf{s}$. The decoder observes only the vector of channel outputs, $\mathbf{y}$, and constructs the decoded codeword from $\mathbf{y}$.
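The binary dirty paper (DP) problem depicted in Figure 1 is a side information problem with $x, y, s \in \{0, 1\}$, $S \sim \mathrm{Ber}(1/2)$ ($\mathrm{Ber}(q)$ denotes a Bernoulli random variable that equals one with probability $q$ and zero with probability $1 - q$), and the channel is defined by
$$Y = X \oplus S \oplus Z \qquad (1)$$
where $\oplus$ denotes XOR, and $Z \sim \mathrm{Ber}(p)$ for $p < 1/2$.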
In this problem there is a “power constraint” which comes in one of two possible forms. The first form is an average power constraint: given some $D$, on average a fraction of at most $D$ of the channel input bits, $x_i$, are ones. That is, $E[w_H(\mathbf{x})] \le N D$, where $\mathbf{x}$ is the codeword, $N$ is the blocklength and $w_H(\cdot)$ denotes Hamming weight. The second form is an individual codeword power constraint, which is stronger than the average power constraint. In this case, each codeword, $\mathbf{x}$, needs to satisfy $w_H(\mathbf{x}) \le N D$. An error event in the communication happens when either the receiver does not decode the correct transmitted message, or (under the individual codeword power constraint) when the encoder cannot obtain a codeword that satisfies the required power constraint. For $1 - 2^{-h(p)} \le D \le 1/2$ the capacity of the binary DP problem is given by
$$C = h(D) - h(p) \qquad (2)$$
where $h(\cdot)$ is the binary entropy function (base 2). Following [1], we will assume in this paper that $D \ge 1 - 2^{-h(p)}$ since for the other case, $D < 1 - 2^{-h(p)}$, the capacity as a function of $D$ can be achieved by time sharing with the point with $D = 0$ and $R = 0$.
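As a quick numerical illustration, the sketch below evaluates the binary entropy function and the difference $h(D) - h(p)$, the standard capacity expression for binary DP when $D$ is not too small. The function names and the sample operating point are illustrative assumptions, not values taken from the paper.

```python
import math

def binary_entropy(q: float) -> float:
    """Binary entropy h(q) in bits, with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def dp_capacity(D: float, p: float) -> float:
    """h(D) - h(p); meaningful only for D in the regime assumed in the text (D <= 1/2, D not too small)."""
    return binary_entropy(D) - binary_entropy(p)

if __name__ == "__main__":
    # Hypothetical operating point, chosen only for illustration.
    print(dp_capacity(0.3, 0.1))
```

At $D = 1/2$ the expression reduces to $1 - h(p)$, the capacity of a BSC($p$) without any state, which is a useful sanity check.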
In [1, Section VIII] Arikan’s polar codes [2] were extended to the problem of binary DP using a nested polar code structure. Given some pair $(D, p)$ such that $p < D$ and blocklength $N = 2^n$, one constructs two polar codes with blocklength $N$. The first is a standard binary polar code, $C_c$, with frozen set $F_c$, designed for reliable communication over the binary symmetric channel, BSC($p$). The second code, $C_s$, with frozen set $F_s$, is a binary polar code for lossy source coding (quantization) [1] of the source $S \sim \mathrm{Ber}(1/2)$ with distortion $D$. The design of $C_s$ is very similar to the design of a polar channel code for a BSC($D$) channel, except for the threshold defining its frozen set. The polar coding scheme for the binary DP problem was formulated in [1] under an average power constraint but it is also valid under an individual codeword power constraint [3]. To describe the method, first assume that $F_c \subseteq F_s$ such that $C_c$ and $C_s$ are nested polar codes. As usual, denote by $\mathbf{u}$ the message and frozen bits of a polar code, and by $\mathbf{x} = \mathbf{u} G_N$ the corresponding polar codeword, where $G_N$ is the generating matrix defined in [2, Eq. (70)]. The encoder first sets $\mathbf{u}_{F_c} = \mathbf{0}$ and $\mathbf{u}_{F_s \setminus F_c} = \mathbf{m}$, where $\mathbf{m}$ denotes the information bits that need to be transmitted. This defines a polar source code that we denote by $C_s(\mathbf{m})$. The encoder then observes $\mathbf{s}$ and obtains a polar codeword $\mathbf{x}' \in C_s(\mathbf{m})$ that satisfies the power constraint (e.g., under the individual codeword power constraint, $w_H(\mathbf{x}' \oplus \mathbf{s}) \le N D$) using a successive cancellation (SC) encoding algorithm which is a randomized version of the standard SC decoding algorithm [2] (as explained in [1], in practice the standard SC decoding algorithm can be used by the encoder without modification, but the proof requires the randomized version of the algorithm). Now the encoder transmits $\mathbf{x} = \mathbf{x}' \oplus \mathbf{s}$. The decoder receives $\mathbf{y} = \mathbf{x} \oplus \mathbf{s} \oplus \mathbf{z} = \mathbf{x}' \oplus \mathbf{z}$. Noting that $\mathbf{x}' \in C_c$, and that $C_c$ was designed to obtain a good channel code for the BSC($p$), the decoder can decode $\mathbf{u}_{F_c^c}$ ($F_c^c$ denotes the complement of $F_c$) from $\mathbf{y}$ using the SC decoding algorithm. The decoded message is obtained as $\hat{\mathbf{m}} = \hat{\mathbf{u}}_{F_s \setminus F_c}$. The rate of this communication scheme is $R = (|F_s| - |F_c|)/N$. Since for $N$ large enough $|F_s|/N \approx h(D)$ and $|F_c|/N \approx h(p)$, we have $R \approx h(D) - h(p)$. That is, the scheme approaches capacity.
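The state-cancellation step of the nested scheme can be sketched as follows. This is a minimal illustration under stated assumptions: the source-encoding step (finding a codeword close to the state) is not implemented, the polar transform omits the bit-reversal permutation that appears in Arikan's generator matrix, and the names `polar_transform` and `transmit` are hypothetical.

```python
def polar_transform(u):
    """Return u * F^{(x)n} over GF(2), with F = [[1, 0], [1, 1]] (bit-reversal omitted)."""
    x = list(u)
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    step = 1
    while step < n:
        for start in range(0, n, 2 * step):
            for j in range(start, start + step):
                x[j] ^= x[j + step]  # butterfly: upper branch carries the XOR
        step *= 2
    return x

def transmit(x_prime, s, z):
    """Channel input x = x' XOR s, channel output y = x XOR s XOR z."""
    x = [a ^ b for a, b in zip(x_prime, s)]
    y = [a ^ b ^ c for a, b, c in zip(x, s, z)]
    return x, y

if __name__ == "__main__":
    u = [0, 1, 1, 0, 1, 0, 0, 1]        # message and frozen bits (illustrative)
    x_prime = polar_transform(u)        # codeword standing in for the quantized state
    s = [1, 0, 1, 1, 0, 0, 1, 0]        # channel state known to the encoder
    z = [0, 0, 0, 1, 0, 0, 0, 0]        # channel noise
    x, y = transmit(x_prime, s, z)
    print(y == [a ^ b for a, b in zip(x_prime, z)])
```

The state drops out of the received vector, so the decoder sees the codeword through a plain BSC. The Hamming weight of the channel input `x` equals the quantization distortion between `x_prime` and `s`, which is why the source code's distortion plays the role of the power constraint.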
When the assumption $F_c \subseteq F_s$ is violated, it was suggested [1] to use a two phase transmission scheme. The first phase is identical to the one described above. In the second phase, we transmit $\mathbf{u}_{F_c \setminus F_s}$, the channel code frozen bits that are not frozen in the source code (or accumulate the bits $\mathbf{u}_{F_c \setminus F_s}$ of many first phase transmissions and send them together). In the second phase the transmitter ignores the power constraint and transmits such that the state noise is canceled. However, if $|F_c \setminus F_s|$ is small then the damage to the power constraint will be negligible (we can compensate by decreasing the design distortion of $C_s$). The decoder starts by decoding $\mathbf{u}_{F_c \setminus F_s}$ from the second phase transmission, and then it can apply the decoding described above for the case $F_c \subseteq F_s$. Another possibility is to transmit in $k$ frames using the chaining construction [4] as used in [5] for achieving Marton’s region for broadcast channels. When using a sufficiently large number of frames, the rate loss of the chaining construction becomes negligible. However, the use of $k$ frames increases latency by a factor of $k$, and hence is not suitable for low delay communications.
In this paper we propose an improved, low delay communication scheme over binary DP using nested polar codes [1], with CRC aided SC list (SCL) decoding [6] for channel coding, and SCL encoding without CRC for source coding. The performance is compared to the best achievable rate of any coding scheme for binary DP using nested codes. We observed that the set $F_c \setminus F_s$ of channel code frozen bits that are not frozen in the source code, which needs to be retransmitted in the second phase of the scheme [1], is typically empty once the blocklength exceeds some small threshold. Our main theoretical contribution is an improved analysis, compared to that presented in [1], of $|F_c \setminus F_s|$ and of its scaling with respect to the blocklength when the power constraint parameter $D$ is sufficiently large or the channel crossover probability $p$ is sufficiently small.
II Polar SCL coding for binary dirty paper
We now discuss the design of an SCL coding scheme based on the SCL decoder [6] and the nested polar coding scheme of [1]. We first observe that lists are useful for improved lossy source coding since the encoder can choose the least distorted codeword from several possibilities. For example, using an SCL encoder with lists to encode a Ber($1/2$) source at rate 0.258, with a polar code with blocklength , the distortion was 0.217, compared to 0.226 when lists are not used. The theoretical minimum distortion for this rate is 0.21. Repeating the experiment with yielded a distortion of 0.215 with lists compared to a distortion of 0.223 without lists. Although the use of CRC verification provides a significant reduction in the error rate for channel coding [6], it is not helpful for lossy source coding. This is because in lossy source coding we are only interested in the distortion between the source vector and the chosen codeword. Hence any of a range of codewords that are sufficiently close to the source vector can be used, rather than a single preferred codeword as in the channel coding problem, making the CRC rule irrelevant. Hence, we use an SCL encoder with lists and without CRC, and an SCL decoder with lists and with CRC.
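Two points in the paragraph above can be checked with a short sketch: the rate-distortion function of a Ber($1/2$) source under Hamming distortion, $R(D) = 1 - h(D)$, which gives a rate of about 0.258 at distortion 0.21, and the final selection step of a list-based source encoder, which keeps the candidate closest to the source vector. The list of candidates is assumed to be given; producing it with an SCL encoder is not shown, and all function names are hypothetical.

```python
import math

def binary_entropy(q: float) -> float:
    """Binary entropy h(q) in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def rate_distortion_fair_bit(D: float) -> float:
    """R(D) = 1 - h(D) for a Ber(1/2) source with Hamming distortion, 0 <= D <= 1/2."""
    return 1.0 - binary_entropy(D)

def hamming_distance(a, b) -> int:
    """Number of positions in which two equal-length bit sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def pick_least_distorted(source, candidates):
    """Return the candidate codeword with the smallest Hamming distance to the source."""
    return min(candidates, key=lambda c: hamming_distance(source, c))

if __name__ == "__main__":
    print(rate_distortion_fair_bit(0.21))
    source = [1, 0, 1, 1, 0, 0, 1, 0]
    candidates = [[1, 1, 1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0, 1, 0]]
    print(pick_least_distorted(source, candidates))
```

No CRC enters this selection: every candidate is a valid codeword, and only its distance to the source matters.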
The other important design consideration relates to the proper definition of the CRC code. In [6], the CRC bits are computed from the information bits and then appended to them. For polar coding over side information channels it would be problematic to compute the CRC this way since this would impose a difficult constraint on the encoder side (how to output a valid codeword for channel coding that satisfies the required distortion bound). To solve this problem, we compute the CRC bits only from the message bits that need to be transmitted to the decoder. Without using CRC there are $|F_s \setminus F_c|$ message bits. When using $r$ CRC bits we reduce this number to $|F_s \setminus F_c| - r$ and compute the CRC bits from these message bits. We then append the CRC bits to the message bits, and place them in $\mathbf{u}_{F_s \setminus F_c}$ (as in [1] we set zeros in $\mathbf{u}_{F_c}$). At the decoder side, in the last decoding stage only those lists for which the CRC is satisfied are considered, as in [6].
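The CRC placement described above can be sketched as follows: the CRC is computed from the message bits alone, appended to them, and the decoder keeps only list candidates whose extracted message-plus-CRC block passes the check. The generator polynomial $x^8 + x^2 + x + 1$ is an assumption made for this illustration (the text does not specify one), and the helper names are hypothetical.

```python
# Coefficients of the assumed CRC-8 polynomial x^8 + x^2 + x + 1, highest degree first.
CRC8_POLY = (1, 0, 0, 0, 0, 0, 1, 1, 1)

def crc_remainder(bits, poly=CRC8_POLY):
    """Remainder of bits(x) * x^r divided by poly(x) over GF(2), where r = deg(poly)."""
    r = len(poly) - 1
    reg = list(bits) + [0] * r
    for i in range(len(bits)):
        if reg[i]:
            for j, c in enumerate(poly):
                reg[i + j] ^= c
    return reg[len(bits):]

def attach_crc(message_bits):
    """Append the CRC of the message bits; the result is what is placed in the message positions."""
    return list(message_bits) + crc_remainder(message_bits)

def crc_ok(bits_with_crc):
    """True if a message-plus-CRC block is consistent."""
    return not any(crc_remainder(bits_with_crc))

def filter_list(candidates):
    """Keep only the decoder list candidates that satisfy the CRC."""
    return [c for c in candidates if crc_ok(c)]

if __name__ == "__main__":
    word = attach_crc([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
    corrupted = word[:]
    corrupted[3] ^= 1
    print(len(filter_list([corrupted, word])))
```

Because the CRC covers only the transmitted message bits, the source encoder remains free to choose the remaining unfrozen bits to match the state vector.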
As in [1, Eqs. (18)-(19)], the frozen sets of the codes and are defined by
[TABLE]
where (, respectively) is the Bhattacharyya parameter of the $i$-th sub-channel after $n$ polarization steps of a BSC() (BSC()) channel. In [1], and . Setting , yields error probability at most and, by [1, Lemmas 5, 6 and 7], average distortion at most . Thus, to meet a required distortion constraint, , we design using a BSC() channel with . A similar statement can also be made regarding the individual codeword power constraint [3, Theorem 2]. In practice we set the thresholds (, respectively) such that the performance of a polar code under SCL decoding (encoding) with the set () yields the required error rate (distortion) performance. (All logarithms in this paper are base 2.)
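III Analysis of $|F_c \setminus F_s|$
As was explained above, for low delay communications with polar codes using our SCL coding variant of the method in [1], $|F_c \setminus F_s|$, the number of channel code frozen bits that are not frozen in the source code, needs to be small (ideally zero).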
Consider the definition of and in (3)-(4) and suppose that and as in [1]. Then
[TABLE]
Define
[TABLE]
The analysis in [1] asserts the following
[TABLE]
where the inequality is due to the degradedness of the sub-channel , corresponding to a BSC(), with respect to , corresponding to a BSC() [1]. The equality is due to the polarization of the BSC() channel. A more refined argument, using scaling results of polar codes [7, 8] shows
[TABLE]
for (this can be verified using the proofs of Theorem 1 and Theorem 2 in [8]).
However, we observed that for small to moderate values of , in the above bound of [1] (now formulated in terms of general threshold values, and ),
[TABLE]
is quite large for actual practical thresholds, and , in the definitions of and . Fortunately, we have observed empirically that even though tends to be relatively large, tends to be much smaller, and it vanishes for sufficiently large or sufficiently small . For example, consider the case where and frozen set thresholds designed for block error rate below for channel crossover , and average distortion below . Then for and we have for . For larger values of , vanishes even starting from smaller values of . On the other hand, is much larger, e.g. for and we have for blocklength , and for .
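To get a feel for how the non-nested set behaves with blocklength, the rough sketch below counts sub-channels that are frozen for the channel code but not for the source code. It is a surrogate, not the computation used in the paper: each BSC is replaced by a BEC whose erasure probability equals the BSC's Bhattacharyya parameter $2\sqrt{q(1-q)}$ (for a BEC the recursion $Z^- = 2Z - Z^2$, $Z^+ = Z^2$ is exact), and the frozen sets are assumed to have the threshold form "channel code frozen if $Z \ge \delta_c$, source code frozen if $Z \ge 1 - \delta_s$". The threshold values and function names are hypothetical.

```python
import math

def bec_polarize(z0: float, n: int):
    """Bhattacharyya parameters of the 2**n polarized sub-channels of a BEC with erasure probability z0."""
    zs = [z0]
    for _ in range(n):
        nxt = []
        for z in zs:
            nxt.append(2 * z - z * z)  # "minus" transform
            nxt.append(z * z)          # "plus" transform
        zs = nxt
    return zs

def count_non_nested(p: float, D: float, n: int, delta_c: float, delta_s: float) -> int:
    """Surrogate count of indices frozen in the channel code but not in the source code."""
    z_c = bec_polarize(2 * math.sqrt(p * (1 - p)), n)  # stands in for the BSC(p) sub-channels
    z_s = bec_polarize(2 * math.sqrt(D * (1 - D)), n)  # stands in for the BSC(D) sub-channels
    return sum(1 for a, b in zip(z_c, z_s) if a >= delta_c and b < 1 - delta_s)

if __name__ == "__main__":
    for n in range(4, 13):
        print(n, count_non_nested(0.1, 0.35, n, 1e-3, 1e-3))
```

Both lists are generated in the same sub-channel order, so comparing them index by index is valid. The numbers produced by this surrogate should not be read as the values reported in the text.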
We now study the behavior of the set , which represents the deviation from perfect code nestedness, and prove that for sufficiently small or sufficiently large, where can be chosen arbitrarily small, thus improving (8) significantly for the case of sufficiently small or sufficiently large . In fact, we prove this result for any pair of binary memoryless symmetric (BMS) channels, and , with Bhattacharyya parameters and , without requiring degradedness of with respect to , which is important for the generalization of the results for side information channels beyond binary DP.
Consider the random processes and , . They both follow the same sequence of Arikan’s channel transformations, defined by [2, Eq. (22)] if and by [2, Eq. (23)] if , where defines the index of some polar sub-channel. Initially and . Denote
[TABLE]
In particular, and . Now, if then
[TABLE]
If then
[TABLE]
where the inequality actually denotes two inequalities, one for each term. These inequalities follow from the following well known relations, e.g. [2], [7, Eq. (13)], for ,
[TABLE]
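The well known relations between the Bhattacharyya parameter $Z$ of a channel and those of its two polarized descendants are $Z(W^+) = Z^2$ and $Z\sqrt{2 - Z^2} \le Z(W^-) \le 2Z - Z^2$; we assume these are the relations referred to above. The sketch below checks them numerically for a BSC by building the two transformed channels explicitly from Arikan's single-step definitions. The helper names and the dictionary representation are illustrative choices.

```python
import math
from itertools import product

def bsc(p: float):
    """Transition probabilities W[(y, x)] of a BSC(p), plus its output alphabet."""
    W = {(y, x): (1 - p) if y == x else p for y in (0, 1) for x in (0, 1)}
    return W, [0, 1]

def bhattacharyya(W, outputs) -> float:
    """Z(W) = sum over y of sqrt(W(y|0) W(y|1))."""
    return sum(math.sqrt(W[(y, 0)] * W[(y, 1)]) for y in outputs)

def minus_transform(W, outputs):
    """W^-(y1, y2 | u1) = 1/2 * sum over u2 of W(y1 | u1 xor u2) W(y2 | u2)."""
    new_outputs = list(product(outputs, repeat=2))
    Wm = {}
    for (y1, y2) in new_outputs:
        for u1 in (0, 1):
            Wm[((y1, y2), u1)] = 0.5 * sum(W[(y1, u1 ^ u2)] * W[(y2, u2)] for u2 in (0, 1))
    return Wm, new_outputs

def plus_transform(W, outputs):
    """W^+(y1, y2, u1 | u2) = 1/2 * W(y1 | u1 xor u2) W(y2 | u2)."""
    new_outputs = [(y1, y2, u1) for y1 in outputs for y2 in outputs for u1 in (0, 1)]
    Wp = {}
    for (y1, y2, u1) in new_outputs:
        for u2 in (0, 1):
            Wp[((y1, y2, u1), u2)] = 0.5 * W[(y1, u1 ^ u2)] * W[(y2, u2)]
    return Wp, new_outputs

if __name__ == "__main__":
    W, out = bsc(0.1)
    z = bhattacharyya(W, out)
    print(z, bhattacharyya(*minus_transform(W, out)), bhattacharyya(*plus_transform(W, out)))
```

Applying the two transforms repeatedly yields the $2^n$ sub-channels exactly, but the output alphabet grows doubly exponentially, which is why analyses such as the one in this section work with bounds on $Z$ instead.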
Lemma 1.
Consider the process defined by , , and, for ,
[TABLE]
For we have possible realizations of the process corresponding to all possible sub-channels. The number of realizations for which both and is an upper bound on .
The proof follows from (12)-(13) and the fact that all the functions that appear in (17), including , are monotonically increasing for , .
We note that the bound provided by Lemma 1 on is monotonically increasing in and monotonically decreasing in . We used Lemma 1 to compute bounds on for some and values, and compare with the actual value of . and were set to obtain a block error probability , and average distortion with and . As an example, for and (, respectively), vanishes for () while the bound requires (). For and (same results for ), vanishes for while the bound requires .
We proceed with the analysis by defining , by
[TABLE]
such that (see (10)). Similarly to (12) we have that if then
[TABLE]
Similarly to (13), if then
[TABLE]
The inequality for the left terms is due to (15) and the inequality for the right terms is due to (16) that can be rewritten as
[TABLE]
Thus we have
[TABLE]
We now claim the following key lemma.
Lemma 2.
[TABLE]
where becomes arbitrarily small for sufficiently small or sufficiently large.
Proof.
For notational convenience, denote by , , and
[TABLE]
Hence,
[TABLE]
We will first show that for all , if , where , then . For that, it is sufficient to consider the first case in (26) (), since the same proof holds for the other case, . Now, the first case in (26) can be written as
[TABLE]
Since , we have
[TABLE]
Hence, . Therefore,
[TABLE]
Using this inequality in (27) yields as claimed. We conclude that if then for all , so that
[TABLE]
where the first inequality follows from the fact that , for . Note that can be made arbitrarily small by choosing sufficiently small or sufficiently large. ∎
We can now state and prove our main result for the nested codes property.
Theorem 1.
Consider the case where and are BMS channels with Bhattacharyya parameters and . Given , suppose either a small enough or a large enough . Then, where can be set arbitrarily small.
Proof.
By (23) we have
[TABLE]
Hence,
[TABLE]
Now, since by Lemma 2 either or , we conclude that w.p. , and w.p. , . That is,
[TABLE]
where, similarly to , the random variables are independent, binary, uniformly distributed (i.e., w.p. ).
Following [2, Section IV.B], define, for , the event
[TABLE]
Using the same argument as in [2, Section IV.B] we know that if the event holds then
[TABLE]
for . It is also known [2, Section IV.B], by Chernoff’s bound, that
[TABLE]
Now, (36) implies that
[TABLE]
Setting we obtain
[TABLE]
where . Furthermore, can be made arbitrarily small by setting . Recall that if either is small enough or large enough, then can be made as small as desired. Hence can be set as large as desired (any is sufficient to prove the theorem). Recalling the connection between the random process (, respectively) for and the values of the sub-channels, () [2], we obtain
[TABLE]
Combining this with (5) concludes the proof (since we can take ). ∎
IV Simulation Results
We now present results for our polar SCL scheme applied to the binary DP problem. All the results presented here were achieved without the need to use retransmission, i.e., $|F_c \setminus F_s| = 0$ in all the reported cases.
Figure 2 presents our results for and . We used lists and CRC of size 8 in the decoder, and in the encoder. For each experiment the figure shows the maximum polar SCL rate under the constraint of error rate below and average distortion below . It can be seen how increasing the number of lists in the source encoder increases the achievable rate.
The figure also shows the approximated maximum achievable rate of any nested coding scheme for binary DP. It was obtained using the approximated maximum achievable channel coding rate, and minimum achievable lossy compression rate in a finite blocklength regime, in [9, Theorem 52] and [10, Eqs. (1), (11) and (93)] respectively. Since the code is nested, its approximated binary DP maximum achievable rate is obtained by subtracting these two approximations from [9] and [10]. Let be the blocklength. Denote by the block error rate, and by the distortion constraint violation rate. Then
[TABLE]
[TABLE]
where is the inverse of the standard Gaussian complementary cumulative distribution function. The approximated bounds in Fig. 2 were obtained by setting and , which means that we are taking the standard rate distortion bound for the lossy source coding part (since in this experiment we only set an average distortion constraint, as in [1]). Repeating the same experiment with yields an even smaller gap between the achievable rates using the SCL polar coding scheme and the bounds.
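For the channel coding part, a widely used finite blocklength approximation for a BSC($p$) is the normal approximation $R \approx C - \sqrt{V/N}\,Q^{-1}(\epsilon) + \frac{\log_2 N}{2N}$, with $C = 1 - h(p)$ and dispersion $V = p(1-p)\log_2^2\frac{1-p}{p}$. The sketch below evaluates it; whether the expression used for the figure has exactly this form is an assumption, and the lossy compression counterpart is not reproduced here. Function names and sample values are illustrative.

```python
import math
from statistics import NormalDist

def binary_entropy(q: float) -> float:
    """Binary entropy h(q) in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def q_inv(eps: float) -> float:
    """Inverse of the standard Gaussian complementary CDF, for 0 < eps < 1."""
    return NormalDist().inv_cdf(1.0 - eps)

def bsc_normal_approx_rate(p: float, N: int, eps: float) -> float:
    """Normal approximation to the maximum channel coding rate over a BSC(p) at blocklength N."""
    capacity = 1.0 - binary_entropy(p)
    dispersion = p * (1.0 - p) * math.log2((1.0 - p) / p) ** 2
    return capacity - math.sqrt(dispersion / N) * q_inv(eps) + math.log2(N) / (2.0 * N)

if __name__ == "__main__":
    # Hypothetical crossover probability and error target, for illustration only.
    for n in (10, 12, 14):
        print(2 ** n, bsc_normal_approx_rate(0.1, 2 ** n, 1e-3))
```

Setting the error parameter to $1/2$ makes $Q^{-1}$ vanish, so the dispersion term drops out. This is the same mechanism by which the text recovers the standard rate distortion bound on the source coding side.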
We have also compared our polar SCL coding scheme to the superposition coding scheme in [11] for long blocklength codes, which uses LDPC codes for channel coding and convolutional codes for source coding. In [11] the blocklength was $N = 100{,}000$. In our experiments we used both $N = 2^{17}$ and $N = 2^{16}$. Table I shows comparable performance for long blocklength codes. The SCL results were tested 100 times as in [11]. We note that the method in [11] required considerable computational resources (150–200 belief propagation iterations and 10–15 BCJR iterations with 1024 states in the decoding trellis). Results for shorter blocklengths are not reported in [11].
Acknowledgment
This research was supported by the Israel Science Foundation (grant no. 1868/18).
References
- [1] S. B. Korada and R. L. Urbanke, “Polar codes are optimal for lossy source coding,” IEEE Transactions on Information Theory, vol. 56, no. 4, pp. 1751–1768, 2010.
- [2] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, 2009.
- [3] D. Burshtein and A. Strugatski, “Polar write once memory codes,” IEEE Transactions on Information Theory, vol. 59, no. 8, pp. 5088–5101, August 2013.
- [4] S. H. Hassani and R. Urbanke, “Universal polar codes,” in IEEE International Symposium on Information Theory (ISIT), 2014, pp. 1451–1455.
- [5] M. Mondelli, S. H. Hassani, I. Sason, and R. L. Urbanke, “Achieving Marton’s region for broadcast channels using polar codes,” IEEE Transactions on Information Theory, vol. 61, no. 2, pp. 783–800, February 2015.
- [6] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Transactions on Information Theory, vol. 61, no. 5, pp. 2213–2226, 2015.
- [7] S. H. Hassani, K. Alishahi, and R. Urbanke, “Finite-length scaling for polar codes,” IEEE Transactions on Information Theory, vol. 60, no. 10, pp. 5875–5898, 2014.
- [8] D. Goldin and D. Burshtein, “Improved bounds on the finite length scaling of polar codes,” IEEE Transactions on Information Theory, vol. 60, no. 11, pp. 6966–6978, November 2014.
