Design of Polar Codes for Parallel Channels with an Average Power Constraint
Thomas Wiegart, Tobias Prinz, Fabian Steiner, Peihong Yuan

TL;DR
This paper proposes a polar code design for parallel BiAWGN channels under an average power constraint. It studies the mapping of codeword bits to channels and shows that a power allocation tailored to polar codes achieves a lower frame error rate than the allocation that maximizes the sum of mutual information terms (mercury/waterfilling).
Contribution
It introduces a polar code design tailored for parallel channels with optimized power allocation and bit mapping, demonstrating gains over traditional methods.
Findings
Power allocation specific to polar codes yields significant performance gains.
The sorted bit-to-channel mapping is adopted, although it does not always minimize the frame error rate.
The approach enhances polar code efficiency under average power constraints.
Abstract
Polar codes are designed for parallel binary-input additive white Gaussian noise (BiAWGN) channels with an average power constraint. The two main design choices are the mapping between codeword bits and channels of different quality, and the power allocation under the average power constraint. Information theory suggests allocating power such that the sum of mutual information (MI) terms is maximized. However, a power allocation specific to polar codes shows significant gains.
Institute for Communications Engineering, Technical University of Munich, Germany
Email: {thomas.wiegart, tobias.prinz, fabian.steiner, peihong.yuan}@tum.de
Index Terms:
Polar Code, Power Allocation, Mercury-Waterfilling, Parallel Channels, Block-Fading Channels.
I Introduction
Polar codes were introduced in [1, 2]. They are the first class of codes that achieve the capacity of binary-input discrete memoryless channels with a deterministic construction [2]. In [3] it was shown that polarization also takes place for non-stationary channels. In this paper, we consider parallel BiAWGN channels with an average power constraint [4, Section 9.4]. Parallel channels naturally arise for orthogonal frequency-division multiplexing (OFDM) transceivers, where a time-frequency resource block has multiple channels of different quality. The model also describes block-fading channels. Polar codes are a natural choice for parallel channels because the different channels can be interpreted as being pre-polarized.
To develop a basic understanding, we consider the special case of two parallel BiAWGN channels. We address the following two questions:
1) How should the codeword bits be mapped to channels of different quality, or equivalently, how should one design an interleaver between the codeword bits and the channel? This has been partially addressed in the literature, e.g., [5, 6]. Both papers propose a sorted mapping that combines two different channels such that each kernel gets one instance of both channels. We also use this mapping, but we show that it does not necessarily minimize the frame error rate (FER).
2) How should power be allocated for good finite-length performance? To the best of our knowledge, this has not yet been considered in the literature. We show that the information-theoretic approach of maximizing the achievable rate, also known as mercury/waterfilling [7], is suboptimal in terms of FER for finite-length polar codes.
This work is structured as follows: in Sec. II we state the system model and preliminaries. In Sec. III we discuss the problem of designing polar codes for parallel channels. We provide numerical examples in Sec. IV and conclude in Sec. V.
II Preliminaries
II-A Notation
We denote random variables by capital letters (e.g., ) and deterministic variables or realizations by small letters (e.g., ). Deterministic vectors are denoted by a bold italic font with small letters (e.g., ), while we use a bold italic font with capital letters (e.g., ) for deterministic matrices and random vectors. We write .
II-B System Model
Consider parallel BiAWGN channels
[TABLE]
where , , , and denote the receive signal, the channel coefficient, the transmit signal, and additive white Gaussian noise with , respectively. For simplicity, we consider only parallel channels $W_1: p_{Y|X}(y|x; h_1)$ and $W_2: p_{Y|X}(y|x; h_2)$. We assume that the channel coefficients are known to the encoder and decoder. The input signals are scaled BPSK symbols, i.e., we have
[TABLE]
The value is the power of the transmit signal . We consider a common power constraint (see [4, Section 9.4])
[TABLE]
We combine uses of each of the two channels to a block of channel uses.
II-C Polar Codes
Polar codes are linear block codes described by three parameters : the block length , dimension , and a set of information bits with . The code rate is . The input has an information bit at position if , and zeros at the remaining positions, i.e., . These bits are called frozen. The codeword is generated from by
[TABLE]
denotes the -th Kronecker power of . The codeword is mapped to BPSK transmit symbols which are transmitted over the channel and received as the vector .
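The encoding step can be illustrated with a short sketch. It assumes the standard Arıkan kernel F = [[1, 0], [1, 1]] in natural (non-bit-reversed) order, which is consistent with the Kronecker-power description above but is not confirmed by the equation itself. The names `polar_encode` and `build_input` are hypothetical helpers introduced only for this illustration.

```python
def polar_encode(u):
    """Multiply the bit list u by the m-th Kronecker power of F over GF(2).

    Assumption: F = [[1, 0], [1, 1]] and no bit-reversal permutation.
    len(u) must be a power of two.
    """
    n = len(u)
    if n == 1:
        return list(u)
    half = n // 2
    upper = polar_encode(u[:half])   # u_a times the smaller transform
    lower = polar_encode(u[half:])   # u_b times the smaller transform
    # Block structure of the Kronecker product: c = [x_a XOR x_b, x_b]
    return [a ^ b for a, b in zip(upper, lower)] + lower


def build_input(info_bits, info_set, n):
    """Place information bits at the (0-based) positions in info_set, zeros elsewhere."""
    u = [0] * n
    for bit, pos in zip(info_bits, sorted(info_set)):
        u[pos] = bit
    return u


if __name__ == "__main__":
    u = build_input([1, 0, 1], {1, 2, 3}, 4)
    print(u, "->", polar_encode(u))
```

Because the transform is its own inverse over GF(2), applying `polar_encode` twice returns the original input, which gives a quick sanity check of an implementation.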
With successive cancellation (SC) decoding, the information bits , are estimated using and the estimates of the previous bits . The frozen bits are decoded to zero, i.e., for . The MI terms $\mathrm{I}(U_i; \bm{Y} | \bm{U}_1^{i-1})$ specify the maximum transmission rate over virtual channels with input , output , and known . These MI terms polarize to being either close to one or close to zero for large [2]. Thus, polar codes are often seen as a transformation of channel uses into virtual channels with MI either close to one or close to zero. The fraction of virtual channels with MI close to one approaches the capacity of the original channel for large , and thus polar codes are capacity-achieving.
The positions in with smallest MI values are frozen. Polar code design consists of finding these positions. We use density evolution [8, 9] with a Gaussian approximation [10] to estimate the bit reliabilities.
The MI terms can be approximated recursively using the transform depicted in Fig. 1. The values are given by:
[TABLE]
where the -function [10] (and its inverse) is approximated numerically [11]. The FER with SC decoding is
[TABLE]
where $\Pr\{\hat{U}_i \neq U_i | \hat{\bm{U}}_1^{i-1} = \bm{U}_1^{i-1}\}$ denotes the probability that the first bit error of a block occurs at bit (i.e., the probability that the SC decoder makes the wrong decision for bit given that all previous decisions were correct). We can approximate (7) using
[TABLE]
i.e., we assume that a genie-aided decoder was used instead of the real SC decoder. Using the MI terms from density evolution, (8) can be calculated as
[TABLE]
where denotes the tail distribution function of the normal distribution. The functions in (9) can be approximated numerically.
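The reliability estimation and the FER approximation described in this subsection can be sketched as follows. The sketch rests on several assumptions that are not taken from the paper: a common closed-form fit of the J-function with constants `H1`, `H2`, `H3` (the approximation in [11] may differ), the usual check-node and variable-node combination rules for one kernel, and a bit error probability of Q(J^{-1}(I)/2), which follows from modelling the LLR as Gaussian with mean equal to half its variance. All function names are hypothetical.

```python
import math

# Assumed closed-form fit of the J-function; not necessarily the one used in [11].
H1, H2, H3 = 0.3073, 0.8935, 1.1064
EPS = 1e-12  # keeps J_inv away from its singularities at 0 and 1


def J(sigma):
    """Approximate MI of a BiAWGN channel whose LLR is N(sigma^2 / 2, sigma^2)."""
    if sigma <= 0.0:
        return 0.0
    return (1.0 - 2.0 ** (-H1 * sigma ** (2.0 * H2))) ** H3


def J_inv(mi):
    """Inverse of the fit above, with the argument clipped to (0, 1)."""
    mi = min(max(mi, EPS), 1.0 - EPS)
    return (-math.log2(1.0 - mi ** (1.0 / H3)) / H1) ** (1.0 / (2.0 * H2))


def minus(i1, i2):
    """MI of the degraded virtual channel of one kernel (check-node rule)."""
    return 1.0 - J(math.hypot(J_inv(1.0 - i1), J_inv(1.0 - i2)))


def plus(i1, i2):
    """MI of the upgraded virtual channel of one kernel (variable-node rule)."""
    return J(math.hypot(J_inv(i1), J_inv(i2)))


def q_func(x):
    """Tail distribution function of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bit_error_prob(mi):
    """Genie-aided bit error probability under the Gaussian approximation."""
    return q_func(J_inv(mi) / 2.0)


def fer_sc(bit_mi, info_set):
    """FER estimate 1 - prod(1 - p_i) over the information positions."""
    log_ok = sum(math.log1p(-bit_error_prob(bit_mi[i])) for i in info_set)
    return -math.expm1(log_ok)  # numerically stable for very small FER


if __name__ == "__main__":
    i_minus, i_plus = minus(0.5, 0.5), plus(0.5, 0.5)
    print(i_minus, i_plus, fer_sc([i_minus, i_plus], [1]))
```

Working with `log1p` and `expm1` avoids the loss of precision that a direct product of terms close to one would incur when the FER is very small.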
II-D Mercury/Waterfilling
Information theory suggests allocating power such that the achievable rate is maximized, i.e.,
[TABLE]
This optimization problem was solved in [7] for discrete channel input symbols in a (semi-)closed form, and is known as mercury/waterfilling. The naming is in analogy to the waterfilling solution for Gaussian inputs [4, Section 9.4].
Fig. 2 shows the mercury/waterfilling solution for two parallel BiAWGN channels with channel coefficients and . In the low-power regime, the power is allocated only to the better channel. When this channel’s MI starts to saturate, power is also assigned to the worse channel. For comparison, the waterfilling solution for Gaussian channel inputs is depicted by dashed curves.
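The qualitative behaviour described above can be reproduced numerically. The sketch below does not implement the (semi-)closed-form solution of [7]; it replaces it with a plain grid search over the power split. It assumes real-valued noise of unit variance, an average power constraint of the form (P1 + P2)/2 = P, and a closed-form fit of the BiAWGN mutual information; the function names and the example coefficients are hypothetical.

```python
import math

# Assumed closed-form fit of the BiAWGN mutual information (J-function).
H1, H2, H3 = 0.3073, 0.8935, 1.1064


def J(sigma):
    """Approximate MI of a BiAWGN channel whose LLR is N(sigma^2 / 2, sigma^2)."""
    if sigma <= 0.0:
        return 0.0
    return (1.0 - 2.0 ** (-H1 * sigma ** (2.0 * H2))) ** H3


def biawgn_mi(h, p, noise_var=1.0):
    """MI for inputs +/- sqrt(p) and channel coefficient h (LLR std = 2|h|sqrt(p/noise_var))."""
    return J(2.0 * abs(h) * math.sqrt(p / noise_var))


def max_mi_allocation(h1, h2, p_avg, steps=2001):
    """Grid search for the split (p1, p2) with p1 + p2 = 2 * p_avg that maximizes the MI sum."""
    total = 2.0 * p_avg
    best_mi, best_p1 = -1.0, 0.0
    for k in range(steps):
        p1 = total * k / (steps - 1)
        mi_sum = biawgn_mi(h1, p1) + biawgn_mi(h2, total - p1)
        if mi_sum > best_mi:
            best_mi, best_p1 = mi_sum, p1
    return best_p1, total - best_p1, best_mi


if __name__ == "__main__":
    for p_avg in (0.05, 1.0, 10.0):  # linear scale, relative to unit noise variance
        print(p_avg, max_mi_allocation(1.0, 0.5, p_avg))
```

At low average power the search places most of the power on the better channel, and at high average power the weaker channel receives a share as well, in line with the description of Fig. 2.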
II-E Normal Approximation
To take finite-length effects into account, we resort to the normal approximation (NA) (e.g., [12, Sec. II-F]), which approximates the maximum achievable rate for a finite block length and reads
[TABLE]
where is the capacity of the respective channel and is the dispersion. The dispersion is defined as $\operatorname{Var}[i(X; Y)]$, where $i(X; Y)$ is the information density. For the considered example of two parallel BiAWGN channels we have
[TABLE]
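A numerical evaluation of the normal approximation for two parallel BiAWGN channels might look as follows. The sketch assumes the common form R ≈ C - sqrt(V/n) Q^{-1}(ε), with an optional log2(n)/(2n) correction term, real-valued noise of unit variance, and capacity and dispersion averaged over the two channels because each is used for half of the block. These choices are assumptions and may differ from the exact expressions in (11) and (12); the function names are hypothetical.

```python
import math
from statistics import NormalDist


def biawgn_capacity_dispersion(h, p, noise_var=1.0, points=4001):
    """Mean and variance (in bits) of the information density of one BiAWGN channel.

    Given x = +sqrt(p), the LLR is Gaussian with mean mu and variance 2 * mu.
    """
    mu = 2.0 * h * h * p / noise_var
    if mu <= 0.0:
        return 0.0, 0.0
    std = math.sqrt(2.0 * mu)
    lo = mu - 10.0 * std
    step = 20.0 * std / (points - 1)
    m1 = m2 = 0.0
    for k in range(points):
        llr = lo + k * step
        pdf = math.exp(-0.5 * ((llr - mu) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))
        # Information density 1 - log2(1 + exp(-llr)), evaluated without overflow
        softplus = max(-llr, 0.0) + math.log1p(math.exp(-abs(llr)))
        dens = 1.0 - softplus / math.log(2.0)
        m1 += dens * pdf * step
        m2 += dens * dens * pdf * step
    return m1, m2 - m1 * m1


def na_rate(capacity, dispersion, n, eps, log_term=False):
    """Normal approximation of the achievable rate at block length n and FER eps."""
    q_inv = NormalDist().inv_cdf(1.0 - eps)
    rate = capacity - math.sqrt(dispersion / n) * q_inv
    if log_term:
        rate += math.log2(n) / (2.0 * n)
    return rate


def na_rate_parallel(h1, h2, p1, p2, n, eps):
    """Two parallel channels, each used n/2 times (assumed averaging of C and V)."""
    c1, v1 = biawgn_capacity_dispersion(h1, p1)
    c2, v2 = biawgn_capacity_dispersion(h2, p2)
    return na_rate(0.5 * (c1 + c2), 0.5 * (v1 + v2), n, eps)


if __name__ == "__main__":
    print(na_rate_parallel(1.0, 0.5, 1.0, 3.0, 1024, 1e-3))
```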
III Polar Code Design for Parallel Channels
III-A Problem Statement
We design polar codes for two parallel BiAWGN channels. Each channel is used times and a polar code of block length (which we assume to be a power of ) is applied jointly over all channel uses. The objective is to minimize the FER of a polar code under SC decoding.
We optimize the mapping of codeword bits to the different channels, the set of frozen bits, and the power allocation for and given the average power constraint . The FER under SC decoding can be estimated using (7) and (9), so that no Monte Carlo simulations are necessary.
III-B Channel Mappings
The mapping of codeword bits to channels has been discussed in [5] and [6]. In [5], the authors propose to combine two different channels so that each kernel of the polar code gets one instance of the channel $W_1$ and one instance of the channel $W_2$ (see Fig. 4 for the kernel and Fig. 3a for an example of a polar code of length ). We denote this mapping as a sorted mapping. The other extreme is a mapping we call an alternating mapping. (Footnote 1: Our nomenclature refers to a non-bit-reversal representation of the polar code. In a bit-reversal representation, these two mappings change their roles.) This mapping combines identical channels as long as possible, i.e., during the first polarization levels (from the channel perspective) for two different channels. An example of this mapping for a polar code of length is depicted in Fig. 3b.
The authors of [6] give reasons for using the sorted mapping. They minimize a bound on the FER (similar to (7)) with respect to the mapping :
[TABLE]
where denotes the Bhattacharyya-parameter of the -th virtual channel after levels of polarization. As solving (13) is not feasible, they resort to solving
[TABLE]
i.e., they minimize the sum of even-indexed Bhattacharyya parameters after the first polarization level. The authors of [6] argue, based on numerical simulations, that this heuristic leads to good results. The solution to this relaxed optimization problem is the sorted mapping. However, we found that in some scenarios (especially for very short blocks, e.g., for ) the alternating mapping achieves a lower FER than the sorted mapping. Thus, the sorted mapping is not globally optimal. Nevertheless, we use the sorted mapping for the following reasons:
- After the first level of polarization (from the channel perspective), one obtains two different virtual channels and , see Fig. 3a. Thus, after the first level, the code behaves like a “regular” polar code that also creates two different virtual channels after the first level. This is in contrast to the alternating mapping, where after the first level of polarization there are four different virtual channels, see Fig. 3b. This insight gives an intuition on how to extend the system to more than two parallel channels, namely by aiming for a “regular” polar code after as few levels as possible.
- Compared to a polar code over identical channels with MI $\frac{1}{2}\left(\mathrm{I}(X_1; Y_1) + \mathrm{I}(X_2; Y_2)\right)$, the code over two parallel channels always leads to stronger polarization in the sense that after the first level of polarization, the virtual channel has worse quality than the channel that would arise from identical channels, and the virtual channel has better quality than the channel that would arise from identical channels. This is shown in Fig. 5, where the two mappings are compared in terms of achievable code rate at a fixed FER for different channels of constant average MI. When the MI of one channel increases (and thus the MI of the other channel decreases by the same amount), the achievable rate with the sorted mapping increases (for sufficiently large ), whereas the achievable rate with the alternating mapping decreases at first.
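The first observation can be checked with a small density-evolution sketch. It assumes a natural-order (non-bit-reversed) transform in which the polarization level closest to the channel combines codeword positions i and i + N/2, so that the sorted mapping assigns the first half of the codeword to one channel and the second half to the other, while the alternating mapping interleaves them. This reading of the two mappings, the J-function fit, and all function names are assumptions made for illustration.

```python
import math

H1, H2, H3 = 0.3073, 0.8935, 1.1064  # assumed J-function fit
EPS = 1e-12


def J(sigma):
    if sigma <= 0.0:
        return 0.0
    return (1.0 - 2.0 ** (-H1 * sigma ** (2.0 * H2))) ** H3


def J_inv(mi):
    mi = min(max(mi, EPS), 1.0 - EPS)
    return (-math.log2(1.0 - mi ** (1.0 / H3)) / H1) ** (1.0 / (2.0 * H2))


def minus(i1, i2):
    return 1.0 - J(math.hypot(J_inv(1.0 - i1), J_inv(1.0 - i2)))


def plus(i1, i2):
    return J(math.hypot(J_inv(i1), J_inv(i2)))


def first_level(channel_mi):
    """One polarization level from the channel side: pair position i with i + N/2."""
    half = len(channel_mi) // 2
    pairs = list(zip(channel_mi[:half], channel_mi[half:]))
    return [minus(a, b) for a, b in pairs], [plus(a, b) for a, b in pairs]


def bit_mi(channel_mi):
    """MI of all virtual bit channels in SC decoding order."""
    if len(channel_mi) == 1:
        return list(channel_mi)
    upper, lower = first_level(channel_mi)
    return bit_mi(upper) + bit_mi(lower)


def sorted_mapping(i1, i2, n):
    return [i1] * (n // 2) + [i2] * (n // 2)


def alternating_mapping(i1, i2, n):
    return [i1, i2] * (n // 2)


def distinct_after_first_level(channel_mi):
    upper, lower = first_level(channel_mi)
    return len({round(v, 12) for v in upper + lower})


if __name__ == "__main__":
    print(distinct_after_first_level(sorted_mapping(0.9, 0.3, 8)))       # sorted
    print(distinct_after_first_level(alternating_mapping(0.9, 0.3, 8)))  # alternating
```

Under these assumptions the sorted mapping yields two distinct virtual channels after the first level and the alternating mapping yields four, matching the description of Fig. 3a and Fig. 3b.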
III-C Frozen Bit Selection
Suppose the power allocation is fixed, i.e., and are known. We use density evolution with Gaussian approximation to select the frozen bits as described in Sec. II-C. We propagate the MI of the channels through the graphs depicted in Fig. 3.
III-D Power Allocation
Next we consider the allocation of powers and . From an information theoretic perspective, the powers should be allocated such that the achievable rate (i.e., MI) is maximized. This is described in Sec. II-D and the solution is called mercury/waterfilling.
However, it turns out that mercury/waterfilling is not optimal for finite-blocklength polar codes over parallel channels. In particular, we are interested in the power allocation that minimizes the FER of a polar code with fixed parameters (length, dimension, and average power constraint):
[TABLE]
where denotes the FER (calculated using (7) and (9)) of the polar code with frozen bit indices optimized for the power allocations and . We assume that the power constraint is fulfilled with equality. Thus, the optimization problem can be re-written as a one dimensional optimization problem in , i.e., we have
[TABLE]
The optimization problem can be solved using a simple grid search. Fig. 6 shows an example of the objective for two parallel channels with channel coefficients and . The FER is plotted versus the power allocation (normalized by ). Different curves correspond to different power constraints. (Footnote 2: The notation of average power in dB refers to a power gain with respect to the noise random variable with variance , i.e., we calculate .)
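A minimal sketch of such a grid search is given below. It assumes the sorted mapping in a natural-order transform, real-valued noise of unit variance, a constraint met with equality as P1 + P2 = 2P, a closed-form J-function fit, and a genie-aided FER estimate with bit error probability Q(J^{-1}(I)/2). None of these details is confirmed by the equations above, and every name in the code is hypothetical.

```python
import math

H1, H2, H3 = 0.3073, 0.8935, 1.1064  # assumed J-function fit
EPS = 1e-12


def J(sigma):
    if sigma <= 0.0:
        return 0.0
    return (1.0 - 2.0 ** (-H1 * sigma ** (2.0 * H2))) ** H3


def J_inv(mi):
    mi = min(max(mi, EPS), 1.0 - EPS)
    return (-math.log2(1.0 - mi ** (1.0 / H3)) / H1) ** (1.0 / (2.0 * H2))


def bit_mi(channel_mi):
    """Density evolution with Gaussian approximation; level nearest the channel pairs i and i + N/2."""
    n = len(channel_mi)
    if n == 1:
        return list(channel_mi)
    pairs = list(zip(channel_mi[: n // 2], channel_mi[n // 2:]))
    upper = [1.0 - J(math.hypot(J_inv(1.0 - a), J_inv(1.0 - b))) for a, b in pairs]
    lower = [J(math.hypot(J_inv(a), J_inv(b))) for a, b in pairs]
    return bit_mi(upper) + bit_mi(lower)


def q_func(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def design_and_fer(h1, h2, p1, p2, n, k):
    """Select the k most reliable positions for (p1, p2) and estimate the SC FER."""
    half = n // 2
    # Sorted mapping (assumed): first half of the codeword on channel 1, second half on channel 2
    channel_mi = [J(2.0 * abs(h1) * math.sqrt(p1))] * half + [J(2.0 * abs(h2) * math.sqrt(p2))] * half
    mi = bit_mi(channel_mi)
    info_set = sorted(range(n), key=lambda i: mi[i], reverse=True)[:k]
    log_ok = sum(math.log1p(-q_func(J_inv(mi[i]) / 2.0)) for i in info_set)
    return -math.expm1(log_ok), sorted(info_set)


def polar_power_allocation(h1, h2, p_avg, n, k, steps=41):
    """Grid search over p1 in [0, 2 * p_avg] with p2 = 2 * p_avg - p1."""
    total = 2.0 * p_avg
    best_fer, best_p1 = None, 0.0
    for s in range(steps):
        p1 = total * s / (steps - 1)
        fer, _ = design_and_fer(h1, h2, p1, total - p1, n, k)
        if best_fer is None or fer < best_fer:
            best_fer, best_p1 = fer, p1
    return best_p1, total - best_p1, best_fer


if __name__ == "__main__":
    print(polar_power_allocation(1.0, 0.5, 2.0, 64, 32))
```

The frozen set is re-selected for every candidate split, so the search compares codes that are each matched to their own power allocation rather than a single fixed code.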
The power allocations given by mercury/waterfilling are depicted by asterisks. The dashed vertical line corresponds to the power allocation given by mercury/waterfilling in the Shannon limit, i.e., the point where $\frac{1}{2}\left(\mathrm{I}(X_1; Y_1) + \mathrm{I}(X_2; Y_2)\right) = R$ (in the depicted scenario, the Shannon limit is at ). As one can see, the FER-optimal power allocation is far from the power allocation given by mercury/waterfilling. The difference is several orders of magnitude in FER, or more than . The polar-optimal power allocation pushes the good channel further into saturation, i.e., we obtain channels with a stronger pre-polarization. These effects also occur at very long block lengths. Combining polar codes with CRC-aided successive cancellation list (SCL) decoding [13] leads to similar effects. However, as the FER for SCL has to be obtained using Monte Carlo simulations, the optimization is much more complex, and we thus focus on optimizing the power allocation for SC decoding.
These results raise the question of whether the effects are specific to polar codes or whether they originate from a finite number of channel uses. To answer this question, we first compare with an LDPC code from the 5G enhanced mobile broadband (eMBB) standard [14]. The code is derived from base graph one of the respective standard and has a blocklength of 16,200 and rate . As shown in Fig. 6 by dashed lines, the optimal power allocation closely follows the assignment given by mercury/waterfilling.
Secondly, we follow the approach of [15] and use a finite length bound for power allocation. Fig. 7 shows the achievable rate according to the normal approximation [12] for the scenario from Fig. 6.
The polar-optimal power allocation (red circle) reduces the achievable rate according to the normal approximation as compared to the mercury/waterfilling solution (black asterisk). Furthermore, the mercury/waterfilling solution is close to the maximum.
From these observations, we conjecture that the effects are inherently linked to polar codes. The behaviour may be partly explained by the following: if bits are frozen whose MI is not zero, then their MI is “lost” with SC decoding, as these bits cannot be used for information transmission. On the other hand, bits with an MI not close to one need to be frozen to reach a feasible FER. Fig. 8 depicts this rate loss for the scenario from Fig. 6 with 10.37 dB. The rate loss with the polar-optimal power allocation (red circle) is less than half of the rate loss with mercury/waterfilling (black asterisk). Thus, the polar-optimal power allocation is a tradeoff between rate loss (in terms of achievable rate) due to sub-optimal power allocation and rate loss due to imperfect polarization. Instead of minimizing the frame error rate, one could also maximize the achievable rate of the unfrozen bits, i.e., the rate
[TABLE]
This leads to almost the same results as optimizing the FER (15), and brings the power allocation for polar codes back into an information theoretic framework.
IV Numerical Results
We investigate an extreme case of two parallel channels with , and binary phase-shift keying (BPSK). The simulation results are depicted in Fig. 9. A polar code of block length 16,384 is used. The figure shows the FER versus the average power. With SC decoding, the polar code with optimized power allocation outperforms the polar code with mercury/waterfilling by at an FER of . For SCL decoding [13] with list size , the qualitative behaviour stays the same, but the gap between the two power allocations shrinks to approximately . The SC-decoded polar code with optimized power allocation outperforms the SCL-decoded polar code with mercury/waterfilling. When combining SCL decoding with an outer CRC with , the polar code with power allocation optimized for SC decoding still outperforms the polar code with mercury/waterfilling by . It outperforms the 5G LDPC code by about and operates approximately away from the normal approximation [12].
V Conclusion
We proposed a novel approach to allocate power for polar codes over parallel channels with an average power constraint. We showed significant gains in terms of FER as compared to power allocation by mercury/waterfilling. We elaborated on the design of polar codes for parallel channels and the mapping between codeword bits and channels of different quality. Future work involves a study of more than two parallel channels, including the design of the mapping between codeword bits and channels. A further research topic is the power allocation for polar codes with higher order modulation.
Acknowledgement
The authors would like to thank Dr. Gianluigi Liva for helpful and enlightening discussions regarding the error probability approximations in (8) and (9).
References
- [1] N. Stolte, “Rekursive codes mit der Plotkin-konstruktion und ihre decodierung,” Ph.D. dissertation, Technische Universität Darmstadt, Jan. 2002. [Online]. Available: http://tuprints.ulb.tu-darmstadt.de/183/
- [2] E. Arıkan, “Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.
- [3] M. Alsan and E. Telatar, “A simple proof of polarization and polarization for non-stationary memoryless channels,” IEEE Trans. Inf. Theory, vol. 62, no. 9, pp. 4873–4878, Sept. 2016.
- [4] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. John Wiley & Sons, Inc., 2006.
- [5] H. Mahdavifar, M. El-Khamy, J. Lee, and I. Kang, “Compound polar codes,” in Inf. Theory and Appl. Workshop, Feb. 2013, pp. 1–6.
- [6] S. Liu, Y. Hong, and E. Viterbo, “Polar codes for block fading channels,” in IEEE Wireless Commun. and Netw. Conf. Workshops, March 2017, pp. 1–6.
- [7] A. Lozano, A. M. Tulino, and S. Verdú, “Optimum power allocation for parallel Gaussian channels with arbitrary input distributions,” IEEE Trans. Inf. Theory, vol. 52, no. 7, pp. 3033–3051, July 2006.
- [8] R. Mori and T. Tanaka, “Performance and construction of polar codes on symmetric binary-input memoryless channels,” in IEEE Int. Symp. Inf. Theory (ISIT), June 2009, pp. 1496–1500.
