Polar Codes with Memory

Wenyue Zhou; Qiang Liu; Yifei Shen; Xiaofeng Zhou; Chuan Zhang; Yaohua Xu; Liping Li

arXiv:1907.00527·cs.IT·July 2, 2019


TL;DR

This paper introduces polar codes with memory (PCM), which enhance error correction by sharing mutual information bits between consecutive blocks, significantly reducing packet error rate and latency across various decoding schemes.

Contribution

The paper proposes PCM, a novel coding scheme that improves error correction and latency performance, compatible with multiple decoding algorithms, and demonstrates hardware implementations.

Findings

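01 The PER of PCM decreases to the order of the square of the underlying PER

02 PCM-SCL with $L$ lists matches the PER of stand-alone SCL decoding with $2L$ lists

03 For $N=256$, the worst-case latency of the PCM LLI decoder is 16.1% of that of the adaptive SCL decoder with $L=2$, and its throughput is 13 times higher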


Tables

Table I. Synthesis results comparison of different polar decoders for $N=256$.

Decoder	PCM-SC-2 (IS)	PCM-SC-2 (LLI)	Combinational SC [12]	Adaptive SCL [8] ($L=2$)	Adaptive SCL [8] ($L=4$)
LUTs	4302	4781	35152	12589	16565
FFs	4629	15381	1561	7809	10217
Total	8931	20126	36944	20398	26782
RAM [bit]	0	0	1792	0	0
Block RAMs	0	0	0	14	28
Max. frequency [MHz]	105.3	104.7	-	135.5	121.5
Min.-max. latency [μs]	2.44-2.95	1.23-1.82	-	1.12-11.32	1.25-12.67
Min.-max. throughput (T/P) [Mbps]	83-100	166-201	-	12-197	11-177

Equations

$$L_{N}^{(i)}(y_{1}^{N},\hat{u}_{1}^{i-1})\triangleq\frac{W_{N}^{(i)}(y_{1}^{N},\hat{u}_{1}^{i-1}\mid 0)}{W_{N}^{(i)}(y_{1}^{N},\hat{u}_{1}^{i-1}\mid 1)},$$

$$\hat{u}_{i}=\begin{cases}0,&\text{if }L_{N}^{(i)}(y_{1}^{N},\hat{u}_{1}^{i-1})\geq 1\\ 1,&\text{otherwise}.\end{cases}$$

$$P_{B}(\mathcal{A})\leq\sum_{i\in\mathcal{A}}P_{e}(W_{N}^{(i)}).$$

$$P_{B}(\mathcal{A}^{\prime})\leq\sum_{i\in\mathcal{A}^{\prime}}P_{e}(W_{N}^{(i)}).$$

$$P_{B}(\mathcal{A}^{\prime\prime})\leq\sum_{i\in\mathcal{A}^{\prime\prime}}P_{e}(W_{N}^{(i)}).$$

$$\sum_{i\in\mathcal{B}}P_{e}(W_{N}^{(i)})\geq\sum_{i\in\mathcal{B}^{\prime}}P_{e}(W_{N}^{(i)}).$$

$$\sum_{i\in\mathcal{A}^{\prime}}P_{e}(W_{N}^{(i)})\leq\sum_{i\in\mathcal{A}^{\prime\prime}}P_{e}(W_{N}^{(i)}).$$

$$P_{\mathrm{new}}=P_{B}^{2}+P_{B}(1-P_{B})P^{\prime}_{B}.$$

$$P_{\mathrm{new}}=P_{B}^{2}+P_{B}(1-P_{B})\alpha P_{B}=(1+\alpha)P_{B}^{2}-\alpha P_{B}^{3}.$$

$$P_{a}=P_{B}(1-P_{B})=P_{B}-P_{B}^{2}<P_{B}.$$

$$P_{1}=\frac{1}{m}C_{m}^{1}P_{B}(1-P_{B})^{m-1}P^{\prime}_{B}.$$

$$P_{2}=\frac{2}{m}C_{m}^{2}P_{B}^{2}(1-P_{B})^{m-2}\left(\frac{1}{2}C_{2}^{1}P^{\prime}_{B}(1-P^{\prime}_{B})+{P^{\prime}_{B}}^{2}\right).$$

$$P_{k}=\frac{k}{m}C_{m}^{k}P_{B}^{k}(1-P_{B})^{m-k}\left(\frac{1}{k}C_{k}^{1}P^{\prime}_{B}(1-P^{\prime}_{B})^{k-1}+\frac{2}{k}C_{k}^{2}{P^{\prime}_{B}}^{2}(1-P^{\prime}_{B})^{k-2}+\cdots+{P^{\prime}_{B}}^{k}\right).$$

$$P_{\mathrm{new}}=\sum_{k=1}^{m}P_{k}=P_{B}(1-P_{B})^{m-1}P^{\prime}_{B}+\frac{2}{m}C_{m}^{2}P_{B}^{2}(1-P_{B})^{m-2}P^{\prime}_{B}+\cdots+P_{B}^{m}.$$

$$P_{\mathrm{new}}=\alpha P_{B}^{2}(1-P_{B})^{m-1}+\frac{2}{m}C_{m}^{2}\alpha P_{B}^{3}(1-P_{B})^{m-2}+\cdots+P_{B}^{m}.$$

$$R_{m}=\frac{m(K-K_{\mathrm{crc}})-(m-1)K_{\mathrm{p}}}{mN}=R-\frac{K_{\mathrm{crc}}}{N}-\frac{m-1}{m}\frac{K_{\mathrm{p}}}{N},$$

$$u_{\mathcal{B}}^{m}=u_{\mathcal{B}}^{1}\oplus u_{\mathcal{B}}^{2}\oplus\cdots\oplus u_{\mathcal{B}}^{m-1},$$

$$R_{m}=\frac{m(K-K_{\mathrm{crc}})-K_{\mathrm{p}}}{mN}=R-\frac{K_{\mathrm{crc}}}{N}-\frac{K_{\mathrm{p}}}{mN}.$$

$$P_{\mathrm{new}}=P_{B}(1-P_{B})^{m-1}P^{\prime}_{B}+\frac{2}{m}C_{m}^{2}P_{B}^{2}(1-P_{B})^{m-2}+\cdots+\frac{k}{m}C_{m}^{k}P_{B}^{k}(1-P_{B})^{m-k}+\cdots+P_{B}^{m},$$

$$P_{\mathrm{new}}=\alpha P_{B}^{2}(1-P_{B})^{m-1}+\frac{2}{m}C_{m}^{2}P_{B}^{2}(1-P_{B})^{m-2}+\cdots+\frac{k}{m}C_{m}^{k}P_{B}^{k}(1-P_{B})^{m-k}+\cdots+P_{B}^{m}.$$

$$P_{\mathrm{new}}=(2+\alpha)P_{B}^{2}-(1+2\alpha)P_{B}^{3}+\alpha P_{B}^{4}.$$



Full text

Polar Codes with Memory

Wenyue Zhou, Qiang Liu, Yifei Shen, Xiaofeng Zhou, Chuan Zhang, Yaohua Xu, and Liping Li

This work was supported in part by the National Natural Science Foundation of China through grant 61501002, in part by the Natural Science Project of Ministry of Education of Anhui through grant KJ2015A102, and in part by the Talents Recruitment Program of Anhui University. Wenyue Zhou, Yaohua Xu and Liping Li are with the Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei, China ([email protected]). Qiang Liu, Yifei Shen, Xiaofeng Zhou and Chuan Zhang are with the National Mobile Communications Research Laboratory, Southeast University, Nanjing, China ([email protected]).

Abstract

Polar codes with memory (PCM) are proposed in this paper: a pair of consecutive code blocks containing a controlled number of mutual information bits. The shared mutual information bits of the succeeded block can help the failed block to recover. The underlying polar codes can employ any decoding scheme such as the successive cancellation (SC) decoding (PCM-SC), the belief propagation (BP) decoding (PCM-BP), and the successive cancellation list (SCL) decoding (PCM-SCL). The analysis shows that the packet error rate (PER) of PCM decreases to the order of PER squared while maintaining the same complexity as the underlying polar codes. Simulation results indicate that for PCM-SC, the PER is comparable to (less than 0.3 dB) the stand-alone SCL decoding with two lists for the block length $N=256$ . The PER of PCM-SCL with $L$ lists can match that of the stand-alone SCL decoding with $2L$ lists. Two hardware decoders for PCM are also implemented: the in-serial (IS) decoder and the low-latency interleaved (LLI) decoder. For $N=256$ , synthesis results show that in the worst case, the latency of the PCM LLI decoder is only $16.1\%$ of the adaptive SCL decoder with $L=2$ , while the throughput is improved by 13 times compared to it.

Index Terms:

polar codes, successive cancellation decoding, mutual information bits, interleaved decoder, polar codes with memory.

I Introduction

Polar codes, invented by Arıkan [1], are proven to achieve the capacity of symmetric binary-input discrete memoryless channels (B-DMCs) with low-complexity encoding and successive cancellation (SC) decoding. Nevertheless, because of insufficient polarization, the error-correcting performance of moderate-length polar codes under SC decoding is unsatisfactory [2, 3]. To improve finite-length performance, successive cancellation list (SCL) decoding was proposed in [4, 5, 3]; with it, polar codes are comparable to low-density parity-check (LDPC) codes in error-correcting performance. Belief propagation (BP) is an alternative decoding algorithm [6, 2, 7] that operates over the factor graph of polar codes. It performs better than SC decoding and supports parallel decoding, but the bit error rate (BER) of polar codes under BP decoding is still inferior to that under SCL decoding (as shown in this paper).

In this paper, a new construction scheme of polar codes is proposed in which two consecutive encoding blocks share a controlled number of information bits. The input stream is divided into an odd stream and an even stream. In the encoding process, the corresponding odd and even blocks share a fraction of their information bits, which are called mutual information bits in this paper. Cyclic redundancy check (CRC) bits are attached to the information bits in each block. The two blocks can be encoded sequentially or in parallel. In the decoding process, when exactly one block of the pair is decoded correctly, the successfully decoded block provides its estimates of the mutual information bits to the failed block. With a proper positioning of the mutual information bits, the failed block can be recovered by another round of decoding.

Since the two consecutive blocks share mutual information bits, the encoding process behaves as if it had memory. We therefore call the proposed scheme polar codes with memory (PCM) to differentiate it from the traditional polar encoding scheme. This scheme can be directly extended to $m$ ($m>2$) blocks. Building on this, a general PCM is proposed, which reduces the effective code rate loss while maintaining the same order of PER as the direct extension of PCM. Analysis shows that the packet error rate (PER) of PCM is on the order of the square of the PER of the underlying polar codes. This improvement comes with the same complexity as the underlying polar codes: PCM can be decoded with the SC, BP, or SCL decoding, and its decoding complexity is that of the underlying decoder. For ease of description, PCM-SC-$2$ refers to the PCM employing the SC decoding with two blocks sharing mutual information bits. Similarly, PCM-BP-$2$ and PCM-SCL-$2$ refer to two blocks sharing mutual information bits, with each block employing the BP and SCL decoding, respectively. In addition, PCM-SC-$m$ ($m>2$) refers to the general PCM employing the SC decoding with $m$ blocks.

The simulation results show that the PER of PCM-SC-2 is within 0.3 dB of the stand-alone SCL decoding with $L=2$ ($L$ being the list size) for the case studied in the paper. PCM-BP-2 achieves the same performance as the stand-alone SCL decoding with $L=2$. In addition, the performance of PCM-SCL-2 with $L$ lists matches that of the stand-alone SCL decoding with $2L$ lists. Two hardware architectures are also proposed in the paper: an in-serial (IS) architecture and a low-latency interleaved (LLI) architecture. Implementation results show that for the block length 256, the proposed LLI architecture for PCM with the SC decoding has lower latency and higher throughput than the adaptive SCL decoder ($L=4$) [8].

The rest of the paper is organized as follows. Section II is on the basics of polar codes. Section III introduces the proposed PCM scheme. Specifically, Section III-A introduces the encoding process of PCM, Section III-B is about the corresponding decoding process, and in Section III-C, the optimal strategy to position mutual information bits is proposed. The error performance of PCM is analyzed in Section III-D. The application of a BP or SCL decoder to PCM is introduced in Section III-E. We also compare PCM with Turbo codes in Section III-F. Moreover, we extend the PCM to $m>2$ blocks in Section IV. In Section V, the simulation results are provided to validate the proposed PCM. In Section VI, the hardware architectures of PCM decoding are implemented. The concluding remarks are provided at the end.

II Preliminaries of Polar Codes

Denote $v_{1}^{N}$ as an $N$-length vector $(v_{1},...,v_{N})$. Let $W:\mathcal{X}\rightarrow\mathcal{Y}$ denote a symmetric B-DMC, with the input alphabet $\mathcal{X}=\{0,1\}$, the output alphabet $\mathcal{Y}$, and the channel transition probability $W(y|x)$, $x\in\mathcal{X}$, $y\in\mathcal{Y}$. Let $N=2^{n}$ ($n\geq 1$) denote the block code length. The generator matrix of polar codes is $G$, which is given by $G=B_{N}F^{\otimes n}$. Here $B_{N}$ denotes the bit-reversal permutation matrix, $F=\bigl[\begin{smallmatrix}1&0\\ 1&1\end{smallmatrix}\bigr]$, and $F^{\otimes n}$ represents the $n$-th Kronecker power of $F$ over the binary field $\mathbb{F}_{2}$. The codeword $x_{1}^{N}$ is obtained by $x_{1}^{N}=u_{1}^{N}G$, where $u_{1}^{N}$ is the source vector, consisting of $K$ information bits and $N-K$ frozen bits (the fixed information in the source vector). The codeword $x_{1}^{N}$ is transmitted over $N$ independent copies of $W$, written as $W^{N}$, with a transition probability $W^{N}(y_{1}^{N}|x_{1}^{N})$.
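The encoding rule $x_{1}^{N}=u_{1}^{N}G$ with $G=B_{N}F^{\otimes n}$ can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation; the function names are hypothetical, and the dense matrix product is chosen for readability rather than speed.

```python
import numpy as np

def kronecker_power(n):
    """n-th Kronecker power of F = [[1, 0], [1, 1]] (entries 0/1)."""
    F = np.array([[1, 0], [1, 1]], dtype=int)
    G = np.array([[1]], dtype=int)
    for _ in range(n):
        G = np.kron(G, F)
    return G

def bit_reversal_permutation(n):
    """Index j -> integer whose n-bit binary expansion is that of j reversed."""
    return np.array([int(format(j, f"0{n}b")[::-1], 2) for j in range(1 << n)])

def generator_matrix(n):
    """G = B_N F^{(x)n}: B_N reorders the rows of F^{(x)n} by bit reversal."""
    return kronecker_power(n)[bit_reversal_permutation(n), :]

def polar_encode(u):
    """x = u G over GF(2); len(u) must be a power of two (N = 2^n, n >= 1)."""
    u = np.asarray(u, dtype=int)
    n = len(u).bit_length() - 1
    return (u @ generator_matrix(n)) % 2
```

For example, `polar_encode([0, 0, 0, 1])` selects the last row of $G$ for $N=4$, which is the all-ones codeword.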

The channel polarization process has two parts: channel combining and channel splitting. Channel combining recursively combines copies of $W$ to produce a vector channel $W_{N}$, with $W_{N}(y_{1}^{N}|u_{1}^{N})=W^{N}(y_{1}^{N}|x_{1}^{N})$. Channel splitting splits $W_{N}$ back into a set of $N$ binary-input channels $W_{N}^{(i)}$, $i\in\{1,2,...,N\}$. The $i$-th such channel is called bit channel $i$ (the channel that bit $i$ virtually experiences). According to [1], $I(W_{N}^{(i)})$ (the capacity of bit channel $i$) converges to either 0 or 1 as $N$ tends to infinity, and the fraction of the bit channels with capacity 1 approaches $I(W)$.

With finite block lengths, not all bit channels are fully polarized. The principle of polar codes is to choose the $K$ most reliable of the $N$ bit channels to convey information bits. The other bits, called frozen bits, take fixed values and are transmitted on the remaining channels. The information set is denoted as $\mathcal{A}$ and its complement as ${\mathcal{A}_{c}}$. Denote $u_{\mathcal{A}}$ as the subvector of $u_{1}^{N}$ whose elements are indexed by the set $\mathcal{A}$.

The SC decoder is proposed in [1] and it recursively computes the likelihood ratio (LR) of bit $i$ from

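$$L_{N}^{(i)}(y_{1}^{N},\hat{u}_{1}^{i-1})\triangleq\frac{W_{N}^{(i)}(y_{1}^{N},\hat{u}_{1}^{i-1}\mid 0)}{W_{N}^{(i)}(y_{1}^{N},\hat{u}_{1}^{i-1}\mid 1)},$$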

where $\hat{u}_{1}^{i-1}$ is the estimation of bits $u_{1}^{i-1}$ . The SC decoder generates the estimate $\hat{u}_{i}$ of bit $u_{i}$ ( $i\in\mathcal{A}$ ) from

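$$\hat{u}_{i}=\begin{cases}0,&\text{if }L_{N}^{(i)}(y_{1}^{N},\hat{u}_{1}^{i-1})\geq 1\\ 1,&\text{otherwise}.\end{cases}$$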

The decoding complexity of SC is $\mathcal{O}(N\log N)$ [1].
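To make the recursion concrete, here is a minimal sketch of an SC decoder. It departs from the text in three labeled ways: it works with log-likelihood ratios (the decision "LR $\geq 1$" becomes "LLR $\geq 0$"), it uses the natural-order code $x=uF^{\otimes n}$ without the bit-reversal $B_{N}$, and it accepts arbitrary known values for frozen positions. All function names are hypothetical.

```python
import numpy as np

def encode_natural(u):
    """x = u F^{(x)n} over GF(2), natural order (assumption: B_N omitted)."""
    F = np.array([[1, 0], [1, 1]], dtype=int)
    G = np.array([[1]], dtype=int)
    for _ in range(len(u).bit_length() - 1):
        G = np.kron(G, F)
    return (np.asarray(u, dtype=int) @ G) % 2

def f_update(a, b):
    """Exact LLR of the XOR of two bits whose LLRs are a and b."""
    return np.logaddexp(0.0, a + b) - np.logaddexp(a, b)

def sc_decode(llr, frozen_mask, frozen_values=None):
    """Successive cancellation decoding; returns the estimate of u.

    llr[j] = log W(y_j|0) / W(y_j|1); frozen_mask[i] is True for frozen bits,
    whose values are taken from frozen_values (all-zero by default).
    """
    llr = np.asarray(llr, dtype=float)
    frozen_mask = np.asarray(frozen_mask, dtype=bool)
    N = len(llr)
    if frozen_values is None:
        frozen_values = np.zeros(N, dtype=int)
    u_hat = np.zeros(N, dtype=int)

    def recurse(l, offset):
        # Decodes u_hat[offset : offset + len(l)] and returns the re-encoded
        # partial codeword needed by the g-update of the parent node.
        if len(l) == 1:
            if frozen_mask[offset]:
                u_hat[offset] = frozen_values[offset]
            else:
                u_hat[offset] = 0 if l[0] >= 0 else 1
            return np.array([u_hat[offset]])
        half = len(l) // 2
        a, b = l[:half], l[half:]
        left = recurse(f_update(a, b), offset)
        right = recurse(b + (1 - 2 * left) * a, offset + half)
        return np.concatenate([left ^ right, right])

    recurse(llr, 0)
    return u_hat
```

The `frozen_values` argument is the hook that a second decoding round could use: bits whose values are already known are simply treated as frozen.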

III Polar Codes with Memory

In this section, the encoding of PCM and the decoding strategies are introduced, which improve the performance of polar codes under the SC, BP, or SCL decoding.

III-A Encoding with Memory

The top-level scheme is shown in Fig. 1. Let $K_{\mathrm{crc}}$ denote the number of CRC bits in each block; these CRC bits are part of the $K$ information bits. Then there are $K_{\mathrm{info}}=K-K_{\mathrm{crc}}$ pure information bits in each block. Let $K_{\mathrm{p}}$ be the number of mutual information bits, and denote the number of remaining information bits as $K_{\mathrm{i}}=K_{\mathrm{info}}-K_{\mathrm{p}}$.

In the encoding process, a frame of sequential input bits is first divided into chunks of the length $2K_{\mathrm{i}}+K_{\mathrm{p}}$ . Then each chunk is divided into two blocks: Block Odd (with $K_{\mathrm{i}}+K_{\mathrm{p}}$ bits) and Block Even (with $K_{\mathrm{i}}$ bits). Block Even then takes the $K_{\mathrm{p}}$ bits from Block Odd to form an input vector with the length $K_{\mathrm{info}}$ for its CRC generation. In this way, there are clearly $K_{\mathrm{p}}$ mutual information bits which are both included in Block Odd and Block Even. These mutual information bits are placed at the same indices, and the mutual information set is denoted as $\mathcal{B}$ . The input bit stream arrangement is shown in Fig. 2.

The encoding of the two blocks can be done sequentially or in parallel, as seen from Fig. 1, where both the CRC attachment and the polar encoding are applied to Block Odd and Block Even alternately, under the control of a switch.
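The input arrangement described above can be sketched as follows. This is only an illustration under one assumption not fixed by the text: within each $K_{\mathrm{info}}$-bit payload, the $K_{\mathrm{p}}$ mutual bits are stored last. The function names are hypothetical, and the later mapping of mutual bits onto the index set $\mathcal{B}$ is not shown.

```python
def split_chunk(chunk, k_i, k_p):
    """Split one chunk of 2*K_i + K_p bits into (Block Odd, Block Even) payloads.

    Both payloads have K_info = K_i + K_p bits and end with the same K_p
    mutual information bits (payload layout is an assumption of this sketch).
    """
    if len(chunk) != 2 * k_i + k_p:
        raise ValueError("chunk must hold 2*K_i + K_p bits")
    odd = list(chunk[:k_i + k_p])               # K_i private bits + K_p mutual bits
    mutual = odd[k_i:]                          # bits shared with Block Even
    even = list(chunk[k_i + k_p:]) + mutual     # K_i private bits + the same K_p bits
    return odd, even

def split_frame(bits, k_i, k_p):
    """Cut a frame into full chunks; trailing bits that do not fill a chunk are ignored."""
    size = 2 * k_i + k_p
    usable = len(bits) - len(bits) % size
    return [split_chunk(bits[s:s + size], k_i, k_p) for s in range(0, usable, size)]
```

With $K_{\mathrm{info}}=128$ and $K_{\mathrm{p}}=24$, for instance, each chunk carries $2\cdot 104+24=232$ input bits, and each of the two payloads is 128 bits long before CRC attachment.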

III-B The Decoding Process

The symbols of the encoded code blocks are transmitted over the symmetric B-DMC $W$, and noisy versions of them are observed at the receiver side. The receiver collects chunks of samples with a length of $2N$: the first $N$ samples for Block Odd and the rest for Block Even. The SC decoder generates an estimate $\hat{u}_{1}^{N}$ for each block. The CRC check module returns a check result for each block. The possible check results are:

• Case $1$: Both Block Odd and Block Even are decoded correctly;

• Case $2$: Block Odd is decoded correctly but Block Even is decoded incorrectly;

• Case $3$: Block Odd is decoded incorrectly while Block Even is decoded correctly;

• Case $4$: Both blocks are decoded incorrectly.

For Case 1 and Case 2, since Block Odd is decoded correctly, its $K_{\mathrm{p}}$ estimates of the mutual information bits are stored in memory for possible re-use in a second round of decoding of Block Even. For Case 1, Block Even is also decoded correctly, so no further action is needed. For Case 2, Block Even is decoded incorrectly, so a new round of SC decoding of Block Even is carried out. For Case 3 and Case 4, since Block Odd is decoded incorrectly, the initial $N$ LR values of this block are saved for a possible new round of decoding. For Case 3, the correctly decoded Block Even provides its estimates of the mutual information bits to Block Odd, invoking a new round of SC decoding of Block Odd. For Case 4, Block Even is also decoded incorrectly, so neither block can be recovered.

A more detailed description of the new round of SC decoding is as follows. The $K_{\mathrm{p}}$ estimates of the mutual information bits from the correctly decoded block are fed to the incorrectly decoded block. Take Case 2 as an example. The decoder of Block Even repeats the SC decoding up to the first bit in $\mathcal{B}$. When it reaches the first bit with index $i\in{\mathcal{B}}$, the decoder treats this bit as a frozen bit: whatever LR value is calculated for $u_{i}$, the bit is assigned the decision taken from Block Odd. The SC decoding then continues to the end, treating all bits in $\mathcal{B}$ as frozen bits. The re-decoding of Block Odd in Case 3 proceeds in the same way as that of Case 2.
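The four cases and the re-decoding step amount to a short control flow. The sketch below is decoder-agnostic and makes two assumptions: `decode(llr, known)` is a hypothetical callback that returns an estimate of $u_{1}^{N}$ while forcing the bits listed in the dictionary `known`, and `crc_ok(u_hat)` is a hypothetical callback that reports whether the CRC check passes.

```python
def pcm_decode_pair(llr_odd, llr_even, decode, crc_ok, mutual_idx):
    """Decode one (Block Odd, Block Even) pair with at most one re-decoding.

    decode(llr, known) -> estimate of u, with known = {index: bit} treated as
    frozen; crc_ok(u_hat) -> bool. Both are supplied by the caller.
    Returns ((u_odd, ok_odd), (u_even, ok_even)).
    """
    u_odd = decode(llr_odd, {})
    u_even = decode(llr_even, {})
    ok_odd, ok_even = crc_ok(u_odd), crc_ok(u_even)

    if ok_odd and not ok_even:
        # Case 2: Block Odd supplies the mutual bits, Block Even is re-decoded.
        known = {i: u_odd[i] for i in mutual_idx}
        u_even = decode(llr_even, known)
        ok_even = crc_ok(u_even)
    elif ok_even and not ok_odd:
        # Case 3: Block Even supplies the mutual bits, Block Odd is re-decoded.
        known = {i: u_even[i] for i in mutual_idx}
        u_odd = decode(llr_odd, known)
        ok_odd = crc_ok(u_odd)
    # Case 1 (both pass) and Case 4 (both fail) need no further decoding.
    return (u_odd, ok_odd), (u_even, ok_even)
```

Keeping the decoder behind a callback reflects the point that the same flow applies whether the underlying decoder is SC, BP, or SCL.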

III-C Positioning of Mutual Information Bits

Every two consecutive transmitting blocks share $K_{\mathrm{p}}$ mutual information bits. The positioning of mutual information bits is to find an optimal way in assigning these mutual information bits to the input of the two blocks (Block Odd and Block Even). Here “optimal” means the best system error performance. The exact formulation is derived as follows. The size of set $\mathcal{B}$ is $|\mathcal{B}|=K_{\mathrm{p}}$ and the subvector $u_{\mathcal{B}}$ contains the mutual information bits. Theoretically, there are $\binom{K}{K_{\mathrm{p}}}$ ways to choose the set $\mathcal{B}$ . Assume the information set $\mathcal{A}=\{i_{1},i_{2},...,i_{K}\}$ is ordered in the ascending order with respect to the bit channel reliability. In other words, there exists the relationship of $P_{e}(W_{N}^{(i_{1})})\geq P_{e}(W_{N}^{(i_{2})})\geq...\geq P_{e}(W_{N}^{(i_{K})})$ , where $P_{e}(W_{N}^{(i)})$ is the error probability of the $i$ -th information bit. The following proposition states an optimal way to achieve the best union bound.

Proposition 1.

Supposing the information set $\mathcal{A}=\{i_{1},i_{2},...,i_{K}\}$ is ordered in the ascending order with respect to the bit channel reliability, then the set $\mathcal{B}$ containing the first $K_{\mathrm{p}}$ elements of the set $\mathcal{A}$ as the mutual information bits indices can produce the minimum union bound.

Proof.

Define the PER over the information set $\mathcal{A}$ as $P_{B}(\mathcal{A})$ . Then its union bound [9] is

$$P_{B}(\mathcal{A})\leq\sum_{i\in\mathcal{A}}P_{e}(W_{N}^{(i)}).$$

With a pair of consecutive code blocks, re-decoding the failed block is equivalent to decoding it with the information set $\mathcal{A}^{\prime}=\mathcal{A}\setminus\mathcal{B}$. This is because the other block is decoded correctly, so the mutual information bits can be treated as frozen bits of the failed block. In this circumstance, the union bound for the incorrectly decoded block is:

$$P_{B}(\mathcal{A}^{\prime})\leq\sum_{i\in\mathcal{A}^{\prime}}P_{e}(W_{N}^{(i)}).$$

Suppose $\mathcal{B}^{\prime}$ is any other mutual information set. The equivalent information set of the incorrectly decoded block is then $\mathcal{A}^{\prime\prime}=\mathcal{A}\setminus\mathcal{B}^{\prime}$, so that

$$P_{B}(\mathcal{A}^{\prime\prime})\leq\sum_{i\in\mathcal{A}^{\prime\prime}}P_{e}(W_{N}^{(i)}).$$

Because set $\mathcal{B}$ contains the indices corresponding to the $K_{\mathrm{p}}$ largest error probabilities in $\mathcal{A}$ , it is obvious that

$$\sum_{i\in\mathcal{B}}P_{e}(W_{N}^{(i)})\geq\sum_{i\in\mathcal{B}^{\prime}}P_{e}(W_{N}^{(i)}).$$

Therefore,

$$\sum_{i\in\mathcal{A}^{\prime}}P_{e}(W_{N}^{(i)})\leq\sum_{i\in\mathcal{A}^{\prime\prime}}P_{e}(W_{N}^{(i)}).$$

This means that the union bound of $P_{B}(\mathcal{A}^{\prime})$ is no larger than that of $P_{B}(\mathcal{A}^{\prime\prime})$. Since $\mathcal{B}^{\prime}$ is arbitrary, we can conclude that $P_{B}(\mathcal{A}^{\prime})$ has the smallest union bound. ∎
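Proposition 1 can be checked numerically on a toy reliability profile. The sketch below substitutes a binary erasure channel for the paper's setting, because its bit-channel erasure probabilities follow the exact recursion $Z\mapsto(2Z-Z^{2},Z^{2})$; these values stand in for $P_{e}(W_{N}^{(i)})$ purely for illustration, and the function names are hypothetical.

```python
import numpy as np

def bec_erasure_probs(n, eps=0.5):
    """Erasure probabilities of the N = 2^n bit channels of a BEC(eps)."""
    z = np.array([eps])
    for _ in range(n):
        # Each channel splits into a degraded (2z - z^2) and an upgraded (z^2) one.
        z = np.column_stack([2 * z - z**2, z**2]).reshape(-1)
    return z

def choose_sets(z, K, K_p):
    """Information set A (K most reliable) and B (K_p least reliable of A)."""
    order = np.argsort(z, kind="stable")   # most reliable first
    A = order[:K]
    B = A[K - K_p:]
    return A, B

def union_bound(z, A, B):
    """Sum of the per-bit error proxies over the reduced set A \\ B."""
    return z[np.setdiff1d(A, B)].sum()
```

Comparing `union_bound(z, A, B)` with the value obtained for randomly drawn alternatives $\mathcal{B}^{\prime}\subset\mathcal{A}$ of the same size illustrates the inequality in the proof: no alternative yields a smaller sum.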

III-D Error Performance Analysis

In this section, the error performance of PCM is analyzed. Here we omit the inside argument of $P_{B}(\mathcal{A})$ for compactness. Instead, the symbol $P_{B}$ is used to represent the underlying PER of polar codes with the information set $\mathcal{A}$ . The PER of PCM consists of two parts:

• Part 1: Block Odd and Block Even are both decoded incorrectly, corresponding to Case 4 in Section III-B.

• Part 2: The re-decoding of Block Even (Case 2) or Block Odd (Case 3) fails.

For Part 1, the error probability is $P_{B}^{2}$ . For Part 2, supposing the PER of the re-decoding is $P^{\prime}_{B}$ , the error probability is therefore $P_{B}(1-P_{B})P^{\prime}_{B}$ . The PER of PCM is therefore:

$$P_{\mathrm{new}}=P_{B}^{2}+P_{B}(1-P_{B})P^{\prime}_{B}. \tag{10}$$

With the optimal placement of the mutual information bits in Section III-C, there must be some blocks which can be recovered with the help of additional $K_{\mathrm{p}}$ frozen bits. Representing $P^{\prime}_{B}$ by $\alpha P_{B}$ , where $\alpha$ can be obtained empirically for now, Eq. (10) can be rewritten as:

$$P_{\mathrm{new}}=P_{B}^{2}+P_{B}(1-P_{B})\alpha P_{B}=(1+\alpha)P_{B}^{2}-\alpha P_{B}^{3}. \tag{11}$$

By Eq. (11), it is shown that with the same complexity of the SC decoding, PCM can achieve a PER which is on the order of the underlying PER squared.
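Eq. (10) and Eq. (11) are easy to evaluate, and the event model behind them can be cross-checked by simulation. The sketch below assumes that first-round block failures are independent with probability $P_{B}$ and that a re-decoding fails with probability $P^{\prime}_{B}=\alpha P_{B}$; the numerical values of $P_{B}$ and $\alpha$ are left to the caller and are not taken from the paper.

```python
import numpy as np

def per_pcm2(p_b, alpha):
    """Eq. (10) with P'_B = alpha * P_B."""
    return p_b**2 + p_b * (1 - p_b) * alpha * p_b

def per_pcm2_expanded(p_b, alpha):
    """Eq. (11): the same quantity as a polynomial in P_B."""
    return (1 + alpha) * p_b**2 - alpha * p_b**3

def simulate_pcm2(p_b, p_b_prime, pairs, seed=0):
    """Monte Carlo estimate of the per-block error rate of a two-block PCM."""
    rng = np.random.default_rng(seed)
    fail = rng.random((pairs, 2)) < p_b           # first-round failures
    both = fail.all(axis=1)                       # Case 4: two block errors
    one = fail.sum(axis=1) == 1                   # Case 2 or Case 3
    redecode_fails = rng.random(pairs) < p_b_prime
    block_errors = 2 * both.sum() + (one & redecode_fails).sum()
    return block_errors / (2 * pairs)
```

For small $P_{B}$ the cubic term is negligible, so the result is close to $(1+\alpha)P_{B}^{2}$.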

III-E Decoding with a BP or SCL Decoder

In the proposed PCM, the SC decoding can be directly replaced by the BP or SCL decoding. For Case 2 and Case 3, only one block is decoded correctly. The correctly decoded block provides correct decisions on the mutual information bits for use by the incorrectly decoded block. Note that for the BP decoding, the best way to use these correct decisions is to treat the mutual information bits as frozen bits, instead of using their soft values. The reason is simple: treating them as frozen bits is equivalent to setting the initial LR values of these bits to infinity, which is more reliable than using finite soft LR values from the correctly decoded block. For the SCL decoding, the mutual information bits are likewise treated as frozen bits directly. Therefore, with the BP or SCL decoding for PCM, the mutual information bits are used in the same way as in the PCM employing the SC decoding.

III-F Comparison with Turbo Codes

The encoding of PCM shares a certain amount of information bits between a pair of consecutive blocks. This can be compared with Turbo codes with two parallel identical encoders. Compared to Turbo codes, there are several differences.

First, in Turbo codes all incoming information bits go through two identical encoders, whereas PCM shares only a fraction of the information bits between two encoding blocks, enabling a flexible code rate configuration. Second, PCM does not constantly exchange soft information between the two blocks during decoding. Instead, only when one block fails and the other succeeds are estimates of the mutual information bits fed from the successful block to the failed block. This information passing is sporadic: the average fraction of additional decoding rounds, denoted as $P_{a}$, is only

$$P_{a}=P_{B}(1-P_{B})=P_{B}-P_{B}^{2}<P_{B}. \tag{12}$$

Compared with stand-alone polar codes, the additional decoding can result in a significant reduction in PER as shown in Eq. (11), and it accounts for only a small percentage of the overall decoding operations.

IV General Polar Codes with Memory

In Section III, PCM is proposed where two consecutive blocks share a controlled number of information bits. A natural question arises: can this scheme be extended to $m>2$ polar blocks to achieve a better error performance? The direct extension of the encoding scheme from two blocks to $m$ blocks ($m>2$) is first analyzed in this section. Then an improved encoding scheme is proposed, which achieves the same order of PER while improving the overall code rate of the direct extension. This improved version is called general PCM in this paper.

IV-A Direct Extension of Polar Codes with Memory

The PER of PCM in Section III-D is $P_{B}^{2}+P_{B}(1-P_{B})P^{\prime}_{B}$ , where each chunk contains two blocks. When this scheme is extended to $m$ blocks with each containing $K_{\mathrm{p}}$ mutual information bits, the PER consists of the following parts:

• Part 1: Only one block is decoded incorrectly, and the new round of decoding fails again;

• Part 2: Two blocks are decoded incorrectly, and at least one block in the new round of decoding fails again;

• …

• Part $m$: All of the $m$ blocks are decoded incorrectly.

For Part 1, because the re-decoding of the failed block fails again, there is one block error among $m$ polar blocks. The PER in this case is therefore:

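$$P_{1}=\frac{1}{m}C_{m}^{1}P_{B}(1-P_{B})^{m-1}P^{\prime}_{B}. \tag{13}$$

For Part 2, with two failed blocks, the final block error among $m$ blocks consists of two cases: 1) one of the re-decoded blocks fails and 2) both of the re-decoded blocks are decoded incorrectly. Therefore, the PER is:

$$P_{2}=\frac{2}{m}C_{m}^{2}P_{B}^{2}(1-P_{B})^{m-2}\left(\frac{1}{2}C_{2}^{1}P^{\prime}_{B}(1-P^{\prime}_{B})+{P^{\prime}_{B}}^{2}\right). \tag{14}$$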

Generally, for Part $k$ , $(1\leq k<m)$ , the error probability $P_{k}$ is:

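$$P_{k}=\frac{k}{m}C_{m}^{k}P_{B}^{k}(1-P_{B})^{m-k}\left(\frac{1}{k}C_{k}^{1}P^{\prime}_{B}(1-P^{\prime}_{B})^{k-1}+\frac{2}{k}C_{k}^{2}{P^{\prime}_{B}}^{2}(1-P^{\prime}_{B})^{k-2}+\cdots+{P^{\prime}_{B}}^{k}\right). \tag{15}$$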

For Part $m$ , because all of the blocks are decoded incorrectly, the error probability $P_{m}$ is simply $P_{m}=P_{B}^{m}$ . Accumulating the error probability of each part and simplifying the formula, the PER of the direct extension of PCM is obtained:

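$$P_{\mathrm{new}}=\sum_{k=1}^{m}P_{k}=P_{B}(1-P_{B})^{m-1}P^{\prime}_{B}+\frac{2}{m}C_{m}^{2}P_{B}^{2}(1-P_{B})^{m-2}P^{\prime}_{B}+\cdots+P_{B}^{m}. \tag{16}$$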

Replacing $P^{\prime}_{B}$ by $\alpha P_{B}$ in Eq. (16), we can obtain a new PER:

$$P_{\mathrm{new}}=\alpha P_{B}^{2}(1-P_{B})^{m-1}+\frac{2}{m}C_{m}^{2}\alpha P_{B}^{3}(1-P_{B})^{m-2}+\cdots+P_{B}^{m}. \tag{17}$$

With a relatively small $P_{B}$, the new PER is dominated by $\alpha P_{B}^{2}(1-P_{B})^{m-1}$, which corresponds to the situation when only one block is decoded incorrectly. All the other parts have terms on the order of at least $P_{B}^{3}$. Based on this fact, a general encoding scheme is proposed in the next section to deal with the case where one block is decoded incorrectly among $m$ blocks. For all the other cases, no re-decoding is performed. This enables the scheme to maintain the same PER order while improving the overall code rate.
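The part-by-part expressions can be evaluated directly to confirm the simplification leading to Eq. (16). The sketch below sums $P_{k}$ term by term; the closed form used as a cross-check, $P^{\prime}_{B}P_{B}+(1-P^{\prime}_{B})P_{B}^{m}$, is not stated in the paper and follows from the mean of a binomial distribution, so it should be read as a sanity check only. Function names are hypothetical.

```python
from math import comb

def part_k(k, m, p_b, p_b_prime):
    """Error contribution when exactly k of m blocks fail in the first round (1 <= k < m)."""
    # Expected fraction of the k re-decoded blocks that fail again.
    inner = sum(j / k * comb(k, j) * p_b_prime**j * (1 - p_b_prime)**(k - j)
                for j in range(1, k + 1))
    return k / m * comb(m, k) * p_b**k * (1 - p_b)**(m - k) * inner

def per_direct(m, p_b, p_b_prime):
    """PER of the direct extension: sum of all parts, with P_m = P_B^m."""
    return sum(part_k(k, m, p_b, p_b_prime) for k in range(1, m)) + p_b**m

def per_direct_closed(m, p_b, p_b_prime):
    """Cross-check (assumption of this sketch): P'_B * P_B + (1 - P'_B) * P_B^m."""
    return p_b_prime * p_b + (1 - p_b_prime) * p_b**m
```

For $m=2$ the sum reduces to Eq. (10), $P_{B}^{2}+P_{B}(1-P_{B})P^{\prime}_{B}$.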

IV-B The General Polar Codes with Memory

In this section, a general encoding scheme of PCM is proposed. From the discussions in the previous section, it can be seen that the direct extension of the encoding scheme does not increase the minimum order of the PER. The PER performance is limited by the error event that there is only one failed block among $m$ blocks. All the other error events have lower PER level. If the encoding scheme is designed to only recover the limiting error event while ignoring those error events with lower PER level, then the overall effective code rate can be improved.

For the direct extension of PCM, the effective overall code rate is

$$R_{m}=\frac{m(K-K_{\mathrm{crc}})-(m-1)K_{\mathrm{p}}}{mN}=R-\frac{K_{\mathrm{crc}}}{N}-\frac{m-1}{m}\frac{K_{\mathrm{p}}}{N}, \tag{18}$$

with a rate loss of $\frac{K_{\mathrm{crc}}}{N}+\frac{m-1}{m}\frac{K_{\mathrm{p}}}{N}$, where $R$ denotes the code rate of the underlying polar codes. To reduce the rate loss of the direct extension of PCM, a general encoding scheme is proposed. Fig. 3 shows the input bit arrangement of the general PCM, where each chunk contains $m$ blocks. In Fig. 3, the first $m-1$ blocks have their own information bits, and no mutual information bits are shared among them. However, from each of these $m-1$ blocks, $K_{\mathrm{p}}$ information bits are taken out and added together (modulo-two addition). The resulting $K_{\mathrm{p}}$ bits are placed as the mutual information bits of the last block. The input bit arrangement of the general PCM is therefore:

$$u_{\mathcal{B}}^{m}=u_{\mathcal{B}}^{1}\oplus u_{\mathcal{B}}^{2}\oplus\cdots\oplus u_{\mathcal{B}}^{m-1}, \tag{19}$$

where $u_{\mathcal{B}}^{k}$, $k\in\{1,2,...,m\}$, denotes the $K_{\mathrm{p}}$ mutual information bits of block $k$. The positioning of the mutual information bits for all $m$ blocks follows Proposition 1: they occupy the least reliable information bit positions in each block.

In this way, the effective code rate of the general PCM is:

$$R_{m}=\frac{m(K-K_{\mathrm{crc}})-K_{\mathrm{p}}}{mN}=R-\frac{K_{\mathrm{crc}}}{N}-\frac{K_{\mathrm{p}}}{mN}. \tag{20}$$

With a large $m$, the fractional rate loss $K_{\mathrm{p}}/(mN)$ is negligible for a constant $K_{\mathrm{p}}$. However, a large $m$ comes with a higher decoding complexity. A trade-off can be made between a small rate loss and a low decoding latency.
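The two rate expressions can be compared with exact arithmetic. The sketch below evaluates the effective code rate of the direct extension and of the general PCM; the function names are hypothetical, and any parameter values passed in are the caller's choice.

```python
from fractions import Fraction

def rate_direct(m, N, K, K_crc, K_p):
    """Effective rate of the direct extension: (m-1)*K_p payload bits are duplicated."""
    return Fraction(m * (K - K_crc) - (m - 1) * K_p, m * N)

def rate_general(m, N, K, K_crc, K_p):
    """Effective rate of the general PCM: only K_p parity-like bits per chunk."""
    return Fraction(m * (K - K_crc) - K_p, m * N)
```

With $N=256$, $K=140$, $K_{\mathrm{crc}}=12$, and $K_{\mathrm{p}}=24$, both functions give $232/512\approx 0.4531$ for $m=2$, while for an illustrative $m=4$ the direct extension gives $440/1024$ and the general PCM gives $488/1024$.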

The general encoding scheme can recover the $K_{\mathrm{p}}$ mutual information bits of the failed block if all other $m-1$ blocks in the chunk are decoded successfully: $u_{\mathcal{B}}^{k}=\bigoplus_{i=1,i\neq k}^{m}u_{\mathcal{B}}^{i}$, where the sum is modulo two. This scheme cannot correct more than one block error among $m$ blocks. When there is only one incorrectly decoded block, the correct $K_{\mathrm{p}}$ mutual information bits can be recovered and a new round of decoding can be performed.
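The recovery rule is a plain XOR over the mutual bits of the other blocks. The sketch below stores the mutual bits of the $m$ blocks as rows of an array; the function names and the array layout are assumptions made for illustration.

```python
import numpy as np

def parity_mutual(mutual_first):
    """Mutual bits of the last block: XOR of the mutual bits of blocks 1..m-1.

    mutual_first has shape (m-1, K_p) with 0/1 integer entries.
    """
    return np.bitwise_xor.reduce(np.asarray(mutual_first, dtype=int), axis=0)

def recover_mutual(mutual_all, failed):
    """Recover row `failed` of an (m, K_p) array from the other m-1 rows."""
    others = np.delete(np.asarray(mutual_all, dtype=int), failed, axis=0)
    return np.bitwise_xor.reduce(others, axis=0)
```

Because the XOR of all $m$ rows is the all-zero vector, any single row equals the XOR of the remaining ones, which is why exactly one failed block can be helped and two cannot.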

For the general PCM, the new round of decoding occurs only when exactly one block is decoded incorrectly. The PER of the proposed scheme is therefore:

[TABLE]

which can be rewritten by replacing $P^{\prime}_{B}$ by $\alpha P_{B}$ :

[TABLE]

Comparing Eq. (17) and Eq. (22), the general PCM scheme has negligible performance loss compared with the direct extension scheme.

V Simulation Results

In this section, we provide simulation results to show the performance of PCM. The channel is the additive white Gaussian noise (AWGN) channel. The block length of the polar codes is $N=256$ , and the number of underlying information bits is $K=140$ , including a 12-bit CRC with a generator polynomial $g(x)=x^{12}+x^{11}+x^{10}+x^{9}+x^{8}+x^{4}+x+1$ . The code rate of the underlying polar codes is therefore $R=\frac{K}{N}=0.5469$ . The number of mutual information bits shared between two consecutive blocks is set as $K_{\mathrm{p}}=24$ .

Fig. 4 reports the BER performance of the PCM with two consecutive blocks sharing mutual information bits. The effective code rate of the PCM-SC-2 is $R_{2}=\frac{2(K-K_{\mathrm{crc}})-K_{\mathrm{p}}}{2N}=R-\frac{K_{\mathrm{crc}}}{N}-\frac{K_{\mathrm{p}}}{2N}=0.4531$. For a fair comparison, the code rate of the stand-alone polar codes with SC, BP, and SCL decoding is adjusted to $R_{2}$, and the stand-alone polar codes with SC and BP decoding also contain a 12-bit CRC. For the stand-alone SCL decoding, both $L=2$ and $L=4$ are simulated. It is observed that the PCM-SC-2 outperforms the traditional SC and BP decoding by about 0.41 dB and 0.22 dB at BER $=10^{-4}$, respectively. In addition, PCM-SC-2 comes within 0.3 dB of the SCL decoding with $L=2$ at the same BER level. On the other hand, the PCM-BP-2 achieves the same performance as the SCL decoding with $L=2$ when $E_{b}/N_{0}\geq 4$ dB. Fig. 5 shows the corresponding PER performance, and the trend is consistent with that shown in Fig. 4.

Fig. 6 shows the simulated PER of the PCM-SC-2 and the PER analyzed in Eq. (11). Here the maximum (6.9) and the minimum (0.38) values of $\alpha$ are found from the simulations, producing the $P_{new}^{upper}$ and $P_{new}^{lower}$ in Fig. 6. It is observed that the PER performance of the PCM-SC-2 follows the lower bound for small $E_{b}/N_{0}$ values (less than 3 dB), and it follows the upper bound for large $E_{b}/N_{0}$ values (larger than 3 dB), which indicates that the PER performance of the PCM-SC-2 is on the level of PER squared of the underlying polar codes.

Fig. 7 reports the PER performance of the PCM employing the SCL decoding. The PCM-SCL-2 with $L=2$ achieves the same performance as the stand-alone SCL decoding with $L=4$ when $E_{b}/N_{0}>3.5$ dB. Moreover, the PCM-SCL-2 with $L=4$ and $L=8$ outperforms the stand-alone SCL decoding with $L=8$ and $L=16$ by about 0.1 dB and 0.15 dB at PER $=10^{-4}$, respectively.

The good performance of PCM comes at the cost of an additional round of decoding, as shown by Eq. (12). Fig. 8 shows the ratio of the additional decoding to the overall decoding for the same system as in Figs. 4 and 5. The curve labeled $P_{B}$ in this figure is the PER of the underlying polar codes. The success rate of the additional decoding is also provided in this figure, shown by the line with asterisks. It can be seen that for $E_{b}/N_{0}\geq 3$ dB, the additional decoding rate and the additional success rate match. Importantly, the additional decoding effort is controlled by the PER of the underlying polar codes: the $P_{B}$ curve also matches the other two curves closely for large $E_{b}/N_{0}$. This can be seen from Eq. (12): when $P_{B}$ is small ($1-P_{B}$ approaches 1), the additional decoding rate $P_{a}$ is determined by $P_{B}$. The decoding failure rate of PCM is therefore left at the order of $P_{B}^{2}$.

Fig. 9 presents the PER curves of the general PCM with $m=3$ , and the parameters are the same as those in Fig. 4. By applying the proposed general encoding scheme, the PER of PCM-SC-3 can be represented as follows:

[TABLE]

According to Eq. (20), the code rate of PCM-SC-3 is $R_{3}=0.4688$, so the stand-alone polar codes with SC, BP, and SCL decoding are all adjusted to the same code rate $R_{3}$. It can be seen that PCM-SC-3 is about 0.18 dB worse than the stand-alone SCL decoding with $L=2$ at PER $=10^{-3}$.

VI Hardware Architecture

In this section, two hardware architectures for PCM-SC-2 are proposed: the in-serial (IS) architecture and the low-latency interleaved (LLI) architecture. The IS architecture is based on the SC decoder proposed in [10], where the processing elements (PEs) are designed with pre-computation. The proposed architecture is capable of performing both SC decoding and PCM-SC-2 decoding. The LLI architecture is inspired by the 2-interleaved SC polar decoder [11], and it substantially reduces the decoding latency with only a small increase in hardware consumption compared with the IS architecture.

VI-A In-serial PCM-SC-2 Decoder

In order to increase hardware utilization and reduce computational complexity, the decoder processes the data in the form of log-likelihood ratio (LLR) instead of LR. The top-level architecture of the proposed IS PCM-SC-2 decoder is shown in Fig. 10. It mainly consists of five modules: the LLR memory module, the SC decoder module, the CRC check module, the feedback module, and the bit memory module. Compared with the conventional SC decoder [10], the LLR memory module and the bit memory module are additional.

The LLR memory stores the LLRs that are needed for Case 2 and Case 3. The bit memory module is an important module in the architecture. In the conventional SC decoder, the locations and the contents of the frozen bits are set in advance. When the bit memory receives a frozen bit, it ignores it, and this frozen bit is sent to the feedback module directly. In the PCM-SC-2 decoder, when Block Odd and Block Even are decoded in the first round, they are decoded in the same way as in a conventional SC decoder. When Block Odd or Block Even passes the CRC check, the bit memory immediately stores the mutual information bits of this block. In Case 2 and Case 3, the bit memory reads the mutual information bits and treats them as frozen bits in the second round of decoding of the failed block. In this way, the $K_{\mathrm{p}}$ mutual information bit estimates from the correctly decoded block are effectively fed to the bit memory of the incorrectly decoded block.
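The two-round control flow described above can be sketched in software as follows. This is only an illustration under stated assumptions: Case 2 and Case 3 are taken to be the two situations in which exactly one block of the pair fails the CRC check, the mutual information bits are assumed to sit at the same indices in both blocks, and `decode_pair`, `toy_decode`, and `toy_check` are hypothetical names. The toy decoder and toy check are stand-ins so that the sketch runs; they are not an SC decoder or a CRC.

```python
def decode_pair(llr_odd, llr_even, sc_decode, crc_ok, mutual_idx):
    """Two-round control flow for one pair of blocks (illustrative only).

    sc_decode(llr, known) returns a list of bit decisions, treating every
    index in the dict `known` as a frozen bit with the given value.
    crc_ok(bits) returns True when the block passes its check.
    """
    llr = {"odd": llr_odd, "even": llr_even}  # plays the role of the LLR memory
    bits = {name: sc_decode(v, {}) for name, v in llr.items()}  # first round
    ok = {name: crc_ok(b) for name, b in bits.items()}

    if ok["odd"] == ok["even"]:
        # Both passed (nothing to do) or both failed (no trusted bits to share).
        return bits, ok

    good, bad = ("odd", "even") if ok["odd"] else ("even", "odd")
    # Plays the role of the bit memory: mutual bits of the block that passed.
    bit_memory = {i: bits[good][i] for i in mutual_idx}
    # Second round: the failed block is decoded again with those bits frozen.
    bits[bad] = sc_decode(llr[bad], bit_memory)
    ok[bad] = crc_ok(bits[bad])
    return bits, ok


# Toy stand-ins (assumptions, not part of the paper).
def toy_decode(llr, known):
    """Hard decision on the LLR sign, except where a bit value is known."""
    return [known.get(i, 0 if v >= 0 else 1) for i, v in enumerate(llr)]


def toy_check(bits):
    """Even-parity check used in place of a real CRC."""
    return sum(bits) % 2 == 0


bits, ok = decode_pair(
    [1.0, -1.0, -1.0, 1.0],  # Block Odd: decodes to 0 1 1 0 and passes
    [1.0, 0.3, -1.0, 1.0],   # Block Even: bit 1 is corrupted, check fails
    toy_decode,
    toy_check,
    mutual_idx=[1],
)
print(bits["even"], ok)
```

In the toy run, Block Even fails in the first round, borrows the mutual bit at index 1 from Block Odd, and passes in the second round.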

VI-B Low-latency Interleaved PCM-SC-2 Decoder

It should be noted that in Case 2 and Case 3, the decoding latency of the IS PCM-SC-2 decoder is $1.5$ times that of the conventional SC decoder. When Block Odd or Block Even performs a new round of decoding, the computation of the LLRs before the first erroneous mutual information bit is redundant. Based on this observation, an LLI architecture employing interleaved decoding is proposed, as shown in Fig. 11 and described below.

For a conventional $N$-bit SC decoder, there are $n=\log_{2}N$ stages, Stage 1 to Stage $n$, with only one stage being active in a clock cycle. As described in [11], a 2-interleaved SC decoder can decode two polar blocks simultaneously. Inspired by this, the LLI architecture is proposed. The main idea is that when Block Odd is being decoded in Stage $i$ $(i\neq n)$, Block Even can be decoded in Stage $i-1$ since this stage is idle for Block Odd. The decoding processes of the two blocks conflict in the last stage, Stage $n$, because every block needs to stay in Stage $n$ for two clock cycles. Therefore, an additional PE is needed in this stage. As shown in Fig. 11, the LLI PCM-SC-2 decoder has an extra PE in Stage $n$. In addition, two independent bit memories and feedback modules are designed for Block Odd and Block Even so that they can be decoded simultaneously. Fig. 12 shows the PE of the LLI PCM-SC-2 decoder. Compared with the PE of the IS PCM-SC-2 decoder, it has two additional registers, which store the LLRs of the two blocks.

With the proposed design, whenever a mutual information bit in Block Even is decoded, it can be compared with the mutual information bit at the same location in Block Odd (which was decoded one clock cycle earlier). If the two bits differ, all intermediate LLR values of the two blocks, which are stored in the registers, are immediately sent to the breakpoint memory, and then the decoding process continues. In Case 2 or Case 3, the incorrectly decoded block starts the second round of decoding from the position of the first differing mutual information bit, and the intermediate LLRs are read directly from the breakpoint memory instead of being recalculated. In the studied case of PCM with the same parameters as those of Fig. 4, the indices of the first and the last mutual bits are 32 and 209, respectively. This means that if the second round of decoding is required, the computation of the LLRs before the 32nd bit is avoided in the worst case, and the computation of the LLRs before the 209th bit is avoided in the best case.
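The restart position used by the second round can be illustrated with a short sketch. It assumes that the mutual information bits occupy the same indices in both blocks, as the comparison described above suggests; the function name `restart_index` and the toy bit vectors are hypothetical, and the sketch models only the bit comparison, not the breakpoint memory or the LLR registers.

```python
def restart_index(bits_odd, bits_even, mutual_idx):
    """Index of the first mutual bit on which the two blocks disagree.

    Returns None when all mutual bits agree; the text does not say where
    the second round starts in that case, so the sketch leaves it open.
    """
    for i in sorted(mutual_idx):
        if bits_odd[i] != bits_even[i]:
            return i
    return None


# Toy first-round decisions for a length-8 block with mutual bits at 2, 5, 6.
odd = [0, 1, 1, 0, 0, 1, 0, 1]
even = [0, 0, 1, 1, 0, 0, 0, 1]
print(restart_index(odd, even, [2, 5, 6]))
```

Bit positions that are not mutual (indices 1 and 3 in the toy data) may differ freely between the blocks and do not trigger a breakpoint; only a disagreement on a shared position does.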

VI-C Implementation Results

The two decoders are implemented on the Xilinx Zynq-7000 field-programmable gate array (FPGA) platform. The latency of the LLI decoder is lower than that of the IS decoder: it is reduced by nearly half in the first round of decoding, due to the interleaved decoding of Block Odd and Block Even. It is also reduced in the second round of decoding, because the LLI decoder can begin at the first erroneous mutual information bit. Fig. 13 shows the reduction rate of the average latency for the LLI PCM-SC-2 decoder in the second round of decoding, with 100,000 samples at each $E_{b}/N_{0}$. Notably, the LLI PCM-SC-2 decoder reduces the latency of the second round of decoding by $49.3\%$ and $66.9\%$ at $E_{b}/N_{0}=1$ dB and $E_{b}/N_{0}=4$ dB, respectively. Fig. 14 shows the average latency of the two decoders. It can be seen that the average latency of the LLI PCM-SC-2 decoder is approximately half of that of the IS PCM-SC-2 decoder, because the latency reduction rates are around $50\%$ in both the first round and the second round.

Table I compares the synthesis results of different polar decoders for $N=256$, including the IS and LLI PCM-SC-2 decoders, the combinational SC decoder in [12], and the adaptive SCL decoder [8] with $L=2$ and $L=4$. As shown in Table I, the IS PCM-SC-2 decoder consumes the least hardware resources, although its maximum throughput is inferior to the others. The total consumptions (FF and LUT) of the IS PCM-SC-2 decoder and the LLI PCM-SC-2 decoder are only $24.2\%$ and $54.5\%$ of the consumption of the combinational SC decoder in [12], respectively. The hardware consumption of the LLI PCM-SC-2 decoder is $98.7\%$ and $75.2\%$ of the consumption of the adaptive SCL decoder with $L=2$ and $L=4$, respectively. The reason is that the SCL decoder needs $L$ SC decoder modules while the proposed architecture needs only one. Moreover, the decoders in [8] and [12] use additional RAM and Block RAMs, which increases the consumption of hardware resources, while the proposed PCM-SC-2 decoders do not.

Table I also shows the range of the latency and the throughput of the PCM-SC-2 decoders and the adaptive SCL decoder. It is observed that the minimum latency and the maximum throughput of the LLI PCM-SC-2 decoder are comparable to those of the adaptive SCL decoder with $L=2$ , and are slightly superior to those of the adaptive SCL decoder with $L=4$ . As for the worst situation, the maximum latency of the LLI PCM-SC-2 decoder is only $16.1\%$ and $14.4\%$ of the adaptive SCL decoder with $L=2$ and $L=4$ , and the minimum throughput is improved by more than 13 and 15 times compared to them, respectively.

VII Conclusion

In this paper, PCM employing SC, BP, or SCL decoding is proposed. By sharing a certain number of mutual information bits between a pair of blocks, this scheme brings the PER down to the order of the square of the PER of the underlying polar codes. Results show that for the block length 256, the proposed PCM-SC-2 and PCM-BP-2 decoders can match the PER of the stand-alone SCL decoder with two lists. The PER performance of the PCM-SCL-2 decoder with $L$ lists can match the PER of the stand-alone SCL decoder with $2L$ lists. In addition, in the worst case, the proposed LLI hardware architecture for PCM achieves 13 times more throughput than the adaptive SCL decoder with two lists for the block length $N=256$.

Bibliography

[1] E. Arıkan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, 2009.
[2] N. Hussami, S. Korada, and R. Urbanke, “Performance of polar codes for channel and source coding,” in Proc. IEEE Int. Symp. Inf. Theory, June 2009, pp. 1488–1492.
[3] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2213–2226, 2015.
[4] K. Niu and K. Chen, “CRC-aided decoding of polar codes,” IEEE Commun. Lett., vol. 16, no. 10, pp. 1668–1671, October 2012.
[5] K. Chen, K. Niu, and J. Lin, “Improved successive cancellation decoding of polar codes,” IEEE Trans. Commun., vol. 61, no. 8, pp. 3100–3107, August 2013.
[6] E. Arıkan, “A performance comparison of polar codes and Reed-Muller codes,” IEEE Commun. Lett., vol. 12, no. 6, pp. 447–449, 2008.
[7] A. Eslami and H. Pishro-Nik, “On bit error rate performance of polar codes in finite regime,” in Proc. Annual Allerton Conf. on Commun., Control, Computing (Allerton), 2010, pp. 188–194.
[8] A. Süral and E. Arıkan, “An FPGA implementation of successive cancellation list decoding for polar codes,” Ph.D. dissertation, Bilkent Univ., Ankara, 2016.
