Successive Cancellation List Decoding of Semi-random Unit Memory Convolutional Codes

Wenchao Lin; Suihua Cai; Baodian Wei; Xiao Ma

arXiv:1905.11392 · cs.IT · July 27, 2020


TL;DR

This paper introduces a successive cancellation list decoding algorithm for semi-random unit memory convolutional codes (SRUMCCs) that checks candidate codewords with an empirical divergence test instead of a CRC, so that performance can be traded against decoding complexity and decoding delay.

Contribution

It proposes a new list decoding algorithm using empirical divergence testing for SRUMCCs, with analysis and simulation demonstrating its advantages.

Findings

1. Outperforms sequential decoding at high SNR
2. Achieves performance comparable to polar codes with similar delay
3. Provides a closed-form upper bound and a simulated lower bound on performance


Tables

Table I: Average list sizes required to contain the correct candidate codeword

SNR (dB)	2.0	2.5	3.0	3.5	4.0
List size	1.256	1.069	1.019	1.005	1.001

Table II: Average list sizes needed for the thresholds $T_{A}$ and $T_{B}$

SNR (dB)	2.0	2.5	3.0	3.5	4.0
$T_{A}$	1.3	1.35	1.4	1.45	1.5
$T_{B}$	0.95	1.0	1.05	1.1	1.15
List size for $T_{A}$	38	30	23	18	14
List size for $T_{B}$	25	8.2	2.6	1.3	1.1

Equations

{\rm fER}_{0}\leqslant\max_{t}{\rm fER}_{t}\leqslant{\rm FER}\leqslant\sum_{t=0}^{L-1}{\rm fER}_{t}.

{\rm fER}=\frac{1}{L}\sum_{t=0}^{L-1}{\rm fER}_{t},

{\rm fER}=\frac{\text{number of erroneously decoded sub-frames}}{\text{total number of transmitted sub-frames}}.

\Pr\{E_{t}\}\leqslant{\rm fER}_{0},

{\rm fER}_{t}\leqslant\sum_{i=0}^{t}\Pr\{E_{i}\}\leqslant(t+1)\,{\rm fER}_{0}.

{\rm fER}\leqslant\frac{1}{L}\sum_{t=0}^{L-1}(t+1)\,{\rm fER}_{0}=\frac{L+1}{2}\cdot{\rm fER}_{0}.

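\hat{\boldsymbol{v}}^{(0)}=\arg\max_{\boldsymbol{v}^{(0)}}P(\boldsymbol{v}^{(0)}|\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)})=\arg\max_{\boldsymbol{v}^{(0)}}\frac{P(\boldsymbol{v}^{(0)})P(\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)})}{P(\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)})},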

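P(\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)})=\sum_{\boldsymbol{v}^{(1)}}P(\boldsymbol{v}^{(1)})P(\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)})=P(\boldsymbol{y}^{(0)}|\boldsymbol{v}^{(0)})\sum_{\boldsymbol{v}^{(1)}}P(\boldsymbol{v}^{(1)})P(\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)}).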

P(\boldsymbol{v}^{(0)}|\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)})\propto P(\boldsymbol{y}^{(0)}|\boldsymbol{v}^{(0)})\sum_{\boldsymbol{v}^{(1)}}P(\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)}),

(\hat{\boldsymbol{v}}^{(0)},\hat{\boldsymbol{v}}^{(1)})=\arg\max_{(\boldsymbol{v}^{(0)},\boldsymbol{v}^{(1)})}P(\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)}).

P(\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)})=P(\boldsymbol{y}^{(0)}|\boldsymbol{v}^{(0)})P(\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)}).

\hat{\boldsymbol{v}}^{(0)}=\arg\max_{\boldsymbol{v}^{(0)}}P(\boldsymbol{y}^{(0)}|\boldsymbol{v}^{(0)})\Big[\max_{\boldsymbol{v}^{(1)}}P(\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)})\Big].

\hat{\boldsymbol{v}}^{(0)}=\arg\max_{\boldsymbol{v}^{(0)}\in\mathcal{L}}P(\boldsymbol{y}^{(0)}|\boldsymbol{v}^{(0)})\Big[\max_{\boldsymbol{v}^{(1)}}P(\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)})\Big].

D(\boldsymbol{x},\boldsymbol{y})=\frac{1}{n}\log_{2}\frac{P(\boldsymbol{y}|\boldsymbol{x})}{P(\boldsymbol{y})},

P(\boldsymbol{y})=\prod_{i=0}^{n-1}\left(\frac{1}{2}P(y_{i}|0)+\frac{1}{2}P(y_{i}|1)\right).

\lim_{n\rightarrow\infty}P\left[\big{|}D(\boldsymbol{v},\boldsymbol{y})-I(X;Y)\big{|}\leqslant\epsilon\right]=1,

D(\boldsymbol{x},\boldsymbol{y})\approx\mathbb{E}_{Y|V}\left[\frac{1}{2}\log_{2}\frac{P(Y|0)}{P(Y)}+\frac{1}{2}\log_{2}\frac{P(Y|1)}{P(Y)}\right],

M_{2}(\hat{\boldsymbol{v}}^{(0)}_{\ell})=D(\hat{\boldsymbol{v}}^{(0)}_{\ell},\boldsymbol{y}^{(0)})+D\big(\tilde{\boldsymbol{v}}_{\ell},\boldsymbol{y}^{(1)}\odot\phi(\hat{\boldsymbol{v}}^{(0)}_{\ell}\mathbf{R})\big),

\boldsymbol{z}^{(0)}=\boldsymbol{y}^{(t+1)}\odot\phi(\hat{\boldsymbol{v}}^{(t)}_{\max}\mathbf{R}).

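{\rm fER}_{0}\leqslant\Pr\Big\{\bigcup_{\boldsymbol{v}^{(0)}\neq\boldsymbol{0}}\bigcup_{\boldsymbol{v}^{(1)}}\big\{(\boldsymbol{v}^{(0)},\boldsymbol{v}^{(1)})\text{ is more likely than }(\boldsymbol{0},\boldsymbol{0})\big\}\Big\}\leqslant\sum_{\boldsymbol{v}^{(0)}\neq\boldsymbol{0}}\sum_{\boldsymbol{v}^{(1)}}\Pr\{(\boldsymbol{v}^{(0)},\boldsymbol{v}^{(1)})\text{ is more likely than }(\boldsymbol{0},\boldsymbol{0})\}.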

\mathscr{C}^{(0,1)}=\left\{(\boldsymbol{c}^{(0)},\boldsymbol{c}^{(1)})\bigg{|}\begin{array}[]{c}\boldsymbol{c}=(\boldsymbol{c}^{(0)},\cdots,\boldsymbol{c}^{(L)}){\rm~{}is~{}a~{}coded}\\ {\rm~{}sequence~{}with~{}}\boldsymbol{c}^{(0)}\neq\boldsymbol{0}\end{array}\right\}.

B(X)=2^{-n+k}(1+X)^{n}A(X)=\sum_{w=1}^{2n}B_{w}X^{w}.

{\rm fER}_{0}\leqslant\sum_{w=1}^{2n}B_{w}\,Q\left(\sqrt{\frac{w}{\sigma^{2}}}\right),

\#\,\text{Operations}=(s+\bar{\ell}-1+\bar{\ell}s)\,n.



Full text


Wenchao Lin, Suihua Cai, Baodian Wei, and Xiao Ma∗

∗Corresponding author: Xiao Ma. This work was supported by the NSF of China (No. 61771499 and No. 61972431), the Science and Technology Planning Project of Guangdong Province (2018B010114001), the National Key R&D Program of China (2017YFB0802503), the Basic Research Project of Guangdong Provincial NSF (2016A030308008), and the Guangdong Basic and Applied Basic Research Foundation (2020A1515010687). This work was presented in part at the 2019 IEEE International Symposium on Information Theory and the 2018 IEEE International Symposium on Turbo Codes & Iterative Information Processing. The authors are with the School of Data and Computer Science and Guangdong Key Laboratory of Information Security Technology, Sun Yat-sen University, Guangzhou 510006, China.

Abstract

We present in this paper a special class of unit memory convolutional codes (UMCCs), called semi-random UMCCs (SRUMCCs), where the information block is first encoded by a short block code and then transmitted in a block Markov (random) superposition manner. We propose a successive cancellation list decoding algorithm, in which a list of candidate codewords is generated serially until one passes an empirical divergence test, which replaces the conventional cyclic redundancy check (CRC). The threshold for testing the correctness of candidate codewords can be learned off-line from the statistical behavior of the introduced empirical divergence function (EDF). The performance-complexity tradeoff and the performance-delay tradeoff can be controlled by adjusting the statistical threshold and the decoding window size. To analyze the performance, a closed-form upper bound and a simulated lower bound are derived. Simulation results verify our analysis and show that:

•

The proposed list decoding algorithm with empirical divergence test outperforms sequential decoding in the high signal-to-noise ratio (SNR) region;

•

With tail-biting convolutional codes (TBCCs) as the basic codes, SRUMCCs under the proposed list decoding perform comparably to polar codes under the constraint of equal decoding delay.

Index Terms:

block Markov superposition transmission, empirical divergence test, successive cancellation list decoding, ultra-reliable and low latency communication (URLLC), unit memory convolutional code.

I Introduction

The channel coding theorem states that reliable transmission with an arbitrarily low error rate is possible, as the coding length grows without bound, as long as the transmission rate is below the channel capacity [1]. The theorem was proved using the random code ensemble, which usually has no efficient encoding and decoding algorithms. Therefore, much effort has been devoted to constructing capacity-approaching channel codes with acceptable encoding and decoding complexity. Channel codes fall into two main types: block codes and convolutional codes. In block coding, the information sequence is divided into $k$-bit blocks, each of which is encoded independently. A number of powerful iteratively decodable block codes with long block lengths have been proposed. For example, low-density parity-check (LDPC) codes [2] and turbo codes [3] perform near the Shannon limit under iterative belief propagation (BP) decoding. In contrast to block codes, convolutional codes [4] are stream-oriented: the output of a convolutional encoder depends not only on the current input but also on previous inputs. Classical convolutional codes typically have small constraint lengths and hence perform far from the Shannon limit. An important class of convolutional codes is the unit memory convolutional codes (UMCCs) [5], since any convolutional code can be interpreted as a UMCC. It was pointed out in [5] that UMCCs always achieve the largest free distance among all convolutional codes with the same rate and number of encoder states, indicating that UMCCs perform better than classical convolutional codes with the same decoding complexity. The distance profile of time-varying UMCCs was analyzed in [6], and good UMCCs were designed by search algorithms in [7, 8]. Efficient decoding algorithms for UMCCs were investigated in [9, 10, 11].
Since the rediscovery of LDPC codes, a class of capacity-approaching convolutional codes, called LDPC convolutional codes [12] or spatially coupled LDPC codes [13], has been constructed by coupling the parity-check matrices of LDPC block codes. Capacity-approaching convolutional codes can also be constructed by coupling the generator matrices of block codes [14].

The aforementioned codes, designed to approach the channel capacity, are not suitable for emerging delay-sensitive applications. In particular, ultra-reliable and low-latency communication (URLLC), which targets services with strict latency constraints such as automated driving, medical applications, industrial automation and augmented/virtual reality, has attracted increasing attention. Hence, it has become important to design efficient channel codes of short and moderate length (e.g., a thousand or fewer information bits) [15]. One solution is to construct LDPC codes by the progressive edge growth (PEG) algorithm [16], which delivers better codes than random construction in the short block length regime. Polar codes [17], another promising solution for short packet transmission, have been adopted by the 5G standard [18] for the control channel. Many works on constructions, decoding algorithms and decoder implementations of short polar codes have been reported [19, 20, 21, 22, 23]. Powerful classical short codes with near-maximum-likelihood decoding algorithms have also been investigated for low-latency communication. In [15], extended Bose-Chaudhuri-Hocquenghem (BCH) codes were shown to perform near the normal approximation benchmark under ordered statistics decoding (OSD) [24]. As shown in [25], in the short block length regime, tail-biting convolutional codes (TBCCs) with the wrap-around Viterbi algorithm (WAVA) [26] significantly outperform state-of-the-art iterative coding schemes. For streaming services with strict latency constraints, such as real-time online games and video conferencing, convolutional codes with a small decoding window size can be an alternative choice. The comparisons in [27, 28] between convolutional codes and PEG-LDPC codes showed that convolutional codes outperform LDPC codes at very short delays when the bit error rate is used as the performance metric.

In [29], we proposed a class of block-oriented convolutional codes, named semi-random block oriented convolutional codes (SRBOCCs), which reduce to semi-random UMCCs (SRUMCCs) when the encoding memory $m$ is set to one. In [30], taking truncated convolutional codes as the basic codes, we proposed a list decoding algorithm for SRUMCCs. We also showed in [31] that the performance can be further improved by taking TBCCs as the basic codes. Extending [29, 30, 31], this paper presents the SRUMCCs in more detail.

The encoding of SRUMCCs consists of a structured coding process and a random coding process. At each time, the input information block (referred to as a sub-frame) is first encoded by a structured basic code and then superimposed with a random linear transformation of the previous codeword, resulting in a sub-block for transmission. Unlike in classical UMCCs, the input to the SRUMCC encoder at each time unit has the same length as the dimension of the basic code, which is typically large (e.g., $k\geqslant 32$). Since the encoder state then takes $2^{k}$ values, decoding SRUMCCs with the Viterbi algorithm is impractical. Also because of the block-oriented structure, it makes sense to introduce the average sub-frame error rate as a performance metric, in addition to the commonly used bit error rate (BER) and/or frame error rate (FER).

Another distinguishing feature of SRUMCCs is the randomness introduced by the random linear transformation, which is critical to developing a successive cancellation list decoding algorithm. The basic idea is to find a list of candidate codewords for the first sub-frame and then to identify the transmitted one from the list based on the statistical behavior (in terms of the empirical divergence) of the second sub-frame. Even a few errors in the first sub-frame are spread by the random transformation, resulting in a detectable effect on the second sub-frame. Hence, the correct candidate can be reliably distinguished from the erroneous ones.

With the proposed successive cancellation list decoding, the SRUMCCs have the following three attractive features.

•

The construction of SRUMCCs is flexible, in the sense that any codes with fast encoding algorithms and efficient list decoding algorithms can be taken as the basic codes. This suggests that the SRUMCCs can support a wide range of code rates by simply choosing the basic codes with the desired rate.

•

The performance of the successive cancellation list decoding algorithm depends critically on the performance of the first sub-frame, which can be predicted analytically by an upper bound derived from the weight enumerating functions (WEFs) of the basic codes. Simulation results show that, in the high SNR region, the performance of SRUMCCs is well predicted by the upper bounds.

•

The performance-complexity tradeoff and the performance-delay tradeoff can be achieved by adjusting the statistical threshold and the decoding window size.

This paper is organized as follows. In Section II, we present the encoding algorithm of the SRUMCCs. In Section III, the list decoding with empirical divergence test is proposed. In Section IV, by analyzing the performance and decoding complexity, the performance-complexity tradeoff and performance-delay tradeoff are discussed. Simulation results are presented in Section V. Finally, some concluding remarks are given in Section VI.

II Semi-Random Unit Memory Convolutional Code

II-A Encoding Algorithm

Let $\boldsymbol{u}=(\boldsymbol{u}^{(0)},\boldsymbol{u}^{(1)},\cdots,\boldsymbol{u}^{(L-1)})$ be the data to be transmitted, where $\boldsymbol{u}^{(t)}=(u^{(t)}_{0},u^{(t)}_{1},\cdots,u^{(t)}_{k-1})\in\mathbb{F}_{2}^{k}$ for $0\leqslant t\leqslant L-1$ . Taking a binary linear code $\mathscr{C}$ of dimension $k$ and length $n$ as the basic code, the encoding algorithm of the SRUMCC is described in Algorithm 1 (see Fig. 1 for reference). The code rate of the SRUMCC is $R=k/n\times L/(L+1)$ , which is slightly less than that of the basic code $\mathscr{C}$ . However, the rate loss is negligible for large $L$ .
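Since Algorithm 1 is not reproduced here, the following Python sketch illustrates the encoding under stated assumptions rather than the authors' implementation. It assumes, consistent with the UMCC form given in the Remarks, that $\boldsymbol{v}^{(t)}=\boldsymbol{u}^{(t)}\mathbf{S}$ with $\mathbf{S}$ a $k\times n$ generator matrix of $\mathscr{C}$, that $\boldsymbol{c}^{(t)}=\boldsymbol{v}^{(t)}+\boldsymbol{v}^{(t-1)}\mathbf{R}$ with $\boldsymbol{v}^{(-1)}=\boldsymbol{0}$, and that one all-zero block is appended so that $L+1$ sub-blocks are transmitted, which matches the rate $R=k/n\times L/(L+1)$. The toy matrices and helper names are hypothetical.

```python
import random

def gf2_vec_mat(vec, mat):
    """Row vector times matrix over GF(2); `mat` is a list of rows."""
    out = [0] * len(mat[0])
    for bit, row in zip(vec, mat):
        if bit:
            out = [a ^ b for a, b in zip(out, row)]
    return out

def srumcc_encode(u_blocks, S, R):
    """Hypothetical SRUMCC encoder returning the L+1 coded sub-blocks c^(0..L).

    v^(t) = u^(t) S (structured part) and c^(t) = v^(t) + v^(t-1) R (random part),
    with v^(-1) = 0 and an all-zero block u^(L) appended for termination.
    """
    k, n = len(S), len(S[0])
    prev_v = [0] * n
    coded = []
    for u in list(u_blocks) + [[0] * k]:
        v = gf2_vec_mat(u, S)                 # encode the sub-frame with the basic code
        shift = gf2_vec_mat(prev_v, R)        # random transformation of previous codeword
        coded.append([a ^ b for a, b in zip(v, shift)])
        prev_v = v
    return coded

# Toy example (hypothetical parameters): a [4, 2] basic code and a random 4x4 R.
rng = random.Random(1)
k, n, L = 2, 4, 3
S = [[1, 0, 1, 1],
     [0, 1, 0, 1]]
R = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
u = [[rng.randint(0, 1) for _ in range(k)] for _ in range(L)]
c = srumcc_encode(u, S, R)
print(len(c), L * k / (len(c) * n))  # 4 0.375
```

The printed rate $6/16=0.375$ equals $k/n\times L/(L+1)=\frac{1}{2}\times\frac{3}{4}$, showing the small rate loss caused by the extra sub-block.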

Remarks: Recall that classical UMCCs are encoded by computing $\boldsymbol{c}^{(t)}=\boldsymbol{u}^{(t)}\mathbf{G}_{0}+\boldsymbol{u}^{(t-1)}\mathbf{G}_{1}$ for $t\geqslant 0$. The proposed SRUMCCs can thus be viewed as a special class of UMCCs with $\mathbf{G}_{0}=\mathbf{S}$ and $\mathbf{G}_{1}=\mathbf{SR}$, where one matrix is structured and the other is random, hence the name. Their special features are outlined below.

•

Unlike classical UMCCs, which typically have small $k$, SRUMCCs typically have large $k$ (hence a large constraint length) induced by the block-oriented encoding process, as is also the case for LDPC convolutional codes. It therefore makes sense to introduce the average sub-frame error rate, denoted fER and defined in the next subsection, as a new performance metric.

•

Due to the large constraint length, the Viterbi algorithm (VA), an efficient maximum likelihood decoding algorithm for classical UMCCs, is impractical for SRUMCCs, since the number of trellis states grows as $2^{k}$. Therefore, it is important to develop an efficient decoding algorithm for SRUMCCs, which is the main topic of this paper.

•

The encoding of SRUMCCs involves a structured coding process and a random coding process, hence the term “semi-random”. Good UMCCs with short constraint length are usually constructed by computer search [7], while good SRUMCCs can be constructed easily by generating $\mathbf{R}$ randomly. The randomness also helps the decoding of SRUMCCs, since any error pattern in the first sub-frame results in a detectable effect on the next sub-frame.

As a convolutional code, the SRUMCC has streaming properties: the encoded bits can be generated without waiting for the whole input sequence, and the received signal can be decoded by a sliding window decoding algorithm with a tunable delay. Unlike block codes, which meet a latency constraint through a short code length, the SRUMCC coded system meets it through a limited decoding window.

II-B Performance Metric

Suppose that $\boldsymbol{c}^{(t)}$ is modulated with binary phase-shift keying (BPSK) and transmitted over an additive white Gaussian noise (AWGN) channel, resulting in a noisy version $\boldsymbol{y}^{(t)}\in\mathbb{R}^{n}$ at the receiver. We focus on a sliding window decoding algorithm with decoding window size $w$, which attempts to recover $\boldsymbol{u}^{(t)}$ from $(\boldsymbol{y}^{(t)},\cdots,\boldsymbol{y}^{(t+w-1)})$. In other words, the decoding delay is $wn$ bits.

Given a decoding algorithm, define ${\rm fER}_{t}$ for $0\leqslant t\leqslant L-1$ as the probability that the decoding result $\hat{\boldsymbol{u}}^{(t)}$ is not equal to the transmitted vector $\boldsymbol{u}^{(t)}$ and $\rm FER$ as the probability that the decoding result $\hat{\boldsymbol{u}}$ is not equal to $\boldsymbol{u}$ . It is not difficult to verify that

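$${\rm fER}_{0}\leqslant\max_{t}{\rm fER}_{t}\leqslant{\rm FER}\leqslant\sum_{t=0}^{L-1}{\rm fER}_{t}. \tag{1}$$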

We define

$${\rm fER}=\frac{1}{L}\sum_{t=0}^{L-1}{\rm fER}_{t}, \tag{2}$$

which is used as the performance metric in this paper (for conventional block codes, such as polar codes, we define $\rm fER$ as the probability that the decoded codeword is not equal to the transmitted codeword, that is, $\rm fER=FER$) and can be evaluated in practice by

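$${\rm fER}=\frac{\text{number of erroneously decoded sub-frames}}{\text{total number of transmitted sub-frames}}. \tag{3}$$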
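A minimal Python sketch of how these quantities might be estimated from simulation records. The input layout `errors[f][t]` (True when sub-frame $t$ of frame $f$ is decoded wrongly) and the function name are assumptions for illustration. The sketch also checks the ordering $\max_{t}{\rm fER}_{t}\leqslant{\rm FER}\leqslant\sum_{t}{\rm fER}_{t}$, which holds exactly for empirical counts as well.

```python
def error_rates(errors):
    """Estimate (fER_t for each t, fER, FER) from per-sub-frame error indicators.

    errors[f][t] is True if sub-frame t of frame f was decoded wrongly
    (hypothetical layout: one row of L sub-frames per transmitted frame).
    """
    num_frames, L = len(errors), len(errors[0])
    fer_t = [sum(frame[t] for frame in errors) / num_frames for t in range(L)]
    # fER: erroneously decoded sub-frames over all transmitted sub-frames
    fer = sum(sum(frame) for frame in errors) / (num_frames * L)
    # FER: a frame is in error if any of its sub-frames is wrong
    FER = sum(any(frame) for frame in errors) / num_frames
    return fer_t, fer, FER

errors = [[False, False, False],
          [False, True,  True],
          [True,  True,  True],
          [False, False, False]]
fer_t, fer, FER = error_rates(errors)
# fER is the average of the fER_t, and fER_0 <= max_t fER_t <= FER <= sum_t fER_t
assert abs(fer - sum(fer_t) / len(fer_t)) < 1e-12
assert fer_t[0] <= max(fer_t) <= FER <= sum(fer_t)
print(fer_t, fer, FER)
```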

The event that the decoding result $\hat{\boldsymbol{u}}^{(0)}$ is not equal to the transmitted vector $\boldsymbol{u}^{(0)}$ is referred to as the first error event $E_{0}$ . In general, we say that the first error event occurs at time $t$ , which is denoted by $E_{t}$ , if $\hat{\boldsymbol{u}}^{(i)}=\boldsymbol{u}^{(i)}$ for all $i<t$ but $\hat{\boldsymbol{u}}^{(t)}\neq\boldsymbol{u}^{(t)}$ . The probability that the first error event occurs at time $t$ can be bounded by

$$\Pr\{E_{t}\}\leqslant{\rm fER}_{0}, \tag{4}$$

since, given that $\hat{\boldsymbol{u}}^{(t-1)}$ is correct, the $t$-th sub-frame performs no worse than the first sub-frame. In the worst case, the first error event at time $t$ causes catastrophic error propagation; that is, $E_{t}$ can cause $\hat{\boldsymbol{u}}^{(j)}\neq\boldsymbol{u}^{(j)}$ for all $j>t$. Since an error in the $t$-th sub-frame implies that the first error event occurred at some time $i\leqslant t$, ${\rm fER}_{t}$ can be bounded by

$${\rm fER}_{t}\leqslant\sum_{i=0}^{t}\Pr\{E_{i}\}\leqslant(t+1)\,{\rm fER}_{0}. \tag{5}$$

Therefore, the $\rm fER$ can be upper bounded by

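$${\rm fER}\leqslant\frac{1}{L}\sum_{t=0}^{L-1}(t+1)\,{\rm fER}_{0}=\frac{L+1}{2}\cdot{\rm fER}_{0}. \tag{6}$$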
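The closed form follows from $\sum_{t=0}^{L-1}(t+1)=L(L+1)/2$. The short Python check below, with a made-up value of ${\rm fER}_{0}$, compares the averaged per-sub-frame bound with $\frac{L+1}{2}{\rm fER}_{0}$ using exact rational arithmetic; it illustrates that, under worst-case error propagation, the bound grows linearly with $L$.

```python
from fractions import Fraction

def bound_by_summation(fer0, L):
    """(1/L) * sum_{t=0}^{L-1} (t + 1) * fER_0, averaging the per-sub-frame bounds."""
    return sum((t + 1) * fer0 for t in range(L)) / L

def bound_closed_form(fer0, L):
    """Closed form (L + 1) / 2 * fER_0."""
    return Fraction(L + 1, 2) * fer0

fer0 = Fraction(1, 10_000)  # hypothetical first-sub-frame error rate
for L in (1, 10, 50, 1000):
    assert bound_by_summation(fer0, L) == bound_closed_form(fer0, L)
print(float(bound_closed_form(fer0, 50)))  # 0.00255
```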

III Successive Cancellation List Decoding

As convolutional codes with large constraint length, SRUMCCs typically cannot be decoded by the VA. The sub-optimal sequential decoding mentioned in [29] can be employed for decoding SRUMCCs, although it requires a large amount of stack memory.

In this paper, we propose a sliding window decoding algorithm with successive cancellation. The first and critical step is to reliably recover $\boldsymbol{v}^{(0)}$, which is not interfered with by any other sub-frame. After the effect of the first sub-frame is removed, the second sub-frame is decoded in the same way, and this process continues until all sub-frames are decoded. In this section, we focus on methods to estimate $\boldsymbol{v}^{(0)}$ from $(\boldsymbol{y}^{(0)},\boldsymbol{y}^{(1)})$. The complete decoding algorithm is summarized in Algorithm 2, and the extension to recovering $\boldsymbol{v}^{(0)}$ from $(\boldsymbol{y}^{(0)},\boldsymbol{y}^{(1)},\boldsymbol{y}^{(2)})$ is discussed in Subsection IV-B.

To illustrate the basic idea, we first introduce maximum a posteriori (MAP) decoding and maximum likelihood (ML) decoding, although both are impractical for large $k$. The list decoding with empirical divergence test is then proposed.

III-A Maximum A Posteriori Decoding

The MAP decoding is optimal in the sense that the error probability of $\boldsymbol{v}^{(0)}$ (i.e., $\rm fER_{0}$) is minimized. Without causing much ambiguity, we use $\hat{\boldsymbol{v}}^{(0)}$, $\hat{\boldsymbol{c}}^{(0)}$ and $\hat{\boldsymbol{u}}^{(0)}$ interchangeably in the remainder of this paper. The MAP decoder always outputs the codeword

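$$\hat{\boldsymbol{v}}^{(0)}=\arg\max_{\boldsymbol{v}^{(0)}}P(\boldsymbol{v}^{(0)}|\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)})=\arg\max_{\boldsymbol{v}^{(0)}}\frac{P(\boldsymbol{v}^{(0)})P(\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)})}{P(\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)})}, \tag{7}$$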

where $P(\cdot)$ denotes a probability mass (or density) function. Since the channel is memoryless and $\boldsymbol{v}^{(0)}$ is independent of $\boldsymbol{v}^{(1)}$, we have

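$$P(\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)})=\sum_{\boldsymbol{v}^{(1)}}P(\boldsymbol{v}^{(1)})P(\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)})=P(\boldsymbol{y}^{(0)}|\boldsymbol{v}^{(0)})\sum_{\boldsymbol{v}^{(1)}}P(\boldsymbol{v}^{(1)})P(\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)}). \tag{8}$$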

Therefore, we have

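$$P(\boldsymbol{v}^{(0)}|\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)})\propto P(\boldsymbol{y}^{(0)}|\boldsymbol{v}^{(0)})\sum_{\boldsymbol{v}^{(1)}}P(\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)}), \tag{9}$$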

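by noticing that $\frac{P(\boldsymbol{v}^{(0)})}{P(\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)})}$ is constant for all $\boldsymbol{v}^{(0)}$ and that $P(\boldsymbol{v}^{(1)})$ is the same for all $\boldsymbol{v}^{(1)}\in\mathscr{C}$ (uniformly distributed information blocks).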

To find such a codeword $\hat{\boldsymbol{v}}^{(0)}$, the MAP decoder explores all $2^{2k}$ possible pairs $(\boldsymbol{v}^{(0)},\boldsymbol{v}^{(1)})$, which implies that the complexity grows exponentially with the dimension $k$ of the basic code.
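To make the exhaustive search concrete, here is a brute-force MAP sketch in Python for a toy basic code. It assumes BPSK with $0\mapsto+1$ and $1\mapsto-1$, AWGN with noise variance `sigma2`, $\boldsymbol{c}^{(1)}=\boldsymbol{v}^{(1)}+\boldsymbol{v}^{(0)}\mathbf{R}$, and uniformly distributed codewords; the toy matrices, noise values and helper names are hypothetical. The nested loop visits all $2^{k}\times 2^{k}$ pairs, which is exactly what makes MAP decoding impractical for $k\geqslant 32$.

```python
import itertools
import math

def gf2_vec_mat(vec, mat):
    """Row vector times matrix over GF(2)."""
    out = [0] * len(mat[0])
    for bit, row in zip(vec, mat):
        if bit:
            out = [a ^ b for a, b in zip(out, row)]
    return out

def codebook(S):
    """All 2^k codewords u S of the basic code."""
    return [gf2_vec_mat(u, S) for u in itertools.product((0, 1), repeat=len(S))]

def log_lik(y, bits, sigma2):
    """log P(y | bits) up to a common constant; BPSK 0 -> +1, 1 -> -1 over AWGN."""
    return -sum((yi - (1 - 2 * b)) ** 2 for yi, b in zip(y, bits)) / (2 * sigma2)

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def map_first_subframe(y0, y1, S, R, sigma2):
    """argmax over v0 of P(y0|v0) * sum_{v1} P(y1|v0 v1), computed in the log domain."""
    C = codebook(S)
    def score(v0):
        shift = gf2_vec_mat(v0, R)                      # v^(0) R
        inner = [log_lik(y1, [a ^ b for a, b in zip(v1, shift)], sigma2) for v1 in C]
        return log_lik(y0, v0, sigma2) + logsumexp(inner)
    return max(C, key=score)

S = [[1, 0, 1, 1], [0, 1, 0, 1]]
R = [[1, 1, 0, 1], [0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 1, 0]]
v0, v1 = [1, 0, 1, 1], [0, 1, 0, 1]
c1 = [a ^ b for a, b in zip(v1, gf2_vec_mat(v0, R))]
y0 = [1 - 2 * b + e for b, e in zip(v0, (0.3, -0.2, 0.1, 0.4))]
y1 = [1 - 2 * b + e for b, e in zip(c1, (-0.1, 0.2, 0.3, -0.3))]
print(map_first_subframe(y0, y1, S, R, sigma2=0.5))  # [1, 0, 1, 1]
```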

III-B Maximum Likelihood Decoding

Unlike the MAP decoder, the ML decoder minimizes the error probability of the pair $(\boldsymbol{v}^{(0)},\boldsymbol{v}^{(1)})$. It selects $(\hat{\boldsymbol{v}}^{(0)},\hat{\boldsymbol{v}}^{(1)})$ such that

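$$(\hat{\boldsymbol{v}}^{(0)},\hat{\boldsymbol{v}}^{(1)})=\arg\max_{(\boldsymbol{v}^{(0)},\boldsymbol{v}^{(1)})}P(\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)}). \tag{10}$$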

Since the channel is memoryless, we have

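$$P(\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)})=P(\boldsymbol{y}^{(0)}|\boldsymbol{v}^{(0)})P(\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)}). \tag{11}$$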

Equivalently, the ML decoder outputs the codeword

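$$\hat{\boldsymbol{v}}^{(0)}=\arg\max_{\boldsymbol{v}^{(0)}}P(\boldsymbol{y}^{(0)}|\boldsymbol{v}^{(0)})\Big[\max_{\boldsymbol{v}^{(1)}}P(\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)})\Big]. \tag{12}$$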

The ML decoding can also be viewed as an approximation to the MAP decoding, since the term $\max_{\boldsymbol{v}^{(1)}}P(\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)})$ is typically the dominant term in $\sum_{\boldsymbol{v}^{(1)}}P(\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)})$.

Given $\boldsymbol{v}^{(0)}$, the inner maximization over $\boldsymbol{v}^{(1)}$ in (12) can be performed by running the Viterbi algorithm (VA) on the basic code after removing the contribution of $\boldsymbol{v}^{(0)}\mathbf{R}$ from $\boldsymbol{y}^{(1)}$, which is more efficient than exploring all possible $\boldsymbol{v}^{(1)}$. Unfortunately, there is no efficient algorithm for the outer maximization other than exploring all $2^{k}$ possible $\boldsymbol{v}^{(0)}$, so the complexity, although lower than that of MAP decoding, still grows exponentially with the dimension of the basic code.

III-C List Decoding with Empirical Divergence Test

One obvious way to reduce the complexity of the ML decoding is to limit the search space for $\boldsymbol{v}^{(0)}$ . Let $\mathcal{L}\subset\mathscr{C}$ be a list of $\ell_{\rm max}$ codewords. The decoder outputs the codeword

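$$\hat{\boldsymbol{v}}^{(0)}=\arg\max_{\boldsymbol{v}^{(0)}\in\mathcal{L}}P(\boldsymbol{y}^{(0)}|\boldsymbol{v}^{(0)})\Big[\max_{\boldsymbol{v}^{(1)}}P(\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)})\Big]. \tag{13}$$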

Clearly, if the transmitted codeword $\boldsymbol{v}^{(0)}$ is included in the list $\mathcal{L}$, the decoder with reduced search space performs no worse than the ML decoder. Conversely, an error must occur if the transmitted codeword is not in the list. Therefore, we need to efficiently generate a list $\mathcal{L}$ that contains the transmitted codeword with high probability.
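The ML rule and its list-restricted version can be sketched by brute force in Python. This is an illustration under hypothetical assumptions (BPSK $0\mapsto+1$, AWGN with variance `sigma2`, $\boldsymbol{c}^{(1)}=\boldsymbol{v}^{(1)}+\boldsymbol{v}^{(0)}\mathbf{R}$, a toy $[4,2]$ basic code): with `candidates=None` the outer search covers all of $\mathscr{C}$, and passing a list restricts it to $\mathcal{L}$. The inner maximization is exhaustive here and merely stands in for the VA.

```python
import itertools

def gf2_vec_mat(vec, mat):
    """Row vector times matrix over GF(2)."""
    out = [0] * len(mat[0])
    for bit, row in zip(vec, mat):
        if bit:
            out = [a ^ b for a, b in zip(out, row)]
    return out

def codebook(S):
    return [gf2_vec_mat(u, S) for u in itertools.product((0, 1), repeat=len(S))]

def log_lik(y, bits, sigma2):
    """log P(y | bits) up to a common constant; BPSK 0 -> +1, 1 -> -1 over AWGN."""
    return -sum((yi - (1 - 2 * b)) ** 2 for yi, b in zip(y, bits)) / (2 * sigma2)

def ml_first_subframe(y0, y1, S, R, sigma2, candidates=None):
    """Outer max over v0 (all of C, or only `candidates`); inner max over all v1."""
    C = codebook(S)
    pool = C if candidates is None else candidates
    def score(v0):
        shift = gf2_vec_mat(v0, R)
        inner = max(log_lik(y1, [a ^ b for a, b in zip(v1, shift)], sigma2) for v1 in C)
        return log_lik(y0, v0, sigma2) + inner
    return max(pool, key=score)

S = [[1, 0, 1, 1], [0, 1, 0, 1]]
R = [[1, 1, 0, 1], [0, 1, 1, 1], [1, 0, 1, 0], [1, 1, 1, 0]]
v0, v1 = [1, 0, 1, 1], [0, 1, 0, 1]
c1 = [a ^ b for a, b in zip(v1, gf2_vec_mat(v0, R))]
y0 = [1 - 2 * b + e for b, e in zip(v0, (0.3, -0.2, 0.1, 0.4))]
y1 = [1 - 2 * b + e for b, e in zip(c1, (-0.1, 0.2, 0.3, -0.3))]

print(ml_first_subframe(y0, y1, S, R, 0.5))                                   # full search
print(ml_first_subframe(y0, y1, S, R, 0.5, [[1, 1, 1, 0], v0]))               # v0 in the list
print(ml_first_subframe(y0, y1, S, R, 0.5, [[1, 1, 1, 0], [0, 0, 0, 0]]))     # v0 missing
```

The last call illustrates the remark above: when the transmitted codeword is absent from the list, the decision is necessarily wrong.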

We assume that the basic code $\mathscr{C}$ can be efficiently list decoded, i.e., a decoder can output a list of candidate codewords. To avoid messy notation, we omit the superscript of $\boldsymbol{v}^{(0)}$ and assume that a codeword $\boldsymbol{v}\in\mathscr{C}$ is transmitted. Upon receiving its noisy version $\boldsymbol{y}=(y_{0},y_{1},\cdots,y_{n-1})$, the decoder serially outputs a list of candidate codewords $\hat{\boldsymbol{v}}_{\ell}$, $\ell=1,2,\cdots,\ell_{\max}$, where $\ell_{\max}$ is a parameter that trades off performance against complexity. We do not focus on the implementation details in this paper but simply run the serial list Viterbi algorithm (SLVA) [32] over the trellis of the basic code. For ease of notation, we use SLVA($\boldsymbol{y}$, $\ell$) to denote the $\ell$-th output of the SLVA. In particular, SLVA($\boldsymbol{y}$, 1), simply denoted by VA($\boldsymbol{y}$), is the output of the VA.

The list decoding is successful if the transmitted codeword occurs in the list. Clearly, the probability of successful list decoding can be made as high as required by enlarging the list size $\ell_{\max}$. Example 1 shows the performance of a TBCC under list decoding.

Example 1

The $16$-state $(2,1,4)$ TBCC defined by the polynomial generator matrix [33] $G(D)=[D^{4}+D^{2}+D+1,D^{4}+D^{3}+1]$ (denoted $[27,31]_{8}$ in octal form for short) with information length $k=32$ ($n=64$) is considered. Its list decoding performance is shown in Fig. 2.
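For readers who want to reproduce the basic code, here is a hedged Python sketch of rate-1/2 tail-biting encoding with $G(D)=[D^{4}+D^{2}+D+1,D^{4}+D^{3}+1]$. It reads bit $l$ of each octal generator as the coefficient of $D^{l}$, which reproduces the stated polynomials from $[27,31]_{8}$, and it interleaves the two output streams; the output ordering and the function name are assumptions. Tail-biting is realized as circular convolution, i.e., the encoder starts in the state given by the last four information bits.

```python
def tbcc_encode(u, generators=(0o27, 0o31), memory=4):
    """Rate-1/2 tail-biting convolutional encoding by circular convolution.

    Bit l of each octal generator is the coefficient of D^l, so 0o27 gives
    1 + D + D^2 + D^4 and 0o31 gives 1 + D^3 + D^4. Outputs are interleaved
    as (v_0^(1), v_0^(2), v_1^(1), v_1^(2), ...).
    """
    k = len(u)
    taps = [[(g >> l) & 1 for l in range(memory + 1)] for g in generators]
    out = []
    for j in range(k):
        for g in taps:
            # index (j - l) mod k wraps around: the initial state is the last `memory` bits
            out.append(sum(g[l] & u[(j - l) % k] for l in range(memory + 1)) % 2)
    return out

u = [0] * 32
u[0] = 1
v = tbcc_encode(u)
print(len(v), sum(v))  # 64 7
```

The single-one input yields a codeword of weight $4+3=7$, the sum of the generator weights, and $k=32$ information bits give $n=64$ coded bits as in Example 1.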

For a large list size (e.g., $\ell_{\rm max}=64$), the transmitted codeword is included in the list with high probability. However, as tabulated in Table I, the average list size required to contain the correct candidate codeword is much smaller than $\ell_{\rm max}=64$, which implies that in most cases a short list suffices. To reduce the complexity, serial list decoding is employed for the basic codes so that the decoder can exit as soon as the correct candidate codeword is identified. A question then arises: how can the correctness of a candidate codeword be checked? One solution is to invoke a cyclic redundancy check (CRC), as in CRC-aided polar codes [34]. However, the overhead (rate loss) due to the CRC can be intolerable, especially for a short basic code. Motivated by jointly typical set decoding, which is employed in [35, Section 3.2] to prove the channel coding theorem, we consider checking the correctness of a candidate codeword by typicality. The list decoding process terminates once a candidate codeword is found to be “jointly typical” with the received signal.

To proceed, we need the following concept. For the received signal $\boldsymbol{y}=(y_{0},\cdots,y_{n-1})\in\mathbb{R}^{n}$ , we define an empirical divergence function (EDF) as

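$$D(\boldsymbol{x},\boldsymbol{y})=\frac{1}{n}\log_{2}\frac{P(\boldsymbol{y}|\boldsymbol{x})}{P(\boldsymbol{y})}, \tag{14}$$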

for $\boldsymbol{x}\in\mathbb{F}_{2}^{n}$ , where

$$P(\boldsymbol{y})=\prod_{i=0}^{n-1}\left(\frac{1}{2}P(y_{i}|0)+\frac{1}{2}P(y_{i}|1)\right). \tag{15}$$

Note that, in the above definition, $P(\boldsymbol{y})$ is not equal to $2^{-k}\sum_{\boldsymbol{v}\in\mathscr{C}}P(\boldsymbol{y}|\boldsymbol{v})$ but to $2^{-n}\sum_{\boldsymbol{x}\in\mathbb{F}_{2}^{n}}P(\boldsymbol{y}|\boldsymbol{x})$. Also note that $\boldsymbol{x}$ is not necessarily a codeword of $\mathscr{C}$. In particular, we are interested in the following cases.
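For BPSK ($0\mapsto+1$, $1\mapsto-1$) over AWGN with noise variance $\sigma^{2}$, which is our assumption here, the Gaussian normalizing constants cancel in $P(y_{i}|x_{i})/P(y_{i})$, and each symbol contributes $1-\log_{2}\big(1+e^{-2s_{i}y_{i}/\sigma^{2}}\big)$ with $s_{i}=\phi(x_{i})$. A minimal Python sketch of the EDF in this form (function names are illustrative, not from the paper):

```python
import math

def log2_1p_exp(z):
    """log2(1 + e^z), computed without overflow for large |z|."""
    if z > 0:
        return (z + math.log1p(math.exp(-z))) / math.log(2)
    return math.log1p(math.exp(z)) / math.log(2)

def edf(x, y, sigma2):
    """Empirical divergence D(x, y) = (1/n) log2 [P(y|x) / P(y)] for BPSK over AWGN.

    x is any binary vector (not necessarily a codeword), y the received vector,
    and P(y) = prod_i (P(y_i|0) + P(y_i|1)) / 2 as in the definition.
    """
    total = 0.0
    for xi, yi in zip(x, y):
        s = 1 - 2 * xi                          # BPSK symbol phi(x_i)
        total += 1.0 - log2_1p_exp(-2.0 * s * yi / sigma2)
    return total / len(y)

print(edf([0, 1, 0], [0.9, -1.1, 0.4], sigma2=0.5))  # positive: x matches the signs of y
print(edf([1, 0, 1], [0.9, -1.1, 0.4], sigma2=0.5))  # negative: every bit disagrees
```

Each per-symbol term is at most one bit, so $D(\boldsymbol{x},\boldsymbol{y})\leqslant 1$ for a binary input, while disagreeing positions can drive it strongly negative.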

1. If $\boldsymbol{v}$ is the transmitted codeword, we have $D(\boldsymbol{v},\boldsymbol{y})\approx I(X;Y)>0$, where $\approx$ indicates that, for large $n$, the EDF concentrates in probability around its expectation. Here $I(X;Y)$ is the mutual information between the channel output $Y$ and the uniform binary input $X$. To be precise, $D(\boldsymbol{v},\boldsymbol{y})\approx I(X;Y)$ means that, for an arbitrarily small positive number $\epsilon$,

$$\lim_{n\rightarrow\infty}P\left[\big|D(\boldsymbol{v},\boldsymbol{y})-I(X;Y)\big|\leqslant\epsilon\right]=1, \tag{16}$$

as guaranteed by the weak law of large numbers (WLLN).

2. If $\boldsymbol{x}$ is randomly generated (hence typically not equal to the transmitted codeword), from the WLLN, we have

$$D(\boldsymbol{x},\boldsymbol{y})\approx\mathbb{E}_{Y|V}\left[\frac{1}{2}\log_{2}\frac{P(Y|0)}{P(Y)}+\frac{1}{2}\log_{2}\frac{P(Y|1)}{P(Y)}\right], \tag{17}$$

which is negative by the concavity of $\log_{2}(\cdot)$, since $\frac{1}{2}\frac{P(Y|0)}{P(Y)}+\frac{1}{2}\frac{P(Y|1)}{P(Y)}=1$.

3. What are the typical values of $D(\hat{\boldsymbol{v}},\boldsymbol{y})$, where $\hat{\boldsymbol{v}}={\rm VA}(\boldsymbol{y})$? Given $\boldsymbol{y}$, the VA output maximizes $P(\boldsymbol{y}|\boldsymbol{v})$ over $\mathscr{C}$, so $D(\hat{\boldsymbol{v}},\boldsymbol{y})=\max_{\boldsymbol{v}\in\mathscr{C}}D(\boldsymbol{v},\boldsymbol{y})$, and we expect that $D(\hat{\boldsymbol{v}},\boldsymbol{y})\geqslant D(\boldsymbol{v},\boldsymbol{y})\approx I(X;Y)>0$.

4. What about $D(\tilde{\boldsymbol{v}},\tilde{\boldsymbol{y}})$? Here $\tilde{\boldsymbol{v}}={\rm VA}(\tilde{\boldsymbol{y}})$, where $\tilde{\boldsymbol{y}}=\boldsymbol{x}\odot\boldsymbol{y}$ with $\boldsymbol{x}$ being a totally random bipolar vector and $\odot$ denoting the component-wise product. That is, we first randomly flip the received vector $\boldsymbol{y}$ and then execute the VA to find the first candidate codeword $\tilde{\boldsymbol{v}}$. We expect that $D(\tilde{\boldsymbol{v}},\tilde{\boldsymbol{y}})$ lies between $D(\boldsymbol{v},\boldsymbol{y})$ of the first case and $D(\boldsymbol{x},\boldsymbol{y})$ of the second case.
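The first two cases can be checked numerically. The Python sketch below is an illustration, not the paper's experiment: it draws uniform bits, sends them by BPSK over AWGN with noise variance 0.5, and evaluates the EDF for the transmitted vector and for an independent random vector. By channel symmetry the particular transmitted vector does not matter, so uncoded bits stand in for a codeword; the EDF helper is repeated so the block runs on its own. For large $n$ the first value should settle near $I(X;Y)$ for this channel (about 0.72 bit at $\sigma^{2}=0.5$) and the second should be negative.

```python
import math
import random

def log2_1p_exp(z):
    """log2(1 + e^z), computed without overflow for large |z|."""
    if z > 0:
        return (z + math.log1p(math.exp(-z))) / math.log(2)
    return math.log1p(math.exp(z)) / math.log(2)

def edf(x, y, sigma2):
    """D(x, y) for BPSK (0 -> +1, 1 -> -1) over AWGN with noise variance sigma2."""
    total = 0.0
    for xi, yi in zip(x, y):
        s = 1 - 2 * xi
        total += 1.0 - log2_1p_exp(-2.0 * s * yi / sigma2)
    return total / len(y)

def edf_cases(n, sigma2, seed=0):
    """Return (D(v, y), D(x, y)) for transmitted v and an independent random x."""
    rng = random.Random(seed)
    noise_std = math.sqrt(sigma2)
    v = [rng.randint(0, 1) for _ in range(n)]                    # transmitted bits
    y = [(1 - 2 * b) + rng.gauss(0.0, noise_std) for b in v]     # BPSK over AWGN
    x = [rng.randint(0, 1) for _ in range(n)]                    # unrelated to y
    return edf(v, y, sigma2), edf(x, y, sigma2)

d_true, d_random = edf_cases(n=100_000, sigma2=0.5)
print(f"D(v, y) = {d_true:.3f}   D(x, y) = {d_random:.3f}")
```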

Example 2

We consider the TBCC in Example 1 again and set ${\rm SNR}=4~{}{\rm dB}$, at which the mutual information is $I(X;Y)\approx 0.79$. The histogram is shown in Fig. 3, from which we observe that $D(\boldsymbol{v},\boldsymbol{y})$ is likely to be large when $\boldsymbol{v}$ is the transmitted codeword (or the output of the VA corresponding to $\boldsymbol{y}$). Note that the statistical behavior of $D(\tilde{\boldsymbol{v}},\tilde{\boldsymbol{y}})$ differs from that of $D(\boldsymbol{x},\boldsymbol{y})$, since $\tilde{\boldsymbol{v}}$ depends on $\tilde{\boldsymbol{y}}$. The typical values of $D(\tilde{\boldsymbol{v}},\tilde{\boldsymbol{y}})$ are greater than those of $D(\boldsymbol{x},\boldsymbol{y})$ but less than those of $D(\boldsymbol{v},\boldsymbol{y})$.

The statistical behavior of the EDF can be exploited in decoding SRUMCCs. Since $\boldsymbol{c}^{(1)}=\boldsymbol{v}^{(1)}+\boldsymbol{v}^{(0)}\mathbf{R}$ and the BPSK signal of a modulo-2 sum is the component-wise product of the individual BPSK signals, when the decoding result of the first sub-frame $\hat{\boldsymbol{v}}^{(0)}$ equals $\boldsymbol{v}^{(0)}$, $\boldsymbol{y}^{(1)}\odot\phi(\hat{\boldsymbol{v}}^{(0)}\mathbf{R})$ is a Gaussian noisy version of $\boldsymbol{v}^{(1)}$, where $\phi(\hat{\boldsymbol{v}}^{(0)}\mathbf{R})$ is the BPSK signal corresponding to the binary vector $\hat{\boldsymbol{v}}^{(0)}\mathbf{R}$. In contrast, when $\hat{\boldsymbol{v}}^{(0)}\neq\boldsymbol{v}^{(0)}$, $\boldsymbol{y}^{(1)}\odot\phi(\hat{\boldsymbol{v}}^{(0)}\mathbf{R})$ is a randomly flipped Gaussian noisy version of $\boldsymbol{v}^{(1)}$, with the flips determined by $(\boldsymbol{v}^{(0)}+\hat{\boldsymbol{v}}^{(0)})\mathbf{R}$. Since these two cases affect the EDF differently, we can distinguish with high probability whether $\boldsymbol{y}^{(1)}\odot\phi(\hat{\boldsymbol{v}}^{(0)}\mathbf{R})$ is a randomly flipped Gaussian noisy version of $\boldsymbol{v}^{(1)}$ (equivalently, whether $\hat{\boldsymbol{v}}^{(0)}$ is erroneous).

Given $\boldsymbol{y}^{(0)}$, the SLVA is run to serially deliver a list of candidate codewords $\hat{\boldsymbol{v}}^{(0)}_{\ell}$, $1\leqslant\ell\leqslant\ell_{\rm max}$. For each candidate codeword, we define the soft metric

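$$M_{2}(\hat{\boldsymbol{v}}^{(0)}_{\ell})=D(\hat{\boldsymbol{v}}^{(0)}_{\ell},\boldsymbol{y}^{(0)})+D(\tilde{\boldsymbol{v}}_{\ell},\boldsymbol{y}^{(1)}\odot\phi(\hat{\boldsymbol{v}}^{(0)}_{\ell}\mathbf{R}))\qquad(18)$$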

where $\tilde{\boldsymbol{v}}_{\ell}$ is the output of the VA with $\boldsymbol{y}^{(1)}\odot\phi(\hat{\boldsymbol{v}}^{(0)}_{\ell}\mathbf{R})$ as the input. The first term on the right-hand side of (18) is the EDF between the candidate codeword and the received vector $\boldsymbol{y}^{(0)}$, while the second term is the EDF between $\boldsymbol{y}^{(1)}\odot\phi(\hat{\boldsymbol{v}}^{(0)}_{\ell}\mathbf{R})$ and its corresponding VA output $\tilde{\boldsymbol{v}}_{\ell}$. Both terms are likely to be large when the candidate codeword is the transmitted one. Heuristically, we therefore set a threshold on $M_{2}(\hat{\boldsymbol{v}}^{(0)}_{\ell})$ to check the correctness of the candidate codeword, as illustrated in Example 3.
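The soft metric can be sketched end to end for the $(2,1,4)$ basic code with $G(D)=[27,31]_{8}$. This is a minimal sketch under several assumptions: the octal taps are read with the most significant bit as the current input; the VA is a truncated, soft-decision decoder with a correlation metric (ML for BPSK over AWGN); the EDF takes the BPSK/AWGN form $\frac{1}{n}\sum_i\bigl[1-\log_2(1+e^{-2x_iy_i/\sigma^2})\bigr]$; and $M_{2}$ is the plain sum of the two EDF terms described above. All function names are hypothetical, and the SLVA itself is not implemented.

```python
import numpy as np

GENS, MEM = (0o27, 0o31), 4  # G(D) = [27, 31]_8; MSB of each tap mask = current input (assumed)

def parity(x: int) -> int:
    return bin(x).count("1") & 1

def conv_encode(u, gens=GENS, m=MEM):
    """Truncated (unterminated) feedforward encoder, one output bit per generator."""
    state, out = 0, []
    for bit in u:
        reg = (int(bit) << m) | state          # current input followed by the m previous inputs
        out.extend(parity(reg & g) for g in gens)
        state = reg >> 1                        # drop the oldest input
    return np.array(out, dtype=int)

def viterbi(y, gens=GENS, m=MEM):
    """Soft-decision VA with a correlation metric (bit 0 -> +1). The code is truncated,
    so the best final state is chosen. Returns the decoded codeword bits."""
    nout, ns = len(gens), 1 << m
    metric = np.full(ns, -np.inf)
    metric[0] = 0.0
    paths = [[] for _ in range(ns)]
    for t in range(len(y) // nout):
        yt = y[t * nout:(t + 1) * nout]
        new_metric, new_paths = np.full(ns, -np.inf), [None] * ns
        for s in range(ns):
            if np.isinf(metric[s]):
                continue                        # state not reachable yet
            for bit in (0, 1):
                reg = (bit << m) | s
                bm = sum(yt[j] * (1 - 2 * parity(reg & g)) for j, g in enumerate(gens))
                nxt = reg >> 1
                if metric[s] + bm > new_metric[nxt]:   # add-compare-select
                    new_metric[nxt] = metric[s] + bm
                    new_paths[nxt] = paths[s] + [bit]
        metric, paths = new_metric, new_paths
    return conv_encode(paths[int(np.argmax(metric))], gens, m)

def edf(v, y, sigma):
    """Assumed BPSK/AWGN EDF of a binary vector v against the received vector y."""
    x = 1 - 2 * np.asarray(v)
    return float(np.mean(1 - np.logaddexp(0, -2 * x * y / sigma**2) / np.log(2)))

def m2(v0_hat, y0, y1, R, sigma):
    """Assumed soft metric: D(v0_hat, y0) + D(v_tilde, y1 * phi(v0_hat R))."""
    y1_strip = y1 * (1 - 2 * ((v0_hat @ R) % 2))
    v_tilde = viterbi(y1_strip)
    return edf(v0_hat, y0, sigma) + edf(v_tilde, y1_strip, sigma)

rng = np.random.default_rng(3)
k, sigma = 32, 0.6
n = 2 * k
R = rng.integers(0, 2, (n, n))                  # hypothetical random matrix R
v0 = conv_encode(rng.integers(0, 2, k))
v1 = conv_encode(rng.integers(0, 2, k))
c1 = (v1 + v0 @ R) % 2                          # assumed superposition in sub-frame 1
y0 = (1 - 2 * v0) + sigma * rng.standard_normal(n)
y1 = (1 - 2 * c1) + sigma * rng.standard_normal(n)
wrong = conv_encode(rng.integers(0, 2, k))      # some other codeword of the basic code
print(m2(v0, y0, y1, R, sigma), m2(wrong, y0, y1, R, sigma))
```

Because the VA maximizes the likelihood over codewords and $p(y_i)$ does not depend on the codeword, its output also maximizes this assumed EDF, which is why the second term tends to be large whenever the stripping is correct.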

Example 3

The TBCC in Example 1 is taken as the basic code. We set ${\rm SNR}=3~{}{\rm dB}$ and $\ell_{\rm max}=64$. With the help of the histogram shown in Fig. 4, we set a threshold $T$ to distinguish the correct candidate codeword from erroneous ones. The candidate codeword $\hat{\boldsymbol{v}}^{(0)}_{\ell}$ is deemed correct only if $M_{2}(\hat{\boldsymbol{v}}^{(0)}_{\ell})\geqslant T$, where $T$ is usually set large (e.g., $T=1.2$ in this example) to reduce the probability that an erroneous candidate is mistaken for the correct one. The threshold $T$, which depends on the SNR and the coding parameters, can be determined off-line and stored for use in the decoding algorithm.

The list decoding algorithm, summarized in Algorithm 2, works as follows. The decoder employs the SLVA to generate candidate codewords one at a time and checks each of them against a preset threshold using (18), stopping at the first qualified one. If the list size reaches the maximum $\ell_{\rm max}$ and no candidate codeword qualifies, the decoder outputs the candidate $\hat{\boldsymbol{v}}^{(0)}_{\ell}$ with the largest $M_{2}(\hat{\boldsymbol{v}}^{(0)}_{\ell})$.
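The stopping rule just described can be sketched independently of the trellis machinery. Here `candidates` is any iterator yielding codewords in SLVA order and `metric` computes $M_{2}$; both, like the function name `list_decode`, are hypothetical stand-ins, since neither the SLVA nor the full Algorithm 2 is reproduced in this section.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")

def list_decode(candidates: Iterable[T],
                metric: Callable[[T], float],
                threshold: float,
                l_max: int) -> Tuple[Optional[T], int]:
    """Accept the first candidate whose metric reaches the threshold; if none of the
    first l_max candidates qualifies, return the one with the largest metric.
    Also returns how many candidates were tested (the list size actually used)."""
    best, best_score, tried = None, float("-inf"), 0
    for cand in candidates:
        tried += 1
        score = metric(cand)
        if score >= threshold:
            return cand, tried          # early stop on a qualified candidate
        if score > best_score:
            best, best_score = cand, score
        if tried == l_max:
            break
    return best, tried                  # fallback: max-metric candidate

# Toy usage with precomputed metric values standing in for M_2.
scores = {"a": 0.9, "b": 1.1, "c": 1.3, "d": 1.5}
print(list_decode(iter("abcd"), scores.get, threshold=1.2, l_max=64))
print(list_decode(iter("abcd"), scores.get, threshold=2.0, l_max=2))
```

The second call shows the fallback: with an unreachable threshold and $\ell_{\rm max}=2$, the better of the two tested candidates is returned.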

Remarks: The proposed list decoding algorithm is similar to Feinstein’s suboptimal decoder presented in [36, Theorem 18], a conceptual algorithm used to derive performance bounds for finite-length block codes. In Feinstein’s decoder, all codewords are tested one by one, in a preset order that is independent of the received signal, by calculating the EDF. The first codeword whose EDF exceeds a fixed threshold is taken as the decoding output. This algorithm is rarely used in practice, since the average number of tests needed to find the correct codeword is of the same order as the size of the codebook. In our algorithm, the codewords are tested serially in an order that is closely related to the received signal, namely the order determined by the SLVA on the first sub-frame. The first codeword whose EDF-based metric exceeds an off-line learned, tunable threshold is taken as the decoding output. As the simulation results show, the transmitted codeword can be found with a small number of tests (hence low complexity), especially in the high SNR region. Compared to a fixed threshold, a tunable threshold is more attractive, since adjusting it allows the performance-complexity tradeoff to be controlled.

It is worth pointing out that, besides the EDF, the likelihood function (or, for the AWGN channel, the Euclidean distance) can also be employed as the test metric. In this paper, we define the soft metric based on the EDF since it applies to general channels. Another advantage of the EDF is that it simplifies threshold design. As discussed above, the expectation of the EDF between the transmitted codeword and the received signal equals the mutual information between the channel input and output. Therefore, a rough threshold can be set directly from the mutual information, which is readily computable.
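For BPSK over AWGN, the mutual information that anchors such a rough threshold can be computed numerically. The sketch below assumes equiprobable $\pm1$ inputs and noise standard deviation $\sigma$, and uses Gauss-Hermite quadrature (`numpy.polynomial.hermite.hermgauss`) rather than Monte Carlo simulation. The function name is hypothetical, and the mapping from the SNR convention used in the figures to $\sigma$ is not fixed here.

```python
import numpy as np

def bpsk_awgn_mi(sigma: float, nodes: int = 80) -> float:
    """I(X;Y) in bits for equiprobable BPSK (+/-1) over AWGN with noise std sigma.
    By symmetry, I = 1 - E[log2(1 + exp(-2Y/sigma^2))] with Y ~ N(1, sigma^2)."""
    t, w = np.polynomial.hermite.hermgauss(nodes)
    y = 1.0 + np.sqrt(2.0) * sigma * t            # change of variables for N(1, sigma^2)
    f = np.logaddexp(0.0, -2.0 * y / sigma**2) / np.log(2.0)
    return float(1.0 - np.sum(w * f) / np.sqrt(np.pi))

for s in (0.5, 0.7, 0.9):
    print(f"sigma = {s}: I(X;Y) ~ {bpsk_awgn_mi(s):.3f}")
```

The change of variables $y=1+\sqrt{2}\,\sigma t$ turns the Gaussian expectation into the weight $e^{-t^2}/\sqrt{\pi}$ that Gauss-Hermite nodes integrate exactly for polynomials, and the integrand here is smooth, so a moderate number of nodes suffices.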

IV Performance and Complexity Analysis

IV-A Upper Bound

In this subsection, we derive an upper bound on ${\rm fER}_{0}$ under ML decoding. By the linearity of the code, we assume that the all-zero codeword is transmitted. The ML decoder outputs the $\hat{\boldsymbol{v}}^{(0)}$ for which the codeword pair $(\hat{\boldsymbol{v}}^{(0)},\hat{\boldsymbol{v}}^{(1)})$ maximizes $P(\boldsymbol{y}^{(0)}\boldsymbol{y}^{(1)}|\boldsymbol{v}^{(0)}\boldsymbol{v}^{(1)})$. ML decoding succeeds if $\hat{\boldsymbol{v}}^{(0)}=\boldsymbol{0}$ and fails if $\hat{\boldsymbol{v}}^{(0)}\neq\boldsymbol{0}$. Note that $\hat{\boldsymbol{v}}^{(0)}$ can be correct even if $\hat{\boldsymbol{v}}^{(1)}\neq\boldsymbol{0}$. The ${\rm fER}_{0}$ can be upper bounded by

[TABLE]

This is the well-known union bound, which can be evaluated from the weight distribution of the truncated code

[TABLE]

Let $A(X)$ be the weight enumerating function (WEF) of $\mathscr{C}\backslash{\boldsymbol{0}}$, the set of non-zero codewords of the basic code. Then the ensemble WEF of the truncated code $\mathscr{C}^{(0,1)}$, with $\mathbf{R}$ being totally random, is given by

[TABLE]

The upper bound on ${\rm fER}_{0}$ under ML decoding is given by

[TABLE]

where $\sigma^{2}$ is the noise variance. Note that the bounding technique in [37], which is based on triplet-wise error probabilities, can also be applied here to tighten the upper bound in the low SNR region. In this paper, we simply employ the union bound since we are interested in the high SNR region, where the performance is well predicted by the union bound.
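The bound can be evaluated numerically once the WEF $A(X)$ of the basic code is known. The sketch below rests on two assumptions that are ours, not necessarily the paper's exact expressions. First, the truncated code consists of pairs $(\boldsymbol{v}^{(0)},\boldsymbol{v}^{(1)}\oplus\boldsymbol{v}^{(0)}\mathbf{R})$ with $\boldsymbol{v}^{(0)}\neq\boldsymbol{0}$. For a totally random $\mathbf{R}$, $\boldsymbol{v}^{(0)}\mathbf{R}$ is then uniform on $\mathbb{F}_2^n$, which gives the ensemble WEF $B(X)=2^{k-n}A(X)(1+X)^{n}$. Second, the pairwise error probability for BPSK ($\pm1$) at Hamming distance $d$ is $Q(\sqrt{d}/\sigma)$. The toy repetition code is for illustration only. For the $(2,1,4)$ basic code, $A(X)$ would come from a trellis-based weight enumeration instead.

```python
import math
import numpy as np

def ensemble_wef(A, n: int, k: int) -> np.ndarray:
    """Coefficients of B(X) = 2^(k-n) A(X) (1+X)^n (assumed ensemble WEF).
    A[d] = number of non-zero basic codewords of Hamming weight d."""
    binom = np.array([math.comb(n, j) for j in range(n + 1)], dtype=float)
    return np.convolve(np.asarray(A, dtype=float), binom) * 2.0 ** (k - n)

def q_func(x: float) -> float:
    """Gaussian tail Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def union_bound(B, sigma: float) -> float:
    """sum_d B_d Q(sqrt(d)/sigma): BPSK (+/-1) pairwise error at Hamming distance d."""
    return float(sum(b * q_func(math.sqrt(d) / sigma) for d, b in enumerate(B) if b > 0))

# Toy basic code: the [3,1] repetition code, A(X) = X^3 (illustration only).
B = ensemble_wef([0, 0, 0, 1], n=3, k=1)
print(B)                        # coefficients of 0.25 X^3 (1+X)^3
for sigma in (0.5, 0.7, 1.0):
    print(sigma, union_bound(B, sigma))
```

As a sanity check, the coefficients of $B(X)$ sum to $2^{k}(2^{k}-1)$, the number of pairs with $\boldsymbol{v}^{(0)}\neq\boldsymbol{0}$.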

IV-B Lower Bound and Extended Windowed Decoding

In list decoding, the first sub-frame can be decoded correctly only if the transmitted codeword is included in the list. Therefore, the ${\rm fER}_{0}$ cannot be better than the list decoding performance of the basic code, which thus serves as a lower bound and can be obtained by simulating the list decoding of the basic code. Example 4 illustrates this lower bound on ${\rm fER}_{0}$.

Example 4

The basic code is the $16$-state $(2,1,4)$ convolutional code with information length $k=32$ ($n=64$), which is truncated without termination and defined by the polynomial generator matrix $G(D)=[27,31]_{8}$. The maximum list size is $\ell_{\rm max}=64$, and the thresholds are set based on the statistical behavior of the EDF. The ${\rm fER}_{0}$ performance of the list decoding is shown in Fig. 5, where “$w=2$” corresponds to Algorithm 2. The corresponding lower bound is also plotted. We see that Algorithm 2 performs about $0.5~{}{\rm dB}$ away from the lower bound, implying that the statistical check does not always identify the transmitted codeword in the list. This gap can be narrowed, however, if the constraints on complexity and latency are relaxed. Indeed, Algorithm 2, which recovers $\boldsymbol{v}^{(0)}$ from $\boldsymbol{y}^{(0)}$ and $\boldsymbol{y}^{(1)}$, can be extended to recover $\boldsymbol{v}^{(0)}$ from $\boldsymbol{y}^{(0)}$, $\boldsymbol{y}^{(1)}$ and $\boldsymbol{y}^{(2)}$, which improves the performance, as shown by the curve “$w=3$” in Fig. 5. The details of this extension are omitted here; the basic idea is described below.

After receiving $\boldsymbol{y}^{(0)}$ and $\boldsymbol{y}^{(1)}$, the decoder first attempts to recover $\boldsymbol{v}^{(0)}$ by Algorithm 2. If the decision on $\boldsymbol{v}^{(0)}$ is not sufficiently confident, we keep a list of candidates for further processing. For each candidate $\hat{\boldsymbol{v}}^{(0)}$, we perform Algorithm 2 to find $\hat{\boldsymbol{v}}^{(1)}$ and $\hat{\boldsymbol{v}}^{(2)}$ from $\boldsymbol{y}^{(1)}$ and $\boldsymbol{y}^{(2)}$. Finally, we select the $\hat{\boldsymbol{v}}^{(0)}$ for which $(\hat{\boldsymbol{v}}^{(0)},\hat{\boldsymbol{v}}^{(1)},\hat{\boldsymbol{v}}^{(2)})$ is the most likely candidate with respect to $(\boldsymbol{y}^{(0)},\boldsymbol{y}^{(1)},\boldsymbol{y}^{(2)})$.

IV-C Decoding Complexity

In this subsection, taking the add-compare-select operation (the basic operation in both the VA and the SLVA) as the atomic operation, we analyze the complexity of list decoding with the empirical divergence test. Assume that the basic code $\mathscr{C}[n,k]$ has a trellis representation with $s$ states. Finding the best candidate codeword by the SLVA (equivalently, by the VA) requires $sn$ operations. Once the $(\ell-1)$-th best candidate codeword is known, only $n$ operations are needed to find the $\ell$-th best candidate codeword by the SLVA.

Let $\bar{\ell}\leqslant\ell_{\rm max}$ be the average list size. Then the SLVA requires on average $sn+(\bar{\ell}-1)n$ operations. For each candidate, the VA is run to compute the soft metric, which requires $\bar{\ell}sn$ operations in total. Hence the total number of operations for decoding each sub-frame is

$$sn+(\bar{\ell}-1)n+\bar{\ell}sn=(\bar{\ell}+1)sn+(\bar{\ell}-1)n.$$

We see that the complexity is dominated by $\bar{\ell}sn$. For fixed $s$ and $n$, the complexity can be reduced by tuning down the threshold, which reduces the average list size $\bar{\ell}$.
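The operation count above is simple arithmetic. The helper below (a hypothetical name) evaluates it for a 16-state basic code with $n=64$, as in Example 4, at a few illustrative average list sizes; the values of $\bar{\ell}$ are not taken from any simulation.

```python
def ops_per_subframe(s: int, n: int, avg_list: float) -> float:
    """Add-compare-select count per sub-frame: s*n + (avg_list - 1)*n for the SLVA,
    plus avg_list*s*n for running the VA once per candidate."""
    return s * n + (avg_list - 1) * n + avg_list * s * n

for l_bar in (1.0, 1.256, 4.0):
    print(l_bar, ops_per_subframe(16, 64, l_bar))
```

With $s=16$ and $n=64$, each extra candidate costs $n+sn=1088$ operations, of which $sn=1024$ come from the VA, which is why the $\bar{\ell}sn$ term dominates.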

V Simulation Results

In this section, all simulations assume BPSK modulation and AWGN channels. The SRUMCCs are terminated every $L=49$ blocks. Unless otherwise specified, all codes are decoded by Algorithm 2 with the maximum list size $\ell_{\rm max}=64$ and thresholds chosen based on the statistical behavior of the EDF. All upper bounds used as benchmarks are derived by combining (6) and (22).

V-A Impact of Sub-frame Length on the Performance

Example 5

The basic code is the $16$-state $(2,1,4)$ convolutional code, which is truncated without termination and defined by the polynomial generator matrix $G(D)=[27,31]_{8}$. Different sub-frame information lengths $k=32,48$ and different maximum list sizes $\ell_{\rm max}=64,128$ are considered. The $\rm fER$ is shown in Fig. 6. The upper bounds indicate that the ML performance of the SRUMCCs can be improved by increasing $k$, at the cost of a longer decoding delay. It is also worth pointing out that a larger $k$ usually requires a larger maximum list size $\ell_{\rm max}$. For $k=32$, the performance curve with $\ell_{\rm max}=64$ matches that with $\ell_{\rm max}=128$, indicating that the performance is already saturated at $\ell_{\rm max}=64$. For $k=48$, however, the performance can be improved by increasing the maximum list size from $\ell_{\rm max}=64$ to $\ell_{\rm max}=128$.

V-B Tradeoff Between Performance and Complexity

Example 6

The $16$-state $(2,1,4)$ TBCC defined by the polynomial generator matrix $G(D)=[27,31]_{8}$ is taken as the basic code. The sub-frame information length is $k=32$. We consider two sets of thresholds, $T_{A}$ and $T_{B}$, specified in Table II. The $\rm fER$ is shown in Fig. 7, and the average list sizes needed for decoding a sub-frame are listed in Table II. We see that tuning down the threshold reduces the complexity (average list size) at the cost of some performance loss. For example, at ${\rm SNR}=4~{}{\rm dB}$, the computational complexity (average list size) can be reduced by more than a factor of $10$ if an $\rm fER$ degradation from $10^{-5}$ to $10^{-4}$ is tolerated.

V-C Performance with Different Rates

Example 7

The $16$-state $(2,1,4)$ TBCC defined by the polynomial generator matrix $G(D)=[27,31]_{8}$, the $(3,1,4)$ TBCC defined by the polynomial generator matrix $G(D)=[25,33,37]_{8}$ and the $(4,1,4)$ TBCC defined by the polynomial generator matrix $G(D)=[25,27,33,37]_{8}$ are taken as the basic codes. The sub-frame information lengths and the overall rates are specified in the legends. The $\rm fER$ is shown in Fig. 8. We see that the SRUMCCs can support a wide range of code rates simply by choosing a basic code with the desired rate.

V-D Comparison with Sequential Decoding

Example 8

The Cartesian product ${\rm RM}[8,4]^{8}$ of eight copies of the Reed-Muller code ${\rm RM}[8,4]$ is taken as the basic code. The sub-frame information length is $k=32$. For comparison, the same code is also decoded by sequential decoding [29] with the same decoding window and a stack of size 20000. The $\rm fER$ is shown in Fig. 9. We see that the proposed list decoding algorithm outperforms sequential decoding in the high SNR region.

V-E Comparison with Other Codes

Example 9

The $16$-state $(2,1,4)$ TBCC defined by the polynomial generator matrix $G(D)=[27,31]_{8}$ is taken as the basic code. The sub-frame information length is $k=32$. For comparison, we also redraw the performance curve of the polar code [21] without CRC. The polar code has length 128, which gives the same decoding delay as the SRUMCC. The $\rm fER$ is shown in Fig. 10, where “SCL(16)” denotes the successive cancellation list algorithm [20] with list size 16. We see that the SRUMCC with list decoding is competitive with the polar code.

VI Conclusion

In this paper, we have presented the SRUMCCs in more detail and shown that they can be decoded by successive cancellation list decoding with an empirical divergence test. The decoder serially outputs a list of decoding candidates and identifies the correct one using a statistical threshold, which can be designed based on the statistical behavior of the EDF. The performance-complexity tradeoff and the performance-delay tradeoff can be achieved by adjusting the statistical threshold and the decoding window size. A closed-form upper bound based on the weight enumerating function was derived to analyze the performance in the high SNR region. Simulation results showed that the proposed list decoding outperforms sequential decoding in the high SNR region and that, under the same decoding delay, the SRUMCCs perform comparably to polar codes.

Bibliography


[1] C. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, no. 3, pp. 379–423, Jul. 1948.
[2] R. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.
[3] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo-codes,” in Int. Conf. Commun., Geneva, Switzerland, May 1993, pp. 1064–1070.
[4] P. Elias, “Coding for noisy channels,” IRE Conv. Rec., vol. 4, pp. 37–47, Jan. 1955.
[5] L. Lee, “Short unit-memory byte-oriented binary convolutional codes having maximal free distance,” IEEE Trans. Inf. Theory, vol. 22, no. 3, pp. 349–352, May 1976.
[6] C. Thommesen and J. Justesen, “Bounds on distances and error exponents of unit memory codes,” IEEE Trans. Inf. Theory, vol. 29, no. 5, pp. 637–649, Sep. 1983.
[7] W. Ebel, “A directed search approach for unit-memory convolutional codes,” IEEE Trans. Inf. Theory, vol. 42, no. 4, pp. 1290–1297, Jul. 1996.
[8] A. Said and R. Palazzo, “Using combinatorial optimization to design good unit-memory convolutional codes,” IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 1100–1108, May 1993.