A Lower Bound on the Probability of Error of Polar Codes over BMS Channels

Boaz Shuval; Ido Tal

arXiv:1701.01628·cs.IT·March 6, 2018


TL;DR

This paper introduces a novel method to derive tighter lower bounds on the probability of decoding error for polar codes over BMS channels by analyzing pairs of bits and reducing alphabet sizes.

Contribution

It proposes a new approach to lower-bound error probabilities of bit pairs in polar codes, improving upon existing bounds under successive-cancellation decoding.

Findings

1. Lower bounds on error probabilities are significantly improved.
2. The method effectively reduces alphabet sizes for analysis.
3. Results demonstrate tighter bounds compared to previous methods.


Tables

Table 1. TABLE I: Conditional distribution $W_{a,b}(y_{a},y_{b}|u_{a},u_{b})$. In this case, the ML decoders of the marginals do not minimize $\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$.

$(u_{a}, u_{b}) \backslash (y_{a}, y_{b})$	$(0, 0)$	$(0, 1)$	$(1, 0)$	$(1, 1)$
$(0, 0)$	$0.30$	$0.04$	$0.04$	$0.62$
$(0, 1)$	$0.44$	$0.46$	$0.01$	$0.09$
$(1, 0)$	$0.22$	$0.49$	$0.24$	$0.05$
$(1, 1)$	$0.05$	$0.54$	$0.32$	$0.09$

Table 2. TABLE II: Various decoders for the joint channel $W_{a,b}$ from Table I. Three decoders are shown: the optimal decoder, the IML decoder, and the IMJP decoder. The leftmost column is the received joint channel output, and the remaining columns depict the decisions $(\hat{u}_{a}, \hat{u}_{b}) = \phi(y_{a}, y_{b})$ of the various decoders.

$(y_{a}, y_{b})$	optimal	IML	IMJP
$(0, 0)$	$(0, 1)$	$(1, 1)$	$(1, 0)$
$(0, 1)$	$(1, 1)$	$(1, 0)$	$(1, 0)$
$(1, 0)$	$(1, 1)$	$(0, 1)$	$(0, 0)$
$(1, 1)$	$(0, 0)$	$(0, 0)$	$(0, 0)$

Table 3. TABLE III: Channel $W^{\prime}_{a,b}(y_{a},y_{b}|u_{a},u_{b})$, degraded from $W_{a,b}$ of Table I.

$(u_{a}, u_{b}) \backslash (y_{a}, y_{b})$	$(0^{\prime}, 0^{\prime})$	$(1^{\prime}, 1^{\prime})$
$(0, 0)$	$0.92$	$0.08$
$(0, 1)$	$0.53$	$0.47$
$(1, 0)$	$0.27$	$0.73$
$(1, 1)$	$0.14$	$0.86$

Table 4. TABLE IV: Probability table of the joint synthetic channel $W_{-,+}$ derived from a BSC with crossover probability $0.2$. Only the case where $u_{a}=0$ is shown.

($u_{a} = 0$)	$u_{b} = 0$			$u_{b} = 1$
$y_{a} \backslash d_{b}$	$-\frac{15}{17}$	$0$	$\frac{15}{17}$	$-\frac{15}{17}$	$0$	$\frac{15}{17}$
$00$	$0.02$	$0$	$0$	$0.32$	$0$	$0$
$01$	$0$	$0.08$	$0$	$0$	$0.08$	$0$
$10$	$0$	$0.08$	$0$	$0$	$0.08$	$0$
$11$	$0$	$0$	$0.32$	$0$	$0$	$0.02$

Table 5. TABLE V: Probability table of the symmetrized version of the channel from Table IV. Only the case where $u_{a}=0$ is shown.

($u_{a} = 0$)	$u_{b} = 0$			$u_{b} = 1$
$y_{a} \backslash d_{b}$	$-\frac{15}{17}$	$0$	$\frac{15}{17}$	$-\frac{15}{17}$	$0$	$\frac{15}{17}$
$\overset{\circ}{0}$	$0.02$	$0$	$0.32$	$0.32$	$0$	$0.02$
$\overset{\circ}{1}$	$0$	$0.16$	$0$	$0$	$0.16$	$0$

Equations

$$W^{-}(y_{1},y_{2}|u_{1})$$

$$W^{+}(y_{1},y_{2},u_{1}|u_{2})$$

$$\hat{U}_{a}(y_{1}^{N},\hat{u}_{1}^{a-1}) = \begin{cases} \operatorname*{arg\,max}_{u_{a}} W_{a}(y_{1}^{N},\hat{u}_{1}^{a-1}|u_{a}), & a\in\mathcal{A},\\ u_{a}, & a\in\mathcal{A}^{c}, \end{cases}$$

$$\mathcal{B}_{a} = \left\{(u_{1}^{N},y_{1}^{N}) \,\middle|\, \hat{u}_{1}^{a-1}=u_{1}^{a-1},\ \hat{U}_{a}(y_{1}^{N},\hat{u}_{1}^{a-1})\neq u_{a}\right\}.$$

$$\mathcal{E}_{a} = \left\{(u_{1}^{N},y_{1}^{N}) \,\middle|\, \hat{U}_{a}(y_{1}^{N},u_{1}^{a-1})\neq u_{a}\right\}.$$

$$\mathbb{P}\left\{\bigcup_{a\in\mathcal{A}}\mathcal{E}_{a}\right\} \leq \sum_{a\in\mathcal{A}}\mathbb{P}\left\{\mathcal{E}_{a}\right\}.$$

$$\mathbb{P}\left\{\bigcup_{a\in\mathcal{A}}\mathcal{E}_{a}\right\} \geq \max_{a\in\mathcal{A}}\mathbb{P}\left\{\mathcal{E}_{a}\right\}.$$

$$\mathbb{P}\left\{\bigcup_{a\in\mathcal{A}}\mathcal{E}_{a}\right\} \geq \max_{a,b\in\mathcal{A}}\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}.$$

$$\mathbb{P}\left\{\bigcup_{a\in\mathcal{A}}\mathcal{E}_{a}\right\} \geq \sum_{a\in\mathcal{A}}\mathbb{P}\left\{\mathcal{E}_{a}\right\} - \sum_{\substack{a,b\in\mathcal{A}\\ a<b}}\mathbb{P}\left\{\mathcal{E}_{a}\cap\mathcal{E}_{b}\right\}.$$

$$W(y|u) = \sum_{z} P(y|z)\,Q(z|u).$$

$$W_{a,b}(y_{a},y_{b}|u_{a},u_{b}) = \sum_{z_{a},z_{b}} P(y_{a},y_{b}|z_{a},z_{b})\,Q_{a,b}(z_{a},z_{b}|u_{a},u_{b}).$$

$$P_{e}(W) = \sum_{u}\sum_{y}\frac{W(y|u)}{|\mathcal{U}|}\,\mathbb{P}\left\{\phi(y)\neq u\right\}.$$

$$W(y|u) > W(y|u^{\prime})\ \ \forall u^{\prime}\neq u \;\Rightarrow\; \phi(y)=u.$$

$$P_{e}(W_{a,b}) = \sum_{u_{a},u_{b}}\sum_{y_{a},y_{b}}\frac{W_{a,b}(y_{a},y_{b}|u_{a},u_{b})}{|\mathcal{U}|^{2}}\,\mathbb{P}\left\{\phi(y_{a},y_{b})\neq(u_{a},u_{b})\right\}.$$

$$P_{e}^{\star}(W_{a,b}) \leq \min_{\phi_{a},\phi_{b}}\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\} \leq P_{e}^{\textrm{IML}}(W_{a,b}).$$

$$P_{e}^{\textrm{IMJP}}(W_{a,b}) = \min_{\phi_{a},\phi_{b}}\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}.$$

$$P_{e}^{\star}(W_{a,b}),\qquad P_{e}^{\textrm{IML}}(W_{a,b}),\qquad P_{e}^{\textrm{IMJP}}(W_{a,b})$$

$$y_{b} = \big((y_{1}^{N},u_{1}^{a-1}),u_{a},u_{a+1}^{b-1}\big) \equiv (y_{a},u_{a},y),$$

$$W_{a,b}(y_{a},y_{b}|u_{a},u_{b}) = 2W_{b}(y_{b}|u_{b})\left[y_{b}=(y_{a},u_{a},y)\right],$$

$$\begin{aligned} W_{a,b}(y_{a},y_{b}|u_{a},u_{b}) &= W_{a,b}(y_{a},u_{a},y|u_{a},u_{b}) \end{aligned}$$

$$\phi_{a}(y_{a}) = \operatorname*{arg\,max}_{u_{a}} T(y_{a}|u_{a}),$$

$$T(y_{a}|u_{a}) = \frac{1}{2}\sum_{u_{b},y_{b}} W_{a,b}(y_{a},y_{b}|u_{a},u_{b})\,\mathbb{P}\left\{\phi_{b}(y_{b})=u_{b}\right\}.$$

$$\varphi_{i}(y_{i},u_{i}) \triangleq \mathbb{P}\left\{\phi_{i}(y_{i})=u_{i}\right\},\quad i=a,b.$$

$$\begin{aligned} 1-\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\} &= \frac{1}{4}\sum_{u_{a},u_{b}}\sum_{y_{a},y_{b}} W_{a,b}(y_{a},y_{b}|u_{a},u_{b})\,\varphi_{a}(y_{a},u_{a})\,\varphi_{b}(y_{b},u_{b})\\ &= \frac{1}{2}\sum_{u_{a},y_{a},y_{b}} \varphi_{a}(y_{a},u_{a})\left[y_{b}=(y_{a},u_{a},y)\right] g(y_{b}), \end{aligned}$$

$$g(y_{b}) = \sum_{u_{b}} \varphi_{b}(y_{b},u_{b})\,W_{b}(y_{b}|u_{b}).$$

$$\begin{aligned} 1-\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\} &= \frac{1}{4}\sum_{u_{a},y_{a}}\sum_{u_{b},y_{b}} W_{a,b}(y_{a},y_{b}|u_{a},u_{b})\,\varphi_{a}(y_{a},u_{a})\,\varphi_{b}(y_{b},u_{b})\\ &= \frac{1}{2}\sum_{u_{a},y_{a}}\varphi_{a}(y_{a},u_{a})\cdot\frac{1}{2}\sum_{u_{b},y_{b}} W_{a,b}(y_{a},y_{b}|u_{a},u_{b})\,\varphi_{b}(y_{b},u_{b})\\ &= \frac{1}{2}\sum_{u_{a},y_{a}} T(y_{a}|u_{a})\,\varphi_{a}(y_{a},u_{a}), \end{aligned}$$

$$T(y_{a}|u_{a}) = \sum_{y}\sum_{u_{b}} W_{b}(y_{a},u_{a},y|u_{b})\,\varphi_{b}(y_{b},u_{b}) = \sum_{y}\max_{u_{b}} W_{b}(y_{a},u_{a},y|u_{b}).$$



Full text


Boaz Shuval, Ido Tal

Department of Electrical Engineering,

Technion, Haifa 32000, Israel.

Email: {bshuval@campus, idotal@ee}.technion.ac.il

An abbreviated version of this article, with the proofs omitted, has appeared in ISIT 2017.

Abstract

Polar codes are a family of capacity-achieving codes that have explicit and low-complexity construction, encoding, and decoding algorithms. Decoding of polar codes is based on the successive-cancellation decoder, which decodes in a bit-wise manner. A decoding error occurs when at least one bit is erroneously decoded. The various codeword bits are correlated, yet performance analysis of polar codes ignores this dependence: the upper bound is based on the union bound, and the lower bound is based on the worst-performing bit. Improvement of the lower bound is afforded by considering error probabilities of two bits simultaneously. These are difficult to compute explicitly due to the large alphabet size inherent to polar codes. In this research we propose a method to lower-bound the error probabilities of bit pairs. We develop several transformations on pairs of synthetic channels that make the resultant synthetic channels amenable to alphabet reduction. Our method yields lower bounds that significantly improve upon currently known lower bounds for polar codes under successive-cancellation decoding.

Index Terms:

Channel polarization, channel upgrading, lower bounds, polar codes, probability of error.

I Introduction

Polar codes [1] are a family of codes that achieve capacity on binary, memoryless, symmetric (BMS) channels and have low-complexity construction, encoding, and decoding algorithms. This is the setting we consider. Polar codes have since been extended to a variety of settings including source coding [2, 3], non-binary channels [4], asymmetric channels [5], settings with memory [6, 7, 8], and more. The probability of error of polar codes is the probability of a union of correlated error events. The union bound, which ignores this correlation, is used to upper-bound the error probability. In this work, we exploit the correlation between error events to develop a general method for lower-bounding the probability of error of polar codes.

Figure 1 shows a numerical example of the lower bound developed in this paper. We designed a polar code of length $N=2^{10}=1024$ and rate $R=0.1$ for a Binary Symmetric Channel (BSC) with crossover probability $0.2$ . We plot upper and lower bounds on the probability of error of this code under successive cancellation decoding, when used over BSCs of varying crossover probabilities. Our lower bound significantly improves upon the existing (trivial) lower bound, and is tight over a large range of crossover probabilities.

Our method is based on lower-bounding the probability of correlated error events. It consists of several operations and transformations that we detail throughout this article. A high-level description of the key steps appears at the end of the introduction, once we establish some notation.

Polar codes are based on an iterative construction that transforms $N=2^{n}$ identical and independent channel uses into low-entropy and high-entropy channels. The low-entropy channels are almost noiseless, whereas the high-entropy channels are almost pure noise. Arıkan showed [1] that for every $\epsilon>0$ , as $N\to\infty$ the proportion of channels with capacity greater than $1-\epsilon$ tends to the channel capacity $C$ and the proportion of channels with capacity less than $\epsilon$ tends to $1-C$ .
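For intuition, the following minimal Python sketch (an illustration of ours, not the method of this paper) tracks polarization for a binary erasure channel. Each synthetic channel of a BEC is again a BEC, and its erasure probability $z$ evolves as $z\mapsto 2z-z^{2}$ under a ‘$-$’-transform and $z\mapsto z^{2}$ under a ‘$+$’-transform. The function names, the BEC(0.5) starting point, and the threshold $\epsilon=10^{-3}$ are arbitrary assumptions.

```python
def bec_polarize(z0, n):
    """Erasure probabilities of the 2**n synthetic channels of a BEC(z0).

    Position a-1 of the returned list corresponds to W_a under the
    MSB-first indexing described in the text.
    """
    zs = [z0]
    for _ in range(n):
        nxt = []
        for z in zs:
            nxt.append(2 * z - z * z)  # '-'-transform: worse channel
            nxt.append(z * z)          # '+'-transform: better channel
        zs = nxt
    return zs


def polarization_fractions(z0, n, eps):
    """Fractions of synthetic channels with capacity > 1 - eps and < eps."""
    zs = bec_polarize(z0, n)
    good = sum(1 for z in zs if z < eps) / len(zs)      # capacity 1 - z > 1 - eps
    bad = sum(1 for z in zs if z > 1 - eps) / len(zs)   # capacity 1 - z < eps
    return good, bad


if __name__ == "__main__":
    for n in (4, 10, 16):
        good, bad = polarization_fractions(0.5, n, 1e-3)
        print(f"n={n:2d}: good={good:.3f} bad={bad:.3f}")
```

The two transforms preserve average capacity, so the mean of $1-z$ over all $2^{n}$ channels stays equal to the capacity of the original BEC, while the two fractions should move toward $C$ and $1-C$ as $n$ grows.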

The polar construction begins with two identical and independent copies of a BMS channel $W$ and transforms them into two new channels,

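$$W^{-}(y_{1},y_{2}|u_{1}) = \sum_{u_{2}}\frac{1}{2}W(y_{1}|u_{1}\oplus u_{2})\,W(y_{2}|u_{2}),\qquad W^{+}(y_{1},y_{2},u_{1}|u_{2}) = \frac{1}{2}W(y_{1}|u_{1}\oplus u_{2})\,W(y_{2}|u_{2}),$$

where $\oplus$ denotes addition modulo 2.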

Channel $W^{+}$ is a better channel than $W$, whereas channel $W^{-}$ is worse than $W$. (By this we mean that channel $W^{+}$ can be stochastically degraded to channel $W$, which in turn can be stochastically degraded to $W^{-}$.) This construction can be repeated multiple times; each time we take two identical copies of a channel, say $W^{+}$ and $W^{+}$, and polarize them, e.g., to $W^{+-}$ and $W^{++}$. We call the operation $W\mapsto W^{-}$ a ‘$-$’-transform, and the operation $W\mapsto W^{+}$ a ‘$+$’-transform.
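A minimal sketch of one polarization step for a BSC, in Python with exact fractions. It tabulates the standard one-step transforms $W^{-}(y_{1},y_{2}|u_{1})=\frac{1}{2}\sum_{u_{2}}W(y_{1}|u_{1}\oplus u_{2})W(y_{2}|u_{2})$ and $W^{+}(y_{1},y_{2},u_{1}|u_{2})=\frac{1}{2}W(y_{1}|u_{1}\oplus u_{2})W(y_{2}|u_{2})$ and compares ML error probabilities; the crossover probability $0.2$ and all helper names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def bsc(p):
    """Transition function W(y|u) of a BSC with crossover probability p."""
    return lambda y, u: p if y != u else 1 - p


def minus_transform(W):
    """Table {(y, u1): W^-(y|u1)} with y = (y1, y2)."""
    half = Fraction(1, 2)
    return {((y1, y2), u1): sum(half * W(y1, u1 ^ u2) * W(y2, u2) for u2 in (0, 1))
            for y1, y2, u1 in product((0, 1), repeat=3)}


def plus_transform(W):
    """Table {(y, u2): W^+(y|u2)} with y = (y1, y2, u1)."""
    half = Fraction(1, 2)
    return {((y1, y2, u1), u2): half * W(y1, u1 ^ u2) * W(y2, u2)
            for y1, y2, u1, u2 in product((0, 1), repeat=4)}


def ml_error(table):
    """P_e^ML = 1 - (1/2) sum_y max_u W(y|u) for a binary, equiprobable input."""
    outputs = {y for (y, _) in table}
    return 1 - sum(max(table[(y, 0)], table[(y, 1)]) for y in outputs) / 2


p = Fraction(1, 5)
W = bsc(p)
base = {(y, u): W(y, u) for y in (0, 1) for u in (0, 1)}
for name, table in (("W", base), ("W-", minus_transform(W)), ("W+", plus_transform(W))):
    print(name, ml_error(table))
```

The ‘$-$’ channel behaves like a BSC with crossover probability $2p(1-p)=0.32$, while the ‘$+$’ channel has the same ML error probability $0.2$ as $W$: when its two observations of $u_{2}$ disagree, the ML decoder faces a tie. So the ordering by this functional is not strict here, even though $W^{+}$ is upgraded with respect to $W$.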

There are $N=2^{n}$ possible combinations of $n$ ‘ $-$ ’- and ‘ $+$ ’-transforms; we define channel $W_{a}$ as follows. Let $\langle\alpha_{1},\alpha_{2},\ldots,\alpha_{n}\rangle$ be the binary expansion of $a-1$ , where $\alpha_{1}$ is the most significant bit (MSB). Then, channel $W_{a}$ is obtained by $n$ transforms of $W$ according to the sequence $\alpha_{1},\alpha_{2},\ldots,\alpha_{n}$ , starting with the MSB: if $\alpha_{j}=0$ we do a ‘ $-$ ’-transform and if $\alpha_{j}=1$ we do a ‘ $+$ ’-transform. For example, if $n=3$ , channel $W_{5}$ is $W^{+--}$ , i.e., it first undergoes a ‘ $+$ ’-transform and then two ‘ $-$ ’-transforms.
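The indexing rule fits in a few lines of Python; `transform_sequence` is a hypothetical helper name, not part of the paper.

```python
def transform_sequence(a, n):
    """Return the '-'/'+' sequence that defines synthetic channel W_a (1-indexed).

    The n-bit binary expansion of a-1 is read MSB first:
    bit 0 -> '-'-transform, bit 1 -> '+'-transform.
    """
    if not 1 <= a <= 2 ** n:
        raise ValueError("a must lie in 1..2**n")
    bits = format(a - 1, f"0{n}b")  # zero-padded binary expansion, MSB first
    return "".join("+" if b == "1" else "-" for b in bits)


print("W_5 = W^" + transform_sequence(5, 3))  # prints: W_5 = W^+--
```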

Overall, we obtain $N$ channels $W_{1},\ldots,W_{N}$ ; channel $W_{a}$ has input $u_{a}$ and output $y_{1},\ldots,y_{N},u_{1},\ldots,u_{a-1}$ . That is, channel $W_{a}$ has binary input $u_{a}$ , output that consists of the output and input of channel $W_{a-1}$ , and assumes that the input bits of future channels $u_{a+1},\ldots,u_{N}$ are uniform. We call these synthetic channels. One then determines which synthetic channels are low-entropy and which are high-entropy, and transmits information over the low-entropy synthetic channels and predetermined values over the high-entropy synthetic channels. Since the values transmitted over the latter are predetermined, we call the high-entropy synthetic channels frozen.

Decoding is accomplished via the successive-cancellation (SC) decoder. It decodes the synthetic channels in succession, using previous bit decisions as part of the output. The bit decision for a synthetic channel is either based on its likelihood or, if it is frozen, on its predetermined value. That is, denoting the set of non-frozen synthetic channels by $\mathcal{A}$ ,

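$$\hat{U}_{a}(y_{1}^{N},\hat{u}_{1}^{a-1}) = \begin{cases} \operatorname*{arg\,max}_{u_{a}} W_{a}(y_{1}^{N},\hat{u}_{1}^{a-1}|u_{a}), & a\in\mathcal{A},\\ u_{a}, & a\in\mathcal{A}^{c}, \end{cases}$$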

where we denoted $y_{1}^{N}=y_{1},\ldots,y_{N}$ and similarly for the previous bit decisions $\hat{u}_{1}^{a-1}$. As non-frozen synthetic channels are almost noiseless, previous bit decisions are assumed to be correct. Thus, when $N$ is sufficiently large, this scheme can be shown to achieve capacity [1], as the proportion of almost noiseless channels tends to $C$.

To analyze the performance of polar codes, let $\mathcal{B}_{a}$ denote the event that channel $W_{a}$ errs under SC decoding while channels $1,2,\ldots,a-1$ do not. That is,

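$$\mathcal{B}_{a} = \left\{(u_{1}^{N},y_{1}^{N}) \,\middle|\, \hat{u}_{1}^{a-1}=u_{1}^{a-1},\ \hat{U}_{a}(y_{1}^{N},\hat{u}_{1}^{a-1})\neq u_{a}\right\}.$$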

The probability of error of polar codes under SC decoding is given by $\mathbb{P}\left\{\bigcup_{a\in\mathcal{A}}\mathcal{B}_{a}\right\}$ . Let $\mathcal{E}_{a}$ denote the event that channel $W_{a}$ errs given that a genie had revealed to it the true previous bits, i.e.

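$$\mathcal{E}_{a} = \left\{(u_{1}^{N},y_{1}^{N}) \,\middle|\, \hat{U}_{a}(y_{1}^{N},u_{1}^{a-1})\neq u_{a}\right\}.$$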

We call an SC decoder with access to genie-provided previous bits a genie-aided decoder. Some thought reveals that $\bigcup_{a\in\mathcal{A}}\mathcal{B}_{a}=\bigcup_{a\in\mathcal{A}}\mathcal{E}_{a}$ (see [4, Proposition 2.1] or [10, Lemma 1]). Thus, the probability of error of polar codes under SC decoding is equivalently given by $P_{e}^{\textrm{{SC}}}(W)=\mathbb{P}\left\{\bigcup_{a\in\mathcal{A}}\mathcal{E}_{a}\right\}$ . In the sequel we assume a genie-aided decoder.

The events $\{\mathcal{B}_{a}\}$ are disjoint but difficult to analyze. The events $\mathcal{E}_{a}$ are easier to analyze, but are no longer disjoint. A straightforward upper bound for $\mathbb{P}\left\{\bigcup_{a\in\mathcal{A}}\mathcal{E}_{a}\right\}$ is the union bound:

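$$\mathbb{P}\left\{\bigcup_{a\in\mathcal{A}}\mathcal{E}_{a}\right\} \leq \sum_{a\in\mathcal{A}}\mathbb{P}\left\{\mathcal{E}_{a}\right\}.$$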

This bound facilitated the analysis of [1]. An important question is how tight this upper bound is. To this end, one approach is to develop a lower bound on $\mathbb{P}\left\{\bigcup_{a\in\mathcal{A}}\mathcal{E}_{a}\right\}$, which is what we pursue in this work.

A trivial lower bound on a union is

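$$\mathbb{P}\left\{\bigcup_{a\in\mathcal{A}}\mathcal{E}_{a}\right\} \geq \max_{a\in\mathcal{A}}\mathbb{P}\left\{\mathcal{E}_{a}\right\}.$$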

Better lower bounds may be obtained by considering pairs of error events:

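$$\mathbb{P}\left\{\bigcup_{a\in\mathcal{A}}\mathcal{E}_{a}\right\} \geq \max_{a,b\in\mathcal{A}}\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}.$$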

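Via the inclusion-exclusion principle, one can combine lower bounds on multiple pairs of error events to obtain a better lower bound [11]:

$$\mathbb{P}\left\{\bigcup_{a\in\mathcal{A}}\mathcal{E}_{a}\right\} \geq \sum_{a\in\mathcal{A}}\mathbb{P}\left\{\mathcal{E}_{a}\right\} - \sum_{\substack{a,b\in\mathcal{A}\\ a<b}}\mathbb{P}\left\{\mathcal{E}_{a}\cap\mathcal{E}_{b}\right\}.$$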

This can also be cast in terms of unions of error events using $\mathbb{P}\left\{\mathcal{E}_{a}\cap\mathcal{E}_{b}\right\}=\mathbb{P}\left\{\mathcal{E}_{a}\right\}+\mathbb{P}\left\{\mathcal{E}_{b}\right\}-\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ .
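These bounds can be checked by brute force on a toy code. The sketch below is our own illustration under stated assumptions: a BSC with crossover probability $0.1$, $N=4$, a hypothetical information set $\mathcal{A}=\{2,3,4\}$ with $u_{1}$ frozen to $0$, the encoder $x=uF^{\otimes n}$ with $F=\left[\begin{smallmatrix}1&0\\1&1\end{smallmatrix}\right]$ and no bit-reversal permutation (which only permutes the channel outputs), and ML ties resolved toward $0$. It enumerates every $(u,y)$ with exact fractions, computes the genie-aided events $\mathcal{E}_{a}$, and also runs the SC decoder to confirm that its block-error event coincides with $\bigcup_{a}\mathcal{E}_{a}$. Here the pairwise probabilities are computed exactly; at realistic code lengths they are precisely what cannot be computed.

```python
from fractions import Fraction
from itertools import combinations, product

N = 4                   # toy code length (assumption)
P = Fraction(1, 10)     # BSC crossover probability (assumption)
INFO = (2, 3, 4)        # hypothetical information set A (1-indexed); u_1 is frozen to 0


def encode(u):
    """x = u F^{(x)n} over GF(2) with F = [[1, 0], [1, 1]], no bit reversal."""
    x = list(u)
    step = 1
    while step < len(x):
        for i in range(0, len(x), 2 * step):
            for j in range(i, i + step):
                x[j] ^= x[j + step]
        step *= 2
    return tuple(x)


def channel(y, x):
    """Memoryless BSC likelihood W^N(y|x)."""
    d = sum(yi != xi for yi, xi in zip(y, x))
    return P ** d * (1 - P) ** (len(y) - d)


def decide(y, prefix):
    """ML decision on u_a from (y, u_1^{a-1}), averaging over uniform future bits.

    The common factor 2^{-(N-1)} is dropped; ties are resolved toward 0.
    """
    free = N - len(prefix) - 1
    like = [sum(channel(y, encode(tuple(prefix) + (ua,) + f))
                for f in product((0, 1), repeat=free))
            for ua in (0, 1)]
    return 1 if like[1] > like[0] else 0


def outcomes():
    """Yield (probability, genie-aided error set, SC block-error flag) for every (u, y)."""
    for bits in product((0, 1), repeat=len(INFO)):
        u = [0] * N
        for a, bit in zip(INFO, bits):
            u[a - 1] = bit
        x = encode(u)
        for y in product((0, 1), repeat=N):
            pr = channel(y, x) / 2 ** len(INFO)
            # Genie-aided: the true prefix u_1^{a-1} is supplied to every decision.
            genie = frozenset(a for a in INFO if decide(y, u[:a - 1]) != u[a - 1])
            # SC: earlier decisions are fed back; frozen bits take their known value 0.
            u_hat = []
            for a in range(1, N + 1):
                u_hat.append(decide(y, u_hat) if a in INFO else 0)
            yield pr, genie, u_hat != u


def bounds():
    data = list(outcomes())

    def prob(event):
        return sum(pr for pr, errs, _ in data if event(errs))

    single = {a: prob(lambda e, a=a: a in e) for a in INFO}
    pair = {(a, b): prob(lambda e, a=a, b=b: a in e or b in e)
            for a, b in combinations(INFO, 2)}
    # P{E_a and E_b} = P{E_a} + P{E_b} - P{E_a or E_b}
    inter = sum(single[a] + single[b] - pu for (a, b), pu in pair.items())
    return {
        "union bound": sum(single.values()),
        "max single": max(single.values()),
        "max pair": max(pair.values()),
        "second-order": sum(single.values()) - inter,
        "exact": prob(lambda e: len(e) > 0),
        "SC": sum(pr for pr, _, sc_err in data if sc_err),
    }


if __name__ == "__main__":
    for name, value in bounds().items():
        print(f"{name:>12}: {float(value):.6f}")
```

One should observe max single $\leq$ max pair $\leq$ exact $\leq$ union bound and second-order $\leq$ exact, with the SC value equal to the exact one.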

To our knowledge, to date there have been two attempts to compute a lower bound on the performance of the SC decoder, both based on (4). The first attempt was in [10], using a density evolution approach, and the second attempt in [12] applies only to the Binary Erasure Channel (BEC). We briefly introduce these below, but first we explain where the difficulty lies.

The probability $\mathbb{P}\left\{\mathcal{E}_{a}\right\}$ is given by an appropriate functional of the probability distribution of synthetic channel $W_{a}$ . However, the output alphabet of $W_{a}$ is very large. If the output alphabet of $W$ is $\mathcal{Y}$ then the output alphabet of $W_{a}$ has size $|\mathcal{Y}|^{N}2^{a-1}$ . This quickly grows unwieldy, recalling that $N=2^{n}$ . It is infeasible to store this probability distribution and it must be approximated. Such approximations are the subject of [9]; they enable one to compute upper and lower bounds on various functionals of the synthetic channel $W_{a}$ .

To compute probabilities of unions of events, one must know the joint distribution of two synthetic channels. The size of the joint channel’s output alphabet is the product of each synthetic channel’s alphabet size, rendering the joint distribution infeasible to store.
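The growth is easy to quantify with Python's arbitrary-precision integers. A minimal sketch; the indices $a=500$ and $b=900$ are arbitrary picks for the $N=1024$ BSC setting of the numerical example, and the function names are hypothetical.

```python
def synthetic_alphabet_size(y_size, n, a):
    """Output alphabet size |Y|^N * 2^(a-1) of synthetic channel W_a, with N = 2**n."""
    return y_size ** (2 ** n) * 2 ** (a - 1)


def joint_alphabet_size(y_size, n, a, b):
    """Output alphabet size of the joint channel W_{a,b}: product of the two sizes."""
    return synthetic_alphabet_size(y_size, n, a) * synthetic_alphabet_size(y_size, n, b)


if __name__ == "__main__":
    n, a, b = 10, 500, 900  # N = 1024 with a BSC (|Y| = 2); a and b are arbitrary
    print("log2 |Y_a|       =", synthetic_alphabet_size(2, n, a).bit_length() - 1)
    print("log2 |Y_a x Y_b| =", joint_alphabet_size(2, n, a, b).bit_length() - 1)
```

For these choices the single-channel alphabet has $2^{1523}$ symbols and the joint alphabet $2^{3446}$, far beyond anything that can be stored explicitly.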

The authors of [10] suggested approximating the joint distribution of pairs of synthetic channels using a density evolution approach. This provides an iterative method to compute the joint channel, but does not address the problem of the amount of memory required to store it. Practical implementation of density evolution must involve quantization [13, Appendix B]. The probability of error derived from quantized joint channels approximates, but does not generally bound, the real probability of error. For the special case of the BEC, as noted and analyzed in [10], no quantization is needed, as the polar transform of a BEC is a BEC. Thus, they were able to precisely compute the probabilities of unions of error events of descendants of a BEC using density evolution.

The same bounds for the BEC were developed in [12] using a different approach, again relying on the property that the polar transform of a BEC is a BEC. The authors were able to track the joint probability of erasure during the polarization process. Furthermore, they were able to show that the union bound is asymptotically tight for the BEC.

In this work, we develop an algorithm to compute lower bounds on the joint probability of error of two synthetic channels $\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ . Our technique is general, and applies to synthetic channels that are polar descendants of any BMS channel. We use these bounds in (4) to lower-bound the probability of error of polar codes. For the special case of the BEC, we recover the results of [10] and [12] using our bounds.

Concretely, consider two synthetic channels, $W_{a}(y_{a}|u_{a})$ and $W_{b}(y_{b}|u_{b})$, which we call the a-channel and the b-channel, respectively. Their joint channel is $W_{a,b}(y_{a},y_{b}|u_{a},u_{b})$. Our algorithm lower-bounds the probability that a successive cancellation decoder errs on either channel. It is based on the following key steps:

1. Replace successive cancellation with a different decoding criterion (Section III).
2. Bring the joint channel to a different form that makes the b-channel decoding immediately apparent from the received symbol (Section IV-A).
3. Apply the symmetrizing transform, after which the output of the a-channel is independent of the input of the b-channel (Section V).
4. Apply the upgrade-couple transform, which splits each a-channel output into multiple symbols. However, each such new symbol is constrained to appear with only a small subset of b-channel outputs (Section VI-A).
5. Reduce each channel's alphabet size. This is done by stochastically upgrading one channel while keeping the other channel constant. Each channel has a different upgrading procedure; the a-channel upgrading procedure is detailed in Section VI-A, and the b-channel upgrading procedure is detailed in Section VI-B.

II Overview of Our Method

In this section we provide a brief overview of our method, and lay out the groundwork for the sections that follow. We aim to produce a lower bound on the probability of error of two synthetic channels. Since we cannot know the precise joint distribution, we must approximate it. The approximation is rooted in stochastic degradation.

Degradation is a partial ordering of channels. Let $W(y|u)$ and $Q(z|u)$ be two channels. We say that $W$ is (stochastically) degraded with respect to $Q$ , denoted $W\preccurlyeq Q$ , when there exists some channel $P(y|z)$ such that

$$W(y|u) = \sum_{z} P(y|z)\,Q(z|u).$$

If $W$ is degraded with respect to $Q$ then $Q$ is upgraded with respect to $W$ . Degradation implies an ordering on the probability of error of the channels [13, Chapter 4]: if $W\preccurlyeq Q$ then $P_{e}^{\star}(W)\geq P_{e}^{\star}(Q)$ , where $P_{e}^{\star}$ denotes the probability of error of the optimal decoder (defined in Section III-A).
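A small numerical check of this ordering, as a sketch: channels are stored as row-stochastic NumPy matrices (rows indexed by the input, columns by the output, which is our own convention), so the composition $\sum_{z}P(y|z)Q(z|u)$ becomes the matrix product `Q @ P`. The BSC(0.1) and the three-output degrading channel are hypothetical examples.

```python
import numpy as np


def compose(P, Q):
    """W(y|u) = sum_z P(y|z) Q(z|u), with row-stochastic matrices Q[u, z] and P[z, y]."""
    return Q @ P


def optimal_error(W):
    """P_e^* for equiprobable inputs: 1 - (1/|U|) sum_y max_u W(y|u)."""
    return 1.0 - W.max(axis=0).sum() / W.shape[0]


Q = np.array([[0.9, 0.1],
              [0.1, 0.9]])          # upgraded channel: BSC(0.1)
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.2, 0.7]])     # hypothetical degrading channel P(y|z)
W = compose(P, Q)                   # W is degraded with respect to Q

print(optimal_error(Q), optimal_error(W))  # approximately 0.1 and 0.26
```

Replacing `P` by any other row-stochastic matrix should never push $P_{e}^{\star}(W)$ below $P_{e}^{\star}(Q)$.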

The notion of degradation readily applies to joint channels. If $W_{a,b}(y_{a},y_{b}|u_{a},u_{b})$ and $Q_{a,b}(z_{a},z_{b}|u_{a},u_{b})$ are two joint channels, we say that $Q_{a,b}(z_{a},z_{b}|u_{a},u_{b})\succcurlyeq W_{a,b}(y_{a},y_{b}|u_{a},u_{b})$ via some degrading channel $P(y_{a},y_{b}|z_{a},z_{b})$ if

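$$W_{a,b}(y_{a},y_{b}|u_{a},u_{b}) = \sum_{z_{a},z_{b}} P(y_{a},y_{b}|z_{a},z_{b})\,Q_{a,b}(z_{a},z_{b}|u_{a},u_{b}).$$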

As for the single channel case, if $Q_{a,b}\succcurlyeq W_{a,b}$ then $P_{e}^{\star}(W_{a,b})\geq P_{e}^{\star}(Q_{a,b})$ , where $P_{e}^{\star}$ is the probability of error of the optimal decoder for the joint channel. Indeed our approach will be to approximate the joint synthetic channel with an upgraded joint channel with smaller output alphabet. There is a snag, however: this ordering of error probabilities does not hold, in general, for suboptimal decoders.

The SC decoder, used for polar codes, is suboptimal. In the genie-aided case, which we consider here, it is equivalent to performing a maximum likelihood decision on each marginal separately. We shall demonstrate the suboptimality of the SC decoder in Section III. Then, we will develop a different decoding criterion whose performance lower-bounds the SC decoder performance and is ordered by degradation. While in general finding this decoder requires an exhaustive search, for the special case of polar codes this decoder is easily found. It does, however, imply a special structure for the degrading channel, which we use to our advantage.

We investigate the joint distribution of two synthetic channels in Section IV. We first bring it to a more convenient form that will be used in the sequel. Then, we explain how to polarize a joint synthetic channel distribution and explore some consequences of symmetry. Further consequences of symmetry are the subject of Section V, in which we transform the channel to another form that greatly simplifies the steps that follow. This form exposes the inherent structure of the joint channel.

How to actually upgrade joint channels is the subject of Section VI. We upgrade the joint channel in two ways; each upgrades one marginal without changing the other. We cannot simply upgrade the marginals, as we must consider the joint channel as a whole. This is where the above-mentioned symmetrizing and upgrade-couple transforms come into play.

We present our algorithm for lower-bounding the probability of error of polar codes in Section VII. This algorithm is based on the building blocks presented in the previous sections. Details of our implementation appear in Section VIII. We demonstrate our algorithm with some numerical results in Section IX, and conclude with a short discussion in Section X.

II-A Notation

We denote by $y_{j}^{k}=y_{j},y_{j+1},\ldots,y_{k}$ for $j<k$. We use an Iverson-style notation (see [14]) for indicator (characteristic) functions. That is, for a logical expression $\mathtt{expr}$, $\left[\mathtt{expr}\right]$ is $0$ whenever $\mathtt{expr}$ is not true and is $1$ otherwise. We assume that the indicator function takes precedence whenever it appears, e.g., $n^{-1}\left[n>0\right]$ is $0$ for $n=0$.

III Decoding of Two Dependent Channels

In this section, we tackle decoding of two dependent channels. We explain how this differs from the case of decoding a single channel, and dispel some misconceptions that may arise. We then specialize the discussion to polar codes. We explain the difficulty with combining the SC decoder with degradation procedures, and develop a different decoding criterion instead. Finally, we develop a special structure for the degrading channel that, combined with the decoding criterion, implies ordering of probability of error by degradation.

III-A General Case

A decoder for channel $W:\mathcal{U}\to\mathcal{Y}$ is a mapping $\phi$ that maps every output symbol $y\in\mathcal{Y}$ to some $u\in\mathcal{U}$ . The average probability of error of the decoder for equiprobable inputs is given by

$$P_{e}(W) = \sum_{u}\sum_{y}\frac{W(y|u)}{|\mathcal{U}|}\,\mathbb{P}\left\{\phi(y)\neq u\right\}.$$

The decoder is deterministic for symbols $y$ for which $\mathbb{P}\left\{\phi(y)\neq u\right\}$ assumes only the values $0$ and $1$. For some symbols, however, we allow the decoder to make a random decision. If $W(y|u)=W(y|u^{\prime})$ for some $u,u^{\prime}\in\mathcal{U}$, then $P_{e}(W)$ is the same whether $\phi(y)=u$ or $\phi(y)=u^{\prime}$. Thus, the probability of error is insensitive to the resolution of ties. We denote the error event of a decoder by $\mathcal{E}=\left\{(u,y):\phi(y)\neq u\right\}$. It is dependent on the decoder, i.e., $\mathcal{E}=\mathcal{E}(\phi)$; we suppress this to avoid cumbersome notation. Clearly, $P_{e}(W)=\mathbb{P}\left\{\mathcal{E}\right\}$.

The maximum-likelihood (ML) decoder, well known to minimize $P_{e}(W)$ when the input bits are equiprobable, is defined by

$$W(y|u) > W(y|u^{\prime})\ \ \forall u^{\prime}\neq u \;\Rightarrow\; \phi(y)=u.$$

The ML decoder is not unique, as it does not define how ties are resolved. In the absence of ties, the ML decoding rule is $\phi(y)=\operatorname*{arg\,max}_{u}W(y|u)$ . We denote by $P_{e}^{\textrm{{ML}}}(W)$ the probability of error of the ML decoder.
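To make the role of tie resolution concrete, here is a sketch that evaluates $P_{e}(W)=\sum_{u}\sum_{y}\frac{W(y|u)}{|\mathcal{U}|}\mathbb{P}\{\phi(y)\neq u\}$ for a possibly random decoder described by the probabilities $\mathbb{P}\{\phi(y)=u\}$. The three-output channel, with a tie at $y=1$, and the helper names are made up for illustration.

```python
from fractions import Fraction as Fr


def error_probability(W, decide):
    """P_e(W) = sum_u sum_y W(y|u)/|U| * P{phi(y) != u}.

    W[u][y] holds the channel; decide[y][u] holds P{phi(y) = u}.
    """
    return sum(W[u][y] * (1 - decide[y][u])
               for u in range(len(W)) for y in range(len(W[0]))) / len(W)


def ml_decoder(W, tie_prob_one):
    """Binary-input ML decoder that outputs 1 with probability tie_prob_one on ties."""
    decide = []
    for y in range(len(W[0])):
        if W[0][y] > W[1][y]:
            decide.append([1, 0])
        elif W[1][y] > W[0][y]:
            decide.append([0, 1])
        else:
            decide.append([1 - tie_prob_one, tie_prob_one])
    return decide


# Hypothetical 3-output channel with a tie at y = 1: W(1|0) = W(1|1).
W = [[Fr(6, 10), Fr(2, 10), Fr(2, 10)],
     [Fr(2, 10), Fr(2, 10), Fr(6, 10)]]

for t in (Fr(0), Fr(1, 2), Fr(1)):
    print(t, error_probability(W, ml_decoder(W, t)))  # 3/10 every time
```

Every resolution of the tie, deterministic or random, gives $P_{e}(W)=3/10$.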

We now consider two dependent binary-input channels, $W_{a}:\mathcal{U}\to\mathcal{Y}_{a}$ and $W_{b}:\mathcal{U}\to\mathcal{Y}_{b}$ , with joint distribution $W_{a,b}:\mathcal{U}\times\mathcal{U}\to\mathcal{Y}_{a}\times\mathcal{Y}_{b}$ . A decoder is a mapping $\phi:\mathcal{Y}_{a}\times\mathcal{Y}_{b}\to\mathcal{U}\times\mathcal{U}$ . The joint probability of error of the decoder is, as above,

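$$P_{e}(W_{a,b}) = \sum_{u_{a},u_{b}}\sum_{y_{a},y_{b}}\frac{W_{a,b}(y_{a},y_{b}|u_{a},u_{b})}{|\mathcal{U}|^{2}}\,\mathbb{P}\left\{\phi(y_{a},y_{b})\neq(u_{a},u_{b})\right\}.$$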

An optimal decoder for the joint channel considers both outputs together and makes a decision for both inputs jointly, to minimize $P_{e}(W_{a,b})$ . We denote its probability of error by $P_{e}^{\star}(W_{a,b})$ . When the input bits are equiprobable, $P_{e}^{\star}(W_{a,b})=P_{e}^{\textrm{{ML}}}(W_{a,b})$ .

Rather than jointly decoding the input bits based on the joint output, we may opt to decode each marginal channel separately. That is, consider decoders of the form $\phi(y_{a},y_{b})=(\phi_{a}(y_{a}),\phi_{b}(y_{b}))$ . In words, the decoder of channel $W_{a}$ bases its decision solely on $y_{a}$ and completely ignores $y_{b}$ and vice versa. What are the optimal decoders $\phi_{a}$ and $\phi_{b}$ ? The answer depends on the criterion of optimality.

Denote by $\mathcal{E}_{i}$ the error event of channel $W_{i}$ under some decoder $\phi_{i}:\mathcal{Y}_{i}\to\mathcal{U}$. The Individual Maximum Likelihood (IML) decoder minimizes each individual marginal channel's probability of error. That is, we set $\phi_{a}$ and $\phi_{b}$ as ML decoders for their respective marginal channels. We denote its joint probability of error by $P_{e}^{\textrm{{IML}}}(W_{a,b})$. Hence, $P_{e}^{\textrm{{IML}}}(W_{a,b})$ is computed by (8), with $\phi(y_{a},y_{b})=(\phi_{a}^{\textrm{{ML}}}(y_{a}),\phi_{b}^{\textrm{{ML}}}(y_{b}))$, where $\phi_{a}^{\textrm{{ML}}}$ and $\phi_{b}^{\textrm{{ML}}}$ are ML decoders for the marginal channels $W_{a}$ and $W_{b}$, respectively.

Another criterion is to minimize $\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ , the probability that at least one of the decoders makes an error. We call the decoder that minimizes this probability using individual decoders for each channel the Individual Minimum Joint Probability of error (IMJP) decoder. The event $\mathcal{E}_{a}\cup\mathcal{E}_{b}$ is not the same as the error event of the optimal decoder for the joint channel, even when the individual decoders turn out to be ML decoders. This is because we decode each input bit separately using only a portion of the joint output. Clearly,

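$$P_{e}^{\star}(W_{a,b}) \leq \min_{\phi_{a},\phi_{b}}\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\} \leq P_{e}^{\textrm{IML}}(W_{a,b}).$$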

We denote

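$$P_{e}^{\textrm{IMJP}}(W_{a,b}) = \min_{\phi_{a},\phi_{b}}\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}.$$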

The three decoders in (9) successively use less information for their decisions. The optimal decoder uses both outputs jointly as well as knowledge of the joint probability distribution; the IMJP decoder retains the knowledge of the joint probability distribution, but uses each output separately; finally, the IML decoder dispenses with the joint probability distribution and operates as if the marginals are independent channels.

Example 1.

The conditional distribution $W_{a,b}(y_{a},y_{b}|u_{a},u_{b})$ of some joint channel is given in Table I. (This is not a joint distribution of two synthetic channels that result from polarization. However, the phenomena observed here hold for joint distributions of two synthetic channels as well, and similar examples may be constructed for the polar case.) The marginals are channels $W_{a}(y_{a}|u_{a})$ and $W_{b}(y_{b}|u_{b})$. Three decoders for this channel are shown in Table II. Note that for the IML and IMJP decoders we have $\phi(y_{a},y_{b})=(\phi_{a}(y_{a}),\phi_{b}(y_{b}))$.

The optimal decoder for the joint channel chooses, for each output pair, the input pair with the highest probability. The IML decoder is formed by using an ML decoder for each marginal; the ML decoders of the marginals decide that the input is $0$ when $1$ is received and vice versa. The IMJP decoder is found by checking all combinations of marginal channel decoders $\phi_{a}$ and $\phi_{b}$ and choosing the pair that achieves $\min_{\phi_{a},\phi_{b}}\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$. We then have

[TABLE]

As expected, (9) holds.
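To make the ordering (9) concrete, here is a minimal Python sketch that computes all three error probabilities by brute force for a small joint channel with equiprobable inputs. The channel is randomly generated for illustration (it is not the channel of Table I), and the helpers `random_joint_channel` and `error_probs` are hypothetical names of our own. As in Example 1, the IMJP decoder is found by checking every pair of marginal decoders, which is only feasible for tiny output alphabets.

```python
import itertools
import random

U = (0, 1)  # binary inputs u_a and u_b


def random_joint_channel(Ya, Yb, rng):
    """Hypothetical joint channel W[(ya, yb, ua, ub)] = W_{a,b}(ya, yb | ua, ub)."""
    W = {}
    for ua, ub in itertools.product(U, U):
        w = [rng.random() for _ in range(len(Ya) * len(Yb))]
        total = sum(w)
        for (ya, yb), x in zip(itertools.product(Ya, Yb), w):
            W[(ya, yb, ua, ub)] = x / total
    return W


def error_probs(W, Ya, Yb):
    """Return (P_e^*, P_e^IMJP, P_e^IML) for equiprobable inputs."""
    # Optimal joint decoder: pick the most likely input pair for each output pair.
    p_opt = 1 - sum(max(W[(ya, yb, ua, ub)] for ua in U for ub in U)
                    for ya in Ya for yb in Yb) / 4

    def p_union(phi_a, phi_b):
        # P{E_a ∪ E_b} = 1 - P{both individual decisions are correct}
        return 1 - sum(W[(ya, yb, phi_a[ya], phi_b[yb])]
                       for ya in Ya for yb in Yb) / 4

    # Marginal channels W_a(ya|ua) and W_b(yb|ub), averaging over the other input.
    Wa = {(ya, ua): sum(W[(ya, yb, ua, ub)] for yb in Yb for ub in U) / 2
          for ya in Ya for ua in U}
    Wb = {(yb, ub): sum(W[(ya, yb, ua, ub)] for ya in Ya for ua in U) / 2
          for yb in Yb for ub in U}

    # IML: an ML decoder for each marginal, as if the marginals were independent.
    ml_a = {ya: max(U, key=lambda u: Wa[(ya, u)]) for ya in Ya}
    ml_b = {yb: max(U, key=lambda u: Wb[(yb, u)]) for yb in Yb}
    p_iml = p_union(ml_a, ml_b)

    # IMJP: exhaustive search over all pairs of individual decoders.
    all_a = [dict(zip(Ya, d)) for d in itertools.product(U, repeat=len(Ya))]
    all_b = [dict(zip(Yb, d)) for d in itertools.product(U, repeat=len(Yb))]
    p_imjp = min(p_union(fa, fb) for fa in all_a for fb in all_b)
    return p_opt, p_imjp, p_iml


W = random_joint_channel([0, 1], [0, 1, 2], random.Random(1))
print(error_probs(W, [0, 1], [0, 1, 2]))
```

The entries of Table I could be entered in the same dictionary format `W[(ya, yb, ua, ub)]`. Ties in the ML decisions are broken toward $0$ here, which is an arbitrary choice.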

We now demonstrate that the probability of error of suboptimal decoders is not ordered by degradation. To this end, we degrade the joint channel in Table I by merging the output symbols $(0,0)$ and $(1,1)$ into a new symbol $(0^{\prime},0^{\prime})$ , and the output symbols $(0,1)$ and $(1,0)$ into a new symbol $(1^{\prime},1^{\prime})$ . We denote the new joint channel by $W^{\prime}_{a,b}$ and provide its conditional distribution in Table III. For each of the marginals, the ML decoder declares $0$ upon receipt of $0^{\prime}$ , and $1$ otherwise. Hence, for the degraded channel, $P_{e}^{\textrm{{IML}}}(W^{\prime}_{a,b})=1-(0.92+0.86)/4=0.555$ , which is lower than $P_{e}^{\textrm{{IML}}}(W_{a,b})$ even though $W^{\prime}_{a,b}$ is degraded with respect to $W_{a,b}$ . For the degraded channel, the IML decoder is also the optimal decoder. As this is a degraded channel, however, $P_{e}^{\textrm{{IML}}}(W^{\prime}_{a,b})=P_{e}^{\star}(W^{\prime}_{a,b})\geq P_{e}^{\star}(W_{a,b})=0.52$ .

III-B Polar Coding Setting

Given a joint channel, finding an optimal or IML decoder is an easy task: both are maximum-likelihood decoders, the first based on the joint channel and the second based on the marginal channels. In contrast, finding an IMJP decoder requires an exhaustive search, which may be costly. In the polar coding setting, as we now show, the special structure of joint synthetic channels permits finding the IMJP decoder without resorting to a search procedure.

III-B1 Joint Distribution of Two Synthetic Channels

Let $W$ be some BMS channel that undergoes $n$ polarization steps. Let $a$ and $b$ be two indices of synthetic channels, where $b>a$ . The synthetic channels are $W_{a}(y_{a}|u_{a})$ and $W_{b}(y_{b}|u_{b})$ , where $y_{a}=(y_{1}^{N},u_{1}^{a-1})$ , $y_{b}=(y_{1}^{N},u_{1}^{b-1})$ , and $N=2^{n}$ . We call them the a-channel and the b-channel, respectively. Their joint distribution is $W_{a,b}(y_{a},y_{b}|u_{a},u_{b})$ ; this is the probability that the output of the a-channel is $y_{a}$ and the output of the b-channel is $y_{b}$ , given that the inputs to the channels are $u_{a}$ and $u_{b}$ , respectively.

With probability $1$ , the prefix of $y_{b}$ is $(y_{a},u_{a})$ . Namely, $y_{b}$ has the form

$$y_{b}=(y_{a},u_{a},y),$$

where $y$ denotes the remainder of $y_{b}$ after removing $y_{a}$ and $u_{a}$ . Thus,

$$W_{a,b}(y_{a},y_{b}|u_{a},u_{b})=2\,W_{b}(y_{b}|u_{b})\left[y_{b}=(y_{a},u_{a},y)\right]$$

for some arbitrary $y$ . The factor $2$ stems from the uniform distribution of $u_{a}$ . With some abuse of notation, we will write

[TABLE]

The rightmost expression makes it clear that the portion of $y_{b}$ in which the input of the a-channel appears must equal the actual input of the a-channel.

Observe from (10) that we can think of $W_{b}(y_{a},u_{a},y|u_{b})$ as the joint channel $W_{a,b}$ up to a constant factor. Indeed, we will use $W_{b}(y_{a},u_{a},y|u_{b})$ to denote the joint channel where convenient.

III-B2 Decoders for Joint Synthetic Channels

Which decoders can we consider for joint synthetic channels? The optimal decoder extracts $u_{a}$ from the output of the b-channel and proceeds to decode $u_{b}$ . This outperforms the SC decoder but is also impractical and does not lend itself to computing the probability that is of interest to us, the probability that either of the synthetic channels errs. A natural suggestion is to mimic the SC decoder, i.e., to use an IML decoder. The joint probability of error of this decoder may decrease after stochastic degradation, so we discard this option.

Consider two decoders $\phi_{a}$ and $\phi_{b}$ for channels $W_{a}$ and $W_{b}$ , respectively. As above, $\mathcal{E}_{i}$ is the error event of channel $W_{i}$ using decoder $\phi_{i}$ , $i=a,b$ . We seek a lower bound on $\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ . Therefore, we choose decoders $\phi_{a}$ and $\phi_{b}$ that minimize $\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ ; this is none other than the IMJP decoder. Its performance lower-bounds that of the IML decoder [see (9)]. As we shall later see, when combined with a suitable degrading channel structure, the probability of error of the IMJP decoder does not decrease after stochastic degradation. Conversely, it does not increase under stochastic upgradation; thus, combining the IMJP decoder with a suitable upgrading procedure produces the desired lower bound.

Multiple decoders may achieve $\min_{\phi_{a},\phi_{b}}\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ . One such decoder can be found in a straightforward manner; we call it the IMJP decoder. The following theorem shows how to find it. Its proof is a direct consequence of Lemmas 3 and 4, which follow.

Theorem 1.

Let $W_{a}(y_{a}|u_{a})$ and $W_{b}(y_{b}|u_{b})$ be two channels with joint distribution $W_{a,b}$ that satisfies (10). Then, $\min_{\phi_{a},\phi_{b}}\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ is achieved by setting $\phi_{b}$ as an ML decoder for $W_{b}$ and $\phi_{a}$ according to

[TABLE]

where

[TABLE]

Note that $T(y_{a}|u_{a})$ is not a conditional distribution; it is non-negative, but its sum over $y_{a}$ does not necessarily equal $1$ . In the right-hand side of (12), the dependence on $y_{a},u_{a}$ is via (10), as $W_{a,b}(y_{a},y_{b}|u_{a},u_{b})=0$ if $y_{b}\neq(y_{a},u_{a},y)$ for some $y$ .

Corollary 2.

Theorem 1 holds for any two synthetic channels $W_{a}(y_{a}|u_{a})$ and $W_{b}(y_{b}|u_{b})$ that result from the same number of polarization steps of a BMS channel, where index $b$ is greater than $a$ .

Proof:

In the polar code case, the joint channel satisfies (10), so Theorem 1 applies. ∎

In what follows, denote

[TABLE]

Lemma 3.

Let $W_{a}(y_{a}|u_{a})$ and $W_{b}(y_{b}|u_{b})$ be two dependent binary-input channels with equiprobable inputs and joint distribution $W_{a,b}$ that satisfies (10). Let $\phi_{a}:\mathcal{Y}_{a}\to\mathcal{U}$ be some decoder for channel $W_{a}$ with error event $\mathcal{E}_{a}$ . Then, setting $\phi_{b}$ as an ML decoder for $W_{b}$ achieves $\min_{\phi_{b}}\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ .

Proof:

Recall that $y_{b}=(y_{a},u_{a},y)$ . Using (10),

[TABLE]

where

[TABLE]

The problem of finding the decoder $\phi_{b}$ that minimizes $\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ is separable over $u_{a},y_{a},y_{b}$ ; the terms $\varphi_{a}(y_{a},u_{a})$ and $\left[y_{b}=(y_{a},u_{a},y)\right]$ are non-negative and independent of $u_{b}$ . Therefore, the optimal decoder $\phi_{b}$ is given by $\phi_{b}(y_{b})=\arg\max_{u^{\prime}_{b}}W_{b}(y_{b}|u^{\prime}_{b}).$ ∎

We remark that Lemma 3 holds for any a-channel decoder $\phi_{a}$ . Thus, regardless of the selection of $\phi_{a}$ , the optimal decoder for the b-channel (in the sense of minimizing $\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ ) is an ML decoder.

Lemma 4.

Let $W_{a}(y_{a}|u_{a})$ and $W_{b}(y_{b}|u_{b})$ be two binary-input channels with joint distribution $W_{a,b}(y_{a},y_{b}|u_{a},u_{b})$ and equiprobable inputs. Let $\phi_{b}:\mathcal{Y}_{b}\to\mathcal{U}$ be some decoder for channel $W_{b}$ . Then, the decoder $\phi_{a}$ for channel $W_{a}$ given by (11) minimizes $\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ .

Proof:

Since the input is equiprobable,

[TABLE]

where the last equality is by (12). The problem of finding the decoder $\phi_{a}$ that minimizes $\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ is separable over $y_{a}$ ; clearly the optimal decoder is the one that sets $\phi_{a}(y_{a})=\arg\max_{u^{\prime}_{a}}T(y_{a}|u^{\prime}_{a}).$ ∎

Using (10), if $\phi_{b}$ is chosen as an ML decoder, as per Lemma 3, we have the following expression for $T(y_{a}|u_{a})$ :

[TABLE]
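Theorem 1 can be checked numerically on small channels. The sketch below (hypothetical helper names, randomly generated channels) stores the b-channel as `Wb[(ya, ua, y, ub)]` $=W_{b}(y_{a},u_{a},y|u_{b})$ with the structure (10), takes $\phi_{b}$ as an ML decoder and $\phi_{a}(y_{a})=\arg\max_{u_{a}}T(y_{a}|u_{a})$ with $T(y_{a}|u_{a})\propto\sum_{y}\max_{u_{b}}W_{b}(y_{a},u_{a},y|u_{b})$ (our reading of $T$ when $\phi_{b}$ is ML; the positive constant does not affect the maximization), and compares the resulting $\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ with an exhaustive search over all decoder pairs.

```python
import itertools
import random

U = (0, 1)


def random_successive_channel(Ya, Y, rng):
    """Hypothetical W_b[(ya, ua, y, ub)] = W_b(ya, ua, y | ub) with the structure (10):
    for every (ua, ub) the masses over (ya, y) sum to 1/2, so 2*W_b is a valid W_{a,b}."""
    Wb = {}
    for ua, ub in itertools.product(U, U):
        w = [rng.random() for _ in range(len(Ya) * len(Y))]
        total = sum(w)
        for (ya, y), x in zip(itertools.product(Ya, Y), w):
            Wb[(ya, ua, y, ub)] = 0.5 * x / total
    return Wb


def p_union(Wb, Ya, Y, phi_a, phi_b):
    """P{E_a ∪ E_b} with equiprobable inputs, using
    W_{a,b}(ya, (ya, ua, y) | ua, ub) = 2 W_b(ya, ua, y | ub)."""
    ok = 0.0
    for ya in Ya:
        ua = phi_a[ya]  # terms with a wrong a-decision never count as correct
        for y in Y:
            ok += 2 * Wb[(ya, ua, y, phi_b[(ya, ua, y)])]
    return 1 - ok / 4


def imjp_theorem1(Wb, Ya, Y):
    """phi_b: ML decoder for W_b; phi_a: argmax over ua of T(ya|ua),
    with T(ya|ua) proportional to sum_y max_ub W_b(ya, ua, y | ub)."""
    phi_b = {(ya, ua, y): max(U, key=lambda ub: Wb[(ya, ua, y, ub)])
             for ya in Ya for ua in U for y in Y}

    def T(ya, ua):
        return sum(max(Wb[(ya, ua, y, ub)] for ub in U) for y in Y)

    phi_a = {ya: max(U, key=lambda ua: T(ya, ua)) for ya in Ya}
    return phi_a, phi_b


def imjp_exhaustive(Wb, Ya, Y):
    """Minimum of P{E_a ∪ E_b} over all pairs (phi_a, phi_b)."""
    outs_b = list(itertools.product(Ya, U, Y))
    return min(p_union(Wb, Ya, Y, dict(zip(Ya, da)), dict(zip(outs_b, db)))
               for da in itertools.product(U, repeat=len(Ya))
               for db in itertools.product(U, repeat=len(outs_b)))


Ya, Y = [0, 1], [0, 1]
Wb = random_successive_channel(Ya, Y, random.Random(3))
phi_a, phi_b = imjp_theorem1(Wb, Ya, Y)
print(p_union(Wb, Ya, Y, phi_a, phi_b), imjp_exhaustive(Wb, Ya, Y))
```

The exhaustive search enumerates $2^{|\mathcal{Y}_{a}|}\cdot 2^{|\mathcal{Y}_{b}|}$ decoder pairs, so it serves only as a sanity check; the Theorem 1 decoder needs a single pass over the outputs.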

The IMJP and IML decoders do not coincide in general, although they may coincide in some cases. We demonstrate both situations in the following example.

Example 2.

Let $W$ be a binary symmetric channel with crossover probability $p$ . We perform $n=2$ polarization steps and consider the joint channel $W_{1,4}$ , i.e., $W_{a}=W^{--}$ and $W_{b}=W^{++}$ . When $p=0.4$ , we have $0.6544=P_{e}^{\textrm{{IMJP}}}(W_{1,4})<P_{e}^{\textrm{{IML}}}(W_{1,4})=0.6976$ . On the other hand, when $p=0.2$ , the IMJP and IML decoders coincide, and $P_{e}^{\textrm{{IMJP}}}(W_{1,4})=P_{e}^{\textrm{{IML}}}(W_{1,4})=0.5136$ . In either case, (9) holds.
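Example 2 can be reproduced with a short brute-force computation. The sketch assumes the kernel $F=\left[\begin{smallmatrix}1&0\\1&1\end{smallmatrix}\right]$ with $x=uG$, $G=F\otimes F$, and no bit-reversal permutation (indices 1 and 4 are unaffected by bit reversal anyway), so that $W_{a}=W^{--}$ has input $u_{1}$ and output $y_{1}^{4}$, while $W_{b}=W^{++}$ has input $u_{4}$ and output $(y_{1}^{4},u_{1},u_{2},u_{3})$. The b-channel uses ML decoding; the a-channel decision maximizes either the $T$-type quantity of Theorem 1 (IMJP) or the marginal likelihood (IML). The function names are hypothetical.

```python
import itertools

B = (0, 1)


def encode_n2(u1, u2, u3, u4):
    """x = u G with G = F ⊗ F and F = [[1, 0], [1, 1]] (assumed; no bit reversal)."""
    return (u1 ^ u2 ^ u3 ^ u4, u2 ^ u4, u3 ^ u4, u4)


def likelihood(z, u, p):
    """P(z | u) for four uses of a BSC(p)."""
    out = 1.0
    for zi, xi in zip(z, encode_n2(*u)):
        out *= p if zi != xi else 1 - p
    return out


def joint_error_w14(p, decoder):
    """P{E_1 ∪ E_4} for W_a = W^{--} (input u1, output z) and
    W_b = W^{++} (input u4, output (z, u1, u2, u3)), with an ML b-decoder."""

    def b_correct_mass(z, u1):
        # sum over u2, u3 of max over u4; proportional to T(z | u1) of Theorem 1
        return sum(max(likelihood(z, (u1, u2, u3, u4), p) for u4 in B)
                   for u2 in B for u3 in B)

    def a_likelihood(z, u1):
        # proportional to W^{--}(z | u1)
        return sum(likelihood(z, (u1, u2, u3, u4), p)
                   for u2 in B for u3 in B for u4 in B)

    score = b_correct_mass if decoder == "IMJP" else a_likelihood
    ok = 0.0
    for z in itertools.product(B, repeat=4):
        u1_hat = max(B, key=lambda v: score(z, v))  # a-channel decision
        ok += b_correct_mass(z, u1_hat)  # contribution of z, up to the 1/16 prior
    return 1 - ok / 16


for p in (0.4, 0.2):
    print(p, round(joint_error_w14(p, "IMJP"), 4), round(joint_error_w14(p, "IML"), 4))
```

With these conventions the printed values agree with those quoted in Example 2: $0.6544$ (IMJP) and $0.6976$ (IML) for $p=0.4$, and $0.5136$ for both decoders when $p=0.2$. In this computation, for $p=0.4$ the IMJP a-decision turns out to be the complement of the ML (parity) decision for every received word, which is why the two decoders differ.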

Remark 1.

In the special case where $W$ is a BEC and $W_{a}$ and $W_{b}$ are two of its polar descendants, the IMJP and IML (SC) decoders coincide. This is due to a special property of the BEC: whether the output of a synthetic channel is an erasure is determined by the outputs of the $N=2^{n}$ copies of the BEC, regardless of the inputs of previous synthetic channels. We show this in Appendix A.

III-B3 Proper Degrading Channels

The IMJP decoder is attractive for joint polar synthetic channels since, by Theorem 1, we can efficiently compute it. This was made possible by the successive form of the joint channel (10). Thus, we seek degrading channels that maintain this form.

Let $W_{a,b}(y_{a},y_{b}|u_{a},u_{b})$ be a joint distribution of two synthetic channels and let $Q_{a,b}(z_{a},z_{b}|u_{a},u_{b})\succcurlyeq W_{a,b}(y_{a},y_{b}|u_{a},u_{b})$ . The marginal channels of $Q_{a,b}$ are $Q_{a}(z_{a}|u_{a})$ and $Q_{b}(z_{b}|u_{b})$ . The most general degrading channel is of the form

[TABLE]

where $P_{1}$ and $P_{2}$ are probability distributions. This form does not preserve the successive structure of joint synthetic channels (10). Even if $Q_{a,b}$ satisfies (10), the resulting $W_{a,b}$ may not. Therefore, we turn to a subset of degrading channels. Recalling that $y_{b}=(y_{a},u_{a},y)$ , we consider degrading channels of the form

[TABLE]

That is, these degrading channels degrade $z_{a}$ , the output of $Q_{a}$ , to $y_{a}$ , pass $u_{a}$ unchanged, and degrade $z$ , the remainder of $Q_{b}$ ’s output, to $y$ . For this to be a valid channel, $P_{a}$ and $P_{b}$ must be probability distributions. This degrading channel structure is illustrated in Figure 2. By construction, degrading channels of the form (14) preserve the form (10) that is required for efficiently computing the IMJP decoder as in Theorem 1.

Definition 1 (Proper degrading channels).

A degrading channel of the form (14) is called proper. We write $Q\overset{p}{\succcurlyeq}W$ to denote that channel $Q$ is upgraded from $W$ with a proper degrading channel. We say that an upgrading (degrading) procedure is proper if its degrading channel is proper.

By marginalizing the joint channel, it is straightforward to deduce the following for joint synthetic channel distributions.

Lemma 5.

If $Q_{a,b}(z_{a},u_{a},z|u_{a},u_{b})\overset{p}{\succcurlyeq}W_{a,b}(y_{a},u_{a},y|u_{a},u_{b})$ , then $Q_{a}(z_{a}|u_{a})\succcurlyeq W_{a}(y_{a}|u_{a})$ and $Q_{b}(z_{a},u_{a},z|u_{b})\succcurlyeq W_{b}(y_{a},u_{a},y|u_{b})$ .

This lemma is encouraging, but insufficient for our purposes. It is easy to take degrading channels that are used for degrading a single (not joint) synthetic channel and cast them into a proper degrading channel for joint channels. This, however, is not our goal. Instead, we start with $W_{a,b}$ and seek an upgraded $Q_{a,b}$ with a smaller output alphabet that can be degraded to $W_{a,b}$ using a proper degrading channel. This is a very different problem from the degrading one, and its solution is not immediately apparent. Plain-vanilla attempts to use upgrading procedures for single channels fail to produce the desired results. Later, we develop proper upgrading procedures that upgrade one of the marginals without changing the other.

We now show that the probability of error of the IMJP decoder does not decrease after degradation by proper degrading channels. Intuitively, this is because the decoder for the original channel can simulate the degrading channel. We denote by $\mathcal{E}^{W}_{a}$ the error event of channel $W_{a}$ under some decoder $\phi_{a}$ , and similarly define $\mathcal{E}^{Q}_{a}$ , $\mathcal{E}^{W}_{b}$ , and $\mathcal{E}^{Q}_{b}$ . Further, we denote by $\phi_{i}$ decoders for $W_{i}$ and by $\psi_{i}$ decoders for $Q_{i}$ , $i=a,b$ .

Lemma 6.

Let joint channel $W_{a,b}(y_{a},u_{a},y|u_{a},u_{b})$ have marginals $W_{a}(y_{a}|u_{a})$ and $W_{b}(y_{a},u_{a},y|u_{b})$ . Assume that $Q_{a,b}(z_{a},u_{a},z|u_{a},u_{b})\overset{p}{\succcurlyeq}W_{a,b}(y_{a},u_{a},y|u_{a},u_{b})$ , then $\min_{\psi_{a},\psi_{b}}\mathbb{P}\left\{\mathcal{E}^{Q}_{a}\cup\mathcal{E}^{Q}_{b}\right\}\leq\min_{\phi_{a},\phi_{b}}\mathbb{P}\left\{\mathcal{E}^{W}_{a}\cup\mathcal{E}^{W}_{b}\right\}$ .

Proof:

The proof follows by noting that for any decoder $\phi_{i}$ , $i=a,b$ we can find a decoder $\psi_{i}$ with identical performance. First consider the decoder for channel $a$ . Denote by $\arg P_{a}(y_{a}|z_{a})$ the result of drawing $y_{a}$ with probability $P_{a}(\cdot|z_{a})$ . Then, the decoder $\psi_{a}$ for $Q_{a}$ , defined as $\psi_{a}(z_{a})=\phi_{a}(\arg P_{a}(y_{a}|z_{a}))$ , has performance identical to $\phi_{a}$ for $W_{a}$ . The decoder $\psi_{a}$ results from first degrading the a-channel output and only then decoding. Next, consider the decoder for the b-channel. Denote by $\arg P_{b}(y|z_{a},u_{a},z,y_{a})$ the result of drawing $y$ with probability $P_{b}(\cdot|z_{a},u_{a},z,y_{a})$ . Then, similar to the a-channel case, the decoder $\psi_{b}$ for $Q_{b}$ , defined as $\psi_{b}(z_{a},u_{a},z)=\phi_{b}(\arg P_{a}(y_{a}|z_{a}),u_{a},\arg P_{b}(y|z_{a},u_{a},z,y_{a}))$ , has performance identical to $\phi_{b}$ for $W_{b}$ . Hence, the best decoder pair $\psi_{a},\psi_{b}$ cannot do worse than the best decoder pair $\phi_{a},\phi_{b}$ . ∎
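The simulation argument in this proof can be illustrated for the a-channel alone: if $W_{a}$ is obtained from $Q_{a}$ through a degrading channel $P_{a}$, then the randomized decoder $\psi_{a}(z_{a})=\phi_{a}(y_{a})$, with $y_{a}$ drawn from $P_{a}(\cdot|z_{a})$, has exactly the error probability of $\phi_{a}$ on $W_{a}$. The Python sketch below uses small random channels (hypothetical, for illustration only) and computes both error probabilities exactly, averaging over the random draw instead of sampling it.

```python
import itertools
import random

U = (0, 1)


def random_stochastic(rows, cols, rng):
    """Random conditional distribution M[(r, c)] = M(c | r)."""
    M = {}
    for r in rows:
        w = [rng.random() for _ in cols]
        total = sum(w)
        for c, x in zip(cols, w):
            M[(r, c)] = x / total
    return M


rng = random.Random(0)
Z, Y = range(4), range(3)
Q = random_stochastic(U, Z, rng)  # Q_a(z | u), the upgraded a-channel
P = random_stochastic(Z, Y, rng)  # P_a(y | z), the degrading channel
# W_a(y | u) = sum_z Q_a(z | u) P_a(y | z)
W = {(u, y): sum(Q[(u, z)] * P[(z, y)] for z in Z) for u in U for y in Y}


def error_on_W(phi):
    """Error probability of decoder phi (phi[y] in {0, 1}) on W_a."""
    return 1 - sum(W[(u, y)] for u in U for y in Y if phi[y] == u) / 2


def error_of_simulation_on_Q(phi):
    """Error of psi(z) = phi(y) with y drawn from P_a(. | z), averaged exactly."""
    return 1 - sum(Q[(u, z)] * P[(z, y)]
                   for u in U for z in Z for y in Y if phi[y] == u) / 2


best_on_Q = 1 - sum(max(Q[(u, z)] for u in U) for z in Z) / 2  # ML decoder for Q_a
print("ML on Q_a:", best_on_Q)
for phi in itertools.product(U, repeat=len(Y)):
    print(phi, error_on_W(phi), error_of_simulation_on_Q(phi))
```

The last two columns coincide for every deterministic $\phi_{a}$, and the ML error on $Q_{a}$ never exceeds them, which is the single-channel analogue of Lemma 6.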

Let $W$ be a BMS channel that undergoes $n$ polarization steps. The probability of error of a polar code with non-frozen set $\mathcal{A}$ under SC decoding is given by $P_{e}^{\textrm{{SC}}}(W)=\mathbb{P}\left\{\bigcup_{a\in\mathcal{A}}\mathcal{E}_{a}^{\textrm{{ML}}}\right\},$ where $\mathcal{E}_{a}^{\textrm{{ML}}}$ is the error event of synthetic channel $W_{a}$ under ML decoding. Obviously, for any $\mathcal{A}^{\prime}\subseteq\mathcal{A}$ ,

[TABLE]

We have already mentioned the simplest such lower bound, $P_{e}^{\textrm{{SC}}}(W)\geq\max_{a\in\mathcal{A}}\mathbb{P}\left\{\mathcal{E}_{a}^{\textrm{{ML}}}\right\}$ . We now show that the IMJP decoder provides a tighter lower bound. To this end, recall that $P_{e}^{\textrm{{IMJP}}}(W_{a,b})=\min_{\phi_{a},\phi_{b}}\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\},$ where $\mathcal{E}_{i}$ is the error event of channel $i$ under decoder $\phi_{i}$ , $i=a,b$ .

Lemma 7.

Let $W$ be a BMS channel that undergoes $n$ polarization steps, and let $\mathcal{A}$ be the non-frozen set. Then,

[TABLE]

Proof:

Using (15), $P_{e}^{\textrm{{SC}}}(W)\geq\max_{a,b\in\mathcal{A}}\mathbb{P}\left\{\mathcal{E}_{a}^{\textrm{{ML}}}\cup\mathcal{E}_{b}^{\textrm{{ML}}}\right\}$ . By definition, the IMJP decoder seeks decoders $\phi_{a}$ and $\phi_{b}$ that minimize the joint probability of error of synthetic channels with indices $a$ and $b$ . Therefore, for any two indices $a$ and $b$ we have $\mathbb{P}\left\{\mathcal{E}_{a}^{\textrm{{ML}}}\cup\mathcal{E}_{b}^{\textrm{{ML}}}\right\}\geq P_{e}^{\textrm{{IMJP}}}(W_{a,b}).$ In particular, this holds for the indices $a,b$ that maximize the right-hand side. This establishes the leftmost inequality of (16).

To establish the rightmost inequality of (16), we first show that for any $a,b$ ,

[TABLE]

To see this, first recall that the IMJP decoder performs ML decoding on the b-channel, yielding $P_{e}^{\textrm{{IMJP}}}(W_{a,b})\geq\mathbb{P}\left\{\mathcal{E}_{b}^{\textrm{{ML}}}\right\}$ . Next, we construct $W^{\prime}_{a,b}\overset{p}{\succcurlyeq}W_{a,b}$ in which the b-channel is noiseless, by augmenting the $y$ portion of the output of $W_{a,b}$ with $u_{b}$ , i.e.,

[TABLE]

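Channel $W^{\prime}_{a,b}$ can be degraded to $W_{a,b}$ using a proper degrading channel by omitting $v_{b}$ from the $y$ portion of the output and leaving $y_{a}$ unchanged. The a-channel of $W^{\prime}_{a,b}$ is still $W_{a}$ , and its b-channel is noiseless, so an ML decoder for the b-channel never errs and the IMJP decoder of $W^{\prime}_{a,b}$ reduces to an ML decoder for $W_{a}$ . Thus, by Lemma 6, $P_{e}^{\textrm{{IMJP}}}(W_{a,b})\geq P_{e}^{\textrm{{IMJP}}}(W^{\prime}_{a,b})=\mathbb{P}\left\{\mathcal{E}_{a}^{\textrm{{ML}}}\right\}$ .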

Finally, denote $a_{0}=\operatorname*{arg\,max}_{a\in\mathcal{A}}\mathbb{P}\left\{\mathcal{E}_{a}^{\textrm{{ML}}}\right\}$ . By (17), for any $c,d\in\mathcal{A}$ with $c>a_{0}>d$ we have $P_{e}^{\textrm{{IMJP}}}(W_{a_{0},c})\geq\mathbb{P}\left\{\mathcal{E}_{a_{0}}^{\textrm{{ML}}}\right\}$ and $P_{e}^{\textrm{{IMJP}}}(W_{d,a_{0}})\geq\mathbb{P}\left\{\mathcal{E}_{a_{0}}^{\textrm{{ML}}}\right\}$ . Since $\max_{a,b\in\mathcal{A}}P_{e}^{\textrm{{IMJP}}}(W_{a,b})\geq\max_{c,d}\{P_{e}^{\textrm{{IMJP}}}(W_{a_{0},c}),P_{e}^{\textrm{{IMJP}}}(W_{d,a_{0}})\}$ , the rightmost inequality of (16) follows. ∎

Lemmas 6 and 7 are instrumental for our lower bound, which combines upgrading operations and the IMJP decoder.

IV Properties of Joint Synthetic Channels

In this section, we study the properties of joint synthetic channels. We begin by bringing the joint synthetic channel into an equivalent form where the b-channel’s ML decision is immediately apparent. We then explain how to jointly polarize synthetic channels. Finally, we describe some consequences of symmetry on joint channels and on the IMJP decoder.

IV-A Representation of Joint Synthetic Channel Distribution using $D$ -values

Two channels $W$ and $W^{\prime}$ with the same input alphabet but possibly different output alphabets are called equivalent if $W\succcurlyeq W^{\prime}$ and $W^{\prime}\succcurlyeq W$ . We denote this by $W\equiv W^{\prime}$ . Channel equivalence can cast a channel in a more convenient form. For example, if $W$ is a BMS, one can transform it to an equivalent channel whose output is a sufficient statistic, such as a $D$ -value (see Appendix B), in which case the ML decoder’s decision is immediately apparent.

Let $W_{a,b}(y_{a},u_{a},y|u_{a},u_{b})$ be a joint synthetic channel. Since the joint distribution is determined by the distribution of $W_{b}$ , we can transform $W_{a,b}$ to an equivalent channel in which the b-channel $D$ -value of symbol $(y_{a},u_{a},y)$ is immediately apparent. [Footnote 3: By “b-channel $D$ -value” we mean the $D$ -value computed for channel $W_{b}$ . Instead of $D$ -values, other sufficient statistics of the b-channel could have been used. In fact, for practical implementation (see Section VIII), we recommend using likelihood ratios, which offer a superior dynamic range. Our use of $D$ -values in the exposition was prompted by their bounded range, $[-1,1]$ , which simplifies many of the expressions that follow.]

Definition 2 ( $D$ -value representation).

Joint channel $W_{a,b}(y_{a},u_{a},d_{b}|u_{a},u_{b})$ is in $D$ -value representation if the marginal $W_{b}$ satisfies

[TABLE]

We use the same notation $W_{a,b}$ for both the regular and the $D$ -value representations of the joint channel due to their equivalence. The discussion of the various representations of joint channels in Section III-B applies here as well. In particular, we will frequently use $W_{b}(y_{a},u_{a},d_{b}|u_{b})$ to denote the joint synthetic channel distribution.

The following lemma affords a more convenient description of the joint channel, in which, in line with the IMJP decoder, the b-channel’s ML decision is immediately apparent. Moreover, this description greatly simplifies the expressions that follow.

Lemma 8.

Channels $W_{a,b}(y_{a},u_{a},y|u_{a},u_{b})$ and $W_{a,b}(y_{a},u_{a},d_{b}|u_{a},u_{b})$ are equivalent and the degrading channels from one to the other are proper.

Proof:

To establish equivalence we show that each channel is degraded from the other using proper degrading channels. The only portion of interest in (14) is $P_{b}$ , as in either direction $y_{a}$ and $u_{a}$ are unchanged by the degrading channel. Denote by $D_{y_{a},u_{a}}^{d_{b}}$ the set of all symbols $y$ such that the b-channel $D$ -value of $(y_{a},u_{a},y)$ is $d_{b}$ , for fixed $y_{a},u_{a}$ . Then,

[TABLE]

where

[TABLE]

Clearly, the b-channel $D$ -value of $(y_{a},u_{a},d_{b})$ is $d_{b}$ .

On the other hand, by (10) and since all symbols in $D_{y_{a},u_{a}}^{d_{b}}$ share the same b-channel $D$ -value,

[TABLE]

where

[TABLE]

and $W_{b}(y_{a},u_{a},y)=\frac{1}{2}\sum_{u_{b}}W_{b}(y_{a},u_{a},y|u_{b})$ . ∎
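The first direction of this proof amounts to merging, for fixed $(y_{a},u_{a})$, all symbols in $D_{y_{a},u_{a}}^{d_{b}}$. A minimal sketch follows, assuming the usual definition $d=(W(\cdot|0)-W(\cdot|1))/(W(\cdot|0)+W(\cdot|1))$ of a $D$-value (Appendix B is not reproduced here). It uses exact rational arithmetic (`fractions.Fraction`) so that equal $D$-values are recognized reliably, and a small hypothetical b-channel with a single a-output that satisfies (10).

```python
from collections import defaultdict
from fractions import Fraction


def d_value(w0, w1):
    """D-value (w0 - w1) / (w0 + w1) of a symbol with likelihoods w0, w1."""
    return (w0 - w1) / (w0 + w1)


def to_d_value_representation(Wb):
    """Map Wb[(ya, ua, y, ub)] to Wd[(ya, ua, d_b, ub)] by merging, for each
    fixed (ya, ua), all y whose symbol (ya, ua, y) has the same b-channel D-value."""
    Wd = defaultdict(Fraction)
    for ya, ua, y in {key[:3] for key in Wb}:
        w0, w1 = Wb[(ya, ua, y, 0)], Wb[(ya, ua, y, 1)]
        if w0 + w1 == 0:
            continue  # a zero-probability symbol carries no information
        d = d_value(w0, w1)
        Wd[(ya, ua, d, 0)] += w0
        Wd[(ya, ua, d, 1)] += w1
    return dict(Wd)


def ml_error(W):
    """b-channel ML error probability with equiprobable input; keys end with ub."""
    symbols = {key[:3] for key in W}
    return 1 - sum(max(W[s + (0,)], W[s + (1,)]) for s in symbols) / 2


# Hypothetical b-channel with a single a-output ya = 0, in units of 1/16:
# (ua, y) -> (16 * W_b(0, ua, y | 0), 16 * W_b(0, ua, y | 1))
raw = {(0, 0): (2, 1), (0, 1): (4, 2), (0, 2): (2, 5),
       (1, 0): (1, 3), (1, 1): (1, 3), (1, 2): (6, 2)}
Wb = {}
for (ua, y), (m0, m1) in raw.items():
    Wb[(0, ua, y, 0)] = Fraction(m0, 16)
    Wb[(0, ua, y, 1)] = Fraction(m1, 16)

Wd = to_d_value_representation(Wb)
print(sorted(Wd.items()))
print(ml_error(Wb), ml_error(Wd))
```

Merging leaves each symbol's b-channel $D$-value, and therefore the b-channel ML decision and error probability, unchanged. With floating-point likelihoods one would have to group by a quantized $D$-value instead, which would no longer be an exact equivalence.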

Remark 2.

In Section IV-B we will show how to jointly polarize a joint channel $W_{a,b}$ . Even if $W_{a,b}$ is given in $D$ -value representation, the jointly polarized version is not. However, Lemma 8 enables us to convert the jointly polarized distribution to $D$ -value representation. This is possible because Lemma 8 holds for any representation of $W_{a,b}(y_{a},u_{a},y|u_{a},u_{b})$ in which $u_{a},y_{a}$ are the input and output, respectively, of the a-channel, $u_{b}$ is the input of the b-channel, and $(y_{a},u_{a},y)$ is the output of the b-channel. In particular, $y$ need not consist of inputs to channels $W_{a+1},\ldots,W_{b-1}$ .

Remark 3.

At this point the reader may wonder why we have stopped here and not converted the a-channel output to its $D$ -value. The reason is that this constitutes a degrading operation, which is the opposite of what we need. Two a-channel symbols with the same a-channel $D$ -value may have very different meanings for the IMJP decoder. Thus, we cannot combine them into a single symbol without incurring loss.

When the joint channel is in $D$ -value representation, proper degrading channels admit the form

[TABLE]

It is obvious that all properties obtained from degrading channels of the form (14) are retained for degrading channels of the form (20). By Lemma 8, we may assume that the degraded channel is also in $D$ -value representation.

IV-B Polarization for Joint Synthetic Channels

Let $W_{a,b}(y_{a},u_{a},d_{b}|u_{a},u_{b})$ be some joint synthetic channel distribution in $D$ -value representation. Recall that $a$ and $b$ are indices of synthetic channels. For $\alpha,\beta\in\{-,+\}$ , we denote by $a^{\alpha}$ and $b^{\beta}$ the indices of the synthetic channels that result from polar transforms of $W_{a}$ and $W_{b}$ according to $\alpha$ and $\beta$ . That is,

[TABLE]

and a similar relationship holds for $b^{\beta}$ . The resulting joint channel is, thus, $W_{a^{\alpha},b^{\beta}}$ .

Even though $W_{a,b}$ is in $D$ -value representation, after a polarization transform this is no longer the case. Of course, one can always bring the polarized joint channel to an equivalent $D$ -value representation as in Lemma 8.

The polar construction is shown in Figure 3. Here, two independent copies of the joint channel $W_{a,b}$ (in $D$ -value representation) are combined. The inputs and outputs of the a-channel of each copy are denoted explicitly using thicker arrows with hollow tips. For example, for the bottom copy of $W_{a,b}$ , the a-input is $\nu_{a}$ and the a-output is $\eta_{a}$ , whereas the b-input is $\nu_{b}$ and the b-output is $(\eta_{a},\nu_{a},\delta_{b})$ .

The input $u_{a^{\alpha}}$ and output $y_{a^{\alpha}}$ of $W_{a^{\alpha}}$ are given by

[TABLE]

The input $u_{b^{\beta}}$ and output $y_{b^{\beta}}$ of $W_{b^{\beta}}$ are given by

[TABLE]

Note that $y_{a^{\alpha}}$ and $u_{a^{\alpha}}$ are contained in $y_{b^{\beta}}$ . That is, $y_{b^{\beta}}=(y_{a^{\alpha}},u_{a^{\alpha}},y_{r})$ , where

[TABLE]

Thus, the joint output of both channels is $y_{b^{\beta}}$ .

The distribution of the jointly polarized channel is given by

[TABLE]

where

[TABLE]

We have shown how to generate $W_{a^{\alpha},b^{\beta}}$ from $W_{a,b}$ . Another case of interest is generating $W_{a^{-},a^{+}}$ from $W_{a}$ . Denote the output of $W_{a^{-}}$ by $y_{a^{-}}$ . The output of $W_{a^{+}}$ is $(y_{a^{-}},u_{a})$ . From (10), we need only compute $W_{a^{+}}$ to find $W_{a^{-},a^{+}}$ . This is accomplished by (1).
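The construction of $W_{a^{-},a^{+}}$ from $W_{a}$ can be sketched for a BSC, taking $W_{a}=W$. The code assumes the standard transforms $W^{-}(y_{1},y_{2}|u_{1})=\frac{1}{2}\sum_{u_{2}}W(y_{1}|u_{1}\oplus u_{2})W(y_{2}|u_{2})$ and $W^{+}(y_{1},y_{2},u_{1}|u_{2})=\frac{1}{2}W(y_{1}|u_{1}\oplus u_{2})W(y_{2}|u_{2})$, and builds the joint distribution from $W^{+}$ alone through (10): $W_{-,+}(y_{a},(y_{a},u_{a})|u_{a},u_{b})=2W^{+}(y_{a},u_{a}|u_{b})$, with all other entries zero. The crossover probability and function names are illustrative.

```python
import itertools

U = (0, 1)


def bsc(p):
    """W[(y, u)] = W(y | u) for a BSC with crossover probability p."""
    return {(y, u): (1 - p if y == u else p) for y in U for u in U}


def minus(W, Y):
    """W^-((y1, y2) | u1) = 1/2 sum_{u2} W(y1 | u1 xor u2) W(y2 | u2)."""
    return {((y1, y2), u1): 0.5 * sum(W[(y1, u1 ^ u2)] * W[(y2, u2)] for u2 in U)
            for y1 in Y for y2 in Y for u1 in U}


def plus(W, Y):
    """W^+((y1, y2, u1) | u2) = 1/2 W(y1 | u1 xor u2) W(y2 | u2)."""
    return {((y1, y2, u1), u2): 0.5 * W[(y1, u1 ^ u2)] * W[(y2, u2)]
            for y1 in Y for y2 in Y for u1 in U for u2 in U}


def joint_minus_plus(W, Y):
    """Nonzero entries J[(ya, yb, ua, ub)] of W_{-,+}; by (10) they occur only
    for yb = (ya, ua) and equal 2 W^+(yb | ub)."""
    Wp = plus(W, Y)
    return {(ya, ya + (ua,), ua, ub): 2 * Wp[(ya + (ua,), ub)]
            for ya in itertools.product(Y, Y) for ua in U for ub in U}


W, Y = bsc(0.11), U
J = joint_minus_plus(W, Y)
print(len(J), "nonzero entries")
```

Marginalizing `J` over the b-output and $u_{b}$ recovers `minus(W, Y)`, and marginalizing over $y_{a}$ and $u_{a}$ recovers `plus(W, Y)`, which is a quick consistency check of (10).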

If two channels are ordered by degradation, so are their polar transforms [3, Lemma 4.7]. That is, if $Q\succcurlyeq W$ then $Q^{-}\succcurlyeq W^{-}$ and $Q^{+}\succcurlyeq W^{+}$ . This is readily extended to joint channels. To this end, for BMS channel $W$ we denote the joint channel formed by its ‘ $-$ ’- and ‘ $+$ ’-transforms by $W_{-,+}$ .

Lemma 9.

Let BMS channel $Q\succcurlyeq W$ . Then $Q_{-,+}\overset{p}{\succcurlyeq}W_{-,+}$ .

Proof:

Using (5) and the definition of $W_{-,+}$ we have

[TABLE]

where $P_{a}(y_{1},y_{2}|z_{1},z_{2})=P(y_{1}|z_{1})P(y_{2}|z_{2})$ is a proper degrading channel. ∎
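Lemma 9 can be checked numerically in the simplest case where $Q$ is a BSC($q$) and the single-channel degrading channel $P$ is a BSC($r$), so that $W$ is a BSC with crossover probability $q+r-2qr$. The sketch (illustrative parameters, hypothetical names) applies $P_{a}(y_{1},y_{2}|z_{1},z_{2})=P(y_{1}|z_{1})P(y_{2}|z_{2})$ to the output of $Q^{-}$, and to the $(z_{1},z_{2})$ part of the output of $Q^{+}$ while passing $u_{a}$ through unchanged, which is the proper form (14) used in the proof.

```python
U = (0, 1)


def bsc(p):
    """W[(y, u)] = W(y | u) for a BSC with crossover probability p."""
    return {(y, u): (1 - p if y == u else p) for y in U for u in U}


def minus(W):
    return {((y1, y2), u1): 0.5 * sum(W[(y1, u1 ^ u2)] * W[(y2, u2)] for u2 in U)
            for y1 in U for y2 in U for u1 in U}


def plus(W):
    return {((y1, y2, u1), u2): 0.5 * W[(y1, u1 ^ u2)] * W[(y2, u2)]
            for y1 in U for y2 in U for u1 in U for u2 in U}


q, r = 0.05, 0.1
Q, P = bsc(q), bsc(r)  # P[(y, z)] = P(y | z), the single-channel degrading channel
W = bsc(q + r - 2 * q * r)  # W is Q followed by P
Qm, Qp, Wm, Wp = minus(Q), plus(Q), minus(W), plus(W)


def degrade_pair(y1, y2, z1, z2):
    """P_a(y1, y2 | z1, z2) = P(y1 | z1) P(y2 | z2)."""
    return P[(y1, z1)] * P[(y2, z2)]


def degraded_minus(y1, y2, u):
    """Output of Q^- passed through P_a."""
    return sum(Qm[((z1, z2), u)] * degrade_pair(y1, y2, z1, z2) for z1 in U for z2 in U)


def degraded_plus(y1, y2, ua, ub):
    """Output (z1, z2, ua) of Q^+: (z1, z2) passed through P_a, ua kept unchanged."""
    return sum(Qp[((z1, z2, ua), ub)] * degrade_pair(y1, y2, z1, z2) for z1 in U for z2 in U)


print(max(abs(degraded_minus(y1, y2, u) - v) for ((y1, y2), u), v in Wm.items()))
print(max(abs(degraded_plus(y1, y2, ua, ub) - v) for ((y1, y2, ua), ub), v in Wp.items()))
```

Both printed deviations should be at the level of floating-point round-off.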

Lemma 10.

If $Q_{a,b}(z_{a},z_{b}|u_{a},u_{b})\overset{p}{\succcurlyeq}W_{a,b}(y_{a},y_{b}|u_{a},u_{b})$ , then, for $\alpha,\beta\in\{-,+\}$ , $Q_{a^{\alpha},b^{\beta}}\overset{p}{\succcurlyeq}W_{a^{\alpha},b^{\beta}}$ .

Proof:

The proof follows similar lines to the proof of Lemma 9. Expand $W_{a^{\alpha},b^{\beta}}$ using (21) and expand again using the definition of joint degradation with a proper degrading channel. Using the one-to-one mappings between the outputs of the polarized channels and the inputs and outputs of non-polarized channels, the desired results are obtained. The details are mostly technical, and are omitted. ∎

The operational meaning of Lemma 10 is that to compute an upgraded approximation of $W_{a^{\alpha},b^{\beta}}$ we may start with $Q_{a,b}$ , an upgraded approximation of $W_{a,b}$ , and polarize it. The result $Q_{a^{\alpha},b^{\beta}}$ is an upgraded approximation of $W_{a^{\alpha},b^{\beta}}$ . This enables us to iteratively compute upgraded approximations of joint synthetic channels. Whenever the joint synthetic channel exceeds an allotted size, we upgrade it to a joint channel with a smaller alphabet size and continue from there. We make sure to use proper upgrading procedures; this preserves the special structure of the joint channel and enables us to compute a lower bound on the probability of error. In Section VI we derive such upgrading procedures.

A sequence of polarization and proper upgrading steps is equivalent to a proper upgrading of the overall polarized joint channel. Hence, by Lemmas 6 and 7, the IMJP decoding error of a joint channel that has undergone multiple polarization and proper upgrading steps lower-bounds the SC decoding error of the joint channel that has undergone only the same polarization steps (without upgrading steps).

IV-C Double Symmetry for Joint Channels

A binary-input channel $W(y|u)$ is called symmetric if for every output $y$ there exists a conjugate output $\bar{y}$ such that $W(y|0)=W(\bar{y}|1)$ . We now extend this to joint synthetic channels.

Definition 3 (Double symmetry).

Joint channel $W_{b}(y_{a},u_{a},d_{b}|u_{b})$ exhibits double symmetry if for every $y_{a}$ , $d_{b}$ there exist $y^{(\mathsf{a})}_{a}$ , $y^{(\mathsf{b})}_{a}$ , $y^{(\mathsf{ab})}_{a}$ such that

[TABLE]

We call $(\cdot)^{(\mathsf{a})}$ the a-conjugate, $(\cdot)^{(\mathsf{b})}$ the b-conjugate, and $(\cdot)^{(\mathsf{ab})}$ the ab-conjugate. This definition can also be cast in the regular (non- $D$ -value) representation of joint channels in a straightforward manner; we omit the details.

Example 3.

Let $W$ be a BMS channel and denote by $W_{-,+}$ the joint channel formed by its ‘ $-$ ’- and ‘ $+$ ’-transforms. What are the a-, b-, and ab-conjugates of the a-channel output $y_{a}$ ? Recall that the output of the a-channel $W^{-}$ consists of the outputs of two copies of $W$ . Denote $y_{a}=(y_{1},y_{2})$ , where $y_{1}$ and $y_{2}$ are two possible outputs of $W$ with conjugates $\bar{y}_{1},\bar{y}_{2}$ , respectively. We then have

[TABLE]

By symmetry of $W$ we obtain $y^{(\mathsf{a})}_{a}=(\bar{y}_{1},y_{2})$ , $y^{(\mathsf{b})}_{a}=(\bar{y}_{1},\bar{y}_{2})$ , and $y^{(\mathsf{ab})}_{a}=(y_{1},\bar{y}_{2})$ . Indeed,

[TABLE]

We leave it to the reader to show that (22) holds for the $D$ -value representation of the joint channel.
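The conjugates of Example 3 can be verified numerically in the regular (non-$D$-value) representation, where the joint channel is $W^{+}(y_{a},u_{a}|u_{b})$ with $y_{a}=(y_{1},y_{2})$. The sketch uses a hypothetical BMS channel built as a mixture of BSCs, whose output $(i,s)$ has conjugate $(i,1-s)$. It checks that replacing $y_{a}$ by its a-, b-, or ab-conjugate while complementing $u_{a}$, $u_{b}$, or both leaves $W^{+}$ unchanged, and, for the a-channel $W^{-}$, that $y^{(\mathsf{b})}_{a}$ behaves like $y_{a}$ while $y^{(\mathsf{a})}_{a}$ behaves like $y_{a}$ with $u_{a}$ complemented.

```python
import random

U = (0, 1)
rng = random.Random(0)

# Hypothetical BMS channel: a mixture of K BSCs. Output (i, s) means "component i
# was used and bit s was received"; its conjugate is (i, 1 - s).
K = 3
weights = [rng.random() for _ in range(K)]
lam = [w / sum(weights) for w in weights]
cross = [rng.uniform(0.01, 0.49) for _ in range(K)]
Y = [(i, s) for i in range(K) for s in U]
W = {((i, s), u): lam[i] * (1 - cross[i] if s == u else cross[i]) for (i, s) in Y for u in U}


def conj(y):
    i, s = y
    return (i, 1 - s)


def W_minus(ya, u1):
    """a-channel W^-((y1, y2) | u1)."""
    y1, y2 = ya
    return 0.5 * sum(W[(y1, u1 ^ u2)] * W[(y2, u2)] for u2 in U)


def W_plus(ya, ua, ub):
    """b-channel W^+((ya, ua) | ub) in the regular representation."""
    y1, y2 = ya
    return 0.5 * W[(y1, ua ^ ub)] * W[(y2, ub)]


def conjugates(ya):
    """(a-, b-, ab-conjugate) of ya = (y1, y2), as in Example 3."""
    y1, y2 = ya
    return (conj(y1), y2), (conj(y1), conj(y2)), (y1, conj(y2))


def max_deviation():
    """Largest violation of the conjugate relations over all ya, ua, ub."""
    worst = 0.0
    for y1 in Y:
        for y2 in Y:
            ya = (y1, y2)
            ya_a, ya_b, ya_ab = conjugates(ya)
            for ua in U:
                for ub in U:
                    v = W_plus(ya, ua, ub)
                    worst = max(worst, abs(W_plus(ya_a, 1 - ua, ub) - v),
                                abs(W_plus(ya_b, ua, 1 - ub) - v),
                                abs(W_plus(ya_ab, 1 - ua, 1 - ub) - v))
                # a-channel: y^(b) mirrors y; y^(a) mirrors y with u_a complemented
                worst = max(worst, abs(W_minus(ya_b, ua) - W_minus(ya, ua)),
                            abs(W_minus(ya_a, 1 - ua) - W_minus(ya, ua)))
    return worst


print(max_deviation())
```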

Pairs of polar synthetic channels exhibit double symmetry. One can see this directly from symmetry properties of polar synthetic channels (see [1, Proposition 13]). Alternatively, one can use induction to show directly that the polar construction preserves double symmetry; we omit the details. This implies the following proposition.

Proposition 11.

Let $W_{a,b}$ be the joint distribution of two synthetic channels $W_{a}$ and $W_{b}$ that result from $n$ polarization steps of BMS channel $W$ . Then, $W_{a,b}$ exhibits double symmetry.

The following is a direct consequence of double symmetry.

Lemma 12.

Let $W_{a,b}(y_{a},u_{a},d_{b}|u_{a},u_{b})$ be a joint channel in $D$ -value representation that exhibits double symmetry. Then

1. For the b-channel, $(y_{a},u_{a},d_{b})$ and $(y^{(\mathsf{a})}_{a},\bar{u}_{a},d_{b})$ have the same b-channel $D$ -value $d_{b}$ .
2. For the a-channel, $y_{a}$ and $y^{(\mathsf{b})}_{a}$ have the same a-channel $D$ -value $d_{a}$ , and $y^{(\mathsf{a})}_{a}$ and $y^{(\mathsf{ab})}_{a}$ have the same a-channel $D$ -value $-d_{a}$ .

Proof:

The first item is obvious from (22). For the second item, note that

[TABLE]

where $(a)$ is by (22). In the same manner, $y^{(\mathsf{a})}_{a}$ and $y^{(\mathsf{ab})}_{a}$ have the same a-channel $D$ -value, $-d_{a}$ . ∎

Lemma 12 implies that an SC decoder does not distinguish between $y_{a}$ and $y^{(\mathsf{b})}_{a}$ when making its decision for the a-channel. We now show that a similar conclusion holds for the IMJP decoder.

Lemma 13.

Let $y_{a}$ be some output of $W_{a}$ . Then

[TABLE]

Proof:

Theorem 1 holds for joint channels given in $D$ -value representation, $W_{a,b}(y_{a},u_{a},d_{b}|u_{a},u_{b})$ . This is easily seen by following the proof with minor changes. Under the $D$ -value representation, (13) becomes

[TABLE]

The remainder of the proof hinges on double symmetry and follows along similar lines to the proof of Lemma 12, with $W_{a}$ replaced with $T$ and accordingly the sum over $u_{b}$ replaced with a maximum operation over $u_{b}$ . ∎

Lemma 13 implies that the IMJP decoder does not distinguish between $y_{a}$ and $y^{(\mathsf{b})}_{a}$ .

Corollary 14.

Let $\phi_{a}$ be the IMJP decoder for the a-channel. Then $\phi_{a}(y_{a})=\phi_{a}(y^{(\mathsf{b})}_{a})=1-\phi_{a}(y^{(\mathsf{a})}_{a})=1-\phi_{a}(y^{(\mathsf{ab})}_{a}).$
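Corollary 14 can be observed directly on the joint channel $W_{-,+}$ of Example 3. There the b-channel output is $(y_{a},u_{a})$ with no remaining part, so, on our reading of Theorem 1, $T(y_{a}|u_{a})$ is proportional to $\max_{u_{b}}W^{+}(y_{a},u_{a}|u_{b})$. The sketch uses a hypothetical mixture-of-BSCs BMS channel (crossover probabilities strictly between $0$ and $1/2$, so the IMJP decision has no ties) and checks the chain of equalities in Corollary 14 for every $y_{a}$.

```python
import random

U = (0, 1)
rng = random.Random(7)

# Hypothetical BMS channel: a mixture of BSCs (output (i, s), conjugate (i, 1 - s)).
K = 3
weights = [rng.random() for _ in range(K)]
lam = [w / sum(weights) for w in weights]
cross = [rng.uniform(0.01, 0.49) for _ in range(K)]
Y = [(i, s) for i in range(K) for s in U]
W = {((i, s), u): lam[i] * (1 - cross[i] if s == u else cross[i]) for (i, s) in Y for u in U}


def conj(y):
    return (y[0], 1 - y[1])


def T(ya, ua):
    """For W_{-,+}: T(ya|ua) proportional to max_ub W^+((ya, ua) | ub)."""
    y1, y2 = ya
    return max(0.5 * W[(y1, ua ^ ub)] * W[(y2, ub)] for ub in U)


def phi_a(ya):
    """IMJP a-decision of Theorem 1 (ties would go to 0)."""
    return max(U, key=lambda ua: T(ya, ua))


def corollary14_holds():
    for y1 in Y:
        for y2 in Y:
            ya = (y1, y2)
            ya_a, ya_b, ya_ab = (conj(y1), y2), (conj(y1), conj(y2)), (y1, conj(y2))
            if not (phi_a(ya) == phi_a(ya_b) == 1 - phi_a(ya_a) == 1 - phi_a(ya_ab)):
                return False
    return True


print(corollary14_holds())
```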

V Symmetrized Joint Synthetic Channels

In this section we introduce the symmetrizing transform. The resultant channel is degraded from the original joint channel yet has the same probability of error. Its main merit is to decouple the a-channel from the b-channel. This simpler structure is the key to upgrading the a-channel, as we shall see in Section VI.

V-A Symmetrized Joint Channel

The SC decoder observes marginal distributions and makes a decision based on the $D$ -value of each synthetic channel’s output. In particular, by Lemma 12, the SC decoder makes the same decision for the a-channel whether its output was $y_{a}$ or $y^{(\mathsf{b})}_{a}$ and the b-channel decision is based on $d_{b}$ without regard to $y_{a}$ . By Corollary 14, the IMJP decoder acts similarly. That is, the IMJP decoder makes the same decision for the a-channel whether its output is $y_{a}$ or $y^{(\mathsf{b})}_{a}$ , and the decision for the b-channel is based solely on $d_{b}$ .

We conclude that if the a-channel decoder were told only that its output belonged to the set $\{y_{a},y^{(\mathsf{b})}_{a}\}$ , it would make the same decision as if it had been told that its output was, say, $y_{a}$ . This is true for either the SC or the IMJP decoder. Consequently, either decoder's probability of error is unaffected by obscuring the a-channel output in this manner.

This leads us to define a symmetrized version of the joint synthetic channel distribution, $\accentset{\circ}{W}_{a,b}$ , as follows. [Footnote 4: The order of elements in $\accentset{\circ}{y}_{a}$ and $\bar{\accentset{\circ}{y}}_{a}$ does not matter. That is, $\{y_{a},y^{(\mathsf{b})}_{a}\}$ is a set containing both $y_{a}$ and $y^{(\mathsf{b})}_{a}$ .] Let

[TABLE]

and define

[TABLE]

Lemma 15.

Let $W_{a,b}$ be a joint synthetic channel distribution, and let $\accentset{\circ}{W}_{a,b}$ be its symmetrized version. Then, $W_{a,b}$ and $\accentset{\circ}{W}_{a,b}$ have identical probabilities of error under SC decoding, and likewise under IMJP decoding.

Proof:

By Lemma 12 for the SC decoder or Corollary 14 for the IMJP decoder, if the decoder for the symmetrized channel makes an error for some symbol $\accentset{\circ}{y}_{a}$ then the decoder for the non-symmetrized channel makes an error for both $y_{a}$ and $y^{(\mathsf{b})}_{a}$ , and vice-versa. Therefore, denoting by $\mathcal{E}$ the error indicator of the decoder,

[TABLE]

where $(a)$ is by (24). ∎

The marginal synthetic channels $\accentset{\circ}{W}_{a}$ and $\accentset{\circ}{W}_{b}$ are given by

[TABLE]

Note that by double symmetry

[TABLE]

Definition 4 (Symmetrized distribution).

A joint channel whose marginals satisfy (25) is called symmetrized.

The name ‘symmetrized’ stems from comparison of (25) and (22). We note that Theorem 1 holds for $\accentset{\circ}{W}_{a,b}$ .

A symmetrized joint channel remains symmetrized upon polarization. That is, if $\accentset{\circ}{W}_{a,b}$ is a symmetrized joint channel and $\accentset{\circ}{W}_{a^{\alpha},b^{\beta}}$ , $\alpha,\beta\in\{-,+\}$ is the result of jointly polarizing it (without applying a further symmetrization operation), then the marginals $\accentset{\circ}{W}_{a^{\alpha}}$ and $\accentset{\circ}{W}_{b^{\beta}}$ satisfy (25). This is easily seen from (21) and (25).

Clearly, $\accentset{\circ}{W}_{a,b}$ is degraded with respect to $W_{a,b}$ , exactly the opposite of our main thrust. Nevertheless, as established in Lemma 15, both channels have the same probability of error under SC (IMJP) decoding. Moreover, if we upgrade the symmetrized version of the channel, its probability of error under IMJP decoding lower-bounds the probability of error of the non-symmetrized channel under either SC or IMJP decoding.

What is not immediately obvious, however, is what happens after polarization. That is, if we take a joint channel, symmetrize it, and then polarize it, how does its probability of error compare to the original joint channel that has just undergone polarization? Furthermore, what happens if the symmetrized version undergoes an upgrading transform?

In the following proposition, we provide an answer. To this end, a joint polarization step is a pair $(\alpha,\beta)\in\{-,+\}^{2}$ that denotes which transforms the a-channel and b-channel undergo. For example, the result of joint polarization step $(-,+)$ on joint channel $W_{a,b}$ is the joint channel $W_{a^{-},b^{+}}$ . A sequence $\mathsf{t}$ of such pairs is called a sequence of joint polarization steps. The joint polarization steps are applied in succession: the result of joint polarization of $W_{a,b}$ according to the sequence $\mathsf{t}=\{(\alpha_{1},\beta_{1}),(\alpha_{2},\beta_{2}),(\alpha_{3},\beta_{3}),\ldots,(\alpha_{k},\beta_{k})\}$ is the same as the result of joint polarization of $W_{a^{\alpha_{1}},b^{\beta_{1}}}$ according to the sequence $\mathsf{t}^{\prime}=\{(\alpha_{2},\beta_{2}),(\alpha_{3},\beta_{3}),\ldots,(\alpha_{k},\beta_{k})\}$ .

Proposition 16.

Let $W_{a,b}$ be a joint distribution of two synthetic channels and let $W_{a,b}^{\mathsf{t}}$ denote this joint distribution after a sequence $\mathsf{t}$ of joint polarization steps. Then $P_{e}^{\textrm{{IMJP}}}(W_{a,b}^{\mathsf{t}})\geq P_{e}^{\textrm{{IMJP}}}(\accentset{\circ}{Q}_{a,b}^{\mathsf{t}})$ , where $\accentset{\circ}{Q}_{a,b}^{\mathsf{t}}$ is the distribution of $\accentset{\circ}{W}_{a,b}$ after the same sequence of polarization steps and any number of proper upgrading transforms along the way.

Proof:

Let $W_{a,b}$ be a joint channel with symmetrized version $\accentset{\circ}{W}_{a,b}$ . For $\alpha,\beta\in\{-,+\}$ , denote by $W_{a^{\alpha},b^{\beta}}$ and $\accentset{\circ}{W}_{a^{\alpha},b^{\beta}}$ the polarized versions of $W_{a,b}$ and $\accentset{\circ}{W}_{a,b}$ , respectively. For the $b^{\beta}$ -channel, the decoder makes the same decision for either $W_{a^{\alpha},b^{\beta}}$ or $\accentset{\circ}{W}_{a^{\alpha},b^{\beta}}$ . This is because the decision is based on the b-channel $D$ -value, which is unaffected by symmetrization [see (24)].

Next, for the $a^{\alpha}$-channel, applying to (21) a derivation similar to the proof of Lemma 13 yields $T(y_{a^{\alpha}}|u_{a^{\alpha}})=T(y^{\prime}_{a^{\alpha}}|u_{a^{\alpha}})$, where $y^{\prime}_{a^{\alpha}}$ is any combination of an element of $\accentset{\circ}{y}_{a}$ and an element of $\accentset{\circ}{\eta}_{a}$. That is, $y^{\prime}_{a^{\alpha}}$ is any one of $\{y_{a},\eta_{a}\}$, $\{y^{(\mathsf{b})}_{a},\eta_{a}\}$, $\{y_{a},\eta^{(\mathsf{b})}_{a}\}$, and $\{y^{(\mathsf{b})}_{a},\eta^{(\mathsf{b})}_{a}\}$. Thus, the IMJP decoder makes the same decision for the $a^{\alpha}$-channel for either $W_{a^{\alpha},b^{\beta}}$ or $\accentset{\circ}{W}_{a^{\alpha},b^{\beta}}$.

We compare the channels obtained by the following two procedures.

• Procedure 1: Joint channel $W_{a,b}$ goes through sequence $\mathsf{t}$ of polarization steps.

• Procedure 2: Joint channel $W_{a,b}$ is symmetrized to form $\accentset{\circ}{W}_{a,b}$. It goes through sequence $\mathsf{t}$ of polarization steps (without any further symmetrization operations).

We iteratively apply the above reasoning and conclude in a similar manner to Lemma 15 that both channels have the same performance under IMJP decoding. Next, we modify Procedure 2.

• Procedure 2a: Joint channel $W_{a,b}$ is symmetrized to form $\accentset{\circ}{W}_{a,b}$. It goes through sequence $\mathsf{t}$ of polarization steps (without any further symmetrization operations), but at some point mid-sequence, it undergoes a proper upgrading procedure.

Since polarizing and then properly upgrading is equivalent to properly upgrading and then polarizing (see Lemma 10), we can assume that the upgrading happens after the entire sequence of polarization steps. Thus, under IMJP decoding, the probability of error of the channel that results from Procedure 2a lower-bounds the probability of error of the channels resulting from Procedures 1 and 2. Similarly, multiple upgrading transforms can also be thought of as occurring after all polarization steps. ∎

Corollary 17.

Let $W$ be a BMS channel that undergoes $n$ polarization steps. Let $W_{a,b}$ be the joint channel of two of its polar descendants such that $a,b\in\mathcal{A}$ , and let $\accentset{\circ}{Q}_{a,b}\overset{p}{\succcurlyeq}\accentset{\circ}{W}_{a,b}$ . Then $P_{e}^{\textrm{{SC}}}(W)\geq P_{e}^{\textrm{{IMJP}}}(\accentset{\circ}{Q}_{a,b})$ .

Proof:

A direct consequence of Lemmas 7 and 6 combined with Proposition 16. ∎

We emphasize that, by Proposition 16, it does not matter how we arrive at $\accentset{\circ}{Q}_{a,b}$ . So long as $\accentset{\circ}{Q}_{a,b}\overset{p}{\succcurlyeq}\accentset{\circ}{W}_{a,b}$ and $a,b\in\mathcal{A}$ , we can use $\accentset{\circ}{Q}_{a,b}$ to obtain a lower bound on $P_{e}^{\textrm{{SC}}}(W)$ . A practical way to obtain $\accentset{\circ}{Q}_{a,b}$ is via multiple proper upgrading operations that we perform after joint polarization operations. This is the route we take in Section VII.

Due to Proposition 16, we henceforth assume that joint channel $W_{a,b}$ is symmetrized, and no longer distinguish symmetrized channels or symbols by the $(\accentset{\circ}{\cdot})$ symbol. Replacing the joint channel with its symmetrized version need only be performed once, at the first instance the two channels go through different polarization transforms.

Implementation: Since symmetrization is performed only once, and since this invariably happens when converting a channel $W$ to $W_{-,+}$ , we find the a-, b-, and ab-conjugates using the results of Example 3. We then form the symmetrized channel using (24). Note that it is sufficient to find just the b-conjugates and use the first equation of (24).

V-B Decomposition of Symmetrized Joint Channels

Let the joint channel be $W_{b}(y_{a},u_{a},d_{b}|u_{b})$ , which, as mentioned above, we assume to be symmetrized. We have

[TABLE]

in which we used the independence and uniformity of the input bits $u_{a}$ and $u_{b}$. The distribution $W_{1}$ is given by $W_{1}(y_{a}|u_{a},u_{b})=2\sum_{d_{b}}W_{b}(y_{a},u_{a},d_{b}|u_{b})$. Whenever $W_{1}(y_{a}|u_{a},u_{b})$ is nonzero, distribution $W_{2}(d_{b}|u_{b};y_{a},u_{a})$ is obtained by dividing $W_{b}(y_{a},u_{a},d_{b}|u_{b})$ by $W_{1}(y_{a}|u_{a},u_{b})/2$. Our notation $W_{2}(d_{b}|u_{b};y_{a},u_{a})$ (with a semicolon, as opposed to $W_{2}(d_{b}|u_{b},y_{a},u_{a})$) reminds us that for fixed $y_{a},u_{a}$, channel $W_{2}$ is a binary-input channel with input $u_{b}$ and output $d_{b}$. If $W_{1}(y_{a0}|u_{a0},u_{b})=0$ for some $y_{a0},u_{a0}$, we define $W_{2}(d_{b}|u_{b};y_{a0},u_{a0})$ to be some arbitrary BMS channel, to ensure that it is always a valid channel.

Since the joint channel is symmetrized, by (25) we have $W_{1}(y_{a}|u_{a},u_{b})=W_{1}(y_{a}|u_{a},\bar{u}_{b})$ . Hence, for any $u_{b}$ ,

[TABLE]

That is, a consequence of symmetrization is that given $u_{a}$ , output $y_{a}$ becomes independent of $u_{b}$ . This is not true in the general case where the joint channel is not symmetrized.

The decomposition of (26) essentially decouples the symmetrized joint channel into a product of two distributions.

Lemma 18.

Let $W_{b}(y_{a},u_{a},d_{b}|u_{b})$ be a symmetrized joint channel. It admits the decomposition

[TABLE]

For any $y_{a},u_{a}$ , channel $W_{2}$ is a BMS channel with input $u_{b}$ and output $d_{b}$ , i.e.,

[TABLE]

Moreover, $W_{2}$ satisfies

[TABLE]

Proof:

Using (27) in (26) yields (28). The remainder of this lemma is readily obtained by using (25) in (28). ∎

Definition 5 (Decoupling decomposition).

A decomposition of the form (28) for a symmetrized joint channel is called a decoupling decomposition. Channel $W_{a}$ is obtained by marginalization, i.e.,

[TABLE]

where the latter equality, which is due to symmetry, holds for any $u_{b}$. Then, we compute channel $W_{2}(d_{b}|u_{b};y_{a},u_{a})$ using (28). The case $W_{a}(y_{a}|u_{a})=0$ requires special attention. It invariably arises for perfect symbols, that is, symbols for which $W_{a}(y_{a}|u_{a})>0$ but $W_{a}(y_{a}|\bar{u}_{a})=0$ for some $u_{a}\in\{0,1\}$. To ensure that $W_{2}$ is a well-defined BMS channel even in this case, we set it to an arbitrary BSC. Thus,

[TABLE]

When setting $W_{2}$ to an arbitrary BSC, we make sure not to introduce new b-channel $D$-values. One possible choice is the BSC whose outputs are $\pm d$, where $d$ is the highest b-channel $D$-value.
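The decoupling decomposition lends itself to a direct computation. The following Python sketch is our own illustration, not the authors' implementation: it assumes a joint channel stored as a dictionary keyed by $(y_a,u_a,d_b,u_b)$, obtains $W_a$ by marginalizing over $d_b$ and $u_b$, divides as in (28) to obtain $W_2$, and falls back to the BSC with outputs $\pm d$, $d$ the largest b-channel $D$-value, whenever $W_a(y_a|u_a)=0$. The function `decoupling_decomposition`, the helper `W2_true`, and the toy channel (which includes a perfect pair `p`/`pbar`) are hypothetical.

```python
from fractions import Fraction as F


def decoupling_decomposition(W_b, d_values):
    """Split a symmetrized joint channel into W_a and W_2.

    W_b: dict {(y_a, u_a, d_b, u_b): W_b(y_a, u_a, d_b | u_b)}, d_b a b-channel D-value.
    d_values: all b-channel D-values (both signs).
    Returns W_a {(y_a, u_a): prob} and W_2 {(d_b, u_b, y_a, u_a): prob} such that
    W_b(y_a, u_a, d_b | u_b) = 1/2 * W_a(y_a | u_a) * W_2(d_b | u_b; y_a, u_a).
    """
    ys = {y for (y, _, _, _) in W_b}
    # Marginalize over d_b and u_b.
    W_a = {(y, ua): sum(p for (y2, ua2, _, _), p in W_b.items() if (y2, ua2) == (y, ua))
           for y in ys for ua in (0, 1)}
    d_max = max(d_values)  # largest b-channel D-value, used by the fallback BSC
    W_2 = {}
    for (y, ua), wa in W_a.items():
        for d in d_values:
            for ub in (0, 1):
                if wa > 0:
                    W_2[(d, ub, y, ua)] = W_b.get((y, ua, d, ub), 0) / (wa / 2)
                else:
                    # W_a(y_a|u_a) = 0: BSC with outputs +/- d_max, so no new D-value appears.
                    sign = 1 if ub == 0 else -1
                    W_2[(d, ub, y, ua)] = (1 + sign * d) / 2 if abs(d) == d_max else F(0)
    return W_a, W_2


# Hypothetical toy channel built as a product; p / pbar form a perfect conjugate pair.
d_vals = [F(1, 2), F(-1, 2), F(4, 5), F(-4, 5)]
W_a_true = {("y", 0): F(3, 5), ("y", 1): F(1, 5), ("ybar", 0): F(1, 5), ("ybar", 1): F(3, 5),
            ("p", 0): F(1, 5), ("p", 1): F(0), ("pbar", 0): F(0), ("pbar", 1): F(1, 5)}
mass = {F(1, 2): F(1, 3), F(4, 5): F(2, 3)}  # W_2 probability of each D-value magnitude


def W2_true(d, ub):
    """A BMS channel whose output d carries D-value d."""
    return mass[abs(d)] * (1 + (d if ub == 0 else -d)) / 2


W_b = {(y, ua, d, ub): F(1, 2) * W_a_true[(y, ua)] * W2_true(d, ub)
       for (y, ua) in W_a_true for d in d_vals for ub in (0, 1)}
W_a, W_2 = decoupling_decomposition(W_b, d_vals)
```

Exact `Fraction` arithmetic is used so that the product identity can be checked without rounding.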

We use decoupling decompositions of symmetrized joint channels in the sequel. We shall see in Section VI-A that $W_{2}$ plays a central role in the a-channel upgrading procedure.

We conclude this section with an example that compares a joint channel and its symmetrized version. In particular, we demonstrate the decoupling decomposition for the symmetrized joint channel.

Example 4.

Let $W$ be a BSC with crossover probability $0.2$ and consider $W_{-,+}$, the joint synthetic channel of the ‘$-$’- and ‘$+$’-transforms of $W$. In $D$-value representation, the a-channel has four possible outputs $y_{a}\in\{00,01,10,11\}$ and there are three values of $d_{b}$: $d_{b}\in\{-\frac{15}{17},0,\frac{15}{17}\}$. Table IV contains the probability table of this joint synthetic channel for $u_{a}=0$ and varying $y_{a},u_{b},d_{b}$. When $y_{a}=00$ and $u_{a}=0$, the b-channel input $u_{b}$ is more likely to be $1$ than $0$. Similarly, when $y_{a}=11$ and $u_{a}=0$, the b-channel input $u_{b}$ is more likely to be $0$ than $1$. Thus, the channel in Table IV does not satisfy (28).

After symmetrization, the a-channel output is either $\accentset{\circ}{0}=\{00,11\}$ or $\accentset{\circ}{1}=\{01,10\}$. The probability table for the symmetrized channel with $u_{a}=0$ is shown in Table V. Here, when $u_{a}=0$ and $\accentset{\circ}{0}$ is received at the a-channel, $u_{b}=0$ and $u_{b}=1$ are equally likely. Indeed, $W^{-}$ is a BSC with crossover probability $2p(1-p)=0.32$, where $p=0.2$, and the symmetrized a-channel, whose output records only whether the two channel outputs agree, is exactly this BSC. The channel in Table V satisfies (28).
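Example 4 can be reproduced numerically. The sketch below is our own; it assumes the labeling $x_1=u_a\oplus u_b$, $x_2=u_b$ for the two copies of $W$, with $u_a$ the ‘$-$’ bit and $u_b$ the ‘$+$’ bit. The direction of the imbalance at $y_a=00$ depends on such labeling choices and may not match Table IV, so the checks cover only labeling-independent facts: the three b-channel $D$-values, the dependence of $u_b$ on $y_a$ before symmetrization, its disappearance after merging $\{00,11\}$ and $\{01,10\}$ (labeled `"o0"` and `"o1"` here), and the crossover probability $2p(1-p)=0.32$. The names `plus_prob` and `joint_a` are hypothetical.

```python
from fractions import Fraction as F
from collections import defaultdict

p = F(1, 5)  # crossover probability of the BSC W


def bsc(y, x):
    return 1 - p if y == x else p


def plus_prob(y, ua, ub):
    """W^+(y, u_a | u_b) under the assumed labeling x1 = ua ^ ub, x2 = ub."""
    return F(1, 2) * bsc(int(y[0]), ua ^ ub) * bsc(int(y[1]), ub)


def d_value(p0, p1):
    return (p0 - p1) / (p0 + p1)


# Joint channel W_b(y_a, u_a, d_b | u_b) of W_{-,+}: the b-channel output (y, u_a)
# is replaced by its D-value d_b, while the a-channel output is y_a = y.
W_b = defaultdict(F)
for y in ("00", "01", "10", "11"):
    for ua in (0, 1):
        d_b = d_value(plus_prob(y, ua, 0), plus_prob(y, ua, 1))
        for ub in (0, 1):
            W_b[(y, ua, d_b, ub)] += plus_prob(y, ua, ub)

# Symmetrize: merge each y_a with its b-conjugate (00 <-> 11, 01 <-> 10).
W_sym = defaultdict(F)
for (y, ua, d_b, ub), prob in W_b.items():
    W_sym[("o0" if y in ("00", "11") else "o1", ua, d_b, ub)] += prob


def joint_a(W, ya, ua, ub):
    """Sum over d_b of W(ya, ua, d_b | ub)."""
    return sum(v for (y, u, _, b), v in W.items() if (y, u, b) == (ya, ua, ub))


d_values = sorted({d for (_, _, d, _) in W_b})
```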

VI Upgrading Procedures for Joint Synthetic Channels

In this section, we introduce proper upgrading procedures for joint synthetic channels. The overall goal is to reduce the alphabet size of the joint channel. The upgrading procedures we develop enable us to reduce the alphabet size of each of the marginals without changing the distribution of the other; there is a different procedure for each marginal. As an intermediate step, we further couple the marginals by increasing the alphabet size of one of them.

The joint channel $W_{a,b}$ is assumed to be symmetrized and in $D$ -value representation. The upgrading procedures will maintain this. As discussed in Section V, we do not distinguish symmetrized channels with any special symbol. The upgrading procedure of Section VI-A hinges on symmetrization. The upgrading procedure of Section VI-B does not require symmetrization and holds for non-symmetrized channels without change. However, we shall see that symmetrization simplifies the resulting expressions.

VI-A Upgrading Channel $W_{a}$

We now introduce a theorem that enables us to deduce an upgrading procedure that upgrades $W_{a}$ and reduces its output alphabet size. Let symmetrized joint channel $W_{b}(y_{a},u_{a},d_{b}|u_{b})$ admit decoupling decomposition (28). Let $Q_{b}(z_{a},u_{a},z_{b}|u_{b})$ be another symmetrized joint channel, where $z_{b}$ represents the $D$ -value of the b-channel output. It also admits a decoupling decomposition,

[TABLE]

Theorem 19.

Let $W_{b}$ and $Q_{b}$ be symmetrized joint channels with decoupling decompositions (28) and (44), respectively. Then, $Q_{b}\overset{p}{\succcurlyeq}W_{b}$ if

1. $Q_{a}(z_{a}|u_{a})\succcurlyeq W_{a}(y_{a}|u_{a})$ with degrading channel $P_{a}(y_{a}|z_{a})$.

2. $Q_{2}(z_{b}|u_{b};z_{a},u_{a})\succcurlyeq W_{2}(d_{b}|u_{b};y_{a},u_{a})$ for all $u_{a},y_{a},z_{a}$ such that $P_{a}(y_{a}|z_{a})>0$.

Before going into the proof, some comments are in order. First, the conditions of this theorem are sufficient, not necessary: we do not claim that every $Q_{b}$ that is upgraded from $W_{b}$ satisfies them. Second, the meaning of the second item is that, for fixed $z_{a},u_{a}$, BMS channel $Q_{2}(z_{b}|u_{b};z_{a},u_{a})$ with binary input $u_{b}$ is upgraded from each BMS channel in the set $\{W_{2}(d_{b}|u_{b};y_{a},u_{a})\}_{y_{a}:P_{a}(y_{a}|z_{a})>0}$, all with the same binary input.
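Both conditions of Theorem 19 are ordinary degradation relations between binary-input channels, so checking a candidate upgrade reduces to verifying $W(y|u)=\sum_{z}Q(z|u)P(y|z)$ for an explicit degrading channel. A minimal Python sketch of that check for the first condition follows; the function `degrades` and the BSC example are our own illustration. Applying the same check for each $z_a,u_a$ and each $y_a$ with $P_a(y_a|z_a)>0$ covers the second condition.

```python
from fractions import Fraction as F


def degrades(Q, W, P):
    """True if W(y|u) = sum_z Q(z|u) P(y|z) for u in {0, 1} and P is a valid channel.

    Q, W: dicts {(output, u): prob}; P: dict {(y, z): P(y|z)}.
    """
    zs = {z for (z, _) in Q}
    ys = {y for (y, _) in W}
    if any(sum(P.get((y, z), 0) for y in ys) != 1 for z in zs):
        return False  # some row P(.|z) is not a probability distribution
    return all(W[(y, u)] == sum(Q[(z, u)] * P.get((y, z), 0) for z in zs)
               for y in ys for u in (0, 1))


def bsc(p):
    """BSC(p) as {(output, input): prob}; it is symmetric, so it also serves as P(y|z)."""
    return {(y, u): (1 - p if y == u else p) for y in (0, 1) for u in (0, 1)}


Q_a = bsc(F(1, 10))
W_a = bsc(F(1, 5))
# Cascading BSC(1/10) with BSC(q) gives BSC(1/10 + q - q/5); q = 1/8 yields BSC(1/5).
P_a = bsc(F(1, 8))
```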

Proof:

Using decoupling decompositions (28) and (44) and the structure of a proper degrading channel (20), $Q_{b}\overset{p}{\succcurlyeq}W_{b}$ if and only if there exist $P_{a}^{\prime}$ and $P_{b}^{\prime}$ such that

[TABLE]

where

[TABLE]

We now find $P_{a}^{\prime}$ and $P_{b}^{\prime}$ from the conditions of the theorem.

The first condition of the theorem implies that there exists a channel $P_{a}(y_{a}|z_{a})$ such that

[TABLE]

The second condition of the theorem implies that for each $y_{a},u_{a},z_{a}$ there exists a channel $P_{b}(d_{b}|y_{a},z_{a},u_{a},z_{b})$ such that

[TABLE]

We set

[TABLE]

Using (54) in (50), we have

[TABLE]

It is easily verified that (47) is satisfied by $P_{a}^{\prime}=P_{a}$ and this $V$ , completing the proof. ∎

Remark 4.

Recall from (31) that when $W_{a}(y_{a}|u_{a})=0$, we set $W_{2}$ to an arbitrary BSC. At this point, the reader may wonder what effect, if any, this has on the resulting joint channel. We now show that there is no effect. To see this, observe from (51) that if $W_{a}(y_{a}|u_{a})=0$ and $P_{a}(y_{a}|z_{a})>0$, then necessarily $Q_{a}(z_{a}|u_{a})=0$. Hence, by (44), $Q_{b}(z_{a},u_{a},z_{b}|u_{b})=0$. This latter equality holds regardless of how we set $W_{2}(d_{b}|u_{b};y_{a},u_{a})$.

How might one use Theorem 19 to upgrade the a-channel? A naive way would be to first upgrade the marginal $W_{a}$ to $Q_{a}$ using some known method (e.g., the methods of [9], see Appendix C). This yields a degrading channel $P_{a}$, by which one can find a channel $Q_{2}$ that satisfies (54). With $Q_{a}$ and $Q_{2}$ at hand, one forms the product (44) to obtain $Q_{b}$. If the reader were to attempt this, she would find that it often changes the b-channel. Moreover, this change may be radical: the resulting b-channel may be upgraded so much that it becomes almost noiseless, in which case the bound reduces to the uninteresting trivial lower bound (3). It is possible to upgrade the a-channel without changing the b-channel; this requires an additional transform, which we now introduce.

The upgrade-couple transform enables upgrading the a-channel without changing the b-channel. The idea is to split each a-channel symbol into several symbols, grouped into classes according to the possible b-channel outputs. Symbols within a class have the same $W_{2}$ channel, so confining upgrade-merges to operate within a class inherently satisfies the second condition of Theorem 19. Thus, we avoid changes to the b-channel. This results in only a modest increase in the number of output symbols of the overall joint channel.

Let channel $W_{b}$ have $2B$ possible $D$-values, $\pm d_{b1},\pm d_{b2},\ldots,\pm d_{bB}$. We assume that erasure symbols are duplicated (that is, there is a “positive” and a “negative” erasure, see [9, Lemma 4]), and that $0\leq d_{b1}\leq d_{b2}\leq\cdots\leq d_{bB}\leq 1$. For each a-channel symbol $y_{a}$ we define $B^{2}$ upgrade-couple symbols $y_{a}^{i,j}$, $i,j\in\{1,2,\ldots,B\}$. The new symbols couple the outputs of the a- and b-channels (whence the name of the upgrade-couple transform). Namely, if the a-channel output is $y_{a}^{i,j}$ and $u_{a}=0$, the b-channel output can only be $\pm d_{bi}$; if the a-channel output is $y_{a}^{i,j}$ and $u_{a}=1$, the b-channel output can only be $\pm d_{bj}$.

The upgrade-couple channel $\check{W}_{b}(y_{a}^{i,j},u_{a},d_{b}|u_{b})$ is defined by

[TABLE]

where

[TABLE]

and $W_{2}(d_{b}|u_{b};y_{a},u_{a})$ is derived from the decoupling decomposition of $W_{b}$ , see (31).

As intuition for the factor $S_{i,j}(y_{a},u_{a},d_{b})$, observe that it ensures that $\check{W}_{b}(y_{a}^{i,j},u_{a}=0,d_{b}|u_{b})=0$ for $d_{b}\notin\{\pm d_{bi}\}$ and that $\check{W}_{b}(y_{a}^{i,j},u_{a}=1,d_{b}|u_{b})=0$ for $d_{b}\notin\{\pm d_{bj}\}$. Crucially, it does not upgrade the marginal channels (see Corollaries 24 and 25). In particular, as shown in Lemma 23, the factor $S_{i,j}(y_{a},u_{a},d_{b})$ ensures that symbols $y_{a}$ of channel $W_{a}$ and $y_{a}^{i,j}$ of channel $\check{W}_{a}$ share the same a-channel $D$-value.
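Equations (55) and (56) are not reproduced in this text, so the following Python sketch assumes one explicit form of $S_{i,j}$ that has the properties stated above and in Lemmas 20 and 23; it illustrates the transform under that assumption and is not necessarily the paper's definition. Writing $q_k(y_a,u_a)=W_2(d_{bk}|0;y_a,u_a)+W_2(-d_{bk}|0;y_a,u_a)$, the sketch takes $S_{i,j}(y_a,0,d_b)=q_j(y_a,1)$ if $|d_b|=d_{bi}$, $S_{i,j}(y_a,1,d_b)=q_i(y_a,0)$ if $|d_b|=d_{bj}$, and zero otherwise, and sets $\check W_b(y_a^{i,j},u_a,d_b|u_b)=S_{i,j}(y_a,u_a,d_b)\,W_b(y_a,u_a,d_b|u_b)$. Indices start at 0 in the code; the toy channel and all names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical toy symmetrized joint channel with B = 2 b-channel D-value magnitudes.
D = [F(1, 3), F(3, 5)]                       # d_b1 <= d_b2
SIGNED = [s * d for d in D for s in (1, -1)]
YS = ("y", "ybar")                           # a conjugate pair of a-channel symbols
W_a = {("y", 0): F(3, 4), ("y", 1): F(1, 4), ("ybar", 0): F(1, 4), ("ybar", 1): F(3, 4)}
# q[(y_a, u_a)][k]: probability that W_2(.|u_b; y_a, u_a) outputs +d_bk or -d_bk.
# Assumed conjugate symmetry: q(y, u) = q(ybar, 1 - u).
q = {("y", 0): [F(1, 2), F(1, 2)], ("y", 1): [F(1, 4), F(3, 4)],
     ("ybar", 0): [F(1, 4), F(3, 4)], ("ybar", 1): [F(1, 2), F(1, 2)]}
IJ = list(product(range(len(D)), repeat=2))


def W2(d, ub, ya, ua):
    """BMS channel W_2(d | u_b; y_a, u_a) whose output d carries D-value d."""
    return q[(ya, ua)][D.index(abs(d))] * (1 + (d if ub == 0 else -d)) / 2


def W_b(ya, ua, d, ub):
    """Decoupling decomposition (28)."""
    return F(1, 2) * W_a[(ya, ua)] * W2(d, ub, ya, ua)


def S(i, j, ya, ua, d):
    """ASSUMED form of S_{i,j}(y_a, u_a, d_b); see the lead-in."""
    if ua == 0:
        return q[(ya, 1)][j] if abs(d) == D[i] else F(0)
    return q[(ya, 0)][i] if abs(d) == D[j] else F(0)


def W_check(ya, i, j, ua, d, ub):
    """Upgrade-couple channel evaluated at symbol y_a^{i,j}."""
    return S(i, j, ya, ua, d) * W_b(ya, ua, d, ub)


def Wa_check(ya, i, j, ua):
    """a-channel marginal of the upgrade-couple channel."""
    return sum(W_check(ya, i, j, ua, d, ub) for d in SIGNED for ub in (0, 1))


def d_value(w0, w1):
    return (w0 - w1) / (w0 + w1)
```

The checks below mirror Lemma 20, the three items of Lemma 23 (item 3 for $u_a=0$), and the conjugate relation used in the proof of Lemma 21.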

Remark 5.

For the original joint channel there may be a-channel symbols $y_{a}$ for which $W_{a}(y_{a}|0)>0$ but $W_{a}(y_{a}|1)=0$. For the upgrade-couple channel $\check{W}_{b}$, the symbol $y_{a}^{i,j}$ determines the possible values of the b-channel output when $u_{a}=0$ and when $u_{a}=1$. Such a symbol $y_{a}$ never appears with positive probability when $u_{a}=1$; yet, because it may appear with positive probability when $u_{a}=0$, we still need to map it to some $y_{a}^{i,j}$. The upgrade-couple transform is well defined even in this case, thanks to our definition of $W_{2}$, see (31). In particular, if $y_{a}$ never occurs with positive probability with $u_{a}=1$, say, then $y_{a}^{i,j}$ for the upgrade-couple channel also never occurs with positive probability with $u_{a}=1$ (see Lemma 23, item 2).

A parameter that is related to $S_{i,j}$ and will be useful in the sequel is

[TABLE]

For every $y_{a},u_{a},d_{b}$ , there must exist some $i,j$ such that $S_{i,j}(y_{a},u_{a},d_{b})>0$ . The following lemma makes this clear.

Lemma 20.

For any $y_{a},u_{a},d_{b}$ we have

[TABLE]

Proof:

Without loss of generality, we shall show (57) for $u_{a}=0$ and $d_{b}=+d_{b1}$ . Observe that $S_{i,j}(y_{a},0,d_{b1})=0$ for all $i>1$ . Thus, $\sum_{i,j}S_{i,j}(y_{a},0,d_{b1})=\sum_{j}S_{1,j}(y_{a},0,d_{b1})$ . Next, by (29),

[TABLE]

where the latter equality is because $W_{2}$ is a valid BMS channel.

To see (58), observe that

[TABLE]

Summing over $i,j$ and using (29) yields the result. ∎

As we now show, since $W_{b}$ is symmetrized, so is $\check{W}_{b}$ .

Lemma 21.

Let $W_{b}(y_{a},u_{a},d_{b}|u_{b})$ be a symmetrized joint channel. Then, $\check{W}_{b}(y_{a}^{i,j},u_{a},d_{b}|u_{b})$ , defined as in (55), is also symmetrized.

Proof:

To establish the lemma, we need to show that (25) holds for the upgrade-couple channel. For the a-channel $W_{a}$ , let symbols $y_{a},\bar{y}_{a}$ be conjugates, i.e., $W_{a}(y_{a}|u_{a})=W_{a}(\bar{y}_{a}|\bar{u}_{a})$ . Channel $W_{b}$ is symmetrized, so, by (30), $S_{i,j}(y_{a},u_{a},d_{b})=S_{j,i}(\bar{y}_{a},\bar{u}_{a},d_{b})$ . Furthermore, by definition, $S_{i,j}(y_{a},u_{a},d_{b})=S_{i,j}(y_{a},u_{a},-d_{b})$ . Thus,

[TABLE]

Next, recall that $\check{W}_{a}(y_{a}^{i,j}|u_{a})=\sum_{d_{b},u_{b}}\check{W}_{b}(y_{a}^{i,j},u_{a},d_{b}|u_{b})$ , so that $\check{W}_{a}(y_{a}^{i,j}|u_{a})=\check{W}_{a}(\bar{y}_{a}^{j,i}|\bar{u}_{a})$ . Thus, (25) holds as required. ∎

In the proof of Lemma 21 we have seen that the conjugate symbol of $y_{a}^{i,j}$ is $\bar{y}_{a}^{j,i}$ (with the order of $i$ and $j$ flipped). We summarize this in the following corollary.

Corollary 22.

If $W_{a}(\bar{y}_{a}|\bar{u}_{a})=W_{a}(y_{a}|u_{a})$ then $\check{W}_{a}(\bar{y}_{a}^{j,i}|\bar{u}_{a})=\check{W}_{a}(y_{a}^{i,j}|u_{a})$ .

Since $\check{W}_{b}$ is symmetrized, it admits decoupling decomposition

[TABLE]

Denote by $\text{BSC}(p)$ a binary symmetric channel with crossover probability $p$ . In Lemma 23 we derive $\check{W}_{a}$ [see (61)] and establish that for every $y_{a}$ ,

[TABLE]

That is, when $u_{a}=0$ we have $\check{W}_{2}(\pm d_{bi}|u_{b};y_{a}^{i,j},u_{a})=(1\pm(-1)^{u_{b}}d_{bi})/2$ , when $u_{a}=1$ we have $\check{W}_{2}(\pm d_{bj}|u_{b};y_{a}^{i,j},u_{a})=(1\pm(-1)^{u_{b}}d_{bj})/2$ , and $\check{W}_{2}(d_{b}|u_{b};y_{a}^{i,j},u_{a})$ is zero for any other $d_{b}$ . We emphasize that we define $\check{W}_{2}(d_{b}|u_{b};y_{a}^{i,j},u_{a})$ using (60) even if $\check{W}_{a}(y_{a}^{i,j}|u_{a})=0$ .

Lemma 23.

Let $W_{b}(y_{a},u_{a},d_{b}|u_{b})$ be a symmetrized joint channel and let $\check{W}_{b}(y_{a}^{i,j},u_{a},d_{b}|u_{b})$ be defined as in (55), with decoupling decomposition (59). Then

1. Joint channel $\check{W}_{b}$ is upgraded from joint channel $W_{b}$ with a proper degrading channel that deterministically maps $y_{a}^{i,j}$ to $y_{a}$.

2. We have

[TABLE]

Moreover, symbols $y_{a}$ of channel $W_{a}$ and $y_{a}^{i,j}$ of channel $\check{W}_{a}$ have the same a-channel $D$-value for every $i,j$ such that $\check{W}_{b}(y_{a}^{i,j},u_{a},d_{b}|u_{b})>0$.

3. For every $y_{a}$, BMS channel $\check{W}_{2}(d_{b}|u_{b};y_{a}^{i,j},u_{a})$ with input $u_{b}$ and output $d_{b}$ is $\text{BSC}((1-d_{bi})/2)$ if $u_{a}=0$ and $\text{BSC}((1-d_{bj})/2)$ if $u_{a}=1$.

Proof:

For the first item, we sum (55) over $i,j$ and obtain, using (57),

[TABLE]

That is, joint channel $\check{W}_{b}$ is upgraded from $W_{b}$ with degrading channel $P_{a}$ that deterministically maps $y_{a}^{i,j}$ to $y_{a}$ . This is a proper degrading channel.

For the second item, we marginalize $\check{W}_{b}$ over $d_{b}$ and $u_{b}$ . Using (28) in the right-hand-side of (55), we obtain (61), where $\alpha_{i,j}(y_{a})$ is given in (56). Whenever $\check{W}_{b}(y_{a}^{i,j},u_{a},d_{b}|u_{b})>0$ , we have, by (55), $\alpha_{i,j}(y_{a})>0$ . Thus,

[TABLE]

implying that $y_{a}$ and $y_{a}^{i,j}$ have the same a-channel $D$ -value for their respective channels.

For the final item, if $\check{W}_{a}(y_{a}^{i,j}|u_{a})=0$ , we are free to set $\check{W}_{2}(d_{b}|u_{b};y_{a}^{i,j},u_{a})$ as we please, so we set it as per the item. Otherwise, there are only two values of $d_{b}$ for which $S_{i,j}(y_{a},u_{a},d_{b})$ is nonzero. Hence, $\check{W}_{b}$ can output only two b-channel $D$ -values for fixed $y_{a}^{i,j}$ and $u_{a}$ . Thus, $\check{W}_{2}$ is a BMS channel with only two possible outputs, or, in other words, a BSC. A BSC that outputs $D$ -values $\pm d$ , $0\leq d\leq 1$ , has crossover probability $(1-d)/2$ . This establishes the item. ∎
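The last step of the proof relies on the relation between a BSC and the $D$-values of its outputs. Assuming the usual definition $D(y)=\frac{W(y|0)-W(y|1)}{W(y|0)+W(y|1)}$, a short worked derivation, ending with a numerical instance that uses a $D$-value from Example 4, reads:

```latex
\begin{aligned}
D(0) &= \frac{(1-p)-p}{(1-p)+p} = 1-2p, \qquad D(1) = -(1-2p),\\
p &= \frac{1-d}{2} \quad \text{for a BSC whose outputs have } D\text{-values } \pm d,\\
W(\pm d \mid u_b) &= \frac{1 \pm (-1)^{u_b} d}{2},\\
d = \tfrac{15}{17} &\;\Rightarrow\; p = \tfrac{1}{2}\bigl(1 - \tfrac{15}{17}\bigr) = \tfrac{1}{17}.
\end{aligned}
```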

Definition 6 (Canonical channel).

The canonical channel $W^{*}(d|u)$ of channel $W(y|u)$ has a single entry for each $D$-value. That is, denoting by $D_{d}$ the set of symbols $y$ whose $D$-value is $d$, we have $W^{*}(d|u)=\sum_{y\in D_{d}}W(y|u)$. It can be shown that a channel is equivalent to its canonical form, i.e., each form can be degraded from the other.
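Forming the canonical channel amounts to grouping outputs by $D$-value and summing. A minimal Python sketch, assuming a channel stored as a dictionary from each output to the pair $(W(y|0),W(y|1))$; the names `canonical` and `split_back` and the toy channel are hypothetical. The second function illustrates the reverse direction of the equivalence: each output $d$ of $W^*$ is split back into the members of $D_d$ in proportion to their total probability, which is possible because all members share the same $D$-value.

```python
from fractions import Fraction as F
from collections import defaultdict


def d_value(w0, w1):
    return (w0 - w1) / (w0 + w1)


def canonical(W):
    """W: {y: (W(y|0), W(y|1))} -> {d: (W*(d|0), W*(d|1))}; zero-mass outputs are dropped."""
    acc = defaultdict(lambda: [F(0), F(0)])
    for w0, w1 in W.values():
        if w0 + w1 > 0:
            entry = acc[d_value(w0, w1)]
            entry[0] += w0
            entry[1] += w1
    return {d: tuple(v) for d, v in acc.items()}


def split_back(W, W_star):
    """Degrading channel P(y|d) from W* to W: y gets the share
    (W(y|0) + W(y|1)) / (W*(d|0) + W*(d|1)) of its D-value class d."""
    return {(y, d_value(*w)): sum(w) / sum(W_star[d_value(*w)])
            for y, w in W.items() if sum(w) > 0}


W = {"a": (F(3, 10), F(1, 10)), "b": (F(3, 20), F(1, 20)),
     "abar": (F(1, 10), F(3, 10)), "bbar": (F(1, 20), F(3, 20)), "e": (F(2, 5), F(2, 5))}
W_star = canonical(W)
P = split_back(W, W_star)
```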

Corollary 24.

The canonical b-channels of $\check{W}_{b}(y_{a}^{i,j},u_{a},d_{b}|u_{b})$ and $W_{b}(y_{a},u_{a},d_{b}|u_{b})$ coincide.

Proof:

This is a direct consequence of the first item of Lemma 23:

[TABLE]

∎

Corollary 25.

The canonical a-channels of $\check{W}_{b}(y_{a}^{i,j},u_{a},d_{b}|u_{b})$ and $W_{b}(y_{a},u_{a},d_{b}|u_{b})$ coincide.

Proof:

This follows from the second item of Lemma 23, (58), and (61). ∎

Definition 7 (Class).

The class $C_{i,j}$ is the set of symbols $y_{a}^{i,j}$ with fixed $i,j$ .

There are $B^{2}$ classes. The size of each class is the number of symbols $y_{a}$ . By (60), $\check{W}_{2}(d_{b}|u_{b};y_{a}^{i,j},u_{a})$ is the same BSC for all symbols of class $C_{i,j}$ and fixed $u_{a}$ . Thus, the second item of Theorem 19 becomes trivial and is immediately satisfied if we use an upgrading procedure that upgrade-merges several symbols of the same class $C_{i,j}$ .

To determine which upgrading procedures may be used, we turn to the degrading channel. So long as the degrading channel does not mix a symbol and its conjugate, the upgrading procedure can be confined to a single class. This is because conjugate symbols belong to different classes, as established in Corollary 22. Thus, of the upgrading procedures of [9] (see Appendix C) we can use either upgrade-merge-3 without restriction or upgrade-merge-2 provided that the two symbols to be merged have the same a-channel $D$ -value.

Theorem 26.

Let $W_{b}(y_{a},u_{a},d_{b}|u_{b})$ be some joint channel with marginals $W_{a}(y_{a}|u_{a}),W_{b}^{*}(d_{b}|u_{b})$ and upgrade-couple counterpart $\check{W}_{b}(y_{a}^{i,j},u_{a},d_{b}|u_{b})$. Let $Q_{a}(z_{a}|u_{a})\succcurlyeq W_{a}(y_{a}|u_{a})$ be obtained by an upgrade-merge-3 procedure. Then there exists a joint channel $\check{Q}_{b}(z_{a}^{i,j},u_{a},d_{b}|u_{b})\overset{p}{\succcurlyeq}\check{W}_{b}(y_{a}^{i,j},u_{a},d_{b}|u_{b})$ with canonical marginals $\check{Q}_{a}^{*}(z_{a}|u_{a}),\check{Q}_{b}^{*}(d_{b}|u_{b})$ such that $\check{Q}_{a}^{*}=Q_{a}^{*}$ and $\check{Q}_{b}^{*}=W_{b}^{*}$.

Proof:

The idea is to confine the upgrading procedures to work within a class, utilizing Theorem 19 over each class separately.

Assume that the upgrading procedure from $W_{a}$ to $Q_{a}$ replaces symbols $y_{a1},y_{a2},y_{a3}$ with symbols $z_{a1},z_{a3}$ . We obtain $\check{Q}_{b}$ by using Theorem 19 for each class $C_{i,j}$ of $\check{W}_{b}$ separately. The a-channel upgrade procedure for class $C_{i,j}$ is upgrade-merge-3 from $\check{W}_{a}$ to $\check{Q}_{a}$ that replaces symbols $y_{a1}^{i,j},y_{a2}^{i,j},y_{a3}^{i,j}$ with symbols $z_{a1}^{i,j},z_{a3}^{i,j}$ . As the upgrade is confined to symbols of the same class, the channel $\check{W}_{2}$ — given by (60) — is the same regardless of $y_{a}$ , as established in Lemma 23, item 3. Hence, the second item of Theorem 19 is automatically satisfied within a class $C_{i,j}$ , with

[TABLE]

for all $y_{a},z_{a}$ . Channel $\check{Q}_{b}$ is then obtained by the product of $\check{Q}_{a}$ and $\check{Q}_{2}$ as per (59):

[TABLE]

By properties of upgrade-merge-3 (see (77) in Appendix C-B) we have $\sum_{z_{a}}\check{Q}_{a}(z_{a}^{i,j}|u_{a})=\sum_{y_{a}}\check{W}_{a}(y_{a}^{i,j}|u_{a}).$ Therefore,

[TABLE]

where in $(a)$ we used the decoupling decomposition (63); $(b)$ and $(c)$ are by Lemma 23, item 3 and by (62); finally, $(d)$ is due to Corollary 24.

To see that the canonical a-channel marginals coincide, note that by Lemma 23, item 2, for any fixed $z_{a}$ , the symbols $\{z_{a}^{i,j}\}_{i,j}$ all have the same a-channel $D$ -value. Let $d_{a}$ be some a-channel $D$ -value, and let $D_{d_{a}}$ be the set of a-channel outputs $z_{a}$ whose a-channel $D$ -value is $d_{a}$ . Then,

[TABLE]

where $(a)$ is a direct consequence of the expressions for upgrade-merge-3 and our construction of upgrading each class separately. ∎

To use Theorem 26, one begins with a design parameter $A$ that controls the output alphabet size. Working one class at a time, one then applies upgrade operations in succession to reduce the class size to $2A$ . The resulting channel, therefore, will have $2AB^{2}$ symbols overall. The canonical a-channel marginal that results from this operation will have at most $2A$ symbols.
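The reduction of a single class can be sketched as follows. The code assumes each symbol of a class is represented by its a-channel $D$-value $d$ and total probability $m=W(y|0)+W(y|1)$, so that $W(y|0)=m(1+d)/2$ and $W(y|1)=m(1-d)/2$. In this representation, upgrade-merge-3 removes a middle symbol by splitting its mass between its two neighbors so that their $D$-values are unchanged; this is the procedure of [9] written in $D$-value form. The rule for choosing which symbol to remove (the interior symbol whose neighbors are closest in $D$-value) is a hypothetical choice made for illustration only, and so are the names and the toy class. Conjugate symbols in class $C_{j,i}$ must be processed in the same way (see Remark 6).

```python
from fractions import Fraction as F


def upgrade_merge_3(sym, k):
    """Remove interior symbol k of a D-value-sorted list [(d, m), ...]; needs d_{k-1} < d_{k+1}."""
    (d1, m1), (d2, m2), (d3, m3) = sym[k - 1], sym[k], sym[k + 1]
    to_lo = m2 * (d3 - d2) / (d3 - d1)  # share of the middle mass moved to D-value d1
    to_hi = m2 * (d2 - d1) / (d3 - d1)  # share of the middle mass moved to D-value d3
    return sym[:k - 1] + [(d1, m1 + to_lo), (d3, m3 + to_hi)] + sym[k + 2:]


def reduce_class(sym, target):
    """Apply upgrade-merge-3 until `target` (>= 2) symbols remain."""
    sym = sorted(sym)
    while len(sym) > target:
        # Hypothetical selection rule: the interior symbol with the closest neighbors.
        k = min(range(1, len(sym) - 1), key=lambda t: sym[t + 1][0] - sym[t - 1][0])
        sym = upgrade_merge_3(sym, k)
    return sym


def totals(sym):
    """(sum_y W(y|0), sum_y W(y|1)) over a list of (d, m) symbols."""
    return (sum(m * (1 + d) / 2 for d, m in sym), sum(m * (1 - d) / 2 for d, m in sym))


cls = [(F(-3, 5), F(1, 10)), (F(-1, 5), F(1, 5)), (F(0), F(1, 5)),
       (F(1, 5), F(1, 5)), (F(1, 2), F(1, 10)), (F(3, 5), F(1, 5))]
A = 2
reduced = reduce_class(cls, 2 * A)
```

Each merge keeps $\sum_y W(y|u)$ for both inputs and only reuses existing $D$-values, which is what allows the merge to be confined to one class.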

Remark 6.

The upgrade-merge-3 procedure replaces three conjugate symbol pairs with two conjugate symbol pairs. Recall from Corollary 22 that after the upgrade-couple transform, conjugate symbols belong to different classes. In particular, if $y_{a}$ and $\bar{y}_{a}$ are a conjugate pair of the a-channel before the upgrade-couple transform, then $y_{a}^{i,j}\in C_{i,j}$ and $\bar{y}_{a}^{j,i}\in C_{j,i}$ are a conjugate pair of the a-channel after the upgrade-couple transform. Therefore, when one uses Theorem 26 to replace the symbols

[TABLE]

one must also replace their conjugates

[TABLE]

We still always operate within a class as nowhere do we mix symbols from different classes. Alternatively, one may upgrade only classes $C_{i,j}$ with $i\geq j$ and then use channel symmetry to obtain the upgraded forms of classes $C_{j,i}$ .

There is one case where it is possible to use upgrade-merge-2, as stated in the following corollary.

Corollary 27.

Theorem 26 also holds if the a-channel upgrade procedure is upgrade-merge-2 applied to two symbols of the same a-channel $D$-value.

Proof:

While in general the upgrade-merge-2 procedure mixes a symbol and its conjugate, when the two symbols to be merged have the same a-channel $D$ -value this is no longer the case (see Appendix C-A), and we can follow along the lines of the proof of Theorem 26. We omit the details. ∎

Although upgrade-merge-3 is superior, [9] also introduced upgrade-merge-2 because of numerical issues. To implement upgrade-merge-3 we must divide by the difference of the extremal $D$-values to be merged. If these are very close, this can lead to numerical errors. Upgrade-merge-2 is not susceptible to such errors. On the other hand, upgrade-merge-2 generally cannot be used in the manner stated above: unless the two merged symbols have the same a-channel $D$-value, it requires us to mix symbols from two classes $C_{i,j}$ and $C_{j,i}$ that may have wildly different $\check{Q}_{2}$ channels. This would undesirably upgrade the b-channel.

In practice, however, we may be confronted with a triplet of symbols with very close, but not identical, a-channel $D$-values. To avoid numerical issues, we utilize a fourth nearby symbol. Say that our triplet (to simplify notation, we omit the dependence on the class; it is clear that we do this for each class separately) is $y_{a1},y_{a2},y_{a3}$ with a-channel $D$-values $d_{a1}\leq d_{a2}<d_{a3}$ such that $d_{a3}-d_{a1}<\epsilon$, for some “closeness” threshold $\epsilon$. Let $y_{a4}$ have a-channel $D$-value $d_{a4}$ such that $d_{a4}-d_{a1}>\epsilon$. Then, we apply upgrade-merge-3 twice: first for $y_{a1},y_{a2},y_{a4}$, obtaining $z_{a1},z_{a4}$ with a-channel $D$-values $d_{a1},d_{a4}$, and then for $z_{a1},y_{a3},z_{a4}$, ending up with $z^{\prime}_{a1},z^{\prime}_{a4}$ with a-channel $D$-values $d_{a1},d_{a4}$. Each application divides by $d_{a4}-d_{a1}>\epsilon$ rather than by $d_{a3}-d_{a1}<\epsilon$. In this example we have chosen a fourth symbol with a greater a-channel $D$-value than $d_{a3}$, but we could have similarly chosen a fourth symbol with a smaller a-channel $D$-value than $d_{a1}$ instead.
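The two-step workaround can be written out directly. The Python sketch below is our own illustration; it uses the same assumed representation of a symbol as a pair $(d,m)$ with $W(y|0)=m(1+d)/2$ and $W(y|1)=m(1-d)/2$, and the names `merge3` and `merge_close_triplet` as well as the numerical values are hypothetical. It removes $y_{a2}$ and $y_{a3}$ with the help of $y_{a4}$, as described above, and checks that the totals $\sum_y W(y|u)$ are preserved.

```python
from fractions import Fraction as F


def merge3(lo, mid, hi):
    """Upgrade-merge-3 on (d, m) symbols with lo[0] <= mid[0] <= hi[0] and lo[0] < hi[0]."""
    (d1, m1), (d2, m2), (d3, m3) = lo, mid, hi
    gap = d3 - d1  # the division that is ill-conditioned when gap is tiny
    return (d1, m1 + m2 * (d3 - d2) / gap), (d3, m3 + m2 * (d2 - d1) / gap)


def merge_close_triplet(y1, y2, y3, y4, eps):
    """Remove y2 and y3 (with d3 - d1 < eps) using a fourth symbol y4 (with d4 - d1 > eps)."""
    assert y3[0] - y1[0] < eps < y4[0] - y1[0]
    z1, z4 = merge3(y1, y2, y4)  # first pass: y1, y2, y4 -> z1, z4
    return merge3(z1, y3, z4)    # second pass: z1, y3, z4 -> z1', z4'


def totals(sym):
    return (sum(m * (1 + d) / 2 for d, m in sym), sum(m * (1 - d) / 2 for d, m in sym))


eps = F(1, 100)
y1, y2 = (F(1, 2), F(1, 10)), (F(501, 1000), F(1, 5))
y3, y4 = (F(503, 1000), F(1, 10)), (F(7, 10), F(1, 5))
z1p, z4p = merge_close_triplet(y1, y2, y3, y4, eps)
```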

VI-B Upgrading Channel $W_{b}$

We now show how to upgrade $W_{a,b}(y_{a},u_{a},d_{b}|u_{a},u_{b})$ to channel $Q_{a,b}(y_{a},u_{a},z_{b}|u_{a},u_{b})$ such that $Q_{b}\succcurlyeq W_{b}$ and $Q_{a}=W_{a}$ . The idea is to begin with $W_{b}^{*}$ , a channel equivalent to $W_{b}$ in which $y_{a}$ and $u_{a}$ are not explicit in the output. The channel $W_{b}^{*}$ is given by $W_{b}^{*}(d_{b}|u_{b})=\sum_{y_{a},u_{a}}W_{b}(y_{a},u_{a},d_{b}|u_{b})$ . We upgrade $W_{b}^{*}$ to $Q_{b}^{*}$ using some known method, such that channel $P_{b}^{*}$ degrades $Q_{b}^{*}$ to $W_{b}^{*}$ . To form upgraded channel $Q_{b}$ , we “split” the outputs of $Q_{b}^{*}$ to include $y_{a}$ and $u_{a}$ and find a degrading channel that degrades $Q_{b}$ to $W_{b}$ . We shall see that the upgraded channel $Q_{b}$ is given by

[TABLE]

where $W_{b}(y_{a},u_{a},d_{b})$ and $W_{b}^{*}(d_{b})$ are defined in (65), below. Finally, we form the joint channel $Q_{a,b}$ using (10). We illustrate this in Figure 4.

Theorem 28.

Let $W_{b}(y_{a},u_{a},d_{b}|u_{b})$ be a joint channel where $d_{b}$ is the $D$-value of the b-channel’s output. Let $W_{b}^{*}(d_{b}|u_{b})$ be a channel equivalent to $W_{b}$, and let $Q_{b}^{*}(z_{b}|u_{b})\succcurlyeq W_{b}^{*}(d_{b}|u_{b})$ with degrading channel $P_{b}^{*}(d_{b}|z_{b})$. Then there exists a joint channel $Q_{b}(y_{a},u_{a},z_{b}|u_{b})$ such that $Q_{b}(y_{a},u_{a},z_{b}|u_{b})\overset{p}{\succcurlyeq}W_{b}(y_{a},u_{a},d_{b}|u_{b})$ and $\sum_{y_{a},u_{a}}Q_{b}(y_{a},u_{a},z_{b}|u_{b})=Q_{b}^{*}(z_{b}|u_{b})$.

Proof:

We shall explicitly find $Q_{b}$ and an appropriate degrading channel. The degrading channel will be of the form $P_{b}(d_{b}|y_{a},u_{a},z_{b})$ , i.e., $y_{a}$ and $u_{a}$ pass through the degrading channel unchanged. Such degrading channels are proper. Since $Q_{b}^{*}\succcurlyeq W_{b}^{*}$ we have, for any $d_{b}$ and $u_{b}$ ,

[TABLE]

Denote

[TABLE]

We assume that $W_{b}^{*}(d_{b})>0$ , for otherwise output $d_{b}$ never appears with positive probability and may be ignored, and define

[TABLE]

For each $z_{b}$ , we will shortly define constants $\mu_{y_{a},u_{a}}^{z_{b}}$ such that $\mu_{y_{a},u_{a}}^{z_{b}}\geq 0$ and $\sum_{y_{a},u_{a}}\mu_{y_{a},u_{a}}^{z_{b}}=1$ . Similar to (66), we use these constants to define channel $Q_{b}$ by

[TABLE]

Indeed, $\sum_{y_{a},u_{a}}Q_{b}(y_{a},u_{a},z_{b}|u_{b})=Q_{b}^{*}(z_{b}|u_{b})$ . We now find the constants $\mu_{y_{a},u_{a}}^{z_{b}}$ and an appropriate degrading channel $P_{b}(d_{b}|y_{a},u_{a},z_{b})$ such that

[TABLE]

which will establish our goal.

Let $y_{a},u_{a}$, and $d_{b}$ be such that the left-hand side of (68) is positive (since $W_{b}^{*}(d_{b})>0$, there will always be at least one selection of $y_{a},u_{a}$ for which the left-hand side of (68) is positive), so that $\rho_{y_{a},u_{a}}^{d_{b}}>0$. We shall see that the resulting expressions hold for the zero case as well. Using (66) and (67), we can rewrite (68) as

[TABLE]

Comparing this with (64), we set

[TABLE]

It is easily verified that $\mu_{y_{a},u_{a}}^{z_{b}}\geq 0$ and $\sum_{y_{a},u_{a}}\mu_{y_{a},u_{a}}^{z_{b}}=1$ . Using the expression for $\mu_{y_{a},u_{a}}^{z_{b}}$ in (69) yields

[TABLE]

This is a valid probability distribution. We remark that (68) is satisfied by (70) and (71) even when $\rho_{y_{a},u_{a}}^{d_{b}}=0$ . We have found $Q_{b}$ and a proper degrading channel $P_{b}$ as required. ∎
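The construction can be checked on a toy channel. The Python sketch below follows our reading of the proof: $\rho_{y_a,u_a}^{d_b}=W_b(y_a,u_a,d_b|u_b)/W_b^*(d_b|u_b)$, $\mu_{y_a,u_a}^{z_b}=\sum_{d_b}\rho_{y_a,u_a}^{d_b}P_b^*(d_b|z_b)$, $Q_b(y_a,u_a,z_b|u_b)=\mu_{y_a,u_a}^{z_b}Q_b^*(z_b|u_b)$, and $P_b(d_b|y_a,u_a,z_b)=\rho_{y_a,u_a}^{d_b}P_b^*(d_b|z_b)/\mu_{y_a,u_a}^{z_b}$. Channels are plain dictionaries, and the names `split_upgrade` and `degrade` are hypothetical. Passing the returned $Q_b$ through $P_b$ should reproduce $W_b$, and summing $Q_b$ over $(y_a,u_a)$ should give $Q_b^*$.

```python
from collections import defaultdict


def split_upgrade(W_b, Q_star, P_star):
    """Lift an upgrade of W_b^* to the joint channel W_b (Theorem 28 sketch).

    W_b[(ya, ua, db)] = [W_b(ya, ua, db | 0), W_b(ya, ua, db | 1)]
    Q_star[zb]        = [Q_b^*(zb | 0), Q_b^*(zb | 1)]
    P_star[zb][db]    = P_b^*(db | zb), which degrades Q_b^* to W_b^*
    Returns Q_b[(ya, ua, zb)] and the proper degrading channel
    P_b[(ya, ua, zb)][db]; (ya, ua) pass through P_b unchanged.
    """
    # W_b^*(db | ub): merge all outputs that share the b-channel D-value db
    W_star = defaultdict(lambda: [0.0, 0.0])
    for (ya, ua, db), probs in W_b.items():
        W_star[db][0] += probs[0]
        W_star[db][1] += probs[1]
    # rho[db][(ya, ua)]: share of db carried by (ya, ua); it does not depend
    # on ub, because all outputs with D-value db have the same 0/1 ratio
    rho = defaultdict(dict)
    for (ya, ua, db), probs in W_b.items():
        total = W_star[db][0] + W_star[db][1]
        if total > 0:
            rho[db][(ya, ua)] = (probs[0] + probs[1]) / total
    pairs = {(ya, ua) for (ya, ua, _) in W_b}
    Q_b, P_b = {}, {}
    for zb, (q0, q1) in Q_star.items():
        for ya, ua in pairs:
            weights = {db: rho[db].get((ya, ua), 0.0) * p
                       for db, p in P_star[zb].items()}
            mu = sum(weights.values())  # mu^{zb}_{ya,ua}
            if mu > 0:
                Q_b[(ya, ua, zb)] = [mu * q0, mu * q1]
                P_b[(ya, ua, zb)] = {db: w / mu for db, w in weights.items()}
    return Q_b, P_b


def degrade(Q_b, P_b):
    """Pass Q_b through P_b and return the resulting joint channel."""
    out = defaultdict(lambda: [0.0, 0.0])
    for (ya, ua, zb), (q0, q1) in Q_b.items():
        for db, p in P_b[(ya, ua, zb)].items():
            out[(ya, ua, db)][0] += q0 * p
            out[(ya, ua, db)][1] += q1 * p
    return out
```

In a small example with $W_b^*$ upgraded by upgrade-merge-2, `degrade(*split_upgrade(W_b, Q_star, P_star))` returns $W_b$ up to rounding.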

Corollary 29.

In Theorem 28, the marginal a-channels of $Q_{b}$ and $W_{b}$ coincide.

Proof:

By construction, the degrading channel from $Q_{b}$ to $W_{b}$ does not change the a-channel output, implying that the a-channel marginal remains the same. ∎

To use Theorem 28, one begins with a design parameter $B$ that controls the output alphabet size. The channel $Q_{b}^{*}$, with an output alphabet of size $2B$, is obtained from $W_{b}^{*}$ using a sequence of upgrade operations. To obtain the upgraded joint channel $Q_{b}$, one uses the theorem to turn each of these into an upgrade operation to be performed on channel $W_{b}$. If one uses the techniques of [9], the upgrade operations will consist of upgrade-merge-2 and upgrade-merge-3 operations (see Appendix C). In the following examples we apply Theorem 28 specifically to these upgrades.

For brevity, we will use the following notation:

[TABLE]

Example 5 (Upgrading $W_{b}$ Based on Upgrade-Merge-2).

The upgrade-merge-2 procedure of [9] selects two conjugate symbol pairs and replaces them with a single conjugate symbol pair. The details of the transformation, in our notation, appear in Appendix C-A.

Let joint channel $W_{b}(y_{a},u_{a},d_{b}|u_{b})$ have b-channel marginal $W_{b}^{*}(d_{b}|u_{b})$, in which all symbols with the same $D$-value are combined into a single symbol. We select symbols $d_{bj},d_{bk}$ and their respective conjugates $\bar{d}_{bj}=-d_{bj},\bar{d}_{bk}=-d_{bk}$, such that $d_{bk}\geq d_{bj}>0$, and upgrade $W_{b}^{*}(d_{b}|u_{b})$ to $Q_{b}^{*}(z_{b}|u_{b})$ given by (75) (Appendix C-A). We denote by $\mathcal{D}_{b}$ the output alphabet of $W_{b}^{*}$ and by $\mathcal{D}_{z_{bk}}$ the set

[TABLE]

The output alphabet of $Q_{b}^{*}$ is $\mathcal{Z}=(\mathcal{D}_{b}\setminus\mathcal{D}_{z_{bk}})\cup\{z_{bk},\bar{z}_{bk}\}$; outputs of $Q_{b}^{*}$ represent $D$-values. In particular, the $D$-values of $z_{bk}$ and $\bar{z}_{bk}$ are $d_{bk}$ and $-d_{bk}$, respectively.

Using Theorem 28, we form channel $Q_{b}(y_{a},u_{a},z_{b}|u_{b})$ by

[TABLE]

where by (70),

[TABLE]

We can simplify this when $W_{b}$ is a symmetrized channel. In this case, ${\pi}_{y_{a},u_{a}}^{d_{b}}={\pi}_{y_{a},u_{a}}^{\bar{d}_{b}}$ , yielding

[TABLE]

Therefore, the upgraded joint channel becomes

[TABLE]

where

[TABLE]

Example 6 (Upgrading $W_{b}$ Based on Upgrade-Merge-3).

The upgrade-merge-3 procedure replaces three conjugate symbol pairs with two conjugate symbol pairs. The details of the transformation, in our notation, appear in Appendix C-B.

As above, let joint channel $W_{b}(y_{a},u_{a},d_{b}|u_{b})$ have b-channel marginal $W_{b}^{*}(d_{b}|u_{b})$. For the upgrade procedure we select symbols $d_{bi},d_{bj},d_{bk}$ and their respective conjugates, such that $0\leq d_{bi}<d_{bj}\leq d_{bk}$. (We could have also selected them such that $0\leq d_{bi}\leq d_{bj}<d_{bk}$; at least one of the inequalities $d_{bi}\leq d_{bj}$ or $d_{bj}\leq d_{bk}$ must be strict.) We upgrade $W_{b}^{*}(d_{b}|u_{b})$ to $Q_{b}^{*}(z_{b}|u_{b})$ given by (76) (Appendix C-B). We denote by $\mathcal{D}_{b}$ the output alphabet of $W_{b}^{*}$ and by $\mathcal{D}_{z_{bk},z_{bi}}$ the set

[TABLE]

The output alphabet of $Q_{b}^{*}$ is $\mathcal{Z}=(\mathcal{D}_{b}\setminus\mathcal{D}_{z_{bk},z_{bi}})\cup\{z_{bk},z_{bi},\bar{z}_{bi},\bar{z}_{bk}\}$; outputs of $Q_{b}^{*}$ represent $D$-values. In particular, the $D$-values of $z_{bk}$ and $z_{bi}$ are $d_{bk}$ and $d_{bi}$, respectively.

Assuming that $W_{b}$ is symmetrized, we form channel $Q_{b}(y_{a},u_{a},z_{b}|u_{b})$ using Theorem 28 as

[TABLE]

where by (70),

[TABLE]

and $\mu_{y_{a},u_{a}}^{\bar{z}_{bk}}=\mu_{y_{a},u_{a}}^{z_{bk}}$ , $\mu_{y_{a},u_{a}}^{\bar{z}_{bi}}=\mu_{y_{a},u_{a}}^{z_{bi}}$ . The latter two equalities are due to our assumption that $W_{b}$ is symmetrized.

Denoting

[TABLE]

the upgraded joint channel is given by

[TABLE]

Remark 7.

We observe from these examples an interesting parallel between the a-channel and b-channel upgrading procedures. In the former case, we confine upgrade operations to a single class, in which the b-channel $D$ -values are fixed. In light of the above examples, the latter case may be viewed as confining upgrade procedures to “classes” in which $y_{a}$ and $u_{a}$ are fixed.

VII Lower Bound Procedure

The previous sections have introduced several ingredients for building an overall procedure for obtaining a lower bound on the probability of error of polar codes under SC decoding. We now combine these ingredients and present the overall procedure. First, we lower-bound the probability of error of two synthetic channels. Then, we show how to use lower bounds on channel pairs to obtain better lower bounds on the union of many error events.

VII-A Lower Bound on the Joint Probability of Error of Two Synthetic Channels

We now present an upgrading procedure for $W_{a,b}$ that results in channel $Q_{a,b}$ with a smaller alphabet size. The procedure leverages the recursive nature of polar codes.

The input to our procedure is a BMS channel $W$, the number of polarization steps $n$, the indices $a$ and $b$ of the a-channel and b-channel, respectively, and parameters $A$ and $B$ that control the output alphabet sizes of the a- and b-channels, respectively. The binary expansions of $a-1$ and $b-1$ are $\mathbf{a}=\langle\alpha_{1},\alpha_{2},\ldots,\alpha_{n}\rangle$ and $\mathbf{b}=\langle\beta_{1},\beta_{2},\ldots,\beta_{n}\rangle$, respectively. These expansions specify the order of polarization transforms to be performed, where $0$ implies a ‘$-$’-transform and $1$ implies a ‘$+$’-transform.

The algorithm consists of a sequence of polarization and upgrading steps. After each polarization step, we bring the channel to $D$-value representation, as described in Section IV-A. A side effect of polarization is an increase in alphabet size; the upgrading steps prevent the alphabet size of the channels from growing beyond a predetermined size. After the final upgrading step we obtain the joint channel $Q_{a,b}$, which is properly upgraded from $W_{a,b}$. We compute $P_{e}^{\textrm{IMJP}}(Q_{a,b})$, which serves as a lower bound on $P_{e}^{\textrm{IML}}(W_{a,b})$. We recall that $P_{e}^{\textrm{IML}}(W_{a,b})$ is the probability of error under SC decoding of the joint synthetic channel $W_{a,b}$. This, in turn, lower-bounds $P_{e}^{\textrm{SC}}(W)$ (see Corollary 17).

Algorithm 1 provides a high-level description of the procedure. We begin by determining the first index $m$ for which $\alpha_{m}$ and $\beta_{m}$ differ (i.e., $\alpha_{\ell}=\beta_{\ell}$ for $\ell<m$ and $\alpha_{m}\neq\beta_{m}$). The first $m-1$ polarization steps are of a single channel, as the a-channel and b-channel indices are the same. Since these are single channels, we utilize the upgrading procedures of [9] to reduce the output alphabet size. At the $m$th polarization step, the a- and b-channels differ. We perform the joint polarization described in Section IV-B and symmetrize the joint channel using (24). This symmetrization need only be performed once, as subsequent polarizations maintain symmetrization (Proposition 16). We then perform the b-channel upgrading procedure (Section VI-B), which reduces the b-channel alphabet size to $2B$. Following that, we upgrade the a-channel. As discussed in Section VI-A, this consists of two steps. First, we upgrade-couple the channel to generate $B^{2}$ classes. Second, for each class separately, we use the a-channel upgrade procedure until each class has at most $2A$ elements (see Theorem 26 and Corollary 27). We confine the a-channel upgrade procedure to the class by utilizing only upgrade-merge-3 operations. We continue to polarize and upgrade the joint channel in this manner until $\ell=n$. After the final polarization and upgrading operation, we compute the probability of error of the IMJP decoder for the resulting channel.
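The control flow just described can be summarized as a schedule of steps. The Python sketch below only lists the steps and performs no channel operations; the step labels and function names are hypothetical, the conversion to $D$-value representation after each polarization is left implicit, and the initial upgrade of $W$ follows Remark 9.

```python
def binary_expansion(index, n):
    """<alpha_1, ..., alpha_n>: the n bits of index - 1, alpha_1 being the MSB."""
    return [((index - 1) >> (n - 1 - t)) & 1 for t in range(n)]


def algorithm1_schedule(n, a, b):
    """List the high-level steps of Algorithm 1 for synthetic channels a != b."""
    alpha, beta = binary_expansion(a, n), binary_expansion(b, n)
    if alpha == beta:
        raise ValueError("a and b must differ")
    # m: first (1-based) position at which the two expansions differ
    m = next(t + 1 for t in range(n) if alpha[t] != beta[t])
    sign = {0: "-", 1: "+"}           # bit 0 -> '-'-transform, 1 -> '+'
    steps = [("upgrade W",)]          # initial upgrade (Remark 9)
    for ell in range(1, n + 1):
        if ell < m:
            # indices still agree: polarize one channel, then upgrade it as in [9]
            steps.append(("single", sign[alpha[ell - 1]]))
            continue
        steps.append(("joint", sign[alpha[ell - 1]], sign[beta[ell - 1]]))
        if ell == m:
            steps.append(("symmetrize",))   # once; later steps preserve it
        steps.append(("upgrade b-channel to 2B",))
        steps.append(("upgrade-couple into B^2 classes",))
        steps.append(("upgrade-merge-3 each class to 2A",))
    steps.append(("compute IMJP error",))
    return steps
```

For example, with $n=3$, $a=2$ and $b=4$ the expansions are $\langle 0,0,1\rangle$ and $\langle 0,1,1\rangle$, so $m=2$: one single-channel step precedes two joint steps.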

The lower bound of this procedure compares favorably with the trivial lower bound, $\max\{\mathbb{P}\left\{\mathcal{E}_{a}\right\},\mathbb{P}\left\{\mathcal{E}_{b}\right\}\}$ . This is because our upgrading procedure only ever changes one marginal, keeping the other intact. Since it leverages upgrading transforms that can be used on single channels, the marginal channels obtained are the same as would be obtained on single channels using the same upgrading steps. Thus, by Lemma 7 this lower bound is at least as good as $\max\{\mathbb{P}\left\{\mathcal{E}_{a}\right\},\mathbb{P}\left\{\mathcal{E}_{b}\right\}\}$ .

Remark 8.

When the BMS $W$ is a BEC, we can recover the bounds of [10] and [12] using our upgrading procedure. Only a-channel upgrades are required, as the b-channel, in $D$-value representation, remains a BEC. For each a-channel symbol, the channel $W_{2}$ in (26) is either a perfect channel or a pure-noise channel (see Lemma 32 in Appendix A). Thus, the upgrade-couple procedure splits the a-channel symbols into those that see a perfect channel regardless of $u_{a}$ and those that see a pure-noise channel regardless of $u_{a}$. Merging a-channel symbols of the same class is equivalent to merging a-channel symbols for which $\check{W}_{2}$ is the same type of channel. We thus merge a-channel symbols of the same a-channel $D$-value that “see” the same type of b-channel. This corresponds to keeping track of the correlation between erasure events of the two channels.

Remark 9.

An initial step of Algorithm 1 is to upgrade the channel $W$, even before any polarization operations. This step enables us to apply our algorithm to continuous-output channels; see [9, Section VI].

VII-B Lower Bound for More than Two Synthetic Channels

Recall that the probability of error of polar codes under SC decoding may be expressed as $\mathbb{P}\left\{\bigcup_{a\in\mathcal{A}}\mathcal{E}_{a}\right\}$ . In the previous section, we developed a lower bound on $\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ , $a<b$ , which lower bounds $\mathbb{P}\left\{\bigcup_{a\in\mathcal{A}}\mathcal{E}_{a}\right\}$ . This lower bound may be strengthened by considering several pairs of synthetic channels and using (4). We now show how this can be done.

Lemma 30.

The probability of a union of $M$ events, $\bigcup_{a=1}^{M}\mathcal{E}_{a}$, is lower bounded by

[TABLE]

Proof:

The proof hinges on using the identity $\mathbb{P}\left\{\mathcal{E}_{a}\cap\mathcal{E}_{b}\right\}=\mathbb{P}\left\{\mathcal{E}_{a}\right\}+\mathbb{P}\left\{\mathcal{E}_{b}\right\}-\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ in (4). Note that any set of $M$ numbers $\{p_{1},p_{2},\ldots,p_{M}\}$ satisfies

[TABLE]

so that

[TABLE]

Therefore,

[TABLE]

Using this in (4) yields the desired bound. ∎

In practice, we combine the lower bound of Lemma 30 with (15). That is, we compute lower bounds on $\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ for all pairs of channels in some subset $\mathcal{A}^{\prime}$ of the non-frozen set, and use Lemma 30 over this subset.

Such bounds are highly dependent on the selection of the subset $\mathcal{A}^{\prime}$. One possible strategy is as follows. Let $\mathcal{B}$ be the set of $k$ worst synthetic channels in the non-frozen set, for some $k$. For each channel pair in $\mathcal{B}$, compute a lower bound on the joint probability of error using Algorithm 1. Then, form all possible subsets of $\mathcal{B}$ (there are $2^{k}$ such subsets) and apply Lemma 30 to each subset. Choose the subset with the highest lower bound as $\mathcal{A}^{\prime}$. The reason for going over all possible subsets is that bounds based on the inclusion-exclusion principle are not guaranteed to be higher than the highest pairwise probability; see [15].
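The bound of Lemma 30 and the subset search can be sketched as follows, assuming that (4) is the second-order Bonferroni inequality $\mathbb{P}\{\bigcup_a\mathcal{E}_a\}\geq\sum_a\mathbb{P}\{\mathcal{E}_a\}-\sum_{a<b}\mathbb{P}\{\mathcal{E}_a\cap\mathcal{E}_b\}$. Substituting the identity from the proof and using $\sum_{a<b}(p_a+p_b)=(M-1)\sum_a p_a$ gives $\sum_{a<b}\mathbb{P}\{\mathcal{E}_a\cup\mathcal{E}_b\}-(M-2)\sum_a\mathbb{P}\{\mathcal{E}_a\}$. For $M\geq 2$ this remains a valid lower bound when the pairwise unions are replaced by lower bounds and the single-channel probabilities by upper bounds. The input format and function names in this Python sketch are hypothetical.

```python
from itertools import combinations


def pairwise_union_bound(subset, single_ub, pair_lb):
    """Lower bound on P{union of E_a, a in subset} for |subset| >= 2:
    sum over pairs of pair_lb minus (M - 2) times the sum of single_ub.

    single_ub[a]    : upper bound on P{E_a}
    pair_lb[(a, b)] : lower bound on P{E_a U E_b}, keyed with a < b
    """
    M = len(subset)
    if M < 2:
        raise ValueError("need at least two events")
    pairs = sum(pair_lb[(a, b)] for a, b in combinations(sorted(subset), 2))
    return pairs - (M - 2) * sum(single_ub[a] for a in subset)


def best_subset(channels, single_ub, pair_lb):
    """Evaluate every subset of size >= 2 (exponential in len(channels)) and
    return (bound, subset) for the subset with the highest lower bound."""
    best = (float("-inf"), ())
    for size in range(2, len(channels) + 1):
        for subset in combinations(sorted(channels), size):
            bound = pairwise_union_bound(subset, single_ub, pair_lb)
            best = max(best, (bound, subset))
    return best
```

For three disjoint events of probability $0.1$ each, the full subset gives $0.6-0.3=0.3$, the exact union probability; for three identical events, the full subset gives $0$ and a single pair is best, which illustrates why all subsets are searched.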

VIII Implementation

Our implementation of Algorithm 1, in C++, is available for download at [16]. In this section we provide some details on the implementation.

A naive implementation of Algorithm 1 is to perform all steps successively at each iteration. That is, first jointly polarize the joint channel, then bring the channel to $D$-value representation, followed by the b-channel upgrade procedure and the upgrade-couple procedure, and finally perform the a-channel upgrade procedure. This approach, however, quickly runs into a limitation: the memory required to store the outcomes of these stages becomes prohibitively large when the alphabet-size control parameters $A$ and $B$ grow.

Observe, however, that the total required memory at the end of each iteration of Algorithm 1 is actually quite small. We need only store the values of $\check{W}_{a}(y_{a}^{i,j}|0)$ for each value of $y_{a},i,j$ (a total of $2A\cdot B^{2}$ combinations), a mapping between $y_{a}$ and its conjugate $\bar{y}_{a}$ , and a list of size $B$ that stores the possible b-channel $D$ -values. Then, we can compute $\check{W}_{b}(y_{a}^{i,j},u_{a},d_{b}|u_{b})$ using (59), (60), and Corollary 22. Thus, our data structure for an upgrade-coupled joint channel utilizes a three-dimensional matrix of size $(2A)\times B\times B$ to store $\check{W}_{a}(y_{a}^{i,j}|0)$ (specifically, we use the cube data structure provided by [17]). As for the mapping between $y_{a}$ and its conjugate, if $\check{W}_{a}(y_{a}^{i,j}|0)$ is stored in element (y,i,j) of the matrix, and y is even, then $\check{W}_{a}(\bar{y}_{a}^{i,j}|0)$ is stored in element (y+1,i,j). We store the absolute values of the b-channel $D$ -values in a vector of length $B$ .
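A Python stand-in for this storage layout is sketched below. Nested lists replace the cube container of [17], and the class and method names are hypothetical; the point is the index layout, with $2A\cdot B^{2}$ entries and the conjugate of an even index $y$ stored at $y+1$.

```python
class CoupledJointChannel:
    """Storage for an upgrade-coupled joint channel.

    a_prob[y][i][j] holds W_a(y_a^{i,j} | 0) for y in range(2A) and
    i, j in range(B); the conjugate of an even index y is stored at y + 1.
    abs_db[i] holds the absolute value of the i-th b-channel D-value.
    """

    def __init__(self, A, B):
        self.A, self.B = A, B
        self.a_prob = [[[0.0] * B for _ in range(B)] for _ in range(2 * A)]
        self.abs_db = [0.0] * B

    @staticmethod
    def conjugate(y):
        """Index of the conjugate symbol: y + 1 for even y, y - 1 for odd y."""
        return y ^ 1

    def get(self, y, i, j):
        return self.a_prob[y][i][j]

    def put(self, y, i, j, value):
        self.a_prob[y][i][j] = value

    def num_entries(self):
        return 2 * self.A * self.B * self.B
```

Flipping the lowest bit of $y$ implements the even/odd pairing without storing an explicit conjugate table.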

The second key observation is that each upgrading procedure only ever changes one marginal. That is, the a-channel upgrading procedure leaves the marginal b-channel unchanged, and the b-channel upgrading procedure does not affect the marginal a-channel. Thus, since our upgrading procedures leverage upgrading procedures for single channels, we can pre-compute the upgraded marginal channels. In essence, given a target upgraded marginal channel (computed beforehand using the techniques of [9]), our upgrading procedures “split” the probability of an output symbol between two absorbing symbols. The “splitting” factors are functions of the $D$-values of the three symbols (see Appendix C). Accordingly, we compute the polarized and upgraded marginal channels beforehand.

The joint polarization step maps each pair of symbols, $y_{a_{1}}^{i_{1},j_{1}}$ and $y_{a_{2}}^{i_{2},j_{2}}$ to up to four polarized counterparts (see Section IV-B). Knowing beforehand what the upgraded marginal channels should be, we can directly split each polarized symbol into the relevant absorbing symbols. We incorporate the upgrade-couple operation into this by utilizing the factor $\alpha_{i,j}$ from (56).
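The one-pass idea can be sketched for the a-channel $D$-values alone. The hypothetical helper below is not the authors' code: it takes the sorted absorbing $D$-values of a precomputed upgraded marginal and splits an incoming symbol between its two neighbours using upgrade-merge-3 factors. It assumes that the absorbing set brackets every incoming $D$-value.

```python
from bisect import bisect_left


def split_into_absorbing(d, prob, targets):
    """Split the probability `prob` of a symbol with D-value d between its two
    neighbours in the sorted list `targets` of absorbing D-values.

    Returns {absorbing D-value: share}; the shares are the upgrade-merge-3
    factors, so total probability and prob * d are both preserved.
    """
    k = bisect_left(targets, d)
    if k < len(targets) and targets[k] == d:
        return {d: prob}                 # d is itself an absorbing D-value
    if k == 0 or k == len(targets):
        raise ValueError("d lies outside the absorbing D-values")
    d_lo, d_hi = targets[k - 1], targets[k]
    return {d_lo: prob * (d_hi - d) / (d_hi - d_lo),
            d_hi: prob * (d - d_lo) / (d_hi - d_lo)}
```

Because the targets are fixed in advance, each polarized symbol can be routed to its absorbing symbols as soon as it is produced, without materializing the full polarized channel.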

Thus, in our implementation, rather than performing each step of an iteration in its entirety, we perform all steps in one fell swoop. This sidesteps the memory-intensive step of computing the upgrade-coupled jointly polarized channel. The interested reader is urged to look at our source code for further details.

Remark 10.

The description here was given in terms of $D$-values, in line with the exposition in this paper. However, for numerical purposes we recommend (and use) likelihood ratios in practical implementations. Likelihood ratios have a greater dynamic range than $D$-values, and therefore offer better numerical precision. (As an example, two very different likelihood ratios, $\lambda_{1}=10^{20}$ and $\lambda_{2}=10^{30}$, cannot be differentiated in double precision upon conversion to $D$-values.) There is a one-to-one correspondence between $D$-values and likelihood ratios (see Appendix B), and all $D$-value based formulas are easily translated to their likelihood ratio counterparts.
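The example of $\lambda_1=10^{20}$ and $\lambda_2=10^{30}$ is easy to reproduce. The minimal Python sketch below uses the conversion $d=(\lambda-1)/(\lambda+1)$, which follows from $l=\log\lambda$ and $d=\tanh(l/2)$ (Appendix B); the function name is illustrative.

```python
import math


def d_from_lr(lam):
    """D-value from a likelihood ratio lam = W(y|0)/W(y|1)."""
    return (lam - 1.0) / (lam + 1.0)


lam1, lam2 = 1e20, 1e30
print(d_from_lr(lam1), d_from_lr(lam2))   # 1.0 1.0: indistinguishable
print(math.log(lam1) < math.log(lam2))    # True: the LLRs remain distinct
```

Both $D$-values lie within $2\cdot 10^{-20}$ of $1$, far below the spacing of doubles near $1$, so both round to exactly $1.0$.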

IX Numerical Results

Figures 1 and 5 present numerical results of our bound for two cases. In both cases, we designed a polar code for a specific BSC, and then assessed its performance when used over different BSCs. Specifically:

•

Figure 1: A code of length $N=2^{10}=1024$ , rate $R=0.1$ , designed for a BSC with crossover probability $0.2$ .

•

Figure 5: A code of length $N=2^{11}=2048$ , rate $R=0.25$ , designed for a BSC with crossover probability $0.18$ .

The codes were designed using the techniques of [9] with $128$ quantization levels. The non-frozen set $\mathcal{A}$ consisted of the $\lfloor NR\rfloor$ channels with the smallest probability of error. This non-frozen set was then held fixed for all the BSCs over which the code was evaluated.

For each code, we plot three bounds on the probability of error, when used over specific BSCs: an upper bound on the probability of error, the trivial lower bound on the probability of error, and the new lower bound on the probability of error presented in this paper.

For the upper bound, we computed an upper bound on $\sum_{a\in\mathcal{A}}P_{e}^{\textrm{ML}}(W_{a})$, and for the trivial lower bound we computed a lower bound on $\max_{a\in\mathcal{A}}P_{e}^{\textrm{ML}}(W_{a})$; upper and lower bounds on the probability of error of single channels (i.e., on $P_{e}^{\textrm{ML}}(W_{a})$) were obtained using the techniques of [9]. The new lower bound is based on the IMJP decoder, as described in this paper. We computed the IMJP decoding error, with $2A=2B=32$, for all possible pairs of the $20$ worst channels in the non-frozen set. (Note that there is a different set of $20$ worst channels for each crossover probability: for each crossover probability, we selected the $20$ channels in the (fixed) non-frozen set with the highest upper bound on decoding error when used over a BSC with that crossover probability.) We then used Lemma 30, computed for the subset of these $20$ channels that yielded the highest bound; this provides a significantly improved bound over the bound given by the worst-performing pair. The computation utilized [18] for parallel computation of the IMJP decoding error over different channel pairs.

As one may observe, our bounds improve upon the previously known lower bound (3). In fact, they are quite close to the upper bound on the probability of error. This provides strong numerical evidence that error events of channel pairs dominate the error probability of polar codes under SC decoding.

X Discussion and Outlook

This research was inspired by [12], which showed that — for the BEC — the union bound on the probability of error of polar codes under SC decoding is asymptotically tight. The techniques of [12] hinged on the property that a polarized BEC is itself a BEC. Or, put another way, that the family of binary erasure channels is closed under the polar transform. This property enabled the authors to directly track the joint probability of erasure during the polarization process and bound its rate of decay. Unfortunately, this property is not shared by other channel families.

Design of polar codes for channel coding is based on selecting a set of indices to be frozen. One design rule is to select the worst-performing indices as the frozen set. For example, for a code of length $N$ and rate $R$ , choose the $N(1-R)$ indices with the highest probability of error (such channels can be identified using the techniques of [9]). This design rule optimizes the union bound on the probability of error of polar codes, (2). As Parizi and Telatar have shown in [12], for the BEC such a design rule is essentially optimal. It is an open question whether a similar claim can be made for other BMS channel families.

As our numerical results show, below a certain crossover probability the upper bound and our lower bound all but coincide, with a significant gap to the trivial lower bound. Thus, we conjecture that the ratio between the union bound and the actual probability of error approaches $1$ asymptotically for any BMS channel. This would imply the essential optimality of the union bound as a design rule. Moreover, we believe that the tools developed in this research are key to proving this conjecture.

One possible approach is to track analytically the evolution of joint error probabilities during the polarization process. The symmetrization transformation and the resultant decoupling decomposition bring joint channels to a form more amenable to analysis. One may look at, for example, the Bhattacharyya parameter of the channel $W_{2}$ from (28), when $u_{a},y_{a}$ are fixed,

[TABLE]

This quantity, together with the Bhattacharyya parameters of the a-channel, may be used to bound $\mathbb{P}\left\{\mathcal{E}_{a}\cap\mathcal{E}_{b}\right\}$ . Tracking the evolution of these parameters — or bounds on them — may enable the study of the decay of $\mathbb{P}\left\{\mathcal{E}_{a}\cap\mathcal{E}_{b}\right\}$ (if indeed there is such decay). In fact, it can be shown that applying the above suggestion to the BEC coincides with the approach of [12].
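For a BMS written in the $D$-value form of Appendix C, $W(\pm d_\ell|u)=(\pi^{d_\ell}/2)(1\pm(-1)^u d_\ell)$, each conjugate pair contributes $\pi^{d_\ell}\sqrt{1-d_\ell^{2}}$ to the Bhattacharyya parameter $\sum_y\sqrt{W(y|0)W(y|1)}$. The Python sketch below evaluates this; applying it to $W_2$ for fixed $u_a,y_a$ assumes that this channel has first been brought to the same form, which is not shown here, and the dictionary input format is hypothetical.

```python
import math


def bhattacharyya(pairs):
    """Bhattacharyya parameter of a BMS given as {d: pi^d} with d > 0.

    The outputs +d and -d together contribute
    2 * sqrt((pi/2)(1 + d) * (pi/2)(1 - d)) = pi * sqrt(1 - d^2).
    """
    return sum(pi * math.sqrt(1.0 - d * d) for d, pi in pairs.items())
```

For a BSC with crossover probability $p$ (a single pair with $d=1-2p$ and $\pi^{d}=1$) this gives the familiar $2\sqrt{p(1-p)}$.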

Interestingly, our bounds are tight despite the various manipulations our procedure performs on the joint channel. The joint channels that result from our procedure are very different from the actual joint channel, yet their marginals are the same as those obtained by upgrading each synthetic channel on its own. This curious outcome merits further research on the upgrade-couple transform and its effect on the joint channel.

There are several additional avenues of further research. These include:

•

Our results apply only to BMS channels. It would be interesting to extend them to richer settings, such as channels with non-binary input, or non-symmetric channels.

•

This research has concentrated on SC decoding. Can it be extended to other decoding methods for polar codes (e.g., successive cancellation list (SCL) decoding [20])? A logical first step in analyzing SCL decoding is to look at pairs of error events, as done here.

Acknowledgment

The assistance of Ina Talmon is gratefully acknowledged.

Appendix A The IMJP decoder for a BEC

In the special case where $W$ is a BEC and $W_{a}$ and $W_{b}$ are two of its polar descendants, we have the following.

Proposition 31.

Let $W_{a}(y_{a}|u_{a})$ and $W_{b}(y_{a},u_{a},y|u_{b})$ be two polar descendants of a BEC in the same tier. Then, the IMJP and the IML (SC) decoders coincide.

To prove this, we first show that for the BEC, erasures are determined by the received channel symbols, $y_{1}^{2^{n}}$, and not by previous bit decisions. This implies that for fixed $y_{a}$, regardless of $u_{a}$ and $y$, either channel $W_{b}$ always experiences an erasure, or it never does. If $W_{b}$ experiences an erasure, it does not matter what $\phi_{a}$ decides in terms of the IMJP decoder, so it may as well use an ML decoder; if $W_{b}$ does not experience an erasure, then the best choice for $\phi_{a}$ is an ML decoder. This suggests that the IML and IMJP decoders coincide.

Lemma 32.

Let $W_{a}(y_{1}^{2^{n}},u_{1}^{a-1}|u_{a})$ be a polar descendant of a BEC, $W$ . Then, there exists a set $E_{n}$ , dependent only on $a$ , such that $W_{a}$ has an erasure if and only if $y_{1}^{2^{n}}\in E_{n}$ .

Proof:

Here, $y_{1}^{2^{n}}$ are the received channel symbols, and $u_{1}^{a-1}$ the previous bit decisions that are part of $W_{a}$ ’s output. Let $\langle\alpha_{1},\alpha_{2},\ldots,\alpha_{n}\rangle$ be the binary expansion of $a-1$ , with $\alpha_{1}$ the MSB. Recall that channel $W_{a}$ is the result of $n$ polarization steps determined by $\alpha_{1},\alpha_{2},\ldots,\alpha_{n}$ , where $\alpha_{j}=0$ is a ‘ $-$ ’-transform and $\alpha_{j}=1$ is a ‘ $+$ ’-transform.

Consider first the case where $n=1$ , i.e., $a-1=\alpha_{1}$ . If $\alpha_{1}=0$ then $W_{a}=W^{-}$ has an erasure if and only if at least one of $y_{1},y_{2}$ is an erasure, i.e., if and only if $y_{1}^{2}\in E_{1}$ , $E_{1}=\{y_{1}^{2}|y_{1}=e\text{ or }y_{2}=e\}$ . If $\alpha_{1}=1$ then $W_{a}=W^{+}$ has an erasure if and only if both $y_{1}$ and $y_{2}$ are erasures, i.e., if and only if $y_{1}^{2}\in E_{1}$ , $E_{1}=\{y_{1}^{2}|y_{1}=e\text{ and }y_{2}=e\}$ . Therefore, the claim is true for $n=1$ .

We proceed by induction. Let the claim be true for $n-1$ : for $a^{\prime}-1=\langle\alpha_{1},\alpha_{2},\ldots,\alpha_{n-1}\rangle$ , there exists a set $E_{n-1}$ such that $W_{a^{\prime}}$ has an erasure if and only if $y_{1}^{2^{n-1}}\in E_{n-1}$ . If $\alpha_{n}=0$ , then $W_{a}$ is the result of a ‘ $-$ ’-transform of two BEC channels $W_{a^{\prime}}$ , so it has an erasure if and only if at least one of them erases. In other words, $W_{a}$ has an erasure if and only if $y_{1}^{2^{n}}\in E_{n}$ , $E_{n}=\{y_{1}^{2^{n}}|y_{1}^{2^{n-1}}\in E_{n-1}\text{ or }y_{2^{n-1}+1}^{2^{n}}\in E_{n-1}\}$ . If, however, $\alpha_{n}=1$ , then $W_{a}$ is the result of a ‘ $+$ ’-transform of two BEC channels $W_{a^{\prime}}$ , so it has an erasure if and only if both of them erase. In other words, $W_{a}$ has an erasure if and only if $y_{1}^{2^{n}}\in E_{n}$ , $E_{n}=\{y_{1}^{2^{n}}|y_{1}^{2^{n-1}}\in E_{n-1}\text{ and }y_{2^{n-1}+1}^{2^{n}}\in E_{n-1}\}$ . Thus, the claim is true for $n$ as well, completing the proof. ∎
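The recursion in the proof translates directly into code. The Python sketch below (function names hypothetical) decides erasure from the received block alone, exactly as in the lemma, and cross-checks by brute force that $\mathbb{P}\{y_{1}^{2^{n}}\in E_{n}\}$ agrees with the standard erasure-probability recursion $z\mapsto 2z-z^{2}$ for a ‘$-$’-transform and $z\mapsto z^{2}$ for a ‘$+$’-transform.

```python
from itertools import product


def is_erased(y, alphas):
    """Lemma 32: does the BEC descendant with expansion alphas = <alpha_1, ...,
    alpha_n> erase on the received block y (len(y) == 2**n, erasures 'e')?
    Only y is consulted; no previous bit decisions are needed."""
    if not alphas:
        return y[0] == "e"                      # n = 0: the BEC itself
    half = len(y) // 2
    first = is_erased(y[:half], alphas[:-1])   # W_{a'} on the first half
    second = is_erased(y[half:], alphas[:-1])  # W_{a'} on the second half
    return (first or second) if alphas[-1] == 0 else (first and second)


def erasure_probability(alphas, eps):
    """P{y in E_n} for a BEC(eps), by enumerating all erasure patterns."""
    N = 2 ** len(alphas)
    total = 0.0
    for pattern in product((True, False), repeat=N):
        y = ["e" if erased else 0 for erased in pattern]
        k = sum(pattern)
        if is_erased(y, alphas):
            total += eps ** k * (1 - eps) ** (N - k)
    return total


def bec_recursion(alphas, eps):
    """Erasure probability via z -> 2z - z^2 ('-') and z -> z^2 ('+')."""
    z = eps
    for bit in alphas:
        z = 2 * z - z * z if bit == 0 else z * z
    return z
```

The two computations agree because the two halves of $y_{1}^{2^{n}}$ are independent, which is exactly what the ‘or’ and ‘and’ cases of the induction use.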

Proof of Proposition 31:

By Lemma 3, a decoder $\phi_{b}$ that minimizes $\mathbb{P}\left\{\mathcal{E}_{a}\cup\mathcal{E}_{b}\right\}$ is an ML decoder. It remains to show that a minimizing $\phi_{a}$ is also an ML decoder. Marginalizing the joint channel (10) yields $W_{a}$ :

[TABLE]

The ML decoder for channel $W_{a}$ maximizes $W_{a}(y_{a}|u_{a})$ with respect to $u_{a}$ ; decoder $\phi_{a}$ , on the other hand, maximizes $T(y_{a}|u_{a})$ , defined in (12). Using (10) we recast the expression for $T$ in the same form as the expression for $W_{a}$ ,

[TABLE]

By Lemma 32, whether $W_{b}$ has an erasure depends solely on the received channel symbols, which are wholly contained in $y_{a}$ , and not on previous bit decisions. In particular, in computing $W_{a}$ or $T$ , we either sum over only erasure symbols or over only non-erasure symbols. Since $\phi_{b}$ is an ML decoder for $W_{b}$ , if $y_{b}$ is an erasure of $W_{b}$ then $W_{a}(y_{a}|u_{a})=2T(y_{a}|u_{a})$ ; if $y_{b}$ is not an erasure of $W_{b}$ then $W_{a}(y_{a}|u_{a})=T(y_{a}|u_{a})$ . In either case, it is clear that the decision based on (11) is identical to the ML decision. Therefore, $\phi_{a}$ is an ML decoder as well, implying that the IMJP decoder is an IML decoder. ∎

Appendix B Introduction to $D$ -values

The decision of an ML decoder for a memoryless binary-input channel $W_{Y|U}$ may be based on any sufficient statistic of the channel output. One well-known sufficient statistic is the log-likelihood ratio (LLR), $l(y)=\log\left(\frac{W_{Y|U}(y|0)}{W_{Y|U}(y|1)}\right)$. When $l(y)$ is positive, the decoder declares that $0$ was transmitted; when $l(y)$ is negative, the decoder declares that $1$ was transmitted; $l(y)=0$ constitutes an erasure, at which the decoder makes some random choice. Another sufficient statistic is the $D$-value.

The $D$ -value of output $y$ , $d(y)$ , is given by

[TABLE]

Clearly, $-1\leq d(y)\leq 1$. A maximum likelihood decoder makes its decision based on the sign of the $D$-value. Assuming an equiprobable channel input, i.e., $U\in\{0,1\}$ with probability $1/2$ each, using Bayes’ law on (73) yields

[TABLE]

The input is binary, hence $W_{U|Y}(0|y)+W_{U|Y}(1|y)=1$ . Consequently (74) yields

[TABLE]

There is a one-to-one correspondence between $d(y)$ and $l(y)$ , $l(y)=\log\frac{1+d(y)}{1-d(y)},$ or, equivalently, $d(y)=\tanh(l(y)/2).$

If channel $W_{Y|U}$ is symmetric, for each output $y$ there is a conjugate output $\bar{y}$; their LLRs and $D$-values are related by $l(\bar{y})=-l(y)$ and $d(\bar{y})=-d(y)$ (equivalently, the likelihood ratio of $\bar{y}$ is the reciprocal of that of $y$).

Since the $D$ -value is a sufficient statistic of a BMS channel, we may replace the channel output with its $D$ -value. Thus, we may assume that the output $y$ of channel $W_{Y|U}$ is a $D$ -value, i.e., $y=W_{U|Y}(0|y)-W_{U|Y}(1|y)$ . In this case, we say that $W$ is in $D$ -value representation.

Recall that every BMS channel can be decomposed into BSCs [19, Theorem 2.1]. We can think of the output of a BMS as consisting of the “reliability” of the BSC and its output. The absolute value of the $D$-value corresponds to the BSC’s reliability and its sign to the BSC output ($0$ or $1$).
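These relations are summarized in the short Python sketch below. It assumes that (73) is the usual definition $d(y)=\frac{W_{Y|U}(y|0)-W_{Y|U}(y|1)}{W_{Y|U}(y|0)+W_{Y|U}(y|1)}$, which is consistent with the stated correspondence $l(y)=\log\frac{1+d(y)}{1-d(y)}$; the function names are hypothetical.

```python
import math


def d_value(w0, w1):
    """D-value of an output y with W(y|0) = w0 and W(y|1) = w1."""
    return (w0 - w1) / (w0 + w1)


def llr_from_d(d):
    """l(y) = log((1 + d) / (1 - d)), for |d| < 1."""
    return math.log((1 + d) / (1 - d))


def d_from_llr(l):
    """d(y) = tanh(l(y) / 2)."""
    return math.tanh(l / 2)


def bsc_view(d):
    """Split a D-value into (crossover probability, BSC output):
    |d| fixes the BSC, with crossover (1 - |d|) / 2, and the sign of d gives
    its output; d == 0 is an erasure, reported here as output None."""
    output = None if d == 0 else (0 if d > 0 else 1)
    return (1 - abs(d)) / 2, output
```

For a BSC with crossover probability $0.1$, the output $0$ has $d=0.8$ and $l=\log 9$, and its conjugate has $d=-0.8$ and $l=-\log 9$.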

A comprehensive treatment of $D$ -values and LLRs in relation to BMS channels appears in [13, Chapter 4].

Appendix C Upgrades of a BMS Channel

We state here in our notation the two upgrades of a BMS channel from [9].

Let $W$ be a discrete BMS whose outputs are $D$-values $\pm d_{1},\pm d_{2},\ldots,\pm d_{m}$, and let the probability of symbol $d_{\ell}$ be ${\pi}^{d_{\ell}}\triangleq W(d_{\ell}|u)+W(-d_{\ell}|u)=W(d_{\ell}|0)+W(d_{\ell}|1)$, $\ell=1,\ldots,m$. Without loss of generality, $0\leq d_{1}\leq d_{2}\leq\cdots\leq d_{m}\leq 1$. Clearly, ${\pi}^{d_{\ell}}\geq 0$ for all $\ell$, and $\sum_{\ell=1}^{m}{\pi}^{d_{\ell}}=1$. Moreover, ${\pi}^{d_{\ell}}={\pi}^{-d_{\ell}}$. Namely, this is a BMS that decomposes into $m$ different BSCs, with crossover probabilities $(1-d_{\ell})/2$, $\ell=1,\ldots,m$. BSC channel $\ell$ is selected with probability ${\pi}^{d_{\ell}}$. We have $W(d_{\ell}|u)=({\pi}^{d_{\ell}}/2)\cdot(1+(-1)^{u}d_{\ell})$ and $W(-d_{\ell}|u)=W(d_{\ell}|\bar{u})$.

C-A The Upgrade-merge-2 Procedure

The first upgrade-merge of [9] takes two $D$ -values $d_{j}\leq d_{k}$ and merges them by transferring the probability of $d_{j}$ to $d_{k}$ . We call it upgrade-merge-2. Channel $W:\mathcal{U}\to\mathcal{Y}$ is upgraded to channel $Q^{(2)}:\mathcal{U}\to\mathcal{Z}$ ; the output alphabet of $Q^{(2)}$ is $\mathcal{Z}=(\mathcal{Y}\setminus\{d_{j},d_{k},-d_{j},-d_{k}\})\cup\{z_{k},-z_{k}\},$ and

[TABLE]

where

[TABLE]

The degrading channel from $Q^{(2)}$ to $W$ is shown in Figure 6a. Only the portion of interest is shown; symbols that this degrading channel leaves unchanged are omitted. The parameters of the degrading channel are

[TABLE]

Indeed, $p_{1},p_{2},p_{3}\geq 0$ and $p_{1}+p_{2}+p_{3}=1$, so this constitutes a valid channel. Note that if $d_{j}=d_{k}$, then $p_{3}=0$.
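To illustrate the procedure, the Python sketch below performs upgrade-merge-2 on a channel given by its $D$-values and probabilities ${\pi}^{d_{\ell}}$, passes $Q^{(2)}$ through a degrading channel, and checks that $W$ is recovered. Because the defining tables are not reproduced above, the formulas are an assumed reconstruction rather than a quotation of [9]: $z_k$ keeps the $D$-value $d_k$ and carries ${\pi}^{z_k}={\pi}^{d_j}+{\pi}^{d_k}$, and $z_k$ is mapped to $d_k$, $d_j$, $-d_j$ with probabilities $p_1={\pi}^{d_k}/{\pi}^{z_k}$, $p_2=\frac{{\pi}^{d_j}}{2{\pi}^{z_k}}(1+d_j/d_k)$, $p_3=\frac{{\pi}^{d_j}}{2{\pi}^{z_k}}(1-d_j/d_k)$, with $-z_k$ handled symmetrically. These values are nonnegative, sum to one, give $p_3=0$ when $d_j=d_k$, and make the composition reproduce $W$ exactly. The function names are hypothetical, and because the dictionary-based channel needs distinct positive $D$-values, the sketch only covers $0<d_j<d_k$.

```python
def channel(ds, pis):
    """W(y|u) = (pi/2)(1 + (-1)^u y) for y = +/-d; returns {y: (W(y|0), W(y|1))}."""
    W = {}
    for d, pi in zip(ds, pis):
        for y in (d, -d):
            W[y] = (pi / 2 * (1 + y), pi / 2 * (1 - y))
    return W


def upgrade_merge_2(ds, pis, j, k):
    """Move the probability of d_j (and -d_j) onto d_k (and -d_k).

    Assumes distinct positive D-values with d_j < d_k. Returns the D-values and
    probabilities of Q^(2) together with the assumed degrading parameters
    (p1, p2, p3) for z_k -> (d_k, d_j, -d_j).
    """
    dj, dk, pj, pk = ds[j], ds[k], pis[j], pis[k]
    assert 0 < dj < dk <= 1
    pz = pj + pk  # pi^{z_k}
    p1 = pk / pz
    p2 = pj / (2 * pz) * (1 + dj / dk)
    p3 = pj / (2 * pz) * (1 - dj / dk)
    new_ds = [d for i, d in enumerate(ds) if i != j]
    new_pis = [pz if i == k else p for i, p in enumerate(pis) if i != j]
    return new_ds, new_pis, (p1, p2, p3)


def degrade_merge_2(Q, dj, dk, params):
    """Apply the degrading channel: z_k -> d_k, d_j, -d_j w.p. p1, p2, p3 and,
    symmetrically, -z_k -> -d_k, -d_j, d_j. Other outputs pass through."""
    p1, p2, p3 = params
    W = {}
    for y, (q0, q1) in Q.items():
        if abs(y) != dk:
            W[y] = (q0, q1)  # symbol untouched by the merge
            continue
        s = 1 if y > 0 else -1
        for target, p in ((s * dk, p1), (s * dj, p2), (-s * dj, p3)):
            w0, w1 = W.get(target, (0.0, 0.0))
            W[target] = (w0 + p * q0, w1 + p * q1)
    return W


ds, pis = [0.2, 0.5, 0.9], [0.3, 0.3, 0.4]
new_ds, new_pis, params = upgrade_merge_2(ds, pis, j=0, k=1)
W_back = degrade_merge_2(channel(new_ds, new_pis), ds[0], ds[1], params)
W = channel(ds, pis)
print(max(abs(a - b) for y in W for a, b in zip(W[y], W_back[y])) < 1e-12)  # True
```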

C-B The Upgrade-merge-3 Procedure

The second upgrade-merge of [9] removes a $D$-value $d_{j}$ by splitting its probability between a preceding $D$-value $d_{i}\leq d_{j}$ and a succeeding $D$-value $d_{k}\geq d_{j}$. We call it upgrade-merge-3. Unlike in upgrade-merge-2, at least one of these inequalities must be strict (i.e., either $d_{i}<d_{j}$ or $d_{j}<d_{k}$; equivalently, $d_{i}<d_{k}$). Channel $W:\mathcal{U}\to\mathcal{Y}$ is upgraded to channel $Q^{(3)}:\mathcal{U}\to\mathcal{Z}$ with output alphabet $\mathcal{Z}=(\mathcal{Y}\setminus\{d_{i},d_{j},d_{k},-d_{i},-d_{j},-d_{k}\})\cup\{z_{i},z_{k},-z_{i},-z_{k}\},$ and

[TABLE]

where

[TABLE]

Note that

[TABLE]

The degrading channel from $Q^{(3)}$ to $W$ is shown in Figure 6b; as in Figure 6a, only the portion of interest is shown. Its parameters are $p_{\ell}={\pi}^{d_{\ell}}/{\pi}^{z_{\ell}}$ and $q_{\ell}=1-p_{\ell}$, $\ell=i,k$. This is a valid channel since ${\pi}^{z_{\ell}}\geq{\pi}^{d_{\ell}}$, so that $p_{\ell},q_{\ell}\in[0,1]$.
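A similar Python sketch for upgrade-merge-3 follows. Here we assume, again as a reconstruction rather than a quotation of [9], that $z_i$ and $z_k$ keep the $D$-values $d_i$ and $d_k$, and that ${\pi}^{d_j}$ is split as ${\pi}^{z_i}={\pi}^{d_i}+{\pi}^{d_j}\frac{d_k-d_j}{d_k-d_i}$ and ${\pi}^{z_k}={\pi}^{d_k}+{\pi}^{d_j}\frac{d_j-d_i}{d_k-d_i}$. This split preserves the average $D$-value, and its denominators are well defined exactly when $d_i<d_k$. Reading Figure 6b as sending $z_\ell$ to $d_\ell$ with probability $p_\ell$ and to $d_j$ with probability $q_\ell$ (with $-z_\ell$ handled symmetrically), the code checks numerically that the parameters $p_\ell={\pi}^{d_\ell}/{\pi}^{z_\ell}$ map $Q^{(3)}$ back to $W$. Function names are hypothetical, and the sketch assumes $0<d_i<d_j<d_k$.

```python
def channel(ds, pis):
    """W(y|u) = (pi/2)(1 + (-1)^u y) for y = +/-d; returns {y: (W(y|0), W(y|1))}."""
    W = {}
    for d, pi in zip(ds, pis):
        for y in (d, -d):
            W[y] = (pi / 2 * (1 + y), pi / 2 * (1 - y))
    return W


def upgrade_merge_3(ds, pis, i, j, k):
    """Remove d_j by splitting pi^{d_j} between d_i and d_k (assumed split).

    d_i receives the fraction (d_k - d_j) / (d_k - d_i) and d_k the fraction
    (d_j - d_i) / (d_k - d_i), which keeps the average D-value unchanged.
    Returns Q^(3) as (D-values, probabilities) and the parameters (p_i, p_k).
    """
    di, dj, dk = ds[i], ds[j], ds[k]
    assert 0 < di < dj < dk <= 1
    q = list(pis)
    q[i] += pis[j] * (dk - dj) / (dk - di)   # pi^{z_i}
    q[k] += pis[j] * (dj - di) / (dk - di)   # pi^{z_k}
    params = (pis[i] / q[i], pis[k] / q[k])  # p_l = pi^{d_l} / pi^{z_l}
    new_ds = [d for n, d in enumerate(ds) if n != j]
    new_pis = [p for n, p in enumerate(q) if n != j]
    return new_ds, new_pis, params


def degrade_merge_3(Q, di, dj, dk, params):
    """z_l -> d_l w.p. p_l and z_l -> d_j w.p. q_l = 1 - p_l (l = i, k);
    negative outputs are handled symmetrically. Other outputs pass through."""
    p_of = {di: params[0], dk: params[1]}
    W = {}
    for y, (q0, q1) in Q.items():
        if abs(y) not in p_of:
            W[y] = (q0, q1)  # symbol untouched by the merge
            continue
        p = p_of[abs(y)]
        s = 1 if y > 0 else -1
        for target, w in ((y, p), (s * dj, 1 - p)):
            a0, a1 = W.get(target, (0.0, 0.0))
            W[target] = (a0 + w * q0, a1 + w * q1)
    return W


ds, pis = [0.2, 0.5, 0.9], [0.3, 0.3, 0.4]
new_ds, new_pis, params = upgrade_merge_3(ds, pis, i=0, j=1, k=2)
W_back = degrade_merge_3(channel(new_ds, new_pis), ds[0], ds[1], ds[2], params)
W = channel(ds, pis)
print(max(abs(a - b) for y in W for a, b in zip(W[y], W_back[y])) < 1e-12)  # True
```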

It can be shown [9, Lemma 12] that $Q^{(2)}\succcurlyeq Q^{(3)}\succcurlyeq W$ . That is, upgrade-merge-3 yields a better (closer) upgraded approximation of $W$ than does upgrade-merge-2.
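Since an upgraded channel can be degraded to the original, the ordering $Q^{(2)}\succcurlyeq Q^{(3)}\succcurlyeq W$ implies that any figure of merit monotone under degradation is ordered accordingly. The Python sketch below checks this on one example for two such quantities in $D$-value representation: the ML bit-error probability $P_e=\sum_{\ell}{\pi}^{d_{\ell}}(1-d_{\ell})/2$ and the capacity $C=\sum_{\ell}{\pi}^{d_{\ell}}\bigl(1-h_2((1-d_{\ell})/2)\bigr)$, where $h_2$ is the binary entropy function. The merge helpers encode our assumed reading of the two procedures (moving ${\pi}^{d_j}$ onto $d_k$; splitting it between $d_i$ and $d_k$ so that the average $D$-value is preserved), not formulas quoted from [9], and a single numerical example is of course not a proof. Because $P_e$ is affine in the $D$-values, the average-preserving split leaves it unchanged, so in this example only the capacity separates $Q^{(3)}$ from $W$.

```python
import math


def h2(p):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if p <= 0 or p >= 1:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def error_prob(ds, pis):
    """ML bit-error probability: BSC component l errs w.p. (1 - d_l) / 2."""
    return sum(pi * (1 - d) / 2 for d, pi in zip(ds, pis))


def capacity(ds, pis):
    """Capacity in bits: sum of pi_l * (1 - h2((1 - d_l) / 2)) over components."""
    return sum(pi * (1 - h2((1 - d) / 2)) for d, pi in zip(ds, pis))


def merge2(ds, pis, j, k):
    """Upgrade-merge-2 (assumed form): move pi^{d_j} onto d_k."""
    q = list(pis)
    q[k] += q[j]
    return ([d for n, d in enumerate(ds) if n != j],
            [p for n, p in enumerate(q) if n != j])


def merge3(ds, pis, i, j, k):
    """Upgrade-merge-3 (assumed form): split pi^{d_j} between d_i and d_k
    so that the average D-value is preserved."""
    di, dj, dk = ds[i], ds[j], ds[k]
    q = list(pis)
    q[i] += pis[j] * (dk - dj) / (dk - di)
    q[k] += pis[j] * (dj - di) / (dk - di)
    return ([d for n, d in enumerate(ds) if n != j],
            [p for n, p in enumerate(q) if n != j])


ds, pis = [0.2, 0.5, 0.9], [0.3, 0.3, 0.4]
channels = {"W": (ds, pis),
            "Q3": merge3(ds, pis, 0, 1, 2),   # split d_j = 0.5 between 0.2 and 0.9
            "Q2": merge2(ds, pis, 1, 2)}      # move d_j = 0.5 onto 0.9
for name, (d, p) in channels.items():
    print(f"{name}: Pe = {error_prob(d, p):.4f}, C = {capacity(d, p):.4f}")
```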

Bibliography

[1] E. Arıkan, “Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.
[2] E. Arıkan, “Source polarization,” in 2010 IEEE International Symposium on Information Theory, June 2010, pp. 899–903.
[3] S. B. Korada, “Polar codes for channel and source coding,” Ph.D. dissertation, École Polytechnique Fédérale de Lausanne, 2009.
[4] E. Şaşoğlu, “Polarization and polar codes,” Foundations and Trends® in Communications and Information Theory, vol. 8, no. 4, pp. 259–381, 2011.
[5] J. Honda and H. Yamamoto, “Polar coding without alphabet extension for asymmetric models,” IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 7829–7838, December 2013.
[6] E. Şaşoğlu, “Polarization in the presence of memory,” in 2011 IEEE International Symposium on Information Theory Proceedings, July 2011, pp. 189–193.
[7] E. Şaşoğlu and I. Tal, “Polar coding for processes with memory,” in 2016 IEEE International Symposium on Information Theory (ISIT). IEEE, 2016, pp. 225–229.
[8] B. Shuval and I. Tal, “Fast polarization for processes with memory,” 2017. [Online]. Available: arXiv:1710.02849

