A Lower Bound on the Probability of Error of Polar Codes over BMS Channels
Boaz Shuval, Ido Tal

Department of Electrical Engineering,
Technion, Haifa 32000, Israel.
Email: {bshuval@campus, idotal@ee}.technion.ac.il
An abbreviated version of this article, with the proofs omitted, has appeared in ISIT 2017.
Abstract
Polar codes are a family of capacity-achieving codes that have explicit and low-complexity construction, encoding, and decoding algorithms. Decoding of polar codes is based on the successive-cancellation decoder, which decodes in a bit-wise manner. A decoding error occurs when at least one bit is erroneously decoded. The various codeword bits are correlated, yet performance analysis of polar codes ignores this dependence: the upper bound is based on the union bound, and the lower bound is based on the worst-performing bit. Improvement of the lower bound is afforded by considering error probabilities of two bits simultaneously. These are difficult to compute explicitly due to the large alphabet size inherent to polar codes. In this research we propose a method to lower-bound the error probabilities of bit pairs. We develop several transformations on pairs of synthetic channels that make the resultant synthetic channels amenable to alphabet reduction. Our method yields lower bounds that significantly improve upon currently known lower bounds for polar codes under successive-cancellation decoding.
Index Terms:
Channel polarization, channel upgrading, lower bounds, polar codes, probability of error.
I Introduction
Polar codes [1] are a family of codes that achieve capacity on binary, memoryless, symmetric (BMS) channels and have low-complexity construction, encoding, and decoding algorithms. This is the setting we consider. Polar codes have since been extended to a variety of settings including source-coding [2, 3], non-binary channels [4], asymmetric channels [5], settings with memory [6, 7, 8], and more. The probability of error of polar codes is given by a union of correlated error events. The union bound, which ignores this correlation, is used to upper-bound the error probability. In this work, we exploit the correlation between error events to develop a general method for lower-bounding the probability of error of polar codes.
Figure 1 shows a numerical example of the lower bound developed in this paper. We designed a polar code of length and rate for a Binary Symmetric Channel (BSC) with crossover probability . We plot upper and lower bounds on the probability of error of this code under successive cancellation decoding, when used over BSCs of varying crossover probabilities. Our lower bound significantly improves upon the existing (trivial) lower bound, and is tight over a large range of crossover probabilities.
Our method is based on lower-bounding the probability of correlated error events. It consists of several operations and transformations that we detail throughout this article. A high-level description of the key steps appears at the end of the introduction, once we establish some notation.
Polar codes are based on an iterative construction that transforms $N = 2^n$ identical and independent channel uses into low-entropy and high-entropy channels. The low-entropy channels are almost noiseless, whereas the high-entropy channels are almost pure noise. Arıkan showed [1] that for every $0 < \epsilon < 1$, as $n \to \infty$ the proportion of channels with capacity greater than $1-\epsilon$ tends to the channel capacity $C$ and the proportion of channels with capacity less than $\epsilon$ tends to $1-C$.
The polar construction begins with two identical and independent copies of a BMS channel $W$ and transforms them into two new channels,
$$W^-(y_1, y_2 | u_1) = \frac{1}{2} \sum_{u_2} W(y_1 | u_1 \oplus u_2)\, W(y_2 | u_2), \qquad W^+(y_1, y_2, u_1 | u_2) = \frac{1}{2}\, W(y_1 | u_1 \oplus u_2)\, W(y_2 | u_2).$$
Channel $W^+$ is a better channel than $W$, whereas channel $W^-$ is worse than $W$. (By this we mean that channel $W^+$ can be stochastically degraded to channel $W$, which in turn can be stochastically degraded to $W^-$.) This construction can be repeated multiple times; each time we take two identical copies of a channel, say $W^+$ and $W^+$, and polarize them, e.g., to $W^{+-}$ and $W^{++}$. We call the operation $W \mapsto W^-$ a '$-$'-transform, and the operation $W \mapsto W^+$ a '$+$'-transform.
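As an illustration of a single polarization step, the following minimal sketch builds the two new channels from a transition matrix, assuming the standard Arıkan transform $W^-(y_1,y_2|u_1) = \frac12\sum_{u_2} W(y_1|u_1\oplus u_2)W(y_2|u_2)$ and $W^+(y_1,y_2,u_1|u_2) = \frac12 W(y_1|u_1\oplus u_2)W(y_2|u_2)$. The function names (`bsc`, `polarize`, `bhattacharyya`) and the array layout are illustrative choices, not taken from the paper. The Bhattacharyya parameter is used here only as a convenient noisiness measure.

```python
import numpy as np

def bsc(p):
    """Transition matrix of a BSC: W[u, y] = W(y|u)."""
    return np.array([[1.0 - p, p], [p, 1.0 - p]])

def polarize(W):
    """One polarization step for a binary-input channel W[u, y].

    Returns (W_minus, W_plus), each indexed as [input bit, flattened output].
    """
    q = W.shape[1]
    # joint[u1, u2, y1, y2] = W(y1 | u1 xor u2) * W(y2 | u2)
    joint = np.empty((2, 2, q, q))
    for u1 in (0, 1):
        for u2 in (0, 1):
            joint[u1, u2] = np.outer(W[u1 ^ u2], W[u2])
    # W^-: input u1, output (y1, y2); the unknown u2 is averaged out.
    W_minus = 0.5 * joint.sum(axis=1).reshape(2, q * q)
    # W^+: input u2, output (u1, y1, y2); the factor 1/2 is P(u1).
    W_plus = 0.5 * joint.transpose(1, 0, 2, 3).reshape(2, 2 * q * q)
    return W_minus, W_plus

def bhattacharyya(W):
    """Z(W) = sum_y sqrt(W(y|0) W(y|1)); smaller means less noisy."""
    return float(np.sqrt(W[0] * W[1]).sum())

W = bsc(0.1)
W_minus, W_plus = polarize(W)
z = bhattacharyya(W)
z_minus = bhattacharyya(W_minus)
z_plus = bhattacharyya(W_plus)
```

For this example the three parameters are ordered as `z_plus < z < z_minus`, matching the statement that $W^+$ is better and $W^-$ is worse than $W$. Note how the output alphabet already grows from 2 symbols to 4 and 8 after one step.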
There are $N = 2^n$ possible combinations of $n$ '$-$'- and '$+$'-transforms; we define channel $W_i$ as follows. Let $(b_1, b_2, \ldots, b_n)$ be the binary expansion of $i-1$, where $b_1$ is the most significant bit (MSB). Then, channel $W_i$ is obtained by $n$ transforms of $W$ according to the sequence $b_1, b_2, \ldots, b_n$, starting with the MSB: if $b_j = 0$ we do a '$-$'-transform and if $b_j = 1$ we do a '$+$'-transform. For example, if $n = 3$, channel $W_5$ is $((W^+)^-)^-$, i.e., it first undergoes a '$+$'-transform and then two '$-$'-transforms.
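The indexing rule can be sketched in a few lines. This assumes 1-based channel indices, the binary expansion of $i-1$ read MSB first, and the mapping bit 0 to '$-$' and bit 1 to '$+$'. The helper name `transform_sequence` is hypothetical.

```python
def transform_sequence(i, n):
    """Return the '-'/'+' transform sequence of synthetic channel W_i.

    Assumes 1 <= i <= 2**n and n >= 1; the first character is the
    transform applied first (the MSB of i - 1).
    """
    if n < 1 or not 1 <= i <= 2 ** n:
        raise ValueError("need n >= 1 and 1 <= i <= 2**n")
    bits = format(i - 1, "0{}b".format(n))  # n-bit binary expansion, MSB first
    return "".join("+" if b == "1" else "-" for b in bits)

sequences = [transform_sequence(i, 3) for i in range(1, 9)]
```

Under these assumptions, `transform_sequence(5, 3)` returns `"+--"`: one '$+$'-transform followed by two '$-$'-transforms.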
Overall, we obtain $N$ channels $W_1, W_2, \ldots, W_N$; channel $W_i$ has input $u_i$ and output $(y_1^N, u_1^{i-1})$. That is, channel $W_i$ has binary input $u_i$, output that consists of the output and input of channel $W_{i-1}$, and assumes that the input bits of future channels $u_{i+1}^N$ are uniform. We call these synthetic channels. One then determines which synthetic channels are low-entropy and which are high-entropy, and transmits information over the low-entropy synthetic channels and predetermined values over the high-entropy synthetic channels. Since the values transmitted over the latter are predetermined, we call the high-entropy synthetic channels frozen.
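Decoding is accomplished via the successive-cancellation (SC) decoder. It decodes the synthetic channels in succession, using previous bit decisions as part of the output. The bit decision for a synthetic channel is either based on its likelihood or, if it is frozen, on its predetermined value. That is, denoting the set of non-frozen synthetic channels by $\mathcal{A}$,
$$\hat{U}_i(y_1^N, \hat{u}_1^{i-1}) = \begin{cases} \arg\max_{u_i} W_i(y_1^N, \hat{u}_1^{i-1} | u_i), & i \in \mathcal{A}, \\ u_i, & i \in \mathcal{A}^c, \end{cases}$$
where we denoted $y_1^N = (y_1, \ldots, y_N)$ and similarly for the previous bit decisions $\hat{u}_1^{i-1}$. As non-frozen synthetic channels are almost noiseless, previous bit decisions are assumed to be correct. Thus, when $N$ is sufficiently large, this scheme can be shown to achieve capacity [1], as the proportion of almost noiseless channels tends to the channel capacity.
To analyze the performance of polar codes, let $\mathcal{B}_i$ denote the event that channel $W_i$ errs under SC decoding while channels $W_1, \ldots, W_{i-1}$ do not. That is,
$$\mathcal{B}_i = \left\{ (u_1^N, y_1^N) : \hat{u}_1^{i-1} = u_1^{i-1},\ \hat{U}_i(y_1^N, \hat{u}_1^{i-1}) \neq u_i \right\}.$$
The probability of error of polar codes under SC decoding is given by $\mathbb{P}\{\bigcup_{i \in \mathcal{A}} \mathcal{B}_i\}$. Let $\mathcal{E}_i$ denote the event that channel $W_i$ errs given that a genie had revealed to it the true previous bits, i.e.
$$\mathcal{E}_i = \left\{ (u_1^N, y_1^N) : \hat{U}_i(y_1^N, u_1^{i-1}) \neq u_i \right\}.$$
We call an SC decoder with access to genie-provided previous bits a genie-aided decoder. Some thought reveals that $\bigcup_{i \in \mathcal{A}} \mathcal{B}_i = \bigcup_{i \in \mathcal{A}} \mathcal{E}_i$ (see [4, Proposition 2.1] or [10, Lemma 1]). Thus, the probability of error of polar codes under SC decoding is equivalently given by $P_e^{\mathrm{SC}}(W) = \mathbb{P}\{\bigcup_{i \in \mathcal{A}} \mathcal{E}_i\}$. In the sequel we assume a genie-aided decoder.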
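The events $\mathcal{B}_i$ (errors under SC decoding) are disjoint but difficult to analyze. The events $\mathcal{E}_i$ (errors under genie-aided decoding) are easier to analyze, but are no longer disjoint. A straightforward upper bound for $P_e^{\mathrm{SC}}(W)$, the probability of error under SC decoding, is the union bound:
$$P_e^{\mathrm{SC}}(W) = \mathbb{P}\Big\{\bigcup_{i \in \mathcal{A}} \mathcal{E}_i\Big\} \leq \sum_{i \in \mathcal{A}} \mathbb{P}\{\mathcal{E}_i\}.$$
This bound facilitated the analysis of [1]. An important question is how tight this upper bound is. To this end, one approach is to develop a lower bound to $P_e^{\mathrm{SC}}(W)$, which is what we pursue in this work.
A trivial lower bound on a union is
$$\mathbb{P}\Big\{\bigcup_{i \in \mathcal{A}} \mathcal{E}_i\Big\} \geq \max_{i \in \mathcal{A}} \mathbb{P}\{\mathcal{E}_i\}.$$
Better lower bounds may be obtained by considering pairs of error events:
$$\mathbb{P}\Big\{\bigcup_{i \in \mathcal{A}} \mathcal{E}_i\Big\} \geq \max_{i, j \in \mathcal{A}} \mathbb{P}\{\mathcal{E}_i \cup \mathcal{E}_j\}.$$
Via the inclusion-exclusion principle, one can combine lower bounds on multiple pairs of error events to obtain a better lower bound [11]
$$\mathbb{P}\Big\{\bigcup_{i \in \mathcal{A}} \mathcal{E}_i\Big\} \geq \sum_{i \in \mathcal{A}} \mathbb{P}\{\mathcal{E}_i\} - \sum_{\substack{i, j \in \mathcal{A} \\ i < j}} \mathbb{P}\{\mathcal{E}_i \cap \mathcal{E}_j\}. \tag{4}$$
This can also be cast in terms of unions of error events using $\mathbb{P}\{\mathcal{E}_i \cap \mathcal{E}_j\} = \mathbb{P}\{\mathcal{E}_i\} + \mathbb{P}\{\mathcal{E}_j\} - \mathbb{P}\{\mathcal{E}_i \cup \mathcal{E}_j\}$.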
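To make the relationship between these bounds concrete, the following sketch evaluates them exactly on a small finite probability space. The toy events and the helper name `union_bounds` are made up for illustration and have nothing to do with actual polar-code error events. The bounds computed are the union bound, the worst single event, the best pair, and the pairwise inclusion-exclusion bound.

```python
from fractions import Fraction
from itertools import combinations

def union_bounds(prob, events):
    """Exact union probability and four bounds on it.

    prob: dict mapping outcome -> probability (Fractions).
    events: list of sets of outcomes (at least two events).
    """
    def p(outcomes):
        return sum((prob[w] for w in outcomes), Fraction(0))

    singles = [p(e) for e in events]
    pairs = list(combinations(events, 2))
    total = sum(singles, Fraction(0))
    overlaps = sum((p(e & f) for e, f in pairs), Fraction(0))
    return {
        "exact": p(set().union(*events)),
        "union_bound": total,                      # upper bound
        "worst_single": max(singles),              # trivial lower bound
        "best_pair": max(p(e | f) for e, f in pairs),
        "inclusion_exclusion": total - overlaps,   # pairwise lower bound
    }

# Toy example: six equiprobable outcomes and three overlapping events.
prob = {w: Fraction(1, 6) for w in range(6)}
events = [{0, 1}, {1, 2}, {3}]
bounds = union_bounds(prob, events)
```

In this toy example the lower bounds are ordered `worst_single <= best_pair <= inclusion_exclusion <= exact <= union_bound`. The ordering between the pair bound and the inclusion-exclusion bound is specific to the example and does not hold in general.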
To our knowledge, to date there have been two attempts to compute a lower bound on the performance of the SC decoder, both based on (4). The first attempt was in [10], using a density evolution approach, and the second attempt in [12] applies only to the Binary Erasure Channel (BEC). We briefly introduce these below, but first we explain where the difficulty lies.
The probability $\mathbb{P}\{\mathcal{E}_i\}$ is given by an appropriate functional of the probability distribution of synthetic channel $W_i$. However, the output alphabet of $W_i$ is very large. If the output alphabet of $W$ is $\mathcal{Y}$ then the output alphabet of $W_i$ has size $|\mathcal{Y}|^N 2^{i-1}$. This quickly grows unwieldy, recalling that $N = 2^n$. It is infeasible to store this probability distribution and it must be approximated. Such approximations are the subject of [9]; they enable one to compute upper and lower bounds on various functionals of the synthetic channel $W_i$.
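A quick calculation shows how fast the alphabet grows. The sketch assumes that the output of synthetic channel $W_i$ is the pair $(y_1^N, u_1^{i-1})$ with $N = 2^n$, so that a base channel with $q$ output symbols yields $q^N 2^{i-1}$ output symbols. The function name is hypothetical.

```python
def output_alphabet_size(q, n, i):
    """Output alphabet size of synthetic channel W_i after n polarization steps.

    q is the output alphabet size of the base channel; the output of W_i is
    assumed to be (y_1^N, u_1^{i-1}) with N = 2**n.
    """
    N = 2 ** n
    if not 1 <= i <= N:
        raise ValueError("index out of range")
    return q ** N * 2 ** (i - 1)

small = output_alphabet_size(2, 3, 5)             # n = 3, i = 5
huge_bits = output_alphabet_size(2, 10, 1024).bit_length()
```

Even for a binary-output base channel, the last synthetic channel at $n = 10$ has an output alphabet whose size needs more than two thousand bits to write down, and a joint channel squares that order of magnitude.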
To compute probabilities of unions of events, one must know the joint distribution of two synthetic channels. The size of the joint channel’s output alphabet is the product of each synthetic channel’s alphabet size, rendering the joint distribution infeasible to store.
The authors of [10] suggested to approximate the joint distribution of pairs of synthetic channels using a density evolution approach. This provides an iterative method to compute the joint channel, but does not address the problem of the amount of memory required to store it. Practical implementation of density evolution must involve quantization [13, Appendix B]. The probability of error derived from quantized joint channels approximates, but does not generally bound, the real probability of error. For the special case of the BEC, as noted and analyzed in [10], no quantization is needed, as the polar transform of a BEC is a BEC. Thus, they were able to precisely compute the probabilities of unions of error events of descendants of a BEC using density evolution.
The same bounds for the BEC were developed in [12] using a different approach, again relying on the property that the polar transform of a BEC is a BEC. The authors were able to track the joint probability of erasure during the polarization process. Furthermore, they were able to show that the union bound is asymptotically tight for the BEC.
In this work, we develop an algorithm to compute lower bounds on the joint probability of error of two synthetic channels, i.e., the probability $\mathbb{P}\{\mathcal{E}_i \cup \mathcal{E}_j\}$ that at least one of them errs. Our technique is general, and applies to synthetic channels that are polar descendants of any BMS channel. We use these bounds in (4) to lower-bound the probability of error of polar codes. For the special case of the BEC, we recover the results of [10] and [12] using our bounds.
Concretely, consider two synthetic channels, $W_a(y_a|u_a)$ and $W_b(y_b|u_b)$, which we call the a-channel and the b-channel, respectively. Their joint channel is $W_{a,b}(y_a, y_b | u_a, u_b)$. Our algorithm lower-bounds the probability that a successive-cancellation decoder errs on either channel. It is based on the following key steps:
1. Replace successive cancellation with a different decoding criterion (Section III).
2. Bring the joint channel to a different form that makes the b-channel decoding immediately apparent from the received symbol (Section IV-A).
3. Apply the symmetrizing transform, after which the output of the a-channel is independent of the input of the b-channel (Section V).
4. Apply the upgrade-couple transform, which splits each a-channel output to multiple symbols. However, each such new symbol is constrained to appear with only a small subset of b-channel outputs (Section VI-A).
5. Reduce each channel’s alphabet size. This is done by stochastically upgrading one channel while keeping the other channel constant. Each channel has a different upgrading procedure; the a-channel upgrading procedure is detailed in Section VI-A, and the b-channel upgrading procedure is detailed in Section VI-B.
II Overview of Our Method
In this section we provide a brief overview of our method, and lay out the groundwork for the sections that follow. We aim to produce a lower bound on the probability of error of two synthetic channels. Since we cannot know the precise joint distribution, we must approximate it. The approximation is rooted in stochastic degradation.
Degradation is a partial ordering of channels. Let $W(y|u)$ and $Q(z|u)$ be two channels. We say that $Q$ is (stochastically) degraded with respect to $W$, denoted $Q \preccurlyeq W$, when there exists some channel $P(z|y)$ such that
$$Q(z|u) = \sum_{y} P(z|y)\, W(y|u).$$
If $Q$ is degraded with respect to $W$ then $W$ is upgraded with respect to $Q$. Degradation implies an ordering on the probability of error of the channels [13, Chapter 4]: if $Q \preccurlyeq W$ then $P_e^{*}(Q) \geq P_e^{*}(W)$, where $P_e^{*}$ denotes the probability of error of the optimal decoder (defined in Section III-A).
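The ordering of error probabilities under degradation can be checked numerically on a small case. The sketch below represents a binary-input channel by a $2 \times |\mathcal{Y}|$ transition matrix, so that degrading by a channel $P(z|y)$ is a matrix product. The optimal (ML) error for equiprobable inputs is $\frac12 \sum_y \min\{W(y|0), W(y|1)\}$. The function names and the two BSCs are illustrative choices.

```python
import numpy as np

def ml_error(W):
    """ML error probability of W[u, y] = W(y|u) with equiprobable inputs."""
    return float(0.5 * np.minimum(W[0], W[1]).sum())

def degrade(W, P):
    """Q(z|u) = sum_y W(y|u) P(z|y); P[y, z] must be row-stochastic."""
    return W @ P

W = np.array([[0.9, 0.1], [0.1, 0.9]])   # BSC with crossover 0.1
P = np.array([[0.8, 0.2], [0.2, 0.8]])   # degrading channel: BSC with crossover 0.2
Q = degrade(W, P)                        # a BSC with crossover 0.26
```

Here `ml_error(Q)` is no smaller than `ml_error(W)`, as the ordering predicts. As the text goes on to explain, no such guarantee exists for suboptimal decoders of joint channels.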
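The notion of degradation readily applies to joint channels. If $W_{a,b}(y_a, y_b | u_a, u_b)$ and $Q_{a,b}(z_a, z_b | u_a, u_b)$ are two joint channels, we say that $Q_{a,b} \preccurlyeq W_{a,b}$ via some degrading channel $P(z_a, z_b | y_a, y_b)$ if
$$Q_{a,b}(z_a, z_b | u_a, u_b) = \sum_{y_a, y_b} P(z_a, z_b | y_a, y_b)\, W_{a,b}(y_a, y_b | u_a, u_b).$$
As for the single channel case, if $Q_{a,b} \preccurlyeq W_{a,b}$ then $P_e^{*}(Q_{a,b}) \geq P_e^{*}(W_{a,b})$, where $P_e^{*}$ is the probability of error of the optimal decoder for the joint channel. Indeed our approach will be to approximate the joint synthetic channel with an upgraded joint channel with smaller output alphabet. There is a snag, however: this ordering of error probabilities does not hold, in general, for suboptimal decoders.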
The SC decoder, used for polar codes, is suboptimal. In the genie-aided case, which we consider here, it is equivalent to performing a maximum likelihood decision on each marginal separately. We shall demonstrate the suboptimality of the SC decoder in Section III. Then, we will develop a different decoding criterion whose performance lower-bounds the SC decoder performance and is ordered by degradation. While in general finding this decoder requires an exhaustive search, for the special case of polar codes this decoder is easily found. It does, however, imply a special structure for the degrading channel, which we use to our advantage.
We investigate the joint distribution of two synthetic channels in Section IV. We first bring it to a more convenient form that will be used in the sequel. Then, we explain how to polarize a joint synthetic channel distribution and explore some consequences of symmetry. Further consequences of symmetry are the subject of Section V, in which we transform the channel to another form that greatly simplifies the steps that follow. This form exposes the inherent structure of the joint channel.
How to actually upgrade joint channels is the subject of Section VI. We upgrade the joint channel in two ways; each upgrades one marginal without changing the other. We cannot simply upgrade the marginals, as we must consider the joint channel as a whole. This is where the above-mentioned symmetrizing and upgrade-couple transforms come into play.
We present our algorithm for lower-bounding the probability of error of polar codes in Section VII. This algorithm is based on the building blocks presented in the previous sections. Details of our implementation appear in Section VIII. We demonstrate our algorithm with some numerical results in Section IX, and conclude with a short discussion in Section X.
II-A Notation
We denote $y_j^k = (y_j, y_{j+1}, \ldots, y_k)$ for $j < k$. We use an Iverson-style notation (see [14]) for indicator (characteristic) functions. That is, for a logical expression $\mathsf{expr}$, $[\![\mathsf{expr}]\!]$ is $0$ whenever $\mathsf{expr}$ is not true and is $1$ otherwise. We assume that the indicator function takes precedence whenever it appears, e.g., $n^{-1}[\![n > 0]\!]$ is $0$ for $n = 0$.
III Decoding of Two Dependent Channels
In this section, we tackle decoding of two dependent channels. We explain how this differs from the case of decoding a single channel, and dispel some misconceptions that may arise. We then specialize the discussion to polar codes. We explain the difficulty with combining the SC decoder with degradation procedures, and develop a different decoding criterion instead. Finally, we develop a special structure for the degrading channel that, combined with the decoding criterion, implies ordering of probability of error by degradation.
III-A General Case
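A decoder for channel $W(y|u)$ is a mapping $\phi$ that maps every output symbol $y$ to some $u \in \{0, 1\}$. The average probability of error of the decoder for equiprobable inputs is given by
$$P_e(W, \phi) = \frac{1}{2} \sum_{u} \sum_{y} W(y|u)\, \mathbb{P}\{\phi(y) \neq u\}.$$
The decoder is deterministic for symbols $y$ for which $\mathbb{P}\{\phi(y) \neq u\}$ assumes only the values $0$ and $1$. For some symbols, however, we allow the decoder to make a random decision. If $W(y|0) = W(y|1)$ for some $y$, then $P_e(W, \phi)$ is the same whether $\phi(y) = 0$ or $\phi(y) = 1$. Thus, the probability of error is insensitive to the resolution of ties. We denote the error event of a decoder by $\mathcal{E} = \{(u, y) : \phi(y) \neq u\}$. It is dependent on the decoder, i.e., $\mathcal{E} = \mathcal{E}(\phi)$; we suppress this to avoid cumbersome notation. Clearly, $P_e(W, \phi) = \mathbb{P}\{\mathcal{E}\}$.
The maximum-likelihood (ML) decoder, well known to minimize $P_e(W, \phi)$ when the input bits are equiprobable, is defined by
$$W(y|u) > W(y|1-u) \;\Rightarrow\; \phi(y) = u.$$
The ML decoder is not unique, as it does not define how ties are resolved. In the absence of ties, the ML decoding rule is $\phi(y) = \arg\max_{u} W(y|u)$. We denote by $P_e^{\mathrm{ML}}(W)$ the probability of error of the ML decoder.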
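We now consider two dependent binary-input channels, $W_a(y_a|u_a)$ and $W_b(y_b|u_b)$, with joint distribution $W_{a,b}(y_a, y_b | u_a, u_b)$. A decoder is a mapping $\phi$ that maps every output pair $(y_a, y_b)$ to an input pair $(\hat{u}_a, \hat{u}_b)$. The joint probability of error of the decoder is, as above,
$$P_e(W_{a,b}, \phi) = \frac{1}{4} \sum_{u_a, u_b} \sum_{y_a, y_b} W_{a,b}(y_a, y_b | u_a, u_b)\, \mathbb{P}\{\phi(y_a, y_b) \neq (u_a, u_b)\}. \tag{8}$$
An optimal decoder for the joint channel considers both outputs together and makes a decision for both inputs jointly, to minimize $P_e(W_{a,b}, \phi)$. We denote its probability of error by $P_e^{*}(W_{a,b})$. When the input bits are equiprobable, $P_e^{*}(W_{a,b}) = P_e^{\mathrm{ML}}(W_{a,b})$.
Rather than jointly decoding the input bits based on the joint output, we may opt to decode each marginal channel separately. That is, consider decoders of the form $\phi(y_a, y_b) = (\phi_a(y_a), \phi_b(y_b))$. In words, the decoder of channel $W_a$ bases its decision solely on $y_a$ and completely ignores $y_b$, and vice versa. What are the optimal decoders $\phi_a$ and $\phi_b$? The answer depends on the criterion of optimality.
Denote by $\mathcal{E}_i$ the error event of channel $W_i$ under some decoder $\phi_i$, $i \in \{a, b\}$. The Individual Maximum Likelihood (IML) decoder minimizes each individual marginal channel’s probability of error. That is, we set $\phi_a$ and $\phi_b$ as ML decoders for their respective marginal channels. We denote its joint probability of error by $P_e^{\mathrm{IML}}(W_{a,b})$. Hence, $P_e^{\mathrm{IML}}(W_{a,b})$ is computed by (8), with $\phi(y_a, y_b) = (\phi_a(y_a), \phi_b(y_b))$, where $\phi_a$ and $\phi_b$ are ML decoders for the marginal channels $W_a$ and $W_b$, respectively.
Another criterion is to minimize $\mathbb{P}\{\mathcal{E}_a \cup \mathcal{E}_b\}$, the probability that at least one of the decoders makes an error. We call the decoder that minimizes this probability using individual decoders for each channel the Individual Minimum Joint Probability of error (IMJP) decoder. The event $\mathcal{E}_a \cup \mathcal{E}_b$ is not the same as the error event of the optimal decoder for the joint channel, even when the individual decoders turn out to be ML decoders. This is because we decode each input bit separately using only a portion of the joint output. Clearly,
$$P_e^{*}(W_{a,b}) \leq \min_{\phi_a, \phi_b} \mathbb{P}\{\mathcal{E}_a \cup \mathcal{E}_b\} \leq P_e^{\mathrm{IML}}(W_{a,b}). \tag{9}$$
We denote
$$P_e^{\mathrm{IMJP}}(W_{a,b}) = \min_{\phi_a, \phi_b} \mathbb{P}\{\mathcal{E}_a \cup \mathcal{E}_b\}.$$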
The three decoders in (9) successively use less information for their decisions. The optimal decoder uses both outputs jointly as well as knowledge of the joint probability distribution; the IMJP decoder retains the knowledge of the joint probability distribution, but uses each output separately; finally, the IML decoder dispenses with the joint probability distribution and operates as if the marginals are independent channels.
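The three criteria can be compared by brute force on a small joint channel. The sketch below assumes equiprobable inputs and stores the joint channel as an array `W[ua, ub, ya, yb]`. The IMJP decoder is found by exhaustive search over all deterministic decoder pairs, which is only feasible for tiny alphabets. The random channel and all function names are illustrative assumptions, not taken from the paper.

```python
import itertools
import numpy as np

def joint_error(W, phi_a, phi_b):
    """P(E_a or E_b) for separate deterministic decoders phi_a[ya], phi_b[yb]."""
    na, nb = W.shape[2], W.shape[3]
    correct = sum(W[phi_a[ya], phi_b[yb], ya, yb]
                  for ya in range(na) for yb in range(nb))
    return 1.0 - correct / 4.0

def optimal_error(W):
    """Joint decoder: pick the most likely input pair for every output pair."""
    return 1.0 - W.max(axis=(0, 1)).sum() / 4.0

def iml_error(W):
    """Each marginal is ML-decoded on its own; the joint error is then evaluated."""
    Wa = W.sum(axis=(1, 3)) / 2.0   # Wa[ua, ya], marginal a-channel
    Wb = W.sum(axis=(0, 2)) / 2.0   # Wb[ub, yb], marginal b-channel
    return joint_error(W, Wa.argmax(axis=0), Wb.argmax(axis=0))

def imjp_error(W):
    """Exhaustive search over all pairs of deterministic marginal decoders."""
    na, nb = W.shape[2], W.shape[3]
    return min(joint_error(W, pa, pb)
               for pa in itertools.product((0, 1), repeat=na)
               for pb in itertools.product((0, 1), repeat=nb))

rng = np.random.default_rng(0)
W = rng.random((2, 2, 3, 3))
W /= W.sum(axis=(2, 3), keepdims=True)   # normalize each input pair to a distribution
p_opt, p_imjp, p_iml = optimal_error(W), imjp_error(W), iml_error(W)
```

For any joint channel the three values satisfy `p_opt <= p_imjp <= p_iml`, as in (9). The search over deterministic decoders suffices because the joint error is linear in each decoder's randomization, so a minimum is attained at a deterministic pair.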
Example 1.
The conditional distribution of some joint channel $W_{a,b}(y_a, y_b | u_a, u_b)$ is given in Table I. (This is not a joint distribution of two synthetic channels that result from polarization. However, the phenomena observed here hold for joint distributions of two synthetic channels as well, and similar examples may be constructed for the polar case.) The marginals are channels $W_a(y_a|u_a)$ and $W_b(y_b|u_b)$. Three decoders for this channel are shown in Table II. Note that for the IML and IMJP decoders we have $\phi(y_a, y_b) = (\phi_a(y_a), \phi_b(y_b))$.
The optimal decoder for the joint channel chooses, for each output pair, the input pair with the highest probability. The IML decoder is formed by using an ML decoder for each marginal; the ML decoders of the marginals decide that the input is [math] when is received and vice versa. The IMJP decoder is found by checking all combinations of marginal channel decoders and and choosing that pair the achieves . We then have
[TABLE]
As expected, (9) holds.
We now demonstrate that the probability of error of suboptimal decoders is not ordered by degradation. To this end, we degrade the joint channel in Table I by merging the output symbols into a new symbol, and into a new symbol, . We denote the new joint channel by and provide its conditional distribution in Table III. For each of the marginals, the ML decoder declares [math] upon receipt of , and otherwise. Hence, for the degraded channel, , which is lower than . For the degraded channel, the IML decoder is also the optimal decoder. As this is a degraded channel, however, .
III-B Polar Coding Setting
Given a joint channel, finding an optimal or IML decoder is an easy task. In both cases we use maximum-likelihood decoders; in the first case based on the joint channel, whereas in the second case based on the marginal channels. On the other hand, finding an IMJP decoder requires an exhaustive search, which may be costly. In the polar coding setting, as we now show, the special structure of joint synthetic channels permits finding the IMJP decoder without resorting to a search procedure.
III-B1 Joint Distribution of Two Synthetic Channels
Let $W$ be some BMS channel that undergoes $n$ polarization steps. Let $a$ and $b$ be two indices of synthetic channels, where $b > a$. The synthetic channels are $W_a(y_a|u_a)$ and $W_b(y_b|u_b)$, where $y_a = (y_1^N, u_1^{a-1})$, $y_b = (y_1^N, u_1^{b-1})$, and $N = 2^n$. We call them the a-channel and the b-channel, respectively. Their joint distribution is $W_{a,b}(y_a, y_b | u_a, u_b)$; this is the probability that the output of the a-channel is $y_a$ and the output of the b-channel is $y_b$, given that the inputs to the channels are $u_a$ and $u_b$, respectively.
With probability $1$, the prefix of $y_b$ is $(y_a, u_a)$. Namely, $y_b$ has the form
$$y_b = (y_a, u_a, y_r),$$
where $y_r$ denotes the remainder of $y_b$ after removing $y_a$ and $u_a$. Thus,
$$W_{a,b}(y_a, y_b | u_a, u_b) = 2\, W_b(y_b | u_b)\, [\![ y_b = (y_a, u_a, y_r) ]\!] \tag{10}$$
for some arbitrary $y_r$. The factor $2$ stems from the uniform distribution of $u_a$. With some abuse of notation, we will write
$$W_{a,b}(y_a, y_b | u_a, u_b) = W_{a,b}(y_a, u_a, y_r | u_a, u_b).$$
The rightmost expression makes it clear that the portion of $y_b$ in which the input of the a-channel appears must equal the actual input of the a-channel.
Observe from (10) that we can think of $W_b(y_b | u_b)$ as the joint channel up to a constant factor. Indeed, we will use $W_b(y_a, u_a, y_r | u_b)$ to denote the joint channel where convenient.
III-B2 Decoders for Joint Synthetic Channels
Which decoders can we consider for joint synthetic channels? The optimal decoder extracts $u_a$ from the output of the b-channel and proceeds to decode $u_b$. This outperforms the SC decoder but is also impractical and does not lend itself to computing the probability that is of interest to us, the probability that either of the synthetic channels errs. A natural suggestion is to mimic the SC decoder, i.e., to use an IML decoder. The joint probability of error of this decoder may decrease after stochastic degradation, so we discard this option.
Consider two decoders $\phi_a$ and $\phi_b$ for channels $W_a$ and $W_b$, respectively. As above, $\mathcal{E}_i$ is the error event of channel $W_i$ using decoder $\phi_i$, $i \in \{a, b\}$. We seek a lower bound on the probability that at least one of the two channels errs under SC decoding. Therefore, we choose decoders $\phi_a$ and $\phi_b$ that minimize $\mathbb{P}\{\mathcal{E}_a \cup \mathcal{E}_b\}$; this is none other than the IMJP decoder. Its performance lower-bounds that of the IML decoder [see (9)]. As we shall later see, combined with a suitable degrading channel structure, the probability of error of the IMJP decoder increases after stochastic degradation. Conversely, it decreases under stochastic upgradation; thus, combining the IMJP decoder with a suitable upgrading procedure produces the desired lower bound.
Multiple decoders may achieve $P_e^{\mathrm{IMJP}}(W_{a,b})$. One decoder can be found in a straightforward manner; we call it the IMJP decoder. The following theorem shows how to find it. Its proof is a direct consequence of Lemmas 3 and 4 that follow.
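Theorem 1.
Let $W_a(y_a|u_a)$ and $W_b(y_b|u_b)$ be two channels with joint distribution $W_{a,b}(y_a, y_b | u_a, u_b)$ that satisfies (10). Then, $P_e^{\mathrm{IMJP}}(W_{a,b})$ is achieved by setting $\phi_b$ as an ML decoder for $W_b$ and $\phi_a$ according to
$$\phi_a(y_a) = \arg\max_{u_a} T(y_a | u_a), \tag{11}$$
where
$$T(y_a | u_a) = \sum_{u_b} \sum_{y_b} W_{a,b}(y_a, y_b | u_a, u_b)\, \mathbb{P}\{\phi_b(y_b) = u_b\}. \tag{12}$$
Note that $T(y_a | u_a)$ is not a conditional distribution; it is non-negative, but its sum over $y_a$ does not necessarily equal $1$. In the right-hand side of (12), the dependence on $u_a$ is via (10), as $W_{a,b}(y_a, y_b | u_a, u_b) = 0$ unless $y_b = (y_a, u_a, y_r)$ for some $y_r$.
Corollary 2.
Theorem 1 holds for any two synthetic channels $W_a$ and $W_b$ that result from the same number of polarization steps of a BMS channel, where index $b$ is greater than $a$.
Proof:
In the polar code case, the joint channel satisfies (10), so Theorem 1 applies. ∎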
In what follows, denote
[TABLE]
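Lemma 3.
Let $W_a(y_a|u_a)$ and $W_b(y_b|u_b)$ be two dependent binary-input channels with equiprobable inputs and joint distribution $W_{a,b}$ that satisfies (10). Let $\phi_a$ be some decoder for channel $W_a$ with error event $\mathcal{E}_a$. Then, setting $\phi_b$ as an ML decoder for $W_b$ achieves $\min_{\phi_b} \mathbb{P}\{\mathcal{E}_a \cup \mathcal{E}_b\}$.
Proof:
Recall that $\mathbb{P}\{\mathcal{E}_a \cup \mathcal{E}_b\} = 1 - \mathbb{P}\{\mathcal{E}_a^c \cap \mathcal{E}_b^c\}$. Using (10),
$$\mathbb{P}\{\mathcal{E}_a^c \cap \mathcal{E}_b^c\} = \sum_{y_b} \alpha(y_b) \sum_{u_b} W_b(y_b | u_b)\, \mathbb{P}\{\phi_b(y_b) = u_b\},$$
where
$$\alpha(y_b) = \frac{1}{2}\, \mathbb{P}\{\phi_a(y_a) = u_a\}, \qquad y_b = (y_a, u_a, y_r).$$
The problem of finding the decoder $\phi_b$ that minimizes $\mathbb{P}\{\mathcal{E}_a \cup \mathcal{E}_b\}$ is separable over $y_b$; the terms $\alpha(y_b)$ are non-negative and independent of $u_b$. Therefore, the optimal decoder is given by $\phi_b(y_b) = \arg\max_{u_b} W_b(y_b | u_b)$. ∎ We remark that Lemma 3 holds for any a-channel decoder $\phi_a$. Thus, regardless of the selection of $\phi_a$, the optimal decoder for the b-channel (in the sense of minimizing $\mathbb{P}\{\mathcal{E}_a \cup \mathcal{E}_b\}$) is an ML decoder.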
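Lemma 4.
Let $W_a(y_a|u_a)$ and $W_b(y_b|u_b)$ be two binary-input channels with joint distribution $W_{a,b}$ and equiprobable inputs. Let $\phi_b$ be some decoder for channel $W_b$. Then, the decoder $\phi_a$ for channel $W_a$ given by (11) minimizes $\mathbb{P}\{\mathcal{E}_a \cup \mathcal{E}_b\}$.
Proof:
Since the input is equiprobable,
$$1 - \mathbb{P}\{\mathcal{E}_a \cup \mathcal{E}_b\} = \frac{1}{4} \sum_{u_a, y_a} \mathbb{P}\{\phi_a(y_a) = u_a\} \sum_{u_b, y_b} W_{a,b}(y_a, y_b | u_a, u_b)\, \mathbb{P}\{\phi_b(y_b) = u_b\} = \frac{1}{4} \sum_{u_a, y_a} \mathbb{P}\{\phi_a(y_a) = u_a\}\, T(y_a | u_a),$$
where the last equality is by (12). The problem of finding the decoder $\phi_a$ that minimizes $\mathbb{P}\{\mathcal{E}_a \cup \mathcal{E}_b\}$ is separable over $y_a$; clearly the optimal decoder is the one that sets $\phi_a(y_a) = \arg\max_{u_a} T(y_a | u_a)$. ∎
Using (10), if $\phi_b$ is chosen as an ML decoder, as per Lemma 3, we have the following expression for $T(y_a | u_a)$:
$$T(y_a | u_a) = 2 \sum_{y_r} \max_{u_b} W_b(y_a, u_a, y_r | u_b).$$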
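The decoder construction of Theorem 1 can be sanity-checked against exhaustive search on a toy channel. The sketch below assumes the successive structure of (10): the b-channel output is $(y_a, u_a, y_r)$, and the joint channel equals $2 W_b(y_a, u_a, y_r | u_b)$ on consistent outputs. With an ML decoder for the b-channel it uses $T(y_a|u_a) = 2\sum_{y_r} \max_{u_b} W_b(y_a, u_a, y_r|u_b)$ and the resulting error $1 - \frac14 \sum_{y_a} \max_{u_a} T(y_a|u_a)$. The random array `Wb`, its layout `Wb[ub, ya, ua, yr]`, and the function names are illustrative assumptions; the array is not the output of an actual polarization.

```python
import itertools
import numpy as np

def imjp_closed_form(Wb):
    """IMJP error and decoders from Wb[ub, ya, ua, yr] = W_b(ya, ua, yr | ub)."""
    phi_b = Wb.argmax(axis=0)               # ML decision per b-channel output
    T = 2.0 * Wb.max(axis=0).sum(axis=2)    # T[ya, ua]
    phi_a = T.argmax(axis=1)                # a-channel decision per ya
    return 1.0 - T.max(axis=1).sum() / 4.0, phi_a, phi_b

def joint_error(Wb, phi_a, phi_b):
    """P(E_a or E_b) for deterministic phi_a[ya] and phi_b[ya, ua, yr]."""
    na, nr = Wb.shape[1], Wb.shape[3]
    correct = 0.0
    for ya in range(na):
        ua = phi_a[ya]                      # a-channel is correct only if its input was ua
        for yr in range(nr):
            correct += Wb[phi_b[ya, ua, yr], ya, ua, yr]
    return 1.0 - correct / 2.0

def imjp_brute_force(Wb):
    """Minimum joint error over all deterministic decoder pairs."""
    na, nr = Wb.shape[1], Wb.shape[3]
    best = 1.0
    for pa in itertools.product((0, 1), repeat=na):
        for pb in itertools.product((0, 1), repeat=na * 2 * nr):
            phi_b = np.array(pb).reshape(na, 2, nr)
            best = min(best, joint_error(Wb, pa, phi_b))
    return best

rng = np.random.default_rng(1)
Wb = rng.random((2, 2, 2, 2))
Wb /= 2.0 * Wb.sum(axis=(1, 3), keepdims=True)  # each (ub, ua) slice sums to 1/2
closed, phi_a, phi_b = imjp_closed_form(Wb)
brute = imjp_brute_force(Wb)
```

The closed-form value and the exhaustive minimum agree up to floating-point error, while the closed form needs only one pass over the b-channel alphabet. The exhaustive search above already enumerates 1024 decoder pairs for this tiny example.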
The IMJP and IML decoders do not coincide in general, although in some cases they may indeed coincide. We demonstrate this in the following example.
Example 2.
Let be a binary symmetric channel with crossover probability . We perform polarization steps and consider the joint channel , i.e., and . When , we have . On the other hand, when , the IMJP and IML decoders coincide, and . In either case, (9) holds.
Remark 1.
In the special case where $W$ is a BEC and $W_a$ and $W_b$ are two of its polar descendants, the IMJP and IML (SC) decoders coincide. This is thanks to a special property of the BEC that erasures for a synthetic channel are determined by the outputs of the $N$ copies of a BEC, regardless of the inputs of previous synthetic channels. We show this in Appendix A.
III-B3 Proper Degrading Channels
The IMJP decoder is attractive for joint polar synthetic channels since, by Theorem 1, we can efficiently compute it. This was made possible by the successive form of the joint channel (10). Thus, we seek degrading channels that maintain this form.
Let $W_{a,b}(y_a, u_a, y_r | u_a, u_b)$ be a joint distribution of two synthetic channels and let $Q_{a,b}(z_a, z_b | u_a, u_b) \preccurlyeq W_{a,b}(y_a, y_b | u_a, u_b)$. The marginal channels of $Q_{a,b}$ are $Q_a(z_a | u_a)$ and $Q_b(z_b | u_b)$. The most general degrading channel is of the form
$$P(z_a, z_b | y_a, y_b) = P_1(z_a | y_a, y_b)\, P_2(z_b | z_a, y_a, y_b), \tag{13}$$
where $P_1$ and $P_2$ are probability distributions. This form does not preserve the successive structure of joint synthetic channels (10). Even if $W_{a,b}$ satisfies (10), the resulting $Q_{a,b}$ may not. To this end, we turn to a subset of degrading channels. Recalling that $y_b = (y_a, u_a, y_r)$, we consider degrading channels of the form
$$P(z_a, u_a, z_r | y_a, u_a, y_r) = P_a(z_a | y_a)\, P_b(z_r | z_a, y_a, u_a, y_r). \tag{14}$$
That is, these degrading channels degrade $y_a$, the output of $W_a$, to $z_a$, pass $u_a$ unchanged, and degrade $y_r$, the remainder of $W_b$’s output, to $z_r$. For this to be a valid channel, $P_a$ and $P_b$ must be probability distributions. This degrading channel structure is illustrated in Figure 2. By construction, degrading channels of the form (14) preserve the form (10) that is required for efficiently computing the IMJP decoder as in Theorem 1.
Definition 1 (Proper degrading channels).
A degrading channel of the form (14) is called proper. We write $Q_{a,b} \overset{p}{\preccurlyeq} W_{a,b}$ to denote that channel $W_{a,b}$ is upgraded from $Q_{a,b}$ with a proper degrading channel. We say that an upgrading (degrading) procedure is proper if its degrading channel is proper.
By marginalizing the joint channel it is straightforward to deduce the following for joint synthetic channel distributions.
Lemma 5.
If $Q_{a,b} \overset{p}{\preccurlyeq} W_{a,b}$, then $Q_a \preccurlyeq W_a$ and $Q_b \preccurlyeq W_b$.
This lemma is encouraging, but insufficient for our purposes. It is easy to take degrading channels that are used for degrading a single (not joint) synthetic channel and cast them into a proper degrading channel for joint channels. This, however, is not our goal. Instead, we start with $W_{a,b}$ and seek an upgraded $Q_{a,b}$ with smaller output alphabet that can be degraded to $W_{a,b}$ using a proper degrading channel. This is a very different problem from the degrading one, and its solution is not immediately apparent. Plain-vanilla attempts to use upgrading procedures for single channels fail to produce the desired results. Later, we develop proper upgrading procedures that upgrade one of the marginals without changing the other.
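We now show that the probability of error of the IMJP decoder does not decrease after degradation by proper degrading channels. Intuitively, this is because the decoder for the original channel can simulate the degrading channel. We denote by $\mathcal{E}_a^W$ the error event of channel $W_a$ under some decoder $\phi_a$, and similarly define $\mathcal{E}_b^W$, $\mathcal{E}_a^Q$, and $\mathcal{E}_b^Q$. Further, we denote by $\phi_i$ decoders for $W_i$ and by $\psi_i$ decoders for $Q_i$, $i \in \{a, b\}$.
Lemma 6.
Let joint channel $W_{a,b}$ have marginals $W_a$ and $W_b$. Assume that $Q_{a,b} \overset{p}{\preccurlyeq} W_{a,b}$, then $P_e^{\mathrm{IMJP}}(Q_{a,b}) \geq P_e^{\mathrm{IMJP}}(W_{a,b})$.
Proof:
The proof follows by noting that for any decoder $\psi_i$ for $Q_i$, we can find a decoder $\phi_i$ for $W_i$ with identical performance. First consider the decoder $\psi_a$ for channel $Q_a$. Denote by $z_a$ the result of drawing a symbol with probability $P_a(\cdot | y_a)$. Then, the decoder $\phi_a$ for $W_a$, defined as $\phi_a(y_a) = \psi_a(z_a)$, has performance identical to $\psi_a$ for $Q_a$. The decoder $\phi_a$ results from first degrading the a-channel output and only then decoding. Next, consider the decoder $\psi_b$ for the b-channel. Denote by $z_r$ the result of drawing a symbol with probability $P_b(\cdot | z_a, y_a, u_a, y_r)$. Then, similar to the a-channel case, the decoder $\phi_b$ for $W_b$, defined as $\phi_b(y_a, u_a, y_r) = \psi_b(z_a, u_a, z_r)$, has performance identical to $\psi_b$ for $Q_b$. Hence, the best decoder pair $(\phi_a, \phi_b)$ cannot do worse than the best decoder pair $(\psi_a, \psi_b)$. ∎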
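Let $W$ be a BMS channel that undergoes $n$ polarization steps. The probability of error of a polar code with non-frozen set $\mathcal{A}$ under SC decoding is given by $P_e^{\mathrm{SC}}(W) = \mathbb{P}\{\bigcup_{a \in \mathcal{A}} \mathcal{E}_a^{\mathrm{ML}}\}$, where $\mathcal{E}_a^{\mathrm{ML}}$ is the error event of synthetic channel $W_a$ under ML decoding. Obviously, for any $\mathcal{A}' \subseteq \mathcal{A}$,
$$P_e^{\mathrm{SC}}(W) \geq \mathbb{P}\Big\{\bigcup_{a \in \mathcal{A}'} \mathcal{E}_a^{\mathrm{ML}}\Big\}. \tag{15}$$
We have already mentioned the simplest such lower bound, $\max_{a \in \mathcal{A}} \mathbb{P}\{\mathcal{E}_a^{\mathrm{ML}}\}$. We now show that the IMJP decoder provides a tighter lower bound. To this end, recall that $P_e^{\mathrm{IMJP}}(W_{a,b}) = \min_{\phi_a, \phi_b} \mathbb{P}\{\mathcal{E}_a \cup \mathcal{E}_b\}$, where $\mathcal{E}_i$ is the error event of channel $W_i$ under decoder $\phi_i$, $i \in \{a, b\}$.
Lemma 7.
Let $W$ be a BMS channel that undergoes $n$ polarization steps, and let $\mathcal{A}$ be the non-frozen set. Then,
$$P_e^{\mathrm{SC}}(W) \geq \max_{a, b \in \mathcal{A}} P_e^{\mathrm{IMJP}}(W_{a,b}) \geq \max_{a \in \mathcal{A}} P_e^{\mathrm{ML}}(W_a). \tag{16}$$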
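Proof:
Using (15), $P_e^{\mathrm{SC}}(W) \geq \max_{a, b \in \mathcal{A}} \mathbb{P}\{\mathcal{E}_a^{\mathrm{ML}} \cup \mathcal{E}_b^{\mathrm{ML}}\}$. By definition, the IMJP decoder seeks decoders $\phi_a$ and $\phi_b$ that minimize the joint probability of error of synthetic channels with indices $a$ and $b$. Therefore, for any two indices $a$ and $b$ we have $\mathbb{P}\{\mathcal{E}_a^{\mathrm{ML}} \cup \mathcal{E}_b^{\mathrm{ML}}\} \geq P_e^{\mathrm{IMJP}}(W_{a,b})$. In particular, this holds for the indices that maximize the right-hand side. This establishes the leftmost inequality of (16).
To establish the rightmost inequality of (16), we first show that for any $a < b$,
$$P_e^{\mathrm{IMJP}}(W_{a,b}) \geq \max\left\{ P_e^{\mathrm{ML}}(W_a),\ P_e^{\mathrm{ML}}(W_b) \right\}. \tag{17}$$
To see this, first recall that the IMJP decoder performs ML decoding on the b-channel, yielding $P_e^{\mathrm{IMJP}}(W_{a,b}) \geq P_e^{\mathrm{ML}}(W_b)$. Next, we construct $Q_{a,b}$ in which the b-channel is noiseless, by augmenting the $y_r$ portion of the output of $W_{a,b}$ with $u_b$, i.e.,
$$Q_{a,b}(y_a, u_a, (y_r, v_b) | u_a, u_b) = W_{a,b}(y_a, u_a, y_r | u_a, u_b)\, [\![ v_b = u_b ]\!].$$
Channel $Q_{a,b}$ can be degraded to $W_{a,b}$ using a proper degrading channel by omitting $v_b$ from the $y_r$ portion of the output and leaving $y_a$ unchanged. Thus, $P_e^{\mathrm{IMJP}}(W_{a,b}) \geq P_e^{\mathrm{IMJP}}(Q_{a,b}) = P_e^{\mathrm{ML}}(W_a)$.
Finally, denote $a^* = \arg\max_{a \in \mathcal{A}} P_e^{\mathrm{ML}}(W_a)$. By (17), for any $b \in \mathcal{A}$ we have $P_e^{\mathrm{IMJP}}(W_{a^*,b}) \geq P_e^{\mathrm{ML}}(W_{a^*})$ if $b > a^*$ and $P_e^{\mathrm{IMJP}}(W_{b,a^*}) \geq P_e^{\mathrm{ML}}(W_{a^*})$ if $b < a^*$. Since the maximum of $P_e^{\mathrm{IMJP}}(W_{a,b})$ over all pairs in $\mathcal{A}$ is at least as large as either quantity, we obtain the proof. ∎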
Lemmas 6 and 7 are instrumental for our lower bound, which combines upgrading operations and the IMJP decoder.
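To get a feel for the gap that a pair-based bound has to close, it helps to compute the two simple bounds it sits between: the worst single-bit error (lower bound) and the union bound (upper bound). The sketch below does this for a binary erasure channel, for which the synthetic channels are again erasure channels with erasure probabilities given by the standard recursion z -> 2z - z^2 and z -> z^2. The choice of the BEC, the rule for picking the non-frozen set (the k smallest erasure probabilities), and all function names are assumptions of this illustration.

```python
def bec_synthetic_erasure_probs(eps, n):
    """Erasure probabilities of the 2**n synthetic channels of a BEC(eps)."""
    z = [eps]
    for _ in range(n):
        nxt = []
        for p in z:
            nxt.append(2 * p - p * p)  # '-' transform
            nxt.append(p * p)          # '+' transform
        z = nxt
    return z


def simple_bounds(eps, n, k):
    """Worst-bit lower bound and union upper bound on the SC error
    probability, with the k most reliable synthetic channels non-frozen."""
    z = sorted(bec_synthetic_erasure_probs(eps, n))[:k]
    # ML error of a BEC with erasure probability p and uniform input:
    # the decoder must guess on an erasure, so the error is p / 2.
    pe = [p / 2 for p in z]
    return max(pe), min(1.0, sum(pe))


lower, upper = simple_bounds(0.5, 10, 256)
```

The lower bound of Lemma 7 based on pairs of bits is at least `lower`, and the true SC error probability is at most `upper`.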
IV Properties of Joint Synthetic Channels
In this section, we study the properties of joint synthetic channels. We begin by bringing the joint synthetic channel into an equivalent form where the b-channel’s ML decision is immediately apparent. We then explain how to jointly polarize synthetic channels. Finally, we describe some consequences of symmetry on joint channels and on the IMJP decoder.
IV-A Representation of Joint Synthetic Channel Distribution using -values
Two channels and with the same input alphabet but possibly different output alphabets are called equivalent if and . We denote this by . Channel equivalence can cast a channel in a more convenient form. For example, if is a BMS, one can transform it to an equivalent channel whose output is a sufficient statistic, such as a -value (see Appendix B), in which case the ML decoder’s decision is immediately apparent.
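The sketch below illustrates how a bounded sufficient statistic makes the ML decision immediate. The statistic itself is defined in Appendix B, which is not reproduced in this section; the code assumes it is the normalized likelihood difference d(y) = (W(y|0) - W(y|1)) / (W(y|0) + W(y|1)), which lies in [-1, 1]. The function names, the tie-breaking rule, and this definition are assumptions of the illustration.

```python
def normalized_difference(w0, w1):
    """Assumed statistic: d = (w0 - w1) / (w0 + w1), where w0 = W(y|0)
    and w1 = W(y|1). The result lies in [-1, 1]."""
    total = w0 + w1
    if total == 0:
        raise ValueError("output has zero probability under both inputs")
    return (w0 - w1) / total


def ml_decision(d):
    """ML decision read off the statistic: 0 if d > 0, 1 if d < 0.
    Ties (d == 0) are broken in favor of 0 here, an arbitrary choice."""
    return 0 if d >= 0 else 1


def bsc_crossover_from_value(d):
    """Crossover probability of a BSC whose outputs have values +d and -d
    under the assumed definition, since d = (1 - p) - p = 1 - 2p."""
    return (1 - abs(d)) / 2


d_zero = normalized_difference(0.9, 0.1)  # BSC(0.1), output 0
d_one = normalized_difference(0.1, 0.9)   # BSC(0.1), output 1
```

The likelihood ratio W(y|0) / W(y|1) carries the same information; the two statistics are related by a one-to-one map.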
Let be a joint synthetic channel. Since the joint distribution is determined by the distribution of , we can transform to an equivalent channel in which the b-channel -value of symbol is immediately apparent. (Footnote 3: By “b-channel -value” we mean the -value computed for channel . Instead of -values, other sufficient statistics of the b-channel could have been used. In fact, for practical implementation (see Section VIII), we recommend using likelihood ratios, which offer a superior dynamic range. Our use of -values in the exposition was prompted by their bounded range: . This simplifies many of the expressions that follow.)
Definition 2** (-value representation).**
Joint channel is in -value representation if the marginal satisfies
[TABLE]
We use the same notation for both the regular and the -value representations of the joint channel due to their equivalence. The discussion of the various representations of joint channels in Section III-B applies here as well. In particular, we will frequently use to denote the joint synthetic channel distribution.
The following lemma affords a more convenient description of the joint channel, in which, in line with the IMJP decoder, the b-channel’s ML decision is immediately apparent. Moreover, this description greatly simplifies the expressions that follow.
Lemma 8**.**
Channels and are equivalent and the degrading channels from one to the other are proper.
Proof:
To establish equivalence we show that each channel is degraded from the other using proper degrading channels. The only portion of interest in (14) is , as in either direction and are unchanged by the degrading channel. Denote by the set of all symbols such that the b-channel -value of is , for fixed . Then,
[TABLE]
where
[TABLE]
Clearly, the b-channel -value of is .
On the other hand, by (10) and since all symbols in share the same b-channel -value,
[TABLE]
where
[TABLE]
and . ∎
Remark 2*.*
In Section IV-B we will show how to jointly polarize a joint channel . Even if is given in -value representation, the jointly polarized version is not. However, this lemma enables us to convert the jointly polarized distribution to -value representation. This is possible because Lemma 8 holds for any representation of in which are the input and output, respectively, of the a-channel, is the input of the b-channel, and is the output of the b-channel. In particular, need not consist of inputs to channels .
Remark 3*.*
At this point the reader may wonder why we have stopped here and not converted the a-channel output to its -value. The reason is that this constitutes a degrading operation, which is the opposite of what we need. Two a-channel symbols with the same a-channel -value may have very different meanings for the IMJP decoder. Thus, we cannot combine them to a single symbol without incurring loss.
When the joint channel is in -value representation, proper degrading channels admit the form
[TABLE]
It is obvious that all properties obtained from degrading channels of the form (14) are retained for degrading channels of the form (20). By Lemma 8, we may assume that the degraded channel is also in -value representation.
IV-B Polarization for Joint Synthetic Channels
Let be some joint synthetic channel distribution in -value representation. Recall that and are indices of synthetic channels. For , we denote by and the indices of the synthetic channels that result from polar transforms of and according to and . That is,
[TABLE]
and a similar relationship holds for . The resulting joint channel is, thus, .
Even though is in -value representation, after a polarization transform this is no longer the case. Of course, one can always bring the polarized joint channel to an equivalent -value representation as in Lemma 8.
The polar construction is shown in Figure 3. Here, two independent copies of the joint channel (in -value representation) are combined. The inputs and outputs of the a-channel of each copy are denoted explicitly using thicker arrows with hollow tips. For example, for the bottom copy of , the a-input is and the a-output is , whereas the b-input is and the b-output is .
The input and output of are given by
[TABLE]
The input and output of are given by
[TABLE]
Note that and are contained in . That is, , where
[TABLE]
Thus, the joint output of both channels is .
The distribution of the jointly polarized channel is given by
[TABLE]
where
[TABLE]
We have shown how to generate from . Another case of interest is generating from . Denote the output of by . The output of is . From (10), we need only compute to find . This is accomplished by (1).
If two channels are ordered by degradation, so are their polar transforms [3, Lemma 4.7]. That is, if then and . This is readily extended to joint channels. To this end, for BMS channel we denote the joint channel formed by its ‘’- and ‘’-transforms by .
Lemma 9**.**
Let BMS channel . Then .
Proof:
Using (5) and the definition of we have
[TABLE]
where is a proper degrading channel. ∎
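For concreteness, the joint channel formed by the '-' and '+' transforms of a BSC can be tabulated directly from the standard single-step polar transform, in which the pair (y1, y2) is received when u1 xor u2 and u2 are sent over two independent copies of the channel. The sketch below builds this table and computes the ML error probability of each marginal. The dictionary layout and function names are assumptions of this illustration; it uses the regular representation, not the -value representation.

```python
from itertools import product


def bsc(p):
    """W[y][x] = W(y|x) for a BSC with crossover probability p."""
    return {0: {0: 1 - p, 1: p}, 1: {0: p, 1: 1 - p}}


def joint_minus_plus(W):
    """P[(y1, y2)][(u1, u2)] = W(y1 | u1 xor u2) * W(y2 | u2)."""
    return {
        (y1, y2): {
            (u1, u2): W[y1][u1 ^ u2] * W[y2][u2]
            for u1, u2 in product((0, 1), repeat=2)
        }
        for y1, y2 in product(W, repeat=2)
    }


def ml_error_minus(P):
    """a-channel ('-'): input u1, output (y1, y2), with u2 uniform and unknown."""
    total = 0.0
    for row in P.values():
        w0 = 0.5 * (row[(0, 0)] + row[(0, 1)])
        w1 = 0.5 * (row[(1, 0)] + row[(1, 1)])
        total += 0.5 * min(w0, w1)
    return total


def ml_error_plus(P):
    """b-channel ('+'): input u2, output (y1, y2, u1), with u1 uniform."""
    total = 0.0
    for row in P.values():
        for u1 in (0, 1):
            total += 0.5 * min(0.5 * row[(u1, 0)], 0.5 * row[(u1, 1)])
    return total


P = joint_minus_plus(bsc(0.1))
```

For a BSC with crossover probability p, the two marginal error probabilities evaluate to 2p(1 - p) and p, respectively (the latter with ties counted as half an error).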
Lemma 10**.**
If , then, for , .
Proof:
The proof follows similar lines to the proof of Lemma 9. Expand using (21) and expand again using the definition of joint degradation with a proper degrading channel. Using the one-to-one mappings between the outputs of the polarized channels and the inputs and outputs of non-polarized channels, the desired results are obtained. The details are mostly technical, and are omitted. ∎
The operational meaning of Lemma 10 is that to compute an upgraded approximation of we may start with , an upgraded approximation of , and polarize it. The result is an upgraded approximation of . This enables us to iteratively compute upgraded approximations of joint synthetic channels. Whenever the joint synthetic channel exceeds an allotted size, we upgrade it to a joint channel with a smaller alphabet size and continue from there. We make sure to use proper upgrading procedures; this preserves the special structure of the joint channel and enables us to compute a lower bound on the probability of error. In Section VI we derive such upgrading procedures.
Since a sequence of polarization and proper upgrading steps is equivalent to proper upgrading of the overall polarized joint channel, using Lemmas 6 and 7 we obtain that the IMJP decoding error of a joint channel that has undergone multiple polarization and proper upgrading steps lower-bounds the SC decoding error of the joint channel that has undergone only the same polarization steps (without upgrading steps).
IV-C Double Symmetry for Joint Channels
A binary input channel is called symmetric if for every output there exists a conjugate output such that . We now extend this to joint synthetic channels.
Definition 3** (Double symmetry).**
Joint channel exhibits double symmetry if for every , there exist , , such that
[TABLE]
We call the a-conjugate; the b-conjugate; and the ab-conjugate. We can also cast this definition using the regular (non--value) representation of joint channels in a straightforward manner, which we omit here.
Example 3**.**
Let be a BMS channel and denote by the joint channel formed by its ‘’- and ‘’-transforms. What are the a-, b-, and ab-conjugates of the a-channel output ? Recall that the output of the a-channel consists of the outputs of two copies of . Denote , where and are two possible outputs of with conjugates , respectively. We then have
[TABLE]
By symmetry of we obtain , , and . Indeed,
[TABLE]
We leave it to the reader to show that (22) holds for the -value representation of the joint channel.
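The conjugate relations of this example can be checked numerically in the regular representation. The sketch below assumes that, for an a-channel output (y1, y2), the a-conjugate is obtained by conjugating y1 only, the b-conjugate by conjugating both y1 and y2, and the ab-conjugate by conjugating y2 only. These assignments, the four-output test channel, and the function names are assumptions of this illustration; the code itself verifies that the assignments leave the joint probability unchanged when the corresponding inputs are flipped.

```python
from itertools import product
from math import isclose


def joint(W, y1, y2, u1, u2):
    """Joint probability of (y1, y2) given (u1, u2) after one polar step:
    W(y1 | u1 xor u2) * W(y2 | u2), with W[y] = (W(y|0), W(y|1))."""
    return W[y1][u1 ^ u2] * W[y2][u2]


def has_double_symmetry(W, conj):
    """Check the three conjugate identities for every output pair and input pair."""
    for y1, y2, u1, u2 in product(W, W, (0, 1), (0, 1)):
        base = joint(W, y1, y2, u1, u2)
        a = joint(W, conj[y1], y2, 1 - u1, u2)               # flip u1
        b = joint(W, conj[y1], conj[y2], u1, 1 - u2)         # flip u2
        ab = joint(W, y1, conj[y2], 1 - u1, 1 - u2)          # flip both
        if not (isclose(base, a) and isclose(base, b) and isclose(base, ab)):
            return False
    return True


# A symmetric binary-input channel with two conjugate output pairs.
W_sym = {"s": (0.6, 0.1), "s'": (0.1, 0.6), "t": (0.2, 0.1), "t'": (0.1, 0.2)}
conj_sym = {"s": "s'", "s'": "s", "t": "t'", "t'": "t"}

# A Z-channel is not symmetric, so the check is expected to fail for it.
W_z = {0: (1.0, 0.3), 1: (0.0, 0.7)}
conj_z = {0: 1, 1: 0}
```

Calling `has_double_symmetry(W_sym, conj_sym)` exercises all sixteen output pairs of the test channel.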
Pairs of polar synthetic channels exhibit double symmetry. One can see this directly from symmetry properties of polar synthetic channels, see [1, Proposition 13]. Alternatively, one can use induction to show directly that the polar construction preserves double symmetry; we omit the details. This implies the following Proposition.
Proposition 11**.**
Let be the joint distribution of two synthetic channels and that result from polarization steps of BMS channel . Then, exhibits double symmetry.
The following is a direct consequence of double symmetry.
Lemma 12**.**
Let be a joint channel in -value representation that exhibits double symmetry. Then
1. For the b-channel, and have the same b-channel -value .
2. For the a-channel, and have the same a-channel -value , and and have the same a-channel -value .
Proof:
The first item is obvious from (22). For the second item, note that
[TABLE]
where is by (22). In the same manner, and have the same a-channel -value, . ∎
Lemma 12 implies that an SC decoder does not distinguish between and when making its decision for the a-channel. We now show that a similar conclusion holds for the IMJP decoder.
Lemma 13**.**
Let be some output of . Then
[TABLE]
Proof:
Theorem 1 holds for joint channels given in -value representation, . This is easily seen by following the proof with minor changes. Under the -value representation, (13) becomes
[TABLE]
The remainder of the proof hinges on double symmetry and follows along similar lines to the proof of Lemma 12, with replaced with and accordingly the sum over replaced with a maximum operation over . ∎
Lemma 13 implies that the IMJP decoder does not distinguish between and .
Corollary 14**.**
Let be the IMJP decoder for the a-channel. Then
V Symmetrized Joint Synthetic Channels
In this section we introduce the symmetrizing transform. The resultant channel is degraded from the original joint channel yet has the same probability of error. Its main merit is to decouple the a-channel from the b-channel. This simpler structure is the key to upgrading the a-channel, as we shall see in Section VI.
V-A Symmetrized Joint Channel
The SC decoder observes marginal distributions and makes a decision based on the -value of each synthetic channel’s output. In particular, by Lemma 12, the SC decoder makes the same decision for the a-channel whether its output was or and the b-channel decision is based on without regard to . By Corollary 14, the IMJP decoder acts similarly. That is, the IMJP decoder makes the same decision for the a-channel whether its output is or , and the decision for the b-channel is based solely on .
We conclude that if the a-channel were told only whether its output was one of , it would make the same decision had it been told its output was, say, . This is true for either the SC or IMJP decoder. Consequently, either decoder’s probability of error is unaffected by obscuring the a-channel output in this manner.
This leads us to define a symmetrized version of the joint synthetic channel distribution, , as follows. (Footnote 4: The order of elements in and does not matter. That is, is a set containing both and .) Let
[TABLE]
and define
[TABLE]
Lemma 15**.**
Let be a joint synthetic channel distribution, and let be its symmetrized version. Then, the probability of error under SC (IMJP) decoding of either channel is identical.
Proof:
By Lemma 12 for the SC decoder or Corollary 14 for the IMJP decoder, if the decoder for the symmetrized channel makes an error for some symbol then the decoder for the non-symmetrized channel makes an error for both and , and vice-versa. Therefore, denoting by the error indicator of the decoder,
[TABLE]
where is by (24). ∎
The marginal synthetic channels and are given by
[TABLE]
Note that by double symmetry
[TABLE]
Definition 4** (Symmetrized distribution).**
A joint channel whose marginals satisfy (25) is called symmetrized.
The name ‘symmetrized’ stems from comparison of (25) and (22). We note that Theorem 1 holds for .
A symmetrized joint channel remains symmetrized upon polarization. That is, if is a symmetrized joint channel and , is the result of jointly polarizing it (without applying a further symmetrization operation), then the marginals and satisfy (25). This is easily seen from (21) and (25).
Clearly, is degraded with respect to , exactly the opposite of our main thrust. Nevertheless, as established in Lemma 15, both channels have the same probability of error under SC (IMJP) decoding. Moreover, if we upgrade the symmetrized version of the channel, its probability of error under IMJP decoding lower-bounds the probability of error of the non-symmetrized channel under either SC or IMJP decoding.
What is not immediately obvious, however, is what happens after polarization. That is, if we take a joint channel, symmetrize it, and then polarize it, how does its probability of error compare to the original joint channel that has just undergone polarization? Furthermore, what happens if the symmetrized version undergoes an upgrading transform?
In the following proposition, we provide an answer. To this end, a joint polarization step is a pair that denotes which transforms the a-channel and b-channel undergo. For example, the result of joint polarization step on joint channel is the joint channel . A sequence of such pairs is called a sequence of joint polarization steps. The joint polarization steps are applied in succession: the result of joint polarization of according to the sequence is the same as the result of joint polarization of according to the sequence .
Proposition 16**.**
Let be a joint distribution of two synthetic channels and let denote this joint distribution after a sequence of joint polarization steps. Then , where is the distribution of after the same sequence of polarization steps and any number of proper upgrading transforms along the way.
Proof:
Let be a joint channel with symmetrized version . For , denote by and the polarized versions of and , respectively. For the -channel, the decoder makes the same decision for either or . This is because the decision is based on the b-channel -value, which is unaffected by symmetrization [see (24)].
Next, for the channel, using on (21) a derivation similar to the proof of Lemma 13, , where is any combination of an element of and an element of . That is, is any one of , , , and . Thus, the IMJP decoder makes the same decision for the -channel for either or .
We compare the channels obtained by the following two procedures.
- Procedure 1: Joint channel goes through sequence of polarization steps.
- Procedure 2: Joint channel is symmetrized to form . It goes through sequence of polarization steps (without any further symmetrization operations).
We iteratively apply the above reasoning and conclude in a similar manner to Lemma 15 that both channels have the same performance under IMJP decoding. Next, we modify Procedure 2.
- Procedure 2a: Joint channel is symmetrized to form . It goes through sequence of polarization steps (without any further symmetrization operations), but at some point mid-sequence, it undergoes a proper upgrading procedure.
Since polarizing and proper upgrading is equivalent to proper upgrading and polarizing (see Lemma 10) we can assume that the upgrading happens after the entire sequence of polarization steps. Thus, under IMJP decoding, the probability of error of the channel that results from Procedure 2a lower-bounds the probability of error of the channels resulting from Procedures 1 and 2. Similarly, multiple upgrading transforms can also be thought of as occurring after all polarization steps. ∎
Corollary 17**.**
Let be a BMS channel that undergoes polarization steps. Let be the joint channel of two of its polar descendants such that , and let . Then .
Proof:
A direct consequence of Lemmas 6 and 7 combined with Proposition 16. ∎
We emphasize that, by Proposition 16, it does not matter how we arrive at . So long as and , we can use to obtain a lower bound on . A practical way to obtain is via multiple proper upgrading operations that we perform after joint polarization operations. This is the route we take in Section VII.
Due to Proposition 16, we henceforth assume that joint channel is symmetrized, and no longer distinguish symmetrized channels or symbols by the symbol. Replacing the joint channel with its symmetrized version need only be performed once, at the first instance the two channels go through different polarization transforms.
Implementation: Since symmetrization is performed only once, and since this invariably happens when converting a channel to , we find the a-, b-, and ab-conjugates using the results of Example 3. We then form the symmetrized channel using (24). Note that it is sufficient to find just the b-conjugates and use the first equation of (24).
V-B Decomposition of Symmetrized Joint Channels
Let the joint channel be , which, as mentioned above, we assume to be symmetrized. We have
[TABLE]
in which we used the independence and uniformity of the input bits and . The distribution is given by Whenever is nonzero, distribution is obtained by dividing by . Our notation (with a semicolon, as opposed to ) reminds us that for fixed , channel is a binary-input channel with input and output . If for some , we define to be some arbitrary BMS channel, to ensure it is always a valid channel.
Since the joint channel is symmetrized, by (25) we have . Hence, for any ,
[TABLE]
That is, a consequence of symmetrization is that given , output becomes independent of . This is not true in the general case where the joint channel is not symmetrized.
The decomposition of (26) essentially decouples the symmetrized joint channel to a product of two distributions.
Lemma 18**.**
Let be a symmetrized joint channel. It admits the decomposition
[TABLE]
For any , channel is a BMS channel with input and output , i.e.,
[TABLE]
Moreover, satisfies
[TABLE]
Proof:
Using (27) in (26) yields (28). The remainder of this lemma is readily obtained by using (25) in (28). ∎
Definition 5** (Decoupling decomposition).**
A decomposition of the form (28) for a symmetrized joint channel is called a decoupling decomposition. Channel is obtained by marginalization, i.e.,
[TABLE]
where the latter equality, which is due to symmetry, holds for any . Then, we compute channel using (28). The special case where requires special attention. Such a case invariably happens for perfect symbols — that is, symbols for which but for some . Specifically, we ensure that is a well-defined BMS channel even in this case, so we set it to an arbitrary BSC. Thus,
[TABLE]
When setting to an arbitrary BSC, we make sure not to add new b-channel -values. One possible choice is to set to a BSC whose output has the highest b-channel -value.
We use decoupling decompositions of symmetrized joint channels in the sequel. We shall see in Section VI-A that plays a central role in the a-channel upgrading procedure.
We conclude this section with an example that compares a joint channel and its symmetrized version. In particular, we demonstrate the decoupling decomposition for the symmetrized joint channel.
Example 4**.**
Let be a BSC with crossover probability and consider , the joint synthetic channel of the ‘’- and ‘’-transforms of . In -value representation, the a-channel has four possible outputs and there are three values of : . Table IV contains the probability table of this joint synthetic channel for and varying . When and , the b-channel input is more likely to be than [math]. Similarly, when and , the b-channel input is more likely to be [math] than . Thus, the channel in Table IV does not satisfy (28).
After symmetrization, the a-channel output is either or . The probability table for the symmetrized channel with is shown in Table V. Here, when and is received at the a-channel, or are equally likely. Indeed, is a BSC with crossover probability , and the channel in Table V satisfies (28).
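The count of three b-channel values in this example can be reproduced by enumerating the outputs (y1, y2, u1) of the '+' transform of a BSC. The sketch below assumes the b-channel statistic is the normalized likelihood difference (w0 - w1) / (w0 + w1); this definition comes from Appendix B, which is not reproduced here, so it is an assumption, as are the function name and the rounding used to group equal values. Any one-to-one function of the likelihood ratio would give the same count.

```python
from itertools import product


def plus_channel_values(p, ndigits=12):
    """Distinct statistic values of the '+' transform of a BSC(p), 0 < p < 0.5.
    The b-channel input is u2 and its output is (y1, y2, u1)."""
    W = {0: (1 - p, p), 1: (p, 1 - p)}  # W[y] = (W(y|0), W(y|1))
    values = set()
    for y1, y2, u1 in product((0, 1), repeat=3):
        w0 = 0.5 * W[y1][u1] * W[y2][0]      # likelihood of u2 = 0
        w1 = 0.5 * W[y1][u1 ^ 1] * W[y2][1]  # likelihood of u2 = 1
        values.add(round((w0 - w1) / (w0 + w1), ndigits))
    return sorted(values)


vals = plus_channel_values(0.1)
```

The two nonzero values arise when the two observations of u2 agree, and the zero value arises when they disagree.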
VI Upgrading Procedures for Joint Synthetic Channels
In this section, we introduce proper upgrading procedures for joint synthetic channels. The overall goal is to reduce the alphabet size of the joint channel. The upgrading procedures we develop enable us to reduce the alphabet size of each of the marginals without changing the distribution of the other; there is a different procedure for each marginal. As an intermediate step, we further couple the marginals by increasing the alphabet size of one of them.
The joint channel is assumed to be symmetrized and in -value representation. The upgrading procedures will maintain this. As discussed in Section V, we do not distinguish symmetrized channels with any special symbol. The upgrading procedure of Section VI-A hinges on symmetrization. The upgrading procedure of Section VI-B does not require symmetrization and holds for non-symmetrized channels without change. However, we shall see that symmetrization simplifies the resulting expressions.
VI-A Upgrading Channel
We now introduce a theorem that enables us to deduce an upgrading procedure that upgrades and reduces its output alphabet size. Let symmetrized joint channel admit decoupling decomposition (28). Let be another symmetrized joint channel, where represents the -value of the b-channel output. It also admits a decoupling decomposition,
[TABLE]
Theorem 19**.**
Let and be symmetrized joint channels with decoupling decompositions (28) and (44), respectively. Then, if
1. with degrading channel .
2. for all such that .
Before going into the proof, some comments are in order. First, we do not claim that any that is upgraded from must satisfy this theorem. Second, the meaning of the second item is that, for fixed , BMS channel with binary input is upgraded from a set of BMS channels with the same binary input.
Proof:
Using decoupling decompositions (28) and (44) and the structure of a proper degrading channel (20), if and only if there exist and such that
[TABLE]
where
[TABLE]
We now find and from the conditions of the theorem.
The first condition of the theorem implies that there exists a channel such that
[TABLE]
The second condition of the theorem implies that for each there exists a channel such that
[TABLE]
We set
[TABLE]
[TABLE]
It is easily verified that (47) is satisfied by and this , completing the proof. ∎
Remark 4*.*
Recall from (31) that when , we set to an arbitrary BSC. At this point, the reader may wonder what effect, if any, this has on the resulting joint channel. We now show that there is no effect. To see this, observe from (51) that if and , then necessarily . Hence, by (44), . This latter equality is the same regardless of how we had set .
How might one use Theorem 19 to upgrade the a-channel? A naive way would be to first upgrade the marginal to using some known method (e.g., the methods of [9], see Appendix C). This yields degrading channel by which one can find channel that satisfies (54). With and at hand, one forms the product (44) to obtain . If the reader were to attempt to do this, she would find out that it often changes the b-channel. Moreover, this change may be radical: the resulting b-channel may be upgraded so much that it becomes almost noiseless, which yields only an uninteresting bound, the trivial lower bound (3). It is possible to upgrade the a-channel without changing the b-channel; this requires an additional transform, which we now introduce.
The upgrade-couple transform enables upgrading the a-channel without changing the b-channel. The idea is to split each a-channel symbol to several classes, according to the possible b-channel outputs. Symbols within a class have the same channel, so that confining upgrade-merges to operate within a class inherently satisfies the second condition of Theorem 19. Thus, we circumvent changes to the b-channel. This results in only a modest increase to the number of output symbols of the overall joint channel.
Let channel have possible -values, . We assume that erasure symbols are duplicated, and . (Footnote 5: That is, there is a “positive” and a “negative” erasure, see [9, Lemma 4].) For each a-channel symbol we define upgrade-couple symbols , . The new symbols couple the outputs of the a- and b-channels (whence the name of the upgrade-couple transform). Namely, if the a-channel output is and , the b-channel output can only be ; if the a-channel output is and , the b-channel output can only be .
The upgrade-couple channel is defined by
[TABLE]
where
[TABLE]
and is derived from the decoupling decomposition of , see (31).
As intuition for the factor , observe that it ensures that for and that for . Crucially, it does not upgrade the marginal channels (see Corollaries 24 and 25). In particular, as shown in Lemma 23, the factor ensures that symbols of channel and of channel share the same a-channel -value.
Remark 5*.*
For the original joint channel there may be a-channel symbols for which but . For the upgrade-couple channel , the symbol determines the possible values for the b-channel output when or when . The symbol never appears with positive probability if , yet, because it may appear with positive probability if , we still need to map it to some . The upgrade-couple transform is well defined even in this case, thanks to our definition of , see (31). In particular, if never occurs with positive probability with , say, then for the upgrade-couple channel also never occurs with positive probability with (see Lemma 23, item 2).
A parameter that is related to and will be useful in the sequel is
[TABLE]
For every , there must exist some such that . The following lemma makes this clear.
Lemma 20**.**
For any we have
[TABLE]
Proof:
Without loss of generality, we shall show (57) for and . Observe that for all . Thus, . Next, by (29),
[TABLE]
where the latter equality is because is a valid BMS channel.
To see (58), observe that
[TABLE]
Summing over and using (29) yields the result. ∎
As we now show, since is symmetrized, so is .
Lemma 21**.**
Let be a symmetrized joint channel. Then, , defined as in (55), is also symmetrized.
Proof:
To establish the lemma, we need to show that (25) holds for the upgrade-couple channel. For the a-channel , let symbols be conjugates, i.e., . Channel is symmetrized, so, by (30), . Furthermore, by definition, . Thus,
[TABLE]
Next, recall that , so that . Thus, (25) holds as required. ∎
In the proof of Lemma 21 we have seen that the conjugate symbol of is (with the order of and flipped). We summarize this in the following corollary.
Corollary 22**.**
If then .
Since is symmetrized, it admits decoupling decomposition
[TABLE]
Denote by a binary symmetric channel with crossover probability . In Lemma 23 we derive [see (61)] and establish that for every ,
[TABLE]
That is, when we have , when we have , and is zero for any other . We emphasize that we define using (60) even if .
Lemma 23**.**
Let be a symmetrized joint channel and let be defined as in (55), with decoupling decomposition (59). Then
1. Joint channel is upgraded from joint channel with a proper degrading channel that deterministically maps to .
2. We have
[TABLE]
Moreover, symbols of channel and of channel have the same a-channel -value for every such that .
3. For every , BMS channel with input and output is if and if .
Proof:
For the first item, we sum (55) over and obtain, using (57),
[TABLE]
That is, joint channel is upgraded from with degrading channel that deterministically maps to . This is a proper degrading channel.
For the second item, we marginalize over and . Using (28) in the right-hand-side of (55), we obtain (61), where is given in (56). Whenever , we have, by (55), . Thus,
[TABLE]
implying that and have the same a-channel -value for their respective channels.
For the final item, if , we are free to set as we please, so we set it as per the item. Otherwise, there are only two values of for which is nonzero. Hence, can output only two b-channel -values for fixed and . Thus, is a BMS channel with only two possible outputs, or, in other words, a BSC. A BSC that outputs -values , , has crossover probability . This establishes the item. ∎
Definition 6** (Canonical channel).**
The canonical channel of channel has a single entry for each -value. That is, denoting by the set of symbols whose -value is , we have It can be shown that a channel is equivalent to its canonical form, i.e., each form can be degraded from the other.
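Forming a canonical channel amounts to summing the transition probabilities of all symbols that share the same statistic. The sketch below does this for a binary-input channel given as a list of pairs (W(y|0), W(y|1)). It assumes the statistic is the normalized likelihood difference (w0 - w1) / (w0 + w1), as defined in an appendix not reproduced here; that definition, the rounding used to detect equal values, and the function name are assumptions of this illustration.

```python
from collections import defaultdict


def canonical(channel, ndigits=12):
    """Merge output symbols that share the same statistic value.

    channel: iterable of (W(y|0), W(y|1)) pairs, one per output symbol.
    Returns a dict mapping each distinct value to the summed pair.
    """
    merged = defaultdict(lambda: [0.0, 0.0])
    for w0, w1 in channel:
        if w0 + w1 == 0:
            continue  # symbol never occurs; nothing to merge
        d = round((w0 - w1) / (w0 + w1), ndigits)
        merged[d][0] += w0
        merged[d][1] += w1
    return {d: tuple(v) for d, v in sorted(merged.items())}


# Five output symbols with values 0.5, 0.5, -0.5, -0.5, and 0.
example = [(0.3, 0.1), (0.15, 0.05), (0.1, 0.3), (0.05, 0.15), (0.4, 0.4)]
canon = canonical(example)
```

The merged symbol keeps the common value of its constituents, because w0 - w1 = d (w0 + w1) holds for each of them and therefore for their sums.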
Corollary 24**.**
The canonical b-channels of and coincide.
Proof:
This is a direct consequence of the first item of Lemma 23:
[TABLE]
∎
Corollary 25**.**
The canonical a-channels of and coincide.
Proof:
This follows from the second item of Lemma 23, (58), and (61). ∎
Definition 7** (Class).**
The class is the set of symbols with fixed .
There are classes. The size of each class is the number of symbols . By (60), is the same BSC for all symbols of class and fixed . Thus, the second item of Theorem 19 becomes trivial and is immediately satisfied if we use an upgrading procedure that upgrade-merges several symbols of the same class .
To determine which upgrading procedures may be used, we turn to the degrading channel. So long as the degrading channel does not mix a symbol and its conjugate, the upgrading procedure can be confined to a single class. This is because conjugate symbols belong to different classes, as established in Corollary 22. Thus, of the upgrading procedures of [9] (see Appendix C) we can use either upgrade-merge-3 without restriction or upgrade-merge-2 provided that the two symbols to be merged have the same a-channel -value.
Theorem 26**.**
Let be some joint channel with marginals and upgrade-couple counterpart . Let obtained by an upgrade-merge-3 procedure. Then there exists joint channel with canonical marginals such that and .
Proof:
The idea is to confine the upgrading procedures to work within a class, utilizing Theorem 19 over each class separately.
Assume that the upgrading procedure from to replaces symbols with symbols . We obtain by using Theorem 19 for each class of separately. The a-channel upgrade procedure for class is upgrade-merge-3 from to that replaces symbols with symbols . As the upgrade is confined to symbols of the same class, the channel — given by (60) — is the same regardless of , as established in Lemma 23, item 3. Hence, the second item of Theorem 19 is automatically satisfied within a class , with
[TABLE]
for all . Channel is then obtained by the product of and as per (59):
[TABLE]
By properties of upgrade-merge-3 (see (77) in Appendix C-B) we have Therefore,
[TABLE]
where in we used the decoupling decomposition (63); and are by Lemma 23, item 3 and by (62); finally, is due to Corollary 24.
To see that the canonical a-channel marginals coincide, note that by Lemma 23, item 2, for any fixed , the symbols all have the same a-channel -value. Let be some a-channel -value, and let be the set of a-channel outputs whose a-channel -value is . Then,
[TABLE]
where is a direct consequence of the expressions for upgrade-merge-3 and our construction of upgrading each class separately. ∎
To use Theorem 26, one begins with a design parameter that controls the output alphabet size. Working one class at a time, one then applies upgrade operations in succession to reduce the class size to . The resulting channel, therefore, will have symbols overall. The canonical a-channel marginal that results from this operation will have at most symbols.
Remark 6*.*
The upgrade-merge-3 procedure replaces three conjugate symbol pairs with two conjugate symbol pairs. Recall from Corollary 22 that after the upgrade-couple transform, conjugate symbols belong to different classes. In particular, if and are a conjugate pair of the a-channel before the upgrade-couple transform, then and are a conjugate pair of the a-channel after the upgrade-couple transform. Therefore, when one uses Theorem 26 to replace the symbols
[TABLE]
one must also replace their conjugates
[TABLE]
We still always operate within a class as nowhere do we mix symbols from different classes. Alternatively, one may upgrade only classes with and then use channel symmetry to obtain the upgraded forms of classes .
There is one case where it is possible to use upgrade-merge-2, as stated in the following corollary.
Corollary 27**.**
Theorem 26 also holds if the a-channel upgrade procedure is upgrade-merge-2 applied to two symbols of the same a-channel -value.
Proof:
While in general the upgrade-merge-2 procedure mixes a symbol and its conjugate, when the two symbols to be merged have the same a-channel -value this is no longer the case (see Appendix C-A), and we can follow along the lines of the proof of Theorem 26. We omit the details. ∎
The reason that [9] introduced both the upgrade-merge-2 and upgrade-merge-3 procedures despite the superiority of the latter stems from numerical issues. To implement upgrade-merge-3 we must divide by the difference of the extremal -values to be merged. If these are very close this can lead to numerical errors. Upgrade-merge-2 is not susceptible to such errors. On the other hand, upgrade-merge-2 cannot be used in the manner stated above; it requires us to mix symbols from two classes and that may have wildly different channels. Thus, this will undesirably upgrade the b-channel.
In practice, however, we may be confronted with a triplet of symbols with very close, but not identical, a-channel -values. To avoid numerical issues, we utilize a fourth nearby symbol. Say that our triplet is with a-channel -values such that , for some “closeness” threshold . (To simplify notation, we omit the dependence on the class; it is clear that we do this for each class separately.) Let have a-channel -value such that . Then, we apply upgrade-merge-3 twice: first for obtaining with a-channel -values and then for , ending up with with a-channel -values . In this example we have chosen a fourth symbol with a greater a-channel -value than , but we could have similarly chosen a fourth symbol with a smaller a-channel -value than instead.
VI-B Upgrading Channel
We now show how to upgrade to channel such that and . The idea is to begin with , a channel equivalent to in which and are not explicit in the output. The channel is given by . We upgrade to using some known method, such that channel degrades to . To form upgraded channel , we “split” the outputs of to include and and find a degrading channel that degrades to . We shall see that the upgraded channel is given by
[TABLE]
where and are defined in (65), below. Finally, we form the joint channel using (10). We illustrate this in Figure 4.
Theorem 28**.**
Let be a joint channel where is the -value of the b-channel’s output. Let be a channel equivalent to , and let with degrading channel . Then there exists joint channel such that and .
Proof:
We shall explicitly find and an appropriate degrading channel. The degrading channel will be of the form , i.e., and pass through the degrading channel unchanged. Such degrading channels are proper. Since we have, for any and ,
[TABLE]
Denote
[TABLE]
We assume that , for otherwise output never appears with positive probability and may be ignored, and define
[TABLE]
For each , we will shortly define constants such that and . Similar to (66), we use these constants to define channel by
[TABLE]
Indeed, . We now find the constants and an appropriate degrading channel such that
[TABLE]
which will establish our goal.
Let , and be such that the left-hand side of (68) is positive, so that . (Since , there will always be at least one selection of for which the left-hand side of (68) is positive.) We shall see that the resulting expressions hold for the zero case as well. Using (66) and (67), we can rewrite (68) as
[TABLE]
Comparing this with (64), we set
[TABLE]
It is easily verified that and . Using the expression for in (69) yields
[TABLE]
This is a valid probability distribution. We remark that (68) is satisfied by (70) and (71) even when . We have found and a proper degrading channel as required. ∎
Corollary 29**.**
In Theorem 28, the marginal a-channels of and coincide.
Proof:
By construction, the degrading channel from to does not change the a-channel output, implying that the a-channel marginal remains the same. ∎
To use Theorem 28, one begins with design parameter that controls the output alphabet size. The channel , with output alphabet of size , is obtained from using a sequence of upgrade operations. To obtain upgraded joint channel , one uses the Theorem to turn them into a sequence of upgrade operations to be performed on channel . If one uses the techniques of [9], the upgrade operations will consist of upgrade-merge-2 and upgrade-merge-3 operations (see Appendix C). In the following examples we apply Theorem 28 specifically to these upgrades.
For brevity, we will use the following notation:
[TABLE]
Example 5** (Upgrading Based on Upgrade-Merge-2).**
The upgrade-merge-2 procedure of [9] selects two conjugate symbol pairs and replaces them with a single conjugate symbol pair. The details of the transformation, in our notation, appear in Appendix C-A.
Let joint channel have b-channel marginal , in which all symbols with the same -value are combined to a single symbol. We select symbols and their respective conjugates , such that and upgrade to given by (75) (Appendix C-A). We denote by the output alphabet of and by the set
[TABLE]
The output alphabet of is ; outputs of represent -values. In particular, the -values of and are and , respectively.
Using Theorem 28, we form channel by
[TABLE]
where by (70),
[TABLE]
We can simplify this when is a symmetrized channel. In this case, , yielding
[TABLE]
Therefore, the upgraded joint channel becomes
[TABLE]
where
[TABLE]
Example 6** (Upgrading Based on Upgrade-Merge-3).**
The upgrade-merge-3 procedure replaces three conjugate symbol pairs with two conjugate symbol pairs. The details of the transformation, in our notation, appear in Appendix C-B.
As above, let joint channel have b-channel marginal . For the upgrade procedure we select symbols and their respective conjugates, such that . (We could have also selected them such that . At least one of the inequalities or must be strict.) We upgrade to given by (76) (Appendix C-B). We denote by the output alphabet of and by the set
[TABLE]
The output alphabet of is ; outputs of represent -values. In particular, the -values of and are and , respectively.
Assuming that is symmetrized, we form channel using Theorem 28 as
[TABLE]
where by (70),
[TABLE]
and , . The latter two equalities are due to our assumption that is symmetrized.
Denoting
[TABLE]
the upgraded joint channel is given by
[TABLE]
Remark 7*.*
We observe from these examples an interesting parallel between the a-channel and b-channel upgrading procedures. In the former case, we confine upgrade operations to a single class, in which the b-channel -values are fixed. In light of the above examples, the latter case may be viewed as confining upgrade procedures to “classes” in which and are fixed.
VII Lower Bound Procedure
The previous sections have introduced several ingredients for building an overall procedure for obtaining a lower bound on the probability of error of polar codes under SC decoding. We now combine these ingredients and present the overall procedure. First, we lower-bound the probability of error of two synthetic channels. Then, we show how to use lower bounds on channel pairs to obtain better lower bounds on the union of many error events.
VII-A Lower Bound on the Joint Probability of Error of Two Synthetic Channels
We now present an upgrading procedure for that results in channel with a smaller alphabet size. The procedure leverages the recursive nature of polar codes.
The input to our procedure is BMS channel , the number of polarization steps , the indices and of the a-channel and b-channel, respectively, and parameters and that control the output alphabet sizes of the a- and b-channels, respectively. The binary expansions of and are and , respectively. These expansions specify the order of polarization transforms to be performed, where $0$ implies a ‘$-$’-transform and $1$ implies a ‘$+$’-transform.
The algorithm consists of a sequence of polarization and upgrading steps. After each polarization step, we bring the channel to -value representation, as described in Section IV-A. A side effect of polarization is an increase in alphabet size. The upgrading steps prevent the alphabet size of the channels from growing beyond a predetermined size. After the final upgrading step we obtain joint channel , which is properly upgraded from . We compute , which serves as a lower bound to . We recall that is the probability of error under SC decoding of the joint synthetic channel . This, in turn, lower-bounds (see Corollary 17).
Algorithm 1 provides a high-level description of the procedure. We begin by determining the first index for which and differ (i.e., for and ). The first polarization steps are of a single channel, as the a-channel and b-channel indices are the same. Since these are single channels, we utilize the upgrading procedures of [9] to reduce the output alphabet size. At the th polarization step, the a- and b-channels differ. We perform the joint polarization described in Section IV-B and symmetrize the joint channel using (24). This symmetrization need only be performed once, as subsequent polarizations maintain symmetrization (Proposition 16). We then perform the b-channel upgrading procedure (Section VI-B), which reduces the b-channel alphabet size to . Following that, we upgrade the a-channel. As discussed in Section VI-A, this consists of two steps. First, we upgrade-couple the channel, to generate classes. Second, for each class separately, we use the a-channel upgrade procedure until each class has at most elements (see Theorem 26 and Corollary 27). We confine the a-channel upgrade procedure to the class by utilizing only upgrade-merge-3 operations. We continue to polarize and upgrade the joint channel in this manner, until . After the final polarization and upgrading operation, we compute the probability of error of the IMJP decoder for the resulting channel.
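The first stage of this description, locating the step at which the two index expansions part ways, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, indices are assumed to be 0-based integers, and the expansion is read MSB first, in line with the convention used in the proof of Lemma 32.

```python
def binary_expansion(index, n):
    """Return the n-bit expansion of index as a tuple, most significant bit first."""
    return tuple((index >> (n - 1 - t)) & 1 for t in range(n))


def first_differing_step(a, b, n):
    """Return the 1-based polarization step at which indices a and b first differ.

    Returns None when a == b (both indices name the same synthetic channel).
    """
    bits_a = binary_expansion(a, n)
    bits_b = binary_expansion(b, n)
    for step, (bit_a, bit_b) in enumerate(zip(bits_a, bits_b), start=1):
        if bit_a != bit_b:
            return step
    return None


def schedule(a, b, n):
    """Label each polarization step as 'single' (shared prefix) or 'joint'."""
    m = first_differing_step(a, b, n)
    if m is None:
        return ["single"] * n
    return ["single"] * (m - 1) + ["joint"] * (n - m + 1)
```

For instance, with n = 4 the indices 0b0101 and 0b0110 share their first two bits, so the first two steps act on a single channel and the remaining two on the joint channel.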
The lower bound of this procedure compares favorably with the trivial lower bound, . This is because our upgrading procedure only ever changes one marginal, keeping the other intact. Since it leverages upgrading transforms that can be used on single channels, the marginal channels obtained are the same as would be obtained on single channels using the same upgrading steps. Thus, by Lemma 7 this lower bound is at least as good as .
Remark 8*.*
When the BMS is a BEC, we can recover the bounds of [10] and [12] using our upgrading procedure. Only a-channel upgrades are required, as the b-channel, in -value representation, remains a BEC. For each a-channel symbol, the channel in (26) is either a perfect channel or a pure-noise channel (see Lemma 32 in Appendix A). Thus, the upgrade-couple procedure splits the a-channel symbols to those that see a perfect channel regardless of and those that see a pure-noise channel regardless of . Merging a-channel symbols of the same class is equivalent to merging a-channel symbols for which is the same type of channel. We thus merge a-channel symbols of the same a-channel -value that “see” the same type of b-channel. This corresponds to keeping track of the correlation between erasure events of the two channels.
Remark 9*.*
An initial step of Algorithm 1 is to upgrade the channel , even before any polarization operations. This step enables us to apply our algorithm on continuous-output channels, see [9, Section VI].
VII-B Lower Bound for More than Two Synthetic Channels
Recall that the probability of error of polar codes under SC decoding may be expressed as . In the previous section, we developed a lower bound on , , which lower bounds . This lower bound may be strengthened by considering several pairs of synthetic channels and using (4). We now show how this can be done.
Lemma 30**.**
The probability of error of a union of events, is lower bounded by
[TABLE]
Proof:
The proof hinges on using the identity in (4). Note that any set of numbers satisfies
[TABLE]
so that
[TABLE]
Therefore,
[TABLE]
Using this in (4) yields the desired bound. ∎
In practice, we combine the lower bound of Lemma 30 with (15). That is, we compute lower bounds on for all pairs of channels in some subset of the non-frozen set, and use Lemma 30 over this subset.
Such bounds are highly dependent on the selection of the subset . One possible strategy is as follows. Let be the set of worst synthetic channels in the non-frozen set for some . For each channel pair in , compute a lower bound on the joint probability of error using Algorithm 1. Then, form all possible subsets of (there are such subsets) and use Lemma 30 for each subset. Choose the subset with the highest lower bound as . The reason for going over all possible subsets is that bounds based on the inclusion-exclusion principle are not guaranteed to be higher than the highest pairwise probability, see [15].
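The subset search can be sketched as below. The displayed form of Lemma 30 is not reproduced in this text, so the sketch assumes the following reading: starting from the second-order inclusion-exclusion inequality and substituting P(A_i ∩ A_j) = P(A_i) + P(A_j) − P(A_i ∪ A_j), a union of m events is bounded from below by the sum of all pairwise-union probabilities minus (m − 2) times the sum of the single-event probabilities. The function names and the dictionary layout are hypothetical.

```python
from itertools import combinations


def pairwise_union_bound(singles, pair_unions, subset):
    """Lower bound on P(union of A_i, i in subset) for a subset of size >= 2.

    singles[i]          -- P(A_i), or an upper bound on it
    pair_unions[(i, j)] -- P(A_i or A_j) with i < j, or a lower bound on it
    """
    members = sorted(subset)
    m = len(members)
    if m < 2:
        raise ValueError("subset must contain at least two events")
    pair_sum = sum(pair_unions[(i, j)] for i, j in combinations(members, 2))
    single_sum = sum(singles[i] for i in members)
    return pair_sum - (m - 2) * single_sum


def best_subset(singles, pair_unions):
    """Exhaustively search all subsets of size >= 2 for the largest bound."""
    indices = sorted(singles)
    best_value, best_members = float("-inf"), ()
    for size in range(2, len(indices) + 1):
        for subset in combinations(indices, size):
            value = pairwise_union_bound(singles, pair_unions, subset)
            if value > best_value:
                best_value, best_members = value, subset
    return best_value, best_members
```

Subsets of size two reduce to the pairwise-union probability itself, so the search never returns less than the best single pair. This matters because adding events can lower the bound: for three identical events of probability 0.5 the three-event expression evaluates to 0, while any pair gives 0.5.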
VIII Implementation
Our implementation of Algorithm 1, in C++, is available for download at [16]. In this section we provide some details on the implementation.
A naive implementation of Algorithm 1 is to perform all steps successively at each iteration. That is, first jointly polarize the joint channel, then bring the channel to -value representation, followed by the b-channel upgrade procedure and the upgrade-couple procedure, and finally perform the a-channel upgrade procedure. One quickly finds out, however, a limitation posed by this approach: the memory required to store the outcomes of these stages becomes prohibitively large when the alphabet-size control parameters and grow.
Observe, however, that the total required memory at the end of each iteration of Algorithm 1 is actually quite small. We need only store the values of for each value of (a total of combinations), a mapping between and its conjugate , and a list of size that stores the possible b-channel -values. Then, we can compute using (59), (60), and Corollary 22. Thus, our data structure for an upgrade-coupled joint channel utilizes a three-dimensional matrix of size to store (specifically, we use the cube data structure provided by [17]). As for the mapping between and its conjugate, if is stored in element (y,i,j) of the matrix, and y is even, then is stored in element (y+1,i,j). We store the absolute values of the b-channel -values in a vector of length .
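The indexing convention for conjugates can be illustrated with a small stand-in for the three-dimensional array. This is a hypothetical sketch, not the C++ data structure itself: the text states only that an even index y has its conjugate at y + 1, and the sketch assumes the converse (odd y pairs with y − 1), which makes the mapping a flip of the lowest bit. The dimensions and the meaning of the second and third indices are left abstract.

```python
def conjugate_index(y):
    """Index of the conjugate symbol: even y pairs with y + 1, odd y with y - 1."""
    return y ^ 1


def make_cube(n_rows, n_cols, n_slices, fill=0.0):
    """Nested-list stand-in for a dense 3-D array indexed as cube[y][i][j]."""
    return [[[fill] * n_slices for _ in range(n_cols)] for _ in range(n_rows)]


def conjugate_entry(cube, y, i, j):
    """Read the entry stored for the conjugate of symbol y at the same (i, j)."""
    return cube[conjugate_index(y)][i][j]
```

Storing conjugates in adjacent rows means no separate lookup table is needed for the mapping between a symbol and its conjugate.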
The second key observation is that each upgrading procedure only ever changes one marginal. That is, the a-channel upgrading procedure leaves the marginal b-channel unchanged, and the b-channel upgrading procedure does not affect the marginal a-channel. Thus, since our upgrading procedures leverage upgrading procedures for single channels, we can pre-compute the upgraded marginal channels. In essence, given a target upgraded marginal channel (computed beforehand using the techniques of [9]), our upgrading procedures “split” the probability of an output symbol among two absorbing symbols. The “splitting” factors are functions of the -values of the three symbols (see Appendix C). Indeed, we compute beforehand the polarized and upgraded marginal channels.
The joint polarization step maps each pair of symbols, and to up to four polarized counterparts (see Section IV-B). Knowing beforehand what the upgraded marginal channels should be, we can directly split each polarized symbol into the relevant absorbing symbols. We incorporate the upgrade-couple operation into this by utilizing the factor from (56).
Thus, in our implementation, rather than performing each step of an iteration in its entirety, we perform all steps in one fell swoop. This sidesteps the memory-intensive step of computing the upgrade-coupled jointly polarized channel. The interested reader is urged to look at our source code for further details.
Remark 10*.*
The description here was given in terms of -values, in line with the exposition in this paper. However, for numerical purposes we recommend, and use, likelihood ratios in practical implementation. Likelihood ratios have a greater dynamic range than that of -values, and therefore offer better numerical precision. (As an example, two very different likelihood ratios: and , cannot be differentiated in double precision upon conversion to -values.) There is a one-to-one correspondence between -values and likelihood ratios (see Appendix B), and all -value based formulas are easily translated to their likelihood ratio counterparts.
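The loss of precision is easy to reproduce. The sketch below assumes that the -value of an output is the normalized likelihood difference (W(y|0) − W(y|1)) / (W(y|0) + W(y|1)), so that a likelihood ratio λ = W(y|0)/W(y|1) maps to (λ − 1)/(λ + 1). The two ratios used are hypothetical stand-ins, not the values quoted in the remark.

```python
def d_from_lr(lr):
    """Map a likelihood ratio W(y|0)/W(y|1) to (lr - 1) / (lr + 1)."""
    return (lr - 1.0) / (lr + 1.0)


# Hypothetical values chosen for illustration only.
lr_small, lr_large = 1e20, 1e30
d_small, d_large = d_from_lr(lr_small), d_from_lr(lr_large)

print(lr_small == lr_large)  # prints False: the ratios are ten orders of magnitude apart
print(d_small == d_large)    # prints True: both values round to exactly 1.0
```

Both ratios are far larger than the reciprocal of the double-precision machine epsilon, so adding or subtracting 1 leaves them unchanged and the quotient is exactly 1.0 in each case.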
IX Numerical Results
Figures 1 and 5 present numerical results of our bound for two cases. In both cases, we designed a polar code for a specific BSC, and then assessed its performance when used over different BSCs. Specifically:
- Figure 1: A code of length , rate , designed for a BSC with crossover probability .
- Figure 5: A code of length , rate , designed for a BSC with crossover probability .
The codes were designed using the techniques of [9] with quantization levels. The non-frozen set consisted of the channels with smallest probability of error. This non-frozen set was fixed.
For each code, we plot three bounds on the probability of error, when used over specific BSCs: an upper bound on the probability of error, the trivial lower bound on the probability of error, and the new lower bound on the probability of error presented in this paper.
For the upper bound, we computed an upper bound on , and for the trivial lower bound we computed a lower bound on ; upper and lower bounds on the probability of error of single channels (i.e., on ) were obtained using the techniques of [9]. The new lower bound is based on the IMJP decoder, as described in this paper. We computed the IMJP decoding error, with for all possible pairs of the worst channels in the non-frozen set. (Note that there is a different set of worst channels for each crossover probability. For each crossover probability, we selected the channels in the (fixed) non-frozen set with the highest upper bound on decoding error when used over a BSC with that crossover probability.) We then used Lemma 30, computed for the subset of these channels that yielded the highest bound; this provides a significantly improved bound over the bound given by the worst-performing pair. The computation utilized [18] for parallel computation of the IMJP decoding error over different channel pairs.
As one may observe, our bounds improve upon the previously known lower bound (3). In fact, they are quite close to the upper bound on the probability of error. This provides strong numerical evidence that error events of channel pairs dominate the error probability of polar codes under SC decoding.
X Discussion and Outlook
This research was inspired by [12], which showed that — for the BEC — the union bound on the probability of error of polar codes under SC decoding is asymptotically tight. The techniques of [12] hinged on the property that a polarized BEC is itself a BEC. Or, put another way, that the family of binary erasure channels is closed under the polar transform. This property enabled the authors to directly track the joint probability of erasure during the polarization process and bound its rate of decay. Unfortunately, this property is not shared by other channel families.
Design of polar codes for channel coding is based on selecting a set of indices to be frozen. One design rule is to select the worst-performing indices as the frozen set. For example, for a code of length and rate , choose the indices with the highest probability of error (such channels can be identified using the techniques of [9]). This design rule optimizes the union bound on the probability of error of polar codes, (2). As Parizi and Telatar have shown in [12], for the BEC such a design rule is essentially optimal. It is an open question whether a similar claim can be made for other BMS channel families.
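For the BEC this design rule can be carried out exactly, since each synthetic channel is again a BEC. The following sketch is illustrative only: the function names are hypothetical, erasure probability is used as the per-channel figure of merit, and index bits are read MSB first with 0 denoting a ‘−’ step and 1 a ‘+’ step.

```python
def bec_synthetic_erasure_probs(eps, n):
    """Erasure probabilities of the 2**n synthetic channels of a BEC(eps)."""
    probs = [eps]
    for _ in range(n):
        nxt = []
        for z in probs:
            nxt.append(2 * z - z * z)  # '-' step: erased if either copy erases
            nxt.append(z * z)          # '+' step: erased only if both copies erase
        probs = nxt
    return probs


def design(eps, n, k):
    """Keep the k best indices as the non-frozen set and freeze the rest.

    Returns the non-frozen set, the sum of its erasure probabilities (a union
    bound on the event that some non-frozen channel erases), and the largest
    single erasure probability in the set (a lower bound on the same event).
    """
    probs = bec_synthetic_erasure_probs(eps, n)
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    non_frozen = sorted(order[:k])
    union_bound = sum(probs[i] for i in non_frozen)
    worst = max(probs[i] for i in non_frozen)
    return non_frozen, union_bound, worst
```

With eps = 0.5, n = 2 and k = 2, the non-frozen set is {2, 3}; the two quantities returned are 0.5 and 0.4375, and the gap between them is what a pairwise analysis aims to narrow.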
As our numerical results show, below a certain crossover probability the upper bound and our lower bound all but coincide, with a significant gap to the trivial lower bound. Thus, we conjecture that the ratio between the union bound and the actual probability of error approaches $1$ asymptotically for any BMS channel. This will imply the essential optimality of the union bound as a design rule. Moreover, we believe that the tools developed in this research are key to proving this conjecture.
One possible approach is to track analytically the evolution of joint error probabilities during the polarization process. The symmetrization transformation and the resultant decoupling decomposition bring joint channels to a form more amenable to analysis. One may look at, for example, the Bhattacharyya parameter of the channel from (28), when are fixed,
[TABLE]
This quantity, together with the Bhattacharyya parameters of the a-channel, may be used to bound . Tracking the evolution of these parameters — or bounds on them — may enable the study of the decay of (if indeed there is such decay). In fact, it can be shown that applying the above suggestion to the BEC coincides with the approach of [12].
Interestingly, our bounds are tight despite the various manipulations they perform on the joint channel. The joint channels that result from our procedure are very different from the actual joint channel, yet have no effect on the marginal distributions. This curious outcome merits further research on the upgrade-couple transform and its effect on the joint channel.
There are several additional avenues of further research. These include:
- Our results apply only to BMS channels. It would be interesting to extend them to richer settings, such as channels with non-binary input, or non-symmetric channels.
- This research has concentrated on SC decoding. Can it be expanded/applied to other decoding methods for polar codes (e.g., successive cancellation list (SCL) decoding [20])? A logical first step in analyzing SCL decoding is to look at pairs of error events, as done here.
Acknowledgment
The assistance of Ina Talmon is gratefully acknowledged.
Appendix A The IMJP decoder for a BEC
In the special case where is a BEC and and are two of its polar descendants, we have the following.
Proposition 31**.**
Let and be two polar descendants of a BEC in the same tier. Then, the IMJP and the IML (SC) decoders coincide.
To prove this, we first show that for the BEC erasures are determined by the received channel symbols, , and not previous bit decisions. This implies that for fixed , regardless of and in particular , either channel always experiences an erasure, or always experiences a non-erasure. If experiences an erasure, it doesn’t matter what decides in terms of the IMJP decoder – it may as well use an ML decoder; if does not experience an erasure, then the best bet of is to use an ML decoder. This suggests that the IML and IMJP decoders coincide.
Lemma 32**.**
Let be a polar descendant of a BEC, . Then, there exists a set , dependent only on , such that has an erasure if and only if .
Proof:
Here, are the received channel symbols, and the previous bit decisions that are part of ’s output. Let be the binary expansion of , with the MSB. Recall that channel is the result of polarization steps determined by , where is a ‘’-transform and is a ‘’-transform.
Consider first the case where , i.e., . If then has an erasure if and only if at least one of is an erasure, i.e., if and only if , . If then has an erasure if and only if both and are erasures, i.e., if and only if , . Therefore, the claim is true for .
We proceed by induction. Let the claim be true for : for , there exists a set such that has an erasure if and only if . If , then is the result of a ‘’-transform of two BEC channels , so it has an erasure if and only if at least one of them erases. In other words, has an erasure if and only if , . If, however, , then is the result of a ‘’-transform of two BEC channels , so it has an erasure if and only if both of them erase. In other words, has an erasure if and only if , . Thus, the claim is true for as well, completing the proof. ∎
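The induction can be mirrored by a short recursion that decides, from the erasure pattern of the received symbols alone, whether a given synthetic channel erases. This is a sketch under an assumed output ordering (the two underlying copies see the first and second halves of the received vector, respectively); the function names are hypothetical. Note that previous bit decisions never enter the computation, which is the content of the lemma.

```python
from itertools import product


def erases(flags, bits):
    """True if the synthetic channel indexed by bits erases for this pattern.

    flags -- tuple of 2**len(bits) booleans, True where a received symbol is erased
    bits  -- index bits, MSB first; 0 is a '-' step, 1 is a '+' step
    """
    if not bits:
        return flags[0]
    half = len(flags) // 2
    left = erases(flags[:half], bits[:-1])
    right = erases(flags[half:], bits[:-1])
    # '-' step: erased if at least one copy erases; '+' step: only if both do.
    return (left or right) if bits[-1] == 0 else (left and right)


def erasure_set(bits):
    """Enumerate all erasure patterns for which the channel erases."""
    n = len(bits)
    return {flags for flags in product((False, True), repeat=2 ** n)
            if erases(flags, bits)}
```

For example, for index bits (0, 1) the set contains 9 of the 16 possible patterns, which for erasure probability 1/2 matches (2ε − ε²)² = 9/16.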
Proof of Proposition 31:
By Lemma 3, a decoder that minimizes is an ML decoder. It remains to show that a minimizing is also an ML decoder. Marginalizing the joint channel (10) yields :
[TABLE]
The ML decoder for channel maximizes with respect to ; decoder , on the other hand, maximizes , defined in (12). Using (10) we recast the expression for in the same form as the expression for ,
[TABLE]
By Lemma 32, whether has an erasure depends solely on the received channel symbols, which are wholly contained in , and not on previous bit decisions. In particular, in computing or , we either sum over only erasure symbols or over only non-erasure symbols. Since is an ML decoder for , if is an erasure of then ; if is not an erasure of then . In either case, it is clear that the decision based on (11) is identical to the ML decision. Therefore, is an ML decoder as well, implying that the IMJP decoder is an IML decoder. ∎
Appendix B Introduction to $D$-values
The decision of an ML decoder for a memoryless binary-input channel $W$ may be based on any sufficient statistic of the channel output. One well-known sufficient statistic is the log-likelihood ratio (LLR), $l(y) = \log \frac{W(y|0)}{W(y|1)}$. When $l(y)$ is positive, the decoder declares that $0$ was transmitted; when $l(y)$ is negative, the decoder declares that $1$ was transmitted; $l(y) = 0$ constitutes an erasure, at which the decoder makes some random choice. Another sufficient statistic is the $D$-value.
The $D$-value of output $y$, $d(y)$, is given by
$$d(y) = \frac{W(y|0) - W(y|1)}{W(y|0) + W(y|1)}. \tag{73}$$
Clearly, $-1 \leq d(y) \leq 1$. A maximum likelihood decoder makes its decision based on the sign of the $D$-value. Assuming a symmetric channel input, with probability $1/2$, using Bayes’ law on (73) yields
$$d(y) = P(X = 0 \mid Y = y) - P(X = 1 \mid Y = y). \tag{74}$$
The input is binary, hence $P(X = 0 \mid Y = y) + P(X = 1 \mid Y = y) = 1$. Consequently (74) yields
$$P(X = 0 \mid Y = y) = \frac{1 + d(y)}{2}, \qquad P(X = 1 \mid Y = y) = \frac{1 - d(y)}{2}.$$
There is a one-to-one correspondence between $d(y)$ and $l(y)$: $d(y) = \tanh(l(y)/2)$, or, equivalently, $l(y) = \log \frac{1 + d(y)}{1 - d(y)}$.
If channel $W$ is symmetric, for each output $y$ there is a conjugate output $\bar{y}$; their LLRs and $D$-values are related: $l(\bar{y}) = -l(y)$ and $d(\bar{y}) = -d(y)$.
Since the $D$-value is a sufficient statistic of a BMS channel, we may replace the channel output with its $D$-value. Thus, we may assume that the output of channel $W$ is a $D$-value, i.e., $y \in [-1, 1]$. In this case, we say that $W$ is in $D$-value representation.
Recall that every BMS channel can be decomposed into BSCs [19, Theorem 2.1]. We can think of the output of a BMS as consisting of the “reliability” of the BSC and its output. The absolute value of the $D$-value corresponds to the BSC’s reliability and its sign to the BSC output ($0$ or $1$).
A comprehensive treatment of $D$-values and LLRs in relation to BMS channels appears in [13, Chapter 4].
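The correspondences described in this appendix can be collected in a few helper functions. This is a minimal sketch, assuming the statistic in question is the normalized likelihood difference d = (W(y|0) − W(y|1)) / (W(y|0) + W(y|1)), so that d = tanh(l/2) for LLR l and a BSC with crossover probability p ≤ 1/2 produces the values ±(1 − 2p). All function names are hypothetical.

```python
import math


def llr_to_d(llr):
    """Map a log-likelihood ratio to the interval [-1, 1]."""
    return math.tanh(llr / 2.0)


def d_to_llr(d):
    """Inverse map; infinite at the endpoints d = -1 and d = 1."""
    if abs(d) >= 1.0:
        return math.copysign(math.inf, d)
    return 2.0 * math.atanh(d)


def d_to_bsc(d):
    """Split a value in [-1, 1] into (BSC crossover probability, hard decision).

    A value of exactly 0 is an erasure; the decision returned for it is arbitrary.
    """
    crossover = (1.0 - abs(d)) / 2.0
    decision = 0 if d >= 0 else 1
    return crossover, decision


def bsc_to_d(p, output_bit):
    """Value produced by a BSC with crossover probability p <= 1/2."""
    magnitude = 1.0 - 2.0 * p
    return magnitude if output_bit == 0 else -magnitude
```

For example, a BSC with crossover probability 0.1 yields the values ±0.8, and the corresponding LLRs are ±log 9.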
Appendix C Upgrades of a BMS Channel
We state here in our notation the two upgrades of a BMS channel from [9].
Let be a discrete BMS whose outputs are -values , and let the probability of symbol be , . Without loss of generality, . Clearly, for all , and . Moreover, . Namely, this is a BMS that decomposes to different BSCs, with crossover probabilities , . BSC channel is selected with probability . We have and .
C-A The Upgrade-merge-2 Procedure
The first upgrade-merge of [9] takes two -values and merges them by transferring the probability of to . We call it upgrade-merge-2. Channel is upgraded to channel ; the output alphabet of is and
[TABLE]
where
[TABLE]
The degrading channel from to is shown in Figure 6a. We show only the portion of interest, i.e., we do not show the symbols that this degrading channel does not change. The parameters of the degrading channel are
[TABLE]
Indeed, and , so this constitutes a valid channel. Note that if then .
C-B The Upgrade-merge-3 Procedure
The second upgrade-merge of [9] removes a -value by splitting its probability between a preceding -value and a succeeding -value . We call it upgrade-merge-3. Unlike upgrade-merge-2, at least one of these inequalities must be strict (i.e., either or ). Channel is upgraded to channel with output alphabet and
[TABLE]
where
[TABLE]
Note that
[TABLE]
The degrading channel from to is shown in Figure 6b, showing only the interesting portion of the channel. The parameters of the channel are , and , . This is a valid channel as .
It can be shown [9, Lemma 12] that . That is, upgrade-merge-3 yields a better (closer) upgraded approximation of than does upgrade-merge-2.
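Both merges can be sketched on the BSC decomposition of the channel. The displayed expressions (75) to (77) are not reproduced in this text, so the sketch below rests on an assumed form: each component is a pair (d, q) with reliability d ≥ 0 and selection probability q, upgrade-merge-2 moves all of the probability of the less reliable component onto the more reliable one, and upgrade-merge-3 splits the middle component linearly so that the probability-weighted reliability is preserved. Function names are hypothetical.

```python
def upgrade_merge_3(channel, i, j, k):
    """Remove component j by splitting its probability between components i and k.

    channel -- list of (d, q) pairs; requires d_i <= d_j <= d_k with d_i < d_k
    """
    (d_i, q_i), (d_j, q_j), (d_k, q_k) = channel[i], channel[j], channel[k]
    if not (d_i <= d_j <= d_k and d_i < d_k):
        raise ValueError("need d_i <= d_j <= d_k with d_i < d_k")
    span = d_k - d_i  # this division is what becomes ill-conditioned
    to_k = (d_j - d_i) / span
    to_i = 1.0 - to_k
    merged = list(channel)
    merged[i] = (d_i, q_i + to_i * q_j)
    merged[k] = (d_k, q_k + to_k * q_j)
    del merged[j]
    return merged


def upgrade_merge_2(channel, i, j):
    """Move all of component i's probability onto the more reliable component j."""
    (d_i, q_i), (d_j, q_j) = channel[i], channel[j]
    if d_i > d_j:
        raise ValueError("need d_i <= d_j")
    merged = list(channel)
    merged[j] = (d_j, q_j + q_i)
    del merged[i]
    return merged


def error_probability(channel):
    """ML error probability of the mixture: each BSC errs with probability (1 - d) / 2."""
    return sum(q * (1.0 - d) / 2.0 for d, q in channel)
```

On the three-component channel [(0.2, 0.25), (0.5, 0.5), (0.8, 0.25)], the sketched upgrade-merge-3 leaves the error probability at 0.25, whereas upgrade-merge-2 applied to the first two components lowers it to 0.2125. This is in line with upgrade-merge-3 giving the closer upgraded approximation.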
References
- [1] E. Arıkan, “Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.
- [2] ——, “Source polarization,” in 2010 IEEE International Symposium on Information Theory, June 2010, pp. 899–903.
- [3] S. B. Korada, “Polar codes for channel and source coding,” Ph.D. dissertation, Ecole Polytechnique Fédérale de Lausanne, 2009.
- [4] E. Şaşoğlu, “Polarization and polar codes,” Foundations and Trends® in Communications and Information Theory, vol. 8, no. 4, pp. 259–381, 2011.
- [5] J. Honda and H. Yamamoto, “Polar coding without alphabet extension for asymmetric models,” IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 7829–7838, December 2013.
- [6] E. Şaşoğlu, “Polarization in the presence of memory,” in 2011 IEEE International Symposium on Information Theory Proceedings, July 2011, pp. 189–193.
- [7] E. Şaşoğlu and I. Tal, “Polar coding for processes with memory,” in 2016 IEEE International Symposium on Information Theory (ISIT). IEEE, 2016, pp. 225–229.
- [8] B. Shuval and I. Tal, “Fast polarization for processes with memory,” 2017. [Online]. Available: arXiv:1710.02849
