Greedy-Merge Degrading has Optimal Power-Law

Assaf Kartowsky; Ido Tal

arXiv:1701.02119·cs.IT·May 9, 2017


TL;DR

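This paper proves that the greedy-merge algorithm, which degrades a channel to at most L output letters, reduces the mutual information by at most O(L^(-2/(|X|-1))) for a fixed input alphabet size |X|. This matches an algorithm-independent lower bound up to a constant factor, so greedy-merge is optimal in the power-law sense.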

Contribution

The paper establishes that the greedy-merge algorithm achieves an optimal power-law bound on mutual information reduction, matching a fundamental lower bound.

Findings

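1. The upper bound on the mutual information lost by greedy-merge is within a constant factor (dependent on the input alphabet size) of an algorithm-independent lower bound.

2. The bounds on mutual information reduction are tight in the power-law sense.

3. The results hold for a fixed input alphabet size and a varying output alphabet size L.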

Abstract

Consider a channel with a given input distribution. Our aim is to degrade it to a channel with at most L output letters. One such degradation method is the so called "greedy-merge" algorithm. We derive an upper bound on the reduction in mutual information between input and output. For fixed input alphabet size and variable L, the upper bound is within a constant factor of an algorithm-independent lower bound. Thus, we establish that greedy-merge is optimal in the power-law sense.

Equations

$$I(W,P_{X})\triangleq I(X;Y)=\sum_{x\in\mathcal{X}}\eta(\pi(x))-\sum_{x\in\mathcal{X},\,y\in\mathcal{Y}}\pi(y)\,\eta(W(x|y)),$$

$$Q(z|x)=\sum_{y\in\mathcal{Y}}W(y|x)\,\Phi(z|y),$$

$$\Delta I^{\ast}\triangleq\min_{Q,\Phi\,:\,Q\preccurlyeq W,\;|Q|\leq L}I(W,P_{X})-I(Q,P_{X}),$$

$$\Delta I\leq\mu(|\mathcal{X}|)\cdot|\mathcal{Y}|^{-\frac{|\mathcal{X}|+1}{|\mathcal{X}|-1}},$$

$$\mu(|\mathcal{X}|)\triangleq\frac{\pi|\mathcal{X}|}{\left(\sqrt{1+\frac{1}{2(|\mathcal{X}|-1)}}-1\right)^{2}}\left(\frac{2|\mathcal{X}|}{\Gamma\left(1+\frac{|\mathcal{X}|-1}{2}\right)}\right)^{\frac{2}{|\mathcal{X}|-1}},$$

$$\Delta I^{\ast}=\min_{Q,\Phi\,:\,Q\preccurlyeq W,\;|Q|\leq L}I(W,P_{X})-I(Q,P_{X})=O\left(L^{-\frac{2}{|\mathcal{X}|-1}}\right)\;.$$

$$\begin{aligned}\Delta I^{\ast}&\leq\mu(|\mathcal{X}|)\int_{L}^{|\mathcal{Y}|}\ell^{-\frac{|\mathcal{X}|+1}{|\mathcal{X}|-1}}\,\mathrm{d}\ell\\&\leq\nu(|\mathcal{X}|)\cdot L^{-\frac{2}{|\mathcal{X}|-1}},\end{aligned}$$

$$\pi_{ab}=\pi(y_{ab}),\quad\pi_{a}=\pi(y_{a}),\quad\pi_{b}=\pi(y_{b}),$$

$$\gamma_{x}=Q(x|y_{ab})=\frac{\pi_{a}\alpha_{x}+\pi_{b}\beta_{x}}{\pi_{ab}}=\frac{\pi_{a}\alpha_{x}+\pi_{b}\beta_{x}}{\pi_{a}+\pi_{b}}\;.$$

$$\Delta I=I(W,P_{X})-I(Q,P_{X})=\sum_{x\in\mathcal{X}}\Delta I_{x},$$

$$\Delta I_{x}\leq(\pi_{a}+\pi_{b})\cdot d_{1}(\alpha_{x},\beta_{x}),$$

$$\begin{aligned}\Delta I_{x}&\overset{(a)}{\leq}\pi_{a}\eta^{\prime}(\alpha_{x})(\gamma_{x}-\alpha_{x})+\pi_{b}\eta^{\prime}(\beta_{x})(\gamma_{x}-\beta_{x})\\&\overset{(b)}{=}\frac{\pi_{a}\pi_{b}}{\pi_{a}+\pi_{b}}(\alpha_{x}-\beta_{x})(\eta^{\prime}(\beta_{x})-\eta^{\prime}(\alpha_{x}))\\&\overset{(c)}{\leq}\frac{1}{4}(\pi_{a}+\pi_{b})(\alpha_{x}-\beta_{x})^{2}(-\eta^{\prime\prime}(\lambda)),\end{aligned}$$

$$\Delta I_{x}\leq(\pi_{a}+\pi_{b})\cdot d_{2}(\alpha_{x},\beta_{x}),$$

$$d_{2}(\alpha,\zeta)\triangleq\begin{cases}\dfrac{(\zeta-\alpha)^{2}}{\min(\alpha,\zeta)}&\alpha,\zeta>0,\\\infty&\text{otherwise}.\end{cases}$$

$$\Delta I_{x}\leq(\pi_{a}+\pi_{b})\cdot d(\alpha_{x},\beta_{x}),$$

$$d(\alpha,\zeta)\triangleq\min\left(d_{1}(\alpha,\zeta),d_{2}(\alpha,\zeta)\right)\;.$$

$$\Delta I\leq(\pi_{a}+\pi_{b})\,|\mathcal{X}|\cdot d(\boldsymbol{\alpha},\boldsymbol{\beta}),$$

$$d(\boldsymbol{\alpha},\boldsymbol{\zeta})\triangleq\max_{x\in\mathcal{X}}d(\alpha_{x},\zeta_{x})\;.$$

$$|\mathcal{Y}_{\mathrm{small}}|\geq\frac{|\mathcal{Y}|}{2}\;.$$

$$\Delta I\leq\frac{4|\mathcal{X}|}{|\mathcal{Y}|}\cdot d(\boldsymbol{\alpha},\boldsymbol{\beta})\;.$$

$$\mathcal{B}_{i}(\alpha,r)\triangleq\left\{\zeta\in\mathbb{R}:d_{i}(\alpha,\zeta)\leq r\right\},\quad i\in\{1,2\}\;.$$

$$\mathcal{B}_{1}(\alpha,r)=\left\{\zeta\in\mathbb{R}:-r\leq\zeta-\alpha\leq r\right\}$$

$$\mathcal{B}_{2}(\alpha,r)=\left\{\zeta\in\mathbb{R}:-\sqrt{r^{2}/4+\alpha\cdot r}+r/2\leq\zeta-\alpha\leq\sqrt{\alpha\cdot r}\right\}\;.$$

$$\mathcal{B}(\alpha,r)=\left\{\zeta\in\mathbb{R}:-\underline{\omega}(\alpha,r)\leq\zeta-\alpha\leq\overline{\omega}(\alpha,r)\right\},$$

$$\mathcal{B}(\boldsymbol{\alpha},r)\triangleq\left\{\boldsymbol{\zeta}\in\mathbb{R}^{|\mathcal{X}|}:d(\boldsymbol{\alpha},\boldsymbol{\zeta})\leq r\right\}\;.$$

$$\mathcal{B}(\boldsymbol{\alpha},r)=\left\{\boldsymbol{\zeta}\in\mathbb{R}^{|\mathcal{X}|}:-\underline{\omega}(\alpha_{x},r)\leq\zeta_{x}-\alpha_{x}\leq\overline{\omega}(\alpha_{x},r)\right\}\;.$$

$$\mathcal{B}_{\mathbb{K}}(\boldsymbol{\alpha},r)=\mathcal{B}(\boldsymbol{\alpha},r)\cap\mathbb{K}^{|\mathcal{X}|}\;.$$

$$\mathcal{C}(\boldsymbol{\alpha},r)=\left\{\boldsymbol{\zeta}\in\mathbb{K}^{|\mathcal{X}|}:\forall x\in\mathcal{X}^{\prime},\;-\omega^{\prime}(\alpha_{x},r)\leq\zeta_{x}-\alpha_{x}\leq\omega^{\prime}(\alpha_{x},r)\right\},$$


Taxonomy

Topics: Error Correcting Code Techniques · Advanced Wireless Communication Techniques · Cellular Automata and Applications

Full text

Greedy-Merge Degrading has Optimal Power-Law

Assaf Kartowsky and Ido Tal

Department of Electrical Engineering

Technion - Haifa 32000, Israel

E-mail: {kartov@campus, idotal@ee}.technion.ac.il

Abstract

Consider a channel with a given input distribution. Our aim is to degrade it to a channel with at most $L$ output letters. One such degradation method is the so called “greedy-merge” algorithm. We derive an upper bound on the reduction in mutual information between input and output. For fixed input alphabet size and variable $L$ , the upper bound is within a constant factor of an algorithm-independent lower bound. Thus, we establish that greedy-merge is optimal in the power-law sense.

I Introduction

In myriad digital processing contexts, quantization is used to map a large alphabet to a smaller one. For example, quantizers are an essential building block in receiver design, used to keep the complexity and resource consumption manageable. The quantizer used has a direct influence on the attainable code rate.

Another recent application is related to polar codes [1]. Polar code construction is equivalent to evaluating the misdecoding probability of each channel in a set of synthetic channels. This evaluation cannot be carried out naively, since the output alphabet size of a synthetic channel is intractably large. One approach to circumvent this difficulty is to degrade the evaluated synthetic channel to a channel with manageable output alphabet size [2][3][4][5][6][7].

Given a design parameter $L$ , we degrade an initial channel to a new one with output alphabet size at most $L$ . We assume that the input distribution is specified, and note that this degradation reduces the mutual information between the channel input and output. In both examples above, this reduction is roughly the loss in code rate due to quantization. We denote the smallest reduction possible by $\Delta I^{\ast}$ .

Let $|\mathcal{X}|$ denote the channel input alphabet size, and treat it as a fixed quantity. We show that for any input distribution and any initial channel, $\Delta I^{\ast}=O(L^{-2/(|\mathcal{X}|-1)})$ . Moreover, this bound is attained efficiently, by the greedy-merge algorithm [2][5]. This bound is tighter than the bounds derived in [3], [4], [5] and [6]. In fact, up to constant multipliers (dependent on $|\mathcal{X}|$ ), this bound is the tightest possible. Namely, [8] proves the existence of an input distribution and a sequence of channels for which $\Delta I^{\ast}=\Omega(L^{-2/(|\mathcal{X}|-1)})$ . Both bounds have $-2/(|\mathcal{X}|-1)$ as the power of $L$ , the same power-law. Note that for noisy channels and a relatively small $L$ our bound can be tightened [9]. See also [10], which is especially relevant in the context of small $L$ .

II Preliminaries

We are given an input distribution and a discrete memoryless channel (DMC) $W:\mathcal{X}\rightarrow\mathcal{Y}$ . Both $|\mathcal{X}|$ and $|\mathcal{Y}|$ are assumed finite. Let $X$ and $Y$ denote the random variables that correspond to the channel input and output, respectively. Denote the corresponding distributions $P_{X}$ and $P_{Y}$ . Let $W(y|x)\triangleq\mathbb{P}\left\{Y=y|X=x\right\}$ . For brevity, let $\pi(x)\triangleq\mathbb{P}\left\{X=x\right\}=P_{X}(x)$ . Assuming further that $\mathcal{X}$ and $\mathcal{Y}$ are disjoint, we abuse notation and denote $\mathbb{P}\left\{X=x|Y=y\right\}$ and $\mathbb{P}\left\{Y=y\right\}$ as $W(x|y)$ and $\pi(y)$ , respectively. Without loss of generality, $\pi(x)>0$ and $\pi(y)>0$ . We do not assume that $W$ is symmetric.

The mutual information between channel input and output is

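$$I(W,P_{X})\triangleq I(X;Y)=\sum_{x\in\mathcal{X}}\eta(\pi(x))-\sum_{x\in\mathcal{X},\,y\in\mathcal{Y}}\pi(y)\,\eta(W(x|y)),$$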

where $\eta(p)\triangleq-p\log p$ for $p>0$ , zero for $p=0$ , and the logarithm is natural (base $e$ ). We note that the input distribution does not necessarily have to be the one that achieves the channel capacity.
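As a concrete reading of the definition above, the following minimal Python sketch evaluates the mutual information in nats from the posteriors $W(x|y)$ and the output marginal $\pi(y)$ . The function names `eta` and `mutual_information`, and the layout `W[x][y]` for $W(y|x)$ , are illustrative assumptions and not notation from the paper.

```python
import math

def eta(p):
    """eta(p) = -p ln p for p > 0, and 0 for p = 0."""
    return -p * math.log(p) if p > 0 else 0.0

def mutual_information(W, px):
    """I(W, P_X) in nats, with W[x][y] = W(y|x) and px[x] = pi(x)."""
    nx, ny = len(W), len(W[0])
    # Output marginal: pi(y) = sum_x pi(x) W(y|x).
    py = [sum(px[x] * W[x][y] for x in range(nx)) for y in range(ny)]
    h_x = sum(eta(p) for p in px)
    h_x_given_y = 0.0
    for y in range(ny):
        if py[y] == 0:
            continue  # letters of zero probability contribute nothing
        for x in range(nx):
            posterior = px[x] * W[x][y] / py[y]  # W(x|y)
            h_x_given_y += py[y] * eta(posterior)
    return h_x - h_x_given_y

# Binary symmetric channel with crossover 0.1 and a uniform input.
bsc = [[0.9, 0.1], [0.1, 0.9]]
mi_bsc = mutual_information(bsc, [0.5, 0.5])
```

For the binary symmetric channel in the example, the value agrees with $\log 2$ minus the binary entropy of $0.1$ , as expected.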

We now define the relation of degradedness between channels. A channel $Q:\mathcal{X}\rightarrow\mathcal{Z}$ is said to be (stochastically) degraded with respect to a channel $W:\mathcal{X}\rightarrow\mathcal{Y}$ , and we write $Q\preccurlyeq W$ , if there exists a channel $\Phi:\mathcal{Y}\rightarrow\mathcal{Z}$ such that

$$Q(z|x)=\sum_{y\in\mathcal{Y}}W(y|x)\,\Phi(z|y),$$

for all $x\in\mathcal{X}$ and $z\in\mathcal{Z}$ . Note that as a result of the data processing theorem, $Q\preccurlyeq W$ implies $\Delta I\triangleq I(W,P_{X})-I(Q,P_{X})\geq 0$ .
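The degradedness relation amounts to composing $W$ with an intermediate channel $\Phi$ . The sketch below is a minimal illustration, assuming channels are stored as nested lists (`W[x][y]`, `Phi[y][z]`); the helper name `degrade` and the example channel are hypothetical.

```python
def degrade(W, Phi):
    """Return Q with Q[x][z] = sum_y W[x][y] * Phi[y][z]."""
    n_out = len(Phi[0])
    return [
        [sum(w_xy * Phi[y][z] for y, w_xy in enumerate(row)) for z in range(n_out)]
        for row in W
    ]

# Hypothetical 2-input, 3-output channel.
W = [[0.7, 0.2, 0.1],
     [0.2, 0.3, 0.5]]
# Deterministic Phi that keeps the first output letter and merges the last two.
Phi = [[1.0, 0.0],
       [0.0, 1.0],
       [0.0, 1.0]]
Q = degrade(W, Phi)  # approximately [[0.7, 0.3], [0.2, 0.8]]
```

A deterministic $\Phi$ of this kind, one that maps two output letters to a single new letter, is exactly the type of intermediate channel used by a single merge step.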

Although mentioned before, let us properly define the optimal degrading loss for a given pair $(W,P_{X})$ as

$$\Delta I^{\ast}\triangleq\min_{Q,\Phi\,:\,Q\preccurlyeq W,\;|Q|\leq L}I(W,P_{X})-I(Q,P_{X}),$$

where $|Q|$ denotes the output alphabet size of the channel $Q$ . The optimizer $Q$ is the degraded channel that is “closest” to $W$ in the sense of mutual information, yet has at most $L$ output letters.

III Main result

Our main result is an upper bound on $\Delta I^{\ast}$ , in terms of $|\mathcal{X}|$ and $L$ . This upper bound will follow from analyzing a sub-optimal degrading algorithm, called “greedy-merge”. (For the binary-input case, optimal degrading can be realized through dynamic programming [11][12]. For the non-binary case, we do not know of an efficient realization of optimal degrading.) In each iteration of greedy-merge, we merge the two output letters $y_{a},y_{b}\in\mathcal{Y}$ that result in the smallest decrease of mutual information between input and output, denoted $\Delta I$ . Namely, the intermediate channel $\Phi$ maps $y_{a}$ and $y_{b}$ to a new symbol, while all other symbols are unchanged by $\Phi$ . This is repeated $|\mathcal{Y}|-L$ times, to yield an output alphabet size of $L$ . By upper bounding the $\Delta I$ of each iteration we obtain an upper bound on $\Delta I^{\ast}$ . A key result is the following theorem, stating that there exists a pair of output letters whose merger yields a “small” $\Delta I$ .

Theorem 1.

Let a DMC $W:\mathcal{X}\rightarrow\mathcal{Y}$ satisfy $|\mathcal{Y}|>2|\mathcal{X}|$ , and let the input distribution $P_{X}$ be fixed. There exists a pair $y_{a},y_{b}\in\mathcal{Y}$ whose merger results in a channel $Q$ satisfying $\Delta I=O\left(|\mathcal{Y}|^{-\frac{|\mathcal{X}|+1}{|\mathcal{X}|-1}}\right)$ . In particular,

$$\Delta I\leq\mu(|\mathcal{X}|)\cdot|\mathcal{Y}|^{-\frac{|\mathcal{X}|+1}{|\mathcal{X}|-1}},$$

where,

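$$\mu(|\mathcal{X}|)\triangleq\frac{\pi|\mathcal{X}|}{\left(\sqrt{1+\frac{1}{2(|\mathcal{X}|-1)}}-1\right)^{2}}\left(\frac{2|\mathcal{X}|}{\Gamma\left(1+\frac{|\mathcal{X}|-1}{2}\right)}\right)^{\frac{2}{|\mathcal{X}|-1}},$$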

and $\Gamma(\cdot)$ is the Gamma function.
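The greedy-merge iteration described before Theorem 1 can be sketched in Python as follows. This is a brute-force illustration under stated assumptions, not the authors' implementation: the channel is represented by the columns of the joint distribution (`col[x]` holds $\pi(x)W(y|x)$ ), every column has positive total probability, all pairs are scanned in each iteration, and the names `merge_loss` and `greedy_merge` are hypothetical.

```python
import math
from itertools import combinations

def _col_term(col):
    """sum_x pi(y) * eta(W(x|y)) for one output letter, given its joint column."""
    py = sum(col)
    return -sum(p * math.log(p / py) for p in col if p > 0)

def merge_loss(col_a, col_b):
    """Mutual information lost (in nats) by merging two output letters."""
    merged = [a + b for a, b in zip(col_a, col_b)]
    return _col_term(merged) - _col_term(col_a) - _col_term(col_b)

def greedy_merge(joint_cols, L):
    """Merge the cheapest pair repeatedly until at most L >= 1 letters remain.

    Returns the remaining joint columns and the accumulated loss.
    """
    cols = [list(c) for c in joint_cols]
    total = 0.0
    while len(cols) > L:
        # Exhaustive search for the pair with the smallest loss.
        a, b = min(
            combinations(range(len(cols)), 2),
            key=lambda ab: merge_loss(cols[ab[0]], cols[ab[1]]),
        )
        total += merge_loss(cols[a], cols[b])
        merged = [p + q for p, q in zip(cols[a], cols[b])]
        cols = [c for i, c in enumerate(cols) if i not in (a, b)] + [merged]
    return cols, total

# Two letters with identical posteriors can be merged at no loss.
cols_out, total_loss = greedy_merge([[0.2, 0.05], [0.2, 0.05], [0.1, 0.4]], 2)
```

The exhaustive pair search costs a number of loss evaluations quadratic in the current alphabet size per iteration; it is chosen here only for clarity.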

Recall that Theorem 1 is referring to the merger of a single pair of output letters. The following corollary is our main result, and is basically an iterative utilization of Theorem 1.

Corollary 2.

Let a DMC $W:\mathcal{X}\rightarrow\mathcal{Y}$ satisfy $|\mathcal{Y}|>2|\mathcal{X}|$ and let $L\geq 2|\mathcal{X}|$ . Then, for any fixed input distribution $P_{X}$ ,

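$$\Delta I^{\ast}=\min_{Q,\Phi\,:\,Q\preccurlyeq W,\;|Q|\leq L}I(W,P_{X})-I(Q,P_{X})=O\left(L^{-\frac{2}{|\mathcal{X}|-1}}\right)\;.$$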

In particular, $\Delta I^{\ast}\leq\nu(|\mathcal{X}|)\cdot L^{-\frac{2}{|\mathcal{X}|-1}}$ , where $\nu(|\mathcal{X}|)\triangleq\frac{|\mathcal{X}|-1}{2}\mu(|\mathcal{X}|)$ , and $\mu(\cdot)$ was defined in Theorem 1. This bound is attained by greedy-merge, and is tight in the power-law sense.

Proof.

If $L\geq|\mathcal{Y}|$ , then obviously $\Delta I^{\ast}=0$ which is not the interesting case. If $2|\mathcal{X}|\leq L<|\mathcal{Y}|$ , then applying Theorem 1 repeatedly $|\mathcal{Y}|-L$ times yields

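$$\begin{aligned}\Delta I^{\ast}&\leq\mu(|\mathcal{X}|)\int_{L}^{|\mathcal{Y}|}\ell^{-\frac{|\mathcal{X}|+1}{|\mathcal{X}|-1}}\,\mathrm{d}\ell\\&\leq\nu(|\mathcal{X}|)\cdot L^{-\frac{2}{|\mathcal{X}|-1}},\end{aligned}$$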

by the monotonicity of $\ell^{-(|\mathcal{X}|+1)/(|\mathcal{X}|-1)}$ . The bound is tight in the power-law sense, by [8, Theorem 2]. ∎
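The step from repeated application of Theorem 1 to the integral can be spelled out as follows. This is a sketch of the standard sum-to-integral comparison, writing $n=|\mathcal{X}|-1$ ; the intermediate sum is our reading of "applying Theorem 1 repeatedly", where the merge that reduces the alphabet from $\ell$ to $\ell-1$ letters uses Theorem 1 with an output alphabet of size $\ell>2|\mathcal{X}|$ .

```latex
\begin{aligned}
\Delta I^{\ast}
 &\leq \sum_{\ell=L+1}^{|\mathcal{Y}|} \mu(|\mathcal{X}|)\,\ell^{-\frac{n+2}{n}}
  \leq \mu(|\mathcal{X}|)\int_{L}^{|\mathcal{Y}|}\ell^{-\frac{n+2}{n}}\,\mathrm{d}\ell \\
 &= \frac{n}{2}\,\mu(|\mathcal{X}|)\left(L^{-\frac{2}{n}}-|\mathcal{Y}|^{-\frac{2}{n}}\right)
  \leq \frac{n}{2}\,\mu(|\mathcal{X}|)\,L^{-\frac{2}{n}}
  = \nu(|\mathcal{X}|)\,L^{-\frac{2}{|\mathcal{X}|-1}} .
\end{aligned}
```

The second inequality holds because $\ell^{-(n+2)/n}$ is decreasing, so each summand is at most the integral of the same function over $[\ell-1,\ell]$ .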

Note that for large values of $|\mathcal{X}|$ , the Stirling approximation along with some other first order approximations can be applied to simplify $\nu(|\mathcal{X}|)$ to $\nu(|\mathcal{X}|)\approx 16\pi e|\mathcal{X}|^{3}$ .
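To get a feel for the constants, the sketch below evaluates $\mu$ and $\nu$ numerically and compares $\nu$ with $16\pi e|\mathcal{X}|^{3}$ . It assumes that the constant of Theorem 1 reads $\mu(|\mathcal{X}|)=\pi|\mathcal{X}|\big/\big(\sqrt{1+1/(2(|\mathcal{X}|-1))}-1\big)^{2}\cdot\big(2|\mathcal{X}|/\Gamma(1+(|\mathcal{X}|-1)/2)\big)^{2/(|\mathcal{X}|-1)}$ ; the function names are illustrative, and `math.lgamma` is used only to avoid overflow of the Gamma function for large alphabets.

```python
import math

def mu(nx):
    """Constant of Theorem 1 for an input alphabet of size nx >= 2 (assumed form)."""
    n = nx - 1
    root_term = math.sqrt(1 + 1 / (2 * n)) - 1
    # (2*nx / Gamma(1 + n/2)) ** (2/n), computed in the log domain.
    gamma_term = math.exp((2 / n) * (math.log(2 * nx) - math.lgamma(1 + n / 2)))
    return math.pi * nx / root_term ** 2 * gamma_term

def nu(nx):
    """nu = (nx - 1) / 2 * mu, the constant of Corollary 2."""
    return (nx - 1) / 2 * mu(nx)

def nu_approx(nx):
    """Large-alphabet approximation 16 * pi * e * nx^3."""
    return 16 * math.pi * math.e * nx ** 3

ratios = {nx: nu(nx) / nu_approx(nx) for nx in (10, 100, 1000)}
```

Under this reading of $\mu$ , the ratio in `ratios` approaches one as the alphabet grows, which is consistent with the stated approximation.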

IV Proof of Theorem 1

The proof of Theorem 1 will follow from a sphere-packing argument. In the following subsections we define a “distance” function, overcome it not being a metric, and assign different “weights” to different spheres. See [13] for more commentary.

IV-A An alternative “distance” function

Consider the merger of a pair of output letters $y_{a},y_{b}\in\mathcal{Y}$ . The new output alphabet of $Q$ is $\mathcal{Z}=\mathcal{Y}\setminus\left\{y_{a},y_{b}\right\}\cup\left\{y_{ab}\right\}$ . The channel $Q:\mathcal{X}\rightarrow\mathcal{Z}$ then satisfies $Q(y_{ab}|x)=W(y_{a}|x)+W(y_{b}|x)$ , whereas for all $y\in\mathcal{Z}\cap\mathcal{Y}$ we have $Q(y|x)=W(y|x)$ . Using the shorthand

$$\pi_{ab}=\pi(y_{ab}),\quad\pi_{a}=\pi(y_{a}),\quad\pi_{b}=\pi(y_{b}),$$

one gets that $\pi_{ab}=\pi_{a}+\pi_{b}$ . Denote by $\boldsymbol{\alpha}=(\alpha_{x})_{x\in\mathcal{X}}$ , $\boldsymbol{\beta}=(\beta_{x})_{x\in\mathcal{X}}$ and $\boldsymbol{\gamma}=(\gamma_{x})_{x\in\mathcal{X}}$ the vectors corresponding to posterior probabilities associated with $y_{a},y_{b}$ and $y_{ab}$ , respectively. Namely, $\alpha_{x}=W(x|y_{a})$ , $\beta_{x}=W(x|y_{b})$ , and

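$$\gamma_{x}=Q(x|y_{ab})=\frac{\pi_{a}\alpha_{x}+\pi_{b}\beta_{x}}{\pi_{ab}}=\frac{\pi_{a}\alpha_{x}+\pi_{b}\beta_{x}}{\pi_{a}+\pi_{b}}\;.\tag{4}$$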

Thus, after canceling terms, one gets that

$$\Delta I=I(W,P_{X})-I(Q,P_{X})=\sum_{x\in\mathcal{X}}\Delta I_{x},\tag{5}$$

where $\Delta I_{x}\triangleq\pi_{ab}\eta(\gamma_{x})-\pi_{a}\eta(\alpha_{x})-\pi_{b}\eta(\beta_{x})$ .

In order to bound $\Delta I$ , we give two bounds on $\Delta I_{x}$ . The first bound was derived in [5],

$$\Delta I_{x}\leq(\pi_{a}+\pi_{b})\cdot d_{1}(\alpha_{x},\beta_{x}),\tag{6}$$

where for $\alpha\geq 0$ and $\zeta\in\mathbb{R}$ , we define $d_{1}(\alpha,\zeta)\triangleq|\zeta-\alpha|$ .

The subscript “ $1$ ” in $d_{1}$ is suggestive of the $L_{1}$ distance. We will use $\alpha$ to denote a probability associated with an input letter, while $\zeta$ will denote a “free” real variable, possibly negative. Note that the bound in (6) was derived assuming a uniform input distribution; however, it remains valid for the general case.

We now derive the second bound on $\Delta I_{x}$ . For the case where $\alpha_{x},\beta_{x}>0$ ,

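$$\begin{aligned}\Delta I_{x}&\overset{(a)}{\leq}\pi_{a}\eta^{\prime}(\alpha_{x})(\gamma_{x}-\alpha_{x})+\pi_{b}\eta^{\prime}(\beta_{x})(\gamma_{x}-\beta_{x})\\&\overset{(b)}{=}\frac{\pi_{a}\pi_{b}}{\pi_{a}+\pi_{b}}(\alpha_{x}-\beta_{x})(\eta^{\prime}(\beta_{x})-\eta^{\prime}(\alpha_{x}))\\&\overset{(c)}{\leq}\frac{1}{4}(\pi_{a}+\pi_{b})(\alpha_{x}-\beta_{x})^{2}(-\eta^{\prime\prime}(\lambda)),\end{aligned}$$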

where in $(a)$ we used the concavity of $\eta(\cdot)$ , in $(b)$ the definition of $\gamma_{x}$ (see (4)), and in $(c)$ the AM-GM inequality and the mean value theorem where $\lambda=\theta\alpha_{x}+(1-\theta)\beta_{x}$ for some $\theta\in[0,1]$ . Using the monotonicity of $-\eta^{\prime\prime}(p)=1/p$ we get $-\eta^{\prime\prime}(\lambda)\leq 1/\min(\alpha_{x},\beta_{x})$ . Thus,

$$\Delta I_{x}\leq(\pi_{a}+\pi_{b})\cdot d_{2}(\alpha_{x},\beta_{x}),\tag{7}$$

where

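$$d_{2}(\alpha,\zeta)\triangleq\begin{cases}\dfrac{(\zeta-\alpha)^{2}}{\min(\alpha,\zeta)}&\alpha,\zeta>0,\\\infty&\text{otherwise}.\end{cases}$$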
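For readers who wish to verify steps $(b)$ and $(c)$ of the derivation leading to (7), the algebra can be filled in as below. This is our own expansion of the steps named in the text, using only the definition of $\gamma_{x}$ , the AM-GM inequality and the mean value theorem.

```latex
\begin{aligned}
\gamma_{x}-\alpha_{x} &= \frac{\pi_{b}(\beta_{x}-\alpha_{x})}{\pi_{a}+\pi_{b}}, \qquad
\gamma_{x}-\beta_{x} = \frac{\pi_{a}(\alpha_{x}-\beta_{x})}{\pi_{a}+\pi_{b}}, \\
\pi_{a}\eta'(\alpha_{x})(\gamma_{x}-\alpha_{x}) + \pi_{b}\eta'(\beta_{x})(\gamma_{x}-\beta_{x})
 &= \frac{\pi_{a}\pi_{b}}{\pi_{a}+\pi_{b}}(\alpha_{x}-\beta_{x})\bigl(\eta'(\beta_{x})-\eta'(\alpha_{x})\bigr), \\
\pi_{a}\pi_{b} \leq \frac{(\pi_{a}+\pi_{b})^{2}}{4}, \qquad
\eta'(\beta_{x})-\eta'(\alpha_{x}) &= \eta''(\lambda)(\beta_{x}-\alpha_{x})
 \;\Longrightarrow\;
 (\alpha_{x}-\beta_{x})\bigl(\eta'(\beta_{x})-\eta'(\alpha_{x})\bigr) = (\alpha_{x}-\beta_{x})^{2}\bigl(-\eta''(\lambda)\bigr).
\end{aligned}
```

Since $\eta^{\prime\prime}(p)=-1/p$ , the last factor equals $1/\lambda$ , which is at most $1/\min(\alpha_{x},\beta_{x})$ .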

The subscript “ $2$ ” in $d_{2}$ is suggestive of the squaring in the numerator. Combining (6) and (7) yields

$$\Delta I_{x}\leq(\pi_{a}+\pi_{b})\cdot d(\alpha_{x},\beta_{x}),\tag{8}$$

where

$$d(\alpha,\zeta)\triangleq\min\left(d_{1}(\alpha,\zeta),d_{2}(\alpha,\zeta)\right)\;.\tag{9}$$

Returning to (5) using (8) we get

$$\Delta I\leq(\pi_{a}+\pi_{b})\,|\mathcal{X}|\cdot d(\boldsymbol{\alpha},\boldsymbol{\beta}),\tag{10}$$

where

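$$d(\boldsymbol{\alpha},\boldsymbol{\zeta})\triangleq\max_{x\in\mathcal{X}}d(\alpha_{x},\zeta_{x})\;.\tag{11}$$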

We note that we use $\max$ in (11) instead of a summation to simplify the upcoming derivations. Moreover, according to (10), it suffices to show the existence of a pair that is “close” in the sense of $d$ , assuming that $\pi_{a},\pi_{b}$ are also small enough.
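The “distance” $d$ and the resulting bound on the merge loss, $\Delta I\leq(\pi_{a}+\pi_{b})|\mathcal{X}|\cdot d(\boldsymbol{\alpha},\boldsymbol{\beta})$ , can be checked numerically. The sketch below is a minimal illustration; the function names and the example posteriors are hypothetical.

```python
import math

def eta(p):
    """eta(p) = -p ln p for p > 0, and 0 for p = 0."""
    return -p * math.log(p) if p > 0 else 0.0

def d_scalar(alpha, zeta):
    """d = min(d1, d2) with d1 = |zeta - alpha| and d2 as defined in the text."""
    d1 = abs(zeta - alpha)
    if alpha > 0 and zeta > 0:
        d2 = (zeta - alpha) ** 2 / min(alpha, zeta)
    else:
        d2 = math.inf
    return min(d1, d2)

def d_vec(alpha, beta):
    """Vector version: the maximum of the per-letter distances."""
    return max(d_scalar(a, b) for a, b in zip(alpha, beta))

def merge_delta_i(pi_a, alpha, pi_b, beta):
    """Exact loss of merging two letters with posteriors alpha, beta (in nats)."""
    pi_ab = pi_a + pi_b
    gamma = [(pi_a * a + pi_b * b) / pi_ab for a, b in zip(alpha, beta)]
    return sum(
        pi_ab * eta(g) - pi_a * eta(a) - pi_b * eta(b)
        for a, b, g in zip(alpha, beta, gamma)
    )

# Hypothetical pair of output letters over a ternary input alphabet.
pi_a, alpha = 0.10, [0.6, 0.30, 0.10]
pi_b, beta = 0.05, [0.5, 0.35, 0.15]
delta_i = merge_delta_i(pi_a, alpha, pi_b, beta)
bound = (pi_a + pi_b) * len(alpha) * d_vec(alpha, beta)
```

In this example the exact loss is well below the bound, which is expected since the bound replaces a sum over input letters by $|\mathcal{X}|$ times the worst term.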

Since we are interested in lowering the right hand side of (10), we limit our search to a subset of $\mathcal{Y}$ , as was done in [5]. Namely, $\mathcal{Y}_{\mathrm{small}}\triangleq\left\{y\in\mathcal{Y}:\pi(y)\leq 2/|\mathcal{Y}|\right\}$ , which implies

$$|\mathcal{Y}_{\mathrm{small}}|\geq\frac{|\mathcal{Y}|}{2}\;.\tag{12}$$

Hence, $\pi_{a}+\pi_{b}\leq 4/|\mathcal{Y}|$ and

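$$\Delta I\leq\frac{4|\mathcal{X}|}{|\mathcal{Y}|}\cdot d(\boldsymbol{\alpha},\boldsymbol{\beta})\;.\tag{13}$$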
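The claim that at least half of the output letters belong to $\mathcal{Y}_{\mathrm{small}}$ follows from a short counting argument, sketched here for completeness (this expansion is ours and is not taken from the paper).

```latex
1 \;\geq\; \sum_{y\in\mathcal{Y}\setminus\mathcal{Y}_{\mathrm{small}}}\pi(y)
  \;>\; \bigl|\mathcal{Y}\setminus\mathcal{Y}_{\mathrm{small}}\bigr|\cdot\frac{2}{|\mathcal{Y}|}
\quad\Longrightarrow\quad
\bigl|\mathcal{Y}\setminus\mathcal{Y}_{\mathrm{small}}\bigr| < \frac{|\mathcal{Y}|}{2}
\quad\Longrightarrow\quad
|\mathcal{Y}_{\mathrm{small}}| > \frac{|\mathcal{Y}|}{2}.
```

The strict middle inequality applies when the complement is non-empty; if it is empty, then $\mathcal{Y}_{\mathrm{small}}=\mathcal{Y}$ and the conclusion is immediate.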

We still need to prove the existence of a pair $y_{a},y_{b}\in\mathcal{Y}_{\mathrm{small}}$ that is “close” in the sense of $d$ . To that end, as in [5], we would like to use a sphere-packing approach. A typical use of such an argument assumes a proper metric, yet $d$ is not a metric. Specifically, the triangle-inequality does not hold. The absence of a triangle-inequality is a complication that we will overcome, but some care and effort are called for. Broadly speaking, as usually done in sphere-packing, we aim to show the existence of a critical “sphere” radius, $r_{\mathrm{critical}}=r_{\mathrm{critical}}(|\mathcal{X}|,|\mathcal{Y}|)>0$ . Such a critical radius will ensure the existence of $y_{a},y_{b}\in\mathcal{Y}_{\mathrm{small}}$ with corresponding $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ for which $d(\boldsymbol{\alpha},\boldsymbol{\beta})\leq r_{\mathrm{critical}}$ .

IV-B Non-intersecting “spheres”

We start by giving explicit equations for the “spheres” corresponding to $d_{1}$ and $d_{2}$ .

Lemma 3.

For $\alpha\geq 0$ and $r>0$ , define the sets $\mathcal{B}_{1},\mathcal{B}_{2}$ as

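$$\mathcal{B}_{i}(\alpha,r)\triangleq\left\{\zeta\in\mathbb{R}:d_{i}(\alpha,\zeta)\leq r\right\},\quad i\in\{1,2\}\;.$$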

Then,

$$\mathcal{B}_{1}(\alpha,r)=\left\{\zeta\in\mathbb{R}:-r\leq\zeta-\alpha\leq r\right\}$$

and

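$$\mathcal{B}_{2}(\alpha,r)=\left\{\zeta\in\mathbb{R}:-\sqrt{r^{2}/4+\alpha\cdot r}+r/2\leq\zeta-\alpha\leq\sqrt{\alpha\cdot r}\right\}\;.$$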

Proof.

Assume $\zeta\in\mathcal{B}_{1}(\alpha,r)$ . Then $\zeta$ satisfies $|\zeta-\alpha|\leq r$ , which is equivalent to $-r\leq\zeta-\alpha\leq r$ , and we get the desired result for $\mathcal{B}_{1}(\alpha,r)$ . Assume now $\zeta\in\mathcal{B}_{2}(\alpha,r)$ . If $\zeta\geq\alpha$ , then $\min(\alpha,\zeta)=\alpha$ , and thus $(\zeta-\alpha)^{2}/\alpha\leq r$ , which implies $0\leq\zeta-\alpha\leq\sqrt{\alpha\cdot r}$ . If $\zeta\leq\alpha$ , then $\min(\alpha,\zeta)=\zeta$ , and thus, $(\zeta-\alpha)^{2}/\zeta\leq r$ , which implies $-\sqrt{r^{2}/4+\alpha\cdot r}+r/2\leq\zeta-\alpha\leq 0$ . The union of the two yields the desired result for $\mathcal{B}_{2}(\alpha,r)$ . ∎
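The endpoints derived in the proof for $\mathcal{B}_{2}(\alpha,r)$ can be sanity-checked numerically: at each endpoint $d_{2}$ should equal $r$ , and it should cross $r$ when moving through the endpoint. The sketch below assumes $\alpha>0$ ; the helper names `d2` and `b2_interval` are illustrative.

```python
import math

def d2(alpha, zeta):
    """d2(alpha, zeta) = (zeta - alpha)^2 / min(alpha, zeta) for alpha, zeta > 0."""
    if alpha > 0 and zeta > 0:
        return (zeta - alpha) ** 2 / min(alpha, zeta)
    return math.inf

def b2_interval(alpha, r):
    """Endpoints of B_2(alpha, r) as an interval of zeta values."""
    lo = alpha - math.sqrt(r * r / 4 + alpha * r) + r / 2
    hi = alpha + math.sqrt(alpha * r)
    return lo, hi

lo, hi = b2_interval(0.3, 0.1)
at_lo, at_hi = d2(0.3, lo), d2(0.3, hi)  # both should be close to r = 0.1
```

The interval is not symmetric around $\alpha$ : the lower endpoint is closer to $\alpha$ than the upper one, since $d_{2}$ divides by the smaller of its two arguments.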

Thus, we define $\mathcal{B}(\alpha,r)\triangleq\{\zeta\in\mathbb{R}:d(\alpha,\zeta)\leq r\}$ , and note that $\mathcal{B}(\alpha,r)=\mathcal{B}_{1}(\alpha,r)\cup\mathcal{B}_{2}(\alpha,r)$ , since $d$ takes the $\min$ of the two distances. Namely,

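$$\mathcal{B}(\alpha,r)=\left\{\zeta\in\mathbb{R}:-\underline{\omega}(\alpha,r)\leq\zeta-\alpha\leq\overline{\omega}(\alpha,r)\right\},\tag{14}$$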

where $\underline{\omega}(\alpha,r)\triangleq\max\left(\sqrt{r^{2}/4+\alpha\cdot r}-r/2,r\right)$ and $\overline{\omega}(\alpha,r)\triangleq\max\left(\sqrt{\alpha\cdot r},r\right)$ . To extend $\mathcal{B}$ to vectors, we define $\mathbb{R}^{|\mathcal{X}|}$ as the set of vectors with real entries that are indexed by $\mathcal{X}$ , $\mathbb{R}^{|\mathcal{X}|}\triangleq\left\{\boldsymbol{\zeta}=(\zeta_{x})_{x\in\mathcal{X}}:\zeta_{x}\in\mathbb{R}\right\}$ . The set $\mathbb{K}^{|\mathcal{X}|}$ is defined as the set of vectors from $\mathbb{R}^{|\mathcal{X}|}$ with entries summing to $1$ , $\mathbb{K}^{|\mathcal{X}|}\triangleq\left\{\boldsymbol{\zeta}\in\mathbb{R}^{|\mathcal{X}|}:\sum_{x\in\mathcal{X}}\zeta_{x}=1\right\}$ . The set $\mathbb{K}_{+}^{|\mathcal{X}|}$ is the set of probability vectors. Namely, the set of vectors from $\mathbb{K}^{|\mathcal{X}|}$ with non-negative entries, $\mathbb{K}_{+}^{|\mathcal{X}|}\triangleq\left\{\boldsymbol{\zeta}\in\mathbb{K}^{|\mathcal{X}|}:\zeta_{x}\geq 0\right\}$ . We can now define $\mathcal{B}(\boldsymbol{\alpha},r)$ . For $\boldsymbol{\alpha}\in\mathbb{K}_{+}^{|\mathcal{X}|}$ let

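$$\mathcal{B}(\boldsymbol{\alpha},r)\triangleq\left\{\boldsymbol{\zeta}\in\mathbb{R}^{|\mathcal{X}|}:d(\boldsymbol{\alpha},\boldsymbol{\zeta})\leq r\right\}\;.\tag{15}$$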

Using (11) and (14) we have a simple characterization of $\mathcal{B}(\boldsymbol{\alpha},r)$ as a box: a Cartesian product of segments. That is,

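$$\mathcal{B}(\boldsymbol{\alpha},r)=\left\{\boldsymbol{\zeta}\in\mathbb{R}^{|\mathcal{X}|}:-\underline{\omega}(\alpha_{x},r)\leq\zeta_{x}-\alpha_{x}\leq\overline{\omega}(\alpha_{x},r)\right\}\;.\tag{16}$$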

We stress that the box $\mathcal{B}(\boldsymbol{\alpha},r)$ contains $\boldsymbol{\alpha}$ , but is not necessarily centered at it.

Recall that our aim is to find an $r_{\mathrm{critical}}$ . Using our current notation, $r_{\mathrm{critical}}$ must imply the existence of distinct $y_{a},y_{b}\in\mathcal{Y}_{\mathrm{small}}$ such that $\boldsymbol{\beta}\in\mathcal{B}(\boldsymbol{\alpha},r_{\mathrm{critical}})$ . Note that the set $\mathcal{B}(\boldsymbol{\alpha},r)$ is contained in $\mathbb{R}^{|\mathcal{X}|}$ . However, since the boxes are induced by points $\boldsymbol{\alpha}$ in the subset $\mathbb{K}_{+}^{|\mathcal{X}|}$ of $\mathbb{R}^{|\mathcal{X}|}$ , the sphere-packing would yield a tighter result if performed in $\mathbb{K}^{|\mathcal{X}|}$ rather than in $\mathbb{R}^{|\mathcal{X}|}$ . Thus, for $\boldsymbol{\alpha}\in\mathbb{K}_{+}^{|\mathcal{X}|}$ and $r>0$ , let us define

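$$\mathcal{B}_{\mathbb{K}}(\boldsymbol{\alpha},r)=\mathcal{B}(\boldsymbol{\alpha},r)\cap\mathbb{K}^{|\mathcal{X}|}\;.\tag{17}$$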

When considering $\mathcal{B}_{\mathbb{K}}(\boldsymbol{\alpha},r)$ in place of $\mathcal{B}(\boldsymbol{\alpha},r)$ , we have gained in that the affine dimension (see [14, Section 2.1.3]) of $\mathcal{B}_{\mathbb{K}}(\boldsymbol{\alpha},r)$ is $|\mathcal{X}|-1$ while that of $\mathcal{B}(\boldsymbol{\alpha},r)$ is $|\mathcal{X}|$ . However, we have lost in simplicity: the set $\mathcal{B}_{\mathbb{K}}(\boldsymbol{\alpha},r)$ is not a box. Indeed, a moment’s thought reveals that any subset of $\mathbb{K}^{|\mathcal{X}|}$ with more than one element cannot be a box.

We now show how to overcome the above loss. That is, we show a subset of $\mathcal{B}_{\mathbb{K}}(\boldsymbol{\alpha},r)$ which is — up to a simple transform — a box. Denote the index of the largest entry of a vector $\boldsymbol{\alpha}\in\mathbb{K}^{|\mathcal{X}|}$ as $x_{\max}(\boldsymbol{\alpha})$ , namely, $x_{\max}(\boldsymbol{\alpha})\triangleq\operatorname*{arg\,max}_{x\in\mathcal{X}}\alpha_{x}$ . In case of ties, define $x_{\max}(\boldsymbol{\alpha})$ in an arbitrary yet consistent manner. For $x_{\max}=x_{\max}(\boldsymbol{\alpha})$ given, or clear from the context, define $\boldsymbol{\zeta}^{\prime}$ as $\boldsymbol{\zeta}$ , with index $x_{\max}$ deleted. That is, for a given $\boldsymbol{\zeta}\in\mathbb{K}^{|\mathcal{X}|}$ , $\boldsymbol{\zeta}^{\prime}=(\zeta_{x})_{x\in\mathcal{X}^{\prime}}\in\mathbb{R}^{|\mathcal{X}|-1}$ , where $\mathcal{X}^{\prime}\triangleq\mathcal{X}\setminus\{x_{\max}\}$ . Note that for $\boldsymbol{\zeta}\in\mathbb{K}^{|\mathcal{X}|}$ , all the entries sum to one. Thus, given $\boldsymbol{\zeta}^{\prime}$ and $x_{\max}$ , we know $\boldsymbol{\zeta}$ . Next, for $\boldsymbol{\alpha}\in\mathbb{K}_{+}^{|\mathcal{X}|}$ and $r>0$ , define the set

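$$\mathcal{C}(\boldsymbol{\alpha},r)=\left\{\boldsymbol{\zeta}\in\mathbb{K}^{|\mathcal{X}|}:\forall x\in\mathcal{X}^{\prime},\;-\omega^{\prime}(\alpha_{x},r)\leq\zeta_{x}-\alpha_{x}\leq\omega^{\prime}(\alpha_{x},r)\right\},\tag{18}$$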

where $x_{\max}=x_{\max}(\boldsymbol{\alpha})$ and

[TABLE]

Lemma 4.

Let $\boldsymbol{\alpha}\in\mathbb{K}_{+}^{|\mathcal{X}|}$ and $r>0$ be given. Let $x_{\max}=x_{\max}(\boldsymbol{\alpha})$ . Then, $\mathcal{C}(\boldsymbol{\alpha},r)\subset\mathcal{B}_{\mathbb{K}}(\boldsymbol{\alpha},r)$ .

Proof.

It can be easily shown that $0\leq\underline{\omega}(\alpha,r)\leq\overline{\omega}(\alpha,r)$ . Thus, since (18) holds, it suffices to show that

[TABLE]

Indeed, summing the condition in (18) over all $x\in\mathcal{X}^{\prime}$ gives

[TABLE]

Since $\underline{\omega}(\alpha,r)$ is a monotonically non-decreasing function of $\alpha$ , we can simplify the above to

[TABLE]

Since both $\boldsymbol{\zeta}$ and $\boldsymbol{\alpha}$ are in $\mathbb{K}^{|\mathcal{X}|}$ , the middle term in the above is $\alpha_{x_{\max}}-\zeta_{x_{\max}}$ . Thus, (20) follows. ∎

Recall that our plan is to ensure the existence of a “close” pair by using a sphere-packing approach. However, since the triangle inequality does not hold for $d$ , we must use a somewhat different approach. Towards that end, define the positive quadrant associated with $\boldsymbol{\alpha}$ and $r$ as

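$$\mathcal{Q}^{\prime}(\boldsymbol{\alpha},r)\triangleq\left\{\boldsymbol{\zeta}^{\prime}\in\mathbb{R}^{|\mathcal{X}|-1}:\forall x\in\mathcal{X}^{\prime},\;0\leq\zeta_{x}-\alpha_{x}\leq\omega^{\prime}(\alpha_{x},r)\right\},$$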

where $x_{\max}=x_{\max}(\boldsymbol{\alpha})$ and $\omega^{\prime}(\alpha,r)$ is as defined in (19).

Lemma 5.

Let $y_{a},y_{b}\in\mathcal{Y}$ be such that $x_{\max}(\boldsymbol{\alpha})=x_{\max}(\boldsymbol{\beta})$ . If $\mathcal{Q}^{\prime}(\boldsymbol{\alpha},r)$ and $\mathcal{Q}^{\prime}(\boldsymbol{\beta},r)$ have a non-empty intersection, then $d(\boldsymbol{\alpha},\boldsymbol{\beta})\leq r$ .

Proof.

By (15), (17), and Lemma 4, it suffices to prove that $\boldsymbol{\beta}\in\mathcal{C}(\boldsymbol{\alpha},r)$ . Define $\mathcal{C}^{\prime}(\boldsymbol{\alpha},r)$ as the result of applying a prime operation on each member of $\mathcal{C}(\boldsymbol{\alpha},r)$ , where $x_{\max}=x_{\max}(\boldsymbol{\alpha})$ . Hence, we must equivalently prove that $\boldsymbol{\beta}^{\prime}\in\mathcal{C}^{\prime}(\boldsymbol{\alpha},r)$ . By (18), we must show that for all $x\in\mathcal{X}^{\prime}$ ,

$$-\omega^{\prime}(\alpha_{x},r)\leq\beta_{x}-\alpha_{x}\leq\omega^{\prime}(\alpha_{x},r)\;.\tag{21}$$

Since we know that the intersection of $\mathcal{Q}^{\prime}(\boldsymbol{\alpha},r)$ and $\mathcal{Q}^{\prime}(\boldsymbol{\beta},r)$ is non-empty, let $\boldsymbol{\zeta}^{\prime}$ be a member of both sets. Thus, we know that for $x\in\mathcal{X}^{\prime}$ , $0\leq\zeta_{x}-\alpha_{x}\leq\omega^{\prime}(\alpha_{x},r)$ , and $0\leq\zeta_{x}-\beta_{x}\leq\omega^{\prime}(\beta_{x},r)$ . For each $x\in\mathcal{X}^{\prime}$ we must consider two cases: $\alpha_{x}\leq\beta_{x}$ and $\alpha_{x}>\beta_{x}$ .

Consider first the case $\alpha_{x}\leq\beta_{x}$ . Since $\zeta_{x}-\alpha_{x}\leq\omega^{\prime}(\alpha_{x},r)$ and $\beta_{x}-\zeta_{x}\leq 0$ , we conclude that $\beta_{x}-\alpha_{x}\leq\omega^{\prime}(\alpha_{x},r)$ . Conversely, since $\beta_{x}-\alpha_{x}\geq 0$ and, by (19), $\omega^{\prime}(\alpha_{x},r)\geq 0$ , we have that $\beta_{x}-\alpha_{x}\geq-\omega^{\prime}(\alpha_{x},r)$ . Thus we have shown that both inequalities in (21) hold.

To finish the proof, consider the case $\alpha_{x}>\beta_{x}$ . We have already established that $\omega^{\prime}(\alpha_{x},r)\geq 0$ . Thus, since by assumption $\beta_{x}-\alpha_{x}\leq 0$ , we have that $\beta_{x}-\alpha_{x}\leq\omega^{\prime}(\alpha_{x},r)$ . Conversely, since $\zeta_{x}-\beta_{x}\leq\omega^{\prime}(\beta_{x},r)$ and $\alpha_{x}-\zeta_{x}\leq 0$ , we have that $\alpha_{x}-\beta_{x}\leq\omega^{\prime}(\beta_{x},r)$ . We now recall that by (19), the fact that $\alpha_{x}\geq\beta_{x}$ implies that $\omega^{\prime}(\beta_{x},r)\leq\omega^{\prime}(\alpha_{x},r)$ . Thus, $\alpha_{x}-\beta_{x}\leq\omega^{\prime}(\alpha_{x},r)$ . Negating gives $\beta_{x}-\alpha_{x}\geq-\omega^{\prime}(\alpha_{x},r)$ , and we have once again proved the two inequalities in (21). ∎

IV-C Weighted “sphere”-packing

The volume of our “sphere” $\mathcal{Q}^{\prime}(\boldsymbol{\alpha},r)$ unfortunately depends on $\boldsymbol{\alpha}$ . We would like then to alleviate this dependency by defining a density over $\mathbb{R}^{|\mathcal{X}|-1}$ and derive a lower bound on the weight of $\mathcal{Q}^{\prime}(\boldsymbol{\alpha},r)$ . Let $\varphi:\mathbb{R}\to\mathbb{R}$ be defined as $\varphi(\zeta)\triangleq 1/\sqrt{4\zeta}$ . Next, for $\boldsymbol{\zeta}^{\prime}\in\mathbb{R}^{|\mathcal{X}|-1}$ , abuse notation and define $\varphi:\mathbb{R}^{|\mathcal{X}|-1}\to\mathbb{R}$ as $\varphi(\boldsymbol{\zeta}^{\prime})\triangleq\prod_{x\in\mathcal{X}^{\prime}}\varphi(\zeta_{x})$ . The weight of $\mathcal{Q}^{\prime}(\boldsymbol{\alpha},r)$ is then defined as $M\left[\mathcal{Q}^{\prime}(\boldsymbol{\alpha},r)\right]\triangleq\int_{\mathcal{Q}^{\prime}(\boldsymbol{\alpha},r)}\varphi\mathop{}\!\mathrm{d}\boldsymbol{\zeta}^{\prime}$ . The following lemma proposes a lower bound on $M\left[\mathcal{Q}^{\prime}(\boldsymbol{\alpha},r)\right]$ that does not depend on $\boldsymbol{\alpha}$ .

Lemma 6.

The weight $M\left[\mathcal{Q}^{\prime}(\boldsymbol{\alpha},r)\right]$ satisfies

[TABLE]

Proof.

Since $\varphi(\boldsymbol{\zeta}^{\prime})$ is a product,

[TABLE]

where $\psi_{r}(\alpha)\triangleq\sqrt{\alpha+\omega^{\prime}(\alpha,r)}-\sqrt{\alpha}$ . Using the first derivative, it can be shown that $\psi_{r}(\alpha)$ is decreasing when $\alpha<2r$ . As for $\alpha\geq 2r$ , it can be shown that $\psi_{r}^{\prime}(\alpha)$ is non-zero. Since $\psi_{r}^{\prime}(2r)>0$ , we conclude that $\psi_{r}(\alpha)$ is increasing for $\alpha\geq 2r$ . By continuity we conclude that $\psi_{r}(\alpha)$ is minimal for $\alpha=2r$ and thus we get (22). ∎

We divide the letters in $\mathcal{Y}_{\mathrm{small}}$ into $|\mathcal{X}|$ subsets, according to their $x_{\max}$ value. The largest subset is denoted by $\mathcal{Y}^{\prime}$ , and we henceforth fix $x_{\max}$ accordingly. We limit our search to $\mathcal{Y}^{\prime}$ .

Let $\mathcal{V}^{\prime}$ be the union of all the quadrants corresponding to possible choices of $\boldsymbol{\alpha}$ . Namely,

[TABLE]

In order to bound the weight of $\mathcal{V}^{\prime}$ , we introduce the simpler set $\mathcal{U}^{\prime}$ .

[TABLE]

The constraint $r\leq 1$ in the following lemma will be motivated shortly.

Lemma 7.

Let $r\leq 1$ . Then, $\mathcal{V}^{\prime}\subseteq\mathcal{U}^{\prime}$ .

Proof.

Assume $\boldsymbol{\zeta}^{\prime}\in\mathcal{V}^{\prime}$ . Then, there exists $\boldsymbol{\alpha}\in\mathbb{K}_{+}^{|\mathcal{X}|}$ such that $0\leq\zeta_{x}-\alpha_{x}\leq\omega^{\prime}(\alpha_{x},r)$ for all $x\in\mathcal{X}^{\prime}$ . Hence, $\zeta_{x}\geq 0$ for all $x\in\mathcal{X}^{\prime}$ . Moreover,

[TABLE]

There are two cases to consider. In the case where $\alpha_{x_{\max}}\geq 2r$ we have

[TABLE]

where the second inequality is due to the assumption $\alpha_{x_{\max}}\geq 2r$ . In the case where $\alpha_{x_{\max}}\leq 2r$ , the inequality stated before the two cases becomes

[TABLE]

where we assumed $r\leq 1$ . Therefore, $\boldsymbol{\zeta}^{\prime}\in\mathcal{U}^{\prime}$ . ∎

The lemma above, together with the non-negativity of $\varphi$ , enables us to upper bound the weight of $\mathcal{V}^{\prime}$ , denoted by $M\left[\mathcal{V}^{\prime}\right]$ , using $M\left[\mathcal{V}^{\prime}\right]\triangleq\int_{\mathcal{V}^{\prime}}\varphi\mathop{}\!\mathrm{d}\boldsymbol{\zeta}^{\prime}\leq\int_{\mathcal{U}^{\prime}}\varphi\mathop{}\!\mathrm{d}\boldsymbol{\zeta}^{\prime}$ . We define the mapping $\rho_{x}=\sqrt{\zeta_{x}}$ for all $x\in\mathcal{X}^{\prime}$ and perform a change of variables. As a result, $\mathcal{U}^{\prime}$ is mapped to $\mathcal{S}^{\prime}\triangleq\left\{\boldsymbol{\rho}^{\prime}\in\mathbb{R}^{|\mathcal{X}|-1}:\sum_{x\in\mathcal{X}^{\prime}}\rho_{x}^{2}\leq 2,\;\rho_{x}\geq 0\right\}$ , which is a quadrant of an $(|\mathcal{X}|-1)$ -dimensional ball of radius $\sqrt{2}$ . The density function $\varphi$ transforms into the unit uniform density function since $\mathop{}\!\mathrm{d}\zeta_{x}/\sqrt{4\zeta_{x}}=\mathop{}\!\mathrm{d}\rho_{x}$ . Hence, for $r\leq 1$ ,

[TABLE]

where we have used the well known expression for the volume of a multidimensional ball. Finally, we prove Theorem 1.
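The volume of the set $\mathcal{S}^{\prime}$ , the non-negative quadrant of an $n$ -dimensional ball of radius $\sqrt{2}$ with $n=|\mathcal{X}|-1$ , follows from the standard ball-volume formula divided by $2^{n}$ . The sketch below evaluates it and checks low-dimensional cases; the function name `orthant_ball_volume` is illustrative, and the code makes no claim about the exact form in which the paper writes the resulting bound.

```python
import math

def orthant_ball_volume(n, radius):
    """Volume of the part of an n-dimensional ball with all coordinates >= 0.

    Uses V_n(R) = pi^(n/2) * R^n / Gamma(1 + n/2) and divides by 2^n.
    """
    ball = math.pi ** (n / 2) * radius ** n / math.gamma(1 + n / 2)
    return ball / 2 ** n

# Weight of S' for input alphabets of size 2, 3 and 4 (n = 1, 2, 3).
weights = [orthant_ball_volume(n, math.sqrt(2)) for n in (1, 2, 3)]
```

For radius $\sqrt{2}$ the expression simplifies to $(\pi/2)^{n/2}/\Gamma(1+n/2)$ ; for instance, $n=2$ gives a quarter disc of area $\pi/2$ .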

Proof of Theorem 1.

Recall that we are assuming $|\mathcal{Y}|>2|\mathcal{X}|$ . According to the definition of $\mathcal{Y}^{\prime}$ , we get by (12) that

[TABLE]

As a result, we have at least two points in $\mathcal{Y}^{\prime}$ , and are therefore in a position to apply a sphere-packing argument. Towards this end, let $r$ be such that the starred equality in the following derivation holds:

[TABLE]

Namely,

[TABLE]

There are two cases to consider. If $r\leq 1$ , then all of (26) holds, by (22), (24) and (25). We take $r_{\mathrm{critical}}=r$ , and deduce the existence of a pair $y_{a},y_{b}\in\mathcal{Y}^{\prime}$ for which $d(\boldsymbol{\alpha},\boldsymbol{\beta})\leq r$ . Indeed, assuming otherwise would contradict (26), since each $\mathcal{Q}^{\prime}$ in the sum is contained in $\mathcal{V}^{\prime}$ , and, by Lemma 5 and our assumption, all summed $\mathcal{Q}^{\prime}$ are disjoint.

We next consider the case $r>1$ . Now, any pair of letters $y_{a},y_{b}\in\mathcal{Y}^{\prime}$ satisfies $d(\boldsymbol{\alpha},\boldsymbol{\beta})\leq r$ . Indeed, by (9) and (11),

[TABLE]

where $\|\cdot\|_{\infty}$ is the maximum norm.

We have proved the existence of $y_{a},y_{b}\in\mathcal{Y}^{\prime}\subset\mathcal{Y}_{\mathrm{small}}$ for which $d(\boldsymbol{\alpha},\boldsymbol{\beta})\leq r$ . By (13) and (27), the proof is finished. ∎

Bibliography

[1] E. Arıkan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.
[2] I. Tal and A. Vardy, “How to construct polar codes,” IEEE Trans. Inf. Theory, vol. 59, no. 10, pp. 6562–6582, October 2013.
[3] R. Pedarsani, S. H. Hassani, I. Tal, and E. Telatar, “On the construction of polar codes,” in 2011 IEEE Int’l Symp. on Inf. Theory (ISIT), July 2011, pp. 11–15.
[4] I. Tal, A. Sharov, and A. Vardy, “Constructing polar codes for non-binary alphabets and MACs,” in 2012 IEEE Int’l Symp. on Inf. Theory (ISIT), July 2012, pp. 2132–2136.
[5] T. C. Gulcu, M. Ye, and A. Barg, “Construction of polar codes for arbitrary discrete memoryless channels,” in 2016 IEEE Int’l Symp. on Inf. Theory (ISIT), July 2016, pp. 51–55.
[6] U. Pereg and I. Tal, “Channel upgradation for non-binary input alphabets and MACs,” IEEE Trans. Inf. Theory, vol. 63, no. 3, pp. 1410–1424, March 2017.
[7] I. Tal and A. Vardy, “Channel upgrading for semantically-secure encryption on wiretap channels,” in 2013 IEEE Int’l Symp. on Inf. Theory (ISIT), July 2013, pp. 1561–1565.
[8] I. Tal, “On the construction of polar codes for channels with moderate input alphabet sizes,” IEEE Trans. Inf. Theory, vol. 63, no. 3, pp. 1501–1509, March 2017.
