A Simple Derivation of the Refined Sphere Packing Bound Under Certain Symmetry Hypotheses

Barış Nakiboğlu

arXiv:1904.12780·cs.IT·May 12, 2020


TL;DR

This paper presents a straightforward derivation of the refined sphere packing bound using the Berry-Esseen theorem and Augustin information measures. The derivation applies to various channels, including Gaussian channels and non-stationary Rényi symmetric channels, and yields explicit non-asymptotic bounds.

Contribution

It introduces a simple method for deriving the sphere packing bound under symmetry hypotheses, yielding non-asymptotic bounds with explicit error terms.

Findings

1. Derived sphere packing bounds with explicit prefactors for certain channels
2. Characterized, up to multiplicative constants, the trade-off between type I and type II error probabilities in hypothesis testing using the Berry-Esseen theorem
3. Provided non-asymptotic bounds with concrete approximation error terms




This work is supported by The Science Academy, Turkey, under The Science Academy’s Young Scientist Awards Program (BAGEP) and by The Scientific and Technological Research Council of Turkey (TÜBİTAK) under Grant 119E053.

This paper was presented in part at the 2019 IEEE International Symposium on Information Theory [1].

Abstract

A judicious application of the Berry-Esseen theorem via suitable Augustin information measures is demonstrated to be sufficient for deriving the sphere packing bound with a prefactor that is $\mathit{\Omega}\left(n^{-0.5(1-E_{sp}^{\prime}(R))}\right)$ for all codes on certain families of channels, including the Gaussian channels and the non-stationary Rényi symmetric channels, and for the constant composition codes on stationary memoryless channels. The resulting non-asymptotic bounds have definite approximation error terms. As a preliminary result that might be of interest on its own, the trade-off between type I and type II error probabilities in the hypothesis testing problem with (possibly non-stationary) independent samples is determined up to some multiplicative constants using the Berry-Esseen theorem, assuming that the probabilities of both types of error are decaying exponentially with the number of samples.

1 Introduction

The decay of the optimal error probability with the block length for rates below the channel capacity has been studied since the early days of information theory. For certain channels and for certain values of the rate, sharp bounds were found early on. Elias in [2] for the binary symmetric channel, Shannon in [3] for the additive Gaussian noise channel (the equivalence of the bounds in [3] to (1.1) is proved in Appendix C), and Dobrushin in [4] for the strongly symmetric channels (see the original publication in Russian to avoid the typos present in the translation) proved that

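$$P_{e}^{(n)}=\Theta\left(n^{-0.5(1-E_{sp}^{\prime}(R))}e^{-nE_{sp}(R)}\right)\qquad\forall R\in[R_{crit},C) \tag{1.1}$$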

where $a_{n}=\Theta(b_{n})$ iff $0<\liminf\limits_{n\to\infty}\left\lvert\frac{a_{n}}{b_{n}}\right\rvert\leq\limsup\limits_{n\to\infty}\left\lvert\frac{a_{n}}{b_{n}}\right\rvert<\infty$, $E_{sp}(\cdot)$ is the sphere packing exponent of the channel, $E_{sp}^{\prime}(\cdot)$ is its derivative with respect to the rate, $R_{crit}$ is the rate at which the slope of the sphere packing exponent curve is minus one, i.e., $E_{sp}^{\prime}(R_{crit})=-1$, and $C$ is the capacity of the channel. (In this section, we suppress the dependence of the sphere packing exponent on the channel in our notation and denote it by $E_{sp}(R)$ rather than $E_{sp}(R,W)$.) On the other hand, Elias proved in [2] for the binary erasure channels that

$$P_{e}^{(n)}=\Theta\left(n^{-0.5}e^{-nE_{sp}(R)}\right)\qquad\forall R\in[R_{crit},C) \tag{1.2}$$

Neither (1.1) nor (1.2) holds for rates below the critical rate. If, however, we replace the equality sign with the greater-than-or-equal-to sign, then both (1.1) and (1.2) hold for all rates below the channel capacity. These lower bounds are customarily called sphere packing bounds (SPBs) because of the techniques used in their original derivations.

Derivations of the SPB in [2, 3, 4] relied on the geometric structure of the output space of the channel and on parameters that can be defined only for some models. The resulting bounds were expressed in terms of these parameters as well. Thus it was not even clear that the SPBs in [2, 3, 4] could be interpreted as specific instances of a general bound. The evidence for such an interpretation came not from a breakthrough about the lower bounds on the error probability but from a breakthrough about the upper bounds. Gallager’s seminal work [5] unified and generalized the upper bounds on the error probability (at least in terms of the exponent) in all the previous studies. It is only with Gallager’s formulation in [5] that one can express the bounds in [2, 3, 4] as (1.1) and (1.2).

The first complete proof of the SPB for arbitrary discrete stationary product channels (DSPCs) was presented in [6]. (These channels are customarily called discrete memoryless channels, i.e., DMCs. We call them DSPCs in order to underline the stationarity of these channels and the absence of any constraints on their input sets; in principle, such constraints might exist and stationarity might be absent in a discrete channel that is memoryless.) According to [6, Thm. 2]

[TABLE]

where $a_{n}=O(b_{n})$ iff there exists a $K\in\mathbb{R}_{+}$ such that $\lvert a_{n}\rvert\leq Kb_{n}$ for all $n$ large enough. In the following two years, the SPB was proved first for stationary product channels with finite input sets in [7] and then for (possibly) non-stationary product channels in [8]. Since then, the SPB has been proven for various channel models in [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24], including some quantum information theoretic ones. It is, however, worth noting that a general proof that holds both for Gaussian channels (considered in [3, 9, 10]) and for arbitrary DSPCs (considered in [6]) was absent until recently; see [21] and [22]. These later works on the SPB, i.e., [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24], were primarily interested in establishing the right exponential decay rate; thus, they were content with prefactors of the form $e^{-o(n)}$, where $a_{n}=o(b_{n})$ iff for all $\epsilon>0$ the inequality $\lvert a_{n}\rvert\leq\epsilon b_{n}$ holds for all $n$ large enough. Some authors did obtain prefactors of the form $e^{-O(\sqrt{n})}$ or $e^{-O(\ln n)}$, but obtaining the best possible (if not tight) prefactor was not an actual concern.

The quest for deriving SPBs with tight prefactors was put back on the map by Altuğ and Wagner in [25] and [26]. According to [25, Thm. 1], for any DSPC with a Gallager symmetric probability transition matrix $W$ with positive entries and rate $R$ in $(0,C)$, there exists an $A\in\mathbb{R}_{+}$ such that for any $\epsilon>0$ (the condition for Gallager symmetry is described in [11, p. 94]; the binary symmetric channel, the binary erasure channel, and the channels considered in [4] are symmetric according to this definition)

$$P_{e}^{(n)}\geq A\,n^{-0.5(1-E_{sp}^{\prime}(R)+\epsilon)}e^{-nE_{sp}(R)}\qquad\forall n\geq n_{0}$$

for some ${{\mathit{{n}}}}_{0}$ determined by ${{{\mathit{{W}}}}}$ , ${{\mathit{{R}}}}$ , and $\epsilon$ . The corresponding result was established for the constant composition codes on arbitrary DSPCs in [26, Thm. 1]. These results are generalized to classical-quantum channels in [27, Thms. 8 and 14], with a slight improvement, allowing $\epsilon=0$ for the symmetric channels.

The primary tool for the derivations in [25, 26, 27] is the Berry-Esseen theorem, albeit through certain auxiliary results inspired by a theorem of Bahadur and Rao [28], i.e., [25, (74)], [26, Proposition 5], and [27, Thm. 17]. Our main aim in this paper is to demonstrate that the analysis can be simplified and the results can be strengthened and generalized through a more judicious application of the Berry-Esseen theorem via suitable Augustin information measures.

The works [2] and [3] not only established (1.1) and (1.2) but also obtained closed-form expressions for the upper and lower bounds implicit in (1.1) and (1.2). Dobrushin went one step further and calculated the exact asymptotic behavior of the SPB and the random coding bound by analyzing the lattice and non-lattice cases separately for the random variables used to derive these two bounds; see [4, (1.32), (1.33), (1.34)]. Recently, the saddle point approximation has been used to derive the SPB with the same asymptotic prefactor [29, Corollary 2] under a weaker symmetry hypothesis (the binary input Gaussian channel and the binary erasure channel satisfy the symmetry hypothesis of [29], but not that of [4]), albeit by assuming a common support for all output distributions of the channel and a non-lattice structure for the random variables involved (neither of these assumptions was needed while deriving this result in [4]). The main drawback of the analysis in [29] is the technical conditions that need to be confirmed for applying the saddle point approximation via [30, Proposition 2.3.1].

Remark 1.1.

The proof of [29, Corollary 2] holds only for channels whose Augustin center does not change with the order, i.e., for $W$’s for which there exists a $q$ such that $q_{\alpha,W}=q$ for all $\alpha\in(0,1)$. Note that for the channels that violate this additional hypothesis, the $O(n^{-1/2})$ approximation error terms in [29, (30)] are $\rho$-dependent because of the implicit $Q$-dependence of the $O(n^{-1/2})$ approximation error terms in [29, (12) and (13)]. In order to recover a result similar to [29, Corollary 2] for a channel whose Augustin center changes with the order, one needs a saddle point approximation that holds for a parametric family of i.i.d. sequences of random variables, such as [31, Proposition 1], rather than [29, Lemma 2], which holds for a single i.i.d. sequence of random variables.

Let us finish this section with an overview of the paper. In Section 2, we describe our model and notation. In Section 3, we first recall the connection between the hypothesis testing problem and tilted probability measures, and then we derive our primary technical tool using the Berry-Esseen theorem. In Section 4, we review Augustin’s information measures and the sphere packing exponent. In Section 5, we state and prove refined SPBs for various models using Lemma 3.4 and the observations recalled in Section 4. We conclude our presentation with a brief discussion of the results, recent developments, and future work in Section 6.

2 Model and Notation

For any set $\mathscr{X}$, we denote the set of all probability mass functions that are non-zero only on finitely many elements of $\mathscr{X}$ by $\mathscr{P}(\mathscr{X})$. For any measurable space $(\mathscr{Y},\mathcal{Y})$, we denote the set of all probability measures on it by $\mathcal{P}(\mathcal{Y})$. We denote the expected value of a measurable function $f$ under the probability measure $\mu$ by $\mathbf{E}_{\mu}[f]$. Similarly, we denote the variance of $f$ under $\mu$, i.e., $\mathbf{E}_{\mu}\left[(f-\mathbf{E}_{\mu}[f])^{2}\right]$, by $\mathbf{V}_{\mu}[f]$.

For sets $\mathscr{X}_{1},\ldots,\mathscr{X}_{n}$, we denote their Cartesian product by $\mathscr{X}_{1}^{n}$, and for $\sigma$-algebras $\mathcal{Y}_{1},\ldots,\mathcal{Y}_{n}$, we denote their product by $\mathcal{Y}_{1}^{n}$. We use the symbol $\otimes$ to denote the product of measures.

A channel $W$ is a function from the input set $\mathscr{X}$ to the set of all probability measures on the output space $(\mathscr{Y},\mathcal{Y})$:

$$W:\mathscr{X}\to\mathcal{P}(\mathcal{Y}).$$

A channel $W$ is called a discrete channel if both $\mathscr{X}$ and $\mathscr{Y}$ are finite sets. The product of $W_{t}:\mathscr{X}_{t}\to\mathcal{P}(\mathcal{Y}_{t})$ for $t\in\{1,\ldots,n\}$ is a channel of the form $W_{[1,n]}:\mathscr{X}_{1}^{n}\to\mathcal{P}(\mathcal{Y}_{1}^{n})$ satisfying

$$W_{[1,n]}(x_{1}^{n})=\otimes_{t=1}^{n}W_{t}(x_{t})\qquad\forall x_{1}^{n}\in\mathscr{X}_{1}^{n}.$$

Any channel obtained by curtailing the input set of a length $n$ product channel is called a length $n$ memoryless channel. A product channel $W_{[1,n]}$ is stationary iff $W_{t}=W$ for all $t$’s for some $W$. On a stationary channel, we denote the composition (i.e., the empirical distribution, the type) of each $x_{1}^{n}$ by $\varUpsilon(x_{1}^{n})$; thus $\varUpsilon(x_{1}^{n})\in\mathscr{P}(\mathscr{X})$.

The pair $(\varPsi,\varTheta)$ is an $(M,L)$ channel code on $W_{[1,n]}$ iff

• The encoding function $\varPsi$ is a function from the message set $\mathscr{M}\triangleq\{1,2,\ldots,M\}$ to the input set $\mathscr{X}_{1}^{n}$.

• The decoding function $\varTheta$ is a $\mathcal{Y}_{1}^{n}$-measurable function from the output set $\mathscr{Y}_{1}^{n}$ to the set $\widehat{\mathscr{M}}\triangleq\{\mathscr{L}:\mathscr{L}\subset\mathscr{M}\text{ and }\lvert\mathscr{L}\rvert\leq L\}$.

Given an $(M,L)$ channel code $(\varPsi,\varTheta)$ on $W_{[1,n]}$, the conditional error probability $P_{e}^{m}$ for $m\in\mathscr{M}$ and the average error probability $P_{e}$ are defined as

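$$P_{e}^{m}\triangleq W_{[1,n]}(\varPsi(m))\left(\{y_{1}^{n}:m\notin\varTheta(y_{1}^{n})\}\right),\qquad P_{e}\triangleq\frac{1}{M}\sum_{m\in\mathscr{M}}P_{e}^{m}.$$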
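To make the two definitions concrete, the following Python sketch evaluates $P_{e}^{m}$ and $P_{e}$ exactly for a toy length-5 code on a stationary binary symmetric channel by enumerating all outputs. The channel, the codebook, and the nearest-codewords list decoder `nearest_list_decoder` are hypothetical choices for illustration and are not taken from the paper; messages are indexed from 0 rather than 1.

```python
import itertools


def bsc_transition(x, y, eps):
    """W_[1,n](x)(y) for a stationary binary symmetric channel with crossover eps."""
    flips = sum(a != b for a, b in zip(x, y))
    return eps ** flips * (1 - eps) ** (len(x) - flips)


def nearest_list_decoder(codebook, L):
    """Hypothetical list decoder: the L codewords closest in Hamming distance,
    ties broken by message index (messages are 0-based here)."""
    def decode(y):
        def key(m):
            return (sum(a != b for a, b in zip(codebook[m], y)), m)
        return set(sorted(range(len(codebook)), key=key)[:L])
    return decode


def error_probabilities(codebook, decode, eps):
    """Return ([P_e^m for each m], P_e) by enumerating all 2^n outputs."""
    n = len(codebook[0])
    outputs = list(itertools.product((0, 1), repeat=n))
    conditional = []
    for m, x in enumerate(codebook):
        # P_e^m: probability that m is missing from the decoded list when Psi(m) = x is sent
        conditional.append(sum(bsc_transition(x, y, eps)
                               for y in outputs if m not in decode(y)))
    return conditional, sum(conditional) / len(conditional)


codebook = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1), (0, 0, 1, 1, 1), (1, 1, 0, 0, 0)]
cond1, avg1 = error_probabilities(codebook, nearest_list_decoder(codebook, 1), 0.1)
cond2, avg2 = error_probabilities(codebook, nearest_list_decoder(codebook, 2), 0.1)
```

Enlarging the list from $L=1$ to $L=2$ cannot increase any $P_{e}^{m}$ in this sketch, because the longer list always contains the shorter one; with $L=M$ every error probability is zero.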

An encoding function $\varPsi$ (hence the corresponding code) on a stationary product channel satisfies an empirical distribution constraint $\mathscr{A}\subset\mathscr{P}(\mathscr{X})$ iff the compositions of all of its codewords are in $\mathscr{A}$, i.e., iff $\varUpsilon(\varPsi(m))\in\mathscr{A}$ for all $m\in\mathscr{M}$. A code is called a constant composition code iff all of its codewords have the same composition, i.e., there exists a $p$ in $\mathscr{P}(\mathscr{X})$ satisfying $\varUpsilon(\varPsi(m))=p$ for all $m\in\mathscr{M}$.
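The composition $\varUpsilon(x_{1}^{n})$ and the two code properties defined above can be checked mechanically. This sketch assumes codewords are given as Python sequences (strings here) and represents compositions as dictionaries of exact fractions; the membership test `at_most_half_a` is a hypothetical stand-in for a constraint set $\mathscr{A}$.

```python
from collections import Counter
from fractions import Fraction


def composition(x):
    """Empirical distribution (type) of x_1^n as exact fractions; letters that
    do not occur are omitted, i.e., they have mass zero."""
    n = len(x)
    return {a: Fraction(c, n) for a, c in Counter(x).items()}


def is_constant_composition(codebook):
    """True iff every codeword Psi(m) has the same composition p."""
    comps = [composition(x) for x in codebook]
    return all(c == comps[0] for c in comps)


def satisfies_constraint(codebook, in_A):
    """True iff Upsilon(Psi(m)) lies in A for all m; `in_A` is a membership test."""
    return all(in_A(composition(x)) for x in codebook)


def at_most_half_a(p):
    """Hypothetical constraint set A: at most half of the letters are 'a'."""
    return p.get("a", 0) <= Fraction(1, 2)


cc_code = ["aabc", "abca", "cbaa"]
mixed_code = ["aabc", "abcc"]
```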

3 Hypothesis Testing Problem and Berry-Esseen Theorem

Our primary aim in this section is to characterize —up to a multiplicative constant— the asymptotic behavior of type I error probability with the number of samples for a hypothesis testing problem between product measures, when type II error probability is decaying exponentially. We use the Berry-Esseen theorem via the concepts of Rényi divergence and tilted probability measure to do that. Let us, first, recall the definitions of the Rényi divergence and the tilted probability measure.

Definition 3.1.

For any ${{\mathit{{\alpha}}}}\in{\mathbb{R}}_{{}^{{+}}}$ and ${{\it{{w}}}},{{\it{{q}}}}\in{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}})}$ , the order ${{\mathit{{\alpha}}}}$ Rényi divergence between ${{\it{{w}}}}$ and ${{\it{{q}}}}$ is

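$$D_{\alpha}(w\|q)\triangleq\begin{cases}\frac{1}{\alpha-1}\ln\int\left(\frac{dw}{d\nu}\right)^{\alpha}\left(\frac{dq}{d\nu}\right)^{1-\alpha}\nu(dy) & \alpha\neq1\\[4pt] \int\frac{dw}{d\nu}\left(\ln\frac{dw}{d\nu}-\ln\frac{dq}{d\nu}\right)\nu(dy) & \alpha=1\end{cases}$$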

where ${{{\it{{\nu}}}}}$ is any measure satisfying ${{\it{{w}}}}{\prec}{{{\it{{\nu}}}}}$ and ${{\it{{q}}}}{\prec}{{{\it{{\nu}}}}}$ .
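For finite output sets one may take $\nu$ to be the counting measure, so the densities in Definition 3.1 are simply the probability mass functions. The sketch below implements $D_{\alpha}(w\|q)$ in nats under that assumption, including the infinite cases; the example pmfs are arbitrary illustrative values.

```python
import math


def renyi_divergence(w, q, alpha):
    """D_alpha(w||q) in nats for pmfs w, q on a finite set (nu = counting measure)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    pairs = list(zip(w, q))
    if alpha == 1:
        # Kullback-Leibler divergence: infinite unless w is absolutely continuous in q
        if any(wy > 0 and qy == 0 for wy, qy in pairs):
            return math.inf
        return sum(wy * math.log(wy / qy) for wy, qy in pairs if wy > 0)
    if alpha > 1 and any(wy > 0 and qy == 0 for wy, qy in pairs):
        return math.inf  # q^(1-alpha) blows up where w > 0 = q
    s = sum(wy ** alpha * qy ** (1 - alpha) for wy, qy in pairs if wy > 0 and qy > 0)
    if s == 0:
        return math.inf  # disjoint supports
    return math.log(s) / (alpha - 1)


w = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
orders = [0.25, 0.5, 0.75, 1.0, 1.5, 2.0]
values = [renyi_divergence(w, q, a) for a in orders]
```

The computed `values` are nondecreasing in the order, in line with the well-known monotonicity of the Rényi divergence in $\alpha$, and $D_{0.999}(w\|q)$ is close to $D_{1}(w\|q)$.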

The order one Rényi divergence is the Kullback-Leibler divergence. For other orders, the Rényi divergence can be characterized in terms of the Kullback-Leibler divergence too:

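$$(1-\alpha)D_{\alpha}(w\|q)=\inf_{v\in\mathcal{P}(\mathcal{Y})}\left\{\alpha D_{1}(v\|w)+(1-\alpha)D_{1}(v\|q)\right\} \tag{3.1}$$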

with the convention that ${{\mathit{{\alpha}}}}{{\mathit{{D}}}}_{{1}}\!\left(\left.\!{{{\it{{v}}}}}\right\|{{{\it{{w}}}}}\right)+(1-{{\mathit{{\alpha}}}}){{\mathit{{D}}}}_{{1}}\!\left(\left.\!{{{\it{{v}}}}}\right\|{{{\it{{q}}}}}\right)=\infty$ if it would be otherwise undefined, see [32, Thm. 30]. The characterization given in (3.1) is related to another key concept of our analysis: the tilted probability measure.

Definition 3.2.

For any ${{\mathit{{\alpha}}}}\in{\mathbb{R}}_{{}^{{+}}}$ and ${{\it{{w}}}},{{\it{{q}}}}\in{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}})}$ satisfying ${{\mathit{{D}}}}_{{{{\mathit{{\alpha}}}}}}\!\left(\left.\!{{{\it{{w}}}}}\right\|{{{\it{{q}}}}}\right)<\infty$ , the order ${{\mathit{{\alpha}}}}$ tilted probability measure ${{{{\it{{w}}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{\it{{q}}}}}}}$ is

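$$\frac{dw_{\alpha}^{q}}{d\nu}\triangleq e^{(1-\alpha)D_{\alpha}(w\|q)}\left(\frac{dw}{d\nu}\right)^{\alpha}\left(\frac{dq}{d\nu}\right)^{1-\alpha}.$$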

If either ${{\mathit{{\alpha}}}}$ is in $(0,1)$ or ${{\mathit{{D}}}}_{{1}}\!\left(\left.\!{{{{{\it{{w}}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{\it{{q}}}}}}}}\right\|{{{\it{{w}}}}}\right)$ is finite, then the tilted probability measure is the unique probability measure achieving the infimum in (3.1) by [32, Thm. 30], i.e.,

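$$(1-\alpha)D_{\alpha}(w\|q)=\alpha D_{1}(w_{\alpha}^{q}\|w)+(1-\alpha)D_{1}(w_{\alpha}^{q}\|q). \tag{3.2}$$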
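On a finite alphabet, (3.1), Definition 3.2, and (3.2) can be checked numerically. The sketch below assumes pmfs with a common full support and an order in $(0,1)$. It builds $w_{\alpha}^{q}$ by normalizing $w^{\alpha}q^{1-\alpha}$ (the normalizing factor is exactly $e^{(1-\alpha)D_{\alpha}(w\|q)}$), confirms that it attains $(1-\alpha)D_{\alpha}(w\|q)$ in the objective of (3.1), and compares against randomly drawn $v$; the pmfs, the order, and the random seed are illustrative assumptions.

```python
import math
import random


def kl(v, w):
    """D_1(v||w) for pmfs on a finite set."""
    total = 0.0
    for vy, wy in zip(v, w):
        if vy > 0:
            if wy == 0:
                return math.inf
            total += vy * math.log(vy / wy)
    return total


def renyi(w, q, alpha):
    """D_alpha(w||q) for alpha != 1 on a finite set (common support assumed)."""
    s = sum(wy ** alpha * qy ** (1 - alpha) for wy, qy in zip(w, q) if wy > 0 and qy > 0)
    return math.log(s) / (alpha - 1)


def tilted(w, q, alpha):
    """w_alpha^q: w^alpha q^(1-alpha) normalized; the normalizer equals
    e^{(1-alpha) D_alpha(w||q)} because D_alpha = ln(z) / (alpha - 1)."""
    weights = [wy ** alpha * qy ** (1 - alpha) if wy > 0 and qy > 0 else 0.0
               for wy, qy in zip(w, q)]
    z = sum(weights)
    return [x / z for x in weights]


def objective(v, w, q, alpha):
    """alpha D_1(v||w) + (1 - alpha) D_1(v||q), the quantity minimized in (3.1)."""
    return alpha * kl(v, w) + (1 - alpha) * kl(v, q)


w = [0.5, 0.3, 0.2]
q = [0.1, 0.3, 0.6]
alpha = 0.4
lhs = (1 - alpha) * renyi(w, q, alpha)
rhs = objective(tilted(w, q, alpha), w, q, alpha)  # identity (3.2)

rng = random.Random(0)
others = []
for _ in range(200):
    raw = [rng.random() + 1e-3 for _ in w]
    total = sum(raw)
    others.append(objective([r / total for r in raw], w, q, alpha))
```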

Furthermore, under the same hypothesis the identities

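$$\ln\frac{dw_{\alpha}^{q}}{dq}-D_{1}(w_{\alpha}^{q}\|q)=\alpha\left(\ln\frac{dw_{ac}}{dq}-\mathbf{E}_{w_{\alpha}^{q}}\left[\ln\frac{dw_{ac}}{dq}\right]\right), \tag{3.3}$$
$$\ln\frac{dw_{\alpha}^{q}}{dw}-D_{1}(w_{\alpha}^{q}\|w)=(\alpha-1)\left(\ln\frac{dw_{ac}}{dq}-\mathbf{E}_{w_{\alpha}^{q}}\left[\ln\frac{dw_{ac}}{dq}\right]\right) \tag{3.4}$$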

hold ${{{{\it{{w}}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{\it{{q}}}}}}}$ -a.s., where ${{{{\it{{w}}}}}_{{ac}}}$ is the component of ${{\it{{w}}}}$ that is absolutely continuous in ${{\it{{q}}}}$ .
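When $w$ and $q$ have the same finite support, $w_{ac}=w$ and the identities (3.3) and (3.4) hold at every point. The following sketch, under that full-support assumption and with illustrative pmfs, computes the pointwise gaps between the two sides of each identity; they should vanish up to rounding.

```python
import math


def tilted(w, q, alpha):
    """w_alpha^q for full-support pmfs: w^alpha q^(1-alpha), normalized."""
    weights = [wy ** alpha * qy ** (1 - alpha) for wy, qy in zip(w, q)]
    z = sum(weights)
    return [x / z for x in weights]


def kl(v, w):
    """D_1(v||w) for full-support pmfs."""
    return sum(vy * math.log(vy / wy) for vy, wy in zip(v, w) if vy > 0)


def identity_gaps(w, q, alpha):
    """Pointwise (LHS - RHS) of (3.3) and (3.4); here w_ac = w."""
    wt = tilted(w, q, alpha)
    llr = [math.log(wy / qy) for wy, qy in zip(w, q)]   # ln dw_ac/dq
    mean_llr = sum(t * l for t, l in zip(wt, llr))       # E_{w_alpha^q}[ln dw_ac/dq]
    gaps = []
    for y in range(len(w)):
        lhs33 = math.log(wt[y] / q[y]) - kl(wt, q)
        lhs34 = math.log(wt[y] / w[y]) - kl(wt, w)
        gaps.append((lhs33 - alpha * (llr[y] - mean_llr),
                     lhs34 - (alpha - 1) * (llr[y] - mean_llr)))
    return gaps


w = [0.5, 0.3, 0.2]
q = [0.1, 0.3, 0.6]
```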

Let us proceed with recalling the Berry-Esseen theorem.

Lemma 3.3 ([33, 34, 35]).

Let $\{{{\xi}_{{{{\mathit{{t}}}}}}}\}_{{{\mathit{{t}}}}\in{\mathbb{Z}}_{{}^{{+}}}}$ be independent zero-mean random variables satisfying $\sum\nolimits_{{{\mathit{{t}}}}=1}^{{{\mathit{{n}}}}}{\bf E}_{{\!}}\!\left[{{{{\xi}_{{{{\mathit{{t}}}}}}}^{2}}}\right]<\infty$ . Then there exists an absolute constant $\omega\leq 0.5600$ satisfying

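$$\left\lvert\mathbf{P}\left[\sum_{t=1}^{n}\xi_{t}<\tau\right]-\Phi\left(\frac{\tau}{\sqrt{a_{2}n}}\right)\right\rvert\leq\frac{\omega\,a_{3}}{a_{2}\sqrt{a_{2}n}}\qquad\forall\tau\in\mathbb{R}$$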

where ${{{{\it{{a}}}}}_{{{\kappa}}}}=\frac{1}{{{\mathit{{n}}}}}\sum\nolimits_{{{\mathit{{t}}}}=1}^{{{\mathit{{n}}}}}{\bf E}_{{\!}}\!\left[{{{\left\lvert{{{{\xi}_{{{{\mathit{{t}}}}}}}}}\right\lvert}^{{\kappa}}}}\right]$ and ${{{{\mathit{{\Phi}}}}}\left({{{{\it{{s}}}}}}\right)}=\tfrac{1}{\sqrt{2\pi}}\int_{-\infty}^{{{\it{{s}}}}}e^{-\nicefrac{{{{\mathit{{z}}}}^{2}}}{{2}}}{\mathrm{d}{{{\mathit{{z}}}}}}.$
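A quick way to see Lemma 3.3 in action is to compare the exact distribution of a sum of independent, non-identically distributed, centered Bernoulli variables with the normal approximation. The sketch below is an illustration rather than part of the paper's argument: it computes the exact law by convolution, evaluates the supremum in the lemma exactly (it is attained as $\tau$ approaches an atom from either side), and compares it with $\omega a_{3}/(a_{2}\sqrt{a_{2}n})$ for $\omega=0.56$. The Bernoulli summands and the values of $p_{t}$ are assumptions made for the example.

```python
import math


def phi(s):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))


def berry_esseen_gap(ps, omega=0.56):
    """For xi_t = B_t - p_t with independent B_t ~ Bernoulli(p_t), return
    (sup_tau |P[sum xi_t < tau] - Phi(tau / sqrt(a2 n))|, omega a3 / (a2 sqrt(a2 n)))."""
    n = len(ps)
    a2 = sum(p * (1 - p) for p in ps) / n
    a3 = sum(p * (1 - p) * ((1 - p) ** 2 + p ** 2) for p in ps) / n  # E|B_t - p_t|^3
    # exact law of K = sum_t B_t (Poisson binomial) by repeated convolution
    dist = [1.0]
    for p in ps:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1 - p)
            nxt[k + 1] += mass * p
        dist = nxt
    mean, sigma = sum(ps), math.sqrt(a2 * n)
    gap, below = 0.0, 0.0  # below = P[K < k]
    for k, mass in enumerate(dist):
        z = phi((k - mean) / sigma)
        # left of atom k: P[S < tau] = below; just right of it: below + mass
        gap = max(gap, abs(below - z), abs(below + mass - z))
        below += mass
    return gap, omega * a3 / (a2 * math.sqrt(a2 * n))


ps = [0.1 + 0.8 * t / 49 for t in range(50)]  # non-identical p_t
gap, bound = berry_esseen_gap(ps)
```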

Lemma 3.4, in the following, characterizes the trade-off between type I and type II error probabilities for a hypothesis testing problem with independent samples, assuming that both error probabilities are decaying —at least— exponentially with the number of samples. Lemma 3.4, which is derived using the Berry-Esseen theorem, can be interpreted as a refinement of [6, Thm. 5], which is derived using Chebyshev’s inequality.

Lemma 3.4.

For any ${{\mathit{{\alpha}}}}\in(0,1)$ , ${{\mathit{{n}}}}\in{\mathbb{Z}}_{{}^{{+}}}$ , ${{{{\it{{w}}}}}_{{{{\mathit{{t}}}}}}},{{{{\it{{q}}}}}_{{{{\mathit{{t}}}}}}}\in{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}_{{{\mathit{{t}}}}}})}$ , let ${{{{\it{{w}}}}}_{{{{\mathit{{t}}}},ac}}}$ be the component of ${{{{\it{{w}}}}}_{{{{\mathit{{t}}}}}}}$ that is absolutely continuous in ${{{{\it{{q}}}}}_{{{{\mathit{{t}}}}}}}$ and let ${{{{\it{{a}}}}}_{{2}}}$ , ${{{{\it{{a}}}}}_{{3}}}$ , and ${{\mathit{{\varDelta}}}}$ be

[TABLE]

where ${{\it{{w}}}}=\otimes_{{{\mathit{{t}}}}=1}^{{{\mathit{{n}}}}}{{{{\it{{w}}}}}_{{{{\mathit{{t}}}}}}}$ and ${{\it{{q}}}}=\otimes_{{{\mathit{{t}}}}=1}^{{{\mathit{{n}}}}}{{{{\it{{q}}}}}_{{{{\mathit{{t}}}}}}}$ . Then for any ${{\mathscr{{E}}}}\in{{\mathcal{{Y}}}}_{1}^{{{\mathit{{n}}}}}$ and $\beta\geq{{\mathit{{n}}}}^{-\nicefrac{{1}}{{2}}}e^{-{{\mathit{{\alpha}}}}\sqrt{{{{{\it{{a}}}}}_{{2}}}{{\mathit{{n}}}}}}$ , satisfying ${{\it{{q}}}}({{\mathscr{{E}}}})\leq\beta e^{-{{\mathit{{D}}}}_{{1}}\!\left(\left.\!{{{{{\it{{w}}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{\it{{q}}}}}}}}\right\|{{{\it{{q}}}}}\right)}$ , we have

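$$w(\mathscr{Y}_{1}^{n}\setminus\mathscr{E})\geq\varDelta^{\alpha-1}\beta^{\frac{\alpha-1}{\alpha}}n^{-\frac{1}{2\alpha}}e^{-D_{1}(w_{\alpha}^{q}\|w)} \tag{3.5}$$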

provided that $\beta\leq{{\mathit{{\varDelta}}}}^{-{{\mathit{{\alpha}}}}}{{\mathit{{n}}}}^{-\nicefrac{{1}}{{2}}}e^{{{\mathit{{\alpha}}}}\sqrt{{{{{\it{{a}}}}}_{{2}}}{{\mathit{{n}}}}}}$ . Furthermore, for any ${{\mathit{{\alpha}}}}\in(0,1)$ and $\beta\in{\mathbb{R}}_{{}^{{+}}}$ , there exists an event ${{\mathscr{{E}}}}\in{{\mathcal{{Y}}}}_{1}^{{{\mathit{{n}}}}}$ such that

[TABLE]

Proof of Lemma 3.4.

Let the random variables ${{\xi}_{{{{\mathit{{t}}}}}}}$ and ${{\xi}}$ and the event ${\mathscr{{B}}}$ be

[TABLE]

Thus ${{\xi}}=\ln\tfrac{{\mathrm{d}{{{{{\it{{w}}}}}_{{ac}}}}}}{{\mathrm{d}{{{\it{{q}}}}}}}$ holds ${{\it{{q}}}}$ -a.s. by the definitions of ${{\xi}_{{{{\mathit{{t}}}}}}}$ and ${{\xi}}$ . Hence (3.3), (3.4), and the definition of ${\mathscr{{B}}}$ imply that

[TABLE]

Thus for any ${{\mathscr{{E}}}}\in{{\mathcal{{Y}}}}_{1}^{{{\mathit{{n}}}}}$ , we have

[TABLE]

On the other hand, ${{\xi}_{{{{\mathit{{t}}}}}}}$ ’s are jointly independent under the tilted probability measure ${{{{\it{{w}}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{\it{{q}}}}}}}$ . Thus the Berry-Esseen theorem, given in Lemma 3.3, implies

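$$\begin{aligned}w_{\alpha}^{q}(\mathscr{B})&\geq\frac{1}{\sqrt{2\pi}}\int_{\frac{\tau_{0}}{\sqrt{a_{2}n}}}^{\frac{\tau_{1}}{\sqrt{a_{2}n}}}e^{-z^{2}/2}\,dz-2\frac{0.56}{\sqrt{n}}\frac{a_{3}}{a_{2}\sqrt{a_{2}}}\\&\geq\frac{1}{\sqrt{2\pi}}e^{-\frac{(\lvert\tau_{0}\rvert\vee\lvert\tau_{1}\rvert)^{2}}{2na_{2}}}\frac{\tau_{1}-\tau_{0}}{\sqrt{a_{2}n}}-2\frac{0.56}{\sqrt{n}}\frac{a_{3}}{a_{2}\sqrt{a_{2}}}.\end{aligned}$$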

If we set $\tau_{0}=-\tfrac{2\ln\beta+\ln{{\mathit{{n}}}}}{2{{\mathit{{\alpha}}}}}-\ln{{\mathit{{\varDelta}}}}$ and $\tau_{1}=-\tfrac{2\ln\beta+\ln{{\mathit{{n}}}}}{2{{\mathit{{\alpha}}}}}$ , then $-\sqrt{{{{{\it{{a}}}}}_{{2}}}{{\mathit{{n}}}}}\leq\tau_{0}\leq\tau_{1}\leq\sqrt{{{{{\it{{a}}}}}_{{2}}}{{\mathit{{n}}}}}$ by the hypothesis and

$$w_{\alpha}^{q}(\mathscr{B})\geq\frac{2}{\sqrt{n}}.$$

Furthermore, ${{{{\it{{w}}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{\it{{q}}}}}}}({{\mathscr{{E}}}}\cap{\mathscr{{B}}})\leq\tfrac{1}{\sqrt{{{\mathit{{n}}}}}}$ as a result of (3.8), $e^{{{\mathit{{\alpha}}}}\tau_{1}}=\tfrac{1}{\beta\sqrt{{{\mathit{{n}}}}}}$ , and the hypothesis ${{\it{{q}}}}({{\mathscr{{E}}}})\leq\beta e^{-{{\mathit{{D}}}}_{{1}}\!\left(\left.\!{{{{{\it{{w}}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{\it{{q}}}}}}}}\right\|{{{\it{{q}}}}}\right)}$ . Thus ${{{{\it{{w}}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{\it{{q}}}}}}}({\mathscr{{B}}}\setminus{{\mathscr{{E}}}})\geq\tfrac{1}{\sqrt{{{\mathit{{n}}}}}}$ . Then using (3.9) and $e^{(1-{{\mathit{{\alpha}}}})\tau_{0}}=\beta^{\frac{{{\mathit{{\alpha}}}}-1}{{{\mathit{{\alpha}}}}}}{{\mathit{{n}}}}^{\frac{{{\mathit{{\alpha}}}}-1}{2{{\mathit{{\alpha}}}}}}{{\mathit{{\varDelta}}}}^{{{\mathit{{\alpha}}}}-1}$ we get
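The choice of $\tau_{0}$ and $\tau_{1}$ is pure bookkeeping, and the facts used above (the range $-\sqrt{a_{2}n}\leq\tau_{0}\leq\tau_{1}\leq\sqrt{a_{2}n}$, $e^{\alpha\tau_{1}}=\frac{1}{\beta\sqrt{n}}$, and the expression for $e^{(1-\alpha)\tau_{0}}$) can be verified numerically. In this sketch $\alpha$, $n$, $a_{2}$, and $\varDelta$ are hypothetical values rather than quantities computed from $w$ and $q$; the last check confirms that $e^{(1-\alpha)\tau_{0}}/\sqrt{n}$ equals the prefactor $\varDelta^{\alpha-1}\beta^{\frac{\alpha-1}{\alpha}}n^{-\frac{1}{2\alpha}}$ of (3.5).

```python
import math


def tau_choice(alpha, n, beta, delta):
    """tau_0 and tau_1 as chosen in the proof of Lemma 3.4."""
    tau1 = -(2 * math.log(beta) + math.log(n)) / (2 * alpha)
    return tau1 - math.log(delta), tau1


# Hypothetical values; in the lemma, a2 and Delta are determined by w and q.
alpha, n, a2, delta = 0.6, 400, 0.8, 50.0
root = math.sqrt(a2 * n)
beta_lo = n ** -0.5 * math.exp(-alpha * root)
beta_hi = delta ** -alpha * n ** -0.5 * math.exp(alpha * root)

checks = []
for i in range(11):
    beta = beta_lo * (beta_hi / beta_lo) ** (i / 10)  # log-spaced in [beta_lo, beta_hi]
    tau0, tau1 = tau_choice(alpha, n, beta, delta)
    in_range = -root - 1e-9 <= tau0 <= tau1 <= root + 1e-9
    id1 = math.isclose(math.exp(alpha * tau1), 1 / (beta * math.sqrt(n)), rel_tol=1e-9)
    id2 = math.isclose(math.exp((1 - alpha) * tau0),
                       beta ** ((alpha - 1) / alpha) * n ** ((alpha - 1) / (2 * alpha))
                       * delta ** (alpha - 1), rel_tol=1e-9)
    # prefactor of (3.5): e^{(1-alpha) tau0} / sqrt(n)
    id3 = math.isclose(math.exp((1 - alpha) * tau0) / math.sqrt(n),
                       delta ** (alpha - 1) * beta ** ((alpha - 1) / alpha)
                       * n ** (-1 / (2 * alpha)), rel_tol=1e-9)
    checks.append((in_range, id1, id2, id3))
```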

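$$w(\mathscr{Y}_{1}^{n}\setminus\mathscr{E})\geq e^{(1-\alpha)\tau_{0}-D_{1}(w_{\alpha}^{q}\|w)}\,w_{\alpha}^{q}(\mathscr{B}\setminus\mathscr{E})\geq\frac{e^{(1-\alpha)\tau_{0}}}{\sqrt{n}}e^{-D_{1}(w_{\alpha}^{q}\|w)}=\varDelta^{\alpha-1}\beta^{\frac{\alpha-1}{\alpha}}n^{-\frac{1}{2\alpha}}e^{-D_{1}(w_{\alpha}^{q}\|w)}.$$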

Remark 3.5.

While deriving bounds similar to (3.5), the constants $\tau_{0}$ and $\tau_{1}$ are usually assumed to satisfy $\tau_{0}=-\tau_{1}$ , see for example [6, Thm. 5] or [27, Thm. 11]. Such a choice, however, does not lead to tight bounds in our case.

To establish the existence of an event satisfying both (3.6) and (3.7), let us consider the event ${{\mathscr{{E}}}}$ given in the following

[TABLE]

where $\gamma$ is a real number to be determined later and ${{{\it{{\nu}}}}}$ is any measure satisfying both ${{\it{{w}}}}{\prec}{{{\it{{\nu}}}}}$ and ${{\it{{q}}}}{\prec}{{{\it{{\nu}}}}}$ .

Remark 3.6.

The random variable ${{\xi}}$ is defined only for ${{\mathit{{y}}}}_{1}^{{{\mathit{{n}}}}}$ ’s with a positive $\tfrac{{\mathrm{d}{{{\it{{q}}}}}}}{{\mathrm{d}{{{{\it{{\nu}}}}}}}}$ . Thus one can define ${{\xi}}$ to be infinite for ${{\mathit{{y}}}}_{1}^{{{\mathit{{n}}}}}$ ’s satisfying both $\tfrac{{\mathrm{d}{{{\it{{q}}}}}}}{{\mathrm{d}{{{{\it{{\nu}}}}}}}}=0$ and $\tfrac{{\mathrm{d}{{{\it{{w}}}}}}}{{\mathrm{d}{{{{\it{{\nu}}}}}}}}>0$ , and define the event ${{\mathscr{{E}}}}$ to be the event that ${{\xi}}$ is greater than or equal to ${\bf E}_{{{{{{\it{{w}}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{\it{{q}}}}}}}}}\!\left[{{{\xi}}}\right]+\gamma$ .

For the event ${{\mathscr{{E}}}}$ defined in (3.10), as a result of (3.3) we have

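$$\begin{aligned}q(\mathscr{E})&=e^{-D_{1}(w_{\alpha}^{q}\|q)}\,\mathbf{E}_{w_{\alpha}^{q}}\left[\mathds{1}_{\{\xi\geq\mathbf{E}_{w_{\alpha}^{q}}[\xi]+\gamma\}}e^{-\alpha(\xi-\mathbf{E}_{w_{\alpha}^{q}}[\xi])}\right]\\&\leq e^{-D_{1}(w_{\alpha}^{q}\|q)-\alpha\gamma}\sum_{\kappa=0}^{\infty}w_{\alpha}^{q}(\mathscr{E}_{\kappa})e^{-\alpha\kappa},\end{aligned} \tag{3.11}$$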

where the event ${{\mathscr{{E}}}}_{{\kappa}}$ is defined for each ${\kappa}\in{\mathbb{Z}}$ to be

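$$\mathscr{E}_{\kappa}\triangleq\left\{y_{1}^{n}:\gamma+\kappa\leq\xi-\mathbf{E}_{w_{\alpha}^{q}}[\xi]<\gamma+\kappa+1\right\}.$$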

On the other hand, we can bound $w_{\alpha}^{q}(\mathscr{E}_{\kappa})$ uniformly for all integers $\kappa$ using the Berry-Esseen theorem, i.e., Lemma 3.3, as follows

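$$w_{\alpha}^{q}(\mathscr{E}_{\kappa})\leq\frac{1}{\sqrt{n}}\left(\frac{1}{\sqrt{2\pi a_{2}}}+2\frac{0.56\,a_{3}}{a_{2}\sqrt{a_{2}}}\right)\qquad\forall\kappa\in\mathbb{Z}. \tag{3.12}$$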

For $\gamma=\tfrac{1}{{{\mathit{{\alpha}}}}}\ln\left[\tfrac{1}{\sqrt{{{\mathit{{n}}}}}}\left(\tfrac{1}{\sqrt{2\pi{{{{\it{{a}}}}}_{{2}}}}}+2\tfrac{0.56{{{{\it{{a}}}}}_{{3}}}}{{{{{\it{{a}}}}}_{{2}}}\sqrt{{{{{\it{{a}}}}}_{{2}}}}}\right)\tfrac{\beta^{-1}}{1-e^{-{{\mathit{{\alpha}}}}}}\right]$ , (3.6) follows from (3.11), (3.12), and $\sum\nolimits_{{\kappa}=0}^{\infty}e^{-{{\mathit{{\alpha}}}}{\kappa}}=\tfrac{1}{1-e^{-{{\mathit{{\alpha}}}}}}$ .
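The value of $\gamma$ is chosen so that the geometric series in (3.11), with each $w_{\alpha}^{q}(\mathscr{E}_{\kappa})$ replaced by the uniform bound of (3.12), collapses to exactly $\beta$. The following arithmetic check uses hypothetical values of $\alpha$, $n$, $\beta$, $a_{2}$, and $a_{3}$, and truncates the series after many terms.

```python
import math


def gamma_choice(alpha, n, beta, a2, a3):
    """gamma from the proof of (3.6), together with the per-slab bound of (3.12)."""
    per_slab = (1 / math.sqrt(2 * math.pi * a2)
                + 2 * 0.56 * a3 / (a2 * math.sqrt(a2))) / math.sqrt(n)
    gamma = math.log(per_slab / beta / (1 - math.exp(-alpha))) / alpha
    return gamma, per_slab


def q_factor(alpha, gamma, per_slab, terms=10_000):
    """e^{-alpha gamma} * sum_{k>=0} per_slab * e^{-alpha k}: the factor that
    multiplies e^{-D_1(w_alpha^q||q)} in (3.11) after invoking (3.12)."""
    geometric = sum(math.exp(-alpha * k) for k in range(terms))
    return math.exp(-alpha * gamma) * per_slab * geometric


# Hypothetical parameter values, for illustration only.
alpha, n, beta, a2, a3 = 0.5, 1000, 0.3, 0.7, 1.1
gamma, per_slab = gamma_choice(alpha, n, beta, a2, a3)
factor = q_factor(alpha, gamma, per_slab)
```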

${{\it{{w}}}}({{{\mathscr{{Y}}}}_{1}^{{{\mathit{{n}}}}}}\setminus{{\mathscr{{E}}}})$ is bounded following a similar analysis, by invoking (3.4), instead of (3.3):

[TABLE]

Invoking first (3.12) and then $\tfrac{1}{1-e^{\alpha-1}}(1-e^{-\alpha})^{\frac{\alpha-1}{\alpha}}\leq\tfrac{1}{1-\alpha}\alpha^{\frac{\alpha-1}{\alpha}}e^{\frac{1}{2\alpha}}$, we get
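The elementary inequality invoked above holds for every $\alpha\in(0,1)$. One route is to apply $\frac{x}{1-e^{-x}}\leq e^{x/2}$ (equivalently, $x\leq e^{x/2}-e^{-x/2}$) with $x=1-\alpha$ and with $x=\alpha$, which reduces the claim to $1-\alpha\leq\frac{1}{2\alpha}$. The sketch below merely spot-checks the inequality on a grid, working in the log domain (via `math.expm1` and `math.log1p`) because $\alpha^{\frac{\alpha-1}{\alpha}}$ overflows floating point for small $\alpha$.

```python
import math


def log_lhs(alpha):
    """ln of (1 - e^{alpha-1})^{-1} (1 - e^{-alpha})^{(alpha-1)/alpha}."""
    return (-math.log(-math.expm1(alpha - 1))
            + (alpha - 1) / alpha * math.log(-math.expm1(-alpha)))


def log_rhs(alpha):
    """ln of (1 - alpha)^{-1} alpha^{(alpha-1)/alpha} e^{1/(2 alpha)}."""
    return -math.log1p(-alpha) + (alpha - 1) / alpha * math.log(alpha) + 1 / (2 * alpha)


grid = [k / 1000 for k in range(1, 1000)]
slack = [log_rhs(a) - log_lhs(a) for a in grid]
```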

[TABLE]

Then (3.7) follows from $\left(\tfrac{1}{\sqrt{8\pi}}+\tfrac{0.56{{{{\it{{a}}}}}_{{3}}}}{{{{{\it{{a}}}}}_{{2}}}}\right)\leq\tfrac{1\vee\sqrt{8\pi{{{{\it{{a}}}}}_{{2}}}}}{8\pi\sqrt{{{{{\it{{a}}}}}_{{2}}}e}}\ln{{\mathit{{\varDelta}}}}$ . ∎

Lemma 3.4 characterizes the asymptotic behavior of the trade-off between the optimal type I and type II error probabilities for a hypothesis testing problem with independent samples: $P_{e,II}^{(n)}$ is $\Theta\left(n^{-\frac{1}{2\alpha}}e^{-D_{1}(w_{\alpha}^{q}\|w)}\right)$ whenever $P_{e,I}^{(n)}$ is $\Theta\left(e^{-D_{1}(w_{\alpha}^{q}\|q)}\right)$. For the stationary case, i.e., when $w_{t}=w_{1}$ and $q_{t}=q_{1}$ for all $t$, Csiszár and Longo [36] described how (3.3) and (3.4) can be used together with an earlier result by Strassen [37, Thm. 1.1] to characterize the exact asymptotic behavior of $P_{e,II}^{(n)}$ for the case when $P_{e,I}^{(n)}=e^{-D_{1}(w_{\alpha}^{q}\|q)}$, i.e., $P_{e,II}^{(n)}=\left(K+o(1)\right)n^{-\frac{1}{2\alpha}}e^{-D_{1}(w_{\alpha}^{q}\|w)}$, with some minor inaccuracies discussed in Remark 3.7. One does not need to rely on Strassen’s result [37, Thm. 1.1] to characterize this exact asymptotic behavior. The Berry-Esseen theorem, however, is not sufficient for determining the value of the constant $K$.
In order to determine the constant $K$, one needs either to invoke finer characterizations of the asymptotic behavior of sums of independent random variables, such as the ones in [38, §IV.2, §IV.3] and [39, §42, §43], or to apply other techniques, such as the saddle point approximation described in [30, Prop. 2.3.1]. It is worth noting that both of these approaches require hypotheses stronger than those of the Berry-Esseen theorem. The situation is similar for other values of $\alpha$, but these are of no interest for our discussion of the sphere packing bound.

Remark 3.7.

We believe the approach of [36] is sound. Its calculations, however, seem to have some mistakes. Repeating the calculations as described in [36], we recover the second line of [36, (33)] as $\ln\tfrac{{{\mathit{{\alpha}}}}^{\!\ast}}{1-{{\mathit{{\alpha}}}}^{\!\ast}}-\frac{\ln S_{1}\sqrt{2\pi}}{{{\mathit{{\alpha}}}}^{\!\ast}}+{{\mathit{{o}}}\left({{1}}\right)}$ . With this modification [36, Thm. 2] is consistent with the intimately related results about the SPB proved earlier [4, (1.32), (1.33)] and since then [29, (38)].

4 Augustin’s Information Measure and The Sphere Packing Exponent

The ultimate aim of this section is to define the sphere packing exponent and to review those of its properties that will be useful in our analysis. To that end, we first recall the definitions of Augustin's information measures and review their elementary properties.

4-A Augustin’s Information Measures

Let us start by recalling the definition of the conditional Rényi divergence.

Definition 4.1.

For any ${{\mathit{{\alpha}}}}\in{\mathbb{R}}_{{}^{{+}}}$ , ${{{\mathit{{W}}}}}:{{\mathscr{{X}}}}\to{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}})}$ , ${{\it{{q}}}}\in{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}})}$ , and ${{\it{{p}}}}\in{{{\mathscr{{P}}}}({{{\mathscr{{X}}}}})}$ the order ${{\mathit{{\alpha}}}}$ conditional Rényi divergence for the input distribution ${{\it{{p}}}}$ is

[TABLE]
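For finite alphabets, Definition 4.1 can be evaluated directly. The following is a minimal sketch, assuming that the conditional Rényi divergence is the $p$-weighted average $\sum_{x}p(x)D_{\alpha}(W(x)\|q)$, as in Augustin's formulation, with $D_{\alpha}(w\|q)=\frac{1}{\alpha-1}\ln\sum_{y}w(y)^{\alpha}q(y)^{1-\alpha}$ and the Kullback-Leibler divergence as the $\alpha=1$ case. A channel is represented as a list of output distributions indexed by the input; all names are illustrative.

```python
import math

def renyi_divergence(w, q, alpha):
    """Order-alpha Renyi divergence D_alpha(w||q) in nats for pmfs on a finite set.
    alpha = 1 is treated as the Kullback-Leibler divergence (the limiting case).
    Assumes q(y) > 0 wherever w(y) > 0 (otherwise D_alpha is infinite for alpha >= 1)."""
    if alpha == 1.0:
        return sum(wi * math.log(wi / qi) for wi, qi in zip(w, q) if wi > 0)
    s = sum(wi ** alpha * qi ** (1.0 - alpha) for wi, qi in zip(w, q) if wi > 0)
    return math.log(s) / (alpha - 1.0)

def conditional_renyi_divergence(W, q, p, alpha):
    """D_alpha(W||q|p) = sum_x p(x) D_alpha(W(x)||q), where W[x] is the output pmf for input x."""
    return sum(px * renyi_divergence(Wx, q, alpha) for px, Wx in zip(p, W) if px > 0)

W = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical binary-input, binary-output channel
print(conditional_renyi_divergence(W, [0.5, 0.5], [0.3, 0.7], 0.5))
```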

Definition 4.2.

For any ${{\mathit{{\alpha}}}}\in{\mathbb{R}}_{{}^{{+}}}$ , ${{{\mathit{{W}}}}}:{{\mathscr{{X}}}}\to{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}})}$ , and ${{\it{{p}}}}\in{{{\mathscr{{P}}}}({{{\mathscr{{X}}}}})}$ the order ${{\mathit{{\alpha}}}}$ Augustin information for the input distribution ${{\it{{p}}}}$ is

[TABLE]

The infimum is achieved by a unique probability measure $q_{\alpha,p}$, called the order $\alpha$ Augustin mean for the input distribution $p$, by [40, Lemma LABEL:C-lem:information]. [Footnote 7: We refrain from including the channel symbol $W$ in the symbol for the Augustin mean because the channel will be clear from the context.] Furthermore,

[TABLE]

for all ${{\it{{q}}}}\in{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}})}$ by [40, Lemma LABEL:C-lem:information], as well.

The Augustin information is continuously differentiable in its order on ${\mathbb{R}}_{{}^{{+}}}$ , and its derivative is given by

[TABLE]

by [40, Lemma LABEL:C-lem:informationO-(LABEL:C-informationO:differentiability)], where ${{{\mathit{{W}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{\it{{p}}}}}}}}}}$ is the tilted channel defined as follows.

Definition 4.3.

For any ${{\mathit{{\alpha}}}}\in{\mathbb{R}}_{{}^{{+}}}$ , ${{{\mathit{{W}}}}}:{{\mathscr{{X}}}}\to{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}})}$ and ${{\it{{q}}}}\in{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}})}$ , the order ${{\mathit{{\alpha}}}}$ tilted channel ${{{\mathit{{W}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{\it{{q}}}}}}}$ is a function from $\{{{\mathit{{x}}}}:{{\mathit{{D}}}}_{{{{\mathit{{\alpha}}}}}}\!\left(\left.\!{{{{\mathit{{W}}}}}({{\mathit{{x}}}})}\right\|{{{\it{{q}}}}}\right)<\infty\}$ to ${{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}})}$ given by

[TABLE]

The tilted channel can be used to express $I_{\alpha}(p;W)$ in terms of Kullback-Leibler divergences using [40, Lemma LABEL:C-lem:information-(LABEL:C-information:alternative)] [Footnote 8: It is worth noting that (4.3) follows from (3.2) and $I_{\alpha}(p;W)=D_{\alpha}(W\|q_{\alpha,p}|p)$ for $\alpha$ values in $(0,1)$.]:

[TABLE]
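Footnote 8 suggests that the expression of $I_{\alpha}(p;W)$ through Kullback-Leibler divergences comes from averaging a pointwise identity for tilted measures. For any $q$ with finite $D_{\alpha}(W(x)\|q)$ and $\alpha\in(0,1)$, direct computation gives $(1-\alpha)D_{\alpha}(W\|q|p)=\alpha D_{1}(W_{\alpha}^{q}\|W|p)+(1-\alpha)D_{1}(W_{\alpha}^{q}\|q|p)$; taking $q=q_{\alpha,p}$ and dividing by $1-\alpha$ yields an expression of this type, which we assume is the content of (4.3). The sketch below checks the identity numerically for finite alphabets, computing $W_{\alpha}^{q}(x)$ as $W(x)^{\alpha}q^{1-\alpha}$ normalized. The channel, $q$, and $p$ are hypothetical.

```python
import math

def kl(a, b):
    """D_1(a||b) in nats, with 0 ln 0 = 0."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

def renyi(w, q, alpha):
    """D_alpha(w||q) in nats for alpha in (0, 1)."""
    s = sum(wi ** alpha * qi ** (1.0 - alpha) for wi, qi in zip(w, q))
    return math.log(s) / (alpha - 1.0)

def tilted_channel(W, q, alpha):
    """Order-alpha tilted channel W_alpha^q: row x is W(x)^alpha q^(1-alpha) times
    exp((1-alpha) D_alpha(W(x)||q)), i.e. divided by sum_y W(y|x)^alpha q(y)^(1-alpha)."""
    rows = []
    for Wx in W:
        weights = [wy ** alpha * qy ** (1.0 - alpha) for wy, qy in zip(Wx, q)]
        z = sum(weights)
        rows.append([v / z for v in weights])
    return rows

def identity_sides(W, q, p, alpha):
    """Both sides of (1-alpha) D_alpha(W||q|p) = alpha D_1(V||W|p) + (1-alpha) D_1(V||q|p),
    where V = W_alpha^q."""
    V = tilted_channel(W, q, alpha)
    lhs = (1.0 - alpha) * sum(px * renyi(Wx, q, alpha) for px, Wx in zip(p, W))
    rhs = sum(px * (alpha * kl(Vx, Wx) + (1.0 - alpha) * kl(Vx, q))
              for px, Wx, Vx in zip(p, W, V))
    return lhs, rhs

W = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]   # hypothetical channel with 2 inputs, 3 outputs
print(identity_sides(W, [0.3, 0.3, 0.4], [0.4, 0.6], 0.35))
```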

Furthermore, the Augustin mean satisfies

[TABLE]

and the Augustin mean is the only probability measure $q$ satisfying both $q_{1,p}\prec q$ and $\sum_{x}p(x)W_{\alpha}^{q}(x)=q$ by [40, Lemma LABEL:C-lem:information], where $q_{1,p}=\sum_{x}p(x)W(x)$. Thus, for all $\alpha\in\mathbb{R}_{+}$, we have

[TABLE]
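The fixed-point characterization above suggests computing the Augustin mean by iterating $q\mapsto\sum_{x}p(x)W_{\alpha}^{q}(x)$ starting from $q_{1,p}$. The following is a minimal sketch for finite alphabets and $\alpha\in(0,1)$; convergence of this iteration is assumed here rather than proved, and the stopping tolerance, the channel, and the input distribution are illustrative choices. It then evaluates $I_{\alpha}(p;W)=D_{\alpha}(W\|q_{\alpha,p}|p)$.

```python
import math

def renyi(w, q, alpha):
    """D_alpha(w||q) in nats for alpha in (0, 1) on a finite alphabet."""
    s = sum(wi ** alpha * qi ** (1.0 - alpha) for wi, qi in zip(w, q))
    return math.log(s) / (alpha - 1.0)

def augustin_operator(W, p, q, alpha):
    """One step of the map q -> sum_x p(x) W_alpha^q(x)."""
    out = [0.0] * len(q)
    for px, Wx in zip(p, W):
        weights = [wy ** alpha * qy ** (1.0 - alpha) for wy, qy in zip(Wx, q)]
        z = sum(weights)
        for y, v in enumerate(weights):
            out[y] += px * v / z
    return out

def augustin_mean(W, p, alpha, tol=1e-12, max_iter=200000):
    """Order-alpha Augustin mean q_{alpha,p}, alpha in (0, 1), by iterating the map above
    from q_{1,p} = sum_x p(x) W(x). Convergence of the iteration is assumed here."""
    q = [sum(px * Wx[y] for px, Wx in zip(p, W)) for y in range(len(W[0]))]
    for _ in range(max_iter):
        q_next = augustin_operator(W, p, q, alpha)
        if max(abs(a - b) for a, b in zip(q, q_next)) < tol:
            return q_next
        q = q_next
    raise RuntimeError("fixed-point iteration did not converge")

def augustin_information(W, p, alpha):
    """I_alpha(p;W) = D_alpha(W||q_{alpha,p}|p)."""
    q = augustin_mean(W, p, alpha)
    return sum(px * renyi(Wx, q, alpha) for px, Wx in zip(p, W) if px > 0)

W = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical binary-input, binary-output channel
p = [0.5, 0.5]
print(augustin_mean(W, p, 0.5), augustin_information(W, p, 0.5))
```

Since $q_{\alpha,p}$ attains the infimum in Definition 4.2, replacing it by any other $q$ can only increase $D_{\alpha}(W\|q|p)$, which gives a simple sanity check on the output.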

Definition 4.4.

For any ${{\mathit{{\alpha}}}}\in{\mathbb{R}}_{{}^{{+}}}$ , ${{{\mathit{{W}}}}}:{{\mathscr{{X}}}}\to{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}})}$ , and ${{\mathscr{{A}}}}\subset{{{\mathscr{{P}}}}({{{\mathscr{{X}}}}})}$ , the order ${{\mathit{{\alpha}}}}$ Augustin capacity of ${{{\mathit{{W}}}}}$ for the constraint set ${{\mathscr{{A}}}}$ is

[TABLE]

When the constraint set ${{\mathscr{{A}}}}$ is the whole ${{{\mathscr{{P}}}}({{{\mathscr{{X}}}}})}$ , we denote the order ${{\mathit{{\alpha}}}}$ Augustin capacity by ${{\mathit{{C}}}}_{{{{\mathit{{\alpha}}}}},{{{{\mathit{{W}}}}}}}$ , i.e., ${{\mathit{{C}}}}_{{{{\mathit{{\alpha}}}}},{{{{\mathit{{W}}}}}}}{\!\!~{}\triangleq\!~{}}{{\mathit{{C}}}}_{{{{\mathit{{\alpha}}}}},{{{{\mathit{{W}}}}}},{{{{\mathscr{{P}}}}({{{\mathscr{{X}}}}})}}}$ .

Using the definitions of the Augustin information and capacity, we obtain the following expression for the latter

[TABLE]

If ${{\mathscr{{A}}}}$ is convex, then the order of the supremum and the infimum can be changed as a result of [40, Thm. LABEL:C-thm:minimax]:

[TABLE]

If in addition ${{\mathit{{C}}}}_{{{{\mathit{{\alpha}}}}},{{{{\mathit{{W}}}}}},{{{\mathscr{{A}}}}}}$ is finite, then there exists a unique probability measure ${{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{{\mathit{{W}}}}},{{\mathscr{{A}}}}}}}$ , called the order ${{\mathit{{\alpha}}}}$ Augustin center of ${{{\mathit{{W}}}}}$ for the constraint set ${{\mathscr{{A}}}}$ , satisfying

[TABLE]

by [40, Thm. LABEL:C-thm:minimax].

We denote the set of all probability mass functions satisfying a cost constraint ${{\mathit{{\varrho}}}}$ by ${{\mathscr{{A}}}}({{\mathit{{\varrho}}}})$ , i.e.

[TABLE]

With a slight abuse of notation, we denote the cost-constrained Augustin capacity ${{\mathit{{C}}}}_{{{{\mathit{{\alpha}}}}},{{{{\mathit{{W}}}}}},{{{\mathscr{{A}}}}({{\mathit{{\varrho}}}})}}$ by ${{\mathit{{C}}}}_{{{{\mathit{{\alpha}}}}},{{{{\mathit{{W}}}}}},{{{\mathit{{\varrho}}}}}}$ , as well. A more detailed discussion of Augustin’s information measures can be found in [40].

4-B The Sphere Packing Exponent

Definition 4.5.

For any ${{{\mathit{{W}}}}}:{{\mathscr{{X}}}}\to{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}})}$ , ${{\mathscr{{A}}}}\subset{{{\mathscr{{P}}}}({{{\mathscr{{X}}}}})}$ , and ${{\mathit{{R}}}}\in{\mathbb{R}}_{{}^{{+}}}$ , the sphere packing exponent (SPE) is

[TABLE]

When the constraint set ${{\mathscr{{A}}}}$ is the whole ${{{\mathscr{{P}}}}({{{\mathscr{{X}}}}})}$ , we denote SPE by ${{\mathit{{E}}}_{sp\!}}\left({{{\mathit{{R}}}},{{{\mathit{{W}}}}}}\right)$ , i.e. ${{\mathit{{E}}}_{sp\!}}\left({{{\mathit{{R}}}},{{{\mathit{{W}}}}}}\right){\!\!~{}\triangleq\!~{}}{{\mathit{{E}}}_{sp\!}}\left({{{\mathit{{R}}}},{{{\mathit{{W}}}}},{{{\mathscr{{P}}}}({{{\mathscr{{X}}}}})}}\right)$ .

With a slight abuse of notation, we denote the SPE for $\mathscr{A}(\varrho)$ by $E_{sp}\left(R,W,\varrho\right)$ and the SPE for the case $\mathscr{A}=\{p\}$ by $E_{sp}\left(R,W,p\right)$. Note that, as a result of the definitions of the Augustin capacity and the SPE, we have

[TABLE]

Furthermore, since $I_{\alpha}(p;W)$ is continuously differentiable in $\alpha$ by [40, Lemma LABEL:C-lem:informationO-(LABEL:C-informationO:differentiability)], we can apply the derivative test to find the optimal $\alpha$ in (4.11) for the case $\mathscr{A}=\{p\}$: using (4.1) and (4.3), we get

[TABLE]

On the other hand, either ${{\mathit{{D}}}}_{{1}}\!\left(\left.\!\left.\!{{{{\mathit{{W}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{\it{{p}}}}}}}}}}}\right\|{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{\it{{p}}}}}}}}\right|{{{{\it{{p}}}}}}\right)={{\mathit{{I}}}}_{{1}}\!\left(\!{{{\it{{p}}}}};\!{{{{\mathit{{W}}}}}}\!\right)$ for all positive orders ${{\mathit{{\alpha}}}}$ , or ${{\mathit{{D}}}}_{{1}}\!\left(\left.\!\left.\!{{{{\mathit{{W}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{\it{{p}}}}}}}}}}}\right\|{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{\it{{p}}}}}}}}\right|{{{{\it{{p}}}}}}\right)$ is increasing and continuous in the order ${{\mathit{{\alpha}}}}$ on ${\mathbb{R}}_{{}^{{+}}}$ by [40, Lemma LABEL:C-lem:informationO-(LABEL:C-informationO:monotonicityofharoutunianinformation)]. Furthermore, ${{\mathit{{D}}}}_{{1}}\!\left(\left.\!\left.\!{{{{\mathit{{W}}}}_{{1}}^{{{{{{\it{{q}}}}}_{{1,{{\it{{p}}}}}}}}}}}\right\|{{{{{\it{{q}}}}}_{{1,{{\it{{p}}}}}}}}\right|{{{{\it{{p}}}}}}\right)$ is equal to ${{\mathit{{I}}}}_{{1}}\!\left(\!{{{\it{{p}}}}};\!{{{{\mathit{{W}}}}}}\!\right)$ by definition and $\lim\nolimits_{{{\mathit{{\alpha}}}}\downarrow 0}{{\mathit{{D}}}}_{{1}}\!\left(\left.\!\left.\!{{{{\mathit{{W}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{\it{{p}}}}}}}}}}}\right\|{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{\it{{p}}}}}}}}\right|{{{{\it{{p}}}}}}\right)$ is equal to $\lim\nolimits_{{{\mathit{{\alpha}}}}\downarrow 0}{{\mathit{{I}}}}_{{{{\mathit{{\alpha}}}}}}\!\left(\!{{{\it{{p}}}}};\!{{{{\mathit{{W}}}}}}\!\right)$ by (4.5) and [40, Lemma LABEL:C-lem:informationO-(LABEL:C-informationO:limitofharoutunianinformation)]. 
Thus for any rate ${{\mathit{{R}}}}$ in $(\lim_{{{\mathit{{\alpha}}}}\downarrow 0}{{\mathit{{I}}}}_{{{{\mathit{{\alpha}}}}}}\!\left(\!{{{\it{{p}}}}};\!{{{{\mathit{{W}}}}}}\!\right),{{\mathit{{I}}}}_{{1}}\!\left(\!{{{\it{{p}}}}};\!{{{{\mathit{{W}}}}}}\!\right))$ there exists an order ${{\mathit{{\alpha}}}}^{\!\ast}\in(0,1)$ satisfying

[TABLE]

by the intermediate value theorem [41, 4.23]. The order ${{\mathit{{\alpha}}}}^{\!\ast}$ satisfying (4.14) is unique because ${{\mathit{{D}}}}_{{1}}\!\left(\left.\!\left.\!{{{{\mathit{{W}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{\it{{p}}}}}}}}}}}\right\|{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{\it{{p}}}}}}}}\right|{{{{\it{{p}}}}}}\right)$ is increasing in ${{\mathit{{\alpha}}}}$ . The monotonicity of ${{\mathit{{D}}}}_{{1}}\!\left(\left.\!\left.\!{{{{\mathit{{W}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{\it{{p}}}}}}}}}}}\right\|{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{\it{{p}}}}}}}}\right|{{{{\it{{p}}}}}}\right)$ in ${{\mathit{{\alpha}}}}$ and (4.13) also imply ${{\mathit{{E}}}_{sp\!}}\left({{{\mathit{{R}}}},{{{\mathit{{W}}}}},{{\it{{p}}}}}\right)=\tfrac{1-{{\mathit{{\alpha}}}}^{\!\ast}}{{{\mathit{{\alpha}}}}^{\!\ast}}\left({{\mathit{{I}}}}_{{{{\mathit{{\alpha}}}}^{\!\ast}}}\!\left(\!{{{\it{{p}}}}};\!{{{{\mathit{{W}}}}}}\!\right)-{{\mathit{{R}}}}\right)$ . Thus as a result of (4.3), the unique ${{\mathit{{\alpha}}}}^{\!\ast}$ satisfying (4.14) also satisfies

[TABLE]

Since ${{\mathit{{D}}}}_{{1}}\!\left(\left.\!\left.\!{{{{\mathit{{W}}}}_{{{{\mathit{{\alpha}}}}}}^{{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{\it{{p}}}}}}}}}}}\right\|{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{\it{{p}}}}}}}}\right|{{{{\it{{p}}}}}}\right)$ is continuous and increasing in ${{\mathit{{\alpha}}}}$ , its inverse is increasing and continuous, as well. Thus the definition of SPE given in (4.11) and the definition of derivative as a limit imply that for any ${{\mathit{{R}}}}$ in $(\lim_{{{\mathit{{\alpha}}}}\downarrow 0}{{\mathit{{I}}}}_{{{{\mathit{{\alpha}}}}}}\!\left(\!{{{\it{{p}}}}};\!{{{{\mathit{{W}}}}}}\!\right),{{\mathit{{I}}}}_{{1}}\!\left(\!{{{\it{{p}}}}};\!{{{{\mathit{{W}}}}}}\!\right))$ the unique order ${{\mathit{{\alpha}}}}^{\!\ast}$ satisfying (4.14) also satisfies

[TABLE]
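The characterization of the SPE through $\alpha^{\ast}$ can be checked numerically for a finite channel. The following is a minimal sketch, assuming that (4.11) for $\mathscr{A}=\{p\}$ is the supremum over $\alpha\in(0,1)$ of $\tfrac{1-\alpha}{\alpha}(I_{\alpha}(p;W)-R)$ and computing Augustin means by the fixed-point iteration of $q\mapsto\sum_{x}p(x)W_{\alpha}^{q}(x)$ (convergence assumed). It compares a grid supremum with $D_{1}(W_{\alpha^{\ast}}^{q_{\alpha^{\ast},p}}\|W|p)$ at the order found by bisection on the rate, and checks the slope $E_{sp}^{\prime}=\tfrac{\alpha^{\ast}-1}{\alpha^{\ast}}$ by finite differences. The channel, the grid, and the bracket $[0.02,0.98]$ are illustrative choices.

```python
import math

def kl(a, b):
    """D_1(a||b) in nats, with 0 ln 0 = 0."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

def tilt(Wx, q, alpha):
    """Row W_alpha^q(x) and its normalizer z = sum_y W(y|x)^alpha q(y)^(1-alpha)."""
    weights = [wy ** alpha * qy ** (1.0 - alpha) for wy, qy in zip(Wx, q)]
    z = sum(weights)
    return [v / z for v in weights], z

def augustin_mean(W, p, alpha, tol=1e-12, max_iter=200000):
    """Fixed point of q -> sum_x p(x) W_alpha^q(x), started at q_{1,p} (convergence assumed)."""
    q = [sum(px * Wx[y] for px, Wx in zip(p, W)) for y in range(len(W[0]))]
    for _ in range(max_iter):
        nxt = [0.0] * len(q)
        for px, Wx in zip(p, W):
            row, _ = tilt(Wx, q, alpha)
            for y, v in enumerate(row):
                nxt[y] += px * v
        if max(abs(a - b) for a, b in zip(q, nxt)) < tol:
            return nxt
        q = nxt
    raise RuntimeError("fixed-point iteration did not converge")

def augustin_information(W, p, alpha):
    """I_alpha(p;W) = sum_x p(x) D_alpha(W(x)||q_{alpha,p}), with D_alpha = ln(z) / (alpha - 1)."""
    q = augustin_mean(W, p, alpha)
    return sum(px * math.log(tilt(Wx, q, alpha)[1]) / (alpha - 1.0) for px, Wx in zip(p, W))

def rate_and_exponent(W, p, alpha):
    """(D_1(V||q_{alpha,p}|p), D_1(V||W|p)) with V the tilted channel at the Augustin mean."""
    q = augustin_mean(W, p, alpha)
    rate = exponent = 0.0
    for px, Wx in zip(p, W):
        row, _ = tilt(Wx, q, alpha)
        rate += px * kl(row, q)
        exponent += px * kl(row, Wx)
    return rate, exponent

def alpha_star(W, p, R, lo=0.02, hi=0.98, iters=50):
    """Bisection for the order whose rate equals R (the rate increases in alpha).
    Assumes R lies between the rates at the bracket endpoints."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rate_and_exponent(W, p, mid)[0] < R:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def esp_grid(W, p, R, grid=100):
    """E_sp(R,W,p) as the sup of (1-alpha)/alpha (I_alpha(p;W) - R) over alpha = 0.02..0.98."""
    alphas = [k / grid for k in range(2, grid - 1)]
    return max((1 - a) / a * (augustin_information(W, p, a) - R) for a in alphas)

W = [[0.9, 0.1], [0.2, 0.8]]   # hypothetical channel
p = [0.5, 0.5]
R, E = rate_and_exponent(W, p, 0.5)
print(R, E, alpha_star(W, p, R), esp_grid(W, p, R))
```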

as was established in [22, Lemma LABEL:D-lem:spherepacking-cc].

5 The Refined Sphere Packing Bound

In this section, we consider the channel coding problem for various channel models and derive lower bounds to the error probability of the following form

[TABLE]

for constants $A$ and $n_{0}$ determined by the rate, the channel, and the constraints on the codes, if there are any. Following [25, 26, 27], we call these bounds refined sphere packing bounds (refined SPBs) because of their resemblance to the standard SPBs, e.g., [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24], which establish [Footnote 9: The approximation error terms in standard SPBs are usually $O(\sqrt{n})$ or $O(\ln n)$, rather than just $o(n)$.]

[TABLE]

The refined SPBs that we state and prove in this section are not formally particular cases of a general proposition. Nevertheless, they are all consequences of Lemma 3.4 and the properties of Augustin’s information measures.

We establish a refined SPB for the constant composition codes in Subsection 5-A, for codes on (possibly) non-stationary Rényi symmetric channels in Subsection 5-B, and for codes on additive white Gaussian noise channels with quadratic cost functions in Subsection 5-C.

5-A Constant Composition Codes

Theorem 5.1.

For any ${{{\mathit{{W}}}}}:{{\mathscr{{X}}}}\to{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}})}$ , $M,L,{{\mathit{{n}}}}\in{\mathbb{Z}}_{{}^{{+}}}$ , ${{\it{{p}}}}\in{{{\mathscr{{P}}}}({{{\mathscr{{X}}}}})}$ satisfying $\lim\nolimits_{{{\mathit{{\alpha}}}}\downarrow 0}{{\mathit{{I}}}}_{{{{\mathit{{\alpha}}}}}}\!\left(\!{{{\it{{p}}}}};\!{{{{\mathit{{W}}}}}}\!\right)<\tfrac{1}{{{\mathit{{n}}}}}\ln\tfrac{M}{L}<{{\mathit{{I}}}}_{{1}}\!\left(\!{{{\it{{p}}}}};\!{{{{\mathit{{W}}}}}}\!\right)$ and ${{\mathit{{n}}}}{{\it{{p}}}}({{\mathit{{x}}}})\in{\mathbb{Z}}_{{}^{{\geq 0}}}$ for all ${{\mathit{{x}}}}\in{{\mathscr{{X}}}}$ , the order ${{\mathit{{\alpha}}}}^{\!\ast}{\!\!~{}\triangleq\!~{}}\tfrac{1}{1-{{\mathit{{E}}}_{sp\!}^{\prime}}\left({\frac{1}{{{\mathit{{n}}}}}\ln\frac{M}{L},{{{\mathit{{W}}}}},{{\it{{p}}}}}\right)}$ satisfies

[TABLE]

Furthermore, any $(M,L)$ channel code of length ${{\mathit{{n}}}}$ whose codewords all have the same composition ${{\it{{p}}}}$ satisfies

[TABLE]

provided that $\sqrt{{{{{\it{{a}}}}}_{{2}}}{{\mathit{{n}}}}}-\tfrac{\ln 4{{\mathit{{n}}}}}{2{{\mathit{{\alpha}}}}^{\!\ast}}\geq\ln{{\mathit{{\varDelta}}}}$ where

[TABLE]
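The side condition $\sqrt{a_{2}n}-\tfrac{\ln 4n}{2\alpha^{\ast}}\geq\ln\varDelta$ determines how large $n$ must be for (5.4) to apply. The following is a minimal sketch that finds the smallest $n_{0}$ such that the condition holds for every $n\geq n_{0}$, treating $a_{2}$ and $\varDelta$ as given positive numbers; the values in the example are hypothetical and are not computed from any channel. It relies only on the shape of the left-hand side, which, as a function of real $n$, decreases up to $n=1/(a_{2}\alpha^{\ast 2})$ and increases afterwards.

```python
import math

def condition_holds(n, a2, alpha_star, delta):
    """True iff sqrt(a2 * n) - ln(4n) / (2 alpha*) >= ln(Delta)."""
    return math.sqrt(a2 * n) - math.log(4 * n) / (2 * alpha_star) >= math.log(delta)

def smallest_valid_length(a2, alpha_star, delta, n_max=10**12):
    """Smallest n0 such that the condition holds for every n >= n0.
    Beyond the turning point n = 1 / (a2 alpha*^2) the left-hand side increases,
    so the condition is monotone in n there and binary search applies."""
    start = max(1, math.ceil(1.0 / (a2 * alpha_star ** 2)))
    if condition_holds(start, a2, alpha_star, delta):
        m = start
    else:
        lo, hi = start, 2 * start
        while not condition_holds(hi, a2, alpha_star, delta):   # exponential search
            lo, hi = hi, 2 * hi
            if hi > n_max:
                raise ValueError("condition not met for n <= n_max")
        while hi - lo > 1:                                       # first n in (lo, hi] that works
            mid = (lo + hi) // 2
            if condition_holds(mid, a2, alpha_star, delta):
                hi = mid
            else:
                lo = mid
        m = hi
    # If the condition already holds at the turning point, it may also hold for smaller n
    # (the left-hand side grows as n decreases there): step down while it keeps holding.
    while m > 1 and condition_holds(m - 1, a2, alpha_star, delta):
        m -= 1
    return m

print(smallest_valid_length(0.05, 0.6, 50.0))   # hypothetical a2, alpha*, Delta
```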

Although Theorem 5.1 itself is composition-dependent, it implies, via appropriate worst-case assumptions, composition-independent bounds such as [26, Thm 1]. Similar composition-dependent [27, Proposition 13] and composition-independent [27, Thm 8] bounds have recently been derived for classical-quantum channels using an approach similar to that of [26]. The primary advantages of Theorem 5.1 over the previous results are the conceptual simplicity and brevity of its proof and its definite approximation error terms.

Proof of Theorem 5.1.

The existence of a unique order $\alpha^{\ast}$ satisfying (5.3) was established, and its value determined, in Section 4; see (4.14), (4.15), and (4.16).

Let probability measures ${{{{\it{{w}}}}}_{{{{\mathit{{m}}}}}}}$ , ${{\it{{q}}}}$ , and ${{{{\it{{v}}}}}_{{{{\mathit{{m}}}}}}}$ in ${{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}_{1}^{{{\mathit{{n}}}}}})}$ be

[TABLE]

Then $v_{m}$ is equal to the order $\alpha^{\ast}$ tilted probability measure between $w_{m}$ and $q$. Furthermore, the empirical distributions of all of the codewords, i.e., of all $\varPsi(m)$'s, are equal to $p$ by hypothesis; thus we have

[TABLE]

On the other hand, $\sum_{m\in\mathscr{M}}q(m\in\varTheta)\leq L$ by the definition of list decoding. Thus at least half of the messages in $\mathscr{M}$ (at least $\lfloor\tfrac{M+1}{2}\rfloor$ of them, to be precise) satisfy $q(m\in\varTheta)\leq\tfrac{2L}{M}$ as a result of Markov's inequality. Applying Lemma 3.4 with $\mathscr{E}=\{y_{1}^{n}:m\in\varTheta(y_{1}^{n})\}$ and $\beta=2$ for the messages satisfying $q(m\in\varTheta)\leq\tfrac{2L}{M}$, we get
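The counting step can be checked numerically. The following is a minimal sketch in which randomly generated values stand in for $q(m\in\varTheta)$; the generator and the function names are hypothetical. Whenever the values lie in $[0,1]$ and sum to at most $L$, fewer than $M/2$ of them can exceed $\tfrac{2L}{M}$, so at least $\lfloor\tfrac{M+1}{2}\rfloor$ of them are at most $\tfrac{2L}{M}$.

```python
import random

def count_good_messages(list_probs, L):
    """Number of messages m with q(m in Theta) <= 2L/M, where M = len(list_probs)."""
    M = len(list_probs)
    threshold = 2.0 * L / M
    return sum(1 for v in list_probs if v <= threshold)

def random_list_probs(M, L, rng):
    """Hypothetical values of q(m in Theta) for m = 1..M: each in [0, 1], total at most L,
    as for any decoder that outputs at most L messages."""
    raw = [rng.random() ** 3 for _ in range(M)]
    scale = L * rng.random() / sum(raw)
    return [min(1.0, v * scale) for v in raw]

rng = random.Random(0)
for M in (1, 2, 3, 10, 101):
    for L in (1, 2):
        for _ in range(200):
            vals = random_list_probs(M, L, rng)
            # Markov: fewer than M/2 values can exceed 2L/M when they sum to at most L.
            assert count_good_messages(vals, L) >= (M + 1) // 2
print("counting bound verified on random instances")
```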

[TABLE]

as long as $\sqrt{a_{2}n}-\tfrac{\ln 4n}{2\alpha^{\ast}}\geq\ln\varDelta$. Then (5.4) follows from (4.14), (4.15), (5.3), and the definition of the error probability as the average of the conditional error probabilities of the messages. ∎

5-B Codes On Rényi Symmetric Channels

Definition 5.2.

A channel ${{{\mathit{{W}}}}}:{{\mathscr{{X}}}}\to{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}})}$ satisfying ${{{\mathit{{W}}}}}{\prec}{{{\it{{\nu}}}}}$ for some ${{{\it{{\nu}}}}}\in{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}})}$ is Rényi symmetric iff for each ${{\mathit{{\alpha}}}}\in{\mathbb{R}}_{{}^{{+}}}$ with finite ${{\mathit{{C}}}}_{{{{\mathit{{\alpha}}}}},{{{{\mathit{{W}}}}}}}$ there exists a function ${{\mathit{{G}}}}_{{{\mathit{{\alpha}}}}}^{{{{\mathit{{W}}}}}}(\cdot):{\mathbb{R}}\to[0,1]$ satisfying

[TABLE]

Remark 5.3.

If $W$ is Rényi symmetric, then the identity $\lim_{s\downarrow-\infty}G_{\alpha}^{W}(s)=0$ holds whenever $C_{\alpha,W}$ is finite. On the other hand, the identity $\lim_{s\uparrow\infty}G_{\alpha}^{W}(s)=1$ is violated whenever $W(x)\nprec q_{\alpha,W}$, which can only happen for $\alpha$'s in $(0,1)$. An example of such a Rényi symmetric $W$ is obtained by removing $w^{\imath,\imath}$ from $\mathcal{W}_{\imath}$ described in [42, Example LABEL:A-eg:singular-countable]; the resulting $G_{\alpha}^{W}$ is given by $G_{\alpha}^{W}(s)=\tfrac{1}{2}\mathbb{1}_{\{s\geq\ln\frac{1}{2}\}}$.

The Rényi symmetry holds for all input symmetric channels described in [43, Definition 3.2] and for all Gallager symmetric channels described in [11, p. 94]; see Appendix A. Recall that the Gallager symmetry holds for all strongly symmetric (Dobrushin symmetric) channels, which are described in [4]. The binary symmetric channel is strongly symmetric. The binary erasure channel is Gallager symmetric but not strongly symmetric. The binary input Gaussian channel is input symmetric but not Gallager symmetric. The Rayleigh fading channel with per coherence interval power constraint analyzed in [44, (3)] is Rényi symmetric, by [44, (7) and (10)], but not input symmetric; see [44, (5)].

Remark 5.4.

The input symmetry described in [43, Definition 3.2] can be generalized by relying on a compact group with the associated Haar measure, rather than a finite additive group with the uniform distribution. The Rayleigh fading channel with per coherence interval power constraint analyzed in [44] is input symmetric under this more general definition. The covariant channels analyzed by Holevo in [45] can be seen as the counterparts of [43, Definition 3.2] and its generalization in the framework of Quantum Information Theory.

The derivation of the refined SPB for Rényi symmetric channels is analogous to the derivation of the refined SPB for constant composition codes. In the former, Lemma 5.5, given in the following, is used in lieu of (4.14), (4.15), and (4.16), which are used in the latter.

Lemma 5.5.

For any Rényi symmetric channel ${{{\mathit{{W}}}}}:{{\mathscr{{X}}}}\to{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}})}$ with finite ${{\mathit{{C}}}}_{{1},{{{{\mathit{{W}}}}}}}$ and rate ${{\mathit{{R}}}}$ in $(\lim_{{{\mathit{{\alpha}}}}\downarrow 0}{{\mathit{{C}}}}_{{{{\mathit{{\alpha}}}}},{{{{\mathit{{W}}}}}}},{{\mathit{{C}}}}_{{1},{{{{\mathit{{W}}}}}}})$ there exists an order ${{\mathit{{\alpha}}}}^{\!\ast}\in(0,1)$ such that

[TABLE]

Furthermore, if either ${{{\mathit{{W}}}}_{{{{\mathit{{\alpha}}}}^{\!\ast}}}^{{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}}^{\!\ast},{{{\mathit{{W}}}}}}}}}}}\left(\left.\left\{\tfrac{{\mathrm{d}{{{{\mathit{{W}}}}}({{\mathit{{x}}}})}}}{{\mathrm{d}{{{{\it{{\nu}}}}}}}}=\gamma\tfrac{{\mathrm{d}{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}}^{\!\ast},{{{\mathit{{W}}}}}}}}}}}{{\mathrm{d}{{{{\it{{\nu}}}}}}}}\right\}\right|{{\mathit{{x}}}}\right)<1$ for all $\gamma\in{\mathbb{R}}_{{}^{{+}}}$ or ${{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{{\mathit{{W}}}}}}}}={{\it{{q}}}}$ for all ${{\mathit{{\alpha}}}}\in(0,1]$ , then

[TABLE]

Lemma 5.5 is proved in Appendix B. Proving the essential assertions of Lemma 5.5 for input symmetric channels, however, is considerably easier: for any input symmetric channel $W$ and the uniform probability mass function $u$ on its input set, $C_{\alpha,W}=I_{\alpha}(u;W)$ and $q_{\alpha,W}=q_{\alpha,u}$ for all $\alpha\in\mathbb{R}_{+}$. Consequently, the identities given in (5.6), (5.7), and (5.8) are nothing but the identities given in (4.14), (4.15), and (4.16) for the case $p=u$, because, by symmetry, the Kullback-Leibler divergences on the right-hand sides of (5.6) and (5.7) have the same value for all $x$. Hence, by (4.16), (5.8) holds for any input symmetric channel satisfying $\lim_{\alpha\downarrow 0}C_{\alpha,W}<C_{1,W}$, as well.
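For the binary symmetric channel (BSC) with crossover probability $\epsilon$, this reduction makes everything explicit, because, by symmetry, the Augustin center is uniform at every order. The tilted channel is then a BSC with crossover $\epsilon_{\alpha}=\tfrac{\epsilon^{\alpha}}{\epsilon^{\alpha}+(1-\epsilon)^{\alpha}}$, the rate is $\ln 2-h(\epsilon_{\alpha})$, and the exponent is $D_{1}(\epsilon_{\alpha}\|\epsilon)$. The following is a minimal sketch under these assumptions, with $h$ the binary entropy in nats and $C_{\alpha,W}=\ln 2+\tfrac{1}{\alpha-1}\ln(\epsilon^{\alpha}+(1-\epsilon)^{\alpha})$; the function names and the parameter values are illustrative. It compares the parametric form with the supremum over $\alpha$.

```python
import math

def binary_entropy(t):
    """Binary entropy in nats."""
    return -sum(v * math.log(v) for v in (t, 1.0 - t) if v > 0)

def tilted_crossover(eps, alpha):
    """Crossover probability of the tilted BSC W_alpha^u(x), assuming the uniform center u."""
    a, b = eps ** alpha, (1.0 - eps) ** alpha
    return a / (a + b)

def rate_exponent(eps, alpha):
    """(R, E_sp) as functions of the order: R = D_1(W_alpha^u(x)||u), E_sp = D_1(W_alpha^u(x)||W(x))."""
    e = tilted_crossover(eps, alpha)
    rate = math.log(2.0) - binary_entropy(e)
    exponent = e * math.log(e / eps) + (1.0 - e) * math.log((1.0 - e) / (1.0 - eps))
    return rate, exponent

def augustin_capacity(eps, alpha):
    """C_{alpha,W} = I_alpha(u;W) = D_alpha(W(x)||u) for the BSC, alpha in (0, 1)."""
    return math.log(2.0) + math.log(eps ** alpha + (1.0 - eps) ** alpha) / (alpha - 1.0)

def esp_by_sup(eps, R, grid=4000):
    """E_sp(R,W) as the sup over a grid of alpha in (0,1) of (1-alpha)/alpha (C_{alpha,W} - R)."""
    return max((1.0 - a) / a * (augustin_capacity(eps, a) - R)
               for a in (k / grid for k in range(1, grid)))

R, E = rate_exponent(0.1, 0.6)   # hypothetical crossover 0.1, order 0.6
print(R, E, esp_by_sup(0.1, R))
```

Here $\alpha^{\ast}=0.6$ by construction, so the slope of $E_{sp}$ in $R$ at this rate should equal $\tfrac{\alpha^{\ast}-1}{\alpha^{\ast}}=-\tfrac{2}{3}$.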

Remark 5.6.

If, for an input symmetric $W$, there exists a pair $(\gamma,\alpha)$ for which $\tfrac{\mathrm{d}W(x)}{\mathrm{d}q_{\alpha,u}}=\gamma$ holds $W(x)$-a.s. for all $x$, then $q_{\alpha,u}=\sum_{x}u(x)W_{\eta}^{q_{\alpha,u}}(x)$ for all $\eta$. Thus $q_{\eta,u}=q_{\alpha,u}$ and $I_{\eta}(u;W)=\ln\gamma$ for all $\eta\in\mathbb{R}_{+}$ by [40, Lemma LABEL:C-lem:information]. Hence, such a $(\gamma,\alpha)$ pair does not exist for input symmetric $W$'s satisfying $\lim_{\alpha\downarrow 0}C_{\alpha,W}\neq C_{1,W}$.

Remark 5.7.

Lemma 5.5 is stated under the finite ${{\mathit{{C}}}}_{{1},{{{{\mathit{{W}}}}}}}$ hypothesis, yet it holds under the weaker hypothesis $\lim\nolimits_{{{\mathit{{\alpha}}}}\uparrow}\tfrac{1-{{\mathit{{\alpha}}}}}{{{\mathit{{\alpha}}}}}{{\mathit{{C}}}}_{{{{\mathit{{\alpha}}}}},{{{{\mathit{{W}}}}}}}$ , as well. However, establishing Lemma 5.5 under this weaker hypothesis would require us to introduce the concepts of power mean, Rényi information, and compactness in the topology of setwise convergence, see [42, Lemma LABEL:A-lem:capacityEXT-(LABEL:A-capacityEXT-compact-N)].

Theorem 5.8.

Let ${{{\mathit{{W}}}}_{{{{\mathit{{t}}}}}}}:{{\mathscr{{X}}}}_{{{\mathit{{t}}}}}\to{{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}_{{{\mathit{{t}}}}}})}$ be a Rényi symmetric channel with finite ${{\mathit{{C}}}}_{{1},{{{{\mathit{{W}}}}_{{{{\mathit{{t}}}}}}}}}$ for all ${{\mathit{{t}}}}\in{\mathbb{Z}}_{{}^{{+}}}$ and ${{\mathit{{n}}}},M,L\in{\mathbb{Z}}_{{}^{{+}}}$ satisfy $\lim_{{{\mathit{{\alpha}}}}\downarrow 0}{{\mathit{{C}}}}_{{{{\mathit{{\alpha}}}}},{{{{\mathit{{W}}}}_{{[1,{{\mathit{{n}}}}]}}}}}<\ln\tfrac{M}{L}<{{\mathit{{C}}}}_{{1},{{{{\mathit{{W}}}}_{{[1,{{\mathit{{n}}}}]}}}}}$ . Then there exists an order ${{\mathit{{\alpha}}}}^{\!\ast}\in(0,1)$ satisfying

[TABLE]

where ${{{\mathit{{U}}}}_{{{{\mathit{{\alpha}}}}}}}{\!\!~{}\triangleq\!~{}}\{{{{\mathit{{W}}}}_{{[1,{{\mathit{{n}}}}]}}}\}_{{{\mathit{{\alpha}}}}}^{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{{\mathit{{W}}}}_{{[1,{{\mathit{{n}}}}]}}}}}}}$ for all ${{\mathit{{\alpha}}}}\in(0,1)$ . Furthermore any $(M,L)$ channel code on ${{{\mathit{{W}}}}_{{[1,{{\mathit{{n}}}}]}}}$ satisfies

[TABLE]

provided that $\sqrt{{{{{\it{{a}}}}}_{{2}}}{{\mathit{{n}}}}}-\tfrac{\ln 4{{\mathit{{n}}}}}{2{{\mathit{{\alpha}}}}^{\!\ast}}\geq\ln{{\mathit{{\varDelta}}}}$ where

[TABLE]

and ${{\xi}_{{{{\mathit{{\alpha}}}}}}^{{{{\mathit{{t}}}}}}}({{\mathit{{x}}}}_{{{\mathit{{t}}}}})=\ln\tfrac{{\mathrm{d}{{{{\mathit{{W}}}}_{{{{\mathit{{t}}}}}}}({{\mathit{{x}}}}_{{{\mathit{{t}}}}})}}}{{\mathrm{d}{{{{\it{{\nu}}}}}_{{{\mathit{{t}}}}}}}}-\ln\tfrac{{\mathrm{d}{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{{\mathit{{W}}}}_{{{{\mathit{{t}}}}}}}}}}}}}{{\mathrm{d}{{{{\it{{\nu}}}}}_{{{\mathit{{t}}}}}}}}$ for all ${{\mathit{{\alpha}}}}\in(0,1)$ .

Furthermore, if $\{{{{\mathit{{W}}}}_{{{{\mathit{{t}}}}}}}\}_{{{\mathit{{\alpha}}}}^{\!\ast}}^{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}}^{\!\ast},{{{\mathit{{W}}}}_{{{{\mathit{{t}}}}}}}}}}}\left(\left.\left\{\tfrac{{\mathrm{d}{{{{\mathit{{W}}}}_{{{{\mathit{{t}}}}}}}({{\mathit{{x}}}}_{{{\mathit{{t}}}}})}}}{{\mathrm{d}{{{{\it{{\nu}}}}}_{{{\mathit{{t}}}}}}}}=\gamma\tfrac{{\mathrm{d}{{{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}}^{\!\ast},{{{\mathit{{W}}}}_{{{{\mathit{{t}}}}}}}}}}}}}{{\mathrm{d}{{{{\it{{\nu}}}}}_{{{\mathit{{t}}}}}}}}\right\}\right|{{\mathit{{x}}}}_{{{\mathit{{t}}}}}\right)<1$ for all $\gamma\in{\mathbb{R}}_{{}^{{+}}}$ for some ${{\mathit{{t}}}}$ or ${{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}},{{{\mathit{{W}}}}_{{[1,{{\mathit{{n}}}}]}}}}}}={{\it{{q}}}}$ for all ${{\mathit{{\alpha}}}}\in(0,1]$ , then ${{\mathit{{\alpha}}}}^{\!\ast}=\tfrac{1}{1-{{\mathit{{E}}}_{sp\!}^{\prime}}\left({\ln\frac{M}{L},{{{\mathit{{W}}}}_{{[1,{{\mathit{{n}}}}]}}}}\right)}$ .

Note that if any of the component channels, i.e., any of the $W_{t}$'s, is an input symmetric channel satisfying $\lim_{\alpha\downarrow 0}C_{\alpha,W_{t}}<C_{1,W_{t}}$, then $\alpha^{\ast}=\tfrac{1}{1-E_{sp}^{\prime}\left(\ln\frac{M}{L},W_{[1,n]}\right)}$ holds as a result of Remark 5.6.

Theorem 5.8 does not assume the channel to be stationary, i.e., it holds even when the $W_{t}$'s are not identical. To the best of our knowledge, refined sphere packing bounds have previously been reported only for stationary channels, even in the case of the symmetric channels considered in [4, (1.28)], [25, Thm. 1], [27, Thm. 14], [29, Corollary 2], [31, Thm. 4], [44, (36), (37b)], [46, Thm. 1].

For stationary input symmetric channels, Theorem 5.8 is tight in terms of both the exponent and the prefactor for rates above the critical rate, provided that the channel is not singular. For singular stationary input symmetric channels, Altuğ and Wagner [46] have recently reported a sharper result, which generalizes Elias's result in [2] for the binary erasure channel. In order to obtain such results, however, merely plugging in bounds on binary hypothesis testing is not enough; see Section 6 for a more detailed discussion.

Remark 5.9.

Theorem 5.8 is derived using Lemma 3.4, which is stated for product measures. Lemma 3.4, however, holds for any $w$ and $q$ for which the $\xi_{t}$'s are independent random variables under $w_{\alpha}^{q}$. This condition is satisfied by the output distributions and the Augustin centers on the product channels with feedback, i.e., by $W_{\overrightarrow{[1,n]}}(x)$ and $q_{\alpha,W_{\overrightarrow{[1,n]}}}$, provided that the component channels are Rényi symmetric. Thus Theorem 5.8 holds not just for codes on the product channel $W_{[1,n]}$ but also for codes on the product channel with feedback $W_{\overrightarrow{[1,n]}}$. Similar observations have been used to establish the SPB on product channels with feedback in [47, 48, 18, 19, 46]. The formal definition of the product channels with feedback and the proof of the SPB on these channels without the symmetry assumptions can be found in [18, 19, 20].

Proof of Theorem 5.8.

The Rényi symmetry and (4.9) imply

[TABLE]

On the other hand, [40, Lemma LABEL:C-lem:capacityproduct] implies

[TABLE]

Then the product structure ${{{\mathit{{W}}}}_{{[1,{{\mathit{{n}}}}]}}}({{\mathit{{x}}}}_{1}^{{{\mathit{{n}}}}})=\otimes_{{{\mathit{{t}}}}=1}^{{{\mathit{{n}}}}}{{{\mathit{{W}}}}_{{{{\mathit{{t}}}}}}}({{\mathit{{x}}}}_{{{\mathit{{t}}}}})$ and the Rényi symmetry of ${{{\mathit{{W}}}}_{{{{\mathit{{t}}}}}}}$ ’s imply the Rényi symmetry of ${{{\mathit{{W}}}}_{{[1,{{\mathit{{n}}}}]}}}$ . In particular

[TABLE]

where ${\circledast}$ denotes the convolution and ${{\mathit{{g}}}}$ ’s are the density functions of the corresponding ${{\mathit{{G}}}}$ ’s, i.e., ${{\mathit{{g}}}}$ ’s and ${{\mathit{{G}}}}$ ’s are uniquely determined by each other via the following relation

[TABLE]
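The convolution relation can be illustrated with discrete distributions, where the densities $g$ become probability mass functions. The following is a minimal sketch, assuming $G_{\alpha}^{W}$ is the distribution function of $\xi=\ln\tfrac{\mathrm{d}W(x)}{\mathrm{d}q_{\alpha,W}}$ under $W(x)$ (consistent with Remark 5.3, but an assumption here) and using BSC components, whose Augustin center is uniform at every order. Support points are rounded so that atoms reached along different orders of summation merge; the crossover values are hypothetical.

```python
import math
from collections import defaultdict

DIGITS = 12  # rounding used to merge atoms at numerically equal support points

def bsc_llr_pmf(eps):
    """pmf of xi = ln dW(x)/dq under W(x) for a BSC with crossover eps and uniform q."""
    pmf = defaultdict(float)
    pmf[round(math.log(2.0 * (1.0 - eps)), DIGITS)] += 1.0 - eps   # output equals input
    pmf[round(math.log(2.0 * eps), DIGITS)] += eps                 # output flipped
    return dict(pmf)

def convolve(pmf_a, pmf_b):
    """pmf of the sum of two independent discrete random variables."""
    out = defaultdict(float)
    for sa, pa in pmf_a.items():
        for sb, pb in pmf_b.items():
            out[round(sa + sb, DIGITS)] += pa * pb
    return dict(out)

def product_pmf(eps_list):
    """pmf of xi_1 + ... + xi_n for independent components (non-stationary allowed)."""
    pmf = {0.0: 1.0}
    for eps in eps_list:
        pmf = convolve(pmf, bsc_llr_pmf(eps))
    return pmf

def cdf(pmf, s):
    """Distribution function P(xi <= s) of a discrete pmf."""
    return sum(prob for value, prob in pmf.items() if value <= s)

print(sorted(product_pmf([0.1, 0.2, 0.3]).items()))
```

Each BSC component with $\epsilon\in(0,\tfrac{1}{2})$ contributes two atoms, so no component density here is a Dirac delta, and neither is their convolution.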

Since $W_{[1,n]}$ is Rényi symmetric, the existence of an order $\alpha^{\ast}\in(0,1)$ satisfying (5.9) follows from (5.6) of Lemma 5.5.
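For a non-stationary product of BSCs, the order $\alpha^{\ast}$ can be computed explicitly. The following is a minimal sketch, assuming (as in Lemma 5.5 for a single channel) that $\alpha^{\ast}$ is the order at which the rate of the tilted product channel against the Augustin center equals $\ln\tfrac{M}{L}$, and that this rate is the sum of the per-letter rates, since each BSC has a uniform Augustin center at every order and the Augustin capacity is additive over the product. The crossover probabilities and the target $\ln\tfrac{M}{L}$ are hypothetical.

```python
import math

def bsc_rate_exponent(eps, alpha):
    """Per-letter (rate, exponent) of a BSC at order alpha, with the uniform Augustin center."""
    a, b = eps ** alpha, (1.0 - eps) ** alpha
    e = a / (a + b)                                   # tilted crossover probability
    rate = math.log(2.0) + e * math.log(e) + (1.0 - e) * math.log(1.0 - e)
    exponent = e * math.log(e / eps) + (1.0 - e) * math.log((1.0 - e) / (1.0 - eps))
    return rate, exponent

def product_rate_exponent(eps_list, alpha):
    """Sums of the per-letter quantities over t = 1..n (non-stationary product channel)."""
    pairs = [bsc_rate_exponent(eps, alpha) for eps in eps_list]
    return sum(r for r, _ in pairs), sum(e for _, e in pairs)

def capacity_1(eps_list):
    """C_{1,W_[1,n]} as the sum of the per-letter capacities ln 2 - h(eps_t)."""
    return sum(bsc_rate_exponent(eps, 1.0)[0] for eps in eps_list)

def alpha_star(eps_list, log_m_over_l, iters=100):
    """Bisection for the order at which the product rate equals ln(M/L);
    the rate increases in alpha, from 0 as alpha -> 0 to C_1 at alpha = 1."""
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if product_rate_exponent(eps_list, mid)[0] < log_m_over_l:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eps_list = [0.05, 0.1, 0.2, 0.1, 0.3]             # hypothetical non-identical components
target = 0.5 * capacity_1(eps_list)               # hypothetical ln(M/L)
a = alpha_star(eps_list, target)
print(a, product_rate_exponent(eps_list, a))
```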

Let probability measures ${{{{\it{{w}}}}}_{{{{\mathit{{m}}}}}}}$ , ${{\it{{q}}}}$ , and ${{{{\it{{v}}}}}_{{{{\mathit{{m}}}}}}}$ in ${{{\mathcal{{P}}}}({{{\mathcal{{Y}}}}_{1}^{{{\mathit{{n}}}}}})}$ be

[TABLE]

Note that ${{{{\it{{w}}}}}_{{{{\mathit{{m}}}}}}}={{{\mathit{{W}}}}_{{[1,{{\mathit{{n}}}}]}}}({\varPsi}({{\mathit{{m}}}}))$ by definition, ${{\it{{q}}}}={{{{\it{{q}}}}}_{{{{\mathit{{\alpha}}}}^{\!\ast},{{{\mathit{{W}}}}_{{[1,{{\mathit{{n}}}}]}}}}}}$ by (5.13), and ${{{{\it{{v}}}}}_{{{{\mathit{{m}}}}}}}$ is equal to the order ${{\mathit{{\alpha}}}}^{\!\ast}$ tilted probability measure between ${{{{\it{{w}}}}}_{{{{\mathit{{m}}}}}}}$ and ${{\it{{q}}}}$ , which is equal to ${{{\mathit{{U}}}}_{{{{\mathit{{\alpha}}}}^{\!\ast}}}}({\varPsi}({{\mathit{{m}}}}))$ , by construction. Then Lemma 5.5 implies

[TABLE]

On the other hand, $\sum_{m\in\mathscr{M}}q(m\in\varTheta)\leq L$ by the definition of list decoding. Thus at least half of the messages in $\mathscr{M}$ (at least $\lfloor\tfrac{M+1}{2}\rfloor$ of them, to be precise) satisfy $q(m\in\varTheta)\leq\tfrac{2L}{M}$ as a result of Markov's inequality. Applying Lemma 3.4 with $\mathscr{E}=\{y_{1}^{n}:m\in\varTheta(y_{1}^{n})\}$ and $\beta=2$ for the messages satisfying $q(m\in\varTheta)\leq\tfrac{2L}{M}$, we get

[TABLE]

provided that $\sqrt{a_{2}n}-\tfrac{\ln 4n}{2\alpha^{\ast}}\geq\ln\varDelta$. Then (5.10) follows from the definition of $P_{\mathbf{e}}$ as the average of the $P_{\mathbf{e}}^{m}$'s.

Note that as a result of (5.14), ${{\mathit{{g}}}}_{{{\mathit{{\alpha}}}}^{\!\ast}}^{{{{\mathit{{W}}}}_{{[1,{{\mathit{{n}}}}]}}}}$ is a Dirac delta function iff all ${{\mathit{{g}}}}_{{{\mathit{{\alpha}}}}^{\!\ast}}^{{{{\mathit{{W}}}}_{{{{\mathit{{t}}}}}}}}$ ’s are. This observation together with Lemma 5.5 implies the sufficient condition for ${{\mathit{{\alpha}}}}^{\!\ast}$ to be $\tfrac{1}{1-{{\mathit{{E}}}_{sp\!}^{\prime}}\left({\ln\frac{M}{L},{{{\mathit{{W}}}}_{{[1,{{\mathit{{n}}}}]}}}}\right)}$ . ∎

5-C Codes On Additive White Gaussian Noise Channels

The additive white Gaussian noise channel with noise variance $\sigma^{2}$ is described via the following transition probability

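$$W(\mathscr{E}|x)=\int_{\mathscr{E}}\varphi_{\sigma^{2}}(y-x)\,\mathrm{d}y\qquad\forall\mathscr{E}\in\mathcal{B}(\mathbb{R}),~x\in\mathbb{R},$$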

where $\varphi_{\sigma^{2}}$ is the zero-mean Gaussian probability density function of variance $\sigma^{2}$, i.e.,

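$$\varphi_{\sigma^{2}}(z)=\tfrac{1}{\sqrt{2\pi\sigma^{2}}}\,e^{-\frac{z^{2}}{2\sigma^{2}}}\qquad\forall z\in\mathbb{R}.$$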

With a slight abuse of notation, we denote the corresponding probability measure (i.e., the zero-mean Gaussian probability measure of variance $\sigma^{2}$) by $\varphi_{\sigma^{2}}$ as well.

If the cost function is the quadratic one, then the zero-mean Gaussian distribution maximizes the Augustin information, for any positive order, among all input distributions satisfying the cost constraint; see [40, Example LABEL:C-eg:SGauss]. That is,

[TABLE]

Furthermore, the order $\alpha$ Augustin center of this channel is a zero-mean Gaussian probability measure. The closed-form expressions for the Augustin capacity and the Augustin center are derived in [40, (LABEL:C-eq:eg:SGauss-capacity), (LABEL:C-eq:eg:SGauss-center), (LABEL:C-eq:eg:SGauss-center-variance)]:

[TABLE]

One can confirm the following identity by substitution

[TABLE]

In fact, $\theta_{\alpha}$ is a root of the equation given above because of a fixed point property similar to the one described in (4.4); see [40, (LABEL:C-eq:eg:SGauss-Augustinoperator), (LABEL:C-eq:eg:SGauss-necessarycondition)] and the ensuing discussion.

The sphere packing exponent expression resulting from (5.16), (5.17), (5.18), and (5.19) is derived in [22, Example LABEL:D-eg:SGauss]; it is given by the following parametric form in [22, (LABEL:D-eq:eg:SGauss-parametric-rate), (LABEL:D-eq:eg:SGauss-parametric-spe)]:

[TABLE]

Using (5.18) and (5.20), one can also express the unique $\alpha^{*}$ whose rate is $R$ as a function of $R$:

[TABLE]

The equivalence of the parametric form given in (5.20) and (5.21) to the expression given by Gallager [11, (7.4.33)] can be confirmed by substitution using (5.22). One can also confirm, by using (5.18) and (5.19) in (5.20) and (5.21), that

[TABLE]

Furthermore, codes on additive white Gaussian noise channels satisfy both (1.1) and (5.1) as a result of Shannon's bounds in [3, (3)]; this is established in Appendix C for completeness. Theorem 5.10, which establishes the refined SPB given in (5.1), is proved using principles and analysis similar to those used in the proofs of Theorems 5.1 and 5.8, which are quite different from the ones employed in [3].

Theorem 5.10.

Let $\sigma$ and $\varrho$ be positive reals, let $n$, $M$, and $L$ be positive integers satisfying $R\in\left(0,\tfrac{1}{2}\ln\tfrac{\sigma^{2}+\varrho}{\sigma^{2}}\right)$ for $R=\tfrac{1}{n}\ln\tfrac{M}{L}$, and let $W_{t}$ be an additive white Gaussian noise channel with noise variance $\sigma^{2}$, say $W$, for all $t$. Then any $(M,L)$ channel code $(\varPsi,\varTheta)$ on $W_{[1,n]}$ satisfying $\sum_{t=1}^{n}(\varPsi_{t}(m))^{2}=n\varrho$ for all messages $m$ in the message set $\mathscr{M}$ satisfies

[TABLE]

for $\alpha^{*}$ given in (5.22), provided that $\sqrt{a_{2}n}-\tfrac{\ln n}{2\alpha^{*}}\geq\varDelta$, where

[TABLE]

Before proving Theorem 5.10, let us briefly discuss its implications. Theorem 5.10 bounds the performance of codes satisfying an equality cost constraint, but it can also be used to bound the performance of codes satisfying an inequality cost constraint. In particular, Shannon observed in [3, (83)] that

[TABLE]

where $P_{\mathbf{e}}(n,\varrho)$ is the infimum of error probabilities of $(M,L)$ channel codes satisfying $\sum_{t=1}^{n}(\varPsi_{t}(m))^{2}\leq n\varrho$ and $\widetilde{P_{\mathbf{e}}}(n,\varrho)$ is the analogous quantity for the constraint $\sum_{t=1}^{n}(\varPsi_{t}(m))^{2}=n\varrho$. The first inequality of (5.25) holds because any code satisfying the equality constraint also satisfies the inequality constraint. The second inequality of (5.25) is confirmed by considering an extension of the codewords by one additional symbol, $\varPsi_{n+1}(m)$, so as to satisfy $\sum_{t=1}^{n+1}(\varPsi_{t}(m))^{2}=(n+1)\varrho$. Recently, Vazquez-Vilar improved (5.25) in [49, Proposition 1] by observing that the same extension can be constructed for the constraint $\sum_{t=1}^{n+1}(\varPsi_{t}(m))^{2}=n\varrho$ as well. Thus, we have

[TABLE]
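The one-symbol extension behind the second inequality of (5.25), and behind its refinement (5.26), amounts to appending $\varPsi_{n+1}(m)=\sqrt{T-\sum_{t=1}^{n}(\varPsi_{t}(m))^{2}}$ for the target energy $T=(n+1)\varrho$ or $T=n\varrho$, respectively. A decoder that simply ignores the last output then has the same error probability as the original code. A minimal sketch (the function name is hypothetical):

```python
import math

def extend_codeword(codeword, total_energy):
    """Append one symbol so that the squared norm equals total_energy.

    Requires sum(x^2) <= total_energy.  With len(codeword) = n, choosing
    total_energy = (n + 1) * rho mirrors the extension behind (5.25), and
    total_energy = n * rho mirrors the variant behind (5.26).
    """
    energy = sum(x * x for x in codeword)
    slack = total_energy - energy
    if slack < -1e-12:
        raise ValueError("codeword violates the cost constraint")
    # max(..., 0.0) guards against a tiny negative slack from rounding.
    return list(codeword) + [math.sqrt(max(slack, 0.0))]

# Example: n = 4, rho = 1.0, codewords satisfying the inequality constraint.
n, rho = 4, 1.0
codebook = [[1.0, -1.0, 0.5, 0.0], [0.0, 0.0, 0.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
ext_a = [extend_codeword(c, (n + 1) * rho) for c in codebook]
ext_b = [extend_codeword(c, n * rho) for c in codebook]
```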

One can use Theorem 5.10 together with either (5.25) or (5.26) to determine the prefactor for codes satisfying the cost constraint with an inequality. To that end, let us first note that $E_{sp}(R,W,\varrho)$ is convex in the rate $R$ as a result of (5.22) and (5.23), because $\alpha^{*}$ is monotonically increasing in $R$ on $[0,C_{1,W,\varrho}]$. Then $E_{sp}(R,W,\varrho)$ lies above its tangent at any point, and (5.23) implies

[TABLE]

Applying Theorem 5.10 at $n+1$ (rather than $n$) together with (5.25), and invoking (5.27) for $R_{0}=\tfrac{1}{n+1}\ln\tfrac{M}{L}$ and $R_{1}=R$, we get the following bound for $(M,L)$ channel codes $(\varPsi,\varTheta)$ satisfying $\sum_{t=1}^{n}(\varPsi_{t}(m))^{2}\leq n\varrho$:

[TABLE]

where $\alpha^{*}_{0}=\alpha^{*}(R_{0})$, and $a_{2}$, $a_{3}$, and $\varDelta$ are calculated at $\alpha^{*}_{0}$ rather than $\alpha^{*}$, provided that $\sqrt{a_{2}(n+1)}-\tfrac{\ln(n+1)}{2\alpha^{*}_{0}}\geq\varDelta$. Note that $\left\lvert R-R_{0}\right\rvert=\tfrac{R}{n+1}$ and the function $\alpha^{*}$ given in (5.22) is analytic in the rate $R$; hence $\left\lvert\alpha^{*}_{0}-\alpha^{*}\right\rvert$ is $O\left(n^{-1}\right)$. Thus

[TABLE]

for some constant $A\in\mathbb{R}_{+}$. Hence, for $n$ large enough, (5.1) holds with the order $\alpha^{*}$ given in (5.22) and some constant $A\in\mathbb{R}_{+}$, not only for codes satisfying $\sum_{t=1}^{n}(\varPsi_{t}(m))^{2}=n\varrho$ but also for codes satisfying $\sum_{t=1}^{n}(\varPsi_{t}(m))^{2}\leq n\varrho$.

Proof of Theorem 5.10.

The following expressions for the order one Rényi divergence and the order $\alpha$ tilted channel for the zero-mean Gaussian distribution of variance $\theta$ can be confirmed by substitution:

[TABLE]

for all $x\in\mathbb{R}$, $\theta\in\mathbb{R}_{+}$, and $\mathscr{E}\in\mathcal{B}(\mathbb{R})$. Then, as a result of (5.19), (5.20), (5.21), and (5.22), we have

[TABLE]

where $R=\tfrac{1}{n}\ln\tfrac{M}{L}$.
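The Gaussian closed forms used in this proof can be sanity-checked numerically. The sketch below assumes the standard definition of the order $\alpha$ tilted measure (the normalized density $W(x)^{\alpha}q^{1-\alpha}$) for the pair $W(x)=\varphi_{\sigma^{2}}(\cdot-x)$ and $q=\varphi_{\theta}$. Completing the square gives a Gaussian with mean $\tfrac{\alpha\theta x}{\alpha\theta+(1-\alpha)\sigma^{2}}$ and variance $\tfrac{\sigma^{2}\theta}{\alpha\theta+(1-\alpha)\sigma^{2}}$, and $D_{1}(\varphi_{\sigma^{2}}(\cdot-x)\|\varphi_{\theta})=\tfrac{1}{2}\left[\ln\tfrac{\theta}{\sigma^{2}}+\tfrac{\sigma^{2}+x^{2}}{\theta}-1\right]$. Since the display (5.29) is not reproduced here, this is offered as an independent check rather than a restatement of it.

```python
import math

def log_gauss_pdf(y, mean, var):
    """Log-density of N(mean, var) at y."""
    return -(y - mean) ** 2 / (2 * var) - 0.5 * math.log(2 * math.pi * var)

def tilted_gaussian(x, sigma2, theta, alpha):
    """Mean and variance of the normalized density W(x)^alpha * q^(1 - alpha),
    where W(x) = N(x, sigma2) and q = N(0, theta); obtained by completing the square."""
    denom = alpha * theta + (1 - alpha) * sigma2
    return alpha * theta * x / denom, sigma2 * theta / denom

def kl_gaussian(x, sigma2, theta):
    """D_1(N(x, sigma2) || N(0, theta)) in nats."""
    return 0.5 * (math.log(theta / sigma2) + (sigma2 + x * x) / theta - 1)

def midpoint(f, lo=-30.0, hi=30.0, steps=30000):
    """Midpoint-rule integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

def check(x, sigma2, theta, alpha):
    """Largest errors of the closed forms against numerical integration."""
    log_tilt = lambda y: (alpha * log_gauss_pdf(y, x, sigma2)
                          + (1 - alpha) * log_gauss_pdf(y, 0.0, theta))
    z = midpoint(lambda y: math.exp(log_tilt(y)))     # normalizing constant
    mean, var = tilted_gaussian(x, sigma2, theta, alpha)
    dens_err = max(abs(math.exp(log_tilt(y)) / z - math.exp(log_gauss_pdf(y, mean, var)))
                   for y in (-1.0, 0.0, 0.5, 2.0))
    # Work with log-densities so that underflowed tails contribute exactly zero.
    kl_num = midpoint(lambda y: math.exp(log_gauss_pdf(y, x, sigma2))
                      * (log_gauss_pdf(y, x, sigma2) - log_gauss_pdf(y, 0.0, theta)))
    return dens_err, abs(kl_num - kl_gaussian(x, sigma2, theta))

print(check(1.3, 0.7, 1.9, 0.4))
```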

Let probability measures $w_{m}$, $q$, and $v_{m}$ in $\mathcal{P}(\mathcal{Y}_{1}^{n})$ be

[TABLE]

Then $v_{m}$ is the order $\alpha^{*}$ tilted probability measure between $w_{m}$ and $q$; using the hypothesis $\sum_{t=1}^{n}(\varPsi_{t}(m))^{2}=n\varrho$ together with (5.30) and (5.31), we get

[TABLE]

In order to obtain (5.24), we will apply Lemma 3.4 to the probability measure pairs $(w_{m},q)$ satisfying $q(m\in\varTheta)\leq\tfrac{2L}{M}$ for $\beta=2$, together with a rotation on $\mathbb{R}^{n}$ that minimizes the approximation error terms arising from the absolute third order moments.

For any $x\in\mathbb{R}$, let the random variable $\xi_{x}$ be

[TABLE]

Then using (5.29) we get

[TABLE]

On the other hand, moments and absolute moments of a zero-mean Gaussian random variable $\mathsf{Z}$ with variance $\sigma^{2}$ satisfy the following identities:

[TABLE]

where $\kappa!!=\prod_{\imath=0}^{\lceil\kappa/2\rceil-1}(\kappa-2\imath)$. Furthermore, for any three random variables $\mathsf{Z}_{1}$, $\mathsf{Z}_{2}$, and $\mathsf{Z}_{3}$, we have the inequality (5.35) below, which follows from Hölder's inequality via the observation that the geometric mean does not exceed the arithmetic mean (a proof is presented in Appendix D for completeness):

[TABLE]

Then using (5.34), one can confirm by substitution that

[TABLE]
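The Gaussian moment identities can be checked numerically. We assume they take the standard form $\mathbb{E}[\lvert\mathsf{Z}\rvert^{\kappa}]=\sigma^{\kappa}(\kappa-1)!!$ for even $\kappa$ and $\mathbb{E}[\lvert\mathsf{Z}\rvert^{\kappa}]=\sigma^{\kappa}(\kappa-1)!!\sqrt{2/\pi}$ for odd $\kappa$, with $\kappa!!$ as defined above and the empty product convention $(-1)!!=0!!=1$. For instance, $\mathbb{E}[\lvert\mathsf{Z}\rvert^{3}]=2\sqrt{2/\pi}\,\sigma^{3}$; third absolute moments of this type are what enter the Berry-Esseen error terms. A minimal sketch comparing the closed form with midpoint-rule integration:

```python
import math

def double_factorial(k):
    """k!! = prod_{i=0}^{ceil(k/2)-1} (k - 2i); the empty product gives (-1)!! = 0!! = 1."""
    result = 1
    for i in range(math.ceil(k / 2)):
        result *= k - 2 * i
    return result

def gaussian_abs_moment(k, sigma):
    """E|Z|^k for Z ~ N(0, sigma^2), k >= 1 (assumed standard closed form)."""
    value = sigma ** k * double_factorial(k - 1)
    return value * math.sqrt(2 / math.pi) if k % 2 == 1 else value

def numeric_abs_moment(k, sigma, steps=60000):
    """Midpoint-rule approximation of E|Z|^k over [-12 sigma, 12 sigma].

    steps is even, so z = 0 is a cell boundary and |z|^k is smooth on every cell."""
    lo, hi = -12.0 * sigma, 12.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        z = lo + (j + 0.5) * h
        total += abs(z) ** k * math.exp(-z * z / (2 * sigma ** 2))
    return total * h / math.sqrt(2 * math.pi * sigma ** 2)

for k in range(1, 7):
    print(k, gaussian_abs_moment(k, 1.0), numeric_abs_moment(k, 1.0))
```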

The hypothesis $\sum_{t=1}^{n}(\varPsi_{t}(m))^{2}=n\varrho$ implies that the $a_{2}$ of Lemma 3.4 is equal to the $a_{2}$ of Theorem 5.10. One, however, cannot assert the analogous relation for the $a_{3}$'s. Nevertheless, there exists a rotation in $\mathbb{R}^{n}$, say $\mathcal{S}_{m}$, such that $\mathcal{S}_{m}\varPsi(m)$ is the vector all of whose entries are equal to $\sqrt{\varrho}$, i.e.,

[TABLE]
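The text only asserts that a suitable rotation $\mathcal{S}_{m}$ exists. One explicit choice (ours, not necessarily the one intended) is the rotation in the plane spanned by $\varPsi(m)$ and the all-ones vector $\mathbf{1}$ that fixes the orthogonal complement of that plane: with unit vectors $a=\varPsi(m)/\lVert\varPsi(m)\rVert$, $b=\mathbf{1}/\sqrt{n}$, $c=a^{\top}b$, and $K=ba^{\top}-ab^{\top}$, it is $I+K+\tfrac{K^{2}}{1+c}$. It maps $\varPsi(m)$ to $\sqrt{\varrho}\,\mathbf{1}$ because both vectors have squared norm $n\varrho$, and since white Gaussian noise is invariant under rotations, applying it to the channel output leaves the noise statistics unchanged. A NumPy sketch, assuming $n\geq 2$ (for $n=1$ no rotation maps $-\sqrt{\varrho}$ to $\sqrt{\varrho}$):

```python
import numpy as np

def _plane_rotation(a, b):
    """Rotation (det = +1) taking unit vector a to unit vector b, acting only on span{a, b}."""
    c = float(a @ b)
    if c < -1.0 + 1e-12:
        # Antipodal case: pass through a unit vector w orthogonal to a (two quarter turns).
        e = np.zeros_like(a)
        e[np.argmin(np.abs(a))] = 1.0
        w = e - (e @ a) * a
        w /= np.linalg.norm(w)
        return _plane_rotation(w, b) @ _plane_rotation(a, w)
    K = np.outer(b, a) - np.outer(a, b)        # skew-symmetric generator of the rotation
    return np.eye(a.size) + K + (K @ K) / (1.0 + c)

def rotation_to_constant(psi):
    """Return S with S @ psi = sqrt(rho) * ones(n), where n * rho = ||psi||^2 and n >= 2."""
    psi = np.asarray(psi, dtype=float)
    n = psi.size
    if n < 2:
        raise ValueError("need n >= 2")
    a = psi / np.linalg.norm(psi)
    b = np.ones(n) / np.sqrt(n)
    return _plane_rotation(a, b)

psi = np.array([3.0, -1.0, 0.5, 2.0])
S = rotation_to_constant(psi)
print(S @ psi)   # all entries equal sqrt(14.25 / 4), up to rounding
```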

Note that for $w_{*}\triangleq\bigotimes\nolimits_{t=1}^{n}W(\sqrt{\varrho})$, we have

[TABLE]

Thus one can apply Lemma 3.4 to the pair $(w_{*},q)$ in order to bound $P_{\mathbf{e}}^{m}$. For the pair $(w_{*},q)$, however, the $a_{3}$ of Lemma 3.4 is equal to the $a_{3}$ of Theorem 5.10.

As was the case in the proofs of Theorems 5.1 and 5.8, $\sum_{m\in\mathscr{M}}q(m\in\varTheta)\leq L$ by the definition of list decoding. Thus, by Markov's inequality, at least half of the messages in $\mathscr{M}$ satisfy $q(m\in\varTheta)\leq\tfrac{2L}{M}$. Applying Lemma 3.4 with $\mathscr{E}=\{y_{1}^{n}:m\in\varTheta(y_{1}^{n})\}$ and $\beta=2$ for the messages satisfying $q(m\in\varTheta)\leq\tfrac{2L}{M}$, and using (5.32) and (5.33), we get

[TABLE]

as long as $\sqrt{a_{2}n}-\tfrac{\ln 4n}{2\alpha^{*}}\geq\ln\varDelta$. Then (5.24) follows from the definition of $P_{\mathbf{e}}$ as the average of the conditional error probabilities. ∎

6 Discussion

Theorems 5.1, 5.8, and 5.10 establish refined sphere packing bounds, i.e., bounds of the form (5.1), for fixed composition codes on stationary memoryless channels, codes on (possibly) non-stationary Rényi symmetric channels, and cost-constrained codes on additive white Gaussian noise channels with the quadratic cost function. The derivations of Theorems 5.1, 5.8, and 5.10 rely on the properties of Augustin's information measures and on the application of the Berry-Esseen theorem to the hypothesis testing problem summarized in Lemma 3.4. For certain cases, including the additive white Gaussian noise channels [3] and the strongly symmetric channels [4], these bounds are known to be tight in the sense that they can be matched by achievability results asserting the existence of codes satisfying

[TABLE]

for rates between the critical rate and the capacity of the channel. Recently, Altuğ and Wagner [46, Theorem 1] have generalized the results of [2] and [4] and established (5.1) and (6.1) for all non-singular Gallager symmetric channels.

At least since [2], it has also been known that for the binary erasure channel the polynomial prefactor of (5.1) can be improved from $n^{(E_{sp}^{\prime}(R)-1)/2}$ to $n^{-1/2}$. Recently, Altuğ and Wagner proved this result for all singular Gallager symmetric channels in [46, Theorem 2]. Both [2] and [46], however, have refrained from relying on bounds on the performance of the binary hypothesis testing problem with independent samples. This is not surprising, because Lemma 3.4 characterizes the prefactor for the binary hypothesis testing problem with independent samples exactly. Thus the refined SPBs of the form (5.1) are the best possible bounds for derivations of the SPB relying on the asymptotic behavior of sums of independent random variables, notwithstanding their suboptimality for singular Gallager symmetric channels.

As pointed out in Section 3, one can improve Lemma 3.4 and determine not only the prefactor but also the asymptotic constant in the trade-off between the probabilities of type I and type II errors, either by invoking finer characterizations of the asymptotic behavior of sums of independent random variables or by applying a saddle point approximation. Although such results, e.g., [36, 29], require stronger hypotheses and are more nuanced (even their statements are more nuanced, because they need to distinguish the lattice and non-lattice cases for the random variables involved in the analysis), they are important in the context of binary hypothesis testing. From the standpoint of the channel coding problem, however, it is rather hard to justify the extra effort such an analysis requires. First of all, the corresponding achievability results will have different constants, even when the prefactors match, as observed in [2, 3, 4]. More importantly, such refined results on binary hypothesis testing will also suffer from the subtlety discussed in the previous paragraph for the singular channels, i.e., their prefactors will be suboptimal for singular channels such as the binary erasure channel.

The principal novelty of this manuscript is the use of the Berry-Esseen theorem via suitable Augustin information measures to bound the optimal error probability in the channel coding problem. In this manuscript, our primary focus was on rates below the channel capacity; thus, we have derived refined sphere packing bounds. The same idea can be used to strengthen the strong converse bounds under similar symmetry hypotheses, as has recently been demonstrated in [50]. The essential technical challenge in this line of work is the derivation of refined SPBs and refined strong converses without any symmetry assumptions on the channel or on the codes.

Appendix A Rényi Symmetry is implied by input symmetry and Gallager Symmetry

In the following, we briefly explain why Rényi symmetry holds both for all input symmetric channels described in [43, Definition 3.2] and for all Gallager symmetric channels described in [11, p. 94]. Let us start with the input symmetric channels. Let $u$ be the uniform distribution on the input set of the input symmetric channel $W$ and let $q_{\alpha}$ be

[TABLE]

Then, using the input symmetry, one can confirm that

[TABLE]

The definitions of the tilted channel, (A.1), and (A.2) imply $q_{\alpha}=\sum_{x}u(x)W_{\alpha}^{q_{\alpha}}(x)$ and $q_{1,u}\prec q_{\alpha}$. Thus $q_{\alpha}$ is the order $\alpha$ Augustin mean for the uniform input distribution and $I_{\alpha}(u;W)=D_{\alpha}(W\|q_{\alpha}|u)$ by [40, Lemma LABEL:C-lem:information]. Hence

[TABLE]

Since $D_{\alpha}(W\|q_{\alpha}|p)=I_{\alpha}(u;W)$ for all $p\in\mathscr{P}(\mathscr{X})$ by (A.2) and (A.3), the probability measure $q_{\alpha}$ is not only the order $\alpha$ Augustin mean for the input distribution $u$ but also the order $\alpha$ Augustin center of $W$ by [40, Thm. LABEL:C-thm:minimax]. Then the constraint for Rényi symmetry follows from the definition of $q_{\alpha}$ and the input symmetry. Thus every input symmetric channel $W$ is also a Rényi symmetric channel.

Now let us consider a Gallager symmetric channel $W$. Let $\mathscr{S}_{1},\ldots,\mathscr{S}_{m}$ be the partition of the output set $\mathscr{Y}$ assumed in the definition of Gallager symmetry, e.g., [11, p. 94], let $u$ be the uniform distribution on the input set $\mathscr{X}$ of $W$, and let $\mu_{\alpha}$ and $q_{\alpha}$ be the measures defined in (A.1). Gallager symmetry implies not only that $\mu_{\alpha}$ and $q_{\alpha}$ are probability mass functions but also that they satisfy the following identities:

[TABLE]

Note that $\mu_{\alpha}(y)=\mu_{\alpha}(z)$, and hence $q_{\alpha}(y)=q_{\alpha}(z)$, whenever $y$ and $z$ are in the same $\mathscr{S}_{\imath}$, as a result of (A.4). Using this fact together with the Gallager symmetry, one can confirm both (A.2) and $q_{\alpha}=\sum_{x}u(x)W_{\alpha}^{q_{\alpha}}(x)$. On the other hand, $q_{1,u}\prec q_{\alpha}$. Thus $q_{\alpha}$ is the order $\alpha$ Augustin mean for the uniform input distribution and $I_{\alpha}(u;W)=D_{\alpha}(W\|q_{\alpha}|u)$ by [40, Lemma LABEL:C-lem:information]. Hence (A.3) holds for Gallager symmetric $W$ as well. Thus $D_{\alpha}(W\|q_{\alpha}|p)=I_{\alpha}(u;W)$ for all $p\in\mathscr{P}(\mathscr{X})$, and $q_{\alpha}$ is not only the order $\alpha$ Augustin mean for the input distribution $u$ but also the order $\alpha$ Augustin center of $W$ by [40, Thm. LABEL:C-thm:minimax]. Then the constraint for Rényi symmetry follows from the definition of $q_{\alpha}$ given in (A.1), the Gallager symmetry, and (A.4). Thus every Gallager symmetric channel $W$ is also a Rényi symmetric channel.

Appendix B Proof of Lemma 5.5

We first prove the existence of an order $\alpha^{*}$ in $(0,1)$ satisfying (5.6) using the intermediate value theorem. The Rényi symmetry of $W$ implies

[TABLE]

for all $x\in\mathscr{X}$, $\eta\in(0,1]$, and $\alpha\in\mathbb{R}_{+}$. Then (4.7), (4.8), and (4.9) imply

[TABLE]

for all $x\in\mathscr{X}$ and $\alpha\in(0,1]$. Thus the non-negativity of the Rényi divergence and (3.2) imply

[TABLE]

for all $x\in\mathscr{X}$ and $\alpha\in(0,1)$. On the other hand, Pinsker's inequality implies, for all $x\in\mathscr{X}$ and $\alpha\in(0,1)$,

[TABLE]
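Pinsker's inequality, in the form $\lVert P-Q\rVert\leq\sqrt{2D_{1}(P\|Q)}$ with $\lVert\cdot\rVert$ the $L^{1}$ (total variation) norm and the divergence in nats, can be checked on random finite distributions; we assume the norm in (B.4) follows this convention. The sketch also shows, with $q$ held fixed for simplicity, that the divergence between a tilted pmf and $W$ shrinks as the order approaches 1, which is the mechanism behind the limit that follows. The finite alphabet and the helper names are our own choices.

```python
import math
import random

def kl(p, q):
    """D_1(p || q) in nats for finite pmfs; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def l1(p, q):
    """Total variation norm taken as the L1 distance sum |p - q| (an assumed convention)."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def random_pmf(k, rng):
    w = [rng.random() + 1e-9 for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

def tilt(w, q, alpha):
    """Order-alpha tilted pmf, proportional to w^alpha * q^(1 - alpha)."""
    un = [wi ** alpha * qi ** (1 - alpha) for wi, qi in zip(w, q)]
    z = sum(un)
    return [u / z for u in un]

rng = random.Random(1)
w, q = random_pmf(6, rng), random_pmf(6, rng)
for alpha in (0.5, 0.9, 0.99, 0.999):
    wa = tilt(w, q, alpha)
    # Pinsker: ||wa - w||_1 <= sqrt(2 D_1(wa || w)); both sides vanish as alpha -> 1.
    print(alpha, l1(wa, w), math.sqrt(2 * kl(wa, w)))
```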

Thus $\lim\nolimits_{\alpha\uparrow 1}\left\lVert W_{\alpha}^{q_{\alpha,W}}(x)-W(x)\right\rVert=0$ for all $x\in\mathscr{X}$ as a result of (B.4). On the other hand, the Augustin center $q_{\alpha,W}$ is continuous in $\alpha$ on $(0,1]$ for the total variation topology on $\mathcal{P}(\mathcal{Y})$ by [40, Lemmas LABEL:C-lem:capacityO-(LABEL:C-capacityO-continuity) and LABEL:C-lem:centercontinuity]. Then the lower semicontinuity of the Rényi divergence in its arguments for the topology of setwise convergence, see [32, Thm. 15], implies

[TABLE]

Then using (B.2) we get

[TABLE]

Furthermore, $D_{1}(W_{\alpha}^{q_{\alpha,W}}(x)\|q_{\alpha,W})$ is continuous in $\alpha$ on $(0,1)$ by [19, Lemma LABEL:B-lem:tilting] because $q_{\alpha,W}$ is continuous in $\alpha$ on $(0,1]$ for the total variation topology. Then (B.3) implies

[TABLE]

The existence of an $\alpha^{*}\in(0,1)$ satisfying (5.6) follows from (B.5), (B.6), and the continuity of $D_{1}(W_{\alpha}^{q_{\alpha,W}}(x)\|q_{\alpha,W})$ in $\alpha$ on $(0,1)$ by the intermediate value theorem [41, 4.23].

We proceed by showing that any order $\alpha^{*}$ satisfying (5.6) also satisfies (5.7). The definition of the SPE given in (4.11) and the consequence of Rényi symmetry given in (B.1) imply that

[TABLE]

where $f(\alpha,\tau)\triangleq\tfrac{1-\alpha}{\alpha}[D_{\alpha}(W(x)\|q_{\alpha^{*},W})-\tau]$. We show in the following that

[TABLE]

Then for any order $\alpha^{*}$ satisfying (5.6), equations (3.2) and (B.2) and the definition of the SPE given in (4.11) imply

[TABLE]

Thus the inequalities given in (B.7) and (B.9) hold as equalities, and any $\alpha^{*}$ satisfying (5.6) also satisfies (5.7) by (B.8).

Now we prove the identity given in (B.8), which we have assumed above. The Rényi divergence is non-decreasing in its order by [32, Thm. 3], and $D_{\alpha}(W(x)\|q_{\alpha^{*},W})=\tfrac{\alpha}{1-\alpha}D_{1-\alpha}(q_{\alpha^{*},W}\|W(x))$ for all $\alpha$ in $(0,1)$ by definition. Then $D_{\alpha}(W(x)\|q_{\alpha^{*},W})$ is finite for all $\alpha$ in $(0,1)$. Thus both $D_{\alpha}(W(x)\|q_{\alpha^{*},W})$ and $D_{1}(W_{\alpha}^{q_{\alpha^{*},W}}(x)\|q_{\alpha^{*},W})$ are continuously differentiable in $\alpha$ on $(0,1)$ by [40, Lemma LABEL:C-lem:analyticity], and their derivatives on $(0,1)$ are

[TABLE]

where $\xi_{x}=\ln\tfrac{\mathrm{d}W(x)}{\mathrm{d}\nu}-\ln\tfrac{\mathrm{d}q_{\alpha^{*},W}}{\mathrm{d}\nu}$. Then

[TABLE]

Consequently, (5.6) implies (B.8) by the derivative test, provided that $\xi_{x}=\gamma$ does not hold $W_{\alpha}^{q_{\alpha^{*},W}}(x)$-a.s. for any $\gamma\in\mathbb{R}_{+}$ and $\alpha\in(0,1)$. If, on the other hand, $W_{\alpha}^{q_{\alpha^{*},W}}(\{\xi_{x}=\gamma\}|x)=1$ for some $\gamma\in\mathbb{R}_{+}$ and $\alpha\in(0,1)$, then the identities (B.13), (B.14), and (B.15) hold for all $\alpha\in(0,1)$, and one can confirm (B.8) by substitution.

[TABLE]
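Two facts used above, the skew symmetry $D_{\alpha}(P\|Q)=\tfrac{\alpha}{1-\alpha}D_{1-\alpha}(Q\|P)$ for $\alpha\in(0,1)$ and the monotonicity of the Rényi divergence in its order, are easy to check numerically on finite alphabets. A minimal sketch, assuming full-support pmfs so that every term is finite (the helper names are ours):

```python
import math
import random

def renyi(p, q, alpha):
    """D_alpha(p || q) in nats for full-support finite pmfs and alpha in (0, 1)."""
    s = sum(pi ** alpha * qi ** (1 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1)

def random_pmf(k, rng):
    w = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]

rng = random.Random(7)
p, q = random_pmf(5, rng), random_pmf(5, rng)
alpha = 0.3
# Skew symmetry: D_alpha(p || q) = alpha / (1 - alpha) * D_{1-alpha}(q || p).
print(renyi(p, q, alpha), alpha / (1 - alpha) * renyi(q, p, 1 - alpha))
# Monotonicity in the order: D_alpha(p || q) is non-decreasing in alpha.
print([round(renyi(p, q, a / 10), 6) for a in range(1, 10)])
```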

Now we are left with establishing (5.8) under either of the additional hypotheses. Let us first assume that there does not exist a $\gamma$ satisfying $W_{\alpha^{*}}^{q_{\alpha^{*},W}}\left(\left.\left\{\tfrac{\mathrm{d}W(x)}{\mathrm{d}\nu}=\gamma\tfrac{\mathrm{d}q_{\alpha^{*},W}}{\mathrm{d}\nu}\right\}\right|x\right)=1$. Then $\mathbf{V}_{W_{\alpha}^{q_{\alpha^{*},W}}(x)}\!\left[\xi_{x}\right]>0$ for all $\alpha$ in $(0,1)$, and thus $D_{1}(W_{\alpha}^{q_{\alpha^{*},W}}(x)\|q_{\alpha^{*},W})$ is increasing in $\alpha$ on $(0,1)$. Since $D_{1}(W_{\alpha}^{q_{\alpha^{*},W}}(x)\|q_{\alpha^{*},W})$ is also continuous in $\alpha$ on $(0,1)$, it has a continuous and increasing inverse function. Then, as a result of (B.12), there exist an $\epsilon>0$ and an increasing continuous function $h:(R-\epsilon,R+\epsilon)\to(0,1)$ satisfying

[TABLE]

Then (B.7) implies

[TABLE]

On the other hand, (4.11) and (B.2) imply

[TABLE]

Furthermore, (B.9) implies

[TABLE]

Then (5.8) follows from (B.16), (B.17), (B.18), the identity $h(R)=\alpha^{*}$, the definition of the derivative as a limit, and the continuity of $h(\cdot)$, which is defined as the inverse function of $D_{1}(W_{\alpha}^{q_{\alpha^{*},W}}(x)\|q_{\alpha^{*},W})$ as a function of $\alpha$.
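The monotonicity argument above can be illustrated on a finite alphabet. For the order $\alpha$ tilted pmf $W_{\alpha}\propto W^{\alpha}q^{1-\alpha}$ with $q$ held fixed, a short calculation gives $\tfrac{\mathrm{d}}{\mathrm{d}\alpha}D_{1}(W_{\alpha}\|q)=\alpha\,\mathbf{V}_{W_{\alpha}}[\xi]$ with $\xi=\ln\tfrac{W}{q}$, so $D_{1}(W_{\alpha}\|q)$ is increasing whenever $\xi$ is not a.s. constant, and its inverse (the role played by $h$) can be computed by bisection. The helper names below are hypothetical, and the finite, full-support setting is a simplifying assumption.

```python
import math

def tilt(w, q, alpha):
    """Order-alpha tilted pmf, proportional to w^alpha * q^(1 - alpha)."""
    un = [wi ** alpha * qi ** (1 - alpha) for wi, qi in zip(w, q)]
    z = sum(un)
    return [u / z for u in un]

def kl(p, q):
    """D_1(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def tilted_rate(w, q, alpha):
    """D_1(W_alpha || q); increasing in alpha unless w / q is constant."""
    return kl(tilt(w, q, alpha), q)

def tilted_rate_derivative(w, q, alpha):
    """alpha * Var_{W_alpha}[xi] with xi = ln(w / q): the derivative of tilted_rate."""
    wa = tilt(w, q, alpha)
    xi = [math.log(wi / qi) for wi, qi in zip(w, q)]
    mean = sum(a * x for a, x in zip(wa, xi))
    return alpha * sum(a * (x - mean) ** 2 for a, x in zip(wa, xi))

def inverse_tilted_rate(w, q, target, tol=1e-12):
    """Bisection for the alpha in (0, 1) with tilted_rate(w, q, alpha) = target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilted_rate(w, q, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

w = [0.5, 0.3, 0.15, 0.05]
q = [0.25, 0.25, 0.25, 0.25]
print([round(tilted_rate(w, q, a / 10), 6) for a in range(1, 10)])
```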

Now we establish (5.8) assuming the existence of a $q$ satisfying $q_{\alpha,W}=q$ for all $\alpha$ in $(0,1)$. If there does not exist a $\gamma$ satisfying $W_{\alpha^{*}}^{q}\left(\left.\left\{\tfrac{\mathrm{d}W(x)}{\mathrm{d}\nu}=\gamma\tfrac{\mathrm{d}q}{\mathrm{d}\nu}\right\}\right|x\right)=1$, then (5.8) holds by the preceding discussion. If there exists such a $\gamma$, then as a result of (B.2) and (B.13) we have

[TABLE]

Then such a $\gamma$ does not exist by the hypotheses of the lemma, because (B.19) and $C_{1,W}=\lim_{\alpha\uparrow 1}C_{\alpha,W}$ imply $C_{1,W}=\infty$ in the case $W(\{\xi_{x}=\ln\gamma\}|x)<1$ and $\lim_{\alpha\downarrow 0}C_{\alpha,W}=C_{1,W}$ in the case $W(\{\xi_{x}=\ln\gamma\}|x)=1$.

Appendix C Shannon’s Bounds For AWGN Channels and The Sphere Packing Exponent

Shannon [3, (3)] bounded the error probability of the length $n$ block codes described in Theorem 5.10 as

[TABLE]

where $\Omega(\cdot):[0,\pi]\to[0,\tfrac{2\pi^{n/2}}{\Gamma(n/2)}]$ is the function mapping the cone angle to the corresponding solid angle in $\mathbb{R}^{n}$, $\theta$ is the cone angle satisfying $\Omega(\theta)=\tfrac{\Omega(\pi)}{M}$, and $Q(\xi)$ is the probability that a point $X$ in $\mathbb{R}^{n}$ at distance $\sqrt{n\varrho}$ from the origin $O$ is moved outside a circular cone of half-angle $\xi$, with vertex at $O$ and axis $OX$, by Gaussian noise of variance $\sigma^{2}$.
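For concreteness, $\Omega(\theta)$ and the cone angle with $\Omega(\theta)=\Omega(\pi)/M$ can be computed from the standard spherical-cap formula $\Omega(\theta)=\tfrac{2\pi^{(n-1)/2}}{\Gamma((n-1)/2)}\int_{0}^{\theta}\sin^{n-2}\phi\,\mathrm{d}\phi$ for $n\geq 2$, which is not restated in the text and is assumed here. A minimal sketch; the cone angle does not depend on the normalizing constant, so the bisection uses only the integral:

```python
import math

def _sin_power_integral(theta, n, steps=4000):
    """Midpoint-rule value of int_0^theta sin(phi)^(n-2) dphi."""
    h = theta / steps
    return h * sum(math.sin((k + 0.5) * h) ** (n - 2) for k in range(steps))

def solid_angle(theta, n):
    """Omega(theta) in R^n (n >= 2): area of the unit-sphere cap of half-angle theta."""
    log_const = (math.log(2) + 0.5 * (n - 1) * math.log(math.pi)
                 - math.lgamma(0.5 * (n - 1)))
    return math.exp(log_const) * _sin_power_integral(theta, n)

def cone_angle(M, n, tol=1e-12):
    """Half-angle theta with Omega(theta) = Omega(pi) / M, found by bisection."""
    target = _sin_power_integral(math.pi, n) / M   # the constant in front cancels
    lo, hi = 0.0, math.pi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if _sin_power_integral(mid, n) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# In R^3 the cap area is 2*pi*(1 - cos(theta)), so M = 4 codewords give theta = pi / 3.
print(cone_angle(4, 3), math.pi / 3)
```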

Shannon [3, (4) and (5)] derived the exact asymptotic expressions for both the upper and the lower bounds in (C.1). These are given in terms of functions $f(\cdot)$ and $g(\cdot)$ that do not depend on the block length $n$:

[TABLE]

where $\theta_{c}$ and $\theta_{cr}$, the cone angles corresponding to the channel capacity and the critical rate, are given by

[TABLE]

and ${{\mathit{{E}}}_{{\mathit{{L}}}}}(\cdot)$ —the fixed cone angle exponent— is defined via the function ${{\mathit{{G}}}}(\cdot)$ as follows

[TABLE]

Remark C.1.

Shannon’s notation in [3] differs slightly from ours. Shannon works with the signal-to-noise “amplitude” ratio $A\triangleq\tfrac{\sqrt{\varrho}}{\sigma}$ rather than with the cost constraint $\varrho$ and the noise power $\sigma^{2}$. Furthermore, Shannon specifies the critical cone angle $\theta_{cr}$ as the solution of the equation given in (C.8). Nevertheless, one obtains the closed-form expression given in (C.5) by substituting the definition of $G(\cdot)$, given in (C.7), into (C.8) and solving the resulting quadratic equation for $\sin^{2}\theta_{cr}$.

[TABLE]
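The claim of Remark C.1, that substituting $G(\cdot)$ into (C.8) yields a quadratic in $\sin^{2}\theta_{cr}$, can be checked numerically. Since (C.7) and (C.8) are not reproduced here, the sketch assumes Shannon's forms in amplitude-ratio notation as we recall them from [3]:
- $G(\theta)=\tfrac{1}{2}\left(A\cos\theta+\sqrt{A^{2}\cos^{2}\theta+4}\right)$;
- the critical-angle equation $2\cos\theta_{cr}-A\,G(\theta_{cr})\sin^{2}\theta_{cr}=0$.

Under these assumptions, the substitution gives $A^{2}u^{2}-(2A^{2}+4)u+4=0$ for $u=\sin^{2}\theta_{cr}$. The root of this quadratic in $(0,1)$ is $u=\tfrac{A^{2}+2-\sqrt{A^{4}+4}}{A^{2}}$. The code compares this closed form with a bisection solution of the unsquared equation. The function names are hypothetical.

```python
import math


def shannon_G(theta: float, A: float) -> float:
    """Assumed form of G(theta) in Shannon's amplitude-ratio notation."""
    c = math.cos(theta)
    return 0.5 * (A * c + math.sqrt(A * A * c * c + 4.0))


def critical_equation(theta: float, A: float) -> float:
    """Left side of the assumed equation 2 cos(theta) - A G(theta) sin^2(theta) = 0."""
    return 2.0 * math.cos(theta) - A * shannon_G(theta, A) * math.sin(theta) ** 2


def theta_cr_bisection(A: float, tol: float = 1e-13) -> float:
    """Root on (0, pi/2): the left side is 2 at theta = 0 and -A at theta = pi/2."""
    lo, hi = 0.0, 0.5 * math.pi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if critical_equation(mid, A) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sin2_theta_cr_closed_form(A: float) -> float:
    """Root in (0, 1) of A^2 u^2 - (2 A^2 + 4) u + 4 = 0, with u = sin^2(theta_cr)."""
    a2 = A * A
    return (a2 + 2.0 - math.sqrt(a2 * a2 + 4.0)) / a2


for A in (0.5, 1.0, 3.0):
    t = theta_cr_bisection(A)
    print(A, math.sin(t) ** 2, sin2_theta_cr_closed_form(A))
```

With the leading-order relation $R=-\ln\sin\theta$, this root gives $R_{cr}=\tfrac{1}{2}\ln\left(\tfrac{1}{2}+\tfrac{A^{2}}{4}+\tfrac{1}{2}\sqrt{1+\tfrac{A^{4}}{4}}\right)$. This is the familiar critical rate of the AWGN channel.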

Shannon also presented, in [3, (11)], the exact asymptotic expression for the rate $R$ in terms of the cone angle $\theta$:

[TABLE]

We obtain the fixed-rate asymptotic expressions corresponding to the fixed cone angle asymptotic expressions (C.2), (C.3), and (C.6) in two steps. First, we use (C.9) to derive the asymptotic expression for the cone angle $\theta$ at a fixed rate $R$. If $\lvert\delta_{n}\rvert\ll 1$ and

[TABLE]

then, using the small-angle approximations for the trigonometric functions, we get

[TABLE]

Invoking $\ln(1+\epsilon)=\epsilon+O(\epsilon^{2})$, we get

[TABLE]

Consequently, if

[TABLE]

then the rate corresponding to the cone angle $\theta_{n}$ at block length $n$ is $R+O\left(\tfrac{\ln^{2}n}{n^{2}}\right)$. In other words, we hold the rate fixed, up to this error, by shifting the cone angle by an additive term proportional to $\tfrac{\ln n}{n}$. To obtain the exact asymptotic expressions for the upper and lower bounds in (C.1) at a fixed rate $R$ via (C.2) and (C.3), we apply Taylor's expansion. To that end, we first calculate the derivatives of $G$ and $E_{L}$. As a result of (C.7), we have

[TABLE]

Then using (C.6), we get

[TABLE]

Thus (C.10) and Taylor’s expansion imply

[TABLE]

Invoking (C.12), we get the following asymptotic expression

[TABLE]
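The $\tfrac{\ln n}{n}$ scale of the cone-angle shift used above can be seen numerically. The sketch below is our own illustration, and its function names are hypothetical. It computes the exact rate $R_{n}(\theta)=\tfrac{1}{n}\ln\tfrac{\Omega(\pi)}{\Omega(\theta)}$ for a fixed cone angle. It relies on the standard integral representation $\Omega(\theta)=\Omega_{n-1}(\pi)\int_{0}^{\theta}\sin^{n-2}\phi\,\mathrm{d}\phi$ rather than on (C.9), which is not reproduced here.

A Laplace-type estimate of the integral suggests
$n\left(R_{n}(\theta)+\ln\sin\theta\right)-\tfrac{1}{2}\ln n\to\tfrac{1}{2}\ln(2\pi)+\ln(\sin\theta\cos\theta)$.
So at a fixed cone angle the rate exceeds $-\ln\sin\theta$ by roughly $\tfrac{\ln n}{2n}$, and holding the rate fixed requires shifting $\theta$ by a term of order $\tfrac{\ln n}{n}$.

```python
import math


def log_cone_count(theta: float, n: int, steps: int = 20000) -> float:
    """ln(Omega(pi) / Omega(theta)) for a circular cone of half-angle theta in R^n.

    Assumes Omega(theta) = [2 pi^((n-1)/2) / Gamma((n-1)/2)] * int_0^theta sin^(n-2),
    with Omega(pi) = 2 pi^(n/2) / Gamma(n/2); Simpson's rule, steps even.
    """
    h = theta / steps
    acc = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        acc += weight * math.sin(k * h) ** (n - 2)
    integral = acc * h / 3.0
    # ln Omega(pi) minus ln of the (n-1)-dimensional prefactor, via lgamma for stability
    log_prefactor_ratio = (
        0.5 * math.log(math.pi) + math.lgamma((n - 1) / 2) - math.lgamma(n / 2)
    )
    return log_prefactor_ratio - math.log(integral)


def rate(theta: float, n: int) -> float:
    """R_n(theta) = (1/n) ln M with M = Omega(pi) / Omega(theta), in nats."""
    return log_cone_count(theta, n) / n


theta = 1.0
limit = 0.5 * math.log(2.0 * math.pi) + math.log(math.sin(theta) * math.cos(theta))
for n in (50, 100, 200, 400):
    gap = rate(theta, n) + math.log(math.sin(theta))  # R_n(theta) - (-ln sin theta)
    print(f"n={n:4d}  gap={gap:.5f}  n*gap - 0.5 ln n = {n * gap - 0.5 * math.log(n):.4f}"
          f"  (limit {limit:.4f})")
```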

On the other hand, invoking (C.7) in (C.6), we get

[TABLE]

Then, using (C.11), we get the closed-form expression for the sphere packing exponent given in [22]:

[TABLE]
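The exponent can also be evaluated from Shannon's parametrization. This is a hedged illustration: (C.6) and (C.7) are not reproduced here, so the sketch assumes Shannon's forms in amplitude-ratio notation as we recall them from [3]:
- $G(\theta)=\tfrac{1}{2}\left(A\cos\theta+\sqrt{A^{2}\cos^{2}\theta+4}\right)$;
- $E_{L}(\theta)=\tfrac{1}{2}\left(A^{2}-A\,G(\theta)\cos\theta-2\ln(G(\theta)\sin\theta)\right)$;
- the leading-order relation $\sin\theta=e^{-R}$.

Together these give $E_{sp}(R)=E_{L}(\arcsin e^{-R})$ for $0\le R\le C=\tfrac{1}{2}\ln(1+A^{2})$. As sanity checks, the exponent vanishes at capacity, where $\cot\theta=A$, and equals $A^{2}/2$ at $R=0$. Function names are hypothetical.

```python
import math


def shannon_G(theta: float, A: float) -> float:
    """Assumed form of G(theta): (A cos(theta) + sqrt(A^2 cos^2(theta) + 4)) / 2."""
    c = math.cos(theta)
    return 0.5 * (A * c + math.sqrt(A * A * c * c + 4.0))


def shannon_EL(theta: float, A: float) -> float:
    """Assumed form of the fixed cone angle exponent E_L(theta) in Shannon's notation."""
    g = shannon_G(theta, A)
    return 0.5 * (A * A - A * g * math.cos(theta) - 2.0 * math.log(g * math.sin(theta)))


def sphere_packing_exponent(R: float, A: float) -> float:
    """E_sp(R) = E_L(arcsin(exp(-R))) in nats, intended for 0 <= R <= 0.5 ln(1 + A^2);
    relies on the assumed leading-order relation sin(theta) = exp(-R)."""
    return shannon_EL(math.asin(math.exp(-R)), A)


A = 1.0                       # amplitude ratio sqrt(rho) / sigma
C = 0.5 * math.log1p(A * A)   # capacity in nats
for k in range(6):
    R = C * k / 5
    print(f"R = {R:.4f}  E_sp(R) = {sphere_packing_exponent(R, A):.6f}")
```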

On the other hand, (C.7) implies

[TABLE]

Invoking first (C.11), then (5.22), and finally (5.23), we get

[TABLE]

Then (C.13) and (C.14) imply

[TABLE]

where ${{\mathit{{E}}}_{sp\!}^{\prime}}\left({{{\mathit{{R}}}}}\right)=\left.\tfrac{\partial{}}{\partial{{{\it{{s}}}}}}{{\mathit{{E}}}_{sp\!}}\left({{{\it{{s}}}},{{{\mathit{{W}}}}},{{\mathit{{\varrho}}}}}\right)\right|_{{{\it{{s}}}}={{\mathit{{R}}}}}$ . Then [3, (3)], i.e.,(C.1), imply both (1.1) and (5.1), because (C.1) is nothing but (1.1) and (5.1) for certain multiplicative constants.

Appendix D Proof of (5.35)

Note that for any three random variables $\mathsf{Z}_{1}$, $\mathsf{Z}_{2}$, and $\mathsf{Z}_{3}$,

[TABLE]

On the other hand, as a result of Hölder’s inequality, we have

[TABLE]

Furthermore, since the geometric mean is upper bounded by the arithmetic mean, we also have

[TABLE]

Thus

[TABLE]
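The displays of this appendix are not reproduced here. What follows is only a hedged illustration of how its two steps chain for three random variables, and the function name is hypothetical:
1. **Hölder's inequality** with exponents $(3,3,3)$ gives $\mathbb{E}|\mathsf{Z}_{1}\mathsf{Z}_{2}\mathsf{Z}_{3}|\le\prod_{i}(\mathbb{E}|\mathsf{Z}_{i}|^{3})^{1/3}$.
2. **The geometric-arithmetic mean inequality** then bounds this product by $\tfrac{1}{3}\sum_{i}\mathbb{E}|\mathsf{Z}_{i}|^{3}$.

Expanding $(|\mathsf{Z}_{1}|+|\mathsf{Z}_{2}|+|\mathsf{Z}_{3}|)^{3}$ into 27 such triple products yields the third-moment bound $9\sum_{i}\mathbb{E}|\mathsf{Z}_{i}|^{3}$. This is one standard consequence, not necessarily the statement of (5.35).

Because the inequalities hold for every joint distribution, including the empirical distribution of paired samples, the checks below can fail only through rounding.

```python
import random


def mean(values):
    """Average of a list: the expectation under the empirical (uniform) distribution."""
    return sum(values) / len(values)


def hoelder_amgm_chain(z1, z2, z3):
    """For paired samples of Z1, Z2, Z3 return
    (E|Z1 Z2 Z3|, (prod E|Zi|^3)^(1/3), (sum E|Zi|^3)/3,
     E(|Z1|+|Z2|+|Z3|)^3, 9 sum E|Zi|^3); each entry bounds the previous one
    within the pairs (1,2), (2,3), and (4,5)."""
    lhs = mean([abs(a * b * c) for a, b, c in zip(z1, z2, z3)])
    m = [mean([abs(v) ** 3 for v in z]) for z in (z1, z2, z3)]
    geometric = (m[0] * m[1] * m[2]) ** (1.0 / 3.0)   # Hoelder, exponents 3, 3, 3
    arithmetic = sum(m) / 3.0                          # geometric <= arithmetic mean
    cube = mean([(abs(a) + abs(b) + abs(c)) ** 3 for a, b, c in zip(z1, z2, z3)])
    return lhs, geometric, arithmetic, cube, 9.0 * sum(m)


rng = random.Random(0)
n_samples = 1000
sample = (
    [rng.gauss(0.0, 1.0) for _ in range(n_samples)],
    [rng.expovariate(1.0) for _ in range(n_samples)],
    [rng.uniform(-2.0, 2.0) for _ in range(n_samples)],
)
print(hoelder_amgm_chain(*sample))
```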

Acknowledgment

The author would like to thank Hao-Chung Cheng both for numerous inspiring discussions on the sphere packing bound and its refinements, which helped the author to simplify and improve the statement of Lemma 3.4, and for his feedback on the manuscript. The author would also like to thank the reviewer for pointing out [45] and for his feedback on the manuscript.

Bibliography


[1] B. Nakiboğlu. A Simple Derivation of the Refined SPB for the Constant Composition Codes. In 2019 IEEE International Symposium on Information Theory (ISIT), pages 2659–2663, Paris, France, July 2019.
[2] P. Elias. Coding for two noisy channels. In Proceedings of Third London Symposium of Information Theory, pages 61–74, London, 1955. Butterworth Scientific.
[3] C. E. Shannon. Probability of error for optimal codes in a Gaussian channel. The Bell System Technical Journal, 38(3):611–656, May 1959.
[4] R. Dobrushin. Asymptotic estimates of the probability of error for transmission of messages over a discrete memoryless communication channel with a symmetric transition probability matrix. Theory of Probability & Its Applications, 7(3):270–300, 1962.
[5] R. G. Gallager. A simple derivation of the coding theorem and some applications. IEEE Transactions on Information Theory, 11(1):3–18, Jan. 1965.
[6] C. E. Shannon, R. G. Gallager, and E. R. Berlekamp. Lower bounds to error probability for coding on discrete memoryless channels. I. Information and Control, 10(1):65–103, 1967.
[7] E. A. Haroutunian. Bounds for the exponent of the probability of error for a semicontinuous memoryless channel. Problems of Information Transmission, 4(4):29–39, 1968.
[8] U. Augustin. Error estimates for low rate codes. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 14(1):61–88, 1969.
