A Simple Derivation of the Refined Sphere Packing Bound Under Certain Symmetry Hypotheses
Barış Nakiboğlu
This work is supported
by The Science Academy, Turkey, under The
Science Academy’s Young Scientist Awards Program (BAGEP)
and by The Scientific and Technological Research Council
of Turkey (TÜBİTAK) under Grant 119E053.
This paper was presented in part at
the 2019 IEEE International Symposium on Information Theory
[1].
Abstract
A judicious application of the Berry-Esseen theorem via suitable
Augustin information measures is demonstrated to be sufficient for deriving
the sphere packing bound with a prefactor that is
$\Omega\left(n^{-0.5(1-E_{sp}'(R))}\right)$
for all codes on certain families of channels
—including the Gaussian channels and the non-stationary Rényi symmetric channels—
and for the constant composition codes on stationary memoryless channels.
The resulting non-asymptotic bounds have definite approximation error terms.
As a preliminary result that might be of interest on its own, the trade-off
between type I and type II error probabilities in the hypothesis testing
problem with (possibly non-stationary) independent samples is determined
up to some multiplicative constants, assuming that the probabilities of
both types of error are decaying exponentially with the number of samples,
using the Berry-Esseen theorem.
1 Introduction
The decay of the optimal error probability
with the block length for rates below the channel capacity
has been studied since the early days of information theory.
For certain channels and for certain values of the rate,
sharp bounds were found early on.
Elias in [2] for the binary symmetric channel,
Shannon in [3] for the additive Gaussian noise channel
(the equivalence of the bounds in [3] to (1.1) is proved in Appendix C), and
Dobrushin in [4] for the strongly symmetric channels
(see the original publication in Russian to avoid typos present in the translation)
proved that
[TABLE]
where $a_n=\Theta(b_n)$ iff
$0<\liminf_{n\to\infty}\frac{a_n}{b_n}\leq\limsup_{n\to\infty}\frac{a_n}{b_n}<\infty$,
$E_{sp}(\cdot)$ is the sphere packing exponent of the channel,
$E_{sp}'(\cdot)$ is its derivative with respect to the rate,
$R_{crit}$ is the rate at which the slope
of the sphere packing exponent curve is minus one, i.e., $E_{sp}'(R_{crit})=-1$,
and $C$ is the capacity of the channel.
(Footnote 2: In this section, we suppress the dependence of
the sphere packing exponent on the channel in our notation and denote it by
$E_{sp}(R)$, rather than $E_{sp}(R,W)$.)
On the other hand, Elias proved in [2]
for the binary erasure channels that
[TABLE]
Neither (1.1), nor (1.2), holds for rates
below the critical rate.
If, however, we replace the equality sign with the greater than or equal
to sign, then both (1.1) and (1.2) hold
for all rates below the channel capacity.
These lower bounds are customarily called sphere packing bounds (SPBs)
because of the techniques used in their original derivations.
Derivations of the SPB in [2, 3, 4] relied
on the geometric structure of the output space of the channel
and parameters that can be defined only for some models.
The resulting bounds were expressed in terms of these parameters, as well.
Thus it was not even clear that the SPBs in [2, 3, 4]
can be interpreted as specific instances of a general bound.
The evidence for such an interpretation came not from a breakthrough about
the lower bounds on the error probability but
from a breakthrough about the upper bounds.
Gallager’s seminal work [5] unified and generalized
the upper bounds on the error probability
—at least in terms of the exponent—
in all the previous studies.
It is only with Gallager’s formulation in [5]
that one can express the bounds in [2, 3, 4]
as (1.1) and (1.2).
The first complete proof of the SPB
for arbitrary discrete stationary product
channels (DSPCs) was presented in [6].
(Footnote 3: These channels are customarily called discrete
memoryless channels, i.e., DMCs. We call them DSPCs in order to underline
the stationarity of these channels and the absence of any constraints on
their input sets.
In principle, such constraints might exist and stationarity might be absent
in a discrete channel that is memoryless.)
According to [6, Thm. 2]
[TABLE]
where $a_n=O(b_n)$ iff
there exists a $K\in\mathbb{R}_{+}$ such that $|a_n|\leq K b_n$
for all $n$ large enough.
In the following two years, the SPB was proved
first for stationary product channels with finite input sets in [7]
and then for (possibly) non-stationary product channels in [8].
Since then, the SPB has been proven for various channel models in
[9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
including some quantum information theoretic ones.
It is, however, worth noting that a general proof that holds
for both Gaussian channels
—considered in [3, 9, 10]—
and for arbitrary DSPCs
—considered in [6]—
was absent until recently, see [21] and [22].
These later works on the SPB,
i.e., [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
were primarily interested in establishing the right exponential decay rate;
thus, they were content with prefactors of the form
$e^{-o(n)}$,
where
$a_n=o(b_n)$ iff
for all $\epsilon>0$ the inequality $|a_n|\leq\epsilon b_n$
holds for all $n$ large enough.
Some authors did obtain prefactors of the form
$e^{-O(\sqrt{n})}$ or
$e^{-O(\ln n)}$, but obtaining the best possible
—if not tight—
prefactor was not an actual concern.
The quest for deriving SPBs with tight prefactors
was put on the map again by Altuğ and Wagner in [25]
and [26].
According to [25, Thm. 1], for any DSPC
with a Gallager symmetric
probability transition matrix $W$ with positive entries
and rate $R$ in $(0,C)$, there exists an $A\in\mathbb{R}_{+}$
such that for any $\epsilon>0$
[TABLE]
for some $n_0$ determined by $W$, $R$, and $\epsilon$.
(Footnote 4: The condition for Gallager symmetry is described
in [11, p. 94].
The binary symmetric channel,
the binary erasure channel,
and channels considered in [4]
are symmetric according to this definition.)
The corresponding result was established for the constant composition codes
on arbitrary DSPCs in [26, Thm. 1].
These results are generalized to classical-quantum
channels in [27, Thms. 8 and 14],
with a slight improvement, allowing ϵ=0
for the symmetric channels.
The primary tool for the derivations in [25, 26, 27]
is the Berry-Esseen theorem, albeit through certain auxiliary results
inspired by a theorem of Bahadur and Rao [28],
i.e.,
[25, (74)], [26, Proposition 5],
and [27, Thm. 17].
Our main aim in this paper is to demonstrate that
the analysis can be simplified and the results can
be strengthened and generalized through
a more judicious application of
the Berry-Esseen theorem via
suitable Augustin information measures.
[2] and [3]
not only established (1.1) and (1.2)
but also obtained closed-form expressions for the
upper and lower bounds implicit in (1.1) and (1.2).
Dobrushin went one step further and calculated the
exact asymptotic behavior of the SPB and the random coding bound
by analyzing asymptotic behavior for
the lattice and non-lattice cases separately
for random variables used to derive the SPB
and the random coding bound, see [4, (1.32), (1.33), (1.34)].
Recently, the saddle point approximation has been used to
derive the SPB with the same asymptotic
prefactor [29, Corollary 2],
under a weaker symmetry hypothesis,
albeit by assuming a common support for all output distributions
of the channel and a non-lattice structure for the random variables
involved.
(Footnote 5: The binary input Gaussian channel
and the binary erasure channel satisfy the symmetry hypothesis of
[29], but not that of [4].)
(Footnote 6: Neither of these assumptions was needed while deriving
this result in [4].)
The main drawback of the analysis in [29]
is the technical conditions that need to be confirmed for applying
the saddle point approximation via [30, Proposition 2.3.1].
Remark 1.1.
The proof of [29, Corollary 2] holds only
for channels whose Augustin center does not change with the order,
i.e., for $W$’s for which there exists a $q$ such that
$q_{\alpha,W}=q$ for all $\alpha\in(0,1)$.
Note that for the channels that violate this
additional hypothesis, the $O(n^{-1/2})$ approximation
error terms in [29, (30)] are $\rho$ dependent
because of the implicit $Q$ dependence of the
$O(n^{-1/2})$ approximation
error terms in [29, (12) and (13)].
In order to recover a result similar to [29, Corollary 2]
for a channel whose Augustin center changes with the order,
one needs a saddle point approximation
that holds for a parametric family of i.i.d. sequences of random variables,
such as [31, Proposition 1],
rather than [29, Lemma 2], which holds for
a single i.i.d. sequence of random variables.
Let us finish this section with an overview of the paper.
In Section 2, we describe our model and notation.
In Section 3, first, we recall the connection between
the hypothesis testing problem and the tilting,
then we derive our primary technical tool using the Berry-Esseen theorem.
In Section 4, we review Augustin’s information measures
and the sphere packing exponent.
In Section 5, we state and prove refined SPBs for various models
using Lemma 3.4 and
the observations recalled in Section 4.
We conclude our presentation with a brief discussion
of the results, recent developments, and future work in Section 6.
2 Model and Notation
For any set $\mathscr{X}$, we denote the set of all probability mass functions that
are non-zero only on finitely many elements of $\mathscr{X}$ by $\mathcal{P}(\mathscr{X})$.
For any measurable space $(\mathscr{Y},\mathcal{Y})$, we denote the set of all probability
measures on it by $\mathcal{P}(\mathcal{Y})$.
We denote the expected value of a measurable function $f$
under the probability measure $\mu$ by $E_{\mu}[f]$.
Similarly, we denote the variance of $f$ under $\mu$,
i.e., $E_{\mu}[(f-E_{\mu}[f])^{2}]$,
by $V_{\mu}[f]$.
For sets $\mathscr{X}_1,\ldots,\mathscr{X}_n$ we denote
their Cartesian product by $\mathscr{X}_1^n$
and for σ-algebras $\mathcal{Y}_1,\ldots,\mathcal{Y}_n$
we denote their product by $\mathcal{Y}_1^n$.
We use the symbol $\otimes$ to denote the product of measures.
A channel $W$ is a function from the input set $\mathscr{X}$ to the set of all probability
measures on the output space $(\mathscr{Y},\mathcal{Y})$:
[TABLE]
A channel W is called a discrete channel if both X and Y are finite sets.
The product of $W_t:\mathscr{X}_t\to\mathcal{P}(\mathcal{Y}_t)$ for $t\in\{1,\ldots,n\}$
is a channel of the form $W_{[1,n]}:\mathscr{X}_1^n\to\mathcal{P}(\mathcal{Y}_1^n)$
satisfying
[TABLE]
Any channel obtained by curtailing the input set of a length $n$
product channel is called a length $n$ memoryless channel.
A product channel $W_{[1,n]}$ is stationary iff $W_t=W$
for all $t$’s for some $W$.
On a stationary channel, we denote the composition
(i.e., the empirical distribution, the type)
of each $x_1^n$ by $\Upsilon(x_1^n)$;
thus $\Upsilon(x_1^n)\in\mathcal{P}(\mathscr{X})$.
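The product-channel and composition definitions above can be illustrated for finite alphabets with a minimal Python sketch. The dictionary representation of a channel and the helper names `product_channel_prob` and `composition` are illustrative assumptions, not notation from the paper; the composition is computed for a plain input sequence, independently of the channel.

```python
from collections import Counter
from itertools import product
from math import prod

# A discrete channel is represented (hypothetically) as a dict:
# W[x][y] = probability of output y given input x.
BSC = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
BEC = {0: {0: 0.8, "e": 0.2, 1: 0.0}, 1: {0: 0.0, "e": 0.2, 1: 0.8}}


def product_channel_prob(channels, xs, ys):
    """Probability of the output sequence ys given the input sequence xs
    on the product of the component channels W_1, ..., W_n."""
    return prod(W[x][y] for W, x, y in zip(channels, xs, ys))


def composition(xs, alphabet):
    """Empirical distribution (type) of the sequence xs."""
    counts = Counter(xs)
    return {x: counts[x] / len(xs) for x in alphabet}


# A non-stationary length-3 product channel: the components differ.
channels = [BSC, BEC, BSC]
xs = (0, 1, 1)
outputs = product(*(list(W[x].keys()) for W, x in zip(channels, xs)))
# The output probabilities of the product channel sum to one.
total = sum(product_channel_prob(channels, xs, ys) for ys in outputs)
```

Here `composition(xs, [0, 1])` returns the type of `xs`, which puts mass 1/3 on the symbol 0 and 2/3 on the symbol 1.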
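The pair $(\Psi,\Theta)$ is an $(M,L)$ channel code on $W_{[1,n]}$ iff
• The encoding function $\Psi$ is a function from the message set $\mathcal{M}\triangleq\{1,2,\ldots,M\}$ to the input set $\mathscr{X}_1^n$.
• The decoding function $\Theta$ is a $\mathcal{Y}_1^n$-measurable function from the output set $\mathscr{Y}_1^n$ to the set $\widehat{\mathcal{M}}\triangleq\{\mathcal{L}:\mathcal{L}\subset\mathcal{M}\mbox{ and }|\mathcal{L}|\leq L\}$.
Given an $(M,L)$ channel code $(\Psi,\Theta)$ on $W_{[1,n]}$,
the conditional error probability $P_e^m$ for $m\in\mathcal{M}$
and the average error probability $P_e$
are defined as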
[TABLE]
An encoding function $\Psi$
(hence the corresponding code)
on a stationary product channel satisfies
an empirical distribution constraint $\mathcal{A}\subset\mathcal{P}(\mathscr{X})$
iff the compositions of all of its codewords are in $\mathcal{A}$,
i.e., iff $\Upsilon(\Psi(m))\in\mathcal{A}$ for all $m\in\mathcal{M}$.
A code is called
a constant composition code iff all of its codewords have
the same composition, i.e., there exists a p in P(X)
satisfying Υ(Ψ(m))=p for all m∈M.
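As a concrete illustration of these definitions, the sketch below builds a tiny constant composition code on a stationary binary symmetric channel and evaluates its error probabilities by brute force. It assumes the usual reading of the definitions (the conditional error probability is the probability that the transmitted message is not in the decoded list, and the average error probability is the average over messages); the codebook, the crossover probability, and all function names are hypothetical.

```python
from itertools import product
from math import prod

EPS = 0.1  # crossover probability of a binary symmetric channel (illustrative)


def bsc_prob(xs, ys):
    """Probability of output ys given input xs on a stationary BSC product channel."""
    return prod(1 - EPS if x == y else EPS for x, y in zip(xs, ys))


# Encoder Psi: message -> codeword. Both codewords have one 0 and two 1s,
# so this is a constant composition code.
codebook = {1: (0, 1, 1), 2: (1, 1, 0)}


def decode(ys, list_size=1):
    """List decoder Theta: the list_size most likely messages given ys."""
    ranked = sorted(codebook, key=lambda m: -bsc_prob(codebook[m], ys))
    return set(ranked[:list_size])


def conditional_error(m, list_size=1):
    """P_e^m: probability that message m is not in the decoded list."""
    n = len(codebook[m])
    return sum(bsc_prob(codebook[m], ys)
               for ys in product((0, 1), repeat=n)
               if m not in decode(ys, list_size))


def average_error(list_size=1):
    """P_e: average of the conditional error probabilities."""
    return sum(conditional_error(m, list_size) for m in codebook) / len(codebook)


def is_constant_composition(code):
    """True iff all codewords are permutations of each other."""
    return len({tuple(sorted(c)) for c in code.values()}) == 1
```

With a list size equal to the number of messages the decoded list always contains the transmitted message, so `average_error(list_size=2)` is zero for this codebook.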
3 Hypothesis Testing Problem and Berry-Esseen Theorem
Our primary aim in this section is to characterize —up to a multiplicative constant—
the asymptotic behavior of type I error probability with the number of samples
for a hypothesis testing problem between product measures, when type II error
probability is decaying exponentially.
We use the Berry-Esseen theorem via the concepts of Rényi divergence and
tilted probability measure to do that.
Let us, first, recall the definitions of the Rényi divergence and the tilted
probability measure.
Definition 3.1.
For any α∈R+ and w,q∈P(Y),
the order α Rényi divergence between w and q is
[TABLE]
where ν is any measure satisfying w≺ν and q≺ν.
The order one Rényi divergence is the Kullback-Leibler divergence.
For other orders, the Rényi divergence can be characterized in terms of the
Kullback-Leibler divergence too:
[TABLE]
with the convention that αD1(v∥w)+(1−α)D1(v∥q)=∞ if
it would be otherwise undefined, see [32, Thm. 30].
The characterization given in (3.1) is related to another key concept
of our analysis: the tilted probability measure.
Definition 3.2.
For any $\alpha\in\mathbb{R}_{+}$ and $w,q\in\mathcal{P}(\mathcal{Y})$ satisfying
$D_\alpha(w\|q)<\infty$,
the order $\alpha$ tilted probability measure $w_\alpha^q$ is
[TABLE]
If either $\alpha$ is in $(0,1)$ or $D_1(w_\alpha^q\|w)$ is finite,
then the tilted probability measure is the unique probability measure achieving
the infimum in (3.1)
by [32, Thm. 30], i.e.,
[TABLE]
Furthermore, under the same hypothesis the identities
[TABLE]
hold $w_\alpha^q$-a.s.,
where $w_{ac}$ is the component of $w$ that is absolutely continuous in $q$.
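For finite output alphabets, the relation between the Rényi divergence and the tilted probability measure can be checked numerically. The sketch below assumes the standard definitions for orders in $(0,1)$ (the Rényi divergence as a normalized log-sum, the tilted measure proportional to $w^{\alpha}q^{1-\alpha}$) and the characterization in the form $(1-\alpha)D_\alpha(w\|q)=\inf_v[\alpha D_1(v\|w)+(1-\alpha)D_1(v\|q)]$; the displayed equations are not available here, so the normalization is an assumption, and the distributions are arbitrary.

```python
from math import log


def kl(v, w):
    """Kullback-Leibler divergence D_1(v||w) in nats for finite distributions."""
    return sum(a * log(a / b) for a, b in zip(v, w) if a > 0)


def renyi(w, q, alpha):
    """Order-alpha Renyi divergence D_alpha(w||q) in nats, for alpha in (0,1)."""
    return log(sum(a**alpha * b**(1 - alpha) for a, b in zip(w, q))) / (alpha - 1)


def tilted(w, q, alpha):
    """Order-alpha tilted probability measure, proportional to w^alpha * q^(1-alpha)."""
    weights = [a**alpha * b**(1 - alpha) for a, b in zip(w, q)]
    z = sum(weights)
    return [x / z for x in weights]


w = [0.7, 0.2, 0.1]
q = [0.2, 0.3, 0.5]
alpha = 0.4
v = tilted(w, q, alpha)

# The tilted measure attains the infimum in the variational characterization.
lhs = (1 - alpha) * renyi(w, q, alpha)
rhs = alpha * kl(v, w) + (1 - alpha) * kl(v, q)

# Any other candidate, e.g. the uniform distribution, gives a larger value.
u = [1 / 3] * 3
other = alpha * kl(u, w) + (1 - alpha) * kl(u, q)
```

In this example `lhs` and `rhs` agree up to floating-point error, while `other` is larger.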
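Let us proceed with recalling the Berry-Esseen theorem.
Lemma 3.3 (Berry-Esseen theorem).
Let $\{\xi_t\}_{t\in\mathbb{Z}_{+}}$ be independent zero-mean
random variables satisfying
$\sum_{t=1}^{n}E[\xi_t^2]<\infty$.
Then there exists an absolute constant $\omega\leq0.5600$ satisfying
[TABLE]
where $a_\kappa=\frac{1}{n}\sum_{t=1}^{n}E[|\xi_t|^\kappa]$
and $\Phi(s)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{s}e^{-z^2/2}dz$.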
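A quick numerical sanity check of the Berry-Esseen theorem is sketched below for i.i.d. centered Bernoulli variables. It assumes the classical form of the bound, $\sup_s|P[\sum_t\xi_t\leq s\sqrt{a_2 n}]-\Phi(s)|\leq\omega a_3/(a_2\sqrt{a_2 n})$, with $\omega=0.56$; this is the textbook statement written in terms of $a_2$ and $a_3$, and is assumed to match the displayed inequality, which is not available here. The function names and parameter values are illustrative.

```python
from math import comb, erf, sqrt


def phi(s):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(s / sqrt(2)))


def berry_esseen_gap(n, p):
    """Sup-distance between the CDF of the normalized sum of n centered
    Bernoulli(p) variables and the standard normal CDF."""
    a2 = p * (1 - p)                         # second moment of each summand
    gap, cdf = 0.0, 0.0
    for k in range(n + 1):
        s = (k - n * p) / sqrt(a2 * n)       # normalized jump location
        gap = max(gap, abs(cdf - phi(s)))    # left limit at the jump
        cdf += comb(n, k) * p**k * (1 - p)**(n - k)
        gap = max(gap, abs(cdf - phi(s)))    # value at the jump
    return gap


def berry_esseen_bound(n, p, omega=0.56):
    """Right-hand side omega * a3 / (a2 * sqrt(a2 * n)) of the classical bound."""
    a2 = p * (1 - p)
    a3 = p * (1 - p) * ((1 - p)**2 + p**2)   # third absolute moment
    return omega * a3 / (a2 * sqrt(a2 * n))


n, p = 50, 0.3
gap, bound = berry_esseen_gap(n, p), berry_esseen_bound(n, p)
```

Because the CDF of the sum is a step function and $\Phi$ is increasing, the supremum is attained at a jump point, which is why only the jump locations are inspected.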
Lemma 3.4, in the following, characterizes the trade-off between
type I and type II error probabilities for a hypothesis testing problem with
independent samples, assuming that both error probabilities are decaying
—at least— exponentially with the number of samples.
Lemma 3.4, which is derived using the Berry-Esseen theorem,
can be interpreted as a refinement of [6, Thm. 5],
which is derived using Chebyshev’s inequality.
Lemma 3.4.
For any $\alpha\in(0,1)$, $n\in\mathbb{Z}_{+}$,
$w_t,q_t\in\mathcal{P}(\mathcal{Y}_t)$,
let $w_{t,ac}$ be the component of $w_t$ that is absolutely continuous in $q_t$
and let $a_2$, $a_3$, and $\Delta$ be
[TABLE]
where $w=\otimes_{t=1}^{n}w_t$ and $q=\otimes_{t=1}^{n}q_t$.
Then for any $E\in\mathcal{Y}_1^n$
and $\beta\geq n^{-1/2}e^{-\alpha\sqrt{a_2 n}}$
satisfying $q(E)\leq\beta e^{-D_1(w_\alpha^q\|q)}$,
we have
[TABLE]
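provided that $\beta\leq\Delta^{-\alpha}n^{-1/2}e^{\alpha\sqrt{a_2 n}}$.
Furthermore, for any $\alpha\in(0,1)$ and $\beta\in\mathbb{R}_{+}$,
there exists an event $E\in\mathcal{Y}_1^n$ such that
Proof of Lemma 3.4.
Let the random variables $\xi_t$ and $\xi$ and the event $B$ be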
[TABLE]
Thus $\xi=\ln\frac{dw_{ac}}{dq}$ holds
$q$-a.s. by the definitions of $\xi_t$ and $\xi$.
Hence (3.3), (3.4), and the definition of B
imply that
[TABLE]
Thus for any E∈Y1n, we have
[TABLE]
On the other hand, ξt’s are jointly independent
under the tilted probability measure wαq.
Thus the Berry-Esseen theorem, given in Lemma 3.3, implies
[TABLE]
If we set
$\tau_0=-\frac{2\ln\beta+\ln n}{2\alpha}-\ln\Delta$
and $\tau_1=-\frac{2\ln\beta+\ln n}{2\alpha}$,
then $-\sqrt{a_2 n}\leq\tau_0\leq\tau_1\leq\sqrt{a_2 n}$
by the hypothesis
and
[TABLE]
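Furthermore,
$w_\alpha^q(E\cap B)\leq\frac{1}{\sqrt{n}}$
as a result of (3.8),
$e^{\alpha\tau_1}=\frac{1}{\beta\sqrt{n}}$, and
the hypothesis $q(E)\leq\beta e^{-D_1(w_\alpha^q\|q)}$.
Thus $w_\alpha^q(B\setminus E)\geq\frac{1}{\sqrt{n}}$.
Then using (3.9) and
$e^{(1-\alpha)\tau_0}=\beta^{\frac{\alpha-1}{\alpha}}n^{\frac{\alpha-1}{2\alpha}}\Delta^{\alpha-1}$
we get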
[TABLE]
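The choice of the thresholds in this step can be verified with a few lines of arithmetic. The sketch below assumes the reading $\tau_1=-\frac{2\ln\beta+\ln n}{2\alpha}$ and $\tau_0=\tau_1-\ln\Delta$ with $\Delta\geq1$, which is the form consistent with the two constraints on $\beta$ in Lemma 3.4 and with the two exponential identities used in the proof; the numerical values of $\alpha$, $\beta$, $n$, $\Delta$, and $a_2$ are purely illustrative.

```python
from math import exp, log, sqrt


def thresholds(alpha, beta, n, delta):
    """Thresholds tau_0 <= tau_1 (assumed form; requires delta >= 1)."""
    tau1 = -(2 * log(beta) + log(n)) / (2 * alpha)
    tau0 = tau1 - log(delta)
    return tau0, tau1


def beta_range(alpha, n, delta, a2):
    """Interval of beta values for which
    -sqrt(a2 * n) <= tau_0 and tau_1 <= sqrt(a2 * n)."""
    root = sqrt(a2 * n)
    lower = n**-0.5 * exp(-alpha * root)
    upper = delta**-alpha * n**-0.5 * exp(alpha * root)
    return lower, upper


alpha, beta, n, delta, a2 = 0.5, 2.0, 400, 3.0, 1.0   # illustrative values only
tau0, tau1 = thresholds(alpha, beta, n, delta)
lower, upper = beta_range(alpha, n, delta, a2)

# The two identities invoked in the proof.
lhs1, rhs1 = exp(alpha * tau1), 1 / (beta * sqrt(n))
lhs0 = exp((1 - alpha) * tau0)
rhs0 = beta**((alpha - 1) / alpha) * n**((alpha - 1) / (2 * alpha)) * delta**(alpha - 1)
```

The two constraints on $\beta$ in the statement of the lemma are exactly the conditions $\tau_1\leq\sqrt{a_2 n}$ and $\tau_0\geq-\sqrt{a_2 n}$ solved for $\beta$.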
Remark 3.5.
While deriving bounds similar to (3.5),
the constants $\tau_0$ and $\tau_1$ are usually
assumed to satisfy $\tau_0=-\tau_1$, see for example
[6, Thm. 5] or [27, Thm. 11].
Such a choice, however, does not lead to tight bounds in our case.
To establish the existence of an event satisfying both
(3.6) and (3.7),
let us consider the event E given in the following
[TABLE]
where γ is a real number to be determined later and
ν is any measure satisfying both
w≺ν and q≺ν.
Remark 3.6.
The random variable $\xi$ is defined only for $y_1^n$’s
with a positive $\frac{dq}{d\nu}$.
Thus one can define $\xi$ to be infinite for $y_1^n$’s
satisfying both $\frac{dq}{d\nu}=0$ and $\frac{dw}{d\nu}>0$,
and define the event $E$ to be the event that $\xi$ is greater than or equal
to $E_{w_\alpha^q}[\xi]+\gamma$.
For the event E defined in (3.10),
as a result of (3.3) we have
[TABLE]
where the event Eκ is defined for each κ∈Z
to be
[TABLE]
On the other hand, we can bound $w_\alpha^q(E_\kappa)$
uniformly for all integers $\kappa$
using the Berry-Esseen theorem, i.e., Lemma 3.3,
as follows
[TABLE]
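For $\gamma=\frac{1}{\alpha}\ln\left[\frac{1}{\sqrt{n}}\left(\frac{1}{\sqrt{2\pi a_2}}+2\frac{0.56a_3}{a_2\sqrt{a_2}}\right)\frac{\beta^{-1}}{1-e^{-\alpha}}\right]$,
(3.6)
follows from (3.11), (3.12),
and
$\sum_{\kappa=0}^{\infty}e^{-\alpha\kappa}=\frac{1}{1-e^{-\alpha}}$.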
w(Y1n∖E) is bounded following a similar analysis,
by invoking (3.4), instead of (3.3):
[TABLE]
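Invoking first (3.12)
and then $\frac{1}{1-e^{\alpha-1}}(1-e^{-\alpha})^{\frac{\alpha-1}{\alpha}}\leq\frac{1}{1-\alpha}\alpha^{\frac{\alpha-1}{\alpha}}e^{\frac{1}{2\alpha}}$
we get,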
[TABLE]
Then (3.7) follows from
(8π1+a20.56a3)≤8πa2e1∨8πa2lnΔ.
∎
Lemma 3.4 characterizes the asymptotic behavior
of the trade-off between the optimal type I and type II error probabilities
for a hypothesis testing problem with independent samples:
$P_e^{II}(n)$ is
$\Theta\left(n^{-\frac{1}{2\alpha}}e^{-D_1(w_\alpha^q\|w)}\right)$
whenever
$P_e^{I}(n)$ is
$\Theta\left(e^{-D_1(w_\alpha^q\|q)}\right)$.
For the stationary case,
i.e., when $w_t=w_1$ and $q_t=q_1$
for all $t$,
Csiszár and Longo [36]
described how (3.3) and (3.4)
can be used together with an earlier result by Strassen,
[37, Thm. 1.1],
to characterize the exact
asymptotic behavior of $P_e^{II}(n)$
for the case when
$P_e^{I}(n)=e^{-D_1(w_\alpha^q\|q)}$,
i.e.,
$P_e^{II}(n)=(K+o(1))n^{-\frac{1}{2\alpha}}e^{-D_1(w_\alpha^q\|w)}$,
with some minor inaccuracies discussed in Remark 3.7.
One does not need to rely on [37, Thm. 1.1] of Strassen
to characterize this exact asymptotic behavior.
The Berry-Esseen theorem, however, is not sufficient
for determining the value of the constant K.
In order to determine the constant K, one needs either to invoke
finer characterizations of the asymptotic behavior
of sums of independent random
variables, such as the ones in [38, §IV.2, §IV.3] and
[39, §42, §43],
or to apply other techniques,
such as the saddle point approximation described in [30, Prop. 2.3.1].
It is worth noting that both of these approaches
require hypotheses stronger than that of the Berry-Esseen theorem.
The situation is similar for other values of α,
but of no interest for our discussion of the sphere
packing bound.
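The role of the order $\alpha$ as a parameter of the exponent trade-off can be visualized numerically for the stationary case. The sketch below assumes finite alphabets with full support, in which case the tilted measure of a product is the product of the tilted measures and the Kullback-Leibler divergences add up over the $n$ samples; the distributions `w1` and `q1` and the helper names are hypothetical.

```python
from math import log


def kl(v, w):
    """Kullback-Leibler divergence D_1(v||w) in nats."""
    return sum(a * log(a / b) for a, b in zip(v, w) if a > 0)


def tilted(w, q, alpha):
    """Order-alpha tilted probability measure, proportional to w^alpha * q^(1-alpha)."""
    weights = [a**alpha * b**(1 - alpha) for a, b in zip(w, q)]
    z = sum(weights)
    return [x / z for x in weights]


w1 = [0.7, 0.2, 0.1]   # per-sample distribution under the first hypothesis
q1 = [0.2, 0.3, 0.5]   # per-sample distribution under the second hypothesis
n = 25                 # number of i.i.d. samples


def exponents(alpha):
    """The pair (D_1(tilted||q), D_1(tilted||w)) for the n-fold products."""
    v = tilted(w1, q1, alpha)
    return n * kl(v, q1), n * kl(v, w1)


# Sweeping alpha over (0,1) traces the whole exponent trade-off curve.
curve = [exponents(a / 10) for a in range(1, 10)]
```

As $\alpha$ increases the first exponent grows and the second one shrinks, so each order in $(0,1)$ selects one operating point on the trade-off curve; the polynomial prefactor $n^{-\frac{1}{2\alpha}}$ discussed above is not part of this sketch.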
Remark 3.7.
We believe the approach of [36]
is sound. Its calculations, however, seem to have some mistakes.
Repeating the calculations as described in [36], we recover
the second line of [36, (33)] as
ln1−α∗α∗−α∗lnS12π+o(1).
With this modification [36, Thm. 2] is consistent with
the intimately related results about the SPB proved earlier
[4, (1.32), (1.33)] and since then
[29, (38)].
4 Augustin’s Information Measure and The Sphere Packing Exponent
The ultimate aim of this section is to define the sphere packing exponent
and review the properties of it that will be useful in
our analysis.
For that, we first recall the definitions of Augustin’s information measures
and review their elementary properties.
4-A Augustin’s Information Measures
Let us start by recalling the definition of the conditional Rényi divergence.
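Definition 4.1.
For any $\alpha\in\mathbb{R}_{+}$, $W:\mathscr{X}\to\mathcal{P}(\mathcal{Y})$, $q\in\mathcal{P}(\mathcal{Y})$,
and $p\in\mathcal{P}(\mathscr{X})$, the order $\alpha$ conditional Rényi divergence for
the input distribution $p$ is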
[TABLE]
Definition 4.2.
For any $\alpha\in\mathbb{R}_{+}$, $W:\mathscr{X}\to\mathcal{P}(\mathcal{Y})$, and
$p\in\mathcal{P}(\mathscr{X})$, the order $\alpha$ Augustin information for the input distribution $p$
is
[TABLE]
The infimum is achieved by a unique probability
measure $q_{\alpha,p}$,
called
the order $\alpha$ Augustin mean for the input distribution $p$,
by [40, Lemma LABEL:C-lem:information].
(Footnote 7: We refrain from including the channel symbol $W$
in the symbol for the Augustin mean
because the channel will be clear from the context.)
Furthermore,
[TABLE]
for all q∈P(Y) by [40, Lemma LABEL:C-lem:information], as well.
The Augustin information is continuously differentiable in
its order on R+,
and its derivative is given by
[TABLE]
by [40, Lemma LABEL:C-lem:informationO-(LABEL:C-informationO:differentiability)],
where $W_\alpha^{q_{\alpha,p}}$ is the tilted channel defined as follows.
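Definition 4.3.
For any $\alpha\in\mathbb{R}_{+}$, $W:\mathscr{X}\to\mathcal{P}(\mathcal{Y})$, and $q\in\mathcal{P}(\mathcal{Y})$,
the order $\alpha$ tilted channel $W_\alpha^q$ is a function
from $\{x:D_\alpha(W(x)\|q)<\infty\}$ to $\mathcal{P}(\mathcal{Y})$
given by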
[TABLE]
The tilted channel can be used to express $I_\alpha(p;W)$
in terms of the Kullback-Leibler divergences
(it is worth noting that (4.3) follows from
(3.2) and $I_\alpha(p;W)=D_\alpha(W\|q_{\alpha,p}|p)$
for $\alpha$ values in $(0,1)$)
using
[40, Lemma LABEL:C-lem:information-(LABEL:C-information:alternative)]:
[TABLE]
Furthermore, the Augustin mean satisfies
[TABLE]
and the Augustin mean is the only probability measure
satisfying both $q_{1,p}\prec q$
and $\sum_x p(x)W_\alpha^q(x)=q$
by [40, Lemma LABEL:C-lem:information],
where $q_{1,p}=\sum_x p(x)W(x)$.
Thus for all α∈R+ we have
[TABLE]
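The fixed-point property of the Augustin mean suggests a simple numerical procedure for discrete channels: iterate the map $q\mapsto\sum_x p(x)W_\alpha^q(x)$ starting from $q_{1,p}$. The sketch below is a minimal illustration under stated assumptions: the conditional Rényi divergence is taken to be $\sum_x p(x)D_\alpha(W(x)\|q)$, the channel has positive entries, and the order is chosen as $\alpha=0.7$, for which the iteration is a contraction in the max-log-ratio metric for such channels; convergence for other orders is not established by this sketch. The channel, the input distribution, and all names are hypothetical.

```python
from math import log

W = {0: [0.7, 0.2, 0.1], 1: [0.1, 0.3, 0.6]}   # rows: output distributions W(x)
p = {0: 0.4, 1: 0.6}                           # input distribution
alpha = 0.7


def renyi(w, q, alpha):
    """Order-alpha Renyi divergence D_alpha(w||q) in nats, alpha in (0,1)."""
    return log(sum(a**alpha * b**(1 - alpha) for a, b in zip(w, q))) / (alpha - 1)


def tilted(w, q, alpha):
    """Order-alpha tilted probability measure between w and q."""
    weights = [a**alpha * b**(1 - alpha) for a, b in zip(w, q)]
    z = sum(weights)
    return [x / z for x in weights]


def cond_renyi(q, alpha):
    """Conditional Renyi divergence: sum_x p(x) D_alpha(W(x)||q)."""
    return sum(p[x] * renyi(W[x], q, alpha) for x in W)


def augustin_step(q, alpha):
    """One application of the map q -> sum_x p(x) W_alpha^q(x)."""
    out = [0.0] * len(q)
    for x in W:
        v = tilted(W[x], q, alpha)
        out = [o + p[x] * vi for o, vi in zip(out, v)]
    return out


q1 = [sum(p[x] * W[x][y] for x in W) for y in range(3)]   # output distribution q_{1,p}
q = q1
for _ in range(200):
    q = augustin_step(q, alpha)

# Distance from being a fixed point, and the resulting Augustin information.
residual = max(abs(a - b) for a, b in zip(q, augustin_step(q, alpha)))
info = cond_renyi(q, alpha)
```

Since the limit has full support and satisfies the fixed-point equation, it is the Augustin mean by the uniqueness statement above, so `info` cannot exceed `cond_renyi(q1, alpha)`.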
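Definition 4.4.
For any $\alpha\in\mathbb{R}_{+}$, $W:\mathscr{X}\to\mathcal{P}(\mathcal{Y})$, and $\mathcal{A}\subset\mathcal{P}(\mathscr{X})$,
the order $\alpha$ Augustin capacity of $W$ for the constraint set $\mathcal{A}$ is
[TABLE]
When the constraint set $\mathcal{A}$ is the whole $\mathcal{P}(\mathscr{X})$, we denote the order $\alpha$
Augustin capacity by $C_{\alpha,W}$, i.e.,
$C_{\alpha,W}\triangleq C_{\alpha,W,\mathcal{P}(\mathscr{X})}$.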
Using the definitions of the Augustin information and capacity,
we obtain the following expression for the latter
[TABLE]
If A is convex, then the order of the supremum and the infimum can be changed
as a result of [40, Thm. LABEL:C-thm:minimax]:
[TABLE]
If in addition Cα,W,A is finite, then
there exists a unique probability measure qα,W,A,
called the order α Augustin center of W for the constraint set A,
satisfying
We denote the set of all probability mass functions satisfying a cost constraint ϱ by A(ϱ), i.e.
[TABLE]
With a slight abuse of notation, we denote
the cost-constrained Augustin capacity
$C_{\alpha,W,\mathcal{A}(\varrho)}$
by
$C_{\alpha,W,\varrho}$,
as well.
A more detailed discussion of Augustin’s information measures
can be found in [40].
4-B The Sphere Packing Exponent
Definition 4.5.
For any $W:\mathscr{X}\to\mathcal{P}(\mathcal{Y})$, $\mathcal{A}\subset\mathcal{P}(\mathscr{X})$,
and $R\in\mathbb{R}_{+}$, the sphere packing exponent (SPE) is
[TABLE]
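When the constraint set $\mathcal{A}$ is the whole $\mathcal{P}(\mathscr{X})$,
we denote the SPE by $E_{sp}(R,W)$, i.e.,
$E_{sp}(R,W)\triangleq E_{sp}(R,W,\mathcal{P}(\mathscr{X}))$.
With a slight abuse of notation, we denote
the SPE for $\mathcal{A}(\varrho)$ by $E_{sp}(R,W,\varrho)$
and the SPE for the $\mathcal{A}=\{p\}$
case by $E_{sp}(R,W,p)$.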
Note that as a result of definitions of Augustin capacity and
the SPE we have
[TABLE]
Furthermore,
since Iα(p;W) is continuously differentiable in α
by [40, Lemma LABEL:C-lem:informationO-(LABEL:C-informationO:differentiability)],
we can apply the derivative test to find the optimal α in
(4.11) for A={p} case:
using
(4.1)
and
(4.3)
we get
[TABLE]
On the other hand, either
$D_1(W_\alpha^{q_{\alpha,p}}\|q_{\alpha,p}|p)=I_1(p;W)$
for all positive orders $\alpha$,
or $D_1(W_\alpha^{q_{\alpha,p}}\|q_{\alpha,p}|p)$
is increasing and continuous in the order $\alpha$ on $\mathbb{R}_{+}$
by
[40, Lemma LABEL:C-lem:informationO-(LABEL:C-informationO:monotonicityofharoutunianinformation)].
Furthermore,
$D_1(W_1^{q_{1,p}}\|q_{1,p}|p)$
is equal to $I_1(p;W)$
by definition and
$\lim_{\alpha\downarrow0}D_1(W_\alpha^{q_{\alpha,p}}\|q_{\alpha,p}|p)$
is equal to $\lim_{\alpha\downarrow0}I_\alpha(p;W)$
by (4.5) and
[40, Lemma LABEL:C-lem:informationO-(LABEL:C-informationO:limitofharoutunianinformation)].
Thus
for any rate $R$ in
$(\lim_{\alpha\downarrow0}I_\alpha(p;W),I_1(p;W))$
there exists an order $\alpha^*\in(0,1)$ satisfying
[TABLE]
by the intermediate value theorem [41, 4.23].
The order $\alpha^*$ satisfying (4.14)
is unique because $D_1(W_\alpha^{q_{\alpha,p}}\|q_{\alpha,p}|p)$
is increasing in $\alpha$.
The monotonicity of
$D_1(W_\alpha^{q_{\alpha,p}}\|q_{\alpha,p}|p)$ in $\alpha$
and (4.13) also imply
$E_{sp}(R,W,p)=\frac{1-\alpha^*}{\alpha^*}\left(I_{\alpha^*}(p;W)-R\right)$.
Thus as a result of (4.3),
the unique $\alpha^*$ satisfying (4.14) also satisfies
[TABLE]
Since $D_1(W_\alpha^{q_{\alpha,p}}\|q_{\alpha,p}|p)$
is continuous and increasing in $\alpha$,
its inverse is increasing and continuous, as well.
Thus the definition of the
SPE given in (4.11)
and the definition of the derivative as a limit
imply that for any $R$ in
$(\lim_{\alpha\downarrow0}I_\alpha(p;W),I_1(p;W))$
the unique order $\alpha^*$
satisfying (4.14)
also satisfies
[TABLE]
as was established in [22, Lemma LABEL:D-lem:spherepacking-cc].
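The relation between the optimal order and the slope of the sphere packing exponent can be checked numerically on a binary symmetric channel with the uniform input distribution. The sketch below assumes that the SPE for a fixed input distribution is the supremum of $\frac{1-\alpha}{\alpha}(I_\alpha(p;W)-R)$ over $\alpha\in(0,1)$, consistent with the closed-form expression above, and uses the fact that the Augustin mean is the uniform output distribution for this symmetric example. The crossover probability, the rate, and the function names are illustrative choices.

```python
from math import log

EPS = 0.1  # crossover probability of the binary symmetric channel (illustrative)


def augustin_info(alpha):
    """I_alpha(u;W) in nats for a BSC with uniform input u; by symmetry the
    Augustin mean is the uniform output distribution for every order."""
    return log(2) + log(EPS**alpha + (1 - EPS)**alpha) / (alpha - 1)


def objective(alpha, rate):
    """The quantity maximized over alpha in the sphere packing exponent."""
    return (1 - alpha) / alpha * (augustin_info(alpha) - rate)


def sphere_packing(rate, iters=200):
    """Return (E_sp, alpha*) via ternary search; the objective is unimodal in
    alpha because it is concave in rho = (1 - alpha) / alpha."""
    lo, hi = 1e-6, 1 - 1e-9
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if objective(m1, rate) < objective(m2, rate):
            lo = m1
        else:
            hi = m2
    a = (lo + hi) / 2
    return objective(a, rate), a


rate, h = 0.2, 1e-4   # rate in nats, below the capacity of about 0.368 nats
esp, alpha_star = sphere_packing(rate)
# Central difference approximation of the derivative of E_sp with respect to R.
slope = (sphere_packing(rate + h)[0] - sphere_packing(rate - h)[0]) / (2 * h)
```

In this example the maximizing order agrees with $\frac{1}{1-E_{sp}'(R)}$ up to the accuracy of the finite difference, which is the relation used to define $\alpha^*$ in the theorems of the next section.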
5 The Refined Sphere Packing Bound
In this section, we consider the channel coding problem
for various channel models
and derive lower bounds to the error probability of
the following form
[TABLE]
for constants $A$ and $n_0$ determined by the rate, the channel,
and the constraints on the codes, if there exist any.
Following [25, 26, 27],
we call these bounds refined sphere packing bounds (refined SPBs)
because of their resemblance to the standard SPBs,
e.g. [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
(whose approximation error terms
are usually $O(\sqrt{n})$
or $O(\ln n)$, rather than just $o(n)$), establishing
[TABLE]
The refined SPBs that we state and prove in this section are not formally
particular cases of a general proposition. Nevertheless, they are all consequences
of Lemma 3.4 and the properties of Augustin’s information measures.
We establish a refined SPB for
the constant composition codes in Subsection 5-A,
for codes on (possibly) non-stationary Rényi symmetric channels
in Subsection 5-B,
and
for codes on additive white Gaussian noise channels with
quadratic cost functions in Subsection 5-C.
5-A Constant Composition Codes
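Theorem 5.1.
For any $W:\mathscr{X}\to\mathcal{P}(\mathcal{Y})$,
$M,L,n\in\mathbb{Z}_{+}$,
$p\in\mathcal{P}(\mathscr{X})$ satisfying
$\lim_{\alpha\downarrow0}I_\alpha(p;W)<\frac{1}{n}\ln\frac{M}{L}<I_1(p;W)$
and
$np(x)\in\mathbb{Z}_{\geq0}$ for all $x\in\mathscr{X}$,
the order $\alpha^*\triangleq\frac{1}{1-E_{sp}'\left(\frac{1}{n}\ln\frac{M}{L},W,p\right)}$
satisfies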
[TABLE]
Furthermore, any (M,L) channel code of length n
whose codewords all have the same composition p satisfies
[TABLE]
provided that
$\sqrt{a_2 n}-\frac{\ln 4n}{2\alpha^*}\geq\ln\Delta$
where
[TABLE]
Although Theorem 5.1 itself is
composition-dependent,
it implies, via appropriate worst-case assumptions,
composition-independent bounds, such as
[26, Thm. 1].
Similar composition-dependent [27, Proposition 13]
and composition-independent [27, Thm. 8] bounds
have recently been derived for classical-quantum channels using
an approach similar to that of [26].
The primary advantages of Theorem 5.1
over the previous results are the conceptual simplicity and brevity
of its proof and its definite approximation error terms.
Proof of Theorem 5.1.
The existence of a unique order $\alpha^*$ satisfying
(5.3) was
proved, and its value was determined, in Section 4,
see (4.14),
(4.15),
and
(4.16).
Let probability measures wm, q, and vm
in P(Y1n) be
[TABLE]
Then $v_m$ is equal to the order $\alpha^*$
tilted probability measure between $w_m$ and $q$.
Furthermore, the empirical distributions of
all of the codewords,
i.e., of all $\Psi(m)$’s, are equal to $p$
by the hypothesis; thus we have
[TABLE]
On the other hand, $\sum_{m\in\mathcal{M}}q(m\in\Theta)\leq L$
by the definition of list decoding.
Thus
at least half of the messages in $\mathcal{M}$
(at least $\lfloor\frac{M+1}{2}\rfloor$ of the messages in $\mathcal{M}$, to be precise)
will satisfy $q(m\in\Theta)\leq\frac{2L}{M}$
as a result of Markov’s inequality.
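The counting step behind this use of Markov’s inequality can be illustrated with a short sketch. It assumes only that the values $q(m\in\Theta)$ are nonnegative and sum to at most $L$; the randomly generated values below are hypothetical stand-ins for those probabilities, and the function name is illustrative.

```python
import random


def good_messages(list_probs, list_size):
    """Indices m whose value q(m in Theta) is at most 2L/M."""
    M = len(list_probs)
    return [m for m, pr in enumerate(list_probs) if pr <= 2 * list_size / M]


random.seed(0)
M, L = 11, 2
raw = [random.random() for _ in range(M)]
# Scale the values so that they sum to L, the largest total allowed by list decoding.
probs = [L * r / sum(raw) for r in raw]
good = good_messages(probs, L)
```

If more than half of the values exceeded $2L/M$, their sum alone would exceed $L$; hence `len(good)` is at least `(M + 1) // 2` for any admissible choice of values.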
Applying Lemma 3.4
with $E=\{y_1^n:m\in\Theta(y_1^n)\}$
and $\beta=2$ for the messages satisfying
$q(m\in\Theta)\leq\frac{2L}{M}$, we get
[TABLE]
as long as
$\sqrt{a_2 n}-\frac{\ln 4n}{2\alpha^*}\geq\ln\Delta$.
Then (5.4)
follows from
(4.14),
(4.15),
(5.3),
and the definition of the error probability as the
average of the conditional error probabilities
of the messages.
∎
5-B Codes On Rényi Symmetric Channels
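Definition 5.2.
A channel $W:\mathscr{X}\to\mathcal{P}(\mathcal{Y})$ satisfying
$W\prec\nu$ for some $\nu\in\mathcal{P}(\mathcal{Y})$
is Rényi symmetric iff
for each $\alpha\in\mathbb{R}_{+}$ with finite $C_{\alpha,W}$
there exists a function $G_\alpha^W(\cdot):\mathbb{R}\to[0,1]$
satisfying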
[TABLE]
Remark 5.3.
If $W$ is Rényi symmetric, then
the identity
$\lim_{s\downarrow-\infty}G_\alpha^W(s)=0$
holds
whenever $C_{\alpha,W}$ is finite.
On the other hand, the identity
$\lim_{s\uparrow\infty}G_\alpha^W(s)=1$
is violated
whenever $W(x)\nprec q_{\alpha,W}$,
which can only happen for $\alpha$’s in $(0,1)$.
Such a Rényi symmetric W is obtained
by removing w, from W described
in [42, Example LABEL:A-eg:singular-countable]
and the resulting $G_\alpha^W$ is given by
$G_\alpha^W(s)=(1/2)\mathbb{1}\{s\geq\ln(1/2)\}$.
The Rényi symmetry holds
for all input symmetric channels described in [43, Definition 3.2]
and
for all the Gallager symmetric channels described in [11, p. 94],
see Appendix A.
Recall that the Gallager symmetry holds for all
strongly symmetric (Dobrushin symmetric) channels, which are described in [4].
The binary symmetric channel is strongly symmetric.
The binary erasure channel is Gallager symmetric but not strongly symmetric.
The binary input Gaussian channel is input symmetric according to [43, Definition 3.2]
but not Gallager symmetric.
The Rayleigh fading channel with per coherence interval
power constraint analyzed in [44, (3)] is
Rényi symmetric, by [44, (7) and (10)],
but not input symmetric, see [44, (5)].
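For finite channels, the two matrix-based symmetry notions mentioned in this remark can be checked mechanically. The sketch below assumes the usual textbook definitions: a channel is strongly symmetric when the rows of its transition matrix are permutations of each other and so are its columns, and Gallager symmetric when the outputs can be partitioned into blocks whose sub-matrices each have that property. These are assumed to match the definitions in [4] and [11, p. 94]; the partition is supplied by the caller rather than searched for, and all names are hypothetical.

```python
def same_multiset(vectors):
    """True iff all vectors are permutations of each other."""
    return len({tuple(sorted(v)) for v in vectors}) == 1


def is_strongly_symmetric(matrix):
    """Rows are permutations of each other, and so are the columns."""
    columns = list(zip(*matrix))
    return same_multiset(matrix) and same_multiset(columns)


def is_gallager_symmetric(matrix, partition):
    """Check the symmetry condition for a given partition of the output
    indices: every sub-matrix must have permutation-equivalent rows and columns."""
    for block in partition:
        sub = [[row[y] for y in block] for row in matrix]
        if not is_strongly_symmetric(sub):
            return False
    return True


eps, delta = 0.1, 0.2
bsc = [[1 - eps, eps], [eps, 1 - eps]]
bec = [[1 - delta, delta, 0.0], [0.0, delta, 1 - delta]]   # outputs: 0, erasure, 1
```

With the output partition `[[0, 2], [1]]`, which separates the erasure from the two non-erased outputs, the binary erasure channel passes the Gallager symmetry check even though it fails the strong symmetry check.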
Remark 5.4.
The input symmetry described in [43, Definition 3.2] can be generalized
by relying on a compact group with the associated Haar measure,
rather than a finite additive group with the uniform distribution.
The Rayleigh fading channel with per coherence interval
power constraint analyzed in [44]
is input symmetric for this more general definition.
The covariant channels analyzed by Holevo in [45],
can be seen as the counterparts of [43, Definition 3.2]
and its generalization in the framework of Quantum Information Theory.
The derivation of the refined SPB for the Rényi symmetric channels
is analogous to the derivation of the refined SPB for the constant composition codes.
Lemma 5.5, given in the following,
is used in lieu of
(4.14),
(4.15),
(4.16)
in the latter derivation.
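Lemma 5.5.
For any Rényi symmetric channel $W:\mathscr{X}\to\mathcal{P}(\mathcal{Y})$
with finite $C_{1,W}$
and rate $R$ in $(\lim_{\alpha\downarrow0}C_{\alpha,W},C_{1,W})$
there exists an order $\alpha^*\in(0,1)$ such that
[TABLE]
Furthermore, if either
$W_{\alpha^*}^{q_{\alpha^*,W}}\left(\left\{\frac{dW(x)}{d\nu}=\gamma\frac{dq_{\alpha^*,W}}{d\nu}\right\}\middle|x\right)<1$
for all $\gamma\in\mathbb{R}_{+}$
or $q_{\alpha,W}=q$ for all $\alpha\in(0,1]$, then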
[TABLE]
Lemma 5.5 is proved in Appendix B.
Proving the essential assertions of Lemma 5.5
for input symmetric channels, however, is considerably easier:
for any input symmetric channel $W$
and the uniform probability mass function $u$
on its input set,
$C_{\alpha,W}=I_\alpha(u;W)$
and $q_{\alpha,W}=q_{\alpha,u}$
for all $\alpha\in\mathbb{R}_{+}$.
Consequently,
the identities given in
(5.6),
(5.7),
and
(5.8)
are nothing but
the identities given in
(4.14),
(4.15),
and
(4.16)
for p=u case
because the Kullback-Leibler divergences on the right-hand-sides of
(5.6)
and
(5.7)
have the same value for all x
by the symmetry.
Hence, (5.8) holds
for any input symmetric channel
satisfying
$\lim_{\alpha\downarrow0}C_{\alpha,W}<C_{1,W}$
by (4.16), as well.
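Remark 5.6.
If $\frac{dW(x)}{dq_{\alpha,u}}=\gamma$ holds $W(x)$-a.s. for all $x$
for a $(\gamma,\alpha)$ pair for an input symmetric $W$,
then $q_{\alpha,u}=\sum_x u(x)W_\eta^{q_{\alpha,u}}(x)$
for all $\eta$.
Thus $q_{\eta,u}=q_{\alpha,u}$
and
$I_\eta(u;W)=\ln\gamma$
for all $\eta\in\mathbb{R}_{+}$
by [40, Lemma LABEL:C-lem:information].
Thus such a $(\gamma,\alpha)$ pair does not exist
for input symmetric $W$’s satisfying
$\lim_{\alpha\downarrow0}C_{\alpha,W}\neq C_{1,W}$.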
Remark 5.7.
Lemma 5.5 is stated under the finite C1,W hypothesis,
yet it holds under the weaker hypothesis limα↑α1−αCα,W, as well.
However, establishing Lemma 5.5 under this weaker hypothesis would require
us to introduce the concepts of power mean, Rényi information, and compactness in the topology of setwise convergence,
see [42, Lemma LABEL:A-lem:capacityEXT-(LABEL:A-capacityEXT-compact-N)].
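Theorem 5.8.
Let $W_t:\mathscr{X}_t\to\mathcal{P}(\mathcal{Y}_t)$ be
a Rényi symmetric channel with finite $C_{1,W_t}$
for all $t\in\mathbb{Z}_{+}$
and $n,M,L\in\mathbb{Z}_{+}$ satisfy
$\lim_{\alpha\downarrow0}C_{\alpha,W_{[1,n]}}<\ln\frac{M}{L}<C_{1,W_{[1,n]}}$.
Then there exists an order $\alpha^*\in(0,1)$ satisfying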
[TABLE]
where $U_\alpha\triangleq\{W_{[1,n]}\}_\alpha^{q_{\alpha,W_{[1,n]}}}$
for all $\alpha\in(0,1)$.
Furthermore, any $(M,L)$ channel code
on $W_{[1,n]}$ satisfies
[TABLE]
provided that $\sqrt{a_2 n}-\frac{\ln 4n}{2\alpha^*}\geq\ln\Delta$ where
[TABLE]
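and $\xi_\alpha^t(x_t)=\ln\frac{dW_t(x_t)}{d\nu_t}-\ln\frac{dq_{\alpha,W_t}}{d\nu_t}$
for all $\alpha\in(0,1)$.
Furthermore, if
$\{W_t\}_{\alpha^*}^{q_{\alpha^*,W_t}}\left(\left\{\frac{dW_t(x_t)}{d\nu_t}=\gamma\frac{dq_{\alpha^*,W_t}}{d\nu_t}\right\}\middle|x_t\right)<1$
for all $\gamma\in\mathbb{R}_{+}$ for some $t$
or $q_{\alpha,W_{[1,n]}}=q$ for all $\alpha\in(0,1]$,
then $\alpha^*=\frac{1}{1-E_{sp}'\left(\ln\frac{M}{L},W_{[1,n]}\right)}$.
Note that if any of the component channels, i.e., any of the $W_t$’s, is an input symmetric
channel satisfying
$\lim_{\alpha\downarrow0}C_{\alpha,W_t}<C_{1,W_t}$,
then
$\alpha^*=\frac{1}{1-E_{sp}'\left(\ln\frac{M}{L},W_{[1,n]}\right)}$
holds as a result of Remark 5.6.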
Theorem 5.8 does not assume the channel
to be stationary,
i.e., it holds even when Wt’s are not identical.
To the best of our knowledge, refined sphere packing bounds
have only been reported for stationary channels before
—even in the case of symmetric channels considered in
[4, (1.28)],
[25, Thm. 1], [27, Thm. 14],
[29, Corollary 2],
[31, Thm. 4],
[44, (36), (37b)],
[46, Thm. 1].
For the stationary input symmetric channels Theorem 5.8
is tight both in terms of exponent and prefactor for
rates above the critical rate, provided that
the channel is not singular.
For the case of the singular stationary input symmetric channels,
Altuğ and Wagner [46] have recently reported a sharper result,
which generalizes Elias’s result in [2]
for the binary erasure channels.
In order to obtain such results, however,
merely plugging in bounds on binary hypothesis testing
is not enough, see Section 6 for a more detailed discussion.
Remark 5.9.
Theorem 5.8 is derived using Lemma 3.4,
which is stated for the product measures.
Lemma 3.4, however, holds for any w and q
for which ξt’s
are independent random variables under wαq.
This condition is satisfied by
the output distributions and
the Augustin centers on the product channels with feedback,
i.e., by $W_{[1,n]}(x)$ and $q_{\alpha,W_{[1,n]}}$,
provided that
the component channels are Rényi symmetric.
Thus Theorem 5.8 holds not just for codes on
the product channel W[1,n]
but also for codes on the product channels with feedback
W[1,n].
Similar observations have been used to establish
the SPB on product channels with feedback in
[47, 48, 18, 19, 46].
The formal definition of the product channels with feedback
and the proof of the SPB on these channels without the symmetry assumptions
can be found in [18, 19, 20].
On the other hand, [40, Lemma LABEL:C-lem:capacityproduct] implies
[TABLE]
Then the product structure
W[1,n](x1n)=⊗t=1nWt(xt)
and the Rényi symmetry of Wt’s
imply the Rényi symmetry
of W[1,n]. In particular
[TABLE]
where ⊛ denotes the convolution and
g’s are the density functions of the corresponding G’s,
i.e.,
g’s and G’s are uniquely determined by each other via
the following relation
[TABLE]
Since W[1,n] is Rényi symmetric the existence of
an order α∗∈(0,1) satisfying (5.9)
follows from (5.6)
of Lemma 5.5.
Let probability measures wm, q, and vm
in P(Y1n) be
[TABLE]
Note that wm=W[1,n](Ψ(m)) by definition,
q=qα∗,W[1,n]
by (5.13),
and
vm is equal to the order α∗
tilted probability measure between wm and q,
which is equal to Uα∗(Ψ(m)),
by construction.
Then Lemma 5.5
implies
[TABLE]
On the other hand, $\sum_{m\in\mathcal{M}}q(m\in\Theta)\leq L$
by the definition of list decoding.
Thus
at least half of the messages in $\mathcal{M}$
(at least $\lfloor\frac{M+1}{2}\rfloor$ of them, to be precise)
will satisfy $q(m\in\Theta)\leq\frac{2L}{M}$
as a result of Markov's inequality.
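The counting step behind this use of Markov's inequality can be checked on a toy example. The sketch below is only an illustration under stated assumptions: q is taken to be uniform on a hypothetical finite output set, the decoder lists exactly L distinct messages for every output, and the names `list_probabilities` and `count_light_messages` are ours, not the paper's.

```python
import random

def list_probabilities(M, L, num_outputs, seed=0):
    """Toy model (an assumption for illustration): q is uniform on
    num_outputs outputs and the decoder lists exactly L distinct
    messages for each output. Returns q(m in Theta) for every m."""
    rng = random.Random(seed)
    counts = [0] * M
    for _ in range(num_outputs):
        for m in rng.sample(range(M), L):
            counts[m] += 1
    return [c / num_outputs for c in counts]

def count_light_messages(probs, M, L):
    """Number of messages m with q(m in Theta) <= 2L/M."""
    threshold = 2 * L / M
    return sum(1 for p in probs if p <= threshold)

M, L = 11, 3
probs = list_probabilities(M, L, num_outputs=200)
light = count_light_messages(probs, M, L)
```

Since the probabilities sum to L, fewer than M/2 of them can exceed 2L/M, which is why `light` is at least `(M + 1) // 2` for any decoder in this toy model.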
Applying Lemma 3.4
with $E=\{y_1^n:m\in\Theta(y_1^n)\}$
and $\beta=2$ for the messages satisfying
$q(m\in\Theta)\leq\frac{2L}{M}$, we get
[TABLE]
provided that a2n−2α∗ln4n≥lnΔ.
Then (5.10) follows from
the definition of $P_e$ as the
average of the $P_e^m$'s.
Note that as a result of (5.14),
gα∗W[1,n] is a Dirac delta function
iff all gα∗Wt’s are.
This observation together with Lemma 5.5
implies the sufficient condition for $\alpha^{*}$ to be
$\frac{1}{1-E_{sp}'(\ln\frac{M}{L},W_{[1,n]})}$.
∎
5-C Codes On Additive White Gaussian Noise Channels
The additive white Gaussian noise channel with noise variance σ2
is described via the following transition probability
[TABLE]
where φσ2 is the zero-mean Gaussian probability density function of variance σ2, i.e.,
[TABLE]
With a slight abuse of notation,
we denote the corresponding probability measure
—i.e., zero-mean Gaussian probability measure of variance σ2—
by φσ2, as well.
If the cost function is the quadratic one,
then the zero-mean Gaussian distribution is the maximizer for
the Augustin information among all input distributions
satisfying the cost constraint
—for any positive order—
see [40, Example LABEL:C-eg:SGauss], i.e.
[TABLE]
Furthermore, the order α Augustin center
of this channel is a zero-mean Gaussian probability measure.
The closed-form expression for
the Augustin capacity and Augustin center
are derived in
[40, (LABEL:C-eq:eg:SGauss-capacity), (LABEL:C-eq:eg:SGauss-center), (LABEL:C-eq:eg:SGauss-center-variance)]:
[TABLE]
One can confirm the following identity by substitution
[TABLE]
In fact, θα is a root of the equality given above
because of a fixed point property similar to the one described in
(4.4),
see [40, (LABEL:C-eq:eg:SGauss-Augustinoperator), (LABEL:C-eq:eg:SGauss-necessarycondition)]
and the ensuing discussion.
The sphere packing exponent expression resulting from
(5.16),
(5.17),
(5.18),
and (5.19),
is derived in [22, Example LABEL:D-eg:SGauss].
It is given by the following parametric form
in [22, (LABEL:D-eq:eg:SGauss-parametric-rate), (LABEL:D-eq:eg:SGauss-parametric-spe)]:
[TABLE]
Using (5.18) and (5.20),
one can express the unique α∗ whose rate is R as a function of R, as well:
[TABLE]
The equivalence of the parametric form given in
(5.20) and (5.21)
to the expression given by Gallager [11, (7.4.33)]
can be confirmed by substitution using (5.22).
One can also confirm using
(5.18) and (5.19)
in
(5.20) and (5.21) that
[TABLE]
Furthermore, codes on additive white Gaussian noise channels satisfy
both (1.1) and (5.1)
as a result of Shannon’s [3, (3)];
this is established in Appendix C,
for completeness.
Theorem 5.10, which establishes the refined SPB given in
(5.1), is proved using principles and analysis similar
to those used in the proofs of Theorems 5.1 and 5.8,
which are quite different from the ones employed in [3].
Theorem 5.10.
Let $\sigma$ and $\varrho$ be positive reals,
$n$, $M$, and $L$ be positive integers satisfying
$R\in\left(0,\frac{1}{2}\ln\frac{\sigma^2+\varrho}{\sigma^2}\right)$
for $R=\frac{1}{n}\ln\frac{M}{L}$,
and $W_t$ be an additive white Gaussian noise channel with
noise variance $\sigma^2$, say $W$, for all $t$.
Then any $(M,L)$ channel code $(\Psi,\Theta)$
on $W_{[1,n]}$ satisfying
$\sum_{t=1}^{n}(\Psi_t(m))^2=n\varrho$
for all messages $m$ in the message set $\mathcal{M}$
satisfies
[TABLE]
for α∗ given in (5.22)
provided that
a2n−2α∗lnn≥Δ,
where
[TABLE]
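The rate hypothesis of Theorem 5.10 is easy to test numerically before invoking the bound. The following is a minimal sketch; the function names `awgn_capacity` and `rate_hypothesis_holds` are hypothetical, and the constants a2, a3, and Δ of the theorem are not computed because their expressions are not reproduced here.

```python
import math

def awgn_capacity(noise_var, cost):
    """Capacity in nats per channel use: 0.5 * ln((noise_var + cost) / noise_var)."""
    return 0.5 * math.log((noise_var + cost) / noise_var)

def rate_hypothesis_holds(n, M, L, sigma, cost):
    """Check 0 < R < 0.5 * ln((sigma^2 + cost) / sigma^2) for R = ln(M/L) / n."""
    rate = math.log(M / L) / n
    return 0.0 < rate < awgn_capacity(sigma ** 2, cost)

# Block length 100, unit noise variance, unit cost: capacity is 0.5*ln(2) nats.
below_capacity = rate_hypothesis_holds(100, 2 ** 40, 1, 1.0, 1.0)
above_capacity = rate_hypothesis_holds(100, 2 ** 60, 1, 1.0, 1.0)
```

Here 40 bits over 100 channel uses is below the capacity of half a bit per use, whereas 60 bits is above it, so only the first call satisfies the hypothesis.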
Before proving Theorem 5.10, let us briefly
discuss its implications.
Theorem 5.10 bounds the performance of codes satisfying
an equality cost constraint, but it can also be used to bound the performance
of codes satisfying an inequality cost constraint.
In particular, Shannon has observed in [3, (83)] that
[TABLE]
where Pe(n,ϱ) is the infimum of
error probabilities of (M,L) channel codes
satisfying ∑t=1n(Ψt(m))2≤nϱ
and Pe(n,ϱ) is the analogous quantity for
the constraint ∑t=1n(Ψt(m))2=nϱ.
The first inequality of (5.25) holds
because any code satisfying
the equality constraint also satisfies the inequality constraint.
The second inequality of (5.25)
is confirmed by considering
an extension of codewords by one additional symbol, $\Psi_{n+1}(m)$,
so as to satisfy
$\sum_{t=1}^{n+1}(\Psi_t(m))^2=(n+1)\varrho$.
Recently, Vazquez-Vilar has improved (5.25)
in [49, Proposition 1]
by observing that the same extension can be constructed for the constraint
$\sum_{t=1}^{n+1}(\Psi_t(m))^2=n\varrho$, as well.
Thus, we have
[TABLE]
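The codeword extension used in both arguments can be sketched in a few lines. This is only an illustration: `extend_codeword` is a hypothetical helper, and the example codeword is arbitrary.

```python
import math

def extend_codeword(codeword, target_energy):
    """Append one symbol so that the sum of squares equals target_energy.
    Requires the current energy not to exceed target_energy."""
    energy = sum(x * x for x in codeword)
    if energy > target_energy:
        raise ValueError("codeword energy exceeds the target energy")
    return list(codeword) + [math.sqrt(target_energy - energy)]

n, cost = 4, 2.0
codeword = [1.0, 1.0, 1.0, 1.0]      # energy 4 <= n * cost = 8

# Extension to energy (n + 1) * cost, as in the argument for (5.25).
ext_a = extend_codeword(codeword, (n + 1) * cost)
# Extension to energy n * cost, as in the observation of [49, Proposition 1].
ext_b = extend_codeword(codeword, n * cost)
```

In both cases the first n symbols are unchanged, so a decoder that ignores the last output symbol performs exactly as the original decoder does.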
One can use Theorem 5.10 together with either (5.25) or (5.26)
to determine the prefactor for codes satisfying the cost constraint with an inequality.
For that let us first note that Esp(R,W,ϱ) is convex in the rate R
as a result of (5.22) and (5.23)
because α∗ is increasing monotonically with R on [0,C1,W,ϱ].
Then Esp(R,W,ϱ) lies above its tangent at any point and
(5.23) implies
[TABLE]
Applying Theorem 5.10 at (n+1) —rather than n—
together with (5.25) and
invoking (5.27)
for $R_0=\frac{1}{n+1}\ln\frac{M}{L}$ and $R_1=R$,
we get the following bound
for (M,L) channel codes (Ψ,Θ) satisfying
∑t=1n(Ψt(m))2≤nϱ
[TABLE]
where α0∗=α∗(R0) and a2, a3, and Δ are
calculated at α0∗, rather than α∗,
provided that
a2(n+1)−2α0∗ln(n+1)≥Δ.
Note that
$|R-R_0|=\frac{R}{n+1}$
and the function
$\alpha^{*}$ given in (5.22)
is analytic in the rate $R$;
hence $|\alpha_0^{*}-\alpha^{*}|$ is $O(n^{-1})$.
Thus
[TABLE]
for some constant A∈R+ and
(5.1),
holds not only for codes satisfying
∑t=1n(Ψt(m))2=nϱ
but also for codes satisfying
∑t=1n(Ψt(m))2≤nϱ,
for the order α∗ given in (5.22),
for some constant A∈R+ for n large enough.
The following expressions for the order one Rényi divergence and
the order α tilted channel for the zero-mean Gaussian distribution
of variance θ can be confirmed by substitution
[TABLE]
for all
x∈R,
θ∈R+,
and E∈B(R).
Then as a result of
(5.19),
(5.20),
(5.21),
and
(5.22),
we have
[TABLE]
where $R=\frac{1}{n}\ln\frac{M}{L}$.
Let probability measures wm, q, and vm
in P(Y1n) be
[TABLE]
Then vm is the order α∗
tilted probability measure between wm and q;
using the hypothesis
∑t=1n(Ψt(m))2=nϱ
together with (5.30) and (5.31),
we get
[TABLE]
In order to obtain (5.24), we will apply Lemma 3.4
to the probability measure pairs $(w_m,q)$ satisfying
$q(m\in\Theta)\leq\frac{2L}{M}$ for $\beta=2$
together with a rotation on Rn
that minimizes the approximation error terms arising from the absolute
third order moments.
On the other hand, moments and absolute moments of a zero-mean Gaussian random variable Z
with variance σ2 satisfy the following identities
[TABLE]
where $\kappa!!=\prod_{j=0}^{\lceil\kappa/2\rceil-1}(\kappa-2j)$.
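The double factorial and its link to Gaussian moments can be checked numerically. The sketch below uses the standard closed form $\mathbf{E}[|Z|^{\kappa}]=\sigma^{\kappa}2^{\kappa/2}\Gamma(\frac{\kappa+1}{2})/\sqrt{\pi}$, which reduces to $\sigma^{\kappa}(\kappa-1)!!$ for even $\kappa$; this is a textbook identity and may be stated in a different form in (5.34). The function names are ours.

```python
import math

def double_factorial(k):
    """k!! = prod_{j=0}^{ceil(k/2)-1} (k - 2j); the empty product (k <= 0) is 1."""
    result = 1
    for j in range(math.ceil(k / 2)):
        result *= k - 2 * j
    return result

def gaussian_abs_moment(k, sigma):
    """E|Z|^k for a zero-mean Gaussian Z with standard deviation sigma."""
    return sigma ** k * 2 ** (k / 2) * math.gamma((k + 1) / 2) / math.sqrt(math.pi)

sigma = 1.5
# For even k the absolute moment is sigma^k * (k - 1)!!.
even_gaps = [abs(gaussian_abs_moment(k, sigma) - sigma ** k * double_factorial(k - 1))
             for k in (2, 4, 6)]
# The absolute third moment, needed for Berry-Esseen type error terms.
third_abs = gaussian_abs_moment(3, sigma)
```

For the third absolute moment the closed form gives $2\sqrt{2/\pi}\,\sigma^{3}$.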
Furthermore, for any three random variables
$Z_1$, $Z_2$, and $Z_3$,
we have the inequality below
(footnote 10: the inequality given in (5.35)
follows from Hölder's inequality via the observation that
the geometric mean is less than or equal to the arithmetic mean;
a proof is presented in Appendix D
for completeness).
[TABLE]
Then using (5.34), one can confirm by substitution that
[TABLE]
The hypothesis ∑t=1n(Ψt(m))2=nϱ
implies the a2 of Lemma 3.4 to be equal to
the a2 of Theorem 5.10.
One, however, cannot assert the analogous relation for a3’s.
Nevertheless, there exists a rotation in $\mathbb{R}^n$, say $S_m$,
such that $S_m\Psi(m)$ is equal to the vector all of whose entries are $\sqrt{\varrho}$,
i.e.
[TABLE]
Note that for $w^{*}\triangleq\bigotimes_{t=1}^{n}W(\sqrt{\varrho})$,
we have
[TABLE]
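One explicit choice of such a rotation can be written down and verified numerically. The sketch below is an assumption made for illustration, not the construction used in the paper: for unit vectors u and v with u ≠ −v it applies $I-\frac{(u+v)(u+v)^{T}}{1+u^{T}v}+2vu^{T}$, which is a product of two reflections and hence a rotation mapping u to v. The name `rotate_onto` is hypothetical.

```python
import math

def rotate_onto(source, target, x):
    """Apply to x the rotation that maps source/|source| to target/|target|.
    Uses R = I - (u+v)(u+v)^T / (1 + u.v) + 2 v u^T for unit vectors u, v;
    undefined when u = -v."""
    norm_s = math.sqrt(sum(a * a for a in source))
    norm_t = math.sqrt(sum(a * a for a in target))
    u = [a / norm_s for a in source]
    v = [a / norm_t for a in target]
    c = sum(a * b for a, b in zip(u, v))
    if 1.0 + c < 1e-12:
        raise ValueError("source and target are antipodal")
    w = [a + b for a, b in zip(u, v)]
    wx = sum(a * b for a, b in zip(w, x))
    ux = sum(a * b for a, b in zip(u, x))
    return [xi - wx / (1.0 + c) * wi + 2.0 * ux * vi
            for xi, wi, vi in zip(x, w, v)]

n, cost = 4, 2.0
codeword = [2.0, 0.0, -2.0, 0.0]          # sum of squares is n * cost = 8
flat = [math.sqrt(cost)] * n               # every entry equals sqrt(cost)
rotated = rotate_onto(codeword, flat, codeword)
other = rotate_onto(codeword, flat, [1.0, 2.0, 3.0, 4.0])
```

Because the map is orthogonal, it preserves the Euclidean norm of every vector, which is what keeps the distribution of white Gaussian noise unchanged under the rotation.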
Thus one can apply Lemma 3.4 to the pair (w∗,q) in
order to bound Pem. For the pair (w∗,q),
however, a3 of Lemma 3.4
is equal to the a3 of Theorem 5.10.
As it was the case for the proofs of Theorems 5.1 and
5.8, ∑m∈Mq(m∈Θ)≤L
by the definition of list decoding.
Thus at least half of the messages in $\mathcal{M}$
will satisfy $q(m\in\Theta)\leq\frac{2L}{M}$
as a result of Markov's inequality.
Applying Lemma 3.4
with $E=\{y_1^n:m\in\Theta(y_1^n)\}$
and $\beta=2$ for the messages satisfying
$q(m\in\Theta)\leq\frac{2L}{M}$
and using (5.32) and (5.33), we get
[TABLE]
as long as
a2n−2α∗ln4n≥lnΔ.
Then (5.24)
follows from the definition of Pe as the
average of the conditional error probabilities.
∎
6 Discussion
Theorems 5.1, 5.8, 5.10
establish refined sphere packing bounds, i.e., bounds of the form (5.1),
for fixed composition codes on stationary memoryless channels,
codes on (possibly) non-stationary Rényi symmetric channels,
and cost-constrained codes on additive white Gaussian noise channels with the quadratic cost function.
Derivations of Theorems 5.1, 5.8, 5.10
rely on the properties of Augustin’s information measures
and the application of the Berry-Esseen theorem to the hypothesis testing problem
summarized in Lemma 3.4.
For certain cases including
the additive white Gaussian channels [3]
and
the strongly symmetric channels [4],
these bounds are known to be tight in the sense that
they can be matched by achievability results asserting
the existence of codes satisfying
[TABLE]
for rates between the critical rate and the capacity of the channel.
Recently, Altuğ and Wagner [46, Theorem 1]
have generalized the results of
[2] and [4] and
established (5.1) and (6.1) for all
non-singular Gallager symmetric channels.
At least since [2], it is also known
that for the binary erasure channel
the polynomial prefactor of (5.1) can be improved
from $n^{(E_{sp}'(R)-1)/2}$ to
$n^{-1/2}$.
Recently, Altuğ and Wagner proved
this result for all singular Gallager symmetric channels,
[46, Theorem 2].
Both [2] and [46],
however, have refrained from relying on bounds on
the performance of the binary hypothesis testing problem
with independent samples.
This is not surprising because
Lemma 3.4 characterizes the
prefactor for the binary hypothesis testing problem
with independent samples, exactly.
Thus the refined SPBs of the form (5.1)
are the best possible bounds
for derivations of the SPB relying on the asymptotic
behavior of sums of independent random variables,
notwithstanding their suboptimality for
singular Gallager symmetric channels.
As pointed out in Section 3,
one can improve Lemma 3.4 and determine not only
the prefactor but also the asymptotic constant in
the tradeoff between the probabilities of type I and type II errors,
either by invoking finer characterizations of the asymptotic behavior
of sums of independent random
variables or by applying a saddle point approximation.
Although such results, e.g. [36, 29],
require stronger hypotheses and are more nuanced
(footnote 11: even the statements of these results are more nuanced
because they need to distinguish the lattice and non-lattice cases
for the random variables involved in the analysis),
they are important in the context of binary hypothesis testing.
From the standpoint of the channel coding problem, however,
it is rather hard to justify the extra effort such an analysis requires.
First of all, the corresponding achievability results will have different
constants, even when the prefactors match,
as observed in [2, 3, 4].
More importantly, such refined results on binary hypothesis testing
will also suffer from the subtlety discussed in the previous
paragraph for the case of the singular channels,
i.e., their prefactor will be suboptimal for the singular
channels, such as the binary erasure channel.
The principal novelty of this manuscript is the use of
the Berry-Esseen theorem via suitable Augustin information measures
to bound the optimal error probability in the channel coding problem.
In this manuscript, our primary focus was the rates below
the channel capacity;
thus, we have derived refined sphere packing bounds.
The same idea can be used to strengthen the strong converse bounds
under similar symmetry hypotheses, as has recently been demonstrated
in [50].
The essential technical challenge in this line of work is the
derivation of the refined SPBs and the refined strong converses
without any symmetry assumptions on the channel or on the codes.
Appendix A Rényi Symmetry is implied by input symmetry and Gallager Symmetry
In the following, we will explain briefly why the Rényi symmetry holds both
for all input symmetric channels described in [43, Definition 3.2]
and for all Gallager symmetric channels described in [11, p. 94].
Let us start with the input symmetric channels.
Let u be the uniform distribution
on the input set of the input symmetric W
and let qα be
[TABLE]
Then using the input symmetry one can confirm that
[TABLE]
The definitions of the tilted channel, (A.1),
and (A.2) imply
$q_{\alpha}=\sum_{x}u(x)W_{\alpha}^{q_{\alpha}}(x)$
and $q_{1,u}\prec q_{\alpha}$.
Thus qα is the order α Augustin mean for the uniform
input distribution
and
Iα(u;W)=Dα(W∥qα∣u)
by [40, Lemma LABEL:C-lem:information].
Hence
[TABLE]
Since Dα(W∥qα∣p)=Iα(u;W)
for all p∈P(X) by (A.2)
and (A.3),
the probability measure qα is not only the order α
Augustin mean for the input distribution u
but also the order α Augustin center of W by
[40, Thm. LABEL:C-thm:minimax].
Then the constraint for the Rényi symmetry follows from
the definition of qα and the input symmetry.
Thus every input symmetric channel W is also a
Rényi symmetric channel.
Now let us consider a Gallager symmetric channel $W$.
Let S1,…,Sm be the partition of the output set Y
assumed in the definition of Gallager symmetry, e.g., [11, p. 94],
u be the uniform distribution on the input set X of W,
and
μα and qα be the measures defined in (A.1).
Gallager symmetry implies not only that μα and qα
are probability mass functions but also that they satisfy the following identities:
[TABLE]
Note that $\mu_{\alpha}(y)=\mu_{\alpha}(z)$, and hence $q_{\alpha}(y)=q_{\alpha}(z)$,
whenever $y$ and $z$ are in the same set of the partition $S_1,\ldots,S_m$,
as a result of (A.4).
Using this fact together with the Gallager symmetry,
one can confirm both (A.2)
and qα=∑xu(x)Wαqα(x).
On the other hand, q1,u≺qα.
Thus qα is the order α Augustin mean for the uniform
input distribution
and
Iα(u;W)=Dα(W∥qα∣u)
by [40, Lemma LABEL:C-lem:information].
Hence (A.3) holds for Gallager symmetric W,
as well.
Thus Dα(W∥qα∣p)=Iα(u;W)
for all p∈P(X)
and qα is not only the order α
Augustin mean for the input distribution u
but also the order α Augustin center of W by
[40, Thm. LABEL:C-thm:minimax].
Then the constraint for the Rényi symmetry follows from
the definition of qα given in (A.1),
the Gallager symmetry and (A.4).
Thus every Gallager symmetric channel W is
also a Rényi symmetric channel.
Appendix B Proof of Lemma 5.5
We first prove the existence of an order α∗ in (0,1) satisfying
(5.6) using the intermediate value theorem.
The Rényi symmetry of W implies
[TABLE]
for all x∈X, η∈(0,1], and
α∈R+.
Then (4.7), (4.8),
and (4.9) imply
[TABLE]
for all x∈X
and α∈(0,1].
Thus the non-negativity of the Rényi divergence
and (3.2) imply
[TABLE]
for all x∈X and α∈(0,1).
On the other hand, Pinsker's inequality
implies for all $x\in\mathcal{X}$ and $\alpha\in(0,1)$
[TABLE]
Thus $\lim_{\alpha\uparrow 1}\left\|W_{\alpha}^{q_{\alpha,W}}(x)-W(x)\right\|=0$
for all $x\in\mathcal{X}$ as a result of (B.4).
On the other hand, the Augustin center qα,W
is continuous in α on (0,1] for the total
variation topology on P(Y) by
[40, Lemmas LABEL:C-lem:capacityO-(LABEL:C-capacityO-continuity)
and LABEL:C-lem:centercontinuity].
Then the lower semicontinuity of the Rényi divergence
in its arguments for the topology of setwise convergence,
i.e.,[32, Thm. 15], implies
Furthermore, D1(Wαqα,W(x)qα,W)
is continuous in α on (0,1)
by [19, Lemma LABEL:B-lem:tilting]
because qα,W
is continuous in α on (0,1] for the total
variation topology.
Then (B.3) implies
[TABLE]
The existence of an α∗∈(0,1) satisfying
(5.6) follows from
(B.5), (B.6),
and the continuity of D1(Wαqα,W(x)qα,W)
in α on (0,1)
by the intermediate value theorem [41, 4.23].
We proceed with showing that any order α∗ satisfying
(5.6)
also satisfies (5.7).
The definition of the SPE given
in (4.11)
and
the consequence of the Rényi symmetry given in (B.1),
imply that
[TABLE]
where f(α,τ)≜α(1−α)[Dα(W(x)∥qα∗,W)−τ].
We show in the following that
[TABLE]
Then for any order α∗ satisfying (5.6),
equations (3.2), (B.2)
and the definition of SPE given in (4.11) imply
[TABLE]
Thus the inequalities given in
(B.7) and (B.9)
hold as equalities and α∗ satisfying (5.6)
also satisfies (5.7) by
(B.8).
Now we prove the identity given in (B.8),
which we have assumed in the preceding.
The Rényi divergence is non-decreasing in its order
by [32, Thm. 3]
and Dα(W(x)∥qα∗,W)=1−ααD1−α(qα∗,W∥W(x))
for all α in (0,1) by definition.
Then Dα(W(x)∥qα∗,W) is finite for all
α in (0,1).
Thus both
Dα(W(x)∥qα∗,W)
and
D1(Wαqα∗,W(x)qα∗,W)
are continuously differentiable in α on (0,1)
by [40, Lemma LABEL:C-lem:analyticity]
and their derivatives on (0,1) are
[TABLE]
where ξx=lndνdW(x)−lndνdqα∗,W. Then
[TABLE]
Consequently,
(5.6) implies (B.8)
by the derivative test
provided that
ξx=γ does not hold Wαqα∗,W(x)-a.s.
for any γ∈R+ and α∈(0,1).
If Wαqα∗,W({ξx=γ}∣x)=1
for some γ∈R+ and α∈(0,1), on the other hand, then
the identities (B.13), (B.14),
and (B.15) hold for all α∈(0,1)
and one can confirm (B.8) by substitution.
[TABLE]
Now we are left with establishing (5.8) with
either of the additional hypotheses.
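Let us first assume that there does not exist a $\gamma$ satisfying
$W_{\alpha^{*}}^{q_{\alpha^{*},W}}\left(\left\{\frac{dW(x)}{d\nu}=\gamma\frac{dq_{\alpha^{*},W}}{d\nu}\right\}\middle|x\right)=1$.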
Then VWαqα∗,W(x)[ξx]>0
for all α in (0,1)
and thus D1(Wαqα∗,W(x)qα∗,W)
is increasing in α on (0,1).
Since D1(Wαqα∗,W(x)qα∗,W)
is also continuous in α on (0,1),
it has a continuous and increasing inverse function.
Then as a result of (B.12)
there exists an ϵ>0 and an increasing continuous function
h:(R−ϵ,R+ϵ)→(0,1)
satisfying
Then (5.8) follows from
(B.16), (B.17), (B.18),
the identity h(R)=α∗,
the definition of the derivative as a limit,
and the continuity of h(⋅), which is defined as the inverse function of
D1(Wαqα∗,W(x)qα∗,W)
as a function of α.
Now we establish (5.8) assuming
the existence of a q satisfying
qα,W=q for all α in (0,1).
If there does not exist a $\gamma$ satisfying
$W_{\alpha^{*}}^{q}\left(\left\{\frac{dW(x)}{d\nu}=\gamma\frac{dq}{d\nu}\right\}\middle|x\right)=1$,
then (5.8) holds by the preceding discussion.
If there exists such a γ , then as a result of
(B.2) and (B.13) we have
[TABLE]
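Then such a $\gamma$ does not exist by
the hypotheses of the lemma because
(B.19)
and $C_{1,W}=\lim_{\alpha\uparrow 1}C_{\alpha,W}$
imply $C_{1,W}=\infty$ in the case
$W(\{\xi_x=\ln\gamma\}|x)<1$ and
$\lim_{\alpha\downarrow 0}C_{\alpha,W}=C_{1,W}$
in the case $W(\{\xi_x=\ln\gamma\}|x)=1$.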
Appendix C Shannon’s Bounds For AWGN Channels and The Sphere Packing Exponent
Shannon, [3, (3)], bounded the error probability of length n block codes
described in Theorem 5.10 as
[TABLE]
where
$\Omega(\cdot):[0,\pi]\to\left[0,\frac{2\pi^{n/2}}{\Gamma(n/2)}\right]$
is the function mapping the cone angle
to the corresponding solid angle in $\mathbb{R}^n$,
$\theta$ is the cone angle satisfying
$\Omega(\theta)=\frac{\Omega(\pi)}{M}$,
and
$Q(\xi)$ is the probability of a point $X$ in
$\mathbb{R}^n$
at a distance $\sqrt{n\varrho}$
from the origin $O$ being moved outside a circular cone of half-angle
$\xi$ with the vertex at the origin $O$ and the axis at $OX$
by a Gaussian noise of variance $\sigma^2$.
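The solid angle function and the cone angle whose solid angle is a 1/M fraction of the full solid angle can be computed numerically. The sketch below relies on the standard formula $\Omega(\theta)=\frac{2\pi^{(n-1)/2}}{\Gamma((n-1)/2)}\int_{0}^{\theta}\sin^{n-2}\phi\,d\phi$ for $n\geq 2$, evaluated with a midpoint rule and inverted by bisection; the function names, step count, and iteration count are assumptions made for illustration.

```python
import math

def full_solid_angle(n):
    """Surface area of the unit sphere in R^n: 2 * pi^(n/2) / Gamma(n/2)."""
    return 2.0 * math.pi ** (n / 2.0) / math.gamma(n / 2.0)

def cone_solid_angle(theta, n, steps=2000):
    """Solid angle of a cone of half-angle theta in R^n (n >= 2), midpoint rule."""
    if n < 2:
        raise ValueError("n must be at least 2")
    h = theta / steps
    integral = h * sum(math.sin((k + 0.5) * h) ** (n - 2) for k in range(steps))
    return full_solid_angle(n - 1) * integral

def cone_angle_for(M, n, iterations=50):
    """Bisection for the half-angle theta with Omega(theta) = Omega(pi) / M."""
    target = full_solid_angle(n) / M
    low, high = 0.0, math.pi
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if cone_solid_angle(mid, n) < target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)

theta_half = cone_angle_for(2, 3)   # half of the sphere in R^3
```

For two messages in three dimensions each cone covers a hemisphere, so the half-angle is π/2.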
Shannon, [3, (4) and (5)], derived
the exact asymptotic expressions for both
the upper bound and the lower bound given in (C.1)
in terms of functions f(⋅) and g(⋅) that do not depend on
the block length n:
[TABLE]
where θc and θcr —the cone angles corresponding to
the channel capacity and the critical rate—
are given by
[TABLE]
and EL(⋅) —the fixed cone angle exponent— is defined via the function G(⋅) as follows
[TABLE]
Remark C.1.
Shannon’s notation in [3] is slightly different from ours; Shannon works with
the signal to noise “amplitude” ratio $A\triangleq\frac{\sqrt{\varrho}}{\sigma}$,
rather than the cost constraint $\varrho$ and the noise power $\sigma^2$.
Furthermore, Shannon specifies the critical cone angle θcr
as the solution of the equation given in (C.8).
Nevertheless, one obtains the closed-form expression given in (C.5),
by plugging in the definition of G(⋅) —given in (C.7)—
in (C.8) and solving the resulting quadratic
equation for sin2θcr.
[TABLE]
Shannon presented the exact asymptotic expression for the rate R in terms of
the cone angle θ in [3, (11)], as well:
[TABLE]
We obtain the fixed-rate asymptotic expression
corresponding to the fixed cone angle asymptotic expressions
(C.2), (C.3),
and (C.6),
by first deriving the asymptotic expression for the cone angle
θ for a fixed-rate R using (C.9).
If ∣δn∣≪1 and
[TABLE]
then using the small angle approximation for the trigonometric functions
we get
[TABLE]
Invoking ln(1+ϵ)=ϵ+O(ϵ2), we get
[TABLE]
Consequently, if
[TABLE]
then the rate corresponding to the cone angle $\theta_n$
at the block length $n$ is $R+O\left(\frac{\ln^2 n}{n^2}\right)$.
In other words, we get a fixed rate by changing the cone angle by an additive term proportional
to $\frac{\ln n}{n}$.
In order to obtain the exact asymptotic expressions for the
upper and lower bounds to the error probability given in (C.1)
via (C.2) and (C.3) at
a fixed-rate R, we will apply Taylor’s expansion.
To that end, we first calculate the derivatives of G and
EL.
As a result of (C.7), we have
where $E_{sp}'(R)=\left.\frac{\partial}{\partial s}E_{sp}(s,W,\varrho)\right|_{s=R}$.
Then [3, (3)], i.e., (C.1),
implies both (1.1) and (5.1), because
(C.1)
is nothing but (1.1) and (5.1)
for certain multiplicative constants.
Appendix D Proof of (5.35)
Note that for any three random variables
Z1, Z2, and Z3,
[TABLE]
On the other hand, as a result of Hölder’s inequality we have
[TABLE]
Furthermore, since the geometric mean is upper bounded by
the arithmetic mean, we also have
[TABLE]
Thus
[TABLE]
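The two generic steps named above, Hölder's inequality followed by the comparison of geometric and arithmetic means, can be illustrated on sample data. The sketch below checks the chain $\mathbf{E}[|Z_1Z_2Z_3|]\leq\prod_{i}(\mathbf{E}[|Z_i|^{3}])^{1/3}\leq\frac{1}{3}\sum_{i}\mathbf{E}[|Z_i|^{3}]$ under an empirical distribution. It is an assumption that this chain is the relevant instance; the exact displays of this appendix are not reproduced here, and the function names and sample construction are ours.

```python
import random

def mean(values):
    return sum(values) / len(values)

def holder_amgm_chain(z1, z2, z3):
    """Return (E|Z1 Z2 Z3|, geometric mean, arithmetic mean) of the third
    absolute moments under the empirical distribution of the samples."""
    product_moment = mean([abs(a * b * c) for a, b, c in zip(z1, z2, z3)])
    third = [mean([abs(z) ** 3 for z in zs]) for zs in (z1, z2, z3)]
    geometric = (third[0] * third[1] * third[2]) ** (1.0 / 3.0)
    arithmetic = sum(third) / 3.0
    return product_moment, geometric, arithmetic

rng = random.Random(1)
z1 = [rng.gauss(0.0, 1.0) for _ in range(1000)]
z2 = [a + rng.uniform(-1.0, 1.0) for a in z1]   # deliberately dependent on z1
z3 = [a * b for a, b in zip(z1, z2)]            # dependent on both
lhs, gm, am = holder_amgm_chain(z1, z2, z3)
```

No independence is assumed: both inequalities hold for any joint distribution, including the empirical one used here.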
Acknowledgment
The author would like to thank Hao-Chung Cheng both
for numerous inspiring discussions on the sphere packing bound and its refinements,
which helped the author to simplify and improve the statement of Lemma 3.4,
and for his feedback on the manuscript.
The author would also like to thank the reviewer for pointing out [45]
and for his feedback on the manuscript.
Bibliography
[1] B. Nakiboğlu. A Simple Derivation of the Refined SPB for the Constant Composition Codes. In 2019 IEEE International Symposium on Information Theory (ISIT), pages 2659–2663, Paris, France, July 2019.
[2] P. Elias. Coding for two noisy channels. In Proceedings of Third London Symposium of Information Theory, pages 61–74, London, 1955. Butterworth Scientific.
[3] C. E. Shannon. Probability of error for optimal codes in a Gaussian channel. The Bell System Technical Journal, 38(3):611–656, May 1959.
[4] R. Dobrushin. Asymptotic estimates of the probability of error for transmission of messages over a discrete memoryless communication channel with a symmetric transition probability matrix. Theory of Probability & Its Applications, 7(3):270–300, 1962.
[5] R. G. Gallager. A simple derivation of the coding theorem and some applications. IEEE Transactions on Information Theory, 11(1):3–18, Jan. 1965.
[6] C. E. Shannon, R. G. Gallager, and E. R. Berlekamp. Lower bounds to error probability for coding on discrete memoryless channels. I. Information and Control, 10(1):65–103, 1967.
[7] E. A. Haroutunian. Bounds for the exponent of the probability of error for a semicontinuous memoryless channel. Problems of Information Transmission, 4(4):29–39, 1968.
[8] U. Augustin. Error estimates for low rate codes. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 14(1):61–88, 1969.