This paper investigates the continuity properties of key channel parameters and operations in the space of equivalent discrete memoryless channels, providing foundational insights for information theory and channel coding.
Contribution
It establishes the continuity of mutual information, capacity, Bhattacharyya parameter, and various channel operations under different topologies on DMCs, advancing theoretical understanding.
Findings
1. Mutual information and capacity are continuous under various topologies.
2. Channel operations like sums, products, and transformations are continuous.
3. Key error probabilities are continuous functions of channel parameters.
Abstract
We study the continuity of many channel parameters and operations under various topologies on the space of equivalent discrete memoryless channels (DMC). We show that mutual information, channel capacity, Bhattacharyya parameter, probability of error of a fixed code, and optimal probability of error for a given code rate and blocklength, are continuous under various DMC topologies. We also show that channel operations such as sums, products, interpolations, and Arıkan-style transformations are continuous.
Continuity of Channel Parameters and Operations under Various DMC Topologies
I Introduction
Let X and Y be two finite sets and let W be a fixed channel with input alphabet X and output alphabet Y. It is well known that the input-output mutual information is continuous on the simplex of input probability distributions. Many other parameters that depend on the input probability distribution were shown to be continuous on the simplex in [1].
Polyanskiy studied in [2] the continuity of the Neyman-Pearson function for a binary hypothesis test that arises in the analysis of channel codes. He showed that for arbitrary input and output alphabets, this function is continuous in the input distribution in the total variation topology. He also showed that under some regularity assumptions, this function is continuous in the weak-∗ topology.
If X and Y are finite sets, the space of channels with input alphabet X and output alphabet Y can naturally be endowed with the topology of the Euclidean metric, or any other equivalent metric. It is well known that the channel capacity is continuous in this topology. If X and Y are arbitrary, one can construct a topology on the space of channels using the weak-∗ topology on the output alphabet. It was shown in [3] that the capacity is lower semi-continuous in this topology.
The continuity results that are mentioned in the previous paragraph do not take into account “equivalence” between channels. Two channels are said to be equivalent if they are degraded from each other. This means that each channel can be simulated from the other by local operations at the receiver. Two channels that are degraded from each other are completely equivalent from an operational point of view: both channels have exactly the same probability of error under optimal decoding for any fixed code. Moreover, any sub-optimal decoder for one channel can be transformed to a sub-optimal decoder for the other channel with the same probability of error and essentially the same computational complexity. This is why it makes sense, from an information-theoretic point of view, to identify equivalent channels and consider them as one point in the space of “equivalent channels”.
In [4], equivalent binary-input channels were identified with their L-density (i.e., the density of log-likelihood ratios). The space of equivalent binary-input channels was endowed with the topology of convergence in distribution of L-densities. Since the symmetric capacity (i.e., the input-output mutual information with uniformly distributed input) and the Bhattacharyya parameter can be written as an integral of a continuous function with respect to the L-density [4], it immediately follows that these parameters are continuous in the L-density topology.
In [5], many topologies were constructed for the space of equivalent channels sharing a fixed input alphabet. In this paper, we study the continuity of many channel parameters and operations under these topologies.
In Section II, we introduce the preliminaries for this paper. In Section III, we recall the main results of [5] that we need here. In Section IV, we introduce the channel parameters and operations that we investigate in this paper. In Section V, we study the continuity of these parameters and operations in the quotient topology of the space of equivalent channels with fixed input and output alphabets. The continuity in the strong topology of the space of equivalent channels sharing the same input alphabet is studied in Section VI. Finally, the continuity in the noisiness/weak-∗ and the total variation topologies is studied in Section VII.
II Preliminaries
We assume that the reader is familiar with the basic concepts of general topology. The main concepts and theorems that we need can be found in the preliminaries section of [5].
II-A Set-theoretic notations
For every integer n≥1, we denote the set {1,…,n} as [n].
The set of mappings from a set A to a set B is denoted as B^A.
Let A be a subset of B. The indicator mapping 𝟙_{A,B}:B→{0,1} of A in B is defined as:
\[ \mathds{1}_{A,B}(x)=\begin{cases}1 & \text{if } x\in A,\\ 0 & \text{otherwise.}\end{cases} \]
If the superset B is clear from the context, we simply write 𝟙_A to denote the indicator mapping of A in B.
The power set of B is the set of subsets of B. Since every subset of B can be identified with its indicator mapping, we denote the power set of B as {0,1}^B=2^B.
Let (Ai)i∈I be a collection of arbitrary sets indexed by I. The disjoint union of (Ai)i∈I is defined as ∐_{i∈I} Ai = ⋃_{i∈I} (Ai×{i}). For every i∈I, the i-th canonical injection is the mapping ϕi:Ai→∐_{j∈I} Aj defined as ϕi(xi)=(xi,i). If no confusion can arise, we can identify Ai with Ai×{i} through the canonical injection. Therefore, we can see Ai as a subset of ∐_{j∈I} Aj for every i∈I.
Let R be an equivalence relation on a set T. For every x∈T, the set x^={y∈T:xRy} is the R-equivalence class of x. The collection of R-equivalence classes, which we denote as T/R, forms a partition of T, and it is called the quotient space of T by R. The mapping ProjR:T→T/R defined as ProjR(x)=x^ for every x∈T is the projection mapping onto T/R.
II-B Topological notations
A topological space (T,U) is said to be contractible to x0∈T if there exists a continuous mapping H:T×[0,1]→T such that H(x,0)=x and H(x,1)=x0 for every x∈T, where [0,1] is endowed with the Euclidean topology. (T,U) is strongly contractible to x0∈T if we also have H(x0,t)=x0 for every t∈[0,1].
Intuitively, T is contractible if it can be “continuously shrunk” to a single point x0. If this “continuous shrinking” can be done without moving x0, T is strongly contractible.
Note that contractibility is a very strong notion of connectedness: every contractible space is path-connected and simply connected. Moreover, all its homotopy, homology and cohomology groups of order ≥1 are zero.
Let {(Ti,Ui)}i∈I be a collection of topological spaces indexed by I. The product topology on ∏_{i∈I} Ti is denoted by ⨂_{i∈I} Ui. The disjoint union topology on ∐_{i∈I} Ti is denoted by ⨁_{i∈I} Ui.
The following lemma is useful to show the continuity of many functions.
Lemma 1.
Let (S,V) and (T,U) be two compact topological spaces and let f:S×T→R be a continuous function on S×T. For every s∈S and every ϵ>0, there exists a neighborhood Vs of s such that for every s′∈Vs, we have
\[ \sup_{t\in T}|f(s',t)-f(s,t)|\le\epsilon. \]
Let (T,U) be a topological space and let R be an equivalence relation on T. The quotient topology on T/R is the finest topology that makes the projection mapping ProjR continuous. It is given by
\[ \mathcal{U}/R=\big\{\hat{U}\subset T/R:\ \mathrm{Proj}_R^{-1}(\hat{U})\in\mathcal{U}\big\}. \]
Lemma 2.
Let f:T→S be a continuous mapping from (T,U) to (S,V). If f(x)=f(x′) for every x,x′∈T satisfying xRx′, then we can define a transcendent mapping f:T/R→S such that f(x^)=f(x′) for any x′∈x^. The mapping f is well defined on T/R. Moreover, f is a continuous mapping from (T/R,U/R) to (S,V).
Let (T,U) and (S,V) be two topological spaces and let R be an equivalence relation on T. Consider the equivalence relation R′ on T×S defined as (x1,y1)R′(x2,y2) if and only if x1Rx2 and y1=y2. A natural question to ask is whether the canonical bijection between ((T/R)×S,(U/R)⊗V) and ((T×S)/R′,(U⊗V)/R′) is a homeomorphism. It turns out that this is not the case in general. The following theorem, which is widely used in algebraic topology, provides a sufficient condition:
Theorem 1. [6]
If (S,V) is locally compact and Hausdorff, then the canonical bijection between ((T/R)×S,(U/R)⊗V) and ((T×S)/R′,(U⊗V)/R′) is a homeomorphism.
Corollary 1.
Let (T,U) and (S,V) be two topological spaces, and let RT and RS be two equivalence relations on T and S respectively. Define the equivalence relation R on T×S as (x1,y1)R(x2,y2) if and only if x1RTx2 and y1RSy2. If (S,V) and (T/RT,U/RT) are locally compact and Hausdorff, then the canonical bijection between ((T/RT)×(S/RS),(U/RT)⊗(V/RS)) and ((T×S)/R,(U⊗V)/R) is a homeomorphism.
Proof.
We just need to apply Theorem 1 twice. Define the equivalence relation RT′ on T×S as follows: (x1,y1)RT′(x2,y2) if and only if x1RTx2 and y1=y2. Since (S,V) is locally compact and Hausdorff, Theorem 1 implies that the canonical bijection from ((T/RT)×S,(U/RT)⊗V) to ((T×S)/RT′,(U⊗V)/RT′) is a homeomorphism. Let us identify these two spaces through the canonical bijection.
Now define the equivalence relation RS′ on (T/RT)×S as follows: (x^1,y1)RS′(x^2,y2) if and only if x^1=x^2 and y1RSy2. Since (T/RT,U/RT) is locally compact and Hausdorff, Theorem 1 implies that the canonical bijection from ((T/RT)×(S/RS),(U/RT)⊗(V/RS)) to (((T/RT)×S)/RS′,((U/RT)⊗V)/RS′) is a homeomorphism.
Since we identified ((T/RT)×S,(U/RT)⊗V) and ((T×S)/RT′,(U⊗V)/RT′) through the canonical bijection (which is a homeomorphism), RS′ can be seen as an equivalence relation on ((T×S)/RT′,(U⊗V)/RT′). It is easy to see that the canonical bijection from (((T×S)/RT′)/RS′,((U⊗V)/RT′)/RS′) to ((T×S)/R,(U⊗V)/R) is a homeomorphism. We conclude that the canonical bijection from ((T/RT)×(S/RS),(U/RT)⊗(V/RS)) to ((T×S)/R,(U⊗V)/R) is a homeomorphism.
∎
II-D Measure-theoretic notations
If (M,Σ) is a measurable space, we denote the set of probability measures on (M,Σ) as P(M,Σ). If the σ-algebra Σ is known from the context, we simply write P(M) to denote the set of probability measures.
If P∈P(M,Σ) and {x} is a measurable singleton, we simply write P(x) to denote P({x}).
For every P1,P2∈P(M,Σ), the total variation distance between P1 and P2 is defined as:
\[ \|P_1-P_2\|_{TV}=\sup_{A\in\Sigma}|P_1(A)-P_2(A)|. \]
The push-forward probability measure
Let P be a probability measure on (M,Σ), and let f:M→M′ be a measurable mapping from (M,Σ) to another measurable space (M′,Σ′). The push-forward probability measure of P by f is the probability measure f#P on (M′,Σ′) defined as (f#P)(A′)=P(f−1(A′)) for every A′∈Σ′.
A measurable mapping g:M′→R is integrable with respect to f#P if and only if g∘f is integrable with respect to P. Moreover,
\[ \int_{M'} g\,d(f_{\#}P)=\int_{M} g\circ f\,dP. \]
The mapping f# from P(M,Σ) to P(M′,Σ′) is continuous if these spaces are endowed with the total variation topology:
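\[ \|f_{\#}P-f_{\#}P'\|_{TV}=\sup_{A'\in\Sigma'}\big|P(f^{-1}(A'))-P'(f^{-1}(A'))\big|\le\sup_{A\in\Sigma}|P(A)-P'(A)|=\|P-P'\|_{TV}. \]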
Probability measures on finite sets
We always endow finite sets with their finest σ-algebra, i.e., the power set. In this case, every probability measure is completely determined by its value on singletons, i.e., if P is a probability measure on a finite set X, then for every A⊂X, we have
\[ P(A)=\sum_{x\in A}P(x). \]
If X is a finite set, we denote the set of probability distributions on X as ΔX. Note that ΔX is an (∣X∣−1)-dimensional simplex in RX. We always endow ΔX with the total variation distance and its induced topology. For every p1,p2∈ΔX, we have:
\[ \|p_1-p_2\|_{TV}=\frac{1}{2}\sum_{x\in X}|p_1(x)-p_2(x)|=\frac{1}{2}\|p_1-p_2\|_1. \]
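As a quick numerical check of the two usual expressions for the total variation distance on a finite set, the following sketch compares a brute-force supremum over all events with half the ℓ1 distance. It assumes the normalization in which the distance is the supremum of |P1(A)−P2(A)| over events A; the function names `tv_sup` and `tv_half_l1` are illustrative only.

```python
from itertools import combinations

def tv_sup(p1, p2):
    """Total variation as the supremum of |P1(A) - P2(A)| over all events A (brute force)."""
    n = len(p1)
    best = 0.0
    for r in range(n + 1):
        for event in combinations(range(n), r):
            gap = abs(sum(p1[i] for i in event) - sum(p2[i] for i in event))
            best = max(best, gap)
    return best

def tv_half_l1(p1, p2):
    """Total variation as half the l1 distance between the probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p1, p2))

# Two distributions on a three-element set.
p1 = [0.5, 0.3, 0.2]
p2 = [0.2, 0.3, 0.5]
print(tv_sup(p1, p2), tv_half_l1(p1, p2))
```

The brute-force version enumerates all 2^n events, so it is only meant for very small alphabets.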
Products of probability measures
We denote the product of two measurable spaces (M1,Σ1) and (M2,Σ2) as (M1×M2,Σ1⊗Σ2). If P1∈P(M1,Σ1) and P2∈P(M2,Σ2), we denote the product of P1 and P2 as P1×P2.
If P(M1,Σ1), P(M2,Σ2) and P(M1×M2,Σ1⊗Σ2) are endowed with the total variation topology, the mapping (P1,P2)→P1×P2 is a continuous mapping (see Appendix B).
Borel sets and the support of a probability measure
Let (T,U) be a Hausdorff topological space. The Borel σ-algebra of (T,U) is the σ-algebra generated by U. We denote the Borel σ-algebra of (T,U) as B(T,U). If the topology U is known from the context, we simply write B(T) to denote the Borel σ-algebra. The sets in B(T) are called the Borel sets of T.
The support of a measure P∈P(T,B(T)) is the set of all points x∈T for which every neighborhood has a strictly positive measure:
\[ \operatorname{supp}(P)=\big\{x\in T:\ P(O)>0\ \text{for every neighborhood } O \text{ of } x\big\}. \]
If P is a probability measure on a Polish space, then P\big{(}T\setminus\operatorname*{supp}(P)\big{)}=0.
II-E Random mappings
Let M and M′ be two arbitrary sets and let Σ′ be a σ-algebra on M′. A random mapping from M to (M′,Σ′) is a mapping R from M to P(M′,Σ′). For every x∈M, R(x) can be interpreted as the probability distribution of the random output given that the input is x.
Let Σ be a σ-algebra on M. We say that R is a measurable random mapping from (M,Σ) to (M′,Σ′) if the mapping RB:M→R defined as RB(x)=(R(x))(B) is measurable for every B∈Σ′.
Note that this definition of measurability is consistent with the measurability of ordinary mappings: let f be a mapping from M to M′ and let Df:M→P(M′,Σ′) be the random mapping defined as Df(x)=δf(x) for every x∈M, where δf(x)∈P(M′,Σ′) is a Dirac measure centered at f(x). We have:
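\[ \begin{aligned} f \text{ is measurable} &\Leftrightarrow f^{-1}(B)\in\Sigma,\ \forall B\in\Sigma' \\ &\overset{(a)}{\Leftrightarrow} \big((D_f)_B\big)^{-1}(\{1\})\in\Sigma,\ \forall B\in\Sigma' \\ &\overset{(b)}{\Leftrightarrow} (D_f)_B \text{ is measurable},\ \forall B\in\Sigma' \\ &\Leftrightarrow D_f \text{ is a measurable random mapping}, \end{aligned} \]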
where (a) and (b) follow from the fact that ((Df)B)(x) is either 1 or 0 depending on whether f(x)∈B or not.
Let P be a probability measure on (M,Σ) and let R be a measurable random mapping from (M,Σ) to (M′,Σ′). The push-forward probability measure of P by R is the probability measure R#P on (M′,Σ′) defined as:
\[ (R_{\#}P)(B)=\int_{M}R_B\,dP=\int_{M}(R(x))(B)\,dP(x)\quad\text{for every } B\in\Sigma'. \]
Note that this definition is consistent with the push-forward of ordinary mappings: if f and Df are as above, then for every B∈Σ′, we have
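\[ \big((D_f)_{\#}P\big)(B)=\int_M (D_f(x))(B)\,dP(x)=\int_M \mathds{1}_{f^{-1}(B)}(x)\,dP(x)=P(f^{-1}(B))=(f_{\#}P)(B). \]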
Proposition 1.
Let R be a measurable random mapping from (M,Σ) to (M′,Σ′). If g:M′→R+∪{+∞} is a Σ′-measurable mapping, then the mapping x→∫M′g(y)⋅d(R(x))(y) is a measurable mapping from (M,Σ) to R+∪{+∞}. Moreover, for every P∈P(M,Σ), we have
\[ \int_{M'} g\,d(R_{\#}P)=\int_M\Big(\int_{M'} g(y)\,d(R(x))(y)\Big)\,dP(x). \]
If g:M′→R is bounded and Σ′-measurable, then the mapping
\[ x\mapsto\int_{M'} g(y)\,d(R(x))(y) \]
is bounded and Σ-measurable. Moreover, for every P∈P(M,Σ), we have
\[ \int_{M'} g\,d(R_{\#}P)=\int_M\Big(\int_{M'} g(y)\,d(R(x))(y)\Big)\,dP(x). \]
Proof.
Write g=g+−g− (where g+=max{g,0} and g−=max{−g,0}), and use the fact that every bounded measurable function is integrable over any probability distribution.
∎
Lemma 3.
For every measurable random mapping R from (M,Σ) to (M′,Σ′), the push-forward mapping R# is continuous from P(M,Σ) to P(M′,Σ′) under the total variation topology.
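On finite sets, a random mapping is a row-stochastic matrix and its push-forward is a matrix-vector product. The sketch below illustrates, in that special case, the standard contraction of total variation under push-forward, which is one way to see the continuity stated in Lemma 3 (the paper's own proof may proceed differently). The names `push_forward` and `tv` and the numerical values are illustrative assumptions.

```python
def push_forward(P, R):
    """Push-forward of P by the random mapping R: (R#P)(y) = sum_x P(x) * R(x)(y).

    P is a probability vector and R[x] is the output distribution for input x.
    """
    ny = len(R[0])
    return [sum(P[x] * R[x][y] for x in range(len(P))) for y in range(ny)]

def tv(P, Q):
    """Total variation distance on a finite set (half the l1 distance)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(P, Q))

# A random mapping from a 3-element set to a 2-element set.
R = [[0.9, 0.1],
     [0.5, 0.5],
     [0.2, 0.8]]
P1 = [0.7, 0.2, 0.1]
P2 = [0.1, 0.3, 0.6]

before = tv(P1, P2)
after = tv(push_forward(P1, R), push_forward(P2, R))
print(before, after)  # the second value does not exceed the first
```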
Let U be a Polish topology on M (this assumption can be dropped; U is assumed to be Polish only to avoid working with Moore-Smith nets), and let U′ be an arbitrary topology on M′. Let R be a measurable random mapping from (M,B(M)) to (M′,B(M′)). Moreover, assume that R is a continuous mapping from (M,U) to P(M′,B(M′)) when the latter space is endowed with the weak-∗ topology. Under these assumptions, the push-forward mapping R# is continuous from P(M,B(M)) to P(M′,B(M′)) under the weak-∗ topology.
Let X be a finite set. A meta-probability measure on X is a probability measure on the Borel sets of ΔX. It is called a meta-probability measure because it is a probability measure on the space of probability distributions on X.
We denote the set of meta-probability measures on X as MP(X). Clearly, MP(X)=P(ΔX).
A meta-probability measure MP on X is said to be balanced if it satisfies
\[ \int_{\Delta_X} p\,dMP(p)=\pi_X, \]
where πX is the uniform probability distribution on X.
We denote the set of all balanced meta-probability measures on X as MPb(X). The set of all balanced and finitely supported meta-probability measures on X is denoted as MPbf(X).
The following lemma is useful to show the continuity of functions defined on MP(X).
Lemma 5.
Let (S,V) be a compact topological space and let f:S×ΔX→R be a continuous function on S×ΔX. The mapping F:S×MP(X)→R defined as
\[ F(s,MP)=\int_{\Delta_X} f(s,p)\,dMP(p) \]
is continuous, where MP(X) is endowed with the weak-∗ topology.
Let f be a mapping from a finite set X to another finite set X′. f induces a push-forward mapping f# taking probability distributions in ΔX to probability distributions in ΔX′. f# is continuous because ΔX and ΔX′ are endowed with the total variation distance. f# in turn induces another push-forward mapping taking meta-probability measures in MP(X) to meta-probability measures in MP(X′). We denote this mapping as f## and we call it the meta-push-forward mapping induced by f. Since f# is a continuous mapping from ΔX to ΔX′, f## is a continuous mapping from MP(X) to MP(X′) under both the weak-∗ and the total variation topologies.
Let X1 and X2 be two finite sets. Let Mul:ΔX1×ΔX2→ΔX1×X2 be defined as Mul(p1,p2)=p1×p2. For every MP1∈MP(X1) and MP2∈MP(X2), we define the tensor product of MP1 and MP2 as MP1⊗MP2=Mul#(MP1×MP2)∈MP(X1×X2).
Note that since ΔX1, ΔX2 and ΔX1×X2 are endowed with the total variation topology, Mul(p1,p2)=p1×p2 is a continuous mapping from ΔX1×ΔX2 to ΔX1×X2. Therefore, Mul# is a continuous mapping from P(ΔX1×ΔX2) to P(ΔX1×X2)=MP(X1×X2) under both the weak-∗ and the total variation topologies. On the other hand, Appendices B and F imply that the mapping (MP1,MP2)→MP1×MP2 from MP(X1)×MP(X2) to P(ΔX1×ΔX2) is continuous under both the weak-∗ and the total variation topologies. We conclude that the tensor product is continuous under both these topologies.
III The space of equivalent channels
In this section, we summarize the main results of [5].
III-A Space of channels from X to Y
A discrete memoryless channel W is a 3-tuple W=(X,Y,pW) where X is a finite set that is called the input alphabet of W, Y is a finite set that is called the output alphabet of W, and pW:X×Y→[0,1] is a function satisfying ∑_{y∈Y} pW(x,y)=1 for every x∈X.
For every (x,y)∈X×Y, we denote pW(x,y) as W(y∣x), which we interpret as the conditional probability of receiving y at the output, given that x is the input.
Let DMCX,Y be the set of all channels having X as input alphabet and Y as output alphabet.
For every W,W′∈DMCX,Y, define the distance between W and W′ as follows:
\[ d_{X,Y}(W,W')=\frac{1}{2}\max_{x\in X}\sum_{y\in Y}|W'(y|x)-W(y|x)|. \]
We always endow DMCX,Y with the metric distance dX,Y. This metric makes DMCX,Y a compact path-connected metric space. The metric topology on DMCX,Y that is induced by dX,Y is denoted as TX,Y.
III-B Equivalence between channels
Let W∈DMCX,Y and W′∈DMCX,Z be two channels having the same input alphabet. We say that W′ is degraded from W if there exists a channel V∈DMCY,Z such that
\[ W'(z|x)=\sum_{y\in Y}V(z|y)\,W(y|x)\quad\text{for every } (x,z)\in X\times Z. \]
W and W′ are said to be equivalent if each one is degraded from the other.
Let ΔX and ΔY be the spaces of probability distributions on X and Y respectively. Define PWo∈ΔY as PWo(y)=(1/∣X∣)∑_{x∈X} W(y∣x) for every y∈Y. The image of W is the set of output symbols y∈Y having strictly positive probabilities:
\[ \operatorname{Im}(W)=\{y\in Y:\ P_W^o(y)>0\}. \]
For every y∈Im(W), define Wy−1∈ΔX as follows:
\[ W_y^{-1}(x)=\frac{W(y|x)}{|X|\,P_W^o(y)}\quad\text{for every } x\in X. \]
For every (x,y)∈X×Im(W), we have W(y∣x)=∣X∣PWo(y)Wy−1(x). On the other hand, if x∈X and y∈Y∖Im(W), we have W(y∣x)=0. This shows that PWo and the collection {Wy−1}y∈Im(W) uniquely determine W.
The Blackwell measure (denoted MPW) of W is a meta-probability measure on X defined as:
\[ MP_W=\sum_{y\in\operatorname{Im}(W)}P_W^o(y)\,\delta_{W_y^{-1}}, \]
where δWy−1 is a Dirac measure centered at Wy−1. (Footnote: In an earlier version of this work, I called MPW the posterior meta-probability distribution of W. Maxim Raginsky thankfully brought to my attention the fact that MPW is called Blackwell measure.)
It is known that a meta-probability measure MP on X is the Blackwell measure of some DMC with input alphabet X if and only if it is balanced and finitely supported [7].
It is also known that two channels W∈DMCX,Y and W′∈DMCX,Z are equivalent if and only if MPW=MPW′ [7].
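The characterization of equivalence through Blackwell measures can be tried out on small examples. The following sketch represents a channel as a list of rows with `W[x][y]` standing for W(y∣x), builds the Blackwell measure as a dictionary from posterior distributions to masses, and checks that splitting an output symbol of a binary symmetric channel leaves the measure unchanged and that the measure is balanced. The function names and the example channels are assumptions made for illustration; exact rational arithmetic is used so that equality tests are exact.

```python
from fractions import Fraction as F

def blackwell_measure(W):
    """Return the Blackwell measure of W as a dict {posterior tuple: mass}."""
    nx, ny = len(W), len(W[0])
    mp = {}
    for y in range(ny):
        p_out = sum(W[x][y] for x in range(nx)) / nx   # P_W^o(y) under uniform input
        if p_out == 0:                                 # y lies outside Im(W)
            continue
        posterior = tuple(W[x][y] / (nx * p_out) for x in range(nx))  # W_y^{-1}
        mp[posterior] = mp.get(posterior, 0) + p_out   # merge outputs with equal posteriors
    return mp

def mean(mp):
    """Barycenter of a finitely supported meta-probability measure."""
    k = len(next(iter(mp)))
    return tuple(sum(mass * post[x] for post, mass in mp.items()) for x in range(k))

# Binary symmetric channel with crossover 1/4, and the same channel with
# output 0 split into two symbols that each receive half of its probability.
bsc = [[F(3, 4), F(1, 4)],
       [F(1, 4), F(3, 4)]]
bsc_split = [[F(3, 8), F(3, 8), F(1, 4)],
             [F(1, 8), F(1, 8), F(3, 4)]]

print(blackwell_measure(bsc) == blackwell_measure(bsc_split))
print(mean(blackwell_measure(bsc)))
```

Here the two channels are degraded from each other (merge the two split outputs, or split output 0 at random), which is consistent with their Blackwell measures coinciding.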
III-C Space of equivalent channels from X to Y
Let X and Y be two finite sets. Define the equivalence relation RX,Y(o) on DMCX,Y as follows:
\[ W\,R^{(o)}_{X,Y}\,W'\ \Leftrightarrow\ W \text{ is equivalent to } W'. \]
The space of equivalent channels with input alphabet X and output alphabet Y is the quotient of DMCX,Y by the equivalence relation:
\[ \mathrm{DMC}^{(o)}_{X,Y}=\mathrm{DMC}_{X,Y}/R^{(o)}_{X,Y}. \]
Quotient topology
We define the topology TX,Y(o) on DMCX,Y(o) as the quotient topology TX,Y/RX,Y(o). We always associate DMCX,Y(o) with the quotient topology TX,Y(o).
We have shown in [5] that DMCX,Y(o) is a compact, path-connected and metrizable space.
If Y1 and Y2 are two finite sets of the same size, there exists a canonical homeomorphism between DMCX,Y1(o) and DMCX,Y2(o) [5]. This allows us to identify DMCX,Y(o) with DMCX,[n](o), where n=∣Y∣ and [n]={1,…,n}.
Moreover, for every 1≤n≤m, there exists a canonical subspace of DMCX,[m](o) that is homeomorphic to DMCX,[n](o) [5]. Therefore, we can consider DMCX,[n](o) as a compact subspace of DMCX,[m](o).
Noisiness metric
For every m≥1, let Δ[m]×X be the space of probability distributions on [m]×X. Let Y be a finite set and let W∈DMCX,Y. For every p∈Δ[m]×X, define Pc(p,W) as follows:
[TABLE]
The quantity Pc(p,W) depends only on the RX,Y(o)-equivalence class of W (see [5]). Therefore, if W^∈DMCX,Y(o), we can define Pc(p,W^):=Pc(p,W′) for any W′∈W^.
Define the noisiness distance dX,Y(o):DMCX,Y(o)×DMCX,Y(o)→R+ as follows:
[TABLE]
We have shown in [5] that (DMCX,Y(o),TX,Y(o)) is topologically equivalent to (DMCX,Y(o),dX,Y(o)).
III-D Space of equivalent channels with input alphabet X
The space of channels with input alphabet X is defined as
\[ \mathrm{DMC}_{X,\ast}=\coprod_{n\ge 1}\mathrm{DMC}_{X,[n]}. \]
We define the equivalence relation RX,∗(o) on DMCX,∗ as follows:
\[ W\,R^{(o)}_{X,\ast}\,W'\ \Leftrightarrow\ W \text{ is equivalent to } W'. \]
The space of equivalent channels with input alphabet X is the quotient of DMCX,∗ by the equivalence relation:
\[ \mathrm{DMC}^{(o)}_{X,\ast}=\mathrm{DMC}_{X,\ast}/R^{(o)}_{X,\ast}. \]
For every n≥1 and every W∈DMCX,[n], we identify the RX,[n](o)-equivalence class of W with the RX,∗(o)-equivalence class of it. This allows us to consider DMCX,[n](o) as a subspace of DMCX,∗(o). Moreover,
[TABLE]
Since any two equivalent channels have the same Blackwell measure, we can define the Blackwell measure of W^∈DMCX,∗(o) as MPW^=MPW′ for any W′∈W^. The rank of W^∈DMCX,∗(o) is the size of the support of its Blackwell measure:
\[ \operatorname{rank}(\hat{W})=|\operatorname{supp}(MP_{\hat{W}})|. \]
We have:
[TABLE]
A topology T on DMCX,∗(o) is said to be natural if and only if it induces the quotient topology TX,[n](o) on DMCX,[n](o) for every n≥1.
Every natural topology is σ-compact, separable and path-connected [5]. On the other hand, if ∣X∣≥2, a Hausdorff natural topology is not Baire and it is not locally compact anywhere [5]. This implies that no natural topology can be completely metrized if ∣X∣≥2.
Strong topology on DMCX,∗(o)
We associate DMCX,∗ with the disjoint union topology Ts,X,∗:=⨁_{n≥1} TX,[n]. The space (DMCX,∗,Ts,X,∗) is disconnected, metrizable and σ-compact [5].
The strong topology Ts,X,∗(o) on DMCX,∗(o) is the quotient of Ts,X,∗ by RX,∗(o):
\[ \mathcal{T}^{(o)}_{s,X,\ast}=\mathcal{T}_{s,X,\ast}/R^{(o)}_{X,\ast}. \]
We call the open and closed sets of (DMCX,∗(o),Ts,X,∗(o)) strongly open and strongly closed sets respectively. If A is a subset of DMCX,∗(o), then A is strongly open if and only if A∩DMCX,[n](o) is open in DMCX,[n](o) for every n≥1. Similarly, A is strongly closed if and only if A∩DMCX,[n](o) is closed in DMCX,[n](o) for every n≥1.
We have shown in [5] that Ts,X,∗(o) is the finest natural topology. The strong topology is sequential, compactly generated, and T4 [5]. On the other hand, if ∣X∣≥2, the strong topology is not first-countable anywhere [5], hence it is not metrizable.
Noisiness metric
Define the noisiness metric on DMCX,∗(o) as follows:
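\[ d^{(o)}_{X,\ast}(\hat{W},\hat{W}')=d^{(o)}_{X,[n]}(\hat{W},\hat{W}'),\quad\text{where } n\ge 1 \text{ satisfies } \hat{W},\hat{W}'\in\mathrm{DMC}^{(o)}_{X,[n]}. \]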
dX,∗(o)(W^,W^′) is well defined because dX,[n](o)(W^,W^′) does not depend on n≥1 as long as W^,W^′∈DMCX,[n](o). We can also express dX,∗(o) as follows:
[TABLE]
The metric topology on DMCX,∗(o) that is induced by dX,∗(o) is called the noisiness topology on DMCX,∗(o), and it is denoted as TX,∗(o). We have shown in [5] that TX,∗(o) is a natural topology that is strictly coarser than Ts,X,∗(o).
Topologies from Blackwell measures
The mapping W^→MPW^ is a bijection from DMCX,∗(o) to MPbf(X). We call this mapping the canonical bijection from DMCX,∗(o) to MPbf(X).
Since ΔX is a metric space, there are many standard ways to construct topologies on MP(X). If we choose any of these standard topologies on MP(X) and then relativize it to the subspace MPbf(X), we can construct topologies on DMCX,∗(o) through the canonical bijection.
In [5], we studied the weak-∗ and the total variation topologies. We showed that the weak-∗ topology is exactly the same as the noisiness topology.
The total-variation metric distance dTV,X,∗(o) on DMCX,∗(o) is defined as
\[ d^{(o)}_{TV,X,\ast}(\hat{W},\hat{W}')=\|MP_{\hat{W}}-MP_{\hat{W}'}\|_{TV}. \]
The total-variation topology TTV,X,∗(o) is the metric topology that is induced by dTV,X,∗(o) on DMCX,∗(o). We proved in [5] that if ∣X∣≥2, we have:
• TTV,X,∗(o) is neither natural nor Baire, hence it is not completely metrizable.
• TTV,X,∗(o) is not locally compact anywhere.
IV Channel parameters and operations
IV-A Useful parameters
Let ΔX be the space of probability distributions on X. For every p∈ΔX and every W∈DMCX,Y, define I(p,W) as the mutual information I(X;Y), where X is distributed as p and Y is the output of W when X is the input. The mutual information is computed using the natural logarithm. The capacity of W is defined as C(W)=sup_{p∈ΔX} I(p,W).
For every p∈ΔX, the error probability of the MAP decoder of W under prior p is defined as:
\[ P_e(p,W)=1-\sum_{y\in Y}\max_{x\in X}\,p(x)W(y|x). \]
Clearly, 0≤Pe(p,W)≤1.
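For concreteness, here is a minimal sketch that evaluates I(p,W) in nats and the MAP error probability for a small channel. It assumes the channel is stored as a list of rows with `W[x][y]` standing for W(y∣x), and it uses the standard expression 1 − ∑_y max_x p(x)W(y∣x) for the MAP error; the function names are illustrative.

```python
import math

def mutual_information(p, W):
    """I(p, W) in nats, where p is the input distribution and W[x][y] = W(y|x)."""
    ny = len(W[0])
    q = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(ny)]  # output law
    total = 0.0
    for x, px in enumerate(p):
        for y in range(ny):
            joint = px * W[x][y]
            if joint > 0:                      # terms with zero probability contribute 0
                total += joint * math.log(W[x][y] / q[y])
    return total

def map_error(p, W):
    """Error probability of the MAP decoder of W under prior p."""
    ny = len(W[0])
    return 1.0 - sum(max(p[x] * W[x][y] for x in range(len(p))) for y in range(ny))

# Binary symmetric channel with crossover probability 0.1 and uniform input.
bsc = [[0.9, 0.1],
       [0.1, 0.9]]
uniform = [0.5, 0.5]
print(mutual_information(uniform, bsc), map_error(uniform, bsc))
```

For this channel and a uniform input, the mutual information equals log 2 minus the binary entropy of 0.1 (in nats), and the MAP error is 0.1.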
For every W∈DMCX,Y, define the Bhattacharyya parameter of W as:
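\[ Z(W)=\begin{cases}\dfrac{1}{|X|(|X|-1)}\displaystyle\sum_{\substack{x_1,x_2\in X\\ x_1\neq x_2}}\ \sum_{y\in Y}\sqrt{W(y|x_1)W(y|x_2)} & \text{if } |X|\ge 2,\\[2mm] 0 & \text{otherwise.}\end{cases} \]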
It is easy to see that 0≤Z(W)≤1.
It was shown in [8] and [9] that (1/4)Z(W)²≤Pe(πX,W)≤(∣X∣−1)Z(W), where πX is the uniform distribution on X.
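The two-sided bound relating Z(W) and Pe(πX,W) can be checked numerically on a small example. The sketch below assumes that Z(W) is the average, over ordered pairs of distinct inputs, of the pairwise Bhattacharyya sums ∑_y √(W(y∣x1)W(y∣x2)); this definition, the function names, and the example channel are assumptions made for illustration.

```python
import math

def bhattacharyya(W):
    """Average pairwise Bhattacharyya parameter (assumed definition), W[x][y] = W(y|x)."""
    nx = len(W)
    if nx < 2:
        return 0.0
    total = 0.0
    for x1 in range(nx):
        for x2 in range(nx):
            if x1 != x2:
                total += sum(math.sqrt(a * b) for a, b in zip(W[x1], W[x2]))
    return total / (nx * (nx - 1))

def uniform_map_error(W):
    """MAP error probability under the uniform prior."""
    nx, ny = len(W), len(W[0])
    return 1.0 - sum(max(W[x][y] for x in range(nx)) for y in range(ny)) / nx

bsc = [[0.9, 0.1],
       [0.1, 0.9]]
z = bhattacharyya(bsc)
pe = uniform_map_error(bsc)
print(z, pe)
print(0.25 * z ** 2 <= pe <= (len(bsc) - 1) * z)
```

For the binary symmetric channel with crossover 0.1, this gives Z close to 0.6 and an error probability close to 0.1, so both inequalities hold with room to spare.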
An (n,M)-code C on the alphabet X is a subset of Xⁿ such that ∣C∣=M. The integer n is the blocklength of C, and M is the size of the code. The rate of C is (1/n)logM, and it is measured in nats. The error probability of the ML decoder for the code C when it is used for a channel W∈DMCX,Y is given by:
\[ P_{e,\mathcal{C}}(W)=1-\frac{1}{|\mathcal{C}|}\sum_{y^n\in Y^n}\ \max_{x^n\in\mathcal{C}}\ \prod_{i=1}^{n}W(y_i|x_i). \]
The optimal error probability of (n,M)-codes for a channel W is given by:
\[ P_{e,n,M}(W)=\min_{\substack{\mathcal{C}\subset X^n\\ |\mathcal{C}|=M}}P_{e,\mathcal{C}}(W). \]
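For very small parameters, both the ML error probability of a fixed code and the optimal error probability over all (n,M)-codes can be computed by exhaustive enumeration. The sketch below assumes equiprobable codewords and the standard expression 1 − (1/∣C∣)∑ max over codewords of the product likelihood; the function names are illustrative, and the enumeration is exponential in n.

```python
from itertools import combinations, product
from math import prod

def code_error(code, W):
    """ML error probability of `code` (a list of input tuples) over W, W[x][y] = W(y|x)."""
    n, ny = len(code[0]), len(W[0])
    correct = 0.0
    for y in product(range(ny), repeat=n):
        # The ML decoder picks a codeword maximizing the likelihood of y.
        correct += max(prod(W[c[i]][y[i]] for i in range(n)) for c in code)
    return 1.0 - correct / len(code)

def optimal_error(n, M, W):
    """Minimum ML error probability over all codes of blocklength n and size M."""
    words = list(product(range(len(W)), repeat=n))
    return min(code_error(code, W) for code in combinations(words, M))

bsc = [[0.9, 0.1],
       [0.1, 0.9]]
repetition = [(0, 0, 0), (1, 1, 1)]
print(code_error(repetition, bsc), optimal_error(3, 2, bsc))
```

On the binary symmetric channel with crossover 0.1, the length-3 repetition code fails exactly when at least two bits are flipped, and it attains the minimum over all (3,2)-codes.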
The following proposition shows that all the above parameters are continuous:
Proposition 2.
We have:
• I:ΔX×DMCX,Y→R+ is continuous, concave in p, and convex in W.
• C:DMCX,Y→R+ is continuous and convex.
• Pe:ΔX×DMCX,Y→[0,1] is continuous, concave in p and concave in W.
• Z:DMCX,Y→[0,1] is continuous.
• For every code C on X, Pe,C:DMCX,Y→[0,1] is continuous.
• For every n>0 and every 1≤M≤∣X∣ⁿ, the mapping Pe,n,M:DMCX,Y→[0,1] is continuous.
Proof.
These facts are well known, especially the continuity of I, its concavity in p, and its convexity in W [10]. Since C is the supremum of a family of mappings that are convex in W, it is also convex in W. For a proof of the continuity of C, see Appendix G. The continuity of Z, Pe and Pe,C follows immediately from their definitions. Moreover, since Pe,n,M is the minimum of a finite number of continuous mappings, it is continuous. The concavity of Pe in p and in W can also be easily seen from the definition.
∎
IV-B Channel operations
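If W∈DMCX,Y and V∈DMCY,Z, we define the composition V∘W∈DMCX,Z of W and V as follows:
\[ (V\circ W)(z|x)=\sum_{y\in Y}V(z|y)\,W(y|x)\quad\text{for every } (x,z)\in X\times Z. \]
For every function f:X→Y, define the deterministic channel Df∈DMCX,Y as follows:
\[ D_f(y|x)=\begin{cases}1 & \text{if } y=f(x),\\ 0 & \text{otherwise.}\end{cases} \]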
It is easy to see that if f:X→Y and g:Y→Z, then Dg∘Df=Dg∘f.
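The identity Dg∘Df=Dg∘f can be verified mechanically for finite alphabets. In the sketch below, a channel is a list of rows with `W[x][y]` standing for W(y∣x), so that composition is an ordinary matrix product; the function names and the particular maps f and g are illustrative assumptions.

```python
def compose(V, W):
    """Composition V∘W: (V∘W)(z|x) = sum_y V(z|y) * W(y|x)."""
    nz = len(V[0])
    return [[sum(w_row[y] * V[y][z] for y in range(len(V))) for z in range(nz)]
            for w_row in W]

def deterministic(f, ny):
    """Deterministic channel D_f, where f is a list and f[x] is the image of x."""
    return [[1 if y == f[x] else 0 for y in range(ny)] for x in range(len(f))]

f = [0, 2, 2, 1]                  # f: {0,1,2,3} -> {0,1,2}
g = [1, 0, 1]                     # g: {0,1,2}   -> {0,1}
g_after_f = [g[f[x]] for x in range(len(f))]

lhs = compose(deterministic(g, 2), deterministic(f, 3))
rhs = deterministic(g_after_f, 2)
print(lhs == rhs)
```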
For every two channels W1∈DMCX1,Y1 and W2∈DMCX2,Y2, define the channel sum W1⊕W2∈DMCX1∐X2,Y1∐Y2 of W1 and W2 as:
\[ (W_1\oplus W_2)(y,i|x,j)=\begin{cases}W_i(y|x) & \text{if } i=j,\\ 0 & \text{otherwise.}\end{cases} \]
W1⊕W2 arises when the transmitter has two channels W1 and W2 at his disposal and he can use exactly one of them at each channel use. It is an easy exercise to check that e^{C(W1⊕W2)}=e^{C(W1)}+e^{C(W2)} (remember that we compute the mutual information using the natural logarithm).
We define the channel product W1⊗W2∈DMCX1×X2,Y1×Y2 of W1 and W2 as:
\[ (W_1\otimes W_2)(y_1,y_2|x_1,x_2)=W_1(y_1|x_1)\,W_2(y_2|x_2). \]
W1⊗W2 arises when the transmitter has two channels W1 and W2 at his disposal and he uses both of them at each channel use. It is an easy exercise to check that C(W1⊗W2)=C(W1)+C(W2), or equivalently e^{C(W1⊗W2)}=e^{C(W1)}⋅e^{C(W2)}. Channel sums and products were first introduced by Shannon in [11].
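The capacity identities for channel sums and products can be checked numerically. The sketch below builds the sum as a block-diagonal matrix and the product as a Kronecker product, and estimates capacities (in nats) with the Blahut-Arimoto iteration, which is a standard numerical method brought in here for illustration and not something the text prescribes. All names, tolerances, and example channels are assumptions.

```python
import math

def capacity(W, tol=1e-12, max_iter=10000):
    """Blahut-Arimoto estimate of the capacity of W in nats, W[x][y] = W(y|x)."""
    nx, ny = len(W), len(W[0])
    p = [1.0 / nx] * nx                       # start from the uniform input
    lower = 0.0
    for _ in range(max_iter):
        q = [sum(p[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # c[x] = exp(D(W(.|x) || q)); zero entries of W contribute nothing.
        c = [math.exp(sum(W[x][y] * math.log(W[x][y] / q[y])
                          for y in range(ny) if W[x][y] > 0)) for x in range(nx)]
        z = sum(p[x] * c[x] for x in range(nx))
        lower, upper = math.log(z), math.log(max(c))   # bounds on the capacity
        if upper - lower < tol:
            break
        p = [p[x] * c[x] / z for x in range(nx)]
    return lower

def channel_sum(W1, W2):
    """Block-diagonal channel on the disjoint unions of inputs and outputs."""
    ny1, ny2 = len(W1[0]), len(W2[0])
    return [list(r) + [0.0] * ny2 for r in W1] + [[0.0] * ny1 + list(r) for r in W2]

def channel_product(W1, W2):
    """Kronecker product: inputs (x1, x2), outputs (y1, y2)."""
    return [[a * b for a in r1 for b in r2] for r1 in W1 for r2 in W2]

bsc_a = [[0.9, 0.1], [0.1, 0.9]]
bsc_b = [[0.8, 0.2], [0.2, 0.8]]
c_a, c_b = capacity(bsc_a), capacity(bsc_b)
c_sum = capacity(channel_sum(bsc_a, bsc_b))
c_prod = capacity(channel_product(bsc_a, bsc_b))
print(math.exp(c_sum), math.exp(c_a) + math.exp(c_b))
print(c_prod, c_a + c_b)
```

The iteration is only a numerical approximation; for these symmetric examples it converges within a couple of steps.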
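For every W1∈DMCX,Y1, W2∈DMCX,Y2 and every 0≤α≤1, we define the α-interpolation [αW1,(1−α)W2]∈DMCX,Y1∐Y2 between W1 and W2 as:
\[ [\alpha W_1,(1-\alpha)W_2](y,i|x)=\begin{cases}\alpha\,W_1(y|x) & \text{if } i=1,\\ (1-\alpha)\,W_2(y|x) & \text{if } i=2.\end{cases} \]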
Channel interpolation arises when a channel behaves as W1 with probability α and as W2 with probability 1−α. The transmitter has no control on which behavior the channel chooses, but on the other hand, the receiver knows which one was chosen. Channel interpolations were used in [12] to construct interpolations between polar codes and Reed-Muller codes.
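Because the receiver learns which of the two channels was used, the mutual information of an interpolation is the corresponding convex combination of the two mutual informations. The sketch below checks this numerically; it assumes the interpolation is formed by scaling the rows of W1 by α and those of W2 by 1−α and concatenating them over the disjoint union of the output alphabets. Names and example channels are illustrative.

```python
import math

def mutual_information(p, W):
    """I(p, W) in nats, W[x][y] = W(y|x)."""
    ny = len(W[0])
    q = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(ny)]
    total = 0.0
    for x, px in enumerate(p):
        for y in range(ny):
            joint = px * W[x][y]
            if joint > 0:
                total += joint * math.log(W[x][y] / q[y])
    return total

def interpolate(W1, W2, alpha):
    """alpha-interpolation [alpha W1, (1 - alpha) W2] of two channels with the same input."""
    return [[alpha * a for a in r1] + [(1 - alpha) * b for b in r2]
            for r1, r2 in zip(W1, W2)]

bsc = [[0.9, 0.1], [0.1, 0.9]]                  # binary symmetric channel
bec = [[0.7, 0.3, 0.0], [0.0, 0.3, 0.7]]        # binary erasure channel
p, alpha = [0.3, 0.7], 0.25

mixed = interpolate(bsc, bec, alpha)
lhs = mutual_information(p, mixed)
rhs = alpha * mutual_information(p, bsc) + (1 - alpha) * mutual_information(p, bec)
print(lhs, rhs)
```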
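Now fix a binary operation ∗ on X. For every W∈DMCX,Y, define W−∈DMCX,Y² and W+∈DMCX,Y²×X as:
\[ W^-(y_1,y_2|u_1)=\frac{1}{|X|}\sum_{u_2\in X}W(y_1|u_1\ast u_2)\,W(y_2|u_2) \]
and
\[ W^+(y_1,y_2,u_1|u_2)=\frac{1}{|X|}\,W(y_1|u_1\ast u_2)\,W(y_2|u_2). \]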
These operations generalize Arıkan’s polarization transformations [13].
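As an illustration in the binary case with ∗ taken to be XOR, the sketch below builds W− and W+ from a binary symmetric channel and checks two well-known properties of Arıkan's transformations: the symmetric mutual informations satisfy I(W−)+I(W+)=2I(W), and I(W−)≤I(W)≤I(W+). The formulas coded here are the usual forms of the two transformations and are stated as assumptions; the function names and output indexing are illustrative.

```python
import math

def symmetric_information(W):
    """Mutual information in nats with uniform input, W[x][y] = W(y|x)."""
    nx, ny = len(W), len(W[0])
    q = [sum(W[x][y] for x in range(nx)) / nx for y in range(ny)]
    return sum(W[x][y] / nx * math.log(W[x][y] / q[y])
               for x in range(nx) for y in range(ny) if W[x][y] > 0)

def minus_transform(W, op):
    """W^-(y1, y2 | u1) = (1/|X|) * sum_{u2} W(y1 | u1*u2) W(y2 | u2)."""
    nx, ny = len(W), len(W[0])
    return [[sum(W[op(u1, u2)][y1] * W[u2][y2] for u2 in range(nx)) / nx
             for y1 in range(ny) for y2 in range(ny)]
            for u1 in range(nx)]

def plus_transform(W, op):
    """W^+(y1, y2, u1 | u2) = (1/|X|) * W(y1 | u1*u2) W(y2 | u2)."""
    nx, ny = len(W), len(W[0])
    return [[W[op(u1, u2)][y1] * W[u2][y2] / nx
             for y1 in range(ny) for y2 in range(ny) for u1 in range(nx)]
            for u2 in range(nx)]

def xor(a, b):
    return a ^ b

bsc = [[0.9, 0.1], [0.1, 0.9]]
w_minus = minus_transform(bsc, xor)
w_plus = plus_transform(bsc, xor)
i_w = symmetric_information(bsc)
i_minus = symmetric_information(w_minus)
i_plus = symmetric_information(w_plus)
print(i_minus, i_w, i_plus)
```

The conservation identity relies on (u1,u2)→(u1∗u2,u2) being a bijection, which holds for XOR but not for an arbitrary binary operation.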
Proposition 3.
We have:
• The mapping (W,V)→V∘W from DMCX,Y×DMCY,Z to DMCX,Z is continuous.
• The mapping (W1,W2)→W1⊕W2 from DMCX1,Y1×DMCX2,Y2 to DMCX1∐X2,Y1∐Y2 is continuous.
• The mapping (W1,W2)→W1⊗W2 from DMCX1,Y1×DMCX2,Y2 to DMCX1×X2,Y1×Y2 is continuous.
• The mapping (W1,W2,α)→[αW1,(1−α)W2] from DMCX,Y1×DMCX,Y2×[0,1] to DMCX,Y1∐Y2 is continuous.
• For any binary operation ∗ on X, the mapping W→W− from DMCX,Y to DMCX,Y² is continuous.
• For any binary operation ∗ on X, the mapping W→W+ from DMCX,Y to DMCX,Y²×X is continuous.
Proof.
The continuity immediately follows from the definitions.
∎
V Continuity on DMCX,Y(o)
It is well known that the parameters defined in section IV-A depend only on the RX,Y(o)-equivalence class of W. Therefore, we can define those parameters for any W^∈DMCX,Y(o) through the transcendent mapping (defined in Lemma 2). The following proposition shows that those parameters are continuous on DMCX,Y(o):
Proposition 4.
We have:
• I:ΔX×DMCX,Y(o)→R+ is continuous and concave in p.
• C:DMCX,Y(o)→R+ is continuous.
• Pe:ΔX×DMCX,Y(o)→[0,1] is continuous and concave in p.
• Z:DMCX,Y(o)→[0,1] is continuous.
• For every code C on X, Pe,C:DMCX,Y(o)→[0,1] is continuous.
• For every n>0 and every 1≤M≤∣X∣ⁿ, the mapping Pe,n,M:DMCX,Y(o)→[0,1] is continuous.
Proof.
Since the corresponding parameters are continuous on DMCX,Y (Proposition 2), Lemma 2 implies that they are continuous on DMCX,Y(o). The only cases that need a special treatment are those of I and Pe, because they are defined on the product ΔX×DMCX,Y rather than on DMCX,Y alone. We will only prove the continuity of I since the proof of continuity of Pe is similar.
Define the relation R on ΔX×DMCX,Y as
\[ (p_1,W_1)\,R\,(p_2,W_2)\ \Leftrightarrow\ p_1=p_2 \text{ and } W_1\,R^{(o)}_{X,Y}\,W_2. \]
It is easy to see that I(p,W) depends only on the R-equivalence class of (p,W). Since I is continuous on ΔX×DMCX,Y, Lemma 2 implies that the transcendent mapping of I is continuous on (ΔX×DMCX,Y)/R. On the other hand, since ΔX is locally compact and Hausdorff, Theorem 1 implies that (ΔX×DMCX,Y)/R can be identified with ΔX×(DMCX,Y/RX,Y(o))=ΔX×DMCX,Y(o) and that the two spaces have the same topology. Therefore, I is continuous on ΔX×DMCX,Y(o).
∎
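As a numerical illustration of the fact that these parameters depend only on the equivalence class, the sketch below evaluates I, Pe and Z on a channel and on a copy in which one output symbol has been split into two. Splitting an output symbol yields a channel from which the original can be recovered by merging, so the two are equivalent. The formulas used for Pe (MAP decoder) and Z (average over ordered pairs of distinct inputs) are the standard ones and are assumed to match Section IV-A; all function names are hypothetical.

```python
import math

def mutual_information(p, W):
    """I(p, W) in bits, for input distribution p and channel W[x][y] = W(y|x)."""
    q = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(len(W[0]))]
    return sum(p[x] * W[x][y] * math.log2(W[x][y] / q[y])
               for x in range(len(p)) for y in range(len(q))
               if p[x] * W[x][y] > 0)

def error_prob(p, W):
    """P_e(p, W) = 1 - sum_y max_x p(x) W(y|x) (MAP decoder)."""
    return 1 - sum(max(p[x] * W[x][y] for x in range(len(p)))
                   for y in range(len(W[0])))

def bhattacharyya(W):
    """Z(W): average of sum_y sqrt(W(y|a) W(y|b)) over ordered pairs a != b."""
    n = len(W)
    total = sum(math.sqrt(W[a][y] * W[b][y])
                for a in range(n) for b in range(n) if a != b
                for y in range(len(W[0])))
    return total / (n * (n - 1))

def split_output(W, y, lam):
    """Split output symbol y into two copies with weights lam and 1 - lam."""
    return [row[:y] + [lam * row[y], (1 - lam) * row[y]] + row[y + 1:]
            for row in W]

W = [[0.9, 0.1], [0.2, 0.8]]
W_split = split_output(W, 0, 0.25)   # an equivalent channel with three outputs
p = [0.3, 0.7]
print(mutual_information(p, W), mutual_information(p, W_split))
```

The two printed values agree up to floating-point rounding, and the same holds for `error_prob` and `bhattacharyya`.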
With the exception of channel composition, all the channel operations that were defined in Section IV-B can also be “quotiented”. We just need to realize that the equivalence class of the resulting channel depends only on the equivalence classes of the channels that were used in the operation. Let us illustrate this in the case of channel sums:
Let W1,W1′∈DMCX1,Y1 and W2,W2′∈DMCX2,Y2 and assume that W1 is degraded from W1′ and W2 is degraded from W2′. There exist V1∈DMCY1,Y1 and V2∈DMCY2,Y2 such that W1=V1∘W1′ and W2=V2∘W2′. It is easy to see that W1⊕W2=(V1⊕V2)∘(W1′⊕W2′), which shows that W1⊕W2 is degraded from W1′⊕W2′. This was proved by Shannon in [14].
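Therefore, if W1 is equivalent to W1′ and W2 is equivalent to W2′, then W1⊕W2 is equivalent to W1′⊕W2′. This allows us to define the channel sum for every $\hat{W}_1\in\operatorname{DMC}^{(o)}_{\mathcal{X}_1,\mathcal{Y}_1}$ and every $\hat{W}_2\in\operatorname{DMC}^{(o)}_{\mathcal{X}_2,\mathcal{Y}_2}$ as $\hat{W}_1\oplus\hat{W}_2=\widehat{W_1'\oplus W_2'}\in\operatorname{DMC}^{(o)}_{\mathcal{X}_1\coprod\mathcal{X}_2,\mathcal{Y}_1\coprod\mathcal{Y}_2}$ for any $W_1'\in\hat{W}_1$ and any $W_2'\in\hat{W}_2$, where $\widehat{W_1'\oplus W_2'}$ is the $R^{(o)}_{\mathcal{X}_1\coprod\mathcal{X}_2,\mathcal{Y}_1\coprod\mathcal{Y}_2}$-equivalence class of $W_1'\oplus W_2'$.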
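The identity W1⊕W2=(V1⊕V2)∘(W1′⊕W2′) used above can be checked on a small example. The sketch below assumes the matrix conventions `W[x][y] = W(y|x)`, composition as a matrix product and the sum as a block-diagonal matrix; the particular channels are arbitrary choices made for illustration.

```python
from fractions import Fraction as F

def compose(V, W):
    """(V o W)(z|x) = sum_y V(z|y) W(y|x)."""
    return [[sum(W[x][y] * V[y][z] for y in range(len(V)))
             for z in range(len(V[0]))] for x in range(len(W))]

def channel_sum(W1, W2):
    """Block-diagonal matrix representing W1 (+) W2."""
    n1, n2 = len(W1[0]), len(W2[0])
    return ([row + [F(0)] * n2 for row in W1] +
            [[F(0)] * n1 + row for row in W2])

def bsc(e):
    """Binary symmetric channel with crossover probability e."""
    return [[1 - e, e], [e, 1 - e]]

# W1 = V1 o W1' and W2 = V2 o W2' are degraded versions of W1' and W2'.
W1p, V1 = bsc(F(1, 10)), bsc(F(1, 5))
W2p, V2 = [[F(1), F(0)], [F(1, 3), F(2, 3)]], bsc(F(1, 4))
W1, W2 = compose(V1, W1p), compose(V2, W2p)

lhs = channel_sum(W1, W2)
rhs = compose(channel_sum(V1, V2), channel_sum(W1p, W2p))
assert lhs == rhs   # exact equality, since all entries are rationals
```

The check works because block-diagonal matrices multiply block by block.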
With the exception of channel composition, we can “quotient” all the channel operations of Section IV-B in a similar fashion. Moreover, we can show that they are continuous:
Proposition 5.
We have:
•
The mapping (W^1,W^2)→W^1⊕W^2 from DMCX1,Y1(o)×DMCX2,Y2(o) to DMCX1∐X2,Y1∐Y2(o) is continuous.
•
The mapping (W^1,W^2)→W^1⊗W^2 from DMCX1,Y1(o)×DMCX2,Y2(o) to DMCX1×X2,Y1×Y2(o) is continuous.
•
The mapping (W^1,W^2,α)→[αW^1,(1−α)W^2] from DMCX,Y1(o)×DMCX,Y2(o)×[0,1] to DMCX,Y1∐Y2(o) is continuous.
•
For any binary operation ∗ on X, the mapping W^→W^− from DMCX,Y(o) to DMCX,Y2(o) is continuous.
•
For any binary operation ∗ on X, the mapping W^→W^+ from DMCX,Y(o) to DMCX,Y2×X(o) is continuous.
Proof.
We only prove the continuity of the channel sum because the proof of continuity of the other operations is similar.
Let Proj:DMCX1∐X2,Y1∐Y2→DMCX1∐X2,Y1∐Y2(o) be the projection onto the RX1∐X2,Y1∐Y2(o)-equivalence classes. Define the mapping f:DMCX1,Y1×DMCX2,Y2→DMCX1∐X2,Y1∐Y2(o) as f(W1,W2)=Proj(W1⊕W2). Clearly, f is continuous.
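Now define the equivalence relation R on DMCX1,Y1×DMCX2,Y2 as:
$$(W_1,W_2)\,R\,(W_1',W_2')\;\Leftrightarrow\;W_1\,R_{\mathcal{X}_1,\mathcal{Y}_1}^{(o)}\,W_1'\ \text{and}\ W_2\,R_{\mathcal{X}_2,\mathcal{Y}_2}^{(o)}\,W_2'.$$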
The discussion before the proposition shows that f(W1,W2)=Proj(W1⊕W2) depends only on the R-equivalence class of (W1,W2). Lemma 2 now shows that the transcendent map of f defined on (DMCX1,Y1×DMCX2,Y2)/R is continuous.
Notice that (DMCX1,Y1×DMCX2,Y2)/R can be identified with DMCX1,Y1(o)×DMCX2,Y2(o). Therefore, we can define f on DMCX1,Y1(o)×DMCX2,Y2(o) through this identification. Moreover, since DMCX1,Y1 and DMCX2,Y2(o) are locally compact and Hausdorff, Corollary 1 implies that the canonical bijection between (DMCX1,Y1×DMCX2,Y2)/R and DMCX1,Y1(o)×DMCX2,Y2(o) is a homeomorphism.
Now since the mapping f on DMCX1,Y1(o)×DMCX2,Y2(o) is just the channel sum, we conclude that the mapping (W^1,W2)→W^1⊕W2 from DMCX1,Y1(o)×DMCX2,Y2(o) to DMCX1∐X2,Y1∐Y2(o) is continuous.
∎
VI Continuity in the strong topology
The following lemma provides a way to check whether a mapping defined on (DMCX,∗(o),Ts,X,∗(o)) is continuous:
Lemma 6.
Let (S,V) be an arbitrary topological space. A mapping f:DMCX,∗(o)→S is continuous on (DMCX,∗(o),Ts,X,∗(o)) if and only if it is continuous on (DMCX,[n](o),TX,[n](o)) for every n≥1.
Proof.
[TABLE]
∎
Since the channel parameters I, C, Pe, Z, Pe,C and Pe,n,M are defined on DMCX,[n](o) for every n≥1 (see Section V), they are also defined on $\operatorname{DMC}_{\mathcal{X},\ast}^{(o)}=\bigcup_{n\geq 1}\operatorname{DMC}_{\mathcal{X},[n]}^{(o)}$. The following proposition shows that those parameters are continuous in the strong topology:
Proposition 6.
Let UX be the standard topology on ΔX. We have:
•
I:ΔX×DMCX,∗(o)→R+ is continuous on (ΔX×DMCX,∗(o),UX⊗Ts,X,∗(o)) and concave in p.
•
C:DMCX,∗(o)→R+ is continuous on (DMCX,∗(o),Ts,X,∗(o)).
•
Pe:ΔX×DMCX,∗(o)→[0,1] is continuous on (ΔX×DMCX,∗(o),UX⊗Ts,X,∗(o)) and concave in p.
•
Z:DMCX,∗(o)→[0,1] is continuous on (DMCX,∗(o),Ts,X,∗(o)).
•
For every code C on X, Pe,C:DMCX,∗(o)→[0,1] is continuous on (DMCX,∗(o),Ts,X,∗(o)).
•
For every n>0 and every 1≤M≤∣X∣n, the mapping Pe,n,M:DMCX,∗(o)→[0,1] is continuous on (DMCX,∗(o),Ts,X,∗(o)).
Proof.
The continuity of C, Z, Pe,C and Pe,n,M immediately follows from Proposition 4 and Lemma 6. Since the proofs of continuity of I and Pe are similar, we only prove the continuity for I.
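Due to the distributivity of the product with respect to disjoint unions, we have
$$\Delta_{\mathcal{X}}\times\operatorname{DMC}_{\mathcal{X},\ast}=\Delta_{\mathcal{X}}\times\Big(\coprod_{n\geq 1}\operatorname{DMC}_{\mathcal{X},[n]}\Big)=\coprod_{n\geq 1}\big(\Delta_{\mathcal{X}}\times\operatorname{DMC}_{\mathcal{X},[n]}\big)$$
and
$$\mathcal{U}_{\mathcal{X}}\otimes\mathcal{T}_{s,\mathcal{X},\ast}=\mathcal{U}_{\mathcal{X}}\otimes\Big(\bigoplus_{n\geq 1}\mathcal{T}_{\mathcal{X},[n]}\Big)=\bigoplus_{n\geq 1}\big(\mathcal{U}_{\mathcal{X}}\otimes\mathcal{T}_{\mathcal{X},[n]}\big).$$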
Therefore, (ΔX×DMCX,∗,UX⊗Ts,X,∗) is the disjoint union of the spaces (ΔX×DMCX,[n])n≥1. Moreover, I is continuous on ΔX×DMCX,[n] for every n≥1. We conclude that I is continuous on (ΔX×DMCX,∗,UX⊗Ts,X,∗).
Define the relation R on ΔX×DMCX,∗ as follows: (p1,W1)R(p2,W2) if and only if p1=p2 and W1RX,∗(o)W2. Since I(p,W) depends only on the R-equivalence class of (p,W), Lemma 2 shows that the transcendent map of I is a continuous mapping from $\big((\Delta_{\mathcal{X}}\times\operatorname{DMC}_{\mathcal{X},\ast})/R,\ (\mathcal{U}_{\mathcal{X}}\otimes\mathcal{T}_{s,\mathcal{X},\ast})/R\big)$ to R+. On the other hand, since ΔX is locally compact and Hausdorff, Theorem 1 implies that $\big((\Delta_{\mathcal{X}}\times\operatorname{DMC}_{\mathcal{X},\ast})/R,\ (\mathcal{U}_{\mathcal{X}}\otimes\mathcal{T}_{s,\mathcal{X},\ast})/R\big)$ can be identified with $\big(\Delta_{\mathcal{X}}\times(\operatorname{DMC}_{\mathcal{X},\ast}/R_{\mathcal{X},\ast}^{(o)}),\ \mathcal{U}_{\mathcal{X}}\otimes(\mathcal{T}_{s,\mathcal{X},\ast}/R_{\mathcal{X},\ast}^{(o)})\big)=\big(\Delta_{\mathcal{X}}\times\operatorname{DMC}_{\mathcal{X},\ast}^{(o)},\ \mathcal{U}_{\mathcal{X}}\otimes\mathcal{T}_{s,\mathcal{X},\ast}^{(o)}\big)$. Therefore, I is continuous on (ΔX×DMCX,∗(o),UX⊗Ts,X,∗(o)).
∎
It is also possible to extend the definition of all the channel operations that were defined in section V to DMCX,∗(o). Moreover, it is possible to show that many channel operations are continuous in the strong topology:
Proposition 7.
Assume that all equivalent channel spaces are endowed with the strong topology. We have:
•
The mapping (W^1,W2)→W^1⊕W2 from DMCX1,∗(o)×DMCX2,Y2(o) to DMCX1∐X2,∗(o) is continuous.
•
The mapping (W^1,W2)→W^1⊗W2 from DMCX1,∗(o)×DMCX2,Y2(o) to DMCX1×X2,∗(o) is continuous.
•
The mapping (W^1,W^2,α)→[αW^1,(1−α)W^2] from DMCX,∗(o)×DMCX,Y2(o)×[0,1] to DMCX,∗(o) is continuous.
•
For any binary operation ∗ on X, the mapping W^→W^− from DMCX,∗(o) to DMCX,∗(o) is continuous.
•
For any binary operation ∗ on X, the mapping W^→W^+ from DMCX,∗(o) to DMCX,∗(o) is continuous.
Proof.
We only prove the continuity of the channel interpolation because the proof of the continuity of other operations is similar.
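Let U be the standard topology on [0,1]. Due to the distributivity of the product with respect to disjoint unions, we have:
$$\operatorname{DMC}_{\mathcal{X},\ast}\times\operatorname{DMC}_{\mathcal{X},\mathcal{Y}_2}\times[0,1]=\coprod_{n\geq 1}\big(\operatorname{DMC}_{\mathcal{X},[n]}\times\operatorname{DMC}_{\mathcal{X},\mathcal{Y}_2}\times[0,1]\big)$$
and
$$\mathcal{T}_{s,\mathcal{X},\ast}\otimes\mathcal{T}_{\mathcal{X},\mathcal{Y}_2}\otimes\mathcal{U}=\bigoplus_{n\geq 1}\big(\mathcal{T}_{\mathcal{X},[n]}\otimes\mathcal{T}_{\mathcal{X},\mathcal{Y}_2}\otimes\mathcal{U}\big).$$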
Therefore, the space DMCX,∗×DMCX,Y2×[0,1] is the topological disjoint union of the spaces (DMCX,[n]×DMCX,Y2×[0,1])n≥1.
For every n≥1, let Projn be the projection onto the RX,[n]∐Y2(o)-equivalence classes and let in be the canonical injection from DMCX,[n]∐Y2(o) to DMCX,∗(o).
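Define the mapping f:DMCX,∗×DMCX,Y2×[0,1]→DMCX,∗(o) as
$$f(W_1,W_2,\alpha)=i_n\big(\mathrm{Proj}_n([\alpha W_1,(1-\alpha)W_2])\big)=[\alpha\hat{W}_1,(1-\alpha)\hat{W}_2],$$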
where n is the unique integer satisfying W1∈DMCX,[n]. W^1 and W2 are the RX,[n](o) and RX,Y2(o)-equivalence classes of W1 and W2 respectively.
Due to Proposition 3 and due to the continuity of Projn and in, the mapping f is continuous on DMCX,[n]×DMCX,Y2×[0,1] for every n≥1. Therefore, f is continuous on (DMCX,∗×DMCX,Y2×[0,1],Ts,X,∗⊗TX,Y2⊗U).
Let R′ be the equivalence relation defined on DMCX,∗×DMCX,Y2 as follows: (W1,W2)R′(W1′,W2′) if and only if W1RX,∗(o)W1′ and W2RX,Y2(o)W2′. Also, define the equivalence relation R on DMCX,∗×DMCX,Y2×[0,1] as follows: (W1,W2,α)R(W1′,W2′,α′) if and only if (W1,W2)R′(W1′,W2′) and α=α′.
Since f(W1,W2,α) depends only on the R-equivalence class of (W1,W2,α), Lemma 2 implies that the transcendent mapping of f is continuous on (DMCX,∗×DMCX,Y2×[0,1])/R.
Since [0,1] is Hausdorff and locally compact, Theorem 1 implies that the canonical bijection from (DMCX,∗×DMCX,Y2×[0,1])/R to $\big((\operatorname{DMC}_{\mathcal{X},\ast}\times\operatorname{DMC}_{\mathcal{X},\mathcal{Y}_2})/R'\big)\times[0,1]$ is a homeomorphism. On the other hand, since (DMCX,∗,Ts,X,∗) and DMCX,Y2(o)=DMCX,Y2/RX,Y2(o) are Hausdorff and locally compact, Corollary 1 implies that the canonical bijection from DMCX,∗(o)×DMCX,Y2(o) to (DMCX,∗×DMCX,Y2)/R′ is a homeomorphism. We conclude that the channel interpolation is continuous on (DMCX,∗(o)×DMCX,Y2(o)×[0,1],Ts,X,∗(o)⊗TX,Y2(o)⊗U).
∎
Corollary 3.
(DMCX,∗(o),Ts,X,∗(o)) is strongly contractible to every point in DMCX,∗(o).
Proof.
Fix W^0∈DMCX,∗(o). Define the mapping H:DMCX,∗(o)×[0,1]→DMCX,∗(o) as H(W^,α)=[αW^0,(1−α)W^]. H is continuous by Proposition 7. We also have H(W^,0)=W^ and H(W^,1)=W^0 for every W^∈DMCX,∗(o). Moreover, H(W^0,α)=W^0 for every 0≤α≤1. Therefore, (DMCX,∗(o),Ts,X,∗(o)) is strongly contractible to every point in DMCX,∗(o).
∎
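The three properties of the homotopy H can be checked on representatives. The sketch below compares Blackwell measures computed under the uniform input distribution, relying on the characterization (assumed here from the earlier sections and [5]) that two channels with the same input alphabet are equivalent exactly when their Blackwell measures coincide. The matrix form of the interpolation and all function names are illustrative assumptions.

```python
from fractions import Fraction as F
from collections import defaultdict

def blackwell(W):
    """Blackwell measure of W[x][y] = W(y|x): maps each posterior to its mass."""
    n = len(W)
    mp = defaultdict(F)
    for y in range(len(W[0])):
        mass = sum(W[x][y] for x in range(n)) / n       # P_W^o(y)
        if mass > 0:                                    # skip outputs outside Im(W)
            posterior = tuple(W[x][y] / (n * mass) for x in range(n))
            mp[posterior] += mass
    return dict(mp)

def homotopy(W0, W, alpha):
    """A representative of H(W^, alpha) = [alpha W^0, (1 - alpha) W^]."""
    return [[alpha * a for a in r0] + [(1 - alpha) * b for b in r]
            for r0, r in zip(W0, W)]

W0 = [[F(9, 10), F(1, 10)], [F(1, 10), F(9, 10)]]
W = [[F(1, 2), F(1, 2), F(0)], [F(0), F(1, 2), F(1, 2)]]

assert blackwell(homotopy(W0, W, F(0))) == blackwell(W)    # H(W^, 0) = W^
assert blackwell(homotopy(W0, W, F(1))) == blackwell(W0)   # H(W^, 1) = W^0
for alpha in (F(1, 4), F(1, 2), F(3, 4)):                  # H(W^0, alpha) = W^0
    assert blackwell(homotopy(W0, W0, alpha)) == blackwell(W0)
```

Exact rational arithmetic is used so that equality of measures can be tested without tolerances.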
The reader might be wondering why channel operations such as the channel sum were not shown to be continuous on the whole space DMCX1,∗(o)×DMCX2,∗(o) instead of the smaller space DMCX1,∗(o)×DMCX2,Y2(o). The reason is that we cannot apply Corollary 1 to DMCX1,∗×DMCX2,∗ and DMCX1,∗(o)×DMCX2,∗(o), since neither DMCX1,∗(o) nor DMCX2,∗(o) is locally compact (under the strong topology).
One potential method to show the continuity of the channel sum on (DMCX1,∗(o)×DMCX2,∗(o),Ts,X1,∗(o)⊗Ts,X2,∗(o)) is as follows: let R be the equivalence relation on DMCX1,∗×DMCX2,∗ defined as (W1,W2)R(W1′,W2′) if and only if W1RX1,∗(o)W1′ and W2RX2,∗(o)W2′. We can identify (DMCX1,∗×DMCX2,∗)/R with DMCX1,∗(o)×DMCX2,∗(o) through the canonical bijection. Using Lemma 2, it is easy to see that the mapping (W^1,W^2)→W^1⊕W^2 is continuous from $\big(\operatorname{DMC}_{\mathcal{X}_1,\ast}^{(o)}\times\operatorname{DMC}_{\mathcal{X}_2,\ast}^{(o)},\ (\mathcal{T}_{s,\mathcal{X}_1,\ast}\otimes\mathcal{T}_{s,\mathcal{X}_2,\ast})/R\big)$ to (DMCX1∐X2,∗(o),Ts,X1∐X2,∗(o)).
It was shown in [15] that the topology (Ts,X1,∗⊗Ts,X2,∗)/R is homeomorphic to κ(Ts,X1,∗(o)⊗Ts,X2,∗(o)) through the canonical bijection, where κ(Ts,X1,∗(o)⊗Ts,X2,∗(o)) is the coarsest topology that is both compactly generated and finer than Ts,X1,∗(o)⊗Ts,X2,∗(o). Therefore, the mapping (W^1,W^2)→W^1⊕W^2 is continuous on $\big(\operatorname{DMC}_{\mathcal{X}_1,\ast}^{(o)}\times\operatorname{DMC}_{\mathcal{X}_2,\ast}^{(o)},\ \kappa(\mathcal{T}_{s,\mathcal{X}_1,\ast}^{(o)}\otimes\mathcal{T}_{s,\mathcal{X}_2,\ast}^{(o)})\big)$. This means that if Ts,X1,∗(o)⊗Ts,X2,∗(o) is compactly generated, we will have Ts,X1,∗(o)⊗Ts,X2,∗(o)=κ(Ts,X1,∗(o)⊗Ts,X2,∗(o)) and so the channel sum will be continuous on (DMCX1,∗(o)×DMCX2,∗(o),Ts,X1,∗(o)⊗Ts,X2,∗(o)). Note that although Ts,X1,∗(o) and Ts,X2,∗(o) are compactly generated, their product Ts,X1,∗(o)⊗Ts,X2,∗(o) might not be compactly generated.
VII Continuity in the noisiness/weak-∗ and the total variation topologies
We need to express the channel parameters and operations in terms of the Blackwell measures.
VII-A Channel parameters
The following proposition shows that many channel parameters can be expressed as an integral of a continuous function with respect to the Blackwell measure:
Proposition 8.
For every W^∈DMCX,∗(o), we have:
[TABLE]
[TABLE]
[TABLE]
[TABLE]
where H(p) is the entropy of p, and $\mathrm{MP}_{\hat{W}}^{n}$ is the product measure on $\Delta_{\mathcal{X}}^{n}$ obtained by multiplying $\mathrm{MP}_{\hat{W}}$ with itself n times. Note that we adopt the standard convention that $0\log\frac{0}{0}=0$.
Proof.
By choosing any representative channel W∈W^ and replacing W(y∣x) by $|\mathcal{X}|\,P_W^o(y)\,W_y^{-1}(x)$ in the definitions of the channel parameters, all the above formulas immediately follow. Let us show how this works for Pe:
[TABLE]
where (a) is true because W(y∣x)=0 for y∉Im(W).
∎
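The substitution described in the proof can be tried numerically for Pe. Starting from Pe(p,W) = 1 − Σ_y max_x p(x)W(y|x) and replacing W(y|x) by |X| P_W^o(y) W_y^{-1}(x) gives Pe = 1 − |X| ∫ max_x p(x)q(x) dMP_W(q). This is our reading of the elided display, stated here as an assumption. The sketch below compares both expressions in exact arithmetic; function names are illustrative.

```python
from fractions import Fraction as F
from collections import defaultdict

def blackwell(W):
    """Blackwell measure of W[x][y] = W(y|x) under the uniform input."""
    n = len(W)
    mp = defaultdict(F)
    for y in range(len(W[0])):
        mass = sum(W[x][y] for x in range(n)) / n       # P_W^o(y)
        if mass > 0:
            mp[tuple(W[x][y] / (n * mass) for x in range(n))] += mass
    return dict(mp)

def pe_direct(p, W):
    """P_e(p, W) computed from the transition probabilities."""
    return 1 - sum(max(p[x] * W[x][y] for x in range(len(p)))
                   for y in range(len(W[0])))

def pe_blackwell(p, mp):
    """P_e(p, W) computed from the Blackwell measure only."""
    n = len(p)
    return 1 - n * sum(mass * max(p[x] * q[x] for x in range(n))
                       for q, mass in mp.items())

W = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]
p = [F(1, 3), F(2, 3)]
assert pe_direct(p, W) == pe_blackwell(p, blackwell(W))
```

Because `pe_blackwell` never looks at the output alphabet, it returns the same value for every representative of the equivalence class.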
Proposition 9.
Let UX be the standard topology on ΔX. We have:
•
I:ΔX×DMCX,∗(o)→R+ is continuous on (ΔX×DMCX,∗(o),UX⊗TX,∗(o)) and concave in p.
•
C:DMCX,∗(o)→R+ is continuous on (DMCX,∗(o),TX,∗(o)).
•
Pe:ΔX×DMCX,∗(o)→[0,1] is continuous on (ΔX×DMCX,∗(o),UX⊗TX,∗(o)) and concave in p.
•
Z:DMCX,∗(o)→[0,1] is continuous on (DMCX,∗(o),TX,∗(o)).
•
For every code C on X, Pe,C:DMCX,∗(o)→[0,1] is continuous on (DMCX,∗(o),TX,∗(o)).
•
For every n>0 and every 1≤M≤∣X∣n, the mapping Pe,n,M:DMCX,∗(o)→[0,1] is continuous on (DMCX,∗(o),TX,∗(o)).
Proof.
We endow the space MP(X) with the weak-∗ topology. Define the mapping
[TABLE]
as follows:
[TABLE]
Lemma 5 implies that I is continuous. On the other hand, Proposition 8 shows that I(p,W^)=I(p,MPW^). Therefore, I is continuous on (ΔX×DMCX,∗(o),UX⊗TX,∗(o)). We can prove the continuity of Pe and Z similarly.
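Now define the mapping C:MP(X)→R as
$$C(\mathrm{MP})=\sup_{p\in\Delta_{\mathcal{X}}}I(p,\mathrm{MP}).$$
Fix MP∈MP(X) and let ϵ>0. Since MP(X) is compact (under the weak-∗ topology), Lemma 1 implies the existence of a weakly-∗ open neighborhood UMP of MP such that ∣I(p,MP)−I(p,MP′)∣<ϵ for every MP′∈UMP and every p∈ΔX. Therefore, for every MP′∈UMP and every p∈ΔX, we have
$$I(p,\mathrm{MP})<I(p,\mathrm{MP}')+\epsilon\leq\sup_{p'\in\Delta_{\mathcal{X}}}I(p',\mathrm{MP}')+\epsilon=C(\mathrm{MP}')+\epsilon,$$
hence,
$$C(\mathrm{MP})=\sup_{p\in\Delta_{\mathcal{X}}}I(p,\mathrm{MP})\leq C(\mathrm{MP}')+\epsilon.$$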
Similarly, we can show that C(MP′)≤C(MP)+ϵ. This shows that ∣C(MP′)−C(MP)∣≤ϵ for every MP′∈UMP. Therefore, C is continuous. But C(W^)=C(MPW^), so C is continuous on (DMCX,∗(o),TX,∗(o)).
Now for every 0≤i≤n, define the mapping fi:ΔXi×MP(X)→R backward-recursively as follows:
•
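$f_n(p_1^n,\mathrm{MP})=\max_{x_1^n\in\mathcal{C}}\prod_{i=1}^{n}p_i(x_i)$.
•
For every 0≤i<n, define
$$f_i(p_1^i,\mathrm{MP})=\int_{\Delta_{\mathcal{X}}}f_{i+1}(p_1^{i+1},\mathrm{MP})\,d\mathrm{MP}(p_{i+1}).$$
Clearly fn is continuous. Now let 0≤i<n and assume that fi+1 is continuous. If we let S=ΔXi×MP(X), Lemma 5 implies that the mapping Fi:ΔXi×MP(X)×MP(X)→R defined as
$$F_i(p_1^i,\mathrm{MP},\mathrm{MP}')=\int_{\Delta_{\mathcal{X}}}f_{i+1}(p_1^{i+1},\mathrm{MP})\,d\mathrm{MP}'(p_{i+1})$$
is continuous. But fi(p1i,MP)=Fi(p1i,MP,MP), so fi is also continuous. Therefore, f0 is continuous. By noticing that $P_{e,\mathcal{C}}(\hat{W})=1-\frac{|\mathcal{X}|^{n}}{|\mathcal{C}|}f_0(\mathrm{MP}_{\hat{W}})$, we conclude that Pe,C is continuous on (DMCX,∗(o),TX,∗(o)). Moreover, since Pe,n,M is the minimum of a finite family of continuous mappings, it is continuous.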
∎
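The identity relating Pe,C to f0 can be checked for a small code. The sketch below assumes that Pe,C is the error probability of the maximum-likelihood decoder with a uniformly chosen codeword, and evaluates f0 as a single sum over n-tuples of atoms of the Blackwell measure, which is the unrolled form of the backward recursion rather than the recursion itself. Function names are illustrative.

```python
from fractions import Fraction as F
from collections import defaultdict
from itertools import product
from math import prod

def blackwell(W):
    """Blackwell measure of W[x][y] = W(y|x) under the uniform input."""
    n = len(W)
    mp = defaultdict(F)
    for y in range(len(W[0])):
        mass = sum(W[x][y] for x in range(n)) / n
        if mass > 0:
            mp[tuple(W[x][y] / (n * mass) for x in range(n))] += mass
    return dict(mp)

def pe_code_direct(W, code):
    """Error probability of ML decoding; code is a list of tuples of input symbols."""
    n = len(code[0])
    correct = sum(max(prod(W[x[i]][y[i]] for i in range(n)) for x in code)
                  for y in product(range(len(W[0])), repeat=n))
    return 1 - correct / len(code)

def pe_code_blackwell(mp, size_x, code):
    """1 - (|X|^n / |C|) * f_0(MP), with f_0 an n-fold integral against MP."""
    n = len(code[0])
    f0 = F(0)
    for atoms in product(mp.items(), repeat=n):    # atoms[i] = (posterior q_i, mass)
        weight = prod(mass for _, mass in atoms)
        f0 += weight * max(prod(atoms[i][0][x[i]] for i in range(n)) for x in code)
    return 1 - F(size_x ** n, len(code)) * f0

W = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]
repetition = [(0, 0), (1, 1)]
assert pe_code_direct(W, repetition) == pe_code_blackwell(blackwell(W), 2, repetition)
```

The cost of the sum grows exponentially in n, so this is only meant as a sanity check on short codes.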
It is worth mentioning that Proposition 6 can be shown from Proposition 9 because the noisiness topology is coarser than the strong topology.
Corollary 4.
All the mappings in Proposition 9 are also continuous if we replace the noisiness topology TX,∗(o) with the total variation topology TTV,X,∗(o).
Proof.
This is true because TTV,X,∗(o) is finer than TX,∗(o).
∎
VII-B Channel operations
In the following, we show that we can express the channel operations in terms of Blackwell measures. We have all the tools to achieve this for the channel sum, channel product and channel interpolation. In order to express the channel polarization transformations in terms of the Blackwell measures, we need to introduce new definitions.
Let ∗ be a binary operation on a finite set X. We say that ∗ is uniformity preserving if the mapping (a,b)→(a∗b,b) is a bijection from X2 to itself [16]. For every a,b∈X, we denote the unique element c∈X satisfying c∗b=a as c=a/∗b. Note that /∗ is a binary operation and it is uniformity preserving. /∗ is called the right-inverse of ∗. It was shown in [9] that a binary operation is polarizing if and only if it is uniformity preserving and its right-inverse is strongly ergodic.
Binary operations that are not uniformity preserving are not interesting for polarization theory because they do not preserve the symmetric capacity [9]. Therefore, we will only focus on polarization transformations that are based on uniformity preserving operations.
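The definition of a uniformity preserving operation and of its right-inverse can be checked mechanically on a small alphabet. The following sketch uses hypothetical helper names and two example operations on {0,1,2}: addition modulo 3 (uniformity preserving) and multiplication modulo 3 (not uniformity preserving).

```python
def is_uniformity_preserving(op, X):
    """Check that (a, b) -> (a * b, b) is a bijection from X^2 to itself."""
    image = {(op(a, b), b) for a in X for b in X}
    return len(image) == len(X) ** 2

def right_inverse(op, X):
    """Return /*, where a /* b is the unique c with c * b == a.

    Only meaningful when op is uniformity preserving.
    """
    table = {(op(c, b), b): c for c in X for b in X}
    return lambda a, b: table[(a, b)]

X = range(3)
add3 = lambda a, b: (a + b) % 3
mul3 = lambda a, b: (a * b) % 3

assert is_uniformity_preserving(add3, X)
assert not is_uniformity_preserving(mul3, X)   # b = 0 collapses every a to 0

sub3 = right_inverse(add3, X)                  # here a /* b = (a - b) mod 3
assert all(add3(sub3(a, b), b) == a for a in X for b in X)
assert is_uniformity_preserving(sub3, X)
```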
Let ∗ be a fixed uniformity preserving operation on X. Define the mapping C−,∗:ΔX×ΔX→ΔX as
$$(C^{-,\ast}(p_1,p_2))(u_1)=\sum_{u_2\in\mathcal{X}}p_1(u_1\ast u_2)\,p_2(u_2).$$
The probability distribution C−,∗(p1,p2) can be interpreted as follows: let X1 and X2 be two independent random variables in X that are distributed as p1 and p2 respectively, and let (U1,U2) be the random pair in X2 defined as (U1,U2)=(X1/∗X2,X2), or equivalently (X1,X2)=(U1∗U2,U2). C−,∗(p1,p2) is the probability distribution of U1.
Clearly, C−,∗ is continuous. Therefore, the push-forward mapping C#−,∗ is continuous from P(ΔX×ΔX) to P(ΔX)=MP(X) under both the weak-∗ and the total variation topologies (see Section II-F). For every MP1,MP2∈MP(X), we define the (−,∗)-convolution of MP1 and MP2 as:
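$$(\mathrm{MP}_1,\mathrm{MP}_2)^{-,\ast}=C^{-,\ast}_{\#}(\mathrm{MP}_1\times\mathrm{MP}_2)\in\mathcal{MP}(\mathcal{X}).$$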
Since the product of meta-probability measures is continuous under both the weak-∗ and the total variation topologies (Appendices B and F), the (−,∗)-convolution is also continuous under these topologies.
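For finitely supported measures, the (−,∗)-convolution reduces to a finite sum. The sketch below follows the probabilistic interpretation given above: C−,∗(p1,p2) is the law of U1 when (X1,X2)=(U1∗U2,U2), so its value at u1 is Σ_{u2} p1(u1∗u2)p2(u2). A meta-probability measure is represented as a dictionary from posteriors to masses; this representation and the function names are assumptions made for illustration.

```python
from fractions import Fraction as F
from collections import defaultdict
from itertools import product

def c_minus(p1, p2, op):
    """Distribution of U1 = X1 /* X2 for independent X1 ~ p1 and X2 ~ p2."""
    n = len(p1)
    return tuple(sum(p1[op(u1, u2)] * p2[u2] for u2 in range(n))
                 for u1 in range(n))

def minus_convolution(mp1, mp2, op):
    """Push-forward of the product measure mp1 x mp2 by c_minus."""
    out = defaultdict(F)
    for (q1, m1), (q2, m2) in product(mp1.items(), mp2.items()):
        out[c_minus(q1, q2, op)] += m1 * m2
    return dict(out)

xor = lambda a, b: a ^ b
p1 = (F(9, 10), F(1, 10))
p2 = (F(4, 5), F(1, 5))
print(c_minus(p1, p2, xor))   # the pair (37/50, 13/50) as Fraction objects
```

Each coordinate of `c_minus` is a polynomial in the entries of p1 and p2, which is why C−,∗ is "clearly" continuous.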
For every p1,p2∈ΔX and every u1∈supp(C−,∗(p1,p2)), define C+,u1,∗(p1,p2)∈ΔX as
$$(C^{+,u_1,\ast}(p_1,p_2))(u_2)=\frac{p_1(u_1\ast u_2)\,p_2(u_2)}{(C^{-,\ast}(p_1,p_2))(u_1)}.$$
The probability distribution C+,u1,∗(p1,p2) can be interpreted as follows: if X1,X2,U1 and U2 are as above, C+,u1,∗(p1,p2) is the conditional probability distribution of U2 given U1=u1.
Define the mapping C+,∗:ΔX×ΔX→P(ΔX)=MP(X) as follows:
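$$C^{+,\ast}(p_1,p_2)=\sum_{u_1\in\operatorname{supp}(C^{-,\ast}(p_1,p_2))}(C^{-,\ast}(p_1,p_2))(u_1)\cdot\delta_{C^{+,u_1,\ast}(p_1,p_2)},$$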
where δC+,u1,∗(p1,p2) is a Dirac measure centered at C+,u1,∗(p1,p2).
If X1,X2,U1 and U2 are as above, C+,∗(p1,p2) is the meta-probability measure that describes the possible conditional probability distributions of U2 that are seen by someone having knowledge of U1. Clearly, C+,∗ is a random mapping from ΔX×ΔX to ΔX. In Appendix H, we show that C+,∗ is a measurable random mapping. We also show in Appendix H that C+,∗ is a continuous mapping from ΔX×ΔX to MP(X) when the latter space is endowed with the weak-∗ topology. Lemmas 3 and 4 now imply that the push-forward mapping C#+,∗ is continuous under both the weak-∗ and the total variation topologies.
For every MP1,MP2∈MP(X), we define the (+,∗)-convolution of MP1 and MP2 as:
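$$(\mathrm{MP}_1,\mathrm{MP}_2)^{+,\ast}=C^{+,\ast}_{\#}(\mathrm{MP}_1\times\mathrm{MP}_2)\in\mathcal{MP}(\mathcal{X}).$$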
Since the product of meta-probability measures is continuous under both the weak-∗ and the total variation topologies (Appendices B and F), the (+,∗)-convolution is also continuous under these topologies.
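The random mapping C+,∗ can be sketched in the same way. Following the interpretation above, for each u1 in the support of C−,∗(p1,p2) it places mass P(U1=u1) on the conditional distribution of U2 given U1=u1. The dictionary representation of a meta-probability measure and the function name are illustrative assumptions.

```python
from fractions import Fraction as F
from collections import defaultdict

def c_plus(p1, p2, op):
    """C^{+,*}(p1, p2): maps each posterior of U2 given U1 = u1 to P(U1 = u1)."""
    n = len(p1)
    out = defaultdict(F)
    for u1 in range(n):
        joint = [p1[op(u1, u2)] * p2[u2] for u2 in range(n)]   # P(U1 = u1, U2 = u2)
        mass = sum(joint)                                      # (C^{-,*}(p1, p2))(u1)
        if mass > 0:                                           # u1 in the support
            out[tuple(j / mass for j in joint)] += mass
    return dict(out)

xor = lambda a, b: a ^ b
p1 = (F(9, 10), F(1, 10))
p2 = (F(4, 5), F(1, 5))
measure = c_plus(p1, p2, xor)
assert measure == {(F(36, 37), F(1, 37)): F(37, 50),
                   (F(4, 13), F(9, 13)): F(13, 50)}
```

The individual posteriors are not continuous where the mass of u1 vanishes, but the atom then carries zero weight, which is consistent with the weak-∗ continuity established in Appendix H.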
Proposition 10.
We have:
•
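For every W^1∈DMCX1,∗(o) and W^2∈DMCX2,∗(o), we have:
$$\mathrm{MP}_{\hat{W}_1\oplus\hat{W}_2}=\frac{|\mathcal{X}_1|}{|\mathcal{X}_1|+|\mathcal{X}_2|}\,\mathrm{MP}'_{\hat{W}_1}+\frac{|\mathcal{X}_2|}{|\mathcal{X}_1|+|\mathcal{X}_2|}\,\mathrm{MP}'_{\hat{W}_2},$$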
where MPW^1′ (respectively MPW^2′) is the meta-push-forward of MPW^1 (respectively MPW^2) by the canonical injection from X1 (respectively X2) to X1∐X2.
•
For every W^1∈DMCX1,∗(o) and W2∈DMCX2,∗(o), we have:
[TABLE]
•
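For every α∈[0,1] and every W^1,W^2∈DMCX,∗(o), we have
$$\mathrm{MP}_{[\alpha\hat{W}_1,(1-\alpha)\hat{W}_2]}=\alpha\,\mathrm{MP}_{\hat{W}_1}+(1-\alpha)\,\mathrm{MP}_{\hat{W}_2}.$$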
•
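For every uniformity preserving binary operation ∗ on X, and every W^∈DMCX,∗(o), the Blackwell measure of W^− is the (−,∗)-convolution of MPW^ with itself:
$$\mathrm{MP}_{\hat{W}^-}=(\mathrm{MP}_{\hat{W}},\mathrm{MP}_{\hat{W}})^{-,\ast}.$$
•
For every uniformity preserving binary operation ∗ on X, and every W^∈DMCX,∗(o), the Blackwell measure of W^+ is the (+,∗)-convolution of MPW^ with itself:
$$\mathrm{MP}_{\hat{W}^+}=(\mathrm{MP}_{\hat{W}},\mathrm{MP}_{\hat{W}})^{+,\ast}.$$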
Note that the polarization transformation formulas in Proposition 10 generalize the formulas given by Raginsky in [17] for binary-input channels.
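The polarization formulas can be sanity-checked on a small channel by building W− and W+ explicitly and comparing their Blackwell measures with the push-forwards of MP_W × MP_W by C−,∗ and C+,∗. The sketch assumes the usual Arıkan-style definitions W−(y1,y2|u1) = (1/|X|) Σ_{u2} W(y1|u1∗u2)W(y2|u2) and W+(y1,y2,u1|u2) = (1/|X|) W(y1|u1∗u2)W(y2|u2); these are taken to agree with Section IV-B, and all names are illustrative.

```python
from fractions import Fraction as F
from collections import defaultdict
from itertools import product

def blackwell(W):
    """Blackwell measure of W[x][y] = W(y|x) under the uniform input."""
    n, mp = len(W), defaultdict(F)
    for y in range(len(W[0])):
        mass = sum(W[x][y] for x in range(n)) / n
        if mass > 0:
            mp[tuple(W[x][y] / (n * mass) for x in range(n))] += mass
    return dict(mp)

def minus_channel(W, op):
    n, ys = len(W), range(len(W[0]))
    return [[sum(W[op(u1, u2)][y1] * W[u2][y2] for u2 in range(n)) / n
             for y1 in ys for y2 in ys] for u1 in range(n)]

def plus_channel(W, op):
    n, ys = len(W), range(len(W[0]))
    return [[W[op(u1, u2)][y1] * W[u2][y2] / n
             for y1 in ys for y2 in ys for u1 in range(n)] for u2 in range(n)]

def minus_convolution(mp1, mp2, op):
    out = defaultdict(F)
    for (q1, m1), (q2, m2) in product(mp1.items(), mp2.items()):
        n = len(q1)
        post = tuple(sum(q1[op(u1, u2)] * q2[u2] for u2 in range(n)) for u1 in range(n))
        out[post] += m1 * m2
    return dict(out)

def plus_convolution(mp1, mp2, op):
    out = defaultdict(F)
    for (q1, m1), (q2, m2) in product(mp1.items(), mp2.items()):
        n = len(q1)
        for u1 in range(n):
            joint = [q1[op(u1, u2)] * q2[u2] for u2 in range(n)]
            mass = sum(joint)
            if mass > 0:
                out[tuple(j / mass for j in joint)] += m1 * m2 * mass
    return dict(out)

xor = lambda a, b: a ^ b
W = [[F(9, 10), F(1, 10)], [F(1, 5), F(4, 5)]]
mp = blackwell(W)
assert blackwell(minus_channel(W, xor)) == minus_convolution(mp, mp, xor)
assert blackwell(plus_channel(W, xor)) == plus_convolution(mp, mp, xor)
```

The right-hand sides never touch the output alphabet of W, so they depend only on the equivalence class of W.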
Proposition 11.
Assume that all equivalent channel spaces are endowed with the noisiness/weak-∗ or the total variation topology. We have:
•
The mapping (W^1,W2)→W^1⊕W2 from DMCX1,∗(o)×DMCX2,∗(o) to DMCX1∐X2,∗(o) is continuous.
•
The mapping (W^1,W2)→W^1⊗W2 from DMCX1,∗(o)×DMCX2,∗(o) to DMCX1×X2,∗(o) is continuous.
•
The mapping (W^1,W^2,α)→[αW^1,(1−α)W^2] from DMCX,∗(o)×DMCX,∗(o)×[0,1] to DMCX,∗(o) is continuous.
•
For every uniformity preserving binary operation ∗ on X, the mapping W^→W^− from DMCX,∗(o) to DMCX,∗(o) is continuous.
•
For every uniformity preserving binary operation ∗ on X, the mapping W^→W^+ from DMCX,∗(o) to DMCX,∗(o) is continuous.
Proof.
The proposition directly follows from Proposition 10 and the fact that all the meta-probability measure operations that are involved in the formulas are continuous under both the weak-∗ and the total variation topologies.
∎
Corollary 5.
Both (DMCX,∗(o),TX,∗(o)) and (DMCX,∗(o),TTV,X,∗(o)) are strongly contractible to every point in DMCX,∗(o).
Sections V and VI show that the quotient topology is relatively easy to work with. If one is interested in the space of equivalent channels sharing the same input and output alphabets, then using the quotient formulation of the topology seems to be the easiest way to prove theorems.
The continuity of the channel sum and the channel product on the whole product space (DMCX1,∗(o)×DMCX2,∗(o),Ts,X1,∗(o)⊗Ts,X2,∗(o)) remains an open problem. As we mentioned in Section VI, it is sufficient to prove that the product topology Ts,X1,∗(o)⊗Ts,X2,∗(o) is compactly generated.
Acknowledgment
I would like to thank Emre Telatar and Mohammad Bazzi for helpful discussions. I am also grateful to Maxim Raginsky for his comments.
Fix ϵ>0 and let (s,t)∈S×T. Since f is continuous, there exists a neighborhood Os,t of (s,t) in S×T such that for every (s′,t′)∈Os,t, we have ∣f(s′,t′)−f(s,t)∣<ϵ/2. Moreover, since products of open sets form a base for the product topology, there exists an open neighborhood Vs,t of s in (S,V) and an open neighborhood Us,t of t in T such that Vs,t×Us,t⊂Os,t.
Since (S,V) and (T,U) are compact, the product space is also compact. On the other hand, we have $\bigcup_{(s,t)\in S\times T}V_{s,t}\times U_{s,t}=S\times T$, so {Vs,t×Us,t}(s,t)∈S×T is an open cover of S×T. Therefore, there exist s1,…,sn∈S and t1,…,tn∈T such that $\bigcup_{i=1}^{n}V_{s_i,t_i}\times U_{s_i,t_i}=S\times T$.
Now fix s∈S and define $V_s=\bigcap_{1\leq i\leq n,\ s\in V_{s_i,t_i}}V_{s_i,t_i}$. Since Vs is the intersection of finitely many open sets containing s, Vs is an open neighborhood of s in (S,V). Let s′∈Vs and t∈T. Since $\bigcup_{i=1}^{n}V_{s_i,t_i}\times U_{s_i,t_i}=S\times T$, there exists 1≤i≤n such that (s,t)∈Vsi,ti×Usi,ti⊂Osi,ti. Since s∈Vsi,ti, we have Vs⊂Vsi,ti and so s′∈Vsi,ti. Therefore, (s′,t)∈Vsi,ti×Usi,ti⊂Osi,ti, hence
$$|f(s',t)-f(s,t)|\leq|f(s',t)-f(s_i,t_i)|+|f(s_i,t_i)-f(s,t)|<\frac{\epsilon}{2}+\frac{\epsilon}{2}=\epsilon.$$
But this is true for every t∈T. Therefore,
$$\sup_{t\in T}|f(s',t)-f(s,t)|\leq\epsilon.$$
Appendix B Continuity of the product of measures
For every subset A of M1×M2 and every x1∈M1, define A2x1={x2∈M2:(x1,x2)∈A}. Similarly, for every x2∈M2, define A1x2={x1∈M1:(x1,x2)∈A}. Let P1,P1′∈P(M1,Σ1) and P2,P2′∈P(M2,Σ2). We have:
[TABLE]
This shows that the product of measures is continuous under the total variation topology.
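On finite spaces, the continuity of the product under total variation can be illustrated with the standard bound ‖P1×P2 − P1′×P2′‖TV ≤ ‖P1−P1′‖TV + ‖P2−P2′‖TV. The exact form of the inequality in the elided display may differ; the bound below is the textbook version and is used only as an illustration, with ‖·‖TV taken as sup_A |P(A)−Q(A)|.

```python
from fractions import Fraction as F

def tv(P, Q):
    """Total variation distance sup_A |P(A) - Q(A)| on a finite set."""
    return sum(abs(a - b) for a, b in zip(P, Q)) / 2

def product_measure(P1, P2):
    """Product measure on M1 x M2, listed in row-major order."""
    return [a * b for a in P1 for b in P2]

P1, P1p = [F(1, 2), F(1, 2)], [F(1, 3), F(2, 3)]
P2, P2p = [F(1, 4), F(3, 4)], [F(1, 2), F(1, 2)]

lhs = tv(product_measure(P1, P2), product_measure(P1p, P2p))
rhs = tv(P1, P1p) + tv(P2, P2p)
assert lhs <= rhs
```

The bound follows from writing P1×P2 − P1′×P2′ = (P1−P1′)×P2 + P1′×(P2−P2′) and applying the triangle inequality.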
For every n≥0, define the mapping gn:M′→R+ as follows:
[TABLE]
Clearly, for every y∈M′ we have:
•
gn(y)≤g(y) for all n≥0.
•
gn(y)≤gn+1(y) for all n≥0.
•
$\lim_{n\to\infty}g_n(y)=g(y)$.
Moreover, for every fixed n≥0, we have:
•
gn is Σ′-measurable.
•
gn takes values in $\{\frac{i}{2^n}:\ 0\leq i\leq n2^n\}$.
For every $0\leq i\leq n2^n$, let $B_{i,n}=\{y\in M':\ g_n(y)=\frac{i}{2^n}\}$. Since gn is Σ′-measurable, we have Bi,n∈Σ′ for every $0\leq i\leq n2^n$. Now for every n≥0, define the mapping Gn:M→R∪{+∞} as follows:
[TABLE]
Since the random mapping R is measurable and since Bi,n∈Σ′, the mapping RBi,n is Σ-measurable for every $0\leq i\leq n2^n$. Therefore, Gn is Σ-measurable for every n≥0. Moreover, for every x∈M, we have:
[TABLE]
where (a) follows from the monotone convergence theorem. We conclude that G is Σ-measurable because it is the point-wise limit of Σ-measurable functions. On the other hand, we have
[TABLE]
Therefore,
[TABLE]
where (a) and (b) follow from the monotone convergence theorem.
Appendix D Continuity of the push-forward by a random mapping
Let R be a measurable random mapping from (M,Σ) to (M′,Σ′). Let P1,P2∈P(M,Σ). Define the signed measure μ=P1−P2 and let {μ+,μ−} be the Jordan measure decomposition of μ. It is easy to see that ∥P1−P2∥TV=μ+(M)=μ−(M). For every B∈Σ′, we have:
[TABLE]
where (a) follows from the fact that ∣RB(x)∣=∣(R(x))(B)∣≤1 for every x∈M. We can similarly show that
[TABLE]
Therefore,
[TABLE]
This shows that the push-forward mapping R# from P(M,Σ) to P(M′,Σ′) is continuous under the total variation topology. This concludes the proof of Lemma 3.
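In the finite case, a random mapping is a row-stochastic matrix and the argument above amounts to the fact that the push-forward does not increase the total variation distance. The sketch below checks ‖R#P1 − R#P2‖TV ≤ ‖P1−P2‖TV on one example; the matrix representation and function names are assumptions made for illustration.

```python
from fractions import Fraction as F

def tv(P, Q):
    """Total variation distance sup_A |P(A) - Q(A)| on a finite set."""
    return sum(abs(a - b) for a, b in zip(P, Q)) / 2

def push_forward(P, R):
    """(R_# P)(y) = sum_x P(x) (R(x))(y), with R[x] the distribution R(x)."""
    return [sum(P[x] * R[x][y] for x in range(len(P)))
            for y in range(len(R[0]))]

R = [[F(1), F(0)], [F(1, 2), F(1, 2)], [F(0), F(1)]]   # random mapping from 3 points to 2
P1 = [F(1, 2), F(1, 4), F(1, 4)]
P2 = [F(1, 4), F(1, 4), F(1, 2)]

assert tv(push_forward(P1, R), push_forward(P2, R)) <= tv(P1, P2)
```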
Now assume that U is a Polish topology on M and U′ is an arbitrary topology on M′. Let R be a measurable random mapping from (M,B(M)) to (M′,B(M′)). Moreover, assume that R is a continuous mapping from (M,U) to P(M′,B(M′)) when the latter space is endowed with the weak-∗ topology. Let (Pn)n≥0 be a sequence of probability measures in P(M,B(M)) that weakly-∗ converges to P∈P(M,B(M)).
Let g:M′→R be a bounded and continuous mapping. Define the mapping G:M→R as follows:
$$G(x)=\int_{M'}g(y)\,d(R(x))(y).$$
For every sequence (xn)n≥0 converging to x in M, the sequence (R(xn))n≥0 weakly-∗ converges to R(x) in P(M′,B(M′)) because of the continuity of R. This implies that the sequence (G(xn))n≥0 converges to G(x). Since U is a Polish topology (hence metrizable and sequential [18]), this shows that G is a bounded and continuous mapping from (M,U) to R. Therefore, we have:
[TABLE]
where (a) and (c) follow from Corollary 2, and (b) follows from the fact that (Pn)n≥0 weakly-∗ converges to P. This shows that (R#Pn)n≥0 weakly-∗ converges to R#P. Now since U is Polish, the weak-∗ topology on P(M,B(M)) is metrizable [19], hence it is sequential [18]. This shows that the push-forward mapping R# from P(M,B(M)) to P(M′,B(M′)) is continuous under the weak-∗ topology.
For every s∈S, define the mapping fs:ΔX→R as fs(p)=f(s,p). Clearly fs is continuous for every s∈S. Therefore, the mapping Fs:MP(X)→R defined as
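$$F_s(\mathrm{MP})=\int_{\Delta_{\mathcal{X}}}f_s(p)\,d\mathrm{MP}(p)=\int_{\Delta_{\mathcal{X}}}f(s,p)\,d\mathrm{MP}(p)$$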
is continuous in the weak-∗ topology of MP(X).
Fix ϵ>0 and let (s,MP)∈S×MP(X). Since Fs is continuous, there exists a weakly-∗ open neighborhood Us,MP of MP such that ∣Fs(MP′)−Fs(MP)∣<ϵ/2 for every MP′∈Us,MP. On the other hand, Lemma 1 implies the existence of an open neighborhood Vs of s in (S,V) such that for every s′∈Vs we have
$$\sup_{p\in\Delta_{\mathcal{X}}}|f(s',p)-f(s,p)|\leq\frac{\epsilon}{2}.$$
Clearly Vs×Us,MP is an open neighborhood of (s,MP) in S×MP(X). For every (s′,MP′)∈Vs×Us,MP, we have
$$|F(s',\mathrm{MP}')-F(s,\mathrm{MP})|\leq\int_{\Delta_{\mathcal{X}}}|f(s',p)-f(s,p)|\,d\mathrm{MP}'(p)+|F_s(\mathrm{MP}')-F_s(\mathrm{MP})|\overset{(a)}{<}\frac{\epsilon}{2}+\frac{\epsilon}{2}=\epsilon,$$
where (a) follows from the fact that MP′ is a meta-probability measure and ∣f(s′,p)−f(s,p)∣≤ϵ/2 for every p∈ΔX. We conclude that F is continuous.
Appendix F Weak-∗ continuity of the product of meta-probability measures
Let (MP1,n)n≥0 and (MP2,n)n≥0 be two sequences that weakly-∗ converge to MP1 and MP2 in MP(X1) and MP(X2) respectively. Let f:ΔX1×ΔX2→R be a continuous and bounded mapping. Define the mapping F:ΔX1×MP(X2)→R as follows:
$$F(p_1,\mathrm{MP}_2')=\int_{\Delta_{\mathcal{X}_2}}f(p_1,p_2)\,d\mathrm{MP}_2'(p_2).$$
Fix ϵ>0. Since f(p1,p2) is continuous, Lemma 5 implies that F is continuous. Therefore, the mapping p1→F(p1,MP2) is continuous on ΔX1, which implies that it is also bounded because ΔX1 is compact. Therefore,
[TABLE]
because (MP1,n)n≥0 weakly-∗ converges to MP1. This means that there exists n1≥0 such that for every n≥n1, we have
[TABLE]
On the other hand, since F is continuous and since MP(X2) is compact under the weak-∗ topology [19], Lemma 1 implies the existence of a weakly-∗ open neighborhood UMP2 of MP2 such that
∣F(p1,MP2′)−F(p1,MP2)∣≤ϵ/2 for every MP2′∈UMP2 and every p1∈ΔX1. Moreover, since MP2,n weakly-∗ converges to MP2, there exists n2≥0 such that MP2,n∈UMP2 for every n≥n2.
Therefore, for every n≥max{n1,n2}, we have
[TABLE]
where (a) follows from the fact MP2,n∈UMP2 for every n≥n2. Therefore,
[TABLE]
where (a) and (b) follow from Fubini’s theorem. We conclude that (MP1,n×MP2,n)n≥0 weakly-∗ converges to MP1×MP2. Therefore the product of meta-probability measures is weakly-∗ continuous.
Appendix G Continuity of the capacity
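Since the mapping I is continuous, and since the space ΔX×DMCX,Y is compact, the mapping I is uniformly continuous, i.e., for every ϵ>0, there exists δ(ϵ)>0 such that for every (p1,W1),(p2,W2)∈ΔX×DMCX,Y, if $\|p_1-p_2\|_1:=\sum_{x\in\mathcal{X}}|p_1(x)-p_2(x)|<\delta(\epsilon)$ and dX,Y(W1,W2)<δ(ϵ), then
$$|I(p_1,W_1)-I(p_2,W_2)|<\epsilon.$$
Let W1,W2∈DMCX,Y be such that dX,Y(W1,W2)<δ(ϵ). For every p∈ΔX, we have ∥p−p∥1=0<δ(ϵ) so we must have ∣I(p,W1)−I(p,W2)∣<ϵ. Therefore,
$$I(p,W_1)<I(p,W_2)+\epsilon\leq\sup_{p'\in\Delta_{\mathcal{X}}}I(p',W_2)+\epsilon=C(W_2)+\epsilon.$$
Therefore,
$$C(W_1)=\sup_{p\in\Delta_{\mathcal{X}}}I(p,W_1)\leq C(W_2)+\epsilon.$$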
Similarly, we can show that C(W2)≤C(W1)+ϵ. This implies that ∣C(W1)−C(W2)∣≤ϵ, hence C is continuous.
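The step from a uniform bound on |I(p,W1)−I(p,W2)| to a bound on |C(W1)−C(W2)| holds over any common set of input distributions. The sketch below illustrates it for binary-input channels, with the supremum over ΔX replaced by a maximum over a finite grid; the grid maximum is therefore only an approximation of the capacity in general. The function names and the two channels are illustrative choices.

```python
import math

def mutual_information(p, W):
    """I(p, W) in bits, for input distribution p and channel W[x][y] = W(y|x)."""
    q = [sum(p[x] * W[x][y] for x in range(len(p))) for y in range(len(W[0]))]
    return sum(p[x] * W[x][y] * math.log2(W[x][y] / q[y])
               for x in range(len(p)) for y in range(len(q))
               if p[x] * W[x][y] > 0)

def capacity_on_grid(W, grid):
    """Maximum of I(p, W) over a finite grid of input distributions."""
    return max(mutual_information(p, W) for p in grid)

grid = [[k / 200, 1 - k / 200] for k in range(201)]
W1 = [[0.90, 0.10], [0.20, 0.80]]
W2 = [[0.88, 0.12], [0.21, 0.79]]    # a small perturbation of W1

gap = max(abs(mutual_information(p, W1) - mutual_information(p, W2)) for p in grid)
diff = abs(capacity_on_grid(W1, grid) - capacity_on_grid(W2, grid))
assert diff <= gap + 1e-12           # |C(W1) - C(W2)| <= sup_p |I(p,W1) - I(p,W2)|
```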
Appendix H Measurability and continuity of C+,∗
Let us first show that the random mapping C+,∗ is measurable. We need to show that the mapping CB+,∗:ΔX×ΔX→R is measurable for every B∈B(ΔX), where
$$C^{+,\ast}_{B}(p_1,p_2)=\big(C^{+,\ast}(p_1,p_2)\big)(B).$$
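For every u1∈X, define the set
$$A_{u_1}=\big\{(p_1,p_2)\in\Delta_{\mathcal{X}}\times\Delta_{\mathcal{X}}:\ (C^{-,\ast}(p_1,p_2))(u_1)>0\big\}.$$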
Clearly, Au1 is open in ΔX×ΔX (and so it is measurable). The mapping C+,u1,∗ is defined on Au1 and it is clearly continuous. Therefore, for every B∈B(ΔX), (C+,u1,∗)−1(B) is measurable. We have:
[TABLE]
where (a) follows from the fact that (p1,p2)∈(C+,u1,∗)−1(B) if and only if (p1,p2)∈Au1 and C+,u1,∗(p1,p2)∈B. This shows that CB+,∗ is measurable for every B∈B(ΔX). Therefore, C+,∗ is a measurable random mapping.
Let (p1,n,p2,n)n≥0 be a sequence converging to (p1,p2) in ΔX×ΔX. Since C−,∗ is continuous, we have $\lim_{n\to\infty}(C^{-,\ast}(p_{1,n},p_{2,n}))(u_1)=(C^{-,\ast}(p_1,p_2))(u_1)$ for every u1∈X. Therefore, for every u1∈supp(C−,∗(p1,p2)), there exists nu1≥0 such that for every n≥nu1, we have (C−,∗(p1,n,p2,n))(u1)>0. Let n0=max{nu1:u1∈supp(C−,∗(p1,p2))}. For every n≥n0, we have supp(C−,∗(p1,p2))⊂supp(C−,∗(p1,n,p2,n)). Therefore, for every continuous and bounded mapping g:ΔX→R, we have
[TABLE]
where (b) follows from the continuity of g and C−,∗, and the continuity of C+,u1,∗ on Au1 for every u1∈X. (a) follows from the fact that:
[TABLE]
We conclude that the mapping C+,∗ is a continuous mapping from ΔX×ΔX to MP(X) when the latter space is endowed with the weak-∗ topology.
Let W^1∈DMCX1,∗(o) and W^2∈DMCX2,∗(o). Fix W1∈W^1 and W2∈W^2 and let Y1 and Y2 be the output alphabets of W1 and W2 respectively. We may assume without loss of generality that Im(W1)=Y1 and Im(W2)=Y2.
Let y∈Y1. We have
[TABLE]
For every x∈X1, we have
[TABLE]
On the other hand, for every x∈X2, we have
[TABLE]
Therefore (W1⊕W2)y−1=ϕ1#(W1)y−1, where ϕ1 is the canonical injection from X1 to X1∐X2.
Similarly, for every y∈Y2, we have $P^o_{W_1\oplus W_2}(y)=\frac{|\mathcal{X}_2|}{|\mathcal{X}_1|+|\mathcal{X}_2|}P^o_{W_2}(y)>0$ and $(W_1\oplus W_2)_y^{-1}=\phi_{2\#}(W_2)_y^{-1}$, where ϕ2 is the canonical injection from X2 to X1∐X2. For every B∈B(ΔX1∐X2), we have:
Now let α∈[0,1] and W^1,W^2∈DMCX,∗(o). Fix W1∈W^1 and W2∈W^2 and let Y1 and Y2 be the output alphabets of W1 and W2 respectively. We may assume without loss of generality that Im(W1)=Y1 and Im(W2)=Y2. Let W=[αW1,(1−α)W2]. If α=0, then W is equivalent to W2 and MPW=MPW2=αMPW1+(1−α)MPW2. If α=1, then W is equivalent to W1 and MPW=MPW1=αMPW1+(1−α)MPW2.
Assume now that 0<α<1. For every y∈Y1, we have:
[TABLE]
For every x∈X, we have:
[TABLE]
Similarly, for every y∈Y2, we have PWo(y)=(1−α)PW2o(y)>0 and Wy−1=(W2)y−1. Therefore,
Now let W^∈DMCX,∗(o) and let ∗ be a uniformity preserving binary operation on X. Fix W∈W^ and let Y be the output alphabet of W. We may assume without loss of generality that Im(W)=Y.
Let U1,U2 be two independent random variables uniformly distributed in X. Let X1=U1∗U2 and X2=U2. Send X1 and X2 through two independent copies of W and let Y1 and Y2 be the output respectively.
This shows the fifth and last formula of Proposition 10.
References
[1] Y. Polyanskiy, H. V. Poor, and S. Verdu, “Channel coding rate in the finite blocklength regime,” IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
[2] Y. Polyanskiy, “Saddle point in the minimax converse for channel coding,” IEEE Transactions on Information Theory, vol. 59, no. 5, pp. 2576–2595, May 2013.
[3] H. Schwarte, “On weak convergence of probability measures, channel capacity and code error probabilities,” IEEE Transactions on Information Theory, vol. 42, no. 5, pp. 1549–1551, Sep 1996.
[4] T. Richardson and R. Urbanke, Modern Coding Theory. New York, NY, USA: Cambridge University Press, 2008.
[5] R. Nasser, “Topological structures on DMC spaces,” arXiv:1701.04467, Jan 2017.
[6] R. Engelking, General topology, ser. Monografie matematyczne. PWN, 1977.
[7] E. Torgersen, Comparison of Statistical Experiments, ser. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 1991.
[8] E. Şaşoğlu, E. Telatar, and E. Arıkan, “Polarization for arbitrary discrete memoryless channels,” in Information Theory Workshop, 2009. ITW 2009. IEEE, 2009, pp. 144–148.