A Generalization of the DMC

Sergey Tridenski; Anelia Somekh-Baruch

PMC · DOI: 10.3390/e28020228 · February 16, 2026

Open Access

TL;DR

This paper generalizes the discrete memoryless channel by using a uniform distribution over output sequences and derives key performance metrics.

Contribution

The paper introduces a new channel model and derives error and decoding exponents for a random ensemble of such channels.

Findings

1. An achievable error exponent is derived for the generalized channel model.

2. The optimal correct-decoding exponent is determined along with its converse.

3. The channel ensemble capacity is obtained as a corollary.

Abstract

We consider a generalization of the discrete memoryless channel, in which the channel probability distribution is replaced by a uniform distribution over clouds of channel output sequences. For a random ensemble of such channels, we derive an achievable error exponent, as well as its converse together with the optimal correct-decoding exponent, all as functions of information rate. As a corollary of these results, we obtain the channel ensemble capacity.


Funding

Israel Science Foundation (ISF)

Keywords

error exponent, correct-decoding, discrete memoryless channel capacity


Full text

1. Introduction

We consider the basic information-theoretic scenario of point-to-point communication. The standard go-to model for such a scenario is the discrete memoryless channel (DMC). With this model, the communication performance is characterized by the channel capacity, together with the error and correct-decoding exponents as functions of the information rate. To be characterized by these quantities, the communication is usually performed with a codebook of blocks of n channel input symbols, conveying $[eqn]$ equiprobable messages, where R is the rate in bits.

In this paper, we slightly deviate from the standard DMC model. In our set-up, the DMC itself reappears as a limiting case. Consider first the following communication scheme. Let K be some positive real parameter in addition to the rate R, and suppose that there has been an exponentially large number $[eqn]$ of block transmissions through a DMC. Each transmitted block is a codeword of length n, chosen each time with uniform probability from the same codebook of size $[eqn]$. This corresponds to a significant amount of transmitted information of $[eqn]$ bits. By the end of these transmissions, each of the $[eqn]$ codewords has been used approximately $[eqn]$ times, resulting in $[eqn]$ not necessarily distinct channel output vectors, forming an unordered “cloud”. The parameter K therefore represents an exponential size of the cloud of channel output vectors generated by a single codeword. Suppose that, at the end of the $[eqn]$ transmissions, the unordered outcome clouds of all the codewords are revealed to the decoder. For small K, when most of the output vectors in the clouds are distinct, this “revelation” would be approximately equivalent to a noiseless transmission of the same $[eqn]$ bits of information. For higher K, however, the description of the clouds will require an exponentially smaller number of noiseless bits compared to $[eqn]$.

Note that, given the $[eqn]$ received channel output blocks with time indices $[eqn]$ , as well as the knowledge of the clouds, the optimal decoder for any given output block with an index j (optimal in the sense that it minimizes the probability of error for the block with the index j) chooses the codeword with the maximal number of replicas of this block in its cloud. This decoder is optimal regardless of the message probabilities or the transition probabilities of the DMC that created the clouds. Moreover, the same decoder, which relies on the clouds and is oblivious of the transition probabilities in the channel that created the clouds, remains optimal whether or not the channel is memoryless or time-invariant within each block, as long as it is memoryless and time-invariant by blocks.

As an alternative communication scheme, consider $[eqn]$ contemporaneous block transmissions to $[eqn]$ receivers through physically distinct channels, modeled as stochastically independent and identical DMCs. Each transmitted block is a codeword of length n, chosen independently of others with uniform probability from the same common codebook of size $[eqn]$ . Suppose that, with noiseless feedback, all the $[eqn]$ received channel output vectors become associated with the respective sent codewords on the side of the transmitters and then, with cooperation between the transmitters, this information becomes available (published) to the receivers in the form of $[eqn]$ unordered outcome clouds of the average size of $[eqn]$ channel output vectors associated with the codewords, as in the previous scheme. This can be seen as a joint estimation of physically distinct but stochastically identical channels. As soon as its received vector is published, a receiver can start decoding. Our current work shows that the smaller the clouds, the lower the average probability of error and the higher the capacity of the resulting channel.

Given the clouds, the receiver effectively sees a different channel: one that chooses its output vector with uniform probability from the cloud of the sent codeword. This channel can be described by a model different from the DMC. In this model, we assume that the messages are equiprobable and each cloud contains exactly $[eqn]$ vectors. The clouds are generated randomly i.i.d. with a channel-generating distribution, independently for each codeword in a codebook. This is similar to the constant composition clouds used for superposition coding [1] through a noiseless channel. The capacity and the relevant probability exponents of this scheme can be given in the average sense, for the ensemble of random channels. As the exponential size of the clouds K tends to infinity, the random channel ensemble converges to a single channel with the transition probabilities of the channel-generating distribution, which is a DMC in our case [2,3,4].

In this paper, we complete our earlier work [5]. We give a rigorous proof of the random-coding error exponent [5] [Theorem 1] and add a converse bound on the error exponent. We also verify that the correct-decoding exponent converse [5] [Theorem 2] is achievable.

The paper is organized as follows. In Section 2, we introduce our notation and define the channel model. In Section 3, we derive an achievable error exponent for the random channel ensemble. In Section 4 and Section 5, we provide converse results: in Section 4, we derive an upper bound on the optimal error exponent and a lower bound on the correct-decoding exponent of the random channel ensemble, and in Section 5, we give alternative representations of these bounds. In Section 6, we obtain the channel ensemble capacity as a corollary of the previous sections. In Section 7, we show that the correct-decoding bound of Section 4 is achievable, which determines the optimal correct-decoding exponent.

2. Channel Model

Let $[eqn]$ and $[eqn]$ be letters from finite channel input and output alphabets, respectively. Let $[eqn]$ denote transition probabilities of a channel-generating distribution. The channel is generated for a given codebook of blocks of length n of letters from $[eqn]$. Let $[eqn]$ be such a codebook, consisting of $[eqn]$ codewords $[eqn]$, $[eqn]$, where R is a positive real number representing a communication rate.

Given this codebook and another positive real number K, a channel instance is generated with the distribution W, as follows. For each one of the $[eqn]$ messages m, an exponentially large number $[eqn]$ of sequences $[eqn]$ is generated randomly given the corresponding codeword $[eqn]$ , where each sequence $[eqn]$ is generated independently of others. Each letter $[eqn]$ , $[eqn]$ , of each such sequence $[eqn]$ is generated i.i.d. according to W given the corresponding letter $[eqn]$ of $[eqn]$ . In this way, the set of clouds of $[eqn]$ ’s of size $[eqn]$ for each m forms one channel instance.

We assume that the messages $[eqn]$ , represented by the codewords $[eqn]$ , are equiprobable. Given that a particular message is sent through the channel, the stochastic channel action now amounts to choosing exactly one of all the not necessarily unique $[eqn]$ vectors $[eqn]$ , corresponding to the sent message, with the uniform probability $[eqn]$ . We assume that the decoder, receiving the channel output vector $[eqn]$ , knows not only the codebook, but also the channel instance, i.e., all the $[eqn]$ clouds comprising the corresponding $[eqn]$ ’s.

A cloud can have more than one replica of the received vector $[eqn]$. The maximum-likelihood (ML) decoder makes an error with non-vanishing probability $[eqn]$ if there exists an incorrect message with the same or a higher number of replicas of $[eqn]$ in its cloud, compared to the sent message itself. Otherwise, there is no error.

Let $[eqn]$ with indices $[eqn]$ and $[eqn]$ be all the cloud vectors, not necessarily distinct. Then, the encoder is a function $[eqn]$ mapping the messages to the clouds, given by $[eqn]$, $[eqn]$. Without loss of optimality, the ML decoder is a deterministic function $[eqn]$, given by

[eqn]

where the minimum is taken over the indices m in the $[eqn]$ set.
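To make the channel model concrete, here is a small Monte Carlo sketch of one way to simulate it. It assumes binary input and output alphabets, a channel-generating distribution W given by a binary symmetric channel with a hypothetical crossover probability `delta`, and an i.i.d. uniform codebook; `num_messages` and `cloud_size` stand in for the exponentially large quantities of the model at toy block lengths. The decoder counts the replicas of the received vector in every cloud and, as in the decoder definition above, breaks ties in favor of the smallest message index. The function name and parameters are illustrative only.

```python
import random
from collections import Counter


def simulate_cloud_channel(n, num_messages, cloud_size, delta, trials, seed=0):
    """Monte Carlo estimate of the ML decoding error probability, averaged over
    messages, codebooks and channel instances (hypothetical toy setup)."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        # Codebook drawn i.i.d. uniform over {0,1}^n.
        codebook = [tuple(rng.randint(0, 1) for _ in range(n))
                    for _ in range(num_messages)]
        # Channel instance: cloud_size vectors per codeword, each letter flipped
        # independently with probability delta (W = BSC(delta)).
        clouds = [[tuple(b ^ (rng.random() < delta) for b in x)
                   for _ in range(cloud_size)]
                  for x in codebook]
        replica_counts = [Counter(cloud) for cloud in clouds]
        sent = rng.randrange(num_messages)       # equiprobable messages
        y = rng.choice(clouds[sent])             # uniform choice within the sent cloud
        counts = [c[y] for c in replica_counts]  # replicas of y in each cloud
        decoded = counts.index(max(counts))      # smallest index among maximizers
        errors += decoded != sent
    return errors / trials


if __name__ == "__main__":
    for cloud_size in (1, 4, 16):
        p = simulate_cloud_channel(n=8, num_messages=16, cloud_size=cloud_size,
                                   delta=0.1, trials=500)
        print(f"cloud size {cloud_size:2d}: estimated error {p:.3f}")
```

A fresh codebook and channel instance are drawn in every trial, so the estimate targets the ensemble average rather than the performance of one fixed channel instance.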

3. Achievable Error Exponent

Suppose the codebook is generated i.i.d. according to a distribution over $[eqn]$ with probabilities $[eqn]$ . Let $[eqn]$ denote the average error probability of the maximum-likelihood decoder, averaged over all possible messages, codebooks, and channel instances:

[eqn]

where $[eqn]$ represents the random received vector, while I is the random sent message, $[eqn]$ are randomly generated codewords, $[eqn]$ are the random cloud vectors, and the expectation $[eqn]$ is taken according to the independent and identical joint distribution $[eqn]$ . Let $[eqn]$ denote probabilities in an auxiliary distribution over $[eqn]$ and let us define the following:

[eqn]

[eqn]

[eqn]

where $[eqn]$ is the Kullback–Leibler divergence averaged over T, the expectation $[eqn]$ is with respect to the joint distribution $[eqn]$ , and $[eqn]$ . All the logarithms here and below are to base e. In what follows, we usually suppress the superscripts in $[eqn]$ and $[eqn]$ . Then, we can show the following:

Theorem 1 (Random-coding error exponent).

[eqn]

where $[eqn]$ is defined in (3).
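The quantities in (2) and (3) are built from the Kullback–Leibler divergence averaged over T, with natural logarithms. A generic helper for such a conditional divergence, $D(V\|Q\mid T)=\sum_t T(t)\sum_s V(s\mid t)\ln\frac{V(s\mid t)}{Q(s\mid t)}$, is sketched below. The dictionary representation and the function name are illustrative assumptions, and the particular distributions that enter (2) and (3) are not reproduced here.

```python
import math


def conditional_divergence(V, Q, T):
    """D(V || Q | T) = sum_t T[t] * sum_s V[t][s] * ln(V[t][s] / Q[t][s]), in nats.

    V and Q are conditional distributions stored as nested dicts V[t][s];
    T is a distribution over the conditioning letter t.
    Returns math.inf if V puts mass where Q does not.
    """
    total = 0.0
    for t, pt in T.items():
        if pt == 0:
            continue
        for s, v in V[t].items():
            if v == 0:
                continue  # convention: 0 * ln(0 / q) = 0
            q = Q[t].get(s, 0.0)
            if q == 0:
                return math.inf
            total += pt * v * math.log(v / q)
    return total


def bsc(e):
    """Binary symmetric conditional distribution with crossover e (example only)."""
    return {0: {0: 1 - e, 1: e}, 1: {0: e, 1: 1 - e}}


print(conditional_divergence(bsc(0.2), bsc(0.1), {0: 0.5, 1: 0.5}))
```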

We prove this theorem by separately deriving matching lower and upper bounds. For the lower bound, for $[eqn]$ , let us further define

[eqn]

[eqn]

[eqn]

[eqn]

where $[eqn]$ and $[eqn]$ are defined by (2). Our lower bound is given by Lemma 1, together with Lemmas 3 and 4 below.

Lemma 1 (Lower bound).

[eqn]

where $[eqn]$ is defined in (5)–(8).

In the proof of Lemma 1, we use the following auxiliary lemma:

Lemma 2 (Super-exponential bounds). *Let $[eqn]$ *, * $[eqn]$ *, 2,…, *be i.i.d. Bernoulli( $[eqn]$ ) random variables. Then, for any $[eqn]$ *,

[eqn]

[eqn]

where $[eqn]$ is a function of $[eqn]$ that satisfies $[eqn]$ as $[eqn]$ .

Proof. The result follows straightforwardly from Markov’s inequality for the random variable $[eqn]$ and $[eqn]$ , resp., as well as the inequality $[eqn]$ . □
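The displayed bounds of Lemma 2 are not legible in this copy, but its proof appears to follow the route of the classical Chernoff argument: Markov's inequality applied to an exponential of the count, together with the elementary inequality $1+x\le e^x$. The sketch below evaluates the textbook multiplicative Chernoff bounds for a binomial count S with mean $\mu = Np$, assuming (for illustration only) $N=e^{nK}$ trials with success probability $e^{-nA}$ and $K>A$, so that $\mu$ grows exponentially in n. Because $\mu$ multiplies the exponent, the log-bound divided by n diverges, which is what super-exponential decay means. These are standard bounds, not necessarily the exact form of (10) and (11).

```python
import math


def log_upper_tail_bound(mu, eps):
    """ln of P(S >= (1+eps)*mu) <= exp(-mu*((1+eps)*ln(1+eps) - eps)), eps > 0.

    From Markov's inequality on exp(t*S) with t = ln(1+eps), using
    E[exp(t*S)] <= exp(mu*(e^t - 1)), which follows from 1 + x <= e^x.
    """
    return -mu * ((1 + eps) * math.log(1 + eps) - eps)


def log_lower_tail_bound(mu, eps):
    """ln of P(S <= (1-eps)*mu) <= exp(-mu*(eps + (1-eps)*ln(1-eps))), 0 < eps < 1."""
    return -mu * (eps + (1 - eps) * math.log(1 - eps))


def binomial_tail(N, p, k_min=None, k_max=None):
    """Exact P(k_min <= S <= k_max) for S ~ Binomial(N, p); small N only."""
    lo = 0 if k_min is None else k_min
    hi = N if k_max is None else k_max
    return sum(math.comb(N, k) * p**k * (1 - p)**(N - k) for k in range(lo, hi + 1))


# Hypothetical exponents: N = e^{nK} trials, success probability e^{-nA}, K > A.
K, A, eps = 0.5, 0.3, 0.1
for n in (10, 20, 40):
    mu = math.exp(n * (K - A))  # mean N*p grows exponentially in n
    # The log-bound divided by n keeps decreasing: the decay is super-exponential.
    print(n, log_upper_tail_bound(mu, eps) / n, log_lower_tail_bound(mu, eps) / n)
```

The exact `binomial_tail` helper is only there to sanity-check the bounds on small instances.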

Proof of Lemma 1. We will use $[eqn]$ to establish (9).

Let $[eqn]$ be the sent codeword and $[eqn]$ be the received vector. The cloud of $[eqn]$ can contain more than one vector $[eqn]$. The maximum-likelihood decoder makes an error with non-vanishing probability $[eqn]$ if there exists an incorrect codeword (not necessarily distinct from $[eqn]$, but representing a different message and therefore having an independently generated cloud) with the same or a higher number of vectors $[eqn]$ in its cloud, compared to the sent codeword itself. Otherwise, there is no error.

Consider an event where $[eqn]$ and $[eqn]$ have a joint empirical distribution (type with denominator n) $[eqn]$, i.e., $[eqn]$, where T is a distribution on $[eqn]$ and V is a conditional distribution on $[eqn]$ given a letter from $[eqn]$. The exponent of the probability of this event (the probability of a type class [6]) is given by

[eqn]

where the term $[eqn]$ vanishes uniformly w.r.t. $[eqn]$, as $[eqn]$.

Consider now the competing codewords. The exponent of the probability of an event in which $[eqn]$ appears somewhere in the clouds corresponding to the incorrect codewords is given by

[eqn]

where $[eqn]$ is uniform w.r.t. T. To observe this, consider a possibly different (from V) conditional type $[eqn]$ of some $[eqn]$ w.r.t. $[eqn]$ . The exponent of the probability of an event, in which a certain incorrect codeword belongs to the conditional type $[eqn]$ given $[eqn]$ , is given by

[eqn]

where $[eqn]$ is uniform w.r.t. $[eqn]$ . The exponent of the probability of an event, that a certain $[eqn]$ in the cloud of $[eqn]$ equals $[eqn]$ , is given by $[eqn]$ . The exponent of the probability of an event, that in the cloud of $[eqn]$ of the type $[eqn]$ the vector $[eqn]$ appears at least once, is given by

[eqn]

where the term $[eqn]$ , vanishing as $[eqn]$ , depends on K. In particular, as a lower bound on the exponent, (14) follows trivially without $[eqn]$ from the union bound on the probability. Meanwhile, to confirm (14) as an upper bound on the exponent, denoting $[eqn]$ and $[eqn]$ , we can write similarly to [3] [Equation (14)]:

[eqn]

where $[eqn]$ is a function of K. Adding together (13) and (14), we obtain the exponent of the probability of an event, that a certain incorrect codeword is of the conditional type $[eqn]$ w.r.t. $[eqn]$ , and $[eqn]$ appears at least once in its cloud:

[eqn]

where $[eqn]$ is uniform w.r.t. $[eqn]$ . Finally, the exponent of the probability of an event, where there exists in the codebook at least one incorrect codeword of the conditional type $[eqn]$ w.r.t. $[eqn]$ and $[eqn]$ appears at least once in its cloud, is given by

[eqn]

where $[eqn]$ uniformly w.r.t. $[eqn]$ as $[eqn]$ and may depend on K and R, which yields (12).

Suppose that $[eqn]$. In this case, the exponent of the conditional probability of error, given that the received vector and the sent codeword belong to the joint type $[eqn]$, can be lower-bounded by (12), and the exponent of the (unconditional) probability of error due to all such cases is lower-bounded by

[eqn]

Consider now the opposite case when $[eqn]$ . For this case, recall that the exponent of the probability of an event, in which there exists at least one incorrect codeword of the conditional type $[eqn]$ w.r.t. $[eqn]$ , is given by $[eqn]$ . Suppose now that the conditional type $[eqn]$ is such that $[eqn]$ . For this case, we use Lemma 2, with $[eqn]$ . Using (11) for the correct cloud and (10) for the competing clouds, the probability of the event that the cloud of an incorrect codeword of the type $[eqn]$ has at least as many occurrences of the vector $[eqn]$ , compared to the correct codeword of the type V, can be upper-bounded uniformly by

[eqn]

That is, it tends to zero super-exponentially fast with n. The remaining types $[eqn]$ with $[eqn]$ allow us to write a lower bound on the exponent of the (unconditional) probability of error due to all the cases $[eqn]$ , as

[eqn]

Together, (17) and (19) cover all cases, and the minimum of the two gives the lower bound on the error exponent.

Observe that the objective function of (17) can also be used in (19), because in (19) the set over which the minimization is performed satisfies $[eqn]$. Furthermore, for the lower bound, we can simply extend the minimization set in (17) and (19) from types to arbitrary distributions $[eqn]$ and $[eqn]$. Therefore, omitting $[eqn]$, in the limit of large n, we can replace the minimum of the bounds (17) and (19) with (5). □
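One ingredient of the proof above is the exponent of the probability that a given vector appears at least once in a cloud. The sketch below explores this numerically in a hypothetical setting: a cloud of $N=e^{nK}$ independent vectors, each equal to the given vector with probability $p=e^{-nB}$ (N is treated as a real number for simplicity). In this setting, the union bound gives $P\le\min\{1,e^{-n(B-K)}\}$, and $P = 1-(1-p)^N \ge 1-e^{-Np} \ge \tfrac12\min\{1,Np\}$, so the normalized exponent approaches $\max\{0,B-K\}$. We do not claim this reproduces the exact form of (14), whose expression is not shown here.

```python
import math


def log_prob_at_least_once(n, K, B):
    """ln P(a fixed vector occurs at least once among N = e^{nK} independent
    cloud vectors, each equal to it with probability p = e^{-nB}).

    1 - (1 - p)^N is evaluated with log1p/expm1 to avoid cancellation
    when N*p is tiny.
    """
    N = math.exp(n * K)
    p = math.exp(-n * B)
    log_miss = N * math.log1p(-p)            # ln (1 - p)^N
    return math.log(-math.expm1(log_miss))   # ln (1 - (1 - p)^N)


def exponent(n, K, B):
    """-(1/n) ln P, to be compared with max(0, B - K)."""
    return -log_prob_at_least_once(n, K, B) / n


for K, B in ((0.2, 0.5), (0.5, 0.2)):
    for n in (20, 100, 400):
        print(K, B, n, round(exponent(n, K, B), 4), max(0.0, B - K))
```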

To complete the lower bound given by Lemma 1, we establish the next two lemmas.

Lemma 3 (Epsilon equals zero). The expression defined in (5)–(8) satisfies

[eqn]

Proof. Observe first that both (6) and (7) are convex (∪) functions of $[eqn]$ . This can be verified directly by the definition of convexity, using the property that $[eqn]$ is convex (∪) and $[eqn]$ is linear in the pair $[eqn]$ . Furthermore, by continuity of $[eqn]$ and $[eqn]$ , it follows that (6) and (7) are lower semi-continuous functions of $[eqn]$ . Observe next from (6) and (7) that at least one of them is necessarily finite at $[eqn]$ , i.e., $[eqn]$ . Suppose that $[eqn]$ . Then, $[eqn]$ is finite for $[eqn]$ and by the lower semi-continuity of the convex function $[eqn]$ . Then, we also obtain (20). Consider the opposite case $[eqn]$ . Then, (6) at $[eqn]$ is a minimization of a continuous function of $[eqn]$ over a closed non-empty set. Let $[eqn]$ be the distribution $[eqn]$ , achieving the minimum in (6) at $[eqn]$ . Then, necessarily $[eqn]$ (otherwise with $[eqn]$ there has to be $[eqn]$ ). Then, $[eqn]$ is finite for $[eqn]$ and by the lower semi-continuity of the convex function $[eqn]$ . Then, we once again obtain (20). □

Lemma 4 (Identity).

[eqn]

where the LHS and the RHS are defined by (5)–(8) and (3), respectively.

Proof. For $[eqn]$ , we can conveniently rewrite the minimum (5) between (6) and (7) in the following unified manner:

[eqn]

where in the objective function we also used the property that $[eqn]$. Now, it is convenient to verify that in (22) the conditional distribution $[eqn]$ can be replaced with V without loss of optimality. To this end, suppose that some joint distributions $[eqn]$ and $[eqn]$ satisfy the condition under the minimum of (22).

If $[eqn]$, then, since also $[eqn]$, we cannot increase the objective function of (22) by using $[eqn]$ in place of $[eqn]$.

If $[eqn]$, then we cannot increase the objective function of (22) by using $[eqn]$ in place of $[eqn]$.

It follows that (3) is a lower bound on the minimum (22). Finally, since (3) is also an upper bound on (22), we conclude that (3) and (22) are equal. □

Combining (21), (20), and (9), we have that the RHS of (4) is a lower bound. It remains to show that it is also an upper bound.

Lemma 5 (Upper bound).

[eqn]

where $[eqn]$ is defined in (3).

In the proof of Lemma 5 we use the following auxiliary lemma:

Lemma 6 (Two competing clouds). Let $[eqn]$ and $[eqn]$ be two statistically independent binomial random variables with the parameters $[eqn]$ and $[eqn]$ , where $[eqn]$ is a constant. Then,

[eqn]

where $[eqn]$ depends on q and as $[eqn]$ satisfies $[eqn]$ .

The proof is given in Appendix A. In the above lemma, N and $[eqn]$ can describe the random numbers of replicas of $[eqn]$ in an incorrect cloud and in the correct cloud, respectively.
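The parameters in Lemma 6 are not legible in this copy, but the probability it concerns, that an independent incorrect count N is at least the correct count N', can be computed exactly for any binomial parameters. The sketch below does this with a reverse cumulative sum; the example parameters are hypothetical and keep the mean number of replicas fixed at one while the cloud size L grows, so one can observe that the probability does not decay with L.

```python
import math


def binom_pmf(N, p):
    """P(X = k) for k = 0..N, X ~ Binomial(N, p); intended for moderate N (N <= 1000)."""
    return [math.comb(N, k) * p**k * (1 - p)**(N - k) for k in range(N + 1)]


def prob_incorrect_not_fewer(N1, p1, N2, p2):
    """Exact P(N >= N') for independent N ~ Bin(N1, p1) and N' ~ Bin(N2, p2)."""
    pmf_n = binom_pmf(N1, p1)
    pmf_n_prime = binom_pmf(N2, p2)
    # survival[k] = P(N >= k) for k = 0..N1 + 1, built by a reverse cumulative sum.
    survival = [0.0] * (N1 + 2)
    for k in range(N1, -1, -1):
        survival[k] = survival[k + 1] + pmf_n[k]
    return sum(q * survival[min(k, N1 + 1)] for k, q in enumerate(pmf_n_prime))


# Hypothetical example: both clouds have size L and match probability 1/L,
# so the expected number of replicas stays equal to 1 as L grows.
for L in (10, 100, 1000):
    print(L, round(prob_incorrect_not_fewer(L, 1 / L, L, 1 / L), 4))
```

For identically distributed counts, symmetry gives $P(N\ge N')=\tfrac12\,(1+P(N=N'))$, which is a convenient check on the implementation.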

Proof of Lemma 5. For the upper bound it is enough to consider the exponent of the probability of the event that the transmitted and the received blocks $[eqn]$ and $[eqn]$ have a joint type $[eqn]$ , while in the codebook there exists at least one incorrect codeword of the same conditional type V w.r.t. $[eqn]$ , and $[eqn]$ appears at least once in its cloud. As in the proof of Lemma 1, this exponent is given by

[eqn]

The additional exponent of the conditional probability of error given this event is $[eqn]$, as follows immediately from Lemma 6, used with $[eqn]$ and $[eqn]$ with $[eqn]$, or $[eqn]$. In the limit of large n, we can omit $[eqn]$ and, by continuity, minimize (25) over all distributions $[eqn]$ to obtain the RHS of (23). □

This completes the proof of Theorem 1. An alternative representation of the error exponent of Theorem 1 is given by

Lemma 7 (Dual form).

[eqn]

where $[eqn]$ is defined in (3) and

[eqn]

Proof. Observe first that the minimum (3) can be lower-bounded as

[eqn]

[eqn]

Observe further that the lower bound (29) is the lower convex envelope of (28) as a function of $[eqn]$. Indeed, the minimum (28) is a non-increasing function of R, and therefore it cannot have lower supporting lines with slopes greater than 0. It also cannot have lower supporting lines with negative slopes below $[eqn]$, as it decreases with slope exactly $[eqn]$ in the region of negative or small positive values of R. Note that the objective function of the minimum (29) is continuous in $[eqn]$ in the closed region of $[eqn]$. Let $[eqn]$ be the minimizing distribution of the minimum in (29) for a given $[eqn]$. For this distribution, there exists a real $[eqn]$ such that the expression in the square brackets of (29) is zero. Therefore, (29) and (28) are equal at $[eqn]$. This holds for each $[eqn]$, which corresponds to lower supporting lines of slopes $[eqn]$ between 0 and $[eqn]$.

Finally, we observe that (28) and (29) are in fact equal for all R, since (28) is a convex (∪) function of R and, therefore, coincides with its lower convex envelope. Indeed, using the property $[eqn]$, the objective function of the minimization (28) can be rewritten as a maximum of three terms:

[eqn]

Then, this objective function is convex (∪) in the triple $[eqn]$, since it is a maximum of convex (∪) functions of $[eqn]$. In particular, the convexity of $[eqn]$ in $[eqn]$ follows by the log-sum inequality [6]. By the definition of convexity, it then follows that the minimum (28) itself is a convex (∪) function of R.

So far, we have shown that (28) and (29) are equal. Consider now the minimum of (29) with any $[eqn]$:

[eqn]

[eqn]

By the same reasoning as before, there is equality also between (30) and (31). Putting together (31) and (29) and denoting $[eqn]$ , we can rewrite (28) as

[eqn]

where the minimizing solution is

[eqn]

□
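The proof of Lemma 7 turns on the lower convex envelope of (28) as a function of R. When experimenting with such curves numerically, for example on a grid of rates, the envelope of sampled values can be computed with a standard lower-hull (monotone-chain) routine, as in the generic sketch below. It is not part of the paper's argument, and the sampled curve in the usage lines is arbitrary.

```python
import math


def lower_convex_envelope(xs, ys):
    """Values of the lower convex envelope of the points (xs[i], ys[i]) at each xs[i].

    Uses the monotone-chain lower hull; xs must be strictly increasing.
    """
    hull = []  # indices of the lower-hull vertices, left to right
    for i in range(len(xs)):
        while len(hull) >= 2:
            o, a = hull[-2], hull[-1]
            # Cross product of (a - o) and (i - o); a value <= 0 means point a is
            # not strictly below the chord from o to i, so a is not a hull vertex.
            cross = ((xs[a] - xs[o]) * (ys[i] - ys[o])
                     - (ys[a] - ys[o]) * (xs[i] - xs[o]))
            if cross > 0:
                break
            hull.pop()
        hull.append(i)
    env = [ys[0]] * len(xs)
    # Linear interpolation between consecutive hull vertices.
    for a, b in zip(hull, hull[1:]):
        for k in range(a, b + 1):
            t = (xs[k] - xs[a]) / (xs[b] - xs[a])
            env[k] = ys[a] + t * (ys[b] - ys[a])
    return env


# Arbitrary non-convex test curve sampled on a grid of "rates".
rates = [i / 20 for i in range(41)]
values = [math.sin(6 * r) + r for r in rates]
envelope = lower_convex_envelope(rates, values)
print(max(v - e for v, e in zip(values, envelope)))  # largest gap between curve and envelope
```

A curve that is already convex on the grid is returned unchanged, which mirrors the final step of the proof: a convex function coincides with its lower convex envelope.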

4. A Converse Theorem for the Error and Correct-Decoding Exponents

Let $[eqn]$ denote the average error probability of the maximum-likelihood decoder for a given codebook $[eqn]$ of block length n, averaged over all messages and channel instances:

[eqn]

where $[eqn]$ represents the random received vector, I is the random sent message, $[eqn]$ represent random codewords equal to the codewords $[eqn]$ of the given codebook $[eqn]$ , while $[eqn]$ are the random cloud vectors and the expectation $[eqn]$ is taken according to the independent and identical conditional distribution W, generating the clouds $[eqn]$ given $[eqn]$ . Let $[eqn]$ denote the mutual information of a pair of random variables with the joint distribution $[eqn]$ , and let us define:

[eqn]

[eqn]

where P and U are such that $[eqn]$ . Then, we can show the following.

Theorem 2 (Converse bounds).

[eqn]

[eqn]

where (36) holds for all $[eqn]$ and (37) holds a.e.: except possibly for such $[eqn]$ where there is a transition (a jump) from $[eqn]$ to a finite value of (35) as a monotonically non-increasing function of R.

Let $[eqn]$ denote the conditional average error probability of the maximum-likelihood decoder for a codebook $[eqn]$ , given that the joint type of the sent and the received blocks is $[eqn]$ . Theorem 2 is a corollary of the following upper bound on the corresponding conditional probability of correct decoding:

Lemma 8. For any constant composition codebook $[eqn]$ and any $[eqn]$ ,

[eqn]

where the term $[eqn]$ , vanishing uniformly w.r.t. $[eqn]$ as $[eqn]$ , depends on ϵ but does not depend on the choice of $[eqn]$ .

Proof. Suppose we are given a constant composition codebook $[eqn]$ , where all $[eqn]$ codewords are of the same type with empirical probabilities $[eqn]$ . Looking at the codebook $[eqn]$ as a matrix of letters from $[eqn]$ , of size $[eqn]$ , we construct a whole ensemble of block codes by permuting the columns of the matrix. Observe that the total number of code permutations in the ensemble is given by

[eqn]

where $[eqn]$ is the entropy of the empirical distribution P, and $[eqn]$ denotes the number of same-symbol permutations in the type P, i.e., the symbol permutations that do not change a codeword that is a member of the type.

Suppose that, for each code in the ensemble, a separate independent channel instance is generated. Suppose further that, for every transmission, one code in the ensemble (known to the decoder together with its own channel instance) is chosen randomly with uniform probability over permutations. Consider an event where the sent codeword, chosen with uniform probability over the code permutations and the messages, together with the received vector have a joint type $[eqn]$ such that $[eqn]$. Since the channel-generating distribution is memoryless, this results in the same conditional average probability of correct decoding given $[eqn]$, when averaged over all messages and channel instances, as for $[eqn]$ itself. In what follows, we derive an upper bound on this probability.

Let $[eqn]$ be the received vector of the type T. Consider the conditional type class $[eqn]$ of codewords with the empirical distribution V given the vector $[eqn]$. Observe that the total number of all codewords in the ensemble belonging to this conditional type class (counted as distinct if they correspond to different code permutations or messages) is given by

[eqn]

where $[eqn]$ is the average entropy of the conditional distribution V given T, i.e., $[eqn]$.

Let us fix two small numbers $[eqn]$ and consider two cases separately. Suppose, first, that $[eqn]$. In this case, by Lemma 2, the probability of an event in which the cloud of any $[eqn]$ in the ensemble contains fewer than $[eqn]$ or more than $[eqn]$ vectors $[eqn]$ tends to zero uniformly and super-exponentially fast with n. Denote the complementary, highly probable event as $[eqn]$. Let k be an index of a code in the ensemble. Let $[eqn]$ denote the number of codewords from the conditional type class $[eqn]$ in the code of index k. Then, given the conditions that the received vector is $[eqn]$, that the sent codeword belongs to $[eqn]$, and $[eqn]$, we observe that the conditional probability of the code k is upper-bounded by $[eqn]$. Furthermore, given that the code k is indeed used for communication, the conditional probability of correct decoding is upper-bounded by $[eqn]$. Summing over all codes, we can write

[eqn]

[eqn]

[eqn]

Consider now the second case, when $[eqn]$. In this case, by (10) of Lemma 2, the probability of an event in which the cloud of any $[eqn]$ in the ensemble contains more than $[eqn]$ occurrences of the vector $[eqn]$ tends to zero uniformly and super-exponentially fast. Let us denote this rare event as $[eqn]$. In fact, among the codewords $[eqn]$, those with clouds containing $[eqn]$ become rare. However, the probability of an event where, in the ensemble, there are fewer than $[eqn]$ codewords from $[eqn]$ having at least one vector $[eqn]$ in their cloud tends to zero uniformly and super-exponentially fast. This, in turn, can be verified similarly to (11) of Lemma 2, using (39). Let us denote this rare event as $[eqn]$, and denote the highly probable event complementary to the union of the events $[eqn]$ and $[eqn]$ as $[eqn]$.

Let $[eqn]$ denote the number of codewords in the code k that both belong to the conditional type class $[eqn]$ and have at least one $[eqn]$ in their respective cloud. Then, given the intersection of the events that the received vector is $[eqn]$, that the sent codeword belongs to $[eqn]$, and $[eqn]$, we obtain that the conditional probability of the code k is upper-bounded by $[eqn]$. Given that the code k is used for communication, the conditional probability of correct decoding is upper-bounded by $[eqn]$. Repeating the steps leading to (40), we obtain (41) once again. □
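The counting step in the proof of Lemma 8 rests on the size of a conditional type class. As a small, self-contained check of the standard method-of-types bounds $(n+1)^{-|\mathcal X||\mathcal Y|}e^{nH(V|T)}\le|T_V(y)|\le e^{nH(V|T)}$, the sketch below extracts the joint type of a pair of sequences in the convention used in this proof (T is the type of the received sequence, V the conditional type of the codeword given it), counts the conditional type class exactly via log-factorials, and compares the result with $nH(V|T)$. The binary alphabets and the crossover probability in the usage lines are hypothetical.

```python
import math
import random
from collections import Counter


def joint_type(x, y):
    """Joint type of (x, y): T = type of y, V[b][a] = conditional type of x given y = b."""
    n = len(y)
    y_counts = Counter(y)
    pair_counts = Counter(zip(x, y))
    T = {b: c / n for b, c in y_counts.items()}
    V = {b: {} for b in y_counts}
    for (a, b), c in pair_counts.items():
        V[b][a] = c / y_counts[b]
    return T, V


def log_conditional_type_class_size(x, y):
    """ln |T_V(y)|: number of sequences x' such that (x', y) has the joint type of (x, y).

    Equals the product over letters b of n_b! / prod_a n_{a,b}!, where n_b counts
    b in y and n_{a,b} counts the pair (a, b).
    """
    y_counts = Counter(y)
    pair_counts = Counter(zip(x, y))
    return (sum(math.lgamma(nb + 1) for nb in y_counts.values())
            - sum(math.lgamma(c + 1) for c in pair_counts.values()))


def conditional_entropy(T, V):
    """H(V | T) = -sum_b T(b) sum_a V(a|b) ln V(a|b), in nats."""
    return -sum(T[b] * v * math.log(v) for b in V for v in V[b].values() if v > 0)


# Hypothetical binary example: y is a noisy copy of a random x.
rng = random.Random(7)
n = 200
x = [rng.randint(0, 1) for _ in range(n)]
y = [xi ^ (rng.random() < 0.2) for xi in x]
T, V = joint_type(x, y)
log_size = log_conditional_type_class_size(x, y)
nH = n * conditional_entropy(T, V)
# Method-of-types bounds with |X| = |Y| = 2.
print(nH - 4 * math.log(n + 1), log_size, nH)
```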

Proof of Theorem 2. First, we verify the bound on the correct-decoding exponent (36). It is enough to consider constant composition codes, because they can asymptotically achieve the same exponent of the correct-decoding probability as the best block codes, as is shown in the beginning of [7] [Lemma 5] using a suboptimal encoder–decoder pair.

Thus, let $[eqn]$ be a constant composition codebook of a type P. Consider an event where the sent codeword together with the received vector have a joint type $[eqn]$. The exponent of the probability of such an event is given by $[eqn]$. We then add the lower bound on the exponent of the conditional probability of correct decoding given $[eqn]$ of Lemma 8 in the following form:

[eqn]

Minimizing the resulting expression over all distributions $[eqn]$, discarding $[eqn]$, and taking $[eqn]$, we obtain (34).

Next, we establish the bound on the error exponent (37). Here, it also suffices to consider constant composition codebooks $[eqn]$, because there is only a polynomial number of different types in a general codebook of block length n. Turning (38) into a lower bound on $[eqn]$, we can obtain the following upper bound on the error exponent of $[eqn]$:

[eqn]

[eqn]

[eqn]

Here, (43) follows directly from Lemma 8 and the fact that the exponent of $[eqn]$ is $[eqn]$. In (44), we extend the inner minimization from conditional types to arbitrary distributions U with the help of an additional $[eqn]$ in the minimization condition. In (45), we extend the outer maximization to arbitrary distributions P; as a result, the maximum cannot decrease.

In the limit of large n, the vanishing term $[eqn]$ in (45) disappears and we are left with $[eqn]$. In order to replace $[eqn]$ with zero, observe that both the objective function and the expression in the minimization condition of (45) are convex (∪) functions of U. It follows that the inner minimum of (45) is a convex (∪) function of $[eqn]$. Therefore, (45) itself, as a maximum of convex functions of $[eqn]$, is convex (∪) in $[eqn]$. By the continuity of a convex function, we conclude that the maximum in (45) tends to (35) as $[eqn]$, with a possible exception when (45) jumps to $[eqn]$ exactly at $[eqn]$, which corresponds to the jump to $[eqn]$ of (35) as a convex (∪) function of R exactly at R. □

5. Alternative Representation of the Converse Bounds

In this section, we develop alternative expressions for the converse bounds of Theorem 2. Using the properties that $[eqn]$ and $[eqn]$, the expression (34) for the lower bound of Theorem 2 can also be written as $[eqn]$, where

[eqn]

and $[eqn]$ and $[eqn]$ are defined in (2). An alternative expression for (46) is given by

Lemma 9 (Alternative representation—correct-decoding exponent).

[eqn]

where $[eqn]$ is defined by (46) and $[eqn]$ is defined as in (27).

Proof. We can rewrite (46) as a minimum of two terms:

[eqn]

Solution of each one of the terms is similar to the method of Lemma 7 and gives (47). □

The expression (35) for the upper bound of Theorem 2 can be written alternatively as

Lemma 10 (Alternative representation—upper bound on the error exponent).

[eqn]

where $[eqn]$ and $[eqn]$ are defined in (35) and (27), respectively.

The proof is given in Appendix A. Examples of this bound, together with the achievable error exponent as a lower bound, are given in Figure 1. Note the discontinuities (jumps to $[eqn]$) in the upper bounds. Using expression (A2), the alternative to (48) that appears in the proof of Lemma 10, it can be verified similarly to Lemma 7 that the discontinuity (jump to $[eqn]$) in (48) occurs at

[eqn]

For $[eqn]$ , this gives $[eqn]$ , so that there is no jump for $[eqn]$ .

6. The Capacity of the Channel Ensemble

Let us define the capacity of the channel ensemble generated by W, denoted by $[eqn]$ , as the supremum of rates R for which there exists a sequence of codebooks $[eqn]$ of size $[eqn]$ with $[eqn]$ as $[eqn]$ , where $[eqn]$ is defined as in (33). Comparing (1) with (33), we conclude that

[eqn]

It follows that if the achievable error exponent (4) is positive, then there exists a sequence of codebooks $[eqn]$ of size $[eqn]$ such that $[eqn]$ drops to zero exponentially fast as $[eqn]$ . If, on the other hand, the lower bound (36) on the minimal correct-decoding exponent is positive, then for any sequence of codebooks $[eqn]$ of size $[eqn]$ , the probability of correct decoding $[eqn]$ tends to zero exponentially fast as $[eqn]$ , so that $[eqn]$ tends to 1. Therefore, $[eqn]$ must correspond to a point on the R-axis at which both the maximal achievable error exponent and the lower bound on the minimal correct-decoding exponent of the channel ensemble are equal to zero. By the results of the previous sections, there is exactly one such point; examples are shown in Figure 2. We identify the point $[eqn]$ in the following theorem.
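The zero-crossing argument has a well-known counterpart for an ordinary DMC, which may help intuition: Gallager's random-coding exponent is positive below the Shannon capacity, Arimoto's exponent for the probability of correct decoding is positive above it, and both vanish at capacity. The Python sketch below checks this numerically for a binary symmetric channel BSC(p), in nats, using Gallager's function E_0(rho, P) with the uniform input P. For the BSC, the uniform input is optimal both in the maximization over P (rho in [0, 1]) and in the minimization over P (rho in (-1, 0)), by symmetry and convexity. The sketch concerns the standard DMC only, not the channel ensemble of this paper; the function names, the value p = 0.11, and the grid over rho are our own choices.

```python
import math

def e0_bsc(rho: float, p: float) -> float:
    """Gallager's E_0(rho, P) in nats for BSC(p) with the uniform input P.

    E_0 = -log sum_y (sum_x P(x) W(y|x)^(1/(1+rho)))^(1+rho)
        = rho*log(2) - (1+rho)*log(p^(1/(1+rho)) + (1-p)^(1/(1+rho))).
    """
    s = 1.0 / (1.0 + rho)
    return rho * math.log(2.0) - (1.0 + rho) * math.log(p**s + (1.0 - p) ** s)

def random_coding_exponent(R: float, p: float, steps: int = 1000) -> float:
    """Gallager's E_r(R) = max over rho in [0, 1] of E_0(rho) - rho*R (grid search)."""
    best = max(e0_bsc(k / steps, p) - (k / steps) * R for k in range(steps + 1))
    return max(0.0, best)  # the rho = 0 term is exactly 0; clamp rounding noise

def arimoto_exponent(R: float, p: float, steps: int = 1000) -> float:
    """Arimoto's sup over rho in (-1, 0] of E_0(rho) - rho*R (grid on [-0.99, 0])."""
    rhos = [-k / steps for k in range(steps * 99 // 100 + 1)]
    return max(0.0, max(e0_bsc(r, p) - r * R for r in rhos))

def bsc_capacity(p: float) -> float:
    """Shannon capacity of BSC(p) in nats: log 2 - h(p)."""
    return math.log(2.0) + p * math.log(p) + (1.0 - p) * math.log(1.0 - p)

p = 0.11
C = bsc_capacity(p)
for R in (0.5 * C, C, 1.5 * C):
    print(f"R = {R:.4f} nats: E_r = {random_coding_exponent(R, p):.5f}, "
          f"Arimoto = {arimoto_exponent(R, p):.5f}")
```

At R = C both exponents are zero up to rounding: E_0 is concave in rho with slope C at rho = 0, so E_0(rho) <= rho*C for every rho, and the bracket is maximized at rho = 0.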

Theorem 3 (Ensemble capacity).

[eqn]

where $[eqn]$ is the Shannon capacity of the DMC W, and $[eqn]$ with $[eqn]$ .

Proof. The maximal achievable error exponent, provided by Theorem 1, is

[eqn]

where $[eqn]$ is given by (3). The lower bound on the minimal correct-decoding exponent, given by Theorem 2, can be written as

[eqn]

where $[eqn]$ is given by (46). Since $[eqn]$ if $[eqn]$ for all $[eqn]$ , both expressions (3) and (46), as functions of R, meet zero at the same point, which is $[eqn]$ . This gives

[eqn]

where the last equality follows because $[eqn]$ with $[eqn]$ , $[eqn]$ . □
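Theorem 3 expresses the ensemble capacity in terms of the Shannon capacity of the DMC W. When W is given numerically, that capacity can be computed with the Blahut-Arimoto algorithm. The sketch below is a standard implementation, not code from the paper; it assumes W is a NumPy array with W[x, y] = W(y|x), works in nats, and stops when the usual lower and upper bounds on capacity, log sum_x P(x) exp(D(x)) and max_x D(x), agree to within a tolerance, where D(x) is the divergence between W(.|x) and the output distribution induced by the current input distribution P.

```python
import numpy as np

def blahut_arimoto(W: np.ndarray, tol: float = 1e-10, max_iter: int = 100_000) -> float:
    """Capacity in nats of the DMC with W[x, y] = W(y|x) (rows sum to 1)."""
    num_inputs = W.shape[0]
    P = np.full(num_inputs, 1.0 / num_inputs)  # start from the uniform input
    lower = 0.0
    for _ in range(max_iter):
        Q = P @ W  # output distribution induced by P
        # D[x] = sum_y W(y|x) log(W(y|x) / Q(y)), using the convention 0 log 0 = 0
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = np.where(W > 0, W * np.log(W / Q), 0.0)
        D = terms.sum(axis=1)
        weights = P * np.exp(D)
        lower = float(np.log(weights.sum()))  # lower bound on capacity
        upper = float(D.max())                # upper bound on capacity
        if upper - lower < tol:
            break
        P = weights / weights.sum()  # Blahut-Arimoto update of the input distribution
    return lower

def binary_entropy_nats(p: float) -> float:
    return float(-p * np.log(p) - (1 - p) * np.log(1 - p))

bsc = np.array([[0.89, 0.11], [0.11, 0.89]])
z_channel = np.array([[1.0, 0.0], [0.5, 0.5]])
print(blahut_arimoto(bsc), np.log(2) - binary_entropy_nats(0.11))
print(blahut_arimoto(z_channel), np.log(1.25))
```

The two test channels have known closed-form capacities (log 2 - h(p) for the BSC and log(5/4) for this Z-channel), which makes them convenient sanity checks.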

7. The Optimal Correct-Decoding Exponent

In fact, the lower bound (36) is achievable. As in Section 3, suppose the codebook is generated i.i.d. according to a distribution over $[eqn]$ with probabilities $[eqn]$ , and let $[eqn]$ denote the error probability of the maximum-likelihood decoder, averaged over all messages, codebooks, and channel instances.

Lemma 11 (Achievable correct-decoding exponent).

[eqn]

where $[eqn]$ is defined in (46).

Proof. Consider the following suboptimal decoder. The decoder works with a single anticipated joint type $[eqn]$ of the sent codeword $[eqn]$ and the received vector $[eqn]$ . If the type of $[eqn]$ is not T, the decoder declares an error. Otherwise, if the type of the received block is indeed T, the decoder looks for the indices of the codewords that have conditional type V w.r.t. $[eqn]$ and at least one replica of $[eqn]$ in their clouds, and chooses one of these indices as its estimate $[eqn]$ of the transmitted message. The choice is made uniformly at random, regardless of the actual number of replicas of $[eqn]$ in each cloud. If no codeword of conditional type V w.r.t. $[eqn]$ has at least one $[eqn]$ in its cloud, the decoder again declares an error.

Let $[eqn]$ denote the random number of incorrect codewords in the codebook that have conditional type V w.r.t. $[eqn]$ and at least one replica of $[eqn]$ in their clouds. Then, the conditional probability of correct decoding, given that the joint type of the received and the transmitted blocks is indeed $[eqn]$ , is given by

[eqn]

where the inequality follows from Jensen’s inequality, and the expectation is taken w.r.t. the randomness of both the incorrect codewords and their clouds. Note that the exponent of $[eqn]$ can be expressed as R minus (15) with $[eqn]$ . The RHS of (50) then yields the following upper bound on the exponent of the conditional probability of correct decoding:

[eqn]

Adding to this the exponent of the joint type $[eqn]$ , we obtain (46). □
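The Jensen step can be checked numerically. Given the anticipated joint type, the transmitted codeword always qualifies, so a uniform choice among the N + 1 candidates is correct with probability E[1/(1+N)]. As an illustrative assumption, suppose each of the M - 1 incorrect codewords independently qualifies with a common probability q, so that N is Binomial(M - 1, q). Since x -> 1/(1+x) is convex, Jensen's inequality gives E[1/(1+N)] >= 1/(1+E[N]). The sketch below also evaluates the exponent of this Jensen bound when M = e^{nR} and q = e^{-nE}, where E is a generic per-codeword exponent (in the text, this role is played by (15)); the exponent approaches max{0, R - E}. All function names are hypothetical.

```python
import math

def correct_prob_exact(M: int, q: float) -> float:
    """E[1/(1+N)] for N ~ Binomial(M-1, q), by direct summation over N = k."""
    return sum(
        math.comb(M - 1, k) * q**k * (1 - q) ** (M - 1 - k) / (k + 1)
        for k in range(M)
    )

def correct_prob_closed_form(M: int, q: float) -> float:
    """Same expectation in closed form: (1 - (1 - q)^M) / (M q), for q > 0."""
    return (1 - (1 - q) ** M) / (M * q)

def jensen_bound(M: int, q: float) -> float:
    """Jensen's inequality with the convex map x -> 1/(1+x): 1 / (1 + E[N])."""
    return 1 / (1 + (M - 1) * q)

def jensen_exponent(n: int, R: float, E: float) -> float:
    """-(1/n) log of the Jensen bound when M = e^{nR} and q = e^{-nE}."""
    mean_n = (math.exp(n * R) - 1) * math.exp(-n * E)  # E[N] = (M - 1) q
    return math.log1p(mean_n) / n

M, q = 200, 0.01
print(correct_prob_exact(M, q), correct_prob_closed_form(M, q), jensen_bound(M, q))
for n in (50, 100, 200):
    print(n, jensen_exponent(n, R=0.5, E=0.3))  # tends to max(0, R - E) = 0.2
```

The closed form follows from the identity C(M-1, k)/(k+1) = C(M, k+1)/M, and it shows that, for this binomial model, the Jensen bound loses at most a constant factor when M q is large, so no exponential loss is incurred.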

Now, since $[eqn]$ , by (36) of Theorem 2 and Lemma 11, we have the following.

Theorem 4 (Optimal correct-decoding exponent).

[eqn]

where $[eqn]$ is defined in (34) and $[eqn]$ is defined as in Section 4. This exponent is achievable by random coding.
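The achievability in Theorem 4 rests on the single-type decoder from the proof of Lemma 11. A minimal Python sketch of that decoder follows, under assumptions of our own: sequences are tuples, each codeword's cloud is a list of output sequences (so replicas are allowed), T is the type of the received block, and the conditional type V is encoded by the joint counts of the pairs (x_i, y_i), which determines V once the type of y is fixed to T. The names `type_of`, `joint_type`, and `single_type_decoder` are hypothetical, and `None` stands for a declared error.

```python
import random
from collections import Counter

def type_of(seq) -> Counter:
    """Empirical type (symbol counts) of a sequence."""
    return Counter(seq)

def joint_type(x, y) -> Counter:
    """Joint type of two equal-length sequences, as counts of pairs (x_i, y_i)."""
    return Counter(zip(x, y))

def single_type_decoder(y, codebook, clouds, T, V, rng=random):
    """Hypothetical sketch of the single-type decoder; returns an index or None (error).

    T: anticipated type of the received block y (a Counter over output symbols).
    V: anticipated conditional type of the codeword given y, encoded by joint counts.
    """
    if type_of(y) != T:
        return None  # received block does not have the anticipated type: error
    candidates = [
        m
        for m, (x, cloud) in enumerate(zip(codebook, clouds))
        if joint_type(x, y) == V and y in cloud  # at least one replica of y
    ]
    if not candidates:
        return None  # no qualifying codeword: error
    # Uniform choice, regardless of how many replicas of y each cloud contains.
    return rng.choice(candidates)

codebook = [(0, 0, 1, 1), (0, 1, 1, 0), (1, 1, 1, 0)]
clouds = [
    [(0, 0, 1, 0), (0, 1, 1, 1)],
    [(0, 0, 1, 0), (0, 0, 1, 0), (1, 1, 0, 0)],  # two replicas of the same output
    [(1, 0, 1, 0)],
]
y = (0, 0, 1, 0)
T, V = type_of(y), joint_type(codebook[0], y)
print(single_type_decoder(y, codebook, clouds, T, V, random.Random(0)))  # 0 or 1
```

In the example, codewords 0 and 1 both have the anticipated joint type with y and contain y in their clouds, so each is chosen with probability 1/2 even though cloud 1 holds two replicas of y.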

References


1. Somekh-Baruch, A. On Achievable Rates and Error Exponents for Channels with Mismatched Decoding. IEEE Trans. Inf. Theory 2015, 61, 727–740. doi:10.1109/TIT.2014.2385699
2. Gallager, R.G. Information Theory and Reliable Communication; John Wiley & Sons: Hoboken, NJ, USA, 1968.
3. Gallager, R.G. The Random Coding Bound is Tight for the Average Code. IEEE Trans. Inf. Theory 1973, 19, 244–246. doi:10.1109/TIT.1973.1054971
4. Arimoto, S. On the Converse to the Coding Theorem for Discrete Memoryless Channels. IEEE Trans. Inf. Theory 1973, 19, 357–359. doi:10.1109/TIT.1973.1055007
5. Tridenski, S.; Somekh-Baruch, A. A Generalization of the DMC. In Proceedings of the 2020 IEEE Information Theory Workshop (ITW), Riva del Garda, Italy, 11–15 April 2021.
6. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley & Sons: Hoboken, NJ, USA, 1991.
7. Dueck, G.; Körner, J. Reliability Function of a Discrete Memoryless Channel at Rates above Capacity. IEEE Trans. Inf. Theory 1979, 25, 82–85. doi:10.1109/TIT.1979.1056003
8. Feller, W. An Introduction to Probability Theory and Its Applications, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 1968; Volume 1.