Lossless Source Coding in the Point-to-Point, Multiple Access, and Random Access Scenarios

Shuqing Chen; Michelle Effros; Victoria Kostina

arXiv:1902.03366·cs.IT·October 13, 2020


TL;DR

This paper develops a unified analysis of lossless source coding in various scenarios, providing third-order performance characterizations and introducing a novel random access coding scheme with rateless coding and feedback.

Contribution

It introduces a third-order analysis for multiple scenarios and proposes a new random access source coding method with provable optimality.

Findings

1. Third-order characterization of Slepian-Wolf rate region.

2. Independent encoders can match joint encoder performance.

3. A rateless coding scheme with feedback achieves optimal performance.

Abstract

This work studies point-to-point, multiple access, and random access lossless source coding in the finite-blocklength regime. In each scenario, a random coding technique is developed and used to analyze third-order coding performance. Asymptotic results include a third-order characterization of the Slepian-Wolf rate region with an improved converse that relies on a connection to composite hypothesis testing. For dependent sources, the result implies that the independent encoders used by Slepian-Wolf codes can achieve the same third-order-optimal performance as a single joint encoder. The concept of random access source coding is introduced to generalize multiple access (Slepian-Wolf) source coding to the case where encoders decide independently whether or not to participate and the set of participating encoders is unknown a priori to both the encoders and the decoder. The proposed…


Equations

$$R^{*}(n,\epsilon)\approx H(X)+\sqrt{\frac{V(X)}{n}}\,Q^{-1}(\epsilon)-\frac{\log n}{2n},$$

$\Phi(z)$

$Q(z)$

$$\phi(z)\triangleq\Phi^{\prime}(z)=\frac{1}{\sqrt{2\pi}}e^{-\frac{z^{2}}{2}}.$$

$\Phi(\mathsf{V};\mathbf{z})\triangleq\Phi(\mathsf{V};z_{1},\ldots,z_{d})$

$\imath(\mathbf{x}_{\mathcal{A}})$

$\imath(\mathbf{x}_{\mathcal{A}}|\mathbf{x}_{\mathcal{B}})$

$H(\mathbf{X}_{\mathcal{A}})$

$H(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})$

$V(\mathbf{X}_{\mathcal{A}})$

$V(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})$

$T(\mathbf{X}_{\mathcal{A}})$

$T(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})$

$V_{c}(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})$

$T_{c}(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})$

$M^{*}(n,\epsilon)$

$R^{*}(n,\epsilon)$

$$\lim_{n\to\infty}R^{*}(n,\epsilon)=H(X),\quad\forall\,\epsilon\in(0,1).$$

$R^{*}(n,\epsilon)$

$$n>\frac{1}{4}\left(1+\frac{T(X)}{2V(X)^{3/2}}\right)^{2}\frac{1}{\left(\phi(Q^{-1}(\epsilon))\,Q^{-1}(\epsilon)\right)^{2}},$$

$R^{*}(n,\epsilon)$

$$1-\epsilon\leq\frac{M^{*}(n,\epsilon)}{|\mathcal{X}|^{n}}\leq 1-\epsilon+\frac{1}{|\mathcal{X}|^{n}}.$$

$$H(X)-\frac{1}{n}\log\frac{1}{1-\epsilon}\leq R^{*}(n,\epsilon)$$

$\leq$

$$\epsilon\leq\mathbb{P}\left[\imath(X)>\log M-\gamma\right]+\exp(-\gamma),\quad\forall\,\gamma>0.$$

$$R^{*}(n,\epsilon)\leq H(X)+\sqrt{\frac{V(X)}{n}}\,Q^{-1}(\epsilon)+\frac{\log n}{2n}+O\left(\frac{1}{n}\right).$$

$$\epsilon\leq\mathbb{E}\left[\exp\left\{-\left|\log M-\imath(X)\right|_{+}\right\}\right].$$

$$\epsilon\leq\mathbb{P}\left[\imath(X)>\log\gamma\right]+\frac{1}{M}\mathbb{U}\left[\imath(X)\leq\log\gamma\right],$$

$$H_{0}:P_{X},\ \text{selected if }\imath(X)\leq\log M$$

$$H_{1}:U_{X},\ \text{selected if }\imath(X)>\log M.$$

$$R^{*}(n,\epsilon)\leq H(X)+\sqrt{\frac{V(X)}{n}}\,Q^{-1}(\epsilon)+O\left(\frac{1}{n}\right).$$

$$\epsilon\leq\mathbb{E}\left[\min\left\{1,\frac{1}{M}\mathbb{E}\left[\exp\left(\imath(\bar{X})\right)1\left\{\imath(\bar{X})\leq\imath(X)\right\}\,\middle|\,X\right]\right\}\right],$$

$$\mathsf{g}(c)=\arg\max_{x\in\mathcal{X}:\,\mathsf{F}(x)=c}P_{X}(x)=\arg\min_{x\in\mathcal{X}:\,\mathsf{F}(x)=c}\imath(x).$$

$$\mathcal{E}\triangleq\left\{\exists\,\bar{x}\in\mathcal{X}\setminus\{X\}\ \text{s.t.}\ \imath(\bar{x})\leq\imath(X),\ \mathsf{F}(\bar{x})=\mathsf{F}(X)\right\},$$

$$\mathbb{E}\left[\mathbb{P}\left[\mathsf{g}(\mathsf{F}(X))\neq X\,\middle|\,\mathsf{F}(\cdot)\right]\right]$$

$\leq$


Full text

Lossless Source Coding in the Point-to-Point, Multiple Access, and Random Access Scenarios

Shuqing Chen, Michelle Effros, and Victoria Kostina

Manuscript received September 4, 2019; revised May 27, 2020; accepted June 8, 2020. This work is supported in part by the National Science Foundation under Grants CCF-1817241 and CCF-1956386. The work of S. Chen is supported in part by the Oringer Fellowship Fund in Information Science and Technology. This paper was presented in part at the 2019 IEEE International Symposium on Information Theory [1]. Matlab code for the computation of nonasymptotic bounds in this paper is available at github [2].

Shuqing Chen was with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125 USA. She is now with Virtu Financial Inc., New York, NY 10006 USA. (e-mail: [email protected]).

Michelle Effros and Victoria Kostina are with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA 91125, USA. (e-mail: [email protected], [email protected]).

Communicated by I. Kontoyiannis, Associate Editor At Large. Digital Object Identifier 10.1109/TIT.2020.3005155

Abstract

This work studies point-to-point, multiple access, and random access lossless source coding in the finite-blocklength regime. In each scenario, a random coding technique is developed and used to analyze third-order coding performance. Asymptotic results include a third-order characterization of the Slepian-Wolf rate region with an improved converse that relies on a connection to composite hypothesis testing. For dependent sources, the result implies that the independent encoders used by Slepian-Wolf codes can achieve the same third-order-optimal performance as a single joint encoder. The concept of random access source coding is introduced to generalize multiple access (Slepian-Wolf) source coding to the case where encoders decide independently whether or not to participate and the set of participating encoders is unknown a priori to both the encoders and the decoder. The proposed random access source coding strategy employs rateless coding with scheduled feedback. A random coding argument proves the existence of a single deterministic code of this structure that simultaneously achieves the third-order-optimal Slepian-Wolf performance for each possible active encoder set.

Index Terms:

Lossless source coding, Slepian-Wolf, random access, finite blocklength, random coding, non-asymptotic information theory, Gaussian approximation, hypothesis testing, meta-converse.

I Introduction

We study the fundamental limits of fixed-length, finite-blocklength lossless source coding in three scenarios:

1. Point-to-point: A single source is compressed by a single encoder and decompressed by a single decoder.

2. Multiple access: Each source in a fixed set of sources is compressed by an independent encoder; all sources are decompressed by a joint decoder.

3. Random access: Each active source from some set of possible sources is compressed by an independent encoder; all active sources are decompressed by a joint decoder.

The information-theoretic limit in any lossless source coding scenario is the set of code sizes or rates at which a desired level of reconstruction error is achievable. Shannon’s theory [3] analyzes this fundamental limit by allowing an arbitrarily long encoding blocklength in order to obtain a vanishing error probability. Finite-blocklength limits [4, 5, 6, 7], which are of particular interest in delay-sensitive and computationally-constrained coding environments, allow a non-vanishing error probability and study refined asymptotics of the rates achievable with encoding blocklength $n$ . Due to their non-vanishing error probability, the resulting codes are sometimes called “almost-lossless” source codes. We here use the term “source coding” to refer to this almost-lossless coding paradigm.

In point-to-point source coding, non-asymptotic bounds and asymptotic expansions of the minimum achievable rate appear in [8, 4, 9, 10, 6]. In [6], Kontoyiannis and Verdú analyze the optimal code to give a third-order characterization of the minimum achievable rate $R^{*}(n,\epsilon)$ at blocklength $n$ and error probability $\epsilon$ . For a finite-alphabet, stationary, memoryless source with single-letter distribution $P_{X}$ , entropy $H(X)$ , and varentropy $V(X)>0$ ,

$$R^{*}(n,\epsilon)\approx H(X)+\sqrt{\frac{V(X)}{n}}\,Q^{-1}(\epsilon)-\frac{\log n}{2n},\tag{1}$$

where $Q^{-1}(\cdot)$ is the inverse complementary Gaussian distribution function, and any higher-order term is bounded by $O\big{(}\frac{1}{n}\big{)}$ .
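As a rough numerical illustration, the sketch below evaluates the three terms of the approximation (1), $H(X)+\sqrt{V(X)/n}\,Q^{-1}(\epsilon)-\frac{\log n}{2n}$, in nats. The Bernoulli(0.11) source, the helper names, and the chosen blocklengths are assumptions made for this example only; they do not come from the paper or from its Matlab code.

```python
import math
from statistics import NormalDist


def info_moments(pmf):
    """Entropy H(X) and varentropy V(X), in nats, of a pmf given as a list."""
    h = sum(p * math.log(1 / p) for p in pmf if p > 0)
    v = sum(p * (math.log(1 / p) - h) ** 2 for p in pmf if p > 0)
    return h, v


def q_inv(eps):
    """Inverse of the complementary Gaussian cdf Q."""
    return NormalDist().inv_cdf(1 - eps)


def rate_approx(pmf, n, eps):
    """First three terms of (1): H + sqrt(V/n) * Q^{-1}(eps) - log(n) / (2n)."""
    h, v = info_moments(pmf)
    return h + math.sqrt(v / n) * q_inv(eps) - math.log(n) / (2 * n)


pmf = [0.11, 0.89]  # hypothetical Bernoulli(0.11) source
entropy, _ = info_moments(pmf)
for n in (100, 1000, 10000):
    print(n, rate_approx(pmf, n, eps=0.1), entropy)
```

For $\epsilon<\frac{1}{2}$ the printed approximation sits above the entropy and approaches it as $n$ grows; at $\epsilon=\frac{1}{2}$ the second-order term vanishes and only the $-\frac{\log n}{2n}$ correction remains.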

For a multiple access source code (MASC), also known as a Slepian-Wolf (SW) source code [11], the fundamental limit is the set of achievable rate tuples known as the rate region. The first-order rate region for stationary, memoryless and general sources appears in [11] and [12, 9], respectively. Second-order asymptotic expansions of the MASC rate region for stationary, memoryless sources appear in [13, 14]. Tan and Kosut’s characterization [13] is similar in form to the first two terms of (1), with varentropy $V(X)$ replaced by the entropy dispersion matrix and third-order term bounded by $O\big{(}\frac{\log n}{n}\big{)}$ .

For point-to-point source coding, our contributions include non-asymptotic characterizations of the performance of randomly designed codes using threshold and maximum-likelihood decoders. The former analysis demonstrates that combining random coding with the best possible threshold decoder cannot achieve $-\frac{\log n}{2n}$ in the third-order term in (1), and thus it is strictly sub-optimal. The latter shows that combining random coding with maximum likelihood decoding achieves the first three terms in (1). We derive both bounds by deriving and analyzing a source coding analog to the random coding union (RCU) bound from channel coding [5, Th. 16]. Our asymptotic expansion is achieved by a random code rather than the optimal code from [6]. Thus, there is no loss (up to the third-order term) due to random code design, which in turn shows that many codes have near-optimal performance; further, since our RCU bound holds when restricted to linear compressors, there are many good linear codes. The RCU bound is also important because it generalizes to the MASC and other scenarios where the optimal code is not known.

Our MASC RCU bound yields a new MASC achievability bound (Theorem 18). Establishing a link to composite hypothesis testing (HT) yields a new MASC HT converse (Theorem 19), which extends the meta-converse for channel coding in [5] to source coding with multiple encoders. This converse recovers and improves the previous converse due to Han [9, Lemma 7.2.2] and is equivalent to the LP-based converse of Jose and Kulkarni [15], which is the current best MASC converse. Our analysis of composite HT, including both non-asymptotic and asymptotic characterizations, develops tools with potential application in other multiple-terminal communication scenarios and beyond. The MASC RCU bound and HT converse together yield the third-order MASC rate region for stationary, memoryless sources (Theorem 20), revealing a $-\frac{\log n}{2n}$ third-order term that is independent of the number of encoders. This tightens the $O\big{(}\frac{\log n}{n}\big{)}$ third-order bound from [13], which grows linearly with the source alphabet size and exponentially with the number of encoders. For dependent sources, the MASC’s third-order-optimal sum rate equals the third-order-optimal rate achievable through joint encoding.

While a MASC assumes a fixed, known collection of encoders, the set of transmitters communicating with a given access point in applications like sensor networks, the internet of things, and random access communication may be unknown or time-varying. The information theory literature treats the resulting channel coding challenges in papers such as [16, 17, 18]. We introduce the notion of a random access source code (RASC) and tackle the resulting source coding challenges. The RASC extends the MASC to scenarios where some encoders are inactive, and the decoder seeks to reliably reconstruct the sources associated with the active encoders assuming that the set of active encoders is unknown a priori.

We propose and analyze a robust RASC with rateless encoders that transmit codewords symbol by symbol until the receiver tells them to stop. Unlike typical rateless codes, which allow arbitrary decoding times [19, 20, 21, 22], our code employs a small set of decoding times. Single-bit feedback from the decoder to all encoders at each potential decoding time tells the encoders whether or not to continue transmitting.

We demonstrate (Theorem 24) that there exists a single deterministic RASC that simultaneously achieves, for every possible set of active encoders, the third-order-optimal MASC performance for the active source set. Since traditional random coding arguments do not guarantee the existence of a single deterministic code that meets multiple independent constraints, prior code designs for multiple-constraint scenarios (e.g., [21]) employ a family of codes indexed using common randomness shared by all communicators. We develop an alternative approach, deriving a refined random coding argument (Lemma 25) that demonstrates the existence of a single deterministic code that meets all our constraints simultaneously; this technique may eliminate the need for common randomness in other communication scenarios. For stationary, memoryless, permutation-invariant sources, employing identical encoders at all transmitters reduces RASC design complexity.

Except where noted, all presented source coding results apply to both finite and countably infinite source alphabets.

The organization of this paper is as follows. Section II defines notation. Section III treats (point-to-point) source coding. Section IV studies composite HT, developing general tools for multiple-encoder communication scenarios. Section V treats the MASC. Section VI introduces and studies the RASC. Each of Sections III, V, and VI follows a similar flow:

1. For the (point-to-point) source code: Section III-A defines the problem. Section III-B provides historical background. Section III-C presents our new random coding achievability bounds and their asymptotic expansions.

2. For the MASC: Section V-A gives definitions. Section V-B provides historical background. Section V-C presents new non-asymptotic bounds. Section V-D presents the third-order MASC characterization, comparing MASC and point-to-point source coding performance. Section V-E bounds the impact of limited feedback (and cooperation) on the third-order-optimal MASC region.

3. For the RASC: Section VI-A defines the problem and describes our proposed code. Section VI-B highlights related work. Section VI-C derives converse and achievability characterizations for our proposed code’s finite-blocklength performance. Section VI-D treats the simplified code for permutation-invariant sources.

Section VII contains concluding remarks. Proofs of auxiliary results appear in the appendices.

II Notation

For any positive integer $i$ , let $[i]\triangleq\{1,\ldots,i\}$ . We use uppercase letters (e.g., $X$ ) for random variables, lowercase letters (e.g., $x$ ) for scalar values, calligraphic uppercase letters (e.g., $\mathcal{E}$ ) for subsets of a sample space (events) or index sets, and script uppercase letters (e.g., $\mathscr{Q}$ ) for subsets of a Euclidean space. We use both bold face and superscripts for vectors (e.g., $\mathbf{x}=x^{n}$ , $\mathbf{1}=(1,\ldots,1)$ , and $\mathbf{0}=(0,\ldots,0)$ ). Given a sequence $(x_{1},x_{2},\ldots)$ with element $x_{i}$ in set $\mathcal{X}_{i}$ for each $i$ and given an ordered index set $\mathcal{T}\subseteq\mathbb{N}$ , we define vector $\mathbf{x}_{\mathcal{T}}\triangleq(x_{i},\;i\in\mathcal{T})$ and set $\mathcal{X}_{\mathcal{T}}\triangleq\prod_{i\in\mathcal{T}}\mathcal{X}_{i}$ . Given a set $\mathcal{X}$ , $\mathcal{X}^{n}$ is the $n$ -fold Cartesian product of $\mathcal{X}$ . We denote matrices by sans serif uppercase letters (e.g., $\mathsf{V}$ ) and the $(i,j)$ -th element of matrix $\mathsf{V}$ by $[\mathsf{V}]_{i,j}$ . Inequalities between two vectors of the same dimension indicate elementwise inequalities. Given vector $\mathbf{u}\in\mathbb{R}^{d}$ and set $\mathscr{Q}\subset\mathbb{R}^{d}$ , $\mathbf{u}+\mathscr{Q}$ denotes the Minkowski sum of $\{\mathbf{u}\}$ and $\mathscr{Q}$ , giving $\mathbf{u}+\mathscr{Q}\triangleq\left\{\mathbf{u}+\mathbf{q}:\mathbf{q}\in\mathscr{Q}\right\}$ . For two functions $u(n)$ and $f(n)$ , $u(n)=O(f(n))$ if there exist $c,\,n_{0}\in\mathbb{R}_{+}$ such that $0\leq u(n)\leq cf(n)$ for all $n>n_{0}$ . For a $d$ -dimensional function $\mathbf{u}:\mathbb{N}\rightarrow\mathbb{R}^{d}$ , $\mathbf{u}(n)=O(f(n))\mathbf{1}$ if $u_{i}(n)=O(f(n))$ for all $i\in[d]$ . 
For any finite set $\mathcal{A}$ , $\mathcal{P}(\mathcal{A})$ represents the power set of $\mathcal{A}$ excluding the empty set, giving $\mathcal{P}(\mathcal{A})\triangleq\{\mathcal{T}:\mathcal{T}\subseteq\mathcal{A}\}\setminus\emptyset$ . We use $|\cdot|_{+}\triangleq\max\{0,\cdot\}$ . All uses of ‘ $\log$ ’ and ‘ $\exp$ ’, if not specified, employ an arbitrary common base, which determines the information unit.

Denote the standard and complementary Gaussian cumulative distribution functions (cdf) by $\Phi(z)$ and $Q(z)$ , giving

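$$\Phi(z)\triangleq\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z}e^{-\frac{u^{2}}{2}}\,du,\tag{2}$$

$$Q(z)\triangleq 1-\Phi(z).\tag{3}$$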

Function $Q^{-1}(\cdot)$ denotes the inverse of $Q(\cdot)$ . The standard Gaussian probability density function is

$$\phi(z)\triangleq\Phi^{\prime}(z)=\frac{1}{\sqrt{2\pi}}e^{-\frac{z^{2}}{2}}.\tag{4}$$

The $d$ -dimensional generalization of the Gaussian cdf is

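$$\Phi(\mathsf{V};\mathbf{z})\triangleq\Phi(\mathsf{V};z_{1},\ldots,z_{d})\triangleq\frac{1}{\sqrt{(2\pi)^{d}|\mathsf{V}|}}\int_{-\infty}^{z_{1}}\cdots\int_{-\infty}^{z_{d}}e^{-\frac{1}{2}\mathbf{u}^{T}\mathsf{V}^{-1}\mathbf{u}}\,du_{d}\cdots du_{1}.\tag{5}$$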

Given an ordered index set $\mathcal{T}\subset\mathbb{N}$ , let $P_{\mathbf{X}_{\mathcal{T}}}$ be a distribution defined on countable alphabet $\mathcal{X}_{\mathcal{T}}$ . For any $\mathcal{A},\mathcal{B}\subseteq\mathcal{T}$ with $\mathcal{A}\cap\mathcal{B}=\emptyset$ and any $(\mathbf{x}_{\mathcal{A}},\mathbf{x}_{\mathcal{B}})\in\mathcal{X}_{\mathcal{A}}\times\mathcal{X}_{\mathcal{B}}$ , the information and conditional information are defined as

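$$\imath(\mathbf{x}_{\mathcal{A}})\triangleq\log\frac{1}{P_{\mathbf{X}_{\mathcal{A}}}(\mathbf{x}_{\mathcal{A}})},\tag{6}$$

$$\imath(\mathbf{x}_{\mathcal{A}}|\mathbf{x}_{\mathcal{B}})\triangleq\log\frac{1}{P_{\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}}}(\mathbf{x}_{\mathcal{A}}|\mathbf{x}_{\mathcal{B}})}.\tag{7}$$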

The corresponding entropy, conditional entropy, varentropy, conditional varentropy, third centered moment of information, and third centered moment of conditional information are defined by, respectively,

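$$H(\mathbf{X}_{\mathcal{A}})\triangleq\mathbb{E}\left[\imath(\mathbf{X}_{\mathcal{A}})\right],\tag{8}$$

$$H(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})\triangleq\mathbb{E}\left[\imath(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})\right],\tag{9}$$

$$V(\mathbf{X}_{\mathcal{A}})\triangleq\mathrm{Var}\left[\imath(\mathbf{X}_{\mathcal{A}})\right],\tag{10}$$

$$V(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})\triangleq\mathrm{Var}\left[\imath(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})\right],\tag{11}$$

$$T(\mathbf{X}_{\mathcal{A}})\triangleq\mathbb{E}\left[\left|\imath(\mathbf{X}_{\mathcal{A}})-H(\mathbf{X}_{\mathcal{A}})\right|^{3}\right],\tag{12}$$

$$T(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})\triangleq\mathbb{E}\left[\left|\imath(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})-H(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})\right|^{3}\right].\tag{13}$$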

We also define random variables

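$$V_{c}(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})\triangleq\mathrm{Var}\left[\imath(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})\,\middle|\,\mathbf{X}_{\mathcal{B}}\right],\tag{14}$$

$$T_{c}(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})\triangleq\mathbb{E}\left[\left|\imath(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})-\mathbb{E}\left[\imath(\mathbf{X}_{\mathcal{A}}|\mathbf{X}_{\mathcal{B}})\,\middle|\,\mathbf{X}_{\mathcal{B}}\right]\right|^{3}\,\middle|\,\mathbf{X}_{\mathcal{B}}\right].\tag{15}$$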
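To make the moment definitions above concrete, the following sketch computes the conditional entropy, conditional varentropy, and third centered moment of the conditional information for a two-source joint pmf, together with the values taken by the random variable $V_{c}$. It assumes the third moment is taken with an absolute value, as in the Berry-Esseen inequality; the dictionary layout, function names, and the doubly symmetric binary example are illustrative choices, not taken from the paper.

```python
import math


def marginal_b(joint):
    """Marginal pmf of X_B from joint[(a, b)] = P(X_A = a, X_B = b)."""
    p_b = {}
    for (_, b), p in joint.items():
        p_b[b] = p_b.get(b, 0.0) + p
    return p_b


def conditional_info_moments(joint):
    """Return H(A|B), V(A|B), T(A|B) in nats."""
    p_b = marginal_b(joint)
    # Conditional information i(a|b) = log 1 / P(a|b) on the support.
    info = {k: math.log(p_b[k[1]] / p) for k, p in joint.items() if p > 0}
    h = sum(joint[k] * i for k, i in info.items())
    v = sum(joint[k] * (i - h) ** 2 for k, i in info.items())
    t = sum(joint[k] * abs(i - h) ** 3 for k, i in info.items())
    return h, v, t


def conditional_varentropy_rv(joint):
    """Values of V_c(A|B) as a function of b: Var[i(A|B) | B = b]."""
    out = {}
    for b, pb in marginal_b(joint).items():
        cond = [p / pb for (_, bb), p in joint.items() if bb == b and p > 0]
        mean = sum(q * math.log(1 / q) for q in cond)
        out[b] = sum(q * (math.log(1 / q) - mean) ** 2 for q in cond)
    return out


# Hypothetical doubly symmetric binary source with crossover probability 0.1.
joint = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}
print(conditional_info_moments(joint))
print(conditional_varentropy_rv(joint))
```

For this symmetric example the conditional information has the same distribution for every value of $b$, so $V_{c}$ is constant and equals the conditional varentropy.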

III Point-to-Point Source Coding

III-A Definitions

In point-to-point source coding, the encoder maps a discrete random variable $X$ defined on finite or countably infinite alphabet $\mathcal{X}$ into a message from codebook $[M]$ . The decoder reconstructs $X$ from the compressed description. Formal definitions of codes and their information-theoretic limits follow. For prior definitions, see, for example, [9, Chapter 1].

Definition 1 (Point-to-point source code).

An $(M,\epsilon)$ code for a random variable $X$ with discrete alphabet $\mathcal{X}$ comprises an encoding function $\mathsf{f}\colon\mathcal{X}\rightarrow[M]$ and a decoding function $\mathsf{g}\colon[M]\rightarrow\mathcal{X}$ with error probability $\mathbb{P}\left[\mathsf{g}(\mathsf{f}(X))\neq X\right]\leq\epsilon$ .

Definition 2 (Block point-to-point source code).

An $(n,M,\epsilon)$ code is an $(M,\epsilon)$ code defined for a random vector $X^{n}$ with discrete vector alphabet $\mathcal{X}^{n}$ .

Definition 3 (Minimum achievable rate).

The minimum code size $M^{*}(n,\epsilon)$ and rate $R^{*}(n,\epsilon)$ achievable at blocklength $n$ and error probability $\epsilon$ are defined as

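$$M^{*}(n,\epsilon)\triangleq\min\left\{M:\exists\,(n,M,\epsilon)\text{ code}\right\},\tag{16}$$

$$R^{*}(n,\epsilon)\triangleq\frac{1}{n}\log M^{*}(n,\epsilon).\tag{17}$$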
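For small examples, $M^{*}(n,\epsilon)$ can be computed by brute force. A decoder $\mathsf{g}\colon[M]\rightarrow\mathcal{X}$ has at most $M$ distinct outputs, so the best an $(M,\epsilon)$ code can do is reproduce the $M$ most probable outcomes correctly. The sketch below uses that observation; the function names, the floating-point tolerance, and the Bernoulli(0.11) example are assumptions made for illustration.

```python
import math
from itertools import product


def min_code_size(pmf, eps):
    """Smallest M whose M most probable outcomes carry mass >= 1 - eps."""
    mass, m = 0.0, 0
    for p in sorted(pmf, reverse=True):
        if mass >= 1 - eps - 1e-12:  # small tolerance for float round-off
            break
        mass += p
        m += 1
    return m


def block_pmf(pmf, n):
    """Pmf of X^n for a stationary, memoryless source (exhaustive, small n only)."""
    return [math.prod(seq) for seq in product(pmf, repeat=n)]


n, eps = 10, 0.1
m_star = min_code_size(block_pmf([0.11, 0.89], n), eps)
print(m_star, math.log(m_star) / n)  # code size and rate in nats per symbol
```

The exhaustive enumeration grows as $|\mathcal{X}|^{n}$, so this is only practical for short blocklengths; grouping sequences by type would scale further.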

A discrete information source is a sequence of discrete random variables, $X_{1},X_{2},\ldots$ , specified by the transition probability kernels $P_{X_{i}|X^{i-1}}$ , $i=1,2,\ldots$ While Definition 2 applies to many classes of sources, including sources with memory and non-stationary sources, our asymptotic analysis focuses on stationary, memoryless sources, where $P_{X_{i}|X^{i-1}}=P_{X}$ for all $i=1,2,\ldots$ (i.e., $X_{1},X_{2},\ldots$ are i.i.d.).

III-B Background

Shannon’s source coding theorem [3] describes the fundamental limit on the asymptotic performance for lossless source coding on a stationary, memoryless source, giving

$$\lim_{n\to\infty}R^{*}(n,\epsilon)=H(X),\quad\forall\,\epsilon\in(0,1).\tag{18}$$

In the finite-blocklength regime, Kontoyiannis and Verdú [6] characterize $R^{*}(n,\epsilon)$ using upper and lower bounds that match in their first three terms and show an $O\left(\frac{1}{n}\right)$ fourth-order gap.

Theorem 1 (Kontoyiannis and Verdú [6]).

Consider a stationary, memoryless source with finite alphabet $\mathcal{X}$ , single-letter distribution $P_{X}$ , and varentropy $V(X)>0$ . (The bounds below, which are stated in a base-2 logarithmic scale in [6], hold for any base. The base of the logarithm determines the information unit.) Then (achievability) for all $0<\epsilon\leq\frac{1}{2}$ and all $n>\left(\frac{T(X)}{V(X)^{3/2}\epsilon}\right)^{2}$ (according to [6], the achievability bound holds for any $n\geq 1$ ; notice, however, that it only becomes meaningful when $n$ exceeds this threshold),

[TABLE]

(converse) for all $0<\epsilon\leq\frac{1}{2}$ and all

[TABLE]

Remark 1.

Although [6, Theorem 1] restricts attention to $0<\epsilon\leq\frac{1}{2}$ and $\mathcal{X}$ finite, the proof in [6] applies for all $0<\epsilon<1$ and any countable source alphabet, achieving the same first three terms in (1) and (19) and fourth-order term $\pm O\left(\frac{1}{n}\right)$ (which varies with $\epsilon$ ) provided that the third centered moment $T(X)$ of the information random variable $\imath(X)$ is finite.

Remark 2.

When $V(X)=0$ , the source is uniformly distributed over a finite alphabet (i.e., non-redundant), and $H(X)=\log|\mathcal{X}|$ . The optimal code maps any $1-\epsilon$ fraction of possible source outcomes to unique codewords, giving

$$1-\epsilon\leq\frac{M^{*}(n,\epsilon)}{|\mathcal{X}|^{n}}\leq 1-\epsilon+\frac{1}{|\mathcal{X}|^{n}}.$$

As a result, when $P_{X}$ is uniform,

[TABLE]

which matches (1) up to the second order (since $V(X)=0$ ) but omits the $-\frac{\log n}{2n}$ third-order term.

Remark 3.

While it is not captured by our notation, $R^{*}(n,\epsilon)$ is a function of $P_{X}$ . Since the $-\frac{\log n}{2n}$ third-order term appears in (1) and (19) but not in (20), the bound on $R^{*}(n,\epsilon)$ , when viewed as a function of $P_{X}$ , is discontinuous at the point where $P_{X}$ equals the uniform distribution on $\mathcal{X}$ . In contrast, $R^{*}(n,\epsilon)$ , which is known and calculable, is continuous. The problem arises because Berry-Esseen type bounds are loose for small $V(X)$ . Thus for any finite $n$ , the achievability bound in (1) blows up as $V(X)\rightarrow 0$ . See Figure 1. Theorem 1 states that for any $V(X)>0$ there exists some $n_{0}=n_{0}(P_{X},\epsilon)$ such that for all $n>n_{0}$ , $R^{*}(n,\epsilon)$ behaves like $-\frac{\log n}{2n}$ in the third-order term; the smaller the value of $V(X)$ , the larger $n_{0}$ must be.

Achievability results based on Shannon’s random coding argument [3] are important because they do not require knowledge of the optimal code, which is available only in a few special communication scenarios (e.g., [6, 7]). The following random coding achievability bound is obtained by assigning source realizations to codewords independently and uniformly at random. (Tighter bounds based on the optimal code appear in [9, Lemma 1.3.1] and [23, Remark 5].) The threshold decoder decodes to $x\in\mathcal{X}$ if and only if $x$ is a unique source realization that (i) is compatible with the observed codeword under the given (random) code design, and (ii) has information $\imath(x)$ below $\log M-\gamma$ .

Theorem 2 (e.g. [24], [25, Th. 9.4]).

There exists an $(M,\epsilon)$ code for discrete random variable $X$ such that

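$$\epsilon\leq\mathbb{P}\left[\imath(X)>\log M-\gamma\right]+\exp(-\gamma),\quad\forall\,\gamma>0.\tag{21}$$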

Particularizing (21) to a stationary, memoryless source with single-letter distribution $P_{X}$ satisfying $V(X)>0$ and $T(X)<\infty$ , choosing $\log M$ and $\gamma$ optimally, and applying the Berry-Esseen inequality (see Theorem 6 below) gives

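$$R^{*}(n,\epsilon)\leq H(X)+\sqrt{\frac{V(X)}{n}}\,Q^{-1}(\epsilon)+\frac{\log n}{2n}+O\left(\frac{1}{n}\right).\tag{22}$$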

Since the optimal application of Theorem 2 yields (22), which exceeds the bounds in Theorem 1 by $+\frac{\log n}{n}$ in the third-order term, we are left to wonder whether random code design, threshold decoding, or both yield third-order performance penalties. In [6, Th. 8], Kontoyiannis and Verdú precisely characterize the performance of a code designed with i.i.d. uniform random codeword generation and an optimal (maximum likelihood) decoder. Unfortunately, that result is difficult to use in the asymptotic analysis. In Section III-C Theorem 4, below, we derive a new random coding bound using a maximum likelihood decoder; this result demonstrates that random coding suffices to achieve the third-order optimal performance for a stationary, memoryless source.

III-C New Achievability Bounds Based on Random Coding

We next use random code design to derive two new non-asymptotic achievability bounds for point-to-point source coding. We call these results the dependence testing (DT) bound and the random coding union (RCU) bound since they are the source coding analogues of the DT [5, Th. 17] and RCU [5, Th. 16] bounds in channel coding. The DT bound tightens Theorem 2, which is also based on threshold decoding.

Theorem 3 (DT bound).

Given a discrete random variable $X$ , there exists an $(M,\epsilon)$ code with a threshold decoder for which

$$\epsilon\leq\mathbb{E}\left[\exp\left\{-\left|\log M-\imath(X)\right|_{+}\right\}\right].\tag{23}$$

Proof.

Appendix A. ∎

The proof of Theorem 3 bounds the random coding performance of a threshold decoder with threshold $\log\gamma$ as

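$$\epsilon\leq\mathbb{P}\left[\imath(X)>\log\gamma\right]+\frac{1}{M}\mathbb{U}\left[\imath(X)\leq\log\gamma\right],\tag{24}$$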

where $\mathbb{U}\left[\cdot\right]$ denotes a mass with respect to the counting measure $U_{X}$ on $\mathcal{X}$ , which assigns unit weight to each $x\in\mathcal{X}$ . As in a channel coding argument from [5], we apply the Neyman-Pearson lemma and find that the right-hand side of (24) equals $\frac{M+1}{M}$ times the minimum measure of the error event in a Bayesian binary hypothesis test between $P_{X}$ with a priori probability $\frac{M}{M+1}$ and $U_{X}$ with a priori probability $\frac{1}{M+1}$ . (The Neyman-Pearson lemma generalizes to $\sigma$ -finite measures like $U_{X}$ [23, Remark 5].) This error measure is minimized by the test that compares the log likelihood ratio $\log\frac{U_{X}(X)}{P_{X}(X)}$ to the log ratio of a priori probabilities $\log\frac{M/(M+1)}{1/(M+1)}$ , giving

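$$H_{0}:P_{X},\ \text{selected if }\imath(X)\leq\log M,$$

$$H_{1}:U_{X},\ \text{selected if }\imath(X)>\log M.$$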

Taking $\gamma=M$ minimizes the right-hand side of (24), which implies that Theorem 3 is the tightest possible bound for random coding with threshold decoding.
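The relationship between the two threshold-decoding bounds can be checked numerically. The sketch below evaluates the Theorem 2 bound $\mathbb{P}[\imath(X)>\log M-\gamma]+\exp(-\gamma)$ and the DT bound $\mathbb{E}[\exp\{-|\log M-\imath(X)|_{+}\}]$ for a block of i.i.d. Bernoulli symbols, grouping sequences by their number of ones. The source parameter, blocklength, rate, and grid of $\gamma$ values are hypothetical choices for this example, and all quantities are in nats.

```python
import math


def info_distribution(p, n):
    """(information in nats, probability) for each type k of a Bernoulli(p)^n source."""
    out = []
    for k in range(n + 1):
        info = k * math.log(1 / p) + (n - k) * math.log(1 / (1 - p))
        prob = math.comb(n, k) * p**k * (1 - p) ** (n - k)
        out.append((info, prob))
    return out


def threshold_bound(p, n, log_m, gamma):
    """Theorem 2: P[i(X) > log M - gamma] + exp(-gamma)."""
    tail = sum(q for i, q in info_distribution(p, n) if i > log_m - gamma)
    return tail + math.exp(-gamma)


def dt_bound(p, n, log_m):
    """Theorem 3 (DT bound): E[exp(-|log M - i(X)|_+)]."""
    return sum(q * math.exp(-max(0.0, log_m - i)) for i, q in info_distribution(p, n))


p, n = 0.11, 200       # hypothetical source and blocklength
log_m = 0.45 * n       # rate of 0.45 nats per symbol
best_thm2 = min(threshold_bound(p, n, log_m, g / 10) for g in range(1, 200))
print(dt_bound(p, n, log_m), best_thm2)
```

The DT value never exceeds the Theorem 2 value for any $\gamma>0$ : when $\imath(x)>\log M-\gamma$ the indicator already contributes 1, and otherwise $\exp\{-(\log M-\imath(x))\}\leq\exp(-\gamma)$ .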

Particularizing Theorem 3 to a stationary, memoryless source with a single-letter distribution $P_{X}$ satisfying $V(X)>0$ and $T(X)<\infty$ and invoking the Berry-Esseen inequality (see Theorem 6 below), we obtain the asymptotic expansion

$$R^{*}(n,\epsilon)\leq H(X)+\sqrt{\frac{V(X)}{n}}\,Q^{-1}(\epsilon)+O\left(\frac{1}{n}\right).\tag{25}$$

Unfortunately, (25) is sub-optimal in its third-order term. Thus, random code design with threshold-based decoding fails to achieve the optimal third-order performance.

Next, we present the RCU bound, which employs random code design and maximum likelihood decoding.

Theorem 4 (RCU bound).

Given a discrete random variable $X$ , there exists an $(M,\epsilon)$ code with a maximum likelihood decoder for which

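$$\epsilon\leq\mathbb{E}\left[\min\left\{1,\frac{1}{M}\mathbb{E}\left[\exp\left(\imath(\bar{X})\right)1\left\{\imath(\bar{X})\leq\imath(X)\right\}\,\middle|\,X\right]\right\}\right],\tag{26}$$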

where $P_{X\bar{X}}(a,b)=P_{X}(a)P_{X}(b)$ for all $a,b\in\mathcal{X}$ .

Proof.

Our random code design randomly and independently draws encoder output $\mathsf{F}(x)$ for each $x\in\mathcal{X}$ from the uniform distribution on $[M]$ . We use the maximum likelihood decoder

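$$\mathsf{g}(c)=\arg\max_{x\in\mathcal{X}:\,\mathsf{F}(x)=c}P_{X}(x)=\arg\min_{x\in\mathcal{X}:\,\mathsf{F}(x)=c}\imath(x).\tag{27}$$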

If multiple source symbols have the maximal probability mass, the decoder design chooses among them uniformly at random.

Under this random code construction, the expected error probability is bounded by the probability $\mathbb{P}\left[\mathcal{E}\right]$ of event

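$$\mathcal{E}\triangleq\left\{\exists\,\bar{x}\in\mathcal{X}\setminus\{X\}\ \text{s.t.}\ \imath(\bar{x})\leq\imath(X),\ \mathsf{F}(\bar{x})=\mathsf{F}(X)\right\},\tag{28}$$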

where probability measure $\mathbb{P}[\cdot]$ captures both the random source output $X$ and the random encoding map $\mathsf{F}$ . The resulting error bound is

[TABLE]

where (29) applies the law of iterated expectation, (30) bounds the probability by the minimum of the union bound and 1, (31) holds because the encoder outputs are drawn i.i.d. uniformly at random and independently of $X$ , and (32) rewrites (31) in terms of the distribution $P_{X\bar{X}}=P_{X}P_{X}$ .

The existence of the desired $(M,\epsilon)$ code follows since (32) equals the right-hand side of (26). ∎
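For a Bernoulli source the RCU bound can be evaluated in closed form, since the inner expectation in (26), $\mathbb{E}[\exp(\imath(\bar{X}))1\{\imath(\bar{X})\leq\imath(X)\}\mid X=x]$ , simply counts the outcomes that are at least as probable as $x$ . The sketch below does this for Bernoulli($p$) blocks with $p<\frac{1}{2}$ , where sequences with fewer ones are more probable, and compares the result with the DT bound. The parameters and function names are assumptions made for illustration.

```python
import math


def rcu_bound(p, n, m):
    """RCU bound for a Bernoulli(p)^n source with p < 1/2 and code size m."""
    total, rank = 0.0, 0
    for k in range(n + 1):
        # rank = number of sequences at least as probable as one with k ones
        rank += math.comb(n, k)
        prob = math.comb(n, k) * p**k * (1 - p) ** (n - k)
        total += prob * min(1.0, rank / m)
    return total


def dt_bound(p, n, m):
    """DT bound E[exp(-|log M - i(X)|_+)] for the same source, for comparison."""
    total = 0.0
    for k in range(n + 1):
        info = k * math.log(1 / p) + (n - k) * math.log(1 / (1 - p))
        prob = math.comb(n, k) * p**k * (1 - p) ** (n - k)
        total += prob * math.exp(-max(0.0, math.log(m) - info))
    return total


p, n = 0.11, 200                      # hypothetical source and blocklength
m = math.ceil(math.exp(0.45 * n))     # code size at 0.45 nats per symbol
print(rcu_bound(p, n, m), dt_bound(p, n, m))
```

The count of outcomes at least as probable as $x$ is at most $1/P_{X}(x)=\exp(\imath(x))$ , which is why the RCU value is never larger than the DT value at the same $M$ .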

Remark 4.

By the argument employed in the proof of [25, Th. 9.5], we obtain the same RCU bound if we randomize only over linear encoding maps. Thus, there is no loss in performance when restricting to linear compressors.

We next show that the RCU bound recovers the first three terms of the achievability result in Theorem 1. Thus, the sub-optimal third-order terms in (22) and (25) result from the sub-optimal decoder rather than the random encoder design. This is important since optimal codes are not available for scenarios like the MASC studied in Section V, below.

Theorem 5 focuses on a stationary, memoryless source with single-letter distribution $P_{X}$ satisfying

$$V(X)>0,\tag{33}$$

$$T(X)<\infty.\tag{34}$$

Define constants

[TABLE]

where $C_{0}$ is the absolute constant in the Berry-Esseen inequality for i.i.d. random variables. (See Theorem 6, below.)

Theorem 5 (Third-order-optimal achievability via random coding).

Consider a stationary, memoryless source satisfying the conditions in (33) and (34). For all $0<\epsilon<1$ ,

$$R^{*}(n,\epsilon)\leq H(X)+\sqrt{\frac{V(X)}{n}}\,Q^{-1}(\epsilon)-\frac{\log n}{2n}+\xi(n),\tag{37}$$

where $\xi(n)=O\big{(}\frac{1}{n}\big{)}$ is bounded more precisely as follows.

1) For all $0<\epsilon\leq\frac{1}{2}$ and $n>\left(\frac{B+C}{\epsilon}\right)^{2}$ ,

[TABLE]

2) For all $\frac{1}{2}<\epsilon<1$ and $n>\left(\frac{B+C}{\epsilon-\frac{1}{2}}\right)^{2}$ ,

[TABLE]

Before we show our proof of the asymptotic expansion in Theorem 5, we state two auxiliary results used in our analysis. The first is the classical Berry-Esseen inequality (e.g., [26, Chapter XVI.5]), stated here with the best known absolute constant $C_{0}$ from [27].

Theorem 6 (Berry-Esseen inequality).

Let $Z_{1},\ldots,Z_{n}$ be independent random variables such that $V\triangleq\frac{1}{n}\sum_{i=1}^{n}\emph{Var}[Z_{i}]>0$ and $T\triangleq\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}[|Z_{i}-\mathbb{E}[Z_{i}]|^{3}]<\infty$ . Then for any real $t$ and $n\geq 1$ ,

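$$\left|\mathbb{P}\left[\frac{1}{\sqrt{nV}}\sum_{i=1}^{n}\left(Z_{i}-\mathbb{E}[Z_{i}]\right)\geq t\right]-Q(t)\right|\leq\frac{C_{0}T}{V^{3/2}\sqrt{n}},\tag{40}$$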

where $0.4097\leq C_{0}\leq 0.5583$ ( $0.4097\leq C_{0}<0.4690$ for identically distributed $Z_{i}$ ) [27].

We refer to $C_{0}\cdot T/V^{3/2}$ as the Berry-Esseen constant.
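A quick numerical check of the Berry-Esseen inequality for i.i.d. Bernoulli summands: the sketch below computes the exact Kolmogorov distance between the standardized binomial cdf and the Gaussian cdf and compares it with $C_{0}T/(V^{3/2}\sqrt{n})$ , using the value $0.4690$ quoted above as an upper bound on $C_{0}$ in the identically distributed case. The choice of Bernoulli(0.11) summands and the function names are assumptions for this example.

```python
import math
from statistics import NormalDist


def kolmogorov_gap(p, n):
    """sup_t |P[S_n <= t] - Phi(t)| for the standardized sum of n Bernoulli(p) variables."""
    std = NormalDist()
    scale = math.sqrt(n * p * (1 - p))
    gap, cdf = 0.0, 0.0
    for k in range(n + 1):
        phi = std.cdf((k - n * p) / scale)
        gap = max(gap, abs(cdf - phi))  # left limit at the jump point
        cdf += math.comb(n, k) * p**k * (1 - p) ** (n - k)
        gap = max(gap, abs(cdf - phi))  # value at the jump point
    return gap


def berry_esseen_rhs(p, n, c0=0.4690):
    """C_0 * T / (V^{3/2} sqrt(n)) with V = Var[Z] and T = E|Z - E[Z]|^3."""
    v = p * (1 - p)
    t = p * (1 - p) * (p**2 + (1 - p) ** 2)
    return c0 * t / (v**1.5 * math.sqrt(n))


for n in (10, 100, 500):
    print(n, kolmogorov_gap(0.11, n), berry_esseen_rhs(0.11, n))
```

Because the binomial cdf is a step function and $\Phi$ is increasing, the supremum is attained at a jump point, so checking the left limit and the value at each jump is sufficient.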

The second result is from Polyanskiy et al. [5, Lemma 47].

Lemma 7 ([5, Lemma 47]).

In the setting of Theorem 6, it holds for any $A$ and $n\geq 1$ that

[TABLE]

Proof of Theorem 5.

We analyze the RCU bound of Theorem 4 for random variable $X^{n}$ . For notational brevity, define

$$I_{n}\triangleq\imath(X^{n}),\qquad\bar{I}_{n}\triangleq\imath(\bar{X}^{n}).\tag{42}$$

By Theorem 4, there exists an $(n,M,\epsilon^{\prime})$ code such that

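$$\epsilon^{\prime}\leq\mathbb{E}\left[\min\left\{1,\frac{1}{M}\mathbb{E}\left[\exp\left(\bar{I}_{n}\right)1\left\{\bar{I}_{n}\leq I_{n}\right\}\,\middle|\,X^{n}\right]\right\}\right],\tag{43}$$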

where $P_{X^{n}\bar{X}^{n}}=P_{X}^{n}P_{X}^{n}$ , and each of $I_{n}$ and $\bar{I}_{n}$ is a sum of i.i.d. random variables. Applying Lemma 7 with $Z_{i}=-\imath(\bar{X}_{i})$ and $A=-I_{n}$ gives

[TABLE]

Plugging (44) in (43), we find

[TABLE]

where (45) plugs (44) into (43), (46) separates the cases $I_{n}>\log\left(M\sqrt{n}/C\right)$ and $I_{n}\leq\log\left(M\sqrt{n}/C\right)$ , and (47) applies Lemma 7 to the second term in (46).

Denote for brevity

[TABLE]

We now choose

[TABLE]

and apply the Berry-Esseen inequality (Theorem 6) to (47), giving $\epsilon^{\prime}\leq\epsilon$ and proving the achievability bound

[TABLE]

To obtain (37) from (49), we note that as long as $\delta_{n}<\epsilon$ ,

[TABLE]

where (51) applies the definition of the Gaussian cumulative distribution function $\Phi(\cdot)$ and its complement $Q(\cdot)$ from (2) and (3), (52) holds by a first-order Taylor bound for some $\xi_{n}\in\left[\Phi(Q^{-1}(\epsilon)),\Phi(Q^{-1}(\epsilon))+\delta_{n}\right]$ , and (53) holds by the inverse function theorem.

For $\epsilon\leq\frac{1}{2}$ and $\delta_{n}<\epsilon$ , $\xi_{n}\geq\frac{1}{2}$ and $\phi(\Phi^{-1}(\xi_{n}))$ is decreasing in $\xi_{n}$ . We can further bound the right-hand side of (53) and conclude that

[TABLE]

For $\epsilon>\frac{1}{2}$ and $\delta_{n}<\epsilon-\frac{1}{2}$ , we have $\xi_{n}\leq\frac{1}{2}$ and $\phi(\Phi^{-1}(\xi_{n}))$ is increasing in $\xi_{n}$ . We conclude that

[TABLE]

Plugging (54) and (55) into (49) gives (38) and (39). ∎
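The two monotonicity arguments above can be sanity-checked numerically. The sketch below is a hypothetical helper, not part of the proof. It evaluates the first-order Taylor bound $Q^{-1}(\epsilon-\delta)\leq Q^{-1}(\epsilon)+\delta/\phi(\Phi^{-1}(\xi))$ with $\xi$ set to the worst-case endpoint of the interval $\left[\Phi(Q^{-1}(\epsilon)),\Phi(Q^{-1}(\epsilon))+\delta\right]$: the right endpoint when $\epsilon\leq\frac{1}{2}$, and the left endpoint when $\epsilon>\frac{1}{2}$.

```python
from statistics import NormalDist

STD = NormalDist()  # standard Gaussian: cdf is Phi, pdf is phi


def q_inv(eps):
    """Q^{-1}(eps) = Phi^{-1}(1 - eps)."""
    return STD.inv_cdf(1.0 - eps)


def taylor_upper_bound(eps, delta):
    """Upper bound on Q^{-1}(eps - delta) from the mean-value argument in the proof."""
    if eps <= 0.5:
        if not 0.0 <= delta < eps:
            raise ValueError("need 0 <= delta < eps")
        xi = (1.0 - eps) + delta      # phi(Phi^{-1}(xi)) is decreasing for xi >= 1/2
    else:
        if not 0.0 <= delta < eps - 0.5:
            raise ValueError("need 0 <= delta < eps - 1/2")
        xi = 1.0 - eps                # phi(Phi^{-1}(xi)) is increasing for xi <= 1/2
    return q_inv(eps) + delta / STD.pdf(STD.inv_cdf(xi))


if __name__ == "__main__":
    for eps, delta in [(0.1, 0.01), (0.5, 0.05), (0.8, 0.1)]:
        print(eps, delta, q_inv(eps - delta), taylor_upper_bound(eps, delta))
```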

IV Composite Hypothesis Testing

The meta-converse for channel coding [5, Th. 26] and its generalizations to lossy source coding [23] and joint source-channel coding [30, 31] apply binary hypothesis testing to derive converses in point-to-point communication problems. (The quantum information theory literature contains an earlier approach to channel coding converses using binary hypothesis testing [28], [29, Ch. 4.6].) To extend this approach to multi-terminal coding (see, e.g., Theorem 19 in Section V, below), we develop a corresponding method using composite hypothesis testing. We first develop non-asymptotic tools and then analyze the asymptotics.

A composite hypothesis test $P_{Z|X}:\mathcal{X}\rightarrow\{0,1\}$ tests a simple hypothesis against a composite hypothesis:

[TABLE]

where $X$ is the observation, $P$ is the distribution under the simple hypothesis, and $\{Q_{j}\}_{j=1}^{k}$ is the collection of possible distributions under the composite hypothesis. The following definition generalizes the optimal $\beta$ -function from binary to composite hypothesis testing. (See, for example, [32, Def. 1].)

Definition 4.

The set of achievable false-positive errors for power- $\alpha$ tests between distribution $P$ and collection of distributions $\{Q_{j}\}_{j=1}^{k}$ is the subset of $[0,1]^{k}$ defined as

[TABLE]

where $\mathbb{P}\left[\cdot\right]$ denotes a probability with respect to $P$ , and for each $j\in[k]$ , $\mathbb{Q}_{j}\left[\cdot\right]$ denotes a probability with respect to $Q_{j}$ .

Like binary hypothesis tests (see [23, Remark 5]), composite hypothesis tests can be generalized to allow $P$ and $\{Q_{j}\}_{j=1}^{k}$ to be $\sigma$ -finite measures; in such cases, $\beta_{\alpha}(P,\{Q_{j}\}_{j=1}^{k})$ may not be a subset of $[0,1]^{k}$ . We apply this generalization in Section V-C2 to derive our new MASC converse.

In [32], Huang and Moulin study the asymptotics of the set $\beta_{\alpha}(P,\{Q_{j}\}_{j=1}^{k})$ , giving a third-order-optimal characterization [32, Th. 1]. As noted in [33, Appendix D], there is a gap in their converse proof (see also Remark 6, below). We here present a comprehensive analysis of composite hypothesis testing, starting with non-asymptotic characterizations and then particularizing them to give a new proof of [32, Th. 1].

IV-A Non-Asymptotic Bounds

The analysis of $\beta_{\alpha}\left(P,\{Q_{j}\}_{j=1}^{k}\right)$ in [32] uses the test that achieves the minimal (boundary) points of that set. For each minimal point $\boldsymbol{\beta}$ , there exists a vector $\mathbf{a}=(a_{1},\ldots,a_{k})\geq\mathbf{0}$ , $\mathbf{a}\neq\mathbf{0}$ , such that the generalized Neyman-Pearson test

[TABLE]

achieves $\boldsymbol{\beta}$ ; here $\lambda\in[0,1]$ is chosen so that $\mathbb{P}\left[Z=1\right]=\alpha$ . While the above test is optimal, the achievability and converse bounds that follow simplify the asymptotic analysis.

Lemma 8 (Achievability).

For any $\gamma_{j}\geq 0$ , $j\in[k]$ , there exists a composite hypothesis test $P_{Z|X}$ for which

[TABLE]

Proof.

Fix any $\gamma_{j}\geq 0$ , $j\in[k]$ . Consider the (sub-optimal) likelihood-ratio threshold test (Huang and Moulin [32] also use this sub-optimal likelihood-ratio threshold in their asymptotic achievability analysis):

[TABLE]

Under this test, (58) follows immediately, and the remaining bound of Lemma 8 holds by

[TABLE]

∎
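To make the threshold test concrete, the sketch below implements one version of it on a finite alphabet. It assumes the test accepts ($Z=1$) exactly when $P(x)\geq\gamma_{j}Q_{j}(x)$ for every $j\in[k]$; whether the paper's test uses a strict or non-strict inequality is not recoverable from the text here, so this choice is an assumption. The function name and the example distributions are hypothetical. For any $\gamma_{j}>0$, every accepted $x$ satisfies $Q_{j}(x)\leq P(x)/\gamma_{j}$, so $\mathbb{Q}_{j}[Z=1]\leq\mathbb{P}[Z=1]/\gamma_{j}\leq 1/\gamma_{j}$.

```python
def threshold_test(p, qs, gammas):
    """Deterministic test: Z(x) = 1 iff P(x) >= gamma_j * Q_j(x) for every j.

    p      -- list of probabilities P(x) over a finite alphabet
    qs     -- list of lists, qs[j][x] = Q_j(x)
    gammas -- list of non-negative thresholds gamma_j
    Returns (accept, alpha, betas) with alpha = P[Z=1] and betas[j] = Q_j[Z=1].
    """
    accept = [
        all(px >= g * q[x] for q, g in zip(qs, gammas))
        for x, px in enumerate(p)
    ]
    alpha = sum(px for px, z in zip(p, accept) if z)
    betas = [sum(qx for qx, z in zip(q, accept) if z) for q in qs]
    return accept, alpha, betas


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    qs = [[0.2, 0.3, 0.5], [0.1, 0.1, 0.8]]
    accept, alpha, betas = threshold_test(p, qs, [1.0, 2.0])
    print(accept, alpha, betas)
```

Raising a threshold $\gamma_{j}$ shrinks the acceptance region, which lowers both the power $\alpha$ and the false-positive probabilities.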

The following converse bound extends [5, Eq. (102)] from binary hypothesis testing to composite hypothesis testing.

Lemma 9 (Converse).

For any $\alpha$ , if $\boldsymbol{\beta}=(\beta_{1},\ldots,\beta_{k})\in\beta_{\alpha}\left(P,\{Q_{j}\}_{j=1}^{k}\right)$ , then

[TABLE]

where $\gamma_{j}\geq 0$ , $j\in[k]$ are arbitrary constants.

Proof.

Appendix B. ∎

Lemma 10 extends the argument of [34, Lemma 1] from binary to composite hypothesis testing.

Lemma 10 (Variational lemma).

For any $\alpha$ , if $\boldsymbol{\beta}=(\beta_{1},\ldots,\beta_{k})\in\beta_{\alpha}\left(P,\{Q_{j}\}_{j=1}^{k}\right)$ , then

[TABLE]

where $\gamma_{j}\geq 0$ , $j\in[k]$ , are arbitrary constants and equality is achieved by a generalized Neyman-Pearson test.

Proof.

Appendix C. ∎

Given any $\boldsymbol{\beta}=(\beta_{1},\ldots,\beta_{k})$ , define

[TABLE]

Then Lemma 10 gives

[TABLE]

Remark 5.

We can derive Lemma 9 from the variational characterization in Lemma 10 by noting that

[TABLE]

Lemmas 9 and 10 are useful beyond the asymptotic analysis of composite hypothesis testing. They also make it possible to recover previous converse bounds from our new MASC meta-converse, as presented in Section V-C2 below.

IV-B Asymptotics for I.I.D. Distributions

We here characterize the asymptotics of $\beta_{\alpha}\left(P,\{Q_{j}\}_{j=1}^{k}\right)$ when each of $P$ and $\{Q_{j}\}_{j=1}^{k}$ is a product of $n$ identical single-shot distributions, i.e., $P(X^{n})=\prod_{i=1}^{n}P(X_{i})$ and $Q_{j}(X^{n})=\prod_{i=1}^{n}Q_{j}(X_{i})$ , $j\in[k]$ .

We begin with notation. For each $j\in[k]$ , define

[TABLE]

Define vector $\mathbf{D}$ and matrix $\mathsf{V}$ as

[TABLE]

Let $\mathbf{Z}\in\mathbb{R}^{d}$ be a Gaussian random vector with mean zero and covariance matrix $\mathsf{V}$ . Define the multidimensional counterpart of the function $Q^{-1}(\cdot)$ as

[TABLE]

The set $\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ appears in characterizations such as [13, 32]. When $\mathsf{V}$ is non-singular, the boundary of $\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ approaches $z_{i}=\sqrt{[\mathsf{V}]_{i,i}}Q^{-1}(\epsilon)$ in each dimension $i\in[d]$ , as illustrated in Figure 2(a). For $\epsilon\leq 1/2$ , $\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ lies in the positive orthant of $\mathbb{R}^{d}$ ; for $\epsilon>1/2$ , $\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ extends outside of the positive orthant. If $\epsilon^{\prime}<\epsilon$ , then $\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon^{\prime})\subset\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ . See Figure 2(b) for plots of the boundaries of $\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ in $\mathbb{R}^{2}$ . If $\mathsf{V}$ is singular with rank $r<d$ , then $\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ lies in an $r$ -dimensional subspace of $\mathbb{R}^{d}$ .
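The properties of $\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ listed above can be explored numerically in two dimensions. The sketch below assumes the standard definition $\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)=\{\mathbf{z}\in\mathbb{R}^{d}:\mathbb{P}[\mathbf{Z}\leq\mathbf{z}]\geq 1-\epsilon\}$, which is consistent with the properties stated in the text. It handles only a non-singular $2\times 2$ covariance matrix. The bivariate Gaussian distribution function is computed by one-dimensional trapezoidal integration of the conditional distribution function; the function names, the truncation point, and the step count are illustrative choices.

```python
from math import erf, exp, pi, sqrt
from statistics import NormalDist


def std_pdf(u):
    return exp(-0.5 * u * u) / sqrt(2.0 * pi)


def std_cdf(u):
    return 0.5 * (1.0 + erf(u / sqrt(2.0)))


def bivariate_cdf(z, cov, steps=4000):
    """P[Z1 <= z1, Z2 <= z2] for Z ~ N(0, cov), with cov a non-singular 2x2 matrix."""
    s1, s2 = sqrt(cov[0][0]), sqrt(cov[1][1])
    rho = cov[0][1] / (s1 * s2)
    upper, lower = z[0] / s1, -10.0       # lower tail truncated at -10 standard deviations
    if upper <= lower:
        return 0.0

    def integrand(u):
        # density of Z1/s1 at u, times P[Z2 <= z2 | Z1 = s1 * u]
        return std_pdf(u) * std_cdf((z[1] / s2 - rho * u) / sqrt(1.0 - rho * rho))

    h = (upper - lower) / steps
    total = 0.5 * (integrand(lower) + integrand(upper))
    total += sum(integrand(lower + i * h) for i in range(1, steps))
    return total * h


def in_q_inv(z, cov, eps):
    """Membership test for the assumed set {z : P[Z <= z] >= 1 - eps}."""
    return bivariate_cdf(z, cov) >= 1.0 - eps


if __name__ == "__main__":
    cov = [[1.0, 0.5], [0.5, 2.0]]
    q = NormalDist().inv_cdf(1.0 - 0.1)   # Q^{-1}(0.1)
    # Far out along the second axis, the boundary sits near z1 = sqrt(V_11) * Q^{-1}(eps).
    print(in_q_inv((q + 0.5, 50.0), cov, 0.1), in_q_inv((q - 0.1, 50.0), cov, 0.1))
```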

Theorem 11 derives a third-order-optimal characterization of $\beta_{\alpha}\left(P,\{Q_{j}\}_{j=1}^{k}\right)$ under assumptions

[TABLE]

Define the inner and outer bounding sets

[TABLE]

where vector $\mathbf{D}$ and matrix $\mathsf{V}$ are defined in (75) and (76).

Theorem 11 (Third-order-optimal asymptotics).

Assume that $P$ and $\left\{Q_{j}\right\}_{j=1}^{k}$ are product distributions composed of $n$ identical single-shot distributions that satisfy (78) and (79). For any $\alpha\in(0,1)$ , the set $\beta_{\alpha}\left(P,\{Q_{j}\}_{j=1}^{k}\right)$ satisfies

[TABLE]

where $\epsilon=1-\alpha$ .

Remark 6.

In [32, Th. 1], Huang and Moulin claim the third-order-optimal result in Theorem 11 when $\mathsf{V}$ is non-singular. Unfortunately, there is a gap in their converse proof. Applying [32, Lemma 2] to get [32, Eq. (13)] requires that vector $\mathbf{b}$ is independent of $n$ . However, they consider any $\mathbf{b}\in\mathscr{Q}_{\rm inv}\left(\mathsf{V},\epsilon\right)$ , which may grow with $n$ because set $\mathscr{Q}_{\rm inv}\left(\mathsf{V},\epsilon\right)$ is unbounded. Thus, [32, Eq. (13)] does not always hold.

We resolve this issue with a new proof of Theorem 11 that leverages Lemmas 8 and 9. We first show two auxiliary results.

The multidimensional Berry-Esseen theorem bounds the probability of a sum of i.i.d. random vectors. Bentkus’ theorem [35, Th. 1.1] for the case with mean zero and identity covariance achieves the best known dependence on dimension. Tan and Kosut extend [35, Th. 1.1] to non-singular covariance matrices [13, Cor. 8]. We here extend [13, Cor. 8] to covariance matrices with non-zero rank.

Lemma 12.

Let $\mathbf{U}_{1},\ldots,\mathbf{U}_{n}$ be i.i.d. random vectors in $\mathbb{R}^{d}$ with mean zero and covariance matrix $\mathsf{V}$ . Let $\mathbf{Z}\sim\mathcal{N}(\mathbf{0},\mathsf{V})$ be a Gaussian vector in $\mathbb{R}^{d}$ . Define $r\triangleq\mathrm{rank}(\mathsf{V})$ . Let $\mathsf{T}$ be a $d\times r$ matrix whose columns are the $r$ normalized eigenvectors of $\mathsf{V}$ with non-zero eigenvalues. Define i.i.d. random vectors $\mathbf{W}_{1},\ldots,\mathbf{W}_{n}\in\mathbb{R}^{r}$ such that $\mathbf{U}_{i}=\mathsf{T}\mathbf{W}_{i}$ for $i\in[n]$ . Let $\mathsf{V}_{r}\triangleq\mathrm{Cov}[\mathbf{W}_{1}]$ and $\beta_{r}\triangleq\mathbb{E}[\|\mathbf{W}_{1}\|_{2}^{3}]$ . If $r\geq 1$ , then for all $n$ ,

[TABLE]

where $\lambda_{\min}(\mathsf{V}_{r})>0$ is the smallest eigenvalue of matrix $\mathsf{V}_{r}$ .

Proof.

Appendix D. ∎

If $r=d$ , then $\mathsf{V}_{r}=\mathsf{V}$ and Lemma 12 recovers [13, Cor. 8].

The following lemma is useful for our asymptotic analysis.

Lemma 13.

Fix an arbitrary $d\times d$ positive-semidefinite matrix $\mathsf{V}$ and $0<\epsilon<1$ . Then, the following results hold.

1. There exist constants $D_{1}$ and $\delta_{1}>0$ such that for all $0\leq\delta<\delta_{1}$ ,

[TABLE]

2. There exist constants $D_{2}$ and $\delta_{2}>0$ such that for all $0\leq\delta<\delta_{2}$ ,

[TABLE]

Proof.

Appendix E. ∎

Proof of Theorem 11.

Define random variables

[TABLE]

and random vector

[TABLE]

For brevity, denote

[TABLE]

To prove the achievability part of Theorem 11, we particularize Lemma 8 to product distributions $P^{\otimes n}$ and $\left\{Q_{j}^{\otimes n}\right\}_{j=1}^{k}$ to obtain that for any $\boldsymbol{\gamma}\geq\mathbf{0}$ , there exists a test $P_{Z|X^{n}}$ for which

[TABLE]

Take any $\boldsymbol{\gamma}$ such that

[TABLE]

where $B$ is the constant on the right side of (81) for $\mathbf{I}_{n}$ , which is finite under assumptions (78) and (79). Applying Lemma 12 to (87) gives

[TABLE]

where $\mathbf{Z}\sim\mathcal{N}(\mathbf{0},\mathsf{V})$ and matrix $\mathsf{V}$ is defined in (76). Applying Lemma 7 to (88) gives

[TABLE]

where

[TABLE]

is a finite positive constant by the assumptions in (78) and (79). Plugging (89) into (93) and noting (92) gives

[TABLE]

where (96) follows from part 1 of Lemma 13 (see (82)).

For the converse, recall from Lemma 9 that if $\epsilon=1-\alpha$ , then any $\boldsymbol{\beta}\in\beta_{\alpha}\left(P,\{Q_{j}\}_{j=1}^{k}\right)$ must satisfy

[TABLE]

for all $\gamma_{j}\geq 0$ , $j\in[k]$ . Take

[TABLE]

Then, (97) becomes

[TABLE]

where (100) applies Lemma 12 and $B$ is the constant in the right side of (81). By the definition of $\mathscr{Q}_{\rm inv}\left(\mathsf{V},\epsilon\right)$ in (77), (100) implies that

[TABLE]

Applying part 2 of Lemma 13 (see (83)), we conclude from (101) that

[TABLE]

∎

V Multiple Access Source Coding

To simplify notation, we focus on MASCs with two encoders. Our definitions and results generalize to more than two encoders, as briefly noted in Remark 12 below.

V-A Definitions

In a MASC [11], also known as a Slepian-Wolf source code, independent encoders compress a pair of random variables $(X_{1},X_{2})$ with discrete alphabets $\mathcal{X}_{1}$ and $\mathcal{X}_{2}$ . Encoder $i$ , $i\in[2]$ , observes only $X_{i}$ , which it maps to a codeword in $[M_{i}]$ ; a single decoder jointly decodes the pair of codewords to reconstruct $(X_{1},X_{2})$ . We first define codes for abstract random objects and then particularize to random objects that live in an alphabet endowed with a Cartesian product structure.

Definition 5 (MASC).

An $(M_{1},M_{2},\epsilon)$ MASC for random variables $(X_{1},X_{2})$ with discrete alphabets $\mathcal{X}_{1}$ and $\mathcal{X}_{2}$ comprises two encoding functions $\mathsf{f}_{1}\colon\mathcal{X}_{1}\rightarrow[M_{1}]$ and $\mathsf{f}_{2}\colon\mathcal{X}_{2}\rightarrow[M_{2}]$ and a decoding function, $\mathsf{g}\colon[M_{1}]\times[M_{2}]\rightarrow\mathcal{X}_{1}\times\mathcal{X}_{2}$ with error probability $\mathbb{P}\left[\mathsf{g}(\mathsf{f}_{1}(X_{1}),\mathsf{f}_{2}(X_{2}))\neq(X_{1},X_{2})\right]\leq\epsilon$ .

In block coding, encoders individually observe $X_{1}^{n}$ and $X_{2}^{n}$ drawn from distribution $P_{X_{1}^{n}X_{2}^{n}}$ on $\mathcal{X}_{1}^{n}\times\mathcal{X}_{2}^{n}$ . Our block MASC definition is similar to those in [13] and [14].

Definition 6 (Block MASC).

An $(n,M_{1},M_{2},\epsilon)$ MASC is an $(M_{1},M_{2},\epsilon)$ MASC for random vectors $(X_{1}^{n},X_{2}^{n})$ on $\mathcal{X}_{1}^{n}\times\mathcal{X}_{2}^{n}$ . The code rate $\mathbf{R}=(R_{1},R_{2})$ is given by

[TABLE]

Definition 7 ( $(n,\epsilon)$ -rate region).

Rate $\mathbf{R}=(R_{1},R_{2})$ is $(n,\epsilon)$ -achievable if there exists an $(n,M_{1},M_{2},\epsilon)$ MASC with $R_{1}\leq\frac{1}{n}\log M_{1}$ and $R_{2}\leq\frac{1}{n}\log M_{2}$ . The $(n,\epsilon)$ -rate region $\mathscr{R}^{*}(n,\epsilon)$ is the closure of the set of $(n,\epsilon)$ -achievable rate pairs.

While Definitions 6 and 7 apply to arbitrary discrete random variables $(X_{1i},X_{2i})$ , $i=1,2,\ldots$ , with transition probability kernels $P_{(X_{1}X_{2})_{i}|(X_{1}X_{2})^{i-1}}$ , our asymptotic analysis focuses on stationary, memoryless sources, where $P_{(X_{1}X_{2})_{i}|(X_{1}X_{2})^{i-1}}=P_{X_{1}X_{2}}$ for all $i=1,2,\ldots$ .

For any rate $\mathbf{R}=(R_{1},R_{2})$ and distribution $P_{X_{1}X_{2}}$ , define

[TABLE]

V-B Background

In [11], Slepian and Wolf prove that if $(X_{1}^{n},X_{2}^{n})$ are stationary and memoryless, then for every $\epsilon\in(0,1)$ ,

[TABLE]

(i.e., the strong converse holds). We call this region the asymptotic MASC rate region.
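For a concrete picture of the asymptotic MASC rate region, the sketch below tests whether a rate pair satisfies the three Slepian-Wolf constraints $R_{1}\geq H(X_{1}|X_{2})$, $R_{2}\geq H(X_{2}|X_{1})$, and $R_{1}+R_{2}\geq H(X_{1},X_{2})$ for a joint distribution given as a nested list. The function names and the example source are hypothetical; rates are in bits per source symbol.

```python
from math import log2


def entropies(joint):
    """Return (H(X1|X2), H(X2|X1), H(X1,X2)) in bits; joint[a][b] = P(X1=a, X2=b)."""
    p1 = [sum(row) for row in joint]                 # marginal of X1
    p2 = [sum(col) for col in zip(*joint)]           # marginal of X2
    h12 = -sum(p * log2(p) for row in joint for p in row if p > 0)
    h1 = -sum(p * log2(p) for p in p1 if p > 0)
    h2 = -sum(p * log2(p) for p in p2 if p > 0)
    return h12 - h2, h12 - h1, h12


def in_asymptotic_region(r1, r2, joint):
    """Check the three Slepian-Wolf constraints for the rate pair (r1, r2)."""
    h1g2, h2g1, h12 = entropies(joint)
    return r1 >= h1g2 and r2 >= h2g1 and r1 + r2 >= h12


if __name__ == "__main__":
    # Doubly symmetric binary source with crossover probability 0.1.
    joint = [[0.45, 0.05], [0.05, 0.45]]
    print(entropies(joint))
    print(in_asymptotic_region(0.5, 1.0, joint), in_asymptotic_region(0.5, 0.5, joint))
```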

In [12], Miyake and Kanaya give achievability and converse bounds for finite-blocklength coding on finite-alphabet sources. In [9], Han gives corresponding results for sources with countable alphabets. While these results are stated in [9] for general sources whose alphabets adopt $n$ -fold Cartesian product structures, we here describe them in an abstract form.

Theorem 14 (Achievability, Han [9, Lemma 7.2.1]).

Given discrete random variables $(X_{1},X_{2})$ , there exists an $(M_{1},M_{2},\epsilon)$ MASC satisfying

[TABLE]

where $\gamma>0$ is an arbitrary constant.

Theorem 15 (Converse, Han [9, Lemma 7.2.2]).

Any $(M_{1},M_{2},\epsilon)$ MASC on discrete random variables $(X_{1},X_{2})$ satisfies

[TABLE]

where $\gamma>0$ is an arbitrary constant.

In [15], Jose and Kulkarni derive a new linear programming (LP) finite-blocklength converse, tightening the bound in Theorem 15 with an extra non-negative term (see [15, Cor. 13]).

Theorem 16 (LP-based converse, [15, Th. 12]).

Any $(M_{1},M_{2},\epsilon)$ MASC on discrete random variables $(X_{1},X_{2})$ satisfies

[TABLE]

where the supremum is over $\phi_{1},\phi_{2},\phi_{3}:\mathcal{X}_{1}\times\mathcal{X}_{2}\rightarrow[0,1]$ such that $0\leq\phi_{1}(x_{1},x_{2}),\,\phi_{2}(x_{1},x_{2}),\,\phi_{3}(x_{1},x_{2})\leq P_{X_{1}X_{2}}(x_{1},x_{2})$ for all $(x_{1},x_{2})\in\mathcal{X}_{1}\times\mathcal{X}_{2}$ .

The best prior asymptotic expansion of the MASC rate region is the second-order characterization developed independently in [13, 14]. In [13], Tan and Kosut introduce an entropy dispersion matrix, which serves a role similar to the scalar dispersion in the point-to-point case [5, 6, 23].

Definition 8 (Tan and Kosut [13, Def. 7]).

The entropy dispersion matrix $\mathsf{V}$ for random variables $(X_{1},X_{2})$ is the covariance matrix $\mathsf{V}\triangleq\mathrm{Cov}\left[\overline{\boldsymbol{\imath}}(X_{1},X_{2})\right]$ of random vector

[TABLE]

Note that $\mathsf{V}$ is a $3\times 3$ positive-semidefinite matrix with $V(X_{1}|X_{2})$ , $V(X_{2}|X_{1})$ , and $V(X_{1},X_{2})$ on the diagonal.
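The entropy dispersion matrix is straightforward to compute for a finite-alphabet source. The sketch below assumes, consistently with the diagonal entries listed above, that the random vector is $\overline{\boldsymbol{\imath}}(X_{1},X_{2})=\left(\imath(X_{1}|X_{2}),\imath(X_{2}|X_{1}),\imath(X_{1},X_{2})\right)$ with information measured in bits. The function name and the example source are hypothetical.

```python
from math import log2


def entropy_dispersion(joint):
    """Return (mean vector, 3x3 covariance) of (i(X1|X2), i(X2|X1), i(X1,X2)) in bits.

    joint[a][b] = P(X1=a, X2=b); zero-probability pairs are skipped.
    """
    n1, n2 = len(joint), len(joint[0])
    p1 = [sum(joint[a]) for a in range(n1)]
    p2 = [sum(joint[a][b] for a in range(n1)) for b in range(n2)]
    atoms = []
    for a in range(n1):
        for b in range(n2):
            p = joint[a][b]
            if p > 0:
                vec = (-log2(p / p2[b]), -log2(p / p1[a]), -log2(p))
                atoms.append((p, vec))
    mean = [sum(p * v[i] for p, v in atoms) for i in range(3)]
    cov = [
        [sum(p * (v[i] - mean[i]) * (v[j] - mean[j]) for p, v in atoms) for j in range(3)]
        for i in range(3)
    ]
    return mean, cov


if __name__ == "__main__":
    joint = [[0.45, 0.05], [0.05, 0.45]]   # doubly symmetric binary source
    mean, cov = entropy_dispersion(joint)
    print(mean)
    for row in cov:
        print(row)
```

For the doubly symmetric binary source in the example, the three components differ only by constants, so all nine entries of the matrix coincide and $\mathsf{V}$ has rank 1. This illustrates why results for singular covariance matrices, such as Lemma 12, are relevant.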

Tan and Kosut [13] give a second-order characterization of the MASC rate region for finite-alphabet stationary, memoryless sources in terms of the asymptotic rate region and the entropy dispersion matrix. Their result, reproduced below, exhibits an $O\big{(}\frac{\log n}{n}\big{)}$ gap in the third-order term.

Define

[TABLE]

where $\overline{\mathbf{R}}$ and $\overline{\mathbf{H}}$ are defined in (104), $\mathsf{V}$ is the entropy dispersion matrix for $(X_{1},X_{2})$ (Definition 8), $\nu\triangleq|\mathcal{X}_{1}||\mathcal{X}_{2}|+\kappa+\frac{3}{2}$ , and $\kappa$ is the absolute finite positive constant from [13, Def. 6].

Theorem 17 (Tan and Kosut [13, Th. 1]).

Consider finite-alphabet, stationary, memoryless sources $(X_{1},X_{2})$ with $P_{X_{1}X_{2}}(x_{1},x_{2})>0$ for every $(x_{1},x_{2})\in\mathcal{X}_{1}\times\mathcal{X}_{2}$ . For any $0<\epsilon<1$ and all $n$ sufficiently large,

[TABLE]

Remark 7.

The inner bounding region in Theorem 17 is achievable by a universal coding scheme [13, Sec. VI]. The outer bounding region in Theorem 17 is based on [9, Lemma 7.2.2].

In [14], Nomura and Han use [9, Lemma 7.2.1] and [9, Lemma 7.2.2] to derive a second-order MASC coding theorem for stationary, memoryless, dependent sources. Their result is equivalent to Theorem 17 up to the second-order term and applies also for countable alphabets. Neither [13] nor [14] finds the precise third-order term. In Sections V-C and V-D, below, we give new non-asymptotic MASC bounds and then apply them to precisely characterize the third-order asymptotics.

V-C New Non-Asymptotic Bounds

V-C1 Achievability

We present a MASC RCU bound, extending Theorem 4 to the multiple-encoder case.

Theorem 18 (MASC RCU bound).

Given discrete random variables $(X_{1},X_{2})$ , there exists an $(M_{1},M_{2},\epsilon)$ MASC with

[TABLE]

where

[TABLE]

Proof.

For every $x_{i}\in\mathcal{X}_{i}$ , $i\in[2]$ , draw $\mathsf{F}_{i}(x_{i})$ i.i.d. uniformly at random from $[M_{i}]$ . The maximum likelihood decoder is defined for each $(c_{1},c_{2})\in[M_{1}]\times[M_{2}]$ by

[TABLE]

where ties are broken equiprobably at random in the code design. This decoder is optimal for the given encoder.

We bound the random code’s expected error probability by the probability of the union of events

[TABLE]

By a derivation similar to that in the proof of Theorem 4,

[TABLE]

and (124) is equal to the right side of (113) as desired. ∎
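The random code construction in this proof can be simulated directly for a small single-shot source. The sketch below draws the two encoders by independent uniform binning, applies the maximum likelihood decoder, and returns the exact error probability of the resulting code. The decoder outputs a most probable source pair in each bin pair, so the probability of correct decoding is the sum over bin pairs of the largest probability mass in that bin pair. The function name and the example distribution are hypothetical.

```python
import random


def random_binning_error(joint, m1, m2, rng):
    """Exact error probability of one randomly drawn MASC with an ML decoder.

    joint[a][b] = P(X1=a, X2=b); encoder i maps each symbol to a uniform index in [m_i].
    """
    n1, n2 = len(joint), len(joint[0])
    f1 = [rng.randrange(m1) for _ in range(n1)]   # random encoder for X1
    f2 = [rng.randrange(m2) for _ in range(n2)]   # random encoder for X2
    best = {}
    for a in range(n1):
        for b in range(n2):
            key = (f1[a], f2[b])
            # The ML decoder keeps the most probable pair in each bin pair.
            best[key] = max(best.get(key, 0.0), joint[a][b])
    return 1.0 - sum(best.values())


if __name__ == "__main__":
    rng = random.Random(0)
    joint = [[0.4, 0.1], [0.1, 0.4]]
    for m in (1, 2, 4, 16):
        draws = [random_binning_error(joint, m, m, rng) for _ in range(2000)]
        print(m, sum(draws) / len(draws))   # Monte Carlo estimate of the ensemble average
```

Averaging the returned value over many independent draws estimates the expected error probability of the random code ensemble, which is the quantity that the RCU bound controls.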

Figure 4 in Section V-D1 plots the point-to-point (Theorem 4) and MASC (Theorem 18) RCU bounds.

V-C2 Converse

The MASC composite hypothesis testing converse employs the set $\beta_{\alpha}\left(P,\{Q_{j}\}_{j=1}^{k}\right)$ (see Definition 4) and its generalization to $\sigma$ -finite measures.

Theorem 19 (Hypothesis testing (HT) converse).

Let $P_{X_{1}X_{2}}$ be the source distribution defined on $\mathcal{X}_{1}\times\mathcal{X}_{2}$ . Let $Q^{(1)}_{X_{1}X_{2}}$ , $Q^{(2)}_{X_{1}X_{2}}$ , and $Q^{(3)}_{X_{1}X_{2}}$ be any $\sigma$ -finite measures defined on $\mathcal{X}_{1}\times\mathcal{X}_{2}$ . Any $(M_{1},M_{2},\epsilon)$ MASC satisfies

[TABLE]

where

[TABLE]

Proof.

Consider an $(M_{1},M_{2},\epsilon)$ MASC with stochastic encoders $P_{F_{1}|X_{1}}$ and $P_{F_{2}|X_{2}}$ and stochastic decoder $P_{\hat{X}_{1}\hat{X}_{2}|F_{1}F_{2}}$ , where $F_{1}$ and $F_{2}$ are the encoder outputs, and $(\hat{X}_{1},\hat{X}_{2})$ is the decoder output. Fix distributions $\{Q^{(j)}_{X_{1}X_{2}}\}_{j=1}^{3}$ on $\mathcal{X}_{1}\times\mathcal{X}_{2}$ . Then $Z=1\big{\{}(\hat{X}_{1},\hat{X}_{2})=({X}_{1},{X}_{2})\big{\}}$ defines a (sub-optimal) composite HT for testing $P_{X_{1}X_{2}}$ against $\{Q^{(j)}_{X_{1}X_{2}}\}_{j=1}^{3}$ , for which $\mathbb{P}\left[Z=1\right]\geq 1-\epsilon$ and

[TABLE]

where (130) follows since $\max\limits_{\hat{x}_{1}\in\mathcal{X}_{1}}Q^{(1)}_{X_{1}X_{2}}(\hat{x}_{1},x_{2})$ is independent of $x_{1}$ , and (131) follows by bounding the probability in the sum over $x_{1}\in\mathcal{X}_{1}$ by 1. Similarly,

[TABLE]

Thus the bound in Theorem 19 holds by the definition of $\beta_{1-\epsilon}\left(P,\{Q^{(j)}\}_{j=1}^{k}\right)$ . ∎

To recover Han’s converse (Theorem 15) from Theorem 19, let $P_{X_{1}}$ and $P_{X_{2}}$ be the marginals of $P_{X_{1}X_{2}}$ and let $U_{X_{1}}$ , $U_{X_{2}}$ , and $U_{X_{1}X_{2}}$ be the counting measures over $\mathcal{X}_{1}$ , $\mathcal{X}_{2}$ , and $\mathcal{X}_{1}\times\mathcal{X}_{2}$ . By Theorem 19, any $(M_{1},M_{2},\epsilon)$ MASC satisfies

[TABLE]

Applying Lemma 9 to (136) with $k=3$ gives

[TABLE]

Setting $\gamma_{1}=\frac{\exp\left(-\gamma\right)}{M_{1}}$ , $\gamma_{2}=\frac{\exp\left(-\gamma\right)}{M_{2}}$ , and $\gamma_{3}=\frac{\exp\left(-\gamma\right)}{M_{1}M_{2}}$ for an arbitrary $\gamma>0$ recovers Theorem 15.

To show that Theorem 19 is equivalent to the LP-based converse (Theorem 16), we apply the consequence of Lemma 10 derived in Section IV-A to Theorem 19, showing that any $(M_{1},M_{2},\epsilon)$ MASC satisfies

[TABLE]

The outer supremum is over $\sigma$ -finite measures $Q^{(1)}_{X_{1}X_{2}}$ , $Q^{(2)}_{X_{1}X_{2}}$ , and $Q^{(3)}_{X_{1}X_{2}}$ . In Appendix F, we show that the bounds in (138) and (108) are equivalent, establishing the equivalence between the MASC HT (Theorem 19) and LP (Theorem 16) converses.

Remark 8.

When one of the sources is deterministic, the MASC HT converse reduces to the point-to-point HT converse [23, Eq. (64)]. For example, if $X_{2}$ is deterministic, then (136) reduces to

[TABLE]

which further reduces to

[TABLE]

where $\beta_{\alpha}(P,Q)$ is the optimal $\beta$ -function for binary hypothesis testing between distributions $P$ and $Q$ .

V-D Asymptotics: Third-Order MASC Rate Region

The following third-order asymptotic characterization of the MASC rate region for stationary, memoryless sources closes the $O\big{(}\frac{\log n}{n}\big{)}$ gap between the inner and outer bounding regions in Theorem 17.

Consider stationary, memoryless sources with single-letter joint distribution $P_{X_{1}X_{2}}$ for which

[TABLE]

When (139) holds, $\text{rank}(\mathsf{V})\geq 1$ . (In fact, the weaker condition $V(X_{1}|X_{2})>0$ , $V(X_{2}|X_{1})>0$ , $V(X_{1},X_{2})>0$ suffices.) Technical assumptions (139), (140), and (141) are required to ensure applicability of the multidimensional Berry-Esseen theorem and Lemma 7 in our asymptotic analysis. Assumption (140) is satisfied automatically if the alphabets $\mathcal{X}_{1}$ and $\mathcal{X}_{2}$ are finite.

Define the set

[TABLE]

where vector $\overline{\mathbf{H}}$ is defined in (104), $\mathsf{V}$ is the entropy dispersion matrix for $(X_{1},X_{2})$ , and $\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ is defined in (77). Note that $\mathscr{R}^{*}(n,\epsilon)\subset\mathbb{R}^{2}$ (see Definition 7) but $\overline{\mathscr{R}}^{*}(n,\epsilon)\subset\mathbb{R}^{3}$ . Define the inner and outer bounding sets

[TABLE]

Theorem 20 (Third-order MASC rate region).

Consider a pair of stationary, memoryless sources with single-letter joint distribution $P_{X_{1}X_{2}}$ satisfying (139)–(141). For any $0<\epsilon<1$ , the $(n,\epsilon)$ -rate region $\mathscr{R}^{*}(n,\epsilon)$ satisfies

[TABLE]

Since the upper and lower bounds in Theorem 20 agree up to their third-order terms, we call $\overline{\mathscr{R}}^{*}(n,\epsilon)$ the third-order MASC rate region. Figure 3 plots the boundaries of $\overline{\mathscr{R}}^{*}(n,\epsilon)$ at different values of $n$ for an example pair of sources.

Remark 9.

As noted in Remark 2, for point-to-point source coding, zero varentropy means that the source is uniform; the $-\frac{\log n}{2n}$ third-order term is absent in that case. While condition (139) limits Theorem 20 to sources with positive varentropies, Appendix G considers the case where one or more varentropies are zero. Roughly, each zero varentropy yields a zero dispersion and the absence of a $-\frac{\log n}{2n}$ third-order term, similar to the point-to-point case. Furthermore, if $V(X_{1}|X_{2})>0$ but $\mathbb{E}\left[V_{c}(X_{1}|X_{2})\right]=0$ , the corresponding achievable third-order term increases from $-\frac{\log n}{2n}$ to $0$ . (This is seen by modifying the reasoning in (172)–(187) in the proof of Theorem 20 below.) This means that the optimal third-order term lies in $[-\frac{\log n}{2n},0]$ in that case.

Proof of Theorem 20: achievability.

We apply Theorem 18 to stationary, memoryless sources with $P_{X_{1}^{n}X_{2}^{n}}=P_{X_{1}X_{2}}^{n}$ and then apply Lemmas 7 and 12 to analyze the bound. Let

[TABLE]

where $(X_{1i},X_{2i},\bar{X}_{1i},\bar{X}_{2i},\bar{X}_{1i}^{\prime},\bar{X}_{2i}^{\prime})$ , $i=1,\ldots,n$ , are drawn i.i.d. according to the joint distribution defined in (18). With this notation, the random variables $A_{1}$ , $A_{2}$ , $A_{12}$ defined in (114), (115), (116) particularize as

[TABLE]

By Theorem 18, there exists an $(n,M_{1},M_{2},\epsilon^{\prime})$ MASC such that

[TABLE]

To bound each of the terms in (157), we first bound the random variables $A_{1}$ , $A_{2}$ , and $A_{12}$ by random variables $\bar{A}_{1}$ , $\bar{A}_{2}$ , and $\bar{A}_{12}$ that are easier to work with.

Denote constants

[TABLE]

that are finite by assumptions (139) and (140). Define

[TABLE]

for $x_{1}^{n}\in\mathcal{X}_{1}^{n}$ . Define $V_{2}(x_{2}^{n})$ and $T_{2}(x_{2}^{n})$ for $x_{2}^{n}\in\mathcal{X}_{2}^{n}$ analogously.

Applying Lemma 7 to $A_{12}$ yields

[TABLE]

To bound $A_{1}$ , we consider the cases $V_{2}(x_{2}^{n})>0$ and $V_{2}(x_{2}^{n})=0$ separately. If $V_{2}(x_{2}^{n})>0$ , then

[TABLE]

is finite by assumption (140), and Lemma 7 yields

[TABLE]

If $V_{2}(x_{2}^{n})=0$ , then $I_{1}=\bar{I}_{1}=H(X_{1}^{n}|X_{2}^{n}=x_{2}^{n})$ irrespective of the realization of $X_{1}^{n}$ , and

[TABLE]

Putting (165) and (166) together yields

[TABLE]

Similarly,

[TABLE]

where $K_{2}(x_{1}^{n})$ is defined analogously to (164).

Next, we apply Lemma 7 again to further bound each of the first three terms in (157):

[TABLE]

We proceed to bound the last term in (157). For fixed constants $s_{1}<\mathbb{E}\left[V_{c}(X_{2}|X_{1})\right]$ and $s_{2}<\mathbb{E}\left[V_{c}(X_{1}|X_{2})\right]$ , define the events $\mathcal{S}_{1}$ and $\mathcal{S}_{2}$ that $X_{1}^{n}$ and $X_{2}^{n}$ are typical, respectively:

[TABLE]

Note that

[TABLE]

where

[TABLE]

are both finite by the assumptions in (139) and (140).

Applying the union bound to $\mathbb{P}\left[\mathcal{S}^{c}_{k}\right]$ , $k\in\{1,2\}$ , and Chebyshev’s inequality

[TABLE]

to both terms, we observe that for each $k\in\{1,2\}$ ,

[TABLE]

where

[TABLE]

are finite by assumption (141).

We are now prepared to apply Lemma 12 to the last term in (157). Pick any pair of rates $(R_{1},R_{2})$ satisfying

[TABLE]

where the set $\overline{\mathscr{R}}^{*}(n,\epsilon)$ is defined in (142), $\mathbf{C}\triangleq\left(\log(3\bar{K}_{1}),\log(3\bar{K}_{2}),\log(3K_{12})\right)^{T}$ , and $B$ is the Bentkus constant on the right side of (81) for zero-mean i.i.d. random vectors

[TABLE]

Note that $B<\infty$ by assumption (140). We have

[TABLE]

where (185) applies (174) and $\mathbb{P}[\mathcal{A}\cap\mathcal{B}]\geq\mathbb{P}[\mathcal{A}]-\mathbb{P}[\mathcal{B}^{c}]$ , and (187) applies (182), Lemma 12 and (179).

Substituting (169), (170), (171), and (187) into (157) yields $\epsilon^{\prime}\leq\epsilon$ , and the proof is complete since the set of $(R_{1},R_{2})$ satisfying (182) contains $\mathscr{R}_{\rm in}^{*}(n,\epsilon)$ by part 1 of Lemma 13 (see (82)). ∎

Proof of Theorem 20: converse.

We invoke Theorem 19 with $P_{X_{1}X_{2}}=P_{X_{1}X_{2}}^{n}$ , $Q^{(1)}_{X_{1}X_{2}}=U_{X_{1}}^{n}P_{X_{2}}^{n}$ , $Q^{(2)}_{X_{1}X_{2}}=P_{X_{1}}^{n}U_{X_{2}}^{n}$ , and $Q^{(3)}_{X_{1}X_{2}}=U_{X_{1}X_{2}}^{n}$ , where $P_{X_{1}}$ and $P_{X_{2}}$ are the marginals of $P_{X_{1}X_{2}}$ , and $U_{X_{1}}$ , $U_{X_{2}}$ , and $U_{X_{1}X_{2}}$ are the counting measures over $\mathcal{X}_{1}$ , $\mathcal{X}_{2}$ , and $\mathcal{X}_{1}\times\mathcal{X}_{2}$ . Applying Theorem 11 to $\beta_{1-\epsilon}\left(P_{X_{1}X_{2}},\left\{U_{X_{1}}P_{X_{2}},P_{X_{1}}U_{X_{2}},U_{X_{1}X_{2}}\right\}\right)$ under the assumptions in (139) and (140), we conclude that in order to attain error probability $\epsilon$ , $M_{1}$ and $M_{2}$ must satisfy

[TABLE]

which is equivalent to $(R_{1},R_{2})\in\mathscr{R}_{\rm out}^{*}(n,\epsilon)$ (144), as desired. ∎

Remark 10.

The converse of Theorem 20 can also be proved using Han’s converse (Theorem 15) with $\gamma=\frac{\log n}{2}$ and Lemmas 12 and 13 in a way similar to that in the achievability proof above, except that we would use Lemma 13 to bound $\mathscr{Q}_{\rm inv}\left(\mathsf{V},\epsilon+O\left(\frac{1}{\sqrt{n}}\right)\right)\subseteq\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)-O\left(\frac{1}{\sqrt{n}}\right)\mathbf{1}$ instead of $\mathscr{Q}_{\rm inv}\left(\mathsf{V},\epsilon-O\left(\frac{1}{\sqrt{n}}\right)\right)$ . Our HT converse (Theorem 19) is stronger than Han’s converse, but the gap is in the fourth- or higher-order terms, as illustrated through computation in Figure 4. Han’s achievability bound (Theorem 14) with the third-order-optimal choice of $\gamma=\frac{\log n}{2}$ leads to a third-order term of $+\frac{\log n}{2n}$ instead of $-\frac{\log n}{2n}$ . Thus Han’s achievability bound is weaker than our RCU bound (Theorem 18) in the third-order term.

Remark 11.

Tan and Kosut’s converse (Theorem 17) is also based on Han’s converse. Instead of deriving an outer bound on $\mathscr{Q}_{\rm inv}\left(\mathsf{V},\epsilon+O\left(\frac{1}{\sqrt{n}}\right)\right)$ as given in Lemma 13, they apply the multivariate Taylor approximation to expand the probability, giving a bound that is loose in the third-order term.

Remark 12.

Theorem 20 generalizes to any finite number of encoders. Let $\mathcal{T}\subset\mathbb{N}$ be a nonempty ordered set with a unique index for each encoder. For any vector $\mathbf{R}_{\mathcal{T}}\in\mathbb{R}^{|\mathcal{T}|}$ , define the $\left(2^{|\mathcal{T}|}-1\right)$ -dimensional vector of its partial sums as

[TABLE]

For any distribution $P_{\mathbf{X}_{\mathcal{T}}}$ defined on $\mathcal{X}_{\mathcal{T}}$ and any $\mathbf{x}_{\mathcal{T}}\in\mathcal{X}_{\mathcal{T}}$ , define $\left(2^{|\mathcal{T}|}-1\right)$ -dimensional vectors

[TABLE]

and $\left(2^{|\mathcal{T}|}-1\right)\times\left(2^{|\mathcal{T}|}-1\right)$ entropy dispersion matrix

[TABLE]

for random vector $\mathbf{X}_{\mathcal{T}}$ . Define set

[TABLE]

Thus, $\mathscr{R}_{\mathcal{T}}^{*}(n,\epsilon)\subset\mathbb{R}^{|\mathcal{T}|}$ while $\overline{\mathscr{R}}_{\mathcal{T}}^{*}(n,\epsilon)\subset\mathbb{R}^{2^{|\mathcal{T}|}-1}$ . Finally,

[TABLE]

If every element of $\overline{\boldsymbol{\imath}}_{\mathcal{T}}(\mathbf{X}_{\mathcal{T}})$ has a positive variance and a finite third centered moment, then for any $0<\epsilon<1$ ,

[TABLE]

V-D1 Comparison with Point-to-Point Source Coding

Figure 4 compares joint (point-to-point) compression of $(X_{1}^{n},X_{2}^{n})$ to the MASC sum-rate at the symmetrical rate point ( $R_{1}=R_{2}$ ). The gap between the MASC and point-to-point HT converses captures a penalty due to separate encoding. For small $n$ , the third-order Gaussian approximation (without the $O\left(\frac{1}{n}\right)$ term) is more accurate at $\epsilon=10^{-1}$ than at $\epsilon=10^{-3}$ since the $O\left(\frac{1}{n}\right)$ term blows up as $\epsilon$ approaches $0$ .

It is well-known that optimal MASCs incur no first-order penalty in achievable sum rate when compared to joint coding [11, 12, 9]. We next investigate the higher-order penalty of the MASC’s independent encoders.

Tan and Kosut introduce a quantity known as the local dispersion [13, Def. 4] to characterize the second-order speed of convergence to any asymptotic MASC rate point from any direction. For any non-corner point on the diagonal boundary of the asymptotic MASC rate region, the sum rate’s second-order coefficient is optimal when approached either vertically or horizontally. Approaching corner points incurs a positive second-order penalty relative to point-to-point coding.

Two corollaries of Theorem 20, below, bound the MASC penalty by considering the achievable sum rate $R_{1}+R_{2}$ for different choices of $R_{1}$ and $R_{2}$. We treat dependent and independent sources $(X_{1},X_{2})$ separately, assuming throughout that (139) and (140) hold.

When $X_{1}$ and $X_{2}$ are dependent, $H(X_{1})+H(X_{2})>H(X_{1},X_{2})>H(X_{1}|X_{2})+H(X_{2}|X_{1})$, and the asymptotic sum-rate boundary contains non-corner and corner points. Corollary 21, below, shows that a MASC incurs no first-, second-, or third-order performance penalty relative to joint coding at non-corner points (i.e., when $R_{1}<H(X_{1})$ and $R_{2}<H(X_{2})$); in contrast, a MASC suffers a second-order performance penalty at corner points (i.e., when $R_{1}=H(X_{1})$ or $R_{2}=H(X_{2})$). See Figure 5(a) for an illustration.

Corollary 21.

Suppose that $X_{1}$ and $X_{2}$ are dependent.

1. Fix constants $\delta_{1},\delta_{2},G>0$ and $\epsilon\in(0,1)$. Then there exists some constant $n(\delta_{1},\delta_{2},G)$ such that if

[TABLE]

then $\mathbf{R}=(R_{1},R_{2})\in\overline{\mathscr{R}}^{*}(n,\epsilon)$ for all $n>n(\delta_{1},\delta_{2},G)$.

2. Fix $\epsilon\in(0,1)$. If

[TABLE]

for some $G>0$, then $\mathbf{R}=(H(X_{1}),R_{2})\in\overline{\mathscr{R}}^{*}(n,\epsilon)$. Conversely, if $\mathbf{R}=(H(X_{1}),R_{2})\in\overline{\mathscr{R}}^{*}(n,\epsilon)$, then

[TABLE]

where $r^{*}$ is the solution to equation

[TABLE]

and $\mathsf{V}_{2}$ is the covariance matrix for random vector $(\imath(X_{2}|X_{1}),\imath(X_{1},X_{2}))$ .

Proof.

Appendix H. ∎

For independent sources, the asymptotic sum-rate boundary contains only the single (corner) point $(R_{1},R_{2})=(H(X_{1}),H(X_{2}))$ , and the entropy dispersion matrix

[TABLE]

is singular.

The next result concerns the third-order-optimal sum rate

[TABLE]

According to Theorem 20, $\overline{R}^{*}_{\rm sum}(n,\epsilon)$ characterizes the best achievable sum rate in SW source coding up to an $O\left(\frac{1}{n}\right)$ gap.

Corollary 22.

For $X_{1},X_{2}$ independent and $\epsilon\in(0,1)$ ,

[TABLE]

which is achieved by $\mathbf{R}=(R_{1},R_{2})$ with

[TABLE]

for any $\lambda\in[0,1]$ and

[TABLE]

Proof.

Appendix I. ∎

By Corollary 22, for independent sources a unique $(r_{1}^{*},r_{2}^{*})$ captures the best MASC second-order sum rate; the third-order term is achieved at all points on a segment of the rate region boundary. See Figure 5(b). Under assumption (139),

[TABLE]

where $V(X_{1})+V(X_{2})=V(X_{1},X_{2})$ for $(X_{1},X_{2})$ independent. Here (208) follows since its left-hand side solves

[TABLE]

and the constraint in (209) requires $a_{1}>\sqrt{V(X_{1})}Q^{-1}(\epsilon)$ and $a_{2}>\sqrt{V(X_{2})}Q^{-1}(\epsilon)$ , which gives the bound since

[TABLE]

Therefore, when $X_{1}$ and $X_{2}$ are independent, a MASC incurs a positive second-order sum-rate penalty relative to joint coding. Closed-form expressions for this penalty are available in special cases. When $V(X_{1})=V(X_{2})$ , $r_{1}^{*}=r_{2}^{*}=Q^{-1}\left(1-\sqrt{1-\epsilon}\right)$ , and the penalty is

[TABLE]

When $X_{1}$ and $X_{2}$ are i.i.d., the penalty equals the penalty for coding a vector $X^{2n}$ of $2n$ i.i.d. outputs from $P_{X}$ by applying an independent $(n,\epsilon)$ (point-to-point) code with error probability $1-\sqrt{1-\epsilon}$ to each of $(X_{1},\ldots,X_{n})$ and $(X_{n+1},\ldots,X_{2n})$ instead of a single $(2n,\epsilon)$ code to vector $X^{2n}$ .
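The equal-variance case can be evaluated numerically. The sketch below is hedged: it assumes that the second-order sum-rate coefficient (the factor multiplying $\frac{1}{\sqrt{n}}$) is $\sqrt{V(X_{1})}\,r_{1}^{*}+\sqrt{V(X_{2})}\,r_{2}^{*}$ for separate encoding and $\sqrt{V(X_{1},X_{2})}\,Q^{-1}(\epsilon)$ for joint encoding, with $V(X_{1})=V(X_{2})=v$ and $V(X_{1},X_{2})=2v$. The function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def q_inv(p):
    """Inverse of the Gaussian complementary CDF Q."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def per_encoder_error(eps):
    """Error probability 1 - sqrt(1 - eps) allotted to each of the two codes."""
    return 1.0 - sqrt(1.0 - eps)


def second_order_penalty_coeff(v, eps):
    """Assumed second-order penalty coefficient for V(X1) = V(X2) = v.

    Separate encoding: 2 * sqrt(v) * Q^{-1}(1 - sqrt(1 - eps)).
    Joint encoding:    sqrt(2 * v) * Q^{-1}(eps).
    Both multiply 1/sqrt(n) in the sum rate (an assumption of this sketch).
    """
    r_star = q_inv(per_encoder_error(eps))
    separate = 2.0 * sqrt(v) * r_star
    joint = sqrt(2.0 * v) * q_inv(eps)
    return separate - joint
```

The per-encoder error probability is chosen so that both codes succeed with joint probability $1-\epsilon$, matching the two-code interpretation given above for i.i.d. sources.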

V-E Limited Feedback and Cooperation

The RASC proposed in Section VI employs limited feedback. We here analyze the impact of feedback on the underlying MASC. In our feedback model, the decoder broadcasts the same $\ell$ bits of feedback to both encoders. A bit sent at time $i$ must be a function of the encoder outputs received in time steps $1,\ldots,i-1$. (See Figure 6(a).) We bound the impact of feedback by studying a MASC with a *cooperation facilitator (CF)*. (The CF is introduced for multiple access channel coding in [36] and extended to source and network coding in [37].) The CF broadcasts the same $\ell$-bit function of the sources to both encoders prior to their encoding operations. (See Figure 6(b).) Since the MASC network has no channel noise, feedback from the decoder cannot convey more information than feedback from the CF. As a result, we bound the impact of feedback by bounding the impact of cooperation, which is easier to work with in our analysis.

We begin by defining the CF-MASC and its rate region.

Definition 9 (CF-MASC).

An $(L,M_{1},M_{2},\epsilon)$ CF-MASC for random variables $(X_{1},X_{2})$ on $\mathcal{X}_{1}\times\mathcal{X}_{2}$ comprises a CF function $\mathsf{L}$ , two encoding functions $\mathsf{f}_{1}$ and $\mathsf{f}_{2}$ , and a decoding function $\mathsf{g}$ given by

[TABLE]

with error probability

[TABLE]

Definition 10 (Block CF-MASC).

An $(n,L,M_{1},M_{2},\epsilon)$ CF-MASC is a CF-MASC for random variables $(X_{1}^{n},X_{2}^{n})$ on $\mathcal{X}_{1}^{n}\times\mathcal{X}_{2}^{n}$.

The code’s finite blocklength rates are defined by

[TABLE]

Definition 11 ( $(n,\ell,\epsilon)$ -CF rate region).

A rate pair $(R_{1},R_{2})$ is $(n,\ell,\epsilon)$ -CF achievable if there exists an $(n,L,M_{1},M_{2},\epsilon)$ CF-MASC with $M_{1}\leq\exp(nR_{1})$ , $M_{2}\leq\exp(nR_{2})$ , and $L\leq\exp(\ell)$ . The $(n,\ell,\epsilon)$ -CF rate region $\mathscr{R}_{\rm CF}^{*}(n,\ell,\epsilon)$ is defined as the closure of the set of all $(n,\ell,\epsilon)$ -CF achievable rate pairs.

We use $\mathscr{R}_{\rm FB}^{*}(n,\ell,\epsilon)$ to denote the feedback-MASC (FB-MASC) rate region, which is defined as the closure of the set of all $(n,\epsilon)$ -achievable rate pairs when the same $\ell$ bits of feedback from the decoder are available to both encoders.

Since the CF sees the source vectors while the decoder sees a coded description of those vectors (using a deterministic code), an $\ell$ -bit CF can implement any function used to determine the decoder’s $\ell$ -bit feedback. As a result, any rate point that is achievable by an $\ell$ -bit FB-MASC is also achievable by an $\ell$ -bit CF-MASC. Therefore, for any $0<\epsilon<1$ and $\ell<\infty$ ,

[TABLE]

Theorem 23 bounds CF-MASC (and FB-MASC) performance, showing that for any $\ell<\infty$ , the third-order rate region for $\ell$ -bit CF-MASCs cannot exceed the corresponding MASC rate region. Hence finite feedback does not enlarge the third-order $(n,\epsilon)$ MASC rate region. This result generalizes to scenarios with more than two encoders.

Theorem 23 (CF-MASC Converse).

Consider stationary, memoryless sources with single-letter distribution $P_{X_{1}X_{2}}$ satisfying (139) and (140). For any $0<\epsilon<1$ and $\ell<\infty$ ,

[TABLE]

Thus $\mathscr{R}_{\rm CF}^{*}(n,\ell,\epsilon)$ and $\overline{\mathscr{R}}^{*}(n,\epsilon)$ share the same outer bound.

Proof.

Appendix J. ∎

*Remark 13.*

The same proof can be used to show that allowing $\ell$ to grow as $o\left(\log\log n\right)$ does not change the first three terms in the optimal characterization of the $(n,\epsilon)$ -MASC.

*Remark 14.*

For dependent sources, the optimal third-order MASC sum rate equals the optimal third-order sum rate with full cooperation. (See the discussion in Section V-D1, above.) Since even an infinite amount of decoder feedback is weaker than full cooperation, an infinite amount of feedback does not improve the third-order sum rate in this case.

VI Random Access Source Code (RASC)

An RASC is a generalization of an MASC for networks where the set of participating encoders is unknown to both the encoders and the decoder a priori. We begin by defining the problem and describing our proposed communication strategy.

VI-A Definitions and Coding Strategy

Let $K<\infty$ be the maximal number of active encoders. We associate each encoder with a unique source from the set of sources indexed by $[K]$ . Each encoder chooses whether to be active or silent. Only sources associated with active encoders are compressed and reconstructed. By assumption, the decision to remain silent is independent of the observed source instance. Given the joint distribution $P_{\mathbf{X}_{[K]}}$ on countable alphabet $\mathcal{X}_{[K]}$ , when ordered set $\mathcal{T}\in\mathcal{P}([K])$ of $[K]$ is active, the marginal on the transmitted sources is

[TABLE]

Thus, each encoder’s state has no effect on the statistical relationship among sources observed by other encoders.

As in the random access channel code from [18], our proposed RASC organizes communication into epochs. At the beginning of each epoch, each encoder independently decides its activity state; that activity state remains unchanged until the end of the epoch. Thus, the active encoder set $\mathcal{T}$ is fixed in each epoch. Each active encoder $i\in\mathcal{T}$ observes source output $X_{i}\in\mathcal{X}_{i}$ and independently maps it to a codeword consisting of a sequence of code symbols from alphabet $[Q_{i}]$. The $|\mathcal{T}|$ codewords are sent simultaneously to the decoder. Since set $\mathcal{T}$ is unknown a priori, the encoder behavior cannot vary with $\mathcal{T}$. The decoder sees $\mathcal{T}$ and chooses a time $m_{\mathcal{T}}$, called the decoding blocklength, at which to jointly decode all received partial codewords. The set of potential decoding blocklengths $\mathcal{M}\triangleq\left\{m_{\mathcal{T}}:\mathcal{T}\in\mathcal{P}([K])\right\}$ is part of the code design; it is known to all encoders and to the decoder.

Figure 7 illustrates our coding scheme in one epoch when $\mathcal{T}=[k]$ . Each encoder $i\in\mathcal{T}$ sends a single code symbol per time step. At each time $m\in\left\{m^{\prime}\in\mathcal{M}:m^{\prime}<m_{\mathcal{T}}\right\}$ , the decoder sends a “0” to indicate that it is not yet ready to decode; at time $m=m_{\mathcal{T}}$ , the decoder sends a “1,” ending one epoch and starting the next. The decoder then reconstructs source vector $\mathbf{X}_{\mathcal{T}}$ using the first $m_{\mathcal{T}}$ code symbols from each active encoder. To avoid wasting time in an epoch with no active encoders, we include decoding time $m_{\emptyset}=1$ in set $\mathcal{M}$ . The decoder sends at most $2^{K}$ bits of feedback, and encoders need only listen for decoder feedback at the times in set $\mathcal{M}$ .
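The feedback pattern within one epoch can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `feedback_schedule` and the representation of $\mathcal{M}$ as a Python set of integers are assumptions made here.

```python
def feedback_schedule(decoding_times, m_active):
    """Return the (time, bit) feedback pairs the decoder sends in one epoch.

    `decoding_times` plays the role of the set M of potential decoding
    blocklengths; `m_active` is the decoding blocklength of the active set.
    The decoder sends 0 at each earlier time in M and 1 at `m_active`.
    """
    if m_active not in decoding_times:
        raise ValueError("m_active must be one of the potential decoding times")
    schedule = []
    for m in sorted(set(decoding_times)):
        if m < m_active:
            schedule.append((m, 0))  # not yet ready to decode
        elif m == m_active:
            schedule.append((m, 1))  # decode now; the epoch ends
            break
    return schedule
```

The length of the returned list is the number of feedback bits used in the epoch, that is, the number of elements of $\mathcal{M}$ that do not exceed $m_{\mathcal{T}}$.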

To formalize the above strategy, fix $K\geq 1$ . Define vectors

[TABLE]

with $m_{\emptyset}=1$ and $m_{\max}\triangleq\max\left\{m_{\mathcal{T}}:\mathcal{T}\in\mathcal{P}([K])\right\}$ .

Definition 12 (RASC).

An $\left(\overline{\mathbf{m}}_{K},\mathbf{Q}_{[K]},\overline{\boldsymbol{\epsilon}}_{K}\right)$ RASC for sources $X_{[K]}$ on source alphabet $\mathcal{X}_{[K]}$ comprises a collection of encoding and decoding functions

[TABLE]

where $\mathsf{f}_{i}$ is the encoding function for source $X_{i}$ and $\mathsf{g}_{\mathcal{T}}$ is the decoding function for active coder set $\mathcal{T}$ . For each $\mathcal{T}\in\mathcal{P}([K])$ , source vector $\mathbf{{X}}_{\mathcal{T}}$ is decoded at time $m_{\mathcal{T}}$ with error probability $\mathbb{P}\big{[}\mathsf{g}_{\mathcal{T}}\left(\mathsf{f}_{i}(X_{i})_{[m_{\mathcal{T}}]},i\in\mathcal{T}\right)\neq\mathbf{{X}}_{\mathcal{T}}\big{]}\leq\epsilon_{\mathcal{T}}$ , where $\mathsf{f}_{i}(x_{i})_{[m]}$ denotes the first $m$ code symbols of $\mathsf{f}_{i}(x_{i})$ .

Definition 13 particularizes Definition 12 to the block setting.

Definition 13 (Block RASC).

An $\left(n,\overline{\mathbf{m}}_{K},\mathbf{Q}_{[K]},\overline{\boldsymbol{\epsilon}}_{K}\right)$ RASC is an RASC for an $n$-block of source outcomes. The parameter $n$, called the encoding blocklength, does not vary with $\mathcal{T}$.

An $\left(\overline{\mathbf{m}}_{K},\mathbf{Q}_{[K]},\overline{\boldsymbol{\epsilon}}_{K}\right)$ RASC behaves, for each $\mathcal{T}$ , like a $\big{(}(Q_{i}^{m_{\mathcal{T}}},\,i\in\mathcal{T}),\epsilon_{\mathcal{T}}\big{)}$ MASC (see Definition 5) with a finite number $\left|\left\{m\in\mathcal{M}:m\leq m_{\mathcal{T}}\right\}\right|$ of feedback bits. However, the RASC is one code. Its descriptions are nested (i.e., for each $x_{i}\in\mathcal{X}_{i}$ , if $m_{\mathcal{T}^{\prime}}<m_{\mathcal{T}}$ , then $\mathsf{f}_{i}(x_{i})_{[m_{\mathcal{T}^{\prime}}]}$ is a prefix of $\mathsf{f}_{i}(x_{i})_{[m_{\mathcal{T}}]}$ ). It simultaneously satisfies the error constraints for all $\mathcal{T}\in\mathcal{P}([K])$ . And, since the code symbol alphabet sizes $\mathbf{Q}_{[K]}$ are fixed, its rate vectors are coupled. See Figure 8.

The following definitions build toward the non-asymptotic fundamental limit of RASCs.

Definition 14 ($n$-valid and $\left(n,\overline{\boldsymbol{\epsilon}}_{K}\right)$-rate sets).

A collection $\left(\mathbf{R}_{\mathcal{T}}\right)_{\mathcal{T}\in\mathcal{P}([K])}$ of rate vectors is $n$ -valid if $\exists$ $\left(\overline{\mathbf{m}}_{K},\mathbf{Q}_{[K]}\right)$ s.t.

[TABLE]

The set $\mathcal{R}_{\rm valid}(n)$ is the set of $n$ -valid rate collections. The collection is $\left(n,\overline{\boldsymbol{\epsilon}}_{K}\right)$ -achievable if there exists an $\left(n,\overline{\mathbf{m}}_{K},\mathbf{Q}_{[K]},\overline{\boldsymbol{\epsilon}}_{K}\right)$ RASC. The $\left(n,\overline{\boldsymbol{\epsilon}}_{K}\right)$ -rate set $\mathcal{R}^{*}\left(n,\overline{\boldsymbol{\epsilon}}_{K}\right)$ is the set of $\left(n,\overline{\boldsymbol{\epsilon}}_{K}\right)$ -achievable rate collections.

VI-B Background

While the concept of an RASC is new, the RASC problem is related to the universal MASC problem. Like a universal MASC, the RASC is designed for an unknown distribution from a known collection of possible distributions. In this case, the possible distributions are $\left\{P_{\mathbf{X}_{\mathcal{T}}}:\mathcal{T}\in\mathcal{P}([K])\right\}$ . The RASC differs, however, from universal MASCs since even the set of active encoders is unknown a priori.

A short summary of prior universal MASCs follows.

1. For a fixed-rate MASC and finite source alphabets, universal decoding can be realized using type methods. (See [13], [38], [39].) Such strategies achieve optimal performance only when the source’s MASC rate region matches the code’s fixed rate.

2. Oohama [40] and Jaggi and Effros [41] study the effect of limited encoder cooperation on the asymptotically universally achievable rate region. Rate-zero cooperation between encoders suffices to achieve universality in the asymptotic regime. Oohama characterizes the optimal error exponents in [40].

3. Yang et al. [42] study a block MASC with progressive encoding; the code uses zero-rate feedback to universally achieve the asymptotic MASC rate region. Sarvotham et al. [43] propose a variable-rate block sequential coding scheme with zero-rate feedback for binary symmetric sources, showing that at blocklength $n$ and target error probability $\epsilon$, the backoff from the asymptotic MASC rate due to universality is $O\left(\frac{1}{\sqrt{n}}Q^{-1}(\epsilon)\right)$.

4. In [22], Draper introduces a rateless MASC with single-bit feedback. Draper’s algorithm asymptotically achieves the optimal coding rates for sources with unknown joint distributions but known finite alphabet sizes. See [44] for a practical rateless MASC.

VI-C Asymptotics: Third-Order Performance of the RASC

In this section, we analyze the performance of an $\left(n,\overline{\mathbf{m}}_{K},\mathbf{Q}_{[K]},\overline{\boldsymbol{\epsilon}}_{K}\right)$ RASC for stationary, memoryless sources. Results include both achievability and converse characterizations of the $\left(n,\overline{\boldsymbol{\epsilon}}_{K}\right)$ -rate set $\mathcal{R}^{*}\left(n,\overline{\boldsymbol{\epsilon}}_{K}\right)$ under the assumption that the single-letter joint source distribution $P_{\mathbf{X}_{[K]}}$ satisfies

[TABLE]

Constraints (221)–(223) enable us to use Berry-Esseen bounds. The resulting characterization is tight up to the third-order term. While the existence of an $\left(n,\overline{\mathbf{m}}_{K},\mathbf{Q}_{[K]},\overline{\boldsymbol{\epsilon}}_{K}\right)$ RASC implies the existence of an $\big{(}n,(Q_{i}^{m_{\mathcal{T}}},\,i\in\mathcal{T}),\epsilon_{\mathcal{T}}\big{)}$ MASC for each $\mathcal{T}\in\mathcal{P}([K])$ , the existence of individual MASCs does not imply the existence of a single RASC that simultaneously satisfies the error probability constraints for all possible configurations of active encoders. Indeed, the existence of a single RASC that simultaneously performs as well (up to the third-order term) as the optimal MASC for each $\mathcal{T}\in\mathcal{P}([K])$ is one of our most surprising results.

Define the inner and outer bounding sets

[TABLE]

where $\mathscr{R}_{{\rm in},\mathcal{T}}^{*}(n,\epsilon)$ and $\mathscr{R}_{{\rm out},\mathcal{T}}^{*}(n,\epsilon)$ are the third-order MASC bounding sets for distribution $P_{\mathbf{X}_{\mathcal{T}}}$. (See (194) and the corresponding definition of the outer bounding set.)

Theorem 24 (Third-order RASC performance).

For any $K<\infty$ , consider stationary, memoryless sources specified by a single-letter joint distribution $P_{\mathbf{X}_{[K]}}$ satisfying (221)–(223). For any $\mathbf{0}<\overline{\boldsymbol{\epsilon}}_{K}<\mathbf{1}$ ,

[TABLE]

The converse and achievability proofs follow.

Proof of Theorem 24: converse.

As shown in Section V-E (Theorem 23), even with a priori knowledge of the encoder set $\mathcal{T}\in\mathcal{P}([K])$ and $2^{K}$ bits of feedback, a MASC for the encoders in set $\mathcal{T}$ cannot achieve performance outside of the third-order MASC outer bounding set $\mathscr{R}_{{\rm out},\mathcal{T}}^{*}(n,\epsilon_{\mathcal{T}})$ . ∎

The achievability part of Theorem 24 provides a sufficient condition for the existence of a single RASC that is simultaneously good for all $\mathcal{T}\in\mathcal{P}([K])$ . To prove this, we first derive an achievability result assuming that the encoders and decoder share the common randomness used to generate a random code (Theorem 26). Unfortunately, the existence of a random code ensemble with expected error probability satisfying the error probability constraint for each $\mathcal{T}\in\mathcal{P}([K])$ does not guarantee the existence of a single deterministic code satisfying those constraints simultaneously. We therefore take a different approach, which, unexpectedly, combines a converse bound on error probability and a random coding argument to show achievability.

The following refinement of the random coding argument provides a bound on the probability (with respect to the random code choice) that the error probability of a randomly chosen code exceeds a certain threshold. The code of interest here can be any type of source or channel code.

Lemma 25.

Let $\mathcal{C}$ be any class of codes with a corresponding error probability $P_{e}(\mathsf{c})$ for each $\mathsf{c}\in\mathcal{C}$ . Let

[TABLE]

denote the error probability of the best code in $\mathcal{C}$. Then any random code ensemble $\mathsf{C}$ defined over $\mathcal{C}$ (that is, any random variable $\mathsf{C}$ defined on code set $\mathcal{C}$) satisfies

[TABLE]

Proof.

Let $Y$ be any non-negative random variable and define $y_{\min}\triangleq{\rm ess}\inf Y$ ; that is, $y_{\min}$ is the largest constant $y\in\mathcal{Y}$ for which $Y\geq y$ almost surely. By Markov’s inequality,

[TABLE]

Taking $Y=P_{e}(\mathsf{C})$ and $y=\epsilon$ yields the desired result. ∎

In the regime of interest, $\mathbb{E}\left[P_{e}(\mathsf{C})\right]<\epsilon$. Therefore, the right-hand side of (228) is decreasing as a function of $\epsilon^{*}(\mathcal{C})$, and replacing $\epsilon^{*}(\mathcal{C})$ by any converse (lower) bound on $\epsilon^{*}(\mathcal{C})$ yields a valid achievability bound. Thus Lemma 25 provides a means to leverage a converse to prove achievability.
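The monotonicity claim can be checked directly. The sketch below assumes, as the proof of Lemma 25 suggests, that the bound in (228) has the form $\frac{\mu-t}{\epsilon-t}$ with $\mu=\mathbb{E}\left[P_{e}(\mathsf{C})\right]$ and $t=\epsilon^{*}(\mathcal{C})$. The symbols $\mu$, $t$, $g$, and $\underline{t}$ are introduced here for illustration only.

```latex
\begin{align*}
g(t) &\triangleq \frac{\mu - t}{\epsilon - t}, \qquad 0 \le t \le \mu < \epsilon, \\
g'(t) &= \frac{-(\epsilon - t) + (\mu - t)}{(\epsilon - t)^{2}}
       = \frac{\mu - \epsilon}{(\epsilon - t)^{2}} < 0, \\
\underline{t} \le \epsilon^{*}(\mathcal{C})
  \;&\Longrightarrow\;
  g\left(\epsilon^{*}(\mathcal{C})\right) \le g\left(\underline{t}\right).
\end{align*}
```

Under this assumed form, substituting any lower bound $\underline{t}$ for $\epsilon^{*}(\mathcal{C})$ can only increase the right-hand side, so the resulting expression remains an upper bound on $\mathbb{P}\left[P_{e}(\mathsf{C})>\epsilon\right]$.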

Given any RASC $\mathsf{c}$ , for each $\mathcal{T}\in\mathcal{P}([K])$ let $P_{e,\mathcal{T}}(\mathsf{c})$ denote the error probability of code $\mathsf{c}$ under active encoder set $\mathcal{T}$ . The RASC achievability proof applies Lemma 25 with error probability $P_{e,\mathcal{T}}(\mathsf{c})$ for each $\mathcal{T}\in\mathcal{P}([K])$ . Before proceeding to that proof, we use Theorem 26, below, to define a random code ensemble and calculate its expected error probability.

Theorem 26 (Random code).

For any $K<\infty$ , consider a source distribution $P_{\mathbf{X}_{[K]}}$ defined on countable alphabet $\mathcal{X}_{[K]}$ . There exists a random code ensemble $\mathsf{C}$ defined on the set of all RASCs with decoding blocklengths $\overline{\mathbf{m}}_{K}$ and code alphabets $\mathbf{Q}_{[K]}$ for which the following inequalities hold simultaneously for all $\mathcal{T}\in\mathcal{P}([K])$ :

[TABLE]

where

[TABLE]

and the expectation in (231) is with respect to the conditional distribution

[TABLE]

Proof.

We construct the random code ensemble $\mathsf{C}$ as follows.

Random Encoding Map: For every $i\in[K]$ , draw encoder outputs $\mathsf{F}_{i}(x_{i})$ for all $x_{i}\in\mathcal{X}_{i}$ i.i.d. uniformly at random from $[Q_{i}]^{m_{\rm max}}$ , where $m_{\max}\triangleq\max\left\{m_{\mathcal{T}}:\mathcal{T}\in\mathcal{P}([K])\right\}$ .

Maximum Likelihood Decoder: For any $m\in[m_{\max}]$ , $x_{i}\in\mathcal{X}_{i}$ , and $i\in[K]$ , denote the first $m$ symbols of $\mathsf{F}_{i}(x_{i})$ by $\mathsf{F}_{i}(x_{i})_{[m]}$ . For each $\mathcal{T}\in\mathcal{P}([K])$ , the maximum likelihood decoder $\mathsf{g}_{\mathcal{T}}$ for $\mathcal{T}$ observes the first $m_{\mathcal{T}}$ symbols from the encoders in $\mathcal{T}$ , here denoted by

[TABLE]

and, for each $\textbf{c}_{\mathcal{T}}=(\textbf{c}_{i})_{i\in\mathcal{T}}\in\prod\limits_{i\in\mathcal{T}}[Q_{i}]^{m_{\mathcal{T}}}$ , produces the output

[TABLE]

Expected Error Analysis: The expected error probability $\mathbb{E}\left[P_{e,\mathcal{T}}(\mathsf{C})\right]$ over the random code ensemble is bounded above by the probability of event

[TABLE]

It follows that

[TABLE]

and (239) is equal to the right-hand side of (26). Here, (239) considers the case where source symbols in set $\hat{\mathcal{T}}$ are decoded incorrectly for each $\hat{\mathcal{T}}\in\mathcal{P}(\mathcal{T})$. The derivation of (239) from (239) follows the argument in (124)–(124). Specifically, since each component of $\bar{\mathbf{x}}_{{\hat{\mathcal{T}}}}$ differs from the corresponding component of $\mathbf{X}_{{\hat{\mathcal{T}}}}$ and since the encoder output for each is drawn independently and uniformly at random from $[Q_{i}]^{m_{\max}}$,

[TABLE]

for any $\bar{\mathbf{x}}_{{\hat{\mathcal{T}}}}\in\mathcal{X}_{{\hat{\mathcal{T}}}}\backslash\{\mathbf{X}_{{\hat{\mathcal{T}}}}\}$ . ∎
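The construction in this proof can be illustrated on a toy example. The sketch below is a simplified reading of the random encoding map and the maximum likelihood decoder: the helper names (`draw_codebook`, `encode`, `ml_decode`), the brute-force search over the support of the pmf, and the arbitrary tie-breaking are assumptions of this sketch rather than details taken from the paper.

```python
import random


def draw_codebook(alphabet, q, m_max, rng):
    """Assign each source symbol an i.i.d. uniform codeword in [q]^m_max."""
    return {x: tuple(rng.randrange(q) for _ in range(m_max)) for x in alphabet}


def encode(x_vec, codebooks, m):
    """First m code symbols sent by each active encoder (nested prefixes)."""
    return [codebooks[i][x][:m] for i, x in enumerate(x_vec)]


def ml_decode(received, codebooks, pmf, m):
    """Most probable source vector whose length-m codeword prefixes match.

    `received[i]` is the length-m prefix from the i-th active encoder,
    `codebooks[i]` is that encoder's codebook, and `pmf` maps source
    vectors (tuples) of the active encoders to their probabilities.
    """
    best, best_p = None, -1.0
    for x_vec, p in pmf.items():
        consistent = all(
            codebooks[i][x][:m] == received[i] for i, x in enumerate(x_vec)
        )
        if consistent and p > best_p:
            best, best_p = x_vec, p
    return best
```

Because every encoder draws a single codeword of length $m_{\max}$ and transmits a prefix of it, the descriptions at different decoding blocklengths are nested automatically. The decoder output is always consistent with the received prefixes and is at least as probable as the true source vector, so an error occurs only when a different, at least equally probable, vector shares the same prefixes.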

We now prove the achievability part of Theorem 24 by applying Lemma 25 to the random code in Theorem 26.

Proof of Theorem 24: achievability.

The probability that random RASC $\mathsf{C}$ has error probability $P_{e,\mathcal{T}}(\mathsf{C})$ greater than $\epsilon_{\mathcal{T}}$ for some possible set $\mathcal{T}\in\mathcal{P}([K])$ of active encoders is

[TABLE]

To bound each term $\mathbb{P}\left[P_{e,\mathcal{T}}(\mathsf{C})>\epsilon_{\mathcal{T}}\right]$ using Lemma 25, we next bound the expected error probability $\mathbb{E}\left[P_{e,\mathcal{T}}(\mathsf{C})\right]$ and the error probability $\epsilon^{*}(\mathcal{C}_{\mathcal{T}})$ for the best code in $\mathcal{C}_{\mathcal{T}}$ , where $\mathcal{C}_{\mathcal{T}}$ is the set of $\big{(}n,(Q_{i}^{m_{\mathcal{T}}},\,i\in\mathcal{T}),\epsilon_{\mathcal{T}}\big{)}$ MASCs with $m_{\mathcal{T}}$ set as in (248) below.

To find $\mathbb{E}\left[P_{e,\mathcal{T}}(\mathsf{C})\right]$ , we apply Theorem 26 to our stationary, memoryless sources with $n$ -symbol distribution $P_{\mathbf{X}_{[K]}^{n}}=P_{\mathbf{X}_{[K]}}^{n}$ . Given any $\mathcal{T}\in\mathcal{P}([K])$ and ${\hat{\mathcal{T}}}\in\mathcal{P}(\mathcal{T})$ , let

[TABLE]

Under moment assumptions (221)–(223), one can generalize the argument in (146)–(186) to $|\mathcal{T}|$ active encoders to obtain

[TABLE]

where $\bar{K}_{\mathcal{T},{\hat{\mathcal{T}}}}$ , $K_{\mathcal{T},{\hat{\mathcal{T}}}}$ and $S_{\mathcal{T},{\hat{\mathcal{T}}}}$ are finite positive constants.

Fix any $\mathbf{Q}_{[K]}$ . By the definition of $\overline{\mathbf{R}}_{\mathcal{T}}$ in (189) and the relation in (220), we see that

[TABLE]

For brevity, define constant vector

[TABLE]

and the almost-constant error thresholds

[TABLE]

where $B$ is the Bentkus constant (81) for the vector of information densities (243). We choose the decoding blocklength $m_{\mathcal{T}}$ as

[TABLE]

where $\delta_{\mathcal{T}}$ (which may be a function of $n$ ) satisfying $0\leq\delta_{\mathcal{T}}<\epsilon_{\mathcal{T}}$ will be determined in the sequel, and $\overline{\mathscr{R}}_{\mathcal{T}}^{*}(n,\epsilon)$ is defined in (193). Applying Lemma 12 to (244) with $m_{\mathcal{T}}$ in (248) yields

[TABLE]

To lower-bound $\epsilon^{*}(\mathcal{C})$ , for each $n$ and $\epsilon$ define

[TABLE]

where $\mathscr{R}_{\mathcal{T}}^{*}(n,\epsilon)$ is the $(n,\epsilon)$-MASC rate region (see Remark 12), $\mathscr{R}_{{\rm out},\mathcal{T}}^{*}\left(n,\epsilon\right)$ is the third-order MASC outer bounding set for $P_{\mathbf{X}_{\mathcal{T}}}$, and (251) follows from the converse (Theorem 23). By Lemma 13-83, one can always choose $\Delta_{\mathcal{T}}=O\left(\frac{1}{\sqrt{n}}\right)$ such that for $n$ sufficiently large

[TABLE]

It follows that

[TABLE]

Equation (253) and the converse (Theorem 23) imply that the minimal error probability over $\mathcal{C}_{\mathcal{T}}$ satisfies

[TABLE]

Plugging (249) and (254) into Lemma 25 and noting the monotonicity of the bound in Lemma 25 gives

[TABLE]

We may choose $\delta_{\mathcal{T}}=O\left(\frac{1}{\sqrt{n}}\right)$ to ensure that the right-hand side of (256) is as small a constant as desired. Specifically, we choose constants $(\lambda_{\mathcal{T}})_{\mathcal{T}\in\mathcal{P}([K])}$ to satisfy

[TABLE]

and put

[TABLE]

With (256) and (258), we bound the right-hand side of (242) as

[TABLE]

which implies the existence of a deterministic $\left(n,\overline{\mathbf{m}}_{K},\mathbf{Q}_{[K]},\overline{\boldsymbol{\epsilon}}_{K}\right)$ RASC with $m_{\mathcal{T}}$ in (248), $\overline{Q}({\hat{\mathcal{T}}})$ in (245), and $\mathbf{R}_{\mathcal{T}}$ in (248). ∎

*Remark 15.*

When parameters $\left(n,\mathbf{Q}_{[K]},\overline{\boldsymbol{\epsilon}}_{K}\right)$ are fixed, increasing $\lambda_{\mathcal{T}}$ yields larger decoding blocklengths $m_{\mathcal{T}}$ . Therefore, the choice of $(\lambda_{\mathcal{T}})_{\mathcal{T}\in\mathcal{P}([K])}$ to satisfy (257) controls the RASC performance trade-off across different active encoder sets. This trade-off affects the performance of the RASC in the fourth- or higher-order terms.

VI-D RASC for Permutation-Invariant Sources

A permutation-invariant source (Polyanskiy [17] introduces a similar notion of permutation invariance for multiple access channel coding) is defined by the constraint

[TABLE]

for all permutations $\pi$ on $[K]$ and all $\mathbf{x}_{[K]}\in\mathcal{X}_{[K]}$ . For example, given any $P_{S}$ and $P_{X|S}$ , the marginal $P_{\mathbf{X}_{[K]}}$ of $P_{\mathbf{X}_{[K]}S}=(P_{X|S})^{K}P_{S}$ satisfies (260). Such “hidden variable” models have applications in statistics, science, and economics, where latent variables (e.g., the health of the world economy or the state of the atmosphere) influence observables (e.g., stock prices or climates). Figure 9 shows an example with $K$ sensors reading measurements of a common hidden state $S$ .
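The hidden-variable example can be checked numerically on a small alphabet. The following sketch builds the marginal of $(P_{X|S})^{K}P_{S}$ and tests invariance under every permutation; the function names and the dictionary representation of the distributions are assumptions of this sketch.

```python
from itertools import permutations, product


def hidden_variable_marginal(p_s, p_x_given_s, k):
    """Marginal pmf of (X_1, ..., X_k) when the X_i are i.i.d. given S.

    `p_s` maps hidden states to probabilities and `p_x_given_s[s]` maps
    observable symbols to their conditional probabilities given S = s.
    """
    symbols = sorted({x for cond in p_x_given_s.values() for x in cond})
    pmf = {}
    for x_vec in product(symbols, repeat=k):
        total = 0.0
        for s, ps in p_s.items():
            term = ps
            for x in x_vec:
                term *= p_x_given_s[s].get(x, 0.0)
            total += term
        pmf[x_vec] = total
    return pmf


def is_permutation_invariant(pmf, k, tol=1e-12):
    """Check that the pmf is unchanged by every permutation of coordinates."""
    for x_vec, p in pmf.items():
        for perm in permutations(range(k)):
            permuted = tuple(x_vec[i] for i in perm)
            if abs(pmf[permuted] - p) > tol:
                return False
    return True
```

For instance, with a binary hidden state and binary observables, the marginal over three sensors passes the invariance check even though the sensors are dependent through $S$.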

Permutation-invariant source models interest us both because of their wide applicability and because they present an opportunity for code simplification through identical encoding, where all encoders employ the same encoding map. For any permutation-invariant source, (215) and (260) imply that $\mathcal{X}_{i}=\mathcal{X}$ for all $i\in[K]$ and, for any $\mathcal{T}\in\mathcal{P}([K])$ with $|\mathcal{T}|=k$ ,

[TABLE]

Thus, $P_{\mathbf{X}_{\mathcal{T}}}$ is permutation-invariant for every $\mathcal{T}$ and the joint source distribution depends on the number of active encoders but not their identities. Assuming that we further employ the same error probability $\epsilon_{k}$ for all $\mathcal{T}\in\mathcal{P}([K])$ with $|\mathcal{T}|=k$ , we can fix a single decoding blocklength for each number $k\in[K]$ of active encoders and use identical encoders at all transmitters, allowing us to accommodate an arbitrarily large number of encoders without designing a unique encoder for each. A similar phenomenon arises for RA channel coding [18].

In analyzing RASC performance with identical encoders on a permutation-invariant source, we assume in addition to (221) and (222) that no two sources are identical, i.e.,

[TABLE]

This is important since using identical encoders on identical sources yields identical descriptions, in which case descriptions from multiple encoders are no better than descriptions from a single encoder. Under these assumptions, Theorem 24 continues to hold. In the analysis, we modify the decoder to output the most probable source vector $\mathbf{x}_{\mathcal{T}}\in\mathcal{X}_{\mathcal{T}}$ that contains no repeated symbols (see the proof of Theorem 26), treating the case where $\mathbf{X}_{\mathcal{T}}$ contains repeated symbols as an error. In the asymptotic analysis for stationary, memoryless sources, the probability of this error event is bounded by

[TABLE]

which decays exponentially in $n$ by (262). Therefore, under the assumption in (262), identical encoding does not incur a first-, second-, or third-order performance penalty.

VII Concluding Remarks

This paper studies finite-blocklength lossless source coding in three scenarios.

We derive a new non-asymptotic achievability (RCU) bound (Theorem 4) and use it to show that for point-to-point coding on stationary, memoryless sources, random code design with maximum likelihood decoding achieves the same coding rate up to the third-order as the optimal code from [6]. The RCU bound generalizes to the MASC scenario (Theorem 18).

A new HT converse (Theorem 19) extends the channel coding meta-converse [5] to an MASC and suggests the possibility of using composite hypothesis testing to derive converses for other multi-terminal scenarios. Our analysis of composite hypothesis testing provides general tools (Lemmas 8, 9, and 10) for use in other related problems. Just as the meta-converse for channel coding recovers previously known converses, our HT converse recovers Han’s MASC converse [9, Lemma 7.2.2]. Just as the HT converse for lossy source coding [23, Th. 8] is equivalent to the LP-based converse for that setting (see [15, Cor. 3]), our MASC HT converse is equivalent to the MASC LP-based converse [15, Th. 12].

We give the first third-order characterization of the MASC rate region for stationary, memoryless sources, tightening prior second-order characterizations from [13] and [14] and replacing the $2^{k}-1$ thresholds used there to decode for $k$ users by a maximum likelihood decoder that chooses the jointly most probable source realizations consistent with the received codewords. We show that for rate points converging to a non-corner point on the asymptotic sum-rate boundary, separate encoding does not compromise the performance in lossless data compression up to the third-order term. Numerical comparison of the new HT converse and the optimal performance of point-to-point source coding in Figure 4 allows one to bound from below the small gap between joint and separate encoding, which is not captured in the first three terms of the asymptotic expansion. For independent sources, there are no non-corner points, and MASC separate encoding incurs a positive penalty in the second-order term relative to joint encoding with a point-to-point code. When two sources have the same marginals, this penalty equals the penalty for using two independent blocklength- $n$ codes rather than a single blocklength- $2n$ point-to-point code for encoding $2n$ samples.

Our proposed RASC works universally for all possible encoder activity patterns. The nested structure of the RASC demonstrates that there is no need for the encoders to know the set of active encoders a priori. The third-order-optimal MASC performance is achievable even when the only information the encoders receive is the acknowledgment that tells them when to stop transmitting (Theorem 24).

Our refinement of the traditional random coding argument (Lemma 25 and (242)) uses bounds on the minimal (converse) and expected (achievability) error probabilities for each possible active encoder set to show the existence of a single code that is good for all possible active encoder sets. This argument is likely to be useful for other information-theoretic problems.

Appendix A Proof of Theorem 3

Following [5, Eq. (68)], note that for $z>0$ and $\gamma>0$

[TABLE]

Let $z=\frac{1}{P_{X}(X)}$ and $\gamma=M$ . Then taking the expectation of both sides of (A.1) with respect to $P_{X}$ gives

[TABLE]

where $\mathbb{P}\left[\cdot\right]$ denotes a probability with respect to $P_{X}$ and $\mathbb{U}\left[\cdot\right]$ denotes a mass with respect to the counting measure $U_{X}$ on $\mathcal{X}$ , which assigns unit weight to each $x\in\mathcal{X}$ . In light of (A.2), we can prove (23) by demonstrating the existence of an $(M,\epsilon)$ code for which the right-hand side of (A.2) exceeds $\epsilon$ . We prove a slightly stronger result, showing that there exists an $(M,\epsilon)$ code with a threshold decoder such that

[TABLE]

for all $\gamma>0$ . Setting $\gamma=M$ in (A.3) yields the desired bound.

Fix $\gamma>0$ . For each $x\in\mathcal{X}$ , randomly and independently draw each encoder output $\mathsf{F}(x)$ from the uniform distribution on $[M]$ . Define the threshold decoder

[TABLE]

We capture all errors using a union of error events

[TABLE]

By the random coding argument and the union bound, there exists an $(M,\epsilon)$ code such that

[TABLE]

Here,

[TABLE]

where (A.10) applies the union bound and (A.11) holds since the encoder outputs are i.i.d. and uniformly distributed. ∎
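The substitution $\gamma=M$ can be checked numerically on a toy source. The sketch below assumes that the threshold-decoder bound has the usual two-term form (probability that the source realization is atypical, plus the number of typical realizations divided by $M$) and that the resulting bound is $\mathbb{E}\left[\min\{1,1/(MP_{X}(X))\}\right]$. Both expressions are our reading of the argument rather than a transcription of (A.3) and (23), and the function names are hypothetical.

```python
from math import isclose


def threshold_bound(pmf, m, gamma):
    """P[1/P(X) > gamma] + (1/M) * #{x : 1/P(x) <= gamma} (assumed form)."""
    atypical = sum(p for p in pmf if p > 0 and 1.0 / p > gamma)
    confusable = sum(1 for p in pmf if p > 0 and 1.0 / p <= gamma)
    return atypical + confusable / m


def dt_bound(pmf, m):
    """E[min(1, 1 / (M * P(X)))], i.e. E[exp(-|log M - i(X)|_+)] (assumed form)."""
    return sum(p * min(1.0, 1.0 / (m * p)) for p in pmf if p > 0)


# Hypothetical toy source on four symbols and M = 4 codewords.
pmf = [0.5, 0.25, 0.125, 0.125]
m = 4
print(threshold_bound(pmf, m, gamma=m), dt_bound(pmf, m))
```

For this toy source the two quantities coincide at $\gamma=M$, in line with the final step of the proof.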

Appendix B Proof of Lemma 9

The proof extends the proof of [5, Eq. (102)] (e.g., [24]). We show that for any test $P_{Z|X}$ that decides between $P$ vs. $\{Q_{j}\}_{j=1}^{k}$ ,

[TABLE]

where $\gamma_{j}\geq 0$ , $j\in[k]$ are arbitrary constants. Then Lemma 9 follows immediately by definition of $\beta_{\alpha}\left(P,\{Q_{j}\}_{j=1}^{k}\right)$ .

To prove (B.1), fix a $\gamma_{j}\geq 0$ for each $j\in[k]$ . We then have

[TABLE]

where (B.4) follows from the non-negativity of probability and each $\gamma_{j}$ . The proof is complete since (B.7) equals the right-hand-side of (B.1). ∎

Appendix C Proof of Lemma 10

For any test $P_{Z|X}$ deciding between $P$ vs. $\{Q_{j}\}_{j=1}^{k}$ , we show that

[TABLE]

where $\gamma_{j}\geq 0$ , $j\in[k]$ are arbitrary constants. Fix a $\gamma_{j}\geq 0$ for each $j\in[k]$ . For notational brevity, define sets

[TABLE]

For any test $P_{Z|X}$ , we have

[TABLE]

The equality in (C.7) is achieved by test

[TABLE]

for any $\lambda\in[0,1]$ . Rearranging (C.8) yields (C.1). Choosing the unique $\lambda\in[0,1]$ to satisfy $\mathbb{P}\left[Z=1\right]=\alpha$ , we obtain Lemma 10 by the definition of $\beta_{\alpha}\left(P,\{Q_{j}\}_{j=1}^{k}\right)$ . ∎

Appendix D Proof of Lemma 12

Recall that $\mathsf{T}$ is composed of the $r$ normalized eigenvectors corresponding to the non-zero eigenvalues of the covariance matrix $\mathsf{V}$ and $\mathbf{U}_{i}=\mathsf{T}\mathbf{W}_{i}$, where $\mathbf{W}_{i}\in\mathbb{R}^{r}$ for $i=1,\ldots,n$. Thus $\mathsf{V}=\mathsf{T}\mathsf{V}_{r}\mathsf{T}^{T}$, where $\mathsf{V}_{r}\triangleq\text{Cov}\left[\mathbf{W}_{1}\right]$ is non-singular.

For each $\mathbf{z}\in\mathbb{R}^{d}$ , define

[TABLE]

which is a convex subset of $\mathbb{R}^{r}$ . Let $\mathbf{Z}_{r}\sim\mathcal{N}(\mathbf{0},\mathsf{V}_{r})\in\mathbb{R}^{r}$ . Applying [13, Cor. 8] to the i.i.d. random vectors $\mathbf{W}_{1},\ldots,\mathbf{W}_{n}$ , we obtain

[TABLE]

which is equivalent to (81) by the definition of $\mathscr{A}_{r}(\mathbf{z})$ . ∎

Appendix E Proof of Lemma 13

For simplicity, we assume that $\mathsf{V}$ is non-singular. When $\mathsf{V}$ is singular, a similar analysis can be applied with $\mathsf{V}$ replaced by $\mathsf{V}_{r}$ defined in Lemma 12.

Let $\mathbf{Z}\sim\mathcal{N}(\mathbf{0},\mathsf{V})$ be a $d$ -dimensional multivariate Gaussian with covariance matrix $\mathsf{V}$ . Recall from (77) that $\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ is defined as

[TABLE]

By the definition of $\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ and the definition of $\Phi(\mathsf{V};\mathbf{z})$ in (II), $\Phi(\mathsf{V};\mathbf{z})=1-\epsilon$ if and only if $\mathbf{z}$ lies on the boundary of $\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ , and $\Phi(\mathsf{V};\mathbf{z})>1-\epsilon$ if and only if $\mathbf{z}$ lies in the interior of $\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ .
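The membership test implied by this characterization can be sketched in a few lines for the special case of a diagonal covariance matrix, where the Gaussian CDF factorizes into a product of one-dimensional CDFs. The diagonal restriction and the function names are assumptions made to keep the example self-contained; a general $\mathsf{V}$ would require a multivariate CDF routine.

```python
from math import prod, sqrt
from statistics import NormalDist


def gaussian_cdf_diag(variances, z):
    """Phi(V; z) = P[Z <= z] for Z ~ N(0, V) with V = diag(variances)."""
    std = NormalDist()
    return prod(std.cdf(zi / sqrt(vi)) for vi, zi in zip(variances, z))


def in_q_inv(variances, z, eps):
    """True if z lies in Q_inv(V, eps), i.e. Phi(V; z) >= 1 - eps."""
    return gaussian_cdf_diag(variances, z) >= 1.0 - eps


# Hypothetical example: V = diag(1, 4), eps = 0.1.
variances, eps = (1.0, 4.0), 0.1
print(in_q_inv(variances, (0.0, 0.0), eps))  # origin: CDF is 1/4, so not in the region
print(in_q_inv(variances, (3.0, 6.0), eps))  # three standard deviations in each coordinate
```

Shifting $\mathbf{z}$ by a positive multiple of $\mathbf{1}$ strictly increases the CDF, which is the monotonicity used in the proof below.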

Proof of Lemma 13.

To prove (82), consider any $D_{1}>0$ and $\delta\geq 0$ . Since $\Phi(\mathsf{V};\mathbf{z})$ is continuously differentiable everywhere provided that $\mathsf{V}$ is non-singular, we can apply the multivariate Taylor’s theorem to expand $\Phi(\mathsf{V};\mathbf{z}+D_{1}\delta\mathbf{1})$ as

[TABLE]

The second-order residual term $\xi(\mathbf{z},D_{1}\delta)$ can be bounded as

[TABLE]

where

[TABLE]

and $\|\cdot\|_{\max}$ denotes the max norm of a matrix.

Denote

[TABLE]

Since $\Phi(\mathsf{V};\mathbf{z})$ is increasing in any coordinate of $\mathbf{z}$ , $D^{\prime}>0$ . Then, for any $\mathbf{z}\in\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ , we have

[TABLE]

We note that for any finite positive $D_{1}$ , $\xi_{\max}$ approaches $\|\nabla^{2}\Phi(\mathsf{V};\mathbf{z})\|_{\max}$ as $\delta\rightarrow 0$ . Thus, for any finite positive $D_{1}$ that satisfies $D^{\prime}D_{1}>1$ , there exists some $\delta_{1}>0$ such that for all $0\leq\delta<\delta_{1}$ ,

[TABLE]

which yields

[TABLE]

By the definitions of $\Phi(\mathsf{V};\mathbf{z})$ and $\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ , (E.9) implies

[TABLE]

and consequently

[TABLE]

which proves (82).

Eq. (83) can be proved in a similar way.

∎

Appendix F Equivalence between HT and LP-Based Converses for the MASC

In this appendix, we establish the equivalence between the HT converse and the LP-based converse by showing that the bounds in (108) and (138) are equivalent. According to [15, Eq. (31)], (108) is equivalent to the following converse

[TABLE]

where the supremum is over

[TABLE]

Therefore, we show that (F.1) is equivalent to (138).

We first demonstrate that (F.1) implies (138). Set

[TABLE]

for any $\sigma$ -finite $Q^{(i)}_{X_{1}X_{2}}$ and $\gamma_{i}\geq 0$ , $i\in[3]$ . Since

[TABLE]

in (F.1), we obtain (138).

To prove the other direction, we substitute $z_{1}=\gamma_{1}Q^{(1)}_{X_{1}X_{2}}$ , $z_{2}=\gamma_{2}Q^{(2)}_{X_{1}X_{2}}$ , and $z_{3}=\gamma_{3}Q^{(3)}_{X_{1}X_{2}}$ in the right-hand side of (138) to obtain

[TABLE]

Take a supremum of the right-hand side of (F.3) over $\eta_{1},\eta_{2},\eta_{3}\in\mathcal{Z}$ . Since (F.3) does not contain $\eta_{1},\eta_{2},\eta_{3}$ , this does not change anything. Now, weaken (i.e., lower-bound) the inner supremum over $z_{1},z_{2},z_{3}\in\mathcal{Z}$ by setting

[TABLE]

Observing that

[TABLE]

we see that the result of our weakening is exactly the right-hand side of (F.1), as desired. ∎

Appendix G MASCs for Sources with Less Redundancy

Applying Lemma 7 to get the asymptotic achievability result in Theorem 20 requires that all $V(X_{1},X_{2})$ , $V(X_{1}|X_{2})$ , and $V(X_{2}|X_{1})$ are strictly positive (as an implication of assumption (139)). Thus, the analysis in Section V-D breaks down when any of these varentropies is equal to zero. (We refer to such a source as being less redundant.) In this appendix, we analyze the performance of the MASC for less redundant sources. Specifically, we consider a pair of stationary, memoryless sources and analyze the following three cases:

1) all three varentropies are equal to zero;

2) exactly two of the varentropies are equal to zero;

3) exactly one of the varentropies is equal to zero.

We continue to assume that the joint distribution $P_{X_{1}X_{2}}$ satisfies (140) and (141). For those cases in which $V(X_{2}|X_{1})>0$ , we continue to assume $\mathbb{E}\left[V_{c}(X_{2}|X_{1})\right]\!>\!0$ . Likewise, if $V(X_{1}|X_{2})>0$ , we continue to assume $\mathbb{E}\left[V_{c}(X_{1}|X_{2})\right]\!>\!0$ .

In point-to-point almost-lossless source coding, the optimal code for a non-redundant source is easy to find (see Remark 2). When the encoders are required to operate independently in an MASC, we know of no easy way to find the optimal codes in general. In Section G-A below, we characterize the $(n,\epsilon)$-rate region in the three general cases listed above using the techniques developed in Section V-D. Then, in Section G-B, we restrict attention to the case where $P_{X_{1}X_{2}}(x_{1},x_{2})>0$ for every $(x_{1},x_{2})\in\mathcal{X}_{1}\times\mathcal{X}_{2}$; under this condition, the optimal codes can be found and analyzed directly.

G-A General Characterizations of the $(n,\epsilon)$ -Rate Region

We first list our results in the three general cases below.

Case 1): Suppose that $V(X_{1}|X_{2})=0$ , $V(X_{2}|X_{1})=0$ , and $V(X_{1},X_{2})=0$ . For any $\delta_{1}$ , $\delta_{2}$ , $\delta_{12}>0$ , let

[TABLE]

Define

[TABLE]

Theorem 27.

When $V(X_{1}|X_{2})=0$ , $V(X_{2}|X_{1})=0$ , and $V(X_{1},X_{2})=0$ , the $(n,\epsilon)$ -rate region $\mathscr{R}^{*}(n,\epsilon)$ satisfies

[TABLE]

As in the point-to-point scenario, there are no second-order dispersion terms or $-\frac{\log n}{2n}$ third-order terms in the characterization of $\mathscr{R}^{*}(n,\epsilon)$ in this case. For any $n$ and $\epsilon$ , the achievable region $\mathscr{R}_{\rm in}^{(1)}(n,\epsilon)$ has a curved boundary due to the trade-off in the $O\left(\frac{1}{n}\right)$ fourth-order terms, while the converse region $\mathscr{R}_{\rm out}^{(1)}(n,\epsilon)$ has three linear boundaries.

Case 2): There are three possible cases where exactly two of the three varentropies are equal to zero. Here, we suppose that $V(X_{1}|X_{2})>0$ while $V(X_{2}|X_{1})=V(X_{1},X_{2})=0$ . The other two cases can be analyzed in the same way. Let $B_{1}$ denote the Berry-Esseen constant for the random variable $\imath(X_{1}|X_{2})$ , and let $S_{2}$ , $K_{1}$ , $\bar{K}_{1}$ be the finite positive constants defined in (181), (158), and (176), respectively. For any $\delta_{1}$ , $\delta_{2}$ , $\delta_{12}>0$ , let

[TABLE]

Define

[TABLE]

Theorem 28.

When $V(X_{1}|X_{2})>0$ , $V(X_{2}|X_{1})=0$ , and $V(X_{1},X_{2})=0$ , the $(n,\epsilon)$ -rate region $\mathscr{R}^{*}(n,\epsilon)$ satisfies

[TABLE]

The achievable region $\mathscr{R}_{\rm in}^{(2)}(n,\epsilon)$ has a curved boundary due to the trade-off in $\delta_{1}$ , $\delta_{2}$ , and $\delta_{12}$ . If we let

[TABLE]

then it is apparent that the dispersion corresponding to $R_{1}$ is $V(X_{1}|X_{2})$ with a $-\frac{\log n}{2n}$ third-order term, while the dispersions of $R_{2}$ and $R_{1}+R_{2}$ are zero.

Case 3): Similar to Case 2), there are three possible cases where exactly one of the three varentropies is equal to zero. Here, we consider the case where $V(X_{1}|X_{2})=0$ while $V(X_{2}|X_{1})>0$ and $V(X_{1},X_{2})>0$ . Let $S_{1}$ , $K_{2}$ , $\bar{K}_{2}$ , and $K_{12}$ be the finite positive constants defined in (180), (159), (177), and (160), respectively, and let $B$ be the Bentkus constant (81) for the vector $(I_{2},I_{12})$ . For any $\delta\in(0,\epsilon)$ , let

[TABLE]

where $C_{\rm in}\triangleq K_{2}+K_{12}+B+\frac{S_{1}}{\sqrt{n}}$ , and $\mathsf{V}_{2}$ is the covariance matrix of the random vector $(\imath(X_{2}|X_{1}),\imath(X_{1},X_{2}))$ . Define

[TABLE]

Theorem 29.

When $V(X_{1}|X_{2})=0$ , $V(X_{2}|X_{1})>0$ , and $V(X_{1},X_{2})>0$ , the $(n,\epsilon)$ -rate region $\mathscr{R}^{*}(n,\epsilon)$ satisfies

[TABLE]

For any $n$ and $\epsilon$ , the achievable region $\mathscr{R}_{\rm in}^{(3)}(n,\epsilon)$ has a curved boundary that is characterized by the trade-off between a separate bound on $R_{1}$ and a region in $\mathbb{R}^{2}$ that bounds $(R_{2},R_{1}+R_{2})$ jointly. The converse region $\mathscr{R}_{\rm out}^{(3)}(n,\epsilon)$ is the intersection of a region with a linear boundary that bounds $R_{1}$ only and a region with a curved boundary that bounds $(R_{2},R_{1}+R_{2})$ jointly. If we let

[TABLE]

then it is apparent that the dispersion corresponding to $R_{2}$ and $R_{1}+R_{2}$ is given by $\mathsf{V}_{2}$ with a $-\frac{\log n}{2n}$ third-order term, while the dispersion of $R_{1}$ is zero.

A less redundant stationary, memoryless source has some useful properties. When $V(X_{1},X_{2})=0$ ,

[TABLE]

for every $(x_{1}^{n},x_{2}^{n})\in\mathcal{X}_{1}^{n}\times\mathcal{X}_{2}^{n}$ ; in other words, $(X_{1},X_{2})$ is uniformly distributed over its support in $\mathcal{X}_{1}\times\mathcal{X}_{2}$ . When $V(X_{1}|X_{2})=0$ ,

[TABLE]

in other words, $X_{1}$ is uniformly distributed over its conditional support for each $x_{2}\in\mathcal{X}_{2}$ . When $V(X_{2}|X_{1})=0$ , a result analogous to (G.16) holds. These properties do not reduce the difficulty of characterizing the optimal MASCs in general. As a result, we continue to employ the random coding techniques from Section V-D in our analysis here. For the achievability argument, we invoke the MASC RCU bound (Theorem 18); for the converse, we appeal to a modified version of [9, Lemma 7.2.2], as stated below.
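To make these properties concrete, the following sketch computes the three varentropies of a finite joint distribution directly from their definitions as variances of the corresponding information random variables. The function name and the two example distributions are hypothetical and are not drawn from the paper.

```python
from collections import defaultdict
from math import log


def varentropies(joint):
    """Return (V(X1,X2), V(X1|X2), V(X2|X1)) in nats^2.

    `joint` maps pairs (x1, x2) to probabilities.
    """
    p1, p2 = defaultdict(float), defaultdict(float)
    for (x1, x2), p in joint.items():
        p1[x1] += p
        p2[x2] += p

    def variance(info):
        # Variance of the information random variable info(x1, x2, p).
        support = [(x1, x2, p) for (x1, x2), p in joint.items() if p > 0]
        mean = sum(p * info(x1, x2, p) for x1, x2, p in support)
        return sum(p * (info(x1, x2, p) - mean) ** 2 for x1, x2, p in support)

    v12 = variance(lambda x1, x2, p: -log(p))
    v1_given_2 = variance(lambda x1, x2, p: -log(p / p2[x2]))
    v2_given_1 = variance(lambda x1, x2, p: -log(p / p1[x1]))
    return v12, v1_given_2, v2_given_1


# Uniform on the full product alphabet: all three varentropies vanish.
full = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
# Uniform on a non-product support: only the joint varentropy vanishes.
triangle = {(0, 0): 1 / 3, (0, 1): 1 / 3, (1, 1): 1 / 3}
print(varentropies(full))
print(varentropies(triangle))
```

In the second example the joint distribution is uniform over its support, so $V(X_{1},X_{2})=0$, while the conditional supports have different sizes and both conditional varentropies are positive.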

Lemma 30 (Modified [9, Lemma 7.2.2]).

Any $\left(n,\exp\left(nR_{1}\right),\exp\left(nR_{2}\right),\epsilon^{\prime}\right)$ MASC satisfies

[TABLE]

for any $\gamma_{1}$ , $\gamma_{2}$ , $\gamma_{12}>0$ , where $I_{1}$ , $I_{2}$ and $I_{12}$ are defined in (146)–(148).

We next prove Theorems 27, 28, and 29.

Proof of Theorem 27.

Achievability: We employ the RCU bound in (155). To evaluate the terms in (155), note that the uniformity over the distribution’s support that results from $V(X_{1},X_{2})=V(X_{1}|X_{2})=V(X_{2}|X_{1})=0$ implies that for any $(x_{1}^{n},x_{2}^{n})$ such that $P_{X_{1}^{n}X_{2}^{n}}(x_{1}^{n},x_{2}^{n})>0$ ,

[TABLE]

Similar equalities hold for $A_{2}$ and $A_{12}$ , and for any $(R_{1},R_{2})\in\mathscr{R}_{\rm in}^{(1)}(n,\epsilon)$ , (155) gives

[TABLE]

implying that such a rate pair $(R_{1},R_{2})$ is achievable. Therefore, the $(n,\epsilon)$ -rate region satisfies

[TABLE]

Converse: Consider any $(R_{1},R_{2})$ with $R_{1}<H(X_{1}|X_{2})-\frac{1}{n}\log\frac{1}{1-\epsilon}$. Since the bound in Lemma 30 holds for any $\gamma_{1}$, $\gamma_{2}$, $\gamma_{12}>0$, we take

[TABLE]

which, under the given uniformity, implies

[TABLE]

We take $\gamma_{2}$ and $\gamma_{12}$ sufficiently large so that

[TABLE]

and hence

[TABLE]

Under these conditions, Lemma 30 gives

[TABLE]

Therefore, any achievable rate pair $(R_{1},R_{2})$ must satisfy

[TABLE]

The same analysis applies to $R_{2}$ and $R_{1}+R_{2}$ . We then conclude that any achievable rate pair $(R_{1},R_{2})$ must satisfy $(R_{1},R_{2})\in\mathscr{R}_{\rm out}^{(1)}(n,\epsilon)$ . Thus,

[TABLE]

∎

Proof of Theorem 28.

Achievability: Take $(R_{1},R_{2})\in\mathscr{R}_{\rm in}^{(2)}(n,\epsilon)$ (G.6) satisfying the inequalities in (G.5) with some $\delta_{1}$ , $\delta_{2}$ , $\delta_{12}>0$ such that $\delta_{1}+\delta_{2}+\delta_{12}=\epsilon$ . We again employ the RCU bound from (155). Since $\mathbb{E}\left[V_{c}(X_{1}|X_{2})\right]>0$ , we use (174) to bound $A_{1}$ . The terms $A_{2}$ and $A_{12}$ are constants (cf. (G.18)). With these observations, we weaken (156) as

[TABLE]

where (G.30) is by our choice of $(R_{1},R_{2})$ , and (G.31) applies (169), the Berry-Esseen inequality (Theorem 6), and (179) to bound the three probability terms. Therefore, $(R_{1},R_{2})$ is achievable at blocklength $n$ and error probability $\epsilon$ , implying

[TABLE]

Converse: We next apply Lemma 30 to derive a converse result. Recall that under our assumptions $V(X_{2}|X_{1})=V(X_{1},X_{2})=0$, we have $\imath(X_{2}|X_{1})=H(X_{2}|X_{1})$ and $\imath(X_{1},X_{2})=H(X_{1},X_{2})$ almost surely. Consider any $(R_{1},R_{2})$ such that $R_{2}<H(X_{2}|X_{1})-\frac{1}{n}\log\frac{1}{1-\epsilon}$. Since the bound in Lemma 30 holds for any $\gamma_{1}$, $\gamma_{2}$, $\gamma_{12}>0$, we can take

[TABLE]

so that

[TABLE]

By this choice of $\gamma_{2}$ , $1-\epsilon-\exp\left(-n\gamma_{2}\right)>0$ . Thus, we can take $\gamma_{1}$ and $\gamma_{12}$ sufficiently large such that

[TABLE]

By the above choices of $\gamma_{1}$, $\gamma_{2}$, and $\gamma_{12}$, Lemma 30 gives

[TABLE]

Therefore, any achievable rate pair $(R_{1},R_{2})$ must satisfy

[TABLE]

The same analysis applies to $R_{1}+R_{2}$ , and we conclude that any achievable rate pair $(R_{1},R_{2})$ must also satisfy

[TABLE]

Given (G.37) and (G.38), we re-evaluate the bound in Lemma 30 by taking

[TABLE]

Under these conditions, we have

[TABLE]

and the bound in Lemma 30 becomes

[TABLE]

Then, by the Berry-Esseen inequality (Theorem 6), taking

[TABLE]

in (G.41) yields $\epsilon^{\prime}\geq\epsilon$ . Therefore, any achievable rate pair $(R_{1},R_{2})$ must satisfy $(R_{1},R_{2})\in\mathscr{R}_{\rm out}^{(2)}(n,\epsilon)$ . Thus,

[TABLE]

∎

Proof of Theorem 29.

Achievability: Take any $(R_{1},R_{2})\in\mathscr{R}_{\rm in}^{(3)}(n,\epsilon)$ satisfying the inequalities in (G.10) with some $\delta\leq\epsilon$ . We employ the RCU bound in (155). Since $\mathbb{E}\left[V_{c}(X_{2}|X_{1})\right]>0$ and $V(X_{1},X_{2})>0$ , we use (175) and (163) to bound $A_{2}$ and $A_{12}$ , respectively; $A_{1}$ is the constant in (G.18). With these observations, we weaken (156) as

[TABLE]

where (G.45) is by our choice of $(R_{1},R_{2})$ , and (G.46) applies (170), (171), Lemma 12 (multidimensional Berry-Esseen Theorem), and (179) to bound the four probability terms. Therefore, $(R_{1},R_{2})$ is achievable at blocklength $n$ and error probability $\epsilon$ , implying that

[TABLE]

Converse: We employ Lemma 30 to derive a converse. Recall that in this case, $\imath(X_{1}|X_{2})=H(X_{1}|X_{2})$ almost surely. Consider any $(R_{1},R_{2})$ such that $R_{1}<H(X_{1}|X_{2})-\frac{1}{n}\log\frac{1}{1-\epsilon}$. Since the bound in Lemma 30 holds for any $\gamma_{1}$, $\gamma_{2}$, $\gamma_{12}>0$, we can set

[TABLE]

so that

[TABLE]

By this choice of $\gamma_{1}$ , $1-\epsilon-\exp\left(-n\gamma_{1}\right)>0$ . Thus, we can take $\gamma_{2}$ and $\gamma_{12}$ sufficiently large such that

[TABLE]

By the above choices of $\gamma_{1}$, $\gamma_{2}$, and $\gamma_{12}$, Lemma 30 gives

[TABLE]

Therefore, any achievable rate pair $(R_{1},R_{2})$ must satisfy

[TABLE]

Given that (G.52) holds, we re-evaluate the bound in Lemma 30 by taking

[TABLE]

Under these conditions, the bound in Lemma 30 becomes

[TABLE]

Applying Lemmas 12 and 13 to (G.54), we conclude that any $(R_{1},R_{2})$ in the $(n,\epsilon)$ -rate region must satisfy $(R_{1},R_{2})\in\mathscr{R}_{\rm out}^{(3)}(n,\epsilon)$ . Thus,

[TABLE]

∎

G-B Two Special Cases

The analysis in Section G-A above applies to any stationary, memoryless source with single-letter distribution $P_{X_{1}X_{2}}$ that satisfies (140). In such a general setting, it is hard to find an optimal code. However, there are some special cases in which the optimal codes for a less redundant source can be characterized.

To enable the following analysis of these special cases, we assume that $P_{X_{1}X_{2}}(x_{1},x_{2})>0$ for every $(x_{1},x_{2})\in\mathcal{X}_{1}\times\mathcal{X}_{2}$. Under this assumption, $V(X_{1},X_{2})=0$ if and only if $V(X_{1}|X_{2})=V(X_{2}|X_{1})=0$. As a result, the three cases discussed in Section G-A reduce to only two possible scenarios:

1) $V(X_{1},X_{2})=V(X_{1}|X_{2})=V(X_{2}|X_{1})=0$;

2) $V(X_{1},X_{2})>0$, and either $V(X_{1}|X_{2})=0$ or $V(X_{2}|X_{1})=0$.

Note that $X_{1}$ and $X_{2}$ are independent in both of these scenarios.

We first summarize the results below.

Special Case 1):

Theorem 31.

Suppose that $V(X_{1}|X_{2})=0$ , $V(X_{2}|X_{1})=0$ , and $V(X_{1},X_{2})=0$ . If $P_{X_{1}X_{2}}$ satisfies $P_{X_{1}X_{2}}(x_{1},x_{2})>0$ for every $(x_{1},x_{2})\in\mathcal{X}_{1}\times\mathcal{X}_{2}$ , then

[TABLE]

where $\mathscr{R}_{\rm out}^{(1)}(n,\epsilon)$ is defined in (G.3).

This scenario is a special example of Case 1) discussed in Section G-A above. The $(n,\epsilon)$-rate region here coincides with the converse region $\mathscr{R}_{\rm out}^{(1)}(n,\epsilon)$ presented in (G.3) for general source distributions. See Figure 10(a) for a comparison among $\mathscr{R}_{\rm in}^{(1)}(n,\epsilon)$, $\mathscr{R}_{\rm out}^{(1)}(n,\epsilon)$, and $\mathscr{R}^{*}(n,\epsilon)$ in this special case.

Special Case 2): With $V(X_{1},X_{2})>0$ , we here assume that $V(X_{1}|X_{2})=0$ and $V(X_{2}|X_{1})>0$ . The other case can be analyzed similarly. For any $\delta\in[0,\epsilon)$ , we define

[TABLE]

where the functions $\xi_{\rm in}(\epsilon,\delta,n)$ and $\xi_{\rm out}(\epsilon,\delta,n)$ are characterized as follows. For any fixed $\delta$ , $\xi_{\rm out}(\epsilon,\delta,n)=O(\frac{1}{n})$ and $\xi_{\rm in}(\epsilon,\delta,n)=O(\frac{1}{n})$ . For any fixed $n$ , both $\xi_{\rm out}(\epsilon,\delta,n)$ and $\xi_{\rm in}(\epsilon,\delta,n)$ blow up as $\delta$ approaches $\epsilon$ . (These bounds are applications of the point-to-point results in Theorem 1.) Also define

[TABLE]

Theorem 32.

Suppose that $V(X_{1}|X_{2})=0$ , $V(X_{2}|X_{1})>0$ , and $V(X_{1},X_{2})>0$ . If $P_{X_{1}X_{2}}$ satisfies $P_{X_{1}X_{2}}(x_{1},x_{2})>0$ for every $(x_{1},x_{2})\in\mathcal{X}_{1}\times\mathcal{X}_{2}$ , then

[TABLE]

This scenario is a special example of Case 3) discussed in Section A of this appendix. The $(n,\epsilon)$ -rate region characterized in (G.61) is sandwiched between the achievable region presented in (G.11) and the converse region presented in (G.12). To compare these regions, we note that the bounds on $R_{1}+R_{2}$ in (G.11) and (G.12) become inactive in this special scenario where $X_{1}$ and $X_{2}$ are independent. As a result, the achievable region in (G.11) becomes

[TABLE]

and the converse region in (G.12) becomes

[TABLE]

As $\delta$ approaches $\epsilon$, the boundary of the $(n,\epsilon)$-rate region given in (G.59) approaches the line $R_{1}=H(X_{1})-\frac{1}{n}\log\frac{1}{1-\epsilon}$, which matches the vertical segment of the boundary of the converse region $\mathscr{R}_{\rm out}^{(3)}(n,\epsilon)$. See Figure 10(b) for a comparison of $\mathscr{R}_{\rm in}^{(3)}(n,\epsilon)$, $\mathscr{R}_{\rm out}^{(3)}(n,\epsilon)$, and $\mathscr{R}^{*}(n,\epsilon)$ in this case.

We next give proofs for Theorems 31 and 32.

Proof of Theorem 31.

When $V(X_{1}|X_{2})=V(X_{2}|X_{1})=V(X_{1},X_{2})=0$ , $(X_{1},X_{2})$ is uniformly distributed over $\mathcal{X}_{1}\times\mathcal{X}_{2}$ , which restricts $\mathcal{X}_{1}$ and $\mathcal{X}_{2}$ to be finite and $X_{1}$ and $X_{2}$ to be independent. The MASC problem reduces to independent (point-to-point) almost-lossless source coding problems for the two sources with a single compound error probability. As a result, the optimal MASC with blocklength $n$ and code sizes $(M_{1},M_{2})$ has an error probability given by

[TABLE]

Therefore, for any $0<\epsilon<1$ , there exists an $(n,M_{1},M_{2},\epsilon)$ MASC if and only if

[TABLE]

In this case, $H(X_{1})=\log|\mathcal{X}_{1}|$ and $H(X_{2})=\log|\mathcal{X}_{2}|$ .

$\bullet$ For $R_{1}<H(X_{1})$ and $R_{2}<H(X_{2})$ , (G.65) becomes

[TABLE]

which is equivalent to

[TABLE]

$\bullet$ For $R_{1}\geq H(X_{1})$ , (G.65) becomes

[TABLE]

which is equivalent to

[TABLE]

$\bullet$ For $R_{2}\geq H(X_{2})$ , (G.65) gives

[TABLE]

For all $0<\epsilon<1$ and $n\geq 1$ ,

[TABLE]

∎
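The reduction used in this proof is easy to evaluate for small parameters. The sketch below assumes that the optimal error probability for two independent, uniformly distributed sources is one minus the product of the fractions of length-$n$ sequences that each encoder can index; this expression is inferred from the description of the compound error event and is not a transcription of (G.64). The function and argument names are hypothetical.

```python
from fractions import Fraction


def optimal_error_uniform(n, m1, m2, size1, size2):
    """Error probability when each encoder indexes as many sequences as it can.

    Assumes X1 and X2 are independent and uniform on alphabets of sizes
    size1 and size2, so every length-n sequence is equally likely.
    """
    covered1 = min(Fraction(1), Fraction(m1, size1 ** n))
    covered2 = min(Fraction(1), Fraction(m2, size2 ** n))
    # Decoding succeeds only if both source sequences were indexed.
    return 1 - covered1 * covered2


# Hypothetical example: binary alphabets, blocklength n = 4.
print(optimal_error_uniform(4, 16, 8, 2, 2))   # 1/2
print(optimal_error_uniform(4, 12, 12, 2, 2))  # 7/16
```

Exact rational arithmetic is used so that the dependence on $(M_{1},M_{2})$ can be read off without rounding.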

Proof of Theorem 32.

When $V(X_{1}|X_{2})=0$ and $V(X_{2}|X_{1}),V(X_{1},X_{2})>0$ , $X_{1}$ is uniformly distributed over $\mathcal{X}_{1}$ , which implies that $\mathcal{X}_{1}$ is finite and $H(X_{1})=\log|\mathcal{X}_{1}|$ . In contrast, $X_{2}$ is not uniform over $\mathcal{X}_{2}$ . Moreover, $X_{1}$ and $X_{2}$ are independent. The MASC problem in this case can also be resolved via independent point-to-point source coding for each of the two sources. The optimal code with blocklength $n$ and code sizes $(M_{1},M_{2})$ encodes $M_{1}$ arbitrary symbols in $\mathcal{X}_{1}^{n}$ and a cardinality- $M_{2}$ subset of $\mathcal{X}_{2}^{n}$ that has the largest probability with respect to $P_{X_{2}^{n}}$ . As a result, for any $0<\epsilon<1$ , there exists an $(M_{1},M_{2},\epsilon)$ MASC if and only if

[TABLE]

where $\delta=1-\min\left\{1,\,\frac{M_{1}}{|\mathcal{X}_{1}|^{n}}\right\}$ is the total marginal probability of symbols that are not encoded in $\mathcal{X}_{1}^{n}$ , and $\delta^{\prime}$ is the total marginal probability (with respect to $P_{X_{2}^{n}}$ ) of the encoded symbols in $\mathcal{X}_{2}^{n}$ . Eq. (G.72) implicitly requires $\delta\in[0,\epsilon]$ and $\delta^{\prime}\in[1-\epsilon,1]$ .

$\bullet$ For $\delta=0$ , we have

[TABLE]

In this case, (G.72) gives

[TABLE]

We can apply the point-to-point almost-lossless source coding results from Theorem 1 to obtain

[TABLE]

$\bullet$ For $0<\delta\leq\epsilon$ , we have

[TABLE]

In this case, (G.72) gives

[TABLE]

We can also apply the point-to-point results to get

[TABLE]

where for any fixed $\delta$ , $\xi_{\rm out}(\epsilon,\delta,n)=O\left(\frac{1}{n}\right)$ and $\xi_{\rm in}(\epsilon,\delta,n)=O\left(\frac{1}{n}\right)$ ; for any fixed $n$ , both $\xi_{\rm out}(\epsilon,\delta,n)$ and $\xi_{\rm in}(\epsilon,\delta,n)$ blow up as $\delta$ approaches $\epsilon$ (see Theorem 1 for the case where $\epsilon$ approaches [math]). ∎
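The code construction described in this proof can be evaluated exactly for a small binary example. The sketch below assumes $X_{2}$ is Bernoulli($p$) with $p\leq 1/2$, so that sequences with fewer ones are more probable, and assumes the compound error probability is $1-(1-\delta)\delta^{\prime}$. Both are illustrative assumptions consistent with the definitions of $\delta$ and $\delta^{\prime}$ above rather than a restatement of (G.72), and the function names are hypothetical.

```python
from math import comb


def top_mass_bernoulli(p, n, m):
    """Total probability of the m most probable length-n Bernoulli(p) sequences.

    Requires p <= 1/2, so probability is non-increasing in the number of ones.
    """
    mass, remaining = 0.0, m
    for ones in range(n + 1):
        if remaining <= 0:
            break
        take = min(comb(n, ones), remaining)
        mass += take * p ** ones * (1.0 - p) ** (n - ones)
        remaining -= take
    return mass


def compound_error(n, m1, size1, p, m2):
    """1 - (1 - delta) * delta', with X1 uniform on an alphabet of size size1."""
    one_minus_delta = min(1.0, m1 / size1 ** n)
    delta_prime = top_mass_bernoulli(p, n, m2)
    return 1.0 - one_minus_delta * delta_prime


# Hypothetical example: n = 2, binary X1, X2 ~ Bernoulli(1/4).
print(top_mass_bernoulli(0.25, 2, 3))
print(compound_error(2, 4, 2, 0.25, 3))
```

Reducing $M_{1}$ below $|\mathcal{X}_{1}|^{n}$ makes $\delta$ positive and forces a larger $M_{2}$ to reach the same error probability, which is the trade-off that produces the curved boundary of the rate region.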

Appendix H Proof of Corollary 21

When $X_{1}$ and $X_{2}$ are dependent, our choice of $\mathbf{R}=(R_{1},R_{2})$ in (197)–(199) implies that

[TABLE]

Define

[TABLE]

We have

[TABLE]

Let $\mathbf{Z}\triangleq(Z_{1},Z_{2},Z_{3})\sim\mathcal{N}(\mathbf{0},\mathsf{V})$ be a multivariate Gaussian in $\mathbb{R}^{3}$ , where $\mathsf{V}$ is the entropy dispersion matrix (see Definition 8). Then

[TABLE]

where (H.7) holds by the union bound. It follows that

[TABLE]

where (H.10) applies the Chernoff bound of the Q-function, and (H.11) holds since $a_{1}\geq\delta_{2}>0$ . Similarly,

[TABLE]

In contrast,

[TABLE]

Plugging (H.11)–(H.13) into (H.7), we conclude that for all $n$ sufficiently large such that

[TABLE]

the bound

[TABLE]

holds. Therefore, $\sqrt{n}\mathbf{a}\in\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)$ , and hence $\overline{\mathbf{R}}\in\overline{\mathscr{R}}^{*}(n,\epsilon)$ (142).

Recall vector $\mathbf{a}$ defined in (H.3). With $R_{1}=H(X_{1})$ ,

[TABLE]

Note that

[TABLE]

Since $H(X_{1})-H(X_{1}|X_{2})>0$ , $\mathbb{P}\left[Z_{1}>a_{1}\sqrt{n}\right]$ decays exponentially in $n$ . Therefore, by the definition of $r^{*}$ in (202) and a first-order multivariate Taylor bound, $G>0$ in (200) can be chosen so that the right side of (H.19) is equal to $1-\epsilon$ , which implies that $\overline{\mathbf{R}}\in\overline{\mathscr{R}}^{*}(n,\epsilon)$ (142).

Conversely, for any $R_{2}$ such that $\overline{\mathbf{R}}\in\overline{\mathscr{R}}^{*}(n,\epsilon)$ ,

[TABLE]

which further implies

[TABLE]

Thus, by the definition of $r^{*}$ ,

[TABLE]

which is equivalent to (201). ∎
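The exponential decay invoked in this proof can be checked numerically. The sketch below uses one common form of the Chernoff bound on the Gaussian tail, $Q(x)\leq e^{-x^{2}/2}$ for $x\geq 0$, which may differ by a constant factor from the form applied in (H.10). The gap $a$ and the function names are hypothetical.

```python
from math import erfc, exp, sqrt


def q_function(x):
    """Gaussian tail probability Q(x) = P[N(0,1) > x]."""
    return 0.5 * erfc(x / sqrt(2.0))


def chernoff_bound(x):
    """Chernoff upper bound exp(-x^2 / 2) on Q(x), valid for x >= 0."""
    return exp(-x * x / 2.0)


def tail_probability(a, n):
    """P[Z > a * sqrt(n)] for a standard Gaussian Z and a fixed gap a > 0."""
    return q_function(a * sqrt(n))


# Hypothetical gap a = 0.1 between the rate and the entropy bound.
for n in (100, 400, 1600):
    print(n, tail_probability(0.1, n), chernoff_bound(0.1 * sqrt(n)))
```

For a fixed positive gap, the tail probability is dominated by $e^{-na^{2}/2}$ and is therefore negligible relative to the $O(1/\sqrt{n})$ terms tracked in the expansion.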

Appendix I Proof of Corollary 22

Fix any $\lambda\in[0,1]$ . Define

[TABLE]

By the assumption that $X_{1}$ and $X_{2}$ are independent, we have

[TABLE]

Denote

[TABLE]

Let $\mathbf{Z}\triangleq(Z_{1},Z_{2},Z_{3})\sim\mathcal{N}(\mathbf{0},\mathsf{V})$ be a multivariate Gaussian in $\mathbb{R}^{3}$ , where $\mathsf{V}$ is the entropy dispersion matrix of the independent sources $X_{1}$ and $X_{2}$ . It follows in this case that $Z_{1}$ and $Z_{2}$ are independent and $Z_{3}=Z_{1}+Z_{2}$ . We then have

[TABLE]

Thus, for any $r_{1}$ , $r_{2}$ such that

[TABLE]

$\mathbf{a}\in\frac{\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)}{\sqrt{n}}$ and hence

[TABLE]

Therefore,

[TABLE]

On the other hand, for any $r_{1}$ , $r_{2}$ such that

[TABLE]

$\mathbf{a}\not\in\frac{\mathscr{Q}_{\rm inv}(\mathsf{V},\epsilon)}{\sqrt{n}}$ and hence $\overline{\mathbf{R}}\notin\overline{\mathscr{R}}^{*}(n,\epsilon)$ (142). Thus, (I) holds with equality.

Appendix J Proof of Theorem 23

The proof employs an extension of Han’s MASC converse [9, Lemma 7.2.2].

Given an $(L,M_{1},M_{2},\epsilon)$ CF-MASC $(\mathsf{L},\mathsf{f}_{1},\mathsf{f}_{2},\mathsf{g})$ , let

[TABLE]

Then $\mathbb{P}[\mathcal{S}^{c}]$ equals the code’s error probability, and

[TABLE]

where the bound on $|\mathcal{S}|$ is the number of distinct decoder inputs and the bounds on $|\mathcal{S}_{1}(x_{2})|$ and $|\mathcal{S}_{2}(x_{1})|$ are the number of distinct decoder inputs under fixed values of $x_{2}$ and $x_{1}$ and an $\ell$ -bit CF. Fix $\gamma>0$ . Define sets

[TABLE]

Then,

[TABLE]

where (J) follows the definition of $\mathcal{U}_{1}$ , (J.8) applies $1\{Z\leq z\}\leq z$ , and (J.9) holds by (J.2). Similarly,

[TABLE]

Thus,

[TABLE]

Rearranging (J.13) gives a lower bound on the error probability $\epsilon=\mathbb{P}\left[\mathcal{S}^{c}\right]$ . Thus, any $(L,M_{1},M_{2},\epsilon)$ CF-MASC must satisfy

[TABLE]

Particularizing (J.14) to stationary, memoryless sources with single-letter distribution $P_{X_{1}X_{2}}$ satisfying (139) and (140) shows that any $(n,L,M_{1},M_{2},\epsilon)$ CF-MASC must satisfy

[TABLE]

where $\gamma>0$ is an arbitrary constant, $I_{1}$, $I_{2}$, and $I_{12}$ are defined in (146), (147), and (148), and $\mathbf{U}_{i}$ is defined in (183). Let $L$ be a finite constant that does not grow with $n$ and let $\gamma=\frac{\log n}{2}-\log L$. Applying Lemma 12 and (83) of Lemma 13 to bound the probability in (J.16) in a manner similar to (182)–(187), we conclude that any $(n,\ell,\epsilon)$-achievable rate pair $(R_{1},R_{2})$ must be in $\mathscr{R}_{\rm out}^{*}(n,\epsilon)$ (144). ∎

Remark 16.

One could also prove Theorem 23 by extending our HT converse (Theorem 19) to the setting with a cooperation facilitator. Our Theorem 19 continues to hold with $M_{1}$ and $M_{2}$ replaced by $LM_{1}$ and $LM_{2}$ in (126) and (127), respectively ((128) remains unchanged).

Acknowledgment

The authors would like to thank the anonymous reviewer for the especially careful review that is reflected in the final version.

Bibliography

[1] S. Chen, M. Effros, and V. Kostina, “Lossless source coding in the point-to-point, multiple access, and random access scenarios,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), Jul. 2019, pp. 1692–1696.
[2] SPECTRE: Short packet communication toolbox. [Online]. Available: https://github.com/yp-mit/spectre/tree/master/lossless-sc
[3] C. E. Shannon, “A mathematical theory of communication,” The Bell System Technical Journal, vol. 27, no. 2, pp. 379–423 and 623–656, July and Oct. 1948.
[4] V. Strassen, “Asymptotische Abschätzungen in Shannons Informationstheorie,” in Proc. Trans. Third Prague Conf. Inf. Theory, Statist., Decision Funct., Random Process., 1964, pp. 689–723.
[5] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
[6] I. Kontoyiannis and S. Verdú, “Optimal lossless data compression: Non-asymptotics and asymptotics,” IEEE Trans. Inf. Theory, vol. 60, no. 2, pp. 777–795, Feb. 2014.
[7] V. Kostina, Y. Polyanskiy, and S. Verdú, “Variable-length compression allowing errors,” IEEE Trans. Inf. Theory, vol. 61, no. 8, pp. 4316–4330, Aug. 2015.
[8] A. A. Yushkevich, “On limit theorems connected with the concept of entropy of Markov chains,” Uspekhi Mat. Nauk, vol. 8, no. 5(57), pp. 177–180, 1953.
