Benefits of Cache Assignment on Degraded Broadcast Channels

Shirin Saeedi Bidokhti, Michèle Wigger, and Aylin Yener

arXiv:1702.08044·cs.IT·February 28, 2017


TL;DR

This paper investigates how cache memory assignment affects the capacity of degraded broadcast channels, deriving bounds and optimal strategies for different cache size regimes, showing significant gains from non-uniform cache allocation.

Contribution

It introduces new coding schemes for cache assignment, derives capacity bounds, and characterizes optimal cache allocation strategies for degraded broadcast channels.

Findings

01. Optimal cache assignment depends on cache size regime.

02. Non-uniform cache allocation outperforms uniform in most regimes.

03. Global caching gain is achievable with small cache sizes.


Equations

$$w_{1}\,\bar{\oplus}\,w_{2}$$

$$D\geq K.$$

$$g_{k}\colon\{1,\ldots,\lfloor 2^{nR}\rfloor\}^{D}\to\big\{1,\ldots,\lfloor 2^{n\mathsf{M}_{k}}\rfloor\big\},\qquad k\in\mathcal{K},$$

$$\mathbb{V}_{k}:=g_{k}(W_{1},\ldots,W_{D})$$

$$\Gamma(y_{1},\ldots,y_{K}|x),\qquad\textnormal{for } x\in\mathcal{X},\ y_{1}\in\mathcal{Y}_{1},\ldots,y_{K}\in\mathcal{Y}_{K}$$

$$\Gamma(y_{1},\ldots,y_{K}|x)=\Gamma_{K}(y_{K}|x)\cdot\Gamma_{K-1}(y_{K-1}|y_{K})\cdots\Gamma_{1}(y_{1}|y_{2}).$$

$$\mathbf{d}:=(d_{1},\ldots,d_{K}).$$

$$X^{n}=f_{\mathbf{d}}(W_{1},\ldots,W_{D})$$

$$\hat{W}_{k}:=\varphi_{k,\mathbf{d}}(Y_{k}^{n},\mathbb{V}_{k}),$$

$$\varphi_{k,\mathbf{d}}\colon\mathcal{Y}_{k}^{n}\times\big\{1,\ldots,\lfloor 2^{n\mathsf{M}_{k}}\rfloor\big\}\to\{1,\ldots,\lfloor 2^{nR}\rfloor\}.$$

$$\mathsf{P}_{\text{e}}:=\mathbb{P}\bigg[\ \bigcup_{\mathbf{d}\in\mathcal{D}^{K}}\bigcup_{k=1}^{K}\big\{\hat{W}_{k}\neq W_{d_{k}}\big\}\ \bigg].$$

$$\mathsf{C}(\mathsf{M}_{1},\ldots,\mathsf{M}_{K}):=\sup\big\{R\colon(R,\mathsf{M}_{1},\ldots,\mathsf{M}_{K})\textnormal{ achievable}\big\}.$$

$$\sum_{k=1}^{K}\mathsf{M}_{k}\leq\mathsf{M}.$$

$$\mathsf{C}^{\star}(\mathsf{M}):=\max_{\substack{\mathsf{M}_{1},\ldots,\mathsf{M}_{K}>0\colon\\ \sum_{k=1}^{K}\mathsf{M}_{k}\leq\mathsf{M}}}\mathsf{C}(\mathsf{M}_{1},\ldots,\mathsf{M}_{K}).$$

$$\begin{aligned}&R\ \textnormal{ achievable with }(\mathsf{M}_{1},\ldots,\mathsf{M}_{K})\\ &\quad\Longleftrightarrow\quad\rho=\frac{1}{R}\ \textnormal{ achievable with }\bigg(m_{1}=\frac{\mathsf{M}_{1}}{R},\ldots,m_{K}=\frac{\mathsf{M}_{K}}{R}\bigg)\ \textnormal{ under the “source-coding perspective”.}\end{aligned}$$

$$\mathsf{M}_{1}=\cdots=\mathsf{M}_{K}=0,$$

$$\mathsf{C}(\mathsf{M}_{1}=0,\ldots,\mathsf{M}_{K}=0)=\mathsf{C}_{\mathcal{K}}$$

$$\mathsf{C}_{\mathcal{K}}$$

$$U_{1}-U_{2}-\cdots-U_{K-1}-X-(Y_{1},\ldots,Y_{K})$$

$$P_{Y_{1}\cdots Y_{K}|X}(y_{1},\ldots,y_{K}|x)=\Gamma(y_{1},\ldots,y_{K}|x).$$

$$\mathcal{S}:=\{j_{1},\ldots,j_{|\mathcal{S}|}\}\subseteq\mathcal{K},\qquad j_{1}<\cdots<j_{|\mathcal{S}|}.$$

$$U_{1}-U_{2}-\cdots-U_{|\mathcal{S}|-1}-X-\big(Y_{j_{1}},\ldots,Y_{j_{|\mathcal{S}|}}\big),$$

$$R_{1}$$

$$R_{k}$$

$$R_{|\mathcal{S}|}$$

$$\mathsf{C}_{\mathcal{S}}:=\max_{R\geq 0}\big\{R\colon(R,\ldots,R)\in\mathbf{C}_{\mathcal{S}}\big\}.$$

$$\mathsf{C}_{\mathcal{S}}$$

$$\mathsf{C}(\mathsf{M}_{1}+\Delta,\ldots,\mathsf{M}_{K}+\Delta)\geq\mathsf{C}(\mathsf{M}_{1},\ldots,\mathsf{M}_{K})+\frac{\Delta}{D}.$$

$$\mathsf{C}^{\star}(\mathsf{M}+\Delta_{\textnormal{total}})\geq\mathsf{C}^{\star}(\mathsf{M})+\frac{\Delta_{\textnormal{total}}}{K\cdot D}.$$

$$I(U_{1}^{\star};Y_{1})<I(U_{1}^{\star};Y_{k}),\qquad k\in\{2,\ldots,K\},$$


Full text

Benefits of Cache Assignment on Degraded Broadcast Channels

Shirin Saeedi Bidokhti, Michèle Wigger, and Aylin Yener

S. Saeedi Bidokhti is with the Department of Electrical Engineering at Stanford University, [email protected]. S. Saeedi Bidokhti is supported by the Swiss National Science Foundation fellowship no. 158487. M. Wigger is with LTCI, Telecom ParisTech, Université Paris-Saclay, 75013 Paris, [email protected]. A. Yener is with the Department of Electrical Engineering, School of Electrical Engineering and Computer Science at The Pennsylvania State University and the Department of Electrical Engineering at Stanford University, [email protected], [email protected]. Parts of the material in this paper have been submitted to the IEEE International Conference on Communications, Paris, May 2017, and to the IEEE International Symposium on Information Theory, Aachen, Germany, June 2017.

Abstract

Degraded $K$-user broadcast channels (BC) are studied when receivers are equipped with cache memories. Lower and upper bounds are derived on the capacity-memory tradeoff, i.e., on the largest rate of reliable communication over the BC as a function of the receivers’ cache sizes, and the bounds are shown to match for some special cases. The lower bounds are achieved by two new coding schemes that benefit from non-uniform cache assignment. Lower and upper bounds are also established on the global capacity-memory tradeoff, i.e., on the largest capacity-memory tradeoff that can be attained by optimizing the receivers’ cache sizes subject to a total cache memory budget. The bounds coincide when the total cache memory budget is sufficiently small or sufficiently large, characterized in terms of the BC statistics. For small cache memories, it is optimal to assign all the cache memory to the weakest receiver. In this regime, the global capacity-memory tradeoff grows as the total cache memory budget divided by the number of files in the system. In other words, a perfect global caching gain is achievable in this regime and the performance corresponds to a system where all cache contents in the network are available to all receivers. For large cache memories, it is optimal to assign a positive cache memory to every receiver such that the weaker receivers are assigned larger cache memories compared to the stronger receivers. In this regime, the growth rate of the global capacity-memory tradeoff is further divided by the number of users, which corresponds to a local caching gain. Numerical results suggest that a uniform cache assignment of the total cache memory is suboptimal in all regimes unless the BC is completely symmetric. For erasure BCs, this claim is proved analytically in the regime of small cache sizes.

I Introduction

Storing popular contents at or close to the end users improves the network performance during peak-traffic time. The main challenge is that the contents have to be cached before knowing which files the users will request in the peak-traffic period. A conventional approach is to store the same popular contents in the cache memories of the users. This allows the receivers to locally retrieve the contents without burdening the network. However, further caching gains, the so-called global caching gains, are possible if different contents are stored at different users [1]. Specifically, a careful design of the cache contents creates coding opportunities to simultaneously serve multiple users during the peak-traffic periods, henceforth called the delivery phase.

In this paper, we focus on the scenario depicted in Figure 1. A transmitter communicates with receivers $1,\ldots,K$, each of which is equipped with a cache memory. The delivery-phase communication takes place over a noisy broadcast channel (BC).

The BC-model has previously been studied in [25, 27, 26, 28, 29, 30, 31, 36, 34, 35, 33, 32, 38, 37, 39, 40, 41, 42]. The simplified version where the BC is a common noise-free bit-pipe to all users was analyzed in [1, 2, 3, 4, 5, 6, 13, 8, 7, 12, 14, 9, 10, 11, 15, 16, 21, 22, 23, 24] under the assumption that all receivers have equal cache sizes, and in [15, 16] under the assumption that various receivers have different cache sizes. Caching has also been studied for many other scenarios, e.g., for interference networks [44, 45, 46], hierarchical networks [54, 56, 55], and cellular networks [47, 48, 46, 49, 50, 51, 52, 53].

In [25, 26, 27] the gains of caching in noisy broadcast networks are investigated. Specifically, we have proposed a joint cache-channel coding scheme and focused on erasure BCs with two sets of receivers: a set of cache-aided weak receivers (where each channel has the same erasure probability) and a set of strong receivers without cache memories (where each channel has the same erasure probability). Previous works have adopted a separate cache-channel coding architecture where the encoders (resp. decoders) consist of a cache encoder (resp. decoder) that only exploits the cache contents and a channel encoder (resp. decoder) that only exploits the channel statistics; see Figure 2. By contrast, in a joint cache-channel coding scheme, the encoders and decoders simultaneously exploit the knowledge of the channel statistics and the cache contents, leading to improved performance.

The joint cache-channel coding scheme in [25, 26, 27] loads (piggybacks) the information that is intended for the strong receivers, but is already cached at the weaker receivers, onto the information that is communicated to the weak receivers. (Footnote 1: The proposed piggyback coding can be seen as a simplified version, without binning etc., of “Slepian-Wolf coding over broadcast channels” in [58], which applies to more general scenarios.) When the rate of the piggybacked information is modest, this can be done without harming the decoding performance at the strong receivers. In some sense, piggyback coding provides the stronger receivers virtual access to the weaker receivers’ cache memories as if these cache contents were locally present at the stronger receivers.

The previous works [25, 26, 27] have shown that when different receivers have different channel statistics, assigning larger cache memories to the weaker receivers significantly improves the performance compared to the traditional uniform cache assignment. In addition to mitigating the rate bottleneck at the weaker receivers, non-uniform cache assignment makes it possible to achieve new global caching gains by means of joint cache-channel coding [25, 26, 27, 32].

Motivated by the new gains of caching in noisy broadcast networks, in this work, we address the problem of efficient cache assignment in broadcast networks and devise two joint cache-channel coding schemes by using piggyback coding, superposition coding, and coded caching.

I-A Main Contributions and Implications

The main contributions of the paper are as follows:

• Superposition-Piggyback Coding: We generalize the piggyback-coding scheme of [26], which is specific to erasure BCs, to arbitrary BCs with a cache memory only at the weakest receiver, and account for different channel qualities in the network by employing superposition coding. We show that this scheme is optimal for small cache memory sizes.

• Generalized Coded-Caching: The coded-caching scheme in [1] is generalized to noisy BCs with unequal cache sizes. The scheme is optimal for a particular cache assignment.

• A New Converse Result: A general converse result is provided for degraded BCs with arbitrary cache sizes at the receivers. It strictly improves over the existing converse results for degraded BCs in [26, 28] and for the noise-free bit-pipe model in [1, 10, 6, 19, 9].

• Global Capacity-Memory Tradeoff: Lower and upper bounds are derived on the global capacity-memory tradeoff. They are shown to match when the total available cache memory is small or large. Suboptimality of the popular approach of assigning equal cache memory to all receivers is proved analytically for erasure BCs in the small cache size regime and shown numerically for erasure and Gaussian BCs in all regimes of cache sizes.

More specifically, we first propose a coding scheme that we call superposition piggyback-coding by assuming that only the weakest receiver has a cache memory. Using this scheme all receivers gain virtual access to the weakest receiver’s cache memory as if the cache contents were locally present at each of these receivers.

The second scheme generalizes the coded caching in [1] to account for different channel statistics and different cache sizes at the receivers. We assign larger cache sizes to the weaker receivers and use piggyback coding to transmit higher rates of information to the stronger receivers without harming the communication to the weaker receivers. As a consequence, the amount of the virtual cache memory that is provided to the stronger receivers increases compared to the original coded-caching scheme, resulting in an improved performance.

The performance criterion of interest in this paper is the capacity-memory tradeoff, i.e., the largest rate, as a function of the available cache memories, at which the transmitter can reliably send the messages demanded by the receivers over the noisy BC.

We present a new upper bound on the capacity-memory tradeoff of degraded BCs that improves the previous upper bound in [28, 26]. (Footnote 2: Since for our purposes only the conditional marginal distributions matter, it suffices that the BC is stochastically degraded [61].) Using the upper bound, we show the optimality of the superposition piggyback-coding scheme when only the weakest receiver has a cache memory and its size is below a certain threshold that depends on the BC statistics. Moreover, we show that the generalized coded-caching scheme is optimal for a particular cache assignment.

When the BC is a noise-free bit-pipe, the upper bound on the capacity-memory tradeoff leads to a lower bound on the delivery rate-memory tradeoff that improves the previous lower bounds in [1, 10, 6, 19, 9].

The upper bound is asymmetric in the cache sizes: the cache memories at the weaker receivers increase the upper bound more than the cache memories at the stronger receivers. In this sense, the upper bound reinforces the intuition obtained from the lower bounds that the capacity-memory tradeoff increases when larger cache memories are assigned to the weaker receivers as compared to the stronger receivers. To make this statement more precise, we derive upper and lower bounds on the global capacity-memory tradeoff, where one is allowed to optimize over the cache assignment subject to a global cache constraint. The lower bound is obtained using the following cache-assignment strategy and coding schemes:

• For a small total cache-size $\mathsf{M}$, all of it is assigned to the weakest receiver, and superposition piggyback-coding is applied. This strategy is optimal in the small total cache-size regime and achieves a global capacity-memory tradeoff that grows as $\frac{\mathsf{M}}{D}$, where $D$ denotes the total number of files. Thus, in this regime a perfect global caching gain is achieved, i.e., the same performance as in a system where all cache memories in the network are accessible by all the receivers.

• For moderate total cache-size $\mathsf{M}$, generalized coded-caching with parameters $t=1,\ldots,K-1$ and the corresponding cache assignments are employed. The larger the total cache-size, the larger the parameter $t$ needs to be chosen. However, the larger $t$, the smaller the global caching gain, since with increasing $t$ the overlap of the different cache contents increases as well, and duplicated cache contents cannot provide global caching gain.

• When the total cache-size $\mathsf{M}$ equals the total cache memory of generalized coded-caching with parameter $t=K-1$, then generalized coded-caching is optimal. For total cache memories exceeding this threshold, it is optimal to uniformly assign the additional cache memory across the $K$ receivers. This additional cache memory can only bring local caching gain and the same content can be stored at all the $K$ receivers. In other words, for total cache memory exceeding a threshold, the global capacity-memory tradeoff grows as $\frac{1}{K}\cdot\frac{\mathsf{M}}{D}$.

Finally, this paper proves analytically that for erasure BCs a uniform cache allocation is strictly suboptimal in the regime of small cache memories, unless all receivers have the same channel statistics. Numerical simulations show that the same holds for all regimes of cache memory and also for Gaussian BCs.

I-B Notation

Random variables are denoted by uppercase letters, e.g., $A$, their alphabets by matching calligraphic font, e.g., $\mathcal{A}$, and elements of an alphabet by lowercase letters, e.g., $a\in\mathcal{A}$. We also use uppercase letters for deterministic quantities like rate $R$, capacity $\mathsf{C}$, number of users $K$, cache size $\mathsf{M}$, and number of files in the library $D$. Vectors are identified by bold font symbols, e.g., $\mathbf{a}$, and matrices by the font $\mathsf{A}$. We use the shorthand notation $A^{n}$ for $(A_{1},\ldots,A_{n})$. The Cartesian product of $\mathcal{A}$ and $\mathcal{A}^{\prime}$ is $\mathcal{A}\times\mathcal{A}^{\prime}$, and the $n$-fold Cartesian product of $\mathcal{A}$ is $\mathcal{A}^{n}$. $|\mathcal{A}|$ denotes the cardinality of $\mathcal{A}$.

Finally, for indices $w_{1}$ and $w_{2}$ taking value in $\big{\{}1,\ldots,\lfloor 2^{\ell_{1}}\rfloor\big{\}}$ and $\{1,\ldots,\lfloor 2^{\ell_{2}}\rfloor\}$ , respectively, we denote by

$$w_{1}\,\bar{\oplus}\,w_{2}$$

the index in $\{1,\ldots,\lfloor 2^{\ell_{\max}}\rfloor\}$ that corresponds to the XOR of the length- $\ell_{\max}$ binary representations of $w_{1}$ and $w_{2}$ , where $\ell_{\max}:=\max\{\ell_{1},\ell_{2}\}$ .
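As a small illustration of this notation, the following Python sketch computes the XOR index. It assumes integer lengths $\ell_{1},\ell_{2}$ and that index $w$ is represented by the binary expansion of $w-1$; the function name `xor_indices` and this index-to-bits convention are illustrative assumptions, not taken from the paper.

```python
def xor_indices(w1: int, l1: int, w2: int, l2: int) -> int:
    """XOR two 1-based indices through their length-l_max binary representations.

    Assumption (not from the paper): index w is represented by the
    binary expansion of w - 1, zero-padded to l_max bits.
    """
    if not (1 <= w1 <= 2 ** l1 and 1 <= w2 <= 2 ** l2):
        raise ValueError("index out of range for the given length")
    l_max = max(l1, l2)
    # Zero-padding to l_max bits does not change the integer value,
    # so a plain integer XOR implements the bitwise operation.
    result = ((w1 - 1) ^ (w2 - 1)) + 1
    assert 1 <= result <= 2 ** l_max
    return result


# XOR-ing twice with the same index recovers the other index.
combined = xor_indices(3, 2, 6, 3)
recovered = xor_indices(combined, 3, 6, 3)
```

The self-inverse property shown in the last two lines is what the delivery schemes later rely on when a receiver removes a cached part from an XOR message.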

We will be using the abbreviation i.i.d. for independent and identically distributed.

I-C Outline

The remainder of the paper is organized as follows. Section II describes the problem setup. Section III recalls known results for the scenario without cache memories. The main results of this paper are described in Sections IV and V, followed by applications of these results to erasure and Gaussian BCs in Section VI. Section VII concludes the paper with a summary, and various technical appendices contain the proofs of the results in Sections V and VI.

II Problem Definition

Consider a transmitter and receivers $1,\ldots,K$ . The transmitter has access to a library with $D$ independent messages, $W_{1},\ldots,W_{D}$ , each distributed uniformly over the set $\big{\{}1,\ldots,\lfloor 2^{nR}\rfloor\big{\}}.$ Here, $R\geq 0$ denotes the rate of transmission and $n$ is the transmission blocklength. We assume that there are more messages than receivers:

$$D\geq K.$$

Each receiver $k\in\mathcal{K}:=\{1,\ldots,K\}$ is equipped with a cache of size $\mathsf{M}_{k}\geq 0$ . Communication takes place in two phases. For the first, i.e., the placement phase, the transmitter chooses caching functions

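$$g_{k}\colon\{1,\ldots,\lfloor 2^{nR}\rfloor\}^{D}\to\big\{1,\ldots,\lfloor 2^{n\mathsf{M}_{k}}\rfloor\big\},\qquad k\in\mathcal{K},$$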

and places

$$\mathbb{V}_{k}:=g_{k}(W_{1},\ldots,W_{D})$$

in receiver $k$’s cache. This phase takes place in a noiseless fashion. (Footnote 3: Following previous works on caching systems, we also assume that the placement phase takes place in low-traffic hours with an abundance of bandwidth resources, and can be considered noiseless.)

The subsequent delivery phase takes place over a degraded BC [59] with finite input alphabet $\mathcal{X}$, finite output alphabets $\mathcal{Y}_{1},\ldots,\mathcal{Y}_{K}$ (Footnote 4: The results of this paper readily extend to continuous alphabets. We will consider Gaussian BCs in Section VI-C.), and channel transition law

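$$\Gamma(y_{1},\ldots,y_{K}|x),\qquad\textnormal{for } x\in\mathcal{X},\ y_{1}\in\mathcal{Y}_{1},\ldots,y_{K}\in\mathcal{Y}_{K},$$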

which decomposes as

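$$\Gamma(y_{1},\ldots,y_{K}|x)=\Gamma_{K}(y_{K}|x)\cdot\Gamma_{K-1}(y_{K-1}|y_{K})\cdots\Gamma_{1}(y_{1}|y_{2}).$$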

Without loss of generality, we order the receivers from the weakest to the strongest.
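The cascade structure of a degraded BC can be made concrete with transition matrices. The sketch below assumes finite alphabets and uses a cascade of binary symmetric channels purely as a hypothetical example; it computes each receiver's marginal law $P(y_{k}|x)$ by multiplying the stage matrices, starting from the strongest receiver $K$. All helper names are illustrative.

```python
def mat_mul(a, b):
    """Multiply two row-stochastic matrices given as nested lists."""
    return [
        [sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def bsc(p):
    """Transition matrix of a binary symmetric channel with crossover p."""
    return [[1.0 - p, p], [p, 1.0 - p]]


def marginal_laws(stages):
    """Marginal laws of a physically degraded BC.

    stages[0] is the law from x to y_K (strongest receiver), stages[1] the
    law from y_K to y_{K-1}, and so on down to y_1 (weakest receiver).
    Returns [P(y_K|x), P(y_{K-1}|x), ..., P(y_1|x)].
    """
    marginals = [stages[0]]
    for stage in stages[1:]:
        marginals.append(mat_mul(marginals[-1], stage))
    return marginals


# Hypothetical K = 2 example: receiver 2 sees BSC(0.1),
# receiver 1 sees a further BSC(0.2) applied to receiver 2's output.
laws = marginal_laws([bsc(0.1), bsc(0.2)])
weak_crossover = laws[1][0][1]
```

In this toy example the weaker receiver's effective crossover probability is $0.1\cdot 0.8+0.9\cdot 0.2=0.26$, which shows how each additional stage can only make the channel worse.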

At the beginning of the delivery phase, each receiver $k$ demands message $W_{d_{k}}$, $d_{k}\in\mathcal{D}:=\{1,\ldots,D\}$. The transmitter and all the receivers are informed of the demand vector

$$\mathbf{d}:=(d_{1},\ldots,d_{K}).$$

Using this information, the transmitter forms the channel input sequence $X^{n}=(X_{1},\ldots,X_{n})$ as

$$X^{n}=f_{\mathbf{d}}(W_{1},\ldots,W_{D})$$

for some encoding function $f_{\mathbf{d}}:\{1,\ldots,\lfloor 2^{nR}\rfloor\}^{D}\to\mathcal{X}^{n}.$

Receiver $k\in\mathcal{K}$ observes the channel output sequence $Y_{k}^{n}:=(Y_{k,1},\ldots,Y_{k,n})$. Given the demand vector $\mathbf{d}$, cache content $\mathbb{V}_{k}$, and channel outputs $Y_{k}^{n}$, it produces its estimate of the desired message $W_{d_{k}}$,

$$\hat{W}_{k}:=\varphi_{k,\mathbf{d}}(Y_{k}^{n},\mathbb{V}_{k}),$$

by means of a decoding function

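$$\varphi_{k,\mathbf{d}}\colon\mathcal{Y}_{k}^{n}\times\big\{1,\ldots,\lfloor 2^{n\mathsf{M}_{k}}\rfloor\big\}\to\{1,\ldots,\lfloor 2^{nR}\rfloor\}.$$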

The worst-case probability of error at any receiver and any demand $\mathbf{d}$ is given by

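$$\mathsf{P}_{\text{e}}:=\mathbb{P}\bigg[\ \bigcup_{\mathbf{d}\in\mathcal{D}^{K}}\bigcup_{k=1}^{K}\big\{\hat{W}_{k}\neq W_{d_{k}}\big\}\ \bigg]. \tag{9}$$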

A rate-memory tuple $(R,\mathsf{M}_{1},\ldots,\mathsf{M}_{K})$ is achievable if for any $\epsilon>0$ there exists a sufficiently large blocklength $n$ and caching, encoding, and decoding functions as in (3), (6), and (7) so that ${\mathsf{P}_{\text{e}}}\leq\epsilon$ .

Definition 1

The capacity-memory tradeoff $\mathsf{C}(\mathsf{M}_{1},\ldots,\mathsf{M}_{K})$ is the largest rate $R$ for which the rate-memory tuple $(R,\mathsf{M}_{1},\ldots,\mathsf{M}_{K})$ is achievable:

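$$\mathsf{C}(\mathsf{M}_{1},\ldots,\mathsf{M}_{K}):=\sup\big\{R\colon(R,\mathsf{M}_{1},\ldots,\mathsf{M}_{K})\textnormal{ achievable}\big\}.$$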

Our main goal in this paper is to optimize the cache assignment $(\mathsf{M}_{1},\ldots,\mathsf{M}_{K})$ to attain the largest capacity-memory tradeoff $\mathsf{C}(\mathsf{M}_{1},\ldots,\mathsf{M}_{K})$ under the total cache constraint:

$$\sum_{k=1}^{K}\mathsf{M}_{k}\leq\mathsf{M}.$$

Definition 2

The global capacity-memory tradeoff $\mathsf{C}^{\star}(\mathsf{M})$ is defined as:

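$$\mathsf{C}^{\star}(\mathsf{M}):=\max_{\substack{\mathsf{M}_{1},\ldots,\mathsf{M}_{K}>0\colon\\ \sum_{k=1}^{K}\mathsf{M}_{k}\leq\mathsf{M}}}\mathsf{C}(\mathsf{M}_{1},\ldots,\mathsf{M}_{K}).$$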

Remark 1

The global capacity-memory tradeoff depends on the BC law $\Gamma(y_{1},\ldots,y_{K}|x)$ only through its marginal conditional laws. All our results thus also apply to stochastically degraded BCs.

II-A Minimum Delivery Rate

Previous works on caching that modelled the BC as a noise-free bit-pipe, e.g., [1], adopted a “source-coding perspective” as opposed to the “channel-coding perspective” that we have presented above. In the source-coding perspective, each message is a packet of $F>0$ bits, the delivery communication consists of $\rho\cdot F$ channel uses, and receiver $k$ has $m_{k}F$ bits of cache memory, $k=1,\ldots,K$. The delivery rate $\rho$ is said to be achievable given normalized memory sizes $m_{1},\ldots,m_{K}$ if there exist caching, encoding, and decoding functions such that the probability of error in (9) tends to 0 as $F\to\infty$.

The following correspondence holds between the two perspectives:

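$$\begin{aligned}&R\ \textnormal{ achievable with }(\mathsf{M}_{1},\ldots,\mathsf{M}_{K})\\ &\quad\Longleftrightarrow\quad\rho=\frac{1}{R}\ \textnormal{ achievable with }\bigg(m_{1}=\frac{\mathsf{M}_{1}}{R},\ldots,m_{K}=\frac{\mathsf{M}_{K}}{R}\bigg)\ \textnormal{ under the “source-coding perspective”.}\end{aligned}$$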

For simplicity, we will adopt the “source-coding perspective” in Section VI-B where we specialize the new upper bound on the capacity-memory tradeoff $\mathsf{C}(\mathsf{M}_{1},\ldots,\mathsf{M}_{K})$ to the noise-free bit-pipe model with uniform cache assignment in [1]. For other BCs, we use the “channel-coding perspective” in line with similar setups in network information theory.
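The correspondence between the two perspectives amounts to a simple rescaling, which the following Python sketch spells out. The function names are illustrative assumptions; the formulas are the ones stated in the correspondence, $\rho=1/R$ and $m_{k}=\mathsf{M}_{k}/R$.

```python
def to_source_coding(rate, cache_sizes):
    """Map (R, M_1, ..., M_K) to (rho, m_1, ..., m_K) with rho = 1/R, m_k = M_k/R."""
    if rate <= 0:
        raise ValueError("the rate R must be positive")
    return 1.0 / rate, [m / rate for m in cache_sizes]


def to_channel_coding(rho, normalized_caches):
    """Inverse map: R = 1/rho and M_k = m_k * R."""
    if rho <= 0:
        raise ValueError("the delivery rate rho must be positive")
    rate = 1.0 / rho
    return rate, [m * rate for m in normalized_caches]


# Example: rate 0.5 with caches (1.0, 0.25) corresponds to delivery rate 2.0
# with normalized caches (2.0, 0.5), and the map is invertible.
rho, m = to_source_coding(0.5, [1.0, 0.25])
rate_back, caches_back = to_channel_coding(rho, m)
```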

III Preliminaries: Capacities without Cache Memories

In the absence of cache memories,

$$\mathsf{M}_{1}=\cdots=\mathsf{M}_{K}=0,$$

the capacity-memory tradeoff $\mathsf{C}(\mathsf{M}_{1}=0,\ldots,\mathsf{M}_{K}=0)$ is well known: it is the largest symmetric rate $R$ at which $K$ independent messages can be reliably sent to the $K$ receivers, i.e.,

$$\mathsf{C}(\mathsf{M}_{1}=0,\ldots,\mathsf{M}_{K}=0)=\mathsf{C}_{\mathcal{K}},$$

where [59]:

[TABLE]

and the maximization in (13) is over all random tuples $U_{1},\ldots,U_{K-1},X,Y_{1},\ldots,Y_{K}$ forming the Markov chain

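$$U_{1}-U_{2}-\cdots-U_{K-1}-X-(Y_{1},\ldots,Y_{K}) \tag{14a}$$

and satisfying

$$P_{Y_{1}\cdots Y_{K}|X}(y_{1},\ldots,y_{K}|x)=\Gamma(y_{1},\ldots,y_{K}|x). \tag{14b}$$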

To present the results in this paper, we will need the capacity region without cache memories of the BC to a subset of the receivers

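$$\mathcal{S}:=\{j_{1},\ldots,j_{|\mathcal{S}|}\}\subseteq\mathcal{K},\qquad j_{1}<\cdots<j_{|\mathcal{S}|}.$$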

This capacity region $\mathbf{C}_{\mathcal{S}}$ [59] is given by the set of all nonnegative rate-tuples $(R_{1},\ldots,R_{|\mathcal{S}|})$ for which there exist random variables $U_{1},\ldots,U_{|\mathcal{S}|-1},X,Y_{j_{1}},\ldots,Y_{j_{|\mathcal{S}|}}$ satisfying (14b) and forming the Markov chain

$$U_{1}-U_{2}-\cdots-U_{|\mathcal{S}|-1}-X-\big(Y_{j_{1}},\ldots,Y_{j_{|\mathcal{S}|}}\big), \tag{16}$$

such that the following conditions hold:

[TABLE]

We denote by $\mathsf{C}_{\mathcal{S}}$ the largest symmetric rate $R\geq 0$ in $\mathbf{C}_{\mathcal{S}}$ :

$$\mathsf{C}_{\mathcal{S}}:=\max_{R\geq 0}\big\{R\colon(R,\ldots,R)\in\mathbf{C}_{\mathcal{S}}\big\}.$$

It equals

[TABLE]

where the maximization is over all random tuples $U_{1},\ldots,U_{|\mathcal{S}|-1},X,Y_{j_{1}},\ldots,Y_{j_{|\mathcal{S}|}}$ that satisfy (14b) and (16).

Notice that $\mathsf{C}_{\{k\}}$ is simply the point-to-point capacity to receiver $k$ and we will abbreviate it as $\mathsf{C}_{k}$ .
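For a concrete feel for $\mathsf{C}_{\mathcal{S}}$, consider erasure BCs, one of the channel classes to which the paper later applies its results. A standard fact, used here as an outside assumption rather than something derived in this section, is that the no-cache capacity region of a binary-input erasure BC with erasure probabilities $\delta_{k}$ is the time-sharing region $\sum_{k}R_{k}/(1-\delta_{k})\leq 1$, so the largest symmetric rate is $\big(\sum_{k\in\mathcal{S}}1/(1-\delta_{k})\big)^{-1}$ bits per channel use. The sketch below evaluates this expression; the function name and the symbol $\delta_{k}$ are illustrative.

```python
def erasure_symmetric_capacity(erasure_probs, subset=None):
    """Largest symmetric rate for a binary-input erasure BC without caches.

    erasure_probs[k] is the erasure probability of receiver k (0-based).
    subset lists the receivers in S; by default all receivers are served.
    Assumes the time-sharing capacity region sum_k R_k / (1 - delta_k) <= 1.
    """
    if subset is None:
        subset = range(len(erasure_probs))
    subset = list(subset)
    if not subset:
        raise ValueError("the receiver subset must be non-empty")
    if any(erasure_probs[k] >= 1.0 for k in subset):
        return 0.0  # a receiver that sees only erasures forces rate zero
    return 1.0 / sum(1.0 / (1.0 - erasure_probs[k]) for k in subset)


# Two receivers with erasure probabilities 0.5 (weak) and 0.0 (strong).
c_both = erasure_symmetric_capacity([0.5, 0.0])
c_weak_only = erasure_symmetric_capacity([0.5, 0.0], subset=[0])
```

With a single receiver in the subset, the expression reduces to the point-to-point capacity $1-\delta_{k}$, in line with the remark on $\mathsf{C}_{k}$ above.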

IV Coding Schemes and Lower Bounds on the (Global) Capacity-Memory Tradeoff

IV-A The Local Caching Gain

The simplest way to use receiver cache memories is to store the same information at each and every receiver. This allows the receivers to retrieve this information locally, without transmission over the BC. Further global caching gains are not possible under this caching strategy.

Applying the described caching strategy to only a part of the cache memory that is of size $\Delta\geq 0$ , while allowing a smarter use of the remaining memory, leads to the following proposition, see also [47, Proposition 1].

Proposition 1 (Local caching gain)

For all $\Delta>0$ and $\mathsf{M}_{1},\ldots,\mathsf{M}_{K}\geq 0$ :

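$$\mathsf{C}(\mathsf{M}_{1}+\Delta,\ldots,\mathsf{M}_{K}+\Delta)\geq\mathsf{C}(\mathsf{M}_{1},\ldots,\mathsf{M}_{K})+\frac{\Delta}{D}.$$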

As a consequence, for all $\Delta_{\textnormal{total}}>0$ and $\mathsf{M}\geq 0$ :

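$$\mathsf{C}^{\star}(\mathsf{M}+\Delta_{\textnormal{total}})\geq\mathsf{C}^{\star}(\mathsf{M})+\frac{\Delta_{\textnormal{total}}}{K\cdot D}.$$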

We will see that in some regimes this lower bound is tight.
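A numerical sketch may help to see what the local caching gain alone delivers. Since receivers can ignore their caches, the local caching strategy applied with $\Delta=\min_{k}\mathsf{M}_{k}$ yields the lower bound $\mathsf{C}(\mathsf{M}_{1},\ldots,\mathsf{M}_{K})\geq\mathsf{C}_{\mathcal{K}}+\min_{k}\mathsf{M}_{k}/D$; this step is our own reading of the proposition, not a statement from the paper. The Python code below, with hypothetical helper names and toy numbers, searches a grid of cache assignments under a total budget and maximizes this particular bound.

```python
from itertools import product


def local_caching_lower_bound(c_no_cache, cache_sizes, num_files):
    """Lower bound C_K + min_k M_k / D obtained from local caching only."""
    return c_no_cache + min(cache_sizes) / num_files


def best_assignment_on_grid(total_budget, num_users, num_files, c_no_cache, steps=20):
    """Brute-force search over assignments (M_1, ..., M_K) with sum <= budget."""
    unit = total_budget / steps
    best = None
    for counts in product(range(steps + 1), repeat=num_users):
        if sum(counts) > steps:
            continue  # violates the total cache budget
        sizes = [c * unit for c in counts]
        value = local_caching_lower_bound(c_no_cache, sizes, num_files)
        if best is None or value > best[0] + 1e-12:
            best = (value, sizes)
    return best


# Toy numbers (assumptions): K = 2 users, D = 4 files, budget M = 2.0,
# and a no-cache symmetric capacity of 0.3.
value, sizes = best_assignment_on_grid(2.0, 2, 4, 0.3)
```

For this bound the uniform assignment is the maximizer and the gain equals the total budget divided by $K\cdot D$. This only describes the local caching gain; the schemes in the following subsections achieve more with non-uniform assignments.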

IV-B Superposition Piggyback-Coding

We generalize the piggyback coding for erasure BCs in [26, 28] to general degraded BCs by introducing superposition coding. The idea is to piggyback information of multiple stronger receivers on that of a single weak receiver. This scheme is efficient when a receiver is strictly weaker than the others. Specifically, we assume

$$I(U_{1}^{\star};Y_{1})<I(U_{1}^{\star};Y_{k}),\qquad k\in\{2,\ldots,K\}, \tag{22}$$

where $(U_{1}^{\star},\ldots,U_{K-1}^{\star},X^{\star})$ is a random $K$ -tuple that achieves the symmetric-capacity $\mathsf{C}_{\mathcal{K}}$ , i.e., it is a solution to the optimization problem in (13).

Preliminaries: Let $\epsilon>0$ be arbitrarily small, and define the rates

[TABLE]

The RHS of (23b) is positive by (22).

Split each message $W_{d}$ , $d\in\{1,\ldots,D\}$ , into two parts:

[TABLE]

where $W_{d}^{(\textnormal{A})}$ and $W_{d}^{(\textnormal{B})}$ are of rates $R^{(\textnormal{A})}$ and $R^{(\textnormal{B})}$ , and thus the total message rate is

[TABLE]

Define

[TABLE]

and allocate the cache size

[TABLE]

Placement Phase: Store $W_{1}^{(\textnormal{B})},\ldots,W_{D}^{(\textnormal{B})}$ in the cache memory of receiver $1$ . This is possible by (26a).

Delivery Phase: For the transmission in the delivery phase, construct a $K$ -level superposition code $\mathcal{C}$ with a cloud center of rate $R^{(\textnormal{A})}+(K-1)R^{(\textnormal{B})}$ and satellites of rates $R^{(\textnormal{A})}$ in Levels $2,\ldots,K$ . For the code construction, use a probability distribution

[TABLE]

that achieves $\mathsf{C}_{\mathcal{K}}$ .

It will be convenient to arrange the codewords in the cloud center in an array with $\lfloor 2^{nR^{(\textnormal{A})}}\rfloor$ columns and $(\lfloor 2^{nR^{(\textnormal{B})}}\rfloor)^{K-1}$ rows. The columns are used to encode message $W_{d_{1}}^{(\textnormal{A})}$ and the rows to encode the message tuple

[TABLE]

The $k$ -th level satellite is used to encode message $W_{d_{k}}^{(\textnormal{A})}$ , for $k\in\{2,\ldots,K\}$ . See Figure 3 for an illustration of the code construction.

Let $u_{1}^{n}(w_{1,\textnormal{column}},w_{1,\textnormal{row}})$ denote the cloud-center codeword of $\mathcal{C}$ in column $w_{1,\textnormal{column}}$ and row $w_{1,\textnormal{row}}$. Similarly, let $x^{n}(w_{1,\textnormal{column}},w_{1,\textnormal{row}};w_{2};w_{3};\ldots;w_{K})$ denote the Level-$K$ satellite codeword of $\mathcal{C}$ that corresponds to the cloud-center codeword $u_{1}^{n}(w_{1,\textnormal{column}},w_{1,\textnormal{row}})$ and to the $w_{2}$-th, $w_{3}$-th, $w_{4}$-th, etc. satellite codewords in Levels $2,3,4,\ldots$.

The transmitter chooses and sends the codeword

[TABLE]

over the channel.

Decoding: Receiver $k\in\{2,\ldots,K\}$ decodes all messages in Levels $1,\ldots,k$. Recall that its desired message parts $W_{d_{k}}^{(\textnormal{A})}$ and $W_{d_{k}}^{(\textnormal{B})}$ are encoded in Levels $k$ and $1$ (i.e., the cloud center), respectively.

Receiver $1$ only has to decode $W_{d_{1}}^{(\textnormal{A})}$ , because it can retrieve $W_{d_{1}}^{(\textnormal{B})}$ directly from its cache memory. To decode $W_{d_{1}}^{(\textnormal{A})}$ it performs the following steps:

1. It retrieves the message-tuple $\mathbf{W}^{(\textnormal{B})}$ from its cache memory.

2. It forms the subcodebook $\mathcal{C}^{\prime}(\mathbf{W}^{(\textnormal{B})})\subseteq\mathcal{C}$ that contains all level-$1$ codewords that are “compatible” with the retrieved tuple $\mathbf{W}^{(\textnormal{B})}$:

[TABLE]

Figure 3 illustrates such a subcodebook in red.

3. It decodes its desired message $W_{d_{1}}^{(\textnormal{A})}$ using an optimal decoding rule for subcodebook $\mathcal{C}^{\prime}(\mathbf{W}^{(\textnormal{B})})$.
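The array arrangement of the cloud center and receiver 1's subcodebook can be sketched with plain indices. The Python code below is a toy model only: it tracks codeword positions (column, row) rather than actual codewords, and the mixed-radix mapping from the tuple of B parts to a row is an illustrative assumption, as are all function names.

```python
from itertools import product


def row_index(b_parts, num_b):
    """Map the tuple of B parts of receivers 2..K (1-based indices in
    {1, ..., num_b}) to a 1-based row of the cloud-center array."""
    row = 0
    for w in b_parts:
        if not 1 <= w <= num_b:
            raise ValueError("B part out of range")
        row = row * num_b + (w - 1)
    return row + 1


def subcodebook(num_a, b_parts, num_b):
    """Positions (column, row) of the cloud-center codewords compatible
    with the tuple of B parts that receiver 1 reads from its cache."""
    row = row_index(b_parts, num_b)
    return [(column, row) for column in range(1, num_a + 1)]


# Toy sizes (assumptions): K = 3, four possible A parts, two possible B parts.
num_a, num_b, num_users = 4, 2, 3
total_cloud_codewords = num_a * num_b ** (num_users - 1)
candidates = subcodebook(num_a, (2, 1), num_b)
all_rows = {row_index(t, num_b) for t in product(range(1, num_b + 1), repeat=num_users - 1)}
```

Here receiver 1 searches over only 4 candidates instead of all 16 cloud-center codewords, which mirrors why the piggybacked B parts do not affect its decoding performance.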

Error Analysis: Each receiver $k\in\{2,\ldots,K\}$ reliably decodes messages $(W_{d_{1}}^{(\textnormal{A})},W_{d_{2}}^{(\textnormal{B})},\ldots,W_{d_{K}}^{(\textnormal{B})})$ and $W_{d_{2}}^{(\textnormal{A})},\ldots,W_{d_{k}}^{(\textnormal{A})}$ if the following inequalities hold:

[TABLE]

One can verify that for degraded BCs the choice of $R^{(\textnormal{A})}$ and $R^{(\textnormal{B})}$ in (23) satisfies the constraints in (29).

Finally, receiver 1 can decode with arbitrarily small probability of error because subcodebook $\mathcal{C}^{\prime}(\mathbf{W}^{(\textnormal{B})})$ contains $\lfloor 2^{nR^{(\textnormal{A})}}\rfloor$ codewords that are generated i.i.d. according to $P_{U_{1}^{\star}}$ and because

[TABLE]

Letting $\epsilon\to 0$ , we obtain the following result.

Theorem 2

Under cache assignment (26), we have

[TABLE]

Remark 2

Since receivers can always choose to ignore their cache memories, and because the superposition piggyback coding scheme can be time- and memory-shared with a no-caching scheme, Theorem 2 remains valid for all

[TABLE]

We will see in Corollary 7 ahead, that (30) holds with equality for all $0\leq\mathsf{M}_{1}\leq\mathsf{M}_{1}^{\mathsf{single}}$ provided that $\mathsf{M}_{2}=\ldots=\mathsf{M}_{K}=0$ .

The RHS of (30) coincides with the capacity-memory tradeoff of a scenario where each and every receiver has access to receiver 1’s cache memory. Superposition piggyback coding can thus be viewed as a coding technique that virtually provides all stronger receivers access to the weakest receiver’s cache memory. This is achieved by transmitting the extra-message tuple $\mathbf{W}^{(\textnormal{B})}$ in the cloud center and by adapting the decoding at receiver 1 in a way that this additional communication does not influence its decoding performance.

IV-C Generalized Coded-Caching

We generalize the coded-caching scheme of [1] to noisy BCs with unequal channel conditions and to receivers with unequal cache sizes.

We first explain the scheme for a simple special case.

IV-C1 Special Case $K=2$ and $t=1$

Fix an input distribution $P_{X}$ and a small $\epsilon>0$ , and define the rates

[TABLE]

Notice that by the degradedness of the BC:

[TABLE]

Fix a blocklength $n$ and generate a random codebook

[TABLE]

by choosing all entries i.i.d. according to $P_{X}$ . The codebook $\mathcal{C}$ is revealed to all terminals of the network.

Allocate cache memories

[TABLE]

to receivers 1 and 2, respectively.

Split each message $W_{d}$ , for $d\in\{1,\ldots,D\}$ , into two parts:

[TABLE]

which are of rates $R^{(\textnormal{A})}$ and $R^{(\textnormal{B})}$ , respectively.

In the caching phase, the transmitter stores messages

[TABLE]

in receiver 1’s cache memory and messages

[TABLE]

in receiver 2’s cache memory. This is possible given the cache assignment in (37).

In the delivery phase the transmitter uses codebook $\mathcal{C}$ to send the XOR message (recall that in Section I-B we defined the XOR operation $\bar{\oplus}$ over the binary representations of two messages of the same length)

[TABLE]

to both receivers using the codeword

[TABLE]

Receiver 2 decodes the XOR-message, and XORs the decoded message with $W_{d_{1}}^{(\textnormal{A})}$ , which it has stored in its cache memory. It then combines this guess of $W_{d_{2}}^{(\textnormal{B})}$ with the message $W_{d_{2}}^{(\textnormal{A})}$ from its cache memory.

Receiver 1 performs joint cache-channel decoding where it can exploit that it has more cache memory than receiver 2. Specifically, it retrieves $W_{d_{2}}^{(\textnormal{B})}$ from its cache memory, and extracts a subcodebook $\mathcal{C}^{\prime}(W_{d_{2}}^{(\textnormal{B})})\subseteq\mathcal{C}$ containing all codewords that are compatible with $W_{d_{2}}^{(\textnormal{B})}$ :

[TABLE]

Note that subcodebook $\mathcal{C}^{\prime}(W_{d_{2}}^{(\textnormal{B})})$ is of rate $R^{(\textnormal{A})}$ which is smaller than the rate $R^{(\textnormal{B})}$ of the original codebook $\mathcal{C}$ .

Receiver 1 then decodes the XOR message in (38) using an optimal decoding rule for this subcodebook $\mathcal{C}^{\prime}(W_{d_{2}}^{(\textnormal{B})})$ , and it XORs the decoded message with $W_{d_{2}}^{(\textnormal{B})}$ , which it has stored in its cache memory. It then combines the resulting guess of $W_{d_{1}}^{(\textnormal{A})}$ with the message $W_{d_{1}}^{(\textnormal{B})}$ from its cache memory.
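The cache-aided XOR step of this special case can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the channel is taken to be noise-free, so the random codebook and the subcodebook decoding are skipped. Message parts are short byte strings whose lengths `LEN_A < LEN_B` stand in for the rates $R^{(\textnormal{A})}<R^{(\textnormal{B})}$ . The operation $\bar{\oplus}$ is modelled as a bitwise XOR after zero-padding the shorter operand, which is our reading of the Section I-B definition. All names (`library`, `cache1`, `cache2`, `deliver`) are hypothetical.

```python
import secrets

LEN_A, LEN_B = 2, 3   # byte lengths standing in for R^(A) < R^(B) (assumed values)
D = 4                 # library size (assumed value)


def xor_padded(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR after zero-padding the shorter operand on the right."""
    n = max(len(a), len(b))
    a, b = a.ljust(n, b"\0"), b.ljust(n, b"\0")
    return bytes(x ^ y for x, y in zip(a, b))


# library[d] = (W_d^(A), W_d^(B))
library = {d: (secrets.token_bytes(LEN_A), secrets.token_bytes(LEN_B))
           for d in range(1, D + 1)}

# Caching phase: receiver 1 (weaker) stores all B-parts, receiver 2 all A-parts.
cache1 = {d: library[d][1] for d in library}
cache2 = {d: library[d][0] for d in library}


def deliver(d1: int, d2: int):
    """Return the messages reconstructed by receivers 1 and 2."""
    xor_msg = xor_padded(library[d1][0], library[d2][1])

    # Receiver 2: remove W_{d1}^(A) (cached) to obtain W_{d2}^(B).
    w_d2_b = xor_padded(xor_msg, cache2[d1])[:LEN_B]
    msg2 = (cache2[d2], w_d2_b)

    # Receiver 1: it knows W_{d2}^(B), so only 2^(8*LEN_A) XOR values are
    # compatible with its cache (the "subcodebook"); remove W_{d2}^(B).
    w_d1_a = xor_padded(xor_msg, cache1[d2])[:LEN_A]
    msg1 = (w_d1_a, cache1[d1])
    return msg1, msg2
```

In this toy model, `deliver(1, 2)` returns exactly `(library[1], library[2])`. The noisy-channel aspect, namely that receiver 1 only has to distinguish the codewords compatible with its cached $W_{d_{2}}^{(\textnormal{B})}$ , is not simulated here.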

Since both receivers correctly guess their desired messages $W_{d_{1}}$ and $W_{d_{2}}$ whenever they successfully decode the XOR-message in (38), and since the rate $R^{(\textnormal{B})}$ of the original codebook $\mathcal{C}$ satisfies

[TABLE]

and the rate $R^{(\textnormal{A})}$ of the subcodebook $\mathcal{C}^{\prime}(W_{d_{2}}^{(\textnormal{B})})$ satisfies

[TABLE]

the probability of decoding error at both receivers tends to 0 as the blocklength $n$ tends to infinity.

Letting $\epsilon\to 0$ , we conclude that for $K=2$ the rate-memory triple

[TABLE]

is achievable.

Notice that the weaker receiver 1 is assigned a larger cache memory than the stronger receiver 2:

[TABLE]

The described scheme can also be applied with a uniform cache assignment $\mathsf{M}_{1}=\mathsf{M}_{2}=D\cdot R^{(\textnormal{A})}$ , but at the cost of a decreased achievable rate $R=2\cdot I(X;Y_{1})$ . In fact, assigning a larger cache memory $\mathsf{M}_{1}$ to receiver 1 allows the transmitter to send more information to receiver 2 during the communication to receiver 1.
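To make the comparison concrete, the following Python sketch evaluates both operating points for given mutual informations. It rests on our reading of the scheme, not on the displayed rate-memory triple: as $\epsilon\to 0$ the codebook rate $R^{(\textnormal{B})}$ approaches $I(X;Y_{2})$ and the subcodebook rate $R^{(\textnormal{A})}$ approaches $I(X;Y_{1})$ , with receiver 1 caching the B-parts and receiver 2 the A-parts. The function name `k2_operating_points` is hypothetical.

```python
def k2_operating_points(i1: float, i2: float, D: int):
    """Compare non-uniform and uniform cache assignments for K = 2, t = 1.

    i1 = I(X;Y_1) and i2 = I(X;Y_2) with i1 <= i2 (receiver 1 is weaker).
    Assumed limits as epsilon -> 0: R^(A) -> i1 and R^(B) -> i2.
    """
    if i1 > i2:
        raise ValueError("receiver 1 must be the weaker receiver (i1 <= i2)")
    nonuniform = {"R": i1 + i2, "M1": D * i2, "M2": D * i1}
    uniform = {"R": 2 * i1, "M1": D * i1, "M2": D * i1}
    return nonuniform, uniform


# Example: binary erasure BC with uniform input, I(X;Y_k) = 1 - delta_k.
nonuniform, uniform = k2_operating_points(1 - 0.5, 1 - 0.2, D=10)
```

Note that the two operating points use different total cache budgets, so the comparison only illustrates how the extra memory at receiver 1 is converted into rate for receiver 2.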

IV-C2 General Scheme

We will need the following definitions. For each $t\in\mathcal{K}$ , let

[TABLE]

Pick a small number $\epsilon>0$ and an input distribution $P_{X}$ . Pick further a parameter $t\in\{1,\ldots,K-1\}$ , and assign the following cache size to receiver $k\in\mathcal{K}$ :

[TABLE]

Notice that

[TABLE]

so a larger cache memory is assigned the weaker a receiver is.

Split each message $W_{d}$ into ${K\choose t}$ independent submessages:

[TABLE]

where each submessage $W_{d,\mathcal{G}_{\ell}^{(t)}}$ is of rate

[TABLE]

The total message rate is thus

[TABLE]

Notice that when $t=K-1$ the denominators of (44), (45), and (46) all equal $1$ .

Placement Phase: For each $d\in\{1,\ldots,D\}$ , store the tuple

[TABLE]

in the cache memory of receiver $k\in\mathcal{K}$ . This is possible by (45) and the cache assignment in (44).

Delivery Phase: Transmission in the delivery phase takes place in ${K\choose t+1}$ subphases.

A given subphase $j\in\big{\{}1,\ldots,{K\choose t+1}\big{\}}$ is of length

[TABLE]

and is used to transmit messages

[TABLE]

to the intended receivers in $\mathcal{G}_{j}^{(t+1)}$ . For this purpose, the transmitter creates the XOR message

[TABLE]

which is of rate

[TABLE]

and generates a codebook

[TABLE]

by drawing all entries i.i.d. according to $P_{X}$ .

The transmitter then sends the codeword

[TABLE]

over the channel.

We now describe the decoding. Each receiver $k\in\mathcal{K}$ can retrieve messages

[TABLE]

directly from its cache, see (47), and thus only needs to decode messages

[TABLE]

For each $j\in\{1,\ldots,{K\choose t+1}\}$ and $k\in\mathcal{G}_{j}^{(t+1)}$ , receiver $k$ decodes message $W_{d_{k},\mathcal{G}_{j}^{(t+1)}\backslash\{k\}}$ from its subphase- $j$ outputs

[TABLE]

Specifically, with the messages stored in its cache memory, it forms the XOR message

[TABLE]

and it extracts a subcodebook $\mathcal{C}_{j,k}^{\prime}(W_{\textnormal{XOR},j,k})$ from $\mathcal{C}_{j}$ that contains all codewords that are compatible with $W_{\textnormal{XOR},j,k}$ :

[TABLE]

It then decodes the XOR message ${W}_{\textnormal{XOR},\mathcal{G}_{j}^{(t+1)}}$ by applying an optimal decoding rule for subcodebook $\mathcal{C}_{j,k}^{\prime}(W_{\textnormal{XOR},j,k})$ to the subphase- $j$ outputs $Y_{k,j}^{n_{j}}$ , and XORs the resulting guess $\hat{W}_{\textnormal{XOR},\mathcal{G}_{j}^{(t+1)}}$ with $W_{\textnormal{XOR},j,k}$ to obtain

[TABLE]

After the last subphase ${K\choose t+1}$ , each receiver $k\in\mathcal{K}$ has decoded all its missing messages in (55), and can thus produce a final guess of message $W_{d_{k}}$ .
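The combinatorial structure of the placement and delivery phases can be checked with a small noise-free simulation. The sketch below is an illustration under stated assumptions, not the scheme itself: all submessages have the same length (in the scheme their rates depend on the subset), the channel is error-free so the codebooks $\mathcal{C}_{j}$ are omitted, and receiver $k$ is assumed to cache $W_{d,\mathcal{G}}$ for every $t$ -subset $\mathcal{G}$ containing $k$ , as in [1]. The function names are hypothetical.

```python
from itertools import combinations
import secrets


def xor_bytes(chunks):
    """XOR a list of equal-length byte strings."""
    out = bytes(len(chunks[0]))
    for c in chunks:
        out = bytes(x ^ y for x, y in zip(out, c))
    return out


def simulate(K, t, D, demands, sub_len=2):
    """Return (all_receivers_correct, number_of_subphases)."""
    receivers = range(1, K + 1)
    subsets_t = list(combinations(receivers, t))
    # library[d][G] plays the role of submessage W_{d,G}.
    library = {d: {G: secrets.token_bytes(sub_len) for G in subsets_t}
               for d in range(1, D + 1)}

    # Placement: receiver k stores W_{d,G} for all d and all G containing k.
    cache = {k: {(d, G): library[d][G]
                 for d in library for G in subsets_t if k in G}
             for k in receivers}

    decoded = {k: {} for k in receivers}
    subphases = list(combinations(receivers, t + 1))
    for G in subphases:
        rest = {k: tuple(i for i in G if i != k) for k in G}
        # XOR of the submessages W_{d_k, G \ {k}} over k in G.
        w_xor = xor_bytes([library[demands[k - 1]][rest[k]] for k in G])
        for k in G:
            # Every other term is indexed by a subset containing k: cached.
            known = [cache[k][(demands[i - 1], rest[i])] for i in G if i != k]
            decoded[k][rest[k]] = xor_bytes([w_xor] + known)

    ok = True
    for k in receivers:
        d = demands[k - 1]
        rebuilt = {G: cache[k][(d, G)] if k in G else decoded[k][G]
                   for G in subsets_t}
        ok = ok and rebuilt == library[d]
    return ok, len(subphases)
```

For instance, `simulate(4, 2, 5, [5, 1, 1, 3])` runs ${4\choose 3}=4$ subphases and every receiver reassembles its demanded message from cached and decoded submessages.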

Error Analysis: If each XOR-message ${W}_{\textnormal{XOR},\mathcal{G}_{j}^{(t+1)}}$ is decoded correctly by all its intended receivers in $\mathcal{G}_{j}^{(t+1)}$ , $j=1,\ldots,{K\choose t+1}$ , then all receivers $1,\ldots,K$ produce the correct estimate of their desired messages $W_{d_{1}},\ldots,W_{d_{K}}$ .

The probability that receiver $k\in\mathcal{G}_{j}^{(t+1)}$ wrongly decodes the XOR message ${W}_{\textnormal{XOR},\mathcal{G}_{j}^{(t+1)}}$ tends to 0 as $n$ (and thus $n_{j}$ ) $\to\infty$ because the rate of the subcodebook $\mathcal{C}_{j,k}^{\prime}$ satisfies

[TABLE]

see (45) and (48).

By letting $\epsilon\to 0$ , we conclude the following result.

Theorem 3

Fix a $t\in\{1,\ldots,K-1\}$ and an input distribution $P_{X}$ , and consider the corresponding cache assignment in (44). Then,

[TABLE]

where $R^{(t)}$ is calculated from $P_{X}$ as described in (46).

As we will see in Proposition 8, the inequality in (58) holds with equality for $t=K-1$ .

IV-D Lower Bound on $\mathsf{C}^{\star}(\mathsf{M})$

Proposition 1 and Theorems 2 and 3 readily yield a lower bound on $\mathsf{C}^{\star}(\mathsf{M})$ . As we will see in Corollary 11 ahead, this lower bound is exact in the regimes of small and large total cache size $\mathsf{M}$ .

Let

[TABLE]

Proposition 4

For any $P_{X}$ , all rate-memory pairs in (59) are achievable. By time- and memory-sharing arguments, the upper-convex envelope of all these rate-memory pairs lower bounds $\mathsf{C}^{\star}(\mathsf{M})$ :

[TABLE]

Notice that for any $P_{X}$ :

[TABLE]

and

[TABLE]
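The time- and memory-sharing argument behind Proposition 4 amounts to taking the upper concave envelope of a finite set of rate-memory points. A minimal Python sketch of that envelope computation follows; the input points used in the example are made-up numbers, not values from this paper, and the function names are hypothetical.

```python
def upper_concave_envelope(points):
    """Vertices of the upper concave envelope of (M, R) points, sorted by M."""
    best = {}
    for m, r in points:                      # keep the best rate per memory value
        best[m] = max(r, best.get(m, r))
    hull = []
    for p in sorted(best.items()):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Drop hull[-1] if it lies on or below the chord hull[-2] -> p.
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def rate_at(hull, M):
    """Rate obtained by time- and memory-sharing between adjacent vertices."""
    for (m0, r0), (m1, r1) in zip(hull, hull[1:]):
        if m0 <= M <= m1:
            return r0 + (r1 - r0) * (M - m0) / (m1 - m0)
    raise ValueError("M outside the range spanned by the given points")


# Made-up (M, R) points, for illustration only.
pts = [(0, 0.3), (1, 0.8), (2, 0.9), (4, 1.6), (6, 2.0)]
hull = upper_concave_envelope(pts)
```

In this example the point `(2, 0.9)` is dominated by sharing between `(1, 0.8)` and `(4, 1.6)` and therefore does not appear among the vertices.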

V Upper Bounds and Exact Results on Global Capacity-Memory Tradeoff

V-A Results on $\mathsf{C}(\mathsf{M}_{1},\ldots,\mathsf{M}_{K})$

The upper bound is formulated in terms of the following parameters. For each receiver set $\mathcal{S}$ as in (15), define

[TABLE]

Theorem 5

There exist random variables $X,Y_{1},\ldots,Y_{K}$ and for every receiver set $\mathcal{S}$ as in (15) random variables $\{U_{\mathcal{S},1},\ldots,U_{\mathcal{S},{|\mathcal{S}|-1}}\}$ so that the channel law (14b) and the Markov chain

[TABLE]

hold and so that for each $\mathcal{S}$ :

[TABLE]

Proof:

See Appendix A.∎

Without cache memories, $\mathsf{M}_{1}=\ldots=\mathsf{M}_{K}=0$ , the parameters $\alpha_{\mathcal{S},1}^{\star},\ldots,\alpha_{\mathcal{S},|\mathcal{S}|}^{\star}$ equal 0 for all $\mathcal{S}\subseteq\{1,\ldots,K\}$ , and the upper bound in Theorem 5 recovers the exact capacity-memory tradeoff $\mathsf{C}_{\mathcal{K}}$ in (13).

The upper bound in Theorem 5 is asymmetric in the different cache sizes $\mathsf{M}_{1},\mathsf{M}_{2},\ldots,\mathsf{M}_{K}$ , because the parameters $\alpha_{\mathcal{S},j_{i}}^{\star}$ are not symmetric. In fact, increasing the cache memories at weaker receivers generally increases the upper bound more than increasing the cache memories at stronger receivers.

The converse in Theorem 5 is weakened if constraints (65) are ignored for certain receiver sets $\mathcal{S}$ , or if in these constraints the input/output random variables $X,Y_{j_{1}},\ldots,Y_{j_{|\mathcal{S}|}}$ are allowed to depend on the receiver set $\mathcal{S}$ . For this latter relaxation, Theorem 5 results in the following corollary.

Corollary 6

Given cache sizes $\mathsf{M}_{1},\ldots,\mathsf{M}_{K}\geq 0$ , rate $R$ is achievable only if for every receiver set $\mathcal{S}\subseteq\mathcal{K}$ :

[TABLE]

where $\mathbf{C}_{\mathcal{S}}$ denotes the capacity region to receivers in $\mathcal{S}$ (ignoring receivers in $\mathcal{K}\backslash\mathcal{S}$ ) when there are no cache memories.

Remark 3

The upper bounds of Theorem 5 and Corollary 6 are relaxed when each $\alpha_{\mathcal{S},k}^{\star}$ is replaced by $\tilde{\alpha}_{\mathcal{S},k}$ , where

[TABLE]

The same holds if each $\alpha_{\mathcal{S},k}^{\star}$ is replaced by

[TABLE]

Replacing in Corollary 6 each parameter $\alpha_{\mathcal{S},k}^{\star}$ by $\alpha_{\mathcal{S},k}^{\prime}$ recovers the previous upper bound in [26, Theorem 9] and [28, Theorem 1].

Proof:

The proof requires a close inspection of the proof of Theorem 5 in Appendix A. See Appendix D. ∎

By comparing the new upper bounds with the three achievability results in the previous Section IV, the exact expression for $\mathsf{C}(\mathsf{M}_{1},\ldots,\mathsf{M}_{K})$ can be obtained in some special cases.

The following corollary states that superposition piggyback coding is optimal when only receiver 1 has a cache memory and this cache memory is small.

Corollary 7

Under a cache assignment satisfying

[TABLE]

the capacity-memory tradeoff is

[TABLE]

Proof:

Achievability follows by Theorem 2. The converse follows from Corollary 6, where it suffices to consider only the set $\mathcal{S}=\mathcal{K}$ . In fact, under (69), $\alpha_{\mathcal{K},1}=\ldots=\alpha_{\mathcal{K},K}=\frac{\mathsf{M}_{1}}{D}.$ ∎

The next proposition states that generalized coded caching with parameter $t=K-1$ is optimal under the corresponding cache assignment. Moreover, any extra cache memory that is uniformly distributed over the $K$ receivers only brings local caching gain.

Proposition 8

For each $k\in\mathcal{K}$ , let $\mathsf{M}_{k}^{\star(K-1)}$ be given by (44) when $P_{X}$ is chosen as a maximizer of

[TABLE]

For any $\Delta\geq 0$ :

[TABLE]

Proof:

See Appendix E. ∎

V-B Results on $\mathsf{C}^{\star}(\mathsf{M})$

Theorem 5 directly yields the following result.

Proposition 9

There exist random variables $X,Y_{1},\ldots,Y_{K}$ and for every receiver set $\mathcal{S}$ as in (15) random variables $\{U_{\mathcal{S},1},\ldots,U_{\mathcal{S},{|\mathcal{S}|-1}}\}$ , such that (14b) and (64) hold, and such that for some $\mathsf{M}_{1},\ldots,\mathsf{M}_{K}\geq 0$ summing to $\mathsf{M}$ and all $\mathcal{S}$ :

[TABLE]

where $\{\alpha_{\mathcal{S},k}^{\star}\}$ are defined in (63).

Solving this optimization problem numerically is computationally complex. Simpler, albeit looser, upper bounds can be obtained by ignoring some of the constraints (72); by replacing each parameter $\alpha^{\star}_{\mathcal{S},k}$ in (72) by $\tilde{\alpha}_{\mathcal{S},k}$ or by $\alpha_{\mathcal{S},k}^{\prime}$ ; or by allowing $X,Y_{j_{1}},\ldots,Y_{j_{|\mathcal{S}|}}$ in (72) to depend on the set $\mathcal{S}$ .

The following corollary presents a simpler bound that is obtained this way. Recall the definitions in (43).

Corollary 10

For each $t\in\mathcal{K}$ :

[TABLE]

Proof:

Fix $t\in\mathcal{K}$ . For each $\ell=1,\ldots,{K\choose t}$ , specialize Corollary 6 to $\mathcal{S}=\mathcal{G}_{\ell}^{(t)}$ and relax it by replacing each parameter $\alpha_{\mathcal{G}_{\ell}^{(t)},k}^{\star}$ by $\alpha_{\mathcal{G}_{\ell}^{(t)},k}^{\prime}$ . Since $\alpha_{\mathcal{G}_{\ell}^{(t)},1}^{\prime}=\ldots=\alpha_{\mathcal{G}_{\ell}^{(t)},t}^{\prime}$ , we obtain

[TABLE]

Now, averaging bound (74) over all indices $\ell=1,\ldots,{K\choose t}$ and upper-bounding the sum $\mathsf{M}_{1}+\ldots+\mathsf{M}_{K}$ by $\mathsf{M}$ yields the desired result in the corollary. ∎

The last result of this section contains two more simple upper bounds on $\mathsf{C}^{\star}(\mathsf{M})$ . For small total cache size $\mathsf{M}$ one of them is achieved by assigning the entire cache memory to the weakest receiver and applying superposition piggyback coding. For large total cache size $\mathsf{M}$ the other is achieved by generalized coded caching with parameter $t=K-1$ , and by first applying the cache assignment corresponding to this scheme followed by a uniform cache assignment of any remaining cache memory.

Corollary 11

For total cache size $\mathsf{M}\geq 0$ :

[TABLE]

and

[TABLE]

For small cache sizes,

[TABLE]

(75) holds with equality.

For large cache sizes,

[TABLE]

(76) holds with equality.

Proof:

Upper bound (75) follows by specializing Corollary 10 to $t=K$ . Upper bound (76) is proved as follows. Relax Proposition 9 by replacing each parameter $\alpha^{\star}_{\mathcal{S},k}$ by $\alpha_{\mathcal{S},k}^{\prime}$ and considering only the constraints (72) that correspond to sets $\mathcal{S}=\{k\}$ , for $k\in\mathcal{K}$ . Finally, average the $K$ resulting inequalities and maximize over the input distribution $P_{X}$ .

The tightness of (75) for $\mathsf{M}\leq\mathsf{M}^{\textsf{single}}$ follows from Theorem 2. The tightness of (76) for $\mathsf{M}\geq D(K-1)K\mathsf{C}_{\textnormal{Avg}}$ follows from Proposition 8 because

[TABLE]

∎

We remark that for small total cache sizes, $\mathsf{C}^{\star}(\mathsf{M})$ grows as $\frac{\mathsf{M}}{D}$ . This corresponds to a perfect global caching gain, i.e., the same performance as in a system where each receiver can directly access all cache contents in the network. For large total cache sizes, $\mathsf{C}^{\star}(\mathsf{M})$ grows only as $\frac{1}{K}\cdot\frac{\mathsf{M}}{D}$ . This corresponds to the local caching gain achieved by Proposition 1.

VI Examples

VI-A Erasure BCs

We specialize our results to erasure BCs where at time $t$ receiver $k$ ’s output $Y_{k,t}$ equals the channel input $X_{t}$ with probability $1-\delta_{k}$ and it equals an erasure symbol “?” with probability $\delta_{k}$ . The erasure probabilities satisfy:

[TABLE]

For erasure BCs,

[TABLE]

Moreover, a Bernoulli- $1/2$ input distribution $P_{X}$ maximizes $I(X;Y_{k})$ and $I(X;Y_{k}|U)$ simultaneously for all $k\in\mathcal{K}$ and auxiliaries $U$ that form the Markov chain $U-X-Y_{k}$ . Therefore, Theorem 5 and Corollary 6 coincide. Also,

[TABLE]

Figure 4 depicts the upper and lower bounds on $\mathsf{C}^{\star}(\mathsf{M})$ in Propositions 4 and 9. For comparison, the upper bound in Theorem 5 under a uniform cache assignment

[TABLE]

is plotted. This shows numerically that a smart allocation of the total cache memory $\mathsf{M}$ significantly increases the global capacity-memory tradeoff of erasure BCs when different receivers have different erasure probabilities.
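For a quick numerical feel of the small-memory regime on erasure BCs, the sketch below evaluates the no-cache symmetric capacity $\mathsf{C}_{\mathcal{K}}$ and the line $\mathsf{C}_{\mathcal{K}}+\frac{\mathsf{M}}{D}$ . It uses the standard capacity region of the erasure BC, $\sum_{k}R_{k}/(1-\delta_{k})\leq 1$ , evaluated at equal rates, and it assumes that the whole budget $\mathsf{M}$ is given to the weakest receiver. The line is only meaningful for $\mathsf{M}\leq\mathsf{M}^{\textsf{single}}$ , a threshold that this sketch does not compute. Function names are hypothetical.

```python
def erasure_symmetric_capacity(deltas):
    """Largest common rate R with sum_k R / (1 - delta_k) <= 1 (bits/use)."""
    if any(not 0.0 <= d < 1.0 for d in deltas):
        raise ValueError("erasure probabilities must lie in [0, 1)")
    return 1.0 / sum(1.0 / (1.0 - d) for d in deltas)


def small_cache_tradeoff(deltas, M, D):
    """C_K + M/D; valid only in the small-memory regime (threshold not checked)."""
    return erasure_symmetric_capacity(deltas) + M / D


# Example with four receivers of unequal quality (assumed values).
deltas = [0.8, 0.6, 0.4, 0.2]
c_no_cache = erasure_symmetric_capacity(deltas)
c_with_cache = small_cache_tradeoff(deltas, M=0.5, D=10)
```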

Analytically, we can prove that for small total cache size $\mathsf{M}\leq\mathsf{M}^{\textsf{single}}$ any cache assignment that does not allocate all cache memory to the weakest receiver is suboptimal on the erasure BC. This follows from the achievability in Corollary 11 and the following Proposition 12.

Proposition 12

For given $\mathsf{M}_{1}\geq 0$ and $\mathsf{M}:=\sum_{k=1}^{K}\mathsf{M}_{k}\geq 0$ ,

[TABLE]

The RHS of (82) is strictly less than $\mathsf{C}_{\mathcal{K}}+\frac{\mathsf{M}}{D}$ unless $\mathsf{M}=\mathsf{M}_{1}$ or $\delta_{1}=\ldots=\delta_{K}$ .

Proof:

See Appendix F. ∎

VI-B Noise-Free Bit-Pipe

Consider now the noise-free bit-pipe model with uniform cache assignment in [1]. It corresponds to an erasure BC where each receiver has zero erasure probability,

[TABLE]

We adopt the “source-coding perspective” of [1], and assume equal cache size

[TABLE]

From the upper bound on $\mathsf{C}(\mathsf{M}_{1},\ldots,\mathsf{M}_{K})$ in Theorem 5, the following lower bound on the minimum achievable delivery rate $\rho^{\star}$ can be obtained as a function of the normalized symmetric cache size $m$ :

Corollary 13

For the noise-free bit-pipe model in [1]:

[TABLE]

Proof:

See Appendix G. ∎

Figure 5 compares this new converse result on $\rho$ with the existing converse results in [1], [9], and [10], and with the achievability result in [43]. The converse result in [10] is generally cumbersome to evaluate. The plot shows the numerical value calculated in [10].

VI-C Gaussian BCs

Finally, we specialize our results to memoryless Gaussian BCs. At time $t$ , the received symbol at receiver $k$ is

[TABLE]

where $X_{t}$ is the input to the channel and $\{Z_{k,t}\}$ is an i.i.d. Gaussian process with zero mean and variance $\sigma_{k}^{2}>0$ . The channel inputs are subject to an average block-power constraint $P$ . The receivers are ordered in increasing strength:

[TABLE]

By [60], for every set $\mathcal{S}$ as defined in (15),

[TABLE]

where $\beta_{1},\ldots,\beta_{|\mathcal{S}|}$ form the unique choice of $|\mathcal{S}|$ real numbers in $[0,1]$ that sum to $1$ and satisfy

[TABLE]

In particular,

[TABLE]
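The power split $\beta_{1},\ldots,\beta_{|\mathcal{S}|}$ can be found numerically. The Python sketch below assumes the standard superposition-coding rate expressions for the degraded Gaussian BC, in which receiver $k$ treats the signals of all stronger receivers as noise, i.e. $R_{k}=\frac{1}{2}\log_{2}\big(1+\frac{\beta_{k}P}{\sigma_{k}^{2}+P\sum_{i>k}\beta_{i}}\big)$ . We take this to be the content of the displayed conditions, but this is an assumption. The symmetric rate is found by bisection on the common rate; all names are hypothetical.

```python
import math


def powers_for_rate(R, P, sigma2):
    """Power fractions giving every receiver rate R.

    sigma2 lists the noise variances from weakest to strongest receiver.
    """
    g = 2.0 ** (2.0 * R) - 1.0
    betas = [0.0] * len(sigma2)
    interference = 0.0            # power of signals meant for stronger receivers
    for k in reversed(range(len(sigma2))):
        betas[k] = g * (sigma2[k] + interference) / P
        interference += betas[k] * P
    return betas


def gaussian_symmetric_capacity(P, sigma2, tol=1e-12):
    """Bisection on the common rate until the power fractions sum to one."""
    lo, hi = 0.0, 0.5 * math.log2(1.0 + P / min(sigma2))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sum(powers_for_rate(mid, P, sigma2)) > 1.0:
            hi = mid              # too much power needed: rate not supported
        else:
            lo = mid
    return lo, powers_for_rate(lo, P, sigma2)


# Example: P = 10, receiver 1 (variance 4) weaker than receiver 2 (variance 1).
rate, betas = gaussian_symmetric_capacity(10.0, [4.0, 1.0])
```

The bisection relies on the required total power being increasing in the common rate, which also explains why the power split is unique.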

Moreover, given a power constraint $P>0$ , a zero-mean variance- $P$ Gaussian input distribution $P_{X}$ maximizes $I(X;Y_{k})$ and $I(X;Y_{k}|U)$ simultaneously for all $k\in\mathcal{K}$ and auxiliaries $U$ that form the Markov chain $U-X-Y_{k}$ . Therefore, Theorem 5 and Corollary 6 coincide. Also,

[TABLE]

Figure 6 shows the upper and lower bounds on $\mathsf{C}^{\star}(\mathsf{M})$ in Propositions 4 and 9. The five blue points indicate the rate-memory points $(R^{(0)},\mathsf{M}^{(0)})$ , $(R^{\textsf{single}},\mathsf{M}^{\textsf{single}})$ , $(R^{(1)},\mathsf{M}^{(1)})$ , $(R^{(2)},\mathsf{M}^{(2)})$ , and $(R^{(3)},\mathsf{M}^{(3)})$ for a zero-mean variance- $P$ Gaussian distribution $P_{X}$ . For comparison, the figure also shows the upper bound in Theorem 5 for a setup with uniform cache assignment $\frac{\mathsf{M}}{K}$ across all receivers. We observe that a smart cache assignment provides substantial gains in the capacity-memory tradeoff.

VII Summary and Conclusion

We have provided close upper and lower bounds on the global capacity-memory tradeoff $\mathsf{C}^{\star}(\mathsf{M})$ of degraded BCs. The bounds coincide in the regimes of small and large total cache memory with thresholds depending on the BC statistics. For small cache memory sizes, all of the cache memory needs to be assigned to the weakest receiver. In this regime, $\mathsf{C}^{\star}(\mathsf{M})$ grows as $\frac{\mathsf{M}}{D}$ , which corresponds to a perfect global caching gain where all receivers can benefit from all the cache contents of the network. This performance is achieved by the proposed superposition piggyback coding scheme, which provides each receiver virtual access to the weakest receiver’s cache contents. For the regime of moderate $\mathsf{M}$ , we propose a generalized coded caching scheme, which assigns cache memories to all the receivers, with a larger cache memory the weaker a receiver is. Notice that the larger the total cache budget $\mathsf{M}$ , the larger the coded caching parameter $t\in\{1,\ldots,K-1\}$ needs to be chosen. This leads to a decreasing global caching gain because with increasing $t$ the various cache memories have more and more overlapping contents which cannot provide global caching gains. As a consequence, the slope of the rate-memory tradeoff achieved by generalized coded caching decreases with increasing total cache budget $\mathsf{M}$ . The same behaviour is also suggested by the upper bound. For parameter $t=K-1$ generalized coded caching and the corresponding cache assignment exactly achieve the global capacity-memory tradeoff. Once the total cache memory budget exceeds the corresponding cache budget, it is optimal to uniformly allocate all the remaining cache memory across all the receivers and to store the same content in the extra portions of the receivers’ cache memories. Here, $\mathsf{C}^{\star}(\mathsf{M})$ grows as $\frac{1}{K}\cdot\frac{\mathsf{M}}{D}$ , which corresponds to a local caching gain.
We conclude that assigning the total cache memory uniformly across all the receivers is highly suboptimal over noisy BCs, in contrast to the noiseless setup considered in [1].

Appendix A Proof of Upper Bound in Theorem 5

Fix the rate of communication

[TABLE]

Since $R$ is achievable, for each sufficiently large blocklength $n$ and for each demand vector $\mathbf{d}$ , there exist $K$ caching functions $\big{\{}g_{k}^{(n)}\big{\}}$ , an encoding function $\{f_{\mathbf{d}}^{(n)}\}$ , and $K$ decoding functions $\big{\{}\varphi_{k,\mathbf{d}}^{(n)}\big{\}}$ so that the probability of worst-case error ${\mathsf{P}_{\text{e}}}^{(n)}(\mathbf{d})$ tends to 0 as $n\to\infty$ .

Fix $\epsilon>0$ and a sufficiently large blocklength $n$ (depending on this $\epsilon$ ). Let

[TABLE]

denote the cache contents corresponding to the chosen caching function, and let for each demand vector $\mathbf{d}=(d_{1},\ldots,d_{K})$ with all different entries

[TABLE]

denote the input of the degraded BC corresponding to the chosen encoding functions. Let $Y_{k,\mathbf{d}}^{n}$ denote the corresponding channel outputs at receiver $k$ .

Lemma 14

There exist random variables $X_{\mathbf{d}},Y_{1,\mathbf{d}},\ldots,Y_{K,\mathbf{d}}$ and for each set $\mathcal{S}$ as in (15) random variables $\{U_{\mathcal{S},1,\mathbf{d}},\ldots,U_{\mathcal{S},{|\mathcal{S}|-1},\mathbf{d}}\}$ , so that given $X_{\mathbf{d}}=x\in\mathcal{X}$ :

[TABLE]

forms a Markov chain and the following $|\mathcal{S}|$ inequalities hold:

[TABLE]

Proof:

The proof is similar to the converse proof of the capacity of degraded BCs without caching [59].

Since the worst case error probability is bounded by $\epsilon$ , using Fano’s inequality we have

[TABLE]

where $(a)$ uses Fano’s inequality as well as the fact that all messages are independent. Recall that the demand vector $\mathbf{d}$ has all different entries.

We next develop the second summands in (94). For the second summand in (94) we write

[TABLE]

where $T$ denotes a random variable that is uniformly distributed over $\{1,\ldots,n\}$ and independent of all previously defined random variables, and where

[TABLE]

Define further for $k\in\{2,\ldots,|\mathcal{S}|-1\}$ :

[TABLE]

and

[TABLE]

For $k\in\{2,\ldots,K-1\}$ , we expand the second summand in (94) as:

[TABLE]

where (a) follows from the degradedness of the outputs.

Similarly, we also have

[TABLE]

It can be verified that the defined random variables satisfy Conditions (92). Combining this observation with (94)–(97) concludes the proof. ∎

We average the bounds in (93) over demand vectors. Let $\mathcal{Q}^{\textnormal{dist}}_{K}$ be the set of all the ${D\choose K}{K!}$ $K$ -dimensional demand vectors with all distinct entries. Also, let $Q$ be a uniform random variable over the elements of $\mathcal{Q}^{\textnormal{dist}}_{K}$ and independent of all other random variables. Define for each set $\mathcal{S}$ as in (15): $U_{\mathcal{S},1}:=(U_{\mathcal{S},{1},Q},Q)$ ; $U_{\mathcal{S},k}:=U_{\mathcal{S},k,Q}$ , for $k\in\{2,\ldots,|\mathcal{S}|-1\}$ ; $X:=X_{Q}$ ; and $Y_{k}:=Y_{k,Q}$ for $k\in\mathcal{K}$ .

Notice that the defined random variables satisfy conditions (14b) and (64) in the theorem. It remains to prove that they also satisfy (65). To this end, we average inequalities (93) over all the demand vectors in $\mathcal{Q}^{\textnormal{dist}}_{K}$ . Using standard arguments to take care of the time-sharing random variable $Q$ , and defining

[TABLE]

we obtain for each $\mathcal{S}$ as in (15):

[TABLE]

Lemma 15

For each set $\mathcal{S}$ , parameters $\alpha_{\mathcal{S},1},\ldots,\alpha_{\mathcal{S},|\mathcal{S}|}$ satisfy the following constraints:

[TABLE]

Proof:

See Appendix B. ∎

By (99)–(100) and letting $\epsilon\to 0$ , the following intermediate result—which is used in other proofs in this paper—is obtained.

Lemma 16

There exist random variables $X,Y_{1},\ldots,Y_{K}$ and for every receiver set $\mathcal{S}$ as in (15) random variables $\{U_{\mathcal{S},1},\ldots,U_{\mathcal{S},{|\mathcal{S}|-1}}\}$ , so that (14b) and (64) hold, and for all $\mathcal{S}$ :

[TABLE]

By the following Lemma 17, because constraints (101) are increasing in $\alpha_{\mathcal{S},1},\ldots,\alpha_{\mathcal{S},|\mathcal{S}|}$ , and by constraint (100c), we conclude that the choice $\alpha_{\mathcal{S},k}=\alpha_{\mathcal{S},k}^{\star}$ in (63) makes the upper bound (101) loosest. The following Lemma 17 thus concludes the proof.

Lemma 17

Lemma 16 remains valid, if parameters $\alpha_{\mathcal{S},1},\ldots,\alpha_{\mathcal{S},|\mathcal{S}|}$ are further constrained to satisfy for each $k\in\{1,\ldots,|\mathcal{S}|-1\}$ one of the two following conditions:

• $\alpha_{\mathcal{S},k}=\frac{{\sum_{i=1}^{k}\mathsf{M}_{j_{i}}}}{D-k+1}$ ; or

• $\alpha_{\mathcal{S},k}=\alpha_{\mathcal{S},k+1}$ .

Proof:

See Appendix C. ∎

Appendix B Proof of Lemma 15

We only prove the lemma for $\mathcal{S}=\mathcal{K}$ . The other proofs are similar.

We first prove (100a). Every $\alpha_{\mathcal{K},k}$ is non-negative, because mutual information is non-negative. To prove the upper bound in (100a), we proceed as follows. Let $\mathcal{Q}_{K}^{\textnormal{dist}}$ be the set of $K$ -dimensional demand vectors that have $K$ distinct entries in $\{1,\ldots,D\}$ ; and for each $k\in\{1,\ldots,K\}$ and each $k-1$ dimensional demand vector $\tilde{\mathbf{d}}=(d_{1},\ldots,d_{k-1})$ , define $W_{\tilde{\mathbf{d}}}:=(W_{d_{1}},\ldots,W_{d_{k-1}})$ . We have:

[TABLE]

where $(a)$ holds because for each value of $k$ and $j$ there are ${{D-k}\choose{K-k}}(K-k)!$ ordered demand vectors $\mathbf{d}\ \in\mathcal{Q}_{K}^{\textnormal{dist}}$ with $(d_{1},\ldots,d_{k-1})=\tilde{\mathbf{d}}$ and with $d_{k}=j$ ; (b) holds by the independence of the messages; (c) holds because for any random tuple $(A_{1},\ldots,A_{L})$ it holds that $\sum_{l=1}^{L}H(A_{l})\geq H(A_{1},\ldots,A_{L})$ ; and (d) holds because $I(W_{1},\ldots,W_{D};\mathbb{V}_{1},\ldots,\mathbb{V}_{k}|W_{\tilde{\mathbf{d}}})$ cannot exceed $\sum_{i=1}^{k}\mathsf{M}_{i}$ . This concludes the proof of (100a).
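The counting fact used in step $(a)$ can be verified by brute force for small parameters. The sketch below enumerates all demand vectors with distinct entries and counts those whose first $k$ entries are fixed (the prefix $\tilde{\mathbf{d}}$ together with $d_{k}=j$ ); the function name and the example parameters are hypothetical.

```python
from itertools import permutations
from math import comb, factorial


def count_with_prefix(D, K, prefix):
    """Number of K-dimensional demand vectors with distinct entries in
    {1, ..., D} whose first len(prefix) entries equal the given prefix."""
    k = len(prefix)
    return sum(1 for d in permutations(range(1, D + 1), K)
               if d[:k] == tuple(prefix))


D, K = 5, 3
prefix = (2, 4)                      # fixes d_1 = 2 and d_2 = 4, so k = 2
k = len(prefix)
brute_force = count_with_prefix(D, K, prefix)
closed_form = comb(D - k, K - k) * factorial(K - k)
```

With an empty prefix the same enumeration returns the size ${D\choose K}K!$ of $\mathcal{Q}_{K}^{\textnormal{dist}}$ .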

To prove constraint (100b), we fix a $K$ -dimensional demand vector $\mathbf{d}\in\mathcal{Q}_{K}^{\textnormal{dist}}$ , and consider the cyclic shifts of this vector. For $\ell\in\{0,\ldots,K-1\}$ , let $\mathbf{d}^{(\ell)}$ be the vector obtained from $\mathbf{d}$ when the elements are cyclically shifted $\ell$ positions to the right. (For example, if $\mathbf{d}=(1,2,3)$ then $\mathbf{d}^{(2)}=(2,3,1)$ .) For each $\ell\in\{0,\ldots,K-1\}$ and $k\in\{1,\ldots,K\}$ , let $d_{k}^{(\ell)}$ denote the $k$ -th index of demand vector $\mathbf{d}^{(\ell)}$ . So,

[TABLE]

where for each positive integer $\xi$ the term $(\xi\mod K)$ takes value in $\{1,\ldots,K\}$ so that

[TABLE]
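The cyclic-shift construction and the convention that $(\xi\mod K)$ takes values in $\{1,\ldots,K\}$ can be written out as follows. The index formula $d_{k}^{(\ell)}=d_{((k-\ell)\mod K)}$ used in the sketch is our assumption: it reproduces the example $\mathbf{d}=(1,2,3)$ , $\mathbf{d}^{(2)}=(2,3,1)$ , but the displayed expression may be written differently. The helper also extends the modulo convention to non-positive integers. All names are hypothetical.

```python
from itertools import permutations


def mod_1_to_K(xi, K):
    """Residue of xi modulo K, represented in {1, ..., K} instead of {0, ..., K-1}."""
    return (xi - 1) % K + 1


def cyclic_shift(d, ell):
    """Shift the entries of d cyclically ell positions to the right."""
    K = len(d)
    return tuple(d[mod_1_to_K(k - ell, K) - 1] for k in range(1, K + 1))


def shift_orbits(D, K):
    """Partition all distinct-entry demand vectors into classes of cyclic shifts."""
    seen, orbits = set(), []
    for d in permutations(range(1, D + 1), K):
        if d not in seen:
            orbit = {cyclic_shift(d, ell) for ell in range(K)}
            seen |= orbit
            orbits.append(orbit)
    return orbits


orbits = shift_orbits(D=4, K=3)
```

Because the entries of each demand vector are distinct, every class contains exactly $K$ vectors, which is what allows the sum over $\mathcal{Q}_{K}^{\textnormal{dist}}$ to be grouped by cyclic shifts.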

For each $\ell\in\{1,\ldots,K\!-\!1\}$ and $k,k^{\prime}\in\{2,\ldots,K\}$ with $k^{\prime}\leq k$ , we write

[TABLE]

where (a) follows by (103) and (b) is by the independence of messages.

Fix a demand vector $\mathbf{d}\in\mathcal{Q}_{K}^{\textnormal{dist}}$ and sum up the above inequality over all $K$ cyclic shifts $\mathbf{d}^{(0)},\mathbf{d}^{(1)},\ldots,\mathbf{d}^{(K-1)}$ of $\mathbf{d}$ to obtain:

[TABLE]

Since the set $\mathcal{Q}_{K}^{\textnormal{dist}}$ can be partitioned into subsets of demand vectors that are cyclic shifts of each other and all cyclic shifts of a demand vector in $\mathcal{Q}_{K}^{\textnormal{dist}}$ are also in $\mathcal{Q}_{K}^{\textnormal{dist}}$ , we conclude from (106):

[TABLE]

This proves (100b).

We proceed to prove constraint (100c). For each $\mathbf{d}\in\mathcal{Q}_{K}^{\textnormal{dist}}$ :

[TABLE]

So,

[TABLE]

where (a) holds by the chain rule of mutual information, (b) by the independence and uniform rate of messages $W_{1},\ldots,W_{D}$ and the definition of the set $\mathcal{Q}^{\textnormal{dist}}_{K}$ , which is of size ${D\choose K}K!$ , and (c) by the generalized Han-Inequality (the following Proposition 18).

Proposition 18

Let $L$ be a positive integer and $A_{1},\ldots,A_{L}$ be a finite random $L$ -tuple. Denote by $A_{\mathcal{J}}$ the subset $\{A_{l},\ l\in\mathcal{J}\}$ . For every $i\in\{1,\ldots,L\}$ :

[TABLE]

Proof:

See [62, Theorem 17.6.1]. ∎
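As a sanity check, the inequality can be tested numerically on a randomly drawn joint distribution. The sketch below assumes the averaged-subset-entropy form of Han's inequality from the cited reference, namely that $h_{i}:=\frac{1}{{L\choose i}}\sum_{\mathcal{J}:|\mathcal{J}|=i}\frac{H(A_{\mathcal{J}})}{i}$ is non-increasing in $i$ . Whether the display above is stated in exactly this form is an assumption. All names are hypothetical.

```python
import itertools
import math
import random


def subset_entropy(pmf, subset):
    """Entropy (bits) of the marginal of pmf on the coordinates in subset."""
    marginal = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in subset)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def han_averages(pmf, L):
    """Return [h_1, ..., h_L], the normalized average subset entropies."""
    averages = []
    for i in range(1, L + 1):
        subsets = list(itertools.combinations(range(L), i))
        total = sum(subset_entropy(pmf, s) for s in subsets)
        averages.append(total / (len(subsets) * i))
    return averages


# Random joint pmf of L = 3 binary random variables (seeded for repeatability).
rng = random.Random(0)
L = 3
outcomes = list(itertools.product((0, 1), repeat=L))
weights = [rng.random() for _ in outcomes]
pmf = {o: w / sum(weights) for o, w in zip(outcomes, weights)}
averages = han_averages(pmf, L)
```

The last entry of `averages` equals $H(A_{1},\ldots,A_{L})/L$ , so the chain $h_{1}\geq\cdots\geq h_{L}$ in particular bounds the joint entropy by averaged subset entropies.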

Appendix C Proof of Lemma 17

We prove the lemma by contradiction. Fix a random tuple $(X,Y_{1},\ldots,Y_{K})$ satisfying (14b) and for each set $\mathcal{S}$ as in (15) a random tuple $U_{\mathcal{S},1},U_{\mathcal{S},2},\ldots,U_{\mathcal{S},|\mathcal{S}|-1}$ satisfying (64) and real numbers ${\alpha}_{\mathcal{S},1},\ldots,{\alpha}_{\mathcal{S},|\mathcal{S}|}$ satisfying (100).

Assume that for some set $\mathcal{S}$ as in (15) and some $\tilde{k}\in\{1,\ldots,|\mathcal{S}|-1\}$ :

[TABLE]

and

[TABLE]

Let

[TABLE]

Notice that by (111):

[TABLE]

Define the new parameters

[TABLE]

Notice that this new set of parameters satisfies constraints (100) when $\alpha_{\mathcal{S},1},\ldots,\alpha_{\mathcal{S},|\mathcal{S}|}$ are replaced by $\bar{\alpha}_{\mathcal{S},1},\ldots,\bar{\alpha}_{\mathcal{S},|\mathcal{S}|}$ . In particular,

[TABLE]

We will show that there exist new auxiliary random variables $\bar{U}_{\mathcal{S},1},\bar{U}_{\mathcal{S},2},\ldots,\bar{U}_{\mathcal{S},|\mathcal{S}|-1}$ satisfying the Markov chain (64), and so that upper bound (93) is looser for these new auxiliaries and the new parameters $\bar{\alpha}_{\mathcal{S},1},\ldots,\bar{\alpha}_{\mathcal{S},|\mathcal{S}|}$ than for the original auxiliaries $U_{\mathcal{S},1},\ldots,U_{\mathcal{S},|\mathcal{S}|-1}$ and parameters $\alpha_{\mathcal{S},1},\ldots,\alpha_{\mathcal{S},|\mathcal{S}|}$ .

To simplify notation in the following, we define

[TABLE]

Notice that since $\alpha_{\mathcal{S},\tilde{k}}\neq\alpha_{\mathcal{S},\tilde{k}+1}$ and by (100b), the strict inequality

[TABLE]

must hold. Choose

[TABLE]

and

[TABLE]

The choice of $\bar{U}_{\mathcal{S},\tilde{k}}$ depends on whether

[TABLE]

If (120a) holds, choose

[TABLE]

If (120b) holds, let $E\in\{0,1\}$ be a Bernoulli- $\beta$ random variable independent of everything else, where

[TABLE]

Choose

[TABLE]

Notice that in both cases the proposed choice satisfies the Markov chain $\bar{U}_{\mathcal{S},1}-\bar{U}_{\mathcal{S},2}-\cdots-\bar{U}_{\mathcal{S},|\mathcal{S}|-1}-X$ .

Trivially, for $k\notin\big{\{}\tilde{k},\tilde{k}+1\big{\}}$ , constraint (93) is unchanged if we replace $(U_{\mathcal{S},1},U_{\mathcal{S},2},\ldots,U_{\mathcal{S},|\mathcal{S}|-1},X)$ by $(\bar{U}_{\mathcal{S},1},\bar{U}_{\mathcal{S},2},\ldots,\bar{U}_{\mathcal{S},|\mathcal{S}|-1},{X})$ and $({\alpha}_{\mathcal{S},1},\ldots,{\alpha}_{\mathcal{S},|\mathcal{S}|})$ by $(\bar{\alpha}_{\mathcal{S},1},\ldots,\bar{\alpha}_{\mathcal{S},|\mathcal{S}|})$ .

If (120a) holds, then the proposed replacement relaxes constraint (93) for $k=\tilde{k}$ (because $\bar{\alpha}_{\mathcal{S},\tilde{k}}>{\alpha}_{\mathcal{S},\tilde{k}}$ ) and it tightens it for $k=\tilde{k}+1$ (because $\bar{\alpha}_{\mathcal{S},\tilde{k}+1}<{\alpha}_{\mathcal{S},\tilde{k}+1}$ ). However, the new constraint for $k=\tilde{k}+1$ is less stringent than the original constraint for $k=\tilde{k}$ :

[TABLE]

where (a) holds by (114c); (b) holds by (117); and (c) holds by assumption (120a). We conclude that when (120a) holds, the upper bound on $\mathsf{C}(\mathsf{M}_{1},\ldots,\mathsf{M}_{K})$ in (93) is relaxed if everywhere one replaces $(U_{\mathcal{S},1},U_{\mathcal{S},2},\ldots,U_{\mathcal{S},|\mathcal{S}|-1})$ and $({\alpha}_{\mathcal{S},1},\ldots,{\alpha}_{\mathcal{S},|\mathcal{S}|})$ by $(\bar{U}_{\mathcal{S},1},\bar{U}_{\mathcal{S},2},\ldots,\bar{U}_{\mathcal{S},|\mathcal{S}|-1})$ and $(\bar{\alpha}_{\mathcal{S},1},\ldots,\bar{\alpha}_{\mathcal{S},|\mathcal{S}|})$ .

We now assume that (120b) holds. We show that the new constraints obtained for $k=\tilde{k}$ and for $k=\tilde{k}+1$ cannot be more stringent than the tighter of the two original constraints for $k=\tilde{k}$ and $k=\tilde{k}+1$ .

Consider $k=\tilde{k}$ . By (122) and (123) we have

[TABLE]

By (114b) and (C):

[TABLE]

Let now $k=\tilde{k}+1$ . We have:

[TABLE]

where (a) follows by the definition of $\bar{U}_{\mathcal{S},\tilde{k}}$ and $\bar{U}_{\mathcal{S},\tilde{k}+1}$ ; (b) by the Markov chain (64); (c) by the chain rule of mutual information and Markov chain (64); (d) by the degradedness of the channel (14b); (e) by the definition of $\beta$ in (122).

Therefore, by (114c):

[TABLE]

We thus conclude that, also when (120b) holds, the upper bound on $\mathsf{C}(\mathsf{M}_{1},\ldots,\mathsf{M}_{K})$ in (93) is relaxed if one replaces $(U_{\mathcal{S},1},U_{\mathcal{S},2},\ldots,U_{\mathcal{S},|\mathcal{S}|-1})$ and $({\alpha}_{\mathcal{S},1},\ldots,{\alpha}_{\mathcal{S},|\mathcal{S}|})$ by $(\bar{U}_{\mathcal{S},1},\bar{U}_{\mathcal{S},2},\ldots,\bar{U}_{\mathcal{S},|\mathcal{S}|-1})$ and $(\bar{\alpha}_{\mathcal{S},1},\ldots,\bar{\alpha}_{\mathcal{S},|\mathcal{S}|})$ .

Appendix D Proof of Remark 3

We first prove that the bound in Theorem 5 is loosened when each $\alpha_{\mathcal{S},k}^{\star}$ is replaced by $\tilde{\alpha}_{\mathcal{S},k}$ . Consider the intermediate Lemma 16 in the proof of Theorem 5, Appendix A. Relax the upper bound in this lemma by replacing for $k=2,\ldots,K$ constraint (100a) by

[TABLE]

Following similar steps as in the proof of Lemma 17, see also [26, Lemma 12], it can be shown that this relaxed upper bound is not changed when one imposes that

[TABLE]

Since constraints (101) are increasing in $\alpha_{\mathcal{S},1},\ldots,\alpha_{\mathcal{S},|\mathcal{S}|}$ , by constraint (100c), we conclude that the relaxed upper bound is loosest for

[TABLE]

i.e., for $\alpha_{\mathcal{S},k}=\tilde{\alpha}_{\mathcal{S},k}$ .

We now prove that the bound in Theorem 5 is loosened when each $\alpha_{\mathcal{S},k}^{\star}$ is replaced by ${\alpha}_{\mathcal{S},k}^{\prime}$ . Consider again the intermediate Lemma 16 in Appendix A. Relax constraint (100a) by replacing it with $\alpha_{\mathcal{S},k}\geq 0$ , for all $k=1,\ldots,K$ . Following the steps in [26, Lemma 12], it can be shown that the new constraints are loosest if each

[TABLE]

This concludes the proof.

Appendix E Proof of Proposition 8

For $\Delta=0$ , achievability follows by specializing Theorem 3 to $t=K-1$ and to the input distribution $P_{X}$ that maximizes (70). In fact, for this input distribution:

[TABLE]

For $\Delta>0$ , achievability follows from Proposition 1.

The converse is proved as follows. Apply Theorem 5, but consider only the constraints (65) corresponding to the singleton sets $\mathcal{S}=\{k\}$ , for $k\in\mathcal{K}$ . Averaging the resulting $K$ constraints establishes that there exists a random tuple $(X,Y_{1},\ldots,Y_{K})$ satisfying (14b) and such that

[TABLE]

Maximizing the right-hand side over input distributions $P_{X}$ yields the desired converse.

Appendix F Proof of Proposition 12

Relax the upper bound in Theorem 5 by considering constraints (65) only for the set of all receivers $\mathcal{S}=\mathcal{K}$ , and by replacing each $\alpha_{\mathcal{S},k}^{\star}$ by $\tilde{\alpha}_{\mathcal{S},k}$ . Specializing the resulting relaxed bound to the erasure BC, one obtains the following upper bound:

[TABLE]

where the maximization is over the choice of parameters $\beta_{1},\beta_{2},\ldots,\beta_{K}\geq 0$ satisfying

[TABLE]

The upper bound in the proposition is established by solving this maximization problem. In fact, by noticing that the bound is increasing in $\beta_{1},\beta_{2},\ldots,\beta_{K}\geq 0$ , and by first fixing $\beta_{1}$ and optimizing over the choices $\beta_{2},\ldots,\beta_{K}\geq 0$ summing to $1-\beta_{1}$ , we obtain

[TABLE]

If

[TABLE]

then the maximum is achieved at $\beta_{1}=1$ and the upper bound results in

[TABLE]

Otherwise the maximum is at $\beta_{1}=\beta^{\star}$ , where

[TABLE]

and the upper bound results in

[TABLE]

where we used that for erasure BCs

[TABLE]
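The identity invoked in this last step is not reproduced in the text above. A standard property of erasure channels that is commonly used in such specializations is $I(U;Y)=(1-\delta)\,I(U;X)$ whenever $U - X - Y$ forms a Markov chain and $Y$ is the output of an erasure channel with erasure probability $\delta$. The Python sketch below checks this property numerically. The joint pmf, the value of `delta`, and the function names are illustrative assumptions, not quantities taken from the paper.

```python
from math import log2


def mutual_information(joint):
    """Return I(A;B) in bits for a joint pmf given as a nested list joint[a][b]."""
    pa = [sum(row) for row in joint]            # marginal of A
    pb = [sum(col) for col in zip(*joint)]      # marginal of B
    total = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:
                total += p * log2(p / (pa[a] * pb[b]))
    return total


def through_erasure(joint_ux, delta):
    """Joint pmf of (U, Y) when X is sent over an erasure channel.

    Y equals X with probability 1 - delta and equals an extra erasure
    symbol (stored in the last column) with probability delta.
    """
    return [[(1 - delta) * p for p in row] + [delta * sum(row)]
            for row in joint_ux]


# Hypothetical joint pmf of (U, X), chosen only for illustration.
joint_ux = [[0.4, 0.1],
            [0.1, 0.4]]
delta = 0.3

lhs = mutual_information(through_erasure(joint_ux, delta))   # I(U;Y)
rhs = (1 - delta) * mutual_information(joint_ux)             # (1 - delta) I(U;X)
gap = abs(lhs - rhs)
```

In this sketch `gap` is zero up to floating-point rounding, because the erasure symbol carries no information about $U$ and the unerased outputs preserve the likelihood ratios of $X$.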

Appendix G Proof of Corollary 13

Fix $t\in\mathcal{K}$ and $\mathcal{S}=\{1,\ldots,t\}$ . For the considered channel

[TABLE]

The upper bound in Corollary 6 thus states that for this noise-free BC a rate-memory tuple $(R,\mathsf{M}_{1},\ldots,\mathsf{M}_{K})$ is achievable only if

[TABLE]

This is equivalent to the following bound on the capacity-memory tradeoff

[TABLE]

Notice that the sum $\sum_{k=1}^{t}\alpha_{\mathcal{S},k}^{\star}$ takes on only two different values, depending on the outcomes of the minimizations defining $\alpha_{\mathcal{S},k}^{\star}$ . It is either

[TABLE]

Combining (142) with (143), applying the correspondence $\rho=R^{-1}$ and $m_{k}=\frac{\mathsf{M}_{k}}{R}$ , and setting $m_{1}=m_{2}=\ldots=m_{k}=m$ yields

[TABLE]

which is equivalent to the bound in the corollary.
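The correspondence $\rho=R^{-1}$ and $m_{k}=\mathsf{M}_{k}/R$ used in this proof is a plain change of variables between a rate-memory tuple and a normalized delivery-rate description. A minimal Python sketch follows; the function name `to_delivery_rate` and the sample numbers are hypothetical and serve only to illustrate the mapping.

```python
def to_delivery_rate(rate, memories):
    """Map a rate-memory tuple (R, M_1, ..., M_K) to (rho, m_1, ..., m_K).

    Uses rho = 1 / R and m_k = M_k / R, so all quantities are
    normalized by the message rate R.
    """
    if rate <= 0:
        raise ValueError("the rate R must be positive")
    rho = 1.0 / rate
    normalized = [m / rate for m in memories]
    return rho, normalized


# Illustrative values only: R = 0.5 and cache sizes (0.25, 0.5).
rho, m = to_delivery_rate(0.5, [0.25, 0.5])
```

With these sample inputs the sketch returns `rho == 2.0` and `m == [0.5, 1.0]`.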

Bibliography

[1] M. A. Maddah-Ali and U. Niesen, “Fundamental limits of caching,” IEEE Trans. on Inform. Theory, vol. 60, no. 5, pp. 2856–2867, May 2014.
[2] Z. Chen, P. Fan, and K. B. Letaief, “Fundamental limits of caching: Improved bounds for small buffer users,” IET Commun., vol. 10, no. 17, pp. 2315–2318, 2016.
[3] C. Tian, “A note on the fundamental limits of coded caching,” arXiv:1503.00010v1, Feb. 2015.
[4] K. Wan, D. Tuninetti, and P. Piantanida, “On the optimality of uncoded cache placement,” in Proc. IEEE ITW, Cambridge, UK, 2016, pp. 161–165.
[5] K. Wan, D. Tuninetti, and P. Piantanida, “On caching with more users than files,” in Proc. IEEE ISIT, Barcelona, Spain, July 2016, pp. 135–139.
[6] A. Sengupta, R. Tandon, and T. C. Clancy, “Improved approximation of storage-rate tradeoff for caching via new outer bounds,” in Proc. IEEE ISIT, Hong Kong, China, June 2015, pp. 1691–1695.
[7] S. Sahraei and M. Gastpar, “K users caching two files: An improved achievable rate,” in Proc. CISS, Mar. 2016, pp. 620–624.
[8] C. Tian and J. Chen, “Caching and delivery via interference elimination,” in Proc. IEEE ISIT, Barcelona, Spain, July 2016, pp. 830–834.
