Benefits of Cache Assignment on Degraded Broadcast Channels
Shirin Saeedi Bidokhti, Michele Wigger, and Aylin Yener

TL;DR
This paper investigates how cache memory assignment affects the capacity of degraded broadcast channels, deriving bounds and optimal strategies for different cache size regimes, showing significant gains from non-uniform cache allocation.
Contribution
It introduces new coding schemes for cache assignment, derives capacity bounds, and characterizes optimal cache allocation strategies for degraded broadcast channels.
Findings
Optimal cache assignment depends on cache size regime.
Non-uniform cache allocation outperforms uniform in most regimes.
Global caching gain is achievable with small cache sizes.
Shirin Saeedi Bidokhti, Michèle Wigger, and Aylin Yener
S. Saeedi Bidokhti is with the Department of Electrical Engineering at Stanford University, [email protected]. S. Saeedi Bidokhti is supported by the Swiss National Science Foundation fellowship no. 158487.
M. Wigger is with LTCI, Telecom ParisTech, Université Paris-Saclay, 75013 Paris, [email protected].
A. Yener is with the Department of Electrical Engineering, School of Electrical Engineering and Computer Science at The Pennsylvania State University and the Department of Electrical Engineering at Stanford University, [email protected], [email protected].
Parts of the material in this paper have been submitted to the IEEE International Conference on Communications, Paris, May 2017, and to the IEEE International Symposium on Information Theory, Aachen, Germany, June 2017.
Abstract
Degraded K-user broadcast channels (BC) are studied when receivers are facilitated with cache memories. Lower and upper bounds are derived on the capacity-memory tradeoff, i.e., on the largest rate of reliable communication over the BC as a function of the receivers’ cache sizes, and the bounds are shown to match for some special cases. The lower bounds are achieved by two new coding schemes that benefit from non-uniform cache assignment. Lower and upper bounds are also established on the global capacity-memory tradeoff, i.e., on the largest capacity-memory tradeoff that can be attained by optimizing the receivers’ cache sizes subject to a total cache memory budget. The bounds coincide when the total cache memory budget is sufficiently small or sufficiently large, characterized in terms of the BC statistics. For small cache memories, it is optimal to assign all the cache memory to the weakest receiver. In this regime, the global capacity-memory tradeoff grows as the total cache memory budget divided by the number of files in the system. In other words, a perfect global caching gain is achievable in this regime and the performance corresponds to a system where all cache contents in the network are available to all receivers. For large cache memories, it is optimal to assign a positive cache memory to every receiver such that the weaker receivers are assigned larger cache memories compared to the stronger receivers. In this regime, the growth rate of the global capacity-memory tradeoff is further divided by the number of users, which corresponds to a local caching gain. Numerical results suggest that a uniform cache assignment of the total cache memory is suboptimal in all regimes unless the BC is completely symmetric. For erasure BCs, this claim is proved analytically in the regime of small cache sizes.
I Introduction
Storing popular contents at or close to the end users improves the network performance during peak-traffic time. The main challenge is that the contents have to be cached before knowing which files the users will request in the peak-traffic period. A conventional approach is to store the same popular contents in the cache memories of the users. This allows the receivers to locally retrieve the contents without burdening the network. However, further caching gains, i.e., the so-called global caching gains, are possible if different contents are stored at different users [1]. Specifically, a careful design of the cache contents creates coding opportunities to simultaneously serve multiple users during the peak-traffic periods, henceforth called the delivery phase.
In this paper, we focus on the scenario depicted in Figure 1. A transmitter communicates with K receivers that are equipped with cache memories. The delivery-phase communication takes place over a noisy broadcast channel (BC).
The BC-model has previously been studied in [25, 27, 26, 28, 29, 30, 31, 36, 34, 35, 33, 32, 38, 37, 39, 40, 41, 42]. The simplified version where the BC is a common noise-free bit-pipe to all users was analyzed in [1, 2, 3, 4, 5, 6, 13, 8, 7, 12, 14, 9, 10, 11, 15, 16, 21, 22, 23, 24] under the assumption that all receivers have equal cache sizes, and in [15, 16] under the assumption that various receivers have different cache sizes. Caching was studied for many other scenarios, e.g., for interference networks [44, 45, 46], hierarchical networks [54, 56, 55], and cellular networks [47, 48, 46, 49, 50, 51, 52, 53].
In [25, 26, 27], the gains of caching in noisy broadcast networks were investigated. Specifically, we proposed a joint cache-channel coding scheme and focused on erasure BCs with two sets of receivers: a set of cache-aided weak receivers (where each channel has the same erasure probability) and a set of strong receivers without cache memories (where each channel has the same erasure probability). Previous works had adopted a separate cache-channel coding architecture where the encoders (resp. decoders) consist of a cache encoder (resp. decoder) that only exploits the cache contents and a channel encoder (resp. decoder) that only exploits the channel statistics; see Figure 2. By contrast, in a joint cache-channel coding scheme, the encoders and decoders simultaneously exploit the knowledge of the channel statistics and the cache contents, leading to improved performance.
The joint cache-channel coding scheme in [25, 26, 27] loads (piggybacks) the information that is intended for the strong receivers, but is already cached at the weaker receivers, onto the information that is communicated to the weak receivers. (Footnote 1: The proposed piggyback coding can be seen as a simplified version, without binning etc., of “Slepian-Wolf coding over broadcast channels” in [58], which applies to more general scenarios.) When the rate of the piggybacked information is modest, this can be done without harming the decoding performance at the strong receivers. In some sense, piggyback coding provides the stronger receivers virtual access to the weaker receivers’ cache memories as if these cache contents were locally present at the stronger receivers.
The previous works [25, 26, 27] have shown that when different receivers have different channel statistics, assigning larger cache memories to the weaker receivers significantly improves the performance compared to the traditional uniform cache assignment. In addition to mitigating the rate bottleneck at the weaker receivers, non-uniform cache assignment makes it possible to achieve new global caching gains by means of joint cache-channel coding [25, 26, 27, 32].
Motivated by the new gains of caching in noisy broadcast networks, in this work, we address the problem of efficient cache assignment in broadcast networks and devise two joint cache-channel coding schemes by using piggyback coding, superposition coding, and coded caching.
I-A Main Contributions and Implications
The main contributions of the paper are as follows:
- Superposition-Piggyback Coding: We generalize the piggyback-coding scheme of [26], which is specific to erasure BCs, to arbitrary BCs with a cache memory only at the weakest receiver, and account for different channel qualities in the network by employing superposition coding. We show that this scheme is optimal for small cache memory sizes.
- Generalized Coded-Caching: The coded-caching scheme in [1] is generalized to noisy BCs with unequal cache sizes. The scheme is optimal for a particular cache assignment.
- A New Converse Result: A general converse result is provided for degraded BCs with arbitrary cache sizes at the receivers. It strictly improves over the existing converse results for degraded BCs in [26, 28] and for the noise-free bit-pipe model in [1, 10, 6, 19, 9].
- Global Capacity-Memory Tradeoff: Lower and upper bounds are derived on the global capacity-memory tradeoff. They are shown to match when the total available cache memory is small or large. Suboptimality of the popular approach of assigning equal cache memory to all receivers is proved analytically for erasure BCs in the small cache size regime and shown numerically for erasure and Gaussian BCs in all regimes of cache sizes.
More specifically, we first propose a coding scheme that we call superposition piggyback-coding by assuming that only the weakest receiver has a cache memory. Using this scheme all receivers gain virtual access to the weakest receiver’s cache memory as if the cache contents were locally present at each of these receivers.
The second scheme generalizes the coded caching in [1] to account for different channel statistics and different cache sizes at the receivers. We assign larger cache sizes to the weaker receivers and use piggyback coding to transmit higher rates of information to the stronger receivers without harming the communication to the weaker receivers. As a consequence, the amount of the virtual cache memory that is provided to the stronger receivers increases compared to the original coded-caching scheme, resulting in an improved performance.
The performance criterion of interest in this paper is the capacity-memory tradeoff, i.e., the largest rate, as a function of the available cache memories, at which the transmitter can reliably send the messages demanded by the receivers over the noisy BC.
We present a new upper bound on the capacity-memory tradeoff of degraded BCs that improves the previous upper bound in [28, 26]. (Footnote 2: Since for our purposes only the conditional marginal distributions matter, it suffices that the BC is stochastically degraded [61].) Using the upper bound, we show the optimality of the superposition piggyback-coding scheme when only the weakest receiver has a cache memory and its size is below a certain threshold that depends on the BC statistics. Moreover, we show that the generalized coded-caching scheme is optimal for a particular cache assignment.
When the BC is a noise-free bit-pipe, the upper bound on the capacity-memory tradeoff leads to a lower bound on the delivery rate-memory tradeoff that improves the previous lower bounds in [1, 10, 6, 19, 9].
The upper bound is asymmetric in the cache sizes: the cache memories at the weaker receivers increase the upper bound more than the cache memories at the stronger receivers. In this sense, the upper bound reinforces the intuition obtained from the lower bounds that the capacity-memory tradeoff increases when larger cache memories are assigned to the weaker receivers as compared to the stronger receivers. To make this statement more precise, we derive upper and lower bounds on the global capacity-memory tradeoff, where one is allowed to optimize over the cache assignment subject to a global cache constraint. The lower bound is obtained using the following cache-assignment strategy and coding schemes:
- For a small total cache size, all of it is assigned to the weakest receiver, and superposition piggyback-coding is applied. This strategy is optimal in the small total cache-size regime and achieves a global capacity-memory tradeoff that grows as the total cache size divided by the total number of files. Thus, in this regime a perfect global caching gain is achieved, i.e., the same performance as in a system where all cache memories in the network are accessible by all the receivers.
- For a moderate total cache size, generalized coded-caching with a parameter t and the corresponding cache assignment is employed. The larger the total cache size, the larger the parameter t needs to be chosen. However, the larger t, the smaller the global caching gain, since the overlap of the different cache contents increases with t, and duplicated cache contents cannot provide global caching gain.
- When the total cache size equals the total cache memory used by generalized coded-caching for a particular value of the parameter t, generalized coded-caching is optimal. For total cache memories exceeding this threshold, it is optimal to uniformly assign the additional cache memory across the receivers. This additional cache memory can only bring local caching gain, and the same content can be stored at all the receivers. In other words, for total cache memory exceeding this threshold, the global capacity-memory tradeoff grows as the total cache size divided by the product of the number of files and the number of users.
Finally, this paper proves analytically that for erasure BCs a uniform cache allocation is strictly suboptimal in the regime of small cache memories, unless all receivers have the same channel statistics. Numerical simulations show that the same holds for all regimes of cache memory and also for Gaussian BCs.
I-B Notation
Random variables are denoted by uppercase letters, e.g. , their alphabets by matching calligraphic font, e.g. , and elements of an alphabet by lowercase letters, e.g. . We also use uppercase letters for deterministic quantities like rate , capacity , number of users , cache size , and number of files in the library . Vectors are identified by bold font symbols, e.g., , and matrices by the font . We use the shorthand notation for . The Cartesian product of and is , and the -fold Cartesian product of is . denotes the cardinality of .
Finally, for indices and taking value in \{1,\ldots,\lfloor 2^{\ell_{1}}\rfloor\} and , respectively, we denote by
[TABLE]
the index in that corresponds to the XOR of the length- binary representations of and , where .
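The XOR convention above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lengths `ell1` and `ell2` are taken to be integers, the common representation length is assumed to be their maximum, and the function name `xor_indices` is hypothetical.

```python
def xor_indices(w1: int, w2: int, ell1: int, ell2: int) -> int:
    """XOR two 1-based message indices via their binary representations."""
    ell = max(ell1, ell2)  # assumed common representation length
    if not (1 <= w1 <= 2 ** ell1 and 1 <= w2 <= 2 ** ell2):
        raise ValueError("index out of range")
    # Shift to 0-based values so that each index fits into ell bits.
    bits = (w1 - 1) ^ (w2 - 1)
    return bits + 1  # back to a 1-based index in {1, ..., 2**ell}


# XOR-ing twice with the same index recovers the original index.
w = xor_indices(5, 2, ell1=3, ell2=1)
recovered = xor_indices(w, 2, ell1=3, ell2=1)
```

The shorter representation is implicitly zero-padded, which is why a receiver that knows one of the two indices can always recover the other.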
We will be using the abbreviation i.i.d. for independent and identically distributed.
I-C Outline
The remainder of the paper is organized as follows. Section II describes the problem setup. Section III recalls known results for the scenario without cache memories. The main results of this paper are described in Sections IV and V, followed by applications of these results to erasure and Gaussian BCs in Section VI. The paper concludes with a summary in Section VII, and the technical appendices contain the proofs of the results in Sections V and VI.
II Problem Definition
Consider a transmitter and K receivers. The transmitter has access to a library with independent messages, , each distributed uniformly over the set \{1,\ldots,\lfloor 2^{nR}\rfloor\}. Here, R denotes the rate of transmission and n is the transmission blocklength. We assume that there are more messages than receivers:
[TABLE]
Each receiver is equipped with a cache of size . Communication takes place in two phases. For the first, i.e., the placement phase, the transmitter chooses caching functions
[TABLE]
and places
[TABLE]
in receiver ’s cache. This phase takes place in a noiseless fashion. (Footnote 3: Following previous works on caching systems, we also assume that the placement phase takes place in low-traffic hours with an abundance of bandwidth resources, and can be considered noiseless.)
The subsequent delivery phase takes place over a degraded BC [59] with finite input alphabet , finite output alphabets , (Footnote 4: The results of this paper readily extend to continuous alphabets. We will consider Gaussian BCs in Section VI-C.) and channel transition law
[TABLE]
which decomposes as
[TABLE]
Without loss of generality, we order the receivers from the weakest to the strongest.
At the beginning of the delivery phase, each receiver demands message , . The transmitter and all the receivers are informed of the demand vector
[TABLE]
Using this information, the transmitter forms the channel input sequence as
[TABLE]
for some encoding function
Receiver observes the channel output sequence . Given the demand vector , cache content , and channel outputs , it produces its estimate of the desired message ,
[TABLE]
by means of a decoding function
[TABLE]
The worst-case probability of error at any receiver and any demand is given by
[TABLE]
A rate-memory tuple is achievable if for any there exists a sufficiently large blocklength and caching, encoding, and decoding functions as in (3), (6), and (7) so that .
Definition 1
The capacity-memory tradeoff is the largest rate for which the rate-memory tuple is achievable:
[TABLE]
Our main goal in this paper is to optimize the cache assignment to attain the largest capacity-memory tradeoff under the total cache constraint:
[TABLE]
Definition 2
The global capacity-memory tradeoff is defined as:
[TABLE]
Remark 1
The global capacity-memory tradeoff depends on the BC law only through its marginal conditional laws. All our results thus also apply to stochastically degraded BCs.
II-A Minimum Delivery Rate
Previous works on caching that modelled the BC as a noise-free bit-pipe, e.g., [1], adopted a “source-coding perspective” as opposed to a “channel coding perspective” as we have presented above. In the source coding perspective, each message is an bits packet, the delivery communication consists of channel uses, and receiver has bits of cache memory, . The delivery rate is said to be achievable given normalized memory sizes if there exist caching, encoding, and decoding functions such that the probability of error in (9) tends to 0 as .
The following correspondence holds between the two perspectives:
[TABLE]
For simplicity, we will adopt the “source-coding perspective” in Section VI-B where we specialize the new upper bound on the capacity-memory tradeoff to the noise-free bit-pipe model with uniform cache assignment in [1]. For other BCs, we use the “channel-coding perspective” in line with similar setups in network information theory.
III Preliminaries: Capacities without Cache Memories
In the absence of cache memories,
[TABLE]
the capacity-memory tradeoff is well known: It is the largest symmetric rate with which independent messages can be reliably sent to the receivers. I.e.,
[TABLE]
where [59]:
[TABLE]
and the maximization in (13) is over all random tuples forming the Markov chain
[TABLE]
To present the results in this paper, we will need the capacity region without cache memories of the BC to a subset of the receivers
[TABLE]
This capacity region [59] is given by the set of all nonnegative rate-tuples for which there exist random variables satisfying (14b) and forming the Markov chain
[TABLE]
such that the following conditions hold:
[TABLE]
We denote by the largest symmetric rate in :
[TABLE]
It equals
[TABLE]
where the maximization is over all random tuples that satisfy (14b) and (16).
Notice that is simply the point-to-point capacity to receiver and we will abbreviate it as .
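As a concrete illustration of the symmetric rates above, consider a binary erasure BC, for which the capacity region without caches is the standard time-sharing region: the sum of R_k / (1 - delta_k) over the served receivers is at most 1, where delta_k is receiver k's erasure probability. The sketch below computes the resulting largest symmetric rate for a subset of receivers. The function name `symmetric_capacity` and the numerical erasure probabilities are illustrative assumptions, not values from this paper.

```python
from fractions import Fraction


def symmetric_capacity(erasure_probs, subset=None):
    """Largest symmetric rate to the receivers in `subset` (0-based indices)
    of a binary erasure BC, assuming the time-sharing region
    sum_k R_k / (1 - delta_k) <= 1."""
    if subset is None:
        subset = range(len(erasure_probs))
    load = sum(Fraction(1) / (1 - Fraction(erasure_probs[k])) for k in subset)
    return 1 / load


# Hypothetical two-receiver example: receiver 0 is the weaker one.
deltas = [Fraction(1, 2), Fraction(1, 4)]
c_all = symmetric_capacity(deltas)             # both receivers served
c_weak = symmetric_capacity(deltas, [0])       # point-to-point, weak receiver
c_strong = symmetric_capacity(deltas, [1])     # point-to-point, strong receiver
```

For a single receiver the expression reduces to the point-to-point erasure-channel capacity 1 - delta_k, in line with the remark above.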
IV Coding Schemes and Lower Bounds on the (Global) Capacity-Memory Tradeoff
IV-A The Local Caching Gain
The simplest way to use receiver cache memories is to store the same information at each and every receiver. This allows the receivers to retrieve this information locally, without transmission over the BC. Further global caching gains are not possible under this caching strategy.
Applying the described caching strategy to only a part of the cache memory that is of size , while allowing a smarter use of the remaining memory, leads to the following proposition, see also [47, Proposition 1].
Proposition 1 (Local caching gain)
For all and :
[TABLE]
As a consequence, for all and :
[TABLE]
We will see that in some regimes this lower bound is tight.
IV-B Superposition Piggyback-Coding
We generalize the piggyback coding for erasure BCs in [26, 28] to general degraded BCs by introducing superposition coding. The idea is to piggyback information of multiple stronger receivers on that of a single weak receiver. This scheme is efficient when a receiver is strictly weaker than the others. Specifically, we assume
[TABLE]
where is a random -tuple that achieves the symmetric-capacity , i.e., it is a solution to the optimization problem in (13).
Preliminaries: Let be arbitrarily small, and define the rates
[TABLE]
The RHS of (23b) is positive by (22).
Split each message , , into two parts:
[TABLE]
where and are of rates and , and thus the total message rate is
[TABLE]
Define
[TABLE]
and allocate the cache size
[TABLE]
Placement Phase: Store in the cache memory of receiver . This is possible by (26a).
Delivery Phase: For the transmission in the delivery phase, construct a -level superposition code with a cloud center of rate and satellites of rates in Levels . For the code construction, use a probability distribution
[TABLE]
that achieves .
It will be convenient to arrange the codewords in the cloud center in an array with columns and rows. The columns are used to encode message and the rows to encode the message tuple
[TABLE]
The -th level satellite is used to encode message , for . See Figure 3 for an illustration of the code construction.
Let denote the cloud-center codeword of in column and row . Similarly, let denote the Level- satellite codeword of that corresponds to the cloud center codeword and to the -th, -th ,-th, etc. satellite codewords in Levels .
The transmitter chooses and sends the codeword
[TABLE]
over the channel.
Decoding: Receiver , decodes all messages in Levels . Recall that its desired message parts and are encoded in levels and (i.e., the cloud center), respectively.
Receiver only has to decode , because it can retrieve directly from its cache memory. To decode it performs the following steps:
1. It retrieves the message-tuple from its cache memory.
2. It forms the subcodebook that contains all level- codewords that are “compatible” with the retrieved tuple :
[TABLE]
Figure 3 illustrates such a subcodebook in red.
3. It decodes its desired message using an optimal decoding rule for subcodebook .
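The three decoding steps at receiver 1 can be illustrated with a toy Python sketch. It is not the paper's construction: the cloud-center codewords here are deterministic binary expansions rather than i.i.d. random codewords, minimum Hamming distance stands in for an optimal decoding rule, the satellite levels are ignored, and all names and array sizes are hypothetical. The point is only that knowing the row index from the cache shrinks the search to a single row of the array.

```python
def codeword(col, row, n_cols, length):
    """Toy codeword: binary expansion (LSB first) of the cell position."""
    idx = row * n_cols + col
    return tuple((idx >> b) & 1 for b in range(length))


def hamming(a, b):
    """Number of positions in which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode_with_cache(received, cached_row, n_cols, length):
    """Steps 1-3: restrict the search to the row given by the cached tuple."""
    sub = {c: codeword(c, cached_row, n_cols, length) for c in range(n_cols)}
    return min(sub, key=lambda c: hamming(received, sub[c]))


def decode_without_cache(received, n_cols, n_rows, length):
    """Search the whole array; returns the pair (col, row)."""
    cells = [(c, r) for r in range(n_rows) for c in range(n_cols)]
    return min(cells, key=lambda cr: hamming(
        received, codeword(cr[0], cr[1], n_cols, length)))


N_COLS, N_ROWS, LENGTH = 4, 8, 5
sent = codeword(2, 5, N_COLS, LENGTH)      # column 2, row 5
noisy = sent[:4] + (1 - sent[4],)          # flip the last symbol
```

In this toy setting the cache-aided decoder still returns column 2 from `noisy`, while the decoder that searches all 32 cells is misled to a different row, which mirrors the fact that the subcodebook has a smaller rate than the full cloud-center codebook.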
Error Analysis: Each receiver reliably decodes messages and if the following inequalities hold:
[TABLE]
One can verify that for degraded BCs the choice of and in (23) satisfies the constraints in (29).
Finally, receiver 1 can decode with arbitrarily small probability of error because subcodebook contains codewords that are generated i.i.d. according to and because
[TABLE]
Letting , we obtain the following result.
Theorem 2
Under cache assignment (26), we have
[TABLE]
Remark 2
Since receivers can always choose to ignore their cache memories, and because the superposition piggyback coding scheme can be time- and memory-shared with a no-caching scheme, Theorem 2 remains valid for all
[TABLE]
We will see in Corollary 7 ahead, that (30) holds with equality for all provided that .
The RHS of (30) coincides with the capacity-memory tradeoff of a scenario where each and every receiver has access to receiver 1’s cache memory. Superposition piggyback coding can thus be viewed as a coding technique that virtually provides all stronger receivers access to the weakest receiver’s cache memory. This is achieved by transmitting the extra-message tuple in the cloud center and by adapting the decoding at receiver 1 in a way that this additional communication does not influence its decoding performance.
IV-C Generalized Coded-Caching
We generalize the coded-caching scheme of [1] to noisy BCs with unequal channel conditions and to receivers with unequal cache sizes.
We first explain the scheme for a simple special case.
IV-C1 Special Case and
Fix an input distribution and a small , and define the rates
[TABLE]
Notice that by the degradedness of the BC:
[TABLE]
Fix a blocklength and generate a random codebook
[TABLE]
by choosing all entries i.i.d. according to . The codebook is revealed to all terminals of the network.
Allocate cache memories
[TABLE]
to receivers 1 and 2, respectively.
Split each message , for , into two parts:
[TABLE]
which are of rates and , respectively.
In the caching phase, the transmitter stores messages
[TABLE]
in receiver 1’s cache memory and messages
[TABLE]
in receiver 2’s cache memory. This is possible given the cache assignment in (37).
In the delivery phase the transmitter uses codebook to send the XOR message (Footnote 5: Recall that in Section I-B we defined the XOR operation over the binary representations of two messages of the same length.)
[TABLE]
to both receivers using the codeword
[TABLE]
Receiver 2 decodes the XOR-message, and XORs the decoded message with , which it has stored in its cache memory. It then combines this guess of with the message from its cache memory.
Receiver 1 performs joint cache-channel decoding where it can exploit that it has more cache memory than receiver 2. Specifically, it retrieves from its cache memory, and extracts a subcodebook containing all codewords that are compatible with :
[TABLE]
Note that subcodebook is of rate which is smaller than the rate of the original codebook .
Receiver 1 then decodes the XOR message in (38) using an optimal decoding rule for this subcodebook , and it XORs the decoded message with , which it has stored in its cache memory. It then combines the resulting guess of with the message from its cache memory.
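The message-level logic of this two-receiver scheme can be sketched in Python. The sketch makes several assumptions that are not taken from the text: each file is split into a long part (assumed cached at the weaker receiver 1) and a short part (assumed cached at receiver 2), the part lengths and library contents are made up, and the noisy channel is omitted so that only the caching and XOR bookkeeping is shown.

```python
LONG_BITS, SHORT_BITS = 3, 2  # hypothetical part lengths in bits

# Hypothetical library: file name -> (long part, short part) as integers.
library = {"A": (0b101, 0b10), "B": (0b011, 0b01), "C": (0b110, 0b11)}

cache1 = {f: parts[0] for f, parts in library.items()}  # receiver 1: long parts
cache2 = {f: parts[1] for f, parts in library.items()}  # receiver 2: short parts


def xor_message(d1, d2):
    """XOR of receiver 1's missing short part and receiver 2's missing long part."""
    return library[d1][1] ^ library[d2][0]


def receiver2(xor_msg, d1, d2):
    """Remove the cached short part of file d1, then add the cached part of d2."""
    return (xor_msg ^ cache2[d1], cache2[d2])


def receiver1_subcodebook(d2):
    """XOR messages compatible with receiver 1's cached long part of file d2."""
    return [s ^ cache1[d2] for s in range(2 ** SHORT_BITS)]


def receiver1(xor_msg, d1, d2):
    """Remove the cached long part of file d2, then add the cached part of d1."""
    return (cache1[d1], xor_msg ^ cache1[d2])


x = xor_message("A", "B")        # receiver 1 demands A, receiver 2 demands B
candidates = receiver1_subcodebook("B")
```

Receiver 1 only has to distinguish `2 ** SHORT_BITS` candidates instead of `2 ** LONG_BITS`, which is the message-level counterpart of decoding within the lower-rate subcodebook.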
Since both receivers correctly guess their desired messages and whenever they successfully decode the XOR-message in (38), and since the rate of the original codebook satisfies
[TABLE]
and the rate of the subcodebook satisfies
[TABLE]
the probability of decoding error at both receivers tends to 0 as the blocklength tends to infinity.
Letting , we conclude that for the rate-memory triple
[TABLE]
is achievable.
Notice that the weaker receiver 1 is assigned a larger cache memory than the stronger receiver 2:
[TABLE]
The described scheme can also be applied with a uniform cache assignment , although at the cost of a decreased achievable rate . In fact, assigning a larger cache memory to receiver 1 makes it possible to transmit more information to receiver 2 during the communication to receiver 1.
IV-C2 General Scheme
We will need the following definitions. Let for each
[TABLE]
Pick a small number and an input distribution . Pick further a parameter , and assign the following cache size to receiver :
[TABLE]
Notice that
[TABLE]
so the weaker a receiver is, the larger the cache memory it is assigned.
Split each message into independent submessages:
[TABLE]
where each submessage is of rate
[TABLE]
The total message rate is thus
[TABLE]
Notice that when the denominator of (44), (45), and (46) all equal .
Placement Phase: For each , store the tuple
[TABLE]
in the cache memory of receiver . This is possible by (45) and the cache assignment in (44).
Delivery Phase: Transmission in the delivery phase takes place in \binom{K}{t+1} subphases.
A given subphase j\in\{1,\ldots,\binom{K}{t+1}\} is of length
[TABLE]
and is used to transmit messages
[TABLE]
to the intended receivers in . For this purpose, the transmitter creates the XOR message
[TABLE]
which is of rate
[TABLE]
and generates a codebook
[TABLE]
by drawing all entries i.i.d. according to .
The transmitter then sends the codeword
[TABLE]
over the channel.
We now describe the decoding. Each receiver can retrieve messages
[TABLE]
directly from its cache, see (47), and thus only needs to decode messages
[TABLE]
For each and , receiver decodes message from its subphase- outputs
[TABLE]
Specifically, with the messages stored in its cache memory, it forms the XOR message
[TABLE]
and it extracts a subcodebook from that contains all codewords that are compatible with :
[TABLE]
It then decodes the XOR message by applying an optimal decoding rule for subcodebook to the subphase- outputs , and XORs the resulting guess with to obtain
[TABLE]
After the last sub-phase , each receiver has decoded all its missing messages in (55), and can thus produce a final guess of message .
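The combinatorial structure of the placement and delivery phases can be sketched as follows. The sketch assumes the subset-indexed structure of the coded-caching scheme in [1]: one submessage per size-t receiver subset, and one XOR per size-(t+1) subset. For simplicity all submessages are equal-length integers and the channel is omitted, whereas the scheme described above uses submessages of different rates, subphases of different lengths, and joint cache-channel decoding. All function names and the library contents are hypothetical.

```python
from functools import reduce
from itertools import combinations
from math import comb


def without(subset, k):
    """The sorted tuple `subset` with receiver k removed."""
    return tuple(r for r in subset if r != k)


def placement(K, t, files):
    """Receiver k stores every submessage whose index set contains k."""
    return {k: {(f, G): v for f, parts in files.items()
                for G, v in parts.items() if k in G}
            for k in range(1, K + 1)}


def delivery(K, t, files, demands):
    """One XOR message per (t+1)-subset S of receivers."""
    return {S: reduce(lambda a, b: a ^ b,
                      (files[demands[k]][without(S, k)] for k in S))
            for S in combinations(range(1, K + 1), t + 1)}


def decode(k, caches, xors, demands):
    """Receiver k strips the cached terms from each XOR that involves it."""
    got = {G: v for (f, G), v in caches[k].items() if f == demands[k]}
    for S, x in xors.items():
        if k in S:
            for j in without(S, k):
                x ^= caches[k][(demands[j], without(S, j))]
            got[without(S, k)] = x
    return got


K, t = 3, 1
subsets = list(combinations(range(1, K + 1), t))
files = {name: {G: (17 * i + 5 * j) % 16 for j, G in enumerate(subsets)}
         for i, name in enumerate("ABC")}
demands = {1: "A", 2: "B", 3: "A"}
caches = placement(K, t, files)
xors = delivery(K, t, files, demands)
```

Each receiver ends up with all submessages of its demanded file: those indexed by subsets containing it come from the cache, and every remaining one is recovered from exactly one XOR message.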
Error Analysis: If each XOR-message is decoded correctly by all its intended receivers in , , then all receivers produce the correct estimate of their desired messages .
The probability that receiver wrongly decodes the XOR message tends to 0 as (and thus ) because the rate of the subcodebook satisfies
[TABLE]
By letting , we conclude the following result.
Theorem 3
Fix a and an input distribution , and consider the corresponding cache assignment in (44). Then,
[TABLE]
where is calculated from as described in (46).
As we will see in Corollary 8, the inequality in (58) holds with equality for .
IV-D Lower Bound on
Proposition 1 and Theorems 2 and 3 readily yield a lower bound on . As we will see in Corollary 11 ahead, this lower bound is exact in the regimes of small and large total cache size .
Let
[TABLE]
Proposition 4
For any , all rate-memory pairs in (59) are achievable. By time- and memory-sharing arguments, the upper-convex envelope of all these rate-memory pairs lower bounds :
[TABLE]
Notice that for any :
[TABLE]
and
[TABLE]
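The time- and memory-sharing argument behind Proposition 4 amounts to taking the upper concave envelope of a finite set of achievable (total memory, rate) pairs. A minimal Python sketch of that envelope computation is given below. The numerical pairs are made up for illustration and are not results from this paper, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Cross product of the vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def upper_envelope(points):
    """Vertices of the upper concave envelope of (memory, rate) pairs."""
    best = {}
    for m, r in points:            # keep the largest rate for each memory value
        best[m] = max(r, best.get(m, r))
    hull = []
    for p in sorted(best.items()):
        # Drop the last vertex while it lies on or below the chord to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def rate_at(hull, m):
    """Rate achieved at total memory m by sharing between adjacent vertices."""
    for (m0, r0), (m1, r1) in zip(hull, hull[1:]):
        if m0 <= m <= m1:
            return r0 + (r1 - r0) * (m - m0) / (m1 - m0)
    raise ValueError("memory value outside the range of the given pairs")


# Hypothetical achievable pairs (not taken from the paper).
pairs = [(0, 1), (1, 3), (2, 2.5), (3, 4)]
hull = upper_envelope(pairs)
```

Here the pair (2, 2.5) is dominated by sharing between (1, 3) and (3, 4), so it does not appear among the envelope's vertices.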
V Upper Bounds and Exact Results on Global Capacity-Memory Tradeoff
V-A Results on
The upper bound is formulated in terms of the following parameters. For each receiver set as in (15), define
[TABLE]
Theorem 5
There exist random variables and for every receiver set as in (15) random variables so that the channel law (14b) and the Markov chain
[TABLE]
hold and so that for each :
[TABLE]
Proof:
See Appendix A.∎
Without cache memories, , the parameters equal 0 for all , and the upper bound in Theorem 5 recovers the exact capacity-memory tradeoff in (13).
The upper bound in Theorem 5 is asymmetric in the different cache sizes , because the parameters are not symmetric. In fact, increasing the cache memories at weaker receivers generally increases the upper bound more than increasing the cache memories at stronger receivers.
The converse in Theorem 5 is weakened if constraints (65) are ignored for certain receiver sets , or if in these constraints the input/output random variables are allowed to depend on the receiver set . For this latter relaxation, Theorem 5 results in the following corollary.
Corollary 6
Given cache sizes , rate is achievable only if for every receiver set :
[TABLE]
where denotes the capacity region to receivers in (ignoring receivers in ) when there are no cache memories.
Remark 3
The upper bounds of Theorem 5 and Corollary 6 are relaxed when each is replaced by , where
[TABLE]
The same holds if each is replaced by
[TABLE]
Replacing in Corollary 6 each parameter by recovers the previous upper bound in [26, Theorem 9] and [28, Theorem 1].
Proof:
The proof requires a close inspection of the proof of Theorem 5 in Appendix A. See Appendix D. ∎
By comparing the new upper bounds with the three achievability results in the previous Section IV, the exact expression for can be obtained in some special cases.
The following corollary states that superposition piggyback coding is optimal when only receiver 1 has a cache memory and this cache memory is small.
Corollary 7
Under a cache assignment satisfying
[TABLE]
the capacity-memory tradeoff is
[TABLE]
Proof:
Achievability follows from Theorem 2. The converse follows from Corollary 6, where it suffices to consider only the set . In fact, under (69), ∎
The next proposition states that generalized coded caching with parameter is optimal under the corresponding cache assignment. Moreover, any extra cache memory that is uniformly distributed over the receivers only brings local caching gain.
Proposition 8
For each , let be given by (44) when is chosen as a maximizer of
[TABLE]
For any :
[TABLE]
Proof:
See Appendix E. ∎
V-B Results on
Theorem 5 directly yields the following result.
Proposition 9
There exist random variables and for every receiver set as in (15) random variables , such that (14b) and (64) hold, and such that for some summing to and all :
[TABLE]
where are defined in (63).
Solving this optimization problem numerically is computationally complex. Simpler, albeit looser, upper bounds can be obtained by ignoring some of the constraints (72), by replacing each parameter in (72) by or by , or by allowing in (72) to depend on the set .
The following corollary presents a simpler bound that is obtained this way. Recall the definitions in (43).
Corollary 10
For each :
[TABLE]
Proof:
Fix . For each , specialize Corollary 6 to and relax it by replacing each parameter by . Since , we obtain
[TABLE]
Now, averaging bound (74) over all indices and upper bounding the sum by yields the desired result in the corollary. ∎
The last result of this section contains two more simple upper bounds on . For small total cache size one of them is achieved by assigning the entire cache memory to the weakest receiver and applying superposition piggyback coding. For large total cache size the other is achieved by generalized coded caching with parameter , and by first applying the cache assignment corresponding to this scheme followed by a uniform cache assignment of any remaining cache memory.
Corollary 11
For total cache size :
[TABLE]
and
[TABLE]
For small cache sizes,
[TABLE]
(75) holds with equality.
For large cache sizes,
[TABLE]
(76) holds with equality.
Proof:
Upper bound (75) follows by specializing Corollary 10 to . Upper bound (76) is proved as follows. Relax Proposition 9 by replacing each parameter by and considering only the constraints (72) that correspond to sets , for . Finally, average the resulting inequalities and maximize over the input distribution .
The tightness of (75) for follows from Theorem 2. The tightness of (76) for follows from Proposition 8 because
[TABLE]
∎
We remark that for small total cache sizes, grows as . This corresponds to a perfect global caching gain, i.e., the same performance as in a system where each receiver can directly access all cache contents in the network. For large total cache sizes, grows only as . This corresponds to the local caching gain achieved by Proposition 1.
VI Examples
VI-A Erasure BCs
We specialize our results to erasure BCs where at time receiver ’s output equals the channel input with probability and it equals an erasure symbol “?” with probability . The erasure probabilities satisfy:
[TABLE]
For erasure BCs,
[TABLE]
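As a rough illustration of this channel model, the sketch below simulates a K-receiver erasure BC and evaluates the largest equal rate that all receivers can be served at without caches, using the standard capacity region of the erasure BC, in which the rates divided by the respective non-erasure probabilities sum to at most one (rates normalized by the logarithm of the input alphabet size). The name `deltas` for the list of erasure probabilities, the function names, and the assumption that erasures are drawn independently across receivers are choices made here for illustration; the sketch is not a reconstruction of the displayed expressions.

```python
import random

ERASURE = "?"


def erasure_bc_outputs(inputs, deltas, rng):
    """Return one output sequence per receiver for the given input sequence.

    Receiver k sees each input symbol with probability 1 - deltas[k]
    and the erasure symbol otherwise (independently, by assumption).
    """
    return [
        [ERASURE if rng.random() < delta else symbol for symbol in inputs]
        for delta in deltas
    ]


def equal_rate(deltas):
    """Largest common rate R with sum_k R / (1 - delta_k) <= 1 (needs delta_k < 1)."""
    return 1.0 / sum(1.0 / (1.0 - delta) for delta in deltas)


rng = random.Random(1)
outputs = erasure_bc_outputs([0, 1, 1, 0], [0.0, 1.0], rng)
```

With erasure probability 0 the first receiver sees the input unchanged, and with erasure probability 1 the second receiver sees only erasures.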
Moreover, a Bernoulli- input distribution maximizes and simultaneously for all and auxiliaries that form the Markov chain . Therefore, Theorem 5 and Corollary 6 coincide. Also,
[TABLE]
Figure 4 depicts the upper and lower bounds on in Propositions 4 and 9. For comparison, the upper bound in Theorem 5 under a uniform cache assignment
[TABLE]
is plotted. This shows numerically that a smart allocation of the total cache memory significantly increases the global capacity-memory tradeoff of erasure BCs when different receivers have different erasure probabilities.
Analytically, we can prove that for small total cache size any cache assignment that does not allocate all cache memory to the weakest receiver is suboptimal on the erasure BC. This follows from the achievability in Corollary 11 and the following Proposition 12.
Proposition 12
For given and ,
[TABLE]
The RHS of (82) is strictly less than unless or .
Proof:
See Appendix F. ∎
VI-B Noise-Free Bit-Pipe
Consider now the noise-free bit-pipe model with uniform cache assignment in [1]. It corresponds to an erasure BC where each receiver has zero erasure probability,
[TABLE]
We adopt the “source-coding perspective” of [1], and assume equal cache size
[TABLE]
From the upper bound on in Theorem 5, the following lower bound on the minimum achievable delivery rate can be obtained as a function of the normalized symmetric cache size :
Corollary 13
For the noise-free bit-pipe model in [1]:
[TABLE]
Proof:
See Appendix G. ∎
Figure 5 compares this new converse result on with the existing converse results in [1], [9], and [10], and with the achievability result in [43]. The converse result in [10] is generally cumbersome to evaluate. The plot shows the numerical value calculated in [10].
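For orientation, the classical baseline in this setting is the coded caching scheme of [1]. With N files, K users, and N ≥ K, its delivery rate at the cache sizes M = tN/K, t = 0, …, K, equals (K − t)/(t + 1) files. The sketch below lists these corner points. It illustrates the scheme of [1] only; it is neither the converse of Corollary 13 nor the achievability result of [43], and the parameter names follow [1] rather than the notation of this section.

```python
from fractions import Fraction


def man_corner_points(num_users, num_files):
    """Corner points (cache size, delivery rate) of the scheme in [1], both in files."""
    if num_files < num_users:
        raise ValueError("this sketch covers only the case num_files >= num_users")
    points = []
    for t in range(num_users + 1):
        memory = Fraction(t * num_files, num_users)
        rate = Fraction(num_users - t, t + 1)
        points.append((memory, rate))
    return points


corner_points = man_corner_points(2, 2)
```

Rates between corner points follow by memory-sharing, that is, by linear interpolation between neighbouring points.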
VI-C Gaussian BCs
Finally, we specialize our results to memoryless Gaussian BCs. At time , the received symbol at receiver is
[TABLE]
where is the input to the channel and is an i.i.d. Gaussian process with zero mean and variance . The channel inputs are subject to an average block-power constraint . The receivers are ordered in increasing strength:
[TABLE]
By [60], for every set as defined in (15),
[TABLE]
where form the unique choice of real numbers in that sum to and satisfy
[TABLE]
In particular,
[TABLE]
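A numerical sketch may help here. Assuming that the quantity in question is the largest rate that can be sent simultaneously to every receiver in the set, it can be computed by superposition coding: each receiver decodes and cancels the signals meant for weaker receivers and treats the signals meant for stronger receivers as noise. For a target common rate, the required powers follow recursively from the strongest receiver downwards, and a bisection finds the rate at which they sum to the power constraint. The function names and the bisection are choices made here for illustration, not the closed-form characterization of the text.

```python
import math


def powers_for_rate(rate, noise_vars):
    """Powers giving every receiver `rate` bits per channel use.

    Powers are returned strongest receiver first (ascending noise variance).
    """
    gain = 2.0 ** (2.0 * rate) - 1.0
    powers, interference = [], 0.0
    for var in sorted(noise_vars):
        # A receiver is interfered only by signals meant for stronger receivers.
        p = gain * (var + interference)
        powers.append(p)
        interference += p
    return powers


def equal_rate_gaussian(power, noise_vars, iterations=200):
    """Largest common rate whose required total power does not exceed `power`."""
    low, high = 0.0, 0.5 * math.log2(1.0 + power / min(noise_vars))
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if sum(powers_for_rate(mid, noise_vars)) <= power:
            low = mid
        else:
            high = mid
    return low


rate = equal_rate_gaussian(15.0, [1.0, 1.0])
fractions = [p / 15.0 for p in powers_for_rate(rate, [1.0, 1.0])]
```

For two receivers with unit noise variance and power 15, the common rate is one bit per channel use, and the power fractions sum to one.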
Moreover, given a power constraint , a zero-mean variance- Gaussian input distribution maximizes and simultaneously for all and auxiliaries that form the Markov chain . Therefore, Theorem 5 and Corollary 6 coincide. Also,
[TABLE]
Figure 6 shows the upper and lower bounds on in Propositions 4 and 9. The five blue points indicate the rate-memory points , , , , and for a zero-mean variance- Gaussian distribution . For comparison, the figure also shows the upper bound in Theorem 5 for a setup with uniform cache assignment across all receivers. We observe that a smart cache assignment provides substantial gains in the capacity-memory tradeoff.
VII Summary and Conclusion
We have provided close upper and lower bounds on the global capacity-memory tradeoff of degraded BCs. The bounds coincide in the regimes of small and large total cache memory, with thresholds depending on the BC statistics. For small cache memory sizes, all of the cache memory must be assigned to the weakest receiver. In this regime, grows as , which corresponds to a perfect global caching gain where all receivers can benefit from all the cache contents of the network. This performance is achieved by the proposed superposition piggyback coding scheme, which provides each receiver virtual access to the weakest receiver’s cache contents. For the regime of moderate , we propose a generalized coded caching scheme, which assigns cache memories to all the receivers, with a larger cache memory the weaker a receiver is. Notice that the larger the total cache budget , the larger the coded caching parameter must be chosen. This leads to a decreasing global caching gain because with increasing the various cache memories have more and more overlapping contents, which cannot provide global caching gains. As a consequence, the slope of the rate-memory tradeoff achieved by generalized coded caching decreases with increasing total cache budget . The same behaviour is also suggested by the upper bound. For parameter generalized coded caching and the corresponding cache assignment exactly achieve the global capacity-memory tradeoff. Once the total cache memory budget exceeds the corresponding cache budget, it is optimal to uniformly allocate all the remaining cache memory across all the receivers and to store the same content in the extra portions of the receivers’ cache memories. Here, grows as , which corresponds to a local caching gain. We conclude that assigning the total cache memory uniformly across all the receivers is highly suboptimal over noisy BCs, in contrast to the noiseless setup considered in [1].
Appendix A Proof of Upper Bound in Theorem 5
Fix the rate of communication
[TABLE]
Since is achievable, for each sufficiently large blocklength and for each demand vector , there exist caching functions $\{g_{k}^{(n)}\}$, an encoding function , and decoding functions $\{\varphi_{k,\mathbf{d}}^{(n)}\}$ so that the probability of worst-case error tends to 0 as .
Fix and a sufficiently large blocklength (depending on this ). Let
[TABLE]
denote the cache contents corresponding to the chosen caching function, and let for each demand vector with all different entries
[TABLE]
denote the input of the degraded BC corresponding to the chosen encoding functions. Let denote the corresponding channel outputs at receiver .
Lemma 14
There exist random variables and for each set as in (15) random variables , so that given :
[TABLE]
forms a Markov chain and the following inequalities hold:
[TABLE]
Proof:
The proof is similar to the converse proof of the capacity of degraded BCs without caching [59].
Since the worst-case error probability is bounded by , using Fano’s inequality we have
[TABLE]
where uses Fano’s inequality as well as the fact that all messages are independent. Recall that the demand vector has all different entries.
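For reference, the generic form of Fano’s inequality invoked in steps of this kind is recalled below, in notation chosen here for illustration: a message $W$ uniform over $2^{nR}$ values, channel outputs $Y^n$, side information $V$ (such as cache contents), and a decoder with error probability $P_e^{(n)} \le \epsilon$. It is the standard statement, not a reconstruction of the display above.

```latex
H(W \mid Y^n, V) \;\le\; 1 + P_e^{(n)}\, nR \;\le\; 1 + \epsilon\, nR ,
\qquad\text{hence}\qquad
nR \;=\; H(W) \;\le\; I(W; Y^n, V) + 1 + \epsilon\, nR .
```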
We next develop the second summands in (94) and (94). For the second summand in (94) we write
[TABLE]
where denotes a random variable that is uniformly distributed over and independent of all previously defined random variables, and where
[TABLE]
Define further for :
[TABLE]
and
[TABLE]
For , we expand the second summand in (94) as:
[TABLE]
where (a) follows from the degradedness of the outputs.
Similarly, we also have
[TABLE]
It can be verified that the defined random variables satisfy Conditions (92). Combining this observation with (94)–(97) concludes the proof. ∎
We average the bounds in (93) over demand vectors. Let be the set of all the -dimensional demand vectors with all distinct entries. Also, let be a uniform random variable over the elements of and independent of all other random variables. Define for each set as in (15): ; , for ; ; and for .
Notice that the defined random variables satisfy conditions (14b) and (64) in the theorem. It remains to prove that they also satisfy (65). To this end, we average inequalities (93) over all the demand vectors in . Using standard arguments to take care of the time-sharing random variable , and defining
[TABLE]
we obtain for each as in (15):
[TABLE]
Lemma 15
For each set , parameters satisfy the following constraints:
[TABLE]
Proof:
See Appendix B. ∎
By (99)–(100) and letting , the following intermediate result—which is used in other proofs in this paper—is obtained.
Lemma 16
There exist random variables and for every receiver set as in (15) random variables , so that (14b) and (64) hold, and for all :
[TABLE]
By the following Lemma 17, because constraints (101) are increasing in , and by constraint (100c), we conclude that the choice in (63) makes the upper bound (101) loosest. The following Lemma 17 thus concludes the proof.
Lemma 17
Lemma 16 remains valid if the parameters are further constrained to satisfy, for each , one of the following two conditions:
- ; or
- .
Proof:
See Appendix C. ∎
Appendix B Proof of Lemma 15
We only prove the lemma for . The other proofs are similar.
We first prove (100a). Every is non-negative, because mutual information is non-negative. To prove the upper bound in (100a), we proceed as follows. Let be the set of -dimensional demand vectors that have distinct entries in ; and for each and each dimensional demand vector , define . We have:
[TABLE]
where (a) holds because for each value of and there are ordered demand vectors with and with ; (b) holds by the independence of the messages; (c) holds because for any random tuple it holds that ; and (d) holds because cannot exceed . This concludes the proof of (100a).
To prove constraint (100b), we fix a -dimensional demand vector , and consider the cyclic shifts of this vector. For , let be the vector obtained from when the elements are cyclically shifted positions to the right. (For example, if then .) For each and , let denote the -th index of demand vector . So,
[TABLE]
where for each positive integer the term takes value in so that
[TABLE]
For each and with , we write
[TABLE]
where (a) follows by (103) and (b) is by the independence of messages.
Fix a demand vector and sum up the above inequality (B) over all cyclic shifts of to obtain:
[TABLE]
Since the set can be partitioned into subsets of demand vectors that are cyclic shifts of each other, and all cyclic shifts of a demand vector in are also in , we conclude from (106):
[TABLE]
This proves (100b).
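The combinatorial facts used in this argument can be checked on a small example. The Python sketch below implements the right cyclic shift of a demand vector and partitions all demand vectors with distinct entries into classes of cyclic shifts. The function names and the toy sizes (4 files, 3 receivers) are hypothetical and chosen here for illustration only.

```python
from itertools import permutations


def cyclic_shift(demand, shift):
    """Shift the entries of `demand` cyclically `shift` positions to the right."""
    length = len(demand)
    s = shift % length
    return tuple(demand[(i - s) % length] for i in range(length))


def shift_classes(num_files, num_users):
    """Partition all demand vectors with distinct entries into cyclic-shift classes."""
    seen, classes = set(), []
    for demand in permutations(range(1, num_files + 1), num_users):
        if demand in seen:
            continue
        # All cyclic shifts of a vector with distinct entries are again such vectors.
        members = {cyclic_shift(demand, s) for s in range(num_users)}
        seen |= members
        classes.append(members)
    return classes


classes = shift_classes(4, 3)
# Entries seen at the first position over all shifts of (1, 2, 3).
first_entries = sorted(cyclic_shift((1, 2, 3), s)[0] for s in range(3))
```

Every class contains as many vectors as there are receivers, and at any fixed position the shifts of a demand vector run through each of its entries exactly once, which is the property exploited when summing over the cyclic shifts.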
We proceed to prove constraint (100c). For each :
[TABLE]
So,
[TABLE]
where (a) holds by the chain rule of mutual information, (b) by the independence and uniform rate of messages and the definition of the set , which is of size , and (c) by the generalized Han-Inequality (the following Proposition 18).
Proposition 18
Let be a positive integer and be a finite random -tuple. Denote by the subset . For every :
[TABLE]
Proof:
See [62, Theorem 17.6.1]. ∎
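As a sanity check, and assuming that Proposition 18 is the subset-entropy form of Han’s inequality (the normalized average entropy of all subsets of a given size does not increase with the size), the Python sketch below evaluates these averages for a randomly drawn joint distribution of binary variables. The function names, the binary alphabet, and the random distribution are choices made here for illustration.

```python
import itertools
import math
import random


def subset_entropy(pmf, subset):
    """Entropy in bits of the marginal of `pmf` on the coordinates in `subset`."""
    marginal = {}
    for outcome, prob in pmf.items():
        key = tuple(outcome[i] for i in subset)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def han_averages(pmf, length):
    """Average of H(A_S)/|S| over all subsets S of each size 1, ..., length."""
    averages = []
    for size in range(1, length + 1):
        subsets = list(itertools.combinations(range(length), size))
        total = sum(subset_entropy(pmf, s) for s in subsets)
        averages.append(total / (len(subsets) * size))
    return averages


def random_pmf(length, rng):
    """A random joint distribution on `length` binary variables."""
    outcomes = list(itertools.product((0, 1), repeat=length))
    weights = [rng.random() for _ in outcomes]
    norm = sum(weights)
    return {o: w / norm for o, w in zip(outcomes, weights)}


pmf = random_pmf(4, random.Random(0))
averages = han_averages(pmf, 4)
```

The list `averages` is non-increasing, and its last entry is the joint entropy divided by the number of variables.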
Appendix C Proof of Lemma 17
We prove the lemma by contradiction. Fix a random tuple satisfying (14b) and for each set as in (15) a random tuple satisfying (64) and real numbers satisfying (100).
Assume that for some set as in (15) and some :
[TABLE]
and
[TABLE]
Let
[TABLE]
Notice that by (111):
[TABLE]
Define the new parameters
[TABLE]
Notice that this new set of parameters satisfies constraints (100) when are replaced by . In particular,
[TABLE]
We will show that there exist new auxiliary random variables satisfying the Markov chain (64), and so that upper bound (93) is looser for these new auxiliaries and the new parameters than for the original auxiliaries and parameters .
To simplify notation in the following, we define
[TABLE]
Notice that since and by (100b), the strict inequality
[TABLE]
must hold. Choose
[TABLE]
and
[TABLE]
The choice of depends on whether
[TABLE]
If (120a) holds, choose
[TABLE]
If (120b) holds, let be a Bernoulli- random variable independent of everything else, where
[TABLE]
Choose
[TABLE]
Notice that in both cases the proposed choice satisfies the Markov chain .
Trivially, for $k\notin\{\tilde{k},\tilde{k}+1\}$, constraint (93) is unchanged if we replace by and by .
If (120a) holds, then the proposed replacement relaxes constraint (93) for (because ) and it tightens it for (because ). However, the new constraint for is less stringent than the original constraint for :
[TABLE]
where (a) holds by (114c); (b) holds by (117); and (c) holds by assumption (120a). We conclude that when (120a) holds, the upper bound on in (93) is relaxed if everywhere one replaces and by and .
We now assume that (120b) holds. We show that the new constraints obtained for and for cannot be more stringent than the tighter of the two original constraints for and .
Consider . By (122) and (123) we have
[TABLE]
[TABLE]
Let now . We have:
[TABLE]
where (a) follows by the definition of and ; (b) by the Markov chain (64); (c) by the chain rule of mutual information and Markov chain (64); (d) by the degradedness of the channel (14b); (e) by the definition of in (122).
Therefore, by (114c):
[TABLE]
We thus conclude that also when (120b) holds, the upper bound on in (93) is relaxed if one replaces and by and .
Appendix D Proof of Remark 3
We first prove that the bound in Theorem 5 is loosened when each is replaced by . Consider the intermediate Lemma 16 in the proof of Theorem 5, Appendix A. Relax the upper bound in this lemma by replacing for constraint (100a) by
[TABLE]
Following similar steps as in the proof of Lemma 17, see also [26, Lemma 12], it can be shown that this relaxed upper bound is not changed when one imposes that
[TABLE]
Since constraints (101) are increasing in , by constraint (100c), we conclude that the relaxed upper bound is loosest for
[TABLE]
i.e., for .
We now prove that the bound in Theorem 5 is loosened when each is replaced by . Consider again the intermediate Lemma 16 in Appendix A. Relax constraint (100a) by replacing it with , for all . Following the steps in [26, Lemma 12], it can be shown that the new constraints are loosest if each
[TABLE]
This concludes the proof.
Appendix E Proof of Proposition 8
For , achievability follows by specializing Theorem 3 to and to the input distribution that maximizes (70). In fact, for this input distribution:
[TABLE]
For , achievability follows from Proposition 1.
The converse is proved as follows. Apply Theorem 5, but consider only the constraints (65) corresponding to the sets , for . Taking the average over the resulting constraints establishes that there exists a random variable satisfying (14b) and so that
[TABLE]
Maximizing the right-hand side over input distributions yields the desired converse.
Appendix F Proof of Proposition 12
Relax the upper bound in Theorem 5 by considering constraints (65) only for the set of all receivers , and by replacing each by . Specializing the resulting relaxed bound to the erasure BC, one obtains the following upper bound:
[TABLE]
where the maximization is over the choice of parameters satisfying
[TABLE]
The upper bound in the proposition is established by solving this maximization problem. In fact, by noticing that the bound is increasing in , and by first fixing and optimizing over the choices summing to , we obtain
[TABLE]
If
[TABLE]
then the maximum is achieved at and the upper bound results in
[TABLE]
Otherwise the maximum is at , where
[TABLE]
and the upper bound results in
[TABLE]
where we used that for erasure BCs
[TABLE]
Appendix G Proof of Corollary 13
Fix and . For the considered channel
[TABLE]
The upper bound in Corollary 6 thus states that for this noise-free BC a rate-memory tuple is achievable only if
[TABLE]
This is equivalent to the following bound on the capacity-memory tradeoff
[TABLE]
Notice that the sum takes on only two different values, depending on the outcomes of the minimizations defining . It is either
[TABLE]
Combining (142) with (143), applying the correspondence and , and setting yields
[TABLE]
which is equivalent to the bound in the corollary.
References
- [1] M. A. Maddah-Ali and U. Niesen, “Fundamental limits of caching,” IEEE Trans. on Inform. Theory, vol. 60, no. 5, pp. 2856–2867, May 2014.
- [2] Z. Chen, P. Fan, and K. B. Letaief, “Fundamental limits of caching: Improved bounds for small buffer users,” IET Commun., vol. 10, no. 17, pp. 2315–2318, 2016.
- [3] C. Tian, “A note on the fundamental limits of coded caching,” arXiv:1503.00010v1, Feb. 2015.
- [4] K. Wan, D. Tuninetti, and P. Piantanida, “On the optimality of uncoded cache placement,” in Proc. IEEE ITW, Cambridge, UK, 2016, pp. 161–165.
- [5] K. Wan, D. Tuninetti, and P. Piantanida, “On caching with more users than files,” in Proc. IEEE ISIT, Barcelona, Spain, July 2016, pp. 135–139.
- [6] A. Sengupta, R. Tandon, and T. C. Clancy, “Improved approximation of storage-rate tradeoff for caching via new outer bounds,” in Proc. IEEE ISIT, Hong Kong, China, June 2015, pp. 1691–1695.
- [7] S. Sahraei and M. Gastpar, “K users caching two files: An improved achievable rate,” in Proc. CISS, Mar. 2016, pp. 620–624.
- [8] C. Tian and J. Chen, “Caching and delivery via interference elimination,” in Proc. IEEE ISIT, Barcelona, Spain, July 2016, pp. 830–834.
