Learning to Cooperate in D2D Caching Networks
Georgios S. Paschos, Apostolos Destounis, George Iosifidis

TL;DR
This paper introduces an online learning algorithm for D2D caching networks that optimally manages content storage and retrieval, minimizing delivery costs without prior knowledge of request patterns.
Contribution
It proposes a distributed online gradient descent-based policy that adapts caching decisions in real-time, achieving asymptotic optimality for arbitrary request processes.
Findings
The algorithm effectively reduces delivery costs in simulated D2D networks.
It operates without prior knowledge of request distributions.
The policy is scalable and suitable for distributed implementation.
Abstract
We consider a wireless device-to-device (D2D) cooperative network where memory-endowed nodes store and exchange content. Each node generates random file requests following an unknown and possibly arbitrary spatio-temporal process, and a base station (BS) delivers any file that is not found at its neighbors' cache, at the expense of higher cost. We design an online learning algorithm which minimizes the aggregate delivery cost by assisting each node to decide which files to cache and which files to fetch from the BS and other devices. Our policy relies on the online gradient descent algorithm, is amenable to distributed execution, and achieves asymptotically optimal performance for any request pattern, without prior information.
G. S. Paschos, A. Destounis are with the Mathematical and Algorithmic Laboratory, Huawei, France, ([email protected]; [email protected]); G. Iosifidis is with Trinity College Dublin. This work was supported by Science Foundation Ireland, grant 17/CDA/4760.
I Introduction
The rapidly growing demand for mobile content delivery [1] creates new revenue opportunities for wireless networks, but also requires a rapid increase in their capacity. Unfortunately, typical solutions based on PHY-layer advances or network densification are constantly outpaced by the increasing demand [19], and this calls for new content delivery approaches. To this end, a potentially game-changing idea is to employ device-to-device (D2D) communications, where memory-endowed user devices cache popular files and exchange them with each other upon request [9]. Such cooperative D2D swarms can increase the network content delivery capacity, mitigate cellular congestion, and improve the end-user experience.
A key challenge in D2D cooperative caching is to design the caching policy, i.e., identify which files to cache at each device at any given time [20]. On the one hand the devices have small storage and therefore can store only a small subset of the possible content files; on the other hand each of them “sees” a small number of requests per unit time and hence estimating the popular files at each location becomes very challenging. In view of these limitations, the devices may consistently fail to store in their cache the files that will be requested in the future by their neighbors, rendering the D2D cache-hit ratio practically negligible and hence this solution ineffective.
Caching policies often assume that requests are generated by a given stationary process, cf. [19], and systems in practice rely on reactive policies such as LFU and LRU. These, however, solve the single-cache problem conditionally on the request process. For example, LFU is suitable for stationary requests [7], and LRU for the adversarial model [17]. When the actual process differs from the assumed one, these policies perform poorly [18], and this problem is exacerbated in D2D networks where popularity has hot spots in time and space. Recently, [8, 14, 2] proposed dynamic policies for caching networks, e.g., the mLRU [8] or “lazy rule” [14] policies, which however do not offer performance guarantees. Hence, designing a policy that is robust to the D2D network dynamics remains a challenging and important open problem.
In this cooperative D2D network, users might change positions and content preferences, and hence we need an algorithm that will allow each device to decide: (i) which files to cache for serving its neighbors’ needs; (ii) from which neighbors to retrieve a file, and when to use the last-resort solution of the base station (BS), Fig. 1. This requires a learning mechanism for making caching and routing decisions in an online and decentralized fashion. Previous efforts employing learning are either restricted to inference of the popularity model [3, 4], or rely on Q-learning [21, 24] and classification techniques [15] to estimate request frequencies. These approaches are not applicable to this D2D scenario as they are centralized, often exhibit high complexity, presume a stationary model, and do not decide routing. Other interesting suggestions for D2D networks, see [13, 6], suffer from computational complexity or rely on assumptions that are valid only in some cases, e.g., known popularity. Finally, [11] proposes a decentralized D2D file sharing algorithm which however needs access to the pattern of file requests and network evolution.
Here we design a caching policy with universally optimal performance, where performance is measured by the cost of delivering the requested files to users through (cheap) D2D or (costly) BS-to-device transmissions. We formulate the D2D caching operation as an online convex optimization (OCO) problem, and develop a dynamic and distributed algorithm that solves it without any assumption about the request pattern. That is, our policy ensures asymptotically no regret: its average cost is asymptotically no higher than that of the best static caching configuration selected with knowledge of future requests.
The contributions of this paper can be summarized as follows. (i) We propose the idea of embedding a distributed online learning mechanism into D2D caching policies. We achieve this by formulating an OCO problem, and this opens a link between caching and this novel machine learning tool. (ii) We design an online caching policy that leverages the online gradient descent algorithm to achieve asymptotically optimal performance under any possible spatio-temporal request pattern in our D2D network. We also explain how our policy can adapt to network changes and user churn. (iii) We compare our policy with the state-of-the-art mLRU and “lazy” LRU policies, verifying that it outperforms its competitors, while converging to the optimal static policy.
II System model
Network. Consider a set $\mathcal{I}$ of $I$ wireless users in an area, each one with a cache of size $C$. Let $\mathcal{E}$ be the set of direct links connecting the users, where a link $(i,j)$ appears if the devices’ proximity and the electromagnetic environment allow it to be reliably established. A link between $i$ and $j$ is associated with a cost $w_{ij}$, which represents, e.g., application-layer latency or energy consumption during content transmission; and we assume $w_{ii}=0$. All users maintain a connection with a base station (BS, subscripted with $0$), and we define $\mathcal{I}_0=\mathcal{I}\cup\{0\}$ and $\mathcal{E}_0=\mathcal{E}\cup\{(i,0): i\in\mathcal{I}\}$. The users can obtain any file from the BS at cost $w_{i0}>w_{ij}$ for all $(i,j)\in\mathcal{E}$. We assume that links can deliver the requested content in the considered time window.
Requests. There is a catalog $\mathcal{N}$ with $N$ files of unit size. The system operation is time-slotted, and $R_t^{n,i}=1$ denotes the event that a request for file $n\in\mathcal{N}$ has been submitted by user $i\in\mathcal{I}$ during slot $t$. At each $t$ we assume there is one request, or, from a different perspective, that the system decisions are updated after each request. (Footnote 1: We can also consider batches of requests. If the batch has 1 request from each location, the pattern is biased to equal request rate at each location. An unbiased batch should contain an arbitrary number of requests from each location. Our guarantees hold for unbiased batches of arbitrary finite length.) Hence, the request process is described by a sequence of vectors $\{R_t\}_{t=1}^{T}$ drawn from the set:
$$\mathcal{R}=\Big\{R\in\{0,1\}^{N\cdot I}\ :\ \sum_{n\in\mathcal{N}}\sum_{i\in\mathcal{I}} R^{n,i}=1\Big\}.$$
The instantaneous file popularity is expressed by the probability distribution $P(R_t)$ (with support $\mathcal{R}$), which is allowed to be unknown and arbitrary. The same holds for the joint distribution $P(R_1,\dots,R_T)$ that describes the file popularity evolution, for any user location, and within an interval of $T$ slots. This generic model captures all possible spatio-temporal request sequences, including stationary (i.i.d. or otherwise), non-stationary, and adversarial models. The latter is the most general case, as it includes request sequences selected by an adversary aiming to disrupt the system performance.
Caching. The cache of each user can store only $C$ files, but the BS has the entire catalog. Following the standard practice in wireless caching models [9, 19], we perform caching using Maximum Distance Separable (MDS) codes. In MDS, the files are split into a fixed number $K$ of data chunks, and we store in each cache an amount of coded chunks that are pseudo-random linear combinations of the data chunks. Using the MDS properties, a user can decode the file (with high probability) if it receives any $K$ coded chunks. Hence, the caching decision vector $y_t$ has $N\cdot I$ elements, where $y_t^{n,i}\in[0,1]$ denotes the amount of random coded chunks of file $n$ stored at user $i$ during slot $t$. (Footnote 2: The fractional caching is supported by the observation that large files are composed of thousands of chunks, stored independently, see the literature on partial caching [16]. Hence, by rounding these fine-grained fractional decisions, we will only induce a small application-specific error. In some prior caching models, fractional variables represent probabilities of caching [22, 5].) Based on this, we introduce the convex set of eligible caching vectors:
$$\mathcal{Y}=\Big\{y\in[0,1]^{N\cdot I}\ :\ \sum_{n\in\mathcal{N}} y^{n,i}\le C,\ \ \forall i\in\mathcal{I}\Big\}.$$
We are interested in distributed policies, where each user $i$ changes its cache based on information from its one-hop neighbors $\mathcal{J}_i=\{j\in\mathcal{I}:(i,j)\in\mathcal{E}\}$. Thus, we define:
Definition 1 (Local Caching Policy).
A local caching policy for user $i$ is a (possibly randomized) rule
[TABLE]
The collection of the caching policies for all users will be henceforth referred to as a “caching policy”.
Routing. Since each user might have more than one neighbor, we introduce routing variables to determine the cache from which the requested file will be fetched. Let $z_t^{n,i,j}$ denote the portion of request $R_t^{n,i}$ that is fetched from cache $j$, and we define the routing vector $z_t=(z_t^{n,i,j})$ implemented in slot $t$. There are two important remarks here. First, due to the coded caching model, the requests can be simultaneously routed from multiple caches. In terms of communications, this can be implemented through time-sharing among the activated links, or by using different network interfaces concurrently. Second, the caching and routing decisions are coupled and constrained: (i) a request cannot be routed from an unreachable cache, (ii) we cannot route from a cache more data chunks than it has, and (iii) each request must be fully routed.
We define $\ell_{ij}=1$ if $(i,j)\in\mathcal{E}_0$ (a user's own cache counts as reachable) and $\ell_{ij}=0$ otherwise, and thus the set of eligible routing decisions conditioned on $y$ is:
$$\mathcal{Z}(y)=\Big\{z\ge 0\ :\ z^{n,i,j}\le \ell_{ij},\ \forall n,i,\ j\in\mathcal{I}_0;\quad z^{n,i,j}\le y^{n,j},\ \forall n,i,\ j\in\mathcal{I};\quad \sum_{j\in\mathcal{I}_0} z^{n,i,j}=1,\ \forall n,i\Big\}. \tag{1}$$
Note that $j=0$ does not appear in the second constraint, because the BS stores the entire catalog and can serve all users. This last-resort routing option ensures that $\mathcal{Z}(y)$ is non-empty for any $y\in\mathcal{Y}$. As will become clear next, the optimal routing decisions can be devised for a given cache configuration. This is an inherent property of link-uncapacitated caching networks, see also [9, 19].
III Problem Formulation
A file request of a node can be served, exclusively or partially (due to MDS), by neighboring devices at a smaller cost than fetching it from the base station. Given a cache configuration $y$, the (minimum) cost to satisfy $R_t$ is:
$$f_t(y)=\min_{z\in\mathcal{Z}(y)}\ \sum_{n\in\mathcal{N}}\sum_{i\in\mathcal{I}}\sum_{j\in\mathcal{I}_0} R_t^{n,i}\,w_{ij}\,z^{n,i,j}, \tag{2}$$
where the optimization decides the routing that minimizes the cost for a given file placement at the nodes. The function’s form suggests that it is beneficial if the file has been cached at the device asking for it ($j=i$) or at nearby devices that can send it with low cost. However, it is daunting to assess the impact of $y$, as it involves the solution of an optimization problem. Fortunately, the cost function above is convex:
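Lemma 1.
Function $f_t(y)$ is convex in its domain $\mathcal{Y}$, $\forall t$.
Proof: Fix a request vector $R_t$ and consider cache configurations $y_1,y_2\in\mathcal{Y}$; note that, for any $\alpha\in[0,1]$, $y_\alpha=\alpha y_1+(1-\alpha)y_2$ is also a valid configuration. We will show that:
$$f_t\big(\alpha y_1+(1-\alpha)y_2\big)\le \alpha f_t(y_1)+(1-\alpha)f_t(y_2).$$
Let us denote by $z_1,z_2$ the optimal routing vectors corresponding to $y_1,y_2$, respectively. We then have:
$$\alpha f_t(y_1)+(1-\alpha)f_t(y_2)=\alpha\sum_{n,i,j}R_t^{n,i}w_{ij}z_1^{n,i,j}+(1-\alpha)\sum_{n,i,j}R_t^{n,i}w_{ij}z_2^{n,i,j}$$
$$=\sum_{n,i,j}R_t^{n,i}w_{ij}\big(\alpha z_1^{n,i,j}+(1-\alpha)z_2^{n,i,j}\big).$$
It holds $\alpha z_1+(1-\alpha)z_2\in\mathcal{Z}(y_\alpha)$, thus
$$f_t(y_\alpha)\le\sum_{n,i,j}R_t^{n,i}w_{ij}\big(\alpha z_1^{n,i,j}+(1-\alpha)z_2^{n,i,j}\big)=\alpha f_t(y_1)+(1-\alpha)f_t(y_2).$$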
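To make the cost evaluation concrete, here is a minimal sketch, not the authors' implementation. It assumes a single request served greedily: sources are visited in increasing link cost, each contributes at most its cached fraction, and the BS covers any remainder. Because links are uncapacitated, this greedy routing attains the minimum cost for one request. The name `delivery_cost`, the dictionary-based layout, and all numbers are illustrative assumptions.

```python
def delivery_cost(requester, file_id, cache, neighbors, link_cost, bs_cost):
    """Return (cost, routing) for one request under fractional (MDS-coded) caching.

    cache[j][file_id] is the cached fraction of the file at user j
    (a hypothetical layout chosen for this sketch).
    """
    # The requester's own cache costs nothing; neighbor j costs link_cost[(requester, j)].
    sources = [(0.0, requester)]
    sources += [(link_cost[(requester, j)], j) for j in neighbors[requester]]
    # Sources that are not cheaper than the BS are never worth using.
    sources = sorted((s for s in sources if s[0] < bs_cost), key=lambda s: s[0])

    remaining = 1.0  # fraction of the file still to be fetched
    cost = 0.0
    routing = {}
    for weight, j in sources:
        take = min(cache.get(j, {}).get(file_id, 0.0), remaining)
        if take > 0.0:
            routing[j] = take
            cost += weight * take
            remaining -= take
    if remaining > 0.0:
        # Last resort: the BS stores the whole catalog.
        routing["BS"] = remaining
        cost += bs_cost * remaining
    return cost, routing


# Illustrative values only: user 0 requests file 7.
cache = {0: {7: 0.25}, 1: {7: 0.5}, 2: {7: 0.5}}
neighbors = {0: [1, 2]}
link_cost = {(0, 1): 1.0, (0, 2): 2.0}
cost, routing = delivery_cost(0, 7, cache, neighbors, link_cost, bs_cost=10.0)
print(cost, routing)
```

In this toy instance the request is fully covered by D2D transfers, so the BS is not used; with empty caches the whole file would come from the BS at cost `bs_cost`.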
Subscript $t$ at the cost function (2) reminds us of its dependence on the request that is generated at $t$. Since these events may vary according to a non-stationary process, we will use the concept of regret from online convex optimization [22].
We capture that the request sequence may follow any arbitrary and a priori unknown probability distribution, by using the idea of an adversary which selects $R_t$ at each slot $t$, while knowing $y_t$. This assumption reflects that, in practice, caches are populated before the requests are issued. Since by Lemma 1 the functions $f_t$ are convex, our problem falls in the Online Convex Optimization framework [22]. The performance metric of an algorithm in this line of work is the regret: the difference between the costs incurred by the algorithm and by the best static configuration in hindsight. In our case, this benchmark is the optimal cache configuration (same for all slots) devised with knowledge of all requests in the time horizon of interest $T$. Hence, the regret of policy $\pi$ is:
$$\mathrm{Reg}_T(\pi)=\mathbb{E}\Big[\sum_{t=1}^{T}f_t\big(y_t(\pi)\big)-\sum_{t=1}^{T}f_t(y^*)\Big]. \tag{3}$$
The expectation is over the joint probability distribution of requests and possible randomizations in $\pi$, and
$$y^*\in\arg\min_{y\in\mathcal{Y}}\sum_{t=1}^{T}f_t(y) \tag{4}$$
is the best fixed action in hindsight, i.e., the best chunk placement over the entire sample path of requests. Our goal is to devise a policy whose regret scales sublinearly with $T$:
$$\lim_{T\to\infty}\frac{\mathrm{Reg}_T(\pi)}{T}=0. \tag{5}$$
This “no regret” property implies that the algorithm learns to perform as well as the best cache configuration $y^*$. Note that if the requests are i.i.d., “no regret” implies that the performance of the policy approaches the optimal in terms of the expected cost $\mathbb{E}[f_t(y)]$. However, our adversarial model is much more general; in this case, comparing to a static policy is a way to limit the power of the adversary while still being able to obtain meaningful policies, which are robust for all request models.
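As an illustration of the regret metric, the sketch below computes the cumulative regret of a realized cost sequence. It rests on a loud assumption: the best static configuration is searched only among a finite set of candidates supplied by the caller, whereas the benchmark in the text is the best configuration over the whole feasible set. The names `static_regret`, `config_a`, `config_b` and the cost values are hypothetical.

```python
def static_regret(policy_costs, static_costs):
    """Cumulative regret against the best fixed candidate configuration.

    policy_costs: per-slot costs paid by the online policy.
    static_costs: dict mapping a candidate name to the per-slot costs that this
                  fixed configuration would pay on the same request sequence.
    Assumption: only the listed candidates are compared, not the full feasible set.
    """
    best_name = min(static_costs, key=lambda name: sum(static_costs[name]))
    regret = sum(policy_costs) - sum(static_costs[best_name])
    return regret, best_name


# Illustrative numbers only.
policy_costs = [3.0, 2.0, 1.0, 1.0]
static_costs = {
    "config_a": [2.0, 2.0, 2.0, 2.0],
    "config_b": [1.0, 1.0, 2.0, 1.0],
}
regret, best = static_regret(policy_costs, static_costs)
average_regret = regret / len(policy_costs)
print(regret, best, average_regret)
```

A no-regret policy is one for which the average regret tends to zero as the number of slots grows.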
IV Distributed D2D Caching Algorithm
Our distributed caching algorithm is based on online gradient descent [25]. The main idea is to use the first-order approximation of $f_t$ at $y_t$ as a predictor of the unknown function $f_{t+1}$ that the adversary will select next. The caching configurations, then, are updated by taking an appropriate step along the direction of improvement indicated by the gradient $\nabla f_t(y_t)$.
IV-A Finding the Direction of Improvement
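Since the cost function (2) is not necessarily differentiable everywhere, we will rely on subgradients. In order to find one, we first simplify $f_t$. Let us denote by $i_t$ and $n_t$ the user making the request and the file requested at slot $t$, respectively, and by $\mathcal{J}_t$ the set of the nodes (including the BS) connected to user $i_t$. Then, it is $R_t^{n,i}=0$ for all $(n,i)\neq(n_t,i_t)$, and hence $f_t$ simplifies to:
$$f_t(y)=\min_{z\ge 0}\ \sum_{j\in\mathcal{J}_t}w_{i_t j}\,z^{j} \tag{6}$$
$$\text{s.t.}\quad \sum_{j\in\mathcal{J}_t}z^{j}=1, \tag{7}$$
$$z^{j}\le y^{n_t,j},\quad \forall j\in\mathcal{J}_t\setminus\{0\}, \tag{8}$$
where $z^{j}$ is shorthand for $z^{n_t,i_t,j}$.
Equations (6)-(8) define an optimization problem, henceforth referred to as $P_t$, the solution of which yields the optimal routing for any $y$ (constant input for $P_t$). That is, to evaluate $f_t$ at vector $y$ we need to solve $P_t$. Despite this intricate form of $f_t$, we show that it is possible to obtain a subgradient, which is needed for our online caching algorithm.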
We first define the Lagrangian of as follows:
[TABLE]
where and are the dual variables, and we simplified notation by dropping . We will prove that the subgradient of at is the optimal dual variables for (8) in .
Lemma 2 (Subgradient).
Let:
[TABLE]
and define:
[TABLE]
Then is a subgradient of at , that is: .
Proof. We start by denoting the outcome of (10) for cache configuration and define the function:
[TABLE]
[TABLE]
where (a) holds since (6)-(8) has the strong duality property; (b) holds since is linear and we can maximize successively over the different primal or dual variables; and (c) holds as only in depends on . Due to strong duality for , we can replace , and it then suffices to rearrange terms.
Since is the optimal multiplier for (8), it has nonzero elements only where this constraint is tight. Intuitively, this means that after user requests file , the direction of the subgradient is towards caching more parts of this file at user and at users having low-cost D2D links with .
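The intuition above can be sketched in code. This is a hedged illustration under stated assumptions, not the authors' procedure: for a single request, the routing problem is a small linear program whose optimal multipliers can be read off greedily. If the "marginal" source is the most expensive one needed to complete the request (the BS when caches do not suffice), then a valid multiplier for the cache constraint of a cheaper source `j` is the marginal cost minus the cost of `j`, and the subgradient of the cost with respect to the cached fraction at `j` is its negative. The name `cost_subgradient` and the data layout are hypothetical.

```python
def cost_subgradient(requester, file_id, cache, neighbors, link_cost, bs_cost):
    """Subgradient of the single-request delivery cost w.r.t. cached fractions.

    Returns {user j: d cost / d (fraction of file_id cached at j)}; entries are
    non-positive, since caching more at a cheap source can only lower the cost.
    Assumption: cache[j][file_id] holds the cached fraction at user j.
    """
    sources = [(0.0, requester)]
    sources += [(link_cost[(requester, j)], j) for j in neighbors[requester]]
    sources = sorted((s for s in sources if s[0] < bs_cost), key=lambda s: s[0])

    # Walk the sources in increasing cost to find the one completing the request.
    remaining = 1.0
    marginal_cost = bs_cost  # the BS serves the remainder if caches do not suffice
    for weight, j in sources:
        remaining -= min(cache.get(j, {}).get(file_id, 0.0), remaining)
        if remaining <= 0.0:
            marginal_cost = weight
            break

    # Multiplier of the cache constraint at j: cost saved per extra unit cached at j.
    multipliers = {j: max(marginal_cost - weight, 0.0) for weight, j in sources}
    return {j: -mu for j, mu in multipliers.items() if mu > 0.0}


# Illustrative values only: user 0 requests file 7.
cache = {0: {7: 0.25}, 1: {7: 0.5}, 2: {7: 0.5}}
neighbors = {0: [1, 2]}
link_cost = {(0, 1): 1.0, (0, 2): 2.0}
grad = cost_subgradient(0, 7, cache, neighbors, link_cost, bs_cost=10.0)
print(grad)
```

Here the marginal source is user 2 (cost 2.0), so caching more at user 0 or user 1 would displace traffic from user 2 and save 2.0 or 1.0 per unit, respectively.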
IV-B Algorithm Design
The Distributed Online Caching Policy (DOCP) is shown in Algorithm 1. The execution of the policy is iterative, where in each slot $t$ the following steps take place. First, a user $i_t$ submits a request for a file $n_t$ (step 3). This user solves (6)-(8) to find the optimal routing for the current caching configuration (step 4), and requests the parts of $n_t$ from the respective neighbors or the BS (step 5). A certain cost is incurred based on this routing and the existing $y_t$ (that was calculated based on previous requests). Then, user $i_t$ sends the optimal multiplier to each neighbor $j$ (step 8), who updates its caching policy accordingly. This involves calculating the new $y_{t+1}^{n,j}$, based on the latest request, and projecting them back into the feasible space (step 9):
[TABLE]
where is the Euclidean projection on , and has zero elements except the -th element being equal to 1.
Note that DOCP is indeed distributed, since only the neighbors of each requester need to update their caches. Moreover, this update is based solely on messages received by the requester, and these communication overheads are moderate as only the Lagrange multipliers are sent to 1-hop neighbors. Finally, the projection operation can be executed efficiently, i.e., in runtime, and for each user independently, by using the local projection algorithm introduced in [18]. We omit the details here due to lack of space.
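The per-neighbor update can be sketched as a projected gradient step. This is a minimal sketch under stated assumptions, not the policy's actual code: the cache of one user is a list of fractions in [0, 1] whose sum may not exceed the capacity, and the projection is computed by plain bisection on a single shift parameter, which is a swapped-in technique and not the local projection algorithm of [18]. The names `project_cache` and `ogd_cache_update`, the step size, and the numbers are hypothetical.

```python
def project_cache(v, capacity, iters=60):
    """Euclidean projection of v onto {y : 0 <= y_n <= 1, sum_n y_n <= capacity}."""
    clipped = [min(max(x, 0.0), 1.0) for x in v]
    if sum(clipped) <= capacity:
        return clipped  # the capacity constraint is inactive
    # Otherwise the projection is clip(v - tau) for the tau that fills the cache
    # exactly; the occupied space is non-increasing in tau, so bisection applies.
    lo, hi = 0.0, max(v)
    for _ in range(iters):
        tau = (lo + hi) / 2.0
        used = sum(min(max(x - tau, 0.0), 1.0) for x in v)
        if used > capacity:
            lo = tau
        else:
            hi = tau
    return [min(max(x - hi, 0.0), 1.0) for x in v]


def ogd_cache_update(y, file_id, grad_value, step, capacity):
    """One online gradient descent step on a single user's cache vector.

    grad_value is the subgradient entry for the requested file at this user
    (non-positive for a cost), so the step raises the fraction of that file.
    """
    shifted = list(y)
    shifted[file_id] -= step * grad_value
    return project_cache(shifted, capacity)


# Illustrative values: 3 files, capacity of 1 file, request for file index 2.
y_next = ogd_cache_update([0.5, 0.5, 0.0], file_id=2, grad_value=-2.0,
                          step=0.25, capacity=1.0)
print(y_next)
```

In this toy case the gradient step overfills the cache, and the projection shrinks all three fractions equally until the capacity is met again.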
IV-C Performance Guarantees
The next theorem proves that DOCP achieves no regret performance, under any possible spatio-temporal arrival pattern.
Theorem 1 (Regret of DOCP).
For step size , the regret of DOCP satisfies:
[TABLE]
where, we defined the parameters , and .
Proof: Using non-expansiveness of Euclidean projection:
[TABLE]
Also, a telescopic sum over slots gives
[TABLE]
To proceed, note that and . Using that and rearranging:
[TABLE]
Furthermore, due to convexity of , it holds , thus:
[TABLE]
The value of the step size and regret bound follow by minimizing the Right Hand Side of the above inequality.
Hence, no regret is achieved with a constant step which depends on , and if is unknown we can select step , or employ the doubling trick, see [22, Sec. 2.3].
IV-D Dynamic Network Costs
The above model and analysis can be readily extended for the case where users change positions in different slots; or the user population evolves with time; or, finally, the users are static but the channel conditions vary. In particular, these scenarios can be captured by the updated cost function:
[TABLE]
where we have replaced the previously constant link costs with slot-specific ones . For instance, if link exists in slot but not in slot (e.g., nodes have moved farther), then we can use which will make this link non-eligible for DOCP (the BS is available and cheaper). It is interesting to note that this extension does not change the regret bound, which is set by the highest cost of the available links, that remains the one between any device and the BS.
V Numerical Results
We illustrate the performance of DOCP in a setting with files, and devices which are equipped with a cache of capacity and are placed randomly in a cell of size km. Devices can communicate if they are within a range of m, as in current LTE-direct standards [10]. The (relative) cost of downloading a file from the base station is set to ; a device can fetch a file from its cache at no cost; and the respective costs from other devices vary with the distance: if the device is closer than m, if the distance is within , if in , and if in . File requests are drawn from a power law distribution with exponent . We compare DOCP with the best static policy in hindsight and with the lazy LRU and mLRU policies.
Our results are presented in Fig. 2, which shows the empirical average of the cost for the different policies. We observe that DOCP outperforms both lazy LRU and mLRU, and the margin gets wider as time progresses. In addition, the performance of DOCP gets closer to that of the best policy in hindsight, which is consistent with the no-regret theoretical guarantee. Figure 3 compares the total cache allocation, i.e., the total fraction of each file cached at the devices, for DOCP and the best static hindsight policy. We see that, while the algorithm starts from an almost uniform allocation, by the end of the time interval the DOCP cache contents are closely aligned with the best configuration in hindsight. This demonstrates that DOCP indeed tends to learn the best static configuration.
VI Conclusions
D2D cooperative caching is certainly very promising, but raises previously unseen challenges in devising effective caching policies. Here, we used OCO, a fast-developing area of machine learning, to design an online distributed caching and routing policy that adapts to any (unknown) spatio-temporal request process. This makes it an ideal candidate for such dynamic, often sparse, caching networks. Our work opens a new exciting area at the nexus of online learning and D2D caching systems, and a fascinating next step is to explore how such mechanisms can incorporate incentives for ensuring users’ cooperation, leveraging credit mechanisms [12] or the human tendency to build reciprocal sharing relationships [23].
- [1] Cisco visual networking index: Global mobile data traffic forecast update 2017–2022. White Paper, 2015.
- [2] K. Avrachenkov, J. Goseling, and B. Serbetci. A low-complexity approach to distributed cooperative caching with geographic constraints. Proc. of ACM Meas. Anal. Computing Systems, 1(1):1–827, 2017.
- [3] E. Baştuğ et al. A transfer learning approach for cache-enabled wireless networks. In Proc. of WiOpt, May 2015.
- [4] B. N. Bharath, K. G. Nagananda, and H. V. Poor. A learning-based approach to caching in heterogenous small cell networks. IEEE Trans. on Communications, 64(4), 2016.
- [5] B. Blaszczyszyn and A. Giovanidis. Optimal geographic caching in cellular networks. arXiv:1409.7626, 2014.
- [6] B. Chen and C. Yang. Caching policy for cache-enabled D2D communications by learning user preference. IEEE Trans. on Communications, 66(12), 2018.
- [7] C. Fricker, P. Robert, and J. Roberts. A versatile and accurate approximation for LRU cache performance. In ITC, 2012.
- [8] A. Giovanidis and A. Avranas. Spatial multi-LRU: Distributed caching for wireless networks with coverage overlaps. arXiv:1612.04363, 2016.
