Dynamic Content Updates in Heterogeneous Wireless Networks
Mehdi Salehi Heydar Abad, Emre Ozfatura, Ozgur Ercetin, Deniz Gunduz

TL;DR
This paper proposes a learning-based cache refreshment strategy for edge content storage in wireless networks to optimize user satisfaction while managing network costs.
Contribution
It introduces a novel cache refreshment approach that adapts to user content age tolerance using learning techniques, balancing QoS and network efficiency.
Findings
Enhanced user satisfaction through adaptive cache updates
Reduced network costs compared to frequent refresh strategies
Effective learning of user content age preferences
Abstract
Content storage at the network edge is a promising solution to mitigate the excessive traffic load due to on-demand streaming applications as well as to reduce the streaming delay. To this end, cache-enabled cellular architectures can be utilized to increase the provided quality-of-service (QoS) and to reduce the network cost. However, the design of the content storage strategy must also take into account that the contents should be refreshed in order to meet users' expectations. A frequent cache refreshment strategy increases the ratio of satisfied users, but at a higher network cost. In this paper, we introduce a cache refreshment strategy that leverages learning techniques, so that users' tolerance to the age of content is learned and the content is refreshed accordingly.
Mehdi Salehi Heydar Abad and Ozgur Ercetin: {mehdis,oercetin}@sabanciuniv.edu
Emre Ozfatura and Deniz Gündüz: {m.ozfatura,d.gunduz}@imperial.ac.uk
This work was in part supported by the EC H2020-MSCA-RISE-2015 programme under grant number 690893.
Index Terms:
Content caching, content refreshment, Markov decision process (MDP), quality of service, multi-armed bandit (MAB)
I Introduction
While proactive content caching has received significant interest in recent years, most of the existing strategies in the literature (with both uncoded [1, 2, 3] and coded placement [4, 5]) have been designed under the assumption that content popularities are known in advance. Although it is possible to observe the global popularity of contents in on-demand video streaming services, such as YouTube [6], small-cell base stations (SBSs) usually serve a small geographical area, where the local content popularity might not be aligned with the global popularity [7]. This mismatch between local and global content popularities calls for predictive caching policies that learn the local content popularity from user requests. Predictive caching policies can be classified into two main groups, namely predictive caching with unknown popularities [8, 9, 10] and predictive caching with time-varying popularities [11, 12, 13]. In [8, 9], the authors focus on a single SBS and model the predictive caching problem as a multi-armed bandit problem, in which the received user requests are used to predict the file popularities, and the optimal caching strategy is obtained by taking into account the cost of file replacements. In [10], this approach is extended to a cooperative caching framework, where the SBSs fetch the requested content from neighboring SBSs if the corresponding file is cached there. Another strategy for predictive caching follows a user-centric approach, possibly implemented at a higher layer, and records user requests for different contents as a matrix. Future user requests are then predicted using matrix completion techniques that exploit the correlations among the requests for different files, similarly to recommendation systems [14].
In [11], a cache replacement strategy is introduced for a time-varying popularity scenario to maximize the local service rate with a minimum replacement cost, while a more theoretical approach is taken in [13], which studies the cache update policy in the case of time-varying content popularities. In parallel to the aforementioned works, another relevant research direction is contextual predictive caching, where various features of the requests, such as the genre of the video file, the age of the user, or the time of the request, are used to predict future requests and shape the caching strategy accordingly [15, 16]. Although the predictive caching framework is highly effective at increasing the efficiency of edge caching, it has certain limitations. Most of the aforementioned predictive caching strategies are designed to predict only the content popularity. However, in many applications, e.g., news and weather, the freshness of the content is another important factor for user satisfaction. The content caching and refreshment problem has previously been studied in [17, 18]. In this paper, we consider a heterogeneous cellular network with cache-enabled SBSs and provide a cost-aware content update policy. The proposed content update strategy consists of two parts: in the first part, we show that the optimal periodic content update policy that minimizes the network cost for a given users' tolerance to the age of the content is of threshold type, and in the second part, we use the multi-armed bandit approach to learn users' tolerance to the age of the content. To the best of our knowledge, ours is the first work that uses a learning framework to analyze user behavior based on the content age. Accordingly, we design an online cache refreshment policy to minimize the overall network cost.
II System model and problem formulation
We consider a cellular network with a macro base station (MBS) and an SBS serving the users in a cell. Both the MBS and the SBS are assumed to be equipped with cache memories, storing a library of distinct dynamic contents, denoted by . Each dynamic content has a different popularity, i.e., the probability that a user requests content is , for .
II-A Content freshness
Dynamic contents, such as news videos, traffic and weather updates, may change frequently over time. While we assume that the MBS always has the fresh content updates, thanks to its relatively higher-bandwidth connection to the content server in the core network, the SBS needs to regularly refresh the dynamic contents in its cache to keep them up-to-date. For the SBS, downloading all the contents from the MBS through its limited backhaul link is costly in terms of energy, time and spectrum.
We consider a discrete-time system model with equal-length time slots. At the beginning of each time slot, the SBS decides which contents to update. We assume that a content is updated at the beginning of the next time slot. The corresponding decision vector is denoted by , i.e., when the content is updated at the end of time slot , , and otherwise.
We denote by the age of the content in the SBS cache at time slot . We assume a maximum age at which a content becomes obsolete; in other words, the age of a content increases until it reaches this maximum, at which point the content is obsolete. Accordingly, the age of content , , evolves over time as follows:
[TABLE]
We denote the length- vector of ages associated with all the dynamic contents in the library by .
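A minimal sketch of these age dynamics follows. It assumes, consistently with Section IV, that an updated content's age is reset to zero in the next slot and that, otherwise, the age grows by one per slot and saturates at the obsolescence age. The names `next_age`, `next_ages` and `h_max` are hypothetical, since the paper's own symbols are not available here.

```python
from typing import List, Sequence


def next_age(age: int, update: int, h_max: int) -> int:
    """Next-slot age of one content (hypothetical helper, not from the paper).

    Assumes the age resets to 0 when the content is refreshed (update == 1)
    and otherwise grows by one slot, saturating at the obsolescence age h_max.
    """
    if update == 1:
        return 0
    return min(age + 1, h_max)


def next_ages(ages: Sequence[int], updates: Sequence[int], h_max: int) -> List[int]:
    """Apply next_age element-wise to the vector of ages of all contents."""
    return [next_age(h, a, h_max) for h, a in zip(ages, updates)]


# Content 0 is refreshed, content 1 ages by one slot,
# content 2 is already obsolete (age 5) and stays there.
print(next_ages([3, 1, 5], [1, 0, 0], h_max=5))  # [0, 2, 5]
```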
II-B User behavior
Let denote the number of users that request a content at time slot . Whenever a user requests a content, the request is first offloaded to the SBS to be served. Users have different tolerance levels to the age of the contents they receive. Hence, we consider that, with probability , a user is not satisfied with the age of content and thus places another request for content . In that case, the new request is served directly by the MBS with a fresh copy of the content. Let be the number of users that request content , , in time slot , which is governed by the popularity profile . The users that request content are split into two disjoint sets: the first set consists of the users redirected to the MBS, and the second of the users satisfied with the service provided by the SBS. We denote the sizes of these sets by , , respectively. Note that and are governed by the random process .
Let be the vector associated with the number of redirected users for each content. We have the following:
[TABLE]
The corresponding expected values of these parameters are given as:
[TABLE]
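The request-splitting model can be simulated as in the sketch below. It assumes, as in the numerical results of Section V, that the total number of requests per slot is Poisson with rate `lam`; it further assumes that each request is for content n with probability `p[n]` and that each requesting user is independently dissatisfied (and hence redirected) with probability `r[n]`, evaluated at the content's current age. All names are hypothetical. Under these assumptions, the requests for content n are Poisson with mean `lam * p[n]`, and the expected number of redirected users is `lam * p[n] * r[n]`.

```python
import math
import random
from typing import List, Sequence, Tuple


def poisson(rng: random.Random, mean: float) -> int:
    """Poisson sample via Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def split_requests(rng: random.Random, lam: float, p: Sequence[float],
                   r: Sequence[float]) -> Tuple[List[int], List[int]]:
    """One slot of the hypothetical model: per-content (redirected, satisfied) counts.

    p[n]: popularity of content n; r[n]: probability that a user is dissatisfied
    with the current age of content n and is therefore redirected to the MBS.
    """
    redirected, satisfied = [], []
    for p_n, r_n in zip(p, r):
        total = poisson(rng, lam * p_n)                              # requests for content n
        red = sum(1 for _ in range(total) if rng.random() < r_n)    # binomial split
        redirected.append(red)
        satisfied.append(total - red)
    return redirected, satisfied


def expected_redirected(lam: float, p: Sequence[float], r: Sequence[float]) -> List[float]:
    """Expected number of redirected users per content: lam * p[n] * r[n]."""
    return [lam * p_n * r_n for p_n, r_n in zip(p, r)]


rng = random.Random(7)
print(split_requests(rng, lam=5.0, p=[0.5, 0.3, 0.2], r=[0.1, 0.4, 0.8]))
print(expected_redirected(5.0, [0.5, 0.3, 0.2], [0.1, 0.4, 0.8]))
```

Knuth's sampler is used only to keep the sketch dependency-free; for large arrival rates a library sampler would be preferable.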
II-C Decision model and the problem formulation
Let be the cost associated with serving the users redirected to the MBS at time , and the backhaul cost associated with updating the contents, if any. In this work, we assume that the redirection cost is linear in the number of redirected users (for example, in OFDMA, a user redirected to the MBS is assigned a subcarrier, and the power allocated to that subcarrier adds linearly to the energy cost). Hence,
[TABLE]
Similarly, we define the backhaul cost function, which is also linear:
[TABLE]
where , with being the average backhaul cost of updating content . If the SBS decides to update the content (or multiple contents), the age of each updated content is reset at the end of the time slot. Hence,
[TABLE]
Updating a content incurs a larger immediate cost than not updating it; however, this extra cost enables more users to be served locally by the SBS.
We aim to minimize the average total cost as follows:
[TABLE]
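Under this linear model, the cost of a slot can be evaluated as in the sketch below. It assumes a per-user redirection cost `c_r` and per-content update costs `e` (hypothetical names) and, consistently with the action value $q_n(H_n)$ of Section IV, that the update cost is paid in the slot in which the update is decided while that slot's requests are still served at the current age. The expected one-slot cost of content n at age h is then `c_r * lam * p_n * r_n(h) + a * e_n`.

```python
from typing import Callable, Sequence


def slot_cost(redirected: Sequence[int], updates: Sequence[int],
              c_r: float, e: Sequence[float]) -> float:
    """Realized cost of one slot (hypothetical helper): linear redirection + backhaul cost."""
    redirection = c_r * sum(redirected)                      # users served by the MBS
    backhaul = sum(a * e_n for a, e_n in zip(updates, e))    # refreshed contents
    return redirection + backhaul


def expected_content_cost(age: int, update: int, c_r: float, lam: float, p_n: float,
                          r_n: Callable[[int], float], e_n: float) -> float:
    """Expected one-slot cost of a single content under the assumptions above.

    r_n(age) is the (assumed) probability that a user is dissatisfied at this age.
    """
    return c_r * lam * p_n * r_n(age) + update * e_n


# Two redirected users for content 0, one for content 2; contents 0 and 2 are refreshed.
print(slot_cost([2, 0, 1], [1, 0, 1], c_r=1.5, e=[0.5, 0.7, 0.25]))  # 5.25
```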
III MDP formulation
Define the state of the system to be . We denote by the differential value function at state . The differential Bellman equations can be written as:
[TABLE]
where is the optimal average cost and is the differential action-value function defined by:
[TABLE]
where is the transition probability from state to when action is taken, which is governed by (1), and
[TABLE]
The MDP associated with the average cost minimization problem can be solved by the well-known value iteration algorithm. However, the cardinalities of the state space (i.e., ) and the action space (i.e., ) grow exponentially with the number of contents; hence, the curse of dimensionality is the bottleneck for an efficient solution. To bypass this bottleneck, we note that the cost function in (6) is linear and that the transitions of each content do not affect those of the others. Hence, we can separate the value function in (10) into independent value functions, each representing a distinct content. For each content , we have
[TABLE]
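Because the problem decomposes per content, each content can be handled by a small MDP whose state is its age. The sketch below runs relative value iteration on one such MDP. It assumes the age dynamics of Section II-A (reset to zero on update, otherwise grow by one slot up to the obsolescence age `h_max`) and a generic expected one-slot cost `cost(h, a)`. It also applies the standard aperiodicity transformation (averaging each iterate with the previous one), since a threshold policy cycles deterministically through the ages and plain relative value iteration may oscillate on such periodic chains. Function and parameter names are hypothetical.

```python
from typing import Callable, List, Tuple

Cost = Callable[[int, int], float]


def relative_value_iteration(cost: Cost, h_max: int, tau: float = 0.5,
                             tol: float = 1e-10, max_iter: int = 100_000
                             ) -> Tuple[List[float], List[int], float]:
    """Average-cost RVI for one content with state = age h in {0, ..., h_max}.

    Returns (differential values with v[0] == 0, policy with 1 = update,
    optimal average cost). Hypothetical sketch, not the paper's code.
    """
    ages = range(h_max + 1)

    def nxt(h: int, a: int) -> int:
        # Assumed dynamics: reset to 0 on update, else age by one slot (capped).
        return 0 if a == 1 else min(h + 1, h_max)

    def action_values(v: List[float]) -> List[List[float]]:
        # Differential action values: one-slot cost plus value of the next age.
        return [[cost(h, a) + v[nxt(h, a)] for a in (0, 1)] for h in ages]

    v = [0.0] * (h_max + 1)
    for _ in range(max_iter):
        tv = [min(q) for q in action_values(v)]
        # Aperiodicity transform, then normalize by the reference state 0.
        mixed = [(1 - tau) * v[h] + tau * tv[h] for h in ages]
        new_v = [x - mixed[0] for x in mixed]
        done = max(abs(x - y) for x, y in zip(new_v, v)) < tol
        v = new_v
        if done:
            break

    q = action_values(v)
    policy = [0 if q[h][0] <= q[h][1] else 1 for h in ages]  # ties: do not update
    gain = min(q[0]) - v[0]  # at convergence, v + gain = T v
    return v, policy, gain


# Toy instance: expected redirection cost grows linearly with age, update cost 2.5.
c_bar = [0.0, 1.0, 2.0, 3.0]
v, policy, g = relative_value_iteration(lambda h, a: c_bar[h] + 2.5 * a, h_max=3)
print(policy, round(g, 6))  # [0, 1, 1, 1] 1.75
```

In this toy instance, the returned differential values are non-decreasing in the age and the policy updates from age 1 onward, which is the structure established by Lemma 1 and Theorem 1.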
This decomposition enables distributed policies with respect to the individual contents. We will show that, for each content, the optimal policy is a threshold policy on the age of the content, i.e., it is optimal to update the content once its age reaches a certain threshold. The following lemma establishes the key property used to prove the structure of the optimal policy.
Lemma 1.
The differential value function for all is non-decreasing with respect to the age of the content, .
Proof.
We use the value iteration algorithm to prove the lemma. We start from an arbitrary differential value function and obtain the -step differential value function as follows:
[TABLE]
Note that . The proof is by induction. For , which is the minimum of two non-decreasing functions and is thus itself a non-decreasing function in . Assume that the lemma holds for . Then, according to (15), is also a non-decreasing function with respect to . Letting , we conclude that is also a non-decreasing function in , which completes the proof. ∎
Intuitively, the lemma follows from the non-decreasing property of the cost functions and the Bellman equations in (14).
Theorem 1.
For each content the optimal policy minimizing the average cost is a threshold policy.
Proof.
The monotonicity of the differential value functions proves the optimality of the threshold policy [19, Chapter 7]. Intuitively, due to the non-decreasing property of the , it becomes optimal to update the content at some age; since the differential value function is non-decreasing, updating at a larger or smaller age cannot yield a smaller average cost. ∎
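The step from Lemma 1 to Theorem 1 can be made explicit under one assumption that is consistent with the action value $q_n(H_n)$ used in Section IV: updating a content adds the backhaul cost $\mathcal{E}_n$ to the expected one-slot cost, i.e., $\bar{C}_n(h,1)=\bar{C}_n(h,0)+\mathcal{E}_n$. Writing $V_n(h)$ for the differential value function of content $n$ at age $h$, $Q_n(h,a)$ for its action-value function and $H_{\max}$ for the obsolescence age (notation assumed here), and using the age dynamics of Section II-A:

```latex
\begin{aligned}
Q_n(h,0) &= \bar{C}_n(h,0) + V_n\big(\min\{h+1,\,H_{\max}\}\big),\\
Q_n(h,1) &= \bar{C}_n(h,0) + \mathcal{E}_n + V_n(0),\\
\Delta_n(h) &\triangleq Q_n(h,1) - Q_n(h,0)
  = \mathcal{E}_n + V_n(0) - V_n\big(\min\{h+1,\,H_{\max}\}\big).
\end{aligned}
```

Because $V_n$ is non-decreasing (Lemma 1), $\Delta_n(h)$ is non-increasing in $h$. Updating is optimal exactly when $\Delta_n(h)\le 0$, so once updating is optimal at some age it remains optimal at every larger age, and the optimal policy updates content $n$ if and only if $h \ge H_n^\star = \min\{h : \Delta_n(h)\le 0\}$.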
IV Learning Content Popularity and Age Tolerance
In the previous section, we first showed that the problem is separable, so that the optimization can be performed for each content separately; second, we proved that the cost-minimizing policy is a threshold policy. Hence, by monitoring the age of each content individually, the SBS only needs to optimize a single threshold per content. Under the threshold policy, the age of a content increases linearly until it reaches the threshold, at which point the content is updated and its age is reset to zero. Thus, the minimum cost associated with content is the solution of:
[TABLE]
where
[TABLE]
Considering the linearity of the cost functions, the average cost optimization becomes:
[TABLE]
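When the redirection probabilities are known, this equivalent problem can be solved by enumerating thresholds. The sketch below assumes, consistently with the action value $q_n(H_n)$ defined below in this section, that the average cost of threshold H is the cost of one update cycle of H + 1 slots (ages 0, ..., H, plus one update) divided by the cycle length. `c_bar[h]` (expected redirection cost at age h) and `e` (update cost) are hypothetical names.

```python
from typing import Sequence, Tuple


def cycle_cost(c_bar: Sequence[float], e: float, H: int) -> float:
    """Average cost per slot of the threshold-H policy: ages 0..H, then one update."""
    return (sum(c_bar[: H + 1]) + e) / (H + 1)


def best_threshold(c_bar: Sequence[float], e: float) -> Tuple[int, float]:
    """Exhaustive search over thresholds H = 0, ..., len(c_bar) - 1 (hypothetical helper)."""
    costs = [cycle_cost(c_bar, e, H) for H in range(len(c_bar))]
    h_star = min(range(len(costs)), key=costs.__getitem__)
    return h_star, costs[h_star]


print(best_threshold([0.0, 1.0, 2.0, 3.0], e=2.5))  # (1, 1.75)
```

Exhaustive search is cheap here because only one threshold per content has to be chosen; the difficulty addressed next is that `c_bar` is not known in practice.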
The equivalent optimization problem depends on the redirection probabilities , which are unknown. Hence, in the following, we resort to reinforcement learning methods to infer the redirection probabilities.
We consider a sequential learning framework in which, at each iteration of the learning algorithm, the SBS must choose a threshold . After choosing the threshold, the SBS observes a random cost associated with its decision:
[TABLE]
The learning algorithm should provide the SBS with a method to adjust its strategy by observing the outcomes of its decisions. This resembles the well-known multi-armed bandit (MAB) problem. In MAB, each action (here, a threshold) has an expected return, called the value of that action. We denote the true value of action $H_n$ by $q_n(H_n)=\frac{1}{H_n+1}\Big(\sum_{h=0}^{H_n}\bar{C}_n(h,0)+\mathcal{E}_n\Big)$. If the agent (i.e., the SBS) knows these values, it can simply choose the action with the minimum expected cost. Thus, we need an algorithm that can learn the values. A well-studied algorithm for learning them is the ε-greedy algorithm [20], which starts from an arbitrary estimate of the action values and updates these estimates by interacting with the environment, eventually converging to the true values. The two critical phases of the ε-greedy algorithm are exploitation and exploration. When the agent chooses an action greedily based on its current estimates, it exploits what it already knows; when it instead chooses an action completely at random, regardless of the estimates, it explores. Exploitation is necessary to act upon experience, while exploration improves the estimates and facilitates convergence to the true action values. The ε-greedy algorithm is presented in Algorithm 1.
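A generic sketch of the ε-greedy rule applied to threshold selection follows; it is an illustration of the technique, not the paper's Algorithm 1 verbatim. It assumes a hypothetical `sample_cost(H)` callable returning the random cost observed after using threshold H, keeps incremental sample-average estimates of each action value, and, since costs are minimized, exploits by picking the threshold with the smallest estimate. Estimates start at zero, which is optimistic for a cost-minimization problem.

```python
import random
from typing import Callable, List, Optional


def epsilon_greedy(sample_cost: Callable[[int], float], num_thresholds: int,
                   epsilon: float, steps: int, rng: random.Random,
                   alpha: Optional[float] = None) -> List[float]:
    """Estimate the expected cost of each threshold H = 0..num_thresholds-1.

    alpha=None uses sample averages; a constant alpha in (0, 1] gives
    exponentially weighted estimates that can track a changing environment.
    Hypothetical sketch, not the paper's code.
    """
    q = [0.0] * num_thresholds   # cost estimates, initialized to zero
    n = [0] * num_thresholds     # number of times each threshold was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            h = rng.randrange(num_thresholds)                    # explore
        else:
            h = min(range(num_thresholds), key=q.__getitem__)    # exploit (min cost)
        c = sample_cost(h)
        n[h] += 1
        step = 1.0 / n[h] if alpha is None else alpha
        q[h] += step * (c - q[h])                                # incremental update
    return q


rng = random.Random(0)
true_cost = [2.5, 1.75, 11 / 6, 2.125]   # illustrative values, not from the paper
q = epsilon_greedy(lambda h: true_cost[h] + rng.gauss(0, 0.05), 4, 0.1, 5000, rng)
print(min(range(4), key=q.__getitem__))  # index of the learned best threshold
```

The optional constant step size is the usual choice when the environment may change, as in the time-varying experiment below.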
V Numerical Results
In this section, we evaluate the performance of the ε-greedy algorithm in finding the optimal thresholds that minimize the total cost of the system. Owing to separability, we consider only one content; the learning process is the same for all contents. The popularity of the contents is modeled by a Zipf distribution with an exponent of . We assume that, on average, a given user becomes dissatisfied with a content of age with probability . Users arrive at the system according to a Poisson distribution with rate users per time slot. The cost of redirecting a user to the MBS is , and the backhaul cost is assumed to be . In Figure 2, we illustrate the performance of the ε-greedy algorithm for , adopting the average regret as the metric. The regret of a learning algorithm is defined as the difference between the cost achieved by the learner and the optimal cost. Here, we obtain the optimal cost by assuming that is known and numerically solving (16). Note that the estimates of the action values, , are initialized to zero for all . Note also that the greedy algorithm appears to perform better even though it always exploits. This is not too surprising, considering that the action values are initialized optimistically (i.e., the agent initially believes all costs to be zero). At the beginning, i.e., , the greedy algorithm believes that every action returns a value of zero. However, after trying an action, it is disappointed by the observed cost and moves on to try the remaining actions; in other words, the initial estimates are biased, and this bias drives exploration. Optimistic initialization is thus a simple method to incentivize exploration. However, its effect is temporary, since it acts only at the beginning; therefore, this method quickly fails in non-stationary environments.
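The two quantities used in this evaluation can be computed as in the sketch below. It assumes the standard Zipf law, in which the content of popularity rank k has weight proportional to k to the power of minus the exponent, and it takes the average regret after t rounds to be the running mean of the per-round gap between the learner's cost and the optimal cost. Function names are hypothetical, and the values in the usage lines are illustrative rather than the paper's parameters.

```python
from itertools import accumulate
from typing import List, Sequence


def zipf_popularity(num_contents: int, gamma: float) -> List[float]:
    """Zipf profile: the k-th most popular content (k = 1, 2, ...) has weight k**-gamma."""
    weights = [k ** -gamma for k in range(1, num_contents + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def average_regret(costs: Sequence[float], optimal_cost: float) -> List[float]:
    """Running average of (incurred cost - optimal cost) after each round."""
    cumulative = accumulate(c - optimal_cost for c in costs)
    return [r / t for t, r in enumerate(cumulative, start=1)]


print(zipf_popularity(3, 1.0))
print(average_regret([2.5, 1.75, 1.75, 1.75], 1.75))  # [0.75, 0.375, 0.25, 0.1875]
```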
To show this, we also study the performance of the ε-greedy algorithm in a time-varying environment. We assume that at the backhaul cost decreases to a value of . The results are depicted in Figure 3. The greedy algorithm cannot adapt to the non-stationary environment and gets stuck at a suboptimal threshold. Meanwhile, for and , the ε-greedy algorithm is able to adapt to the environment thanks to its exploration. A larger value of ε results in more exploration; thus, the average regret of the -greedy algorithm decays faster than that of the -greedy algorithm. However, note that the -greedy algorithm is expected to remain at least away from the optimal cost, since it keeps exploring. Thus, there is a trade-off between the rate of convergence and the value to which the algorithm converges.
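The last trade-off can be quantified under a simple assumption: once the estimates have converged, an ε-greedy learner that explores uniformly over all thresholds (including the best one) plays the best threshold with probability 1 − ε and a uniformly random threshold with probability ε. Its expected per-round cost is then (1 − ε) times the optimal cost plus ε times the mean cost over all thresholds, so its asymptotic regret is ε times the gap between that mean and the optimal cost, which does not vanish for a fixed ε. The sketch below only illustrates this arithmetic; the function name is hypothetical and the costs are illustrative.

```python
from typing import Sequence


def epsilon_greedy_regret_floor(true_costs: Sequence[float], epsilon: float) -> float:
    """Asymptotic per-round regret of epsilon-greedy with converged estimates.

    Expected cost = (1 - epsilon) * best + epsilon * mean, so the regret is
    epsilon * (mean - best). Hypothetical helper, not from the paper.
    """
    best = min(true_costs)
    mean = sum(true_costs) / len(true_costs)
    return epsilon * (mean - best)


# Less exploration lowers the floor, at the price of slower adaptation.
print(epsilon_greedy_regret_floor([1.0, 2.0, 3.0], 0.1),
      epsilon_greedy_regret_floor([1.0, 2.0, 3.0], 0.01))  # 0.1 0.01
```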
VI Conclusion
In this work, we developed a framework for a cost minimization problem in a dynamic content caching setting. Specifically, we aimed at striking a balance between the number of unsatisfied users who are redirected to the MBS and the cost of accessing the backhaul link by the SBS to update the dynamic contents. We formulated the cost minimization problem as an MDP and showed that the problem is separable with respect to the contents. Subsequently, we proved that a threshold policy in the age of the contents is optimal. To find the optimal thresholds, we resorted to learning algorithms, since users' preferences are not known and vary with each content. To that end, we cast the problem in the MAB framework and, through numerical results, showed that it is possible to make the expected regret of the learning algorithm arbitrarily close to zero. The learning algorithm also shows adaptability to non-stationary settings. As future work, we aim to investigate the system with non-linear cost functions, under which the problem is no longer separable and new solution methods need to be investigated. We will also extend this work to heterogeneous cellular network architectures with energy harvesting SBSs.
References
- [1] K. Poularakis, G. Iosifidis, and L. Tassiulas, “Approximation Algorithms for Mobile Data Caching in Small Cell Networks,” IEEE Transactions on Communications, vol. 62, no. 10, pp. 3665–3677, Oct. 2014.
- [2] W. Jiang, G. Feng, and S. Qin, “Optimal Cooperative Content Caching and Delivery Policy for Heterogeneous Cellular Networks,” IEEE Transactions on Mobile Computing, vol. 16, no. 5, pp. 1382–1393, May 2017.
- [3] M. Dehghan, B. Jiang, A. Seetharam, T. He, T. Salonidis, J. Kurose, D. Towsley, and R. Sitaraman, “On the complexity of optimal request routing and content caching in heterogeneous cache networks,” IEEE/ACM Transactions on Networking, vol. 25, no. 3, pp. 1635–1648, June 2017.
- [4] K. Shanmugam, N. Golrezaei, A. G. Dimakis, A. F. Molisch, and G. Caire, “FemtoCaching: Wireless Content Delivery Through Distributed Caching Helpers,” IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 8402–8413, Dec. 2013.
- [5] E. Ozfatura and D. Gündüz, “Mobility and popularity-aware coded small-cell caching,” IEEE Communications Letters, vol. 22, no. 2, pp. 288–291, Feb. 2018.
- [6] X. Cheng, C. Dale, and J. Liu, “Statistics and social network of YouTube videos,” in 2008 16th International Workshop on Quality of Service, June 2008, pp. 229–238.
- [7] G. Ma, Z. Wang, M. Zhang, J. Ye, M. Chen, and W. Zhu, “Understanding performance of edge content caching for mobile video streaming,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 5, pp. 1076–1089, May 2017.
- [8] P. Blasco and D. Gunduz, “Learning-based optimization of cache content in a small cell base station,” in 2014 IEEE International Conference on Communications (ICC), June 2014, pp. 1897–1903.
