Joint Downlink Scheduling for File Placement and Delivery in Cache-Assisted Wireless Networks with Finite File Lifetime

Bojie Lv; Lexiang Huang; Rui Wang

arXiv:1902.09529·cs.IT·February 27, 2019


TL;DR

This paper develops a low-complexity scheduling policy for cache-assisted wireless networks that optimizes downlink transmission, reducing resource use by combining reactive and proactive multicasting based on user request modeling.

Contribution

It introduces a novel dynamic programming approach with a low-complexity approximation and a learning algorithm for unknown user distributions, enhancing downlink scheduling efficiency.

Findings

01. Reactive multicast policy reduces base station resource consumption.

02. Proactive multicast further improves network performance.

03. Proposed methods outperform baseline strategies in simulations.


Equations

$$\Pr(\text{Request Number}=n)=\frac{(\lambda_{f}T_{rem})^{n}}{n!}e^{-\lambda_{f}T_{rem}},$$

$$\Pr(\text{Request Number in $N$ frames}=n)$$

$$R_{f,n,s}=N_{f,n,s}\mathbb{E}_{\mathbf{h}_{f,n,s},I_{f,n}}\bigg[\alpha\log_{2}\bigg(1+\frac{\|\mathbf{h}_{f,n,s}\|^{2}P_{f,n,s}}{N_{T}(\sigma_{z}^{2}+I_{f,n})}\bigg)\bigg],$$

$$R_{f,n,s}^{c}=N_{f,n,s}\mathbb{E}_{\mathbf{h}_{f,n,s}^{c},I_{c}}\bigg[\alpha\log_{2}\bigg(1+\frac{\|\mathbf{h}_{f,n,s}^{c}\|^{2}P_{f,n,s}}{N_{T}(\sigma_{z}^{2}+I_{c})}\bigg)\bigg],$$

$$R_{k}^{c}=N_{k}\mathbb{E}_{\mathbf{h}_{k}^{c},I_{c}}\bigg[\alpha\log_{2}\bigg(1+\frac{\|\mathbf{h}_{k}^{c}\|^{2}P_{k}}{N_{T}(\sigma_{z}^{2}+I_{c})}\bigg)\bigg],$$

$$\mathcal{J}_{f,n}=\begin{cases}\bigcup_{\{s\mid\mathcal{B}_{f,s}^{c}=0\}}\{s\}, & \text{when }\mathbf{l}_{f,n}\in\mathcal{C}_{c}\text{ and }c\neq 0,\\ \{1,2,...,N_{f}\}, & \text{when }\mathbf{l}_{f,n}\in\mathcal{C}_{0},\end{cases}$$

$$R_{f,n,s}\geq R_{f}^{I},\ \forall s\in\mathcal{J}_{f,n}.$$

$$R_{f,n,s}^{c}\geq R_{f}^{I},\ \forall c\in c_{f,n,s},\ s\in\mathcal{J}_{f,n}.$$

$$P_{f,n,s}\leq P_{B},\ \forall s\in\mathcal{J}_{f,n},\ n.$$

$$g_{f,n,s}(P_{f,n,s},N_{f,n,s})=I(\mathbf{l}_{f,n}\notin\mathcal{C}_{f,n}^{s})\times(P_{f,n,s}N_{f,n,s}+wN_{f,n,s}),$$

$$\overline{g}_{f}(\{\Omega_{f,n}\mid\forall n\})=\sum_{N}\mathbb{E}_{\eta,\rho,\mathcal{T}}\left[\frac{(\lambda_{f}T_{f})^{N}}{N!}e^{-\lambda_{f}T_{f}}\sum_{n=1}^{N}\sum_{s=1}^{N_{f}}g_{f,n,s}(\Omega_{f,n})\bigg|S_{f,1}\right],$$

$$\min_{\{\Omega_{f,n}\mid\forall f,n\}}$$

$$\overline{g}_{f}^{*}=$$

$$W(S_{f},T)=\min_{\{\Omega_{f,k}\mid\forall k=n+1,...\}}\sum_{N}\mathbb{E}_{\eta,\rho,\mathcal{T}}\left[\frac{(\lambda_{f}T)^{N}}{N!}e^{-\lambda_{f}T}\sum_{n=1}^{N}\sum_{s=1}^{N_{f}}g_{f,n,s}(\Omega_{f,n})\bigg|S_{f}\right]$$

$$\Omega_{f,n}^{\dagger}(S_{f,n},T_{f,n}^{f})=\arg\min_{\Omega_{f,n}}\bigg\{\sum_{s}g_{f,n,s}(\Omega_{f,n})+\mathbb{E}_{S_{f,n+1}}[W(S_{f,n+1},T_{f,n}^{f})\mid S_{f,n}]\bigg\}$$

$$V_{N_{R}-n+1}(S_{f,n})=\min_{\Omega_{f,n}(S_{f,n})}\bigg\{\sum_{s}g_{f,n,s}(\Omega_{f,n})+\sum_{S_{f,n+1}}V_{N_{R}-n}(S_{f,n+1})\Pr(S_{f,n+1}\mid S_{f,n},\Omega_{f,n})\bigg\},\ \forall S_{f,n},$$

$$\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})=\min_{\Omega_{f,n}(\widetilde{S}_{f,n})}\mathbb{E}_{\eta,\rho}\bigg\{\sum_{s}g_{f,n,s}(\Omega_{f,n})+\sum_{\widetilde{S}_{f,n+1}}\widetilde{V}_{N_{R}-n}(\widetilde{S}_{f,n+1})\Pr(\widetilde{S}_{f,n+1}\mid S_{f,n},\Omega_{f,n})\bigg\},\ \forall\widetilde{S}_{f,n},$$

$$\Omega_{f,n}^{*}(S_{f,n},T_{f,n}^{f})=\arg\min_{\Omega_{f,n}}\bigg\{\sum_{s}g_{f,n,s}(\Omega_{f,n})+\sum_{N,S_{f,n+1}}\frac{(\lambda_{f}T_{f,n}^{f})^{N}}{N!}e^{-\lambda_{f}T_{f,n}^{f}}V_{N}(S_{f,n+1})\Pr(S_{f,n+1}\mid S_{f,n},\Omega_{f,n})\bigg\},\ \forall S_{f,n},T_{f,n}^{f},$$

$$\Omega_{f,n}(S_{f,n},T_{f,n}^{f})=\{(P_{f,n,s},N_{f,n,s},c_{f,n,s})\mid\forall s\in\mathcal{J}_{f,n}\}.$$

$$\min_{\{\Omega_{f,n}\mid\forall n\}}$$

$$\Pr(S_{f,n+1}\mid S_{f,n},\Omega_{f,n})=\cdots\times I\bigg[\{\mathcal{B}_{f,s}^{c}(n+1)\mid\forall s,c\}\bigg],$$

$$\mathbb{E}_{S_{f,n+1}}[W(S_{f,n+1},T_{f,n}^{f})\mid S_{f,n},\Omega_{f,n}]\geq\cdots$$

$$\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})\approx\underbrace{\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})+\sum_{\{(i,s)\mid\forall\mathcal{B}^{i}_{f,s}(\widetilde{S}_{f,n})=0\}}\bigg(\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})-\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})\bigg)}_{\text{denote as }\widehat{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})},$$

$$V_{N_{R}-n+1}(T)$$

$$V_{N_{R}-n+1}(S_{f}^{i,s})=\cdots+\mathbb{E}_{\eta,\rho}\Big[\min\big\{\overline{G}_{n}^{2}(S_{f}^{i,s}),\overline{G}_{n}^{3}(S_{f}^{i,s})\big\}\,\Big|\,R_{f,n,s}>R_{f,n,s}^{i}\Big]\Pr(R_{f,n,s}>R_{f,n,s}^{i}).$$

$$\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{k,n})\approx\frac{N_{k}R_{k}^{I}}{N_{f}R_{f}^{I}}\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})+\sum_{\forall i}\frac{R_{k}^{I}}{R_{f}^{I}}\bigg(\sum_{\forall s}I[\mathcal{B}^{i}_{k,s}(\widetilde{S}_{k,n})=0]\bigg)\times\bigg(\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,1})-\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})\bigg).$$

$$V_{N_{R}-n+1}(S_{f}^{*})=\cdots\times\mathbb{E}_{\rho,\eta}\bigg[\sum_{s}\min_{\substack{P_{f,n,s}\\ N_{f,n,s}}}P_{f,n,s}N_{f,n,s}+wN_{f,n,s}\,\bigg|\,\mathbf{l}_{f,n}\notin\mathcal{C}_{f,n}^{s}\bigg]$$


Taxonomy

Topics: Caching and Content Delivery · Opportunistic and Delay-Tolerant Networks · Cooperative Communication and Network Coding

Full text

Joint Downlink Scheduling for File Placement and Delivery in Cache-Assisted Wireless Networks with Finite File Lifetime

Bojie Lv

Lexiang Huang

and Rui Wang

Manuscript received June 24, 2018; revised December 3, 2018 and February 16, 2019; accepted February 18, 2019. This work was supported in part by the National Natural Science Foundation of China under grant 61771232, the Natural Science Foundation of Guangdong Province of China under grant 2017A030313335 and the Shenzhen Science and Technology Innovation Committee under Grant JCYJ20160331115457945. The associate editor coordinating the review of this paper and approving it for publication was Lawrence Ong. (Corresponding author: Rui Wang.) Bojie Lv and Rui Wang are with the Department of Electrical and Electronic Engineering, The Southern University of Science and Technology, China, and also with the PCL Research Center of Networks and Communications, Peng Cheng Laboratory, China, Email: {[email protected], [email protected]}. Lexiang Huang is with the Department of Electrical and Electronic Engineering, The Southern University of Science and Technology, China, Email: {[email protected]}. Part of this work has been accepted in IEEE ICC 2018 [1]. We have extended the conference paper by including the learning-based algorithm in Section IV-C, the proactive scheduling algorithm in Section V, and more illustrative simulation results.

Abstract

In this paper, downlink transmission scheduling of popular files is optimized with the assistance of wireless cache nodes. Specifically, the requests of each file, which is further divided into a number of segments, are modeled as a Poisson point process within its finite lifetime. Two downlink transmission modes are considered: (1) the base station reactively multicasts the file segments to the requesting users and selected cache nodes; (2) the base station proactively multicasts some file segments to the selected cache nodes without requests. The cache nodes with decoded file segments can help to offload the traffic via other spectrum. Without the proactive multicast, we formulate the downlink transmission resource minimization as a dynamic programming problem with random stage number, which can be approximated via a finite-horizon Markov decision process (MDP) with fixed stage number. To address the prohibitively huge state space, we propose a low-complexity scheduling policy by linearly approximating the value functions of the MDP, where the bound on the approximation error is derived. Moreover, we propose a learning-based algorithm to evaluate the approximated value functions for unknown geographical distribution of requesting users. Finally, given the above reactive multicast policy, a proactive multicast policy is introduced to exploit the temporal diversity of shadowing effect. It is shown by simulation that the proposed low-complexity reactive multicast policy can significantly reduce the resource consumption at the base station, and the proactive multicast policy can further improve the performance.

I Introduction

Caching is a promising technology to improve the transmission efficiency of wireless networks by exploiting the multiple transmissions of popular files [2, 3]. In this paper, we consider a flexible deployment scenario where there is no wired connection or dedicated spectrum between the base station (BS) and cache nodes. Thus the cache nodes receive popular files via downlink multicast, sharing the same transmission resources as ordinary users. Moreover, the timeliness of popular files, as mentioned in [4], is also considered in transmission design.

I-A Related Works

There have been a number of works on the optimization of file placement with limited cache size. For example, it is shown in [5, 6] that cache nodes should cache the most popular files if each user has access to exactly one cache node. The papers [7, 8] showed that caching files randomly with optimized probabilities is better than storing the most popular files when each user can be served by multiple cache nodes. In [9], the authors proposed a mobility-aware file placement policy to improve the data offloading rate. Moreover, there are also some works on coded caching scheme design to exploit multicast transmissions [10, 11]. With cached files, the authors in [12] formulated the joint minimization of the average delay and power consumption at the BS as a stochastic optimization problem, and the fetching costs are added into the cost function in [13]. The authors in [4] designed a dynamic file placement algorithm via timely estimation of file content popularity. In most of the above works, the cost of file placement at the cache nodes is not taken into consideration, as it is assumed to be completed before the phase of file delivery to the users. For some types of popular files, however, there may not be sufficient time for file placement before users' accesses. For instance, a great number of news clips are posted to websites in the daytime, and there are no off-peak hours for file placement at the cache nodes. Hence, it is also interesting to consider the joint scheduling of file placement and delivery.

There are also some works on the joint scheduling design of caching and downlink file transmission. For example, a file placement and delivery framework for heterogeneous OFDM networks was investigated in [14], where the small BSs can cache the popular files and the overall throughput was maximized in each frame via a joint scheduling algorithm. In [15], an optimal caching and user association policy was proposed to minimize the latency in a cache-enabled heterogeneous network with wireless backhaul. In the above works, the files are delivered to small BSs via dedicated backhaul links, i.e., there is no resource competition with the downlink transmission. Moreover, since the cache status is updated over time, the relation between the scheduling decisions in different slots should also be exploited, which is not considered in the above works. If there is no dedicated link or period for file placement at the cache nodes, the file placement and delivery can be simultaneously scheduled in a multicast manner [16]. This raises a trade-off between the transmission resource consumption and the file placement. For example, if more resources are spent in downlink multicast, files will be cached in more devices, which may save downlink resources in future transmissions. As a result, a joint optimization of file placement and delivery with shared transmission resources, accounting for the total transmission resource consumption, becomes necessary, and the method of dynamic programming can be utilized.

In fact, dynamic programming via Markov decision process (MDP) has been considered in delay-aware resource allocation of wireless systems [16, 17, 18, 19, 20]. For example, the infinite-horizon MDP was used to optimize the cellular uplink [17, 19] and downlink transmissions [16], and relay networks [20], where the average transmission delay is either minimized or constrained. Moreover, low-complexity algorithm design is usually considered in the above works to avoid the curse of dimensionality [21]. However, the popular files to be buffered at the cache nodes usually have a finite lifetime, so the infinite-horizon MDP may no longer be a suitable model. Nevertheless, the MDP with finite stages is usually more complicated [22], because the optimal policy depends not only on the system state but also on the stage index. To the best of our knowledge, low-complexity algorithm design and analysis for finite-horizon MDPs is still an open issue.

I-B Our Contributions

In this paper, we consider the scheduling of downlink file transmission with the assistance of wireless cache nodes. Specifically, the requests of each file are modeled as a Poisson point process (PPP) within its lifetime, and two downlink transmission modes are considered: (1) the BS reactively multicasts file segments to the requesting users and selected cache nodes; (2) the BS proactively multicasts some file segments to the selected cache nodes without requests from users. With the decoded file segments, cache nodes can offload the traffic from the BS and serve the users within their coverage area via a spectrum different from the downlink (e.g., Wi-Fi), as in [23, 24, 25]. The main contributions of this work are summarized below:

• With reactive multicast only, we formulate the minimization of a weighted sum of multicast transmission energy and symbol number for one file within its lifetime as a dynamic programming problem with random stage number. The problem does not follow the standard forms of MDP, and it is difficult to find the optimal solution. We first propose to approximate and bound it via a finite-horizon MDP with fixed stage number. Then, we propose a novel linear approximation method on the value functions of the MDP so that the exponential complexity (i.e., curse of dimensionality) can be reduced to linear. With the knowledge of the spatial distribution of requesting users, the approximated value functions can be calculated via analytical expressions; otherwise, a learning algorithm is introduced to evaluate the approximated value functions when the distribution of requesting users is unknown.

• The approximation error of a finite-horizon MDP is usually difficult to analyze, and we shed some light on this open issue for our problem. Specifically, we first derive a tight upper bound on the gap between the true value functions and the approximated ones. Then we further derive an analytical lower bound on the optimal (minimum) average transmission cost at the BS.

• Given the above scheduling policy of reactive multicast, a per-stage optimization approach for proactive multicast is proposed to further suppress the average transmission cost at the BS.

It is shown by simulation that, compared with the baseline schemes, the proposed low-complexity algorithm based on approximated value functions can significantly reduce the resource consumption at the BS, and the proactive multicasting policy can further improve the performance.

II System Model

In this section, we introduce the network model for the downlink file transmission with the assistance of wireless cache nodes, and the physical-layer model for the file placement (i.e., transmit files to the cache nodes) and delivery (i.e., transmit files to the requesting users).

II-A Network Model

As illustrated in Fig. 1, we consider the downlink file transmission in a cell with one BS and $N_{C}$ single-antenna cache nodes. There are $N_{T}$ antennas at the BS. Let $\mathcal{C}\subset\mathbb{R}^{2}$ be the service area of the cell, $\mathcal{C}_{c}$ ($\forall c=1,2,...,N_{C}$) be the service region of the $c$-th cache node and $\mathcal{C}_{0}\triangleq\mathcal{C}-\mathcal{C}^{*}$ be the region not served by any cache node, where $\mathcal{C}^{*}\triangleq\mathcal{C}_{1}\cup\mathcal{C}_{2}\cup...\cup\mathcal{C}_{N_{C}}$. In this paper, we consider cache node deployment without overlapping, thus assuming $\mathcal{C}_{i}\cap\mathcal{C}_{j}=\emptyset$ for $\forall i\neq j$. A database with popular files is accessible to the BS. In order to capture the temporal dynamics of files' popularity, it is assumed that the $f$-th file ($f=1,2,3,...$) is accessible at the database from the time instant $t_{f}$ and remains popular within a finite lifetime $T_{f}$. We consider the delivery of the files for the requests raised within their lifetimes (e.g., the lifetime of the $f$-th file is $[t_{f},t_{f}+T_{f}]$). The files are not considered for caching after their lifetime, as their popularity will drop. It is assumed that the $f$-th ($f=1,2,3,...$) file consists of $N_{f}$ segments. Each segment contains $R_{f}^{I}$ information bits and is encoded separately. Within the lifetime of each file, the locations of the requesting users are independent and identically distributed (i.i.d.) in the cell according to a certain spatial distribution with probability density function $\mathcal{F}:\mathcal{A}\rightarrow[0,1]$, $\forall\mathcal{A}\subset\mathcal{C}$. It is assumed that the requesting users' locations are quasi-static during the file transmission.

The requests on the $f$-th file ($\forall f$) within its lifetime are modeled as a one-dimensional Poisson point process (PPP) with intensity $\lambda_{f}$. Hence the probability mass function (PMF) of the number of remaining requests from the time instant $t\in[t_{f},t_{f}+T_{f}]$ is given by

$$\Pr(\text{Request Number}=n)=\frac{(\lambda_{f}T_{rem})^{n}}{n!}e^{-\lambda_{f}T_{rem}},\tag{1}$$

where $T_{rem}=t_{f}+T_{f}-t$ .
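As a minimal numerical sketch of this request model, the Poisson PMF with mean $\lambda_{f}T_{rem}$ can be evaluated directly. The function name and the numbers used (0.5 requests per minute, 10 minutes of remaining lifetime) are hypothetical and chosen only for illustration.

```python
import math


def remaining_request_pmf(n, lam_f, t_rem):
    """Probability of exactly n further requests in the remaining lifetime.

    Poisson PMF with mean lam_f * t_rem, where lam_f is the request
    intensity and t_rem = t_f + T_f - t is the remaining lifetime.
    """
    mean = lam_f * t_rem
    return mean ** n / math.factorial(n) * math.exp(-mean)


# Hypothetical values: intensity 0.5 requests/minute, 10 minutes left.
pmf = [remaining_request_pmf(n, 0.5, 10.0) for n in range(60)]
total_mass = sum(pmf)                                      # close to 1
expected_requests = sum(n * p for n, p in enumerate(pmf))  # close to 5
```

The expected number of remaining requests shrinks linearly as $t$ approaches the end of the lifetime, which is why the value of caching a segment also decays over time.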

Remark 1 (PPP File Request Model).

The PPP has been widely used to model the random phone call arrivals at an exchange. Alternatively, in most of the existing literature [26, 6, 15], the popularity of files is characterized by the probability of access. The equivalence between the two models is elaborated below. Suppose that there is one file request in each frame with probability $\beta\in[0,1]$. Given one file request arrival, the $f$-th file is requested with probability $p_{f}$. $\{p_{f}|\forall f\}$ can follow the Zipf distribution with $\sum_{f}p_{f}=1$. Then the probability mass function (PMF) of the request number of the $f$-th file within $N$ frames is given by

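$$\Pr(\text{Request Number in $N$ frames}=n)=\binom{N}{n}(\beta p_{f})^{n}(1-\beta p_{f})^{N-n}\approx\frac{(N\beta p_{f})^{n}}{n!}e^{-N\beta p_{f}},$$

where the approximation holds for sufficiently large $N$ and small $\beta p_{f}$.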

Since $N\beta p_{f}$ is analogous to $\lambda_{f}T_{rem}$ in (1), the Poisson arrival model in (1) is consistent with the file request model in the existing literature. Moreover, the condition of sufficiently large $N$ is satisfied, as the lifetime is significantly larger than the frame duration.

The files may not be cached at the cache nodes before their lifetime. The BS can either proactively multicast some file segments to some cache nodes without any requests, or reactively multicast the segments of one file to the requesting user and some cache nodes if one request is received. In the remainder of this paper, we shall refer to the proactive transmission from the BS to the cache nodes without requests as proactive multicast, and the reactive transmission from the BS to the requesting user and the cache nodes as reactive multicast. The proactive multicast is for the file placement, and the reactive multicast should jointly consider both file placement and delivery. In the file delivery, the segments of the requested file will be delivered from the nearby cache node to the requesting user if they have been cached before, and the remaining segments will be multicast from the BS. We shall refer to the transmission from the serving cache node to the requesting user as device-to-device (D2D) file delivery. It is assumed that the D2D links can use Wi-Fi, Bluetooth, or other air interfaces, which are not in the same spectrum as the downlink transmission [23, 25]. In this paper, we shall minimize the total transmission resource consumption at the BS, including the transmission energy and the number of transmission symbols, by offloading traffic to the cache nodes.
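The delivery rule described above can be sketched as a small helper that splits the segments of a requested file into those served by the covering cache node over D2D and those the BS must multicast. The data layout (`cache_status`, `serving_cache`) and the function name are assumptions made for illustration, not the paper's notation.

```python
def split_delivery(cache_status, serving_cache, num_segments):
    """Split segment indices into (from_cache, from_bs).

    cache_status[c][s] is True when cache node c has decoded segment s.
    serving_cache is the index of the cache node covering the requesting
    user, or None when the user lies in the uncovered region C_0.
    """
    all_segments = list(range(num_segments))
    if serving_cache is None:
        # No cache node covers the user: the BS sends every segment.
        return [], all_segments
    held = cache_status[serving_cache]
    from_cache = [s for s in all_segments if held[s]]
    from_bs = [s for s in all_segments if not held[s]]
    return from_cache, from_bs


# Hypothetical cache status for a file with three segments.
status = {1: [True, False, True], 2: [False, False, False]}
covered = split_delivery(status, 1, 3)       # user inside region of node 1
uncovered = split_delivery(status, None, 3)  # user in the uncovered region
```

Only the second list costs downlink resources at the BS, so every segment a cache node decodes during an earlier multicast removes that segment from later BS transmissions in its region.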

Remark 2 (Multi-Transmission Scheduling).

We consider the cache-enabled downlink file transmission where both file placement and delivery should be jointly scheduled. For example, if more transmission symbols and power are scheduled in the reactive multicast of one certain file, more cache nodes are able to decode it, which may suppress the downlink resource consumption in the following requests of this file. Since the requests arrive at random locations and time instants, this is a stochastic optimization problem, and it is difficult to determine the parameters for all transmissions (including the number of transmission symbols and power) at the very beginning of each file's lifetime. This issue will be addressed via the method of MDP in this paper.

II-B Downlink Physical Layer Model

In either proactive or reactive multicast, the space-time block code (STBC) with full diversity is used at the BS for two reasons: (1) there is no requirement on the channel state information at the transmitter (CSIT); (2) diversity gain can be achieved at all the receivers. In the reactive multicast, we refer to the user who raises the $n$-th request on the $f$-th file as the $(f,n)$-th user, and refer to the $s$-th segment of the $f$-th file as the $(f,s)$-th segment. Since the transmission time of one file segment is much larger than the channel coherence time of small-scale fading, it is assumed that the ergodic channel capacity, spanning all possible realizations of small-scale channel fading and inter-cell interference, can be achieved during one segment transmission. Let $\rho_{f,n}$ and $\rho_{c}$ be the pathloss from the BS to the $(f,n)$-th user and the $c$-th cache node respectively, $\eta_{f,n,s}$ and $\eta_{f,n,s}^{c}$ be the corresponding shadowing attenuation in the $n$-th transmission of the $(f,s)$-th segment, $P_{f,n,s}$ and $N_{f,n,s}$ be the downlink transmission power and the number of transmission symbols of the $s$-th file segment in response to the request of the $(f,n)$-th user. Following the capacity of full-diversity STBC in [27], the throughput achieved by the $(f,n)$-th downlink user in the reactive multicast of the $s$-th segment is given by

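$$R_{f,n,s}=N_{f,n,s}\mathbb{E}_{\mathbf{h}_{f,n,s},I_{f,n}}\bigg[\alpha\log_{2}\bigg(1+\frac{\|\mathbf{h}_{f,n,s}\|^{2}P_{f,n,s}}{N_{T}(\sigma_{z}^{2}+I_{f,n})}\bigg)\bigg],\tag{5}$$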

where $\alpha$ is the transmission rate of the adopted full-diversity STBC (for example, $\alpha=1$ and $N_{T}=2$ for the Alamouti code, whereas $\alpha$ is usually less than $1$ for $N_{T}>2$), $\sigma_{z}^{2}$ is the noise power, $I_{f,n}$ is the interference power from the neighbouring BSs (see Footnote 2), and $\mathbf{h}_{f,n,s}$ is the i.i.d. channel vector from the BS to the requesting user. Each element of $\mathbf{h}_{f,n,s}$ is complex Gaussian distributed with zero mean and variance $\rho_{f,n}\eta_{f,n,s}$. Note that the transmission of one segment may consume a large number of frames, and the channel vector $\mathbf{h}_{f,n,s}$ can differ from frame to frame. Since we consider the ergodic channel capacity, the randomness in small-scale fading is averaged out (similar to [28, 29, 30]). As a result, the $(f,n)$-th user can decode the $s$-th segment only when $R_{f,n,s}\geq R_{f}^{I}$.

Footnote 2: The exact value of $I_{f,n}$ depends on the scheduling strategies of the neighbouring co-channel BSs, which leads to complicated multi-cell joint scheduling. In order to decouple the scheduling among multiple cells, $I_{f,n}$ can be estimated by assuming all the interfering BSs are transmitting with peak power. Note that the BSs usually use the peak power to broadcast the control information at the head of each frame. One simple approach to measure $I_{f,n}$ is to schedule a few quiet symbols in the frame head, where the serving BS does not transmit any signal and the inter-cell interference at the frame head can be measured at the receivers.

Simultaneously in the reactive multicast, the throughput from the BS to the $c$-th cache node is given by

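$$R_{f,n,s}^{c}=N_{f,n,s}\mathbb{E}_{\mathbf{h}_{f,n,s}^{c},I_{c}}\bigg[\alpha\log_{2}\bigg(1+\frac{\|\mathbf{h}_{f,n,s}^{c}\|^{2}P_{f,n,s}}{N_{T}(\sigma_{z}^{2}+I_{c})}\bigg)\bigg],\tag{6}$$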

where $\mathbf{h}_{f,n,s}^{c}$ is the i.i.d. channel vector from the BS to the $c$-th cache node, and $I_{c}$ is the interference power from the neighbouring BSs, which can be estimated in a similar way to $I_{f,n}$. Each element of $\mathbf{h}_{f,n,s}^{c}$ is complex Gaussian distributed with zero mean and variance $\rho_{c}\eta_{f,n,s}^{c}$. The $c$-th cache node can decode the $(f,s)$-th segment only when $R_{f,n,s}^{c}\geq R_{f}^{I}$. The throughputs of (5) and (6) depend on the pathloss and shadowing of the corresponding links. Hence in the reactive multicast of the $s$-th segment, the requesting users and cache nodes can decode the segment after receiving different numbers of multicast symbols. By adjusting $P_{f,n,s}$ and $N_{f,n,s}$ in the physical layer, the BS can control the set of receiving cache nodes. In the next section, we shall formulate the optimization of $P_{f,n,s}$ and $N_{f,n,s}$ ($\forall f,s$) as an MDP with reactive multicast policy.
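To illustrate how the choice of power and symbol number determines which cache nodes decode a segment, the following sketch estimates the ergodic throughput by Monte Carlo and applies the decoding condition. It makes simplifying assumptions that are not in the paper: the interference is treated as a known constant folded into `noise_plus_interference`, the large-scale gain (pathloss times shadowing) is given as a single number per link, and all names and values are hypothetical.

```python
import math
import random


def ergodic_rate(num_symbols, power, gain, n_t=2, alpha=1.0,
                 noise_plus_interference=1.0, samples=20000, rng=None):
    """Monte Carlo estimate of N * E[alpha * log2(1 + ||h||^2 P / (N_T * (noise + I)))].

    Each entry of h is zero-mean complex Gaussian with variance `gain`,
    so each |h_i|^2 is exponentially distributed with mean `gain`.
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    acc = 0.0
    for _ in range(samples):
        h_norm_sq = sum(rng.expovariate(1.0 / gain) for _ in range(n_t))
        snr = h_norm_sq * power / (n_t * noise_plus_interference)
        acc += alpha * math.log2(1.0 + snr)
    return num_symbols * acc / samples


def decoding_cache_nodes(num_symbols, power, cache_gains, segment_bits):
    """Cache nodes whose throughput reaches the segment size in bits."""
    return [c for c, g in cache_gains.items()
            if ergodic_rate(num_symbols, power, g) >= segment_bits]


# Hypothetical setup: node 1 has a strong link, node 2 a weak one.
rate_strong = ergodic_rate(1000, 1.0, 1.0)
rate_weak = ergodic_rate(1000, 1.0, 0.01)
receivers = decoding_cache_nodes(1000, 1.0, {1: 1.0, 2: 0.01}, 500)
```

Raising either the power or the symbol number enlarges the receiving set, at the price of a higher transmission cost at the BS, which is the trade-off that the scheduling policy has to balance.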

If periodic proactive multicast is allowed, let $\eta^{c}_{k}$ be the shadowing attenuation from the BS to the $c$ -th cache node in the $k$ -th proactive multicast, $P_{k}$ and $N_{k}$ be the corresponding downlink transmission power and the number of transmission symbols. The throughput achieved by the $c$ -th cache node is given by

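$$R_{k}^{c}=N_{k}\mathbb{E}_{\mathbf{h}_{k}^{c},I_{c}}\bigg[\alpha\log_{2}\bigg(1+\frac{\|\mathbf{h}_{k}^{c}\|^{2}P_{k}}{N_{T}(\sigma_{z}^{2}+I_{c})}\bigg)\bigg],$$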

where $\mathbf{h}_{k}^{c}$ is the i.i.d. channel vector from the BS to the $c$-th cache node. Each element of $\mathbf{h}_{k}^{c}$ is complex Gaussian distributed with zero mean and variance $\rho_{c}\eta_{k}^{c}$. The selected file segment can be decoded at the $c$-th cache node when $R_{k}^{c}\geq R_{f}^{I}$. By adjusting $P_{k}$ and $N_{k}$ in the physical layer, the BS can control the set of receiving cache nodes in proactive multicast. In Section V, the allocation of $P_{k}$ and $N_{k}$ will be considered in the proactive multicast policy.

In both reactive and proactive multicasts, it is assumed that the downlink shadowing effect is quasi-static during the transmission period of one file segment, and different segment transmissions may experience different shadowing attenuations. This model characterizes the transmission of large files. For example, the playback time of videos may be several minutes, which is larger than the coherence time of the shadowing effect.

In this paper, we shall address the joint scheduling of proactive and reactive multicasts in two steps. In Sections III and IV, we first consider low-complexity sub-optimal scheduling designs for reactive multicast. Based on the established scheduling framework, the joint consideration of both proactive and reactive multicasts is addressed in Section V.

Remark 3 (Segment-Level Scheduling).

There is a two-time-scale scheduling structure in our problem. Take the reactive multicast as an example. $N_{f,n,s}$ multicast symbols for the $(f,s)$ -th file segment should be scheduled in a large number of frames, say from the $k$ -th frame to the $(k+m-1)$ -th frame. Let $M_{i}$ be the number of scheduled symbols in the $i$ -th frame ( $k\leq i\leq k+m-1$ ). $N_{f,n,s}$ and $\{M_{i}|k\leq i\leq k+m-1\}$ can be referred to as the segment-level and frame-level parameters respectively. Their relation is $\sum_{i=k}^{k+m-1}M_{i}=N_{f,n,s}$ .

Due to the fixed frame size, the scheduling of $\{M_{i}|k\leq i\leq k+m-1\}$ should jointly consider all the active unicasts, multicasts and broadcasts of the BS, i.e., it is constrained by the other transmissions. On the other hand, we assume there is no buffer overflow at the BS and $N_{f,n,s}$ multicast symbols will always be transmitted, i.e., there is no constraint on $N_{f,n,s}$ or on the maximum number of frames to finish the segment multicast. In this paper, we focus on the optimization at the segment level, and the scheduling at the frame level is outside the scope of this paper. However, it should be mentioned that, given $N_{f,n,s}$ at the segment level, the scheduling at the frame level can affect the transmission delay of the file segment. For example, if $M_{i}$ is small, a larger $m$ is required to finish the multicast.

III Finite-Horizon MDP Formulation for Reactive Multicast

In this section, the scheduling design for reactive multicast is first formulated as a dynamic programming problem. However, the optimal solution is difficult to obtain due to the random stage number and continuous state space. Hence, a solvable finite-horizon MDP with a fixed number of stages is introduced to approximate the dynamic programming problem.

III-A Dynamic Programming Problem Formulation

Without proactive multicast, the system state and the scheduling policy are defined as follows.

Definition 1 (System State).

When receiving the $n$ -th request on the $f$ -th file, the system status is uniquely specified by the following set of parameters $U_{f,n}\triangleq\left\{\mathcal{B}_{k,s}^{c},\rho_{f,n},\eta_{f,n,s},\eta_{f,n,s}^{c}|\forall k,c=1,...,N_{C};s=1,...,N_{f}\right\}=S_{f,n}\cup\left\{\mathcal{B}_{k,s}^{c}|\forall k\neq f,c,s\right\},$ where $S_{f,n}=\left\{\mathcal{B}_{f,s}^{c},\rho_{f,n},\eta_{f,n,s},\eta_{f,n,s}^{c}|\forall c,s\right\}$ , $\mathcal{B}_{k,s}^{c}=1$ means that the $(k,s)$ -th segment has been successfully decoded and stored at the $c$ -th cache node, and $\mathcal{B}_{k,s}^{c}=0$ means otherwise. $U_{f,n}$ and $S_{f,n}$ are referred to as the global and per-file system states of the $(f,n)$ -th reactive multicast, respectively.

Definition 2 (Reactive Multicast Policy).

Let $J_{f,n}$ be the set of segments which should be transmitted to the $(f,n)$ -th user via downlink, i.e.,

[TABLE]

where $\mathbf{l}_{f,n}$ is the location of the $(f,n)$ -th user. Let $T_{f,n}^{k}$ be the remaining lifetime of the $k$ -th file when receiving the $n$ -th file request on the $f$ -th file. The scheduling policy $\Omega_{f,n}$ ( $\forall f,n$ ) is a mapping from system state $U_{f,n}$ and the remaining lifetimes of all the files $\{T_{f,n}^{k}|\forall k\}$ to the transmission parameters $(P_{f,n,s},N_{f,n,s})$ ( $\forall s\in J_{f,n}$ ) and the set of receiving cache nodes for multicast $\mathbf{c}_{f,n,s}$ ( $\forall s\in J_{f,n}$ ), i.e. $\Omega_{f,n}(U_{f,n},\{T_{f,n}^{k}|\forall k\})=\{(P_{f,n,s},N_{f,n,s}),\mathbf{c}_{f,n,s}|\forall s\in J_{f,n}\}.$ Meanwhile, the following constraints should be satisfied.

•

Successful decoding of each file segment at the requesting user:

[TABLE]

•

Successful decoding of each file segment at the selected cache nodes:

[TABLE]

•

Peak power constraint:

[TABLE]

where $P_{B}$ is an instantaneous power constraint at the BS.

As mentioned in Remark 3, we shall minimize the total transmission resource consumption on the popular files at the BS by offloading traffic to the cache nodes, so that more transmission resources can be spared for other downlink data or uplink transmission. Let $\mathcal{C}_{f,n}^{s}=\cup_{\{\forall i|\mathcal{B}_{f,s}^{i}=1\}}\mathcal{C}_{i}$ be the area where the requesting users are able to receive the $(f,s)$ -th file segment from cache nodes. We use the following cost function to measure the weighted sum of the transmission energy ( $P_{f,n,s}N_{f,n,s}$ ) and the number of transmission symbols ( $N_{f,n,s}$ ) of the BS spent on the $s$ -th segment for the $(f,n)$ -th user.

[TABLE]

where $w$ is the weight on the number of transmission symbols, and $I(\cdot)$ is the indicator function.

Remark 4 (Trade-off between transmission time and energy).

If the minimization objective is the total number of transmission symbols spent on one file, the BS will always use the peak power, which might not be energy-efficient. When the traffic load of the BS is not heavy, saving energy is an important design criterion of resource allocation. On the other hand, if the minimization objective is the total transmission energy spent in one file, it is possible that the BS will use a small power in downlink multicast, which may occupy a large amount of transmission symbols. Thus it is not suitable for heavy traffic load. As a result, we choose a linear combination of both metrics, where the weight on the number of transmission symbols ( $w$ ) can be chosen according to the traffic load.

Hence the average cost spent on the overall lifetime of the $f$ -th file is given by

[TABLE]

where the expectation is taken over all possible large-scale channel fading (including the shadowing effect $\eta$ and requesting user’s pathloss $\rho$ ) and the remaining lifetimes after each file request $\mathcal{T}=\{T_{f,n}^{f}|\forall n\}$ . The summation on $N$ is to take expectation on the random number of requests as elaborated in (1). Hence the overall system cost function on popular files is $\overline{G}(\{\Omega_{f,n}|\forall f,n\})=\sum_{f=1}^{F}\overline{g}_{f}\left(\{\Omega_{f,n}|\forall n\}\right),$ where $F$ is the total number of popular files considered in the optimization. The system optimization problem can be written as

Problem 1 (Overall System Optimization).

[TABLE]

In this paper, we consider the delivery of popular files, and there is sufficient cache space in each cache node to save the popular files in their lifetime. In fact, since all the cached files are received from downlink and all the files have finite lifetimes, the cache size may not be the critical bottleneck of the cache-enabled network considered in this paper. For example, suppose that one BS is transmitting popular files with an overall data rate of $100$ Mbps, and the lifetime of each file is $24$ hours. Then the maximum required cache capacity in one cache node is around $1$ terabyte, which is mild. Therefore, the cache size limitation is ignored in this paper. Moreover, as mentioned in Remark 3, there is no constraint on $\{N_{f,n,s}|\forall f,n,s\}$ (no transmission buffer overflow at the BS). Then the above Problem 1 can be further decoupled into the following per-file sub-problems.

Problem 2 (Optimization on the $f$ -th File).

[TABLE]

Hence, the scheduling policy for the $f$ -th file $\{\Omega_{f,n}|\forall n\}$ depends only on the per-file system state $S_{f,n}$ and its remaining lifetime $T_{f,n}^{f}$ , i.e.

[TABLE]
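As a side check on the cache-size estimate used above to justify ignoring the cache capacity limit, the arithmetic can be written out as follows. The only assumptions are those already stated in the text (a sustained rate of $100$ Mbps and a $24$-hour lifetime) plus the usual $8$ bits per byte and decimal prefixes.

```latex
\[
100\times 10^{6}\ \tfrac{\text{bit}}{\text{s}}\times 24\times 3600\ \text{s}
= 8.64\times 10^{12}\ \text{bit}
= 1.08\times 10^{12}\ \text{byte}
\approx 1.08\ \text{TB}.
\]
```

This is the worst case in which every transmitted bit within one lifetime is stored at the same cache node.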

In order to solve Problem 2, we first define the cost-to-go function $W(S_{f},T)$ as the minimum average cost on the $f$ -th file, given the initial per-file system state $S_{f}$ and remaining lifetime $T$ , as (14). Hence, supposing that the per-file system state and the remaining lifetime for the $n$ -th request of the $f$ -th file are $S_{f,n}$ and $T_{f,n}^{f}$ respectively, the optimal reactive multicast policy for this file transmission $\Omega_{f,n}^{\dagger}(S_{f,n},T_{f,n}^{f})$ is obtained by minimizing the sum of the current transmission cost and the minimum average future cost, which is given by (15), where the constraints in (9-11) should be satisfied. If $(S_{f,n},T_{f,n}^{f})$ is treated as the system state, its evolution is Markovian. Since the number of requests is random and $T_{f,n}^{f}$ is continuous, it is difficult to find the cost-to-go function $W$ accurately and to solve the above optimization problem via the standard solution of MDP. In the following section, we shall propose an approximation approach via an MDP with a fixed number of stages.

III-B Approximation of Cost-to-go Function

In order to solve Problem 2, we first introduce the following intermediate MDP problem with fixed $N_{R}$ requests (stages) on the $f$ -th file, which is similar to Problem 2 except for the stage number.

Problem 3 (Optimization with a Fixed Request Number).

[TABLE]

where $N_{R}$ is the number of requests on the $f$ -th file.

The optimal solution of Problem 3 can be deduced via the Bellman’s equations in (16), where $V_{N_{R}-n+1}(S_{f,n})$ is the value function of the $n$ -th stage, and $S_{f,n+1}$ denotes the next state of the $f$ -th file given the current state $S_{f,n}$ . The constraints in (9-11) shall be satisfied when minimizing the right-hand side of (16). The state transition probability can be written as

[TABLE]

where $\mathcal{B}_{f,s}^{c}(n+1)$ is the cache status after taking the action $\Omega_{f,n}(S_{f,n})$ , and $I$ is the indicator function. In fact, $V_{N_{R}-n+1}(S_{f,n})$ measures the average remaining cost of the $f$ -th file from the $n$ -th transmission to the $N_{R}$ -th transmission, given the system state of the $n$ -th stage $S_{f,n}$ . Since the large-scale fading is i.i.d. in each file transmission, the expectation on large-scale fading can be taken on both sides of the above equation. Hence we have the following conclusion.

Lemma 1 (Bellman’s Equation with Reduced Space).

The optimal control policy of Problem 3 is the solution of the Bellman’s equation with reduced state space as (17), where $\widetilde{S}_{f,n}=\{\mathcal{B}_{f,s}^{c}\in S_{f,n}|\forall c,s\}$ denotes the cache state of the $f$ -th file, $\widetilde{V}_{N_{R}-n}(\widetilde{S}_{f,n+1})=\mathbb{E}_{\eta,\rho}[V_{N_{R}-n}(S_{f,n+1})]$ , and $\Omega_{f,n}(\widetilde{S}_{f,n})=\{\Omega_{f,n}(S_{f,n})|\forall\rho_{f,n},\eta_{f,n,s},\eta_{f,n,s}^{c}\}$ .

Proof.

Please refer to Appendix A. ∎
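The Bellman recursion of Lemma 1 can be solved stage by stage, starting from the last request. The following is a minimal sketch of such a backward induction on a toy instance, not the model of this paper: the function names, the single-cache-node state space and all numerical costs are hypothetical, and the expectation over large-scale fading is assumed to be already folded into the per-stage cost.

```python
def backward_induction(n_stages, states, actions, cost, transition):
    """Finite-horizon Bellman recursion.

    V[k][s] is the minimum expected cost with k stages (requests) remaining;
    policy[k][s] is the minimising action (policy[0] is an unused placeholder).
    """
    V = [{s: 0.0 for s in states}]          # no requests left: zero cost
    policy = [{}]
    for k in range(1, n_stages + 1):
        v_k, pi_k = {}, {}
        for s in states:
            best_a, best_q = None, float("inf")
            for a in actions(s):
                # current cost plus expected value of the next state
                q = cost(s, a) + sum(p * V[k - 1][s2]
                                     for s2, p in transition(s, a).items())
                if q < best_q:
                    best_a, best_q = a, q
            v_k[s], pi_k[s] = best_q, best_a
        V.append(v_k)
        policy.append(pi_k)
    return V, policy


# Toy instance (all numbers hypothetical): one cache node, one segment.
# State 0 = segment not cached, state 1 = segment cached.
C_USER, C_BOTH, P_COVERED = 1.0, 1.5, 0.6

def actions(s):
    return ["user_only", "user_and_cache"] if s == 0 else ["user_only"]

def cost(s, a):
    if s == 1:                               # covered users are offloaded
        return (1.0 - P_COVERED) * C_USER
    return C_USER if a == "user_only" else C_BOTH

def transition(s, a):
    return {1: 1.0} if (s == 1 or a == "user_and_cache") else {0: 1.0}

V, policy = backward_induction(3, [0, 1], actions, cost, transition)
```

In this toy instance, paying the extra cost to fill the cache is worthwhile only when at least two requests remain, which illustrates why the value function is indexed by the number of remaining stages.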

The standard value iteration can be used to solve the Bellman’s equations (17), and obtain the value functions $\widetilde{V}_{N_{R}-n+1}$ ( $\forall n$ ) as in [31]. In the following lemma, the cost-to-go function $W$ defined in (14) can be lower-bounded via the value functions $\widetilde{V}_{N_{R}-n+1}$ ( $\forall n$ ).

Lemma 2 (Lower Bound of Cost-to-Go Function).

With the value function $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ ( $\forall n$ ), given the per-file system state $S_{f,n}$ , remaining lifetime $T_{f,n}^{f}$ and reactive multicast policy $\Omega_{f,n}$ for the $n$ -th request of the $f$ -th file, the minimum average future cost is lower-bounded as

[TABLE]

Proof.

Please refer to Appendix B. ∎

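Using the above lower bound to approximate the cost-to-go function, the optimal scheduling policy in (15) is replaced by the following suboptimal policy:

$\Omega_{f,n}^{*}(S_{f,n},T_{f,n}^{f})=\arg\min\limits_{\Omega_{f,n}}\Big\{\sum_{s}g_{f,n,s}(\Omega_{f,n})+\sum\limits_{N,\widetilde{S}_{f,n+1}}\frac{(\lambda_{f}T_{f,n}^{f})^{N}}{N!}e^{-\lambda_{f}T_{f,n}^{f}}\widetilde{V}_{N}(\widetilde{S}_{f,n+1})\Pr(\widetilde{S}_{f,n+1}|S_{f,n},\Omega_{f,n})\Big\},\qquad(18)$

where the constraints in (9-11) should be satisfied.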

It can be observed from (18) that the scheduling policy for the $n$ -th transmission can be obtained by minimizing the sum of the current transmission cost $\sum_{s}g_{f,n,s}(\Omega_{f,n})$ and the lower bound of the average future transmission cost $\sum\limits_{N,\widetilde{S}_{f,n+1}}\frac{(\lambda_{f}T_{f,n}^{f})^{N}}{N!}e^{-\lambda_{f}T_{f,n}^{f}}\widetilde{V}_{N}(\widetilde{S}_{f,n+1})\Pr(\widetilde{S}_{f,n+1}|{S}_{f,n},\Omega_{f,n})$ , where the latter depends on the value functions $\widetilde{V}_{N}(\widetilde{S}_{f,n+1})$ . However, the state space of $\widetilde{S}_{f,n+1}$ is huge: it grows exponentially with the number of cache nodes $N_{C}$ and the number of segments $N_{f}$ , so the accurate evaluation of the value functions is computationally prohibitive. In the following section, we shall propose (1) an analytical approximation of the value functions, such that the computational complexity can be substantially reduced; (2) an analytical lower bound on the cost-to-go function $W$ , such that the gap between the proposed sub-optimal policy and the optimal scheduling policy can be bounded.
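The future-cost term discussed above is a Poisson-weighted average of fixed-horizon value functions. A minimal sketch of its numerical evaluation for one given next cache state is shown below; `expected_future_cost` and `value_of` are hypothetical names, `value_of(N)` stands for $\widetilde{V}_{N}$ at that state, and the infinite sum is truncated once the neglected probability mass falls below a tolerance (a simplification suitable for moderate $\lambda_{f}T_{f,n}^{f}$).

```python
import math

def expected_future_cost(request_rate, remaining_time, value_of, tol=1e-12):
    """Approximate sum_N Poisson(N; rate * T) * value_of(N) by truncation."""
    mu = request_rate * remaining_time       # expected number of requests
    pmf = math.exp(-mu)                      # Pr(N = 0)
    total, mass, n = 0.0, 0.0, 0
    max_terms = int(mu + 50.0 * math.sqrt(mu) + 50)   # safety cap
    while n <= max_terms:
        total += pmf * value_of(n)
        mass += pmf
        if 1.0 - mass < tol:                 # remaining tail is negligible
            break
        n += 1
        pmf *= mu / n                        # Pr(N = n) from Pr(N = n - 1)
    return total
```

For instance, if the value function happens to be linear in the number of remaining requests, `value_of = lambda n: 0.4 * n`, the weighted sum reduces to $0.4\lambda_{f}T_{f,n}^{f}$.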

IV Low-Complexity Solution via Approximate MDP

In this section, we shall propose a novel linear approximation approach on the value function $\widetilde{V}_{N_{R}-n+1}$ ( $\forall n$ ), derive the scheduling policy given the current system state and approximated value functions, and analyze the approximation error. We shall also propose a reinforcement learning algorithm for evaluating the approximated value functions with unknown distribution $\mathcal{F}$ of the requesting users.

IV-A Approximation of Value Function

We first define the notations for the following reference cache states.

•

$\widetilde{S}_{f}^{*}=\{\mathcal{B}_{f,s}^{c}=1|\forall c,s\}$ is the cache state of the $f$ -th file where all the cache nodes have successfully decoded the whole file.

•

$\widetilde{S}_{f}^{i,s}=\{\mathcal{B}_{f,s}^{i}=0,\mathcal{B}_{f,t}^{j}=1|\forall(j,t)\neq(i,s)\}$ is the cache state of the $f$ -th file where all the cache nodes have successfully decoded the whole file except the $s$ -th segment at the $i$ -th cache node.

Hence, we approximate the value function $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ linearly as (20), where $\mathcal{B}^{i}_{f,s}(\widetilde{S}_{f,n})$ means the parameter of $\mathcal{B}^{i}_{f,s}$ in the cache state $\widetilde{S}_{f,n}$ . An example of approximated value function is elaborated below.

Example 1.

As illustrated in Fig. 2, there are two cache nodes and the downlink file (say the $f$ -th file) is divided into two segments. For the system state $\widetilde{T}=[\mathcal{B}^{1}_{f,1},\mathcal{B}^{1}_{f,2},\mathcal{B}^{2}_{f,1},\mathcal{B}^{2}_{f,2}]=[1,0,1,0]$ , the value function on the $n$ -th stage can be approximated as

[TABLE]

where the cache states $\widetilde{S}_{f}^{1,2}$ and $\widetilde{S}_{f}^{2,2}$ are illustrated in Fig. 2. In the right hand side of the above approximation, the first term counts the transmission cost for the users outside the coverage region of the cache nodes; the second term approximates the cost on the second segment transmission to the users within the coverage region of the first cache node $\mathcal{C}_{1}$ ; and the third term approximates the cost on the second segment transmission to the users within the coverage region of the second cache node $\mathcal{C}_{2}$ . Note that there is no transmission cost on the first segment for the users within $\mathcal{C}_{1}\cup\mathcal{C}_{2}$ .
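The three-term structure described in Example 1 can be sketched in code. The sketch below assumes that the linear approximation takes the form "value at the full-cache state plus one correction per missing (cache node, segment) pair", which is how the example reads; the exact expression is the one in (20). The function name and the numerical values are hypothetical.

```python
def approx_value(cache_bits, v_full, v_missing):
    """Linear approximation of the value function at one stage.

    cache_bits[(i, s)] -- 1 if cache node i holds segment s, else 0
    v_full             -- value at the state where every node holds every segment
    v_missing[(i, s)]  -- value at the state where only (i, s) is missing
    """
    return v_full + sum((1 - b) * (v_missing[key] - v_full)
                        for key, b in cache_bits.items())


# State [1, 0, 1, 0] of Example 1: both nodes miss the second segment.
bits = {(1, 1): 1, (1, 2): 0, (2, 1): 1, (2, 2): 0}
v_hat = approx_value(bits, v_full=2.0,
                     v_missing={(1, 1): 2.4, (1, 2): 2.5,
                                (2, 1): 2.2, (2, 2): 2.3})
```

Only the two reference states with a missing second segment contribute, so the number of stored values grows linearly in $N_{C}N_{f}$ instead of exponentially.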

In order to apply this approximation to all value functions, it is necessary to obtain the values of $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$ and $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})$ for all $n,i$ , and $s$ via (17). In the following, we provide analytical expressions for them with the distribution knowledge of the requesting users. Moreover, an online learning algorithm is proposed in Section IV-C for the evaluation of $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$ and $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})$ with unknown spatial distribution of requesting users.

IV-A1 Evaluation of $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$

Since the cache state $\widetilde{S}_{f}^{*}$ represents the situation that all the cache nodes have already decoded the $f$ -th file, the purpose of downlink transmission is only to make sure that the requesting users outside the coverage region of any cache node can decode the downlink file. Hence it is clear that

[TABLE]

The above value function can be calculated with analytical expression, which is elaborated below.

Lemma 3.

In the high SINR regime, the analytical expression of the value function $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$ ( $n=1,2,\cdots,N_{R}$ ) is

[TABLE]

where the optimal power $P_{f,n,s}^{*}=\min\{\frac{w}{\mathbb{W}(\frac{2^{\theta}w}{e})},P_{B}\}$ , the optimal transmission symbol number $N_{f,n,s}^{*}=\max\{\frac{R_{f}^{I}\ln(2)}{\alpha[\mathbb{W}(\frac{2^{\theta}w}{e})+1]},\frac{R_{f}^{I}}{\alpha[\theta+\log_{2}(P_{B})]}\}$ , $\theta=\mathbb{E}_{\mathbf{h}_{f,n,s}}\left[\log_{2}\left(\frac{||\mathbf{h}_{f,n,s}||^{2}}{N_{T}(\sigma^{2}_{z}+I_{f,n})}\right)\right]$ , and $\mathbb{W}(x)$ is the Lambert-W function [32].

Proof.

Please refer to Appendix C. ∎
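The closed-form parameters of Lemma 3 can be evaluated numerically as sketched below. The expressions for $P_{f,n,s}^{*}$ and $N_{f,n,s}^{*}$ are taken from the lemma; the Newton iteration for the principal branch of the Lambert-W function, the function names and the example numbers are assumptions made for illustration, and $w>0$ is assumed.

```python
import math

def lambert_w0(z, iters=60):
    """Principal branch W(z) for z >= 0, via Newton's method on x*exp(x) = z."""
    x = math.log1p(z)                 # starts at or to the right of the root
    for _ in range(iters):
        ex = math.exp(x)
        x -= (x * ex - z) / (ex * (x + 1.0))
    return x

def optimal_power_and_symbols(theta, w, rate_bits, alpha, p_max):
    """(P*, N*) of Lemma 3 in the high-SINR regime, assuming w > 0."""
    lw = lambert_w0((2.0 ** theta) * w / math.e)
    power = min(w / lw, p_max)
    symbols = max(rate_bits * math.log(2) / (alpha * (lw + 1.0)),
                  rate_bits / (alpha * (theta + math.log2(p_max))))
    return power, symbols
```

When the peak power constraint is inactive, the returned pair is a stationary point of $(P+w)N$ under the high-SINR rate constraint $\alpha N[\theta+\log_{2}(P)]=R_{f}^{I}$; otherwise the BS transmits at $P_{B}$ and the symbol number follows from the rate constraint.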

IV-A2 Evaluation of $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})$

Given the cache state $\widetilde{S}_{f}^{i,s}$ for the $n$ -th stage, there are only two possible next cache states $\widetilde{S}_{f}^{i,s}$ and $\widetilde{S}_{f}^{*}$ in the $(n+1)$ -th stage, which are discussed below.

•

When $\rho_{f,n}\eta_{f,n,s}\leq\rho_{i}\eta_{f,n,s}^{i}$ , thus $R_{f,n,s}\leq R_{f,n,s}^{i}$ , the $i$ -th cache node is always able to decode the $s$ -th file segment given that the transmission constraint (13) should be satisfied. Thus the next state must be $\widetilde{S}_{f}^{*}$ . In this case, the optimized RHS of (17) is given by

[TABLE]

•

When $\rho_{f,n}\eta_{f,n,s}>\rho_{i}\eta_{f,n,s}^{i}$ , thus $R_{f,n,s}>R_{f,n,s}^{i}$ , the BS can choose to deliver the $s$ -th segment to the $(f,n)$ -th user, or both user and the $i$ -th cache node. Hence the optimized RHS of (17) is given by $\mathbb{E}\bigg{\{}\!\min\bigg{[}\overline{G}_{n}^{2}(\widetilde{S}_{f}^{i,s}),\overline{G}_{n}^{3}(\widetilde{S}_{f}^{i,s})\bigg{]}\!\bigg{\}},$ where $\overline{G}_{n}^{2}$ and $\overline{G}_{n}^{3}$ are defined below.

[TABLE]

As a result, the expression of $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})$ is summarized by the following lemma.

Lemma 4.

The value function $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})$ is given by (IV-A1). The asymptotically optimal scheduling parameters for $\overline{G}_{n}^{1}$ and $\overline{G}_{n}^{2}$ in the high SINR regime are the same as in Lemma 3. The asymptotically optimal scheduling parameters $\{(P_{f,n,t}^{i},N_{f,n,t}^{i})|\forall t\}$ for $\overline{G}_{n}^{3}$ are given by

[TABLE]

and $\forall t\neq s$

[TABLE]

Proof.

The proof is similar to that of Lemma 3, and it is omitted here. ∎

Hence, it is clear that $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})=\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,t}),\forall s\neq t$ . With the distribution knowledge of large-scale fading, the value functions $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$ and $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})$ can be calculated according to the above analytical expressions. Moreover, although different files may consist of different numbers of segments or different segment sizes, the calculation of $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$ and $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})$ on one file can be easily extended to the other files. For example, given the cache state $\widetilde{S}_{k,n}$ of the $k$ -th file ( $\forall k\neq f$ ), the value function approximation, denoted as $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{k,n})$ , can be calculated via $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$ and $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,1})(\forall i)$ for the $f$ -th file as (22).

IV-B Reactive Multicast Policy

With $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$ and $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})$ , the value function for arbitrary system state can be approximated via (20). Hence the reactive multicast policy, denoted as $\Omega_{f,n}^{*}({S}_{f,n},T_{f,n}^{f})$ , can be obtained. Moreover, as $\widetilde{V}_{N}(\widetilde{S}_{f,n+1})$ can be decoupled for each segment, the optimization problem (18) can be also decoupled for each segment. Specifically, for the $s$ -th segment ( $\forall s\in J_{f,n}$ ), the solution of (18) can be obtained by solving the following problem.

Problem 4 (Optimization for the $s$ -th Segment).

[TABLE]

where $\mathcal{B}^{i}_{f,s}(\widetilde{S}_{f,n+1})$ represents the next cache state for the $(f,s)$ -th segment in $i$ -th cache node. Note that the set of receiving cache nodes $\mathbf{c}^{*}_{f,n,s}$ can be determined from $(P_{f,n,s}^{*},N_{f,n,s}^{*})$ .

This is a mixed continuous and discrete optimization problem. Its solution algorithm is summarized below.

Algorithm 1 (Scheduling with Approximated Value Function).

Given the system state $S_{f,n}$ , let $d_{1},d_{2},\ldots$ be the indexes of cache nodes whose large-scale attenuation to the BS in the $s$ -th segment ( $\forall s\in J_{f,n}$ ) is worse than that of the $(f,n)$ -th user. Moreover, without loss of generality, it is assumed that $\rho_{d_{1}}\eta_{f,n,s}^{d_{1}}\leq\rho_{d_{2}}\eta_{f,n,s}^{d_{2}}\leq...\leq\rho_{f,n}\eta_{f,n,s}$ . The solution of Problem 4 can be obtained by the following steps.

•

For each $i$ , solve the following optimization problem.

[TABLE]

The solution, denoted as $[P_{f,n,s}^{d_{i}},N_{f,n,s}^{d_{i}}]$ , can be derived similarly to Lemma 3. Note that $[P_{f,n,s}^{d_{i}},N_{f,n,s}^{d_{i}}]$ are the transmission parameters if the file segment can be decoded at the $d_{i}$ -th cache node.

•

Let $d^{*}=\arg\min\limits_{d_{i}}Q_{d_{i},s}^{*}$ , the solution of Problem 4 is then given by $[P_{f,n,s}^{*},N_{f,n,s}^{*}]=[P_{f,n,s}^{d^{*}},N_{f,n,s}^{d^{*}}]$ .
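The two steps of Algorithm 1 amount to enumerating the candidate weakest receivers, solving one continuous sub-problem per candidate, and keeping the cheapest outcome. The following is a minimal sketch of that structure under simplifying assumptions that are not part of the paper: the rate model $\alpha N\log_{2}(1+P\cdot\text{gain})$, a grid search over the power in place of the closed form, and a hypothetical per-node `future_saving` standing for the reduction of the approximated future cost when that node decodes the segment.

```python
import math

def cheapest_delivery(gain, rate_bits, w, p_max, alpha=1.0, grid=4000):
    """Grid search over P minimising (P + w) * N for a receiver with this gain."""
    best = (float("inf"), 0.0, 0.0)
    for j in range(1, grid + 1):
        p = p_max * j / grid
        n = rate_bits / (alpha * math.log2(1.0 + p * gain))
        best = min(best, ((p + w) * n, p, n))
    return best                                  # (cost, power, symbols)

def schedule_segment(user_gain, cache_gains, future_saving,
                     rate_bits, w, p_max):
    """Pick the weakest intended receiver for one segment.

    cache_gains[i]   -- large-scale gain of the i-th cache node still
                        missing the segment (hypothetical input)
    future_saving[i] -- assumed drop in future cost if node i decodes it
    """
    # Candidates: the requesting user, or any weaker cache node.
    targets = [user_gain] + [g for g in cache_gains if g < user_gain]
    best = None
    for target in targets:
        cost, p, n = cheapest_delivery(target, rate_bits, w, p_max)
        # Every node at least as strong as the target also decodes.
        receivers = [i for i, g in enumerate(cache_gains) if g >= target]
        q = cost - sum(future_saving[i] for i in receivers)
        if best is None or q < best[0]:
            best = (q, p, n, receivers)
    return best                                  # (Q*, P*, N*, receiving nodes)
```

The set of receiving cache nodes follows from the chosen target, in line with the remark that $\mathbf{c}^{*}_{f,n,s}$ is determined by $(P_{f,n,s}^{*},N_{f,n,s}^{*})$ .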

IV-C Learning Algorithm for Approximated Value Function

In Section IV-A, the values of $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$ and $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})$ are evaluated analytically by assuming that the distribution of the requesting users $\mathcal{F}$ is known. However in practice, this distribution may be unknown to the BS. In order to address this issue, we propose the following learning-based algorithm to evaluate the value functions $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$ and $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})$ from the historical request arrivals.

Algorithm 2 (Reinforcement Learning for Value Functions).

•

Step 1: Let $t=0$ . Initialize the value of $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$ and $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})$ ( $\forall n,i,s$ ), and denote them as $\widetilde{V}^{t}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$ and $\widetilde{V}^{t}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})$ . This initialization can be done by assuming all the users appear uniformly in the cell coverage, hence the approach in Section IV-A can be applied to calculate the initial values.

•

Step 2: Let $t=t+1$ when a file request arrives. Supposing it is the $m$ -th request on the $g$ -th file and the location of the requesting user is $\mathbf{l}_{m,g}$ , we have, $\forall i,f,s,n$ ,

[TABLE]

where $P_{g,m,s}^{*}=\min\{\frac{w}{\mathbb{W}(\frac{2^{\theta_{t}}w}{e})},P_{B}\},N_{g,m,s}^{*}=\frac{R_{f}^{I}}{\alpha[\theta_{t}+\log_{2}(P_{g,m,s}^{*})]}$ , $\theta_{t}=\!\mathbb{E}_{\mathbf{h}_{g,m,s}}\!\!\left[\log_{2}\left(\frac{||\mathbf{h}_{g,m,s}||^{2}}{N_{T}(\sigma^{2}_{z}+I_{g,m})}\right)\!\right]$ .

[TABLE]

where $\overline{G}_{n}^{1}$ , $\overline{G}_{n}^{2}$ and $\overline{G}_{n}^{3}$ are defined in Section IV-A2.

•

Step 3: If $\max\{|\widetilde{V}_{N_{R}-n+1}^{t}(\!\widetilde{S}_{f}^{i,s}\!)\!-\!\widetilde{V}_{N_{R}-n+1}^{t-1}(\!\widetilde{S}_{f}^{i,s}\!)|,|\widetilde{V}_{N_{R}-n+1}^{t}(\!\widetilde{S}_{f}^{*}\!)\!-\!\widetilde{V}_{N_{R}-n+1}^{t-1}(\!\widetilde{S}_{f}^{*}\!)|\big{|}\forall n,i,s\}$ is greater than a threshold $\tau$ , the algorithm goes to Step 2; otherwise, the algorithm terminates.

Moreover, we have the following conclusion on the convergence of above learning algorithm.

Lemma 5.

Algorithm 2 converges to the true values of $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$ and $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})$ ( $\forall f,n,i,s$ ). That is,

[TABLE]

,

[TABLE]

Proof.

Please refer to Appendix D. ∎
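The iterate-and-stop structure of Algorithm 2 can be illustrated with a generic stochastic-approximation loop. The sketch below is not the update rule of Step 2: it tracks a single scalar, uses a plain running average with step size $1/(t+1)$ as a stand-in, and takes hypothetical per-request cost samples as input. It only shows how an estimate is refined on each request arrival and how the threshold $\tau$ of Step 3 terminates the loop.

```python
def learn_value(samples, v_init, tau=0.0):
    """Running-average estimate of an expected per-request cost.

    samples -- iterable of observed costs, one per request arrival
    v_init  -- initial value (e.g. computed under a uniform-user assumption)
    tau     -- stop once one update changes the estimate by at most tau
    Returns the final estimate and the number of requests processed.
    """
    v, t = v_init, 0
    for x in samples:
        t += 1
        v_new = v + (x - v) / (t + 1)     # diminishing step size
        if abs(v_new - v) <= tau:         # Step 3: change below threshold
            return v_new, t
        v = v_new
    return v, t
```

With this step size the initial value acts as one pseudo-sample, so after $t$ requests the estimate is the average of the initial value and the $t$ observations.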

IV-D Bounds on Approximated Value Function

In this paper, two approximation steps are proposed to find the sub-optimal and low-complexity reactive multicast policy, i.e.,

•

$W\rightarrow\widetilde{V}$ : Approximate the cost-to-go function $W$ via a linear combination of value functions of a finite-horizon MDP in Section III-B.

•

$\widetilde{V}\rightarrow\widehat{V}$ : Analytically approximate the value function in Section IV-A.

In this section, we shall provide an analytical upper bound on the approximation error of $\widetilde{V}\rightarrow\widehat{V}$ , and an analytical lower bound on the cost-to-go function $W$ (Note that the upper bound of $W$ can be obtained by numerical simulation). First of all, we have the following conclusion on the bounds of the true value functions $\widetilde{V}$ .

Lemma 6 (Bounds of Value Functions).

The upper and lower bounds of $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ ( $\forall f,n$ ) are provided below.

[TABLE]

Proof.

Please refer to Appendix E. ∎

Let $\mathcal{E}_{N_{R}-n+1}(\widetilde{S}_{f,n})\triangleq\widehat{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})-\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ , ( $\forall n,\widetilde{S}_{f,n}$ ) be the approximation error of the value functions for arbitrary $n$ and $\widetilde{S}_{f,n}$ . Replacing $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ by (6), we have

[TABLE]

Moreover, a tighter upper bound of the value functions can be obtained numerically.

Corollary 1 (Refined Upper Bound of Value Function).

Let $\overline{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ ( $\forall f,n$ ) be the intermediate value function on system state $\widetilde{S}_{f,n}$ after one-step value iteration based on $\widehat{V}_{N_{R}-n}(\widetilde{S}_{f,n+1})$ , as given by (26). Then, $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})\leq\overline{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})\leq\widehat{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ .

Proof.

Please refer to Appendix F. ∎

With the knowledge of $\mathcal{F}$ , $\overline{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ can be calculated via Monte Carlo simulation. If $\mathcal{F}$ is not available at the BS, the learning-based approach can be used to evaluate $\overline{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ . The algorithm is similar to the one in Section IV-C, and it is omitted here due to the page limit. Hence, it is feasible to obtain a better approximation of the value function $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ for any specified stage and per-file system state.

Note that the upper bound of the cost-to-go function $W(S_{f,n},T_{f,n}^{f})$ can be obtained by simulating the file transmission with initial system state $S_{f,n}$ and lifetime $T_{f,n}^{f}$ . We introduce the following analytical lower bound.

Lemma 7 (Analytical Lower-bound on $W$ ).

Given the initial system state $S_{f,n+1}$ at the beginning of a remaining lifetime with duration $T_{f,n}^{f}$ , the minimum transmission cost of the BS, denoted as $W(S_{f,n+1},T_{f,n}^{f})$ is lower-bounded as

[TABLE]

Proof.

This lemma follows directly by combining the conclusions of Lemmas 2 and 6. ∎

V Scheduling Algorithm for Proactive Multicast

In this section, we propose a heuristic scheduling algorithm of proactive multicast, which could deliver some file segments to the cache nodes with low transmission cost by exploiting the temporal diversity of shadowing effect. We first define the proactive file placement policy.

Definition 3 (Proactive Multicast Policy).

Suppose that the BS proactively multicasts one file segment every $T_{p}$ seconds. In the $k$ -th proactive transmission opportunity, given the state of each cache node $\{\mathcal{B}_{f,s}^{c}|\forall c,f,s\}$ , the shadowing from the BS to each cache node $\{\eta^{c}_{k}|\forall c\}$ , and the remaining lifetime of each file $\{T^{k}_{f}|\forall f\}$ , the BS should determine the selected file segment $(f_{k},s_{k})$ and the downlink transmission parameters $P_{k}$ and $N_{k}$ for the selected $(f_{k},s_{k})$ -th file segment. Denoting $S^{k}=\left[\{\mathcal{B}_{f,s}^{c}|\forall c,f,s\},\{\eta^{c}_{k}|\forall c\},\{T^{k}_{f}|\forall f\}\right]$ , the proactive multicast policy can be written as $\Omega_{k}(S^{k})=[f_{k},s_{k},P_{k},N_{k}].$

The joint optimization of reactive multicast $\{\Omega_{f,n}|\forall f,n\}$ and proactive multicast $\{\Omega_{k}|\forall k\}$ is complicated, as the transmission strategies of different files are coupled. Instead, we use the low-complexity scheduling policy for reactive multicast derived in the previous section, and propose a heuristic proactive multicast to further suppress the overall transmission cost. Specifically, we use $\widehat{g}_{f}^{k}(\widetilde{S}_{f}^{k},T_{f}^{k})=\sum\limits_{N}\frac{(\lambda_{f}T_{f}^{k})^{N}}{N!}e^{-\lambda_{f}T_{f}^{k}}{\widehat{V}_{N}(\widetilde{S}_{f}^{k})}$ to approximate the remaining transmission cost spent on the $f$ -th file without any proactive multicast, where $\widetilde{S}_{f}^{k}=\{\mathcal{B}_{f,s}^{c}|\forall c,s\}$ is the cache state of the $f$ -th file before the $k$ -th proactive multicast. Hence, $\Omega_{k}(S^{k})$ can be determined as follows.

Problem 5 (Heuristic Scheduling for Proactive Multicast).

[TABLE]

where $\tau^{{}^{\prime}}>1$ is a constant threshold, $\widetilde{S}_{{f_{k}}}^{k}$ and $\breve{S}_{{f_{k}}}^{k}$ are the system cache state before and after proactive multicast respectively.

In the objective of Problem 5, the numerator and the denominator are the approximations of the $f_{k}$ -th file’s remaining transmission cost without and with the $k$ -th proactive multicast, respectively, so that a ratio larger than one indicates a cost reduction. The constraint with $\tau^{{}^{\prime}}>1$ provides a margin against the approximation error. Problem 5 can be solved via the following algorithm.

Algorithm 3 (Proactive Multicast).

On each proactive transmission opportunity (say the $k$ -th opportunity), the algorithm to determine the proactive multicast policy $\Omega_{k}$ is elaborated below.

•

Step 1: For each file segment (say the $(f,s)$ -th one), evaluate

[TABLE]

where $P_{f,s}^{k},N_{f,s}^{k}$ and $\mathcal{A}_{f}^{k}$ represent the transmission power, the transmission symbol number and the set of receiving cache nodes, respectively, and $\breve{S}_{f}^{k}(\mathcal{A}_{f}^{k})$ denotes the cache state where the cache nodes in $\mathcal{A}_{f}^{k}$ have successfully decoded the $(f,s)$ -th segment given the previous state $\widetilde{S}_{f}^{k}$ . The solution of the above optimization problem can be obtained by minimizing the denominator, which is similar to that of Problem 4. Hence it is omitted here.

•

Step 2: The $(f_{k},s_{k})$ -th segment is chosen when the following two conditions are satisfied:

–

$(f_{k},s_{k})=\arg\max\limits_{(f,s)}\Delta g_{f,s}^{k}$ and $\Delta g_{f_{k},s_{k}}^{k}\geq\tau^{{}^{\prime}}$ .
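Step 2 of Algorithm 3 is a thresholded arg-max over candidate segments. A minimal sketch is given below. It reads $\Delta g_{f,s}^{k}$ as the ratio of the approximated remaining cost without the proactive multicast to the cost with it (including the proactive transmission itself), an interpretation suggested by the threshold $\tau^{{}^{\prime}}>1$ ; the function name, the dictionary layout and the numbers are hypothetical.

```python
def pick_proactive_segment(candidates, tau_prime):
    """Select the segment with the largest cost ratio, if it clears tau_prime.

    candidates -- dict mapping (f, s) to (cost_without, cost_with), where
                  cost_with already includes the proactive transmission cost
    Returns ((f, s), ratio), or (None, ratio) when no segment should be sent.
    """
    best_key, best_ratio = None, 0.0
    for key, (cost_without, cost_with) in candidates.items():
        ratio = cost_without / cost_with
        if ratio > best_ratio:
            best_key, best_ratio = key, ratio
    if best_ratio >= tau_prime:
        return best_key, best_ratio
    return None, best_ratio                  # skip this opportunity


choice, gain = pick_proactive_segment(
    {(1, 1): (10.0, 8.0), (2, 3): (6.0, 4.0)}, tau_prime=1.2)
```

When no candidate reaches the threshold, the BS leaves the proactive opportunity unused, which avoids spending resources on transmissions whose estimated benefit is within the approximation error.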

VI Simulation

In the simulation, the cell radius is $500$ meters, and cache nodes are randomly deployed in the cell-edge region with a service radius of $90$ meters. The number of BS antennas is 8. The path loss exponent is $3.5$ . The file segment size is $R_{f}^{I}=14$ Mb ( $\forall f$ ). The transmission bandwidth is $20$ MHz. The power constraint at the base station is $P_{B}=46$ dBm. The performance of the proposed algorithm will be compared with the following two baselines.

Baseline 1.

The BS only ensures segment delivery to the requesting user in each transmission. The cache nodes with better channel conditions to the BS can also decode the segments.

Baseline 2.

The BS ensures that all the cache nodes can decode the downlink file in the first transmission. Hence, all the cache nodes can help to forward the file from the second file request onward.

The performance of the proposed low-complexity algorithm (Algorithm 1) is compared with the above two baselines in Fig. 3. In the simulation, the number of cache nodes is $20$ and $25$ respectively, the requesting users are uniformly distributed in the cell coverage, and the distribution statistics are known to the BS. Hence, the analytical expressions derived in Section IV-A can be used to calculate the approximated value functions. It can be observed that the proposed Algorithm 1 is superior to the two baselines for any expected number of requests per file lifetime. Moreover, Baseline 1 has better performance than Baseline 2 when the popularity of the file is high (larger expected number of file requests). The performance gain tends to a constant when the expected number of requests is large. This is because all three schemes have the same performance once the files have been stored in all cache nodes. In other words, the gain of the proposed scheme lies in the caching phase.

The approximation error of the value function versus the index of file requests is illustrated in Fig. 4, where the true value function and the bounds derived in Lemma 6 are plotted. The cache nodes are empty in Fig. 4(a), while half of the cache nodes have decoded the whole file in Fig. 4(b). It is shown that for both states, both the upper and lower bounds are tight, and therefore the approximation error is small. In addition, the refined upper bound has an even smaller gap to the true value function, which matches the conclusion of Corollary 1.

In Fig. 5, there are $3$ and $4$ hot zones in the cell coverage, respectively, each with radius $90$ m. The probability that a user appears in one hot zone is $12.5\%$ (larger than in the other regions). The locations and user distribution of the hot zones are unknown to the BS. Four schemes are compared: the two baselines, the proposed Algorithm 1 assuming users are uniformly distributed, and the proposed Algorithm 1 with learning-based evaluation of value functions (Algorithm 2). It can be observed that the proposed learning algorithm has the best performance. Moreover, the performance gain of the learning-based algorithm is larger with more hot zones.
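For readers who wish to reproduce a similar user distribution, a hot-zone request model of this kind can be sketched as below. This is only a sketch under stated assumptions and not the simulator used for Fig. 5: we read the $12.5\%$ as the probability attached to each hot zone, we assume the remaining probability mass is uniform over the whole cell (so the total mass inside a hot zone is slightly above $12.5\%$), and the hot-zone centres and function names are hypothetical.

```python
import math
import random

CELL_RADIUS_M = 500.0      # cell radius from the simulation setup
HOT_ZONE_RADIUS_M = 90.0   # hot-zone radius from the simulation setup
P_HOT_ZONE = 0.125         # assumed probability attached to each hot zone


def uniform_in_disc(cx, cy, radius, rng):
    """Draw a point uniformly from the disc of given radius centred at (cx, cy)."""
    r = radius * math.sqrt(rng.random())  # sqrt makes the density uniform over area
    phi = 2.0 * math.pi * rng.random()
    return cx + r * math.cos(phi), cy + r * math.sin(phi)


def sample_user(hot_zone_centres, rng):
    """Draw one requesting-user location."""
    u = rng.random()
    for k, (cx, cy) in enumerate(hot_zone_centres):
        if u < (k + 1) * P_HOT_ZONE:
            return uniform_in_disc(cx, cy, HOT_ZONE_RADIUS_M, rng)
    # Assumption: the remaining probability mass is uniform over the whole cell.
    return uniform_in_disc(0.0, 0.0, CELL_RADIUS_M, rng)


rng = random.Random(0)
centres = [(300.0, 0.0), (-150.0, 260.0), (-150.0, -260.0)]  # hypothetical placement
users = [sample_user(centres, rng) for _ in range(10000)]
```

Since the BS does not know `centres`, samples such as `users` are the only information a learning-based evaluation of the value functions can rely on.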

Finally, the performance gain of the proactive multicast is demonstrated in Fig. 6(a), where there are $10$ files, $3$ hot zones in the cell, and $50000$ proactive transmission opportunities within each file's lifetime. The performance of the proactive content placement algorithm is compared with the above two baselines, Algorithm 1 assuming uniform user distribution, and Algorithm 1 with learning-based evaluation of value functions. It can be observed that the proposed proactive content placement algorithm can further improve the offloading performance, especially when the popularity of files is high. Moreover, it is shown in Fig. 6(b) that the average cost decreases as the proactive multicast frequency increases. However, the transmission cost reduction saturates when the proactive multicast frequency is large.

VII Conclusion

In this paper, we consider the scheduling of downlink file transmission with the assistance of cache nodes. The downlink resource minimization problem with reactive multicast is formulated as a dynamic programming problem with a random number of stages. We first approximate it by a finite-horizon MDP with a fixed number of stages. In order to address the curse of dimensionality, we also introduce a low-complexity sub-optimal solution based on linear approximation of value functions. The approximated value function can be calculated analytically with the knowledge of the distribution statistics of users. Since the statistics of the distribution may be unknown to the BS, we further propose a learning-based online algorithm to evaluate the approximated value functions. Furthermore, we derive a bound on the gap between the approximated value functions and the real ones. Finally, we propose a proactive multicast algorithm, which can exploit the temporal diversity of the channel due to the shadowing effect.

Appendix A: Proof Of Lemma 1

Let $\widetilde{V}_{N_{R}-n}(\widetilde{S}_{f,n})=\mathbb{E}_{\eta,\rho}[V_{N_{R}-n}(S_{f,n+1})]$ , where the expectation is taken over the randomness of shadowing and requesting users’ pathloss, we have

[TABLE]

Taking expectation with respect to the shadowing and pathloss on (16), we have

[TABLE]

Appendix B: Proof Of Lemma 2

Due to page limitation, we only provide the sketch of the proof.

[TABLE]

where $\{\Pi_{f,k}^{N}|\forall k\}$ is the optimal policy when the remaining stage number is $N$. Inequality (a) holds because $\{\Pi_{f,k}^{N}\}$ is optimized for each specific remaining stage number.

Appendix C: Proof Of Lemma 3

First of all, we have the following high-SINR approximation of the throughput $R_{f,n,s}$: $R_{f,n,s}\approx N_{f,n,s}\mathbb{E}_{\mathbf{h}_{f,n,s}}\left[\alpha\log_{2}\left(\frac{||\mathbf{h}_{f,n,s}||^{2}P_{f,n,s}}{N_{T}\sigma^{2}_{z}}\right)\right]=N_{f,n,s}\alpha[\theta+\log_{2}(P_{f,n,s})]$. With $R_{f,n,s}=R_{f}^{I}$, we have $N_{f,n,s}=\frac{R_{f}^{I}}{\alpha[\theta+\log_{2}(P_{f,n,s})]}$. Hence the original optimization becomes $\min\limits_{P_{f,n,s}}\frac{R_{f}^{I}(P_{f,n,s}+w)}{\alpha[\theta+\log_{2}(P_{f,n,s})]}.$ The optimal transmission power $P_{f,n,s}^{*}$ can be obtained by setting the first-order derivative to zero.
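For completeness, the first-order condition can be written out as follows. This is a sketch of the step summarized above, using only the objective of this appendix with the shorthand $P=P_{f,n,s}$ and assuming $w>0$ and $\theta+\log_{2}P>0$. The closed form in the last line is our own rearrangement and is not taken from the statement of Lemma 3; $W_{0}$ denotes the principal branch of the Lambert W function and is unrelated to any other use of $W$ in the paper.

```latex
\begin{align}
J(P) &= \frac{R_{f}^{I}(P+w)}{\alpha[\theta+\log_{2}P]}, \\
\frac{dJ}{dP} &= \frac{R_{f}^{I}}{\alpha}\cdot
  \frac{[\theta+\log_{2}P]-\frac{P+w}{P\ln 2}}{[\theta+\log_{2}P]^{2}} = 0
  \;\Longrightarrow\; \theta\ln 2+\ln P = 1+\frac{w}{P}, \\
x &= \frac{w}{P}: \quad x e^{x} = \frac{w\,2^{\theta}}{e}
  \;\Longrightarrow\; P^{*} = \frac{w}{W_{0}\!\left(w\,2^{\theta}/e\right)} .
\end{align}
```

The numerator of $dJ/dP$ is increasing in $P$, so the stationary point is unique and $J$ decreases before it and increases after it; hence it is the minimizer.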

Appendix D: Proof Of Lemma 5

We only prove the convergence of $\widetilde{V}^{t}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$; the convergence of $\widetilde{V}^{t}_{N_{R}-n+1}(\widetilde{S}_{f}^{i,s})$ can be proved similarly. Let $\varepsilon_{t}=\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})-\frac{(N_{R}-n+1)R_{f}^{I}}{R_{g}^{I}}I(\mathbf{l}_{g,m}\notin\mathcal{C})\sum_{s}(P_{g,m,s}^{*}N_{g,m,s}^{*}+wN_{g,m,s}^{*})$ be the estimation error in the $t$-th iteration. It is clear that the estimation errors are i.i.d. with respect to $t$, with $\mathbb{E}[\varepsilon_{t}]=0$ and $Var[\varepsilon_{t}]<+\infty$. Note that $\widetilde{V}^{t}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$ can be written as

[TABLE]

where the total estimate error is $\sum_{i=0}^{t}\frac{\varepsilon_{i}}{t+1}$ . The mean and variance of total estimate error are $\mathbb{E}\bigg{\{}\sum_{i=0}^{t}\frac{\varepsilon_{i}}{t+1}\bigg{\}}=0,Var\bigg{\{}\sum_{i=0}^{t}\frac{\varepsilon_{i}}{t+1}\bigg{\}}=\frac{Var[\varepsilon_{i}]}{t+1}.$ When $t\rightarrow+\infty$ , the variance of estimation error tends to zero, and $\widetilde{V}^{t}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$ converges to $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f}^{*})$ .
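The averaging argument above can be illustrated numerically. The sketch below is not Algorithm 2 itself: it only mimics an incremental sample-mean update with step size $1/(t+1)$, which we infer from the form of the total estimation error $\sum_{i=0}^{t}\frac{\varepsilon_{i}}{t+1}$. The target value, the Gaussian noise model and all names are hypothetical.

```python
import random


def running_average(samples):
    """Incrementally average samples: v_t = (t * v_{t-1} + x_t) / (t + 1)."""
    estimate = 0.0
    history = []
    for t, x in enumerate(samples):
        estimate = (t * estimate + x) / (t + 1)
        history.append(estimate)
    return history


rng = random.Random(1)
true_value = 5.0  # hypothetical value function entry being estimated
noise_std = 2.0   # hypothetical standard deviation of the zero-mean error
samples = [true_value + rng.gauss(0.0, noise_std) for _ in range(20000)]
estimates = running_average(samples)

# Error after 10 samples and after all samples.
print(abs(estimates[9] - true_value), abs(estimates[-1] - true_value))
```

With i.i.d. zero-mean errors, the variance of the estimate after $t+1$ samples is $Var[\varepsilon_{i}]/(t+1)$, so the error after all samples is typically much smaller than the error after the first ten.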

Appendix E: Proof Of Lemma 6

VII-1 Proof of Upper Bound

The proof is by mathematical induction. Without loss of generality, we assume that the upper bound holds when the first $l$ cache nodes have not decoded the $(f,s)$-th segment, and prove that the upper bound also holds when the first $(l+1)$ cache nodes have not decoded the $(f,s)$-th segment. Define the system state $\widetilde{T_{f}}^{c,s}=[\mathcal{B}^{i}_{f,j}=1,\forall j\neq s,\forall i]\cup[\mathcal{B}^{i}_{f,s}=0,\forall i=1,2,\cdots,c]\cup[\mathcal{B}^{i}_{f,s}=1,\forall i>c]$.

• Step 1: When $c=1$, the upper bound holds as follows

[TABLE]

• Step 2: Suppose the following bound holds for $c=l$

[TABLE]

• Step 3: When $c=l+1$, we can apply the following sub-optimal control policy: (1) if the requesting users appear in the coverage of $\mathcal{C}_{1}\cup\mathcal{C}_{2}\cup...\cup\mathcal{C}_{l}$, the optimal scheduling policy for system state $\widetilde{T_{f}}^{l,s}$ is applied; (2) if the requesting users appear in the coverage of $\mathcal{C}_{l+1}$, the optimal scheduling policy for system state $\widetilde{S}_{f}^{l+1,s}$ is applied; (3) if the requesting users appear outside the coverage of any cache nodes, choose the one from the above two policies with larger transmission resource consumption. Let $\breve{V}_{N_{R}-n+1}(\widetilde{T_{f}}^{l+1,s})$ be the average cost of the above sub-optimal scheduling policy, we have

[TABLE]

Although the above proof is for the $(f,s)$ -th file segment, it can be trivially extended to arbitrary file segments. Thus the upper bound is proved.

VII-2 Proof of Lower Bound

Let $\Omega_{f,n}^{*}$ be the optimal scheduling policy and $\widetilde{S}_{f,n}$ be arbitrary cache state for the $f$ -th file in the $n$ -th stage, we have

[TABLE]

As

[TABLE]

and

[TABLE]

We have

[TABLE]

We also have

[TABLE]

As a result, the lower bound follows directly.

Appendix F: Proof Of Corollary 1

Due to the page limit, we only provide a sketch of the proof. First, $\overline{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})\geq\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ can be deduced from the following two facts:

• It has been proved in Lemma 6 that $\widehat{V}_{N_{R}-n}(\widetilde{S}_{f,n+1})\geq\widetilde{V}_{N_{R}-n}(\widetilde{S}_{f,n+1}),\forall\widetilde{S}_{f,n+1}$.

• $\overline{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ is the minimization of the $(f,n)$-th file transmission cost and the future cost $\widehat{V}_{N_{R}-n}(\widetilde{S}_{f,n+1})$; whereas $\widetilde{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ is the minimization of the $(f,n)$-th file transmission cost and the future cost $\widetilde{V}_{N_{R}-n}(\widetilde{S}_{f,n+1})$.
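The two facts above combine through a simple monotonicity argument: if the future cost is replaced by a pointwise larger one, the minimum of stage cost plus future cost cannot decrease. A toy numerical check is sketched below. It is purely illustrative: the action names, state names and numbers are hypothetical, and the transitions are deterministic, whereas the future cost in the paper is an expectation over the next cache state.

```python
def bellman_min(stage_cost, next_state, future_cost):
    """Minimize stage cost plus future cost over a finite set of actions."""
    return min(stage_cost[a] + future_cost[next_state[a]] for a in stage_cost)


# Toy numbers, all hypothetical.
stage_cost = {"low_power": 1.0, "high_power": 3.0}
next_state = {"low_power": "few_cached", "high_power": "all_cached"}
v_tilde = {"few_cached": 4.0, "all_cached": 1.0}  # stand-in for the true future cost
v_hat = {"few_cached": 5.5, "all_cached": 1.5}    # pointwise upper bound on v_tilde

v_bar = bellman_min(stage_cost, next_state, v_hat)     # uses the upper bound
v_true = bellman_min(stage_cost, next_state, v_tilde)  # uses the true future cost
print(v_bar, v_true)  # 4.5 4.0
```

For every action the sum with `v_hat` is at least the sum with `v_tilde`, so the same ordering holds after minimizing, which is the first inequality of the corollary.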

In order to prove $\overline{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})\leq\widehat{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ , we first define the scheduling policy $\widehat{\Omega}_{f,n}$ as follows.

• When the first requesting user falls in the coverage area of the $c$-th cache node ($\mathbf{l}_{f,n}\in\mathcal{C}_{c}$, $\forall c$), the scheduling policy $\widehat{\Omega}_{f,n}=\{(\widehat{P}_{f,n,s}^{c},\widehat{N}_{f,n,s}^{c})|\forall s\}$ minimizes the transmission cost for the $s$-th segment ($\forall s$) as if all other cache nodes have already decoded this segment. Hence,

[TABLE]

• When the first requesting user falls outside the coverage area of any cache node ($\mathbf{l}_{f,n}\in\mathcal{C}_{0}$), the scheduling policy is $\widehat{\Omega}_{f,n}=\{(P_{f,n,s}^{0},N_{f,n,s}^{0})|\forall s\}$, where

[TABLE]

$P_{f,n,s}^{*}$ and $N_{f,n,s}^{*}$ are the optimal scheduling parameters obtained by assuming that all the cache nodes have decoded the $s$-th segment (their expressions are provided in Lemma 3). $P_{f,n,s}^{c}$ and $N_{f,n,s}^{c}$ are the optimal scheduling parameters obtained by assuming that all the cache nodes except the $c$-th one have decoded the $s$-th segment (their expressions are provided in Lemma 4).

Then, we get the inequalities (29), where $\widetilde{S}_{f,n+1}$ is the next cache state given the current cache state ${S}_{f,n}$ and scheduling policy $\widehat{\Omega}_{f,n}$. Inequality (a) holds because $\overline{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ uses the optimal scheduling policy, whereas $\check{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ uses a heuristic scheduling policy. Both $\check{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ and $\widehat{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ incur the same cost on the first file transmission. However, the evaluation of the future cost in $\widehat{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$ is more conservative (larger) than that in $\check{V}_{N_{R}-n+1}(\widetilde{S}_{f,n})$. Hence, inequality (b) follows.

