Online Reinforcement Learning of X-Haul Content Delivery Mode in Fog Radio Access Networks
Jihwan Moon, Osvaldo Simeone, Seok-Hwan Park, Inkyu Lee

TL;DR
This paper introduces an adaptive, reinforcement learning-based method for selecting content delivery modes in fog radio access networks, balancing current and future latency to optimize overall performance.
Contribution
It proposes a novel RL-based approach for mode selection in F-RANs that accounts for unknown, changing content popularity, improving latency management.
Findings
The RL scheme effectively reduces long-term delivery latency.
Adaptive mode selection outperforms static strategies.
Numerical results validate the approach's efficiency.
Jihwan Moon, Member, IEEE, Osvaldo Simeone, Fellow, IEEE, Seok-Hwan Park, Member, IEEE, and Inkyu Lee, Fellow, IEEE
This work was supported by the National Research Foundation through the Ministry of Science, ICT, and Future Planning (MSIP), Korean Government under Grant 2017R1A2B3012316. O. Simeone has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Programme (Grant Agreement No. 725731). J. Moon and I. Lee are with the School of Electrical Engineering, Korea University, Seoul 02841, South Korea (e-mail: {anschino, inkyu}@korea.ac.kr). O. Simeone is with the Department of Informatics, King’s College London, London WC2R 2LS, U.K. (e-mail: [email protected]). S.-H. Park is with the Division of Electronic Engineering, Chonbuk National University, Jeonju 54896, South Korea (e-mail: [email protected]).
Abstract
We consider a Fog Radio Access Network (F-RAN) with a Base Band Unit (BBU) in the cloud and multiple cache-enabled enhanced Remote Radio Heads (eRRHs). The system aims at delivering contents on demand with minimal average latency from a time-varying library of popular contents. Information about uncached requested files can be transferred from the cloud to the eRRHs by following either backhaul or fronthaul modes. The backhaul mode transfers fractions of the requested files, while the fronthaul mode transmits quantized baseband samples as in Cloud-RAN (C-RAN). The backhaul mode allows the caches of the eRRHs to be updated, which may lower future delivery latencies. In contrast, the fronthaul mode enables cooperative C-RAN transmissions that may reduce the current delivery latency. Taking into account the trade-off between current and future delivery performance, this paper proposes an adaptive selection method between the two delivery modes to minimize the long-term delivery latency. Assuming an unknown and time-varying popularity model, the method is based on model-free Reinforcement Learning (RL). Numerical results confirm the effectiveness of the proposed RL scheme.
I Introduction
The architecture of the recently launched fifth generation (5G) mobile system can leverage cloud processing at Base Band Units (BBUs), as well as edge processing, including edge caching, at enhanced Remote Radio Heads (eRRHs) [1]. In order to enable a flexible functional split in this architecture, referred to as Fog-Radio Access Network (F-RAN) [2], the concept of X-haul has been introduced to integrate the traditionally distinct backhaul and fronthaul connectivity modes for the interface between the BBU and the eRRH into a unified framework [3, 4, 5]. The backhaul mode enables the transfer of data packets from the BBU in the cloud to the eRRHs. In contrast, the fronthaul mode allows the BBU to carry out joint baseband processing and deliver quantized baseband samples to the eRRHs as in Cloud-RAN (C-RAN) [6, 7, 8].
In this work, we study an adaptive selection of backhaul and fronthaul transfer modes with the aim of optimizing the performance of content delivery. The content delivery in F-RANs has been widely studied in recent years [9, 10, 11, 12, 13, 14, 15]. Most studies assume offline caching with a static popularity model. Under these assumptions, references [9] and [10] investigated the problem of instantaneous delivery latency minimization and minimum data rate maximization, respectively, while keeping the contents of the caches fixed. In contrast, in [11] and [12], information-theoretic performance bounds were provided on the optimal high Signal-to-Noise-Ratio (SNR) performance by considering also the optimization of uncoded caching strategies. An extension of this work that accounts for time-varying and possibly unknown file popularity with online caching was described in [13]. Under an unknown dynamic popularity model, the works [14] and [15] presented a Reinforcement Learning (RL) based optimization of online caching by assuming a backhaul mode.
In this paper, we investigate for the first time the online minimization of the long-term delivery latency over X-haul links in an F-RAN with time-varying unknown file popularity. We focus on the joint optimization of linear precoding strategies and the choice between fronthaul and backhaul modes. The backhaul mode enables cache updates at the eRRHs, hence potentially reducing future latencies. In contrast, the fronthaul mode allows cooperative C-RAN transmissions which decrease the current delivery latency [9, 10, 11]. We propose a new model-free RL approach based on a linear value function approximation with properly selected features, and numerical results confirm the effectiveness of the proposed RL scheme.
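Notations: $\mathbb{E}[\cdot]$ and $\Pr(\cdot)$ stand for expectation and probability, respectively. $|\mathcal{A}|$ represents the cardinality of set $\mathcal{A}$, and $\mathbb{C}^{m\times n}$ denotes an $m\times n$ complex matrix. $\mathbb{1}(c)$ outputs one if condition $c$ is true and zero otherwise. For a matrix $\textbf{X}$, $|\textbf{X}|$, $\textbf{X}^{T}$, $\textbf{X}^{H}$, $\textbf{X}^{-1}$ and $\text{tr}(\textbf{X})$ are defined as determinant, transpose, Hermitian, inverse and trace, respectively. $\textbf{I}_{m}$ means an $m\times m$ identity matrix while $\otimes$ equals a Kronecker product operation. Also, $\text{diag}\big(\textbf{X}_{1},...,\textbf{X}_{N}\big)$ represents block-wise diagonalization of matrices $\textbf{X}_{1},...,\textbf{X}_{N}$. Lastly, $\mathcal{CN}(\boldsymbol{\mu},\boldsymbol{\Omega})$ indicates a circularly symmetric complex Gaussian distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Omega}$.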
II System Model
We study the F-RAN system illustrated in Fig. 1, which consists of a BBU in the cloud, connected to $M$ cache-enabled eRRHs and users. Each X-haul link between the BBU and the $m$-th eRRH has capacity $C_{m}^{R}$ bits per symbol and can be operated in both backhaul and fronthaul modes [4][5]. The $k$-th user and the $m$-th eRRH are equipped with $N_{k}^{U}$ and $N_{m}^{R}$ antennas, respectively. We assume a time-slotted operation [15], and the wireless channel matrix $\textbf{H}_{mk}$ between the $m$-th eRRH and the $k$-th user is assumed to be fixed for the given time scale of interest slots. We also define as the library of $L$-bit files, which may be requested by the users. Finally, we denote as the subset of files cached at time slot at the eRRHs whose cardinality is bounded by files due to storage capacity constraints. Note that in this letter, we make the simplifying assumption that all the eRRHs store the same files in their respective caches. Generalization of the framework is possible but at the cost of a more cumbersome notation. Detailed request, online caching and delivery models are described in the following.
II-A Request Model and Online Caching
In each time slot , a subset of files is popular in the sense that all users request files from . Specifically, the $k$-th user requests a uniformly selected file from subset without replacement [13]. The assumption of no replacement ensures that all requested files are distinct, yielding a worst-case performance analysis [11]. We assume that the popularity varies as a Markov process as in [14, 16, 17, 18]. This is a standard assumption which provides a first-order approximation of the evolution of the content popularity [19][20]. Let and denote the indices of the users whose requested files are cached and the indices of users whose requested files are not cached at time , respectively. If the backhaul mode is selected at time slot , the requested but uncached files in are sent on all the X-haul links and cached. In order to make space for a new file, a previously cached file is evicted by following the standard Least Recently Used (LRU) rule [21].
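The caching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `update_cache_lru` and its arguments are hypothetical, recency is assumed to be refreshed by every request to a cached file regardless of the selected mode, and the cache is assumed to be at least as large as the number of files requested in one slot.

```python
from collections import OrderedDict

def update_cache_lru(cache, requested, capacity, backhaul):
    """Update a shared eRRH cache for one time slot (illustrative sketch).

    cache     : OrderedDict of file ids, least recently requested first
    requested : iterable of distinct file ids requested in this slot
    capacity  : maximum number of cached files (assumed >= len(requested))
    backhaul  : True if the backhaul mode is selected in this slot
    """
    # Pass 1: a request to a cached file refreshes its recency (assumption).
    for f in requested:
        if f in cache:
            cache.move_to_end(f)
    # Pass 2: uncached files are stored only when the backhaul mode is used.
    if backhaul:
        for f in requested:
            if f not in cache:
                if len(cache) >= capacity:
                    cache.popitem(last=False)  # evict least recently used file
                cache[f] = None
    return cache
```

Refreshing the hits before inserting the misses keeps a file that is requested in the current slot from being evicted in that same slot.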
II-B Delivery Operation
At each slot , the X-haul link is used in either fronthaul or backhaul mode for symbols, where and indicate the selection of fronthaul and backhaul modes, respectively. Subsequently, the eRRHs deliver the requested files in set over the wireless channel for symbols, based on the signals received on the X-haul links and on the cached contents. This results in a total latency of symbols for time slot . Note that the eRRHs’ caches are updated according to the caching mechanism described in Section II-A only if the backhaul mode is selected as .
II-C Problem Formulation
The delivery time at slot depends on the state of the system , , , which includes the set of popular files, cached files and requested files, respectively. Given the Markovity of the process , the state evolves as a controlled Markov process. is partially observable since the set is unknown, and it is only observed indirectly via the file set . In particular, at time , only the history of observations with , is available to the system. Thus, a general policy can map the observations to the selected action through a conditional distribution .
In this work, we aim at minimizing the average long-term delivery latency of the proposed F-RAN system over the selection of policy . Given a forgetting factor , the problem can be formulated as
[TABLE]
where calculation of the total latency will be reviewed in Section III. The expectation in (P) is over the state distribution, which depends on the policy.
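To make the trade-off in (P) concrete, the following sketch evaluates a discounted sum of per-slot latencies. It is not the paper's exact objective: the helper `discounted_latency`, the convention that the first slot has weight one, and the latency values in the usage lines are all illustrative assumptions.

```python
def discounted_latency(latencies, gamma):
    """Discounted sum of per-slot latencies.

    Assumes the first slot is weighted by gamma**0 = 1; the paper's exact
    indexing of the forgetting factor may differ.
    """
    return sum(gamma ** t * d for t, d in enumerate(latencies))

# Hypothetical latency traces (in symbols), for illustration only.
always_fronthaul = [10.0, 10.0, 10.0, 10.0]  # no cache update, flat latency
backhaul_first = [14.0, 7.0, 7.0, 7.0]       # costly first slot, cache hits later

cost_fronthaul = discounted_latency(always_fronthaul, gamma=0.9)
cost_backhaul = discounted_latency(backhaul_first, gamma=0.9)
```

With these made-up numbers, paying a higher latency in the first slot to update the caches gives the smaller discounted total, which is the kind of behavior a purely greedy selection cannot capture.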
III Minimum Instantaneous Latency
In this section, we discuss how to evaluate the delivery latency in problem (P). We emphasize that for and is assumed known when solving problem (P) at each time slot , and is derived as defined in this section. Following [9], we omit the time index for simplicity.
III-A Backhaul Mode
In the backhaul mode (), the BBU first fetches the requested but uncached files and transmits them to the eRRHs. The backhaul transmission to the $m$-th eRRH takes $\Delta_{m}^{R}=\big|\mathcal{F}_{\text{req},\text{NC}}\big|L/C_{m}^{R}$ symbols, and the total backhaul latency is $\Delta^{R}=\max_{m}\Delta_{m}^{R}$, since all the eRRHs need to receive the files in $\mathcal{F}_{\text{req},\text{NC}}$. As a result, all the requested files in are available at the eRRHs and cooperative transmission across all eRRHs is feasible. Each file for the $k$-th user is encoded by each eRRH as the signal , where denotes the number of data streams allocated to the $k$-th user, which is assumed to be a fixed parameter. The transmit signal from the $m$-th eRRH is then given as where , and $\textbf{G}_{mk}$ is the precoding matrix for at the $m$-th eRRH. Accordingly, the achievable rate for the $k$-th user on the wireless channel can be written as [9]
[TABLE]
where we have $\boldsymbol{\Phi}_{\text{back},k}^{U}\triangleq\big(\sum\nolimits_{\ell\in\mathcal{K}_{\text{req}}\backslash k}\textbf{H}_{k}\textbf{G}_{\ell}\textbf{G}_{\ell}^{H}\textbf{H}_{k}^{H}+\sigma_{k}^{2}\textbf{I}_{N_{k}^{U}}\big)^{-1}\textbf{H}_{k}\textbf{G}_{k}\textbf{G}_{k}^{H}\textbf{H}_{k}^{H}$ with $\textbf{H}_{k}\triangleq\big[\textbf{H}_{1k}\cdots\textbf{H}_{Mk}\big]$ and $\textbf{G}_{k}\triangleq\big[\textbf{G}_{1k}^{T}\cdots\textbf{G}_{Mk}^{T}\big]^{T}$, and $\sigma_{k}^{2}$ represents the additive white Gaussian noise variance at the $k$-th user.
The latency for delivering file for the -th user is obtained as , and the overall wireless channel latency equals , since every requesting user needs to receive the requested file. The minimum instantaneous latency for can hence be found as a solution of the problem
[TABLE]
where denotes the maximum transmit power of the $m$-th eRRH, and we define $\textbf{E}_{m}\triangleq\big[\textbf{0}\cdots\textbf{I}_{N_{m}^{R}}\cdots\textbf{0}\big]$ in which an identity matrix spans columns from $\sum_{i=1}^{m-1}N_{i}^{R}+1$ to $\sum_{i=1}^{m}N_{i}^{R}$. Although problem (P1) is jointly non-convex, a stationary point can be attained by leveraging Successive Convex Approximation (SCA) as detailed in [9].
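Once the precoders are fixed and the per-user rates are known, the backhaul-mode latency reduces to simple arithmetic. The sketch below assumes the rates are already available in bits per symbol (for instance from the SCA solution) and that the X-haul and wireless phases run one after the other; the function and argument names are hypothetical.

```python
def backhaul_mode_latency(num_uncached, file_bits, xhaul_caps, user_rates):
    """Total latency (in symbols) of one slot in backhaul mode (sketch).

    num_uncached : number of requested but uncached files
    file_bits    : file size L in bits
    xhaul_caps   : X-haul capacities C_m^R in bits per symbol, one per eRRH
    user_rates   : achievable wireless rates in bits per symbol, one per
                   requesting user (assumed given and positive)
    """
    # Every eRRH must receive all uncached files, so the slowest link dominates.
    xhaul = max(num_uncached * file_bits / c for c in xhaul_caps)
    # Every requesting user must decode its file, so the slowest user dominates.
    wireless = max(file_bits / r for r in user_rates)
    return xhaul + wireless
```

For example, `backhaul_mode_latency(2, 1000, [10, 20], [4, 5])` combines a 200-symbol X-haul phase with a 250-symbol wireless phase.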
III-B Fronthaul Mode
Under the fronthaul mode, any requested but uncached file for the $k$-th user is jointly encoded and precoded at the BBU. The resulting signal dedicated for the $m$-th eRRH is written as , where encodes file , and $\tilde{\textbf{G}}_{mk}$ represents the corresponding precoding matrix for the $m$-th eRRH. The BBU then performs compression on prior to transferring to the eRRHs. As a result, the decompressed signal at the $m$-th eRRH can be written as with quantization noise for a given covariance matrix $\boldsymbol{\Omega}_{m}$ [9][10].
The rest of the requested cached files are locally precoded with at the eRRHs in the same manner as in the backhaul mode. The final transmit signal at the $m$-th eRRH is then given as , and the achievable rate for the $k$-th user can be obtained as [9]
[TABLE]
where we have $\boldsymbol{\Phi}_{\text{front},k}^{U}\triangleq\big(\sum\nolimits_{\ell\in\mathcal{K}_{\text{req}}\backslash k}\textbf{H}_{k}\tilde{\textbf{G}}_{\ell}\tilde{\textbf{G}}_{\ell}^{H}\textbf{H}_{k}^{H}+\textbf{H}_{k}\boldsymbol{\Omega}_{R}\textbf{H}_{k}^{H}+\sigma_{k}^{2}\textbf{I}_{N_{k}^{U}}\big)^{-1}\textbf{H}_{k}\tilde{\textbf{G}}_{k}\tilde{\textbf{G}}_{k}^{H}\textbf{H}_{k}^{H}$, $\boldsymbol{\Omega}_{R}\triangleq\text{diag}\big(\boldsymbol{\Omega}_{1},...,\boldsymbol{\Omega}_{M}\big)$, $\tilde{\textbf{G}}_{k}\triangleq\big[\tilde{\textbf{G}}_{1k}^{T}\cdots\tilde{\textbf{G}}_{Mk}^{T}\big]^{T}$ with , and if and otherwise for the $k$-th user.
The wireless channel latency is defined in the same way as in the backhaul mode. For the fronthaul latency, by the rate-distortion theory, sending quantized signals to the $m$-th eRRH consumes
[TABLE]
with $\boldsymbol{\Phi}_{m}^{R}\triangleq\big(\textbf{E}_{m}\boldsymbol{\Omega}_{R}\textbf{E}_{m}^{H}\big)^{-1}\sum\nolimits_{k\in\mathcal{K}_{\text{req},\text{NC}}}\textbf{E}_{m}\tilde{\textbf{G}}_{k}\tilde{\textbf{G}}_{k}^{H}\textbf{E}_{m}^{H}$ [9]. Compressing $\Delta^{U}$ symbols produces $\Delta^{U}g_{m}\big(\big\{\tilde{\textbf{G}}_{k}\big\},\boldsymbol{\Omega}_{R}\big)$ bits, which need to be transferred from the BBU to the $m$-th eRRH. Therefore, the fronthaul latency is given by $\Delta^{R}=\max_{m}\Delta_{m}^{R}$ where $\Delta_{m}^{R}=\Delta^{U}g_{m}\big(\big\{\tilde{\textbf{G}}_{k}\big\},\boldsymbol{\Omega}_{R}\big)/C_{m}^{R}$, and the minimum instantaneous latency for is calculated as a solution of the problem
[TABLE]
which can be tackled via the SCA approach detailed in [9]. The total worst-case order of complexity for the SCA method can be expressed as where , and indicate the desired error tolerance, the maximum number of the SCA iterations and the number of constraints, respectively [22]. Here, equals in (P1) and in (P2).
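The fronthaul-mode latency can be illustrated in the same way. In this sketch the compression rates (bits per baseband symbol, the quantity written as $g_{m}$ above) are taken as given inputs rather than computed from the precoders and quantization covariances, and the function name is hypothetical.

```python
def fronthaul_mode_latency(wireless_symbols, compression_rates, xhaul_caps):
    """Total latency (in symbols) of one slot in fronthaul mode (sketch).

    wireless_symbols  : wireless delivery latency Delta^U in symbols
    compression_rates : bits per quantized baseband symbol g_m, one per eRRH
                        (assumed given)
    xhaul_caps        : X-haul capacities C_m^R in bits per symbol
    """
    # Delta^U symbols compressed at g_m bits each must cross a link of
    # capacity C_m^R; the slowest eRRH link sets the fronthaul latency.
    xhaul = max(wireless_symbols * g / c
                for g, c in zip(compression_rates, xhaul_caps))
    return xhaul + wireless_symbols
```

Unlike the backhaul case, the X-haul latency here scales with the wireless latency itself, which is why the two cannot be minimized separately.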
IV RL-Based X-Haul Online Optimization
In this section, we solve problem (P) by proposing an online on-policy RL-based optimization strategy [23].
IV-A Problem (P) as a Partially Observable Decision Process
As discussed in Section II, problem (P) is a Partially Observable Markov Decision Process (POMDP) with the action space and the instantaneous reward given by the negative latency . In order to reduce the complexity of the policy, we optimize here over memoryless policies that select an action based only on the latest observation at time slot [24][25] as well as a summary of the previous observations given by the set where is the most recent time slot at which cached file was requested at time slot .
IV-B SARSA with Linear Value Function Approximation
To optimize over memoryless policies, we adopt the online on-policy value-based strategy State-Action-Reward-State-Action (SARSA) with a carefully designed linear approximation [23]. The SARSA updates an action-value function, or Q-function, that estimates the expected return with . Since the total size of the observation space in (P) grows exponentially with , we propose a linear value function approximation , where w is a parameter vector to be learned, and denotes a feature vector representing the observation-action pair [23].
In order to determine a suitable feature vector, we first note that vector should contain sufficient information to quantify the value of caching for currently cached and requested files. Frequently requested files typically yield lower future latencies when cached, but an optimal choice should account not only for their popularity but also for their remaining life time, which is a duration that a file remains popular (see Sec. II of [26] for further discussion).
Based on these considerations, we introduce a variable for every file as a function of the current observation at time slot . We set it as if , if and otherwise. Furthermore, we also include a variable that measures the “age” of the currently cached files, that is, the maximum time elapsed since the last request of the cached files. We can quantize this variable by ranges with for all and . If the caches are up to date, the quantity is small for all , and hence is also small. Otherwise, if there exists any file with large , a refresh of the caches may be required.
Using the variables introduced above, we define the feature vector as
[TABLE]
where we have used the one-hot encoded vectors , and . The feature vector in (7) has dimension , which increases linearly in and is hence significantly smaller than the size of the conventional look-up table-based SARSA. The effectiveness of the proposed feature vector will be verified in Section V.
The overall proposed procedure for solving (P) is summarized in Algorithm where denotes the temporal difference error, and E indicates the eligibility trace. Here, an -greedy exploration strategy with decreasing is adopted. Note that E is used to assign credit for the current reward to the most frequently visited states and selected actions, so as to enable online learning (see [23] for details).
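For readers unfamiliar with SARSA and eligibility traces, the following is a minimal textbook-style sketch of one SARSA($\lambda$) update with a linear action-value function and accumulating traces, in the general form discussed in [23]. It is not the paper's Algorithm: the function names, the `features` callable, the step size `alpha` and the trace-decay parameter `lam` are illustrative assumptions.

```python
import random

def q_value(w, phi):
    """Linear action-value estimate: inner product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, phi))

def epsilon_greedy(w, features, obs, actions, eps, rng):
    """Pick a random action with probability eps, else the greedy one.

    features(obs, action) is a hypothetical callable returning the feature
    vector of an observation-action pair.
    """
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_value(w, features(obs, a)))

def sarsa_lambda_step(w, e, phi, reward, phi_next, alpha, gamma, lam):
    """One SARSA(lambda) update with accumulating eligibility traces.

    reward is the negative latency of the current slot; phi and phi_next are
    the feature vectors of the current and next observation-action pairs.
    """
    # Temporal difference error for the on-policy transition.
    delta = reward + gamma * q_value(w, phi_next) - q_value(w, phi)
    # Decay old traces and add credit for the features just visited.
    e = [gamma * lam * ei + xi for ei, xi in zip(e, phi)]
    # Move the weights along the trace, scaled by the TD error.
    w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]
    return w, e, delta
```

Because the reward is a negative latency, the greedy action is the one with the largest estimated value, that is, the smallest predicted long-term latency.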
V Numerical Results
In this section, the performance of the proposed RL-based algorithm is evaluated via numerical examples. We adopt the channel model , where $\rho_{mk}\triangleq\rho_{0}\big(\frac{d_{mk}}{d_{0}}\big)^{-\eta}$ equals the distance-dependent path loss between eRRH $m$ and user $k$, $\rho_{0}$ indicates the path loss at reference distance $d_{0}$, $\eta$ is the path loss exponent, and $d_{mk}$ represents the distance between the $m$-th eRRH and the $k$-th user. Each element of follows an independent complex Gaussian distribution with zero mean and unit variance. The eRRHs and the users are circularly placed from the BBU at the center with uniformly distributed angles and distance m and m, respectively. The bandwidth is MHz and the thermal noise is dBm/Hz. We set , , , m, , time slots, files, dBm, and bits per symbol. For RL, we use the hyperparameters , , and with where limits the maximum value of .
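A channel of this kind can be generated as in the sketch below. It assumes the common construction in which a matrix of unit-variance complex Gaussian entries is scaled by the square root of the path loss; the function names and all parameter values a caller passes in are illustrative, not the simulation settings of the paper.

```python
import math
import random

def path_loss(d, d0, rho0, eta):
    """Distance-dependent path loss rho0 * (d / d0) ** (-eta)."""
    return rho0 * (d / d0) ** (-eta)

def rayleigh_channel(n_rx, n_tx, rho, rng):
    """n_rx x n_tx channel with i.i.d. complex Gaussian entries of variance rho.

    Assumes the channel equals sqrt(rho) times a matrix of unit-variance
    circularly symmetric complex Gaussian entries, so the real and imaginary
    parts each have variance rho / 2.
    """
    s = math.sqrt(rho / 2.0)
    return [[complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) * s
             for _ in range(n_tx)]
            for _ in range(n_rx)]
```

Passing a seeded `random.Random` instance as `rng` makes the generated channels reproducible across runs.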
Reference [26] demonstrated that the popularity of files often exhibits temporal locality in the sense that the content is frequently requested in a bursty fashion for a certain life time. Motivated by these findings, we model the evolution of the subset of popular files such that a currently unpopular file has a probability of to become popular, and file remains popular for time slots. We assume Zipf’s distribution [27] for with . The proposed RL scheme is compared with a greedy fronthaul/backhaul mode selection that minimizes the current delivery latency at each time slot as well as with an offline scheme that keeps the most popular files with the largest under the idealized assumption that this is known a priori.
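One way to simulate such a popularity process is sketched below. The details are assumptions made for illustration: Zipf weights are normalized over the library, a file that becomes popular stays so for a fixed number of slots, and the caller decides how the per-file probability of becoming popular is derived from the Zipf weights. The function names are hypothetical.

```python
import random

def zipf_weights(num_files, exponent):
    """Normalized Zipf weights: weight of rank f is proportional to f ** (-exponent)."""
    raw = [f ** (-exponent) for f in range(1, num_files + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def step_popularity(remaining, become_prob, lifetime, rng):
    """Advance the popularity state by one time slot.

    remaining[f]   : slots for which file f is still popular (0 = unpopular)
    become_prob[f] : probability that an unpopular file f becomes popular
    lifetime       : number of slots a newly popular file stays popular
    """
    nxt = []
    for rem, p in zip(remaining, become_prob):
        if rem > 0:
            nxt.append(rem - 1)    # popular file moves toward expiry
        elif rng.random() < p:
            nxt.append(lifetime)   # unpopular file becomes popular
        else:
            nxt.append(0)          # unpopular file stays unpopular
    return nxt
```

The set of popular files in a slot is then the set of indices with a positive remaining life time, and the state depends only on the previous slot, which matches the Markov assumption of the request model.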
Fig. 2 compares the average long-term latency performance as a function of the eRRHs’ cache size for dBm, and . We also limit the maximum number of the SCA iterations for solving (P1) and (P2) as . Note that the convergence to a stationary point for SCA does not affect the convergence of SARSA since we treat the negative reward function as fixed. With , the fronthaul mode is seen to yield a lower latency than the backhaul mode given the limited advantage of caching in this regime. The opposite is true when the eRRHs have larger caches, such as , in which the backhaul mode outperforms the fronthaul mode. In agreement with the results in [9, 10, 11] and [13], the greedy scheme almost always selects the fronthaul mode and is hence strongly suboptimal for large enough . The proposed RL method exhibits the lowest latency among all schemes that do not assume the knowledge of the popularity probability. It can be checked that the gain is not obtained by statically selecting the best mode at each time instant, but rather by carrying out an optimized dynamic selection. It is also observed that in a large regime, the proposed strategy can outperform the static offline scheme which assumes popularity dynamics to be known in advance.
VI Conclusions
In this paper, we have demonstrated the advantage of adaptively selecting between the backhaul and fronthaul transfer modes as a function of the current cache contents and the history of past requests in an F-RAN system. The proposed RL-based strategy has been shown via numerical results to outperform baseline schemes, confirming the potential advantages of an X-haul implementation over static fronthaul or backhaul deployments.
[1] Y.-J. Ku, D.-Y. Lin, C.-F. Lee, P.-J. Hsieh, H.-Y. Wei, C.-T. Chou, and A.-C. Pang, “5G radio access network design with the fog paradigm: confluence of communications and computing,” IEEE Commun. Mag., vol. 55, pp. 46–52, Apr. 2017.
[2] Y.-Y. Shih, W.-H. Chung, A.-C. Pang, T.-C. Chiu, and H.-Y. Wei, “Enabling low-latency applications in fog-radio access networks,” IEEE Netw., vol. 31, pp. 52–58, Jan. 2017.
[3] A. D. L. Oliva, X. C. Pérez, A. Azcorra, A. D. Giglio, F. Cavaliere, D. Tiegelbekkers, J. Lessmann, T. Haustein, A. Mourad, and P. Iovanna, “Xhaul: toward an integrated fronthaul/backhaul architecture in 5G networks,” IEEE Wireless Commun., vol. 22, pp. 32–40, Oct. 2015.
[4] T. Pfeiffer, “Next generation mobile fronthaul and midhaul architectures,” J. Opt. Commun. Netw., vol. 7, pp. 38–45, Nov. 2015.
[5] N. J. Gomes, P. Chanclou, P. Turnbull, A. Magee, and V. Jungnickel, “Fronthaul evolution: From CPRI to Ethernet,” Opt. Fiber Technol., vol. 26, pp. 50–58, Dec. 2015.
[6] H. Ren, N. Liu, C. Pan, M. Elkashlan, A. Nallanathan, X. You, and L. Hanzo, “Low-latency C-RAN: an next-generation wireless approach,” IEEE Veh. Technol. Mag., vol. 13, pp. 48–56, Jun. 2018.
[7] J. Kim, H. Lee, S.-H. Park, and I. Lee, “Minimum rate maximization for wireless powered cloud radio access networks,” IEEE Trans. Veh. Technol., vol. 68, pp. 1045–1049, Jan. 2019.
[8] J. Kim, S.-H. Park, O. Simeone, I. Lee, and S. Shamai (Shitz), “Joint design of fronthauling and hybrid beamforming for downlink C-RAN systems,” accepted for IEEE Trans. Commun.
