Small Cell Transmit Power Assignment Based on Correlated Bandit Learning

Zhiyang Wang; Cong Shen

arXiv:1703.05975·cs.NI·March 20, 2017

TL;DR

This paper introduces a Bayesian correlated bandit learning approach for small cell transmit power assignment that leverages user feedback and prior knowledge, achieving faster convergence and better performance in dense networks.

Contribution

It proposes a novel power assignment algorithm that exploits correlation among power values and incorporates switching penalties, reducing reliance on manual measurements and improving convergence speed.

Findings

01. Significant performance improvements over existing solutions.

02. Faster convergence to optimal power settings due to correlation exploitation.

03. Effective in both single and multiple SBS deployment scenarios.

Abstract

Judiciously setting the base station transmit power that matches its deployment environment is a key problem in ultra dense networks and heterogeneous in-building cellular deployments. A unique characteristic of this problem is the tradeoff between sufficient indoor coverage and limited outdoor leakage, which has to be met without explicit knowledge of the environment. In this paper, we address the small base station (SBS) transmit power assignment problem based on stochastic bandit theory. Unlike existing solutions that rely on heavy involvement of RF engineers surveying the target area, we take advantage of the human user behavior with simple coverage feedback in the network, and thus significantly reduce the planned human measurement. In addition, the proposed power assignment algorithms follow the Bayesian principle to utilize the available prior knowledge from system self…

Tables

Table 1. TABLE I: Simulation Parameters

Parameters	Value
SBS transmit power	[-10dBm, 20dBm]
MBS transmit power	40dBm
Thermal noise density	-174dBm/Hz
Bandwidth	20MHz
Carrier frequency	2GHz
Penetration loss ($L_{ow}$)	20dB
Shadowing effect	log-normal with $\sigma=8$ dB, $\sigma^{\prime}=4$ dB
$d_{0}$	1m
$\alpha$	0.7
Enterprise size	K=1: 30m $\times$ 30m
	K=2: 40m $\times$ 40m
	K=4: 50m $\times$ 40m
SBS location	K=1: (12m,8m)
	K=2: (16m,17m), (-15m,-11m)
	K=4: (20m,18m), (11m,-19m), (-11m,18.5m), (-10.5m,-19m)

Table 2. TABLE II: Multi-SBS simulation results

Metric	K=2	K=4
Globally optimal power [dBm]	(0, 5)	(0, 5, 10, 15)
Coverage percentage	91.506%	96.548%
Leakage percentage	5.691%	28.725%
Simulation output power [dBm]	(0, 5)	(-5, 0, 5, 10) when $N = 20$
		(0, 5, 10, 15) when $N = 40$

Equations

\max_{k_{S}\in\mathcal{K}_{SBS}}{\sf{SINR}}_{k_{S},n}>{\sf{SINR}}_{\text{th}},\quad\text{for }n\in N_{in},

\max_{k_{M}\in\mathcal{K}_{MBS}}{\sf{SINR}}_{k_{M},n}<{\sf{SINR}}_{\text{th}},\quad\text{for }n\in N_{out},

{\sf{SINR}}_{k_{S},n}

{\sf{SINR}}_{k_{M},n}

r=\alpha\eta_{in}-(1-\alpha)\eta_{out},

G_{T}=\sum_{t=1}^{T}r_{a(t)}(t)

R_{T}=G_{T}^{*}-G_{T}=\max_{i=1,..,n}\left(\sum_{t=1}^{T}r_{i}(t)\right)-\sum_{t=1}^{T}r_{a(t)}(t),

\mathbb{E}[R_{T}]

Q_{i}^{\text{UiPA}}(t)=\bar{r}_{i}(t)+\sqrt{\frac{\sum_{\tau=1}^{t}r_{i}^{2}(\tau)-\bar{r}_{i}^{2}(t)N_{i}(t)}{(N_{i}(t)-1)N_{i}(t)}}\,\Phi^{-1}\left(1-\frac{1}{\sqrt{2\pi e}\,t^{2}}\right).

(\bm{\phi}_{t})_{k}=\begin{cases}1,&k=a(t),\\0,&\text{otherwise},\end{cases}

q_{t}=\frac{r_{t}\bm{\phi}_{t}}{\sigma_{0}^{2}}+\hat{\Lambda}_{t-1}\hat{\bm{\mu}}_{t-1},

\hat{\Sigma}_{t}=\hat{\Lambda}_{t}^{-1},

\begin{aligned}
\hat{\Lambda}_{t}&=\frac{\bm{\phi}_{t}\bm{\phi}_{t}^{T}}{\sigma_{0}^{2}}+\frac{\bm{\phi}_{t-1}\bm{\phi}_{t-1}^{T}}{\sigma_{0}^{2}}+\dots+\frac{\bm{\phi}_{1}\bm{\phi}_{1}^{T}}{\sigma_{0}^{2}}+\Lambda_{0}\\
&=\frac{1}{\sigma_{0}^{2}}\begin{bmatrix}N_{1}(t)&&&\\&N_{2}(t)&&\\&&\ddots&\\&&&N_{n}(t)\end{bmatrix}+\Lambda_{0}\\
&=P(t)^{-1}+\Lambda_{0}.
\end{aligned}

\begin{aligned}
\hat{\bm{\mu}}_{t}&=\hat{\Lambda}_{t}^{-1}\left(\frac{r_{t}\bm{\phi}_{t}}{\sigma_{0}^{2}}+\frac{r_{t-1}\bm{\phi}_{t-1}}{\sigma_{0}^{2}}+\dots+\frac{r_{1}\bm{\phi}_{1}}{\sigma_{0}^{2}}+\Lambda_{0}\bm{\mu}_{0}\right)\\
&=\hat{\Lambda}_{t}^{-1}\left(\begin{bmatrix}\frac{N_{1}(t)}{\sigma_{0}^{2}}\bar{r}_{1}(t)\\\vdots\\\frac{N_{n}(t)}{\sigma_{0}^{2}}\bar{r}_{n}(t)\end{bmatrix}+\Lambda_{0}\bm{\mu}_{0}\right)\\
&=(\Lambda_{0}+P(t)^{-1})^{-1}(P(t)^{-1}\bar{r}_{t}+\Lambda_{0}\bm{\mu}_{0}).
\end{aligned}

\hat{\Lambda}_{t}

\hat{\bm{\mu}}_{t}

{\sf{SC}}(T)=\sum_{t=2}^{T}s_{a(t)a(t-1)}=\sum_{t=2}^{T}f(|p_{a(t)}-p_{a(t-1)}|).

G_{T}^{S}=G_{T}-{\sf{SC}}(T).

R_{T}^{SC}=G_{T}^{*}-G_{T}^{S}

\mathbb{E}[R_{T}^{SC}]

\begin{split}&\mathbb{E}[R^{SC}_{T}]\leqslant\sum\limits_{i=1,i\neq i^{*}}\limits^{n}\Delta_{i}\mathbb{E}[N_{i}(T)]+\mathbb{E}[{\sf{SC}}(t)]\\ &\leqslant\sum_{i=1,i\neq i^{*}}^{n}\Big{(}\Delta_{i}(C_{1}^{i}\log T+C_{2}^{i})+(\tilde{s}_{i}^{max}+\tilde{s}_{i^{*}}^{max})\mathbb{E}[S_{i}(T)]\Big{)}\\ &\qquad\qquad+\tilde{s}_{i^{*}}^{max}\\ &\leqslant\sum\limits_{i=1,i\neq i^{*}}\limits^{n}\Delta_{i}(C_{1}^{i}\log T+C_{2}^{i})+\sum\limits_{i=1,i\neq i^{*}}\limits^{n}(\tilde{s}_{i}^{max}+\tilde{s}_{i^{*}}^{max})\quad\\ &\Bigg{(}\log 2C_{1}^{i}\sqrt{\log_{2}T}+(C_{2}^{i}+\log 2C_{1}^{i})\left(1+\frac{\pi^{2}}{6}\right)\Bigg{)}+\tilde{s}_{i^{*}}^{max},\end{split}

C_{1}^{i}=\frac{16\sigma_{0}^{2}}{\Delta_{i}^{2}}+\frac{\log 2}{2}\left(e^{\frac{3M_{i^{*}}^{2}}{2\sigma_{0}^{2}}}+e^{\frac{3M_{i}^{2}}{2\sigma_{0}^{2}}}\right),

C_{2}^{i}=\frac{4\sigma_{0}^{2}}{\Delta_{i}^{2}}\log 2\pi e+\left(e^{\frac{M_{i^{*}}^{2}}{3\sigma_{0}^{2}}}+e^{\frac{M_{i}^{2}}{3\sigma_{0}^{2}}}\right),

\mathbb{E}[R_{T}]\leq\sum_{i=1,i\neq i^{*}}^{n}\Delta_{i}\left(\left\lceil\frac{4\sigma_{0}^{2}}{\Delta_{i}^{2}}(\log 2\pi e+4\log T)-1\right\rceil+\hat{N}_{i}\right),

\hat{N}_{i}=e^{\frac{M_{i^{*}}^{2}}{3\sigma_{0}^{2}}}+e^{\frac{M_{i}^{2}}{3\sigma_{0}^{2}}}+\frac{9}{2}\left(e^{\frac{3M_{i^{*}}^{2}}{2\sigma_{0}^{2}}}+e^{\frac{3M_{i}^{2}}{2\sigma_{0}^{2}}}\right).

\mathbb{E}[R_{T}]

\mathbb{E}[R_{T}]\leq\sum_{i=1,i\neq i^{*}}^{n}\Delta_{i}\left(\left\lceil\frac{4\sigma_{0}^{2}}{\Delta_{i}^{2}}(\log 2\pi e+4\log T)-1\right\rceil+\frac{4}{2\pi e}\right),

PL(d)[\text{dB}]=15.3+37.6\times\log_{10}(d)+L_{ow}+X_{\sigma_{dB}},

Full text

Small Cell Transmit Power Assignment Based on Correlated Bandit Learning

Zhiyang Wang and Cong Shen

Z. Wang and C. Shen are with the Department of Electronic Engineering and Information Science, School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China. E-mail: [email protected], [email protected].

Abstract

Judiciously setting the base station transmit power that matches its deployment environment is a key problem in ultra dense networks and heterogeneous in-building cellular deployments. A unique characteristic of this problem is the tradeoff between sufficient indoor coverage and limited outdoor leakage, which has to be met without explicit knowledge of the environment. In this paper, we address the small base station (SBS) transmit power assignment problem based on stochastic bandit theory. Unlike existing solutions that rely on heavy involvement of RF engineers surveying the target area, we take advantage of human user behavior with simple coverage feedback in the network, and thus significantly reduce the planned human measurement. In addition, the proposed power assignment algorithms follow the Bayesian principle to utilize the available prior knowledge from system self-configuration. To guarantee good performance when the prior knowledge is insufficient, we incorporate the performance correlation among similar power values, and establish an algorithm that exploits the correlation structure to recover the majority of the degraded performance. Furthermore, we explicitly consider power switching penalties in order to discourage frequent changes of the transmit power, which cause varying coverage and uneven user experience. Comprehensive system-level simulations are performed for both single and multiple SBS deployment scenarios, and the resulting power settings are compared with state-of-the-art solutions. Significant performance gains of the proposed algorithms are observed. In particular, the correlation structure enables the algorithm to converge much faster to the optimal long-term power than other methods.

Index Terms:

Coverage optimization; Transmit power assignment; Heterogeneous Network (HetNet).

I Introduction

The massive deployment of distributed low-power low-cost small base stations (SBS) has been viewed as one of the most important solutions to address the challenge of exponential growth of the wireless data traffic, particularly for indoor users [1]. In practice, SBSs may be deployed in drastically different scenarios, from large warehouses and buildings to small residential apartments and single-office enterprises. In addition, the radio frequency (RF) conditions may vary significantly from one site to another. Due to the heterogeneous nature of these deployments, the transmit power assigned to the SBS, which effectively determines the coverage range, cannot be the same but must be decided based on the individual deployment environment, such as the building layout, the RF conditions, and the locations of the base stations. Furthermore, indoor enterprise deployments often have stringent access and security constraints. As a result, judiciously setting the SBS transmit power to automatically match its deployment environment is among the most important challenges for in-building SBS network deployment [2].

To address this challenge, in-building enterprise networks typically rely on RF engineers to carry out extensive measurement and RF survey to determine the transmit power for appropriate coverage and limited leakage. Then, during live network operations, the RF engineers often need to make extra visits to optimize the transmit power for better performance. Clearly, this is a heavy human-in-the-loop model, as the success of the power setting relies on the experience of the seasoned engineers, the result of the RF survey of the engineers’ choice, and the planning software. Not only is this approach expensive, inflexible and error-prone, but it also does not scale with the densification of indoor SBS networks [3].

Adaptive, automated and autonomous network optimization is the key principle of the self-organizing networks (SON) paradigm [4], which aims at achieving the optimal network configuration while minimizing the planned human involvement in the deployment, configuration, optimization and maintenance. Self-optimizing the SBS transmit power falls into the framework of SON, and several solutions have already been proposed. Small Cell Forum has defined a common network monitor mode [5], allowing each SBS to periodically measure its surrounding RF environment and adjust its transmit power. This solution relies on an assumed coverage range based on categorization, and the RF measurements are only taken at the SBS location but not over the entire coverage area, which is coarse and may cause RF mismatch [6]. To solve these issues, Supervised Mobile Assisted Range Tuning (SMART) was proposed in [7], which relies on the RF feedback of a technician walking along the sampling routes. The required RF feedback is extensive, including the majority of the LTE lower-layer quantities such as RSRP, RSSI, and CQI. These quantities along the measurement routes provide important RF information of the deployment, and a global optimization can be formulated to derive the transmit power that satisfies both coverage and leakage constraints. Unfortunately, this problem is non-convex and the optimal transmit power is difficult to compute [7]. In [8], the authors developed a self-organizing policy for distributed femtocell networks, aiming at minimizing the cell transmit power while satisfying the service requirement. In [9], a heuristic solution was proposed to reliably determine the coverage for the current power level before either increasing or decreasing the power based on user feedback. Solutions from both [8] and [9] have some adaptability but still lack good accuracy when used in different environments. The authors of [10] modeled SBS power management as a Markov Decision Process problem, focusing on the power control in a time-varying network. Similarly, a downlink transmit power control solution for interference mitigation via reinforcement learning was proposed in [11]. The main objective of [10] and [11], however, is to adjust the transmit power in reaction to changing circumstances for better quality of service, which makes it more of a power control problem that has to be solved at a fast time scale.

We focus on setting the SBS transmit power of an enterprise network in an unknown deployment environment. We limit our attention to SBS networks with closed access mode, which is commonly adopted in the enterprise deployment due to security and management considerations. An adequate power assignment is particularly crucial for the closed access mode, as the transmit power needs to be large enough to provide sufficient coverage for the inside users while small enough to not create significant interference to the outside non-enterprise co-channel users, who cannot be served by the enterprise network. This work proposes to capture this delicate balance between coverage and leakage by a system performance indication function (PIF). If the deployment is known, the optimal power assignment can be obtained by maximizing the PIF.

However, a practical solution needs to be effective in an arbitrary, unknown environment, and should require minimal human involvement and feedback. Naturally, a good solution must complement the aforementioned optimization problem with an online learning approach to remove the uncertainty of the environment, which is a key challenge for efficient transmit power assignment. The SBSs have to balance the immediate gains (selecting a power level that performs best so far) and long-term performance (evaluating other power levels). We thus resort to the theory of multi-armed bandit (MAB) [12] to address the resulting exploration and exploitation tradeoff. However, as opposed to directly applying classical MAB algorithms such as UCB [13], our problem has two unique characteristics that classical algorithms do not exploit. First, SBS transmit power assignment falls into the self-optimization category of SON. Generally, a self-configuration phase has already taken place before invoking the transmit power assignment algorithm. As a result, there would be some prior knowledge of the system that can be utilized. Second, performances of similar power levels are often very similar, which means that if we adopt the MAB model, nearby arms are highly correlated. Intuitively, such correlation can be used to accelerate the convergence to the optimal selection, because any sampling of a power level not only reveals information about itself, but also about nearby power levels that are highly correlated. Such information was not available in classical UCB solutions [12, 13] (see Footnote 1), and has not been utilized in SON [7, 8, 10, 11].

Footnote 1: The authors of [14] studied the continuum-armed bandit with an infinite continuum of strategies, which also captures the dependency among arms. We opt out of this approach because in the multi-SBS cases, the continuity of the reward functions may not be guaranteed. The discrete arm setting makes the solutions more effective and flexible for practical adoption.

In this paper, we leverage these engineering characteristics of the problem, and develop bandit-inspired transmit power assignment algorithms. In the bandit literature, similar models have been studied in [15, 16] and the corresponding bandit algorithms have been proposed. The authors of [15] proposed bandit algorithms with a Bayesian prior on the mean reward that is based on a human decision-making model. The work in [16] further extended the algorithm to focus on the correlation among arms. In our work, we first adopt a Bayesian [17] learning algorithm that incorporates the prior knowledge of the system from the self-configuration phase. The developed Bayesian Power Assignment (BPA) algorithm iteratively updates the posterior distribution based on new observations and the prior distribution, and uses the updated posterior distribution to compute the utility function and determine the transmit power level. In addition to utilizing the prior knowledge, we further leverage the correlation structure of the PIF of similar transmit power levels, and a Correlated Bayesian Power Assignment (CBPA) algorithm that combines the Bayesian principle with the correlation property is employed. To the authors’ best knowledge, this is the first work that incorporates bandit with correlated arms into the design of wireless networks. Furthermore, practical deployments often want to avoid frequent power changes, because they may cause frequent variation of the coverage area and result in uneven user experience. To address this issue, we present a block allocation extension to the proposed BPA and CBPA algorithms which explicitly considers switching cost to discourage frequent changes of power levels. Rigorous analysis of the performance loss with respect to the genie-aided global optimization solution is carried out. A tight upper bound of the performance loss for the most general algorithm (CBPA with switching cost) is derived, and performance characterization of other algorithms can be obtained as special cases. In order to reduce the algorithms’ complexity, which increases exponentially with the number of SBSs, we further introduce clustering based on the prior knowledge, so that the complexity can be drastically reduced without sacrificing much of the accuracy and effectiveness of the algorithms. The performances of all the proposed algorithms are verified by extensive system-level simulations and compared with both the globally optimal power assignment with complete information and the existing state-of-the-art solutions. Not only do the proposed algorithms outperform existing solutions and converge to the globally optimal power assignment quickly, but they also reduce the planned human involvement significantly and require only a minimal amount of user feedback (one bit per location), as opposed to the full-blown RF measurement and feedback that is universally required in the existing solutions.

The rest of the paper is organized as follows. The system model and problem formulation can be found in Section II. Sections III and IV present the proposed power assignment algorithms without and with switching cost, respectively. Performance analysis for all the algorithms is given in Section V. Complexity issues of the multi-SBS deployment are addressed in Section VI. Simulation results are presented in Section VII. Finally, Section VIII concludes the paper.

II System Model and Problem Formulation

II-A Network Model

Both single-SBS and multi-SBS deployments are considered. Note that the former is suitable for modeling single-office enterprises, residential apartments and other small deployments, while the latter mainly applies to large enterprises, for which multiple SBSs are installed to jointly cover the indoor users. The set of SBSs is indexed as $\mathcal{K}_{SBS}=\{1,2,..,K\}$. Each SBS has a set of candidate pilot power levels (see Footnote 2), denoted as $\mathcal{P}=\{p_{1},p_{2},..,p_{n}\}$. As our focus is on the SBSs with closed access and co-channel with the macro base stations (MBS), we simply assume that the users at the measurement points inside the enterprise building are served by the SBS network, while users at points outside can only be served by one of the MBSs from $\mathcal{K}_{MBS}=\{1,2,..,K_{M}\}$, as Fig. 1 illustrates.

Footnote 2: As the purpose of the long-term power assignment is to determine the appropriate coverage that fits the deployment, we focus on setting the pilot power instead of the power of data and control channels [2].

The measurement data come from the customer UE feedback from some inside and outside routes during normal network operations. This is different from the RF survey approach that is carried out during network planning. The detailed mechanism and procedure of obtaining such customer UE feedback are mostly the same as in [9]. However, as opposed to a complete RF feedback required in [9], we only require one-bit coverage indication for each inside report. The extended set of RF measurements, such as RSRP, RSSI, and CQI, are not needed in our power assignment algorithm. For non-enterprise UEs, as we only need to know whether the UE is covered at a reporting location, we will rely on the registration attempt at the outside location to determine such events. Note that this is a common approach to determine leakage and has been adopted in [18, 7, 19].

In this work, our model and procedure on power assignment follow the common industry SON operations [3]. Specifically, the power assignment policy is executed during the self-optimization phase of SON, at the central network controller which is configured to oversee the operation of the entire SBS network. This is a common choice for enterprise cellular networks, as they often have security and privacy constraints which are easier to be satisfied in a centralized architecture. Furthermore, the power assignment algorithm operates in a periodic fashion, which is typical for self-optimization of SON [18]. For each time slot, the SBS first sets the pilot power based on the assignment algorithm. Then the network operates and collects UE feedback from both inside and outside of the intended coverage area. At the end of the current period, a performance measure is computed to evaluate the current pilot power and then used in the assignment algorithm to compute the power level for the next slot. This sequence of operations is illustrated in Fig. 2. Lastly, industry SON operations typically have the self-optimization operations follow a self-configuration phase, during which a coarse measurement and power calibration are performed [7]. As we will see later, the initial self-configuration, albeit coarse and sometimes inaccurate, offers useful prior knowledge that can be leveraged in the power assignment algorithm.

II-B Problem Formulation

To formulate the power assignment problem, we first need to define the criteria for coverage and leakage. To that end, let us denote the set of measurement points on the inside and outside routes as $N_{in}=\{1,2,..,n_{in}\}$ and $N_{out}=\{1,2,..,n_{out}\}$ , respectively. The coverage and leakage criteria for a measurement point can be formally defined as:

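$$\max_{k_{S}\in\mathcal{K}_{SBS}}{\sf{SINR}}_{k_{S},n}>{\sf{SINR}}_{\text{th}},\quad\text{for }n\in N_{in},\tag{1}$$

$$\max_{k_{M}\in\mathcal{K}_{MBS}}{\sf{SINR}}_{k_{M},n}<{\sf{SINR}}_{\text{th}},\quad\text{for }n\in N_{out},\tag{2}$$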

where ${{\sf{SINR}}_{k_{S},n}}$ and ${{\sf{SINR}}_{k_{M},n}}$ represent the SINR at inside measurement point $n$ with respect to SBS $k_{S}$ and the SINR at outside measurement point $n$ served by MBS $k_{M}$, respectively. They can be calculated as:

[TABLE]

where $P^{r}_{i,n}$ and $P^{Mr}_{j,n}$ represent the received power at point $n$ from SBS $i$ and MBS $j$ , respectively, $N_{s}$ denotes the uncontrolled noise and interference, and ${{\sf{SINR}}_{\text{th}}}$ is the SINR threshold.
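As a minimal sketch of how criteria (1) and (2) might be checked at a single measurement point, the Python snippet below assumes that the per-station SINR values have already been computed elsewhere. The function names and the use of dB-valued inputs are illustrative assumptions, not part of the paper.

```python
from typing import Sequence


def is_covered(sinr_from_sbs_db: Sequence[float], sinr_th_db: float) -> bool:
    """Criterion (1): an inside point is covered if its best SBS SINR exceeds the threshold."""
    return max(sinr_from_sbs_db) > sinr_th_db


def satisfies_leakage_condition(sinr_from_mbs_db: Sequence[float], sinr_th_db: float) -> bool:
    """Criterion (2): an outside point meets the leakage condition if its best MBS SINR is below the threshold."""
    return max(sinr_from_mbs_db) < sinr_th_db


# Hypothetical values: two SBSs seen from an inside point, two MBSs from an outside point.
print(is_covered([-3.0, 4.5], sinr_th_db=0.0))
print(satisfies_leakage_condition([-2.0, -5.0], sinr_th_db=0.0))
```

Both checks take the maximum over stations first, so a point only needs one sufficiently strong serving station to count as covered.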

With the definition at each measurement point, the overall system coverage and leakage are defined as the percentage of measurement points which satisfy the coverage condition (1) and leakage condition (2), respectively. If we denote the number of measurement points that satisfy the corresponding conditions as $n_{cov}$ and $n_{lea}$ , then the coverage percentage and leakage percentage can be computed as $\eta_{in}=n_{cov}/n_{in}\times 100\%$ and $\eta_{out}=n_{lea}/n_{out}\times 100\%$ . Note that a larger pilot transmit power may simultaneously increase the indoor coverage percentage and the outdoor leakage percentage. Hence, the system performance indication function (PIF) associated with each candidate power level must balance coverage and leakage. In this work, we adopt a simple linear PIF as

$$r=\alpha\eta_{in}-(1-\alpha)\eta_{out},\tag{3}$$

where $\alpha\in[0,1]$ is a control parameter and can be tuned to weigh differently between coverage and leakage. Note that PIF (3) is chosen as an example to illustrate the proposed power assignment algorithms. Other meaningful PIFs that capture the tradeoff between coverage and leakage can be used in place of (3). The objective of a power assignment algorithm is to find the optimal solution $p^{*}\in\mathcal{P}$ that maximizes the PIF (3).
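To make the linear PIF concrete, the following sketch computes it from one-bit reports per measurement point. It is only an illustration: the function name is hypothetical, the percentages are expressed as fractions in [0, 1] rather than multiplied by 100%, and the default weight of 0.7 is taken from the simulation parameter table.

```python
from typing import Sequence


def pif(inside_ok: Sequence[bool], outside_leak: Sequence[bool], alpha: float = 0.7) -> float:
    """Linear PIF: alpha * eta_in - (1 - alpha) * eta_out, with both ratios as fractions."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    eta_in = sum(inside_ok) / len(inside_ok)        # share of inside points meeting (1)
    eta_out = sum(outside_leak) / len(outside_leak)  # share of outside points meeting (2)
    return alpha * eta_in - (1.0 - alpha) * eta_out


# Hypothetical reports: 9 of 10 inside points covered, 1 of 10 outside points leaked.
inside = [True] * 9 + [False]
outside = [True] + [False] * 9
print(pif(inside, outside))
```

With these inputs the coverage term contributes 0.7 × 0.9 and the leakage term subtracts 0.3 × 0.1, so raising the power only helps while the coverage gain outweighs the added leakage.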

Strictly speaking, the function $r$ in (3) is a random variable for a given pilot power level. This is due to the random channel effect such as shadowing, fast fading and other disturbance in the deployment environment. We focus on a probabilistic model with Gaussian random fluctuation around the mean. As we will see in Sec. VII, Gaussian distribution indeed is a very good approximation for the actual PIF. Furthermore, we evaluate the proposed algorithms in settings with non-Gaussian PIF distributions, and the empirical results suggest that algorithms developed based on the Gaussian assumption are very effective.

III Power Assignment Algorithms based on Bayesian Bandit Learning

III-A Stochastic Bandit Model

The necessity of balancing the short-term performance and long-term learning has motivated us to take a stochastic multi-armed bandit approach to the power assignment problem. Specifically, we model the set of candidate pilot power values $\mathcal{P}=\{p_{1},p_{2},..,p_{n}\}$ as $n$ arms, denoted by $\mathcal{N}_{pow}=\{1,2,..,n\}$ . At the beginning of each time slot $t=1,2,..,T$ , a power value $p_{a(t)}\in\mathcal{P},a(t)\in\mathcal{N}_{pow}$ is selected. At the end of the time slot $t$ , the SBS observes a performance feedback $r_{a(t)}(t)$ based on UE measurement reports, which corresponds to $reward$ in the bandit theory. As discussed in Sec. II, we model the random PIF associated with each power value as a Gaussian random variable. The objective is to develop an efficient power assignment solution to maximize the cumulative PIF for any given time horizon $T$ . For the multi-SBS case, each arm corresponds to a set of power levels of all $K$ SBSs, and other definitions remain the same.
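For the multi-SBS case, the arm set can be illustrated by enumerating every joint power setting. The sketch below assumes a 5 dB grid over the [-10, 20] dBm range from the simulation parameters; the step size and the helper name are assumptions made for illustration.

```python
from itertools import product

# Hypothetical grid: the [-10, 20] dBm range with an assumed 5 dB step.
power_levels_dbm = list(range(-10, 21, 5))


def joint_arms(levels, num_sbs):
    """Enumerate every joint power setting; each tuple is one arm of the bandit."""
    return list(product(levels, repeat=num_sbs))


for k in (1, 2, 4):
    print(k, len(joint_arms(power_levels_dbm, k)))
```

The arm count is the number of levels raised to the power $K$, which is why the complexity grows exponentially with the number of SBSs.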

In multi-armed bandit theory, a quantity termed as expected cumulative regret [12] is often used to characterize the algorithm performance, which represents the cumulative difference between the reward of the arms chosen and the maximum expected reward, which is attainable by a “genie” who knows the expected reward of all arms. We comment that minimizing the expected cumulative regret is equivalent to maximizing the expected accumulated reward, which is the objective of the power assignment problem. This is because the maximum expected reward is independent of the adopted learning algorithm and the regret is equivalent to the performance loss of any power assignment problem due to learning.

Formally, we denote

$$G_{T}=\sum_{t=1}^{T}r_{a(t)}(t)\tag{4}$$

as the cumulative PIF up to a given time horizon $T>0$ , and we define the cumulative PIF loss due to learning as

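$$R_{T}=G_{T}^{*}-G_{T}=\max_{i=1,..,n}\left(\sum_{t=1}^{T}r_{i}(t)\right)-\sum_{t=1}^{T}r_{a(t)}(t),\tag{5}$$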

which corresponds to the definition of cumulative regret. Here the optimal power level can be obtained by a genie-aided solution, e.g. a global optimization of the expected PIF with complete RF information from the technician survey. We are interested in finding efficient algorithms that maximize the cumulative PIF (4). Equivalently, the goal is to minimize the PIF loss of the system (5) for any given time horizon $T$ . The expected PIF loss can be written as:

[TABLE]

where $\mu^{*}=\max\limits_{i=1,..,n}\mu_{i}$ is the true mean PIF of the optimal power level and $\Delta_{i}=\mu^{*}-\mu_{i}$ measures the mean PIF gap between the chosen power level and the optimum. $N_{i}(T)$ represents the number of times power level $p_{i}$ is selected. According to the ground-breaking work of Lai and Robbins [20], if the expected loss $\mathbb{E}[R_{T}]$ of our proposed algorithms can be upper bounded by $\mathcal{O}(\log T)$ (see Footnote 3), an asymptotically optimal performance is achieved in the sense that the convergence rate is of the same order as the optimum.

Footnote 3: $\log(\cdot)$ represents the natural logarithm if the base is not specified.
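The cumulative PIF (4) and the PIF loss (5) can be computed offline from a reward log, as in the sketch below. It assumes a hypothetical table `rewards[t][i]` holding the reward arm `i` would have produced in slot `t`, which is available in simulation but not in a live network.

```python
from typing import Sequence


def cumulative_pif(rewards: Sequence[Sequence[float]], choices: Sequence[int]) -> float:
    """Cumulative PIF (4): sum of the rewards of the arms actually chosen."""
    return sum(rewards[t][a] for t, a in enumerate(choices))


def pif_loss(rewards: Sequence[Sequence[float]], choices: Sequence[int]) -> float:
    """PIF loss (5): best single arm in hindsight minus the realized cumulative PIF."""
    n_arms = len(rewards[0])
    best = max(sum(row[i] for row in rewards) for i in range(n_arms))
    return best - cumulative_pif(rewards, choices)


# Hypothetical log with two arms and three slots.
rewards = [[0.2, 0.6], [0.3, 0.5], [0.1, 0.7]]
choices = [0, 1, 1]
print(cumulative_pif(rewards, choices), pif_loss(rewards, choices))
```

In this toy log the only loss comes from the first slot, where the weaker arm was explored.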

III-B Bayesian Power Assignment Algorithm

The first algorithm utilizes the prior knowledge of the PIF estimation before the algorithm is invoked. In practice, the most common form for the prior knowledge comes from the self-configuration phase of SON, which is performed during network initialization. This phase can provide us with some prior estimation of the PIFs as it typically tries different power levels before settling on one. However, all practical SON solutions have certain requirements on the elapsed time of the self-configuration operations. This is because self-configuration affects the boot-up time, and thus must be carefully controlled. As a result, massive measurement during self-configuration is typically out of the question and we often encounter coarse initial setup. Another possibility is that as the proposed power assignment algorithm is recursive over time, it also progressively collects PIF estimations for each selected power level. This can be used iteratively to update the prior knowledge. The quality of the prior depends on the detailed process in self-configuration phase, e.g. the time duration, mechanisms for large power settings, which is uncontrollable and out of scope of this paper. However, it is worth noting that the proposed algorithms also work with inaccurate prior or even without any prior knowledge, at the expense of slower convergence.

We first consider the power assignment algorithm without considering the correlation between power levels. We adopt the well-known Bayesian principle [17] that integrates the prior distribution and quantiles of the posterior distribution. The proposed Bayesian Power Assignment (BPA) algorithm, which adopts the deterministic upper credible limit (UCL) principle in [15], is given in Algorithm 1. In this algorithm, $\{\mu_{i}^{0},\sigma_{0}^{2}\}$ denotes the prior knowledge of the Gaussian distribution for PIF. The utility function defined in step 2 is composed of an estimated performance term and a measure of uncertainty, which reflects the tradeoff between exploration and exploitation. More specifically, $\Phi^{-1}:(0,1)\rightarrow\mathbb{R}$ is the inverse cumulative distribution function (CDF) for a standard Gaussian random variable. We use the quantile function to indicate: $\mathbb{P}(\mu_{i}\leq Q_{i}^{\text{BPA}}(t))=1-1/(\sqrt{2\pi e}t^{2})$ . Asymptotically, the true mean PIF $\mu_{i}$ is more likely to be less than the estimation $Q_{i}^{\text{BPA}}$ , which leads to the convergence to the optimal power level.
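Since Algorithm 1 is not reproduced here, the following is only a rough sketch of an upper-credible-limit loop in the spirit of BPA, not the authors' exact procedure. It assumes an independent Gaussian prior per power level, Gaussian reward noise with known variance, and the standard conjugate posterior update; `prior_var`, `noise_var`, and the simulated reward source are hypothetical names and choices.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def ucl_quantile(t: int) -> float:
    """Phi^{-1}(1 - 1/(sqrt(2*pi*e) * t^2)), the credible-limit multiplier at slot t."""
    return STD_NORMAL.inv_cdf(1.0 - 1.0 / (math.sqrt(2.0 * math.pi * math.e) * t * t))


def run_bpa_sketch(true_means, prior_means, prior_var, noise_var, horizon, seed=0):
    """Upper-credible-limit loop with independent Gaussian posteriors (assumed model)."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    reward_sums = [0.0] * n
    for t in range(1, horizon + 1):
        z = ucl_quantile(t)
        utilities = []
        for i in range(n):
            # Conjugate Gaussian update: precisions add, means are precision-weighted.
            precision = 1.0 / prior_var + counts[i] / noise_var
            post_mean = (prior_means[i] / prior_var + reward_sums[i] / noise_var) / precision
            utilities.append(post_mean + z / math.sqrt(precision))
        arm = max(range(n), key=utilities.__getitem__)
        # Simulated PIF feedback; a real system would use UE reports instead.
        reward = rng.gauss(true_means[arm], math.sqrt(noise_var))
        counts[arm] += 1
        reward_sums[arm] += reward
    return counts


counts = run_bpa_sketch([0.2, 0.5, 0.8], [0.5, 0.5, 0.5], prior_var=1.0,
                        noise_var=0.0025, horizon=300)
print(counts)
```

The utility has the two parts described above: the posterior mean is the estimated performance, and the posterior standard deviation scaled by the quantile is the uncertainty bonus that shrinks as a power level is sampled more often.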

If the prior knowledge is not available, the BPA algorithm can be slightly modified to address this issue. Specifically, the estimated PIF term and uncertainty measurement have to be updated simultaneously in each time slot. This philosophy leads to the following utility function:

[TABLE]

The Uninformative Power Assignment (UiPA) algorithm thus can be obtained by replacing the utility function in step 2 of Algorithm 1 with (7), while removing the prior input at the beginning and estimation state update in step 6.

III-C Correlated Bayesian Power Assignment Algorithm

In the BPA algorithm, $\{\mu_{i}^{0},\sigma_{0}^{2}\}$ is used as our prior knowledge of the performance of each power level. If the PIFs of different arms are independent, then utilizing individual Gaussian distributions is sufficient in our framework. However, for the considered power assignment problem, the PIFs of similar transmit power levels are generally correlated due to the slowly and continuously varying nature of RF propagation. In other words, a stronger PIF correlation exists between adjacent power levels than between distant pairs, and leveraging the full covariance matrix of the joint distribution may provide a significant performance boost compared to the BPA algorithm. Intuitively, if a transmit power level results in a bad PIF with respect to the balance of coverage and leakage, then an intelligent algorithm need not waste much exploration on its immediate neighboring power levels, as they are highly likely to be bad as well.

We formally present the Correlated Bayesian Power Assignment (CBPA) algorithm in Algorithm 2. Let $\mathcal{N}(\bm{\mu}_{0},\Sigma_{0})$ be the correlated prior, where $\Sigma_{0}$ is a positive definite matrix. We define $\{\bm{\phi}_{t}\in\mathbb{R}^{n}\}_{t\in\{1,..,T\}}$ as the indicator vector that reveals the currently selected power value $p_{a(t)}$ , i.e.,

[TABLE]

where $({\bm{\phi}}_{t})_{k}$ represents the $k$ -th entry of $\bm{\phi}_{t}$ . The estimation of the mean PIFs and correlation structure of the PIF ( ${\bm{\mu}_{t},\Sigma_{t}}$ ) is updated following the Bayesian principle [16]:

[TABLE]

where $r_{t}$ is the PIF observed at time slot $t$ . To derive a general expression of the estimation, we introduce a diagonal matrix $P(t)$ with entries $\sigma_{0}^{2}/N_{i}(t),i\in\mathcal{N}_{pow}$ , and $\bar{\mathbf{r}}_{t}$ is the vector of $\bar{r}_{i}(t),i\in\mathcal{N}_{pow}$ . We first rewrite the expression of $\hat{\Lambda}_{t}$ as:

[TABLE]

Then, $\hat{\bm{\mu}}_{t}$ can be derived based on (8):

[TABLE]

Finally, combining equations (8) and (9), the estimation at time slot $t$ can be written as:

[TABLE]

which is used in Algorithm 2.
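To illustrate how one observation propagates to correlated power levels, the following Python sketch applies a single Gaussian posterior update in rank-one (Sherman-Morrison) form. It assumes the standard conjugate update for a Gaussian prior with an indicator observation vector, which is algebraically equivalent to updating the precision matrix; the function name and the `noise_var` argument are hypothetical, and the closed-form expression used in Algorithm 2 remains the reference.

```python
def correlated_update(mean, cov, arm, reward, noise_var):
    """One posterior update after observing `reward` on power level `arm`.

    mean: list of n posterior means; cov: n x n symmetric covariance
    (list of lists). Returns the updated (mean, cov) without mutating inputs.
    """
    n = len(mean)
    # Covariance between the selected level and every other level.
    col = [cov[k][arm] for k in range(n)]
    gain_den = noise_var + cov[arm][arm]
    innovation = reward - mean[arm]
    # Every level moves in proportion to its covariance with `arm`.
    new_mean = [mean[k] + col[k] * innovation / gain_den for k in range(n)]
    new_cov = [[cov[j][k] - col[j] * col[k] / gain_den for k in range(n)]
               for j in range(n)]
    return new_mean, new_cov
```

With a prior correlation of 0.5 between two levels, observing a reward on the first level also shifts the mean and reduces the variance of the second, which is the mechanism that lets CBPA skip exploration of neighbors of a bad power level.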

IV Power Assignment with Switching Cost

IV-A Problem Formulation with Switching Cost

In practice, it is critical for any cellular deployment to avoid frequent power changes. In a cellular network, coverage variation due to the change of transmit power often results in poor user experience (call drop, low data rate, frequent handover, etc.), which in turn degrades the network performance significantly. To address this problem, we explicitly add a switching cost when the power level changes. In this way, a good power assignment policy will determine the optimal power value while avoiding frequent switches. We adopt a general switching loss function $s_{ij}=f(|p_{i}-p_{j}|)$ , which is a bounded non-decreasing function of the difference between the two power values with $f(0)=0$ . $s_{ij}$ is incurred whenever the SBS changes its pilot power value between $p_{j}$ and $p_{i}$ . The cumulative switching cost up to $T$ can be written as:

[TABLE]

Thus the cumulative PIF in this problem can be expressed as:

[TABLE]

In a multi-SBS deployment, the switching cost is defined as the sum of individual switching costs of all SBSs.
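As a small worked example of the switching cost, the Python sketch below accumulates the loss over a sequence of selected power values and sums it over all SBSs. It assumes the linear loss $s_{ij}=\gamma|p_{i}-p_{j}|$ with $\gamma=0.2$ that is used later in Sec. VII-C; the function names are hypothetical.

```python
def linear_switching_loss(p_from, p_to, gamma=0.2):
    """Linear instance of f(|p_i - p_j|); gamma is a tunable weight."""
    return gamma * abs(p_to - p_from)

def cumulative_switching_cost(power_sequences, loss=linear_switching_loss):
    """Total switching cost of a deployment.

    power_sequences: one list of per-slot power values (dBm) per SBS.
    """
    total = 0.0
    for seq in power_sequences:
        for prev, cur in zip(seq, seq[1:]):
            # f(0) = 0, so slots without a power change add nothing.
            total += loss(prev, cur)
    return total
```

For instance, a single SBS following the sequence 0, 0, 2, 2, -1 dBm switches twice (by 2 dB and by 3 dB), so its cumulative switching cost under this loss is approximately 1.0.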

IV-B The Power Assignment Algorithm with Switching Cost

We extend the preceding algorithms to a block allocation scheme to address switching costs. Block allocation schemes, such as the one in [21], determine specific intervals of time over which the selection is kept fixed. A power value is selected at the beginning of each interval. The construction of the intervals should ensure that the expected number of switches scales at most logarithmically in time to guarantee good performance. This idea is graphically presented in Fig. 3. We first divide time into frames whose last time slot is denoted as $L_{f},f\in\{1,2,..,l\},l=\lceil\sqrt{\log_{2}T}\rceil$ . Each frame is then subdivided into $b_{f}=\lceil(2^{f^{2}}-2^{(f-1)^{2}})/f\rceil$ blocks, each of which contains $f$ time slots. Each block is identified by $(f,k),f\in\{1,2,..,l\},k\in\{1,2,..,b_{f}\}$ , with $f$ and $k$ representing the frame number and the block number within the frame, respectively. The beginning time slot of block $k$ in the $f$ -th frame is denoted as $\tau_{fk}$ . Note that the key element in selecting the block length is to only incur ${o}(\log T)$ switching cost. In this way, the $\mathcal{O}(\log T)$ regret of the standard algorithm still dominates the total regret.

The Power Assignment with Switching Cost algorithm is formally presented in Algorithm 3. The Uninformative (UiPA-SC), Bayesian (BPA-SC) and Correlated Bayesian Power Assignment with Switching Cost (CBPA-SC) algorithms can be similarly obtained, by replacing $Q_{i}$ with $Q^{\text{UiPA}}_{i}$ , $Q^{\text{BPA}}_{i}$ and $Q^{\text{CBPA}}_{i}$ respectively. Note that in BPA-SC, the prior estimation state $\Sigma_{0}$ becomes a diagonal matrix with entries $\sigma_{0}^{2}$ , while there is no prior input in UiPA-SC. At the beginning of each block, a power value is selected and the SBS locks on this power value in each of the next $f$ time slots in the block. The estimation update in step 9 also follows step 6 in Algorithm 1 and step 6 in Algorithm 2.

Algorithm 3 rests on two key ideas. The first is that since the switching cost results in a penalty in performance, the algorithm needs to “explore in bulk”. This is done by grouping time slots and not switching within these slots. The second is that as time goes by, the algorithm has more information about the optimal power value, and hence the block size should increase to take advantage of the better knowledge.
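The frame and block structure can be made concrete with the following Python sketch, which lists the starting slot $\tau_{fk}=L_{f-1}+1+(k-1)f$ of every block up to a horizon $T$. The initial value $L_{0}=1$ is an assumption that is not stated in the text; it is chosen here because it makes the frame ends satisfy $2^{f^{2}}\leqslant L_{f}\leqslant 2^{f^{2}}+f^{2}$ as used in Appendix A. The function name is hypothetical.

```python
import math

def block_schedule(horizon, l0=1):
    """Return (f, k, start_slot) for every block starting at or before `horizon`.

    l0 is the assumed end slot of the (empty) frame 0.
    """
    n_frames = math.ceil(math.sqrt(math.log2(horizon)))
    schedule = []
    frame_end = l0
    for f in range(1, n_frames + 1):
        # b_f = ceil((2^(f^2) - 2^((f-1)^2)) / f), computed with integers.
        n_blocks = -(-(2 ** (f * f) - 2 ** ((f - 1) ** 2)) // f)
        for k in range(1, n_blocks + 1):
            start = frame_end + 1 + (k - 1) * f
            if start > horizon:
                return schedule
            schedule.append((f, k, start))
        frame_end += f * n_blocks  # last slot L_f of frame f
    return schedule
```

Under this assumption the first three frames end at slots 2, 16 and 514, and a power value can only change at a block boundary, so blocks in later frames hold each selection for more slots.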

V Performance Analysis of the Proposed Algorithms

So far, we have presented two sets of power assignment algorithms (without and with switching cost), each of which further consists of components that have different assumptions on the prior knowledge and the correlation structure. In this section, we will provide a unified performance analysis framework that can be applied to all of the developed algorithms. We focus on the finite-time analysis where, for a given stopping time $T$ , the cumulative PIF loss and the convergence speed will be characterized. In this way, we can shed important light on the fundamental differences of these algorithms, and how these differences impact their performances.

We start with the expected cumulative PIF loss defined in Sec. III-A. For BPA and CBPA, the expected PIF loss can be written as (6). When the switching cost is considered, equations (5) and (6) should be rewritten as:

[TABLE]

and

[TABLE]

respectively.

V-A Upper Bound Analysis

In order to derive the unified framework that applies to all the algorithms, we focus on analyzing CBPA-SC as it is the most general algorithm consisting of all the key components. As we have discussed, the expected cumulative PIF loss should grow sub-linearly with $T$ in order to achieve the optimal performance, which indicates that $\lim_{T\rightarrow\infty}R_{T}/T=0$ . We have the following theorem to bound the expected cumulative PIF loss of CBPA-SC.

Theorem 1.

The expected cumulative PIF loss $\mathbb{E}[R^{SC}_{T}]$ of CBPA-SC is bounded above as:

[TABLE]

where

[TABLE]

$\delta_{i}^{2}=\sigma_{0}^{2}/\sigma_{i-cond}^{2}$ , and $\sigma_{i-cond}^{2}=\sigma_{0}^{2}-\bm{\sigma}_{i}(0)\Sigma_{\sim i}^{-1}(0)\bm{\sigma}_{i}^{T}(0)$ , where $\Sigma_{\sim i}$ is the submatrix of $\Sigma_{0}$ that excludes the $i$ -th column and $i$ -th row. $M_{i}=\sigma_{0}^{2}\sqrt{1+\delta_{i}^{2}}\sum_{j=1}^{n}\sum_{k=1}^{n}|\lambda_{kj}^{0}||\mu_{j}^{0}-\mu_{j}|$ measures the accuracy of the prior knowledge, where $\lambda^{0}_{kj}$ is the $(k,j)$ -th entry of $\Lambda_{0}$ . $\tilde{s}_{i}^{max}=\max_{j=1,..,n}\mathbb{E}[s_{ij}]$ is the maximum expected switching loss when the SBS changes its power to $p_{i}$ .

Proof.

See Appendix A. ∎

Theorem 1 provides an $\mathcal{O}(\log T)$ upper bound for CBPA-SC, which guarantees that its time-averaged PIF converges to that of the globally optimal power value at a rate of $\mathcal{O}(\log T/T)$ . Furthermore, this upper bound applies to any finite time $T$ and any general switching loss function $f(|p_{i}-p_{j}|)$ , as long as $f$ is non-decreasing and bounded.

Theorem 1 is a powerful result as it gives an $\mathcal{O}(\log T)$ bound for the most general algorithm CBPA-SC. We can now derive similar results for all the other proposed algorithms. First, when $s_{ij}=0,\forall i,j\in\mathcal{N}_{pow}$ , Theorem 1 can be applied to CBPA. Formally, we have the following corollary.

Corollary 2.

The expected cumulative PIF loss $\mathbb{E}[R_{T}]$ of CBPA is bounded above as:

[TABLE]

where

[TABLE]

Proof.

See Appendix B. ∎

As Corollary 2 shows, the $\mathcal{O}(\log T)$ upper bound of the cumulative PIF loss still holds for the CBPA algorithm. Thus, adding switching cost into the problem does not change the optimal scaling of the cumulative PIF loss. However, the algorithm that deals with the switching cost (CBPA-SC) is considerably more complicated than the one without the switching cost (CBPA).

Next, we note that the difference between BPA and CBPA lies in the correlation structure. We can further remove the correlation component in Corollary 2 to analyze BPA.

Corollary 3.

The expected cumulative PIF loss $\mathbb{E}[R_{T}]$ of BPA is bounded above as:

[TABLE]

where $\Delta m_{i}=\mu_{i}-\mu_{i}^{0}$ measures the accuracy of the prior knowledge of the mean PIF.

Proof.

See Appendix C. ∎

Finally, because the UiPA algorithm does not use any prior knowledge, its utility function $Q_{i}^{UiPA}(t)$ is similar to that of the UCB1-NORMAL algorithm in [13]. Thus, the upper bound of the expected PIF loss can be derived analogously.

Theorem 4.

The expected cumulative PIF loss $\mathbb{E}[R_{T}]$ of UiPA is bounded above as:

[TABLE]

Proof.

See Appendix D. ∎

We can see that even though the constant terms in the upper bounds of CBPA and BPA may be larger than those of UiPA, their coefficients of $\log T$ are much smaller, so their performance turns out to be better. Moreover, if the prior knowledge is accurate in BPA and CBPA, the upper bounds for both become:

[TABLE]

which can be easily derived from the corollaries.

VI Reducing Complexity in Multi-SBS

A practical problem in a multi-SBS deployment may arise due to the “curse of dimensionality”. As the set of arms consists of the combinations of different power levels at all SBSs, it leads to $n^{K}$ arms and incurs exponential time and space complexity for the proposed algorithms. Moreover, the number $n$ of available power levels for each SBS can be large. Note that in the CBPA and CBPA-SC algorithms, we need matrix calculations when updating the estimated state, which calls for $\mathcal{O}(n^{3K})$ time complexity and $\mathcal{O}(n^{2K})$ space complexity [22]. This severely limits the applicability of the proposed algorithms in large enterprise networks.

To reduce the complexity, we first explore a practical constraint that has not been utilized in the proposed algorithms. In real-world deployment, the neighboring SBSs are generally not allowed to have vastly different pilot power levels. This is because otherwise they may result in significantly different coverage areas and therefore lead to very uneven load distributions. Thus, utilizing this practical constraint, we only need to consider the combinations of power levels in which neighboring SBS power levels are different by no more than a certain threshold $P_{th}$ .
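The effect of the threshold can be checked by enumerating the feasible joint settings. The Python sketch below uses the power set $\{-10,-5,..,15,20\}$ dBm and $P_{th}=5$ dB from Sec. VII-C. It assumes that the SBSs are neighbors in a chain (SBS $k$ is adjacent to SBS $k+1$); this topology is not stated in the text and is adopted only because it reproduces the arm counts reported in Sec. VII-C. The function name is hypothetical.

```python
from itertools import product

def feasible_settings(levels, n_sbs, p_th):
    """Joint power settings whose consecutive SBSs differ by at most p_th dB.

    Assumes a chain neighbor relation between SBS k and SBS k+1.
    """
    return [combo for combo in product(levels, repeat=n_sbs)
            if all(abs(a - b) <= p_th for a, b in zip(combo, combo[1:]))]

levels = list(range(-10, 21, 5))               # {-10, -5, ..., 15, 20} dBm
print(len(feasible_settings(levels, 2, 5)))    # 19 of the 49 unconstrained pairs
print(len(feasible_settings(levels, 4, 5)))    # 149 of the 2401 unconstrained settings
```

The constraint removes most combinations, but the number of feasible settings still grows exponentially with the number of SBSs.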

Even with the power difference threshold, the size of the arm set is still exponential in $K$ . To further reduce the complexity, we notice that the performance space of the set of arms exhibits a certain “clustering” effect that can be utilized. For two power settings that differ only slightly (e.g., $\{0,3,5\}$ and $\{0,4,4\}$ dBm for $K=3$ ), the performances may be very similar. Thus, if we carefully group the power settings into a few clusters, and only use the cluster center as the representative power setting, we can achieve a good tradeoff between complexity and performance for the algorithms.

We propose to perform a clustering operation to address the complexity issue. The clustering operation is done after the self-configuration phase to leverage the prior knowledge, but before invoking the power assignment algorithm. We adopt K-medoids clustering [23] because, unlike the well-known K-means clustering, whose centroids are the mean points of all objects in each cluster, K-medoids is based on the most central object of each cluster. Therefore, the medoid of each cluster can be seen as a representative power setting. We note that the choice of the number of clusters $N$ plays a critical role in the overall performance. If $N$ is large, the globally optimal power setting is a medoid with high probability, which improves the accuracy of the power assignment process but also increases the complexity and lowers the efficiency; a small $N$ has the opposite effect.
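A simple alternating K-medoids procedure is sketched below in Python to show why the cluster representatives are always actual power settings. The distance metric (Manhattan distance between power vectors in dBm in the usage example) and the deterministic initialization with the first `n_clusters` points are assumptions made for this illustration; the text does not specify them, and the variant in [23] may differ.

```python
def k_medoids(points, n_clusters, dist, max_iter=100):
    """Alternating K-medoids on a list of distinct points.

    Returns (medoid_points, clusters), where clusters holds point indices
    from the last assignment step.
    """
    medoids = list(range(n_clusters))  # assumed deterministic initialization
    clusters = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = [[] for _ in medoids]
        for idx, p in enumerate(points):
            nearest = min(range(len(medoids)),
                          key=lambda c: dist(p, points[medoids[c]]))
            clusters[nearest].append(idx)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = [
            min(members,
                key=lambda m: sum(dist(points[m], points[o]) for o in members))
            for members in clusters
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return [points[m] for m in medoids], clusters

def manhattan(a, b):
    """Sum of per-SBS power differences (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

Because each medoid is a member of its cluster, it can be used directly as an arm of the power assignment algorithms, whereas a K-means centroid would generally not be a valid power setting.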

We further note that K-medoids clustering has an $\mathcal{O}(n^{K}N)$ time complexity [23], but as clustering is done prior to the self-optimization phase, the process can be handled offline. Thus, time complexity is less of a concern.

VII Simulation Results

VII-A Simulation setup

We resort to numerical simulations to verify the effectiveness of the developed transmit power assignment algorithms. A system-level heterogeneous network simulator is developed considering both indoor SBSs and an outdoor MBS. We consider a large warehouse with $K=1,2,4$ SBSs deployed inside and an MBS outside with a fixed transmit power setting. The measurement points constitute two routes inside and outside, respectively, which we assume to follow a concentric circle or ellipse pattern. 100 measurement points are placed uniformly on each route. At each time slot, the measurement points feed back their own coverage condition, which is determined by the respective SINR and hence by the current SBS power setting. We set the total time horizon to $T=3000$ slots and repeat each simulation setting 50 times to average out the randomness. The size of the warehouse and the SBS locations are given in Table I. Here we set the center of the warehouse as the origin. The PIF $r$ under each power value can be calculated following the procedure in Sec. II.

We obtain the received power from SBS or MBS using the indoor femto channel model of urban deployment from [24] as follows.

• indoor UE to MBS:

[TABLE]

• outdoor UE to MBS:

[TABLE]

• indoor UE to SBS:

[TABLE]

• outdoor UE to SBS:

[TABLE]

Note that (10) and (12) are for indoor routes while (11) and (13) are for outdoor routes; $d$ represents the separation between a BS and the measurement point; $L_{ow}$ is the penetration loss of an outdoor wall, which an indoor user suffers when receiving power from the outdoor MBS and an outdoor user suffers when receiving power from an indoor SBS; $X_{\sigma_{dB}}$ and $X_{\sigma^{\prime}_{dB}}$ stand for shadow fading. Other important simulation parameters are summarized in Table I.

VII-B Evaluation of the PIF Gaussian Distribution

We first study the empirical distribution of the PIF $r$ for the $K=1$ SBS case with the set of power levels $\mathcal{P}=\{-10,-5,..,15,20\}$ dBm. We present the comparison of empirical and Gaussian distributions in two representative scenarios in Fig. 4(a) and 4(b). As we can see, the assumption of Gaussian distributed PIFs matches the empirical distributions well.

To further examine the dependency on the Gaussian assumption, we compare the proposed algorithms with a well-behaved UCB extension that makes no assumptions on the distribution of the rewards, e.g., UCB-V in [25], under non-Gaussian reward distributions. More specifically, two well-adopted distributions in wireless communications, uniform and Rayleigh, are considered. We can see from Fig. 5 that the performances under non-Gaussian distributions are still very good, particularly for BPA and CBPA. This observation indicates that Gaussianity is not a fundamental assumption that must be met to guarantee the effectiveness of the algorithms.

VII-C System Performance

In the simulation setting for $K=1$ , we deploy an outside MBS at $[100m,100m]$ . The set of power levels for the SBS is $\mathcal{P}=\{-10,-8,..,18,20\}$ dBm while other settings follow Table I. The inside and outside routes have the concentric circle pattern, with radii of (2, 13) meters for the two indoor routes and (24, 30) meters for the two outdoor routes. The cumulative loss over time is used to evaluate the performance, and we use the optimal power achieved by the global optimization of the expected PIF with complete RF information as the genie-aided optimum.

We first compare the performance of the UiPA, BPA and CBPA algorithms with priors of different quality. Fig. 6(a) reports the cumulative loss over time for all three algorithms when the prior knowledge is of good quality, i.e., the estimated mean from the empirical distribution is used. Fig. 6(b) shows the same simulation but with poor prior knowledge, which uses a uniform prior distribution with each element $\mu_{0}=50$ . A few important observations can be made from these simulations. First, we see that all three algorithms converge to the optimal power value asymptotically, but at different speeds. To further evaluate the convergence speed, we plot the empirical CDF of the convergence time for all three algorithms in Fig. 7. It becomes clear that leveraging both the prior knowledge and the correlation structure significantly accelerates the convergence of CBPA. In terms of minimizing the total PIF loss, CBPA also outperforms BPA, which in turn performs better than UiPA. Second, degrading the quality of the priors degrades the performance of BPA and CBPA. In particular, the performance of BPA approaches that of UiPA with poor prior knowledge. It is interesting to note that even with a poor prior, CBPA still converges faster than the other algorithms with a good prior. This is because when the prior knowledge is inaccurate, CBPA recovers some of the PIF degradation by leveraging its correlation structure. Lastly, as UiPA does not leverage the prior knowledge, changing its quality does not affect the convergence speed.

Next, we compare the proposed algorithms with the industry solution. The heuristic solution [9] keeps a power value long enough to obtain a near-perfect PIF estimation, and then it either increases or decreases the power value by a fixed step size. Clearly, this method trades off fast convergence for certainty. Fig. 8 reports the numerical comparison with a maximum $20$ dBm and step size $2$ dB. We can see that the industrial solution adapts poorly to different deployments, while our algorithms are stable thanks to online learning.

For $K=2$ and $K=4$ , an outside MBS is deployed at $[100m,100m]$ . The power value difference threshold is $P_{th}=5$ dB. The power value for each SBS is selected from $\mathcal{P}=\{-10,-5,..,15,20\}$ dBm. This results in $n=19$ power settings for $K=2$ without any clustering, which may be acceptable in terms of complexity. The cumulative PIF loss with respect to the optimal power setting is shown in Fig. 9(a). For the $K=4$ case, however, there are $n=149$ power settings. We thus employ the clustering strategy in Sec. VI and study two cases where the number of clusters is either $N=20$ or $N=40$ . The PIF loss normalized by time is shown in Fig. 9(b). We can see that all algorithms exhibit a decaying loss per slot. As for the effect of $N$ , there exists an initial period when a larger cluster number results in worse performance for all the algorithms. This is because during the initial slots, more power settings lead to more exploration and thus sub-optimal power settings are selected more often. As time goes by, the algorithms gain more knowledge about the optimal power setting. Since a larger cluster number means that one of the selected medoids is closer to the globally optimal power setting, a larger $N$ eventually results in better performance. Detailed coverage and leakage results under optimal selections are reported in Table II.

Fig. 10(a) and 10(b) study the impact of the power switching cost for $K=2$ and $K=4$ , respectively. Here we adopt a simple linear switching loss function $s_{ij}=\gamma|p_{i}-p_{j}|$ , where $\gamma$ is a tunable parameter for different scenarios, which we set to 0.2. We can see that the additional performance loss incurred whenever an SBS changes its power value increases the overall performance loss in all algorithms. However, the algorithms still converge to the optimal power settings asymptotically in a sub-linear fashion, matching the regret analysis in Sec. V. In Fig. 10(b), the performances for different cluster numbers also comply with our previous analysis.

VIII Conclusion

We have studied the pilot power assignment problem associated with indoor enterprise closed-access SBS networks, in which the focus is on achieving an optimal balance between providing sufficient coverage for the indoor users and suppressing leakage that causes interference to outdoor MBS users. We modeled power assignment as an online learning problem, and adopted a Bayesian approach that leverages the prior information of the Gaussian distribution. We proposed bandit-inspired power assignment algorithms that utilize different levels of the statistical information. The CBPA algorithm makes use of both the prior knowledge of the mean and variance of each arm and the dependency of PIFs across different power values. In contrast, the BPA algorithm only uses the prior knowledge but not the correlation information, and its performance is worse than CBPA but better than the UiPA algorithm, which uses neither the prior nor the correlation. Furthermore, we explicitly took into account the power switching cost, and enhanced the power assignment algorithms with a block allocation scheme to reduce frequent power switches. A sub-linear upper bound on the performance loss was proved for all the algorithms. Finally, for the multi-SBS deployment, we proposed to use K-medoids clustering to reduce the complexity while maintaining the performance. When the cluster number becomes large, the algorithms can approach the globally optimal power setting for all $K$ SBSs.

As a possible future direction, the spectral bandits method proposed in [26] offers a new perspective to efficiently handle a large number of arms while capturing the correlation structure. This can be an interesting alternative for the enterprise transmit power assignment problem. In particular, a complexity and performance comparison with the algorithms of this paper may shed light on its feasibility.

Appendix A Proof of Theorem 1

We start by proving the case $L_{l-1}<T\leqslant L_{l}$ . Note that

[TABLE]

where $i^{*}=\arg\max_{i=1,..,n}\mu_{i}$ , $\eta_{i}$ is a positive integer, and $\mathcal{I}(x)$ is the indicator function. At any time $t$ , sub-optimal $i$ is selected only when $Q_{i^{*}}^{t}\leqslant Q_{i}^{t}$ , which is true as long as one of the following inequalities holds:

[TABLE]

where $U_{i}(\tau_{fk})=\hat{\sigma}_{i}(\tau_{fk})\sqrt{\sum_{j=1}^{n}{\rho_{ij}^{2}(\tau_{fk})}}\,\Phi^{-1}(1-1/(\sqrt{2\pi e}\tau_{fk}^{2}))$ . Let $\bm{e}$ and $\bar{\Sigma}$ denote the bias and covariance of the estimate $\hat{\bm{\mu}}(t)$ , with $e_{i}$ and $\bar{\sigma}_{i}$ representing the $i$ -th entry of $\bm{e}$ and of the diagonal of $\bar{\Sigma}$ , respectively. We then have $\hat{\bm{\mu}}(t)\sim\mathcal{N}(\bm{e}(t)+\bm{\mu},\bar{\Sigma}(t))$ , with $e_{i}(t)=\sum_{j=1}^{n}\sum_{k=1}^{n}\hat{\sigma}_{ik}(t)\lambda_{kj}^{0}(\mu^{0}_{j}-\mu_{j})$ .

We now separately analyze (16a), (16b), and (16c). First, if $N_{i^{*}}(\tau_{fk})=0$ , then (16a) is false if [16, Lemma 7]

[TABLE]

or equivalently,

[TABLE]

Otherwise, if $N_{i^{*}}(\tau_{fk})\geqslant 1$ , we have

[TABLE]

where $z$ is a standard Gaussian random variable. This indicates that $\sqrt{3\log{\tau_{fk}}}-\frac{M_{i^{*}}}{\sigma_{0}}\geqslant 0$ . Thus we have $\tau_{fk}>e^{M_{i^{*}}^{2}/(3\sigma_{0}^{2})}=\tau_{1}$ . For $\tau_{fk}>\tau_{1}$ , we have

[TABLE]

Inequality (19) is deduced using [16, Lemma 2].

Similarly, we can deduce that if $N_{i}(\tau_{fk})>\eta_{i}$ and $\tau_{fk}\geqslant\tau_{2}:=e^{M_{i}^{2}/(3\sigma_{0}^{2})}$ , then

[TABLE]

For inequality (16c), it holds if

[TABLE]

Thus we have that (16c) does not hold if

[TABLE]

Setting $\eta_{i}=\lceil\frac{4\sigma_{0}^{2}}{\Delta_{i}^{2}}(\log{2\pi e}+4\log{T})-1\rceil$ and combining (17), (19) and (20), the inequality (15) can be written as

[TABLE]

We now focus on $\sum_{f=1}^{l}\sum_{k=1}^{b_{f}}f\tau_{fk}^{-\frac{9}{8}}$ . With $\tau_{fk}=L_{f-1}+1+(k-1)f$ and $2^{f^{2}}\leqslant L_{f}\leqslant 2^{f^{2}}+f^{2}$ , we have

[TABLE]

and

[TABLE]

Therefore (22) yields

[TABLE]

We then establish the expected number of switches to a sub-optimal arm $i$ from a different arm. We have

[TABLE]

using the same argument as [21]. Then it follows that

[TABLE]

With the upper bound on $\mathbb{E}[N_{i}(T)]$ and $L_{f}\leqslant 2^{f^{2}}+f^{2}\leqslant 2^{f^{2}+1}$ , (23) can be further deduced as

[TABLE]

Finally, the cumulative switching cost can be bounded as

[TABLE]

Appendix B Proof of Corollary 2

For the CBPA algorithm, (14) still holds. Hence, the argument from (16a) to (21) equally applies to any time slot $t=1,2,..,T$ . The proof is completed by rewriting (14) as

[TABLE]

Appendix C Proof of Corollary 3

In the BPA algorithm, inequalities (14), (16a), (16b), and (16c) still hold for any time slot $t=1,2,..,T$ , with $U_{i}(t)=\frac{\sigma_{0}}{\sqrt{1+N_{i}(t)}}\Phi^{-1}(1-1/(\sqrt{2\pi e}t^{2}))$ . The estimated mean $\hat{\mu}_{i}(t)$ is a Gaussian random variable with mean $\frac{\mu_{i}^{0}+N_{i}(t)\mu_{i}}{1+N_{i}(t)}$ and variance $\frac{N_{i}(t)\sigma_{0}^{2}}{(1+N_{i}(t))^{2}}$ . The proof then follows steps similar to those in Appendix A, with inequality (18) written as

[TABLE]

Thus, inequalities (19) and (20) become

[TABLE]

This leads to

[TABLE]

which completes the proof.

Appendix D Proof of Theorem 4

According to Lemma 1 in [16], the utility function $Q_{i}^{UiPA}$ can be written as

[TABLE]

with

[TABLE]

Then, we can use [13, Theorem 4] to bound the expected loss. We have (24) for all $N_{i}(t)\geqslant\log{2\pi e}/2+2\log t$ . Furthermore, $\mathbb{P}\{\hat{\mu}_{i^{*}}(t)\geqslant\mu_{i^{*}}+U_{i^{*}}(t)\}$ can be similarly bounded. Lastly, using the Chi-squared distribution, we have

[TABLE]

and

[TABLE]

Combining (24), (25), and (26), $N_{i}(t)$ can be bounded as

[TABLE]

This completes the proof.

Bibliography


[1] Cisco, “Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2015–2020,” February 2016.
[2] T. Quek, G. de la Roche, I. Guvenc, and M. Kountouris, Small Cell Networks: Deployment, PHY Techniques, and Resource Allocation. Cambridge University Press, 2013.
[3] J. Ramiro and K. Hamied, Self-Organizing Networks (SON): Self-Planning, Self-Optimization and Self-Healing for GSM, UMTS and LTE. Wiley, Nov. 2011.
[4] O. Aliu, A. Imran, M. Imran, and B. Evans, “A survey of self organisation in future cellular networks,” IEEE Communications Surveys & Tutorials, vol. 15, no. 1, pp. 336–361, 2013.
[5] Small Cell Forum, “Interference management in UMTS femtocells: topic brief,” February 2014.
[6] 3GPP, “FDD Home Node B RF Requirements,” TR 25.967 v 9.0.0.
[7] S. Nagaraja et al., “Downlink transmit power calibration for enterprise femtocells,” in IEEE VTC, 2011.
[8] D. López-Pérez, X. Chu, A. Vasilakos, and H. Claussen, “Minimising cell transmit power: Towards self-organized resource allocation in OFDMA femtocells,” in ACM SIGCOMM, August 2011, pp. 410–411.
