Distributed Learning for Channel Allocation Over a Shared Spectrum

S.M. Zafaruddin; Ilai Bistritz; Amir Leshem; Dusit Niyato

arXiv:1902.06353·cs.IT·December 5, 2019


TL;DR

This paper introduces a distributed algorithm for channel allocation in shared spectrum networks that learns the optimal assignment online without requiring inter-user communication, achieving order-optimal $O(\log T)$ regret.

Contribution

The paper presents a distributed, CSMA-based auction algorithm for channel allocation that converges to the optimal allocation with an expected sum-regret of $O(\log T)$ when multiple users share channels whose statistics are initially unknown.

Findings

01 Achieves order optimal regret of O(log T)

02 Operates without inter-user communication

03 Effective in LTE and 5G channel simulations

Abstract

Channel allocation is the task of assigning channels to users such that some objective (e.g., sum-rate) is maximized. In centralized networks such as cellular networks, this task is carried out by the base station, which gathers the channel state information (CSI) from the users and computes the optimal solution. In distributed networks such as ad-hoc and device-to-device (D2D) networks, no base station exists and conveying global CSI between users is costly or simply impractical. When the CSI is time varying and unknown to the users, the users face the challenge of both learning the channel statistics online and converging to a good channel allocation. This introduces a multi-armed bandit (MAB) scenario with multiple decision makers. If two or more users choose the same channel, a collision occurs and they all receive zero reward. We propose a distributed channel allocation algorithm that…

Equations

\mathcal{N}_{i}\left(t\right)=\left\{n\,|\,a_{n}\left(t\right)=i\right\}.

\eta_{i}\left(t\right)=\begin{cases}0 & \left|\mathcal{N}_{i}\left(t\right)\right|>1\\ 1 & \text{o.w.}\end{cases}

r_{n,a_{n}}\left(t\right)=q_{n,a_{n}}\left(t\right)\eta_{a_{n}}\left(t\right).

R=\sum_{t=1}^{T}\sum_{n=1}^{N}Q_{n}^{*}-\sum_{t=1}^{T}\sum_{n=1}^{N}q_{n,a_{n}\left(t\right)}\left(t\right)\eta_{a_{n}\left(t\right)}\left(t\right).

a^{*}=\arg\max_{a_{1},\ldots,a_{N}}\sum_{n=1}^{N}Q_{n,a_{n}}.

c_{1}\geq K\max\left\{\frac{81}{2}K,\ \frac{128}{9}\left(\frac{\Delta_{\max}}{\Delta_{\min}}\right)^{2}N^{2}\right\}

\gamma_{n}=\max_{i}\left(Q_{n,i}^{k}-B_{n,i}\right)

\tilde{i}_{n}=\arg\max_{i}\left(Q_{n,i}^{k}-B_{n,i}\right)

w_{n}=\max_{i\neq\tilde{i}_{n}}\left(Q_{n,i}^{k}-B_{n,i}\right)

B_{n,\tilde{i}_{n}}=B_{n,\tilde{i}_{n}}+\gamma_{n}-w_{n}+\varepsilon

\tau_{n}=f_{b\left(k\right)}\left(B_{n,\tilde{i}_{n}}\right)

\arg\max_{a_{1},\ldots,a_{N}}\sum_{n=1}^{N}Q_{n,a\left(n\right)}=\arg\max_{a_{1},\ldots,a_{N}}\sum_{n=1}^{N}Q_{n,a\left(n\right)}^{k}.

P_{e,k}\triangleq\Pr\left(\max_{n,i}Q_{n,i}^{k}-Q_{n,i}>\frac{3\Delta_{\min}}{8N}\right)\leq 3NKe^{-k}.

\bar{R}\leq\underbrace{\sum_{k=1}^{k_{0}}\bar{R}_{k}}_{\bar{R}_{0}}+\sum_{k=k_{0}+1}^{E}\bar{R}_{k}

\bar{R}_{k}\leq\left(c_{1}+4K^{2}N\left(\frac{Q_{M}}{\Delta_{\min}}+\frac{1}{N}\right)\left(2^{b\left(k\right)}+1\right)+1\right)N+3NKc_{2}\left(\frac{2}{e}\right)^{k}N\leq 2\left(c_{1}+4K^{2}N\left(\frac{Q_{M}}{\Delta_{\min}}+\frac{1}{N}\right)\left(2^{b_{f}}+1\right)\right)N

\bar{R}\leq\bar{R}_{0}+\sum_{k=k_{0}+1}^{E}\bar{R}_{k}\underset{\left(a\right)}{\leq}\bar{R}_{0}+2\left(c_{1}+4K^{2}N\left(\frac{Q_{M}}{\Delta_{\min}}+\frac{1}{N}\right)\left(2^{b_{f}}+1\right)\right)NE\underset{\left(b\right)}{\leq}\bar{R}_{0}+2\left(c_{1}+4K^{2}N\left(\frac{Q_{M}}{\Delta_{\min}}+\frac{1}{N}\right)\left(2^{b_{f}}+1\right)\right)N\log_{2}\left(\frac{T}{c_{2}}+2\right)

\sum_{n=1}^{N}Q_{n,a^{1}\left(n\right)}^{k}=\sum_{n=1}^{N}\left(Q_{n,a^{1}\left(n\right)}+z_{n,a^{1}\left(n\right)}+u_{n,a^{1}\left(n\right)}\right)\geq\sum_{n=1}^{N}Q_{n,a^{1}\left(n\right)}-\left(\Delta+\frac{\Delta_{\min}}{8N}\right)N.

\sum_{n=1}^{N}Q_{n,a\left(n\right)}^{k}\leq\sum_{n=1}^{N}\left(Q_{n,a^{2}\left(n\right)}+z_{n,a\left(n\right)}+u_{n,a\left(n\right)}\right)\leq\sum_{n=1}^{N}Q_{n,a^{2}\left(n\right)}+\left(\Delta+\frac{\Delta_{\min}}{8N}\right)N

\sum_{n=1}^{N}Q_{n,a\left(n\right)}-\sum_{n=1}^{N}Q_{n,a^{\prime}\left(n\right)}\geq\Delta_{\min}.

\sum_{n=1}^{N}Q_{n,a^{1}\left(n\right)}^{k}-\sum_{n=1}^{N}Q_{n,a\left(n\right)}^{k}\underset{\left(a\right)}{\geq}\left(\sum_{n=1}^{N}Q_{n,a^{1}\left(n\right)}-\sum_{n=1}^{N}Q_{n,a^{2}\left(n\right)}\right)-\left(2\Delta+\frac{\Delta_{\min}}{4N}\right)N\underset{\left(b\right)}{\geq}\frac{3\Delta_{\min}}{4}-2\Delta N\underset{\left(c\right)}{>}0

\xi_{n,i}\triangleq\frac{1}{V_{n,i}\left(t\right)}\sum_{\tau}A_{n,i}\left(\tau\right)r_{n,i}\left(\tau\right)-Q_{n,i}.

\Pr\left(E\mid V_{\min}=v\right)=\Pr\left(\bigcup_{i=1}^{K}\bigcup_{n=1}^{N}\left\{\xi_{n,i}\geq\Delta\mid V_{\min}=v\right\}\right)\underset{\left(a\right)}{\leq}NK\max_{n,i}\Pr\left(\xi_{n,i}\geq\Delta\mid V_{\min}=v\right)\underset{\left(b\right)}{\leq}2NKe^{-\frac{2\Delta^{2}}{\Delta_{\max}^{2}}v}.

\Pr\left(A_{n,i}\left(t\right)=1\right)=\frac{1}{K}\left(1-\frac{1}{K}\right)^{N-1}.

\Pr\left(V_{\min}<\frac{T_{e}\left(k\right)}{4K}\right)=\Pr\left(\bigcup_{i=1}^{K}\bigcup_{n=1}^{N}\left\{V_{n,i}\left(t\right)\leq\frac{T_{e}\left(k\right)}{4K}\right\}\right)\underset{\left(a\right)}{\leq}NK\Pr\left(V_{1,1}\left(t\right)\leq\frac{T_{e}\left(k\right)}{4K}\right)\underset{\left(b\right)}{\leq}NKe^{-2\frac{1}{K^{2}}\left(\left(1-\frac{1}{K}\right)^{N-1}-\frac{1}{4}\right)^{2}T_{e}\left(k\right)}\underset{\left(c\right)}{\leq}NKe^{-\frac{2}{81K^{2}}T_{e}\left(k\right)}

P_{e,k}\leq\Pr\left(E\right)=\sum_{v=0}^{T_{e}\left(k\right)}\Pr\left(E\mid V_{\min}=v\right)\Pr\left(V_{\min}=v\right)\leq\sum_{v=0}^{\left\lfloor\frac{T_{e}\left(k\right)}{4K}\right\rfloor}\Pr\left(V_{\min}=v\right)+\sum_{\left\lceil\frac{T_{e}\left(k\right)}{4K}\right\rceil+1}^{T_{e}\left(k\right)}\Pr\left(E\mid V_{\min}=v\right)\Pr\left(V_{\min}=v\right)\leq\Pr\left(V_{\min}<\frac{T_{e}\left(k\right)}{4K}\right)+\Pr\left(E\mid V_{\min}\geq\frac{T_{e}\left(k\right)}{4K}\right)\underset{\left(a\right)}{\leq}2NKe^{-\frac{\Delta^{2}c_{1}}{2K\Delta_{\max}^{2}}k}+NKe^{-\frac{2c_{1}k}{81K^{2}}}

P_{e,k}\leq 2NKe^{-\frac{9\Delta_{\min}^{2}c_{1}}{128K\Delta_{\max}^{2}N^{2}}k}+NKe^{-\frac{2c_{1}k}{81K^{2}}}\leq 3NKe^{-k}.

I_{auc}\leq KN+\frac{K}{\varepsilon}\sum_{n=1}^{N}Q_{n,i}^{k}\underset{\left(a\right)}{\leq}KN+\frac{KN}{\varepsilon}\left(Q_{M}+\frac{\Delta_{\min}}{8N}\right)

\sum_{n=1}^{N}Q_{n,a_{n}}-\sum_{n=1}^{N}Q_{n,a_{n}^{\prime}}\geq\Delta_{\min}

\left|\sum_{n=1}^{N}Q_{n,a_{n}}^{k}-\sum_{n=1}^{N}Q_{n,a_{n}^{\prime}}^{k}\right|=\Biggl{|}\sum_{n=1}^{N}Q_{n,a_{n}}^{k}-\sum_{n=1}^{N}Q_{n,a_{n}}+\sum_{n=1}^{N}Q_{n,a_{n}}-\sum_{n=1}^{N}Q_{n,a_{n}^{\prime}}+\sum_{n=1}^{N}Q_{n,a_{n}^{\prime}}-\sum_{n=1}^{N}Q_{n,a_{n}^{\prime}}^{k}\Biggr{|}\underset{\left(a\right)}{\geq}\\ \left|\sum_{n=1}^{N}Q_{n,a_{n}}-\sum_{n=1}^{N}Q_{n,a_{n}^{\prime}}\right|-\left|\sum_{n=1}^{N}Q_{n,a_{n}}^{k}-\sum_{n=1}^{N}Q_{n,a_{n}}\right|-\left|\sum_{n=1}^{N}Q_{n,a_{n}^{\prime}}^{k}-\sum_{n=1}^{N}Q_{n,a_{n}^{\prime}}\right|\underset{\left(b\right)}{>}\Delta_{\min}-\frac{3\Delta_{\min}}{4}\geq\frac{\Delta_{\min}}{4}


\sum_{n=1}^{N}Q_{n,\tilde{a}_{n}}^{k}-\max_{a_{1},\ldots,a_{N}}\sum_{n=1}^{N}Q_{n,a_{n}}^{k}<\frac{\Delta_{\min}}{4K}N\leq\frac{\Delta_{\min}}{4}


Full text

Distributed Learning for Channel Allocation Over a Shared Spectrum

S.M. Zafaruddin, Ilai Bistritz, Amir Leshem, and Dusit (Tao) Niyato

S.M. Zafaruddin was with the Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel (e-mail: [email protected]). Currently, he is with the Department of Electrical and Electronics Engineering, BITS Pilani, Pilani-333031 (e-mail: [email protected]). I. Bistritz is with the Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel (e-mail: [email protected]). A. Leshem is with the Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel (e-mail: [email protected]). Dusit (Tao) Niyato is with the School of Computer Science and Engineering (SCSE), Nanyang Technological University, Singapore 639798 (e-mail: [email protected]).

This research was supported by the ISF-NRF Joint research Program, under grant ISF 2277/16. S. M. Zafaruddin was partially funded by the Israeli Planning and Budget Committee (PBC) post-doctoral fellowship.

Abstract

Channel allocation is the task of assigning channels to users such that some objective (e.g., sum-rate) is maximized. In centralized networks such as cellular networks, this task is carried out by the base station, which gathers the channel state information (CSI) from the users and computes the optimal solution. In distributed networks such as ad-hoc and device-to-device (D2D) networks, no base station exists and conveying global CSI between users is costly or simply impractical. When the CSI is time varying and unknown to the users, the users face the challenge of both learning the channel statistics online and converging to a good channel allocation. This introduces a multi-armed bandit (MAB) scenario with multiple decision makers. If two or more users choose the same channel, a collision occurs and they all receive zero reward. We propose a distributed channel allocation algorithm that each user runs and that converges to the optimal allocation while achieving an order optimal regret of $O\left(\log T\right)$. The algorithm is based on a carrier sensing multiple access (CSMA) implementation of the distributed auction algorithm. It does not require any exchange of information between users. Users need only to observe a single channel at a time and sense if there is a transmission on that channel, without decoding the transmissions or identifying the transmitting users. We demonstrate the performance of our algorithm using simulated LTE and 5G channels.

Index Terms:

Distributed channel allocation, multi-armed bandit, online learning, dynamic spectrum access, resource management.

I Introduction

Channel allocation is one of the fundamental management tasks in wireless communication. It has been widely studied for various wireless networks [1, 2, 3, 4, 5]. In traditional centralized systems, Orthogonal Frequency Division Multiple Access (OFDMA) was investigated extensively to meet the high demand for efficient spectrum utilization. If users can be assigned to sub-channels efficiently, certain gains can be derived from the diversity of the channel. The main issue for OFDMA systems is joint power and sub-carrier allocation in the downlink direction [6, 7, 8, 9] and sub-carrier assignment in the uplink direction [10, 11, 12]. Due to its global view of the whole network, the centralized approach is able to obtain the optimal solution for a desired performance metric. The optimal channel allocation can be computed using the well-known Hungarian method [13].

However, several disadvantages limit the practicality of the centralized approach, such as significant signaling overhead, increased implementation complexity and higher latency in dealing with resource allocation problems. Moreover, emerging wireless networking paradigms such as cognitive radio networks, ad-hoc networks, and D2D communications are inherently distributed. Complete information about the network state is typically not available online, which makes the computation of optimal policies intractable for these networks. Hence, it is desirable to develop a distributed learning algorithm for dynamic spectrum access that can effectively adapt to general complex real-world settings in dense and heterogeneous wireless environments.

The open sharing model employs spectrum sharing among peer users as the basis for managing a spectral band. Advocates of this model draw support from the phenomenal success of wireless services operating in the unlicensed industrial, scientific, and medical (ISM) radio band (e.g., WiFi). Both centralized and distributed spectrum sharing strategies have been investigated to address the technological challenges under this spectrum management model.

At the core of the channel allocation task is the combinatorial optimization assignment problem. Solving the assignment problem distributively is a major challenge that has received considerable attention. The famous auction algorithm [14] is a distributed method for solving the assignment problem in which users send their bids to an auctioneer. In [15], a fully distributed version of the auction algorithm was suggested that exploits carrier sense multiple access (CSMA) in order to avoid the need for an auctioneer.

If the values of the resources (channels) are not known in advance by the users, they have to learn these values online. Learning the CSI in real-time comes at the expense of using the best known channels so far. This introduces the well-known trade-off between exploration and exploitation that is captured by the multi-armed bandit (MAB) problem. In this case, there are several decision makers facing this problem, and when two or more choose the same channel, they receive zero reward. As in other MAB problems, the performance is measured by the expected difference between the actual sum of rewards and the sum of rewards that could have been achieved if the users had perfect knowledge of the CSI. However, as opposed to classical MAB problems, the interaction between the users significantly complicates the learning aspects of the problem. To address that, deep reinforcement learning and Q-learning methods have been proposed for these problems [16, 17, 18], and have been shown to perform well for small-size models. However, for large-scale networks these methods perform poorly since the number of states of the learning algorithm increases exponentially in the number of users.

In [19], the auction algorithm [14] was used as a basis for a distributed algorithm that achieves an expected sum regret of $O(\log T)$. However, since it relies on [14], this algorithm requires communication between users in order to communicate the bids and deduce which player won each auction. To implement this algorithm, users need to know which user transmitted on which channel. In this manner, they can use their public channel choices as a signaling method. In practice, this knowledge requires that users decode at least part of the transmission to identify the ID of the transmitting users. Besides being computationally demanding, this might be highly non-trivial when multiple users transmit on the same channel and all their IDs need to be decoded from the mixture.

In this paper, we overcome this requirement by proposing a distributed algorithm that relies on [15] instead of [14]. The algorithm in [15] assumes that the CSI is known. It also uses a continuous back-off time and assumes no tied bids. We lift all of these assumptions in our novel MAC protocol. Our protocol achieves an expected sum of regret of $O(\log T)$ but, in contrast to [19], only requires each user to sense the channel that the user is using and detect if there are other transmissions on this channel. Users do not need to know which user transmitted on which channel or how many of them did. Therefore, our algorithm offers the same order optimal performance as [19] but with a dramatically simpler implementation.

I-A Related Works

Developing multi-armed bandit (MAB)-based methods for solving dynamic spectrum allocation (DSA) problems is a relatively new research direction, motivated by recent developments of MAB in various other fields, and it has recently attracted many works. Several of these works [20, 21, 22, 23] considered a cognitive radio scenario where a set of channels can be either empty or occupied by a primary user that interferes with all secondary users. A generalized scenario was considered in [24, 25, 26], where the channel qualities are not binary, but all users still have the same vector of channel qualities. Recently, the case of a full channel allocation scenario where different users have different channel qualities (a matrix of channel qualities) has been considered in [27], and later improved in [19], by the same authors, to have an order optimal sum-regret of $O\left(\log T\right)$.

Recently, it has been shown in [28] (which improved [29]) that achieving a sum-regret of near-$O\left(\log T\right)$ is possible even without communication between users and with a matrix of expected rewards. The algorithm in [28] is general and has a slow convergence rate in $T$ that makes it unsuited for realistic communication scenarios. In this paper, we adopt a more practical and communication oriented approach and achieve an order optimal sum-regret of $O\left(\log T\right)$. Our algorithm still does not require any communication between users, and each device only needs to sense a single channel at a time (instead of all of them simultaneously as in [19]). This is made possible by adding assumptions that are always valid from a practical perspective: the expected rewards (QoS) are integer multiples of a common resolution $\Delta_{\min}$, and a device can choose not to transmit on any channel and instead only sense a single channel of its choice. Our algorithm is much easier and less costly to implement than that of [19] and has a much better convergence time than that of [28].

The literature on distributed channel allocation without learning, where the CSI is assumed to be known, is vast and we can only cover part of it here. Recently there has been growing interest in distributed spectrum optimization for frequency selective channels, where the assignment problem arises. However, most of the work done in this field relies on explicit exchange of CSI. Several suboptimal approaches that do not require information sharing have been suggested. In [30], a greedy approach to the channel assignment problem was introduced. In [31] and [32], the use of opportunistic carrier sensing was combined with the Gale-Shapley algorithm for stable matching [33] to provide a fully distributed stable channel assignment. This solution basically achieves the greedy channel assignment, and an analysis of this technique for Rayleigh fading channels was done in [34].

Game theory is often used to design distributed channel allocation algorithms. In [35] the channel assignment problem was formulated as a many-to-one matching game under the limitation that each primary channel can only be assigned to one secondary user. In [36], an algorithm was proposed based on a game with utility design that leads to an asymptotically optimal performance in all Nash equilibria. In [37] the spectrum sharing problem between D2D pairs and multiple co-located cellular networks was formulated as a Bayesian non-transferable utility overlapping coalition formation game. Nash bargaining solutions for channel allocation were considered in [38, 39, 40], and distributed allocation using multichannel ALOHA and potential games was considered in [41, 42].

The auction algorithm has been extensively used to solve a variety of assignment problems. It gets its name from operating similarly to an auction. As in this paper and many others, the auction algorithm may have nothing to do with actual auctions that rely on economic and game-theoretic principles, as was done in [43, 44, 45, 46]. In [47] the auction algorithm was used to solve the channel assignment problem for the uplink, using the base station as the auctioneer. In [48] a distributed auction algorithm with shared memory was used for switch scheduling. In [49] it was shown that a modification of the auction algorithm is equivalent to max product belief propagation. However, all these modified auction algorithms require a base station or shared memory, which prevents them from being fully distributed. In addition, all these algorithms, including [15], which is used here, assume that the CSI is known to the users. Our algorithm generalizes the distributed CSMA auction algorithm [15] to an online learning framework.

I-B Outline

The paper is organized as follows: in Section II we describe the system model and our network assumptions. Section III discusses the novel MAC protocol we propose. Section IV and Section V analyze the exploration and auction phases of our algorithm, respectively. Section VI provides simulation results of our algorithm on practical LTE channels, together with a performance comparison. Finally, Section VII concludes the paper.

II System Model

We consider an ad-hoc network with a set of transmitter-receiver pairs (links) $\mathcal{N}=\left\{1,\ldots,N\right\}$ and a set of channels $\mathcal{K}=\left\{1,\ldots,K\right\}$, where $K\geq N$. Each channel consists of several OFDMA subcarriers and each link uses a single channel. In the case of more users than channels ($N>K$), a combined OFDMA-TDMA can be used instead in order to have enough resources for all users. However, since this is a trivial consequence of our analysis which only complicates the notation, we choose to avoid considering TDMA. The number of channels $K$ is chosen by the protocol designer to be large enough to support $N$ links in an environment with outside interferers where some of the channels can be very poor and practically unavailable. The identity and number of subcarriers that constitute each channel can also be optimized with respect to the typical channels used by the significant interferers. Links may use multiple-input multiple-output (MIMO) transmission, with different capabilities for each link. Time is slotted and indexed by $t$, such that in each time slot $L$ OFDM symbols are transmitted. The number of OFDM symbols per time slot $L$ can be designed to match the coherence time of the channel, such that the CSI typically changes every time slot. Hence, we assume a fast-fading scenario where the coherence time is proportional to an OFDM symbol duration. The links are active for a total of $T$ time slots, where $T$ is not known in advance to the links. We assume that each link can sense a single channel at each time slot, namely the channel it uses, and detect whether other links are transmitting on this channel. The chosen channel of link $n$ at time $t$ is denoted by $a_{n}\left(t\right)$. Naturally, links can choose not to transmit at all at a given time slot, which is denoted by $a_{n}\left(t\right)=0$. Non-transmitting links can still sense transmissions on a single chosen channel.

The links are located in geographical proximity in an area that typically includes other coexisting networks nearby. As a result, each receiver experiences alien interference from the transmission of other protocols. Due to the geometry of the links and the different channels used by different interferers, the average interference is different for each receiver in our network. A toy example of our network with $K=N=6$ is depicted in Fig. 1. The channel used by each link is indicated by the color of the arrow between its transmitter and receiver. Outside the area of the network there are four major interferers that use four of the six available channels. In this example, links successfully avoid using channels with significant interference at their receiver side.

These outside transmissions can be constant over time or bursty, and may overlap any part of the subcarriers used by a particular link. In addition, the fading of the channel may cause significant changes to the channel gains of the subcarriers. Like any modern device, the transmitter and receiver of each link adopt techniques such as adaptive beamforming and modulation together with interleaving and coding for fading channels in order to provide stable (on average) and reliable communication for the users. However, since the channel statistics and the interference pattern are initially unknown, each link needs to learn them online as fast as possible in order to deduce which Quality of Service (QoS) it can support. As in any practical system, there is some resolution for the supported QoS (e.g., 100Kbps), which we denote by $\Delta_{\min}$. The supported QoS set is $\mathcal{Q}\triangleq\left\{Q_{1},\ldots,Q_{M}\right\}$, where for each $i$, $Q_{i}=l_{i}\Delta_{\min}$ for a non-negative integer $l_{i}$ and $Q_{1}<\ldots<Q_{M}$. The QoS experienced by link $n$ using channel $i$ is denoted by $Q_{n,i}$. A value in this set may represent the weighted quality of a combination of parameters, e.g., 1Mbps for internet, 256kbps for voice and 10Mbps for video. In general, different links have a subset of different possible QoS values from $\mathcal{Q}$ due to different capabilities, e.g., number of transmitting and receiving antennas. Since they are part of the standard of the protocol, we assume that the parameters $\Delta_{\min}$ and $\Delta_{\max}=Q_{M}-Q_{1}$ are known to all links.

In each time slot $t$, each link measures the instantaneous QoS $q_{n,i}\left(t\right)$ by using a finer resolution than that of $\mathcal{Q}$, in order for the estimation of the average to be accurate. We model $q_{n,i}\left(t\right)$ as an i.i.d. sequence in time, independent for different $n$ or $i$. The distribution of $q_{n,i}\left(t\right)$ is bounded since $Q_{1}\leq q_{n,i}\left(t\right)\leq Q_{M}$, and can be either discrete or continuous due to arbitrarily fine measurements.

Define the set of links that are transmitting on channel $i$ at time $t$ by

$$\mathcal{N}_{i}\left(t\right)=\left\{n\,|\,a_{n}\left(t\right)=i\right\}. \tag{1}$$

Define the no-collision indicator of channel $i$ at time $t$ by

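$$\eta_{i}\left(t\right)=\begin{cases}0 & \left|\mathcal{N}_{i}\left(t\right)\right|>1\\ 1 & \text{o.w.}\end{cases} \tag{2}$$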

The instantaneous reward of link $n$ at time $t$ from transmitting on channel $a_{n}$ is

$$r_{n,a_{n}}\left(t\right)=q_{n,a_{n}}\left(t\right)\eta_{a_{n}}\left(t\right). \tag{3}$$

The theoretical guarantees of our algorithm are formulated using the well-known notion of regret, defined as follows.

Definition 1.

The total regret is defined as the random variable

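$$R=\sum_{t=1}^{T}\sum_{n=1}^{N}Q_{n}^{*}-\sum_{t=1}^{T}\sum_{n=1}^{N}q_{n,a_{n}\left(t\right)}\left(t\right)\eta_{a_{n}\left(t\right)}\left(t\right). \tag{4}$$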

The value $Q_{n}^{*}$ is the expected QoS of the channel that link $n$ is assigned to in

$$a^{*}=\arg\max_{a_{1},\ldots,a_{N}}\sum_{n=1}^{N}Q_{n,a_{n}}. \tag{5}$$

The expected total regret $\bar{R}\triangleq E\left\{R\right\}$ is the average of (4) over the randomness of the rewards $\left\{r_{n,i}\left(t\right)\right\}_{t}$, which dictate the random channel choices $\left\{a_{n}\left(t\right)\right\}$.
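As an illustration of the definitions above, the following minimal Python sketch evaluates the no-collision indicator, the per-slot rewards and the total regret for a toy instance. It is only a sketch under stated assumptions: channels are indexed from 0, `None` stands for "no transmission" (written $a_{n}(t)=0$ in the text), the optimal allocation is found by brute force over collision-free assignments (our reading of the maximization that defines $a^{*}$), and the function names are hypothetical.

```python
import itertools


def optimal_allocation(Q):
    """Brute-force argmax of sum_n Q[n][a_n] over collision-free assignments.

    Assumption: only assignments with distinct channels are searched,
    which is practical only for very small N and K.
    """
    N, K = len(Q), len(Q[0])
    best = max(itertools.permutations(range(K), N),
               key=lambda a: sum(Q[n][a[n]] for n in range(N)))
    return list(best)


def slot_rewards(actions, q):
    """Rewards r_n = q[n][a_n] * eta[a_n] for one time slot.

    actions[n] is the channel of link n, or None if it does not transmit.
    A channel chosen by more than one link yields zero reward for all of them.
    """
    counts = {}
    for a in actions:
        if a is not None:
            counts[a] = counts.get(a, 0) + 1
    return [0.0 if a is None or counts[a] > 1 else q[n][a]
            for n, a in enumerate(actions)]


def total_regret(Q, action_history, q_history):
    """Total regret: T times the optimal expected sum minus the earned rewards."""
    a_star = optimal_allocation(Q)
    best_per_slot = sum(Q[n][a_star[n]] for n in range(len(Q)))
    earned = sum(sum(slot_rewards(a, q))
                 for a, q in zip(action_history, q_history))
    return len(action_history) * best_per_slot - earned
```

For example, with `Q = [[3.0, 1.0], [2.0, 1.0]]`, one slot in which both links collide on channel 0 followed by one slot with the optimal allocation gives a total regret equal to one slot of the optimal sum.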

III Protocol Description

We design a novel MAC protocol that each link runs distributedly in order to maximize the accumulated sum of QoS. In the original auction algorithm, an auctioneer is needed to collect the bids and compute the highest bidder. Such an auctioneer is not available in a distributed wireless network. The algorithm in [15] exploits the CSMA mechanism to bypass the need for an auctioneer and by doing that, implements the auction algorithm distributedly. For this purpose, links compute a continuous back-off time that is decreasing with their bid. The highest bidder for a particular channel is simply the first link that accesses this channel. Since we assume all links can sense the channel they chose, all links will agree on which link was the highest bidder for their channel. **Note that we are not analyzing selfish links but devices that are programmed to run our designed MAC protocol.**
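The idea that the shortest back-off identifies the highest bidder can be sketched with a small centralized simulation of the forward auction. This is not the protocol itself: it assumes that all links see the same per-channel price (in the distributed setting each link keeps its own $B_{n,i}$), that the QoS values are known, and it uses a hypothetical continuous back-off rule `backoff`; all names are illustrative. The bid follows the standard auction update, in which the price of the preferred channel is raised by the gap between the best and second-best net values plus $\varepsilon$.

```python
def backoff(bid):
    """Hypothetical back-off rule: any strictly decreasing function of the bid would do."""
    return 1.0 / (1.0 + bid)


def csma_auction(Q, eps):
    """Forward auction where, per channel, the link with the shortest back-off wins.

    Q[n][i] is the QoS of link n on channel i. Assumes K >= 2 and K >= N.
    Simplification: prices are shared by all links (not so in the real protocol).
    """
    N, K = len(Q), len(Q[0])
    price = [0.0] * K      # highest bid seen so far on each channel
    owner = [None] * K     # link currently holding each channel
    assigned = [None] * N  # channel currently held by each link
    while None in assigned:
        contenders = {}    # channel -> [(back-off time, link, bid), ...]
        for n in range(N):
            if assigned[n] is not None:
                continue
            profit = [Q[n][i] - price[i] for i in range(K)]
            best = max(range(K), key=lambda i: profit[i])
            gamma = profit[best]                                   # best net value
            w = max(p for i, p in enumerate(profit) if i != best)  # second-best net value
            bid = price[best] + gamma - w + eps
            contenders.setdefault(best, []).append((backoff(bid), n, bid))
        for channel, entries in contenders.items():
            _, winner, bid = min(entries)  # first to access = shortest back-off
            if owner[channel] is not None:
                assigned[owner[channel]] = None  # the previous holder is outbid
            owner[channel], assigned[winner], price[channel] = winner, channel, bid
    return assigned
```

With `Q = [[10.0, 9.0], [10.0, 1.0]]` and `eps = 0.1`, both links first bid on channel 0; link 1 bids higher, so its back-off expires first, and link 0 then moves to channel 1. Because `backoff` is strictly decreasing, choosing the minimum back-off is the same as choosing the maximum bid, which is what removes the need for an auctioneer.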

The key advantage of our algorithm is that it only requires each receiver to sense if there are transmissions on a single channel, which is a basic requirement. We assume that all links are within sensing distance of each other (a fully-connected network). However, as opposed to [19], links do not know which transmission belongs to which link. This is the scenario in practice with wireless links located in close enough proximity. In our protocol, links do not need to distinguish between the transmissions of other links, which might have required decoding an ID for each link. Moreover, it can be extremely demanding in practice to separate colliding transmissions and discern the IDs involved. Sensing a single channel at a time instead of all the $K$ channels is another major advantage of our algorithm over [19].

Definition 2.

We divide the $T$ time slots into packets with a dynamic length, one starting immediately after the other. Each packet is further divided into three phases. In the $k$ -th packet:

1. Exploration Phase - this phase has a length of $c_{1}$ time slots in each packet, and is used for estimating the expected reward in each channel. The estimated values are artificially dithered in order to avoid ties in the subsequent auction phase. This phase is described in detail and analyzed in Section IV. It adds an $O\left(\log T\right)$ term to the expected total regret.

2. Auction Phase - this phase has a length of $\left\lceil 4K^{2}N\left(\frac{Q_{M}}{\Delta_{\min}}+\frac{1}{N}\right)\left(2^{b\left(k\right)}+1\right)\right\rceil$ time slots in the $k$ -th packet, which is the convergence time of the distributed auction algorithm, as dictated by Lemma 8. In this phase, links run the distributed auction on the estimated expected rewards using $b\left(k\right)$ bits for the quantized back-off time. The function $b\left(k\right)$ converges to a constant independent of $k$ . In practice, it is easy to guarantee that $b\left(0\right)$ is already large enough, but the designer can shorten the convergence time by starting from smaller $b\left(0\right)$ values and letting the algorithm find the minimal $b\left(k\right)$ necessary. This phase is analyzed in detail in Section V.

3. Exploitation Phase - this phase has a length of $c_{2}2^{k}$ time slots for some constant $c_{2}$ . During this phase, the links transmit on the channel they were allocated in the auction phase. If the exploration phase provided an accurate enough estimation of the QoS and the CSMA back-off time uses enough bits for quantization, then this phase adds no regret to the expected total regret since the links use the optimal allocation.
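The three phase lengths above can be tabulated directly. The following Python sketch is illustrative only: the function names and all numerical parameter values are assumptions, the packet index is taken to start at $k=1$, and $b\left(k\right)$ is held at a fixed value `bits` for simplicity. It shows how the share of time slots spent in exploitation grows with the number of packets.

```python
import math


def phase_lengths(k, c1, c2, K, N, q_max, delta_min, bits):
    """Lengths (in time slots) of the three phases of the k-th packet."""
    exploration = c1
    auction = math.ceil(
        4 * K**2 * N * (q_max / delta_min + 1 / N) * (2**bits + 1)
    )
    exploitation = c2 * 2**k
    return exploration, auction, exploitation


def exploitation_fraction(num_packets, **params):
    """Fraction of all time slots spent in exploitation over packets 1..num_packets."""
    exploit = total = 0
    for k in range(1, num_packets + 1):
        e, a, x = phase_lengths(k, **params)
        exploit += x
        total += e + a + x
    return exploit / total


# Hypothetical parameters, chosen only to keep the numbers small.
params = dict(c1=100, c2=50, K=2, N=2, q_max=1.0, delta_min=0.5, bits=2)
for n in (5, 10, 20):
    print(n, round(exploitation_fraction(n, **params), 4))
```

Because the exploration and auction lengths stay constant while the exploitation length doubles, the fraction approaches one as the number of packets grows.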

The fact that the exploitation phase takes an exponential number of time slots does not mean it takes a long time in practice. In fact, it only means that the exploration and auction phases are much shorter in comparison. Note that $T$ is finite and can be limited by the designer, so even the last (longest) exploitation phase can still consist of just a couple of thousand OFDM symbols, which amounts to only a few milliseconds. From a practical point of view, this is the desirable packet structure since the actual transmission takes the vast majority of the OFDM symbols while the equivalents of the synchronization header do not cause a significant overhead. The overhead caused by the exploration and auction phases is naturally measured by the sum of regrets as in (4). The structure of the $k$ -th packet of our algorithm is depicted in Fig. 2.

Our main theorem is stated as follows.

Theorem 3 (Main Theorem).

Assume that the instantaneous QoS $\left\{q_{n,i}\left(t\right)\right\}_{t}$ are independent in $n$ and i.i.d. in time $t$ , with expectations $Q_{n,i}\in\left\{Q_{1},\ldots,Q_{M}\right\}$ such that $Q_{i}=l_{i}\Delta_{\min}$ for a non-negative integer $l_{i}$ and a positive $\Delta_{\min}$ , and $Q_{1}<\ldots<Q_{M}$ . Denote $\Delta_{\max}=Q_{M}-Q_{1}$ . Let each link run Algorithm 1 with $\varepsilon<\frac{\Delta_{\min}}{4K}$ and an exploration phase length of

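$$c_{1}\geq K\max\left\{\frac{81}{2}K,\frac{128}{9}\left(\frac{\Delta_{\max}}{\Delta_{\min}}\right)^{2}N^{2}\right\}.\tag{6}$$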

Then, the expected sum of regrets is $\bar{R}\sim O\left(\log T\right)$ .

Proof:

Lemma 8 in Section V, proved in Appendix D, shows that if the exploration phase succeeds and enough bits are used for the CSMA back-off quantization, then the exploitation phase contributes no regret to the sum of regrets. Lemma 5 in Section IV, proved in Appendix C, bounds from above the error probability of the exploration phase, showing that it decreases exponentially with $k$ . The proof follows by bounding from above the expected regret using these two facts. For details see Appendix A. ∎

III-A Implementation Issues

In the problem formulation, the length of the time slots is not specified. This is done in order to keep the theoretical framework identical to that of other multi-armed bandit algorithms and to measure the regret using the same scale. However, when implementing Algorithm 1 in practice, there is no need to assume that all time slots are of equal length. In particular, the time slots used to implement the CSMA back-off time can be much shorter than time slots that are used to transmit a frame of $L$ OFDM symbols. The result, depicted in Fig. 2, is the well-known structure of a CSMA frame, like that used in WiFi. At the beginning of the $k$ -th frame, a contention window of $2^{b\left(k\right)}$ short slots is used, followed by the transmission in the chosen channel, over $L$ OFDM symbols. During the exploration and exploitation phases, no contention window is needed, which makes the overhead of the contention window negligible compared to $T$ .

We also note that the computational complexity of running Algorithm 1 for each link is $O\left(K\right)$ , since maximization over a $K$ -sized vector is required.

IV Exploration Phase - Estimation of the QoS

In this section, we analyze the performance of the exploration phase and its contribution to the expected sum-regret. The distributed algorithm of [15] assumes each link knows its CSI, or the possible QoS each channel supports. Our algorithm lifts this assumption by working on online estimations of the CSI (or QoS) instead. Each link obtains these estimations by randomly exploring the different $K$ channels and averaging the instantaneous measurements of the QoS of each channel.

The exploration phase does not require the links to know the total number of links $N$ or the total duration of transmission $T$ . Hence, links cannot use a single long enough exploration phase at the beginning, since its length would have to be chosen according to $T$ and $N$ in order to attain the desired exploration error probability. The packet structure in Fig. 2 maintains the required balance. In each packet, only a constant number $c_{1}$ of time slots is dedicated to exploration, but the estimation of the $k$ -th exploration phase uses all the previous exploration phases.
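The cumulative estimation described above can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not Algorithm 1: the helper names (`new_state`, `explore`, `estimates`), the bounded uniform measurement noise and the example means are hypothetical. Each link picks a channel uniformly at random in every exploration slot and only collision-free samples are averaged, with the running sums carried over from packet to packet.

```python
import random


def new_state(N, K):
    """Running sums and sample counts for N links and K channels."""
    return [[0.0] * K for _ in range(N)], [[0] * K for _ in range(N)]


def explore(true_mean, c1, state, rng, noise=0.1):
    """Run one exploration phase of c1 slots and update the state in place."""
    sums, counts = state
    N, K = len(true_mean), len(true_mean[0])
    for _ in range(c1):
        choice = [rng.randrange(K) for _ in range(N)]
        for n, i in enumerate(choice):
            if choice.count(i) == 1:  # link n is alone on channel i
                # Assumed measurement model: mean plus bounded noise.
                sums[n][i] += true_mean[n][i] + rng.uniform(-noise, noise)
                counts[n][i] += 1


def estimates(state):
    """Sample means of the collision-free measurements (0.0 if no samples)."""
    sums, counts = state
    return [
        [s / c if c else 0.0 for s, c in zip(row_s, row_c)]
        for row_s, row_c in zip(sums, counts)
    ]


rng = random.Random(1)
means = [[0.2, 0.5, 0.9], [0.7, 0.1, 0.4]]  # hypothetical expected QoS
state = new_state(N=2, K=3)
for k in range(120):  # 120 packets, c1 = 50 exploration slots each
    explore(means, 50, state, rng)
q_hat = estimates(state)
```

The estimate available in packet $k$ is built from $c_{1}k$ exploration slots in total, which is why a constant per-packet exploration length can still drive the estimation error down.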

The estimated QoS of the channels is needed for the next auction phase to converge to the optimal allocation. However, due to the distributed nature of the auction, ties cannot be arbitrarily broken. Hence, the exploration phase needs to output accurate enough estimates that guarantee that there will be no ties in the bids in the auction algorithm. For that purpose, after the estimation of the expected QoS is completed, artificial dither noise is added to the estimated values. These dither values are generated in advance independently and uniformly at random on a small interval. The following lemma characterizes the required estimation accuracy of the exploration phase, taking into account the dither noise.

Lemma 4.

Denote the dithered estimations of the expected QoS values in packet $k$ by $\left\{Q_{n,i}^{k}\right\}$ . Assume that $\left|Q_{n,i}^{k}-Q_{n,i}-u_{n,i}\right|\leq\Delta$ for each link $n$ and channel $i$ for some positive $\Delta$ . If $\Delta<\frac{3\Delta_{\min}}{8N}$ then

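$$\arg\underset{a_{1},...,a_{N}}{\max}\sum_{n=1}^{N}Q_{n,a_{n}}^{k}=\arg\underset{a_{1},...,a_{N}}{\max}\sum_{n=1}^{N}Q_{n,a_{n}}.\tag{7}$$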

Proof:

The proof follows from the fact that if $Q_{n,i}^{k}$ and $Q_{n,i}$ are close enough for every $i$ and $n$ , then the optimal assignment on $\left\{Q_{n,i}^{k}\right\}$ and $\left\{Q_{n,i}\right\}$ must be identical. For details see Appendix B. ∎
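The statement of Lemma 4 can be checked numerically on a small instance. The Python sketch below is a sanity check under stated assumptions, not part of the proof: expected QoS values are drawn as integer multiples of $\Delta_{\min}$, the estimation error is drawn uniformly with magnitude below $\frac{3\Delta_{\min}}{8N}$, and the dither with magnitude at most $\frac{\Delta_{\min}}{8N}$ (the dither bound used in Appendix B). The function names and the brute-force search are illustrative. Since the true optimum may not be unique, the check compares the true value of the two optimal assignments rather than the assignments themselves.

```python
import random
from itertools import permutations


def total(Q, a):
    """Sum of QoS of assignment a, where a[n] is the channel of link n."""
    return sum(Q[n][a[n]] for n in range(len(Q)))


def best_assignment(Q):
    """Brute-force optimal assignment (feasible only for small N and K)."""
    N, K = len(Q), len(Q[0])
    return max(permutations(range(K), N), key=lambda a: total(Q, a))


def perturb(Q, delta_min, rng, slack=0.9):
    """Add estimation error (|z| < 3*delta_min/(8N)) and dither (|u| <= delta_min/(8N))."""
    N = len(Q)
    delta = slack * 3 * delta_min / (8 * N)
    dither = delta_min / (8 * N)
    return [
        [q + rng.uniform(-delta, delta) + rng.uniform(-dither, dither) for q in row]
        for row in Q
    ]


rng = random.Random(0)
delta_min = 1.0
Q = [[rng.randrange(0, 6) * delta_min for _ in range(4)] for _ in range(3)]
Q_hat = perturb(Q, delta_min, rng)
# The assignment that is optimal for the perturbed values is also optimal
# for the true values.
unchanged = total(Q, best_assignment(Q_hat)) == total(Q, best_assignment(Q))
```

The reason is a simple margin argument: each of the $N$ terms in a sum moves by less than $\frac{\Delta_{\min}}{2N}$, so the difference between two assignments moves by less than $\Delta_{\min}$, which is the smallest possible gap between distinct sums.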

The following lemma concludes this section by providing an upper bound for the probability that the estimation for packet $k$ failed. The fact that this error probability vanishes exponentially with $k$ allows us to limit the number of exploration time slots to $c_{1}$ , keeping the overhead caused by the exploration phase negligible.

Lemma 5 (Exploration Error Probability).

Denote the dithered estimations of the expected QoS values in packet $k$ by $\left\{Q_{n,i}^{k}\right\}$ . If the length of the exploration phase satisfies $c_{1}\geq K\max\left\{\frac{81}{2}K,\frac{128}{9}\left(\frac{\Delta_{\max}}{\Delta_{\min}}\right)^{2}N^{2}\right\}$ , then after the $k$ -th packet we have

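$$\Pr\left(\arg\underset{a_{1},...,a_{N}}{\max}\sum_{n=1}^{N}Q_{n,a_{n}}^{k}\neq\arg\underset{a_{1},...,a_{N}}{\max}\sum_{n=1}^{N}Q_{n,a_{n}}\right)\leq3NKe^{-k}.\tag{8}$$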

Proof:

The proof uses Hoeffding’s bound on both $\left|Q_{n,i}^{k}-Q_{n,i}\right|$ and the number of samples of $Q_{n,i}$ without collision. For details see Appendix C. ∎
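To get a feel for the quantities in Lemma 5, the sufficient exploration length and the resulting error bound can be evaluated numerically. The Python sketch below is illustrative: the function names are hypothetical, the bound $3NKe^{-k}$ is the one quoted in Appendix A, and the ratio $\Delta_{\max}/\Delta_{\min}=1$ in the example is an arbitrary choice rather than a value from the simulations.

```python
import math


def exploration_length(K, N, delta_max, delta_min):
    """Sufficient number of exploration slots per packet from Lemma 5."""
    return K * max(81 / 2 * K, 128 / 9 * (delta_max / delta_min) ** 2 * N**2)


def error_bound(k, K, N):
    """Upper bound 3*N*K*exp(-k) on the exploration error after packet k."""
    return 3 * N * K * math.exp(-k)


K = N = 10
c1 = exploration_length(K, N, delta_max=1.0, delta_min=1.0)
for k in (1, 5, 10, 15):
    print(k, error_bound(k, K, N))
```

The bound is vacuous (larger than one) for the first few packets and then shrinks by a factor of $e$ with every additional packet, which is what keeps the expected regret of late packets bounded.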

V Auction Phase - Converging to the Optimal Allocation

We adopt the distributed auction in [15] as the basis for our auction phase. The multi-armed bandit problem uses a discrete time axis. Hence, a continuous back-off time as used in [15] is not possible. From a practical perspective, links can only implement a quantized delay rather than a truly continuous one. With integer quantized delays, it is possible that two links use the same delay for the same channel although their continuous bids are different. In this case, they cannot agree on which of them won the bid and got the channel. It is clear that for a fine enough quantization, these bidding collisions will be avoided. However, due to the distributed nature of the problem, links do not know in advance what is considered a fine enough quantization. We propose a collision resolution algorithm that increases the number of quantization bits, described in step 3 of the Auction phase in Algorithm 1. Links coordinate their quantization by employing a “voting turn” that only uses the fact that all links can sense a single channel of their choice. In this special time slot, links listen to channel 1, which is used to signal whether a collision occurred for some of the links.

Another issue to be resolved is the case where the continuous bids of two links $m$ and $n$ are identical, $B_{n,i}=B_{m,j}$ . Since there is no auctioneer, the links cannot agree on an arbitrary tie breaking without communication. Hence, identical bids can prevent the CSMA auction algorithm from converging to the optimal solution. In order to avoid this problem, the auction phase uses a noisy version of the estimated expected rewards from the exploration phase. This noise is an artificial dither added by the links independently such that the probability for identical bids will be zero.

Lemma 6.

After the $k$ -th exploration phase we have $\Pr\left(B_{n,i}=B_{m,j}\right)=0$ for any $n\neq m$ and any $i,j$ .

Proof:

Due to the continuous (uniform) distribution of $u_{n,i}$ and $u_{m,j}$ , for any $m\neq n$ and $i,j$ , the probability that $Q_{n,i}^{k}=\frac{s_{i}(t)}{o_{i}}+u_{n,i}=Q_{m,j}^{k}=\frac{s_{j}(t)}{o_{j}}+u_{m,j}$ is zero. Since any bid $B_{n,i}$ is a linear combination of rewards and $\varepsilon$ , also the probability that at a certain iteration of the auction algorithm $B_{n,i}=B_{m,j}$ is zero. ∎

We emphasize that Lemma 6, and Lemma 7 below, only help to show (in Lemma 8) that Algorithm 1 eventually converges to the optimal solution. Links start transmitting data from the first packet, using a possibly suboptimal allocation in the exploitation phase. Hence, Algorithm 1 is likely to perform well long before convergence to the optimal allocation occurs. Nevertheless, our simulations in Section VI suggest that convergence to the optimal allocation occurs very fast, already in the first or the second packet.

Lemma 7.

Algorithm 1 converges to some final value $b_{f}$ , i.e., there exists a $k_{0}$ such that $b(k)=b_{f}$ for all $k>k_{0}$ .

Proof:

Consider two different bids $B_{n,i}\neq B_{m,j}$ of two different links $n\neq m$ , and assume that after quantization to $b\left(k\right)$ bits we have $f_{b\left(k\right)}\left(B_{n,i}\right)=f_{b\left(k\right)}\left(B_{m,j}\right)$ . In this case, the links will detect a collision after the auction phase and will increase the number of bits used for quantization. Since $B_{n,i}-B_{m,j}$ is a sum of rewards and some multiplication of $\varepsilon$ , for large enough $b\left(k\right)=b^{*}$ , we have $f\left(B_{n,i}\right)\neq f\left(B_{m,j}\right)$ for any $m,n,i,j$ such that $n\neq m$ and $B_{n,i}\neq B_{m,j}$ . Hence, $b\left(k\right)$ will not increase above $b^{*}$ , since collisions between $B_{n,i}\neq B_{m,j}$ cannot occur with $b\left(k\right)=b^{*}$ . Collisions from identical bids $B_{n,i}=B_{m,j}$ do not occur simply because their probability is zero, as shown in Lemma 6. ∎
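The argument of Lemma 7 can be illustrated with a toy quantizer. The Python sketch below rests on assumptions: the quantizer $f_{b}$ is not specified in this section, so a uniform quantizer with $2^{b}$ levels on $\left[0,B_{\max}\right]$ is assumed, and the names `quantize`, `backoff_slot` and `minimal_bits` as well as the example bids are hypothetical. The search loop mimics the rule of adding one bit whenever two distinct bids collide after quantization.

```python
def quantize(bid, bits, bid_max):
    """Assumed uniform quantizer: map a bid in [0, bid_max] to one of 2**bits levels."""
    levels = 2**bits
    return min(int(bid / bid_max * levels), levels - 1)


def backoff_slot(bid, bits, bid_max):
    """Quantized back-off: a higher bid waits fewer slots, so it accesses first."""
    return 2**bits - 1 - quantize(bid, bits, bid_max)


def minimal_bits(bids, bid_max, b0=1, b_cap=64):
    """Smallest number of bits for which all distinct bids get distinct levels."""
    bits = b0
    while bits <= b_cap:
        levels = {quantize(x, bits, bid_max) for x in bids}
        if len(levels) == len(set(bids)):
            return bits
        bits += 1  # a collision was detected: refine the quantization
    raise ValueError("bids are too close for the allowed number of bits")


bids = [0.10, 0.12, 0.80]  # hypothetical distinct bids
b_star = minimal_bits(bids, bid_max=1.0)
slots = [backoff_slot(x, b_star, 1.0) for x in bids]
```

Once the number of bits resolves the smallest gap between distinct bids, no further collisions occur and the bit count stops growing, which is the finite value $b_{f}$ of the lemma.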

Lemma 8.

Assume that $b(k^{\prime})=b_{f}$ for all $k^{\prime}>k$ . If the $k$ -th exploration phase succeeded, then the $k$ -th auction phase converges to an allocation $a_{1},\ldots,a_{N}$ such that $\left|\sum_{n=1}^{N}Q_{n,a_{n}}^{k}-\underset{a_{1},...,a_{N}}{\max}\sum_{n=1}^{N}Q_{n,a_{n}}^{k}\right|\leq\varepsilon$ in less than $\frac{KN}{\varepsilon_{k}}\left(Q_{M}+\frac{\Delta_{\min}}{8N}\right)2^{b\left(k\right)}$ time slots with probability 1. If $\varepsilon<\frac{3\Delta_{\min}}{4K}$ , then the auction phase converges to $\arg\underset{a}{\max}\sum_{n=1}^{N}Q_{n,a_{n}}$ .

Proof:

The proof follows from the convergence and performance guarantees proven in [50] together with Lemma 5. For details see Appendix D. ∎

VI Simulation Results

In this section, we demonstrate the performance of Algorithm 1 using computer simulations. We compared Algorithm 1 with the centralized Hungarian method, random channel selection and the E3 algorithm in [19]. The Hungarian method requires some central entity to know the CSI of all users. Requiring much less information, the E3 algorithm assumes that each user can decode which channel each of the other users has chosen. Our algorithm requires even less information: each user only needs to sense whether there is a transmission on a given channel. The role of the simulations in this section is to show that despite our much stricter information constraints, our algorithm performs almost exactly as well as the E3 algorithm and even the optimal Hungarian algorithm. The comparison with random channel selection confirms that an algorithm that does not strive to converge to the optimal allocation performs very badly. This serves to show that the problem is far from being degenerate or trivial.
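The gap between the two extreme baselines can be reproduced on a toy instance. The Python sketch below is not the simulation setup of this section: the function names and the $2\times2$ example matrix are assumptions, the optimum is found by brute force instead of the Hungarian method, and random selection is modeled as every link picking one of the $K$ channels uniformly and independently, so that a link is collision-free with probability $\left(1-\frac{1}{K}\right)^{N-1}$ and receives zero otherwise.

```python
from itertools import permutations


def optimal_value(Q):
    """Best collision-free sum of expected QoS (brute force, small instances only)."""
    N, K = len(Q), len(Q[0])
    return max(
        sum(Q[n][a[n]] for n in range(N)) for a in permutations(range(K), N)
    )


def random_selection_value(Q):
    """Expected sum of QoS per slot when every link picks a channel uniformly at random."""
    N, K = len(Q), len(Q[0])
    p_alone = (1 - 1 / K) ** (N - 1)  # no other link chose the same channel
    return sum(p_alone * sum(row) / K for row in Q)


Q = [[4.0, 1.0], [1.0, 4.0]]  # hypothetical expected QoS matrix
gap = optimal_value(Q) - random_selection_value(Q)
```

Random selection loses both because of collisions and because it ignores which channel is good for which link, so its per-slot loss is constant and its regret grows linearly in $T$.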

We verified our algorithm under various network scenarios consisting of different path losses and fading environments. The channel was divided into $N$ sub-channels and we used $N=K=10$ . The transmit power spectral density (PSD) was fixed at $12\mbox{dBm}$ for each user. The users were assumed to be moving at a speed of $3\mbox{km/h}$ . We used a transmission duration of $T=10^{5}$ time slots, with a single OFDM symbol per time slot ( $L=1$ ). Our transmission packet (see Fig. 2) has an exploration phase of 800 OFDM symbols and an auction phase of 500 OFDM symbols. Each experiment consists of averaging 100 independent realizations.

First, we considered an ad-hoc network of $N$ links that are uniformly distributed on a disk with a radius of 500 m. The central carrier frequency was 2 GHz with a per-user transmission bandwidth of 200 kHz. The path loss was computed using a path loss exponent of $\alpha=4$ . We considered two types of channel models: an i.i.d. Rayleigh fading channel and the extended pedestrian A model (EPA) of the LTE standard with 9 random taps. In Fig. 3 the sum-regret of our algorithm is compared to that of the E3 algorithm [19] under an i.i.d. Rayleigh fading channel. It is evident that the performance of both algorithms is essentially identical, despite the fact that our algorithm uses no communication between users, unlike the E3 algorithm [19]. Both algorithms have an expected sum-regret that increases like $\log T$ and both converge to the optimal allocation already in the first packets. In Fig. 5, we present the spectral efficiency performance of both algorithms together with the confidence intervals of 90% and 95%, where again the performance of our algorithm and that of the E3 algorithm [19] are very similar. It also shows that the proposed algorithm approaches the optimal performance within a few packets, does much better than a random selection and behaves very similarly in all realizations. We have repeated the above experiment for the more realistic scenario of LTE channels. Fig. 4 again confirms that our performance is identical to that of the E3 algorithm [19].

Next, in Fig. 5 we demonstrate the performance of the proposed algorithm in the presence of alien interference for LTE channels. In this scenario, we considered four interferers that use four out of the $K=10$ available channels. These interfering nodes are randomly located outside the network disk and within a distance of 500 m from the annular region of the disk. It can be seen from the right graph in Fig. 5 that the spectral efficiency is reduced by ~2 bits/sec/Hz. However, the proposed algorithm achieves the optimal performance within a few thousand symbols, similar to the interference-free case shown in Fig. 4. This scenario again confirms that our performance is identical to that of the E3 algorithm [19].

Finally, we considered a 5G system with more realistic channel scenarios consisting of path loss, short-term fading, and long-term shadowing. We computed the path loss from empirical models of urban macro (UMa) in the distance range of $45\mbox{m}$ to $1429\mbox{m}$ and urban micro-street canyon (UMi-SC) in the distance range of $19\mbox{m}$ to $272\mbox{m}$ [51, 52]. The shadowing factor is $6\mbox{dB}$ and $7.8\mbox{dB}$ for the UMa and UMi-SC models, respectively. The fading channel consists of a tapped delay line (TDL-A) model with 23 taps. The central carrier frequency was $6\mbox{GHz}$ with a per-user transmission bandwidth of $720\mbox{kHz}$ . The results in Fig. 6 demonstrate that all the realistic channel phenomena we simulated do not prevent the proposed algorithm from quickly converging to the optimal solution.

The simulations in this section provide additional solid support that our algorithm offers the same performance as [19] but with significantly fewer requirements from the devices. Specifically, we require no information exchange between different links as required in [19], and we only use the sensing of a single channel in each time slot instead of sensing all channels simultaneously.

VII Conclusions

In this paper, we suggested a distributed algorithm for channel allocation with time-varying channels where links initially have no estimation for the statistics of the channels. Learning the statistics of the channels in real-time (exploration) comes at the expense of using the best known channels (exploitation). The scenario is described by a multi-armed bandit game where a collision occurs if two or more links transmit on the same channel. We proved that our algorithm achieves the optimal order of regret - $O\left(\log T\right)$ . Our algorithm is based on a distributed auction algorithm that uses CSMA to avoid the need for an auctioneer (base station). In contrast to the state-of-the-art algorithms, our algorithm requires neither centralized management nor any communication between different links, which makes it very relevant to cognitive ad-hoc networks. Our algorithm only requires sensing a single channel at each time slot, which is $K$ times less than the state-of-the-art algorithms, where $K$ is the number of channels. Only a detection of whether there are transmissions on this channel is required, and no decoding and demixing operations are needed to discern which user chose which channel. From a practical point of view, this results in a significant complexity reduction of the physical layer design. Simulations show that our algorithm performs very well on realistic LTE and 5G channels.

Appendix A Proof of Theorem 3

Proof:

Denote the number of packets that start within $T$ time slots by $E$ . Let $k_{0}$ be the index of a sufficiently large packet. We compute the expected total regret as follows:

[TABLE]

where $\bar{R}_{k}$ is the expected total regret of packet $k$ and $\bar{R}_{0}$ is a constant with respect to $T$ . Denote by $P_{e,k}$ the error probability of the exploration of packet $k$ . In Lemma 8, we prove that if the exploration phase succeeded and the number of quantization bits $b\left(k\right)$ for the CSMA delay is large enough, then the auction phase is guaranteed to converge to the optimal solution of (5) for any $\varepsilon<\frac{\Delta_{\min}}{4K}$ . This optimal allocation is played in the exploitation phase, which adds no additional regret to the total regret. We prove in Lemma 5 that if (6) holds, then $P_{e,k}\leq 3NKe^{-k}$ . Hence, we obtain for a large enough $k$ such that $b\left(k\right)$ is sufficiently large that

[TABLE]

for some constant $b_{f}$ . We conclude that

[TABLE]

where in (a) we used the fact that completing the last packet to be a full packet only increases $\bar{R}_{k}$ . In (b) we used $T>\sum_{k=1}^{E-1}c_{2}2^{k}\geq c_{2}\left(2^{E}-2\right)$ , which yields $E\leq\log_{2}\left(\frac{T}{c_{2}}+2\right)$ . ∎
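The bound on $E$ used in step (b) rests on a geometric sum. Spelled out, and using only the fact stated above that the exploitation phases of the first $E-1$ packets fit within the $T$ time slots:

$$\sum_{k=1}^{E-1}c_{2}2^{k}=c_{2}\left(2^{E}-2\right)<T\;\Longrightarrow\;2^{E}<\frac{T}{c_{2}}+2\;\Longrightarrow\;E<\log_{2}\left(\frac{T}{c_{2}}+2\right).$$

The number of packets, and with it the number of exploration and auction phases, therefore grows only logarithmically in $T$ .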

Appendix B Proof of Lemma 4

Proof:

Recall that $\Delta_{\min}=\underset{i\neq j}{\min}\left|Q_{i}-Q_{j}\right|$ . For all $n$ and $i$ we have $Q_{n,i}^{k}=Q_{n,i}+z_{n,i}+u_{n,i}$ such that $\left|u_{n,i}\right|\leq\frac{\Delta_{\min}}{8N}$ , and we assume that $\left|z_{n,i}\right|\leq\Delta$ . In the perturbed assignment problem, an optimal assignment $a^{1}\in\arg\underset{a_{1},...,a_{N}}{\max}\sum_{n=1}^{N}Q_{n,a\left(n\right)}$ performs at least as well as

[TABLE]

Any non optimal assignment $a$ performs at most as well as

[TABLE]

where $a^{2}$ is an assignment with the second best objective. For any two assignments $a\neq a^{\prime}$ with a different sum of QoS we have

[TABLE]

We conclude that for any non optimal $a$

[TABLE]

where (a) follows from (12) and (13), (b) from (14) and (c) holds for $\Delta<\frac{3\Delta_{\min}}{8N}$ . ∎

Appendix C Proof of Lemma 5

Proof:

After the $k$ -th exploration phase, the number of samples that are used for estimating the expected QoS is $T_{e}\left(k\right)=c_{1}k$ . Let $A_{n,i}\left(t\right)$ be the indicator that is equal to one if only link $n$ chose channel $i$ at time slot $t$ . Also define $V_{n,i}\left(t\right)\triangleq\sum_{\tau}A_{n,i}\left(\tau\right)$ , which is the number of times that link $n$ has used channel $i$ with no collision, up to time slot $t$ and define $V_{\min}=\underset{n,i}{\min}V_{n,i}\left(t\right)$ . Recall that $\Delta_{\max}=Q_{M}-Q_{1}$ and define the estimation error of channel $i$ for link $n$ by

[TABLE]

Denote by $E$ the event in which there exists a link $n$ that has $\xi_{n,i}\geq\Delta$ for some channel $i$ . We have

[TABLE]

where (a) follows by taking the union bound over all links and channels and (b) from using Hoeffding’s inequality for bounded variables [53]. Since the exploration phase consists of uniform and independent arm choices we have

[TABLE]

Therefore

[TABLE]

where (a) follows from the union bound, (b) from Hoeffding’s inequality for Bernoulli random variables and (c) since $K\geq N$ and $\left(1-\frac{1}{K}\right)^{K-1}-\frac{1}{4}\geq e^{-1}-\frac{1}{4}>\frac{1}{9}$ . We conclude that

[TABLE]

where (a) follows from (17) and (19). We choose $\Delta=\frac{3\Delta_{\min}}{8N}$ and $c_{1}=K\max\left\{\frac{81}{2}K,\frac{128}{9}\left(\frac{\Delta_{\max}}{\Delta_{\min}}\right)^{2}N^{2}\right\}$ to obtain

[TABLE]

∎
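The numerical constants in step (c) of the collision-free sampling bound can be verified directly. The Python sketch below is only a sanity check: the function name is hypothetical, and the interpretation in the comment (with $N\leq K$ links choosing uniformly, a given link is alone on its channel with probability at least $\left(1-\frac{1}{K}\right)^{K-1}$ ) is an assumption about where the expression comes from.

```python
import math


def alone_probability_lower_bound(K):
    """(1 - 1/K)**(K - 1): assumed lower bound on the probability that none of
    at most K - 1 other links picks the same one of K channels."""
    return (1 - 1 / K) ** (K - 1)


# The expression decreases towards 1/e as K grows, so 1/e is a uniform lower bound.
holds_for_all_K = all(
    alone_probability_lower_bound(K) >= math.exp(-1) for K in range(1, 2001)
)
# The margin used in step (c): e^{-1} - 1/4 exceeds 1/9.
margin_ok = math.exp(-1) - 0.25 > 1 / 9
```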

Appendix D Proof of Lemma 8

Proof:

In Lemma 3 of [15] it is shown that the number of iterations $I_{\textrm{auc}}$ of the distributed auction algorithm with $\varepsilon$ is bounded by

[TABLE]

where (a) follows since $Q_{n,i}^{k}\leq Q_{M}+\frac{\Delta_{\min}}{8N}$ for all $n$ and $i$ . Note that each iteration of the auction phase takes $2^{b\left(k\right)}+1$ time slots. If the $k$ -th exploration phase succeeded we have $\underset{n,i}{\max}\left|Q_{n,i}^{k}-Q_{n,i}\right|<\frac{3\Delta_{\min}}{8N}$ . For any two allocations $a\neq a^{\prime}$ with a different sum of QoS we have

[TABLE]

Hence

[TABLE]

where (a) follows from the reverse triangle inequality and (b) from (23) and $\underset{n,i}{\max}\left|Q_{n,i}^{k}-Q_{n,i}\right|<\frac{3\Delta_{\min}}{8N}$ .

Denote by $\tilde{a}$ the allocation that the auction phase converges to. If $\varepsilon<\frac{\Delta_{\min}}{4K}$ , then Theorem 1 in [15] guarantees that

[TABLE]

which, due to (24), is only possible if

[TABLE]

where (a) follows from Lemma 4 since we assume that the $k$ -th exploration phase succeeded. ∎

Bibliography

[1] I. Katzela and M. Naghshineh, “Channel assignment schemes for cellular mobile telecommunication systems: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 3, no. 2, pp. 10–31, Second Quarter 2000.
[2] S. Chieochan, E. Hossain, and J. Diamond, “Channel assignment schemes for infrastructure-based 802.11 WLANs: A survey,” IEEE Communications Surveys & Tutorials, vol. 12, no. 1, pp. 124–136, First Quarter 2010.
[3] G. Ku and J. M. Walsh, “Resource allocation and link adaptation in LTE and LTE advanced: A tutorial,” IEEE Communications Surveys & Tutorials, vol. 17, no. 3, pp. 1605–1633, Third Quarter 2015.
[4] E. Z. Tragos, S. Zeadally, A. G. Fragkiadakis, and V. A. Siris, “Spectrum assignment in cognitive radio networks: A comprehensive survey,” IEEE Communications Surveys & Tutorials, vol. 15, no. 3, pp. 1108–1135, Third Quarter 2013.
[5] M. E. Tanab and W. Hamouda, “Resource allocation for underlay cognitive radio networks: A survey,” IEEE Communications Surveys & Tutorials, vol. 19, no. 2, pp. 1249–1276, Second Quarter 2017.
[6] C. Y. Wong, R. S. Cheng, K. B. Letaief, and R. D. Murch, “Multiuser OFDM with adaptive subcarrier, bit, and power allocation,” IEEE Journal on Selected Areas in Communications, vol. 17, no. 10, pp. 1747–1758, Oct. 1999.
[7] Z. Shen, J. G. Andrews, and B. L. Evans, “Adaptive resource allocation in multiuser OFDM systems with proportional rate constraints,” IEEE Transactions on Wireless Communications, vol. 4, no. 6, pp. 2726–2737, Nov. 2005.
[8] S. Sadr, A. Anpalagan, and K. Raahemifar, “Radio resource allocation algorithms for the downlink of multiuser OFDM communication systems,” IEEE Communications Surveys & Tutorials, vol. 11, no. 3, pp. 92–106, Third Quarter 2009.
