Upper-Confidence Bound for Channel Selection in LPWA Networks with Retransmissions

Remi Bonnefoi (IETR); Lilian Besson (IETR); Julio Manco-Vasquez (IETR); Christophe Moy (IETR)

arXiv:1902.10615·cs.NI·February 28, 2019

TL;DR

This paper investigates the use of UCB-based Multi-Arm Bandit algorithms for channel selection in LPWA IoT networks, demonstrating improved transmission success rates by leveraging retransmission data.

Contribution

It introduces and evaluates UCB-based heuristics for IoT channel access, highlighting their effectiveness and simplicity compared to more complex strategies.

Findings

01

UCB algorithms significantly improve successful transmission probabilities.

02

Pure UCB channel access performs as well as more complex methods.

03

Retransmission data enhances the contextual information for learning.

Abstract

In this paper, we propose and evaluate different learning strategies based on Multi-Arm Bandit (MAB) algorithms. They allow Internet of Things (IoT) devices to improve their access to the network and their autonomy, while taking into account the impact of encountered radio collisions. To that end, several heuristics employing Upper-Confidence Bound (UCB) algorithms are examined, in order to exploit the contextual information provided by the number of retransmissions. Our results show that approaches based on UCB obtain a significant improvement in terms of successful transmission probabilities. Furthermore, they also reveal that a pure UCB channel access is as efficient as more sophisticated learning strategies.

Equations

$$p_{c}=1-(1-x)^{N-1}\iff x=1-(1-p_{c})^{\frac{1}{N-1}}. \tag{1}$$

$$p_{cp}(n)=\binom{N-1}{n}x^{n}(1-x)^{N-1-n}.$$

$$p_{c1}=p_{ca}+(1-p_{ca})\,p_{c}. \tag{2}$$

$$p_{bp}(n)=\binom{N-1}{n}x^{n}(1-x)^{N-1-n}\left[1-\left(1-\frac{1}{m}\right)^{n}\right].$$

$$p_{ca}=\frac{1}{p_{c}}\sum_{n=1}^{N-1}p_{bp}(n).$$

$$\begin{aligned}p_{ca}&=\frac{1}{p_{c}}\sum_{n=1}^{N-1}\binom{N-1}{n}x^{n}(1-x)^{N-1-n}\left[1-\left(1-\frac{1}{m}\right)^{n}\right]\\&=1-\frac{1}{p_{c}}\sum_{n=1}^{N-1}\binom{N-1}{n}x^{n}(1-x)^{N-1-n}\left(1-\frac{1}{m}\right)^{n}.\end{aligned}$$

$$p_{ca}\simeq 1-\frac{(1-x)^{N-1}}{p_{c}}\sum_{n=1}^{N_{0}}\binom{N-1}{n}x^{n}\left(1-\frac{1}{m}\right)^{n}. \tag{7}$$

$$p_{ca}\simeq 1-\frac{(1-x)^{N-1}}{p_{c}}\sum_{n=1}^{N-1}\binom{N-1}{n}x^{n}\left(1-\frac{1}{m}\right)^{n}. \tag{8}$$

$$p_{ca}\simeq\frac{1}{p_{c}}-\left(\frac{1}{p_{c}}-1\right)\left[1+\left(1-(1-p_{c})^{\frac{1}{N-1}}\right)\left(1-\frac{1}{m}\right)\right]^{N-1}. \tag{9}$$

$$N_{k}(t)=\sum_{\tau=0}^{t-1}\mathbbm{1}(C(\tau)=k),$$

$$\widehat{\mu_{k}}(t)=\frac{1}{N_{k}(t)}\sum_{\tau=0}^{t-1}r_{k}(\tau)\,\mathbbm{1}(C(\tau)=k).$$

$$B_{k}(t)=\sqrt{\alpha\log(t)/N_{k}(t)},$$

$$U_{k}(t)=\widehat{\mu_{k}}(t)+B_{k}(t).$$

$$C(t)=\arg\max_{1\leq k\leq K}U_{k}(t).$$

Full text

Upper-Confidence Bound for Channel Selection in LPWA Networks with Retransmissions

This publication is supported by the French National Research Agency (ANR), under the projects SOGREEN and EPHYL (grants No. ANR-14-CE28-0025-02 and No. ANR-16-CE25-0002-03), by Région Bretagne, France, by École Normale Supérieure de Paris-Saclay, by the European Union, through the European Regional Development Fund (ERDF), and by the Ministry of Higher Education and Research, Brittany and Rennes Métropole, through the CPER Project SOPHIE / STIC & Ondes.

Rémi Bonnefoi¹, Lilian Besson¹, Julio Manco-Vasquez¹, and Christophe Moy²

¹ IETR / CentraleSupélec Campus de Rennes, F-35510 Cesson-Sévigné, France,

{Remi.Bonnefoi, Lilian.Besson, JulioCesar.MancoVasquez}@CentraleSupelec.fr

² Univ Rennes, CNRS, IETR - UMR 6164, F-35000, Rennes, France

[email protected]

Abstract

In this paper, we propose and evaluate different learning strategies based on Multi-Arm Bandit (MAB) algorithms. They allow Internet of Things (IoT) devices to improve their access to the network and their autonomy, while taking into account the impact of encountered radio collisions. To that end, several heuristics employing Upper-Confidence Bound ( $\mathrm{UCB}$ ) algorithms are examined, in order to exploit the contextual information provided by the number of retransmissions. Our results show that approaches based on $\mathrm{UCB}$ obtain a significant improvement in terms of successful transmission probabilities. Furthermore, they also reveal that a pure $\mathrm{UCB}$ channel access is as efficient as more sophisticated learning strategies.

Index Terms:

Low Power Wide Area, Multi-Armed Bandits, Upper-Confidence Bound, retransmissions, Internet of Things.

I Introduction

Nowadays, the Internet of Things (IoT), and in particular Low Power Wide Area (LPWA) technology, is considered a main driver for a vast variety of applications that will support communications among a large number of devices. In fact, network operators are starting to deploy Machine to Machine (M2M) solutions using LPWA networking technologies [1]. For instance, LoRaWAN and SigFox technologies have been widely adopted for monitoring large-scale systems (e.g., smart cities, metering), where a large number of devices compete to transmit their packets in the unlicensed Industrial, Scientific and Medical (ISM) bands.

Nevertheless, accommodating a growing number of energy-limited end-devices requires the development of contention-based protocols better tailored to LPWA technologies. Thus, novel access mechanisms based on collision-avoidance methods are needed to avoid degrading the network performance in these unlicensed bands. Indeed, the number of packet collisions increases as more uncoordinated devices share the same band. Hence, an important concern in Medium Access Control (MAC) design is to reduce the Packet Loss Ratio (PLR) due to the interference caused by collisions among devices.

In this regard, in the context of Cognitive Radio [2, 3], Multi-Arm Bandit (MAB) algorithms [4, 5, 6] have recently been proposed as a potential solution for channel access in LPWA networks [7, 8, 9]. For instance, [9] studies the impact of non-stationarity on the network performance when MAB algorithms are used. In that work, low-cost algorithms following two well-known approaches, the Upper-Confidence Bound ( $\mathrm{UCB}$ ) [4, 5] and Thompson Sampling (TS) [10] algorithms, showed encouraging results. Other recent directions include theoretical analyses [11, 12] and realistic empirical simulations [13, 14] of the application of MAB algorithms to slotted wireless protocols in a decentralized manner, as well as applications to multi-hop networks [15, 16]. None of the above-mentioned articles discusses in detail the impact of retransmissions on the performance of MAB learning algorithms, as we do in this paper.

The aim of this paper is to assess the performance of MAB algorithms [6] for channel selection in LPWA networks, while taking into account the impact of retransmissions on the network performance. For this reason, several decision-making strategies are applied to the retransmissions (i.e., once a collision has occurred). The proposed approach exploits the contextual information provided by the number of retransmissions and is implemented independently at each device, so that no coordination among devices is needed. Moreover, our $\mathrm{UCB}$ -based heuristics have low complexity, making them suitable for being embedded in LPWA devices.

The contributions of this paper are summarized as follows:

• Firstly, we provide a closed-form approximation of the radio collision probability after a first retransmission. By doing this, we highlight the need to develop a learning approach for channel selection upon collision.

• Secondly, different heuristics are proposed to cope with retransmissions.

• Lastly, we conduct simulations in order to compare the performance of the proposed heuristics with a naive uniform random approach, and with a $\mathrm{UCB}$ strategy (i.e., without any learning for the retransmissions).

The rest of the paper is organized as follows. First the system model is introduced in Section II. Our motivations are exposed in Section III, and a formal description of the MAB learning algorithms is given in Section IV. The proposed $\mathrm{UCB}$ -based heuristics are presented in Section V, while the corresponding numerical results are shown in Section VI. Finally, some conclusions are drawn in Section VII.

II System model

II-A LPWA Network

We consider in this paper an LPWA network composed of a gateway and a large number of end-devices that regularly send short data packets, where $K$ channels ( $K>1$ ) are available for the transmission of their packets.

We assume that this network is made up of two types of devices. On the one hand, static devices operate in a single channel in order to communicate with the gateway (note that, for unlicensed bands, this definition also encompasses any device following a different standard or trying to establish communication with gateways of other networks). On the other hand, IoT devices have the additional advantage of being able to select any of the $K$ available channels to perform their transmissions.

Regardless of its type, each device follows a slotted ALOHA protocol [17] and has a probability $p>0$ of transmitting a packet in a time slot. We assume that a transmission is successful if the channel is available; otherwise, upon radio collision, the device attempts to retransmit its packet up to $M$ times, with $M\in\mathbb{N}$ . Every retransmission is carried out after a random back-off time, uniformly distributed in $\llbracket 0,m-1\rrbracket$ , where $m\in\mathbb{N},m>0$ is the length of the back-off interval.

II-B Model of our IoT devices

The aforementioned contention process can be described by a Markov chain model [18] similar to the one presented in [19], as depicted in Fig. 1. A device holding a packet for transmission goes from the idle state to a transmission state, with retransmissions governed by the collision probabilities $\{p_{c},p_{c1},\dots,p_{cM-2}\}$ of the $M-1$ back-off stages. At each time slot, a transition from the idle state to a transmission state (denoted as Trans.) occurs if a packet transmission is required, while the waiting states (denoted as Wait) correspond to the back-off interval of length $m$ .

A device aims to select the channel with the highest probability of successful transmission, for which it resorts to a reinforcement learning approach. The problem is formulated as a MAB problem, where each channel (also called an arm) is viewed as a gambling machine (bandit) that yields a reward. At every trial, a device chooses a channel with the aim of maximizing the sum of the collected rewards. These rewards are the acknowledgment (Ack) signals received after transmitting packets to the gateway: a transmission is considered successful when an acknowledgment is received, and a learning approach is employed to select the best channel.

We address the problem of channel selection taking into account the described Markov model for the retransmissions of end-devices. This motivates the present work, in which we account for retransmissions in the analysis of MAB algorithms.

III Motivations for the proposed approach

When a device experiences a collision, it enters a back-off state before retransmitting the same packet on a channel. If all devices remain in the same channel for their retransmissions, this could result in a sequence of successive collisions among the packets of the devices that previously collided. Thus, it seems worthwhile to let the decision-making policy retransmit in a different channel. This option of using different communication channels for the first transmission and for the subsequent retransmissions is one of our motivations for developing new MAB algorithms for our problem.

With this option, the device has more to learn, so we expect the learning time to be longer, but the final performance (i.e., the successful transmission rate) may also increase. Section VI presents simulations that assess this performance gain for various heuristics based on the $\mathrm{UCB}$ algorithm.

Hereafter, we start by presenting a mathematical derivation that backs up this idea. To do so, we study the collision probabilities of the Markov process depicted in Fig. 1, in order to anticipate the impact of bandit strategies and to set guidelines for the design of heuristic approaches.

III-A Probability of collision at the second transmission slot

As is well known, a collision during an access attempt can be overcome by a retransmission procedure (which can take several retransmission attempts). What interests us here is to obtain a mathematical approximation of the collision probability at the second transmission slot, $p_{c1}$ , as a function of the first collision probability $p_{c}$ .

We consider two hypotheses $\mathcal{H}_{1}$ and $\mathcal{H}_{2}$ defined as,

• $\mathcal{H}_{1}$ : The probability $p_{c1}$ is the sum of two probabilities: i) the probability of colliding twice in a row with the same devices, i.e., devices that collide at a given time slot and collide again when retransmitting their packets, and ii) the probability of colliding with devices that were not involved in the previous collision. Moreover, we suppose that the number of devices involved in a collision is small compared to the total number of devices.

• $\mathcal{H}_{2}$ : The total number of back-off stages is constant over time, and it is assumed to be large enough that no device ever reaches the last failure state (the state on the right side of Figure 1) after $M$ successive failed retransmissions.

Considering one device and one channel, we denote by $x_{t}^{i}$ the probability that the device transmits a packet for the $(i+1)$ -th time in a given time slot $t$ (with $i\in\llbracket 0,M-1\rrbracket$ ), and let $x_{t}=\sum_{i=0}^{M-1}x_{t}^{i}$ be the probability that it transmits a packet. We consider $N$ active devices following the same policy.

We assume that the Markov chain model depicted in Figure 1 is in the steady state [18], and thus the probabilities no longer depend on the slot number $t$ (i.e., $\forall t,x_{t}=x$ ). Therefore, the probability $p_{c}$ that this device has a collision at its first transmission is the probability that at least one of the other $N-1$ devices transmits in the same slot, and has the following expression

$$p_{c}=1-(1-x)^{N-1}\iff x=1-(1-p_{c})^{\frac{1}{N-1}}. \tag{1}$$

Moreover, from (1) we define the probability $p_{cp}(n)$ that exactly $n$ of the other $N-1$ IoT devices transmit during the first transmission slot, so that the packet collides with $n$ other packets (for any $1\leq n\leq N-1$ ). It is given by the following equation

$$p_{cp}(n)=\binom{N-1}{n}x^{n}(1-x)^{N-1-n}.$$

As explained above, if an IoT device experiences a collision at its first transmission, it retransmits its packet after a random back-off interval. We denote by $p_{ca}$ the probability of colliding with a packet involved in the previous collision. Under the $\mathcal{H}_{1}$ assumption, the number of packets involved in the previous collision remains very small compared to the total number of devices that may transmit during this time. In other words, the probability of colliding with a packet that was not involved in the previous collision does not depend on previous retransmissions and is equal to $p_{c}$ . So, the probability that the same device’s packet experiences a collision again at the second transmission slot is

$$p_{c1}=p_{ca}+(1-p_{ca})\,p_{c}. \tag{2}$$

If the device has a collision at the first attempt, we consider $p_{bp}(n)$ the probability that it has a collision with exactly $n$ packets (for any $1\leq n\leq N-1$ ), and that at least one of the $n$ devices involved in this first collision chooses the same back-off interval,

$$p_{bp}(n)=\binom{N-1}{n}x^{n}(1-x)^{N-1-n}\left[1-\left(1-\frac{1}{m}\right)^{n}\right].$$

Besides, $p_{ca}$ is the conditional probability of colliding with a packet sent by a device involved in the previous collision, given that the packet experienced a collision at its first transmission. Hence, under hypothesis $\mathcal{H}_{2}$ , we can use Bayes' theorem and the law of total probability to relate $p_{ca}$ to the probabilities $p_{bp}(n)$ that a device experienced a collision during the first slot and that a colliding device chose the same back-off interval for its retransmission,

$$p_{ca}=\frac{1}{p_{c}}\sum_{n=1}^{N-1}p_{bp}(n).$$

Therefore, the expression of $p_{ca}$ is

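$$\begin{aligned}p_{ca}&=\frac{1}{p_{c}}\sum_{n=1}^{N-1}\binom{N-1}{n}x^{n}(1-x)^{N-1-n}\left[1-\left(1-\frac{1}{m}\right)^{n}\right]\\&=1-\frac{1}{p_{c}}\sum_{n=1}^{N-1}\binom{N-1}{n}x^{n}(1-x)^{N-1-n}\left(1-\frac{1}{m}\right)^{n}.\end{aligned}$$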

Once again under $\mathcal{H}_{1}$ , assuming that the number of devices involved in the first collision is small compared to $N-1$ , the first $N_{0}\ll N-1$ terms of the sum in the expression of $p_{ca}$ above are predominant. We derive,

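$$p_{ca}\simeq 1-\frac{1}{p_{c}}\sum_{n=1}^{N_{0}}\binom{N-1}{n}x^{n}(1-x)^{N-1-n}\left(1-\frac{1}{m}\right)^{n}.$$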

Moreover, for these terms, $n$ is small compared to $N-1$ , and so $N-1-n$ can be approximated to $N-1$ . Thus it gives,

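$$p_{ca}\simeq 1-\frac{(1-x)^{N-1}}{p_{c}}\sum_{n=1}^{N_{0}}\binom{N-1}{n}x^{n}\left(1-\frac{1}{m}\right)^{n}. \tag{7}$$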

Assuming $\mathcal{H}_{1}$ amounts to considering that $x\ll 1$ . As a consequence, the sum in equation (7) can be extended with negligible terms, up to $n=N-1$ ,

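$$p_{ca}\simeq 1-\frac{(1-x)^{N-1}}{p_{c}}\sum_{n=1}^{N-1}\binom{N-1}{n}x^{n}\left(1-\frac{1}{m}\right)^{n}. \tag{8}$$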

We use the binomial theorem to compute the sum in (8), and we rewrite the expression of $p_{ca}$ as

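$$p_{ca}\simeq\frac{1}{p_{c}}-\left(\frac{1}{p_{c}}-1\right)\left[1+\left(1-(1-p_{c})^{\frac{1}{N-1}}\right)\left(1-\frac{1}{m}\right)\right]^{N-1}. \tag{9}$$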

Finally, our approximation of $p_{c1}$ can be obtained by inserting (9) in (2).
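The derivation can be checked numerically with a short sketch. The function names (`x_from_pc`, `p_ca_exact`, `p_ca_approx`, `p_c1`) and the example values $N=100$ , $m=10$ are illustrative assumptions, not taken from the paper; the code only evaluates the full sum over $n=1,\dots,N-1$ , the closed form (9), and the combination (2).

```python
import math


def x_from_pc(p_c: float, n_devices: int) -> float:
    """Invert (1): transmission probability x of one device, given p_c."""
    return 1.0 - (1.0 - p_c) ** (1.0 / (n_devices - 1))


def p_ca_exact(p_c: float, n_devices: int, m: int) -> float:
    """Full sum over n = 1..N-1 of p_bp(n) / p_c, before any approximation.

    Note: math.comb returns an exact integer, so this direct evaluation is
    only meant for moderate N (a few hundred devices).
    """
    x = x_from_pc(p_c, n_devices)
    total = 0.0
    for n in range(1, n_devices):
        p_cp = math.comb(n_devices - 1, n) * x**n * (1.0 - x) ** (n_devices - 1 - n)
        total += p_cp * (1.0 - (1.0 - 1.0 / m) ** n)
    return total / p_c


def p_ca_approx(p_c: float, n_devices: int, m: int) -> float:
    """Closed-form approximation (9)."""
    x = x_from_pc(p_c, n_devices)
    bracket = (1.0 + x * (1.0 - 1.0 / m)) ** (n_devices - 1)
    return 1.0 / p_c - (1.0 / p_c - 1.0) * bracket


def p_c1(p_c: float, p_ca: float) -> float:
    """Equation (2): collision probability at the second transmission slot."""
    return p_ca + (1.0 - p_ca) * p_c


if __name__ == "__main__":
    N, m = 100, 10  # illustrative values only
    for p_c in (0.05, 0.2, 0.5):
        exact = p_ca_exact(p_c, N, m)
        approx = p_ca_approx(p_c, N, m)
        print(f"p_c={p_c:.2f}  p_ca exact={exact:.4f}  p_ca approx={approx:.4f}  "
              f"p_c1 approx={p_c1(p_c, approx):.4f}")
```

Comparing the two columns for increasing $p_{c}$ gives a quick feel for the range in which replacing $(1-x)^{N-1-n}$ by $(1-x)^{N-1}$ is harmless.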

III-B Behaviour analysis of $p_{c}$ and $p_{c1}$

In order to assess the proposed approximation, we suppose a unique channel where all the devices follow the same contention Markov process. We simulate an ALOHA protocol with a maximum number of retransmissions $M=10$ , a maximum back-off interval $m=10$ , and a transmission probability $p=10^{-3}$ . In Fig. 2, we show the collision probabilities for different numbers of devices $N$ (from $N=50$ up to $N=400$ ), for both $p_{c}$ and $p_{c1}$ .

From these simulations, we can verify that our approximation is very precise for low values, $p_{c1}\leq 30\%$ (i.e., the red and orange curves are quite close). Moreover, a significant gap between $p_{c1}$ and $p_{c}$ , of up to $10\%$ , can be observed, which suggests resorting to MAB algorithms for channel selection for both the first transmission and the subsequent retransmissions.
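A single-channel experiment of this kind can be sketched as follows. This is a minimal illustration written for this text, not the authors' simulator: the function name `simulate_aloha`, the bookkeeping arrays, and the choice to drop a packet after $M$ failed retransmissions are assumptions. It estimates $p_{c}$ and $p_{c1}$ as the empirical collision rates of first transmissions and of first retransmissions.

```python
import random


def simulate_aloha(n_devices, p, max_retx, backoff_len, n_slots, seed=0):
    """Single-channel slotted ALOHA with uniform random back-off.

    Returns (p_c, p_c1): empirical collision rates of first transmissions
    and of first retransmissions (nan if no such transmission occurred).
    """
    rng = random.Random(seed)
    retries = [0] * n_devices    # retransmissions already made for the pending packet
    wait = [-1] * n_devices      # -1: idle, otherwise slots left before retransmitting
    tx = [0] * (max_retx + 1)    # transmissions, indexed by attempt number
    coll = [0] * (max_retx + 1)  # collisions, indexed by attempt number

    for _ in range(n_slots):
        senders = []
        for d in range(n_devices):
            if wait[d] < 0:
                if rng.random() < p:      # idle device generates a new packet
                    retries[d] = 0
                    senders.append(d)
            elif wait[d] == 0:            # back-off elapsed: retransmit now
                senders.append(d)
            else:
                wait[d] -= 1
        collided = len(senders) > 1       # two or more packets in the same slot
        for d in senders:
            tx[retries[d]] += 1
            if not collided:
                wait[d] = -1              # success: back to idle
                continue
            coll[retries[d]] += 1
            if retries[d] < max_retx:
                retries[d] += 1
                wait[d] = rng.randrange(backoff_len)  # uniform in [0, m-1]
            else:
                wait[d] = -1              # assumption: packet dropped after M retries

    def rate(i):
        return coll[i] / tx[i] if tx[i] else float("nan")

    return rate(0), rate(1)


if __name__ == "__main__":
    # M = 10, m = 10, p = 1e-3 as in the text; N and the horizon are illustrative.
    p_c, p_c1 = simulate_aloha(n_devices=200, p=1e-3, max_retx=10,
                               backoff_len=10, n_slots=20_000)
    print(f"p_c ~ {p_c:.3f}, p_c1 ~ {p_c1:.3f}")
```

Sweeping `n_devices` over the range used in the text and averaging over several seeds would give curves comparable in spirit to those described for Fig. 2, under the stated assumptions.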

III-C Learning is useful for non-congested networks

It is worth highlighting that, if we write (2) as $p_{c1}=p_{c}+p_{ca}\left(1-p_{c}\right)$ , then it is clear that $p_{c1}$ is always larger than $p_{c}$ (as $p_{ca}\left(1-p_{c}\right)>0$ ). For large values of $p_{c}$ , $p_{ca}\left(1-p_{c}\right)\simeq 0$ so the gap gets small, whereas for small values of $p_{c}$ the gap is significant. Moreover, we can verify (e.g., numerically or by differentiating) that the gap decreases when $p_{c}$ increases (for fixed $N$ and $m$ ). This backs up mathematically the observation we made from Fig. 2: the smaller $p_{c}$ is, the larger the gap between $p_{c}$ and $p_{c1}$ .

We interpret this fact in two different situations. On the one hand, in a congested network, where devices suffer from a large probability of collision on their first transmission (i.e., $p_{c}$ is not so small), $p_{c1}\simeq p_{c}$ and so devices cannot really hope to reduce their collision probabilities even if they use a different channel for retransmission. On the other hand, if $p_{c}$ is small enough, i.e., in a network not yet too congested, our derivation shows that $p_{c1}$ is significantly larger than $p_{c}$ , meaning that the possible gain of retransmitting in a different channel than the one used for the first transmission can be large in terms of collision probability (e.g., up to $10\%$ in this experimental setting). In other words, when learning can be useful (small $p_{c}$ ), learning to retransmit in a different channel can have a large impact on the global collision rate, thus justifying our approach.
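As a sanity check added here (it is not part of the original derivation), a first-order expansion of the closed-form approximation of $p_{ca}$ for $p_{c}\to 0$ indicates where the $10\%$ figure comes from. It uses only $x=1-(1-p_{c})^{\frac{1}{N-1}}$ and $p_{c1}-p_{c}=p_{ca}(1-p_{c})$ :

```latex
\begin{align*}
x &= \frac{p_{c}}{N-1} + O(p_{c}^{2}),
\qquad
\left[1 + x\left(1-\tfrac{1}{m}\right)\right]^{N-1}
   = 1 + p_{c}\left(1-\tfrac{1}{m}\right) + O(p_{c}^{2}),\\
p_{ca} &\simeq \frac{1}{p_{c}} - \left(\frac{1}{p_{c}}-1\right)
   \left[1 + p_{c}\left(1-\tfrac{1}{m}\right) + O(p_{c}^{2})\right]
   = \frac{1}{m} + O(p_{c}),\\
p_{c1} - p_{c} &= p_{ca}\,(1-p_{c}) \;\xrightarrow[p_{c}\to 0]{}\; \frac{1}{m}.
\end{align*}
```

With $m=10$ the limit is $1/m=10\%$ , which is consistent with the gap quoted for this experimental setting: in a lightly loaded network, the extra collision risk at the second slot is essentially the chance that a device from the previous collision draws the same back-off value.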

IV A well-known MAB Algorithm: $\mathrm{UCB}$

Without loss of generality, we adopt a well-studied stochastic MAB setting, in which the reward distributions are unknown and the rewards are assumed to be independent and identically distributed (i.i.d.). The arms model the channels, and $C(t)\in\llbracket 1,K\rrbracket$ denotes the channel selected at time $t$ . The players, i.e., the dynamic devices, learn the distributions in order to progressively focus on the best arm, i.e., the arm with the largest mean, where the mean of arm $k$ represents the mean availability of channel $k$ .

Before presenting our proposed heuristics, we describe the $\mathrm{UCB}$ bandit algorithm [4]. It has been reported to be efficient while having a low implementation complexity. For this reason, it has been employed for IoT applications [9], and we build our proposals on it.

IV-A The $\mathrm{UCB}$ algorithm

A first approach is to only use an empirical mean estimator of the rewards in every channel, and select the channel with the highest estimated mean at every time step; but this greedy approach is known to fail dramatically [5]. Indeed, with this policy, the selection of arms depends too much on the first draws: if the first transmission in one channel fails and the first one on the other channels succeeds, the device will never use the first channel again, even if it is the best one (i.e., the most available, on average).

Rather than relying on the empirical mean reward alone, $\mathrm{UCB}$ algorithms use a confidence interval on the unknown mean $\mu_{k}$ of each arm, which can be viewed as adding an exploration “bonus” to the empirical mean. They follow the “optimism in the face of uncertainty” principle: at each step, they play according to the best plausible model, as the statistically best possible arm (i.e., the one with the highest $\mathrm{UCB}$ ) is selected.

More formally, for one device, let $N_{k}(t)$ be the number of times channel $k$ (for $k\in\llbracket 1,K\rrbracket$ ) was selected up to time $t-1$ , for any $t\in\mathbb{N}$ ,

$$N_{k}(t)=\sum_{\tau=0}^{t-1}\mathbbm{1}(C(\tau)=k),$$

where $\mathbbm{1}$ is the indicator function, equal to $1$ if the IoT device chooses channel $k$ for its $\tau$ -th transmission, and $0$ otherwise. The empirical mean estimator $\widehat{\mu_{k}}(t)$ of channel $k$ is defined as the mean reward obtained up to time $t-1$ ,

$$\widehat{\mu_{k}}(t)=\frac{1}{N_{k}(t)}\sum_{\tau=0}^{t-1}r_{k}(\tau)\,\mathbbm{1}(C(\tau)=k).$$

where $r_{k}(t)$ is the reward obtained after a transmission in channel $k$ at time $t$ ( $1$ for a successful transmission, and $0$ otherwise). A confidence term $B_{k}(t)$ is given by [5],

$$B_{k}(t)=\sqrt{\alpha\log(t)/N_{k}(t)},$$

where $\alpha$ is an exploration coefficient, which we set to $1/2$ , as suggested in [20] and as done in previous works [7, 9]. (The larger this coefficient is, the longer the exploration; the $\mathrm{UCB}$ algorithm is proven to be order-optimal for $\alpha>0.5$ [6], and has been reported to perform well for lower values of $\alpha>0$ .) Then, an upper confidence bound for each channel $k$ is defined as

$$U_{k}(t)=\widehat{\mu_{k}}(t)+B_{k}(t).$$

Finally, the transmission channel at time step $t$ is the one maximizing this $\mathrm{UCB}$ index $U_{k}(t)$ , as it is the one expected to be the best one at the current time step $t$ ,

$$C(t)=\arg\max_{1\leq k\leq K}U_{k}(t).$$

The $\mathrm{UCB}$ algorithm is implemented independently by each device, and we present it in Algorithm 1. Note that a device using this first approach can only select one channel, which is used for the first transmission and for all the corresponding retransmissions of a packet.
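The index computation described in this section can be sketched in a few lines of Python. This is an illustrative sketch rather than a transcription of Algorithm 1: the class name `UCB`, its method names, the 0-based channel indexing, and the rule of trying every channel once while $N_{k}(t)=0$ (where the index is undefined) are assumptions made here.

```python
import math
import random


class UCB:
    """UCB index policy over K channels (indexed 0..K-1 in this sketch)."""

    def __init__(self, n_channels: int, alpha: float = 0.5):
        self.alpha = alpha
        self.counts = [0] * n_channels   # N_k(t): number of selections of channel k
        self.sums = [0.0] * n_channels   # cumulated rewards of channel k

    def choose(self) -> int:
        # Assumption: each channel is tried once before the index is used.
        for k, n in enumerate(self.counts):
            if n == 0:
                return k
        t = sum(self.counts)
        indices = [
            self.sums[k] / self.counts[k]                            # empirical mean
            + math.sqrt(self.alpha * math.log(t) / self.counts[k])   # bonus B_k(t)
            for k in range(len(self.counts))
        ]
        # arg max of U_k(t); ties are broken in favour of the lowest index
        return max(range(len(indices)), key=indices.__getitem__)

    def update(self, channel: int, reward: float) -> None:
        """Record the reward (1 if an Ack was received, 0 otherwise)."""
        self.counts[channel] += 1
        self.sums[channel] += reward


if __name__ == "__main__":
    rng = random.Random(0)
    availability = [0.9, 0.7, 0.7, 0.7]   # hypothetical success probabilities
    policy = UCB(n_channels=4)
    for _ in range(2000):
        k = policy.choose()
        policy.update(k, 1.0 if rng.random() < availability[k] else 0.0)
    print("selections per channel:", policy.counts)
```

Only two numbers per channel are stored and each decision costs $\mathcal{O}(K)$ operations, which is what makes this kind of policy a plausible fit for constrained devices.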

V Proposed Heuristics

A device that implements the UCB algorithm is led to focus its transmissions and retransmissions on the channel that has been identified as the best. As explained in Section III, focusing on one channel increases the collision probability of retransmissions. In this section, we describe the proposed heuristics for channel selection in a retransmission. They rely on the fact that a device can apply a different channel selection strategy while in a back-off state. Hence, a natural question is whether using this additional contextual information can improve the performance of a learning policy.

To that end, all of our heuristics comprise two stages: the first stage is a $\mathrm{UCB}$ algorithm employed for the first attempt to transmit, and the second stage is another algorithm used to select channels for the subsequent retransmissions.

We present below four heuristics for this second stage (short names in “quotes” correspond to the legend on Figures 3, 4).

V-A Uniform random retransmission (“Random”)

In this first proposal, the device uses a random channel selection, following a uniform distribution (in $\llbracket 1,K\rrbracket$ ). It is described below in Algorithm 2.

V-B $\mathrm{UCB}$ for retransmission (“Only $\mathrm{UCB}$ ”)

Instead of applying a random channel selection, another heuristic is to use a second $\mathrm{UCB}$ algorithm in the second stage. In other words, we expect this algorithm to learn the best channel in which to retransmit a packet. It is described in Algorithm 3, and it is still a practical approach, since the storage requirements and time complexity remain linear w.r.t. the number of channels $K$ (i.e., of order $\mathcal{O}(K)$ ).

Note that, we use the superscript $({}^{r})$ to denote the variables $\widehat{\mu^{r}_{k}}(t)$ , $B^{r}_{k}(t)$ and $U^{r}_{k}(t)$ , related to the $\mathrm{UCB}$ algorithm employed for the retransmission.
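The two-stage structure, with either the “Random” or the “Only $\mathrm{UCB}$ ” second stage, can be sketched as below. The class names (`UCB`, `TwoStageDevice`), the `retx` switch, and the choice that retransmission outcomes update only the second-stage learner are assumptions of this sketch; Algorithms 2 and 3 are not reproduced here and may handle these details differently.

```python
import math
import random


class UCB:
    """Compact UCB learner over channels 0..K-1 (exploration coefficient alpha)."""

    def __init__(self, n_channels, alpha=0.5):
        self.alpha = alpha
        self.counts = [0] * n_channels
        self.sums = [0.0] * n_channels

    def choose(self):
        if 0 in self.counts:                  # assumption: play each channel once first
            return self.counts.index(0)
        log_t = math.log(sum(self.counts))
        index = [s / n + math.sqrt(self.alpha * log_t / n)
                 for s, n in zip(self.sums, self.counts)]
        return index.index(max(index))

    def update(self, channel, reward):
        self.counts[channel] += 1
        self.sums[channel] += reward


class TwoStageDevice:
    """First-stage UCB for first transmissions, second stage for retransmissions."""

    def __init__(self, n_channels, retx="ucb", seed=None):
        self.n_channels = n_channels
        self.first = UCB(n_channels)
        # retx="ucb": second UCB learner ("Only UCB"); anything else: "Random"
        self.second = UCB(n_channels) if retx == "ucb" else None
        self.rng = random.Random(seed)

    def choose(self, is_retransmission):
        if not is_retransmission:
            return self.first.choose()
        if self.second is None:
            return self.rng.randrange(self.n_channels)   # uniform over the K channels
        return self.second.choose()

    def update(self, channel, ack, is_retransmission):
        reward = 1.0 if ack else 0.0
        if not is_retransmission:
            self.first.update(channel, reward)
        elif self.second is not None:
            # Assumption: retransmission outcomes feed only the second stage.
            self.second.update(channel, reward)
```

A device would call `choose(False)` for a new packet, `choose(True)` after each collision, and `update(...)` once the Ack (or its absence) is known. The memory footprint stays linear in $K$ : two counters per channel and per stage.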

V-C $K$ different $\mathrm{UCB}$ s for retransmission (“ $K$ $\mathrm{UCB}$ ”)

Another heuristic is to use not a single retransmission algorithm regardless of where the collision occurred, but $K$ different $\mathrm{UCB}$ algorithms. That is, after a failed first transmission in channel $j$ , the device relies on the $j$ -th algorithm to decide its retransmission. The corresponding algorithm is given in Algorithm 4. Each of these algorithms is denoted using the superscript $({}^{j})$ , for $j\in\llbracket 1,K\rrbracket$ .

This approach increases the complexity and storage requirements (to order $\mathcal{O}(K^{2})$ ). However, for our LPWA networks of interest, such as LoRaWAN, the cost of its implementation is still affordable, since a small number of channels is used. For instance, for $K=4$ channels, the memory needed to store $K+1=5$ algorithms remains of the same order as that needed to store one.

V-D Delayed $\mathrm{UCB}$ for retransmission (“Delayed $\mathrm{UCB}$ ”)

This last heuristic is a composite of the random retransmission (Algorithm 2) and the $\mathrm{UCB}$ retransmission (Algorithm 3) approaches. Instead of starting the second stage $\mathrm{UCB}$ directly from the first retransmission, we introduce a fixed delay $\Delta\in\mathbb{N}$ , $\Delta\geq 1$ , and start to rely on the second stage $\mathrm{UCB}$ after $\Delta$ transmissions. The selection for the first steps is handled with the random retransmission.

The idea behind this delay is to allow the first-stage $\mathrm{UCB}$ to start learning the best channel before starting the second-stage $\mathrm{UCB}$ (see details in Algorithm 5). The number of transmissions to wait before applying the second algorithm is denoted by $\Delta$ ; it has to be fixed beforehand.

Note that, we use the superscript $({}^{d})$ to denote the variables related to the delayed second-stage $\mathrm{UCB}$ algorithm.
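The “ $K$ $\mathrm{UCB}$ ” and “Delayed $\mathrm{UCB}$ ” second stages can be sketched in the same spirit. The class names, the `tick()` method used to count transmissions, and the choice not to update the delayed learner during the first $\Delta$ transmissions are assumptions of this sketch; Algorithms 4 and 5 are not reproduced here and may differ on these points.

```python
import math
import random


class UCB:
    """Compact UCB learner over channels 0..K-1 (exploration coefficient alpha)."""

    def __init__(self, n_channels, alpha=0.5):
        self.alpha = alpha
        self.counts = [0] * n_channels
        self.sums = [0.0] * n_channels

    def choose(self):
        if 0 in self.counts:                  # assumption: play each channel once first
            return self.counts.index(0)
        log_t = math.log(sum(self.counts))
        index = [s / n + math.sqrt(self.alpha * log_t / n)
                 for s, n in zip(self.sums, self.counts)]
        return index.index(max(index))

    def update(self, channel, reward):
        self.counts[channel] += 1
        self.sums[channel] += reward


class KUCBRetx:
    """"K UCB": one retransmission learner per first-transmission channel j."""

    def __init__(self, n_channels):
        self.learners = [UCB(n_channels) for _ in range(n_channels)]

    def choose(self, first_channel):
        return self.learners[first_channel].choose()

    def update(self, first_channel, channel, reward):
        self.learners[first_channel].update(channel, reward)


class DelayedUCBRetx:
    """"Delayed UCB": random retransmissions until `delay` transmissions were made."""

    def __init__(self, n_channels, delay, seed=None):
        self.n_channels = n_channels
        self.delay = delay                    # the fixed delay, Delta >= 1
        self.seen = 0                         # transmissions counted so far
        self.learner = UCB(n_channels)
        self.rng = random.Random(seed)

    def tick(self):
        """Assumption: the caller invokes this once per transmission of the device."""
        self.seen += 1

    def choose(self):
        if self.seen < self.delay:
            return self.rng.randrange(self.n_channels)
        return self.learner.choose()

    def update(self, channel, reward):
        # Assumption: the delayed learner ignores outcomes seen during the delay.
        if self.seen >= self.delay:
            self.learner.update(channel, reward)
```

In this sketch each learner keeps $2K$ numbers, so “ $K$ $\mathrm{UCB}$ ” together with the first-stage learner stores $2K(K+1)$ values, i.e., $40$ for $K=4$ , which illustrates why the $\mathcal{O}(K^{2})$ cost remains small for a handful of channels.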

VI Simulations to compare our heuristics

We simulate our network considering $N$ devices following the contention Markov process described in Section II, and a LoRa standard with $K=4$ channels. Each device is set to transmit with a fixed probability $p=10^{-3}$ , i.e., a packet about every $20$ minutes for time slots of $1\;\mathrm{s}$ .

For the evaluation of the proposed heuristics, a total number of $T=20\times 10^{4}$ time slots is considered, and the results are averaged over $10^{3}$ independent random simulations.

In a first scenario, we consider a total number of $N=1000$ IoT devices, with a non-uniform distribution of static devices given by $10\%,30\%,30\%,30\%$ for the four channels. In other words, the channels are occupied $10\%$ , $30\%$ , $30\%$ , and $30\%$ of the time, and the contention Markov process considered is given by $M=5$ and $m=5$ . In Fig. 3, we show the successful transmission rate versus the number of slots, for all the proposed heuristics.

A first result is that all the heuristics clearly outperform the non-learning approach that simply uses random channel selection for both transmissions and retransmissions (i.e., the “no $\mathrm{UCB}$ ” curve). The improvement of the heuristics over the non-learning approach is evident: every heuristic that uses some kind of learning mechanism shows a successful transmission rate that increases rapidly (or, equivalently, a PLR that decreases rapidly). Moreover, all of these approaches show a fast convergence, making them suitable for the targeted application. It is also worth mentioning that the employment of the same $\mathrm{UCB}$ algorithm for retransmissions, denoted here as “Only $\mathrm{UCB}$ ”, achieves the best performance, while a “Random” retransmission features a slight degradation. This result can be explained as follows: the loss of performance related to the separation of information among several algorithms is greater than the gain obtained by considering the first transmissions and retransmissions separately.

We also consider in our analysis the case where $M=5$ and $m=10$ using the ALOHA protocol, with a distribution of static devices of $40\%,30\%,20\%,10\%$ over the four channels, and $N=2000$ IoT devices. The corresponding results are depicted in Fig. 4. In this case the successful transmission rate is degraded compared with the results in Fig. 3, which can be explained by the larger number of devices in the network, which increases the collision probability. It is important to highlight that the “Random” retransmission heuristic performs poorly in comparison to the other heuristics, which can be attributed to the fact that the number of retransmissions is increased, so that a learning approach is able to take advantage of them. Furthermore, the “ $\mathrm{UCB}$ ”, “ $K$ $\mathrm{UCB}$ ” and “Delayed $\mathrm{UCB}$ ” heuristics behave similarly to “Only $\mathrm{UCB}$ ”, after a similar convergence time.

The conclusions we can draw from these results are twofold. First, MAB learning algorithms are very useful to reduce the collision rate in LPWA networks: a gain of up to $30\%$ in successful transmission rate is observed after convergence. Second, using learning mechanisms for retransmissions can be an interesting way to reduce collisions in networks with massive IoT deployments. This can be seen in Fig. 4, where the random retransmission heuristic performs worse than the $\mathrm{UCB}$ -based approaches that use learning for channel selection during the retransmission procedure.

VII Conclusions

In this paper, we presented a retransmission model of LPWA networks based on an ALOHA protocol, slotted both in time and frequency, in which dynamic IoT devices can use machine learning algorithms to improve their PLR when accessing the network. The main novelty of this model is to address packet retransmissions upon radio collision by using a Multi-Armed Bandit framework. We presented and evaluated several learning heuristics that try to learn how to transmit and retransmit in a smarter way, by using the $\mathrm{UCB}$ algorithm for channel selection for the first transmission, and different proposals based on $\mathrm{UCB}$ for the retransmissions upon collisions.

We showed that incorporating learning for the transmission is needed to achieve optimal performance, with a significant gain in terms of successful transmission rate in networks with a large number of devices (up to $30\%$ in the example network). Our empirical simulations show that each of our proposed heuristics outperforms a naive random access scheme. Surprisingly, the main take-away message is that a simple $\mathrm{UCB}$ learning approach, which retransmits in the same channel, turns out to perform as well as more complicated heuristics.

Future works

The utility and impact of the proposed approaches for LPWA networks motivate us to address several subjects in future work. One of them is the non-stationarity of the channel occupancy caused by the learning policy employed by the IoT devices. To that end, modifications of MAB algorithms have been proposed, such as Sliding-Window- $\mathrm{UCB}$ or Discounted- $\mathrm{UCB}$ [21], or more recently M- $\mathrm{UCB}$ [22], but they have not yet been explored for the targeted problem.
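To make the sliding-window idea concrete, the following is a minimal sketch of a windowed $\mathrm{UCB}$ policy that only remembers the most recent transmissions. It is an illustration under stated assumptions rather than the exact algorithm of [21]: the class name `SlidingWindowUCB`, the window length `window` and the exploration constant `xi` are all hypothetical choices, and none of them was evaluated in this paper.

```python
import math
from collections import deque


class SlidingWindowUCB:
    """UCB index computed only on the last `window` transmissions."""

    def __init__(self, n_channels, window=50, xi=0.5):
        self.n_channels = n_channels
        self.window = window  # assumed window length
        self.xi = xi  # assumed exploration constant
        self.history = deque()  # (channel, reward) pairs, oldest first
        self.counts = [0] * n_channels  # plays per channel inside the window
        self.sums = [0.0] * n_channels  # rewards per channel inside the window
        self.t = 0  # total number of transmissions so far

    def select(self):
        # A channel with no sample left in the window is tried again first.
        for k in range(self.n_channels):
            if self.counts[k] == 0:
                return k
        span = min(self.t, self.window)

        def index(k):
            mean = self.sums[k] / self.counts[k]
            bonus = math.sqrt(self.xi * math.log(span) / self.counts[k])
            return mean + bonus

        return max(range(self.n_channels), key=index)

    def update(self, channel, reward):
        self.t += 1
        self.history.append((channel, reward))
        self.counts[channel] += 1
        self.sums[channel] += reward
        # Forget the oldest observation once the window is full.
        if len(self.history) > self.window:
            old_channel, old_reward = self.history.popleft()
            self.counts[old_channel] -= 1
            self.sums[old_channel] -= old_reward


if __name__ == "__main__":
    policy = SlidingWindowUCB(n_channels=2)
    for step in range(1000):
        k = policy.select()
        best = 0 if step < 500 else 1  # toy occupancy change at step 500
        policy.update(k, 1.0 if k == best else 0.0)
    print("plays per channel in the last window:", policy.counts)
```

Because old observations are discarded, the policy can follow a change in channel occupancy, at the cost of continuing to explore every channel from time to time.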

In order to validate our results in a realistic experimental setting and not only with simulations, future work includes a hardware implementation of the analyzed models to complete our recent works [23, 24]. A hardware demonstrator would also be useful for studying other settings by removing some hypotheses, for instance by studying a similar model in non-slotted time.

Note on the simulation code

The source code (MATLAB or Octave) used for the simulations and the figures is open-sourced under the MIT License, at Bitbucket.org/scee_ietr/ucb_smart_retrans.

Bibliography

[1] U. Raza, P. Kulkarni, and M. Sooriyabandara, “Low power wide area networks: An overview,” IEEE Communications Surveys Tutorials, vol. 19, no. 2, pp. 855–873, 2017.
[2] J. Mitola and G. Q. Maguire, “Cognitive Radio: making software radios more personal,” IEEE Personal Communications, vol. 6, pp. 13–18, Aug 1999.
[3] S. Haykin, “Cognitive Radio: Brain-Empowered Wireless Communications,” IEEE Journal on Selected Areas in Communications, vol. 23, no. 2, pp. 201–220, 2005.
[4] P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-time Analysis of the Multi-armed Bandit Problem,” Machine Learning, vol. 47, no. 2, pp. 235–256, 2002.
[5] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, “The Non-Stochastic Multi-Armed Bandit Problem,” SIAM Journal on Computing, vol. 32, no. 1, pp. 48–77, 2002.
[6] S. Bubeck, N. Cesa-Bianchi, et al., “Regret analysis of stochastic and nonstochastic multi-armed bandit problems,” Foundations and Trends® in Machine Learning, vol. 5, no. 1, pp. 1–122, 2012.
[7] R. Bonnefoi, C. Moy, and J. Palicot, “Improvement of the LPWAN AMI backhaul’s latency thanks to reinforcement learning algorithms,” EURASIP Journal on Wireless Communications and Networking, vol. 2018, no. 1, p. 34, 2018.
[8] A. Azari and C. Cavdar, “Self-organized Low-power IoT Networks: A Distributed Learning Approach,” in IEEE Globecom™, (Abu Dhabi, UAE), Dec 2018.
