Traffic-Aware Transmission Mode Selection in D2D-enabled Cellular Networks with Token System

Yiling Yuan; Tao Yang; Hui Feng; Bo Hu; Jianqiu Zhang; Bin Wang; Qiyong Lu

arXiv:1703.00660·cs.IT·June 14, 2017

TL;DR

This paper investigates an incentive-based D2D-enabled cellular network where user devices select transmission modes and token policies to optimize long-term utility, demonstrating the effectiveness of threshold-based strategies through simulations.

Contribution

It introduces an optimal transmission mode and token collection policy framework with proven threshold structure for D2D networks, enhancing efficiency and utility.

Findings

01

Optimal policy is a threshold strategy.

02

Thresholds exhibit monotonicity.

03

Transmission mode selection improves network utility.

Abstract

We consider a D2D-enabled cellular network where user equipments (UEs) owned by rational users are incentivized to form D2D pairs using tokens. They exchange tokens electronically to "buy" and "sell" D2D services. Meanwhile, the devices can choose the transmission mode, i.e. receiving data via cellular links or D2D links. Thus, given prior knowledge of the different benefits brought by diverse traffic types, the UEs can use their tokens more efficiently via transmission mode selection. In this paper, the optimal transmission mode selection strategy and the token collection policy are investigated to maximize the long-term utility in a dynamic network environment. The optimal policy is proved to be a threshold strategy, and the thresholds have a monotonicity property. Numerical simulations verify our observations and show the gain from transmission mode selection.

Tables

Table I: Action spaces

State $(s, k)$	Action space	Action	Physical meaning
$s \neq s_{0}$	$A_{M}$	$a_{M} = 0$	choose cellular mode
$s \neq s_{0}$	$A_{M}$	$a_{M} = 1$	choose D2D mode
$s = s_{0}$	$A_{R}$	$a_{R} = 0$	accept any D2D request
$s = s_{0}$	$A_{R}$	$a_{R} = 1$	refuse any D2D request

Table II: Environmental factors

Parameter	Physical meaning
$p$	probability of receiving D2D requests
$q$	probability of its D2D request being accepted

Equations

$$P\{(s^{\prime},k^{\prime})\mid(s,k),a\}=\begin{cases}p(s^{\prime})\{(1-a_{M})+a_{M}(1-q)\}, & s\neq s_{0},\ k>0,\ k^{\prime}=k\\ p(s^{\prime})qa_{M}, & s\neq s_{0},\ k>0,\ k^{\prime}=k-1\\ p(s^{\prime}), & s\neq s_{0},\ k=0,\ k^{\prime}=k\\ p(s^{\prime})\{a_{R}+(1-a_{R})(1-p)\}, & s=s_{0},\ k<K,\ k^{\prime}=k\\ p(s^{\prime})p(1-a_{R}), & s=s_{0},\ k<K,\ k^{\prime}=k+1\\ p(s^{\prime}), & s=s_{0},\ k=K,\ k^{\prime}=k\\ 0, & \text{otherwise.}\end{cases}\tag{1}$$

$$\mathbb{E}\{\mu(s,k,a)\}=\begin{cases}-cp(1-a_{R}), & s=s_{0}\\ qa_{M}b_{s}I(k>0), & s\neq s_{0}.\end{cases}\tag{2}$$

$$V^{\pi}(s_{0},k_{0})=\mathbb{E}\Big\{\sum_{t=0}^{\infty}\beta^{t}\mu(s_{t},k_{t},\pi(s_{t},k_{t}))\Big\},\tag{3}$$

$$\pi^{*}=\arg\max_{\pi}V^{\pi}(s_{0},k_{0}).\tag{4}$$

$$V^{*}(s,k)=\max_{a\in A(s,k)}\Big\{\mathbb{E}\{\mu(s,k,a)\}+\beta\sum_{s^{\prime}\in\mathcal{S},\,k^{\prime}\in\mathcal{K}}p(s^{\prime},k^{\prime}\mid s,k,a)V^{*}(s^{\prime},k^{\prime})\Big\}.\tag{5}$$

$$\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})\{V^{*}(s^{\prime},k)-V^{*}(s^{\prime},k-1)\}\geq b_{s}.\tag{6}$$

$$\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})\{V^{*}(s^{\prime},k+1)-V^{*}(s^{\prime},k)\}\geq c.\tag{7}$$

$$\Big[qa_{M}b_{s}+\beta\sum_{s^{\prime}\in\mathcal{S},\,k^{\prime}\in\mathcal{K}}p(s^{\prime},k^{\prime}\mid s,k,a_{M})V^{*}(s^{\prime},k^{\prime})\Big]_{a_{M}=0}\geq\Big[qa_{M}b_{s}+\beta\sum_{s^{\prime}\in\mathcal{S},\,k^{\prime}\in\mathcal{K}}p(s^{\prime},k^{\prime}\mid s,k,a_{M})V^{*}(s^{\prime},k^{\prime})\Big]_{a_{M}=1}.$$

$$\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})V^{*}(s^{\prime},k)\geq qb_{s}+\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})\{(1-q)V^{*}(s^{\prime},k)+qV^{*}(s^{\prime},k-1)\}.$$

$$\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})\{V^{n}(s^{\prime},k)-V^{n}(s^{\prime},k-1)\}\geq b_{s}.$$

$$\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})\{V^{n}(s^{\prime},k+1)-V^{n}(s^{\prime},k)\}\geq c.$$

$$V^{n+1}(s,k)=\mathbb{E}\{\mu(s,k,\pi^{n+1}(s,k))\}+\beta\sum_{s^{\prime}\in\mathcal{S},\,k^{\prime}\in\mathcal{K}}p(s^{\prime},k^{\prime}\mid s,k,\pi^{n+1}(s,k))V^{n}(s^{\prime},k^{\prime})$$

$$V^{n}(s,k+1)-V^{n}(s,k)\leq V^{n}(s,k)-V^{n}(s,k-1),\quad n\geq 0.\tag{8}$$

$$\Delta^{n}(k+1)-\Delta^{n}(k)\leq\Delta^{n}(k)-\Delta^{n}(k-1).\tag{9}$$

$$V^{n+1}(s,k-1)=\Delta^{n}(k-1),\quad V^{n+1}(s,k)=\Delta^{n}(k),\quad V^{n+1}(s,k+1)=qb_{s}+q\Delta^{n}(k)+(1-q)\Delta^{n}(k+1).$$

$$\begin{aligned}V^{n+1}(s,k+1)-V^{n+1}(s,k)&=qb_{s}+(1-q)\{\Delta^{n}(k+1)-\Delta^{n}(k)\}\\ &\overset{(a)}{\leq}q\{\Delta^{n}(k)-\Delta^{n}(k-1)\}+(1-q)\{\Delta^{n}(k+1)-\Delta^{n}(k)\}\\ &\leq\Delta^{n}(k)-\Delta^{n}(k-1)\\ &=V^{n+1}(s,k)-V^{n+1}(s,k-1).\end{aligned}$$

$$V^{n+1}(s,k-1)=\Delta^{n}(k-1),\quad V^{n+1}(s,k)=qb_{s}+q\Delta^{n}(k-1)+(1-q)\Delta^{n}(k),\quad V^{n+1}(s,k+1)=qb_{s}+q\Delta^{n}(k)+(1-q)\Delta^{n}(k+1).$$

$$\begin{aligned}V^{n+1}(s,k)-V^{n+1}(s,k-1)&=qb_{s}+(1-q)\{\Delta^{n}(k)-\Delta^{n}(k-1)\}\\ &\overset{(a)}{\geq}\Delta^{n}(k)-\Delta^{n}(k-1).\end{aligned}$$

$$\begin{aligned}V^{n+1}(s,k+1)-V^{n+1}(s,k)&=q\{\Delta^{n}(k)-\Delta^{n}(k-1)\}+(1-q)\{\Delta^{n}(k+1)-\Delta^{n}(k)\}\\ &\leq\Delta^{n}(k)-\Delta^{n}(k-1).\end{aligned}$$

$$\pi^{*}(s,k)=\begin{cases}0, & k<K_{th}(s)\\ 1, & k\geq K_{th}(s).\end{cases}$$

$$\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})\{V^{*}(s^{\prime},k)-V^{*}(s^{\prime},k-1)\}<b_{s}.$$

$$\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})\{V^{*}(s^{\prime},k+1)-V^{*}(s^{\prime},k)\}<b_{s},$$

$$Q_{s_{v}}(P_{snr})=4.5-\frac{3.5}{1+\exp(b_{1}(P_{snr}-b_{2}))},$$

$$Q_{s_{e}}(\theta)=b_{3}\log(b_{4}\theta),$$

Taxonomy

Topics: ICT Impact and Policies · Advanced MIMO Systems Optimization · Advanced Wireless Network Optimization

Full text

Traffic-Aware Transmission Mode Selection in D2D-enabled Cellular Networks with Token System

Yiling Yuan¹, Tao Yang¹, Hui Feng¹, Bo Hu¹,², Jianqiu Zhang¹, Bin Wang¹ and Qiyong Lu¹

¹ Research Center of Smart Networks and Systems, School of Information Science and Engineering

² Key Laboratory of EMW Information (MoE)

Fudan University, Shanghai, China, 200433

Emails: {yilingyuan13, taoyang, hfeng, bohu, jqzhang01, wangbin, lqyong}@fudan.edu.cn

I Introduction

To meet the dramatically increasing traffic demand and provide a better user experience, device-to-device (D2D) communication has been proposed. This technology, which enables direct communication between two mobile users in proximity, has attracted attention in both industry and academia [1, 2, 3]. The adoption of D2D communication allows high-rate, low-delay and low-power transmissions [4].

Recent research on D2D communication mainly focuses on developing optimization and game-theoretic frameworks for mode selection, resource allocation or interference management in order to maximize throughput or improve energy efficiency [5, 6, 7]. These studies assume that many devices are already in D2D communication mode. However, this assumption needs to be re-examined in realistic scenarios. UEs are owned by self-interested users who aim to maximize their individual utilities. In practice, they have no incentive to provide D2D service unless they receive satisfactory rewards. Therefore, it is crucial to design a proper incentive mechanism to encourage UEs to form D2D pairs [8].

In this paper, we design a token-based incentive system. In such a system, UEs pay tokens to or gain tokens from other UEs in exchange for D2D service. Some previous works have investigated token systems for cooperative relaying in cellular networks [9, 10]. In [9], the authors designed a token incentive scheme from the perspective of a system designer. In [10], the authors investigated how UEs can learn token gathering strategies online. However, neither of them takes into account how UEs decide between two alternatives, i.e. a D2D link versus a cellular link. The former consumes tokens while the latter does not. In practice, there are various types of traffic, which yield different benefits in D2D communication. If transmission mode selection is taken into account, tokens can be used more efficiently. Intuitively, the UE could spend more tokens on more beneficial traffic types to improve his utility. Therefore, it is crucial to answer the question “when to use tokens” or, equivalently, “which transmission mode to choose”. To the best of our knowledge, our work is the first attempt in the literature to investigate the token consumption policy in a token system designed for D2D-enabled cellular networks.

In this paper, we consider a D2D-enabled cellular network where UEs are incentivized to form D2D pairs using tokens. We formulate a Markov decision process (MDP) model to characterize the interaction between each UE and the environment, i.e. the transmission mode selection policy and the token collection strategy. When traffic arrives, a UE first chooses the transmission mode; when idle, he decides whether to accept D2D requests. The objective of a UE is to maximize his long-term utility, defined as the difference between the benefit he obtains when receiving data through a D2D link and the cost he pays when providing D2D service. Furthermore, the structure of the optimal policy is investigated. Unlike [10, 11], the optimal policy is analytically proved to be a threshold policy in the number of tokens, instead of this property being taken as an assumption. Moreover, it turns out that the threshold decreases as the benefit of the traffic type increases. The numerical simulations verify our observations and show the gain from transmission mode selection.

The rest of this paper is organized as follows. In Section II, the system model is discussed. In Section III, the MDP model for an individual UE’s decision problem is developed. In Section IV, we investigate the structure of the optimal policy. Section V gives some numerical simulation results, and finally Section VI concludes this paper.

II System Model

II-A Network Model

In this paper, we consider a D2D-enabled wireless cellular network and the slot based system is adopted. At each slot, when traffic arrives, such as transferring a file, the UE will choose the transmission mode and start a transmission procedure. The transmission modes include cellular mode and D2D mode. The former mode corresponds to the conventional cellular communication and the latter mode represents D2D communication. According to the given policy as well as the available information, the decision is made at the beginning of each slot. Note that each slot may last several seconds and each type of traffic may last multiple slots.

Without loss of generality, we assume that for any traffic type, D2D mode always obtains a higher benefit than cellular mode. This is reasonable due to the lower power consumption and higher throughput of the D2D link. For convenience, suppose the utility of cellular mode is $0$. Considering the different requirements of different traffic types, we define the specific utility of each type of traffic according to its characteristics. There are some widely used classifications in the literature under various practical considerations. Basically, in terms of throughput requirements, traffic can be divided into two types, stream traffic and elastic traffic [12]. Besides, traffic can also be classified as video traffic, audio traffic and file transfer according to the type of application [13].

Similar to [14], we do not specify a concrete traffic classification. Instead, we assume there are $N$ types of traffic and denote the traffic type set as $\mathcal{S}^{o}=\{s_{1},s_{2},\cdots,s_{N}\}$. In particular, we regard $s_{0}$ as a special type of traffic, namely, the idle state. Hence we can define the extended traffic type set as $\mathcal{S}=\mathcal{S}^{o}\cup\{s_{0}\}$. The stationary probability of each type $s\in\mathcal{S}$ is $p(s)$ with $0<p(s)<1$ and $\sum_{s\in\mathcal{S}}p(s)=1$. We denote by $b_{s}$ the benefit of D2D mode for traffic type $s\in\mathcal{S}^{o}$. Moreover, we assume that $0<b_{s_{1}}<b_{s_{2}}<\cdots<b_{s_{N}}$.

II-B Token System

Although D2D communication has multiple advantages, UEs are generally reluctant to provide D2D service since this incurs cost and provides them with no reward. To overcome this difficulty, we use a token system to incentivize UEs to accept D2D requests. Specifically, a UE must spend tokens in exchange for receiving data through a D2D link, and can only earn tokens by providing D2D service for other UEs. Because the device works in half-duplex mode and its own traffic demand must be met, it is reasonable to assume that a UE can provide D2D transmission service only in the idle state.

The token system has many advantages [9]. First, there is no extra payment exchange involved, which avoids many financial problems associated with other monetary incentive schemes. Second, no personal information exchange is required, which allows secure implementation. Recently, several techniques that could enable electronic token transactions have been proposed [15]. We assume that our token system is implemented using such technologies.

The entire system works as follows. Suppose UE $j$ decides to start a D2D transmission. UE $j$ sends a D2D request to a UE $j^{*}$ selected under a predefined criterion. If the request is accepted, UE $j$ pays one token to UE $j^{*}$ through the token exchange system. Otherwise, UE $j$ seeks another UE to forward his traffic. If no UE accepts the request, UE $j$ has to deliver the data through the BS. Moreover, we assume that a UE can serve only one other UE at a time. Therefore, an idle UE that receives multiple D2D requests accepts only one or rejects all.

III Problem Formulation

In this section, we formulate the UE’s decision problem as an MDP. When a UE has no tokens, he has no choice but to choose cellular mode. In addition, a UE would spend as many tokens as possible on the traffic types with high utility in order to maximize his utility. Therefore, we need to investigate the optimal strategy, which includes the transmission mode selection policy and the token collection strategy.

III-A State and Action Spaces

Token holding state: At any given slot $t$ , the UE holds $k_{t}\in\mathcal{K}=\{0,1,\cdots,K\}$ tokens, where $K$ is the maximal number of tokens allowed in the system.

Traffic type state: At different slots, the UE may have different types of traffic. Denote the type of traffic in slot $t$ as $s_{t}\in\mathcal{S}$. Assume that the traffic types of different slots are mutually independent.

The state parameters defined above can be used to describe the UE’s private information at slot $t$ . Hence, let $\Omega_{t}=(s_{t},k_{t})$ denote the state of the UE at slot $t$ .

When $s\neq s_{0}$ , which means a specified traffic arrives, the UE can take an action to choose D2D mode or cellular mode. We denote the action taken when $s\neq s_{0}$ as $a_{M}\in A_{M}=\{0,1\}$ . $a_{M}=0$ and $a_{M}=1$ represent the cellular mode and D2D mode, respectively.

When $s=s_{0}$, the UE can decide whether to accept D2D requests from other UEs. In this situation, we denote the action taken as $a_{R}\in A_{R}=\{0,1\}$. Here $a_{R}=0$ means that the UE accepts a D2D request to earn one token, and $a_{R}=1$ means that the UE refuses to provide D2D service for other UEs. Putting all these together, the action space $A(s,k)$ is shown in Table I.

III-B Transition Probability

Now we discuss the state transition probability. Let $P\{(s^{\prime},k^{\prime})|(s,k),a\}$ denote the state transition probability function, which represents the probability that the UE transfers from state $\Omega=(s,k)$ to state $\Omega^{\prime}=(s^{\prime},k^{\prime})$ depending on the action $a$ .

Because a D2D request may not be accepted, and a UE may not receive any D2D requests even if he takes the action $a_{R}=0$, the state transition is influenced by a complicated, time-varying environment. We use a stochastic model to describe the environmental dynamics. The associated environmental factors are shown in Table II. Specifically, we use $0<p<1$ to denote the probability of receiving D2D requests when the UE takes the action $a_{R}=0$, and $0<q<1$ to denote the probability of the D2D request being accepted when the UE takes the action $a_{M}=1$. These parameters are unknown a priori, but can be learned from history or through reinforcement learning methods [16], such as Q-learning.

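Consequently, the state transition probability is given in (1). Each case is explained in detail afterwards.

$$P\{(s^{\prime},k^{\prime})\mid(s,k),a\}=\begin{cases}p(s^{\prime})\{(1-a_{M})+a_{M}(1-q)\}, & s\neq s_{0},\ k>0,\ k^{\prime}=k\\ p(s^{\prime})qa_{M}, & s\neq s_{0},\ k>0,\ k^{\prime}=k-1\\ p(s^{\prime}), & s\neq s_{0},\ k=0,\ k^{\prime}=k\\ p(s^{\prime})\{a_{R}+(1-a_{R})(1-p)\}, & s=s_{0},\ k<K,\ k^{\prime}=k\\ p(s^{\prime})p(1-a_{R}), & s=s_{0},\ k<K,\ k^{\prime}=k+1\\ p(s^{\prime}), & s=s_{0},\ k=K,\ k^{\prime}=k\\ 0, & \text{otherwise.}\end{cases}\tag{1}$$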

First, we consider the case $s\neq s_{0}$, so that $a=a_{M}\in A_{M}$. If $k>0$, the number of tokens can decrease by one or stay unchanged depending on the selected action. There are two possibilities for the transition from $(s,k)$ to $(s^{\prime},k)$: either the UE takes the action $a_{M}=0$, which gives $P\{(s^{\prime},k)|(s,k),a_{M}=0\}=p(s^{\prime})$; or the UE takes the action $a_{M}=1$ but the D2D request is rejected by all potential UEs, which gives $P\{(s^{\prime},k)|(s,k),a_{M}=1\}=p(s^{\prime})(1-q)$. Putting them together, we obtain $P\{(s^{\prime},k)|(s,k),a_{M}\}=p(s^{\prime})\{(1-a_{M})+a_{M}(1-q)\}$. The transition from $(s,k)$ to $(s^{\prime},k-1)$ happens only when the UE takes the action $a_{M}=1$ and the D2D request is accepted. Thus, the transition probability is $p(s^{\prime})qa_{M}$, which is nonzero only when $a_{M}=1$. Besides, the probability of the transition from $(s,0)$ to $(s^{\prime},0)$ is $p(s^{\prime})$ no matter which action is taken, since the action $a_{M}=1$ is meaningless without tokens. All other transitions have probability zero. Following a similar argument, we obtain the transition probabilities when the UE is idle.
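The case analysis above can be written down directly as a function. The following is a minimal sketch rather than the authors' implementation: the function name, the integer encoding of traffic types (index 0 standing for the idle type $s_{0}$) and the numerical values of $p$ and $q$ are assumptions made for illustration. The loop at the end checks that the probabilities out of every state-action pair sum to one.

```python
from itertools import product


def transition_prob(s, k, a, s_next, k_next, p_s, p, q, K):
    """P{(s', k') | (s, k), a} following the case analysis in the text.

    s, s_next: traffic-type indices; 0 stands for the idle type s_0 (assumed encoding).
    a: a_M (1 = D2D mode) when s != 0, a_R (0 = accept requests) when s == 0.
    p_s: stationary probabilities p(s'), indexed by traffic type.
    """
    base = p_s[s_next]
    if s != 0:  # traffic has arrived, so a is a_M
        if k > 0:
            if k_next == k:
                return base * ((1 - a) + a * (1 - q))
            if k_next == k - 1:
                return base * q * a
            return 0.0
        return base if k_next == k else 0.0  # no tokens: D2D is unavailable
    # idle state, so a is a_R
    if k < K:
        if k_next == k:
            return base * (a + (1 - a) * (1 - p))
        if k_next == k + 1:
            return base * p * (1 - a)
        return 0.0
    return base if k_next == k else 0.0  # token cap K reached


# Illustrative parameters: five equiprobable types, K = 20; p and q are assumed values.
p_s = [0.2, 0.2, 0.2, 0.2, 0.2]
K, p, q = 20, 0.8, 0.8

# Sanity check: for every state and action the kernel is a probability distribution.
for s, k, a in product(range(5), range(K + 1), (0, 1)):
    total = sum(transition_prob(s, k, a, s2, k2, p_s, p, q, K)
                for s2 in range(5) for k2 in range(K + 1))
    assert abs(total - 1.0) < 1e-12
```

Keeping the two branches ($s\neq s_{0}$ and $s=s_{0}$) separate mirrors the two action spaces $A_{M}$ and $A_{R}$, so the same argument `a` is interpreted according to the current traffic type.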

III-C Reward

When the UE provides D2D service for another UE, the cost incurred is defined as $c$. The cost can be thought of as the average cost over all possible D2D transmissions because we only care about the average utility in our model. Thus, the expected reward $\mu(s,k,a)$ for state $(s,k)$ and action $a$ is

$$\mathbb{E}\{\mu(s,k,a)\}=\begin{cases}-cp(1-a_{R}), & s=s_{0}\\ qa_{M}b_{s}I(k>0), & s\neq s_{0},\end{cases}\tag{2}$$

where $\mathbb{E}\{\cdot\}$ is the expectation and $I(\cdot)$ is the indicator function.
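The expected reward has only two branches, which the short sketch below makes explicit. The function name, the list `b` mapping a traffic-type index to $b_{s}$ (with index 0 reserved for the idle type) and the numerical values are illustrative assumptions, not part of the original model description.

```python
def expected_reward(s, k, a, b, c, p, q):
    """Expected one-slot reward E{mu(s, k, a)}; s == 0 is the idle type (assumed encoding)."""
    if s == 0:
        # a is a_R: the cost c is paid only if the UE accepts (a_R = 0)
        # and a request actually arrives (probability p).
        return -c * p * (1 - a)
    # a is a_M: the benefit b_s is earned only in D2D mode (a_M = 1), with at
    # least one token in hand, and if the request is accepted (probability q).
    return q * a * b[s] * (1 if k > 0 else 0)


# Illustrative values: benefits 3, 4, 5, 6 for four traffic types and cost c = 1.
b = [0.0, 3.0, 4.0, 5.0, 6.0]
print(expected_reward(0, 5, 0, b, c=1.0, p=0.8, q=0.8))  # idle UE that accepts requests
print(expected_reward(2, 3, 1, b, c=1.0, p=0.8, q=0.8))  # type s_2 served in D2D mode
```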

III-D Optimization Problem Formulation

A policy $\pi$ is defined as a function that specifies the action $\pi(s,k)$ to be taken in state $(s,k)$. When $s\neq s_{0}$, $\pi(s,k)$ represents the transmission mode selection policy, and when $s=s_{0}$ it corresponds to the token collection policy. The expected utility obtained by executing policy $\pi$ starting at state $(s_{0},k_{0})$ is given by

$$V^{\pi}(s_{0},k_{0})=\mathbb{E}\Big\{\sum_{t=0}^{\infty}\beta^{t}\mu(s_{t},k_{t},\pi(s_{t},k_{t}))\Big\},\tag{3}$$

where $\beta\in(0,1)$ is the discount factor.

Our goal is to find the optimal policy $\pi^{*}$ that maximizes the expected utility, which can be expressed as the optimization problem shown in (4).

$$\pi^{*}=\arg\max_{\pi}V^{\pi}(s_{0},k_{0}).\tag{4}$$

Value iteration or policy iteration[17] can be used to obtain the optimal policy when $p$ and $q$ are known. When these environmental parameters are unknown, Q-learning[16] can be adopted.

IV Optimal Policy for a Single UE

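In this section, we investigate the structure of the optimal policy. We will prove that the optimal policy has a threshold structure. In [9], this property is proved only for the one-dimensional state case, whereas a two-dimensional state case is analyzed here.

Let $V^{*}(s,k)=V^{\pi^{*}}(s,k)$ for brevity. It is given by the solution of the Bellman equation shown in (5) [17].

$$V^{*}(s,k)=\max_{a\in A(s,k)}\Big\{\mathbb{E}\{\mu(s,k,a)\}+\beta\sum_{s^{\prime}\in\mathcal{S},\,k^{\prime}\in\mathcal{K}}p(s^{\prime},k^{\prime}\mid s,k,a)V^{*}(s^{\prime},k^{\prime})\Big\}.\tag{5}$$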

The optimal policy $\pi^{*}(s,k)$ is the action $a\in A(s,k)$ that maximizes the right-hand side of the Bellman equation. It is easy to see that $\pi^{*}(s,0)=0$ for $s\neq s_{0}$ and $\pi^{*}(s_{0},K)=1$. From the Bellman equation, it turns out that the optimal strategy has the following one-shot deviation property [9].

Lemma 1

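The optimal strategy $\pi^{*}$ has the following properties:

(1) For $s\neq s_{0},k>0$, $\pi^{*}(s,k)=0$ if and only if

$$\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})\{V^{*}(s^{\prime},k)-V^{*}(s^{\prime},k-1)\}\geq b_{s}.\tag{6}$$

(2) For $s=s_{0},k<K$, $\pi^{*}(s,k)=0$ if and only if

$$\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})\{V^{*}(s^{\prime},k+1)-V^{*}(s^{\prime},k)\}\geq c.\tag{7}$$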

Proof:

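For $s\neq s_{0},k>0$, based on the Bellman equation (5), $\pi^{*}(s,k)=0$ if and only if

$$\Big[qa_{M}b_{s}+\beta\sum_{s^{\prime}\in\mathcal{S},\,k^{\prime}\in\mathcal{K}}p(s^{\prime},k^{\prime}\mid s,k,a_{M})V^{*}(s^{\prime},k^{\prime})\Big]_{a_{M}=0}\geq\Big[qa_{M}b_{s}+\beta\sum_{s^{\prime}\in\mathcal{S},\,k^{\prime}\in\mathcal{K}}p(s^{\prime},k^{\prime}\mid s,k,a_{M})V^{*}(s^{\prime},k^{\prime})\Big]_{a_{M}=1}.$$

Using the transition probability in (1), we obtain the following inequality.

$$\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})V^{*}(s^{\prime},k)\geq qb_{s}+\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})\{(1-q)V^{*}(s^{\prime},k)+qV^{*}(s^{\prime},k-1)\}.$$

After some simple algebraic operations, inequality (6) follows.

The second part of Lemma 1 can be proved by a similar argument. ∎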

The LHS of (6) is the opportunity cost of using one token at this point and the RHS of (6) is the immediate utility brought by this action. When the opportunity cost is no less than the immediate utility, the UE chooses $a_{M}=0$, namely cellular mode. We can interpret (7) in a similar way.
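The algebra omitted in the proof of Lemma 1 can be spelled out as follows. This is a worked step added for clarity, derived only from the transition probability (1) and the fact that $q>0$; it is not quoted from the original proof.

```latex
\begin{align*}
&\beta\sum_{s'\in\mathcal{S}}p(s')V^{*}(s',k)
  \;\geq\; qb_{s}+\beta\sum_{s'\in\mathcal{S}}p(s')\bigl\{(1-q)V^{*}(s',k)+qV^{*}(s',k-1)\bigr\}\\
\iff\;&\beta\sum_{s'\in\mathcal{S}}p(s')\bigl\{V^{*}(s',k)-(1-q)V^{*}(s',k)-qV^{*}(s',k-1)\bigr\}\;\geq\; qb_{s}\\
\iff\;&q\,\beta\sum_{s'\in\mathcal{S}}p(s')\bigl\{V^{*}(s',k)-V^{*}(s',k-1)\bigr\}\;\geq\; qb_{s}\\
\iff\;&\beta\sum_{s'\in\mathcal{S}}p(s')\bigl\{V^{*}(s',k)-V^{*}(s',k-1)\bigr\}\;\geq\; b_{s}
  \qquad(\text{dividing by } q>0).
\end{align*}
```

The acceptance probability $q$ cancels, so the mode decision compares the discounted value of keeping one more token with the benefit $b_{s}$ itself.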

Here we assume that the environmental factors $p$ and $q$ are known a priori. Thus the value iteration algorithm can be used to obtain the optimal policy, which is depicted in Algorithm 1.
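The listing of Algorithm 1 is not reproduced in this text, so the following is only a sketch of standard value iteration applied to the Bellman equation (5) with the transition probability (1) and the expected reward (2). The function name, the zero initialization, the stopping tolerance, the tie-breaking toward action 0 (chosen to match the "if and only if" conditions of Lemma 1) and the values $\beta=0.9$, $p=q=0.5$ are all assumptions made for illustration.

```python
def value_iteration(p_s, b, c, p, q, K, beta, tol=1e-9, max_iter=100_000):
    """Return (V, policy) as tables indexed [s][k]; index 0 is the idle type s_0.

    policy[s][k] is a_M for s != 0 (1 = D2D mode) and a_R for s == 0 (1 = refuse).
    """
    n = len(p_s)
    V = [[0.0] * (K + 1) for _ in range(n)]  # assumed initialization V^0 = 0
    policy = [[0] * (K + 1) for _ in range(n)]
    for _ in range(max_iter):
        # D[k] = beta * sum_{s'} p(s') V(s', k): discounted value of holding k tokens.
        D = [beta * sum(p_s[s2] * V[s2][k] for s2 in range(n)) for k in range(K + 1)]
        V_new = [[0.0] * (K + 1) for _ in range(n)]
        policy = [[0] * (K + 1) for _ in range(n)]
        for k in range(K + 1):
            # Idle type: accept (a_R = 0) earns a token with probability p, at cost c.
            up = D[k + 1] if k < K else D[k]  # at the cap K no token can be added
            accept = -c * p + p * up + (1 - p) * D[k]
            refuse = D[k]
            policy[0][k] = 0 if accept >= refuse else 1
            V_new[0][k] = max(accept, refuse)
            # Traffic types: cellular (a_M = 0) keeps the token, D2D (a_M = 1) may spend it.
            for s in range(1, n):
                cellular = D[k]
                d2d = q * b[s] + q * D[k - 1] + (1 - q) * D[k] if k > 0 else cellular
                policy[s][k] = 0 if cellular >= d2d else 1
                V_new[s][k] = max(cellular, d2d)
        diff = max(abs(V_new[s][k] - V[s][k]) for s in range(n) for k in range(K + 1))
        V = V_new
        if diff < tol:
            break
    return V, policy


# Illustrative setting: benefits and cost from Section V; beta, p, q are assumed values.
p_s = [0.2] * 5
b = [0.0, 3.0, 4.0, 5.0, 6.0]
V, policy = value_iteration(p_s, b, c=1.0, p=0.5, q=0.5, K=20, beta=0.9)

# K_th(s): the smallest k at which the policy switches from action 0 to action 1.
thresholds = [row.index(1) if 1 in row else len(row) for row in policy]
print(thresholds)
```

Because the traffic type of the next slot is independent of the current state, the continuation value depends only on the next token count, which is why a single vector `D` suffices in each sweep.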

Now we show that the utility function $V^{n}(s,k)$ has diminishing marginal returns in $k$ at each iteration of Algorithm 1. This property is stated in Theorem 1.

Theorem 1 (The marginal diminishing utility)

At each iteration of Algorithm 1, the following inequality holds:

$$V^{n}(s,k+1)-V^{n}(s,k)\leq V^{n}(s,k)-V^{n}(s,k-1),\quad n\geq 0.\tag{8}$$

Proof:

We use induction to show that (8) holds for all $n\geq 0$.

By the initialization step of Algorithm 1, (8) holds for $n=0$.

Suppose the induction hypothesis holds for some $n\geq 0$. The proof that (8) holds for $n+1$ has two parts. First we show that $\pi^{n+1}(s,k)$ has a threshold structure, which is then used to verify (8) for $n+1$ in the second part. For notational conciseness, we define $\Delta^{n}(k)\triangleq\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})V^{n}(s^{\prime},k)$. By the induction hypothesis, the following inequality holds:

$$\Delta^{n}(k+1)-\Delta^{n}(k)\leq\Delta^{n}(k)-\Delta^{n}(k-1).\tag{9}$$

We first show the threshold structure of $\pi^{n+1}(s,k)$. It suffices to prove that if $\pi^{n+1}(s,k+1)=0$, then $\pi^{n+1}(s,k)=0$. When $s\neq s_{0}$, given step 2 of the algorithm and using (9), we get the inequality $\Delta^{n}(k)-\Delta^{n}(k-1)\geq\Delta^{n}(k+1)-\Delta^{n}(k)\geq b_{s}$, so $\pi^{n+1}(s,k)=0$. The case $s=s_{0}$ is proved similarly.

Next we will prove that given the utility function obtained in step 2 of the algorithm, (8) holds for $n+1$ .

When $s\neq s_{0}$ , we only need to consider four cases due to the threshold structure of the policy.

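Case 1: $\pi^{n+1}(s,k-1)=\pi^{n+1}(s,k)=0$ and $\pi^{n+1}(s,k+1)=1$. Thus

$$V^{n+1}(s,k-1)=\Delta^{n}(k-1),\quad V^{n+1}(s,k)=\Delta^{n}(k),\quad V^{n+1}(s,k+1)=qb_{s}+q\Delta^{n}(k)+(1-q)\Delta^{n}(k+1).$$

Then, we can get:

$$\begin{aligned}V^{n+1}(s,k+1)-V^{n+1}(s,k)&=qb_{s}+(1-q)\{\Delta^{n}(k+1)-\Delta^{n}(k)\}\\ &\overset{(a)}{\leq}q\{\Delta^{n}(k)-\Delta^{n}(k-1)\}+(1-q)\{\Delta^{n}(k+1)-\Delta^{n}(k)\}\\ &\leq\Delta^{n}(k)-\Delta^{n}(k-1)\\ &=V^{n+1}(s,k)-V^{n+1}(s,k-1).\end{aligned}$$

Inequality (a) uses the fact that $\pi^{n+1}(s,k)=0$ amounts to $\Delta^{n}(k)-\Delta^{n}(k-1)\geq b_{s}$, and the second inequality follows from (9).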

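Case 2: $\pi^{n+1}(s,k-1)=0$ and $\pi^{n+1}(s,k)=\pi^{n+1}(s,k+1)=1$. Thus

$$V^{n+1}(s,k-1)=\Delta^{n}(k-1),\quad V^{n+1}(s,k)=qb_{s}+q\Delta^{n}(k-1)+(1-q)\Delta^{n}(k),\quad V^{n+1}(s,k+1)=qb_{s}+q\Delta^{n}(k)+(1-q)\Delta^{n}(k+1).$$

Then, the following inequality can be obtained:

$$\begin{aligned}V^{n+1}(s,k)-V^{n+1}(s,k-1)&=qb_{s}+(1-q)\{\Delta^{n}(k)-\Delta^{n}(k-1)\}\\ &\overset{(a)}{\geq}\Delta^{n}(k)-\Delta^{n}(k-1).\end{aligned}$$

Inequality (a) holds because $\pi^{n+1}(s,k)=1$ implies $\Delta^{n}(k)-\Delta^{n}(k-1)\leq b_{s}$. Moreover, using (9), we find that:

$$\begin{aligned}V^{n+1}(s,k+1)-V^{n+1}(s,k)&=q\{\Delta^{n}(k)-\Delta^{n}(k-1)\}+(1-q)\{\Delta^{n}(k+1)-\Delta^{n}(k)\}\\ &\leq\Delta^{n}(k)-\Delta^{n}(k-1).\end{aligned}$$

Therefore, (8) holds for $n+1$ in this case.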

For the case where $\pi^{n+1}(s,k-1)=\pi^{n+1}(s,k)=\pi^{n+1}(s,k+1)=0$ or $\pi^{n+1}(s,k-1)=\pi^{n+1}(s,k)=\pi^{n+1}(s,k+1)=1$ , it is easy to verify the inequality.

Similarly, we can verify the inequality $V^{n+1}(s,k+1)-V^{n+1}(s,k)\leq V^{n+1}(s,k)-V^{n+1}(s,k-1)$ when $s=s_{0}$ . ∎

Remark 1

Theorem 1 indicates that the marginal reward of owning an additional token decreases. The incentive of holding a token is that the UE can use the token to request D2D service to improve his utility. However, keeping tokens has inherent risk modeled by $\beta$ , which exponentially “discounts” future rewards.

Furthermore, from the proof of Theorem 1, we can find an important fact that the optimal policy is a threshold strategy in $k$ for a given traffic type.

Proposition 1 (Threshold structure)

The optimal policy is a threshold strategy when the traffic type is given. Specifically, there exists a constant $K_{th}(s)$ depending on the type of traffic $s\in\mathcal{S}$, such that:

$$\pi^{*}(s,k)=\begin{cases}0, & k<K_{th}(s)\\ 1, & k\geq K_{th}(s).\end{cases}$$

Proof:

We only consider the case $s\neq s_{0}$ here; the proof is similar when $s=s_{0}$. Recall that $\pi^{*}(s,0)=0$ for $s\neq s_{0}$; therefore it is sufficient to show that if $\pi^{*}(s,k)=1$ ($k\geq 1$, $s\neq s_{0}$), then $\pi^{*}(s,k+1)=1$.

Suppose $\pi^{*}(s,k)=1$ ($s\neq s_{0}$). According to Lemma 1, we find that

$$\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})\{V^{*}(s^{\prime},k)-V^{*}(s^{\prime},k-1)\}<b_{s}.$$

Additionally, Theorem 1 implies that ${V^{*}}(s^{\prime},k+1)-{V^{*}}(s^{\prime},k)\leq{V^{*}}(s^{\prime},k)-{V^{*}}(s^{\prime},k-1)$. Therefore, we have

$$\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})\{V^{*}(s^{\prime},k+1)-V^{*}(s^{\prime},k)\}<b_{s},$$

which implies that $\pi^{*}(s,k+1)=1$. ∎

Intuitively, for traffic type $s\neq s_{0}$, if the UE chooses D2D mode when owning $k$ tokens, he still chooses D2D mode when more tokens are available. In fact, many research works assume this property for simplicity when they build their models. Unlike these works, we analytically prove that the optimal policy has a threshold structure rather than assuming it.

Remark 2

According to Proposition 1, only $|\mathcal{S}|$ thresholds are needed to define the optimal policy. Therefore, the size of the search space is significantly reduced because the number of traffic types is small. Note that this property still holds when the traffic types of adjacent slots are dependent.

Moreover, it turns out that the thresholds have a monotonicity property.

Proposition 2 (Monotonicity)

If $b_{i}<b_{j}(i,j\neq s_{0})$ , then $K_{th}(j)\leq K_{th}(i)$ where $K_{th}(s)$ is the threshold defined in Proposition 1.

Proof:

It is sufficient to verify that if $b_{i}<b_{j}$ ($i,j\neq s_{0}$) and $\pi^{*}(j,k)=0$, then $\pi^{*}(i,k)=0$. According to Lemma 1, $\pi^{*}(j,k)=0$ gives $\beta\sum_{s^{\prime}\in\mathcal{S}}p(s^{\prime})\{V^{*}(s^{\prime},k)-V^{*}(s^{\prime},k-1)\}\geq b_{j}>b_{i}$, and thus $\pi^{*}(i,k)=0$ by the sufficient condition for the optimal policy in Lemma 1. ∎

Proposition 2 implies that the more beneficial traffic types have higher probability to be served in D2D mode due to the lower threshold. It means that the UE will spend more tokens on those traffic types. Consequently, the UE’s long-term utility is improved.
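Propositions 1 and 2 mean that a policy can be stored as one threshold per traffic type and checked for monotonicity with a few lines of code. The helper names and the example thresholds below are hypothetical; they only illustrate the data structure and are not results from the paper.

```python
def policy_from_thresholds(k_th, K):
    """Expand thresholds K_th(s) into the full table pi[s][k]: 0 below, 1 at or above."""
    return [[1 if k >= t else 0 for k in range(K + 1)] for t in k_th]


def thresholds_from_policy(policy):
    """Recover K_th(s) for each row; raise if a row is not a threshold rule."""
    out = []
    for row in policy:
        t = row.index(1) if 1 in row else len(row)
        if row != [0] * t + [1] * (len(row) - t):
            raise ValueError("row is not a threshold policy")
        out.append(t)
    return out


def satisfies_monotonicity(k_th_traffic, benefits):
    """Proposition 2: b_i < b_j implies K_th(j) <= K_th(i) (benefits assumed distinct)."""
    pairs = sorted(zip(benefits, k_th_traffic))
    return all(t_hi <= t_lo for (_, t_lo), (_, t_hi) in zip(pairs, pairs[1:]))


# Hypothetical thresholds for four traffic types with benefits 3, 4, 5, 6.
k_th = [6, 4, 2, 1]
table = policy_from_thresholds(k_th, K=20)
print(thresholds_from_policy(table) == k_th)
print(satisfies_monotonicity(k_th, [3, 4, 5, 6]))
```

Storing $|\mathcal{S}|$ integers instead of a table with $|\mathcal{S}|(K+1)$ entries is the search-space reduction noted in Remark 2.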

V Numerical Simulations

In this section, we give simulations to verify the analytical results. First, we present several numerical results to show the structure of the optimal policy and illustrate the behavior of the optimal threshold $K_{th}(s)$ ($s\neq s_{0}$) with respect to other parameters. We assume that $s_{1},s_{2},s_{3},s_{4}$ belong to $\mathcal{S}^{o}$ and $p_{s_{0}}=p_{s_{1}}=p_{s_{2}}=p_{s_{3}}=p_{s_{4}}=0.2$. The benefits of these traffic types are $3,4,5,6$, respectively, the cost is $c=1$ and $K=20$. These parameters are set for illustration purposes, and a more realistic scenario will be considered later.

The optimal policy is given in Fig. 1. As indicated in Proposition 1, the optimal policy is a threshold policy in the token state $k$. Fig. 2a shows how the thresholds vary with the discount factor $\beta$. Note that the threshold for the most beneficial traffic type is always one, so it is omitted here. The optimal threshold is non-decreasing in $\beta$. This is because a far-sighted UE tends to wait for a better chance to consume tokens, i.e. more beneficial traffic types, and thus has less incentive to use tokens if the benefit of the current traffic is low. Fig. 2b illustrates how the thresholds vary with the environmental factor $p$. The optimal threshold decreases as $p$ increases. This is because tokens are easier to collect as $p$ increases, which gives the UE more incentive to use tokens even when $b_{s}$ is low. Fig. 2c shows the variation of the thresholds with the environmental factor $q$. The optimal threshold decreases as $q$ decreases, since a D2D request is seldom accepted when $q$ is low, and thus a UE has more incentive to take every opportunity to seek D2D service. Additionally, as proved in Proposition 2, the threshold decreases as the benefit $b_{s}$ increases.

Furthermore, we give simulations to show the gain obtained from transmission mode selection. A more realistic scenario is considered, where traffic is divided into two types: $s_{v}$ -video traffic and $s_{e}$ -elastic traffic. Thus the extended traffic type set is denoted as $\mathcal{S}=\{s_{0},s_{v},s_{e}\}$ . The mean opinion score (MOS) is often used as a subjective measure of the network quality in literature. The benefit of each traffic type in our simulation is defined as the difference in the MOS obtained by two transmission modes. The MOS estimations of two traffic types depend on experienced Peak Signal-to-Noise-Ratio (PSNR) $P_{snr}$ and throughput $\theta$ , respectively. They are expressed as follows[14]:

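$$Q_{s_{v}}(P_{snr})=4.5-\frac{3.5}{1+\exp(b_{1}(P_{snr}-b_{2}))},$$

$$Q_{s_{e}}(\theta)=b_{3}\log(b_{4}\theta),$$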

where $b_{1}=1$, $b_{2}=5$, $b_{3}=2.6949$ and $b_{4}=0.0235$. In our simulations, $P_{snr}=10\text{ dB}$ and $\theta=1500\text{ kbps}$ for D2D mode, while $P_{snr}=5\text{ dB}$ and $\theta=1000\text{ kbps}$ for cellular mode. Moreover, the stationary probabilities are set as $p_{s_{0}}=0.3$, $p_{s_{v}}=0.2$ and $p_{s_{e}}=0.5$. The environmental factors are $p=q=0.8$ and are known a priori. The discount factor $\beta$ is set to $0.99$ and the cost $c$ is set to 0.4. The simulation runs for $10^{6}$ slots.
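The benefits $b_{s_{v}}$ and $b_{s_{e}}$ implied by these settings can be computed directly from the two MOS expressions. The sketch below is an illustration only: the function names are hypothetical, and the base of the logarithm is not stated in the text, so base 10 is an assumption (a natural logarithm would put the elastic MOS above 5 at these throughputs).

```python
import math

# Constants b_1 ... b_4 as given in the text.
B1, B2, B3, B4 = 1.0, 5.0, 2.6949, 0.0235


def mos_video(psnr_db):
    """MOS of video traffic as a function of PSNR in dB."""
    return 4.5 - 3.5 / (1.0 + math.exp(B1 * (psnr_db - B2)))


def mos_elastic(throughput_kbps, log=math.log10):
    """MOS of elastic traffic as a function of throughput in kbps.

    The logarithm base is an assumption (base 10 by default).
    """
    return B3 * log(B4 * throughput_kbps)


# Benefit = MOS in D2D mode minus MOS in cellular mode.
b_video = mos_video(10.0) - mos_video(5.0)
b_elastic = mos_elastic(1500.0) - mos_elastic(1000.0)
print(round(b_video, 3), round(b_elastic, 3))
```

Under this assumption the benefits are roughly 1.73 for video and 0.47 for elastic traffic. With a natural logarithm the elastic benefit would be larger but still below the video benefit, so $s_{e}$ remains the least beneficial type either way.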

A greedy policy is considered for comparison. Under this policy, the UE chooses D2D mode whenever it has any tokens, so only the token collection strategy is optimized. Fig.3a shows the distribution of token usage over different traffic types. The distribution is proportional to $p_{s}$ when the greedy policy is executed. In contrast, under the optimal policy, which distinguishes the benefits of different traffic types, more tokens are spent on the more beneficial traffic types and the number of tokens spent on the least beneficial traffic type $s_{e}$ decreases dramatically. Fig.3b presents the average utilities of the two policies for different discount factors $\beta$. The utilities of both policies increase with $\beta$ because users with higher $\beta$ are more far-sighted. Moreover, the gain obtained by considering transmission mode selection can be observed. However, the gap tends towards zero when $\beta$ is small. This is because a UE with low $\beta$ is myopic and is inclined to spend tokens regardless of the traffic type, which resembles the greedy policy. The plateaus in the curves appear because the variation of $\beta$ there is not large enough to change the policy.
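The contrast in token usage between the two policies can be reproduced qualitatively with a toy simulation. The sketch below is not the authors' simulator and rests on simplifying assumptions: traffic types are drawn i.i.d. with the stationary probabilities given above, an idle UE (type $s_{0}$) always tries to collect a token and succeeds with probability $p$, a D2D request is served with probability $q$ and then costs one token, and the token cap `max_tokens` as well as the threshold of 5 for $s_{e}$ are hypothetical values.

```python
import random
from collections import Counter


def simulate(thresholds, p_s, p=0.8, q=0.8, max_tokens=10,
             slots=200_000, seed=0):
    """Return the share of spent tokens per traffic type.

    Assumed dynamics (a simplification, not the paper's exact model):
    - each slot, a traffic type is drawn i.i.d. according to p_s;
    - in an idle slot ("s0") the UE earns one token with probability p,
      unless it already holds max_tokens;
    - otherwise the UE requests D2D if its token count reaches the
      threshold of the current type, and the request is served with
      probability q at the price of one token.
    """
    rng = random.Random(seed)
    types = list(p_s)
    weights = [p_s[t] for t in types]
    k = 0
    spent = Counter()
    for _ in range(slots):
        s = rng.choices(types, weights)[0]
        if s == "s0":
            if k < max_tokens and rng.random() < p:
                k += 1
        elif k >= thresholds[s] and rng.random() < q:
            k -= 1
            spent[s] += 1
    total = sum(spent.values()) or 1
    return {t: spent[t] / total for t in types if t != "s0"}


p_s = {"s0": 0.3, "sv": 0.2, "se": 0.5}

# Greedy: spend a token on any traffic type as soon as one is available.
greedy = simulate({"sv": 1, "se": 1}, p_s)
# Threshold policy with a hypothetical threshold for the elastic type.
selective = simulate({"sv": 1, "se": 5}, p_s)

print("greedy   :", greedy)
print("selective:", selective)
```

Under these assumptions the greedy shares should be close to $p_{s_{v}}/(p_{s_{v}}+p_{s_{e}})$ and $p_{s_{e}}/(p_{s_{v}}+p_{s_{e}})$, while the threshold policy shifts token usage towards $s_{v}$.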

VI Conclusion

In this paper, we consider a D2D-enabled cellular network where selfish UEs are incentivized to form D2D pairs using tokens. We formulate an MDP model to characterize the UE's behavior, including the transmission mode selection strategy as well as the token collection policy. Moreover, we prove that the optimal strategy has a threshold structure in the token state and show that the threshold decreases as the benefit related to the traffic type increases. In our future work, we will explore the optimal selection of the maximum number of tokens so that the incentive mechanism can approach the altruism mechanism.

Bibliography

[1] 3GPP, “Technical specification group services and system aspects: Feasibility study for proximity services (ProSe),” 3rd Generation Partnership Project (3GPP), TR 22.803 Rel-12, 2012.
[2] K. Doppler, M. Rinne, C. Wijting, C. Ribeiro, and K. Hugl, “Device-to-device communication as an underlay to LTE-Advanced networks,” IEEE Commun. Mag., vol. 47, no. 12, pp. 42–49, Dec 2009.
[3] A. Asadi, Q. Wang, and V. Mancuso, “A survey on device-to-device communication in cellular networks,” IEEE Commun. Surveys Tuts., vol. 16, no. 4, pp. 1801–1819, Fourthquarter 2014.
[4] G. Fodor, E. Dahlman, G. Mildh et al., “Design aspects of network assisted device-to-device communications,” IEEE Commun. Mag., vol. 50, no. 3, pp. 170–177, March 2012.
[5] C.-H. Yu, K. Doppler et al., “Resource sharing optimization for device-to-device communication underlaying cellular networks,” IEEE Trans. Wireless Commun., vol. 10, no. 8, pp. 2752–2763, August 2011.
[6] D. Feng, G. Yu, C. Xiong et al., “Mode switching for energy-efficient device-to-device communications in cellular networks,” IEEE Trans. Wireless Commun., vol. 14, no. 12, pp. 6993–7003, Dec 2015.
[7] D. Wu, J. Wang, R. Hu, Y. Cai, and L. Zhou, “Energy-efficient resource sharing for mobile device-to-device multimedia communications,” IEEE Trans. Veh. Technol., vol. 63, no. 5, pp. 2093–2103, Jun 2014.
[8] P. Li and S. Guo, “Incentive mechanisms for device-to-device communications,” IEEE Netw., vol. 29, no. 4, pp. 75–79, July 2015.
