Optimal Blocklength Allocation towards Reduced Age of Information in Wireless Sensor Networks

Bin Han; Yao Zhu; Zhiyuan Jiang; Yulin Hu; Hans D. Schotten

arXiv:1907.02779·eess.SP·November 30, 2021


TL;DR

This paper studies how to optimally allocate blocklength in wireless sensor networks to minimize Age of Information (AoI), using Markov decision processes and reinforcement learning to improve data freshness in the finite blocklength regime.

Contribution

It formulates AoI minimization as a resource allocation problem in the FBL regime and proposes a reinforcement learning-based solution for optimal blocklength allocation.

Findings

01. Reinforcement learning outperforms traditional error rate policies.

02. Optimal blocklength allocation reduces AoI significantly.

03. The method is validated through simulations.

Abstract

The freshness or timeliness of data at the server is a key performance indicator of sensor networks, especially in time-critical applications such as factory automation. The Age of Information (AoI) metric is an effective and intuitive measure of data timeliness, and it has recently attracted intensive research interest. This paper initiates a study on the AoI of wireless sensor networks operating in the finite blocklength (FBL) regime as a resource allocation problem. It proposes to minimize the long-term discounted system AoI by formulating it as a Markov decision process (MDP). The optimum of the proposed method is obtained with a Reinforcement Learning technique, and simulations show that the method outperforms benchmarks, including the conventional error-rate-minimizing policy.

Tables

Table I: Uplink SNRs in different scenarios

	Scenario 1	Scenario 2	Scenario 3	Scenario 4
Sensor 1	-13 dB	-13 dB	-10 dB	-8 dB
Sensor 2	-6 dB	-3 dB	-3 dB	-8 dB

Table II: Optimal blocklength $n_{1}$ assigned to sensor 1 with different benchmark policies and reference scenarios

Policy	Scenario 1	Scenario 2	Scenario 3	Scenario 4
Min. PER	402	442	411	250
Uniform allocation	250	250	250	250

Table III: Long-term discounted AoI under different policies

Scenario	Policy	$\Delta\bar{D}$	$\mathrm{Var}_{D}$
1	MDP optimum	1.1808	3.6537e-3
	One-step optimum	1.2383	3.5642e-3
	Min error rate	1.2397	3.0787e-3
	Uniform allocation¹	N/A	N/A
2	MDP optimum	0.64029	8.4297e-4
	One-step optimum	0.68276	9.5663e-4
	Min error rate	0.68036	5.9690e-4
	Uniform allocation¹	N/A	N/A
3	MDP optimum	5.9570e-3	2.7321e-10
	One-step optimum	6.3155e-3	1.1603e-9
	Min error rate	6.3148e-3	3.9019e-10
	Uniform allocation	0.27194	8.0836e-5
4	MDP optimum	0.0117	6.0254e-10
	One-step optimum	0.0117	1.9271e-9
	Min error rate	0.0117	2.5542e-9
	Uniform allocation	0.0117	1.5954e-9

¹Sensor 1 fails to deliver the required minimal packet transmission rate $(1-\varepsilon_{\max})$ under the uniform allocation policy in Scenarios 1 & 2.

Table IV: Undiscounted AoI under different policies

Scenario	Policy	$\Delta\overline{|\mathbf{A}|}$	$\mathrm{Var}_{|\mathbf{A}|}$
1	MDP optimum	0.11751	1.2484e-1
	One-step optimum	0.12476	1.3137e-1
	Min error rate	0.12436	1.3338e-1
	Uniform allocation¹	N/A	N/A
2	MDP optimum	6.4327e-2	6.7281e-2
	One-step optimum	6.8890e-2	7.1667e-2
	Min error rate	6.8287e-2	7.1265e-2
	Uniform allocation¹	N/A	N/A
3	MDP optimum	5.1497e-4	5.1464e-4
	One-step optimum	6.3155e-4	6.8602e-4
	Min error rate	6.3148e-4	5.9435e-4
	Uniform allocation	2.7481e-2	2.8254e-2
4	MDP optimum	1.1856e-3	1.1843e-3
	One-step optimum	1.3174e-3	1.1316e-3
	Min error rate	1.1058e-3	1.1047e-3
	Uniform allocation	1.1816e-3	1.1880e-3

¹Sensor 1 fails to deliver the required minimal packet transmission rate $(1-\varepsilon_{\max})$ under the uniform allocation policy in Scenarios 1 & 2.

Equations

$A_{m}(k)=\begin{cases}1, & \text{success}\\ A_{m}(k-1)+1, & \text{failure}\end{cases}$

$\mathbb{E}\{|\mathbf{A}_{k}|\}=\sum_{m=1}^{M}\{(1-\varepsilon_{m})+\varepsilon_{m}[A_{m}(k-1)+1]\}=M+\sum_{m=1}^{M}\varepsilon_{m}A_{m}(k-1)$

$\min_{\mathbf{n}}\ \mathbb{E}\{|\mathbf{A}_{k}|\}$

$\sum_{m=1}^{M}n_{m}\leq\left\lfloor\frac{T}{T_{\text{S}}}\right\rfloor\triangleq N_{\max}$

$n_{m}\geq n_{m,\min},\ \forall m\in\{1,2,\dots,M\}$

$\varepsilon_{m}(n_{m})\approx Q\left(\sqrt{\frac{n_{m}}{V_{m}}}\left(\mathcal{C}_{m}-\frac{d_{m}}{n_{m}}\right)\ln 2\right)$

$\min_{\mathbf{n}}\ \sum_{m=1}^{M}\varepsilon_{m}$

$\sum_{m=1}^{M}\tilde{n}_{m}$

$\frac{\varepsilon_{i}(n^{\prime}_{i})}{\varepsilon_{j}(n^{\prime}_{j})}$

$\mathbf{n}^{\prime}_{\text{opt}}\approx\arg\min_{\mathbf{n}^{\prime}\in\mathbb{V}}\sum_{i=1}^{M}\left(\xi_{i}-\frac{1}{M}\sum_{j=1}^{M}\xi_{j}\right)^{2}$

$(\mathbb{N}^{+})^{2}\overset{\mathcal{P}}{\to}\{n_{1,\min},n_{1,\min}+1,\dots,N_{\max}-n_{2,\min}\}$

$P(\mathbf{A}_{k}|\mathbf{A}_{k-1})=\begin{cases}\varepsilon_{1}(\mathbf{A}_{k-1})\varepsilon_{2}(\mathbf{A}_{k-1})&\mathbf{A}_{k}=[A_{1}(k-1)+1,A_{2}(k-1)+1]\\ \varepsilon_{1}(\mathbf{A}_{k-1})[1-\varepsilon_{2}(\mathbf{A}_{k-1})]&\mathbf{A}_{k}=[A_{1}(k-1)+1,1]\\ [1-\varepsilon_{1}(\mathbf{A}_{k-1})]\varepsilon_{2}(\mathbf{A}_{k-1})&\mathbf{A}_{k}=[1,A_{2}(k-1)+1]\\ [1-\varepsilon_{1}(\mathbf{A}_{k-1})][1-\varepsilon_{2}(\mathbf{A}_{k-1})]&\mathbf{A}_{k}=[1,1]\\ 0&\text{otherwise}\end{cases}$

$\min_{\mathcal{P}}\ \mathbb{E}\left\{\sum_{k=1}^{+\infty}\gamma^{k-1}|\mathbf{A}_{k}|\ \middle|\ \mathbf{A}_{0}\right\}$

$\text{Prob}\{A_{m}(k)=a\}\sim\prod_{i=1}^{a}\varepsilon_{m}(\mathbf{A}_{i-1}),\ m\in\{1,2\}$

$\min_{\mathcal{P}^{\prime}}\ \mathbb{E}\left\{\sum_{k=1}^{K}\gamma^{k-1}|\mathbf{A}_{k}|\ \middle|\ \mathbf{A}_{0}\right\}$

$\{1,2,\dots,A_{\max}\}^{2}\overset{\mathcal{P}^{\prime}}{\to}\{n_{1,\min},n_{1,\min}+1,\dots,N_{\max}-n_{2,\min}\}$

$\mathcal{P}^{\prime}_{\text{opt}}(\mathbf{A}_{k})=\Omega\left(\arg\max_{1\leq j\leq J}Q_{\Theta^{-1}(\mathbf{A}_{k}),j}\right)$

$D=\sum_{k=1}^{K}\gamma^{k-1}|\mathbf{A}_{k}|$

$\Delta D=D-D_{\text{lower}}=D-\sum_{k=1}^{K}2\gamma^{k-1}\approx D-20$

$\Delta\overline{|\mathbf{A}|}=\overline{|\mathbf{A}|}-|\mathbf{A}|_{\text{lower}}=\overline{|\mathbf{A}|}-2$


Full text


Optimal Blocklength Allocation towards Reduced Age of Information in Wireless Sensor Networks

Bin Han¹, Yao Zhu², Zhiyuan Jiang³, Yulin Hu², and Hans D. Schotten¹ ⁴

¹Technische Universität Kaiserslautern, ²RWTH Aachen University, ³Shanghai University, ⁴DFKI

Abstract

The freshness or timeliness of data at the server is a key performance indicator of sensor networks, especially in time-critical applications such as factory automation. The Age of Information (AoI) metric is an effective and intuitive measure of data timeliness, and it has recently attracted intensive research interest. This paper initiates a study on the AoI of wireless sensor networks operating in the finite blocklength (FBL) regime as a resource allocation problem. It proposes to minimize the long-term discounted system AoI by formulating it as a Markov decision process (MDP). The optimum of the proposed method is obtained with a Reinforcement Learning technique, and simulations show that the method outperforms benchmarks, including the conventional error-rate-minimizing policy.

Index Terms:

AoI, FBL, sensor networks, IoT, resource allocation, MDP, RL.

©2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

I Introduction

Wireless sensor networks play a key role in various applications such as environmental monitoring, target tracking, smart grids and factory automation [1]. They are expected to deliver data with satisfactory freshness. This is especially critical in industrial scenarios, where it ensures smooth and safe operation of the system [2].

The Age of Information (AoI) metric [3] has recently been proposed to characterize information freshness. AoI denotes the time elapsed since the generation of the most recent successfully received status update. Although AoI is a time metric, it is fundamentally different from conventional delay metrics and requires special attention. Most existing works treat AoI optimization as a scheduling problem in the Medium Access Control (MAC) layer, while the implications for physical-layer design are largely ignored.

Shifting the focus down to the physical layer, transmission errors in the radio access network can also significantly affect the AoI in sensor networks, much as they affect the uplink delay. However, AoI has memory: the age of a sensor keeps growing until its next successful delivery. For this reason, AoI is likely to depend on the transmission error rate differently from delay metrics. This dependency has not been well studied so far.

In wireless networks, physical-layer error control generally relies on resource allocation in different dimensions, including power, bandwidth and time. For uplink data transmission in sensor networks, a flexible and commercially practical solution is to allocate blocklength (time/bandwidth) among sensor devices in a TDMA manner. In particular, when data packets are small, the transmission blocklength is short. In that case the Shannon capacity is no longer achievable and data transmission is no longer arbitrarily reliable. The packet error rate (PER) caused by short blocklengths was characterized in [4], in the so-called finite blocklength (FBL) regime. The FBL model provides a quantitative dependency of the PER on the blocklength assigned to each device, which enables cell-level error minimization.

In this work, we investigate the AoI of sensor network systems operating in the FBL regime. From a physical-layer perspective, we address the following resource allocation problem, which, to the best of our knowledge, has not yet been studied in the literature: how should blocklength be allocated to different sensors in order to minimize the overall network AoI?

The remainder of this paper is organized as follows. Sec. II reviews the state of the art in both AoI and FBL. Sec. III models the system AoI as a function of the blocklength allocation and proposes two optimization approaches, which minimize the system AoI in the short and long term, respectively. Sec. IV presents the procedure, results and analysis of the numerical simulations, in which our proposed methods are evaluated together with benchmarks. After further discussion in Sec. V, we conclude the paper in Sec. VI.

II Relevant Studies

II-A AoI as a Scheduling Problem

Previous works on AoI mainly focus on the MAC-layer problem, where the data sink (server) can be busy or idle when processing the data packets generated by sources (sensors). It has been shown [5] that AoI can be reduced by replacing outdated packets in the queue with the latest one from the same source. There, packets are dropped due to queue congestion. In this context, it has also been shown that both excessively high and excessively low sampling rates of a sensor increase the expected AoI [3]. Moreover, in a previous work [6] we showed that the mean uplink AoI at the sink (server) in a simple TDMA master-slave system grows linearly with the number of sources (sensors), if all sources share the same frame length equally. Furthermore, scheduling schemes for AoI optimization have been well studied from various perspectives in the literature [7, 8, 9, 10, 11].

II-B Error Control in FBL Regime

FBL information theory establishes a quantitative link between the transmission blocklength and the transmission error rate [4]. Retransmission mechanisms have also been introduced in the FBL regime to improve performance, with the aim of minimizing energy consumption [12]. The trade-off between reliability and energy efficiency has also been studied in a retransmission-enabled edge computing scenario [13]. Nevertheless, as in AoI studies, retransmitting the same packet is not beneficial in the FBL regime with respect to error probability either. The best performance achievable by HARQ, even when feedback loss is neglected, equals that of the optimal one-shot transmission [14]. In particular, the total error of cooperative nodes can be minimized by a time-resource allocation that assigns blocklengths so that every device experiences the same error rate [15].

III Problems, Models and Approaches

III-A Model Setup

We now consider the AoI problem from the perspective of PHY-layer transmission errors. Multiple sensors are scheduled to transmit messages to a Multi-Access Edge Computing (MEC) server in an FBL-TDMA manner, with all sensors synchronized to the same uplink transmission period. Every message generated by every device has the same length in bits. The uplink channels of different sensors are independent of each other. If a message fails to be delivered to the server, no retransmission is scheduled, and the sensor simply transmits its latest message in the next period. For simplicity, we make three assumptions. First, all sensors are active sources (i.e., they generate fresh information at their scheduled transmission time). Second, the server is always idle and ready to process any incoming message. Third, the uplink channel of every sensor is a non-fading additive white Gaussian noise (AWGN) channel with full channel state information (CSI) at the MEC server.

Consider the $m^{\text{th}}$ sensor in the schedule of an $M$-sensor system. Depending on the success or failure of its transmission, its AoI at the end of the $k^{\text{th}}$ period is

$$A_{m}(k)=\begin{cases}1, & \text{success},\\ A_{m}(k-1)+1, & \text{failure}.\end{cases}\tag{1}$$

Thus, the expected sum AoI at the end of the $k^{\text{th}}$ period is

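$$\mathbb{E}\{|\mathbf{A}_{k}|\}=\sum_{m=1}^{M}\left\{(1-\varepsilon_{m})+\varepsilon_{m}[A_{m}(k-1)+1]\right\}=M+\sum_{m=1}^{M}\varepsilon_{m}A_{m}(k-1).\tag{2}$$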
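The following Python sketch checks the AoI update (1) and the two forms of the expected sum AoI (2). It is a minimal illustration under one assumption: the error rates $\varepsilon_m$ are passed in directly rather than derived from blocklengths. The function names `next_aoi` and `expected_sum_aoi` are hypothetical.

```python
from typing import Sequence


def next_aoi(prev_aoi: int, success: bool) -> int:
    """AoI update of Eq. (1): reset to 1 on success, otherwise age by one period."""
    return 1 if success else prev_aoi + 1


def expected_sum_aoi(prev_aoi: Sequence[int], eps: Sequence[float]) -> float:
    """Expected system AoI of Eq. (2), given A_m(k-1) and the error rates eps_m."""
    # Middle form of (2): sensor m ends at 1 w.p. (1 - eps_m) and at A_m(k-1) + 1 w.p. eps_m.
    per_sensor = sum((1 - e) + e * (a + 1) for a, e in zip(prev_aoi, eps))
    # Right-hand form of (2): M + sum_m eps_m * A_m(k-1); both forms are algebraically equal.
    closed_form = len(prev_aoi) + sum(e * a for a, e in zip(prev_aoi, eps))
    assert abs(per_sensor - closed_form) < 1e-9
    return closed_form
```

For example, take $\mathbf{A}_{k-1}=[2,5]$ with error rates $[0.1,0.05]$. Both forms give $2+0.2+0.25=2.45$.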

III-B Single-Period AoI Optimization

Given the AoI of every sensor at the beginning of the $k^{\text{th}}$ period, it is natural to minimize the expected system AoI $\mathbb{E}\{\lvert\mathbf{A}_{k}\rvert\}$ with respect to the blocklengths $\mathbf{n}=[n_{1},n_{2},\dots,n_{M}]\in\mathbb{N}^{M}$:

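$$\min_{\mathbf{n}}\ \mathbb{E}\{|\mathbf{A}_{k}|\}\tag{3}$$
$$\text{s.t.}\quad\sum_{m=1}^{M}n_{m}\leq\left\lfloor\frac{T}{T_{\text{S}}}\right\rfloor\triangleq N_{\max},\tag{4}$$
$$n_{m}\geq n_{m,\min},\ \forall m\in\{1,2,\dots,M\},\tag{5}$$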

where $T$ is the length of a transmission period, $T_{\text{S}}$ is the symbol length, and $n_{m,\min}$ is the minimal blocklength that sensor $m$ needs to meet the maximal allowed packet error rate $\varepsilon_{\max}$. In the FBL regime, according to [4], we have

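$$\varepsilon_{m}(n_{m})\approx Q\left(\sqrt{\frac{n_{m}}{V_{m}}}\left(\mathcal{C}_{m}-\frac{d_{m}}{n_{m}}\right)\ln 2\right),\tag{6}$$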

where $\mathcal{C}_{m}$, $V_{m}$ and $d_{m}$ denote the Shannon capacity, channel dispersion and message length (in bits) of sensor $m$, respectively, and $Q(\cdot)$ is the Gaussian Q-function.
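The sketch below evaluates the normal approximation (6) numerically and derives $n_{m,\min}$ from it. The text does not specify $\mathcal{C}_m$ and $V_m$, so the sketch assumes the AWGN forms commonly used in FBL resource-allocation work: $\mathcal{C}=\log_2(1+\text{SNR})$ in bits per symbol (unit bandwidth) and $V=1-(1+\text{SNR})^{-2}$ in nats². The helper names `q_func`, `fbl_error` and `min_blocklength` are hypothetical.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail function Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def fbl_error(n: int, snr_db: float, d_bits: int) -> float:
    """Normal approximation (6) of the PER of a d_bits packet sent in n symbols.

    Assumed AWGN forms (not given in the text): C = log2(1 + SNR) in bits per
    symbol and V = 1 - (1 + SNR)**-2 in nats^2, so (C - d/n) * ln 2 is in nats.
    """
    if n <= 0:
        return 1.0
    snr = 10 ** (snr_db / 10)
    capacity = math.log2(1 + snr)
    dispersion = 1 - (1 + snr) ** -2
    arg = math.sqrt(n / dispersion) * (capacity - d_bits / n) * math.log(2)
    return q_func(arg)


def min_blocklength(snr_db: float, d_bits: int, eps_max: float, n_limit: int) -> int:
    """n_{m,min}: smallest n with fbl_error(n) <= eps_max, or -1 if none up to n_limit.

    A linear scan suffices because the approximation decreases monotonically in n.
    """
    for n in range(1, n_limit + 1):
        if fbl_error(n, snr_db, d_bits) <= eps_max:
            return n
    return -1
```

Under these assumptions, a 16-bit packet at $-13$ dB exceeds $\varepsilon_{\max}=0.1$ with 250 symbols but meets it with 402 symbols. This matches the footnote to Tables III and IV, which states that uniform allocation fails for sensor 1 in Scenario 1.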

For reference, the classic FBL problem of minimizing the system PER can be formulated as

$$\min_{\mathbf{n}}\ \sum_{m=1}^{M}\varepsilon_{m},\tag{7}$$

which differs from (3) by a linear coefficient $A_{m}(k-1)$ in every term.

As in classic FBL problems, we relax the constraint $\mathbf{n}\in\mathbb{N}^{M}$ to $\mathbf{n}^{\prime}\in\left(\mathbb{R}^{+}\right)^{M}$. In this case, it is straightforward to show that the minimum of (3) is achieved when

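$$\sum_{m=1}^{M}n^{\prime}_{m}=N_{\max},\tag{8}$$
$$\frac{\varepsilon_{i}(n^{\prime}_{i})}{\varepsilon_{j}(n^{\prime}_{j})}=\frac{A_{j}(k-1)}{A_{i}(k-1)},\ \forall i,j,\tag{9}$$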

and $\mathbf{n}_{\text{opt}}\in\mathbb{N}^{M}$ can then be approximated by rounding $\mathbf{n}^{\prime}_{\text{opt}}$.

Note that, because (6) is non-linear, the relaxed problem (8), (9) can be solved analytically only when $A_{i}(k-1)=A_{j}(k-1),\ \forall i,j$. In general, we have to traverse the solution vector space $\mathbb{V}\subset\{1,2,\dots,N_{\max}\}^{M}$ to find the approximate optimum:

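$$\mathbf{n}^{\prime}_{\text{opt}}\approx\arg\min_{\mathbf{n}^{\prime}\in\mathbb{V}}\sum_{i=1}^{M}\left(\xi_{i}-\frac{1}{M}\sum_{j=1}^{M}\xi_{j}\right)^{2},\tag{10}$$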

where $\xi_{i}=\varepsilon_{i}(n_{i})A_{i}(k-1)$ is the AoI-weighted error rate of sensor $i$.
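For two sensors, the single-period problem (3) and the PER benchmark (7) can both be solved by exhaustive search over the feasible $n_1$, with $n_2=N_{\max}-n_1$. This is the traversal described in Sec. IV; the $\xi$-balancing approximation (10) is not used here. The sketch below is a minimal version that reuses the assumed AWGN model for (6) ($\mathcal{C}=\log_2(1+\text{SNR})$, $V=1-(1+\text{SNR})^{-2}$). All function names are hypothetical.

```python
import math
from typing import List, Sequence


def fbl_error(n: int, snr_db: float, d_bits: int = 16) -> float:
    """Eq. (6) under the assumed AWGN forms C = log2(1+SNR), V = 1 - (1+SNR)^-2."""
    if n <= 0:
        return 1.0
    snr = 10 ** (snr_db / 10)
    arg = math.sqrt(n / (1 - (1 + snr) ** -2)) * (math.log2(1 + snr) - d_bits / n) * math.log(2)
    return 0.5 * math.erfc(arg / math.sqrt(2))


def feasible_n1(snr_db: Sequence[float], n_max: int, d_bits: int, eps_max: float) -> List[int]:
    """Values of n1 (with n2 = n_max - n1) that satisfy constraint (5) for both sensors."""
    return [n1 for n1 in range(1, n_max)
            if fbl_error(n1, snr_db[0], d_bits) <= eps_max
            and fbl_error(n_max - n1, snr_db[1], d_bits) <= eps_max]


def one_step_aoi(prev_aoi: Sequence[int], snr_db: Sequence[float], n_max: int = 500,
                 d_bits: int = 16, eps_max: float = 0.1) -> int:
    """Exhaustive search for (3): minimise 2 + eps_1*A_1(k-1) + eps_2*A_2(k-1), cf. (2)."""
    candidates = feasible_n1(snr_db, n_max, d_bits, eps_max)
    if not candidates:
        raise ValueError("no allocation satisfies eps_max for both sensors")

    def expected_aoi(n1: int) -> float:
        e1 = fbl_error(n1, snr_db[0], d_bits)
        e2 = fbl_error(n_max - n1, snr_db[1], d_bits)
        return 2 + e1 * prev_aoi[0] + e2 * prev_aoi[1]

    return min(candidates, key=expected_aoi)


def min_per(snr_db: Sequence[float], n_max: int = 500, d_bits: int = 16,
            eps_max: float = 0.1) -> int:
    """PER benchmark (7): the one-step rule with unit weights, independent of the AoI state."""
    return one_step_aoi((1, 1), snr_db, n_max, d_bits, eps_max)
```

Because the PER objective (7) equals the AoI objective (3) with all weights $A_m(k-1)=1$, the benchmark reuses the same search. Raising one sensor's AoI can only shift blocklength toward that sensor.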

III-C Long-Term Optimization

However, the single-period AoI minimization discussed above ignores how the current decision affects the system in future periods, even though AoI has memory. In the long term, this may lead to convergence to a local sub-optimum instead of the global optimum, which reduces the optimization gain. To address this issue, we now consider long-term AoI optimization.

To simplify the discussion, we consider a two-sensor case ($M=2$) without fading, where both $[\mathcal{C}_{1},\mathcal{C}_{2}]$ and $[V_{1},V_{2}]$ are constant. We also assume that the system is initialized with a known AoI state $\mathbf{A}_{0}$.

Consider a consistent policy $\mathcal{P}$ :

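$$(\mathbb{N}^{+})^{2}\overset{\mathcal{P}}{\to}\{n_{1,\min},n_{1,\min}+1,\dots,N_{\max}-n_{2,\min}\},\tag{11}$$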

which maps the AoI state $\mathbf{A}_{k-1}$ at the beginning of the $k^{\text{th}}$ period to the time allocation $[n_{1}(k),N_{\max}-n_{1}(k)]$. We can therefore rewrite $n_{1}(k)$ as $n_{1}(\mathbf{A}_{k-1})$, $\varepsilon_{1}(n_{1})$ as $\varepsilon_{1}(\mathbf{A}_{k-1})$, and $\varepsilon_{2}(n_{2})$ as $\varepsilon_{2}(\mathbf{A}_{k-1})$. Consequently, the state transition probability $P(\mathbf{A}_{k}|\mathbf{A}_{k-1})$ depends only on $\mathbf{A}_{k-1}$:

$$P(\mathbf{A}_{k}|\mathbf{A}_{k-1})=\begin{cases}\varepsilon_{1}(\mathbf{A}_{k-1})\varepsilon_{2}(\mathbf{A}_{k-1})&\mathbf{A}_{k}=[A_{1}(k-1)+1,A_{2}(k-1)+1]\\ \varepsilon_{1}(\mathbf{A}_{k-1})[1-\varepsilon_{2}(\mathbf{A}_{k-1})]&\mathbf{A}_{k}=[A_{1}(k-1)+1,1]\\ [1-\varepsilon_{1}(\mathbf{A}_{k-1})]\varepsilon_{2}(\mathbf{A}_{k-1})&\mathbf{A}_{k}=[1,A_{2}(k-1)+1]\\ [1-\varepsilon_{1}(\mathbf{A}_{k-1})][1-\varepsilon_{2}(\mathbf{A}_{k-1})]&\mathbf{A}_{k}=[1,1]\\ 0&\text{otherwise}\end{cases}\tag{12}$$

Thus, this becomes an infinite-state Markov Decision Process (MDP) that can be optimized to minimize the long-term discounted AoI:

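$$\min_{\mathcal{P}}\ \mathbb{E}\left\{\sum_{k=1}^{+\infty}\gamma^{k-1}|\mathbf{A}_{k}|\ \middle|\ \mathbf{A}_{0}\right\},\tag{13}$$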

where $\gamma\in(0,1)$ is the discount factor.
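The transition law (12) can be written as a small Python function that returns the four reachable successor states and their probabilities. The name `transition_probs` is hypothetical. The error rates are those induced by the allocation the policy chooses in the given state.

```python
from typing import Dict, Tuple

State = Tuple[int, int]


def transition_probs(prev: State, eps1: float, eps2: float) -> Dict[State, float]:
    """Eq. (12): successor AoI states of a two-sensor system and their probabilities."""
    a1, a2 = prev
    return {
        (a1 + 1, a2 + 1): eps1 * eps2,      # both deliveries fail
        (a1 + 1, 1): eps1 * (1 - eps2),     # only sensor 2 succeeds
        (1, a2 + 1): (1 - eps1) * eps2,     # only sensor 1 succeeds
        (1, 1): (1 - eps1) * (1 - eps2),    # both succeed
    }
```

The four probabilities sum to one. The expected next-period sum AoI computed from them reproduces (2) for $M=2$.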

Since the weight $\gamma^{k-1}$ decays exponentially in $k$, we can reduce the computational effort by truncating the horizon at a finite yet sufficiently large $K$ such that $\gamma^{K}\approx 0$. Furthermore, note that (12) implies

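$$\text{Prob}\{A_{m}(k)=a\}\sim\prod_{i=1}^{a}\varepsilon_{m}(\mathbf{A}_{i-1}),\quad m\in\{1,2\},\tag{14}$$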

where $\varepsilon_{m}(\mathbf{A}_{i-1})\in(0,\varepsilon_{\max})$ always holds. The probability of a large AoI therefore decays at least geometrically in $a$. In practice, we can thus find a finite $A_{\max}$ such that $\text{Prob}\{A_{m}(k)>A_{\max}\}\approx 0$. As a result, the infinite-state MDP optimization problem over an infinite horizon (13) can be approximated by a finite-state MDP optimization problem over $K$ periods:

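$$\min_{\mathcal{P}^{\prime}}\ \mathbb{E}\left\{\sum_{k=1}^{K}\gamma^{k-1}|\mathbf{A}_{k}|\ \middle|\ \mathbf{A}_{0}\right\},\tag{15}$$
$$\{1,2,\dots,A_{\max}\}^{2}\overset{\mathcal{P}^{\prime}}{\to}\{n_{1,\min},n_{1,\min}+1,\dots,N_{\max}-n_{2,\min}\}.\tag{16}$$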
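The two truncations behind (15) reduce to simple arithmetic, sketched below in Python. The first is the smallest $K$ with $\gamma^{K}$ below a chosen tolerance. The second is the tail bound implied by (14): $A_m(k)>A_{\max}$ requires at least $A_{\max}$ consecutive failed deliveries, each with probability below $\varepsilon_{\max}$. The function names and the tolerance parameter are hypothetical.

```python
def truncation_horizon(gamma: float, tol: float) -> int:
    """Smallest K with gamma**K < tol, so that discount weights beyond K are negligible."""
    k = 0
    while gamma ** k >= tol:
        k += 1
    return k


def aoi_tail_bound(eps_max: float, a_max: int) -> float:
    """Upper bound on Prob{A_m(k) > a_max} implied by (14).

    A_m(k) > a_max needs at least a_max consecutive failures, each with
    probability below eps_max, hence the bound eps_max ** a_max.
    """
    return eps_max ** a_max
```

With the Sec. IV settings $\gamma=0.9$ and $K=500$, the neglected weight $\gamma^{K}$ is below $10^{-20}$. With $\varepsilon_{\max}=0.1$ and $A_{\max}=8$, the tail probability is at most $10^{-8}$.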

Problems such as (15) are commonly solved with Reinforcement Learning (RL) approaches, e.g., the well-known Q-Learning algorithm. In this algorithm, a so-called Q-matrix $\mathbf{Q}_{I\times J}$ represents the expected discounted rewards of different actions (time allocations) in all possible system states (AoI). Here $I=A_{\max}^{2}$ is the number of possible states and $J=N_{\max}-n_{1,\min}-n_{2,\min}+1$ is the number of valid actions. For convenience of the matrix index notation, we define two mappings: $\{1,2,\dots,I\}\overset{\Theta}{\to}\{1,2,\dots,A_{\max}\}^{2}$ and $\{1,2,\dots,J\}\overset{\Omega}{\to}\{n_{1,\min},n_{1,\min}+1,\dots,N_{\max}-n_{2,\min}\}$. Algorithm 1 briefly describes the offline learning process that trains the Q-matrix. Note that the notations $\mathbf{Q}$ and $Q_{i,j}$ here are unrelated to the $Q$ function in Eq. (6).

Algorithm 1: Q-Learning-based MDP solver

Specification: $l_{\max},\epsilon_{\min},\gamma$;

Initialization: $Q_{i,j}=0,\ \forall i,j$;

for $l=1,2,\dots,l_{\max}$ do (iterative updating)

$\epsilon\leftarrow 0$;

for $1\leq i\leq I$ do (traversing over all states)

$\mathbf{A}_{k-1}\leftarrow\Theta(i)$;

$Q_{\text{old}}\leftarrow\max\limits_{1\leq j\leq J}Q_{i,j}$;

$Q_{\max}\leftarrow-\infty$;

for $1\leq j\leq J$ do (traversing over all actions)

$n_{1}\leftarrow\Omega(j)$;

$r\leftarrow-\sum\limits_{\mathbf{A}_{k}}|\mathbf{A}_{k}|P(\mathbf{A}_{k}|\mathbf{A}_{k-1},n_{1})$;

$Q_{i,j}\leftarrow r+\gamma\sum\limits_{\mathbf{A}_{k}}P(\mathbf{A}_{k}|\mathbf{A}_{k-1},n_{1})\max\limits_{1\leq j^{\prime}\leq J}Q_{\Theta^{-1}(\mathbf{A}_{k}),j^{\prime}}$;

$Q_{\max}\leftarrow\max\{Q_{\max},Q_{i,j}\}$;

end for

$\epsilon\leftarrow\max\{\epsilon,|Q_{\max}-Q_{\text{old}}|\}$;

end for

if $\epsilon\leq\epsilon_{\min}$ then break (convergence)

end for

Once training is complete, the optimal policy $\mathcal{P}^{\prime}_{\text{opt}}$ is obtained by

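$$\mathcal{P}^{\prime}_{\text{opt}}(\mathbf{A}_{k})=\Omega\left(\arg\max_{1\leq j\leq J}Q_{\Theta^{-1}(\mathbf{A}_{k}),j}\right).\tag{17}$$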
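Algorithm 1 uses the known transition law (12) rather than sampled experience, so it amounts to model-based Q-value iteration. The Python sketch below follows it with in-place updates and ends with the greedy policy extraction (17). It is a sketch, not the authors' implementation, and it rests on three assumptions. The error model is the AWGN form assumed earlier for (6). The `actions` list plays the role of $\Omega$, and the `states` list with its `index` dictionary plays $\Theta$ and $\Theta^{-1}$. Successor AoI values above $A_{\max}$ are capped at $A_{\max}$, since (12) would otherwise leave the truncated state space. The names `solve_mdp` and `fbl_error` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

State = Tuple[int, int]


def fbl_error(n: int, snr_db: float, d_bits: int = 16) -> float:
    """Eq. (6) under the assumed AWGN forms C = log2(1+SNR), V = 1 - (1+SNR)^-2."""
    if n <= 0:
        return 1.0
    snr = 10 ** (snr_db / 10)
    arg = math.sqrt(n / (1 - (1 + snr) ** -2)) * (math.log2(1 + snr) - d_bits / n) * math.log(2)
    return 0.5 * math.erfc(arg / math.sqrt(2))


def solve_mdp(snr_db: Sequence[float], n_max: int = 500, d_bits: int = 16,
              eps_max: float = 0.1, a_max: int = 8, gamma: float = 0.9,
              l_max: int = 100, eps_min: float = 1e-5):
    # Omega: action index j -> feasible blocklength n1 of sensor 1 (n2 = n_max - n1)
    actions = [n1 for n1 in range(1, n_max)
               if fbl_error(n1, snr_db[0], d_bits) <= eps_max
               and fbl_error(n_max - n1, snr_db[1], d_bits) <= eps_max]
    if not actions:
        raise ValueError("no feasible allocation")
    errs = [(fbl_error(n1, snr_db[0], d_bits), fbl_error(n_max - n1, snr_db[1], d_bits))
            for n1 in actions]
    # Theta: state index i -> AoI pair; `index` is its inverse Theta^-1
    states: List[State] = [(a1, a2) for a1 in range(1, a_max + 1) for a2 in range(1, a_max + 1)]
    index: Dict[State, int] = {s: i for i, s in enumerate(states)}
    q = [[0.0] * len(actions) for _ in states]
    v = [0.0] * len(states)  # v[i] = max_j q[i][j], refreshed after each state is swept

    def cap(a: int) -> int:
        # Assumption: AoI values beyond a_max are lumped into a_max
        return min(a, a_max)

    for _ in range(l_max):
        delta = 0.0
        for i, (a1, a2) in enumerate(states):
            q_old = v[i]
            for j, (e1, e2) in enumerate(errs):
                successors = (((cap(a1 + 1), cap(a2 + 1)), e1 * e2),
                              ((cap(a1 + 1), 1), e1 * (1 - e2)),
                              ((1, cap(a2 + 1)), (1 - e1) * e2),
                              ((1, 1), (1 - e1) * (1 - e2)))
                reward = -(2 + e1 * a1 + e2 * a2)  # r = -E{|A_k|}, Eq. (2)
                future = sum(p * v[index[s]] for s, p in successors)
                q[i][j] = reward + gamma * future
            v[i] = max(q[i])
            delta = max(delta, abs(v[i] - q_old))
        if delta <= eps_min:
            break
    # Eq. (17): greedy action per state, mapped back to a blocklength via Omega
    policy = {s: actions[max(range(len(actions)), key=q[index[s]].__getitem__)] for s in states}
    return policy, q
```

Because the rewards are negative expected AoI, maximizing $Q$ minimizes the discounted AoI. Under the assumed error model with Scenario 1 SNRs, only a few dozen values of $n_1$ are feasible, so the sweep stays small.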

IV Simulations

IV-A Simulation Setup

To verify the AoI performance of the proposed methods, we simulate a two-sensor system operating over non-fading AWGN channels, where each sensor is granted a bandwidth normalized to unity ($B=1$). In each period, the two sensors share a total blocklength of $N_{\max}=500$ symbols. Each sensor attempts to transmit a packet of $d=16$ bits to the MEC server in the uplink, with a maximal allowed PER of $\varepsilon_{\max}=0.1$. The sensor AoI at the MEC server is initialized to $\mathbf{A}_{0}=[1,1]$. We consider four scenarios with different SNR specifications, as listed in Table I.

For each specification, we conduct a Monte Carlo experiment with 500 runs. In each run, we simulate the system over $K=500$ periods with two policies: the single-period AoI-optimal policy, obtained by exhaustive search over the solution space, and the long-term optimal policy. The system AoI state $\mathbf{A}$ is recorded in every period. For the long-term optimum, we model the MDP with maximal sensor AoI $A_{\max}=8$ and discount factor $\gamma=0.9$. The convergence parameters of the Q-Learning-based MDP solver are set to $l_{\max}=100$ and $\epsilon_{\min}=10^{-5}$.

As benchmarks, we also test two further policies. The first is the standard FBL resource allocation policy [15], which minimizes the overall uplink PER in every period by solving (7). The second is the simple TDMA scheme that allocates the time resource uniformly to both sensors, i.e., $n_{1}\equiv n_{2}$.
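A single Monte Carlo run of the setup above can be sketched in Python as follows. Any policy that maps $\mathbf{A}_{k-1}$ to $n_1$ can be plugged in; the uniform TDMA benchmark is shown as an example. The error model is the same assumed AWGN form of (6) as before. The names `simulate` and `uniform_policy` are hypothetical, and Python's `random` module stands in for whatever random source the original simulations used.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, int]


def fbl_error(n: int, snr_db: float, d_bits: int = 16) -> float:
    """Eq. (6) under the assumed AWGN forms C = log2(1+SNR), V = 1 - (1+SNR)^-2."""
    if n <= 0:
        return 1.0
    snr = 10 ** (snr_db / 10)
    arg = math.sqrt(n / (1 - (1 + snr) ** -2)) * (math.log2(1 + snr) - d_bits / n) * math.log(2)
    return 0.5 * math.erfc(arg / math.sqrt(2))


def simulate(policy: Callable[[State], int], snr_db: Sequence[float], k_periods: int = 500,
             n_max: int = 500, d_bits: int = 16, seed: int = 0) -> List[State]:
    """One run: returns the AoI states A_1, ..., A_K, starting from A_0 = [1, 1]."""
    rng = random.Random(seed)
    aoi: State = (1, 1)
    trace: List[State] = []
    for _ in range(k_periods):
        n1 = policy(aoi)  # allocation decided from A_{k-1}
        e1 = fbl_error(n1, snr_db[0], d_bits)
        e2 = fbl_error(n_max - n1, snr_db[1], d_bits)
        # Each delivery succeeds with probability 1 - eps; Eq. (1) then resets the AoI to 1
        aoi = (1 if rng.random() >= e1 else aoi[0] + 1,
               1 if rng.random() >= e2 else aoi[1] + 1)
        trace.append(aoi)
    return trace


def uniform_policy(aoi: State) -> int:
    """Benchmark TDMA: n1 = n2 = N_max / 2 regardless of the AoI state (N_max = 500)."""
    return 250
```

Repeating `simulate` with different seeds gives the Monte Carlo ensemble. At very high SNR every delivery succeeds and the trace stays at $[1,1]$.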

IV-B Results and Analysis

To verify that $A_{\max}=8$ suffices for $\text{Prob}(A_{m}(k)>A_{\max})\approx 0,\forall k\leq K$ to hold, we examine scatter plots of $\mathbf{A}$ generated by the tested policies. For example, Fig. 1 shows the system AoI states under the different policies in Scenario 1. Neither $A_{1}(k)$ nor $A_{2}(k)$ ever exceeds 6 over the simulation traces. This supports our assumption, so $\mathcal{P}^{\prime}_{\text{opt}}=\mathcal{P}_{\text{opt}}$ holds approximately. The results for Scenarios 2–4 are similar.

Next, we compare the MDP-optimal and one-step-optimal policies in Fig. 2. It shows the blocklengths assigned to sensor 1 in the different scenarios by the long-term optimal policy ($n_{1}^{\text{lt}}$, red) and by the one-step optimal policy ($n_{1}^{\text{os}}$, blue). When the two sensors have different channel conditions (Scenarios 1–3), the long-term optimal policy obtained by solving the MDP generally reserves more blocklength than the one-step optimum for the sensor with the better channel (sensor 2). When both sensors have the same SNR (Scenario 4), this difference becomes negligible. Note that both benchmarks, i.e., the PER-minimizing strategy and the equal blocklength allocation policy, are independent of the current system AoI $\mathbf{A}$; their allocations are listed in Tab. II.

We then evaluate the performance of both policies together with the benchmarks. First, for each individual 500-period run, we calculate the long-term discounted AoI

$$D=\sum_{k=1}^{K}\gamma^{k-1}|\mathbf{A}_{k}|,\tag{18}$$

Then, for each Monte Carlo set, we compute the average discounted AoI over the 500 runs and subtract the lower bound $D_{\text{lower}}\approx 20$, i.e., the value of $D$ when $\mathbf{A}_{k}\equiv[1,1]$:

$$\Delta D=D-D_{\text{lower}}=D-\sum_{k=1}^{K}2\gamma^{k-1}\approx D-20.\tag{19}$$

We also calculate the variance of $D$ over the 500 runs. The long-term performance results are listed in Tab. III.

In addition, we study the undiscounted AoI performance. For every policy in each scenario, we track the instantaneous system AoI $|\mathbf{A}|$ over $500\times 500$ simulated periods and calculate its mean and variance, which are listed in Tab. IV. As before, the lower bound is subtracted from the mean value for a more intuitive comparison:

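$$\Delta\overline{|\mathbf{A}|}=\overline{|\mathbf{A}|}-|\mathbf{A}|_{\text{lower}}=\overline{|\mathbf{A}|}-2.\tag{20}$$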
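The metrics (18) to (20) can be computed from a recorded trace of AoI states, as in the Python sketch below. The function names are hypothetical. Because subtraction is linear, averaging these per-run gaps over the Monte Carlo runs gives the same $\Delta\bar{D}$ as subtracting $D_{\text{lower}}$ from the averaged $D$. With equal run lengths, the same holds for the mean undiscounted AoI.

```python
from typing import Sequence, Tuple


def discounted_aoi(trace: Sequence[Tuple[int, int]], gamma: float = 0.9) -> float:
    """Eq. (18): D = sum_{k=1}^{K} gamma^(k-1) |A_k| for one run A_1, ..., A_K."""
    return sum(gamma ** k * (a1 + a2) for k, (a1, a2) in enumerate(trace))


def aoi_gaps(trace: Sequence[Tuple[int, int]], gamma: float = 0.9) -> Tuple[float, float]:
    """Per-run versions of (19) and (20): distances to the all-[1, 1] lower bounds."""
    d_lower = discounted_aoi([(1, 1)] * len(trace), gamma)  # about 20 for gamma=0.9, K=500
    mean_aoi = sum(a1 + a2 for a1, a2 in trace) / len(trace)
    return discounted_aoi(trace, gamma) - d_lower, mean_aoi - 2
```

For instance, the two-period trace $[2,1],[1,1]$ gives $D=3+0.9\cdot 2=4.8$ against $D_{\text{lower}}=3.8$. The result is $\Delta D=1$ and $\Delta\overline{|\mathbf{A}|}=0.5$.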

In Scenarios 1–3, the MDP approach achieves the lowest mean system AoI of the four methods under both the discounted and the undiscounted metric. It also achieves the lowest variance of the undiscounted AoI in these scenarios. Its advantage is most pronounced when the channels are poor and differ between sensors. In Scenario 4, where both sensors share the same SNR, all policies perform almost identically. The naïve TDMA approach with equal blocklength allocation fails to deliver satisfactory performance in most scenarios. Interestingly, one-step AoI minimization hardly improves on the PER minimization method. This suggests that optimizing AoI locally in each period, while ignoring the impact of the current decision on future system states, gives up most of the achievable performance gain.

V Further Discussions

V-A Fading Channels

So far in this study we have assumed non-fading channels for simplicity. In practical applications, random fluctuations of the channel conditions must be taken into account. In this case, the terms $\varepsilon_{1}(\mathbf{A}_{k-1})$ and $\varepsilon_{2}(\mathbf{A}_{k-1})$ in (3) and (12) must be replaced by their expectations $\mathbb{E}\left\{\varepsilon_{1}\right\}$ and $\mathbb{E}\left\{\varepsilon_{2}\right\}$, respectively. Nevertheless, as long as the CSI is available at the cloud server or can be measured with sufficient accuracy, both expectations can be calculated straightforwardly, and the deployment of our proposed methods is unaffected.
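As one way to compute $\mathbb{E}\{\varepsilon_m\}$, the Python sketch below averages the approximation (6) over random SNR draws. Two assumptions are not in the text. First, fading is taken as Rayleigh block fading, so the instantaneous SNR is exponentially distributed around its mean. Second, the AWGN forms $\mathcal{C}=\log_2(1+\text{SNR})$ and $V=1-(1+\text{SNR})^{-2}$ are used as before. The function names and the sample count are hypothetical.

```python
import math
import random


def fbl_error_linear(n: int, snr: float, d_bits: int = 16) -> float:
    """Eq. (6) for a linear SNR, with assumed C = log2(1+SNR), V = 1 - (1+SNR)^-2."""
    dispersion = 1 - (1 + snr) ** -2
    if n <= 0 or dispersion <= 0:
        return 1.0  # degenerate case (e.g. SNR = 0): treat as certain failure
    arg = math.sqrt(n / dispersion) * (math.log2(1 + snr) - d_bits / n) * math.log(2)
    return 0.5 * math.erfc(arg / math.sqrt(2))


def expected_error_rayleigh(n: int, mean_snr_db: float, d_bits: int = 16,
                            samples: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of E{eps}, assuming exponentially distributed SNR (Rayleigh)."""
    rng = random.Random(seed)
    mean_snr = 10 ** (mean_snr_db / 10)
    total = 0.0
    for _ in range(samples):
        total += fbl_error_linear(n, rng.expovariate(1.0 / mean_snr), d_bits)
    return total / samples
```

Deep fades dominate the expectation. At a mean SNR of $-6$ dB with 300 symbols, the averaged error rate is far larger than the non-fading value at the same SNR. This is why the expectation, not the mean-SNR error rate, has to enter (3) and (12).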

V-B Computational Complexity of the MDP Method

Obtaining the optimal $\mathbf{n}$ from a trained Q-matrix according to (17) is a simple table lookup with negligible computational effort. However, the iterative Q-learning process that trains the Q-matrix according to Algorithm 1 has a time complexity of $\mathcal{O}(IJl_{\max})$, where in the $M$-sensor case $I=A_{\max}^{M}$ and $J\sim N_{\max}^{M}$. The learning effort therefore grows exponentially with $M$, which makes deploying the proposed long-term AoI optimization method at the scale of practical sensor networks a major challenge. This is especially true when long-term variations of the channels are taken into account. Fast solvers for the MDP (15) are therefore required. Possible approaches include fast online learning, clustering sensors with similar CSI, and heuristic algorithms.

V-C Multi-Hop Clustering

In massive Machine-Type Communications (mMTC) scenarios such as the Internet of Things (IoT) and sensor networks, our previous works have advocated a multi-hop architecture. In this architecture, devices are grouped into clusters, each with a head that relays messages for the other cluster members [16, 17]. It has been shown that appropriate clustering and head selection are critical for controlling the uplink message delay. The AoI at the server is likely to be significantly affected by the same factors, which merits further study.

VI Conclusion

In this paper, we have taken a first step toward bridging two significant research areas: the Age of Information and finite blocklength information theory. Our study begins with a novel AoI model for sensor networks, in which the system AoI is formulated as a function of the sensors' transmission error rates. By linking the error rates to the blocklength assignment, we propose to optimize the system in terms of both the instantaneous undiscounted AoI and the long-term discounted AoI. The former can be solved straightforwardly. The latter is an infinite-state MDP that can be approximated with good accuracy by a finite-state version and solved with an RL technique. Our simulations demonstrate that the long-term optimization method outperforms all other methods. As expected, the one-step optimizer fails to deliver a significant performance gain over the error-rate-oriented benchmark. We have also outlined several potential extensions of this work for future study, each with a brief discussion.

Acknowledgment

This work has been supported by the Federal Ministry of Education and Research (BMBF) of the Federal Republic of Germany, in scope of the project “5GANG” (Grant Number 16KIS0725K). The authors alone are responsible for the content of this paper.

Bibliography


[1] J. Yick, B. Mukherjee, and D. Ghosal, “Wireless sensor network survey,” Computer Networks, vol. 52, no. 12, pp. 2292–2330, 2008.
[2] K. Islam, W. Shen, and X. Wang, “Wireless sensor network reliability and security in factory automation: A survey,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 42, no. 6, pp. 1243–1256, 2012.
[3] S. Kaul, R. Yates, and M. Gruteser, “Real-time status: How often should one update?,” in Proc. IEEE Conf. Comput. Commun. (INFOCOM), pp. 2731–2735, Mar. 2012.
[4] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
[5] K. Huang, W. Liu, M. Shirvanimoghaddam, Y. Li, and B. Vucetic, “Real-time remote estimation with hybrid ARQ in wireless networked control,” arXiv preprint arXiv:1903.12472.
[6] Z. Jiang, B. Krishnamachari, X. Zheng, S. Zhou, and Z. Niu, “Timely status update in wireless uplinks: Analytical solutions with asymptotic optimality,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 3885–3898, Apr. 2019.
[7] Y.-P. Hsu, “Age of information: Whittle index for scheduling stochastic arrivals,” in Proc. IEEE Int’l Symp. Info. Theory, Jun. 2018.
[8] I. Kadota, A. Sinha, E. Uysal-Biyikoglu, R. Singh, and E. Modiano, “Scheduling policies for minimizing age of information in broadcast wireless networks,” IEEE/ACM Trans. Netw., vol. 26, pp. 2637–2650, Dec. 2018.