Optimal Status Updating with a Finite-Battery Energy Harvesting Source

Baran Tan Bacinoglu; Yin Sun; Elif Uysal; and Volkan Mutlu

arXiv:1905.06679·cs.IT·May 23, 2019


TL;DR

This paper analyzes optimal update policies for energy-harvesting sensors with finite batteries, proving that threshold-based policies are optimal and showing that increasing battery size significantly reduces age penalty, especially at small capacities.

Contribution

It establishes the structural optimality of monotone threshold policies for finite-battery energy harvesting sources and develops an algorithm to compute these thresholds.

Findings

01

Optimal policies are monotone threshold policies based on battery level and age.

02

Increasing battery capacity notably reduces age penalty, especially from one to two units of energy.

03

Over half of the potential age reduction occurs when increasing battery size from one to two.

Abstract

We consider an energy harvesting source equipped with a finite battery, which needs to send timely status updates to a remote destination. The timeliness of status updates is measured by a non-decreasing penalty function of the Age of Information (AoI). The problem is to find a policy for generating updates that achieves the lowest possible time-average expected age penalty among all online policies. We prove that one optimal solution of this problem is a monotone threshold policy, which satisfies (i) each new update is sent out only when the age is higher than a threshold and (ii) the threshold is a non-increasing function of the instantaneous battery level. Let $τ_{B}$ denote the optimal threshold corresponding to the full battery level $B$ , and $p (\cdot)$ denote the age-penalty function, then we can show that $p (τ_{B})$ is equal to the optimum objective value, i.e., the minimum…

Tables

Table 1. Optimal thresholds for different battery sizes for $\mu_{H}=1$

	$\tau_{1}$	$\tau_{2}$	$\tau_{3}$	$\tau_{4}$	$\bar{\Delta}_{\pi^{*}}$
$B = 1$	0.90	-	-	-	0.90
$B = 2$	1.5	0.72	-	-	0.72
$B = 3$	1.5	1.2	0.64	-	0.64
$B = 4$	1.5	1.2	0.86	0.604	0.604

Equations

$$\Delta(t) = t - U(t),$$

$$\Delta(t) = t - \max\{Z_{k} : Z_{k} \leq t\},$$

$$E(Z_{k}) = E(Z_{k}^{-}) - 1,$$

$$E(t) = \min\{E(Z_{k}) + H(t) - H(Z_{k}),\, B\},$$

$$\bar{p} = \limsup_{T \to \infty} \frac{1}{T}\, E\left[\int_{0}^{T} p(\Delta(t))\, dt\right].$$

$$\min_{\pi \in \Pi^{\mathsf{online}}} \bar{p}_{\pi}.$$

$$Z_{k+1} = \inf\{t \geq Z_{k} : \Delta(t) \geq \tau_{E(t)}\},$$

$$\min_{\pi \in \Pi^{\mathsf{online}}} E\left[\int_{0}^{\infty} e^{-\alpha(t-a)}\, p(\Delta(t))\, dt\right].$$

$$P(Y_{i} \leq x) = 1 - \sum_{v=0}^{i-1} \frac{1}{v!}\, e^{-\mu_{H} x} (\mu_{H} x)^{v},$$

$$\Pr(X_{k+1} \leq x \mid E(Z_{k}) = j) = \begin{cases} 0, & \text{if } x < \tau_{B}, \\ \Pr(Y_{m-j} \leq x), & \text{if } \tau_{m} \leq x < \tau_{m-1},\ \forall m \in \{2, \ldots, B\}, \\ \Pr(Y_{1-j} \leq x), & \text{if } \tau_{1} \leq x, \end{cases}$$

$$\Pr(E(Z_{k+1}) = i \mid E(Z_{k}) = j) = \begin{cases} \Pr(Y_{B-j} \leq \tau_{B-1}), & \text{if } i = B-1, \\ \Pr(Y_{1+i-j} \leq \tau_{i}) - \Pr(Y_{2+i-j} \leq \tau_{i+1}), & \text{if } i < B-1, \end{cases}$$

$$p(\tau_{B}^{*}) = \bar{p}_{\pi^{*}} = \min_{\pi \in \Pi^{\mathsf{online}}} \bar{p}_{\pi}.$$

$$Z_{k+1} = \inf\{t \geq Z_{k} : p(\Delta(t)) \geq p(\tau_{E(t)}^{*})\}$$

$$\Pr(X \leq x) = \begin{cases} 0, & \text{if } x < \tau_{B}, \\ F_{i}(x), & \text{if } \tau_{i} \leq x < \tau_{i-1},\ \forall i \in \{2, \ldots, B\}, \\ F_{1}(x), & \text{if } \tau_{1} \leq x, \end{cases}$$

$$\frac{\partial}{\partial \tau_{i}} E[X^{2}] = 2 \tau_{i} \frac{\partial}{\partial \tau_{i}} E[X].$$

$$\frac{\partial}{\partial \tau_{i}} E[X^{2} \mid E = j] = 2 \tau_{i} \frac{\partial}{\partial \tau_{i}} E[X \mid E = j],$$

$$\bar{\Delta} = \frac{\frac{1}{2}(\mu_{H} \tau_{1})^{2} + e^{-\mu_{H} \tau_{1}}(\mu_{H} \tau_{1} + 1)}{\mu_{H}\left(\mu_{H} \tau_{1} + e^{-\mu_{H} \tau_{1}}\right)},$$

$$\bar{\Delta} = \frac{\frac{\alpha_{2}^{2}}{2} + e^{-\alpha_{2}}\left[\alpha_{2} + 1 + \rho_{1}(\alpha_{2}^{2} + 2\alpha_{2} + 2)\right] - e^{-\alpha_{1}}\left[\alpha_{1} + 1 + \rho_{1}(\alpha_{1}^{2} + \alpha_{1} + 1)\right]}{\mu_{H}\left(\alpha_{2} + e^{-\alpha_{2}}\left[1 + \rho_{1}(\alpha_{2} + 1)\right] - e^{-\alpha_{1}}\left[1 + \rho_{1}\alpha_{1}\right]\right)},$$

$$\rho_{1} = \frac{e^{-\alpha_{1}}}{1 - e^{-\alpha_{1}} \alpha_{1}},$$

$$\alpha_{1} = \mu_{H} \tau_{1}, \quad \alpha_{2} = \mu_{H} \tau_{2}.$$

$$m_{1}(\tau_{1}, \tau_{2}, \ldots, \tau_{B}) = \sum_{j=0}^{B-1} E[X \mid E = j] \Pr(E = j),$$

$$m_{2}(\tau_{1}, \tau_{2}, \ldots, \tau_{B}) = \sum_{j=0}^{B-1} E[X^{2} \mid E = j] \Pr(E = j),$$

$$2 \tau_{B}\, m_{1}(\tau_{1}, \tau_{2}, \ldots, \tau_{B}) - m_{2}(\tau_{1}, \tau_{2}, \ldots, \tau_{B}) = 0,$$

$$p(\tau_{B}^{*}) = \bar{p}_{\pi^{*}}.$$

$$\min_{\pi \in \Pi^{\mathsf{online}}} E\left[\int_{0}^{\infty} e^{-\alpha t}\, p(\Delta(t))\, dt\right].$$

$$\min_{\pi \in \Pi^{\mathsf{online}}} E\left[\int_{a}^{\infty} e^{-\alpha(t-a)}\, p(\Delta(t))\, dt \,\Big|\, \mathcal{F}_{a}\right].$$

$$J_{\alpha}(\Delta(a), E(a)) := \min_{\pi \in \Pi^{\mathsf{online}}} E\left[\int_{a}^{\infty} e^{-\alpha(t-a)}\, p(\Delta(t))\, dt \,\Big|\, \mathcal{F}_{a}\right] = \min_{\pi \in \Pi^{\mathsf{online}}} E\left[\int_{a}^{\infty} e^{-\alpha(t-a)}\, p(\Delta(t))\, dt \,\Big|\, \Delta(a), E(a)\right]$$

$$\min_{(Z_{1}, \dots, Z_{k} = a, Z_{k+1}, \dots) \in \Pi^{\mathsf{online}}}$$

Full text

Baran Tan Bacinoglu1, Yin Sun3, Elif Uysal1, and Volkan Mutlu1

1METU, Ankara, Turkey, 3Auburn University, AL, USA

Abstract

We consider an energy harvesting source equipped with a finite battery, which needs to send timely status updates to a remote destination. The timeliness of status updates is measured by a non-decreasing penalty function of the Age of Information (AoI). The problem is to find a policy for generating updates that achieves the lowest possible time-average expected age penalty among all online policies. We prove that one optimal solution of this problem is a monotone threshold policy, which satisfies (i) each new update is sent out only when the age is higher than a threshold and (ii) the threshold is a non-increasing function of the instantaneous battery level. Let $\tau_{B}$ denote the optimal threshold corresponding to the full battery level $B$ , and $p(\cdot)$ denote the age-penalty function, then we can show that $p(\tau_{B})$ is equal to the optimum objective value, i.e., the minimum achievable time-average expected age penalty. These structural properties are used to develop an algorithm to compute the optimal thresholds. Our numerical analysis indicates that the improvement in average age with added battery capacity is largest at small battery sizes; specifically, more than half the total possible reduction in age is attained when battery storage increases from one transmission’s worth of energy to two. This encourages further study of status update policies for sensors with small battery storage.

Index Terms:

Age of information; age-energy tradeoff; non-linear age penalty; threshold policy; optimal threshold; energy harvesting; battery capacity.

I Introduction

Footnote: This paper was presented in part at IEEE ISIT 2018 [1]. This work was supported in part by NSF grant CCF-1813050, ONR grant N00014-17-1-2417 and TUBITAK grant no 117E215.

The Age of Information (AoI), or simply the age, was proposed in [2, 3] as a performance metric that measures the freshness of information in status-update systems. For a flow of information updates sent from a source to a destination, the age is defined as the time elapsed since the newest update available was generated at the source. That is, if $U(t)$ is the largest among the time-stamps of all packets received by time $t$ , the age is defined as:

$$\Delta(t) = t - U(t).$$

AoI is a particularly relevant performance metric for status-update applications that have growing importance in remote monitoring [4, 5], machine-type communication, industrial manufacturing, telerobotics, Internet of Things and social networks.

In many applications, the timeliness of status updates not only determines the quality of service, but also affects other design goals such as the controllability of a dynamical system that relies on the updates of sensing and control signals. AoI quantifies the timeliness of status updates from the perspective of the receiver, unlike throughput- or delay-based measures, which are channel-centric. Moreover, AoI is also related to measures such as the time-average mean-square error (MSE) for remote estimation. An example is the result in [6], which showed that MSE-minimizing remote estimation of a Wiener process reduces to an AoI optimization problem when the sampling times at the transmitting side are independent of the process. While AoI optimization based on linear functions of the age $\Delta(t)$ is a relevant performance goal for most scenarios, the performance of some applications may be related to non-linear functions of the age. For example, the change in the value of stale data can become less or more significant as its age grows. In such cases, the penalty of data staleness can be modelled as a non-linear function $p(\Delta(t))$ of the age $\Delta(t)$, i.e., the age-penalty. This function is chosen to be non-decreasing so that the age-penalty can decrease only when the age decreases. Accordingly, the optimization of the age-penalty parallels average AoI optimization, although it may have distinct optimality conditions.

Ideally, AoI is minimized when status updates are frequent and fresh. That is, good AoI performance requires packets with low delay to be received regularly. One limitation in the minimization of AoI is a constraint on the long-term average update rate, which may be due to an average power budget for the channel over which status updates are sent. A stricter constraint is to keep a detailed budget on the number of status updates by allowing an update transmission only when a replenishable resource becomes available. This is the case in energy harvesting communication systems, where each update consumes a certain amount of the harvested energy, if available. In the related literature on AoI optimization for energy harvesting communication systems, the energy harvesting process is modeled as an arrival process where each energy arrival carries the energy required for an update [7, 8, 9, 10, 11, 12, 13, 14]. The goal of AoI optimization in such formulations is to find an optimal timing of update instants in order to minimize the average AoI while transmission opportunities are subject to the availability of energy. Energy arrivals occur irregularly or randomly, which models an energy harvesting scenario. The main challenge in optimizing the time-average expected age under random energy arrivals is that in the case of an energy outage (empty battery), the transmitter must idle for an unknown duration of time. When such random idle durations are inevitable, they make it harder to regulate the inter-update durations. Another challenge is due to the finiteness of battery sizes. Theoretically, it is possible to achieve asymptotically optimal average AoI by employing simple schemes assuming infinite [8] or sufficiently large [9] battery sizes. However, when the battery size is comparable to the energy required per update, such simple schemes do not offer performance guarantees. Consequently, it is important to explore optimal policies under such regimes where performance depends heavily on the statistics of energy arrivals and the battery size.

This study is motivated by the aforementioned challenges of optimizing AoI in energy harvesting systems, capturing both the randomness of energy arrivals and finite energy storage capability. In addition to capturing both challenges, we go further by optimizing not only the average age itself but also a more general age-penalty function $p(\Delta(t))$ that is not necessarily linear (see [15, 16, 17, 18, 19, 20]). Hence, the problem considered in this study is an age-penalty optimization problem where status updates consume discrete units of energy that are randomly generated, i.e., harvested, and the number of energy units that can be stored at a time is limited by a finite value called the battery capacity.

Under the assumption of Poisson energy arrivals, we characterize the structure of solutions to the age-penalty optimization problem. The structure of the optimal solution reflects a basic intuition about the optimal strategy: updates should be sent when the update is valuable (when the age is high) and the energy is cheap (the battery level is high). We show that the optimal solution is given by a stopping rule according to which an update is sent when its immediate cost is surpassed by the expected future cost. For Poisson energy arrivals, this stopping rule can be found in the set of policies that we refer to as monotone threshold policies. Monotone threshold policies have the property that each update is sent only when the age is higher than a certain threshold, which is a non-increasing function of the instantaneous battery level. One of our key results is that the value of the age-penalty function at the optimal threshold corresponding to the full battery level is exactly equal to the optimal value of the average age-penalty.

I-A Contributions

The contributions of this paper can be summarized as follows:

•

We formulate the general average age-penalty optimization problem for sending status updates from an energy harvesting source. This generalizes the AoI optimization goal in the prior studies [8, 9, 11, 12, 1] to a non-linear function of age. In addition to the generalization on the objective, the optimization is carried out over a more general policy space defined only using the causality assumption. We prove that solutions to this general optimization problem can be found among threshold-type policies.

•

We show that, for optimal monotone threshold policies (whose thresholds are non-increasing in the battery level), the value of the penalty function at the threshold corresponding to the highest battery level is equal to the minimum value of the average age-penalty. As this optimal threshold is also the minimum of the optimal thresholds at different battery levels, this implies that inter-update durations under such a policy never fall below the minimum value of the average age-penalty.

•

For the case when the age-penalty function is linear, i.e., the average AoI minimization problem, we provide the optimal thresholds for integer battery sizes up to $4$. These results show that the most significant decrease in the minimum average AoI happens when the battery capacity is increased from one unit (capable of holding one packet transmission’s worth of energy) to two units. The minimum achievable average AoI with a battery size of 4 units is only about $10\%$ larger than the ultimate minimum average AoI with infinite battery capacity. That is a promising result for small sensor systems.

•

For the average AoI minimization problem, we provide an algorithm that can find near-optimal policies achieving average AoI values arbitrarily close to the optimal values for any given battery capacity. This algorithm provides a methodical way to derive near-optimal policies utilizing analytical results.

I-B Paper Organization

The rest of the paper is organized as follows. In Section II, the related work is discussed and summarized. In Section III, the system model and the formulation of the AoI optimization problem are described. In Section IV, the main results on the structural properties of the solution to the AoI optimization problem are shown and an algorithm to derive solutions for arbitrary integer battery sizes is provided. In Section V, the numerical results validating the analytical results and showing optimal solutions for integer battery sizes up to $4$ are presented. In Section VI, the paper is concluded by summarizing the results and insights obtained over the course of this study.

II Related Work

Several studies on AoI considered this performance metric under various queueing system models, comparing service disciplines and queue management policies (e.g., [21, 22, 23, 24, 25, 26, 27, 28, 29]). A common observation in these studies was that many queueing/service policies that are throughput and delay optimal are often suboptimal with respect to AoI, while AoI-optimal policies can be throughput and delay optimal at the same time. This showed that AoI optimization is quite different from optimization with respect to classical performance metrics, and it required many queueing models to be re-addressed with respect to age-related objectives. Moreover, queueing system formulations typically assume no precise control on the transmission or generation times of status updates. However, such control is important for age optimization [16, 17].

A direct control on the generation times of status updates is possible through a control algorithm that runs at the source. This is the “generate-at-will” assumption formulated in [7, 10] and studied in [16, 6, 17]. In [7], the problem of AoI optimization for a source that is constrained by an arbitrary sequence of energy arrivals was studied. In [10], AoI optimization was considered for a source that harvests energy at a constant rate under stochastic delays experienced by the status update packets. The results in these studies showed the suboptimality of work-conserving transmission schemes. Often, introducing a waiting time before sending the next update is optimal. That is, for maximum freshness, one may sometimes send updates at a rate lower than the allowed rate, which may be counter-intuitive at first sight.

The problem in [7] was extended in [8] to a continuous-time formulation with Poisson energy arrivals, finite energy storage (battery) capacity, and random packet errors in the channel. An age-optimal threshold policy was proposed for the unit battery case, and the achievable AoI for arbitrary battery size was bounded for a channel with a constant packet erasure probability. The concurrent study in [9], limited to the special cases of unit battery capacity and infinite battery capacity, computed the same threshold-type policies under these assumptions. These special cases were also investigated for noisy channels with a constant packet erasure probability in [13, 14]. The case of a battery with 2-unit capacity was studied in [11], and the optimal policies for this case were characterized as threshold-type policies similar to the optimal policy for unit battery capacity introduced in [8] and [9]. Optimal policies for arbitrary battery sizes were characterized via a Lagrangian approach in [12] and using optimal stopping theory in [1].

III System Model

Consider an energy harvesting transmitter that sends update packets to a receiver, as illustrated in Fig. 1. Suppose that the transmitter has a finite battery which is capable of storing up to $B$ units of energy. Similar to [8], we assume that the transmission of an update packet consumes one unit of energy. Harvested energy arrives in units according to a Poisson process with rate $\mu_{H}$. Let $E(t)$ denote the amount of energy stored in the battery at time $t$ such that $0\leq E(t)\leq B$. The timing of status updates is controlled by a sampler which can monitor the battery level $E(t)$ for all $t$. We assume that the initial age and the initial battery level are zero, i.e., $\Delta(0)=0$ and $E(0)=0$.

Let $H(t)$ and $A(t)$ denote the number of energy units that have arrived during $[0,t]$ and the number of updates sent out during $[0,t]$ , respectively. Hence, $\{H(t),t\geq 0\}$ and $\{A(t),t\geq 0\}$ are two counting processes. If an energy unit arrives when the battery is full, it is lost because there is no capacity to store it.

The system starts to operate at time $t=0$. Let $Z_{k}$ denote the generation time of the $k$-th update packet such that $0=Z_{0}\leq Z_{1}\leq Z_{2}\leq\ldots$. An update policy is represented by a sequence of update instants $\pi=(Z_{0},Z_{1},Z_{2},...)$. Let $X_{k}$ represent the inter-update duration between updates $k-1$ and $k$, i.e., $X_{k}=Z_{k}-Z_{k-1}$. In many status-update systems (e.g., a sensor reporting temperature [30]), update packets are small in size and are only sent out sporadically. Typically, the duration for transmitting a packet is much smaller than the difference between two subsequent update times, i.e., the $X_{k}$'s are typically large compared to the duration of a packet transmission. With such systems in mind, in our model, we will approximate the packet transmission durations as zero. In other words, once the $k$-th update is generated and sent out at time $t=Z_{k}$, it is immediately delivered to the receiver. Hence, the age of information $\Delta(t)$ at any time $t\geq 0$ is

$$\Delta(t) = t - \max\{Z_{k} : Z_{k} \leq t\},$$

which satisfies $\Delta(t)=0$ at each update time $t=Z_{k}$ . Because an update costs one unit of energy, the battery level reduces by one upon each update, i.e.,

$$E(Z_{k}) = E(Z_{k}^{-}) - 1,$$

where $Z_{k}^{-}$ is the time immediately before the $k$ -th update. Further, because the battery size is $B$ , the battery level evolves according to

$$E(t) = \min\{E(Z_{k}) + H(t) - H(Z_{k}),\, B\},$$

when $t\in[Z_{k},Z_{k+1})$ is between two subsequent updates.
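The age and battery dynamics described above can be replayed on a concrete sample path. The following is a minimal sketch rather than anything taken from the paper: the function names `age` and `battery_levels_after_updates` are hypothetical, and processing an energy arrival before an update that occurs at exactly the same instant is an assumption made only to break ties.

```python
from bisect import bisect_right


def age(t, update_times):
    """Age at time t: t minus the latest update instant Z_k with Z_k <= t.

    update_times must be sorted and must include Z_0 = 0.
    """
    k = bisect_right(update_times, t)
    return t - update_times[k - 1]


def battery_levels_after_updates(arrival_times, update_times, capacity):
    """Replay energy arrivals and updates; return the battery level after each update.

    arrival_times: instants at which one unit of energy arrives.
    update_times:  instants Z_1, Z_2, ... (Z_0 = 0 is excluded, it uses no energy).
    capacity:      battery size B; arrivals at a full battery are lost.
    Raises ValueError if an update is attempted with an empty battery,
    i.e. if the sample path is not energy-causal.
    """
    # Assumption: at equal times an arrival (kind 0) is processed before an update (kind 1).
    events = sorted([(t, 0) for t in arrival_times] + [(z, 1) for z in update_times])
    level = 0  # the model assumes E(0) = 0
    levels = []
    for time, kind in events:
        if kind == 0:
            level = min(level + 1, capacity)  # overflow energy is discarded
        else:
            if level < 1:
                raise ValueError(f"update at t={time} violates energy causality")
            level -= 1  # each update costs one unit of energy
            levels.append(level)
    return levels
```

For instance, with arrivals at 0.5, 0.7, 0.9 and 2.0, updates at 1.0, 1.2 and 2.5, and $B=2$, the third arrival is lost because the battery is already full when it occurs.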

In terms of the energy available to the scheduler, we define update policies that do not violate causality as follows:

Definition 1.

A policy $\pi$ is said to be energy-causal if updates only occur when the battery is non-empty, that is, $E(Z_{k}^{-})\geq 1$ for each packet $k$ .

Another restriction on update instants is due to the information available to the scheduler, which we define as follows:

Definition 2.

Information on the energy arrivals and updates by time $t$ is represented by the filtration $\mathcal{F}_{t}=\sigma(\{(H(t^{\prime}),A(t^{\prime})),0\leq t^{\prime}<t\})$, which is the $\sigma$-field generated by the sequence of energy arrivals and updates, i.e., $\{(H(t^{\prime}),A(t^{\prime})),0\leq t^{\prime}<t\}$. (Footnote 1: Note that the filtration is right continuous as both $H(t)$ and $A(t)$ are right continuous.)

Similar to the definition of energy-causal policies, in the policy space that we will consider we merely assume the causality of available information besides energy causality. To formulate this assumption, we use the definition of $\mathcal{F}_{t}$. In terms of the information available to the scheduler, a random time instant $\theta$ does not violate causality if and only if $\left\{\theta\leq t\right\}\in\mathcal{F}_{t}$ for all $t\geq 0$. We will refer to such random instants as Markov times [31] and consider update times as Markov times based on the filtration $\mathcal{F}_{t}$ in general. Notice that such update times do not have to be finite; however, we will refer to Markov times that are also finite with probability 1 (w.p.1.) as stopping times [31]. For a policy trying to regulate age, it is legitimate to assume that update instants are always finite w.p.1., as otherwise the age may grow unbounded with a positive probability. With this in mind, we will consider only the update instants that are stopping times.

Accordingly, we can define the online update policies combining the causality assumptions on available energy and information as follows:

Definition 3.

A policy is said to be online if (i) it is energy-causal and (ii) no update instant is determined based on future information, i.e., all update times are stopping (finite Markov) times based on $\mathcal{F}_{t}$: $Z_{k}$ is finite w.p.1. and $\left\{Z_{k}\leq t\right\}\in\mathcal{F}_{t}$ for all $t\geq 0$ and $k\geq 1$.

Let $\Pi^{\mathsf{online}}$ denote the set of online update policies. To evaluate the performance of online policies, we consider an age-penalty function $p(\cdot)$ that maps the age $\Delta(t)$ at time $t$ to a penalty $p(\Delta(t))$ that is non-decreasing in the age. This function is defined as follows:

Definition 4.

A function $p:[0,\infty)\rightarrow[0,\infty)$ of the age is said to be an age-penalty function if

•

$\lim_{\Delta\rightarrow\infty}p(\Delta)=\infty$ .

•

$p(\cdot)$ is a non-decreasing function.

•

$\int_{0}^{\infty}p(t)e^{-\alpha t}dt<\infty$ for all $\alpha>0$ .

Observe that the definition of age-penalty functions covers any non-decreasing function of age that is of sub-exponential order and grows to infinity. (Footnote 2: This is due to the third property in the definition, which is a technical requirement for the proofs.)

The time-average expected value of the age-penalty or simply the average age-penalty can be expressed as

$$\bar{p} = \limsup_{T \to \infty} \frac{1}{T}\, E\left[\int_{0}^{T} p(\Delta(t))\, dt\right].$$

Let $\bar{p}_{\pi}$ denote the average age-penalty achieved by a particular policy $\pi$ . The goal of this paper is to find the optimal update policy for minimizing the average age-penalty, which is formulated as

$$\min_{\pi \in \Pi^{\mathsf{online}}} \bar{p}_{\pi}. \tag{6}$$

IV Main Results

We begin with a result guaranteeing that the space of threshold-type policies (see Definition 5) contains optimal update policies, so we can restrict our attention to these policies when solving (6).

Note that at time $t=Z_{k}$, the age $\Delta(t)$ is equal to $0$. Meanwhile, the battery level $E(t)$ grows as more energy is harvested. In threshold policies, the threshold $\tau_{E(t)}$ changes according to the battery level $E(t)$, and a new sample is taken at the earliest time that the age $\Delta(t)$ reaches or exceeds the threshold $\tau_{E(t)}$. We define such policies as follows:

Definition 5.

When $E(t)\in\{\ell=1,...,B\}$ represents the battery level at time $t$ , an online policy is said to be a threshold policy if there exists $\tau_{\ell}$ for $\ell=1,...,B$ s.t.

$$Z_{k+1} = \inf\{t \geq Z_{k} : \Delta(t) \geq \tau_{E(t)}\}.$$

Note that a policy is said to be stationary if its actions depend only on a current state while being independent of time. An immediate observation is that given $\Delta(t)$ and $E(t)$ threshold policies do not depend on time, hence:

Proposition 1.

All threshold policies are stationary.

Proof.

By definition, the update instants of a threshold policy only depend on the time elapsed since the last update, i.e., $\Delta(t)$ , and the current battery level. ∎

We expect that such stationary policies can minimize $\bar{\Delta}$ among all online policies as energy arrivals follow a Poisson process which is memoryless. Due to the memorylessness of energy arrivals, the evolution of the system can be understood through a renewal type behaviour which suggests that an optimal policy should be stationary.

Indeed, we note the following as the first key result of this paper,

Theorem 1.

There exists a threshold policy that is optimal for solving (6).

Proof.

See Appendix -A. ∎

One significant challenge in the proof of Theorem 1 is that (6) is an infinite time-horizon time-averaged MDP which has an uncountable state space. When the state space is countable, one can analyze an infinite time-horizon time-averaged MDP by making a unichain assumption. However, this method cannot be directly applied when the state space is uncountable. To resolve this, we use a modified version of the “vanishing discount factor” approach [32] to prove Theorem 1 in two steps:

Show that for every $\alpha>0$ , there exists a threshold policy that is optimal for solving

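$$\min_{\pi \in \Pi^{\mathsf{online}}} E\left[\int_{0}^{\infty} e^{-\alpha(t-a)}\, p(\Delta(t))\, dt\right].$$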

Prove that this property also holds when the discount factor $\alpha$ vanishes to zero.

In our search for an optimal policy, we can further reduce the space of policies:

Definition 6.

A threshold policy is said to be a monotone threshold policy if $\tau_{1}\geq\tau_{2}\geq\ldots\geq\tau_{B}$ .

Note that the definition of monotone threshold policies refers only to thresholds that are non-increasing in the battery level, as opposed to the non-decreasing case.

Let $\Pi^{\rm{MT}}$ be the set of monotone threshold policies, then, the following is true:

Theorem 2.

There exists a monotone threshold policy $\pi\in\Pi^{\rm{MT}}$ that is optimal for solving (6).

Proof.

See Appendix -B. ∎

Theorem 2 implies that in the optimal update policy, update packets are sent out more frequently when the battery level is high and less frequently when the battery level is low. This result is quite intuitive: if the battery is full, arriving energy cannot be harvested; if the battery is empty, update packets cannot be transmitted when needed and the age increases. Hence, both battery overflow and outage are harmful. Monotone threshold policies can address this issue. When the battery level $\ell$ is high, the threshold $\tau_{\ell}$ is small to reduce the chance of battery overflow; when the battery level $\ell$ is low, the threshold $\tau_{\ell}$ is high to avoid battery outage.
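The behaviour of a monotone threshold policy can be checked numerically with a small Monte Carlo sketch. The code below is an illustration under stated assumptions, not the algorithm of the paper: the function name `simulate_average_age`, the seed, and the number of simulated updates are all hypothetical choices, and the penalty is taken to be linear, $p(\Delta)=\Delta$. The thresholds passed in the usage note are the $B=2$ row of Table 1.

```python
import random


def simulate_average_age(thresholds, mu, n_updates, seed=0):
    """Estimate the time-average age of a threshold policy by simulation.

    thresholds: [tau_1, ..., tau_B], assumed non-increasing (monotone policy).
    mu:         rate of the Poisson energy arrival process.
    Returns sum(X_k^2 / 2) / sum(X_k), the renewal-reward estimate of average age.
    """
    rng = random.Random(seed)
    capacity = len(thresholds)
    tau = [None] + list(thresholds)  # tau[l] is the threshold at battery level l
    level = 0                        # the model assumes E(0) = 0
    area = 0.0                       # integral of the age over time
    elapsed = 0.0                    # total simulated time
    for _ in range(n_updates):
        age = 0.0                    # the age resets to zero at each update
        while True:
            if level >= 1 and age >= tau[level]:
                break                # threshold already met: update now
            gap = rng.expovariate(mu)  # memoryless time to the next energy arrival
            if level >= 1 and age + gap >= tau[level]:
                age = tau[level]     # threshold is reached before the next arrival
                break
            age += gap
            level = min(level + 1, capacity)  # energy arriving at a full battery is lost
        area += 0.5 * age * age      # area under the sawtooth age curve for this cycle
        elapsed += age
        level -= 1                   # the update consumes one unit of energy
    return area / elapsed
```

Calling `simulate_average_age([1.5, 0.72], 1.0, 200000)` should return a value close to the average age of 0.72 reported for $B=2$ in Table 1, up to simulation noise. Redrawing the exponential gap after a threshold hit is valid only because Poisson arrivals are memoryless.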

For a policy in $\Pi^{\rm{MT}}$, the state $(\Delta(t),E(t))$ spends no measurable amount of time in the region $\Delta(t)\geq\tau_{E(t)}$, because there an update is sent out instantly, reducing the battery level. Otherwise, the battery level is incremented upon energy harvests while the age increases linearly in time. The illustration in Fig. 2 shows the time evolution of the state $(\Delta(t),E(t))$ for policies in $\Pi^{\rm{MT}}$. If the energy level is $E(Z_{k})=j$ upon the previous update, then the inter-update time $X_{k+1}\in[\tau_{m},\tau_{m-1}]$ holds if and only if $m-j$ energy units arrive during the inter-update time. In other words, reaching the battery state $m$ or higher is necessary and sufficient for the next inter-update duration being shorter than some $x$ when $x\in[\tau_{m},\tau_{m-1})$. Let $Y_{i}$ denote the duration required for $i\geq 1$ successive energy arrivals, which obeys the Erlang distribution with rate $\mu_{H}$ and shape parameter $i$,

$$P(Y_{i} \leq x) = 1 - \sum_{v=0}^{i-1} \frac{1}{v!}\, e^{-\mu_{H} x} (\mu_{H} x)^{v},$$

and let $Y_{i}=0$ for $i\leq 0$ .

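Accordingly, for policies in $\Pi^{\rm{MT}}$, the cumulative distribution function (CDF) of inter-update durations can be expressed as

$$\Pr(X_{k+1} \leq x \mid E(Z_{k}) = j) = \begin{cases} 0, & \text{if } x < \tau_{B}, \\ \Pr(Y_{m-j} \leq x), & \text{if } \tau_{m} \leq x < \tau_{m-1},\ \forall m \in \{2, \ldots, B\}, \\ \Pr(Y_{1-j} \leq x), & \text{if } \tau_{1} \leq x. \end{cases} \tag{9}$$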

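From (9), an expression for the transition probability $\Pr(E(Z_{k+1})=i\mid E(Z_{k})=j)$ for $i=0,1,\ldots,B-1$ can be derived. (Footnote 3: Note that the event $E(Z_{k+1})=i$ happens if and only if $X_{k+1}\in[\tau_{i+1},\tau_{i})$, accordingly $\Pr(E(Z_{k+1})=i\mid E(Z_{k})=j)=\Pr(X_{k+1}\leq\tau_{i}\mid E(Z_{k})=j)-\Pr(X_{k+1}\leq\tau_{i+1}\mid E(Z_{k})=j)$.) The resulting expression is

$$\Pr(E(Z_{k+1}) = i \mid E(Z_{k}) = j) = \begin{cases} \Pr(Y_{B-j} \leq \tau_{B-1}), & \text{if } i = B-1, \\ \Pr(Y_{1+i-j} \leq \tau_{i}) - \Pr(Y_{2+i-j} \leq \tau_{i+1}), & \text{if } i < B-1. \end{cases} \tag{10}$$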

Hence, energy states sampled at update instants can be described as a Discrete Time Markov Chain (DTMC) with the transition probabilities in (10) (see Fig. 3). When thresholds are finite, this DTMC is ergodic, as any energy state is reachable from any other energy state in at most $B-1$ steps with positive probability.
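As a rough numerical companion, the chain of post-update energy states can be assembled from the Erlang CDF. The sketch below is one reading of the description above, not the paper's own expression: it assumes the convention $\tau_{0}=\infty$, treats the event $E(Z_{k+1})=i$ as $X_{k+1}\in[\tau_{i+1},\tau_{i})$ with half-open intervals, and uses the standard Erlang CDF. The names `erlang_cdf` and `transition_matrix` and the example thresholds are hypothetical.

```python
import math


def erlang_cdf(i, x, mu):
    """Pr(Y_i <= x) for the sum of i Exp(mu) inter-arrival times; Y_i = 0 if i <= 0."""
    if i <= 0 or math.isinf(x):
        return 1.0
    tail = sum((mu * x) ** n / math.factorial(n) for n in range(i))
    return 1.0 - math.exp(-mu * x) * tail


def transition_matrix(taus, mu):
    """P[j][i] = Pr(E(Z_{k+1}) = i | E(Z_k) = j) for post-update levels 0..B-1.

    taus = [tau_1, ..., tau_B] is assumed non-increasing. With m = i + 1 the
    battery level at the update instant, the update falls in [tau_m, tau_{m-1})
    iff level m is reached before age tau_{m-1} and level m + 1 is not reached
    before age tau_m (assumed reading; tau_0 = infinity).
    """
    B = len(taus)
    tau = [math.inf] + list(taus)  # tau[m] = tau_m, tau[0] = infinity
    P = [[0.0] * B for _ in range(B)]
    for j in range(B):
        for i in range(B):
            m = i + 1
            reach_m = erlang_cdf(m - j, tau[m - 1], mu)
            overshoot = erlang_cdf(m + 1 - j, tau[m], mu) if m < B else 0.0
            P[j][i] = reach_m - overshoot
    return P
```

Under this reading each row sums to one by telescoping, and $\tau_{B}$ never enters the matrix, which matches the later remark that the transition probabilities do not depend on $\tau_{B}$.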

Any optimal policy in $\Pi^{\rm{MT}}$ has the following property:

Theorem 3.

An optimal policy for solving (6) is a monotone threshold policy that satisfies the following

$$p(\tau^{*}_{B})=\bar{p}_{\pi^{*}},$$

where $\pi^{*}$ is a monotone threshold policy solving (6) and $\tau^{*}_{B}$ is its age threshold for the full battery case.

Proof.

See Appendix E. ∎

The result in Theorem 3 exhibits a structural property of optimal policies which also appears in the sampling problem that was studied in [19] . The sampling problem in [19] considered sources without energy harvesting, where the packet transmission times were i.i.d. and non-zero. On the one hand, the optimal sampling policy in Theorem 1 of [19] is a threshold policy on an expected age penalty term, and the threshold is exactly equal to the optimal objective value. On the other hand, a sampling problem for an energy harvesting source with zero packet transmission time is considered in the current paper. The optimal sampling policy in Theorem 3 can be rewritten as

[TABLE]

which is a multi-threshold policy on the age penalty function, each threshold $p(\tau_{\ell}^{*})$ corresponding to a battery level $\ell$. Further, the threshold $p(\tau_{B}^{*})$ associated with a full battery, $E(t)=B$, is equal to the optimal objective value. The results in these two studies are similar to each other. Together, they provide a unified view on optimal sampler design for sources both with and without energy harvesting capability. The proof techniques in these two studies, however, are fundamentally different.

IV-A Average Age Case

If we take the age-penalty function as the identity function, i.e., $p(\Delta)=\Delta$, then (6) becomes the problem of minimizing the time-average expected age. In this case, the result in Theorem 3 implies that, in optimal monotone threshold policies, inter-update durations can be as small as the minimum average AoI only when the battery is full. From results in [8] and [9], we know that the minimum average AoI for the infinite battery case is $\frac{1}{2\mu_{H}}$, and this can be achieved asymptotically using the best-effort scheme in [9] or with a threshold policy [8] where all thresholds are nearly equal to $\frac{1}{\mu_{H}}$. On the other hand, according to Theorem 3, the optimal threshold for the full battery level tends to $\frac{1}{2\mu_{H}}$ as the battery capacity increases. This shows that the optimal monotone threshold policies remain structurally dissimilar to asymptotically optimal policies as the battery capacity approaches infinity. The result is more useful when the battery capacity is finite, as it may lead to the optimal threshold values of the other battery levels. We will use this in an algorithm for finding near-optimal policies for any given integer-sized battery capacity. In addition, the special case of Theorem 3 for average age [1] can be derived from a more general result, which we provide in Lemma 1. This result shows a relation between the partial derivatives with respect to the thresholds for a non-negative random variable that is determined by the thresholds in the same way as the inter-update duration.

Lemma 1.

Suppose $X$ is a r.v. that satisfies the following:

[TABLE]

where $0<\tau_{B}\leq\ldots\leq\tau_{2}\leq\tau_{1}$ and, for each $i\in\{1,\ldots,B\}$, $F_{i}(x)$ is the CDF of a non-negative random variable. Then:

[TABLE]

Proof.

See Appendix C. ∎

Corollary 1.

The inter-update intervals, $X$ , for any $\pi\in\Pi^{\rm{MT}}$ satisfy the following:

[TABLE]

for all $(i,j)\in\{1,2,\ldots,B\}^{2}$, where $\mathbb{E}\left[X\mid E=j\right]\triangleq\mathbb{E}\left[X_{k}\mid E(Z_{k})=j\right]$ and $\mathbb{E}\left[X^{2}\mid E=j\right]\triangleq\mathbb{E}\left[X_{k}^{2}\mid E(Z_{k})=j\right]$.

Note that the transition probabilities in (10) do not depend on $\tau_{B}$; hence, the steady-state probabilities obtained from (10) do not depend on $\tau_{B}$ either. This leads to a property of $\tau_{B}$, namely the average-age case of Theorem 3, as shown in [1]. The unit-battery case, i.e., $B=1$, was solved in [8] and [9]. For completeness, this result is summarized in Theorem 4.

Theorem 4.

When $B=1$ , the average age $\bar{\Delta}$ can be expressed as

[TABLE]

and $\tau_{1}^{*}=\bar{\Delta}_{\pi^{*}}=\frac{1}{\mu_{H}}2W(\frac{1}{\sqrt{2}})$ where $W(\cdot)$ is the Lambert-W function.

Proof.

See Appendix F. ∎

Theorem 5.

When $B=2$ , the average age $\bar{\Delta}$ can be expressed as:

[TABLE]

where

[TABLE]

and

[TABLE]

Proof.

See Appendix G. ∎

IV-B An Algorithm for Finding Near Optimal Policies

We propose an algorithm to find a near optimal policy $\pi\in\Pi^{\rm{MT}}$ such that $\bar{\Delta}_{\pi}-\bar{\Delta}_{\pi^{*}}\leq\frac{1}{2^{q+1}\mu_{H}}$ for any given $B$ and $q\in\mathbb{Z}^{+}$ . Let $m_{1}(\tau_{1},\tau_{2},...,\tau_{B})$ and $m_{2}(\tau_{1},\tau_{2},...,\tau_{B})$ denote the functions such that:

[TABLE]

where $\Pr(E=j)$ is the steady-state probability for energy state $j$ , $\mathbb{E}\left[X\mid E=j\right]\triangleq\mathbb{E}\left[X_{k}\mid E(Z_{k})=j\right]$ and $\mathbb{E}\left[X^{2}\mid E=j\right]\triangleq\mathbb{E}\left[X_{k}^{2}\mid E(Z_{k})=j\right]$ .

Note that it is straightforward to derive $m_{1}(\tau_{1},\tau_{2},...,\tau_{B})$ and $m_{2}(\tau_{1},\tau_{2},...,\tau_{B})$ using (9) and (10); hence, we assume these functions are available for any $B$.

In the theorem below, we state the main result that we will use in an algorithm for finding near-optimal policies:

Theorem 6.

For $B>1$ , the equation

[TABLE]

has a solution with monotone non-increasing thresholds, i.e., $\tau_{B}\leq...\leq\tau_{2}\leq\tau_{1}$ if and only if $\tau_{B}\geq\bar{\Delta}_{\pi^{*}}$ .

Algorithm 1 uses this result to find a near-optimal policy $\pi\in\Pi^{\rm{MT}}$ such that $\bar{\Delta}_{\pi}-\bar{\Delta}_{\pi^{*}}\leq\frac{1}{2^{q+1}\mu_{H}}$. Each iteration in Algorithm 1 halves the interval in which the minimum average AoI must lie, based on whether (17) has a solution for the current estimate $\hat{\tau}_{B}$ of the smallest threshold. Accordingly, Algorithm 1 is guaranteed to find a solution whose gap to the optimal value is at most $\frac{1}{2^{q+1}\mu_{H}}$.

Algorithm 1 assumes a numerical solver for the transcendental equation in (17); however, the exact solution is required only once, at the final step, while the iterations only require verifying the existence of a solution to (17).
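The interval-halving idea can be sketched as a plain bisection around an existence oracle. This is not a reproduction of Algorithm 1. The oracle `has_monotone_solution` (which would answer whether the equation of Theorem 6 admits monotone thresholds for a given $\hat{\tau}_{B}$) is hypothetical, and the starting bracket $[\frac{1}{2\mu_{H}},\frac{1}{\mu_{H}}]$ is an assumption chosen because its length $\frac{1}{2\mu_{H}}$ shrinks to the stated gap $\frac{1}{2^{q+1}\mu_{H}}$ after $q$ halvings.

```python
def bisect_min_age(has_monotone_solution, mu, q):
    """Bisection on the smallest threshold tau_B, using Theorem 6 as a test.

    has_monotone_solution(tau_B) -> bool is a hypothetical oracle. By Theorem 6
    it should return True exactly when tau_B is at least the optimal average age.
    The initial bracket [1/(2*mu), 1/mu] is an assumption of this sketch.
    """
    lo, hi = 1.0 / (2.0 * mu), 1.0 / mu
    for _ in range(q):
        mid = 0.5 * (lo + hi)
        if has_monotone_solution(mid):
            hi = mid  # mid is feasible, so the optimum is at most mid
        else:
            lo = mid  # mid is infeasible, so the optimum is above mid
    return hi  # feasible and within 1/(2**(q+1) * mu) of the optimum
```

As a usage illustration, a toy oracle such as `lambda t: t >= 0.7` stands in for the solver; in practice the oracle would wrap a numerical root finder for the equation in Theorem 6.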

V NUMERICAL RESULTS

For battery sizes $B=1,2,3,4$, the policies in $\Pi^{\rm{MT}}$ are numerically optimized, giving the curves of AoI versus (Poisson) energy arrival rate in Fig. 4. We give the corresponding threshold values in Table I. These results were obtained through an exhaustive search over possible threshold values, with Monte Carlo simulation of the considered system and policies used to approximate the AoI values, without relying on the analytical results. It can be seen that these optimal thresholds and the corresponding AoI values (in Table I) validate Theorem 3. Figs. 5 and 6 show the dependency of AoI on the threshold values $\tau_{1}$ and $\tau_{2}$, which is consistent with the result in Theorem 5 for the special case of $B=2$.
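A Monte Carlo evaluation of this kind can be sketched as follows. This is not the authors' simulation code: the function name `simulate_average_age`, the empty initial battery, and the seed handling are assumptions of the sketch. It relies only on the model as described in the paper: Poisson energy arrivals of rate $\mu_{H}$, one energy unit per update, age reset to zero at each update, and the time-average age computed as $\sum_{k}X_{k}^{2}/(2\sum_{k}X_{k})$ over the simulated inter-update times.

```python
import random


def simulate_average_age(taus, mu, n_updates, seed=0):
    """Estimate the time-average age of a threshold policy by simulation.

    taus = [tau_1, ..., tau_B]; the battery starts empty (an assumption).
    """
    rng = random.Random(seed)
    B = len(taus)
    level = 0
    sum_x = 0.0
    sum_x2 = 0.0
    for _ in range(n_updates):
        age = 0.0
        while True:
            if level >= 1 and age >= taus[level - 1]:
                break  # age already at or above the current threshold
            wait = rng.expovariate(mu)  # time until the next energy arrival
            if level >= 1 and age + wait >= taus[level - 1]:
                age = taus[level - 1]  # threshold is hit before the arrival
                break
            age += wait
            level = min(level + 1, B)  # arrivals beyond B overflow and are lost
        sum_x += age
        sum_x2 += age * age
        level -= 1  # each update consumes one unit of energy
    return sum_x2 / (2.0 * sum_x)
```

Discarding the residual arrival time after an update is harmless here because exponential inter-arrival times are memoryless. An exhaustive search over thresholds would call this function on a grid of candidate `taus`, which is omitted from the sketch.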

VI CONCLUSION

We have studied optimizing a non-linear age penalty in the generation and transmission of status updates by an energy harvesting source with a finite battery. An optimal status updating policy for minimizing the time-average expectation of a general non-decreasing age function $p(\cdot)$ has been obtained. The policy has a monotonic threshold structure: (i) each new update is sent out only when the age is higher than a threshold and (ii) the threshold is a non-increasing function of the instantaneous battery level such that the updates are sent out more frequently when the battery level is high. Furthermore, we have identified an interesting relationship between the smallest optimal threshold $\tau_{B}^{*}$ (i.e., the threshold corresponding to a full battery level) and the optimal objective value $\bar{p}_{\pi^{*}}$ (i.e., the minimum achievable time-average expected age penalty), which is given by

$$p(\tau_{B}^{*})=\bar{p}_{\pi^{*}}.$$

Appendix A: The Proof of Theorem 1

In order to prove Theorem 1, we use a modified version of the “vanishing discount factor” approach [32], which consists of two steps:

Step 1. Show that for every $\alpha>0$ , there exists a threshold policy that is optimal for solving

[TABLE]

Step 2. Prove that this property still holds when the discount factor $\alpha$ vanishes to zero.

We first discuss Step 1. Recall that $\mathcal{F}_{t}$ represents the information about the energy arrivals and the update policy during $[0,t]$ . Given $\mathcal{F}_{a}$ , we are interested in finding the optimal online policy during $[a,\infty)$ , which is formulated as

[TABLE]

Observe that, in (19), the term $e^{-\alpha(t-a)}$ ensures that the exponential decay always starts from unity so that the problem is independent of $a$ given $\mathcal{F}_{a}$ . In addition, this problem has the following nice property:

Lemma 2.

There exists an optimal solution to (19) that depends on $\mathcal{F}_{a}$ only through $(\Delta(a),E(a))$ . That is, $(\Delta(a),E(a))$ is a sufficient statistic for solving (19).

Proof.

In Problem (19), the age evolution $\{\Delta(t),t\geq a\}$ is determined by the initial age $\Delta(a)$ at time $a$ and the update policy during $[a,\infty)$ . Further, the update policy during $[a,\infty)$ is determined by the initial age $\Delta(a)$ , the initial battery level $E(a)$ , and the energy counting process $\{H(t)-H(a),t\geq a\}$ . Hence, $\{\Delta(t),t\geq a\}$ is determined by $\Delta(a)$ , $E(a)$ , and $\{H(t)-H(a),t\geq a\}$ .

Recall that $\Delta(0)$ and $E(0)$ are fixed. Hence, for any online update policy, the online update decisions during $[0,a]$ depend only on $\{H(t),t\leq a\}$. Hence, $\mathcal{F}_{a}$ is determined by $\{H(t),t\leq a\}$. Because $\{H(t),t\geq 0\}$ is a compound Poisson process, $\{H(t)-H(a),t\geq a\}$ is independent of $\{H(t),t\leq a\}$. Hence, $\{\Delta(t),t\geq a\}$ depends on $\mathcal{F}_{a}$ only through $\Delta(a)$ and $E(a)$. Therefore, $(\Delta(a),E(a))$ is a sufficient statistic for solving (19).∎

By using Lemma 2, we can simplify (19) as (-A) and define a cost function $J_{\alpha}(\Delta(a),E(a))$ which is the optimal objective value of (-A):

[TABLE]

Furthermore, one important question is the following: given that the previous update occurs at $Z_{k}=a$, how should the next update time $Z_{k+1}$ be chosen? This can be formulated as

[TABLE]

where we have used the fact that if $Z_{k}=a$ , then $\Delta(a)=\Delta(Z_{k})=0$ .

According to the definition of $\Pi^{\mathsf{online}}$, $Z_{k+1}$ is a finite Markov time, i.e., a stopping time; hence, the problem of finding $Z_{k+1}$ for a solution to (21) can be formulated as an infinite-horizon optimal stopping problem in the interval $[a,\infty)$. We will consider a gain process [31] $G=(G_{t})_{t\geq a}$ adapted to the filtration $\mathcal{F}_{t}$, where a stopping time $Z_{k+1}$ for a solution to (21) maximizes $\mathbb{E}\left[G_{Z_{k+1}}\mid\mathcal{F}_{a}\right]$ when we choose $Z_{k+1}$ from a family of stopping times based on $\mathcal{F}_{t}$. Let $\mathfrak{M}_{a}$ denote this family of stopping times $Z_{k+1}$, which can be expressed as:

[TABLE]

Note that a stopping time in $\mathfrak{M}_{a}$ may violate energy causality; however, our definition of the gain process will guarantee that such stopping times cannot be optimal.

We will define the gain process $(G_{t})_{t\geq a}$ based on the value of the discounted cost when an update is sent at a particular time $t$ . The gain process $(G_{t})_{t\geq a}$ for $E(t)>0$ corresponds to the additive inverse of this cost and can be written as follows:

[TABLE]

Note that the stopping time cannot be at time $t$ when $E(t)=0$, as there is no energy to send another update in that case. To cover this case, we set $G_{t}$ to $-\infty$ so that a stopping time $Z_{k+1}$ maximizing $\mathbb{E}\left[G_{Z_{k+1}}\mid\mathcal{F}_{a}\right]$ satisfies energy causality and hence belongs to an online policy. In other words, the stopping time $Z_{k+1}$ in a solution to (21) maximizes $\mathbb{E}\left[G_{Z_{k+1}}\mid\mathcal{F}_{a}\right]$ among all the stopping times in $\mathfrak{M}_{a}$.

Alternatively, the gain process $(G_{t})_{t\geq a}$ can be expressed in terms of the cost defined in (-A) as follows

[TABLE]

for $t\geq a$ and $E(t)>0$ .

Let us define $J(0,-1):=\infty$ so that (-A) holds for the case $E(t)=0$ as well. Notice that the process $G_{t}$ is driven by the random process $E(t)$, which is not conditioned on any particular value of $E(a)$ while being adapted to the filtration $\mathcal{F}_{t}$. However, for a policy solving (21), the stopping time $Z_{k+1}$ depends on $E(a)$, as it maximizes $\mathbb{E}\left[G_{Z_{k+1}}\mid\mathcal{F}_{a}\right]$, which depends on $E(a)$ through the filtration $\mathcal{F}_{a}$.

Accordingly, we define the stopping problem of maximizing the expected gain in the given interval $[a,\infty)$ as in the following:

[TABLE]

Based on this formulation, we will show that the optimal stopping time exists and is given by the following stopping rule for $Z_{k+1}$ :

[TABLE]

where $S$ is the Snell envelope [31] for $G$ :

[TABLE]

Showing that $Z_{k+1}$ in (25) is finite w.p.1 is sufficient to prove the existence of the optimal stopping time and the optimality of the stopping rule in (25) (see [31, Theorem 2.2]). Consider the lemma below and its proof in order to see the finiteness of $Z_{k+1}$ in (25):

Lemma 3.

For the stopping rule in (25), $Z_{k+1}$ is finite w.p.1, i.e., $\Pr(Z_{k+1}<\infty)=1$.

Proof.

Consider the Markov time $Q_{k+1}$ which is defined as follows:

[TABLE]

Clearly, the stopping time $Z_{k+1}$ chosen in (25) is no later than $Q_{k+1}$, as $Q_{k+1}$ has the additional stopping condition $E(t)=B$. This means that if $\Pr(Q_{k+1}<\infty)=1$, then $\Pr(Z_{k+1}<\infty)=1$.

Accordingly, for the proof of this lemma, it is sufficient to show that $Q_{k+1}$ is finite w.p.1. We will show this by showing the finiteness of (i) the first time $t\geq Z_{k}=a$ such that $E(t)=B$, and (ii) the duration between this time and the Markov time $Q_{k+1}$. Note that the condition $E(t)=B$ remains satisfied once it is reached for the first time. Let $R_{k+1}$ be the Markov time representing the first time when $E(t)=B$ is satisfied:

[TABLE]

(i) Observe that the Markov time $R_{k+1}$ is finite w.p.1 as it is stochastically dominated by $a+Y_{B}$ where $Y_{B}$ is an Erlang distributed random variable with parameter $B$ which obeys (8) and $\Pr(Y_{B}<\infty)=1$ .

(ii) In order to see that $Q_{k+1}-R_{k+1}$ is also finite, consider the time period after $R_{k+1}$ , i.e., $[R_{k+1},\infty)$ . As $E(t)=B$ for any $t\geq R_{k+1}$ , the evolution of $G_{t}$ becomes deterministic after $t\geq R_{k+1}$ :

[TABLE]

for $t\geq R_{k+1}$ .

On the other hand, for $t\geq R_{k+1}$ , the Snell envelope is $S_{t}=\operatorname*{ess\,sup}_{t^{\prime}\in\mathfrak{M}_{t}}G_{t^{\prime}}=\sup_{t^{\prime}\geq t}G_{t^{\prime}}$ . We will show that $G_{t}$ is always non-increasing after some finite time so that $S_{t}=G_{t}$ is always satisfied after that time.

In order to see this, consider the change in $G_{t}$ for $t\geq R_{k+1}$ . As

[TABLE]

and $p(t-a)$ is non-decreasing, for $t\geq R_{k+1}$ , $G_{t}$ is non-increasing if $t\geq t_{c}$ for some $t_{c}$ such that

[TABLE]

This implies that, for $t\geq\max\{R_{k+1},t_{c}\}$ , $G_{t}=\sup_{t^{\prime}\geq t}G_{t^{\prime}}$ and hence $S_{t}=G_{t}$ . Accordingly, the stopping conditions of $Q_{k+1}$ are satisfied for the first time when $t=\max\{R_{k+1},t_{c}\}$ which means $Q_{k+1}=\max\{R_{k+1},t_{c}\}$ .

As $\alpha J_{\alpha}(0,B-1)$ is finite, $t_{c}$ is finite which implies $Q_{k+1}$ is finite w.p.1 as $R_{k+1}$ is finite w.p.1. This completes the proof. ∎

We just showed that the Markov time in (25) is finite w.p.1 and this means that it is the optimal stopping time by [31, Theorem 2.2.]. Next, we show that the optimal stopping rule in (25) is a threshold policy by using the properties of the cost function in (-A). To relate the optimal stopping time and the cost function in (-A), we will express the Snell envelope in an alternative way.

Notice that the Snell envelope can be written by substituting (-A) in (26) as follows:

[TABLE]

Hence,

[TABLE]

Accordingly, using the definition of $J_{\alpha}(\Delta(a),E(a))$ , we can write

[TABLE]

Therefore, as the first terms in (29) and (34) are identical, the optimal stopping rule in (25) is equivalent to

[TABLE]

Next, we show that the stopping rule in (35) is a threshold rule in age. In order to show this, let us define the function $\rho_{\alpha}(\cdot):\{0,1,...,B\}\rightarrow[0,\infty)$ such that:

[TABLE]

We can show that for any $\Delta\geq\rho_{\alpha}(\ell)$ , it is guaranteed that $J_{\alpha}(0,\ell-1)=J_{\alpha}(\Delta,\ell)$ due to the following reasons:

• For any $\Delta$ and $\ell\in\{0,1,2,\ldots,B\}$, $J_{\alpha}(\Delta,\ell)$ is smaller than or equal to $J_{\alpha}(0,\ell-1)$, as:

[TABLE]

where the inequality is true as the expectation is conditioned on policies with $Z_{k+1}=t_{a}$ .

• For any $\ell\in\{0,1,2,\ldots,B\}$, $J_{\alpha}(\Delta,\ell)$ is non-decreasing in $\Delta$, as:

[TABLE]

for any $\Delta^{\prime}\geq\Delta$ and $\theta(\Delta):=(Z_{k}=t_{a}-\Delta,Z_{k+1}\geq t_{a},E(t_{a})=\ell)$, where the inequality follows from the fact that $p(\cdot)$ is non-decreasing, and the second equality holds because, given $Z_{k+1}$, the integrated values are conditionally independent of $Z_{k}$.

Accordingly, $J_{\alpha}(\Delta,\ell)=J_{\alpha}(0,\ell-1)$ for any $\ell\in\{0,1,2,..,B\}$ and $\Delta\geq\rho_{\alpha}(\ell)$ . Therefore, the stopping rule in (35) is equivalent to:

[TABLE]

for $\ell\in\{0,1,2,..,B\}$ .

We showed that the stopping rule in (37) gives the optimal stopping time $Z_{k+1}$ for a policy solving (21). Now, we can start discussing Step 2 in order to show that the optimal stopping rule with the same structure also gives a solution to (6).

In this part (Step 2) of the proof, we will consider the optimal stopping rules in (37) while the discount factor $\alpha$ is vanishing to zero. Notice that the policy solving (21) is identified by $\rho_{\alpha}(\ell)$ due to (37). Let $\pi_{\alpha}$ and $\Delta_{\pi_{\alpha}}(t)$ be a policy obeying (37) and solving (21) for discount factor $\alpha$ and the age at time $t$ for that policy, respectively. We will show the following

[TABLE]

which implies that, for any sequence $\{\beta_{n}\}_{n\geq 1}\downarrow 0$, $\pi_{\beta_{n}}$ converges to the policy solving (6).

To prove the equivalence in (-A), we will use Feller’s Tauberian theorem [33] (also see the Tauberian theorem in [34]) which can be stated as follows:

Theorem 7.

(Feller 1971) Let $f(t)$ be a Lebesgue-measurable, bounded, real function. Then,

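$$\liminf_{T\rightarrow\infty}\frac{1}{T}\int_{0}^{T}f(t)\,dt\;\leq\;\liminf_{\alpha\downarrow 0}\,\alpha\int_{0}^{\infty}e^{-\alpha t}f(t)\,dt\;\leq\;\limsup_{\alpha\downarrow 0}\,\alpha\int_{0}^{\infty}e^{-\alpha t}f(t)\,dt\;\leq\;\limsup_{T\rightarrow\infty}\frac{1}{T}\int_{0}^{T}f(t)\,dt.$$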

Moreover, if the central inequality is an equality, then all inequalities are equalities.
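To make the statement concrete, the two kinds of averages that a Tauberian theorem of this type compares, the time (Cesàro) average and the discounted (Abel) average, can be evaluated numerically for a simple bounded function. The test function $f(t)=1+\sin t$, the trapezoidal quadrature, the truncation horizon, and the function names below are all choices of this sketch and are not taken from the paper. For this $f$, both averages have elementary closed forms, $1+(1-\cos T)/T$ and $1+\alpha/(1+\alpha^{2})$, and both tend to 1.

```python
import math


def f(t):
    """A bounded, measurable test function (chosen for illustration)."""
    return 1.0 + math.sin(t)


def cesaro_mean(T, steps=200_000):
    """(1/T) * integral of f over [0, T], by the trapezoidal rule."""
    h = T / steps
    total = 0.5 * (f(0.0) + f(T)) + sum(f(k * h) for k in range(1, steps))
    return total * h / T


def abel_mean(alpha, steps=200_000):
    """alpha * integral of exp(-alpha t) f(t) over [0, inf), truncated at 40/alpha."""
    T = 40.0 / alpha  # the neglected tail is of order exp(-40)

    def g(t):
        return alpha * math.exp(-alpha * t) * f(t)

    h = T / steps
    total = 0.5 * (g(0.0) + g(T)) + sum(g(k * h) for k in range(1, steps))
    return total * h
```

Evaluating `cesaro_mean` for growing `T` and `abel_mean` for shrinking `alpha` shows the two averages approaching the same limit, which is the situation in which all the inequalities of the theorem hold with equality.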

This theorem can be applied to the function $f(t)=\mathbb{E}\left[p(\Delta_{\pi_{\beta}}(t))\right]$ where $\beta>0$ (footnote 4: the function $\mathbb{E}\left[p(\Delta_{\pi_{\beta}}(t))\right]$ is Lebesgue-measurable, as $p(\cdot)$ is non-decreasing, and bounded, as the $X_{k}$ are bounded w.p.1 for a policy obeying (37)). To simplify the inequalities for this case, let us define a function $J_{\alpha;\beta}(\Delta(a),E(a))$ for $\beta>0$ such that:

[TABLE]

Note that for $a=0$ :

[TABLE]

Accordingly, we can apply Feller’s Tauberian theorem for $f(t)=\mathbb{E}\left[p(\Delta_{\pi_{\beta}}(t))\right]$ when $a=0$ giving:

[TABLE]

We can show that the inequalities in (-A) are satisfied with equality for any $\pi_{\beta}$ with $\beta>0$ as $\lim_{t_{f}\rightarrow\infty}\frac{\int_{0}^{t_{f}}\mathbb{E}\left[\Delta_{\pi_{\beta}}(t)\right]dt}{t_{f}}$ exists for any $\pi_{\beta}$ with $\beta>0$ . To see this, consider the following lemma:

Lemma 4.

For $\alpha>0$ and $\{Z_{k+1},k\geq 0\}$ with $Z_{k+1}$ as in (37), the following holds:

[TABLE]

Proof.

The proof of Lemma 3 showed that, for $Z_{k}=a$ and the optimal stopping time solving (24), it is true that $\Pr(X_{k+1}\geq x)\leq\Pr(t_{c}-t_{a}+Y_{B}\geq x)$, where $t_{c}$ is the deterministic time defined in (31) and $Y_{B}$ is Erlang distributed with parameter $B$ and obeys (8). Accordingly, $\mathbb{E}[p(X_{k+1})]$ is finite, as $\mathbb{E}[p(\alpha J_{\alpha}(0,B)+Y_{B})]$ is finite for $\alpha>0$. On the other hand, $\lim_{n\rightarrow+\infty}\frac{1}{n}\sum_{k=0}^{n}X_{k}<\infty$ w.p.1 and $\lim_{n\rightarrow+\infty}\frac{1}{n}\sum_{k=0}^{n}X_{k}>\frac{1}{\mu_{H}}$ w.p.1 due to the energy causality constraint. Therefore, we can apply the derivation steps in [35, Theorem 5.4.5] and obtain (4). This completes the proof. ∎

Lemma 4 and (-A) imply the following for $a=0$ and $\beta>0$:

[TABLE]

Now, consider an arbitrary online policy $\pi$ for which $\mathbb{E}\left[p(\Delta_{\pi}(t))\right]$ is Lebesgue-measurable and bounded, then apply Feller’s Tauberian theorem for $f(t)=\mathbb{E}\left[p(\Delta_{\pi}(t))\right]$ giving the following inequality when $t_{a}=0$ :

[TABLE]

Note that for $\alpha>0$ , $J_{\alpha;\beta}(0,0)$ is minimized for $\alpha=\beta$ , hence:

[TABLE]

Combining (43), (-A) and (-A), we get (-A). This completes the proof.

Appendix B: The Proof of Theorem 2

Theorem 2 follows from the proof of Theorem 1. To prove the theorem it is sufficient to show that for any $\alpha>0$ , $\rho_{\alpha}(\ell)$ (see (36)) is non-increasing in $\ell$ as this guarantees that the monotonicity of optimal thresholds holds for any sequence of $\alpha$ values that vanishes to zero. To see this, consider the following lemma and the argument provided below its proof:

Lemma 5.

Let $J_{\alpha}(\cdot,\cdot)$ be the function defined in (-A). Then $J_{\alpha}(0,\ell)-J_{\alpha}(0,\ell+1)$ is non-increasing in $\ell\in\{0,1,\ldots,B-1\}$ for any $\alpha\geq 0$.

Proof.

First, consider the alternative formulation of $J_{\alpha}(r,\ell+1)$ below:

[TABLE]

where the outer expectation is taken over $Z_{k+1}$ .

Let

[TABLE]

be the joint distribution of $Z_{k+1}\in\mathfrak{M}_{a}$ and the energy harvested during $[a,z]$ . Then, we can write $J_{\alpha}(r,\ell+1)$ as follows:

[TABLE]

Similarly,

[TABLE]

Now, let $K_{r,\ell+2}^{*}(z,\sigma)$ be the distribution corresponding to the update time $Z_{k+1}\in\mathfrak{M}_{a}$ that is optimal in (-B), which means:

[TABLE]

Clearly, $K_{r,\ell+2}^{*}(z,\sigma)$ is not necessarily the joint distribution corresponding to the update time $Z_{k+1}\in\mathfrak{M}_{a}$ that is optimal for (-B), hence:

[TABLE]

Combining (-B) and (-B) gives:

[TABLE]

which implies:

[TABLE]

Now, consider the case when $r=0$ and $\ell=B-2$ for (-B):

[TABLE]

which implies:

[TABLE]

Suppose that the inequality below is true for $j\geq\ell+1$ :

[TABLE]

Then, we have:

[TABLE]

This means that the inequality (53) is also true for $j=\ell$, and hence for any $j=0,1,\ldots,B-2$ by induction. Combining this and (-B):

[TABLE]

for $\alpha\geq 0$ , $r\geq 0$ and ∎

Lemma 5 shows that $\rho_{\alpha}(\ell)$ is non-increasing in $\ell$ for $\alpha>0$ . It is sufficient to consider (55) when $r=\rho_{\alpha}(\ell)$ :

[TABLE]

which implies $\rho_{\alpha}(\ell-1)\geq\rho_{\alpha}(\ell)$ combining

[TABLE]

and that $J_{\alpha}(r,\ell-1)$ is non-decreasing in $r$ (footnote 5: this fact is provided in the proof of Theorem 1). Accordingly, the optimal policies solving (19) are monotone threshold policies, i.e., $\pi_{\alpha}\in\Pi^{\rm{MT}}$ for any $\alpha>0$.

Appendix C: The Proof of Lemma 1

Let $\tau_{B+1}=0$ . Then, consider:

[TABLE]

for $i=0,1,...,B$ .

Appendix D: Useful Results for Asymptotic Properties

Lemmas 6, 7, and 8 provide some useful results that combine ergodicity properties and the renewal-reward theorem for a DTMC with the transition probabilities in (10).

Lemma 6.

The DTMC with the transition probabilities in (10) is ergodic for a monotone threshold policy where $\tau_{1}$ is finite.

Proof.

Consider an energy state $j$ in $[0,B-1]$ . We will show that any other energy state $i$ is reachable from $j$ in at most $B-1$ steps with a positive probability. For $i\geq j$ , the higher energy state $i$ is reachable from $j$ in one step with a positive probability as for $i=B-1$ , $\Pr(Y_{B-j}\leq\tau_{B-1})$ is strictly positive and for $j\leq i<B-1$ :

[TABLE]

as $\tau_{i+1}\leq\tau_{i}$ and $i-j\geq 0$ .

Similarly, the energy state $i=j-1$ for $j=1,\ldots,B-1$ can be reached from $j$ with probability $1-\Pr(Y_{1}\leq\tau_{j})$, which is strictly positive as $\tau_{j}$ is finite. This means that any state $i<j$ can be reached from $j$ in at most $B-1$ steps with a positive probability. ∎

Lemma 7.

For monotone threshold policies with finite $\tau_{1}$ , the following is true:

[TABLE]

where $\Pr(E=j)$ is the steady-state probability for energy state $j$ , $\mathbb{E}\left[X\mid E=j\right]\triangleq\mathbb{E}\left[X_{k}\mid E(Z_{k})=j\right]$ and $\mathbb{E}\left[X^{2}\mid E=j\right]\triangleq\mathbb{E}\left[X_{k}^{2}\mid E(Z_{k})=j\right]$ .

Proof.

Consider:

[TABLE]

where $L_{j}$ is the number of $k$ s in $[0,n]$ such that $E(Z_{k})=j$ and $X_{\ell;j}$ is a r.v. with the CDF $\Pr(X_{\ell;j}\leq x)=\Pr(X_{k}\leq x\mid E(Z_{k})=j)$ for some $k$ .

Note that the sequence $X_{0;j},X_{1;j},...,X_{L_{j};j}$ is i.i.d. for any $j$ and their mean is bounded as all thresholds are finite, hence:

[TABLE]

Due to the ergodicity of $E(Z_{k})$ s (Lemma 6):

[TABLE]

Therefore,

[TABLE]

Similarly,

[TABLE]

∎

Lemma 8.

For a threshold policy where $\tau_{1}$ is finite, the average age $\bar{\Delta}$ is finite (w.p.1) and given by the following expression:

[TABLE]

Proof.

The proof is a generalization of Theorem 5.4.5 in [35] for the case where $X_{k}$ s are non-i.i.d. but the limits still exist (w.p.1). When $X_{k}$ s are i.i.d. with $\mathbb{E}[X_{k}]<\infty$ and $\mathbb{E}[X_{k}^{2}]<\infty$ , the convergence (w.p.1) of the limits is guaranteed. ∎
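The way these lemmas are used can be sketched numerically: compute the steady-state distribution of the energy-state chain, then form a renewal-reward ratio from the conditional moments. The sketch assumes the ratio has the form $\bar{\Delta}=\sum_{j}\Pr(E=j)\mathbb{E}[X^{2}\mid E=j]\,/\,\bigl(2\sum_{j}\Pr(E=j)\mathbb{E}[X\mid E=j]\bigr)$, which agrees with the $B=1$ computation in the proof of Theorem 4. The function names, the power-iteration method, and the example matrix are choices of this sketch.

```python
def stationary_distribution(P, iters=1000):
    """Steady-state probabilities of a row-stochastic matrix P by power iteration.

    Assumes the chain is ergodic (irreducible and aperiodic), so the iteration
    converges from the uniform starting vector.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[j] * P[j][i] for j in range(n)) for i in range(n)]
    return pi


def average_age(pi, first_moments, second_moments):
    """Renewal-reward ratio (assumed form): weighted E[X^2] over twice weighted E[X].

    first_moments[j] = E[X | E = j], second_moments[j] = E[X^2 | E = j].
    """
    mean_x = sum(p * m for p, m in zip(pi, first_moments))
    mean_x2 = sum(p * m for p, m in zip(pi, second_moments))
    return mean_x2 / (2.0 * mean_x)
```

For example, `stationary_distribution([[0.5, 0.5], [0.2, 0.8]])` approaches $(2/7, 5/7)$; the matrix here is an arbitrary two-state example rather than one derived from a threshold policy.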

Appendix E: The Proof of Theorem 3

Theorem 3 follows from the proof of Theorem 1. The proof of Lemma 3 shows that, given that $Z_{k}=a$ is the last update time and $E(t^{\prime})=B$ for some $t^{\prime}>a$, the condition $S_{t}=G_{t}$ is satisfied for the first time when $t=\max\{t^{\prime},t_{c}\}$ (see (31)). This means that $\rho_{\alpha}(B)=\alpha J_{\alpha}(0,B-1)$ for $\rho_{\alpha}(E(t))$ in (37). Accordingly,

[TABLE]

which follows from the application of Feller’s Tauberian theorem (applying Theorem 7 for $f(t)=\mathbb{E}\left[p(\Delta_{\pi}(t))\mid E(0)=B\right]$ ). This completes the proof.

Appendix F: The Proof of Theorem 4

By Lemma 8 and Lemma 7, $\bar{\Delta}$ for $B=1$ can be computed as follows

[TABLE]

where $\Pr(E=0)=1$, $\mathbb{E}\left[X^{2}\mid E=0\right]=\tau_{1}^{2}+\left(\frac{2}{\mu_{H}^{2}}+\frac{2}{\mu_{H}}\tau_{1}\right)e^{-\mu_{H}\tau_{1}}$, and $\mathbb{E}\left[X\mid E=0\right]=\tau_{1}+\frac{1}{\mu_{H}}e^{-\mu_{H}\tau_{1}}$. Accordingly, $\bar{\Delta}$ is given by (13). By Theorem 3, $\tau_{1}^{*}=\bar{\Delta}_{\pi^{*}}$, and combining this with (13) results in

[TABLE]

Solving (61) gives $(\tau_{1}^{*})^{2}=\frac{2}{\mu_{H}^{2}}e^{-\mu_{H}\tau_{1}^{*}}$, i.e., $\frac{\mu_{H}\tau_{1}^{*}}{2}e^{\mu_{H}\tau_{1}^{*}/2}=\frac{1}{\sqrt{2}}$, which means $\tau_{1}^{*}=\frac{1}{\mu_{H}}2W(\frac{1}{\sqrt{2}})$.
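The unit-battery result can be checked numerically from the two conditional moments given in this proof. The sketch below is only a sanity check under stated assumptions: `lambert_w` is a small Newton iteration written for this example (valid for positive arguments), `average_age_b1` evaluates $\mathbb{E}[X^{2}\mid E=0]/(2\mathbb{E}[X\mid E=0])$, and the choice $\mu_{H}=1$ is arbitrary.

```python
import math


def lambert_w(z, iters=50):
    """Principal branch of the Lambert-W function for z > 0, by Newton's method."""
    w = math.log1p(z)  # starting point to the right of the root for z > 0
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - z) / (ew * (w + 1.0))
    return w


def average_age_b1(tau, mu):
    """Average age for B = 1 with threshold tau, from the moments in the proof."""
    decay = math.exp(-mu * tau)
    mean_x = tau + decay / mu
    mean_x2 = tau ** 2 + (2.0 / mu ** 2 + 2.0 * tau / mu) * decay
    return mean_x2 / (2.0 * mean_x)


mu = 1.0  # arbitrary arrival rate for the check
tau_star = 2.0 * lambert_w(1.0 / math.sqrt(2.0)) / mu
```

Evaluating `average_age_b1(tau_star, mu)` should return `tau_star` itself up to rounding, which is the fixed-point property $\tau_{1}^{*}=\bar{\Delta}_{\pi^{*}}$, and scanning nearby thresholds should not produce a smaller average age.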

Appendix G: The Proof of Theorem 5

By Lemma 8 and Lemma 7, $\bar{\Delta}$ for $B=2$ is the following:

[TABLE]

The probability of being in $E=1$ , i.e. $\Pr(E=1)$ can be solved using:

[TABLE]

Combining (63) and (9),

[TABLE]

Now, we can obtain $\mathbb{E}\left[X^{2}\mid E=j\right]$ , $\mathbb{E}\left[X\mid E=j\right]$ using (9). Combining these with (64) and substituting in (-G) gives (5).

Appendix H: The Proof of Theorem 6

First, we show that $\tau_{B}\geq\bar{\Delta}_{\pi^{*}}$ is necessary to find a solution to (17) with monotonic non-increasing thresholds. Then, we show that this condition is also sufficient.

The necessity part of the proof follows from the fact that $\tau_{B}=\bar{\Delta}_{\pi}$ for any solution of (17), as $\bar{\Delta}_{\pi}=m_{2}(\tau_{1},\tau_{2},...,\tau_{B})/\bigl(2m_{1}(\tau_{1},\tau_{2},...,\tau_{B})\bigr)$ by Lemma 8 and Lemma 7. Therefore, by the optimality of $\bar{\Delta}_{\pi^{*}}$, $\tau_{B}\geq\bar{\Delta}_{\pi^{*}}$ must hold for any solution of (17).

Now, we consider the sufficiency part of the proof where it is useful to define a function $\phi:[0,\infty)^{B}\rightarrow\mathbb{R}$ as follows:

[TABLE]

Using this definition, (17) can be written as,

[TABLE]

We need to show that, given $\tau_{B}\geq\bar{\Delta}_{\pi^{*}}$, one can find a set of non-negative real numbers $d_{1},\ldots,d_{B-1}$ such that $\phi(\tau_{B},d_{B-1},...,d_{1})=0$. Accordingly, $\tau_{B}$ and $d_{1},\ldots,d_{B-1}$ constitute a solution to (17) with monotone non-increasing thresholds, where $\tau_{i}=\tau_{i+1}+d_{i}$ for $i=1,...,B-1$. In order to prove this, let us start with the optimal policy $\pi^{*}=(\tau_{1}^{*},\tau_{2}^{*},\ldots,\tau_{B}^{*})$, where we know that $\tau_{B}^{*}=\bar{\Delta}_{\pi^{*}}$ by Theorem 3. Starting from the optimal policy $\pi^{*}$, the policy will be modified following the procedure below:

• Phase 1: Modify the policy $\pi^{(+)}=(\tau_{1}^{(+)},\tau_{2}^{(+)},\ldots,\tau_{B}^{(+)})$ from the previous phase to the policy $\pi^{(-)}=(\tau_{1}^{(-)},\tau_{2}^{(-)},\ldots,\tau_{B}^{(-)})$ so that $\tau_{B}^{(-)}=\min\{\tau_{B-1}^{(+)},\tau_{B}\}$ while $\tau_{i}^{(-)}=\tau_{i}^{(+)}$ for $i=1,...,B-1$. Then, go to Phase 2 with policy $\pi^{(-)}$.

• Phase 2: Modify the policy $\pi^{(-)}=(\tau_{1}^{(-)},\tau_{2}^{(-)},\ldots,\tau_{B}^{(-)})$ from the previous phase to the policy $\pi^{(+)}=(\tau_{1}^{(+)},\tau_{2}^{(+)},\ldots,\tau_{B}^{(+)})$ so that $\tau_{B}^{(+)}=\tau_{B}^{(-)}$ while $\tau_{i}^{(+)}=\tau_{i}^{(-)}+x$ for $i=1,...,B-1$, where $x>0$ is the solution of the following:

[TABLE]

If $\tau_{B}^{(-)}=\tau_{B}$ , the procedure stops and (65) gives the solution that $\phi(\tau_{B},d_{B-1},...,d_{1})=0$ , otherwise go to Phase 1 with policy $\pi^{(+)}$ .

It can be shown that the procedure always stops with a solution that $\phi(\tau_{B},d_{B-1},...,d_{1})=0$ . To see this, first observe that (65) always has a solution as long as:

[TABLE]

This is due to the following facts about the function $\phi(\tau_{B}^{(-)},\tau_{B-1}^{(-)}-\tau_{B}^{(-)}+x,...,\tau_{1}^{(-)}-\tau_{2}^{(-)}+x)$ : (i) it is a continuous function of $x$ , (ii) it goes to $-\infty$ as $x$ grows.

Next, observe that (66) always holds, i.e.,

[TABLE]

is positive. This can be seen by considering:

[TABLE]

which follows from the fact that $\Pr(E=j)$ does not depend on $\tau_{B}$ (see (10)) and can be further simplified by Lemma 1, hence:

[TABLE]

Accordingly, we have:

[TABLE]

where the inequality follows from the fact that $m_{1}(\tau_{1}^{(+)},\tau_{2}^{(+)},...,\tau)$ being the average inter-update time is always positive.

Therefore, (65) can always be satisfied in Phase 2. Also, as the second smallest threshold is strictly increased in Phase 2, the smallest threshold can be moved toward $\tau_{B}$ in Phase 1. Moreover, it can be shown that the procedure does not converge to any policy other than one satisfying $\phi(\tau_{B},d_{B-1},...,d_{1})=0$. This can be seen by considering the following:

[TABLE]

hence,

[TABLE]

which implies that the procedure cannot converge to a policy with $\tau_{B}^{(+)}<\tau_{B}$, as the RHS of (67) is positive and does not vanish for a finite set of thresholds (footnote 6: this follows from the fact that any increase in thresholds causes an increase in the battery overflow probability, which means an increase in the average inter-update duration, i.e., $m_{1}(\tau_{1},\tau_{2},...,\hat{\tau}_{B})$). Therefore, as the smallest threshold of the policies modified by the procedure is increased up to $\tau_{B}$, a solution satisfying $\phi(\tau_{B},d_{B-1},...,d_{1})=0$ is eventually reached. This completes the proof.

