Optimal Age over Erasure Channels

Elie Najm; Emre Telatar; Rajai Nasser

arXiv:1901.01573·cs.IT·October 22, 2021


TL;DR

This paper investigates optimal coding strategies to minimize the average age of information over erasure channels, providing closed-form solutions for equal alphabet sizes and bounds for different sizes, advancing understanding of age optimization in communication systems.

Contribution

It introduces a novel analysis of age minimization over erasure channels, deriving closed-form solutions for equal alphabet sizes and bounds for differing sizes, using random coding arguments.

Findings

01

Trivial coding strategy is optimal when source and channel alphabets are equal.

02

Closed-form expression for average age in the equal alphabet case.

03

The average age achieved by random codes converges to the optimal average age of linear block codes as the source alphabet size increases.

Abstract

Previous works on age of information and erasure channels have dealt with specific models and computed the average age or average peak age for certain settings. In this paper, given a source that produces a letter every $T_{s}$ seconds and an erasure channel that can be used every $T_{c}$ seconds, we ask what is the coding strategy that minimizes the time-average age of information that an observer of the channel output incurs. We first analyze the case where the source alphabet and the channel-input alphabet have the same size. We show that a trivial coding strategy is optimal and a closed form expression for the age can be derived. We then analyze the case where the alphabets have different sizes. We use a random coding argument to bound the average age and show that the average age achieved using random codes converges to the optimal average age of linear block codes as the source…

Equations

$$\Delta=\lim_{\tau\to\infty}\frac{1}{\tau}\int_{0}^{\tau}\Delta(t)\,dt.$$

$$f_{i}:\mathcal{U}^{\left\lfloor\frac{iT_{c}-t_{0}^{s}}{T_{s}}\right\rfloor}\to\mathcal{V}.$$

$$g_{i}:(\mathcal{V}\cup\{?\})^{i}\to\left(\mathcal{U}\times\left\{1,2,\ldots,\left\lfloor\frac{iT_{c}-t_{0}^{s}}{T_{s}}\right\rfloor\right\}\right)\cup\{\textsc{erasure}\}.$$

$$\Delta_{\mathcal{C}}(t)=t-mT_{s}-t_{0}^{s}.$$

$$\Delta_{\mathcal{C}}(t)=t-m_{\lfloor\frac{t}{T_{c}}\rfloor}T_{s}-t_{0}^{s}.$$

$$\Delta_{\mathcal{C}}=\lim_{\tau\to\infty}\frac{1}{\tau}\int_{0}^{\tau}\Delta_{\mathcal{C}}(t)\,dt,$$

$$\Delta_{\epsilon,\mathcal{C}}=\lim_{\tau\to\infty}\frac{1}{\tau}\int_{0}^{\tau}\Delta_{\epsilon,\mathcal{C}}(t)\,dt,$$

$$\Delta_{\epsilon,\mathcal{C}}\leq D+\delta,$$

$$\Delta_{\epsilon}=\inf_{\mathcal{C}\in\Gamma}\Delta_{\epsilon,\mathcal{C}},$$

$$\Delta_{\epsilon}=\frac{1}{2\lambda}+\frac{1+\epsilon}{2\mu(1-\epsilon)}.$$

$$\Delta_{\epsilon}=\frac{1}{2\lambda}+\frac{2[-d\lambda t_{0}^{s}]-1}{2d\lambda}+\frac{1+\epsilon}{2\mu(1-\epsilon)},$$

$$Y_{i}=\max\{S_{l}:\ l\geq 0\text{ and }S_{l}\leq i\}.$$

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}[\rho Y_{i}+\alpha]=\frac{1}{2},$$

$$\sum_{a=0}^{d-1}\left[\frac{c}{d}a+\alpha\right]=\frac{d-1}{2}+[d\alpha],$$

$$\begin{aligned}\sum_{a=0}^{d-1}\left[\frac{c}{d}a+\alpha\right]&=\sum_{a=0}^{d-1}\left[\frac{(ca+\lfloor d\alpha\rfloor\bmod d)+[d\alpha]}{d}\right]\\&\stackrel{(\ast)}{=}\sum_{b=0}^{d-1}\left[\frac{b+[d\alpha]}{d}\right]\\&\stackrel{(\dagger)}{=}\sum_{b=0}^{d-1}\frac{b+[d\alpha]}{d}=\frac{d-1}{2}+[d\alpha],\end{aligned}$$

$$\Delta_{\epsilon}=\lim_{\tau\to\infty}\frac{1}{\tau}\int_{0}^{\tau}\Delta_{\epsilon}(t)\,dt.$$

$$\Delta_{\epsilon}$$

$$\lim_{\tau\to\infty}\frac{1}{\tau}\sum_{i=1}^{\lfloor\frac{\tau}{T_{c}}\rfloor}\int_{(i-1)T_{c}}^{iT_{c}}\Delta_{\epsilon}(t)\,dt\leq\Delta_{\epsilon}\leq\lim_{\tau\to\infty}\frac{1}{\tau}\sum_{i=1}^{\lfloor\frac{\tau}{T_{c}}\rfloor+1}\int_{(i-1)T_{c}}^{iT_{c}}\Delta_{\epsilon}(t)\,dt.$$

$$M_{\tau}=\left\lfloor\frac{\tau}{T_{c}}\right\rfloor\quad\text{and}\quad\Delta_{\epsilon,i}=\frac{1}{T_{c}}\int_{(i-1)T_{c}}^{iT_{c}}\Delta_{\epsilon}(t)\,dt.$$

$$\lim_{\tau\to\infty}\frac{M_{\tau}}{\tau/T_{c}}=\lim_{\tau\to\infty}\frac{M_{\tau}+1}{\tau/T_{c}}=1,$$

$$\lim_{\tau\to\infty}\frac{1}{M_{\tau}}\sum_{i=1}^{M_{\tau}}\Delta_{\epsilon,i}\leq\Delta_{\epsilon}\leq\lim_{\tau\to\infty}\frac{1}{M_{\tau}+1}\sum_{i=1}^{M_{\tau}+1}\Delta_{\epsilon,i},$$

$$\Delta_{\epsilon}=\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\Delta_{\epsilon,i}.$$

$$u(t)=\frac{1}{\lambda}\left\lfloor\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right\rfloor+t_{0}^{s},$$

$$\Delta_{\epsilon}(t)=t-u(t)=t-\frac{1}{\lambda}\left\lfloor\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right\rfloor-t_{0}^{s}.$$

$$\begin{aligned}\Delta_{\epsilon,i}&=\frac{1}{T_{c}}\int_{(i-1)T_{c}}^{iT_{c}}\left(t-\frac{1}{\lambda}\left\lfloor\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right\rfloor-t_{0}^{s}\right)dt\\&=\frac{1}{T_{c}}\left(\frac{i^{2}T_{c}^{2}}{2}-\frac{(i-1)^{2}T_{c}^{2}}{2}-\frac{T_{c}}{\lambda}\left\lfloor\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right\rfloor-t_{0}^{s}T_{c}\right)\\&=iT_{c}-\frac{T_{c}}{2}-\frac{1}{\lambda}\left\lfloor\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right\rfloor-t_{0}^{s}\\&=\frac{i}{\mu}-\frac{1}{2\mu}-\frac{1}{\lambda}\left\lfloor\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right\rfloor-t_{0}^{s}\\&=\frac{i}{\mu}-\frac{1}{2\mu}-\frac{1}{\lambda}\left(\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right)+\frac{1}{\lambda}\left(\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}-\left\lfloor\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right\rfloor\right)-t_{0}^{s}\\&=\frac{i}{\mu}-\frac{1}{2\mu}-\frac{i}{\mu}+\frac{1}{\mu}+\frac{K_{i}}{\mu}+t_{0}^{s}+\frac{1}{\lambda}\left[\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right]-t_{0}^{s}\\&=\frac{1}{2\mu}+\frac{K_{i}}{\mu}+\frac{1}{\lambda}\left[\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right],\end{aligned}$$

$$K_{i}=\begin{cases}K_{i-1}+1&\text{with probability }\epsilon,\\0&\text{with probability }1-\epsilon.\end{cases}$$


Taxonomy

Topics: Age of Information Optimization · IoT Networks and Protocols · Advanced biosensing and bioanalysis techniques

Full text

Optimal Age over Erasure Channels

Elie Najm, Emre Telatar, and Rajai Nasser

This paper was presented in part at the IEEE International Symposium on Information Theory, Paris, July 2019.

Abstract

Previous works on age of information and erasure channels have dealt with specific models and computed the average age or average peak age for certain settings. In this paper, given a source that produces a letter every $T_{s}$ seconds and an erasure channel that can be used every $T_{c}$ seconds, we ask what is the coding strategy that minimizes the time-average "age of information" that an observer of the channel output incurs. We first analyze the case where the source alphabet and the channel-input alphabet have the same size. We show that a trivial coding strategy is optimal and a closed form expression for the age can be derived. We then analyze the case where the alphabets have different sizes. We use a random coding argument to bound the average age and show that the average age achieved using random codes converges to the optimal average age of linear block codes as the source alphabet becomes large.

I Introduction

The concept of age as a performance metric in communication systems was first used in 2011 by Kaul et al. in [1, 2], in order to assess the performance of a given vehicular network. Vehicular networks are part of the growing group of real-time status-monitoring systems that are also used in healthcare, finance, transportation, smart homes, warehouse and natural environment surveillance, to name but a few. In such systems, a remote monitor is interested in the status of one or multiple processes. A sender takes samples of the observed processes and sends them to the monitor. However, the aim of the communication system in this case is not to transmit as fast as possible but to keep the information that the destination has about the observed processes as fresh as possible. Indeed, if, at any time $t$, the last received update at the monitor was generated at time $u(t)$, then the information at the receiver reflects the status of the observed process at time $u(t)$, not at time $t$. Hence, the monitor has a distorted version of reality. In fact, it has an obsolete version with an age of $\Delta(t)=t-u(t)$.

Kaul et al. in [3] use a graphical method to compute and minimize an age-related metric: the average age. This metric is defined as

$$\Delta=\lim_{\tau\to\infty}\frac{1}{\tau}\int_{0}^{\tau}\Delta(t)\,dt.\tag{1}$$

A growing body of works has used this metric to evaluate the performance of multiple communication systems represented using queuing models [4, 5, 6, 7, 8, 9, 10, 11, 12, 13]; some of them being subject to resource allocation constraints (such as energy [14, 15, 16, 17, 18]). For an excellent recent survey about age of information, see [19]. The works previously cited have mostly focused on computing the average age (AoI) given a certain status updating policy while assuming no errors. At a more physical level, the effect of noise and channel coding on the average age was also investigated, especially when the erasure channel is used [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]. Chen et al. in [28] assume a random service time but the transmitted packet has a certain probability of being lost at the end of the transmission. Parag et al. in [29] consider the binary erasure channel (BEC) and compute the average age for two transmission schemes: single transmission and hybrid automatic repeat request (HARQ). While the authors in [29] assume a just-in-time generation process, Najm et al. in [30] consider a Poisson process generation and two HARQ protocols to combat erasures: infinite incremental redundancy (IIR) and fixed redundancy (FR). Yates et al. in [31] consider the previous two schemes, IIR and FR, but assume a just-in-time generation policy. While both papers, [30] and [31], agree on the definition of IIR, they use the term FR to describe two different schemes. In [30], the update is divided into $k_{p}$ packets encoded ratelessly and each packet is encoded using an $(n_{s},k_{s})$ -maximum distance separable (MDS) code. In [31], FR means that each $k$ -bit update is encoded into an $n$ -bit codeword, and the update is successfully received if and only if at least $k$ bits are not erased. We are interested here in transmission schemes similar to the FR that is considered in [31].

Most of the works that addressed the presence of noise and erasure have assumed some form of feedback from the receiver to the transmitter. In this paper, we take an information-theoretic approach to the age problem and provide a characterization of the optimal achievable age when the channel used is the erasure channel and no feedback is assumed. This means that we consider the following question: Given a $q$ -ary erasure channel without feedback and with input alphabet $\mathcal{V}$ and a source with alphabet $\mathcal{U}$ , what is the lowest average age that can be achieved in this system?

Since the channel can introduce errors and since no feedback is available, we will be forced (at least in some cases) to use some form of coding. However, unlike classical communication systems where the primary role of coding is to guarantee reliable communication of all packets, in our system, we do not care too much if some packets are not delivered. The primary goal of our coding is that the monitor reliably receives enough timely packets so that it remains up-to-date as much as possible.

In order to study the age of information over erasure channels, we distinguish two cases:

•

Case 1: The source alphabet and the channel-input alphabet are of the same size.

•

Case 2: The source alphabet and the channel-input alphabet have different sizes.

For the first case, we derive an exact closed-form expression for the average age and show that the optimal average age is achieved without any encoding of the source symbols. For the second case, encoding is mandatory, and we use random coding to give upper and lower bounds on the achievable average age of the system, as well as an approximation of the lower bound inspired by [31, 32].

The rest of this paper is organized as follows: In Section II, we present the system model and some definitions which are common to all later sections. In Section III, we derive the optimal average age for Case 1 and in Section IV we study the optimal average age for Case 2.

II Preliminaries

We start by defining the communication system that we study. Fig. 1 illustrates such a system.

•

The channel: We consider a discrete memoryless $q$-ary erasure channel with erasure probability $\epsilon$. We refer to such a channel as $q$EC($\epsilon$). The channel-input alphabet is $\mathcal{V}=\{0,1,\ldots,q-1\}$, and the channel-output alphabet is $\mathcal{V}\cup\{?\}=\{0,1,\ldots,q-1,?\}$. We also assume that there is no feedback from the receiver. This means that the output of the encoder depends only on the source symbols, and the sender does not know whether a sent symbol was successfully received. In addition, we assume that transmitted channel-symbols are received instantaneously. (Footnote 1: If the transmitted channel-symbols are not received instantaneously but after a constant delay, then this constant delay can be added to all the age expressions derived in this paper.) Furthermore, there is a period $T_{c}$ between two consecutive channel uses. More precisely, the $i^{th}$ channel use takes place at time $t_{i}^{c}=iT_{c}$. Note that we assume without loss of generality that $t_{0}^{c}=0$. We define the channel-use rate $\mu=\frac{1}{T_{c}}$ as the number of channel uses allowed per second.

•

The source: We assume a single discrete memoryless source generating messages that belong to the set $\mathcal{U}=\{1,2,\ldots,L\}$ . So each symbol in this set is a message and we will use interchangeably the terms source symbol and message in this paper. We define $k=\left\lceil\log_{q}(L)\right\rceil=\left\lceil\frac{\ln(L)}{\ln(q)}\right\rceil$ where $\log_{q}$ is the base- $q$ logarithm. Hence, in order to represent one source symbol we need $k$ channel-input symbols. This means that there exists an injective function $h(.)$ that maps every message $m\in\mathcal{U}$ to a length- $k$ sequence $u^{k}=(u_{1},\ldots,u_{k})\in\mathcal{V}^{k}$ , with $u_{j}\in\mathcal{V}$ for $1\leq j\leq k$ . Thus, $h(\mathcal{U})\subseteq\mathcal{V}^{k}$ . Similar to the channel-use case, the source symbol generation is assumed to be periodic with period $T_{s}$ . More precisely, the $m^{th}$ source-symbol is generated at time $t_{m}^{s}=mT_{s}+t_{0}^{s}$ . Note that since the source and channel clocks might not be synchronized, we need to take into account the possibility that the starting time of the source $t_{0}^{s}$ is nonzero. We define the message generation rate $\lambda=\frac{1}{T_{s}}$ as the fixed number of source symbols generated per second.

Notice that if the source alphabet and the channel-input alphabet have the same size, then $h(\mathcal{U})=\mathcal{V}$ and $k=1$ . In this case, we can assume without loss of generality that $\mathcal{U}=\mathcal{V}$ . This is the system that we study in Section III.

In the case where the source alphabet and channel-input alphabet have different sizes, we focus on strategies induced by linear codes, so we will assume that $q$ is a power of a prime number, $\mathcal{V}=\mathbb{F}_{q}$ , and $\mathcal{U}=\mathcal{V}^{k}=\mathbb{F}_{q}^{k}$ . This is the system that we study in Section IV.
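As a small illustration of the quantities defined for the source, the sketch below computes $k=\lceil\log_{q}(L)\rceil$ with integer arithmetic and builds one possible injective map $h$. The base-$q$ digit expansion used for `h` is an assumption made for this example; the paper only requires that some injective map exists.

```python
def symbols_per_message(L: int, q: int) -> int:
    """Smallest k with q**k >= L, i.e. k = ceil(log_q(L)), without floating point."""
    k, capacity = 0, 1
    while capacity < L:
        capacity *= q
        k += 1
    return k


def h(m: int, L: int, q: int) -> tuple:
    """Hypothetical injective map from message m in {1, ..., L} to V^k.

    Uses the base-q digits of m - 1 (most significant digit first).
    """
    if not 1 <= m <= L:
        raise ValueError("message must lie in {1, ..., L}")
    k = symbols_per_message(L, q)
    value = m - 1
    digits = []
    for _ in range(k):
        digits.append(value % q)
        value //= q
    return tuple(reversed(digits))
```

When $L=q$ this gives $k=1$, which is the equal-alphabet case of Section III.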

•

The encoder and decoder: At the $i^{th}$ channel use, the encoder uses all the generated source symbols and encodes them into a single channel-input letter, i.e., the encoder is a function

$$f_{i}:\mathcal{U}^{\left\lfloor\frac{iT_{c}-t_{0}^{s}}{T_{s}}\right\rfloor}\to\mathcal{V}.\tag{2}$$

The decoder, at the ${i^{th}}$ channel use, uses all the received channel-output symbols to compute an estimate of a transmitted message, along with its index. Thus, the decoder is a function

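$$g_{i}:(\mathcal{V}\cup\{?\})^{i}\to\left(\mathcal{U}\times\left\{1,2,\ldots,\left\lfloor\frac{iT_{c}-t_{0}^{s}}{T_{s}}\right\rfloor\right\}\right)\cup\{\textsc{erasure}\}.\tag{3}$$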

We assume that the decoder never makes mistakes. In other words, if the generated source symbols are $U_{1},\ldots,U_{\left\lfloor\frac{iT_{c}-t_{0}^{s}}{T_{s}}\right\rfloor}$, the channel-output symbols are $Z_{1},\ldots,Z_{i}$, and $g_{i}(Z_{1},\ldots,Z_{i})=(\hat{U}_{m},m)$, then we have $\hat{U}_{m}=U_{m}$ with probability 1. In this case, the age of information at time $t\in\big[iT_{c},(i+1)T_{c}\big)$ is equal to

$$\Delta_{\mathcal{C}}(t)=t-mT_{s}-t_{0}^{s}.\tag{4}$$

It is easy to see that for a given sequence of encoders $(f_{i})_{i\geq 1}$ , the optimal decoder (from age perspective) is the one defined as $g_{i}(Z_{1},\ldots,Z_{i})=(\hat{U}_{m_{i}},m_{i})$ , where $m_{i}$ is the maximum index in $\left\{1,\ldots,\left\lfloor\frac{iT_{c}-t_{0}^{s}}{T_{s}}\right\rfloor\right\}$ such that $U_{m_{i}}$ can be deterministically decoded from $(Z_{1},\ldots,Z_{i})$ , and $\hat{U}_{m_{i}}=U_{m_{i}}$ . If no such $m_{i}\in\left\{1,\ldots,\left\lfloor\frac{iT_{c}-t_{0}^{s}}{T_{s}}\right\rfloor\right\}$ exists, we have $g_{i}(Z_{1},\ldots,Z_{i})=\textsc{erasure}$ and we adopt the convention that $m_{i}=0$ for such cases. By noticing that for every $t\geq 0$ we have $t\in\big{[}iT_{c},(i+1)T_{c}\big{)}$ where $i=\lfloor\frac{t}{T_{c}}\rfloor$ , we can see from (4) that the instantaneous age of information of the coding scheme $\mathcal{C}=(f_{i},g_{i})_{i\geq 1}$ is given by

$$\Delta_{\mathcal{C}}(t)=t-m_{\lfloor\frac{t}{T_{c}}\rfloor}T_{s}-t_{0}^{s}.\tag{5}$$

It is worth noting that the function $t\mapsto m_{\lfloor\frac{t}{T_{c}}\rfloor}$ is nondecreasing and piecewise constant. Furthermore, the discontinuities in the function $t\mapsto m_{\lfloor\frac{t}{T_{c}}\rfloor}$ correspond to instants at which the receiver successfully decodes new packets. From this and from (5), we can see that the instantaneous age $\Delta_{\mathcal{C}}(t)$ is a piecewise linearly-increasing function that has a sawtooth shape.
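The sawtooth behaviour can be made concrete with a short sketch. It evaluates the instantaneous age $t-m_{\lfloor t/T_{c}\rfloor}T_{s}-t_{0}^{s}$ and integrates it exactly over a finite horizon, using the fact that the age is linear on each channel slot. The list `m` (indexed from slot 0, with `m[i]` the freshest decoded index during slot `i`) and the finite horizon are assumptions of this example, not part of the paper's definitions.

```python
import math


def instantaneous_age(t, m, T_c, T_s, t0s):
    """Age at time t: t - m_i * T_s - t0s with i = floor(t / T_c)."""
    i = math.floor(t / T_c)
    return t - m[i] * T_s - t0s


def average_age(m, T_c, T_s, t0s):
    """Exact time-average of the age over [0, len(m) * T_c)."""
    total = 0.0
    for i, m_i in enumerate(m):
        start, end = i * T_c, (i + 1) * T_c
        # Integral of (t - m_i*T_s - t0s) over the slot [start, end).
        total += (end ** 2 - start ** 2) / 2 - (m_i * T_s + t0s) * T_c
    return total / (len(m) * T_c)
```

For example, with $T_{c}=T_{s}=1$, $t_{0}^{s}=0$ and `m = [0, 1, 1, 3]`, the age resets at the start of slots 1 and 3 and keeps growing through slot 2, where no new symbol is decoded.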

In Fig. 2, we show an example illustrating how the instantaneous age varies with time. In this figure, $\tilde{t}_{\ell}$ represents the instant at which the $\ell^{th}$ successfully received message was decoded at the receiver, and $t_{\ell}$ represents the generation time of this message at the source. More precisely, $\tilde{t}_{\ell}=i_{\ell}T_{c}$ where $i_{\ell}$ is the index at which $(m_{i})_{i\geq 1}$ changes its value for the $\ell^{th}$ time, and $t_{\ell}=m_{i_{\ell}}T_{s}+t_{0}^{s}$.

In the previous section, we indicated that we are interested in bounding the optimal achievable average age. Here, we formally define the concepts of achievable age and optimal achievable age.

Definition 1.

We call $\mathcal{C}=(f_{i},g_{i})_{i\geq 1}$ a coding scheme, where $(f_{i})_{i\geq 1}$ is the sequence of encoders and $(g_{i})_{i\geq 1}$ is the sequence of decoders. The average age corresponding to such a scheme is denoted by

$$\Delta_{\mathcal{C}}=\lim_{\tau\to\infty}\frac{1}{\tau}\int_{0}^{\tau}\Delta_{\mathcal{C}}(t)\,dt,\tag{6}$$

where $\Delta_{\mathcal{C}}(t)$ is the instantaneous age that is obtained by using the coding scheme $\mathcal{C}$, and which is given by (4). If the decoders $(g_{i})_{i\geq 1}$ are optimal for the encoders $(f_{i})_{i\geq 1}$, then $\Delta_{\mathcal{C}}(t)$ is given by (5).

Such a definition can be generalized to channels other than erasure channels. However, for the special case of the erasure channel with erasure probability $\epsilon$, the average age relative to the coding scheme $\mathcal{C}$ will be denoted by

$$\Delta_{\epsilon,\mathcal{C}}=\lim_{\tau\to\infty}\frac{1}{\tau}\int_{0}^{\tau}\Delta_{\epsilon,\mathcal{C}}(t)\,dt,\tag{7}$$

where $\Delta_{\epsilon,\mathcal{C}}(t)$ is the instantaneous age that is obtained by using the coding scheme $\mathcal{C}$ when the channel is $q$EC($\epsilon$).

Definition 2.

We say that an age $D$ is achievable for $q$EC($\epsilon$) if for every $\delta>0$ there exists a coding scheme $\mathcal{C}=(f_{i},g_{i})_{i\geq 1}$ such that

$$\Delta_{\epsilon,\mathcal{C}}\leq D+\delta,\tag{8}$$

and the probability of error on the decoded messages is zero.

Definition 3.

Given a channel $q$EC($\epsilon$), we define the optimal average age $\Delta_{\epsilon}$ to be the minimum achievable average age. Formally,

$$\Delta_{\epsilon}=\inf_{\mathcal{C}\in\Gamma}\Delta_{\epsilon,\mathcal{C}},\tag{9}$$

where $\Gamma$ is the set of all possible coding schemes.

The set $\mathcal{R}=\left\{(\epsilon,D);D\geq\Delta_{\epsilon}\text{ and }\epsilon\in[0,1]\right\}$ forms the set of achievable average ages over all erasure channels.

III Optimal Age with the Same Source & Channel Alphabets

In this first case, we take $k=1$ which means that the source and channel-input alphabets are the same. We first show that to achieve the optimal age, no encoding is required and we provide the optimal transmission policy. We then compute the optimal average age.

III-A The Optimal Transmission Policy

Theorem 1.

For a channel $q$ EC( $\epsilon$ ), if the source alphabet and the channel-input alphabet are the same, then the optimal transmission policy from an age perspective is to keep transmitting the last-generated source-symbol until a new one is generated, at which point we start transmitting the newly generated source-symbol and discard all previous messages. This is an LCFS with no buffer policy.

Proof.

Let us assume that an oracle provides us with the erasure pattern. Clearly, at each non-erased channel use we should send the latest update, so that the drop in the instantaneous age is as large as possible. Indeed, if there is a non-erased channel use at time $t^{\prime}$ and the latest update was generated at $t_{last}$, then the instantaneous age $\Delta_{opt}(t)$ that corresponds to the LCFS with no buffer policy drops to $\Delta_{opt}(t^{\prime})=t^{\prime}-t_{last}$. We cannot do better than this because no source symbol is generated after $t_{last}$. This argument shows that the optimal transmission policy sends the latest generated source symbol at every non-erased channel use, while it can transmit anything at the erased channel uses. However, since in practice the transmitter does not have access to the erasure pattern beforehand, the policy that keeps transmitting the last generated update until a new one is created satisfies the optimality criterion, namely sending the latest generated message at each non-erased channel use.

For the case where $T_{c}\leq T_{s}$, i.e., $\mu\geq\lambda$, the LCFS with no buffer policy leads to the transmission of all source symbols at least once. For the case where $T_{c}>T_{s}$, i.e., $\mu<\lambda$, some messages will be dropped and will never be sent. ∎
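The transmission schedule induced by the LCFS with no buffer policy can be sketched as follows. At channel use $i$ the sender transmits the freshest symbol, whose index is $\lfloor (iT_{c}-t_{0}^{s})/T_{s}\rfloor$. In this example source symbols are indexed from 0 (matching $t_{m}^{s}=mT_{s}+t_{0}^{s}$), and exact rational arithmetic is used to avoid floating-point floor errors; both are choices made for the illustration.

```python
import math
from fractions import Fraction


def transmitted_index(i, T_c, T_s, t0s):
    """Index of the freshest source symbol available at channel use i (time i*T_c).

    A negative result means that no symbol has been generated yet.
    """
    return math.floor((i * T_c - t0s) / T_s)


def schedule(num_uses, T_c, T_s, t0s=Fraction(0)):
    """Return (indices sent at channel uses 0..num_uses-1, indices never sent)."""
    sent = [transmitted_index(i, T_c, T_s, t0s) for i in range(num_uses)]
    newest = sent[-1]
    dropped = sorted(set(range(newest + 1)) - set(sent))
    return sent, dropped
```

With `T_c = Fraction(5, 2)` and `T_s = Fraction(1)` (so $\mu<\lambda$) several symbols are skipped, whereas with `T_c = Fraction(1, 2)` every symbol is sent at least once, in line with the two cases distinguished in the proof.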

III-B The Optimal Average Age

Theorem 2.

Given a source with message-generation rate $\lambda$ and starting time $t_{0}^{s}$ , an erasure channel $q$ EC( $\epsilon$ ) with channel-use rate $\mu$ , and utilization $\rho=\frac{\lambda}{\mu}$ , the optimal average age achieved over $q$ EC( $\epsilon$ ) is:

•

For irrational utilization $\rho\in\mathbb{R}\setminus\mathbb{Q}$ ,

$$\Delta_{\epsilon}=\frac{1}{2\lambda}+\frac{1+\epsilon}{2\mu(1-\epsilon)}.\tag{10}$$

•

For rational utilization $\rho\in\mathbb{Q}$ ,

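$$\Delta_{\epsilon}=\frac{1}{2\lambda}+\frac{2[-d\lambda t_{0}^{s}]-1}{2d\lambda}+\frac{1+\epsilon}{2\mu(1-\epsilon)},\tag{11}$$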

where $[x]=x-\lfloor x\rfloor$ is the fractional part of $x$ , and $d$ is the denominator of $\rho=\frac{\lambda}{\mu}$ when it is written as a rational fraction of integers in irreducible form, i.e., $\rho=\frac{c}{d}$ with $c,d\in\mathbb{N}$ and $\gcd(c,d)=1$ .
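The two closed-form expressions of the theorem can be evaluated directly. The sketch below uses exact rationals for the rational-utilization case, so that the denominator $d$ of $\rho$ in lowest terms and the fractional part $[-d\lambda t_{0}^{s}]$ are computed without rounding. The function names are hypothetical, and $\epsilon<1$ is assumed.

```python
import math
from fractions import Fraction


def frac(x):
    """Fractional part [x] = x - floor(x)."""
    return x - math.floor(x)


def optimal_age_rational(lam, mu, eps, t0s=0):
    """Optimal average age for rational utilization rho = lam / mu = c / d."""
    lam, mu, eps, t0s = Fraction(lam), Fraction(mu), Fraction(eps), Fraction(t0s)
    d = (lam / mu).denominator  # Fraction keeps rho in lowest terms
    source_term = 1 / (2 * lam) + (2 * frac(-d * lam * t0s) - 1) / (2 * d * lam)
    channel_term = (1 + eps) / (2 * mu * (1 - eps))
    return source_term + channel_term


def optimal_age_irrational(lam, mu, eps):
    """Optimal average age for irrational utilization (floating point)."""
    return 1 / (2 * lam) + (1 + eps) / (2 * mu * (1 - eps))
```

As a sanity check, with $\lambda=1$, $\mu=2$, $\epsilon=0$ and $t_{0}^{s}=1/4$, every symbol is delivered $1/4$ second after its generation, the age ramps from $1/4$ to $5/4$, and the formula returns $3/4$.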

Before giving the proof of Theorem 2, we need the following lemmas.

Lemma 1.

Let $(X_{l})_{l\geq 1}$ be a sequence of independent and identically distributed nondeterministic random variables which take values in the set of strictly positive natural numbers $\mathbb{N}^{\ast}$ and which satisfy $\mathbb{E}(X_{l}^{2})<\infty$. (Footnote 2: A random variable $X\in\mathbb{N}^{*}$ is nondeterministic if there are at least two different integers with nonzero probability. It is worth noting that Lemma 1 remains true if $(X_{l})_{l\geq 1}$ are deterministic, but in this paper we are only interested in the case where $(X_{l})_{l\geq 1}$ are nondeterministic.) Let $S_{0}=0$ and $S_{l}=\sum_{r=1}^{l}X_{r}$ for $l\geq 1$. For every $i\geq 0$, let

$$Y_{i}=\max\{S_{l}:\ l\geq 0\text{ and }S_{l}\leq i\}.\tag{12}$$

Let $\rho\in\mathbb{R}\setminus\mathbb{Q}$ be an irrational number, and let $\alpha\in\mathbb{R}$ be an arbitrary real number. Then, almost surely, we have

$$\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}[\rho Y_{i}+\alpha]=\frac{1}{2},\tag{13}$$

where $[x]=x-\lfloor x\rfloor$ is the fractional part of $x$ .

Proof.

This lemma is a consequence of Weyl’s equidistribution theorem [33] (see Section -A). A full proof of this lemma can be found in Section -C. ∎
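A numerical illustration of Lemma 1 (not a proof) can be sketched as follows. Here the increments $X_{l}$ are taken to be geometric, which corresponds to the gaps between non-erased channel uses of an erasure channel; this choice, the seed, and the parameter values are assumptions of the example. The empirical average of $[\rho Y_{i}+\alpha]$ should approach $1/2$ for irrational $\rho$.

```python
import math
import random


def lemma1_average(rho, alpha, eps, N, seed=0):
    """Empirical average of frac(rho * Y_i + alpha) for i = 1..N.

    Y_i is the index of the last renewal (non-erased channel use) up to i;
    each index is a renewal independently with probability 1 - eps.
    """
    rng = random.Random(seed)
    Y = 0  # S_0 = 0
    total = 0.0
    for i in range(1, N + 1):
        if rng.random() >= eps:  # index i is not erased: renewal at i
            Y = i
        x = rho * Y + alpha
        total += x - math.floor(x)
    return total / N
```

Calling `lemma1_average(math.sqrt(2), 0.3, 0.5, 200_000)` gives a value close to $1/2$; the deviation shrinks as `N` grows.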

Lemma 2.

Let $c,d\in\mathbb{Z}$ be such that $d>0$ and $\gcd(c,d)=1$ . Then, for every $\alpha\in\mathbb{R}$ , we have

$$\sum_{a=0}^{d-1}\left[\frac{c}{d}a+\alpha\right]=\frac{d-1}{2}+[d\alpha],\tag{14}$$

where $[x]=x-\lfloor x\rfloor$ is the fractional part of $x$ .

Proof.

Since $\gcd(c,d)=1$ , then for every $r\in\mathbb{Z}$ , the mapping $a\mapsto(ca+r\bmod d)$ is a bijection from $\{0,\ldots,d-1\}$ to itself. Therefore,

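$$\begin{aligned}\sum_{a=0}^{d-1}\left[\frac{c}{d}a+\alpha\right]&=\sum_{a=0}^{d-1}\left[\frac{(ca+\lfloor d\alpha\rfloor\bmod d)+[d\alpha]}{d}\right]\\&\stackrel{(\ast)}{=}\sum_{b=0}^{d-1}\left[\frac{b+[d\alpha]}{d}\right]\\&\stackrel{(\dagger)}{=}\sum_{b=0}^{d-1}\frac{b+[d\alpha]}{d}=\frac{d-1}{2}+[d\alpha],\end{aligned}$$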

where $(\ast)$ follows from the fact that the mapping $a\mapsto(ca+\lfloor d\alpha\rfloor\bmod d)$ is a bijection from $\{0,\ldots,d-1\}$ to itself, and $(\dagger)$ follows from the fact that $b+[d\alpha]<d$ for every $b\leq d-1$ . ∎
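Lemma 2 is easy to check exactly for small values. The sketch below evaluates both sides of the identity with rational arithmetic; the function name and the sample values are chosen only for illustration.

```python
from fractions import Fraction
from math import floor, gcd


def frac(x):
    """Fractional part [x] = x - floor(x)."""
    return x - floor(x)


def lemma2_sides(c, d, alpha):
    """Return (left-hand side, right-hand side) of Lemma 2 as exact fractions."""
    if d <= 0 or gcd(c, d) != 1:
        raise ValueError("need d > 0 and gcd(c, d) = 1")
    alpha = Fraction(alpha)
    lhs = sum(frac(Fraction(c * a, d) + alpha) for a in range(d))
    rhs = Fraction(d - 1, 2) + frac(d * alpha)
    return lhs, rhs
```

For instance, with $c=3$, $d=4$ and $\alpha=1/3$, the four fractional parts are $1/3$, $1/12$, $5/6$ and $7/12$, which sum to $11/6=\frac{3}{2}+\frac{1}{3}$.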

Proof of Theorem 2.

We know that $\mu=\frac{1}{T_{c}}$ . In (1) we saw that the average age is given by

$$\Delta_{\epsilon}=\lim_{\tau\to\infty}\frac{1}{\tau}\int_{0}^{\tau}\Delta_{\epsilon}(t)\,dt.\tag{16}$$

We can rewrite the average age as

[TABLE]

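Therefore,

$$\lim_{\tau\to\infty}\frac{1}{\tau}\sum_{i=1}^{\lfloor\frac{\tau}{T_{c}}\rfloor}\int_{(i-1)T_{c}}^{iT_{c}}\Delta_{\epsilon}(t)\,dt\leq\Delta_{\epsilon}\leq\lim_{\tau\to\infty}\frac{1}{\tau}\sum_{i=1}^{\lfloor\frac{\tau}{T_{c}}\rfloor+1}\int_{(i-1)T_{c}}^{iT_{c}}\Delta_{\epsilon}(t)\,dt.\tag{18}$$

Let

$$M_{\tau}=\left\lfloor\frac{\tau}{T_{c}}\right\rfloor\quad\text{and}\quad\Delta_{\epsilon,i}=\frac{1}{T_{c}}\int_{(i-1)T_{c}}^{iT_{c}}\Delta_{\epsilon}(t)\,dt.\tag{19}$$

By noticing that

$$\lim_{\tau\to\infty}\frac{M_{\tau}}{\tau/T_{c}}=\lim_{\tau\to\infty}\frac{M_{\tau}+1}{\tau/T_{c}}=1,\tag{20}$$

we can deduce from (18) that

$$\lim_{\tau\to\infty}\frac{1}{M_{\tau}}\sum_{i=1}^{M_{\tau}}\Delta_{\epsilon,i}\leq\Delta_{\epsilon}\leq\lim_{\tau\to\infty}\frac{1}{M_{\tau}+1}\sum_{i=1}^{M_{\tau}+1}\Delta_{\epsilon,i},\tag{21}$$

which implies that

$$\Delta_{\epsilon}=\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\Delta_{\epsilon,i}.\tag{22}$$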

At any instant $t\in\big{[}(i-1)T_{c},iT_{c}\big{)}$ , the last channel use took place at time $t_{i-1}^{c}=(i-1)T_{c}$ . Assume that the last successful channel use before time $iT_{c}$ was the $(i-1-K_{i})^{th}$ channel use, i.e., it took place at time $(i-1-K_{i})T_{c}$ . The source symbol that was transmitted at this time was generated at time $\left\lfloor\frac{(i-1-K_{i})T_{c}-t_{0}^{s}}{T_{s}}\right\rfloor T_{s}+t_{0}^{s}$ . Therefore, at any instant $t\in[(i-1)T_{c},iT_{c})$ , the timestamp of the last successfully received source symbol is

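$$u(t)=\frac{1}{\lambda}\left\lfloor\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right\rfloor+t_{0}^{s},\tag{23}$$

which means that the age of information at time $t$ is equal to

$$\Delta_{\epsilon}(t)=t-u(t)=t-\frac{1}{\lambda}\left\lfloor\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right\rfloor-t_{0}^{s}.\tag{24}$$

Hence,

$$\begin{aligned}\Delta_{\epsilon,i}&=\frac{1}{T_{c}}\int_{(i-1)T_{c}}^{iT_{c}}\left(t-\frac{1}{\lambda}\left\lfloor\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right\rfloor-t_{0}^{s}\right)dt\\&=\frac{1}{T_{c}}\left(\frac{i^{2}T_{c}^{2}}{2}-\frac{(i-1)^{2}T_{c}^{2}}{2}-\frac{T_{c}}{\lambda}\left\lfloor\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right\rfloor-t_{0}^{s}T_{c}\right)\\&=iT_{c}-\frac{T_{c}}{2}-\frac{1}{\lambda}\left\lfloor\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right\rfloor-t_{0}^{s}\\&=\frac{i}{\mu}-\frac{1}{2\mu}-\frac{1}{\lambda}\left\lfloor\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right\rfloor-t_{0}^{s}\\&=\frac{i}{\mu}-\frac{1}{2\mu}-\frac{1}{\lambda}\left(\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right)+\frac{1}{\lambda}\left(\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}-\left\lfloor\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right\rfloor\right)-t_{0}^{s}\\&=\frac{i}{\mu}-\frac{1}{2\mu}-\frac{i}{\mu}+\frac{1}{\mu}+\frac{K_{i}}{\mu}+t_{0}^{s}+\frac{1}{\lambda}\left[\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right]-t_{0}^{s}\\&=\frac{1}{2\mu}+\frac{K_{i}}{\mu}+\frac{1}{\lambda}\left[\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\right],\end{aligned}$$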

where $[x]=x-\lfloor x\rfloor$ is the fractional part of $x$ .

Setting $K_{1}=0$ , then for $i\geq 2$ we can write $K_{i}$ as

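$$K_{i}=\begin{cases}K_{i-1}+1&\text{with probability }\epsilon,\\0&\text{with probability }1-\epsilon.\end{cases}$$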

So $(K_{i})_{i\geq 1}$ forms a Markov process represented by the Markov chain in Fig. 3.

This Markov process is ergodic and has a stationary distribution which is identical to a geometric random variable $K$ . This means that

[TABLE]

and almost surely, we have

[TABLE]

Replacing (III-B) in (22), we get

[TABLE]

where the third and fourth equalities follow from (28).

At this point, we need to distinguish between two cases:

•

$\rho=\frac{\lambda}{\mu}$ is irrational. In this case, we need to rewrite $i-1-K_{i}$ . Let $S_{0}=0$ , and for $r\geq 1$ let $S_{r}$ be the index of the $r^{th}$ channel-use which was not erased. Define $X_{l}=S_{l}-S_{l-1}$ . Clearly, $(X_{l})_{l\geq 1}$ are independent and identically distributed as geometric random variables, i.e., $\mathbb{P}(X_{l}=x)=\epsilon^{x-1}(1-\epsilon)$ for $x\in\mathbb{N}^{*}$ . It is easy to see that $i-1-K_{i}=Y_{i}$ , where

[TABLE]

Then, by Lemma 1,

[TABLE]

Using this result in (III-B) we get (10).

•

$\rho=\frac{\lambda}{\mu}=\frac{c}{d}$ is rational with $c,d\in\mathbb{N}$ , $d>0$ and $\gcd(c,d)=1$ . Since

[TABLE]

it follows that

[TABLE]

where $(\ast)$ follows from the fact that $(K_{i})_{i\geq 1}$ is ergodic. Continuing,

[TABLE]

where $(\dagger)$ follows from Lemma 2. Using this result in (III-B) we get (11).

∎
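The structure of the proof suggests a simple Monte Carlo check: simulate the chain $(K_{i})_{i\geq 1}$, evaluate the per-slot average age $\frac{1}{2\mu}+\frac{K_{i}}{\mu}+\frac{1}{\lambda}\big[\frac{\lambda}{\mu}(i-1-K_{i})-\lambda t_{0}^{s}\big]$ derived above, and average over many slots. The sketch below does this under assumed parameter values and a fixed seed; it illustrates the formulas and is not part of the paper's argument.

```python
import random
from fractions import Fraction
from math import floor


def simulate_average_age(lam, mu, eps, t0s=0, N=100_000, seed=1):
    """Average of the per-slot ages over N channel slots (rational lam, mu, t0s)."""
    lam, mu, t0s = Fraction(lam), Fraction(mu), Fraction(t0s)
    rho = lam / mu
    rng = random.Random(seed)
    K = 0  # K_1 = 0
    total = 0.0
    for i in range(1, N + 1):
        if i > 1:
            # K grows by one on an erasure and resets to 0 on a success.
            K = K + 1 if rng.random() < eps else 0
        x = rho * (i - 1 - K) - lam * t0s
        slot_age = 1 / (2 * mu) + K / mu + (x - floor(x)) / lam
        total += float(slot_age)
    return total / N
```

For $\lambda=\mu=1$, $t_{0}^{s}=0$ and $\epsilon=1/2$, the rational-utilization formula evaluates to $3/2$, and the simulated average should be close to that value.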

Remark 1.

One interesting application of Theorem 2 is the computation of the average age of information for the D/M/1 system with preemption: If we have a D/M/1 queue with deterministic interarrival time of rate $\lambda$ and exponential service time of rate $\mu$, then we can model the random exponentially distributed service time as being the result of having an erasure channel that can be used every $T_{c}=dt\ll 1$ and where the erasure probability is $1-\mu dt$. In this case, we get from (10) that the average age of information of a D/M/1 system with preemption is equal to

[TABLE]

which is consistent with the formula that was derived in [13] for D/M/1 systems with preemption.
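The limiting argument in this remark can be checked numerically. Substituting a channel-use rate of $1/dt$ and an erasure probability of $1-\mu\,dt$ into (10) gives $\frac{1}{2\lambda}+\frac{2-\mu\,dt}{2\mu}$, which tends to $\frac{1}{2\lambda}+\frac{1}{\mu}$ as $dt\to 0$. This limit is worked out here from (10) rather than quoted from the paper, and the function names are hypothetical.

```python
def erasure_channel_age(lam, channel_rate, eps):
    """Formula (10): 1/(2*lam) + (1 + eps) / (2 * channel_rate * (1 - eps))."""
    return 1 / (2 * lam) + (1 + eps) / (2 * channel_rate * (1 - eps))


def dm1_preemption_age(lam, mu, dt):
    """Formula (10) with T_c = dt, channel-use rate 1/dt, erasure prob. 1 - mu*dt."""
    return erasure_channel_age(lam, 1 / dt, 1 - mu * dt)


def dm1_limit(lam, mu):
    """Limit of dm1_preemption_age as dt -> 0 (derived in the lead-in above)."""
    return 1 / (2 * lam) + 1 / mu
```

For example, `dm1_preemption_age(1.0, 2.0, 1e-6)` differs from `dm1_limit(1.0, 2.0)` by about $dt/2$.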

IV Optimal Age with Different Source & Channel Alphabets

In this setup, we consider the model described in Section II: The channel is a $q$ -ary erasure channel $q$ EC( $\epsilon$ ) without feedback and with input alphabet $\mathcal{V}=\{0,1,\ldots,q-1\}$ . The source alphabet is $\mathcal{U}=\mathcal{V}^{k}$ where $k>1$ . We only consider the special case where $\lambda=\mu$ , so that at every channel use, a new source symbol is generated. By combining the techniques of Section III and this section, one might be able to obtain reasonably good lower and upper bounds on the optimal age for the more general case where $\lambda$ can be different from $\mu$ , but we expect the calculation to be more complicated.

Since we only consider the case where $\lambda=\mu$ , we can assume without loss of generality that $\lambda=\mu=1$ . The $i^{th}$ channel use takes place at time $t^{c}_{i}=iT_{c}=i$ , and the $m^{th}$ source-symbol is generated at time $t^{s}_{m}=mT_{s}+t_{0}^{s}=m+t_{0}^{s}$ . The difference between the source alphabet and the channel-input alphabet as well as the presence of erasures impose the use of channel coding on the generated source symbols before their transmission. We will focus on coding schemes that are induced by linear block codes, so we will assume that $\mathcal{V}=\mathbb{F}_{q}$ , where $q$ is a power of a prime number. We fix a blocklength $n\geq k$ , and each transmitted message will be encoded into a block of $n$ channel-input symbols in $\mathcal{V}=\mathbb{F}_{q}$ , and then transmitted through $n$ consecutive channel uses. More precisely, the $l^{th}$ transmitted message is encoded using an $(n,k)$ linear block encoder $F_{l}:\mathbb{F}_{q}^{k}\to\mathbb{F}_{q}^{n}$ to produce $n$ channel-input symbols in $\mathcal{V}=\mathbb{F}_{q}$ , which will be transmitted through the $((l-1)n+1)^{th}$ , the $((l-1)n+2)^{th}$ , …, and the $(ln)^{th}$ channel uses. In order to transmit messages that are as fresh as possible, the $l^{th}$ transmitted message is the last message that was generated before time $t^{c}_{(l-1)n+1}=(l-1)n+1$ , i.e., the $l^{th}$ transmitted message is the $m(l)^{th}$ generated source symbol, where $m(l)=\lfloor(l-1)n+1-t_{0}^{s}\rfloor$ . All the source-symbols that are generated between $t_{m(l)+1}^{s}$ and $t_{m(l+1)}^{s}$ are discarded. Fig. 4 illustrates this concept. We emphasize the fact that the $(n,k)$ -linear codes $(F_{l})_{l\geq 1}$ used to encode different messages can be different. We denote a coding scheme that is induced by a given sequence of $(n,k)$ -linear codes $(F_{l})_{l\geq 1}$ as $\mathcal{C}(n,k)$ .

IV-A The Optimal Transmission Policy

Definition 4.

An $(n,k)$ -linear code is called maximum distance separable (MDS) if it achieves the Singleton bound:

$$d=n-k+1,$$

with $d$ denoting the minimum distance between the codewords of the code (see [34] for more details).

Proposition 1.

If the encoder $F$ generates an MDS $(n,k)$ -linear code, then

•

any $k$ columns of the generator matrix $\mathbf{G}$ of the encoder $F$ are linearly independent,

•

any subset of size $k$ taken from a length-$n$ codeword is sufficient to recover, with probability 1, the transmitted message.

This means that if the channel is $q$ EC( $\epsilon$ ), the decoder needs to observe only $k$ unerased channel-input symbols in order to perfectly decode the transmitted source symbol.
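As an illustration of Proposition 1, the sketch below builds a Vandermonde (Reed-Solomon-type) generator matrix and checks by brute force that every set of $k$ columns is linearly independent. It assumes that $q=p$ is prime, so that integer arithmetic modulo $p$ is field arithmetic (the paper allows any prime power), and the function names are hypothetical.

```python
from itertools import combinations

def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over the prime field F_p, by Gaussian elimination."""
    m = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][c]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][c], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][c]:
                f = m[r][c]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def vandermonde_generator(n, k, p):
    """k x n matrix with entry (j, i) = i^j mod p; the n evaluation points 0..n-1 are distinct."""
    assert n <= p, "need n distinct evaluation points in F_p"
    return [[pow(i, j, p) for i in range(n)] for j in range(k)]

def is_mds(G, p):
    """True if every k columns of the k x n generator matrix G are linearly independent."""
    k, n = len(G), len(G[0])
    for cols in combinations(range(n), k):
        sub = [[G[j][i] for i in cols] for j in range(k)]
        if rank_mod_p(sub, p) < k:
            return False
    return True

if __name__ == "__main__":
    print(is_mds(vandermonde_generator(5, 3, 7), 7))
```

The exhaustive check over all $\binom{n}{k}$ column subsets is only practical for small parameters; it is meant to make the first bullet of Proposition 1 tangible, not to serve as a code-design tool.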

Proposition 1 is well known. We refer the reader to [34] for more details. The following theorem presents the optimal channel codes from an age point of view when the channel does not have any feedback.

Theorem 3.

Consider a $q$ EC( $\epsilon$ ) channel without feedback and let $n,k$ be such that MDS $(n,k)$ -linear codes exist. Among all $(n,k)$ -linear codes, MDS codes are age optimal. This means that, to achieve age optimality, all codes used in the scheme $\mathcal{C}(n,k)$ should be MDS.

Proof.

Fix two positive integers $k$ and $n$ , and let $\mathcal{C}$ and $\mathcal{C}^{\text{MDS}}$ be two $\mathcal{C}(n,k)$ coding schemes such that $\mathcal{C}$ is induced by arbitrary $(n,k)$ -linear encoders $(F_{l})_{l\geq 1}$ and $\mathcal{C}^{\text{MDS}}$ is induced by $(n,k)$ -linear encoders $(F_{l}^{\text{MDS}})_{l\geq 1}$ which are MDS.

Now consider a source with alphabet $\mathcal{V}^{k}=\mathbb{F}_{q}^{k}$ generating messages and sending them through two parallel $q$ EC( $\epsilon$ ) channels but with the same erasure pattern $\mathcal{E}$ . For the first channel we use the coding scheme $\mathcal{C}$ , while for the second channel we use the coding scheme $\mathcal{C}^{\text{MDS}}$ . We will show that for every $t\geq 1$ , we have $\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}(t)\leq\Delta_{\epsilon,\mathcal{C}}(t)$ . We assume that the initial ages before transmission are equal, i.e., $\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}(t)=\Delta_{\epsilon,\mathcal{C}}(t)$ for $t<1$ .

For every integer $i\geq 1$ and every $t\in(i,i+1)$ , we have $\Delta_{\epsilon,\mathcal{C}}(t)=\Delta_{\epsilon,\mathcal{C}}(i)+t-i$ and $\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}(t)=\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}(i)+t-i$ . This is because the receiver does not receive any information during the interval $(i,i+1)$ . Therefore, it is sufficient to show that $\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}(i)\leq\Delta_{\epsilon,\mathcal{C}}(i)$ for every integer $i\geq 1$ .

Let $l\geq 1$ and assume that $\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}(i)\leq\Delta_{\epsilon,\mathcal{C}}(i)$ for every $i<(l-1)n+1$ . As described in the first paragraph of this section, for every $l\geq 1$ , the $n$ channel uses between time $t_{(l-1)n+1}^{c}=(l-1)n+1$ and time $t_{ln}^{c}=ln$ are used to transmit the message $U_{m(l)}$ that was generated at time $t_{m(l)}^{s}=m(l)+t_{0}^{s}$ , where $m(l)=\lfloor(l-1)n+1-t_{0}^{s}\rfloor$ . Let $Z_{(l-1)n+1},Z_{(l-1)n+2},\ldots,Z_{ln}$ be the respective outputs of the $((l-1)n+1)^{th}$ , the $((l-1)n+2)^{th}$ , …, and the $(ln)^{th}$ channel uses when the code $\mathcal{C}$ is used. Similarly, let $Z_{(l-1)n+1}^{\text{MDS}},Z_{(l-1)n+2}^{\text{MDS}},\ldots,Z_{ln}^{\text{MDS}}$ be the respective outputs of the $((l-1)n+1)^{th}$ , the $((l-1)n+2)^{th}$ , …, and the $(ln)^{th}$ channel uses when the code $\mathcal{C}^{\text{MDS}}$ is used. For every $i\in\{(l-1)n+1,\ldots,ln\}$ , we have

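$$\Delta_{\epsilon,\mathcal{C}}(i)=\begin{cases}\Delta_{\epsilon,\mathcal{C}}((l-1)n)+i-(l-1)n&\text{if }i<i_{\text{dec}}(l),\\ i-t_{m(l)}^{s}&\text{if }i\geq i_{\text{dec}}(l),\end{cases}\tag{34}$$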

where $i_{\text{dec}}(l)$ is the minimum $i\in\{(l-1)n+1,\ldots,ln\}$ such that $U_{m(l)}$ can be uniquely decoded from $(Z_{(l-1)n+1},\ldots,Z_{i})$ . If $U_{m(l)}$ cannot be uniquely decoded from $(Z_{(l-1)n+1},\ldots,Z_{ln})$ , we define $i_{\text{dec}}(l)=\infty$ . Similarly, for every $i\in\{(l-1)n+1,\ldots,ln\}$ , we have

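$$\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}(i)=\begin{cases}\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}((l-1)n)+i-(l-1)n&\text{if }i<i^{\text{MDS}}_{\text{dec}}(l),\\ i-t_{m(l)}^{s}&\text{if }i\geq i^{\text{MDS}}_{\text{dec}}(l),\end{cases}\tag{35}$$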

where $i^{\text{MDS}}_{\text{dec}}(l)$ is the minimum $i\in\{(l-1)n+1,\ldots,ln\}$ such that $U_{m(l)}$ can be uniquely decoded from $(Z^{\text{MDS}}_{(l-1)n+1},\ldots,Z^{\text{MDS}}_{i})$ . If $U_{m(l)}$ cannot be uniquely decoded from $(Z_{(l-1)n+1}^{\text{MDS}},\ldots,Z_{ln}^{\text{MDS}})$ , we define $i_{\text{dec}}^{\text{MDS}}(l)=\infty$ .

Now observe that if $U_{m(l)}$ can be uniquely decoded from $(Z_{(l-1)n+1},\ldots,Z_{i})$ , then $Z_{(l-1)n+1},\ldots,Z_{i}$ contain at least $k$ non-erased symbols. Therefore, $(Z_{(l-1)n+1}^{\text{MDS}},\ldots,Z_{i}^{\text{MDS}})$ contain at least $k$ non-erased symbols and $U_{m(l)}$ can be uniquely decoded from $(Z^{\text{MDS}}_{(l-1)n+1},\ldots,Z^{\text{MDS}}_{i})$ because $\mathcal{C}^{\text{MDS}}$ uses MDS codes. Therefore, $i^{\text{MDS}}_{\text{dec}}(l)\leq i_{\text{dec}}(l)$ . We have:

•

If $(l-1)n+1\leq i<\min\{ln,i^{\text{MDS}}_{\text{dec}}(l)\}$ , we have $\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}(i)=\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}((l-1)n)+i-(l-1)n$ and $\Delta_{\epsilon,\mathcal{C}}(i)=\Delta_{\epsilon,\mathcal{C}}((l-1)n)+i-(l-1)n$ . From the induction hypothesis we know that $\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}((l-1)n)\leq\Delta_{\epsilon,\mathcal{C}}((l-1)n)$ . Therefore, $\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}(i)\leq\Delta_{\epsilon,\mathcal{C}}(i)$ for every $(l-1)n+1\leq i<i^{\text{MDS}}_{\text{dec}}(l)$ .

•

If $i^{\text{MDS}}_{\text{dec}}(l)\leq i<\min\{ln,i_{\text{dec}}(l)\}$ , we have $\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}(i)=i-t_{m(l)}^{s}$ and $\Delta_{\epsilon,\mathcal{C}}(i)=\Delta_{\epsilon,\mathcal{C}}((l-1)n)+i-(l-1)n$ . Since the last message decoded by $\mathcal{C}$ before $t_{(l-1)n+1}^{c}$ has a timestamp that is earlier than $t_{m(l)}^{s}$ , we have $(l-1)n-\Delta_{\epsilon,\mathcal{C}}((l-1)n)<t_{m(l)}^{s}$ . Therefore, $\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}(i)<\Delta_{\epsilon,\mathcal{C}}(i)$ for every $i^{\text{MDS}}_{\text{dec}}(l)\leq i<\min\{ln,i_{\text{dec}}(l)\}$ .

•

For every $i_{\text{dec}}(l)\leq i<ln$ , we have $\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}(i)=\Delta_{\epsilon,\mathcal{C}}(i)$ .

This implies that $\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}(i)\leq\Delta_{\epsilon,\mathcal{C}}(i)$ for every $(l-1)n+1\leq i\leq ln$ . It follows by induction that $\Delta_{\epsilon,\mathcal{C}^{\text{MDS}}}(i)\leq\Delta_{\epsilon,\mathcal{C}}(i)$ for every integer $i\geq 1$ .

∎
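The pathwise comparison used in the proof can be illustrated numerically. The sketch below is not the paper's construction: it takes $t_{0}^{s}=0$, feeds both schemes the same erasure pattern, lets the MDS scheme decode at the $k^{th}$ non-erased symbol, and models the non-MDS scheme by a hypothetical stand-in in which a non-erased symbol raises the receiver's dimension from $s$ to $s+1$ with probability $(q^{k}-q^{s})/(q^{k}-1)$ (the random-code behaviour analysed later in the paper). The function name `simulate_ages` is an assumption.

```python
import random

def simulate_ages(n, k, q, eps, blocks, seed=0):
    """Ages at integer instants for an MDS scheme and a random-like scheme, same erasures."""
    rng = random.Random(seed)
    age_mds = age_rnd = 0          # equal initial ages
    ages_mds, ages_rnd = [], []
    for _ in range(blocks):
        received = 0               # non-erased symbols in this block (enough for MDS)
        dim = 0                    # dimension seen by the random-like scheme
        for j in range(1, n + 1):  # j-th channel use of the block
            if rng.random() >= eps:            # symbol not erased
                received += 1
                if dim < k and rng.random() < (q**k - q**dim) / (q**k - 1):
                    dim += 1
            # the message was generated at the first channel use of the block,
            # so once decoded its age at position j is j - 1
            age_mds = j - 1 if received >= k else age_mds + 1
            age_rnd = j - 1 if dim >= k else age_rnd + 1
            ages_mds.append(age_mds)
            ages_rnd.append(age_rnd)
    return ages_mds, ages_rnd

if __name__ == "__main__":
    a_mds, a_rnd = simulate_ages(n=5, k=3, q=2, eps=0.3, blocks=2000, seed=1)
    print(sum(a_mds) / len(a_mds), sum(a_rnd) / len(a_rnd))
    print(all(x <= y for x, y in zip(a_mds, a_rnd)))
```

Because the dimension never exceeds the number of non-erased symbols, the MDS age is never larger than the other age at any instant, which is the content of the induction above.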

Theorem 3 shows that, for a given couple $(n,k)$ , the optimal coding scheme is the one that uses only MDS codes. However, an explicit construction of such codes is not available for all values of $(n,k)$ . In the rest of this paper, we use random codes to give an upper bound on the optimal average age. Random coding was used to construct fountain-like codes by Shamai et al. in [35]. The authors of [35] showed that, without any randomness, the notion of fountain capacity cannot be properly defined, because there is always a case where deterministic fountain codes cannot achieve any positive rate with an error probability tending to zero. Nevertheless, we use the rateless (or fountain) codes, previously adopted in [30], to give a lower bound on the optimal achievable average age $\Delta_{\epsilon}$ . As shown in [35], these codes cannot be implemented in practice, which is why we do not consider them as part of the possible coding schemes.

IV-B The Random Code

Consider a $\mathcal{C}(n,k)$ coding scheme. The encoder-decoder pair $(F_{l},G_{l})$ , corresponding to the $l^{th}$ message to be transmitted, is constructed as follows. Since we are interested in linear codes, we use the generator matrix in order to create our code. For that, we choose the $n$ columns of the generator matrix $\textbf{G}_{l}$ independently and uniformly at random from the set $\mathcal{V}^{k}\setminus\{0^{k}\}=\mathbb{F}_{q}^{k}\setminus\{0^{k}\}$ , where $0^{k}$ is the sequence of $k$ zeros. We denote by $(\textbf{g}_{1}^{(l)},\textbf{g}_{2}^{(l)},\ldots,\textbf{g}_{n}^{(l)})$ the $n$ columns of $\textbf{G}_{l}$ (in this paper, we assume all vectors to be column vectors). Thus

$$\textbf{G}_{l}=\begin{bmatrix}\textbf{g}_{1}^{(l)}&\textbf{g}_{2}^{(l)}&\cdots&\textbf{g}_{n}^{(l)}\end{bmatrix}.\tag{36}$$

Once this matrix is generated, it is shared between the encoder and the decoder. For each new message to be transmitted, we generate a new generator matrix. However, the encoder and decoder work in a similar fashion for all messages:

•

Let $\textbf{u}^{(l)}=\begin{bmatrix}u_{1}^{(l)}&u_{2}^{(l)}&\cdots&u_{k}^{(l)}\end{bmatrix}^{T}\in\mathcal{V}^{k}=\mathbb{F}_{q}^{k}$ be the $l^{th}$ message to be sent. Then, at the $((l-1)n+i)^{th}$ channel use, we transmit the coded symbol $z_{(l-1)n+i}=z^{(l)}_{i}:=\sum_{j=1}^{k}u_{j}^{(l)}g_{ji}^{(l)}$ , where ${g_{ji}^{(l)}}$ is the $j^{th}$ element of ${\textbf{g}_{i}^{(l)}}$ . For each message $\textbf{u}^{(l)}$ , we send $n$ coded symbols. Hence, the encoder is given by $F_{l}:\mathcal{V}^{k}\to\mathcal{V}^{n}$ , with $\textbf{z}^{(l)}=F_{l}(\textbf{u}^{(l)})=(\textbf{u}^{(l)})^{T}\textbf{G}_{l}$ .

•

The decoder decodes on the fly. Whenever it receives $k$ linearly independent non-erased coded symbols, it decodes the message. Otherwise, it declares the packet to be erased.

We emphasize the fact that the matrices $(\textbf{G}_{l})_{l\geq 1}$ are generated in an i.i.d. fashion, which means that the linear codes corresponding to different messages can (and are likely to) be different.
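The encoder and the on-the-fly decoder described above can be sketched as follows. The sketch assumes a prime field size $q=p$ (so that arithmetic modulo $p$ is field arithmetic), represents an erasure by `None`, and uses hypothetical function names; it is an illustration under these assumptions rather than the authors' implementation.

```python
import random

def random_nonzero_column(k, p, rng):
    """Draw a column uniformly from F_p^k minus the all-zero vector (rejection sampling)."""
    while True:
        g = [rng.randrange(p) for _ in range(k)]
        if any(g):
            return g

def encode(u, columns, p):
    """Coded symbols z_i = sum_j u_j * g_{ji} mod p, one per column of the generator matrix."""
    return [sum(uj * gj for uj, gj in zip(u, g)) % p for g in columns]

def decode_on_the_fly(columns, received, k, p):
    """Return (message, channel-use index) as soon as k independent equations are known.

    received[i] is the coded symbol of channel use i+1, or None if it was erased.
    Returns None if the block ends first (the packet is declared erased).
    """
    basis = []  # pairs (pivot position, fully reduced row [g | z])
    for i, (g, z) in enumerate(zip(columns, received), start=1):
        if z is None:
            continue
        row = list(g) + [z]
        for piv, b in basis:                      # reduce against known equations
            if row[piv]:
                f = row[piv]
                row = [(a - f * c) % p for a, c in zip(row, b)]
        piv = next((j for j in range(k) if row[j]), None)
        if piv is None:
            continue                              # linearly dependent symbol: no new information
        inv = pow(row[piv], -1, p)
        row = [(a * inv) % p for a in row]
        basis = [(pj, [(a - b[piv] * c) % p for a, c in zip(b, row)]) for pj, b in basis]
        basis.append((piv, row))
        if len(basis) == k:                       # full rank: read the message off the rows
            u = [0] * k
            for pj, b in basis:
                u[pj] = b[k]
            return u, i
    return None

if __name__ == "__main__":
    rng = random.Random(0)
    k, n, p = 3, 6, 7
    cols = [random_nonzero_column(k, p, rng) for _ in range(n)]
    z = encode([1, 2, 3], cols, p)
    z[1] = None  # erase the second coded symbol
    print(decode_on_the_fly(cols, z, k, p))
```

Keeping the stored equations in reduced form makes each new symbol cost $O(k^{2})$ field operations and lets the decoder detect a linearly dependent symbol immediately.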

IV-C Average Age of Random Codes

Fix the couple $(n,k)$ and let $\mathcal{C}$ be a random $\mathcal{C}(n,k)$ coding scheme generated as described in Section IV-B. We define $\Delta_{\epsilon,(n,k)}$ to be the expected average age of the coding scheme induced by a random linear $(n,k)$ -scheme generated as above.

Definition 5.

For every $n\geq k$ , and every $t\geq 0$ , define

[TABLE]

where the expectation in (37) is taken over the random $\mathcal{C}(n,k)$ coding scheme $\mathcal{C}$ , and over the randomness of the erasure patterns of the $q$ EC( $\epsilon$ ) channels.

Due to the ergodicity of the system, almost surely (over the randomly generated $\mathcal{C}$ and over the random erasure patterns), we have

[TABLE]

We will formally prove (38) in Lemma 5.

The contribution of the random coding argument in this context is the following: If we show that, for a given $n\geq k$ , we have $\Delta_{\epsilon,(n,k)}<\infty$ , then there must exist a linear $(n,k)$ -scheme $\mathcal{C}^{(n)}$ such that almost surely (over the random erasure patterns), $\Delta_{\epsilon,\mathcal{C}^{(n)}}=\Delta_{\epsilon,(n,k)}<\infty$ . In fact, as we mentioned above, for almost all $(n,k)$ -schemes $\mathcal{C}$ and almost all erasure patterns, we have $\Delta_{\epsilon,\mathcal{C}}=\Delta_{\epsilon,(n,k)}<\infty$ . Thus, the optimal average age $\Delta_{\epsilon}$ , and the optimal average age among linear block codes $\Delta_{\epsilon}^{lin}$ , satisfy

[TABLE]

Therefore,

[TABLE]

Equation (40) gives an upper bound on the optimal average age. In the rest of this paper we will focus on characterizing this bound.

IV-D Exact Upper Bound on the Optimal Average Age

IV-D1 Preliminaries

Let $\mathcal{C}$ be a randomly generated $(n,k)$ -scheme. Fig. 5 illustrates the variation of the instantaneous age $\Delta_{\epsilon,\mathcal{C}}(t)$ when $n=5$ and $k=3$ . Without loss of generality, we assume that we begin observing right after the reception of a successful packet. We denote by $t_{j}$ the generation time of the $j^{th}$ successful packet and by $t^{\prime}_{j}$ the end of transmission time of this packet. Assume that the $j^{th}$ successful message is the $l^{th}$ transmitted message. We have:

•

$t_{j}=t_{m(l)}^{s}=m(l)T_{s}+t_{0}^{s}=m(l)+t_{0}^{s}$ , where $m(l)=\lfloor(l-1)n+1-t_{0}^{s}\rfloor$ .

•

$t_{j}^{\prime}=t_{nl}^{c}=nlT_{c}=nl$ .

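Therefore, the instantaneous age at the end of transmission of the $j^{th}$ successful packet is

$$\Delta_{\epsilon,\mathcal{C}}(t^{\prime}_{j})=t^{\prime}_{j}-t_{j}=nl-m(l)-t_{0}^{s}=n-1+[-t_{0}^{s}],\tag{41}$$

where $[x]=x-\lfloor x\rfloor$ denotes the fractional part of $x$ .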
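The end-of-transmission age follows from the two bullet expressions for $t_{j}$ and $t^{\prime}_{j}$ by pulling the integer part out of the floor. The steps can be spelled out as below; reading the bracket $[-t_{0}^{s}]$ used later in the proof of Theorem 4 as the fractional part of $-t_{0}^{s}$ is an interpretation, not something stated explicitly in this section.

```latex
\begin{align*}
\Delta_{\epsilon,\mathcal{C}}(t'_j)
  &= t'_j - t_j = nl - m(l) - t_0^s \\
  &= nl - \lfloor (l-1)n + 1 - t_0^s \rfloor - t_0^s \\
  &= nl - (l-1)n - 1 - \lfloor -t_0^s \rfloor - t_0^s \\
  &= n - 1 + \bigl( -t_0^s - \lfloor -t_0^s \rfloor \bigr).
\end{align*}
```

For $t_{0}^{s}=0$ and $n=5$ this gives an age of $4$, in agreement with the drop to $10-6=4$ in the example of Fig. 5.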

In the scenario depicted in Fig. 5, we assume that $t_{0}^{s}=0$ . The first packet $\textbf{u}^{(1)}=({u}_{1}^{(1)},\ldots,{u}_{k}^{(1)})$ is generated and encoded into a codeword $\textbf{z}^{(1)}=\left({z}_{1}^{(1)},\ldots,{z}_{n}^{(1)}\right)=\left((\textbf{u}^{(1)})^{T}\textbf{g}_{1}^{(1)},\ldots,(\textbf{u}^{(1)})^{T}\textbf{g}_{n}^{(1)}\right)$ of length $n=5$ at time $t=1$ . At that same instant, $z_{1}^{(1)}$ , the first symbol of $\textbf{z}^{(1)}$ , is sent and received at the monitor. Since it is the first symbol, $z_{1}^{(1)}$ is linearly independent. (By “ $z_{1}^{(1)}$ is linearly independent”, we just mean that the corresponding column $\mathbf{g}_{1}^{(1)}$ of the generator matrix $\mathbf{G}_{1}$ forms a linearly independent family of vectors, which is true simply because $\mathbf{g}_{1}^{(1)}\neq 0^{k}$ .) At time $t=2$ , the coded symbol $z_{2}^{(1)}$ is erased, but the coded symbol $z_{3}^{(1)}$ , which is linearly independent from $z_{1}^{(1)}$ , is received at time $t=3$ . (Here, we just mean that in the particular example illustrated in Fig. 5, the random matrix $\mathbf{G}_{1}$ was such that $\mathbf{g}_{3}^{(1)}$ is linearly independent from $\mathbf{g}_{1}^{(1)}$ .) The fourth coded symbol is also erased and the last coded symbol $z_{5}^{(1)}$ is received. However, as Fig. 5 shows, the received symbol $z_{5}^{(1)}$ is linearly dependent on the previously received symbols, namely $z_{1}^{(1)}$ and $z_{3}^{(1)}$ , i.e., $\mathbf{g}_{5}^{(1)}$ is linearly dependent on $\{\mathbf{g}_{1}^{(1)},\mathbf{g}_{3}^{(1)}\}$ . The first packet $\textbf{u}^{(1)}$ is declared erased by the decoder because it did not receive $3$ linearly independent symbols, and $\Delta_{\epsilon,\mathcal{C}}(t)$ increases linearly in the interval $t\in[1,6)$ . The packet generated at $t=t_{1}=6$ is a successful update since the monitor receives $k=3$ linearly independent symbols at times $t=7$ , $t=9$ and $t=10$ . Therefore, $\Delta_{\epsilon,\mathcal{C}}(t)$ drops to $10-6=4$ at time $t=t^{\prime}_{1}=10$ . Note that, for a given successful packet, once $k$ linearly independent coded symbols are received, any additional coded symbol must be linearly dependent on them.

In this section we use the following notation:

•

$Y_{j}=t^{\prime}_{j}-t^{\prime}_{j-1}$ is the interdeparture time between the ${(j-1)}^{th}$ and $j^{th}$ successfully received updates.

•

$T_{j}$ is the number of channel uses between the decoding instant of the $j^{th}$ successful packet and its generation time $t_{j}$ .

•

$R(\tau)=\max\left\{j:t^{\prime}_{j}\leq\tau\right\}$ is the number of successfully received updates in the interval $[0,\tau]$ .

•

Let $\mathbf{u}^{(l)}\in\mathbb{F}_{q}^{k}$ be the $l^{th}$ transmitted packet (not necessarily successful). Imagine that we generate infinitely many vectors $(\mathbf{g}_{i}^{(l)})_{i\geq 1}$ independently and uniformly in $\mathbb{F}_{q}^{k}\setminus\{0\}$ . Imagine also that we transmit the coded symbol $z_{i}^{(l)}=(\mathbf{u}^{(l)})^{T}\mathbf{g}_{i}^{(l)}$ over a $q$ EC( $\epsilon$ ) channel to a virtual monitor for every $i\geq 1$ . In reality, we only transmit $z_{1}^{(l)},\ldots,z_{n}^{(l)}$ to the real monitor. In other words, the first $n$ symbols are really transmitted and the rest are virtually transmitted. Let $B_{l}$ be the number of channel uses (or sent coded symbols) in order for the virtual monitor to receive exactly $k$ linearly independent equations (coded symbols). The $l^{th}$ packet is correctly decoded at the real monitor if and only if $B_{l}\leq n$ .

Since the channel is memoryless and the different codes used in the scheme $\mathcal{C}$ are generated independently and in the same fashion, the process $(B_{l})_{l\geq 1}$ is i.i.d. with a distribution identical to that of the random variable $B$ that we describe in the following subsection.

IV-D2 The Distribution of $B$

Fig. 6 shows the Markov chain that represents the dimension, at the (virtual) receiver, of the codeword relative to a certain update.

The monitor receives the first coded symbol of a new codeword with probability $p_{0}=\bar{\epsilon}=1-\epsilon$ and hence the dimension of this codeword at the receiver jumps to $1$ . If the first coded symbol is erased then the dimension of the codeword remains at $0$ . If the monitor has already received $s$ linearly independent coded symbols, then it will receive the $(s+1)^{th}$ linearly independent coded symbol if:

(i)

the next transmitted coded symbol is not erased, and,

(ii)

the next transmitted coded symbol is linearly independent of all previously received symbols.

Event $(i)$ occurs with probability $\bar{\epsilon}=1-\epsilon$ . For event $(ii)$ , notice that the symbols that are linearly dependent on the received symbols form a subspace of dimension $s$ (recall that $s$ linearly independent coded symbols have been received), hence there are $q^{s}$ such symbols. Therefore, the number of nonzero symbols that are linearly dependent on the received symbols is $q^{s}-1$ . Now since coded symbols are generated uniformly at random from the set of nonzero symbols, we can see that event $(ii)$ happens with probability $\frac{q^{k}-1-(q^{s}-1)}{q^{k}-1}=\frac{q^{s}(q^{k-s}-1)}{q^{k}-1}$ . Hence, for a given message, the dimension of its codeword at the receiver jumps from $s$ to $s+1$ with probability

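$$p_{s}=\bar{\epsilon}\,\frac{q^{k}-q^{s}}{q^{k}-1}=\frac{\bar{\epsilon}q^{s}(q^{k-s}-1)}{q^{k}-1},\tag{42}$$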

where $0\leq s\leq k-1$ . If the next transmitted coded symbol is erased or linearly dependent on the previously received coded symbols, then the dimension of the codeword at the monitor remains at $s$ . As previously discussed, once the monitor receives $k$ linearly independent coded symbols, the dimension of the codeword remains at $k$ and all subsequent coded symbols are linearly dependent on the previously non-erased coded symbols.

From the above description, we can deduce that $B$ is the number of steps before reaching state $k$ for the first time.
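The hitting-time description of $B$ lends itself to a direct numerical evaluation. The following sketch propagates the state distribution of the chain in Fig. 6 step by step to obtain $\mathbb{P}(B=i)$; the function names `jump_probs` and `pmf_B` are hypothetical, and floating-point arithmetic is used for simplicity.

```python
def jump_probs(q, k, eps):
    """Transition probabilities p_s = (1 - eps) * (q^k - q^s) / (q^k - 1), s = 0..k-1."""
    return [(1 - eps) * (q**k - q**s) / (q**k - 1) for s in range(k)]

def pmf_B(q, k, eps, i_max):
    """pmf[i] = P(B = i) for i = 0..i_max, where B is the first time the chain reaches state k."""
    p = jump_probs(q, k, eps)
    state = [0.0] * k          # state[s] = P(dimension is s and state k not yet reached)
    state[0] = 1.0
    pmf = [0.0] * (i_max + 1)
    for i in range(1, i_max + 1):
        pmf[i] = state[k - 1] * p[k - 1]   # reach state k exactly at step i
        new = [0.0] * k
        for s in range(k):
            new[s] += state[s] * (1 - p[s])        # erased or dependent symbol: stay
            if s + 1 < k:
                new[s + 1] += state[s] * p[s]      # new independent symbol: move up
        state = new
    return pmf

if __name__ == "__main__":
    pmf = pmf_B(q=2, k=3, eps=0.2, i_max=12)
    for i, v in enumerate(pmf):
        print(i, round(v, 6))
```

The recursion costs $O(k\,i_{\max})$ operations and avoids enumerating paths of the chain.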

Remark 2.

Since $p_{s}=\frac{\bar{\epsilon}(q^{k}-q^{s})}{q^{k}-1}$ , $p_{s}$ is a decreasing function of $s$ . This means that whenever the decoder receives a non-erased coded symbol that is linearly independent from all previously received coded symbols, and the system jumps to state $s$ , it becomes harder to receive a new linearly independent coded symbol. This is why, on average, the system spends more time in state $s$ than in previous states.

Definition 6.

Let $L_{s}$ be the number of trials needed to pass from state $s$ to state $s+1$ in Fig. 6, where $0\leq s\leq k-1$ . It is easy to see that $L_{s}$ has a geometric distribution with success probability $p_{s}=\frac{\bar{\epsilon}q^{s}(q^{k-s}-1)}{q^{k}-1}$ . Thus,

$$\mathbb{P}(L_{s}=i)=(1-p_{s})^{i-1}p_{s},\qquad i=1,2,\ldots\tag{43}$$

Corollary 1.

From Definition 6, we can write

$$B=\sum_{s=0}^{k-1}L_{s},\tag{44}$$

where $(L_{s})_{0\leq s<k}$ are independent.

Lemma 3.

The moment generating function of the random variable $B$ is

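$$\phi_{B}(t)=\mathbb{E}\left(e^{tB}\right)=\prod_{s=0}^{k-1}\frac{\bar{\epsilon}q^{s}(q^{k-s}-1)e^{t}}{q^{k}-1-\left(q^{k}-1-\bar{\epsilon}q^{s}(q^{k-s}-1)\right)e^{t}}.\tag{45}$$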

Proof.

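$$\phi_{B}(t)=\mathbb{E}\left(e^{t\sum_{s=0}^{k-1}L_{s}}\right)=\prod_{s=0}^{k-1}\mathbb{E}\left(e^{tL_{s}}\right)=\prod_{s=0}^{k-1}\frac{p_{s}e^{t}}{1-(1-p_{s})e^{t}},\tag{46}$$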

where the second equality follows from the fact that $(L_{s})_{0\leq s<k}$ are mutually independent. Replacing $p_{s}$ by its expression $p_{s}=\frac{\bar{\epsilon}q^{s}(q^{k-s}-1)}{q^{k}-1}$ , we obtain (45). ∎

Corollary 2.

The expected value of $B$ is

$$\mathbb{E}(B)=\frac{q^{k}-1}{\bar{\epsilon}}\sum_{s=0}^{k-1}\frac{1}{q^{s}(q^{k-s}-1)}.\tag{47}$$

Proof.

Using (44), we get

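$$\mathbb{E}(B)=\sum_{s=0}^{k-1}\mathbb{E}(L_{s})=\sum_{s=0}^{k-1}\frac{1}{p_{s}}=\frac{q^{k}-1}{\bar{\epsilon}}\sum_{s=0}^{k-1}\frac{1}{q^{s}(q^{k-s}-1)}.\tag{48}$$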

We can also get (47) by using (45) and the fact that $\displaystyle\mathbb{E}(B)=\left.\frac{\mathrm{d}\phi_{B}(t)}{\mathrm{d}t}\right|_{t=0}$ . ∎
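As a sanity check on Corollary 2, the expected value $\sum_{s}1/p_{s}$ can be compared with an empirical mean obtained by sampling $B$ as a sum of independent geometric variables, as in Corollary 1. This is only a sketch; the function names and the parameter values in the usage lines are illustrative.

```python
import random

def jump_probs(q, k, eps):
    """p_s = (1 - eps) * (q^k - q^s) / (q^k - 1) for s = 0..k-1."""
    return [(1 - eps) * (q**k - q**s) / (q**k - 1) for s in range(k)]

def expected_B(q, k, eps):
    """E(B) as the sum of the means 1/p_s of the geometric variables L_s."""
    return sum(1.0 / p for p in jump_probs(q, k, eps))

def sample_B(q, k, eps, rng):
    """One draw of B = L_0 + ... + L_{k-1}, each L_s counting trials up to the first success."""
    total = 0
    for p in jump_probs(q, k, eps):
        trials = 1
        while rng.random() >= p:   # failure: symbol erased or linearly dependent
            trials += 1
        total += trials
    return total

if __name__ == "__main__":
    rng = random.Random(0)
    q, k, eps = 4, 3, 0.25
    draws = [sample_B(q, k, eps, rng) for _ in range(50000)]
    print(expected_B(q, k, eps), sum(draws) / len(draws))
```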

IV-D3 Packet Erasure Probability

The $l^{th}$ packet is correctly received if $B_{l}\leq n$ . Otherwise, we declare the packet to be lost. Therefore, the packet erasure probability $\epsilon_{p}$ is equal to

$$\epsilon_{p}=\mathbb{P}(B>n)=1-\mathbb{P}(B\leq n)=1-\sum_{i=k}^{n}\mathbb{P}(B=i),\tag{49}$$

where the distribution of $B$ is given by Lemma 3. We call $1-\epsilon_{p}=\mathbb{P}(B\leq n)$ the packet success probability.
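Since no simple closed form is given for $\epsilon_{p}$, it can be evaluated numerically by running the chain of Fig. 6 for $n$ steps and reading off the probability of not having reached state $k$. A minimal sketch follows, with a hypothetical function name and illustrative parameters.

```python
def packet_erasure_prob(q, k, eps, n):
    """eps_p = P(B > n): probability that state k is not reached within n channel uses."""
    p = [(1 - eps) * (q**k - q**s) / (q**k - 1) for s in range(k)]
    dist = [1.0] + [0.0] * k              # dist[s] = P(dimension == s), s = 0..k
    for _ in range(n):
        new = [0.0] * (k + 1)
        for s in range(k):
            new[s] += dist[s] * (1 - p[s])
            new[s + 1] += dist[s] * p[s]
        new[k] += dist[k]                 # state k is absorbing
        dist = new
    return 1.0 - dist[k]

if __name__ == "__main__":
    for n in range(3, 9):
        print(n, packet_erasure_prob(q=2, k=3, eps=0.2, n=n))
```

As expected, the packet erasure probability equals $1$ for $n<k$ and decreases as redundancy is added.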

IV-D4 The Age Analysis

Definition 7.

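In every interdeparture interval $Y_{j}$ , we call $H_{j}$ the number of erased packets before the reception of a successful update. Each packet is erased independently with probability $\epsilon_{p}$ , so $H_{j}$ is geometrically distributed with

$$\mathbb{P}(H_{j}=h)=\epsilon_{p}^{h}(1-\epsilon_{p}),\qquad h=0,1,2,\ldots\tag{50}$$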

We use Definition 7 to characterize the interdeparture interval. Indeed, any interdeparture interval is the sum of two components: the time spent sending unsuccessful packets, followed by the service time of the successful update. Since each transmitted packet takes $n$ channel uses and $\mu=1$ , the $j^{th}$ interdeparture time can be written as

$$Y_{j}=nH_{j}+n=n(H_{j}+1).\tag{51}$$

Given that we assume a memoryless erasure channel and independently generated packets, $(H_{j})_{j\geq 1}$ are independent and identically distributed. Since the interdeparture interval $Y_{j}$ is a function of $H_{j}$ , $(Y_{j})_{j\geq 1}$ are also independent and identically distributed. Hence the following lemma:

Lemma 4.

The process $R(\tau)=\max\left\{j:t^{\prime}_{j}\leq\tau\right\}$ is a renewal process with the interdeparture times $(Y_{j})_{j\geq 1}$ being the renewal intervals.

The importance of Lemma 4 stems from the fact that it shows that $\Delta_{\epsilon,\mathcal{C}}$ exists and the system is ergodic.

Lemma 5.

Almost surely (over the random choice of the $(n,k)$ -scheme $\mathcal{C}$ , and over the random erasure patterns of the $q$ EC( $\epsilon$ ) channels), we have

$$\Delta_{\epsilon,\mathcal{C}}=\lim_{\tau\to\infty}\frac{1}{\tau}\int_{0}^{\tau}\Delta_{\epsilon,\mathcal{C}}(t)\,\mathrm{d}t=\frac{\mathbb{E}(Q)}{\mathbb{E}(Y)},\tag{52}$$

where $Q$ is a generic random variable that has the same distribution as $\displaystyle Q_{j}=\int_{t^{\prime}_{j-1}}^{t^{\prime}_{j}}\Delta_{\epsilon,\mathcal{C}}(t)\mathrm{d}t$ , which is represented by the shaded areas in Fig. 5, and $Y$ is a generic random variable that has the same distribution as the interdeparture interval $Y_{j}$ .

Proof.

By Lemma 4, $R(\tau)$ forms a renewal process and hence by [36] we know that $\displaystyle\lim_{\tau\to\infty}\frac{R(\tau)}{\tau}=\frac{1}{\mathbb{E}(Y)}$ . By defining $\displaystyle Q_{j}=\int_{t^{\prime}_{j-1}}^{t^{\prime}_{j}}\Delta_{\epsilon,\mathcal{C}}(t)\mathrm{d}t$ to be the reward function over the renewal period $Y_{j}$ , we get (using renewal reward theory [37, 36]) that almost surely

[TABLE]

∎

Before computing the average age, we still need one more lemma that gives the distribution of the random variables $(T_{j})_{j\geq 1}$ .

Lemma 6.

Let $T$ be a generic random variable that has the same distribution as the number of channel uses $T_{j}$ between the decoding instant of the $j^{th}$ successful packet and its generation time $t_{j}$ . Then,

$$\mathbb{P}(T=i)=\frac{\mathbb{P}(B=i)\,\mathbbm{1}_{\{k\leq i\leq n\}}}{1-\epsilon_{p}},\tag{54}$$

where $\mathbbm{1}_{\{.\}}$ is the indicator function.

Proof.

A packet is successfully decoded if the decoder receives exactly $k$ linearly independent coded symbols after at most $n$ channel uses. Thus, for the $j^{th}$ successful packet we have that

[TABLE]

∎

We are now ready to give the main theorem of this section.

Theorem 4.

Assume a $q$ EC( $\epsilon$ ) and an $(n,k)$ -coding scheme $\mathcal{C}$ as defined in Section IV-B. Almost surely, the average age $\Delta_{\epsilon,\mathcal{C}}$ corresponding to such setup is given by

[TABLE]

where $\epsilon_{p}$ is the packet erasure probability given by (49).

Proof.

From (52), we know that we need to compute $\mathbb{E}(Q)$ and $\mathbb{E}(Y)$ . We start with $\mathbb{E}(Y)$ . We have shown that for every $j\geq 1$ , $Y_{j}=n(H_{j}+1)$ . Thus,

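$$\mathbb{E}(Y)=\mathbb{E}\left(n(H+1)\right)=n\left(\mathbb{E}(H)+1\right)=n\left(\frac{\epsilon_{p}}{1-\epsilon_{p}}+1\right)=\frac{n}{1-\epsilon_{p}},\tag{57}$$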

where the third equality holds because $H$ is geometrically distributed with mean $\mathbb{E}(H)=\frac{\epsilon_{p}}{1-\epsilon_{p}}$ , as seen in Definition 7.

Now we turn to $\mathbb{E}(Q)$ . For every $j\geq 1$ , the shaded area $Q_{j}$ shown in Fig. 5 is the sum of the areas of two trapezoids: a large trapezoid with height $n(H_{j}+T_{j})$ and a smaller one with height $n-T_{j}$ . Recall from (41) that the instantaneous age at the end of transmission of the $j^{th}$ successful packet is $\Delta_{\epsilon,\mathcal{C}}(t^{\prime}_{j})=n-1+[-t_{0}^{s}]$ . Thus,

[TABLE]

Note that $H_{j}$ and $T_{j}$ are independent. Therefore,

[TABLE]

Replacing $\mathbb{E}(Y)$ and $\mathbb{E}(Q)$ in (52) by their expressions in (57) and (59), we obtain (56). ∎

In the expression of $\Delta_{\epsilon,\mathcal{C}}$ in (56), $\mathbb{E}(T)$ and $\epsilon_{p}$ cannot be easily expressed in terms of $\epsilon$ , $k$ and $n$ . This is why we study $\Delta_{\epsilon,\mathcal{C}}$ in the next two subsections by presenting upper and lower bounds on the expression in (56).

IV-E Bounding $\Delta_{\epsilon,\mathcal{C}}$

As we mentioned in the previous paragraph, the expression of $\Delta_{\epsilon,\mathcal{C}}$ is not easy to calculate. This is mainly because the distribution of the random variable $B$ is complicated. In this section, we provide upper and lower bounds on $\Delta_{\epsilon,\mathcal{C}}$ which are computed using random variables that have simpler distributions compared to $B$ .

Definition 8.

We define $\tilde{B}$ to be the sum of $k$ i.i.d random variables distributed like $L_{0}$ . We also define $\hat{B}$ to be the sum of $k$ i.i.d random variables distributed like $L_{k-1}$ . Formally,

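$$\tilde{B}=\sum_{s=0}^{k-1}\tilde{L}_{s}\qquad\text{and}\qquad\hat{B}=\sum_{s=0}^{k-1}\hat{L}_{s},\tag{60}$$

with $(\tilde{L}_{s})_{0\leq s<k}$ i.i.d. and distributed like $L_{0}$ , and $(\hat{L}_{s})_{0\leq s<k}$ i.i.d. and distributed like $L_{k-1}$ ,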

where $L_{0}$ is geometrically distributed with success probability $\bar{\epsilon}=1-\epsilon$ and $L_{k-1}$ is also geometrically distributed with success probability $p_{k-1}=\frac{\bar{\epsilon}q^{k-1}(q-1)}{q^{k}-1}$ .

Lemma 7.

The random variables $\tilde{B}$ and $\hat{B}$ defined in Definition 8 are both negative binomials with

$$\mathbb{P}(\tilde{B}=i)={i-1\choose k-1}\epsilon^{i-k}\,\bar{\epsilon}^{k}\tag{61}$$

and

$$\mathbb{P}(\hat{B}=i)={i-1\choose k-1}(1-p_{k-1})^{i-k}(p_{k-1})^{k},\tag{62}$$

where $i=k,k+1,k+2,\ldots$

Proof.

$\tilde{B}$ is the sum of $k$ i.i.d geometric random variables with success probability $1-\epsilon$ . Similarly, $\hat{B}$ is the sum of $k$ i.i.d geometric random variables with success probability $p_{k-1}$ . ∎

We will show that the random variables $\tilde{B}$ and $\hat{B}$ can be coupled with the random variable $B$ in such a way that $\tilde{B}\leq B\leq\hat{B}$ with probability 1.

Lemma 8.

Let $B=\sum_{s=0}^{k-1}L_{s}$ , and let $\tilde{B}$ and $\hat{B}$ be as in Definition 8. It is possible to couple $B$ , $\tilde{B}$ and $\hat{B}$ in such a way that $\tilde{B}\leq B\leq\hat{B}$ with probability 1. More precisely, we can define three random variables $O$ , $\tilde{O}$ and $\hat{O}$ on the same probability space such that:

•

$O$ , $\tilde{O}$ and $\hat{O}$ have the same distributions as $B$ , $\tilde{B}$ and $\hat{B}$ , respectively, i.e., for every $i\geq 1$ , we have $\mathbb{P}(O=i)=\mathbb{P}(B=i)$ , $\mathbb{P}(\hat{O}=i)=\mathbb{P}(\hat{B}=i)$ and $\mathbb{P}(\tilde{O}=i)=\mathbb{P}(\tilde{B}=i)$ .

•

$\tilde{O}\leq O\leq\hat{O}$ * with probability 1.*

Proof.

The proof can be found in Section -D. ∎

Corollary 3.

Given $B=\sum_{s=0}^{k-1}L_{s}$ and $\tilde{B}$ and $\hat{B}$ as defined in Definition 8, the following relations hold for $i\geq k$ :

1. $\mathbb{P}(\tilde{B}\leq i)\geq\mathbb{P}(B\leq i)$ ,

2. $\mathbb{E}(\tilde{B})\leq\mathbb{E}(B)$ ,

3. $\mathbb{P}(\hat{B}\leq i)\leq\mathbb{P}(B\leq i)$ ,

4. $\mathbb{E}(\hat{B})\geq\mathbb{E}(B)$ .

Proof.

Let $\tilde{O},O$ and $\hat{O}$ be as in Lemma 8. Since $O\geq\tilde{O}$ with probability 1, we deduce that the event $\{O\leq i\}$ is a subset of the event $\{\tilde{O}\leq i\}$ . Hence,

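$$\mathbb{P}(B\leq i)=\mathbb{P}(O\leq i)\leq\mathbb{P}(\tilde{O}\leq i)=\mathbb{P}(\tilde{B}\leq i).\tag{63}$$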

This inequality also implies that $\mathbb{P}(B\geq i)\geq\mathbb{P}(\tilde{B}\geq i)$ . Furthermore, since $O\geq\tilde{O}$ with probability 1, we have

$$\mathbb{E}(B)=\mathbb{E}(O)\geq\mathbb{E}(\tilde{O})=\mathbb{E}(\tilde{B}).\tag{64}$$

On the other hand, since $\hat{O}\geq O$ with probability 1, we deduce that the event $\{\hat{O}\leq i\}$ is a subset of the event $\{O\leq i\}$ . Hence,

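$$\mathbb{P}(\hat{B}\leq i)=\mathbb{P}(\hat{O}\leq i)\leq\mathbb{P}(O\leq i)=\mathbb{P}(B\leq i).\tag{65}$$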

This inequality also implies that $\mathbb{P}(\hat{B}\geq i)\geq\mathbb{P}(B\geq i)$ . Furthermore, since $\hat{O}\geq O$ with probability 1, we have

$$\mathbb{E}(\hat{B})=\mathbb{E}(\hat{O})\geq\mathbb{E}(O)=\mathbb{E}(B).\tag{66}$$

∎
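The ordering in Corollary 3 can be checked numerically by comparing the two negative binomial distribution functions with the distribution function of $B$ obtained from the chain of Fig. 6. The sketch below uses hypothetical function names and floating-point arithmetic; the parameter values are illustrative.

```python
from math import comb

def jump_probs(q, k, eps):
    """p_s = (1 - eps) * (q^k - q^s) / (q^k - 1) for s = 0..k-1."""
    return [(1 - eps) * (q**k - q**s) / (q**k - 1) for s in range(k)]

def negbin_cdf(k, p, i):
    """P(sum of k i.i.d. geometric(p) variables <= i)."""
    return sum(comb(j - 1, k - 1) * (1 - p) ** (j - k) * p**k for j in range(k, i + 1))

def cdf_B(q, k, eps, i):
    """P(B <= i): probability that the chain has reached state k within i steps."""
    p = jump_probs(q, k, eps)
    dist = [1.0] + [0.0] * k
    for _ in range(i):
        new = [0.0] * (k + 1)
        for s in range(k):
            new[s] += dist[s] * (1 - p[s])
            new[s + 1] += dist[s] * p[s]
        new[k] += dist[k]
        dist = new
    return dist[k]

if __name__ == "__main__":
    q, k, eps = 2, 3, 0.2
    p_last = jump_probs(q, k, eps)[-1]
    for i in range(k, k + 6):
        # columns: i, P(B_tilde <= i), P(B <= i), P(B_hat <= i)
        print(i, negbin_cdf(k, 1 - eps, i), cdf_B(q, k, eps, i), negbin_cdf(k, p_last, i))
```

In each printed row the three values should appear in non-increasing order, matching items 1 and 3 of the corollary.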

Corollary 3 can be interpreted as follows: $\tilde{B}$ can be seen as the number of channel uses in order to receive exactly $k$ linearly independent coded symbols when any $k$ coded symbols are linearly independent. This means that $\tilde{B}$ corresponds to the number of channel uses needed to decode a packet when the encoders of the $(n,k)$ -scheme only use MDS codes. Hence, $\tilde{B}$ is equivalent to the number of channel uses needed to receive exactly $k$ non-erased coded symbols. Intuitively, we would expect to need a number $\tilde{B}$ of channel uses to receive $k$ non-erased coded symbols which is smaller than the number $B$ needed to receive $k$ linearly independent coded symbols. This explains the intuition behind items $(1)$ and $(2)$ in Corollary 3. On the opposite side of the spectrum, $\hat{B}$ can be seen as a worst case scenario since the jump from state $s$ to state $s+1$ in Fig. 6 occurs with the smallest possible probability, namely $p_{k-1}$ . This discussion leads us to the idea that $\Delta_{\epsilon,\mathcal{C}}$ could be upper bounded by the average age corresponding to a coding system with $\hat{B}$ as the number of channel uses needed to receive exactly $k$ linearly independent coded symbols. Similarly, $\Delta_{\epsilon,\mathcal{C}}$ could be lower bounded by the average age achieved using only MDS codes with $\tilde{B}$ as the number of channel uses needed to receive $k$ linearly independent coded symbols.

By applying Lemma 8, we can define a sequence of independent and identically distributed triplets $(\tilde{B}_{l},B_{l},\hat{B}_{l})_{l\geq 1}$ such that for every $l\geq 1$ , we have:

•

$\tilde{B}_{l}\leq B_{l}\leq\hat{B}_{l}$ with probability 1.

•

$\tilde{B}_{l},B_{l}$ and $\hat{B}_{l}$ are distributed as $\tilde{B},B$ and $\hat{B}$ , respectively.

We will use $(B_{l})_{l\geq 1}$ to describe the age of information of the system as we explained at the beginning of Section IV-D. More precisely, for $t\geq 1$ , we have

[TABLE]

where $l_{t}=\lfloor\frac{t-1}{n}\rfloor+1$ is the number of the packet that is being transmitted at time $t$ . Note that (67) can be shown exactly as (34).

We now define two virtual ages, that we denote as $\tilde{\Delta}_{\epsilon,\mathcal{C}}(t)$ and $\hat{\Delta}_{\epsilon,\mathcal{C}}(t)$ . These virtual ages are initially equal to the real age $\Delta_{\epsilon,\mathcal{C}}(t)$ , but instead of using $(B_{l})_{l\geq 1}$ , the evolution of $\tilde{\Delta}_{\epsilon,\mathcal{C}}(t)$ and $\hat{\Delta}_{\epsilon,\mathcal{C}}(t)$ will be governed by $(\tilde{B}_{l})_{l\geq 1}$ and $(\hat{B}_{l})_{l\geq 1}$ , respectively. More precisely,

[TABLE]

and

[TABLE]

Similarly to the proof of Theorem 3, since $\tilde{B}_{l_{t}}\leq B_{l_{t}}\leq\hat{B}_{l_{t}}$ for every $t\geq 1$ , we can show by induction on $l_{t}$ that $\tilde{\Delta}_{\epsilon,\mathcal{C}}(t)\leq\Delta_{\epsilon,\mathcal{C}}(t)\leq\hat{\Delta}_{\epsilon,\mathcal{C}}(t)$ for every $t$ . Therefore,

[TABLE]

where

[TABLE]

and

[TABLE]

IV-E1 Upper Bound on $\Delta_{\epsilon,\mathcal{C}}$

From (70) we know that $\Delta_{\epsilon,\mathcal{C}}\leq\hat{\Delta}_{\epsilon,\mathcal{C}}$ .

Since $\hat{\Delta}_{\epsilon,\mathcal{C}}$ was defined in a similar way as $\Delta_{\epsilon,\mathcal{C}}$ but using $(\hat{B}_{l})_{l\geq 1}$ instead of $(B_{l})_{l\geq 1}$ , $\hat{\Delta}_{\epsilon,\mathcal{C}}$ will satisfy an equation similar to (56), but with the terms defined using $\hat{B}$ instead of $B$ . More precisely, by using the same techniques that were used to prove Theorem 4, we can show that almost surely, we have

[TABLE]

where the distribution of $\hat{T}$ is given by

[TABLE]

and

[TABLE]

From Lemma 7, we know that $\hat{B}$ is a negative binomial random variable. Hence,

[TABLE]

Let $\hat{\hat{B}}=\sum_{s=0}^{k}\hat{L}_{s}$ , where $(\hat{L}_{s})_{0\leq s\leq k}$ are i.i.d with a marginal distribution identical to $L_{k-1}$ . Hence $\hat{\hat{B}}$ is also a negative binomial and

[TABLE]

We use the same trick as in [31] and set $i^{\prime}=i+1$ in (IV-E1). This leads to

[TABLE]

where

$$\mathbb{P}(\hat{\hat{B}}\leq n+1)=\sum_{i=k+1}^{n+1}{i-1\choose k}(1-p_{k-1})^{i-k-1}(p_{k-1})^{k+1}.$$
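The index shift rests on the identity $i{i-1\choose k-1}=k{i\choose k}$, which turns a truncated first moment of a negative binomial with $k$ successes into a distribution function of a negative binomial with $k+1$ successes. Assuming that the quantity being rewritten is $\mathbb{E}(\hat{B}\,\mathbbm{1}_{\{\hat{B}\leq n\}})$, the resulting equality $\mathbb{E}(\hat{B}\,\mathbbm{1}_{\{\hat{B}\leq n\}})=\frac{k}{p_{k-1}}\mathbb{P}(\hat{\hat{B}}\leq n+1)$ can be checked numerically as sketched below (function names are hypothetical).

```python
from math import comb

def negbin_pmf(r, p, i):
    """P(sum of r i.i.d. geometric(p) variables equals i), for i >= r."""
    return comb(i - 1, r - 1) * (1 - p) ** (i - r) * p**r

def truncated_mean(k, p, n):
    """Sum of i * P(B_hat = i) over i = k..n, by direct summation."""
    return sum(i * negbin_pmf(k, p, i) for i in range(k, n + 1))

def shifted_form(k, p, n):
    """(k / p) * P(B_hathat <= n + 1), where B_hathat is a sum of k + 1 geometrics."""
    return (k / p) * sum(negbin_pmf(k + 1, p, i) for i in range(k + 1, n + 2))

if __name__ == "__main__":
    k, p, n = 3, 0.4, 10  # illustrative values
    print(truncated_mean(k, p, n), shifted_form(k, p, n))
```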

Using this result, together with (73) and (75), we get

[TABLE]

where the second equality is obtained by using

[TABLE]

We denote by $\Delta_{\epsilon,(n,k)}^{ub}$ the upper bound we just found. Thus,

[TABLE]

IV-E2 Lower Bound on $\Delta_{\epsilon,\mathcal{C}}$

Let $\displaystyle\tilde{\tilde{B}}=\sum_{s=0}^{k}\tilde{L}_{s}$ , where $(\tilde{L}_{s})_{0\leq s\leq k}$ are i.i.d with a marginal distribution identical to $L_{0}$ . Hence $\tilde{\tilde{B}}$ is also a negative binomial and

[TABLE]

From (70), we know that $\Delta_{\epsilon,\mathcal{C}}\geq\tilde{\Delta}_{\epsilon,\mathcal{C}}$ . Using an argument identical to that used for the computation of the upper bound $\Delta_{\epsilon,\mathcal{C}}^{ub}$ we show that $\Delta_{\epsilon,\mathcal{C}}\geq\Delta_{\epsilon,(n,k)}^{lb}$ , where

[TABLE]

Remark 3.

The lower bound found here is similar to the average age derived in [31] for the finite redundancy (FR) case. However, the time scale is different: Yates et al. in [31] assume that the source generates a new update at the same instant it finishes transmitting the previous one, whereas in our case, when $t_{0}^{s}=0$ , we assume that a new packet is generated and its transmission begins $\frac{1}{\mu}$ seconds after the last update finishes transmission.

IV-F Age-Optimal Codes

We have already discussed that the lower bound on $\Delta_{\epsilon,\mathcal{C}}$ , $\Delta_{\epsilon,(n,k)}^{lb}$ , corresponds to the average age when the $(n,k)$ -scheme uses only MDS codes with $\tilde{B}$ as the number of channel uses needed to receive $k$ linearly independent coded symbols. Recall from Theorem 3 that, for a given couple $(n,k)$ , using an MDS code is optimal. This observation gives a different explanation of why the expression found in (83) is indeed a lower bound on the average age of a scheme using any type of code other than MDS, in particular a randomly generated code. This means that the lower bound is universal over all codes and the optimal achievable age

[TABLE]

where $\mathcal{C}$ is a random $(n,k)$ -scheme, and $\Delta_{\epsilon}^{lin}$ is the optimal average age over coding schemes that are induced by linear block codes. However, for a given $(n,k)$ , an explicit construction of an MDS code is not always available. In this section, we show that if the channel-input alphabet is large enough, then random codes are (almost) age-optimal among linear block codes.

Theorem 5.

Fix a couple $(n,k)$ . We have that $\forall\delta>0$ , $\exists q_{0}>0$ such that $\forall q\geq q_{0}$ , a random $(n,k)$ -coding scheme $\mathcal{C}$ almost surely satisfies

[TABLE]

This means that for a channel-input alphabet large enough ( $q$ large), random codes are (almost) age-optimal among linear block codes and

[TABLE]

where $\mathcal{C}$ is a random $(n,k)$ -coding scheme, and the dot above the equal sign refers to the fact that the difference between the two sides approaches zero as $q$ gets large.

Proof.

For a given random code $\mathcal{C}$ , recall that

[TABLE]

From (54) and (49), we notice that $\mathbb{E}(T)$ and $\epsilon_{p}$ both depend only on the distribution of $B=\sum_{s=0}^{k-1}L_{s}$ . However, for every $s\in\{0,1,\ldots,k-1\}$ ,

[TABLE]

This means that, for every $s$ , $L_{s}$ converges in distribution to $L_{0}$ as $q\to\infty$ . Therefore, $B$ converges in distribution to $\tilde{B}=\sum_{s=0}^{k-1}L_{0}^{(s)}$ , as $q\to\infty$ . Hence, as $q\to\infty$ , $\Delta_{\epsilon,\mathcal{C}}$ converges to $\Delta_{\epsilon,(n,k)}^{lb}$ . So, for $q$ large enough, we can write

[TABLE]

From (40), we know that the optimal age among linear block codes, for a given $q$ , is $\displaystyle\Delta_{\epsilon}^{lin}\leq\min_{n\geq k}\Delta_{\epsilon,(n,k)}$ . For large enough $q$ , we have $\Delta_{\epsilon,(n,k)}\doteq\Delta_{\epsilon,(n,k)}^{lb}$ . This means that asymptotically, $\displaystyle\Delta_{\epsilon}^{lin}\dot{\leq}\min_{n\geq k}\Delta_{\epsilon,(n,k)}^{lb}$ . However, from (84), we have that $\displaystyle\Delta_{\epsilon}^{lin}\geq\min_{n\geq k}\Delta_{\epsilon,(n,k)}^{lb}$ for every $q$ . Therefore, asymptotically

[TABLE]

∎

Notice that for very large $q$ , it is extremely unlikely that a (randomly generated) coded symbol is linearly dependent with any subset of size $k-1$ of the $n-1$ remaining coded symbols. This means that as $q$ becomes large, the behavior of random codes approaches that of MDS codes. This is essentially the main reason why Theorem 5 is true.
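The convergence of each $L_{s}$ to $L_{0}$ can be illustrated numerically. The Python sketch below evaluates the success probabilities $p_{s}=\frac{\bar{\epsilon}q^{s}(q^{k-s}-1)}{q^{k}-1}$ (the expression used in the proof of Lemma 8, with the assumption $\bar{\epsilon}=1-\epsilon$ ) for a fixed $k$ and growing alphabet size $q$ , and reports the largest gap between $p_{0}$ and any $p_{s}$ . The helper names and parameter values are hypothetical.

```python
def p_s(eps, q, k, s):
    """p_s = (1 - eps) * q^s * (q^(k-s) - 1) / (q^k - 1), assuming bar-eps = 1 - eps."""
    return (1 - eps) * q**s * (q**(k - s) - 1) / (q**k - 1)

def max_gap_to_p0(eps, q, k):
    """Largest value of p_0 - p_s over s = 0, ..., k-1 (p_s is decreasing in s)."""
    p0 = p_s(eps, q, k, 0)
    return max(p0 - p_s(eps, q, k, s) for s in range(k))

# Illustrative parameters only (hypothetical choice).
eps, k = 0.3, 3
gaps = {q: max_gap_to_p0(eps, q, k) for q in (5, 25, 125, 625)}

for q, gap in gaps.items():
    print(f"q = {q:4d}: max_s (p_0 - p_s) = {gap:.6f}")
```

The gap equals $\bar{\epsilon}\frac{q^{k-1}-1}{q^{k}-1}$ , which behaves like $\bar{\epsilon}/q$ , so every geometric variable $L_{s}$ has a parameter that approaches that of $L_{0}$ as the channel-input alphabet grows.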

IV-G Other Bounds and Approximations

IV-G1 Upper Bounding the Lower Bound

In Remark 3, we discussed how the lower bound found in (83) is similar, up to a time scale difference, to the average age computed by Yates et al. in [31, Section 3]. In that paper, the authors present a tight upper bound on the computed average age. We borrow the techniques of [31, Section 3.A] to upper bound $\Delta_{\epsilon,(n,k)}^{lb}$ . Interestingly, simulations will show that the upper bound on $\Delta_{\epsilon,(n,k)}^{lb}$ is a tight approximation of $\Delta_{\epsilon,\mathcal{C}}$ , the average age achieved when using a random $(n,k)$ -scheme $\mathcal{C}$ .

Recall that

[TABLE]

Denote by $\tilde{\mu}_{n}=\frac{k\mathbb{P}(\tilde{\tilde{B}}\leq n+1)}{(1-\epsilon)\mathbb{P}(\tilde{B}\leq n)}$ . From [31, Lemma 1], we know that $\tilde{\mu}_{n}\leq\min\left(n,\frac{k}{1-\epsilon}\right)$ . Hence,

[TABLE]

We denote by $\Delta_{\epsilon,(n,k)}^{*}$ this approximation. Thus,

[TABLE]
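The inequality $\tilde{\mu}_{n}\leq\min\left(n,\frac{k}{1-\epsilon}\right)$ used above can be checked numerically. The Python sketch below evaluates $\tilde{\mu}_{n}$ directly from its definition, treating $\tilde{B}$ and $\tilde{\tilde{B}}$ as sums of $k$ and $k+1$ i.i.d. geometric variables with success probability $p_{0}=1-\epsilon$ . The function names and the grid of parameters are illustrative assumptions, not taken from the paper or from [31].

```python
from math import comb

def neg_binomial_cdf(m, r, p):
    """P(sum of r i.i.d. Geometric(p) variables on {1, 2, ...} is at most m)."""
    return sum(comb(i - 1, r - 1) * (1 - p) ** (i - r) * p**r
               for i in range(r, m + 1))

def mu_tilde(n, k, eps):
    """mu_n = k * P(B-tilde-tilde <= n + 1) / ((1 - eps) * P(B-tilde <= n))."""
    p0 = 1 - eps  # success probability of L_0
    return k * neg_binomial_cdf(n + 1, k + 1, p0) / (p0 * neg_binomial_cdf(n, k, p0))

# Illustrative grid (hypothetical choice).
k = 3
checks = []
for eps in (0.1, 0.3, 0.5, 0.8):
    for n in range(k, 41):
        mu = mu_tilde(n, k, eps)
        bound = min(n, k / (1 - eps))
        checks.append(mu <= bound + 1e-12)  # small slack for rounding

print(all(checks))
```

One way to read the bound: by the index-shift identity, $\tilde{\mu}_{n}$ is the conditional mean of $\tilde{B}$ given $\tilde{B}\leq n$ , which can exceed neither $n$ nor the unconditional mean $\frac{k}{1-\epsilon}$ .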

Remark 4.

We can apply the techniques discussed in [31, Section 3.A] in order to approximate the optimal codeword length $n$ for $\Delta_{\epsilon,(n,k)}^{lb}$ and write $\Delta_{\epsilon,(n,k)}^{*}$ solely as a function of $\epsilon$ , $k$ , $n$ and the size $q$ of the channel-input alphabet.

IV-G2 Another Upper Bound on $\Delta_{\epsilon,\mathcal{C}}$

We derive here a second upper bound on $\Delta_{\epsilon,\mathcal{C}}$ which is easier to compute than $\Delta_{\epsilon,(n,k)}^{ub}$ . First recall from Theorem 4 that

[TABLE]

However,

[TABLE]

Hence,

[TABLE]

Whereas $\mathbb{E}(B)$ (given in (47)) is easy to compute,

[TABLE]

is hard to compute due to the complex nature of the distribution of $B$ (given in Lemma 3). To solve this problem, we use $\hat{B}$ as defined in Definition 8 to upper bound $\epsilon_{p}$ . Indeed, from Corollary 3 we know that

[TABLE]

Hence,

[TABLE]

Therefore, using (47), the new upper bound $\hat{\hat{\Delta}}_{\epsilon,(n,k)}$ is

[TABLE]
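Although the distribution of $B$ has no simple closed form, it can be evaluated numerically for small parameters, since $B=\sum_{s=0}^{k-1}L_{s}$ is a sum of independent geometric variables. The Python sketch below convolves the geometric distributions and checks the ordering $\mathbb{P}(\hat{B}\leq n)\leq\mathbb{P}(B\leq n)\leq\mathbb{P}(\tilde{B}\leq n)$ implied by the coupling of Lemma 8. The helper names and parameter values are hypothetical, and $\bar{\epsilon}=1-\epsilon$ is assumed.

```python
def p_s(eps, q, k, s):
    """p_s = (1 - eps) * q^s * (q^(k-s) - 1) / (q^k - 1), assuming bar-eps = 1 - eps."""
    return (1 - eps) * q**s * (q**(k - s) - 1) / (q**k - 1)

def geometric_pmf(p, n_max):
    """pmf[i] = P(L = i) for i = 0..n_max, with L geometric on {1, 2, ...}."""
    return [0.0] + [p * (1 - p) ** (i - 1) for i in range(1, n_max + 1)]

def convolve(a, b):
    """Convolution of two pmfs on {0..n_max}, truncated at n_max (exact below it)."""
    return [sum(a[j] * b[i - j] for j in range(i + 1)) for i in range(len(a))]

def cdf_of_sum(probs, n_max):
    """CDF values P(sum <= n), n = 0..n_max, for independent geometrics."""
    pmf = [1.0] + [0.0] * n_max  # point mass at 0 before adding any term
    for p in probs:
        pmf = convolve(pmf, geometric_pmf(p, n_max))
    cdf, acc = [], 0.0
    for mass in pmf:
        acc += mass
        cdf.append(acc)
    return cdf

# Illustrative parameters only (hypothetical choice).
eps, q, k, n_max = 0.5, 5, 3, 40
ps = [p_s(eps, q, k, s) for s in range(k)]

cdf_B = cdf_of_sum(ps, n_max)                  # exact law of B
cdf_B_tilde = cdf_of_sum([ps[0]] * k, n_max)   # k copies of L_0
cdf_B_hat = cdf_of_sum([ps[-1]] * k, n_max)    # k copies of L_{k-1}

for n in (3, 5, 10, 20):
    print(n, cdf_B_hat[n], cdf_B[n], cdf_B_tilde[n])
```

The cost of the convolution grows quadratically in the largest blocklength considered, which is negligible at the blocklengths used in the numerical results.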

IV-H Numerical Results

Fig. 7a and Fig. 8a correspond to a system with $k=3$ , $q=|\mathcal{V}|=5$ , $t_{0}^{s}=0$ , and using a random $(n,k)$ -coding scheme $\mathcal{C}$ . Fig. 7a plots $\Delta_{\epsilon,\mathcal{C}}$ as well as the bounds and the approximation derived in Sections IV-E and IV-G with respect to the blocklength $n$ , for four erasure channels with erasure probabilities $0.1,0.3,0.5,0.8$ . The tightness of the bounds with respect to $\Delta_{\epsilon,\mathcal{C}}$ differs according to the erasure probability:

•

For all erasure probabilities, we notice that the upper bound $\hat{\hat{\Delta}}_{\epsilon,(n,k)}$ (the orange curve) is very tight (almost equal to $\Delta_{\epsilon,\mathcal{C}}$ ) at large enough $n$ . However, the blocklength $n^{*}$ from which $\hat{\hat{\Delta}}_{\epsilon,(n,k)}$ becomes tight depends on $\epsilon$ : the larger the erasure probability, the larger $n^{*}$ . For instance, $n^{*}=7$ for $\epsilon=0.1$ , $n^{*}=12$ for $\epsilon=0.5$ , and $n^{*}=30$ for $\epsilon=0.8$ . For $n>n^{*}$ , the upper bound $\hat{\hat{\Delta}}_{\epsilon,(n,k)}$ is tighter than all other bounds. Notice that for every $n$ and every $\epsilon$ , $\Delta_{\epsilon,\mathcal{C}}\leq\hat{\Delta}_{\epsilon,\mathcal{C}}$ .

•

The approximation $\Delta_{\epsilon,(n,k)}^{*}$ becomes tighter as the erasure probability becomes larger. This is especially true at low values of $n$ , in particular for $n<n^{*}$ . In this range of blocklengths, the approximation $\Delta_{\epsilon,(n,k)}^{*}$ is extremely close to $\Delta_{\epsilon,\mathcal{C}}$ .

•

For every $n$ and every $\epsilon$ , we have $\Delta_{\epsilon,(n,k)}^{lb}\leq\Delta_{\epsilon,\mathcal{C}}$ and $\Delta_{\epsilon,(n,k)}^{lb}\leq\Delta_{\epsilon,(n,k)}^{*}$ . We notice that, for all values of $\epsilon$ , $\Delta_{\epsilon,(n,k)}^{lb}$ is close to $\Delta_{\epsilon,\mathcal{C}}$ at large $n$ , whereas for small values of $n$ this lower bound does not show any noticeable change in behavior as $\epsilon$ increases.

•

The upper bound $\Delta_{\epsilon,(n,k)}^{ub}$ is always larger than $\Delta_{\epsilon,\mathcal{C}}$ . Even though at $n>n^{*}$ we observe that $\hat{\hat{\Delta}}_{\epsilon,(n,k)}\leq\Delta_{\epsilon,(n,k)}^{ub}$ , for $n\leq n^{*}$ the upper bound $\Delta_{\epsilon,(n,k)}^{ub}$ is closer to $\Delta_{\epsilon,\mathcal{C}}$ than $\hat{\hat{\Delta}}_{\epsilon,(n,k)}$ . In fact, as $\epsilon$ increases, the gap between the two upper bounds also increases.

Fig. 7a also suggests that there exists, for each erasure probability, an optimal blocklength that minimizes $\Delta_{\epsilon,\mathcal{C}}$ . This echoes the observations presented in [30] and in [31]. Moreover, each bound also has its own optimal blocklength. Although the chosen channel-input alphabet is small ( $k=3$ and $q=5$ ), we remark that the gap between $\Delta_{\epsilon,\mathcal{C}}$ and the lower bound $\Delta_{\epsilon,(n,k)}^{lb}$ is not too large irrespective of the value of $\epsilon$ . This means that even for small channel-input alphabets, the performance of the optimal linear code is not too far from the performance achieved by random coding. This idea is illustrated in Fig. 8a, where we find and plot, at each value of $\epsilon$ , the minimum (with respect to $n$ ) of $\Delta_{\epsilon,\mathcal{C}}$ and of $\Delta_{\epsilon,(n,k)}^{lb}$ . We observe that these two minima are close to each other. Since $\min_{n\geq k}\Delta_{\epsilon,(n,k)}^{lb}\leq\Delta_{\epsilon}^{lin}\leq\min_{n\geq k}\Delta_{\epsilon,\mathcal{C}}$ , Fig. 8a suggests that, for every $\epsilon$ , if we use the optimal blocklength, then random codes achieve an age performance close to that of the optimal linear code.

Fig. 7b and Fig. 8b mirror Fig. 7a and Fig. 8a respectively, but for a larger channel-input alphabet with $q=25$ . We can apply the same analysis as the one we just presented for the case $q=5$ . In this case we can notice the effect of increasing the size of the channel-input alphabet, while keeping $k$ constant. In fact, comparing Fig. 7a and Fig. 7b, we observe a clear convergence of $\Delta_{\epsilon,\mathcal{C}}$ toward the lower bound $\Delta_{\epsilon,(n,k)}^{lb}$ . In Fig. 7b, the approximation $\Delta_{\epsilon,(n,k)}^{*}$ is not as tight as for the case of $q=5$ , for all $\epsilon$ and $n$ . Indeed, we can notice that, for $\epsilon=0.9$ , $\Delta_{\epsilon,(n,k)}^{*}$ is worse than $\Delta_{\epsilon,(n,k)}^{ub}$ for $n\leq 20$ . For large $n$ , all bounds are tight except for the upper bound $\Delta_{\epsilon,(n,k)}^{ub}$ . In fact, in Fig. 7b, the lower bound $\Delta_{\epsilon,(n,k)}^{lb}$ is the tightest bound on $\Delta_{\epsilon,\mathcal{C}}$ . However, the convergence of $\Delta_{\epsilon,\mathcal{C}}$ toward the lower bound $\Delta_{\epsilon,(n,k)}^{lb}$ is best observed in Fig. 8b. In this figure, we remark that the performance of the random code with the optimal blocklength is almost optimal. These simulations support our claim that random codes are age-optimal as $q$ grows and the channel-input alphabet becomes large.

V Conclusion

In this paper, we have studied the optimal achievable average age over an erasure channel in two scenarios: $(i)$ when the source alphabet and channel-input alphabet are the same, and $(ii)$ when they are different. We have demonstrated that in the first case, we do not need any type of channel coding to achieve the minimal average age, for which we have computed the exact expression. As for the second case, we have used a random coding technique to compute bounds on the optimal achievable age. We have also shown that for a large enough source alphabet, random codes are (almost) age-optimal among linear block codes. Finally, the numerical results have pointed out an interesting observation: even for a small source alphabet, the performance of random codes is not too far from optimal from an age point of view.

Acknowledgements

We would like to thank Roy Yates and an anonymous reviewer for helpful comments. This research was supported in part by grant No. 200021_166106/1 of the Swiss National Science Foundation.

-A Equidistribution and Weyl’s Equidistribution Theorem

In this section, which is based on [33, 38], for every real number $x$ we use $[x]$ to denote its fractional part, i.e., $[x]=x-\lfloor x\rfloor$ .

Definition 9.

A sequence $(u_{i})_{i\geq 1}\in[0,1)$ is said to be equidistributed on $[0,1)$ if for every interval $(a,b)\subset[0,1]$ we have

$\displaystyle\lim_{N\to\infty}\frac{\big|\{1\leq i\leq N:\ u_{i}\in(a,b)\}\big|}{N}=b-a,\qquad(101)$

where $|A|$ denotes the cardinality of the set $A$ .

Remark 5.

In Definition 9, we can replace $(a,b)$ with $[a,b)$ , $(a,b]$ or $[a,b]$ in (101) and the limit still holds.

Theorem 6.

Let $(u_{i})_{i\geq 1}$ be a sequence of real numbers and denote by $[u_{i}]=u_{i}-\lfloor u_{i}\rfloor$ the fractional part of $u_{i}$ . Then the following are equivalent:

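1. The sequence $([u_{i}])_{i\geq 1}$ is equidistributed on $[0,1)$ .

2. For every $k\in\mathbb{N}^{*}$ ,

$\displaystyle\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}e^{j2\pi ku_{i}}=0,$

where $j^{2}=-1$ .

3. For every Riemann-integrable function $f:[0,1]\rightarrow\mathbb{C}$ , we have

$\displaystyle\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}f([u_{i}])=\int_{0}^{1}f(x)\,dx.$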

The proof of Theorem 6 is outside the scope of this paper, but we encourage the reader to check [38] for the full proof. An important application of this theorem is given next.
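As a numerical illustration of the equivalence (not a proof), the Python sketch below takes the classical example $u_{i}=i\rho$ with $\rho=\sqrt{2}$ irrational. It estimates the fraction of fractional parts falling in an interval $(a,b)$ and the magnitude of the averaged exponential sums $\frac{1}{N}\sum_{i=1}^{N}e^{j2\pi ku_{i}}$ that appear in the standard Weyl criterion. The choice of $\rho$ , $N$ , the interval and the function names are assumptions made for this example.

```python
import cmath
import math

def fractional_part(x):
    """[x] = x - floor(x)."""
    return x - math.floor(x)

def interval_frequency(seq, a, b):
    """Fraction of the terms of seq that fall in the open interval (a, b)."""
    return sum(1 for u in seq if a < u < b) / len(seq)

def weyl_average(seq, k):
    """Magnitude of (1/N) * sum_i exp(j * 2 * pi * k * u_i)."""
    total = sum(cmath.exp(2j * math.pi * k * u) for u in seq)
    return abs(total) / len(seq)

# Illustrative choices (hypothetical): rho irrational, N moderately large.
rho = math.sqrt(2)
N = 20000
seq = [fractional_part(i * rho) for i in range(1, N + 1)]

freq = interval_frequency(seq, 0.2, 0.5)       # should be close to b - a = 0.3
avgs = [weyl_average(seq, k) for k in (1, 2, 3)]  # should be close to 0

print(freq, avgs)
```

For this particular sequence the exponential sum is a geometric series with bounded magnitude, so the averages decay like $1/N$ .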

Corollary 4.

If $(u_{i})_{i\geq 1}$ is a sequence that is equidistributed over $[0,1)$ , then we have

[TABLE]

Proof.

From the third condition of Theorem 6 we have

[TABLE]

∎

-B A variation of the strong law of large numbers

In this section, we prove a well-known variation of the strong law of large numbers.

Lemma 9.

If $(X_{i})_{i\geq 1}$ is a sequence of complex-valued random variables satisfying $\displaystyle\sum_{i=1}^{\infty}\mathbb{E}(|X_{i}|^{2})<\infty$ , then almost surely, we have $\displaystyle\lim_{i\to\infty}X_{i}=0$ .

Proof.

Observe that $\displaystyle\mathbb{E}\left(\sum_{i=1}^{\infty}|X_{i}|^{2}\right)=\sum_{i=1}^{\infty}\mathbb{E}\left(|X_{i}|^{2}\right)<\infty$ . Therefore, $\displaystyle\mathbb{P}\left(\sum_{i=1}^{\infty}|X_{i}|^{2}=\infty\right)=0$ which can be rewritten as $\displaystyle\mathbb{P}\left(\sum_{i=1}^{\infty}|X_{i}|^{2}<\infty\right)=1$ . This implies that $\displaystyle\mathbb{P}\left(\lim_{i\to\infty}|X_{i}|^{2}=0\right)=1$ . We conclude that almost surely, we have $\displaystyle\lim_{i\to\infty}X_{i}=0$ . ∎

Proposition 2.

Let $(X_{i})_{i\geq 1}$ be a sequence of complex-valued random variables. If there exists $0<C<\infty$ such that

[TABLE]

then almost surely

[TABLE]

Proof.

Let $S_{0}=0$ and for every $N\geq 1$ , let

[TABLE]

For every $N_{2}>N_{1}\geq 0$ , we have

[TABLE]

In particular, for every $N\geq 1$ , we have

[TABLE]

Therefore,

[TABLE]

Lemma 9 now implies that

[TABLE]

Now, for every $N\geq 1$ , define

[TABLE]

We have

[TABLE]

where $(\ast)$ follows from (109). Thus,

[TABLE]

It follows from Lemma 9 that

[TABLE]

Now observe that for every $N\geq 1$ , we have

[TABLE]

Equations (112) and (116) now imply that

[TABLE]

∎

Corollary 5.

Let $(X_{i})_{i\geq 1}$ be a sequence of complex-valued random variables. If there exists $0<C<\infty$ and $0<\beta<1$ such that for every $i,l\geq 1$ we have

[TABLE]

then almost surely

[TABLE]

Proof.

This is a direct corollary of Proposition 2. ∎

-C Proof of Lemma 1

Let $(X_{l})_{l\geq 1}$ be a sequence of independent and identically distributed random variables which take values in the set of strictly positive natural numbers $\mathbb{N}^{\ast}$ and which satisfy $\mathbb{E}(X_{l}^{2})=\mathbb{E}(X^{2})<\infty$ . Let $S_{0}=0$ and $\displaystyle S_{l}=\sum_{r=1}^{l}X_{r}$ for $l\geq 1$ . For every $i\geq 0$ , let

[TABLE]

and

[TABLE]

Clearly, we have $Y_{i}=S_{L_{i}}$ . Furthermore, since $X_{r}\geq 1$ for every $r\geq 1$ , we have $L_{i}\leq i$ with probability 1.

Let $\rho\in\mathbb{R}\setminus\mathbb{Q}$ be an irrational number, and let $\alpha\in\mathbb{R}$ be an arbitrary real number. From Corollary 4 and the second criterion of Theorem 6, we know that in order to show that

[TABLE]

it is sufficient to show that

[TABLE]

Now fix $k\in\mathbb{N}^{\ast}$ . For every $N\geq 1$ , we have

[TABLE]

We would like to show that almost surely $\displaystyle\lim_{N\to\infty}\frac{Y_{N}}{N}=1$ . First, observe that

[TABLE]

It follows from Lemma 9 that almost surely $\displaystyle\lim_{l\to\infty}\frac{X_{l}}{l}=0$ .

It is easy to see that as $N\to\infty$ , we have $L_{N}\to\infty$ and $Y_{N}\to\infty$ . Now since $L_{N}\leq N$ with probability 1, we have

[TABLE]

Furthermore, since $Y_{N}\leq N<Y_{N}+X_{L_{N}+1}$ , and since we have just shown that $\displaystyle\lim_{N\to\infty}\frac{X_{L_{N}+1}}{N}=0$ , it follows that

[TABLE]
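The sandwich $Y_{N}\leq N<Y_{N}+X_{L_{N}+1}$ can be observed in a short simulation. The Python sketch below assumes, consistently with that inequality, that $L_{N}$ is the number of partial sums $S_{l}$ , $l\geq 1$ , that do not exceed $N$ and that $Y_{N}=S_{L_{N}}$ ; the displayed definitions are not reproduced here, so this reading is an assumption. The uniform distribution on $\{1,2,3\}$ for $X$ is a hypothetical choice.

```python
import random

def renewal_last_epoch(N, sample_x, rng):
    """Return (L_N, Y_N): the number of partial sums S_l <= N and the last such sum."""
    s, l = 0, 0
    while True:
        x = sample_x(rng)
        if s + x > N:      # S_{l+1} would exceed N, so S_l is the last epoch <= N
            return l, s
        s += x
        l += 1

def sample_x(rng):
    """Hypothetical inter-arrival law: uniform on {1, 2, 3}."""
    return rng.choice((1, 2, 3))

rng = random.Random(0)
ratios = []
for N in (10, 100, 1000, 10000, 100000):
    L_N, Y_N = renewal_last_epoch(N, sample_x, rng)
    ratios.append((N, L_N, Y_N / N))

for N, L_N, ratio in ratios:
    print(N, L_N, ratio)
```

Because $X\leq 3$ in this example, $N-Y_{N}<3$ on every sample path, so the ratio $Y_{N}/N$ differs from 1 by less than $3/N$ .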

Now observe that

[TABLE]

which implies that

[TABLE]

From (125), (128) and (130), we conclude that it is sufficient to show that

[TABLE]

Notice that

[TABLE]

Now since $L_{N}\to\infty$ as $N\to\infty$ , the strong law of large numbers implies that

[TABLE]

Therefore, it is sufficient to show that

[TABLE]

where

[TABLE]

For every $l\geq 1$ , we have

[TABLE]

Furthermore, for every $l_{1}>l_{2}\geq 1$ , we have

[TABLE]

Thus,

[TABLE]

Now since $X$ is nondeterministic and takes values in $\mathbb{N}^{\ast}$ , there are two different integers $x_{1},x_{2}\in\mathbb{N}^{\ast}$ such that $\mathbb{P}(X=x_{1})>0$ and $\mathbb{P}(X=x_{2})>0$ . We have

[TABLE]

which implies that

[TABLE]

Now since $\rho$ is irrational and $x_{2}-x_{1}$ is a nonzero integer, we have $e^{j2\pi k\rho(x_{2}-x_{1})}\neq 1$ , which means that

[TABLE]

is a convex combination of 1 and $e^{j2\pi k\rho(x_{2}-x_{1})}\neq 1$ . This implies that

[TABLE]

By combining this with (140), we get

[TABLE]

Now (143), (138) and Corollary 5 imply that

[TABLE]

-D Proof of Lemma 8

We need the following lemma:

Lemma 10.

Let $0<\delta<\gamma\leq 1$ . We can define three random variables $X,Y$ and $Z$ taking values in the set of natural numbers $\mathbb{N}$ such that:

•

$X$ is geometrically distributed with success probability $\gamma$ , i.e., $\mathbb{P}(X=i)=\gamma(1-\gamma)^{i-1},\forall i\geq 1$ .

•

$Z$ is independent of $X$ .

•

$Y=X+Z$ .

•

$Y$ is geometrically distributed with success probability $\delta$ , i.e., $\mathbb{P}(Y=i)=\delta(1-\delta)^{i-1},\forall i\geq 1$ .

Proof.

Let $X$ and $Z$ be two independent random variables such that

[TABLE]

and

[TABLE]

The distribution of $Y$ is given by:

[TABLE]

∎
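One distribution for $Z$ that satisfies the requirements of Lemma 10 can be obtained by dividing the probability generating function of $Y$ by that of $X$ : $Z=0$ with probability $\delta/\gamma$ , and otherwise $Z$ is geometric with success probability $\delta$ . This choice is derived here for illustration and is not claimed to be the one displayed in the proof. The Python sketch below checks numerically that the convolution of the laws of $X$ and $Z$ is the geometric law with parameter $\delta$ ; the function names and the values of $\gamma$ and $\delta$ are hypothetical.

```python
def geometric_pmf(p, i_max):
    """pmf[i] = P(G = i), i = 0..i_max, for G geometric on {1, 2, ...}."""
    return [0.0] + [p * (1 - p) ** (i - 1) for i in range(1, i_max + 1)]

def z_pmf(gamma, delta, i_max):
    """Candidate law for Z: 0 w.p. delta/gamma, otherwise Geometric(delta)."""
    w = delta / gamma
    geo = geometric_pmf(delta, i_max)
    return [w] + [(1 - w) * geo[i] for i in range(1, i_max + 1)]

def convolve(a, b):
    """Convolution of two pmfs on {0..i_max}, truncated at i_max (exact below it)."""
    return [sum(a[j] * b[i - j] for j in range(i + 1)) for i in range(len(a))]

# Illustrative parameters with 0 < delta < gamma <= 1 (hypothetical choice).
gamma, delta, i_max = 0.6, 0.25, 60

pmf_y = convolve(geometric_pmf(gamma, i_max), z_pmf(gamma, delta, i_max))
target = geometric_pmf(delta, i_max)
max_err = max(abs(u - v) for u, v in zip(pmf_y, target))

print(max_err)
```

Since $Z\geq 0$ , this construction also gives $X\leq Y$ on every sample path, which is the property exploited in the proof of Lemma 8.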

Now we are ready to prove Lemma 8.

Proof of Lemma 8.

First notice that the probabilities $p_{s}=\frac{\bar{\epsilon}q^{s}(q^{k-s}-1)}{q^{k}-1}$ are decreasing in $s$ , where $0\leq s\leq k-1$ . This means that

[TABLE]

It follows from Lemma 10 that for each $0\leq s<k$ , we can define five random variables: $\tilde{A}_{s},J_{s},A_{s},\hat{J}_{s}$ and $\hat{A}_{s}$ , such that:

•

$\tilde{A}_{s}$ is geometrically distributed with success probability $p_{0}$ , i.e., $\tilde{A}_{s}$ is distributed as $L_{0}$ and so $\mathbb{P}(\tilde{A}_{s}=i)=p_{0}(1-p_{0})^{i-1}=\mathbb{P}(L_{0}=i),\forall i\geq 1$ .

•

$J_{s}$ is independent of $\tilde{A}_{s}$ .

•

$A_{s}=\tilde{A}_{s}+J_{s}$ .

•

$A_{s}$ is geometrically distributed with success probability $p_{s}$ , i.e., $A_{s}$ is distributed as $L_{s}$ and so $\mathbb{P}(A_{s}=i)=p_{s}(1-p_{s})^{i-1}=\mathbb{P}(L_{s}=i),\forall i\geq 1$ .

•

$\hat{J}_{s}$ is independent of $(\tilde{A}_{s},J_{s},A_{s})$ .

•

$\hat{A}_{s}=A_{s}+\hat{J}_{s}$ .

•

$\hat{A}_{s}$ is geometrically distributed with success probability $p_{k-1}$ , i.e., $\hat{A}_{s}$ is distributed as $L_{k-1}$ and so $\mathbb{P}(\hat{A}_{s}=i)=p_{k-1}(1-p_{k-1})^{i-1}=\mathbb{P}(L_{k-1}=i),\forall i\geq 1$ .

Assume that $(\tilde{A}_{s},J_{s},A_{s},\hat{J}_{s},\hat{A}_{s})$ is independent of $(\tilde{A}_{s^{\prime}},J_{s^{\prime}},A_{s^{\prime}},\hat{J}_{s^{\prime}},\hat{A}_{s^{\prime}})$ if $s\neq s^{\prime}$ . Now define

[TABLE]

and

[TABLE]

Clearly, the distribution of $\tilde{O},O$ and $\hat{O}$ is the same as that of $\tilde{B},B$ and $\hat{B}$ , respectively. Furthermore, we have $\tilde{O}\leq O\leq\hat{O}$ with probability 1. ∎
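The coupling can be simulated directly. The Python sketch below samples, for each $s$ , a geometric variable $\tilde{A}_{s}$ with parameter $p_{0}$ and adds independent nonnegative increments to obtain $A_{s}$ and $\hat{A}_{s}$ , then sums over $s$ . The increment law used here ( $0$ with probability $\delta/\gamma$ , otherwise geometric with parameter $\delta$ ) is one choice compatible with Lemma 10 and is an assumption, as are the function names, the seed and the parameter values; $\bar{\epsilon}=1-\epsilon$ is assumed as well.

```python
import math
import random

def p_s(eps, q, k, s):
    """p_s = (1 - eps) * q^s * (q^(k-s) - 1) / (q^k - 1), assuming bar-eps = 1 - eps."""
    return (1 - eps) * q**s * (q**(k - s) - 1) / (q**k - 1)

def sample_geometric(p, rng):
    """Inverse-transform sample of a geometric variable on {1, 2, ...}."""
    if p >= 1.0:
        return 1
    return int(math.log(1.0 - rng.random()) / math.log(1.0 - p)) + 1

def sample_increment(gamma, delta, rng):
    """Nonnegative Z such that Geometric(gamma) + Z is Geometric(delta), delta <= gamma."""
    if rng.random() < delta / gamma:
        return 0
    return sample_geometric(delta, rng)

def sample_coupled_sums(eps, q, k, rng):
    """One joint sample (O-tilde, O, O-hat) built as in the proof of Lemma 8."""
    p0, p_last = p_s(eps, q, k, 0), p_s(eps, q, k, k - 1)
    o_tilde = o = o_hat = 0
    for s in range(k):
        ps = p_s(eps, q, k, s)
        a_tilde = sample_geometric(p0, rng)               # distributed as L_0
        a = a_tilde + sample_increment(p0, ps, rng)       # distributed as L_s
        a_hat = a + sample_increment(ps, p_last, rng)     # distributed as L_{k-1}
        o_tilde, o, o_hat = o_tilde + a_tilde, o + a, o_hat + a_hat
    return o_tilde, o, o_hat

# Illustrative parameters only (hypothetical choice).
rng = random.Random(1)
samples = [sample_coupled_sums(0.5, 5, 3, rng) for _ in range(20000)]

mean_o = sum(o for _, o, _ in samples) / len(samples)
expected_mean = sum(1 / p_s(0.5, 5, 3, s) for s in range(3))
print(mean_o, expected_mean)
```

Every sample satisfies the ordering by construction, and the empirical mean of $O$ should be close to $\sum_{s=0}^{k-1}1/p_{s}$ , the mean of a sum of independent geometric variables with parameters $p_{s}$ .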

Bibliography

[1] S. K. Kaul, R. D. Yates, and M. Gruteser, “On piggybacking in vehicular networks,” in IEEE Global Telecommunications Conference, GLOBECOM 2011, Dec. 2011.
[2] S. Kaul, M. Gruteser, V. Rai, and J. Kenney, “Minimizing age of information in vehicular networks,” in IEEE Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), Salt Lake City, Utah, USA, 2011.
[3] S. Kaul, R. D. Yates, and M. Gruteser, “Real-time status: How often should one update?” in Proc. INFOCOM, 2012.
[4] R. D. Yates and S. Kaul, “Real-time status updating: Multiple sources,” in Proc. IEEE Int’l. Symp. Info. Theory, Jul. 2012.
[5] C. Kam, S. Kompella, and A. Ephremides, “Age of information under random updates,” in Proc. IEEE Int’l. Symp. Info. Theory, 2013, pp. 66–70.
[6] C. Kam, S. Kompella, and A. Ephremides, “Effect of message transmission diversity on status age,” in Proc. IEEE Int’l. Symp. Info. Theory, June 2014, pp. 2411–2415.
[7] M. Costa, M. Codreanu, and A. Ephremides, “Age of information with packet management,” in Proc. IEEE Int’l. Symp. Info. Theory, June 2014, pp. 1583–1587.
[8] Y. Sun, E. Uysal-Biyikoglu, R. D. Yates, C. E. Koksal, and N. B. Shroff, “Update or wait: How to keep your data fresh,” in IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, April 2016, pp. 1–9.
