Optimal Age over Erasure Channels
Elie Najm, Emre Telatar, Rajai Nasser

TL;DR
This paper investigates optimal coding strategies to minimize the average age of information over erasure channels, providing closed-form solutions for equal alphabet sizes and bounds for different sizes, advancing understanding of age optimization in communication systems.
Contribution
It introduces a novel analysis of age minimization over erasure channels, deriving closed-form solutions for equal alphabet sizes and bounds for differing sizes, using random coding arguments.
Findings
Trivial coding strategy is optimal when source and channel alphabets are equal.
Closed-form expression for average age in the equal alphabet case.
Random coding approaches approach optimal age as source alphabet size increases.
This paper was presented in part at the IEEE International Symposium on Information Theory, Paris, July 2019.
Abstract
Previous works on age of information and erasure channels have dealt with specific models and computed the average age or average peak age for certain settings. In this paper, given a source that produces a letter every $T_s$ seconds and an erasure channel that can be used every $T_c$ seconds, we ask what coding strategy minimizes the time-average "age of information" that an observer of the channel output incurs. We first analyze the case where the source alphabet and the channel-input alphabet have the same size. We show that a trivial coding strategy is optimal and that a closed-form expression for the age can be derived. We then analyze the case where the alphabets have different sizes. We use a random coding argument to bound the average age and show that the average age achieved using random codes converges to the optimal average age of linear block codes as the source alphabet becomes large.
I Introduction
The concept of age as a performance metric in communication systems was first used in 2011 by Kaul et al. [1, 2] to assess the performance of a given vehicular network. Vehicular networks belong to the growing group of real-time status-monitoring systems, which are also used in healthcare, finance, transportation, smart homes, and warehouse and natural-environment surveillance, to name but a few. In such systems, a remote monitor is interested in the status of one or multiple processes. A sender takes samples of the observed processes and sends them to the monitor. However, the aim of the communication system in this case is not to transmit as fast as possible, but to keep the information that the destination has about the observed processes as fresh as possible. Indeed, if, at any time $t$, the last update received by the monitor was generated at time $u(t)$, then the information at the receiver reflects the status of the observed process at time $u(t)$, not at time $t$. Hence, the monitor has a distorted version of reality: it has an obsolete version with an age of $t - u(t)$.
Kaul et al. in [3] use a graphical method to compute and minimize an age-related metric: the average age. This metric is defined as
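$$\bar{\Delta} = \lim_{\tau \to \infty} \frac{1}{\tau} \int_0^{\tau} \Delta(t)\, dt, \tag{1}$$
where $\Delta(t)$ denotes the instantaneous age at time $t$.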
A growing body of work has used this metric to evaluate the performance of multiple communication systems represented using queuing models [4, 5, 6, 7, 8, 9, 10, 11, 12, 13], some of them subject to resource allocation constraints (such as energy [14, 15, 16, 17, 18]). For an excellent recent survey on age of information, see [19]. The works cited above have mostly focused on computing the average age (AoI) of a given status-updating policy while assuming no errors. At a more physical level, the effect of noise and channel coding on the average age has also been investigated, especially for the erasure channel [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]. Chen et al. [28] assume a random service time, but the transmitted packet has a certain probability of being lost at the end of the transmission. Parag et al. [29] consider the binary erasure channel (BEC) and compute the average age for two transmission schemes: single transmission and hybrid automatic repeat request (HARQ). While the authors of [29] assume a just-in-time generation process, Najm et al. [30] consider Poisson-process generation and two HARQ protocols to combat erasures: infinite incremental redundancy (IIR) and fixed redundancy (FR). Yates et al. [31] consider the same two schemes, IIR and FR, but assume a just-in-time generation policy. While both [30] and [31] agree on the definition of IIR, they use the term FR to describe two different schemes. In [30], the update is divided into packets encoded ratelessly, and each packet is encoded using an $(n,k)$-maximum distance separable (MDS) code. In [31], FR means that each $k$-bit update is encoded into an $n$-bit codeword, and the update is successfully received if and only if at least $k$ bits are not erased. We are interested here in transmission schemes similar to the FR scheme considered in [31].
Most of the works that addressed noise and erasures have assumed some form of feedback from the receiver to the transmitter. In this paper, we take an information-theoretic approach to the age problem and characterize the optimal achievable age when the channel is an erasure channel and no feedback is available. That is, we consider the following question: given a $q$-ary erasure channel without feedback and a source with a finite alphabet, what is the lowest average age that can be achieved in this system?
Since the channel can introduce errors and no feedback is available, we will be forced (at least in some cases) to use some form of coding. However, unlike classical communication systems, where the primary role of coding is to guarantee reliable delivery of all packets, in our system losing some packets is acceptable. The primary goal of our coding is that the monitor reliably receives enough timely packets to remain as up-to-date as possible.
In order to study the age of information over erasure channels, we distinguish two cases:
- Case 1: The source alphabet and the channel-input alphabet are of the same size.
- Case 2: The source alphabet and the channel-input alphabet have different sizes.
For the first case, we derive an exact closed-form expression for the average age and show that the optimal average age is achieved without any encoding of the source symbols. For the second case, encoding is necessary: we use random coding to derive upper and lower bounds on the achievable average age of the system, as well as an approximation of the lower bound inspired by [31, 32].
The rest of this paper is organized as follows: In Section II, we present the system model and some definitions which are common to all later sections. In Section III, we derive the optimal average age for Case 1 and in Section IV we study the optimal average age for Case 2.
II Preliminaries
We start by defining the communication system that we study. Fig. 1 illustrates such a system.
- The channel: We consider a discrete memoryless $q$-ary erasure channel with erasure probability $\epsilon$, which we denote by EC($\epsilon$). The channel-input alphabet is $\mathcal{X}$, with $|\mathcal{X}| = q$, and the channel-output alphabet is $\mathcal{X} \cup \{?\}$, where $?$ denotes an erasure. We also assume that there is no feedback from the receiver: the output of the encoder depends only on the source symbols, and the sender does not know whether a sent symbol was successfully received. In addition, we assume that transmitted channel symbols are received instantaneously. (If they are instead received after a constant delay, this constant can be added to all the age expressions derived in this paper.) Furthermore, consecutive channel uses are separated by a period $T_c$. More precisely, the $i$-th channel use takes place at time $iT_c$. Note that we assumed without loss of generality that the channel clock starts at time $0$. We define the channel-use rate $1/T_c$ to be the allowed number of channel uses per second.
- The source: We assume a single discrete memoryless source generating messages that belong to a finite set $\mathcal{U}$. Each element of this set is a message, and we use the terms source symbol and message interchangeably in this paper. We define $k = \lceil \log_q |\mathcal{U}| \rceil$, where $\log_q$ is the base-$q$ logarithm. Hence, in order to represent one source symbol we need $k$ channel-input symbols: there exists an injective function that maps every message to a length-$k$ sequence of channel-input symbols. Thus, $|\mathcal{U}| \le q^k$. As with the channel uses, source-symbol generation is assumed to be periodic, with period $T_s$. More precisely, the $j$-th source symbol is generated at time $t_0^s + jT_s$. Since the source and channel clocks might not be synchronized, we need to account for the possibility that the starting time $t_0^s$ of the source is nonzero. We define the message-generation rate $1/T_s$ as the fixed number of source symbols generated per second.
Notice that if the source alphabet and the channel-input alphabet have the same size, then the source alphabet has $q$ letters and $k = 1$. In this case, we can assume without loss of generality that the two alphabets are identical. This is the system that we study in Section III.
In the case where the source alphabet and the channel-input alphabet have different sizes, we focus on strategies induced by linear codes, so we assume that $q$ is a power of a prime number, that the channel-input alphabet is the finite field $\mathbb{F}_q$, and that the source alphabet is $\mathbb{F}_q^k$. This is the system that we study in Section IV.
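As a small illustration of the source-to-channel mapping, the sketch below computes $k = \lceil \log_q M \rceil$ for a source with $M$ letters using exact integer arithmetic (avoiding floating-point logarithms), and maps a message index to its length-$k$ base-$q$ digit string, which is one possible injective map. The function names and the base-$q$ digit choice are our own illustrative assumptions, not the construction used in the paper.

```python
def num_channel_symbols(alphabet_size: int, q: int) -> int:
    """Smallest k with q**k >= alphabet_size, i.e. k = ceil(log_q(alphabet_size))."""
    if alphabet_size < 1 or q < 2:
        raise ValueError("need alphabet_size >= 1 and q >= 2")
    k, capacity = 0, 1
    while capacity < alphabet_size:   # integer loop avoids float rounding in log()
        k += 1
        capacity *= q
    return k


def to_channel_symbols(message: int, q: int, k: int) -> tuple:
    """Hypothetical injective map: message index in {0, ..., q**k - 1} -> base-q digits."""
    if not 0 <= message < q ** k:
        raise ValueError("message index out of range")
    digits = []
    for _ in range(k):
        message, d = divmod(message, q)
        digits.append(d)
    return tuple(reversed(digits))    # most significant digit first


if __name__ == "__main__":
    q, M = 3, 10                      # illustrative: ternary channel, 10-letter source
    k = num_channel_symbols(M, q)     # 3**2 = 9 < 10 <= 27 = 3**3, so k = 3
    words = [to_channel_symbols(m, q, k) for m in range(M)]
    assert len(set(words)) == M       # distinct messages get distinct sequences
    print(k, words[:4])
```

Any injective map would do; base-$q$ digits are simply the most direct one and make the bound $M \le q^k$ visible.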
- The encoder and decoder: At the $i$-th channel use, the encoder uses all the generated source symbols and encodes them into a single channel-input letter, i.e., the encoder is a function
[TABLE]
The decoder, at the channel use, uses all the received channel-output symbols to compute an estimate of a transmitted message, along with its index. Thus, the decoder is a function
[TABLE]
We assume that the decoder never makes mistakes. In other words, if the generated source symbols are $U_1, \ldots, U_{\lfloor (iT_c - t_0^s)/T_s \rfloor}$, the channel-output symbols are , and , then we have with probability 1. In this case, the age of information at time $t \in [iT_c, (i+1)T_c)$ is equal to
[TABLE]
It is easy to see that for a given sequence of encoders , the optimal decoder (from an age perspective) is the one defined as , where is the maximum index in such that can be deterministically decoded from , and . If no such exists, we have and we adopt the convention that for such cases. By noticing that for every $t \ge 0$ we have $t \in [iT_c, (i+1)T_c)$ where $i = \lfloor t/T_c \rfloor$, we can see from (4) that the instantaneous age of information of the coding scheme is given by
[TABLE]
It is worth noting that the function is nondecreasing and piecewise constant. Furthermore, the discontinuities in the function correspond to instants at which the receiver successfully decodes new packets. From this and from (5), we can see that the instantaneous age is a piecewise linearly-increasing function that has a sawtooth shape.
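Because the instantaneous age is piecewise linear with slope 1 between decoding instants, its time average can be computed exactly from the list of (decoding time, generation time) pairs. The sketch below does this under two assumptions of ours: averaging starts at the first decoding instant, and a decoded message that is older than the freshest one already held does not reset the age (as with the age-optimal decoder described above). The function name `average_age` is hypothetical.

```python
from typing import Sequence, Tuple


def average_age(deliveries: Sequence[Tuple[float, float]], horizon: float) -> float:
    """Time-average of the sawtooth age t - u(t) over [first decoding time, horizon].

    deliveries: (decoding_time, generation_time) pairs sorted by decoding time.
    u(t) is the generation time of the freshest message decoded by time t, so a
    message older than the one already held does not reset the age.
    """
    if not deliveries:
        raise ValueError("need at least one delivery")
    start = deliveries[0][0]
    if horizon <= start:
        raise ValueError("horizon must exceed the first decoding time")
    area, freshest = 0.0, float("-inf")
    next_times = [d for d, _ in deliveries[1:]] + [horizon]
    for (d, g), nxt in zip(deliveries, next_times):
        freshest = max(freshest, g)
        a, b = d, min(nxt, horizon)
        if b > a:
            area += (b - a) * ((a + b) / 2 - freshest)  # integral of (t - freshest) over [a, b]
    return area / (horizon - start)


if __name__ == "__main__":
    # Illustrative: decodings at t = 2 and t = 5 of messages generated at t = 0 and t = 4.
    print(average_age([(2.0, 0.0), (5.0, 4.0)], horizon=7.0))  # 2.9
```

Each trapezoid under the sawtooth is integrated in closed form, so no time discretization is needed.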
In Fig. 2, we show an example illustrating how the instantaneous age varies with time. In this figure, represents the instant at which the successfully received message was decoded at the receiver, and represents the generation time of this message at the source. More precisely, where is the index at which changes its value for the time, and .
In the previous section, we indicated that we are interested in bounding the optimal achievable average age. Here, we formally define the concepts of achievable age and optimal achievable age.
Definition 1.
We call to be a coding scheme where is the sequence of encoders and is the sequence of decoders. The average age corresponding to such scheme is denoted by
[TABLE]
where is the instantaneous age that is obtained by using the coding scheme , and which is given by (4). If the decoders are optimal for the encoders, then is given by (5).
Such a definition can be generalized to channels other than erasure channels. However, for the special case of the erasure channel with erasure probability , the average age relative to the coding scheme will be denoted by
[TABLE]
where is the instantaneous age that is obtained by using the coding scheme when the channel is EC().
Definition 2.
We say that an age is achievable for EC() if for every there exists a coding scheme such that
[TABLE]
and the probability of error on the decoded messages is zero.
Definition 3.
Given a channel EC(), we define the optimal average age to be the minimum achievable average age. Formally,
[TABLE]
where is the set of all possible coding schemes.
The set forms the set of achievable average ages over all erasure channels.
III Optimal Age with the Same Source & Channel Alphabets
In this first case, we take which means that the source and channel-input alphabets are the same. We first show that to achieve the optimal age, no encoding is required and we provide the optimal transmission policy. We then compute the optimal average age.
III-A The Optimal Transmission Policy
Theorem 1.
For a channel EC($\epsilon$), if the source alphabet and the channel-input alphabet are the same, then the optimal transmission policy from an age perspective is to keep transmitting the last generated source symbol until a new one is generated, at which point we start transmitting the newly generated source symbol and discard all previous messages. This is a last-come-first-served (LCFS) policy with no buffer.
Proof.
Let us assume that an oracle provides us with the erasure pattern. It is clear that at each non-erased channel use we should send the latest update, so that the drop in the instantaneous age is as large as possible. Indeed, if there is a non-erased channel use at time $\tau$ and the latest update was generated at time $\sigma \le \tau$, then the instantaneous age that corresponds to the LCFS policy with no buffer drops to $\tau - \sigma$. We cannot do better than this, because no source symbol is generated in the interval $(\sigma, \tau]$. This argument shows that the optimal transmission policy sends the latest generated source symbol at every non-erased channel use, while it can transmit anything at the erased channel uses. Since, in practice, the transmitter does not know the erasure pattern beforehand, the policy of repeatedly transmitting the last generated update until a new one is created satisfies this optimality criterion: it sends the latest generated message at every non-erased channel use.
For the case where $T_s \ge T_c$, the LCFS policy with no buffer transmits every source symbol at least once, whereas for $T_s < T_c$, some messages are dropped and never sent. ∎
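A minimal Monte Carlo sketch of the LCFS policy with no buffer can serve as a numerical sanity check of the closed-form expressions in Theorem 2, which we do not reproduce here. Assumptions (ours): the $i$-th channel use is at time $iT_c$ for $i \ge 0$, source symbols are generated at times $t_0^s + jT_s$ for $j \ge 0$, erasures are i.i.d. with probability $\epsilon$, and averaging starts at the first successful channel use. The function name is hypothetical.

```python
import math
import random


def lcfs_average_age(eps, T_s, T_c, t0, num_uses, seed=0):
    """Monte Carlo time-average age of 'always transmit the latest source symbol'.

    Assumptions (ours): channel use i happens at time i*T_c (i >= 0), source symbols are
    generated at times t0 + j*T_s (j >= 0), and each use is erased independently w.p. eps.
    """
    rng = random.Random(seed)
    area, start, freshest = 0.0, None, None
    for i in range(num_uses):
        t = i * T_c
        if t >= t0 and rng.random() >= eps:       # a symbol exists and the use is not erased
            j = math.floor((t - t0) / T_s)         # latest symbol generated at or before t
            freshest = t0 + j * T_s
            if start is None:
                start = t
        if freshest is not None:
            # On [t, t + T_c) the age rises linearly from t - freshest with slope 1.
            area += T_c * (t + T_c / 2 - freshest)
    if start is None:
        raise RuntimeError("no successful channel use; increase num_uses")
    return area / (num_uses * T_c - start)


if __name__ == "__main__":
    eps = 0.5
    est = lcfs_average_age(eps, T_s=1.0, T_c=1.0, t0=0.0, num_uses=200_000, seed=1)
    print(est, 0.5 + eps / (1 - eps))   # estimate vs. hand calculation for T_s = T_c = 1, t0 = 0
```

For $T_s = T_c = 1$ and $t_0^s = 0$, the age on $[i, i+1)$ equals $(t - i)$ plus the number of channel uses since the last success, whose stationary law is geometric with mean $\epsilon/(1-\epsilon)$; the estimate should therefore approach $\tfrac{1}{2} + \tfrac{\epsilon}{1-\epsilon}$.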
III-B The Optimal Average Age
Theorem 2.
Given a source with message-generation rate $1/T_s$ and starting time $t_0^s$, an erasure channel EC($\epsilon$) with channel-use rate $1/T_c$, and utilization $\rho = T_c/T_s$, the optimal average age achieved over EC($\epsilon$) is:
- For irrational utilization $\rho$,
[TABLE]
- For rational utilization $\rho$,
[TABLE]
where $\{x\}$ denotes the fractional part of $x$, and $b$ is the denominator of $\rho$ when it is written as a ratio of integers in irreducible form, i.e., $\rho = a/b$ with $\gcd(a,b) = 1$ and $b \ge 1$.
Before giving the proof of Theorem 2, we need the following lemmas.
Lemma 1.
Let be a sequence of independent and identically distributed nondeterministic random variables (a random variable is nondeterministic if there are at least two different integers with nonzero probability; Lemma 1 remains true if the variables are deterministic, but in this paper we are only interested in the nondeterministic case) which take values in the set of strictly positive natural numbers and which satisfy . Let and for . For every , let
[TABLE]
Let be an irrational number, and let be an arbitrary real number. Then, almost surely, we have
[TABLE]
where $\{x\}$ denotes the fractional part of a real number $x$.
Proof.
This lemma is a consequence of Weyl’s equidistribution theorem [33] (see Section -A). A full proof of this lemma can be found in Section -C. ∎
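Lemma 1 rests on Weyl's equidistribution theorem: for irrational $\alpha$ and any real $x$, the fractional parts $\{m\alpha + x\}$, $m = 1, 2, \ldots$, are equidistributed in $[0, 1)$, so empirical averages of a Riemann-integrable $f$ of them converge to $\int_0^1 f(y)\,dy$. The sketch below only illustrates this ingredient numerically (it does not implement the random sampling in Lemma 1), and contrasts it with a rational $\alpha$, for which only finitely many values are visited. The helper names are ours.

```python
import math


def frac(y: float) -> float:
    """Fractional part {y} = y - floor(y), in [0, 1)."""
    return y - math.floor(y)


def weyl_average(alpha: float, x: float, n_terms: int, f=lambda y: y) -> float:
    """Empirical mean (1/N) * sum_{m=1}^{N} f({m * alpha + x})."""
    return sum(f(frac(m * alpha + x)) for m in range(1, n_terms + 1)) / n_terms


if __name__ == "__main__":
    N = 100_000
    # Irrational alpha: averages approach the integrals over [0, 1).
    print(weyl_average(math.sqrt(2), 0.3, N))                   # close to 1/2
    print(weyl_average(math.sqrt(2), 0.3, N, lambda y: y * y))  # close to 1/3
    # Rational alpha = 1/3: only three values are visited, so the limit depends on x.
    print(weyl_average(1 / 3, 0.1, N))  # about (0.1 + 0.4333 + 0.7667) / 3 = 0.4333
```

The rational case is exactly why Theorem 2 needs a separate expression, obtained through Lemma 2.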
Lemma 2.
Let $a \in \mathbb{Z}$ and $b \in \mathbb{N}$ be such that $\gcd(a,b) = 1$ and $b \ge 1$. Then, for every $x \in \mathbb{R}$, we have
[TABLE]
where $\{x\}$ denotes the fractional part of a real number $x$.
Proof.
Since , then for every , the mapping is a bijection from to itself. Therefore,
[TABLE]
where follows from the fact that the mapping is a bijection from to itself, and follows from the fact that for every . ∎
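The key step in the proof, that $j \mapsto ja \bmod b$ permutes $\{0, \ldots, b-1\}$ when $\gcd(a,b) = 1$, can be checked with exact rational arithmetic. The sketch below assumes, as our reading of the proof, that the relevant indices run over one full period $j = 0, \ldots, b-1$; it shows that the multiset of fractional parts $\{ja/b + x\}$ then coincides with that of $\{j/b + x\}$, so any sum of a function of these values over a full period is unchanged. The helper names are ours.

```python
from fractions import Fraction
from math import floor, gcd


def frac(y: Fraction) -> Fraction:
    """Exact fractional part {y} = y - floor(y)."""
    return y - floor(y)


def is_permutation_mod(a: int, b: int) -> bool:
    """True iff j -> (j * a) mod b is a bijection of {0, ..., b - 1}."""
    return sorted((j * a) % b for j in range(b)) == list(range(b))


def fractional_parts(a: int, b: int, x: Fraction) -> list:
    """Sorted multiset of {j * a / b + x} for j = 0, ..., b - 1."""
    return sorted(frac(Fraction(j * a, b) + x) for j in range(b))


if __name__ == "__main__":
    a, b, x = 5, 12, Fraction(1, 7)
    print(gcd(a, b), is_permutation_mod(a, b))                      # 1 True
    print(fractional_parts(a, b, x) == fractional_parts(1, b, x))  # True
    # Without coprimality the map is not a bijection and the multisets differ.
    print(is_permutation_mod(4, 12), fractional_parts(4, 12, x) == fractional_parts(1, 12, x))  # False False
```

Exact `Fraction` arithmetic avoids any floating-point ambiguity near integer boundaries, where a rounding error would change a fractional part from almost 1 to almost 0.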
Proof of Theorem 2.
We know that . In (1) we saw that the average age is given by
[TABLE]
We can rewrite the average age as
[TABLE]
Therefore,
[TABLE]
Let
[TABLE]
By noticing that
[TABLE]
we can deduce from (18) that
[TABLE]
which implies that
[TABLE]
At any instant $t \in [(i-1)T_c, iT_c)$, the last channel use took place at time $(i-1)T_c$. Assume that the last successful channel use before time $t$ was the channel use, i.e., it took place at time . The source symbol that was transmitted at this time was generated at time . Therefore, at any instant , the timestamp of the last successfully received source symbol is
[TABLE]
which means that the age of information at time is equal to
[TABLE]
Hence,
[TABLE]
where $\{x\}$ denotes the fractional part of a real number $x$.
Setting , then for we can write as
[TABLE]
So forms a Markov process represented by the Markov chain in Fig. 3.
This Markov process is ergodic and has a stationary distribution which is identical to a geometric random variable . This means that
[TABLE]
and almost surely, we have
[TABLE]
Replacing (III-B) in (22), we get
[TABLE]
where the third and fourth equalities follow from (28).
At this point, we need to distinguish between two cases:
- is irrational. In this case, we need to rewrite . Let , and for let be the index of the channel-use which was not erased. Define . Clearly, are independent and identically distributed as geometric random variables, i.e., for . It is easy to see that , where
[TABLE]
Then, by Lemma 1,
[TABLE]
Using this result in (III-B) we get (10).
- is rational with , and . Since
[TABLE]
it follows that
[TABLE]
where follows from the fact that is ergodic. Continuing,
[TABLE]
where follows from Lemma 2. Using this result in (III-B) we get (11).
∎
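The proof uses the fact that the number of channel uses since the last non-erased one forms an ergodic Markov chain with a geometric stationary law. A short simulation makes this concrete. We assume (our reading of Fig. 3) the recursion $K_{i+1} = 0$ if channel use $i+1$ is not erased and $K_{i+1} = K_i + 1$ otherwise, with i.i.d. erasures of probability $\epsilon$; the stationary probabilities are then $(1-\epsilon)\epsilon^m$ for $m \ge 0$, with mean $\epsilon/(1-\epsilon)$. The variable name $K_i$ and the helper names are ours.

```python
import random
from collections import Counter


def simulate_counter(eps: float, steps: int, seed: int = 0) -> Counter:
    """Empirical occupation counts of K_i (uses since the last non-erased use)."""
    rng = random.Random(seed)
    k, counts = 0, Counter()
    for _ in range(steps):
        # Not erased with probability 1 - eps: reset; otherwise the counter grows by one.
        k = 0 if rng.random() >= eps else k + 1
        counts[k] += 1
    return counts


def stationary_pmf(eps: float, m: int) -> float:
    """Geometric stationary probability (1 - eps) * eps**m."""
    return (1 - eps) * eps ** m


if __name__ == "__main__":
    eps, steps = 0.4, 200_000
    counts = simulate_counter(eps, steps, seed=1)
    for m in range(5):
        print(m, round(counts[m] / steps, 4), round(stationary_pmf(eps, m), 4))
    mean = sum(m * c for m, c in counts.items()) / steps
    print(mean, eps / (1 - eps))  # empirical vs. stationary mean
```

By ergodicity, these time averages converge almost surely to the stationary values, which is what allows the proof to replace time averages by expectations.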
Remark 1.
One interesting application of Theorem 2 is the computation of the average age of information for the D/M/1 system with preemption: if we have a D/M/1 queue with deterministic interarrival time of rate and exponential service time of rate , then we can model the exponentially distributed service time as the result of having an erasure channel that can be used every and whose erasure probability is . In this case, we get from (10) that the average age of information of a D/M/1 system with preemption is equal to
[TABLE]
which is consistent with the formula that was derived in [13] for D/M/1 systems with preemption.
IV Optimal Age with Different Source & Channel Alphabets
In this setup, we consider the model described in Section II: the channel is a $q$-ary erasure channel EC($\epsilon$) without feedback and with input alphabet $\mathbb{F}_q$. The source alphabet is $\mathbb{F}_q^k$, where $k$ is a positive integer. We only consider the special case where $T_s = T_c$, so that a new source symbol is generated at every channel use. By combining the techniques of Section III and this section, one might be able to obtain reasonably good lower and upper bounds on the optimal age for the more general case where $T_s$ can differ from $T_c$, but we expect the calculation to be more complicated.
Since we only consider the case where $T_s = T_c$, we can assume without loss of generality that . The channel use takes place at time , and the source-symbol is generated at time . The difference between the source alphabet and the channel-input alphabet, as well as the presence of erasures, imposes the use of channel coding on the generated source symbols before their transmission. We focus on coding schemes that are induced by linear block codes, so we assume that the source alphabet is $\mathbb{F}_q^k$, where $q$ is a power of a prime number. We fix a blocklength $n$, and each transmitted message is encoded into a block of $n$ channel-input symbols in $\mathbb{F}_q$ and then transmitted through $n$ consecutive channel uses. More precisely, the transmitted message is encoded using an $(n,k)$ linear block encoder to produce $n$ channel-input symbols in $\mathbb{F}_q$, which are transmitted through the , the , …, and the channel uses. In order to transmit messages that are as fresh as possible, the transmitted message is the last message that was generated before time , i.e., the transmitted message is the generated source symbol, where . All the source symbols that are generated between and are discarded. Fig. 4 illustrates this concept. We emphasize that the $(n,k)$-linear codes used to encode different messages can be different. We denote a coding scheme that is induced by a given sequence of $(n,k)$-linear codes as .
IV-A The Optimal Transmission Policy
Definition 4.
An $(n,k)$-linear code is called maximum distance separable (MDS) if it achieves the Singleton bound $d_{\min} \le n - k + 1$ with equality, i.e., if
$$d_{\min} = n - k + 1,$$
with $d_{\min}$ denoting the minimum distance between the codewords of the code (see [34] for more details).
Proposition 1.
If the encoder generates an MDS $(n,k)$-linear code, then
- any $k$ columns of the generator matrix of the encoder are linearly independent,
- any subset of size $k$ taken from a length-$n$ codeword is sufficient to recover, with probability 1, the transmitted message.
This means that if the channel is EC($\epsilon$), the decoder needs to observe only $k$ unerased channel-input symbols in order to perfectly decode the transmitted source symbol.
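Proposition 1 and the Singleton bound can be checked by brute force on a toy code. The sketch below assumes a prime alphabet size $p$ (so that arithmetic modulo $p$ is field arithmetic; prime powers would require extension-field arithmetic) and uses a Vandermonde, Reed-Solomon-style generator matrix, a standard MDS construction chosen by us for illustration rather than taken from the paper. All function names are hypothetical.

```python
from itertools import combinations, product


def rank_mod_p(rows, p):
    """Rank of a matrix (list of rows) over the prime field GF(p), by Gaussian elimination."""
    m = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)           # modular inverse, Python 3.8+
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def vandermonde_generator(k, points, p):
    """k x n matrix with entry points[c]**r mod p (Reed-Solomon style; points distinct mod p)."""
    return [[pow(x, r, p) for x in points] for r in range(k)]


def every_k_columns_independent(G, p):
    """First property of Proposition 1: each k x k column submatrix has rank k."""
    k, n = len(G), len(G[0])
    return all(
        rank_mod_p([[G[r][c] for c in cols] for r in range(k)], p) == k
        for cols in combinations(range(n), k)
    )


def minimum_distance(G, p):
    """Brute-force minimum Hamming weight of the nonzero codewords u^T G (tiny codes only)."""
    k, n = len(G), len(G[0])
    best = n
    for u in product(range(p), repeat=k):
        if any(u):
            word = [sum(u[r] * G[r][c] for r in range(k)) % p for c in range(n)]
            best = min(best, sum(1 for v in word if v))
    return best


if __name__ == "__main__":
    p, k, points = 7, 2, [0, 1, 2, 3, 4]         # illustrative toy parameters, n = 5
    G = vandermonde_generator(k, points, p)
    n = len(points)
    print(every_k_columns_independent(G, p), minimum_distance(G, p), n - k + 1)  # True 4 4
    bad = [[1, 1, 0], [0, 0, 1]]                  # two identical columns: not MDS
    print(every_k_columns_independent(bad, p), minimum_distance(bad, p))          # False 1
```

A nonzero polynomial of degree less than $k$ has at most $k-1$ roots, which is why evaluating it at $n$ distinct points gives weight at least $n-k+1$; the brute-force check is only practical for very small $p^k$.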
Proposition 1 is well known. We refer the reader to [34] for more details. The following theorem presents the optimal channel codes from an age point of view when the channel does not have any feedback.
Theorem 3.
Consider an EC($\epsilon$) channel without feedback and let $(n,k)$ be such that MDS $(n,k)$-linear codes exist. Among all $(n,k)$-linear codes, MDS codes are age-optimal. This means that, to achieve age optimality, all codes used in the scheme should be MDS.
Proof.
Fix two positive integers $n$ and $k$, and let and be two coding schemes such that is induced by arbitrary $(n,k)$-linear encoders and is induced by $(n,k)$-linear encoders which are MDS.
Now consider a source with alphabet generating messages and sending them through two parallel EC() channels but with the same erasure pattern . For the first channel we use the coding scheme , while for the second channel we use the coding scheme . We will show that for every , we have . We assume that the initial ages before transmission are equal, i.e., for .
For every integer and every , we have and . This is because the receiver does not receive any information during the interval . Therefore, it is sufficient to show that for every integer .
Let and assume that for every . As described in the first paragraph of this section, for every , the channel uses between time and time are used to transmit the message that was generated at time , where . Let be the respective outputs of the , the , …, and the channel uses when the code is used. Similarly, let be the respective outputs of the , the , …, and the channel uses when the code is used. For every , we have
[TABLE]
where is the minimum such that can be uniquely decoded from . If cannot be uniquely decoded from , we define . Similarly, for every , we have
[TABLE]
where is the minimum such that can be uniquely decoded from . If cannot be uniquely decoded from , we define .
Now observe that if can be uniquely decoded from , then contain at least non-erased symbols. Therefore, contain at least non-erased symbols and can be uniquely decoded from because uses MDS codes. Therefore, . We have:
- If , we have and . From the induction hypothesis we know that . Therefore, for every .
- If , we have and . Since the last decoded message by before has a timestamp that is earlier than , we have . Therefore, for every .
- For every , we have .
This implies that for every . It follows by induction that for every integer .
∎
Theorem 3 shows that, for a given pair $(n,k)$, the optimal coding scheme is the one that uses only MDS codes. However, an explicit construction of such codes is not available for all values of $(n,k)$. In the rest of this paper, we use random codes to give an upper bound on the optimal average age. Random coding was used by Shamai et al. [35] to construct fountain-like codes. The authors of [35] showed that without any randomness the notion of fountain capacity cannot be properly defined, because there is always a case where deterministic fountain codes cannot achieve any positive rate with an error probability tending to zero. Nevertheless, we use the rateless (or fountain) codes previously adopted in [30] to give a lower bound on the optimal achievable average age. As shown in [35], these codes cannot be implemented in practice, which is why we do not consider them as part of the possible coding schemes.
IV-B The Random Code
Consider an $(n,k)$ coding scheme. The encoder-decoder pair corresponding to the $j$-th message to be transmitted is constructed as follows. Since we are interested in linear codes, we use a generator matrix to create our code: we choose the $n$ columns of the $k \times n$ generator matrix $G_j$ independently and uniformly at random from the set $\mathbb{F}_q^k \setminus \{\mathbf{0}\}$, where $\mathbf{0}$ is the sequence of $k$ zeros. We denote by $\mathbf{g}_1, \ldots, \mathbf{g}_n$ the columns of $G_j$ (in this paper, we assume all vectors to be column vectors). Thus
$$G_j = \begin{bmatrix} \mathbf{g}_1 & \mathbf{g}_2 & \cdots & \mathbf{g}_n \end{bmatrix}.$$
Once this matrix is generated, it is shared between the encoder and the decoder. For each new message to be transmitted, we generate a new generator matrix. However, the encoder and decoder work in the same fashion for all messages:
- Let $\mathbf{u} \in \mathbb{F}_q^k$ be the message to be sent. Then, at the $l$-th channel use of the block ($1 \le l \le n$), we transmit the coded symbol $x_l = \mathbf{u}^{\mathsf{T}} \mathbf{g}_l$, i.e., the $l$-th element of $\mathbf{x} = \mathbf{u}^{\mathsf{T}} G_j$. For each message, we send $n$ coded symbols. Hence, the encoder is given by $\mathbf{u} \mapsto \mathbf{u}^{\mathsf{T}} G_j$.
- The decoder decodes on the fly: as soon as it has received $k$ linearly independent non-erased coded symbols, it decodes the message. Otherwise, it declares the packet to be erased.
We emphasize that the matrices $G_j$ are generated in an i.i.d. fashion, which means that the linear codes corresponding to different messages can (and are likely to) be different.
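The encoder and on-the-fly decoder described above can be sketched as follows. Assumptions (ours): $q = p$ is prime, so that integers modulo $p$ form the field; the decoder stores the received equations $[\mathbf{g}_l \mid y_l]$ in reduced row-echelon form, so it detects the $k$-th independent symbol incrementally and reads off the message once the rank reaches $k$. Function names are hypothetical.

```python
import random


def random_generator(k, n, p, rng):
    """n columns drawn i.i.d. uniformly from GF(p)^k minus the zero vector."""
    cols = []
    while len(cols) < n:
        c = [rng.randrange(p) for _ in range(k)]
        if any(c):                              # rejection sampling removes the all-zero column
            cols.append(c)
    return cols                                 # cols[l] is the l-th column g_l


def encode(u, cols, p):
    """Coded symbols x_l = u^T g_l mod p, one per channel use of the block."""
    return [sum(a * b for a, b in zip(u, g)) % p for g in cols]


def _insert(basis, row, p, k):
    """Add the equation row = [g | y] to a reduced row-echelon basis; True if independent."""
    v = [x % p for x in row]
    for piv, r in basis.items():                # clear existing pivot positions
        if v[piv]:
            f = v[piv]
            v = [(a - f * b) % p for a, b in zip(v, r)]
    piv = next((i for i in range(k) if v[i]), None)
    if piv is None:
        return False                            # g lies in the span of the received columns
    inv = pow(v[piv], -1, p)                    # modular inverse, Python 3.8+
    v = [(a * inv) % p for a in v]
    for other, r in list(basis.items()):        # keep the basis fully reduced
        if r[piv]:
            f = r[piv]
            basis[other] = [(a - f * b) % p for a, b in zip(r, v)]
    basis[piv] = v
    return True


def decode_on_the_fly(received, cols, p, k):
    """received[l] is a symbol or None (erasure). Returns (use index, message) or None."""
    basis = {}
    for l, y in enumerate(received):
        if y is not None and _insert(basis, cols[l] + [y], p, k) and len(basis) == k:
            # Each row is now e_i | u_i, so the message is the last column.
            return l + 1, [basis[i][k] for i in range(k)]
    return None                                 # fewer than k independent symbols: packet erased


if __name__ == "__main__":
    rng = random.Random(7)
    p, k, n, eps = 5, 3, 6, 0.3                 # illustrative parameters
    cols = random_generator(k, n, p, rng)
    u = [1, 4, 2]
    y = [None if rng.random() < eps else s for s in encode(u, cols, p)]
    print(decode_on_the_fly(y, cols, p, k))     # None, or (decoding index, [1, 4, 2])
```

Keeping the basis fully reduced costs a little extra work per symbol, but it makes the final back-substitution trivial.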
IV-C Average Age of Random Codes
Fix the pair $(n,k)$ and let be a random coding scheme generated as described in Section IV-B. We define to be the expected average age of the coding scheme induced by a random linear $(n,k)$-scheme generated as above.
Definition 5.
For every , and every , define
[TABLE]
where the expectation in (37) is taken over the random coding scheme , and over the randomness of the erasure patterns of the EC() channels.
Due to the ergodicity of the system, almost surely (over the randomly generated and over the random erasure patterns), we have
[TABLE]
We will formally prove (38) in Lemma 5.
The contribution of the random coding argument in this context is the following: If we show that, for a given , we have , then there must exist a linear -scheme such that almost surely (over the random erasure patterns), . In fact, as we mentioned above, for almost all -schemes and almost all erasure patterns, we have . Thus, the optimal average age , and the optimal average age among linear block codes , satisfy
[TABLE]
Therefore,
[TABLE]
Equation (40) gives an upper bound on the optimal average age. In the rest of this paper we will focus on characterizing this bound.
IV-D Exact Upper Bound on the Optimal Average Age
IV-D1 Preliminaries
Let be a randomly generated -scheme. Fig. 5 illustrates the variation of the instantaneous age when and . Without loss of generality, we assume that we begin observing right after the reception of a successful packet. We denote by the generation time of the successful packet and by the end of transmission time of this packet. Assume that the successful message is the transmitted message. We have:
- , where .
- .
Therefore, the instantaneous age at the end of transmission of the successful packet is
[TABLE]
In the scenario depicted in Fig. 5, we assume that . The first packet is generated and encoded into a codeword of length at time . At that same instant, , the first symbol of , is sent and received at the monitor. Since it is the first symbol, is linearly independent (by this we just mean that the corresponding column of the generator matrix forms a linearly independent family of vectors, which is true simply because the columns of the generator matrix are nonzero). At time , the coded symbol is erased, but the coded symbol , which is linearly independent from (in the particular example illustrated in Fig. 5, the random matrix was such that is linearly independent from ), is received at time . The fourth coded symbol is also erased and the last coded symbol is received. However, as Fig. 5 shows, the received symbol is linearly dependent on the previously received symbols, namely and , i.e., is linearly dependent on . The first packet is declared erased by the decoder because it did not receive $k$ linearly independent symbols, and increases linearly in the interval . The packet generated at is a successful update since the monitor receives $k$ linearly independent symbols at times , and . Therefore, drops to at time . Note that, for a given successful packet, once $k$ linearly independent coded symbols are received, any additional coded symbol must be linearly dependent on them.
In this section we use the following notation:
- is the interdeparture time between the and successfully received updates.
- is the number of channel uses between the decoding instant of the successful packet and its generation time .
- is the number of successfully received updates in the interval .
- Consider a transmitted packet (not necessarily successful). Imagine that we generate infinitely many vectors independently and uniformly in $\mathbb{F}_q^k \setminus \{\mathbf{0}\}$, and that, for every $l \ge 1$, we transmit the coded symbol corresponding to the $l$-th vector over an EC($\epsilon$) channel to a virtual monitor. In reality, we only transmit the first $n$ coded symbols to the real monitor; in other words, the first $n$ symbols are really transmitted and the rest are virtually transmitted. Let $N$ be the number of channel uses (or sent coded symbols) needed for the virtual monitor to receive exactly $k$ linearly independent equations (coded symbols). The packet is correctly decoded at the real monitor if and only if $N \le n$.
Since the channel is memoryless and the different codes used in the scheme are generated independently and in the same fashion, the values of $N$ associated with successive packets form an i.i.d. process, with a distribution identical to that of the random variable described in the following subsection.
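Putting these pieces together, the sketch below estimates by Monte Carlo the average age of a randomly generated $(n,k)$ scheme, a numerical counterpart of the expected average age studied in this section. Assumptions (ours, for illustration): $q = p$ is prime, $T_c = 1$, the $j$-th block occupies channel uses $jn, \ldots, jn+n-1$ and carries a message generated at time $jn$, channel symbols arrive instantly, and a packet is decoded at time $jn + N - 1$ whenever its (virtually extended) count $N$ of channel uses needed to collect $k$ independent equations satisfies $N \le n$. The exact timing convention of Fig. 5 may differ by constant offsets; all names are hypothetical.

```python
import random


def _add_if_independent(basis, vec, p):
    """Reduced row-echelon insertion over GF(p); True iff vec is outside the current span."""
    v = [x % p for x in vec]
    for piv, row in basis.items():
        if v[piv]:
            f = v[piv]
            v = [(a - f * b) % p for a, b in zip(v, row)]
    piv = next((i for i, x in enumerate(v) if x), None)
    if piv is None:
        return False
    inv = pow(v[piv], -1, p)
    v = [(a * inv) % p for a in v]
    for other, row in list(basis.items()):
        if row[piv]:
            f = row[piv]
            basis[other] = [(a - f * b) % p for a, b in zip(row, v)]
    basis[piv] = v
    return True


def sample_N(k, p, eps, rng):
    """Channel uses (real + virtual) until k linearly independent equations are received."""
    basis, uses = {}, 0
    while len(basis) < k:
        uses += 1
        col = [0] * k
        while not any(col):                   # uniform over nonzero vectors of GF(p)^k
            col = [rng.randrange(p) for _ in range(k)]
        if rng.random() >= eps:               # symbol not erased
            _add_if_independent(basis, col, p)
    return uses


def random_code_average_age(k, n, p, eps, num_blocks, seed=0):
    """Return (time-average age, fraction of decoded packets) under the stated timing assumptions."""
    if not 0 <= eps < 1:
        raise ValueError("need 0 <= eps < 1")
    rng = random.Random(seed)
    deliveries = []                           # (decoding time, generation time)
    for j in range(num_blocks):
        N = sample_N(k, p, eps, rng)
        if N <= n:                            # decoded within the real block
            deliveries.append((j * n + N - 1, j * n))
    if len(deliveries) < 2:
        raise RuntimeError("too few decoded packets; increase num_blocks")
    area = 0.0
    for (d0, g0), (d1, _) in zip(deliveries, deliveries[1:]):
        area += (d1 - d0) * ((d0 + d1) / 2 - g0)   # age grows with slope 1 from d0 - g0
    return area / (deliveries[-1][0] - deliveries[0][0]), len(deliveries) / num_blocks


if __name__ == "__main__":
    for n in (3, 4, 6):
        age, success = random_code_average_age(k=3, n=n, p=2, eps=0.2, num_blocks=20_000)
        print(n, round(age, 3), round(success, 3))
```

Sweeping $n$ in this way shows the trade-off behind the choice of blocklength: a larger $n$ makes decoding more likely but spaces out fresh updates.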
IV-D2 The Distribution of $N$
Fig. 6 shows the Markov chain that represents the dimension, at the (virtual) receiver, of the codeword relative to a certain update.
The monitor receives the first coded symbol of a new codeword with probability $1-\epsilon$, where $\epsilon$ is the erasure probability, and hence the dimension of this codeword at the receiver jumps to 1. If the first coded symbol is erased, then the dimension of the codeword remains at 0. If the monitor has already received $i$ linearly independent coded symbols, then it will receive the $(i+1)$-th linearly independent coded symbol if:
- (i) the next transmitted coded symbol is not erased, and
- (ii) the next transmitted coded symbol is linearly independent of all previously received symbols.
Event (i) occurs with probability . For event (ii), notice that the symbols that are linearly dependent on the received symbols form a subspace of dimension , hence there are such symbols. [Footnote 7: Recall that linearly independent coded symbols have been received.] Therefore, the number of nonzero symbols that are linearly dependent on the received symbols is . Now, since coded symbols are generated uniformly at random from the set of nonzero symbols, event (ii) happens with probability . Hence, for a given message, the dimension of its codeword at the receiver jumps from to with probability
[TABLE]
where . If the next transmitted coded symbol is erased or linearly dependent on the previously received coded symbols, then the dimension of the codeword at the monitor remains at . As previously discussed, once the monitor receives linearly independent coded symbols, the dimension of the codeword remains at and all subsequent coded symbols are linearly dependent on the previously non-erased coded symbols.
From the above description, we can deduce that is the number of steps before reaching state for the first time.
Remark 2.
Since , it follows that is a decreasing function of . This means that whenever the decoder receives a non-erased coded symbol that is linearly independent of all previously received coded symbols, and the system jumps to state , it becomes harder to receive a new linearly independent coded symbol. This is why, on average, the system spends more time in state than in the previous states.
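The jump probability derived above is the product of the non-erasure probability and the fraction of nonzero symbols that lie outside an i-dimensional subspace. A minimal sketch, with hypothetical parameter names `q` (channel-input alphabet size), `k` (number of linearly independent symbols needed) and `eps` (erasure probability), that evaluates these probabilities and checks the monotonicity stated in Remark 2:

```python
def transition_probs(q: int, k: int, eps: float) -> list[float]:
    """Jump probability from dimension i to i + 1, for i = 0, ..., k - 1.

    (1 - eps) is the probability that the coded symbol is not erased, and
    (q**k - q**i) / (q**k - 1) is the probability that a uniformly chosen
    nonzero symbol avoids the i-dimensional span of the symbols already
    received (that span contains q**i - 1 nonzero vectors).
    """
    nonzero = q**k - 1
    return [(1 - eps) * (q**k - q**i) / nonzero for i in range(k)]


probs = transition_probs(q=2, k=3, eps=0.3)
# Remark 2: each new independent symbol makes the next one harder to obtain.
is_decreasing = all(a > b for a, b in zip(probs, probs[1:]))
```

Here `probs` is approximately [0.7, 0.6, 0.4]. The first entry always equals 1 - eps, because the first non-erased symbol is necessarily independent.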
Definition 6.
Let be the number of trials needed to pass from state to state in Fig. 6, where . It is easy to see that has a geometric distribution with success probability . Thus,
[TABLE]
Corollary 1.
From Definition 6, we can write
[TABLE]
where are independent.
Lemma 3.
The moment generating function of the random variable is
[TABLE]
Proof.
[TABLE]
where the second equality follows from the fact that are mutually independent. Replacing by its expression , we obtain (45). ∎
Corollary 2.
The expected value of is
[TABLE]
Proof.
Using (44), we get
[TABLE]
We can also get (47) by using (45) and the fact that . ∎
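Corollary 2 can be cross-checked against Lemma 3 numerically. The sketch below assumes, as in Definition 6, that each T_i is geometric on {1, 2, ...}, so that the moment generating function of N is the product of the geometric ones (our reading of the independence argument behind (45)); differentiating it at 0 by a central finite difference should reproduce the sum of the means 1/p_i. The function names and the test parameters (q = 2, k = 3, eps = 0.3) are illustrative assumptions.

```python
import math

def transition_probs(q: int, k: int, eps: float) -> list[float]:
    nonzero = q**k - 1
    return [(1 - eps) * (q**k - q**i) / nonzero for i in range(k)]


def mgf_N(t: float, probs: list[float]) -> float:
    """E[exp(t N)] for N = T_0 + ... + T_{k-1}, independent T_i ~ Geom(p_i) on {1, 2, ...}.

    Each factor is finite for t < -log(1 - p_i).
    """
    value = 1.0
    for p in probs:
        value *= p * math.exp(t) / (1 - (1 - p) * math.exp(t))
    return value


def expected_N(probs: list[float]) -> float:
    """Sum of the geometric means 1 / p_i."""
    return sum(1.0 / p for p in probs)


probs = transition_probs(2, 3, 0.3)
h = 1e-5
mgf_slope = (mgf_N(h, probs) - mgf_N(-h, probs)) / (2 * h)  # central difference for M'(0)
```

The finite-difference slope and `expected_N(probs)` should agree to many decimal places, which is the statement that E[N] equals the derivative of the moment generating function at 0.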
IV-D3 Packet Erasure Probability
The packet is correctly received if . Otherwise, we declare the packet to be lost. Therefore, the packet erasure probability is equal to
[TABLE]
where the distribution of is given by Lemma 3. We call the packet success probability.
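Because N is the hitting time of state k in the chain of Fig. 6, its distribution, and hence the packet erasure probability, can be computed by propagating the state probabilities one channel use at a time instead of inverting the moment generating function of Lemma 3. A minimal sketch, with hypothetical function names and the transition probabilities derived in Section IV-D2:

```python
def transition_probs(q: int, k: int, eps: float) -> list[float]:
    nonzero = q**k - 1
    return [(1 - eps) * (q**k - q**i) / nonzero for i in range(k)]


def pmf_N(q: int, k: int, eps: float, t_max: int) -> list[float]:
    """pmf[t] = P(N = t) for t = 0, ..., t_max, via the Markov chain of Fig. 6."""
    probs = transition_probs(q, k, eps)
    state = [1.0] + [0.0] * k  # state[i] = P(dimension i and not yet absorbed)
    pmf = [0.0] * (t_max + 1)
    for t in range(1, t_max + 1):
        nxt = [0.0] * (k + 1)
        for i in range(k):
            nxt[i] += state[i] * (1 - probs[i])  # erased or dependent: stay
            nxt[i + 1] += state[i] * probs[i]    # new independent symbol
        pmf[t] = nxt[k]  # first arrival at dimension k happens at use t
        nxt[k] = 0.0     # absorbed mass is now accounted for in pmf
        state = nxt
    return pmf


def packet_erasure_prob(q: int, k: int, eps: float, n: int) -> float:
    """P_e = P(N > n): fewer than k independent symbols within n channel uses."""
    return 1.0 - sum(pmf_N(q, k, eps, n))
```

For instance, `packet_erasure_prob(2, 3, 0.3, 3)` is 1 - 0.7 * 0.6 * 0.4 = 0.832 up to floating-point rounding, since with n equal to k every channel use must deliver a new independent symbol.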
IV-D4 The Age Analysis
Definition 7.
In every interdeparture interval , we denote by the number of erased packets before the reception of a successful update. is geometric with success probability , so
[TABLE]
We use Definition 7 to characterize the interdeparture interval. Indeed, any interdeparture interval is the sum of two components: the time spent sending unsuccessful packets, followed by the service time of the successful update. Since each transmitted packet takes channel uses and , the interdeparture time can be written as
[TABLE]
Because we assume a memoryless erasure channel and independently generated packets, are independent and identically distributed. Since the interdeparture interval is a function of , are also independent and identically distributed. Hence the following lemma:
Lemma 4.
The process is a renewal process with the interdeparture times being the renewal intervals.
The importance of Lemma 4 stems from the fact that it shows that exists and the system is ergodic.
Lemma 5.
Almost surely (over the random choice of the -scheme , and over the random erasure patterns of the EC() channels), we have
[TABLE]
where is a generic random variable that has the same distribution as , which is represented by the shaded areas in Fig. 5, and is a generic random variable that has the same distribution as the interdeparture interval .
Proof.
By Lemma 4, forms a renewal process and hence by [36] we know that . By defining to be the reward function over the renewal period , we get (using renewal reward theory [37, 36]) that almost surely
[TABLE]
∎
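Lemma 5 is an instance of the renewal-reward theorem: the time-average age equals the expected area per renewal interval divided by the expected interval length. The sketch below illustrates only that principle on a simplified, hypothetical sawtooth, not on the two-trapezoid area of Fig. 5: the age right after each departure is an independent draw d from {1, 2, 3}, and the time to the next departure is geometric with success probability 0.4, so each interval contributes the trapezoid area d*y + y^2/2. All names and laws are assumptions made for illustration.

```python
import math
import random

def sample_geometric(p: float, rng: random.Random) -> int:
    """Geometric on {1, 2, ...} with success probability p, by inversion."""
    u = 1.0 - rng.random()  # u in (0, 1]
    return 1 + int(math.log(u) / math.log(1.0 - p))


rng = random.Random(7)
p_y = 0.4     # hypothetical law of the interdeparture time Y
area = 0.0    # accumulated rewards Q_j (area under the age curve)
elapsed = 0   # accumulated renewal intervals Y_j
for _ in range(200_000):
    d = rng.choice([1, 2, 3])        # hypothetical age right after a departure
    y = sample_geometric(p_y, rng)   # time until the next departure
    area += d * y + y * y / 2        # trapezoid: age grows from d to d + y
    elapsed += y
time_average_age = area / elapsed

# Renewal-reward prediction E[Q] / E[Y], with E[D] = 2, E[Y] = 1/p, E[Y^2] = (2 - p)/p^2.
e_y = 1 / p_y
e_y2 = (2 - p_y) / p_y**2
predicted_age = (2 * e_y + e_y2 / 2) / e_y
```

With these hypothetical laws, E[Q]/E[Y] = (2 * 2.5 + 10/2)/2.5 = 4, and the simulated `time_average_age` should be close to that value.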
Before computing the average age, we still need one more lemma that gives the distribution of the random variables .
Lemma 6.
Let be a generic random variable that has the same distribution as the number of channel uses between the decoding instant of the successful packet and its generation time . Then,
[TABLE]
where is the indicator function.
Proof.
A packet is successfully decoded if the decoder receives exactly linearly independent coded symbols after at most channel uses. Thus, for the successful packet we have that
[TABLE]
∎
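Lemma 6 says that the decoding delay of a successful packet is N conditioned on the event that N is at most n. A minimal sketch computing that conditional law from the hitting-time distribution of the chain in Fig. 6; the function names and parameters are hypothetical:

```python
def transition_probs(q: int, k: int, eps: float) -> list[float]:
    nonzero = q**k - 1
    return [(1 - eps) * (q**k - q**i) / nonzero for i in range(k)]


def pmf_N(q: int, k: int, eps: float, t_max: int) -> list[float]:
    """P(N = t) for t = 0, ..., t_max, from the chain of Fig. 6."""
    probs = transition_probs(q, k, eps)
    state = [1.0] + [0.0] * k
    pmf = [0.0] * (t_max + 1)
    for t in range(1, t_max + 1):
        nxt = [0.0] * (k + 1)
        for i in range(k):
            nxt[i] += state[i] * (1 - probs[i])
            nxt[i + 1] += state[i] * probs[i]
        pmf[t] = nxt[k]
        nxt[k] = 0.0
        state = nxt
    return pmf


def pmf_D(q: int, k: int, eps: float, n: int) -> list[float]:
    """P(D = d) = P(N = d) 1{d <= n} / P(N <= n), for d = 0, ..., n."""
    pmf = pmf_N(q, k, eps, n)  # only d <= n is kept, which builds in the indicator
    success = sum(pmf)         # packet success probability 1 - P_e
    return [x / success for x in pmf]


mean_delay = sum(d * w for d, w in enumerate(pmf_D(2, 3, 0.3, 6)))
```

For k = 1 the conditional law is a truncated geometric, which gives a simple sanity check of the normalization.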
We are now ready to give the main theorem of this section.
Theorem 4.
Assume an EC() and an -coding scheme as defined in Section IV-B. Almost surely, the average age corresponding to such a setup is given by
[TABLE]
where is the packet erasure probability given by (49).
Proof.
From (52), we know that we need to compute and . We start with . We have shown that for every , . Thus,
[TABLE]
where the third equality is due to the fact that has a geometric distribution with success probability as seen in Definition 7.
Now we turn to . For every , the shaded area shown in Fig. 5 is the sum of the areas of two trapezoids: a large trapezoid with height and a smaller one with height . Recall from (41) that the instantaneous age at the end of transmission of the successful packet is . Thus,
[TABLE]
Note that and are independent. Therefore,
[TABLE]
Replacing and in (52) by their expressions in (57) and (59), we obtain (56). ∎
In the expression of in (56), and cannot be easily expressed in terms of , and . This is why we study in the next two subsections by presenting upper and lower bounds on the expression in (56).
IV-E Bounding
As we mentioned in the previous paragraph, the expression of is not easy to calculate. This is mainly because the distribution of the random variable is complicated. In this section, we provide upper and lower bounds on which are computed using random variables that have simpler distributions compared to .
Definition 8.
We define to be the sum of i.i.d. random variables distributed like . We also define to be the sum of i.i.d. random variables distributed like . Formally,
[TABLE]
where is geometrically distributed with success probability and is also geometrically distributed with success probability .
Lemma 7.
The random variables and defined in Definition 8 are both negative binomial random variables with
[TABLE]
and
[TABLE]
where
Proof.
is the sum of i.i.d. geometric random variables with success probability . Similarly, is the sum of i.i.d. geometric random variables with success probability . ∎
We will show that the random variables and can be coupled with the random variable in such a way that with probability 1.
Lemma 8.
Let , and let and be as in Definition 8. It is possible to couple , and in such a way that with probability 1. More precisely, we can define three random variables , and on the same probability space such that:
- , and have the same distributions as , and , respectively, i.e., for every , we have , and .
- with probability 1.
Proof.
The proof can be found in Section -D. ∎
Corollary 3.
Given and and as defined in Definition 8, the following relations hold for :
1. ,
2. ,
3. ,
4. .
Proof.
Let and be as in Lemma 8. Since with probability 1, we deduce that the event is a subset of the event . Hence,
[TABLE]
This inequality also implies that . Furthermore, since with probability 1, we have
[TABLE]
On the other hand, since with probability 1, we deduce that the event is a subset of the event . Hence,
[TABLE]
This inequality also implies that . Furthermore, since with probability 1, we have
[TABLE]
∎
Corollary 3 can be interpreted as follows: can be seen as the number of channel uses needed to receive exactly linearly independent coded symbols when any coded symbols are linearly independent. In other words, corresponds to the number of channel uses needed to decode a packet when the encoders of the -scheme use only MDS codes, so it equals the number of channel uses needed to receive exactly non-erased coded symbols. Intuitively, receiving non-erased coded symbols should take fewer channel uses than receiving linearly independent coded symbols, which explains items and in Corollary 3. At the opposite end of the spectrum, can be seen as a worst-case scenario, since the jump from state to state in Fig. 6 occurs with the smallest possible probability, namely . This suggests that can be upper bounded by the average age of a coding system with as the number of channel uses needed to receive exactly linearly independent coded symbols, and lower bounded by the average age achieved using only MDS codes, with as the number of channel uses needed to receive linearly independent coded symbols.
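Corollary 3 can be checked numerically by comparing the exact tail of N with the tails of its two negative binomial surrogates from Lemma 7. The sketch assumes, consistently with the interpretation just given, that the lower surrogate uses the success probability p_0 = 1 - eps of the first jump (the MDS case) and the upper one uses the smallest jump probability p_{k-1}; function names and parameters are illustrative.

```python
from math import comb

def transition_probs(q: int, k: int, eps: float) -> list[float]:
    nonzero = q**k - 1
    return [(1 - eps) * (q**k - q**i) / nonzero for i in range(k)]


def pmf_N(q: int, k: int, eps: float, t_max: int) -> list[float]:
    """P(N = t) for t = 0, ..., t_max, from the chain of Fig. 6."""
    probs = transition_probs(q, k, eps)
    state = [1.0] + [0.0] * k
    pmf = [0.0] * (t_max + 1)
    for t in range(1, t_max + 1):
        nxt = [0.0] * (k + 1)
        for i in range(k):
            nxt[i] += state[i] * (1 - probs[i])
            nxt[i + 1] += state[i] * probs[i]
        pmf[t] = nxt[k]
        nxt[k] = 0.0
        state = nxt
    return pmf


def neg_binomial_pmf(k: int, p: float, t: int) -> float:
    """P(T = t) for T the sum of k i.i.d. Geom(p) variables on {1, 2, ...}."""
    if t < k:
        return 0.0
    return comb(t - 1, k - 1) * p**k * (1 - p) ** (t - k)


def tails(q: int, k: int, eps: float, n: int) -> tuple[float, float, float]:
    """Return P(lower surrogate > n), P(N > n) and P(upper surrogate > n)."""
    probs = transition_probs(q, k, eps)
    p_low, p_up = probs[0], probs[-1]  # p_0 = 1 - eps and the smallest jump p_{k-1}
    tail_low = 1 - sum(neg_binomial_pmf(k, p_low, t) for t in range(n + 1))
    tail_exact = 1 - sum(pmf_N(q, k, eps, n))
    tail_up = 1 - sum(neg_binomial_pmf(k, p_up, t) for t in range(n + 1))
    return tail_low, tail_exact, tail_up
```

For example, with q = 2, k = 3, eps = 0.3 and n = 3, the three tails are 1 - 0.7^3, 1 - 0.7 * 0.6 * 0.4 and 1 - 0.4^3, that is, about 0.657, 0.832 and 0.936.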
By applying Lemma 8, we can define a sequence of independent and identically distributed triplets such that for every , we have:
- with probability 1.
- and are distributed as and , respectively.
We will use to describe the age of information of the system as we explained at the beginning of Section IV-D. More precisely, for , we have
[TABLE]
where is the number of the packet that is being transmitted at time . Note that (67) can be shown exactly as (34).
We now define two virtual ages, that we denote as and . These virtual ages are initially equal to the real age , but instead of using , the evolution of and will be governed by and , respectively. More precisely,
[TABLE]
and
[TABLE]
As in the proof of Theorem 3, since for every , we can show by induction on that for every . Therefore,
[TABLE]
where
[TABLE]
and
[TABLE]
IV-E1 Upper Bound on
From (70) we know that .
Since was defined in the same way as but using instead of , satisfies an equation similar to (56), with the terms defined using instead of . More precisely, using the same techniques as in the proof of Theorem 4, we can show that, almost surely,
[TABLE]
where the distribution of is given by
[TABLE]
and
[TABLE]
From Lemma 7, we know that is a negative binomial random variable. Hence,
[TABLE]
Let , where are i.i.d. with a marginal distribution identical to . Hence is also negative binomial and
[TABLE]
We use the same trick as in [31] and set in (IV-E1). This leads to
[TABLE]
where
.
Using this result, together with (73) and (75), we get
[TABLE]
where the second equality is obtained by using
[TABLE]
We denote by the upper bound we just found. Thus,
[TABLE]
IV-E2 Lower Bound on
Let , where are i.i.d. with a marginal distribution identical to . Hence is also negative binomial and
[TABLE]
From (70), we know that . Using an argument identical to that used for the computation of the upper bound we show that , where
[TABLE]
Remark 3.
The lower bound found here is similar to the average age derived in [31] for the finite redundancy (FR) case. However, the time scale is different: Yates et al. [31] assume that the source generates a new update at the same instant it finishes transmitting the previous one, whereas in our case, when , we assume that we generate and begin transmitting a new packet seconds after the last update finishes transmission.
IV-F Age-Optimal Codes
We have already discussed that the lower bound on , , corresponds to the average age when the -scheme uses only MDS codes with as the number of channel uses needed to receive linearly independent coded symbols. Recall from Theorem 3 that, for a given couple , using an MDS code is optimal. This observation gives another explanation of why the expression found in (83) is a lower bound on the average age of a scheme using any code other than an MDS code, in particular a randomly generated one. Hence the lower bound is universal over all codes, and the optimal achievable age
[TABLE]
where is a random -scheme, and is the optimal average age over coding schemes that are induced by linear block codes. However, for a given , an explicit construction of an MDS code is not always available. In this section, we show that if the channel-input alphabet is large enough, then random codes are (almost) age-optimal among linear block codes.
Theorem 5.
Fix a couple . For every , there exists such that, for every , a random -coding scheme almost surely satisfies
[TABLE]
This means that for a channel-input alphabet large enough ( large), random codes are (almost) age-optimal among linear block codes and
[TABLE]
where is a random -coding scheme, and the dot above the equal sign refers to the fact that the difference between the two sides approaches zero as gets large.
Proof.
For a given random code , recall that
[TABLE]
From (54) and (49), we notice that and both depend only on the distribution of . However, for every ,
[TABLE]
This means that, for every , converges in distribution to as . Therefore, converges in distribution to , as . Hence, as , converges to . So, for large enough, we can write
[TABLE]
From (40), we know that the optimal age among linear block codes, for a given , is . For large enough , we have . This means that asymptotically, . However, from (84), we have that for every . Therefore, asymptotically
[TABLE]
∎
Notice that for very large , it is extremely unlikely that a (randomly generated) coded symbol is linearly dependent on any subset of size of the remaining coded symbols. This means that as becomes large, the behavior of random codes approaches that of MDS codes. This is essentially why Theorem 5 holds.
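The intuition behind Theorem 5 can be seen on the expected number of channel uses: as q grows, every jump probability approaches 1 - eps, so E[N] approaches k/(1 - eps), its value when every non-erased symbol is useful, as for an MDS code. A minimal sketch with hypothetical parameters k = 3 and eps = 0.3, restricted to prime-power values of q:

```python
def transition_probs(q: int, k: int, eps: float) -> list[float]:
    nonzero = q**k - 1
    return [(1 - eps) * (q**k - q**i) / nonzero for i in range(k)]


def expected_N(q: int, k: int, eps: float) -> float:
    """E[N] as the sum of 1 / p_i (Corollary 2)."""
    return sum(1.0 / p for p in transition_probs(q, k, eps))


k, eps = 3, 0.3
mds_mean = k / (1 - eps)  # every non-erased symbol is useful, as for an MDS code
gaps = {q: expected_N(q, k, eps) - mds_mean for q in (2, 3, 4, 8, 16, 256, 2**20)}
```

The gap shrinks roughly like 1/q for large q: for q = 2 it is about 1.31, while for q = 2^20 it is of order 10^-6.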
IV-G Other Bounds and Approximations
IV-G1 Upper Bounding the Lower Bound
In Remark 3, we discussed how the lower bound found in (83) is similar, up to a time scale difference, to the average age computed by Yates et al. in [31, Section 3]. In that paper, the authors present a tight upper bound on the computed average age. We borrow the same techniques as in [31, Section 3.A] to upper bound . Interestingly, the simulations in Section IV-H show that the upper bound on is a tight approximation to , the average age achieved when using a random -scheme .
Recall that
[TABLE]
Denote by . From [31, Lemma 1], we know that . Hence,
[TABLE]
We denote by this approximation. Thus,
[TABLE]
Remark 4.
We can apply the techniques discussed in [31, Section 3.A] to approximate the optimal codeword length for and write solely as a function of , , and the size of the channel-input alphabet.
IV-G2 Another Upper Bound on
We derive here a second upper bound on which is easier to compute than . First recall from Theorem 4 that
[TABLE]
However,
[TABLE]
Hence,
[TABLE]
Whereas (given in (47)) is easy to compute,
[TABLE]
is hard to compute due to the complex nature of the distribution of (given in Lemma 3). To solve this problem, we use as defined in Definition 8 to upper bound . Indeed, from Corollary 3 we know that
[TABLE]
Hence,
[TABLE]
Therefore, using (47), the new upper bound is
[TABLE]
IV-H Numerical Results
Fig. 7a and Fig. 8a correspond to a system with , , , and using a random -coding scheme . Fig. 7a plots as well as the bounds and the approximation derived in Sections IV-E and IV-G with respect to the blocklength , for four erasure channels with erasure probabilities . The tightness of the bounds with respect to differs according to the erasure probability:
- •
For all erasure probabilities, we notice that the upper bound (the orange curve) is very tight (almost equal to ) at large enough . However, the blocklength beyond which becomes tight depends on : the larger the erasure probability, the larger the blocklength . For instance, for we have . But for , and for we have . For , the upper bound is tighter than all other bounds. Notice that for every and every , .
- •
For the approximation , we notice that it becomes tighter as the erasure probability becomes larger. This is especially true at low values of , particularly for . For this range of blocklength values, the approximation is extremely close to .
- •
For every and every , we have and . We notice that, for all values of , is close to at large . In contrast, for small values of , this lower bound shows no noticeable change as increases.
- •
The upper bound is always larger than . Even though at we observe that , for the upper bound is closer to than . In fact, as increases, the gap between the two upper bounds also increases.
Fig. 7a also suggests that there exists, for each erasure probability, an optimal blocklength that minimizes . This echoes the observations presented in [30] and in [31]. Moreover, each bound also has its own optimal blocklength. Although the channel-input alphabet chosen is small ( and ), we remark that the gap between and the lower bound is not large, irrespective of the value of . This means that even for small channel-input alphabets, the performance of the optimal linear code is not too far from the performance achieved by random coding. This idea is illustrated in Fig. 8a, where we find and plot, at each value of , the minimum (with respect to ) of and . We observe that these two minima are close to each other. Since , Fig. 8a suggests that, for every , if we use the optimal blocklength, then random codes achieve an age performance close to that of the optimal linear code.
Fig. 7b and Fig. 8b mirror Fig. 7a and Fig. 8a respectively, but for a larger channel-input alphabet with . We can apply the same analysis as the one we just presented for the case . In this case we can notice the effect of increasing the size of the channel-input alphabet, while keeping constant. In fact, comparing Fig. 7a and Fig. 7b, we observe a clear convergence of toward the lower bound . In Fig. 7b, the approximation is not as tight as for the case of , for all and . Indeed, we can notice that, for , is worse than for . For large , all bounds are tight except for the upper bound . In fact, in Fig. 7b, the lower bound is the tightest bound on . However, the convergence of toward the lower bound is best observed in Fig. 8b. In this figure, we remark that the performance of the random code with the optimal blocklength is almost optimal. These simulations support our claim that random codes are age-optimal as grows and the channel-input alphabet becomes large.
V Conclusion
In this paper, we have studied the optimal achievable average age over an erasure channel in two scenarios: when the source alphabet and the channel-input alphabet are the same, and when they are different. We have demonstrated that in the first case, no channel coding is needed to achieve the minimal average age, for which we have computed the exact expression. In the second case, we have used a random coding argument to compute bounds on the optimal achievable age. We have also shown that for a large enough source alphabet, random codes are (almost) age-optimal among linear block codes. Finally, the numerical results point to an interesting observation: even for a small source alphabet, the performance of random codes is not too far from optimal from an age point of view.
Acknowledgements
We would like to thank Roy Yates and an anonymous reviewer for helpful comments. This research was supported in part by grant No. 200021_166106/1 of the Swiss National Science Foundation.
-A Equidistribution and Weyl’s Equidistribution Theorem
In this section, for every real number , we use to denote its fractional part, i.e., . [Footnote 8: The material in this section is based on [33, 38].]
Definition 9.
A sequence is said to be equidistributed on if for every interval we have
[TABLE]
where denotes the cardinality of the set .
Remark 5.
In Definition 9, we can replace with , or in (101) and the limit still holds.
Theorem 6.
Let be a sequence of real numbers and denote by the fractional part of . Then the following are equivalent:
1. The sequence is equidistributed on .
2. For every ,
[TABLE]
where .
3. For every Riemann-integrable function , we have
[TABLE]
The proof of Theorem 6 is beyond the scope of this paper; we refer the reader to [38] for the full proof. An important application of this theorem is given next.
Corollary 4.
If is a sequence that is equidistributed over , then we have
[TABLE]
Proof.
From the third condition of Theorem 6, we have
[TABLE]
∎
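Definition 9 and the second criterion of Theorem 6 (which we read as Weyl's criterion, an exponential sum in the missing display) can be illustrated with x_n = n*alpha for an irrational alpha. The sketch below, with hypothetical function names, counts how often the fractional parts fall in an interval and evaluates the normalized exponential sum; for a rational alpha the sum need not vanish, as the last line shows for alpha = 1/2 and h = 2.

```python
import cmath
import math

def fraction_in_interval(alpha: float, a: float, b: float, n_terms: int) -> float:
    """Share of the fractional parts of n * alpha, n = 1..n_terms, lying in [a, b)."""
    hits = sum(1 for n in range(1, n_terms + 1) if a <= (n * alpha) % 1.0 < b)
    return hits / n_terms


def weyl_average(alpha: float, h: int, n_terms: int) -> float:
    """|(1/N) * sum_n exp(2 pi i h n alpha)| for N = n_terms."""
    total = sum(cmath.exp(2j * math.pi * h * n * alpha) for n in range(1, n_terms + 1))
    return abs(total) / n_terms


irrational = math.sqrt(2)
frac_hits = fraction_in_interval(irrational, 0.2, 0.5, 100_000)  # close to 0.3 = b - a
weyl_irrational = weyl_average(irrational, 1, 100_000)           # close to 0
weyl_rational = weyl_average(0.5, 2, 1_000)                      # every term equals 1
```

The two behaviors match the equivalence in Theorem 6: the irrational sequence fills [0.2, 0.5) in proportion to its length and its Weyl average vanishes, while the rational one fails the criterion.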
-B A variation of the strong law of large numbers
In this section, we prove a well-known variation of the strong law of large numbers.
Lemma 9.
If is a sequence of complex-valued random variables satisfying , then almost surely, we have .
Proof.
Observe that . Therefore, which can be rewritten as . This implies that . We conclude that almost surely, we have . ∎
Proposition 2.
Let be a sequence of complex-valued random variables. If there exists such that
[TABLE]
then almost surely
[TABLE]
Proof.
Let and for every , let
[TABLE]
For every , we have
[TABLE]
In particular, for every , we have
[TABLE]
Therefore,
[TABLE]
Lemma 9 now implies that
[TABLE]
Now, for every , define
[TABLE]
We have
[TABLE]
where follows from (109). Thus,
[TABLE]
It follows from Lemma 9 that
[TABLE]
Now observe that for every , we have
[TABLE]
Equations (112) and (116) now imply that
[TABLE]
∎
Corollary 5.
Let be a sequence of complex-valued random variables. If there exist and such that for every we have
[TABLE]
then almost surely
[TABLE]
Proof.
This is a direct corollary of Proposition 2. ∎
-C Proof of Lemma 1
Let be a sequence of independent and identically distributed random variables which take values in the set of strictly positive natural numbers and which satisfy . Let and for . For every , let
[TABLE]
and
[TABLE]
Clearly, we have . Furthermore, since for every , we have with probability 1.
Let be an irrational number, and let be an arbitrary real number. From Corollary 4 and the second criterion of Theorem 6, we know that in order to show that
[TABLE]
it is sufficient to show that
[TABLE]
Now fix . For every , we have
[TABLE]
We would like to show that almost surely . First, observe that
[TABLE]
It follows from Lemma 9 that almost surely .
It is easy to see that as , we have and . Now since with probability 1, we have
[TABLE]
Furthermore, since , and since we have just shown that , it follows that
[TABLE]
Now observe that
[TABLE]
which implies that
[TABLE]
From (125), (128) and (130), we conclude that it is sufficient to show that
[TABLE]
Notice that
[TABLE]
Now since as , the strong law of large numbers implies that
[TABLE]
Therefore, it is sufficient to show that
[TABLE]
where
[TABLE]
For every , we have
[TABLE]
Furthermore, for every , we have
[TABLE]
Thus,
[TABLE]
Now since is nondeterministic and takes values in , there are two different integers such that and . We have
[TABLE]
which implies that
[TABLE]
Now since is irrational and is a nonzero integer, we have , which means that
[TABLE]
is a convex combination of 1 and . This implies that
[TABLE]
By combining this with (140), we get
[TABLE]
Now (143), (138) and Corollary 5 imply that
[TABLE]
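The key step of this proof is that the characteristic value E[exp(2 pi i h alpha X)] has modulus strictly less than 1 when X takes two distinct values and alpha is irrational, which makes the exponential sums along the random walk S_j average out. A minimal simulation, assuming a hypothetical step law uniform on {1, 2} and alpha = sqrt(2); all names are illustrative:

```python
import cmath
import math
import random

alpha = math.sqrt(2)  # hypothetical irrational multiplier
h = 1                 # nonzero Weyl frequency
theta = 2 * math.pi * h * alpha

# Hypothetical step law: X uniform on {1, 2}, a nondegenerate law on the positive integers.
phi = (cmath.exp(1j * theta) + cmath.exp(2j * theta)) / 2  # E[exp(2 pi i h alpha X)]

rng = random.Random(3)
n_steps = 100_000
s = 0
weyl_sum = 0j
in_interval = 0
for _ in range(n_steps):
    s += rng.choice((1, 2))  # S_j = X_1 + ... + X_j
    weyl_sum += cmath.exp(1j * theta * s)
    if 0.2 <= (s * alpha) % 1.0 < 0.5:
        in_interval += 1
weyl_average = abs(weyl_sum) / n_steps  # should be close to 0
fraction = in_interval / n_steps         # should be close to 0.3
```

For this step law the modulus of `phi` equals |cos(pi * h * alpha)|, about 0.27 for h = 1, which is strictly below 1 as the convexity argument above requires.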
-D Proof of Lemma 8
We need the following lemma:
Lemma 10.
Let . We can define three random variables and taking values in the set of natural numbers such that:
- is geometrically distributed with success probability , i.e., .
- is independent of .
- .
- is geometrically distributed with success probability , i.e., .
Proof.
Let and be two independent random variables such that
[TABLE]
and
[TABLE]
The distribution of is given by:
[TABLE]
∎
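The two distributions in the proof above did not survive extraction. One construction that satisfies all four items of Lemma 10, which we obtained by dividing the probability generating functions of the two geometric laws and do not claim is the paper's, takes W = B*G, where B is Bernoulli with P(B = 0) = p'/p, G is geometric with the smaller success probability p', and B, G and X are independent. The sketch below (hypothetical names, p = 0.7 and p' = 0.4) checks by direct convolution that X + W then has the target geometric law.

```python
def geometric_pmf(p: float, z: int) -> float:
    """P(Z = z) for Z geometric on {1, 2, ...} with success probability p."""
    return p * (1 - p) ** (z - 1) if z >= 1 else 0.0


def extra_pmf(p: float, p_small: float, w: int) -> float:
    """Law of W = B * G with P(B = 0) = p_small / p and G ~ Geom(p_small), independent."""
    stay = p_small / p  # P(W = 0), since G >= 1
    if w < 0:
        return 0.0
    if w == 0:
        return stay
    return (1 - stay) * geometric_pmf(p_small, w)


def sum_pmf(p: float, p_small: float, z: int) -> float:
    """P(X + W = z) with X ~ Geom(p) independent of W, by direct convolution."""
    return sum(geometric_pmf(p, x) * extra_pmf(p, p_small, z - x) for x in range(1, z + 1))


p, p_small = 0.7, 0.4
max_error = max(abs(sum_pmf(p, p_small, z) - geometric_pmf(p_small, z)) for z in range(1, 60))
```

Since W is nonnegative, X <= X + W holds pathwise, which is exactly the ordering the coupling in Lemma 8 needs.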
Now we are ready to prove Lemma 8.
Proof of Lemma 8.
First notice that the probabilities are decreasing in , where . This means that
[TABLE]
It follows from Lemma 10 that for each , we can define five random variables: and , such that:
- is geometrically distributed with success probability , i.e., is distributed as and so .
- is independent of .
- .
- is geometrically distributed with success probability , i.e., is distributed as and so .
- is independent of .
- .
- is geometrically distributed with success probability , i.e., is distributed as and so .
Assume that is independent of if . Now define
[TABLE]
[TABLE]
and
[TABLE]
Clearly, the distribution of and is the same as that of and , respectively. Furthermore, we have with probability 1. ∎
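Chaining the construction of Lemma 10 twice per stage gives the coupling of Lemma 8. The sketch below is our illustration, not the paper's code: it reuses the hypothetical W = B*G construction (P(B = 0) = p'/p, G geometric with success probability p'), draws one increment per stage i, and relies on the ordering p_{k-1} <= p_i <= p_0 noted at the start of the proof. Names and parameters (q = 2, k = 3, eps = 0.3) are assumptions.

```python
import math
import random

def transition_probs(q: int, k: int, eps: float) -> list[float]:
    nonzero = q**k - 1
    return [(1 - eps) * (q**k - q**i) / nonzero for i in range(k)]


def sample_geometric(p: float, rng: random.Random) -> int:
    """Geom(p) on {1, 2, ...} by inversion."""
    if p >= 1.0:
        return 1
    u = 1.0 - rng.random()  # u in (0, 1]
    return 1 + int(math.log(u) / math.log(1.0 - p))


def sample_extra(p: float, p_small: float, rng: random.Random) -> int:
    """W = B * G; added to an independent Geom(p) it gives Geom(p_small), p_small <= p."""
    if rng.random() < p_small / p:
        return 0
    return sample_geometric(p_small, rng)


def coupled_triplet(probs: list[float], rng: random.Random) -> tuple[int, int, int]:
    """One draw of (lower surrogate, N, upper surrogate), ordered pathwise."""
    p_first, p_last = probs[0], probs[-1]
    n_low = n_mid = n_up = 0
    for p_i in probs:
        x = sample_geometric(p_first, rng)       # increment of the lower surrogate
        y = x + sample_extra(p_first, p_i, rng)  # distributed as T_i
        z = y + sample_extra(p_i, p_last, rng)   # increment of the upper surrogate
        n_low, n_mid, n_up = n_low + x, n_mid + y, n_up + z
    return n_low, n_mid, n_up


rng = random.Random(11)
probs = transition_probs(2, 3, 0.3)
draws = [coupled_triplet(probs, rng) for _ in range(20_000)]
ordered = all(a <= b <= c for a, b, c in draws)
means = [sum(col) / len(draws) for col in zip(*draws)]
```

The three sample means should be close to 3/0.7, 1/0.7 + 1/0.6 + 1/0.4 and 3/0.4, while `ordered` confirms the pathwise inequality of Lemma 8 on every draw.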
References
- [1] S. K. Kaul, R. D. Yates, and M. Gruteser, “On piggybacking in vehicular networks,” in IEEE Global Telecommunications Conference, GLOBECOM 2011, Dec. 2011.
- [2] S. Kaul, M. Gruteser, V. Rai, and J. Kenney, “Minimizing age of information in vehicular networks,” in IEEE Conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON), Salt Lake City, Utah, USA, 2011.
- [3] S. Kaul, R. D. Yates, and M. Gruteser, “Real-time status: How often should one update?” in Proc. INFOCOM, 2012.
- [4] R. D. Yates and S. Kaul, “Real-time status updating: Multiple sources,” in Proc. IEEE Int’l. Symp. Info. Theory, Jul. 2012.
- [5] C. Kam, S. Kompella, and A. Ephremides, “Age of information under random updates,” in Proc. IEEE Int’l. Symp. Info. Theory, 2013, pp. 66–70.
- [6] ——, “Effect of message transmission diversity on status age,” in Proc. IEEE Int’l. Symp. Info. Theory, June 2014, pp. 2411–2415.
- [7] M. Costa, M. Codreanu, and A. Ephremides, “Age of information with packet management,” in Proc. IEEE Int’l. Symp. Info. Theory, June 2014, pp. 1583–1587.
- [8] Y. Sun, E. Uysal-Biyikoglu, R. D. Yates, C. E. Koksal, and N. B. Shroff, “Update or wait: How to keep your data fresh,” in IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications, April 2016, pp. 1–9.
