LP Formulations of Discrete Time Long-Run Average Optimal Control Problems: The Non-Ergodic Case
Vivek S. Borkar, Vladimir Gaitsgory, Ilya Shvartsman

TL;DR
This paper develops an LP framework for deterministic discrete-time long-run average optimal control problems, especially addressing cases where the optimal value depends on initial conditions, expanding the theoretical understanding of such problems.
Contribution
It introduces a novel LP formulation and duality approach for non-ergodic long-run average control problems with initial condition dependence.
Findings
LP formulation characterizes the optimal value
Dual problem provides optimality conditions
Addresses non-ergodic cases with initial dependence
Abstract
We formulate and study the infinite dimensional linear programming (LP) problem associated with the deterministic discrete time long-run average criterion optimal control problem. Along with its dual, this LP problem allows one to characterize the optimal value of the optimal control problem. The novelty of our approach is that we focus on the general case wherein the optimal value may depend on the initial condition of the system.
Vivek S. Borkar: Department of Electrical Engineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India; the work of this author was supported by a J. C. Bose Fellowship from the Government of India.
Vladimir Gaitsgory: Department of Mathematics and Statistics, Macquarie University, Sydney, NSW 2109, Australia; the work of this author was supported by the Australian Research Council Discovery Grants DP130104432.
Ilya Shvartsman: Department of Mathematics and Computer Science, Penn State Harrisburg, Middletown, PA 17057, USA.
1 Introduction and Preliminaries
In this paper, we formulate and study the infinite dimensional (ID) linear programming (LP) problem associated with the deterministic discrete time optimal control problem with long-run average cost, in which the optimal value may depend on the initial condition of the system. The paper continues the line of research started in [10], where similar issues were dealt with in the context of systems evolving in continuous time. Note that, although the ideas behind the treatment of the continuous and discrete time cases are similar, the results in the discrete time case are stronger and are obtained under weaker assumptions than their continuous time counterparts presented in [10] (we discuss the relationships between the two groups of results in detail in the conclusions section at the end of the paper). (Footnote 1: An updated and extended version of this paper has been published in SIAM Journal on Control and Optimization, Vol. 57, No. 3, pp. 1783-1817, DOI 10.1137/18M1229432.)
Allowing one to use the convex duality theory and linear programming based numerical techniques, LP formulations of various classes of optimal control problems have been studied extensively in the literature. For example, LP formulations of problems of optimal control of stochastic systems evolving in continuous time have been considered in [5, 8, 11, 16, 29, 37]. Various aspects of the LP approach to problems of optimization of discrete time stochastic systems (controlled Markov chains) have been discussed in [9, 25, 26, 27]. In the deterministic setting, the LP approach has been developed/applied in [21, 24, 30, 35, 38] for systems evolving in continuous time considered on a finite time interval. The applicability of the LP approach to deterministic continuous and discrete time systems considered on the infinite time horizon has been explored in [17, 18, 19, 20, 34]. (Footnote 2: Infinite time horizon optimal control problems have been traditionally studied with the help of other (not LP related) techniques; see, e.g., [7, 13, 14, 15, 22, 23, 39, 40] and references therein.) Note that the list of references mentioned above represents only a sample of the available literature and is not even close to being exhaustive.
Note that, while the form and the properties of the IDLP problem related to the ergodic case (that is, the case when the optimal value is independent of the initial conditions) have been well understood, the linear programming formulation of the long-run average optimal control problem in the non-ergodic case has not been discussed much in the literature. In fact, a justification of counterparts of LP formulations for reducible finite state Markov chains, as in, e.g., [26] and [27], presents a significant mathematical challenge. First steps to address this challenge have been made in [10], and (as mentioned above) the present paper is a continuation of this work.
Everywhere in what follows, we will be dealing with the discrete time controlled dynamical system
[TABLE]
Here is a given nonempty compact subset of , is an upper semicontinuous compact-valued mapping to a given compact metric space , is a continuous function.
It can be observed that the last two constraints of (1.1) can be rewritten as one:
[TABLE]
where the map is defined by the equation
[TABLE]
The map is upper semicontinuous and its graph ,
[TABLE]
is a compact subset of .
A control and the pair will be called an admissible control and an admissible process, respectively, if the relationships (1.1) are satisfied. The set of admissible controls will be denoted or , depending on whether the problem is considered on the infinite time horizon or on a finite time sequence .
Everywhere in the paper, it is assumed that
A1. The set is not empty for any .
This assumption implies that the sets (with being an arbitrary positive integer) and the set are not empty for any . That is, there exists at least one admissible control for any initial condition (systems that satisfy such a property are called viable; see [4]).
On the trajectories of (1.1), we consider the following optimal control problems:
[TABLE]
[TABLE]
where is a continuous function and is a discount factor. Note that, under Assumption A1, the minima in (1.2) and (1.3) are achieved and the optimal value functions , are lower semicontinuous (see, e.g., Propositions 1-3 and Corollary 1 in [19]).
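To make the dependence of the optimal values on the initial condition concrete, the following is a minimal sketch on a hypothetical four-state system (the states, successor lists, and costs are illustrative assumptions, not taken from the paper). It assumes the usual normalizations, namely the finite horizon value is the minimum of (1/T) times the sum of running costs over T steps, and the discounted value is the minimum of (1 - alpha) times the alpha-discounted sum; the extracted text does not show these formulas explicitly. A control is identified with the choice of the next state.

```python
# Hypothetical system: components {0, 1} and {2, 3} cannot reach each other,
# so the limit optimal values differ between them (a non-ergodic case).
SUCCESSORS = {0: [0, 1], 1: [0, 1], 2: [2, 3], 3: [2, 3]}  # admissible next states
COST = {0: 1.0, 1: 0.0, 2: 5.0, 3: 3.0}  # running cost, depends on the state only


def finite_horizon_average(horizon):
    """Return {y0: V_T(y0)}, assuming V_T = min (1/T) * sum_{t<T} g(y(t))."""
    total = {y: 0.0 for y in SUCCESSORS}  # minimal total cost with 0 steps left
    for _ in range(horizon):
        # Dynamic programming step: pay g(y), then continue optimally.
        total = {y: COST[y] + min(total[z] for z in SUCCESSORS[y])
                 for y in SUCCESSORS}
    return {y: total[y] / horizon for y in SUCCESSORS}


def discounted_value(alpha, sweeps=5000):
    """Return {y0: h_alpha(y0)}, assuming h = min (1-alpha) * sum_t alpha^t g(y(t))."""
    h = {y: 0.0 for y in SUCCESSORS}
    for _ in range(sweeps):
        # Fixed-point iteration of the discounted dynamic programming operator.
        h = {y: (1 - alpha) * COST[y] + alpha * min(h[z] for z in SUCCESSORS[y])
             for y in SUCCESSORS}
    return h
```

In this hypothetical system, V_T(0) = 1/T and V_T(2) = 3 + 2/T, while h_alpha(0) = 1 - alpha and h_alpha(2) = 3 + 2(1 - alpha). Both families therefore converge to 0 on the component {0, 1} and to 3 on the component {2, 3}, so the limit depends on the initial condition.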
An extensive literature is devoted to matters related to the existence and equality of the limits and . The ergodic case, when these limits are constants (that is, when they do not depend on the initial condition ), was studied, for example, in [3, 5, 7, 17] (see also references therein). Results for the non-ergodic case were obtained in [12, 22, 23, 28, 31, 32, 33]. In particular, the results of [12] were instrumental in obtaining, in [10], the IDLP representation of the aforementioned limits for systems evolving in continuous time. Some ideas from [12] are used in this paper as well.
The paper is organized as follows. In the remainder of this introductory section, we give some definitions and state some earlier results that are used further in the text. In Section 2, we introduce an IDLP problem and its dual, the optimal value of the latter giving a lower bound for and (see Proposition 2.3). In Section 3, we establish (see Theorem 3.1) that and are bounded from above by the optimal value of the IDLP problem introduced in Section 2 provided that the value functions , are continuous. Note that the proof of Theorem 3.1 is based on a lemma that extends some results of [12] to the discrete time case (see Lemma 3.2). A direct corollary of the above-mentioned results is Proposition 4.1 of Section 4, stating that the limits and exist and are equal to the optimal value of the IDLP problem if there is no duality gap. The main result of Section 4 is Theorem 4.2, establishing that, if the pointwise limits and exist and are continuous, then they are equal to the optimal value of the dual problem. Also in this section, we use the optimal solution of the dual IDLP problem to state sufficient and necessary optimality conditions for the long-run average optimal control problem (see Propositions 4.5 and 4.6); these optimality conditions are illustrated with an elementary “toy example”. In Section 5, we establish some auxiliary results used in the proofs of the previous sections, and in Section 6, we present some conclusions summarizing the results obtained and comparing them with the results of [10].
We conclude this section with the introduction of notations and results that are used in the sequel. Let be an admissible process. A probability measure is called the occupational measure generated by the process over the time sequence if, for any Borel set ,
[TABLE]
A probability measure is called the discounted occupational measure generated by the process if, for any Borel set ,
[TABLE]
where is the indicator function of .
It can be shown that, if is the occupational measure generated by the process over the time sequence , then
[TABLE]
for any Borel measurable function on . Also, it can be shown that if is the discounted occupational measure generated by the process , then
[TABLE]
for any Borel measurable function on .
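The finite horizon part of this construction can be checked numerically. The sketch below assumes that the occupational measure of a process over T steps assigns to a set the fraction of time steps at which the state-control pair belongs to it (the normalized sum of indicators). The process and the test function are hypothetical. It verifies that integrating a function against this measure reproduces the time average of the function along the process.

```python
from collections import Counter


def occupational_measure(process):
    """Empirical measure of a finite process [(y(0), u(0)), ..., (y(T-1), u(T-1))].

    Each state-control pair receives mass (number of visits) / T.
    """
    horizon = len(process)
    return {pair: visits / horizon for pair, visits in Counter(process).items()}


def integrate(q, measure):
    """Integral of q(y, u) with respect to a finitely supported measure."""
    return sum(q(y, u) * mass for (y, u), mass in measure.items())


def time_average(q, process):
    """Average of q along the process, (1/T) * sum_t q(y(t), u(t))."""
    return sum(q(y, u) for y, u in process) / len(process)


# Hypothetical process and test function, chosen only for illustration.
PROCESS = [(0, 1), (1, 1), (1, 0), (0, 1)]


def sample_q(y, u):
    return (y - 0.5) ** 2 + u
```

For the hypothetical process above, the pair (0, 1) is visited twice and receives mass 1/2, the other two pairs receive mass 1/4 each, and both the integral and the time average of `sample_q` equal 1.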
Let us introduce the following notations for the sets of occupational measures:
[TABLE]
[TABLE]
Note that, due to (1.5) and (1.6), problems (1.2) and (1.3) can be rewritten in the form
[TABLE]
and
[TABLE]
respectively.
To describe convergence properties of occupational measures, we introduce the following metric on (the space of probability measures defined on Borel subsets of ):
[TABLE]
for , where is a sequence of Lipschitz continuous functions dense in the unit ball of the space of continuous functions from to . This metric is consistent with the weak∗ convergence topology on , that is, a sequence converges to in this metric if and only if
[TABLE]
for any . Using the metric , we can define the “distance” between and and the Hausdorff metric between and as follows:
[TABLE]
Note that, although, by some abuse of terminology, we refer to as a metric on the set of subsets of , it is, in fact, a semimetric on this set (since implies if and are closed, but the equality may not be true if at least one of these sets is not closed).
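A rough numerical analogue of these distances, for finitely supported measures, can be sketched as follows. The sketch assumes that the metric is a weighted sum, with weights 2^(-j), of the absolute differences between the integrals of the j-th test function. It also truncates the dense sequence to a short hypothetical list of bounded Lipschitz functions, so it yields only a pseudo-metric. The function names and the test functions are assumptions made for illustration.

```python
import math


def integrate(q, measure):
    """Integral of q(y, u) with respect to a finitely supported measure."""
    return sum(q(y, u) * mass for (y, u), mass in measure.items())


# Hypothetical truncated family of bounded Lipschitz test functions.
TEST_FUNCTIONS = [
    lambda y, u: math.cos(y),
    lambda y, u: math.sin(u),
    lambda y, u: math.cos(y + u),
]


def rho(measure_a, measure_b, test_functions=TEST_FUNCTIONS):
    """Truncated weighted distance between two measures."""
    return sum(
        2.0 ** (-j) * abs(integrate(q, measure_a) - integrate(q, measure_b))
        for j, q in enumerate(test_functions, start=1)
    )


def distance_to_set(measure, measures):
    """Distance from a measure to a finite set of measures."""
    return min(rho(measure, other) for other in measures)


def hausdorff(set_a, set_b):
    """Hausdorff distance between two finite sets of measures."""
    return max(
        max(distance_to_set(m, set_b) for m in set_a),
        max(distance_to_set(m, set_a) for m in set_b),
    )
```

With only finitely many test functions, two different measures can be at distance zero; the full dense sequence is what makes the distance a genuine metric on the space of probability measures.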
Let us define the sets and by the equations:
[TABLE]
[TABLE]
Note that the sets and are convex and compact in the topology specified above. The following equalities establish relationships between these sets and the occupational measures sets introduced earlier (see Theorem 5.4 in [19]):
[TABLE]
Also (see Corollary 2 in [19]),
[TABLE]
Here and in what follows, stands for the closed convex hull of the corresponding set.
2 Estimates of the Limit Optimal Value Functions from Below
Consider the IDLP problem
[TABLE]
where
[TABLE]
with standing for the space of nonnegative measures defined on Borel subsets of . Also consider the problem
[TABLE]
where is the set of triplets that for all satisfy the inequalities
[TABLE]
Note that the optimal value of problem (2.3) can be equivalently represented as
[TABLE]
where and are continuous functions, and satisfies the second inequality in (2.4). The optimal values of (2.3) and (2.1) are related by the inequality
[TABLE]
(see Lemma 5.3 in Section 5.2). Problem (2.3) is, in fact, dual with respect to (2.1), with (2.6) being a part of the duality relationships (see more details in Section 5.2).
As can be readily seen, problem (2.1) can be equivalently written as
[TABLE]
where
[TABLE]
Along with (2.7), consider the problem
[TABLE]
where
[TABLE]
It is easy to see that both sets and are convex, set is closed (and, therefore, compact), and
[TABLE]
Lemma 2.1
The following inclusions are true:
[TABLE]
This implies, in particular, that the set is not empty.
Proof. Note first that since the sets and are not empty for all admissible and , so are the sets and . Note also that from (1.11) it follows that
[TABLE]
Let . Then there exist sequences and such that as . Let be the control generating and be the corresponding trajectory. For any we have
[TABLE]
Define the functional (here and in what follows, stands for the space of continuous linear functionals on ) by the equation
[TABLE]
By the Riesz representation theorem (see, e.g., Theorem 4.3.9, p. 181 in [6]), there exists such that
[TABLE]
Then (2.11) can be written as
[TABLE]
Passing to the limit, we obtain
[TABLE]
Since (due to (2.10)), the latter equality implies that Thus, the first inclusion in (2.9) is proved.
Let us prove the second inclusion. By (1.12), to prove the second inclusion in (2.9), it is sufficient to prove that
[TABLE]
Note that from (1.11) and (1.12) it follows that
[TABLE]
Take . There exist sequences and such that as . Since , we have
[TABLE]
where . Passing to the limit as we obtain
[TABLE]
Since , the second inclusion in (2.9) is proved.
The next lemma establishes a relation between the optimal values in problems (2.3) and (2.8).
Lemma 2.2
The optimal values of problems (2.3) and (2.8) are equal, that is,
[TABLE]
Proof. The proof of the lemma is given in Section 5.2.
Proposition 2.3
The lower limits of the optimal value functions in problems (1.2) and (1.3) are bounded from below by the optimal value of (2.3), that is,
[TABLE]
Proof. This proposition follows from Lemmas 2.1 and 2.2, and from the fact that the equalities
[TABLE]
are valid.
Let be a positive integer and let be a -periodic admissible process. This process will be referred to as finite time (FT) reachable from if there exist an integer and a control such that the solution of (1.1) obtained with this control satisfies the equality .
Consider the optimal control problem
[TABLE]
where is over all integer and over all -periodic pairs that are FT reachable from . Similarly to (1.9), this problem can be reformulated in terms of occupational measures
[TABLE]
where is the set of occupational measures generated by all FT reachable from -admissible periodic pairs. Note that
[TABLE]
and, therefore,
[TABLE]
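For a finite system, optimization over periodic processes that are FT reachable from a given initial state reduces to finding the minimum mean cost among cycles lying in the set of states reachable from that state, and it suffices to inspect simple cycles. The following brute-force sketch illustrates this on a hypothetical four-state system (the successor lists and costs are illustrative assumptions, not taken from the paper), with the running cost depending on the state only.

```python
# Hypothetical system: components {0, 1} and {2, 3} cannot reach each other.
SUCCESSORS = {0: [0, 1], 1: [0, 1], 2: [2, 3], 3: [2, 3]}  # admissible next states
COST = {0: 1.0, 1: 0.0, 2: 5.0, 3: 3.0}  # running cost per visited state


def reachable(y0):
    """States reachable from y0 in finitely many steps (y0 included)."""
    seen, stack = {y0}, [y0]
    while stack:
        y = stack.pop()
        for z in SUCCESSORS[y]:
            if z not in seen:
                seen.add(z)
                stack.append(z)
    return seen


def best_periodic_average(y0):
    """Minimum mean cost over simple cycles through states reachable from y0."""
    best = float("inf")

    def extend(start, path, cost):
        nonlocal best
        for z in SUCCESSORS[path[-1]]:
            if z == start:
                # The cycle closes: its period is len(path).
                best = min(best, cost / len(path))
            elif z > start and z not in path:
                # Requiring z > start lists each cycle from its smallest state only.
                extend(start, path + [z], cost + COST[z])

    for start in reachable(y0):
        extend(start, [start], COST[start])
    return best
```

In this hypothetical system the result is 0 for initial states 0 and 1 (stay at state 1) and 3 for initial states 2 and 3 (stay at state 3). These are the same numbers to which the finite horizon averages of this system converge, since the optimal finite horizon average is 1/T from state 0 and 3 + 2/T from state 2.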
Proposition 2.4
The following relationships are valid:
[TABLE]
Proof. Due to (2.7) and (2.15), it is sufficient to prove only the first relationship. Note that from (2.10) and (2.16) it follows that
[TABLE]
Take now an arbitrary . By definition, it means that is generated by a -periodic pair that is FT reachable from . That is, for any continuous function ,
[TABLE]
Consequently, for any ,
[TABLE]
[TABLE]
[TABLE]
where is a solution of (1.1) that satisfies the equality (the existence of and the existence of a control that ensure the validity of this equality follows from the fact that is FT reachable from ). Since and , from (2.20) it follows that
[TABLE]
[TABLE]
Define by the equation
[TABLE]
By the Riesz representation theorem, there exists such that
[TABLE]
Therefore, (2.21) can be rewritten as
[TABLE]
Since (by (2.19)), the latter implies that . Thus, the first relationship in (2.18) is established.
Corollary 2.5
If
[TABLE]
then
[TABLE]
3 Estimates of the Limit Optimal Value Functions from Above
Theorem 3.1
(a) Let be continuous on for all natural . Then
[TABLE]
(b) Let be continuous on for all . Then
[TABLE]
The proof of the theorem is based on the following lemma.
Lemma 3.2
For any natural ,
[TABLE]
Also, for any ,
[TABLE]
The proof of the lemma is given at the end of the section.
Proof of Theorem 3.1.
Proof of (a). Let us fix an arbitrary natural and let us consider the following IDLP problem
[TABLE]
where is the set of pairs that satisfy the inequalities
[TABLE]
with
[TABLE]
Let us show that, for an arbitrarily small , there exists a function such that
[TABLE]
Note that, if the inclusion above is established, it would imply that
[TABLE]
Let us first verify that there exists such that the pair satisfies the first inequality in (3.6). To this end, note that the inequality (3.3) is equivalent to the inequality
[TABLE]
which, in turn, is equivalent to
[TABLE]
The problem on the left hand side of (3.10), i.e.,
[TABLE]
is an IDLP problem, its dual being
[TABLE]
The optimal values of (3.11) and (3.12) are equal (see Proposition 6 in [19]). Therefore, (3.10) is equivalent to
[TABLE]
From (3.13) it follows that, for any , there exists a function such that
[TABLE]
The latter implies that the pair , where , satisfies the first inequality in (3.6).
Let us now verify that the function satisfies the second inequality in (3.6). From the dynamic programming principle applied to problem (1.2), it follows that, for any ,
[TABLE]
Also, as can be readily seen,
[TABLE]
[TABLE]
Consequently,
[TABLE]
Thus, satisfies the second inequality in (3.6). Hence, (3.8) is valid and, consequently, (3.9) is valid too.
[TABLE]
where
[TABLE]
(Note that, to adjust the notations used above and those used in Lemma 5.3, one should write and as and , where .)
From (3.9) and (3.17) it follows that which implies that
[TABLE]
since is arbitrarily small. Due to (3.19), to prove (3.1), it is sufficient to establish that
[TABLE]
One can readily see that is a decreasing function of and that for any . Hence,
[TABLE]
Let us now show that the opposite inequality is also valid. Let be arbitrarily small and let be -optimal for (2.1). That is,
[TABLE]
Then
[TABLE]
[TABLE]
[TABLE]
( can be arbitrarily small). Thus (3.20) is established and statement (a) is proved.
Proof of (b). The proof of (b) is very similar to that of (a). We fix an arbitrary and consider the IDLP problem
[TABLE]
where is the set of pairs that satisfy the inequalities
[TABLE]
We then show that, for an arbitrarily small , there exists a function such that
[TABLE]
with the inclusion above implying that
[TABLE]
To verify (3.23), we first show that there exists such that the pair satisfies the first inequality in (3.22). As in the proof of (a), we rewrite the inequality (3.4) in the form
[TABLE]
which is equivalent to
[TABLE]
The problem on the left hand side of (3.25), i.e.,
[TABLE]
is an IDLP problem, the dual of which is
[TABLE]
The optimal values of (3.26) and (3.27) are equal (Proposition 6 in [19]). Therefore, (3.25) is equivalent to
[TABLE]
From (3.28) it follows that, for any , there exists a function such that
[TABLE]
The latter implies that the pair , where , satisfies the first inequality in (3.22).
To verify that the function satisfies the second inequality in (3.22), note that from the dynamic programming principle applied to problem (1.3), it follows that
[TABLE]
(see, e.g., Proposition 4 in [19]). The latter implies that
[TABLE]
which, in turn, implies that
[TABLE]
(since, as can be readily seen, ). Thus, satisfies the second inequality in (3.22), and, therefore, (3.24) is valid too. Starting from this point, the proof of (b) follows exactly the same steps as that of (a).
Proof of Lemma 3.2. Let us prove (3.3). To this end, let us show first that, for any natural and ,
[TABLE]
where is as in (3.7). Take , , and let be a control that generates on . Extend from the interval to the interval so that . Such extension is possible due to viability of . Let be the corresponding trajectory. Taking into account that for all , we obtain
[TABLE]
Thus the inequality (3.32) is established. From this inequality it follows that
[TABLE]
where is the union of over (see (1.7)). Take an arbitrary . From (1.11) it follows that there exist sequences , such that and . Passing to the limit along these sequences in (3.33) and having in mind that
[TABLE]
(since is lower semicontinuous for any ; see, e.g., Theorem 3.1.5 in [36]), one arrives at inequality (3.3).
Let us now prove (3.4). To this end, let us show first that, for any and any ,
[TABLE]
[TABLE]
Take , , and let be a control that generates . Let also be the trajectory corresponding to . We have
[TABLE]
From (3.34) it follows that
[TABLE]
where is the union of over (see (1.8)). Take an arbitrary . From (1.11) it follows that there exist sequences , such that and . Passing to the limit along these sequences in (3.35) and keeping in mind that
[TABLE]
(since is lower semicontinuous for any ; see also Theorem 3.1.5 in [36]), one arrives at inequality (3.4).
4 LP Representation for the Optimal Value and Related Sufficient/Necessary Optimality Conditions
The following statement is a direct corollary of Theorem 3.1 and Proposition 2.3.
Proposition 4.1
If
[TABLE]
then, provided that is continuous for any , there exists the pointwise limit
[TABLE]
Also, provided that is continuous for any , there exists the pointwise limit
[TABLE]
Note that a statement about the LP representation of the pointwise limits (4.2) and (4.3) can be established without the strong duality assumption (4.1). Namely, the following result is valid.
Theorem 4.2
(a) Let the pointwise limit
[TABLE]
exist and let the function be continuous. Then
[TABLE]
(b) Let the pointwise limit
[TABLE]
exist and the function be continuous. Then
[TABLE]
Proof. The proof of the theorem is given at the end of this section.
Remark 4.3
If (4.4) and (4.5) are valid, then the strong duality equality (4.1) is true provided that condition (2.22) of Corollary 2.5 is satisfied.
In the rest of this section, we assume that the pointwise limit exists and is continuous, and, therefore, it is equal to the optimal value of the dual problem (2.3) (by Theorem 4.2). That is, (4.4) and (4.5) are valid.
Consider the optimal control problem
[TABLE]
Note that, due to (4.4), the optimal value of (4.8) is equal to (see Proposition 5.4 in Section 5). Below, we discuss sufficient and necessary optimality conditions for problem (4.8) stated in terms of an optimal solution of problem (2.3).
DEFINITION. A pair will be called an optimal solution of (2.3) if it satisfies the inequalities (compare with (2.4))
[TABLE]
Proposition 4.4
(a) A pair is an optimal solution of (2.3) if and only if satisfies the second inequality in (4.9) and
[TABLE]
(b) If is such that
[TABLE]
then the pair , where , is an optimal solution of problem (2.3).
Proof. By (2.5), the first inequality in (4.9) is equivalent to the equality
[TABLE]
Also, (4.12) is equivalent to (4.10) (due to (4.5)). Thus (a) is proved.
If is such that (4.11) is satisfied, then the pair , where , satisfies (4.10). Therefore, by (a), this pair is an optimal solution of (2.3). This proves (b).
Proposition 4.5
Let an optimal solution of (2.3) exist. Then, for an admissible process to be optimal in (4.8) it is sufficient that the equalities
[TABLE]
[TABLE]
are satisfied for all t = 0, 1, ... .
Proof. From (4.13) and (4.14) it follows that
[TABLE]
for all t = 0, 1, ... . Therefore, for any ,
[TABLE]
Taking into account that
[TABLE]
we obtain
[TABLE]
That is, the process is optimal in (4.8).
We will now establish that the fulfillment of (4.13)-(4.14) is also a necessary condition of optimality of an admissible process provided that the latter is periodic, that is, there exists a positive integer such that, for any ,
[TABLE]
Proposition 4.6
Let an optimal solution of (2.3) exist. Then, for an admissible process satisfying the periodicity conditions (4.16) to be optimal in (4.8), it is necessary that the equalities (4.13)-(4.14) are satisfied for all .
Proof. Note that the fact that the periodic admissible process is optimal in (4.8) means that
[TABLE]
Note also that from Proposition 4.4 it follows that, for any ,
[TABLE]
[TABLE]
From (4.17) and (4.18) it follows that
[TABLE]
which implies that
[TABLE]
due to the fact that
[TABLE]
(by (4.16)). The inequalities (4.19) and (4.20) establish the validity of (4.14). In view of (4.14), the inequality (4.18) is equivalent to
[TABLE]
for all . If the above inequality were strict for at least one , then one would obtain
[TABLE]
which, by (4.21), would lead to
[TABLE]
The latter contradicts (4.17). Hence, (4.22) is satisfied as equality for all . This proves (4.13).
Remark 4.7
As established by Proposition 4.5, an admissible process is optimal if it satisfies the equalities (4.13), (4.14). Assuming that these are valid, one may conclude (due to (4.10)) that the equality (4.13) is equivalent to
[TABLE]
which leads to
[TABLE]
The latter implies that the feedback control
[TABLE]
is optimal in the sense that, when used in (1.1), it yields the optimal “open loop” admissible process .
Let us illustrate the optimality conditions discussed above with the following “toy example”.
Example. Let the dynamics be one-dimensional and be described by the equation (compare with (1.1))
[TABLE]
with and with (that is, the control can be either equal to or to ). Consider problem (1.2) with . As can be readily understood, the optimal admissible processes in this example are as follows. If , then
[TABLE]
If , then
[TABLE]
Also, if , then the system is uncontrollable, and the only admissible trajectory is . The admissible processes described above are optimal on any time horizon (both finite and infinite), with the optimal value function being defined by the equation
[TABLE]
Thus, . Note that condition (2.22) of Corollary 2.5 is satisfied and, therefore, the strong duality equality (4.1) is valid in the given example (see Remark 4.3).
Define the function by the equation
[TABLE]
One can readily verify that
[TABLE]
the latter implying that
[TABLE]
That is, satisfies (4.11). Therefore, the pair , where , is an optimal solution of (2.3). The feedback control defined in (4.23) takes in this case the form
[TABLE]
This feedback control is optimal and it is consistent with the optimal open loop solution shown above.
Remark 4.8
If (4.13), (4.14) are valid, then the relationships (4.15) are valid, the latter implying that
[TABLE]
This provides an interpretation of as a function that defines the difference between the running cost and the optimal value along the optimal trajectory. Note that, if
[TABLE]
that is the process is optimal on any finite time horizon as well, then (4.26) can be rewritten as follows
[TABLE]
That was the case in the example considered above, in which the optimal trajectory satisfies the equalities: and for all . This leads to (see (4.25)) and, consequently, to
[TABLE]
Thus, the relationships in (4.24) are consistent with (4.27).
Proof of Theorem 4.2. If the pointwise limit (4.4) exists, then, by Proposition 2.3, the limit function satisfies the inequality
[TABLE]
Therefore, to prove the statement (a), one needs to show that
[TABLE]
Similarly, if the pointwise limit (4.6) exists, then, by Proposition 2.3, the limit function satisfies the inequality
[TABLE]
Therefore, to prove the statement (b), one needs to show that
[TABLE]
Proof of (4.28). Firstly, note that, by dividing (3.15) by and passing to the limit as , one obtains
[TABLE]
Also, by passing to the limit as in (3.3), one obtains
[TABLE]
Inequality (4.31) can be rewritten in the form
[TABLE]
which is equivalent to
[TABLE]
The problem on the left hand side of the above inequality,
[TABLE]
is an IDLP problem, whose dual is
[TABLE]
Through equality of the optimal values of (4.33) and (4.34) (see Proposition 6 in [19]), we conclude that (4.32) is equivalent to
[TABLE]
From (4.35) it follows that, for any , there exists a function such that
[TABLE]
Consider the problem
[TABLE]
where is the set of pairs that satisfy inequalities
[TABLE]
Note that the optimal value of problem (4.37) is the same as that of (2.3) (see (5.15) in the proof of Lemma 5.3 taken with ). Due to (4.30) and (4.36), the pair , where , satisfies the inequalities (4.38). Consequently,
[TABLE]
This proves (4.28) since is arbitrarily small.
Proof of (4.29). By passing to the limit as in (3.30), we conclude that satisfies the inequality
[TABLE]
Also, by passing to the limit as in (3.4) we establish that
[TABLE]
Proceeding from this point in exactly the same way as above, one establishes the validity of (4.29).
5 Appendix
5.1 Another representation for the limit optimal values
Let be the set of continuous functions that satisfy the following relationships:
[TABLE]
and
[TABLE]
In these notations, the relationships (4.30), (4.31) and (4.39), (4.40) are equivalent to the inclusions
[TABLE]
and
[TABLE]
respectively.
Proposition 5.1
(a) Let the pointwise limit (4.4) exist and the function be continuous. Then
[TABLE]
(b) Let the pointwise limit (4.6) exist and the function be continuous. Then
[TABLE]
Proof. Note that, due to (5.3) and (5.4),
[TABLE]
Therefore, to prove the proposition, it is sufficient to establish that the inequalities opposite to (5.7) are valid. For a natural , let be an optimal control in (1.2), be the occupational measure generated by this control, and be the corresponding trajectory. Then
[TABLE]
Let converge to in weak∗ topology as along a subsequence (we do not relabel). Note that (due to (1.11)). From the equality above, by passing to the limit as , we obtain
[TABLE]
For , taking into account the monotonicity property (5.1), we have
[TABLE]
Since is continuous, we can pass to the limit as and obtain
[TABLE]
Combining this with (5.2) and (5.8) we obtain
[TABLE]
The latter implies that the inequality opposite to the first inequality in (5.7) is valid. This proves part (a) of the proposition.
The proof of the inequality opposite to the second inequality in (5.7) is similar. For let be an optimal control in (1.3), be the occupational measure generated by this control, and be the corresponding trajectory. Then
[TABLE]
Let converge to in weak∗ topology as along a subsequence (we do not relabel). Note that (due to (1.11)). From the equality above, by passing to the limit as we obtain
[TABLE]
Combining this with (5.2) and (5.9) we obtain
[TABLE]
The latter implies that the inequality opposite to the second inequality in (5.7) is valid, and, thus, proves part (b) of the proposition.
Remark 5.2
It can be verified directly that the optimal value of the problem on the right hand side of (5.5) and (5.6) is equal to (the optimal value of the dual problem (2.3)). Results establishing the validity of representations similar to (5.5) and (5.6) in the continuous time setting were obtained in [12].
5.2 Results referred to in Sections 3 and 4
Consider a perturbed version of the IDLP problem (2.1)
[TABLE]
and the corresponding perturbed version of the dual problem (2.3)
[TABLE]
where is the set of triplets that satisfy the inequalities
[TABLE]
Note that is a perturbation parameter and note that (5.10) and (5.11) become (2.1) and (2.3) with . Consider also the problem
[TABLE]
where is the set of pairs that satisfy the inequalities
[TABLE]
Lemma 5.3
The following relationships are valid:
[TABLE]
Proof. Let us prove, first, that
[TABLE]
In fact, the inequality is true (since, for any pair , the triplet with ). Let us prove the opposite inequality. Let a triplet be such that , with being arbitrarily small. Then the pair , with . Since , it leads to the inequality and, consequently, to the inequality since is arbitrarily small. Thus, (5.16) is proved.
Let us now prove the inequality
[TABLE]
Take any and . Integrating the first inequality in (5.12) with respect to and taking into account that we conclude that
[TABLE]
Taking into account that and the second inequality in (5.12), we obtain
[TABLE]
Therefore,
[TABLE]
This proves (5.17).
Let stand for the space of continuous linear functionals on and let stand for the space of measures defined on Borel subsets of . Define a linear operator as follows: for any ,
[TABLE]
where are defined by the equation: ,
[TABLE]
[TABLE]
In this notation, the set defined in (2.2) can be rewritten as follows
[TABLE]
where stands for the zero element of . Also, problem (2.1) takes the form
[TABLE]
where (also, in the sequel) denoting the integral of the corresponding function over (respectively, over ). Note that, for any ,
[TABLE]
[TABLE]
[TABLE]
Define now the linear operator
in such a way that, for any ,
[TABLE]
Thus,
[TABLE]
[TABLE]
That is, the operator is the adjoint of . The problem dual to (5.19) is of the form (see [1] and [2])
[TABLE]
[TABLE]
[TABLE]
the latter being equivalent to (2.3).
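For intuition about how a dual-feasible pair bounds the long-run average cost from below, a finite-state analogue can be checked numerically. The sketch below is only an illustration under stated assumptions: it assumes that the dual constraints are of the form $\mu(y) \le \mu(f(y,u))$ and $\mu(y) \le g(y,u) + \eta(f(y,u)) - \eta(y)$, and the four-state system, the names `TRANSITIONS`, `MU`, `ETA`, and both helper functions are hypothetical and not taken from the paper. Summing the second inequality along a trajectory and using the first gives $T\mu(y_0) \le \sum_{t<T} g(y_t,u_t) + \eta(y_T) - \eta(y_0)$, which the code verifies against finite-horizon dynamic programming.

```python
# Finite-state sketch with hypothetical data (not from the paper).
# TRANSITIONS[y] lists the admissible moves from state y as (next_state, stage_cost).
TRANSITIONS = {
    0: [(1, 5.0), (2, 0.0)],   # state 0 can enter either of two separate regions
    1: [(1, 1.0)],             # region A: self-loop with average cost 1
    2: [(2, 3.0), (3, 0.0)],   # region B: self-loop (cost 3) or cycle 2 -> 3 -> 2
    3: [(2, 4.0)],             # cycle 2 -> 3 -> 2 has average cost (0 + 4) / 2 = 2
}


def is_dual_feasible(mu, eta, transitions, tol=1e-12):
    """Check the assumed dual constraints for every admissible move:
    mu(y) <= mu(y_next) and mu(y) <= cost + eta(y_next) - eta(y)."""
    for y, moves in transitions.items():
        for y_next, cost in moves:
            if mu[y] > mu[y_next] + tol:
                return False
            if mu[y] > cost + eta[y_next] - eta[y] + tol:
                return False
    return True


def finite_horizon_values(transitions, horizon):
    """Minimal total cost over `horizon` steps from each state (backward DP)."""
    values = {y: 0.0 for y in transitions}
    for _ in range(horizon):
        values = {
            y: min(cost + values[y_next] for y_next, cost in moves)
            for y, moves in transitions.items()
        }
    return values


# Candidate dual pair, chosen by hand for this toy system.
MU = {0: 1.0, 1: 1.0, 2: 2.0, 3: 2.0}
ETA = {0: 0.0, 1: 0.0, 2: 1.0, 3: 3.0}
HORIZON = 2000

feasible = is_dual_feasible(MU, ETA, TRANSITIONS)
totals = finite_horizon_values(TRANSITIONS, HORIZON)
averages = {y: totals[y] / HORIZON for y in TRANSITIONS}
eta_span = max(ETA.values()) - min(ETA.values())

for y in sorted(TRANSITIONS):
    # mu(y) should not exceed the finite-horizon average by more than eta_span / HORIZON.
    print(y, MU[y], averages[y])
```

In this toy system the bound given by `MU` differs between states 0 and 2, which mirrors the non-ergodic situation in which the optimal value depends on the initial condition.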
Proof of Lemma 2.2. Let
[TABLE]
[TABLE]
and let stand for the closure of in the weak∗ topology of . Consider the problem
[TABLE]
Its optimal value is called the subvalue of the IDLP problem (5.19). Let us show that the optimal value of (2.8) is equal to the subvalue. In fact, as can be readily seen, if . Consequently,
[TABLE]
From the fact that is defined as the optimal value in (5.20) it follows that there exists a sequence such that converges (in weak∗ topology) to , with converging to as tends to infinity. That is (see (5.18)),
[TABLE]
[TABLE]
Without loss of generality, one may assume that converges in weak∗ topology to a measure that satisfies the relationships
[TABLE]
Also, and . That is, and therefore,
[TABLE]
Thus, the optimal value of (2.8) is equal to the subvalue. To complete the proof, it is sufficient to note that the subvalue of an IDLP problem is equal to the optimal value of its dual provided that the former is bounded (see, e.g., Theorem 3 in [1]). That is, .
Let us conclude this section by proving the following proposition.
Proposition 5.4
The optimal value of the problem in the left hand side of (4.8) is equal to . That is,
[TABLE]
Proof. Let and let be the corresponding trajectory. Then
[TABLE]
Therefore,
[TABLE]
and, hence,
[TABLE]
Let us prove the opposite inequality. For any and , and for sufficiently large ,
[TABLE]
where . Therefore,
[TABLE]
and, consequently,
[TABLE]
Hence,
[TABLE]
The proposition is proved.
6 Conclusions
We have introduced the IDLP problem, the optimal value of which gives an upper bound for and , with the optimal value of the corresponding dual problem providing a lower bound for and . While the result establishing the validity of the lower bound (Proposition 2.3) is very similar to the corresponding result in [10], the statement about the validity of the upper bound (Theorem 3.1) is much stronger than its continuous time counterpart in [10], where it was assumed that the uniform limits and exist and are Lipschitz continuous. Note also that, in contrast to the result of [10], we did not assume that the set is invariant (only that it is viable). We believe that establishing the validity of the upper bound for systems evolving in continuous time under assumptions similar to those of Theorem 3.1 is possible, and it can be a subject for future research.
We have also established that, if the pointwise limits and exist and are continuous, then they are equal to the optimal value of the dual problem (Theorem 4.2). A similar statement in the continuous time setting can be established using a similar argument if the limits of the optimal value functions exist and are continuously differentiable. This assumption is, however, too strong, and finding less restrictive conditions, under which a statement similar to Theorem 4.2 for systems in continuous time is valid, can also be a subject for future research.
Finally, we have stated sufficient and necessary optimality conditions for the long-run average optimal control problem using an optimal solution of the dual problem (Propositions 4.5 and 4.6). Similar results can be readily obtained in the continuous time case too.
Acknowledgment. We would like to express our gratitude to D. Khlopin and to M. Quincampoix for useful discussions and for sharing with us some insightful examples.
- [1] E.J. Anderson, A Review of Duality Theory for Linear Programming over Topological Vector Spaces, J. of Math. Analysis and App., 97:2 (1983), pp. 380-392.
- [2] E.J. Anderson and P. Nash, Linear Programming in Infinite-Dimensional Spaces, Wiley, Chichester, 1987.
- [3] M. Arisawa and P.-L. Lions, On Ergodic Stochastic Control, Commun. in Partial Differential Equations, 23:11 (1998), pp. 2187-2217.
- [4] J.-P. Aubin, Viability Theory, Birkhauser, Basel, 1991.
- [5] A. Arapostathis, V.S. Borkar and M.K. Ghosh, Ergodic Control of Diffusion Processes, Cambridge Uni. Press, Cambridge, UK, 2012.
- [6] R. Ash, Measure, Integration and Functional Analysis, Academic Press, 1972.
- [7] M. Bardi and I. Capuzzo-Dolcetta, Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, Birkhauser, Boston, 1997.
- [8] A.G. Bhatt and V.S. Borkar, Occupation measures for controlled Markov processes: characterization and optimality, The Annals of Probability, 24:3 (1996), pp. 1531-1562.
