Linear Programming Formulations of Deterministic Infinite Horizon Optimal Control Problems in Discrete Time
Vladimir Gaitsgory, Alex Parkinson, I. Shvartsman

TL;DR
This paper explores the connection between infinite horizon optimal control problems in discrete time and infinite-dimensional linear programming, analyzing their relationships and asymptotic behaviors for discounted and average criteria.
Contribution
It establishes a novel link between discrete-time optimal control problems and IDLP formulations, including asymptotic relationships between different criteria.
Findings
Optimal control problems relate to specific IDLP problems.
Asymptotic relationships between discounted and average criteria are established.
The study provides a new framework for analyzing long-term control strategies.
Abstract
This paper is devoted to a study of infinite horizon optimal control problems with time discounting and time averaging criteria in discrete time. We establish that these problems are related to certain infinite-dimensional linear programming (IDLP) problems. We also establish asymptotic relationships between the optimal values of problems with time discounting and long-run average criteria.
Taxonomy
Topics: Optimization and Variational Analysis · Aerospace Engineering and Control Systems · Advanced Optimization Algorithms Research
V. Gaitsgory (a), A. Parkinson (a) and I. Shvartsman (b)
(a) Department of Mathematics, Macquarie University, Eastern Road, Macquarie Park, NSW 2113, Australia
(b) Department of Mathematics and Computer Science, Penn State Harrisburg, Middletown, PA 17057, USA
Key words: Optimal control, discrete systems, infinite horizon, long-run average, occupational measures, linear programming, duality
AMS subject classification: 49N15, 93C55
1 Introduction
The linear programming (LP) approach to control systems is based on the fact that the occupational measures generated by admissible controls and the corresponding solutions of a dynamical system satisfy certain linear equations that represent the system's dynamics in an integral form. The idea of such linearization was explored extensively in both deterministic and stochastic settings (see, e.g., [5], [8], [9], [13], [24], [30], [31] and, respectively, [1], [12], [14], [15], [16], [17], [18], [21], [23], [25], [27], [29], [33], as well as references therein). In [15] and [16] in particular, the validity of LP formulations of deterministic infinite time horizon problems of optimal control with time average and time discounting criteria was proved for systems evolving in continuous time (note that other approaches/techniques for dealing with deterministic optimal control problems on the infinite time horizon have been studied, e.g., in [4], [7], [10], [34]; see also references therein). In the present paper, we show that the LP formulations of problems of optimal control with time average and time discounting criteria are valid for systems evolving in discrete time.
Note that some of the results of [15] and [16] were obtained under certain technical assumptions. For example, the statement implying the validity of the LP formulation of the long-run average optimal control problem (see Theorem 2.6 in [16]) was proved under the assumption that the dependence of the control set on the state variables is Lipschitz continuous. These assumptions can be significantly relaxed when dealing with discrete-time systems. In particular, the validity of the LP formulation of the long-run average optimal control problem in discrete time is established in this paper under the assumption that the dependence of the control set on the state variables is upper semicontinuous. It is also worth noting that the results in [16] (see also Remark 4.5 in [15]) are stated with the use of the relaxed controls formalism, the latter playing no role in tackling discrete-time systems.
Everywhere in what follows we will be dealing with the discrete time controlled dynamical system
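$$y(t+1)=f(y(t),u(t)),\qquad u(t)\in U(y(t)),\qquad y(t)\in Y,\qquad t=0,1,\dots \tag{1}$$
Here $Y$ is a given nonempty compact subset of $\mathbb{R}^m$, $U(\cdot)$ is an upper semicontinuous compact-valued mapping from $Y$ to the subsets of a given compact metric space $U$, and $f(\cdot,\cdot):\mathbb{R}^m\times U\to\mathbb{R}^m$ is a continuous function.
Note that (provided that $y(0)\in Y$) the last two constraints of (1) can be rewritten as one:
$$u(t)\in\hat U(y(t)),\qquad t=0,1,\dots,$$
where the map $\hat U(\cdot)$ is defined by the equation
$$\hat U(y):=\{u\in U(y):\ f(y,u)\in Y\},\qquad y\in Y.$$
As can be readily verified, the map $\hat U(\cdot)$ is upper semicontinuous and its graph $\hat G$,
$$\hat G:=\{(y,u):\ y\in Y,\ u\in\hat U(y)\},$$
is a compact subset of $Y\times U$.
A control $u(\cdot)=\{u(t)\}$ and the pair $(y(\cdot),u(\cdot))$ will be called an admissible control and, respectively, an admissible process if the relationships (1) are satisfied with $y(0)=y_0$. The sets of admissible controls will be denoted by $\mathcal{U}(y_0)$ or $\mathcal{U}_N(y_0)$, depending on whether the problem is considered on the infinite time horizon ($t=0,1,\dots$) or on a finite time sequence ($t=0,1,\dots,N-1$, where $N$ is a positive integer).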
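Consider the optimal control problem
$$V_\alpha(y_0):=\min_{u(\cdot)\in\mathcal{U}(y_0)}\sum_{t=0}^{\infty}\alpha^t k(y(t),u(t)), \tag{2}$$
where $k(\cdot,\cdot):Y\times U\to\mathbb{R}$ is a continuous function and $\alpha\in(0,1)$ is a discount factor. Consider also the optimal control problem
$$V_N(y_0):=\min_{u(\cdot)\in\mathcal{U}_N(y_0)}\sum_{t=0}^{N-1}k(y(t),u(t)). \tag{3}$$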
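As an illustration only, the two criteria in (2) and (3) can be evaluated for a given control sequence. The sketch below uses a hypothetical toy system that is not taken from the paper: $Y=\{0,1,2,3\}$, $U=\{0,1\}$, $U(y)=U$, $f(y,u)=(y+1+u)\bmod 4$, and running cost $k(y,u)=c_y+0.5u$ with $c=(3,1,2,0)$. The infinite sum in (2) is truncated at the length of the control sequence.

```python
# Hypothetical toy system (not from the paper): Y = {0,1,2,3}, U = {0,1}, U(y) = U.
COST = [3.0, 1.0, 2.0, 0.0]

def f(y, u):
    """Dynamics y(t+1) = f(y(t), u(t)); every control keeps the state in Y."""
    return (y + 1 + u) % 4

def k(y, u):
    """Running cost k(y, u)."""
    return COST[y] + 0.5 * u

def simulate(y0, controls):
    """Trajectory y(0), ..., y(T) generated by the controls u(0), ..., u(T-1)."""
    ys = [y0]
    for u in controls:
        ys.append(f(ys[-1], u))
    return ys

def discounted_cost(y0, controls, alpha):
    """Sum in (2), truncated after len(controls) steps."""
    ys = simulate(y0, controls)
    return sum(alpha**t * k(ys[t], u) for t, u in enumerate(controls))

def finite_horizon_cost(y0, controls):
    """Sum in (3) with N = len(controls)."""
    ys = simulate(y0, controls)
    return sum(k(ys[t], u) for t, u in enumerate(controls))

print(simulate(3, [1, 1, 0]))                # [3, 1, 3, 0]
print(finite_horizon_cost(3, [1, 1, 1, 1]))  # 4.0
```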
Everywhere in the paper, it is assumed that
A1. The set $\mathcal{U}(y_0)$ is not empty (that is, there exists at least one admissible control).
As shown below (see Propositions 2.1 and 2.3), the minima in (2) and (3) are achieved if A1 is satisfied. To obtain our main results, we use a stronger assumption:
A2. The set $\mathcal{U}(y_0)$ is not empty for any $y_0\in Y$.
This assumption implies non-emptiness of $\hat U(y)$ for any $y\in Y$ (systems that satisfy such a property are called viable; see [3]).
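Along with optimal control problems (2) and (3), let us consider two infinite-dimensional (ID) linear programming (LP) problems:
$$k^*_\alpha(y_0):=\min_{\gamma\in D_\alpha(y_0)}\int_{\hat G}k(y,u)\,\gamma(dy,du) \tag{4}$$
and
$$k^*:=\min_{\gamma\in D}\int_{\hat G}k(y,u)\,\gamma(dy,du), \tag{5}$$
where $D_\alpha(y_0)$ and $D$ are subsets of $\mathcal{P}(\hat G)$ (here and in what follows $\mathcal{P}(F)$ stands for the space of probability measures on Borel subsets of $F$) defined by the equations:
$$D_\alpha(y_0):=\Big\{\gamma\in\mathcal{P}(\hat G):\ \int_{\hat G}\big(\alpha\varphi(f(y,u))-\varphi(y)+(1-\alpha)\varphi(y_0)\big)\,\gamma(dy,du)=0\quad\forall\varphi\in C(Y)\Big\} \tag{6}$$
and
$$D:=\Big\{\gamma\in\mathcal{P}(\hat G):\ \int_{\hat G}\big(\varphi(f(y,u))-\varphi(y)\big)\,\gamma(dy,du)=0\quad\forall\varphi\in C(Y)\Big\}. \tag{7}$$
Note that (4) and (5) are indeed LP problems, since both the objective functions and the constraints defining $D_\alpha(y_0)$ and $D$ are linear in the "decision variable" $\gamma$. Note also that $D$ can be obtained from $D_\alpha(y_0)$ by setting $\alpha=1$.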
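When $Y$ and $U$ are finite, every $\varphi\in C(Y)$ is a linear combination of indicator functions $1_{\{z\}}$. The constraints in (6) and (7) then reduce to finitely many linear "flow balance" equations in the masses $\gamma(\{(y,u)\})$. The sketch below checks these equations for a measure stored as a dictionary. It uses a hypothetical toy system ($Y=\{0,1,2,3\}$, $U=\{0,1\}$, $f(y,u)=(y+1+u)\bmod 4$) that is not from the paper, and the function names are illustrative.

```python
# Hypothetical toy system (not from the paper): Y = {0,1,2,3}, U = {0,1}.
STATES = range(4)

def f(y, u):
    return (y + 1 + u) % 4

def balance_residuals(gamma, alpha=1.0, y0=None):
    """Left-hand sides of the constraints in (6) (alpha < 1, initial state y0)
    or (7) (alpha = 1), evaluated at phi = indicator of {z} for every z in Y.
    gamma: dict mapping (y, u) to its probability mass."""
    mass = sum(gamma.values())
    residuals = []
    for z in STATES:
        inflow = sum(p for (y, u), p in gamma.items() if f(y, u) == z)  # integral of phi(f(y,u))
        outflow = sum(p for (y, u), p in gamma.items() if y == z)       # integral of phi(y)
        source = (1 - alpha) * mass if z == y0 else 0.0                 # integral of (1-alpha)phi(y0)
        residuals.append(alpha * inflow - outflow + source)
    return residuals

def is_in_D(gamma, tol=1e-12):
    """Membership test for the set D in (7) (finite case)."""
    return all(abs(r) <= tol for r in balance_residuals(gamma))

cycle = {(1, 1): 0.5, (3, 1): 0.5}  # uniform measure on the cycle 1 -> 3 -> 1
dirac = {(0, 0): 1.0}               # point mass at (y, u) = (0, 0)
print(is_in_D(cycle), is_in_D(dirac))  # True False
```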
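In the paper, we prove that, under Assumption A2,
$$k^*_\alpha(y_0)=(1-\alpha)V_\alpha(y_0)\qquad\forall y_0\in Y, \tag{8}$$
and the limits $\lim_{\alpha\uparrow1}(1-\alpha)\min_{y_0\in Y}V_\alpha(y_0)$ and $\lim_{N\to\infty}\frac{1}{N}\min_{y_0\in Y}V_N(y_0)$ exist and are equal to $k^*$:
$$\lim_{\alpha\uparrow1}(1-\alpha)\min_{y_0\in Y}V_\alpha(y_0)=\lim_{N\to\infty}\frac{1}{N}\min_{y_0\in Y}V_N(y_0)=k^*. \tag{9}$$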
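For a finite system, the second equality in (9) can be observed numerically. The sketch below uses a hypothetical toy system that is not from the paper. It computes $\frac{1}{N}\min_{y_0}V_N(y_0)$ by backward recursion and compares it with the minimum mean cost over cycles of the graph $y\to f(y,u)$, computed with Karp's algorithm. For finite $Y$ and $U$, the extreme points of $D$ in (7) are uniform measures on simple cycles, so the minimum in (5) equals this minimum cycle mean. That is a standard fact about finite graphs and is used here as an assumption, not as a result of the paper.

```python
# Hypothetical toy system (not from the paper).
STATES = range(4)
CONTROLS = (0, 1)
COST = [3.0, 1.0, 2.0, 0.0]

def f(y, u):
    return (y + 1 + u) % 4

def k(y, u):
    return COST[y] + 0.5 * u

def min_average_cost(N):
    """(1/N) * min over y0 of V_N(y0), with V_N from (3) computed by backward recursion."""
    V = [0.0 for _ in STATES]  # value with 0 steps to go
    for _ in range(N):
        V = [min(k(y, u) + V[f(y, u)] for u in CONTROLS) for y in STATES]
    return min(V) / N

def min_cycle_mean():
    """Karp's algorithm: minimum mean cost of a cycle in the graph y -> f(y, u)."""
    n = len(STATES)
    inf = float("inf")
    # D[j][v] = least cost of a walk with exactly j edges ending at v (any start).
    D = [[0.0] * n] + [[inf] * n for _ in range(n)]
    for j in range(1, n + 1):
        for y in STATES:
            for u in CONTROLS:
                z = f(y, u)
                D[j][z] = min(D[j][z], D[j - 1][y] + k(y, u))
    return min(
        max((D[n][v] - D[j][v]) / (n - j) for j in range(n))
        for v in STATES if D[n][v] < inf
    )

for N in (10, 100, 1000):
    print(N, min_average_cost(N))
print("min cycle mean:", min_cycle_mean())  # 1.0, attained on the cycle 1 -> 3 -> 1
```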
It is worth mentioning that there exists an extensive literature devoted to the relationship between the limits of the sums $(1-\alpha)\sum_{t=0}^{\infty}\alpha^t a_t$ and $\frac{1}{N}\sum_{t=0}^{N-1}a_t$ as $\alpha\uparrow1$ and $N\to\infty$, respectively. There are many examples showing that these limits may not exist (see, e.g., [6], where relationships between the corresponding lower and upper limits were investigated). However, provided that the sequence $\{a_t\}$ is bounded, the existence of one of these limits implies the existence of the other and their equality (see, e.g., [32]). In the context of optimal control in discrete time, relationships between the lower and upper limits of $(1-\alpha)V_\alpha(y_0)$ and $\frac{1}{N}V_N(y_0)$ were studied, e.g., in [26] and [28]. The (full) aforementioned limits may not exist, and, as was shown in [26] (without the assumption about the compactness of the set of admissible states $Y$), these limits, even if they exist, may be different. As mentioned above, in this paper we establish that, under the validity of A2, the limits of the minima over the initial conditions of $(1-\alpha)V_\alpha(y_0)$ and $\frac{1}{N}V_N(y_0)$ exist and are equal to the optimal value of the IDLP problem (5).
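The discounted (Abel) and Cesàro means discussed above can be compared numerically for a single bounded sequence. The sketch assumes a hypothetical 2-periodic sequence $a_t=1.5,0.5,1.5,\dots$, whose means both tend to $1$. It truncates the discounted sum after finitely many terms, so it illustrates the equality of the two limits but does not prove it.

```python
def abel_mean(a, alpha, terms):
    """(1 - alpha) * sum_{t=0}^{terms-1} alpha**t * a(t): a truncated discounted mean."""
    return (1 - alpha) * sum(alpha**t * a(t) for t in range(terms))

def cesaro_mean(a, N):
    """(1/N) * sum_{t=0}^{N-1} a(t)."""
    return sum(a(t) for t in range(N)) / N

def a(t):
    """Hypothetical bounded 2-periodic sequence with average 1.0."""
    return 1.5 if t % 2 == 0 else 0.5

for alpha, N in [(0.9, 11), (0.99, 101), (0.999, 1001)]:
    print(alpha, abel_mean(a, alpha, 20000), N, cesaro_mean(a, N))
```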
The paper is organized as follows. Section 2 contains some preliminary results used in the sequel. In Section 3, we introduce discounted and “non-discounted” occupational measures and we reformulate problems (2) and (3) in terms of minimization over the sets of such measures. In Section 4, we establish that (8) is valid, and in Section 5 we prove the validity of (9). In this section, we also establish asymptotic properties of the sets of discounted and non-discounted occupational measures. In Section 6, we prove auxiliary results that are used in Sections 4 and 5.
2 Preliminaries
Everywhere in this and the following sections, it is assumed that A1 is satisfied.
Proposition 2.1
The minimum in (2) is achieved.
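Proof. For an admissible process $(y(\cdot),u(\cdot))$, denote $J_\alpha(y(\cdot),u(\cdot)):=\sum_{t=0}^{\infty}\alpha^t k(y(t),u(t))$. Let $u^j(\cdot)$, $j=1,2,\dots$, be a minimizing sequence of controls and let $y^j(\cdot)$ be the corresponding sequence of trajectories. By using the diagonalization argument and taking into account the compactness of $\hat G$, we can find convergent subsequences (we do not relabel) $y^j(t)\to\bar y(t)$ and $u^j(t)\to\bar u(t)$ for all $t=0,1,\dots$. By passing to the limit in the relation $y^j(t+1)=f(y^j(t),u^j(t))$ as $j\to\infty$, we conclude that the process $(\bar y(\cdot),\bar u(\cdot))$ is admissible. For any natural $T$ we have
$$\big|J_\alpha(y^j(\cdot),u^j(\cdot))-J_\alpha(\bar y(\cdot),\bar u(\cdot))\big|\le\sum_{t=0}^{T-1}\alpha^t\big|k(y^j(t),u^j(t))-k(\bar y(t),\bar u(t))\big|+2M\sum_{t=T}^{\infty}\alpha^t,\qquad M:=\max_{(y,u)\in\hat G}|k(y,u)|.$$
Take $\varepsilon>0$ and find $T$ large enough so that the second sum does not exceed $\varepsilon/2$ for all $j$. Then the first sum can be made less than $\varepsilon/2$ by taking sufficiently large $j$. Therefore, $J_\alpha(y^j(\cdot),u^j(\cdot))\to J_\alpha(\bar y(\cdot),\bar u(\cdot))$ as $j\to\infty$, which implies that the process $(\bar y(\cdot),\bar u(\cdot))$ is optimal.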
Proposition 2.2
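The optimal value function $V_\alpha(\cdot)$ is lower semicontinuous.
Proof. Take a sequence $y_0^j\to y_0$ as $j\to\infty$ such that $\mathcal{U}(y_0^j)\neq\emptyset$. Let $u^j(\cdot)$ be the corresponding sequence of minimizing controls, that is, controls such that $V_\alpha(y_0^j)=\sum_{t=0}^{\infty}\alpha^t k(y^j(t),u^j(t))$, where $y^j(\cdot)$ is the trajectory generated by $u^j(\cdot)$ with $y^j(0)=y_0^j$. We want to show that
$$\liminf_{j\to\infty}V_\alpha(y_0^j)\ge V_\alpha(y_0).$$
Without loss of generality, assume that the lower limit is reached on the same sequence $\{y_0^j\}$. Again, using the diagonalization argument and passing to a subsequence, we can assume that $(y^j(t),u^j(t))$ converges to $(\bar y(t),\bar u(t))$ for all $t$, where $\bar u(\cdot)$ is an admissible control and $\bar y(\cdot)$ is the corresponding trajectory with $\bar y(0)=y_0$. Using the same argument as in the proof of Proposition 2.1, we can show that $\sum_{t=0}^{\infty}\alpha^t k(y^j(t),u^j(t))\to\sum_{t=0}^{\infty}\alpha^t k(\bar y(t),\bar u(t))$. We have
$$\liminf_{j\to\infty}V_\alpha(y_0^j)=\lim_{j\to\infty}\sum_{t=0}^{\infty}\alpha^t k(y^j(t),u^j(t))=\sum_{t=0}^{\infty}\alpha^t k(\bar y(t),\bar u(t))\ge V_\alpha(y_0),$$
which is the required inequality.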
Proposition 2.3
The minimum in (3) is achieved and the optimal value function $V_N(\cdot)$ is lower semicontinuous.
Proof. The fact that the minimum in (3) is achieved is obvious (since it is a finite-dimensional problem on a compact set), and the fact that $V_N(\cdot)$ is lower semicontinuous is proved similarly to Proposition 2.2.
Corollary 2.4
The minima in (9) are achieved.
Proof. The proof follows from the fact that the functions $V_\alpha(\cdot)$ and $V_N(\cdot)$ are lower semicontinuous and the set $Y$ is compact.
Proposition 2.5
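For any $y\in Y$ such that $\mathcal{U}(y)\neq\emptyset$, the following equation is valid:
$$V_\alpha(y)=\min_{u\in\hat U(y)}\big\{k(y,u)+\alpha V_\alpha(f(y,u))\big\}. \tag{10}$$
Proof. The proposition is the well-known dynamic programming principle for problem (2). For completeness of the exposition, we reproduce its proof in Section 6.
For a lower semicontinuous function $\varphi(\cdot):Y\to\mathbb{R}$, let $H(y,\varphi)$ be defined as follows:
$$H(y,\varphi):=\min_{u\in\hat U(y)}\big\{k(y,u)+\alpha\big(\varphi(f(y,u))-\varphi(y)\big)\big\},\qquad y\in Y.$$
Then equation (10) can be written as
$$(1-\alpha)V_\alpha(y)=H(y,V_\alpha),$$
which resembles the Hamilton-Jacobi-Bellman equation for continuous-time systems; see, e.g., [4].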
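For a finite system with $\alpha<1$, the right-hand side of (10) is a contraction with modulus $\alpha$ in the sup norm, so $V_\alpha$ can be approximated by iterating it. The sketch below does this for a hypothetical toy system that is not from the paper. It then checks the residual of (10) and prints $(1-\alpha)\min_y V_\alpha(y)$. The function names are illustrative.

```python
# Hypothetical toy system (not from the paper).
STATES = range(4)
CONTROLS = (0, 1)
COST = [3.0, 1.0, 2.0, 0.0]

def f(y, u):
    return (y + 1 + u) % 4

def k(y, u):
    return COST[y] + 0.5 * u

def bellman(V, alpha):
    """Right-hand side of (10): y -> min over u of k(y, u) + alpha * V(f(y, u))."""
    return [min(k(y, u) + alpha * V[f(y, u)] for u in CONTROLS) for y in STATES]

def value_iteration(alpha, tol=1e-10, max_iter=1_000_000):
    """Fixed-point iteration V <- bellman(V); converges because alpha < 1."""
    V = [0.0 for _ in STATES]
    for _ in range(max_iter):
        V_next = bellman(V, alpha)
        if max(abs(a - b) for a, b in zip(V_next, V)) < tol:
            return V_next
        V = V_next
    raise RuntimeError("value iteration did not converge")

alpha = 0.99
V = value_iteration(alpha)
residual = max(abs(a - b) for a, b in zip(V, bellman(V, alpha)))
print(residual < 1e-9, (1 - alpha) * min(V))
```

For this toy system, the minimizing behaviour alternates between the states 1 and 3, which gives $(1-\alpha)\min_y V_\alpha(y)=(0.5+1.5\alpha)/(1+\alpha)$.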
3 Occupational Measure Formulations
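Let $(y(\cdot),u(\cdot))$ be an admissible process. A probability measure $\gamma\in\mathcal{P}(\hat G)$ is called the discounted occupational measure generated by the process $(y(\cdot),u(\cdot))$ if, for any Borel set $Q\subset\hat G$,
$$\gamma(Q)=(1-\alpha)\sum_{t=0}^{\infty}\alpha^t\,1_Q(y(t),u(t)), \tag{11}$$
where $1_Q(\cdot)$ is the indicator function of $Q$. A probability measure $\gamma\in\mathcal{P}(\hat G)$ is called the occupational measure generated by the process $(y(\cdot),u(\cdot))$ over the time sequence $\{0,1,\dots,N-1\}$ if, for any Borel set $Q\subset\hat G$,
$$\gamma(Q)=\frac{1}{N}\sum_{t=0}^{N-1}1_Q(y(t),u(t)). \tag{12}$$
It can be shown that if $\gamma$ is the discounted occupational measure generated by the process $(y(\cdot),u(\cdot))$, then
$$\int_{\hat G}q(y,u)\,\gamma(dy,du)=(1-\alpha)\sum_{t=0}^{\infty}\alpha^t q(y(t),u(t)) \tag{13}$$
for any bounded Borel measurable function $q(\cdot)$ on $\hat G$. Also, it can be shown that if $\gamma$ is the occupational measure generated by the process $(y(\cdot),u(\cdot))$ over the time sequence $\{0,1,\dots,N-1\}$, then
$$\int_{\hat G}q(y,u)\,\gamma(dy,du)=\frac{1}{N}\sum_{t=0}^{N-1}q(y(t),u(t)) \tag{14}$$
for any bounded Borel measurable function $q(\cdot)$ on $\hat G$.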
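In a finite-state illustration, the occupational measure (12) of a given control sequence can be stored as a dictionary and integrated as in (14). The sketch below uses a hypothetical toy system that is not from the paper. It also checks the telescoping identity $\int(\varphi(f(y,u))-\varphi(y))\,\gamma(dy,du)=(\varphi(y(N))-\varphi(y(0)))/N$, which follows from (14). Because the right-hand side is $O(1/N)$, occupational measures over long time sequences nearly satisfy the constraints that define $D$ in (7).

```python
from collections import Counter

# Hypothetical toy system (not from the paper).
COST = [3.0, 1.0, 2.0, 0.0]

def f(y, u):
    return (y + 1 + u) % 4

def k(y, u):
    return COST[y] + 0.5 * u

def occupational_measure(y0, controls):
    """Occupational measure (12) over {0, ..., N-1}, N = len(controls).
    Returns (gamma, y_N), where gamma is a dict (y, u) -> mass."""
    N = len(controls)
    counts = Counter()
    y = y0
    for u in controls:
        counts[(y, u)] += 1
        y = f(y, u)
    return {pair: c / N for pair, c in counts.items()}, y

def integrate(q, gamma):
    """Integral of q(y, u) with respect to gamma, cf. (14)."""
    return sum(q(y, u) * p for (y, u), p in gamma.items())

def phi(y):
    """An arbitrary test function on Y."""
    return float(y * y)

controls = [0, 1, 1, 0, 0, 1, 0]
gamma, y_N = occupational_measure(0, controls)
lhs = integrate(lambda y, u: phi(f(y, u)) - phi(y), gamma)
rhs = (phi(y_N) - phi(0)) / len(controls)
print(y_N, abs(lhs - rhs) < 1e-12)  # 2 True
```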
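To describe convergence properties of occupational measures, we introduce the following metric on $\mathcal{P}(\hat G)$:
$$\rho(\gamma',\gamma''):=\sum_{l=1}^{\infty}\frac{1}{2^l}\Big|\int_{\hat G}q_l(y,u)\,\gamma'(dy,du)-\int_{\hat G}q_l(y,u)\,\gamma''(dy,du)\Big|$$
for $\gamma',\gamma''\in\mathcal{P}(\hat G)$, where $q_l(\cdot)$, $l=1,2,\dots$, is a sequence of Lipschitz continuous functions dense in the unit ball of the space $C(\hat G)$ of continuous functions from $\hat G$ to $\mathbb{R}$. This metric is consistent with the weak$^*$ convergence topology on $\mathcal{P}(\hat G)$; that is, a sequence $\gamma^k\in\mathcal{P}(\hat G)$ converges to $\gamma\in\mathcal{P}(\hat G)$ in this metric if and only if
$$\lim_{k\to\infty}\int_{\hat G}q(y,u)\,\gamma^k(dy,du)=\int_{\hat G}q(y,u)\,\gamma(dy,du)$$
for any $q(\cdot)\in C(\hat G)$. Note that the sets $D_\alpha(y_0)$ and $D$ are compact in this topology.
Using the metric $\rho$, we can define the "distance" between $\gamma\in\mathcal{P}(\hat G)$ and $\Gamma\subset\mathcal{P}(\hat G)$ and the Hausdorff metric between $\Gamma_1\subset\mathcal{P}(\hat G)$ and $\Gamma_2\subset\mathcal{P}(\hat G)$ as follows:
$$\rho(\gamma,\Gamma):=\inf_{\gamma'\in\Gamma}\rho(\gamma,\gamma'),\qquad\rho_H(\Gamma_1,\Gamma_2):=\max\Big\{\sup_{\gamma\in\Gamma_1}\rho(\gamma,\Gamma_2),\ \sup_{\gamma\in\Gamma_2}\rho(\gamma,\Gamma_1)\Big\}.$$
Note that, although by some abuse of terminology we refer to $\rho_H(\cdot,\cdot)$ as a metric on the set of subsets of $\mathcal{P}(\hat G)$, it is, in fact, a semimetric on this set. Indeed, $\rho_H(\Gamma_1,\Gamma_2)=0$ implies $\Gamma_1=\Gamma_2$ if $\Gamma_1$ and $\Gamma_2$ are closed, but this equality may fail if at least one of these sets is not closed.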
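For finitely supported measures, truncated versions of $\rho$ and $\rho_H$ can be computed directly. The sketch below keeps only the first three test functions $q_l$. These are hypothetical choices with values in $[0,1]$ on a toy set $Y=\{0,1,2,3\}$, $U=\{0,1\}$; the metric in the text uses an infinite dense sequence, and the truncation is only a pseudometric. The sets $\Gamma_1$ and $\Gamma_2$ are assumed to be finite lists of measures.

```python
# Truncated rho and rho_H for finitely supported measures (dicts (y, u) -> mass).
# The test functions are hypothetical choices with values in [0, 1] on Y = {0,1,2,3}, U = {0,1}.
TESTS = [
    lambda y, u: y / 3.0,
    lambda y, u: float(u),
    lambda y, u: (y / 3.0) ** 2,
]

def integral(q, gamma):
    return sum(q(y, u) * p for (y, u), p in gamma.items())

def rho(g1, g2, tests=TESTS):
    """Sum over l of 2**(-l) * |int q_l dg1 - int q_l dg2|, using only the given q_l."""
    return sum(abs(integral(q, g1) - integral(q, g2)) / 2**l
               for l, q in enumerate(tests, start=1))

def rho_to_set(g, Gamma, tests=TESTS):
    """Distance from a measure to a finite set of measures."""
    return min(rho(g, h, tests) for h in Gamma)

def hausdorff(Gamma1, Gamma2, tests=TESTS):
    """Hausdorff distance between two finite sets of measures."""
    return max(max(rho_to_set(g, Gamma2, tests) for g in Gamma1),
               max(rho_to_set(g, Gamma1, tests) for g in Gamma2))

a = {(0, 0): 1.0}  # Dirac measure at (0, 0)
b = {(3, 1): 1.0}  # Dirac measure at (3, 1)
print(rho(a, b), hausdorff([a], [a, b]))  # 0.875 0.875
```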
Introduce the following notation for the sets of occupational measures:
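$$W_\alpha(y_0):=\{\gamma\in\mathcal{P}(\hat G):\ \gamma\ \text{is the discounted occupational measure generated by an admissible process with}\ y(0)=y_0\},$$
$$W_N(y_0):=\{\gamma\in\mathcal{P}(\hat G):\ \gamma\ \text{is the occupational measure generated over}\ \{0,\dots,N-1\}\ \text{by an admissible process with}\ y(0)=y_0\}.$$
Due to (13) and (14), problems (2) and (3) can be rewritten in the form
$$(1-\alpha)V_\alpha(y_0)=\min_{\gamma\in W_\alpha(y_0)}\int_{\hat G}k(y,u)\,\gamma(dy,du) \tag{15}$$
and
$$\frac{1}{N}V_N(y_0)=\min_{\gamma\in W_N(y_0)}\int_{\hat G}k(y,u)\,\gamma(dy,du),$$
respectively.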
4 Validity of (8)
Proposition 4.1
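The inclusion $W_\alpha(y_0)\subset D_\alpha(y_0)$ is true.
Proof. For arbitrary $\varphi(\cdot)\in C(Y)$ and an admissible process $(y(\cdot),u(\cdot))$ with $y(0)=y_0$, we have
$$\sum_{t=0}^{\infty}\alpha^t\big(\alpha\varphi(y(t+1))-\varphi(y(t))\big)=-\varphi(y_0).$$
Multiplying both sides by $(1-\alpha)$ and taking into account (13) (with $y(t+1)=f(y(t),u(t))$), we obtain
$$\int_{\hat G}\big(\alpha\varphi(f(y,u))-\varphi(y)\big)\,\gamma(dy,du)=-(1-\alpha)\varphi(y_0),$$
where $\gamma$ is the discounted occupational measure generated by $(y(\cdot),u(\cdot))$. The latter is equivalent to
$$\int_{\hat G}\big(\alpha\varphi(f(y,u))-\varphi(y)+(1-\alpha)\varphi(y_0)\big)\,\gamma(dy,du)=0.$$
This implies that $\gamma\in D_\alpha(y_0)$, which concludes the proof of the proposition.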
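Proposition 4.1 can be checked numerically in a finite setting. The truncated discounted occupational measure (11) of an admissible process satisfies the constraints in (6), tested against indicator functions, up to a truncation error of order $\alpha^T$. The toy system and the feedback rule below are hypothetical and not from the paper.

```python
# Hypothetical toy system (not from the paper).
STATES = range(4)

def f(y, u):
    return (y + 1 + u) % 4

def discounted_occupational_measure(y0, policy, alpha, T):
    """Measure (11) truncated after T steps, for the feedback control u(t) = policy(y(t))."""
    gamma = {}
    y = y0
    for t in range(T):
        u = policy(y)
        gamma[(y, u)] = gamma.get((y, u), 0.0) + (1 - alpha) * alpha**t
        y = f(y, u)
    return gamma

def residuals_D_alpha(gamma, alpha, y0):
    """Left-hand side of the constraint in (6) for phi = indicator of {z}, z in Y."""
    mass = sum(gamma.values())
    out = []
    for z in STATES:
        inflow = sum(p for (y, u), p in gamma.items() if f(y, u) == z)
        outflow = sum(p for (y, u), p in gamma.items() if y == z)
        source = (1 - alpha) * mass if z == y0 else 0.0
        out.append(alpha * inflow - outflow + source)
    return out

def policy(y):
    """Hypothetical feedback rule: 0 -> 1 -> 3 -> 1 -> 3 -> ..."""
    return 1 if y in (1, 3) else 0

gamma = discounted_occupational_measure(0, policy, alpha=0.9, T=400)
print(max(abs(r) for r in residuals_D_alpha(gamma, 0.9, 0)) < 1e-12)  # True
```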
Remark 4.2
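Due to the assumed validity of A1, $W_\alpha(y_0)\neq\emptyset$ and, hence, $D_\alpha(y_0)\neq\emptyset$.
Note that from Proposition 4.1 it follows that
$$k^*_\alpha(y_0)\le(1-\alpha)V_\alpha(y_0). \tag{16}$$
Let $\Phi$ be the class of bounded lower semicontinuous functions from $Y$ to $\mathbb{R}$. Note that $V_\alpha(\cdot)\in\Phi$ if Assumption A2 is satisfied. In fact, in this case $V_\alpha(\cdot)$ is lower semicontinuous by Proposition 2.2, and
$$\frac{1}{1-\alpha}\min_{(y,u)\in\hat G}k(y,u)\le V_\alpha(y)\le\frac{1}{1-\alpha}\max_{(y,u)\in\hat G}k(y,u)\qquad\forall y\in Y.$$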
From this point on, it is everywhere assumed that Assumption A2 is indeed satisfied.
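Consider the max-min problem
$$d^*_\alpha(y_0):=\sup_{\eta(\cdot)\in\Phi}\ \inf_{(y,u)\in\hat G}\big\{k(y,u)+\alpha\eta(f(y,u))-\eta(y)+(1-\alpha)\eta(y_0)\big\}. \tag{18}$$
We say that $\eta(\cdot)\in\Phi$ is a solution of (18) if
$$\inf_{(y,u)\in\hat G}\big\{k(y,u)+\alpha\eta(f(y,u))-\eta(y)+(1-\alpha)\eta(y_0)\big\}=d^*_\alpha(y_0).$$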
Our first main result is the following theorem.
Theorem 4.3
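The optimal values in problems (4) and (18) coincide and are equal to the optimal value of (2) multiplied by $(1-\alpha)$, that is,
$$k^*_\alpha(y_0)=d^*_\alpha(y_0)=(1-\alpha)V_\alpha(y_0). \tag{19}$$
Moreover, the supremum in (18) is reached at $\eta(\cdot)=V_\alpha(\cdot)$.
Proof. From Proposition 2.5 we have
$$V_\alpha(y)\le k(y,u)+\alpha V_\alpha(f(y,u))\qquad\forall(y,u)\in\hat G,$$
which implies that
$$(1-\alpha)V_\alpha(y_0)\le k(y,u)+\alpha V_\alpha(f(y,u))-V_\alpha(y)+(1-\alpha)V_\alpha(y_0)\qquad\forall(y,u)\in\hat G.$$
Therefore,
$$(1-\alpha)V_\alpha(y_0)\le\inf_{(y,u)\in\hat G}\big\{k(y,u)+\alpha V_\alpha(f(y,u))-V_\alpha(y)+(1-\alpha)V_\alpha(y_0)\big\}\le d^*_\alpha(y_0). \tag{20}$$
Taking into account (16), we get
$$k^*_\alpha(y_0)\le(1-\alpha)V_\alpha(y_0)\le d^*_\alpha(y_0). \tag{21}$$
Let us show the opposite inequality. For $\eta(\cdot)\in\Phi$ denote
$$d(\eta):=\inf_{(y,u)\in\hat G}\big\{k(y,u)+\alpha\eta(f(y,u))-\eta(y)+(1-\alpha)\eta(y_0)\big\}, \tag{22}$$
so that $d^*_\alpha(y_0)=\sup_{\eta(\cdot)\in\Phi}d(\eta)$. Take $\eta(\cdot)\in\Phi$ and $\gamma\in D_\alpha(y_0)$ arbitrary, and let $\varphi_l(\cdot)\in C(Y)$, $l=1,2,\dots$, be a bounded sequence of continuous functions such that $\varphi_l(y)\to\eta(y)$ point-wise on $Y$ as $l\to\infty$ (such a sequence exists since $\eta(\cdot)$ is bounded and lower semicontinuous; see, e.g., Theorem A6.6 in [2]). From (22), from the Lebesgue dominated convergence theorem and from the definition of $D_\alpha(y_0)$, it follows that
$$\int_{\hat G}k\,d\gamma=\lim_{l\to\infty}\int_{\hat G}\big(k+\alpha\varphi_l(f(y,u))-\varphi_l(y)+(1-\alpha)\varphi_l(y_0)\big)\,d\gamma=\int_{\hat G}\big(k+\alpha\eta(f(y,u))-\eta(y)+(1-\alpha)\eta(y_0)\big)\,d\gamma\ge d(\eta).$$
Taking the supremum with respect to $\eta(\cdot)\in\Phi$ and the minimum with respect to $\gamma\in D_\alpha(y_0)$ leads to $k^*_\alpha(y_0)\ge d^*_\alpha(y_0)$, which, together with (21), implies (19). It also follows from (20) and (19) that
$$d(V_\alpha)\ge(1-\alpha)V_\alpha(y_0)=d^*_\alpha(y_0),$$
which implies the second part of the theorem.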
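In a finite setting, the second part of Theorem 4.3 can be observed numerically. With $\eta=V_\alpha$, the infimum in (18) equals $(1-\alpha)V_\alpha(y_0)$, whereas another bounded $\eta$ (here $\eta\equiv0$) gives a value that does not exceed it. The sketch computes $V_\alpha$ by value iteration for a hypothetical toy system that is not from the paper; `dual_value` is an illustrative name for the expression under the supremum in (18).

```python
# Hypothetical toy system (not from the paper).
STATES = range(4)
CONTROLS = (0, 1)
COST = [3.0, 1.0, 2.0, 0.0]

def f(y, u):
    return (y + 1 + u) % 4

def k(y, u):
    return COST[y] + 0.5 * u

def value_iteration(alpha, tol=1e-12):
    """Approximate V_alpha by iterating the dynamic programming equation (10)."""
    V = [0.0 for _ in STATES]
    while True:
        V_next = [min(k(y, u) + alpha * V[f(y, u)] for u in CONTROLS) for y in STATES]
        if max(abs(a - b) for a, b in zip(V_next, V)) < tol:
            return V_next
        V = V_next

def dual_value(eta, alpha, y0):
    """Infimum over G-hat of k + alpha*eta(f) - eta + (1 - alpha)*eta(y0), cf. (18)."""
    return min(k(y, u) + alpha * eta[f(y, u)] - eta[y] + (1 - alpha) * eta[y0]
               for y in STATES for u in CONTROLS)

alpha = 0.9
V = value_iteration(alpha)
for y0 in STATES:
    # eta = V_alpha attains (1 - alpha) * V_alpha(y0); eta = 0 gives min k, which is smaller here.
    print(y0, dual_value(V, alpha, y0), (1 - alpha) * V[y0], dual_value([0.0] * 4, alpha, y0))
```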
Corollary 4.4
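The following equality is valid:
$$\overline{\mathrm{co}}\,W_\alpha(y_0)=D_\alpha(y_0), \tag{23}$$
where $\overline{\mathrm{co}}$ stands for the closure of the convex hull of the corresponding set.
Proof. Due to (4) and (15), the equality (8) can be rewritten in the form
$$\min_{\gamma\in D_\alpha(y_0)}\int_{\hat G}k(y,u)\,\gamma(dy,du)=\min_{\gamma\in W_\alpha(y_0)}\int_{\hat G}k(y,u)\,\gamma(dy,du),$$
which implies that
$$\min_{\gamma\in D_\alpha(y_0)}\int_{\hat G}k(y,u)\,\gamma(dy,du)=\min_{\gamma\in\overline{\mathrm{co}}\,W_\alpha(y_0)}\int_{\hat G}k(y,u)\,\gamma(dy,du).$$
The latter is valid for any continuous $k(\cdot)$. Moreover, $\overline{\mathrm{co}}\,W_\alpha(y_0)\subset D_\alpha(y_0)$ by Proposition 4.1, since $D_\alpha(y_0)$ is convex and closed. Hence, by the separation theorem, (23) holds.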
Remark 4.5
Note that problem (18) can be shown to be equivalent to the problem dual to the IDLP problem (4) (see the Appendix of [15]), with the equality of the optimal values being a part of the duality relationships between these two problems.
5 Validity of (9)
Let us introduce the following notation:
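$$v^*_\alpha:=(1-\alpha)\min_{y_0\in Y}V_\alpha(y_0),\qquad v^*_N:=\frac{1}{N}\min_{y_0\in Y}V_N(y_0), \tag{24}$$
where the minimization is over admissible controls and over the initial conditions in $Y$.
The main results of this section are Theorems 5.1 and 5.7 below. In Theorem 5.1 we, in particular, establish existence and equality of the limits in (9). Theorem 5.7 deals with a limiting property of the sets of occupational measures and is closely related to Theorem 5.1. Continuous-time analogs of Theorems 5.1 and 5.7 are proved in [15], Chapter 6. However, in continuous time, as opposed to discrete time, several strong assumptions are needed for the validity of the corresponding results (e.g., Lipschitz continuity of the value function).
Let
$$d^*:=\sup_{\eta(\cdot)\in\Phi}\ \inf_{(y,u)\in\hat G}\big\{k(y,u)+\eta(f(y,u))-\eta(y)\big\}. \tag{25}$$
Theorem 5.1
The limits $\lim_{\alpha\uparrow1}v^*_\alpha$ and $\lim_{N\to\infty}v^*_N$ exist and
$$\lim_{\alpha\uparrow1}v^*_\alpha=\lim_{N\to\infty}v^*_N=k^*.$$
The proof is broken down into a series of propositions and lemmas.
Proposition 5.2
The equality $d^*=k^*$ holds true.
Proof. Take any $\eta(\cdot)\in\Phi$. Consider the inequality
$$\inf_{(y',u')\in\hat G}\big\{k(y',u')+\eta(f(y',u'))-\eta(y')\big\}\le k(y,u)+\eta(f(y,u))-\eta(y),\qquad(y,u)\in\hat G.$$
Integrate it with respect to an arbitrary $\gamma\in D$. By approximating $\eta(\cdot)$ with continuous functions as in the proof of Theorem 4.3, we have $\int_{\hat G}(\eta(f(y,u))-\eta(y))\,\gamma(dy,du)=0$. We therefore obtain
$$\inf_{(y',u')\in\hat G}\big\{k(y',u')+\eta(f(y',u'))-\eta(y')\big\}\le\int_{\hat G}k(y,u)\,\gamma(dy,du).$$
Taking the minimum with respect to $\gamma\in D$ and the supremum with respect to $\eta(\cdot)\in\Phi$, we conclude that
$$d^*\le k^*. \tag{26}$$
Let us show the opposite inequality. Define
$$d^*_C:=\sup_{\eta(\cdot)\in C(Y)}\ \min_{(y,u)\in\hat G}\big\{k(y,u)+\eta(f(y,u))-\eta(y)\big\}, \tag{27}$$
that is, compared to (25), the supremum in the formula above is taken with respect to continuous, rather than lower semicontinuous bounded, functions. Since $C(Y)\subset\Phi$, it is clear that
$$d^*_C\le d^*. \tag{28}$$
Therefore, it remains to show that $d^*_C\ge k^*$.
Let $\varphi_i(\cdot)$, $i=1,2,\dots$, be a sequence of functions in $C(Y)$ with the following properties: (i) any finite collection of functions from this sequence is linearly independent on $Y$; (ii) for any $\varphi(\cdot)\in C(Y)$ and any $\delta>0$, there exist $N$ and scalars $\beta_i$, $i=1,\dots,N$, such that $\max_{y\in Y}\big|\varphi(y)-\sum_{i=1}^{N}\beta_i\varphi_i(y)\big|\le\delta$. (An example of such a sequence is the sequence of monomials $y_1^{i_1}\cdots y_m^{i_m}$, $i_1,\dots,i_m=0,1,\dots$, where $y_j$ stands for the $j$th component of $y$.)
Let us notice first that for any $\varphi(\cdot)\in C(Y)$ we have
$$\min_{(y,u)\in\hat G}\big\{\varphi(f(y,u))-\varphi(y)\big\}\le0. \tag{29}$$
Indeed, if this were not the case, then for $\eta(\cdot):=c\varphi(\cdot)$ with a positive integer $c$ we would get
$$\min_{(y,u)\in\hat G}\big\{k(y,u)+\eta(f(y,u))-\eta(y)\big\}\ge\min_{(y,u)\in\hat G}k(y,u)+c\min_{(y,u)\in\hat G}\big\{\varphi(f(y,u))-\varphi(y)\big\}\to\infty\quad\text{as }c\to\infty,$$
which contradicts the boundedness of $d^*_C$.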
Assume that functions are normalized so that . Define by
[TABLE]
It’s easy to see that the set is compact and for any the point does not belong to where 0 is the zero element of (otherwise, is not the minimum in (5)). Due to Hahn-Banach separation theorem (see, e.g., [11], Section V.2) there exists a sequence (where ) such that
[TABLE]
where for all and . From the last formula it is easy to see that . Let us show that, in fact, . Indeed, if it was not the case and , then we would have
[TABLE]
which is a contradiction to (29). Thus, . Dividing (30) through by we obtain
[TABLE]
Therefore, . Taking into account inequalities (26) and (28) we conclude that .
Proposition 5.3
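The limit $\lim_{\alpha\uparrow1}v^*_\alpha$ exists and is equal to $k^*$.
Proof. Let us show that
$$\alpha_k\uparrow1,\quad y_0^k\in Y,\quad\gamma_k\in D_{\alpha_k}(y_0^k),\quad\gamma_k\to\gamma\quad\Longrightarrow\quad\gamma\in D. \tag{31}$$
Indeed, let $\alpha_k\uparrow1$, $y_0^k\in Y$ and $\gamma_k\in D_{\alpha_k}(y_0^k)$ be such that $\gamma_k\to\gamma$. We have
$$\int_{\hat G}\big(\alpha_k\varphi(f(y,u))-\varphi(y)+(1-\alpha_k)\varphi(y_0^k)\big)\,\gamma_k(dy,du)=0\qquad\forall\varphi(\cdot)\in C(Y).$$
Passing to the limit as $k\to\infty$ in this equality, we obtain $\int_{\hat G}(\varphi(f(y,u))-\varphi(y))\,\gamma(dy,du)=0$ for all $\varphi(\cdot)\in C(Y)$. Therefore, $\gamma\in D$, i.e., (31) holds. It follows from (31) and (19) that
$$\liminf_{\alpha\uparrow1}v^*_\alpha\ge k^*. \tag{32}$$
From (10) it follows that for any $(y,u)\in\hat G$ we have
$$V_\alpha(y)\le k(y,u)+\alpha V_\alpha(f(y,u)).$$
Therefore,
$$(1-\alpha)V_\alpha(f(y,u))\le k(y,u)+V_\alpha(f(y,u))-V_\alpha(y)\qquad\forall(y,u)\in\hat G.$$
Consequently,
$$v^*_\alpha\le\inf_{(y,u)\in\hat G}\big\{k(y,u)+V_\alpha(f(y,u))-V_\alpha(y)\big\}$$
and, since $V_\alpha(\cdot)\in\Phi$,
$$v^*_\alpha\le d^*.$$
Along with Proposition 5.2, the latter implies
$$v^*_\alpha\le k^*\qquad\forall\alpha\in(0,1).$$
The assertion of the proposition follows from this relation and (32).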
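The upper bound $(1-\alpha)\min_{y_0}V_\alpha(y_0)\le k^*$ obtained in the proof above holds for every $\alpha\in(0,1)$, not only in the limit. The sketch below illustrates this on a hypothetical toy system that is not from the paper. Its $k^*$ is taken to be the minimum cycle mean $1.0$ of the system; that the two coincide for finite systems is a standard graph-theoretic fact, assumed here rather than taken from the paper.

```python
# Hypothetical toy system (not from the paper).
STATES = range(4)
CONTROLS = (0, 1)
COST = [3.0, 1.0, 2.0, 0.0]
K_STAR = 1.0  # minimum cycle mean of this system (cycle 1 -> 3 -> 1), taken as k*

def f(y, u):
    return (y + 1 + u) % 4

def k(y, u):
    return COST[y] + 0.5 * u

def scaled_min_value(alpha, tol=1e-10):
    """(1 - alpha) * min over y0 of V_alpha(y0), with V_alpha obtained by iterating (10)."""
    V = [0.0 for _ in STATES]
    while True:
        V_next = [min(k(y, u) + alpha * V[f(y, u)] for u in CONTROLS) for y in STATES]
        if max(abs(a - b) for a, b in zip(V_next, V)) < tol:
            return (1 - alpha) * min(V_next)
        V = V_next

values = {alpha: scaled_min_value(alpha) for alpha in (0.5, 0.9, 0.99)}
for alpha, v in values.items():
    print(alpha, v, v <= K_STAR)
```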
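The following two lemmas, proved in the Appendix, are discrete-time analogs of [19], Lemma 3.5 (ii) and [20], Lemma 3.8. For $x\in\mathbb{R}$, the notation $\lfloor x\rfloor$ stands for the integer part of $x$.
Lemma 5.4
Let $g(\cdot):\{0,1,\dots\}\to\mathbb{R}$ be a function such that $|g(t)|\le M$ for all $t$. Let $\alpha\in(0,1)$ and
$$\sigma:=(1-\alpha)\sum_{t=0}^{\infty}\alpha^t g(t).$$
Then for any $\varepsilon>0$ there exists a positive integer $N$ satisfying
[TABLE]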
Lemma 5.5
Let be a function such that for all . Let be an arbitrary positive integer and
[TABLE]
For any there exists such that
[TABLE]
Moreover,
[TABLE]
Proposition 5.6
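The limit $\lim_{N\to\infty}v^*_N$ exists and is equal to $k^*$.
Proof. Let us show first that
$$N_k\to\infty,\quad\gamma_k\in\bigcup_{y_0\in Y}W_{N_k}(y_0),\quad\gamma_k\to\gamma\quad\Longrightarrow\quad\gamma\in D. \tag{38}$$
Take a sequence $N_k\to\infty$ as $k\to\infty$ and let $\gamma_k\in\bigcup_{y_0\in Y}W_{N_k}(y_0)$ be such that $\gamma_k\to\gamma$. Since $\gamma_k\in\bigcup_{y_0\in Y}W_{N_k}(y_0)$, there exist an initial condition $y_0^k$ and a control $u_k(\cdot)$ such that, for the corresponding trajectory $y_k(\cdot)$ and any $\varphi(\cdot)\in C(Y)$, we have
$$\int_{\hat G}\big(\varphi(f(y,u))-\varphi(y)\big)\,\gamma_k(dy,du)=\frac{1}{N_k}\sum_{t=0}^{N_k-1}\big(\varphi(y_k(t+1))-\varphi(y_k(t))\big)=\frac{\varphi(y_k(N_k))-\varphi(y_0^k)}{N_k}.$$
Therefore,
$$\int_{\hat G}\big(\varphi(f(y,u))-\varphi(y)\big)\,\gamma(dy,du)=\lim_{k\to\infty}\frac{\varphi(y_k(N_k))-\varphi(y_0^k)}{N_k}=0$$
due to the boundedness of $\varphi(\cdot)$. Thus, $\gamma\in D$, i.e., inclusion (38) holds, which implies that
$$\liminf_{N\to\infty}v^*_N\ge k^*. \tag{39}$$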
Take a sequence . Due to Proposition 5.3 there exists a sequence of initial conditions , controls and the corresponding trajectories such that
[TABLE]
where . Applying Lemma 5.4 with and we conclude that there exists a sequence , such that ( is a constant independent of ) and
[TABLE]
therefore, . Together with (39) this implies that
[TABLE]
The latter means that
[TABLE]
where . Let us apply Lemma 5.5 in which plays the role of and . Set , denote the value corresponding to by and . We conclude that as and
[TABLE]
Let , . Note that is an admissible process. It follows from (42) that
[TABLE]
hence,
[TABLE]
which, along with (41), completes the proof of the proposition.
Combining the assertions of Propositions 5.2, 5.3, and 5.6, we complete the proof of Theorem 5.1.
The theorem below asserts the convergence of the sets of occupational measures $W_\alpha(y_0)$ and $W_N(y_0)$ defined in Section 3 to the set given by (43).
Theorem 5.7
The following holds:
[TABLE]
Proof. The assertion of Proposition 5.3 in terms of occupational measures can be written as
[TABLE]
which, due to the linearity of the integral with respect to $\gamma$, implies that
[TABLE]
Since $k(\cdot)$ in the equality above can be any continuous function, we can write
[TABLE]
Denote
[TABLE]
Due to (31) we have
[TABLE]
which, due to the convexity of $D$, implies that
[TABLE]
that is,
[TABLE]
From the inclusion
[TABLE]
proved in Proposition 4.1, by taking the union with respect to $y_0\in Y$ and then the closure of the convex hull, we conclude that
[TABLE]
Therefore, from (45) we get
[TABLE]
To complete the proof of the equality
[TABLE]
it remains to show that
[TABLE]
The proof of this relation is based on formula (43) and the weak$^*$ separation theorem. It follows the same steps as the proof of Proposition 6.1 in [15], starting with formula (6.6). The only difference is that the discount rate, which approaches 0 in [15], should be replaced with the discount factor $\alpha$, which approaches 1. We do not reproduce this proof here.
The proof of the second equality of the theorem is very similar to the proof of (46). Namely, Proposition 5.6 can be written in terms of occupational measures as
[TABLE]
which implies that
[TABLE]
Further, from (38) we derive that (cf. (44)-(45))
[TABLE]
The rest of the proof follows from (47) and (48) using weak∗ separation theorem following the lines of [15], as described above.
6 Appendix
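Proof of Proposition 2.5. We have
$$V_\alpha(y_0)=\min_{u(0)\in\hat U(y_0)}\Big\{k(y_0,u(0))+\min_{u(1),u(2),\dots}\sum_{t=1}^{\infty}\alpha^t k(y(t),u(t))\Big\},$$
where the inner minimum is taken over the controls $u(1),u(2),\dots$ that are admissible for the system starting from $y(1)=f(y_0,u(0))$. The second minimum is equal to $\alpha V_\alpha(f(y_0,u(0)))$; therefore,
$$V_\alpha(y_0)=\min_{u(0)\in\hat U(y_0)}\big\{k(y_0,u(0))+\alpha V_\alpha(f(y_0,u(0)))\big\}.$$
Replacing now $y_0$ and $u(0)$ with $y$ and $u$, respectively, we obtain relation (10).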
Lemma 6.1
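([19], Lemma 3.5 (ii)) Let $h(\cdot):[0,\infty)\to\mathbb{R}$ be a measurable function such that $|h(t)|\le M$ for a.a. $t\ge0$. Let $\lambda>0$ be arbitrary and
$$\sigma:=\lambda\int_0^{\infty}e^{-\lambda t}h(t)\,dt. \tag{49}$$
Then for any $\varepsilon>0$ there exists $T>0$ satisfying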
[TABLE]
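Proof of Lemma 5.4. Lemma 5.4 is a discrete-time analog of Lemma 6.1.
Define the piecewise constant function $h(\cdot):[0,\infty)\to\mathbb{R}$ by
$$h(t):=g(\lfloor t\rfloor),\qquad t\ge0,$$
where $g(\cdot)$ is the function from Lemma 5.4, and apply Lemma 6.1 with $\lambda:=-\ln\alpha$. Let us first evaluate $\sigma$ given by (49). For $t\in\{0,1,\dots\}$ we have
$$\lambda\int_t^{t+1}e^{-\lambda s}\,ds=e^{-\lambda t}-e^{-\lambda(t+1)}=\alpha^t(1-\alpha),$$
therefore,
$$\lambda\int_0^{\infty}e^{-\lambda s}h(s)\,ds=\sum_{t=0}^{\infty}g(t)\,\lambda\int_t^{t+1}e^{-\lambda s}\,ds=(1-\alpha)\sum_{t=0}^{\infty}\alpha^t g(t),$$
that is, $\sigma$ given by (49) coincides with $\sigma$ defined in Lemma 5.4.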
Due to Lemma 6.1, there exists $\tilde T\ge\varepsilon/\big((4M+4|\sigma|+\varepsilon)(-\ln\alpha)\big)$ such that
[TABLE]
In the case if , then and inequality (35) holds in the form
[TABLE]
with . Assume, therefore, that .
Let and denote . We have
[TABLE]
For the second integral we have
[TABLE]
Taking into account that for we have
[TABLE]
therefore, in the case if , for the first integral on the right hand side of (52) we have
[TABLE]
If , then and the inequality above still holds. Thus, we obtain from (52)-(54), that
[TABLE]
and (35) follows from (51) and (55).
Proof of Lemma 5.5. Let If then the statement of the lemma holds with . Assume, therefore, that and set
[TABLE]
Let us show that this satisfies the required properties. Indeed, due to the definition of , hence, . Let us show that (36) is satisfied. Assume the contrary, that is, there exists such that . This implies that
[TABLE]
which contradicts the definition of .
Let us show now that as . We have
[TABLE]
This can be equivalently written as
[TABLE]
or,
[TABLE]
which implies that as , that is, (37) holds.
- [1] D. Adelman and D. Klabjan, Duality and existence of optimal policies in generalized joint replenishment, Mathematics of Operations Research, 30(1) (2005), 28–50.
- [2] R. Ash, “Measure, Integration and Functional Analysis”, Academic Press, 2014.
- [3] J.-P. Aubin, “Viability Theory”, Birkhäuser, 1991.
- [4] M. Bardi and I. Capuzzo-Dolcetta, “Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations”, Systems and Control: Foundations and Applications, Birkhäuser, Boston, 1997.
- [5] A.G. Bhatt and V.S. Borkar, Occupation measures for controlled Markov processes: characterization and optimality, Annals of Probability, 24 (1996), 1531–1562.
- [6] C.J. Bishop, E.A. Feinberg and J. Zhang, Examples concerning Abel and Cesàro limits, Journal of Mathematical Analysis and Applications, 420 (2014), 1654–1661.
- [7] J. Blot, A Pontryagin principle for infinite-horizon problems under constraints, Dynamics of Continuous, Discrete and Impulsive Systems Series B: Applications and Algorithms, 19 (2012), 267–275.
- [8] V.S. Borkar, A convex analytic approach to Markov decision processes, Probability Theory and Related Fields, 78 (1988), 583–602.
