On the adjoint Markov policies in stochastic differential games
N.V. Krylov

TL;DR
This paper introduces a method for constructing near-optimal strategies in stochastic differential games using adjoint Markov strategies, which are based on a coupled system of the original and adjoint stochastic equations.
Contribution
It proposes a novel approach to find $\varepsilon$-optimal strategies via adjoint Markov policies linked to a modified Isaacs equation, expanding the toolkit for stochastic differential games.
Findings
Constructed $\varepsilon$-optimal strategies using adjoint Markov policies.
Showed solvability of a modified Isaacs equation in Sobolev spaces.
Provided an example where assumptions fail and $\varepsilon$-optimal strategies may not exist.
127 Vincent Hall, University of Minnesota, Minneapolis, MN, 55455
Abstract.
We consider time-homogeneous uniformly nondegenerate stochastic differential games in domains and propose constructing $\varepsilon$-optimal strategies and policies by using adjoint Markov strategies and adjoint Markov policies. These are actually time-homogeneous Markov, however, relative not to the original process but to a couple of processes governed by a system consisting of the main original equation and of adjoint stochastic equations of the same type as the main one. We show how to find $\varepsilon$-optimal strategies and policies in these classes by using the solvability in Sobolev spaces of not the original Isaacs equation but of its appropriate modification. We also give an example of a uniformly nondegenerate game where our assumptions are not satisfied and where we conjecture that one of the players has neither optimal Markov nor even $\varepsilon$-optimal adjoint (time-homogeneous) Markov strategies.
Key words and phrases:
Stochastic differential games, Isaacs equation, value functions
2010 Mathematics Subject Classification:
91A05, 91A15, 91A25
1. Introduction
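Let $\mathbb{R}^{d}$ be a $d$-dimensional Euclidean space and let $d_{1}\geq d$ be an integer. Assume that we are given separable metric spaces $A$ and $B$, and let, for each $\alpha\in A$, $\beta\in B$, the following functions on $\mathbb{R}^{d}$ be given:
(i) $d\times d_{1}$ matrix-valued $\sigma^{\alpha\beta}(x)=\sigma(\alpha,\beta,x)$,
(ii) $\mathbb{R}^{d}$-valued $b^{\alpha\beta}(x)=b(\alpha,\beta,x)$, and
(iii) real-valued $c^{\alpha\beta}(x)=c(\alpha,\beta,x)$, $f^{\alpha\beta}(x)=f(\alpha,\beta,x)$, and $g(x)$.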
Under natural assumptions which will be specified later, on a probability space carrying a -dimensional Wiener process one associates with these objects and a bounded domain of class a stochastic differential game with the diffusion term , drift term , discount rate , running cost , and the final cost paid when the underlying process first exits from . More precisely we consider the process defined by the equation
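$$x_{t}=x+\int_{0}^{t}\sigma^{\alpha_{s}\beta_{s}}(x_{s})\,dw_{s}+\int_{0}^{t}b^{\alpha_{s}\beta_{s}}(x_{s})\,ds, \tag{1.1}$$
where $\alpha_{t}$ and $\beta_{t}$ are admissible actions of two players, one of which is maximizing and the other minimizing an expression like
$$E\Big[\int_{0}^{\tau}f^{\alpha_{t}\beta_{t}}(x_{t})e^{-\phi_{t}}\,dt+g(x_{\tau})e^{-\phi_{\tau}}\Big],\qquad\phi_{t}=\int_{0}^{t}c^{\alpha_{s}\beta_{s}}(x_{s})\,ds,$$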
where $\tau$ is the first exit time of the process $x_{t}$ from the domain. We adopt a setting almost identical to that of [1] (although our set of admissible policies of $\alpha$ and $\beta$ is, generally, wider) and define the order of players and their policies and strategies. Then under very general conditions the value function turns out to be a viscosity solution of the Isaacs equation (see [1]). As in the case of controlled diffusion processes and Bellman’s equations, it is natural to use the Isaacs equation to construct an $\varepsilon$-optimal strategy of one player and $\varepsilon$-optimal policies of the other. By using discrete-time approximations of this equation, this was done in [2] and led to the so-called almost optimal approximately Markov time-inhomogeneous policies, whose actions at time $t$ depend on a very near past history. Similar constructions can be found in [10].
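To make the discounted cost described above concrete (running cost up to the first exit time plus a final cost paid at exit), here is a minimal sketch that estimates it by Monte Carlo with an Euler-Maruyama discretization. It assumes one space dimension, an interval as the domain, and frozen actions of both players, so that the coefficients are plain functions of the state. The function name `simulate_cost`, the step size, and the sample size are illustrative assumptions and do not come from the paper.

```python
import math
import random


def simulate_cost(x0, sigma, b, c, f, g, lo, hi, dt=4e-3, n_paths=1000, seed=0):
    """Estimate E[ int_0^tau f(x_t) e^{-phi_t} dt + g(x_tau) e^{-phi_tau} ].

    Toy setting (an assumption of this sketch): one space dimension, the
    domain is the interval (lo, hi), and the players' actions are frozen,
    so sigma, b, c, f are functions of the state only.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x, phi, cost = x0, 0.0, 0.0
        while lo < x < hi:  # run until the first exit from (lo, hi)
            cost += f(x) * math.exp(-phi) * dt  # discounted running cost
            phi += c(x) * dt                    # accumulated discount
            x += b(x) * dt + sigma(x) * sqrt_dt * rng.gauss(0.0, 1.0)
        x_exit = min(max(x, lo), hi)            # project the overshoot to the boundary
        total += cost + g(x_exit) * math.exp(-phi)  # discounted final cost
    return total / n_paths


if __name__ == "__main__":
    # Brownian motion on (-1, 1) with f = 1, c = 0, g = 0: the cost is E[tau].
    estimate = simulate_cost(
        0.0,
        sigma=lambda x: 1.0, b=lambda x: 0.0, c=lambda x: 0.0,
        f=lambda x: 1.0, g=lambda x: 0.0, lo=-1.0, hi=1.0,
    )
    print(estimate)
```

For this particular choice the exact value is $E\tau=1-x^{2}$, which equals 1 at $x=0$, so the printed estimate should be close to 1 up to Monte Carlo and time-discretization error.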
In this article, to find near-optimal strategies and policies, we propose using adjoint Markov strategies and adjoint Markov policies which are actually time-homogeneous Markov, however, relative not to the original process but to a couple of processes given as a solution of a time-homogeneous system consisting of (1.1) and adjoint stochastic equations of the same type as (1.1). We show how to find $\varepsilon$-optimal strategies and policies by using the solvability in Sobolev spaces of not the original Isaacs equation but of its appropriate modification. Observe that it is unknown whether general, even uniformly nondegenerate, Isaacs equations have solutions in Sobolev spaces. We also give an example of a uniformly nondegenerate game where our assumptions are not satisfied and where we conjecture that one of the players has neither optimal Markov nor even $\varepsilon$-optimal adjoint (time-homogeneous) Markov strategies.
As a point of comparison note that in [1] and [2] the authors deal with time-inhomogeneous possibly degenerate stochastic differential games on a finite time interval in the whole space. In our case we have a uniformly nondegenerate time-homogeneous stochastic differential game in a domain where it is quite natural to look for time-homogeneous Markov strategies and policies.
The article is organized as follows. In the next section we present our main results. In Section 3 we prove some auxiliary results. Theorems 2.1 and 2.2 and Lemma 2.3 are proved in Section 4. In Section 5 we apply the previous results to the case of controlled diffusion processes, to which belongs Theorem 2.4 proved in Section 6. Finally, in Section 7 we prove Theorem 2.5 saying what happens if the Isaacs condition is satisfied.
By $N$, sometimes with arguments, we denote various constants which depend only on the arguments, if they are present, and which may change from one occurrence to another. If a statement we are proving claims that a constant depends only on certain parameters, then in the proof all constants called $N$ depend only on those parameters unless specifically indicated otherwise.
2. Main results
Set $a^{\alpha\beta}=(1/2)\sigma^{\alpha\beta}(\sigma^{\alpha\beta})^{*}$.
Assumption 2.1.
(i) a) The functions are continuous with respect to for each and continuous with respect to uniformly with respect to for each . b) These functions are continuous with respect to uniformly with respect to and , the function .
(ii) There are constants and such that and for any
[TABLE]
[TABLE]
(iii) There is a constant such that for any , , and we have
[TABLE]
The reader understands, of course, that the summation convention is adopted throughout the article.
Note that Assumption 2.1 (iii) obviously implies that .
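Let $(\Omega,\mathcal{F},P)$ be a complete probability space, let $\{\mathcal{F}_{t},t\geq0\}$ be an increasing filtration of $\sigma$-fields $\mathcal{F}_{t}\subset\mathcal{F}$ such that each $\mathcal{F}_{t}$ is complete with respect to $\mathcal{F},P$, and let $w_{t}$, $t\geq0$, be a standard $d_{1}$-dimensional Wiener process given on $\Omega$ such that $w_{t}$ is a Wiener process relative to the filtration $\{\mathcal{F}_{t},t\geq0\}$.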
The following by now standard setting originated in [1], although we prefer the notation introduced in [7]. The set of progressively measurable $A$-valued processes $\alpha_{t}=\alpha_{t}(\omega)$ is denoted by $\mathfrak{A}$. Similarly we define $\mathfrak{B}$ as the set of $B$-valued progressively measurable functions. These are the sets of policies. By $\mathbb{B}$ we denote the set of strategies, that is, $\mathfrak{B}$-valued functions $\boldsymbol{\beta}(\alpha_{\cdot})$ on $\mathfrak{A}$ such that, for any $T\in[0,\infty)$ and any $\alpha^{1}_{\cdot},\alpha^{2}_{\cdot}\in\mathfrak{A}$ satisfying
[TABLE]
we have
[TABLE]
For , , and define as a unique solution of the Itô equation (1.1) and set
[TABLE]
Next, recall that is a bounded domain in of class , define as the first exit time of from , and introduce
[TABLE]
where the indices , , and at the expectation sign are written to mean that they should be placed inside the expectation sign wherever and as appropriate, that is
[TABLE]
[TABLE]
Observe that this definition makes perfect sense due to Theorem 2.2.1 of [4] and in . Similar abbreviated notation will be used in other cases when the underlying processes and functions depend on initial data or other parameters and functions.
Before stating our first main result we introduce two more assumptions and a notation.
Assumption 2.2.
For any , there exists a finite set such that for any there exists an such that for it holds that
[TABLE]
As is easy to see one can choose satisfying (2.2) to be a Borel function.
Assumption 2.3.
Either are symmetric positive-definite matrix-valued functions or there is a constant such that for all and all .
The second part of this assumption means that the last columns of form an identity matrix multiplied by . The only use of this assumption is (4.7) which can be satisfied in very many other situations.
Take and fix a with unit integral and for a Borel measurable -valued function on and bounded measurable functions given on and set
[TABLE]
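The smoothing step used here, averaging against a kernel with unit integral scaled to size $\rho$, can be illustrated numerically. The following minimal sketch mollifies a discontinuous function in one dimension. The particular bump kernel, the midpoint quadrature, and the names `make_kernel_weights`, `mollify`, and `sign` are assumptions made for illustration and are not taken from (2.3).

```python
import math


def make_kernel_weights(n=2000):
    """Midpoint-rule nodes and normalized weights for a smooth bump on (-1, 1)."""
    h = 2.0 / n
    nodes = [-1.0 + (i + 0.5) * h for i in range(n)]
    raw = [math.exp(-1.0 / (1.0 - z * z)) for z in nodes]
    total = sum(raw)
    # Normalizing the weights plays the role of the unit-integral condition.
    return nodes, [r / total for r in raw]


def mollify(u, rho, n=2000):
    """Return x -> sum_i w_i * u(x - rho * z_i), a discrete mollification of u."""
    nodes, weights = make_kernel_weights(n)

    def u_rho(x):
        return sum(w * u(x - rho * z) for z, w in zip(nodes, weights))

    return u_rho


def sign(x):
    """Sign function with sign(0) = 0."""
    return (x > 0) - (x < 0)


if __name__ == "__main__":
    smooth_sign = mollify(sign, rho=0.1)
    for x in (-0.5, -0.05, 0.0, 0.05, 0.5):
        print(x, smooth_sign(x))
```

Away from the jump the smoothed function coincides with the original one as soon as $\rho$ is smaller than the distance to the jump, which is the one-dimensional picture behind the almost-everywhere convergence of mollified functions as $\rho\to0$.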
Theorem 2.1.
Under the above assumptions for any there exist a Borel measurable -valued function on and such that, if, for , , and , we define the process as a solution of
[TABLE]
where and are defined according to (2.3), and set $\boldsymbol{\beta}^{\rho}_{t}(\alpha_{\cdot},x)=\beta(\alpha_{t},y_{t}^{\alpha_{\cdot}x}(\rho))$, then
[TABLE]
Furthermore, there exists a finite number of mutually disjoint subsets , of such that and for each we have whenever .
Observe that, obviously, (2.4) has a unique solution. Strategies like
[TABLE]
are naturally called adjoint Markov strategies because their actions at time $t$, although not based only on the current action of $\alpha$ and the current state of $x_{t}$, use instead of the latter the current state of an adjoint process $y_{t}$, which, as we will see, is close to $x_{t}=x_{t}^{\alpha_{\cdot}\boldsymbol{\beta}^{\rho}(\alpha_{\cdot},x)x}$ if $\rho$ is small.
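The structure of such a strategy can be sketched in code: the responding player keeps an auxiliary state $y_{t}$, advances it by its own stochastic equation, and chooses the action from the opponent's current action and $y_{t}$ only. In this minimal sketch the class name `AdjointMarkovStrategy`, the helper `run`, the scalar state, the Euler step, and the coefficient callables are hypothetical placeholders. They stand in for the coefficients of (2.4) and are not the paper's construction.

```python
import math
import random


class AdjointMarkovStrategy:
    """Feedback on an auxiliary (adjoint) state y instead of the real state x."""

    def __init__(self, feedback, y_diffusion, y_drift, y0):
        self.feedback = feedback        # (alpha_t, y_t) -> action of the responding player
        self.y_diffusion = y_diffusion  # (alpha_t, y_t) -> diffusion coefficient of y
        self.y_drift = y_drift          # (alpha_t, y_t) -> drift coefficient of y
        self.y = y0

    def act(self, alpha_t):
        # The action uses the opponent's current action and y_t, never x_t.
        return self.feedback(alpha_t, self.y)

    def advance(self, alpha_t, dw, dt):
        # One Euler step of the auxiliary equation, driven by the increment dw.
        self.y += self.y_diffusion(alpha_t, self.y) * dw + self.y_drift(alpha_t, self.y) * dt


def run(strategy, alpha_path, x0, sigma, b, dt, seed=0):
    """Simulate the real state x under the strategy; return the final (x, y)."""
    rng = random.Random(seed)
    x = x0
    for alpha_t in alpha_path:
        beta_t = strategy.act(alpha_t)
        dw = math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Real state: coefficients depend on both actions and on x itself.
        x += sigma(alpha_t, beta_t, x) * dw + b(alpha_t, beta_t, x) * dt
        # Auxiliary state: same noise increment, coefficients depend on (alpha_t, y).
        strategy.advance(alpha_t, dw, dt)
    return x, strategy.y
```

When the coefficients of the auxiliary equation approximate those of the real one, $y_{t}$ stays close to $x_{t}$ in this toy setting, which mirrors the closeness property the text refers to for small $\rho$.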
In the next theorem Assumption 2.3 is not used.
Theorem 2.2.
In Theorem 2.1 drop Assumption 2.3 but suppose that on there is a Wiener process , independent of . Then for any there exists a constant such that all assertions of Theorem 2.1 hold true if we add to the right-hand side of (2.4) the term .
Here we see another instance of adjoint Markov strategies of the player $\beta$. With the choice $\boldsymbol{\beta}^{\rho}_{t}(\alpha_{\cdot},x)=\beta(\alpha_{t},y_{t}^{\alpha_{\cdot}x}(\rho))$ the process $x_{t}=x_{t}^{\alpha_{\cdot}\boldsymbol{\beta}^{\rho}(\alpha_{\cdot},x)x}$ satisfies
[TABLE]
where $y_{t}$ is defined from (2.4). Therefore, for the player $\alpha$ to find an adequate response to the above strategy $\boldsymbol{\beta}^{\rho}_{t}(\alpha_{\cdot},x)$, he should solve a more or less standard problem of optimal control of the two-component diffusion process $(x_{t},y_{t})$ governed by the system (2.4)-(2.6) and maximize the expectation in (2.5). An unpleasant feature of this couple is that it is always a degenerate process. It turns out that one can reduce the problem to optimal control of only $y_{t}$ when $\rho$ is sufficiently small, and then the same Theorem 2.1 applied in the case of only one player will provide an adjoint Markov policy while controlling $y_{t}$, which will become an adjoint Markov policy of $\alpha$ in the original game. The above-mentioned reduction of the optimal control problem is based on the following.
Lemma 2.3.
One more assertion can be added in Theorems 2.1 and 2.2: for any
[TABLE]
[TABLE]
where
[TABLE]
where and are defined according to (2.3), and is the first exit time of from .
This lemma and Theorems 2.1 and 2.2 almost immediately lead to the following result about $\varepsilon$-optimal adjoint Markov policies for $\alpha$.
Theorem 2.4.
Let either
(a) the assumptions of Theorem 2.1 be satisfied, or
(b) the assumptions of Theorem 2.2 be satisfied.
Take , , , and $\boldsymbol{\beta}^{\rho}(\alpha_{\cdot},x)$ from Theorem 2.1 or 2.2, respectively. Then there exist Lipschitz continuous in -matrix valued and -valued given on , there exists a Borel measurable -valued function on , and in case (b) there also exists a constant , such that, if for we define the process by
[TABLE]
in case (a) with the additional term on the right-hand side of (2.8) in case (b) and set , then
[TABLE]
[TABLE]
Remark 2.1.
The above results hold under milder assumptions than the ones imposed. For instance, an absolutely cheap generalization is that it suffices to have rather than because one can use uniform approximations of . The domain also need not be in . It is quite sufficient for it to satisfy the exterior cone condition or be even worse than that. Again appropriate approximations would do the job.
The point of the article was to promote adjoint Markov policies and strategies, rather than deal with numerous side problems arising along the way.
Example 2.1.
Let , , , , , , . The Isaacs equation is
[TABLE]
which is equivalent to
[TABLE]
The solution of this equation in with zero boundary data is zero. The inf inside is zero for any and is obtained on ().
Like in [1] and [2], let our probability space be the space of real-valued continuous functions on with Wiener measure on the -field of Borel subsets of . Let the Wiener process be defined by , . Also let be the -field generated by .
In such situation the equation
[TABLE]
does not have -adapted solutions at all (Tanaka’s example), and cannot use the strategy , since can choose to be 1 for all times.
The author believes that in this example there are no (time-homogeneous) $\varepsilon$-optimal adjoint Markov strategies for $\beta$ if $\varepsilon$ is small enough. Regarding time-inhomogeneous adjoint Markov strategies the reader is referred to [5]. However, our results show that, if we just take two independent copies of our probability space with being the Wiener process on one copy and being the Wiener process on the other, take a mollification of take a and introduce an adjoint process by
[TABLE]
then the strategy $\boldsymbol{\beta}_{t}(\alpha_{\cdot})=\alpha_{t}\,\text{sign}\,y_{t}$ will be $\varepsilon$-optimal for $\beta$ if the mollification is done with a kernel of sufficiently small size and is sufficiently small. By the way, on the thus extended probability space (2.10) still does not have solutions.
Assumption 2.4.
Assumption 2.2 is not necessarily satisfied, but for any , there exists a finite set such that for any there exists an such that for it holds that
[TABLE]
and for any on we have
[TABLE]
[TABLE]
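When the Isaacs condition (2.12) is satisfied it is natural to introduce $\mathbb{A}$ as the set of $\mathfrak{A}$-valued functions $\boldsymbol{\alpha}(\beta_{\cdot})$ on $\mathfrak{B}$ such that, for any $T\in[0,\infty)$ and any $\beta^{1}_{\cdot},\beta^{2}_{\cdot}\in\mathfrak{B}$ satisfying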
[TABLE]
we have
[TABLE]
Theorem 2.5.
Under the Assumptions 2.1, 2.3, and 2.4 for any there exist a Borel measurable -valued function on and such that, if for , , and we define the process as a solution of
[TABLE]
where and are found following the example
[TABLE]
and set $\boldsymbol{\alpha}^{\rho}_{t}(\beta_{\cdot},x)=\alpha(y_{t}^{\beta_{\cdot}x}(\rho))$, then
[TABLE]
Remark 2.2.
An analogous theorem is valid when we drop Assumption 2.3 in Theorem 2.5 but suppose that on there is a Wiener process , independent of .
Remark 2.3.
Observe that in Theorem 2.1 we are talking about the function depending both on and and in Theorem 2.5 we have a function of only . Of course, this is because (2.11) is assumed in Theorem 2.5.
Remark 2.4.
As a corollary of Theorems 2.1 and 2.5 we obtain the well-known fact that our game has a value, that our strategies for $\alpha$ and $\beta$ form, so to speak, an $\varepsilon$-saddle point, and that the game may be called fair.
3. Auxiliary results
Here is a well-known result which, for instance, is a particular case of Lemma 2.1 of [7].
Lemma 3.1.
Let be a -matrix-valued and be an -valued progressively measurable functions on . Suppose that
[TABLE]
[TABLE]
for all and , where is a fixed constant. Take and define as the first exit time from of
[TABLE]
Then for any there exists a constant , depending only on , , , , and the diameter of , such that .
The following result is also very well known (can be obtained, for instance, by combining Lemma 2.8 of [3] and Lemma 8.5 and Theorem 3.1 of [6]). By we denote the set of symmetric matrices whose eigenvalues are between and . Introduce , and let denote the gradient of .
Lemma 3.2.
Let . Then there exists a function such that on , on , on , and
[TABLE]
on for any and such that .
The next few results are needed while investigating how far off the adjoint processes are of real controlled ones.
Lemma 3.3.
Let , , be -matrix-valued and , , be -valued functions on . Suppose that for each these functions restricted to are measurable with respect to , where is the Borel -field in . Assume that and are progressively measurable for any , and are Lipschitz continuous with respect to with constant , and and are Lipschitz continuous with respect to with a constant independent of . Suppose that there exists a function on such that for any
[TABLE]
for all . Also suppose that and satisfy (3.1) and satisfies (3.2) for all values of indices, arguments, and all .
Take and define the processes and by
[TABLE]
[TABLE]
Obviously this system has a unique solution. Finally, set to be the minimum of the exit times of and from . Then, for any , we have
[TABLE]
where depends only on , , , , and the diameter of .
Proof. We modify the coefficients of system (3.4) by multiplying them by , which does not affect (3.5), allows us to eliminate from it and also allows us to formally apply Theorem 2.5.9 of [4] according to which the left-hand side of (3.5) is less than
[TABLE]
where . In light of (3.3), the expectation here is estimated by
[TABLE]
and it only remains to apply Theorem 2.2.2 of [4]. The lemma is proved.
Corollary 3.4.
Under the assumptions of Lemma 3.3, for any , we have
[TABLE]
where is the right-hand side of (3.5) and depends only on , and the diameter of .
Indeed, it suffices to use Lemma 3.3 and observe that
[TABLE]
Lemma 3.5.
Let , , , be as in Lemma 3.3 but independent of and assume that they satisfy (3.1) and (3.2) for all values of indices, arguments, and all . Take , , and set
[TABLE]
Introduce as the minimum of the first exit times of , , from . Let , , be real-valued jointly measurable processes given on and bounded by a constant .
Then for any
[TABLE]
[TABLE]
where depends only on , , , , , and the diameter of , and as , depends only on , , and the diameter of , and as and depends only on , , , and the diameter of .
Proof. First observe that
[TABLE]
where
[TABLE]
By Theorem 2.2.2 of [4]
[TABLE]
where depends only on , , , and the diameter of . It follows that it suffices to prove the lemma for .
In that case we extend beyond by setting it to be zero there, which does not affect (3.6), introduce as the convolution of and , and replace in the left-hand side of (3.6) with . The error of the replacement is less than
[TABLE]
which by Theorem 2.2.2 of [4] is less than a constant, depending only on , , , and the diameter of , times
[TABLE]
which tends to zero as . This gives us the term on the right in (3.6). Finally,
[TABLE]
[TABLE]
The lemma is proved.
4. Proof of Theorems 2.1 and 2.2 and Lemma 2.3
Recall that $a^{\alpha\beta}=(1/2)\sigma^{\alpha\beta}(\sigma^{\alpha\beta})^{*}$ and for sufficiently smooth functions introduce
[TABLE]
Also set
[TABLE]
Lemma 4.1.
Take and . Then for any there exists a Borel -valued function on such that for almost all
[TABLE]
where
[TABLE]
Proof. Fix and and choose , , and so that they are Borel functions. Then let be a countable everywhere dense set in . Since are continuous in ,
[TABLE]
and for any there exists with the least for which
[TABLE]
As is easy to see, is a Borel function and such is as well. For set , where is any element of . Then we get a function we need and the lemma is proved.
Lemma 4.2.
Take and . Then there exists a finite family of Borel -valued functions on and a Borel -valued function on such that
(i) for any ;
(ii) for
[TABLE]
we have
[TABLE]
Proof. Again choose , , and so that they are Borel functions and take from Assumption 2.2 for . Then let be functions found from Lemma 4.1 corresponding to , . Define to be the first for which (2.2) holds. Finally, set
[TABLE]
By Assumption 2.2, for any , , and ,
[TABLE]
[TABLE]
where and below the constants denoted by depend only on . By plugging in we find that, for any and
[TABLE]
[TABLE]
where the last inequality is due to . This yields (4.3) with on the right multiplied by times the -norm of . Obviously, this is enough and the lemma is proved.
Set
[TABLE]
By Theorem 14.1.6 of [9] for each the equation
[TABLE]
in (a.e.) with boundary condition has a solution for any . By following the arguments in Section 7 of [8], we conclude that the ’s admit a representation as the value functions in the corresponding stochastic games and by Theorem 7.1 of [8] we have uniformly on as . Observe that (a.e.) in
[TABLE]
Next, fix and . Below we introduce some objects which may change as we change and , but we still do not exhibit their dependence on for simplicity of notation and because are fixed for now.
Let and be the family of functions and function from Lemma 4.2 with in place of . Observe that by construction and (4.4)
[TABLE]
where is such that .
Use this in (2.3) and (2.4) to define $y_{t}^{\alpha_{\cdot}x}(\rho)$, $\boldsymbol{\beta}^{\rho}_{t}(\alpha_{\cdot},x)=\beta(\alpha_{t},y_{t}^{\alpha_{\cdot}x}(\rho))$, and $x_{t}=x_{t}^{\alpha_{\cdot}\boldsymbol{\beta}^{\rho}(\alpha_{\cdot},x)x}$. First, we want to prove that $x_{t}$ and $y_{t}$ are close when $\rho$ is sufficiently small. This will be based in part on the fact that the couple $(x_{t},y_{t})$ is a solution of the system
[TABLE]
An important and easy consequence of Assumption 2.3 is that
[TABLE]
for all .
Lemma 4.3.
For any vector-valued define
[TABLE]
Then for any there exist and a function such that, for all , , , and we have
[TABLE]
Proof. According to Assumption 2.2 for any there exists a finite subset (independent of ) of such that
[TABLE]
Take an and observe that the set
[TABLE]
is finite (see Lemma 4.2) and each element of this set is bounded and measurable with respect to . By the Lebesgue theorem
[TABLE]
as at almost any point . Hence,
[TABLE]
where are bounded uniformly with respect to and tend to zero as (a.e.) in , in particular, in for any . As a result, for any and ,
[TABLE]
where for all sufficiently small
[TABLE]
This is, certainly, enough and the lemma is proved.
Lemma 4.4.
Introduce $\theta=\theta^{\alpha_{\cdot}\boldsymbol{\beta}^{\rho}(\alpha_{\cdot},x)x}(\rho)$ as the minimum of the first exit times of $x_{t}^{\alpha_{\cdot}\boldsymbol{\beta}^{\rho}(\alpha_{\cdot},x)x}$ and of $y_{t}^{\alpha_{\cdot}x}(\rho)$ from the domain. Then
[TABLE]
as uniformly with respect to .
Proof. By Corollary 3.4 and Lemma 4.3, for any the left-hand side of (4.9) is less than , where is independent of , for all small enough and so is its lim sup as . Sending first and then yields the desired result. The lemma is proved.
Corollary 4.5.
For we have
[TABLE]
as uniformly with respect to , where is introduced according to (2.3).
Indeed, since is continuous in uniformly with respect to , one can replace in (4.10) with only incurring the error
[TABLE]
[TABLE]
where , , is a bounded continuous function, . By Lemmas 3.1 and 4.4 this error tends to zero as uniformly with respect to . Due to Theorem 2.2.2 of [4] and Lemma 4.3, what remains after the above mentioned replacement is less than a constant independent of times the -norm of , which also tends to zero as uniformly with respect to .
Theorem 4.6.
For any , we have
[TABLE]
[TABLE]
where is independent of , as , is independent of , as , depends only on , and the diameter of , is independent of and as .
Proof. For simplicity of notation we drop the argument of and . Take and observe that in the notation from Lemma 4.4 by Itô’s formula
[TABLE]
[TABLE]
[TABLE]
where, dropping obvious values of indices,
[TABLE]
By Lemma 3.5 with , for any , the last term in (4.13) is less than
[TABLE]
[TABLE]
[TABLE]
[TABLE]
where are independent of , and , as , as , and we use the notation
[TABLE]
By Corollary 4.5 the factor of in (4.14) is dominated by for an appropriate function which tends to zero as . The last term in (4.14) is dominated by .
After that taking into account (4.5) and Theorem 2.2.2 of [4] we see that
[TABLE]
[TABLE]
where depend only on and the diameter of . We can replace the last in the integrand in (4.15) by incurring as in Corollary 4.5 another error term like which goes to zero as . By adding to this that
[TABLE]
[TABLE]
[TABLE]
[TABLE]
where depends only on , , and , we see that to prove (4.12) it suffices now to show that
[TABLE]
as uniformly with respect to . By Lemma 3.2 and Itô’s formula we have
[TABLE]
and it only remains to use Lemma 4.4 once more. The theorem is proved.
Proof of Theorem 2.1. First choose and fix and so that and , where is taken from Theorem 4.6. Then find and fix and from . Finally find such that
[TABLE]
Then (4.12) will become (2.14).
The last statement of the theorem follows by construction of \text{\raise-0.86108pt\hbox{\bm{\beta}}}^{\rho}(\alpha_{\cdot},x). The theorem is proved.
Remark 4.1.
An important particular case of Theorem 2.1 is when are independent of , so that we are actually dealing with a controlled diffusion process. Also, clearly, similar statements to Theorem 2.1 hold true if we exchange the roles of and and consider the stochastic differential game corresponding to
[TABLE]
in place of (4.1). Of course, one should then replace Assumption 2.2 with a similar one about . To reduce this game to the one we are treating, it suffices just to rename and and take , and in place of , , and , respectively.
Proof of Theorem 2.2. Fix and replace (1.1) with
[TABLE]
The solution of this equation is denoted by and by we denote its first exit time from . We take the same and define by (2.1) where we replace , and with , and
[TABLE]
respectively. Obviously to thus obtained new stochastic differential game we can apply Theorem 2.1 and conclude that for any there exists , with the properties described in Theorem 2.1 and such that if for , , and we define the process as a solution of
[TABLE]
then
[TABLE]
[TABLE]
It follows that to prove the theorem it suffices to show that
[TABLE]
[TABLE]
as uniformly with respect to , , and .
First observe (although this is an overkill) that Lemma 3.3 is applicable here when ’s are independent of the first space variable. Then Corollary 3.4 is also applicable which as in Lemma 4.4 leads to the conclusion that
[TABLE]
as uniformly with respect to , , and , where is the minimum of exit times of and from .
Next, while proving (4.18) first assume that . Observe that, owing to (4.19), the argument at the end of the proof of Theorem 4.6 shows that it suffices to prove the version of (4.18) when both and are replaced with (assuming ).
Then notice that in light of the continuity of in uniform with respect to (cf. also (4.11))
[TABLE]
as uniformly with respect to , , and .
Also
[TABLE]
[TABLE]
[TABLE]
[TABLE]
where
[TABLE]
One sees easily as above that as uniformly with respect to , , and .
It remains to deal with the terms containing in (4.18). Since , by Itô’s formula we have
[TABLE]
[TABLE]
[TABLE]
[TABLE]
[TABLE]
The second term on the right in (4.21) clearly goes to zero as uniformly with respect to , , and . The difference of the remaining ones in (4.20) and (4.21) is shown to do the same by the first part of the proof. The theorem is proved.
Proof of Lemma 2.3. This proof is very similar to the second part of the proof of Theorem 2.2. First we assume that . Take $\theta=\theta^{\alpha_{\cdot}\boldsymbol{\beta}^{\rho}(\alpha_{\cdot},x)x}(\rho)$ from Lemma 4.4 and note that the argument at the end of the proof of Theorem 4.6 shows that it suffices to prove the version of (2.7) when both and are replaced with (assuming ).
Next, observe that
[TABLE]
[TABLE]
By Corollary 4.5 the last expression tends to zero as uniformly with respect to and .
Also as in the above proof
[TABLE]
[TABLE]
where $J_{x}^{\alpha_{\cdot}\boldsymbol{\beta}^{(\rho)}(\alpha_{\cdot},x)}$ stands for
[TABLE]
[TABLE]
Lemma 3.1 and Corollary 4.5 convince us that $I_{x}^{\alpha_{\cdot}\boldsymbol{\beta}^{(\rho)}(\alpha_{\cdot},x)}\to 0$ as uniformly with respect to and .
It remains to deal with the terms containing in (2.7). Again by using Itô’s formula we write
[TABLE]
[TABLE]
Similarly we transform the term with involving and then we reduce the problem to estimating the terms like the ones we started with. The lemma is proved.
5. A particular case where is a singleton
Here we assume that is a singleton and will not write and in our notation. In particular, now we are dealing with a controlled diffusion process given as a solution of the equation
[TABLE]
Its solution is denoted by . Our goal is to minimize
[TABLE]
over , where (according to our standard notation) is the first exit time of from , ,
[TABLE]
In this case Theorem 2.1 becomes the following.
Theorem 5.1.
Under the assumptions of Theorem 2.1 for any there exist a Borel measurable -valued function on and such that, if for , we define
[TABLE]
introduce similarly, and for define the process by
[TABLE]
and set , then
[TABLE]
[TABLE]
Here is a version of Theorem 2.2.
Theorem 5.2.
In Theorem 5.1 drop Assumption 2.3 but suppose that on there is a Wiener process , independent of . Then for any there exists a constant such that all assertions of Theorem 5.1 hold true if we add to the right-hand side of (5.3) the term .
Remark 5.1.
In Section 6 we are going to maximize (5.2) instead of minimizing it. One problem is reduced to another just by changing signs of and . Also it is worth noting that in Section 6 the parameter used in maximization is called instead of .
6. Adjoint $\varepsilon$-optimal Markov policies for $\alpha$
Take , , from Theorem 2.1, use the notation (2.3) and, for and , define the controlled diffusion process by
[TABLE]
with the reward function
[TABLE]
We are going to maximize (6.2) treating here as in Section 5 and adjusting the maximization problem to the one of minimization.
However, there is a formal objection to overcome before we can translate the results of Section 5 to our situation. Namely, in Section 5, the functions as inherited from taking as a singleton were assumed to be continuous with respect to . Therefore, here we need our to be continuous with respect to and they may fail to be such because, even if in (2.3) is continuous in the first argument uniformly with respect to , can be discontinuous as a function of . Indeed, for different , can be very different functions of . However, in light of the second statement in Theorem 2.1 to make continuous with respect to it suffices just to change the distance function in keeping it the same as belong to the same and defining it as otherwise. By the way, this change in no way affects the set of policies of and only allows us to formally apply the results of Section 5.
According to Theorem 5.1 for any there exist a Borel measurable -valued function on and Lipschitz continuous functions and on with values in the set of -matrices and in , respectively, such that, if for we define the process by
[TABLE]
and set , then
[TABLE]
[TABLE]
Finally, due to Lemma 2.3, (6.4) implies that (2.9) holds with in place of . This proves part (a) of Theorem 2.4. The proof of part (b) is quite similar and the theorem is proved.
7. Proof of Theorem 2.5
If in Theorem 14.1.6 of [9] we replace and by and , then we will see that for any the equation
[TABLE]
in (a.e.) with boundary condition has a solution for any . By following the arguments in Section 7 of [8], we conclude that uniformly on as . Observe that (a.e.) in
[TABLE]
Fix and . In the same way in which we found above the function we find a Borel -valued function such that in
[TABLE]
Our goal is to prove that if and are large enough and is small enough, then the above is the one we are talking about in Theorem 2.5.
Take and $\boldsymbol{\alpha}^{\rho}_{t}(\beta_{\cdot},x)=\alpha(y_{t}^{\beta_{\cdot}x}(\rho))$ from the statement of the theorem. Introduce $\theta=\theta^{\boldsymbol{\alpha}^{\rho}(\beta_{\cdot},x)\beta_{\cdot}x}(\rho)$ as the minimum of the first exit times of $x_{t}^{\boldsymbol{\alpha}^{\rho}(\beta_{\cdot},x)\beta_{\cdot}x}$ and of $y_{t}^{\beta_{\cdot}x}(\rho)$ from the domain. Then in the same way in which we arrived at Lemma 4.4 we obtain that
[TABLE]
as uniformly with respect to .
Then following closely the argument in Section 4 we get an analog of Theorem 4.6 that for any , we have
[TABLE]
[TABLE]
where is independent of , as , is independent of , as , depends only on , and the diameter of , is independent of and as .
After that the assertion of Theorem 2.5 is obtained by the same short argument as in Section 4 in the proof of Theorem 2.1.
- [1] W.H. Fleming and P.E. Souganidis, On the existence of value functions of two-player, zero-sum stochastic differential games, Indiana Univ. Math. J., Vol. 38 (1989), No. 2, 293–314.
- [2] W.H. Fleming and D. Hernández-Hernández, On the value of stochastic differential games, Commun. Stoch. Anal., Vol. 5 (2011), No. 2, 341–351.
- [3] D. Gilbarg and L. Hörmander, Intermediate Schauder estimates, Archive Rational Mech. Anal., Vol. 74 (1980), No. 4, 297–318.
- [4] N.V. Krylov, “Controlled diffusion processes”, Nauka, Moscow, 1977 in Russian; English translation Springer, 1980.
- [5] N.V. Krylov, The sufficiency of the adjoint Markov strategies for controlled diffusion processes, Teoriya Veroyatnostei i ee Primeneniya, Vol. 31 (1986), No. 2, 353–358 in Russian; English transl. in Theor. Probability Appl., Vol. 31 (1987), No. 2, 304–309.
- [6] N.V. Krylov, On a representation of fully nonlinear elliptic operators in terms of pure second order derivatives and its applications, Problemy Matemat. Analiza, Vol. 59, July 2011, p. 3–24 in Russian; English translation: Journal of Mathematical Sciences, New York, Vol. 177 (2011), No. 1, 1–26.
- [7] N.V. Krylov, On the dynamic programming principle for uniformly nondegenerate stochastic differential games in domains, Stochastic Processes and their Applications, Vol. 123 (2013), No. 8, 3273–3298.
- [8] N.V. Krylov, On the dynamic programming principle for uniformly nondegenerate stochastic differential games in domains and the Isaacs equations, Probab. Theory Relat. Fields, Vol. 158 (2014), No. 3, 751–783.
