On the adjoint Markov policies in stochastic differential games

N.V. Krylov

arXiv:1903.10072 · math.OC · March 26, 2019

TL;DR

This paper introduces a method for constructing near-optimal strategies in stochastic differential games using adjoint Markov strategies, which are based on a coupled system of the original and adjoint stochastic equations.

Contribution

It proposes a novel approach to find $\varepsilon$-optimal strategies via adjoint Markov policies linked to a modified Isaacs equation, expanding the toolkit for stochastic differential games.

Findings

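01. Constructed $\varepsilon$-optimal strategies and policies in the classes of adjoint Markov strategies and adjoint Markov policies.

02. Used the solvability in Sobolev spaces of a modified Isaacs equation rather than of the original one.

03. Provided an example where the assumptions fail and where it is conjectured that even $\varepsilon$-optimal adjoint (time-homogeneous) Markov strategies do not exist for one of the players.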

Equations

$$x_{t}=x+\int_{0}^{t}\sigma^{\alpha_{s}\beta_{s}}(x_{s})\,dw_{s}+\int_{0}^{t}b^{\alpha_{s}\beta_{s}}(x_{s})\,ds,$$

$$E\int_{0}^{\tau}f^{\alpha_{t}\beta_{t}}(x_{t})\,dt,$$

$$\|\sigma^{\alpha\beta}(x)-\sigma^{\alpha\beta}(y)\|\leq K_{1}|x-y|,\quad|b^{\alpha\beta}(x)-b^{\alpha\beta}(y)|\leq K_{1}|x-y|,$$

$$\|\sigma^{\alpha\beta}(x)\|,\ |b^{\alpha\beta}(x)|,\ |c^{\alpha\beta}(x)|,\ |f^{\alpha\beta}(x)|\leq K_{0}.$$

$$\delta|\lambda|^{2}\leq a^{\alpha\beta}_{ij}(x)\lambda^{i}\lambda^{j}\leq\delta^{-1}|\lambda|^{2}.$$

$$P\big(\alpha^{1}_{t}=\alpha^{2}_{t}\ \text{for almost all}\ t\leq T\big)=1,$$

$$P\big(\bm{\beta}_{t}(\alpha^{1}_{\cdot})=\bm{\beta}_{t}(\alpha^{2}_{\cdot})\ \text{for almost all}\ t\leq T\big)=1.$$

$$\phi^{\alpha_{\cdot}\beta_{\cdot}x}_{t}=\int_{0}^{t}c^{\alpha_{s}\beta_{s}}(x^{\alpha_{\cdot}\beta_{\cdot}x}_{s})\,ds.$$

$$v(x)=\inf_{\bm{\beta}\in\mathbb{B}}\sup_{\alpha_{\cdot}\in\mathfrak{A}}E_{x}^{\alpha_{\cdot}\bm{\beta}(\alpha_{\cdot})}\Big[\int_{0}^{\tau}f(x_{t})e^{-\phi_{t}}\,dt+g(x_{\tau})e^{-\phi_{\tau}}\Big],$$

$$E_{x}^{\alpha_{\cdot}\beta_{\cdot}}\Big[\int_{0}^{\tau}f(x_{t})e^{-\phi_{t}}\,dt+g(x_{\tau})e^{-\phi_{\tau}}\Big]:=E\Big[g\big(x^{\alpha_{\cdot}\beta_{\cdot}x}_{\tau^{\alpha_{\cdot}\beta_{\cdot}x}}\big)e^{-\phi^{\alpha_{\cdot}\beta_{\cdot}x}_{\tau^{\alpha_{\cdot}\beta_{\cdot}x}}}+\int_{0}^{\tau^{\alpha_{\cdot}\beta_{\cdot}x}}f^{\alpha_{t}\beta_{t}}\big(x^{\alpha_{\cdot}\beta_{\cdot}x}_{t}\big)e^{-\phi^{\alpha_{\cdot}\beta_{\cdot}x}_{t}}\,dt\Big].$$

$$\sup_{\beta\in B,\,x\in G}|u^{\alpha\beta}(x)-u^{\alpha(i)\beta}(x)|\leq\varepsilon.$$

$$h^{(\rho)}(\alpha,y,x)=\int_{\mathbb{R}^{d}}h(\alpha,\beta(\alpha,y+\rho z),x+\rho z)\zeta(z)\,dz,\quad h^{(\rho)}(\alpha,y)=h^{(\rho)}(\alpha,y,y).$$

$$dy_{t}=\sigma^{(\rho)}(\alpha_{t},y_{t})\,dw_{t}+b^{(\rho)}(\alpha_{t},y_{t})\,dt,\quad t\geq 0,\quad y_{0}=x,$$

$$v(x)\leq\sup_{\alpha_{\cdot}\in\mathfrak{A}}E_{x}^{\alpha_{\cdot}\bm{\beta}^{\rho}(\alpha_{\cdot},x)}\Big[\int_{0}^{\tau}f(x_{t})e^{-\phi_{t}}\,dt+g(x_{\tau})e^{-\phi_{\tau}}\Big]\leq v(x)+\varepsilon.$$

$$\beta(\alpha_{t},y_{t}^{\alpha_{\cdot}x}(\rho))$$

$$dx_{t}=\sigma(\alpha_{t},\beta(\alpha_{t},y_{t}),x_{t})\,dw_{t}+b(\alpha_{t},\beta(\alpha_{t},y_{t}),x_{t})\,dt,\quad t\geq 0,\quad x_{0}=x,$$

$$\Big|E_{x}^{\alpha_{\cdot}\bm{\beta}^{\rho}(\alpha_{\cdot},x)}\Big[\int_{0}^{\tau}f(x_{t})e^{-\phi_{t}}\,dt+g(x_{\tau})e^{-\phi_{\tau}}\Big]-E_{x}^{\alpha_{\cdot}}\Big[\int_{0}^{\tau(\rho)}f^{(\rho)}(y_{t}(\rho))e^{-\phi_{t}(\rho)}\,dt+g(y_{\tau(\rho)}(\rho))e^{-\phi_{\tau(\rho)}(\rho)}\Big]\Big|\leq\varepsilon,$$

$$\phi^{\alpha_{\cdot}x}_{t}(\rho)=\int_{0}^{t}c^{(\rho)}(\alpha_{s},y^{\alpha_{\cdot}x}_{s}(\rho))\,ds,$$

$$dz_{t}=\hat{\sigma}(z_{t})\,dw_{t}+\hat{b}(z_{t})\,dt,\quad t\geq 0,\quad z_{0}=x,$$

$$\sup_{\alpha_{\cdot}\in\mathfrak{A}}E_{x}^{\alpha_{\cdot}\bm{\beta}^{\rho}(\alpha_{\cdot},x)}\Big[\int_{0}^{\tau}f(x_{t})e^{-\phi_{t}}\,dt+g(x_{\tau})e^{-\phi_{\tau}}\Big]\leq E_{x}^{\alpha^{\varepsilon}_{\cdot}\bm{\beta}^{\rho}(\alpha^{\varepsilon}_{\cdot},x)}\Big[\int_{0}^{\tau}f(x_{t})e^{-\phi_{t}}\,dt+g(x_{\tau})e^{-\phi_{\tau}}\Big]+\varepsilon.$$

$$\sup_{\alpha\in A}\inf_{\beta\in B}\big[(1/2)u''+(1-|x+\alpha\beta|)_{+}\big]=0,$$

$$0=(1/2)u''+\sup_{\alpha\in A}\inf_{\beta\in B}(1-|x+\alpha\beta|)_{+}=(1/2)u''.$$

$$dx_{t}=\operatorname{sign}x_{t}\,dw_{t},\quad t\geq 0,\quad x_{0}=0$$

$$dy_{t}=\alpha_{t}\chi(y_{t})\,dw_{t}+\nu\,d\hat{w}_{t},\quad t>0,\quad y_{0}=0,$$

$$\sup_{\alpha\in A,\,x\in G}|u^{\alpha\beta}(x)-u^{\alpha\beta(i)}(x)|\leq\varepsilon,$$

$$\sup_{\alpha\in A}\inf_{\beta\in B}\big[a^{\alpha\beta}_{ij}u_{ij}+b^{\alpha\beta}_{i}u_{i}-c^{\alpha\beta}u+f^{\alpha\beta}\big]=\inf_{\beta\in B}\sup_{\alpha\in A}\big[a^{\alpha\beta}_{ij}u_{ij}+b^{\alpha\beta}_{i}u_{i}-c^{\alpha\beta}u+f^{\alpha\beta}\big].$$

Taxonomy

Topics: Stochastic processes and financial applications · Stability and Controllability of Differential Equations · Nonlinear Partial Differential Equations

Full text

On the adjoint Markov policies in stochastic differential games

N.V. Krylov

127 Vincent Hall, University of Minnesota, Minneapolis, MN, 55455

Abstract.

We consider time-homogeneous uniformly nondegenerate stochastic differential games in domains and propose constructing $\varepsilon$-optimal strategies and policies by using adjoint Markov strategies and adjoint Markov policies, which are actually time-homogeneous Markov, however, relative not to the original process but to a couple of processes governed by a system consisting of the main original equation and of adjoint stochastic equations of the same type as the main one. We show how to find $\varepsilon$-optimal strategies and policies in these classes by using the solvability in Sobolev spaces of not the original Isaacs equation but of its appropriate modification. We also give an example of a uniformly nondegenerate game where our assumptions are not satisfied and where we conjecture that there are not only no optimal Markov strategies but not even $\varepsilon$-optimal adjoint (time-homogeneous) Markov strategies for one of the players.

Key words and phrases:

Stochastic differential games, Isaacs equation, value functions

2010 Mathematics Subject Classification:

91A05, 91A15, 91A25

1. Introduction

Let $\mathbb{R}^{d}=\{x=(x^{1},...,x^{d})\}$ be a $d$-dimensional Euclidean space and $d_{1}\geq 1$ be an integer. Assume that we are given separable metric spaces $A$ and $B$ and that, for each $\alpha\in A$, $\beta\in B$, the following functions on $\mathbb{R}^{d}$ are given:

(i) $d\times d_{1}$ matrix-valued $\sigma^{\alpha\beta}(x)=\sigma(\alpha,\beta,x)=(\sigma^{\alpha\beta}_{ij}(x))$ ,

(ii) $\mathbb{R}^{d}$ -valued $b^{\alpha\beta}(x)=b(\alpha,\beta,x)=(b^{\alpha\beta}_{i}(x))$ , and

(iii) real-valued $c^{\alpha\beta}(x)=c(\alpha,\beta,x)\geq 0$ , $f^{\alpha\beta}(x)=f(\alpha,\beta,x)$ , and $g(x)$ .

Under natural assumptions which will be specified later, on a probability space $(\Omega,\mathcal{F},P)$ carrying a $d_{1}$ -dimensional Wiener process $w_{t}$ one associates with these objects and a bounded domain $G\subset\mathbb{R}^{d}$ of class $C^{2}$ a stochastic differential game with the diffusion term $\sigma^{\alpha\beta}(x)$ , drift term $b^{\alpha\beta}(x)$ , discount rate $c^{\alpha\beta}(x)$ , running cost $f^{\alpha\beta}(x)$ , and the final cost $g(x)$ paid when the underlying process first exits from $G$ . More precisely we consider the process defined by the equation

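$$x_{t}=x+\int_{0}^{t}\sigma^{\alpha_{s}\beta_{s}}(x_{s})\,dw_{s}+\int_{0}^{t}b^{\alpha_{s}\beta_{s}}(x_{s})\,ds,\tag{1.1}$$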

where $\alpha_{\cdot}$ and $\beta_{\cdot}$ are admissible actions of two players, one of whom is maximizing and the other minimizing an expression like

$$E\int_{0}^{\tau}f^{\alpha_{t}\beta_{t}}(x_{t})\,dt,$$

where $\tau$ is the first-exit time of the process from $G$. We adopt a setting almost identical to that of [1] (although our set of admissible policies of $\alpha$ and $\beta$ is, generally, wider) and define the order of players and their policies and strategies. Then under very general conditions the value function turns out to be a viscosity solution of the Isaacs equation (see [1]). As in the case of controlled diffusion processes and Bellman’s equations, it is natural to use the Isaacs equation to construct $\varepsilon$-optimal strategies of one player and $\varepsilon$-optimal policies of the other. By using discrete time approximations of this equation this was done in [2] and led to the so-called almost optimal approximately Markov time-inhomogeneous policies, whose actions at time $t$ depend on a very near past history. Similar constructions can be found in [10].

In this article, to find near optimal strategies and policies, we propose using adjoint Markov strategies and adjoint Markov policies, which are actually time-homogeneous Markov, however, relative not to the original process $x_{t}$ but to a couple $(x_{t},y_{t})$ which is given as a solution of a time-homogeneous system consisting of (1.1) and adjoint stochastic equations of the same type as (1.1). We show how to find $\varepsilon$-optimal strategies and policies by using the solvability in Sobolev spaces of not the original Isaacs equation but of its appropriate modification. Observe that it is unknown whether general Isaacs equations, even uniformly nondegenerate ones, have solutions in Sobolev spaces. We also give an example of a uniformly nondegenerate game where our assumptions are not satisfied and where we conjecture that there are not only no optimal Markov strategies but not even $\varepsilon$-optimal adjoint (time-homogeneous) Markov strategies for one of the players.

As a point of comparison note that in [1] and [2] the authors deal with time-inhomogeneous possibly degenerate stochastic differential games on a finite time interval in the whole space. In our case we have a uniformly nondegenerate time-homogeneous stochastic differential game in a domain where it is quite natural to look for time-homogeneous Markov strategies and policies.

The article is organized as follows. In the next section we present our main results. In Section 3 we prove some auxiliary results. Theorems 2.1 and 2.2 and Lemma 2.3 are proved in Section 4. In Section 5 we apply the previous results to the case of controlled diffusion processes, to which belongs Theorem 2.4 proved in Section 6. Finally, in Section 7 we prove Theorem 2.5 saying what happens if the Isaacs condition is satisfied.

By $N$, sometimes with arguments, we denote various constants, depending only on the arguments if they are present, which may change from one occurrence to another. If in a statement we are proving there is a claim that $N$ depends only on $a,b,...$, then in the proof all constants called $N$ depend only on $a,b,...$ unless specifically indicated otherwise.

2. Main results

Set $a^{\alpha\beta}=(1/2)\sigma^{\alpha\beta}\big{(}\sigma^{\alpha\beta}\big{)}^{*}$ .

Assumption 2.1.

(i) a) The functions $\sigma,b,c,f$ are continuous with respect to $\beta\in B$ for each $(\alpha,x)$ and continuous with respect to $\alpha\in A$ uniformly with respect to $\beta\in B$ for each $x$ . b) These functions are continuous with respect to $x$ uniformly with respect to $\alpha$ and $\beta$ , the function $g\in C^{2}(\mathbb{R}^{d})$ .

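(ii) There are constants $K_{0}$ and $K_{1}$ such that for any $x,y\in\mathbb{R}^{d}$ and $(\alpha,\beta)\in A\times B$

$$\|\sigma^{\alpha\beta}(x)-\sigma^{\alpha\beta}(y)\|\leq K_{1}|x-y|,\quad|b^{\alpha\beta}(x)-b^{\alpha\beta}(y)|\leq K_{1}|x-y|,$$

$$\|\sigma^{\alpha\beta}(x)\|,\ |b^{\alpha\beta}(x)|,\ |c^{\alpha\beta}(x)|,\ |f^{\alpha\beta}(x)|\leq K_{0}.$$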

(iii) There is a constant $\delta\in(0,1]$ such that for any $\alpha\in A$ , $\beta\in B$ , and $x,\lambda\in\mathbb{R}^{d}$ we have

$$\delta|\lambda|^{2}\leq a^{\alpha\beta}_{ij}(x)\lambda^{i}\lambda^{j}\leq\delta^{-1}|\lambda|^{2}.$$

The reader understands, of course, that the summation convention is adopted throughout the article.

Note that Assumption 2.1 (iii) obviously implies that $d_{1}\geq d$ .
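The implication can be checked by a rank count. The short derivation below is a sketch added for the reader; it uses only $a^{\alpha\beta}=(1/2)\sigma^{\alpha\beta}(\sigma^{\alpha\beta})^{*}$ and the lower bound in Assumption 2.1 (iii), and it is not taken from the original text.

```latex
\begin{align*}
\delta|\lambda|^{2}\leq a^{\alpha\beta}_{ij}(x)\lambda^{i}\lambda^{j}
  &=\tfrac{1}{2}\big|(\sigma^{\alpha\beta}(x))^{*}\lambda\big|^{2}
  \quad\text{for all }\lambda\in\mathbb{R}^{d},\\
\text{so}\quad(\sigma^{\alpha\beta}(x))^{*}\lambda=0
  &\implies\lambda=0,
  \quad\text{that is,}\quad\operatorname{rank}\sigma^{\alpha\beta}(x)=d,\\
d=\operatorname{rank}\sigma^{\alpha\beta}(x)
  &\leq\min(d,d_{1})\leq d_{1}.
\end{align*}
```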

Let $(\Omega,\mathcal{F},P)$ be a complete probability space, let $\{\mathcal{F}_{t},t\geq 0\}$ be an increasing filtration of $\sigma$ -fields $\mathcal{F}_{t}\subset\mathcal{F}$ such that each $\mathcal{F}_{t}$ is complete with respect to $\mathcal{F},P$ , and let $w_{t},t\geq 0$ , be a standard $d_{1}$ -dimensional Wiener process given on $\Omega$ such that $w_{t}$ is a Wiener process relative to the filtration $\{\mathcal{F}_{t},t\geq 0\}$ .

The following by now standard setting originated in [1], although we prefer the notation introduced in [7]. The set of progressively measurable $A$-valued processes $\alpha_{t}=\alpha_{t}(\omega)$ is denoted by $\mathfrak{A}$. Similarly we define $\mathfrak{B}$ as the set of $B$-valued progressively measurable functions. These are the sets of policies. By $\mathbb{B}$ we denote the set of strategies, that is, $\mathfrak{B}$-valued functions $\bm{\beta}(\alpha_{\cdot})$ on $\mathfrak{A}$ such that, for any $T\in(0,\infty)$ and any $\alpha^{1}_{\cdot},\alpha^{2}_{\cdot}\in\mathfrak{A}$ satisfying

$$P\big(\alpha^{1}_{t}=\alpha^{2}_{t}\ \text{for almost all}\ t\leq T\big)=1,$$

we have

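$$P\big(\bm{\beta}_{t}(\alpha^{1}_{\cdot})=\bm{\beta}_{t}(\alpha^{2}_{\cdot})\ \text{for almost all}\ t\leq T\big)=1.$$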

For $\alpha_{\cdot}\in\mathfrak{A}$ , $\beta_{\cdot}\in\mathfrak{B}$ , and $x\in\mathbb{R}^{d}$ define $x^{\alpha_{\cdot}\beta_{\cdot}x}_{t}$ as a unique solution of the Itô equation (1.1) and set

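$$\phi^{\alpha_{\cdot}\beta_{\cdot}x}_{t}=\int_{0}^{t}c^{\alpha_{s}\beta_{s}}(x^{\alpha_{\cdot}\beta_{\cdot}x}_{s})\,ds.$$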

Next, recall that $G$ is a bounded domain in $\mathbb{R}^{d}$ of class $C^{2}$ , define $\tau^{\alpha_{\cdot}\beta_{\cdot}x}$ as the first exit time of $x^{\alpha_{\cdot}\beta_{\cdot}x}_{t}$ from $G$ , and introduce

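$$v(x)=\inf_{\bm{\beta}\in\mathbb{B}}\sup_{\alpha_{\cdot}\in\mathfrak{A}}E_{x}^{\alpha_{\cdot}\bm{\beta}(\alpha_{\cdot})}\Big[\int_{0}^{\tau}f(x_{t})e^{-\phi_{t}}\,dt+g(x_{\tau})e^{-\phi_{\tau}}\Big],$$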

where the indices $\alpha_{\cdot}$ , $\bm{\beta}$ , and $x$ at the expectation sign are written to mean that they should be placed inside the expectation sign wherever and as appropriate, that is

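$$E_{x}^{\alpha_{\cdot}\beta_{\cdot}}\Big[\int_{0}^{\tau}f(x_{t})e^{-\phi_{t}}\,dt+g(x_{\tau})e^{-\phi_{\tau}}\Big]:=E\Big[g\big(x^{\alpha_{\cdot}\beta_{\cdot}x}_{\tau^{\alpha_{\cdot}\beta_{\cdot}x}}\big)e^{-\phi^{\alpha_{\cdot}\beta_{\cdot}x}_{\tau^{\alpha_{\cdot}\beta_{\cdot}x}}}+\int_{0}^{\tau^{\alpha_{\cdot}\beta_{\cdot}x}}f^{\alpha_{t}\beta_{t}}\big(x^{\alpha_{\cdot}\beta_{\cdot}x}_{t}\big)e^{-\phi^{\alpha_{\cdot}\beta_{\cdot}x}_{t}}\,dt\Big].$$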

Observe that this definition makes perfect sense due to Theorem 2.2.1 of [4] and $v(x)=g(x)$ in $\mathbb{R}^{d}\setminus G$. Similar abbreviated notation will be used in other cases when the underlying processes and functions depend on initial data or other parameters and functions.

Before stating our first main result we introduce two more assumptions and a notation.

Assumption 2.2.

For any $\varepsilon>0$ , there exists a finite set $\{\alpha(1),...,\alpha(n_{\varepsilon})\}\subset A$ such that for any $\alpha\in A$ there exists an $i\in\{1,...,n_{\varepsilon}\}$ such that for $u=\sigma,b,c,f$ it holds that

$$\sup_{\beta\in B,\,x\in G}|u^{\alpha\beta}(x)-u^{\alpha(i)\beta}(x)|\leq\varepsilon.\tag{2.2}$$

As is easy to see one can choose $i=i_{\varepsilon}(\alpha)$ satisfying (2.2) to be a Borel function.

Assumption 2.3.

Either $\sigma^{\alpha\beta}(x)$ are symmetric positive-definite matrix-valued functions or there is a constant $\nu>0$ such that $\sigma_{i,d_{1}-d+j}^{\alpha\beta}(x)=\nu\delta_{ij}$ for all $i,j\leq d$ and all $\alpha,\beta,x$ .

The second part of this assumption means that the last $d$ columns of $\sigma$ form an identity matrix multiplied by $\nu$. The only use of this assumption is (4.7), which can be satisfied in many other situations.

Take and fix a $\zeta\in C^{\infty}_{0}(\mathbb{R}^{d})$ with unit integral and for a Borel measurable $B$ -valued function $\beta(\alpha,x)$ on $A\times\mathbb{R}^{d}$ and bounded measurable functions $h(\alpha,\beta,x)$ given on $A\times B\times\mathbb{R}^{d}$ and $\rho>0$ set

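$$h^{(\rho)}(\alpha,y,x)=\int_{\mathbb{R}^{d}}h(\alpha,\beta(\alpha,y+\rho z),x+\rho z)\zeta(z)\,dz,\quad h^{(\rho)}(\alpha,y)=h^{(\rho)}(\alpha,y,y).\tag{2.3}$$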
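One consequence of this definition is that, under the second alternative in Assumption 2.3, the mollified diffusion coefficient keeps the same block structure, because $\zeta$ has unit integral. The computation below is an illustrative sketch: the splitting of $\sigma^{(\rho)}$ into its first $d_{1}-d$ columns $\tilde{\sigma}^{(\rho)}$ and its last $d$ columns is notation introduced here, and no claim is made that this is exactly how the assumption enters (4.7).

```latex
\begin{align*}
\sigma^{(\rho)}(\alpha,y)
  &=\int_{\mathbb{R}^{d}}\sigma(\alpha,\beta(\alpha,y+\rho z),y+\rho z)\zeta(z)\,dz
   =\big(\tilde{\sigma}^{(\rho)}(\alpha,y),\ \nu I\big),\\
\big|(\sigma^{(\rho)}(\alpha,y))^{*}\lambda\big|^{2}
  &=\big|(\tilde{\sigma}^{(\rho)}(\alpha,y))^{*}\lambda\big|^{2}+\nu^{2}|\lambda|^{2}
   \geq\nu^{2}|\lambda|^{2},\qquad\lambda\in\mathbb{R}^{d}.
\end{align*}
```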

Theorem 2.1.

Under the above assumptions for any $\varepsilon>0$ there exist a Borel measurable $B$ -valued function $\beta(\alpha,x)$ on $A\times\mathbb{R}^{d}$ and $\rho_{0}>0$ such that, if, for $\rho\in(0,\rho_{0}]$ , $x\in G$ , and $\alpha_{\cdot}\in\mathfrak{A}$ , we define the process $y_{t}=y_{t}^{\alpha_{\cdot}x}(\rho)$ as a solution of

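$$dy_{t}=\sigma^{(\rho)}(\alpha_{t},y_{t})\,dw_{t}+b^{(\rho)}(\alpha_{t},y_{t})\,dt,\quad t\geq 0,\quad y_{0}=x,\tag{2.4}$$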

where $\sigma^{(\rho)}$ and $b^{(\rho)}$ are defined according to (2.3), and set $\bm{\beta}^{\rho}_{t}(\alpha_{\cdot},x)=\beta(\alpha_{t},y_{t}^{\alpha_{\cdot}x}(\rho))$, then

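$$v(x)\leq\sup_{\alpha_{\cdot}\in\mathfrak{A}}E_{x}^{\alpha_{\cdot}\bm{\beta}^{\rho}(\alpha_{\cdot},x)}\Big[\int_{0}^{\tau}f(x_{t})e^{-\phi_{t}}\,dt+g(x_{\tau})e^{-\phi_{\tau}}\Big]\leq v(x)+\varepsilon.\tag{2.5}$$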

Furthermore, there exists a finite number of mutually disjoint subsets $A_{i},i=1,...,n$ , of $A$ such that $A=\bigcup_{i}A_{i}$ and for each $i$ we have $\beta(\alpha_{1},x)=\beta(\alpha_{2},x)$ whenever $\alpha_{1},\alpha_{2}\in A_{i}$ .
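The coefficients of (2.4) are Lipschitz in $y$ even though $\beta(\alpha,x)$ is only Borel measurable, and a change of variables in (2.3) makes this visible. The following is a sketch for a bounded measurable $h$, with $w=y+\rho z$ as the new integration variable; the constant on the right is not optimized, and $D\zeta$ denotes the gradient of $\zeta$.

```latex
\begin{align*}
h^{(\rho)}(\alpha,y)
  &=\int_{\mathbb{R}^{d}}h(\alpha,\beta(\alpha,y+\rho z),y+\rho z)\zeta(z)\,dz
   =\rho^{-d}\int_{\mathbb{R}^{d}}h(\alpha,\beta(\alpha,w),w)\,
     \zeta\big((w-y)/\rho\big)\,dw,\\
\big|D_{y}h^{(\rho)}(\alpha,y)\big|
  &\leq\rho^{-d-1}\sup|h|\int_{\mathbb{R}^{d}}\big|(D\zeta)\big((w-y)/\rho\big)\big|\,dw
   =\rho^{-1}\sup|h|\int_{\mathbb{R}^{d}}|D\zeta(z)|\,dz.
\end{align*}
```

Applied to $h=\sigma$ and $h=b$, which are bounded by $K_{0}$, this gives a Lipschitz constant of order $\rho^{-1}$ that does not depend on $\alpha$.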

Observe that, obviously, (2.4) has a unique solution. Strategies like

$$\beta(\alpha_{t},y_{t}^{\alpha_{\cdot}x}(\rho))$$

are naturally called adjoint Markov strategies, because their actions at time $t$, although not based only on the current action of $\alpha$ and the current state of $x_{t}$, still use, instead of the latter, the current state of an adjoint process $y_{t}=y_{t}^{\alpha_{\cdot}x}(\rho)$, which, as we will see, is close to $x_{t}=x_{t}^{\alpha_{\cdot}\bm{\beta}^{\rho}(\alpha_{\cdot},x)x}$ if $\rho$ is small.

In the next theorem Assumption 2.3 is not used.

Theorem 2.2.

In Theorem 2.1 drop Assumption 2.3 but suppose that on $(\Omega,\mathcal{F},P)$ there is a Wiener process $(\hat{w}_{t},\mathcal{F}_{t}),t\geq 0$ , independent of $w_{t}$ . Then for any $\varepsilon>0$ there exists a constant $\nu>0$ such that all assertions of Theorem 2.1 hold true if we add to the right-hand side of (2.4) the term $\nu\,d\hat{w}_{t}$ .
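The role of the extra term can be made explicit. With $\nu\,d\hat{w}_{t}$ added, the adjoint equation is driven by the combined Wiener process $(w_{t},\hat{w}_{t})$, and its diffusion matrix is nondegenerate whatever $\sigma^{(\rho)}$ is. The display below is a sketch of this computation, assuming that $\hat{w}_{t}$ is $d$-dimensional (as the term $\nu\,d\hat{w}_{t}$ requires); the symbol $\Sigma$ is introduced here only for illustration.

```latex
\begin{align*}
dy_{t}&=\sigma^{(\rho)}(\alpha_{t},y_{t})\,dw_{t}+\nu\,d\hat{w}_{t}
        +b^{(\rho)}(\alpha_{t},y_{t})\,dt
       =\Sigma(\alpha_{t},y_{t})\,d(w_{t},\hat{w}_{t})
        +b^{(\rho)}(\alpha_{t},y_{t})\,dt,\\
\Sigma&:=\big(\sigma^{(\rho)},\ \nu I\big),\qquad
\tfrac{1}{2}\Sigma\Sigma^{*}
       =\tfrac{1}{2}\sigma^{(\rho)}\big(\sigma^{(\rho)}\big)^{*}
        +\tfrac{\nu^{2}}{2}I
       \geq\tfrac{\nu^{2}}{2}I.
\end{align*}
```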

Here we see another instance of adjoint Markov strategies of the player $\beta$. With the choice $\bm{\beta}^{\rho}_{t}(\alpha_{\cdot},x)=\beta(\alpha_{t},y_{t}^{\alpha_{\cdot}x}(\rho))$ the process $x_{t}=x_{t}^{\alpha_{\cdot}\bm{\beta}^{\rho}(\alpha_{\cdot},x)x}$ satisfies

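$$dx_{t}=\sigma(\alpha_{t},\beta(\alpha_{t},y_{t}),x_{t})\,dw_{t}+b(\alpha_{t},\beta(\alpha_{t},y_{t}),x_{t})\,dt,\quad t\geq 0,\quad x_{0}=x,\tag{2.6}$$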

where $y_{t}$ is defined from (2.4). Therefore, for the player $\alpha$ to find an adequate response to the above strategy $\bm{\beta}^{\rho}_{t}(\alpha_{\cdot},x)$, he should solve a more or less standard problem of optimal control of the two-component diffusion process $(y_{t},x_{t})$ governed by the system (2.4)-(2.6) and maximize the expectation in (2.5). An unpleasant feature of this couple is that it is always a degenerate process. It turns out that one can reduce the problem to optimal control of only $y_{t}$ when $\rho$ is sufficiently small, and then the same Theorem 2.1 applied in the case of only one player will provide an adjoint Markov policy while controlling $y_{t}$, which will become an adjoint Markov policy of $\alpha$ in the original game. The above mentioned reduction of the optimal control problem is based on the following.

Lemma 2.3.

One more assertion can be added in Theorems 2.1 and 2.2: for any $\alpha_{\cdot}\in\mathfrak{A}$

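$$\Big|E_{x}^{\alpha_{\cdot}\bm{\beta}^{\rho}(\alpha_{\cdot},x)}\Big[\int_{0}^{\tau}f(x_{t})e^{-\phi_{t}}\,dt+g(x_{\tau})e^{-\phi_{\tau}}\Big]-E_{x}^{\alpha_{\cdot}}\Big[\int_{0}^{\tau(\rho)}f^{(\rho)}(y_{t}(\rho))e^{-\phi_{t}(\rho)}\,dt+g(y_{\tau(\rho)}(\rho))e^{-\phi_{\tau(\rho)}(\rho)}\Big]\Big|\leq\varepsilon,$$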

where

$$\phi^{\alpha_{\cdot}x}_{t}(\rho)=\int_{0}^{t}c^{(\rho)}(\alpha_{s},y^{\alpha_{\cdot}x}_{s}(\rho))\,ds,$$

where $f^{(\rho)}$ and $c^{(\rho)}$ are defined according to (2.3), and $\tau^{\alpha_{\cdot}x}(\rho)$ is the first exit time of $y^{\alpha_{\cdot}x}_{t}(\rho)$ from $G$ .

This lemma and Theorems 2.1 and 2.2 almost immediately lead to the following result about $\varepsilon$ -optimal adjoint Markov policies for $\alpha$ .

Theorem 2.4.

Let either

(a) the assumptions of Theorem 2.1 be satisfied, or

(b) the assumptions of Theorem 2.2 be satisfied.

Take $\varepsilon>0$, $x\in G$, $\rho$, and $\bm{\beta}^{\rho}(\alpha_{\cdot},x)$ from Theorem 2.1 or 2.2, respectively. Then there exist a $d\times d_{1}$-matrix-valued $\hat{\sigma}(x)$ and an $\mathbb{R}^{d}$-valued $\hat{b}(x)$, given on $\mathbb{R}^{d}$ and Lipschitz continuous in $x$, there exists a Borel measurable $A$-valued function $\alpha^{\varepsilon}(x)$ on $\mathbb{R}^{d}$, and in case (b) there also exists a constant $\nu>0$, such that, if for $x\in G$ we define the process $z_{t}=z^{x}_{t}$ by

$$dz_{t}=\hat{\sigma}(z_{t})\,dw_{t}+\hat{b}(z_{t})\,dt,\quad t\geq 0,\quad z_{0}=x,\tag{2.8}$$

in case (a), and with the additional term $\nu\,d\hat{w}_{t}$ on the right-hand side of (2.8) in case (b), and set $\alpha^{\varepsilon}_{t}=\alpha^{\varepsilon}(z^{x}_{t})$, then

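$$\sup_{\alpha_{\cdot}\in\mathfrak{A}}E_{x}^{\alpha_{\cdot}\bm{\beta}^{\rho}(\alpha_{\cdot},x)}\Big[\int_{0}^{\tau}f(x_{t})e^{-\phi_{t}}\,dt+g(x_{\tau})e^{-\phi_{\tau}}\Big]\leq E_{x}^{\alpha^{\varepsilon}_{\cdot}\bm{\beta}^{\rho}(\alpha^{\varepsilon}_{\cdot},x)}\Big[\int_{0}^{\tau}f(x_{t})e^{-\phi_{t}}\,dt+g(x_{\tau})e^{-\phi_{\tau}}\Big]+\varepsilon.$$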

Remark 2.1.

The above results hold under milder assumptions than the ones imposed. For instance, an absolutely cheap generalization is that it suffices to have $g\in C(\mathbb{R}^{d})$ rather than $g\in C^{2}(\mathbb{R}^{d})$, because one can use uniform approximations of $g$. The domain $G$ also need not be of class $C^{2}$. It is quite sufficient for it to satisfy the exterior cone condition or be even worse than that. Again appropriate approximations would do the job.

The point of the article was to promote adjoint Markov policies and strategies, rather than deal with numerous side problems arising along the way.

Example 2.1.

Let $d=1$ , $G=(-1,1)$ , $A=B=\{\pm 1\}$ , $\sigma(\alpha,\beta)=\beta$ , $c=0$ , $f=(1-|x+\alpha\beta|)_{+}$ , $g\equiv 0$ . The Isaacs equation is

$$\sup_{\alpha\in A}\inf_{\beta\in B}\big[(1/2)u''+(1-|x+\alpha\beta|)_{+}\big]=0,$$

which is equivalent to

$$0=(1/2)u''+\sup_{\alpha\in A}\inf_{\beta\in B}(1-|x+\alpha\beta|)_{+}=(1/2)u''.$$

The solution of this equation in $G$ with zero boundary data is zero. The inf inside is zero for any $\alpha$ and is obtained on $\beta(\alpha,x)=\alpha\,\text{sign}\,x$ ( $\text{sign}\,0:=-1$ ).
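Both claims can be verified directly. The following sketch uses only the data of the example: $\alpha^{2}=1$ for $\alpha\in\{\pm 1\}$, $a=(1/2)\sigma^{2}=(1/2)\beta^{2}=1/2$, and the convention $\text{sign}\,0:=-1$.

```latex
\begin{align*}
\beta=\alpha\,\operatorname{sign}x
  &\implies\alpha\beta=\alpha^{2}\operatorname{sign}x=\operatorname{sign}x,\\
|x+\operatorname{sign}x|
  &=\begin{cases}1+|x|,&x\neq 0,\\ 1,&x=0,\end{cases}
  \quad\text{so}\quad(1-|x+\alpha\beta|)_{+}=0,\\
0\leq\inf_{\beta\in B}(1-|x+\alpha\beta|)_{+}
  &\leq(1-|x+\operatorname{sign}x|)_{+}=0
  \quad\text{for each }\alpha\in A,\ x\in G,\\
\tfrac{1}{2}u''=0\ \text{in }(-1,1),\quad u(\pm 1)=0
  &\implies u\equiv 0.
\end{align*}
```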

Like in [1] and [2], let our probability space be the space $C([0,\infty))$ of real-valued continuous functions on $[0,\infty)$ with Wiener measure on the $\sigma$ -field of Borel subsets of $C([0,\infty))$ . Let the Wiener process be defined by $w_{t}(x_{\cdot})=x_{t}$ , $t\geq 0$ . Also let $\mathcal{F}_{t}$ be the $\sigma$ -field generated by $w_{s},s\leq t$ .

In this situation the equation

$$dx_{t}=\operatorname{sign}x_{t}\,dw_{t},\quad t\geq 0,\quad x_{0}=0\tag{2.10}$$

does not have $\mathcal{F}_{t}$ -adapted solutions at all (Tanaka’s example), and $\beta$ cannot use the strategy $\beta(\alpha,x)=\alpha\,\text{sign}\,x$ , since $\alpha$ can choose to be 1 for all times.

The author believes that in this example there are no (time-homogeneous) $\varepsilon$-optimal adjoint Markov strategies for $\beta$ if $\varepsilon$ is small enough. Regarding time-inhomogeneous adjoint Markov strategies the reader is referred to [5]. However, our results show that, if we just take two independent copies of our probability space with $w_{t}$ being the Wiener process on one copy and $\hat{w}_{t}$ being the Wiener process on the other, take a mollification $\chi(x)$ of $\text{sign}\,x$, take a $\nu>0$, and introduce an adjoint process by

$$dy_{t}=\alpha_{t}\chi(y_{t})\,dw_{t}+\nu\,d\hat{w}_{t},\quad t>0,\quad y_{0}=0,$$

then the strategy $\bm{\beta}_{t}(\alpha_{\cdot})=\alpha_{t}\text{sign}\,y_{t}$ will be $\varepsilon$-optimal for $\beta$ if the mollification is done with a kernel of sufficiently small size and $\nu$ is sufficiently small. By the way, on the probability space thus extended, (2.10) still does not have solutions.

Assumption 2.4.

Assumption 2.2 is not necessarily satisfied, but for any $\varepsilon>0$ , there exists a finite set $\{\beta(1),...,\beta(n_{\varepsilon})\}\subset B$ such that for any $\beta\in B$ there exists an $i\in\{1,...,n_{\varepsilon}\}$ such that for $u=\sigma,b,c,f$ it holds that

$$\sup_{\alpha\in A,\,x\in G}|u^{\alpha\beta}(x)-u^{\alpha\beta(i)}(x)|\leq\varepsilon,\tag{2.11}$$

and for any $u_{ij},u_{i},u$ on $G$ we have

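$$\sup_{\alpha\in A}\inf_{\beta\in B}\big[a^{\alpha\beta}_{ij}u_{ij}+b^{\alpha\beta}_{i}u_{i}-c^{\alpha\beta}u+f^{\alpha\beta}\big]=\inf_{\beta\in B}\sup_{\alpha\in A}\big[a^{\alpha\beta}_{ij}u_{ij}+b^{\alpha\beta}_{i}u_{i}-c^{\alpha\beta}u+f^{\alpha\beta}\big].\tag{2.12}$$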

When the Isaacs condition (2.12) is satisfied it is natural to introduce $\mathbb{A}$ as the set of $\mathfrak{A}$-valued functions $\bm{\alpha}(\beta_{\cdot})$ on $\mathfrak{B}$ such that, for any $T\in(0,\infty)$ and any $\beta^{1}_{\cdot},\beta^{2}_{\cdot}\in\mathfrak{B}$ satisfying

$$P\big(\beta^{1}_{t}=\beta^{2}_{t}\ \text{for almost all}\ t\leq T\big)=1,$$

we have

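$$P\big(\bm{\alpha}_{t}(\beta^{1}_{\cdot})=\bm{\alpha}_{t}(\beta^{2}_{\cdot})\ \text{for almost all}\ t\leq T\big)=1.$$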

Theorem 2.5.

Under the Assumptions 2.1, 2.3, and 2.4 for any $\varepsilon>0$ there exist a Borel measurable $A$ -valued function $\alpha(x)$ on $\mathbb{R}^{d}$ and $\rho_{0}>0$ such that, if for $\rho\in(0,\rho_{0}]$ , $x\in G$ , and $\beta_{\cdot}\in\mathfrak{B}$ we define the process $y_{t}=y_{t}^{\beta_{\cdot}x}(\rho)$ as a solution of

[TABLE]

where $\sigma^{(\rho)}$ and $b^{(\rho)}$ are found following the example

[TABLE]

and set $\bm{\alpha}^{\rho}_{t}(\beta_{\cdot},x)=\alpha(y_{t}^{\beta_{\cdot}x}(\rho))$, then

[TABLE]

Remark 2.2.

An analogous theorem is valid when we drop Assumption 2.3 in Theorem 2.5 but suppose that on $(\Omega,\mathcal{F},P)$ there is a Wiener process $(\hat{w}_{t},\mathcal{F}_{t}),t\geq 0$, independent of $w_{t}$.

Remark 2.3.

Observe that in Theorem 2.1 we are talking about the function $\beta(\alpha,x)$ depending both on $\alpha$ and $x$ and in Theorem 2.5 we have a function $\alpha(x)$ of only $x$ . Of course, this is because (2.11) is assumed in Theorem 2.5.

Remark 2.4.

As a corollary of Theorems 2.1 and 2.5 we obtain the well-known fact that our game has a value and our strategies for $\beta$ and $\alpha$ form, so to speak, an $\varepsilon$-saddle point, and the game may be called fair.

3. Auxiliary results

Here is a well-known result which, for instance, is a particular case of Lemma 2.1 of [7].

Lemma 3.1.

Let $\sigma_{t}$ be a $d\times d_{1}$-matrix-valued and $b_{t}$ an $\mathbb{R}^{d}$-valued progressively measurable function on $\Omega\times(0,\infty)$. Suppose that

[TABLE]

for all $\lambda\in\mathbb{R}^{d}$ and $(\omega,t)$ , where $\nu>0$ is a fixed constant. Take $x\in G$ and define $\tau$ as the first exit time from $G$ of

[TABLE]

Then for any $n=1,2,...$ there exists a constant $N$ , depending only on $n$ , $d$ , $\nu$ , $K_{0}$ , and the diameter of $G$ , such that $E\tau^{n}\leq N$ .

The following result is also very well known (can be obtained, for instance, by combining Lemma 2.8 of [3] and Lemma 8.5 and Theorem 3.1 of [6]). By $\mathbb{S}_{\delta}$ we denote the set of $d\times d$ symmetric matrices whose eigenvalues are between $\delta$ and $\delta^{-1}$ . Introduce $D_{i}=\partial/\partial x^{i}$ , $D_{ij}=D_{i}D_{j}$ and let $Du$ denote the gradient of $u$ .

Lemma 3.2.

Let $\nu\in(0,1]$ . Then there exists a function $\Phi\in C^{2}(G)$ such that $\Phi>0$ on $G$ , $\Phi=0$ on $\partial G$ , $|D\Phi|\geq 1$ on $\partial G$ , and

[TABLE]

on $G$ for any $a=(a_{ij})\in\mathbb{S}_{\nu}$ and $b=(b_{i})$ such that $|b|\leq K_{0}$ .

The next few results are needed while investigating how far the adjoint processes are from the actual controlled ones.

Lemma 3.3.

Let $\sigma^{(i)}_{t}(y,x)$ , $i=1,2$ , be $d\times d_{1}$ -matrix-valued and $b^{(i)}_{t}(y,x)$ , $i=1,2$ , be $\mathbb{R}^{d}$ -valued functions on $\Omega\times[0,\infty)\times\mathbb{R}^{d}\times\mathbb{R}^{d}$ . Suppose that for each $T\in[0,\infty)$ these functions restricted to $\Omega\times[0,T]\times\mathbb{R}^{d}\times\mathbb{R}^{d}$ are measurable with respect to $\mathcal{F}_{T}\otimes\mathcal{B}(\mathbb{R}^{d})\otimes\mathcal{B}(\mathbb{R}^{d})$ , where $\mathcal{B}(\mathbb{R}^{d})$ is the Borel $\sigma$ -field in $\mathbb{R}^{d}$ . Assume that $\sigma^{(i)}_{t}$ and $b^{(i)}_{t}$ are progressively measurable for any $(x,y)$ , $\sigma^{(1)}_{t}(y,x)$ and $b^{(1)}_{t}(y,x)$ are Lipschitz continuous with respect to $x$ with constant $K_{1}$ , and $\sigma^{(2)}_{t}(y,y)$ and $b^{(2)}_{t}(y,y)$ are Lipschitz continuous with respect to $y$ with a constant independent of $(\omega,t)$ . Suppose that there exists a function $\Delta(y)$ on $G$ such that for any $y\in G$

[TABLE]

for all $(\omega,t)$ . Also suppose that $\sigma_{t}^{(i)}$ and $b_{t}^{(i)}$ satisfy (3.1) and $\sigma_{t}^{(2)}$ satisfies (3.2) for all values of indices, arguments, and all $\lambda\in\mathbb{R}^{d}$ .

Take $x\in G$ and define the processes $x_{t}$ and $y_{t}$ by

[TABLE]

Obviously this system has a unique solution. Finally, set $\theta$ to be the minimum of the exit times of $x_{t}$ and $y_{t}$ from $G$ . Then, for any $T\in(0,\infty)$ , we have

[TABLE]

where $N$ depends only on $d$ , $\nu$ , $K_{0}$ , $K_{1}$ , and the diameter of $G$ .

Proof. We modify the coefficients of system (3.4) by multiplying them by $I_{\theta>t}$ , which does not affect (3.5), allows us to eliminate $\theta$ from it and also allows us to formally apply Theorem 2.5.9 of [4] according to which the left-hand side of (3.5) is less than

[TABLE]

where $N=N(K_{1})$ . In light of (3.3), the expectation here is estimated by

[TABLE]

and it only remains to apply Theorem 2.2.2 of [4]. The lemma is proved.

Corollary 3.4.

Under the assumptions of Lemma 3.3, for any $T\in(0,\infty)$, we have

[TABLE]

where $I$ is the right-hand side of (3.5) and $N$ depends only on $d,\nu,K_{0}$ , and the diameter of $G$ .

Indeed, it suffices to use Lemma 3.3 and observe that

[TABLE]

Lemma 3.5.

Let $\sigma^{(i)}_{t}$ , $b^{(i)}_{t}$ , $i=1,2$ , be as in Lemma 3.3 but independent of $(y,x)$ and assume that they satisfy (3.1) and (3.2) for all values of indices, arguments, and all $\lambda\in\mathbb{R}^{d}$ . Take $h\in L_{d}(G)$ , $x\in G$ , and set

[TABLE]

Introduce $\theta$ as the minimum of the first exit times of $x^{(i)}_{t}$ , $i=1,2$ , from $G$ . Let $\chi_{t}^{(i)}$ , $i=1,2$ , be real-valued jointly measurable processes given on $[0,\theta]$ and bounded by a constant $K_{2}$ .

Then for any $\kappa,\gamma>0$

[TABLE]

where $N_{1}(\gamma)$ depends only on $h$ , $\gamma$ , $d$ , $\nu$ , $K_{0}$ , and the diameter of $G$ , and $N_{1}(\gamma)\to 0$ as $\gamma\to\infty$ , $N_{2}(\kappa)$ depends only on $h$ , $\kappa,d,\nu,K_{0}$ , and the diameter of $G$ , and $N_{2}(\kappa)\to 0$ as $\kappa\downarrow 0$ and $N_{3}$ depends only on $d$ , $\nu$ , $K_{0}$ , and the diameter of $G$ .

Proof. First observe that

[TABLE]

where

[TABLE]

By Theorem 2.2.2 of [4]

[TABLE]

where $N$ depends only on $d$ , $\nu$ , $K_{0}$ , and the diameter of $G$ . It follows that it suffices to prove the lemma for $\chi^{(i)}\equiv 1$ .

In that case we extend $h$ beyond $G$ by setting it to be zero there, which does not affect (3.6), introduce $h^{(\kappa)}$ as the convolution of $h$ and $\kappa^{-d}\zeta(x/\kappa)$ , and replace $h$ in the left-hand side of (3.6) with $h^{(\kappa)}$ . The error of the replacement is less than

[TABLE]

which by Theorem 2.2.2 of [4] is less than a constant, depending only on $\nu$ , $d$ , $K_{0}$ , and the diameter of $G$ , times

[TABLE]

which tends to zero as $\kappa\downarrow 0$ . This gives us the term $N_{2}(\kappa)$ on the right in (3.6). Finally,

[TABLE]

The lemma is proved.
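The mollification step in the proof above can be illustrated numerically. The following is a minimal sketch in dimension $d=1$ (so that the $L_{d}$ -norm is the $L_{1}$ -norm), assuming a specific bump function for $\zeta$ and an indicator function for $h$ ; the names `bump_kernel`, `mollify`, and `l1_error` are hypothetical and not from the paper. It only shows that the replacement error $\|h^{(\kappa)}-h\|_{L_{1}}$ shrinks as $\kappa\downarrow 0$ .

```python
import numpy as np

def bump_kernel(kappa, dx):
    """Discretised kappa^{-1} * zeta(x / kappa) in d = 1, normalised to unit integral.

    Assumes kappa >= 2 * dx so that the kernel has interior nodes.
    """
    n = int(round(kappa / dx))
    s = np.arange(-n, n + 1) / n              # rescaled nodes in [-1, 1]
    z = np.zeros_like(s)
    inside = np.abs(s) < 1.0
    z[inside] = np.exp(-1.0 / (1.0 - s[inside] ** 2))   # standard smooth bump (an assumption)
    return z / (z.sum() * dx)

def mollify(h, kappa, dx):
    """h^{(kappa)}: convolution of h with the rescaled kernel on a uniform grid."""
    return np.convolve(h, bump_kernel(kappa, dx), mode="same") * dx

def l1_error(kappa, dx=1e-3):
    """L_1 distance between h and h^{(kappa)} for h = indicator of [0.25, 0.75]."""
    x = np.arange(-1.0, 2.0, dx)              # grid larger than G = (0, 1): h is extended by zero
    h = ((x >= 0.25) & (x <= 0.75)).astype(float)
    return np.sum(np.abs(mollify(h, kappa, dx) - h)) * dx
```

Comparing `l1_error(0.1)` with `l1_error(0.02)` shows the decrease that produces the term $N_{2}(\kappa)$ ; the smoothness of $h^{(\kappa)}$ is what the remaining part of the proof exploits.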

4. Proof of Theorems 2.1 and 2.2 and Lemma 2.3

Recall that $a^{\alpha\beta}=(1/2)\sigma^{\alpha\beta}\big{(}\sigma^{\alpha\beta}\big{)}^{*}$ and for sufficiently smooth functions $u=u(x)$ introduce

[TABLE]

Also set

[TABLE]

Lemma 4.1.

Take $u\in W^{2}_{d}(G)$ and $m\in\{1,2,...\}$ . Then for any $\alpha\in A$ there exists a Borel $B$ -valued function $\beta(x)$ on $\mathbb{R}^{d}$ such that for almost all $x\in G$

[TABLE]

where

[TABLE]

Proof. Fix $\alpha\in A$ and $u\in W^{2}_{d}(G)$ and choose $u$ , $Du$ , and $D^{2}u$ so that they are Borel functions. Then let $\{\beta(i),i=1,2,...\}$ be a countable everywhere dense set in $B$ . Since $a,b,c,f$ are continuous in $\beta$ ,

[TABLE]

and for any $x\in G$ there exists $\beta(i)$ with the least $i=i(x)$ for which

[TABLE]

As is easy to see, $i(x)$ is a Borel function, and so is $\beta(i(x))$ . For $x\not\in G$ set $\beta(x)=\beta_{0}$ , where $\beta_{0}$ is any element of $B$ . This gives the function we need, and the lemma is proved.
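The "least index" selection used in this proof can be sketched on a grid. This is only an illustration under simplifying assumptions: the countable dense set is truncated to finitely many candidates, the points $x$ are replaced by grid points, and the selection criterion is taken to be "within a tolerance of the minimum over the candidates" (the exact inequality of the lemma is not reproduced here). The function name `first_near_minimizer` is hypothetical.

```python
import numpy as np

def first_near_minimizer(values, tol):
    """Pick, for each grid point, the least candidate index that is near-optimal.

    values[i, k] is the criterion evaluated at candidate beta(i) and grid point x_k.
    Returns an integer array i(x_k): the smallest i with
    values[i, k] <= min_j values[j, k] + tol.
    """
    values = np.asarray(values, dtype=float)
    near = values <= values.min(axis=0) + tol
    return near.argmax(axis=0)    # argmax over booleans returns the first True
```

Taking the least admissible index makes the choice unique at every point, which is why the resulting selector inherits measurability from the finitely (or countably) many sets on which each index is chosen.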

Lemma 4.2.

Take $u\in W^{2}_{d}(G)$ and $m\in\{1,2,...\}$ . Then there exists a finite family of Borel $B$ -valued functions $\{\beta(1),...,\beta(n_{m})\}$ on $\mathbb{R}^{d}$ and a Borel $B$ -valued function $\beta(\alpha,x)$ on $A\times\mathbb{R}^{d}$ such that

(i) $\beta(\alpha,\cdot)\in\{\beta(1),...,\beta(n_{m})\}$ for any $\alpha\in A$ ;

(ii) for

[TABLE]

we have

[TABLE]

Proof. Again choose $u$ , $Du$ , and $D^{2}u$ so that they are Borel functions and take $\{\alpha(1),...,\alpha(n_{\varepsilon})\}$ from Assumption 2.2 for $\varepsilon=1/m$ . Then let $\beta(i,x)$ be functions found from Lemma 4.1 corresponding to $\alpha(i)$ , $i=1,...,n_{\varepsilon}$ . Define $i(\alpha)$ to be the first $i$ for which (2.2) holds. Finally, set

[TABLE]

By Assumption 2.2, for any $\alpha\in A$ , $\beta\in B$ , and $x\in G$ ,

[TABLE]

where and below the constants denoted by $N$ depend only on $d$ . By plugging in $\beta=\beta(\alpha,x)=\beta(i(\alpha),x)$ we find that, for any $\alpha\in A$ and $x\in G$

[TABLE]

where the last inequality is due to $H^{\alpha}[u]\leq H[u]$ . This yields (4.3) with $m^{-1}$ on the right multiplied by $N$ times the $L_{d}(G)$ -norm of $1+|u|+|Du|+|D^{2}u|$ . Obviously, this is enough and the lemma is proved.

Set

[TABLE]

By Theorem 14.1.6 of [9] for each $K$ the equation

[TABLE]

in $G$ (a.e.) with boundary condition $u_{K}=g\in C^{2}$ has a solution $u_{K}\in W^{2}_{p}(G)$ for any $p>1$ . By following the arguments in Section 7 of [8], we conclude that the $u_{K}$ ’s admit a representation as the value functions in the corresponding stochastic games and by Theorem 7.1 of [8] we have $u_{K}\downarrow v$ uniformly on $\bar{G}$ as $K\to\infty$ . Observe that (a.e.) in $G$

[TABLE]

Next, fix $K>0$ and $m\in\{1,2,...\}$ . Below we introduce some objects which may change as we change $K$ and $m$ , but we still do not exhibit their dependence on $K,m$ for simplicity of notation and because $K,m$ are fixed for now.

Let $\{\beta(1),...,\beta(n)\}$ and $\beta(\alpha,x)$ be the family of functions $\beta(i)$ and function $\beta(\alpha,x)$ from Lemma 4.2 with $u_{K}$ in place of $u$ . Observe that by construction and (4.4)

[TABLE]

where $h\geq 0$ is such that $\|h\|_{L_{d}(G)}\leq 1/m$ .

Use this $\beta(\alpha,x)$ in (2.3) and (2.4) to define $y_{t}=y_{t}^{\alpha_{\cdot}x}(\rho)$ , $\bm{\beta}^{\rho}_{t}(\alpha_{\cdot},x)=\beta(\alpha_{t},y_{t}^{\alpha_{\cdot}x}(\rho))$ , and $x_{t}=x_{t}^{\alpha_{\cdot}\bm{\beta}^{\rho}(\alpha_{\cdot},x)x}$ . First, we want to prove that $x_{t}$ and $y_{t}$ are close when $\rho$ is sufficiently small. This will be based in part on the fact that the couple $(y_{t},x_{t})$ is a solution of the system

[TABLE]

An important and easy consequence of Assumption 2.3 is that

[TABLE]

for all $\rho,\alpha,y$ .
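To make the coupled construction concrete, here is a minimal Euler-Maruyama sketch in dimension one with $A$ a singleton. Everything specific in it is an assumption made for illustration: the feedback `beta`, the coefficients `sigma` and `drift`, the quadrature for the mollification, and the reading of (2.3) as averaging the coefficient over the shifted feedback argument $\beta(y+\rho z)$ . Exit from $G$ is ignored. The point is only that the adjoint process $y_{t}$ is driven by smoothed coefficients, while the main process $x_{t}$ uses the control $\beta(y_{t})$ and the same Wiener increments.

```python
import numpy as np

# Quadrature nodes and weights for a bump function zeta on [-1, 1] (an assumption).
Z = np.linspace(-1.0, 1.0, 41)
_inner = np.abs(Z) < 1.0
W = np.zeros_like(Z)
W[_inner] = np.exp(-1.0 / (1.0 - Z[_inner] ** 2))
W /= W.sum()                                   # weights sum to one

def beta(y):
    """Hypothetical discontinuous Markov feedback with values in B = {-1, 1}."""
    return np.where(y > 0.0, 1.0, -1.0)

def sigma(b, x):
    """Hypothetical uniformly nondegenerate diffusion coefficient, Lipschitz in x."""
    return 1.0 + 0.25 * b + 0.1 * np.sin(x)

def drift(b, x):
    """Hypothetical bounded drift."""
    return -b

def smoothed(h, y, x, rho):
    """h^{(rho)}(y, x): average of h(beta(y + rho z), x) against zeta(z) dz."""
    return float(np.dot(W, h(beta(y + rho * Z), x)))

def simulate(x0, rho, T=1.0, n=1000, seed=0):
    """Euler-Maruyama paths of the couple (x_t, y_t) driven by the same noise."""
    rng = np.random.default_rng(seed)
    dt = T / n
    x = np.empty(n + 1)
    y = np.empty(n + 1)
    x[0] = y[0] = x0
    for k in range(n):
        dw = rng.normal(0.0, np.sqrt(dt))
        b = float(beta(y[k]))                  # control read off the adjoint process
        y[k + 1] = (y[k] + smoothed(sigma, y[k], y[k], rho) * dw
                    + smoothed(drift, y[k], y[k], rho) * dt)
        x[k + 1] = x[k] + sigma(b, x[k]) * dw + drift(b, x[k]) * dt
    return x, y
```

Away from the discontinuity of `beta` the smoothed and raw coefficients coincide, so the two equations differ only in a neighbourhood of size $\rho$ of the discontinuity set.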

Lemma 4.3.

For any vector-valued $h=h(\alpha,\beta,x)$ define

[TABLE]

Then for any $\varepsilon>0$ there exist $\rho_{0}>0$ and a function $\Delta^{h}_{\rho}(y)$ such that, for all $\rho\in(0,\rho_{0}]$ , $\hat{\alpha},\alpha\in A$ , $y\in G$ , and $h=\sigma,b,c,f$ we have

[TABLE]

Proof. According to Assumption 2.2 for any $\varepsilon>0$ there exists a finite subset $\hat{A}(\varepsilon)$ (independent of $\rho$ ) of $A$ such that

[TABLE]

Take an $\hat{\alpha}\in\hat{A}(\varepsilon)$ and observe that the set

[TABLE]

is finite (see Lemma 4.2) and each element of this set is bounded and measurable with respect to $y$ . By the Lebesgue theorem

[TABLE]

as $\rho\downarrow 0$ at almost any point $y\in\mathbb{R}^{d}$ . Hence,

[TABLE]

where $\Delta^{h}_{\rho,\varepsilon}$ are bounded uniformly with respect to $\rho$ and tend to zero as $\rho\downarrow 0$ (a.e.) in $\mathbb{R}^{d}$ , in particular, in $L_{d}(G)$ for any $\varepsilon$ . As a result, for any $\hat{\alpha},\alpha\in A$ and $y\in G$ ,

[TABLE]

where for all sufficiently small $\rho$

[TABLE]

This is, certainly, enough and the lemma is proved.

Lemma 4.4.

Introduce $\theta=\theta^{\alpha_{\cdot}\bm{\beta}^{\rho}(\alpha_{\cdot},x)x}(\rho)$ as the minimum of the first exit times of $x_{t}^{\alpha_{\cdot}\bm{\beta}^{\rho}(\alpha_{\cdot},x)x}$ and of $y_{t}^{\alpha_{\cdot}x}(\rho)$ from $G$ . Then

[TABLE]

as $\rho\downarrow 0$ uniformly with respect to $x\in G$ .

Proof. By Corollary 3.4 and Lemma 4.3, for any $\varepsilon,T>0$ the left-hand side of (4.9) is less than $Ne^{NT}\varepsilon+N/T$ , where $N$ is independent of $\rho,\varepsilon,T$ , for all small enough $\rho$ and so is its lim sup as $\rho\downarrow 0$ . Sending first $\varepsilon\downarrow 0$ and then $T\to\infty$ yields the desired result. The lemma is proved.
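The order of the limits in this proof matters because the factor $e^{NT}$ grows with $T$ . A small sketch of the bookkeeping, with hypothetical function names: given a target $\delta>0$ , one first fixes $T$ so that $N/T\leq\delta/2$ and only then picks $\varepsilon$ so that $Ne^{NT}\varepsilon\leq\delta/2$ .

```python
import math

def choose_T_and_eps(N, delta):
    """Return (T, eps) making N * exp(N * T) * eps + N / T equal to delta.

    T is fixed first from N / T = delta / 2; eps is then chosen depending on T.
    Assumes N * T is small enough for math.exp not to overflow.
    """
    T = 2.0 * N / delta
    eps = delta / (2.0 * N * math.exp(N * T))
    return T, eps

def bound(N, T, eps):
    """The bound N e^{NT} eps + N / T on the left-hand side of (4.9)."""
    return N * math.exp(N * T) * eps + N / T
```

The admissible $\varepsilon$ is exponentially small in $1/\delta$ , which is harmless here since Lemma 4.3 provides a suitable $\rho_{0}$ for every $\varepsilon>0$ .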

Corollary 4.5.

For $h=\sigma,b,c,f$ we have

[TABLE]

as $\rho\downarrow 0$ uniformly with respect to $x\in G$ , where $h^{(\rho)}(\alpha,y,x)$ is introduced according to (2.3).

Indeed, since $h$ is continuous in $x$ uniformly with respect to $(\alpha,\beta)$ , one can replace $x_{t}$ in (4.10) with $y_{t}(\rho)$ only incurring the error

[TABLE]

where $w(r)$ , $r\geq 0$ , is a bounded continuous function, $w(0)=0$ . By Lemmas 3.1 and 4.4 this error tends to zero as $\rho\downarrow 0$ uniformly with respect to $x\in G$ . Due to Theorem 2.2.2 of [4] and Lemma 4.3, what remains after the above mentioned replacement is less than a constant independent of $\rho$ times the $L_{d}$ -norm of $\Delta^{h}_{\rho}$ , which also tends to zero as $\rho\downarrow 0$ uniformly with respect to $x\in G$ .

Theorem 4.6.

For any $x\in G$ , $\rho,\gamma,\kappa>0$ we have

[TABLE]

where $N_{1}(\gamma)$ is independent of $\rho,\kappa$ , $N_{1}(\gamma)\to 0$ as $\gamma\to\infty$ , $N_{2}(\kappa)$ is independent of $\rho$ , $N_{2}(\kappa)\to 0$ as $\kappa\downarrow 0$ , $N$ depends only on $d,\delta,K_{0}$ , and the diameter of $G$ , $\mu(\rho)$ is independent of $\gamma,\kappa$ and $\mu(\rho)\to 0$ as $\rho\downarrow 0$ .

Proof. For simplicity of notation we drop the argument $\rho$ of $\theta$ and $y_{t}$ . Take $\alpha_{\cdot}\in\mathfrak{A}$ and observe that in the notation from Lemma 4.4 by Itô’s formula

[TABLE]

where, dropping obvious values of indices,

[TABLE]

By Lemma 3.5 with $h=u_{K},Du_{K},D^{2}u_{K}$ , for any $\kappa,\gamma>0$ , the last term in (4.13) is less than

[TABLE]

where $N_{1},N_{2},N_{3}$ are independent of $\alpha_{\cdot},\rho$ , and $x$ , $N_{1}(\gamma)\to 0$ as $\gamma\to\infty$ , $N_{2}(\kappa)\to 0$ as $\kappa\downarrow 0$ , and we use the notation

[TABLE]

By Corollary 4.5 the factor of $\gamma$ in (4.14) is dominated by $\mu(\rho)$ for an appropriate function $\mu(\rho)$ which tends to zero as $\rho\downarrow 0$ . The last term in (4.14) is dominated by $\mu(\rho)\kappa^{-2}$ .

After that taking into account (4.5) and Theorem 2.2.2 of [4] we see that

[TABLE]

where $N$ depends only on $d,\nu,K_{0}$ and the diameter of $G$ . We can replace the last $y_{t}$ in the integrand in (4.15) by $x_{t}$ incurring as in Corollary 4.5 another error term like $\mu(\rho)$ which goes to zero as $\rho\downarrow 0$ . By adding to this that

[TABLE]

where $N_{4}$ depends only on $u_{K}$ , $d$ , and $K_{0}$ , we see that to prove (4.12) it suffices now to show that

[TABLE]

as $\rho\downarrow 0$ uniformly with respect to $x$ . By Lemma 3.2 and Itô’s formula we have

[TABLE]

and it only remains to use Lemma 4.4 once more. The theorem is proved.

Proof of Theorem 2.1. First choose and fix $K$ and $m$ so that $|v-u_{K}|\leq\varepsilon/4$ and $Nm^{-1}\leq\varepsilon/4$ , where $N$ is taken from Theorem 4.6. Then find and fix $\kappa$ and $\gamma$ from $N_{1}(\gamma)+N_{2}(\kappa)\leq\varepsilon/4$ . Finally find $\rho$ such that

[TABLE]

Then (4.12) will become (2.14).

The last statement of the theorem follows by construction of $\bm{\beta}^{\rho}(\alpha_{\cdot},x)$ . The theorem is proved.

Remark 4.1.

An important particular case of Theorem 2.1 is when $\sigma,b,c,f$ are independent of $\alpha$ , so that we are actually dealing with a controlled diffusion process. Also, clearly, similar statements to Theorem 2.1 hold true if we exchange the roles of $\alpha$ and $\beta$ and consider the stochastic differential game corresponding to

[TABLE]

in place of (4.1). Of course, one should then replace Assumption 2.2 with a similar one about $B$ . To reduce this game to the one we are treating, it suffices just to rename $A$ and $B$ and take $-u$ , $-g$ and $-f$ in place of $u$ , $g$ , and $f$ , respectively.

Proof of Theorem 2.2. Fix $\nu>0$ and replace (1.1) with

[TABLE]

The solution of this equation is denoted by $x_{t}^{\alpha_{\cdot}\beta_{\cdot}x}(\nu)$ and by $\tau^{\alpha_{\cdot}\beta_{\cdot}x}(\nu)$ we denote its first exit time from $G$ . We take the same $c,f,g$ and define $v(x,\nu)$ by (2.1) where we replace $x_{t},\tau$ , and $\phi_{t}$ with $x_{t}(\nu),\tau(\nu)$ , and

[TABLE]

respectively. Obviously, we can apply Theorem 2.1 to the new stochastic differential game thus obtained and conclude that for any $\varepsilon>0$ there exist $\beta(\alpha,x)$ , with the properties described in Theorem 2.1, and $\rho_{0}>0$ such that if for $\rho\in(0,\rho_{0}]$ , $x\in G$ , and $\alpha_{\cdot}\in\mathfrak{A}$ we define the process $y_{t}=y_{t}^{\alpha_{\cdot}x}(\rho)$ as a solution of

[TABLE]

then

[TABLE]

It follows that to prove the theorem it suffices to show that

[TABLE]

as $\nu\downarrow 0$ uniformly with respect to $\alpha_{\cdot}\in\mathfrak{A}$ , $\beta_{\cdot}\in\mathfrak{B}$ , and $x\in G$ .

First observe (although this is overkill) that Lemma 3.3 is applicable here when the $\sigma^{(i)}$ 's are independent of the first space variable. Then Corollary 3.4 is also applicable which as in Lemma 4.4 leads to the conclusion that

[TABLE]

as $\nu\downarrow 0$ uniformly with respect to $\alpha_{\cdot}\in\mathfrak{A}$ , $\beta_{\cdot}\in\mathfrak{B}$ , and $x\in G$ , where $\theta(\nu)$ is the minimum of exit times of $x_{t}$ and $x_{t}(\nu)$ from $G$ .

Next, while proving (4.18) first assume that $g\equiv 0$ . Observe that, owing to (4.19), the argument at the end of the proof of Theorem 4.6 shows that it suffices to prove the version of (4.18) when both $\tau$ and $\tau(\nu)$ are replaced with $\theta(\nu)$ (assuming $g\equiv 0$ ).

Then notice that in light of the continuity of $f$ in $x$ uniform with respect to $(\alpha,\beta)$ (cf. also (4.11))

[TABLE]

as $\nu\downarrow 0$ uniformly with respect to $\alpha_{\cdot}\in\mathfrak{A}$ , $\beta_{\cdot}\in\mathfrak{B}$ , and $x\in G$ .

Also

[TABLE]

where

[TABLE]

One sees easily as above that $I_{x}^{\alpha_{\cdot}\beta_{\cdot}}\to 0$ as $\nu\downarrow 0$ uniformly with respect to $\alpha_{\cdot}\in\mathfrak{A}$ , $\beta_{\cdot}\in\mathfrak{B}$ , and $x\in G$ .

It remains to deal with the terms containing $g$ in (4.18). Since $g\in C^{2}(\bar{G})$ , by Itô’s formula we have

[TABLE]

The second term on the right in (4.21) clearly goes to zero as $\nu\downarrow 0$ uniformly with respect to $\alpha_{\cdot}\in\mathfrak{A}$ , $\beta_{\cdot}\in\mathfrak{B}$ , and $x\in G$ . The difference of the remaining ones in (4.20) and (4.21) is shown to do the same by the first part of the proof. The theorem is proved.

Proof of Lemma 2.3. This proof is very similar to the second part of the proof of Theorem 2.2. First we assume that $g\equiv 0$ . Take $\theta=\theta^{\alpha_{\cdot}\bm{\beta}^{\rho}(\alpha_{\cdot},x)x}(\rho)$ from Lemma 4.4 and note that the argument at the end of the proof of Theorem 4.6 shows that it suffices to prove the version of (2.7) when both $\tau$ and $\tau(\rho)$ are replaced with $\theta(\rho)$ (assuming $g\equiv 0$ ).

Next, observe that

[TABLE]

By Corollary 4.5 the last expression tends to zero as $\rho\downarrow 0$ uniformly with respect to $\alpha_{\cdot}\in\mathfrak{A}$ and $x\in G$ .

Also as in the above proof

[TABLE]

where $J_{x}^{\alpha_{\cdot}\bm{\beta}^{(\rho)}(\alpha_{\cdot},x)}$ stands for

[TABLE]

Lemma 3.1 and Corollary 4.5 convince us that $J_{x}^{\alpha_{\cdot}\bm{\beta}^{(\rho)}(\alpha_{\cdot},x)}\to 0$ as $\rho\downarrow 0$ uniformly with respect to $\alpha_{\cdot}\in\mathfrak{A}$ and $x\in G$ .

It remains to deal with the terms containing $g$ in (2.7). Again by using Itô’s formula we write

[TABLE]

Similarly we transform the term with $g$ involving $\tau(\rho)$ and then we reduce the problem to estimating the terms like the ones we started with. The lemma is proved.

5. A particular case where $A$ is a singleton

Here we assume that $A$ is a singleton and will not write $\alpha$ and $\alpha_{\cdot}$ in our notation. In particular, now we are dealing with a controlled diffusion process given as a solution of the equation

[TABLE]

Its solution is denoted by $y^{\beta_{\cdot}x}_{t}$ . Our goal is to minimize

[TABLE]

over $\beta_{\cdot}\in\mathfrak{B}$ , where (according to our standard notation) $\tau^{\beta_{\cdot}x}$ is the first exit time of $y^{\beta_{\cdot}x}_{t}$ from $G$ , $f(y_{t})=f(\beta_{t},y^{\beta_{\cdot}x}_{t})$ ,

[TABLE]

In this case Theorem 2.1 becomes the following.

Theorem 5.1.

Under the assumptions of Theorem 2.1 for any $\varepsilon>0$ there exist a Borel measurable $B$ -valued function $\beta(x)$ on $\mathbb{R}^{d}$ and $\rho_{0}>0$ such that, if for $\rho\in(0,\rho_{0}]$ , we define

[TABLE]

introduce $b^{(\rho)}(z,y)$ similarly, and for $x\in G$ define the process $z_{t}=z_{t}^{x}(\rho)$ by

[TABLE]

and set $\beta^{\rho}_{t}(x)=\beta(z^{x}_{t}(\rho))$ , then

[TABLE]

Here is a version of Theorem 2.2.

Theorem 5.2.

In Theorem 5.1 drop Assumption 2.3 but suppose that on $(\Omega,\mathcal{F},P)$ there is a Wiener process $(\hat{w}_{t},\mathcal{F}_{t}),t\geq 0$ , independent of $w_{t}$ . Then for any $\varepsilon>0$ there exists a constant $\nu>0$ such that all assertions of Theorem 2.1 hold true if we add to the right-hand side of (5.3) the term $\nu\,d\hat{w}_{t}$ .

Remark 5.1.

In Section 6 we are going to maximize (5.2) instead of minimizing it. One problem is reduced to another just by changing signs of $f$ and $g$ . Also it is worth noting that in Section 6 the parameter used in maximization is called $\alpha_{\cdot}$ instead of $\beta_{\cdot}$ .

6. Adjoint $\varepsilon$ -optimal Markov policies for $\alpha$

Take $\varepsilon>0$ , $\rho>0$ , and $\beta(\alpha,x)$ from Theorem 2.1, use the notation (2.3) and, for $\alpha_{\cdot}\in\mathfrak{A}$ and $x\in\mathbb{R}^{d}$ , define the controlled diffusion process $y_{t}(\rho)=y_{t}^{\alpha_{\cdot}x}(\rho)$ by

[TABLE]

with the reward function

[TABLE]

We are going to maximize (6.2) treating $\alpha$ here as $\beta$ in Section 5 and adjusting the maximization problem to the one of minimization.

However, there is a formal objection to overcome before we can translate the results of Section 5 to our situation. Namely, in Section 5, the functions $\sigma,b,c,f$ as inherited from taking $A$ as a singleton were assumed to be continuous with respect to $\beta$ . Therefore, here we need our $\sigma^{(\rho)},b^{(\rho)},c^{(\rho)},f^{(\rho)}$ to be continuous with respect to $\alpha$ and they may fail to be so because, even if $h$ in (2.3) is continuous in the first argument $\alpha$ uniformly with respect to $\beta$ , $\beta(\alpha,y+\rho z)$ can be discontinuous as a function of $\alpha$ . Indeed, for different $\alpha$ , $\beta(\alpha,x)$ can be very different functions of $x$ . However, in light of the second statement in Theorem 2.1, to make $\beta(\alpha,x)$ continuous with respect to $\alpha$ it suffices just to change the distance function in $A$ , keeping it the same when $\alpha_{1},\alpha_{2}$ belong to the same $A_{i}$ and defining it as $1$ otherwise. By the way, this change in no way affects the set of policies of $\alpha$ and only allows us to formally apply the results of Section 5.
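The change of distance described above can be sketched as follows. This is a toy illustration, assuming (as the text suggests) that $A$ is split into finitely many pieces $A_{i}$ on each of which $\beta(\alpha,\cdot)$ does not depend on $\alpha$ ; the names `make_modified_distance`, `dist`, and `cell_index`, as well as the example with $A=[0,1]$ , are hypothetical.

```python
def make_modified_distance(dist, cell_index):
    """Build the new distance on A.

    dist(a1, a2)  : the original distance on A
    cell_index(a) : index i of the piece A_i containing a
    The new distance equals the old one inside each piece and 1 across pieces.
    """
    def new_dist(a1, a2):
        if cell_index(a1) == cell_index(a2):
            return dist(a1, a2)
        return 1.0
    return new_dist

# Toy example: A = [0, 1] split into A_1 = [0, 0.5) and A_2 = [0.5, 1].
dist = lambda a1, a2: abs(a1 - a2)
cell_index = lambda a: 0 if a < 0.5 else 1
new_dist = make_modified_distance(dist, cell_index)
```

With respect to `new_dist`, points of different pieces are never closer than $1$ , so any function that is constant in $\alpha$ on each piece is automatically continuous in $\alpha$ , while the set $A$ itself is unchanged.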

According to Theorem 5.1 for any $\varepsilon>0$ there exist a Borel measurable $A$ -valued function $\alpha^{\varepsilon}(z)$ on $\mathbb{R}^{d}$ and Lipschitz continuous functions $\hat{\sigma}(z)$ and $\hat{b}(z)$ on $\mathbb{R}^{d}$ with values in the set of $d\times d_{1}$ -matrices and in $\mathbb{R}^{d}$ , respectively, such that, if for $x\in G$ we define the process $z^{x}_{t}$ by

[TABLE]

and set $\alpha^{\varepsilon,x}_{t}(x)=\alpha^{\varepsilon}(z^{x}_{t})$ , then

[TABLE]

Finally, due to Lemma 2.3, (6.4) implies that (2.9) holds with $3\varepsilon$ in place of $\varepsilon$ . This proves part (a) of Theorem 2.4. The proof of part (b) is quite similar and the theorem is proved.

7. Proof of Theorem 2.5

If in Theorem 14.1.6 of [9] we replace $H[u]$ and $P[u]$ by $-H[-u]$ and $-P[-u]$ , then we will see that for any $K>0$ the equation

[TABLE]

in $G$ (a.e.) with boundary condition $u_{-K}=g\in C^{2}$ has a solution $u_{-K}\in W^{2}_{p}(G)$ for any $p>1$ . By following the arguments in Section 7 of [8], we conclude that $u_{-K}\uparrow v$ uniformly on $\bar{G}$ as $K\to\infty$ . Observe that (a.e.) in $G$

[TABLE]

Fix $K>0$ and $m\in\{1,2,...\}$ . In the same way in which we found above the function $\beta(x)$ we find a Borel $A$ -valued function $\alpha(x)$ such that in $G$

[TABLE]

Our goal is to prove that if $K$ and $m$ are large enough and $\rho$ is small enough, then the above $\alpha(x)$ is the one we are talking about in Theorem 2.5.

Take $y^{\beta_{\cdot}x}_{t}(\rho)$ and $\bm{\alpha}^{\rho}_{t}(\beta_{\cdot},x)=\alpha(y_{t}^{\beta_{\cdot}x}(\rho))$ from the statement of the theorem. Introduce $\theta=\theta^{\bm{\alpha}^{\rho}(\beta_{\cdot},x)\beta_{\cdot}x}(\rho)$ as the minimum of the first exit times of $x_{t}^{\bm{\alpha}^{\rho}(\beta_{\cdot},x)\beta_{\cdot}x}$ and of $y_{t}^{\beta_{\cdot}x}(\rho)$ from $G$ . Then in the same way in which we arrived at Lemma 4.4 we obtain that

[TABLE]

as $\rho\downarrow 0$ uniformly with respect to $x\in G$ .

Then following closely the argument in Section 4 we get an analog of Theorem 4.6 that for any $x\in G$ , $\rho,\gamma,\kappa>0$ we have

[TABLE]

where $N_{1}(\gamma)$ is independent of $\rho,\kappa$ , $N_{1}(\gamma)\to 0$ as $\gamma\to\infty$ , $N_{2}(\kappa)$ is independent of $\rho$ , $N_{2}(\kappa)\to 0$ as $\kappa\downarrow 0$ , $N$ depends only on $d,\delta,K_{0}$ , and the diameter of $G$ , $\mu(\rho)$ is independent of $\gamma,\kappa$ and $\mu(\rho)\to 0$ as $\rho\downarrow 0$ .

After that the assertion of Theorem 2.5 is obtained by the same short argument as in Section 4 in the proof of Theorem 2.1.

Bibliography

[1] W.H. Fleming and P.E. Souganidis, On the existence of value functions of two-player, zero-sum stochastic differential games, Indiana Univ. Math. J., Vol. 38 (1989), No. 2, 293–314.
[2] W.H. Fleming and D. Hernández-Hernández, On the value of stochastic differential games, Commun. Stoch. Anal., Vol. 5 (2011), No. 2, 341–351.
[3] D. Gilbarg and L. Hörmander, Intermediate Schauder estimates, Archive Rational Mech. Anal., Vol. 74 (1980), No. 4, 297–318.
[4] N.V. Krylov, “Controlled diffusion processes”, Nauka, Moscow, 1977 in Russian; English translation Springer, 1980.
[5] N.V. Krylov, The sufficiency of the adjoint Markov strategies for controlled diffusion processes, Teoriya Veroyatnostei i eye Primeneniya, Vol. 31 (1986), No. 2, 353–358 in Russian; English transl. in Theor. Probability Appl., Vol. 31 (1987), No. 2, 304–309.
[6] N.V. Krylov, On a representation of fully nonlinear elliptic operators in terms of pure second order derivatives and its applications, Problemy Matemat. Analiza, Vol. 59, July 2011, p. 3–24 in Russian; English translation: Journal of Mathematical Sciences, New York, Vol. 177 (2011), No. 1, 1–26.
[7] N.V. Krylov, On the dynamic programming principle for uniformly nondegenerate stochastic differential games in domains, Stochastic Processes and their Applications, Vol. 123 (2013), No. 8, 3273–3298.
[8] N.V. Krylov, On the dynamic programming principle for uniformly nondegenerate stochastic differential games in domains and the Isaacs equations, Probab. Theory Relat. Fields, Vol. 158 (2014), No. 3, 751–783.
