Yet again on iteration improvement for averaged expected cost control for 1D ergodic diffusions

Svetlana Anulova; Hilmar Mai; Alexander Veretennikov

arXiv:1812.10665·math.PR·September 1, 2020

TL;DR

This paper advances ergodic control for one-dimensional diffusions by establishing the ergodic HJB equation, proving solution existence and uniqueness, and demonstrating convergence of a reward improvement algorithm.

Contribution

It provides a comprehensive treatment of ergodic control with strategy-dependent coefficients, including theoretical foundations and convergence results.

Findings

01

Proved existence and uniqueness of solutions to the ergodic HJB equation.

02

Established convergence of the reward improvement algorithm.

03

Extended ergodic control theory to strategy-dependent drift and diffusion coefficients.

Abstract

The paper is a full version of the short presentation in [1]. Ergodic control for a one-dimensional controlled diffusion is tackled; both drift and diffusion coefficients may depend on a strategy, which is assumed Markovian. The ergodic HJB equation is established, and existence and uniqueness of its solution are proved, as well as the convergence of the reward improvement algorithm.

Equations

$$dX_{t}^{\alpha}=b(\alpha(X_{t}^{\alpha}),X_{t}^{\alpha})\,dt+\sigma(\alpha(X_{t}^{\alpha}),X_{t}^{\alpha})\,dB_{t},\quad t\geq 0,\qquad X_{0}^{\alpha}=x.$$

$$L^{u}(x)=b(u,x)\frac{d}{dx}+\frac{1}{2}\sigma^{2}(u,x)\frac{d^{2}}{dx^{2}},\quad x\in\mathbb{R},$$

$$L^{\alpha}(x)=b(\alpha(x),x)\frac{d}{dx}+\frac{1}{2}\sigma^{2}(\alpha(x),x)\frac{d^{2}}{dx^{2}},\quad x\in\mathbb{R}.$$

$$\rho^{\alpha}(x):=\limsup_{T\to\infty}\frac{1}{T}\int_{0}^{T}\mathbb{E}_{x}f(\alpha(X_{t}^{\alpha}),X_{t}^{\alpha})\,dt.$$

$$\rho^{\alpha}(x)=\limsup_{T\to\infty}\frac{1}{T}\int_{0}^{T}\mathbb{E}_{x}f^{\alpha}(X_{t}^{\alpha})\,dt.$$

$$\rho(x):=\inf_{\alpha\in{\cal A}}\limsup_{T\to\infty}\frac{1}{T}\int_{0}^{T}\mathbb{E}_{x}f^{\alpha}(X_{t}^{\alpha})\,dt.$$

$$\rho^{\alpha}(x)\equiv\rho^{\alpha}:=\int f^{\alpha}(x)\,\mu^{\alpha}(dx)=:\langle f^{\alpha},\mu^{\alpha}\rangle.$$

$$\rho:=\inf_{\alpha\in{\cal A}}\int f^{\alpha}(x)\,\mu^{\alpha}(dx)=\inf_{\alpha\in{\cal A}}\langle f^{\alpha},\mu^{\alpha}\rangle.$$

$$v^{\alpha}(x):=\int_{0}^{\infty}\mathbb{E}_{x}\left(f^{\alpha}(X_{t}^{\alpha})-\rho^{\alpha}\right)dt.$$

$$\inf_{u\in U}\left[L^{u}V(x)+f^{u}(x)-\rho\right]=0,\quad x\in\mathbb{R}.$$

$$\lim_{|x|\to\infty}\sup_{u\in U}x\,b(u,x)=-\infty.$$

$$\sup_{u\in U}|f^{u}(x)|\leq C_{1}(1+|x|^{m_{1}}).$$

$$\sup_{t\geq 0}|\mathbb{E}_{x}g(X_{t}^{\alpha})|\leq C(1+|x|^{m}).$$

$$\sup_{\alpha\in{\cal A}}\int|x|^{k}\,\mu^{\alpha}(dx)<\infty,\quad\forall\,k>0.$$

$$\sup_{\alpha\in{\cal A}}|\rho^{\alpha}|\leq C<\infty;$$

$$\sup_{\alpha\in{\cal A}}|\mathbb{E}_{x}f^{\alpha}(X_{t}^{\alpha})-\rho^{\alpha}|\leq C\,\frac{1+|x|^{m}}{1+t^{k}},$$

$$\sup_{\alpha\in{\cal A}}\frac{1}{T}\int_{0}^{T}\mathbb{E}_{x}f^{\alpha}(X_{t}^{\alpha})\,dt-\rho^{\alpha}\to 0,\quad T\to\infty.$$

$$\sup_{t\geq 0}|\mathbb{E}_{x}1(|X_{t}^{\alpha}|>N)|\leq\sup_{t\geq 0}\mathbb{E}_{x}\frac{|X_{t}^{\alpha}|^{m}}{N^{m}}\leq\frac{C(1+|x|^{m})}{N^{m}}.$$

$$p^{\alpha}(x):=\frac{d\mu^{\alpha}(x)}{dx}=C_{\alpha}\frac{1}{\sigma^{2}(\alpha(x),x)}\exp\left(2\int_{0}^{x}\frac{b(\alpha(y),y)}{\sigma^{2}(\alpha(y),y)}\,dy\right),$$

$$\sup_{\alpha}\left(|v^{\alpha}(x)|+|(v^{\alpha})^{\prime}(x)|\right)\leq C(1+|x|^{m}).$$

$$L^{\alpha}v^{\alpha}+f^{\alpha}-\langle f^{\alpha},\mu^{\alpha}\rangle=0,$$

$$L^{\alpha}(x)v^{\alpha}(x)+f^{\alpha}(x)-\langle f^{\alpha},\mu^{\alpha}\rangle=0.$$

$$\sup_{\alpha}|v^{\alpha}(x)|\leq C(1+|x|^{m})$$

$$v^{\alpha}(x)=\int_{0}^{\infty}\mathbb{E}_{x}\left(f^{\alpha}(X_{t}^{\alpha})-\rho^{\alpha}\right)dt=\int_{0}^{\infty}\mathbb{E}_{x}\bar{f}^{\alpha}(\bar{X}_{s}^{\alpha})\,ds,$$

$$\bar{f}^{\alpha}(x)=\frac{f^{\alpha}(x)-\rho^{\alpha}}{a^{\alpha}(x)},$$

$$\bar{X}_{t}^{\alpha}:=X_{t^{\prime}(t)}^{\alpha},$$

$$t\mapsto\int_{0}^{t}\sigma^{2}(X_{s}^{\alpha})\,ds,$$

$$d\bar{X}_{t}^{\alpha}=d\bar{W}_{t}+\bar{b}^{\alpha}(\bar{X}_{t}^{\alpha})\,dt,\qquad\bar{b}^{\alpha}(x)=\frac{b^{\alpha}(x)}{\sigma^{2}(\alpha(x),x)},$$

$$\bar{L}^{\alpha}v(x)+\bar{f}^{\alpha}(x)=0,$$

$$\bar{L}^{\alpha}(x)=\bar{b}(\alpha(x),x)\frac{d}{dx}+\frac{1}{2}\frac{d^{2}}{dx^{2}},\quad x\in\mathbb{R}.$$

Full text

S.V. Anulova$^{1,2}$, H. Mai$^{3,4}$, A.Yu. Veretennikov$^{5,6}$

$^{1}$ Institute for Information Transmission Problems, Moscow, Russia; email: anulova@mail.ru

$^{2}$ For the first author this research has been supported by the Russian Foundation for Basic Research grant no. 17-01-00633_a.

$^{3}$ CREST and ENSAE ParisTech, France; email: hilmar.mai@gmail.com

$^{4}$ The second author thanks the Institut Louis Bachelier for financial support.

$^{5}$ University of Leeds, UK, & National Research University Higher School of Economics, & Institute for Information Transmission Problems, Moscow, Russia; email: a.veretennikov@leeds.ac.uk

$^{6}$ The third author is grateful for the financial support by the DFG through the CRC 1283 “Taming uncertainty and profiting from randomness and low regularity in analysis, stochastics and their applications” at Bielefeld University during his stay there in August 2017; also, for this author this study has been funded by the Russian Academic Excellence Project ’5-100’ and by the Russian Foundation for Basic Research grant no. 17-01-00633_a.

All the authors gratefully acknowledge the support and hospitality of the Oberwolfach Research Institute for Mathematics (MFO) during the RiP programme in June 2014 where this study was initiated.

Abstract

An ergodic Bellman’s (HJB) equation is proved for a uniformly ergodic 1D controlled diffusion with variable diffusion and drift coefficients both depending on control; convergence of the values provided by Howard’s reward improvement algorithm to the value which is a component of the unique solution of Bellman’s equation is established.

1 Introduction

The paper is a complete version of the short presentation without detailed proofs in [1]. The issues of reliability that appeared in the title of [1] are not addressed here; all proofs are completed and the results are extended in comparison to the cited article. However, an application to reliability seems fruitful and is one of the motivations for the present paper; a corresponding remark about it can be found below. One more motivation is to allow the diffusion coefficient to depend on control. Indirectly, the main result below may be considered as a rigorous realisation of the rather instructive and deliberately non-rigorous example from [15, Ch. 1, §1], where the point was the vanishing at infinity of the expectation of a current cost. Besides a more detailed calculus in step 3 of the proof, here we tackle the issue of the HJB equation(s) being satisfied everywhere and/or almost everywhere more precisely than in [1].

We consider a one-dimensional stochastic differential equation (SDE) on the probability space $(\Omega,\mathcal{F},(\mathcal{F}_{t}),P)$ with a one-dimensional $(\mathcal{F}_{t})$ Wiener process $B=(B_{t})_{t\geq 0}$ with coefficients $b$ and $\sigma$ , and with a stationary control function $\alpha$ (called strategy in the sequel)

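$$dX_{t}^{\alpha}=b(\alpha(X_{t}^{\alpha}),X_{t}^{\alpha})\,dt+\sigma(\alpha(X_{t}^{\alpha}),X_{t}^{\alpha})\,dB_{t},\quad t\geq 0,\qquad X_{0}^{\alpha}=x. \tag{1}$$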
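To make the setting concrete, here is a minimal Euler-Maruyama sketch of one trajectory of a controlled diffusion driven by a stationary Markov strategy. It is an illustration only and not part of the paper's argument: the coefficients `drift` and `diffusion`, the strategy `strategy`, and the step size are hypothetical choices picked to be bounded, non-degenerate and mean-reverting. A time-discretised path also only approximates the weak solution, in particular when the strategy is discontinuous.

```python
import math
import random


def simulate_controlled_diffusion(b, sigma, alpha, x0, horizon, dt, rng):
    """Euler-Maruyama path of dX = b(alpha(X), X) dt + sigma(alpha(X), X) dB."""
    n_steps = int(round(horizon / dt))
    sqrt_dt = math.sqrt(dt)
    x = x0
    path = [x]
    for _ in range(n_steps):
        u = alpha(x)                    # a Markov strategy sees only the current state
        dB = rng.gauss(0.0, sqrt_dt)    # Wiener increment over one time step
        x = x + b(u, x) * dt + sigma(u, x) * dB
        path.append(x)
    return path


# Hypothetical coefficients (assumptions for illustration): bounded drift pulling
# the process back to the origin, bounded and non-degenerate diffusion.
def drift(u, x):
    return -u * math.tanh(x)


def diffusion(u, x):
    return 1.0 + 0.5 * u


def strategy(x):
    # A discontinuous Borel strategy with values in U = [1, 2].
    return 2.0 if abs(x) > 1.0 else 1.0


path = simulate_controlled_diffusion(
    drift, diffusion, strategy, x0=3.0, horizon=50.0, dt=0.01, rng=random.Random(0)
)
```

The strategy is evaluated at the current state only, which is what "stationary" and "Markov" mean here; passing a seeded `random.Random` keeps the run reproducible.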

Let a compact set $U\subset\mathbb{R}$ be a set where any strategy takes its values. The functions $b$ and $\sigma$ on $U\times\mathbb{R}$ are assumed Borel; later on some further conditions will be imposed, but we note straight away that $\sigma$ will be assumed non-degenerate and that a weak solution of the equation (1) always exists and is Markov and strong Markov, see [16, 17, 14]. Denote the class of all Borel functions $\alpha$ with values in $U$ by $\cal A$ . For $u\in U$ and $\alpha(\cdot)\in{\cal A}$ denote

$$L^{u}(x)=b(u,x)\frac{d}{dx}+\frac{1}{2}\sigma^{2}(u,x)\frac{d^{2}}{dx^{2}},\quad x\in\mathbb{R},$$

and

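$$L^{\alpha}(x)=b(\alpha(x),x)\frac{d}{dx}+\frac{1}{2}\sigma^{2}(\alpha(x),x)\frac{d^{2}}{dx^{2}},\quad x\in\mathbb{R}.$$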

Denote by ${\cal K}$ the class of functions on $U\times\mathbb{R}$ (also just on $\mathbb{R}$ ) growing no faster than some polynomial. The running cost function $f$ will always be chosen from this class. The averaged cost function corresponding to the strategy $\alpha\in{\cal A}$ is then defined as

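$$\rho^{\alpha}(x):=\limsup_{T\to\infty}\frac{1}{T}\int_{0}^{T}\mathbb{E}_{x}f(\alpha(X_{t}^{\alpha}),X_{t}^{\alpha})\,dt. \tag{2}$$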

For a strategy $\alpha\in{\cal A}$ the function $f^{\alpha}:\mathbb{R}\to\mathbb{R},\;f^{\alpha}(x)=f(\alpha(x),x),\;x\in\mathbb{R}$ , is defined. Then (2) has an equivalent form

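$$\rho^{\alpha}(x)=\limsup_{T\to\infty}\frac{1}{T}\int_{0}^{T}\mathbb{E}_{x}f^{\alpha}(X_{t}^{\alpha})\,dt. \tag{3}$$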

Now, the cost function for the model under consideration is defined as

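$$\rho(x):=\inf_{\alpha\in{\cal A}}\limsup_{T\to\infty}\frac{1}{T}\int_{0}^{T}\mathbb{E}_{x}f^{\alpha}(X_{t}^{\alpha})\,dt. \tag{4}$$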

It will be assumed that for every $\alpha\in{\cal A}$ the solution of the equation (1) $X^{\alpha}$ is Markov ergodic, i.e., there exists a limiting in total variation distribution $\mu^{\alpha}$ of $X_{t}^{\alpha},\;t\to\infty$ , this distribution $\mu^{\alpha}$ does not depend on the initial condition $X_{0}=x\in\mathbb{R}$ , is unique and is invariant for the generator $L^{\alpha}$ . The cost function $\rho^{\alpha}$ then does not depend on $x$ and can be rewritten as

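$$\rho^{\alpha}(x)\equiv\rho^{\alpha}:=\int f^{\alpha}(x)\,\mu^{\alpha}(dx)=:\langle f^{\alpha},\mu^{\alpha}\rangle. \tag{5}$$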

Then what we want to find (compute) is the value

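$$\rho:=\inf_{\alpha\in{\cal A}}\int f^{\alpha}(x)\,\mu^{\alpha}(dx)=\inf_{\alpha\in{\cal A}}\langle f^{\alpha},\mu^{\alpha}\rangle. \tag{6}$$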

For any strategy $\alpha\in{\cal A}$ let us also define an auxiliary function

$$v^{\alpha}(x):=\int_{0}^{\infty}\mathbb{E}_{x}\left(f^{\alpha}(X_{t}^{\alpha})-\rho^{\alpha}\right)dt.$$

The convergence of this integral will follow from the assumptions.

The first goal of this paper is to show the ergodic HJB or Bellman’s equation on the pair $(V,\rho)$

$$\inf_{u\in U}\left[L^{u}V(x)+f^{u}(x)-\rho\right]=0,\quad x\in\mathbb{R}. \tag{7}$$

This assumes showing uniqueness of the second component ( $\rho$ ) along with the property that it coincides with the cost from (6). The meaning of the first component $V$ will be explained later. The uniqueness of $V$ will be shown up to an additive constant.

The class where the solution $(V,\rho)$ will be studied is the family of all Borel functions $V$ and constants $\rho\in\mathbb{R}$ such that $V$ has two Sobolev derivatives which are locally integrable in any power, and $V$ itself has moderate growth at infinity, no faster than some polynomial. Accordingly, the equation (7) is to be understood almost everywhere; yet, in the 1D situation and under our assumptions it will follow straightforwardly that this equation is actually satisfied for all $x\in\mathbb{R}$. Note that the first derivative can be considered continuous (due to the embedding theorems), and the second derivative will always be taken Borel, as one of the Borel representatives of a Lebesgue measurable function.

The second goal of the paper is to show how to approach the solution $\rho$ of the main problem by some successive approximation procedure called the Reward Improvement Algorithm (RIA). It is interesting that under our minimal assumptions on regularity of strategies for the weak SDE solution setting it is yet possible to justify a monotonic convergence of the “exact” RIA; compare to [15, ch.1, §4] where it was necessary to work with “approximate” RIA (called Bellman–Howard’s iteration procedure there) and with regularized Lipschitz strategies.

Concerning the equation (7), it may look like it lacks some boundary conditions: indeed, a 2nd order PDE normally does require certain boundary conditions, which, for example, in the considered 1D case simply means two boundary conditions at two end-points if the equation is on a bounded interval. However, this is the equation “in the whole space” and we are going to solve it in a specific class of functions $V$ – namely, bounded (if $f$ is assumed bounded), or, at most, moderately growing (if $f$ may admit some moderate growth), – which in some sense substitutes the (Dirichlet) boundary conditions at $\pm\infty$ . Note that a similar situation can be found in the theory of Poisson equations in the whole space (see, for example, [PV, 32]).

Concerning full uniqueness for the solution of (7), note that with any solution $(V,\rho)$ and for any constant $C$, the couple $(V+C,\rho)$ is also a solution. There are two closely related ways to handle this fact: either accept that uniqueness will be established up to a constant, or choose a certain “natural” constant satisfying some “centering condition”, as will be done below.

To guarantee ergodicity, we will assume the “blanket” recurrence conditions (see below), which in some sense provide a uniform recurrence for any strategy. Conditions of this type are sometimes considered too restrictive; however, they do allow us to include models and cases not covered earlier in this theory, and for this reason we regard this restriction as a reasonable price for the time being. It is likely that such restrictions may be relaxed so as to include the “near monotonicity” type conditions (see [5]).

Let us say just a few words about the history of the problem. More can be found in the references provided below. Earlier results on ergodic control in continuous time were obtained in [22], [26], [6], and others. In his book [22] Mandl established apparently the first results on ergodic (averaged) control for a controlled 1D diffusion on a finite interval with boundary conditions including jumps from the boundary. The author established the HJB equation and proved uniqueness of the couple (up to a constant for the first component). Improvement of control was discussed too, however, without convergence.

Morton [26] considered the 1D case (a multi-dimensional case too but under stronger assumptions: we do not touch it in this paper) with a price function defined by (6) without any relation to (4). He proved ([26, Theorem 1]) that the optimal price does satisfy the ergodic Bellman’s equation; that the policy determined by Argmax (in our setting Argmin) in the Bellman’s equation is optimal within some rather special class of Markov policies which are fixed functions outside some bounded interval; a certain inequality for the optimal price and any solution of Bellman’s equation; a remark about RIA; however, neither is the uniqueness for the Bellman’s equation solutions established, nor is the convergence of RIA towards a solution proved.

Discrete time controlled models were considered in the monographs [9], [11], [12], [28], and others, and in the papers [2], [24], [29], etc.

Continuous time controlled processes were treated in the 80s in a chapter of the monograph [6] where ergodic control for stable diffusions was considered. Arapostathis and Borkar [4], Arapostathis [3], Arapostathis, Borkar and Ghosh [5] treated diffusions with “relaxed control” and the diffusion coefficient not depending on the control, under weaker recurrence assumptions (i.e., under two types of condition, stable or near-monotone). In this setting, they establish Bellman’s equation, existence, uniqueness, and RIA convergence. In this paper we allow the diffusion coefficient to depend on control and we do not use relaxed control.

The latest works include [3], [5], [29], see also the references therein. Although devoted to another type of models – piecewise-linear Markov ones – the monograph [8] may also be mentioned here. In the very first papers and books compact cases with some auxiliary boundary conditions – so as to simplify ergodicity – were studied; convergence of the improvement control algorithms was studied only partially. In later investigations noncompact spaces are allowed; however, apparently, ergodic control in the diffusion coefficient $\sigma$ of the process has not been tackled earlier. The reader may consult [6] and [15] for research on controlled diffusion processes on a finite horizon, or on an infinite horizon with discount (technically equivalent to killing).

In most of the works on the topic, measurability of the optimal or improved strategy (see below) is assumed. Yet, it is a subtle issue and in our case we give references – the basic one is [30] – and verify the conditions which provide this measurability.

The paper consists of four sections: 1 – Introduction, 2 – Assumptions and some auxiliaries, 3 – Main result and its proof, and the last one is the Appendix (not numbered). We will use the convention that arbitrary constants $C$ in the calculus may change from line to line.

2 Assumptions and some auxiliaries

To ensure ergodicity of $X^{\alpha}$ under any stationary control strategy $\alpha\in{\cal A}$ , we make the following assumptions on the drift and diffusion coefficients.

(A1) (boundedness, non-degeneracy, regularity) The functions $b$ and $\sigma$ are Borel bounded in their variables; $|b(u,x)|\leq C_{b}$, $|\sigma(u,x)|\leq C_{\sigma}$, $\sigma$ is uniformly non-degenerate, $|\sigma(u,x)|^{-1}\leq C_{\sigma}$; the functions $\sigma(u,x)$, $b(u,x)$, $f^{u}(x)$ are continuous in $u$ for every $x$.

(A2) (recurrence)

$$\lim_{|x|\to\infty}\sup_{u\in U}x\,b(u,x)=-\infty.$$

(A3) (running cost) The function $f$ belongs to the class $\cal K$ of functions which are Borel measurable in $x$ for each $u$ and admit a uniform in $u$ polynomial bound: there exist constants $C_{1},m_{1}>0$ such that for any $x$,

$$\sup_{u\in U}|f^{u}(x)|\leq C_{1}(1+|x|^{m_{1}}).$$

(A4) (compactness of $U$) The set $U$ is compact.

(A5) (additional regularity) The functions $b$, $\sigma$, and $f$ are of the class $C^{1}$ in $x$ for each $u$ with uniformly bounded derivatives.

We will need the following three lemmata.

Lemma 1.

Let the assumptions (A1) – (A3) hold true. Then

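• For any $C_{1},m_{1}>0$ there exist $C,m>0$ such that for any strategy $\alpha\in{\cal A}$ and for any function $g$ growing no faster than $C_{1}(1+|x|^{m_{1}})$,

$$\sup_{t\geq 0}|\mathbb{E}_{x}g(X_{t}^{\alpha})|\leq C(1+|x|^{m}).$$

• For any $\alpha\in{\cal A}$, the invariant measure $\mu^{\alpha}$ integrates any polynomial and

$$\sup_{\alpha\in{\cal A}}\int|x|^{k}\,\mu^{\alpha}(dx)<\infty,\quad\forall\,k>0.$$

• For any strategy $\alpha\in{\cal A}$ the function $\rho^{\alpha}$ is a constant, and

$$\sup_{\alpha\in{\cal A}}|\rho^{\alpha}|\leq C<\infty;$$

moreover, for any $k>0$ and $f\in{\cal K}$, there exist $C,m>0$ such that

$$\sup_{\alpha\in{\cal A}}|\mathbb{E}_{x}f^{\alpha}(X_{t}^{\alpha})-\rho^{\alpha}|\leq C\,\frac{1+|x|^{m}}{1+t^{k}},$$

and

$$\sup_{\alpha\in{\cal A}}\frac{1}{T}\int_{0}^{T}\mathbb{E}_{x}f^{\alpha}(X_{t}^{\alpha})\,dt-\rho^{\alpha}\to 0,\quad T\to\infty.$$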

Proof. Follows from [31, Theorems 5, 6]. Note that in [31] the solution of the SDE under investigation should be weakly unique, and it also must be a homogeneous Markov and strong Markov process; for the equation (1) it is all true by virtue of [16, Theorem 3], [17], and [14, Theorems 2, 3], as no continuity of the diffusion coefficient is required for this in the 1D case. (NB: In [14, Theorem 3] no continuity is needed even for $D\geq 1$ , but then weak uniqueness is established in the 1D case only [16, Theorem 3].)

Corollary 1.

Under the same assumptions,

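$$\sup_{t\geq 0}|\mathbb{E}_{x}1(|X_{t}^{\alpha}|>N)|\leq\sup_{t\geq 0}\mathbb{E}_{x}\frac{|X_{t}^{\alpha}|^{m}}{N^{m}}\leq\frac{C(1+|x|^{m})}{N^{m}}.$$

The proof is straightforward by the Bienaymé–Chebyshev–Markov inequality.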

Remark 1.

Note that because $D=1$ , under the assumptions (A1)–(A2) for any Borel function $\alpha\in{\cal A}$ there is a unique stationary measure $\mu^{\alpha}$ , which is equivalent to the Lebesgue measure $\Lambda$ . The latter follows from the formula for the unique stationary density

$$p^{\alpha}(x):=\frac{d\mu^{\alpha}(x)}{dx}=C_{\alpha}\frac{1}{\sigma^{2}(\alpha(x),x)}\exp\left(2\int_{0}^{x}\frac{b(\alpha(y),y)}{\sigma^{2}(\alpha(y),y)}\,dy\right), \tag{14}$$

where $C_{\alpha}$ is a normalizing constant. The fact that $p^{\alpha}$ is a stationary density can be seen by substituting it into the stationarity equation $(L^{\alpha})^{*}p=0$ (see, for example, [13, Lemma 4.16, equation (4.70)]); its uniqueness in the class of integrable functions satisfying the normalizing condition $\int p\,dx=1$ can be justified via the explicit solution of the stationarity equation in the 1D case, which we leave to the reader.
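The explicit stationary density lends itself to a direct numerical check. The sketch below evaluates the density formula on a truncated grid with the trapezoid rule and then computes the averaged cost $\langle f^{\alpha},\mu^{\alpha}\rangle$. The function names, the grid, and the test coefficients are illustrative assumptions; the strategy is taken as already substituted, so `b_alpha` and `sigma_alpha` are functions of $x$ only.

```python
import math


def trapezoid(values, h):
    """Composite trapezoid rule on a uniform grid with step h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def stationary_density(b_alpha, sigma_alpha, lo=-20.0, hi=20.0, n=4001):
    """Grid values of p(x) = C / sigma^2(x) * exp(2 * int_0^x b / sigma^2 dy)."""
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    ratio = [b_alpha(x) / sigma_alpha(x) ** 2 for x in xs]
    # Cumulative trapezoid integral of b / sigma^2, started at the left end.
    cum = [0.0] * n
    for i in range(1, n):
        cum[i] = cum[i - 1] + 0.5 * h * (ratio[i - 1] + ratio[i])
    i0 = min(range(n), key=lambda i: abs(xs[i]))   # move the base point to x = 0
    unnorm = [
        math.exp(2.0 * (cum[i] - cum[i0])) / sigma_alpha(xs[i]) ** 2 for i in range(n)
    ]
    mass = trapezoid(unnorm, h)                    # 1 / C_alpha on the truncated grid
    return xs, [w / mass for w in unnorm]


def averaged_cost(f_alpha, xs, p):
    """rho = <f, mu> computed by quadrature against the density p."""
    h = xs[1] - xs[0]
    return trapezoid([f_alpha(x) * q for x, q in zip(xs, p)], h)


# Illustrative choice: b = -tanh(x), sigma = 1, running cost f = x^2.
xs, p = stationary_density(lambda x: -math.tanh(x), lambda x: 1.0)
rho = averaged_cost(lambda x: x * x, xs, p)
```

For this particular choice the density is $\frac{1}{2}\cosh^{-2}(x)$ and the exact averaged cost is $\pi^{2}/12$, which gives a convenient reference value for the quadrature.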

In the next Lemma (as well as later in the main Theorem) we use Sobolev spaces $W^{2}_{p,loc}$ with $p>1$ (this notation is taken from [19, Chapter 2]; in some other sources the same space is denoted by $W^{2,p}_{loc}$). Although all main statements can be stated without them, this is done in order to mimic the steps in the proof where these spaces show up naturally due to the direct references, even though the dimension equals one, in which case, of course, some calculus can be simplified.

Lemma 2.

Let the assumptions (A1) – (A3) be satisfied. Then for any strategy $\alpha\in{\cal A}$ the cost function $v^{\alpha}$ has the following properties:

1. The function $v^{\alpha}$ is continuous as well as $(v^{\alpha})^{\prime}$, and there exist $C,m>0$ both depending only on the constants in (A1)–(A3) such that

$$\sup_{\alpha}\left(|v^{\alpha}(x)|+|(v^{\alpha})^{\prime}(x)|\right)\leq C(1+|x|^{m}). \tag{15}$$

2. $v^{\alpha}\in W^{2}_{p,loc}$ for any $p\geq 1$.

3. $v^{\alpha}\in C^{1,Lip}$ (i.e., $(v^{\alpha})^{\prime}$ is locally Lipschitz).

4. $v^{\alpha}$ satisfies a Poisson equation in the whole space,

$$L^{\alpha}v^{\alpha}+f^{\alpha}-\langle f^{\alpha},\mu^{\alpha}\rangle=0, \tag{16}$$

in the Sobolev sense; in particular, for almost every $x\in\mathbb{R}$

$$L^{\alpha}(x)v^{\alpha}(x)+f^{\alpha}(x)-\langle f^{\alpha},\mu^{\alpha}\rangle=0. \tag{17}$$

5. The solution of the equation (16) is unique up to an additive constant in the class of Sobolev solutions $W^{2}_{p,loc}$ with any $p>1$ with no more than some (any) polynomial growth of the solution $v^{\alpha}$.

6. $\langle v^{\alpha},\mu^{\alpha}\rangle=0.$

Proof.

Firstly, the inequality

$$\sup_{\alpha}|v^{\alpha}(x)|\leq C(1+|x|^{m})$$

follows immediately from (9) and from the assumptions.

Further, let us use a random change of time in the definition of $v^{\alpha}$ :

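$$v^{\alpha}(x)=\int_{0}^{\infty}\mathbb{E}_{x}\left(f^{\alpha}(X_{t}^{\alpha})-\rho^{\alpha}\right)dt=\int_{0}^{\infty}\mathbb{E}_{x}\bar{f}^{\alpha}(\bar{X}_{s}^{\alpha})\,ds, \tag{18}$$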

where

$$\bar{f}^{\alpha}(x)=\frac{f^{\alpha}(x)-\rho^{\alpha}}{a^{\alpha}(x)},$$

and $\bar{X}^{\alpha}_{s}$ is the process $X^{\alpha}_{t}$ with a changed time which makes the diffusion coefficient equal to one:

$$\bar{X}_{t}^{\alpha}:=X_{t^{\prime}(t)}^{\alpha},$$

where the function $t^{\prime}(t)$ is the inverse to the mapping

$$t\mapsto\int_{0}^{t}\sigma^{2}(X_{s}^{\alpha})\,ds,$$

see [23, Chapter 2.5], or [10, Theorem 15.5]. The process $\bar{X}^{\alpha}_{t}$ satisfies an SDE

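$$d\bar{X}_{t}^{\alpha}=d\bar{W}_{t}+\bar{b}^{\alpha}(\bar{X}_{t}^{\alpha})\,dt,\qquad\bar{b}^{\alpha}(x)=\frac{b^{\alpha}(x)}{\sigma^{2}(\alpha(x),x)},$$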

with a new Wiener process $\displaystyle\bar{W}_{t}=\int_{0}^{t^{\prime}(t)}\sigma(\alpha(X^{\alpha}_{s}),X^{\alpha}_{s})\,dW_{s}$ , see the same references [23, Chapter 2.5], or [10, Theorem 15.5].

Further, it follows from (18) and (19) that the function $v^{\alpha}$ is a solution of the equation

$$\bar{L}^{\alpha}v(x)+\bar{f}^{\alpha}(x)=0, \tag{20}$$

where

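$$\bar{L}^{\alpha}(x)=\bar{b}(\alpha(x),x)\frac{d}{dx}+\frac{1}{2}\frac{d^{2}}{dx^{2}},\quad x\in\mathbb{R}.$$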

Moreover, the last integral in (18) can only converge if $\langle\bar{f}^{\alpha},\bar{\mu}^{\alpha}\rangle=0$, where $\bar{\mu}^{\alpha}$ is the unique invariant measure of the Markov diffusion $\bar{X}^{\alpha}_{t}$, since otherwise the integral in the right hand side of (18) diverges. Existence and uniqueness of such an invariant measure (along with a convergence rate) follows, for example, from [31, Theorem 5] (among many other possible references) due to the assumption (A1). The property $v^{\alpha}\in W^{2}_{p,loc}$ for any $p\geq 1$ and the bound

[TABLE]

for some $m>0$ follow both from [27, Theorem 1] due to the equation (20).
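The centering condition $\langle\bar{f}^{\alpha},\bar{\mu}^{\alpha}\rangle=0$ can be checked numerically in a few lines. The sketch below assumes that the time-changed process has unit diffusion and drift $b/\sigma^{2}$, so that the density of $\bar{\mu}^{\alpha}$ is proportional to $\exp(2\int\bar{b})$, i.e. to $\sigma^{2}p^{\alpha}$ with $p^{\alpha}$ from (14). The weights $\sigma^{2}$ then cancel in $\langle\bar{f}^{\alpha},\bar{\mu}^{\alpha}\rangle$, leaving $\int(f^{\alpha}-\rho^{\alpha})\,d\mu^{\alpha}=0$. Function names, grid, and coefficients are hypothetical.

```python
import math


def trapezoid(values, h):
    """Composite trapezoid rule on a uniform grid with step h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def centering_after_time_change(b, sigma, f, lo=-10.0, hi=10.0, n=2001):
    """Return (rho, <f_bar, mu_bar>) for the unit-diffusion process with drift b / sigma^2."""
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    a = [sigma(x) ** 2 for x in xs]
    b_bar = [b(x) / a_i for x, a_i in zip(xs, a)]
    # Unnormalised density of mu_bar: exp(2 * int b_bar), via cumulative trapezoid.
    log_m = [0.0] * n
    for i in range(1, n):
        log_m[i] = log_m[i - 1] + h * (b_bar[i - 1] + b_bar[i])
    top = max(log_m)
    m_bar = [math.exp(val - top) for val in log_m]
    # Unnormalised density of the original invariant measure: m_bar / sigma^2.
    p = [m / a_i for m, a_i in zip(m_bar, a)]
    rho = trapezoid([f(x) * q for x, q in zip(xs, p)], h) / trapezoid(p, h)
    f_bar = [(f(x) - rho) / a_i for x, a_i in zip(xs, a)]
    centred = trapezoid([fb * m for fb, m in zip(f_bar, m_bar)], h) / trapezoid(m_bar, h)
    return rho, centred


# Illustrative coefficients with a genuinely state-dependent diffusion.
rho, centred = centering_after_time_change(
    lambda x: -math.tanh(x),
    lambda x: 1.0 + 0.5 / (1.0 + x * x),
    lambda x: x * x,
)
```

Here `centred` vanishes up to rounding error, which reflects the algebraic cancellation rather than any property of the particular coefficients.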

Further, given (15), the property $v^{\alpha}\in C^{1,Lip}$ (which means a local, not global Lipschitz condition for $(v^{\alpha})^{\prime}$) follows from the equation (20), as $(v^{\alpha})^{\prime\prime}$ turns out to be locally bounded by virtue of this equation. The same equation (20) implies (16) and (17). Uniqueness of solution for the equation (20) and, hence, also for (16) up to an additive constant follows from [27]; see also [13, Lemma 4.13 and Remark 4.3]. Finally, the last assertion of the Lemma is due to the Fubini theorem,

[TABLE]

by virtue of the absolute convergence

[TABLE]

∎
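In dimension one the Poisson equation (16) can be integrated explicitly, which gives a quick numerical illustration of Lemma 2. With $m=\exp(2\int b^{\alpha}/\sigma^{2})$ (proportional to $\sigma^{2}p^{\alpha}$) one has $(m\,v^{\prime})^{\prime}=-2p^{\alpha}(f^{\alpha}-\rho^{\alpha})$ up to the normalising constant, and polynomial growth of $v^{\prime}$ forces the integration constant to vanish. This integrating-factor computation is a standard 1D device and is not claimed to be the route taken in the paper; the grid, the function names, and the test coefficients below are assumptions.

```python
import math


def trapezoid(values, h):
    """Composite trapezoid rule on a uniform grid with step h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def poisson_derivative(b, sigma, f, lo=-10.0, hi=10.0, n=2001):
    """Grid xs, cost rho and v' for  b v' + 0.5 sigma^2 v'' + f - rho = 0."""
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    a = [sigma(x) ** 2 for x in xs]
    r = [b(x) / a_i for x, a_i in zip(xs, a)]
    # m = exp(2 * int b / sigma^2) equals sigma^2 times the density, up to a constant.
    log_m = [0.0] * n
    for i in range(1, n):
        log_m[i] = log_m[i - 1] + h * (r[i - 1] + r[i])
    top = max(log_m)
    m = [math.exp(val - top) for val in log_m]
    p = [m_i / a_i for m_i, a_i in zip(m, a)]
    rho = trapezoid([f(x) * q for x, q in zip(xs, p)], h) / trapezoid(p, h)
    g = [(f(x) - rho) * q for x, q in zip(xs, p)]    # centred integrand
    left = [0.0] * n                                 # int_{lo}^{x} g
    for i in range(1, n):
        left[i] = left[i - 1] + 0.5 * h * (g[i - 1] + g[i])
    right = [0.0] * n                                # int_{x}^{hi} g
    for i in range(n - 2, -1, -1):
        right[i] = right[i + 1] + 0.5 * h * (g[i] + g[i + 1])
    # (m v')' = -2 g; integrate from the nearer tail to avoid cancellation.
    dv = [(-2.0 * left[i] if xs[i] < 0.0 else 2.0 * right[i]) / m[i] for i in range(n)]
    return xs, rho, dv


def max_residual(b, sigma, f, xs, rho, dv, radius=3.0):
    """Largest |L v + f - rho| over |x| <= radius, with v'' by central differences."""
    h = xs[1] - xs[0]
    worst = 0.0
    for i in range(1, len(xs) - 1):
        x = xs[i]
        if abs(x) <= radius:
            d2v = (dv[i + 1] - dv[i - 1]) / (2.0 * h)
            res = b(x) * dv[i] + 0.5 * sigma(x) ** 2 * d2v + f(x) - rho
            worst = max(worst, abs(res))
    return worst


def drift(x):
    return -math.tanh(x)


def diffusion(x):
    return 1.0


def cost(x):
    return x * x


xs, rho, dv = poisson_derivative(drift, diffusion, cost)
residual = max_residual(drift, diffusion, cost, xs, rho, dv)
```

Only $v^{\prime}$ is computed, since $v$ itself is determined up to an additive constant; the residual is checked away from the ends of the grid because truncating the domain distorts $v^{\prime}$ near the boundary.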

Lemma 3.

Let the assumptions (A1) – (A2) hold true. Then $\exists\;0<C_{1}<C_{2}$ such that for any strategy $\alpha$ for the constant $C_{\alpha}$ from (14) we have,

[TABLE]

Also, for any $k$ there is a constant $C$ such that for every $x$ uniformly in $\alpha$

[TABLE]

and there exist constants $c,\kappa>0$ such that uniformly in $\alpha$

[TABLE]

Proof. Follows straightforwardly from the recurrence and boundedness assumptions and from the formula (14).

3 Main results

We accept in this section that a solution of the SDE with any Markov strategy exists and is a weak solution. However, it is important in the proof that it is unique in distribution, strong Markov and Markov ergodic; we repeat what was already mentioned in the proof of Lemma 1: all of these properties follow from [16] and from the assumptions (A1) and (A2) (see [31] about ergodicity).

For any pair $(v,\rho):\;v\in\bigcap_{p>1}W^{2}_{p,loc},\,\rho\in\mathbb{R}$ , define

[TABLE]

and

[TABLE]

where

[TABLE]

The functions $v$ and $v^{\prime}$ may be regarded as continuous and absolutely continuous due to the embedding theorems [19]. The function $F[v,\rho](\cdot)$ is defined by the formula above as a function of the class $L_{p,loc}$ for any $p>1$; in particular, it is Lebesgue measurable and as such it is defined only a.e. with respect to $x$. We may and will use a (any) Borel measurable version of the function $F[v,\rho]$, the existence of which follows, for example, from Luzin’s Theorem [21]. It will be shown in the sequel that the function $F_{1}[v^{\prime},\rho](x)$ is continuous in $x$ and locally Lipschitz in the two other variables.

Let us recall what a reward improvement algorithm (RIA) is. We start with some (any) stationary strategy $\alpha_{0}\in{\cal A}$. Denote the corresponding invariant measure by $\mu^{\alpha_{0}}$, the corresponding cost by $\rho_{0}=\rho^{\alpha_{0}}=\langle f^{\alpha_{0}},\mu^{\alpha_{0}}\rangle$, and the auxiliary function by $v_{0}=v^{\alpha_{0}}$. If for some $n=0,1,\ldots$ the triple $(\alpha_{n},\rho_{n},v_{n})$ is determined, then the strategy $\alpha_{n+1}$ is defined as follows: for a.e. $x$ the function $\alpha_{n+1}$ is chosen so that for each $x$

[TABLE]

or, in other words,

[TABLE]

We assume that a Borel measurable version of such strategy may be chosen; see the reference in the Appendix. To this strategy $\alpha_{n+1}$ there correspond the unique invariant measure $\mu^{\alpha_{n+1}}$ , the value $\rho_{n+1}:=\langle f^{\alpha_{n+1}},\mu^{\alpha_{n+1}}\rangle$ , and the function $v_{n+1}=v^{\alpha_{n+1}}$ .
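The iteration just described can be sketched numerically under simplifying assumptions that are not in the paper: a finite control set `U`, a truncated uniform grid, and hypothetical coefficients `b`, `sigma`, `f` chosen to be bounded, non-degenerate and recurrent. Policy evaluation uses the stationary density formula (14) and a 1D integrating-factor formula for $v_{n}^{\prime}$ (a standard device, not taken from the paper), and $v_{n}^{\prime\prime}$ is read off the Poisson equation. Improvement is a pointwise argmin of $L^{u}v_{n}(x)+f^{u}(x)$ over `U`.

```python
import math

U = (-1.0, 0.0, 1.0)  # hypothetical finite control set; the paper allows any compact U


def b(u, x):
    return -math.tanh(x) + 0.5 * u      # bounded drift with x * b(u, x) -> -infinity


def sigma(u, x):
    return 1.0 + 0.1 * u * u            # bounded, non-degenerate, control-dependent


def f(u, x):
    return x * x + 0.02 * u * u         # running cost of polynomial growth


def trapezoid(values, h):
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def evaluate(alpha, xs, h):
    """Cost rho and derivatives v', v'' of the Poisson solution for the strategy alpha."""
    n = len(xs)
    bs = [b(u, x) for u, x in zip(alpha, xs)]
    a = [sigma(u, x) ** 2 for u, x in zip(alpha, xs)]
    fs = [f(u, x) for u, x in zip(alpha, xs)]
    log_m = [0.0] * n                   # log of sigma^2 * density, up to a constant
    for i in range(1, n):
        log_m[i] = log_m[i - 1] + h * (bs[i - 1] / a[i - 1] + bs[i] / a[i])
    top = max(log_m)
    m = [math.exp(val - top) for val in log_m]
    p = [m[i] / a[i] for i in range(n)]
    rho = trapezoid([fs[i] * p[i] for i in range(n)], h) / trapezoid(p, h)
    g = [(fs[i] - rho) * p[i] for i in range(n)]
    left = [0.0] * n                    # int_{lo}^{x} g
    for i in range(1, n):
        left[i] = left[i - 1] + 0.5 * h * (g[i - 1] + g[i])
    right = [0.0] * n                   # int_{x}^{hi} g
    for i in range(n - 2, -1, -1):
        right[i] = right[i + 1] + 0.5 * h * (g[i] + g[i + 1])
    dv = [(-2.0 * left[i] if xs[i] < 0.0 else 2.0 * right[i]) / m[i] for i in range(n)]
    # v'' is read off the Poisson equation itself, so no finite differences are needed.
    d2v = [-2.0 * (fs[i] - rho + bs[i] * dv[i]) / a[i] for i in range(n)]
    return rho, dv, d2v


def improve(dv, d2v, xs):
    """Pointwise argmin over U of  b(u, x) v' + 0.5 sigma^2(u, x) v'' + f(u, x)."""
    def hamiltonian(u, i):
        return b(u, xs[i]) * dv[i] + 0.5 * sigma(u, xs[i]) ** 2 * d2v[i] + f(u, xs[i])
    return [min(U, key=lambda u, i=i: hamiltonian(u, i)) for i in range(len(xs))]


def reward_improvement(n_iter=6, lo=-10.0, hi=10.0, n=4001):
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    alpha = [0.0] * n                   # initial strategy alpha_0 = 0
    rhos = []
    for _ in range(n_iter):
        rho, dv, d2v = evaluate(alpha, xs, h)
        rhos.append(rho)
        alpha = improve(dv, d2v, xs)
    return rhos, alpha, xs


rhos, alpha, xs = reward_improvement()
```

On a grid the computed costs `rhos` need not be exactly monotone, because the quadrature error is of the order of the grid step at the switching points of the strategy; they should decrease up to that error. The constant $\rho_{n}$ is omitted from the minimised expression since it does not affect the argmin.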

Theorem 1.

Let the assumptions (A1) – (A4) be satisfied. Then:

1. For any $n$ , $\rho_{n+1}\leq\rho_{n}$ , and there exists a limit $\rho_{n}\downarrow\tilde{\rho}.$

2. The sequence $(v_{n})$ is tight in $C^{1}[-N,N]$ for each $N>0$ , and there exists a bounded sequence of constants $\beta_{n}$ such that there exists a limit $\lim_{n}(v_{n}(x)+\beta_{n})=:\tilde{v}(x)$ .

3. The couple $(\tilde{v},\tilde{\rho})$ solves the equation (7).

4. This solution $(\tilde{v},\tilde{\rho})$ is unique – up to an additive constant for $\tilde{v}$ – in the class of functions growing no faster than some (any) polynomial and belonging to the class $W_{p,loc}^{2}$ for any $p>0$ for the first component and for $\tilde{\rho}\in\mathbb{R}$ .

5. The component $\tilde{\rho}$ in the couple $(\tilde{v},\tilde{\rho})$ coincides with $\rho$ .

6. Under the additional assumption (A5), $\tilde{v}^{\prime\prime}\in Lip_{loc}$ .

In the short presentation [1], besides the restrictive assumption $f\in[0,1]$ and maximisation instead of minimisation, only a sketch of the proof was offered with many details explained too briefly; uniqueness of $\tilde{v}$ was not addressed. Here the full proof is given. NB: We never compare the trajectories of two SDE solutions in one formula, and the processes corresponding to different strategies may be defined on different probability spaces.

Proof.

1. Due to (21) and (16), for almost every (a.e.) $x\in\mathbb{R}$ ,

[TABLE]

and also for a.e. $x\in\mathbb{R}$ ,

[TABLE]

So,

[TABLE]

Let us apply the Itô–Krylov formula (see [15]) with expectations (also known as Dynkin’s formula) to $(v_{n}-v_{n+1})(X^{\alpha_{n+1}}_{t})$: we have for any $x\in\mathbb{R}$,

[TABLE]

The equality in the equation (3) holds for all $x\in\mathbb{R}$ and not just a.e. since the functions $v_{n}$ are Sobolev solutions of Poisson equations locally integrable in any degree with their derivatives up to the second order. Such functions can be regarded as continuous due to the embedding theorems [19]. In addition, the functions $\mathbb{E}_{x}v_{n}(X^{\alpha_{n+1}}_{t})$ , $\mathbb{E}_{x}v_{n+1}(X^{\alpha_{n+1}}_{t})$ , and $\displaystyle\mathbb{E}_{x}\int_{0}^{t}(L^{\alpha_{n+1}}v_{n}-L^{\alpha_{n+1}}v_{n+1})(X^{\alpha_{n+1}}_{s})\,ds$ as functions of $x$ for each $t>0$ are all Hölder continuous, being solutions of non-degenerate parabolic equations [18]. We also used the fact that the distribution of $X^{\alpha_{n+1}}_{s}$ for almost all $s>0$ is absolutely continuous with respect to the Lebesgue measure due to the non-degeneracy and by virtue of Krylov’s estimates [15]; due to this reason and because $v_{n},v_{n+1}\in C$ , the a.e. inequality (3) implies (3) for every $x$ . Further, since the left hand side in (3) is bounded for a fixed $x$ by virtue of the Lemma 2, we divide all terms of the latter inequality by $t$ and let $t\to\infty$ to get,

[TABLE]

as required. Thus, $\rho_{n}\geq\rho_{n+1}$ , so that $\rho_{n}\downarrow\tilde{\rho}$ (since the sequence $\rho_{n}$ is bounded for $f\in{\cal K}$ , see (10) in the Lemma 1) with some $\tilde{\rho}$ . So, the RIA does converge.

Note that clearly $\tilde{\rho}\geq\rho$ , since $\rho$ is the infimum over all Markov strategies, while $\tilde{\rho}$ is the infimum over some countable subset of them. Later on we shall show that they do coincide.

Now we want to show that there exists a bounded sequence of real values (non-random!) $\{\beta_{n}\}$ such that $v_{n}+\beta_{n}\to\tilde{v}$ , so that the couple $(\tilde{v},\tilde{\rho})$ satisfies the equation (7), and that $\tilde{\rho}$ here is unique, as well as $\tilde{v}$ in some sense. In the first instance we will do it for some subsequence $n_{j}$ ; eventually the convergence of the whole sequence $v_{n}$ will follow from the uniqueness of the solution of Bellman’s equation, although, it is not important for the proof of the Theorem.

2. Let us show local tightness of the family of functions $(v_{n})$ in $C^{1}$ . Note that the equation (7) is equivalent to the following:

[TABLE]

while the equation

[TABLE]

is equivalent to

[TABLE]

According to the Lemma 2, the functions $v_{n+1}^{\prime}$ are uniformly locally bounded. Since the sequence $\rho_{n+1}$ is bounded and due to the uniform local boundedness of the functions $f(\alpha_{n+1}(x),x)$ and uniform nondegeneracy of $a$ , it follows that $(v^{\prime\prime}_{n})$ locally are uniformly bounded and satisfy the uniform in $n$ growth bounds similar to (15) for the function itself and for its first derivative due to the equation (for example, due to (24)). This guarantees compactness of $(v_{n})$ in $C^{1}$ locally.

3. Due to the (local) compactness property shown in the previous step, by the diagonal procedure from any infinite sub-family of functions $v_{n}$ it is possible to choose a subsequence converging in $C^{1}_{loc}$. We want to show that up to a constant the limit is unique. For this aim, first of all we shall see shortly that if some $v_{n_{j}}(x)$ has a limit as $n_{j}\to\infty$, say, $\tilde{v}(x)$ (locally in $C$), then $v_{n_{j}+1}(x)+\beta_{n_{j}}$ has the same limit, where $\beta_{n}$ is some bounded sequence of real values. (In fact, what will be established is a little bit more complicated but still enough for our purposes.) We have,

[TABLE]

and

[TABLE]

Let us rewrite it as follows,

[TABLE]

In other words, the function $v_{n}$ solves the Poisson equation with the second order operator $L^{\alpha_{n+1}}$ and the “right hand side” $-(f^{\alpha_{n+1}}(x)+\psi_{n+1}(x)-\rho_{n})$ . This is only possible if the expression $f^{\alpha_{n+1}}(x)+\psi_{n+1}(x)-\rho_{n}$ is centered with respect to the invariant measure $\mu^{n+1}$ because Poisson equations in the whole space have no solutions for non-centered right hand sides (see, for example, [27]). This implies that

[TABLE]

So,

[TABLE]
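As a purely formal aside, the necessity of the centering condition invoked above can be seen as follows. This is only a sketch under the assumption that there is enough integrability and decay to justify integration by parts; the symbols $u$, $g$, $p$ are introduced here for illustration only, and the rigorous statement is the one cited from [27]. If the invariant measure $\mu$ has a density $p$ solving the stationary forward equation $(L^{\alpha})^{*}p=0$ and $u$ solves the Poisson equation $L^{\alpha}u=-g$, then

$$
\langle g,\mu\rangle=-\int_{\mathbb{R}}(L^{\alpha}u)(x)\,p(x)\,dx=-\int_{\mathbb{R}}u(x)\,\big((L^{\alpha})^{*}p\big)(x)\,dx=0 .
$$

Thus a right hand side with a nonzero mean with respect to $\mu$ is incompatible with the existence of such a solution.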

Now denote

[TABLE]

We have,

[TABLE]

So, there is a constant $\beta_{n}=\langle w_{n},\mu^{n+1}\rangle$ such that

[TABLE]

Let us show that for any $N>0$ ,

[TABLE]

First of all, note that all functions $\psi_{n}$ and, hence, $\psi^{2}_{n}$ are uniformly locally bounded and may only grow polynomially fast,

[TABLE]

with some $C,m$ the same for all values of $n$; this follows from the definition (26), from the properties of the derivatives $v^{\prime}_{n}$ and $v^{\prime\prime}_{n}$, from the Lemma 3, and due to

[TABLE]

Now let us rewrite the equation (28) via a stationary version of our diffusion, say, $\tilde{X}^{n+1}_{t}$ :

[TABLE]

(Note that if we knew that $w_{n}$ were centered with respect to the invariant measure $\mu^{n+1}$ then we would have $\beta_{n}=0$ ; however, the functions $v_{n}$ and $v_{n+1}$ are both centered with respect to two different measures, and this is the reason why their difference is not just small, but small up to some additive constant; this very constant is denoted by $\beta_{n}$ .) Using the coupling idea (see, for example, [31]), let us consider the independent processes $X^{n+1}_{t}$ and $\tilde{X}^{n+1}_{t}$ on the same probability space (just considering the product space) and denote the moment of the first meeting

[TABLE]

It is known (see [31, Theorem 5]) that under our recurrence assumptions for any $k>0$ there are some constants $C_{k},m$ such that uniformly with respect to $n$ ,

[TABLE]

Denote

[TABLE]

Since $\tau$ is a stopping time and because the couple $(X^{n+1}_{t},\tilde{X}^{n+1}_{t})$ is strong Markov (see [14]), the process $(\hat{X}^{n+1}_{t})$ is also strong Markov equivalent to $(X^{n+1}_{t})$ . Therefore, it is possible to rewrite,

[TABLE]

Hence, using the fact that after $\tau$ the processes $\hat{X}^{n+1}_{t}$ and $\tilde{X}^{n+1}_{t}$ coincide, we obtain

[TABLE]

Thus, using Cauchy-Buniakovsky-Schwarz inequality and Fubini Theorem, we have,

[TABLE]

Now, let us take any $\epsilon>0$ and use the inequality $\sqrt{a}\leq\frac{\epsilon}{2}+\frac{a}{2\epsilon}$ . We estimate,

[TABLE]
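The elementary inequality used here, valid for any $a\geq 0$ and $\epsilon>0$, follows by expanding a square:

$$
0\leq(\sqrt{a}-\epsilon)^{2}=a-2\epsilon\sqrt{a}+\epsilon^{2}\quad\Longrightarrow\quad\sqrt{a}\leq\frac{\epsilon}{2}+\frac{a}{2\epsilon} .
$$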

Let us first consider the stationary term. We have,

[TABLE]

Given (30) and because any stationary measure integrates uniformly any power function, let us find such $N$ that uniformly with respect to $n$ ,

[TABLE]

which is possible due to the Lemmata 1 and 3, and also such that $N>\epsilon^{-2}$ . Then choose $n(\epsilon)$ such that

[TABLE]

Due to Krylov’s estimate

[TABLE]

for any function $g\geq 0$ , and also

[TABLE]

for any $s>0$ (follows from [15, Theorem 2.2.3]), we evaluate with $n\geq n(\epsilon)$ :

[TABLE]

Indeed, for any $k\geq 0$ we have,

[TABLE]

This argument works for the non-stationary process as well: due to the same Krylov’s estimate,

[TABLE]

Further,

[TABLE]

Finally, using (13), we obtain with some $m$ ,

[TABLE]

Overall, this shows that with the appropriately chosen (uniformly bounded) $\beta_{n}$ ,

[TABLE]

By virtue of the results in [31], for any $k>0$ there are $C,m>0$ such that

[TABLE]

Therefore, taking any $k>1$ , we have that the series in (32) converges providing us an estimate

[TABLE]

In other words, the difference $w_{n}(x)-\beta_{n}=v_{n}-v_{n+1}-\beta_{n}$ converges to zero locally uniformly as $n\to\infty$. Naturally, it also implies that for any subsequence $n_{j}$ such that $v_{n_{j}}$ converges locally uniformly in $C^{1}$, the derivatives $v^{\prime}_{n_{j}}$ and $v^{\prime}_{n_{j}+1}$ may only converge to the same limit, i.e., $v^{\prime}_{n_{j}}-v^{\prime}_{n_{j}+1}\to 0$ (locally uniformly) as $j\to\infty$. Indeed, otherwise we just integrate to show that the limits of $v_{n_{j}}$ and $v_{n_{j}+1}+\beta_{n_{j}}$ are different, which contradicts what was established earlier.
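The integration step can be written out as follows. This is a sketch in which $d$ is a notation introduced only for this illustration: it denotes a locally uniform limit of $w^{\prime}_{n_{j}}=v^{\prime}_{n_{j}}-v^{\prime}_{n_{j}+1}$ along a further subsequence, which exists by the compactness established in the step 2. For any $r<x$,

$$
\int_{r}^{x}w^{\prime}_{n_{j}}(s)\,ds=\big(w_{n_{j}}(x)-\beta_{n_{j}}\big)-\big(w_{n_{j}}(r)-\beta_{n_{j}}\big)\to 0,\qquad\int_{r}^{x}w^{\prime}_{n_{j}}(s)\,ds\to\int_{r}^{x}d(s)\,ds .
$$

Hence $\int_{r}^{x}d(s)\,ds=0$ for all $r<x$ and, since $d$ is continuous, $d\equiv 0$.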

4. What we want to do now is to pass to the limit as $j\to\infty$ in the equations

[TABLE]

where $(n_{j},j\to\infty)$ is any sequence such that $v_{n_{j}}$ converges (locally uniformly) in $C^{1}$ . From

[TABLE]

by subtracting zero a.e. (25), we obtain a.e.,

[TABLE]

Now we want to show that

[TABLE]

which in turn implies by differentiation the equation equivalent to (7),

[TABLE]

for any $x$ , with the note that $\tilde{v}^{\prime}$ is absolutely continuous.

Let us show that (34), indeed, implies (35). Note that $G[v_{n_{j}}](x)-\rho_{n_{j}}\leq 0$ (a.e.). Let us divide (34) by $a_{n_{j}+1}=a^{\alpha_{n_{j}+1}}$ and use $\delta:=\inf_{u,x}a^{u}(x)>0$ : we get a.e. with some $K>0$ ,

[TABLE]

So, we have just shown that a.e.,

[TABLE]

The next trick is to note that again due to (3) and $\rho_{n_{j}}\geq\rho_{n_{j}+1}$ , and since $\delta\leq a\leq C$ ,

[TABLE]

which implies that with some $C,c>0$ ,

[TABLE]

Since $v_{n_{j}}^{\prime}$ is absolutely continuous, we can integrate (39) to get the following: for any (not a.e.!) $x$ and $r$ with $x>r$ ,

[TABLE]

As it was explained earlier, due to the compactness in $C^{1}$ we may assume that

[TABLE]

in $C$ locally for some $\tilde{v}\in C^{1}$, as $j\to\infty$. Note that $\tilde{v}^{\prime}$ is absolutely continuous, which follows from the uniform local boundedness of $v_{n}^{\prime\prime}$. Therefore, it is possible to pass to the limit in the inequality (40) as $j\to\infty$: for any $x>r$,

[TABLE]

since the right hand side in (40) clearly goes to zero.

Here

[TABLE]

So, from (40) we obtain the desired equation (35)

[TABLE]

In turn, since $F_{1}[\tilde{v}^{\prime}(s),\tilde{\rho}](s)$ is continuous and absolutely continuous in $s$ , it implies $\tilde{v}\in C^{2}$ , and by (well-defined) differentiation we get the equation (36) for every $x\in\mathbb{R}$ .

In the sequel it will follow from the uniqueness of solution to the Bellman’s equation that actually the whole sequence $v_{n}$ converges up to an additive constant sequence locally uniformly in $C^{1}$ to a single limit. However, it is not needed for our proof.

5. Uniqueness for $\rho$ in (7). Assume that there are two solutions of the (HJB) equation, $(v^{1},\rho^{1})$ and $(v^{2},\rho^{2})$ with $v^{i}\in{\cal K}$ , $i=1,2$ :

[TABLE]

Earlier it was shown that both $v^{1}$ and $v^{2}$ are classical solutions with locally Lipschitz second derivatives. Let $w(x):=v^{1}(x)-v^{2}(x)$ and consider two strategies $\alpha_{1},\alpha_{2}\in{\cal A}$ such that $\alpha_{1}(x)\in\mbox{Argmax}_{u\in U}(L^{u}w(x))$ and $\alpha_{2}(x)\in\mbox{Argmin}_{u\in U}(L^{u}w(x))$ , and let $X^{1}_{t},X^{2}_{t}$ be solutions of the SDEs corresponding to each strategy $\alpha_{i}$ , $i=1,2$ . Note that due to the measurable choice arguments – see the Appendix – such Borel strategies exist; corresponding weak solutions also exist. Let us denote

[TABLE]

Then,

[TABLE]

Similarly,

[TABLE]

We have,

[TABLE]

and

[TABLE]

Due to Dynkin’s formula we have,

[TABLE]

Since the left hand side here is bounded for a fixed $x$ , due to the Lemma 1 we get,

[TABLE]

Similarly, considering $\alpha_{2}$ we conclude that

[TABLE]

From here, due to the boundedness of the left hand side (Lemma 1) we get,

[TABLE]

Thus, $\rho^{1}-\rho^{2}\geq 0$ and, hence,

[TABLE]

6. Why $\rho=\tilde{\rho}$ ? Recall that for any initial $\alpha_{0}\in{\cal A}$ , the sequence $\rho_{n}$ converges to the same value $\tilde{\rho}$ , which is a unique component of solution of the equation (7). Let us take any $\epsilon>0$ and consider a strategy $\alpha_{0}$ such that

[TABLE]

Since the sequence $(\rho_{n})$ decreases, the limit $\tilde{\rho}$ must satisfy the same inequality,

[TABLE]

Due to uniqueness of $\tilde{\rho}$ as a component of solution of the equation (7) and since $\epsilon>0$ is arbitrary, we find that

[TABLE]

But also $\tilde{\rho}\geq\rho$ since $\tilde{\rho}$ is the infimum of the cost function values over a smaller – just countable – family of strategies. So, in fact,

[TABLE]

7. Uniqueness for $V$. Let us have another look at the earlier equations in the step 5, replacing $\rho^{2}-\rho^{1}$ by zero as we already know that the second component in the solution is unique:

[TABLE]

Clearly, $h_{1}\geq 0$ with $h_{1}\not=0$ – i.e., with $\Lambda(x:\,h_{1}(x)>0)>0$ – would imply that $\langle h_{1},\mu_{1}\rangle\,>0$ , which contradicts a zero left hand side (after division by $t$ with $t\to\infty$ ). So, we conclude that

[TABLE]

Since $\mu_{1}\sim\Lambda$ due to (14), by virtue of Krylov’s estimate we have that $0\leq\mathbb{E}_{x}\int_{0}^{t}h_{1}(X^{1}_{s})\,ds\leq N\|h_{1}\|_{L_{1}}=0$ . So, in fact,

[TABLE]

Further, from (41) and due to the last statement of the Lemma 1 it follows that

[TABLE]

Hence, $w(x)$ is a constant. Recall that uniqueness of the first component $V$ is stated up to a constant, and it was just established that

[TABLE]

8. Returning to the second statement of Theorem 1, note that due to uniqueness of the solution of the HJB equation, the whole sequence $(v_{n})$ converges, up to additive constants depending only on $n$, to the unique limit $v$.

9. Local Lipschitz for $\tilde{v}^{\prime\prime}$ . Recall that a certain additional regularity of the coefficients is assumed. We have from (36) and (15),

[TABLE]

Therefore, it follows from the Cauchy Mean Value Theorem that

[TABLE]

So, due to Lipschitz condition on $b^{u},a^{u}$ in $x$ and in virtue of the nondegeneracy of $a^{u}$ ,

[TABLE]

The required local Lipschitz property of the function $\tilde{v}^{\prime\prime}$ has been verified. ∎

Appendix A On a measurable choice

For the reader’s convenience we repeat the main arguments from [1] concerning the measurable choice a little bit more precisely. Recall that in the presentation of RIA in the beginning of the section 3, the existence of a Borel measurable version of a strategy which minimizes some function for any fixed $x$ was assumed. In our case existence of such a Borel strategy can be justified by using Stschegolkow’s (Shchegolkov’s) theorem [30] (see also [20, Satz 39], or [7, Theorem 1]). According to this result, if any section of a (nonempty) Borel set $E$ in the direct product of two complete separable metric spaces is sigma-compact (i.e., equals a countable union of compact sets) then a Borel selection belonging to this set $E$ exists.

In our case we have, $F[v,\rho](x)=\inf_{u\in U}\left[L^{u}v(x)+f^{u}(x)-\rho\right]$. For a fixed $v$ representing any $v_{n}$ in the proof, denote $\chi(u,x):=L^{u}v(x)+f^{u}(x)-\rho$ and $\bar{\chi}(x):=F[v,\rho](x)$, and let $E=\{(u,x):\chi(u,x)=\bar{\chi}(x)\}$. This set is nonempty because the minima here are attained for each $x$. Its section for any $x\in\mathbb{R}$ is $E_{x}:=\{u:\chi(u,x)=\bar{\chi}(x)\}$. Any such section is nonempty and closed and, hence, Borel. Indeed, if $E_{x}\ni u_{n}\to u$ as $n\to\infty$, then $\chi(u_{n},x)\to\chi(u,x)$ due to the continuity of $\chi(\cdot,x)$; since $\chi(u_{n},x)=\bar{\chi}(x)$ for every $n$, it follows that $\chi(u,x)=\bar{\chi}(x)$, i.e., $u\in E_{x}$.

The set $E$ itself is Borel, too. To show this, take any $\epsilon>0$ and denote

[TABLE]

This set is Borel because the functions $\chi(u,x)$ and $\bar{\chi}(x)$ are Borel: the latter one since the minimum in $\min_{u}\chi(u,x)$ can be taken over some countable dense subset of $U$. (Recall that the second derivative $v^{\prime\prime}$ is Borel measurable by our convention.) It remains to note that

[TABLE]

so that $E$ is also Borel.
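The Borel measurability of $\bar{\chi}$ used above can be spelled out as follows. This is a sketch in which $\{u_{k}\}$ denotes a countable dense subset of $U$ and $c$ an arbitrary real number, both introduced here only for illustration. By the continuity of $\chi(\cdot,x)$,

$$
\bar{\chi}(x)=\inf_{u\in U}\chi(u,x)=\inf_{k\geq 1}\chi(u_{k},x),\qquad\{x:\,\bar{\chi}(x)<c\}=\bigcup_{k\geq 1}\{x:\,\chi(u_{k},x)<c\},
$$

and each set in the union on the right is Borel because every function $\chi(u_{k},\cdot)$ is Borel measurable.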

Thus, Stschegolkow’s theorem is applicable and, hence, a Borel measurable improved strategy $\alpha_{n+1}$ in the induction step of the RIA does exist for each step $n$. For the same reason Borel strategies $\alpha_{1}$ and $\alpha_{2}$ exist in the steps 5 and (implicitly) 7.

Bibliography


[1] S.V. Anulova, H. Mai, A.Yu. Veretennikov, On averaged expected cost control as reliability for 1D ergodic diffusions, Reliability: Theory & Applications (RT&A), 12, 4(47), 31-38, 2017.
[2] A. Arapostathis, V.S. Borkar, E. Fernández-Gaucherand, M.K. Ghosh, and S.I. Marcus, Discrete-time controlled Markov processes with average cost criterion: a survey, SIAM J. Control and Optimization, 31(2), 282-344, 1993.
[3] A. Arapostathis, On the policy iteration algorithm for nondegenerate controlled diffusions under the ergodic criterion. In: Optimization, control, and applications of stochastic systems, Systems Control Found. Appl., 1-12. Birkhäuser/Springer, New York, 2012.
[4] A. Arapostathis, V.S. Borkar, A relative value iteration algorithm for non-degenerate controlled diffusions, SIAM Journal on Control and Optimization, 50(4), 1886-1902, 2012.
[5] A. Arapostathis, V.S. Borkar, M.K. Ghosh, Ergodic control of diffusion processes, Encyclopedia of Mathem. and its Appl. 143. Cambridge: CUP, 2012.
[6] V.S. Borkar, Optimal control of diffusion processes, Harlow: Longman Scientific & Technical; New York: John Wiley & Sons, 1989.
[7] L.D. Brown, R. Purves, Measurable selections of extrema. Ann. Stat., 1, 902-912, 1973.
[8] O.L. do Valle Costa, F. Dufour, Continuous Average Control of Piecewise Deterministic Markov Processes, Springer, New York et al., 2013.
