Improved order 1/4 convergence for piecewise constant policy approximation of stochastic control problems

Espen R. Jakobsen; Athena Picarelli; Christoph Reisinger

arXiv:1901.01193·math.PR·January 7, 2019

TL;DR

This paper improves the theoretical error rate for approximating value functions in controlled diffusion processes using piecewise constant policies from 1/6 to 1/4, aligning with PDE literature standards.

Contribution

The authors refine existing proofs to establish an improved 1/4 convergence rate, demonstrating optimality and enhancing error estimates for stochastic control approximations.

Findings

01. Error rate improved from 1/6 to 1/4

02. Aligns stochastic control approximation with PDE results

03. Provides refined proof techniques for convergence analysis

Abstract

In N.V. Krylov, Approximating value functions for controlled degenerate diffusion processes by using piece-wise constant policies, Electron. J. Probab., 4(2), 1999, it is proved under standard assumptions that the value functions of controlled diffusion processes can be approximated with order 1/6 error by those with controls which are constant on uniform time intervals. In this note we refine the proof and show that the provable rate can be improved to 1/4, which is optimal in our setting. Moreover, we demonstrate the improvements this implies for error estimates derived by similar techniques for approximation schemes, bringing these in line with the best available results from the PDE literature.

Equations

$$X_{s}=x+\int_{0}^{s}b_{\alpha_{r}}(t+r,X_{r})\,\mathrm{d}r+\int_{0}^{s}\sigma_{\alpha_{r}}(t+r,X_{r})\,\mathrm{d}W_{r}\quad\text{for }s\geq t.$$

$$J^{\alpha}(t,x):=\mathbb{E}^{\alpha}_{t,x}\Big[\int_{0}^{T-t}f_{\alpha_{r}}(t+r,X_{r})\,\mathrm{d}r+g(X_{T-t})\Big].$$

$$v(t,x):=\sup_{\alpha\in\mathcal{A}}J^{\alpha}(t,x).$$

$$|\varphi(t,x,a)-\varphi(s,y,a)|\leq C_{0}\big(|x-y|+|t-s|^{1/2}\big)\quad\text{and}\quad|\varphi(t,x,a)|\leq C_{0};$$

$$|g(x)-g(y)|\leq C_{1}|x-y|,$$

$$|f(t,x,a)-f(s,y,a)|\leq C_{1}\big(|x-y|+|t-s|^{1/2}\big)\quad\text{and}\quad|f(t,x,a)|\leq C_{1}.$$

$$v_{h}(t,x):=\sup_{\alpha\in\mathcal{A}_{h}}J^{\alpha}(t,x).$$

$$0\leq v(s,x)-v_{h}(s,x)\leq Ch^{1/4},$$

$$\rho_{\varepsilon}(t,x):=\frac{1}{\varepsilon^{d+2}}\,\rho\Big(\frac{t}{\varepsilon^{2}},\frac{x}{\varepsilon}\Big),$$

$$\rho\in C^{\infty}(\mathbb{R}^{d+1}),\quad\rho\geq 0,\quad\operatorname{supp}\rho=(0,1)\times\{|x|<1\},\quad\int_{\operatorname{supp}\rho}\rho(e)\,\mathrm{d}e=1.$$

$$\varphi^{(\varepsilon)}(t,x):=(\varphi*\rho_{\varepsilon})(t,x)=\int_{0\leq s\leq\varepsilon^{2}}\int_{|y|\leq\varepsilon}\varphi(t-s,x-y)\,\rho_{\varepsilon}(s,y)\,\mathrm{d}s\,\mathrm{d}y.$$

$$\|\varphi-\varphi^{(\varepsilon)}\|_{\infty}\leq C\varepsilon\quad\text{and}\quad\|\partial_{t}^{m}D_{x}^{k}\varphi^{(\varepsilon)}\|_{\infty}\leq C\varepsilon^{1-2m-k}\quad\text{for }k+m\geq 1.$$

$$|v(t,x)-\tilde{v}(t,x)|\leq C\varepsilon.$$

$$\mathbb{E}^{\alpha}_{t,x}\Big[\sup_{s\in[0,T-t]}|X_{s}-\tilde{X}_{s}|^{2}\Big]\leq C\big(\|b-b^{(\varepsilon)}\|^{2}_{\infty}+\|\sigma-\sigma^{(\varepsilon)}\|^{2}_{\infty}\big)\leq C\varepsilon^{2}$$

$$u_{h}(s,x):=\sup_{\alpha\in\mathcal{A}_{h},\,e\in\mathcal{E}_{h}}\mathbb{E}^{(\alpha,e)}_{s,x}\Big[\int_{0}^{S-s}f_{\alpha_{r}}(s+r,\hat{X}_{r})\,\mathrm{d}r+g(\hat{X}_{S-s})\Big],$$

$$\hat{X}_{\cdot}=x+\int_{0}^{\cdot}b_{\alpha_{r}}(s+r+e_{1,r},\hat{X}_{r}+e_{2,r})\,\mathrm{d}r+\int_{0}^{\cdot}\sigma_{\alpha_{r}}(s+r+e_{1,r},\hat{X}_{r}+e_{2,r})\,\mathrm{d}W_{r}.$$

$$|v_{h}(t,x)-u_{h}(t,x)|\leq C\varepsilon$$

$$|u_{h}(t,x)-u_{h}(s,y)|\leq C\big(|x-y|+|t-s|^{1/2}\big)$$

$$u_{h}(s,x)=\sup_{\substack{a\in A\\ 0\leq\eta\leq\varepsilon^{2},\,|\xi|\leq\varepsilon}}\mathbb{E}^{(a,(\eta,\xi))}_{s,x}\Big[\int_{0}^{h}f_{a}(s+r,\hat{X}_{r})\,\mathrm{d}r+u_{h}(s+h,\hat{X}_{h})\Big].$$

$$\big|u_{h}(t,x)-u_{h}^{(\varepsilon)}(t,x)\big|\leq C\varepsilon$$

$$\|\partial_{t}^{m}D_{x}^{k}u_{h}^{(\varepsilon)}\|_{\infty}\leq C\varepsilon^{1-2m-k}$$

$$u_{h}^{(\varepsilon)}(t,x)\geq\mathbb{E}^{a}_{t,x}\Big[\int_{0}^{h}f_{a}(t+r,\tilde{X}_{r})\,\mathrm{d}r+u_{h}^{(\varepsilon)}(t+h,\tilde{X}_{h})\Big]$$

$$\mathbb{E}_{s,x}^{a}\big[u_{h}^{(\varepsilon)}(s+h,\tilde{X}_{h})\big]=u_{h}^{(\varepsilon)}(s,x)+h(L_{a}u_{h}^{(\varepsilon)})(s,x)+\mathbb{E}_{s,x}^{a}\Big[\int_{0}^{h}\int_{0}^{t}L_{a}(L_{a}u_{h}^{(\varepsilon)})(s+r,\tilde{X}_{r})\,\mathrm{d}r\,\mathrm{d}t\Big]$$

$$L_{a}:=\partial_{t}+b_{a}^{T}D_{x}+\frac{1}{2}\operatorname{tr}\big[\sigma_{a}\sigma_{a}^{T}D_{x}^{2}\big].$$

$$(L_{a}u_{h}^{(\varepsilon)})(s,x)+f_{a}(s,x)\leq\frac{1}{h}\sup_{a\in A}\Big(\|L_{a}f_{a}\|_{\infty}+\|L_{a}L_{a}u_{h}^{(\varepsilon)}\|_{\infty}\Big)\int_{0}^{h}\int_{0}^{t}\,\mathrm{d}r\,\mathrm{d}t.$$

$$(L_{a}u_{h}^{(\varepsilon)})(s,x)+f_{a}(s,x)\leq C\varepsilon^{-3}h.$$

$$\mathbb{E}_{s,x}^{\alpha}\big[u_{h}^{(\varepsilon)}(T-h,\tilde{X}_{T-h-s})\big]\leq u_{h}^{(\varepsilon)}(s,x)-\mathbb{E}_{s,x}^{\alpha}\Big[\int_{0}^{T-s}f_{\alpha_{t}}(s+t,\tilde{X}_{t})\,\mathrm{d}t\Big]+TC\varepsilon^{-3}h.$$

$$\mathbb{E}_{s,x}^{\alpha}\big[u_{h}(T-h,\tilde{X}_{T-h-s})\big]\leq v_{h}(s,x)-\mathbb{E}_{s,x}^{\alpha}\Big[\int_{0}^{T-s}f_{\alpha_{t}}(s+t,\tilde{X}_{t})\,\mathrm{d}t\Big]+C(\varepsilon+\varepsilon^{-3}h),$$

$$\mathbb{E}_{s,x}^{\alpha}\big[(u_{h}(T-h,\tilde{X}_{T-h-s})\big]$$

Full text

Improved order 1/4 convergence for piecewise constant policy approximation of stochastic control problems

Espen R. Jakobsen, Department of Mathematical Sciences, Norwegian University of Science and Technology, 7491 Trondheim, N

Athena Picarelli, Department of Economics, University of Verona, via Cantarane 24, 37129 Verona, I

Christoph Reisinger, Mathematical Institute, University of Oxford, Andrew Wiles Building, OX2 6GG, Oxford, UK

1. Introduction

In this paper we derive improved error estimates for approximations of value functions of stochastic optimal control problems. Let $(\Omega,\mathcal{F},\{\mathcal{F}_{t}\}_{t\geq 0},\mathbb{P})$ be a complete filtered probability space, $(W_{t})_{t\geq 0}$ a $p$-dimensional $\{\mathcal{F}_{t}\}$-Wiener process on $(\Omega,\mathcal{F},\mathbb{P})$, and $\mathcal{A}$ the set of progressively measurable processes with values in a set $A\subseteq{\mathbb{R}}^{m}$. For any $\alpha\in\mathcal{A}$, $x\in{\mathbb{R}}^{d}$, $t\in[0,T]$ (with $T>0$), let $X_{\cdot}=X^{\alpha,t,x}_{\cdot}$ be the (controlled) Itô diffusion which satisfies

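$$X_{s}=x+\int_{0}^{s}b_{\alpha_{r}}(t+r,X_{r})\,\mathrm{d}r+\int_{0}^{s}\sigma_{\alpha_{r}}(t+r,X_{r})\,\mathrm{d}W_{r}\quad\text{for }s\geq t.\tag{1.1}$$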

Here we use the notation $\varphi_{a}(\cdot,\cdot)=\varphi(\cdot,\cdot,a)$ for any $a\in A$ and function $\varphi$ . For a given terminal cost function $g$ and running cost $f$ , the optimal control problem consists of maximizing over $\alpha\in\mathcal{A}$ the expected total cost

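$$J^{\alpha}(t,x):=\mathbb{E}^{\alpha}_{t,x}\Big[\int_{0}^{T-t}f_{\alpha_{r}}(t+r,X_{r})\,\mathrm{d}r+g(X_{T-t})\Big].\tag{1.2}$$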

The indices on the expectation ${\mathbb{E}}$ indicate that the law of the process depends on the starting point and control. Finally, the value function of the optimal control problem is defined by

$$v(t,x):=\sup_{\alpha\in\mathcal{A}}J^{\alpha}(t,x).\tag{1.3}$$

We consider the following set of assumptions:

(H1)

$A$ is a compact set;

(H2)

$b:[0,T]\times{\mathbb{R}}^{d}\times A\to{\mathbb{R}}^{d}$ and $\sigma:[0,T]\times{\mathbb{R}}^{d}\times A\to{\mathbb{R}}^{d\times p}$ are continuous functions. For $\varphi\in\{b,\sigma\}$ , there exists $C_{0}\geq 0$ such that for every $t,s\in[0,T],x,y\in{\mathbb{R}}^{d},a\in A$ :

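$$|\varphi(t,x,a)-\varphi(s,y,a)|\leq C_{0}\big(|x-y|+|t-s|^{1/2}\big)\quad\text{and}\quad|\varphi(t,x,a)|\leq C_{0};$$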

(H3)

$g:{\mathbb{R}}^{d}\to{\mathbb{R}}$ and $f:[0,T]\times{\mathbb{R}}^{d}\times A\to{\mathbb{R}}$ are continuous functions. There exists $C_{1}\geq 0$ such that for every $t,s\in[0,T],x,y\in{\mathbb{R}}^{d},a\in A$ :

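$$|g(x)-g(y)|\leq C_{1}|x-y|,$$

$$|f(t,x,a)-f(s,y,a)|\leq C_{1}\big(|x-y|+|t-s|^{1/2}\big)\quad\text{and}\quad|f(t,x,a)|\leq C_{1}.$$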

Observe that under assumptions (H1), (H2), and for any $\alpha\in{\mathcal{A}}$ , there exists a unique strong solution of equation (1.1). For simplicity, we assume data and coefficients to be Lipschitz continuous in space and $1/2$ -Hölder continuous in time, and have included no discount factor, but it is not difficult to extend our results to include discounting and a lower Hölder regularity for $f$ and $g$ .

We aim to estimate the error introduced by approximating the set of measurable controls $\mathcal{A}$ by piecewise constant controls. Let $h>0$ be the discretization parameter and $\mathcal{A}_{h}$ the subset of $\mathcal{A}$ of processes which are constant in the intervals $[nh,(n+1)h)$ for $n\in{\mathbb{N}}$. (Footnote 1: Note that in [8] the length of the intervals is $h^{2}$; however, in the absence of further discretisations, we use $h$ for simplicity.) The value function associated with this restricted set of controls is defined by

$$v_{h}(t,x):=\sup_{\alpha\in\mathcal{A}_{h}}J^{\alpha}(t,x).\tag{1.4}$$

Note that the definition of $v_{h}$ in (1.4) under the “shifted” dynamics in (1.2) and (1.1) implies that the control discretisation is always centered at $t$ . This will be important for establishing a dynamic programming principle. This is not, though, how one would compute $v_{h}$ in practice, as discussed in the penultimate paragraph of this section.
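To make the restricted control class concrete, the following minimal Python sketch estimates $J^{\alpha}(t,x)$ by Monte Carlo for a control that is frozen on intervals of length $h$, counted from the starting time $t$ as in the shifted dynamics. It is an illustration only and not the authors' method: the name `simulate_cost`, the feedback rule `policy(n, x)` evaluated at the start of each interval, the scalar state, and the Euler-Maruyama substepping with `n_sub` steps per interval are all assumptions introduced here.

```python
import math
import random


def simulate_cost(policy, b, sigma, f, g, t, x, T, h,
                  n_sub=20, n_paths=2000, seed=0):
    """Monte Carlo estimate of J^alpha(t, x) for a piecewise constant control.

    policy(n, x) is evaluated once at relative time n*h and its value is
    held on [n*h, (n+1)*h). The state is scalar and is advanced with
    Euler-Maruyama steps (hypothetical discretisation, n_sub per interval).
    """
    rng = random.Random(seed)
    horizon = T - t
    n_int = math.ceil(horizon / h - 1e-12)  # number of control intervals
    total = 0.0
    for _path in range(n_paths):
        X, r, cost = x, 0.0, 0.0
        for n in range(n_int):
            a = policy(n, X)                    # control frozen on this interval
            length = min(h, horizon - n * h)    # last interval may be shorter
            dt = length / n_sub
            for _step in range(n_sub):
                dW = rng.gauss(0.0, math.sqrt(dt))
                cost += f(t + r, X, a) * dt     # running cost
                X += b(t + r, X, a) * dt + sigma(t + r, X, a) * dW
                r += dt
        total += cost + g(X)                    # add terminal cost
    return total / n_paths
```

Taking the supremum of such estimates over a family of rules `policy` would give a lower bound for $v_{h}$ up to sampling and time-stepping error; the sketch does not attempt that optimisation.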

From a probabilistic perspective, it is clear that 0 is a lower bound for $v-v_{h}$ since $\mathcal{A}_{h}\subseteq\mathcal{A}$ . Under our assumptions, an upper bound on $v-v_{h}$ of order $h^{\frac{1}{6}}$ is given in [8].

An indication that the order 1/6 from [8] might be improved is the fact that under the same regularity assumptions as above it is shown in [5] that a fully discrete semi-Lagrangian scheme applied to the corresponding HJB equation has order 1/4 in the timestep for an Euler approximation. This scheme does not distinguish between constant or other controls over individual timesteps. It would therefore be somewhat surprising if the scheme which employs further approximations was closer to the original problem than the one which only holds the policies constant over timesteps.

A slightly different angle to the problem is provided in [3], where the authors construct from (1.4) a subsolution to the HJB equation corresponding to (1.3) by a second order local expansion in $t$ . This results in an order 1 error bound in the case of smooth solutions, in contrast to 1/2 which would be obtained in the smooth case by the method in [8] (see also Section 2.3 below). However, in the general non-regular case, the order in [3] is limited by a switching system approximation of order ${\varepsilon}^{1/3}$ (for a switching cost chosen of order ${\varepsilon}$ ), which, combined with an error term of the regularised system of order $h/{\varepsilon}^{3}$ (for regularisation parameter ${\varepsilon}$ ), results in an order $1/10$ error by optimisation of ${\varepsilon}$ .

In this paper, we combine the advantages of both methods to obtain order 1/4. The reason we can improve the error estimates of Krylov is that we use a higher order expansion when we derive the truncation error. Our discussion (see Subsection 2.3) also shows that no further improvement can be obtained in this way: our new proof uses the maximal possible order of the truncation error.

Piecewise constant policy time stepping has been used in a numerical method for solving Hamilton-Jacobi-Bellman equations in [13], where the computational advantage comes from the fact that over the time intervals in which the policy is constant, only linear PDEs have to be solved. This has been extended to mixed optimal stopping and control problems with nonlinear expectations and jumps in [6]. A further benefit lies in the inherent parallelism so that the linear problems with different controls can be solved on parallel processors. A proof of convergence is given in these works using pure viscosity solution arguments, but no rate of convergence is provided. Early results on this type of approximation can be found in [10] and an extension with “predicted” controls is proposed in [7].

In the remainder of this article, we give in Section 2 a proof of the order 1/4 convergence of the piecewise constant policy approximation, and deduce the linear convergence in the case of sufficiently regular solutions and data. We then outline in Section 3 the improved orders which can be derived for approximation schemes by similar techniques.

2. Main result

We begin by stating the main result. Throughout the entire section we work under assumptions (H1)–(H3).

Theorem 2.1.

For any $s\in[0,T]$ , $x\in{\mathbb{R}}^{d}$ , and $h>0$ , we have

$$0\leq v(s,x)-v_{h}(s,x)\leq Ch^{1/4},\tag{2.1}$$

where the constant $C$ only depends on the constants in Assumptions (H2) and (H3).

A major difficulty in the proof of Theorem 2.1 is the fact that typically $v$ and $v_{h}$ are not smooth. Even in the non-degenerate case where $v$ is $C^{2+\delta}$ , $v_{h}$ is still not smooth in general. A simple example is the Black-Scholes-Barenblatt equation resulting from an uncertain volatility model (see [11]). Here, the control is of bang-bang type and the optimal control problem for piecewise constant policies reduces to taking the maximum of two smooth functions at the end of each time interval, so that for $t$ on the time mesh, $v_{h}(t,\cdot)$ will only be Lipschitz (in the spatial argument).

Since the proof of Theorem 2.1 relies on repeated use of the Itô formula, we need to work with smooth functions, both for the coefficients and value functions $v$ and $v_{h}$ . This means that we need to introduce several regularization arguments and use Krylov’s method of shaking the coefficients.

2.1. Background results and regularisation

In this section, we introduce Krylov’s regularization and give related preliminary results. Some of the proofs are given in [8] and not repeated here; see also [1, 2] for analogous results proved with PDE arguments. In order to apply Itô’s formula twice, $\sigma,b,f,g,v$ , and $v_{h}$ must be regularized. Let ${\varepsilon}>0$ and the mollifier $\rho_{\varepsilon}$ be defined as

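$$\rho_{\varepsilon}(t,x):=\frac{1}{\varepsilon^{d+2}}\,\rho\Big(\frac{t}{\varepsilon^{2}},\frac{x}{\varepsilon}\Big),$$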

where

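$$\rho\in C^{\infty}(\mathbb{R}^{d+1}),\quad\rho\geq 0,\quad\operatorname{supp}\rho=(0,1)\times\{|x|<1\},\quad\int_{\operatorname{supp}\rho}\rho(e)\,\mathrm{d}e=1.$$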

For any function $\varphi:[0,T]\times{\mathbb{R}}^{d}\to{\mathbb{R}}$ , we define $\varphi^{({\varepsilon})}\in C^{\infty}([0,T]\times{\mathbb{R}}^{d})$ to be the mollification of a suitable extension of $\varphi$ to $[-{\varepsilon}^{2},T]$

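$$\varphi^{(\varepsilon)}(t,x):=(\varphi*\rho_{\varepsilon})(t,x)=\int_{0\leq s\leq\varepsilon^{2}}\int_{|y|\leq\varepsilon}\varphi(t-s,x-y)\,\rho_{\varepsilon}(s,y)\,\mathrm{d}s\,\mathrm{d}y.$$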

We can always take an extension which preserves the Hölder continuity in time and Lipschitz continuity in space of $\varphi$ . Then standard estimates for mollifiers imply that

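$$\|\varphi-\varphi^{(\varepsilon)}\|_{\infty}\leq C\varepsilon\quad\text{and}\quad\|\partial_{t}^{m}D_{x}^{k}\varphi^{(\varepsilon)}\|_{\infty}\leq C\varepsilon^{1-2m-k}\quad\text{for }k+m\geq 1.\tag{2.3}$$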
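The first mollifier estimate, $\|\varphi-\varphi^{(\varepsilon)}\|_{\infty}\leq C\varepsilon$ for Lipschitz $\varphi$, can be checked numerically in a simplified setting. The sketch below is a spatial-only, one-dimensional illustration and not the space-time mollifier with parabolic scaling used in the paper; the bump function `bump`, the midpoint quadrature, and the names `mollify` and `n` are assumptions made for this example.

```python
import math


def bump(y):
    """Unnormalised smooth bump supported on |y| < 1 (assumed choice of rho)."""
    return math.exp(-1.0 / (1.0 - y * y)) if abs(y) < 1.0 else 0.0


def mollify(phi, x, eps, n=2000):
    """Midpoint-rule approximation of (phi * rho_eps)(x) in one space dimension."""
    dy = 2.0 / n
    nodes = [-1.0 + (i + 0.5) * dy for i in range(n)]
    weights = [bump(y) for y in nodes]
    mass = sum(weights)  # normalise so that the discrete weights sum to one
    # rho_eps(z) = rho(z / eps) / eps; substituting z = eps * y removes the 1/eps factor.
    return sum(w * phi(x - eps * y) for w, y in zip(weights, nodes)) / mass


# For phi(x) = |x| (Lipschitz constant 1) the gap at the kink x = 0 scales linearly in eps.
for eps in (0.1, 0.01):
    gap = mollify(abs, 0.0, eps) - abs(0.0)
    print(eps, gap, gap / eps)
```

In this example the ratio `gap / eps` is the same constant for both values of `eps`, which is the linear dependence on $\varepsilon$ that the estimate predicts.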

Let $\tilde{X}_{\cdot}$ be the solution of (1.1) with coefficients replaced by $b^{({\varepsilon})}$ and $\sigma^{({\varepsilon})}$ . Then we denote by $\tilde{v}$ and $\tilde{J}^{\alpha}$ the solution and cost function of the optimal control problem (1.1)–(1.3) where $X_{\cdot}$ is replaced by $\tilde{X}_{\cdot}$ and $f,g$ by $f^{({\varepsilon})},g^{({\varepsilon})}$ .

Proposition 2.1.

There exists $C\geq 0$ such that for any $t\in[0,T],x\in{\mathbb{R}}^{d}$

$$|v(t,x)-\tilde{v}(t,x)|\leq C\varepsilon.$$

Proof.

The result follows from the definitions of $v$ and $\tilde{v}$ since by standard continuous dependence results for SDEs and Lipschitz and Hölder continuity of $f,g,b,\sigma$ ,

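$$\mathbb{E}^{\alpha}_{t,x}\Big[\sup_{s\in[0,T-t]}|X_{s}-\tilde{X}_{s}|^{2}\Big]\leq C\big(\|b-b^{(\varepsilon)}\|^{2}_{\infty}+\|\sigma-\sigma^{(\varepsilon)}\|^{2}_{\infty}\big)\leq C\varepsilon^{2}$$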

for some constant $C$ independent of the control $\alpha$ . ∎

To avoid heavy notation, we will use $(f,g,b,\sigma)$ instead of $(f^{({\varepsilon})},g^{({\varepsilon})},b^{({\varepsilon})},\sigma^{({\varepsilon})})$ in the rest of the paper, keeping in mind estimates (2.3) for their derivatives. We now proceed with the regularisation of the value function $v_{h}$ . Let $\mathcal{E}_{h}$ be the set of progressively measurable processes $e\equiv(e_{1},e_{2})$ with values in $(-{\varepsilon}^{2},0)\times B_{\varepsilon}(0)$ (where $B_{\varepsilon}(0)$ denotes the ball of radius ${\varepsilon}$ in ${\mathbb{R}}^{d}$ ) which are constant in each time interval $[nh,(n+1)h)$ . Letting $S=T+{\varepsilon}^{2}$ , we define for any $s\in[0,S],x\in{\mathbb{R}}^{d}$ the following “perturbed” value function

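$$u_{h}(s,x):=\sup_{\alpha\in\mathcal{A}_{h},\,e\in\mathcal{E}_{h}}\mathbb{E}^{(\alpha,e)}_{s,x}\Big[\int_{0}^{S-s}f_{\alpha_{r}}(s+r,\hat{X}_{r})\,\mathrm{d}r+g(\hat{X}_{S-s})\Big],\tag{2.4}$$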

where $\hat{X}_{\cdot}=\hat{X}^{(\alpha,e),s,x}_{\cdot}$ is the solution of the following SDE with (mollified and) “shaken coefficients”:

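$$\hat{X}_{\cdot}=x+\int_{0}^{\cdot}b_{\alpha_{r}}(s+r+e_{1,r},\hat{X}_{r}+e_{2,r})\,\mathrm{d}r+\int_{0}^{\cdot}\sigma_{\alpha_{r}}(s+r+e_{1,r},\hat{X}_{r}+e_{2,r})\,\mathrm{d}W_{r}.$$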

Proposition 2.2.

There exists a constant $C\geq 0$ such that

$$|v_{h}(t,x)-u_{h}(t,x)|\leq C\varepsilon$$

for any $t\in[0,T],x\in{\mathbb{R}}^{d}$ , and

$$|u_{h}(t,x)-u_{h}(s,y)|\leq C\big(|x-y|+|t-s|^{1/2}\big)$$

for any $t,s\in[0,S]$ and $x,y\in{\mathbb{R}}^{d}$ . Moreover, for any $s\in[0,S-h]$ , $u_{h}$ satisfies the following dynamic programming principle (DPP):

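$$u_{h}(s,x)=\sup_{\substack{a\in A\\ 0\leq\eta\leq\varepsilon^{2},\,|\xi|\leq\varepsilon}}\mathbb{E}^{(a,(\eta,\xi))}_{s,x}\Big[\int_{0}^{h}f_{a}(s+r,\hat{X}_{r})\,\mathrm{d}r+u_{h}(s+h,\hat{X}_{h})\Big].\tag{2.6}$$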

Proof.

These are standard results. The first two inequalities can be found e.g. in [8, Corollary 3.2], while (2.6) is a consequence of [8, Lemma 3.3]. ∎

Following the notation introduced above we consider the regularised (mollified) function $u_{h}^{({\varepsilon})}$ .

Proposition 2.3.

The function $u_{h}^{({\varepsilon})}$ belongs to $C^{\infty}([0,T]\times{\mathbb{R}}^{d})$ . There exists a constant $C\geq 0$ such that

$$\big|u_{h}(t,x)-u_{h}^{(\varepsilon)}(t,x)\big|\leq C\varepsilon\tag{2.7}$$

for $t\in[0,T],x\in{\mathbb{R}}^{d}$ , and

$$\|\partial_{t}^{m}D_{x}^{k}u_{h}^{(\varepsilon)}\|_{\infty}\leq C\varepsilon^{1-2m-k}\tag{2.8}$$

for any $k,m\geq 1$ . Moreover, $u_{h}^{({\varepsilon})}$ satisfies the following super-dynamic programming principle

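$$u_{h}^{(\varepsilon)}(t,x)\geq\mathbb{E}^{a}_{t,x}\Big[\int_{0}^{h}f_{a}(t+r,\tilde{X}_{r})\,\mathrm{d}r+u_{h}^{(\varepsilon)}(t+h,\tilde{X}_{h})\Big]\tag{2.9}$$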

for any $a\in A$ , $0\leq\eta\leq{\varepsilon}^{2},|\xi|\leq{\varepsilon}$ , $t\in[0,T-h],x\in{\mathbb{R}}^{d}$ .

Proof.

The first part follows from Proposition 2.2 and (2.3), while (2.9) follows by the definitions of $u_{h}^{({\varepsilon})}$ , $\hat{X}_{t}$ , $\tilde{X}_{t}$ , and the inequality $\int\sup(\cdots)\geq\sup\int(\cdots)$ . See [8, bottom of page 9] for more details. Here $\alpha_{t}\equiv a$ constant over $t\in[0,h]$ by a slight abuse of notation. ∎

2.2. Proof of Theorem 2.1

1) Upper bound on $L_{a}u_{h}^{({\varepsilon})}+f_{a}$ . By two applications of the Itô (or Dynkin) formula,

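$$\mathbb{E}_{s,x}^{a}\big[u_{h}^{(\varepsilon)}(s+h,\tilde{X}_{h})\big]=u_{h}^{(\varepsilon)}(s,x)+h(L_{a}u_{h}^{(\varepsilon)})(s,x)+\mathbb{E}_{s,x}^{a}\Big[\int_{0}^{h}\int_{0}^{t}L_{a}(L_{a}u_{h}^{(\varepsilon)})(s+r,\tilde{X}_{r})\,\mathrm{d}r\,\mathrm{d}t\Big]$$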

for $s\leq T-h$ , $x\in\mathbb{R}^{{d}}$ , $a\in A$ , where the generator $L_{a}$ of the diffusion process is defined as

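$$L_{a}:=\partial_{t}+b_{a}^{T}D_{x}+\frac{1}{2}\operatorname{tr}\big[\sigma_{a}\sigma_{a}^{T}D_{x}^{2}\big].$$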

Inserting this equality into the dynamic programming inequality (2.9) in Proposition 2.3, applying Itô once to the $f_{a}$ -term, and dividing by $h$ , we find that

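$$(L_{a}u_{h}^{(\varepsilon)})(s,x)+f_{a}(s,x)\leq\frac{1}{h}\sup_{a\in A}\Big(\|L_{a}f_{a}\|_{\infty}+\|L_{a}L_{a}u_{h}^{(\varepsilon)}\|_{\infty}\Big)\int_{0}^{h}\int_{0}^{t}\,\mathrm{d}r\,\mathrm{d}t.\tag{2.10}$$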

Since the leading term $L_{a}L_{a}u_{h}^{({\varepsilon})}$ is a sum of terms of the form $\phi_{1}(\partial_{t}^{m}\phi_{2})(D_{x}^{k}\phi_{3})$ with $\phi_{i}\in\{b,\sigma\sigma^{T},u_{h}^{({\varepsilon})}\}$ and $2m+k\leq 4$, by (2.3) and (2.8),

$$(L_{a}u_{h}^{(\varepsilon)})(s,x)+f_{a}(s,x)\leq C\varepsilon^{-3}h.\tag{2.11}$$
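The passage from (2.10) to (2.11) can be unpacked as follows. This is only a sketch of the bookkeeping: $C$ denotes a generic constant that may change from line to line, the derivative bounds are those of (2.3) and (2.8) with exponent $1-2m-k$, and the restriction $0<\varepsilon\leq 1$ is an assumption added here so that the most singular power of $\varepsilon$ dominates.

```latex
\begin{align*}
\int_{0}^{h}\int_{0}^{t}\mathrm{d}r\,\mathrm{d}t &= \frac{h^{2}}{2},\\
\|L_{a}f_{a}\|_{\infty} &\leq C\varepsilon^{1-2} = C\varepsilon^{-1}
  && \text{(derivatives with } 2m+k\leq 2\text{)},\\
\|L_{a}L_{a}u_{h}^{(\varepsilon)}\|_{\infty} &\leq C\varepsilon^{1-4} = C\varepsilon^{-3}
  && \text{(derivatives with } 2m+k\leq 4\text{)},\\
(L_{a}u_{h}^{(\varepsilon)})(s,x)+f_{a}(s,x)
  &\leq \frac{1}{h}\big(C\varepsilon^{-1}+C\varepsilon^{-3}\big)\frac{h^{2}}{2}
  \leq C\varepsilon^{-3}h
  && \text{for } 0<\varepsilon\leq 1.
\end{align*}
```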

2) Upper bound on $\tilde{v}-v_{h}$ for $s\in[0,T-h)$ . Let $\alpha\in\mathcal{A}$ , $s\in[0,T-h]$ , and $x\in{\mathbb{R}}^{d}$ . By Itô’s formula and part 1),

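$$\mathbb{E}_{s,x}^{\alpha}\big[u_{h}^{(\varepsilon)}(T-h,\tilde{X}_{T-h-s})\big]\leq u_{h}^{(\varepsilon)}(s,x)-\mathbb{E}_{s,x}^{\alpha}\Big[\int_{0}^{T-s}f_{\alpha_{t}}(s+t,\tilde{X}_{t})\,\mathrm{d}t\Big]+TC\varepsilon^{-3}h.$$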

From (2.7) in Proposition 2.3 and the first part of Proposition 2.2, it then follows that

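$$\mathbb{E}_{s,x}^{\alpha}\big[u_{h}(T-h,\tilde{X}_{T-h-s})\big]\leq v_{h}(s,x)-\mathbb{E}_{s,x}^{\alpha}\Big[\int_{0}^{T-s}f_{\alpha_{t}}(s+t,\tilde{X}_{t})\,\mathrm{d}t\Big]+C(\varepsilon+\varepsilon^{-3}h),$$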

for a generic constant $C$ . Since by definition (2.4) and the regularity of $u_{h}$ (Proposition 2.2),

[TABLE]

we conclude that

[TABLE]

Since $\alpha\in\mathcal{A}$ was arbitrary, by the definition of $\tilde{v}$ (see just before Proposition 2.1),

[TABLE]

3) Upper bound on $\tilde{v}-v_{h}$ for $s\in[T-h,T]$ . By the definition of $\tilde{J}^{\alpha}$ (see just before Proposition 2.1), Itô’s formula, the regularity of $f$ and $g$ , and using (2.3), there is a constant $C>0$ such that for every $\alpha\in\mathcal{A}$ and $s\in[T-h,T]$ ,

[TABLE]

Then it follows from the definitions of $\tilde{v}$ and $v_{h}$ that

[TABLE]

and hence also $|\tilde{v}(s,x)-v_{h}(s,x)|\leq 2C{\varepsilon}^{-1}h$ for $s\in[T-h,T]$ .

4) Conclusion: Using Proposition 2.1 and parts 2) and 3), we have that

[TABLE]

for $s\in[0,T]$ and $x\in{\mathbb{R}}^{d}$. Taking ${\varepsilon}=h^{1/4}$ then concludes the proof of the right-hand inequality in (2.1). The left-hand inequality is immediate since $\mathcal{A}_{h}\subseteq\mathcal{A}$.
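The choice ${\varepsilon}=h^{1/4}$ can be motivated by balancing the two competing error terms. The short computation below assumes that the combined bound has the form $C({\varepsilon}+{\varepsilon}^{-3}h)$, as in part 2), together with the term ${\varepsilon}^{-1}h$ from part 3), and that $0<h\leq 1$.

```latex
\begin{align*}
\varepsilon = \varepsilon^{-3}h
  \;&\Longleftrightarrow\; \varepsilon^{4}=h
  \;\Longleftrightarrow\; \varepsilon=h^{1/4},\\
\big(\varepsilon+\varepsilon^{-3}h\big)\Big|_{\varepsilon=h^{1/4}}
  &= h^{1/4}+h^{-3/4}\,h = 2h^{1/4},\\
\varepsilon^{-1}h\,\Big|_{\varepsilon=h^{1/4}}
  &= h^{3/4}\leq h^{1/4}\quad\text{for } 0<h\leq 1.
\end{align*}
```

A smaller ${\varepsilon}$ makes the regularisation error smaller but inflates the truncation term ${\varepsilon}^{-3}h$, and a larger ${\varepsilon}$ does the opposite, so under these assumptions the balanced choice gives the best power of $h$.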

2.3. The maximal rate and comparison with [8]

If the data and value functions are smooth enough, we can adapt the proof of Theorem 2.1 to obtain the maximal rate of the approximation, which is 1. More specifically, if we assume $v_{h}$ and $f$ sufficiently smooth, we have in (2.10) $\sup_{a\in A}(\|L_{a}(L_{a}u_{h}^{({\varepsilon})})\|_{\infty}+\|L_{a}f\|_{\infty}){\leq C}<\infty$ with $C$ independent of ${\varepsilon}$ . Therefore, instead of (2.11), the conclusion of step 1) in the previous proof gives

[TABLE]

for some constant $C$ independent of $a\in A$ and ${\varepsilon}$ . Moreover, if we assume that $b$ , $\sigma$ and $f$ are Lipschitz in $t$ uniformly in $x$ and $a$ , and $g$ belongs to $C_{b}^{2}({\mathbb{R}}^{d})$ , then by standard results $u_{h}$ will be Lipschitz in $t$ . Hence, we find in step 2) that

[TABLE]

Sending ${\varepsilon}$ to zero then gives that $\tilde{v}$ converges to $v$ , and we have the following result:

Proposition 2.4.

In addition to assumptions (H1)–(H3), let $b,\sigma$ and $f$ be Lipschitz continuous in $t$ uniformly with respect to $x$ and $a$, and $g\in C^{2}_{b}({\mathbb{R}}^{d})$. If $\sup_{a\in A}(\|L_{a}(L_{a}v_{h})\|_{\infty}+\|L_{a}f\|_{\infty})<\infty$, then there exists $C>0$ such that for any $s\in[0,T]$, $x\in{\mathbb{R}}^{d}$, and $h>0$, we have

[TABLE]

This is the maximal rate that this approximation can reach. The reason is that the order obtained by applying Itô twice in step 1) of the proof cannot be improved. This can easily be checked by repeatedly applying Itô to obtain higher order error expansions and then noting that all such expansions contain terms of order $h$ .

Step 1) of the proof also explains why Krylov in [8] got a less sharp result than ours. After one application of Itô, he used the moment bound ${\mathbb{E}}[|x-X_{r}|]\leq\sqrt{{\mathbb{E}}[|x-X_{r}|^{2}]}\leq C\sqrt{r}$ to get

[TABLE]

This estimate requires only three derivatives in space of $u_{h}^{({\varepsilon})}$ but gives the lower rate 1/2. The conclusion of step 1) of the proof then becomes

[TABLE]

Completing the proof as in Section 2.2 then gives

[TABLE]

and optimizing with respect to ${\varepsilon}$ shows that $v(s,x)-v_{h}(s,x)\leq Ch^{1/6}$ . Note that there is no need for regularization of the coefficients and data since Itô is applied only once. In the case of smooth enough solutions, this approach cannot give a higher rate than $1/2$ .
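The three convergence orders discussed in this paper come from the same balancing argument applied to different pairs of error terms. The following Python sketch computes the resulting exponent of $h$ exactly. The helper `balanced_rate` and its parameters `a`, `p`, `q` are hypothetical names introduced here, and the form ${\varepsilon}+{\varepsilon}^{-2}h^{1/2}$ used for Krylov's argument is inferred from the three-derivative estimate and the rate $1/2$ described above rather than quoted from [8].

```python
from fractions import Fraction


def balanced_rate(a, p, q):
    """Exponent of h obtained by balancing eps**a against eps**(-p) * h**q.

    Equating the two terms gives eps = h**(q / (a + p)), so both terms
    are then of order h**(a * q / (a + p)).
    """
    a, p, q = Fraction(a), Fraction(p), Fraction(q)
    return a * q / (a + p)


rates = {
    "this paper: eps + eps^-3 * h": balanced_rate(1, 3, 1),
    "Krylov [8]: eps + eps^-2 * h^(1/2)": balanced_rate(1, 2, Fraction(1, 2)),
    "switching system [3]: eps^(1/3) + eps^-3 * h": balanced_rate(Fraction(1, 3), 3, 1),
}
for label, rate in rates.items():
    print(f"{label}: order {rate}")
```

The computed exponents are $1/4$, $1/6$ and $1/10$, matching the orders stated for Theorem 2.1, for [8], and for [3] respectively.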

3. Consequences on finite difference approximations

In this section, we outline the impact of the improved error bound for the control approximation on the achievable convergence order for numerical schemes, either by directly substituting the improved order (Section 3.1) or by applying adaptations of the steps here using higher order estimates (Section 3.2).

3.1. Improvement to Theorem 1.11 in [9]

Using the new bound for the control approximation from Section 2, one easily obtains a sharpening of the order from $1/39$ in [9, Theorem 1.11] and $1/21$ in [8, Theorem 5.4] to $1/15$ , which holds for local, monotone schemes of consistency order $1/2$ . Indeed, using Theorem 2.1 instead of [8, Theorem 2.3], the bound in the second inequality in the proof of [8, Theorem 5.4] (on top of page 14 in [8]) becomes

[TABLE]

where $\delta>0$ is the time discretization step used in [8] for the approximation scheme for the value function, $n$ the number of time intervals over which the policy is constant, and $v_{\delta,1/n}$ is the obtained approximation of $v$. (Footnote 2: Note that in Section 5 of [8], our $\delta$ above is denoted by $h^{2}$. We introduce $\delta$ to avoid ambiguity with the parameter $h$ used in the previous sections of this paper, corresponding to $h=1/n$ in the present section.)

Optimizing with respect to $\delta$ gives $n\sim\delta^{-4/15}$ and an estimate of order $1/15$ in $\delta$ .

Assuming order 1 consistency of the scheme used instead of order 1/2 as in [9, Theorem 1.11] and [8, Theorem 5.4], in conjunction with [9, Lemma 3.2], one gets

[TABLE]

and the rate improves further to $1/10$ .

3.2. Improvement to Theorem 5.7 in [8]

For a wide class of numerical schemes, similar modifications as those used to prove Theorem 2.1 can be performed to improve the error estimates given in [8, Theorem 5.7]. Following as much as possible the notation in [8], let us define for any $s\geq 0$ , $x\in{\mathbb{R}}^{d}$ , $a\in A$ the random variable

[TABLE]

where $\zeta$ is an ${\mathbb{R}}^{p}$ -valued random variable such that

[TABLE]

It is easy to check, by Taylor expansion, that for any smooth function $\phi$ the estimate in [8, Lemma 5.10] for the truncation error of the generator becomes

[TABLE]

for a constant $C$ depending only on $C_{0}$ and $C_{1}$ in assumptions (H2)–(H3) and the bounds on the derivatives $\partial^{m}_{t}D^{k}_{x}\phi$ for $2m+k\leq 4$.

Observe that conditions (3.1) are slightly stronger than (5.4) in [8], who only assume accuracy of the moments to order $h^{{3/2}}$ instead of $h^{2}$ in (3.1), so that only order $1/2$ consistency results instead of order 1 above. However, the higher order assumptions are satisfied by very common schemes such as the classical semi-Lagrangian scheme [4, 5] corresponding to the choice

[TABLE]

The scheme considered in [8] is then recursively defined, for any $x\in{\mathbb{R}}^{d}$ , by

[TABLE]

Proceeding to a perturbation and regularization of $\hat{v}_{h}$ as in [8] (the notation follows the one in Section 2.2, i.e. $\hat{u}_{h}^{({\varepsilon})}$ is the mollification of $\hat{u}_{h}$ , the solution of the scheme with perturbed “shaken” coefficients) we get the inequality

[TABLE]

in $[0,T-h]\times{\mathbb{R}}^{d}$ for some constant $C$ depending only on $C_{0},C_{1}$ in assumptions (H2) and (H3). Arguing as in the proof of Theorem 2.1, one obtains

[TABLE]

Similarly, an upper bound of order $1/4$ for $v-\hat{v}_{h}$ can be obtained. This aligns the bounds for the scheme (3.2) with those obtained in [5] by PDE techniques.

4. Discussion and conclusions

In this short paper, we show a convergence rate of $1/4$ for piecewise constant control approximations to value functions of stochastic optimal control problems. This result is robust and holds for degenerate problems with non-smooth, merely Lipschitz continuous value functions. If the data and value function are smoother, we show that the approximation has rate 1 and explain why this is the maximal rate.

Our rate 1/4 in (2.1) improves both the order 1/6 in [8] and the rate 1/10 achieved in [3] by different (PDE) techniques. We also carefully explain why we can improve the result in [8]. It is an interesting open question if the same rate could be obtained purely by PDE techniques.

This work also opens up the possibility of improving the error estimates for other approximation schemes as outlined in Section 3. Moreover, it enables a purely probabilistic error analysis for semi-Lagrangian schemes for HJB equations with results that are in line with the best available results by PDE methods. We refer to [12] for the details.

Bibliography

[1] G. Barles and E.R. Jakobsen. On the convergence rate of approximation schemes for Hamilton-Jacobi-Bellman equations. M2AN Math. Model. Numer. Anal., 36:33–54, 2002.
[2] G. Barles and E.R. Jakobsen. Error bounds for monotone approximation schemes for Hamilton-Jacobi-Bellman equations. SIAM J. Numer. Anal., 43(2):540–558, 2005.
[3] G. Barles and E.R. Jakobsen. Error bounds for monotone approximation schemes for parabolic Hamilton-Jacobi-Bellman equations. Math. Comput., 74(260):1861–1893, 2007.
[4] F. Camilli and M. Falcone. An approximation scheme for the optimal control of diffusion processes. RAIRO Modél. Math. Anal. Numér., 29(1):97–122, 1995.
[5] K. Debrabant and E.R. Jakobsen. Semi-Lagrangian schemes for linear and fully non-linear diffusion equations. Math. Comp., 82(283):1433–1462, 2012.
[6] R. Dumitrescu, C. Reisinger, and Y. Zhang. Approximation schemes for mixed optimal stopping and control problems with nonlinear expectations and jumps. arXiv preprint arXiv:1803.03794, 2018.
[7] I. Kossaczkỳ, M. Ehrhardt, and M. Günther. Modifications of the PCPT method for HJB equations. In AIP Conference Proceedings, volume 1773, page 030002, 2016.
[8] N.V. Krylov. Approximating value functions for controlled degenerate diffusion processes by using piece-wise constant policies. Electron. J. Probab., 4(2):1–19, 1999.
