Improved order 1/4 convergence for piecewise constant policy approximation of stochastic control problems
Espen R. Jakobsen, Athena Picarelli, Christoph Reisinger

Espen R. Jakobsen, Department of Mathematical Sciences, Norwegian University of Science and Technology, 7491 Trondheim, Norway
Athena Picarelli, Department of Economics, University of Verona, via Cantarane 24, 37129 Verona, Italy
Christoph Reisinger, Mathematical Institute, University of Oxford, Andrew Wiles Building, OX2 6GG, Oxford, UK
Abstract.
In N. V. Krylov, Approximating value functions for controlled degenerate diffusion processes by using piece-wise constant policies, Electron. J. Probab., 4(2), 1999, it is proved under standard assumptions that the value functions of controlled diffusion processes can be approximated with order 1/6 error by those with controls which are constant on uniform time intervals. In this note we refine the proof and show that the provable rate can be improved to 1/4, which is optimal in our setting. Moreover, we demonstrate the improvements this implies for error estimates derived by similar techniques for approximation schemes, bringing these in line with the best available results from the PDE literature.
1. Introduction
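In this paper we derive improved error estimates for approximations of value functions of stochastic optimal control problems. Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, \mathbb{P})$ be a complete filtered probability space, $W$ a $d$-dimensional $(\mathcal{F}_t)$-Wiener process on $(\Omega, \mathcal{F}, \mathbb{P})$, and $\mathcal{A}$ the set of progressively measurable processes with values in a set $A$. For any $t\in[0,T]$, $x\in\mathbb{R}^N$, $\alpha\in\mathcal{A}$ (with $T>0$), let $X = X^{t,x,\alpha}$ be the (controlled) Itô diffusion which satisfies
$$dX_s = b^{\alpha_s}(t+s, X_s)\,ds + \sigma^{\alpha_s}(t+s, X_s)\,dW_s, \quad s\in[0,T-t], \qquad X_0 = x. \tag{1.1}$$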
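Here we use the notation $\ell^a(t,x) := \ell(t,x,a)$ for any $a\in A$ and function $\ell$. For a given terminal cost function $g$ and running cost $f$, the optimal control problem consists of maximizing over $\alpha\in\mathcal{A}$ the expected total cost
$$J(t,x,\alpha) = \mathbb{E}^{\alpha}_{t,x}\Big[\int_0^{T-t} f^{\alpha_s}(t+s, X_s)\,ds + g(X_{T-t})\Big]. \tag{1.2}$$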
The indices on the expectation indicate that the law of the process depends on the starting point and control. Finally, the value function of the optimal control problem is defined by
$$v(t,x) = \sup_{\alpha\in\mathcal{A}} J(t,x,\alpha). \tag{1.3}$$
We consider the following set of assumptions:
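- (H1)
$A$ is a compact set;
- (H2)
$b$ and $\sigma$ are continuous functions. For $\phi = b, \sigma$, there exists $K\ge 0$ such that for every $a\in A$:
$$|\phi^a(t,x)| \le K, \qquad |\phi^a(t,x) - \phi^a(s,y)| \le K\big(|x-y| + |t-s|^{1/2}\big) \quad \text{for all } t,s\in[0,T],\ x,y\in\mathbb{R}^N;$$
- (H3)
$f$ and $g$ are continuous functions. There exists $K\ge 0$ such that for every $a\in A$:
$$|f^a(t,x)| + |g(x)| \le K, \qquad |f^a(t,x) - f^a(s,y)| + |g(x)-g(y)| \le K\big(|x-y| + |t-s|^{1/2}\big) \quad \text{for all } t,s\in[0,T],\ x,y\in\mathbb{R}^N.$$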
Observe that under assumptions (H1), (H2), and for any $\alpha\in\mathcal{A}$, there exists a unique strong solution of equation (1.1). For simplicity, we assume data and coefficients to be Lipschitz continuous in space and $\frac{1}{2}$-Hölder continuous in time, and have included no discount factor, but it is not difficult to extend our results to include discounting and a lower Hölder regularity of the data.
We aim to estimate the error introduced by approximating the set of measurable controls by piecewise constant controls. Let $h>0$ be the discretization parameter and $\mathcal{A}_h$ the subset of $\mathcal{A}$ of processes which are constant in the intervals $[kh,(k+1)h)$ for $k = 0, 1, 2, \dots$.¹ The value function associated with this restricted set of controls is defined by
$$v_h(t,x) = \sup_{\alpha\in\mathcal{A}_h} J(t,x,\alpha). \tag{1.4}$$
¹ Note that in [8] the length of the intervals is $h^2$; however, in the absence of further discretisations, we use $h$ for simplicity.
Note that the definition of $v_h$ in (1.4) under the “shifted” dynamics in (1.2) and (1.1) implies that the control discretisation always starts at the initial time $t$. This will be important for establishing a dynamic programming principle. This is not, though, how one would compute $v_h$ in practice, as discussed in the penultimate paragraph of this section.
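As an illustration of (1.2) and (1.4), the following minimal Monte Carlo sketch estimates $J(t,x,\alpha)$ for one policy in $\mathcal{A}_h$ in one space dimension. It assumes the shifted dynamics (1.1), so coefficients are evaluated at $t+s$ and the control grid starts at $t$. The control on each interval comes from a hypothetical feedback rule `policy(k, X)` applied to the state at the start of the interval, and the SDE is simulated with an Euler-Maruyama scheme, an extra discretisation that is not part of the paper's setting. All function and parameter names (`n_sub`, `n_paths`, `seed`) are illustrative.

```python
import math
import numpy as np


def cost_piecewise_constant(t, x, T, h, b, sigma, f, g, policy,
                            n_sub=20, n_paths=100_000, seed=0):
    """Monte Carlo estimate of J(t, x, alpha) in 1-D for a policy that is
    frozen on [k h, (k + 1) h) in shifted time s, i.e. on [t + k h, t + (k + 1) h).

    policy(k, X) returns the control used on interval k for every path, based
    on the state X at the start of that interval (a hypothetical feedback rule).
    The SDE is simulated with n_sub Euler-Maruyama substeps per interval."""
    rng = np.random.default_rng(seed)
    horizon = T - t
    n_int = math.ceil(horizon / h - 1e-12)      # number of control intervals
    X = np.full(n_paths, float(x))
    running = np.zeros(n_paths)
    s = 0.0                                     # shifted time, runs over [0, T - t]
    for k in range(n_int):
        a = policy(k, X)                        # control frozen on interval k
        length = min(h, horizon - k * h)        # the last interval may be shorter
        dt = length / n_sub
        for _ in range(n_sub):
            running += f(t + s, X, a) * dt      # left-point rule for the running cost
            dW = rng.standard_normal(n_paths) * math.sqrt(dt)
            X = X + b(t + s, X, a) * dt + sigma(t + s, X, a) * dW
            s += dt
    return float(np.mean(running + g(X)))


# Usage: b = f = 0, sigma^a = a, g(x) = x^2 and the constant control a = 1.
val = cost_piecewise_constant(
    0.0, 0.0, 1.0, 0.25,
    b=lambda s, x, a: 0.0 * x,
    sigma=lambda s, x, a: a + 0.0 * x,
    f=lambda s, x, a: 0.0 * x,
    g=lambda x: x ** 2,
    policy=lambda k, X: 1.0,
)
```

In this usage example the control plays no role, so the estimate should be close to $T-t = 1$ for any $h$; a state-dependent rule such as `policy=lambda k, X: np.where(X > 0, 0.5, 1.0)` exercises the piecewise constant switching.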
From a probabilistic perspective, it is clear that 0 is a lower bound for $v - v_h$ since $\mathcal{A}_h\subset\mathcal{A}$. Under our assumptions, an upper bound on $v - v_h$ of order $h^{1/6}$ is given in [8].
An indication that the order 1/6 from [8] might be improved is the fact that, under the same regularity assumptions as above, it is shown in [5] that a fully discrete semi-Lagrangian scheme applied to the corresponding HJB equation has order 1/4 in the timestep for an Euler approximation. This scheme does not distinguish between constant and other controls over individual timesteps. It would therefore be somewhat surprising if this scheme, which employs further approximations, were closer to the original problem than the approximation which only holds the policies constant over timesteps.
A slightly different angle on the problem is provided in [3], where the authors construct from (1.4) a subsolution to the HJB equation corresponding to (1.3) by a second order local expansion in $h$. This results in an order 1 error bound in the case of smooth solutions, in contrast to the order 1/2 which would be obtained in the smooth case by the method in [8] (see also Section 2.3 below). However, in the general non-regular case, the order in [3] is limited by two further error terms: that of a switching system approximation, which depends on the chosen switching cost, and that of the regularised system, which depends on a regularisation parameter. Optimising over these parameters results in an order 1/10 error.
In this paper, we combine the advantages of both methods to obtain order 1/4. The reason we can improve the error estimates of Krylov is that we use a higher order expansion when we derive the truncation error. Our discussion (see Subsection 2.3) also shows that no further improvement can be obtained in this way: our new proof uses the maximal possible order of the truncation error.
Piecewise constant policy time stepping has been used in a numerical method for solving Hamilton-Jacobi-Bellman equations in [13], where the computational advantage comes from the fact that over the time intervals in which the policy is constant, only linear PDEs have to be solved. This has been extended to mixed optimal stopping and control problems with nonlinear expectations and jumps in [6]. A further benefit lies in the inherent parallelism so that the linear problems with different controls can be solved on parallel processors. A proof of convergence is given in these works using pure viscosity solution arguments, but no rate of convergence is provided. Early results on this type of approximation can be found in [10] and an extension with “predicted” controls is proposed in [7].
In the remainder of this article, we give in Section 2 a proof of the order 1/4 convergence of the piecewise constant policy approximation, and deduce the linear convergence in the case of sufficiently regular solutions and data. We then outline in Section 3 the improved orders which can be derived for approximation schemes by similar techniques.
2. Main result
We begin by stating the main result. Throughout the entire section we work under assumptions (H1)–(H3).
Theorem 2.1.
For any $h\in(0,1]$, $t\in[0,T]$, and $x\in\mathbb{R}^N$, we have
$$0 \le v(t,x) - v_h(t,x) \le C\,h^{1/4}, \tag{2.1}$$
where the constant $C$ only depends on the constants in Assumptions (H2) and (H3).
A major difficulty in the proof of Theorem 2.1 is the fact that typically $v$ and $v_h$ are not smooth. Even in the non-degenerate case, where $v$ is smooth, $v_h$ is still not smooth in general. A simple example is the Black-Scholes-Barenblatt equation resulting from an uncertain volatility model (see [11]). Here, the control is of bang-bang type and the optimal control problem for piecewise constant policies reduces to taking the maximum of two smooth functions at the end of each time interval, so that, for $t$ on the time mesh, $v_h(t,\cdot)$ will only be Lipschitz (in the spatial argument).
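The loss of smoothness can be seen on a toy problem. This is a hypothetical simplification, not the Black-Scholes-Barenblatt model itself: take $dX = a\,dW$ with the two-point control set $A=\{a_{lo}, a_{hi}\}$, no running cost and terminal cost $g(x)=e^{-x^2}$. On the last interval, $v_h(T-h,x)=\max_a u_a(x)$ with $u_a(x)=\mathbb{E}[g(x+a\sqrt{h}Z)]=(1+2a^2h)^{-1/2}e^{-x^2/(1+2a^2h)}$, which is smooth for each $a$, while the maximum has a kink where the two branches cross.

```python
import math


def u(x, a, h):
    """E[g(x + a sqrt(h) Z)] for g(x) = exp(-x^2), Z ~ N(0, 1), in closed form."""
    c = 1.0 + 2.0 * a * a * h
    return math.exp(-x * x / c) / math.sqrt(c)


def du(x, a, h):
    """Derivative of u with respect to x."""
    c = 1.0 + 2.0 * a * a * h
    return -2.0 * x / c * u(x, a, h)


def v_last_interval(x, a_lo, a_hi, h):
    """v_h(T - h, x) for the two-point control set {a_lo, a_hi}: a pointwise max."""
    return max(u(x, a_lo, h), u(x, a_hi, h))


def crossing_point(a_lo, a_hi, h):
    """Positive x* with u(x*, a_lo) = u(x*, a_hi), where the maximum has a kink."""
    c_lo = 1.0 + 2.0 * a_lo ** 2 * h
    c_hi = 1.0 + 2.0 * a_hi ** 2 * h
    return math.sqrt(0.5 * math.log(c_hi / c_lo) / (1.0 / c_lo - 1.0 / c_hi))


a_lo, a_hi, h = 0.2, 1.0, 0.5
xs = crossing_point(a_lo, a_hi, h)
# Left of xs the low control is optimal, right of xs the high control is.
jump = du(xs, a_hi, h) - du(xs, a_lo, h)   # > 0: a convex kink in v_h(T - h, .)
```

Here the low control wins near $x=0$ and the high control for large $|x|$; since the one-sided derivatives differ at the crossing point, $v_h(T-h,\cdot)$ is Lipschitz but not $C^1$ there.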
Since the proof of Theorem 2.1 relies on repeated use of the Itô formula, we need to work with smooth functions, both for the coefficients and for the value functions $v$ and $v_h$. This means that we need to introduce several regularization arguments and use Krylov’s method of shaking the coefficients.
2.1. Background results and regularisation
In this section, we introduce Krylov’s regularization and give related preliminary results. Some of the proofs are given in [8] and are not repeated here; see also [1, 2] for analogous results proved with PDE arguments. In order to apply Itô’s formula twice, $b$, $\sigma$ and $f$ must be regularized. Let $\varepsilon>0$ and the mollifier $\rho_\varepsilon$ be defined as
[TABLE]
where
[TABLE]
For any function $\phi$, we define $\phi^\varepsilon$ to be the mollification of a suitable extension of $\phi$ to
[TABLE]
We can always take an extension which preserves the Hölder continuity in time and the Lipschitz continuity in space of $\phi$. Then standard estimates for mollifiers imply that
[TABLE]
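The standard mollifier estimates can be seen in a one-dimensional sketch. To keep everything explicit, this swaps the compactly supported mollifier for a Gaussian kernel of width $\varepsilon$ acting in $x$ only (an assumption, not the paper's $\rho_\varepsilon$) and mollifies the Lipschitz function $\phi(x)=|x|$. Then $\phi^\varepsilon(x)=\mathbb{E}|x+\varepsilon Z|$ has a closed form, with $\sup|\phi^\varepsilon-\phi| = \varepsilon\sqrt{2/\pi}$ and $\sup|D^2\phi^\varepsilon| = \sqrt{2/\pi}/\varepsilon$, which illustrates the typical pattern of an $O(\varepsilon)$ approximation error and $k$-th derivatives of size $\varepsilon^{1-k}$.

```python
import math


def phi_eps(x, eps):
    """Gaussian mollification of phi(x) = |x|: E|x + eps Z|, Z ~ N(0, 1)."""
    y = x / eps
    cdf = 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
    return x * (2.0 * cdf - 1.0) + 2.0 * eps * pdf


def phi_eps_xx(x, eps):
    """Second derivative of phi_eps: (2 / eps) times the N(0, 1) density at x / eps."""
    y = x / eps
    return 2.0 * math.exp(-0.5 * y * y) / (eps * math.sqrt(2.0 * math.pi))


for eps in (0.1, 0.01, 0.001):
    sup_err = phi_eps(0.0, eps)      # sup |phi_eps - phi| is attained at x = 0
    sup_d2 = phi_eps_xx(0.0, eps)    # sup |phi_eps''| is attained at x = 0
    # Both rescaled quantities equal sqrt(2 / pi) for every eps.
    print(eps, sup_err / eps, sup_d2 * eps)
```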
Let be the solution of (1.1) with coefficients replaced by and . Then we denote by and the solution and cost function of the optimal control problem (1.1)–(1.3) where is replaced by and by .
Proposition 2.1.
There exists such that for any
[TABLE]
Proof.
The result follows from the definitions of and since by standard continuous dependence results for SDEs and Lipschitz and Hölder continuity of ,
[TABLE]
for some constant $C$ independent of the control $\alpha$. ∎
To avoid heavy notation, we will use $b$, $\sigma$ and $f$ instead of $b^\varepsilon$, $\sigma^\varepsilon$ and $f^\varepsilon$ in the rest of the paper, keeping in mind estimates (2.3) for their derivatives. We now proceed with the regularisation of the value function $v_h$. Let be the set of progressively measurable processes with values in (where denotes the ball of radius $\varepsilon$ in ) which are constant in each time interval $[kh,(k+1)h)$. Letting , we define for any the following “perturbed” value function
[TABLE]
where is the solution of the following SDE with (mollified and) “shaken coefficients”:
[TABLE]
Proposition 2.2.
There exists a constant such that
[TABLE]
for any , and
[TABLE]
for any and . Moreover, for any , satisfies the following dynamic programming principle (DPP):
[TABLE]
Proof.
These are standard results. The first two inequalities can be found e.g. in [8, Corollary 3.2], while (2.6) is a consequence of [8, Lemma 3.3]. ∎
Following the notation introduced above we consider the regularised (mollified) function .
Proposition 2.3.
The function belongs to . There exists a constant such that
[TABLE]
for , and
[TABLE]
for any . Moreover, satisfies the following super-dynamic programming principle
[TABLE]
for any , , .
Proof.
The first part follows from Proposition 2.2 and (2.3), while (2.9) follows by the definitions of , , , and the inequality . See [8, bottom of page 9] for more details. Here constant over by a slight abuse of notation. ∎
2.2. Proof of Theorem 2.1
1) Upper bound on . By two applications of the Itô (or Dynkin) formula,
[TABLE]
for , , , where the generator of the diffusion process is defined as
[TABLE]
Inserting this equality into the dynamic programming inequality (2.9) in Proposition 2.3, applying Itô once to the $f$-term, and dividing by $h$, we find that
[TABLE]
Since the leading term is a sum of terms of the form with and , by (2.3) and (2.8),
[TABLE]
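The size of the remainder in step 1) can be illustrated in a simple setting. After applying Itô twice, what remains is $h$ times fourth-order derivatives of the regularised function, and these are of size $\varepsilon^{-3}$ for a Lipschitz function mollified at scale $\varepsilon$; reading the lost display as a bound of order $h\varepsilon^{-3}$ is an assumption here. The sketch uses $dX=\sigma\,dW$ (so $\mathcal{L}=\frac{\sigma^2}{2}\partial_{xx}$) and the hypothetical test function $w(x)=\varepsilon\cos(x/\varepsilon)$, which is Lipschitz with $k$-th derivatives of size $\varepsilon^{1-k}$ and has the explicit expectation $\mathbb{E}[w(x+\sigma W_h)] = w(x)e^{-\sigma^2 h/(2\varepsilon^2)}$. The one-step residual then matches $\frac{h}{2}\mathcal{L}^2 w = \frac{\sigma^4}{8}\cos(x/\varepsilon)\,h\varepsilon^{-3}$ to leading order.

```python
import math


def one_step_residual(x, h, eps, sigma=1.0):
    """(E[w(x + sigma W_h)] - w(x)) / h - L w(x) for w(x) = eps cos(x / eps) and
    L = (sigma^2 / 2) d^2/dx^2, using E[cos(y + c Z)] = cos(y) exp(-c^2 / 2)."""
    a = sigma ** 2 * h / (2.0 * eps ** 2)
    # eps cos(x/eps) (exp(-a) - 1) / h + sigma^2 cos(x/eps) / (2 eps)
    # = eps cos(x/eps) (exp(-a) - 1 + a) / h; expm1 limits cancellation.
    return eps * math.cos(x / eps) * (math.expm1(-a) + a) / h


def predicted(x, h, eps, sigma=1.0):
    """Leading term (h / 2) L^2 w(x) = sigma^4 cos(x / eps) h / (8 eps^3)."""
    return sigma ** 4 * math.cos(x / eps) * h / (8.0 * eps ** 3)


for h in (1e-4, 1e-5, 1e-6):
    print(h, one_step_residual(0.0, h, 0.1) / predicted(0.0, h, 0.1))  # close to 1
```

Halving $\varepsilon$ at fixed small $h$ multiplies the residual by roughly 8, which is the $\varepsilon^{-3}$ factor.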
2) Upper bound on for . Let , , and . By Itô’s formula and part 1),
[TABLE]
From (2.7) in Proposition 2.3 and the first part of Proposition 2.2, it then follows that
[TABLE]
for a generic constant . Since by definition (2.4) and the regularity of (Proposition 2.2),
[TABLE]
we conclude that
[TABLE]
Since was arbitrary, by the definition of (see just before Proposition 2.1),
[TABLE]
3) Upper bound on for . By the definition of (see just before Proposition 2.1), Itô’s formula, the regularity of and , and using (2.3), there is a constant such that for every and ,
[TABLE]
Then it follows from the definitions of and that
[TABLE]
and hence also for .
4) Conclusion: Using Proposition 2.1 and parts 2) and 3), we have that
[TABLE]
for $h, \varepsilon\in(0,1]$. Taking $\varepsilon = h^{1/4}$ then concludes the proof of the right-hand inequality in (2.1). The left-hand inequality is immediate since $\mathcal{A}_h\subset\mathcal{A}$.
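Step 4) balances the regularisation error against the truncation error from step 1). Assuming the resulting bound has the form $C(\varepsilon + h\varepsilon^{-3})$, as the scaling in step 1) suggests, the minimiser is $\varepsilon^* = (3h)^{1/4}$ and the bound is of order $h^{1/4}$; the choice $\varepsilon = h^{1/4}$ differs only by a constant. This sketch minimises the generic bound $\varepsilon + h^p\varepsilon^{-q}$ in closed form and reads off the order $p/(q+1)$ numerically.

```python
import math


def optimal_eps(h, p, q):
    """Minimiser of eps + h^p eps^(-q) over eps > 0: eps* = (q h^p)^(1/(q+1))."""
    return (q * h ** p) ** (1.0 / (q + 1))


def min_bound(h, p, q):
    """Value of eps + h^p eps^(-q) at eps*, equal to (1 + 1/q) eps*."""
    eps = optimal_eps(h, p, q)
    return eps + h ** p * eps ** (-q)


def observed_order(p, q, h1=1e-4, h2=1e-8):
    """Slope of log(min_bound) against log(h); analytically p / (q + 1)."""
    return math.log(min_bound(h2, p, q) / min_bound(h1, p, q)) / math.log(h2 / h1)


print(observed_order(1, 3))      # two Ito steps: h * eps^-3, order 1/4
print(observed_order(0.5, 2))    # one Ito step: h^(1/2) * eps^-2, order 1/6
```

The second call assumes a one-Itô bound of the form $\varepsilon + h^{1/2}\varepsilon^{-2}$, as discussed in Section 2.3, and recovers Krylov's order 1/6.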
2.3. The maximal rate and comparison with [8]
If the data and value functions are smooth enough, we can adapt the proof of Theorem 2.1 to obtain the maximal rate of the approximation, which is 1. More specifically, if we assume and sufficiently smooth, we have in (2.10) with independent of . Therefore, instead of (2.11), the conclusion of step 1) in the previous proof gives
[TABLE]
for some constant independent of and . Moreover, if we assume that , and are Lipschitz in uniformly in and , and belongs to , then by standard results will be Lipschitz in . Hence, we find in step 2) that
[TABLE]
Sending to zero then gives that converges to , and we have the following result:
Proposition 2.4.
Additionally to assumptions (H1)-(H3), let and be Lipschitz continuous in uniformly with respect to and , and . If , then there exists such that for any , , and , we have
$$0 \le v(t,x) - v_h(t,x) \le C\,h.$$
This is the maximal rate that this approximation can reach. The reason is that the order $h$ obtained by applying Itô twice in step 1) of the proof cannot be improved. This can easily be checked by repeatedly applying Itô to obtain higher order error expansions and noting that all such expansions contain terms of order $h$.
Step 1) of the proof also explains why Krylov in [8] got a less sharp result than ours. After one application of Itô, he used the moment bound to get
[TABLE]
This estimate requires only three derivatives in space of the regularised value function but gives the lower rate 1/2. The conclusion of step 1) of the proof then becomes
[TABLE]
Completing the proof as in Section 2.2 then gives
[TABLE]
and optimizing with respect to $\varepsilon$ shows that $0 \le v - v_h \le C h^{1/6}$. Note that there is no need for regularization of the coefficients and data since Itô is applied only once. In the case of smooth enough solutions, this approach cannot give a higher rate than $h^{1/2}$.
3. Consequences on finite difference approximations
In this section, we outline the impact of the improved error bound for the control approximation on the achievable convergence order for numerical schemes, either by directly substituting the improved order (Section 3.1) or by applying adaptations of the steps here using higher order estimates (Section 3.2).
3.1. Improvement to Theorem 1.11 in [9]
Using the new bound for the control approximation from Section 2, one easily obtains a sharpening of the order from in [9, Theorem 1.11] and in [8, Theorem 5.4] to , which holds for local, monotone schemes of consistency order 1/2. Indeed, using Theorem 2.1 instead of [8, Theorem 2.3], the bound in the second inequality in the proof of [8, Theorem 5.4] (on top of page 14 in [8]) becomes
[TABLE]
where is the time discretization step used in [8] for the approximation scheme for the value function, the number of time intervals over which the policy is constant and is the obtained approximation of $v$.²
² Note that in Section 5 of [8], our above is denoted by . We introduce to avoid ambiguity with the parameter $h$ used in the previous sections of this paper (corresponding to in the present section).
Optimizing with respect to gives and an estimate of order in .
Assuming order 1 consistency of the scheme used instead of order 1/2 as in [9, Theorem 1.11] and [8, Theorem 5.4], in conjunction with [9, Lemma 3.2], one gets
[TABLE]
and the rate improves further to .
3.2. Improvement to Theorem 5.7 in [8]
For a wide class of numerical schemes, similar modifications as those used to prove Theorem 2.1 can be performed to improve the error estimates given in [8, Theorem 5.7]. Following as much as possible the notation in [8], let us define for any , , the random variable
[TABLE]
where is an -valued random variable such that
[TABLE]
It is easy to check, by Taylor expansion, that for any smooth function the estimate in [8, Lemma 5.10] for the truncation error of the generator becomes
[TABLE]
for a constant depending only on and in assumptions (H2)–(H3) and the bounds on the derivatives for .
Observe that conditions (3.1) are slightly stronger than (5.4) in [8], which only assumes accuracy of the moments to order instead of in (3.1), so that only order 1/2 consistency results instead of order 1 above. However, the higher order assumptions are satisfied by very common schemes such as the classical semi-Lagrangian scheme [4, 5] corresponding to the choice
[TABLE]
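In one space dimension, a common semi-Lagrangian choice (assumed here, in the spirit of [4, 5]) replaces the Gaussian increment $Z$ in $x + bh + \sigma\sqrt{h}Z$ by a symmetric two-point variable $\xi = \pm 1$. Moment conditions of this type are what the Taylor argument uses: if the moments of $\xi$ agree with those of $Z$ up to order three, the first mismatch enters through the fourth-order term, which is $O(h^2)$ per step and hence $O(h)$ after dividing by $h$. If only the first two moments matched, the third-order term would contribute $O(h^{3/2})$ per step, i.e. order 1/2. The exact check below confirms that for $\xi=\pm1$ the first mismatch occurs at the fourth moment.

```python
import math
from fractions import Fraction


def two_point_moment(k):
    """E[xi^k] for xi = +1 or -1 with probability 1/2 each."""
    return Fraction(1 + (-1) ** k, 2)


def gaussian_moment(k):
    """E[Z^k] for Z ~ N(0, 1): 0 for odd k and (k - 1)!! for even k."""
    if k % 2:
        return Fraction(0)
    return Fraction(math.prod(range(k - 1, 0, -2)))


first_mismatch = next(k for k in range(1, 9)
                      if two_point_moment(k) != gaussian_moment(k))
print(first_mismatch)   # 4: E[xi^4] = 1 while E[Z^4] = 3
```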
The scheme considered in [8] is then recursively defined, for any , by
[TABLE]
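A minimal 1-D sketch of a recursion of this type follows. It assumes the two-point choice $x \mapsto x + b^a h \pm \sigma^a\sqrt{h}$ with equal weights, a finite control set, a running cost frozen at the left endpoint of each step, and linear interpolation on a uniform grid via `np.interp`, which holds the end values constant outside the grid (a crude boundary treatment). All names are illustrative; this is not claimed to be the exact scheme of [8].

```python
import numpy as np


def semi_lagrangian(xgrid, T, n_steps, controls, b, sigma, f, g):
    """Backward recursion on a uniform grid, with h = T / n_steps and t_n = n h:
        V_N = g,
        V_n(x) = max_a [ h f(t_n, x, a)
                         + (V_{n+1}(x + b h + sigma sqrt(h))
                            + V_{n+1}(x + b h - sigma sqrt(h))) / 2 ].
    Off-grid values come from linear interpolation; np.interp holds the end
    values constant outside the grid (a crude boundary treatment)."""
    h = T / n_steps
    V = g(xgrid)
    for n in range(n_steps - 1, -1, -1):
        t_n = n * h
        candidates = []
        for a in controls:
            foot = xgrid + b(t_n, xgrid, a) * h
            spread = sigma(t_n, xgrid, a) * np.sqrt(h)
            up = np.interp(foot + spread, xgrid, V)
            down = np.interp(foot - spread, xgrid, V)
            candidates.append(h * f(t_n, xgrid, a) + 0.5 * (up + down))
        V = np.max(candidates, axis=0)     # pointwise maximum over the controls
    return V


def zero(t, x, a):
    return 0.0 * x


def vol(t, x, a):
    return a + 0.0 * x


xgrid = np.linspace(-6.0, 6.0, 1201)       # spacing 0.01; x = 0 is index 600
V0 = semi_lagrangian(xgrid, 1.0, 100, [1.0], zero, vol, zero, lambda x: x ** 2)
```

With a single control $\sigma\equiv1$, $b=f=0$ and $g(x)=x^2$ the exact value is $x^2+T$, and the sketch reproduces it at grid points away from the boundary, because the two-point step preserves the second moment exactly.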
Proceeding to a perturbation and regularization of as in [8] (the notation follows the one in Section 2.2, i.e. is the mollification of , the solution of the scheme with perturbed “shaken” coefficients) we get the inequality
[TABLE]
in for some constant depending only on in assumptions (H2) and (H3). Arguing as in the proof of Theorem 2.1, one obtains
[TABLE]
Similarly, an upper bound of order for can be obtained. This aligns the bounds for the scheme (3.2) with those obtained in [5] by PDE techniques.
4. Discussion and conclusions
In this short paper, we show a convergence rate of 1/4 for piecewise constant control approximations to value functions of stochastic optimal control problems. This result is robust and holds for degenerate problems with non-smooth, merely Lipschitz continuous value functions. If the data and value function are smoother, we show that the approximation has rate 1 and explain why this is the maximal rate.
Our rate 1/4 in (2.1) improves both the order 1/6 in [8] and the rate 1/10 achieved in [3] by different (PDE) techniques. We also carefully explain why we can improve the result in [8]. It is an interesting open question if the same rate could be obtained purely by PDE techniques.
This work also opens up the possibility of improving the error estimates for other approximation schemes as outlined in Section 3. Moreover, it enables a purely probabilistic error analysis for semi-Lagrangian schemes for HJB equations with results that are in line with the best available results by PDE methods. We refer to [12] for the details.
References
- [1] G. Barles and E. R. Jakobsen. On the convergence rate of approximation schemes for Hamilton-Jacobi-Bellman equations. M2AN Math. Model. Numer. Anal., 36:33–54, 2002.
- [2] G. Barles and E. R. Jakobsen. Error bounds for monotone approximation schemes for Hamilton-Jacobi-Bellman equations. SIAM J. Numer. Anal., 43(2):540–558, 2005.
- [3] G. Barles and E. R. Jakobsen. Error bounds for monotone approximation schemes for parabolic Hamilton-Jacobi-Bellman equations. Math. Comp., 76(260):1861–1893, 2007.
- [4] F. Camilli and M. Falcone. An approximation scheme for the optimal control of diffusion processes. RAIRO Modél. Math. Anal. Numér., 29(1):97–122, 1995.
- [5] K. Debrabant and E. R. Jakobsen. Semi-Lagrangian schemes for linear and fully non-linear diffusion equations. Math. Comp., 82(283):1433–1462, 2012.
- [6] R. Dumitrescu, C. Reisinger, and Y. Zhang. Approximation schemes for mixed optimal stopping and control problems with nonlinear expectations and jumps. arXiv preprint arXiv:1803.03794, 2018.
- [7] I. Kossaczký, M. Ehrhardt, and M. Günther. Modifications of the PCPT method for HJB equations. In AIP Conference Proceedings, volume 1773, page 030002, 2016.
- [8] N. V. Krylov. Approximating value functions for controlled degenerate diffusion processes by using piece-wise constant policies. Electron. J. Probab., 4(2):1–19, 1999.
