Optimal approximation of stochastic integrals in analytic noise model

Andrzej Kałuża; Paweł M. Morkisz; Paweł Przybyłowicz

arXiv:1812.10708·math.NA·October 6, 2020·Appl. Math. Comput.

TL;DR

This paper investigates the optimal approximation of stochastic Itô integrals under an analytic noise model, establishing error bounds for noisy evaluations and demonstrating the impact of low-precision computations on accuracy and performance.

Contribution

It introduces a new analytic noise model for stochastic integration and derives error bounds for approximation algorithms considering low-precision evaluations.

Findings

01

Upper error bounds for the Riemann-Maruyama quadrature are proportional to $n^{-\varrho} + \delta_{1} + \delta_{2}$.

02

Any algorithm based on at most $n$ noisy evaluations has error at least $C(n^{-\varrho} + \delta_{1})$.

03

Numerical experiments confirm theoretical error bounds and compare CPU and GPU performance.

Abstract

We study approximate stochastic Itô integration of processes belonging to a class of progressively measurable stochastic processes that are Hölder continuous in the $r$th mean. Inspired by the increasing popularity of low-precision computations (used on Graphics Processing Units (GPUs) and standard Central Processing Units (CPUs) for significant speedup), we introduce a suitable analytic noise model of standard noisy information about $X$ and $W$. In this model we show that the upper bounds on the error of the Riemann-Maruyama quadrature are proportional to $n^{-\varrho}+\delta_{1}+\delta_{2}$, where $n$ is the number of noisy evaluations of $X$ and $W$, $\varrho\in(0,1]$ is the Hölder exponent of $X$, and $\delta_{1},\delta_{2}\geq 0$ are precision parameters for values of $X$ and $W$, respectively. Moreover, we show that the error of any algorithm based on at most $n$ noisy…

Equations

\mathcal{I}(X,W)=\int_{0}^{T}X(t)\,dW(t),

\mathcal{L}=\frac{\partial}{\partial t}+\frac{1}{2}\frac{\partial^{2}}{\partial y^{2}}.

F^{\varrho,r,q}_{L}=\Bigl\{X:[0,T]\times\Omega\to\mathbb{R}\ \Big|\ \Bigl\|\sup_{t\in[0,T]}|X(t)|\Bigr\|_{q}\leq L,\ \|X(t)-X(z)\|_{r}\leq L|t-z|^{\varrho},\ t,z\in[0,T]\Bigr\}.

\mathcal{K}^{1}=\Bigl\{p\ \Big|\ \ldots,\ |p(t,y)|\leq 1+|y|\ \hbox{for all}\ t\in[0,T],y\in\mathbb{R}\Bigr\}.

\mathcal{K}^{2}_{s}=\Bigl\{p\ \Big|\ \ldots,\ \max\Bigl\{\Bigl|\frac{\partial p}{\partial t}(t,y)\Bigr|,\Bigl|\frac{\partial p}{\partial y}(t,y)\Bigr|,\Bigl|\frac{\partial^{2}p}{\partial y^{2}}(t,y)\Bigr|\Bigr\}\leq 1+|y|^{s}\ \hbox{for all}\ t\in[0,T],y\in\mathbb{R}\Bigr\},

\bar{\mathcal{K}}^{2}_{s}=\Bigl\{p\ \Big|\ \ldots,\ \max\Bigl\{\Bigl|\frac{\partial p}{\partial t}(t,y)\Bigr|,\Bigl|\frac{\partial p}{\partial y}(t,y)\Bigr|\Bigr\}\leq 1+|y|^{s}\ \hbox{for all}\ t\in[0,T],y\in\mathbb{R}\Bigr\},

\mathcal{K}^{3}_{\alpha,\beta}=\Bigl\{p\ \Big|\ \ldots\ \hbox{for all}\ t,z\in[0,T],\ x,y\in\mathbb{R}\Bigr\}

V_{X}(\delta_{1})=\bigl\{\tilde{X}\ |\ \exists_{p_{X}\in\mathcal{K}^{1}}:\forall_{(t,\omega)\in[0,T]\times\Omega}\ \tilde{X}(t,\omega)=X(t,\omega)+\delta_{1}\cdot p_{X}(t,X(t,\omega))\bigr\},

\mathcal{W}_{s}(\delta_{2})=\bigl\{\tilde{W}\ |\ \exists_{p_{W}\in\mathcal{K}^{2}_{s}}:\forall_{(t,\omega)\in[0,T]\times\Omega}\ \tilde{W}(t,\omega)=W(t,\omega)+\delta_{2}\cdot p_{W}(t,W(t,\omega))\bigr\},

\bar{\mathcal{W}}_{s}(\delta_{2})=\bigl\{\tilde{W}\ |\ \exists_{p_{W}\in\bar{\mathcal{K}}^{2}_{s}}:\forall_{(t,\omega)\in[0,T]\times\Omega}\ \tilde{W}(t,\omega)=W(t,\omega)+\delta_{2}\cdot p_{W}(t,W(t,\omega))\bigr\},

\mathcal{W}_{\alpha,\beta}(\delta_{2})=\bigl\{\tilde{W}\ |\ \exists_{p_{W}\in\mathcal{K}^{3}_{\alpha,\beta}}:\forall_{(t,\omega)\in[0,T]\times\Omega}\ \tilde{W}(t,\omega)=W(t,\omega)+\delta_{2}\cdot p_{W}(t,W(t,\omega))\bigr\}.

\mathcal{N}(\tilde{X},\tilde{W})=[\tilde{X}(t_{0}),\tilde{X}(t_{1}),\ldots,\tilde{X}(t_{i_{1}-1}),\tilde{W}(z_{0}),\tilde{W}(z_{1}),\ldots,\tilde{W}(z_{i_{2}-1})],

\mathcal{A}(\tilde{X},\tilde{W},\delta_{1},\delta_{2})=\varphi(\mathcal{N}(\tilde{X},\tilde{W})),

\varphi:\mathbb{R}^{i_{1}+i_{2}}\to\mathbb{R},

e^{(r)}(\mathcal{A},X,\mathcal{W},\delta_{1},\delta_{2})=\sup_{(\tilde{X},\tilde{W})\in V_{X}(\delta_{1})\times\mathcal{W}(\delta_{2})}\|\mathcal{I}(X,W)-\mathcal{A}(\tilde{X},\tilde{W},\delta_{1},\delta_{2})\|_{r},

e^{(r)}(\mathcal{A},\mathcal{G},\mathcal{W},\delta_{1},\delta_{2})=\sup_{X\in\mathcal{G}}e^{(r)}(\mathcal{A},X,\mathcal{W},\delta_{1},\delta_{2})

e^{(r)}_{n}(\mathcal{G},\mathcal{W},\delta_{1},\delta_{2})=\inf_{\mathcal{A}\in\Phi_{n}}e^{(r)}(\mathcal{A},\mathcal{G},\mathcal{W},\delta_{1},\delta_{2}).

0=t_{0}<t_{1}<\ldots<t_{n}=T,

\mathcal{A}^{RM}_{n}(\tilde{X},\tilde{W})=\sum_{i=0}^{n-1}\tilde{X}(t_{i})\cdot(\tilde{W}(t_{i+1})-\tilde{W}(t_{i})),

\|\mathcal{I}(X,W)-\mathcal{A}^{RM}_{n}(\tilde{X},\tilde{W})\|_{r}\leq C\Bigl(\max_{0\leq i\leq n-1}(\Delta t_{i})^{\varrho}+\delta_{1}+\delta_{2}+\delta_{1}\cdot\delta_{2}\Bigr).

\|\mathcal{I}(X,W)-\mathcal{A}^{RM}_{n}(\tilde{X},\tilde{W})\|_{r}

Z(t)=M(t)+V(t),\quad t\in[0,T],

V(t)=\int_{0}^{t}\mathcal{L}p_{W}(z,W(z))\,dz,

M(t)=\int_{0}^{t}\frac{\partial p_{W}}{\partial y}(z,W(z))\,dW(z).

\Delta Y_{i}=Y(t_{i+1})-Y(t_{i}),\quad i=0,1,\ldots,n-1,

\hat{X}_{n}(t)=\sum_{i=0}^{n-1}X(t_{i})\cdot\mathbf{1}_{(t_{i},t_{i+1}]}(t),


Full text

Optimal approximation of stochastic integrals in analytic noise model

Andrzej Kałuża

AGH University of Science and Technology, Faculty of Applied Mathematics, Al. A. Mickiewicza 30, 30-059 Kraków, Poland

[email protected]

,

Paweł M. Morkisz

AGH University of Science and Technology, Faculty of Applied Mathematics, Al. A. Mickiewicza 30, 30-059 Kraków, Poland

[email protected]

and

Paweł Przybyłowicz

AGH University of Science and Technology, Faculty of Applied Mathematics, Al. A. Mickiewicza 30, 30-059 Kraków, Poland

[email protected], corresponding author

Abstract.

We study approximate stochastic Itô integration of processes belonging to a class of progressively measurable stochastic processes that are Hölder continuous in the $r$th mean.

Inspired by the increasing popularity of low-precision computations (used on Graphics Processing Units (GPUs) and standard Central Processing Units (CPUs) for significant speedup), we introduce a suitable analytic noise model of standard noisy information about $X$ and $W$. In this model we show that the upper bounds on the error of the Riemann-Maruyama quadrature are proportional to $n^{-\varrho}+\delta_{1}+\delta_{2}$, where $n$ is the number of noisy evaluations of $X$ and $W$, $\varrho\in(0,1]$ is the Hölder exponent of $X$, and $\delta_{1},\delta_{2}\geq 0$ are precision parameters for values of $X$ and $W$, respectively. Moreover, we show that the error of any algorithm based on at most $n$ noisy evaluations of $X$ and $W$ is at least $C(n^{-\varrho}+\delta_{1})$. Finally, we report numerical experiments performed on both CPU and GPU that confirm our theoretical findings, together with a computational performance comparison between the two architectures.

Key words: Wiener process, noisy information, analytic noise model, optimal approximation, minimal error, GPU

Mathematics Subject Classification: 68Q25, 65C30.

1. Introduction

In this paper we investigate the problem of optimal approximation of stochastic integrals of the following form

$$\mathcal{I}(X,W)=\int_{0}^{T}X(t)\,dW(t),$$

where $T>0$ and $W=\{W(t)\}_{t\geq 0}$ is a one-dimensional Wiener process on some probability space $(\Omega,\Sigma,\mathbb{P})$ , and we consider integrands $X=\{X(t)\}_{t\in[0,T]}$ from a class of progressively measurable stochastic processes that are Hölder continuous in the $r$ th mean. Such quadrature problems arise, for example, in the context of numerical solutions of stochastic differential equations, see Section 4.4 in [13] and [2]. Since the exact values of such stochastic integrals are known only in limited cases, an efficient approximation of $\mathcal{I}(X,W)$ with the error as small as possible is of interest. We are aiming at methods that are based only on discrete values of $X$ and $W$ which are, additionally, corrupted with some noise.

The problem of optimal approximation of stochastic Itô integrals under exact information about $X$ and $W$ has been well studied in the literature, see [6], [7], [8], [21], [22], [23], [28]. Less explored is the approximation of stochastic integrals in the case when values of $X$ and $W$ are corrupted with some noise. The noise may arise from measurement errors, previous computations, or simply floating-point representation errors. There is a trend in deep learning to conduct computations mostly in lower precision, i.e., using not only single but also half precision. For example, the NVIDIA V100 graphics card delivers about 7 TFLOPS in double precision and 14 TFLOPS in single precision. For deep learning purposes, exploiting the nature of the operations and further lowering the precision to half precision in chosen operations made it possible to obtain up to 112 TFLOPS of performance [16]. This is a strong motivation for analyzing how lowering the precision influences the computed result in applications other than deep learning.

There are many results on solving problems under noisy information, including the problems such as integrating or approximating regular functions (see, e.g., [9, 19, 10]), $L_{p}$ approximation of piecewise regular functions (see [17]), approximate solving of IVPs (see [11]) or PDEs (see [29, 30]).

In this paper we study noisy information for stochastic Itô integration. To the best of our knowledge, this is the first paper that deals with noisy information for stochastic processes and its application to numerical computation of Itô integrals. In this sense this paper can be seen as an extension of the model presented in [18] in the context of SDEs. However, in that paper only the drift and diffusion coefficients were corrupted with noise, while information about the Wiener process was exact.

We use the Information-Based Complexity framework (see [19] and [27]). We assume that the algorithms may use only noisy standard information about the integrand $X$ and the Wiener process $W$. Namely, let $\delta_{1},\delta_{2}\geq 0$ be the precision levels corresponding to the processes $X$ and $W$, respectively. (The case of $\delta_{1}=\delta_{2}=0$ corresponds to exact information.) Available standard information about each process consists of noisy evaluations of $X$ and $W$ at points $t_{i},z_{i}\in[0,T]$. This means that, for example, for the Wiener process and for a given $z_{i}\in[0,T]$ an evaluation returns $\tilde{W}(z_{i})$ such that $|W(z_{i})-\tilde{W}(z_{i})|\leq\delta_{2}(1+|W(z_{i})|^{s})$ for some $s\geq 0$. In the context of computations performed on GPUs, this can be interpreted as the standard relative error. (A detailed description of the noisy information is given in Section 2.) For reasons that become clear in Section 2, we refer to this model as the analytic noise model.

The error of an algorithm is measured in the $r$th mean, maximized over the class of input data $X$ and over all permissible information about $(X,W)$ with the given precisions $\delta_{1},\delta_{2}\geq 0$. In this model we show that the upper bounds on the error of the Riemann-Maruyama quadrature are proportional to $n^{-\varrho}+\delta_{1}+\delta_{2}$, where $n$ is the number of noisy evaluations of $X$ and $W$, $\varrho\in(0,1]$ is the Hölder exponent of $X$, and $\delta_{1},\delta_{2}\geq 0$ are precision parameters for values of $X$ and $W$, respectively. Moreover, we show that the error of any algorithm based on at most $n$ noisy evaluations of $X$ and $W$ is at least $C(n^{-\varrho}+\delta_{1})$. We also present numerical experiments that confirm our theoretical findings. Since we perform similar operations for multiple trajectories of the considered processes, the proposed algorithm is highly parallel. Hence, using GPUs to accelerate the computation is very natural. There is a vast list of problems where employing GPUs gives significant speedups [26], including matrix operations [3, 14], bioinformatics [15], and solving ordinary or random differential equations [24, 25].

The paper is organized as follows. Section 2 contains basic notions and definitions. The Riemann-Maruyama quadrature rule $\mathcal{A}_{n}^{RM}$ for the perturbed information and an upper estimate on its error (Theorem 1) are presented in Section 3. Section 4 contains lower bounds (Lemma 1, Proposition 1). This leads to the conclusion that the algorithm $\mathcal{A}_{n}^{RM}$ is optimal (Theorem 2). Section 5 reports numerical experiments performed for the GPU implementation of the algorithm.

2. Preliminaries

Denote $\mathbb{N}=\{1,2,\ldots\}$ . Let $W=\{W(t)\}_{t\geq 0}$ be the standard, one-dimensional Wiener process defined on a complete probability space $(\Omega,\Sigma,\mathbb{P})$ . By $\{\Sigma_{t}\}_{t\geq 0}$ we denote a filtration, satisfying the usual conditions, such that $W$ is a Wiener process with respect to $\{\Sigma_{t}\}_{t\geq 0}$ .

For a random variable $Y:\Omega\to\mathbb{R}$ we write $\|Y\|_{q}=(\mathbb{E}|Y|^{q})^{1/q}$ , $q\geq 2$ . Moreover, by $\mathcal{L}$ we mean the following differential operator

$$\mathcal{L}=\frac{\partial}{\partial t}+\frac{1}{2}\frac{\partial^{2}}{\partial y^{2}}.$$

For $r,q\in[2,+\infty)$ , $q\geq r$ , $L\geq 0$ , $\varrho\in(0,1]$ we consider the following class $F^{\varrho,r,q}_{L}$ of stochastic processes $X=\{X(t)\}_{t\in[0,T]}$

[TABLE]

Let us recall that by Theorem 33 from Chapter IV in [1], for $X$ being $\{\Sigma_{t}\}_{t\geq 0}$-progressively measurable, the process $\{\sup\limits_{0\leq z\leq t}|X(z)|\}_{t\in[0,T]}$ is also $\{\Sigma_{t}\}_{t\geq 0}$-progressively measurable. Hence, $\sup\limits_{t\in[0,T]}|X(t)|$ is a ($\Sigma_{T}$-measurable) random variable. Moreover, the processes from the class $F^{\varrho,r,q}_{L}$ are Itô integrable, see, for example, [5] and [12].

The numbers $r,q,L,\varrho,T$ will be called parameters of the class $F^{\varrho,r,q}_{L}$ . Except for $T$ the parameters are not known and the algorithm presented later on will not use them as input parameters.

In order to define a suitable model of computation under inexact information about $X$ and $W$, we need to introduce the following auxiliary classes.

Let

[TABLE]

For $s\in[0,+\infty)$ we define

[TABLE]

(if $s=0$ then we set $|y|^{s}:=0$ for all $y\in\mathbb{R}$ ), and for $\alpha,\beta\in(0,1]$

[TABLE]

We have that $\mathcal{K}^{2}_{0}\subset\mathcal{K}^{3}_{1,1}$. In the sequel, the classes above will allow us to model, at least in some sense, the influence of the regularity of the noise on the error bound.

For $\delta_{1},\delta_{2}\geq 0$ we define

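$$V_{X}(\delta_{1})=\bigl\{\tilde{X}\ |\ \exists_{p_{X}\in\mathcal{K}^{1}}:\forall_{(t,\omega)\in[0,T]\times\Omega}\ \tilde{X}(t,\omega)=X(t,\omega)+\delta_{1}\cdot p_{X}(t,X(t,\omega))\bigr\},$$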

and the classes of disturbed Wiener processes

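$$\begin{aligned}
\mathcal{W}_{s}(\delta_{2})&=\bigl\{\tilde{W}\ |\ \exists_{p_{W}\in\mathcal{K}^{2}_{s}}:\forall_{(t,\omega)\in[0,T]\times\Omega}\ \tilde{W}(t,\omega)=W(t,\omega)+\delta_{2}\cdot p_{W}(t,W(t,\omega))\bigr\},\\
\mathcal{\bar{W}}_{s}(\delta_{2})&=\bigl\{\tilde{W}\ |\ \exists_{p_{W}\in\mathcal{\bar{K}}^{2}_{s}}:\forall_{(t,\omega)\in[0,T]\times\Omega}\ \tilde{W}(t,\omega)=W(t,\omega)+\delta_{2}\cdot p_{W}(t,W(t,\omega))\bigr\},\\
\mathcal{W}_{\alpha,\beta}(\delta_{2})&=\bigl\{\tilde{W}\ |\ \exists_{p_{W}\in\mathcal{K}^{3}_{\alpha,\beta}}:\forall_{(t,\omega)\in[0,T]\times\Omega}\ \tilde{W}(t,\omega)=W(t,\omega)+\delta_{2}\cdot p_{W}(t,W(t,\omega))\bigr\}.
\end{aligned}$$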

Since we impose some functional structure on the corrupting functions, we refer to this model as the analytic noise model. (See also Remark 2 for a possible alternative approach.)
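To make the perturbation concrete, here is a minimal Python sketch of a noisy evaluation of the form $\tilde{X}(t)=X(t)+\delta_{1}\cdot p_{X}(t,X(t))$. The names `noisy_value`, `p_absolute`, and `p_relative` are illustrative assumptions and do not come from the paper; the two disturbing functions are chosen only so that the growth condition $|p(t,y)|\leq 1+|y|$ holds.

```python
import numpy as np

def noisy_value(value, t, delta, p):
    """Return value + delta * p(t, value): one perturbed evaluation."""
    return value + delta * p(t, value)

# Hypothetical disturbing functions, both bounded by 1 + |y|.
def p_absolute(t, y):
    return np.ones_like(y)       # constant offset: absolute noise

def p_relative(t, y):
    return np.sin(t) * y         # proportional to the value: relative noise

x = np.array([-2.0, 0.0, 0.5, 3.0])   # exact values of X at one time point
x_tilde = noisy_value(x, t=0.3, delta=1e-3, p=p_relative)
```

With `delta=0` the evaluation is exact, which matches the remark that $\delta_{1}=\delta_{2}=0$ corresponds to exact information.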

For $X\in F^{\varrho,r,q}_{L}$ let $\tilde{X}\in V_{X}(\delta_{1})$ and $\tilde{W}\in\mathcal{W}(\delta_{2})$ where $\mathcal{W}\in\{\mathcal{W}_{s},\mathcal{\bar{W}}_{s},\mathcal{W}_{\alpha,\beta}\}$. We assume that the algorithm is based on discrete noisy information about $X$ and $W$. Hence, a vector of noisy information has the following form

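$$\mathcal{N}(\tilde{X},\tilde{W})=[\tilde{X}(t_{0}),\tilde{X}(t_{1}),\ldots,\tilde{X}(t_{i_{1}-1}),\tilde{W}(z_{0}),\tilde{W}(z_{1}),\ldots,\tilde{W}(z_{i_{2}-1})],$$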

where $i_{1},i_{2}\in\mathbb{N}$ . Moreover, $t_{0},t_{1},\ldots,t_{i_{1}-1}\in[0,T]$ and $z_{0},z_{1},\ldots,z_{i_{2}-1}\in[0,T]$ are given time points. Hence, the information is nonadaptive (see [6] and [27] for more discussion on adaptive and nonadaptive information). We assume that $t_{i}\neq t_{j},z_{i}\neq z_{j}$ for all $i\neq j$ . The total number of (noisy) evaluations of $X$ and $W$ is $l=i_{1}+i_{2}$ .

An algorithm $\mathcal{A}$ using $\mathcal{N}(\tilde{X},\tilde{W})$ , that approximates $\mathcal{I}(X,W)$ , is of the form

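$$\mathcal{A}(\tilde{X},\tilde{W},\delta_{1},\delta_{2})=\varphi(\mathcal{N}(\tilde{X},\tilde{W})),\tag{8}$$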

where

$$\varphi:\mathbb{R}^{i_{1}+i_{2}}\to\mathbb{R}$$

is a Borel measurable mapping.

For a given $n\in\mathbb{N}$ we denote by $\Phi_{n}$ a class of all algorithms of the form (8) for which the total number of evaluations $l$ is at most $n$ .

For a fixed $X\in F^{\varrho,r,q}_{L}$ the error of $\mathcal{A}\in\Phi_{n}$ is defined as

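$$e^{(r)}(\mathcal{A},X,\mathcal{W},\delta_{1},\delta_{2})=\sup_{(\tilde{X},\tilde{W})\in V_{X}(\delta_{1})\times\mathcal{W}(\delta_{2})}\|\mathcal{I}(X,W)-\mathcal{A}(\tilde{X},\tilde{W},\delta_{1},\delta_{2})\|_{r},$$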

where $\mathcal{W}\in\{\mathcal{W}_{s},\mathcal{\bar{W}}_{s},\mathcal{W}_{\alpha,\beta}\}$ . The worst case error of $\mathcal{A}$ in $\mathcal{G}$ is given by

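$$e^{(r)}(\mathcal{A},\mathcal{G},\mathcal{W},\delta_{1},\delta_{2})=\sup_{X\in\mathcal{G}}e^{(r)}(\mathcal{A},X,\mathcal{W},\delta_{1},\delta_{2}),$$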

where $\mathcal{G}$ is a subclass of $F^{\varrho,r,q}_{L}$ . Finally, the $n$ th minimal error is defined as

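$$e^{(r)}_{n}(\mathcal{G},\mathcal{W},\delta_{1},\delta_{2})=\inf_{\mathcal{A}\in\Phi_{n}}e^{(r)}(\mathcal{A},\mathcal{G},\mathcal{W},\delta_{1},\delta_{2}).$$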

The aim is to develop an optimal algorithm and its efficient implementation by using GPUs.

Unless otherwise stated, all constants appearing in this paper (including those in the ’ $\mathcal{O}$ ’, ’ $\Omega$ ’, and ’ $\Theta$ ’ notation) will only depend on the parameters of the respective classes. Furthermore, the same symbol may be used for different constants.

3. The Riemann-Maruyama quadrature for noisy information

Let $n\in\mathbb{N}$ and

$$0=t_{0}<t_{1}<\ldots<t_{n}=T$$

be an arbitrary discretization of $[0,T]$. We denote $\Delta t_{i}=t_{i+1}-t_{i}$, $i=0,1,\ldots,n-1$. We define the Riemann-Maruyama quadrature that uses noisy evaluations of $X$ and $W$ by

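$$\mathcal{A}^{RM}_{n}(\tilde{X},\tilde{W})=\sum_{i=0}^{n-1}\tilde{X}(t_{i})\cdot(\tilde{W}(t_{i+1})-\tilde{W}(t_{i})),$$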

where $(\tilde{X},\tilde{W})\in V_{X}(\delta_{1})\times\mathcal{W}(\delta_{2})$ for $\mathcal{W}\in\{\mathcal{W}_{s},\mathcal{\bar{W}}_{s},\mathcal{W}_{\alpha,\beta}\}$. It is easy to see that the information cost of computing $\mathcal{A}^{RM}_{n}(\tilde{X},\tilde{W})$ is $2n$ noisy evaluations of $X$ and $W$. The combinatorial cost consists of $\mathcal{O}(n)$ arithmetic operations.
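The quadrature is a left-endpoint sum of noisy integrand values against noisy Wiener increments. A minimal Python sketch follows; the function name `riemann_maruyama` and the array layout (noisy $X$ at $t_{0},\ldots,t_{n-1}$, noisy $W$ at $t_{0},\ldots,t_{n}$) are assumptions made for illustration.

```python
import numpy as np

def riemann_maruyama(x_tilde, w_tilde):
    """Left-endpoint sum: sum_i x_tilde[i] * (w_tilde[i+1] - w_tilde[i]).

    x_tilde: shape (n,),   noisy values of X at t_0, ..., t_{n-1}
    w_tilde: shape (n+1,), noisy values of W at t_0, ..., t_n
    """
    x_tilde = np.asarray(x_tilde, dtype=float)
    w_tilde = np.asarray(w_tilde, dtype=float)
    if w_tilde.shape[0] != x_tilde.shape[0] + 1:
        raise ValueError("w_tilde must have exactly one more entry than x_tilde")
    # np.diff gives the n increments; the dot product costs O(n) operations.
    return float(np.dot(x_tilde, np.diff(w_tilde)))

# Tiny deterministic check of the summation: 1*1 + 2*2 + 3*3 = 14.
value = riemann_maruyama([1.0, 2.0, 3.0], [0.0, 1.0, 3.0, 6.0])
```

The single pass over the increments reflects the $\mathcal{O}(n)$ arithmetic cost stated above.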

The aim of this section is to prove the following result.

Theorem 1.

Let us assume that $\varrho\in(0,1]$ and $r\geq 2$ .

(i)

Let $s\geq 0$ and $q\in(r,+\infty)$ . There exists a positive constant $C$ , depending only on the parameters of the class $F^{\varrho,r,q}_{L}$ and $s$ , such that for all $n\in\mathbb{N}$ , $\delta_{1},\delta_{2}\geq 0$ , $X\in F^{\varrho,r,q}_{L}$ , $(\tilde{X},\tilde{W})\in V_{X}(\delta_{1})\times\mathcal{W}_{s}(\delta_{2})$ it holds

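$$\|\mathcal{I}(X,W)-\mathcal{A}^{RM}_{n}(\tilde{X},\tilde{W})\|_{r}\leq C\Bigl(\max_{0\leq i\leq n-1}(\Delta t_{i})^{\varrho}+\delta_{1}+\delta_{2}+\delta_{1}\cdot\delta_{2}\Bigr).\tag{15}$$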

(ii)

Let $s\geq 0$ and $q\in(r,+\infty)$ . There exists a positive constant $C$ , depending only on the parameters of the class $F^{\varrho,r,q}_{L}$ and $s$ , such that for all $n\in\mathbb{N}$ , $\delta_{1},\delta_{2}\geq 0$ , $X\in F^{\varrho,r,q}_{L}$ , $(\tilde{X},\tilde{W})\in V_{X}(\delta_{1})\times\mathcal{\bar{W}}_{s}(\delta_{2})$ it holds

[TABLE]

(iii)

Let $\alpha,\beta\in(0,1]$ and $q\in[r,+\infty)$ . There exists a positive constant $C$ , depending only on the parameters of the class $F^{\varrho,r,q}_{L}$ and $\alpha$ , $\beta$ , such that for all $n\in\mathbb{N}$ , $\delta_{1},\delta_{2}\geq 0$ , $X\in F^{\varrho,r,q}_{L}$ , $(\tilde{X},\tilde{W})\in V_{X}(\delta_{1})\times\mathcal{W}_{\alpha,\beta}(\delta_{2})$ it holds

[TABLE]

Proof. Let $\tilde{X}\in V_{X}(\delta_{1})$. We first show (15), where $\tilde{W}\in\mathcal{W}_{s}(\delta_{2})$. Let the process $Z=\{Z(t)\}_{t\in[0,T]}$ be defined as $Z(t)=p_{W}(t,W(t))$. Then, by the Itô formula we get that

[TABLE]

where

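$$V(t)=\int_{0}^{t}\mathcal{L}p_{W}(z,W(z))\,dz,\qquad M(t)=\int_{0}^{t}\frac{\partial p_{W}}{\partial y}(z,W(z))\,dW(z).$$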

We stress that $\{V(t)\}_{t\in[0,T]}$ is a continuous process of bounded variation, while $\{M(t)\}_{t\in[0,T]}$ is a continuous martingale with respect to the filtration $\{\Sigma_{t}\}_{t\geq 0}$. Hence, the process $Z$ is a continuous semimartingale.

We denote by

$$\Delta Y_{i}=Y(t_{i+1})-Y(t_{i}),\quad i=0,1,\ldots,n-1,$$

for $Y\in\{W,Z\}$ and for all $t\in[0,T]$

[TABLE]

Note that $\{X_{n}(t)\}_{t\in[0,T]}$ and $\{p_{X,n}(t)\}_{t\in[0,T]}$ are $\{\Sigma_{t}\}_{t\geq 0}$ -progressively measurable simple processes. Since $Z$ and $W$ are continuous semimartingales, by Property (v) at page 110 in [5] we can write the algorithm $\mathcal{A}^{RM}_{n}$ as follows

[TABLE]

We thus obtain

[TABLE]

where

[TABLE]

By the Burkholder inequality and the Hölder continuity of $X$ in $r$ th mean we get

[TABLE]

and

[TABLE]

since $p_{X}$ is of at most linear growth.

Since $Z=M+V$ , from Definition 5.7 at page 109 in [5] we obtain

[TABLE]

where

[TABLE]

Note that

[TABLE]

and

[TABLE]

since $\frac{\partial p_{W}}{\partial y}$ and $\mathcal{L}p_{W}(t,y)$ are of at most linear growth. (The constants $C_{6},C_{7}$ depend only on $T$ , $s$ , $r$ , and $q$ .) Hence, by the associativity property (see, for example, Property (ii) at page 109 in [5]), Burkholder and Hölder inequalities we get

[TABLE]

and

[TABLE]

where $C_{9},C_{12}$ depend only on $T$ , $s$ , $q$ , $r$ and $L$ . Therefore, by (39), (40), and (32) we arrive at

[TABLE]

By proceeding analogously as for $A_{3,n}$ we obtain

[TABLE]

Combining (25), (30), (31), (41), and (42) we get (15).

We now justify (16) and (17). In these cases the process $Z$ is not necessarily a semimartingale. Hence, we use the following decomposition of $\mathcal{A}_{n}^{RM}$, which follows directly from (24),

[TABLE]

We have that

[TABLE]

where

[TABLE]

For $D_{1,n}$ and $D_{2,n}$ we use the bounds (30), (31) obtained for $A_{1,n}$ and $A_{2,n}$, respectively. However, for $D_{3,n}$ and $D_{4,n}$ we have to distinguish between the cases $\tilde{W}\in\mathcal{\bar{W}}_{s}(\delta_{2})$ and $\tilde{W}\in\mathcal{W}_{\alpha,\beta}(\delta_{2})$.

Let $\tilde{W}\in\mathcal{\bar{W}}_{s}(\delta_{2})$ . Since in this case $p_{W}\in\mathcal{\bar{K}}^{2}_{s}$ we get, by the mean value theorem, that

[TABLE]

for all $t,z\in[0,T]$ and $x,y\in\mathbb{R}$ , where $\bar{C}>0$ depends only on $T$ and $s$ . This implies

[TABLE]

for $i=0,1,\ldots,n-1$ , where $\gamma=q/(q-r)$ . Hence, by the Hölder inequality

[TABLE]

and

[TABLE]

since $W$ has all absolute moments bounded. Therefore,

[TABLE]

which implies that

[TABLE]

For $D_{4,n}$ we proceed analogously as for $D_{3,n}$ . This gives (16).

Finally, let $\tilde{W}\in\mathcal{W}_{\alpha,\beta}(\delta_{2})$ . Since $X(t_{i})$ and $\Delta W_{i}$ are independent, we have in this case that

[TABLE]

where $m_{r\beta}=\|Z\|_{r\beta}$ and $Z$ is a standard normal random variable with mean zero and variance equal to $1$ . For $D_{4,n}$ we proceed analogously as for $D_{3,n}$ . This ends the proof. $\blacksquare$

Directly from Theorem 1 we have the following corollary that states the worst-case error of the algorithm $\mathcal{A}^{RM}_{n}$ in the class $F^{\varrho,r,q}_{L}$ .

Corollary 1.

Let $\varrho\in(0,1]$ , $r\geq 2$ , and let us consider the Riemann-Maruyama quadrature $\mathcal{A}^{RM}_{n}$ based on the equidistant mesh $t_{i}=iT/n$ , $i=0,1,\ldots,n$ .

(i)

Let $s\geq 0$ and $q\in(r,+\infty)$ . Then

[TABLE]

as $n\to+\infty$ and $\max\{\delta_{1},\delta_{2}\}\to 0+$ .

(ii)

Let $s\geq 0$ and $q\in(r,+\infty)$ . Then

[TABLE]

as $n\to+\infty$ and $\max\{\delta_{1},\delta_{2}\}\to 0+$ .

(iii)

Let $\alpha,\beta\in(0,1]$ and $q\in[r,+\infty)$ . Then

[TABLE]

as $n\to+\infty$ and $\max\{\delta_{1},\delta_{2}\}\to 0+$ .
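As a worked step for part (i), assuming the bound of Theorem 1 (i) has the form $C(\max_{i}(\Delta t_{i})^{\varrho}+\delta_{1}+\delta_{2}+\delta_{1}\cdot\delta_{2})$ and that $\delta_{1}\leq 1$ (which holds eventually as $\max\{\delta_{1},\delta_{2}\}\to 0+$), the rate on the equidistant mesh $t_{i}=iT/n$ follows by direct substitution:

```latex
\begin{align*}
C\Bigl(\max_{0\leq i\leq n-1}(\Delta t_{i})^{\varrho}+\delta_{1}+\delta_{2}+\delta_{1}\delta_{2}\Bigr)
&= C\bigl(T^{\varrho}n^{-\varrho}+\delta_{1}+\delta_{2}+\delta_{1}\delta_{2}\bigr) \\
&\leq C\bigl(T^{\varrho}n^{-\varrho}+\delta_{1}+2\delta_{2}\bigr)
  \qquad (\delta_{1}\leq 1) \\
&\leq 2C\max\{1,T^{\varrho}\}\,\bigl(n^{-\varrho}+\delta_{1}+\delta_{2}\bigr).
\end{align*}
```

The product term $\delta_{1}\delta_{2}$ is therefore absorbed into the constant, which is consistent with the rate $n^{-\varrho}+\delta_{1}+\delta_{2}$ quoted in the abstract.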

Let us comment on the result obtained so far.

Remark 1.

As we can see from Theorem 1 and Corollary 1, the noise term becomes more and more dominant as the regularity of the disturbing functions $p_{W}$ decreases.

Remark 2.

We considered the setting that we called the analytic noise model, since we assumed a certain form of the noise via the disturbing function $p$. Of course, another approach is possible. Namely, one can assume that the exact values of $X$ are corrupted by noise in the following way

[TABLE]

where $(\varepsilon_{i})_{i=0,1,\ldots,n}$ are $\sigma\Bigl{(}\bigcup_{t\geq 0}\Sigma_{t}\Bigr{)}$ -measurable random variables. Preliminary estimates indicate that it is possible to achieve upper bounds like in Theorem 1, under certain assumptions on the discrete-time process $(\varepsilon_{i})_{i=0,1,\ldots,n}$ . We postpone this problem to our future work.

4. Lower bounds and optimality of the Riemann-Maruyama quadrature

In this section we investigate lower bounds on the worst-case error of an arbitrary algorithm from $\Phi_{n}$ and, in particular cases, we establish optimality of the Riemann-Maruyama algorithm $\mathcal{A}^{RM}_{n}$ . We concentrate on the class $\mathcal{W}_{s}$ of noisy evaluations of $W$ . Essentially sharp lower bounds in the classes $\mathcal{\bar{W}}_{s}$ and $\mathcal{W}_{\alpha,\beta}$ are left as an open problem.

The following lemma follows directly from (91) in [18], where the lower bound on the error for approximating Itô integrals of deterministic functions from the Hölder class has been established.

Lemma 1.

Let $\varrho\in(0,1]$ , $r\geq 2$ , $q\in(r,+\infty)$ , and $s\geq 0$ . Then

[TABLE]

From Corollary 1 and Lemma 1 we get the main result of this paper.

Theorem 2.

Let $\varrho\in(0,1]$ , $r\geq 2$ , $q\in(r,+\infty)$ , and $s\geq 0$ . Then the $n$ th minimal error satisfies

[TABLE]

and

[TABLE]

as $n\to+\infty$ and $\delta_{1}\to 0+$ . An optimal algorithm is the Riemann-Maruyama quadrature $\mathcal{A}^{RM}_{n}$ based on the equidistant discretization $t_{i}=iT/n$ , $i=0,1,\ldots,n$ .

The results above hold for particular values of the precision parameter $\delta_{2}$, namely, for $\delta_{2}=0$ and $\delta_{2}=\delta_{1}$. In the general case, preliminary estimates suggest that a completely new technique is required in order to establish the dependence of the lower bounds also on $\delta_{2}$. (We stress that the results from [19] are not applicable here, since we consider a different model of noise.) Nevertheless, for the algorithm $\mathcal{A}^{RM}_{n}$ we have the following sharp (worst-case) error bounds in the case of arbitrary $\delta_{2}$.

Proposition 1.

Let $\varrho\in(0,1]$ , $r\geq 2$ , $q\in(r,+\infty)$ , $s\geq 0$ , and let us consider the Riemann-Maruyama quadrature $\mathcal{A}^{RM}_{n}$ based on the equidistant mesh $t_{i}=iT/n$ , $i=0,1,\ldots,n$ . Then

[TABLE]

as $n\to+\infty$ and $\max\{\delta_{1},\delta_{2}\}\to 0+$ .

Proof. The upper bounds in (63) follow directly from Corollary 1. In the case when $\delta_{1}\geq 0=\delta_{2}$, the lower bound $\Omega(\max\{n^{-\varrho},\delta_{1}\})$ again follows from (91) in [18].

Now we consider the case $\delta_{2}\geq 0=\delta_{1}$ . Let us take

[TABLE]

and take $\tilde{W}\in\mathcal{W}_{s}(\delta_{2})$ of the following form

[TABLE]

We get that

[TABLE]

and

[TABLE]

which gives

[TABLE]

This implies the thesis. $\blacksquare$

Remark 3.

In the case of exact information (i.e., $\delta_{1}=\delta_{2}=0$ ) we know, by the results of [6], that even randomized adaptive information does not help, and the rate $n^{-\varrho}$ is optimal.

5. Numerical results

We present results for the Riemann-Maruyama quadrature $\mathcal{A}_{n}^{RM}$. Four example problems are presented; for the first one we know the exact solution, and for the others we need to assume some convergence of the analyzed method to estimate the obtained error. At the end of this section we give some practical guidance on how to implement the algorithm efficiently using GPUs, together with a discussion of the speedup obtainable with such an architecture.

5.1. Problems

For testing purposes we analyze the following integration problem

[TABLE]

and we consider the following four examples

[TABLE]

where $N=\{N(t)\}_{t\in[0,T]}$ is a Poisson process with intensity $\lambda=5$ and $W_{2}=\{W_{2}(t)\}_{t\in[0,T]}$ is a standard one-dimensional Wiener process, both independent of $W$. We also apply $\mathcal{A}_{n}^{RM}$ to the weak approximation of the following scalar SDE

[TABLE]

where $\mu=3$ . The exact solution of (74) leads to the quadrature problem, since

[TABLE]

where $\displaystyle{X(t)=e^{\mu(T-t)}W_{2}(t)}$ , see [2], [13]. We use GPU implementation of the Riemann-Maruyama quadrature in order to compute an approximation of the following expectation

[TABLE]

for $f=f(x)$ given as in (72) with $K=2$ . Computation of (76) corresponds to derivative pricing, where the price of the underlying risky asset is described by (74).

The approximation to $\displaystyle{\mathbb{E}(f(Y(T)))}$ is defined by

[TABLE]

where $M$ is a number of independent copies of $\mathcal{A}_{n}^{RM}(\tilde{X},\tilde{W})$ . Due to the strong law of large numbers we get for all $n\in\mathbb{N}$

[TABLE]

as $M\to+\infty$ . Moreover, since $f:\mathbb{R}\to\mathbb{R}$ is a Lipschitz function and $X\in F_{L}^{1/2,2,q}$ with $q>2$ , the standard arguments and Theorem 1 (i) yield the following estimate for averaged weak error, where $(\tilde{X},\tilde{W})\in V_{X}(\delta_{1})\times\mathcal{W}_{s}(\delta_{2})$ and $\delta_{1},\delta_{2}\in[0,1]$ ,

[TABLE]

5.2. Noise

For testing purposes we analyze the following disturbing functions

[TABLE]

It is worth mentioning that the noise function $p_{1}$ corresponds to the standard absolute deterministic noise and $p_{2}$, $p_{3}$ are related to the standard relative error. The latter can be connected with the computation precision. There is currently a trend, e.g., in deep learning, to conduct computations in lower precision in order to gain a large speedup. Recent GPU architectures (e.g., NVIDIA Volta) are designed with dedicated accelerators for single or half precision operations.
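The link between relative error and computation precision can be checked numerically. The sketch below, which is an illustration and not part of the paper's experiments, rounds double-precision samples to IEEE half precision and measures the largest value of $|w-\mathrm{fl}(w)|/(1+|w|)$; the function name `half_precision_noise_ratio` and the use of standard normal samples as stand-ins for values of $W$ are assumptions. It checks only the size of the perturbation; it does not show that rounding has the smooth functional form required of $p_{W}$.

```python
import numpy as np

def half_precision_noise_ratio(values):
    """Largest |w - fl16(w)| / (1 + |w|) over the given float64 values."""
    values = np.asarray(values, dtype=np.float64)
    # Round to IEEE 754 binary16 and convert back to double for comparison.
    rounded = values.astype(np.float16).astype(np.float64)
    return float(np.max(np.abs(values - rounded) / (1.0 + np.abs(values))))

rng = np.random.default_rng(1)
w = rng.normal(0.0, 1.0, size=10_000)   # stand-in for sampled values of W
u = 2.0 ** -11                          # unit roundoff of binary16
ratio = half_precision_noise_ratio(w)
```

For values well inside the half-precision range, the ratio stays below the unit roundoff `u`, so in this sense half-precision storage behaves like a precision level of about $\delta_{2}=2^{-11}$ with $s=1$.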

For each test the information about the analyzed precision level $\delta_{1}$ and $\delta_{2}$ will be given. All the tests were conducted with $r=2$ .

5.3. Error criterion

For the problem (70) we know the exact value of the solution, so we can use the following error estimate

[TABLE]

where $M=2048$ corresponds to the number of computed independent realizations under given precision levels. In the case of the problems (71)-(74), the exact solution is not known; hence, in order to analyze the algorithm error, we compare the obtained result with the result obtained on the same trajectories for a denser mesh. In our tests, since the expected convergence rate is at least $0.5$, it is reasonable to use a thousand times more points. This leads to the following error estimation formula, used for (71)-(73)

[TABLE]

where $m\in\{2,3,4\}$ and $L=1000$ . For (74) we use the following quantity

[TABLE]

as the approximation of the weak error

[TABLE]
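The mesh-refinement criterion used when the exact solution is unknown can be sketched as follows. This is a minimal illustration under stated assumptions: the error is taken as the empirical root mean square (matching $r=2$ ) of the difference between the quadrature on $n$ points and on $Ln$ points along the same trajectories, the information is exact, and the integrand passed in the example is hypothetical. The function names are ours.

```python
import numpy as np

def riemann_maruyama(x_vals, w_vals):
    """Left-endpoint sum over the last axis: sum_i X(t_i) * (W(t_{i+1}) - W(t_i))."""
    return np.sum(x_vals[..., :-1] * np.diff(w_vals, axis=-1), axis=-1)

def refinement_error(integrand, n, L=1000, M=2048, T=1.0, seed=0):
    """Empirical L2 distance between the n-point and (L*n)-point quadratures."""
    rng = np.random.default_rng(seed)
    N = n * L  # size of the dense reference mesh
    dW = rng.normal(0.0, np.sqrt(T / N), size=(M, N))
    W = np.concatenate([np.zeros((M, 1)), np.cumsum(dW, axis=1)], axis=1)
    t = np.linspace(0.0, T, N + 1)
    X = integrand(t, W)
    reference = riemann_maruyama(X, W)
    # The coarse mesh reuses every L-th point of the same trajectories.
    coarse = riemann_maruyama(X[:, ::L], W[:, ::L])
    return np.sqrt(np.mean((coarse - reference) ** 2))

if __name__ == "__main__":
    # Smaller L and M than in the text, to keep the memory footprint modest.
    for n in (4, 16, 64):
        print(n, refinement_error(lambda t, w: np.sin(w), n, L=16, M=256))
```

The defaults `L=1000` and `M=2048` mirror the values quoted in the text, but they require an array of size $M\times(Ln+1)$ , so the usage example deliberately runs with smaller values.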

5.4. Results

In Figure 1 we present the behavior of the error of the Riemann-Maruyama quadrature $\mathcal{A}^{RM}_{n}$ for problem (70). The numerical results are compared with the theoretical rate of convergence obtained for the algorithm; in particular, we present the effect of changing the precision levels $\delta_{1}$ and $\delta_{2}$ . In Figures 2-4 we present the behavior of the error of the Riemann-Maruyama quadrature $\mathcal{A}^{RM}_{n}$ for problems (71)-(73). (The errors are measured according to (80).) From Figure 6 we see that if $\delta_{1},\delta_{2}$ are on the level of $n^{-1/2}$ , then the Riemann-Maruyama quadrature preserves the error $\mathcal{O}(n^{-1/2})$ known from the case of exact information. The results confirm that the precision parameters must tend to zero in order to maintain the convergence rate of the Riemann-Maruyama quadrature.
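The two regimes can be read off from the upper bound itself. The short computation below assumes the bound in the form $C(n^{-\varrho}+\delta_{1}+\delta_{2})$ stated in the abstract, with $\varrho=1/2$ and a generic constant $C$ that does not depend on $n$ , $\delta_{1}$ , $\delta_{2}$ .

```latex
% Precision levels tied to the mesh: the rate n^{-1/2} is preserved.
\[
  \delta_{1}=\delta_{2}=n^{-1/2}
  \;\Longrightarrow\;
  C\bigl(n^{-1/2}+\delta_{1}+\delta_{2}\bigr)=3C\,n^{-1/2}.
\]
% Fixed precision levels: the bound stops decreasing as n grows.
\[
  \delta_{1},\delta_{2}\ \text{fixed}
  \;\Longrightarrow\;
  \lim_{n\to\infty} C\bigl(n^{-1/2}+\delta_{1}+\delta_{2}\bigr)=C(\delta_{1}+\delta_{2}).
\]
```

In the second case, increasing $n$ beyond roughly $(\delta_{1}+\delta_{2})^{-2}$ no longer improves the bound, which is consistent with the plateau one would expect in the error plots for fixed precision.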

Results for the weak approximation are given in Figure 5.

5.5. GPU implementation

Below we present pseudo-code for the GPU implementation of the algorithm $\mathcal{A}_{n}^{RM}$ . This algorithm is designed for the case where we wish to compute multiple realizations of $\mathcal{A}_{n}^{RM}$ and return the array of results. Because the realizations are independent, the algorithm parallelizes in a straightforward way and benefits significantly from graphics processing units. An additional speedup can be obtained by also generating the normally distributed numbers on the GPU. Hence, the algorithm is suitable for, e.g., derivative pricing.
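As a complement, the one-realization-per-thread structure can be imitated on the CPU. The sketch below is not the authors' pseudo-code and does not run on a GPU; it only illustrates, under our own assumptions, the work a single thread would do (a running sum with constant memory and its own random stream) and the array of results returned for many realizations. The names `rm_realization` and `rm_batch` are hypothetical, and the information is taken to be exact.

```python
import numpy as np

def rm_realization(integrand, n, T, rng):
    """One realization with O(1) memory: the work of a single (hypothetical) thread."""
    h = T / n
    t, w, acc = 0.0, 0.0, 0.0
    for _ in range(n):
        dw = rng.normal(0.0, np.sqrt(h))  # Wiener increment on [t, t + h]
        acc += integrand(t, w) * dw       # left-endpoint value times increment
        t += h
        w += dw
    return acc, w

def rm_batch(integrand, n, num_realizations, T=1.0, seed=0):
    """Array of independent realizations, each with its own random stream."""
    streams = np.random.SeedSequence(seed).spawn(num_realizations)
    out = np.empty(num_realizations)
    for j, stream in enumerate(streams):
        out[j], _ = rm_realization(integrand, n, T, np.random.default_rng(stream))
    return out

if __name__ == "__main__":
    results = rm_batch(lambda t, w: w, n=50, num_realizations=8)
    print(results)
```

Giving each realization an independent stream is what makes the loop over `j` safe to distribute across threads; on a GPU the same role would be played by a per-thread generator state.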

In our experiments we compared the performance of the algorithm $\mathcal{A}_{n}^{RM}$ for both GPU and CPU implementations. For the GPU implementation we used 32 blocks and 64 threads for problems (70), (71), (73) and 512 threads for problem (72). The CPU performance was tested using 8 threads. The hardware used was: GPU: NVIDIA Tesla P100 (Pascal), CPU: Intel Xeon E5-2680v4 (Broadwell). The speedup comparison for problem (72) is presented in Figure 7. As we can see, a speedup of about 100x is achievable.

6. Conclusions

We investigated the problem of efficient approximation of Itô integrals under inexact information about the Wiener process and the integrand. We showed that for certain precisions ( $\delta_{1}=\delta_{2}\geq 0$ ) the Riemann-Maruyama quadrature rule is optimal. We also proposed a GPU implementation of the algorithm that is suitable for practical purposes.

Acknowledgments.

The authors were partially supported by the Faculty of Applied Mathematics AGH UST dean grant for PhD students and young researchers within the subsidy of the Ministry of Science and Higher Education, with grant numbers as follows: 15.11.420.038/18 (Andrzej Kałuża), 15.11.420.038/2 (Paweł M. Morkisz), and 15.11.420.038/1 (Paweł Przybyłowicz).
