Unbiased Multi-index Monte Carlo

Dan Crisan; Pierre Del Moral; Jeremie Houssineau; Ajay Jasra

arXiv:1702.03057·stat.CO·October 17, 2017

Open Access

TL;DR

This paper presents a novel unbiased multi-index Monte Carlo method that reduces computational effort and removes bias in expectation approximations of discretized random variables, with applications in PDE solutions and Bayesian inference.

Contribution

It introduces an unbiased multi-index Monte Carlo approach that improves efficiency and removes bias compared to traditional discretization sampling methods.

Findings

1. Reduced computational effort for a given error level.

2. Successfully applied to PDE solutions with random coefficients.

3. Effective in Bayesian inference for stochastic PDEs.

Equations

$$
\mathbb{E}[\varphi(X)]=\int_{E}\varphi(x)\,\pi(\mathrm{d}x),
$$

$$
\lim_{\bm{\alpha}\to\infty}\big|\mathbb{E}[\varphi(X_{\bm{\alpha}})]-\mathbb{E}[\varphi(X)]\big|=0,
$$

$$
\pi_{\bm{\alpha}}(x_{1:K}\mid y_{1:K})\propto\prod_{k=1}^{K}g(y_{k}\mid x_{k})\,Q_{\bm{\alpha}}(x_{k}\mid x_{k-1})
$$

$$
\mathbb{E}[\varphi(X_{L})]=\mathbb{E}[\varphi(X_{1})]+\sum_{l=2}^{L}\big\{\mathbb{E}[\varphi(X_{l})]-\mathbb{E}[\varphi(X_{l-1})]\big\}
$$

$$
\mathcal{I}_{\bm{m}}^{\bm{n}}\doteq\{\bm{\alpha}\in\mathbb{N}_{0}^{d}:\bm{m}_{1}\leq\bm{\alpha}_{1}\leq\bm{n}_{1},\dots,\bm{m}_{d}\leq\bm{\alpha}_{d}\leq\bm{n}_{d}\}.
$$

$$
Z\doteq\sum_{\bm{\alpha}\in\mathcal{I}_{0}^{\bm{N}}}\frac{\Delta S_{\bm{\alpha}}}{\mathbb{P}(\bm{N}\geq\bm{\alpha})},
$$

$$
\Delta_{i}S_{\bm{\alpha}}=\begin{cases}S_{\bm{\alpha}}-S_{\bm{\alpha}-\bm{e}_{i}}&\text{if }\bm{\alpha}_{i}>0\\ S_{\bm{\alpha}}&\text{if }\bm{\alpha}_{i}=0,\end{cases}
$$

$$
\Delta S_{\bm{\alpha}}=\sum_{\bm{r}\in\{0,1\}^{d}}(-1)^{|\bm{r}|}S_{\bm{\alpha}-\bm{r}},
$$

$$
\begin{aligned}
\Delta_{d+1}\Delta_{d}\dots\Delta_{1}S_{(\bm{\alpha},\alpha^{\prime})}
&=\sum_{\bm{r}\in\{0,1\}^{d}}\sum_{r^{\prime}\in\{0,1\}}(-1)^{|\bm{r}|+r^{\prime}}S_{(\bm{\alpha}-\bm{r},\alpha^{\prime}-r^{\prime})}\\
&=\sum_{\bm{r}\in\{0,1\}^{d+1}}(-1)^{|\bm{r}|}S_{\bm{\beta}-\bm{r}}
\end{aligned}
$$

$$
\sum_{\bm{\alpha}\in\mathcal{I}_{k+1}^{n}}\Delta S_{\bm{\alpha}}=\sum_{\bm{\alpha}\in\{k,n\}^{d}}(-1)^{\ell_{k}(\bm{\alpha})}S_{\bm{\alpha}},
$$

$$
\sum_{\bm{\alpha}\in\mathcal{I}_{k+1}^{n}}\Delta S_{\bm{\alpha}}=\sum_{\bm{r}\in\{0,1\}^{d}}(-1)^{|\bm{r}|}\sum_{\bm{\alpha}\in\mathcal{I}_{k+1}^{n}}S_{\bm{\alpha}-\bm{r}}.
$$

$$
\sum_{\bm{\alpha}\in\mathcal{I}_{k+1}^{n}}S_{\bm{\alpha}-\bm{r}}=\sum_{I\subseteq\{1,\dots,d\}}\;\sum_{\substack{\bm{\alpha}\in\mathcal{I}_{k-\bm{r}+1}^{n-\bm{r}}\\ k<\bm{\alpha}_{i}<n\iff i\in I}}S_{\bm{\alpha}},
$$

$$
\sum_{\bm{\alpha}\in\mathcal{I}_{k+1}^{n}}\Delta S_{\bm{\alpha}}=\sum_{\bm{r}\in\{0,1\}^{d}}(-1)^{|\bm{r}|}S_{t_{k,n}(\bm{r})},
$$

$$
\big(t_{k,n}^{-1}(\bm{\alpha})\big)_{i}=\frac{n-\bm{\alpha}_{i}}{n-k}
$$

$$
Z_{n}\doteq\sum_{\bm{\alpha}\in\mathcal{I}_{0}^{\bm{N}\land n}}\frac{\Delta S_{\bm{\alpha}}}{\mathbb{P}(\bm{N}\geq\bm{\alpha})}=\sum_{\bm{\alpha}\in\mathcal{I}_{0}^{n}}\Delta S_{\bm{\alpha}}\frac{\mathbb{I}_{\{\bm{N}\geq\bm{\alpha}\}}}{\mathbb{P}(\bm{N}\geq\bm{\alpha})},
$$

$$
\mathbb{E}Z_{n}=\sum_{\bm{\alpha}\in\mathcal{I}_{0}^{n}}\mathbb{E}\Delta S_{\bm{\alpha}}\,\mathbb{E}\bigg[\frac{\mathbb{I}_{\{\bm{N}\geq\bm{\alpha}\}}}{\mathbb{P}(\bm{N}\geq\bm{\alpha})}\bigg]=\sum_{\bm{\alpha}\in\mathcal{I}_{0}^{n}}\mathbb{E}\Delta S_{\bm{\alpha}},
$$

$$
\mathbb{E}Z_{n}=\sum_{I\subseteq\{1,\dots,d\}}\;\sum_{\substack{\bm{\alpha}\in\mathcal{I}_{0}^{n}\\ \bm{\alpha}_{i}>0\iff i\in I}}\mathbb{E}\Delta_{I}S_{\bm{\alpha}},
$$

$$
\mathbb{E}Z_{n}=\sum_{I\subseteq\{1,\dots,d\}}\;\sum_{\substack{\bm{\alpha}\in\{0,n\}^{d}\\ \bm{\alpha}_{i}>0\implies i\in I}}(-1)^{|(n-\bm{\alpha}_{I})/n|}\,\mathbb{E}S_{\bm{\alpha}},
$$

$$
\mathbb{E}Z_{n}=\dots=\mathbb{E}S_{n},
$$

$$
\bm{\alpha}\lor\bm{\beta}=(\bm{\alpha}_{1}\lor\bm{\beta}_{1},\dots,\bm{\alpha}_{d}\lor\bm{\beta}_{d}).
$$

$$
\sum_{\bm{\alpha},\bm{\beta}\in\mathcal{I}_{0}^{\infty}}\frac{\Delta\|S_{\bm{\alpha}}-S\|_{2}\,\Delta\|S_{\bm{\beta}}-S\|_{2}}{\mathbb{P}(\bm{N}\geq\bm{\alpha}\lor\bm{\beta})}<\infty,
$$

$$
\mathbb{E}Z^{2}=\sum_{\bm{\alpha},\bm{\beta}\in\mathcal{I}_{0}^{\infty}}\nu_{\bm{\alpha},\bm{\beta}}\frac{\mathbb{P}(\bm{N}\geq\bm{\alpha}\lor\bm{\beta})}{\mathbb{P}(\bm{N}\geq\bm{\alpha})\,\mathbb{P}(\bm{N}\geq\bm{\beta})}.
$$

$$
\|Z_{n}-Z_{k}\|_{2}^{2}=\sum_{\bm{\alpha},\bm{\beta}\in\mathcal{I}_{k+1}^{n}}\mathbb{E}[\Delta S_{\bm{\alpha}}\Delta S_{\bm{\beta}}]\frac{\mathbb{P}(\bm{N}\geq\bm{\alpha}\lor\bm{\beta})}{\mathbb{P}(\bm{N}\geq\bm{\alpha})\,\mathbb{P}(\bm{N}\geq\bm{\beta})},
$$

$$
\|Z_{n}-Z_{k}\|_{2}^{2}\leq\sum_{\bm{\alpha},\bm{\beta}\in\mathcal{I}_{k+1}^{n}}\frac{\Delta\|S_{\bm{\alpha}}-S\|_{2}\,\Delta\|S_{\bm{\beta}}-S\|_{2}}{\mathbb{P}(\bm{N}\geq\bm{\alpha}\lor\bm{\beta})},
$$

$$
\sum_{\bm{\alpha},\bm{\beta}\in\mathcal{I}_{0}^{\infty}}\frac{\Delta\|S_{\bm{\alpha}}-S\|_{2}\,\Delta\|S_{\bm{\beta}}-S\|_{2}}{\mathbb{P}(\bm{N}\geq\bm{\alpha}\lor\bm{\beta})}<\infty,
$$

$$
\mathbb{E}Z^{2}=\sum_{\alpha\geq 0}\frac{\|S_{\alpha-1}-S\|_{2}^{2}-\|S_{\alpha}-S\|_{2}^{2}}{\mathbb{P}(N\geq\alpha)},
$$

$$
\sum_{\alpha\geq 1}\frac{\|S_{\alpha-1}-S\|_{2}^{2}}{\mathbb{P}(N\geq\alpha)}
$$

$$
\mathbb{E}Z^{2}=\sum_{\bm{\alpha},\bm{\beta}\in\mathcal{I}_{0}^{\infty}}\frac{\nu_{\bm{\alpha},\bm{\beta}}}{\mathbb{P}(\bm{N}\geq\bm{\alpha}\land\bm{\beta})}.
$$

Taxonomy

Topics: Probabilistic and Robust Engineering Design · Mathematical Approximation and Integration · Statistical Methods and Inference

Full text

Unbiased Multi-index Monte Carlo

Dan Crisan Department of Mathematics, Imperial College London, London, SW7 2AZ, UK.

Pierre Del Moral INRIA Bordeaux, 200 Avenue de la Vieille Tour, 33405 Talence, FR.

Jeremie Houssineau Department of Statistics and Applied Probability, National University of Singapore.

Ajay Jasra Department of Statistics and Applied Probability, National University of Singapore.

Abstract

We introduce a new class of Monte Carlo based approximations of expectations of random variables such that their laws are only available via certain discretizations. Sampling from the discretized versions of these laws can typically introduce a bias. In this paper, we show how to remove that bias, by introducing a new version of multi-index Monte Carlo (MIMC) that has the added advantage of reducing the computational effort, relative to i.i.d. sampling from the most precise discretization, for a given level of error. We cover extensions of results regarding variance and optimality criteria for the new approach. We apply the methodology to the problem of computing an unbiased mollified version of the solution of a partial differential equation with random coefficients. A second application concerns the Bayesian inference (the smoothing problem) of an infinite dimensional signal modelled by the solution of a stochastic partial differential equation that is observed on a discrete space grid and at discrete times. Both applications are complemented by numerical simulations.

keywords:

Monte Carlo, Multi-index discretization, Unbiasedness, Smoothing.

1 Introduction

Following Kalman’s seminal paper [11], the theory and applications of filtering, smoothing and prediction have expanded rapidly and massively, and the field is as strong today as it was fifty years ago. The area has evolved far beyond the original linear/Gaussian framework, with many current applications involving nonlinear signals (possibly high dimensional or even infinite dimensional) and nonlinear observations (see [4] for some of the recent developments). General filtering, smoothing and prediction problems no longer have explicit solutions: inference is achieved mostly through numerical methods. Methods based on Monte Carlo integration are very popular, particularly for high dimensional problems.

Monte Carlo methods have proved to be sensible in a variety of applications. However, in certain contexts, multilevel Monte Carlo (MLMC) integration has been introduced with success, see e.g. [5, 6]. This method is useful when one has a probability law associated to a discretization in a single dimension, for instance time. It can reduce the computational effort, relative to i.i.d. sampling (Monte Carlo) from the most precise discretization, for a given level of error. This has been extended in a subtle manner to discretizations in multiple dimensions, for instance space and time, by [8]. To be more precise, let us assume that we are given a probability measure $\pi$ defined on a measurable space $(E,\mathcal{E})$ and the collection $\mathcal{B}_{b}(E)$ of bounded measurable real-valued functions on $E$. We seek to compute, for $\varphi\in\mathcal{B}_{b}(E)$,

$$
\mathbb{E}[\varphi(X)]=\int_{E}\varphi(x)\,\pi(\mathrm{d}x),
$$

where $X$ is a random variable with law $\pi$ . We assume that the random variable $X$ is associated to a $d$ -dimensional multi-index continuum system, such as the one described in [8]. For instance, $\pi$ may be associated to the solution of a stochastic partial differential equation (SPDE). Such systems are found in a wide variety of applications, see [1] for examples. Alternatively $\pi$ can be the solution of a filtering/smoothing problem as we will explain later on.

Whilst the probability $\pi$ is not available to us, we have access to a class of biased approximations $(\pi_{\bm{\alpha}})_{\bm{\alpha}\in\mathbb{N}^{d}_{0}}$, where $\mathbb{N}^{d}_{0}$ is the set of multi-indices of length $d$ with non-negative integer entries. For instance, by using Monte Carlo integration, one can compute $\mathbb{E}[\varphi(X_{\bm{\alpha}})]$, where $X_{\bm{\alpha}}$ is a random variable with law $\pi_{\bm{\alpha}}$. Even though there is a bias, i.e. $\mathbb{E}[\varphi(X_{\bm{\alpha}})]\neq\mathbb{E}[\varphi(X)]$, we will assume that

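$$
\lim_{\bm{\alpha}\to\infty}\big|\mathbb{E}[\varphi(X_{\bm{\alpha}})]-\mathbb{E}[\varphi(X)]\big|=0,\tag{1}
$$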

where the limit $\bm{\alpha}\to\infty$ is understood as $\min_{1\leq i\leq d}\bm{\alpha}_{i}\to\infty$ and, naturally, that the computational cost associated with $\pi_{\bm{\alpha}}$ increases as the values of $\bm{\alpha}$ increase.

As an example of the above general context, consider discretely observing data associated to a signal modelled by the solution $(x_{t})_{t\geq 0}$ of a stochastic partial differential equation (a concrete example can be found in Section 4.3). Suppose that data is obtained at unit time interval and is denoted by $y_{1},\dots,y_{K}$ . We assume that, conditional upon $(x_{t})_{t\geq 0}$ , $y_{1},\dots,y_{K}$ are conditionally independent with density on a finite dimensional space $g(y_{k}|x_{k})$ with $x_{k}$ the solution of the SPDE at time $k$ . Moreover we assume $g$ is well defined for any $x_{k}$ , even if $x_{k}$ is discretized. Let $Q_{\bm{\alpha}}$ be the transition density of the SPDE under a time and space discretization corresponding to a multi-index $\bm{\alpha}\in\mathbb{N}^{d}_{0}$ . Given observed data $y_{1},\dots,y_{K}$ our objective is to compute expectations with respect to the following distribution (known as a smoother or smoothing distribution for $x_{1:K}$ ):

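$$
\pi_{\bm{\alpha}}(x_{1:K}\mid y_{1:K})\propto\prod_{k=1}^{K}g(y_{k}\mid x_{k})\,Q_{\bm{\alpha}}(x_{k}\mid x_{k-1})
$$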

where we assume that $x_{0}$ is given and we used the notation $x_{1:K}=(x_{1},\dots,x_{K})$ and $y_{1:K}=(y_{1},\dots,y_{K})$. If the class of discretizations is chosen so that Eq. 1 holds, then the methodology developed in this paper can be applied to solve this problem. It is worth noting that integrating with respect to $\pi$ is non-trivial and well known to be challenging. See, for instance, the work of [3, 12] and the references therein for more details.

1.1 Contribution and Structure

In the context described above, it is well known that the Monte Carlo approximation of $\mathbb{E}[\varphi(X_{\bm{\alpha}})]$, which we assume is required, can be significantly enhanced through the use of the MIMC method of [8]. This idea is intrinsically linked to the popular MLMC approach, in which the index is one-dimensional. Writing the indices as $1,\dots,L$, where 1 is the coarsest discretization, $L$ is the finest, and the discretization is refined as the index increases, we have

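$$
\mathbb{E}[\varphi(X_{L})]=\mathbb{E}[\varphi(X_{1})]+\sum_{l=2}^{L}\big\{\mathbb{E}[\varphi(X_{l})]-\mathbb{E}[\varphi(X_{l-1})]\big\}
$$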

that is, introducing a collapsing sum representation of the expectation w.r.t. the finest discretization. The idea is then to sample the pair of approximations $(\pi_{l},\pi_{l-1})$ dependently (i.e. to couple them), independently for each $l\in\{2,\dots,L\}$. Note that the case $l=1$ is just i.i.d. sampling from $\pi_{1}$. If the coupled pairs are sufficiently positively correlated, then it is possible to use fewer simulations at the fine discretizations (which are expensive to sample) and more at the coarse discretizations, in such a way that the cost of obtaining a prescribed mean square error with the MLMC approximation is less than that of i.i.d. sampling from the finest discretization. In the MIMC context the procedure is more challenging, but similar reductions in computational cost are possible.
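The collapsing sum and the coupling of consecutive levels can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy model takes $X\sim\mathrm{Uniform}(0,1)$ and uses the Euler approximation $(1+X/2^{l})^{2^{l}}$ of $e^{X}$ as the level-$l$ quantity, and the names `euler_exp`, `mlmc_estimate` and `exact_level_mean` are hypothetical.

```python
import random

def euler_exp(x, level):
    """Euler approximation of exp(x) with 2**level steps (toy discretization)."""
    n = 2 ** level
    return (1.0 + x / n) ** n

def mlmc_estimate(num_samples, rng):
    """Telescoping-sum estimate of the expectation at level L = len(num_samples).

    num_samples[l - 1] is the number of coupled draws used at level l.
    """
    total = 0.0
    for level, n_l in enumerate(num_samples, start=1):
        acc = 0.0
        for _ in range(n_l):
            x = rng.random()  # the shared input couples levels l and l - 1
            fine = euler_exp(x, level)
            coarse = euler_exp(x, level - 1) if level > 1 else 0.0
            acc += fine - coarse
        total += acc / n_l
    return total

def exact_level_mean(level):
    """Closed form of E[(1 + X/n)^n] for X ~ Uniform(0, 1) and n = 2**level."""
    n = 2 ** level
    return n / (n + 1) * ((1.0 + 1.0 / n) ** (n + 1) - 1.0)

rng = random.Random(0)
samples_per_level = [40000, 20000, 10000, 5000, 2500]  # fewer draws at finer levels
estimate = mlmc_estimate(samples_per_level, rng)
print(estimate, exact_level_mean(len(samples_per_level)))
```

Because the same draw `x` feeds both levels of each pair, the differences have small variance, which is what allows the finer levels to use fewer samples. The estimate still targets the level-$L$ expectation, so the discretization bias remains.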

A randomized version of the MLMC approach has been developed in [13], which removed the discretization bias. In this work, we show how one can extend this idea in the context where the discretization parameters are in multiple dimensions. This extension allows for a judicious allocation of the computational effort in order to take into account the variance of the target distribution discretization in separate dimensions. In particular, Monte Carlo approximations are constructed to entirely remove the discretization bias, that is, to approximate $\mathbb{E}[\varphi(X)]$ directly. We also analyze the variance of the methodology and propose several original optimality criteria for its implementation. Several simulated examples are considered.

This article is structured as follows. In Section 2 we give some notations, the approach and some preliminary results. In Section 3 our main theoretical results and the corresponding proofs are given. In Section 4 the new methodology is illustrated by numerical examples. Section 5 summarizes our work, with a discussion of future work.

2 Notation and Preliminary Results

Throughout the article, a complete probability space $(\Omega,\mathcal{F},\mathbb{P})$ is considered with $\mathbb{E}$ denoting the expectation with respect to $\mathbb{P}$ and $\mathbb{I}_{A}$ denoting the random variable on $(\Omega,\mathcal{F},\mathbb{P})$ defined as the indicator of the event $A\in\mathcal{F}$ .

We work on the lattice $\mathbb{N}_{0}^{d}$ for some $d>0$ equipped with the natural partial order $\leq$ which is defined as $\bm{m}\leq\bm{n}$ if and only if $\bm{m}_{i}\leq\bm{n}_{i}$ for all $1\leq i\leq d$ . Note that different total orders can also be defined on this $d$ -dimensional lattice, such as the lexicographical order, but these total orders are to some extent arbitrary and will not be directly useful in the context we consider in this paper. For the sake of simplicity, $S$ and $S_{\bm{\alpha}}$ will denote the random variables $\varphi(X)$ and $\varphi(X_{\bm{\alpha}})$ respectively, for any $\bm{\alpha}\in\mathbb{N}^{d}_{0}$ . Let $\bm{m},\bm{n}\in\mathbb{N}_{0}^{d}$ such that $\bm{m}\leq\bm{n}$ and consider

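$$
\mathcal{I}_{\bm{m}}^{\bm{n}}\doteq\{\bm{\alpha}\in\mathbb{N}_{0}^{d}:\bm{m}_{1}\leq\bm{\alpha}_{1}\leq\bm{n}_{1},\dots,\bm{m}_{d}\leq\bm{\alpha}_{d}\leq\bm{n}_{d}\}.
$$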

By a usual abuse of notation, $\mathcal{I}_{\bm{m}}^{n}$ is used instead of $\mathcal{I}_{\bm{m}}^{\bm{n}}$ if the superscript $\bm{n}$ verifies $\bm{n}_{i}=n$ for all $1\leq i\leq d$ , and similarly for the subscript. It holds that $\mathcal{I}^{\infty}_{0}=\mathbb{N}^{d}_{0}$ .

An estimator $Z$ of $\mathbb{E}S$ is defined as

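$$
Z\doteq\sum_{\bm{\alpha}\in\mathcal{I}_{0}^{\bm{N}}}\frac{\Delta S_{\bm{\alpha}}}{\mathbb{P}(\bm{N}\geq\bm{\alpha})},\tag{2}
$$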

where $\bm{N}$ is a random variable on $\mathbb{N}_{0}^{d}$ independent of $(S_{\bm{\alpha}})_{\bm{\alpha}}$ which guarantees that the estimator is unbiased, i.e. it ensures that $\mathbb{E}Z=\mathbb{E}S$ holds, and where $\Delta\doteq\Delta_{1}\dots\Delta_{d}$ with

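$$
\Delta_{i}S_{\bm{\alpha}}=\begin{cases}S_{\bm{\alpha}}-S_{\bm{\alpha}-\bm{e}_{i}}&\text{if }\bm{\alpha}_{i}>0\\ S_{\bm{\alpha}}&\text{if }\bm{\alpha}_{i}=0,\end{cases}
$$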

for any $i\in\{1,\dots,d\}$ where $\bm{e}_{i}$ is the element of $\mathbb{N}^{d}_{0}$ such that $(\bm{e}_{i})_{i}=1$ and $(\bm{e}_{i})_{j}=0$ if $i\neq j$ . The order of the application of the operators $\{\Delta_{i}\}_{i=1}^{d}$ in the definition of $\Delta$ is irrelevant since these operators can be easily seen to commute. The choice of a vector-valued random variable $\bm{N}$ in Eq. 2 is justified by the fact that there might be interest in calculating the sum up to a non-diagonal index. For instance, relying on the increments with some very high and some very low components might yield an estimator with low variance at a reasonable computational cost. The following lemma will be useful to prove that the estimator $Z$ is unbiased.

Lemma 2.1.

The increment $\Delta S_{\bm{\alpha}}$ can be rewritten for any $\bm{\alpha}\geq 1$ as

$$
\Delta S_{\bm{\alpha}}=\sum_{\bm{r}\in\{0,1\}^{d}}(-1)^{|\bm{r}|}S_{\bm{\alpha}-\bm{r}},\tag{3}
$$

with $|\cdot|$ defined as the 1-norm on $\mathbb{N}_{0}^{d}$.

Proof 2.2.

This result can be proved by induction on the dimension $d$. The case $d=1$ is obvious. If Eq. 3 is assumed to hold for a given $d\in\mathbb{N}$ then for any $\alpha^{\prime}\in\mathbb{N}$,

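$$
\begin{aligned}
\Delta_{d+1}\Delta_{d}\dots\Delta_{1}S_{(\bm{\alpha},\alpha^{\prime})}
&=\sum_{\bm{r}\in\{0,1\}^{d}}\sum_{r^{\prime}\in\{0,1\}}(-1)^{|\bm{r}|+r^{\prime}}S_{(\bm{\alpha}-\bm{r},\alpha^{\prime}-r^{\prime})}\\
&=\sum_{\bm{r}\in\{0,1\}^{d+1}}(-1)^{|\bm{r}|}S_{\bm{\beta}-\bm{r}}
\end{aligned}
$$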

where $\bm{\beta}=(\bm{\alpha},\alpha^{\prime})$, hence showing that the relation is true for the dimension $d+1$.

It follows from Lemma 2.1 that a given term $S_{\bm{\alpha}}$ is going to appear exactly once in each of the increments $\Delta S_{\bm{\alpha}+\bm{r}}$ with $\bm{r}\in\{0,1\}^{d}$ , that is $|\{0,1\}^{d}|=2^{d}$ times, negatively in $2^{d-1}$ of the increments and positively in the other $2^{d-1}$ increments, therefore cancelling in $Z$ . However, this does not take into account cases where the condition $\bm{\alpha}\geq 1$ is not satisfied and is only valid for terms $S_{\bm{\alpha}}$ for which all the $\{S_{\bm{\alpha}+\bm{r}}\}_{\bm{r}\in\{0,1\}^{d}}$ are included in the considered sum.
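Lemma 2.1 can be checked numerically on a small example. The sketch below composes the coordinate-wise operators $\Delta_{i}$ as defined above and compares the result with the signed sum over $\bm{r}\in\{0,1\}^{d}$. The test function `S_toy` is an arbitrary integer-valued choice made here so that the comparison is exact, and all function names are hypothetical.

```python
from itertools import product

def delta_i(S, i):
    """Return the function alpha -> Delta_i S_alpha (backward difference in coordinate i)."""
    def dS(alpha):
        if alpha[i] == 0:
            return S(alpha)
        shifted = alpha[:i] + (alpha[i] - 1,) + alpha[i + 1:]
        return S(alpha) - S(shifted)
    return dS

def delta(S, d):
    """Compose Delta_1 ... Delta_d; the order is irrelevant since the operators commute."""
    for i in range(d):
        S = delta_i(S, i)
    return S

def signed_sum(S, alpha):
    """Right-hand side of Lemma 2.1: sum over r in {0,1}^d of (-1)^{|r|} S_{alpha - r}."""
    total = 0
    for r in product((0, 1), repeat=len(alpha)):
        shifted = tuple(a - b for a, b in zip(alpha, r))
        total += (-1) ** sum(r) * S(shifted)
    return total

def S_toy(alpha):
    # arbitrary non-separable integer-valued function of the multi-index
    value = 1
    for j, a in enumerate(alpha):
        value *= (a + j + 1) ** (j + 1)
    return value + sum(alpha)

d = 3
dS = delta(S_toy, d)
lemma_holds = all(
    dS(alpha) == signed_sum(S_toy, alpha)
    for alpha in product(range(1, 5), repeat=d)  # only multi-indices with alpha >= 1
)
print(lemma_holds)
```

The check is restricted to $\bm{\alpha}\geq 1$, as in the lemma. For indices with a zero component, `delta` still applies the second branch of the definition of $\Delta_{i}$, whereas `signed_sum` would require values of $S$ at negative indices.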

Lemma 2.3.

For any $k,n\in\mathbb{N}_{0}$ such that $n>k$ , it holds that

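$$
\sum_{\bm{\alpha}\in\mathcal{I}_{k+1}^{n}}\Delta S_{\bm{\alpha}}=\sum_{\bm{\alpha}\in\{k,n\}^{d}}(-1)^{\ell_{k}(\bm{\alpha})}S_{\bm{\alpha}},
$$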

where $\ell_{k}(\bm{\alpha})$ is the number of components of $\bm{\alpha}$ equal to $k$.

Proof 2.4.

From Lemma 2.1, it holds that

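$$
\sum_{\bm{\alpha}\in\mathcal{I}_{k+1}^{n}}\Delta S_{\bm{\alpha}}=\sum_{\bm{r}\in\{0,1\}^{d}}(-1)^{|\bm{r}|}\sum_{\bm{\alpha}\in\mathcal{I}_{k+1}^{n}}S_{\bm{\alpha}-\bm{r}}.
$$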

The inner sum on the r.h.s. can be written as

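$$
\sum_{\bm{\alpha}\in\mathcal{I}_{k+1}^{n}}S_{\bm{\alpha}-\bm{r}}=\sum_{I\subseteq\{1,\dots,d\}}\;\sum_{\substack{\bm{\alpha}\in\mathcal{I}_{k-\bm{r}+1}^{n-\bm{r}}\\ k<\bm{\alpha}_{i}<n\iff i\in I}}S_{\bm{\alpha}},
$$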

for any $\bm{r}\in\{0,1\}^{d}$ . Denoting $|I|$ the cardinality of a set $I$ , it follows that the terms $S_{\bm{\alpha}}$ corresponding to a non-empty subset $I$ of $\{1,\dots,d\}$ appear $2^{|I|}$ times, $2^{|I|-1}$ times positively and $2^{|I|-1}$ times negatively, therefore cancelling out. The remaining terms are the ones corresponding to $I=\emptyset$ since they only appear $2^{|I|}=1$ time. It follows that

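$$
\sum_{\bm{\alpha}\in\mathcal{I}_{k+1}^{n}}\Delta S_{\bm{\alpha}}=\sum_{\bm{r}\in\{0,1\}^{d}}(-1)^{|\bm{r}|}S_{t_{k,n}(\bm{r})},
$$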

where $t_{k,n}:\{0,1\}^{d}\to\{k,n\}^{d}$ is characterised by $(t_{k,n}(\bm{r}))_{i}=k^{\bm{r}_{i}}n^{1-\bm{r}_{i}}$ for any $i\in\{1,\dots,d\}$ and any $\bm{r}\in\{0,1\}^{d}$ . The result of the lemma follows by a change of variable $\bm{r}\to\bm{\alpha}=t_{k,n}(\bm{r})$ in the sum on the r.h.s. and by verifying that

$$
\big(t_{k,n}^{-1}(\bm{\alpha})\big)_{i}=\frac{n-\bm{\alpha}_{i}}{n-k}
$$

holds for any $i\in\{1,\dots,d\}$, so that $\big|t^{-1}_{k,n}(\bm{\alpha})\big|=\ell_{k}(\bm{\alpha})$.
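The statement of Lemma 2.3 can also be verified directly for small values of $k$, $n$ and $d$: the sum of the increments over the box $\mathcal{I}_{k+1}^{n}$ should reduce to a signed sum over the $2^{d}$ corners in $\{k,n\}^{d}$. The sketch below is a numerical check only; the test function `S_toy` and the helper names are hypothetical.

```python
from itertools import product

def increment(S, alpha):
    """Delta S_alpha via Lemma 2.1 (valid here because every alpha >= k + 1 >= 1)."""
    return sum(
        (-1) ** sum(r) * S(tuple(a - b for a, b in zip(alpha, r)))
        for r in product((0, 1), repeat=len(alpha))
    )

def box_sum(S, k, n, d):
    """Left-hand side: sum of Delta S_alpha over alpha with k + 1 <= alpha_i <= n."""
    return sum(increment(S, alpha) for alpha in product(range(k + 1, n + 1), repeat=d))

def corner_sum(S, k, n, d):
    """Right-hand side: sum over the corners {k, n}^d with sign (-1)^{l_k(alpha)}."""
    total = 0
    for alpha in product((k, n), repeat=d):
        ell_k = sum(1 for a in alpha if a == k)  # number of components equal to k
        total += (-1) ** ell_k * S(alpha)
    return total

def S_toy(alpha):
    # arbitrary non-separable integer-valued function, so equality is exact
    value = 1
    for j, a in enumerate(alpha):
        value *= (a + j + 1) ** (j + 1)
    return value + sum(alpha)

checks = [(0, 3, 2), (1, 4, 2), (2, 5, 3)]  # (k, n, d) triples with n > k
results = [box_sum(S_toy, k, n, d) == corner_sum(S_toy, k, n, d) for k, n, d in checks]
print(results)
```

For $d=1$ the identity is the usual telescoping sum $S_{n}-S_{k}$; in higher dimensions it is an inclusion-exclusion over the corners of the box.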

3 Main Theoretical Results

We now consider several theoretical results for our approach, which justify its practical implementation.

3.1 Unbiasedness

Lemma 2.3 still does not apply to sums of increments containing indices $\bm{\alpha}$ that do not verify $\bm{\alpha}\geq 1$. Removing this last restriction leads to the following theorem.

Theorem 3.1.

The estimator $Z$ is unbiased.

Proof 3.2.

Let $Z_{n}$ be a partial version of the estimator $Z$ defined as

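$$
Z_{n}\doteq\sum_{\bm{\alpha}\in\mathcal{I}_{0}^{\bm{N}\land n}}\frac{\Delta S_{\bm{\alpha}}}{\mathbb{P}(\bm{N}\geq\bm{\alpha})}=\sum_{\bm{\alpha}\in\mathcal{I}_{0}^{n}}\Delta S_{\bm{\alpha}}\frac{\mathbb{I}_{\{\bm{N}\geq\bm{\alpha}\}}}{\mathbb{P}(\bm{N}\geq\bm{\alpha})},
$$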

Because of the independence between $\bm{N}$ and the $\{S_{\bm{\alpha}}\}_{\bm{\alpha}}$ , the estimator $Z_{n}$ satisfies

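$$
\mathbb{E}Z_{n}=\sum_{\bm{\alpha}\in\mathcal{I}_{0}^{n}}\mathbb{E}\Delta S_{\bm{\alpha}}\,\mathbb{E}\bigg[\frac{\mathbb{I}_{\{\bm{N}\geq\bm{\alpha}\}}}{\mathbb{P}(\bm{N}\geq\bm{\alpha})}\bigg]=\sum_{\bm{\alpha}\in\mathcal{I}_{0}^{n}}\mathbb{E}\Delta S_{\bm{\alpha}},
$$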

which can be further expressed as

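$$
\mathbb{E}Z_{n}=\sum_{I\subseteq\{1,\dots,d\}}\;\sum_{\substack{\bm{\alpha}\in\mathcal{I}_{0}^{n}\\ \bm{\alpha}_{i}>0\iff i\in I}}\mathbb{E}\Delta_{I}S_{\bm{\alpha}},
$$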

where, defining $k\doteq|I|$ and denoting $I=\{i_{1},\dots,i_{k}\}$ , the operator $\Delta_{I}$ is defined as $\Delta_{I}=\Delta_{i_{1}}\dots\Delta_{i_{k}}$ . In the case $I=\emptyset$ , the inner sum is equal to $S_{\mathbf{0}}$ . Using Lemma 2.3, it follows that

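$$
\mathbb{E}Z_{n}=\sum_{I\subseteq\{1,\dots,d\}}\;\sum_{\substack{\bm{\alpha}\in\{0,n\}^{d}\\ \bm{\alpha}_{i}>0\implies i\in I}}(-1)^{|(n-\bm{\alpha}_{I})/n|}\,\mathbb{E}S_{\bm{\alpha}},
$$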

where, considering $\bm{\alpha}$ as a function from $\{1,\dots,d\}$ to $\{0,\dots,n\}$, $\bm{\alpha}_{I}$ denotes the restriction of $\bm{\alpha}$ to the set $I$. Therefore, for any $\bm{\alpha}\in\{0,n\}^{d}$, the term $\mathbb{E}S_{\bm{\alpha}}$ appears once in the inner sum whenever the support $\operatorname{supp}(\bm{\alpha})$ of $\bm{\alpha}$ is included in $I$. Denoting $s(\bm{\alpha})\doteq|\operatorname{supp}(\bm{\alpha})|$, it holds that if $|I\setminus\operatorname{supp}(\bm{\alpha})|=l$ then $\mathbb{E}S_{\bm{\alpha}}$ appears $\binom{d-s(\bm{\alpha})}{l}$ times, positively if $l$ is even and negatively if $l$ is odd. It follows that

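$$
\begin{aligned}
\mathbb{E}Z_{n}&=\sum_{\bm{\alpha}\in\{0,n\}^{d}}\mathbb{E}S_{\bm{\alpha}}\sum_{l=0}^{d-s(\bm{\alpha})}\binom{d-s(\bm{\alpha})}{l}(-1)^{l}\\
&=\mathbb{E}S_{n},
\end{aligned}
$$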

since the binomial formula in the first line differs from zero only when $\bm{\alpha}\in\{0,n\}^{d}$ verifies $s(\bm{\alpha})=d$, that is when $\bm{\alpha}=(n,n,\dots)$. The desired result follows by taking the limit $n\to\infty$ under the condition stated in Eq. 1.
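Theorem 3.1 can be illustrated by simulation on a toy model. The following sketch is not one of the paper's examples and every modelling choice in it is an assumption made for illustration: $d=2$, $S_{\bm{\alpha}}=U\prod_{i}(1-2^{-(\bm{\alpha}_{i}+1)})$ with $U\sim\mathrm{Uniform}(0,1)$, so that $S=U$ and $\mathbb{E}S=1/2$, and $\bm{N}$ with independent components satisfying $\mathbb{P}(\bm{N}_{i}\geq a)=2^{-1.5a}$. The names `sample_N`, `increment` and `sample_Z` are hypothetical.

```python
import math
import random
from itertools import product

Q = 2.0 ** -1.5  # assumed tail: P(N_i >= a) = Q**a, with independent components

def sample_N(d, rng):
    """Draw N with independent components such that P(N_i >= a) = Q**a."""
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined
    return tuple(int(math.log(1.0 - rng.random()) / math.log(Q)) for _ in range(d))

def S(alpha, u):
    """Toy approximation: S_alpha tends to S = u as min(alpha) tends to infinity."""
    value = u
    for a in alpha:
        value *= 1.0 - 2.0 ** -(a + 1)
    return value

def increment(alpha, u):
    """Delta S_alpha, where Delta_i leaves S unchanged in coordinates with alpha_i = 0."""
    total = 0.0
    for r in product((0, 1), repeat=len(alpha)):
        shifted = tuple(a - b for a, b in zip(alpha, r))
        if min(shifted) < 0:
            continue  # terms with a negative index are absent from the increment
        total += (-1) ** sum(r) * S(shifted, u)
    return total

def sample_Z(d, rng):
    """One realisation of Z = sum over alpha <= N of Delta S_alpha / P(N >= alpha)."""
    n = sample_N(d, rng)
    u = rng.random()  # shared randomness couples all the S_alpha
    z = 0.0
    for alpha in product(*(range(n_i + 1) for n_i in n)):
        z += increment(alpha, u) / Q ** sum(alpha)
    return z

rng = random.Random(1)
num_draws = 100_000
mean_Z = sum(sample_Z(2, rng) for _ in range(num_draws)) / num_draws
print(mean_Z)  # should be close to E[S] = 0.5, up to Monte Carlo error
```

Each realisation only evaluates finitely many $S_{\bm{\alpha}}$, yet the average targets $\mathbb{E}S$ rather than any $\mathbb{E}S_{\bm{\alpha}}$. The tail exponent $1.5$ was picked so that, for this particular toy model, both the expected number of evaluations and the second moment of $Z$ are finite.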

3.2 Variance of the unbiased estimator

Being able to determine the variance of the unbiased estimator will be important when looking for an optimal distribution for the random variable $\bm{N}$ . We give expressions of the variance of $Z$ as well as for useful special cases. An additional notation is required for the statement of the following proposition: $\bm{\alpha}\lor\bm{\beta}$ denotes the component-wise maximum of any $\bm{\alpha}$ and $\bm{\beta}$ in $\mathbb{N}^{d}_{0}$ , that is

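$$
\bm{\alpha}\lor\bm{\beta}=(\bm{\alpha}_{1}\lor\bm{\beta}_{1},\dots,\bm{\alpha}_{d}\lor\bm{\beta}_{d}).
$$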

Proposition 3.3.

Assuming that

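$$
\sum_{\bm{\alpha},\bm{\beta}\in\mathcal{I}_{0}^{\infty}}\frac{\Delta\|S_{\bm{\alpha}}-S\|_{2}\,\Delta\|S_{\bm{\beta}}-S\|_{2}}{\mathbb{P}(\bm{N}\geq\bm{\alpha}\lor\bm{\beta})}<\infty,\tag{6}
$$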

the second moment of $Z$ exists and is found to be

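$$
\mathbb{E}Z^{2}=\sum_{\bm{\alpha},\bm{\beta}\in\mathcal{I}_{0}^{\infty}}\nu_{\bm{\alpha},\bm{\beta}}\frac{\mathbb{P}(\bm{N}\geq\bm{\alpha}\lor\bm{\beta})}{\mathbb{P}(\bm{N}\geq\bm{\alpha})\,\mathbb{P}(\bm{N}\geq\bm{\beta})}.\tag{7}
$$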

where $\nu_{\bm{\alpha},\bm{\beta}}=\mathbb{E}[\Delta S_{\bm{\alpha}}\Delta S_{\bm{\beta}}]$.

Remark 3.4.

The condition stated in Eq. 6 will hold if the probability $\mathbb{P}(\bm{N}\geq\bm{\alpha}\lor\bm{\beta})$ decreases sufficiently slowly when compared with the discretization error $\Delta\|S_{\bm{\alpha}}-S\|_{2}\Delta\|S_{\bm{\beta}}-S\|_{2}$. For instance, when solving a partial differential equation, the tail of the distribution of $\bm{N}$ should not be smaller than the decay of the error associated with the refinement of the mesh.

Proof 3.5.

In order to study the variance of the estimator, consider

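$$
\|Z_{n}-Z_{k}\|_{2}^{2}=\sum_{\bm{\alpha},\bm{\beta}\in\mathcal{I}_{k+1}^{n}}\mathbb{E}[\Delta S_{\bm{\alpha}}\Delta S_{\bm{\beta}}]\frac{\mathbb{P}(\bm{N}\geq\bm{\alpha}\lor\bm{\beta})}{\mathbb{P}(\bm{N}\geq\bm{\alpha})\,\mathbb{P}(\bm{N}\geq\bm{\beta})},
$$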

where $\|\cdot\|_{2}$ is the $L^{2}$ -norm. For the same reasons as before, it can be verified that $\Delta S_{\bm{\alpha}}=\Delta(S_{\bm{\alpha}}-S)$ holds by adding and subtracting $2^{d-1}$ times the random variable $S$ . It follows that

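$$
\|Z_{n}-Z_{k}\|_{2}^{2}\leq\sum_{\bm{\alpha},\bm{\beta}\in\mathcal{I}_{k+1}^{n}}\frac{\Delta\|S_{\bm{\alpha}}-S\|_{2}\,\Delta\|S_{\bm{\beta}}-S\|_{2}}{\mathbb{P}(\bm{N}\geq\bm{\alpha}\lor\bm{\beta})},
$$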

where the inequality $\mathbb{P}(\bm{N}\geq\bm{\alpha})\mathbb{P}(\bm{N}\geq\bm{\beta})\geq\mathbb{P}(\bm{N}\geq\bm{\alpha}\lor\bm{\beta})^{2}$ , which holds for any $\bm{\alpha},\bm{\beta}\in\mathbb{N}^{d}_{0}$ , has been used. Assuming that

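$$
\sum_{\bm{\alpha},\bm{\beta}\in\mathcal{I}_{0}^{\infty}}\frac{\Delta\|S_{\bm{\alpha}}-S\|_{2}\,\Delta\|S_{\bm{\beta}}-S\|_{2}}{\mathbb{P}(\bm{N}\geq\bm{\alpha}\lor\bm{\beta})}<\infty,
$$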

it follows that $\|Z_{n}-Z_{k}\|^{2}_{2}$ can be made arbitrarily small by considering $k$ large enough, so that $(Z_{n})_{n}$ is a Cauchy sequence which therefore converges in the Hilbert space $L^{2}$, so that the second moment of $Z$ is finite. This completes the proof of the proposition.

Remark 3.6.

By considering a total order $\mathbin{\dot{\leq}}$ on $\mathbb{N}_{0}^{d}$ such as the lexicographical order, double sums over $\bm{\alpha}$ and $\bm{\beta}$ in some given subset of $\mathbb{N}^{d}_{0}$ could be split into diagonal and non-diagonal elements, the latter being simplified to terms verifying $\bm{\alpha}\mathbin{\dot{\leq}}\bm{\beta}$. However, this would not allow for simplifying the indicator function $\mathbb{I}_{\{\bm{N}\geq\bm{\alpha}\lor\bm{\beta}\}}$ since $\bm{\alpha}\mathbin{\dot{\leq}}\bm{\beta}$ does not imply that $\bm{\alpha}\lor\bm{\beta}=\bm{\beta}$ in general.

Remark 3.7.

The case $d=1$ has been studied in [13, Theorem 1] and yields an expression of the second moment $\mathbb{E}Z^{2}$ which simplifies drastically and which can be expressed as a single sum as

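$$
\mathbb{E}Z^{2}=\sum_{\alpha\geq 0}\frac{\|S_{\alpha-1}-S\|_{2}^{2}-\|S_{\alpha}-S\|_{2}^{2}}{\mathbb{P}(N\geq\alpha)},
$$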

where $\bm{\alpha}$ and $\bm{N}$ are now integers. The condition for the existence of $\mathbb{E}Z^{2}$ reduces to

$$
\sum_{\alpha\geq 1}\frac{\|S_{\alpha-1}-S\|_{2}^{2}}{\mathbb{P}(N\geq\alpha)}<\infty
$$

for $d=1$, which is much simpler to verify than Eq. 6.

The random variable $\bm{N}$ can be chosen in such a way as to simplify Eq. 7: If the components of $\bm{N}$ are assumed to be independent random variables, then

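$$
\mathbb{E}Z^{2}=\sum_{\bm{\alpha},\bm{\beta}\in\mathcal{I}_{0}^{\infty}}\frac{\nu_{\bm{\alpha},\bm{\beta}}}{\mathbb{P}(\bm{N}\geq\bm{\alpha}\land\bm{\beta})}.
$$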

However, this expression of the second moment is still more complicated than in the case $d=1$ detailed in Remark 3.7 as it involves a double sum. Yet, another special case of the estimator $Z$ can be obtained by assuming that $\bm{N}$ is a constant function, i.e. that $\bm{N}_{i}=\bm{N}_{j}$ almost surely for any $i,j\in\{1,\dots,d\}$ . This estimator will be denoted $Z^{\prime}$ and is expressed as follows

[TABLE]

where $N$ is the integer-valued random variable induced by $\bm{N}=(N,N,\dots)$ and where $|\cdot|_{\infty}$ is the supremum norm. Since $Z^{\prime}$ is defined as the estimator $Z$ for a special choice for $\bm{N}$ , it is also unbiased.
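For reference, the simplification of the second moment obtained when the components of $\bm{N}$ are independent can be traced back to a component-wise identity. The short derivation below is a sketch under the sole assumption that $\mathbb{P}(\bm{N}\geq\bm{\alpha})=\prod_{i=1}^{d}\mathbb{P}(\bm{N}_{i}\geq\bm{\alpha}_{i})$, with $\land$ denoting the component-wise minimum.

```latex
\begin{align*}
\frac{\mathbb{P}(\bm{N}\geq\bm{\alpha}\lor\bm{\beta})}
     {\mathbb{P}(\bm{N}\geq\bm{\alpha})\,\mathbb{P}(\bm{N}\geq\bm{\beta})}
&=\prod_{i=1}^{d}
  \frac{\mathbb{P}(\bm{N}_{i}\geq\bm{\alpha}_{i}\lor\bm{\beta}_{i})}
       {\mathbb{P}(\bm{N}_{i}\geq\bm{\alpha}_{i})\,\mathbb{P}(\bm{N}_{i}\geq\bm{\beta}_{i})}\\
&=\prod_{i=1}^{d}\frac{1}{\mathbb{P}(\bm{N}_{i}\geq\bm{\alpha}_{i}\land\bm{\beta}_{i})}
 =\frac{1}{\mathbb{P}(\bm{N}\geq\bm{\alpha}\land\bm{\beta})}.
\end{align*}
```

The second equality uses the fact that, for each $i$, the pair $(\bm{\alpha}_{i}\lor\bm{\beta}_{i},\bm{\alpha}_{i}\land\bm{\beta}_{i})$ is a reordering of $(\bm{\alpha}_{i},\bm{\beta}_{i})$, so the factor involving the maximum cancels one of the two factors in the denominator.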

Proposition 3.8.

If it exists, the second moment of the estimator $Z^{\prime}$ takes the form

[TABLE]

with $\nu^{\prime}_{\bm{\alpha}}\doteq\mathbb{E}\big[\Delta S_{\bm{\alpha}}\big((S-S_{|\bm{\alpha}|_{\infty}-1})+(S-S_{|\bm{\alpha}|_{\infty}})\big)\big]$.

Remark 3.9.

The expression of the second moment $\mathbb{E}Z^{\prime 2}$ is closer to the one obtained in Remark 3.7 for the case $d=1$ . This is natural since the simplification from $Z$ to $Z^{\prime}$ amounts to making $\bm{N}$ single-variate, so that only the terms $\{S_{\bm{\alpha}}\}_{\bm{\alpha}}$ retain their multi-index nature. The expression of $\nu^{\prime}_{\bm{\alpha}}$ for $d=1$ can be recovered easily as

[TABLE]

Proof 3.10.

As in the proof of Proposition 3.3, a partial version of $Z^{\prime}$ can be introduced as

[TABLE]

It holds that

[TABLE]

where the following relations have been used with $m=|\bm{\alpha}|_{\infty}\leq n$ :

[TABLE]

and

[TABLE]

The desired result is obtained by rearranging the terms and taking the limit $n\to\infty$.

We note that if one produces independent realizations of $Z^{\prime}$ then Proposition 3.8 can be used to obtain a specific variance. That is, for specific models (see e.g. [8]) one has expressions for $\nu^{\prime}_{\bm{\alpha}}$ , appropriately centered, in terms of a function $\psi(\bm{\alpha})$ which goes to zero as $\min_{i}\alpha_{i}\rightarrow\infty$ such that $\mathbb{E}Z^{\prime 2}<+\infty$ . Then, for some $\epsilon>0$ , one can choose the number of samples to make the variance $\mathcal{O}(\epsilon^{2})$ as in the MLMC/MIMC literature [5, 8].

Following [13], a variant of the estimator $Z$ can be introduced as follows

[TABLE]

where the random variable $\tilde{\Delta}_{\bm{\alpha}}$ is defined for any $\bm{\alpha}\in\mathbb{N}_{0}^{d}$ as

[TABLE]

with the joint random variable $(\tilde{S}_{\bm{\alpha}-\bm{r}})_{\bm{r}\in\{0,1\}^{d}}$ having the same marginal distributions as the joint $(S_{\bm{\alpha}-\bm{r}})_{\bm{r}\in\{0,1\}^{d}}$ , making $\tilde{Z}$ unbiased. The estimators $Z$ and $\tilde{Z}$ can be respectively referred to as the coupled-sum estimator and the independent-sum estimator. A simpler version of the estimator $\tilde{Z}$ can be introduced as previously for $Z$ by assuming that realisations of $\bm{N}$ are constant on $\mathbb{N}_{0}^{d}$ almost surely:

[TABLE]

Proposition 3.11.

If it exists, the second moment of the estimator $\tilde{Z}^{\prime}$ is found to be

[TABLE]

with $\tilde{\nu}^{\prime}_{\bm{\alpha}}\doteq\operatorname{var}(\Delta S_{\bm{\alpha}})+\mathbb{E}\Delta S_{\bm{\alpha}}\big((\mathbb{E}S-\mathbb{E}S_{|\bm{\alpha}|_{\infty}-1})+(\mathbb{E}S-\mathbb{E}S_{|\bm{\alpha}|_{\infty}})\big)$.

Proof 3.12.

The partial version $\tilde{Z}^{\prime}_{n}$ of the estimator $\tilde{Z}^{\prime}$ is introduced as before with $Z^{\prime}$ and verifies

[TABLE]

from which the result of the proposition follows.

3.3 Optimal distribution for $\bm{N}$

Since $\bm{N}$ is a design random variable, its distribution can be chosen in a way that maximises the performance of the corresponding MIMC method in a sense to be defined. The objective is to optimise jointly the computational effort and the accuracy of the algorithm. The former is quantified by the time necessary to compute one instance of $Z$ while the latter is represented by the variance of $Z$ . Consider an arbitrary total order $\mathbin{\dot{\leq}}$ on $\mathbb{N}_{0}^{d}$ that is compatible with $\leq$ , i.e. such that $\bm{\alpha}\leq\bm{\beta}$ implies $\bm{\alpha}\mathbin{\dot{\leq}}\bm{\beta}$ for any $\bm{\alpha},\bm{\beta}\in\mathbb{N}^{d}_{0}$ , and define $t_{\bm{\alpha}}$ as the time to compute the terms in $\Delta S_{\bm{\alpha}}$ that have not already been computed for previous $\Delta S_{\bm{\beta}}$ with $\bm{\beta}\mathbin{\dot{<}}\bm{\alpha}$ . Let $\tau$ be the time necessary to compute $Z$ , then

[TABLE]

For a given duration $c$ , let $n_{c}\doteq\max\{n\in\mathbb{N}_{0}\,:\,\sum_{i=1}^{n}\tau_{i}\leq c\}$ be the number of copies $Z_{i}$ of $Z$ that can be generated in this amount of time. The sample average $m_{n}$ defined as

[TABLE]

can then be used to formulate a CLT when $\mathbb{E}\tau$ and $\operatorname{\textbf{var}}Z$ are finite as [7]

[TABLE]

where $m\doteq\mathbb{E}S$, $\mathcal{N}(0,1)$ is the standard Gaussian distribution and $\implies$ denotes convergence in distribution as $c\to\infty$.
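The budget-constrained averaging behind this CLT can be sketched as follows: copies of $Z$ are generated until the next one would exceed the duration $c$, and the $n_{c}$ completed copies are averaged. The function `run_with_budget` and the sampler `toy_sampler` are hypothetical names, and the toy distributions of $Z$ and $\tau$ are placeholders chosen only so that the code runs.

```python
import random

def run_with_budget(sample_with_cost, budget):
    """Average as many copies of Z as fit in the budget c.

    sample_with_cost() returns a pair (z, tau): one realisation of Z and the
    (strictly positive) time it took. n_c is the largest n such that
    tau_1 + ... + tau_n <= budget.
    """
    total_z, spent, n_c = 0.0, 0.0, 0
    while True:
        z, tau = sample_with_cost()
        if spent + tau > budget:
            break  # this copy would overrun the budget, so it is discarded
        spent += tau
        total_z += z
        n_c += 1
    mean = total_z / n_c if n_c > 0 else float("nan")
    return mean, n_c

rng = random.Random(2)

def toy_sampler():
    # hypothetical stand-in for one realisation of Z and its computing time
    level = int(rng.expovariate(1.0))
    return rng.gauss(0.5, 1.0), 2.0 ** level

mean_estimate, copies = run_with_budget(toy_sampler, budget=5000.0)
print(copies, mean_estimate)
```

Since the computing times are positive, stopping at the first overrun yields exactly $n_{c}$ as defined above. The number of copies is itself random, which is why the CLT is stated in terms of the budget $c$ rather than a fixed sample size.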

Remark 3.13.

For instance, the computational time $t^{\prime}_{\bm{\alpha}}$ for the term $S_{\bm{\alpha}}$ can be assumed to be equal to $2^{|\bm{\alpha}|}$. This assumption makes sense in many cases, including the ones considered here in the numerical results, where partial differential equations are solved on meshes with $2^{|\bm{\alpha}|}$ elements. If we consider the independent-sum estimator then $t_{\bm{\alpha}}$ is the computational time for the whole of $\Delta S_{\bm{\alpha}}$, which verifies

[TABLE]

Then, the expected computational time $\mathbb{E}[\tau]$ will be finite if the probability $\mathbb{P}(\bm{N}\geq\bm{\alpha})$ is of order $O(2^{-r|\bm{\alpha}|})$ with $r>1$ . However, for Eq. 6 to hold, the tail of the distribution of $\bm{N}$ also needs to be sufficiently large. For instance, if $\|S_{\bm{\alpha}}-S\|=O(2^{-|\bm{\alpha}|p})$ for some $p>0$ related to the considered numerical scheme, then

[TABLE]

and $\mathbb{P}(\bm{N}\geq\bm{\alpha}\lor\bm{\beta})=O(2^{-r|\bm{\alpha}\lor\bm{\beta}|})$ also have to verify $r<p$ since $|\bm{\alpha}\lor\bm{\beta}|\leq|\bm{\alpha}+\bm{\beta}|$ for any $\bm{\alpha},\bm{\beta}\in\mathbb{N}^{d}_{0}$ . The condition Eq. 6 can be weakened for special cases to allow for more freedom in the choice of $\bm{N}$ .
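The requirement $r>1$ can be checked by a short computation. As assumptions made here for illustration (they are not taken from the displayed equations), suppose that $\mathbb{E}\tau=\sum_{\bm{\alpha}}t_{\bm{\alpha}}\mathbb{P}(\bm{N}\geq\bm{\alpha})$ , that $t_{\bm{\alpha}}\leq C\,2^{|\bm{\alpha}|}$ and $\mathbb{P}(\bm{N}\geq\bm{\alpha})\leq C^{\prime}2^{-r|\bm{\alpha}|}$ for some constants $C,C^{\prime}>0$ , and that $|\bm{\alpha}|=\bm{\alpha}_{1}+\dots+\bm{\alpha}_{d}$ . The sum then factorises over the $d$ dimensions:

$$
\mathbb{E}\tau\;\leq\;CC^{\prime}\sum_{\bm{\alpha}\in\mathbb{N}_{0}^{d}}2^{(1-r)|\bm{\alpha}|}
\;=\;CC^{\prime}\Bigl(\sum_{k\geq 0}2^{(1-r)k}\Bigr)^{d}
\;=\;\frac{CC^{\prime}}{(1-2^{1-r})^{d}},
$$

where the geometric series converges exactly when $r>1$ .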

Equation 13 indicates that the distribution of the random variable $\bm{N}$ can be chosen in a way to make the product between the expected computational time $\mathbb{E}\tau$ and the variance $\operatorname{\textbf{var}}Z$ as small as possible. The following problem is therefore considered:

[TABLE]

The solution to Eq. 14 is difficult to formulate in general. However, the special case of the estimator $Z^{\prime}$ yields the simpler problem:

[TABLE]

subject to Eqs. 14b to 14d and

[TABLE]

By a direct generalisation of [13, Proposition 1], it holds that if the net $(\mu^{\prime}_{\bm{\alpha}})_{\bm{\alpha}}$ , defined as $\mu^{\prime}_{\mathbf{0}}=\nu^{\prime}_{\mathbf{0}}-m^{2}$ and $\mu^{\prime}_{\bm{\alpha}}=\nu^{\prime}_{\bm{\alpha}}$ for any $\bm{\alpha}\neq\mathbf{0}$ , is non-negative then the following inequality holds

[TABLE]

with $F^{\dagger}$ characterised by

[TABLE]

for any $\bm{\alpha}\in\mathbb{N}^{d}_{0}$ . However, $F^{\dagger}$ might not be feasible and the solution over the feasible region is denoted $F^{*}$ ; by [13, Proposition 2], this minimum is achieved if $(\mu_{\bm{\alpha}})_{\bm{\alpha}}$ is positive and $(t_{\bm{\alpha}})_{\bm{\alpha}}$ is bounded below by a positive constant.
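The role of $F^{\dagger}$ can be illustrated numerically. The sketch below assumes, in the spirit of [13, Proposition 1], that the simplified objective is the product $\bigl(\sum_{\bm{\alpha}}\mu^{\prime}_{\bm{\alpha}}/F_{\bm{\alpha}}\bigr)\bigl(\sum_{\bm{\alpha}}t_{\bm{\alpha}}F_{\bm{\alpha}}\bigr)$ with $F_{\mathbf{0}}=1$ , in which case the Cauchy-Schwarz inequality gives the lower bound $\bigl(\sum_{\bm{\alpha}}\sqrt{\mu^{\prime}_{\bm{\alpha}}t_{\bm{\alpha}}}\bigr)^{2}$ , attained when $F_{\bm{\alpha}}\propto\sqrt{\mu^{\prime}_{\bm{\alpha}}/t_{\bm{\alpha}}}$ . This form of the objective, the function names and the numerical values of `mu` and `t` are all assumptions made for illustration; the indices are simply enumerated along one axis.

```python
import numpy as np

def work_variance_product(mu, t, F):
    """Assumed objective: (sum mu/F) * (sum t*F)."""
    return float(np.sum(mu / F) * np.sum(t * F))

def unconstrained_minimiser(mu, t):
    """Candidate F proportional to sqrt(mu/t), normalised so that F[0] = 1."""
    ratio = np.sqrt(mu / t)
    return ratio / ratio[0]

# Hypothetical values: variances decaying by 4 and costs growing by 2 per level.
mu = np.array([1.0, 0.25, 0.0625, 0.015625])
t = np.array([1.0, 2.0, 4.0, 8.0])

F_dagger = unconstrained_minimiser(mu, t)
lower_bound = float(np.sum(np.sqrt(mu * t)) ** 2)
value_at_F_dagger = work_variance_product(mu, t, F_dagger)

# Any other positive choice with F[0] = 1 can only do worse or equal.
F_other = 2.0 ** (-1.5 * np.arange(4))
value_at_other = work_variance_product(mu, t, F_other)
```

In this toy case the candidate is non-increasing and bounded by one, so it is a valid tail function; when this fails, the constrained optimiser $F^{*}$ has to be used instead.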

Due to the constraint Eq. 15, the probability mass function induced by $F^{*}$ can be characterised by its values on the diagonal of $\mathbb{N}_{0}^{d}$ and we consider the sequence of indices $J^{*}=(i^{*}_{j})_{j\geq 0}$ such that $i^{*}_{0}=0$ and

[TABLE]

where $F^{*}_{k}$ is a shorthand notation for $F^{*}_{\bm{k}}$ with $\bm{k}=(k,\dots,k)$ . Denoting, for any strictly increasing integer-valued sequence $J=(i_{j})_{j\geq 0}$ ,

[TABLE]

it follows that

[TABLE]

Extending the results of [13, Theorem 3] to the considered setting, it holds that if $(\mu^{\prime}_{\bm{\alpha}})_{\bm{\alpha}}$ is a positive net and $(t_{\bm{\alpha}})_{\bm{\alpha}}$ is non-decreasing w.r.t. $\mathbin{\dot{\leq}}$ , then there exists an optimiser $F^{*}$ inducing a sequence $J^{*}$ such that

[TABLE]

where $\gamma_{j}$ is the unique integer verifying $i^{*}_{\gamma_{j}}\leq j<i^{*}_{\gamma_{j}+1}$ . It follows that

[TABLE]

These expressions are the same for unbiased MLMC and the considered instance of unbiased MIMC so that [13, Algorithm 1] can be used to find the desired optimal sequence $J^{*}$ .

4 Numerical Results

In order to evaluate the performance of the proposed unbiased MIMC (UMIMC) method, a comparison with the MIMC algorithm of [8] is performed on two different problems. The first is covered in Section 4.2 and comprises computing a mollified version of the solution of a partial differential equation with random coefficients. The second application is an inference problem for a partially observed signal modelled by an SPDE on a 1-dimensional domain in Section 4.3. We begin by giving some implementation details for the UMIMC method.

4.1 Implementation

In this section as well as in the numerical results, we make use of the simplified version $\tilde{Z}^{\prime}$ of the independent-sum estimator $\tilde{Z}$ . Since, in practice, realisations of $S_{\bm{\alpha}}$ can only be computed up to a certain level, the partial estimator $\tilde{Z}^{\prime}_{m}$ defined as

[TABLE]

has to be considered instead, for a given $m$ . In order to accurately calculate the optimal sequence $J^{*}$ yielding the optimal distribution for $N$ as described in Section 3.3, a large number of realisations of $\tilde{\Delta}_{\bm{\alpha}}$ has to be computed for every $\bm{\alpha}\in\mathcal{I}^{m}_{0}$ . The computational effort required before estimation from $\tilde{Z}^{\prime}_{m}$ can start is therefore high. To bypass this limitation and reduce the number of realisations of $\tilde{\Delta}_{\bm{\alpha}}$ computed before calculating $J^{*}$ , the latter is updated frequently. One consequence of this solution is that the probabilities $\mathbb{P}(N\geq|\cdot|_{\infty})$ vary through the iterations. This drawback can be compensated for by dividing the increment $\tilde{\Delta}_{\bm{\alpha}}$ by the number of times it has been sampled, instead of the probability of it being sampled. In spite of their difference, these two normalisations are equivalent asymptotically, by an easy application of the strong law of large numbers. Note that, with these adaptations, additional work is needed to verify the consistency of the estimator; this is left for future work. The estimation thus takes the following form in practice:

[TABLE]

where $M$ is the total number of iterations, where $\tilde{\Delta}_{\bm{\alpha},i}$ is the sampled increment and where $n_{i}$ is a sample from $N$ at the $i$ th iteration.
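The count-based normalisation can be sketched on a toy problem. Everything below is an assumption made for illustration: the product-form approximation $S_{\bm{\alpha}}=g(\bm{\alpha}_{1})g(\bm{\alpha}_{2})$ with $g(k)=1-2^{-(k+1)}$ , the additive noise, the level probabilities and the function names. An increment with index $\bm{\alpha}$ is sampled whenever the drawn level $n_{i}$ satisfies $n_{i}\geq|\bm{\alpha}|_{\infty}$ , and the accumulated increments are divided by the number of times they were sampled.

```python
import itertools
import numpy as np

def g(k):
    # Toy one-dimensional approximation with g(-1) = 0 and g(k) -> 1.
    return 0.0 if k < 0 else 1.0 - 2.0 ** -(k + 1)

def mixed_difference(alpha):
    # First-order mixed difference of the toy S_alpha = g(a1) * g(a2).
    a1, a2 = alpha
    return (g(a1) - g(a1 - 1)) * (g(a2) - g(a2 - 1))

def umimc_count_normalised(m, probs, n_iter, noise_sd, rng):
    """Sum over indices of the empirical mean of the sampled increments."""
    indices = list(itertools.product(range(m + 1), repeat=2))
    sums = {a: 0.0 for a in indices}
    counts = {a: 0 for a in indices}
    for _ in range(n_iter):
        n = rng.choice(m + 1, p=probs)  # level drawn at this iteration
        for a in indices:
            if max(a) <= n:  # increment a is sampled when n >= |a|_inf
                noise = noise_sd * 2.0 ** -sum(a) * rng.standard_normal()
                sums[a] += mixed_difference(a) + noise
                counts[a] += 1
    # Indices that were never sampled contribute nothing.
    return sum(sums[a] / counts[a] for a in indices if counts[a] > 0)

rng = np.random.default_rng(0)
estimate = umimc_count_normalised(3, [0.4, 0.3, 0.2, 0.1], 2000, 0.1, rng)
target = g(3) ** 2  # value of S at the largest index (3, 3)
```

Because the increments telescope, the estimate targets $S_{(m,m)}$ rather than the limit $S$ , which reflects the truncation at level $m$ described above.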

4.2 A Partial Differential Equation with random coefficients

We consider here a partial differential equation with random coefficients of the form

[TABLE]

Similarly to [8], the diffusion coefficient $a$ is defined as

[TABLE]

and the random variable of interest is

[TABLE]

with $\sigma=0.16$ and $\bm{x}_{0}=[0.5;0.2]$ . The objective is to compute $\mathbb{E}[X]$ using the proposed method with $d=2$ . For each realisation of $Y_{1}$ and $Y_{2}$ , the partial differential equation Eq. 16 is solved by the finite element method on a linear and uniform mesh defined on $D$ with $4\times 2^{\bm{\alpha}_{i}}$ elements in the $i$ th dimension for a multi-index $\bm{\alpha}$ . The terms corresponding to any index $\bm{\alpha}$ such that $|\bm{\alpha}_{2}-\bm{\alpha}_{1}|>2$ are not computed in order to avoid numerical issues with degenerate elements.
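The index set and mesh sizes described above can be enumerated as follows. This is a minimal sketch; the function names are hypothetical and `alpha_max` denotes the largest value allowed in each component.

```python
def elements_per_dimension(alpha):
    """Number of mesh elements 4 * 2**alpha_i in each dimension."""
    return tuple(4 * 2 ** a for a in alpha)

def admissible_indices(alpha_max, max_gap=2):
    """Indices in {0..alpha_max}^2 whose components differ by at most max_gap."""
    return [
        (a1, a2)
        for a1 in range(alpha_max + 1)
        for a2 in range(alpha_max + 1)
        if abs(a2 - a1) <= max_gap
    ]

# Indices retained when the greatest multi-index is (3, 3).
kept = admissible_indices(3)
```

For instance, the index $(3,1)$ corresponds to a mesh with $32\times 8$ elements, while $(3,0)$ and $(0,3)$ are discarded.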

To better understand the accuracy associated with each index, some of the produced meshes at different levels are shown in Fig. 1. Figure 2 shows the RMSE as a function of the computational effort for the unbiased MIMC when the greatest multi-index available $\bm{\alpha}_{\max}=\alpha_{\max}(1,1)$ is either $(3,3)$ , $(4,4)$ or $(5,5)$ , compared with the MIMC algorithm described in [8, Sec. 3.2.2]. Three scalar parameters have to be set in the MIMC algorithm: the accuracy $\mathrm{TOL}$ , a splitting parameter $\theta$ and a confidence level $0<\epsilon\ll 1$ defined such that

[TABLE]

where $Z_{\text{\sc mimc}}$ is the MIMC estimator. The values $\mathrm{TOL}=5\times 10^{-3}$ , $\theta=0.5$ and $\epsilon=0.25$ are considered here. The maximum computational effort considered is increased with the value of $\alpha_{\max}$ to take into account the corresponding computational overhead.

Comparing Figs. 2(a) to 2(c), it appears that the proposed implementation of the UMIMC and the considered version of MIMC behave very differently in time: the UMIMC has a higher error at the start since it relies on all levels at all times and requires more computational effort to compensate for the randomness in the coefficient of the considered PDE. This effect is more pronounced when the value of $\alpha_{\max}$ increases. However, the error in the MIMC increases at the times when it starts to perform computations for higher indices, since the effect of the random coefficients has to be averaged again, whereas the error of the UMIMC decreases monotonically. These remarks do not allow us to conclude that one method is better than the other, but they highlight the differences in terms of implementation: the considered version of the MIMC attempts to reach a given level of precision determined by some parameters, while the UMIMC requires fewer parameters to be set but offers less control over its behaviour.

4.3 Inference of partially observed solutions of SPDE

Following [14], consider an unknown signal modelled by the solution of the SPDE

[TABLE]

where the eigenfunction $e_{n}(x)=\sqrt{2}\sin(n\pi x)$ has corresponding eigenvalue $n^{2}\pi^{2}$ , and with a cylindrical Brownian motion

[TABLE]

where the terms $\beta^{n}_{t}$ , $n\geq 1$ , are independent Brownian motions. The final time $T$ is set to $T=0.1$ and the variance of the Brownian motion $q_{n}$ is set to $q_{n}=0.01$ . The SPDE is observed at times $t_{k}=Tk/K$ for $k\in\{1,\dots,K\}$ and at the locations $o_{l}=l/(K^{\prime}+1)$ for $l\in\{1,\dots,K^{\prime}\}$ under an additive Gaussian noise with standard deviation $\sigma=0.025$ , for some integers $K$ and $K^{\prime}$ . The observation vector at time $t_{k}$ is denoted $y_{k}$ and is made of the scalar observations made at the locations $o_{l}$ , $l\in\{1,\dots,K^{\prime}\}$ . The corresponding likelihood function is

[TABLE]

where $y_{k,l}$ is the $l$ th component of the vector $y_{k}$ and where $x^{l}_{k}$ is the value of the SPDE at time $t_{k}$ and location $o_{l}$ , for any $l\in\{1,\dots,K^{\prime}\}$ . An example of the set of observations obtained on one solution of Eq. 18 is given in Fig. 3. To ease the estimation procedure, the standard deviation of the observation noise is taken $4$ times bigger in the UMIMC and MIMC recursions.
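The observation model can be sketched as follows, assuming that the noise components are independent across the locations $o_{l}$ , so that the likelihood is a product of scalar Gaussian densities. This independence, the function names and the numerical inputs are assumptions made for illustration.

```python
import numpy as np

def observation_grid(T, K, K_prime):
    """Observation times t_k = T*k/K and locations o_l = l/(K'+1)."""
    times = T * np.arange(1, K + 1) / K
    locations = np.arange(1, K_prime + 1) / (K_prime + 1)
    return times, locations

def log_likelihood(y_k, x_k, sigma):
    """Log-density of y_k given x_k under independent N(0, sigma^2) noise."""
    y_k = np.asarray(y_k, dtype=float)
    x_k = np.asarray(x_k, dtype=float)
    resid = y_k - x_k
    log_norm = y_k.size * np.log(sigma * np.sqrt(2.0 * np.pi))
    return float(-0.5 * np.sum(resid ** 2) / sigma ** 2 - log_norm)

times, locations = observation_grid(T=0.1, K=3, K_prime=4)
# Inflated noise level, 4 times the nominal standard deviation 0.025.
value = log_likelihood([0.1, 0.2, 0.1, 0.0], [0.1, 0.2, 0.1, 0.0], sigma=4 * 0.025)
```

Working with the logarithm avoids numerical underflow when the residuals are large relative to the noise level.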

At index $\bm{\alpha}\in\mathbb{N}^{2}_{0}$ , this SPDE is solved using the exponential Euler scheme of [14] with the first $2\times 2^{\bm{\alpha}_{1}}$ eigenfunctions and $2^{\bm{\alpha}_{2}}$ time steps. The MIMC is used in the same way as in the previous section, but with a tolerance $\mathrm{TOL}=5\times 10^{-3}$ . The quantity to estimate is the integral of the true path at the last time step, so that

[TABLE]

where the expectation is w.r.t. the path $X^{\prime}$ , where $\varphi(X^{\prime})$ is the integral of the path $X^{\prime}$ and where $X^{\prime}_{k}$ is the vector containing the values of the path $X^{\prime}$ at the locations $\{o_{l}\}_{l=1}^{K^{\prime}}$ and at time $t_{k}$ for some $k\in\{1,\dots,K\}$ . The experiments are run with $K=3$ and $K^{\prime}=4$ . The results displayed in Fig. 4 show the same type of behaviour as in the simulation study of Section 4.2 in spite of the differences between the two considered problems, e.g. in the nature of the approximation represented by the indices (spatial discretization in 2 dimensions vs. time discretization plus number of basis functions).

5 Summary

In this article, we have considered exact approximation of expectations associated to probability laws with discretizations in multiple dimensions. We have developed several optimality results and applied the methodology to two numerical examples.

Future work associated with this methodology includes extending our method to scenarios in which independent sampling from the (discretized) multi-index target is not possible, for instance where one has to use Markov chain or sequential Monte Carlo methods (e.g. [2] in the case of a single index). The analysis in such a scenario is of interest, as is its application, since it would widen the range of examples where our approach can be implemented. This is being conducted in [10].

Acknowledgements

JH & AJ were supported by an AcRF tier 2 grant: R-155-000-143-112. AJ is affiliated with the CQF, RMI and OR cluster, NUS. DC was partially supported by the EPSRC grant: EP/N023781/1.

Bibliography

[1] Beskos, A., Roberts, G. O., Stuart, A. M., & Voss, J. (2008). MCMC for diffusion bridges. Stoch. Dynam., 8, 319–350.
[2] Beskos, A., Jasra, A., Law, K. J. H., Tempone, R., & Zhou, Y. (2017). Multilevel sequential Monte Carlo samplers. Stoch. Proc. Appl., 127, 1417–1440.
[3] Beskos, A., Crisan, D., Jasra, A., Kamatani, K., & Zhou, Y. (2017). A stable particle filter in high-dimensions. Adv. Appl. Probab., 49, 24–48.
[4] Crisan, D., & Rozovskii, B. (2011). The Oxford Handbook of Nonlinear Filtering. Oxford Univ. Press, Oxford.
[5] Giles, M. B. (2008). Multilevel Monte Carlo path simulation. Op. Res., 56, 607–617.
[6] Giles, M. B. (2015). Multilevel Monte Carlo methods. Acta Numerica, 24, 259–328.
[7] Glynn, P. W., & Whitt, W. (1992). The asymptotic efficiency of simulation estimators. Op. Res., 40, 505–520.
[8] Haji-Ali, A. L., Nobile, F., & Tempone, R. (2016). Multi-Index Monte Carlo: When sparsity meets sampling. Numerische Mathematik, 132, 767–806.
