Double-loop randomized quasi-Monte Carlo estimator for nested integration

Arved Bartuska; André Gustavo Carlon; Luis Espath; Sebastian Krumscheid; Raúl Tempone

arXiv:2302.14119·math.NA·May 19, 2026


TL;DR

This paper introduces a novel nested randomized quasi-Monte Carlo (rQMC) estimator for nested integrals, providing theoretical error bounds and demonstrating improved efficiency over traditional methods in complex applications.

Contribution

The work develops a new nested rQMC method that simultaneously approximates inner and outer integrals, with rigorous error analysis and practical truncation schemes.

Findings

1. Derived asymptotic error bounds for bias and variance.

2. Addressed integrands with infinite variation using Owen's scrambling.

3. Numerical experiments show improved efficiency over standard nested MC.

Abstract

Nested integration of the form $\int f\left(\int g(\boldsymbol{y},\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}\right)\mathrm{d}\boldsymbol{y}$, characterized by an outer integral connected to an inner integral through a nonlinear function $f$, is a challenging problem in various fields, such as engineering and mathematical finance. The available numerical methods for nested integration based on Monte Carlo (MC) methods can be prohibitively expensive owing to the error propagating from the inner to the outer integral. Attempts to enhance the efficiency of these approximations using the quasi-MC (QMC) or randomized QMC (rQMC) method have focused on either the inner or outer integral approximation. This work introduces a novel nested rQMC method that simultaneously addresses the approximation of the inner and outer integrals. The method leverages the unique nested integral structure to offer a more efficient approximation mechanism. As…

Equations

$$I=\int_{[0,1]^{d}} g(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x},$$

$$I\approx I_{\mathrm{MC}}:=\frac{1}{N}\sum_{n=1}^{N} g(\boldsymbol{x}^{(n)}).$$

$$\mathbb{P}\left(|I-I_{\mathrm{MC}}|\leq\varepsilon_{\mathrm{MC}}\right)\geq 1-\alpha$$

$$\varepsilon_{\mathrm{MC}}:=C_{\alpha}\sqrt{\frac{\mathbb{V}[g]}{N}},$$

$$I_{\mathrm{Q}}:=\frac{1}{N}\sum_{n=1}^{N} g(\boldsymbol{x}^{(n)}),$$

$$\boldsymbol{x}^{(n)}=\{\boldsymbol{\xi}^{(n)},\boldsymbol{\rho}\},\quad 1\leq n\leq N.$$

$$\{\boldsymbol{\xi}^{(n)},\boldsymbol{\rho}\}=\mathfrak{fr}(\boldsymbol{\xi}^{(n)}+\boldsymbol{\rho}),$$

$$\mathbb{P}\left(|I-I_{\mathrm{Q}}|\leq\varepsilon_{\mathrm{Q}}\right)\geq 1-\alpha$$

$$\varepsilon_{\mathrm{Q}}:=C_{\alpha}\sqrt{\frac{\mathbb{V}[I_{\mathrm{Q}}]}{R}},$$

$$\mathbb{V}[I_{\mathrm{Q}}]\approx\frac{1}{R-1}\sum_{r=1}^{R}\left(\frac{1}{N}\sum_{n=1}^{N} g(\{\boldsymbol{\xi}^{(n)},\boldsymbol{\rho}^{(r)}\})-\bar{I}_{\mathrm{Q}}\right)^{2},$$

$$\bar{I}_{\mathrm{Q}}:=\frac{1}{R}\sum_{r=1}^{R}\frac{1}{N}\sum_{n=1}^{N} g(\{\boldsymbol{\xi}^{(n)},\boldsymbol{\rho}^{(r)}\}).$$

$$I=\int_{[0,1]^{d_{1}}} f\left(\int_{[0,1]^{d_{2}}} g(\boldsymbol{y},\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}\right)\mathrm{d}\boldsymbol{y},$$

$$I=\int_{[0,1]^{d_{1}}}\log\left(\int_{[0,1]^{d_{2}}}\exp\left(\boldsymbol{y}\cdot\boldsymbol{G}(\boldsymbol{x})\right)\mathrm{d}\boldsymbol{x}\right)\mathrm{d}\boldsymbol{y},$$

$$I_{\mathrm{DLMC}}:=\frac{1}{N}\sum_{n=1}^{N} f\left(\frac{1}{M}\sum_{m=1}^{M} g(\boldsymbol{y}^{(n)},\boldsymbol{x}^{(n,m)})\right),$$

$$\mathbb{E}\left[|g(\boldsymbol{y},\boldsymbol{x})-g_{h}(\boldsymbol{y},\boldsymbol{x})|\right]=C_{\mathrm{disc}}h^{\eta}+\text{h.o.t.},$$

$$|\mathbb{E}[I_{\mathrm{DLMC}}]-I|\leq C_{\mathrm{disc}}h^{\eta}+\frac{C_{\mathrm{MC},3}}{M}+o(h^{\eta})+\mathcal{O}\left(\frac{1}{M^{2}}\right),$$

$$\mathbb{V}[I_{\mathrm{DLMC}}]\leq\frac{C_{\mathrm{MC},1}}{N}+\frac{C_{\mathrm{MC},2}}{NM}+\mathcal{O}\left(\frac{1}{NM^{2}}\right),$$

$$W_{\mathrm{DLMC}}^{\ast}\propto TOL^{-\left(3+\frac{\gamma}{\eta}\right)}.$$

$$I_{\mathrm{DLQ}}:=\frac{1}{N}\sum_{n=1}^{N} f\left(\frac{1}{M}\sum_{m=1}^{M} g(\boldsymbol{y}^{(n)},\boldsymbol{x}^{(n,m)})\right),$$

$\boldsymbol{y}^{(n)}$

$\boldsymbol{x}^{(n,m)}$

$$|I_{\mathrm{DLQ}}-I|\leq\underbrace{|\mathbb{E}[I_{\mathrm{DLQ}}]-I|}_{\text{bias error}}+\underbrace{|I_{\mathrm{DLQ}}-\mathbb{E}[I_{\mathrm{DLQ}}]|}_{\text{statistical error}}.$$

$$|\mathbb{E}[I_{\mathrm{DLQ}}]-I|\leq C_{\mathrm{disc}}h^{\eta}+\frac{C_{\mathrm{Q},3}}{M^{(1+\delta)}}+\mathcal{O}(h^{\eta+1})+\mathcal{O}\left(\frac{1}{M^{2(1+\delta)}}\right),$$

$$|\mathbb{E}[I_{\mathrm{DLQ}}]-I|\leq\underbrace{|\mathbb{E}[I_{\mathrm{DLQ}}-I_{\mathrm{DLQ}}^{\mathrm{ex}}]|}_{\text{discretization bias}}+\underbrace{|\mathbb{E}[I_{\mathrm{DLQ}}^{\mathrm{ex}}]-I|}_{\text{inner sampling bias}}.$$

$$|\mathbb{E}[I_{\mathrm{DLQ}}-I_{\mathrm{DLQ}}^{\mathrm{ex}}]|\leq C_{\mathrm{disc}}h^{\eta}+\mathcal{O}(h^{\eta+1}),$$

$$I_{\mathrm{Q}}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}):=\frac{1}{M}\sum_{m=1}^{M} g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\},\{\boldsymbol{\xi}_{d_{2}}^{(m)},\boldsymbol{\rho}_{d_{2}}\}).$$

$$f(X)=f(\mathbb{E}[X])+f'(\mathbb{E}[X])(X-\mathbb{E}[X])+\frac{1}{2}f''(\mathbb{E}[X])(X-\mathbb{E}[X])^{2}+\mathcal{O}(|X-\mathbb{E}[X]|^{3}),$$

$$\begin{aligned}f(I_{\mathrm{Q}}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}))={}&f(g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}))+f'(g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}))\left(I_{\mathrm{Q}}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})-g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})\right)\\&+\frac{1}{2}f''(g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}))\left(I_{\mathrm{Q}}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})-g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})\right)^{2}\\&+\mathcal{O}\left(|I_{\mathrm{Q}}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})-g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})|^{3}\right).\end{aligned}$$

$$\mathbb{E}[f(I_{\mathrm{Q}}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}))\,|\,\boldsymbol{\rho}_{d_{1}}]=\mathbb{E}[f(g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}))\,|\,\boldsymbol{\rho}_{d_{1}}]+\frac{1}{2}f''(g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}))\,\mathbb{V}[I_{\mathrm{Q}}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})\,|\,\boldsymbol{\rho}_{d_{1}}]+\mathcal{O}\left(\frac{1}{M^{2(1+\delta)}}\right),$$

$\mathbb{E}[I_{\mathrm{DLQ}}]-I$


Taxonomy

Topics: Mathematical Approximation and Integration · Statistical Methods and Inference · Stochastic processes and financial applications

Full text

Double-loop quasi-Monte Carlo estimator for nested integration

Arved Bartuska$^{1}$, André Gustavo Carlon$^{2}$, Luis Espath$^{3}$, Sebastian Krumscheid$^{5}$, & Raúl Tempone$^{1,2,4}$

$^{1}$Department of Mathematics, RWTH Aachen University, Gebäude-1953 1.OG, Pontdriesch 14-16, 161, 52062 Aachen, Germany

$^{2}$King Abdullah University of Science & Technology (KAUST), Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE), Thuwal 23955-6900, Saudi Arabia

$^{3}$School of Mathematical Sciences, University of Nottingham, Nottingham, NG7 2RD, United Kingdom

$^{4}$Alexander von Humboldt Professor in Mathematics for Uncertainty Quantification, RWTH Aachen University, Germany

$^{5}$Steinbuch Center for Computing, and Institute for Applied and Numerical Mathematics, Karlsruhe Institute of Technology, Germany


Abstract.

Nested integration arises when a nonlinear function is applied to an integrand, and the result is integrated again, which is common in engineering problems, such as optimal experimental design, where typically neither integral has a closed-form expression. Using the Monte Carlo method to approximate both integrals leads to a double-loop Monte Carlo estimator, which is often prohibitively expensive because the estimate of the outer integral has a bias that scales with the variance of the inner integral estimate. When the inner integrand is only approximately given, additional bias is introduced into the estimate of the outer integral. Variance reduction methods, such as importance sampling, have been used successfully to make computations more affordable. Furthermore, random samples can be replaced with deterministic low-discrepancy sequences, leading to quasi-Monte Carlo techniques. Randomizing the low-discrepancy sequences simplifies the error analysis of the proposed double-loop quasi-Monte Carlo estimator. To our knowledge, no comprehensive error analysis exists yet for truly nested randomized quasi-Monte Carlo estimation (i.e., for estimators with low-discrepancy sequences for both estimations). We derive asymptotic error bounds and a method to obtain the optimal number of samples for both integral approximations. Then, we demonstrate the computational savings of this approach compared to standard nested (i.e., double-loop) Monte Carlo integration when estimating the expected information gain via two examples from Bayesian optimal experimental design, the second of which involves an experiment from solid mechanics.

AMS subject classifications: 62F15 · 65C05 · 65D30 · 65D32

1. Introduction

A nested integral is an integral of a usually nonlinear function of another parametric integral. Integrals of this type appear in many fields (e.g., geology [God18], mathematical finance [Xu20], medical decision-making [Fan22], and optimal experimental design (OED) [Rya03]). The Monte Carlo (MC) method is one of the most popular approximation techniques for integrals, especially high-dimensional ones. For nested integrals, both the inner and the outer integral can be approximated using the MC method, resulting in the double-loop MC (DLMC) estimator. Using MC to approximate a single integral to a specified error tolerance $TOL>0$ requires a sample size of $\mathcal{O}(TOL^{-2})$, whereas DLMC has a worse complexity, with an overall number of samples of $\mathcal{O}(TOL^{-3})$ [Rya03]. The two MC estimators in the DLMC estimator are connected through a nonlinear function; thus, the statistical error of the inner MC estimator causes bias in the DLMC estimator. The sample sizes of both MC estimators must be carefully controlled to guarantee that the error in the DLMC estimator is below $TOL$ with a certain confidence, controlling both the statistical error and the bias. Improving DLMC performance is the subject of intense research, with approaches including the Laplace approximation [Lon13], importance sampling [Bec18], and multilevel MC (MLMC) [Fan22].

The randomized quasi-MC (RQMC) method [Caf98, Hic98, Nie92, Owe03, Dic10, Lec18] is a promising technique to improve the efficiency of the basic MC method (i.e., to reduce the number of samples required to meet a certain tolerance) while maintaining nonintrusive sampling. The RQMC estimator uses deterministic points from a low-discrepancy sequence and randomizes the entire sequence while preserving its low-discrepancy structure. Randomization allows for the use of either the central limit theorem or Chebyshev’s inequality to estimate and subsequently bound the error asymptotically [Tuf04, Lec10, Lec18]. Given appropriate regularity assumptions on the integrand, the RQMC method can improve the convergence rate of the approximation error without introducing bias. Furthermore, if the integrand is sufficiently smooth, the convergence rate can approach 1 as the number of low-discrepancy points increases; in that case, the number of evaluations needed by the RQMC estimator to achieve a tolerance $TOL$ is $\mathcal{O}(TOL^{-1})$.

Recently, researchers [Gob22] have investigated the optimal error tolerance that can be achieved by RQMC, given a fixed number of samples and a certain confidence level. They demonstrated that combining RQMC with robust estimation improves error tolerances. The RQMC concepts have been applied in the context of OED in [Dro18], but only to the outer integral of a nested integration problem. In [Dro18], the inner integral is approximated using the MC method. A reduced sample standard deviation was observed for several numerical experiments for this scheme compared to using the MC method for both integrals, which was demonstrated for a fixed number of outer and inner samples.

In recent work [Fan22], an RQMC method was used to approximate the outer integral of a nested estimator in medical decision-making. The authors estimated the variance error using the sample variance and the bias error by successively doubling the number of inner samples. They compared this RQMC method with an MLMC approach and with standard nested (i.e., double-loop) MC by specifying a target mean squared error and observing the number of samples needed to reach it. MLMC and RQMC showed similar performance in practice, depending on the number of parameters and other measures of model complexity.

In [God18], nested integrals are approximated using MLMC techniques. The number of inner samples is increased for each level to reduce the bias induced on the outer approximation by the variance of the inner approximation. The RQMC estimator approximates the inner integral and reduces the variance, requiring a smaller sample size in the inner loop to achieve the error tolerance in the MLMC setting specified in [Gil08, Theorem 3.1]. This outcome is presented both theoretically and via examples. A similar approach is followed in [Xu20], where a discontinuous inner integrand is approximated using a sigmoid (i.e., a smooth function), allowing the RQMC method to be applied.

To further reduce the number of samples required to estimate nested integrals up to a specified error tolerance $TOL$ , we use RQMC for both integrals to build a double-loop quasi-MC (DLQMC) estimator. Indeed, under suitable regularity conditions, the DLQMC method can significantly reduce the required number of overall samples to $\mathcal{O}(TOL^{-1.5})$ , compared to $\mathcal{O}(TOL^{-3})$ for the DLMC method. Moreover, we demonstrate that using RQMC for the outer integral has a greater effect on the overall number of samples than using it for the inner integral, but further savings can still be achieved by applying RQMC to both integrals. We also consider the case where the inner integrand is given only approximately in terms of a computational model, resulting in additional bias for the outer approximation. We provide approximate error bounds using suitable RQMC estimators and verify them on numerical examples.

Section 2 provides a brief overview of the MC and quasi-MC (QMC) methods, including bounds on the absolute error. We introduce the proposed nested RQMC estimator in Section 3. As the main contributions of this work, we derive asymptotic error bounds in terms of the numbers of inner and outer samples in Propositions 1 and 2, and the optimal setting for this estimator in Proposition 3. Finally, in the numerical results section, we present two examples from Bayesian OED, where nested integrals frequently arise. The first example is an algebraic model introduced in [Hua13], which can be evaluated at a low cost and serves as a toy problem to highlight the effectiveness of the proposed method. The second example is an application from solid mechanics involving the solution to a partial differential equation (PDE) with favorable regularity properties, which illustrates the practical applicability of the DLQMC estimator.

Greek alphabet

$\alpha$ confidence level

$\beta$ increase in convergence rate of QMC beyond MC for the outer integral

$\gamma$ work rate for the finite element method

$\delta$ increase in convergence rate of QMC beyond MC for the inner integral

$\boldsymbol{\epsilon}$ observation noise

$\varepsilon$ central limit theorem error

$\boldsymbol{\varepsilon}$ strain tensor

$\eta$ convergence rate of the discretization error

$\boldsymbol{\theta}$ parameter of interest

$\boldsymbol{\vartheta}$ dummy variable for the parameter of interest

$\vartheta$ absolute temperature

$\kappa$ splitting parameter between bias and variance error

$\lambda$ first Lamé constant

$\mu$ second Lamé constant

$\boldsymbol{\xi}$ design parameter

$\pi$ probability density function of the parameter of interest

$\boldsymbol{\rho}$ randomization

$\rho$ material density

$\sigma$ diagonal elements of the error covariance matrix

$\boldsymbol{\Sigma}$ error covariance matrix and approximate negative inverse Hessian of the log-likelihood

$\Phi$ cumulative distribution function of the standard normal distribution

$\omega$ random outcome in the finite element formulation

$\Omega$ space of outcomes in the finite element formulation

2. Brief overview of Monte Carlo and quasi-Monte Carlo integration

Before we address the case of nested integration, which is the focus of this work, we first recall the basic concepts for approximating integrals using MC and RQMC for the reader’s convenience.

2.1. Monte Carlo method

We can approximate the integral

$$I=\int_{[0,1]^{d}} g(\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}, \tag{1}$$

where $g:[0,1]^{d}\to\mathbb{R}$ is square-integrable and $d$ is a positive integer, using the MC estimator:

$$I\approx I_{\mathrm{MC}}:=\frac{1}{N}\sum_{n=1}^{N} g(\boldsymbol{x}^{(n)}). \tag{2}$$

The MC method uses random points $\boldsymbol{x}^{(1)},\ldots,\boldsymbol{x}^{(N)}$ that are independent and identically distributed (iid) samples from the uniform distribution $\mathcal{U}\left([0,1]^{d}\right)$ to approximate $I$ in (1). Using the central limit theorem (CLT) [Dur19] to analyze the error of the MC estimator, we find that

$$\mathbb{P}\left(|I-I_{\mathrm{MC}}|\leq\varepsilon_{\mathrm{MC}}\right)\geq 1-\alpha \tag{3}$$

where

$$\varepsilon_{\mathrm{MC}}:=C_{\alpha}\sqrt{\frac{\mathbb{V}[g]}{N}}, \tag{4}$$

as $N\to\infty$ and for $0<\alpha\ll 1$, where $C_{\alpha}=\Phi^{-1}(1-\alpha/2)$, $\Phi^{-1}$ is the inverse cumulative distribution function (cdf) of the standard normal distribution, and $\mathbb{V}[g]$ is the variance of the integrand. Alternatively, Chebyshev’s inequality could be used to obtain an error estimate similar to (3) and (4), although with a potentially larger constant $C_{\alpha}$.
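The MC estimator (2) and its CLT error bound (3) and (4) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `mc_estimate` and the test integrand $g(\boldsymbol{x})=x_1x_2$ (exact integral $1/4$) are hypothetical choices, and the unknown $\mathbb{V}[g]$ is replaced by the sample variance.

```python
import math
import random
from statistics import NormalDist, fmean, variance

def mc_estimate(g, d, N, alpha=0.05, rng=None):
    """Crude MC estimate (2) of the integral of g over [0,1]^d, together with the
    CLT-based error bound (4), where the sample variance replaces V[g]."""
    rng = rng or random.Random()
    values = [g([rng.random() for _ in range(d)]) for _ in range(N)]
    C_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # C_alpha = Phi^{-1}(1 - alpha/2)
    return fmean(values), C_alpha * math.sqrt(variance(values) / N)

# Hypothetical integrand g(x) = x_1 * x_2 on [0,1]^2, whose exact integral is 1/4.
I_mc, eps_mc = mc_estimate(lambda x: x[0] * x[1], d=2, N=10_000, rng=random.Random(0))
print(f"I_MC = {I_mc:.4f}, eps_MC = {eps_mc:.4f} (95% level)")
```

With $\alpha=0.05$, the interval $I_{\mathrm{MC}}\pm\varepsilon_{\mathrm{MC}}$ should contain $1/4$ in roughly 95% of independent runs as $N$ grows.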

2.2. Quasi-Monte Carlo method

The MC method converges to the true value of the integral under weak assumptions, but its rate of $N^{-0.5}$ can be improved for certain integrands. We can instead use the RQMC method, which achieves a better convergence rate by exploiting the regularity properties of the integrand. For a square-integrable function $g:[0,1]^{d}\to\mathbb{R}$, the RQMC estimator to approximate the integral (1) is given by

$$I_{\mathrm{Q}}:=\frac{1}{N}\sum_{n=1}^{N} g(\boldsymbol{x}^{(n)}), \tag{5}$$

where $\boldsymbol{x}^{(1)},\ldots,\boldsymbol{x}^{(N)}$ are chosen from a sequence of points consisting of a deterministic component $\boldsymbol{\xi}\in[0,1]^{d}$ and a random component $\boldsymbol{\rho}$ ,

$$\boldsymbol{x}^{(n)}=\{\boldsymbol{\xi}^{(n)},\boldsymbol{\rho}\},\quad 1\leq n\leq N. \tag{6}$$

In particular, choosing $\boldsymbol{\xi}^{(1)},\ldots,\boldsymbol{\xi}^{(N)}$ from a low-discrepancy sequence [Nie92, Hic98] results in improved convergence for smooth integrands. One example is a lattice rule [Hic98] combined with a random shift $\boldsymbol{\rho}\sim\mathcal{U}([0,1]^{d})$, which provides points of the form

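$$\{\boldsymbol{\xi}^{(n)},\boldsymbol{\rho}\}=\mathfrak{fr}(\boldsymbol{\xi}^{(n)}+\boldsymbol{\rho}), \tag{7}$$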

where $\mathfrak{fr}(\cdot)$ denotes the componentwise fractional part operator. Every one-dimensional projection of this point set is injective. Another common way to construct a suitable low-discrepancy point set is to choose the deterministic points $\boldsymbol{\xi}^{(1)},\ldots,\boldsymbol{\xi}^{(N)}$ from a digital sequence [Nie92, Owe03] and set $\boldsymbol{\rho}$ to be random permutations of the digits of $\boldsymbol{\xi}^{(1)},\ldots,\boldsymbol{\xi}^{(N)}$. For such sequences, if $[0,1]^{d}$ is split into equally spaced subintervals in each dimension, each subinterval contains the same number of points. We use a digital sequence called the Sobol sequence [Sob67] throughout this work, as it performed best in our numerical tests. To achieve the best results with this sequence type, the number of points $N$ should be a power of two (i.e., $\log_{2}(N)\in\mathbb{N}$). The estimators (2) and (5) differ only in the points at which the integrand is evaluated.
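A randomly shifted lattice rule of the form (7) might be sketched as follows, assuming a rank-1 lattice $\boldsymbol{\xi}^{(n)}=\mathfrak{fr}(n\boldsymbol{z}/N)$ with an integer generating vector $\boldsymbol{z}$. The two-dimensional Fibonacci lattice with $N=89$ and $\boldsymbol{z}=(1,55)$ is a hypothetical illustrative choice, and the function name `shifted_lattice` is ours. The last lines check injectivity of the one-dimensional projections for this particular lattice.

```python
import random

def shifted_lattice(N, z, rng):
    """Randomly shifted rank-1 lattice rule: fr(xi^(n) + rho) with xi^(n) = fr(n z / N).
    Indices run over n = 0, ..., N-1, which gives the same point set as n = 1, ..., N."""
    rho = [rng.random() for _ in z]  # one uniform shift per coordinate
    return [[((n * zj) % N / N + rj) % 1.0 for zj, rj in zip(z, rho)] for n in range(N)]

# Hypothetical 2D example: Fibonacci lattice with N = 89 and generating vector z = (1, 55).
N, z = 89, (1, 55)
points = shifted_lattice(N, z, random.Random(1))

# Because gcd(z_j, N) = 1, n * z_j mod N runs through 0, ..., N-1 exactly once, so every
# one-dimensional projection (shifted or not) takes N distinct values.
projections_injective = all(
    sorted((n * zj) % N for n in range(N)) == list(range(N)) for zj in z
)
```

The shift only translates the lattice modulo 1, so it preserves the lattice structure while making each point uniformly distributed on $[0,1]^2$.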

Traditional error estimates for numerical integration based on deterministic low-discrepancy sequences (i.e., $\boldsymbol{x}^{(n)}=\{\boldsymbol{\xi}^{(n)}\}$ in (6)) use the Koksma–Hlawka inequality [Hla61, Nie92] to bound the error by a product of suitable measures for the low-discrepancy sequence and integrand, respectively. This approach can be problematic in practice because, in most instances, sharp estimates of these quantities are exceedingly difficult to obtain, and the resulting quadrature error bound is far from optimal [Lec18].

To obtain a probabilistic estimate using the CLT, we must use several iid randomizations $\boldsymbol{\rho}^{(r)}$ , $1\leq r\leq R$ in (6). Then, we find that

$$\mathbb{P}\left(|I-I_{\mathrm{Q}}|\leq\varepsilon_{\mathrm{Q}}\right)\geq 1-\alpha \tag{8}$$

approximately holds for

$$\varepsilon_{\mathrm{Q}}:=C_{\alpha}\sqrt{\frac{\mathbb{V}[I_{\mathrm{Q}}]}{R}}, \tag{9}$$

for $0<\alpha\ll 1$ , where the variance of the RQMC estimator can be approximated as follows:

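$$\mathbb{V}[I_{\mathrm{Q}}]\approx\frac{1}{R-1}\sum_{r=1}^{R}\left(\frac{1}{N}\sum_{n=1}^{N} g(\{\boldsymbol{\xi}^{(n)},\boldsymbol{\rho}^{(r)}\})-\bar{I}_{\mathrm{Q}}\right)^{2}, \tag{10}$$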

and

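$$\bar{I}_{\mathrm{Q}}:=\frac{1}{R}\sum_{r=1}^{R}\frac{1}{N}\sum_{n=1}^{N} g(\{\boldsymbol{\xi}^{(n)},\boldsymbol{\rho}^{(r)}\}). \tag{11}$$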

For a fixed $R$ , the error (9) decreases at the rate $\mathcal{O}\left(N^{-\frac{(1+\delta)}{2}}\right)$ for $N\to\infty$ [Gob22, Loh03, Dic13, Lec18], where $0\leq\delta\leq 1$ depends on the dimension $d$ and may depend on the regularity of the integrand $g$ . For $\delta=0$ , this provides the usual MC rate of 1/2. For certain functions $g$ with desirable properties such as smoothness and boundedness, more precise statements are possible [Gob22, Owe08]. The CLT-based error estimate (8) only holds asymptotically as $R\to\infty$ . It can still be used in practice to obtain a confidence interval of the error (9); however, keeping $R$ fixed and letting $N\to\infty$ is sometimes problematic, as [Tuf04, Lec10] noted. Specifically, the convergence of the distribution of the estimator (5) to a normal distribution cannot be guaranteed. Chebyshev’s inequality can also justify the convergence rate of $(1+\delta)/2$ . We employ the CLT for the error analysis and demonstrate that the derived error bounds hold at the specified confidence level for a simple example. However, we remark that it is straightforward to adapt the analysis to the Chebyshev bounds instead.
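The following sketch computes $\bar{I}_{\mathrm{Q}}$ from (11), the sample variance (10), and the CLT error $\varepsilon_{\mathrm{Q}}$ from (9) using $R$ iid randomizations. As an assumption for brevity, it uses a randomly shifted rank-1 lattice rule rather than the scrambled Sobol points used in this work; the integrand $g(\boldsymbol{x})=\exp(x_1+x_2)$, with exact integral $(e-1)^2$, the Fibonacci lattice, and the name `lattice_rqmc` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, variance

def lattice_rqmc(g, N, z, R, alpha=0.05, rng=None):
    """RQMC estimate (11) with the sample variance (10) and CLT error (9),
    computed from R iid random shifts of a rank-1 lattice rule."""
    rng = rng or random.Random()
    estimates = []
    for _ in range(R):
        rho = [rng.random() for _ in z]  # randomization rho^(r)
        pts = ([((n * zj) % N / N + rj) % 1.0 for zj, rj in zip(z, rho)] for n in range(N))
        estimates.append(fmean(g(x) for x in pts))  # I_Q for this randomization
    I_bar = fmean(estimates)                         # (11)
    V_IQ = variance(estimates)                       # (10), with 1/(R-1) normalization
    eps_q = NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(V_IQ / R)  # (9)
    return I_bar, V_IQ, eps_q

# Hypothetical test integrand with exact integral (e - 1)^2.
g = lambda x: math.exp(x[0] + x[1])
I_bar, V_IQ, eps_q = lattice_rqmc(g, N=89, z=(1, 55), R=16, rng=random.Random(2))
exact = (math.e - 1) ** 2
```

Here the work is $N\times R$ evaluations; increasing $N$ improves the rate through $\delta$, while increasing $R$ only reduces the error at the MC rate $R^{-1/2}$.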

Remark 1 (Integration over general domains).

The MC and RQMC methods are commonly defined for integration over the unit cube with respect to uniform random variables. For integrals over general domains or with respect to other distributions (e.g., normal random variables), the corresponding inverse cdf can be applied to maintain the general form of the estimators (2) and (5).

3. Nested integration

After discussing the basics of the RQMC estimator, we address the focus of this work. This section establishes the DLQMC estimator for nested integration problems, derives asymptotic error bounds in the number of samples, and analyzes the optimal work required for this estimator to meet a tolerance goal.

Definition 1 (Nested integral).

We define a nested integral as

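$$I=\int_{[0,1]^{d_{1}}} f\left(\int_{[0,1]^{d_{2}}} g(\boldsymbol{y},\boldsymbol{x})\,\mathrm{d}\boldsymbol{x}\right)\mathrm{d}\boldsymbol{y}, \tag{12}$$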

where the square-integrable function $f:\mathbb{R}\to\mathbb{R}$ is nonlinear and twice differentiable. In addition, $g:[0,1]^{d_{1}}\times[0,1]^{d_{2}}\to\mathbb{R}$ is square-integrable and defines a nonlinear relation between $\boldsymbol{x}$ and $\boldsymbol{y}$, where $d_{1},d_{2}$ are positive integers.

Example 1 (Nested integral).

The integral

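$$I=\int_{[0,1]^{d_{1}}}\log\left(\int_{[0,1]^{d_{2}}}\exp\left(\boldsymbol{y}\cdot\boldsymbol{G}(\boldsymbol{x})\right)\mathrm{d}\boldsymbol{x}\right)\mathrm{d}\boldsymbol{y}, \tag{13}$$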

where $\boldsymbol{G}(\boldsymbol{x})$ is a nonlinear function, is of the nested type and is typically not solvable in closed form, motivating the use of numerical integration techniques to approximate $I$ .

A standard method to approximate a nested integral (12) is via the DLMC estimator [Rya03], defined as

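$$I_{\mathrm{DLMC}}:=\frac{1}{N}\sum_{n=1}^{N} f\left(\frac{1}{M}\sum_{m=1}^{M} g(\boldsymbol{y}^{(n)},\boldsymbol{x}^{(n,m)})\right), \tag{14}$$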

where the points $\boldsymbol{y}^{(n)}$, $1\leq n\leq N$, are sampled iid from $\mathcal{U}([0,1]^{d_{1}})$, and $\boldsymbol{x}^{(n,m)}$, $1\leq n\leq N$, $1\leq m\leq M$, are sampled iid from $\mathcal{U}([0,1]^{d_{2}})$. The standard MC estimator (2) for a single integral is unbiased and has a variance that decreases with the number of samples; this holds for the inner MC estimator in (14), whose variance decreases with the number of inner samples $M$. The outer MC estimator in (14) has a variance that decreases with $N$, but, because $f$ is nonlinear, it also has a bias that is, to leading order, proportional to the variance of the inner integral estimate. Thus, we typically require many inner and outer samples to keep both the bias and the variance of this estimator in check, which significantly limits its practical usefulness, particularly for computationally demanding problems.
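To make the bias of the DLMC estimator (14) concrete, the sketch below takes a hypothetical one-dimensional instance of Example 1 with $d_1=d_2=1$ and $G(x)=x$, so that $f=\log$ and $g(y,x)=\exp(yx)$. The inner integral is then $(e^{y}-1)/y$, which gives a cheap reference value $I\approx 0.2638$. With $M=1$, the estimator targets $\mathbb{E}[yx]=1/4$ instead of $I$ because $\log$ is concave; increasing $M$ shrinks this bias, consistent with the $C_{\mathrm{MC},3}/M$ term. The function name `dlmc` is ours.

```python
import math
import random
from statistics import fmean

def dlmc(f, g, N, M, rng):
    """Double-loop MC estimator (14) for d1 = d2 = 1: N iid outer samples y^(n),
    each paired with its own M iid inner samples x^(n,m)."""
    total = 0.0
    for _ in range(N):
        y = rng.random()
        inner = fmean(g(y, rng.random()) for _ in range(M))  # inner MC estimate
        total += f(inner)
    return total / N

# Hypothetical instance of Example 1: d1 = d2 = 1, G(x) = x.
f, g = math.log, lambda y, x: math.exp(y * x)

# Reference value: midpoint rule on the closed-form inner integral (e^y - 1) / y.
K = 10_000
I_ref = sum(math.log(math.expm1(y) / y) for y in ((k + 0.5) / K for k in range(K))) / K

rng = random.Random(3)
est_M1 = dlmc(f, g, N=20_000, M=1, rng=rng)   # log(exp(yx)) = yx, so this targets 1/4
est_M64 = dlmc(f, g, N=4_000, M=64, rng=rng)  # inner-sampling bias is much smaller here
```

The total cost is $N\times M$ evaluations of $g$, which is why controlling the bias through $M$ dominates the DLMC work.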

Directly evaluating the function $g$ is often not possible. For example, if evaluating $g$ requires solving a PDE, we may only have access to a finite element method (FEM) approximation $g_{h}$ with discretization parameter $h$ . As $h\to 0$ asymptotically, the convergence order of $g_{h}$ is given by

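$$\mathbb{E}\left[|g(\boldsymbol{y},\boldsymbol{x})-g_{h}(\boldsymbol{y},\boldsymbol{x})|\right]=C_{\mathrm{disc}}h^{\eta}+\text{h.o.t.}, \tag{15}$$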

where $\eta>0$ is the $h$ -convergence rate and $C_{\rm{disc}}>0$ is a constant. The work of evaluating $g_{h}$ is assumed to be $\mathcal{O}(h^{-\gamma})$ , for some $\gamma>0$ .

Unless stated otherwise, we assume that a discretized $g_{h}$ is used in the numerical estimators and omit the subscript for concision. The DLMC estimator has a bias with the following upper bound:

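$$|\mathbb{E}[I_{\mathrm{DLMC}}]-I|\leq C_{\mathrm{disc}}h^{\eta}+\frac{C_{\mathrm{MC},3}}{M}+o(h^{\eta})+\mathcal{O}\left(\frac{1}{M^{2}}\right), \tag{16}$$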

where $C_{\rm{MC},3}>0$ is a constant related to the variance of the inner MC estimation in (14), and $C_{\rm{disc}}>0$ might be different from the one introduced in (15). The DLMC estimator has a variance with the following upper bound:

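$$\mathbb{V}[I_{\mathrm{DLMC}}]\leq\frac{C_{\mathrm{MC},1}}{N}+\frac{C_{\mathrm{MC},2}}{NM}+\mathcal{O}\left(\frac{1}{NM^{2}}\right), \tag{17}$$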

where $C_{\rm{MC},1},C_{\rm{MC},2}>0$ are constants [Bec18]. The optimal work of the DLMC estimator for a specified error tolerance $TOL>0$ is given by

$$W_{\mathrm{DLMC}}^{\ast}\propto TOL^{-\left(3+\frac{\gamma}{\eta}\right)}. \tag{18}$$

Proofs for the specific case of approximating the expected information gain (EIG) in Bayesian OED are presented in [Bec18], but adapting them to the general case is straightforward.

To obtain smaller error bounds and, subsequently, a smaller optimal work, we replace both MC approximations in (14) with RQMC approximations and arrive at the DLQMC estimator, which we define below.

Definition 2 (DLQMC estimator).

The DLQMC estimator of a nested integral (12) is defined as follows:

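$$I_{\mathrm{DLQ}}:=\frac{1}{N}\sum_{n=1}^{N} f\left(\frac{1}{M}\sum_{m=1}^{M} g(\boldsymbol{y}^{(n)},\boldsymbol{x}^{(n,m)})\right), \tag{19}$$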

where the square-integrable function $f:\mathbb{R}\to\mathbb{R}$ is nonlinear, and $g:[0,1]^{d_{1}}\times[0,1]^{d_{2}}\to\mathbb{R}$ is square-integrable. The sample points have the following shape (see (6)):

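$$\boldsymbol{y}^{(n)}=\{\boldsymbol{\xi}_{d_{1}}^{(n)},\boldsymbol{\rho}_{d_{1}}\},\qquad\boldsymbol{x}^{(n,m)}=\{\boldsymbol{\xi}_{d_{2}}^{(m)},\boldsymbol{\rho}_{d_{2}}^{(n)}\},\qquad 1\leq n\leq N,\ 1\leq m\leq M, \tag{20}$$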

where $\boldsymbol{\xi}_{d_{1}}\in[0,1]^{d_{1}}$ and $\boldsymbol{\xi}_{d_{2}}\in[0,1]^{d_{2}}$ .

One instance of the estimator (19) requires $N+1$ iid randomizations. The total number of function evaluations required to obtain a probabilistic error estimate using iid randomizations $\boldsymbol{\rho}^{(r)}\equiv\{\boldsymbol{\rho}_{d_{1}},\boldsymbol{\rho}_{d_{2}}^{(1)},\ldots,\boldsymbol{\rho}_{d_{2}}^{(N)}\}^{(r)}$, $1\leq r\leq R$, is $N\times M\times R$. The estimators (14) and (19) differ only in the points at which the integrand is evaluated.
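A single DLQMC replicate (19) with the randomization structure described above (one shift $\boldsymbol{\rho}_{d_1}$ for the outer points and one shift $\boldsymbol{\rho}_{d_2}^{(n)}$ per outer point, i.e., $N+1$ randomizations) might look as follows for the hypothetical test problem $f=\log$, $g(y,x)=\exp(yx)$ with $d_1=d_2=1$. As an assumption made for brevity, both loops use randomly shifted equispaced grids (one-dimensional lattice rules) instead of the scrambled Sobol points used in this work. $R$ replicates then give a mean and a sample variance at a cost of $N\times M\times R$ evaluations. The function name `dlqmc` is hypothetical.

```python
import math
import random
from statistics import fmean, variance

def dlqmc(f, g, N, M, rng):
    """One replicate of the DLQMC estimator (19) for d1 = d2 = 1, using randomly
    shifted equispaced grids in both loops and N + 1 iid shifts in total."""
    rho_y = rng.random()                       # outer shift rho_{d1}
    rho_x = [rng.random() for _ in range(N)]   # inner shifts rho_{d2}^(1..N)
    total = 0.0
    for n in range(N):
        y = (n / N + rho_y) % 1.0              # outer point y^(n)
        inner = fmean(g(y, (m / M + rho_x[n]) % 1.0) for m in range(M))  # points x^(n,m)
        total += f(inner)
    return total / N

f, g = math.log, lambda y, x: math.exp(y * x)
K = 10_000  # reference value from the closed-form inner integral (e^y - 1) / y
I_ref = sum(math.log(math.expm1(y) / y) for y in ((k + 0.5) / K for k in range(K))) / K

rng = random.Random(4)
R = 8
reps = [dlqmc(f, g, N=64, M=16, rng=rng) for _ in range(R)]  # N * M * R evaluations
I_dlq, V_dlq = fmean(reps), variance(reps)
```

Compared with an iid inner loop, the shifted inner grid has a much smaller conditional variance, so the Taylor-induced bias of Proposition 1 is correspondingly smaller for the same $M$.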

Next, we analyze the error of the DLQMC estimator. First, we split the error into the bias error and the statistical error and estimate each term separately. Specifically, these terms are

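$$|I_{\mathrm{DLQ}}-I|\leq\underbrace{|\mathbb{E}[I_{\mathrm{DLQ}}]-I|}_{\text{bias error}}+\underbrace{|I_{\mathrm{DLQ}}-\mathbb{E}[I_{\mathrm{DLQ}}]|}_{\text{statistical error}}. \tag{21}$$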

The CLT allows us to replace the statistical error with the variance $\mathbb{V}[I_{\rm{DLQ}}]$ .

Proposition 1 (Bias of the DLQMC estimator).

The DLQMC estimator for (12) has a bias with the following upper bound:

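$$|\mathbb{E}[I_{\mathrm{DLQ}}]-I|\leq C_{\mathrm{disc}}h^{\eta}+\frac{C_{\mathrm{Q},3}}{M^{(1+\delta)}}+\mathcal{O}(h^{\eta+1})+\mathcal{O}\left(\frac{1}{M^{2(1+\delta)}}\right), \tag{22}$$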

where $C_{\mathrm{Q},3},C_{\mathrm{disc}}>0$ , and $\eta$ is the weak rate defined in (16), which is induced by the approximate function $g_{h}$ . The parameter $0\leq\delta\leq 1$ depends on the dimension $d_{2}$ and may depend on the smoothness of $g$ .

Proof.

We start by introducing $I_{\rm{DLQ}}^{\rm{ex}}=\lim_{h\to 0^{+}}I_{\rm{DLQ}}$ , the DLQMC estimator for $g$ evaluated exactly, and further split the bias into

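$$|\mathbb{E}[I_{\mathrm{DLQ}}]-I|\leq\underbrace{|\mathbb{E}[I_{\mathrm{DLQ}}-I_{\mathrm{DLQ}}^{\mathrm{ex}}]|}_{\text{discretization bias}}+\underbrace{|\mathbb{E}[I_{\mathrm{DLQ}}^{\mathrm{ex}}]-I|}_{\text{inner sampling bias}}. \tag{23}$$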

For the discretization bias, we have

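$$|\mathbb{E}[I_{\mathrm{DLQ}}-I_{\mathrm{DLQ}}^{\mathrm{ex}}]|\leq C_{\mathrm{disc}}h^{\eta}+\mathcal{O}(h^{\eta+1}), \tag{24}$$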

where $\eta$ is the weak rate defined in (16).

For the bias from the inner sampling, we first define the following:

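$$I_{\mathrm{Q}}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}):=\frac{1}{M}\sum_{m=1}^{M} g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\},\{\boldsymbol{\xi}_{d_{2}}^{(m)},\boldsymbol{\rho}_{d_{2}}\}). \tag{25}$$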

Next, we use the second-order Taylor expansion of $f(X)$ for a random variable $X$ around $\mathbb{E}[X]$ ,

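$$f(X)=f(\mathbb{E}[X])+f'(\mathbb{E}[X])(X-\mathbb{E}[X])+\frac{1}{2}f''(\mathbb{E}[X])(X-\mathbb{E}[X])^{2}+\mathcal{O}(|X-\mathbb{E}[X]|^{3}), \tag{26}$$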

to Taylor expand $f(I_{Q}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}))$ around $\mathbb{E}[I_{Q}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})|\boldsymbol{\rho}_{d_{1}}]=g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})$ (with a slight abuse of notation)

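$$\begin{aligned}f(I_{\mathrm{Q}}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}))={}&f(g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}))+f'(g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}))\left(I_{\mathrm{Q}}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})-g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})\right)\\&+\frac{1}{2}f''(g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}))\left(I_{\mathrm{Q}}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})-g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})\right)^{2}\\&+\mathcal{O}\left(|I_{\mathrm{Q}}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})-g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})|^{3}\right).\end{aligned} \tag{27}$$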

Taking the expectation conditioned on $\boldsymbol{\rho}_{d_{1}}$ , we obtain

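$$\mathbb{E}[f(I_{\mathrm{Q}}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}))\,|\,\boldsymbol{\rho}_{d_{1}}]=\mathbb{E}[f(g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}))\,|\,\boldsymbol{\rho}_{d_{1}}]+\frac{1}{2}f''(g(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\}))\,\mathbb{V}[I_{\mathrm{Q}}(\{\boldsymbol{\xi}_{d_{1}},\boldsymbol{\rho}_{d_{1}}\})\,|\,\boldsymbol{\rho}_{d_{1}}]+\mathcal{O}\left(\frac{1}{M^{2(1+\delta)}}\right), \tag{28}$$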

where the higher-order term can be derived using the Bienaymé formula, resulting in

[TABLE]

∎
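The second-order Taylor step from (26) to (28) can be checked numerically. The sketch below assumes $f=\log$, a fixed outer point with $g(1,x)=e^{x}$, and iid inner samples (that is, $\delta=0$), since the expansion itself does not depend on how the inner points are generated. It compares the empirical bias $\mathbb{E}[f(X)]-f(\mathbb{E}[X])$ of the inner mean $X$ with the predicted leading term $\tfrac{1}{2}f''(\mathbb{E}[X])\,\mathbb{V}[X]$; all names and values are illustrative.

```python
import math
import random
from statistics import fmean

# Hypothetical setting: f = log, fixed outer point y = 1, so g(1, x) = exp(x) on [0, 1].
mu = math.e - 1                          # E[g] = int_0^1 e^x dx
var_g = (math.e ** 2 - 1) / 2 - mu ** 2  # V[g] = E[e^{2x}] - mu^2
M, K = 4, 100_000                        # inner sample size, number of replicates
var_inner = var_g / M                    # variance of the mean of M iid inner samples

# Leading Taylor term: (1/2) f''(mu) V[X], with f''(mu) = -1 / mu^2 for f = log.
predicted_bias = 0.5 * (-1.0 / mu ** 2) * var_inner

rng = random.Random(5)
empirical_bias = fmean(
    math.log(fmean(math.exp(rng.random()) for _ in range(M))) for _ in range(K)
) - math.log(mu)
ratio = empirical_bias / predicted_bias
```

The ratio should be close to 1, with the small remaining discrepancy coming from the higher-order terms and from sampling noise.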

The parameter $\delta$ can be estimated numerically along with $\tilde{C}_{\mathrm{Q},3}(\boldsymbol{\rho}_{d_{1}})$ for practical applications. In addition, the constant $C_{\mathrm{disc}}$ in (22) might be different from that in (16).
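One simple way to estimate $\delta$ numerically, sketched here under stated assumptions, is a least-squares fit of $\log\mathbb{V}[I_{\mathrm{Q}}]$ against $\log M$: by the rate in Section 2.2, $\mathbb{V}[I_{\mathrm{Q}}]\propto M^{-(1+\delta)}$, so the fitted slope estimates $-(1+\delta)$. The sketch uses a hypothetical fixed outer point $y=1$, the inner integrand $\exp(yx)$, randomly shifted equispaced grids, and $R=64$ shifts per $M$; the helper name `inner_rqmc_variance` is ours.

```python
import math
import random
from statistics import fmean, variance

def inner_rqmc_variance(y, M, R, rng):
    """Sample variance over R iid shifts of the shifted-grid estimate of
    int_0^1 exp(y * x) dx, i.e., of the inner estimator for a fixed outer point y."""
    estimates = []
    for _ in range(R):
        rho = rng.random()
        estimates.append(fmean(math.exp(y * ((m / M + rho) % 1.0)) for m in range(M)))
    return variance(estimates)

rng = random.Random(6)
Ms = [8, 16, 32, 64, 128]
log_M = [math.log(M) for M in Ms]
log_V = [math.log(inner_rqmc_variance(1.0, M, R=64, rng=rng)) for M in Ms]

# Least-squares slope of log V against log M; V ~ C / M^(1 + delta).
mx, my = fmean(log_M), fmean(log_V)
slope = sum((a - mx) * (b - my) for a, b in zip(log_M, log_V)) / sum((a - mx) ** 2 for a in log_M)
delta_hat = -slope - 1
```

For this smooth one-dimensional integrand, the fitted $\delta$ should come out close to 1; rougher or higher-dimensional integrands would typically give smaller values.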

Proposition 2 (Variance of the DLQMC estimator).

The DLQMC estimator for (12) has a variance with the following upper bound:

[TABLE]

where $C_{\mathrm{Q},1},C_{\mathrm{Q},2}>0$ are constants and $0\leq\beta\leq 1$ depends on the dimension $d_{1}$ and may depend on the smoothness of $f$ , $0\leq\delta\leq 1$ depends on the dimension $d_{2}$ and may depend on the smoothness of $g$ , and $0\leq\tilde{\beta}\leq 1$ depends on the dimension $d_{2}$ and may depend on the smoothness of the higher-order terms.

Proof.

By the law of total variance, we have

[TABLE]

Using (28) for the first term yields

[TABLE]

Only using (3) up to the first order for the second term in (31) results in

[TABLE]

∎

We expect $\beta$ , $\delta$ , and $\tilde{\beta}$ to be different from each other in general, as the approximated integrands might have different smoothness properties and dimensions.

With the error bounds established in Propositions 1 and 2, we can analyze the work required for the DLQMC estimator in terms of the numbers of samples and the discretization parameter of the approximate model. We assume that the work for each model evaluation is $\mathcal{O}(h^{-\gamma})$.

Proposition 3 (Optimal work of the DLQMC estimator).

The total work of the optimized DLQMC estimator for a specified error tolerance $TOL>0$ is given by

[TABLE]

as $TOL\to 0$ , where $0\leq\beta\leq 1$ depends on the dimension $d_{1}$ and may depend on the smoothness of $f$ , and $0\leq\delta\leq 1$ depends on the dimension $d_{2}$ and may depend on the smoothness of $g$ .

Proof.

The computational work of the DLQMC estimator is

[TABLE]

where $h^{-\gamma}$ is proportional to the work required for evaluating $g_{h}$ for discretization parameter $h$ . The CLT allows us to approximately bound the statistical error (30) above in probability. We obtain the optimal setting by solving

[TABLE]

for $C_{\alpha}=\Phi^{-1}(1-\frac{\alpha}{2})$ , the inverse cdf of the standard normal at confidence level $1-\alpha$ , and $TOL>0$ (the allotted tolerance). In addition, $\kappa\in(0,1)$ is an error-splitting parameter.

We can solve this problem using Lagrange multipliers to derive the optimal $M^{\ast}$ and $h^{\ast}$ in terms of $\kappa$ and $N$ . The equation for $\kappa^{\ast}$ is cubic and has a closed-form solution, but it is unwieldy to state explicitly here. The last remaining equation for $N^{\ast}$ unfortunately has no closed-form solution for $0<\beta<1$ , so we must solve a simplified version and demonstrate that the resulting solution converges to the true solution as $TOL\to 0$ . The optimal values are given by

[TABLE]

The optimal $\kappa^{\ast}$ is given by the real root of

[TABLE]

Finally, the optimal $N^{\ast}$ is given by the solution to

[TABLE]

It immediately follows from (36) that such an $N^{\ast}$ exists. However, it is only given in closed form for the cases $\beta\in\{0,1\}$ .

Assuming that the optimal discretization parameter $h^{\ast}$ is independent of the sampling method (MC or RQMC), we split the bias constraint (22) as follows:

[TABLE]

to derive asymptotic rates in terms of the tolerance $TOL$. A more elaborate splitting might improve the constants; this simple splitting serves only the asymptotic analysis. Together with (38) and (37), this implies that

[TABLE]

and

[TABLE]

Next, we demonstrate that $N\propto TOL^{-\frac{2}{(1+\beta)}}$ . From the variance constraint (30), we obtain

[TABLE]

The term on the right-hand side of (45) is constant in $TOL$ . If we ignore the second term on the left-hand side, we can solve the equation (45) and obtain the approximate solution:

[TABLE]

To determine that this approximation converges to the true solution, we check that the ignored term in (45) approaches 0 as $TOL\to 0$ as we insert (46). For this term, we have

[TABLE]

where the exponent of $TOL$ is between 0 and 1 because $0<\beta<1$ . Thus, this term approaches 0 as $TOL\to 0$ . In contrast, if we ignored the first term in (45), the approximate solution would be $N\approx-(C_{\rm{Q},1}/(1-\kappa))^{1/\beta}TOL^{-1/\beta}$ . Inserting this solution, the first term in (45) has an exponent of $TOL$ between negative infinity and 0; thus, it approaches negative infinity as $TOL\to 0$ . ∎

The constants $C_{\rm{Q},1}$ , $C_{\rm{Q},2}$ , and $C_{\rm{Q},3}$ can be estimated using $R$ random shifts. Rather than using the approximate solution (46), we can also solve the equation (40) numerically.

Remark 2 (Borderline settings).

In the worst case, $\beta=\delta=0$, we recover the optimal DLMC work $W_{\mathrm{DLMC}}^{\ast}\propto TOL^{-(3+\frac{\gamma}{\eta})}$, as presented in (18). In the best case, $\beta=\delta=1$, we obtain the optimal DLQMC work $W_{\mathrm{DLQ}}^{\ast}\propto TOL^{-(\frac{3}{2}+\frac{\gamma}{\eta})}$, which reduces the exponent of $TOL$ by 3/2 compared to the DLMC method.
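As a consistency check on Remark 2, and assuming that the computational work scales as $W\propto NMh^{-\gamma}$ and that each leading bias term of Proposition 1 must be of order $TOL$, the leading-order rates can be combined as follows, with $N\propto TOL^{-\frac{2}{1+\beta}}$ as shown in the proof of Proposition 3. This is our reading of the asymptotics, not a restatement of the exact expression in Proposition 3:

```latex
\begin{aligned}
C_{\mathrm{disc}}\,h^{\eta} \propto TOL
  &\;\Longrightarrow\; h^{-\gamma} \propto TOL^{-\gamma/\eta},\\
\frac{C_{\mathrm{Q},3}}{M^{1+\delta}} \propto TOL
  &\;\Longrightarrow\; M \propto TOL^{-\frac{1}{1+\delta}},\\
W_{\mathrm{DLQ}}^{\ast} \propto N\,M\,h^{-\gamma}
  &\propto TOL^{-\left(\frac{2}{1+\beta}+\frac{1}{1+\delta}+\frac{\gamma}{\eta}\right)}.
\end{aligned}
```

Setting $\beta=\delta=0$ gives the exponent $3+\gamma/\eta$, and $\beta=\delta=1$ gives $3/2+\gamma/\eta$, matching both borderline cases above.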
