Adaptive Quasi-Monte Carlo Methods for Cubature

Fred J. Hickernell; Lluís Antoni Jiménez Rugama; Da Li

arXiv:1702.01491 · math.NA · June 27, 2017


TL;DR

This paper introduces adaptive quasi-Monte Carlo methods for high-dimensional integrals, providing error bounds, generalizations for error tolerances, and techniques for using control variates to improve accuracy.

Contribution

It develops theoretically justified adaptive cubature algorithms based on digital and lattice sequences, extending error criteria and incorporating control variates.

Findings

1. Data-based error bounds depend on the discrete Fourier transforms of the integrands.

2. The methods accommodate both absolute and relative error tolerances.

3. The methods extend to estimating functions of several integrals, such as Sobol' indices.


Tables

Table 1: Examples of the tolerance function in (6p) and the optimal approximation to the integral when $p=1$ and $v(\mu)=\mu$.

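\begin{array}{llll}
\text{Kind} & \textup{tol}(\mu,\hat{v},\varepsilon_{\textrm{a}},\varepsilon_{\textrm{r}}) & \text{Optimal }\hat{v} & \text{Optimal }\textup{tol}(\mu,\hat{v},\varepsilon_{\textrm{a}},\varepsilon_{\textrm{r}}) \\ \hline
\text{Absolute } (\varepsilon_{\textrm{r}}=0) & \dfrac{(\mu-\hat{v})^{2}}{\varepsilon_{\textrm{a}}^{2}} & \widehat{\mu}_{n} & \dfrac{\textup{err}_{n}^{2}}{\varepsilon_{\textrm{a}}^{2}} \\[2ex]
\text{Relative } (\varepsilon_{\textrm{a}}=0) & \dfrac{(\mu-\hat{v})^{2}}{\varepsilon_{\textrm{r}}^{2}\mu^{2}} & \dfrac{\max(\widehat{\mu}_{n}^{2}-\textup{err}_{n}^{2},0)}{\widehat{\mu}_{n}} & \dfrac{\textup{err}_{n}^{2}}{\varepsilon_{\textrm{r}}^{2}\max(\widehat{\mu}_{n}^{2},\textup{err}_{n}^{2})} \\[2ex]
\text{Hybrid} & \dfrac{(\mu-\hat{v})^{2}}{\max(\varepsilon_{\textrm{a}}^{2},\varepsilon_{\textrm{r}}^{2}\mu^{2})} & \text{see (6z)} & \text{see (6ac)}
\end{array}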

Equations

\mu=\int_{[0,1)^{d}}f(\bm{x})\,\mathrm{d}\bm{x},

\lvert\mu-\widehat{\mu}_{n}\rvert\leq\varepsilon,\quad\text{where }\widehat{\mu}_{n}=\frac{1}{n}\sum_{i=0}^{n-1}f(\bm{x}_{i}),\quad f\in\mathcal{C}.

\lvert\mu-\widehat{\mu}_{n}\rvert\leq D(\{\bm{x}_{i}\}_{i=0}^{n-1})\,\lVert f\rVert,

\widehat{\mu}_{n,R}=\frac{1}{R}\sum_{r=1}^{R}\widehat{\mu}_{n}^{(r)},\quad\widehat{\mu}_{n}^{(r)}=\frac{1}{n}\sum_{i=0}^{n-1}f(\bm{x}_{i}^{(r)}),\quad r=1,\ldots,R,

\widehat{\mu}_{nR}=\frac{1}{R}\sum_{r=1}^{R}\widehat{\mu}_{n}^{(r)}=\frac{1}{nR}\sum_{i=0}^{nR-1}f(\bm{x}_{i}),\quad\widehat{\mu}_{n}^{(r)}=\frac{1}{n}\sum_{i=(r-1)n}^{rn-1}f(\bm{x}_{i}),\quad r=1,\ldots,R.

\widehat{\mu}_{n,R}=\frac{1}{R}\sum_{r=1}^{R}\widehat{\mu}_{n}^{(r)},\quad\widehat{\mu}_{n}^{(r)}=\frac{1}{n}\sum_{i=0}^{n-1}f(\bm{x}_{i,(r-1)d+1:rd}),\quad r=1,\ldots,R,

f(\bm{x})=\sum_{\bm{k}\in\mathbb{K}}\hat{f}(\bm{k})\,\mathrm{e}^{2\pi\sqrt{-1}\langle\bm{k},\bm{x}\rangle}\quad\forall\bm{x}\in[0,1)^{d},\quad f\in L^{2}[0,1)^{d},\quad\text{where }\hat{f}(\bm{k}):=\int_{[0,1)^{d}}f(\bm{x})\,\mathrm{e}^{-2\pi\sqrt{-1}\langle\bm{k},\bm{x}\rangle}\,\mathrm{d}\bm{x}.

\mathbb{K}_{0}:=\mathbb{K},\quad\mathbb{K}_{m}:=\{\bm{k}\in\mathbb{K}:\langle\bm{k},\bm{z}_{2^{\ell}}\rangle=0\text{ for all }\ell=0,\ldots,m-1\},\quad m\in\mathbb{N}.

\frac{1}{2^{m}}\sum_{i=0}^{2^{m}-1}\mathrm{e}^{2\pi\sqrt{-1}\langle\bm{k},\bm{z}_{i}\rangle}=\begin{cases}1, & \bm{k}\in\mathbb{K}_{m},\\ 0, & \text{otherwise}.\end{cases}

\tilde{f}_{m}(\bm{k})=\hat{f}(\bm{k})+\sum_{\bm{l}\in\mathbb{K}_{m}\setminus\{\bm{0}\}}\hat{f}(\bm{k}\oplus\bm{l})\,\mathrm{e}^{2\pi\sqrt{-1}\langle\bm{l},\bm{\Delta}\rangle},

\widehat{\mu}_{n}=\frac{1}{2^{m}}\sum_{i=0}^{2^{m}-1}f(\bm{x}_{i})=\tilde{f}_{m}(\bm{0})=\sum_{\bm{l}\in\mathbb{K}_{m}}\hat{f}(\bm{l})\,\mathrm{e}^{2\pi\sqrt{-1}\langle\bm{l},\bm{\Delta}\rangle}.

\lvert\mu-\widehat{\mu}_{n}\rvert=\bigl\lvert\hat{f}(\bm{0})-\tilde{f}_{m}(\bm{0})\bigr\rvert=\biggl\lvert\sum_{\bm{l}\in\mathbb{K}_{m}\setminus\{\bm{0}\}}\hat{f}(\bm{l})\,\mathrm{e}^{2\pi\sqrt{-1}\langle\bm{l},\bm{\Delta}\rangle}\biggr\rvert\leq\sum_{\bm{l}\in\mathbb{K}_{m}\setminus\{\bm{0}\}}\bigl\lvert\hat{f}(\bm{l})\bigr\rvert.

\lvert\mu-\widehat{\mu}_{n}\rvert\leq\sum_{\lambda=1}^{\infty}\bigl\lvert\hat{f}_{\lambda 2^{m}}\bigr\rvert.

\mathcal{C}=\Bigl\{f\in AC([0,1)^{d}):\ \widehat{S}_{\ell,m}(f)\leq\widehat{\omega}(m-\ell)\,\widecheck{S}_{m}(f),\ \ell\leq m;\ \ \widecheck{S}_{m}(f)\leq\mathring{\omega}(m-\ell)\,S_{\ell}(f),\ \ell_{*}\leq\ell\leq m\Bigr\},

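S_{m}(f):=\sum_{\kappa=\lfloor 2^{m-1}\rfloor}^{2^{m}-1}\bigl\lvert\hat{f}_{\kappa}\bigr\rvert,\qquad\widehat{S}_{\ell,m}(f):=\sum_{\kappa=\lfloor 2^{\ell-1}\rfloor}^{2^{\ell}-1}\sum_{\lambda=1}^{\infty}\bigl\lvert\hat{f}_{\kappa+\lambda 2^{m}}\bigr\rvert,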

\widecheck{S}_{m}(f):=\widehat{S}_{0,m}(f)+\cdots+\widehat{S}_{m,m}(f)=\sum_{\kappa=2^{m}}^{\infty}\bigl\lvert\hat{f}_{\kappa}\bigr\rvert

\sum_{\lambda=1}^{\infty}\bigl\lvert\hat{f}_{\lambda 2^{m}}\bigr\rvert=\widehat{S}_{0,m}(f)\leq\widehat{\omega}(m)\,\widecheck{S}_{m}(f)\leq\widehat{\omega}(m)\,\mathring{\omega}(r)\,S_{m-r}(f),\qquad m\geq r+\ell_{*}\geq\ell_{*}.

\widetilde{S}_{\ell,m}(f):=\sum_{\kappa=\lfloor 2^{\ell-1}\rfloor}^{2^{\ell}-1}\bigl\lvert\tilde{f}_{m,\kappa}\bigr\rvert.

\begin{aligned}
\widetilde{S}_{m-r,m}(f)&\geq\sum_{\kappa=\lfloor 2^{m-r-1}\rfloor}^{2^{m-r}-1}\biggl[\bigl\lvert\hat{f}_{\kappa}\bigr\rvert-\sum_{\lambda=1}^{\infty}\bigl\lvert\hat{f}_{\kappa+\lambda 2^{m}}\bigr\rvert\biggr]\\
&=S_{m-r}(f)-\widehat{S}_{m-r,m}(f)\\
&\geq S_{m-r}(f)\bigl[1-\widehat{\omega}(r)\,\mathring{\omega}(r)\bigr].
\end{aligned}

\lvert\mu-\widehat{\mu}_{n}\rvert\leq\textup{err}_{n}:=\mathfrak{C}(m)\,\widetilde{S}_{m-r,m}(f),\qquad\mathfrak{C}(m):=\frac{\widehat{\omega}(m)\,\mathring{\omega}(r)}{1-\widehat{\omega}(r)\,\mathring{\omega}(r)},\qquad m\geq\ell_{*}+r,

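\begin{aligned}
S_{\ell}(f)&\geq\sum_{\kappa=\lfloor 2^{\ell-1}\rfloor}^{2^{\ell}-1}\biggl[\bigl\lvert\tilde{f}_{m,\kappa}\bigr\rvert-\sum_{\lambda=1}^{\infty}\bigl\lvert\hat{f}_{\kappa+\lambda 2^{m}}\bigr\rvert\biggr]\\
&=\widetilde{S}_{\ell,m}(f)-\widehat{S}_{\ell,m}(f)\\
&\geq\widetilde{S}_{\ell,m}(f)-\widehat{\omega}(m-\ell)\,\mathring{\omega}(m-\ell)\,S_{\ell}(f)
\quad\Longrightarrow\quad S_{\ell}(f)\geq\frac{\widetilde{S}_{\ell,m}(f)}{1+\widehat{\omega}(m-\ell)\,\mathring{\omega}(m-\ell)}.
\end{aligned}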

\frac{\widetilde{S}_{\ell,m}(f)}{1+\widehat{\omega}(m-\ell)\,\mathring{\omega}(m-\ell)}\leq S_{\ell}(f)\leq\frac{\widetilde{S}_{\ell,m'}(f)}{1-\widehat{\omega}(m'-\ell)\,\mathring{\omega}(m'-\ell)}.

\textup{err}_{n}\leq\frac{\widehat{\omega}(m)\,\mathring{\omega}(r)}{1-\widehat{\omega}(r)\,\mathring{\omega}(r)}\bigl[1+\widehat{\omega}(r)\,\mathring{\omega}(r)\bigr]S_{m-r}(f).

m^{*}:=\min\biggl\{m\geq\ell_{*}+r:\ \frac{\widehat{\omega}(m)\,\mathring{\omega}(r)}{1-\widehat{\omega}(r)\,\mathring{\omega}(r)}\bigl[1+\widehat{\omega}(r)\,\mathring{\omega}(r)\bigr]S_{m-r}(f)\leq\varepsilon\biggr\},

\Phi(m+1)-\Phi(m)=\$(f)\,2^{-m-1}+2^{-m}+\cdots+2^{0}\leq\$(f)\,2^{-m-1}+2\quad\text{for all }m\in\mathbb{N}_{0},

\Phi(m^{*})=\Phi(0)+\sum_{m=0}^{m^{*}-1}\bigl[\Phi(m+1)-\Phi(m)\bigr]\leq\$(f)+\bigl[\$(f)\,2^{-1}+2\bigr]+\cdots+\bigl[\$(f)\,2^{-m^{*}}+2\bigr]\leq 2\bigl[\$(f)+m^{*}\bigr]

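\sup_{\bm{\mu}\in\Omega\cap[\widehat{\bm{\mu}}_{n}-\textbf{err}_{n},\,\widehat{\bm{\mu}}_{n}+\textbf{err}_{n}]}\textup{tol}\bigl(v(\bm{\mu}),\hat{v},\varepsilon_{\textrm{a}},\varepsilon_{\textrm{r}}\bigr)\leq 1,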



Full text


Fred J. Hickernell (✉), Lluís Antoni Jiménez Rugama, Da Li: Illinois Institute of Technology, RE 208, 10 W. 32nd St., Chicago, USA


Abstract

High-dimensional integrals can be approximated well by quasi-Monte Carlo methods. However, determining the number of function values needed to obtain the desired accuracy is difficult without some upper bound on an appropriate semi-norm of the integrand. This challenge has motivated our recent development of theoretically justified, adaptive cubatures based on digital sequences and lattice nodeset sequences. Our adaptive cubatures are based on error bounds that depend on the discrete Fourier transforms of the integrands. These cubatures are guaranteed for integrands belonging to cones of functions whose true Fourier coefficients decay steadily, a notion that is made mathematically precise. Here we describe these new cubature rules and extend them in two directions. First, we generalize the error criterion to allow both absolute and relative error tolerances. We also demonstrate how to estimate a function of several integrals to a given tolerance. This situation arises in the computation of Sobol’ indices. Second, we describe how to use control variates in adaptive quasi-Monte Carlo cubature while appropriately estimating the control variate coefficient.

1 Introduction

An important problem studied by Ian Sloan is evaluating multivariate integrals by quasi-Monte Carlo methods. After a possible change of variables, one may pose the problem as constructing an accurate approximation to

$$\mu=\int_{[0,1)^{d}}f({\bm{x}})\,\mathrm{d}{\bm{x}},$$

given a black-box function $f$ that provides $f({\bm{x}})$ for any ${\bm{x}}\in[0,1)^{d}$ . Multivariate integrals arise in applications such as evaluating financial risk, computing multivariate probabilities, statistical physics, and uncertainty quantification.

We have developed and implemented quasi-Monte Carlo (qMC) cubature algorithms that adaptively determine the sample size needed to guarantee that an error tolerance is met provided that the integrand belongs to a cone ${\mathcal{C}}$ of well-behaved functions ChoEtal15a ; HicJim16a ; Jim16a ; JimHic16a . That is, given a low discrepancy sequence ${\bm{x}}_{0},{\bm{x}}_{1},\ldots$ and function data $f({\bm{x}}_{0}),f({\bm{x}}_{1}),\ldots$ , we have a stopping rule based on the function data obtained so far that chooses $n$ for which

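$$\lvert\mu-\widehat{\mu}_{n}\rvert\leq\varepsilon,\quad\text{where }\widehat{\mu}_{n}=\frac{1}{n}\sum_{i=0}^{n-1}f({\bm{x}}_{i}),\quad f\in{\mathcal{C}}. \tag{1}$$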

Here, $\widehat{\mu}_{n}$ is the sample average of function values taken at well-chosen points whose empirical distribution mimics the uniform distribution. The cone ${\mathcal{C}}$ contains integrands whose Fourier coefficients decay in a reasonable manner, thus allowing the stopping rule to succeed. Specifically, the size of the high wavenumber components of an integrand in ${\mathcal{C}}$ cannot be large in comparison to the size of the low wavenumber components. Rather than choosing the ${\bm{x}}_{i}$ to be independent and identically distributed (IID) $\mathcal{U}[0,1)^{d}$ points, we use shifted digital sequences DicPil10a ; Nie92 and sequences of nodesets of shifted rank- $1$ lattices HicEtal00 ; Mai81a ; MaiSepSpa10a ; SloJoe94 . Sequences that are more evenly distributed than IID points are the hallmark of qMC algorithms.

Traditional qMC error analysis leads to error bounds of the form DicEtal14a ; Hic97a

$$\lvert\mu-\widehat{\mu}_{n}\rvert\leq D(\{{\bm{x}}_{i}\}_{i=0}^{n-1})\,\lVert f\rVert,$$

where the integrand, $f$ , is assumed to lie in some Banach space with (semi-)norm $\lVert\cdot\rVert$ , and $\lVert f\rVert$ is often called the variation of $f$ . Moreover, the discrepancy $D(\cdot)$ is a measure of quality of the sample, $\{{\bm{x}}_{i}\}_{i=0}^{n-1}$ . For integrands lying in the ball ${\mathcal{B}}:=\{f:\lVert f\rVert\leq\sigma\}$ one may construct a non-adaptive algorithm guaranteeing $\left\lvert\mu-\widehat{\mu}_{n}\right\rvert\leq\varepsilon$ by choosing $n=\min\bigl{\{}n^{\prime}\in{\mathbb{N}}:D(\{{\bm{x}}_{i}\}_{i=0}^{n^{\prime}-1})\leq\varepsilon/\sigma\bigr{\}}$ .
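As a minimal sketch of this non-adaptive rule (not part of the authors' algorithms), take $d=1$, the Koksma-Hlawka setting with the star discrepancy and total variation, and the centered grid $x_i=(2i+1)/(2n)$, whose star discrepancy is $1/(2n)$. The rule then reduces to $n=\lceil\sigma/(2\varepsilon)\rceil$. The function names are hypothetical. Note that $\sigma$ must be known in advance, which is exactly what is hard to guarantee in practice.

```python
import math


def nonadaptive_sample_size(sigma, eps):
    """Smallest n with D(n) = 1/(2n) <= eps/sigma for the 1-D centered grid."""
    # D(n) decreases in n, so the minimal n solves 1/(2n) <= eps/sigma.
    return math.ceil(sigma / (2.0 * eps))


def centered_grid_mean(f, n):
    """Equal-weight cubature on the centered grid x_i = (2i+1)/(2n)."""
    return sum(f((2 * i + 1) / (2 * n)) for i in range(n)) / n


# f(x) = x^2 has total variation 1 on [0, 1], so sigma = 1 bounds its variation.
sigma, eps = 1.0, 1e-3
n = nonadaptive_sample_size(sigma, eps)
err = abs(centered_grid_mean(lambda x: x * x, n) - 1.0 / 3.0)
print(n, err)
```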

Our interest is in adaptive qMC algorithms, where $n$ depends on the function data observed. Several heuristics have been proposed for choosing $n$:

Independent and identically distributed (IID) replications. Owe99a

Compute

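$$\widehat{\mu}_{n,R}=\frac{1}{R}\sum_{r=1}^{R}\widehat{\mu}_{n}^{(r)},\quad\widehat{\mu}_{n}^{(r)}=\frac{1}{n}\sum_{i=0}^{n-1}f({\bm{x}}_{i}^{(r)}),\quad r=1,\ldots,R,$$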

where $\bigl{\{}{\bm{x}}_{i}^{(1)}\bigr{\}}_{i=0}^{\infty},\ldots,\bigl{\{}{\bm{x}}_{i}^{(R)}\bigr{\}}_{i=0}^{\infty}$ are IID randomizations of a low discrepancy sequence, and ${\mathbb{E}}\bigl{(}\widehat{\mu}^{(r)}_{n}\bigr{)}=\mu$. The standard deviation of these $\widehat{\mu}^{(r)}_{n}$, perhaps multiplied by an inflation factor, is proposed as an upper bound for $\lvert\mu-\widehat{\mu}_{n,R}\rvert$.
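Here is a minimal sketch of the IID-replications heuristic. It assumes random uniform shifts of a rank-1 lattice as the randomization. The generating vector $(1,433)$, $n=1024$, $R=8$, and the inflation factor $2$ are illustrative choices, not recommendations from the cited work, and the function names are hypothetical. As the text notes, the resulting error estimate carries no guarantee.

```python
import random
import statistics


def shifted_lattice(n, z, shift):
    """Nodes {i*z/n + shift} mod 1 of a randomly shifted rank-1 lattice."""
    return [[(i * zj / n + sj) % 1.0 for zj, sj in zip(z, shift)] for i in range(n)]


def iid_replications(f, n, z, R, rng, inflation=2.0):
    """Replicate means from R independent uniform shifts of one lattice."""
    reps = []
    for _ in range(R):
        shift = [rng.random() for _ in z]  # independent U[0,1)^d shift
        reps.append(statistics.fmean(f(x) for x in shifted_lattice(n, z, shift)))
    mu_nR = statistics.fmean(reps)
    # Heuristic (not guaranteed) error estimate: inflated std of the replicates.
    return mu_nR, inflation * statistics.stdev(reps)


rng = random.Random(2017)
mu_nR, err_hat = iid_replications(lambda x: x[0] * x[1], 1024, (1, 433), 8, rng)
print(mu_nR, err_hat)
```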

Internal replications. Owe99a

Compute

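$$\widehat{\mu}_{nR}=\frac{1}{R}\sum_{r=1}^{R}\widehat{\mu}_{n}^{(r)}=\frac{1}{nR}\sum_{i=0}^{nR-1}f({\bm{x}}_{i}),\quad\widehat{\mu}_{n}^{(r)}=\frac{1}{n}\sum_{i=(r-1)n}^{rn-1}f({\bm{x}}_{i}),\quad r=1,\ldots,R.$$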

The standard deviation of these $\widehat{\mu}^{(r)}_{n}$, perhaps multiplied by an inflation factor, is proposed as an upper bound for $\lvert\mu-\widehat{\mu}_{nR}\rvert$.
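For internal replications, a single sequence is split into consecutive blocks. The sketch below uses the one-dimensional base-2 van der Corput sequence as an assumed example of a low discrepancy sequence. The function names are hypothetical.

```python
import statistics


def van_der_corput(i):
    """Base-2 radical inverse: mirror the binary digits of i about the binary point."""
    x, denom = 0.0, 1.0
    while i:
        denom *= 2.0
        x += (i & 1) / denom
        i >>= 1
    return x


def internal_replications(f, n, R):
    """Split the first n*R points of one sequence into R consecutive blocks of n."""
    reps = [
        statistics.fmean(f(van_der_corput(i)) for i in range((r - 1) * n, r * n))
        for r in range(1, R + 1)
    ]
    # With equal block sizes, the mean of the block means is the mean of all n*R points.
    return statistics.fmean(reps), statistics.stdev(reps)


mu_nR, spread = internal_replications(lambda x: x * x, n=256, R=8)
print(mu_nR, spread)
```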

Quasi-standard error. Hal05a

Compute

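$$\widehat{\mu}_{n,R}=\frac{1}{R}\sum_{r=1}^{R}\widehat{\mu}_{n}^{(r)},\quad\widehat{\mu}_{n}^{(r)}=\frac{1}{n}\sum_{i=0}^{n-1}f({\bm{x}}_{i,(r-1)d+1:rd}),\quad r=1,\ldots,R,$$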

where $\{{\bm{x}}_{i}\}_{i=0}^{\infty}$ is now an $Rd$-dimensional sequence, and ${\bm{x}}_{i,(r-1)d+1:rd}$ denotes the $(r-1)d+1^{\text{st}}$ through $rd^{\text{th}}$ components of the $i^{\text{th}}$ point in the sequence. The standard deviation of these $\widehat{\mu}^{(r)}_{n}$, perhaps multiplied by an inflation factor, is proposed as an upper bound for $\lvert\mu-\widehat{\mu}_{n,R}\rvert$. However, see Owe06a for cautions regarding this method.
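Here is a corresponding sketch of the quasi-standard error. It assumes the Halton sequence (one prime base per coordinate) as the $Rd$-dimensional sequence. That choice, the helper names, and the test integrand are illustrative assumptions.

```python
import statistics


def radical_inverse(i, base):
    """Van der Corput radical inverse of the integer i in the given base."""
    x, denom = 0.0, 1.0
    while i:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x


def first_primes(k):
    """The first k primes, by trial division against the primes found so far."""
    primes, c = [], 2
    while len(primes) < k:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes


def quasi_standard_error(f, n, d, R):
    """Replicate r uses coordinates (r-1)d+1..rd of one Rd-dimensional Halton sequence."""
    bases = first_primes(R * d)
    reps = []
    for r in range(R):
        block = bases[r * d:(r + 1) * d]
        reps.append(statistics.fmean(
            f([radical_inverse(i, b) for b in block]) for i in range(n)))
    return statistics.fmean(reps), statistics.stdev(reps)


mu_nR, spread = quasi_standard_error(lambda x: x[0] * x[1], n=1000, d=2, R=4)
print(mu_nR, spread)
```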

None of the above methods has a theoretical justification. Since the proposed error bounds are homogeneous, it is clear that the sets of integrands for which these error bounds are correct are cones. That is, if one of the above error bounds is correct for integrand $f$, it is also correct for integrand $cf$, where $c$ is an arbitrary constant. Unfortunately, there is no theorem defining a cone ${\mathcal{C}}$ for which any of the above error bounds must succeed.

In this article we review our recent work developing adaptive qMC algorithms satisfying (1). We describe the cones ${\mathcal{C}}$ for which our algorithms succeed. We also extend our earlier algorithms in two directions:

•

Meeting more general error criteria than simply absolute error, and

•

Using control variates to improve efficiency.

Our data-based cubature error bounds are described in Sec. 2. This section also emphasizes the similar algebraic structures of our two families of qMC sequences. In Sec. 3, we describe how our error bounds can be used to satisfy error criteria that are more general than that in (1). Sec. 4 describes the implementation of our new adaptive qMC algorithms and provides numerical examples. The use of control variates in adaptive qMC cubature is described in Sec. 5. We conclude with a discussion that identifies problems for further research.

2 Error Estimation for Digital Net and Lattice Cubature

Here we summarize some of the key properties of cubature based on digital sequences and rank- $1$ lattice node sequences. We use a common notation for both cases to highlight the similarities in analysis. We focus on the base $2$ setting for simplicity and because it is most common in practice. Moreover, $n=2^{m}$ for non-negative integer $m$ . See HicJim16a and JimHic16a for more details.

Let $\{{\bm{0}}={\bm{z}}_{0},{\bm{z}}_{1},\ldots\}$ be a sequence of distinct points that is either a digital sequence or a rank- $1$ lattice node sequence. Let $\oplus:[0,1)^{d}\times[0,1)^{d}\to[0,1)^{d}$ denote an addition operator under which the sequence is a group and the first $2^{m}$ points form a subgroup. For some shift, ${\bm{\Delta}}\in[0,1)^{d}$ , the data sites used for cubature in (1) are given by ${\bm{x}}_{i}={\bm{z}}_{i}\oplus{\bm{\Delta}}$ for all $i\in{\mathbb{N}}_{0}$ . Typical examples of a digital sequence and a rank- $1$ lattice node sequence are given in Fig. 1.

There is a set of integer vector wavenumbers, ${\mathbb{K}}$, which is a group under its own addition operator, also denoted $\oplus$. There is also a bilinear functional, $\langle\cdot,\cdot\rangle:{\mathbb{K}}\times[0,1)^{d}\to{\mathbb{R}}$, which is used to define a Fourier basis for $L^{2}[0,1)^{d}$, given by $\bigl{\{}\mathrm{e}^{2\pi\sqrt{-1}\langle{\bm{k}},\cdot\rangle}\bigr{\}}_{{\bm{k}}\in{\mathbb{K}}}$. The integrand is expressed as a Fourier series,

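$$f({\bm{x}})=\sum_{{\bm{k}}\in{\mathbb{K}}}\hat{f}({\bm{k}})\,\mathrm{e}^{2\pi\sqrt{-1}\langle{\bm{k}},{\bm{x}}\rangle}\quad\forall{\bm{x}}\in[0,1)^{d},\quad f\in L^{2}[0,1)^{d},\quad\text{where }\hat{f}({\bm{k}}):=\int_{[0,1)^{d}}f({\bm{x}})\,\mathrm{e}^{-2\pi\sqrt{-1}\langle{\bm{k}},{\bm{x}}\rangle}\,\mathrm{d}{\bm{x}}.$$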

Since we require function values for cubature, we assume throughout that this Fourier series is absolutely convergent, i.e., $\sum_{{\bm{k}}\in{\mathbb{K}}}\lvert\hat{f}({\bm{k}})\rvert<\infty$ .

In the case of digital sequences, $\oplus$ denotes digit-wise addition modulo $2$ for points in $[0,1)^{d}$ and wavenumbers in ${\mathbb{K}}={\mathbb{N}}_{0}^{d}$. The digits of ${\bm{z}}_{1},{\bm{z}}_{2},{\bm{z}}_{4},{\bm{z}}_{8},\ldots$ correspond to elements in the generator matrices for the usual method for constructing digital sequences (DicPil10a, Sec. 4.4). Also, $\langle{\bm{k}},{\bm{x}}\rangle$ is one half of an $\ell^{2}$ inner product of the digits of ${\bm{k}}$ and ${\bm{x}}$ modulo $2$. The $\mathrm{e}^{2\pi\sqrt{-1}\langle{\bm{k}},\cdot\rangle}$ are multivariate Walsh functions (see Fig. 2).
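To make the digital case concrete, the sketch below takes $d=1$ and represents points by their first 30 binary digits; both are assumptions. In the multivariate case $\langle\cdot,\cdot\rangle$ is summed over coordinates. The code implements $\oplus$ as bitwise XOR and evaluates the Walsh function $\mathrm{e}^{2\pi\sqrt{-1}\langle k,x\rangle}=\pm1$. The helper names are hypothetical. The checks confirm that digit-wise addition of points becomes multiplication of Walsh values.

```python
def digits(x, bits=30):
    """First `bits` binary digits xi_1, xi_2, ... of x in [0, 1)."""
    out = []
    for _ in range(bits):
        x *= 2.0
        b = int(x)
        out.append(b)
        x -= b
    return out


def digital_add(x, y, bits=30):
    """x ⊕ y: digit-wise addition modulo 2 (bitwise XOR), truncated to `bits` digits."""
    scale = 1 << bits
    return (int(x * scale) ^ int(y * scale)) / scale


def walsh(k, x, bits=30):
    """exp(2*pi*sqrt(-1)*<k, x>) for d = 1, which is +1 or -1.

    With k = sum_j kappa_j 2^j, <k, x> = (sum_j kappa_j xi_{j+1} mod 2) / 2.
    """
    xi = digits(x, bits)  # xi[j] holds the digit xi_{j+1}
    s = sum(((k >> j) & 1) * xi[j] for j in range(min(bits, k.bit_length())))
    return -1 if s % 2 else 1


print(digital_add(0.375, 0.625), walsh(1, 0.25), walsh(1, 0.75))
```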

In the case of rank-1 lattice node sequences, $\oplus$ denotes addition modulo ${\bm{1}}$ for points in $[0,1)^{d}$ and ordinary addition for wavenumbers in ${\mathbb{K}}={\mathbb{Z}}^{d}$ . Moreover, $\langle{\bm{k}},{\bm{x}}\rangle={\bm{k}}^{T}{\bm{x}}\bmod 1$ . The $\mathrm{e}^{2\pi\sqrt{-1}\langle{\bm{k}},\cdot\rangle}$ are multivariate complex exponential functions.

The dual set corresponding to the first $n=2^{m}$ unshifted points, $\{{\bm{z}}_{0},\ldots,{\bm{z}}_{2^{m}-1}\}$ , is denoted ${\mathbb{K}}_{m}$ and defined as

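$${\mathbb{K}}_{0}:={\mathbb{K}},\quad{\mathbb{K}}_{m}:=\{{\bm{k}}\in{\mathbb{K}}:\langle{\bm{k}},{\bm{z}}_{2^{\ell}}\rangle=0\text{ for all }\ell=0,\ldots,m-1\},\quad m\in{\mathbb{N}}.$$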

The dual set satisfies

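$$\frac{1}{2^{m}}\sum_{i=0}^{2^{m}-1}\mathrm{e}^{2\pi\sqrt{-1}\langle{\bm{k}},{\bm{z}}_{i}\rangle}=\begin{cases}1, & {\bm{k}}\in{\mathbb{K}}_{m},\\ 0, & \text{otherwise}.\end{cases}$$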
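The dual-set property can be checked numerically for a rank-1 lattice. The sketch assumes the hypothetical generating vector ${\bm z}=(1,5)$ and $m=3$. The first $2^m$ points of a base-2 rank-1 lattice node sequence form the node set $\{i{\bm z}/2^m \bmod {\bm 1}\}$, possibly in a different order, which does not affect the average. For this node set, ${\bm k}\in{\mathbb K}_m$ exactly when ${\bm k}^T{\bm z}\equiv 0 \pmod{2^m}$. The helper names are hypothetical.

```python
import cmath


def dot(k, z):
    return sum(kj * zj for kj, zj in zip(k, z))


def character_average(k, z, m):
    """(1/2^m) sum_i exp(2*pi*sqrt(-1)*<k, z_i>) over the node set {i*z/2^m mod 1}."""
    n = 1 << m
    s = dot(k, z)
    # <k, z_i> = k^T z_i mod 1 = ((i * k^T z) mod n) / n, evaluated in exact integers.
    total = sum(cmath.exp(2j * cmath.pi * ((i * s) % n) / n) for i in range(n))
    return total / n


def in_dual(k, z, m):
    """k lies in K_m for this node set exactly when k^T z is divisible by 2^m."""
    return dot(k, z) % (1 << m) == 0


z, m = (1, 5), 3  # hypothetical generating vector; n = 2^3 nodes
print(character_average((3, 1), z, m), character_average((1, 1), z, m))
```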

The discrete Fourier transform of a function $f$ using $n=2^{m}$ data is denoted $\tilde{f}_{m}$ and defined as

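$$\tilde{f}_{m}({\bm{k}}):=\frac{1}{2^{m}}\sum_{i=0}^{2^{m}-1}\mathrm{e}^{-2\pi\sqrt{-1}\langle{\bm{k}},{\bm{x}}_{i}\rangle}f({\bm{x}}_{i})=\hat{f}({\bm{k}})+\sum_{{\bm{l}}\in{\mathbb{K}}_{m}\setminus\{{\bm{0}}\}}\hat{f}({\bm{k}}\oplus{\bm{l}})\,\mathrm{e}^{2\pi\sqrt{-1}\langle{\bm{l}},{\bm{\Delta}}\rangle}, \tag{3}$$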

after applying some of the properties alluded to above. This last expression illustrates how the discrete Fourier coefficient $\tilde{f}_{m}({\bm{k}})$ differs from its true counterpart, $\hat{f}({\bm{k}})$ , by the aliasing terms, which involve the other wavenumbers in the coset ${\bm{k}}\oplus{\mathbb{K}}_{m}$ . As $m$ increases, wavenumbers leave ${\mathbb{K}}_{m}$ , and so the aliasing decreases.
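In one dimension, with the base-2 van der Corput sequence and the natural wavenumber order, $\mathrm{e}^{-2\pi\sqrt{-1}\langle k,z_i\rangle}=(-1)^{\mathrm{popcount}(k\,\&\,i)}$. All $2^m$ discrete Fourier coefficients therefore follow from a fast Walsh-Hadamard transform of the data in sequence order. The sketch below assumes this setting and a 30-digit digital shift; the helper names are hypothetical. With the shift, each computed coefficient equals $\tilde f_m(k)$ times $\pm1$ (the Walsh function evaluated at $\Delta$). The magnitudes, which are all the error bounds use, are therefore exact, and the $k=0$ coefficient is the sample mean.

```python
import math
import random


def bit_reverse(i, bits):
    """Reverse the `bits`-digit binary representation of i."""
    return int(format(i, f"0{bits}b")[::-1], 2)


def shifted_vdc(m, shift, bits=30):
    """First 2^m points z_i ⊕ Δ of the base-2 van der Corput sequence (d = 1)."""
    scale = 1 << bits
    return [(bit_reverse(i, bits) ^ shift) / scale for i in range(1 << m)]


def fast_walsh_dft(y):
    """(1/n) sum_i (-1)^popcount(k & i) y_i for k = 0..n-1, in O(n log n) operations."""
    a, n, h = list(y), len(y), 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return [v / n for v in a]


rng = random.Random(1)
shift = rng.getrandbits(30)         # digital shift Δ, stored as 30 binary digits
y = [math.exp(x) for x in shifted_vdc(10, shift)]
f_tilde = fast_walsh_dft(y)         # |f_tilde[k]| = |tilde f_m(k)|, k = 0..1023
print(f_tilde[0], sum(y) / len(y))  # the k = 0 coefficient is the sample mean
```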

The sample mean of the function data is the ${\bm{k}}={\bm{0}}$ discrete Fourier coefficient:

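$$\widehat{\mu}_{n}=\frac{1}{2^{m}}\sum_{i=0}^{2^{m}-1}f({\bm{x}}_{i})=\tilde{f}_{m}({\bm{0}})=\sum_{{\bm{l}}\in{\mathbb{K}}_{m}}\hat{f}({\bm{l}})\,\mathrm{e}^{2\pi\sqrt{-1}\langle{\bm{l}},{\bm{\Delta}}\rangle}.$$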

Hence, an error bound for the sample mean may be expressed in terms of those Fourier coefficients corresponding to wavenumbers in the dual set:

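$$\lvert\mu-\widehat{\mu}_{n}\rvert=\bigl\lvert\hat{f}({\bm{0}})-\tilde{f}_{m}({\bm{0}})\bigr\rvert=\biggl\lvert\sum_{{\bm{l}}\in{\mathbb{K}}_{m}\setminus\{{\bm{0}}\}}\hat{f}({\bm{l}})\,\mathrm{e}^{2\pi\sqrt{-1}\langle{\bm{l}},{\bm{\Delta}}\rangle}\biggr\rvert\leq\sum_{{\bm{l}}\in{\mathbb{K}}_{m}\setminus\{{\bm{0}}\}}\bigl\lvert\hat{f}({\bm{l}})\bigr\rvert. \tag{4}$$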

Our aim is to bound the right hand side of this cubature error bound in terms of function data or more specifically, in terms of the discrete Fourier transform. However, this requires that the true Fourier coefficients of the integrand do not decay too erratically. This motivates our definition of ${\mathcal{C}}$ , the cone of integrands for which our adaptive algorithms succeed.

To facilitate the definition of ${\mathcal{C}}$ we construct an ordering of the wavenumbers, $\tilde{{\bm{k}}}:{\mathbb{N}}_{0}\to\mathbb{K}$ satisfying $\tilde{{\bm{k}}}(0)={\bm{0}}$ and $\bigl{\{}\tilde{{\bm{k}}}(\kappa+\lambda 2^{m})\bigr{\}}_{\lambda=0}^{\infty}=\tilde{{\bm{k}}}(\kappa)\oplus{\mathbb{K}}_{m}$ for $\kappa=0,\ldots,2^{m}-1$ and $m\in{\mathbb{N}}_{0}$ , as described in HicJim16a ; JimHic16a . This condition implies the crucial fact that $\left\lvert\tilde{f}_{m}(\tilde{{\bm{k}}}(\kappa+\lambda 2^{m}))\right\rvert$ is the same for all $\lambda\in{\mathbb{N}}_{0}$ . Although there is some arbitrariness in this ordering, it is understood that $\tilde{{\bm{k}}}(\kappa)$ generally increases in magnitude as $\kappa$ tends to infinity. We adopt the shorthand notation $\hat{f}_{\kappa}:=\hat{f}(\tilde{{\bm{k}}}(\kappa))$ and $\tilde{f}_{m,\kappa}:=\tilde{f}_{m}(\tilde{{\bm{k}}}(\kappa))$ . Then, the error bound in (4) may be written as

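$$\lvert\mu-\widehat{\mu}_{n}\rvert\leq\sum_{\lambda=1}^{\infty}\bigl\lvert\hat{f}_{\lambda 2^{m}}\bigr\rvert. \tag{5}$$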

The cone of functions whose Fourier series are absolutely convergent and whose true Fourier coefficients, $\hat{f}_{\kappa}$ , decay steadily as $\kappa$ tends to infinity is

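$${\mathcal{C}}=\Bigl\{f\in AC([0,1)^{d}):\ \widehat{S}_{\ell,m}(f)\leq\widehat{\omega}(m-\ell)\,\widecheck{S}_{m}(f),\ \ell\leq m;\ \ \widecheck{S}_{m}(f)\leq\mathring{\omega}(m-\ell)\,S_{\ell}(f),\ \ell_{*}\leq\ell\leq m\Bigr\},$$
$$S_{m}(f):=\sum_{\kappa=\lfloor 2^{m-1}\rfloor}^{2^{m}-1}\bigl\lvert\hat{f}_{\kappa}\bigr\rvert,\qquad\widehat{S}_{\ell,m}(f):=\sum_{\kappa=\lfloor 2^{\ell-1}\rfloor}^{2^{\ell}-1}\sum_{\lambda=1}^{\infty}\bigl\lvert\hat{f}_{\kappa+\lambda 2^{m}}\bigr\rvert, \tag{6b}$$
$$\widecheck{S}_{m}(f):=\widehat{S}_{0,m}(f)+\cdots+\widehat{S}_{m,m}(f)=\sum_{\kappa=2^{m}}^{\infty}\bigl\lvert\hat{f}_{\kappa}\bigr\rvert,$$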

and where $\ell,m\in{\mathbb{N}}_{0}$ and $\ell\leq m$ . The positive integer $\ell_{*}$ and the bounded functions $\widehat{\omega},\mathring{\omega}:{\mathbb{N}}_{0}\to[0,\infty)$ are parameters that determine how inclusive ${\mathcal{C}}$ is and how robust our algorithm is. Moreover, $\mathring{\omega}(m)\to 0$ as $m\to\infty$ . The default values are provided in Sec. 4.

We now explain the definition of the cone ${\mathcal{C}}$ and the data driven cubature error bound that we are able to derive. For illustration we use the functions depicted in Fig. 3. The one on the left lies inside ${\mathcal{C}}$ because its Fourier coefficients decay steadily (but not necessarily monotonically), while the one on the right lies outside ${\mathcal{C}}$ because its Fourier coefficients decay erratically. The function lying outside ${\mathcal{C}}$ resembles the one lying inside ${\mathcal{C}}$ but with high wavenumber noise.

The sum of the absolute values of the Fourier coefficients appearing on the right side of error bound (5) is $\widehat{S}_{0,m}(f)$, according to the definition in (6b). In Fig. 3, $m=11$, and $\widehat{S}_{0,11}(f)$ corresponds to the sum of $\lvert\hat{f}_{\kappa}\rvert$ for $\kappa=2048,4096,6144,\ldots$. Since only $n=2^{m}$ function values are available, the Fourier coefficients appearing in $\widehat{S}_{0,m}(f)$ cannot be estimated directly by discrete Fourier coefficients.

By the definition in (6), it follows that $\widehat{S}_{0,m}(f)\leq\widecheck{S}_{m}(f)$. In Fig. 3, $\widecheck{S}_{11}(f)$ corresponds to the sum of all $\lvert\hat{f}_{\kappa}\rvert$ with $\kappa\geq 2048$. The definition of ${\mathcal{C}}$ assumes that $\widehat{S}_{0,m}(f)\leq\widehat{\omega}(m)\widecheck{S}_{m}(f)$, where $\widehat{\omega}(m)$ could be chosen as $1$ or could decay with $m$. This is up to the user.

Still, $\widecheck{S}_{m}(f)$ involves Fourier coefficients that are of too high a wavenumber to be approximated by discrete Fourier coefficients. The definition of ${\mathcal{C}}$ also assumes that $\widecheck{S}_{m}(f)\leq\mathring{\omega}(r)S_{m-r}(f)$ for any non-negative $r\leq m-\ell_{*}$. This means that the infinite sum of the high wavenumber coefficients, $\widecheck{S}_{m}(f)$, cannot exceed some factor, $\mathring{\omega}(r)$, of the finite sum of modest wavenumber coefficients, $S_{m-r}(f)$. In Fig. 3, $r=4$, and the graph on the left shows $\widecheck{S}_{11}(f)$ to be bounded above by $\mathring{\omega}(4)S_{7}(f)$ for a modest value of $\mathring{\omega}(4)$. Recall from the definition in (6b) that $S_{7}(f)$ is the sum of the absolute values of the Fourier coefficients $\hat{f}_{\kappa}$ with $\kappa=64,\ldots,127$. However, the function depicted on the right of Fig. 3 violates the assumption that $\widecheck{S}_{11}(f)\leq\mathring{\omega}(4)S_{7}(f)$ because $S_{7}(f)$ in that case is very small. Thus, the function on the right in Fig. 3 lies outside ${\mathcal{C}}$.

Based on the above argument, it follows in general that for $f\in{\mathcal{C}}$ ,

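$$\sum_{\lambda=1}^{\infty}\bigl\lvert\hat{f}_{\lambda 2^{m}}\bigr\rvert=\widehat{S}_{0,m}(f)\leq\widehat{\omega}(m)\,\widecheck{S}_{m}(f)\leq\widehat{\omega}(m)\,\mathring{\omega}(r)\,S_{m-r}(f),\qquad m\geq r+\ell_{*}\geq\ell_{*}. \tag{6j}$$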

This implies an error bound in terms of the true Fourier coefficients with modest wavenumber. In particular (6j) holds for the function depicted on the left side of Fig. 3, but not the one on the right side.
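The cone condition can be explored with synthetic coefficient sequences. The sketch below is illustrative only. It truncates at $\kappa<2^{16}$ and uses $\lvert\hat f_\kappa\rvert=\kappa^{-2}$ as a steadily decaying example. It builds an erratic example by nearly zeroing the block $\kappa=64,\ldots,127$ that defines $S_7(f)$ while keeping the high wavenumbers. For $f\in\mathcal C$ the ratio $\widecheck{S}_{11}(f)/S_{7}(f)$ must not exceed $\mathring\omega(4)$. It stays modest for the steady sequence and is enormous for the erratic one. The function names are hypothetical.

```python
def block(l):
    """Indices kappa = floor(2^(l-1)), ..., 2^l - 1 of the l-th dyadic block."""
    return range(2 ** (l - 1) if l > 0 else 0, 2 ** l)


def S(a, l):
    """S_l(f) from (6b)."""
    return sum(abs(a[k]) for k in block(l))


def S_check(a, m):
    """Widecheck S_m(f): all |a_kappa| with kappa >= 2^m (truncated at len(a))."""
    return sum(abs(v) for v in a[2 ** m:])


def S_hat0(a, m):
    """Widehat S_{0,m}(f): |a_kappa| for kappa = 2^m, 2*2^m, 3*2^m, ..."""
    return sum(abs(a[k]) for k in range(2 ** m, len(a), 2 ** m))


K = 2 ** 16                                    # truncation of the coefficient sequence
steady = [1.0] + [1.0 / k ** 2 for k in range(1, K)]
erratic = list(steady)
for k in block(7):                             # wipe out the modest wavenumbers 64..127
    erratic[k] = 1e-9

for name, a in [("steady", steady), ("erratic", erratic)]:
    print(name, S_check(a, 11) / S(a, 7), S_hat0(a, 11) <= S_check(a, 11))
```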

Before going on, we note that we have not specified the parameters $\ell_{*},r,\widehat{\omega}$, and $\mathring{\omega}$, for the sake of simplicity. Their choices reflect the robustness desired by the user, but they are meant to be kept constant rather than changed for every problem. The parameter $\ell_{*}$ indexes the lowest block of wavenumbers, $S_{\ell_{*}}(f)$, at which we expect steady decay to set in. The parameter $r$ determines which modest wavenumbers are used to bound the cubature error, namely those indexed by $\kappa=\lfloor 2^{m-r-1}\rfloor,\ldots,2^{m-r}-1$ in $S_{m-r}(f)$. The functions $\widehat{\omega}$ and $\mathring{\omega}$ are the inflation factors for bounding one sum of Fourier coefficients in terms of another. See Sec. 4 for the default choices in our algorithm implementations.

While (6j) is a step forward, it involves the unknown true Fourier coefficients and not the known discrete Fourier coefficients. We next bound $S_{m-r}(f)$ in terms of a sum of discrete Fourier coefficients:

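$$\widetilde{S}_{\ell,m}(f):=\sum_{\kappa=\lfloor 2^{\ell-1}\rfloor}^{2^{\ell}-1}\bigl\lvert\tilde{f}_{m,\kappa}\bigr\rvert.$$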

By (3) and the triangle inequality it follows that

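$$\begin{aligned}\widetilde{S}_{m-r,m}(f)&\geq\sum_{\kappa=\lfloor 2^{m-r-1}\rfloor}^{2^{m-r}-1}\biggl[\bigl\lvert\hat{f}_{\kappa}\bigr\rvert-\sum_{\lambda=1}^{\infty}\bigl\lvert\hat{f}_{\kappa+\lambda 2^{m}}\bigr\rvert\biggr]\\&=S_{m-r}(f)-\widehat{S}_{m-r,m}(f)\\&\geq S_{m-r}(f)\bigl[1-\widehat{\omega}(r)\,\mathring{\omega}(r)\bigr].\end{aligned} \tag{6k}$$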

This provides an upper bound on $S_{m-r}(f)$ in terms of the data-based $\widetilde{S}_{m-r,m}(f)$ , provided that $r$ is large enough to satisfy $\widehat{\omega}(r)\mathring{\omega}(r)<1$ . Such a choice of $r$ ensures that the aliasing errors are modest.

Combining (6j) and (6k) with (5), it is shown in HicJim16a ; JimHic16a that for any $f\in\mathcal{C}$ ,

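$$\lvert\mu-\widehat{\mu}_{n}\rvert\leq\textup{err}_{n}:=\mathfrak{C}(m)\,\widetilde{S}_{m-r,m}(f),\qquad\mathfrak{C}(m):=\frac{\widehat{\omega}(m)\,\mathring{\omega}(r)}{1-\widehat{\omega}(r)\,\mathring{\omega}(r)},\qquad m\geq\ell_{*}+r, \tag{6l}$$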

provided that $\widehat{\omega}(r)\mathring{\omega}(r)<1$ . Since $\widetilde{S}_{m-r,m}(f)$ depends only on the discrete Fourier coefficients, (6l) is a data-based cubature error bound. One may now increment $m$ (keeping $r$ fixed) until $\textup{err}_{n}$ is small enough, where again $n=2^{m}$ .
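The following sketch puts (6l) into an adaptive loop for $d=1$. It assumes a digitally shifted base-2 van der Corput sequence and the natural ordering $\tilde k(\kappa)=\kappa$. In one dimension this ordering satisfies the ordering condition because $\kappa\oplus\lambda 2^m=\kappa+\lambda 2^m$ for $\kappa<2^m$. The cone parameters are hypothetical: $\ell_*=r=4$, $\widehat\omega\equiv1$, and $\mathring\omega(m)=2^{1-m}$. These are not the defaults of Sec. 4, and the code is not GAIL's cubSobol_g. It only shows how the bound is computed from the data and how $m$ is incremented.

```python
import math
import random

# Hypothetical cone parameters for illustration only; NOT the defaults of Sec. 4.
ELL_STAR, R_GAP = 4, 4


def omega_hat(m):
    return 1.0


def omega_ring(m):
    return 2.0 ** (1 - m)


def frak_C(m, r=R_GAP):
    """The factor C(m) in (6l); requires omega_hat(r) * omega_ring(r) < 1."""
    prod = omega_hat(r) * omega_ring(r)
    if prod >= 1:
        raise ValueError("need omega_hat(r) * omega_ring(r) < 1")
    return omega_hat(m) * omega_ring(r) / (1 - prod)


def fast_walsh_dft(y):
    """Normalized fast Walsh-Hadamard transform, O(n log n)."""
    a, n, h = list(y), len(y), 1
    while h < n:
        for s in range(0, n, 2 * h):
            for j in range(s, s + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return [v / n for v in a]


def adaptive_cubature(f, eps, m_max=20, bits=30, seed=0):
    """Double n = 2^m until err_n = C(m) * S~_{m-r,m}(f) <= eps (d = 1, van der Corput)."""
    shift = random.Random(seed).getrandbits(bits)  # digital shift Δ
    scale, vals = 1 << bits, []
    for m in range(ELL_STAR + R_GAP, m_max + 1):
        for i in range(len(vals), 1 << m):         # reuse earlier function values
            z = int(format(i, f"0{bits}b")[::-1], 2)
            vals.append(f((z ^ shift) / scale))
        coef = fast_walsh_dft(vals)                # |coef[k]| = |tilde f_{m,k}|
        lo = m - R_GAP                             # block m - r, with lo >= ELL_STAR >= 1
        s_tilde = sum(abs(coef[k]) for k in range(2 ** (lo - 1), 2 ** lo))
        err = frak_C(m) * s_tilde
        if err <= eps:
            return coef[0], err, 1 << m
    raise RuntimeError("sample budget 2^m_max exhausted before err_n <= eps")


mu_hat, err, n = adaptive_cubature(math.exp, eps=1e-4)
print(mu_hat, err, n, abs(mu_hat - (math.e - 1)))
```

Reusing the stored function values as $m$ grows mirrors the fact that the first $2^m$ points are always the same, so each doubling only pays for the new points plus one transform.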

If $\$(f)$ denotes the cost of one function value, then evaluating $f({\bm{x}}_{0}),\ldots,f({\bm{x}}_{2^{m}-1})$ requires $\$(f)\,n$ operations. A fast transform then computes $\tilde{f}_{m,0},\ldots,\tilde{f}_{m,2^{m}-1}$ in an additional ${\mathcal{O}}(n\log(n))={\mathcal{O}}(m2^{m})$ operations. So computing $\textup{err}_{2^{m}}$ for each $m$ costs ${\mathcal{O}}\bigl([\$(f)+m]2^{m}\bigr)$ operations. For integrands that are cheap to evaluate, the $\$(f)$ term is negligible, but for integrands that are expensive to evaluate, $\$(f)$ may be comparable to $m$, given that $m$ might be ten to twenty.

Using reasoning analogous to that in (6k),

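$$\begin{aligned}S_{\ell}(f)&\geq\sum_{\kappa=\lfloor 2^{\ell-1}\rfloor}^{2^{\ell}-1}\biggl[\bigl\lvert\tilde{f}_{m,\kappa}\bigr\rvert-\sum_{\lambda=1}^{\infty}\bigl\lvert\hat{f}_{\kappa+\lambda 2^{m}}\bigr\rvert\biggr]\\&=\widetilde{S}_{\ell,m}(f)-\widehat{S}_{\ell,m}(f)\\&\geq\widetilde{S}_{\ell,m}(f)-\widehat{\omega}(m-\ell)\,\mathring{\omega}(m-\ell)\,S_{\ell}(f),\end{aligned}$$

and solving for $S_{\ell}(f)$ gives

$$S_{\ell}(f)\geq\frac{\widetilde{S}_{\ell,m}(f)}{1+\widehat{\omega}(m-\ell)\,\mathring{\omega}(m-\ell)}. \tag{6m}$$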

Therefore, from (6k) and (6m), for any $\ell,m,m^{\prime}\in{\mathbb{N}}$ such that $\ell_{*}\leq\ell\leq\min(m,m^{\prime})$ , it must be the case that

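$$\frac{\widetilde{S}_{\ell,m}(f)}{1+\widehat{\omega}(m-\ell)\,\mathring{\omega}(m-\ell)}\leq S_{\ell}(f)\leq\frac{\widetilde{S}_{\ell,m'}(f)}{1-\widehat{\omega}(m'-\ell)\,\mathring{\omega}(m'-\ell)}. \tag{6n}$$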

Equation (6n) is a data-based necessary condition for an integrand, $f$ , to lie in $\mathcal{C}$ . If it is found that the right hand side of (6n) is smaller than the left hand side of (6n), then $f$ must lie outside $\mathcal{C}$ . In this case the parameters defining the cone should be adjusted to expand the cone appropriately, e.g., by increasing $\widehat{\omega}$ or $\mathring{\omega}$ by a constant.
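Here is a sketch of the data-based check in (6n). It uses hypothetical cone parameters ($\widehat\omega\equiv1$, $\mathring\omega(k)=2^{1-k}$) and made-up values of $\widetilde S_{\ell,m}(f)$ and $\widetilde S_{\ell,m'}(f)$. A failed check proves that $f\notin\mathcal C$; a passed check proves nothing. The function name is hypothetical.

```python
def satisfies_necessary_condition(s_tilde_lm, s_tilde_lmp, ell, m, m_prime,
                                  omega_hat, omega_ring):
    """Check (6n): S~_{l,m}/(1 + w(m-l)) <= S~_{l,m'}/(1 - w(m'-l)), w = omega_hat*omega_ring.

    Returning False proves f lies outside the cone; True proves nothing.
    """
    lower = s_tilde_lm / (1 + omega_hat(m - ell) * omega_ring(m - ell))
    denom = 1 - omega_hat(m_prime - ell) * omega_ring(m_prime - ell)
    if denom <= 0:
        return True  # the upper bound in (6n) is vacuous for this m'
    return lower <= s_tilde_lmp / denom


# Hypothetical cone parameters (not the Sec. 4 defaults).
def w_hat(k):
    return 1.0


def w_ring(k):
    return 2.0 ** (1 - k)


print(satisfies_necessary_condition(1.0, 1.0, 5, 10, 12, w_hat, w_ring))
print(satisfies_necessary_condition(1.0, 0.5, 5, 10, 12, w_hat, w_ring))
```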

By substituting inequality (6m) in the error bound (6l), we get

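$$\textup{err}_{n}\leq\frac{\widehat{\omega}(m)\,\mathring{\omega}(r)}{1-\widehat{\omega}(r)\,\mathring{\omega}(r)}\bigl[1+\widehat{\omega}(r)\,\mathring{\omega}(r)\bigr]S_{m-r}(f). \tag{6o}$$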

We define $m^{*}$ ,

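$$m^{*}:=\min\biggl\{m\geq\ell_{*}+r:\ \frac{\widehat{\omega}(m)\,\mathring{\omega}(r)}{1-\widehat{\omega}(r)\,\mathring{\omega}(r)}\bigl[1+\widehat{\omega}(r)\,\mathring{\omega}(r)\bigr]S_{m-r}(f)\leq\varepsilon\biggr\},$$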

Here $m^{*}$ depends on the fixed parameters of the algorithm, $\ell_{*},r,\widehat{\omega},$ and $\mathring{\omega}$ . Note that $\textup{err}_{2^{m^{*}}}\leq\varepsilon$ .

Recall from above that at each step $m$ in our algorithm the computational cost is ${\mathcal{O}}\bigl([\$(f)+m]2^{m}\bigr)$. Thus, the computational cost for our adaptive algorithm to satisfy the absolute error tolerance, as given in (1), is ${\mathcal{O}}\bigl(\Phi(m^{*})2^{m^{*}}\bigr)$, where $\Phi(m^{*})=[\$(f)+0]2^{-m^{*}}+\cdots+[\$(f)+m^{*}]2^{0}$. Since

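$$\Phi(m+1)-\Phi(m)=\$(f)\,2^{-m-1}+2^{-m}+\cdots+2^{0}\leq\$(f)\,2^{-m-1}+2\quad\text{for all }m\in{\mathbb{N}}_{0},$$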

it follows that

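$$\Phi(m^{*})=\Phi(0)+\sum_{m=0}^{m^{*}-1}\bigl[\Phi(m+1)-\Phi(m)\bigr]\leq\$(f)+\bigl[\$(f)\,2^{-1}+2\bigr]+\cdots+\bigl[\$(f)\,2^{-m^{*}}+2\bigr]\leq 2\bigl[\$(f)+m^{*}\bigr].$$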

Thus, the cost of making our data-based error bound no greater than $\varepsilon$ is bounded above by ${\mathcal{O}}\bigl([\$(f)+m^{*}]2^{m^{*}}\bigr)$.
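The two inequalities for $\Phi$ can be spot-checked in exact rational arithmetic. In this sketch an integer stands in for $\$(f)$, and the tested ranges of costs and $m$ are arbitrary.

```python
from fractions import Fraction


def Phi(m, cost):
    """Phi(m) = [cost + 0] 2^{-m} + ... + [cost + m] 2^0, in exact rational arithmetic."""
    return sum((cost + j) * Fraction(2) ** (j - m) for j in range(m + 1))


for cost in (0, 1, 7, 1000):
    for m in range(40):
        # One step of the telescoping identity: Phi(m+1) - Phi(m).
        step = Phi(m + 1, cost) - Phi(m, cost)
        assert step == cost * Fraction(2) ** (-m - 1) + sum(Fraction(2) ** -i for i in range(m + 1))
        # The final bound Phi(m) <= 2 [cost + m].
        assert Phi(m, cost) <= 2 * (cost + m)
print("Phi(m) <= 2[cost + m] holds on the tested range")
```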

The algorithm does not assume a rate of decay of the Fourier coefficients but automatically senses the rate of decay via the discrete Fourier coefficients. From (6o) it is evident that the dependence of the computational cost on $\varepsilon$ is determined primarily by the unknown rate of decay of $S_{m-r}(f)$ with $m$, and secondarily by the specified rate of decay of $\widehat{\omega}(m)$, since all other parameters are fixed. For example, assuming $\widehat{\omega}(m)={\mathcal{O}}(1)$, if $\hat{f}_{\kappa}={\mathcal{O}}(\kappa^{-p})$, then $S_{m-r}(f)={\mathcal{O}}(2^{-(p-1)m})$, and the total computational cost is ${\mathcal{O}}(\varepsilon^{-1/(p-1)-\delta})$ for all $\delta>0$. If $\widehat{\omega}(m)$ decays with $m$, then the computational cost is less.
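The rate claim can be traced through (6o) in a few lines. The following sketch assumes $\lvert\hat f_\kappa\rvert\le A\kappa^{-p}$ for some constants $A>0$ and $p>1$, and $\widehat\omega(m)\le\omega_{\max}$ for all $m$. The constants $A$, $\omega_{\max}$, and $B$ are introduced only for this calculation.

$$
\begin{aligned}
S_{m-r}(f) &= \sum_{\kappa=2^{m-r-1}}^{2^{m-r}-1}\lvert\hat f_\kappa\rvert
\le 2^{m-r-1}\cdot A\,\bigl(2^{m-r-1}\bigr)^{-p}
= A\,2^{(p-1)(r+1)}\,2^{-(p-1)m},\\
\frac{\widehat\omega(m)\,\mathring\omega(r)\bigl[1+\widehat\omega(r)\,\mathring\omega(r)\bigr]}{1-\widehat\omega(r)\,\mathring\omega(r)}\,S_{m-r}(f)
&\le B\,2^{-(p-1)m},\qquad
B:=\frac{\omega_{\max}\,\mathring\omega(r)\bigl[1+\widehat\omega(r)\,\mathring\omega(r)\bigr]}{1-\widehat\omega(r)\,\mathring\omega(r)}\,A\,2^{(p-1)(r+1)},\\
m^{*} &\le \max\Bigl(\ell_*+r,\ \Bigl\lceil\frac{\log_2(B/\varepsilon)}{p-1}\Bigr\rceil\Bigr)
\quad\Longrightarrow\quad 2^{m^{*}}=\mathcal O\bigl(\varepsilon^{-1/(p-1)}\bigr),\qquad m^{*}=\mathcal O\bigl(\log(1/\varepsilon)\bigr).
\end{aligned}
$$

Hence $\mathcal O\bigl([\$(f)+m^*]2^{m^*}\bigr)=\mathcal O\bigl(\varepsilon^{-1/(p-1)}\log(1/\varepsilon)\bigr)$, which is $\mathcal O\bigl(\varepsilon^{-1/(p-1)-\delta}\bigr)$ for every $\delta>0$.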

3 General Error Criterion

The algorithms summarized above are described in HicJim16a ; JimHic16a and implemented in the Guaranteed Automatic Integration Library (GAIL) ChoEtal15a as cubSobol_g and cubLattice_g, respectively. They satisfy the absolute error criterion (1) by increasing $n$ until $\textup{err}_{n}$ defined in (6l) is no greater than the absolute error tolerance, $\varepsilon$ .

There are situations requiring a more general error criterion than (1). In this section we generalize the cubature problem to involve a $p$ -vector of integrals, ${\bm{\mu}}$ , which are approximated by a $p$ -vector of sample means, $\widehat{{\bm{\mu}}}_{n}$ , using $n$ samples, and for which we have a $p$ -vector of error bounds, $\textbf{err}_{n}$ , given by (6l). This means that ${\bm{\mu}}\in[\widehat{{\bm{\mu}}}_{n}-\textbf{{err}}_{n},\widehat{{\bm{\mu}}}_{n}+\textbf{{err}}_{n}]$ for integrands in ${\mathcal{C}}$ . Given some

•

function $v:\Omega\subseteq{\mathbb{R}}^{p}\to{\mathbb{R}}$ ,

•

positive absolute error tolerance $\varepsilon_{\textrm{a}}$ , and

•

relative error tolerance $\varepsilon_{\textrm{r}}<1$ ,

the goal is to construct an optimal approximation to $v({\bm{\mu}})$, denoted $\hat{v}$, which depends on $\widehat{{\bm{\mu}}}_{n}$ and $\textbf{err}_{n}$ and satisfies the error criterion

[TABLE]

Our hybrid error criterion is satisfied if the actual error is no greater than either the absolute error tolerance or the relative error tolerance times the absolute value of the true answer. If we want to satisfy both an absolute error criterion and a relative error criterion, then “$\max$” in the definition of $\textup{tol}(\cdot)$ should be replaced by “$\min$”. This would require a somewhat different development than what is presented here. By optimal we mean that the choice of $\hat{v}$ we prescribe yields the smallest possible left-hand side of (6pa). This gives the greatest chance of satisfying the error criterion. The dependence of $\hat{v}$ on $n$ is suppressed in the notation for simplicity.

The common case of estimating the integral itself, $p=1$ and $v(\mu)=\mu$, is illustrated in Table 1. This includes i) an absolute error criterion (see (1)), ii) a relative error criterion, and iii) a hybrid error criterion that is satisfied when either the absolute or the relative error tolerance is satisfied. Note that $\hat{v}$ is not necessarily equal to $\hat{\mu}_{n}$. For a pure relative error criterion, $\hat{v}$ represents a shrinkage of the sample mean towards zero. Fig. 4 illustrates how the optimal choice of $\hat{v}$ may satisfy (6p) even when $\hat{v}=\hat{\mu}_{n}$ does not.
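To make the shrinkage concrete, the following sketch evaluates the hybrid tolerance function of Table 1 on a fine grid of candidate true values $\mu\in[\hat\mu_n-\textup{err}_n,\hat\mu_n+\textup{err}_n]$ and compares the worst case for $\hat v=\hat\mu_n$ with the worst case for the relative-row choice of Table 1. The numbers ($\hat\mu_n=1$, $\textup{err}_n=0.19$, $\varepsilon_{\textrm{r}}=0.2$, $\varepsilon_{\textrm{a}}=0$) are made up for illustration, and the grid search is only a brute-force check, not how the algorithm computes anything.

```python
def tol(v, vhat, eps_a, eps_r):
    """Hybrid tolerance function of Table 1: (v - vhat)^2 / max(eps_a^2, eps_r^2 v^2)."""
    return (v - vhat) ** 2 / max(eps_a ** 2, eps_r ** 2 * v ** 2)


def worst_case_tol(vhat, lo, hi, eps_a, eps_r, npts=10001):
    """Largest tol over an evenly spaced grid of candidate true values in [lo, hi]."""
    step = (hi - lo) / (npts - 1)
    return max(tol(lo + i * step, vhat, eps_a, eps_r) for i in range(npts))


# Illustrative numbers for a pure relative criterion (eps_a = 0).
mu_hat, err, eps_r = 1.0, 0.19, 0.2
lo, hi = mu_hat - err, mu_hat + err
vhat_rel = max(mu_hat ** 2 - err ** 2, 0.0) / mu_hat  # relative row of Table 1

print(worst_case_tol(mu_hat, lo, hi, 0.0, eps_r))    # about 1.376: criterion can fail
print(worst_case_tol(vhat_rel, lo, hi, 0.0, eps_r))  # about 0.9025: criterion is met
```

The second value equals $\textup{err}_n^2/(\varepsilon_{\textrm{r}}^2\hat\mu_n^2)$, the optimal tolerance listed in the relative row of Table 1.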

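Define $v_{\pm}$ as the extreme values of $v({\bm{\mu}})$ over all ${\bm{\mu}}$ consistent with the data-based error bound, i.e., ${\bm{\mu}}\in[\widehat{{\bm{\mu}}}_{n}-\textbf{err}_{n},\widehat{{\bm{\mu}}}_{n}+\textbf{err}_{n}]$: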

[TABLE]

Then the following criterion is equivalent to (6p):

[TABLE]

We claim that the optimal value of the estimated integral, i.e., the value of $\hat{v}$ satisfying (6w), is

[TABLE]
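The display (6x) is not reproduced in this text. As a reading aid, the sketch below uses one closed form that is consistent with Table 1 and with the statement in Theorem 3.1 that the worst case is attained simultaneously at $v_+$ and $v_-$: the weighted average $\hat v=(v_-w_++v_+w_-)/(w_-+w_+)$ with $w_\pm=\max(\varepsilon_{\textrm{a}},\varepsilon_{\textrm{r}}|v_\pm|)$, which gives $\textup{tol}=\bigl((v_+-v_-)/(w_-+w_+)\bigr)^2$ at both endpoints. Treat this form as an assumption to be checked against (6x); the function name is hypothetical.

```python
def optimal_vhat(v_minus, v_plus, eps_a, eps_r):
    """Return (vhat, worst-case tol) for the hybrid criterion.

    Assumed reconstruction: vhat equalizes tol at both endpoints v_- and v_+.
    Requires eps_a > 0, or eps_r * |v| > 0 at some endpoint.
    """
    w_minus = max(eps_a, eps_r * abs(v_minus))  # tolerance scale at v_-
    w_plus = max(eps_a, eps_r * abs(v_plus))    # tolerance scale at v_+
    vhat = (v_minus * w_plus + v_plus * w_minus) / (w_minus + w_plus)
    worst = ((v_plus - v_minus) / (w_minus + w_plus)) ** 2
    return vhat, worst


# Absolute criterion (eps_r = 0): vhat is the midpoint, worst = err^2 / eps_a^2.
print(optimal_vhat(0.9, 1.1, 0.2, 0.0))
# Relative criterion (eps_a = 0), interval straddling zero: vhat shrinks to 0.
print(optimal_vhat(-0.1, 0.3, 0.0, 0.1))
# Relative criterion, positive interval: vhat = (mu_hat^2 - err^2) / mu_hat.
print(optimal_vhat(0.81, 1.19, 0.0, 0.2))
```

For $p=1$ and $v(\mu)=\mu$ the three calls correspond to $(\hat\mu_n,\textup{err}_n)=(1,0.1)$, $(0.1,0.2)$, and $(1,0.19)$, and they reproduce the absolute and relative rows of Table 1.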

From (6xa) it follows that $\hat{v}\in[v_{-},v_{+}]$ . Moreover, by (6xb) $\hat{v}$ is a shrinkage estimator: it is either zero or has the same sign as $(v_{-}+v_{+})/2$ , and its magnitude is no greater than $\left\lvert(v_{-}+v_{+})/2\right\rvert$ . Our improved GAIL algorithms cubSobol_g and cubLattice_g, which are under development, are summarized in the following theorem.

Theorem 3.1

Let our goal be the computation of $v({\bm{\mu}})$ , as described at the beginning of this section. Let the tolerance function be defined as in (6pb), let the extreme possible values of $v({\bm{\mu}})$ be defined as in (6v), and let the approximation to $v({\bm{\mu}})$ be defined in terms of $\widehat{{\bm{\mu}}}_{n}$ and $\textbf{{err}}_{n}$ as in (6x). Then, $\hat{v}$ is the optimal approximation to $v({\bm{\mu}})$ , and the tolerance function for this optimal choice is given as follows:

[TABLE]

By optimal, we mean that the infimum in (6ya) is attained by $\hat{v}$, as claimed in (6yb). Moreover, the proof shows that the supremum in (6yb) is attained simultaneously at $v_{+}$ and $v_{-}$.

Our new adaptive quasi-Monte Carlo cubature algorithms increase $n=2^{m}$ by incrementing $m$ by one until the right side of (6yd) is no larger than one. The resulting $\hat{v}$ then satisfies the error criterion $\textup{tol}(v({\bm{\mu}}),\hat{v},\varepsilon_{\textrm{a}},\varepsilon_{\textrm{r}})\leq 1$ .

Proof

The gist of the proof is to establish the equalities in (6y). Equality (6yd) follows from the definition of $\hat{v}$ and $v_{\pm}$ . Equality (6yc) is proven next, and (6yb) is proven after that. Equality (6ya) follows from definition (6v).

The derivative of $\textup{tol}(\cdot,\hat{v},\varepsilon_{\textrm{a}},\varepsilon_{\textrm{r}})$ is

[TABLE]

The sign of this derivative is shown in Fig. 5. For either $\varepsilon_{\textrm{r}}\left\lvert v_{\pm}\right\rvert\leq\varepsilon_{\textrm{a}}$ or $\varepsilon_{\textrm{a}}\leq\varepsilon_{\textrm{r}}\left\lvert v_{\pm}\right\rvert$ , the only critical point in $[v_{-},v_{+}]$ is $v^{\prime}=\hat{v}$ , where the tolerance function vanishes. Thus, the maximum value of the tolerance function always occurs at the boundaries of the interval. For $\varepsilon_{\textrm{r}}\left\lvert v_{-s}\right\rvert\leq\varepsilon_{\textrm{a}}<\varepsilon_{\textrm{r}}\left\lvert v_{s}\right\rvert$ , $s\in\{+,-\}$ , there is also a critical point at $v^{\prime}=\textup{sign}(v_{s})\varepsilon_{\textrm{a}}/\varepsilon_{\textrm{r}}$ . However, since $v_{s}$ and $\hat{v}$ have the same sign (see (6xb)), the partial derivative of the tolerance function with respect to $v^{\prime}$ does not change sign at this critical point. Hence, the maximum value of the tolerance function still occurs at the boundaries of the interval, and (6yc) is established.
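Since the derivative display is not reproduced in this text, the computation behind the sign argument is written out here for convenience, derived directly from the hybrid tolerance function in Table 1. Away from the kink at $\lvert v'\rvert=\varepsilon_{\textrm{a}}/\varepsilon_{\textrm{r}}$,

```latex
\frac{\partial}{\partial v'}\,\textup{tol}(v',\hat v,\varepsilon_{\textrm{a}},\varepsilon_{\textrm{r}})
=\begin{cases}
\dfrac{2(v'-\hat v)}{\varepsilon_{\textrm{a}}^{2}}, & \varepsilon_{\textrm{r}}\lvert v'\rvert<\varepsilon_{\textrm{a}},\\[2ex]
\dfrac{2\hat v\,(v'-\hat v)}{\varepsilon_{\textrm{r}}^{2}\,v'^{3}}, & \varepsilon_{\textrm{r}}\lvert v'\rvert>\varepsilon_{\textrm{a}}.
\end{cases}
```

In the second case $\hat v/v'^{3}$ has the sign of $\hat v\,v'$, so wherever $v'$ and $\hat v$ share a sign (or $\hat v=0$) the derivative either vanishes or has the sign of $v'-\hat v$, which is the monotonicity used in the argument above.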

To prove assertion (6yb), consider $\hat{v}^{\prime}$ , some alternative to $\hat{v}$ . Then

[TABLE]

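This difference is positive for the $+$ sign if $\hat{v}^{\prime}\in(-\infty,\hat{v})$ and positive for the $-$ sign if $\hat{v}^{\prime}\in(\hat{v},\infty)$. Hence every alternative $\hat{v}^{\prime}\neq\hat{v}$ yields a strictly larger worst-case tolerance at $v_{+}$ or at $v_{-}$, which establishes (6yb) and completes the proof of Theorem 3.1.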
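The adaptive stopping rule stated with Theorem 3.1 (increase $m$ by one until the right side of (6yd) is at most one) can be sketched as a loop. Everything here is hypothetical scaffolding: `toy_bounds` stands in for the cubature output $(\hat\mu_n,\textup{err}_n)$, and `stopping_value` uses the same assumed closed form for the worst-case tolerance as the endpoint-equalizing choice of $\hat v$ (equal to the right side of (6yd) only if that assumption is correct). The starting value $m=10$ mirrors the minimum sample size $n=2^{10}$ mentioned in Ex. 1.

```python
def stopping_value(v_minus, v_plus, eps_a, eps_r):
    """Worst-case tol of the optimal vhat, assuming it equalizes tol at v_- and v_+."""
    w_minus = max(eps_a, eps_r * abs(v_minus))
    w_plus = max(eps_a, eps_r * abs(v_plus))
    return ((v_plus - v_minus) / (w_minus + w_plus)) ** 2


def toy_bounds(m):
    """Hypothetical stand-in for the cubature at n = 2**m: a fixed estimate and an
    error bound that halves each time m increases by one."""
    return 0.5, 4.0 * 2.0 ** (-m)


def adaptive(eps_a, eps_r, m_min=10, m_max=30):
    """Increase m by one until the stopping value is at most one (p = 1, v(mu) = mu)."""
    for m in range(m_min, m_max + 1):
        mu_hat, err = toy_bounds(m)
        if stopping_value(mu_hat - err, mu_hat + err, eps_a, eps_r) <= 1.0:
            return m, 2 ** m
    raise RuntimeError("sample budget 2**m_max exhausted")


print(adaptive(1e-4, 0.0))   # absolute criterion: (16, 65536)
print(adaptive(1e-4, 1e-3))  # hybrid criterion: (13, 8192)
```

Adding the relative tolerance lets the loop stop three doublings earlier here, because $\varepsilon_{\textrm{r}}\lvert\mu\rvert$ exceeds $\varepsilon_{\textrm{a}}$ for this toy mean.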

We return to the special case of $v(\mu)=\mu$ . The following corollary interprets Theorem 3.1 for this case, and the theorem that follows extends the computational cost upper bound in (6o) for these new quasi-Monte Carlo cubature algorithms.

Corollary 1

For $p=1$ and $v(\mu)=\mu$, it follows that $v_{\pm}=\widehat{\mu}_{n}\pm\textup{err}_{n}$,

[TABLE]

Theorem 3.2

For the special case described in Corollary 1, the computational cost of obtaining an approximation to the integral $\mu$ satisfying the generalized error criterion $\textup{tol}(\mu,\hat{v},\varepsilon_{\textrm{a}},\varepsilon_{\textrm{r}})\leq 1$ according to the adaptive quasi-Monte Carlo cubature algorithm described in Theorem 3.1 is ${\mathcal{O}}\bigl([\$(f)+m^{*}]\,2^{m^{*}}\bigr)$, where

[TABLE]

Proof

For each $n=2^{m}$ , we know that our algorithm produces $\widehat{\mu}_{n}$ and $\textup{err}_{n}$ satisfying $\widehat{\mu}_{n}-\textup{err}_{n}\leq\mu\leq\widehat{\mu}_{n}+\textup{err}_{n}$ . This implies that

[TABLE]

Thus, the right-hand side of (6ac) must be no greater than one if

[TABLE]

Applying the logic that leads to (6o) completes the proof.

The cost upper bound depends on various parameters as one would expect. The computational cost may increase if

•

$\varepsilon_{\textrm{a}}$ decreases,

•

$\varepsilon_{\textrm{r}}$ decreases,

•

$\left\lvert\mu\right\rvert$ decreases,

•

the Fourier coefficients of the integrand increase, or

•

the cone $\mathcal{C}$ expands because $\ell_{*}$ , $\widehat{\omega}$ , or $\mathring{\omega}$ increase.

4 Numerical Implementation

The algorithm described here is intended for the next release of GAIL ChoEtal15a as cubSobol_g and cubLattice_g, coded in MATLAB. These two functions use the Sobol’ sequences provided by MATLAB 2017a MAT9.2 and the lattice generator exod2_base2_m20.txt from Dirk Nuyens’ website Nuy17a, respectively. Our algorithm sets its default parameters as follows:

[TABLE]

These choices are based on experience and are used in the examples below. A larger $\ell_{*}$ allows the Fourier coefficients of the integrand to behave erratically over a larger initial segment of wavenumbers. A larger $r$ decreases the impact of aliasing in estimating the true Fourier coefficients by their discrete analogues. Increasing $\ell_{*}$ or $r$ increases $2^{\ell_{*}+r}$ , the minimum number of sample points used by the algorithms. The inputs to the algorithms are

•

a black-box $p$ -vector function ${\bm{f}}$ , such that ${\bm{\mu}}=\mathbb{E}[{\bm{f}}({\bm{X}})]$ for ${\bm{X}}\sim{\mathcal{U}}[0,1]^{d}$ ,

•

a solution function $v:{\mathbb{R}}^{p}\to{\mathbb{R}}$ ,

•

functions for computing $v_{\pm}$ as described in (6v),

•

an absolute error tolerance, $\varepsilon_{\textrm{a}}$ , and

•

a relative error tolerance $\varepsilon_{\textrm{r}}$ .

The algorithm increases $m$ incrementally until the right side of (6yd) does not exceed one. At this point the algorithm returns $\hat{v}$ as given by (6x).

Example 1

We illustrate the hybrid error criterion by estimating multivariate normal probabilities for a distribution with mean ${\bm{0}}$ and covariance matrix $\mathsf{\Sigma}$ :

[TABLE]

The transformation proposed by Genz Gen93 is used to write this as an integral over the $(d-1)$-dimensional unit cube. As discussed in Gen93 ; HicHon97a, when ${\bm{a}}=-\bm{\infty}$, $\mathsf{\Sigma}_{ij}=\sigma$ if $i\neq j$, and $\mathsf{\Sigma}_{ii}=1$, the exact value of (6ag) reduces to a 1-dimensional integral that can be accurately estimated by a standard quadrature rule. This value is taken to be the true $\mu$.

We perform $1000$ adaptive integrations: $500$ using our cubature rule based on randomly scrambled and digitally shifted Sobol’ sequences (cubSobol_g) and $500$ using our cubature rule based on randomly shifted rank-1 lattice node sequences (cubLattice_g). Default parameters are used. For each case we choose $\sigma\sim\mathcal{U}[0,1]$, dimension $d=\lfloor 500^{D}\rfloor$ with $D\sim\mathcal{U}[0,1]$, and ${\bm{b}}\sim\mathcal{U}[0,\sqrt{d}]^{d}$. The dependence of ${\bm{b}}$ on the dimension of the problem ensures that the estimated probabilities are of the same order of magnitude for all $d$. Otherwise, the higher the dimension, the smaller the probabilities being estimated would be. The execution time and $\textup{tol}(\mu,\hat{v},0.01,0.05)$ are shown in Fig. 6.

Satisfying the error criterion is equivalent to having $\textup{tol}(\mu,\hat{v},0.01,0.05)\leq 1$ , which happens in every case. A very small value of $\textup{tol}(\mu,\hat{v},0.01,0.05)$ means that the approximation is much more accurate than required, which may be due to coincidence or due to the minimum sample size used, $n=2^{10}$ . In Fig. 6 the error tolerances are fixed and do not affect the computation time. However, the computation time does depend on the dimension, $d$ , since higher dimensional problems tend to be harder to solve. The performances of cubSobol_g and cubLattice_g are similar.
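Genz's transformation is only named above. The sketch below follows the standard separation-of-variables form of that transformation for lower limits ${\bm a}=-\bm\infty$, using a Cholesky factor of $\mathsf\Sigma$; the integrand lives on the $(d-1)$-dimensional unit cube. As a loudly labeled simplification, it averages the integrand over IID uniform points instead of the scrambled Sobol’ or shifted lattice points used by cubSobol_g and cubLattice_g, and it has no adaptive error control. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

_PHI = NormalDist()  # standard normal: cdf and inv_cdf


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def genz_integrand(w, L, b):
    """Separation-of-variables integrand for P(X <= b), X ~ N(0, L L^T), lower
    limits -infinity; w is a point in the (d-1)-dimensional unit cube."""
    d = len(b)
    e = _PHI.cdf(b[0] / L[0][0])
    value, y = e, []
    for i in range(1, d):
        u = min(max(w[i - 1] * e, 1e-15), 1.0 - 1e-15)  # keep inv_cdf finite
        y.append(_PHI.inv_cdf(u))
        shift = sum(L[i][k] * y[k] for k in range(i))
        e = _PHI.cdf((b[i] - shift) / L[i][i])
        value *= e
    return value


def mvn_probability(sigma_mat, b, n=20000, seed=1):
    """Plain IID Monte Carlo average of the integrand (a qMC rule would replace rng)."""
    L, rng = cholesky(sigma_mat), random.Random(seed)
    d = len(b)
    total = sum(genz_integrand([rng.random() for _ in range(d - 1)], L, b)
                for _ in range(n))
    return total / n


# Bivariate check: P(X1 <= 0, X2 <= 0) = 1/4 + arcsin(rho) / (2 pi) = 1/3 for rho = 0.5.
print(mvn_probability([[1.0, 0.5], [0.5, 1.0]], [0.0, 0.0]))
```

The equicorrelated case used in Ex. 1 is passed the same way, with $\mathsf\Sigma_{ij}=\sigma$ off the diagonal.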

Example 2

Sobol’ indices Sob90 ; Sob01, which arise in uncertainty quantification, depend on more than one integral. Suppose that one is interested in how an output, $Y:=g({\bm{X}})$, depends on the input ${\bm{X}}\sim\mathcal{U}[0,1]^{d}$, where $g$ has a complicated or unknown structure. For example, $g$ might be the output of a computer simulation. For any coordinate indexed by $j=1,\dots,d$, the normalized closed first-order Sobol’ index for coordinate $j$, commonly denoted as $\underline{\tau}_{j}^{2}/\sigma^{2}$, involves three integrals:

[TABLE]

Here, $({\bm{x}}_{j}:{{\bm{x}}^{\prime}}_{-j})\in[0,1)^{d}$ denotes a point whose $r^{\text{th}}$ coordinate is $x_{r}$ if $r=j$, and $x^{\prime}_{r}$ otherwise. By definition, the value of these normalized indices must lie between $0$ and $1$, and both the numerator and denominator in the expression for $v({\bm{\mu}})$ are non-negative. Therefore, the domain of the function $v$ is $\Omega:=\{{\bm{\mu}}\in[0,\infty)^{3}:0\leq\mu_{1}\leq\mu_{2}-\mu_{3}^{2}\}$. Thus, given $\widehat{{\bm{\mu}}}_{n}$ and $\textbf{{err}}_{n}$, the values of $v_{\pm}$ defined in (6v) are

[TABLE]
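The formulas for $v_\pm$ are not reproduced in this text. The sketch below is one way to compute them under two stated assumptions: that $v({\bm\mu})=\mu_1/(\mu_2-\mu_3^2)$, which is what the domain $\Omega$ suggests, and that clipping the unconstrained box extremes to the range $[0,1]$ of a normalized index is an adequate treatment of the intersection with $\Omega$. It is a hypothetical helper, not the GAIL code.

```python
def sobol_index_bounds(mu_hat, err):
    """Return (v_minus, v_plus) for v(mu) = mu1 / (mu2 - mu3**2).

    Assumed reconstruction: extremes of v over the box mu_hat +/- err with each
    coordinate kept nonnegative, clipped to [0, 1].
    """
    lo = [max(m - e, 0.0) for m, e in zip(mu_hat, err)]
    hi = [m + e for m, e in zip(mu_hat, err)]
    # v grows with mu1 and with mu3 (mu3 >= 0), and shrinks as mu2 grows.
    den_large = hi[1] - lo[2] ** 2
    den_small = lo[1] - hi[2] ** 2
    v_minus = lo[0] / den_large if den_large > 0 else 0.0
    v_plus = hi[0] / den_small if den_small > 0 else 1.0
    return min(max(v_minus, 0.0), 1.0), min(max(v_plus, 0.0), 1.0)


print(sobol_index_bounds((0.2, 1.0, 0.5), (0.01, 0.01, 0.01)))
```

For the absolute criterion used below ($\varepsilon_{\textrm{r}}=0$), minimizing the worst case of $(v-\hat v)^2/\varepsilon_{\textrm{a}}^2$ over $[v_-,v_+]$ gives the midpoint $\hat v=(v_-+v_+)/2$.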

We estimate the first-order Sobol’ indices of the test function in Bratley et al. BraFoxNie92 using randomly scrambled and digitally shifted Sobol’ sequences and the same algorithm parameters as in Ex. 1:

[TABLE]

The value of $n$ chosen by our adaptive algorithm and the actual value of the tolerance function, $\textup{tol}(v,\hat{v},5\times 10^{-3},0)$, are shown in the table above. Since none of those tolerance values exceeds one, our algorithm correctly provides $\hat{v}$ for each coordinate $j$. In the last row above, we replaced our optimal $\hat{v}$ by $v(\widehat{{\bm{\mu}}}_{n})$ for the same $n$ as returned by our algorithm. Interestingly, this approximation to the Sobol’ indices, while perhaps intuitive, does not satisfy the absolute error criterion, because $\textup{tol}(v,v(\widehat{{\bm{\mu}}}_{n}),5\times 10^{-3},0)$ sometimes exceeds one. This reflects the fact that $v(\widehat{{\bm{\mu}}}_{n})$ can differ from $v$ by much more than $\hat{v}$ does. An extensive study on how to estimate first-order and total effect Sobol’ indices using the automatic quasi-Monte Carlo cubature is provided in GilJim16b.

5 Control Variates

The results in this section mainly follow the work of Da Li Li16a . Control variates are commonly used to improve the efficiency of IID Monte Carlo integration. If one chooses a vector of functions ${\bm{g}}:[0,1)^{d}\to{\mathbb{R}}^{q}$ for which ${\bm{\mu}}_{\bm{g}}:=\int_{[0,1)^{d}}{\bm{g}}({\bm{x}})\,\mathrm{d}{\bm{x}}$ is known, then

[TABLE]

for any choice of ${\bm{\beta}}$ . The goal is to choose an optimal ${\bm{\beta}}$ to make

[TABLE]

sufficiently close to $\mu$ with the least expense, $n$ , possible.

If ${\bm{x}}_{0},{\bm{x}}_{1},\ldots$ are IID ${\mathcal{U}}[0,1)^{d}$ , then $\widehat{\mu}_{{\bm{\beta}},n}$ is an unbiased estimator for $\mu$ for any choice of ${\bm{\beta}}$ , and the variance of the control variates estimator may be expressed as

[TABLE]

where $\hat{{\bm{g}}}_{\kappa}$ are the Fourier coefficients of ${\bm{g}}$ . Since ${\bm{\beta}}^{T}{\bm{\mu}}_{\bm{g}}$ is constant, it does not enter into the calculation of the variance. The optimal choice of ${\bm{\beta}}$ , which minimizes $\textup{var}(\widehat{\mu}_{{\bm{\beta}},n})$ , is

[TABLE]

Although ${\bm{\beta}}_{\textup{MC}}$ cannot be computed exactly, it may be well approximated in terms of sample estimates of the quantities on the right hand side.
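For IID sampling this recipe is standard. The sketch below (scalar control variate, $q=1$) estimates ${\bm\beta}_{\textup{MC}}$ by the sample ratio $\widehat{\textup{cov}}(f,g)/\widehat{\textup{var}}(g)$ and compares the sample variances of $f$ and of $f-\beta(g-\mu_g)$. The toy integrand $f(x)=e^{x}$ with control variate $g(x)=x$, $\mu_g=1/2$, on $[0,1)$ is chosen only for illustration, and the function name is hypothetical.

```python
import math
import random


def cv_estimate(f, g, mu_g, n, seed=0):
    """IID Monte Carlo with one control variate; beta from sample covariances."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    fx = [f(x) for x in xs]
    gx = [g(x) for x in xs]
    mf, mg = sum(fx) / n, sum(gx) / n
    # Sample analogue of beta_MC = cov(f, g) / var(g) for a scalar control variate.
    beta = (sum((a - mf) * (c - mg) for a, c in zip(fx, gx))
            / sum((c - mg) ** 2 for c in gx))
    h = [a - beta * (c - mu_g) for a, c in zip(fx, gx)]
    mh = sum(h) / n
    var_f = sum((a - mf) ** 2 for a in fx) / (n - 1)
    var_h = sum((v - mh) ** 2 for v in h) / (n - 1)
    return mf, mh, beta, var_f, var_h


plain, cv, beta, var_f, var_h = cv_estimate(math.exp, lambda x: x, 0.5, 10000)
print(plain, cv, beta, var_f / var_h)
```

For this pair the population values are $\beta_{\textup{MC}}=12\bigl(1-(e-1)/2\bigr)\approx 1.69$ and a variance reduction of roughly sixty-fold, so the printed ratio should be of that order.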

However, if ${\bm{x}}_{0},{\bm{x}}_{1},\ldots$ are the points described in Sec. 2, then the error depends on only some of the Fourier coefficients, and (5) and (6l) lead to

[TABLE]

Assuming that $f-{\bm{\beta}}^{T}{\bm{g}}\in{\mathcal{C}}$ for all ${\bm{\beta}}$, it makes sense to choose ${\bm{\beta}}$ to minimize the rightmost term. There seems to be some advantage to choosing ${\bm{\beta}}$ based on $\widetilde{S}_{m-r,m}(f-{\bm{\beta}}^{T}{\bm{g}}),\ldots,\widetilde{S}_{m,m}(f-{\bm{\beta}}^{T}{\bm{g}})$. Our experience suggests that this strategy makes ${\bm{\beta}}$ less dependent on the fluctuations of the discrete Fourier coefficients over a small range of wavenumbers. In summary,

[TABLE]

As already noted in HicEtal03 , the optimal control variate coefficients for IID and low discrepancy sampling are generally different. Whereas ${\bm{\beta}}_{\textup{MC}}$ may be strongly influenced by low wavenumber Fourier coefficients of the integrand, ${\bm{\beta}}_{\textup{qMC}}$ depends on rather high wavenumber Fourier coefficients.

Minimizing the sum of absolute values is computationally more time-consuming than minimizing the sum of squares. Thus, in practice we choose ${\bm{\beta}}$ to be

[TABLE]

This choice performs well in practice. Moreover, we often find that there is little advantage to updating $\widetilde{{\bm{\beta}}}_{\textup{qMC}}$ for each $m$ .
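A one-dimensional sketch of the least-squares choice, under stated assumptions: the points are the first $2^m$ points of the base-2 van der Corput sequence (a one-dimensional digital sequence), for which the discrete Walsh coefficients are a normalized Walsh-Hadamard transform of the function values; coefficients are indexed by $\kappa$ in natural Walsh ordering; and the fit uses the wavenumbers $2^{m-r-1}\le\kappa<2^{m}$ covered by $\widetilde S_{m-r,m},\dots,\widetilde S_{m,m}$, with $m=10$ and $r=4$ (matching the sums $\widetilde S_{6,10}$ mentioned in Ex. 3). The ordering and index range are assumptions about Sec. 2, and the toy pair $f(x)=e^x$, $g(x)=x$ is illustrative.

```python
import math


def van_der_corput(n):
    """First n points of the base-2 van der Corput sequence (radical inverse of i)."""
    pts = []
    for i in range(n):
        x, scale, k = 0.0, 0.5, i
        while k:
            x += (k & 1) * scale  # bit l of i becomes binary digit l + 1 of x
            scale /= 2.0
            k >>= 1
        pts.append(x)
    return pts


def walsh_coefficients(values):
    """(1/n) * sum_i values[i] * (-1)**popcount(kappa & i) for every kappa, via an
    in-place fast Walsh-Hadamard transform (len(values) must be a power of 2)."""
    a = list(values)
    h = 1
    while h < len(a):
        for start in range(0, len(a), 2 * h):
            for j in range(start, start + h):
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return [v / len(a) for v in a]


def qmc_beta(f, g, m=10, r=4):
    """Least-squares beta over the assumed wavenumbers 2**(m-r-1) <= kappa < 2**m."""
    x = van_der_corput(2 ** m)
    ft = walsh_coefficients([f(t) for t in x])
    gt = walsh_coefficients([g(t) for t in x])
    ks = range(2 ** (m - r - 1), 2 ** m)
    return sum(ft[k] * gt[k] for k in ks) / sum(gt[k] ** 2 for k in ks), x


beta, x = qmc_beta(math.exp, lambda t: t)
n = len(x)
plain = sum(math.exp(t) for t in x) / n
with_cv = sum(math.exp(t) - beta * (t - 0.5) for t in x) / n  # h = f + beta (mu_g - g)
exact = math.e - 1.0
print(beta, abs(plain - exact), abs(with_cv - exact))
```

In this toy case $\beta$ comes out close to $e-1$, the value that cancels the leading cubature error, rather than the IID optimum of about $1.69$, in line with the remark that the two optimal coefficients generally differ.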

Example 3

Control variates may be used to expedite the pricing of an exotic option when one can identify a similar option whose price is known exactly. This often happens with geometric Brownian motion asset price models. The geometric mean Asian payoff is a good control variate for estimating the price of an arithmetic mean Asian option. The two payoffs are,

[TABLE]

Here $\mathsf{C}$ is the covariance matrix of the values of a Brownian motion at the discrete times $t_{1},\ldots,t_{d}$ . We choose $\mathsf{A}$ via a principal component analysis (singular value) decomposition of $\mathsf{C}$ as this tends to provide quicker convergence to the answer than other choices of $\mathsf{A}$ .

The option parameters for this example are $S_{0}=100$, $r=2\%$, $\sigma=50\%$, $K=100$, and $T=1$. We employ weekly monitoring, so $d=52$ and $t_{j}=j/52$; the option price is about $\$11.97$. Parameter $\tilde{{\bm{\beta}}}_{\textup{qMC}}$ is estimated at the first iteration of the algorithm, when $m=10$, and is not updated for subsequent $m$. For $\varepsilon_{\textrm{a}}=0.01$ and $\varepsilon_{\textrm{r}}=0$, cubSobol_g requires $16~{}384$ points without control variates, but only $4~{}096$ with control variates.

Fig. 7 shows the Fourier Walsh coefficients of the original payoff, $f$, and of the function integrated using control variates, $h_{\tilde{\beta}_{\textup{qMC}}}=f+\tilde{\beta}_{\textup{qMC}}(\mu_{g}-g)$, with $\tilde{\beta}_{\textup{qMC}}=1.0793$, a typical value chosen by our algorithm. The squares correspond to the coefficients in the sums $\widetilde{S}_{6,10}(f)$ and $\widetilde{S}_{6,10}(h_{\tilde{\beta}_{\textup{qMC}}})$, respectively, which are used to bound the Sobol’ cubature error. The circles are the first coefficients from the dual net that appear in error bound (5). The figure shows that the control variate reduces the magnitude of both the squares and the circles.
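A plain IID Monte Carlo sketch of Ex. 3, under stated assumptions: the payoffs are taken to be discounted calls on the arithmetic and geometric means of $S(t_1),\dots,S(t_d)$ (the payoff display is not reproduced in this text); $\mu_g$ is the standard log-normal closed form for the geometric-mean call; paths are built by time stepping rather than by the PCA construction (the construction matters for qMC, not for IID sampling); and the coefficient is the IID choice $\beta_{\textup{MC}}$, not $\tilde\beta_{\textup{qMC}}$. It is not cubSobol_g and has no stopping rule.

```python
import math
import random
from statistics import NormalDist


def geometric_asian_call(S0, r, sigma, K, T, d):
    """Closed-form price of a discretely monitored geometric-mean Asian call
    under geometric Brownian motion, monitoring times t_j = j T / d."""
    t = [T * (j + 1) / d for j in range(d)]
    mean_log = math.log(S0) + (r - sigma ** 2 / 2) * sum(t) / d
    var_log = sigma ** 2 / d ** 2 * sum(min(a, b) for a in t for b in t)
    s = math.sqrt(var_log)
    d1 = (mean_log - math.log(K) + var_log) / s
    Phi = NormalDist().cdf
    return math.exp(-r * T) * (math.exp(mean_log + var_log / 2) * Phi(d1) - K * Phi(d1 - s))


def asian_payoffs(S0, r, sigma, K, T, d, rng):
    """Simulate one GBM path by time stepping; return the discounted
    (arithmetic-mean, geometric-mean) call payoffs."""
    dt = T / d
    log_s, arith_sum, log_sum = math.log(S0), 0.0, 0.0
    for _ in range(d):
        log_s += (r - sigma ** 2 / 2) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        arith_sum += math.exp(log_s)
        log_sum += log_s
    disc = math.exp(-r * T)
    return (disc * max(arith_sum / d - K, 0.0),
            disc * max(math.exp(log_sum / d) - K, 0.0))


def mean_and_se(xs):
    """Sample mean and its standard error."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, math.sqrt(var / len(xs))


S0, r, sigma, K, T, d = 100.0, 0.02, 0.5, 100.0, 1.0, 52  # parameters of Ex. 3
rng = random.Random(2017)
pairs = [asian_payoffs(S0, r, sigma, K, T, d, rng) for _ in range(10000)]
f = [a for a, _ in pairs]
g = [b for _, b in pairs]
mu_g = geometric_asian_call(S0, r, sigma, K, T, d)

mf, mg = sum(f) / len(f), sum(g) / len(g)
# IID coefficient beta_MC = cov(f, g) / var(g), estimated from the same sample.
beta = sum((a - mf) * (b - mg) for a, b in zip(f, g)) / sum((b - mg) ** 2 for b in g)
h = [a + beta * (mu_g - b) for a, b in zip(f, g)]  # h_beta = f + beta (mu_g - g)

price_plain, se_plain = mean_and_se(f)
price_cv, se_cv = mean_and_se(h)
print(f"plain: {price_plain:.3f} +/- {se_plain:.3f}   with CV: {price_cv:.3f} +/- {se_cv:.3f}")
```

The two payoffs are highly correlated, so the standard error with the control variate should be several times smaller than without it.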

6 Discussion and Conclusion

Ian Sloan has made substantial contributions to the understanding and practical application of quasi-Monte Carlo cubature. One challenge is how to choose the parameters that define these cubatures in commonly encountered situations where not much is known about the integrand. These parameters include

a)

the generators of the sequences themselves,

b)

the sample size, $n$,

c)

the choice of importance sampling distributions,

d)

the control variate coefficients HicEtal03,

e)

the parameters defining multilevel (quasi-)Monte Carlo methods Gil15a, and

f)

the parameters defining the multivariate decomposition method Was13b.

The rules for choosing these parameters should work well in practice, but should not be merely heuristic, as they are in the adaptive algorithms highlighted in the introduction. There should be a theoretical justification. Item a) has received much attention. This article has addressed items b) and d). We realize that the question of choosing $n$ is now replaced by the question of choosing the parameters defining the cone of integrands, ${\mathcal{C}}$. However, we have made progress because when our adaptive algorithms fail, we can pinpoint the cause. We hope for further investigations into the best way to choose $n$. We also hope that further efforts will lead to more satisfying answers for the other items on the list.

As demonstrated in Sec. 3, it is now possible to set relative error criteria or hybrid error criteria. We also know now how to accurately estimate a function of several means. In addition to the problem of Sobol’ indices, this problem may arise in Bayesian inference, where the posterior mean of a parameter is the quotient of two integrals.

As already pointed out some years ago in HicEtal03 , the choice of control variate for IID sampling is not necessarily the right choice for low discrepancy sampling. Here in Sec. 5, we have identified a natural way to determine a good control variate coefficient for digital sequence or lattice sequence sampling.

Bibliography

1. Bratley, P., Fox, B.L., Niederreiter, H.: Implementation and tests of low-discrepancy sequences. ACM Trans. Model. Comput. Simul. 2, 195–213 (1992)
2. Choi, S.C.T., Ding, Y., Hickernell, F.J., Jiang, L., Jiménez Rugama, Ll.A., Tong, X., Zhang, Y., Zhou, X.: GAIL: Guaranteed Automatic Integration Library (versions 1.0–2.1). MATLAB software (2013–2015). URL http://gailgithub.github.io/GAIL_Dev/
3. Dick, J., Kuo, F., Sloan, I.H.: High-dimensional integration: the quasi-Monte Carlo way. Acta Numer. 22, 133–288 (2013)
4. Dick, J., Pillichshammer, F.: Digital Nets and Sequences: Discrepancy Theory and Quasi-Monte Carlo Integration. Cambridge University Press, Cambridge (2010)
5. Genz, A.: Comparison of methods for the computation of multivariate normal probabilities. Computing Science and Statistics 25, 400–405 (1993)
6. Giles, M.: Multilevel Monte Carlo methods. Acta Numer. 24, 259–328 (2015)
7. Halton, J.H.: Quasi-probability: Why quasi-Monte-Carlo methods are statistically valid and how their errors can be estimated statistically. Monte Carlo Methods and Appl. 11, 203–350 (2005)
8. Hickernell, F.J.: A generalized discrepancy and quadrature error bound. Math. Comp. 67, 299–322 (1998). DOI 10.1090/S0025-5718-98-00894-1