A weighted Discrepancy Bound of quasi-Monte Carlo Importance Sampling

Josef Dick; Daniel Rudolf; Houying Zhu

arXiv:1901.08115·stat.CO·December 20, 2024

TL;DR

This paper studies a deterministic quasi-Monte Carlo importance sampling method and proves an explicit error bound in terms of the star-discrepancy for approximating expectations with respect to partially known probability measures.

Contribution

It provides the first explicit discrepancy-based error bound for a deterministic quasi-Monte Carlo importance sampling estimator.

Findings

1. Derived an explicit star-discrepancy error bound for the method
2. Demonstrated improved convergence properties over traditional stochastic methods
3. Applicable to a wide class of probability measures

Abstract

Importance sampling Monte-Carlo methods are widely used for the approximation of expectations with respect to partially known probability measures. In this paper we study a deterministic version of such an estimator based on quasi-Monte Carlo. We obtain an explicit error bound in terms of the star-discrepancy for this method.

Equations

$$\mathbb{E}_{\pi}(f)=\int_{\mathbb{R}^{d}}f({\boldsymbol{x}})\,{\mathrm{d}}\pi({\boldsymbol{x}})$$

$$\pi(A)=\frac{\int_{A}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}}{\int_{\mathbb{R}^{d}}u({\boldsymbol{y}})\,{\mathrm{d}}{\boldsymbol{y}}},\qquad A\in\mathcal{B}(\mathbb{R}^{d}).$$

$$u({\boldsymbol{x}})=\exp(-\beta H({\boldsymbol{x}})),\qquad{\boldsymbol{x}}\in\mathbb{R}^{d},$$

$$u({\boldsymbol{x}})=\ell({\boldsymbol{y}}\mid{\boldsymbol{x}})\,p({\boldsymbol{x}}),\qquad{\boldsymbol{x}}\in\mathbb{R}^{d}.$$

$$S(f,u):=\frac{\int_{[0,1]^{d}}f({\boldsymbol{x}})u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}}{\int_{[0,1]^{d}}u({\boldsymbol{y}})\,{\mathrm{d}}{\boldsymbol{y}}}=\mathbb{E}_{\pi}(f)$$

$$M_{n}(f,u):=\frac{\sum_{j=1}^{n}f({\boldsymbol{x}}_{j})u({\boldsymbol{x}}_{j})}{\sum_{j=1}^{n}u({\boldsymbol{x}}_{j})}.$$

$$D_{\lambda_{d}}(P_{n}):=\sup_{{\boldsymbol{x}}\in[0,1]^{d}}\left|\frac{1}{n}\sum_{j=1}^{n}\mathbf{1}_{[0,{\boldsymbol{x}})}({\boldsymbol{x}}_{j})-\lambda_{d}([0,{\boldsymbol{x}}))\right|$$

$$Q_{n}(f,u)=\frac{\sum_{j=1}^{n}f({\boldsymbol{x}}_{j})u({\boldsymbol{x}}_{j})}{\sum_{j=1}^{n}u({\boldsymbol{x}}_{j})}.$$

$$\left|S(f,u)-Q_{n}(f,u)\right|\leq 4\,\frac{\|f\|_{H_{1}}\|u\|_{D}}{\int_{[0,1]^{d}}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}}\,D_{\lambda_{d}}(P_{n}).$$

$$D_{\pi}({\boldsymbol{w}},P_{n})=\sup_{{\boldsymbol{x}}\in[0,1]^{d}}\left|\sum_{i=1}^{n}w_{i}\mathbf{1}_{[0,{\boldsymbol{x}})}({\boldsymbol{x}}_{i})-\pi([0,{\boldsymbol{x}}))\right|.$$

$$w_{i}^{u}:=w_{i}(u,P_{n}):=\frac{u({\boldsymbol{x}}_{i})}{\sum_{j=1}^{n}u({\boldsymbol{x}}_{j})},\qquad i=1,\dots,n.$$

$$\langle f,g\rangle=\sum_{v\subseteq[d]}\int_{[0,1]^{|v|}}\frac{\partial^{|v|}}{\partial{\boldsymbol{x}}_{v}}f({\boldsymbol{x}}_{v};1)\,\frac{\partial^{|v|}}{\partial{\boldsymbol{x}}_{v}}g({\boldsymbol{x}}_{v};1)\,{\mathrm{d}}{\boldsymbol{x}}_{v},$$

$$\frac{\partial^{|v|}}{\partial{\boldsymbol{x}}_{v}}K(({\boldsymbol{x}}_{v};1),{\boldsymbol{y}})=(-1)^{|v|}\mathbf{1}_{[{\boldsymbol{y}}_{v},1]}({\boldsymbol{x}}_{v}),$$

$$f({\boldsymbol{y}})=\sum_{v\subseteq[d]}\int_{[{\boldsymbol{y}}_{v},1]}(-1)^{|v|}\frac{\partial^{|v|}}{\partial{\boldsymbol{x}}_{v}}f({\boldsymbol{x}}_{v};1)\,{\mathrm{d}}{\boldsymbol{x}}_{v}.$$

$$\|f\|_{H_{1}}:=\sum_{v\subseteq[d]}\int_{[0,1]^{|v|}}\left|\frac{\partial^{|v|}}{\partial{\boldsymbol{x}}_{v}}f({\boldsymbol{x}}_{v};1)\right|{\mathrm{d}}{\boldsymbol{x}}_{v},$$

$$\|f\|_{\widetilde{H}_{1}}:=\sum_{\emptyset\neq v\subseteq[d]}\int_{[0,1]^{|v|}}\left|\frac{\partial^{|v|}}{\partial{\boldsymbol{x}}_{v}}f({\boldsymbol{x}}_{v};1)\right|{\mathrm{d}}{\boldsymbol{x}}_{v}.$$

$$\left|S(f,u)-\sum_{i=1}^{n}w_{i}f({\boldsymbol{x}}_{i})\right|\leq\|f\|_{H_{1}}D_{\pi}({\boldsymbol{w}},P_{n}).$$

$$h({\boldsymbol{x}}):=\int_{[0,1]^{d}}K({\boldsymbol{x}},{\boldsymbol{y}})\,{\mathrm{d}}\pi({\boldsymbol{y}})-\sum_{i=1}^{n}w_{i}K({\boldsymbol{x}},{\boldsymbol{x}}_{i}),$$

$$e(f,P_{n})=\langle f,h\rangle.$$

$$|e(f,P_{n})|=|e(\widetilde{f},P_{n})|\leq\|\widetilde{f}\|_{H_{1}}D_{\pi}({\boldsymbol{w}},P_{n})=\|f\|_{\widetilde{H}_{1}}D_{\pi}({\boldsymbol{w}},P_{n}),$$

$$\left|S(f,u)-Q_{n}(f,u)\right|\leq\|f\|_{H_{1}}D_{\pi}({\boldsymbol{w}}^{u},P_{n}).$$

$$D_{\pi}({\boldsymbol{w}}^{u},P_{n})\leq 4\,D_{\lambda_{d}}(P_{n})\,\frac{\|u\|_{D}}{\int_{[0,1]^{d}}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}},$$

$$\|u\|_{D}=\sup_{{\boldsymbol{z}}\in[0,1]^{d}}u({\boldsymbol{z}})+\sup_{{\boldsymbol{z}}\in[0,1]^{d}}\|u(T_{{\boldsymbol{z}}}\,\cdot)\|_{H_{1}}$$

$$\begin{aligned}
\left|\sum_{j=1}^{n}w_{j}^{u}\mathbf{1}_{[0,{\boldsymbol{z}})}({\boldsymbol{x}}_{j})-\pi([0,{\boldsymbol{z}}))\right|
&=\left|\frac{\sum_{j=1}^{n}u({\boldsymbol{x}}_{j})\mathbf{1}_{[0,{\boldsymbol{z}})}({\boldsymbol{x}}_{j})}{\sum_{i=1}^{n}u({\boldsymbol{x}}_{i})}-\frac{\int_{[0,{\boldsymbol{z}})}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}}{\|u\|_{1}}\right|\\
&\leq\frac{\sum_{j=1}^{n}u({\boldsymbol{x}}_{j})\mathbf{1}_{[0,{\boldsymbol{z}})}({\boldsymbol{x}}_{j})}{\|u\|_{1}\sum_{i=1}^{n}u({\boldsymbol{x}}_{i})}\left|\|u\|_{1}-\frac{1}{n}\sum_{i=1}^{n}u({\boldsymbol{x}}_{i})\right|\\
&\quad+\frac{1}{\|u\|_{1}}\left|\frac{1}{n}\sum_{i=1}^{n}u({\boldsymbol{x}}_{i})\mathbf{1}_{[0,{\boldsymbol{z}})}({\boldsymbol{x}}_{i})-\int_{[0,{\boldsymbol{z}})}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}\right|\\
&\leq\frac{2}{\|u\|_{1}}\sup_{{\boldsymbol{z}}\in[0,1]^{d}}\left|\frac{1}{n}\sum_{i=1}^{n}u({\boldsymbol{x}}_{i})\mathbf{1}_{[0,{\boldsymbol{z}})}({\boldsymbol{x}}_{i})-\int_{[0,{\boldsymbol{z}})}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}\right|.
\end{aligned}$$

$$\left|\frac{1}{n}\sum_{i=1}^{n}u({\boldsymbol{x}}_{i})\mathbf{1}_{[0,{\boldsymbol{z}})}({\boldsymbol{x}}_{i})-\int_{[0,{\boldsymbol{z}})}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}\right|=\frac{|P^{{\boldsymbol{z}}}|}{n}\left|\frac{1}{|P^{{\boldsymbol{z}}}|}\sum_{{\boldsymbol{x}}\in P^{{\boldsymbol{z}}}}u({\boldsymbol{x}})-\frac{n}{|P^{{\boldsymbol{z}}}|}\int_{[0,{\boldsymbol{z}})}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}\right|\leq I_{1}({\boldsymbol{z}})+I_{2}({\boldsymbol{z}}).$$

$$I_{1}({\boldsymbol{z}})\leq\frac{\int_{[0,{\boldsymbol{z}}]}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}}{\lambda_{d}([0,{\boldsymbol{z}}])}\,D_{\lambda_{d}}(P_{n})\leq D_{\lambda_{d}}(P_{n})\sup_{{\boldsymbol{x}}\in[0,{\boldsymbol{z}}]}u({\boldsymbol{x}}).$$

Taxonomy

Topics: Mathematical Approximation and Integration · Probabilistic and Robust Engineering Design · Statistical Methods and Inference

Full text

Josef Dick, Daniel Rudolf, Houying Zhu

The University of New South Wales, Sydney, NSW 2052, Australia; Institute for Mathematical Stochastics, Universität Göttingen, Goldschmidtstraße 7, 37077 Göttingen, Germany; Integrative Genomics & School of Mathematics and Statistics, The University of Melbourne, Parkville, VIC 3010, Australia

**Keywords:** Importance sampling, Monte Carlo method, quasi-Monte Carlo

Classification. Primary: 62F15; Secondary: 11K45.

1 Introduction

In statistical physics and Bayesian statistics it is desirable to compute expected values

$$\mathbb{E}_{\pi}(f)=\int_{\mathbb{R}^{d}}f({\boldsymbol{x}})\,{\mathrm{d}}\pi({\boldsymbol{x}})\tag{1}$$

with $f\colon\mathbb{R}^{d}\to{\mathbb{R}}$ and a partially known probability measure $\pi$ on $(\mathbb{R}^{d},\mathcal{B}(\mathbb{R}^{d}))$ . Here $\mathcal{B}(\mathbb{R}^{d})$ denotes the Borel $\sigma$ -algebra and partially known means that there is an unnormalized density $u\colon\mathbb{R}^{d}\to[0,\infty)$ (with respect to the Lebesgue measure) and $\int_{\mathbb{R}^{d}}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}\in(0,\infty)$ , such that

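$$\pi(A)=\frac{\int_{A}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}}{\int_{\mathbb{R}^{d}}u({\boldsymbol{y}})\,{\mathrm{d}}{\boldsymbol{y}}},\qquad A\in\mathcal{B}(\mathbb{R}^{d}).\tag{2}$$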

Probability measures of this type are met in numerous applications. For example, for the density of a Boltzmann distribution one has

$$u({\boldsymbol{x}})=\exp(-\beta H({\boldsymbol{x}})),\qquad{\boldsymbol{x}}\in\mathbb{R}^{d},$$

with inverse temperature $\beta>0$ and Hamiltonian $H\colon\mathbb{R}^{d}\to\mathbb{R}$ . The density of a posterior distribution is also of this form. Given observations ${\boldsymbol{y}}\in\mathcal{Y}$ , likelihood function $\ell({\boldsymbol{y}}\mid{\boldsymbol{x}})$ and prior probability density $p$ , with respect to the Lebesgue measure on $\mathbb{R}^{d}$ ,

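$$u({\boldsymbol{x}})=\ell({\boldsymbol{y}}\mid{\boldsymbol{x}})\,p({\boldsymbol{x}}),\qquad{\boldsymbol{x}}\in\mathbb{R}^{d}.$$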

In this setting $\mathbb{R}^{d}$ is considered as parameter- and $\mathcal{Y}$ as observable-space. In both examples, the normalizing constant is in general unknown.

In the present work we only consider unnormalized densities $u$ which are zero outside of the unit cube $[0,1]^{d}$ . Hence we restrict ourselves to $u\colon[0,1]^{d}\to[0,\infty)$ , i.e., $\pi$ is a probability measure on $[0,1]^{d}$ , and $f\colon[0,1]^{d}\to\mathbb{R}$ . To stress the dependence on the unnormalized density in (1), define

$$S(f,u):=\frac{\int_{[0,1]^{d}}f({\boldsymbol{x}})u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}}{\int_{[0,1]^{d}}u({\boldsymbol{y}})\,{\mathrm{d}}{\boldsymbol{y}}}=\mathbb{E}_{\pi}(f)$$

for $f$ and $u$ belonging to some class of functions. It is desirable to have algorithms which approximately compute $S(f,u)$ by only having access to function values of $f$ and $u$ without knowing the normalizing constant a priori. A straightforward strategy for doing so is the importance sampling Monte Carlo approach. It works as follows.

Algorithm 1.

Monte Carlo importance sampling:

1. Generate a sample of an i.i.d. sequence of random variables $X_{1},\dots,X_{n}$ with $X_{i}\sim\mathcal{U}([0,1]^{d})$ , where $\mathcal{U}([0,1]^{d})$ denotes the uniform distribution on $[0,1]^{d}$ , and call the result ${\boldsymbol{x}}_{1},\dots,{\boldsymbol{x}}_{n}$ .

2. Compute

$$M_{n}(f,u):=\frac{\sum_{j=1}^{n}f({\boldsymbol{x}}_{j})u({\boldsymbol{x}}_{j})}{\sum_{j=1}^{n}u({\boldsymbol{x}}_{j})}.$$
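The two steps of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `mc_importance_sampling`, the seed handling, and the test functions $u(x)=1+x$ and $f(x)=x$ on $[0,1]$ are assumptions chosen because $S(f,u)=5/9$ can then be computed by hand.

```python
import random


def mc_importance_sampling(f, u, d, n, seed=0):
    """Self-normalized estimator M_n(f, u) using n uniform samples on [0, 1]^d."""
    rng = random.Random(seed)
    num = 0.0
    den = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]  # step 1: draw x_j ~ U([0, 1]^d)
        ux = u(x)
        num += f(x) * ux                      # step 2: accumulate f(x_j) u(x_j)
        den += ux                             #         and u(x_j)
    return num / den


# Hypothetical test case: d = 1, u(x) = 1 + x, f(x) = x,
# so that S(f, u) = (1/2 + 1/3) / (3/2) = 5/9.
f = lambda x: x[0]
u = lambda x: 1.0 + x[0]

estimate = mc_importance_sampling(f, u, d=1, n=20000)
# Rescaling u leaves the estimator unchanged, so the normalizing
# constant of u never has to be known.
scaled = mc_importance_sampling(f, lambda x: 7.0 * u(x), d=1, n=20000)
```

Because the same seed is used in both calls, `estimate` and `scaled` are computed from identical sample points, which makes the invariance under rescaling of $u$ directly visible.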

Under the minimal assumption that $S(f,u)$ is finite, a strong law of large numbers argument guarantees that the importance sampling estimator $M_{n}(f,u)$ is well-defined, cf. [16, Chapter 9, Theorem 9.2]. For uniformly bounded $f$ and finite $\sup u/\inf u$ an explicit error bound of the mean square error is proven in [14, Theorem 2].

Surprisingly, not much is known about a deterministic version of this method. The idea is to replace the i.i.d. sequence uniformly distributed in $[0,1]^{d}$ by a carefully chosen deterministic point set. Carefully chosen means that the point set $P_{n}=\{{\boldsymbol{x}}_{1},\dots,{\boldsymbol{x}}_{n}\}\subset[0,1]^{d}$ has “small” star-discrepancy, that is,

$$D_{\lambda_{d}}(P_{n}):=\sup_{{\boldsymbol{x}}\in[0,1]^{d}}\left|\frac{1}{n}\sum_{j=1}^{n}\mathbf{1}_{[0,{\boldsymbol{x}})}({\boldsymbol{x}}_{j})-\lambda_{d}([0,{\boldsymbol{x}}))\right|$$

is “small”. Here, the set $[0,{\boldsymbol{x}})=\prod_{i=1}^{d}[0,x_{i})$ denotes an anchored box in $[0,1]^{d}$ with ${\boldsymbol{x}}=(x_{1},\dots,x_{d})$ and $\lambda_{d}([0,{\boldsymbol{x}}))=\prod_{i=1}^{d}x_{i}$ is the $d$ -dimensional Lebesgue measure of $[0,{\boldsymbol{x}})$ . This leads to a quasi-Monte Carlo importance sampling method.
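For $d=1$ the supremum in the star-discrepancy can be evaluated exactly, which gives a feel for the quantity. The sketch below uses the classical closed form $\frac{1}{2n}+\max_{i}\left|x_{(i)}-\frac{2i-1}{2n}\right|$ for sorted points $x_{(1)}\leq\dots\leq x_{(n)}$; this formula is standard in the discrepancy literature but is not stated in the text above, and the function name is hypothetical.

```python
def star_discrepancy_1d(points):
    """Exact star-discrepancy of a point set in [0, 1] (one-dimensional case only)."""
    xs = sorted(points)
    n = len(xs)
    # Closed form: 1/(2n) + max_i |x_(i) - (2i - 1)/(2n)| over the sorted points.
    return 1.0 / (2 * n) + max(
        abs(x - (2 * i - 1) / (2 * n)) for i, x in enumerate(xs, start=1)
    )


n = 8
midpoints = [(2 * j - 1) / (2 * n) for j in range(1, n + 1)]
disc_midpoints = star_discrepancy_1d(midpoints)   # smallest possible value, 1/(2n)
disc_clustered = star_discrepancy_1d([0.0] * 4)   # all points at 0 gives the worst case
```

The centred points attain the minimal value $1/(2n)$, while a point set concentrated in one location has discrepancy close to 1.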

Algorithm 2.

Quasi-Monte Carlo importance sampling:

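1. Generate a point set $P_{n}=\{{\boldsymbol{x}}_{1},\dots,{\boldsymbol{x}}_{n}\}$ with “small” star-discrepancy $D_{\lambda_{d}}(P_{n})$ .

2. Compute

$$Q_{n}(f,u)=\frac{\sum_{j=1}^{n}f({\boldsymbol{x}}_{j})u({\boldsymbol{x}}_{j})}{\sum_{j=1}^{n}u({\boldsymbol{x}}_{j})}.\tag{3}$$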
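Algorithm 2 differs from Algorithm 1 only in where the points come from. The following sketch uses the Halton sequence as one possible low-discrepancy point set; the helper names `radical_inverse`, `halton` and `qmc_importance_sampling`, the choice of bases 2 and 3, and the product test functions are illustrative assumptions rather than part of the method as stated.

```python
def radical_inverse(k, base):
    """Van der Corput radical inverse of the integer k in the given base."""
    inv = 0.0
    scale = 1.0 / base
    while k > 0:
        k, digit = divmod(k, base)
        inv += digit * scale
        scale /= base
    return inv


def halton(n, bases):
    """First n Halton points (indices 1..n), one coordinate per base."""
    return [[radical_inverse(k, b) for b in bases] for k in range(1, n + 1)]


def qmc_importance_sampling(f, u, points):
    """Estimator Q_n(f, u) evaluated on a deterministic point set."""
    num = sum(f(x) * u(x) for x in points)
    den = sum(u(x) for x in points)
    return num / den


# Hypothetical test case in d = 2: u(x) = (1 + x_1)(1 + x_2), f(x) = x_1 x_2.
# Both factorize, so S(f, u) = (5/9)^2 = 25/81.
u = lambda x: (1.0 + x[0]) * (1.0 + x[1])
f = lambda x: x[0] * x[1]

points = halton(4096, bases=[2, 3])
q_n = qmc_importance_sampling(f, u, points)
exact = 25.0 / 81.0
```

Any other point set with small star-discrepancy, for example a Sobol sequence, could be passed to `qmc_importance_sampling` in the same way.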

Our main result, stated in Theorem 3, is an explicit error bound for the estimator $Q_{n}$ of the form

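$$\left|S(f,u)-Q_{n}(f,u)\right|\leq 4\,\frac{\|f\|_{H_{1}}\|u\|_{D}}{\int_{[0,1]^{d}}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}}\,D_{\lambda_{d}}(P_{n}).\tag{4}$$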

Here $f$ must be differentiable, such that $\|f\|_{H_{1}}$ , defined in (7) below, is finite. As a regularity assumption on $u$ it is assumed that $\|u\|_{D}$ , defined in (9) below, is also finite.

The estimate (4) is proven by means of two results which might be interesting in their own right. The first is a Koksma-Hlawka inequality in terms of a weighted star-discrepancy, see Theorem 1. The second is a relation between this quantity and the classical star-discrepancy, see Theorem 2. To illustrate the quasi-Monte Carlo importance sampling procedure and the error bound we provide an example in Section 3 where (4) is applicable.

Related Literature. The Monte Carlo importance sampling procedure from Algorithm 1 is well studied. In [14], Novak and Mathé prove that it is optimal on a certain class of tuples $(f,u)$ . Recently this Monte Carlo approach has attracted considerable attention; see, for example, [1, 4]. In particular, in [1] upper error bounds are provided that are not restricted to bounded functions $f$ , and the relevance of the method for inverse problems is presented.

Another standard approach to the approximation of $\mathbb{E}_{\pi}(f)$ is Markov chain Monte Carlo. For details concerning error bounds we refer to [11, 12, 13, 17, 19, 20, 21] and the references therein. Combinations of importance sampling and Markov chain Monte Carlo are for example analyzed in [18, 24, 22].

The quasi-Monte Carlo importance sampling procedure of Algorithm 2 is, to our knowledge, less well studied. An asymptotic convergence result is stated in [9, Theorem 1] and promising numerical experiments are conducted in [10]. A related method, a randomized deterministic sampling procedure according to the unnormalized distribution $\pi$ , is studied in [23]. Recently, [3] explore the efficiency of using QMC inputs in importance sampling for Archimedean copulas where significant variance reduction is obtained for a case study.

A quasi-Monte Carlo approach to Bayesian inversion was used in [5] and in [6]. The latter paper uses a combination of quasi-Monte Carlo and the multi-level method. The computation of the likelihood function involves solving a partial differential equation, but otherwise the problem is of the same form as described in the introduction.

2 Weighted Star-discrepancy and error bound

Recall that $[0,{\boldsymbol{x}})$ for ${\boldsymbol{x}}\in[0,1]^{d}$ are boxes anchored at the origin. As a measure of “closeness” between the empirical distribution $\frac{1}{n}\sum_{j=1}^{n}\mathbf{1}_{[0,{\boldsymbol{x}})}({\boldsymbol{x}}_{j})$ of a point set $P_{n}=\{{\boldsymbol{x}}_{1},\dots,{\boldsymbol{x}}_{n}\}$ and $\lambda_{d}([0,{\boldsymbol{x}}))$ we consider the star-discrepancy $D_{\lambda_{d}}(P_{n})$ . A straightforward extension of this quantity taking the probability measure $\pi$ on $[0,1]^{d}$ into account is the following weighted discrepancy.

Definition 1 (Weighted Star-discrepancy).

For a given point set $P_{n}=\{{\boldsymbol{x}}_{1},\dots,{\boldsymbol{x}}_{n}\}\subset[0,1]^{d}$ and weight vector ${\boldsymbol{w}}=(w_{1},\dots,w_{n})\in\mathbb{R}^{n}$ , which might depend on $P_{n}$ and satisfies $\sum_{i=1}^{n}w_{i}=1$ , define the weighted star-discrepancy by

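$$D_{\pi}({\boldsymbol{w}},P_{n})=\sup_{{\boldsymbol{x}}\in[0,1]^{d}}\left|\sum_{i=1}^{n}w_{i}\mathbf{1}_{[0,{\boldsymbol{x}})}({\boldsymbol{x}}_{i})-\pi([0,{\boldsymbol{x}}))\right|.$$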

Remark 1.

If $\pi$ is the Lebesgue measure on $[0,1]^{d}$ and the weight vector is ${\boldsymbol{w}}=(1/n,\dots,1/n)$ , then $D_{\lambda_{d}}(P_{n})=D_{\pi}({\boldsymbol{w}},P_{n})$ for any point set $P_{n}$ . For general $\pi$ with unnormalized density $u\colon[0,1]^{d}\to[0,\infty)$ , allowing the representation (2), we focus on the weight vector

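$$w_{i}^{u}:=w_{i}(u,P_{n}):=\frac{u({\boldsymbol{x}}_{i})}{\sum_{j=1}^{n}u({\boldsymbol{x}}_{j})},\qquad i=1,\dots,n.\tag{5}$$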

Here let us emphasize that ${\boldsymbol{w}}^{u}:=(w^{u}_{1},\dots,w^{u}_{n})$ depends on $u$ and $P_{n}$ .
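To make the weighted star-discrepancy and the weights ${\boldsymbol{w}}^{u}$ concrete, here is a one-dimensional sketch. It assumes distinct points in $[0,1]$ and a continuous distribution function `cdf` of $\pi$; under these assumptions the supremum is attained (as a limit) at the points themselves or at the endpoints 0 and 1. The function names and the example density $u(x)=1+x$ are hypothetical.

```python
def self_normalized_weights(u, points):
    """Weights w_i^u = u(x_i) / sum_j u(x_j)."""
    values = [u(x) for x in points]
    total = sum(values)
    return [v / total for v in values]


def weighted_star_discrepancy_1d(points, weights, cdf):
    """sup_x |sum_i w_i 1_[0,x)(x_i) - pi([0,x))| for d = 1, distinct points, continuous cdf."""
    pairs = sorted(zip(points, weights))
    xs = [0.0] + [p for p, _ in pairs] + [1.0]
    disc = 0.0
    cum = 0.0
    for k in range(len(pairs) + 1):
        # For x in (xs[k], xs[k+1]] the weighted count of points in [0, x) equals cum,
        # and the cdf is monotone, so it suffices to check both interval endpoints.
        disc = max(disc, abs(cum - cdf(xs[k])), abs(cum - cdf(xs[k + 1])))
        if k < len(pairs):
            cum += pairs[k][1]
    return disc


n = 8
points = [(2 * j - 1) / (2 * n) for j in range(1, n + 1)]

# Lebesgue measure with equal weights: reduces to the classical star-discrepancy 1/(2n).
disc_uniform = weighted_star_discrepancy_1d(points, [1.0 / n] * n, lambda x: x)

# Hypothetical density u(x) = 1 + x with distribution function (x + x^2/2) / (3/2).
w_u = self_normalized_weights(lambda x: 1.0 + x, points)
disc_u = weighted_star_discrepancy_1d(points, w_u, lambda x: (x + 0.5 * x * x) / 1.5)
```

The first call reproduces the statement of Remark 1 for this point set: with Lebesgue measure and equal weights the weighted and the classical star-discrepancy coincide.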

2.1 Integration Error and weighted Star-discrepancy

With standard techniques one can prove a Koksma-Hlawka inequality in terms of $D_{\pi}({\boldsymbol{w}},P_{n})$ . For details we refer to [7], [8, Section 2.3] and [15, Chapter 9]. A similar inequality for a quasi-Monte Carlo importance sampler can be found in [2, Corollary 1].

Let $[d]:=\{1,\dots,d\}$ and $L_{2}([0,1]^{d})$ be the space of square integrable functions with respect to the Lebesgue measure. Define the reproducing kernel $K\colon[0,1]^{d}\times[0,1]^{d}\to\mathbb{R}$ by $K({\boldsymbol{x}},{\boldsymbol{y}}):=\prod_{i=1}^{d}(1+\min\{1-x_{i},1-y_{i}\}).$ By $H_{2}=H_{2}(K)$ we denote the corresponding reproducing kernel Hilbert space, which consists of differentiable functions with respect to all variables with first partial derivatives being in $L_{2}([0,1]^{d})$ . For $f,g\in H_{2}$ the inner product is given by

$$\langle f,g\rangle=\sum_{v\subseteq[d]}\int_{[0,1]^{|v|}}\frac{\partial^{|v|}}{\partial{\boldsymbol{x}}_{v}}f({\boldsymbol{x}}_{v};1)\,\frac{\partial^{|v|}}{\partial{\boldsymbol{x}}_{v}}g({\boldsymbol{x}}_{v};1)\,{\mathrm{d}}{\boldsymbol{x}}_{v},$$

where for $v\subseteq[d]$ and ${\boldsymbol{x}}=(x_{1},\dots,x_{d})$ we write ${\boldsymbol{x}}_{v}=(x_{j})_{j\in v}$ and $({\boldsymbol{x}}_{v};1)=(z_{1},\dots,z_{d})$ with $z_{j}=x_{j}$ if $j\in v$ and $z_{j}=1$ if $j\not\in v$ . Thus, $H_{2}$ consists of functions which are differentiable with respect to all variables with first partial derivatives being in $L_{2}([0,1]^{d})$ . Note that for $v\subseteq[d]$ we have

$$\frac{\partial^{|v|}}{\partial{\boldsymbol{x}}_{v}}K(({\boldsymbol{x}}_{v};1),{\boldsymbol{y}})=(-1)^{|v|}\mathbf{1}_{[{\boldsymbol{y}}_{v},1]}({\boldsymbol{x}}_{v}),$$

where $[{\boldsymbol{y}}_{v},1]=\prod_{i\in v}[y_{i},1]$ with ${\boldsymbol{y}}=(y_{1},\dots,y_{d})\in[0,1]^{d}$ . Thus, the reproducing property of the reproducing kernel Hilbert space can be rewritten as

$$f({\boldsymbol{y}})=\sum_{v\subseteq[d]}\int_{[{\boldsymbol{y}}_{v},1]}(-1)^{|v|}\frac{\partial^{|v|}}{\partial{\boldsymbol{x}}_{v}}f({\boldsymbol{x}}_{v};1)\,{\mathrm{d}}{\boldsymbol{x}}_{v}.\tag{6}$$

Further, we define the space $H_{1}$ of differentiable functions $f\colon[0,1]^{d}\to\mathbb{R}$ with finite norm

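$$\|f\|_{H_{1}}:=\sum_{v\subseteq[d]}\int_{[0,1]^{|v|}}\left|\frac{\partial^{|v|}}{\partial{\boldsymbol{x}}_{v}}f({\boldsymbol{x}}_{v};1)\right|{\mathrm{d}}{\boldsymbol{x}}_{v},\tag{7}$$

where for $v=\emptyset$ we have $\int_{[0,1]^{|v|}}\left|\frac{\partial^{|v|}}{\partial{\boldsymbol{x}}_{v}}f({\boldsymbol{x}}_{v};1)\right|{\mathrm{d}}{\boldsymbol{x}}_{v}=|f(1)|$ . We also define the semi-norm

$$\|f\|_{\widetilde{H}_{1}}:=\sum_{\emptyset\neq v\subseteq[d]}\int_{[0,1]^{|v|}}\left|\frac{\partial^{|v|}}{\partial{\boldsymbol{x}}_{v}}f({\boldsymbol{x}}_{v};1)\right|{\mathrm{d}}{\boldsymbol{x}}_{v}.$$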

It is obvious that $\|f\|_{\widetilde{H}_{1}}\leq\|f\|_{H_{1}}$ .

We have the following relation between the integration error in $H_{1}$ and the weighted discrepancy.

Theorem 1 (Koksma-Hlawka inequality).

Let $\pi$ be a probability measure of the form (2) with unnormalized density $u\colon[0,1]^{d}\to[0,\infty)$ . Then, for $P_{n}=\{{\boldsymbol{x}}_{1},\ldots,{\boldsymbol{x}}_{n}\}\subset[0,1]^{d}$ , arbitrary weight vector ${\boldsymbol{w}}=(w_{1},\dots,w_{n})\in\mathbb{R}^{n}$ with $\sum_{i=1}^{n}w_{i}=1$ , and for all $f\in H_{1}$ we have

$$\left|S(f,u)-\sum_{i=1}^{n}w_{i}f({\boldsymbol{x}}_{i})\right|\leq\|f\|_{H_{1}}D_{\pi}({\boldsymbol{w}},P_{n}).$$

Proof.

Define the quadrature error $e(f,P_{n}):=\int_{[0,1]^{d}}f({\boldsymbol{x}})\,{\mathrm{d}}\pi({\boldsymbol{x}})-\sum_{i=1}^{n}w_{i}f({\boldsymbol{x}}_{i})$ of the approximation of $\mathbb{E}_{\pi}(f)=S(f,u)$ by $\sum_{i=1}^{n}w_{i}f({\boldsymbol{x}}_{i})$ . Define the function $\widetilde{f}=f-f(1)$ . Then $\widetilde{f}(1)=0$ , $e(f,P_{n})=e(\widetilde{f},P_{n})$ and $\|f\|_{\widetilde{H}_{1}}=\|\widetilde{f}\|_{H_{1}}$ .

For

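$$h({\boldsymbol{x}}):=\int_{[0,1]^{d}}K({\boldsymbol{x}},{\boldsymbol{y}})\,{\mathrm{d}}\pi({\boldsymbol{y}})-\sum_{i=1}^{n}w_{i}K({\boldsymbol{x}},{\boldsymbol{x}}_{i}),$$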

and $v\subseteq[d]$ we have $\frac{\partial^{|v|}}{\partial{\boldsymbol{x}}_{v}}h({\boldsymbol{z}}_{v};1)=(-1)^{|v|}\left(\pi([0,({\boldsymbol{z}}_{v};1)))-\sum_{i=1}^{n}w_{i}\mathbf{1}_{[0,{\boldsymbol{z}}_{v}]}({\boldsymbol{x}}_{i,v})\right).$ A straightforward calculation, see also for instance [7, formula (3)], shows by using (6) that

$$e(f,P_{n})=\langle f,h\rangle.$$

Finally, by $\left|\frac{\partial^{|v|}}{\partial{\boldsymbol{z}}_{v}}h({\boldsymbol{z}}_{v};1)\right|\leq D_{\pi}({\boldsymbol{w}},P_{n})$ we have

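$$|e(f,P_{n})|=|e(\widetilde{f},P_{n})|\leq\|\widetilde{f}\|_{H_{1}}D_{\pi}({\boldsymbol{w}},P_{n})=\|f\|_{\widetilde{H}_{1}}D_{\pi}({\boldsymbol{w}},P_{n}),$$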

which finishes the proof. ∎

An immediate consequence of the theorem with ${\boldsymbol{w}}^{u}$ from (5) and $Q_{n}$ from (3) is the error bound

$$\left|S(f,u)-Q_{n}(f,u)\right|\leq\|f\|_{H_{1}}D_{\pi}({\boldsymbol{w}}^{u},P_{n}).$$

Here the dependence on $u$ on the right-hand side is hidden in $D_{\pi}({\boldsymbol{w}}^{u},P_{n})$ through ${\boldsymbol{w}}^{u}$ and $\pi$ . The intuition is that under suitable assumptions on $u$ the weighted star-discrepancy can be bounded by the classical star-discrepancy of $P_{n}$ .

2.2 Weighted and classical Star-discrepancy

In this section we provide a relation between the classical star-discrepancy $D_{\lambda_{d}}(P_{n})$ and the weighted star-discrepancy $D_{\pi}({\boldsymbol{w}}^{u},P_{n})$ .

Theorem 2.

Let $\pi$ be a probability measure of the form (2) with unnormalized density function $u\colon[0,1]^{d}\to[0,\infty)$ . Then, for any point set $P_{n}=\{{\boldsymbol{x}}_{1},\ldots,{\boldsymbol{x}}_{n}\}$ in $[0,1]^{d}$ , we have

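$$D_{\pi}({\boldsymbol{w}}^{u},P_{n})\leq 4\,D_{\lambda_{d}}(P_{n})\,\frac{\|u\|_{D}}{\int_{[0,1]^{d}}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}},$$

where

$$\|u\|_{D}=\sup_{{\boldsymbol{z}}\in[0,1]^{d}}u({\boldsymbol{z}})+\sup_{{\boldsymbol{z}}\in[0,1]^{d}}\|u(T_{{\boldsymbol{z}}}\,\cdot)\|_{H_{1}}\tag{9}$$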

with $T_{{\boldsymbol{z}}}\colon[0,1]^{d}\to[0,{\boldsymbol{z}}]$ and $T_{{\boldsymbol{z}}}(x_{1},\dots,x_{d})=(z_{1}x_{1},\dots,z_{d}x_{d})$ for ${\boldsymbol{z}}\in[0,1]^{d}$ .

Proof.

For the given point set $P_{n}\subset[0,1]^{d}$ and unnormalized density $u$ recall that ${\boldsymbol{w}}^{u}$ is defined in (5). To shorten the notation define $\|u\|_{1}:=\int_{[0,1]^{d}}u({\boldsymbol{y}}){\mathrm{d}}{\boldsymbol{y}}$ . Then, for ${\boldsymbol{z}}\in[0,1]^{d}$ we have

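$$\begin{aligned}
\left|\sum_{j=1}^{n}w_{j}^{u}\mathbf{1}_{[0,{\boldsymbol{z}})}({\boldsymbol{x}}_{j})-\pi([0,{\boldsymbol{z}}))\right|
&=\left|\frac{\sum_{j=1}^{n}u({\boldsymbol{x}}_{j})\mathbf{1}_{[0,{\boldsymbol{z}})}({\boldsymbol{x}}_{j})}{\sum_{i=1}^{n}u({\boldsymbol{x}}_{i})}-\frac{\int_{[0,{\boldsymbol{z}})}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}}{\|u\|_{1}}\right|\\
&\leq\frac{\sum_{j=1}^{n}u({\boldsymbol{x}}_{j})\mathbf{1}_{[0,{\boldsymbol{z}})}({\boldsymbol{x}}_{j})}{\|u\|_{1}\sum_{i=1}^{n}u({\boldsymbol{x}}_{i})}\left|\|u\|_{1}-\frac{1}{n}\sum_{i=1}^{n}u({\boldsymbol{x}}_{i})\right|\\
&\quad+\frac{1}{\|u\|_{1}}\left|\frac{1}{n}\sum_{i=1}^{n}u({\boldsymbol{x}}_{i})\mathbf{1}_{[0,{\boldsymbol{z}})}({\boldsymbol{x}}_{i})-\int_{[0,{\boldsymbol{z}})}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}\right|\\
&\leq\frac{2}{\|u\|_{1}}\sup_{{\boldsymbol{z}}\in[0,1]^{d}}\left|\frac{1}{n}\sum_{i=1}^{n}u({\boldsymbol{x}}_{i})\mathbf{1}_{[0,{\boldsymbol{z}})}({\boldsymbol{x}}_{i})-\int_{[0,{\boldsymbol{z}})}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}\right|.
\end{aligned}$$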

For ${\boldsymbol{z}}\in[0,1]^{d}$ denote $P^{{\boldsymbol{z}}}=P_{n}\cap[0,{\boldsymbol{z}})$ and let $\left|P^{{\boldsymbol{z}}}\right|$ be the cardinality of $P^{{\boldsymbol{z}}}$ . Define

[TABLE]

and note that

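$$\left|\frac{1}{n}\sum_{i=1}^{n}u({\boldsymbol{x}}_{i})\mathbf{1}_{[0,{\boldsymbol{z}})}({\boldsymbol{x}}_{i})-\int_{[0,{\boldsymbol{z}})}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}\right|=\frac{|P^{{\boldsymbol{z}}}|}{n}\left|\frac{1}{|P^{{\boldsymbol{z}}}|}\sum_{{\boldsymbol{x}}\in P^{{\boldsymbol{z}}}}u({\boldsymbol{x}})-\frac{n}{|P^{{\boldsymbol{z}}}|}\int_{[0,{\boldsymbol{z}})}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}\right|\leq I_{1}({\boldsymbol{z}})+I_{2}({\boldsymbol{z}}).$$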

Estimation of $I_{1}({\boldsymbol{z}})$ : An immediate consequence of the definition of $I_{1}({\boldsymbol{z}})$ is

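$$I_{1}({\boldsymbol{z}})\leq\frac{\int_{[0,{\boldsymbol{z}}]}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}}{\lambda_{d}([0,{\boldsymbol{z}}])}\,D_{\lambda_{d}}(P_{n})\leq D_{\lambda_{d}}(P_{n})\sup_{{\boldsymbol{x}}\in[0,{\boldsymbol{z}}]}u({\boldsymbol{x}}).$$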

Estimation of $I_{2}({\boldsymbol{z}})$ : With the transformation $T_{{\boldsymbol{z}}}\colon[0,1]^{d}\to[0,{\boldsymbol{z}}]$ defined in the theorem one has $\frac{\int_{[0,{\boldsymbol{z}}]}u({\boldsymbol{x}}){\mathrm{d}}{\boldsymbol{x}}}{\lambda_{d}([0,{\boldsymbol{z}}])}=\int_{[0,1]^{d}}u(T_{{\boldsymbol{z}}}{\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}.$ Let

[TABLE]

and observe that $|P^{{\boldsymbol{z}}}|=|Q|$ . Then

[TABLE]

where the last inequality follows from Theorem 1 with ${\boldsymbol{w}}=(1/n,\dots,1/n)$ and constant unnormalized density. Further,

[TABLE]

By the fact that $T_{{\boldsymbol{z}}}([0,{\boldsymbol{y}}))$ is again a box anchored at the origin and

[TABLE]

we have

[TABLE]

Hence we have

[TABLE]

which implies the result. ∎

In particular, the theorem implies that whenever $\|u\|_{D}$ is finite and $D_{\lambda_{d}}(P_{n})$ goes to zero as $n$ goes to infinity, also $D_{\pi}({\boldsymbol{w}}^{u},P_{n})$ goes to zero for increasing $n$ with the same rate of convergence.

2.3 Explicit error bound

An immediate consequence of the results of the previous two sections is the following explicit error bound of the quasi-Monte Carlo importance sampling method of Algorithm 2.

Theorem 3.

Let $\pi$ be a probability measure of the form (2) with unnormalized density $u\colon[0,1]^{d}\to[0,\infty)$ . Then, for any point set $P_{n}=\{{\boldsymbol{x}}_{1},\ldots,{\boldsymbol{x}}_{n}\}$ in $[0,1]^{d}$ , $f\in H_{1}$ and $Q_{n}$ from (3) we obtain

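$$\left|S(f,u)-Q_{n}(f,u)\right|\leq 4\,\frac{\|f\|_{H_{1}}\|u\|_{D}}{\int_{[0,1]^{d}}u({\boldsymbol{x}})\,{\mathrm{d}}{\boldsymbol{x}}}\,D_{\lambda_{d}}(P_{n})$$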

with $\|u\|_{D}$ from Theorem 2.

Under the regularity assumption that $\|u\|_{D}$ is finite, the error bound shows that the classical star-discrepancy determines the rate at which $Q_{n}(f,u)$ converges to $S(f,u)$ .
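A small one-dimensional sanity check of the explicit bound can be sketched as follows. Everything specific here is an illustrative assumption: $u(x)=1+x$, $f(x)=x$, and the centred points $x_{j}=(2j-1)/(2n)$ , whose star-discrepancy is $1/(2n)$ . For $d=1$ the norm reads $\|g\|_{H_{1}}=|g(1)|+\int_{0}^{1}|g'(x)|\,{\mathrm{d}}x$ , which gives $\|f\|_{H_{1}}=2$ and $\|u(T_{z}\,\cdot)\|_{H_{1}}=1+2z$ , hence $\|u\|_{D}=2+3=5$ .

```python
n = 64
points = [(2 * j - 1) / (2 * n) for j in range(1, n + 1)]
star_disc = 1.0 / (2 * n)   # star-discrepancy of the centred points in d = 1

f = lambda x: x             # hypothetical integrand
u = lambda x: 1.0 + x       # hypothetical unnormalized density

# Estimator Q_n(f, u) and the exact value S(f, u) = (1/2 + 1/3) / (3/2) = 5/9.
q_n = sum(f(x) * u(x) for x in points) / sum(u(x) for x in points)
exact = 5.0 / 9.0
error = abs(exact - q_n)

# Hand-computed constants for this example (see the lead-in):
norm_f = 2.0        # |f(1)| + int_0^1 |f'(x)| dx
norm_u_D = 5.0      # sup u + sup_z ||u(T_z .)||_{H_1} = 2 + 3
integral_u = 1.5    # int_0^1 u(x) dx

bound = 4.0 * norm_f * norm_u_D / integral_u * star_disc
```

In this example the actual error is of order $n^{-2}$ while the bound decays like $n^{-1}$ , so the bound holds with a large margin; it is a worst-case statement over the function class and is not expected to be sharp for a particular smooth pair $(f,u)$ .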

3 Illustrating Example

Define the $d$ -simplex by $\Delta_{d}:=\left\{{\boldsymbol{x}}\in[0,1]^{d}\colon\sum_{i=1}^{d}x_{i}\leq 1\right\}$ and consider the (slightly differently formulated) unnormalized density $u\colon[0,1]^{d}\to[0,1)$ of the Dirichlet distribution with parameter vector ${\boldsymbol{\alpha}}\in(1,\infty)^{d+1}$ given by

[TABLE]

The Dirichlet distribution is the conjugate prior of the multinomial distribution: Assume that we observed some data ${\boldsymbol{y}}=(y_{1},\dots,y_{d+1})\in[0,\infty)^{d+1}$ , which we model as a realization of a multinomial distributed random variable with unknown parameter vector ${\boldsymbol{x}}=(x_{1},\dots,x_{d})\in[0,1]^{d}$ . With $n\in\mathbb{N}$ this leads to a likelihood function $\ell({\boldsymbol{y}}\mid{\boldsymbol{x}})=\frac{n!}{y_{1}!\cdots y_{d+1}!}(1-\sum_{i=1}^{d}x_{i})^{y_{d+1}}\prod_{i=1}^{d}x_{i}^{y_{i}}.$ For a prior distribution with unnormalized density $u({\boldsymbol{x}},{\boldsymbol{\beta}})$ and ${\boldsymbol{\beta}}\in(1,\infty)^{d+1}$ we obtain a posterior measure with unnormalized density $u({\boldsymbol{x}},{\boldsymbol{\beta}}+{\boldsymbol{y}})$ .
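The conjugacy statement can be checked numerically. The sketch below assumes the standard unnormalized Dirichlet form $u({\boldsymbol{x}},{\boldsymbol{\alpha}})=(1-\sum_{i}x_{i})^{\alpha_{d+1}-1}\prod_{i=1}^{d}x_{i}^{\alpha_{i}-1}$ on $\Delta_{d}$ and zero elsewhere, since the displayed definition is not available in the text above; the function names and the numerical values of ${\boldsymbol{\beta}}$ and ${\boldsymbol{y}}$ are likewise hypothetical.

```python
import math


def dirichlet_unnormalized(x, alpha):
    """Assumed form: (1 - sum x)^(alpha_{d+1} - 1) * prod x_i^(alpha_i - 1) on the simplex."""
    rest = 1.0 - sum(x)
    if rest < 0.0 or any(xi < 0.0 for xi in x):
        return 0.0
    value = rest ** (alpha[-1] - 1.0)
    for xi, ai in zip(x, alpha[:-1]):
        value *= xi ** (ai - 1.0)
    return value


def multinomial_likelihood(y, x):
    """l(y | x) with n = sum(y) trials and cell probabilities (x_1, ..., x_d, 1 - sum x)."""
    coef = math.factorial(sum(y))
    for yi in y:
        coef //= math.factorial(yi)
    value = float(coef) * (1.0 - sum(x)) ** y[-1]
    for xi, yi in zip(x, y[:-1]):
        value *= xi ** yi
    return value


# Hypothetical prior parameter and data for d = 2.
beta = [2.0, 3.0, 2.0]
y = [1, 2, 3]
posterior_alpha = [b + yi for b, yi in zip(beta, y)]

# likelihood * prior divided by u(x, beta + y) should not depend on x.
ratios = [
    multinomial_likelihood(y, x) * dirichlet_unnormalized(x, beta)
    / dirichlet_unnormalized(x, posterior_alpha)
    for x in [(0.2, 0.3), (0.5, 0.1), (0.05, 0.6)]
]
```

The ratio is the multinomial coefficient $6!/(1!\,2!\,3!)=60$ at every test point, which is exactly the constant that the self-normalized estimator cancels.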

The normalizing constant of $u$ can be computed explicitly; it is known that

[TABLE]

To have a feasible setting for the application of Theorem 1 and Theorem 2 we need to show that $\|u\|_{D}$ is finite. This is not immediately clear, since in $\|u\|_{D}$ we take the supremum over ${\boldsymbol{z}}\in[0,1]^{d}$ . The following lemma is useful.

Lemma 1.

Let $v\subseteq[d]$ and recall that we write $k_{v}=(k_{i})_{i\in v}$ . Define $(k_{v};0;k_{d+1})=(r_{1},\dots,r_{d+1})$ with $r_{j}=k_{j}$ if $j\in v$ , $r_{j}=0$ if $j\not\in v$ and $r_{j}=k_{d+1}$ if $j=d+1$ . Assume that $\alpha_{i}\geq 2$ for $1\leq i\leq d$ and $\alpha_{d+1}\geq d$ . Then

[TABLE]

with $c_{v,k_{v},k_{d+1}}=(-1)^{k_{d+1}}\prod_{j=1}^{k_{d+1}}(\alpha_{d+1}-j)\prod_{i\in v}(\alpha_{i}-1)^{k_{i}}.$

Proof.

The statement follows by induction over the cardinality of $v$ . For $|v|=0$ , i.e., $v=\emptyset$ both sides of (13) are equal to $u({\boldsymbol{x}},{\boldsymbol{\alpha}})$ .

Assume $|v|=1$ , i.e., for some $s\in[d]$ we have $v=\{s\}$ . Then

[TABLE]

with ${\boldsymbol{e}}_{i}=(0,\dots,0,1,0,\dots,0)\in\mathbb{R}^{d+1}$ where the $i$ th entry is “1”. On the other hand

[TABLE]

By the fact that $c_{\{s\},0,1}=-(\alpha_{d+1}-1)$ and $c_{\{s\},1,0}=(\alpha_{s}-1)$ the claim is proven for $|v|=1$ .

Now assume that (13) is true for any $v\subseteq[d]$ with $|v|\leq\ell<d$ . Let $v\subseteq[d]$ with $|v|=\ell$ be an arbitrary subset and let $r\in[d]$ with $r\not\in v$ . Then we prove that the result also holds for $\widetilde{v}=v\cup\{r\}$ . We have

[TABLE]

Observe that

[TABLE]

where $\bar{k}_{d+1}:=k_{d+1}+1-k_{r}$ . Further, note that

[TABLE]

Hence, by using $\bar{k}_{d+1}:=k_{d+1}+1-k_{r}$ we obtain

[TABLE]

and the proof is finished. ∎

As an immediate consequence of the previous lemma and a chain rule argument, we have for arbitrary $v\subseteq[d]$ , ${\boldsymbol{z}}\in[0,1]^{d}$ and $T_{{\boldsymbol{z}}}$ , defined as in Theorem 2, that

[TABLE]

For $\alpha_{i}\geq 2$ with $1\leq i\leq d$ , $\alpha_{d+1}\geq d$ and arbitrary ${\boldsymbol{x}},{\boldsymbol{z}}\in[0,1]^{d}$ , we have $u(T_{{\boldsymbol{z}}}{\boldsymbol{x}},{\boldsymbol{\alpha}}-(k_{v};0;k_{d+1}))\leq 1$ , where $v\subseteq[d]$ , $k_{v}\in\{0,1\}^{|v|}$ and $k_{d+1}\in[d]$ . It follows that $\left|\frac{\partial^{|v|}}{\partial x_{v}}u(T_{{\boldsymbol{z}}}{\boldsymbol{x}},{\boldsymbol{\alpha}})\right|\leq C^{(1)}_{d,{\boldsymbol{\alpha}}}<\infty,$ with a constant $C^{(1)}_{d,{\boldsymbol{\alpha}}}$ depending on $d$ and ${\boldsymbol{\alpha}}$ . Hence, for another constant $C^{(2)}_{d,{\boldsymbol{\alpha}}}$ we have $\left\|u(T_{{\boldsymbol{z}}}\,\cdot,{\boldsymbol{\alpha}})\right\|_{H_{1}}\leq C^{(2)}_{d,{\boldsymbol{\alpha}}}<\infty$ uniformly in ${\boldsymbol{z}}\in[0,1]^{d}$ . Finally, by the fact that $u({\boldsymbol{x}})\leq 1$ we obtain the following corollary.

Corollary 1.

For $\alpha_{i}\geq 2$ with $1\leq i\leq d$ and $\alpha_{d+1}\geq d$ we have for $u({\boldsymbol{x}},{\boldsymbol{\alpha}})$ defined in (11) that there is a constant $C_{d,{\boldsymbol{\alpha}}}$ such that

[TABLE]

This verifies that the application of Theorem 1 and Theorem 2 is justified. For ${\boldsymbol{w}}^{u}$ given by (5) we obtain

[TABLE]

Consider $f_{{\boldsymbol{\gamma}}}\colon[0,1]^{d}\to[0,1]$ with ${\boldsymbol{\gamma}}\in(1,\infty)^{d}$ given by $f_{{\boldsymbol{\gamma}}}({\boldsymbol{x}})=2^{-d}\prod_{i=1}^{d}x_{i}^{\gamma_{i}}.$ Then, by (12) we have

[TABLE]

and $\|f_{\boldsymbol{\gamma}}\|_{H_{1}}=1$ . Since we know $S(f_{\boldsymbol{\gamma}},u(\cdot,{\boldsymbol{\alpha}}))$ we can run the quasi-Monte Carlo importance sampling algorithm and plot the error for different $d$ and fixed ${\boldsymbol{\alpha}}$ and ${\boldsymbol{\gamma}}$ .
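The value of the norm of $f_{{\boldsymbol{\gamma}}}$ can be verified directly from the definition of $\|\cdot\|_{H_{1}}$ . The short derivation below is a worked check and uses only the definition of $f_{{\boldsymbol{\gamma}}}$ and of the norm given above.

```latex
% For every v \subseteq [d], setting the coordinates outside v to 1 gives
f_{\boldsymbol{\gamma}}(\boldsymbol{x}_v;1) = 2^{-d}\prod_{i\in v} x_i^{\gamma_i},
\qquad
\frac{\partial^{|v|}}{\partial \boldsymbol{x}_v} f_{\boldsymbol{\gamma}}(\boldsymbol{x}_v;1)
  = 2^{-d}\prod_{i\in v}\gamma_i x_i^{\gamma_i-1} \;\ge\; 0 .
% Each factor integrates to one, \int_0^1 \gamma_i x_i^{\gamma_i-1}\,\mathrm{d}x_i = 1, hence
\int_{[0,1]^{|v|}}
  \left|\frac{\partial^{|v|}}{\partial \boldsymbol{x}_v} f_{\boldsymbol{\gamma}}(\boldsymbol{x}_v;1)\right|
  \mathrm{d}\boldsymbol{x}_v = 2^{-d}
\quad\text{for every } v\subseteq[d]
\quad (\text{for } v=\emptyset \text{ this is } |f_{\boldsymbol{\gamma}}(1)| = 2^{-d}).
% Summing over all 2^d subsets of [d]:
\|f_{\boldsymbol{\gamma}}\|_{H_1} = \sum_{v\subseteq[d]} 2^{-d} = 2^{d}\cdot 2^{-d} = 1 .
```

The factor $2^{-d}$ in the definition of $f_{{\boldsymbol{\gamma}}}$ is therefore exactly the normalization that makes the $H_{1}$ norm equal to one in every dimension.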

Numerical experiments. Let ${\boldsymbol{\gamma}}=(1,\dots,1)\in\mathbb{R}^{d}$ and ${\boldsymbol{\alpha}}=(2,\dots,2,d)\in\mathbb{R}^{d+1}$ . Here the true expectation of $f_{\boldsymbol{\gamma}}$ with respect to the distribution determined by $u(\cdot,{\boldsymbol{\alpha}})$ can be further simplified to $S(f_{{\boldsymbol{\gamma}}},u(\cdot,{\boldsymbol{\alpha}}))=\frac{(3d-1)!}{(4d-1)!}.$ Since for large $d$ this value is very small we plot the normalized error. For a given point set $P_{n}$ it is defined by

[TABLE]

and can be computed exactly. Let $H_{n}$ be the first $n$ points of the Halton sequence and note that $D_{\lambda_{d}}(H_{n})=O\left(\frac{(\log n)^{d}}{n}\right)$ . By $S_{n}$ we denote the first $n$ points of the Sobol sequence. For details on these standard quasi-Monte Carlo point sets we refer to [8]. We obtain the following plots for $d=2,4,6$ .
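The experiment can be reproduced in outline for $d=2$ with Halton points. The sketch below is not the authors' code. It assumes the standard unnormalized Dirichlet form $u({\boldsymbol{x}},{\boldsymbol{\alpha}})=(1-\sum_{i}x_{i})^{\alpha_{d+1}-1}\prod_{i}x_{i}^{\alpha_{i}-1}$ on $\Delta_{d}$ (zero elsewhere), uses Halton bases 2 and 3, and reports the relative error $|S-Q_{n}|/S$ as one plausible reading of the normalized error; all function names are hypothetical.

```python
import math


def radical_inverse(k, base):
    """Van der Corput radical inverse of the integer k in the given base."""
    inv = 0.0
    scale = 1.0 / base
    while k > 0:
        k, digit = divmod(k, base)
        inv += digit * scale
        scale /= base
    return inv


def halton(n, bases):
    """First n Halton points (indices 1..n), one coordinate per base."""
    return [[radical_inverse(k, b) for b in bases] for k in range(1, n + 1)]


def dirichlet_unnormalized(x, alpha):
    """Assumed form of u(x, alpha): supported on the simplex, zero outside."""
    rest = 1.0 - sum(x)
    if rest < 0.0:
        return 0.0
    value = rest ** (alpha[-1] - 1.0)
    for xi, ai in zip(x, alpha[:-1]):
        value *= xi ** (ai - 1.0)
    return value


def f_gamma(x, gamma):
    """f_gamma(x) = 2^{-d} prod_i x_i^{gamma_i}."""
    value = 2.0 ** (-len(x))
    for xi, gi in zip(x, gamma):
        value *= xi ** gi
    return value


d = 2
alpha = [2.0] * d + [float(d)]   # alpha = (2, ..., 2, d)
gamma = [1.0] * d                # gamma = (1, ..., 1)
exact = math.factorial(3 * d - 1) / math.factorial(4 * d - 1)

points = halton(4096, bases=[2, 3])
num = sum(f_gamma(x, gamma) * dirichlet_unnormalized(x, alpha) for x in points)
den = sum(dirichlet_unnormalized(x, alpha) for x in points)
q_n = num / den
rel_error = abs(q_n - exact) / exact
```

Repeating the last block for increasing $n$ and plotting `rel_error` against $n$ on a log-log scale would give a curve comparable in spirit to the plots referred to above; higher dimensions require additional prime bases for the Halton sequence.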

Acknowledgment

D. Rudolf is supported by the Felix-Bernstein-Institute for Mathematical Statistics in the Biosciences, the Campus laboratory AIMS and the DFG within the project 389483880.

Bibliography

[1] S. Agapiou, O. Papaspiliopoulos, D. Sanz-Alonso, and A. Stuart, Importance Sampling: Intrinsic Dimension and Computational Cost, Statist. Sci. 32.
[2] Ch. Aistleitner and J. Dick, Functions of bounded variation, signed measures, and a general Koksma–Hlawka inequality, Acta Arith. 167 (2015), 143–171.
[3] P. Arbenz, M. Cambou, M. Hofert, C. Lemieux, and Y. Taniguchi, Importance sampling and stratification for copula models, Contemporary Computational Mathematics - a celebration of the 80th birthday of Ian Sloan (J. Dick, F. Y. Kuo, H. Woźniakowski, eds.), Springer-Verlag, 2018.
[4] S. Chatterjee and P. Diaconis, The sample size required in importance sampling, Ann. Appl. Probab. 28 (2018), 1099–1135.
[5] J. Dick, R. N. Gantner, Q. T. Le Gia, and C. Schwab, Higher order Quasi-Monte Carlo integration for Bayesian Estimation, arXiv e-prints (2016).
[6] J. Dick, R. N. Gantner, Q. T. Le Gia, and C. Schwab, Multilevel higher-order quasi-Monte Carlo Bayesian estimation, Math. Models Methods Appl. Sci. 27 (2017), 953–995.
[7] J. Dick, A. Hinrichs, and F. Pillichshammer, Proof Techniques in Quasi-Monte Carlo Theory, J. Complexity 31 (2015), 327–371.
[8] J. Dick and F. Pillichshammer, Digital nets and sequences: Discrepancy theory and quasi-Monte Carlo integration, Cambridge University Press, Cambridge, 2010.
