On Negatively Dependent Sampling Schemes, Variance Reduction, and Probabilistic Upper Discrepancy Bounds

Michael Gnewuch; Marcin Wnuk; Nils Hebbinghaus

arXiv:1904.10796·math.NA·February 10, 2021

TL;DR

This paper explores negative dependence in sampling schemes to improve variance and discrepancy bounds, providing new explicit bounds and comparing different negative dependence notions.

Contribution

It introduces new pre-asymptotic bounds with explicit constants for discrepancy measures under negative dependence conditions.

Findings

1. Negative dependence can lead to improved variance bounds.

2. Explicit constants are derived for star discrepancy bounds.

3. Several negatively dependent sampling schemes are exemplified.

Abstract

We study some notions of negative dependence of a sampling scheme that can be used to derive variance bounds for the corresponding estimator or discrepancy bounds for the underlying random point set that are at least as good as the corresponding bounds for plain Monte Carlo sampling. We provide new pre-asymptotic bounds with explicit constants for the star discrepancy and the weighted star discrepancy of sampling schemes that satisfy suitable negative dependence properties. Furthermore, we compare the different notions of negative dependence and give several examples of negatively dependent sampling schemes.

Equations

$$\mu^{\rm MC}(f)=\frac{1}{N}\sum_{i=1}^{N}f(p_{i})$$

$$I(f)=\int_{[0,1)^{d}}f(u)\,du.$$

$$\mathcal{C}_{0}^{d}:=\{[0,a)\mid a\in[0,1)^{d}\},$$

$$\mathcal{C}_{1}^{d}:=\{[a,1)\mid a\in[0,1)^{d}\}.$$

$$\mathcal{D}_{0}^{d}:=\{Q\setminus R\mid Q,R\in\mathcal{C}_{0}^{d}\}.$$

$$\operatorname{{\bf P}}\Big(\bigcap_{j\in u}\{T_{j}=1\}\Big)\leq\gamma\prod_{j\in u}\operatorname{{\bf P}}(T_{j}=1)\quad\text{for all }u\subseteq[N],$$

$$\operatorname{{\bf P}}\Big(\bigcap_{j\in u}\{T_{j}=0\}\Big)\leq\gamma\prod_{j\in u}\operatorname{{\bf P}}(T_{j}=0)\quad\text{for all }u\subseteq[N].$$

$$\operatorname{{\bf P}}(|S|\geq t)\leq 2\gamma\exp\left(-\frac{2t^{2}}{N}\right)\quad\text{for all }t>0.$$

$$\operatorname{\bf{1}}_{Q}(p_{1}),\quad\operatorname{\bf{1}}_{R}(p_{2})$$

$$\operatorname{{\bf P}}(p_{1}\in Q,\,p_{2}\in R)\leq\operatorname{{\bf P}}(p_{1}\in Q)\operatorname{{\bf P}}(p_{2}\in R),$$

$$\operatorname{{\bf P}}(p_{1}\notin Q,\,p_{2}\notin R)\leq\operatorname{{\bf P}}(p_{1}\notin Q)\operatorname{{\bf P}}(p_{2}\notin R).$$

$$\mu_{\mathcal{P}}f=\frac{1}{N}\sum_{i=1}^{N}f(p_{i}).$$

$$\Delta^{d}(f,A):=\sum_{J\subset[d]}(-1)^{|J|}f(a_{J},b_{-J}).$$

$$\Delta^{d}(f,A)\geq 0$$

$$\begin{aligned}&\operatorname{{\bf P}}\big(p_{1}^{(i)}\geq\alpha,\,p_{2}^{(i)}\geq\beta\,\big|\,p_{1}^{(1:i-1)}\in A,\,p_{2}^{(1:i-1)}\in B\big)\\&\quad\leq\operatorname{{\bf P}}\big(p_{1}^{(i)}\geq\alpha\,\big|\,p_{1}^{(1:i-1)}\in A,\,p_{2}^{(1:i-1)}\in B\big)\operatorname{{\bf P}}\big(p_{2}^{(i)}\geq\beta\,\big|\,p_{1}^{(1:i-1)}\in A,\,p_{2}^{(1:i-1)}\in B\big),\end{aligned}$$

$$\operatorname{{\bf P}}(p_{1}^{(i)}\in[q,1),\,p_{2}^{(i)}\in[r,1))\leq\operatorname{{\bf P}}(p_{1}^{(i)}\in[q,1))\operatorname{{\bf P}}(p_{2}^{(i)}\in[r,1)),$$

$$\operatorname{Var}(\mu_{\mathcal{P}}f)\leq\operatorname{Var}(\mu^{\rm MC}f).$$

$$(\operatorname{\bf{1}}_{Q}(p_{j}))_{j=1}^{N}$$

$$\operatorname{{\bf P}}\Big(\bigcap_{j=1}^{t}\{p_{j}\in Q\}\Big)\leq\gamma\prod_{j=1}^{t}\operatorname{{\bf P}}(p_{j}\in Q),$$

$$\operatorname{{\bf P}}\Big(\bigcap_{j=1}^{t}\{p_{j}\notin Q\}\Big)\leq\gamma\prod_{j=1}^{t}\operatorname{{\bf P}}(p_{j}\notin Q).$$

$$D_{N}(P,x):=D_{N}(P,Q_{x}):=\left|\frac{1}{N}|P\cap Q_{x}|-\lambda^{d}(Q_{x})\right|$$

$$D_{N}^{*}(P):=\sup_{x\in[0,1]^{d}}D_{N}(P,x).$$

$$\left|\int_{[0,1)^{d}}f(x)\,d\lambda^{d}(x)-\frac{1}{N}\sum_{p\in P}f(p)\right|\leq D_{N}^{*}(P)\operatorname{Var_{HK}}(f),$$

$$D_{N}^{*}(\mathcal{P})\leq c\sqrt{\frac{d}{N}}$$

$$\operatorname{{\bf P}}\left(D^{*}_{N}(\mathcal{P})\leq 0.7729\sqrt{10.7042+\rho+\frac{\ln\big((1-\theta)^{-1}\big)}{d}}\sqrt{\frac{d}{N}}\right)\geq\theta.$$

$$D_{N}^{*}(\mathcal{P})\leq c\sqrt{\frac{d}{N}\max\left\{1,\log\left(\frac{N}{d}\right)\right\}}$$

$$\operatorname{{\bf P}}\left(D_{N}^{*}(\mathcal{P})\leq\sqrt{\frac{2}{N}}\sqrt{d\log(\eta)+\rho d+\log\left(\frac{2}{1-\theta}\right)}\right)\geq\theta,$$

$$\mathcal{N}(d,\delta)\leq 2^{d}\frac{d^{d}}{d!}(\delta^{-1}+1)^{d}.$$

$$\mathcal{N}(1,\delta)=\lceil\delta^{-1}\rceil$$

$$D_{N}^{*}(P)\leq\max_{x\in\Gamma}D_{N}(P,[0,x))+\delta.$$

Full text

On Negatively Dependent Sampling Schemes, Variance Reduction, and Probabilistic Upper Discrepancy Bounds

Marcin Wnuk, Mathematisches Seminar, Universität Osnabrück, Germany.

Michael Gnewuch, Institut für Mathematik, Universität Osnabrück, Germany.

Nils Hebbinghaus, Institut für Informatik, Christian-Albrechts-Universität zu Kiel, Germany.

Abstract

We study some notions of negative dependence of a sampling scheme that can be used to derive variance bounds for the corresponding estimator or discrepancy bounds for the underlying random point set that are at least as good as the corresponding bounds for plain Monte Carlo sampling.

We provide new pre-asymptotic bounds with explicit constants for the star discrepancy and the weighted star discrepancy of sampling schemes that satisfy suitable negative dependence properties. Furthermore, we compare the different notions of negative dependence and give several examples of negatively dependent sampling schemes, including mixed sequences.

1 Introduction

Plain Monte Carlo (MC) sampling is a method frequently used in stochastic simulation and multivariate numerical integration. Let $p_{1},\ldots,p_{N}$ be independent random points, each uniformly distributed in the $d$ -dimensional unit cube $[0,1)^{d}$ . For an arbitrary integrable random variable (or function) $f:[0,1)^{d}\to{\mathbb{R}}$ we consider the MC estimator (or quadrature)

$$\mu^{\rm MC}(f)=\frac{1}{N}\sum_{i=1}^{N}f(p_{i}) \tag{1}$$

for the expected value (or integral)

$$I(f)=\int_{[0,1)^{d}}f(u)\,du.$$

An advantage of the MC estimator is that, already under the mild assumption that $f$ is square integrable, it converges to $I(f)$ as $N\to\infty$ with convergence rate $1/2$, i.e., its root mean square error is of order $N^{-1/2}$. Even though this convergence rate is not very impressive, it has the invaluable advantage that it does not depend on the number of variables $d$.
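As a minimal sketch of the plain MC estimator described above (not taken from the paper), the following Python function averages $f$ over $N$ independent uniform points in $[0,1)^{d}$. The function name `mc_estimate` and the test integrand $f(u)=u_{1}+\ldots+u_{d}$, for which $I(f)=d/2$, are assumptions made purely for illustration.

```python
import random


def mc_estimate(f, n, d, rng):
    """Plain Monte Carlo estimate of the integral of f over [0,1)^d."""
    total = 0.0
    for _ in range(n):
        # rng.random() returns a float in [0, 1), so the point lies in [0,1)^d.
        point = [rng.random() for _ in range(d)]
        total += f(point)
    return total / n


# Illustrative use: f(u) = u_1 + u_2 + u_3 has integral 3 / 2.
rng = random.Random(0)
for n in (100, 10_000):
    estimate = mc_estimate(sum, n, 3, rng)
    print(n, estimate, abs(estimate - 1.5))
```

The printed errors typically shrink by roughly a factor of ten when $N$ grows by a factor of one hundred, in line with the rate $1/2$.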

However, many dependent sampling schemes (i.e., random sample points $p_{i}$, $i=1,\ldots,N$, that are still uniformly distributed in $[0,1)^{d}$, but not necessarily independent any more) are known that are superior to plain MC sampling with respect to certain objectives. Examples are suitably randomized quasi-Monte Carlo (RQMC) point sets. They ensure, for instance, higher convergence rates for numerical integration of sufficiently smooth functions, they lead to much smaller asymptotic discrepancy measures, and their sample points do not tend to cluster and have more evenly distributed lower dimensional projections (see, e.g., [5, 6, 19, 22]). It would be desirable to be able to construct dependent sampling schemes that have some of these or other favorable properties, and that are, with respect to other objectives, at least as good as MC sampling schemes.

Recently, some research has been done in this direction. In [20] Christiane Lemieux showed that a negative dependence property of RQMC points ensures that the variance of the corresponding RQMC estimator for functions $f$ that are monotone with respect to each variable is never larger than the variance of the corresponding MC estimator $\mu^{\rm MC}f$. She also proved that a different negative dependence property ensures that the variance of the RQMC estimator for an arbitrary bounded quasi-monotone $f$ is never larger than the variance of $\mu^{\rm MC}f$. Those negative dependence properties rely solely on the marginals and the bivariate copulas of the RQMC points (i.e., on the distribution of single points and on the common distribution of pairs of points). Related results can be found in [33].

In a different line of research the second and the third author of this book chapter showed in [12, 14] that a specific negative dependence property of RQMC points guarantees that they satisfy the same pre-asymptotic probabilistic discrepancy bounds (with explicitly revealed dependence on the number of points $N$ as well as on the dimension $d$ ) as MC points. Here the negative dependence property relies on the common distribution of all sample points. Related results can be found in [7].

For more extensive motivations of both lines of research we refer to the elaborate introductions of [20] and [7, 12], respectively. The aim of this book chapter is to survey and compare the approaches mentioned above and to provide several new results.

This chapter is organized as follows: In Section 2 we introduce some notions of negatively dependent sampling schemes and discuss how one can benefit from them. In Section 3 we provide new probabilistic upper discrepancy bounds for sampling schemes. The discrepancy measures we consider are the star discrepancy and the weighted star discrepancy. These bounds are “plug-in results” in the following sense: One just has to check whether a sampling scheme satisfies the sufficient negative dependence condition and, if this is the case, immediately obtains a probabilistic discrepancy bound with explicitly given constants. In Section 4 we give several examples of sampling schemes that satisfy one or the other notion of negative dependence, including a generalized notion of stratified sampling schemes and mixed randomized sequences. Finally, in Section 5 we elaborate on relations between different notions of negative dependence.

We finish the introduction by stating some notation. Let $d,N\in\mathbb{N}.$ If not stated otherwise we are always considering a randomized point set $(p_{j})_{j=1}^{N}:=\mathcal{P}\subset[0,1)^{d}$ consisting of $N$ points. For $a,b\in\mathbb{R}^{d},a=(a_{1},\ldots,a_{d}),b=(b_{1},\ldots,b_{d})$ we write $a\leq b$ if $a_{i}\leq b_{i},i=1,\ldots,d.$ All other inequalities are also to be understood componentwise. Moreover, $[a,b):=[a_{1},b_{1})\times\ldots\times[a_{d},b_{d})$. Via $\mathcal{C}^{d}_{0}$ we denote the set of boxes (“corners”) anchored at $0,$

$$\mathcal{C}_{0}^{d}:=\{[0,a)\mid a\in[0,1)^{d}\},$$

and by $\mathcal{C}_{1}^{d}$ the set of boxes anchored at $1,$

$$\mathcal{C}_{1}^{d}:=\{[a,1)\mid a\in[0,1)^{d}\}.$$

We write $\mathcal{D}_{0}^{d}$ for the set of differences of boxes anchored at $0,$

$$\mathcal{D}_{0}^{d}:=\{Q\setminus R\mid Q,R\in\mathcal{C}_{0}^{d}\}.$$

For $m\in{\mathbb{N}}$ we denote the set $\{1,2,\ldots,m\}$ by $[m]$. The symbol $\lambda^{d}$ stands for the $d$-dimensional Lebesgue measure on $\mathbb{R}^{d};$ in case $d=1$ we just write $\lambda.$ If not specified, all random variables are defined on a probability space $(\Omega,\Sigma,\operatorname{{\bf P}})$.

2 Review of Notions of Negative Dependence of Sampling Schemes

2.1 $\gamma$ -Negative Dependence of Binary Random Variables and Sampling Schemes

The concept of negative dependence was introduced by Lehmann [18] for pairs of random variables. In the literature one finds several contributions on rather demanding notions of negative dependence as, e.g., negative association introduced in [16]; a survey can be found in [30]. Sufficient for our purpose is the following notion for Bernoulli or binary random variables, i.e., random variables that only take values in $\{0,1\}$ .

Definition 2.1.

Let $\gamma\geq 1$ . We call binary random variables $T_{1},T_{2},\ldots,T_{N}$ upper $\gamma$ -negatively dependent if

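$$\operatorname{{\bf P}}\Big(\bigcap_{j\in u}\{T_{j}=1\}\Big)\leq\gamma\prod_{j\in u}\operatorname{{\bf P}}(T_{j}=1)\quad\text{for all }u\subseteq[N], \tag{2}$$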

and lower $\gamma$ -negatively dependent if

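$$\operatorname{{\bf P}}\Big(\bigcap_{j\in u}\{T_{j}=0\}\Big)\leq\gamma\prod_{j\in u}\operatorname{{\bf P}}(T_{j}=0)\quad\text{for all }u\subseteq[N]. \tag{3}$$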

We call $T_{1},T_{2},\ldots,T_{N}$ $\gamma$ -negatively dependent if both conditions (2) and (3) are satisfied. If $\gamma=1$ , we usually suppress the explicit reference to $\gamma$ .

$1$ -Negative dependence is usually called negative orthant dependence, cf. [4].

Notice that, in particular, independent binary random variables are negatively dependent. Furthermore, it is easily seen that for $N=2$ and $\gamma=1$ the notions of upper and lower $\gamma$ -negative dependence are equivalent, cf. [18].
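For small $N$ the two conditions of Definition 2.1 can be checked by brute force. The sketch below is an illustration only: it assumes the joint distribution of $(T_{1},\ldots,T_{N})$ is given as a dictionary mapping outcome tuples in $\{0,1\}^{N}$ to probabilities, and the function names are hypothetical. It tests the upper condition (all $T_{j}=1$ on $u$) and the lower condition (all $T_{j}=0$ on $u$) for every nonempty $u\subseteq[N]$.

```python
from itertools import combinations
from math import prod


def marginal(pmf, j, value):
    """P(T_j = value) for a joint pmf given as {outcome tuple: probability}."""
    return sum(p for outcome, p in pmf.items() if outcome[j] == value)


def joint(pmf, subset, value):
    """P(T_j = value for all j in subset)."""
    return sum(p for outcome, p in pmf.items()
               if all(outcome[j] == value for j in subset))


def is_gamma_negatively_dependent(pmf, gamma=1.0, tol=1e-12):
    """Check the upper (value 1) and lower (value 0) conditions for every nonempty u."""
    n = len(next(iter(pmf)))
    for size in range(1, n + 1):
        for u in combinations(range(n), size):
            for value in (1, 0):
                bound = gamma * prod(marginal(pmf, j, value) for j in u)
                if joint(pmf, u, value) > bound + tol:
                    return False
    return True
```

For example, two independent fair bits pass the check with $\gamma=1$, two identical fair bits ($T_{1}=T_{2}$) fail it for $\gamma=1$ but pass for $\gamma=2$, and the antithetic pair $T_{2}=1-T_{1}$ passes with $\gamma=1$.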

We are interested in binary random variables $T_{i}$ , $i=1,\ldots,N$ , of the form $T_{i}=\operatorname{\bf{1}}_{A}(p_{i})$ , where $A$ is a Lebesgue-measurable subset of $[0,1)^{d}$ (whose characteristic function is denoted by $\operatorname{\bf{1}}_{A}$ ), and $p_{1},\ldots,p_{N}$ are randomly chosen points in $[0,1)^{d}$ .

We will use the following bound of Hoeffding-type; for a proof see, e.g., [14].

Theorem 2.2.

Let $\gamma\geq 1$ , and let $T_{1},\ldots,T_{N}$ be $\gamma$ -negatively dependent binary random variables. Put $S:=\sum^{N}_{i=1}(T_{i}-{\mathbb{E}}[T_{i}])$ . We have

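$$\operatorname{{\bf P}}(|S|\geq t)\leq 2\gamma\exp\left(-\frac{2t^{2}}{N}\right)\quad\text{for all }t>0.$$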
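To get a feeling for the Hoeffding-type bound $2\gamma\exp(-2t^{2}/N)$ of Theorem 2.2, one can evaluate it numerically and invert it for a prescribed failure probability. The following sketch is only an illustration; the function names are hypothetical, and capping the bound at $1$ is a convenience added here.

```python
import math


def hoeffding_tail_bound(t, n, gamma=1.0):
    """Value 2 * gamma * exp(-2 t^2 / N) of the tail bound, capped at 1."""
    return min(1.0, 2.0 * gamma * math.exp(-2.0 * t * t / n))


def deviation_for_failure_prob(n, gamma, failure_prob):
    """The t at which 2 * gamma * exp(-2 t^2 / N) equals failure_prob."""
    return math.sqrt(n * math.log(2.0 * gamma / failure_prob) / 2.0)
```

Since $\gamma$ enters the inverted bound only through $\log(2\gamma/\text{failure\_prob})$, even a large $\gamma$ increases the admissible deviation $t$ only mildly.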

Definition 2.3.

A randomized point set $\mathcal{P}=(p_{j})_{j=1}^{N}$ is called a sampling scheme if every single $p\in\mathcal{P}$ is distributed uniformly in $[0,1)^{d}$ and the vector $(p_{1},\ldots,p_{N})$ is exchangeable, meaning that for any permutation $\pi$ of $[N]$ it holds that the law of $(p_{1},\ldots,p_{N})$ is the same as the law of $(p_{\pi(1)},\ldots,p_{\pi(N)}).$

The assumption of exchangeability is only of a technical nature: if we consider $\mathcal{P}$ as a randomized point set, it can always be achieved by symmetrization. Indeed, let $\tilde{\mathcal{P}}$ be a point set such that every $\tilde{p}\in\tilde{\mathcal{P}}$ is uniformly distributed in $[0,1)^{d}$ and let $\pi$ be a uniformly chosen random permutation of $[N].$ Then $(\tilde{p}_{\pi(1)},\ldots,\tilde{p}_{\pi(N)})$ is already a sampling scheme.
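In code, symmetrization amounts to re-indexing the points by a uniformly random permutation. The sketch below is an illustration with a hypothetical function name; it relies on `random.Random.shuffle`, which produces a uniformly random ordering.

```python
import random


def symmetrize(points, rng):
    """Return the points re-indexed by a uniformly random permutation."""
    permuted = list(points)  # copy, so the caller's list is left untouched
    rng.shuffle(permuted)
    return permuted
```

The underlying point set is unchanged, so any equal-weight quadrature based on it returns the same value before and after symmetrization.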

2.2 Pairwise Negative Dependence and Variance Reduction

Definition 2.4.

We say that a sampling scheme $\mathcal{P}$ is pairwise negatively dependent if for every $Q,R\in\mathcal{C}_{1}^{d}$ it holds that the random variables

$$\operatorname{\bf{1}}_{Q}(p_{1}),\quad\operatorname{\bf{1}}_{R}(p_{2})$$

are negatively dependent. In other words, a sampling scheme $\mathcal{P}$ is pairwise negatively dependent if for every $Q,R\in\mathcal{C}_{1}^{d}$ we have

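$$\operatorname{{\bf P}}(p_{1}\in Q,\,p_{2}\in R)\leq\operatorname{{\bf P}}(p_{1}\in Q)\operatorname{{\bf P}}(p_{2}\in R), \tag{5}$$

$$\operatorname{{\bf P}}(p_{1}\notin Q,\,p_{2}\notin R)\leq\operatorname{{\bf P}}(p_{1}\notin Q)\operatorname{{\bf P}}(p_{2}\notin R). \tag{6}$$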

Note that (5) implies (6) and vice versa, so one of the conditions is in fact redundant. In [20] such sampling schemes are called negatively upper orthant dependent (NUOD).

Our interest lies in numerical integration of functions from some class $\mathcal{F}\subset L^{2}([0,1)^{d})$ using RQMC. A QMC quadrature is just a quadrature consisting of $N$ nodes, such that the evaluation in every node is given the same weight $\tfrac{1}{N}.$ By randomizing the set of nodes we obtain an RQMC quadrature. Let $\mu_{\mathcal{P}}f$ be an RQMC estimator of $I(f):=\int_{[0,1)^{d}}f(u)\,du$ based on the sampling scheme $\mathcal{P}=(p_{i})_{i=1}^{N},$ i.e.

$$\mu_{\mathcal{P}}f=\frac{1}{N}\sum_{i=1}^{N}f(p_{i}).$$

Moreover, let $\mu^{\rm{MC}}f$ be an estimator of $I(f)$ based on a Monte Carlo sample consisting of $N$ points (i.e. the integration nodes are chosen independently and uniformly from $[0,1)^{d}$ , see (1)).

It turns out that randomized QMC quadratures based on pairwise negatively dependent sampling schemes may lead to variance reduction in comparison to simple MC quadratures. Here we briefly describe one such case, namely when the integrands are bounded quasimonotone functions. The following exposition is based on [20].

To define what a quasimonotone function is we need first to introduce the notion of quasivolume. For $a,b\in[0,1)^{d},$ $J\subset[d]$ and a function $f:[0,1)^{d}\rightarrow\mathbb{R}$ we write $f(a_{J},b_{-J})$ to represent the evaluation of $f$ at the point $(x_{1},\ldots,x_{d}),$ where $x_{j}=a_{j}$ for $j\in J$ and $x_{j}=b_{j}$ otherwise. The quasivolume of $f$ over an interval $A=[a,b)\subset[0,1)^{d}$ is given by

$$\Delta^{d}(f,A):=\sum_{J\subset[d]}(-1)^{|J|}f(a_{J},b_{-J}).$$

We say that the function $f$ is quasimonotone if

$$\Delta^{d}(f,A)\geq 0$$

for every interval $A.$ Note that if we define a content $\nu_{f}([0,a)):=f(a),a\in[0,1)^{d}$ then quasimonotonicity of $f$ means exactly that for any axis-parallel rectangle $R\subset[0,1)^{d}$ it holds $\nu_{f}(R)\geq 0.$
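The quasivolume is the alternating sum $\sum_{J\subset[d]}(-1)^{|J|}f(a_{J},b_{-J})$ over all $2^{d}$ corners of $[a,b)$, which is easy to evaluate directly for small $d$. The following sketch (the function name is an assumption) does exactly that. As a sanity check, for $d=2$ and $f(x)=x_{1}x_{2}$ the quasivolume of $[a,b)$ is the area $(b_{1}-a_{1})(b_{2}-a_{2})$.

```python
from itertools import combinations


def quasivolume(f, a, b):
    """Alternating corner sum of f over the box [a, b): sum of (-1)^{|J|} f(a_J, b_{-J})."""
    d = len(a)
    total = 0.0
    for size in range(d + 1):
        for J in combinations(range(d), size):
            # Use a_j on the coordinates in J and b_j on all other coordinates.
            x = [a[j] if j in J else b[j] for j in range(d)]
            total += (-1) ** size * f(x)
    return total


# f(x) = x_1 * x_2 on [0.2, 0.5) x [0.3, 0.7): area 0.3 * 0.4 = 0.12
print(quasivolume(lambda x: x[0] * x[1], (0.2, 0.3), (0.5, 0.7)))
```

A nonnegative value on one box does not establish quasimonotonicity, which requires nonnegativity on every interval; the routine is only useful for spot checks or for finding counterexamples.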

Apart from pairwise negative dependence there are a few similar notions which are also of interest. Let $p_{j}=(p_{j}^{(1)},\ldots,p_{j}^{(d)}),j=1,\ldots,N.$ If for every $i=1,\ldots,d,$ and every measurable $A,B\subset[0,1)^{i-1},\alpha,\beta\in[0,1)$

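$$\begin{aligned}&\operatorname{{\bf P}}\big(p_{1}^{(i)}\geq\alpha,\,p_{2}^{(i)}\geq\beta\,\big|\,p_{1}^{(1:i-1)}\in A,\,p_{2}^{(1:i-1)}\in B\big)\\&\quad\leq\operatorname{{\bf P}}\big(p_{1}^{(i)}\geq\alpha\,\big|\,p_{1}^{(1:i-1)}\in A,\,p_{2}^{(1:i-1)}\in B\big)\operatorname{{\bf P}}\big(p_{2}^{(i)}\geq\beta\,\big|\,p_{1}^{(1:i-1)}\in A,\,p_{2}^{(1:i-1)}\in B\big),\end{aligned}$$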

we say that the sampling scheme $(p_{j})_{j=1}^{N}$ is conditionally negatively quadrant dependent (conditionally NQD). Here $p^{(1:i-1)}$ denotes the orthogonal projection of $p$ onto its first $i-1$ coordinates. Note that the conditional NQD property holds in particular if $(p_{1}^{(i)},p_{2}^{(i)})_{i=1}^{d}$ are independent and for every $i=1,\ldots,d,$ and every $q,r\in[0,1)$ we have

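$$\operatorname{{\bf P}}(p_{1}^{(i)}\in[q,1),\,p_{2}^{(i)}\in[r,1))\leq\operatorname{{\bf P}}(p_{1}^{(i)}\in[q,1))\operatorname{{\bf P}}(p_{2}^{(i)}\in[r,1)),$$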

in which case we speak of a coordinatewise independent NQD sampling scheme. Christiane Lemieux showed in [20, Corollary 2] that conditionally NQD sampling schemes provide RQMC estimators of integrals with variance no larger than the variance of the MC estimator if the integrand is monotone in each coordinate.

The following is basically a combination of Proposition $3,$ Remark $8$ and Corollary $2$ from [20].

Theorem 2.5.

Let $f:[0,1)^{d}\rightarrow\mathbb{R}$ and let $\mathcal{P}$ be a sampling scheme. If either

1. the function $f$ is bounded, $f$ or $-f$ is quasimonotone, and $\mathcal{P}$ is pairwise negatively dependent, or

2. the function $f$ is monotone in each coordinate and $\mathcal{P}$ is conditionally negatively quadrant dependent,

then it holds

$$\operatorname{Var}(\mu_{\mathcal{P}}f)\leq\operatorname{Var}(\mu^{\rm MC}f).$$

In Section 5 we discuss relations between the introduced notions of negative dependence.

Let us note that the aforementioned paper actually provides more general results. The interested reader will find details in Sections 3 and 4 of [20].

For examples of pairwise negatively dependent and conditionally NQD sampling schemes see Sections 4.2 and 4.3.
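The variance comparison of Theorem 2.5 can be observed empirically in the simplest setting. The sketch below is an illustration under stated assumptions: it takes $d=1$, the bounded nondecreasing (hence quasimonotone) integrand $f(x)=x^{2}$, and simple stratified sampling, which is pairwise negatively dependent by Lemma 4.1. The function names and the parameters $N=16$ and $2000$ repetitions are hypothetical choices.

```python
import random
import statistics


def mc_points(n, rng):
    """n independent uniform points in [0, 1)."""
    return [rng.random() for _ in range(n)]


def stratified_points(n, rng):
    """One uniform point in each interval [j/n, (j+1)/n).

    The random re-indexing is omitted because the equal-weight estimator
    does not depend on the order of the points.
    """
    return [(j + rng.random()) / n for j in range(n)]


def estimator_variance(sampler, f, n, repetitions, rng):
    """Empirical variance of the equal-weight estimator over many repetitions."""
    estimates = []
    for _ in range(repetitions):
        pts = sampler(n, rng)
        estimates.append(sum(f(p) for p in pts) / n)
    return statistics.variance(estimates)


rng = random.Random(0)
f = lambda x: x * x
print(estimator_variance(mc_points, f, 16, 2000, rng))
print(estimator_variance(stratified_points, f, 16, 2000, rng))
```

In this example the stratified variance is smaller by several orders of magnitude. The theorem itself only guarantees that it is no larger than the MC variance.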

2.3 Negatively Dependent Sampling Schemes and Discrepancy

Definition 2.6.

We say that a sampling scheme $(p_{j})_{j=1}^{N}=\mathcal{P}$ is $\mathcal{S}$-$\gamma$-negatively dependent if for every $Q\in\mathcal{S}$ the random variables

$$(\operatorname{\bf{1}}_{Q}(p_{j}))_{j=1}^{N}$$

are $\gamma$-negatively dependent. In other words, for every $t\leq N$ we require

$$\operatorname{{\bf P}}\Big(\bigcap_{j=1}^{t}\{p_{j}\in Q\}\Big)\leq\gamma\prod_{j=1}^{t}\operatorname{{\bf P}}(p_{j}\in Q), \tag{7}$$

$$\operatorname{{\bf P}}\Big(\bigcap_{j=1}^{t}\{p_{j}\notin Q\}\Big)\leq\gamma\prod_{j=1}^{t}\operatorname{{\bf P}}(p_{j}\notin Q). \tag{8}$$

Note that, in contrast to the case of pairwise negative dependence, for $N>2$ one indeed needs to check both inequalities, as they do not, in general, imply one another. If $\gamma=1$ and $\mathcal{S}=\mathcal{C}_{0}^{d}$ we usually speak just of negatively dependent sampling schemes. Moreover, if (7) is satisfied we speak of upper $\gamma$-negatively dependent sampling schemes and if (8) is satisfied we speak of lower $\gamma$-negatively dependent sampling schemes.

To motivate the interest in negatively dependent sampling schemes we introduce the notion of discrepancy. Discrepancy is meant to quantify how far a finite point set $P\subset[0,1)^{d}$ consisting of $N$ points is from being equidistributed in $[0,1)^{d}.$ It plays an important role in fields like numerical integration, computer graphics, empirical process theory and many more. Let $x\in[0,1]^{d}$ and $Q_{x}:=[0,x)\in\mathcal{C}_{0}^{d}.$ We define the discrepancy function $D_{N}(P,\cdot)$ for the point set $P$ at the point $x$ via

$$D_{N}(P,x):=D_{N}(P,Q_{x}):=\left|\frac{1}{N}|P\cap Q_{x}|-\lambda^{d}(Q_{x})\right|$$

and the star discrepancy $D_{N}^{*}(P)$ by

$$D_{N}^{*}(P):=\sup_{x\in[0,1]^{d}}D_{N}(P,x).$$

Making a connection to numerical integration we note one of the versions of the Koksma-Hlawka inequality, which states that for every point set $P$ consisting of $N$ points it holds

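$$\left|\int_{[0,1)^{d}}f(x)\,d\lambda^{d}(x)-\frac{1}{N}\sum_{p\in P}f(p)\right|\leq D_{N}^{*}(P)\operatorname{Var_{HK}}(f),$$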

where $\operatorname{Var_{HK}}(f)$ is the Hardy-Krause variation of $f$ . The inequality is actually sharp, cf. [24].
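In dimension $d=1$ the star discrepancy can be computed exactly, because the supremum of $|\tfrac{1}{N}|P\cap[0,x)|-x|$ is approached at (or just to the right of) the sample points. The sketch below is for illustration only and uses a hypothetical function name. It implements the well-known closed form over the sorted points $x_{(1)}\leq\ldots\leq x_{(N)}$, namely $\max_{i}\max\{|x_{(i)}-\tfrac{i-1}{N}|,|\tfrac{i}{N}-x_{(i)}|\}$.

```python
def star_discrepancy_1d(points):
    """Exact star discrepancy of a finite point set in [0, 1) for d = 1."""
    xs = sorted(points)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs, start=1):
        # For distinct points, the box [0, x) holds i - 1 points, while boxes
        # ending just to the right of x hold i points.
        worst = max(worst, abs(x - (i - 1) / n), abs(i / n - x))
    return worst


# The midpoints (2i - 1) / (2N) attain the smallest possible value 1 / (2N).
print(star_discrepancy_1d([0.125, 0.375, 0.625, 0.875]))
```

For $d\geq 2$ no comparably simple formula is available, which is one reason why probabilistic bounds are of practical interest.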

It has been shown in [12] that $\mathcal{D}_{0}^{d}$ - $\gamma$ - negatively dependent sampling schemes have with large probability star discrepancy of the order $\sqrt{\tfrac{d}{N}}.$ More precisely the following theorem holds.

Theorem 2.7.

Let $d,N\in{\mathbb{N}}$ and $\rho\in[0,\infty)$. Let $\mathcal{P}=(p_{j})_{j=1}^{N}$ be a $\mathcal{D}_{0}^{d}$-$e^{\rho d}$-negatively dependent sampling scheme.

Then for every $c>0$

$$D_{N}^{*}(\mathcal{P})\leq c\sqrt{\frac{d}{N}}$$

holds with probability at least $1-e^{-(1.6741\cdot c^{2}-10.7042-\rho)\cdot d}.$ Moreover, for every $\theta\in(0,1)$

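$$\operatorname{{\bf P}}\left(D^{*}_{N}(\mathcal{P})\leq 0.7729\sqrt{10.7042+\rho+\frac{\ln\big((1-\theta)^{-1}\big)}{d}}\sqrt{\frac{d}{N}}\right)\geq\theta. \tag{10}$$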

Notice that these bounds depend only mildly on $\rho$ or $\gamma=e^{\rho d}.$ In particular, $\mathcal{D}_{0}^{d}$ - $1$ -negatively dependent sampling schemes satisfy the same preasymptotic discrepancy bound as Monte Carlo point sets do. For more details see [12].
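Because all constants in Theorem 2.7 are explicit, the probabilistic bound $0.7729\sqrt{10.7042+\rho+\ln((1-\theta)^{-1})/d}\,\sqrt{d/N}$ can be evaluated directly. The sketch below is an illustration with a hypothetical function name and hypothetical example parameters.

```python
import math


def star_discrepancy_bound(n, d, theta, rho=0.0):
    """Discrepancy level that is not exceeded with probability at least theta."""
    inner = 10.7042 + rho + math.log(1.0 / (1.0 - theta)) / d
    return 0.7729 * math.sqrt(inner) * math.sqrt(d / n)


# Example: N = 10_000 points in dimension d = 10, confidence level 0.9
print(star_discrepancy_bound(10_000, 10, 0.9))
```

For these example values the bound is roughly $0.081$. Quadrupling $N$ halves it, and $\rho$ enters only additively under the square root.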

In Remark 4.12 we present a bound similar to (10) under slightly different assumptions that can be applied to so-called mixed randomized sequences.

3 New Probabilistic Discrepancy Bounds

3.1 Bound on the Star Discrepancy for Negatively Dependent Sampling Schemes

Proving that a given sampling scheme is $\mathcal{D}_{0}^{d}$ - $\gamma$ -negatively dependent may turn out to be a difficult task. One of the problems lies in the fact that elements of $\mathcal{D}_{0}^{d}$ may in general not be represented as Cartesian products of one-dimensional intervals, cf. also Remark 4.12. With this in mind we would like to weaken the assumptions on the sampling scheme $\mathcal{P}.$ In the following result we show that by requiring the sampling scheme $\mathcal{P}$ only to be $\mathcal{C}^{d}_{0}$ - $\gamma$ -negatively dependent one already gets with high probability a discrepancy of the order $\sqrt{\tfrac{d}{N}\log(e+\tfrac{N}{d})}.$

Theorem 3.1.

Let $d,N\in{\mathbb{N}}$ and $\rho\in[0,\infty)$ . Let $\mathcal{P}=(p_{j})_{j=1}^{N}$ be a $\mathcal{C}^{d}_{0}$ - $e^{\rho d}$ -negatively dependent sampling scheme in $[0,1)^{d}.$ Then for every $c>0$

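$$D_{N}^{*}(\mathcal{P})\leq c\sqrt{\frac{d}{N}\max\left\{1,\log\left(\frac{N}{d}\right)\right\}} \tag{11}$$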

holds with probability at least $1-2e^{(-\frac{1}{2}(c^{2}-1)\xi+\rho+\log(2e(\frac{2}{c}+1)))d},$ where $\xi=\max\left\{1,\log\left(\frac{N}{d}\right)\right\}.$ Moreover, for every $\theta\in(0,1)$

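$$\operatorname{{\bf P}}\left(D_{N}^{*}(\mathcal{P})\leq\sqrt{\frac{2}{N}}\sqrt{d\log(\eta)+\rho d+\log\left(\frac{2}{1-\theta}\right)}\right)\geq\theta,$$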

where $\eta:=\eta(N,d)=6e\left(\max(1,\frac{N}{2d\log(6e)})\right)^{\frac{1}{2}}.$

The proof of Theorem 3.1 requires some preparation. To “discretize” the star discrepancy, we define $\delta$ –covers as in [8]: for any $\delta\in(0,1]$ a finite set $\Gamma$ of points in $[0,1)^{d}$ is called a $\delta$ –cover of $[0,1)^{d}$ , if for every $y\in[0,1)^{d}$ there exist $x,z\in\Gamma\cup\{0\}$ such that $x\leq y\leq z$ and $\lambda^{d}([0,z])-\lambda^{d}([0,x])\leq\delta$ . The number $\mathcal{N}(d,\delta)$ denotes the smallest cardinality of a $\delta$ –cover of $[0,1)^{d}$ .

The following theorem was stated and proved in [10].

Theorem 3.2.

For any $d\geq 1$ and $\delta\in(0,1]$ we have

$$\mathcal{N}(d,\delta)\leq 2^{d}\frac{d^{d}}{d!}(\delta^{-1}+1)^{d}.$$

Notice that due to Stirling’s formula we have $d^{d}/d!\leq e^{d}/\sqrt{2\pi d}$ and so the cardinality of the $\delta-$ cover may be bounded from above by $(2e)^{d}(1+\delta^{-1})^{d}.$ Furthermore, it is easy to verify that in the case $d=1$ the identity

$$\mathcal{N}(1,\delta)=\lceil\delta^{-1}\rceil$$

is established with the help of the $\delta$ -cover $\Gamma:=\{1/\lceil\delta^{-1}\rceil,2/\lceil\delta^{-1}\rceil,\ldots,1\}$ .
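The cardinality bounds for $\delta$-covers are easy to evaluate numerically. The sketch below (with hypothetical function names) computes the bound $2^{d}\tfrac{d^{d}}{d!}(\delta^{-1}+1)^{d}$ of Theorem 3.2, its simpler relaxation $(2e)^{d}(1+\delta^{-1})^{d}$, and the explicit one-dimensional cover $\{1/\lceil\delta^{-1}\rceil,\ldots,1\}$.

```python
import math


def cover_size_bound(d, delta):
    """Upper bound 2^d * d^d / d! * (1/delta + 1)^d on the minimal cover size."""
    return 2 ** d * d ** d / math.factorial(d) * (1.0 / delta + 1.0) ** d


def relaxed_cover_size_bound(d, delta):
    """Weaker but simpler bound (2e)^d * (1 + 1/delta)^d."""
    return (2.0 * math.e) ** d * (1.0 + 1.0 / delta) ** d


def one_dim_cover(delta):
    """The cover {1/m, 2/m, ..., 1} with m = ceil(1/delta) for d = 1."""
    m = math.ceil(1.0 / delta)
    return [k / m for k in range(1, m + 1)]
```

For instance, `one_dim_cover(0.25)` consists of the four points $0.25, 0.5, 0.75, 1$, whereas `cover_size_bound(1, 0.25)` equals $10$, so the general bound is not tight for $d=1$.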

With the help of $\delta$ -covers the star discrepancy can be approximated in the following sense.

Lemma 3.3.

Let $P\subset[0,1)^{d}$ be an $N$ -point set, $\delta>0$ , and $\Gamma$ be a $\delta$ -cover of $[0,1)^{d}$ . Then

$$D_{N}^{*}(P)\leq\max_{x\in\Gamma}D_{N}(P,[0,x))+\delta.$$

The proof of Lemma 3.3 is straightforward, cf., e.g., [8, Lemma 3.1].
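The discretization step of Lemma 3.3 can be observed numerically for $d=1$, where the star discrepancy is known exactly. The sketch below is an illustration with hypothetical function names. It compares the exact value with the maximum of the local discrepancy $|\tfrac{1}{N}|P\cap[0,x)|-x|$ over the one-dimensional $\delta$-cover $\{1/m,\ldots,1\}$, $m=\lceil\delta^{-1}\rceil$. The lemma predicts that the two differ by at most $\delta$.

```python
import math
import random


def local_discrepancy(points, x):
    """|#{p in P : p < x} / N - x| for the box [0, x) in d = 1."""
    count = sum(1 for p in points if p < x)
    return abs(count / len(points) - x)


def star_discrepancy_1d(points):
    """Exact star discrepancy in d = 1 (closed form over the sorted points)."""
    xs = sorted(points)
    n = len(xs)
    return max(max(abs(x - (i - 1) / n), abs(i / n - x))
               for i, x in enumerate(xs, start=1))


def cover_approximation(points, delta):
    """Maximum of the local discrepancy over the cover {1/m, ..., 1}, m = ceil(1/delta)."""
    m = math.ceil(1.0 / delta)
    return max(local_discrepancy(points, k / m) for k in range(1, m + 1))


rng = random.Random(3)
pts = [rng.random() for _ in range(50)]
for delta in (0.5, 0.1, 0.01):
    print(delta, cover_approximation(pts, delta), star_discrepancy_1d(pts))
```

The approximation never exceeds the exact value and approaches it as $\delta$ decreases, at the price of a larger cover.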

Now we are ready to prove Theorem 3.1.

Proof.

For $\delta\in(0,1)$ to be chosen later let $\Gamma$ be a $\delta$-cover consisting of at most $(2e)^{d}(1+\delta^{-1})^{d}$ elements. Such a $\Gamma$ exists due to Theorem 3.2 and the discussion thereafter.

Define

[TABLE]

Now Lemma 3.3 gives us

[TABLE]

For every $\beta\in\Gamma$ and $j\in[N]$ put

[TABLE]

Let $\epsilon=2\delta.$ Due to Hoeffding’s inequality applied to random variables $(\xi_{\beta}^{(j)})_{j=1}^{N}$ (applicable since $(p_{j})_{j=1}^{N}$ is $e^{\rho d}$ - negatively dependent) we obtain for every $\beta\in\Gamma$

[TABLE]

With the help of a simple union bound we get

[TABLE]

Using the above we would like to find a bound on discrepancy of the sampling scheme $\mathcal{P}$ which holds with probability at least $\theta\in(0,1).$ We are looking for $\epsilon_{\theta}$ such that

[TABLE]

Put $\epsilon_{\theta}=C_{\theta}(\frac{d}{N}\log(1+\frac{N}{d}))^{\frac{1}{2}}=2\delta_{\theta}.$ Inequality (15) holds true if

[TABLE]

Our problem now boils down to finding a possibly small $\delta_{\theta}\in(0,1)$ for which

[TABLE]

Specifying $\delta_{\theta}$ to be of the form

[TABLE]

we get that (16) is satisfied if

[TABLE]

Expanding $\delta_{\theta}$ in dependence of $\eta$ it suffices to find $\eta$ for which

[TABLE]

and one easily sees that this is satisfied for $\eta$ given in the statement of the theorem. To prove (11) one only needs to plug $\epsilon=c\sqrt{\frac{d}{N}}$ into (14) and then consider the two cases $\xi=1$ and $\xi=\log\left(\frac{N}{d}\right)$ separately. ∎

3.2 Bound on the Weighted Star Discrepancy for $\mathcal{D}_{0}^{d}$-$\gamma$-Negatively Dependent Sampling Schemes

One of the reasons why the QMC integration may be successfully applied in many high-dimensional problems is the fact that quite often only a small number of coordinates is really important. This observation led to the introduction of weighted function spaces and weighted discrepancies by Sloan and Woźniakowski in [31]. The above concepts are closely related to the theory of weighted spaces of Sobolev type, in particular the integration error in those spaces obeys a Koksma-Hlawka type upper bound, which may be phrased using the norm of the function and the weighted star discrepancy.

By weights we understand a set of non-negative numbers $\gamma=(\gamma_{u})_{\emptyset\neq u\subseteq[d]},$ where $\gamma_{u}$ is interpreted as the weight of the coordinates from $u.$ Let $|u|$ denote the cardinality of $u.$ For $x\in[0,1]^{d}$ we write $(x(u),1)$ to denote the point in $[0,1]^{d}$ agreeing with $x$ on the coordinates from $u$ and having all the other coordinates set to $1.$

The weighted star discrepancy of a point set $X=(x_{1},\ldots,x_{N})$ and weights $\gamma$ is defined by

[TABLE]

The following theorem is similar in flavor to Theorem 1 from [2].

Theorem 3.4.

Let $N,d\in\mathbb{N}$ and let $\mathcal{P}=(p_{j})_{j=1}^{N}\subset[0,1)^{d}$ be a sampling scheme, such that for every $\emptyset\neq u\subset[d]$ its projection on the coordinates in $u$ is $\mathcal{D}_{0}^{|u|}$ - $e^{\rho|u|}$ - negatively dependent. Then for any weights $(\gamma_{u})_{u\subset[d]\setminus\emptyset}$ and any $c>0$ it holds

[TABLE]

with probability at least $2-(1+e^{-(1.674c^{2}-10.7042-\rho)})^{d}.$ Moreover, for $\theta\in(0,1)$ it holds

[TABLE]

Proof.

We shall only prove statement (17); statement (18) then follows by simple calculations. For $\emptyset\neq u\subset[d]$ and $c>0$ put

[TABLE]

Here $X^{u}$ denotes the projection of $X$ on the coordinates from $u.$ By Theorem 2.7 it holds

[TABLE]

Now

[TABLE]

∎

4 Examples of Negatively Dependent and Pairwise Negatively Dependent Sampling Schemes

Many sampling schemes, such as randomly shifted and jittered rank-1 lattices (cf. Section 4.2) and Latin hypercube sampling (cf. Section 4.3), are multidimensional generalizations of the one-dimensional simple stratified sampling. Simple stratified sampling is defined in the following way: let $\pi$ be a uniformly chosen permutation of $\{1,\ldots,N\}$ and let $(U_{j})_{j=1}^{N}$ be independent random variables distributed uniformly on $(0,1].$ Moreover, $\pi$ is independent of $(U_{j})_{j}.$ We put

$$p_{j}:=\frac{\pi(j)-U_{j}}{N},\quad j=1,\ldots,N.$$

Effectively, one considers the partition $I_{j}:=[\tfrac{j-1}{N},\tfrac{j}{N}),j=1,\ldots,N,$ of the unit interval and places one point in every element of the partition, independently of all the other points. The following simple lemma is a useful tool for our investigations and may be found, e.g., in [33].

Lemma 4.1.

Simple stratified sampling $\mathcal{P}=(p_{j})_{j=1}^{N}$ is pairwise negatively dependent.
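Simple stratified sampling is straightforward to simulate. The sketch below is an illustration: the function name is hypothetical, and the placement rule $p_{j}=(\pi(j)-U_{j})/N$ with $U_{j}$ uniform on $(0,1]$ is an assumption chosen to match the verbal description (one uniform point in each interval $I_{k}$, indexed in random order).

```python
import random


def simple_stratified_sample(n, rng):
    """One uniform point per interval [(k-1)/n, k/n), returned in random order."""
    strata = list(range(1, n + 1))
    rng.shuffle(strata)  # strata[j] plays the role of pi(j + 1)
    points = []
    for k in strata:
        u = 1.0 - rng.random()      # uniform on (0, 1]
        points.append((k - u) / n)  # lies in [(k-1)/n, k/n)
    return points


print(sorted(simple_stratified_sample(4, random.Random(0))))
```

Sorting the output shows exactly one point in each of the intervals $[0,\tfrac14),[\tfrac14,\tfrac12),[\tfrac12,\tfrac34),[\tfrac34,1)$.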

4.1 Negative Dependence of Generalized Stratified Sampling

We partition $[0,1)^{d}$ into $\beta\geq N$ sets $(B_{j})_{j=1}^{\beta}$ with $\lambda^{d}(B_{j})=\frac{1}{\beta},j=1,\ldots,\beta.$ Let $Y=(Y_{1},\ldots,Y_{\beta})$ be a random vector distributed uniformly on

$$\Big\{v\in\{0,1\}^{\beta}\ \Big|\ \sum_{j=1}^{\beta}v_{j}=N\Big\}.$$

Given the value of $Y$ we place one point for each $j\in[\beta]$ with $Y_{j}=1$ uniformly and independently of all other points inside $B_{j}.$ Symmetrizing this construction yields a sampling scheme $\mathcal{P}=(p_{j})_{j=1}^{N},$ which we call generalized stratified sampling (note that every single $p\in\mathcal{P}$ is uniformly distributed in $[0,1)^{d}$). Here “generalized” has to be understood in the sense that there are possibly more strata than points.

Example 4.2.

There are many natural choices for the strata. The simplest one would be stripes of the form $B_{j},j=1,\ldots,N,$ with $B_{j}:=[\tfrac{j-1}{N},\tfrac{j}{N})\times[0,1)^{d-1}.$ However, one could also choose, e.g., elementary cells (i.e., fundamental parallelepipeds) of a rank-1 lattice (cf. [17]), see Figure 1.
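The construction of generalized stratified sampling can be sketched in a few lines. The code below is only an illustration: as strata it uses the $\beta$ stripes $[\tfrac{j}{\beta},\tfrac{j+1}{\beta})\times[0,1)^{d-1}$, $j=0,\ldots,\beta-1$ (the stripe example with $\beta$ in place of $N$), and the function name is hypothetical. Drawing the $N$ occupied strata uniformly without replacement plays the role of the random vector $Y$ with exactly $N$ entries equal to $1$.

```python
import random


def generalized_stratified_sample(n, beta, d, rng):
    """n points in [0,1)^d, one in each of n stripes chosen among beta >= n stripes."""
    if beta < n:
        raise ValueError("need at least as many strata as points")
    # Uniformly random set of n occupied strata (the indices j with Y_j = 1).
    occupied = rng.sample(range(beta), n)
    points = []
    for j in occupied:
        first = (j + rng.random()) / beta            # uniform in [j/beta, (j+1)/beta)
        rest = [rng.random() for _ in range(d - 1)]  # remaining coordinates are free
        points.append([first] + rest)
    rng.shuffle(points)  # symmetrization
    return points
```

For $\beta=N$ every stripe is occupied and the first coordinates form a simple stratified sample. For $\beta>N$ some stripes stay empty.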

To show that generalized stratified sampling is negatively dependent we need first a simple lemma.

Lemma 4.3.

Let $t,N\in\mathbb{N},t\leq N,\xi\geq 0$ and let

[TABLE]

The function

[TABLE]

attains its maximum at the point $(x_{1},\ldots,x_{N})=(\frac{\xi}{N},\ldots,\frac{\xi}{N}).$

Proof.

We shall prove the statement by induction on $N\geq t.$ The case $N=t$ is straightforward by the method of Lagrange multipliers. Suppose we have already shown the statement for $N-1$ and we would like to prove it for $N.$ First, let us fix the value of $x_{N}\in(0,\xi).$ It holds

[TABLE]

By the induction assumption for a fixed value of $x_{N}$ the last term is maximal when for $j=1,\ldots,N-1$ we have $x_{j}=\frac{\eta}{N-1},$ where we put $\eta=\xi-x_{N}.$ Plugging it into the above formula we obtain

[TABLE]

which we need to maximize with respect to $\eta.$ It holds

[TABLE]

where $C=\frac{(N-1)!}{(t-1)!(N-t)!(N-1)^{t-1}}$ and $h(\eta)=\xi\eta^{t-1}+\left(\frac{N-t}{t(N-1)}-1\right)\eta^{t}.$ Now we have

[TABLE]

The derivative vanishes for $t\geq 3$ at $\eta_{1}=0$ and $\eta_{2}=\frac{N-1}{N}\xi.$ Since $h(\eta_{2})>\max\{h(0),h(\xi)\}$ and $\eta_{2}$ is a local maximum the claim follows. ∎

Theorem 4.4.

Let $\mathcal{P}=(p_{j})_{j=1}^{N}$ be a generalized stratified sampling as described above and $A\subset[0,1)^{d}$ be measurable. Then for every $1\leq t\leq N$ it holds

[TABLE]

In particular, generalized stratified sampling is $\mathcal{S}$ - negatively dependent for any system $\mathcal{S}$ of measurable subsets of $[0,1)^{d}$ .

Proof.

Fix $t$ as in the statement of the theorem and define

$$D_{t}:=\{k=(k_{1},\ldots,k_{t})\in[\beta]^{t}\mid k_{i}\neq k_{j}\text{ for }i\neq j\}.$$

Note that $|D_{t}|=\beta(\beta-1)\cdots(\beta-t+1).$ For $k=(k_{1},\ldots,k_{t})\in D_{t}$ we have

[TABLE]

By Lemma 4.3 it follows

[TABLE]

∎

Remark 4.5.

Without further information on the strata we cannot make any conclusions about pairwise negative dependence of generalized stratified sampling. As an example consider a stratified sampling scheme $\mathcal{P}=(p_{1},p_{2})$ defined by two strata $B_{1},B_{2}$ in $d\geq 2.$ One may choose $B_{1},B_{2}$ and $Q,R\in\mathcal{C}_{1}^{d}$ in such a way that $Q\subset B_{1},B_{2}\subset R$ and $R\neq[0,1)^{d},$ see Figure 2. In this case however

[TABLE]

and the sampling scheme is not pairwise negatively dependent.

On the other hand if we consider strata $B_{j},j=1,\ldots,N,$ with $B_{j}:=[\tfrac{j-1}{N},\tfrac{j}{N})\times[0,1)^{d-1}$ then this practically boils down to the one-dimensional case and so the corresponding sampling scheme is pairwise negatively dependent, cf. Lemma 4.1.

4.2 Pairwise Negative Dependence and Conditional NQD Property of Randomly Shifted and Jittered Rank-1 Lattices

The exposition follows closely [33]. Let $N$ be prime. By $\mathbb{F}:=\mathbb{F}_{N}$ we denote $\{0,1,\ldots,N-1\}$ . Moreover, $\mathbb{F}^{*}:=\mathbb{F}\setminus\{0\}.$ We also put $\widetilde{\mathbb{F}}:=\frac{1}{N}\mathbb{F}$ and similarly $\widetilde{\mathbb{F}}^{*}:=\frac{1}{N}\mathbb{F}^{*}.$

A discrete subgroup $\mathcal{L}$ of the $d$-dimensional torus $\mathbb{T}^{d}$ is called a lattice. A set $(y_{j})_{j=1}^{N}$ is a rank-1 lattice if for some $g\in(\widetilde{\mathbb{F}}^{*})^{d}$ it admits a representation

[TABLE]

In this case $g$ is called a generating vector of the lattice.

Note that our definition differs from the usual one in that we allow only for generating vectors $g$ from $(\widetilde{\mathbb{F}}^{*})^{d}$ and not from $\widetilde{\mathbb{F}}^{d},$ which saves us from considering some degenerate cases.

We now define a sampling scheme based on rank-1 lattices, which we call a randomly shifted and jittered rank-1 lattice. To this end let $(y_{j})_{j=1}^{N}$ be a rank-1 lattice with generating vector chosen uniformly at random from $(\widetilde{\mathbb{F}}^{*})^{d}.$ Let $U$ be distributed uniformly on $\widetilde{\mathbb{F}}^{d},$ let $J_{j},j=1,\ldots,N,$ be uniformly distributed on $[0,\frac{1}{N})^{d},$ and let $\pi$ be a uniformly chosen permutation of $\{1,\ldots,N\}.$ Moreover, let all of the aforementioned random variables be independent. We put

[TABLE]

We call the sampling scheme $\mathcal{P}=(p_{j})_{j=1}^{N}$ a randomly shifted and jittered rank-1 lattice (RSJ rank-1 lattice). In words: we first take a rank-1 lattice with a random generator and symmetrize it. Then we shift the lattice uniformly on the torus, where the shift has resolution $\frac{1}{N}$. In the last step we jitter every point independently of all the other points in a cube of volume $(\tfrac{1}{N})^{d}$.
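The three steps described in words above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's definition verbatim: the displayed formula is not reproduced here, so the sketch assumes the lattice points are $y_{k}=k\,g \bmod 1$ for $k=0,\ldots,N-1$ and that $p_{j}=(y_{\pi(j)}+U \bmod 1)+J_{j}$. The function name `rsj_rank1_lattice` and the returned integer cell indices are hypothetical helpers introduced for this example.

```python
import random


def rsj_rank1_lattice(N, d, rng):
    """Sample N points of an RSJ rank-1 lattice in [0,1)^d (sketch).

    Assumes N is prime. Returns (points, cells), where cells[j][i] is the
    integer k with points[j][i] in [k/N, (k+1)/N).
    """
    # Random generating vector g/N with g in {1,...,N-1}^d.
    g = [rng.randrange(1, N) for _ in range(d)]
    # Random shift U = u/N with u in {0,...,N-1}^d (resolution 1/N).
    u = [rng.randrange(N) for _ in range(d)]
    # Uniform random permutation of the lattice indices (symmetrization).
    pi = list(range(N))
    rng.shuffle(pi)

    points, cells = [], []
    for j in range(N):
        # Shifted lattice point on the torus, in units of 1/N (assumed form).
        cell = [(pi[j] * g[i] + u[i]) % N for i in range(d)]
        # Independent jitter, uniform on [0, 1/N)^d.
        point = [(cell[i] + rng.random()) / N for i in range(d)]
        cells.append(cell)
        points.append(point)
    return points, cells
```

Because $N$ is prime and every component of $g$ is nonzero, each one-dimensional projection of the shifted lattice hits every interval $[\tfrac{k}{N},\tfrac{k+1}{N})$ exactly once, which is the property the returned `cells` make easy to inspect.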

The following is Theorem 3.4 from [33].

Theorem 4.6.

Let $N$ be prime and $d\in\mathbb{N}$. An RSJ rank-1 lattice $\mathcal{P}=(p_{j})_{j=1}^{N}$ in $[0,1)^{d}$ is a coordinatewise independent NQD sampling scheme.

In particular, an RSJ rank-1 lattice is a pairwise negatively dependent and conditionally NQD sampling scheme, which means that both alternative conditions for $\mathcal{P}$ from Theorem 2.5 hold.

In contrast to generalized stratified sampling (cf. Theorem 4.4) and Latin hypercube sampling (see Theorem 4.7), an RSJ rank-1 lattice is, for $d\geq 2$ and $N\geq 3$, in general not $\mathcal{C}_{0}^{d}$-negatively dependent, see Subsection 5.1.

4.3 Negative Dependence, Conditional NQD Property, and Pairwise Negative Dependence of Latin Hypercube Sampling

Let $(\pi_{i})_{i=1}^{d}$ be independent uniformly chosen permutations of $[N],$ and let $U^{(i)}_{j},i=1,\ldots,d,j=1,\ldots,N,$ be independent random variables distributed uniformly on $(0,1]$ and independent also of the permutations. A sampling scheme $(p_{j})_{j=1}^{N}$ is called a Latin hypercube sampling if the $i$-th coordinate of the $j$-th point $p_{j}^{(i)}$ is given by

[TABLE]

What one intuitively does is the following: one cuts $[0,1)^{d}$ into slices $(S_{k,j})_{j=1}^{N},k=1,\ldots,d$ given by

[TABLE]

and puts $N$ points in such a way that in every slice there is exactly one point.
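The "one point per slice" picture can be made concrete with a short sketch. It assumes the standard Latin hypercube formula $p_{j}^{(i)}=\frac{\pi_{i}(j)-U_{j}^{(i)}}{N}$, which matches the ingredients listed above (permutations of $[N]$ and uniforms on $(0,1]$) but is an assumption as far as the displayed formula is concerned. It also assumes that the slice $S_{k,j}$ restricts only the $k$-th coordinate to $[\tfrac{j-1}{N},\tfrac{j}{N})$. The names `latin_hypercube_sample` and `slice_counts` are hypothetical.

```python
import random


def latin_hypercube_sample(N, d, rng):
    """Return N points in [0,1)^d forming a Latin hypercube sample (sketch)."""
    # One independent uniform permutation of {1,...,N} per coordinate.
    perms = []
    for _ in range(d):
        perm = list(range(1, N + 1))
        rng.shuffle(perm)
        perms.append(perm)

    points = []
    for j in range(N):
        point = []
        for i in range(d):
            u = 1.0 - rng.random()  # uniform on (0, 1]
            point.append((perms[i][j] - u) / N)
        points.append(point)
    return points


def slice_counts(points, N, k):
    """Count points whose k-th coordinate lies in [j/N, (j+1)/N), j = 0..N-1."""
    counts = [0] * N
    for p in points:
        counts[min(int(p[k] * N), N - 1)] += 1
    return counts
```

For a sample `pts = latin_hypercube_sample(8, 3, random.Random(1))`, calling `slice_counts(pts, 8, k)` for each coordinate `k` should report exactly one point in every slice.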

It is worth mentioning that for $d=1$ Latin hypercube sampling is exactly the same as RSJ rank-1 lattice (namely simple stratified sampling). For $d\geq 2$ the joint distribution of a pair of points is the same for Latin hypercube sampling as for RSJ rank-1 lattice. But if we sample more than two points, then the joint distributions already differ, see [33].

Negative dependence of Latin hypercube sampling has been studied in [12], and pairwise negative dependence has been investigated in [33].

Theorem 4.7.

Latin hypercube sampling in $[0,1)^{d}$ is a sampling scheme which is

(i)

$\mathcal{D}^{d}_{0}$-$e^{d}$-negatively dependent,

(ii)

$\mathcal{C}_{0}^{d}$-negatively dependent,

(iii)

coordinatewise independent NQD.

In the above, statements (i) and (ii) follow from Theorem 3.5 from [12], and statement (iii) is Theorem 3.4 from [33].

In particular, it follows from (iii) that LHS is pairwise negatively dependent as well as conditionally NQD.

4.4 Pairwise Negative Dependence of Scrambled (0,m,s)-Nets.

The so-called $(t,m,s)$-nets belong to the most regular deterministic point sets. First defined by Niederreiter in [23], they have been the subject of extensive research. For a nice introduction to $(t,m,s)$-nets and their randomization, see [22].

Let us fix a base $b\in\mathbb{N}_{\geq 2}.$ For $j\in\mathbb{N}_{0}$ and $k=0,1,\ldots,b^{j}-1$ an interval of the form

[TABLE]

is called an elementary interval (in base b). Moreover, for $s\in\mathbb{N}$ and vectors ${\bf{j}}=(j_{1},\ldots,j_{s})$ and ${\bf{k}}=(k_{1},\ldots,k_{s})$ (where for every $l=1,\ldots,s,$ we require $0\leq k_{l}\leq b^{j_{l}}-1$ ) we define an $s$ -dimensional elementary interval via

[TABLE]

A $(t,m,s)$-net is any $P\subset[0,1]^{s}$ such that for any elementary interval $E$ with $\lambda^{s}(E)=b^{-m+t}$ there are exactly $b^{t}$ points in $P\cap E.$ It is easily seen that a $(t,m,s)$-net consists of exactly $b^{m}$ points. Specific constructions of $(t,m,s)$-nets are known.
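The counting condition in the definition can be checked mechanically for small point sets. The sketch below assumes the usual half-open elementary intervals $[\tfrac{k}{b^{j}},\tfrac{k+1}{b^{j}})$ and, to keep the arithmetic exact, only handles points whose coordinates are integer multiples of $b^{-m}$ (stored as integers). The helper names `compositions`, `is_tms_net` and `bit_reverse` are hypothetical. As a test case it uses the two-dimensional Hammersley point set in base 2, which is a well-known example of a $(0,m,2)$-net.

```python
from collections import Counter


def compositions(total, parts):
    """Yield all tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def is_tms_net(int_points, b, m, t):
    """Check the (t,m,s)-net property in base b (sketch).

    int_points holds integer coordinates in {0,...,b**m - 1}; the actual
    point has coordinates int_points[i][l] / b**m.
    """
    if len(int_points) != b ** m:
        return False
    s = len(int_points[0])
    # Elementary intervals of volume b^(t-m): side exponents j sum to m - t.
    for j in compositions(m - t, s):
        counts = Counter(
            tuple(x[l] // b ** (m - j[l]) for l in range(s)) for x in int_points
        )
        # With b^m points in total, all b^(m-t) cells hold b^t points
        # exactly when every occupied cell does.
        if any(c != b ** t for c in counts.values()):
            return False
    return True


def bit_reverse(i, m):
    """Reverse the m-bit binary expansion of i (radical inverse numerator)."""
    return int(format(i, "0{}b".format(m))[::-1], 2)
```

For $m=3$, the set `[(i, bit_reverse(i, 3)) for i in range(8)]` passes the check with `t=0`, whereas the diagonal set `[(i, i) for i in range(8)]` fails it.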

Scrambling of depth $m$ is a bijective function $S:[0,1]^{s}\rightarrow[0,1]^{s}$ such that for any elementary interval $E$ with $\lambda^{s}(E)=b^{-m}$ the image $S(E)$ is again an elementary interval of volume $b^{-m}.$

Now let us focus on the case $t=0.$ Taking a $(0,m,s)$-net and applying to it a random scrambling of depth $m$ one obtains a randomized point set. Scramblings are defined in such a way that for any scrambling $S$ of depth $m$ and any $(0,m,s)$-net $P$, the point set $S(P)$ is again a $(0,m,s)$-net. By an appropriate choice of randomized scrambling $\tilde{S}$ one may ensure that $\tilde{S}(P)$ is a sampling scheme. In this case we call $\tilde{S}(P)$ a scrambled $(0,m,s)$-net. Scrambling as a way of randomizing $(0,m,s)$-nets has been studied by A.B. Owen, e.g., in [29].

In a recent article [21] C. Lemieux and J. Wiart have shown the following theorem (which follows from Corollary 4.10 of the aforementioned article).

Theorem 4.8.

Scrambled $(0,m,s)$ -nets are pairwise negatively dependent sampling schemes.

4.5 Mixed Randomized Sequences

As already mentioned, part of the success of RQMC stems from the fact that in many high-dimensional practical integration problems only a small number of coordinates is of real importance. It stands to reason to exploit this by constructing quadratures that use RQMC on the “important” coordinates and simple (usually much cheaper) Monte Carlo on the remaining coordinates. This method is sometimes referred to as padding, and the resulting sequences of integration nodes are called mixed sequences. Let us give a formal definition.

Definition 4.9.

Let $d,d^{\prime},d^{\prime\prime}\in{\mathbb{N}}$ with $d=d^{\prime}+d^{\prime\prime}$ . Let $X=(X_{k})_{k\in{\mathbb{N}}}$ be a sequence in $[0,1)^{d^{\prime}}$ , and let $Y=(Y_{k})_{k\in{\mathbb{N}}}$ be a sequence in $[0,1)^{d^{\prime\prime}}$ . The $d$ -dimensional concatenated sequence $Z=(Z_{k})_{k\in{\mathbb{N}}}=(X_{k},Y_{k})_{k\in{\mathbb{N}}}$ is called a mixed sequence. If $Y$ is a sequence of independent uniformly distributed random points, one also says that $Z$ results from $X$ by padding by Monte Carlo and calls $Z$ a hybrid-Monte Carlo sequence. If $X$ and $Y$ are both randomized sequences, we call $Z$ a mixed randomized sequence.
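In code, padding by Monte Carlo amounts to appending independent uniform coordinates to each point of $X$. The following minimal sketch illustrates the definition for a finite block of points; the function name `pad_by_monte_carlo` is hypothetical and the input `x_points` may come from any scheme in $[0,1)^{d^{\prime}}$.

```python
import random


def pad_by_monte_carlo(x_points, d2, rng):
    """Append d2 independent U[0,1) coordinates to every point of x_points."""
    return [list(x) + [rng.random() for _ in range(d2)] for x in x_points]
```

The first $d^{\prime}$ coordinates of each concatenated point are left untouched, so any structure of $X$ on the "important" coordinates is preserved.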

Padding by Monte Carlo was introduced by Spanier in [32] to tackle problems in particle transport theory. He suggested to use a hybrid-Monte Carlo sequence resulting from padding a deterministic low-discrepancy sequence. Hybrid-Monte Carlo sequences showed a favorable performance in several numerical experiments, see, e.g., [26, 27]. The latter papers also provided theoretical results on probabilistic discrepancy estimates of hybrid-Monte Carlo sequences which have been improved in [3, 11]. Favorable discrepancy bounds for padding Latin hypercube sampling (LHS) by Monte Carlo were provided in [12]. Padding a sequence by LHS (instead of by Monte Carlo) was considered earlier by Owen [28, Example 5].

A related line of research, initiated in [25], is to study the discrepancy of concatenated sequences that result from two deterministic sequences. More recent results can, e.g., be found in [13, 9, 15] and the literature mentioned therein.

The following proposition shows that concatenating two mutually independent negatively dependent sampling schemes results again in a (higher dimensional) negatively dependent sampling scheme. A weaker version of the next proposition may be found in [14]; cf. Lemma 5 there.

Proposition 4.10.

Let $d,d^{\prime},d^{\prime\prime}\in{\mathbb{N}}$ such that $d=d^{\prime}+d^{\prime\prime}$ . Let $A\subseteq[0,1)^{d^{\prime}}$ , $B\subseteq[0,1)^{d^{\prime\prime}}$ be Borel measurable sets. Let $x_{1},\ldots,x_{N}$ be a sampling scheme in $[0,1)^{d^{\prime}}$ and $y_{1},\ldots,y_{N}$ a sampling scheme in $[0,1)^{d^{\prime\prime}}$ . Furthermore, let $\alpha,\beta\geq 1$ .

(i)

If the random variables $\operatorname{\bf{1}}_{A}(x_{i})$ , $i=1,\ldots,N$ , and $\operatorname{\bf{1}}_{B}(y_{i})$ , $i=1,\ldots,N$ , are upper negatively $\alpha$ - and $\beta$ - dependent, respectively, and mutually independent, then the random variables $\operatorname{\bf{1}}_{A\times B}(x_{i},y_{i})$ , $i=1,\ldots,N$ , induced by the random vectors $(x_{1},y_{1}),\ldots,(x_{N},y_{N})$ in $[0,1)^{d}$ , are upper negatively $\alpha\beta$ -dependent.

(ii)

If the random variables $\operatorname{\bf{1}}_{A}(x_{i})$, $i=1,\ldots,N$, and $\operatorname{\bf{1}}_{B}(y_{i})$, $i=1,\ldots,N$, are lower negatively $\alpha$- and $\beta$-dependent, respectively, and mutually independent, then the random variables $\operatorname{\bf{1}}_{A\times B}(x_{i},y_{i})$, $i=1,\ldots,N$, induced by the random vectors $(x_{1},y_{1}),\ldots,(x_{N},y_{N})$ in $[0,1)^{d}$, are lower negatively $\alpha\beta$-dependent.

Proof.

Let us first prove statement (i). Obviously we have for $J\subseteq[N]$

[TABLE]

We now prove statement (ii). Take any $\emptyset\neq J\subseteq[N]$ and set $t=|J|$ . Suppose first that $((x_{j},y_{j}))_{j=1}^{N}$ is a hybrid-Monte Carlo sequence, i.e. $(y_{j})_{j=1}^{N}$ is a Monte Carlo sampling scheme. Due to our assumptions in statement (ii) we obtain

[TABLE]

Now let $(y_{j})_{j=1}^{N}$ be any sampling scheme in $[0,1)^{d^{\prime\prime}}$ such that the random variables $(\operatorname{\bf{1}}_{B}(y_{j}))_{j=1}^{N}$ are lower negatively $\beta$-dependent and let $(\hat{y}_{j})_{j=1}^{N}$ be a Monte Carlo sampling scheme in $[0,1)^{d^{\prime\prime}}$; we assume both sampling schemes to be independent of $(x_{j})_{j=1}^{N}$. Analogously to the previous case we obtain

[TABLE]

It follows from the case of hybrid-Monte Carlo sequences that

[TABLE]

∎
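For statement (i), the chain of (in)equalities can be sketched in one display. This is a reconstruction, not a copy of the paper's formula; it assumes that upper negative $\gamma$-dependence of binary random variables $T_{1},\ldots,T_{N}$ means $P(\bigcap_{j\in J}\{T_{j}=1\})\leq\gamma\prod_{j\in J}P(T_{j}=1)$ for all $J\subseteq[N]$, as in Section 2.1.

```latex
\begin{align*}
P\Big(\bigcap_{j\in J}\{(x_{j},y_{j})\in A\times B\}\Big)
 &= P\Big(\bigcap_{j\in J}\{x_{j}\in A\}\Big)\,
    P\Big(\bigcap_{j\in J}\{y_{j}\in B\}\Big) \\
 &\leq \alpha\prod_{j\in J}P(x_{j}\in A)\cdot
       \beta\prod_{j\in J}P(y_{j}\in B) \\
 &= \alpha\beta\prod_{j\in J}P\big((x_{j},y_{j})\in A\times B\big).
\end{align*}
```

The first and last equalities use the mutual independence of the two families; the inequality applies the upper negative dependence assumption to each factor. The same shortcut is not available for (ii), because the complement of $A\times B$ is not a product set, which is presumably why the proof of (ii) goes through the hybrid-Monte Carlo case first.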

Remark 4.11.

A closer examination of the proof shows that for statement (i) of Proposition 4.10 to hold we only need $(\operatorname{\bf{1}}_{A}(x_{j}))_{j=1}^{N}$ and $(\operatorname{\bf{1}}_{B}(y_{j}))_{j=1}^{N}$ to be upper negatively $\alpha$- and $\beta$-dependent, respectively; the underlying point sets need not be sampling schemes. Moreover, if in (ii) we assume that $(y_{j})_{j=1}^{N}$ is a Monte Carlo sampling scheme, we also do not need to assume that $(x_{j})_{j=1}^{N}$ is a sampling scheme.

Remark 4.12.

Let $\mathcal{S}^{\prime}$, $\mathcal{S}^{\prime\prime}$ be systems of measurable sets in $[0,1)^{d^{\prime}}$ and $[0,1)^{d^{\prime\prime}}$, respectively. Let $(x_{j})_{j=1}^{N}$ be an $\mathcal{S}^{\prime}$-$\alpha$-negatively dependent sampling scheme in $[0,1)^{d^{\prime}}$ and $(y_{j})_{j=1}^{N}$ an $\mathcal{S}^{\prime\prime}$-$\beta$-negatively dependent sampling scheme in $[0,1)^{d^{\prime\prime}}$; the two sampling schemes are assumed to be mutually independent. Furthermore, let $\mathcal{P}:=(p_{j})_{j=1}^{N}$ be the resulting concatenated sampling scheme in $[0,1)^{d}$, i.e., $p_{i}:=(x_{i},y_{i})$, $i=1,\ldots,N$.

(i)

If $\mathcal{S}^{\prime}=\mathcal{C}_{0}^{d^{\prime}}$ and $\mathcal{S}^{\prime\prime}=\mathcal{C}_{0}^{d^{\prime\prime}}$ , we obtain from Proposition 4.10 that the mixed randomized sequence $(p_{j})_{j=1}^{N}$ is $\mathcal{C}^{d}_{0}$ - $\alpha\beta$ -negatively dependent, which implies that we may directly apply Theorem 3.1 to obtain a probabilistic discrepancy bound for $\mathcal{P}$ .

(ii)

If $\mathcal{S}^{\prime}=\mathcal{D}_{0}^{d^{\prime}}$ and $\mathcal{S}^{\prime\prime}=\mathcal{D}_{0}^{d^{\prime\prime}}$ , we obtain from Proposition 4.10 that $(p_{j})_{j=1}^{N}$ is $\alpha\beta$ -negatively dependent with respect to the set system

[TABLE]

Hence Theorem 2.7 is unfortunately not directly applicable to $\mathcal{P}$ . Nevertheless, one may prove a counterpart of Theorem 2.7 with slightly worse constants that relies on negative dependence with respect to $\mathcal{D}^{d^{\prime}}_{0}\times\mathcal{D}^{d^{\prime\prime}}_{0}$ . Namely, one may show for every $\theta\in(0,1)$ that

[TABLE]

The bound is based on the following simple observation: To estimate the local discrepancy of $\mathcal{P}$ in a test box $Q\in\mathcal{C}^{d}_{0}$ , the strategy used in [12] (and earlier in [2]) is to decompose $Q$ into finitely many disjoint differences of boxes $\Delta_{1},\ldots,\Delta_{K}\in\mathcal{D}^{d}_{0}$ such that $Q=\cup_{\nu=1}^{K}\Delta_{\nu}$ . This gives

[TABLE]

Now let us consider a fixed index $\nu$ . Then we find $A_{\nu},B_{\nu}\in\mathcal{C}^{d}_{0}$ such that $A_{\nu}\subseteq B_{\nu}$ and $\Delta_{\nu}=B_{\nu}\setminus A_{\nu}$ . Furthermore, we may write $A_{\nu}=A_{\nu}^{\prime}\times A_{\nu}^{\prime\prime}$ and $B_{\nu}=B_{\nu}^{\prime}\times B_{\nu}^{\prime\prime}$ with $A_{\nu}^{\prime},B_{\nu}^{\prime}\in\mathcal{C}_{0}^{d^{\prime}}$ and $A_{\nu}^{\prime\prime},B_{\nu}^{\prime\prime}\in\mathcal{C}_{0}^{d^{\prime\prime}}$ . Then we may represent $\Delta_{\nu}$ as disjoint union

[TABLE]

Thus

[TABLE]

where $C^{1}_{\nu},C^{2}_{\nu}\in\mathcal{D}^{d^{\prime}}_{0}\times\mathcal{D}^{d^{\prime\prime}}_{0}$. Now large deviation inequalities of Bernstein and Hoeffding type can be used to obtain for each of the random variables $D_{N}(\mathcal{P},C^{1}_{\nu})$, $D_{N}(\mathcal{P},C^{2}_{\nu})$ the same upper bound as for the local discrepancy $D_{N}(\mathcal{P}^{*},\Delta_{\nu})$ of a $\mathcal{D}^{d}_{0}$-$\alpha\beta$-negatively dependent sampling scheme $\mathcal{P}^{*}$ in the proof of [12, Theorem 4.3]. This, combined with (20) and (21), results in a probabilistic discrepancy bound for $D^{*}_{N}(\mathcal{P})$ that is at most twice as big as the one from Theorem 2.7; for further details see [12, Proof of Theorem 4.3].
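The splitting of a difference of boxes into two product sets can be sanity-checked numerically. The sketch below uses one possible disjoint decomposition, namely $C^{1}=(B^{\prime}\setminus A^{\prime})\times B^{\prime\prime}$ and $C^{2}=A^{\prime}\times(B^{\prime\prime}\setminus A^{\prime\prime})$; this specific choice is an assumption, and the paper's display may use a different but equivalent split. Boxes are anchored boxes $[0,a)$ given by their upper corners, and the helper names are hypothetical.

```python
import random


def in_box(x, a):
    """True if x lies in the anchored box [0, a)."""
    return all(0.0 <= xi < ai for xi, ai in zip(x, a))


def in_delta(z, a, b):
    """Membership in Delta = [0,b) minus [0,a), assuming a <= b coordinatewise."""
    return in_box(z, b) and not in_box(z, a)


def in_split(z, a, b, d1):
    """Return (z in C1 or C2, z in C1 and C2) for the assumed split.

    C1 = ([0,b') minus [0,a')) x [0,b''),
    C2 = [0,a') x ([0,b'') minus [0,a'')),
    where primes denote the first d1 and the remaining coordinates.
    """
    x, y = z[:d1], z[d1:]
    a1, a2, b1, b2 = a[:d1], a[d1:], b[:d1], b[d1:]
    c1 = in_box(x, b1) and not in_box(x, a1) and in_box(y, b2)
    c2 = in_box(x, a1) and in_box(y, b2) and not in_box(y, a2)
    return (c1 or c2), (c1 and c2)
```

Both factors of $C^{1}$ and $C^{2}$ are differences of anchored boxes (possibly with an empty set removed), so each $C^{i}$ lies in $\mathcal{D}^{d^{\prime}}_{0}\times\mathcal{D}^{d^{\prime\prime}}_{0}$, in line with the remark above.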

5 Relations Between Notions of Negative Dependence

It may be easily seen that the coordinatewise independent NQD property implies the pairwise negative dependence property as well as the conditional NQD property. It turns out that this is the only valid implication between the considered notions of negative dependence. In this section we give examples showing that other implications do not hold.

5.1 Pairwise Negative Dependence and Negative Dependence

Pairwise negative dependence of a sampling scheme does not imply negative dependence, nor does negative dependence imply pairwise negative dependence.

Example 5.1.

We first show an example of a negatively dependent sampling scheme which is not pairwise negatively dependent. To this end consider a sampling scheme consisting of just two points $(p_{1},p_{2})$ with joint CDF $F:[0,1]^{2}\rightarrow[0,1]$ given by

[TABLE]

It is easy to see that $F(0,0)=0,F(1,1)=1,$ that $F$ is continuous and quasi-monotone, and that $F(x,y)=F(y,x)$, which implies that $F$ is a CDF of a sampling scheme. Moreover,

[TABLE]

so the sampling scheme is $\mathcal{C}_{0}^{1}$ - negatively dependent. Notice that due to $d=1,$ it is equivalent to saying that the sampling scheme is $\mathcal{C}_{1}^{1}$ - negatively dependent. However, for instance

[TABLE]

Example 5.2.

To see that even the stronger coordinatewise independent NQD property does not imply the negative dependence property, consider the RSJ rank-1 lattice defined in Subsection 4.2. On the one hand, according to Theorem 4.6, the RSJ rank-1 lattice is coordinatewise independent NQD. On the other hand, let us consider the situation for $d=2$ and a large $N$ to be chosen later. We put $Q=[0,\tfrac{3}{N})^{2}.$ Obviously

[TABLE]

We also have

[TABLE]

the inequality follows since for the diagonal configuration of the points (i.e. $p_{j}=(\tfrac{\pi(j)}{N},\tfrac{\pi(j)}{N})+J_{j},j=1,\ldots,N,$ for some permutation $\pi$ of $\{1,\ldots,N\}$) there is always one triple of points lying in $Q.$ Notice that any generating vector of the form $g=(\tfrac{k}{N},\tfrac{k}{N}),k\in[N-1],$ and any shift of the form $S=(\tfrac{l}{N},\tfrac{l}{N}),l\in\{0,1,\ldots,N-1\},$ results in a diagonal configuration. Now for $N$ large enough it holds

[TABLE]

5.2 Conditional NQD and Pairwise Negative Dependence

Example 5.3.

First we show an example of a pairwise negatively dependent sampling scheme which is not conditionally NQD. Let $B_{1}=[0,\tfrac{1}{2})^{2},B_{2}=[\tfrac{1}{2},1)\times[0,\tfrac{1}{2}),B_{3}=[0,\tfrac{1}{2})\times[\tfrac{1}{2},1),B_{4}=[\tfrac{1}{2},1)^{2}$ denote the slots. We consider a sampling scheme $\mathcal{P}=(p_{1},p_{2})$ such that, given the slots, the points are distributed uniformly within the slots and are independent. Denote $A_{ij}:=\{p_{1}\in B_{i},p_{2}\in B_{j}\}$ and set

[TABLE]

It is easy to see that $\mathcal{P}$ is not conditionally NQD, e.g.

[TABLE]

Showing that $\mathcal{P}$ is pairwise negatively dependent requires simple but tedious calculations and as such will be omitted. Intuitively it is clear, since the sampling scheme gives high probability to diagonal arrangements (i.e. $A_{14},A_{23},A_{41},A_{32}$ ).

Example 5.4.

Now we show an example of a sampling scheme which is conditionally NQD but not pairwise negatively dependent. To this end let $X,Y$ be two independent random variables distributed uniformly on $[0,1).$ We consider a sampling scheme $\mathcal{P}=(p_{1},p_{2})$ given by $p_{1}=(X,Y),p_{2}=(Y,X).$ Let $u,v\in[0,1)^{2}$ and $A,B\subset[0,1)$ be measurable. The sampling scheme $\mathcal{P}$ is conditionally NQD since

[TABLE]

On the other hand $\mathcal{P}$ is not pairwise negatively dependent. To see this note that

[TABLE]

and

[TABLE]

Taking for some $u^{(1)},u^{(2)}\in(0,1)$ the point $v$ satisfying $v^{(1)}=u^{(2)}$ and $v^{(2)}=u^{(1)}$ yields the claim.
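The failure of pairwise negative dependence in this example can be checked with exact arithmetic. The sketch below assumes that pairwise negative dependence is tested on boxes $[u,1)$ and $[v,1)$ from $\mathcal{C}_{1}^{2}$, as suggested by Remark 4.5 (the definition itself lies outside this section); the analogous computation with boxes anchored at $0$ leads to the same conclusion. The function name `joint_and_product` is hypothetical.

```python
def joint_and_product(u, v):
    """For p1 = (X, Y), p2 = (Y, X) with X, Y independent and uniform on [0,1):

    return P(p1 in [u,1), p2 in [v,1)) and P(p1 in [u,1)) * P(p2 in [v,1)).
    """
    # p1 in [u,1) means X >= u[0] and Y >= u[1];
    # p2 in [v,1) means Y >= v[0] and X >= v[1].
    joint = (1.0 - max(u[0], v[1])) * (1.0 - max(u[1], v[0]))
    prod = (1.0 - u[0]) * (1.0 - u[1]) * (1.0 - v[0]) * (1.0 - v[1])
    return joint, prod
```

With $u=(0.5,0.25)$ and the swapped point $v=(0.25,0.5)$, the joint probability is $0.375$ while the product of the marginals is $0.375^{2}=0.140625$, so the inequality required for pairwise negative dependence is violated.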

Bibliography


[1] C. Aistleitner, Covering numbers, dyadic chaining and discrepancy, J. Complexity, 27 (2011), pp. 531–540.
[2] C. Aistleitner, Tractability results for the weighted star-discrepancy, J. Complexity, 30 (2014), pp. 381–391.
[3] C. Aistleitner and M. T. Hofer, Probabilistic error bounds for the discrepancy of mixed sequences, Monte Carlo Methods Appl., 18 (2012), pp. 181–200.
[4] H. W. Block, T. H. Savits, and M. Shaked, Some concepts of negative dependence, Ann. Probab., 10 (1982), pp. 765–772.
[5] J. Dick, F. Y. Kuo, and I. H. Sloan, High dimensional integration – the quasi-Monte Carlo way, Acta Numerica, 22 (2013), pp. 133–288.
[6] J. Dick and F. Pillichshammer, Digital nets and sequences, Cambridge University Press, Cambridge, 2010.
[7] B. Doerr, C. Doerr, and M. Gnewuch, Probabilistic lower discrepancy bounds for Latin hypercube samples, in Contemporary Computational Mathematics – a Celebration of the 80th Birthday of Ian Sloan, J. Dick, F. Y. Kuo, and H. Woźniakowski, eds., Springer-Verlag, 2018, pp. 339–350.
[8] B. Doerr, M. Gnewuch, and A. Srivastav, Bounds and constructions for the star discrepancy via $\delta$-covers, J. Complexity, 21 (2005), pp. 691–709.
