Perfect Sampling for Gibbs Point Processes Using Partial Rejection Sampling

Sarat B. Moka; Dirk P. Kroese

arXiv:1901.05624·math.PR·January 18, 2019

TL;DR

This paper introduces a perfect sampling algorithm for Gibbs point processes, leveraging partial rejection sampling, with efficiency depending on interaction range and point intensity.

Contribution

It develops a novel perfect sampling method tailored for specific Gibbs point processes with finite interaction range, improving sampling efficiency.

Findings

01

Expected running time scales as O(log(1/r))

02

Algorithm effective for moderate point intensities

03

Applicable to pairwise interaction, penetrable spheres, and area-interaction models

Abstract

We present a perfect sampling algorithm for Gibbs point processes, based on the partial rejection sampling of Guo et al. (2017). Our particular focus is on pairwise interaction processes, penetrable spheres mixture models and area-interaction processes, with a finite interaction range. For an interaction range $2r$ of the target process, the proposed algorithm can generate a perfect sample with $O(\log(1/r))$ expected running time complexity, provided that the intensity of the points is not too high.

Equations

$$\mathsf{Dist}(C,D)=\inf\{\|x-y\|:x\in C\text{ and }y\in D\},$$

$$\mathscr{G}:=\Big\{\mathbf{x}=\{x_{1},x_{2},\dots,x_{n}\}:n\in\mathbb{Z}_{+}\text{ and }x_{i}\in S,\ \forall i\leq n\Big\},$$

$$\frac{\mathrm{d}\mu}{\mathrm{d}\rho}(\mathbf{x})=\frac{\exp\left(-\mathcal{U}(\mathbf{x})\right)}{Z},$$

$$\mathcal{U}(\mathbf{x})=\sum_{\{x,y\}\subseteq\mathbf{x}}f(x,y),\quad\mathbf{x}\in\mathscr{G},$$

$$f(x,y)=\begin{cases}\infty, & \text{if }\|x-y\|<2r,\\ 0, & \text{otherwise}.\end{cases}$$

$$f(x,y)=\begin{cases}-\log\gamma, & \text{if }\|x-y\|<2r,\\ 0, & \text{otherwise}.\end{cases}$$

$$f(x,y)=\begin{cases}\infty, & \text{if }\|x-y\|<a_{1},\\ -\log\gamma, & \text{if }a_{1}\leq\|x-y\|<a_{2},\\ 0, & \text{otherwise},\end{cases}$$

$$f(x,y)=\begin{cases}\infty, & \text{if }\|x-y\|<2r\text{ and }x,y\text{ have different marks},\\ 0, & \text{otherwise}.\end{cases}$$

$$\mathcal{U}(\mathbf{x})=\beta\,\mathsf{Vol}\Big(\bigcup_{x\in\mathbf{x}}\mathsf{Ball}(x,2r)\Big),\quad\mathbf{x}\in\mathscr{G},$$

$$\mathcal{U}(\mathbf{x})=\beta\,\mathsf{Vol}\Big(S\cap\Big(\bigcup_{x\in\mathbf{x}}\mathsf{Ball}(x,2r)\Big)\Big),\quad\mathbf{x}\in\mathscr{G},$$

$$\bigcup_{x\in S}\mathsf{Ball}(x,2r)\subseteq S(r)$$

$$E=\big\{\{u,v\}:u,v\in V,\ u\neq v\text{ and }u\leftrightarrow v\big\}.$$

$$\mathsf{Bad}(\mathbf{Y}(\omega))=\{v\in V:\omega\in B_{v}\}.$$

$$\partial W=\{v\in V:v\notin W\text{ and }\exists\,u\in W\text{ such that }u\leftrightarrow v\}.$$

$$E=\big\{\{v,u\}:v\neq u,\ v\text{ and }u\text{ are connected by a common node}\big\}.$$

$$A_{u,v}=\{\omega\in\varOmega:\exists\,\omega^{\prime}\in B_{v}\text{ such that }\mathbf{Y}(\omega)|_{W}=\mathbf{Y}(\omega^{\prime})|_{W}\}.$$

$$\mathbb{E}\left[|\mathsf{Res}_{t+1}|\,\big|\,\mathsf{Res}_{0},\dots,\mathsf{Res}_{t}\right]\leq(1-p)|\mathsf{Res}_{t}|.$$

$$\mathbb{E}[|\mathsf{Bad}_{t}|]\leq\mathbb{E}[|\mathsf{Res}_{t}|]\leq(1-p)^{t}|V|,$$

$$\mathcal{U}(\mathbf{x})=\sum_{\{x,y\}\subseteq\mathbf{x}}f(x,y),\quad\mathbf{x}\in\mathscr{G},$$

$$\mathcal{U}(\mathbf{x})=\sum_{i=1}^{n}\mathcal{U}(\mathbf{x}_{C_{i}})+\sum_{\{i,j\}\in V}\ \sum_{x\in\mathbf{x}_{C_{i}}}\ \sum_{y\in\mathbf{x}_{C_{j}}}f(x,y).$$

$$\mathbb{E}_{\rho}\left[\exp(-\mathcal{U}(\mathbf{X}))\right]=\mathbb{E}_{\rho}\left[\prod_{i=1}^{n}\exp\left(-\mathcal{U}(\mathbf{X}_{C_{i}})\right)\prod_{\{i,j\}\in V}\mathbb{I}\Big(U_{i,j}\leq\exp\Big(-\sum_{x\in\mathbf{X}_{C_{i}},\,y\in\mathbf{X}_{C_{j}}}f(x,y)\Big)\Big)\right],$$

$$B_{i,j}=\Big\{\omega\in\varOmega:U_{i,j}(\omega)>\exp\Big(-\sum_{x\in\mathbf{X}_{C_{i}}(\omega),\,y\in\mathbf{X}_{C_{j}}(\omega)}f(x,y)\Big)\Big\},\quad\{i,j\}\in V.$$

$$\frac{\mathrm{d}\mu}{\mathrm{d}\mu^{\otimes}}(\mathbf{x})=\frac{1}{Z}\prod_{\{i,j\}\in V}\exp\Big(-\sum_{x\in\mathbf{x}_{C_{i}},\,y\in\mathbf{x}_{C_{j}}}f(x,y)\Big),\quad\mathbf{x}\in\mathscr{G}.$$

$$Z=\mathbb{P}_{\mu^{\otimes}}\Big(\bigcap_{\{i,j\}\in V}B_{i,j}^{c}\Big)=\mathbb{E}_{\mu^{\otimes}}\Big[\prod_{\{i,j\}\in V}\exp\Big(-\sum_{x\in\mathbf{X}_{C_{i}},\,y\in\mathbf{X}_{C_{j}}}f(x,y)\Big)\Big].$$

$$\mathbf{Y}:=\{\mathbf{X}_{C_{i}},\ i=1,\dots,n,\text{ and }U_{i,j},\ \{i,j\}\in V\}.$$

$$\mathcal{U}(\mathbf{X}_{C_{i}}(\omega)\cup\mathbf{X}_{C_{j}}(\omega))=\mathcal{U}(\mathbf{X}_{C_{i}}(\omega))+\mathcal{U}(\mathbf{X}_{C_{j}}(\omega)),$$

$$\mathbf{X}_{C_{i}}=\varnothing,\quad\text{for all }\{i,j\}\in\partial\mathsf{Res}\text{ with }i\in\mathcal{I}(\mathsf{Res}).$$

$$\frac{\mathsf{Vol}(C_{i})}{r^{d}}\leq b\quad\text{for all }r.$$

$$\mathbb{P}_{\mu^{\otimes}}(\mathbf{X}_{C_{i}}=\varnothing)=\exp(-\kappa\,\mathsf{Vol}(C_{i}))/Z_{i},$$

p

Full text

Perfect Sampling for Gibbs Point Processes Using Partial Rejection Sampling

Sarat B. Moka

School of Mathematics and Physics

The University of Queensland, Brisbane

Dirk P. Kroese

School of Mathematics and Physics

The University of Queensland, Brisbane

Keywords— Perfect sampling, Partial-rejection sampling, Hard-core process, Strauss process, Pairwise interaction process, Area-interaction process, Penetrable spheres mixture model

1 Introduction

Various phenomena in physics, chemistry and biology are modelled by Gibbs point processes. A Gibbs point process (or simply, Gibbs process) is a spatial point process whose distribution is absolutely continuous with respect to that of a Poisson point process (PPP). Pairwise interaction point (PIP) processes and penetrable spheres mixture (PSM) models are two widely studied examples of Gibbs processes; see, e.g., Møller and Waagepetersen, (2004); Huber, (2016); Kendall and Møller, (2000); Baddeley and Nair, (2012); Baddeley and Turner, (2005). The PIP family includes hard-core processes and Strauss processes.

Perfect sampling for Gibbs processes is an active area of research. A sampling algorithm for a given distribution is called perfect if it generates an exact sample from this distribution in finite time. We refer to Kendall, (1998); Fill, (1998); Kendall and Møller, (2000); Garcia, (2000); Ferrari et al., (2002); Huber, (2012); Moka et al., (2017); Guo and Jerrum, (2018) for some of the existing perfect sampling algorithms for Gibbs processes. The methods in Moka et al., (2017) and Guo and Jerrum, (2018) generate perfect samples of hard-core processes, while the other methods cited above apply to more general Gibbs processes, including PIP processes and PSM models. Among these, the dominated coupling from the past (dCFTP) methods of Kendall, (1998), Kendall and Møller, (2000) and Huber, (2012) have been shown to be efficient when the density of the points is small; see, for example, Huber, (2016). As we show in this paper, for an interaction range $2r$ of the target Gibbs process and dimension $d$ of the points, the expected running time complexity of any dCFTP method is at least of order $\frac{1}{r^{d}}\log\left(\frac{1}{r}\right)$, even when the density of the reference PPP is very small. Moreover, dCFTP algorithms are sequential and thus cannot take advantage of parallel computing. In this paper, we propose a method for generating perfect samples of PIP processes and PSM models using the partial rejection sampling (PRS) method of Guo et al., (2017), and show how, using parallel computing, one can obtain an expected running time complexity of $O(\log(1/r))$, provided that the density of the reference PPP is not too high.

The PRS method provides a general methodology for generating perfect samples from a product distribution, conditioned on none of a number of bad events occurring. Such problems are in general NP-hard; see, e.g., Bezáková et al., (2016) and Guo et al., (2017). However, for certain types of parametric product distributions, the PRS algorithm is efficient and terminates within $O(\log n)$ iterations on average, where $n$ is the number of bad events. An additional feature of the PRS algorithm is that, unlike the dCFTP methods, it is distributed, in the sense that it allows parallel computation within each iteration. As a consequence, the PRS algorithm can be implemented with $O(\log n)$ expected running time complexity. Exploiting this distributed property, we use the PRS algorithm to generate perfect samples of Gibbs processes on a Euclidean subset $S$. Our contributions are briefly summarized as follows:

•

We partition $S$ into a finite number of cells and define a product measure by ignoring the cross interactions between the cells. Further, by defining appropriate bad events that depend on the cross interactions, we express the distribution of the target Gibbs process as the product distribution conditioned on none of the bad events occurring. This construction allows the generation of perfect samples using PRS.

•

To analyze the running time complexity of the proposed algorithm, we take $S=[0,1]^{d}$ and the intensity of the reference PPP as $\kappa=\frac{\kappa_{0}}{\mathsf{v}_{d}r^{d}}$ for some constant $\kappa_{0}$, where $2r$ is the interaction range of the Gibbs process and $\mathsf{v}_{d}$ is the volume of a $d$-dimensional ball of unit radius. We consider the regime where $\kappa_{0}$ is fixed and $r$ goes to zero, and prove that if the volume of each cell is of order $r^{d}$, there exists a constant $\bar{\kappa}>0$ such that for all $\kappa_{0}\leq\bar{\kappa}$, the expected running time complexity of the algorithm is $O\left(\log\frac{1}{r}\right)$ as a function of $r$.

•

To illustrate the application of the proposed algorithm, we consider a $d$ -dimensional cubic grid partitioning of $S=[0,1]^{d}$ and conduct extensive simulations to estimate the expected number of iterations of the algorithm for different values of $\kappa_{0}$ and the interaction range $2r$ .
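The intensity scaling in the second contribution can be made concrete with a small Python sketch. The helper names `unit_ball_volume` and `reference_intensity` are hypothetical; the code only evaluates $\mathsf{v}_{d}=\pi^{d/2}/\Gamma(d/2+1)$ and $\kappa=\kappa_{0}/(\mathsf{v}_{d}r^{d})$. Since $\kappa\,\mathsf{v}_{d}r^{d}=\kappa_{0}$, the fixed constant $\kappa_{0}$ is the expected number of reference points in a ball of radius $r$.

```python
import math


def unit_ball_volume(d: int) -> float:
    """Volume v_d of the d-dimensional ball of unit radius: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def reference_intensity(kappa0: float, r: float, d: int) -> float:
    """kappa = kappa0 / (v_d * r^d): intensity of the reference PPP for a given r."""
    return kappa0 / (unit_ball_volume(d) * r ** d)


# Since Vol([0,1]^d) = 1, kappa is also the expected total number of reference points.
kappa = reference_intensity(kappa0=0.5, r=0.05, d=2)  # 200 / pi, about 63.7
```

With $d=2$ and $\kappa_{0}$ fixed, halving $r$ multiplies $\kappa$ by four, which is why the number of cells (and bad events) grows as $r\to 0$.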

To the best of our knowledge, this is the first method for PIP processes and PSM models with $O\left(\log\frac{1}{r}\right)$ running time complexity. The method of Guo and Jerrum, (2018) is a continuous version of the PRS algorithm; it has the same order of expected number of iterations as our algorithm, but is restricted to hard-core processes. One of our simulation results compares the expected number of iterations of the proposed method with that of the method of Guo and Jerrum, (2018) for a hard-core process.

The remainder of the paper is organized as follows. In Section 2, we introduce notation that is used throughout the paper. Section 3 provides definitions of the spatial point processes of interest. In Section 4, we present the PRS method and illustrate its application with an example. In Section 5, we propose our new perfect sampling method for Gibbs processes using PRS, and in Section 6, we analyze its running time complexity. Simulation results for the Strauss process and the PSM model are presented in Section 7. Section 8 concludes the paper.

2 Notation

First, some notation. $\mathbb{R}_{+}$ is the set of non-negative real numbers and $\mathbb{Z}_{+}$ is the set of non-negative integers. $\mathbb{R}^{d}$ denotes the $d$ -dimensional Euclidean space with the corresponding Euclidean norm $\|\cdot\|$ . The distance between any two sets $C,D\subseteq\mathbb{R}^{d}$ is defined by

$$\mathsf{Dist}(C,D)=\inf\{\|x-y\|:x\in C\text{ and }y\in D\},$$

with $\mathsf{Dist}(\varnothing,C)=\infty$, where $\varnothing$ denotes the empty set. We use $\mathrm{e}$ to denote $\exp(1)$. For any $x\in\mathbb{R}_{+}$, $\lfloor x\rfloor$ is the largest $n\in\mathbb{Z}_{+}$ such that $n\leq x$. For any two probability measures $\mu_{1}$ and $\mu_{2}$ defined on the same measurable space, we write $\mu_{1}\ll\mu_{2}$ to denote that $\mu_{1}$ is absolutely continuous with respect to $\mu_{2}$. We write $X\sim\mu_{1}$ to indicate that the distribution of a random object $X$ is $\mu_{1}$. The distributions of a Bernoulli random variable with success probability $p$, a uniform random variable over $(0,1)$ and a Poisson random variable with mean $\lambda$ are denoted, respectively, by $\mathsf{Bern}(p)$, $\mathsf{Unif}(0,1)$ and $\mathsf{Poi}(\lambda)$. For any event $A$, $\mathbb{I}(A)$ is equal to $1$ if the event holds, and $0$ otherwise.
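For finite point configurations, $\mathsf{Dist}$ reduces to a minimum over all cross pairs. A minimal Python sketch, assuming both sets are given as lists of coordinate tuples (the name `dist_sets` is hypothetical); it also encodes the convention $\mathsf{Dist}(\varnothing,C)=\infty$:

```python
import math
from itertools import product
from typing import Sequence

Point = Sequence[float]


def dist_sets(C: Sequence[Point], D: Sequence[Point]) -> float:
    """Dist(C, D) = inf{||x - y|| : x in C, y in D} for finite point sets C and D.

    Uses the convention Dist(empty set, C) = infinity.
    """
    if len(C) == 0 or len(D) == 0:
        return math.inf
    return min(math.dist(x, y) for x, y in product(C, D))


print(dist_sets([(0.0, 0.0)], [(3.0, 4.0), (1.0, 0.0)]))  # 1.0
```

General sets such as the cells $C_{i}$ need a geometric formula instead of enumeration; for finite configurations the brute-force minimum is enough.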

3 Spatial Point Processes

Consider a finite measure $\nu$ on a Euclidean subset $S\subseteq\mathbb{R}^{d}$ that is absolutely continuous with respect to the Lebesgue measure. Let $\mathscr{G}$ be the set of all finite subsets of $S$, defined by

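$$\mathscr{G}:=\Big\{\mathbf{x}=\{x_{1},x_{2},\dots,x_{n}\}:n\in\mathbb{Z}_{+}\text{ and }x_{i}\in S,\ \forall i\leq n\Big\},$$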

where $n=0$ corresponds to the empty set. We assume that the elements of $\mathscr{G}$ are simple, that is, they contain no multiple points. For any $\mathbf{x}\in\mathscr{G}$ and $A\subseteq S$, $|\mathbf{x}_{A}|$ denotes the cardinality of $\mathbf{x}_{A}:=\mathbf{x}\cap A$. A point process is a random element $\mathbf{X}:\Omega\to\mathscr{G}$.

Poisson point process (PPP): A point process ${\mathbf{X}}$ is called Poisson on $S$ with intensity measure $\nu$ if it satisfies the following two properties:

(i)

$|\mathbf{X}_{A}|\sim\mathsf{Poi}(\nu(A))$ for any measurable $A\subseteq S$, and

(ii)

$|\mathbf{X}_{A_{1}}|,\dots,|\mathbf{X}_{A_{n}}|$ are independent if $A_{1},\dots,A_{n}$ are measurable disjoint subsets of $S$ .

A PPP is called $\kappa$ -homogeneous if the intensity ${\nu(\mathrm{d}x)=\kappa\,\mathrm{d}x}$ for some constant $\kappa>0$ .
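A $\kappa$-homogeneous PPP on the unit cube can be simulated with the standard two-step construction: draw the total count from property (i) and then place that many i.i.d. uniform points. The sketch below assumes $S=[0,1]^{d}$ (so $\nu(S)=\kappa$) and uses NumPy's `Generator`; the function name is hypothetical.

```python
import numpy as np


def sample_homogeneous_ppp(kappa: float, d: int, rng: np.random.Generator) -> np.ndarray:
    """Draw a kappa-homogeneous PPP on the unit cube [0,1]^d.

    Vol([0,1]^d) = 1, so by property (i) the number of points is Poi(kappa);
    given that number, the points are i.i.d. uniform on the cube.
    """
    n = rng.poisson(kappa)
    return rng.random((n, d))  # shape (n, d); n may be 0


rng = np.random.default_rng(seed=0)
X = sample_homogeneous_ppp(kappa=50.0, d=2, rng=rng)
```

Restricting such a draw to disjoint cells gives independent PPPs on the cells, which is the property (ii) used later when the space is partitioned.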

In several scenarios, it is important to associate an independent mark with each point of a PPP to characterize the shape or type of the object at that point. A marked PPP on $S$ is a PPP in which each point carries a (random) mark that is independent of all other points; the distribution of the mark may, however, depend on the location of the point. For example, the mark at a point may denote the radius of a circle centered at that point. A typical realization of a marked PPP with $n$ points is of the form $\mathbf{x}=\{(z_{1},m_{1}),(z_{2},m_{2}),\dots,(z_{n},m_{n})\}$, where $\{z_{1},z_{2},\dots,z_{n}\}\in\mathscr{G}$ and $m_{i}$ is the mark associated with $z_{i}$ for $i=1,\dots,n$. For such a marked configuration, we define $\mathbf{x}_{A}=\{(z_{i},m_{i})\in\mathbf{x}:z_{i}\in A\}$ for any $A\subseteq S$. If the mark space is $M$, then it is easy to see that the marked PPP is a PPP on $S\times M$.

It is a common approach in statistical physics to wrap $S$ on a torus (that is, to give $S$ a periodic boundary) when large interacting particle systems are considered. Whenever this is done in the paper, the Euclidean distance is replaced by the geodesic distance on the torus.
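On the unit torus, the geodesic distance takes, along each axis, the shorter of the two ways around. A minimal sketch, assuming $S=[0,1)^{d}$ with periodic boundary (the name `torus_dist` is hypothetical); it can be substituted for the Euclidean norm in any interaction function:

```python
import math
from typing import Sequence


def torus_dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Geodesic distance between x and y on the unit torus [0,1)^d."""
    total = 0.0
    for xi, yi in zip(x, y):
        delta = abs(xi - yi) % 1.0
        delta = min(delta, 1.0 - delta)  # shorter of the two ways around this axis
        total += delta * delta
    return math.sqrt(total)
```

For example, the points $(0.05,0.5)$ and $(0.95,0.5)$ are at Euclidean distance $0.9$ but at geodesic distance $0.1$.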

Gibbs point process: Suppose that $\rho$ is the distribution of a (marked) PPP. A point process with distribution ${\mu\ll\rho}$ is called a Gibbs point process (or simply, Gibbs process) if the associated Radon-Nikodym derivative is of the form

$$\frac{\mathrm{d}\mu}{\mathrm{d}\rho}(\mathbf{x})=\frac{\exp\left(-\mathcal{U}(\mathbf{x})\right)}{Z},$$

for every possible realization $\mathbf{x}$ under $\rho$ , where $\mathcal{U}$ is a non-negative real-valued potential function that is non-degenerate (i.e., ${\mathcal{U}(\{x\})<\infty}$ ), and hereditary (i.e., ${\mathcal{U}(\mathbf{x})\leq\mathcal{U}(\mathbf{x}^{\prime})}$ for all $\mathbf{x}\subseteq\mathbf{x}^{\prime}$ ). The normalizing constant $Z$ is equal to $\mathbb{E}_{\rho}\left[\exp\left(-\mathcal{U}(\mathbf{X})\right)\right]$ .

Pairwise interaction point (PIP) processes: A pairwise interaction point (PIP) process is a Gibbs point process for which the potential function is of the form

$$\mathcal{U}(\mathbf{x})=\sum_{\{x,y\}\subseteq\mathbf{x}}f(x,y),\quad\mathbf{x}\in\mathscr{G},\tag{2}$$

where $f:\mathbb{R}^{d}\times\mathbb{R}^{d}\rightarrow\mathbb{R}_{+}\cup\{\infty\}$ is called the pairwise interaction function; see, e.g., Chiu et al., (2013). We say that a PIP process has a finite interaction range if there exists $a<\infty$ such that $f(x,y)=0$ for all $x,y\in S$ with $\|x-y\|\geq a$; that is, the interaction between any two points is zero if they are separated by a distance of at least $a$. The smallest such $a$ is called the interaction range of the PIP process. Some important PIP processes are considered below.

Hard-core process: A hard-core process with hard-core distance $2r>0$ (that is, the hard-core radius is $r$ ) has

$$f(x,y)=\begin{cases}\infty, & \text{if }\|x-y\|<2r,\\ 0, & \text{otherwise}.\end{cases}$$

In a hard-core process, no two points are within a distance $2r$ of each other. Note that the interaction range here is $2r$. One generalization of the hard-core process is the hard-sphere model with random radii, where the centers of spheres with independent and identically distributed (i.i.d.) radii constitute a PPP on $S$ conditioned on the event that no two spheres overlap.

Strauss process: Another well-studied PIP process is the Strauss process with parameters $\gamma\in[0,1]$ and $r>0$ . Here the interaction function is defined by

$$f(x,y)=\begin{cases}-\log\gamma, & \text{if }\|x-y\|<2r,\\ 0, & \text{otherwise}.\end{cases}$$

The interaction range of this PIP process is $2r$. It reduces to a hard-core process when $\gamma=0$, with the convention that $0^{0}=1$.

Strauss-hard core process: This PIP process is a hybrid of the Strauss and hard-core processes, and has interaction function

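$$f(x,y)=\begin{cases}\infty, & \text{if }\|x-y\|<a_{1},\\ -\log\gamma, & \text{if }a_{1}\leq\|x-y\|<a_{2},\\ 0, & \text{otherwise},\end{cases}$$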

for some $\gamma\in[0,1]$ and $0<a_{1}<a_{2}$. Here $a_{1}$ is called the hard-core distance. Clearly, the interaction range for this process is $a_{2}$.
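The three interaction functions above, together with the PIP potential (2), can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: points are assumed to be coordinate tuples, the factory names are hypothetical, and $\gamma=0$ is mapped to an infinite penalty so that $\exp(-\mathcal{U})=0$ for any close pair.

```python
import math
from itertools import combinations
from typing import Callable, Sequence

Point = Sequence[float]
Interaction = Callable[[Point, Point], float]


def _penalty(gamma: float) -> float:
    # -log(gamma), with gamma = 0 giving an infinite penalty (hard-core behaviour)
    return math.inf if gamma == 0 else -math.log(gamma)


def hard_core(r: float) -> Interaction:
    # f(x, y) = inf if ||x - y|| < 2r, and 0 otherwise
    return lambda x, y: math.inf if math.dist(x, y) < 2 * r else 0.0


def strauss(gamma: float, r: float) -> Interaction:
    # f(x, y) = -log(gamma) if ||x - y|| < 2r, and 0 otherwise
    pen = _penalty(gamma)
    return lambda x, y: pen if math.dist(x, y) < 2 * r else 0.0


def strauss_hard_core(gamma: float, a1: float, a2: float) -> Interaction:
    # inf below a1, -log(gamma) on [a1, a2), and 0 from a2 on
    pen = _penalty(gamma)

    def f(x: Point, y: Point) -> float:
        dxy = math.dist(x, y)
        if dxy < a1:
            return math.inf
        return pen if dxy < a2 else 0.0

    return f


def pip_potential(points: Sequence[Point], f: Interaction) -> float:
    """U(x) = sum of f(x, y) over all unordered pairs {x, y} of points in x."""
    return float(sum(f(x, y) for x, y in combinations(points, 2)))


pts = [(0.0, 0.0), (0.05, 0.0), (0.5, 0.5)]
U = pip_potential(pts, strauss(gamma=0.5, r=0.05))  # one pair within 2r, so U = log 2
```

For the Strauss process the Gibbs weight $\exp(-\mathcal{U}(\mathbf{x}))$ equals $\gamma$ raised to the number of pairs closer than $2r$; in the example this is $0.5$.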

Penetrable spheres mixture (PSM) model: This model was introduced by Widom and Rowlinson, (1970) to study liquid-vapor phase transitions. Let $\rho$ be the distribution of a $\kappa$-homogeneous marked PPP with $\kappa=\kappa_{1}+\kappa_{2}$, where each point is independently marked either as type-1 (with probability $\kappa_{1}/(\kappa_{1}+\kappa_{2})$) or as type-2 (with probability $\kappa_{2}/(\kappa_{1}+\kappa_{2})$), for some constants $\kappa_{1},\kappa_{2}\geq 0$. A realization of a PSM model can be viewed as a realization of $\mathbf{X}\sim\rho$ conditioned on the event that no two points of different types are within a distance $2r$ of each other; that is, the corresponding potential function is given by (2) with

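$$f(x,y)=\begin{cases}\infty, & \text{if }\|x-y\|<2r\text{ and }x,y\text{ have different marks},\\ 0, & \text{otherwise}.\end{cases}$$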
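The PSM reference measure and its interaction function can be sketched as follows, assuming $S=[0,1]^{d}$, $\kappa_{1}+\kappa_{2}>0$, and marks encoded as the integers 1 and 2. The names `psm_interaction` and `sample_psm_reference` are hypothetical.

```python
import math
from typing import List, Tuple

import numpy as np

MarkedPoint = Tuple[Tuple[float, ...], int]  # (location, mark), mark in {1, 2}


def psm_interaction(r: float):
    """f(p, q) = inf if p and q have different marks and lie closer than 2r, else 0."""
    def f(p: MarkedPoint, q: MarkedPoint) -> float:
        (x, mark_x), (y, mark_y) = p, q
        return math.inf if mark_x != mark_y and math.dist(x, y) < 2 * r else 0.0
    return f


def sample_psm_reference(kappa1: float, kappa2: float, d: int,
                         rng: np.random.Generator) -> List[MarkedPoint]:
    """Reference measure rho: a (kappa1 + kappa2)-homogeneous PPP on [0,1]^d whose
    points are independently of type 1 w.p. kappa1/(kappa1 + kappa2), else type 2."""
    kappa = kappa1 + kappa2
    n = rng.poisson(kappa)
    locations = rng.random((n, d))
    marks = np.where(rng.random(n) < kappa1 / kappa, 1, 2)
    return [(tuple(map(float, loc)), int(m)) for loc, m in zip(locations, marks)]
```

A draw from $\rho$ is admissible under the PSM model exactly when the summed interaction over all pairs is finite, i.e. no type-1 point is within $2r$ of a type-2 point.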

Area-interaction process: This process was first studied by Baddeley and van Lieshout, (1995) (see also Kendall and Møller, (2000), Ferrari et al., (2002) and Møller, (2001)). For any $A\subseteq\mathbb{R}^{d}$, let $\mathsf{Vol}(A)$ be the volume of $A$ and $\mathsf{Ball}(x,a)$ be the $d$-dimensional ball centered at $x$ with radius $a$. The distribution of an area-interaction process on $S$ is absolutely continuous with respect to that of a $\lambda$-homogeneous PPP for some $\lambda>0$, with the potential function given by

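$$\mathcal{U}(\mathbf{x})=\beta\,\mathsf{Vol}\Big(\bigcup_{x\in\mathbf{x}}\mathsf{Ball}(x,2r)\Big),\quad\mathbf{x}\in\mathscr{G},\tag{3}$$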

where the constant $\beta>0$ is called inverse temperature; see Figure 1 (a). The definition of area-interaction process given in Baddeley and van Lieshout, (1995) is more general, as it allows $\beta<0$ . However, in this paper, we focus only on the case $\beta>0$ .

There is an interesting connection between area-interaction processes and PSM models. To see this, instead of (3), if we suppose that the potential function is

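$$\mathcal{U}(\mathbf{x})=\beta\,\mathsf{Vol}\Big(S\cap\Big(\bigcup_{x\in\mathbf{x}}\mathsf{Ball}(x,2r)\Big)\Big),\quad\mathbf{x}\in\mathscr{G},\tag{4}$$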

then the distribution of this modified area-interaction process is identical to the distribution of the type-$1$ points of the PSM model with $\kappa=\lambda+\beta$, $\kappa_{1}=\lambda$ and $\kappa_{2}=\beta$; see Figure 1 (b). This is because, by property (i) in the definition of PPPs, for any $\mathbf{x}\in\mathscr{G}$, the probability that no point of a $\beta$-homogeneous PPP falls within the set $S\cap\left(\cup_{x\in\mathbf{x}}\mathsf{Ball}(x,2r)\right)$ is equal to $\exp\left(-\beta\,\mathsf{Vol}\left(S\cap\left(\cup_{x\in\mathbf{x}}\mathsf{Ball}(x,2r)\right)\right)\right)$. A further interesting fact is that if $S$ is periodic, (3) and (4) coincide. Hence, under the periodic assumption, an area-interaction process can be viewed as one type of points of a PSM model, and vice versa. This is why area-interaction processes are sometimes referred to as PSM models.
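The potential (4) involves the volume of a union of balls, which has no simple closed form in general. A hedged sketch for $S=[0,1]^{d}$ estimates it by plain Monte Carlo (our illustrative choice, not a method from the paper; the function name is hypothetical), and then evaluates the void probability $\exp(-\mathcal{U}(\mathbf{x}))$ used in the argument above:

```python
import numpy as np


def area_potential_mc(points: np.ndarray, r: float, beta: float,
                      n_samples: int, rng: np.random.Generator) -> float:
    """Monte Carlo estimate of beta * Vol(S ∩ union of Ball(x, 2r)) with S = [0,1]^d.

    points has shape (m, d). Because Vol(S) = 1, the fraction of uniform samples in S
    lying within distance 2r of some point estimates the covered volume.
    """
    if len(points) == 0:
        return 0.0
    u = rng.random((n_samples, points.shape[1]))
    # squared distance from every sample to every point, shape (n_samples, m)
    sq = ((u[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    covered = (sq < (2 * r) ** 2).any(axis=1)
    return float(beta * covered.mean())


rng = np.random.default_rng(0)
U_hat = area_potential_mc(np.array([[0.5, 0.5]]), r=0.1, beta=1.0, n_samples=200_000, rng=rng)
void_prob = np.exp(-U_hat)  # P(no point of a beta-homogeneous PPP in the covered set)
```

Here a single ball of radius $0.2$ lies inside $S$, so the estimate should be close to $\pi\cdot 0.2^{2}\approx 0.126$; near the boundary of $S$ the estimate targets (4) rather than (3).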

In the definition of PSM models, type-$1$ and type-$2$ points are independent PPPs on $S$ with intensities $\kappa_{1}$ and $\kappa_{2}$, respectively. Suppose instead that the type-$2$ points constitute a $\kappa_{2}$-homogeneous PPP on a bigger set $S(r)$ such that

$$\bigcup_{x\in S}\mathsf{Ball}(x,2r)\subseteq S(r)$$

(when $S=[0,1]^{d}$ , $S(r)$ can be $[-2r,1+2r]^{d}$ ). Then, with the choice of $\kappa_{1}=\lambda$ and $\kappa_{2}=\beta$ , the distribution of type- $1$ points of this modified PSM model is identical to the distribution of the area-interaction process.

4 Partial Rejection Sampling

In this section, we briefly discuss the partial rejection sampling (PRS) method proposed in Guo et al., (2017, Section 4). This method generates perfect samples from a product distribution, conditioned on none of a number of bad events happening.

To be precise, let $\mathbf{Y}=\{Y_{1},Y_{2},\dots,Y_{n}\}$ be a set of easy-to-simulate independent random objects on a probability space $(\varOmega,\mathcal{F},\mathbb{P})$, taking values in $\mathcal{Y}$. Suppose that $\mu^{\otimes}$ is the distribution of $\mathbf{Y}$; clearly, $\mu^{\otimes}$ is a product distribution. Without loss of generality, we assume that $\mathcal{Y}$ is the support of $\mu^{\otimes}$. Let $\{B_{v}\in\mathcal{F}:v\in V\}$ be a set of bad events indexed by the elements of a finite set $V$. Each bad event $B_{v}$ depends on a subset of $\mathbf{Y}$. Let $\mathcal{I}(v)\subseteq\{1,2,\dots,n\}$ be the largest set such that $B_{v}$ depends on $Y_{i}$ for all $i\in\mathcal{I}(v)$; that is, $\mathcal{I}(v)$ is the smallest set such that the variables $\{Y_{i}:i\in\mathcal{I}(v)\}$ determine whether or not the event $B_{v}$ occurs. By definition, $B_{v}$ does not depend on $\left\{Y_{i}:i\in\{1,\dots,n\}\setminus\mathcal{I}(v)\right\}$. The goal of PRS is to generate perfect samples from $\mu^{\otimes}$, conditioned on the event that none of the bad events $\{B_{v}:v\in V\}$ occurs.

One can generate the desired samples using the naive rejection sampling algorithm: repeatedly generate a sample from $\mu^{\otimes}$ until none of the bad events occurs. The last sample has the desired distribution. In each iteration of this naive method, a fresh copy of the entire set $\mathbf{Y}$ is generated. In contrast, as we see below, each iteration of the PRS method resamples only a subset of $\mathbf{Y}$, chosen according to which bad events occurred in the previous iteration. This can significantly reduce the running time complexity compared with naive rejection sampling.
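For comparison with PRS, naive rejection sampling is easy to write down. A minimal sketch, assuming $\mu^{\otimes}$ is given by a function that draws the whole vector $\mathbf{Y}$ and each bad event by a predicate on that vector (all names are hypothetical; the `max_iter` cap is a safety guard added for the sketch, not part of the method):

```python
import random
from typing import Callable, List, Sequence

Assignment = List[int]


def naive_rejection_sampling(
    sample_all: Callable[[random.Random], Assignment],
    bad_events: Sequence[Callable[[Assignment], bool]],
    rng: random.Random,
    max_iter: int = 1_000_000,
) -> Assignment:
    """Redraw the whole vector Y ~ mu^⊗ until none of the bad events occurs."""
    for _ in range(max_iter):
        y = sample_all(rng)  # a fresh copy of every Y_i
        if not any(bad(y) for bad in bad_events):
            return y  # distributed as mu^⊗ conditioned on no bad event
    raise RuntimeError("no accepted sample within max_iter iterations")


# Hypothetical toy instance: three Bern(1/2) variables on a path 0 - 1 - 2,
# with a bad event for each edge whose two endpoints both equal 1.
rng = random.Random(0)
y = naive_rejection_sampling(
    lambda g: [int(g.random() < 0.5) for _ in range(3)],
    [lambda y: y[0] == 1 and y[1] == 1, lambda y: y[1] == 1 and y[2] == 1],
    rng,
)
```

The acceptance probability of a whole draw typically decays exponentially in the number of bad events, which is the cost PRS avoids by resampling locally.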

For any ${u,v\in V}$ , we write $u\leftrightarrow v$ if $\mathcal{I}(u)\cap\mathcal{I}(v)\neq\varnothing$ . Define the dependency graph $G=(V,E)$ to be the graph, with the vertex set $V$ and edge set $E$ given by

$$E=\big\{\{u,v\}:u,v\in V,\ u\neq v\text{ and }u\leftrightarrow v\big\}.$$

That is, there is an edge between two nodes in $V$ if the bad events associated with the nodes depend on at least one common random object. For $\omega\in\varOmega$ , let

$$\mathsf{Bad}(\mathbf{Y}(\omega))=\{v\in V:\omega\in B_{v}\}.$$

For any subset ${W\subseteq V}$ , let $\partial W$ be the boundary of the set $W$ defined by

$$\partial W=\{v\in V:v\notin W\text{ and }\exists\,u\in W\text{ such that }u\leftrightarrow v\}.$$

Also define $\mathcal{I}(W)=\cup_{u\in W}\,\mathcal{I}(u)$, and for any assignment $\mathbf{y}=\{y_{i}:i=1,\dots,n\}\in\mathcal{Y}$ of $\mathbf{Y}$, let $\mathbf{y}|_{W}:=\{y_{i}:i\in\mathcal{I}(W)\}$ denote the partial assignment of $\mathbf{y}$ restricted to $\mathcal{I}(W)$. For any two assignments $\mathbf{y},\mathbf{y}^{\prime}$ of $\mathbf{Y}$, if $\mathbf{y}|_{W}=\mathbf{y}^{\prime}|_{W}$, then $\mathbf{y}^{\prime}$ is called an extension of $\mathbf{y}|_{W}$. Furthermore, an event $B_{v}$ is said to be disjoint from $\mathbf{y}|_{W}$ if either $\mathcal{I}(v)\cap\mathcal{I}(W)=\varnothing$ or $B_{v}$ cannot occur for any extension of $\mathbf{y}|_{W}$.

Algorithm 1 generates a perfect sample $\mathbf{Y}\sim\mu^{\otimes}$, conditioned on the event that none of the bad events $B_{v}$ occurs. In each iteration of Algorithm 1, the inner while-loop constructs the resampling set $\mathsf{Res}\subseteq V$. It starts with $\mathsf{Res}=\mathsf{Bad}(\mathbf{Y})$, where $\mathbf{Y}$ is the current assignment of the random objects, and repeatedly adds to $\mathsf{Res}$ every boundary vertex $v\in\partial\mathsf{Res}$ for which $B_{v}$ is not disjoint from $\mathbf{Y}|_{\mathsf{Res}}$, until there are no more boundary vertices to add. The final $\mathsf{Res}$ is the resampling set, and all the random objects with indices in $\bigcup_{u\in\mathsf{Res}}\mathcal{I}(u)$ are resampled. This construction is deterministic, in the sense that the final resampling set is a deterministic function of $\mathbf{Y}$.
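The inner while-loop described above can be sketched as a generic Python function. This is an illustration under stated assumptions, not Algorithm 1 itself: the dependency graph is given as a hypothetical adjacency dictionary `neighbors`, and the problem-specific disjointness test is passed in as a callable `is_disjoint(u, res)` that already knows the current assignment $\mathbf{Y}$.

```python
from typing import Callable, Dict, Hashable, Set

Vertex = Hashable


def resampling_set(
    bad: Set[Vertex],
    neighbors: Dict[Vertex, Set[Vertex]],
    is_disjoint: Callable[[Vertex, Set[Vertex]], bool],
) -> Set[Vertex]:
    """Grow Res from Bad(Y) as described for the inner while-loop of Algorithm 1.

    neighbors[v] holds every u with u <-> v in the dependency graph G.
    is_disjoint(u, res) must say whether B_u is disjoint from Y|_res; it is
    problem-specific and is assumed to see the current assignment Y.
    """
    res = set(bad)
    found_disjoint: Set[Vertex] = set()  # boundary vertices already ruled out
    while True:
        boundary = {u for v in res for u in neighbors[v]} - res - found_disjoint
        to_add = set()
        for u in boundary:
            # Fixing more coordinates only removes extensions, so a vertex found
            # disjoint here stays disjoint as Res grows and need not be re-checked.
            if is_disjoint(u, res):
                found_disjoint.add(u)
            else:
                to_add.add(u)
        if not to_add:
            return res
        res |= to_add
```

After the function returns, every $Y_{i}$ with $i\in\bigcup_{u\in\mathsf{Res}}\mathcal{I}(u)$ would be redrawn from its own distribution.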

We refer to Guo et al., (2017) for a proof of correctness of the algorithm. Note that in Guo et al., (2017) each $Y_{i}$ is a real-valued random variable, whereas in this paper we allow a more general setting by treating each $Y_{i}$ as a random object; the correctness proof carries over to this general case.

Example (Hard-core model on a lattice).

To illustrate the PRS algorithm, consider the following hard-core model defined on a square lattice. Each node $i$ of the lattice is associated with an independent Bernoulli random variable $Y_{i}\sim\mathsf{Bern}\left(\frac{\lambda}{1+\lambda}\right)$ for some $\lambda>0$. The node $i$ is said to be occupied if $Y_{i}=1$. Associated with each edge $\{i,j\}$, there is a bad event $B_{i,j}$, which holds if both endpoints $i$ and $j$ are occupied; see Figure 2. If we let $\mu^{\otimes}$ be the joint distribution of the $Y_{i}$'s, then using the PRS algorithm we can generate a sample $\mathbf{Y}\sim\mu^{\otimes}$ conditioned on none of the bad events occurring.

The corresponding dependency graph $G=(V,E)$ consists of $V$ , the set of all the edges in the lattice, and

$$E=\big\{\{v,u\}:v\neq u,\ v\text{ and }u\text{ are connected by a common node}\big\}.$$

Clearly, if $v=\{i,j\}\in V$, we have $\mathcal{I}(v)=\{i,j\}$. As shown in Section 6.2 of Guo et al., (2017), for this hard-core model it is easy to find the resampling set for any given set of bad events. Suppose that, at some iteration of the algorithm, $\mathsf{Bad}$ is the set of bad edges of the lattice. Then the resampling set $\mathsf{Res}$ is the union of $\mathsf{Bad}$ and $\partial\mathsf{Bad}$, where each edge in $\partial\mathsf{Bad}$ shares one endpoint with an edge in $\mathsf{Bad}$ and has its other endpoint unoccupied; see Figure 2. ∎
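The lattice example can be turned into a small end-to-end PRS sampler. The sketch below assumes an $n\times n$ lattice with free boundary and uses the characterization $\mathsf{Res}=\mathsf{Bad}\cup\partial\mathsf{Bad}$ stated above instead of the general inner loop; the function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def lattice_edges(n: int) -> List[Edge]:
    """Edges of an n x n square lattice with free boundary; node (a, b) has index a*n + b."""
    edges = []
    for a in range(n):
        for b in range(n):
            i = a * n + b
            if b + 1 < n:
                edges.append((i, i + 1))  # horizontal edge
            if a + 1 < n:
                edges.append((i, i + n))  # vertical edge
    return edges


def prs_hard_core_lattice(n: int, lam: float, rng: random.Random) -> Tuple[List[int], int]:
    """PRS for the lattice hard-core model; returns (occupancy vector Y, iterations)."""
    p_occ = lam / (1 + lam)  # Y_i ~ Bern(lam / (1 + lam))
    edges = lattice_edges(n)
    incident: Dict[int, List[Edge]] = {i: [] for i in range(n * n)}
    for e in edges:
        incident[e[0]].append(e)
        incident[e[1]].append(e)
    y = [int(rng.random() < p_occ) for _ in range(n * n)]
    iterations = 0
    while True:
        bad = [e for e in edges if y[e[0]] and y[e[1]]]  # B_{i,j}: both endpoints occupied
        if not bad:
            return y, iterations
        # Res = Bad ∪ ∂Bad, i.e. every edge touching an endpoint of a bad edge.
        bad_nodes = {i for e in bad for i in e}
        res = {e for i in bad_nodes for e in incident[i]}
        # Resample Y_i for every i in I(Res), the endpoints of the edges in Res.
        for i in sorted({i for e in res for i in e}):
            y[i] = int(rng.random() < p_occ)
        iterations += 1


y, iters = prs_hard_core_lattice(n=10, lam=0.25, rng=random.Random(1))
```

Only occupied endpoints of bad edges and their neighbours are redrawn in each iteration; all other nodes keep their values, in contrast with naive rejection sampling.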

As mentioned earlier, in general, generating a sample from $\mu^{\otimes}$ conditioned on none of the bad events happening is NP-hard. However, under some additional conditions, Algorithm 1 can be efficient in the sense that the expected number of iterations of the algorithm can be $O(\log|V|)$ ; see Guo et al., (2017). Lemma 1 deals with one such case. Refer to Guo et al., (2017, Section 5) for a proof. Let $A_{u,v}$ be the event that the partial assignment on $\mathcal{I}(v)\cap\mathcal{I}(u)$ can be extended to make $B_{v}$ occur, that is, with $W=\mathcal{I}(u)\cap\mathcal{I}(v)$ ,

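$$A_{u,v}=\{\omega\in\varOmega:\exists\,\omega^{\prime}\in B_{v}\text{ such that }\mathbf{Y}(\omega)|_{W}=\mathbf{Y}(\omega^{\prime})|_{W}\}.$$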

In particular, if $\{u,v\}\in E$ then $u\leftrightarrow v$ implies that $W=\mathcal{I}(u)\cap\mathcal{I}(v)\neq\varnothing$ and hence $A_{u,v}$ is the set of $\omega$ for which $B_{v}$ is not disjoint from $\mathbf{Y}(\omega)|_{W}$ .

Define $p=\max_{v\in V}\mathbb{P}_{\mu^{\otimes}}(B_{v})$ and $q=\max_{\{u,v\}\in E}\mathbb{P}_{\mu^{\otimes}}(A_{u,v})$. Let $\mathsf{Bad}_{t}$ and $\mathsf{Res}_{t}$ be, respectively, the set of bad vertices and the resampling set at iteration $t\geq 0$ of Algorithm 1. Further, let $\Delta$ be the maximum degree of the dependency graph $G$.

Lemma 1 (Lemma 5.4 of Guo et al., (2017)).

For any ${\Delta\geq 2}$ , if ${6\mathrm{e}p\Delta^{2}\leq 1}$ and ${3\mathrm{e}q\Delta\leq 1}$ , then for all $t\geq 0$ ,

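$$\mathbb{E}\left[|\mathsf{Res}_{t+1}|\,\big|\,\mathsf{Res}_{0},\dots,\mathsf{Res}_{t}\right]\leq(1-p)|\mathsf{Res}_{t}|.$$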

Note that $\mathsf{Bad}_{t}\subseteq\mathsf{Res}_{t}$ for all $t\geq 0$ . From Lemma 1 and the fact that $|\mathsf{Bad}_{0}|=|\mathsf{Res}_{0}|=|V|$ (since the algorithm starts with a fresh copy of all the random elements),

$$\mathbb{E}[|\mathsf{Bad}_{t}|]\leq\mathbb{E}[|\mathsf{Res}_{t}|]\leq(1-p)^{t}|V|,$$

for all $t\geq 0$ , under the hypothesis of the lemma. These observations are useful for the running time complexity analysis in Section 6.
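The geometric bound above translates into an iteration count that grows only logarithmically in $|V|$. A hedged sketch, taking the bound exactly as stated and using Markov's inequality $\mathbb{P}(\mathsf{Bad}_{t}\neq\varnothing)\leq\mathbb{E}[|\mathsf{Bad}_{t}|]$; the function name and the tolerance `eps` are hypothetical:

```python
import math


def prs_iteration_bound(p: float, q: float, delta: int, n_vertices: int, eps: float) -> int:
    """Smallest t with (1 - p)^t |V| <= eps, taking the bound exactly as stated above.

    Requires the hypotheses of Lemma 1. Since |Bad_t| is integer-valued, Markov's
    inequality gives P(Bad_t nonempty) <= E|Bad_t| <= (1 - p)^t |V|, so after t
    iterations no bad event remains with probability at least 1 - eps.
    """
    if not 0 < p < 1:
        raise ValueError("need 0 < p < 1")
    if delta < 2 or 6 * math.e * p * delta**2 > 1 or 3 * math.e * q * delta > 1:
        raise ValueError("hypotheses of Lemma 1 are not satisfied")
    if n_vertices <= eps:
        return 0
    return math.ceil(math.log(n_vertices / eps) / -math.log1p(-p))


t = prs_iteration_bound(p=0.001, q=0.01, delta=4, n_vertices=1000, eps=0.01)
```

Doubling $|V|$ adds only about $\log 2/(-\log(1-p))$ iterations, which is the $O(\log|V|)$ behaviour exploited in the running-time analysis.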

5 Perfect Sampling for Gibbs Point Processes

In this section, we propose a methodology to use the PRS algorithm for generating perfect samples of the Gibbs processes defined in Section 3. For this, we partition the underlying space $S$ and using this partition, we identify certain bad events such that the target distribution can be expressed as a product distribution conditioned on none of these bad events occurring. For the case where $S=[0,1]^{d}$ , we consider a cubic-grid partitioning and specialize the PRS algorithm.

Recall the definition of Gibbs process with distribution $\mu$ , given in Section 3. Assume that the corresponding potential function $\mathcal{U}$ has a finite interaction range $2r$ and

$$\mathcal{U}(\mathbf{x})=\sum_{\{x,y\}\subseteq\mathbf{x}}f(x,y),\quad\mathbf{x}\in\mathscr{G},\tag{6}$$

for a function $f:S\times S\rightarrow\mathbb{R}_{+}\cup\{\infty\}$ such that $f(x,y)=f(y,x)$. Recall that $\mu\ll\rho$, with $\rho$ being the distribution of a (marked) PPP. Clearly, both PIP processes and PSM models (defined in Section 3) are special cases of this description.

Suppose $\left\{C_{1},C_{2},\dots,C_{n}\right\}$ is a partition of $S$ (i.e., the $C_{i}$ ’s are mutually disjoint and $\cup_{i=1}^{n}C_{i}=S$ ). Let $V=\left\{v=\{i,j\}:\mathsf{Dist}(C_{i},C_{j})<2r\text{ and }i\neq j\right\}$ be the set of unordered pairs $\{i,j\}$ such that points that fall in $C_{i}$ can interact with points in $C_{j}$ and vice versa. As a consequence of (6), for any $\mathbf{x}\in\mathscr{G}$ ,

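$$\mathcal{U}(\mathbf{x})=\sum_{i=1}^{n}\mathcal{U}(\mathbf{x}_{C_{i}})+\sum_{\{i,j\}\in V}\ \sum_{x\in\mathbf{x}_{C_{i}}}\ \sum_{y\in\mathbf{x}_{C_{j}}}f(x,y).$$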

Hence,

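$$\mathbb{E}_{\rho}\left[\exp(-\mathcal{U}(\mathbf{X}))\right]=\mathbb{E}_{\rho}\left[\prod_{i=1}^{n}\exp\left(-\mathcal{U}(\mathbf{X}_{C_{i}})\right)\prod_{\{i,j\}\in V}\mathbb{I}\Big(U_{i,j}\leq\exp\Big(-\sum_{x\in\mathbf{X}_{C_{i}},\,y\in\mathbf{X}_{C_{j}}}f(x,y)\Big)\Big)\right],$$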

where $\{U_{i,j}:\{i,j\}\in V\}$ is a set of $i.i.d.$ $\mathsf{Unif(0,1)}$ random variables, independent of everything else.
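For the cubic-grid partition of $S=[0,1]^{d}$ mentioned at the start of this section, the index set $V$ can be computed directly from cell labels. The sketch below assumes $m^{d}$ congruent cells of side $1/m$, labelled by integer tuples, with no wrap-around; the helper names are hypothetical.

```python
import itertools
import math
from typing import FrozenSet, List, Set, Tuple

Cell = Tuple[int, ...]


def grid_cells(m: int, d: int) -> List[Cell]:
    """Integer labels of the m^d cubic cells of side 1/m that partition [0,1]^d."""
    return list(itertools.product(range(m), repeat=d))


def cell_distance(a: Cell, b: Cell, side: float) -> float:
    """Dist(C_a, C_b) for two grid cells: along each axis the gap is
    (|a_k - b_k| - 1) * side when positive, and 0 otherwise."""
    gaps = [max(0, abs(ak - bk) - 1) * side for ak, bk in zip(a, b)]
    return math.sqrt(sum(g * g for g in gaps))


def interacting_pairs(m: int, d: int, r: float) -> Set[FrozenSet[Cell]]:
    """V = {{i, j} : i != j and Dist(C_i, C_j) < 2r} for the cubic grid (no wrap-around)."""
    side = 1.0 / m
    return {
        frozenset((a, b))
        for a, b in itertools.combinations(grid_cells(m, d), 2)
        if cell_distance(a, b, side) < 2 * r
    }


V = interacting_pairs(m=3, d=2, r=0.15)  # 2r = 0.3 < 1/3: only touching cells interact
```

When the cell side is at least $2r$, only cells sharing a face, edge or corner interact, so each cell appears in at most $3^{d}-1$ pairs; this keeps the dependency graph of bounded degree as $r\to 0$.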

For each $i$, let $\rho_{i}$ be the distribution of the reference (marked) PPP restricted to the cell $C_{i}$; that is, if $\mathbf{X}\sim\rho$, then $\rho_{i}$ is the distribution of $\mathbf{X}_{C_{i}}$, and $\mathbf{X}_{C_{i}}$ and $\mathbf{X}_{C_{j}}$ are independent when $i\neq j$ (see property (ii) in the definition of PPPs). Now let $\mu_{i}$ be the distribution of a Gibbs process on $C_{i}$ such that $\mu_{i}\ll\rho_{i}$, with interaction range $2r$ and potential function $\mathcal{U}(\mathbf{x})=\sum_{\{x,y\}\subseteq\mathbf{x}}f(x,y)$ for all finite subsets $\mathbf{x}\subseteq C_{i}$. This means that $\mu_{i}$ is the distribution of the target Gibbs process restricted to $C_{i}$. Furthermore, define bad events

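$$B_{i,j}=\Big\{\omega\in\varOmega:\ U_{i,j}(\omega)>\exp\Big(-\sum_{x\in\mathbf{X}_{C_{i}}(\omega),\,y\in\mathbf{X}_{C_{j}}(\omega)}f(x,y)\Big)\Big\},\qquad\{i,j\}\in V.$$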
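A minimal Python sketch of the bad-event test, assuming that $B_{i,j}$ occurs when $U_{i,j}$ exceeds $\exp\big(-\sum_{x\in\mathbf{X}_{C_i},\,y\in\mathbf{X}_{C_j}}f(x,y)\big)$ and that $f$ is the Strauss-type term with $0\leq\gamma\leq 1$ (our choice for illustration). For fixed configurations, a simulation confirms that $B_{i,j}$ fails with probability $\exp(-\text{cross energy})$, which is the identity behind writing $\exp(-\mathcal{U})$ as a product over cells times a probability over the $U_{i,j}$. The helper names are hypothetical.

```python
import math
import random


def cross_energy(pts_i, pts_j, r, gamma):
    """Sum of f(x, y) over x in cell i and y in cell j (assumed Strauss-type f)."""
    total = 0.0
    for x in pts_i:
        for y in pts_j:
            if math.dist(x, y) < 2 * r:
                if gamma == 0:
                    return math.inf  # hard-core: any close pair forbids this pair of cells
                total -= math.log(gamma)
    return total


def is_bad(pts_i, pts_j, u_ij, r, gamma):
    """B_{i,j} occurs when U_{i,j} exceeds exp(-cross energy of cells i and j)."""
    return u_ij > math.exp(-cross_energy(pts_i, pts_j, r, gamma))


rng = random.Random(1)
cell_a = [(0.09, 0.05)]
cell_b = [(0.11, 0.05), (0.12, 0.06)]  # two pairs within 2r = 0.1 of cell_a's point
trials = 100_000
good = sum(not is_bad(cell_a, cell_b, rng.random(), 0.05, 0.5) for _ in range(trials))
print(good / trials)  # should be close to 0.5 ** 2 = 0.25
```

An empty cell makes the cross energy zero, so the threshold is $1$ and the event can never occur, matching the argument used later in the running-time analysis.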

Let $\rho=\rho_{1}\times\dots\times\rho_{n}$ and $\mu^{\otimes}:=\mu_{1}\times\mu_{2}\times\dots\times\mu_{n}$ . From the definition of $\mu$ , $\mu\ll\mu^{\otimes}$ and

[TABLE]

Equivalently, $\mu$ is equal to the distribution $\mu^{\otimes}$ conditioned on none of the bad events $B_{i,j}$ happening. Here, the normalizing constant is

[TABLE]

Since $\mu^{\otimes}$ is a product measure, if it is possible to generate samples from $\mu_{i}$ ’s, we can use the PRS, Algorithm 1, to generate samples from $\mu$ . The corresponding dependency graph is $G=(V,E)$ , where, from the definition, $E=\{\{u,v\}:u,v\in V\text{ and }u\leftrightarrow v\}$ with $B_{\{i,j\}}=B_{i,j}$ .
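For the cubic grid of Subsection 5.1 with $2r=1/K$, the index set $V$ and the dependency graph $G=(V,E)$ can be built directly from cell indices. This Python sketch assumes cells are labelled by integer coordinates in $\{0,\dots,K-1\}^{d}$, that two cells are adjacent when their indices differ by at most one in every coordinate, and that two events are neighbours in $G$ when they share a cell. Function names are illustrative.

```python
from itertools import combinations, product


def grid_events(K, d=2):
    """V: unordered pairs of distinct, adjacent cells of a K^d cubic grid."""
    cells = list(product(range(K), repeat=d))
    return [frozenset((a, b)) for a, b in combinations(cells, 2)
            if max(abs(s - t) for s, t in zip(a, b)) == 1]


def dependency_degrees(V):
    """Degree of each event u in G = (V, E): number of other events sharing a cell."""
    by_cell = {}
    for e in V:
        for c in e:
            by_cell.setdefault(c, set()).add(e)
    return {u: len(set().union(*(by_cell[c] for c in u)) - {u}) for u in V}


V = grid_events(K=5, d=2)
print(len(V), max(dependency_degrees(V).values()))  # |V| and the maximum degree
```

Since each cell has at most $3^{d}-1$ neighbours, an event shares a cell with at most $2(3^{d}-2)$ other events, so the maximum degree does not grow with $K$.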

To complete the argument, we need to spell out how to identify the resampling subset $\mathsf{Res}(\mathbf{Y}(\omega))$ of $V$ for every $\omega\in\varOmega$ , where

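$$\mathbf{Y}(\omega)=\Big(\mathbf{X}_{C_{1}}(\omega),\dots,\mathbf{X}_{C_{n}}(\omega),\ \big(U_{i,j}(\omega)\big)_{\{i,j\}\in V}\Big).$$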

This depends on knowing the condition for $B_{i,j}$ being disjoint from $\mathbf{Y}(\omega)|_{W}$ for all ${W\subseteq V}$ and all ${\{i,j\}\in\partial W}$ . Lemma 2 establishes this condition.

To simplify the notation, we take ${\mathcal{I}(\{i,j\})=\{i,j\}}$ for all ${\{i,j\}\in V}$. That is, at any iteration of PRS, if ${\{i,j\}\in\mathsf{Res}}$ then $\mathbf{X}_{C_{i}}$, $\mathbf{X}_{C_{j}}$ and $U_{i,j}$ are resampled independently from their respective distributions.

Lemma 2.

Let ${W\subseteq V}$ and $\{j,k\}\in\partial W$ with ${j\in\mathcal{I}(W)}$ . Then for any ${\omega\in\varOmega}$ , $B_{j,k}$ is disjoint from the partial assignment $\mathbf{Y}(\omega)|_{W}$ if and only if $\mathsf{Dist}(\mathbf{X}_{C_{j}}(\omega),C_{k})\geq 2r$ .

Proof.

Observe that $k\notin\mathcal{I}(W)$, because $j\in\mathcal{I}(W)$ and $\{j,k\}\in\partial W$. Hence $\mathcal{I}(\{j,k\})\cap\mathcal{I}(W)=\{j\}$. This implies that if $\mathsf{Dist}(\mathbf{X}_{C_{j}}(\omega),C_{k})\geq 2r$, then $B_{j,k}$ cannot occur for any extension of $\{\mathbf{X}_{C_{j}}(\omega)\}$, because whatever the configuration on $C_{k}$, it never interacts with the points of $\mathbf{X}_{C_{j}}(\omega)$. On the other hand, if $\mathsf{Dist}(\mathbf{X}_{C_{j}}(\omega),C_{k})<2r$, we can find an $\omega^{\prime}\in\varOmega$ such that $\mathbf{X}_{C_{j}}(\omega)=\mathbf{X}_{C_{j}}(\omega^{\prime})$ and $\omega^{\prime}\in B_{j,k}$ (that is, the points of $\mathbf{X}_{C_{k}}(\omega^{\prime})$ interact with points of $\mathbf{X}_{C_{j}}(\omega)$ so that $B_{j,k}$ occurs). ∎
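Lemma 2 turns the disjointness check into a distance computation between a finite configuration and a cell. A minimal sketch, assuming axis-aligned box cells given by their lower and upper corners, is below; the infimum over an empty configuration is taken to be $+\infty$, so an empty cell never makes a boundary event possible. The function names are illustrative.

```python
import math


def dist_point_box(x, lo, hi):
    """Euclidean distance from point x to the closed box [lo, hi]."""
    return math.sqrt(sum(max(l - xi, 0.0, xi - h) ** 2 for xi, l, h in zip(x, lo, hi)))


def dist_config_box(points, lo, hi):
    """Dist(X_{C_j}, C_k); +inf when the configuration is empty."""
    return min((dist_point_box(x, lo, hi) for x in points), default=math.inf)


def boundary_event_disjoint(points_j, lo_k, hi_k, r):
    """Lemma 2: B_{j,k} is disjoint from Y|_W iff Dist(X_{C_j}, C_k) >= 2r."""
    return dist_config_box(points_j, lo_k, hi_k) >= 2 * r


cell_k = ((0.1, 0.1), (0.2, 0.2))  # C_k = [0.1, 0.2]^2, and 2r = 0.1
print(boundary_event_disjoint([(0.05, 0.15)], *cell_k, r=0.05))  # point close to C_k
print(boundary_event_disjoint([(0.35, 0.15)], *cell_k, r=0.05))  # point far from C_k
print(boundary_event_disjoint([], *cell_k, r=0.05))              # empty configuration
```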

5.1 Cubic-grid partitioning

Consider the PIP processes and PSM models defined in Section 3. Suppose that $S=[0,1]^{d}$ is equipped with a cubic grid of cell edge length $2r$ . The area-interaction case is discussed at the end of this section.

So, the cells are $d$-dimensional cubes with volume $(2r)^{d}$, except possibly some boundary cells, which can be rectangular boxes with every edge of length at most $2r$. When $2r=1/K$ for some integer $K$, every cell is a cube. We say that cells $C_{i}$ and $C_{j}$ are adjacent if $\mathsf{Dist}(C_{i},C_{j})=0$. From this construction, $\{i,j\}\in V$ if and only if $C_{i}$ and $C_{j}$ are adjacent. Furthermore, there is no cross interaction between point configurations on two non-adjacent cells, that is,

$$\sum_{x\in\mathbf{X}_{C_{i}}(\omega),\,y\in\mathbf{X}_{C_{j}}(\omega)}f(x,y)=0$$

for all ${\omega\in\Omega}$ if $C_{i}$ and $C_{j}$ are not adjacent to each other; see Figure 3.

Recall from Lemma 2 that implementing the PRS algorithm for Gibbs processes requires deciding whether or not $\mathsf{Dist}(\mathbf{X}_{C_{i}},C_{j})<2r$ for a given $\{i,j\}\in V$ and a point configuration $\mathbf{X}_{C_{i}}$ on the $i^{th}$ cell. The useful feature of the cubic-grid partitioning is that, for any $\{i,j\}\in V$, $\mathsf{Dist}(\mathbf{X}_{C_{i}},C_{j})\geq 2r$ if and only if $\mathbf{X}_{C_{i}}=\varnothing$. To verify this claim, note that the 'if' part follows trivially from the definition of $\mathsf{Dist}$ (the infimum over an empty set is $+\infty$), and the 'only if' part follows from the observation that each cell has edges of length at most $2r$: when $\mathbf{X}_{C_{i}}\neq\varnothing$, we can select a point configuration on $C_{j}$ and a value for $U_{i,j}$ so that the bad event $B_{i,j}$ occurs, so $B_{i,j}$ is not disjoint from $\mathbf{X}_{C_{i}}$. As a consequence, for any realization of $\mathbf{Y}$ in an iteration of the PRS algorithm, $\mathsf{Res}$ is the minimal subset of $V$ such that $\mathsf{Bad}\subseteq\mathsf{Res}$ and

[TABLE]
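The resampling set can be computed by a loop in the spirit of the generic PRS of Guo et al., (2017), as we read it: start from $\mathsf{Bad}$ and repeatedly add every boundary event that is not disjoint from the current partial assignment, freezing those that are. The sketch below takes the disjointness test as a callable, so either the distance criterion of Lemma 2 or the cubic-grid emptiness shortcut described above can be plugged in. The event representation (frozensets of cell labels) and all names are our own assumptions.

```python
def resampling_set(V, bad, is_disjoint):
    """Res for one PRS iteration (sketch of the generic loop, as we read it).

    V: events, each a frozenset of the cell labels it involves.
    bad: the events that currently occur.
    is_disjoint(e, covered): True if event e cannot occur for any extension of
        the current assignment restricted to the cells in `covered`.
    """
    res, frozen = set(bad), set()
    while True:
        covered = set().union(*res)  # cells whose variables belong to I(Res)
        boundary = {e for e in V if e not in res and e & covered} - frozen
        if not boundary:
            return res
        newly_frozen = {e for e in boundary if is_disjoint(e, covered)}
        frozen |= newly_frozen
        res |= boundary - newly_frozen


# Toy chain of four cells 0-1-2-3; cell 2 is empty, so it blocks the growth of Res.
config = {0: ["x"], 1: ["y"], 2: [], 3: ["z"]}
chain = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})]


def disjoint_if_empty(e, covered):
    # Toy rule for illustration only: the event is treated as impossible iff every
    # covered cell it touches is empty. A real test would use Lemma 2.
    return all(not config[c] for c in e if c in covered)


print(resampling_set(chain, {frozenset({0, 1})}, disjoint_if_empty))
```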

Under this setup, we now restate the PRS algorithm for Gibbs processes. We remind the reader that for each $i$ , samples $\mathbf{X}_{C_{i}}$ from $\mu_{i}$ are generated using any existing method such as dCFTP.
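The steps of the partition-based PRS can be put together in a short Python sketch for a Strauss or hard-core process on $[0,1]^{2}$ with $2r=1/K$. Several choices here are ours, not the paper's: the Strauss parametrisation $f=-\log\gamma$ for $\|x-y\|<2r$ with $0\leq\gamma<1$; naive rejection sampling within each cell as a stand-in for dCFTP (exact, and cheap only because cells are small); and a boundary test that evaluates the distance criterion of Lemma 2 directly, together with the cross energy when both cells of an event are already covered by the resampling set. The name `prs_strauss` is hypothetical.

```python
import math
import random
from itertools import combinations, product


def prs_strauss(K, kappa, gamma, rng):
    """Partition-based PRS sketch for a Strauss/hard-core process on [0,1]^2.

    Cubic grid with cell edge 2r = 1/K; returns (points, iterations).
    """
    r, h = 0.5 / K, 1.0 / K
    cells = list(product(range(K), repeat=2))
    V = [frozenset(p) for p in combinations(cells, 2)
         if max(abs(a - b) for a, b in zip(*p)) == 1]

    def f(x, y):  # assumed Strauss pairwise term; gamma = 0 is hard-core
        if math.dist(x, y) >= 2 * r:
            return 0.0
        return math.inf if gamma == 0 else -math.log(gamma)

    def poisson(lam):  # Knuth's multiplication method, adequate for small lam
        n, p, limit = 0, rng.random(), math.exp(-lam)
        while p > limit:
            n, p = n + 1, p * rng.random()
        return n

    def sample_cell(c):  # exact draw from mu_i: propose from rho_i, accept w.p. exp(-U)
        while True:
            pts = [((c[0] + rng.random()) * h, (c[1] + rng.random()) * h)
                   for _ in range(poisson(kappa * h * h))]
            if rng.random() < math.exp(-sum(f(x, y) for x, y in combinations(pts, 2))):
                return pts

    def cross(e):  # cross energy between the two cells of event e
        a, b = tuple(e)
        return sum(f(x, y) for x in X[a] for y in X[b])

    def dist_to_cell(x, c):  # distance from point x to the closed box of cell c
        return math.hypot(*(max(idx * h - xi, 0.0, xi - (idx + 1) * h)
                            for xi, idx in zip(x, c)))

    def disjoint(e, covered):
        inside = [c for c in e if c in covered]
        if len(inside) == 2:  # only U_e is unset: B_e is impossible iff cross energy is 0
            return cross(e) == 0.0
        (j,), (k,) = inside, [c for c in e if c not in covered]
        return min((dist_to_cell(x, k) for x in X[j]), default=math.inf) >= 2 * r

    X = {c: sample_cell(c) for c in cells}
    U = {e: rng.random() for e in V}
    iterations = 0
    while True:
        iterations += 1
        bad = {e for e in V if U[e] > math.exp(-cross(e))}
        if not bad:
            return [x for c in cells for x in X[c]], iterations
        res, frozen = set(bad), set()
        while True:  # grow Res from Bad through non-disjoint boundary events
            covered = set().union(*res)
            boundary = {e for e in V if e not in res and e & covered} - frozen
            if not boundary:
                break
            newly = {e for e in boundary if disjoint(e, covered)}
            frozen |= newly
            res |= boundary - newly
        for c in set().union(*res):  # resample every cell configuration indexed by Res
            X[c] = sample_cell(c)
        for e in res:
            U[e] = rng.random()


K = 10
kappa = 0.1 / (math.pi * (0.5 / K) ** 2)  # kappa = kappa0 / (v_2 r^2) with kappa0 = 0.1
pts, iters = prs_strauss(K, kappa, gamma=0.0, rng=random.Random(7))
print(len(pts), iters)
```

Swapping `sample_cell` for a dCFTP routine would not change the outer loop, which only needs exact draws from each $\mu_{i}$.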

To generate perfect samples of an area-interaction process on $S=[0,1]^{d}$ with inverse temperature $\beta$ and reference PPP intensity $\lambda$, we use Algorithm 2 to generate samples of the modified PSM model on ${S(r)=[-2r,1+2r]^{d}}$, where the reference PPP consists of type-$1$ and type-$2$ points, with type-$1$ points forming a $\lambda$-homogeneous PPP on $S$ and type-$2$ points forming a $\beta$-homogeneous PPP on $S(r)$. The type-$1$ points in the output of the algorithm form a sample of the target area-interaction process; refer to Section 3 for the connection between area-interaction processes and PSM models. In Algorithm 2, we equip $S(r)$, instead of $S$, with a cubic-grid partitioning. On each cell $C_{i}$, $\mu_{i}$ is the distribution of a modified PSM model where the reference PPP consists of type-$1$ and type-$2$ points, with type-$1$ points forming a $\lambda$-homogeneous PPP on $C_{i}\cap S$ and type-$2$ points forming a $\beta$-homogeneous PPP on $C_{i}\setminus S$.

6 Running Time Analysis

In this section, we assume that $S=[0,1]^{d}$ and the target Gibbs distribution $\mu\ll\rho$ , with potential function given by (6) and interaction range $2r\leq 1$ . We further assume that $\rho$ is the distribution of a $\kappa$ -homogeneous (marked) PPP on $S$ with $\kappa=\kappa_{0}/(\mathsf{v}_{d}r^{d})$ for some $\kappa_{0}>0$ . We analyze the running time complexity of the partitioning based PRS (described in Section 5) as $r\to 0$ . We further compare this method with two well-known existing methods by establishing trivial lower bounds on the running time complexities of the existing methods.

Suppose that $\{C_{1},C_{2},\dots,C_{n(r)}\}$ is a partition of $S$ such that samples from $\mu_{i}$ can be simulated using any of the existing perfect sampling algorithms, such as dCFTP (see Section 5 for the definition of $\mu_{i}$). As shown in Figure 3, one possible partitioning is a cubic grid. For each $i=1,2,\dots,n(r)$, let $N_{i}$ be the number of cells $C_{j}$, $j\neq i$, such that $\mathsf{Dist}(C_{i},C_{j})<2r$. Theorem 1 below establishes that if the volume of each cell is of order $r^{d}$ and the $N_{i}$'s are uniformly bounded for all $r$, then there exists a constant $\bar{\kappa}$ such that, for all $\kappa_{0}\leq\bar{\kappa}$, the expected number of iterations the PRS algorithm takes to generate a perfect sample is $O\left(\log\left(\frac{1}{r}\right)\right)$. Observe that for the cubic-grid partition of Subsection 5.1, $N_{i}$ is uniformly bounded by a constant and ${\frac{\mathsf{Vol}(C_{i})}{r^{d}}\leq 2^{d}}$ for all $r$ and $i$. Hence the conditions in Theorem 1 hold.

As mentioned in Guo et al., (2017), an interesting feature of the PRS algorithm is that it lends itself to a distributed implementation. In particular, if we assume that each cell $i$ is associated with a processor that can generate samples from $\mu_{i}$ and can communicate with other processors in constant time, then, as we argue in the proof of Theorem 1, parallel computation reduces the expected running time complexity of finding the resampling set in any iteration to $O(1)$. In that case, the expected running time complexity of the PRS algorithm is simply of the order of the expected number of iterations, which is $O\left(\log\left(\frac{1}{r}\right)\right)$. See, for example, Feng and Yin, (2018); Feng et al., (2017) for recent work on distributed sampling.

Theorem 1.

Suppose that there exist constants $a,b>0$ such that, for all $i=1,\dots,n(r)$, $N_{i}\leq a$ and

[TABLE]

Then there exists $\bar{\kappa}>0$ such that for all $\kappa_{0}\leq\bar{\kappa}$ , we have

(i) the expected number of iterations of the PRS algorithm is $O\left(\log\left(\frac{1}{r}\right)\right)$,

(ii) the expected running time complexity of the algorithm is $O\left(\frac{1}{r^{d}}\log\left(\frac{1}{r}\right)\right)$, and

(iii) if the algorithm is implemented in a distributed manner, then the expected running time complexity is $O\left(\log\left(\frac{1}{r}\right)\right)$.

Proof.

The expected number of points generated under $\rho$ within cell $i$ is $\kappa\mathsf{Vol}(C_{i})$ , which can be upper bounded by $\frac{\kappa_{0}\,b}{\mathsf{v}_{d}}$ , under the assumption (7). Since the bound is independent of $r$ , for any given $\kappa_{0}$ , the running time complexity of generating a sample from $\mu_{i}$ is $O(1)$ , for all $i$ , using any standard perfect sampling algorithm; see Chapter 7 of Huber, (2016).

Since $N_{i}\leq a$ for all $r$ and $i$ , the total number of nodes $|V|$ of the dependency graph is of order $n(r)$ and the maximum degree $\Delta$ is uniformly bounded for all $r$ . The lower bound in (7) implies that $n(r)$ is of order $1/r^{d}$ .

We first show that both $p$ and $q$ go to zero as $\kappa_{0}$ goes to zero, and then we prove $(i)-(iii)$ as a consequence of (5). For any $\{i,j\}\in V$, occurrence of the event $B_{i,j}$ implies that both cells $i$ and $j$ have non-empty configurations. Since $\mu_{i}\ll\rho_{i}$, from the definition of a Gibbs process, the probability that cell $i$ has an empty configuration is

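$$\mu_{i}\left(\mathbf{X}_{C_{i}}=\varnothing\right)=\frac{\exp\left(-\kappa\,\mathsf{Vol}(C_{i})\right)}{\widetilde{Z}_{i}},$$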

where $\widetilde{Z}_{i}=\mathbb{E}_{\rho}\left[\exp\left(-\mathcal{U}\left(\mathbf{X}_{C_{i}}\right)\right)\right]=\mathbb{E}_{\rho_{i}}\left[\exp\left(-\mathcal{U}\left(\mathbf{X}\right)\right)\right]$ . Observe that for any ${\omega\in\Omega}$ , if either ${\mathbf{X}_{C_{i}}=\varnothing}$ or ${\mathbf{X}_{C_{j}}=\varnothing}$ then $\omega\in B^{c}_{i,j}$ . Hence,

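$$p=\max_{\{i,j\}\in V}\mathbb{P}(B_{i,j})\leq\max_{\{i,j\}\in V}\Big(1-\frac{\mathrm{e}^{-\kappa\mathsf{Vol}(C_{i})}}{\widetilde{Z}_{i}}\Big)\Big(1-\frac{\mathrm{e}^{-\kappa\mathsf{Vol}(C_{j})}}{\widetilde{Z}_{j}}\Big)\leq\max_{\{i,j\}\in V}\big(1-\mathrm{e}^{-\kappa\mathsf{Vol}(C_{i})}\big)\big(1-\mathrm{e}^{-\kappa\mathsf{Vol}(C_{j})}\big),$$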

where the last inequality holds because $\widetilde{Z}_{i}\leq 1$ for all $i$. Using (7), we obtain

$$p\leq\Big(1-\exp\Big(-\frac{\kappa_{0}\,b}{\mathsf{v}_{d}}\Big)\Big)^{2}.$$

Therefore, $p$ goes to zero as $\kappa_{0}$ goes to zero.

Recall that $A_{u,v}$ is the event that the partial assignment on $\mathcal{I}(u)\cap\mathcal{I}(v)$ can be extended to make $B_{u}$ true. Also recall that if ${\{u,v\}\in E}$, there exist $i,j$ and $k$ such that $u=\{i,k\}$ and $v=\{j,k\}$. Thus, $\mathcal{I}(u)\cap\mathcal{I}(v)=\{k\}$. This implies that the event $A_{v,u}$ cannot occur if the common cell $k$ is empty. Thus,

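$$\mathbb{P}(A_{v,u})\leq\mu_{k}\left(\mathbf{X}_{C_{k}}\neq\varnothing\right)=1-\frac{\exp\left(-\kappa\,\mathsf{Vol}(C_{k})\right)}{\widetilde{Z}_{k}}\leq 1-\exp\Big(-\frac{\kappa_{0}\,b}{\mathsf{v}_{d}}\Big).$$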

As a consequence $q\leq 1-\exp\left(-\frac{\kappa_{0}\,b}{\mathsf{v}_{d}}\right)$ , which goes to zero as $\kappa_{0}$ goes to zero.

Since the maximum degree $\Delta$ of the dependency graph does not change with the value of $\kappa_{0}$ , there exists a constant $\bar{\kappa}$ such that $6\mathrm{e}p\Delta^{2}\leq 1$ and $3\mathrm{e}q\Delta\leq 1$ for all $\kappa_{0}\leq\bar{\kappa}$ . From (5), within an order of $\log|V|$ iterations the expected number of bad events is less than $1$ . Since $|V|$ is $O(1/r^{d})$ , the expected number of iterations of the PRS algorithm is $O\left(\log(1/r)\right)$ . This proves $(i)$ .
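To see how conservative the constant $\bar{\kappa}$ produced by this argument is, one can plug the bounds on $p$ and $q$ into the two conditions. The sketch below does this for the cubic grid with $2r=1/K$, under our own bookkeeping assumptions: $b=2^{d}$ (from $\mathsf{Vol}(C_{i})/r^{d}\leq 2^{d}$), $\Delta=2(3^{d}-2)$ (an event shares a cell with at most $3^{d}-2$ other events through each of its two cells), and $p\leq s^{2}$, $q\leq s$ with $s=1-\exp(-\kappa_{0}b/\mathsf{v}_{d})$. For $d=2$ it gives roughly $0.007$, far below the $0.25$ suggested by the simulations in Section 7, which is consistent with the conditions being only sufficient.

```python
import math


def kappa_bar_sufficient(d=2):
    """Largest kappa0 meeting 6e p Delta^2 <= 1 and 3e q Delta <= 1 via the crude bounds.

    Assumptions: b = 2^d, Delta = 2(3^d - 2), p <= s^2, q <= s,
    where s = 1 - exp(-kappa0 * b / v_d).
    """
    v_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)  # volume of the unit ball in R^d
    b, delta = 2 ** d, 2 * (3 ** d - 2)
    # Both conditions are monotone in s, so take the smaller admissible s.
    s = min(1 / math.sqrt(6 * math.e * delta ** 2), 1 / (3 * math.e * delta))
    return -math.log1p(-s) * v_d / b  # invert s = 1 - exp(-kappa0 * b / v_d)


for d in (1, 2, 3):
    print(d, kappa_bar_sufficient(d))
```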

Furthermore, since the number of random objects resampled at iteration $t$ of the algorithm is $|\mathsf{Res}_{t}|$ , the expected running time of the algorithm is of order $\sum_{t=0}^{\infty}|\mathsf{Res}_{t}|$ , which is less than $|V|/p$ , by (5). Hence $(ii)$ is established.

In order to prove $(iii)$ , it is enough to show using parallel computation that the expected complexity of constructing the resampling set in each iteration of the algorithm is $O(1)$ . To show this, we suppose that starting from a bad event, using breadth-first search, we identify the resampling events associated with the bad event. This can be done in parallel starting from every bad event. Then the final resampling set is the union of all the resampling events identified. Note that for each bad event, we first add the boundary events of the bad event to the resampling set; the number of boundary events is at most $\Delta$ . For each added event, on average at most $q\Delta$ events from its boundary events are added to the resampling set. This will go on until there are no more events to add. So, for each bad event, the number of resampling events added is bounded by

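$$\Delta+\Delta(q\Delta)+\Delta(q\Delta)^{2}+\dots=\sum_{k=0}^{\infty}\Delta(q\Delta)^{k}=\frac{\Delta}{1-q\Delta},$$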

which is further bounded by $3\mathrm{e}\Delta/(3\mathrm{e}-1)$ when $3\mathrm{e}q\Delta\leq 1$ . Since in each iteration, the resampling sets associated with the bad events are constructed in parallel, the expected running time complexity of constructing the final resampling set in each iteration is $O(1)$ . ∎

6.1 Comparison with existing well-known methods

In this subsection, we consider two well-known methods, namely, naive rejection sampling and dCFTP, and establish trivial lower bounds on their expected running time complexity.

Naive rejection method: For a Gibbs process of the form (1), the naive rejection sampling method repeatedly simulates a sample $\mathbf{X}$ from $\rho$ until it is accepted. The last sample has the target distribution $\mu$. The acceptance probability at each iteration is $Z=\mathbb{E}_{\rho}\left[\exp(-\mathcal{U}(\mathbf{X}))\right]$, and thus the expected number of iterations is $1/Z$. Since the expected number of points generated in each iteration is $\kappa$ (because $\rho$ is $\kappa$-homogeneous), the expected running time of the naive algorithm is proportional to $\kappa/Z$. Below, we use a standard argument to show that $\kappa/Z$ increases faster than an exponential function as $r$ decreases to $0$.

Consider the cubic-grid partitioning of Subsection 5.1, and in this paragraph let $n(r)$ denote the number of cells along each coordinate axis. Each cell is at most as big as a cube with edge length $2r$, so $n(r)\geq\lceil 1/(2r)\rceil$. Therefore, by ignoring the cross interactions between the cells (which can only increase $\exp(-\mathcal{U})$, since $f\geq 0$) and using the fact that there are at least $(n(r)-2)^{d}$ cubic cells, we obtain

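$$Z=\mathbb{E}_{\rho}\left[\exp(-\mathcal{U}(\mathbf{X}))\right]\leq\varepsilon^{(n(r)-2)^{d}}\leq\varepsilon^{\left(\frac{1}{2r}-2\right)^{d}},$$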

where ${C=[0,2r]^{d}}$ , ${\varepsilon=\mathbb{E}_{\rho}[\exp(-\mathcal{U}(\mathbf{X}_{C}))]}$ , and the inequalities hold because ${n(r)\geq 1/2r}$ and ${\varepsilon\leq 1}$ .

For any fixed $\kappa_{0}>0$, the value of $\varepsilon$ is strictly less than $1$ and does not depend on $r$: by rescaling, $\varepsilon$ equals the value obtained when $C=[0,1]^{d}$, $\rho$ is the distribution of a $(2^{d}\kappa_{0}/\mathsf{v}_{d})$-homogeneous PPP on $C$, and the interaction range of the potential function $\mathcal{U}$ is $1$. Using the value of $\kappa$ and the upper bound on $Z$, we obtain a lower bound on the expected running time complexity $\kappa/Z$, given by

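$$\frac{\kappa}{Z}\geq\frac{\kappa_{0}}{\mathsf{v}_{d}\,r^{d}}\left(\frac{1}{\varepsilon}\right)^{\left(\frac{1}{2r}-2\right)^{d}},$$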

which increases faster than an exponential function as $r$ goes to $0$.
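The super-exponential growth of this lower bound is easy to see numerically. The sketch below treats the hard-core case only (our assumption): it first estimates $\varepsilon$ by simulation on the rescaled cell $[0,1]^{d}$ with intensity $2^{d}\kappa_{0}/\mathsf{v}_{d}$ and interaction range $1$, as in the scaling argument above, and then evaluates $\log_{10}$ of the lower bound on $\kappa/Z$ for decreasing $r$. Function names and the sample size are illustrative.

```python
import math
import random


def estimate_eps_hardcore(kappa0, d=2, trials=20_000, seed=0):
    """Monte Carlo estimate of eps: P(no two points closer than 1) for a PPP on
    [0,1]^d with intensity 2^d kappa0 / v_d (hard-core interaction assumed)."""
    rng = random.Random(seed)
    v_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    lam = 2 ** d * kappa0 / v_d
    accepted = 0
    for _ in range(trials):
        n, p = 0, rng.random()  # Poisson(lam) count by Knuth's method
        while p > math.exp(-lam):
            n, p = n + 1, p * rng.random()
        pts = [[rng.random() for _ in range(d)] for _ in range(n)]
        accepted += all(math.dist(x, y) >= 1.0 for i, x in enumerate(pts) for y in pts[:i])
    return accepted / trials


def log10_naive_lower_bound(kappa0, r, eps, d=2):
    """log10 of kappa0 / (v_d r^d) * eps ** (-(1/(2r) - 2) ** d)."""
    v_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return math.log10(kappa0 / (v_d * r ** d)) - (1 / (2 * r) - 2) ** d * math.log10(eps)


eps = estimate_eps_hardcore(0.1)
for r in (0.05, 0.02, 0.01, 0.005):
    print(r, round(log10_naive_lower_bound(0.1, r, eps), 1))
```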

Dominated CFTP: To establish a lower bound on the expected running time of a dominated CFTP method, we first briefly state the general description of the method (for a detailed description, we refer the reader to, for example, Kendall and Møller, (2000)). Let $\mathbf{D}=\{\mathbf{D}(t):t\in\mathbb{R}\}$ be the (free) birth-and-death process on $S=[0,1]^{d}$ with birth rate $\kappa$, where each birth is a (marked) point selected uniformly and independently on $S$ that stays alive for an exponentially distributed time with mean one. It is not difficult to show that the steady-state distribution of $\mathbf{D}$ is $\rho$. Since the target Gibbs distribution satisfies $\mu\ll\rho$, it is possible, using coupling, to construct a process $\mathbf{Z}=\{\mathbf{Z}(t):t\in\mathbb{R}\}$ such that $\mathbf{Z}(t)\subseteq\mathbf{D}(t)$ for all $t\in\mathbb{R}$ and the steady-state distribution of $\mathbf{Z}$ is $\mu$. Any dCFTP method consists of two steps: i) constructing the dominating spatial birth-and-death process $\{\mathbf{D}(t):-s\leq t\leq 0\}$ backward in time, for some $s>0$, starting at time zero with $\mathbf{D}(0)\sim\rho$, and ii) using thinning on the dominating process to obtain, forward in time, an upper bounding process $\{\mathbf{U}_{s}(t):t\geq-s\}$ with $\mathbf{U}_{s}(-s)=\mathbf{D}(-s)$ and a lower bounding process $\{\mathbf{L}_{s}(t):t\geq-s\}$ with $\mathbf{L}_{s}(-s)=\varnothing$, such that $\mathbf{L}_{s}(t)\subseteq\mathbf{Z}(t)\subseteq\mathbf{U}_{s}(t)\subseteq\mathbf{D}(t)$ is guaranteed to hold for all $t\geq-s$. If $\mathbf{U}_{s}$ and $\mathbf{L}_{s}$ coalesce at time $0$, that is, $\mathbf{U}_{s}(0)=\mathbf{L}_{s}(0)$, then $\mathbf{U}_{s}(0)$ is a perfect sample from the target distribution $\mu$. If there is no coalescence, then in the next iteration we increase $s$, extend the dominating process further backward to time $-s$, and repeat the same procedure.

The thinning criterion depends on the definition of the target distribution, whereas the dominating process depends only on $\rho$. Let $\widetilde{T}$ be the backward coalescence time given by

$$\widetilde{T}=\inf\left\{s\geq 0:\ \mathbf{U}_{s}(0)=\mathbf{L}_{s}(0)\right\}.$$

The running time complexity of a dCFTP method is at least of order of the number of computations needed to construct the dominating process $\{\mathbf{D}(t):-\widetilde{T}\leq t\leq 0\}$ . Let

$$T_{s}=\inf\left\{t\geq 0:\ \mathbf{D}(-s)\cap\mathbf{D}(-s+t)=\varnothing\right\},$$

Since the dominating process $\mathbf{D}$ is time-reversible and $\mathbf{D}(0)\sim\rho$ , we have $\mathbf{D}(-s)\sim\rho$ for all $s\geq 0$ . Hence, the distribution of $T_{s}$ does not depend on $s$ .

Since $\mathbf{U}_{s}(-s)=\mathbf{D}(-s)$ and $\mathbf{L}_{s}(-s)=\varnothing$, if a point of $\mathbf{D}(-s)$ is still alive at time $0$, then $\mathbf{U}_{s}(0)\neq\mathbf{L}_{s}(0)$. Therefore, $T_{s}\geq s$ implies that $\widetilde{T}\geq s$; that is, for coalescence to happen, $s$ must be at least $T_{s}$. As a consequence, the expected running time complexity of any dCFTP algorithm is at least of the order of the expected number of births in $\mathbf{D}$ during an interval of length $\mathbb{E}[T_{s}]$, which is $\kappa\,\mathbb{E}[T_{s}]$ because the births in the dominating process form a Poisson process with rate $\kappa$.

Further recall that each birth is alive for a random time independently and exponentially distributed with mean one. Conditioned on $|\mathbf{D}(-s)|=m$ , $T_{s}$ is the maximum of $m$ $i.i.d.$ mean one exponential random variables. Since $|\mathbf{D}(-s)|\sim\mathsf{Poi}(\kappa)$ for all $s$ , we have

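$$\mathbb{E}[T_{s}]=\sum_{m=1}^{\infty}\mathbb{P}\left(|\mathbf{D}(-s)|=m\right)H(m)=\sum_{m=1}^{\infty}\mathrm{e}^{-\kappa}\frac{\kappa^{m}}{m!}H(m),$$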

where ${H(m)=\sum_{i=1}^{m}\frac{1}{i}}$ is the $m^{th}$ harmonic number. Using the fact that ${H(m)\geq\log m}$ for all ${m\geq 1}$ ,

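$$\mathbb{E}[T_{s}]\geq\sum_{m\geq\kappa/2}\mathrm{e}^{-\kappa}\frac{\kappa^{m}}{m!}\log m\geq\log\left(\frac{\kappa}{2}\right)\mathbb{P}\left(|\mathbf{D}(-s)|\geq\kappa/2\right).$$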

By the Chernoff bound on the tail probabilities of the Poisson distribution, there exists a constant $a>0$ such that $\mathbb{P}\left(|\mathbf{D}(-s)|\geq\kappa/2\right)\geq 1-\exp\left(-a\kappa\right)$ for all values of $\kappa$. In conclusion, the expected running time complexity of any dCFTP algorithm is at least of the order of $\kappa\log\kappa$, which is of the order of $\frac{1}{r^{d}}\log\left(\frac{1}{r}\right)$ for any $\kappa_{0}$.
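The quantities in this argument are easy to evaluate numerically. The sketch below computes $\mathbb{E}[T_{s}]=\mathbb{E}[H(N)]$ for $N\sim\mathsf{Poi}(\kappa)$ by summing the Poisson series (truncated far in the tail, an assumption of the sketch), compares it with the lower bound $\log(\kappa/2)\,\mathbb{P}(N\geq\kappa/2)$, and checks the first quantity by simulating the maximum of $N$ independent mean-one exponential lifetimes. Function names are illustrative.

```python
import math
import random


def poisson_pmf(m, kappa):
    return math.exp(-kappa + m * math.log(kappa) - math.lgamma(m + 1))


def expected_T(kappa):
    """E[H(N)] for N ~ Poisson(kappa), i.e. E[T_s]; series truncated deep in the tail."""
    m_max = int(kappa + 12 * math.sqrt(kappa) + 20)
    total, harmonic = 0.0, 0.0
    for m in range(1, m_max + 1):
        harmonic += 1.0 / m
        total += poisson_pmf(m, kappa) * harmonic
    return total


def lower_bound_T(kappa):
    """log(kappa / 2) * P(N >= kappa / 2), the bound used in the text (kappa >= 2)."""
    tail = 1.0 - sum(poisson_pmf(m, kappa) for m in range(math.ceil(kappa / 2)))
    return math.log(kappa / 2) * tail


def simulate_T(kappa, trials, rng):
    """Average maximum remaining lifetime of |D(-s)| ~ Poisson(kappa) initial points."""
    total = 0.0
    for _ in range(trials):
        n, p = 0, rng.random()  # Poisson count by Knuth's method
        while p > math.exp(-kappa):
            n, p = n + 1, p * rng.random()
        total += max((rng.expovariate(1.0) for _ in range(n)), default=0.0)
    return total / trials


kappa = 50.0
print(expected_T(kappa), lower_bound_T(kappa), simulate_T(kappa, 10_000, random.Random(3)))
```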

7 Simulations

In this section, we take $S=[0,1]^{2}$ and apply Algorithm 2 to generate perfect samples of the hard-core process, the Strauss process and PSM models. We omit the area-interaction process, since its implementation is similar to that of the PSM model and is expected to have the same order of complexity.

We estimate the expected number of iterations of the algorithm for different values of the model parameters. As long as the samples on each cell are perfect, the reported results are the same for any choice of existing method to generate samples from $\mu_{i}$ ’s.

Hard-core and Strauss processes: Consider the Strauss process with intensity $\kappa=\kappa_{0}/(\mathsf{v}_{d}r^{d})$. Recall that the hard-core process is a Strauss process with ${\gamma=0}$. Panels (a), (b) and (c) in Figure 4 correspond to $\kappa_{0}$ equal to $0.1$, $0.2$ and $0.25$, respectively. Each panel has two curves, corresponding to $\gamma=0$ (i.e., the hard-core process) and $\gamma=0.5$. Each curve is the estimated expected number of iterations of the algorithm as a function of $K$, where the interaction range is $2r=1/K$. Perfect samples from $\mu_{i}$ on each cell $i$ are generated using the dCFTP method of Huber, (2012). Observe that in each case the expected number of iterations of the algorithm appears to be $O\left(\log(1/r)\right)$, as shown in Theorem 1. These results suggest that $\bar{\kappa}$ in Theorem 1 can be at least $0.25$.

Figure 5 corresponds to a hard-core process with interaction range $2r=0.01$; it compares the expected number of iterations of the new algorithm with that of the method proposed by Guo and Jerrum, (2018), for different values of $\kappa_{0}$. The complexity of Algorithm 2 is slightly higher than that of Guo and Jerrum, (2018). However, as mentioned earlier, the algorithm of Guo and Jerrum, (2018) is restricted to hard-core processes, whereas the new algorithm can be applied to more general Gibbs processes.

PSM model: Consider the PSM model with the interaction range $2r=1/K$ for some $K\geq 1$ . We take the intensities of type- $1$ and type- $2$ points to be $\displaystyle\kappa^{(1)}_{0}/(\pi r^{2})$ and $\displaystyle\kappa^{(2)}_{0}/(\pi r^{2})$ , respectively. Figure 6 plots the estimated expected number of iterations of the algorithm as a function of $K$ . Perfect samples from $\mu_{i}$ on each cell $i$ are generated using the dCFTP method by Kendall, (1998). Again, we see that the expected number of iterations of the algorithm seems to be $O\left(\log(1/r)\right)$ for small values of $\kappa_{0}=\kappa^{(1)}_{0}+\kappa^{(2)}_{0}$ .

8 Conclusion

In this paper, we considered the problem of perfect sampling for Gibbs point processes with a finite interaction range $2r$, defined on $S\subseteq\mathbb{R}^{d}$. We proposed a new perfect sampling algorithm by combining existing perfect sampling methods with the partial rejection sampling of Guo et al., (2017). For pairwise interaction processes, penetrable spheres mixture models, and area-interaction processes that are absolutely continuous with respect to a $\kappa$-homogeneous Poisson point process on $S=[0,1]^{d}$, we showed that if $\kappa=\kappa_{0}/(\mathsf{v}_{d}r^{d})$, the proposed algorithm can be implemented with expected running time complexity $O(\log(1/r))$ as $r$ goes to $0$, for sufficiently small values of $\kappa_{0}$. We illustrated our findings using several simulation results. These simulations suggest that $\bar{\kappa}$ can be at least $0.25$ for Strauss processes. However, at this stage we do not have a theoretical justification for this claim, and we would like to address this in future research.

Acknowledgements

This work has been supported by the Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS), under grant number CE140100049. We would like to thank Michel Mandjes for bringing the partial rejection sampling to our attention.

Bibliography

Baddeley, A. and Nair, G. (2012). Fast approximation of the intensity of Gibbs point processes. Electron. J. Stat., 6:1155–1169.
Baddeley, A. and Turner, R. (2005). An R package for analyzing spatial point patterns. Journal of Statistical Software, 12(6):1–42.
Baddeley, A. J. and van Lieshout, M. N. M. (1995). Area-interaction point processes. Annals of the Institute of Statistical Mathematics, 47(4):601–619.
Bezáková, I., Galanis, A., Goldberg, L. A., Guo, H., and Stefankovic, D. (2016). Approximation via correlation decay when strong spatial mixing fails. In 43rd International Colloquium on Automata, Languages, and Programming, volume 55 of LIPIcs. Leibniz Int. Proc. Inform., pages Art. No. 45, 13. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern.
Chiu, S. N., Stoyan, D., Kendall, W. S., and Mecke, J. (2013). Stochastic geometry and its applications. Wiley Series in Probability and Statistics. John Wiley & Sons, Ltd., Chichester, third edition.
Feng, W., Sun, Y., and Yin, Y. (2017). What can be sampled locally? In Proceedings of the ACM Symposium on Principles of Distributed Computing, PODC ’17, pages 121–130, New York, NY, USA.
Feng, W. and Yin, Y. (2018). On local distributed sampling and counting. In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing, PODC ’18, pages 189–198, New York, NY, USA.
Ferrari, P. A., Fernández, R., and Garcia, N. L. (2002). Perfect simulation for interacting point processes, loss networks and Ising models. Stochastic Process. Appl., 102(1):63–88.
