Simple perfect samplers using monotone birth-and-death processes
Hiroyuki Masuyama

TL;DR
This paper introduces simple perfect sampling algorithms based on monotone birth-and-death processes that efficiently generate exact samples from arbitrary finite discrete distributions, with bounds on their expected running times.
Contribution
It constructs monotone birth-and-death processes matching the target distribution and derives bounds for the coalescence and running times of two perfect samplers, including a memory-efficient method.
Findings
Derived upper bounds for coalescence times of the processes.
Established bounds for the running times of Doubling and Read-once CFTP algorithms.
Demonstrated the effectiveness of the proposed sampler for unnormalized distributions.
Abstract
This paper proposes simple perfect samplers using monotone birth-and-death processes (BD-processes), which draw samples from an arbitrary finite discrete target distribution. We first construct a monotone BD-process whose stationary distribution is equal to the target distribution. We then derive upper bounds for the expected coalescence time of the copies of the monotone BD-process. We also establish upper bounds for the expected values and tail probabilities of the running times of two perfect samplers, which are Doubling CFTP and Read-once CFTP using our monotone BD-process. The latter sampler can draw samples exactly from unnormalized target distributions with little memory consumption.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsData Quality and Management · Statistical Methods and Bayesian Inference · Advanced Statistical Process Monitoring
Simple perfect samplers using monotone
birth-and-death processes111This paper has been submitted for publication in a special issue on “Queueing Theory and Network Applications” in Annals of Operations Research.****
Hiroyuki Masuyama222E-mail: [email protected]
Department of Systems Science, Graduate School of Informatics, Kyoto University
Kyoto 606-8501, Japan
Abstract
[TABLE]
[TABLE]
1 Introduction
Perfect sampling algorithms are based on “Coupling From The Past (CFTP)”, proposed by Propp and Wilson (1996). CFTP is a powerful technique that enables us to perform perfect sampling from the target distribution, i.e., to generate, in a finite time, samples that perfectly follow the target distribution. Basically, CFTP is time- and memory-consuming because we have to check whether or not the copies of a Markov chain used for CFTP coalesce at a single state every time we extend the sample paths of the copies to the past.
Propp and Wilson (1996) stated that CFTP is effectively achieved by a monotone Markov chain (see, e.g., Keilson and Kester 1977) constructed from the target distribution, which is called monotone CFTP or monotonic CFTP (MCFTP). As far as we know, there have been a small number of examples for which MCFTP algorithms are established, for example, attractive spin systems (Propp and Wilson 1996), closed Jackson networks (Kijima and Matsui 2008a, b), discretized Dirichlet distributions (Matsui et al. 2010) and truncated Gaussian distributions (Philippe and Robert 2003). In particular, the algorithms proposed by Kijima and Matsui (2008a, b) and Matsui et al. (2010) are remarkably fast, though they are somewhat sophisticated.
The main purpose of this paper is to establish simple perfect samplers, which draw samples from an arbitrary target distribution on an arbitrary finite discrete set . It should be noted that is mapped one-to-one to a finite set of nonnegative numbers. Thus, we assume, without loss of generality, that , where is a positive integer. We also assume that
[TABLE]
For later use, let , , and for any . For such that , let . Let and for . Furthermore, we use the notation to represent .
In this paper, we first construct a monotone birth-and-death process (monotone BD-process or MBD for short) whose stationary distribution is equal to the target distribution . More specifically, we construct a monotone stochastic matrix such that
[TABLE]
where
[TABLE]
By definition, for and . We prove that is an irreducible and monotone stochastic matrix whose stationary distribution is equal to the target distribution (see Theorem 2.1 below). We then discuss the first time when the copies of the MBD characterized by coalesce at a single state, which is called the coalescence time and denoted by . Utilizing the existing results on BD-processes, we derive the upper bound for the expected coalescence time:
[TABLE]
where is a certain parameter (possibly depending on ).
Next we consider Doubling CFTP and Read-once CFTP (see, e.g., Huber 2016) using our MBD, which is referred to as Doubling-MBD sampler and Read-once-MBD sampler, respectively. Using (1.16), we obtain upper bounds for the expected values and tail probabilities of the running times of Doubling-MBD and Read-once-MBD samplers. These upper bounds show that the expected running times of the two MBD samplers are , and thus they are slower than the sophisticated special-purpose algorithms mentioned above. However, the construction of our MBD is very simple and little memory-consuming. In general, Doubling MCFTP and Read-once MCFTP are easily implementable (for details, see, e.g., Huber 2016). Therefore, Doubling-MBD and Read-once-MBD samplers are easily implementable and general-purpose perfect sampling algorithm. Furthermore, Read-once-MBD sampler is little memory-consuming, though the sampler is somewhat more time-consuming than Doubling-MBD sampler. As a result, Read-once-MBD sampler can draw samples from unnormalized target distributions with little memory consumption. This is a remarkable feature of Read-once-MBD sampler.
The rest of this paper is divided into two sections. Section 2 discusses our MBD constructed from the target distribution. Section 3 considers the performance of the two perfect samplers using our MBD.
2 Monotone BD-process from the target distribution
This section consists of two subsections. Section 2.1 constructs a monotone BD-process (MBD) whose stationary distribution is equal to the target distribution. Section 2.2 derives some upper bounds for the expected coalescence time of the copies of the MBD.
2.1 Construction of a monotone BD-process from the target distribution
The following theorem is the fundamental result of this paper.
Theorem 2.1
The stochastic matrix defined by (1.8) together with (1.11)–(1.15) is an irreducible and monotone one whose stationary distribution is equal to the target distribution .
Proof.
From (1.1) and (1.11)–(1.15), we have
[TABLE]
which show that is an irreducible stochastic matrix and that the target distribution is a reversible measure and thus a unique stationary distribution of . Therefore, it suffices to prove that
[TABLE]
where (2.2) is the condition for the monotonicity of (see, e.g., Keilson and Kester 1977, Definition 1.2).
From (1.11) and (1.15), we have, for ,
[TABLE]
and thus
[TABLE]
where the last inequality follows from (2.3) and the last equality follows from (2.1). Similarly, for ,
[TABLE]
The proof is completed. ∎∎
The following corollary is immediate from Theorem 2.1.
Corollary 2.1
Suppose that the conditions of Theorem 2.1 are satisfied. We then have the following:
- (i)
If is nondecreasing, i.e.,
[TABLE]
then (1.11) and (1.14) are reduced to
[TABLE] 2. (ii)
If is nonincreasing, i.e.,
[TABLE]
then (1.11) and (1.14) are reduced to
[TABLE]
Remark 2.1
Suppose that the conditions of the statement (i) of Corollary 2.1 are satisfied. Let denote an MBD with state space and transition probability matrix in (1.8) together with (2.4) and (2.5). Furthermore, suppose that the BD-processes starts with an initial distribution such that
[TABLE]
Note here that (2.6) yields
[TABLE]
which shows that the target distribution is log-concave. Therefore, it follows from Fill and Kahn (2013, Proposition 3.2, Corollary 3.3(a) and Theorem 5.1) that the BD-process mixes (i.e., converges to stationarity) faster in total variation distance than does an arbitrary MBD that has the same state space , stationary distribution and initial distribution as the BD-process .
Next we describe a construction of the copies of the MBD with state space and transition probability matrix , which can be used for MCFTP. To this end, we define as a sequence of independent and identically distributed (i.i.d.) uniform random variables in . We then have the following result.
Theorem 2.2
Suppose that the conditions of Theorem 2.1 are satisfied. Let denote a function such that, for and ,
[TABLE]
where and are given in (1.11) and (1.14), respectively. Furthermore, for each , let denote a sequence of random variables such that
[TABLE]
Under these conditions, the stochastic processes ’s, , are MBDs with transition probability matrix , which satisfy
[TABLE]
Proof.
It is clear that ’s, , are MBDs with transition probability matrix . Thus, we prove that (2.9) holds.
It follows from (2.7) that, for ,
[TABLE]
It also follows from (2.2) that for . Therefore,
[TABLE]
Combining (2.8) and (2.10) yields (2.9). ∎∎
Theorem 2.2 shows that the function , together with the uniform random variables ’s, generates MBDs with transition probability matrix . Thus, we refer to as a monotone update function for MBDs with . Note here that ’s can be considered the copies of a generic BD-process driven by the monotone update function , which is denoted by . Especially, we refer to and as the upper-bounding and lower-bounding copies, respectively, of .
2.2 Expected coalescence time of the copies of the monotone BD-process
Let denote
[TABLE]
which is the first time when all the copies ’s coalesce at a single state in the state space . Thus, we call the coalescence time of the copies ’s of . It follows from (2.9) and (2.11) that
[TABLE]
We now define , , , as a generic random variable for the first passage time from state to state . We assume that are independent and so are , which does not lose generality due to the skip-free property of BD-processes. It then follows from (2.12) that
[TABLE]
where the symbol “” represents the equality in distribution. Therefore,
[TABLE]
We can readily obtain (see, e.g., Theorem 4.11 of Heyman and Sobel (2004), where continuous-time BD-processes are considered)
[TABLE]
Substituting (2.14) and (2.15) into (2.13) yields
[TABLE]
Using (2.16), we obtain the following result.
Theorem 2.3
If the conditions of Theorem 2.1 are satisfied, then
[TABLE]
where is a positive constant such that
[TABLE]
Proof.
Note that
[TABLE]
Substituting these inequalities into (2.16) leads to (2.17) with (2.18). ∎∎
Under some additional conditions, we obtain simpler bounds for .
Theorem 2.4
Suppose that the conditions of Theorem 2.1 are satisfied. We then have the following:
- (i)
If there exists some independent of such that
[TABLE]
then
[TABLE] 2. (ii)
If there exists some independent of such that
[TABLE]
then
[TABLE]
Remark 2.2
If is nonincreasing or nondecreasing, then (2.21) holds for and thus the statement (ii) of Theorem 2.4 yields
[TABLE]
*Proof of Theorem 2.4. * We first prove the statement (i). Applying (2.19) to (2.16) yields
[TABLE]
which shows that (2.20) holds. Next we prove the statement (ii). Combining (2.21) and (2.16), we have either of the following inequalities:
[TABLE]
[TABLE]
Each of the two inequalities shows that (2.22) holds. ∎
Example 2.1** **(Truncated geometric distribution)
Consider a truncated geometric distribution. To this end, fix
[TABLE]
where . Clearly, for , which satisfies the conditions of the statement (i) of Corollary 2.1. Thus, from (2.4) and (2.5), we have
[TABLE]
Note here that
[TABLE]
Combining these results and the statement (i) of Theorem 2.4 yields
[TABLE]
Example 2.2** **(Zipf distribution)
Consider the following Zipf distribution :
[TABLE]
where . We then have
[TABLE]
which is decreasing with . Therefore, according to the statement (ii) of Corollary 2.1, we fix
[TABLE]
Furthermore, since is decreasing (see Remark 2.2), it follows from (2.23) and (2.24) that
[TABLE]
3 Perfect samplers using the monotone BD-process
In this section, we discuss the running times of Doubling CFTP and Read-once CFTP using the monotone update function , which are referred to as Doubling-MBD sampler and Read-once-MBD sampler, respectively.
To facilitate the subsequent discussion, we introduce some definitions. For and , let
[TABLE]
For convenience, let for all and . In addition, for , and , let denote
[TABLE]
where is the monotone update function given in (2.7) and is a sequence of i.i.d. uniform random variables in . Note that, for any , the two processes and are the upper- and lower-bounding copies of an MBD with transition probability matrix , which run from time to time .
We first consider Doubling MBD sampler, which is described in Algorithm 1.
Let denote the number of the uniform random variables used by Algorithm 1, i.e., is equal to a positive integers such that
[TABLE]
Following Huber (2008), we read as the running time of Algorithm 1. Using Huber (2008, Lemma 5.4), we obtain the following result.
Proposition 3.1** **(Doubling-MBD sampler)
[TABLE]
where is the positive constant given in (2.18).
Proof.
It follows from Huber (2008, Lemma 5.4) that
[TABLE]
Combining these with Theorem 2.3 results in (3.1) and (3.2). ∎∎
Next we consider Read-once-MBD sampler, which is described in Algorithm 2 below.
Remark 3.1
When Algorithm 2 stops, we have
[TABLE]
As with Algorithm 1, we define as the number of the uniform random variables used by Algorithm 2, and then read as the running time of Algorithm 2. Let and denote the numbers of the iterations in Steps (ii) and (iii), respectively, of Algorithm 2. By definition, and are independent and
[TABLE]
In addition,
[TABLE]
Using Theorem 2.3 together with (3.3) and (3.4), and proceeding as in the proof of Huber (2008, Lemma 5.4), we obtain the following result.
Theorem 3.1** **(Read-once-MBD sampler)
Fix such that , and fix the block size of Algorithm 2 such that , where is the positive constant given in (2.18). We then have
[TABLE]
where . In addition, the value of integer minimizing the right hand side of (3.5) is equal to six, or equivalently,
[TABLE]
Proof.
It follows from Markov’s inequality that, for any fixed ,
[TABLE]
Note here that is log-subadditive (see, e.g., Propp and Wilson 1996, Theorem 6). Thus, from (3.8), we have
[TABLE]
We now fix to maximizing . It then follows from (3.9) that
[TABLE]
From and Theorem 2.3, we also have . Using this and (3.10), we obtain
[TABLE]
Substituting (3.11) into (3.3) yields
[TABLE]
Therefore, there exist independent random variables and such that
[TABLE]
It follows from (3.13) that
[TABLE]
and
[TABLE]
Combining these results with (3.4) and (3.12), we have
[TABLE]
which imply that (3.5) and (3.6) hold due to .
In what follows, we prove (3.7), which is equivalent to
[TABLE]
where denotes a function such on that
[TABLE]
By definition, is convex and
[TABLE]
Let , , denote the numerator of in the above equation, i.e.,
[TABLE]
We then have
[TABLE]
which lead to and . Note here that and . Therefore, the convexity of yields and , which results in (3.14). ∎∎
Proposition 3.1 and Theorem 3.1 imply that the running time of Doubling-MBD sampler is less than the running time of Read-once-MBD sampler. However, Doubling-MBD sampler has to store all the generated (uniform) random numbers until it outputs a sample following the target distribution. On the other hand, Read-once-MBD sampler is little memory-consuming because the sampler uses, only one time, each of the generated random numbers.
We close this section by comparing our perfect samplers with the inverse transform sampling (see, e.g., Fishman 1996). The inverse transform sampling for discrete target distributions is easy implementable and takes the running time in order to draw a sample from the target distribution. Therefore, the inverse transform sampling is less time-consuming than our perfect samplers.
To discuss this topic from a different perspective, we suppose that the target distribution is not normalized, in other words, we have an unnormalized target distribution such that and
[TABLE]
It then follows from (1.15) and (3.15) that
[TABLE]
Therefore, our two perfect samplers still work well by using the unnormalized target distribution . On the other hand, the inverse transform sampling has a problem in the present situation because it needs the cumulative distribution , where for . To obtain the cumulative distribution , we have to compute the normalizing constant by summing the unnormalized target distribution over its support set .
It should be note that the obtained constant includes, at worst, the rounding error. Such rounding error can be reduced to if is computed by pairwise summation (see, e.g., Higham 1993). Furthermore, if is computed by Kahan summation algorithm, then the rounding error can be basically reduced to but its computational complexity is four times as much as that of naive summation (see, e.g., Higham 1993). Even though we take any of these options, we have to store all the information of the cumulative distribution . Such memory consumption is not necessary for our two perfect samplers.
As a result, although our MBD perfect samplers may not be particularly superior in speed to other methods, they are easily implementable and can draw samples exactly from unnormalized target distributions. Especially, Read-one MBD sampler achieves such exact sampling with little memory consumption.
Acknowledgments
The author thanks Dr. Shuji Kijima for helpful comments that motivated this work.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Fill and Kahn (2013) Fill, J. A., Kahn, J., 2013. Comparison inequalities and fastest-mixing Markov chains. The Annals of Applied Probability 23 (5), 1778–1816.
- 2Fishman (1996) Fishman, G., 1996. Monte Carlo: Concepts, Algorithms, and Applications. Springer, New York.
- 3Heyman and Sobel (2004) Heyman, D. P., Sobel, M. J., 2004. Stochastic Models in Operations Research, Vol. I: Stochastic Processes and Operating Characteristics, paperback Edition. Dover Publications, Mineola, New York.
- 4Higham (1993) Higham, N. J., 1993. The accuracy of floating point summation. SIAM Journal on Scientific Computing 14 (4), 783–799.
- 5Huber (2008) Huber, M., 2008. Perfect simulation with exponential tails on the running time. Random Structures and Algorithms 33 (1), 29–43.
- 6Huber (2016) Huber, M., 2016. Perfect Simulation. CRC Press, Boca Raton, FL.
- 7Keilson and Kester (1977) Keilson, J., Kester, A., 1977. Monotone matrices and monotone Markov processes. Stochastic Processes and their Applications 5, 231–241.
- 8Kijima and Matsui (2008 a) Kijima, S., Matsui, T., 2008 a. Approximation algorithm and perfect sampler for closed jackson networks with single servers. SIAM Journal on Computing 38 (4), 1484–1503.
