A large sample property in approximating the superposition of i.i.d. point processes
Tianshu Cong, Aihua Xia, Fuxi Zhang

Abstract
One of the main differences between the central limit theorem and the Poisson law of small numbers is that the former possesses the large sample property (LSP), i.e., the error of normal approximation to the sum of $n$ independent and identically distributed (i.i.d.) random variables is a decreasing function of $n$. Since the 1980s, considerable effort has been devoted to recovering the LSP for the law of small numbers in discrete random variable approximation. In this paper, we aim to establish the LSP for the superposition of i.i.d. point processes.
Tianshu Cong: School of Mathematics and Statistics, The University of Melbourne, VIC 3010, Australia. Work supported by a Research Training Program Scholarship and a Cross-Disciplinary PhD Scholarship in Mathematics and Statistics at the University of Melbourne.
Aihua Xia: School of Mathematics and Statistics, The University of Melbourne, VIC 3010, Australia. Work supported in part by the Belz fund, Australian Research Council Grants Nos DP150101459 and DP190100613.
Fuxi Zhang: School of Mathematical Sciences, Peking University, Beijing 100871, China. Work supported in part by NSF of China 11371040.
Key words and phrases: point process approximation, superposition, central limit theorem.
AMS 2010 Subject Classification: Primary 60F05; secondary 60E15, 60G55.
Running title: Superposition of Point Processes
1 Introduction
The central limit theorem states that the distribution of the sum of $n$ independent copies of a random variable $X$ with finite second moment, after being normalized, converges weakly to the standard normal distribution. The Berry-Esseen bound ensures that, if $X$ has a finite third moment, the error of the normal approximation, measured in the Kolmogorov metric, is no worse than $Cn^{-1/2}$, where $C$ is a constant determined by the distribution of $X$. In other words, the central limit theorem has the large sample property (LSP), i.e., the quality of the approximation improves as the sample size $n$ becomes large. The LSP can also be established for the functional central limit theorem measured in the Lévy-Prokhorov distance [Borovkov & Sakhanenko (1980), Haeusler (1984), Kubilius (1985), Ferger (1994), Utev (1986)]. Moreover, Stein’s method can be used to estimate the errors of diffusion approximation [Barbour (1990)].
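The LSP of the normal approximation can be checked numerically in a simple case. The sketch below is only an illustration: it assumes the summands are Bernoulli with success probability 0.3 (our choice, not the paper's) and computes the exact Kolmogorov distance between the standardized binomial sum and the standard normal distribution for a few sample sizes. The function names are hypothetical.

```python
import math


def binom_pmf(n, p, k):
    """Binomial(n, p) probability of k, computed on the log scale for stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kolmogorov_error(n, p):
    """sup_x |P(standardized Binomial(n, p) <= x) - Phi(x)|.

    The supremum is attained at a jump point, either as a left limit
    or as the value at the jump, so both are checked at every k.
    """
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        phi = normal_cdf((k - mean) / sd)
        worst = max(worst, abs(cdf - phi))   # left limit at k
        cdf += binom_pmf(n, p, k)
        worst = max(worst, abs(cdf - phi))   # value at k
    return worst


# Assumed illustration: Bernoulli(0.3) summands, three sample sizes.
errors = [kolmogorov_error(n, 0.3) for n in (10, 100, 1000)]
```

Under these assumptions the computed errors shrink as the sample size grows, in line with the $n^{-1/2}$ rate of the Berry-Esseen bound.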
The Poisson law of small numbers, on the other hand, does not possess the LSP. More precisely, if the $X_i$’s are independent indicator random variables with $\mathbb{P}(X_i=1)=p_i$ for each $i$, then the total variation distance between the distribution of $\sum_{i=1}^n X_i$ and the Poisson distribution with mean $\lambda:=\sum_{i=1}^n p_i$ is of the order $\min\{1,\lambda^{-1}\}\sum_{i=1}^n p_i^2$ [Barbour & Hall (1984)]. In particular, if $p_i=p$ for all $i$, the order reduces to $\min\{np^2,p\}$, which equals $p$ once $np\ge 1$, so the quality of approximation does not improve when $n$ becomes large. This is due to the fact that a Poisson distribution has only one parameter while a normal distribution has two parameters. To recover the LSP, one has to introduce more parameters into the approximating distributions, e.g., signed compound Poisson measures, translated Poisson, compound Poisson, negative binomial and polynomial birth-death distributions [Presman (1983), Kruopis (1986), Čekanavičius (1997), Barbour & Xia (1999), Barbour & Choi (2004), Röllin (2007), Barbour, Chen & Loh (1992), Brown & Phillips (1999), Brown & Xia (2001)].
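The absence of the LSP can be seen numerically as well. The following sketch assumes identical success probabilities equal to 0.1 (an illustrative choice) and evaluates the total variation distance between the binomial distribution of the sum and the Poisson distribution with the same mean. The helper names are hypothetical.

```python
import math


def binom_pmf(n, p, k):
    """Binomial(n, p) probability of k; zero outside {0, ..., n}."""
    if k < 0 or k > n:
        return 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def poisson_pmf(lam, k):
    """Poisson(lam) probability of k, computed on the log scale."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def tv_binomial_poisson(n, p):
    """Total variation distance between Binomial(n, p) and Poisson(n * p).

    The sum is truncated well beyond n; for the parameters used here the
    Poisson mass above the cut-off is negligible.
    """
    lam = n * p
    upper = n + int(10 * math.sqrt(lam)) + 10
    return 0.5 * sum(abs(binom_pmf(n, p, k) - poisson_pmf(lam, k))
                     for k in range(upper + 1))


# Assumed illustration: fixed p = 0.1 and growing n.
tv_values = [tv_binomial_poisson(n, 0.1) for n in (10, 100, 1000)]
```

In this setting the distance stays at the same order of magnitude for all three sample sizes instead of tending to zero, which is the behaviour described above.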
If we consider point processes rather than nonnegative integer-valued random variables, the counterpart is the superposition of point processes. The pioneering work of Grigelionis [Grigelionis (1963)] demonstrates that the distribution of the superposition of independent sparse point processes on the carrier space converges weakly to a Poisson process distribution. The same phenomenon can be established for the superposition of dependent sparse point processes on a general carrier space [Goldman (1967), Jagers (1972), Brown (1978), Kallenberg (1983)]. The accuracy of Poisson point process approximation has been of considerable interest since the 1970s [Serfling (1975), Brown (1978)]. Stein’s method for Poisson process approximation was subsequently established in [Barbour (1988), Barbour & Brown (1992)] for estimating the approximation errors, and the method was further refined in [Brown, Weinberg & Xia (2000), Xia (2005b), Chen & Xia (2004)]. In the context of the aforementioned superposition of i.i.d. point processes, no error estimates were studied until the last decade [Schuhmacher (2005), Chen & Xia (2011)], and these studies show that the Poisson point process approximation to the superposition of i.i.d. point processes does not possess the LSP either. The aim of this note is to show that, by introducing more parameters into the approximating point process distribution, it is possible to recover an LSP in approximating the superposition of i.i.d. point processes.
Given that a Poisson point process on a compact metric space can be viewed as a Poisson number of i.i.d. points in the space, a natural step of introducing more parameters into the approximating point process is to replace the Poisson number by a random variable whose distribution is controlled by two or more parameters, such as the translated Poisson [Barbour & Choi (2004), Röllin (2007)], negative binomial [Brown & Phillips (1999)] and polynomial birth-death distributions [Brown & Xia (2001)]. The family of approximating distributions we will consider in this note is the polynomial birth-death process distributions introduced in [Xia & Zhang (2008)]. To quantify the difference between two point processes, as in [Schuhmacher (2005), Chen & Xia (2011)], we use the Wasserstein distance initiated in [Barbour & Brown (1992)]. The formal statement of the main result is given in Theorem 2.2. Several applications are provided in Section 3 to illustrate the order of convergence in the LSP. Section 4 is devoted to the proof of the main result.
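The construction described in this paragraph, a random number of i.i.d. points, is easy to sketch. The code below is a minimal illustration under our own assumptions: the carrier space is taken to be the unit interval, the locations are uniform, and the "richer" count is a translated Poisson with arbitrarily chosen shift. None of these choices comes from the paper and the function names are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for moderate lam."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_point_process(count_sampler, location_sampler, rng):
    """Draw a random count, then that many i.i.d. locations."""
    n = count_sampler(rng)
    return sorted(location_sampler(rng) for _ in range(n))


rng = random.Random(2024)

# Poisson process on [0, 1): Poisson(5) many uniform points (assumed parameters).
poisson_sample = sample_point_process(
    lambda r: sample_poisson(5.0, r), lambda r: r.random(), rng)

# Same locations, but the count now has two parameters (shift 2, Poisson mean 3).
translated_sample = sample_point_process(
    lambda r: 2 + sample_poisson(3.0, r), lambda r: r.random(), rng)
```

Only the count sampler changes between the two calls, which is the point of the paragraph: the extra parameters enter through the distribution of the number of points, not through their locations.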
2 Preliminaries and the main result
1. Point processes. For the reader’s convenience, in this part, we collect some basic concepts and facts, and introduce a partitional total variation distance for comparing point processes under a partition of the carrier space. The basic concepts needed for this note are point process, reduced Palm process [Kallenberg (1983), Chapter 10], the Wasserstein distance [Barbour & Brown (1992)] and partition [Xia & Zhang (2012)].
Let $\Gamma$ be a compact metric space with metric $d_0$ bounded by 1. Let $\mathcal{B}(\Gamma)$ be the Borel $\sigma$-algebra induced by $d_0$. A configuration $\xi$ on $\Gamma$ is a collection of finitely many particles located in $\Gamma$. Equivalently, it can be represented as a non-negative integer-valued finite measure on $\Gamma$. Denote by $|\xi|$ the total mass of a measure $\xi$. Therefore, we can write $\xi$ as $\sum_{i=1}^{|\xi|}\delta_{x_i}$, where $\delta_x$ is the Dirac measure at $x$. Let $\mathcal{H}$ be the set of all configurations on $\Gamma$, and $\mathcal{B}(\mathcal{H})$ be the $\sigma$-algebra generated by the mappings $\xi\mapsto\xi(B)$, $B\in\mathcal{B}(\Gamma)$ [Kallenberg (1983), p. 12]. A point process is a measurable mapping from a probability space into $(\mathcal{H},\mathcal{B}(\mathcal{H}))$. We use to stand for configurations, , to stand for point processes, and to stand for the laws of point processes.
Let be a point process with finite mean measure . The family of point processes are said to be the reduced Palm processes associated with if for any measurable function ,
[TABLE]
[Kallenberg (1983), Chapter 10]. Furthermore, suppose is finite, then one can define the second order reduced Palm processes associated with by
[TABLE]
for any measurable function [Kallenberg (1983), Chapter 12].
[Barbour & Brown (1992)] introduce a Wasserstein distance for quantifying the difference between two probability measures , on . The metric is defined in two stages. First, for two finite measures and on , define
[TABLE]
where is the normalized measure of , and . In particular, by the Kantorovich-Rubinstein duality theorem [Rachev (1991), Theorem 8.1.1], for two probability measures and on , , where are -valued -measurable random elements. For two configurations , we have the duality representation when and otherwise, where is taken over all permutations of . The metric is defined as
[TABLE]
where , and the last equality is due to the duality theorem [Rachev (1991), Theorem 8.1.1].
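For small configurations the first-stage distance can be evaluated by brute force. The sketch below follows the convention of [Barbour & Brown (1992)] as we understand it: the distance is 1 when the two configurations have different numbers of points, 0 when both are empty, and otherwise the minimum over permutations of the average distance between matched points. The names `d1` and `d0`, and the truncated absolute difference used as the underlying metric, are illustrative assumptions.

```python
from itertools import permutations


def d0(x, y):
    """Assumed underlying metric on the real line, truncated at 1."""
    return min(1.0, abs(x - y))


def d1(config_x, config_y, metric):
    """Distance between two finite configurations given as lists of points.

    Brute force over all permutations, so only suitable for a handful of points.
    """
    if len(config_x) != len(config_y):
        return 1.0
    n = len(config_x)
    if n == 0:
        return 0.0
    best = min(
        sum(metric(x, y) for x, y in zip(config_x, perm))
        for perm in permutations(config_y)
    )
    return best / n


example = d1([0.1, 0.9], [0.9, 0.2], d0)  # optimal matching pairs 0.1 with 0.2
```

For larger configurations the minimum over permutations is an assignment problem and would normally be solved with a dedicated algorithm rather than enumeration.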
For a partition of , where is a finite set, let , that is, is a point such that is as small as possible, . We call a center of . Let . We call an -partition of if . Denote all -partitions of by .
For any partition , we define an assembling mapping as
[TABLE]
The assembling mapping, when applied to a configuration , shifts all particles of in to its center . For a point process , we define the partitional total variation distance as
[TABLE]
where for two probability measures and on , We write
[TABLE]
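The effect of the assembling mapping can be illustrated on an interval. The sketch below assumes the carrier space is $[0,T]$, partitioned into equally wide cells whose centers are the midpoints, which is the kind of partition used later in the examples. The function name and the return format (a vector of counts per cell) are our own choices.

```python
def assemble(config, T, m):
    """Move every particle of `config` (points in [0, T]) to its cell center.

    Returns (centers, counts): the m midpoints and the number of particles
    assembled at each of them.
    """
    width = T / m
    centers = [(j + 0.5) * width for j in range(m)]
    counts = [0] * m
    for x in config:
        j = min(int(x / width), m - 1)  # the right end point T joins the last cell
        counts[j] += 1
    return centers, counts


# Assumed illustration: four particles on [0, 1], four cells of width 0.25.
points = [0.1, 0.3, 0.4, 1.0]
centers, counts = assemble(points, 1.0, 4)
```

After assembling, a configuration is described by its vector of cell counts, and no particle has moved further than half a cell width.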
2. Polynomial birth-death point process. As mentioned in the Introduction, there are various ways to introduce more parameters into the approximating point process for better accuracy of approximation. In this part, we collect the facts around the polynomial birth-death point process established in [Xia & Zhang (2008)].
For , , , we define the polynomial birth-death distribution introduced in [Brown & Xia (2001)] as
[TABLE]
where
[TABLE]
The distribution can be viewed as the equilibrium distribution of the birth-death process with birth rates and death rates . The polynomial birth-death point process is given by
[TABLE]
where are independent, , with being a probability measure on , . Denote $\mathscr{L}(\mathcal{Z})$ by $\boldsymbol{\pi}_{a,b;\beta;\mu}$.
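The equilibrium distribution of a birth-death process on the non-negative integers follows from detailed balance, and a short numerical sketch may help. The rate functions used below, birth rates $a+bk$ and death rates $k+\beta k(k-1)$, are an assumption on our part about the parametrisation, since the displayed formulas are not reproduced here. The state space is truncated at `max_state`, and the function names are hypothetical.

```python
def birth_death_equilibrium(birth_rate, death_rate, max_state):
    """Equilibrium probabilities on {0, ..., max_state} via detailed balance:
    pi(k) * death_rate(k) = pi(k - 1) * birth_rate(k - 1)."""
    weights = [1.0]
    for k in range(1, max_state + 1):
        weights.append(weights[-1] * birth_rate(k - 1) / death_rate(k))
    total = sum(weights)
    return [w / total for w in weights]


def pbd(a, b, beta, max_state):
    """Assumed polynomial birth-death rates: a + b*k and k + beta*k*(k-1)."""
    return birth_death_equilibrium(
        lambda k: a + b * k,
        lambda k: k + beta * k * (k - 1),
        max_state,
    )


def mean_and_variance(dist):
    """First two central moments of a distribution given as a list."""
    mean = sum(k * q for k, q in enumerate(dist))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(dist))
    return mean, var


poisson_like = pbd(2.0, 0.0, 0.0, 60)          # constant births, linear deaths
over_dispersed = pbd(2.0, 0.3, 0.0, 400)       # linear births, linear deaths
under_dispersed = pbd(2.0, 0.0, 0.1, 60)       # constant births, quadratic deaths
```

With these assumed rates, $b=\beta=0$ gives the Poisson distribution with mean $a$, a positive $b$ inflates the variance above the mean, and a positive $\beta$ pushes it below, which is how the extra parameters allow two moments to be matched.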
3. Main Result. Suppose are independent and identically distributed point processes. In each of the following two cases, we can give the polynomial birth-death point process approximation of the superposition under the Wasserstein distance . Denote
[TABLE]
[TABLE]
Case 1. If , we take and .
Case 2. Either or and , we set and
[TABLE]
Remark 2.1
Note that and
[TABLE]
we have and for .
In all cases, let
[TABLE]
where is the reduced Palm distribution of at . Our main result is as follows.
Theorem 2.2
For both cases above, there exists a constant , depending on , and such that for any ,
[TABLE]
for in Case 1 and in Case 2. In particular,
[TABLE]
valid for the same range of specified above, where is defined as
[TABLE]
Remark 2.3
The last terms of (4) and (6) are typically of order for a positive constant depending on the distribution of [Chung & Lu, Theorem 2.7] but the remaining terms of (4) and (6) are typically of order no better than .
Proposition 2.4
* is decreasing in .*
Proof. First, since for any partition . It follows that . Noting that when and when , we obtain .
Remark 2.5
Case 1 is known as over-dispersion [Faddy (1994)]. It is shown in [Brown, Hamza & Xia (1998)] that over-dispersion in statistics arising from natural phenomena is much more common than under-dispersion, i.e., .
Remark 2.6
Let be the normalized distribution of . Since is the normalized distribution of , we have
[TABLE]
Thus in Theorem 2.2 can be replaced by at the cost of being added to the upper bound.
3 Examples
In this section, we demonstrate the use of Theorem 2.2 in five applications: the Bernoulli process, the Bernoulli process with shifts, the compound Poisson process, the renewal process, and entrances and exits of a Markov process. For simplicity, except in subsection 3.3, we only consider point processes on the carrier space with . Extension to any compact carrier space is a straightforward exercise.
3.1 Bernoulli process
As a warm-up example, we consider a simple Bernoulli process , where are independent Bernoulli random variables with , and is a finite positive integer. This is a typical case where the actual support space of the point process is a subset of the carrier space and it reminds us that the partition technique should not be applied blindly. The following theorem is a generalisation of [Xia & Zhang (2008)] with the same order of convergence as that for the special case in [Xia & Zhang (2008)].
Theorem 3.1
For i.i.d. Bernoulli processes , if , let and be as defined in Case 2, we have
[TABLE]
Remark 3.2
The distances amongst play no role in the speed of convergence.
Proof of Theorem 3.1. The support of is a reduced carrier space , so it suffices to consider the reduced carrier space with partition and . With the partition , corresponds to the vector . Let be the sum of independent copies of , and be the vector with value at the -th component and $0$ otherwise. Then, by the independence,
[TABLE]
Noting that is the sum of independent Bernoulli random variables, we have , which implies
[TABLE]
Hence, it follows from (8) that the second term of (5) is bounded by .
3.2 Bernoulli process with shifts
The aim of this example is to show that we may use a marked point process to get better approximation bounds.
Similar to the previous subsection, we define , where are independent with having Bernoulli distribution with , taking values in , and is a fixed positive integer.
Theorem 3.3
If , let and be as defined in Case 2, then for Bernoulli processes with shifts ,
[TABLE]
Remark 3.4
If we apply a partition as introduced in the previous section directly, then the bound we may obtain is at most of order .
Proof. According to Remark 2.6, with , it suffices to show
[TABLE]
We embed into marked point processes [Daley & Vere-Jones (2008), pp. 194–195] and use Theorem 3.1 to complete the proof. To this end, we take fixed points with distances from each other and define a ground process [Daley & Vere-Jones (2008), p. 194] of as . The metric induces and in the same way as that generates and . The mean measure of is . Let be the same as those defined in Theorem 3.3 and set For Bernoulli processes , it follows from Theorem 3.1, Remark 3.2 and Remark 2.6 that
[TABLE]
Using the Kantorovich-Rubinstein duality theorem [Rachev (1991), Theorem 8.1.1] and decompositions of point processes [Kallenberg (1983), §2.1], we can find -valued random vectors and such that , $\mathscr{L}\left(\sum_{i=1}^{m}Y_{2i}\delta_{t_{i}}\right)=\boldsymbol{\pi}_{a,b;\beta;\nu^{\prime}}$ and
[TABLE]
We now use and as ground processes to construct marked point processes as suitable realisations of and $\boldsymbol{\pi}_{a,b;\beta;\nu}$. Let be independent copies of such that is independent of , define
[TABLE]
then , $\mathscr{L}\left(\mathcal{W}_{2}\right)=\boldsymbol{\pi}_{a,b;\beta;\nu}$, and
[TABLE]
Combining (10), (11) and (12) gives (9).
3.3 Compound Poisson process
[Barbour, Chen & Loh (1992)] and [Barbour & Månsson (2002)] demonstrate that a compound Poisson process is often good enough as a suitable asymptotic model for a variety of random phenomena. In this example, we show that the superposition of such a model can be well described by Theorem 2.2.
Recall that a compound Poisson process on a compact carrier space is defined as , where are independent Poisson processes with mean measures on respectively and we write .
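A quick moment computation shows why this model falls under the over-dispersed case. The sketch assumes the usual form of the definition, in which the $i$-th component Poisson process contributes clusters of size $i$, so that the total count is $\sum_i iN_i$ with independent Poisson counts $N_i$ whose means are the total masses of the corresponding mean measures. The function name and the sample masses are hypothetical.

```python
def compound_poisson_count_moments(masses):
    """Mean and variance of the total number of points.

    masses[i - 1] is the total mass of the mean measure of the i-th
    component Poisson process (cluster size i). For independent Poisson
    counts N_i with these means, sum_i i * N_i has
    mean sum_i i * masses[i - 1] and variance sum_i i**2 * masses[i - 1].
    """
    mean = sum(i * m for i, m in enumerate(masses, start=1))
    variance = sum(i * i * m for i, m in enumerate(masses, start=1))
    return mean, variance


# Assumed illustration: singles with mass 1.0, pairs with mass 0.5.
mean, variance = compound_poisson_count_moments([1.0, 0.5])
```

Whenever some cluster size larger than one carries positive mass, the variance strictly exceeds the mean.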
Theorem 3.5
If with and being absolutely continuous with respect to , then for i.i.d. compound Poisson processes , with chosen as in Case 1, we have
[TABLE]
Remark 3.6
Noting that the superposition , Theorem 3.5 states that, with suitably chosen parameters, $\boldsymbol{\pi}_{a,b;\beta;\mu}$ can be used to replace a compound Poisson process in the context of superposition of point processes.
Remark 3.7
The condition that is absolutely continuous with respect to guarantees aperiodicity of the distribution and it plays the crucial role in the theory of compound Poisson approximation in [Barbour, Chen & Loh (1992), Barbour & Utev (1998), Barbour & Utev (1999), Barbour & Månsson (2002), Xia (2005a)].
Remark 3.8
It can be observed from the proof below that better upper bounds are possible if more information about is available.
Proof of Theorem 3.5. Taking a reduced carrier space if necessary, without loss of generality, we assume equals the support of , that is, the smallest closed set such that . Since , Case 1 applies. Set , let be a partition and be a Poisson process on with mean measure , then
[TABLE]
where the last inequality is from Proposition A.2.7 in [Barbour, Holst & Janson (1992)]. Hence
[TABLE]
which implies, for arbitrary , . It then follows from (5) that
[TABLE]
completing the proof.
3.4 Renewal process
The superposition of renewal processes is not a renewal process unless they are Poisson processes [Feller (1968), p. 370], and the exact behaviour of the superposition is generally hard to extract. In this subsection, we establish its asymptotic behaviour.
Let , , , be independent non-negative random variables defined on a probability space . The variables , , are strictly positive and identically distributed, which play the role of inter-renewal times of the renewal process . We assume and choose the delay to make the renewal process stationary [Daley & Vere-Jones (2008), p. 75]. We define , which is the renewal point process restricted to [Kallenberg (1983), p. 12]. Before stating the result in this subsection, we briefly recall three terminologies. The support of a random variable is defined as the smallest closed set such that and, for two subsets of , and .
Theorem 3.9
Assume the renewal time satisfies
[TABLE]
and , then for renewal processes , with chosen as in Case 1 if and in Case 2 if , we have
[TABLE]
Remark 3.10
If , then it satisfies (13) and the bound in Theorem 3.9 holds.
Remark 3.11
The condition (13) is almost necessary. See Counterexample 3.13 below.
Remark 3.12
The condition cannot be easily deduced from the moments of . However, if we consider a sufficiently large carrier space, the asymptotic behaviour of the renewal process ensures that the condition can be verified through the first two moments of .
Proof of Theorem 3.9. For any , we take an such that . We divide into equally spaced intervals with so that , where and for . The centre of is and . Consequently, the first term in (5) is bounded by . With the partition , set , define and as the sum of independent copies of . Applying [Barbour, Luczak & Xia (2018), Lemma 4.1], we obtain
[TABLE]
where and is a universal constant. If , then the second term in (5) with is also dominated by for sufficiently large , which implies that the bound in (5) can be made arbitrarily small as . To establish , we make use of the assumption that the support of satisfies . Since is closed, and in , the operation is continuous, and are both closed, which means that . This in turn implies that there exists at least one such that both and are positive for all . It is also possible to find a such that for all , . For the convenience of argument, we extend the stationary renewal point process to on . For , if is small enough, the set
[TABLE]
has positive Lebesgue measure.
From stationarity, there is a positive probability that there is at least one point in , and conditional on the largest point in and the past, the renewal process has a positive probability for the future inter-renewal times , , to evolve as for all until time and it also guarantees a positive probability that the incoming inter-renewal times , , evolve as , , for until time . The choice of and synchronicity of and ensure that an extra renewal point caused by is added in , and the subsequent renewal points of the two renewal processes occur in the same partition sets simultaneously. Consequently, we can set aside a positive probability event such that on , the two renewal processes run together until the point in and then one runs according to and the other evolves as . Figure 1 shows the coupling when , , , , a renewal happens at around , with a positive probability, the next three inter-arrival times are each around ; with another positive probability, the incoming four inter-arrival times respectively take values around , , , .
For this coupling, the corresponding vectors and satisfy that on (in Figure 1, and ), which implies that for all . This concludes the proof.
Counterexample 3.13
If for some , then for some so the method does not work.
In fact, for large enough, when , there is one point of sitting in the interval almost surely. But when we have , there are no points in except in . On the other hand, for large enough, almost surely because . In this situation, , i.e., .
Remark 3.14
It is possible to extend Theorem 3.9 to the superposition of non-stationary renewal processes, provided there are user-friendly criteria for ensuring for all and .
3.5 Entrances or exits of a Markov process
Let be a time-reversible and irreducible Markov chain with finite state space . Let be a proper subset of . As the exit process from can be viewed as the entrance process of , we consider the entrance process to only. Let and for . Then the total number of entrances to in can be written as with , and the times of entrances form a point process with the convention when . Clearly, is almost surely finite.
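The entrance times can be read off a sample path directly. The sketch below assumes the path of the chain is available as a time-ordered list of `(jump_time, new_state)` pairs starting with the initial state at time 0, and that an entrance is a jump from outside the target set into it; whether a chain that starts inside the set counts as an entrance at time 0 is a convention, and here it does not. All names and the sample path are hypothetical.

```python
def entrance_times(path, target, horizon):
    """Times in (0, horizon] at which the path jumps from outside `target` into it.

    `path` is a list of (time, state) pairs sorted by time, where each pair
    records the state the chain occupies from that time until the next pair.
    """
    times = []
    for (_, prev_state), (t, state) in zip(path, path[1:]):
        if t > horizon:
            break
        if state in target and prev_state not in target:
            times.append(t)
    return times


# Assumed illustration: three states, target set {'b', 'c'}, horizon 3.0.
sample_path = [(0.0, 'a'), (0.4, 'b'), (0.9, 'c'), (1.5, 'a'), (2.2, 'c'), (3.1, 'b')]
entrances = entrance_times(sample_path, {'b', 'c'}, 3.0)
```

The jump from `'b'` to `'c'` at time 0.9 stays inside the target set and is therefore not an entrance, which is why the resulting point process is generally not a renewal process when the set has more than one state.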
Theorem 3.15
For i.i.d. entrance processes , with chosen as in Case 1,
[TABLE]
Remark 3.16
When is a single point set, forms a renewal process and Theorem 3.15 becomes a special case of Theorem 3.9. However, when contains more than one state, is no longer a renewal process.
Proof of Theorem 3.15. [Brown, Hamza & Xia (1998), Corollary 2] implies that , so Case 1 applies. The rest of the proof is essentially the same as that of Theorem 3.9. For any , we choose an such that . Let and , where and for . The centre of is at and . This partition ensures that the first term in (5) is bounded by . Set and define and as the sum of independent copies of . It follows from [Barbour, Luczak & Xia (2018), Lemma 4.1] that
[TABLE]
where and is a universal constant. It remains to show that . Since is irreducible, we can choose a state such that there is a positive probability of entering immediately after leaving . Let be the first time that the Markov chain enters , be the first time after to depart from , and be the first and second jump times of after time $0$. From the assumption that is finite and irreducible, we can conclude that and so and
[TABLE]
which in turn imply , as claimed.
4 The Proof of Theorem 2.2
The advantage of using $\boldsymbol{\pi}_{a,b;\beta;\mu}$ as the approximating distribution is that it can be considered as the unique stationary distribution of an -valued positive recurrent process with the generator
[TABLE]
see [Xia & Zhang (2012)] for more details. We use to stand for a birth-death point process with generator and initial configuration . For any bounded measurable function on , it can be shown that
[TABLE]
is well defined and is the solution of the Stein equation
[TABLE]
To estimate $d_{2}(\mathscr{L}(\mathcal{W}),\boldsymbol{\pi}_{a,b;\beta;\mu})$, it is equivalent to bound for all defined in Section 2. As can be expressed via the differences of , the successful application of Stein’s method hinges on sharp upper bounds of
[TABLE]
Let . Then it is shown in [Xia & Zhang (2012)] that
[TABLE]
Now, we are ready to prove Theorem 2.2.
Proof of Theorem 2.2. The inequalities (5) and (7) are due to the well-known concentration inequality, see [McDiarmid (1998), Theorem 2.7] and [Chung & Lu, Theorem 2.7]. Hence it remains to show (4) and (6).
Suppose . The “assembling mapping” ensures that for any configuration ,
[TABLE]
It follows that for any point process , which yields
[TABLE]
To compute the distance between and , we concentrate on the space and apply Stein’s method. Denote by the class of all configurations on . For any , let
[TABLE]
Then, with generator , we have a positive recurrent Markov process on . The unique stationary measure is $\tilde{\boldsymbol{\pi}}:=\boldsymbol{\pi}_{a,b;\beta;\tilde{\mu}}=\mathscr{L}(\mathscr{M}_{\mathscr{G}}\circ\mathcal{Z})$, where
[TABLE]
Denote
[TABLE]
For any , let be the unique solution of
[TABLE]
Then
[TABLE]
Now we concentrate on estimating . Let
[TABLE]
Then
[TABLE]
First of all, we can write via ’s. Namely, since are independent identically distributed,
[TABLE]
With the reduced Palm processes, one can write as
[TABLE]
We subtract in the first four terms and in the last one. Then, can be written via ’s, provided that the number of added is balanced with that of added. More precisely, we need
[TABLE]
which is equivalent to (2) and (3). With (2) and (3), we write via ’s. For example, the first term in (18) becomes
[TABLE]
The difference can be telescoped out as the sum of functions. Provided the number of is balanced with that of , one can further write via ’s or differences of two ’s. To this end, let and
[TABLE]
Then,
[TABLE]
provided that
[TABLE]
In both cases, and are taken to ensure the above equality.
To estimate , we decompose them into the sum of functions of the forms
[TABLE]
The bounds in the following lemma can be found in [Xia & Zhang (2012), pp. 3060-3061].
Lemma 4.1
For any point process , and , both and are bounded above by
[TABLE]
where is defined in (2) and is defined in (1).
To estimate , let , , . Then,
[TABLE]
Since is independent of , it follows that
[TABLE]
Similarly, we have
[TABLE]
Since , we have , where
[TABLE]
It is not difficult to check that in each of the two cases, has order , is a constant and has order . Hence has order . Let be a constant independent of such that , then for ,
[TABLE]
where is a constant. Using the fact that and have the same distribution, we combine (16) and (17) to conclude that
[TABLE]
Since is arbitrary, the proof of Theorem 2.2 is complete.
- [Barbour (1988)] Barbour, A. D. (1988) Stein’s method and Poisson process convergence. J. Appl. Probab. 25 (A), 175–184.
- [Barbour (1990)] Barbour, A. D. (1990) Stein’s method for diffusion approximations. Probab. Theory Related Fields 84, 297–322.
- [Barbour & Brown (1992)] Barbour, A. D. & Brown, T. C. (1992) Stein’s method and point process approximation. Stochastic Processes Appl. 43, 9–31.
- [Barbour, Chen & Loh (1992)] Barbour, A. D., Chen, L. H. Y. & Loh, W. L. (1992) Compound Poisson approximation for nonnegative random variables via Stein’s method. Ann. Probab. 20, 1843–1866.
- [Barbour & Choi (2004)] Barbour, A. D. & Choi, K. P. (2004) A non-uniform bound for translated Poisson approximation. Electron. J. Probab. 9, 18–36.
- [Barbour & Hall (1984)] Barbour, A. D. & Hall, P. (1984) On the rate of Poisson convergence. Math. Proc. Cambridge Philos. Soc. 95, 473–480.
- [Barbour, Holst & Janson (1992)] Barbour, A. D., Holst, L. & Janson, S. (1992) Poisson Approximation. Oxford Univ. Press.
