On Poisson approximations for the Ewens sampling formula when the mutation parameter grows with the sample size
Koji Tsukuda

Abstract
The Ewens sampling formula was first introduced in the context of population genetics by Warren John Ewens in 1972 and has since appeared in many other scientific fields. There are abundant approximation results associated with the Ewens sampling formula, especially when one of the parameters, the sample size $n$ or the mutation parameter $\theta$ (the scaled mutation rate), tends to infinity while the other is fixed. By contrast, the case in which $\theta$ grows with $n$ has been considered in relatively few works, although this asymptotic setup is also natural. In this paper, when $\theta$ grows with $n$, we advance the study of the asymptotic properties of the total number of alleles and of the counts of components in the allelic partition under the Ewens sampling formula, from the viewpoint of Poisson approximations.
Koji Tsukuda, Graduate School of Arts and Sciences, the University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902. Email: [email protected]
1 Introduction
For a positive integer , consider the sequence of nonnegative integer-valued random variables satisfying and for . For , let us denote and . This denotes the component counts in a random combinatorial structure of size . In the context of population genetics, Ewens (1972) introduced what is called the Ewens sampling formula
[TABLE]
as the distribution of the allelic partition in a sample of size from a population that follows the stationary distribution of the infinitely-many neutral allele model with scaled mutation rate , where is the rising factorial . See, for instance, Section 2.5 of Feng (2010) for its derivation and basic properties. Hereafter, we consider (1.1) as a model of . The unsigned Stirling number of the first kind () is the coefficient of in , and equals the number of permutations of elements with disjoint cycles. Hence, if (1.1) is assumed, the total number of alleles in the sample, in other words the total number of distinct cycles in a random permutation, follows the falling factorial distribution (Watterson, 1974a)
[TABLE]
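As a small numerical illustration, the sketch below evaluates the law of the total number of alleles exactly with rational arithmetic. It assumes that (1.2) takes its standard form $P(K_n = k) = |s(n,k)|\,\theta^k / \{\theta(\theta+1)\cdots(\theta+n-1)\}$, where we write $K_n$ for the total number of alleles and $|s(n,k)|$ for the unsigned Stirling number of the first kind. The function names are hypothetical. The Stirling numbers come from the standard recursion $|s(m+1,k)| = m\,|s(m,k)| + |s(m,k-1)|$, and the check confirms the known identity $E K_n = \sum_{i=1}^n \theta/(\theta+i-1)$.

```python
from fractions import Fraction


def unsigned_stirling_first(n):
    """Row [|s(n, 0)|, ..., |s(n, n)|] via |s(m+1, k)| = m |s(m, k)| + |s(m, k-1)|."""
    row = [1]  # |s(0, 0)| = 1
    for m in range(n):
        new = [0] * (m + 2)
        for k in range(m + 2):
            stay = m * row[k] if k <= m else 0   # element m+1 joins one of the existing cycles
            fresh = row[k - 1] if k >= 1 else 0  # element m+1 forms a new cycle
            new[k] = stay + fresh
        row = new
    return row


def total_alleles_pmf(n, theta):
    """Exact P(K_n = k), k = 0..n, under the Ewens sampling formula with parameter theta."""
    theta = Fraction(theta)
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i  # theta (theta + 1) ... (theta + n - 1)
    stirling = unsigned_stirling_first(n)
    return [stirling[k] * theta**k / rising for k in range(n + 1)]


theta = Fraction(3, 2)
pmf = total_alleles_pmf(10, theta)
mean = sum(k * p for k, p in enumerate(pmf))
print(float(mean), float(sum(theta / (theta + i) for i in range(10))))
```

Exact fractions keep the check free of rounding. For large $n$ one would work on the log scale in floating point instead.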
In this paper, we present asymptotic properties, especially Poisson approximations, of the total number of alleles and of the component counts when both $n$ and $\theta$ increase.
Beyond population genetics, the Ewens sampling formula has been widely applied in other fields such as ecology, disclosure risk assessment and nonparametric statistics. In addition, the laws of component counts in many random combinatorial structures are approximated by the Ewens sampling formula. For a general review and for an up-to-date review with discussion, we refer the reader to Chapter 41 of Johnson, Kotz and Balakrishnan (1997), written by S. Tavaré and W.J. Ewens, and to Crane (2016), respectively. For (1.1), (1.2) and related probabilistic models, many works have discussed asymptotic properties when one of $n$ and $\theta$ is fixed; see, for instance, Feng (2016). It is also natural to consider relations between the population size and the sample size. Since $\theta$ is proportional to the population size in the context of population genetics, Feng (2007) and Tsukuda (2017a) discussed the asymptotic behavior of the total number of alleles in the setting where $n$ and $\theta$ tend to infinity simultaneously. In this asymptotic setting, Feng (2007) established the large deviation principle and Tsukuda (2017a) derived asymptotic properties of the maximum likelihood estimator of $\theta$.
Following previous works, we set three major goals. Tsukuda (2017a) extended the asymptotic normality of the total number of alleles as $n \to \infty$ with $\theta$ fixed, which is due to Watterson (1974b), to the situation in which both $n$ and $\theta$ increase. The first goal of this paper is to discuss this result from the viewpoint of Poisson approximations. Moreover, Arratia, Barbour and Tavaré (1992) showed the Poisson process approximation of the small component counts as $n \to \infty$ with $\theta$ fixed, where the number of components considered is either fixed or grows with $n$, and Arratia, Stark and Tavaré (1995) established its total variation asymptotics. The second goal is to study the corresponding asymptotic results when $\theta$ grows with $n$. Furthermore, Hansen (1990) provided a functional central limit theorem for the Ewens sampling formula, and Arratia and Tavaré (1992) gave an elegant proof of it via the Poisson process approximation. Our third goal is to discuss extensions of this and related weak convergence results.
1.1 Notations
Consider sequences and . If , then we write . Let be a constant. If then we write , if then we write , and if then we write . Let and for any sequence , and let for any value . When we consider the limits of and simultaneously, we use the notation .
Let denote the coefficient of in the power series expansion of . Let denote the -th derivative of function . Let and denote the floor function and the ceiling function, respectively. Let be the gamma function and the digamma function. For real , denotes the positive part of .
The space is the set of càdlàg functions on endowed with the Skorokhod topology. The space is the set of equivalence classes of real-valued functions on that are square integrable with respect to the Lebesgue measure, endowed with the $L^2$ topology.
The total variation distance between the laws which random vectors and follow is denoted by . The convergence of to in probability and the weak convergence of to are denoted by and , respectively.
1.2 Asymptotic settings
Letting be a finite constant, we study the following asymptotic settings in this paper:
Case A: ; Case B: ; Case C: ;
Case C1: and ; Case C2: ; Case C3: .
This division was introduced in Tsukuda (2017a). Note that in Section 4 of Feng (2007), when does not converge to 0, the relation between $n$ and $\theta$ is divided into Cases A, B and C above and with fixed . Moreover, throughout this paper, we assume that $\theta$ does not decrease as $n$ increases.
Remark 1**.**
In Case C3, it holds that . Note that when , a case in which we are not interested since , it holds that . These convergences can be checked by showing convergence in mean.
1.3 Organization
In Section 2, we review the asymptotic results in the literature on the Ewens sampling formula that will be discussed in this paper. Before turning to probabilistic results, Section 3 provides some preliminary evaluations of sequences related to the mean of . Section 4 is devoted to Poisson approximations for and in Cases A and C, respectively. Section 5 discusses independent process approximations for in a Ewens partition. Section 6 presents functional central limit theorems for the Ewens sampling formula when $\theta$ grows with $n$. In addition, the Appendix collects some lemmas used in the proofs.
2 Results in the literature
2.1 Normal and Poisson approximations for
In the combinatorial context, it is worthwhile to know when typical distributions such as the normal, Poisson or other distributions asymptotically appear; see, for instance, Flajolet and Soria (1990). For the total number of alleles, which follows (1.2), Watterson (1974b) proved the following central limit theorem (CLT for short): For fixed $\theta$,
[TABLE]
as $n \to \infty$, where is a standard normal variable. A stronger result, the Poisson approximation for the total number of alleles, was stated by Arratia and Tavaré (1992): For fixed $\theta$,
[TABLE]
as $n \to \infty$, where is a Poisson variable with mean . Later, to improve the approximation accuracy, Yamato (2013) provided the following CLT, which adopts another standardization: For fixed $\theta$,
[TABLE]
as $n \to \infty$. Moreover, Yamato (2013) showed the approximation of the total number of alleles by a Poisson variable with an approximate mean: For fixed $\theta$,
[TABLE]
as $n \to \infty$, where is a Poisson variable with mean .
When $\theta$ grows with $n$, the standardization must be changed in many cases. Let , where and . Tsukuda (2017a) showed that
[TABLE]
where and is a Poisson variable with mean .
Remark 2**.**
Professor Shuhei Mano pointed out that the proof of Theorem 2 in Tsukuda (2017a) is incorrect in Case C1. In this remark, let us correct the error. As stated in the right-hand side of equation (14) of Tsukuda (2017a), it holds that where
[TABLE]
In Case C1, since , it holds that , and hence
[TABLE]
We thus have
[TABLE]
By using , the first term in the right-hand side of (2.6) is
[TABLE]
The second term in (2.6) is also because it holds that
[TABLE]
and that . Therefore , and, consequently,
Remark 3**.**
As a corollary to the large deviation principle for when , Feng (2007) provided the following weak law of large numbers in Corollary 4.1:
[TABLE]
and as with fixed . These laws of large numbers in Cases A, B and C can also be obtained directly from the calculation of ; see Proposition 2 of Tsukuda (2017a).
2.2 Independent process approximations for
Consider a sequence of independent Poisson variables with for and denote for a positive integer . Then, it is well-known that (1.1) can be derived from the conditioning relation
[TABLE]
see, for instance, Watterson (1974a). This means that the dependence in is induced by the condition . It is of interest whether the effect of this dependence vanishes asymptotically. This question was answered by Arratia, Barbour and Tavaré (1992), who showed that the small components can be approximated by independent Poisson variables: For any fixed positive integer , it holds that
[TABLE]
as . Note that (2.9) is equivalent to because both and are discrete.
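The conditioning relation (2.8) can be checked by brute force for small $n$. The sketch below assumes the standard form of (1.1), namely $P(C_j = c_j,\ 1 \le j \le n) = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_{j=1}^n (\theta/j)^{c_j}/c_j!$ on $\{\sum_j j c_j = n\}$, with independent $Z_j \sim \mathrm{Po}(\theta/j)$. Here $C_j$ and $Z_j$ are our notation. The optional tilt parameter `x` is an illustrative addition. With means $\theta x^j / j$, every configuration with $\sum_j j c_j = n$ picks up the same constant factor, so the conditional law is unchanged. All function names are hypothetical.

```python
import math


def partitions(n, max_part=None):
    """Yield each partition of n once, as a dict {part size j: multiplicity c_j}."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield {}
        return
    for j in range(min(n, max_part), 0, -1):
        for rest in partitions(n - j, j):
            counts = dict(rest)
            counts[j] = counts.get(j, 0) + 1
            yield counts


def esf_prob(counts, n, theta):
    """Ewens sampling formula probability of the count vector `counts`."""
    p = math.factorial(n) / math.prod(theta + i for i in range(n))
    for j, c in counts.items():
        p *= (theta / j) ** c / math.factorial(c)
    return p


def conditioned_poisson_prob(counts, n, theta, x=1.0):
    """P(Z = counts | sum_j j Z_j = n) for independent Z_j ~ Poisson(theta x^j / j), j = 1..n."""
    def joint(cs):
        p = 1.0
        for j in range(1, n + 1):
            lam = theta * x**j / j
            c = cs.get(j, 0)
            p *= math.exp(-lam) * lam**c / math.factorial(c)
        return p
    # The event {sum_j j Z_j = n} is the disjoint union over the partitions of n.
    return joint(counts) / sum(joint(cs) for cs in partitions(n))


n, theta = 8, 2.5
worst = max(abs(esf_prob(cs, n, theta) - conditioned_poisson_prob(cs, n, theta, x))
            for cs in partitions(n) for x in (1.0, n / (n + theta)))
print(worst)
```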
It is more interesting to consider the case that grows with . For positive integer , let us denote the total variation distance and the distance in the Wasserstein metric between and by and , respectively, that is,
[TABLE]
For these quantities, it holds that
[TABLE]
As for the Ewens sampling formula, is a convenient measure of approximations because a concrete construction, the Feller coupling, can be given. See Arratia, Barbour and Tavaré (1992, 2016). The Feller coupling is as follows: Let be a sequence of Bernoulli variables with for any . Then, the Ewens sampling formula (1.1) is given as the joint distribution of
[TABLE]
and
[TABLE]
for . Moreover, define
[TABLE]
for , then follows the independent Poisson distribution with mean for any . That is because the convergences in probability and for any yield that , and so (2.9) yields that for any . By using this construction, Arratia, Barbour and Tavaré (1992) proved the Poisson process approximation for growing with :
[TABLE]
if then
[TABLE]
Note that (2.11), (2.12) and (2.14) are non-asymptotic bounds. Lower bounds for the total variation distance, which complement (2.11), were given by Arratia, Barbour and Tavaré (1992): ; and by Barbour (1992): if then for some .
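Below is a minimal simulation sketch of the Feller coupling. It assumes the standard construction:

- independent $\xi_i \sim \mathrm{Bernoulli}(\theta/(\theta+i-1))$ for $i = 1, \dots, n$;
- a 1 appended at position $n+1$;
- $C_j$ equal to the number of gaps of length $j$ between consecutive 1s.

Since $\xi_1 = 1$ almost surely, the gaps tile $\{1,\dots,n\}$, so $\sum_j j C_j = n$. The number of 1s among $\xi_1,\dots,\xi_n$ is the total number of alleles. The function name and its interface are hypothetical.

```python
import random


def feller_coupling(n, theta, rng):
    """Sample component counts (C_1, ..., C_n) of the Ewens sampling formula via the Feller coupling.

    Independent xi_i ~ Bernoulli(theta / (theta + i - 1)), i = 1..n; a 1 is appended at
    position n + 1, and C_j is the number of gaps of length j between consecutive 1s.
    Returns a list `counts` with counts[j] = C_j (counts[0] is unused and stays 0).
    """
    ones = [i for i in range(1, n + 1) if rng.random() < theta / (theta + i - 1)]
    ones.append(n + 1)
    counts = [0] * (n + 1)
    for left, right in zip(ones, ones[1:]):
        counts[right - left] += 1
    return counts


rng = random.Random(2024)
counts = feller_coupling(20, 3.0, rng)
print(counts, "total number of alleles:", sum(counts))
```

Because one realization of the $\xi_i$ determines both the counts and the number of alleles, this construction is what makes the Poisson process bounds above computable on a common probability space.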
Another compelling result for evaluating is the leading term of , which was given by Arratia, Stark and Tavaré (1995) for general logarithmic assemblies. For the Ewens sampling formula, the statement is as follows: If then
[TABLE]
where . As it is stated in Corollary 4 of their paper, if and if then the leading term of is given by the first term in the right-hand side of (2.15).
2.3 Functional central limit theorems
The results by Arratia, Barbour and Tavaré (1992) provide an elegant way to derive asymptotic properties. Among others, by using (2.13), Arratia and Tavaré (1992) provided an alternative proof of the functional central limit theorem for the Ewens sampling formula which was originally proven by Hansen (1990): The random process
[TABLE]
converges weakly to in as , where is a standard Brownian motion. This approach has been generalized to broader logarithmic structures; see Arratia, Stark and Tavaré (1995) and Arratia, Barbour and Tavaré (2000). Moreover, by using the Poisson process approximation, Tsukuda (2017b) provided a weighted version in : Both of the random processes
[TABLE]
and
[TABLE]
converge weakly to in as , where is a positive constant.
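For concreteness, the helper below evaluates one path of the Hansen process at time $t$ from a vector of component counts. It assumes that (2.16) has the form in which Hansen's theorem is usually stated, $\{\sum_{j \le n^t} C_j - \theta t \log n\} / \sqrt{\theta \log n}$. It does not implement the weighted processes (2.17) and (2.18). The name and signature are hypothetical.

```python
import math


def hansen_process(counts, n, theta, t):
    """Evaluate (sum_{j <= n^t} C_j - theta t log n) / sqrt(theta log n) at t in [0, 1].

    counts[j] holds C_j for j = 1..n (counts[0] is unused). n ** t is computed in floating
    point, so floor(n ** t) can be off by one when n ** t is an integer in exact arithmetic.
    """
    m = math.floor(n ** t)
    partial = sum(counts[1:m + 1])
    return (partial - theta * t * math.log(n)) / math.sqrt(theta * math.log(n))


# Cycle type of a permutation of 8 elements with cycles of lengths 1, 1, 2 and 4.
counts = [0, 2, 1, 0, 1, 0, 0, 0, 0]
print([round(hansen_process(counts, 8, 1.0, t), 3) for t in (0.0, 0.5, 1.0)])
```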
Remark 4**.**
In the case that , the weak convergence of in was provided by DeLaurentis and Pittel (1985).
Let be the -th cycle length in a random permutation of which has disjoint cycles, and the loglength of -th cycle is defined by . Consider its empirical distribution function
[TABLE]
for . Define the random processes
[TABLE]
where is a positive constant. When , the weak convergence of to a standard Brownian bridge in was shown by DeLaurentis and Pittel (1985); see Notes (2) after Theorem in their paper. Its extension to the Ewens sampling formula and its version, which may not have appeared in the literature, are presented as follows.
Proposition 2.1**.**
*(i) The random process converges weakly to in as .
(ii) The random process converges weakly to in as .*
We omit its proof because we will present an extended version in Proposition 6.3. From Proposition 2.1, it follows from the continuous mapping theorem that
[TABLE]
as .
Remark 5**.**
As stated in DeLaurentis and Pittel (1985), Proposition 2.1 means that is close to , which is the distribution function of the standard uniform distribution.
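Below is a minimal sketch of the empirical distribution function of the loglengths. It assumes, as in DeLaurentis and Pittel (1985), that the loglength of a cycle of length $L$ is $\log L / \log n$ and that the empirical distribution is normalized by the number of cycles. Under these assumptions, the remark above says this function should be close to the identity on $[0, 1]$ for large $n$. The function name is hypothetical.

```python
import math


def loglength_edf(counts, n, x):
    """Proportion of cycles whose loglength log(L) / log(n) is at most x.

    counts[j] holds the number of cycles of length j (counts[0] is unused).
    """
    num_cycles = sum(counts)
    hits = sum(c for j, c in enumerate(counts) if j >= 1 and math.log(j) / math.log(n) <= x)
    return hits / num_cycles


counts = [0, 2, 1, 0, 1, 0, 0, 0, 0]  # n = 8: cycle lengths 1, 1, 2, 4
print([loglength_edf(counts, 8, x) for x in (0.0, 0.5, 1.0)])
```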
2.4 Auxiliary results
In this subsection, let us set out some auxiliary results concerning Poisson approximations which will be used in the proofs of our statements.
Consider a sequence of independent Bernoulli variables and its partial sum , where for any . Then, by using the Chen–Stein method, Theorems 1 and 2 of Barbour and Hall (1984) gave the sharp bound for the Poisson approximation for a partial sum of Bernoulli variables: For a Poisson variable with mean , it holds that
[TABLE]
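The Barbour–Hall upper bound can be checked numerically. We recall it in the following form: with $\lambda = \sum_i p_i$, and total variation taken as the supremum over events (half the $\ell^1$ distance between probability mass functions), $d_{TV}(W, \mathrm{Po}(\lambda)) \le \lambda^{-1}(1 - e^{-\lambda}) \sum_i p_i^2$. The sketch computes the exact law of $W$ by dynamic programming. The function names are hypothetical. The choice $p_i = \theta/(\theta+i-1)$ in the usage example corresponds to the Bernoulli representation of the total number of alleles used in the proof of Proposition 4.1.

```python
import math


def poisson_binomial_pmf(ps):
    """Exact pmf of W = sum of independent Bernoulli(p_i), by dynamic programming."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1 - p)
            new[k + 1] += q * p
        pmf = new
    return pmf


def tv_to_poisson(pmf, lam):
    """sup_A |P(W in A) - P(Po(lam) in A)|, i.e. half the l1 distance between the pmfs.

    Assumes lam is moderate, so that exp(-lam) does not underflow."""
    l1, poisson_mass = 0.0, 0.0
    q = math.exp(-lam)  # Poisson pmf at 0
    for k, p in enumerate(pmf):
        l1 += abs(p - q)
        poisson_mass += q
        q *= lam / (k + 1)  # Poisson pmf at k + 1
    l1 += max(0.0, 1.0 - poisson_mass)  # Poisson mass beyond the support of W
    return 0.5 * l1


def barbour_hall_upper(ps):
    """lam^{-1} (1 - exp(-lam)) sum_i p_i^2 with lam = sum_i p_i."""
    lam = sum(ps)
    return (1 - math.exp(-lam)) / lam * sum(p * p for p in ps)


theta, n = 2.0, 200
ps = [theta / (theta + i - 1) for i in range(1, n + 1)]
print(tv_to_poisson(poisson_binomial_pmf(ps), sum(ps)), barbour_hall_upper(ps))
```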
Moreover, from a property of the Hellinger integral, a bound for the total variation distance between two Poisson distributions was given in Theorem 2.1 of Yannaros (1991): For Poisson variables and with respective means and , it holds that
[TABLE]
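We do not reproduce the constant in (2.22) here. Instead, the sketch below evaluates $d_{TV}(\mathrm{Po}(\lambda), \mathrm{Po}(\mu))$ numerically and compares it with the elementary coupling bound $1 - e^{-|\lambda - \mu|}$. That bound follows from writing the Poisson variable with the larger mean as the other one plus an independent Poisson variable with mean $|\lambda-\mu|$. For close, large means, the computed distance is roughly of order $|\lambda - \mu| / \sqrt{\lambda}$, which a Hellinger-type bound can capture and the coupling bound cannot. The function names are hypothetical.

```python
import math


def poisson_pmf_upto(lam, kmax):
    """[P(Po(lam) = k) for k = 0..kmax], computed recursively (lam below about 700)."""
    pmf, q = [], math.exp(-lam)
    for k in range(kmax + 1):
        pmf.append(q)
        q *= lam / (k + 1)
    return pmf


def tv_poisson(lam, mu):
    """Numerical d_TV(Po(lam), Po(mu)) = (1/2) sum_k |p_k - q_k|, truncated far in the tail."""
    top = max(lam, mu)
    kmax = int(top + 20 * math.sqrt(top) + 50)  # the neglected tail mass is negligible
    p, q = poisson_pmf_upto(lam, kmax), poisson_pmf_upto(mu, kmax)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


for lam, mu in [(3.0, 3.5), (50.0, 50.5), (200.0, 200.5)]:
    print(lam, mu, round(tv_poisson(lam, mu), 4), round(1 - math.exp(-abs(lam - mu)), 4))
```

The coupling bound is the same for all three pairs, while the computed distance shrinks as the means grow.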
3 Preliminary results
Before discussing probabilistic results, let us show asymptotic evaluations on sums of sequences which will be used. Consider two sequences and given by
[TABLE]
Proposition 3.1**.**
(i) It holds that
[TABLE]
and that
[TABLE]
Especially, in Case A, it holds that
[TABLE]
and that if then
[TABLE]
(ii) It holds that
[TABLE]
and that
[TABLE]
Especially, in Case C, it holds that
[TABLE]
Proof. (i) Since
[TABLE]
the result (3.2) holds. Since
[TABLE]
the result (3.3) holds.
(ii) From
[TABLE]
and from
[TABLE]
the results (3.4) and (3.5) follow. This completes the proof.
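We assume the two sequences in (3.1) are $\sum_{i=1}^n \theta/(\theta+i-1)$ and $\sum_{i=1}^n (i-1)/(\theta+i-1)$, the means of the total number of alleles and of $n$ minus it. This assumption is consistent with the Bernoulli representation in the proof of Proposition 4.1. Under it, the sketch below compares the first sum with elementary integral-comparison bounds for the decreasing function $x \mapsto \theta/(\theta+x)$, namely $\theta \log(1+n/\theta) \le \sum \le 1 + \theta\log\{(\theta+n-1)/\theta\}$. These are not necessarily the evaluations of Proposition 3.1. The sketch also checks that for $\theta \gg n$ the second sum is close to $n(n-1)/(2\theta)$. The function names are hypothetical.

```python
import math


def mean_alleles(n, theta):
    """sum_{i=1}^n theta / (theta + i - 1)."""
    return sum(theta / (theta + i) for i in range(n))


def mean_complement(n, theta):
    """sum_{i=1}^n (i - 1) / (theta + i - 1)."""
    return sum(i / (theta + i) for i in range(n))


def integral_bounds(n, theta):
    """Integral comparison for the decreasing map x -> theta / (theta + x) on [0, n]."""
    lower = theta * math.log((theta + n) / theta)
    upper = 1 + theta * math.log((theta + n - 1) / theta)
    return lower, upper


for n, theta in [(10, 0.5), (1000, 30.0), (50, 1e4)]:
    print(n, theta, integral_bounds(n, theta), mean_alleles(n, theta))
```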
Proposition 3.2**.**
In Case A, it holds that
[TABLE]
Proof. It follows from
[TABLE]
that the left-hand side of (3.6) is . Since
[TABLE]
and as , it holds that
[TABLE]
In Case A, the first term in the right-hand side is . This completes the proof.
Remark 6**.**
As stated in (2.3), Yamato (2013) discussed the asymptotic normality of standardized by , which means that is approximated by from (3.7). If , the bound in (3.6) is too crude to be useful for the CLT. On the other hand, if , the centering by is better than the centering by , which was used in Corollary 2 of Tsukuda (2017a), because .
Proposition 3.3**.**
In Case C, it holds that
[TABLE]
Proof. The triangle inequality yields that
[TABLE]
The first term is , the second term is , and from as the third term is
[TABLE]
This completes the proof.
4 Poisson approximations for the total number of alleles
Introduce two Poisson variables and whose means are given by and , respectively, where follows (1.2) and where and are given in (3.1). Poisson approximations corresponding to (2.2) are given in the following proposition.
Proposition 4.1**.**
(i) In Case A,
[TABLE]
and
[TABLE]
(ii) In Case C,
[TABLE]
and
[TABLE]
Proof. Let and be sequences of Bernoulli variables with respective parameters and for . Then, it holds that and that . To prove the desired results, we will use (2.21) and Proposition 3.1.
(i) The result (4.1) follows from
[TABLE]
Since , it holds that
[TABLE]
for sufficiently large $n$. The two displays above imply .
(ii) The result (4.2) follows from
[TABLE]
In Case C1, since , it holds that
[TABLE]
for sufficiently large $n$. The two displays above imply . In Case C2, since and since is bounded by some constant for sufficiently large $n$, the same evaluation provides . In Case C3, since , it holds that and that
[TABLE]
for sufficiently large $n$. We thus have . This completes the proof.
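The contrast in Proposition 4.1 between the two Poisson approximations can be seen numerically. The sketch below uses the Bernoulli representation from the proof: success probabilities $\theta/(\theta+i-1)$ for the total number of alleles and $(i-1)/(\theta+i-1)$ for $n$ minus it. It computes both total variation distances exactly for $\theta = n^2$, a regime in which $\theta$ grows much faster than $n$, as in Case C. The helper names are hypothetical.

```python
import math


def poisson_binomial_pmf(ps):
    """Exact pmf of a sum of independent Bernoulli(p_i) variables (dynamic programming)."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1 - p)
            new[k + 1] += q * p
        pmf = new
    return pmf


def tv_to_poisson(pmf, lam):
    """Half the l1 distance between pmf and Po(lam), including the Poisson tail mass."""
    l1, mass, q = 0.0, 0.0, math.exp(-lam)
    for k, p in enumerate(pmf):
        l1 += abs(p - q)
        mass += q
        q *= lam / (k + 1)
    return 0.5 * (l1 + max(0.0, 1.0 - mass))


def compare_poisson_fits(n, theta):
    """Return (d_TV(K_n, Po(E K_n)), d_TV(n - K_n, Po(E(n - K_n))))."""
    p_alleles = [theta / (theta + i) for i in range(n)]  # P(xi_i = 1), i = 1..n
    p_complement = [i / (theta + i) for i in range(n)]   # P(xi_i = 0), i = 1..n
    tv_alleles = tv_to_poisson(poisson_binomial_pmf(p_alleles), sum(p_alleles))
    tv_complement = tv_to_poisson(poisson_binomial_pmf(p_complement), sum(p_complement))
    return tv_alleles, tv_complement


n = 100
print(compare_poisson_fits(n, float(n * n)))
```

When $\theta \gg n$, almost every $\xi_i$ equals 1. The total number of alleles is then nearly degenerate near $n$, far from any Poisson law with a mean that large, while $n$ minus it is a sum of rare events and is close to Poisson.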
Remark 7**.**
From asymptotic properties of the Poisson distribution and Proposition 4.1, the result of (2.5) in Cases A and C can be derived.
In Proposition 4.1, we have considered Poisson variables with rigorous means and . Next, let us discuss centerings by approximate means presented by Yamato (2013) and Tsukuda (2017a) from the viewpoint of Poisson approximation. Introduce three Poisson variables and whose means are given by , and respectively.
Lemma 4.2**.**
(i) In Case A, it holds that
[TABLE]
and that
[TABLE]
(ii) In Case C, it holds that
[TABLE]
Proof. We will use (2.22). (i) First we show (4.3). Since and tend to infinity in Case A, for sufficiently large $n$. Moreover, by using Proposition 3.1, it holds that
[TABLE]
and hence (4.3).
Next we see (4.4). By using Propositions 3.1 and 3.2, it holds that
[TABLE]
and hence (4.4).
(ii) First consider Case C1. Since and tend to infinity in Case C1, for sufficiently large $n$. By using Proposition 3.1, it holds that
[TABLE]
and hence (4.5) holds as .
Next consider Case C2. The ordering of and is not determined, but they have the same bound because . Hence (4.5) holds as .
Finally, consider Case C3. Since and tend to 0, for sufficiently large $n$. By using Proposition 3.1, it holds that
[TABLE]
and hence (4.5) holds as . This completes the proof.
From what has already been proven, the triangle inequality yields the following Poisson approximations corresponding to (2.4).
Proposition 4.3**.**
(i) In Case A, if then
[TABLE]
and if then
[TABLE]
Moreover, in Case A, if then
[TABLE]
and if and then
[TABLE]
(ii) In Case C, it holds that
[TABLE]
5 On independent process approximation of component counts
5.1 Case A
First we study the asymptotic independence of small components in Case A when for some , recalling that $\theta$ is assumed not to decrease as $n$ increases. We do not discuss the other case, for all , because we are interested in large $\theta$.
Consider defined in Subsection 2.2. Let us denote for , and for and . It follows from the conditioning relation (2.8) that
[TABLE]
where and .
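For moderate $n$, the total variation distance between $(C_1,\dots,C_b)$ and $(Z_1,\dots,Z_b)$ can be evaluated exactly. The sketch below uses the standard consequence of (2.8) that $P(C_{[1,b]} = a) = P(Z_{[1,b]} = a)\,P(T_{bn} = n - w)/P(T_{0n} = n)$, where $w = \sum_{j \le b} j a_j$ and $T_{bn} = \sum_{j=b+1}^n j Z_j$. The distance therefore depends on $a$ only through $w$. The law of each $T$ comes from the compound Poisson recursion $m\,P(T = m) = \theta \sum_{j \in J,\, j \le m} P(T = m - j)$. This is an illustration under the stated assumptions, not the argument of Proposition 5.1. The notation and function names are hypothetical, and the probabilities underflow once $\theta \sum_{j \le n} 1/j$ exceeds roughly 700.

```python
import math


def compound_poisson_pmf(theta, parts, m_max):
    """pmf of T = sum_{j in parts} j Z_j on {0, ..., m_max}, with independent Z_j ~ Po(theta / j).

    Uses the recursion m P(T = m) = theta * sum_{j in parts, j <= m} P(T = m - j).
    Underflows once theta * sum_{j in parts} 1/j exceeds roughly 700.
    """
    parts = list(parts)
    p = [0.0] * (m_max + 1)
    p[0] = math.exp(-theta * sum(1.0 / j for j in parts))
    for m in range(1, m_max + 1):
        p[m] = theta / m * sum(p[m - j] for j in parts if j <= m)
    return p


def tv_small_components(n, b, theta):
    """d_TV between (C_1, ..., C_b) under the ESF and independent (Z_1, ..., Z_b), Z_j ~ Po(theta / j)."""
    t_small = compound_poisson_pmf(theta, range(1, b + 1), n)      # law of T_{0b}
    t_large = compound_poisson_pmf(theta, range(b + 1, n + 1), n)  # law of T_{bn}
    p_total = compound_poisson_pmf(theta, range(1, n + 1), n)[n]   # P(T_{0n} = n)
    tv = max(0.0, 1.0 - sum(t_small))  # configurations with weight w > n have P(C = a) = 0
    for w in range(n + 1):
        ratio = t_large[n - w] / p_total
        tv += t_small[w] * max(0.0, 1.0 - ratio)
    return tv


n, theta = 60, 2.0
print([round(tv_small_components(n, b, theta), 4) for b in (1, 5, 20)])
```

As a consistency check, the recursion reproduces the closed form $P(T_{0n} = n) = e^{-\theta \sum_{j \le n} 1/j}\,\theta(\theta+1)\cdots(\theta+n-1)/n!$.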
Proposition 5.1**.**
In Case A, if for some and if , then for any with any fixed positive integer .
Proof. From (5.1), in order to prove the desired result, it suffices to show that
[TABLE]
We first calculate . Letting , we have
[TABLE]
see equation (5) of Arratia, Barbour and Tavaré (1992).
Let be a positive integer such that . It holds that
[TABLE]
where
[TABLE]
Since the right-hand side of (5.2) is
[TABLE]
the first term and the second term will be evaluated in Lemma 5.2 and Lemma 5.3, respectively. From Lemma 5.2, the expression in the brackets of the first term is . Next we see . It follows from Lemma 5.3 that
[TABLE]
where and with constants such that . By letting be a positive constant, the right-hand side is for any positive since is fixed and since . Hence .
Now we have
[TABLE]
and, as a result,
[TABLE]
On the other hand,
[TABLE]
If , and so
[TABLE]
If , Lemma A.1 and the Stirling formula yield that
[TABLE]
and hence
[TABLE]
From what has already been proven, we obtain
[TABLE]
This completes the proof.
In the following lemma, we see the first term of (5.3).
Lemma 5.2**.**
Let . For , for and for any positive integers and , it holds that
[TABLE]
Moreover, in Case A, if , , for some and , then
[TABLE]
Proof. Let be . It holds that
[TABLE]
for . Thus, for ,
[TABLE]
For , the Faà di Bruno formula yields that
[TABLE]
where is the partial Bell polynomial, so
[TABLE]
for any . By using the triangle inequality,
[TABLE]
where is the Stirling number of the second kind. The above two displays and the triangle inequality imply that
[TABLE]
For , the Stirling formula yields that
[TABLE]
where we have used and . We thus have
[TABLE]
for , which is (5.8).
Next we prove (5.9). If for all , the result is obvious because the left-hand side of (5.9) is 1. Otherwise, by letting be a positive integer such that , the desired result follows from
[TABLE]
and from
[TABLE]
This completes the proof.
The following lemma is used to evaluate in (5.3).
Lemma 5.3**.**
Let and let
[TABLE]
Then, for any positive integers and , it holds that
[TABLE]
where , , , and is an arbitrary positive constant.
Proof. Consider a complex variable . Since and are analytic in , by using the Cauchy inequality for coefficients, it holds that
[TABLE]
The right-hand side is
[TABLE]
because
[TABLE]
where we have used Lemma A.2 for the second inequality. Hence, it follows from the Cauchy inequality again that
[TABLE]
This completes the proof.
Let us provide some remarks on Proposition 5.1.
Remark 8**.**
Proposition 5.1 indicates that when the components of are asymptotically independent, and asymptotically follows the Poisson distribution with mean for . As a consequence, for any fixed , if then
[TABLE]
and if then
[TABLE]
where is a -dimensional standard normal variable with independent coordinates.
Remark 9**.**
Proposition 5.9 below is stronger than Proposition 5.1, but the proof is included because some evaluations are different from the proof of Theorem 1 of Arratia, Barbour and Tavaré (1992) who used the Darboux lemma (see for instance Theorem of Knuth and Wilf (1989)), and because Lemmas 5.2 and 5.3 will be used in the proof of Theorem 5.8 below.
In Proposition 5.1, is assumed. The following proposition shows that this assumption is necessary for the approximation of by Poisson variables .
Proposition 5.4**.**
In Case A, if for some , then for any with any fixed positive integer only if .
Proof. To prove the assertion, we see the case that . Let , then we have . From (5.3), equals
[TABLE]
Since
[TABLE]
from the proof of Proposition 5.1, it is enough to show that
[TABLE]
only if .
Since is assumed not to decrease as increases, we study the following three cases: (i) for some ; (ii) for all and for some ; (iii) for all . First, consider (i). Let be a positive integer such that . Then, it holds that
[TABLE]
where we have used Lemma A.3 for the second inequality. From the binomial theorem, the right-hand side is equal to
[TABLE]
The above display is not less than 1 and converges to 1 only if . Second, consider (ii). Let be a positive integer such that . Then, it holds that
[TABLE]
which converges to 1 only if . Finally, consider (iii). Let be a positive integer such that . Then, it holds that
[TABLE]
This completes the proof.
Hence, we have the following corollary to Propositions 5.1 and 5.4.
Corollary 5.5**.**
In Case A, if for some , then for any with any fixed positive integer if and only if .
Subsequently, let us derive the result corresponding to (2.15), following a programme similar to that of Arratia, Stark and Tavaré (1995). It follows from (2.8) that
[TABLE]
see (50) of Arratia, Stark and Tavaré (1995). Firstly, via the large deviation inequality, we see that is approximated by
[TABLE]
with
[TABLE]
From the definition, if then and otherwise . In contrast to Arratia, Stark and Tavaré (1995), includes since we consider , but a similar treatment works.
Lemma 5.6**.**
In Case A, with , it holds that
[TABLE]
for any positive .
Proof. The first inequality is obvious, so we see the latter one. From Lemma 8 of Arratia, Stark and Tavaré (1995), for any , it holds that
[TABLE]
If then, by putting , the right-hand side of (5.12) is
[TABLE]
which tends to minus infinity faster than for any positive . If then, by putting , the right-hand side of (5.12) is
[TABLE]
which tends to minus infinity faster than for any positive . This completes the proof.
The next lemma shows that is approximately .
Lemma 5.7**.**
In Case A, if then it holds that
[TABLE]
for and for any positive .
Proof. From the Schwarz inequality, it follows that
[TABLE]
where we have used for the second inequality. Lemma 5.6 yields that for any positive . This completes the proof.
The following result is an extension of (2.15) to the setup with large $\theta$.
Theorem 5.8**.**
In Case A, if for some and , then
[TABLE]
for
[TABLE]
In addition, when , it holds that .
Proof. Let be a positive integer such that . Since it follows from Lemma 5.6 that
[TABLE]
for any positive , we see the first term.
Let . For , we have
[TABLE]
and the last term should be evaluated for and growing with .
If does not diverge, as seen in the proof of Proposition 5.1, since . Hence, we consider the case that . Using Lemma 5.3 with , we have
[TABLE]
Since and since , the right-hand side is asymptotically equal to
[TABLE]
where and . From (5.14), the right-hand side is
[TABLE]
where we have used . The right-hand side is for any positive constant . Consequently, we have even when .
Now we have
[TABLE]
This expansion, , and (5.5)–(5.7) yield that
[TABLE]
Since and which follow from and (5.14), the binomial expansion and Lemma A.1 yield that
[TABLE]
and hence
[TABLE]
Therefore, it holds that
[TABLE]
From what has already been proven, it holds that
[TABLE]
where we have used Lemma 5.7 in the fourth equality and the relation , which follows from , in the fifth equality.
Finally, consider the case that . It follows from the Jensen inequality that
[TABLE]
which implies . This completes the proof.
Next we discuss the Poisson process approximation via the Feller coupling (see Subsection 2.2). The following result follows directly from (2.14).
Proposition 5.9**.**
Suppose that for all and . In Case A, if, and only if, . In addition, when , it holds that
[TABLE]
Remark 10**.**
When , Theorem 5.8 and Proposition 5.9 yield that
[TABLE]
with , which shows that the asymptotic decay rates of and are different.
Let be the -th shortest cycle length in a Ewens partition, that is,
[TABLE]
for and when there is no such . See Section 2E of Arratia and Tavaré (1992). The following statement is a direct corollary to Theorem 5.8.
Corollary 5.10**.**
Let be a positive integer such that and let . Under the assumption of Proposition 5.9,
[TABLE]
Proof. Proposition 5.9 yields that
[TABLE]
This completes the proof.
Remark 11**.**
Corollary 5.10 yields that, under the assumption of Proposition 5.9, , so if then and if then . Note that when the Pitman sampling formula is considered, the shortest cycle length converges to 1 in probability except in the case of the Ewens sampling formula (see Mano (2017)).
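The $r$-th shortest cycle length is a simple functional of the component counts. The sketch below computes it from a count vector. As a heuristic in the spirit of the independent Poisson approximation, it also evaluates $P(\text{shortest cycle} > l) = P(C_1 = \dots = C_l = 0)$ with each $C_j$ replaced by an independent $\mathrm{Po}(\theta/j)$ variable, which gives $\exp(-\theta \sum_{j \le l} 1/j)$. Returning `math.inf` when fewer than $r$ cycles exist is our convention, not necessarily the one in the paper. The names are hypothetical.

```python
import math


def rth_shortest(counts, r):
    """Length of the r-th shortest cycle, given counts[j] = number of cycles of length j.

    Returns math.inf when there are fewer than r cycles (a convention chosen here)."""
    seen = 0
    for j in range(1, len(counts)):
        seen += counts[j]
        if seen >= r:
            return j
    return math.inf


def poisson_tail_shortest(theta, l):
    """Heuristic P(shortest cycle > l), with C_1, ..., C_l replaced by independent Po(theta / j)."""
    return math.exp(-theta * sum(1.0 / j for j in range(1, l + 1)))


counts = [0, 2, 1, 0, 1, 0, 0, 0, 0]  # n = 8: cycle lengths 1, 1, 2, 4
print([rth_shortest(counts, r) for r in range(1, 6)])
```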
The uniform bound with respect to , which gives an extension of (2.13), is given in the following proposition. Its applications to functional central limit theorems will be presented in the next section.
Proposition 5.11**.**
In Case A,
[TABLE]
Proof. By using the triangle inequality and (2.12), it holds that
[TABLE]
for any ; see the proof of Theorem 2 of Arratia, Barbour and Tavaré (1992). When , by setting , the first and third terms in (5.17) are and , respectively. Otherwise, by setting the result holds with the bound . This completes the proof.
5.2 Case C
The probability mass function (1.1) is obtained from the conditioning relation (2.8) with a sequence of independent Poisson variables with respective means . We also get (1.1) from (2.8) with Poisson variables with respective means (see, for instance, Watterson (1974a)). In Case C, the following lemma shows that fit better.
Lemma 5.12**.**
In Case C,
[TABLE]
Therefore, for , if , then
[TABLE]
Proof. It holds that
[TABLE]
which is (2.18) of Watterson (1974a). Since the Stirling formula as yields that as for any , it holds that
[TABLE]
Hence, the result (5.18) follows from . This completes the proof.
Remark 12**.**
Since
[TABLE]
and
[TABLE]
it holds that
[TABLE]
for .
According to Lemma 5.12, it may seem natural to expect that and a Poisson variable with mean are asymptotically similar, but Proposition 3.3 indicates that, except in Case C3, an independent process approximation by Poisson variables with means seems difficult in the sense of the joint distribution. Indeed, the following theorem shows that in Case C2 the linear relation between and persists asymptotically.
Theorem 5.13**.**
(i) In Case C2,
[TABLE]
*where is a Poisson variable with , and .
(ii) In Case C3,*
[TABLE]
Proof. (i) It follows from (5.19) that
[TABLE]
which implies that
[TABLE]
It yields that and so
[TABLE]
These displays yield that
[TABLE]
From this, we obtain
[TABLE]
We conclude from (2.5), which means , that
[TABLE]
hence that . Moreover, from what has already been proven, we obtain .
(ii) Since
[TABLE]
Lemma 5.12 yields the result. This completes the proof.
Theorem 5.13 directly implies the following corollaries which represent properties of the shortest cycle length and the longest cycle length in a Ewens partition. These extreme sizes are of interest in the combinatorial context, see for instance Mano (2017).
Corollary 5.14**.**
In Case C2 or C3,
[TABLE]
Proof. In Case C3, the conclusion is obvious, so we see Case C2. It holds that
[TABLE]
This completes the proof.
Corollary 5.15**.**
In Case C2, for a positive integer
[TABLE]
Proof. For , it holds that
[TABLE]
For , it holds that This completes the proof.
Remark 13**.**
Since the law of the singleton in a Ewens partition is given by
[TABLE]
for , Corollary 5.15 directly follows from
[TABLE]
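One standard route to the law of the singletons is inclusion–exclusion over factorial moments. The sketch below assumes Watterson's (1974a) factorial moments, $E[(C_j)_r] = (\theta/j)^r \frac{n!}{(n-jr)!}\frac{\Gamma(\theta+n-jr)}{\Gamma(\theta+n)}$ for $jr \le n$ and 0 otherwise. It then uses $P(C_j = k) = \frac{1}{k!}\sum_{l \ge 0} \frac{(-1)^l}{l!} E[(C_j)_{k+l}]$. For $\theta = 1$ and $j = 1$ this recovers the classical derangement probability. The function names are hypothetical. The alternating sum loses precision when $\theta$ is large, which is exactly the regime of Case C2, so there one would switch to exact arithmetic.

```python
import math


def factorial_moment(n, j, r, theta):
    """E[C_j (C_j - 1) ... (C_j - r + 1)] under the Ewens sampling formula (0 if j r > n)."""
    if j * r > n:
        return 0.0
    log_value = (r * math.log(theta / j)
                 + math.lgamma(n + 1) - math.lgamma(n - j * r + 1)
                 + math.lgamma(theta + n - j * r) - math.lgamma(theta + n))
    return math.exp(log_value)


def count_pmf(n, j, k, theta):
    """P(C_j = k) by inclusion-exclusion over factorial moments."""
    total = 0.0
    for l in range(n // j - k + 1):
        total += (-1) ** l / math.factorial(l) * factorial_moment(n, j, k + l, theta)
    return total / math.factorial(k)


n, theta = 12, 3.0
print([round(count_pmf(n, 1, k, theta), 4) for k in range(4)])
```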
6 Functional central limit theorems
As a corollary to Proposition 5.11, functional central limit theorems extending the results of Hansen (1990) and Tsukuda (2017b) follow. Before presenting them, let us state, again as a corollary to Proposition 5.11, error bounds for Poisson process approximations in terms of the expected error in the supremum norm and in the norm.
Corollary 6.1**.**
In Case A,
[TABLE]
and
[TABLE]
Proof. The result (6.1) follows from
[TABLE]
and Proposition 5.11. The result (6.2) follows from
[TABLE]
(for this evaluation see the proof of Lemma 3.1 of Tsukuda (2017b)) and from Proposition 5.11. This completes the proof.
By using Corollary 6.1, we provide functional central limit theorems which slightly extend the preceding result in which is assumed to be fixed.
Proposition 6.2**.**
(i) In Case A, if
[TABLE]
*then the random process defined in (2.16) converges weakly to a standard Brownian motion in .
(ii) In Case A, if*
[TABLE]
then both of the random processes and , which are defined in (2.17) and (2.18) respectively, converge weakly to in .
Proof. (i) From (6.1) and the assumption (6.3), it follows that
[TABLE]
By using the functional central limit theorem for Poisson processes in , the random process
[TABLE]
converges weakly to a standard Brownian motion in (see the proof of Theorem 5 of Arratia and Tavaré (1992)). Since
[TABLE]
the random process
[TABLE]
converges weakly to in because of the assumption (6.3). From (6.5) and the weak convergence of (6.6), Theorem 2.7 (iv) of van der Vaart (1998) yields the result.
(ii) First, we consider . From (6.2) and (6.4), it follows that
[TABLE]
It holds that
[TABLE]
where is a homogeneous Poisson process with unit intensity. Since
[TABLE]
and the other hypotheses hold with , and (see Subsection 6.2 of Tsukuda (2017b)), Lemma A.4 in the Appendix implies that (6.7) converges weakly to in . From what has already been proven, Theorem 2.7 (iv) of van der Vaart (1998) yields the result.
Next, we consider .
[TABLE]
which follow from almost the same argument as in the proof of Theorem 7.1 of Tsukuda (2017b) together with the assumption (6.4), we have
[TABLE]
By Lemma A.4 in the Appendix with , and , the random process
[TABLE]
converges weakly to in . Consequently, the desired result follows. This completes the proof.
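The proofs of Propositions 6.2 and 6.3 invoke the functional central limit theorem for Poisson processes: if N is a homogeneous Poisson process with unit intensity, then (N(λt) − λt)/√λ, t ∈ [0, 1], converges weakly to a standard Brownian motion as λ → ∞. The Monte Carlo sketch below simulates such paths from exponential inter-arrival times. The name `standardized_poisson_path`, the value of `lam` and the time grid are illustrative choices, and checking the mean and variance at t = 1 only probes a one-dimensional marginal, not weak convergence of the whole path.

```python
import math
import random


def standardized_poisson_path(lam, grid, rng):
    """Evaluate (N(lam * t) - lam * t) / sqrt(lam) on an increasing grid of times t.

    N is a homogeneous Poisson process with unit intensity, simulated from
    i.i.d. Exp(1) inter-arrival times.
    """
    path = []
    arrivals = 0
    next_arrival = rng.expovariate(1.0)
    for t in grid:
        horizon = lam * t
        # Count every arrival up to time lam * t.
        while next_arrival <= horizon:
            arrivals += 1
            next_arrival += rng.expovariate(1.0)
        path.append((arrivals - horizon) / math.sqrt(lam))
    return path
```

For large `lam`, the value at t = 1 should have mean close to 0 and variance close to 1 (exactly 0 and 1, in fact, since N(λ) is Poisson with mean and variance λ).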
**Remark 14.**
Proposition 6.2 (i) implies that (2.1) holds whenever (6.3) holds. However, as stated in (2.5), the asymptotic normality of holds for far larger values of .
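Remark 14 concerns the asymptotic normality of the total number of alleles, denoted K_n below (the symbol is an assumption, since it is not visible in this section). Under the Feller coupling (see Arratia, Barbour and Tavaré (2016)), K_n is a sum of independent Bernoulli variables with success probabilities θ/(θ+i−1), i = 1, …, n, so its exact mean, variance and law are easy to compute. The sketch below does so; the function names are hypothetical.

```python
def allele_count_moments(n, theta):
    """Exact mean and variance of K_n under the Ewens sampling formula.

    Feller coupling: K_n = xi_1 + ... + xi_n with independent Bernoulli xi_i
    of success probability theta / (theta + i - 1). Assumes theta > 0.
    """
    mean = 0.0
    var = 0.0
    for i in range(1, n + 1):
        p = theta / (theta + i - 1)
        mean += p
        var += p * (1.0 - p)  # equals theta * (i - 1) / (theta + i - 1) ** 2
    return mean, var


def allele_count_pmf(n, theta):
    """Exact law of K_n by convolving the Bernoulli(theta / (theta + i - 1)) laws.

    Returns a list whose k-th entry is P(K_n = k), for k = 0, ..., n.
    """
    pmf = [1.0]  # law of the empty sum
    for i in range(1, n + 1):
        p = theta / (theta + i - 1)
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1.0 - p)   # xi_i = 0
            nxt[k + 1] += q * p       # xi_i = 1
        pmf = nxt
    return pmf
```

Comparing `allele_count_pmf` with a normal density of matching mean and variance gives a quick feel for the quality of the normal approximation for a given pair (n, θ).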
The following result, promised in Subsection 2.3, is an extension of Proposition 2.1.
**Proposition 6.3.**
*(i) In Case A, if (6.3) holds then the random process defined in (2.19) converges weakly to a standard Brownian bridge in .
(ii) In Case A, if (6.4) holds then the random process defined in (2.20) converges weakly to in .*
Proof. (i) Since it holds that
[TABLE]
for any , it is sufficient to show
[TABLE]
and
[TABLE]
in . Firstly, (6.8) holds because the assumption (6.3) yields that and because it follows from (2.7) that
[TABLE]
Next, we show (6.9). Since it follows from Proposition 5.11 and from the assumption (6.3) (see (6.5)) that
[TABLE]
and that
[TABLE]
the triangle inequality yields that
[TABLE]
where
[TABLE]
By using the functional central limit theorem for Poisson processes in , converges weakly to in (see Arratia and Tavaré (1992)).
(ii) For the same reason as in (i), it suffices to show (6.8) and
[TABLE]
in . Firstly, it holds that
[TABLE]
and that
[TABLE]
The right-hand sides of (6.11) and (6.12) converge to 0 in probability because, by the assumption (6.4), the expectations of their square roots converge to 0. Secondly, by letting be a homogeneous Poisson process with unit intensity, it follows from
[TABLE]
that
[TABLE]
and that
[TABLE]
Thirdly, it holds that
[TABLE]
The distributions of the first and second terms on the right-hand side are equal to
[TABLE]
and
[TABLE]
respectively. Both converge to 0 in probability because their expectations tend to 0 under the assumption (6.4). Thus, the triangle inequality yields that
[TABLE]
where
[TABLE]
Since
[TABLE]
Theorem 4 of Tsukuda (2016) yields that by setting with , and . Consequently, we have (6.10). This completes the proof.
Appendix A Auxiliary lemmas
**Lemma A.1.**
In Case A, if then
[TABLE]
Proof. The left-hand side of (A.1) equals
[TABLE]
By using the asymptotic series expansion as , it holds that
[TABLE]
and that
[TABLE]
Hence, the left-hand side of (A.1) is
[TABLE]
and, from the asymptotic expansion as , it follows that
[TABLE]
and hence (A.2) is
[TABLE]
This completes the proof.
**Lemma A.2.**
Let be a positive integer. For , it holds that
[TABLE]
Proof. It holds that
[TABLE]
As for the last term, it holds that
[TABLE]
This completes the proof.
**Lemma A.3.**
For any and for any positive integer , is increasing with respect to .
Proof. The proof is by induction on . When , is increasing. Let and satisfy . If the conclusion of the lemma is true for , then the conclusion is also true for because
[TABLE]
This completes the proof.
**Lemma A.4.**
Let be a homogeneous Poisson process with intensity satisfying . Define the non-decreasing function with respect to and with respect to which satisfies for all ,
[TABLE]
with an increasing function of satisfying , and
[TABLE]
for some . Then, the random process
[TABLE]
converges weakly to a Gaussian process in as .
**Remark 15.**
Lemma A.4 is a slight generalization of Lemma 2.1 of Tsukuda (2017b). The only difference is condition (A.3); the corresponding condition (2.1) of Tsukuda (2017b) is the special case that with a constant . To prove Lemma A.4, the equation
[TABLE]
in the proof of Lemma 2.1 of Tsukuda (2017b) should be replaced by
[TABLE]
and the rest of the proof requires no change.
Acknowledgements
The author would like to express his heartfelt gratitude to Professor Shuhei Mano for many constructive comments.
References
- Arratia, R., Barbour, A.D., Tavaré, S. (1992). Poisson process approximations for the Ewens sampling formula. Ann. Appl. Probab. 2, no. 3, 519–535.
- Arratia, R., Barbour, A.D., Tavaré, S. (2000). Limits of logarithmic combinatorial structures. Ann. Probab. 28, no. 4, 1620–1644.
- Arratia, R., Barbour, A.D., Tavaré, S. (2016). Exploiting the Feller coupling for the Ewens sampling formula. Statist. Sci. 31, no. 1, 27–29.
- Arratia, R., Stark, D., Tavaré, S. (1995). Total variation asymptotics for Poisson process approximations of logarithmic combinatorial assemblies. Ann. Probab. 23, no. 3, 1347–1388.
- Arratia, R., Tavaré, S. (1992). Limit theorems for combinatorial structures via discrete process approximations. Random Structures Algorithms 3, no. 3, 321–345.
- Barbour, A.D. (1992). Refined approximations for the Ewens sampling formula. Random Structures Algorithms 3, no. 3, 267–276.
- Barbour, A.D., Hall, P. (1984). On the rate of Poisson convergence. Math. Proc. Cambridge Philos. Soc. 95, no. 3, 473–480.
- Crane, H. (2016). The ubiquitous Ewens sampling formula. Statist. Sci. 31, no. 1, 1–19.
