On Poisson approximations for the Ewens sampling formula when the mutation parameter grows with the sample size

Koji Tsukuda

arXiv:1704.06768·math.PR·March 29, 2022


TL;DR

This paper investigates Poisson approximations for the Ewens sampling formula when the mutation parameter increases with the sample size, expanding understanding of its asymptotic properties in this regime.

Contribution

It advances the analysis of the Ewens sampling formula by studying its asymptotic behavior with a growing mutation parameter using Poisson approximation techniques.

Findings

01 Asymptotic properties of the total number of alleles are analyzed.

02 The distribution of component counts is approximated by Poisson distributions.

03 New results are given for the case where the mutation parameter grows with the sample size.

Abstract

The Ewens sampling formula was first introduced in the context of population genetics by Warren John Ewens in 1972, and has since appeared in many other scientific fields. There are abundant approximation results associated with the Ewens sampling formula, especially when one of the parameters, the sample size $n$ or the mutation parameter $\theta$ which denotes the scaled mutation rate, tends to infinity while the other is fixed. By contrast, the case in which $\theta$ grows with $n$ has been considered in relatively few works, although this asymptotic setup is also natural. In this paper, when $\theta$ grows with $n$, we advance the study of the asymptotic properties of the total number of alleles and of the counts of components in the allelic partition under the Ewens sampling formula, from the viewpoint of Poisson approximations.



Koji Tsukuda (Graduate School of Arts and Sciences, the University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8902; e-mail: [email protected])


1 Introduction

For a positive integer $n$ , consider the sequence $\{C_{j}^{n}\}_{j=1}^{\infty}$ of nonnegative integer-valued random variables satisfying $\sum_{j=1}^{n}jC_{j}^{n}=n$ and $C_{j}^{n}=0$ for $j>n$ . For $b=1,\ldots,n$ , let us denote $\textbf{C}^{n}_{b}=(C_{1}^{n},\ldots,C_{b}^{n})$ and $\textbf{a}_{b}=(a_{1},\ldots,a_{b})$ . This $\textbf{C}^{n}_{b}$ denotes the component counts in a random combinatorial structure of size $n$ . In the context of population genetics, Ewens (1972) introduced what is called the Ewens sampling formula

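$${\sf P}(\textbf{C}^{n}_{n}=\textbf{a}_{n})=\frac{n!}{(\theta)_{n}}\prod_{j=1}^{n}\left(\frac{\theta}{j}\right)^{a_{j}}\frac{1}{a_{j}!}\,\mathbf{1}\Big\{\sum_{j=1}^{n}ja_{j}=n\Big\}\tag{1.1}$$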

as the distribution of the allelic partition in a sample of size $n$ from a population following the stationary distribution of the infinitely-many neutral allele model with scaled mutation rate $\theta>0$, where $(\theta)_{n}$ is the rising factorial $\theta(\theta+1)\cdots(\theta+n-1)$. See for instance Section 2.5 of Feng (2010) for its derivation and basic properties. Hereafter, we take (1.1) as the model of $\{C^{n}_{j}\}_{j=1}^{n}$. The unsigned Stirling number of the first kind $\bar{s}(n,k)$ ($k=1,2,\ldots,n$) is the coefficient of $\theta^{k}$ in $(\theta)_{n}$, and equals the number of permutations of $n$ elements with exactly $k$ disjoint cycles. Hence, under (1.1), the total number $K_{n}=\sum_{j=1}^{n}C_{j}^{n}$ of alleles in the sample, in other words the total number of cycles in a random permutation, follows the falling factorial distribution (Watterson, 1974a)

$${\sf P}(K_{n}=k)=\bar{s}(n,k)\frac{\theta^{k}}{(\theta)_{n}}\quad(k=1,\ldots,n).\tag{1.2}$$
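As a small numerical illustration of (1.1) and (1.2), the following Python sketch (all helper names are ours and hypothetical) computes the law of $K_{n}$ exactly in rational arithmetic from the recurrence $\bar{s}(m+1,k)=m\bar{s}(m,k)+\bar{s}(m,k-1)$, and compares it with the law of a sum of independent Bernoulli variables with success probabilities $\theta/(\theta+j-1)$, $j=1,\ldots,n$, which is the representation of $K_{n}$ used later in the proof of Proposition 4.1. It is meant for small $n$ only and is not part of the paper's argument.

```python
from fractions import Fraction


def unsigned_stirling_row(n):
    """Return [s_bar(n, 0), ..., s_bar(n, n)], unsigned Stirling numbers of the first kind."""
    row = [1]  # s_bar(0, 0) = 1
    for m in range(n):
        # s_bar(m + 1, k) = m * s_bar(m, k) + s_bar(m, k - 1)
        new = [0] * (len(row) + 1)
        for k, v in enumerate(row):
            new[k] += m * v
            new[k + 1] += v
        row = new
    return row


def rising_factorial(theta, n):
    """(theta)_n = theta (theta + 1) ... (theta + n - 1)."""
    out = Fraction(1)
    for i in range(n):
        out *= theta + i
    return out


def kn_pmf_from_stirling(n, theta):
    """P(K_n = k) for k = 0..n, computed from formula (1.2)."""
    s = unsigned_stirling_row(n)
    denom = rising_factorial(theta, n)
    return [s[k] * theta**k / denom for k in range(n + 1)]


def kn_pmf_from_bernoulli(n, theta):
    """Law of xi_1 + ... + xi_n, independent, with P(xi_j = 1) = theta / (theta + j - 1)."""
    pmf = [Fraction(1)]
    for j in range(1, n + 1):
        p = theta / (theta + j - 1)
        new = [Fraction(0)] * (len(pmf) + 1)
        for k, v in enumerate(pmf):
            new[k] += v * (1 - p)
            new[k + 1] += v * p
        pmf = new
    return pmf


theta, n = Fraction(3, 2), 12
pmf_stirling = kn_pmf_from_stirling(n, theta)
pmf_bernoulli = kn_pmf_from_bernoulli(n, theta)
print(sum(pmf_stirling), pmf_stirling == pmf_bernoulli)
```

Exact fractions are used so that the two laws can be compared for equality rather than up to rounding; since $p_{1}=1$, both give ${\sf P}(K_{n}=0)=0$.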

In this paper, we will present asymptotic properties, especially Poisson approximations, of $\textbf{C}^{n}_{b}$ and $K_{n}$ when both $\theta$ and $n$ increase.

Beyond population genetics, the Ewens sampling formula has been widely applied in other fields such as ecology, disclosure risk assessment and nonparametric statistics. In addition, the laws of component counts in many random combinatorial structures are approximated by the Ewens sampling formula. For a general review and for an up-to-date review with discussions, we refer the reader to Chapter 41 of Johnson, Kotz and Balakrishnan (1997), whose write-up was provided by S. Tavaré and W.J. Ewens, and to Crane (2016), respectively. For (1.1), (1.2) and related probabilistic models, many works have discussed asymptotic properties as $n\to\infty$ with fixed $\theta$ or as $\theta\to\infty$ with fixed $n$; see for instance Feng (2016). It is also natural to relate the population size to the sample size. Since $\theta$ is proportional to the population size in the context of population genetics, Feng (2007) and Tsukuda (2017a) discussed the asymptotic behavior of $K_{n}$ when $n$ and $\theta$ tend to infinity simultaneously. In this setting, Feng (2007) established the large deviation principle and Tsukuda (2017a) derived asymptotic properties of the maximum likelihood estimator of $\theta$.

Following previous works, we set three major goals. Tsukuda (2017a) extended the asymptotic normality of $K_{n}$ as $n\to\infty$ with fixed $\theta$, which is due to Watterson (1974b), to the situation in which both $\theta$ and $n$ increase. The first goal of this paper is to discuss this result from the viewpoint of Poisson approximations. Moreover, Arratia, Barbour and Tavaré (1992) showed the Poisson process approximation of $\textbf{C}_{b}^{n}$ as $n\to\infty$ with fixed $\theta$ when $b$ is fixed or grows with $n$, and Arratia, Stark and Tavaré (1995) established its total variation asymptotics. The second goal is to study the corresponding asymptotic results for $\textbf{C}_{b}^{n}$ when $\theta$ grows with $n$. Furthermore, Hansen (1990) provided a functional central limit theorem for the Ewens sampling formula, and Arratia and Tavaré (1992) gave an elegant proof of it via the Poisson process approximation. Our third goal is to discuss extensions of this and related weak convergence results.

1.1 Notations

Consider sequences $\{x_{n}\}_{n=1}^{\infty}$ and $\{y_{n}\}_{n=1}^{\infty}$ . If $x_{n}/y_{n}\to 1$ , then we write $x_{n}\sim y_{n}$ . Let $c<\infty$ be a constant. If $x_{n}/y_{n}\to 0$ then we write $x_{n}=o(y_{n})$ , if $x_{n}/y_{n}\to c$ then we write $x_{n}=O(y_{n})$ , and if $x_{n}/y_{n}\to c\neq 0$ then we write $x_{n}=\Theta(y_{n})$ . Let $\sum_{j=1}^{0}x_{j}=0$ and $\prod_{j=1}^{0}x_{j}=1$ for any sequence $\{x_{\cdot}\}$ , and let $(x)_{0}=1$ for any value $x$ . When we consider the limits of $n$ and $\theta$ simultaneously, we use the notation $\lim_{n,\theta}$ .

Let $[x^{k}]f(x)$ denote the coefficient of $x^{k}$ in the power series expansion of $f(x)$ . Let $f^{(i)}(\cdot)$ denote the $i$ -th derivative of function $f(\cdot)$ . Let $\lfloor\cdot\rfloor$ and $\lceil\cdot\rceil$ denote the floor function and the ceiling function, respectively. Let $\Gamma(\cdot)$ be the gamma function and $\psi(\cdot)=(\log\Gamma(\cdot))^{\prime}$ the digamma function. For real $x$ , $x^{+}$ denotes the positive part of $x$ .

The space $D[0,1]$ is the set of càdlàg functions on $[0,1]$ endowed with the Skorokhod topology. The space $L^{2}(0,1)$ is the set of equivalence classes of real-valued functions on $(0,1)$ that are square integrable with respect to the Lebesgue measure, endowed with the $L^{2}$ topology.

The total variation distance between the laws of random vectors $X$ and $Y$ is denoted by $d_{TV}(X,Y)$. The convergence of $X$ to $Y$ in probability and the weak convergence of $X$ to $Y$ are denoted by $X\to^{p}Y$ and $X\Rightarrow Y$, respectively.

1.2 Asymptotic settings

Letting $c$ be a finite constant, we study the following asymptotic settings in this paper:

Case A: $n/\theta\to\infty$ ; Case B: $n/\theta\to c>0$ ; Case C: $n/\theta\to 0$ ;

Case C1: $n/\theta\to 0$ and $n^{2}/\theta\to\infty$ ; Case C2: $n^{2}/\theta\to c>0$ ; Case C3: $n^{2}/\theta\to 0$ .

This division was introduced in Tsukuda (2017a). Note that in Section 4 of Feng (2007), when $\theta$ does not converge to 0, the relation between $n$ and $\theta$ is divided into Cases A, B and C above and the case $\theta\to\infty$ with fixed $n$. Moreover, throughout this paper, we assume that $\theta$ does not decrease as $n$ increases.

Remark 1.

In Case C3, it holds that $K_{n}-n\to^{p}0$. Note also that when $\theta=o(1/\log{n})$, a case we do not consider since then $\theta\to 0$, it holds that $K_{n}-1\to^{p}0$. These convergences can be checked by showing convergence in the first mean.

1.3 Organization

In Section 2, we review the asymptotic results for the Ewens sampling formula in the literature that will be discussed in this paper. Before turning to probabilistic results, in Section 3 we provide some preliminary evaluations of sequences related to the mean of $K_{n}$. Section 4 is devoted to Poisson approximations for $K_{n}$ and $n-K_{n}$ in Cases A and C, respectively. Section 5 discusses independent process approximations for $\textbf{C}_{b}^{n}$ in an Ewens partition. Section 6 shows functional central limit theorems for the Ewens sampling formula when $\theta$ grows with $n$. In addition, the Appendix collects some lemmas used in the proofs.

2 Results in the literature

2.1 Normal and Poisson approximations for $K_{n}$

In the combinatorial context, it is worthwhile to know when typical distributions such as Normal, Poisson or other distributions asymptotically appear. See for instance Flajolet and Soria (1990). For the total number $K_{n}$ of alleles which follows (1.2), Watterson (1974b) proved the following central limit theorem (CLT for short): For fixed $\theta>0$ ,

$$\frac{K_{n}-\theta\log{n}}{\sqrt{\theta\log{n}}}\Rightarrow N(0,1)\tag{2.1}$$

as $n\to\infty$ , where $N(0,1)$ is a standard normal variable. A stronger result, the Poisson approximation for $K_{n}$ , was stated by Arratia and Tavaré (1992): For fixed $\theta>0,$

$$d_{TV}(K_{n},P_{{\sf E}[K_{n}]})=\Theta\left(\frac{1}{\log{n}}\right)\tag{2.2}$$

as $n\to\infty$ , where $P_{{\sf E}[K_{n}]}$ is a Poisson variable with mean ${\sf E}[P_{{\sf E}[K_{n}]}]={\sf E}[K_{n}]$ . Later, in order to improve the approximation accuracy, Yamato (2013) provided the following CLT which adopts another standardization: For fixed $\theta>0$ ,

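$$\frac{K_{n}-\theta(\log{n}-\psi(\theta))}{\sqrt{\theta(\log{n}-\psi(\theta))}}\Rightarrow N(0,1)\tag{2.3}$$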

as $n\to\infty$ . Moreover, Yamato (2013) showed the approximation for $K_{n}$ by a Poisson variable with the approximate mean: For fixed $\theta>0,$

$$d_{TV}(K_{n},P_{\theta(\log{n}-\psi(\theta))})=O\left(\frac{1}{\log{n}}\right)\tag{2.4}$$

as $n\to\infty$ , where $P_{\theta(\log{n}-\psi(\theta))}$ is a Poisson variable with mean ${\sf E}[P_{\theta(\log{n}-\psi(\theta))}]=\theta(\log{n}-\psi(\theta))$ .

When $\theta$ grows with $n$ , the standardization should be changed in many cases. Let $Z_{n}=(K_{n}-\mu)/\sigma$ , where $\mu=\theta\log{(1+n/\theta)}$ and $\sigma^{2}=\theta(\log{(1+n/\theta)}-n/(n+\theta))$ . Tsukuda (2017a) showed that

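$$Z_{n}\Rightarrow\begin{cases}N(0,1) & \text{(Case A, B, C1)},\\ (c/2-P_{c/2})/\sqrt{c/2} & \text{(Case C2)},\\ 0 & \text{(Case C3)},\end{cases}\tag{2.5}$$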

where $c=\lim_{n,\theta}n^{2}/\theta$ and $P_{c/2}$ is a Poisson variable with mean ${\sf E}[P_{c/2}]=c/2$ .
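The centering $\mu$ and scaling $\sigma^{2}$ in $Z_{n}$ can be compared numerically with the exact mean and variance of $K_{n}$, which follow from the representation of $K_{n}$ as a sum of independent Bernoulli variables with ${\sf P}(\xi_{j}=1)=\theta/(\theta+j-1)$. The sketch below assumes that representation; function names are hypothetical, and the three $(n,\theta)$ pairs are arbitrary finite stand-ins for Cases A, B and C3, which are asymptotic regimes rather than properties of a single pair. One can check that $\mu=\int_{0}^{n}\theta/(\theta+x)\,dx$ and $\sigma^{2}=\int_{0}^{n}\theta x/(\theta+x)^{2}\,dx$, so comparing sums with integrals gives $0<{\sf E}[K_{n}]-\mu<1$ and $|{\rm Var}(K_{n})-\sigma^{2}|<1$ for all $n$ and $\theta$.

```python
import math


def exact_mean_var(n, theta):
    """Exact E[K_n] and Var[K_n] from K_n = sum of independent Bernoulli(p_j),
    with p_j = theta / (theta + j - 1)."""
    mean = var = 0.0
    for j in range(1, n + 1):
        p = theta / (theta + j - 1)
        mean += p
        var += p * (1.0 - p)
    return mean, var


def centering(n, theta):
    """mu and sigma^2 used in Z_n = (K_n - mu) / sigma."""
    r = n / theta
    mu = theta * math.log1p(r)
    sigma2 = theta * (math.log1p(r) - n / (n + theta))
    return mu, sigma2


# illustrative (n, theta) pairs in the spirit of Cases A, B and C3
for n, theta in [(10**5, 10.0), (10**4, 1.0e4), (100, 1.0e6)]:
    m, v = exact_mean_var(n, theta)
    mu, s2 = centering(n, theta)
    print(f"n={n}, theta={theta:g}: E[K_n]-mu={m - mu:.4g}, Var[K_n]/sigma^2={v / s2:.4f}")
```

`math.log1p` is used because $n/\theta$ is tiny in the Case C3 example, where $\log(1+n/\theta)-n/(n+\theta)$ would otherwise lose precision.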

Remark 2.

Professor Shuhei Mano pointed out that the proof of Theorem 2 in Tsukuda (2017a) is incorrect in Case C1. In this remark, we correct the argument. As stated on the right-hand side of equation (14) of Tsukuda (2017a), it holds that $\log{\sf E}[e^{Z_{n}t}]=-\sigma t+(e^{t/\sigma}-1)\sigma^{2}+A+o(1),$ where

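$$A=\theta\left(\frac{\theta}{n+\theta}-1\right)\left\{\frac{t}{\sigma}-\left(e^{t/\sigma}-1\right)\right\}-\frac{t}{\sigma}\theta e^{t/\sigma}+\left(\theta e^{t/\sigma}+n\right)\log\left(1+\frac{\theta}{n+\theta}\left(e^{t/\sigma}-1\right)\right).$$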

In Case C1, since $\sigma\sim\sqrt{n^{2}/2\theta}$ , it holds that $\theta\{n/(n+\theta)\}^{3}(-t/\sigma)^{3}=O(1/\sqrt{\theta})$ , and hence

[TABLE]

We thus have

[TABLE]

By using $n/(n+\theta)=n/\theta-n^{2}/\theta^{2}+O(n^{3}/\theta^{3})$ , the first term in the right-hand side of (2.6) is

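$$n\left(\frac{n}{\theta}-\frac{n^{2}}{\theta^{2}}+O\left(\frac{n^{3}}{\theta^{3}}\right)\right)\left(\frac{t^{2}}{2\sigma^{2}}+O\left(\frac{1}{\sigma^{3}}\right)\right)=\frac{n^{2}}{\theta}\frac{t^{2}}{2\sigma^{2}}+o(1).$$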

The second term in (2.6) is also $-n^{2}t^{2}/(2\theta\sigma^{2})+o(1)$ because it holds that

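$$\left(e^{-t/\sigma}-1\right)\left(1-e^{t/\sigma}\right)=\left(-\frac{t}{\sigma}+\frac{t^{2}}{2\sigma^{2}}+o\left(\frac{1}{\sigma^{2}}\right)\right)\left(-\frac{t}{\sigma}-\frac{t^{2}}{2\sigma^{2}}+o\left(\frac{1}{\sigma^{2}}\right)\right)=\frac{t^{2}}{\sigma^{2}}+O\left(\frac{\theta^{2}}{n^{4}}\right)$$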

and that $(1+ne^{-t/\sigma}/\theta)/(1+n/\theta)^{2}=1+O(n/\theta)$ . Therefore $A=o(1)$ , and, consequently, $\log{\sf E}[e^{Z_{n}t}]=-\sigma t+(e^{t/\sigma}-1)\sigma^{2}+o(1)\to t^{2}/2.$

Remark 3.

As a corollary to the large deviation principle for $K_{n}$ when $\theta\to\infty$ , Feng (2007) provided the following weak law of large numbers in Corollary 4.1:

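$$\frac{K_{n}}{\theta\log(n/\theta)}\to^{p}1\quad\text{(Case A)},\qquad\frac{K_{n}}{n}\to^{p}\begin{cases}\log\{(1+c)^{1/c}\} & \text{(Case B)},\\ 1 & \text{(Case C)},\end{cases}$$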

and $K_{n}\to^{p}n$ as $\theta\to\infty$ with fixed $n$. These laws of large numbers in Cases A, B and C can also be obtained directly from the calculation of ${\sf E}[|K_{n}/{\sf E}[K_{n}]-1|^{2}]$; see Proposition 2 of Tsukuda (2017a).

2.2 Independent process approximations for ${\bf C}_{b}^{n}$

Consider a sequence $\{Z_{j}\}_{j=1}^{\infty}$ of independent Poisson variables with ${\sf E}[Z_{j}]=\theta/j$ for $j=1,2,\ldots$ and denote $\textbf{Z}_{b}=(Z_{1},\ldots,Z_{b})$ for a positive integer $b$ . Then, it is well-known that (1.1) can be derived from the conditioning relation

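$${\sf P}(\textbf{C}^{n}_{n}=\textbf{a}_{n})={\sf P}\Big(\textbf{Z}_{n}=\textbf{a}_{n}\,\Big|\,\sum_{j=1}^{n}jZ_{j}=n\Big),\tag{2.8}$$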

see for instance Watterson (1974a). This means that the dependence in $\{C_{j}^{n}\}_{j=1}^{\infty}$ is induced by the condition $\sum_{j=1}^{n}jZ_{j}=n$. It is of interest whether the effect of this dependence vanishes asymptotically. This question was answered by Arratia, Barbour and Tavaré (1992), who showed that the small components can be approximated by independent Poisson variables: For any fixed positive integer $b$, it holds that

$$(C_{1}^{n},\ldots,C_{b}^{n})\Rightarrow(Z_{1},\ldots,Z_{b})\tag{2.9}$$

as $n\to\infty$ . Note that (2.9) is equivalent to $\lim_{n\to\infty}d_{TV}({\bf C}_{b}^{n},{\bf Z}_{b})=0$ because both ${\bf C}_{b}^{n}$ and ${\bf Z}_{b}$ are discrete.
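The conditioning relation (2.8) is easy to check by brute force for small $n$. The sketch below (function names are illustrative) enumerates all partitions of $n$, evaluates (1.1) directly, and compares it with ${\sf P}(\textbf{Z}_{n}=\textbf{a}_{n})/{\sf P}(\sum_{j=1}^{n}jZ_{j}=n)$ for independent $Z_{j}\sim$ Poisson$(\theta/j)$. The normalizing probability is obtained by summing the Poisson joint probabilities over all partitions, since every solution of $\sum_{j=1}^{n}jz_{j}=n$ corresponds to exactly one partition of $n$.

```python
import math


def partitions(n, max_part=None):
    """Yield the partitions of n as dicts {j: a_j} (part size -> multiplicity)."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield {}
        return
    for j in range(min(n, max_part), 0, -1):
        for rest in partitions(n - j, j):
            counts = dict(rest)
            counts[j] = counts.get(j, 0) + 1
            yield counts


def esf_prob(counts, n, theta):
    """Ewens sampling formula (1.1) evaluated at the count vector `counts`."""
    p = math.factorial(n) / math.prod(theta + i for i in range(n))
    for j, a in counts.items():
        p *= (theta / j) ** a / math.factorial(a)
    return p


def poisson_joint(counts, n, theta):
    """P(Z_1 = a_1, ..., Z_n = a_n) for independent Z_j ~ Poisson(theta / j)."""
    p = 1.0
    for j in range(1, n + 1):
        a, lam = counts.get(j, 0), theta / j
        p *= math.exp(-lam) * lam**a / math.factorial(a)
    return p


n, theta = 8, 2.5
parts = list(partitions(n))
esf = [esf_prob(c, n, theta) for c in parts]
joint = [poisson_joint(c, n, theta) for c in parts]
p_total = sum(joint)  # P(sum_j j Z_j = n)
conditional = [q / p_total for q in joint]
print(len(parts), sum(esf), max(abs(x - y) for x, y in zip(esf, conditional)))
```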

It is more interesting to consider the case that $b$ grows with $n$ . For positive integer $b$ , let us denote the total variation distance and the distance in the Wasserstein $\ell^{1}$ metric between ${\bf C}^{n}_{b}=(C_{1}^{n},\ldots,C_{b}^{n})$ and ${\bf Z}_{b}=(Z_{1},\ldots,Z_{b})$ by $d_{b}(n)$ and $d_{b}^{W}(n)$ , respectively, that is,

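$$d_{b}(n)=d_{TV}({\bf C}^{n}_{b},{\bf Z}_{b}),\qquad d_{b}^{W}(n)=\inf_{\text{couplings}}{\sf E}\Big[\sum_{j=1}^{b}|C_{j}^{n}-Z_{j}|\Big].$$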

For these quantities, it holds that

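$$d_{b}(n)=\inf_{\text{couplings}}{\sf P}({\bf C}^{n}_{b}\neq{\bf Z}_{b})=\inf_{\text{couplings}}{\sf P}\Big(\sum_{j=1}^{b}|C_{j}^{n}-Z_{j}|\geq 1\Big)\leq d_{b}^{W}(n).$$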

For the Ewens sampling formula, $d_{b}^{W}(n)$ is a convenient measure of approximation because a concrete coupling, the Feller coupling, is available. See Arratia, Barbour and Tavaré (1992, 2016). The Feller coupling is as follows: Let $\{\xi_{j}\}_{j=1}^{\infty}$ be a sequence of independent Bernoulli variables with ${\sf P}(\xi_{j}=1)=p_{j}=\theta/(\theta+j-1)$ for $j=1,2,\ldots$. Then, the Ewens sampling formula (1.1) is given as the joint distribution of

$$C_{1}^{n}=\sum_{i=1}^{n-1}\xi_{i}\xi_{i+1}+\xi_{n}$$

and

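$$C_{j}^{n}=\sum_{i=1}^{n-j}\xi_{i}(1-\xi_{i+1})\cdots(1-\xi_{i+j-1})\xi_{i+j}+\xi_{n-j+1}(1-\xi_{n-j+2})\cdots(1-\xi_{n})$$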

for $j=2,\ldots,n$ . Moreover, define

$$C_{j}^{\infty}=\sum_{i=1}^{\infty}\xi_{i}(1-\xi_{i+1})\cdots(1-\xi_{i+j-1})\xi_{i+j}$$

for $j=1,2,\ldots$. Then $C_{1}^{\infty},C_{2}^{\infty},\ldots$ are independent Poisson variables with mean ${\sf E}[C_{j}^{\infty}]=\theta/j$ for $j=1,2,\ldots$. This is because the convergences in probability $\xi_{n}\to^{p}0$ and $\xi_{n-j+1}(1-\xi_{n-j+2})\cdots(1-\xi_{n})\to^{p}0$ for any $j=2,3,\ldots$ yield $C_{j}^{n}\Rightarrow C_{j}^{\infty}$, and so (2.9) yields $C_{j}^{\infty}=^{d}Z_{j}$ for any $j=1,2,\ldots$. By using this construction, Arratia, Barbour and Tavaré (1992) proved the Poisson process approximation for $b$ growing with $n$:

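$$d_{b}(n)\to 0\ \Leftrightarrow\ b=o(n);$$
$$d_{b}(n)\leq\frac{b\theta}{\theta+n}\left(\theta+\frac{n}{\theta+n-b}\right);\tag{2.11}$$
$$d_{b}^{W}(n)\leq\frac{b\theta}{\theta+n-b}\left(\theta+\frac{n}{\theta+n}\right);\tag{2.12}$$
$$d_{n}^{W}(n)=O(1);\tag{2.13}$$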

if $\theta\geq 1$ then

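$$\frac{\theta(\theta-1)b}{\theta+n-1}\left\{1-\frac{(\theta-1)(b+1)}{4(\theta+n-1)}\right\}\leq d_{b}^{W}(n)\leq\frac{b\theta(\theta+1)}{\theta+n}.\tag{2.14}$$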

Note that (2.11), (2.12) and (2.14) are not asymptotic results. Lower bound results for the total variation distance, which complement (2.11), were given by Arratia, Barbour and Tavaré (1992): $\liminf_{n\to\infty}nd_{b}(n)\geq(b\theta|\theta-1|/2)\exp\left(-\theta\sum_{j=1}^{b}1/j\right)$ ; and by Barbour (1992): if $\theta\neq 1$ then $d_{b}(n)\geq c_{3}b/n$ for some $c_{3}=c_{3}(\theta)>0$ .
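The Feller coupling translates directly into a simulation: $C_{j}^{n}$ counts the spacings of length $j$ between consecutive successes in $\xi_{1},\ldots,\xi_{n}$ followed by an artificial success at position $n+1$, which is what the boundary terms $\xi_{n}$ and $\xi_{n-j+1}(1-\xi_{n-j+2})\cdots(1-\xi_{n})$ encode. The following is a minimal Python sketch under that reading (the function name, seed and Monte Carlo settings are our own choices). Since $p_{1}=1$, the first spacing always starts at position 1, the spacings sum to $n$, and their number is $K_{n}=\sum_{j}\xi_{j}$.

```python
import random
from collections import Counter


def feller_coupling(n, theta, rng):
    """Simulate (C_1^n, ..., C_n^n) under the Ewens sampling formula.

    xi_1, ..., xi_n are independent with P(xi_j = 1) = theta / (theta + j - 1);
    C_j^n is the number of spacings of length j between consecutive 1's in
    xi_1, ..., xi_n, 1 (an artificial 1 is placed at position n + 1).
    """
    ones = [j for j in range(1, n + 1) if rng.random() < theta / (theta + j - 1)]
    ones.append(n + 1)
    spacings = Counter(b - a for a, b in zip(ones, ones[1:]))
    return [spacings.get(j, 0) for j in range(1, n + 1)]


rng = random.Random(1972)
n, theta = 200, 5.0
counts = feller_coupling(n, theta, rng)
print(sum(j * c for j, c in enumerate(counts, start=1)), sum(counts))  # n and K_n

# Monte Carlo check of E[K_n] = sum_j theta / (theta + j - 1)
reps = 2000
avg_kn = sum(sum(feller_coupling(n, theta, rng)) for _ in range(reps)) / reps
exact_kn = sum(theta / (theta + j - 1) for j in range(1, n + 1))
print(avg_kn, exact_kn)
```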

Another compelling result for evaluating $d_{b}(n)$ is the leading term of $d_{b}(n)$, which was given by Arratia, Stark and Tavaré (1995) for general logarithmic assemblies. For the Ewens sampling formula, the statement is as follows: If $b=o(n/\log{n})$ then

$$d_{b}(n)=\frac{|1-\theta|}{2n}{\sf E}[|T_{0b}-\theta b|]+o\left(\frac{b}{n}\right),\tag{2.15}$$

where $T_{0b}=\sum_{j=1}^{b}jZ_{j}$ . As it is stated in Corollary 4 of their paper, if $\theta\neq 1$ and if $b=o(n/\log{n})$ then the leading term of $d_{b}(n)$ is given by the first term in the right-hand side of (2.15).

2.3 Functional central limit theorems

The results by Arratia, Barbour and Tavaré (1992) provide an elegant way to derive asymptotic properties. Among others, by using (2.13), Arratia and Tavaré (1992) provided an alternative proof of the functional central limit theorem for the Ewens sampling formula which was originally proven by Hansen (1990): The random process

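$$X_{n}^{1}(\cdot)=\left(\frac{\sum_{j=1}^{\lfloor n^{u}\rfloor}C_{j}^{n}-u\theta\log{n}}{\sqrt{\theta\log{n}}}\right)_{0\leq u\leq 1}$$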

converges weakly to $(B(u))_{0\leq u\leq 1}$ in $D[0,1]$ as $n\to\infty$ , where $B(\cdot)$ is a standard Brownian motion. This approach is generalized to broader logarithmic structures. See Arratia, Stark and Tavaré (1995) and Arratia, Barbour and Tavaré (2000). Moreover, by using the Poisson process approximation, Tsukuda (2017b) provided a weighted version in $L^{2}(0,1)$ : Both of the random processes

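$$X_{n}^{2}(\cdot)=\left(\frac{\sum_{j=1}^{\lfloor n^{u}\rfloor}C_{j}^{n}-\theta\sum_{j=1}^{\lfloor n^{u}\rfloor}1/j}{\sqrt{\theta\sum_{j=1}^{\lfloor n^{u}\rfloor}1/j}}\right)_{0<u<1}$$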

and

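$$X_{n}^{3}(\cdot)=\left(\frac{\sum_{j=1}^{\lfloor n^{u}\rfloor}C_{j}^{n}-u\theta\log{n}}{\sqrt{u\theta\log{n}}}\mathbf{1}\left\{u>\frac{\varepsilon}{\log{n}}\right\}\right)_{0<u<1}$$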

converge weakly to $(B(u)/\sqrt{u})_{0<u<1}$ in $L^{2}(0,1)$ as $n\to\infty$ , where $\varepsilon$ is a positive constant.

Remark 4.

In the case that $\theta=1$ , the weak convergence of $X_{n}^{1}(\cdot)$ in $D[0,1]$ was provided by DeLaurentis and Pittel (1985).

Let $R_{j}$ be the length of the $j$-th cycle in a random permutation of $n$ elements with $K_{n}$ disjoint cycles, and define the log-length of the $j$-th cycle as $\log_{n}{R_{j}}$. Consider its empirical distribution function

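$$F_{n}(u)=\frac{\sum_{j=1}^{K_{n}}\mathbf{1}\{\log_{n}R_{j}\leq u\}}{K_{n}}=\frac{\sum_{j=1}^{\lfloor n^{u}\rfloor}C_{j}^{n}}{K_{n}}$$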

for $0\leq u\leq 1$ . Define the random processes

[TABLE]

where $\varepsilon$ is a positive constant. When $\theta=1$, the weak convergence of $X_{n}^{4}(\cdot)$ to a standard Brownian bridge $(B^{\circ}(u))_{0\leq u\leq 1}$ in $D[0,1]$ was shown by DeLaurentis and Pittel (1985); see Notes (2) after the Theorem in their paper. Its extension to the Ewens sampling formula and an $L^{2}$ version, which may not have appeared in the literature, are as follows.

Proposition 2.1.

*(i) The random process $X^{4}_{n}(\cdot)$ converges weakly to $(B^{\circ}(u))_{0\leq u\leq 1}$ in $D[0,1]$ as $n\to\infty$ .

(ii) The random process $X^{5}_{n}(\cdot)$ converges weakly to $(B^{\circ}(u)/\sqrt{u(1-u)})_{0\leq u\leq 1}$ in $L^{2}(0,1)$ as $n\to\infty$ .*

We omit its proof because we will present an extended version in Proposition 6.3. From Proposition 2.1, it follows from the continuous mapping theorem that

[TABLE]

as $n\to\infty$ .

Remark 5.

As stated in DeLaurentis and Pittel (1985), Proposition 2.1 means that $F_{n}(u)$ is close to $u$, the distribution function of the standard uniform distribution.
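To make the definitions of $X_{n}^{1}(\cdot)$ and $F_{n}(\cdot)$ concrete, the sketch below evaluates them from a vector of component counts $(C_{1}^{n},\ldots,C_{n}^{n})$ for a toy permutation with cycle lengths $1,1,2,4,8$ ($n=16$, $K_{n}=5$). The function names are hypothetical, and the toy counts are chosen only so that $F_{n}(u)$ can be checked by hand: $\lfloor 16^{0.3}\rfloor=2$, $\lfloor 16^{0.6}\rfloor=5$ and $\lfloor 16^{0.8}\rfloor=9$.

```python
import math


def partial_count(counts, u):
    """sum_{j <= floor(n^u)} C_j^n, where counts[j - 1] = C_j^n and n = len(counts).

    n ** u is computed in floating point, so floor(n ** u) can be off by one when
    n^u is mathematically an integer; the grid below avoids such u except u = 1,
    where n ** 1.0 is exact.
    """
    n = len(counts)
    return sum(counts[: math.floor(n**u)])


def x1(counts, theta, u):
    """X_n^1(u) = (sum_{j <= floor(n^u)} C_j^n - u theta log n) / sqrt(theta log n)."""
    scale = theta * math.log(len(counts))
    return (partial_count(counts, u) - u * scale) / math.sqrt(scale)


def loglength_cdf(counts, u):
    """F_n(u): fraction of the K_n cycles whose log-length log_n R_j is at most u."""
    return partial_count(counts, u) / sum(counts)


# toy permutation of n = 16 with cycle lengths 1, 1, 2, 4, 8
counts = [0] * 16
for length in (1, 1, 2, 4, 8):
    counts[length - 1] += 1
for u in (0.3, 0.6, 0.8, 1.0):
    print(u, loglength_cdf(counts, u), round(x1(counts, 1.0, u), 4))
```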

2.4 Auxiliary results

In this subsection, let us set out some auxiliary results concerning Poisson approximations which will be used in the proofs of our statements.

Consider a sequence of independent Bernoulli variables $\{\xi_{j}\}_{j=1}^{\infty}$ and its partial sum $S_{n}=\sum_{j=1}^{n}\xi_{j}$ , where ${\sf P}(\xi_{j}=1)=p_{j}$ for any $j=1,2,\ldots$ . Then, by using the Chen–Stein method, Theorems 1 and 2 of Barbour and Hall (1984) gave the sharp bound for the Poisson approximation for a partial sum of Bernoulli variables: For a Poisson variable $P_{\lambda}$ with mean ${\sf E}[P_{\lambda}]=\lambda=\sum_{j=1}^{n}p_{j}$ , it holds that

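$$\frac{1}{32}\min\left(1,\frac{1}{\lambda}\right)\sum_{j=1}^{n}p_{j}^{2}\leq d_{TV}(S_{n},P_{\lambda})\leq\frac{1-e^{-\lambda}}{\lambda}\sum_{j=1}^{n}p_{j}^{2}.\tag{2.21}$$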

Moreover, using a property of the Hellinger integral, Theorem 2.1 of Yannaros (1991) gave a bound for the total variation distance between two Poisson distributions: For Poisson variables $P^{1}_{\lambda_{1}}$ and $P^{2}_{\lambda_{2}}$ with respective means $\lambda_{1}$ and $\lambda_{2}$, it holds that

[TABLE]
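The Barbour and Hall (1984) bounds can be checked numerically. As we recall them, they read $\frac{1}{32}\min(1,1/\lambda)\sum_{j}p_{j}^{2}\leq d_{TV}(S_{n},P_{\lambda})\leq\lambda^{-1}(1-e^{-\lambda})\sum_{j}p_{j}^{2}$; this form is our assumption. The sketch (hypothetical function names) computes $d_{TV}$ exactly, obtaining the law of $S_{n}$ by convolution and the Poisson law by its recursion, and applies it to $K_{n}$ with $p_{j}=\theta/(\theta+j-1)$ and to $n-K_{n}$ with $q_{j}=1-p_{j}$, mirroring how the bounds are used in Section 4.

```python
import math


def poisson_binomial_pmf(ps):
    """pmf of S = xi_1 + ... + xi_n with independent xi_j ~ Bernoulli(ps[j - 1])."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, v in enumerate(pmf):
            new[k] += v * (1.0 - p)
            new[k + 1] += v * p
        pmf = new
    return pmf


def tv_to_poisson(ps):
    """Exact d_TV(S, P_lambda) with lambda = sum(ps), plus the two Barbour-Hall bounds."""
    lam = sum(ps)
    l1, mass, po = 0.0, 0.0, math.exp(-lam)  # po = Poisson pmf at the current k
    for k, q in enumerate(poisson_binomial_pmf(ps)):
        l1 += abs(q - po)
        mass += po
        po *= lam / (k + 1)
    l1 += max(0.0, 1.0 - mass)  # Poisson mass beyond the support {0, ..., n} of S
    s2 = sum(p * p for p in ps)
    lower = min(1.0, 1.0 / lam) * s2 / 32.0
    upper = (1.0 - math.exp(-lam)) / lam * s2
    return l1 / 2.0, lower, upper


n, theta = 400, 3.0
p = [theta / (theta + j - 1) for j in range(1, n + 1)]  # K_n is a sum of these Bernoullis
n_c, theta_c = 20, 1.0e4
q = [(j - 1) / (theta_c + j - 1) for j in range(1, n_c + 1)]  # n - K_n is a sum of these
for name, ps in (("K_n", p), ("n - K_n", q)):
    tv, lo, hi = tv_to_poisson(ps)
    print(f"{name}: {lo:.3g} <= d_TV = {tv:.3g} <= {hi:.3g}")
```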

3 Preliminary results

Before discussing probabilistic results, let us show asymptotic evaluations on sums of sequences which will be used. Consider two sequences $\{p_{j}\}_{j\geq 1}$ and $\{q_{j}\}_{j\geq 1}$ given by

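$$p_{j}=\frac{\theta}{\theta+j-1},\qquad q_{j}=1-p_{j}=\frac{j-1}{\theta+j-1},\qquad j=1,2,\ldots.\tag{3.1}$$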
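A quick numerical look at these sequences, taking (as in the Feller coupling of Subsection 2.2) $p_{j}=\theta/(\theta+j-1)$ and $q_{j}=1-p_{j}=(j-1)/(\theta+j-1)$. The sketch checks $\sum_{j\leq n}p_{j}+\sum_{j\leq n}q_{j}=n$ and illustrates $\sum_{j\leq n}p_{j}\sim\theta\log(n/\theta)$ in a Case A flavoured setting ($\theta$ of order $\sqrt{n}$) and $\sum_{j\leq n}q_{j}\sim n^{2}/(2\theta)$ in a Case C flavoured setting ($\theta\gg n^{2}$). The function name and the particular values of $n$ and $\theta$ are arbitrary choices for illustration.

```python
import math


def sums_pq(n, theta):
    """Return (sum_{j<=n} p_j, sum_{j<=n} q_j), p_j = theta/(theta+j-1), q_j = 1 - p_j."""
    sp = sum(theta / (theta + j - 1) for j in range(1, n + 1))
    sq = sum((j - 1) / (theta + j - 1) for j in range(1, n + 1))
    return sp, sq


# Case A flavour: theta of order sqrt(n), so n / theta is large
n, theta = 10**5, 300.0
sp, _ = sums_pq(n, theta)
print(sp / (theta * math.log(n / theta)))  # ratio close to 1

# Case C flavour: theta much larger than n^2
n, theta = 100, 1.0e6
_, sq = sums_pq(n, theta)
print(sq / (n**2 / (2 * theta)))  # ratio close to 1
```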

Proposition 3.1.

(i) It holds that

[TABLE]

and that

[TABLE]

Especially, in Case A, it holds that

[TABLE]

and that if $\theta\to\infty$ then

[TABLE]

(ii) It holds that

[TABLE]

and that

[TABLE]

Especially, in Case C, it holds that

[TABLE]

Proof. (i) Since

[TABLE]

the result (3.2) holds. Since

[TABLE]

the result (3.3) holds.

(ii) From

[TABLE]

and from

[TABLE]

the results (3.4) and (3.5) follow. This completes the proof.

Proposition 3.2.

In Case A, it holds that

[TABLE]

Proof. It follows from

[TABLE]

that the left-hand side of (3.6) is $\theta(\psi(n+\theta)-\log{n})$ . Since

[TABLE]

and $\log{n}=\sum_{j=1}^{n}1/j-\gamma+O(1/n)$ as $n\to\infty$ , it holds that

[TABLE]

In Case A, the first term in the right-hand side is $O(\theta/n)$ . This completes the proof.

Remark 6.

As stated in (2.3), Yamato (2013) discussed the asymptotic normality of $K_{n}$ standardized by $\theta(\log{n}-\psi(\theta))$, which amounts to approximating $\psi(n+\theta)$ by $\log{n}$ in view of (3.7). If $\theta^{3}/(n^{2}\log(n/\theta))\to\infty$, the bound in (3.6) is too weak to be useful for the CLT. On the other hand, if $\theta^{2}/n\to 0$, centering by $\theta(\log{n}-\psi(\theta))$ is better than centering by $\theta\log(1+n/\theta)$, which was used in Corollary 2 of Tsukuda (2017a), because $\sum_{j=1}^{n}p_{j}-\theta\log(1+n/\theta)=\Theta\left(n/(n+\theta)\right)$.

Proposition 3.3.

In Case C, it holds that

[TABLE]

Proof. The triangle inequality yields that

[TABLE]

The first term is $O(n/(n+\theta))=O(n/\theta)$ , the second term is $\theta\log(1-n^{2}/\theta^{2})=O(n^{2}/\theta)$ , and from $\log{(1-x)}^{-1}=x+x^{2}/2+\cdots$ as $x\to 0$ the third term is

[TABLE]

This completes the proof.

4 Poisson approximations for the total number of alleles

Introduce two Poisson variables $Z^{A}$ and $Z^{C}$ whose means are given by $\lambda_{A}={\sf E}[K_{n}]=\sum_{j=1}^{n}p_{j}$ and $\lambda_{C}=n-{\sf E}[K_{n}]=\sum_{j=1}^{n}q_{j}$ , respectively, where $K_{n}$ follows (1.2) and where $\{p_{j}\}_{j=1}^{\infty}$ and $\{q_{j}\}_{j=1}^{\infty}$ are given in (3.1). Poisson approximations corresponding to (2.2) are given in the following proposition.

Proposition 4.1.

(i) In Case A,

[TABLE]

and

[TABLE]

(ii) In Case C,

[TABLE]

and

[TABLE]

Proof. Let $\{\xi_{j}\}_{j=1}^{\infty}$ and $\{\zeta_{j}\}_{j=1}^{\infty}$ be sequences of Bernoulli variables with respective parameters ${\sf P}(\xi_{j}=1)=p_{j}$ and ${\sf P}(\zeta_{j}=1)=q_{j}$ for $j=1,2,\ldots$ . Then, it holds that $K_{n}=^{d}\sum_{i=1}^{n}\xi_{i}$ and that $n-K_{n}=^{d}\sum_{i=1}^{n}\zeta_{i}$ . To prove the desired results, we will use (2.21) and Proposition 3.1.

(i) The result (4.1) follows from

[TABLE]

Since $\sum_{j=1}^{n}p_{j}\to\infty$ , it holds that

[TABLE]

for sufficiently large $n$. The two displays above imply $d_{TV}(K_{n},Z^{A})=\Theta\left(1/\log(n/\theta)\right)$.

(ii) The result (4.2) follows from

[TABLE]

In Case C1, since $\sum_{j=1}^{n}q_{j}\to\infty$ , it holds that

[TABLE]

for sufficiently large $n$. The two displays above imply $d_{TV}(n-K_{n},Z^{C})=\Theta\left(n/\theta\right)$. In Case C2, since $1-e^{-\sum_{j=1}^{n}q_{j}}\leq 1$ and since $1/\sum_{j=1}^{n}q_{j}$ is bounded by a constant for sufficiently large $n$, the same evaluation gives $d_{TV}(n-K_{n},Z^{C})=\Theta\left(n/\theta\right)$. In Case C3, since $\sum_{j=1}^{n}q_{j}\to 0$, it holds that $1-e^{-\sum_{j=1}^{n}q_{j}}\sim n^{2}/(2\theta)$ and that

[TABLE]

for sufficiently large $n$. We thus have $d_{TV}(n-K_{n},Z^{C})=\Theta\left(n^{3}/\theta^{2}\right)$. This completes the proof.

Remark 7.

From asymptotic properties of the Poisson distribution and Proposition 4.1, the result of (2.5) in Cases A and C can be derived.

In Proposition 4.1, we have considered Poisson variables with rigorous means ${\sf E}[K_{n}]$ and $n-{\sf E}[K_{n}]$ . Next, let us discuss centerings by approximate means presented by Yamato (2013) and Tsukuda (2017a) from the viewpoint of Poisson approximation. Introduce three Poisson variables $Y^{A},Y^{a}$ and $Y^{C}$ whose means are given by $\mu_{A}={\sf E}[Y^{A}]=\theta\log(1+n/\theta)$ , $\mu_{a}={\sf E}[Y^{a}]=\theta(\log{n}-\psi(\theta))$ and $\mu_{C}={\sf E}[Y^{C}]=n-\theta\log(1+n/\theta),$ respectively.

Lemma 4.2.

(i) In Case A, it holds that

[TABLE]

and that

[TABLE]

(ii) In Case C, it holds that

[TABLE]

Proof. We will use (2.22). (i) First we show (4.3). Since $\lambda_{A}$ and $\mu_{A}$ tend to infinity in Case A, $|\sqrt{\lambda_{A}}-\sqrt{\mu_{A}}|\leq|\lambda_{A}-\mu_{A}|$ for sufficiently large $n$. Moreover, by Proposition 3.1, it holds that

[TABLE]

and hence (4.3).

Next we see (4.4). By using Propositions 3.1 and 3.2, it holds that

[TABLE]

and hence (4.4).

(ii) First consider Case C1. Since $\lambda_{C}$ and $\mu_{C}$ tend to infinity in Case C1, $|\sqrt{\lambda_{C}}-\sqrt{\mu_{C}}|\leq|\lambda_{C}-\mu_{C}|$ for sufficiently large $n$. By Proposition 3.1, it holds that

[TABLE]

and hence (4.5) holds as $d_{TV}(Z^{C},Y^{C})=O(1/\sqrt{\theta})$ .

Next consider Case C2. The order relation between $|\sqrt{\lambda_{C}}-\sqrt{\mu_{C}}|$ and $|\lambda_{C}-\mu_{C}|$ is not determined, but both have the same bound $O(1/\sqrt{\theta})$ because $1/\sqrt{\theta}=\Theta(1/n)$. Hence (4.5) holds as $d_{TV}(Z^{C},Y^{C})=O(1/\sqrt{\theta})$.

Finally, consider Case C3. Since $\lambda_{C}$ and $\mu_{C}$ tend to 0, $|\sqrt{\lambda_{C}}-\sqrt{\mu_{C}}|\geq|\lambda_{C}-\mu_{C}|$ for sufficiently large $n$. By Proposition 3.1, it holds that

[TABLE]

and hence (4.5) holds as $d_{TV}(Z^{C},Y^{C})=O(n/\theta)$ . This completes the proof.

From what has already been proven, the triangle inequality yields the following Poisson approximations corresponding to (2.4).

Proposition 4.3.

(i) In Case A, if $(\log{(n/\theta)})/\theta\to\infty$ then

[TABLE]

and if $(\log{(n/\theta)})/\theta=O(1)$ then

[TABLE]

Moreover, in Case A, if $(\theta^{3}\log{(n/\theta)})/n^{2}=O(1)$ then

[TABLE]

and if $(\theta^{3}\log{(n/\theta)})/n^{2}\to\infty$ and $\theta^{3}/(n^{2}\log{(n/\theta)})=O(1)$ then

[TABLE]

(ii) In Case C, it holds that

[TABLE]

5 On independent process approximation of component counts

5.1 Case A

First we study the asymptotic independence of the small component counts ${\bf C}^{n}_{b}=(C_{1}^{n},\ldots,C_{b}^{n})$ in Case A when $\theta\geq 1$ for some $n$. Recall that $\theta$ is assumed not to decrease as $n$ increases. We do not discuss the other case, $\theta<1$ for all $n$, because we are interested in large $\theta$.

Consider $\{Z_{j}\}_{j=1}^{\infty}$ defined in Subsection 2.2. Let us denote ${\bf Z}_{b}=(Z_{1},\ldots,Z_{b})$ for $b=1,2,\ldots,n$ , and $T_{lm}=\sum_{j=l+1}^{m}jZ_{j}$ for $l=0,1,\ldots,n-1$ and $m=l+1,\ldots,n$ . It follows from the conditioning relation (2.8) that

[TABLE]

where $\textbf{a}_{b}=(a_{1},\ldots,a_{b})$ and $a=\sum_{j=1}^{b}ja_{j}$ .
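The conditioning relation (2.8), from which (5.1) is derived, can be checked by brute force for small $n$. The ESF (1.1) should coincide with the law of $(Z_{1},\ldots,Z_{n})$ given $T_{0n}=n$. The sketch below is our own, and all function names are hypothetical. It enumerates the partitions of $n$ and compares both sides. It also compares ${\sf P}(T_{0n}=n)$ with the closed form $e^{-\theta\sum_{j=1}^{n}1/j}(\theta)_{n}/n!$, which follows from the fact that (1.1) sums to one.

```python
import math


def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing lists of parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def multiplicities(parts, n):
    """a[j] = number of parts equal to j (index 0 unused)."""
    a = [0] * (n + 1)
    for p in parts:
        a[p] += 1
    return a


def rising(x, m):
    """Rising factorial (x)_m = x (x+1) ... (x+m-1)."""
    return math.prod(x + i for i in range(m))


def esf_prob(a, n, theta):
    """Ewens sampling formula (1.1)."""
    val = math.factorial(n) / rising(theta, n)
    for j in range(1, n + 1):
        val *= (theta / j) ** a[j] / math.factorial(a[j])
    return val


def poisson_joint(a, n, theta):
    """P(Z_1 = a_1, ..., Z_n = a_n) for independent Z_j ~ Po(theta / j)."""
    val = 1.0
    for j in range(1, n + 1):
        lam = theta / j
        val *= math.exp(-lam) * lam ** a[j] / math.factorial(a[j])
    return val


n, theta = 8, 2.5
all_a = [multiplicities(p, n) for p in partitions(n)]
p_T = sum(poisson_joint(a, n, theta) for a in all_a)  # P(T_{0n} = n)
h_n = sum(1 / j for j in range(1, n + 1))
closed_form = math.exp(-theta * h_n) * rising(theta, n) / math.factorial(n)
max_err = max(abs(esf_prob(a, n, theta) - poisson_joint(a, n, theta) / p_T) for a in all_a)
print(len(all_a), p_T, closed_form, max_err)
```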

Proposition 5.1.

In Case A, if $\theta\geq 1$ for some $n$ and if $\theta^{2}/n\to 0$ , then ${\sf P}({\bf C}^{n}_{b}={\bf a}_{b})\sim{\sf P}({\bf Z}_{b}={\bf a}_{b})$ for any $\textbf{a}_{b}$ with any fixed positive integer $b$ .

Proof. From (5.1), in order to prove the desired result, it suffices to show that

[TABLE]

We first calculate $g_{n-a}=\exp(\theta\sum_{j=b+1}^{n}1/j){\sf P}(T_{bn}=n-a)$ . Letting $f(x)=\exp(-\theta\sum_{j=1}^{b}x^{j}/j)$ , we have

[TABLE]

see equation (5) of Arratia, Barbour and Tavaré (1992).
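The identity above can be checked numerically. The generating function of $T_{bn}$ is ${\sf E}[x^{T_{bn}}]=\exp(-\theta\sum_{j=b+1}^{n}1/j)\exp(\theta\sum_{j=b+1}^{n}x^{j}/j)$. In addition, $(1-x)^{-\theta}f(x)=\exp(\theta\sum_{j\geq b+1}x^{j}/j)$. For $m\leq n$ the coefficients of $x^{m}$ in these two exponentials therefore agree, so $g_{m}=\exp(\theta\sum_{j=b+1}^{n}1/j){\sf P}(T_{bn}=m)=[x^{m}](1-x)^{-\theta}f(x)$.

The sketch below is our own, with hypothetical names. It computes ${\sf P}(T_{bn}=m)$ by the compound-Poisson recursion $m{\sf P}(T_{bn}=m)=\theta\sum_{j=b+1}^{\min(n,m)}{\sf P}(T_{bn}=m-j)$. It obtains the Taylor coefficients of $f$ from $f'=f\cdot(d/dx)(-\theta\sum_{j=1}^{b}x^{j}/j)$.

```python
import math


def pmf_T(theta, b, n, m):
    """P(T_{bn} = k), k = 0..m, where T_{bn} = sum_{j=b+1}^n j Z_j and Z_j ~ Po(theta/j) independent."""
    p = [math.exp(-theta * sum(1.0 / j for j in range(b + 1, n + 1)))]
    for k in range(1, m + 1):
        # compound-Poisson recursion: k P(k) = sum_j j * (theta/j) * P(k - j)
        p.append(theta * sum(p[k - j] for j in range(b + 1, min(n, k) + 1)) / k)
    return p


def coeffs_binomial(theta, m):
    """Coefficients of (1 - x)^(-theta) up to x^m, i.e. (theta)_k / k!."""
    c = [1.0]
    for k in range(1, m + 1):
        c.append(c[-1] * (theta + k - 1) / k)
    return c


def coeffs_f(theta, b, m):
    """Coefficients F_k of f(x) = exp(-theta sum_{j<=b} x^j/j): k F_k = -theta sum_{j<=min(b,k)} F_{k-j}."""
    F = [1.0]
    for k in range(1, m + 1):
        F.append(-theta * sum(F[k - j] for j in range(1, min(b, k) + 1)) / k)
    return F


n, b, theta = 40, 3, 4.0
scale = math.exp(theta * sum(1.0 / j for j in range(b + 1, n + 1)))
lhs = [scale * q for q in pmf_T(theta, b, n, n)]  # g_m for m = 0..n
A, F = coeffs_binomial(theta, n), coeffs_f(theta, b, n)
rhs = [sum(A[i] * F[m - i] for i in range(m + 1)) for m in range(n + 1)]
print(lhs[n], rhs[n])
```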

Let $n$ be a positive integer such that $\theta\geq 1$ . It holds that

[TABLE]

where

[TABLE]

Since the right-hand side of (5.2) is

[TABLE]

the first term and the second term will be evaluated in Lemma 5.2 and Lemma 5.3, respectively. From Lemma 5.2, the bracketed factor in the first term is $1+O(\theta^{2}/n)$. Next we evaluate $[x^{n-a}]h(x)$. It follows from Lemma 5.3 that

[TABLE]

where $r_{1}=1+c_{1}r$ and $r_{2}=2+c_{2}r$ with constants $c_{1},c_{2}$ such that $1<c_{1}<c_{2}$. Taking $r$ to be a positive constant, the right-hand side is $o\left(1/n^{k}\right)$ for any positive $k$, since $b$ is fixed and $\theta^{2}/n\to 0$. Hence $[x^{n-a}]h(x)=o(1/n)$.

Now we have

[TABLE]

and, as a result,

[TABLE]

On the other hand,

[TABLE]

If $\theta\to c<\infty$ , $(\theta)_{n}/n!\sim n^{\theta-1}/\Gamma(\theta)$ and so

[TABLE]

If $\theta\to\infty$ , Lemma A.1 and the Stirling formula yield that

[TABLE]

and hence

[TABLE]

From what has already been proven, we obtain

[TABLE]

This completes the proof.
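The two regimes in the proof of Proposition 5.1 can be seen numerically. For fixed $\theta$, $(\theta)_{n}/n!\sim n^{\theta-1}/\Gamma(\theta)$. A second-order Stirling expansion suggests that the ratio of the two sides is roughly $\exp(\theta(\theta-1)/(2n))$; this is our own heuristic, not a statement from the paper. The simple asymptotic therefore fails once $\theta^{2}$ is comparable to $n$, which is consistent with the assumption $\theta^{2}/n\to 0$ under which Lemma A.1 is stated. The sketch works in log space via `math.lgamma`, and the helper name is hypothetical.

```python
import math


def ratio_to_fixed_theta_asymptotic(theta: float, n: int) -> float:
    """((theta)_n / n!) / (n^(theta-1) / Gamma(theta)), computed in log space.

    Gamma(theta) cancels, leaving lgamma(theta + n) - lgamma(n + 1) - (theta - 1) log n.
    """
    return math.exp(math.lgamma(theta + n) - math.lgamma(n + 1) - (theta - 1) * math.log(n))


n = 10**6
r_fixed = ratio_to_fixed_theta_asymptotic(2.5, n)     # theta fixed: ratio close to 1
r_large = ratio_to_fixed_theta_asymptotic(1000.0, n)  # theta^2 = n: ratio stays away from 1
heuristic = math.exp(1000.0 * 999.0 / (2 * n))
print(r_fixed, r_large, heuristic)
```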

In the following lemma, we evaluate the first term of (5.3).

Lemma 5.2.

Let $f(x)=\exp(-\theta\sum_{j=1}^{b}x^{j}/j)$ . For $\theta>1$ , for $k=1,\ldots,\lceil\theta\rceil-1$ and for any positive integers $a<n$ and $b$ , it holds that

[TABLE]

Moreover, in Case A, if $a=o(n)$ , $b=o(n/\theta^{2})$ , $\theta\geq 1$ for some $n$ and $\theta^{2}/n\to 0$ , then

[TABLE]

Proof. Let $g(x)$ be $-\theta\sum_{j=1}^{b}x^{j}/j$ . It holds that

[TABLE]

for $1\leq i\leq b$ . Thus, for $1\leq i\leq b$ ,

[TABLE]

For $i>b$ , $g^{(i)}(1)=0\geq-\theta b^{i}.$ The Faà di Bruno formula yields that

[TABLE]

where $B_{k,j}(\cdot)$ is the partial Bell polynomial, so

[TABLE]

for any $k=1,2,\ldots$ . By using the triangle inequality,

[TABLE]

where $\mathcal{S}(k,j)$ is the Stirling number of the second kind. The above two displays and the triangle inequality imply that

[TABLE]

For $k\leq\lceil\theta\rceil-1$ , the Stirling formula yields that

[TABLE]

where we have used $\Gamma(\theta+k)/\Gamma(\theta-k)=(\theta-k)(\theta^{2}-(k-1)^{2})\cdots(\theta^{2}-1^{2})\theta\leq\theta^{2k}$ and $\Gamma(\theta-k+n-a)/\Gamma(\theta+n-a)=1/((\theta-1+n-a)\cdots(\theta-k+n-a))\leq 1/(n-a)^{k}$ . We thus have

[TABLE]

for $k\leq\lceil\theta\rceil-1$ , which is (5.8).

Next we prove (5.9). If $\theta\leq 1$ for all $n$, the result is obvious because the left-hand side of (5.9) is 1. Otherwise, letting $n$ be a positive integer such that $\theta>1$, the desired result follows from

[TABLE]

and from

[TABLE]

This completes the proof.
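The proof of (5.8) rests on two elementary Gamma-ratio facts, both for $k=1,\ldots,\lceil\theta\rceil-1$. The first is the factorization $\Gamma(\theta+k)/\Gamma(\theta-k)=(\theta-k)\theta\prod_{i=1}^{k-1}(\theta^{2}-i^{2})$, together with its bound $\theta^{2k}$. The second is $\Gamma(\theta-k+m)/\Gamma(\theta+m)\leq m^{-k}$ for positive integers $m$, where $m$ plays the role of $n-a$. The sketch below is our own, with hypothetical helper names, and checks both facts for a few values of $\theta$.

```python
import math


def gamma_ratio(x: float, y: float) -> float:
    """Gamma(x) / Gamma(y) for x, y > 0, via log-gamma."""
    return math.exp(math.lgamma(x) - math.lgamma(y))


def lemma52_checks(theta: float, gaps=(1, 10, 100)):
    """For k = 1..ceil(theta)-1 yield (k, rel. error of factorization, bound1 holds, bound2 holds)."""
    for k in range(1, math.ceil(theta)):
        lhs = gamma_ratio(theta + k, theta - k)
        prod = (theta - k) * theta * math.prod(theta**2 - i**2 for i in range(1, k))
        bound2 = all(gamma_ratio(theta - k + m, theta + m) <= m ** (-k) for m in gaps)
        yield k, abs(lhs - prod) / prod, lhs <= theta ** (2 * k), bound2


for theta in (2.3, 5.0, 7.9):
    for row in lemma52_checks(theta):
        print(theta, row)
```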

The following lemma is used to evaluate $[x^{n-a}]h(x)$ in (5.3).

Lemma 5.3.

Let $f(x)=\exp(-\theta\sum_{j=1}^{b}x^{j}/j)$ and let

[TABLE]

Then, for any positive integers $a<n$ and $b$ , it holds that

[TABLE]

where $r_{1}=1+c_{1}r$ , $r_{2}=2+c_{2}r$ , $1<c_{1}<c_{2}$ , and $r$ is an arbitrary positive constant.

Proof. Consider a complex variable ${\bf z}\in\mathbb{C}$ . Since $h({\bf z})$ and $f({\bf z})$ are analytic in $\mathbb{C}$ , by using the Cauchy inequality for coefficients, it holds that

[TABLE]

The right-hand side is

[TABLE]

because

[TABLE]

where we have used Lemma A.2 for the second inequality. Hence, it follows from the Cauchy inequality again that

[TABLE]

This completes the proof.
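The Cauchy inequality for coefficients used above states that $|[x^{m}]F(x)|\leq\max_{|{\bf z}|=r}|F({\bf z})|/r^{m}$ whenever $F$ is analytic on a neighbourhood of the closed disc of radius $r$. For $F=f$, we have $\max_{|{\bf z}|=r}|f({\bf z})|\leq\exp(\theta\sum_{j=1}^{b}r^{j}/j)$, because $|e^{w}|=e^{{\rm Re}\,w}\leq e^{|w|}$. The sketch below is our own illustration, with hypothetical helper names. It compares the exact coefficients of $f$ with this bound over a few radii; it does not use the specific radii $r_{1},r_{2}$ of Lemma 5.3.

```python
import math


def coeffs_f(theta: float, b: int, m: int) -> list:
    """Taylor coefficients F_0..F_m of f(x) = exp(-theta * sum_{j<=b} x^j / j)."""
    F = [1.0]
    for k in range(1, m + 1):
        # from f' = f * (-theta * sum_{j<=b} x^(j-1)):  k F_k = -theta * sum_{j<=min(b,k)} F_{k-j}
        F.append(-theta * sum(F[k - j] for j in range(1, min(b, k) + 1)) / k)
    return F


def cauchy_bound(theta: float, b: int, r: float, k: int) -> float:
    """Upper bound exp(theta * sum_{j<=b} r^j / j) / r^k for |[x^k] f(x)|."""
    return math.exp(theta * sum(r**j / j for j in range(1, b + 1))) / r**k


theta, b, m = 6.0, 4, 40
F = coeffs_f(theta, b, m)
radii = (0.5, 1.0, 2.0)
best = [min(cauchy_bound(theta, b, r, k) for r in radii) for k in range(m + 1)]
print(max(abs(Fk) / bk for Fk, bk in zip(F, best)))
```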

Let us provide some remarks on Proposition 5.1.

Remark 8.

Proposition 5.1 indicates that when $\theta^{2}/n\to 0$ the components of $(C_{1}^{n},\ldots,C_{b}^{n})$ are asymptotically independent, and $C_{j}^{n}$ asymptotically follows the Poisson distribution with mean $\theta/j$ for $j=1,\ldots,b$ . As a consequence, for any fixed $b$ , if $\theta\to c<\infty$ then

[TABLE]

and if $\theta\to\infty$ then

[TABLE]

where $N_{b}(0,I)$ is a $b$ -dimensional standard normal variable with independent coordinates.

Remark 9.

Proposition 5.9 below is stronger than Proposition 5.1. We nevertheless include the proof of Proposition 5.1 for two reasons: some of its estimates differ from those in the proof of Theorem 1 of Arratia, Barbour and Tavaré (1992), who used the Darboux lemma (see for instance the theorem in Knuth and Wilf (1989)), and Lemmas 5.2 and 5.3 will be used in the proof of Theorem 5.8 below.

In Proposition 5.1, $\theta^{2}/n\to 0$ is assumed. The following proposition shows that this assumption is necessary for the approximation of $\{C_{j}^{n}\}_{j=1}^{b}$ by Poisson variables $\{Z_{j}\}_{j=1}^{b}$ .

Proposition 5.4.

In Case A, if $\theta\geq 1$ for some $n$ , then ${\sf P}({\bf C}^{n}_{b}={\bf a}_{b})\sim{\sf P}({\bf Z}_{b}={\bf a}_{b})$ for any $\textbf{a}_{b}$ with any fixed positive integer $b$ only if $\theta^{2}/n\to 0$ .

Proof. To prove the assertion, it suffices to consider the case $b=1$. Let $f(x)=\exp(-\theta x)$; then $f^{(k)}(x)=(-\theta)^{k}f(x)$. From (5.3), $g_{n-a}=[x^{n-a}](1-x)^{-\theta}f(x)$ equals

[TABLE]

Since

[TABLE]

from the proof of Proposition 5.1, it is enough to show that

[TABLE]

only if $\theta^{2}/n\to 0$ .

Since $\theta$ is assumed not to decrease as $n$ increases, we study the following three cases: (i) $\theta\geq 2$ for some $n$ ; (ii) $\theta<2$ for all $n$ and $\theta>1$ for some $n$ ; (iii) $\theta\leq 1$ for all $n$ . First, consider (i). Let $n$ be a positive integer such that $\theta\geq 2$ . Then, it holds that

[TABLE]

where we have used Lemma A.3 for the second inequality. From the binomial theorem, the right-hand side is equal to

[TABLE]

The above display is not less than 1 and converges to 1 only if $\theta^{2}/n\to 0$ . Second, consider (ii). Let $n$ be a positive integer such that $\theta>1$ . Then, it holds that

[TABLE]

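which converges to 1 only if $\theta^{2}/n\to 0$. Finally, consider (iii). Since $\theta\geq 1$ for some $n$ and $\theta$ does not decrease as $n$ increases, in this case $\theta=1$ for all sufficiently large $n$; let $n$ be such an integer. Then, it holds that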

[TABLE]

This completes the proof.

Hence, we have the following corollary to Propositions 5.1 and 5.4.

Corollary 5.5.

In Case A, if $\theta\geq 1$ for some $n$ , then ${\sf P}({\bf C}^{n}_{b}={\bf a}_{b})\sim{\sf P}({\bf Z}_{b}={\bf a}_{b})$ for any $\textbf{a}_{b}$ with any fixed positive integer $b$ if and only if $\theta^{2}/n\to 0$ .
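For $b=1$, the quantity behind Corollary 5.5 is ${\sf P}(C_{1}^{n}=0)/{\sf P}(Z_{1}=0)$. By the conditioning relation, this equals ${\sf P}(T_{1n}=n)/{\sf P}(T_{0n}=n)$. The sketch below is our own, with hypothetical helper names. It evaluates this ratio exactly with the compound-Poisson recursion $k{\sf P}(T_{bn}=k)=\theta\sum_{j=b+1}^{\min(n,k)}{\sf P}(T_{bn}=k-j)$.

For $\theta=2$, a short calculation with the factorial moments of $C_{1}^{n}$ gives the ratio $1+2/(n+1)$, up to an exponentially small term. For $\theta=30$ and $n=500$ (so that $\theta^{2}/n=1.8$), the ratio stays far from 1. This is in line with the necessity of $\theta^{2}/n\to 0$.

```python
import math


def pmf_T(theta: float, b: int, n: int, m: int) -> list:
    """P(T_{bn} = k), k = 0..m, for T_{bn} = sum_{j=b+1}^n j Z_j with Z_j ~ Po(theta/j) independent."""
    p = [math.exp(-theta * sum(1.0 / j for j in range(b + 1, n + 1)))]
    for k in range(1, m + 1):
        p.append(theta * sum(p[k - j] for j in range(b + 1, min(n, k) + 1)) / k)
    return p


def ratio_C1_zero(theta: float, n: int) -> float:
    """P(C_1^n = 0) / P(Z_1 = 0) = P(T_{1n} = n) / P(T_{0n} = n) by the conditioning relation."""
    return pmf_T(theta, 1, n, n)[n] / pmf_T(theta, 0, n, n)[n]


n = 500
r_small = ratio_C1_zero(2.0, n)   # theta^2 / n = 0.008
r_large = ratio_C1_zero(30.0, n)  # theta^2 / n = 1.8
print(r_small, 1 + 2 / (n + 1), r_large)
```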

Subsequently, let us derive the result corresponding to (2.15), following a programme similar to that of Arratia, Stark and Tavaré (1995). It follows from (2.8) that

[TABLE]

see (50) of Arratia, Stark and Tavaré (1995). Firstly, via the large deviation inequality, we see that $d_{b}(n)$ is approximated by

[TABLE]

with

[TABLE]

From the definition, if $1\leq b\leq n\theta^{-2}(\log{n})^{-3}$ then $J_{n}=b\theta\log{n}$, and otherwise $J_{n}=b^{2/3}(\theta n)^{1/3}=b(\theta n/b)^{1/3}$. Unlike in Arratia, Stark and Tavaré (1995), $J_{n}$ depends on $\theta$ because we allow $\theta\to\infty$, but a similar treatment still works.
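The piecewise description of $J_{n}$ above has a simple reading. We have $b\theta\log n\leq b^{2/3}(\theta n)^{1/3}$ if and only if $b\theta^{2}(\log n)^{3}\leq n$, so the two branches coincide exactly at $b=n\theta^{-2}(\log n)^{-3}$. Consequently the rule picks $J_{n}=\min\{b\theta\log n,\,b^{2/3}(\theta n)^{1/3}\}$. The sketch below is our own, and the function name is hypothetical. It implements only the piecewise rule stated in the text and checks both facts.

```python
import math


def J_n(n: int, b: float, theta: float) -> float:
    """Truncation level J_n, following the piecewise rule stated in the text."""
    threshold = n / (theta**2 * math.log(n) ** 3)
    if 1 <= b <= threshold:
        return b * theta * math.log(n)
    return b ** (2 / 3) * (theta * n) ** (1 / 3)


n, theta = 10**8, 3.0
threshold = n / (theta**2 * math.log(n) ** 3)  # about 1777.6 for these values
grid = [1, 10, 100, 1000, 1777, 1778, 5000, 10**5]
gaps = [abs(J_n(n, b, theta) - min(b * theta * math.log(n), b ** (2 / 3) * (theta * n) ** (1 / 3)))
        / J_n(n, b, theta) for b in grid]
print(threshold, max(gaps))
```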

Lemma 5.6.

In Case A, with $b=o(n/\theta^{2})$ , it holds that

[TABLE]

for any positive $k$ .

Proof. The first inequality is obvious, so we prove the second one. From Lemma 8 of Arratia, Stark and Tavaré (1995), for any $b\geq 1$ and $w>0$, it holds that

[TABLE]

If $1\leq b\leq n\theta^{-2}(\log{n})^{-3}$ then, by putting $w=\theta\log{n}$ , the right-hand side of (5.12) is

[TABLE]

which tends to minus infinity faster than $-k\log{n}$ for any positive $k$ . If $b\geq n\theta^{-2}(\log{n})^{-3}$ then, by putting $w=(\theta n/b)^{1/3}$ , the right-hand side of (5.12) is

[TABLE]

which tends to minus infinity faster than $-k\log{(n/b)}$ for any positive $k$ . This completes the proof.

The next lemma shows that $(|1-\theta|/n){\sf E}\left[(T_{0b}-\theta b)^{+}1\{T_{0b}\leq J_{n}\}\right]$ is approximately $(|1-\theta|/n){\sf E}\left[(T_{0b}-\theta b)^{+}\right]$ .

Lemma 5.7.

In Case A, if $\theta^{2}/n\to 0$ then it holds that

[TABLE]

for $b=o(n/\theta^{2})$ and for any positive $k$ .

Proof. From the Cauchy-Schwarz inequality, it follows that

[TABLE]

where we have used ${\sf E}[(T_{0b}-{\sf E}[T_{0b}])^{2}]={\rm var}(T_{0b})=\sum_{j=1}^{b}j^{2}(\theta/j)=\theta\sum_{j=1}^{b}j\leq\theta b^{2}$ for the second inequality. Lemma 5.6 yields that ${\sf P}(T_{0b}>J_{n})=o((b/n)^{2k})$ for any positive $k$ . This completes the proof.
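Several moment facts about $T_{0b}=\sum_{j=1}^{b}jZ_{j}$ are used here and later in the proof of Theorem 5.8:

- ${\sf E}[T_{0b}]=\theta b$;
- ${\rm var}(T_{0b})=\theta b(b+1)/2\leq\theta b^{2}$;
- ${\sf E}[(T_{0b}-\theta b)^{+}]={\sf E}[|T_{0b}-\theta b|]/2$;
- ${\sf E}[|T_{0b}-\theta b|]\leq\sqrt{{\rm var}(T_{0b})}$, by Jensen's inequality.

All of them can be checked from the exact distribution. The sketch below is our own, with hypothetical names. It computes the distribution by the compound-Poisson recursion and truncates it far in the upper tail, which we assume is negligible for the illustrated parameters.

```python
import math


def pmf_T0b(theta: float, b: int) -> list:
    """Distribution of T_{0b} = sum_{j=1}^b j Z_j (Z_j ~ Po(theta/j)), truncated far in the upper tail."""
    sd = math.sqrt(theta * b * (b + 1) / 2)
    kmax = int(theta * b + 40 * sd + 100)
    p = [math.exp(-theta * sum(1.0 / j for j in range(1, b + 1)))]
    for k in range(1, kmax + 1):
        p.append(theta * sum(p[k - j] for j in range(1, min(b, k) + 1)) / k)
    return p


def summaries(theta: float, b: int) -> dict:
    """Total mass, mean, variance, E[(T - theta b)^+] and E|T - theta b| of T_{0b}."""
    p = pmf_T0b(theta, b)
    c = theta * b
    mean = sum(k * q for k, q in enumerate(p))
    return {
        "total": sum(p),
        "mean": mean,
        "var": sum((k - mean) ** 2 * q for k, q in enumerate(p)),
        "pos": sum(max(k - c, 0.0) * q for k, q in enumerate(p)),
        "abs": sum(abs(k - c) * q for k, q in enumerate(p)),
    }


s = summaries(5.0, 20)
print(s)
```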

The following result is an extension of (2.15) to large $\theta$ setup.

Theorem 5.8.

In Case A, if $\theta\geq 1$ for some $n$ and $\theta^{2}/n\to 0$ , then

[TABLE]

for

[TABLE]

In addition, when $\theta\to\infty$ , it holds that $d_{b}(n)=o(b\theta^{2}/n)$ .

Proof. Let $n$ be a positive integer such that $\theta\geq 1$ . Since it follows from Lemma 5.6 that

[TABLE]

for any positive $k$, it suffices to evaluate the first term.

Let $g_{n-a}=\exp(\theta\sum_{j=b+1}^{n}1/j){\sf P}(T_{bn}=n-a)$ . For $a\leq\lfloor J_{n}\rfloor$ , we have

[TABLE]

and the last term must be evaluated when $a$ and $b$ grow with $n$.

If $b$ stays bounded then, as seen in the proof of Proposition 5.1, $[x^{n-a}]h(x)=o(1/n^{2})$ since $a/n\leq J_{n}/n\to 0$. Hence, we consider the case $b\to\infty$. Using Lemma 5.3 with $r=1/b$, we have

[TABLE]

Since $(1+c_{1}/b)^{n-a}\sim e^{(n-a)c_{1}/b}$ and since $(1+c_{2}/b)^{b}\sim e^{c_{2}}$, the right-hand side is asymptotically equal to

[TABLE]

where $A_{1}=(\exp({e^{c_{2}}}))/2$ and $A_{2}=4/(c_{2}-c_{1})$ . From (5.14), the right-hand side is

[TABLE]

where we have used $a/n\leq J_{n}/n\to 0$. The right-hand side is $o\left(1/n^{k}\right)$ for any positive constant $k$. Consequently, $\left|[x^{n-a}]h(x)\right|=o\left(b/n^{2}\right)$ even when $b\to\infty$.

Now we have

[TABLE]

This expansion, $f(1)=\exp(-\theta\sum_{j=1}^{b}1/j)$, ${\sf P}(T_{bn}=n-a)=\exp\left(-\theta\sum_{j=b+1}^{n}1/j\right)g_{n-a}$ and (5.5)–(5.7) yield that

[TABLE]

Since $a\theta/n\leq aJ_{n}/n\to 0$ and $a^{2}/(nb)\leq J_{n}^{2}/(nb)\to 0$ which follow from $\theta/J_{n}\to 0$ and (5.14), the binomial expansion and Lemma A.1 yield that

[TABLE]

and hence

[TABLE]

Therefore, it holds that

[TABLE]

From what has already been proven, it holds that

[TABLE]

where we have used Lemma 5.7 in the fourth equality and the relation ${\sf E}[(T_{0b}-b\theta)^{+}]={\sf E}[|T_{0b}-b\theta|]/2$ , which follows from ${\sf E}[T_{0b}-b\theta]=0$ , in the fifth equality.

Finally, consider the case $\theta\to\infty$. It follows from Jensen's inequality that

[TABLE]

which implies $d_{b}(n)=o(b\theta^{2}/n)$ . This completes the proof.

Next we discuss the Poisson process approximation via the Feller coupling (see Subsection 2.2). The following result follows directly from (2.14).

Proposition 5.9.

Suppose that $\theta>1$ for all $n$ and $\theta^{2}/n\to 0$ . In Case A, $d_{b}^{W}(n)\to 0$ if, and only if, $b=o(n/\theta^{2})$ . In addition, when $\theta\to\infty$ , it holds that

[TABLE]

Remark 10.

When $\theta\to\infty$ , Theorem 5.8 and Proposition 5.9 yield that

[TABLE]

with $b=o(n/(\theta^{2}\log{n}))$ , which shows that the asymptotic decay rates of $d_{b}(n)$ and $d_{b}^{W}(n)$ are different.
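The Feller coupling behind Proposition 5.9 can be sketched as follows, assuming the standard construction of Arratia, Barbour and Tavaré (1992); Subsection 2.2 is not reproduced here. Independent indicators $\xi_{i}$ with ${\sf P}(\xi_{i}=1)=\theta/(\theta+i-1)$, $i=1,\ldots,n$, are laid out in a row, followed by an extra 1 at position $n+1$. Then $C_{j}^{n}$ is the number of $j$-spacings between consecutive 1s.

The function name is hypothetical, and the code is an illustration only. By construction, $\sum_{j}jC_{j}^{n}=n$ and $K_{n}=\sum_{i}\xi_{i}$. A small Monte Carlo run can also be compared with the exact mean ${\sf E}[C_{1}^{n}]=\theta n/(\theta+n-1)$.

```python
import random


def feller_coupling(n: int, theta: float, rng: random.Random):
    """Return (C, K): C[j] = number of cycles of length j (j = 1..n) and K = number of cycles."""
    # xi_i ~ Bernoulli(theta / (theta + i - 1)), independent; xi_1 = 1 with probability one.
    xi = [1 if rng.random() < theta / (theta + i - 1) else 0 for i in range(1, n + 1)]
    ones = [i for i in range(1, n + 1) if xi[i - 1] == 1] + [n + 1]  # sentinel 1 at position n+1
    C = [0] * (n + 1)
    for left, right in zip(ones, ones[1:]):
        C[right - left] += 1  # a j-spacing between consecutive 1s gives a cycle of length j
    return C, sum(xi)


rng = random.Random(2022)
n, theta, reps = 200, 5.0, 4000
avg_C1 = sum(feller_coupling(n, theta, rng)[0][1] for _ in range(reps)) / reps
print(avg_C1, theta * n / (theta + n - 1))
```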

Let $S^{k}_{n}$ be the $k$ -th shortest cycle length in a Ewens partition, that is,

[TABLE]

for $k=1,2,\ldots$ and $S^{k}_{n}=\infty$ when there is no such $j$ . See Section 2E of Arratia and Tavaré (1992). The following statement is a direct corollary to Theorem 5.8.

Corollary 5.10.

Let $r$ be a positive integer such that $r=o(n/\theta^{2})$ and let $\delta_{r}=\sum_{j=1}^{r}\theta/j$ . Under the assumption of Proposition 5.9,

[TABLE]

Proof. Proposition 5.9 yields that

[TABLE]

This completes the proof.

Remark 11.

Corollary 5.10 yields that, under the assumption of Proposition 5.9, ${\sf P}(S_{n}^{1}=1)\sim 1-e^{-\theta}$. Hence, if $\theta\to\infty$ then ${\sf P}(S_{n}^{1}=1)\to 1$, and if $\theta\to c<\infty$ then ${\sf P}(S_{n}^{1}=1)\to 1-e^{-c}<1$. Note that for the Pitman sampling formula, the shortest cycle length converges to 1 in probability, except in the case of the Ewens sampling formula (see Mano (2017)).
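Corollary 5.10 and Remark 11 can be examined for moderate $n$ with exact computations. By the conditioning relation, ${\sf P}(S_{n}^{1}>r)={\sf P}(C_{1}^{n}=\cdots=C_{r}^{n}=0)=e^{-\delta_{r}}{\sf P}(T_{rn}=n)/{\sf P}(T_{0n}=n)$. Moreover, ${\sf P}(S_{n}^{1}=1)=1-{\sf P}(C_{1}^{n}=0)$. The sketch below is our own, with hypothetical helper names. It evaluates these quantities with the compound-Poisson recursion for $\theta=2$ and $n=500$.

```python
import math


def pmf_T(theta, b, n, m):
    """P(T_{bn} = k), k = 0..m, for T_{bn} = sum_{j=b+1}^n j Z_j with Z_j ~ Po(theta/j) independent."""
    p = [math.exp(-theta * sum(1.0 / j for j in range(b + 1, n + 1)))]
    for k in range(1, m + 1):
        p.append(theta * sum(p[k - j] for j in range(b + 1, min(n, k) + 1)) / k)
    return p


def shortest_cycle_tail(theta, n, r):
    """Return (exact P(S_n^1 > r), approximation exp(-delta_r)) with delta_r = sum_{j<=r} theta/j."""
    approx = math.exp(-sum(theta / j for j in range(1, r + 1)))
    exact = approx * pmf_T(theta, r, n, n)[n] / pmf_T(theta, 0, n, n)[n]
    return exact, approx


theta, n = 2.0, 500
rows = {r: shortest_cycle_tail(theta, n, r) for r in (1, 2, 3)}
p_S_equals_1 = 1 - rows[1][0]
print(rows, p_S_equals_1, 1 - math.exp(-theta))
```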

The uniform bound with respect to $b$ , which gives an extension of (2.13), is given in the following proposition. Its applications to functional central limit theorems will be presented in the next section.

Proposition 5.11.

In Case A,

[TABLE]

Proof. By using the triangle inequality and (2.12), it holds that

[TABLE]

for any $b=1,2,\ldots,n$ , see the proof of Theorem 2 of Arratia, Barbour and Tavaré (1992). When $\theta\to\infty$ , by setting $b=\lfloor n/\theta\rfloor$ , the first and third terms in (5.17) are $O(\theta)$ and $O(\theta\log{\theta})$ , respectively. Otherwise, by setting $b=\lfloor n/2\rfloor$ the result holds with the bound $d_{n}^{W}(n)=O(1)$ . Hence, $d_{n}^{W}(n)=O((\theta\log{\theta})\vee 1)=O(\theta\log{(1+\theta)}).$ This completes the proof.

5.2 Case C

The probability mass function (1.1) is obtained from the conditioning relation (2.8) with a sequence of independent Poisson variables with respective means $\theta/j$. We also obtain (1.1) from (2.8) with Poisson variables with respective means $(\theta/j)(n/\theta)^{j}$ (see for instance Watterson (1974a)). In Case C, the following lemma shows that the choice ${\sf E}[Z_{j}]=(\theta/j)(n/\theta)^{j}$ fits better.
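Both choices of means give (1.1) for the following reason. Replacing the means $\theta/j$ by $(\theta/j)x^{j}$ for some $x>0$ multiplies the joint Poisson probability of $(a_{1},\ldots,a_{n})$ by a constant times $x^{\sum_{j}ja_{j}}$, and this factor equals $x^{n}$ on $\{T_{0n}=n\}$. Hence the conditional law is unchanged.

The sketch below checks this by brute force for small $n$, with $\theta$ much larger than $n$. It is our own illustration, and the helper names are hypothetical. It also compares the probability that all components are singletons with $\theta^{n}/(\theta)_{n}$ from (1.1).

```python
import math


def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing lists of parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def conditional_law(n, means):
    """Law of the multiplicities given sum_j j Z_j = n, with Z_j ~ Po(means[j]) independent."""
    weights = {}
    for parts in partitions(n):
        w = 1.0
        for j in range(1, n + 1):
            a_j = parts.count(j)
            w *= math.exp(-means[j]) * means[j] ** a_j / math.factorial(a_j)
        weights[tuple(parts)] = w
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


n, theta = 7, 50.0
x = n / theta
plain = conditional_law(n, [0.0] + [theta / j for j in range(1, n + 1)])
tilted = conditional_law(n, [0.0] + [(theta / j) * x**j for j in range(1, n + 1)])
print(max(abs(plain[k] - tilted[k]) for k in plain))
```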

Lemma 5.12.

In Case C,

[TABLE]

Therefore, for $j=2,3,\ldots$ , if $\theta(n/\theta)^{j}\to 0$ , then

[TABLE]

Proof. It holds that

[TABLE]

which is (2.18) of Watterson (1974a). The Stirling formula $\Gamma(x)=\sqrt{2\pi}x^{x-1/2}e^{-x}+O(x^{x-3/2}/e^{x})$ as $x\to\infty$ yields that $\Gamma(x-c)/\Gamma(x)\sim x^{-c}$ as $x\to\infty$ for any fixed real $c$. Hence, it holds that

[TABLE]

Hence, the result (5.18) follows from $C_{j}^{n}\geq 0$ . This completes the proof.
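The sketch below uses the standard formula for the first moment under (1.1), ${\sf E}[C_{j}^{n}]=(\theta/j)\,n!\,\Gamma(\theta+n-j)/((n-j)!\,\Gamma(\theta+n))$. We assume this is the content of the display above. The sketch is our own, and its names are hypothetical. It evaluates the formula through the stable product $\prod_{i=0}^{j-1}(n-i)/(\theta+n-1-i)$ and compares the result with $(\theta/j)(n/\theta)^{j}$ when $\theta$ is much larger than $n$. It also confirms the identity $\sum_{j}j\,{\sf E}[C_{j}^{n}]=n$, which holds because $\sum_{j}jC_{j}^{n}=n$.

```python
def mean_counts(n: int, theta: float) -> list:
    """E[C_j^n] for j = 1..n under (1.1); index 0 is unused."""
    means = [0.0]
    prod = 1.0
    for j in range(1, n + 1):
        prod *= (n - j + 1) / (theta + n - j)  # builds prod_{i=0}^{j-1} (n-i)/(theta+n-1-i)
        means.append(theta / j * prod)
    return means


n, theta = 1000, 1e6
m = mean_counts(n, theta)
ratios = [m[j] / ((theta / j) * (n / theta) ** j) for j in (1, 2, 3)]
print(ratios, sum(j * m[j] for j in range(1, n + 1)))
```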

Remark 12.

Since

[TABLE]

and

[TABLE]

it holds that

[TABLE]

for $j=1,2,\ldots,n$ .

By Lemma 5.12, it may seem natural to expect that $C_{j}^{n}$ and a Poisson variable with mean $(\theta/j)(n/\theta)^{j}$ are asymptotically close. However, Proposition 3.3 indicates that, except in Case C3, an independent process approximation by Poisson variables with means $(\theta/j)(n/\theta)^{j}$ seems difficult in the sense of the joint distribution. In fact, the following theorem shows that in Case C2 the linear relation $n-(C_{1}^{n}+2C_{2}^{n})\Rightarrow 0$ between $C_{1}^{n}$ and $C_{2}^{n}$ persists asymptotically.

Theorem 5.13.

(i) In Case C2,

[TABLE]

where $P_{c/2}$ is a Poisson variable with ${\sf E}[P_{c/2}]=\lim_{n,\theta}\{n^{2}/(2\theta)\}=c/2$, and $\sum_{j=3}^{n}|C_{j}^{n}|\Rightarrow 0$.

(ii) In Case C3,

[TABLE]

Proof. (i) It follows from (5.19) that

[TABLE]

which implies that

[TABLE]

This yields $\sum_{j=3}^{n}|C_{j}^{n}|\to^{p}0$ and hence

[TABLE]

These displays yield that

[TABLE]

From this, we obtain

[TABLE]

We conclude from (2.5), which means $K_{n}-\theta\log(1+n/\theta)\Rightarrow c/2-P_{c/2}$ , that

[TABLE]

hence that $C_{2}^{n}\Rightarrow P_{c/2}$ . Moreover, from what has already been proven, we obtain $n-C_{1}^{n}\Rightarrow 2P_{c/2}$ .

(ii) Since

[TABLE]

Lemma 5.12 yields the result. This completes the proof.

Theorem 5.13 directly implies the following corollaries, which describe properties of the shortest cycle length $S_{n}=S_{n}^{1}$ and the longest cycle length $L_{n}$ in a Ewens partition. These extreme sizes are of interest in the combinatorial context, see for instance Mano (2017).

Corollary 5.14.

In Case C2 or C3,

[TABLE]

Proof. In Case C3, the conclusion is obvious, so we consider Case C2. It holds that

[TABLE]

This completes the proof.

Corollary 5.15.

In Case C2, for a positive integer $r$

[TABLE]

Proof. For $r=1$ , it holds that

[TABLE]

For $r\geq 2$ , it holds that ${\sf P}(L_{n}\leq r)\sim 1.$ This completes the proof.

Remark 13.

Since the law of the number of singletons $C_{1}^{n}$ in a Ewens partition is given by

[TABLE]

for $k=0,1,\ldots,n$ , Corollary 5.15 directly follows from

[TABLE]
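In Case C2, where ${\sf E}[P_{c/2}]=\lim n^{2}/(2\theta)=c/2$, Theorem 5.13 gives $n-C_{1}^{n}\Rightarrow 2P_{c/2}$. Hence ${\sf P}(L_{n}=1)={\sf P}(C_{1}^{n}=n)$ should approach ${\sf P}(P_{c/2}=0)=e^{-c/2}$. From (1.1), ${\sf P}(C_{1}^{n}=n)=\theta^{n}/(\theta)_{n}=\prod_{i=0}^{n-1}\theta/(\theta+i)$. The exact mean ${\sf E}[C_{2}^{n}]=(\theta/2)n(n-1)/((\theta+n-1)(\theta+n-2))$ should likewise be close to $c/2$. The sketch below is our own, with hypothetical names, and takes $\theta=n^{2}/c$ exactly.

```python
import math


def prob_all_singletons(n: int, theta: float) -> float:
    """P(C_1^n = n) = theta^n / (theta)_n, computed as exp(-sum log(1 + i/theta))."""
    return math.exp(-sum(math.log1p(i / theta) for i in range(n)))


def mean_C2(n: int, theta: float) -> float:
    """E[C_2^n] = (theta/2) n (n-1) / ((theta+n-1)(theta+n-2)) under (1.1)."""
    return (theta / 2) * n * (n - 1) / ((theta + n - 1) * (theta + n - 2))


c, n = 2.0, 10**4
theta = n * n / c  # n^2 / theta = c
print(prob_all_singletons(n, theta), math.exp(-c / 2), mean_C2(n, theta), c / 2)
```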

6 Functional central limit theorems

As a corollary to Proposition 5.11, we obtain functional central limit theorems that extend the results of Hansen (1990) and Tsukuda (2017b). Before stating them, we give, also as a corollary to Proposition 5.11, error bounds for the Poisson process approximations in terms of the expected error in the supremum norm and in the $L^{2}$ norm.

Corollary 6.1.

In Case A,

[TABLE]

and

[TABLE]

Proof. The result (6.1) follows from

[TABLE]

and Proposition 5.11. The result (6.2) follows from

[TABLE]

(for this evaluation see the proof of Lemma 3.1 of Tsukuda (2017b)) and from Proposition 5.11. This completes the proof.

By using Corollary 6.1, we provide functional central limit theorems which slightly extend the preceding result in which $\theta$ is assumed to be fixed.

Proposition 6.2.

(i) In Case A, if

[TABLE]

then the random process $X^{1}_{n}(\cdot)$ defined in (2.16) converges weakly to a standard Brownian motion $(B(u))_{0\leq u\leq 1}$ in $D[0,1]$.

(ii) In Case A, if

[TABLE]

then both of the random processes $X^{2}_{n}(\cdot)$ and $X^{3}_{n}(\cdot)$ , which are defined in (2.17) and (2.18) respectively, converge weakly to $(B(u)/\sqrt{u})_{0<u<1}$ in $L^{2}(0,1)$ .

Proof. (i) From (6.1) and the assumption (6.3), it follows that

[TABLE]

By using the functional central limit theorem for Poisson processes in $D[0,1]$ , the random process

[TABLE]

converges weakly to a standard Brownian motion $(B(u))_{0\leq u\leq 1}$ in $D[0,1]$ (see the proof of Theorem 5 of Arratia and Tavaré (1992)). Since

[TABLE]

the random process

[TABLE]

converges weakly to $(B(u))_{0\leq u\leq 1}$ in $D[0,1]$ because of the assumption (6.3). From (6.5) and the weak convergence of (6.6), Theorem 2.7 (iv) of van der Vaart (1998) yields the result.

(ii) First we consider $X^{2}_{n}(\cdot)$. From (6.2) and (6.4), it follows that

[TABLE]

It holds that

[TABLE]

where $(N^{1}(t))_{t\geq 0}$ is a homogeneous Poisson process with unit intensity. Since

[TABLE]

and the other hypotheses hold with $\lambda=1$, $s_{n}(u)=\sum_{j=1}^{\lfloor n^{u}\rfloor}\theta/j$ and $f(n)=\theta\log{n}$ (see Subsection 6.2 of Tsukuda (2017b)), Lemma A.4 in the Appendix implies that (6.7) converges weakly to $(B(u)/\sqrt{u})_{0<u<1}$ in $L^{2}(0,1)$. From what has already been proven, Theorem 2.7 (iv) of van der Vaart (1998) yields the result.

Next we consider $X^{3}_{n}(\cdot)$. Since

[TABLE]

which follow, under the assumption (6.4), from almost the same argument as the proof of Theorem 7.1 of Tsukuda (2017b), we have

[TABLE]

From Lemma A.4 in the Appendix with $\lambda=1$, $s_{n}(u)=uf(n)$ and $f(n)=\theta\log{n}$, it holds that the random process

[TABLE]

converges weakly to $(B(u)/\sqrt{u})_{0<u<1}$ in $L^{2}(0,1)$ . Consequently, the desired result follows. This completes the proof.

Remark 14.

It follows from Proposition 6.2 (i) that if (6.3) holds then (2.1) holds. However, as stated in (2.5), the asymptotic normality of $K_{n}$ holds for much larger $\theta$.

The following result, promised in Subsection 2.3, is an extension of Proposition 2.1.

Proposition 6.3.

(i) In Case A, if (6.3) holds then the random process $X^{4}_{n}(\cdot)$ defined in (2.19) converges weakly to a standard Brownian bridge $(B^{\circ}(u))_{0\leq u\leq 1}$ in $D[0,1]$.

(ii) In Case A, if (6.4) holds then the random process $X^{5}_{n}(\cdot)$ defined in (2.20) converges weakly to $(B^{\circ}(u)/\sqrt{u(1-u)})_{0<u<1}$ in $L^{2}(0,1)$.

Proof. (i) Since it holds that

[TABLE]

for any $u\in[0,1]$ , it is sufficient to show

[TABLE]

and

[TABLE]

in $D[0,1]$ . Firstly, (6.8) holds because the assumption (6.3) yields that $\log{\theta}/\log{n}\to 0$ and because it follows from (2.7) that

[TABLE]

Next, we show (6.9). Since it follows from Proposition 5.11 and from the assumption (6.3) (see (6.5)) that

[TABLE]

and that

[TABLE]

the triangle inequality yields that

[TABLE]

where

[TABLE]

By using the functional central limit theorem for Poisson processes in $D[0,1]$ , $P_{4}^{\circ}(\cdot)$ converges weakly to $(B^{\circ}(u))_{0\leq u\leq 1}$ in $D[0,1]$ (see Arratia and Tavaré (1992)).

(ii) For the same reason as in (i), it is sufficient to show (6.8) and

[TABLE]

in $L^{2}(0,1)$ . Firstly, it holds that

[TABLE]

and that

[TABLE]

The right-hand sides of (6.11) and (6.12) converge to 0 in probability because the expectations of their square roots converge to 0 under the assumption (6.4). Secondly, letting $(N^{1}(t))_{t\geq 0}$ be a homogeneous Poisson process with unit intensity, it follows from

[TABLE]

that

[TABLE]

and that

[TABLE]

Thirdly, it holds that

[TABLE]

The distributions of the first and second terms on the right-hand side are equal to

[TABLE]

and

[TABLE]

respectively. Both of them converge to 0 in probability because their expectations tend to 0 under the assumption (6.4). Thus, the triangle inequality yields that

[TABLE]

where

[TABLE]

Since

[TABLE]

Theorem 4 of Tsukuda (2016) yields that $\left(P_{5}^{\circ}(u)\right)_{0<u<1}\Rightarrow\left(B^{\circ}(u)/\sqrt{u}\right)_{0<u<1}$ by setting $H_{s}=1$ with $d=1$ , $\lambda_{s}=1$ and $T=\theta\log{n}$ . Consequently, we have (6.10). This completes the proof.

Appendix A Appendix. Auxiliary lemmas

Lemma A.1.

In Case A, if $\theta^{2}/n\to 0$ then

[TABLE]

Proof. The left-hand side of (A.1) equals

[TABLE]

By using the asymptotic series expansion $\Gamma(x)=\sqrt{2\pi}e^{-x}x^{x-1/2}\{1+1/(12x)\}+O\left(e^{-x}x^{x-5/2}\right)$ as $x\to\infty$ , it holds that

[TABLE]

and that

[TABLE]

Hence, the left-hand side of (A.1) is

[TABLE]

and, from the asymptotic expansion $\left(1+1/x\right)^{x}=e\left(1-1/(2x)+O\left(x^{-2}\right)\right)$ as $x\to\infty$ it follows that

[TABLE]

and hence (A.2) is

[TABLE]

This completes the proof.
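The two expansions used in the proof of Lemma A.1 can be sanity-checked numerically. This is our own sketch, and the function names are hypothetical.

- With the next term of the Stirling series, $\Gamma(x)=\sqrt{2\pi}e^{-x}x^{x-1/2}\{1+1/(12x)+1/(288x^{2})+O(x^{-3})\}$. The relative error of the two-term approximation, multiplied by $x^{2}$, should therefore approach $1/288$.
- Expanding $x\log(1+1/x)=1-1/(2x)+1/(3x^{2})-\cdots$ gives $(1+1/x)^{x}/e=1-1/(2x)+11/(24x^{2})+O(x^{-3})$.

```python
import math


def stirling_rel_error(x: float) -> float:
    """Relative error of sqrt(2 pi) e^{-x} x^{x-1/2} (1 + 1/(12x)) as an approximation of Gamma(x)."""
    log_approx = 0.5 * math.log(2 * math.pi) - x + (x - 0.5) * math.log(x) + math.log1p(1 / (12 * x))
    return math.expm1(math.lgamma(x) - log_approx)


def compound_interest_gap(x: float) -> float:
    """((1 + 1/x)^x / e - (1 - 1/(2x))) * x^2, expected to approach 11/24."""
    ratio = math.exp(x * math.log1p(1 / x) - 1.0)
    return (ratio - (1 - 1 / (2 * x))) * x * x


for x in (20.0, 50.0, 100.0):
    print(x, stirling_rel_error(x) * x * x, 1 / 288)
print(compound_interest_gap(1000.0), 11 / 24)
```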

Lemma A.2.

Let $b$ be a positive integer. For $a>1$, it holds that

[TABLE]

Proof. It holds that

[TABLE]

As for the last term, it holds that

[TABLE]

This completes the proof.

Lemma A.3.

For any $a>0$ and for any positive integer $b$ , $(x-a)_{b}/(x)_{b}$ is increasing with respect to $x>a$ .

Proof. The proof is by induction on $b$ . When $b=1$ , $(x-a)/x=1-a/x$ is increasing. Let $x_{1}$ and $x_{2}$ satisfy $a<x_{1}<x_{2}$ . If the conclusion of the lemma is true for $b$ , then the conclusion is also true for $b+1$ because

[TABLE]

This completes the proof.
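Here is a grid check of Lemma A.3. It is our own sketch, not a proof, and the names are hypothetical. We assume that $(x)_{b}=x(x+1)\cdots(x+b-1)$ denotes the rising factorial, as in $(\theta)_{n}$ of (1.1). The induction step in the proof multiplies by $(x-a+b)/(x+b)=1-a/(x+b)$, which is positive and increasing in $x>a$.

```python
import math


def rising(x: float, b: int) -> float:
    """Rising factorial (x)_b = x (x+1) ... (x+b-1)."""
    return math.prod(x + i for i in range(b))


def ratio(x: float, a: float, b: int) -> float:
    """(x-a)_b / (x)_b, the quantity in Lemma A.3."""
    return rising(x - a, b) / rising(x, b)


def increasing_on_grid(a: float, b: int, upper: float = 50.0, steps: int = 2000) -> bool:
    """True if (x-a)_b/(x)_b is strictly increasing on an evenly spaced grid in (a, upper]."""
    xs = [a + (upper - a) * (i + 1) / steps for i in range(steps)]
    vals = [ratio(x, a, b) for x in xs]
    return all(v1 < v2 for v1, v2 in zip(vals, vals[1:]))


print([increasing_on_grid(a, b) for a in (0.5, 3.0) for b in (1, 2, 5)])
```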

Lemma A.4.

Let $(N_{t})_{t\geq 0}$ be a homogeneous Poisson process with intensity $\lambda>0$ satisfying $N_{0}=0$. Let $(n,u)\mapsto s_{n}(u)$ be a function that is non-decreasing in $0\leq u\leq 1$ and in $n=1,2,\ldots$, and that satisfies $\inf_{u\in(\tau,1)}s_{n}(u)>0$ for all $0<\tau<1$,

[TABLE]

with $n\mapsto f(n)$ an increasing function of $n$ satisfying $\lim_{n\to\infty}f(n)=\infty$ , and

[TABLE]

for some $\delta>0$ . Then, the random process

[TABLE]

converges weakly to a Gaussian process $(B(u)/\sqrt{u})_{0<u<1}$ in $L^{2}(0,1)$ as $n\to\infty$ .

Remark 15.

Lemma A.4 is a slight generalization of Lemma 2.1 of Tsukuda (2017b). The only difference is condition (A.3): the corresponding condition (2.1) of Tsukuda (2017b) is the special case $f(n)=K\log{n}$ with a constant $K$. To prove Lemma A.4, the equation

[TABLE]

in the proof of Lemma 2.1 of Tsukuda (2017b) should be replaced by

[TABLE]

and the rest of the proof needs no change.

Acknowledgements

The author would like to express his heartfelt gratitude to Professor Shuhei Mano for many constructive comments.

Bibliography


Arratia, R., Barbour, A.D., Tavaré, S. (1992). Poisson process approximations for the Ewens sampling formula. Ann. Appl. Probab. 2, no. 3, 519–535.
Arratia, R., Barbour, A.D., Tavaré, S. (2000). Limits of logarithmic combinatorial structures. Ann. Probab. 28, no. 4, 1620–1644.
Arratia, R., Barbour, A.D., Tavaré, S. (2016). Exploiting the Feller coupling for the Ewens sampling formula. Statist. Sci. 31, no. 1, 27–29.
Arratia, R., Stark, D., Tavaré, S. (1995). Total variation asymptotics for Poisson process approximations of logarithmic combinatorial assemblies. Ann. Probab. 23, no. 3, 1347–1388.
Arratia, R., Tavaré, S. (1992). Limit theorems for combinatorial structures via discrete process approximations. Random Structures Algorithms 3, no. 3, 321–345.
Barbour, A.D. (1992). Refined approximations for the Ewens sampling formula. Random Structures Algorithms 3, no. 3, 267–276.
Barbour, A.D., Hall, P. (1984). On the rate of Poisson convergence. Math. Proc. Cambridge Philos. Soc. 95, no. 3, 473–480.
Crane, H. (2016). The ubiquitous Ewens sampling formula. Statist. Sci. 31, no. 1, 1–19.
