Integral Representation of Probabilities in Kingman Coalescent
Youzhou Zhou
Department of Mathematical Science
Xi’an Jiaotong-Liverpool University
111 Renai Road
Suzhou, China 215 123
Abstract.
The Kingman coalescent was first proposed by Kingman [9] in population genetics to describe the genealogical structure of a population. It has since become a benchmark model for coalescent processes and has been studied extensively. In particular, its explicit finite-time distribution was obtained by Tavaré [13]. However, this explicit distribution is rarely used for analysis because it is an intractable infinite series. In this article, we establish a complex integral representation for the finite-time distribution and then analyze it by the steepest descent method to obtain a local central limit theorem in the small-time regime.
Key words and phrases:
Kingman Coalescent, Integral Representation, Local Central Limit Theorem
2010 Mathematics Subject Classification:
Primary 60F10; secondary 60C05
This research is supported by NSFC: 11701570.
1. Introduction
The Kingman coalescent is the first coalescent model in population genetics. It was proposed by Kingman [9] to describe the genealogical structure of a sample, and it has since found extensive applications in biology. There are also many generalizations of the Kingman coalescent, such as the coalescent in [12]. For recent developments in coalescent processes, see [3]. All these coalescent processes are in dual correspondence with evolutionary dynamics: observing the population forward in time gives an evolution process, whereas looking at the population backward in time gives a coalescent. Coalescing structures are ubiquitous in nature; see [2] for a review of mathematical coalescent models in physics and chemistry. Coalescent processes can also be treated in different mathematical settings. For instance, the Kingman coalescent is a Markov chain, but it can also be regarded as a random metric space, as done by Evans in [6]. In this article, we regard the Kingman coalescent only as a Markov chain.
To describe genealogical structure, one can use an equivalence relation defined according to whether two individuals share common parents in a previous generation. Each equivalence relation generates a partition of the sample. Therefore, it is natural to use partitions of the sample to represent its genealogical structures. Let be a sample of size . Denote by the set of all partitions of this sample. The Kingman coalescent is defined to be a continuous-time Markov chain taking values in . If we only consider the number of blocks of the partition, we get an integer-valued pure death process, called the block-counting process and denoted by . The transition rate of is
[TABLE]
where is the mutation rate. As , converges to a pure death chain whose initial state is . The finite-time probability mass function of is . These probabilities serve as the convex coefficients in the explicit transition functions of the Fleming-Viot process with parent-independent mutation [5] and of the infinitely-many-neutral-alleles model [14]. In 1984, Tavaré obtained the following explicit expression of in [13]
[TABLE]
In the 30 years since, this expression has only been used to derive ergodic inequalities for the Fleming-Viot process with parent-independent mutation and for the infinitely-many-neutral-alleles model.
One distinctive feature of the Kingman coalescent is that it has an entrance boundary , a property also known as coming down from infinity. The process therefore jumps to a finite state immediately, and many details are hidden in the small-time regime. There are many studies of the small-time asymptotics of the Kingman coalescent [11], [10], [7], [4]. However, none of them uses the explicit expression of . One possible reason is that the expression of is an intractable series, and probabilistic tools such as martingales are already powerful enough to handle many problems about the Kingman coalescent. But to derive fine asymptotic results, it is usually more promising to work with the explicit distribution .
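To see concretely how the series behaves, the following Python sketch evaluates it numerically. It assumes the form of Tavaré's series commonly quoted in the literature, d_n(t) = Σ_{k≥n} (-1)^{k-n} (2k+θ-1) (n+θ)_{(k-1)} / (n! (k-n)!) · exp(-k(k+θ-1)t/2) for n ≥ 1, where (a)_{(m)} is the rising factorial; this may differ in notation from the expression above. The function name `tavare_pmf` and the truncation parameter `terms` are illustrative choices, not part of the paper.

```python
import math

def tavare_pmf(n, t, theta=0.0, terms=200):
    """Truncated evaluation of the assumed series for P(block count = n at time t), n >= 1."""
    total = 0.0
    for k in range(n, n + terms):
        rate = 0.5 * k * (k + theta - 1.0)
        # log of (2k+theta-1) * (n+theta)_{(k-1)} / (n! (k-n)!), via log-gamma to avoid overflow
        log_coef = (math.log(2 * k + theta - 1.0)
                    + math.lgamma(n + theta + k - 1.0) - math.lgamma(n + theta)
                    - math.lgamma(n + 1.0) - math.lgamma(k - n + 1.0))
        term = math.exp(log_coef - rate * t)
        # alternating sign (-1)^(k-n)
        total += term if (k - n) % 2 == 0 else -term
    return total

if __name__ == "__main__":
    t = 1.0
    probs = [tavare_pmf(n, t) for n in range(1, 41)]
    print("sum over n:", sum(probs))
    print("first few probabilities:", probs[:5])
```

With θ = 0 and t = 1 the values over n ≥ 1 add up to 1 up to rounding. For much smaller t the individual terms become very large and alternate in sign, so double precision arithmetic loses accuracy, which is one practical face of the intractability of the series.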
In this article, we find an integral representation for the probabilities . Integral representations are usually more suitable for asymptotic analysis. The idea is to replace the combinatorial terms and by integrals; the Cauchy integral formula and the Fourier transform are tailor-made tools for this job. One then ends up with an integral whose integrand is essentially a geometric series, and a simplified integral is easily obtained by summing this series. Where possible, a careful residue computation simplifies the integral further. Once we have an integral representation, we deform the contour and use the steepest descent method to obtain the asymptotic behavior. As an application, we obtain a local central limit theorem. In [11] and [10], Limic and Talarczyk obtained functional fluctuation theorems in the small-time regime; the local central limit theorem, however, is new. In [4], Depperschmidt, Pfaffelhuber and Scheuringer discussed small-time large deviations for the Kingman coalescent. We believe the integral representation in this article can also be used to obtain both large deviations and moderate deviations.
This article is organized as follows. In Section 2, we state the integral representation in Theorem 2.1 and prove it. This integral representation is essentially the same as (2.16) in [8]. In Section 3, we present an alternative integral representation in Theorem 3.1, which is more suitable for asymptotic analysis, and then use the steepest descent method to obtain the local central limit theorem. This alternative integral representation is new. Proofs of some lemmas are deferred to Section 4. Finally, we give some remarks in Section 5.
In this article, the notation “” appears many times. We say if . The notation for constants also appears a few times; whenever it appears, it denotes a constant independent of . Also, is reserved for the imaginary unit and .
2. Integral Representation for
In this section, we derive integral representations for the probabilities of the Kingman coalescent. Generally speaking, an integral representation is more tractable than the series expression if one wants to use directly to study the small-time asymptotics of the Kingman coalescent.
In the series expression of , there are combinatorial coefficients and . One can express them as complex integrals over specifically chosen contours; the series expansion then becomes a complex integral whose integrand is essentially a geometric series, which is easily summed to get a simplified integrand. To rewrite the combinatorial coefficients as contour integrals, one only needs the Cauchy integral formula. Because is of Gaussian type, one can use the Fourier transform to get its integral form. Finally, a careful residue calculation gives the following integral representation.
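The two tools named in this paragraph can be checked numerically on generic examples. The Python sketch below is only an illustration under stated assumptions: it uses the textbook identities C(n, k) = (1/(2πi)) ∮ (1+z)^n z^{-(k+1)} dz on a circle around the origin, and exp(-t x²/2) = (2πt)^{-1/2} ∫ exp(i x y - y²/(2t)) dy. These are not the specific coefficients or contours of the proof, and the function names, radius, and grid sizes are hypothetical choices.

```python
import cmath
import math

def binom_by_contour(n, k, radius=0.5, points=256):
    """C(n, k) from the Cauchy integral formula, using the trapezoid rule on |z| = radius."""
    total = 0j
    for j in range(points):
        z = radius * cmath.exp(2j * math.pi * j / points)
        # with z = r*e^{i*theta}, dz = i*z*dtheta, so dz/(2*pi*i*z^(k+1)) becomes dtheta/(2*pi*z^k)
        total += (1 + z) ** n / z ** k
    return (total / points).real

def gaussian_by_fourier(x, t, half_width=12.0, points=4000):
    """exp(-t*x^2/2) recovered as a Fourier integral of a Gaussian in y (trapezoid rule)."""
    scale = math.sqrt(t)
    lower = -half_width * scale
    h = 2.0 * half_width * scale / points
    total = 0j
    for j in range(points + 1):
        y = lower + j * h
        weight = 0.5 if j in (0, points) else 1.0
        total += weight * cmath.exp(1j * x * y - y * y / (2.0 * t))
    return (total * h / math.sqrt(2.0 * math.pi * t)).real

if __name__ == "__main__":
    print(binom_by_contour(10, 3), math.comb(10, 3))
    print(gaussian_by_fourier(3.0, 0.5), math.exp(-0.5 * 9.0 / 2.0))
```

Once each factor is written as an integral of a pure power or exponential in the summation index, the sum over that index inside the integral is geometric, which is what makes the closed-form summation possible.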
Theorem 2.1.
For , has the following integral representation
[TABLE]
where and is defined to be the principal branch of the power function .
Proof.
First, by simple algebraic calculation, we have
[TABLE]
and also
[TABLE]
By Fourier transform, we have
[TABLE]
where . Moreover, by the Cauchy integral formula, one has
[TABLE]
where is a small circle centered at with radius and is a circle with center and radius . These contours are chosen to guarantee the uniform convergence of the upcoming series.
Replacing the combinatorial coefficients and by their integral forms in , one has
[TABLE]
The specifically chosen contours guarantee that
[TABLE]
Thus, it is safe to switch summation and integration. After summing up the geometric series, one has
[TABLE]
Now we first integrate out , then
[TABLE]
where are residues at and and
[TABLE]
Then
[TABLE]
One can show that
[TABLE]
because the maximum degree of the numerator is and the degree of the denominator is . Moreover,
[TABLE]
where the contour integral over equals only the residue at , since is outside of . Thus,
[TABLE]
Consider the substitution ; then one has
[TABLE]
where . But one can show that can be any positive number because for
[TABLE]
The usual argument for the Fourier transform shows that one can remove the restriction on . ∎
3. Local Central Limit Theorem
It is known that (see [3]). To establish the central limit theorem, one needs to consider the asymptotic behavior of . However, the integral representation in Theorem 2.1 is not suitable for asymptotic analysis, so we derive an alternative integral representation through contour deformation.
Theorem 3.1.
For , has the following integral representation
[TABLE]
where is a unit circle,
[TABLE]
[TABLE]
and
[TABLE]
Proof.
Let
[TABLE]
One can easily show that . Therefore,
[TABLE]
Then
[TABLE]
where the contour consists of two parallel horizontal lines. One can easily deform the contour to the contours in Figure 1.
Although is not analytic on the segments , which are cut lines of , the symmetry of ensures that the integrals along the cut lines cancel exactly.
Therefore,
[TABLE]
where is chosen to be a circle centered at with radius and is a unit circle. Therefore,
[TABLE]
where is evaluated as principal branch. If we denote
[TABLE]
then
[TABLE]
The function is in fact a doubly periodic function with period in and in ; it is the Jacobi theta function. In what follows, we need the Jacobi triple product identity (see [1]).
Lemma 3.1.
For , we have
[TABLE]
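As a numerical sanity check of the identity invoked in Lemma 3.1, the Python sketch below compares the two sides of the Jacobi triple product in one standard form, Σ_{n∈Z} q^{n²} z^{2n} = Π_{m≥1} (1 - q^{2m})(1 + q^{2m-1} z²)(1 + q^{2m-1} z^{-2}), valid for |q| < 1 and z ≠ 0. The parametrization used in the lemma may differ from this one; the function names and truncation levels are illustrative assumptions.

```python
def theta_sum(q, z, n_max=40):
    """Left-hand side: truncated bilateral sum of q^(n^2) * z^(2n)."""
    return sum(q ** (n * n) * z ** (2 * n) for n in range(-n_max, n_max + 1))

def triple_product(q, z, m_max=80):
    """Right-hand side: truncated infinite product."""
    prod = 1.0 + 0j
    z2 = z * z
    for m in range(1, m_max + 1):
        odd_power = q ** (2 * m - 1)
        prod *= (1 - q ** (2 * m)) * (1 + odd_power * z2) * (1 + odd_power / z2)
    return prod

if __name__ == "__main__":
    q, z = 0.3, 0.7 + 0.4j
    print(theta_sum(q, z))
    print(triple_product(q, z))
```

The product form is what makes the logarithm of the theta function a sum of simple terms, which is convenient when looking for critical points of the exponent.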
One can rewrite as
[TABLE]
where
[TABLE]
Then by Lemma 3.1, we have
[TABLE]
Note that
[TABLE]
then
[TABLE]
If we denote
[TABLE]
then our theorem is proved. ∎
3.1. Steep Descent Contour for Asymptotic Analysis
We use the steepest descent method for the asymptotic analysis, so we need to find a steep descent path. If we denote
[TABLE]
one can easily show that is an odd function, i.e. . Note that is an even function. Then
[TABLE]
where is the upper half circle with counterclockwise orientation. Hence
[TABLE]
Now we consider two paths (Figure 2)
[TABLE]
[TABLE]
These two paths and serve as steep descent paths for the integrals and respectively, though neither of them is a steepest descent contour. Nevertheless, they are good enough for our analysis. Indeed, we can rewrite as
[TABLE]
where
[TABLE]
similarly,
[TABLE]
Since has one solution , is a critical point of , and is a critical point of . Moreover, because on ,
[TABLE]
one can easily check that
[TABLE]
therefore is decreasing as increases. Likewise,
[TABLE]
and is also decreasing as increases. So and are both decreasing as moves away from along and from along respectively. One can also show that the integrals over the remaining vertical paths on and are negligible. Thus, the major contribution comes from neighborhoods of and , and and serve as steep descent contours for and respectively. Also, and are in one-to-one correspondence.
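The saddle-point mechanism described in this subsection can be seen on a classical toy integral, unrelated to the paper's integrand: 1/n! = (1/(2πi)) ∮ e^z z^{-(n+1)} dz. The exponent z - n log z has a critical point at z = n, the circle |z| = n passes through it, and the Gaussian approximation around that point gives e^n n^{-n} / sqrt(2πn). The Python sketch below compares the exact contour integral with this leading-order approximation; all names and the grid size are illustrative assumptions.

```python
import cmath
import math

def inverse_factorial_by_contour(n, points=512):
    """1/n! as a contour integral over |z| = n, evaluated with the trapezoid rule."""
    total = 0j
    for j in range(points):
        z = n * cmath.exp(2j * math.pi * j / points)
        # integrand after dz = i*z*dtheta: exp(z) * z^(-n) = exp(z - n*log z)
        total += cmath.exp(z - n * cmath.log(z))
    return (total / points).real

def inverse_factorial_by_saddle(n):
    """Leading-order saddle-point approximation: value at z = n times the Gaussian width."""
    return math.exp(n - n * math.log(n)) / math.sqrt(2.0 * math.pi * n)

if __name__ == "__main__":
    n = 20
    exact = 1.0 / math.factorial(n)
    print("contour / exact:", inverse_factorial_by_contour(n) / exact)
    print("saddle  / exact:", inverse_factorial_by_saddle(n) / exact)
```

The circle here is not the exact steepest descent path either, yet the modulus of the integrand decreases away from z = n along it, which is the same "steep but not steepest" situation as for the two paths above.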
3.2. Contour Deformation to Steep Descent Contour
To deform the upper half unit circle to the steep descent contour, we first deform it to a big upper half circle with radius . Though and are not analytic on , the deformation still goes through because the integrand is odd. Therefore,
[TABLE]
where and have opposite orientations. Let and be two discs with centers and and radius . Denote by the segment within and by the segment outside .
Lemma 3.2.
There exists , such that
[TABLE]
where are two constants independent of .
The major contribution comes from the integration over . To pinpoint the above limits, we need to expand on . Recall that
[TABLE]
where is the Bernoulli number (see [1] for the above expansion). We consider the following parametrizations:
[TABLE]
Then one can show the following lemma.
Lemma 3.3.
As , we have
[TABLE]
Thus, we have our main theorem.
Theorem 3.2.
Let . As , we have
[TABLE]
Proof.
By Theorem 3.1, we know
[TABLE]
where
[TABLE]
By Stirling’s formula, we know as
[TABLE]
By Lemma 3.2, Lemma 3.3 and equation (1), one can see that
[TABLE]
Therefore,
[TABLE]
∎
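A quick Monte Carlo experiment gives an empirical view of the small-time Gaussian behavior. The Python sketch below simulates the block-counting chain without mutation, with death rate n(n-1)/2 in state n, and compares the sample mean and variance at a small time t with the values 2/t and 2/(3t) that are standard in the small-time literature; these normalizations are assumed here rather than taken from the statement of Theorem 3.2. Coming down from infinity is approximated by starting at `start` blocks after the deterministic time 2/start, which is the expected time to reach that state; this truncation, the function names, and the sample sizes are all illustrative choices.

```python
import random

def sample_block_count(t, start=400, rng=random):
    """One draw of the number of blocks at time t (approximate entrance from infinity)."""
    if t <= 2.0 / start:
        raise ValueError("t must exceed 2/start for this truncation")
    elapsed = 2.0 / start  # expected time to come down from infinity to `start` blocks
    n = start
    while n > 1:
        # holding time in state n is exponential with rate n(n-1)/2
        elapsed += rng.expovariate(n * (n - 1) / 2.0)
        if elapsed > t:
            break
        n -= 1
    return n

def empirical_moments(t, samples=3000, seed=0):
    """Sample mean and unbiased sample variance of the block count at time t."""
    rng = random.Random(seed)
    draws = [sample_block_count(t, rng=rng) for _ in range(samples)]
    mean = sum(draws) / samples
    var = sum((x - mean) ** 2 for x in draws) / (samples - 1)
    return mean, var

if __name__ == "__main__":
    t = 0.05
    mean, var = empirical_moments(t)
    print("sample mean:", mean, " reference 2/t:", 2.0 / t)
    print("sample variance:", var, " reference 2/(3t):", 2.0 / (3.0 * t))
```

A simulation of this kind only probes the central limit scale; the pointwise statement of a local limit theorem requires the finer analysis carried out above.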
4. Proofs of Lemma 3.2 and Lemma 3.3
4.1. Proof of Lemma 3.2
Proof.
The path has two parts
[TABLE]
On ,
[TABLE]
where . Then
[TABLE]
By the monotonicity of , we know that
[TABLE]
and Thus,
[TABLE]
So
[TABLE]
and
[TABLE]
On part , we know , and is decreasing in , then
[TABLE]
It is not difficult to show that as
[TABLE]
Therefore, there exists such that
[TABLE]
Moreover, ; then by the inequality , one can show that
[TABLE]
Thus,
[TABLE]
So
[TABLE]
∎
4.2. Proof of Lemma 3.3
Proof.
Consider the following parametrization:
[TABLE]
Then by the Taylor expansion of in (2), one can show that there exists such that
[TABLE]
[TABLE]
where and is independent of . Thus, if we denote
[TABLE]
[TABLE]
then it is not difficult to show that as the following results hold uniformly for
[TABLE]
Therefore,
[TABLE]
where
[TABLE]
One can easily show that
[TABLE]
By the uniform convergence in (3), we know .
Because , we have and . By Lebesgue’s dominated convergence theorem, we can show that
[TABLE]
Then by the Fourier transform, we have
[TABLE]
So
[TABLE]
∎
5. Concluding Remarks
The integral representation of the probabilities has other possible applications, such as large deviations and moderate deviations for in the small-time regime. In the proof of the local central limit theorem, we use a steep descent path, which is not the steepest descent path. The steepest descent path is given by ; on any other path, the oscillation of the imaginary part produces many cancellations, which eventually prevents precise estimation. For the proof of the local central limit theorem, however, our path is good enough. If one can carefully handle (3), it is possible to obtain an asymptotic expansion.
References
- [1] Milton Abramowitz and Irene A. Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York, ninth Dover printing, tenth GPO printing edition, 1964.
- [2] David J. Aldous. Deterministic and stochastic models for coalescence (aggregation and coagulation): a review of the mean-field theory for probabilists. Bernoulli, 5(1):3–48, 1999.
- [3] Nathanaël Berestycki. Recent progress in coalescent theory, volume 16 of Ensaios Matemáticos [Mathematical Surveys]. Sociedade Brasileira de Matemática, Rio de Janeiro, 2009.
- [4] Andrej Depperschmidt, Peter Pfaffelhuber, and Annika Scheuringer. Some large deviations in Kingman’s coalescent. Electron. Commun. Probab., 20, no. 7:1–14, 2015.
- [5] S. N. Ethier and R. C. Griffiths. The transition function of a Fleming-Viot process. Ann. Probab., 21(3):1571–1590, 1993.
- [6] Steven N. Evans. Kingman’s coalescent as a random metric space. In Stochastic models (Ottawa, ON, 1998), volume 26 of CMS Conf. Proc., pages 105–114. Amer. Math. Soc., Providence, RI, 2000.
- [7] R. C. Griffiths. Asymptotic line-of-descent distributions. J. Math. Biol., 21(1):67–75, 1984.
- [8] R. C. Griffiths. Coalescent lineage distributions. Adv. Appl. Prob., 38:405–429, 2006.
