When Can The Discrete Moran Process May Be Replaced By Wright-Fisher Diffusion?
Gorgui Gackou, A Guillin, Arnaud Personne (UCA)

TL;DR
This paper quantifies the error when approximating the discrete Moran process with the Wright-Fisher diffusion in population genetics, especially under weak selection and immigration, extending to Markovian processes.
Contribution
It provides a quantitative large population limit of the approximation error, including cases with Markovian selection and immigration processes.
Findings
Error bounds for diffusion approximation under weak selection and immigration
Extension to Markovian processes with jump or diffusion limits
Robust approach applicable to various population dynamics
Abstract
The Moran discrete process and the Wright-Fisher model are the most popular models in population genetics. It is common to understand the dynamics of these models to use an approximating diffusion process, called Wright-Fisher diffusion. Here, we give a quantitative large population limit of the error committed by using the approximation diffusion in the presence of weak selection and weak immigration in one dimension. The approach is robust enough to consider the case where selection and immigration are Markovian processes, with limits jump or diffusion processes.
Taxonomy
Topics: Evolution and Genetic Dynamics · Mathematical and Theoretical Epidemiology and Ecology Models · Stochastic processes and statistical mechanics
When can the discrete Moran process may be replaced by Wright-Fisher diffusion?
Gorgui GACKOU ♢, Laboratoire de Mathématiques Blaise Pascal, CNRS UMR 6620, Université Clermont-Auvergne
Arnaud GUILLIN ♢, Laboratoire de Mathématiques Blaise Pascal, CNRS UMR 6620, Université Clermont-Auvergne, avenue des Landais, F-63177 Aubière
Arnaud PERSONNE ♢, Laboratoire de Mathématiques Blaise Pascal, CNRS UMR 6620, Université Clermont-Auvergne, avenue des Landais, F-63177 Aubière
Abstract.
The Moran discrete process and the Wright-Fisher model are the most popular models in population genetics. It is common to understand the dynamics of these models to use an approximating diffusion process, called Wright-Fisher diffusion. Here, we give a quantitative large population limit of the error committed by using the approximation diffusion in the presence of weak selection and weak immigration in one dimension. The approach is robust enough to consider the case where selection and immigration are Markovian processes, with limits jump or diffusion processes.
♢ Université Clermont-Auvergne
1. Introduction
The diffusion approximation is a technique in which a complicated and intractable (as the dimension increases) discrete Markovian process is replaced by an appropriate diffusion which is generally easier to study. This technique is used in many domains, and genetics and population dynamics are no exceptions to the rule. Two of the main models used in population dynamics are the Wright-Fisher (see for example [14],[15],[23],[24]) and the Moran [20] models, which describe the evolution of a population having a constant size and subject to immigration and environmental variations. For large populations, it is well known that the Moran process is quite difficult to handle mathematically and numerically. For example, the convergence to equilibrium (independent of the population size) or the estimation of various biodiversity indices such as the Simpson index are not known. It is thus tempting to approximate the dynamics of these Markovian processes by a diffusion, called the Wright-Fisher diffusion, see for example [6], [9] or [19], and to work on this simpler (low dimensional) process to get good quantitative properties.
A traditional way to prove this result is to consider a martingale problem, as developed by Stroock and Varadhan in [25]; see also [4], [9] and [7], for example, for the Wright-Fisher process with selection but without rates. This technique ensures that the discrete process converges to the diffusion when the size of the population grows to infinity. Although the setting is very general and truly efficient, it is usually not quantitative, as it does not give any order of the error committed in replacing the discrete process by the diffusion for a fixed population size. To obtain an estimate of this error we will consider another approach, by Ethier-Norman in [22] (or [8]), which gives a quantitative statement of the convergence of the generator, relying heavily on the properties of the diffusion limit. For the Wright-Fisher model with immigration but without selection, they showed that the error is of the order of the inverse of the population size, and uniform in time. Our main goal here is to consider the more general model where 1) weak selection is involved; 2) immigration and selection may also be Markov processes. Including selection, constant or random, is of course fundamental for modelling; see for example [12], [5], [18], [3], [16], [1], [2] for recent references. Also, to study biodiversity, a common index is the Simpson index, which is intractable in non-neutral models (see [11] or [10] in the neutral case), and is not even easy to approximate via Monte Carlo simulation when the population is large. Based on the Wright-Fisher diffusion, an efficient approximation procedure has been introduced in [17]. It is thus crucial to obtain quantitative diffusion approximation results in the case of random selection, in order to get a full approximation procedure for this biodiversity index.
Let us give the plan of this note. First, in Section 2, we present the Moran model. As an introduction to the method, we first consider in Section 3 the case of a constant selection, for which we find an error of the same order but growing exponentially or linearly in time. Sections 4 and 5 are concerned with the case of a random environment. Section 4 considers the case when the limit of the selection is a pure jump process and Section 5 the case when it is a diffusion process. We will indicate the main modifications of the previous proof needed to adapt to this setting. An appendix indicates how to adapt the preceding proofs to the case of the Wright-Fisher discrete process.
2. The Moran model and its approximation diffusion.
Consider, to simplify, a population of J individuals with only two species. Note that there are no other difficulties than tedious calculations in considering a finite number of species. At each time step, one individual dies and is replaced by one member of the community or a member of a distinct (infinite) pool. To make this mechanism of evolution precise, let us introduce the following parameters:
- the immigration probability, i.e. the probability that the next member of the population comes from the exterior pool;
- the proportion of the first species in the pool;
- the selection parameter, which favors one of the two species.
Let us first consider that , and are functions depending on time (but not random to simplify) and taking values in for the first two and in for the selection parameter.
Note that this process may also be described considering mutation, rather than immigration but there is a one to one relation between these two interpretations. Our time horizon will be denoted by .
Rather than considering the process following the number of elements in each species, we will study the proportion in the population of the first species. To do so, let , and we denote for all in , the bounded functions on ,
[TABLE]
Consider also the supremum norm of .
Let , with values in , be the proportion of individuals of the first species in the community.
In this section, is thus the Moran process, namely a Markov process evolving with the following transition probabilities: denote
[TABLE]
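The transition table above did not survive extraction. As a rough sketch only, the following Python assumes one common parametrization of the two-species Moran step with immigration and selection. The function names and the symbols `m`, `p`, `s`, `J` are illustrative assumptions, not necessarily the notation or the exact probabilities used in the paper.

```python
import random


def moran_probs(x, m, p, s):
    """Assumed probabilities that the proportion x moves up or down by 1/J.

    m: immigration probability, p: pool proportion of species 1,
    s: selection parameter (needs s > -1). All are assumptions.
    """
    # Probability that the replacing individual belongs to species 1:
    # it comes from the pool with probability m, otherwise from a local
    # parent chosen with weight (1 + s) for species 1.
    birth1 = m * p + (1.0 - m) * x * (1.0 + s) / (1.0 + s * x)
    up = (1.0 - x) * birth1        # species 2 dies, species 1 replaces it
    down = x * (1.0 - birth1)      # species 1 dies, species 2 replaces it
    return up, down


def moran_step(k, J, m, p, s, rng):
    """One death/replacement event; k is the count of species 1."""
    up, down = moran_probs(k / J, m, p, s)
    u = rng.random()
    if u < up:
        return k + 1
    if u < up + down:
        return k - 1
    return k


def moran_path(k0, J, m, p, s, n_steps, seed=0):
    """Proportion of species 1 after n_steps events."""
    rng = random.Random(seed)
    k = k0
    for _ in range(n_steps):
        k = moran_step(k, J, m, p, s, rng)
    return k / J
```

Under this assumed form the chain cannot leave the grid: the downward probability vanishes at proportion 0 and the upward one at proportion 1.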
To study the dynamical properties of this process, a convenient method, developed first by Fisher [14], [15] and then Wright [23], [24], aims at approximating this discrete model by a diffusion when the size of the population tends to infinity.
In the special case of the Moran model with weak selection and weak immigration, meaning that the parameters and are inversely proportional to the population size , we usually use the process taking values in defined by the following generator:
[TABLE]
Note that, in weak selection and immigration, and , so the process defined by does not depend on . Its generator is
[TABLE]
or equivalently by the stochastic differential equation
[TABLE]
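The stochastic differential equation itself is not recoverable from the extracted text. As an illustration, the sketch below applies an Euler-Maruyama scheme to a standard Wright-Fisher diffusion with rescaled immigration `m1` and selection `s1`, assuming the drift m1 (p - x) + s1 x (1 - x) and the diffusion coefficient sqrt(2 x (1 - x)). Both the coefficients and the names are assumptions made for this example.

```python
import math
import random


def wf_drift(x, m1, p, s1):
    """Assumed drift: immigration pulls x towards p, selection adds s1*x*(1-x)."""
    return m1 * (p - x) + s1 * x * (1.0 - x)


def wf_euler_path(x0, m1, p, s1, t_end, dt, seed=0):
    """Euler-Maruyama approximation of the assumed diffusion at time t_end."""
    rng = random.Random(seed)
    x = x0
    n = int(round(t_end / dt))
    for _ in range(n):
        noise = math.sqrt(2.0 * x * (1.0 - x) * dt) * rng.gauss(0.0, 1.0)
        x = x + wf_drift(x, m1, p, s1) * dt + noise
        # Euler steps may leave [0, 1]; clipping is a crude fix that adds
        # its own bias near the boundary.
        x = min(1.0, max(0.0, x))
    return x
```

The clipping step is a design shortcut: it keeps the square root well defined but is not part of the diffusion, so results close to 0 or 1 should be read with caution.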
Our aim is to find for sufficiently regular test function, say , an estimation of :
[TABLE]
for and for all in . By replacing by we thus get :
[TABLE]
for , and .
So equivalently it is convenient to study, if we note :
[TABLE]
on ,and .
3. Estimate of the error in the approximation diffusion for constant weak immigration and selection
3.1. Main result
We now give our main result in the case where immigration and selection are constant. It furnishes an estimation of the error committed during the convergence of the discrete Moran process toward the Wright-Fisher diffusion process .
Theorem 1.
Let us consider the weak immigration and selection case, so that and for some , ( large enough). Let . Let then there exist positive and (depending on and ) such that:
[TABLE]
If we suppose moreover that then there exists such that
[TABLE]
Remark 1.
By considering then and we recover the uniform-in-time approximation diffusion with speed . Our method of proof, which requires the control of some Feynman-Kac formula based on the limiting process, seems limited to non-uniform-in-time results. Our hope is that we may get weaker conditions than to get linear-in-time estimates. Another possibility is to combine these time-dependent approximations with the known ergodicity of the Wright-Fisher process, as in Norman [21].
Remark 2.
We have considered to simplify and but one may generalize a little bit the condition to locally bounded and such that and .
Remark 3.
Such an approximation error is notably useful for polynomial test functions, so that we may for example consider the Simpson index of the Moran process, see [17] for further details.
Remark 4.
The following figures show that the obtained rate is of the right order.
[Figure omitted. Caption: It shows that our rate may be the right one.]
3.2. Proof
The proof relies on three ingredients:
(1) a "telescopic" decomposition of the error;
(2) a quantitative estimate of the error at time 1 of the approximation of the Moran process by the diffusion;
(3) quantitative control of the regularity of the Wright-Fisher process.
Note also that in the sequel we will not distinguish between functions on and their restrictions to .
Let be defined on (the space of bounded functions on ) by :
[TABLE]
As is usual verifies for all in the semigroup property, namely that .
Let be the operator defined on the space of bounded continuous function by :
[TABLE]
It also defines a semigroup ,
Thanks to these properties, we have
[TABLE]
and as by triangular inequality, we get , .
[TABLE]
We have two main terms to analyze : for a "one-step" difference between the Moran process and the Wright-Fisher diffusion process, and for which we need regularity estimates.
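The telescopic decomposition used here is an instance of a purely algebraic identity: for two operators A and B, A^n - B^n equals the sum over k from 0 to n-1 of A^(n-1-k) (A - B) B^k. The sketch below checks it numerically on small stochastic matrices standing in for the one-step Moran operator and the diffusion semigroup over one time step. The matrices and helper names are made up for the illustration.

```python
def mat_vec(a, v):
    """Apply the matrix a (list of rows) to the vector v."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def power_apply(a, v, n):
    """Return a^n v by applying a to v n times."""
    for _ in range(n):
        v = mat_vec(a, v)
    return v


def telescoping_sum(a, b, f, n):
    """Sum over k of a^(n-1-k) (a - b) b^k f, which equals a^n f - b^n f."""
    total = [0.0] * len(f)
    for k in range(n):
        g = power_apply(b, f, k)                       # b^k f
        one_step = [u - w for u, w in zip(mat_vec(a, g), mat_vec(b, g))]
        term = power_apply(a, one_step, n - k - 1)     # a^(n-1-k) (a - b) b^k f
        total = [t + y for t, y in zip(total, term)]
    return total
```

Because a stochastic matrix is a contraction for the supremum norm, each term of the sum is bounded by the norm of the one-step difference applied to b^k f, which is the reason the one-step error and the regularity of the limiting semigroup are the two quantities to control.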
Control of
Let us first study, for regular enough , . The main goal is to obtain the Taylor expansion of this function when is big enough.
Lemma 1.
When is big enough, i.e. , there exists such that
[TABLE]
Proof.
Let us begin by consideration on the Wright-Fisher diffusion process. Remark first, as usual for this diffusion process
[TABLE]
The Chapman-Kolmogorov backward equation reads
[TABLE]
and more generally if is enough regular, for in it is possible to define as:
[TABLE]
For this proof, we only need to go to the fourth order in . So let (possibly depending on ); using Taylor's theorem for , there exists , independent of , such that:
[TABLE]
By direct calculations, we have for the successive
[TABLE]
where the derivative of . Remark now that by our assumption on the boundedness of the successive derivatives of that there exists (depending also on )
[TABLE]
Thus, in the following this term could be neglected.
Let us now look at the Moran process and so get estimates on . The quantity is at least of the order of and when goes to infinity, goes to [math]. So using Taylor’s theorem, there exists such that :
[TABLE]
and thus
[TABLE]
Direct estimates (even if tedious) on the centered moments of the Moran process give
[TABLE]
where is a constant (independent of ).
We may then consider through (3) and (2) so that there exists a constant such that:
[TABLE]
with
[TABLE]
As selection and immigration are weak, we easily conclude that and are at most of the order of . ∎
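To get a feeling for the one-step estimate, one can compare the rescaled one-step increment of a Moran chain with the generator of the limiting diffusion on a smooth test function. The sketch below does this for an assumed parametrization (immigration `m1 / J`, selection `s1 / J`, time rescaled by J squared, limiting generator x (1 - x) f'' + [m1 (p - x) + s1 x (1 - x)] f'). These formulas are assumptions for the example and may differ from the paper's exact conventions.

```python
def moran_probs(x, m, p, s):
    """Assumed up/down probabilities of the Moran step (see lead-in)."""
    birth1 = m * p + (1.0 - m) * x * (1.0 + s) / (1.0 + s * x)
    return (1.0 - x) * birth1, x * (1.0 - birth1)


def one_step_gap(x, J, m1, p, s1, f, df, d2f):
    """|J^2 (E_x[f(X_1)] - f(x)) - L f(x)| under the assumed model."""
    up, down = moran_probs(x, m1 / J, p, s1 / J)
    h = 1.0 / J
    # Rescaled expected increment of f over one Moran event.
    moran = J * J * (up * (f(x + h) - f(x)) + down * (f(x - h) - f(x)))
    # Assumed limiting generator applied to f.
    drift = m1 * (p - x) + s1 * x * (1.0 - x)
    limit = x * (1.0 - x) * d2f(x) + drift * df(x)
    return abs(moran - limit)


def square(x):
    return x * x


gap_small = one_step_gap(0.3, 100, 1.0, 0.5, 2.0, square, lambda x: 2.0 * x, lambda x: 2.0)
gap_large = one_step_gap(0.3, 1000, 1.0, 0.5, 2.0, square, lambda x: 2.0 * x, lambda x: 2.0)
```

With these values the gap shrinks roughly tenfold when J goes from 100 to 1000, which is consistent with a one-step error of order 1/J after rescaling.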
Regularity estimates on
We have now to prove regularity estimates on . By [6, Th.1], we have that for all . Assume for now that and , , there are and independent of such that:
[TABLE]
with .
Let us see how to conclude if (8) is verified. First, there exists a continuous (time dependent) function independent of such that:
[TABLE]
because if is big enough,
[TABLE]
and is independent of because is of the order of . And so with $q(t)=\max_{j\in\{1,2\}}\big(\|\gamma_{j}\|\tilde{R}_{j}J^{3}\big)$, we obtain the result:
[TABLE]
This concludes the proof in the first case. Indeed, we easily see that the function is exponential in time in the general case. We will see later how, when some additional conditions are added on and , one may obtain a linear in time function.
We will now prove the crucial (8). It will be done through the following proposition.
Proposition 1.
Let , and . Assume then and for , , there are and independent of such that
Proof.
First remark that the Chapman-Kolmogorov backward equation may be written :
[TABLE]
The following lemma gives the equations verified by :
Lemma 2.
Let be the derivative of with respect to then we get:
[TABLE]
where
[TABLE]
Let us remark that there are two new terms when there is selection in Moran processes, i.e. which will lead to the dependence in time of our estimates handled via Feynman-Kac formula, and one in which will be the key to the condition to get only linear in time dependence.
Proof.
A simple induction suffices to prove this result; for simplicity, let us only look at the case j=1:
[TABLE]
With , we recover the correct initial coefficients. ∎
Let us now use the Feynman-Kac formula to get ,
[TABLE]
with the process having for generator. Then look first at . As we are in weak selection and weak immigration,
[TABLE]
where is independent of . The case is proved.
We then prove the result by induction: suppose that the hypothesis holds up to .
For , denote , and remark that is not equal to zero and is independent of because the selection and immigration are weak. Thus
[TABLE]
The do not depend on , because , the and can be bounded independently of .
To conclude we have to justify that is finite for all . For this we just need to note that the processes are bounded by [math] and for all .
This is partly due to the fact that their generator has a negative drift in the neighbourhood of and a positive one in the neighbourhood of [math], see Feller [13]. This argument completes the proof. ∎
Let us now consider the case where ; we will show that in this case we obtain a linear-in-time dependence rather than an exponential one. Then, in equation (3.2) we can use the following:
[TABLE]
where is a constant independent of time. And then,
[TABLE]
because if is big enough,
[TABLE]
and is independent of and independent of time.
4. Random limiting selection as a pure jump process
To simplify, we will consider a constant immigration, in order to see where the main difficulty arises. The results would readily apply also to this case.
Let us now assume that is no longer a constant but a Markovian jump process with homogeneous transition probability . We are in the weak selection case so is still of the order of and takes values in a finite space .
Assume furthermore
[TABLE]
As in the previous section, is the Moran process, but with a Markovian selection and takes values in . Finally denote . Consider now the processes taking values in defined by the following generator:
[TABLE]
Its first coordinate is the process having the same generator as in the first part and the second is the Markovian jump process having for generator and for transition rates.
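A coupled simulation of the population and of a two-state selection environment might look like the sketch below. It assumes that the environment switches state with probability `rate / J**2` per Moran event (so that jumps occur at rate `rate` on the diffusion time scale) and that the environment is updated before the population. Both choices, like the form of the Moran probabilities, are assumptions made for the illustration.

```python
import random


def moran_probs(x, m, p, s):
    """Assumed up/down probabilities of the Moran step."""
    birth1 = m * p + (1.0 - m) * x * (1.0 + s) / (1.0 + s * x)
    return (1.0 - x) * birth1, x * (1.0 - birth1)


def switching_moran_path(k0, J, m1, p, s_pair, rate, n_steps, seed=0):
    """Moran chain whose rescaled selection jumps between s_pair[0] and s_pair[1]."""
    rng = random.Random(seed)
    k, idx = k0, 0
    for _ in range(n_steps):
        # Environment: switch state with probability rate / J^2 (assumption).
        if rng.random() < rate / (J * J):
            idx = 1 - idx
        # Population: one Moran event with weak immigration and selection.
        up, down = moran_probs(k / J, m1 / J, p, s_pair[idx] / J)
        u = rng.random()
        if u < up:
            k += 1
        elif u < up + down:
            k -= 1
    return k / J, s_pair[idx]
```

For example, `switching_moran_path(10, 20, 1.0, 0.5, (-1.0, 2.0), 3.0, 4000)` runs 4000 events, i.e. ten units of diffusion time for J = 20 under the assumed scaling.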
As in the previous part we want to quantify the convergence of towards in law, when goes to infinity. So the following theorem gives an estimation of the order of convergence of for .
Theorem 2.
Let denote and assume and , is in . Let then there exist a function at most exponential in time and a function linear in time which verify, when goes to infinity: there exists , such that
[TABLE]
Proof.
The scheme of proof is the same as for constant selection. Let us focus on the first lemma, where some changes have to be highlighted.
Lemma 3.
There exist bounded functions of , (), and a constant such that :
[TABLE]
Proof.
We first provide the equivalent of (1) in our context, i.e. there exists such that
[TABLE]
In fact as before, is still of the order of , then
[TABLE]
Let us now look at the order in of each term of the previous inequality. First, with the arguments used in (7), there exist constants , and of the order of such that:
[TABLE]
Then recall that is of the order of and, by the same calculations as in (4), $\mathbb{E}_{x}\big[f(X_{1},s')-f(x,s')\big]$ is also of the order of so (12) is at most of the order of .
Finally (13) can be written
[TABLE]
and by (9) is at least .
Note that in the case where is Lipschitz in the second variable, as is of the order of , it’s possible to obtain a better order .
Anyway,
[TABLE]
∎
Assume now that , and denote by the j-th derivative in of . Note that Lemma 2 holds even if is no longer constant. Indeed is not affected by the derivative in . So we get and , that there exist and independent of such that :
[TABLE]
with .
We still have that ,. And there exists a continuous function at most exponential in time and a linear function of time independent of verifying:
[TABLE]
because if is big enough,
[TABLE]
and is independent of because is of the order of . Finally, let , so that does not depend on and is at most exponential in time. Then
[TABLE]
And this concludes the proof. ∎
5. Random limiting selection as a diffusion process
In this section, we assume that the limiting selection is a homogeneous diffusion process. Once again, for simplicity, we suppose that the immigration coefficient is constant. First consider the following stochastic differential equation:
[TABLE]
with and both bounded and Lipschitz functions, i.e.: ,, there exists such that :
[TABLE]
for some constant . These assumptions guarantee the existence of strong solutions of and has for generator
[TABLE]
Let , then the process is independent of .
[TABLE]
For , let us divide the interval in regular intervals and let us introduce . Use now the standard Euler discretization and consider defined by the relation:
[TABLE]
where the quantities are i.i.d. and follow a . It is well known that
[TABLE]
So it follows
[TABLE]
It is of course possible to use another discretization to approach and the following method will still hold. There is however a small issue: in the model described in the first part, for rescaling reasons, the selection parameters must be in . Our Markov process is in .
It is thus necessary to introduce the function where is a closed bounded interval included in for some .
We assume is in and we consider now for the selection parameter.
Note that to have a non-trivial stochastic part in our final equation, we need, as in the first section, that is of the order of . Many choices are possible for and will depend on modelling issues.
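The discretization step described above can be sketched as follows: an Euler scheme for a one-dimensional diffusion with drift `b` and diffusion coefficient `sigma`, run on a grid of mesh `1 / J**2`, followed by a smooth bounded map into an interval. The mesh, the Gaussian increments and the choice of `tanh` as bounded map are assumptions made for this example. The text only requires some sufficiently regular map into a closed bounded interval.

```python
import math
import random


def euler_selection_path(u0, b, sigma, J, t_end, seed=0):
    """Euler scheme for dU = b(U) dt + sigma(U) dB on a mesh of size 1/J^2."""
    rng = random.Random(seed)
    h = 1.0 / (J * J)                 # assumed mesh: one Moran event per step
    n = int(round(t_end * J * J))
    u = u0
    for _ in range(n):
        u = u + b(u) * h + sigma(u) * math.sqrt(h) * rng.gauss(0.0, 1.0)
    return u


def squash(u, bound=1.0):
    """Hypothetical smooth map from the real line into [-bound, bound]."""
    return bound * math.tanh(u)


def selection_parameter(u, J, bound=1.0):
    """Weak selection: the bounded value divided by J (assumed scaling)."""
    return squash(u, bound) / J
```

With `sigma` identically zero and `b` identically one, the scheme reduces to a Riemann sum, so `euler_selection_path(0.0, lambda v: 1.0, lambda v: 0.0, 10, 1.0)` is 1 up to rounding, which gives a quick sanity check of the time scaling.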
Let us give back the definition of our Moran process in this context.
[TABLE]
Its first moments are given by, still denoting ,
[TABLE]
As in the previous case we use the process having the following generator to approach the Moran process when tends to infinity:
[TABLE]
So our aim is to give an upper bound for the error committed when
[TABLE]
Let us denote by the generator of the two dimensional process .
[TABLE]
Let us now state the main result of this section:
Theorem 3.
Let be in then there exists a function at most exponential in time such that
[TABLE]
Proof.
Let be the operator defined on the space of bounded functions on by:
[TABLE]
It is of course a semigroup so that . In parallel, let be defined on the space of bounded continuous functions by :
[TABLE]
also verifying, The starting point is as in the first part of (1),
[TABLE]
We now focus on the quantity ; the following lemma gives an upper bound of the quantity for in .
Lemma 4.
Let be in ; there exist , and such that:
[TABLE]
where and are of order .
Proof.
We will use the same methodology. First the Taylor expansion (in space) of gives:
[TABLE]
Indeed we have the quantities:
[TABLE]
And the Taylor expansion of in time gives:
[TABLE]
Indeed it is easy to see that is . Taking now the difference,
[TABLE]
Finally,
[TABLE]
Let us conclude by taking the norm to get
[TABLE]
so that we obtain the result. ∎
Then (9) still holds for this case as the proof of Proposition 1 is exactly the same, so the end follows as in the first part. ∎
6. Appendices : Wright-Fisher discrete model and its approximation diffusion
Let’s consider the Wright-Fisher discrete model with selection and immigration. The population still consists of two species, immigration and selection are still the same. But the Markovian process evolves according to the following probability:
[TABLE]
with .
At each step, the whole population is renewed, so this process goes times faster than the Moran process. In the case of weak selection and immigration, we usually use the diffusion defined by the following generator to approximate this discrete model when the population size goes to infinity.
[TABLE]
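For comparison with the Moran sketch, a discrete Wright-Fisher generation can be simulated by drawing the J offspring independently. The success probability used below (immigration from the pool with probability `m`, otherwise a parent chosen with selection weight `1 + s`) is an assumed standard form, since the transition formula is not recoverable from the extracted text.

```python
import random


def wf_success_prob(x, m, p, s):
    """Assumed probability that one offspring belongs to species 1."""
    return m * p + (1.0 - m) * x * (1.0 + s) / (1.0 + s * x)


def wf_generation(k, J, m, p, s, rng):
    """Renew the whole population: binomial draw as a sum of J Bernoulli trials."""
    q = wf_success_prob(k / J, m, p, s)
    return sum(1 for _ in range(J) if rng.random() < q)


def wf_path(k0, J, m1, p, s1, n_generations, seed=0):
    """Proportion of species 1 after n_generations with weak parameters m1/J, s1/J."""
    rng = random.Random(seed)
    k = k0
    for _ in range(n_generations):
        k = wf_generation(k, J, m1 / J, p, s1 / J, rng)
    return k / J
```

Since one generation replaces all J individuals at once, a run of order J generations plays the role that order J squared single events play for the Moran chain.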
Theorem 4.
Let be in then there is a function growing at most exponentially in time, depending on and which satisfies when goes to infinity:
[TABLE]
Proof.
Although the structure of the proof is the same as for the Moran model, the difference of scale (in now) causes some small differences. Mainly, the calculation of the is a bit different. Note that we need to have in the previous theorem, which is stronger than for the Moran process. The main explanation comes from the calculation of , which for the Wright-Fisher discrete process is no longer of the order of . Let us give some details.
First consider the moments $\{\mathbb{E}[(X_{n+1}^{J}-x)^{k}\mid X_{n}=x]\}_{k\leqslant 5}$:
[TABLE]
To get a quantity of the order of we need to go to the fifth moment of , so in the Taylor expansion we need to have in . Then,
[TABLE]
We are now able to give the expression of the , as in Lemma 1.
Lemma 5.
There exist bounded functions of , such that, when J is big enough,
[TABLE]
where for , .
Proof.
The proof of this lemma is exactly the same as that of Lemma 1; only the calculations are a little more tedious:
[TABLE]
∎
The end of the proof follows exactly the same pattern.
∎
So the Wright-Fisher dynamics leads to harder calculations than the Moran model, but the spirit of the proof is the same. All the methods studied in this paper therefore still hold for the Wright-Fisher model.
References
- [1] M. Danino and N.M. Shnerb. Fixation and absorption in a fluctuating environment. Journal of Theoretical Biology, 441:84–92, 2018.
- [2] M. Danino and N.M. Shnerb. Theory of time-averaged neutral dynamics with environmental stochasticity. Physical Review E, 97(4):042406, 2018.
- [3] M. Danino, N.M. Shnerb, S. Azaele, W.E. Kunin, and D.A. Kessler. The effect of environmental stochasticity on species richness in neutral communities. Journal of Theoretical Biology, 409:155–164, 2016.
- [4] D.A. Dawson. Stochastic Population Systems. Summer school in probability at PIMS-UBC, 8 June-3 July, 2009.
- [5] A. Depperschmidt, A. Greven, and P. Pfaffelhuber. Tree-valued Fleming-Viot dynamics with mutation and selection. Ann. Appl. Probab., (22):2560–2615, February 2012.
- [6] S. N. Ethier. A class of degenerate diffusion processes occurring in population genetics. Comm. Pure Appl. Math., 29(5):483–493, 1976.
- [7] S. N. Ethier and Thomas Nagylaki. Diffusion approximations of the two-locus Wright-Fisher model. J. Math. Biol., 27(1):17–28, 1989.
- [8] S. N. Ethier and M. F. Norman. Error estimate for the diffusion approximation of the Wright-Fisher model. Genetics, 74(11):5096–5098, November 1977.
