The seed bank coalescent with simultaneous switching
Jochen Blath, Adrián González Casanova, Noemi Kurt, Maite Wilke-Berenguer

TL;DR
This paper introduces a novel seed bank model with simultaneous switching, leading to a new jump-diffusion limit and a unique coalescent process with multiple activation and deactivation events, extending classical population genetics models.
Contribution
It presents a new Wright-Fisher type model incorporating simultaneous switching and establishes a novel coalescent structure with multiple activation/deactivation events.
Findings
Derivation of a new jump-diffusion limit for the scaled frequency processes.
Development of a new coalescent process with multiple activation/deactivation events.
Conditions for coming down from infinity for these coalescents.
Abstract
We introduce a new Wright-Fisher type model for seed banks incorporating "simultaneous switching", which is motivated by recent work on microbial dormancy. We show that the simultaneous switching mechanism leads to a new jump-diffusion limit for the scaled frequency processes, extending the classical Wright-Fisher and seed bank diffusion limits. We further establish a new dual coalescent structure with multiple activation and deactivation events of lineages. While this seems reminiscent of multiple merger events in general exchangeable coalescents, it actually leads to an entirely new class of coalescent processes with unique qualitative and quantitative behaviour. To illustrate this, we provide a novel kind of condition for coming down from infinity for these coalescents using recent results of Griffiths.
The seed bank coalescent with simultaneous switching
Jochen Blath (TU Berlin), Adrián González Casanova (UNAM), Noemi Kurt (TU Berlin), Maite Wilke-Berenguer (Ruhr-Universität Bochum)
Abstract
We introduce a new Wright-Fisher type model for seed banks incorporating “simultaneous switching”, which is motivated by recent work on microbial dormancy ([LJ11], [SL17]). We show that the simultaneous switching mechanism leads to a new jump-diffusion limit for the scaled frequency processes, extending the classical Wright-Fisher and seed bank diffusion limits. We further establish a new dual coalescent structure with multiple activation and deactivation events of lineages. While this seems reminiscent of multiple merger events in general exchangeable coalescents, it actually leads to an entirely new class of coalescent processes with unique qualitative and quantitative behaviour. To illustrate this, we provide a novel kind of condition for coming down from infinity for these coalescents using recent results of Griffiths [Gri14].
Introduction
The evolutionary consequences of dormancy, that is, of the presence of a seed bank in a population, are currently an active topic in both the biologically and the mathematically oriented population genetics communities (e.g. [KKL01], [VGO04], [TLL+11], [BGKS13], [BGCKWB16], [dHP17], [MKAv17], [SL17]). Indeed, seed banks are believed to strongly affect the interplay of classical evolutionary forces such as genetic drift, selection and migration, and mathematical (toy) models and inference tools for seed banks are currently being developed ([BBKW18]). We refer to [SL17] for a comprehensive overview and many further references. However, at present there seems to be a whole range of more or less natural ways to model a seed bank, and different models predict different qualitative behaviour (e.g. “weak” vs. “strong” seed banks, cf. [BGCKWB16], [KKL01], [ZT12]). Moreover, for several important scenarios, adequate mathematical models are still missing entirely.
In [LJ11], Lennon and Jones discuss various biological mechanisms (with a focus on microbial species) that lead to the initiation of dormancy and the resuscitation of dormant organisms. In particular, they distinguish between spontaneous switching and simultaneous switching, where the first mechanism describes the spontaneous initiation of dormancy in a single microbe, independent of the state of the rest of the population, while the latter describes the simultaneous initiation of dormancy in a whole fraction of the population, say in response to an environmental cue (such as changes in temperature, or availability of resources). This mechanism is thus also known as responsive switching. A similar distinction can be made for the resuscitation from a dormant state (individually vs. simultaneously due to a trigger event).
The first mechanism, spontaneous switching, has been incorporated in [BGCE+15], [BGCKWB16] into a population model related to Wright’s two island model ([Wri51], [KZH08]), where the islands correspond to the active and the dormant sub-population (with the distinguishing feature that reproduction is blocked in the dormant part). Here, spontaneous switching events correspond to what one would traditionally call migration between the two sub-populations. Yet, rather surprisingly, there are several qualitative and quantitative differences between the resulting seed bank diffusion limit and the classical two island diffusion, see e.g. [BBGW17]. Both models have an interesting ancestral dual process, namely the seed bank coalescent (see [BGCE+15] and also [LM15] for a similar structure arising in the context of peripatric speciation models) and the well-known structured coalescent (cf. e.g. [Her94, Tak88, Not90]). While the structured coalescent is well-established, the seed bank coalescent is new and still under investigation, and inference tools are currently being developed ([BBKW18]).
However, simultaneous switching seems not to have been incorporated in Wright-Fisher type seed bank models so far, although it appears to be an important mechanism for seed bank dynamics ([LJ11]). It is the purpose of this paper to provide a first (toy) model for this scenario and to analyse its scaling limit and dual ancestral process. We will see below that the resulting coalescent process, called the seed bank coalescent with simultaneous switching, is a new mathematical object with unique properties. As in the classical seed bank coalescent, lines can be either active or dormant, and the coalescence dynamics regarding the active lines are similar to a Kingman coalescent, while dormant lines are blocked from coalescence. However, lines can switch their status from active to dormant and vice versa simultaneously according to some driving Poisson measure, so that multiple lines can become active or dormant at a time. This feature extends the individual switching of the seed bank coalescent and leads to new qualitative behaviour. The switching of multiple lines at the same time is reminiscent of multiple merger events in Lambda-coalescents ([Sag99], [Pit99], [DK99]), yet leads to different tree structures, which is reflected in a new type of criterion for “coming down from infinity”, interestingly involving arguments from rather elegant recent work by Griffiths [Gri14].
The paper is organized as follows. In Section 1, we define two variants of seed bank models incorporating simultaneous switching and show that their corresponding allele frequency processes converge to a certain jump-diffusion limit (the seed bank diffusion with jumps), under a classical re-scaling similar to the one relating the Wright-Fisher model to the Wright-Fisher diffusion. In Section 2, we first define the seed bank coalescent with simultaneous switching and show that it is the moment dual to the seed bank diffusion with jumps, thus describing the ancestry of samples from this model. We will then discuss absorption probabilities and long-term behaviour of the diffusion with the help of this dual, before investigating conditions for coming down from infinity.
1 The forward model and its scaling limit
In this section we present forward-in-time population models with seed bank, allowing for spontaneous as well as simultaneous switching. We proceed in two steps, first presenting a model in which a fixed fraction of individuals is involved in each simultaneous switching event, and later generalizing to random fractions. The second model is a generalization of the first one, and most of our results will be stated for this general case. However, for simplicity of presentation, we start with the easier situation of fixed switching size.
Consider a haploid population of fixed size of active individuals reproducing in discrete non-overlapping generations. Assume that individuals carry a genetic type from some type-space (we will later pay special attention to the bi-allelic setup, say , for the forward model). Further, assume that the population also sustains a seed bank of constant size , which consists of the dormant individuals. For simplicity, we will sometimes refer to the ‘active’ individuals as ‘plants’ and to the dormant individuals as ‘seeds’ (even if they are typically microorganisms).
1.1 Model A: Simultaneous switching of fixed size
Fix , which will describe the (small) number of individuals affected by spontaneous switching events, and fix as parameters for the large simultaneous migration events. The model is then defined with the help of three types of events. For simplicity of notation, we assume first that and are natural numbers (otherwise Gauss brackets could be introduced into the definition in a suitable manner). We assume that in each generation, reproduction is governed by one of the following three events:
- S
Spontaneous switching (small-scale migration event of size ) between active and dormant: For the new active generation, active individuals are obtained by multinomial sampling from the previous active generation. The remaining active slots are filled by sampling (without replacement) types independently and uniformly from the seed bank types of the previous generation. For the new dormant generation, dormant individuals chosen uniformly at random simply stay in the seed bank, and the remaining slots are filled up by new ones via multinomial sampling from active individuals in the previous generation.
- F
Simultaneous switching (large-scale migration of size ) from dormant to active, “forest fire”: For the new active generation, active individuals are obtained by multinomial sampling from the previous active generation. The remaining active slots are filled by sampling (without replacement) types independently and uniformly from the seed bank types of the previous generation. The seed bank stays as it is.
- D
Simultaneous switching (large-scale migration of size ) from active to dormant, “drought”: The active individuals in the next generation are produced by multinomial sampling from the active individuals in the previous generation. For the new seed bank generation, dormant individuals from the previous generation are replaced by new dormant individuals obtained by multinomial sampling from the previous active generation. The remaining dormant individuals stay in the seed bank.
Thus, in each of the three cases, we have again active and dormant individuals in the next generation. This assumption of fixed population sizes is common in population genetics and in particular in Wright-Fisher type models. A situation in which fluctuations in population sizes are allowed will be investigated in future work.
Note that in mechanism one needs to choose such that We denote by a random variable taking values in which determines the type of event that happens in generation It is clear that in order to get a non-trivial scaling limit, large-scale migration events have to be rare, while small scale migration should be “typical”. Here, the sequence will be chosen to be iid and independent of the previous random mechanisms, such that
[TABLE]
As a result, in the limit as simultaneous switching events can be expected to occur according to a Poisson process of finite rate.
Remark 1.1.
Our above model is a generalisation of the seed bank model from [BGCKWB16] by additionally introducing the simultaneous switching events. However, note that also the spontaneous switching mechanism was defined slightly differently in the above paper, where the event was replaced with the following:
- S’
Symmetric spontaneous switching: For the new active generation, active individuals are obtained by multinomial sampling from the previous active generation. The remaining active slots are filled by sampling (without replacement) types independently and uniformly from the seed bank types of the previous generation. For the new dormant generation, precisely these types are replaced by new ones via multinomial sampling from active individuals in the previous generation.
The advantage of working with instead of is the fact that spontaneous migration from active to dormant and from dormant to active are now decoupled, and in particular, one may choose to have small migration events only in one direction (by setting either or equal to 0). Most of our results are true for both S and S’, and the proofs are immediate, by choosing in all the statements. This is due to the fact that in the limit it doesn’t matter if precisely the types that have been selected by multinomial sampling are being replaced themselves or not, which is the only difference.
1.2 Model B: Simultaneous switching of random size
Model A can be extended to include large migration events of varying size. For fix probability measures on
[TABLE]
In case choose such that Let denote a sequence of iid random variables with distribution and denote a sequence of iid random variables with distribution .
Again, reproduction will be governed by three events as before, which are selected by a sequence of random variables in an iid fashion as before. The event is precisely the same as before, but the events and contain additional randomness.
Indeed, whenever the fraction of dormant individuals becoming active in the F-event is given by the random number instead of the constant , and whenever the fraction of dormant individuals replaced by active offspring is given by the random number (instead of ). Otherwise, the process is defined exactly the same as in model A. Note that model A is contained in model B as a special case with the specific choices and for some fixed , and as in (1). However, the additional randomness in and may also require a different distribution of the in order to get a reasonable scaling limit. Below, we will give a joint condition on the measures and the probabilities of occurrence of large events, which allows infinite rates for large migration events in the limit and still leads to a well-defined limiting model.
1.3 The allele frequency processes
From the above models, their allele frequency processes can be derived in the usual way.
Definition 1.2 (Forward type configuration process).
Fix population size , seed bank size , genetic type space and parameters as in the definition of the models A resp. B above. Given initial type configurations and , denote by
[TABLE]
the random genetic type configuration in of the active population in generation (obtained from the above mechanism), and denote by
[TABLE]
correspondingly the genetic type configuration of the dormant population in . We call the discrete-time Markov chain with values in the type configuration process of the Wright-Fisher model with geometric seed bank component.
We now specialise to the bi-allelic case and define the frequency processes of alleles in the active population and in the seed bank.
Definition 1.3 (Forward frequency process, biallelic case).
With the above notation and condition, define the discrete-time Markov chain on by
[TABLE]
Denote by the initial distribution under which starts in -a.s., i.e.
[TABLE]
(with analogous notation for the expectation ).
Our next aim is to characterise the corresponding time-homogeneous transition probabilities. To this end, we introduce auxiliary random variables in a similar fashion as in [BGCKWB16]. For a fixed time , let
- be the number of active individuals that are offspring of a dormant -individual,
- be the number of active individuals that are offspring of an active -individual,
- be the number of dormant individuals that are offspring of an active -individual,
- and be the number of dormant individuals that are offspring of a dormant -individual.
With this notation, if -almost surely, we have the representation
[TABLE]
According to our construction, these random variables are all independent. Of course, the distributions depend on the type of event, chosen by , and on the choice of model A or B. In model A, the distributions are given in Table 1, where denotes the hypergeometric distribution with parameters and is the binomial distribution with parameters and . The transitions from to can be described analogously. In model B, conditional on the sequences , the random variables can be constructed in a similar fashion.
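To make the three event types concrete, the following Python sketch performs one generation of a model-A-style update on explicit lists of types. The names `c_wake`, `c_sleep` and `k_large` are hypothetical stand-ins for the small and large migration sizes, and the function is an illustration under these assumptions rather than the construction used in the proofs. In the bi-allelic case, counting the type-`a` entries drawn with replacement gives binomial variables, and counting those drawn without replacement gives hypergeometric ones, in line with the description above.

```python
import random


def next_generation(active, dormant, event,
                    c_wake=1, c_sleep=1, k_large=0, rng=random):
    """One generation of a model-A-style step (illustrative sketch).

    active, dormant : lists of genetic types of the previous generation
    event           : "S" (spontaneous), "F" (forest fire) or "D" (drought)
    c_wake, c_sleep : hypothetical small migration sizes for event S
    k_large         : hypothetical large migration size for events F and D
    """
    n, m = len(active), len(dormant)
    if event == "S":
        # Active slots: sampling with replacement from the old active
        # generation, plus c_wake seeds drawn without replacement.
        new_active = rng.choices(active, k=n - c_wake) + rng.sample(dormant, c_wake)
        # Dormant slots: m - c_sleep seeds stay, c_sleep are new offspring.
        new_dormant = rng.sample(dormant, m - c_sleep) + rng.choices(active, k=c_sleep)
    elif event == "F":
        # Large wake-up event: k_large seeds become active, seed bank unchanged.
        new_active = rng.choices(active, k=n - k_large) + rng.sample(dormant, k_large)
        new_dormant = list(dormant)
    elif event == "D":
        # Large dormancy event: k_large seeds are replaced by active offspring.
        new_active = rng.choices(active, k=n)
        new_dormant = rng.sample(dormant, m - k_large) + rng.choices(active, k=k_large)
    else:
        raise ValueError("event must be 'S', 'F' or 'D'")
    return new_active, new_dormant
```

Both population sizes are preserved by construction, which mirrors the fixed-size assumption of the model.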
1.4 Limiting generators of the frequency processes
Here, we follow the usual scaling limit paradigm in population genetics, where it is assumed that the population size tends to , and simultaneously time is measured on a macroscopic scale increasing also with . Since in our case we have populations (of size and each), we first assume that the active and the dormant population keep the same relative size, that is we set , for some suitable constant , as . The following arguments follow the standard machinery for the convergence of Markov processes, as elaborated e.g. in [EK86], and thus we focus on the crucial steps and computations.
We begin with the scaling limit in the case of fixed simultaneous switching size, i.e. model A. We can define the discrete generator of the process on time scale acting on suitable functions (e.g. by
[TABLE]
With some experience, it is not hard to guess the shape of the limiting process. We know from [BGCKWB16] that the frequent small events lead to the seed bank diffusion with migration rates , and it is easy to see that this is still the case for , however this time with migration rates . The much rarer -event leads to a jump of size in the active population, and a -event leads to a jump of size in the dormant population.
To make this rigorous, we assume that for every the random variables which determine the jump types are iid and such that as
[TABLE]
for some , and
[TABLE]
Observe that and for each denote by the canonical projection on , mapping to the closest point in with coordinates smaller than resp. .
Theorem 1.4 (Limiting generator in model A).
Under our above assumptions, we obtain with the choice that
[TABLE]
for all , where
[TABLE]
Remark 1.5.
If mechanism is assumed instead of in model A, the result holds with replaced by
Proof.
By standard arguments, it is sufficient to prove the stated convergence for polynomials on , since polynomials are dense in . Using (4) and (5), we can split according to the different values of to obtain (for large enough)
[TABLE]
for all . In [BGCKWB16], Proposition 2.4, it was shown that
[TABLE]
uniformly for all . The case for works similarly and leads to the same result, except that in the coefficient of , the constant is replaced by . We skip the somewhat tedious calculations and refer to the Appendix of [BGCKWB16] instead. Since converges for to 1 uniformly in and we obtain the desired convergence of the first summand in (6).
Consider now the case of an F-event; the case of a D-event works similarly. By construction, we have for , using (3) and Table 1,
[TABLE]
We claim that for all , on
[TABLE]
with Then the result follows, since as uniformly in and
To prove (7), observe that
[TABLE]
where consists of mixed terms of the form
[TABLE]
with and combinatorial prefactors depending only on and Note that
[TABLE]
We are thus done once we prove that the th centered moments of and are of order at most uniformly in and For the first two centered moments of we have and
[TABLE]
For we have
[TABLE]
since is hypergeometric with values in and thus trivially . By induction, This implies that
[TABLE]
for all (recall ). Similar considerations hold for which is binomial. ∎
In model B we make the following assumption. Let and denote sequences of nonnegative numbers such that and converge to 0 as Assume that there exist measures on with
[TABLE]
such that weakly,
[TABLE]
and analogously for Observe that in particular need not be finite measures.
We now assume that each sequence is iid such that as
[TABLE]
and
[TABLE]
Theorem 1.6 (Limiting generator in model B).
Under our above assumptions, we obtain with the choice that
[TABLE]
for all , where
[TABLE]
Remark 1.7.
Note that in particular the functions are in the domain of . If this follows from (8), if we have and analogously for
Proof.
The proof follows from Theorem 1.4 if we additionally show that uniformly in
[TABLE]
and
[TABLE]
hold and are finite, which by construction is the case if and only if the integrals on the rhs are finite for every due to the weak convergence of measures. By density of the monomials it is sufficient to check this for functions of the form and because we are working on by monotonicity, it is sufficient to look at and (all other mixed monomials are bounded by these two). But we have
[TABLE]
according to the assumption and and likewise for the other cases. This completes the proof. ∎
Remark 1.8.
The condition implies that for Borel sets defines a finite measure on which satisfies On the other hand, if is a finite measure on with then defines a finite measure on We may extend it to by setting which is no restriction, since choosing in the large migration mechanism has no effect. We will thus often work with instead of and similarly with instead of The condition on resp. on is also necessary to define a dual process with finite rates, see Definition 1.11 later on. We further elaborate on this point in remark 2.5.
Given a finite non-zero measure on with , it is always possible to construct a sequence of probability measures on and a sequence such that as and weakly.
1.5 The seed bank diffusion with jumps and its dual process
In the previous section, we showed that the generators of the rescaled frequency processes on time scale in model A resp. model B converge to a non-trivial Markov generator. We have not yet given an explicit jump-diffusion representation for the corresponding limiting processes , which we now provide. We will also use to state the moment duality of our system below.
Definition 1.9 (Seed bank diffusion with fixed-size jumps).
For we call the unique strong solution , starting in , of the initial value problem
[TABLE]
with , where is a standard Brownian motion and and are independent standard Poisson processes driving the simultaneous switching events, seed bank diffusion with fixed-size jumps .
A similar representation can be provided for model B.
Definition 1.10 (Seed bank diffusion with variable-size jumps).
For as in the previous section, we call the unique strong solution , starting in , of the initial value problem
[TABLE]
with , where is a standard Brownian motion and and are independent standard Poisson point processes on with intensity measure resp. driving the simultaneous switching events of random size, seed bank diffusion with variable-size jumps with jump laws . Here, denotes the Lebesgue measure on
Note that the above initial value problems are two-dimensional jump-diffusions with non-Lipschitz coefficients. Fortunately, existence and uniqueness results for such systems have recently drawn considerable interest, and we may refer e.g. to [Kur07, Kur14] or the perhaps more readily accessible [BLP15] for an existence and strong uniqueness result.
With the limit thus being well-defined, under the condition that converge weakly to , Theorems 1.4 resp. 1.6 imply the weak convergence
[TABLE]
on the Skorohod space of càdlàg paths, where is the unique strong (and strong Markov) solution to the initial value problems (12) resp. (13) (see e.g. Theorem 19.28 of [Kal02] or Corollary 4.8.9 of [EK86]).
Before we state our envisaged moment duality, we define a suitable dual process. As usual, it turns out to be the block-counting process of the coalescent process describing the genealogy, to be defined formally in Section 2.
Definition 1.11.
With the notation of model B, we define the block-counting process of the seed bank coalescent with large migration events to be the continuous time Markov chain taking values in with transitions
[TABLE]
For model A we consider the special case and for some
Denote by the distribution for which holds -a.s., and denote the corresponding expected value by . It is easy to see that, eventually, (as ), -a.s. for all . We now show that is the moment dual of .
Theorem 1.12.
For every , every and every
[TABLE]
Proof.
Let . Applying for fixed the generator of to acting as a function of and gives
[TABLE]
where we have used the binomial theorem and the observation that the summands for and disappear. Note that the rhs is precisely the generator of applied to acting as a function of and for fixed . Hence the duality follows from standard arguments, see e.g. [JK14], Proposition 1.2. ∎
Remark 1.13 (Alternative description of the block-counting process).
From the usual perspective of coalescents, we can describe the dynamics of the dual block-counting process from Definition 1.11 in the following intuitive way: Every block, independently of the others, migrates at rate from active to dormant and at rate from dormant to active. Every given pair of active blocks coalesces at rate 1. Moreover, at fixed, constant rate 1 a large migration event from active to dormant happens, where every active block participates with probability (chosen according to ) independently of the others. Likewise, at constant rate 1, a large migration event from dormant to active happens, where every dormant block participates with probability independently of the others. Note that these large migration events may result in a migration of 0 blocks (with probability ), or in a migration of 1 block (with probability ). This description makes it clear that different blocks move independently of the others, an observation which will be useful later, when we construct couplings of the block-counting process with other processes.
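The verbal description in this remark translates directly into a Gillespie-type simulation of the block-counting process. The sketch below is a minimal illustration, assuming model-A-like fixed participation probabilities; the parameter names `rate_sleep`, `rate_wake`, `p_sleep` and `p_wake` are hypothetical placeholders for the individual migration rates and the large-event participation probabilities.

```python
import random


def simulate_block_counts(n_active, n_dormant, rate_sleep, rate_wake,
                          p_sleep, p_wake, t_max, seed=0):
    """Simulate (time, active blocks, dormant blocks) up to time t_max."""
    rng = random.Random(seed)
    t, n, m = 0.0, n_active, n_dormant
    path = [(t, n, m)]
    while True:
        rates = [
            n * (n - 1) / 2.0,  # each pair of active blocks coalesces at rate 1
            rate_sleep * n,     # a single active block becomes dormant
            rate_wake * m,      # a single dormant block becomes active
            1.0,                # large migration event, active -> dormant
            1.0,                # large migration event, dormant -> active
        ]
        t += rng.expovariate(sum(rates))
        if t > t_max:
            return path
        event = rng.choices(range(5), weights=rates)[0]
        if event == 0:
            n -= 1
        elif event == 1:
            n, m = n - 1, m + 1
        elif event == 2:
            n, m = n + 1, m - 1
        elif event == 3:
            # every active block takes part independently with prob. p_sleep
            moved = sum(rng.random() < p_sleep for _ in range(n))
            n, m = n - moved, m + moved
        else:
            # every dormant block takes part independently with prob. p_wake
            moved = sum(rng.random() < p_wake for _ in range(m))
            n, m = n + moved, m - moved
        path.append((t, n, m))
```

Since only coalescence changes the total number of blocks, the sum of the two counts is non-increasing along every simulated path, and a large event may well move no block at all.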
1.6 Long-term behaviour and fixation probabilities
The fixation probabilities can be calculated as for the usual seed bank coalescent. For simplicity we formulate the results only for model A; they are easily generalised to model B by integrating out the and according to the respective measures.
Obviously, and are absorbing states for the system (13). They are also the only absorbing states, since absence of drift requires and for the fluctuations to disappear, it is necessary to have
Proposition 1.14.
In model A, all mixed moments of solving (13) converge to the same finite limit depending on . More precisely, for each fixed , we have
[TABLE]
Proof.
Let be as in Definition 1.11, started in . Let be the first time at which there is only one particle left in the system , that is,
[TABLE]
Note that for any finite initial configuration , the stopping time has finite expectation. Now, by Theorem 1.12,
[TABLE]
where the last equality holds by convergence to the invariant distribution of a single particle, jumping between the two states ‘active’ and ‘dormant’ at rate resp. , which is given by and independent of the choice of . ∎
Corollary 1.15 (Fixation in law).
In model A, given , converges in distribution as to a two-dimensional random variable whose distribution is given by
[TABLE]
Proof.
It is easy to see that the only two-dimensional distribution on , for which all moments are constant equal to , is given by
[TABLE]
Indeed, uniqueness follows from the moment problem, which is uniquely solvable on Convergence in law follows from convergence of all moments due to Theorem 3.3.1 in [EK86] and the Stone-Weierstraß Theorem. ∎
2 The seed bank coalescent with simultaneous switching
We now analyse the backward-in-time process in more detail. First, we give a formal construction of the seed bank coalescent with simultaneous switching in terms of marked partitions. For , let be the set of partitions of . For let be the number of blocks of the partition . We define the space of marked partitions to be
[TABLE]
This enables us to attach to each partition block a flag which can be either ‘active’ or ‘dormant’ ( or ), so that we can trace whether an ancestral line is currently in the active or dormant part of the population.
Consider two marked partitions . We write if can be constructed by merging exactly 2 blocks of carrying the -flag, and the block in resulting from this merger again carries an -flag.
We use the notation if can be constructed by changing the flag of precisely blocks of from to and if can be constructed by changing the flag of precisely blocks of from to .
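The two relations just introduced can be illustrated with a small data structure. In the sketch below, a marked partition is a list of `(block, flag)` pairs, with the strings `"active"` and `"dormant"` standing in for the two flags; these names, and the helper functions `singletons`, `coalesce` and `switch`, are hypothetical and chosen only for illustration.

```python
def singletons(n, flag="active"):
    """Marked partition of {1, ..., n} into singletons, all with one flag."""
    return [(frozenset({i}), flag) for i in range(1, n + 1)]


def coalesce(partition, i, j):
    """Merge the blocks at positions i and j; both must be active."""
    (block_i, flag_i), (block_j, flag_j) = partition[i], partition[j]
    if i == j or flag_i != "active" or flag_j != "active":
        raise ValueError("only two distinct active blocks can merge")
    rest = [b for k, b in enumerate(partition) if k not in (i, j)]
    merged = (block_i | block_j, "active")  # merged block is active again
    # keep blocks ordered by their smallest element
    return sorted(rest + [merged], key=lambda b: min(b[0]))


def switch(partition, positions, new_flag):
    """Change the flag of the blocks at the given positions to new_flag."""
    old_flag = "dormant" if new_flag == "active" else "active"
    result = list(partition)
    for k in positions:
        block, flag = result[k]
        if flag != old_flag:
            raise ValueError("block does not carry the flag to be changed")
        result[k] = (block, new_flag)
    return result
```

For instance, starting from `singletons(4)`, merging positions 0 and 2 and then calling `switch` on positions 1 and 2 with `"dormant"` leaves one active block `{1, 3}` and two dormant singletons, after which no further merger is possible until a block switches back.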
Definition 2.1 (The seed bank -coalescent with simultaneous switching).
Fix and finite measures on such that For we define the seed bank -coalescent with simultaneous switching to be the continuous time Markov chain with values in , characterised by the following transitions:
[TABLE]
Remark 2.2.
Clearly, the block counting process of this coalescent is the same as in Definition 1.11, cf. Figure 1. Observe the relations and , see also remark 1.8. In this section we will work with instead of which is more convenient in the proofs of the main results.
Definition 2.3 (The seed bank coalescent with simultaneous switching).
We define the seed bank coalescent with large migration events, with intensities , relative seed bank size and migration measures as the unique Markov process obtained as the projective limit as goes to infinity of the laws of the seed bank -coalescents with simultaneous switching.
Proving the existence of via projective limits is standard (the only slightly tedious piece of work is to show that the Markov property is retained under taking projections), which we therefore omit. Note that we are thus allowed to start the block counting process in any state
By entirely similar arguments as the ones presented in Section 3.1 of [BGCKWB16] one sees easily that the seed bank coalescent with simultaneous switching is indeed the ancestral process of the seed bank model with simultaneous switching.
For convenience, we give a different construction of the seed bank coalescent with simultaneous switching, which will facilitate rigorous proofs in the following section, as it allows e.g. for simple but precise constructions of couplings. For this construction, we introduce a family (or families) of Poisson point processes (PPP).
Definition 2.4.
Let for and finite measures on with
- , be a family of PPP on with intensity ,
- , be a family of PPP on with intensity ,
- , be a family of PPP on with intensity ,
- be a PPP on with intensity and
- be a PPP on with intensity .
Here, denotes the Lebesgue-measure on .
Remark 2.5.
Note that we require that the order of the singularity at zero of the intensity measure is at most of order . This may at first glance look surprising, since this is different from the condition on the singularity for the intensity measure driving a classical -coalescent, which is of order [Pit99]. However, a similar condition has been identified in the context of spatial -Fleming-Viot processes (see [BEV10]), where the authors use a Poisson process with an intensity with a singularity of order at zero to model large-scale extinction and recolonisation events. Note that one can interpret these events as simultaneous migration of ancestral lines. The point is that we observe singularities of order every time that we model “actions” of a single ancestral line (such as migration, mutation, selective events/branching), and of order in the case of “actions” that require more than one ancestral line (such as coalescence). This is a straightforward consequence of interpreting the restriction on the order of the singularity as a minimal condition for the total rate of the dual process to be finite.
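The finiteness argument behind this remark can be written out in two lines. As a sketch, let $\mu$ denote the intensity measure of event sizes $y \in (0,1]$ (the symbol $\mu$ is shorthand introduced only for this illustration), and suppose each of $n$ ancestral lines takes part in an event of size $y$ independently with probability $y$. The total rate of events affecting at least one line, resp. at least two lines, is then bounded as follows.

```latex
% At least one of n lines is affected (Bernoulli's inequality):
\[
  \int_{(0,1]} \bigl(1-(1-y)^{n}\bigr)\,\mu(dy)
  \;\le\; n \int_{(0,1]} y\,\mu(dy).
\]
% At least two of n lines are affected (union bound over pairs):
\[
  \int_{(0,1]} \bigl(1-(1-y)^{n}-n\,y\,(1-y)^{n-1}\bigr)\,\mu(dy)
  \;\le\; \binom{n}{2} \int_{(0,1]} y^{2}\,\mu(dy).
\]
```

The first bound is finite as soon as $\mu(dy) = y^{-1}\Lambda(dy)$ for a finite measure $\Lambda$, which is the single-line situation of simultaneous migration, while the second only requires $\mu(dy) = y^{-2}\Lambda(dy)$, as for events that need at least two lines such as coalescence.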
From these objects, the following characterisation of the seed bank coalescent with simultaneous switching is evident:
**Proposition 2.6** (Poisson point representation of the coalescent).
Let the space of marked partitions defined in (18). The seed bank coalescent with simultaneous switching is a function of the PPPs given above in the following way: Set . If is a (random) time-point in
- : If and are the smallest integers in their respective blocks and both blocks have an -flag in , then is the partition where these two blocks are merged and all other blocks remain the same. Otherwise .
- : If is the smallest integer in its block and this has an -flag in , then is the partition where this block has a -flag and all other blocks remain the same.
- : If is the smallest integer in its block and this has a -flag in , then is the partition where this block has an -flag and all other blocks remain the same.

If is a point in
- , let be a sequence of independent uniform random variables on chosen independently of everything else and independent for each time point. Then is the partition where all the blocks that had an -flag in and whose smallest integer fulfilled have a -flag while all others remain unchanged.
- , let be a sequence of independent uniform random variables on chosen independently of everything else and independent for each time point. Then is the partition where all the blocks that had a -flag in and whose smallest integer fulfilled have an -flag while all others remain unchanged.
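The Poisson construction above can be illustrated at the level of block counts. The following is a minimal sketch, not the authors' construction: it tracks only the numbers of active and dormant blocks, and it assumes that the simultaneous-switching measures are finite, so that large events arrive at a finite rate. All parameter names (`c_deact`, `c_act`, `rate_sim_deact`, `rate_sim_act`, `draw_fraction`) are hypothetical, as is the choice of pair coalescence rate 1.

```python
import random


def simulate_block_counts(n_active, n_dormant, t_max, c_deact=1.0, c_act=1.0,
                          rate_sim_deact=0.5, rate_sim_act=0.5,
                          draw_fraction=None, rng=None):
    """Gillespie-style sketch of the block counting process (active, dormant).

    Assumptions (not taken from the paper): each pair of active blocks merges
    at rate 1, single blocks switch at rates c_deact / c_act, and simultaneous
    switching events arrive at the finite rates rate_sim_deact / rate_sim_act.
    draw_fraction(rng) returns the size y in [0, 1] of a simultaneous event.
    """
    if rng is None:
        rng = random.Random()
    if draw_fraction is None:
        draw_fraction = lambda r: r.random()  # hypothetical: y uniform on [0, 1)
    t, n, m = 0.0, n_active, n_dormant
    path = [(t, n, m)]
    while True:
        rates = [
            n * (n - 1) / 2.0,                 # coalescence of two active blocks
            c_deact * n,                       # one active block becomes dormant
            c_act * m,                         # one dormant block becomes active
            rate_sim_deact if n > 0 else 0.0,  # simultaneous deactivation
            rate_sim_act if m > 0 else 0.0,    # simultaneous activation
        ]
        total = sum(rates)
        if total <= 0.0:
            break
        t += rng.expovariate(total)
        if t > t_max:
            break
        event = rng.choices(range(5), weights=rates)[0]
        if event == 0:
            n -= 1
        elif event == 1:
            n, m = n - 1, m + 1
        elif event == 2:
            n, m = n + 1, m - 1
        elif event == 3:
            # Each active block draws its own uniform U and switches if U <= y.
            y = draw_fraction(rng)
            k = sum(1 for _ in range(n) if rng.random() <= y)
            n, m = n - k, m + k
        else:
            # Each dormant block draws its own uniform U and switches if U <= y.
            y = draw_fraction(rng)
            k = sum(1 for _ in range(m) if rng.random() <= y)
            n, m = n + k, m - k
        path.append((t, n, m))
    return path
```

The independent uniforms in the two simultaneous branches mirror the rule in the proposition that a block takes part in a large event only if its own uniform variable falls below the event size. Only coalescence changes the total number of blocks, so the total is non-increasing along any simulated path.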
2.1 Coming down from infinity
The notion of coming down from infinity for exchangeable coalescents was introduced by Pitman [Pit99] and Schweinsberg [Sch00]. They say that the block-counting process of a coalescent “comes down from infinity”, if -a.s. and
[TABLE]
They further say that the process “stays infinite”, if for all . Note that this leaves intermediate regimes: for example, the “star-shaped coalescent” with rates driven by has infinitely many lines until an exp(1)-distributed random time, at which it jumps to a single line. It thus does come down from infinity in a certain sense, but only after a strictly positive (random) time. Hence one might want to distinguish between “coming down from infinity instantaneously” (Pitman’s original definition), “coming down from infinity after a finite time”, and “staying infinite”. We mention this because the seed bank coalescent with simultaneous switching exhibits all three regimes, as we will see below.
In [BGCKWB16] it was proved that the seed bank coalescent does not come down from infinity (neither instantaneously nor after a finite time), due to the fact that even within a very short time, infinitely many lines escape to the seed bank, from where it takes a long time to return and be able to coalesce. It turns out that in the case of simultaneous switching, the behaviour is qualitatively different.
**Theorem 2.7** ((Not) coming down from infinity).
Assume model B. Let be a random variable with distribution .
- (a) If , then the process started in will stay infinite.
- (b) If the process is started in then the process comes down from infinity instantaneously if and . If or it stays infinite.
- (c) If and then the process started from comes down from infinity after a finite time, but not instantaneously.
**Remark 2.8.**
A finite measure on with is for example given by the measure which has density with respect to the Lebesgue measure, which has total mass
In order to prepare the proof of part (a), we formulate and prove the result for two special cases.
**Lemma 2.9.**
- (i) If is a finite measure on and , then the process started in will stay infinite.
- (ii) If there exists such that , then the process started in will stay infinite.
Proof.
Proof of (i). Let be defined as in Definition 2.4. In order to turn this into a stochastic process, we consider the first component as time axis and represent the PPP by its (a.s.) finite collection of atoms ordered in time-increasing fashion, which is possible due to the finiteness assumption on the measure We introduce its canonical filtration by letting
[TABLE]
Fix Denote by
[TABLE]
the index of the last atom of the PPP before time (which is of course random). Further, assume that the probability space on which the PPPs are defined is large enough to accommodate a doubly infinite sequence of independent uniform random variables independent of everything else. They are used to determine whether or not a block participates in the large migration event at time whose size is determined by cf. Proposition 2.6.
We now assume . Assign labels arbitrarily to the infinitely many dormant individuals. Let if the th individual never left the seed bank until time , and otherwise. Then,
[TABLE]
Note that conditionally on , the are independent Bernoulli random variables. By the Borel-Cantelli lemma, we are done once we can show that
[TABLE]
where the probability is random, but independent of the index , hence there is a uniform (in ) random lower bound away from 0. We have, using measurability of the wrt. and independence of the from ,
[TABLE]
where the expectation in the second line only acts on the . We have since and due to the assumption that is finite. Thus the r.h.s. above is strictly positive by independence, and the event
[TABLE]
has probability 1.
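The key quantity in part (i) is the conditional probability that a fixed dormant line is still dormant at a given time. A minimal sketch of that computation follows, under assumptions that are labelled here because the displayed formulas are not reproduced: the line activates on its own at a hypothetical rate `c_act`, and at each of the finitely many large activation events of sizes `jump_sizes` it is activated independently with probability equal to the event size.

```python
import math
import random


def stay_dormant_probability(jump_sizes, c_act, t):
    """Probability that one dormant line stays dormant up to time t.

    Assumes (hypothetically) individual activation at rate c_act and
    independent participation in each large event of size y with probability y.
    """
    p = math.exp(-c_act * t)      # no individual activation before time t
    for y in jump_sizes:
        p *= (1.0 - y)            # line is spared by the event of size y
    return p


def simulated_stay_fraction(jump_sizes, c_act, t, n_lines, rng):
    """Monte Carlo check: fraction of n_lines lines that remain dormant."""
    stayed = 0
    no_individual_jump = math.exp(-c_act * t)
    for _ in range(n_lines):
        if rng.random() >= no_individual_jump:
            continue              # the line activated on its own
        # The line takes part in an event of size y if its uniform U <= y.
        if all(rng.random() > y for y in jump_sizes):
            stayed += 1
    return stayed / n_lines
```

With finitely many event sizes that are all strictly smaller than 1, the product is strictly positive, which is the kind of uniform lower bound the Borel-Cantelli step requires: among infinitely many conditionally independent lines, infinitely many remain dormant.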
Proof of (ii). We first assume is a discrete measure, and proceed similarly to the proof of part (i). Let be of the form with such that Note that the latter condition ensures that is a finite measure, while may be infinite. Fix As in the proof of let if the th seed never left the seed bank until time , and otherwise. Denote by the number of points of size up to time in the PPP and let . Then again is a sequence of identically distributed Bernoulli random variables conditionally independent given , and
[TABLE]
Thus, as in part (i), by the Borel-Cantelli lemma we are done if we prove that We have
[TABLE]
Now observe that , and hence for , since By Taylor expansion, this implies and in particular Thus (21) implies that and the process stays infinite.
For a more general measure with we use an easy coupling with a discretised measure. Set and choose arbitrary points such that and for all (for example, one may choose ).
Let denote the process constructed using the same point processes from Definition 2.4 as but ignoring all events in and and where is replaced by the measure on defined via where
[TABLE]
This measure can be interpreted as follows: Whenever a value drawn according to the measure falls into the interval then the measure yields value Thus a seed bank process with measure makes jumps of larger size than for from the seed bank to the plant part. Let denote the corresponding block counting process. By construction, the processes can be coupled such that for all This implies that if also We have
[TABLE]
Thus stays infinite, as we have seen in the first case, and we are done. ∎
Proof of Theorem 2.7.
Part : Fix with Let and . Then is a finite measure by our assumptions, and according to Lemma 2.9 (i) the process with jump measure stays infinite. Due to part (ii) of the Lemma, also the process with stays infinite. We have and the supports of the two measures do not intersect. By construction, the Poisson point process is the superposition of two PPPs and with intensity measures resp. . Fix Since is finite, we can order its time points increasingly. At the process is infinite almost surely because only events of have happened before, and it stays infinite at . Thus by the strong Markov property is almost surely infinite for every and hence
Part : If in (b) there is then it can be seen, by following the proof of Theorem 4.1 in [BGCKWB16], that the process does not come down from infinity, since a sufficiently large number of blocks move immediately from active to dormant. Thus we are in the situation of (a), at least if . The case will be discussed at the end of this proof. Assume now . We will consider auxiliary processes with helpful properties: Let be the process with the same mechanism of coalescence and migration from active to dormant, given by , but without any migration from dormant to active. For a formal definition of the process, use the construction of the seed bank coalescent via Poisson point processes provided in Proposition 2.6, using the same , and as for the original process, but ignoring all other events. The process has the essential mechanism we want to analyze in this part. However, as we will discuss in Remark 2.10 below, it is not yet the suitable object for calculations. We will instead work with the process which has the same transitions as , but at any large migration event determined by the points of only one block moves to the seed bank. For a formal definition of this process, we use once more the Poisson construction: Let be the process using the exact same PPPs , and a slightly different mechanism for the events in (whilst still ignoring all other events/PPPs): If is a point in and a sequence of iid uniform random variables on as in Proposition 2.6, then is the partition where the block with an -flag containing the smallest integer among all the blocks with an -flag in fulfilling has a -flag while all others remain unchanged. In other words, every time commands a (possibly large) migration from active to dormant, will only let the line with the smallest integer migrate. Therefore its block counting process will only have jumps of size 1 at a frequency determined by .
In order to proceed, define the stopping times
[TABLE]
As we saw above for , we can easily couple to the Kingman coalescent in such a way that, if we denote by the block counting process of the Kingman coalescent, we have
[TABLE]
This immediately implies
[TABLE]
for all , since the right-hand side is the time to the most recent common ancestor in the Kingman coalescent. Let and denote the total number of lines that migrated from active to dormant in , resp. , at any point in time and let be the event that there was a migration (not a coalescence) at time , . A moment of thought reveals that the coupling between the two processes implies the estimates
[TABLE]
Hence, the number of lines that found their way into the seed bank depends on how many of the events are realized. Observe that, since a migration and a coalescence result in jumps of the same size, the events , are actually independent and we can calculate their probability. Define as the rate at which any migration event occurs given we have active (in ). Then
[TABLE]
where for a random variable uniformly distributed on independent of . The last equality is inspired by a representation in Theorem 2 of [Gri14]. Therefore
[TABLE]
for any which in turn implies
[TABLE]
Borel-Cantelli gives that almost surely only finitely many of the events for happen, and thus both sums in (22) are finite, if and only if . Observe that is finite if and only if Since Kingman’s coalescent comes down from infinity instantaneously, we see that comes down from infinity immediately if Otherwise, the process stays infinite, at least provided since in that case by (a) infinitely many blocks migrating to the seed bank in an arbitrarily short time implies that the process stays infinite. If there is a positive probability for all dormant blocks to become active at the same time. In that case, the process starts afresh from By the strong Markov property, and the above proof, we will again have infinitely many blocks moving to the seed bank, and thus the process will stay infinite also in this case.
Part : As we just argued in the last lines of the proof of part (b), if there is a positive probability for all dormant blocks moving to the active part at once. By the Borel-Cantelli lemma, this event eventually happens with probability one, and thus by (b) the process comes down from infinity. However, the coming down only happens after the seed bank has been emptied, and not instantaneously. ∎
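The comparison with Kingman's coalescent in part (b) rests on the fact that its time to the most recent common ancestor stays bounded as the number of initial lines grows. A short sketch of this standard computation, assuming the usual normalisation in which each pair of lines merges at rate 1 (the paper's own coalescence rate may carry a different constant):

```python
def expected_tmrca(n):
    """Expected time to the most recent common ancestor of n lines.

    Assumes Kingman's coalescent with pair coalescence rate 1: while k lines
    remain, the next merger occurs at rate k * (k - 1) / 2, so the expected
    waiting time at level k is 2 / (k * (k - 1)).
    """
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))
```

The sum telescopes to 2(1 - 1/n), so the expected total time stays below 2 for every n. This boundedness is what makes "coming down from infinity instantaneously" possible for the active part once migration to the seed bank is controlled.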
**Remark 2.10.**
One might think it easier (or more precise) to estimate the number of lines that migrated from active to dormant in with the help of directly. With the same idea we can define stopping times . Since we have large jumps, these may coincide for several so one is tempted to define the random times as the actual jump times. There is a small difficulty in that we cannot bound the value of at any such time, but much worse is that we actually cannot calculate the probabilities of the events “there is a migration at ” or “there is a migration at ” as before. Indeed, both and contain information about the present and the future of the process and therefore the latter are not even stopping times.
Acknowledgements.
The authors acknowledge support by the DFG Priority Programme SPP 1590 “Probabilistic Structures in Evolution”, grants no. BL 1105/5-1 and KU 2886/1-1. AGC was supported by grant no. UNAM PAPIIT IA100419. Part of this work was completed while AGC was a BMS Substitute Professor at TU Berlin supported by the Berlin Mathematical School.
References
- [BBGW17] J. Blath, E. Buzzoni, A. González Casanova, and M. Wilke-Berenguer. Structural properties of the seed bank and the two-island diffusion. arXiv e-prints, October 2017.
- [BBKW18] J. Blath, E. Buzzoni, J. Koskela, and M. Wilke-Berenguer. Inference for seed bank coalescents. Preprint, 2018.
- [BEV10] N. H. Barton, A. M. Etheridge, and A. Véber. A new model for evolution in a spatial continuum. Electron. J. Probab., 15:no. 7, 162–216, 2010.
- [BGCE+15] J. Blath, A. González Casanova, B. Eldon, N. Kurt, and M. Wilke-Berenguer. Genetic Variability under the Seedbank Coalescent. Genetics, 200(3):921–934, 2015.
- [BGCKWB16] J. Blath, A. González Casanova, N. Kurt, and M. Wilke-Berenguer. A new coalescent for seed-bank models. Ann. Appl. Probab., 26(2):857–891, 2016.
- [BGKS13] J. Blath, A. González Casanova, N. Kurt, and D. Spanò. The ancestral process of long-range seed bank models. J. Appl. Probab., 50(3):741–759, 2013.
- [BLP15] M. Barczy, Z. Li, and G. Pap. Yamada-Watanabe results for stochastic differential equations with jumps. Int. J. Stoch. Anal., pages Art. ID 460472, 23, 2015.
- [dHP17] F. den Hollander and G. Pederzani. Multi-colony Wright-Fisher with seed-bank. Indagationes Mathematicae, 28(3):637–669, 2017.
