Fitting Markovian binary trees using global and individual demographic data
Sophie Hautphenne, Melanie Massaro, Katharine Turner

TL;DR
This paper introduces methods for fitting Markovian binary trees to demographic data using TMAP models, enabling detailed analysis of population dynamics with confidence intervals, demonstrated on human and bird data.
Contribution
It develops parameter estimation techniques for Markovian binary trees with TMAP, including optimal phase selection and confidence interval computation, applied to real demographic data.
Findings
Effective parameter estimation methods for TMAP-based Markovian binary trees.
Optimal phase number selection improves model fit.
Confidence intervals provide uncertainty quantification for model outputs.
Abstract
We consider a class of branching processes called Markovian binary trees, in which the individuals lifetime and reproduction epochs are modeled using a transient Markovian arrival process (TMAP). We estimate the parameters of the TMAP based on population data containing information on age-specific fertility and mortality rates. Depending on the degree of detail of the available data, a weighted non-linear regression method or a maximum likelihood method is applied. We discuss the optimal choice of the number of phases in the TMAP, and we provide confidence intervals for the model outputs. The results are illustrated using real data on human and bird populations.
| 0 | 1 | 2 | 3 | |
|---|---|---|---|---|
| 2 | 5 | 9 | 13 | 13 |
| 3 | 5 | 7 | 14 | 14 |
| 4 | 5 | 15 | 14 | 14 |
| Ex. 1 | 3 | 6 | 3 | 2 | ||||||||||
| Ex. 2 | 4 | 2 | ||||||||||||
| Ex. 3 | 4 | 3 | 2 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Fitting Markovian binary trees using global and individual demographic data
Sophie Hautphenne111The University of Melbourne and Ecole polythechnique fédérale de Lausanne, [email protected], Melanie Massaro222Charles Sturt University, [email protected], and Katharine Turner 333Ecole polythechnique fédérale de Lausanne, [email protected]
Abstract
We consider a class of branching processes called Markovian binary trees, in which the individuals lifetime and reproduction epochs are modeled using a transient Markovian arrival process (TMAP). We estimate the parameters of the TMAP based on population data containing information on age-specific fertility and mortality rates. Depending on the degree of detail of the available data, a weighted non-linear regression method or a maximum likelihood method is applied. We discuss the optimal choice of the number of phases in the TMAP, and we provide confidence intervals for the model outputs. The results are illustrated using real data on human and bird populations.
Keywords: Markovian binary tree; transient Markovian arrival process; Markov modulated Poisson process; parameter estimation; non-linear regression; maximum likelihood; petroica traversi
1 Introduction
Simple birth-and-death processes do not offer enough flexibility to model real biological populations in which the age of individuals impacts on their fertility and mortality rates. The memoryless property inherent to these models implies that individuals do not age. However, they are tractable and amenable to efficient parameter estimation. In this paper, we model the lifetime and reproduction epochs of individuals in a population using a transient Markovian arrival process (TMAP). Roughly speaking, a TMAP is a point process in which the event rate depends on the state of an underlying transient Markov chain with transient states (also called phases), and one absorbing phase. Each event in the TMAP corresponds to the birth of a child, and the absorption in phase 0 corresponds to the individual’s death. The resulting continuous-time branching process, called Markovian binary tree (MBT), is the matrix generalisation of the birth-and-death process. It allows for much more flexibility than the latter, while keeping an excellent computational tractability.
Performance measures of MBTs include the extinction probability of the population, the distributions of the time until extinction, the population size at any given time, and the total progeny size until any given time. The MBT model has already been used to efficiently compare demographic properties of female families in different countries, see [4]. The motivation behind the present paper is to develop the statistical tools necessary to fit an MBT to populations of species for which detailed information about age-specific survival and reproductive rates of individuals is available. The model can then be used to calculate age-dependent demographic properties. By knowing the exact age of individuals of a population, its future survival probability can be assessed, which aids conservation management of endangered species.
We fit a TMAP to different types of datasets which may be available from demographic databases or from studying an animal population in the field. These datasets can have different degrees of detail. We distinguish between:
- •
Global population data, consisting of the average age-specific fertility and mortality rates over an entire population. This sort of data is usually provided in databases on human fertility and mortality.
- •
Individual demographic data, consisting of data on age-specific fertility and mortality counts for each individual in a population. This sort of data often exists for closely monitored animal species. Here we will use data from a highly threatened bird species, the Chatham Island black robin (Petroica traversi) [1, 10].
The parameter estimation method depends on the type of data which are available: in the global population case, we use a weighted non-linear regression method to fit the parameters, and in the individual demographic data case, we use a maximum likelihood method. We consider different validation methods to determine the optimal number of phases in the TMAP. Once a value of is determined and an estimator is found for the model parameters, we derive confidence intervals for the model outputs.
We apply our results to two real-world examples. The weighted non-linear regression method is applied in human demography leading to an improvement of the Markovian model considered in [4]. The maximum likelihood method is applied to the black robin population providing important insights about the species demography to be further discussed from a conservation biology point of view in a subsequent paper.
The paper is organised as follows: in Section 2, we introduce TMAPs and describe the special case that we shall focus on. In Section 3 we perform model parameter estimation based on the average age-specific fertility and mortality rates, and in Section 4, we estimate the parameters based on individual age-specific fertility and mortality counts. In Section 5 we apply each method on a real-world example.
2 Transient Markovian arrival processes
Transient Markovian arrival processes (TMAPs) are two-dimensional Markovian processes on the state space , where is finite, combining the level process , which counts the number of arrivals in , with the phase process, {, which is a continuous-time Markov chain. The states are absorbing for all ; the other states are transient.
A TMAP is characterized by two rate matrices and and a non-negative rate vector . Feasible transitions are from to , for and at the rate , or from to for at the rate , or from to at rate . The first transitions (at rate ) are hidden: the phase of the individual changes but the level is not incremented. The second transitions (at rate ) are observable: a birth (arrival) is recorded, and the state of the individual may or may not change. The third transitions (at rate ) indicate the termination of the individual’s life.
The matrix and the vector are nonnegative, has nonnegative off-diagonal elements and strictly negative elements on the diagonal such that , where is an vector of ones. One also defines the initial probability vector , and we assume that , so that a.s. More details on TMAPs can be found in [8].
There is a total of entries in the matrices if no assumption is made on their structure. A special case of TMAP, called the acyclic transient Markov modulated Poisson process (ATMMPP), assumes
- •
individuals start their lifetime in phase 1 with probability one,
- •
they can only move from phase to phase or to phase 0, with respective rates for and for ,
- •
while in phase , they reproduce at rate and do not make any simultaneous phase transition at reproduction time.
With these assumptions we have , diag, and the only non-zero entries of are and
[TABLE]
There is a total of parameters in an ATMMPP.
The lifetime distribution of a TMAP is phase-type PH see [8]. Due to the structure of and in the ATMMPP case, this corresponds to a Coxian distribution. Such distributions are important as any acyclic phase-type distribution has an equivalent Coxian representation. Therefore, in terms of the lifetime distribution, the ATMMPP does not impose much restriction compared to the general TMAP.
3 Global population data
3.1 Available data and model equivalent
For this section, we assume the available data are estimates of the expected age-specific fertility rates, , and estimates of the expected age-specific mortality rates, , where denotes the age, that is, the period of time during the lifetime, and is the maximal age for which data are available. The method developed in this section can be generalised to -year age classes, details are provided in Section 7.2 of the Supplementary Material.
The rates and are interpreted respectively as the expected number of offspring per year from a parent at age and the probability that an individual who reached age dies within the year. We denote by and the equivalent quantities computed from the TMAP model. These functions have the following analytic expression, the proof of which is provided in Section 7.1 of the Supplementary Material.
Proposition 3.1**.**
The age-specific mortality and fertility rates in a TMAP with phase transition rate matrix are respectively given by
[TABLE]
3.2 Parameter estimation
In [9] only death rates were used to fit phase-type lifetime distributions. We extend this approach, estimating the model parameters by minimizing the sum of weighted squared errors
[TABLE]
where the weights are the observed probabilities of survival until age ,
[TABLE]
As age increases there may be less available data leading to higher variance. These weights balance the potential resulting heteroscedasticity. If the estimated age-specific rates and are computed as averages of uncorrelated raw observations, another simple choice of weights would be .
Since the functions and are non-linear in both the input variable and in the parameters of the TMAP, we are dealing with a weighted non-linear regression. If there is missing information in the data and no estimate exists for or for some age , then we set the corresponding term or to zero in the sum.
Remark 3.1**.**
The function corresponds to the hazard rate at age in survival analysis. Several hazard models have been considered to fit mortality data, such as the Gompertz-Makeham, the Siler, and the Heligman-Pollard models [3]. Similarly, several age-specific fertility models have been studied, including the Hadwiger, the Beta, and the Gamma models [13]. The functions and are not claimed to provide better fits than these models, however, as opposed to the known mortality and fertility models which are generally studied separately, and are performance measures coming from the same Markovian model, and are thus optimised together. The estimated Markovian model then corresponds to the best model fitting both the mortality and fertility data, and can be used to make a complete demographic study of the population, as shown in [4].**
3.3 Goodness of fit and optimal value of
In the global population data case we estimate the model parameters by minimizing the objective function (3.1). Therefore, a natural choice for the mean square error function is
[TABLE]
Here , , and are respectively the age-specific mortality function, the age-specific fertility function, and the age-specific survival function corresponding to the true model, and and are the equivalent functions corresponding to the estimated model. If we know the true model then the MSE can be estimated through resampling. Alternatively this could be estimated when we are given a collection of datasets each containing global population information.
The value of the MSE indicates of the goodness of fit of the model. When the true model is unknown we estimate the optimal number of phases by minimizing the MSE. When the true model is known, comparing the MSE for different values of informs us on the sensitivity of the output with respect to the number of phases, as illustrated in Section 7.4 in the Supplementary Material.
4 Individual demographic data
4.1 Available data
In this section, we assume the data are individual age-specific fertility and mortality counts in successive age-classes of length 444Successive age-classes are assumed of equal length, but though computationally convenient, this is not essential.. They consist of vectors (one for each individual) of the type
[TABLE]
of variable length, whose entries are interpreted as follows:
- •
if the individual had offspring while in the age-class ,
- •
if the individual died in the previous age-class , possibly after producing some offspring, and
- •
if the individual was alive at the beginning of the age-class but there is no (or incomplete) information on her progeny in that age-class.
4.2 Parameter estimation
Based on a sample of i.i.d. individual life vectors , we maximize the log-likelihood function
[TABLE]
where , and is the probability of observing the individual life vector , under the model parameter .
Let be the maximum number of offspring per age-class among the individuals in the sample. The probabilities can be written as matrix products involving the matrices and vectors , , , and defined as
[TABLE]
for and . As an illustrative example, consider the four life vectors
[TABLE]
By conditioning on the phases of the TMAP at the boundaries of the successive -year intervals, the probability of observing these vectors is
[TABLE]
Note that if is a vector of size with all entries equal to , then is the probability that the individual survives at least the first age-classes.
The quantities defined in (4.3)–(4.6) can be computed explicitly, as shown in the next proposition, whose proof is provided in Section 7.3 of the Supplementary Material.
Proposition 4.1**.**
For , the matrix and the vector are given by
[TABLE]
where is the th unit row vector of size , and
[TABLE]
The matrix and vector are given by
[TABLE]
Our MLE method generalises the results of Davison and Ramesh [2] who considered the parameter estimation of Markov modulated Poisson processes in the binary data case . To our knowledge, other parameter estimation methods for such processes are based on the observation of the successive inter-event times rather than on the number of events within successive time intervals, see for instance [14] and [15]. As confirmed in Figure 10, when the length of the time intervals decreases to zero, the estimates obtained with our method converge to those obtained with the usual method based on the observation of the successive inter-event times.
4.3 Goodness of fit and optimal number of phases
We consider three different criteria for choosing the optimal value of the number of phases. These criteria are compared on numerical examples in Section 7.4 in the Supplementary Material.
Akaike Information Criterion (AIC)
We choose the value of which minimizes the AIC defined as
[TABLE]
where the number of parameters an ATMMPP model with phases. This criterion deals with the trade-off between goodness of fit of the model and its complexity. One advantage is that it does not rely on the knowledge of the true model: the value of AIC can be directly computed from the log-likelihood of the estimated model given the data.
Cross-validation (CV)
We perform a -fold cross-validation over the data sample of individual life vectors (with typical value ). The idea is to randomly divide the data into equal-sized parts. We leave out part , fit the model to the other parts (combined), and then evaluate the likelihood of the left-out th part (test set) under the estimated parameters. We choose the model maximizing the mean test likelihood obtained by averaging the results for . Similar to the AIC, this method does not require us to know the true model.
Mean squared integrated loss (MSIL)
Let be the set of all life vectors with entries in . Any life vector with at least one entry equal to is interpreted as a (disjoint) union of vectors in . For any fixed number of phases , and a given sample of life vectors , the MLE method is used to estimate the parameters of the TMAP model. From the estimate we define a corresponding probability mass function over as
[TABLE]
The optimal number of phases is the value of minimizing the mean squared integrated loss, defined as
[TABLE]
Since the first term is independent of , the value of minimizing also minimizes the MSIL. The problem therefore reduces to estimating for each .
If the true model is known, then is known, and the expectations in can be estimated through resampling. If the true model is unknown, then a -fold cross-validation method can be applied to estimate . Let and be the th training set and test set, respectively. Let denote the probability mass function estimator using phases and training set . We have
[TABLE]
and since the sets are all drawn from the true distribution , we have
[TABLE]
The set of life vectors being infinite, the sums in (4.7) and (4.8) need to be modified in practice. For a given pair of integers and , we partition the set to form a new finite set such that The vectors are of length and have their entries in the finite set , so that
[TABLE]
They define equivalence classes in as follows:
- •
if for all , then
[TABLE]
in which case
- •
if for some , , then
[TABLE]
in which case is computed as given in the next Lemma.
Lemma 4.1**.**
For any such that for some indices , , we have
[TABLE]
where the vector is such that, for ,
[TABLE]
**Proof. **We know by the definition of and . This sum contains embedded sums of the form , which we rewrite as . Using
[TABLE]
and rearranging the terms then lead to (4.9).
Replacing by results in a different version of the MSIL criterion which selects the best model capturing differences in the number of children less than or equal to over the first age-classes. It is clear that the larger the age-class length , the smaller and the larger should be chosen in order for to be well approximated by . When , a possible choice of the partitioning parameters is taking as the ceiling of the expected lifetime plus one, and as the maximal expected number of children per age-class, that is,
[TABLE]
Another choice leading to smaller equivalence classes in is
[TABLE]
where is a covering probability, and is the standard error of the age-specific fertility rate at age . Formulae for year age-classes are analogous.
4.4 Confidence intervals for the model outputs
For any performance measure of the model , such as the mortality or fertility functions at age , empirical and theoretical pointwise confidence intervals can be constructed.
If the true model is known, the pointwise mean and standard deviation of can be estimated through resampling. This provides a confidence interval for each value of , and the width of the resulting confidence band gives us an indication of the stability of the estimated model, given the true model. If the true model is unknown, bootstrapping from the data sample substitutes resampling from the true model.
Asymptotic theoretical confidence intervals are found using the delta method,
[TABLE]
where
[TABLE]
is the observed information matrix. A pointwise confidence interval for is then given by
[TABLE]
5 Numerical applications
In this section, the two parameter estimation methods are applied to three types of illustrative examples. We first analyse artificial examples in which we simulate ATMMPPs, then we estimate their parameters based on the simulations, we make a goodness of fit analysis and compare the different criteria for choosing the optimal number of phases. We provide a summary of the results here and refer to Section 7.4 in the Supplementary Material for details. Next, we use real global female population data in different countries to estimate the model parameters. Finally, we fit an MBT using real individual demographic data on the black robin population, and we give a biological interpretation to our results.
We used the Matlab function fmincon to minimize the sum of weighted squared errors (3.1) in the global population data case, or to maximize the log-likelihood function (4.2) in the individual demographic data case, under the constraint of positive parameters. The algorithm requires an initial value (seed) for the parameters. A reasonable guess for the model parameters is
[TABLE]
and
[TABLE]
In order to minimize the risk to converge to local extrema, we started the algorithm with 25 different seeds obtained by adding random noise to the above values, and we chose the optimal solution among all.
5.1 Artificial examples
We considered three examples of ATMMPPs with or phases and we applied the two parameter estimation methods on data samples constructed by simulating trajectories of these models. As expected, the fits corresponding to the individual demographic data are much closer to the real model, and are associated to smaller confidence bands, than those corresponding to the global population data.
We observe that the MSE does not seem to be a satisfactory criterion to determine the optimal number of phases as the real value of never minimizes the MSE on these examples. In all cases, the AIC provides the correct answer most of the time, while the CV and MSIL show similar trends and slightly under-estimate the true value of . The parameters and in the MSIL were chosen according to (4.10) and the criterion turned out not to be sensitive to this choice as the optimal value of is the same for neighbouring values of and . All the details and figures can be found in Section 7.4 in the Supplementary Material.
5.2 Female families in different countries
In [4] the authors used real global population data on female human mortality and fertility rates corresponding to five-year ages-classes from different countries to fit MBTs with phases; in these models, each phase corresponds exactly to one age-class. Here we use the weighted non-linear regression method described in Section 7.2 to estimate the parameters of new MBTs with phases, and we compare the model age-specific mortality and fertility curves resulting from both approaches. Besides facilitating the comparison, considering the same number of phases allows us to start the optimisation algorithm with the most realistic initial parameter values given by the model values in [4]. We show the results for a supercritical country (Congo), an almost-critical country (USA) and a subcritical country (Japan) in Figure 1. We see that the new fits have substantially improved: the MSE is divided by a factor of 5.21 for Congo, 1.89 for the USA and 1.38 for Japan. We observe that the fits of the mortality curves are less satisfactory for the older ages; note that removing the weights (which are decreasing with age) do not improve the fits.
5.3 Black Robin population
The black robin is an endangered songbird species endemic to the Chatham Islands, an isolated archipelago located 800 kilometres East of New Zealand. By 1980, the population of black robins had declined to five birds, including only a single successful breeding pair, on Mangere Island [1]. Through intensive conservation efforts in 1980-1989 by the New Zealand Wildlife Service (now the Department of Conservation), the population recovered to 93 birds by spring 1990 [7]. Over the next decade (1990-1998), the population was closely monitored, but without human intervention. Nevertheless the population continued to grow rapidly to 197 adults by 1998, but after this period, the population growth slowed considerably and it only reached 239 adults in 2011 and 298 in 2014 [11].
For the conservation management of highly threatened species it is important to know the potential future viability, or survival probability, of a population, because if a population is not viable (i.e. fertility and survival rates are low), it will eventually become extinct. Population viability depends on reproductive rates and survival of individuals, but these rates may vary between sexes and across an individual’s life span (with age). Hence, the exact male-to-female ratio and the ages of each individual within a population will influence a population’s future viability. Reintroduction of a species into previously occupied parts of its former natural range is nowadays a common hands-on conservation method. In order to maximize the survival chances of the new population, it is necessary to know the optimal age distribution of the reintroduced population, which can only be designed based on a complete age-specific demographic analysis of the species. The black robin is an ideal species for which to develop these new statistical tools, because biologists have been collecting complete raw datasets on this bird species for several decades. An age-specific reorganisation of these data leads to a total of 433 life vectors for the monitoring period 2007-2014.
We performed different tests to determine the optimal number of phases to fit the black robin data. The results are shown in Figure 2, where we see that the optimal value is according to the AIC, according to the CV criterion, and according to the MSIL criterion with and (determined using (4.10)). In this case, the MSIL criterion is quite sensitive to the choice of and , as indicated in Table 1.
The model fits based on the global population data and on the individual demographic data (life vectors) with are compared in Figure 3. Since life vectors are available here, the corresponding models are the most informative. Black robins reach maturity at 1 or 2-years of age. The age-specific mortality curves show that mortality of black robins is the highest before they reach maturity and the lowest when they are 1 to 2 years old. Once birds reach 3 years of age, mortality rates do not increase dramatically with age, nor do fertility rates decline, which would support the hypothesis that there is no senescence. However, as few birds reach the old ages, the accuracy of the estimates obtained using global population data declines with age. Figure 4 shows the 95 pointwise confidence intervals for the estimates of the model outputs, obtained by bootstrapping 25 samples from the original sample of life vectors. We see that the confidence intervals are quite narrow, especially for the estimates obtained using the individual demographic data.
One of the most informative model output is the probability of extinction of a female family (that is, consisting only of female descendants) generated by a singe female, as a function of the age of that first female. This probability can be computed from the MBT model using any of the available algorithms (see for instance [5, 6]) and is shown in Figure 5. The curve highlights the combined effect of the age-specific mortality and fertility rates on the viability of the female family, hence of the whole population, by extension. We see that, if we were to found a new population starting with a single female bird, in order to maximize the survival chance of the population, the optimal age of the initial female should be around one year old.
6 Future directions
There are a number of directions for future research, particularly for the study of global population data. Cross-validation is delicate in that case as there are generally few data points. Leave-one-out cross-validation could be used, where we leave one age-class out. Possible complications would then include the choice of the weights as these explicitly use the death rates for all age-classes.
Further methods for analysing global population data could be developed for when the age-specific mortality and fertility rates follow some specific distributions. For example, we may know that the birth rate and death rate at each age lie in an exponential family of distributions. The analysis could then involve a parallel process of estimating these distributions and using these distributions to simulate new samples of global population data. Each new sample can then be used to do parameter estimation and construct confidence intervals. Alternatively, the theory developed for finding confidence intervals for weighted non-linear regression methods could be used in this case; unfortunately, this process is not computationally straightforward.
Acknowledgements
Sophie Hautphenne thanks the Australian Research Council (ARC) for support through the Discovery Early Career Researcher Award DE150101044. The black robin research was funded by the New Zealand Foundation for Research, Science and Technology (UOCX0601) to Melanie Massaro, by the School of Biological Sciences, University of Canterbury, by the Brian Mason Scientific and Technical Trust, and by the Mohamed bin Zayed Species Conservation Fund. This research was only possible with permission from the Chatham Island Conservation Board and the logistic help of the Department of Conservation.
7 Supplementary Material
7.1 Proof of Proposition 3.1
Let be the lifetime of an individual and be the survival function. Since follows a PH distribution,
[TABLE]
The probability of death at age , , can thus be calculated as
[TABLE]
It is shown in [8] that the mean number of events until time in a TMAP started in phase at time 0 is given by
[TABLE]
Let denote the number of events in the TMAP in the time interval . By time-homogeneity of the TMAP,
[TABLE]
The mean number of offspring generated by an individual at age can thus be calculated as
[TABLE]
7.2 Global population data with year age classes
In demography, the available data often consist of age-specific fertility and mortality rates over year age-classes with , that is,
- •
the expected number of offspring per year from a parent in age-class , denoted as , and
- •
the probability that an individual who reached the age-class dies within the year, denoted as .
We extend the definition of and to year age-classes and define the functions and , computed from the TMAP model, as follows:
[TABLE]
It is a simple matter to generalise Proposition 3.1 to obtain
Corollary 7.1**.**
[TABLE]
The correspondence between and and the mortality and fertility rates in age-class is given in the next Lemma.
Lemma 7.1**.**
[TABLE]
where the symbol has to be interpreted as “is the model equivalent of”.
Proof.
The model function is the probability that an individual who reached age survives at least until age , that is, survives successive one-year age intervals, which occurs with probability .
The model function can be rewritten as
[TABLE]
which completes the proof. ∎
In order to estimate the model parameters in this case, the objective function (3.1) then needs to be modified according to Corollary 7.1 and Lemma 7.1.
7.3 Proof of Proposition 4.1
In order to compute , we actually compute for , and for any where
[TABLE]
and observe that . It is well known from the theory of MAPs that the probability generating function is given by the matrix exponential
[TABLE]
Since P(k,t)=(1/k!)[\partial^{k}P^{*}(z,t)/(\partial z)^{k}]\big{|}_{z=0} for any , we need to take the derivatives of the matrix exponential with respect to . The scalar rule of exponential differentiation only holds here if and commute, which is generally not the case. Instead, we first differentiate with respect to ,
[TABLE]
and we then take successive derivatives of this equation with respect to :
[TABLE]
This system of partial derivative equations can be rewritten as an ordinary differential equation for the unknown matrix containing the partial derivatives of with respect to ,
[TABLE]
whose solution is
[TABLE]
Taking and denoting
[TABLE]
we obtain
[TABLE]
Then, is obtained by conditioning on the time when the individual dies,
[TABLE]
Next,
[TABLE]
where , and finally
[TABLE]
7.4 Further details on the artificial examples
We consider three examples of ATMMPPs, simulating trajectories of these processes for units of time. The different parameter values are summarized in Table 2.
For each example, we associated each simulated trajectory of the ATMMPP with a life vector by counting the number of births falling in successive -year intervals. Here we took , so each entry of the vectors corresponds to a specific age-classes and the vectors have a total of entries. This produced samples of individual life vectors of the form (4.1). The average age-specific fertility and mortality rates and , for , were computed directly from these samples.
We first performed a goodness of fit analysis on Example 1. We set (the true number of phases), and we used the global population data and , and the full sample of life vectors, to estimate the model parameters using the corresponding statistical method, leading to two different parameter estimates and . In Figure 6 we compare the performance measures and corresponding to the age-specific mortality and fertility curves, obtained with the two different estimation methods. To further assess the accuracy of the estimates, we re-sampled 50 datasets of size from the true model, and we show in Figure 7 the mean curves compared to the real ones, as well as the corresponding pointwise confidence intervals. In Figure 8, we perform the same analysis by bootstrapping 50 times from a single dataset instead of resampling. We conclude from Figures 6, 7 and 8 that, as expected, the fits corresponding to the individual demographic data are much closer to the real model, and are associated to smaller confidence bands, than those corresponding to the global population data. In Figure 9 we show the theoretical pointwise confidence intervals given by (4.12) for the fits obtained using individual demographic data; these are comparable to those shown in Figure 7. Finally, in Figure 10 we compare the fits based on life vectors with different age-class lengths to those based on the observation of the successive inter-event times. We see that as decreases to zero, the estimates obtained with our method converge to those based on the successive inter-event times.
For all three examples, Figure 11 shows the sensitivity of the MSE with respect to the number of phases in the fitted model. This also highlights the fact that the MSE does not seem to be a satisfactory criterion to determine the optimal number of phases as the real value of never minimizes the MSE on these examples. Finally, Figure 12 compares all criteria to decide upon the optimal number of phases in the individual demographic data case. In all cases, the AIC provides the correct answer most of the time, while the CV and MSIL show similar trends and slightly under-estimate the true value of . The parameters and in the MSIL were chosen according to (4.10) and the criterion turned out not to be sensitive to this choice as the optimal value of is the same for neighbouring values of and .
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] D. Butler and D. Merton. The Black Robin: Saving the World’s Most Endangered Bird . Oxford University Press, Auckland, 1992.
- 2[2] A. C. Davison and N. I. Ramesh. Some models for discretized series of events. Journal of the American Statistical Association , 91(434): 601–609, 1996.
- 3[3] T. B. Gage and C. J. Mode. Some laws of mortality: how well do they fit? Human Biology , 65(3) 445–461, 1993.
- 4[4] S. Hautphenne and G. Latouche. The Markovian binary tree applied to demography. Journal of mathematical biology , 64(7):1109–1135, 2012.
- 5[5] S. Hautphenne, G. Latouche, and M.-A. Remiche. Newton’s iteration for the extinction probability of a Markovian Binary Tree. Linear Algebra and its Applications , 428:2791–2804, 2008.
- 6[6] S. Hautphenne, G. Latouche, and M.-A. Remiche. Algorithmic approach to the extinction probability of branching processes. Methodology and Computing in Applied Probability . 13(1), 171-192, 2011.
- 7[7] E. S. Kennedy, C. E. Grueber, R. P. Duncan, and I. G. Jamieson. Severe inbreeding depression and no evidence of purging in an extremely inbred wild species—the Chatham Island black robin. Evolution , 68(4):987–995, 2014.
- 8[8] G. Latouche, M.-A. Remiche, and P. Taylor. Transient Markov arrival processes. Annals of Applied Probability , pages 628–640, 2003.
