Bayesian significance test for discriminating between survival distributions
Cachimo Assane, Basilio Pereira, Carlos Pereira

TL;DR
This paper evaluates the Fully Bayesian Significance Test (FBST) for selecting the best survival distribution among lognormal, gamma, and Weibull by using a mixture model and applying it to both simulated and real patient data.
Contribution
It introduces a Bayesian mixture model with reparametrized distributions for survival analysis and applies FBST to discriminate among competing survival distributions.
Findings
FBST effectively distinguishes between survival distributions in simulations.
The mixture model provides clear posterior weights indicating the most suitable distribution.
Application to real patient data demonstrates practical utility in medical survival analysis.
Table 1. Columns: % of Rc†, Model (lognormal, gamma, Weibull within each censoring level), % of Cd‡. †Percentage of right-censoring. ‡Percentage of correct decision.
Table 2. Columns: Comparison, Null hypothesis, Evidence in favor of null hypothesis (e-value, p-value*). *p-value calculated according to Diniz et al. (2012).
Table 3. Columns: Parameter, Mean, SD, Median, and two percentile columns.
Table 4. Columns: Hypothesis, e-value, p-value*. *p-value calculated according to Diniz et al. (2012).
Table 5. Columns: Parameter, Mean, SD, Median, and two percentile columns.
Cachimo Combo Assane
Universidade Federal do Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
Basilio de Bragança Pereira
Universidade Federal do Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
Carlos Alberto de Bragança Pereira
Universidade de São Paulo (USP), São Paulo, Brazil
Abstract
An evaluation of the FBST, the Fully Bayesian Significance Test, restricted to survival models is the main objective of the present paper. A survival distribution should be chosen among three celebrated ones: lognormal, gamma, and Weibull. For this discrimination, a linear mixture of the three distributions, for which the mixture weights are defined by a Dirichlet distribution of order three, is an important tool: the FBST is used to test the hypotheses defined on the mixture weights space. Another feature of the paper is that all three distributions are reparametrized so that all six parameters (two for each distribution) are written as functions of the mean and the variance of the population being studied. Note that the three distributions share the same two parameters in the mixture model. The mixture density then has four parameters: the same two for the three discriminating densities and two for the mixture weights. Some numerical results from simulations with right-censored data are considered. The lognormal-gamma-Weibull model is also applied to a real study whose dataset is composed of survival times of patients in the end stage of chronic kidney failure subjected to hemodialysis procedures, with data from Rio de Janeiro hospitals. The posterior density of the weights indicates an order of the mixture weights, and the FBST is used for discriminating between the three survival distributions.
Keywords: Model choice; Separate Models; Survival distributions; Mixture model; Significance test; FBST
1 Introduction
In many scientific disciplines, researchers are constantly faced with the fundamental problem of choosing among alternative statistical models. The Neyman-Pearson theory of hypothesis testing applies only if the models belong to the same family of distributions. Alternatively, special procedures are required if the models belong to families that are separate (or non-nested) in the sense that an arbitrary member of one family cannot be obtained as a limit of members of the other. The set of separate families of probability distributions includes the ones used here: lognormal, gamma, and Weibull models (Pereira, 1981; Araujo and Pereira, 2007; Pereira and Pereira, 2017) which have been used widely to describe survival data (Lawless, 2002; Lee and Wang, 2003).
A considerable amount of research on separate families of hypotheses has been carried out since the fundamental work of Cox (1961, 1962), who first dealt with the problem. For reviews and references, see Araujo et al. (2005); Araujo and Pereira (2007); and Pereira and Pereira (2017).
The Fully Bayesian Significance Test (FBST), introduced by Pereira and Stern (1999), is an alternative to tests based on the Bayes factor or on the classical p-value, mostly for the case of precise hypotheses. The basis for the FBST is an index known as the e-value (e stands for evidence) that measures the inconsistency of the hypothesis. For this, it considers the tangent set, $T^{*}$: the set of all parameter values whose posterior density is greater than the posterior density of every point that satisfies the hypothesis. For reviews and further references on the FBST, see Pereira et al. (2008) and Stern and Pereira (2014). For a few interesting applications illustrating the use of e-values and the FBST in practical problems, see Diniz et al. (2012), Lauretto et al. (2003), Lauretto et al. (2007), and Pereira and Stern (1999).
In the present work, we consider the FBST for discriminating between the lognormal, gamma and Weibull distributions. We formulate this problem in the context of a linear mixture model, as suggested by Cox (1961). This means that the models under comparison are considered as components of a finite mixture model. The FBST is used for testing hypotheses defined on the mixture weights space. The e-value is the complement of the posterior probability of the tangent set $T^{*}$, that is, $ev = 1 - P(\theta \in T^{*} \mid x)$.
Additionally, the density functions of the mixture components are reparametrized in terms of the mean and the variance of the population. Hence, the models under discrimination share common parameters (Kamary et al., 2014; Pereira and Pereira, 2017). A standard Bayesian approach to finite mixture models is to consider a different pair of parameters for each of these models and to adopt independent prior distributions for each pair of parameters and a Dirichlet prior on the mixture weights (Lauretto and Stern, 2005; Lauretto et al., 2007). However, since the comparison between the models is based on the same dataset and on the same sample, we believe that it would be inappropriate to consider different means and variances for these models. Note that this reparametrization reduces the number of parameters to be estimated: in our case, including the weights, from eight (six distribution parameters plus two free weights) to only four.
To illustrate the procedure, numerical results based on simulated right-censored survival times are considered. In addition, the lognormal-gamma-Weibull mixture model is applied to a real dataset of patients from Rio de Janeiro hospitals with end-stage chronic kidney failure who received hemodialysis.
Section 2 presents a brief review of basic concepts and notation for survival analysis. The parametric distributions used in this paper are also described. Section 3 reviews the basic concepts of the FBST. Section 4 discusses the FBST formulation for discriminating between survival distributions in the context of mixture models. Section 5 presents the results of the simulation study. Section 6 applies the lognormal-gamma-Weibull mixture model to the real dataset. Final remarks are presented in Section 7.
2 Survival analysis
2.1 Basic concepts and notation
Survival analysis is concerned with the analysis of time to occurrence of a certain event of interest, such as failure, death, relapse or development of a given disease.
Let $T$ be a non-negative random variable representing the time until some event of interest. There are three functions of primary interest used to characterize the distribution of $T$, namely the survival function, the probability density function and the hazard function (Lee and Wang, 2003).
The survival function, denoted by $S(t)$, is defined as the probability that an individual survives beyond time $t$:
$$S(t) = P(T > t) = 1 - F(t),$$
where $F(t)$ is the distribution function of $T$. Note that $S(t)$ is a nonincreasing continuous function of time with $S(0) = 1$ and $S(\infty) = \lim_{t \to \infty} S(t) = 0$.
The probability density function, denoted by $f(t)$, is the probability of failure in a small interval per unit time. It can be expressed as
$$f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t} = -\frac{dS(t)}{dt}.$$
The hazard function, denoted by $h(t)$, represents the probability of failure during a very small time interval, assuming that the individual has survived to the beginning of the interval:
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}.$$
This function is also known as the conditional failure rate. The cumulative hazard function is defined as
$$H(t) = \int_{0}^{t} h(u)\,du = -\log S(t).$$
Therefore, when $t = 0$, $S(t) = 1$ and $H(t) = 0$; and when $t \to \infty$, $S(t) \to 0$ and $H(t) \to \infty$.
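The identities linking the hazard, the cumulative hazard and the survival function can be checked numerically. The sketch below is only an illustration: it assumes a Weibull distribution with a hypothetical shape and scale (neither value comes from the paper) and integrates the hazard with a simple midpoint rule.

```python
import math

def weibull_pdf(t, shape, scale):
    """Weibull density with the given shape and scale (illustrative values only)."""
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def weibull_survival(t, shape, scale):
    """Survival function S(t) = P(T > t)."""
    return math.exp(-((t / scale) ** shape))

def hazard(t, shape, scale):
    """Hazard computed from its definition h(t) = f(t) / S(t)."""
    return weibull_pdf(t, shape, scale) / weibull_survival(t, shape, scale)

def cumulative_hazard(t, shape, scale, steps=20000):
    """Midpoint-rule approximation of H(t), the integral of h(u) over [0, t]."""
    width = t / steps
    return width * sum(hazard((i + 0.5) * width, shape, scale) for i in range(steps))

if __name__ == "__main__":
    shape, scale, t = 1.5, 2.0, 3.0  # hypothetical values
    print(cumulative_hazard(t, shape, scale))
    print(-math.log(weibull_survival(t, shape, scale)))
```

The two printed numbers should agree closely, which is the relation $H(t) = -\log S(t)$ in numerical form.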
2.2 Parametric survival distributions
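In this paper, we consider the FBST for discriminating between the lognormal, gamma and Weibull distributions, which are the ones most frequently used in modeling survival data (Lawless, 2002; Lee and Wang, 2003). The probability density functions, the survival functions and the hazard functions of these distributions are highlighted below.
i) Let $T$ be a lognormal random variable with parameters $(\nu, \tau^2)$, the mean and variance of $\log T$, denoted by $T \sim LN(\nu, \tau^2)$. Then, for $t > 0$,
$$f(t) = \frac{1}{t\,\tau\sqrt{2\pi}} \exp\left\{-\frac{(\log t - \nu)^2}{2\tau^2}\right\}, \qquad S(t) = 1 - \Phi\left(\frac{\log t - \nu}{\tau}\right), \qquad h(t) = \frac{f(t)}{S(t)},$$
where $\Phi$ is the standard normal distribution function.
ii) If $T$ has a gamma distribution with shape $\alpha$ and rate $\beta$, denoted by $T \sim G(\alpha, \beta)$, then, for $t > 0$,
$$f(t) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, t^{\alpha - 1} e^{-\beta t}, \qquad S(t) = 1 - \frac{1}{\Gamma(\alpha)} \int_{0}^{\beta t} u^{\alpha - 1} e^{-u}\,du, \qquad h(t) = \frac{f(t)}{S(t)}.$$
iii) If $T$ has a Weibull distribution with shape $\kappa$ and scale $\eta$, denoted by $T \sim W(\kappa, \eta)$, then, for $t > 0$,
$$f(t) = \frac{\kappa}{\eta} \left(\frac{t}{\eta}\right)^{\kappa - 1} \exp\left\{-\left(\frac{t}{\eta}\right)^{\kappa}\right\}, \qquad S(t) = \exp\left\{-\left(\frac{t}{\eta}\right)^{\kappa}\right\}, \qquad h(t) = \frac{\kappa}{\eta} \left(\frac{t}{\eta}\right)^{\kappa - 1}.$$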
3 Fully Bayesian Significance Test (FBST)
The FBST of Pereira and Stern (1999), which is reviewed in Pereira et al. (2008), is a Bayesian version of significance testing, as considered by Cox (1977) and Kempthorne (1976), for precise (or sharp) hypotheses.
First, let us consider a real parameter $\theta$, a point in the parameter space $\Theta \subset \mathbb{R}$, and an observation $x$ of the random variable $X$. A frequentist looks for the set $I$ of sample points that are at least as inconsistent with the hypothesis as $x$ is. A Bayesian looks for the tangential set $T^{*} \subset \Theta$ (Pereira et al., 2008), which is a set of parameter points that are more consistent with the observed $x$ than the hypothesis is. An example of a sharp hypothesis in a parameter space of the real line is of the type $H: \theta = \theta_0$. The evidence value in favor of $H$ for a frequentist is the usual p-value, $P(X \in I \mid \theta_0)$, whereas for a Bayesian, the evidence in favor of $H$ is the e-value, $ev = 1 - P(\theta \in T^{*} \mid x)$.
In the general case of multiple parameters, $\Theta \subset \mathbb{R}^{k}$, let the posterior distribution for $\theta$ given $x$ be denoted by $g(\theta \mid x) \propto g(\theta)\,L(x \mid \theta)$, where $g(\theta)$ is the prior probability density of $\theta$ and $L(x \mid \theta)$ is the likelihood function. In this case, a sharp hypothesis is of the type $H: \theta \in \Theta_H \subset \Theta$, where $\Theta_H$ is a subspace of smaller dimension than $\Theta$. Letting $\sup_{H}$ denote the supremum over $\Theta_H$, we define the general Bayesian evidence and the tangential set, $T^{*}$, as follows:
$$g^{*} = \sup_{H} g(\theta \mid x) \quad \text{and} \quad T^{*} = \{\theta \in \Theta : g(\theta \mid x) > g^{*}\}. \tag{3.1}$$
The Bayesian evidence value against $H$ is the posterior probability of $T^{*}$,
$$\overline{ev} = P(\theta \in T^{*} \mid x) = \int_{T^{*}} g(\theta \mid x)\,d\theta; \quad \text{consequently,} \quad ev = 1 - \overline{ev}. \tag{3.2}$$
It is important to note that evidence that favors $H$ is not evidence against the alternative, $A$, because it is not a sharp hypothesis. This interpretation also holds for p-values in the frequentist paradigm. As in Pereira et al. (2008), we would like to point out that this Bayesian significance index uses only the posterior distribution, with no need for additional artifacts such as the inclusion of positive prior probabilities for the hypotheses or the elimination of nuisance parameters. The computation of the e-values does not require asymptotic methods, and the only technical tools needed are numerical optimization and integration methods.
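To make the two computational steps concrete (find the supremum of the posterior under the hypothesis, then integrate the posterior over the tangential set), here is a minimal sketch for a toy problem that is not taken from the paper: a Beta posterior for a single proportion and the sharp hypothesis that the proportion equals a fixed value `theta0`. The function names, the Beta parameters and the Monte Carlo integration are all assumptions made for illustration.

```python
import math
import random

def beta_logpdf(theta, a, b):
    """Log-density of a Beta(a, b) distribution at theta in (0, 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)

def fbst_evalue(a, b, theta0, draws=100000, seed=1):
    """Monte Carlo e-value for H: theta = theta0 under a Beta(a, b) posterior.

    The hypothesis is a single point, so the supremum of the posterior
    under H is simply the posterior density at theta0.
    """
    rng = random.Random(seed)
    log_g_star = beta_logpdf(theta0, a, b)
    in_tangent_set = 0
    for _ in range(draws):
        # Clamp to avoid log(0) in the unlikely event of a boundary draw.
        theta = min(max(rng.betavariate(a, b), 1e-12), 1 - 1e-12)
        if beta_logpdf(theta, a, b) > log_g_star:
            in_tangent_set += 1
    # Evidence in favor of H is one minus the posterior mass of the tangential set.
    return 1.0 - in_tangent_set / draws

if __name__ == "__main__":
    # Hypothetical posterior: uniform prior updated with 12 successes and 8 failures.
    print(fbst_evalue(13, 9, 0.5))
```

In the mixture setting of the paper the posterior is multivariate and the supremum requires numerical optimization, but the structure of the calculation is the same.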
4 Mixture of survival models
Let us consider a dataset $\mathbf{t} = (t_1, \dots, t_n)$ and $K$ alternative parametric survival distributions with densities $f_k(t \mid \psi_k)$, $k = 1, \dots, K$. Here, the $\psi_k$, $k = 1, \dots, K$, are unknown (vector) parameters and the families of distributions are separate. The problem of interest is to measure the evidence in favor of each model for fitting the dataset. As suggested by Cox (1961), we can consider a general model including all candidate distributions where the choice of a specific distribution is a special case. In this work, we formulate the FBST for the linear mixture of the survival models as a selection procedure. Denoting $\boldsymbol{\psi} = (\psi_1, \dots, \psi_K)$, the density function for the $K$-component mixture model is
$$f(t \mid \boldsymbol{\psi}, \mathbf{p}) = \sum_{k=1}^{K} p_k\, f_k(t \mid \psi_k), \qquad p_k \ge 0, \quad \sum_{k=1}^{K} p_k = 1, \tag{4.1}$$
where $\mathbf{p} = (p_1, \dots, p_K)$ is the vector of the mixture weights.
In the present work, the density functions of the mixture components in (4.1) are reparametrized in terms of the mean $\mu$ and the variance $\sigma^2$ of the population. Hence, the models under comparison share common parameters (Kamary et al., 2014; Pereira and Pereira, 2017). The main reason for this reparametrization is that, since the comparison between the models is based on the same dataset and on the same sample, we believe that it would be inappropriate to consider different means and variances for these models, as is commonly done in the traditional Bayesian approach to finite mixture models. Therefore, we have $\boldsymbol{\theta} = (\mu, \sigma^2, \mathbf{p})$ denoting all parameters of the mixture model, where $\mu$ and $\sigma^2$ are the connecting parameters, with $\mathbf{p}$ corresponding to the vector of the mixture weights.
Assuming that the $t_i$ are conditionally (on the parameter) independent, the likelihood function is defined as
$$L(\boldsymbol{\theta} \mid \mathbf{t}) = \prod_{i=1}^{n} \sum_{k=1}^{K} p_k\, f_k(t_i \mid \mu, \sigma^2). \tag{4.2}$$
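The families of distributions considered include the lognormal, gamma and Weibull models. Hence, the relationship between the parameters of these models through $\mu$ and $\sigma^2$ is described as follows.
(i) Let $T$ be a $LN(\nu, \tau^2)$, with probability density function
$$f_1(t \mid \nu, \tau^2) = \frac{1}{t\,\tau\sqrt{2\pi}} \exp\left\{-\frac{(\log t - \nu)^2}{2\tau^2}\right\}, \quad t > 0.$$
We then have
$$\mu = \exp\left(\nu + \frac{\tau^2}{2}\right), \quad \sigma^2 = \left(e^{\tau^2} - 1\right)\exp\left(2\nu + \tau^2\right), \quad \text{so that} \quad \tau^2 = \log\left(1 + \frac{\sigma^2}{\mu^2}\right), \quad \nu = \log\mu - \frac{\tau^2}{2}. \tag{4.3}$$
(ii) Let $T$ be a $G(\alpha, \beta)$, with shape $\alpha$, rate $\beta$ and probability density function
$$f_2(t \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, t^{\alpha - 1} e^{-\beta t}, \quad t > 0.$$
Therefore
$$\mu = \frac{\alpha}{\beta}, \quad \sigma^2 = \frac{\alpha}{\beta^2}, \quad \text{so that} \quad \alpha = \frac{\mu^2}{\sigma^2}, \quad \beta = \frac{\mu}{\sigma^2}.$$
(iii) When $T \sim W(\kappa, \eta)$, with shape $\kappa$, scale $\eta$ and probability density function
$$f_3(t \mid \kappa, \eta) = \frac{\kappa}{\eta} \left(\frac{t}{\eta}\right)^{\kappa - 1} \exp\left\{-\left(\frac{t}{\eta}\right)^{\kappa}\right\}, \quad t > 0,$$
then
$$\mu = \eta\,\Gamma\left(1 + \frac{1}{\kappa}\right), \quad \sigma^2 = \eta^2\left[\Gamma\left(1 + \frac{2}{\kappa}\right) - \Gamma^2\left(1 + \frac{1}{\kappa}\right)\right], \quad \text{so that} \quad \eta = \frac{\mu}{\Gamma(1 + 1/\kappa)} \quad \text{and} \quad \frac{\Gamma(1 + 2/\kappa)}{\Gamma^2(1 + 1/\kappa)} = 1 + \frac{\sigma^2}{\mu^2}.$$
In order to find $\kappa$, the Newton-Raphson method can be used to solve the last nonlinear equation. Here, we use the nleqslv function in the R package of the same name.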
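The mapping from a common mean and variance to the native parameters of each component can be sketched as follows. This is an illustration, not the authors' R code: it assumes the usual log-scale parametrization of the lognormal, a shape/rate gamma and a shape/scale Weibull, and it swaps the Newton-Raphson solver (nleqslv) for a plain bisection on the Weibull shape. All function names are hypothetical.

```python
import math

def lognormal_params(mean, var):
    """Return (log-mean, log-variance) of a lognormal with the given mean and variance."""
    log_var = math.log(1.0 + var / mean ** 2)
    log_mean = math.log(mean) - 0.5 * log_var
    return log_mean, log_var

def gamma_params(mean, var):
    """Return (shape, rate) of a gamma with the given mean and variance."""
    return mean ** 2 / var, mean / var

def weibull_params(mean, var, lo=0.05, hi=50.0, tol=1e-12):
    """Return (shape, scale) of a Weibull with the given mean and variance.

    The shape solves Gamma(1 + 2/k) / Gamma(1 + 1/k)^2 = 1 + var / mean^2.
    The left side decreases in k, so bisection on [lo, hi] is enough
    as long as the solution lies inside that bracket.
    """
    target = math.log(1.0 + var / mean ** 2)

    def gap(shape):
        return math.lgamma(1 + 2 / shape) - 2 * math.lgamma(1 + 1 / shape) - target

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid  # implied variance too large, so the shape must increase
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = mean / math.gamma(1 + 1 / shape)
    return shape, scale

if __name__ == "__main__":
    mean, var = 2.0, 4.0  # hypothetical values
    print(lognormal_params(mean, var))
    print(gamma_params(mean, var))
    print(weibull_params(mean, var))
```

With a mean of 2 and a variance of 4 the squared coefficient of variation is 1, so both the gamma and the Weibull reduce to an exponential distribution with mean 2, which gives a quick sanity check.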
A special feature of survival data is that survival times are frequently censored. The survival time of an individual is said to be censored when the event of interest has not been observed for that individual, but is known only to occur in a certain period of time. There are various categories of censoring, such as right censoring, left censoring and interval censoring (see Klein and Moeschberger (2003) for more details). In this paper, we restrict ourselves to data in which the survival times are subject to right censoring, which is the most common censoring mechanism in medical research.
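In the model for right-censored data, it is convenient to consider the following notation. Each individual $i$ is assumed to have an event time $T_i$ and a censoring time $C_i$. The observations consist of $(t_i, \delta_i)$, $i = 1, \dots, n$, where $t_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$, indicating whether $T_i$ was observed ($\delta_i = 1$) or not ($\delta_i = 0$).
Note that the likelihood function given by (4.2) is for uncensored (or exact) observations. Assuming noninformative censoring, i.e., independence between $T_i$ and $C_i$, the likelihood function for right-censored observations is
$$L(\boldsymbol{\theta} \mid \mathbf{t}, \boldsymbol{\delta}) = \prod_{i=1}^{n} \left[\sum_{k=1}^{K} p_k\, f_k(t_i \mid \mu, \sigma^2)\right]^{\delta_i} \left[\sum_{k=1}^{K} p_k\, S_k(t_i \mid \mu, \sigma^2)\right]^{1 - \delta_i},$$
where $S_k$ is the survival function associated with the mixture component $k$.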
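A right-censored mixture log-likelihood of this form can be sketched in a few lines. To stay within the Python standard library, the example below uses only a two-component lognormal-Weibull mixture with native parameters (the gamma survival function needs an incomplete gamma routine); the function names and the tiny dataset are hypothetical.

```python
import math

def lognormal_pdf(t, log_mean, log_sd):
    z = (math.log(t) - log_mean) / log_sd
    return math.exp(-0.5 * z * z) / (t * log_sd * math.sqrt(2 * math.pi))

def lognormal_sf(t, log_mean, log_sd):
    # Survival function 1 - Phi(z), written with the complementary error function.
    z = (math.log(t) - log_mean) / log_sd
    return 0.5 * math.erfc(z / math.sqrt(2))

def weibull_pdf(t, shape, scale):
    z = t / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

def weibull_sf(t, shape, scale):
    return math.exp(-((t / scale) ** shape))

def censored_mixture_loglik(times, events, weights, pdfs, sfs):
    """Log-likelihood of a finite mixture under right censoring.

    events[i] is 1 for an observed event and 0 for a right-censored time.
    pdfs and sfs are lists of one-argument callables, one per component.
    """
    total = 0.0
    for t, observed in zip(times, events):
        components = pdfs if observed == 1 else sfs
        total += math.log(sum(p * g(t) for p, g in zip(weights, components)))
    return total

if __name__ == "__main__":
    times, events = [1.0, 3.0], [1, 0]  # hypothetical data
    pdfs = [lambda t: lognormal_pdf(t, 0.3, 0.8), lambda t: weibull_pdf(t, 1.0, 2.0)]
    sfs = [lambda t: lognormal_sf(t, 0.3, 0.8), lambda t: weibull_sf(t, 1.0, 2.0)]
    print(censored_mixture_loglik(times, events, [0.5, 0.5], pdfs, sfs))
```

Observed events contribute the mixture density and censored times contribute the mixture survival function, mirroring the two bracketed factors of the censored likelihood.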
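Assuming independence, the joint prior density function of $\boldsymbol{\theta}$ is given by $g(\boldsymbol{\theta}) = g(\mu)\,g(\sigma^2)\,g(\mathbf{p})$. Therefore, according to the Bayesian paradigm, the posterior density of $\boldsymbol{\theta}$ is
$$g(\boldsymbol{\theta} \mid \mathbf{t}, \boldsymbol{\delta}) \propto g(\mu)\,g(\sigma^2)\,g(\mathbf{p})\,L(\boldsymbol{\theta} \mid \mathbf{t}, \boldsymbol{\delta}).$$
In this paper, the prior distributions for the connecting parameters, $\mu$ and $\sigma^2$, are assumed to be independent gamma distributions, both with a mean of one and a variance of 100, that is, with shape and rate both equal to 0.01 (Pereira and Pereira, 2017). For the mixture weights, we use a Dirichlet prior when all families of models are considered ($K = 3$) or a Beta prior with parameters (1,1) (uniform) for any pairwise combination of models ($K = 2$).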
In order to measure the evidence in favour of each model, the hypotheses on the mixture weights are tested (Kamary et al., 2014; Pereira and Pereira, 2017).
The hypothesis specifying that $T$ has the density function $f_k$ is equivalent to
$$H: p_k = 1. \tag{4.13}$$
On the other hand, the hypothesis that $T$ does not have the density $f_k$ is equivalent to
$$H: p_k = 0. \tag{4.14}$$
The alternative hypotheses to (4.13) and (4.14) are $A: p_k < 1$ and $A: p_k > 0$, respectively, which are not sharp.
The FBST procedure is used to test $H$, according to the expressions (3.1) and (3.2). For the optimization step, we used the conjugate gradient method (Fletcher and Reeves, 1964). In order to perform the integration over the posterior measure, we used the Adaptive Metropolis Markov chain Monte Carlo (MCMC) algorithm of Haario et al. (2001).
In this paper, the implementation of the Bayesian models is carried out using the LaplacesDemon R package. LaplacesDemon is an open-source package that provides a complete environment for simulation in Bayesian inference (Statisticat, LLC, 2016).
5 Simulations
In this section we present some numerical results based on simulated right-censored survival times in order to evaluate the performance of the FBST for discriminating between the survival distributions via the lognormal-gamma-Weibull (LGW) mixture model. The main purpose is to measure the convergence rate of correct decisions concerning the identification of the true model used to generate the survival times.
The simulations of this paper were performed on an Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz computer.
5.1 Simulation scheme of sample points
Let $H_1$, $H_2$ and $H_3$ be the hypotheses specifying the probability density functions of the lognormal, gamma and Weibull distributions, respectively. From each distribution, we generate samples of several sizes $n$. Each sample contains a desired proportion of right-censored observations.
The steps used to simulate a sample, $(t_i, \delta_i)$, $i = 1, \dots, n$, of size $n$, in which part of the observations is right-censored, are shown below. For this example, we assume that the true survival time has a lognormal distribution.
1. Assign values to the parameters $\mu$ and $\sigma^2$;
2. Calculate the lognormal parameters using the expressions (4.3);
3. For $i = 1, \dots, n$,
- Generate the survival time $T_i$ from the lognormal distribution;
- Generate the right-censoring time $C_i$ from an exponential distribution, i.e., $C_i \sim \mathrm{Exp}(\lambda)$, where the parameter $\lambda$ is chosen such that approximately a desired percentage of simulated observations are right-censored;
- Obtain the observed time $t_i = \min(T_i, C_i)$;
- Create an indicator random variable $\delta_i = I(T_i \le C_i)$.
Using this generated sample, we obtain the posterior samples for the mixture parameters from the Adaptive Metropolis algorithm and we use the FBST to calculate the evidence measures in favor of each model.
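The sampling steps above can be sketched as follows. This is a minimal illustration using Python's standard `random` module rather than the authors' R code; the function name, the seed and the numerical values in the usage line are hypothetical, and the censoring rate is passed in directly instead of being tuned to a target percentage.

```python
import math
import random

def simulate_censored_lognormal(n, mean, var, censor_rate, seed=0):
    """Simulate n right-censored lognormal survival times.

    mean and var are the population mean and variance of the survival time;
    censor_rate is the rate of the exponential censoring distribution.
    Returns a list of (observed_time, event_indicator) pairs.
    """
    rng = random.Random(seed)
    # Moment matching: log-scale parameters implied by the mean and variance.
    log_var = math.log(1.0 + var / mean ** 2)
    log_mean = math.log(mean) - 0.5 * log_var
    log_sd = math.sqrt(log_var)
    sample = []
    for _ in range(n):
        event_time = rng.lognormvariate(log_mean, log_sd)
        censor_time = rng.expovariate(censor_rate)
        observed_time = min(event_time, censor_time)
        event_indicator = 1 if event_time <= censor_time else 0
        sample.append((observed_time, event_indicator))
    return sample

if __name__ == "__main__":
    data = simulate_censored_lognormal(n=200, mean=2.0, var=4.0, censor_rate=0.1)
    censored_share = sum(1 - d for _, d in data) / len(data)
    print(len(data), censored_share)
```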
The value for the censoring distribution parameter, $\lambda$, is determined by numerical methods (Wan, 2017). We let $\rho$ denote the right-censoring probability. We suppose that the censoring time $C$ has an exponential density function with rate $\lambda$ and that the independence assumption between $T$ and $C$ holds. In order to simulate a sample with approximately $100\rho\%$ of right-censored observations, the value of $\lambda$ is obtained by solving the following equation:
$$\rho = P(C < T) = \int_{0}^{\infty} \left(1 - e^{-\lambda t}\right) f(t)\,dt = \int_{0}^{\infty} \lambda e^{-\lambda c}\,[1 - F(c)]\,dc,$$
where $f$ and $F$ are the lognormal probability density and distribution functions of survival times, respectively.
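One way to find the exponential censoring rate numerically is sketched below. It is an assumption-laden illustration, not the procedure of Wan (2017): the probability that an exponential censoring time falls below a lognormal survival time is written as one minus the expected value of `exp(-rate * T)`, that expectation is computed by a midpoint rule on the standard normal scale, and the rate is found by bisection. Function names and numerical values are hypothetical.

```python
import math

def censoring_probability(rate, log_mean, log_sd, steps=4000, z_max=8.0):
    """P(C < T) for C ~ Exp(rate) independent of a lognormal T.

    Uses P(C < T) = 1 - E[exp(-rate * T)], with T = exp(log_mean + log_sd * Z)
    and Z standard normal, integrated by the midpoint rule over [-z_max, z_max].
    """
    width = 2.0 * z_max / steps
    total = 0.0
    for i in range(steps):
        z = -z_max + (i + 0.5) * width
        normal_density = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        total += normal_density * math.exp(-rate * math.exp(log_mean + log_sd * z))
    return 1.0 - total * width

def solve_censoring_rate(target, log_mean, log_sd, tol=1e-10):
    """Bisection for the rate giving the target censoring probability (0 < target < 1)."""
    lo, hi = 0.0, 1.0
    # The censoring probability increases with the rate, so expand the bracket first.
    while censoring_probability(hi, log_mean, log_sd) < target:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if censoring_probability(mid, log_mean, log_sd) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # Hypothetical lognormal with mean 2 and variance 4 on the original scale.
    log_var = math.log(2.0)
    log_mean, log_sd = math.log(2.0) - 0.5 * log_var, math.sqrt(log_var)
    rate = solve_censoring_rate(0.3, log_mean, log_sd)
    print(rate, censoring_probability(rate, log_mean, log_sd))
```

The same idea carries over to gamma or Weibull survival times by replacing the expectation with the corresponding integral.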
For generating right-censored survival times from the gamma and Weibull distributions, an analogous procedure to that used for the lognormal distribution is employed.
5.2 Criteria for evaluating the performance of the FBST
In order to evaluate the performance of the FBST in selecting the true distribution used to generate the survival times, we have compared the measures of evidence in favor of the hypotheses $H: p_k = 1$ and $H: p_k = 0$, $k = 1, 2, 3$, where $p_1$, $p_2$ and $p_3$ are respectively the mixture weights associated with the lognormal, gamma and Weibull components in the LGW mixture model.
For instance, suppose again that the true survival time has a lognormal distribution. We consider that the FBST has made a correct choice on the LGW model if the evidence in favor of $p_1 = 0$ is less than that in favor of $p_2 = 0$ and $p_3 = 0$, and the evidence in favor of $p_1 = 1$ is greater than that in favor of $p_2 = 1$ and $p_3 = 1$.
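The decision criterion just described can be written as a small helper. The function name and the e-values in the usage example are hypothetical; they only illustrate how the two comparisons combine.

```python
def is_correct_decision(ev_weight_zero, ev_weight_one, true_component):
    """Check the correct-decision rule for one simulated sample.

    ev_weight_zero[k] is the e-value for the hypothesis that weight k equals 0,
    ev_weight_one[k] is the e-value for the hypothesis that weight k equals 1,
    and true_component is the index of the distribution that generated the data.
    """
    others = [k for k in range(len(ev_weight_zero)) if k != true_component]
    lowest_for_zero = all(
        ev_weight_zero[true_component] < ev_weight_zero[k] for k in others
    )
    highest_for_one = all(
        ev_weight_one[true_component] > ev_weight_one[k] for k in others
    )
    return lowest_for_zero and highest_for_one

if __name__ == "__main__":
    # Hypothetical e-values for (lognormal, gamma, Weibull); index 0 is the true model.
    print(is_correct_decision([0.05, 0.60, 0.90], [0.80, 0.30, 0.01], true_component=0))
```

Averaging this indicator over the replicated samples gives the percentage of correct decisions.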
The calculation of the proportions of correct decisions made by the FBST is based on repeated replicates of each simulated scenario. In these simulations, the connecting parameters $\mu$ and $\sigma^2$ were assigned fixed values. The FBST procedure is evaluated on samples with different censoring percentages.
5.3 Simulation results
Table 1 presents the mean of the estimates for the LGW mixture model parameters and the percentages of correct decisions made by the FBST in selecting the true distribution used to generate the survival times. Regardless of the distribution used for generating the survival times and of the sample size, the estimates for the mean $\mu$ are very close to each other and to the true value of the parameter. The estimates of the variance $\sigma^2$ vary more but, in general, they approach the true value of the parameter as the sample size increases.
It is observed that the FBST presents a high performance on identifying the Weibull distribution as the true data generation process and low performance on identifying the gamma distribution. This happens because, regarding the parameters chosen for these simulations, the gamma and lognormal densities are very similar. The general pattern of the simulation results shows that the FBST achieves good performance even for samples with right-censoring.
6 Application: Choice of a survival model for patients with end-stage kidney disease
6.1 Dataset
The dataset used in this paper refers to a cohort study of 473 patients with end-stage chronic kidney failure who received hemodialysis (HD) in four centers in the State of Rio de Janeiro, Brazil. The patients were followed up over a period of years. The observed time for each patient was the number of months from admission to hemodialysis until death or the end of the observation period (kidney transplant or end of the study), the latter indicating a right-censored survival time. For a complete description of this dataset, see Alves et al. (2014).
In this paper, our main interest is to apply the LGW model to the survival data for HD patients and use the FBST procedure to examine the mixture parameters in order to choose the parametric distribution that best fits the observed data. But before that, we have performed pairwise comparisons by fitting the lognormal-Weibull, lognormal-gamma, and gamma-Weibull mixture models.
6.2 Results
The measures of evidence provided by the HD data in favor of the three models in the pairwise comparisons are presented in Table 2. For the comparison between the lognormal and Weibull distributions, the e-values reported in Table 2 lead the FBST to choose the lognormal model. For selecting between the lognormal and gamma distributions, the evidence measures indicate that both models provide a good fit to the dataset. Nevertheless, we would still prefer the lognormal model, which is the most plausible. The results of the tests for the comparison between the gamma and Weibull distributions indicate that the Weibull distribution does not provide a reasonable fit to the dataset.
Discrimination based on the LGW mixture model
In order to test the three hypotheses simultaneously, we have applied the LGW model,
$$f(t \mid \mu, \sigma^2, \mathbf{p}) = p_1\, f_{LN}(t \mid \mu, \sigma^2) + p_2\, f_{G}(t \mid \mu, \sigma^2) + p_3\, f_{W}(t \mid \mu, \sigma^2), \tag{6.1}$$
to the HD data.
The estimates for the parameters of the model (6.1) are presented in Table 3. Here, SD denotes the standard deviation, and the two remaining columns give percentiles of the posterior distribution of the LGW parameters. Both the classical and the Bayesian measures of evidence, presented in Table 4, indicate that neither the gamma nor the Weibull model should be considered, because the null hypotheses that the gamma and Weibull mixture weights are zero ($p_2 = 0$ and $p_3 = 0$) are not rejected. Consequently, among the three models, the lognormal model is the most appropriate for modeling the HD data.
Figure 1 displays the survival curves calculated using Bayesian estimates of the lognormal model (Table 5), the LGW mixture model (Table 3) and a procedure called the piecewise exponential estimator (PEXE), introduced by Kim and Proschan (1976), representing the observed data. Unlike the well-known Kaplan-Meier estimator, the PEXE is a smooth and continuous estimator of the survival function.
It appears reasonable to disregard both the gamma and the Weibull models; the lognormal model by itself produces a good estimate of survival function.
Note that the preference for the lognormal model is evident in evaluating the LGW mixture model more than in the comparison between the lognormal and gamma distributions, where the evidence measures in favor of both models are very close. It means that the discrimination power provided by LGW model is much higher than the power of the pairwise comparisons. This finding is in agreement with the discussion of Sawyer (1984).
7 Final Remarks
In this paper we considered the FBST for discriminating between survival distributions in the context of linear mixture model. The mixture approach allows us to compare between all alternative models at once by testing the hypotheses on the mixture weights space. The families of survival distributions considered include the lognormal, gamma and Weibull models. In this work, the density functions of the mixture components were reparametrized in terms of the mean and the variance of the population so that all models under discrimination share common parameters (Kamary et al., 2014; Pereira and Pereira, 2017).
From the simulation results, we observed that the FBST achieves good performance on identifying the true distribution used to generate the survival times.
The application of the LGW mixture model to the survival data for HD patients allowed us to identify the lognormal distribution as the most appropriate in modeling observed data. Therefore, one can construct a regression model to the HD data considering the lognormal model as the distribution of the response variable.
It would be of interest to apply the proposed procedure to survival data under other censoring mechanisms as well.
Acknowledgements
The authors are grateful for the support of CNPq, COPPE/UFRJ and IME/USP.
References
- Alves, M., Souza e Silva, N. A., Salis, L. H. A., Pereira, B. B., Godoy, P. H., Nascimento, E. M. and Oliveira, J. M. F. (2014) Survival and Predictive Factors of Lethality in Hemodyalisis: D/I Polymorphism of The Angiotensin I-Converting Enzyme and of the Angiotensinogen M235T Genes. Arq Bras Cardiol., 103, 209–218.
- Araujo, M. I., Pereira, B. B., Cleroux, R., Fernandes, M. and Lazraq, A. (2005) Separate families of models: Sir David Cox contributions and recent developments. Student, 5, 251–258.
- Araujo, M. I. and Pereira, B. B. (2007) A Comparison of Bayes Factors for Separated Models: Some Simulation Results. Communications in Statistics–Simulation and Computation, 36, 297–309.
- Cox, D. R. (1961) Tests of separate families of hypotheses. Proceedings 4th Berkeley Symposium in Mathematical Statistics and Probability, 1, 105–123.
- Cox, D. R. (1962) Further results on test of separate families of hypotheses. Journal of the Royal Statistical Society, B, 406–424.
- Cox, D. R. (1977) The role of significance tests. Scand. J. Statist., 4, 49–70.
- Diniz, M., Pereira, C. A. B., Polpo, A., Stern, J. M. and Wechsler, S. (2012) Relationship between Bayesian and Frequentist significance indices. International Journal for Uncertainty Quantification, 2, 161–172.
- Fletcher, R. and Reeves, C. M. (1964) Function minimization by conjugate gradients. Computer Journal, 7, 148–154.
