The relative efficiency of time-to-progression and continuous measures of cognition in pre-symptomatic Alzheimer's
Dan Li, Samuel Iddi, Paul S. Aisen, Wesley K. Thompson, Michael C. Donohue

TL;DR
This study shows that in pre-symptomatic Alzheimer's, using repeated continuous cognitive assessments in clinical trials is more statistically powerful than relying on time-to-progression measures, potentially improving early treatment evaluation.
Contribution
The paper introduces a simulation-based comparison demonstrating that continuous assessment models significantly increase power over time-to-progression models in preclinical Alzheimer's trials.
Findings
Repeated continuous assessments nearly double statistical power.
Simulation reveals potential bias from missing data patterns.
Type I error rate remains controlled at 5%.
Table 1: Baseline characteristics of the modeled PAD cohort. Entries are mean (SD) or count (%).

| Variable | Level | NC | SMC | Total |
|---|---|---|---|---|
| Age | | 75.21 (5.83) | 72.77 (5.78) | 74.57 (5.90) |
| APOE4 alleles | 0 | 52 (43%) | 23 (53%) | 75 (46%) |
| | 1 | 111 (57%) | 140 (47%) | 88 (54%) |
| ADAS Delayed Word Recall | | 2.96 (1.79) | 3.00 (2.08) | 2.97 (1.86) |
| Logical Memory - Delayed Recall | | 13.11 (3.15) | 12.63 (3.19) | 12.98 (3.16) |
| Trails B | | 93.40 (48.90) | 89.10 (32.00) | 92.30 (45.00) |
| MMSE | | 29.11 (1.13) | 29.09 (0.89) | 29.10 (1.07) |
| Category Fluency (Animals) | | 20.72 (5.32) | 19.72 (5.60) | 20.45 (5.40) |
| CDR-SB | 0 | 111 (92%) | 36 (84%) | 147 (90%) |
| | 0.5 | 8 (7%) | 7 (16%) | 15 (9%) |
| | 1 | 1 (1%) | 0 (0%) | 1 (1%) |
| FAQ | 0 | 108 (90%) | 32 (74%) | 140 (86%) |
| | 1 | 7 (6%) | 8 (19%) | 15 (9%) |
| | 2 | 2 (2%) | 0 (0%) | 2 (1%) |
| | 3 | 2 (2%) | 3 (7%) | 5 (3%) |
| | 5 | 1 (1%) | 0 (0%) | 1 (1%) |

Table 2: Posterior means and 95% credible intervals of the covariate-effect parameters, for participants who progressed to MCI (Progressor) and those who did not (Stable).

| Parameter | Progressor: Mean | Progressor: 95% CI | Stable: Mean | Stable: 95% CI |
|---|---|---|---|---|
| ADAS Delayed Word Recall | | | | |
| Intercept | -8.244 | | -4.913 | |
| Year | 0.330 | | 0.064 | |
| Age | 0.110 | | 0.062 | |
| APOE4 | 0.572 | | 0.218 | |
| Logical Memory Paragraph Recall | | | | |
| Intercept | -6.897 | | -1.840 | |
| Year | 0.261 | | 0.033 | |
| Age | 0.096 | | 0.020 | |
| APOE4 | 0.039 | | 0.465 | |
| Trails B | | | | |
| Intercept | -9.458 | | -6.364 | |
| Year | 0.353 | | 0.022 | |
| Age | 0.124 | | 0.084 | |
| APOE4 | 0.141 | | 0.622 | |
| Mini-Mental State Examination | | | | |
| Intercept | 0.852 | | -1.385 | |
| Year | 0.009 | | 0.022 | |
| Age | 0.007 | | 0.020 | |
| APOE4 | 0.040 | | 0.115 | |
| Category Fluency - Animals | | | | |
| Intercept | 1.430 | | 0.942 | |
| Year | 0.047 | | 0.025 | |
| Age | -0.009 | | -0.011 | |
| APOE4 | 0.036 | | -0.118 | |
| Clinical Dementia Rating - Sum of Boxes | | | | |
| Intercept | -6.537 | | 1.094 | |
| Year | 0.082 | | 0.006 | |
| Age | 0.081 | | -0.011 | |
| APOE4 | -0.224 | | 0.117 | |
| Functional Assessment Questionnaire | | | | |
| Intercept | 3.458 | | 0.261 | |
| Year | 0.023 | | 0.0007 | |
| Age | -0.002 | | -0.003 | |
| APOE4 | 0.343 | | 0.014 | |

Table 3: Simulated power and Type I error (0% treatment effect rows) by sample size, treatment effect, and analysis model, for observed and completed data.

| Sample size | Treatment effect | Observed: MMRM | Observed: cLDA1 | Observed: cLDA2 | Observed: CoxPH | Completed: MMRM | Completed: cLDA1 | Completed: cLDA2 | Completed: CoxPH |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | 0% | 0.021 | 0.051 | 0.053 | 0.040 | 0.027 | 0.049 | 0.057 | 0.046 |
| 1000 | 20% | 0.404 | 0.702 | 0.502 | 0.188 | 0.298 | 0.564 | 0.402 | 0.159 |
| 1000 | 30% | 0.794 | 0.957 | 0.856 | 0.322 | 0.666 | 0.897 | 0.751 | 0.274 |
| 1000 | 40% | 0.970 | 0.999 | 0.981 | 0.496 | 0.907 | 0.990 | 0.947 | 0.425 |
| 1500 | 0% | 0.024 | 0.042 | 0.054 | 0.058 | 0.014 | 0.048 | 0.051 | 0.055 |
| 1500 | 20% | 0.560 | 0.843 | 0.660 | 0.261 | 0.454 | 0.722 | 0.550 | 0.232 |
| 1500 | 30% | 0.927 | 0.996 | 0.954 | 0.452 | 0.847 | 0.973 | 0.907 | 0.392 |
| 1500 | 40% | 1.000 | 1.000 | 1.000 | 0.653 | 0.994 | 1.000 | 0.996 | 0.573 |

Table 4: Median bias on the PACC scale, by treatment effect.

| Sample size | Analysis method | 20%: Median | 30%: Median | 40%: Median |
|---|---|---|---|---|
| 1000 | MMRM | 0.018 | 0.028 | 0.037 |
| 1000 | cLDA1 | 0.019 | 0.028 | 0.038 |
| 1000 | cLDA2 | 0.038 | 0.058 | 0.077 |
| 1000 | CoxPH | -0.033 | -0.045 | -0.059 |
| 1000 | MMRM-Mehrotra | -0.001 | -0.002 | -0.003 |
| 1000 | cLDA1-Mehrotra | -0.001 | -0.001 | -0.003 |
| 1000 | cLDA2-Mehrotra | -0.006 | -0.010 | -0.014 |
| 1500 | MMRM | 0.018 | 0.027 | 0.036 |
| 1500 | cLDA1 | 0.018 | 0.027 | 0.037 |
| 1500 | cLDA2 | 0.037 | 0.056 | 0.075 |
| 1500 | CoxPH | -0.028 | -0.042 | -0.055 |
| 1500 | MMRM-Mehrotra | -0.002 | -0.003 | -0.004 |
| 1500 | cLDA1-Mehrotra | -0.001 | -0.002 | -0.003 |
| 1500 | cLDA2-Mehrotra | -0.008 | -0.012 | -0.016 |

Table 5: Median bias as a percent of the effect seen in complete data, by treatment effect.

| Sample size | Analysis method | 20%: Median | 30%: Median | 40%: Median |
|---|---|---|---|---|
| 1000 | MMRM | 27.1 | 29.9 | 29.6 |
| 1000 | cLDA1 | 29.6 | 29.8 | 29.7 |
| 1000 | cLDA2 | 24.5 | 26.5 | 26.2 |
| 1000 | CoxPH | 17.4 | 22.2 | 25.5 |
| 1000 | MMRM-Mehrotra | -4.4 | -2.9 | -2.8 |
| 1000 | cLDA1-Mehrotra | -1.7 | -1.7 | -2.0 |
| 1000 | cLDA2-Mehrotra | -6.0 | -4.5 | -4.7 |
| 1500 | MMRM | 27.5 | 28.2 | 28.3 |
| 1500 | cLDA1 | 29.1 | 29.2 | 29.3 |
| 1500 | cLDA2 | 24.8 | 25.4 | 25.5 |
| 1500 | CoxPH | 18.0 | 22.7 | 24.3 |
| 1500 | MMRM-Mehrotra | -3.0 | -3.0 | -3.1 |
| 1500 | cLDA1-Mehrotra | -2.1 | -2.3 | -2.4 |
| 1500 | cLDA2-Mehrotra | -6.1 | -5.5 | -5.5 |

The relative efficiency of time-to-progression and continuous measures of cognition in pre-symptomatic Alzheimer’s
Dan Li
Samuel Iddi
Paul S. Aisen
Wesley K. Thompson
Michael C. Donohue
for the Alzheimer’s Disease Neuroimaging Initiative*
* Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
Alzheimer’s Therapeutic Research Institute, University of Southern California, San Diego
University of Ghana, Accra
University of California, San Diego
Abstract
Pre-symptomatic (or Preclinical) Alzheimer’s Disease is defined by biomarker evidence of fibrillar amyloid beta pathology in the absence of clinical symptoms. Clinical trials in this early phase of disease are challenging due to the slow rate of disease progression as measured by periodic cognitive performance tests or by transition to a diagnosis of Mild Cognitive Impairment. In a multisite study, experts provide diagnoses by central chart review without the benefit of in-person assessment. We use a simulation study to demonstrate that models of repeated cognitive assessments detect treatment effects more efficiently compared to models of time-to-progression to an endpoint such as change in diagnosis. Multivariate continuous data are simulated from a Bayesian joint mixed effects model fit to data from the Alzheimer’s Disease Neuroimaging Initiative. Simulated progression events are algorithmically derived from the continuous assessments using a random forest model fit to the same data. We find that power is approximately doubled with models of repeated continuous outcomes compared to the time-to-progression analysis. The simulations also demonstrate that a plausible informative missing data pattern can induce a bias which inflates treatment effects, yet 5% Type I error is maintained.
keywords:
clinical trial simulations, Alzheimer’s Disease, Cox proportional hazards model, longitudinal data, mixed model of repeated measures (MMRM), statistical power, common close design, Bayesian joint mixed effect model
1 Introduction
Pre-symptomatic (or Preclinical) Alzheimer’s Disease (PAD) is defined by evidence of abnormal levels of fibrillar amyloid beta in the brain, as measured by positron emission tomography (PET) brain scan or cerebrospinal fluid (CSF) assay [1]. Clinical trials have been initiated in this early phase of disease with the hope that, as in other diseases, early interventions will be more successful in slowing progression [2, 3, 4].
In PAD, progression is typically measured by continuous assessments such as the Preclinical Alzheimer’s Cognitive Composite (PACC), a cognitive performance assessment sensitive to amyloid-related decline [5]. An alternative measure of progression is transition from normal cognition to Mild Cognitive Impairment (MCI). The diagnosis of MCI is not algorithmic. It is based on an expert clinician’s subjective impression of clinical tests and interviews with participants or study partners. In contrast to cancer progression or death, the cognitive diagnosis (normal or MCI) can vary from one clinician to the next, or from one study visit to the next. In a multicenter study, the diagnosis made by a clinician at a trial performance site may be confirmed by experts centrally based on review of assessments without the benefit of direct in-person assessment.
Some researchers prefer the inherent clinical meaningfulness of time-to-MCI analysis. Undoubtedly, for a given subject, a transition from normal cognition to MCI is more clinically meaningful than a point change in a continuous cognitive performance measure. However, in a clinical trial, we are still left to determine how large a randomized group difference in the rate of, or delay in, a clinically meaningful event is itself clinically meaningful.
The typical Alzheimer’s clinical trial assesses cognition at clinic visits conducted every three or six months. With a continuous outcome, the primary contrast is estimated at the last scheduled visit, say at 4.5 years. Proponents of time-to-progression argue that the endpoint allows for a common close design, similar to oncology studies, in which follow-up can continue until the last subject enrolled reaches the 4.5 year visit. The Cox Proportional Hazards model [6] admits data collected under such a design. Linear mixed-effects models can also admit data from a common close design, but they require assumptions about the mean trend (e.g. quadratic time trends), much as the Cox model requires the proportional hazards assumption.
Some related work has demonstrated the advantages of analyzing continuous outcomes, when available, over time-to-event outcomes in other contexts. Donohue, et al. [7] reviewed the literature and provided an analytic demonstration that, under general conditions, a mixed-effect model comparison of rate of change on a continuous outcome is effectively always more powerful than an analysis of time-to-threshold. The authors also conducted simulations based on Alzheimer’s Disease Neuroimaging Initiative (ADNI) MCI subjects and demonstrated that the marginal linear model and linear mixed models are more robust and efficient than the Cox model of time from MCI to dementia.
Our goal is to extend our earlier work in the MCI population [7] to the earlier, biomarker-defined PAD population. Specifically, we aim to compare the performance of models of repeated measures of the PACC versus time-to-progression when evaluating treatment effects in randomized trials, and to assess bias due to informative missingness. We also compare the common close design and the fixed follow-up design. We apply the Mixed Model of Repeated Measures (MMRM) [8] for the analysis of change in the PACC score. Constrained longitudinal data analysis (cLDA) models [9] are also used to model the PACC scores, treating time as a continuous variable. A Cox proportional hazards model is applied to the time-to-event endpoint.
2 Data
ADNI is a prospective observational cohort study, led by Principal Investigator Michael W. Weiner, MD, which is tracking cognitive, imaging, and biofluid markers of Alzheimer’s in volunteers diagnosed as cognitively normal (CN), subjective memory concern (SMC), mild cognitive impairment (MCI) and mild-to-moderate dementia. To simulate both longitudinal continuous markers and time-to-MCI for a Preclinical AD (PAD) clinical trial, we first model the disease markers and clinical diagnosis using data from PAD ADNI participants. The PAD population is defined by a diagnosis of CN or SMC at baseline, and florbetapir PET standardized uptake value ratio (SUVR) above 1.11 [10] or CSF amyloid beta (Aβ) below 950.6 pg/ml. The CSF threshold of 950.6 pg/ml was selected because it yields the same proportion of PAD as the 1.11 SUVR threshold. Follow-up observations, including a site clinician’s diagnosis of CN, MCI, or dementia, are collected every 3, 6, or 12 months. For more information on the study design of ADNI, including protocols, see adni.loni.usc.edu.
Sensitive tests of cognition may show changes in PAD many years before the onset of functional decline [5, 11]. In this work, we focus on seven cognitive outcomes in the PAD population, namely:
1. ADAS Delayed Word Recall (ADAS-DWR) [12],
2. Logical Memory Paragraph Recall (LogMem) [13],
3. Trail Making Test Part B (Trails B) [14],
4. Mini-Mental State Examination (MMSE) [15],
5. Category Fluency - Animals,
6. Clinical Dementia Rating - Sum of Boxes (CDRSB) [16], and
7. Functional Assessment Questionnaire (FAQ).
Baseline covariates considered include age and carriage of an apolipoprotein E4 (APOE4) allele. The PAD population includes a total of 163 individuals, of whom 39 (23.9%) were observed to progress to MCI over a median follow-up time of 4.0 years (interquartile range 2.1 to 5.6 years; maximum 11.5 years). Baseline characteristics of the modeled PAD cohort are presented in Table 1.
3 Methods
3.1 Joint mixed-effects model for longitudinal data
To derive a model to simulate plausible data, we first fit a model to observed ADNI data. We apply a joint (or multivariate) mixed-effects model (JMM) to simultaneously model continuous longitudinal data for disease markers in the PAD population. The model respects the within-subject correlation over time and among the battery of outcomes.
Suppose a set of subjects is followed over a time interval. The $i$th subject provides a set of longitudinal quantitative measurements $\{y_{ijk},\ k=1,\cdots,p\}$ at time points $\{t_{ijk},\ k=1,\cdots,p\}$, where $j$ indexes the repeated observations and $k$ indexes the outcomes. Linear mixed-effects models are commonly used to model continuous longitudinal data. The multivariate mixed-effects model is specified as

$$y_{ijk}=\mathbf{x}_{ijk}^{\prime}\boldsymbol{\beta}_{k}+b_{0ik}+b_{1ik}t_{ijk}+\varepsilon_{ijk},$$

where $\boldsymbol{\beta}_{k}$ are fixed-effect regression coefficients, and $b_{0ik}$ and $b_{1ik}$ are the subject- and outcome-specific random intercept and slope for individual $i$ and outcome $k$. The random effects are assumed to follow a multivariate Gaussian distribution with mean vector $\mathbf{0}$ and variance-covariance matrix $\boldsymbol{\Sigma}$ with dimension $2p\times 2p$, that is

$$\left(b_{0i1},\cdots,b_{0ip},b_{1i1},\cdots,b_{1ip}\right)^{\prime}\sim\mathcal{N}\left(\mathbf{0},\boldsymbol{\Sigma}\right).$$

The model with multivariate random effects has the advantage of reflecting the dependency within subjects and among outcomes. The term $\varepsilon_{ijk}$ is a measurement error, which accounts for outcome-specific variance.
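To make the structure of the model concrete, the following is a minimal sketch of simulating one subject's trajectory for a single outcome (the $p=1$ case), with a correlated random intercept and slope. It is not the authors' simulation code: the function name, the parameter names, the numeric values, and the reduction of the covariate term to an intercept plus a yearly slope are all assumptions made for illustration.

```python
import math
import random

def simulate_subject(times, beta0, beta_year, sd_int, sd_slope, rho, sd_eps, rng):
    """Simulate one subject's scores for a single outcome (hypothetical helper).

    y_j = beta0 + beta_year * t_j + b0 + b1 * t_j + eps_j, where (b0, b1) is
    bivariate Gaussian with correlation rho and eps_j ~ N(0, sd_eps^2).
    """
    # Two independent standard normals are combined to obtain correlation rho.
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    b0 = sd_int * z1
    b1 = sd_slope * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return [beta0 + beta_year * t + b0 + b1 * t + rng.gauss(0.0, sd_eps)
            for t in times]

rng = random.Random(1)
visits = [0.5 * j for j in range(10)]  # visits every 6 months, 0 to 4.5 years
# All parameter values below are illustrative only.
y = simulate_subject(visits, beta0=0.0, beta_year=0.33, sd_int=1.0,
                     sd_slope=0.1, rho=0.3, sd_eps=0.5, rng=rng)
```

The joint model in the text extends this idea to $p$ outcomes at once by drawing all $2p$ random effects from a single multivariate Gaussian, which is what induces correlation among outcomes.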
Since the outcomes are on different scales, we transform the raw outcome measures to a quantile scale ranging from 0 to 1 (least impaired to most severe dementia). Quantiles are calculated from the empirical cumulative distribution function, using weights that are inversely proportional to the number of observations from each diagnostic category for each outcome. The quantiles are then transformed by the inverse Gaussian cumulative distribution function (the Gaussian quantile function), resulting in an approximate $z$-score, before being submitted to the model. When simulating data from these models, the simulated $z$-scores can then be back-transformed to the original scale, which is integer valued for some outcomes.
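The transformation described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name is hypothetical, ties are handled with a mid-point convention (chosen here so that quantiles stay strictly between 0 and 1), and each outcome is assumed to be already oriented so that larger values mean greater impairment.

```python
from statistics import NormalDist

def weighted_quantile_z(values, groups):
    """Map raw scores to approximate z-scores through a weighted ECDF.

    Each observation is weighted by 1 / (number of observations in its
    diagnostic group), so every group contributes equally to the ECDF.
    """
    counts = {}
    for g in groups:
        counts[g] = counts.get(g, 0) + 1
    weights = [1.0 / counts[g] for g in groups]
    total = sum(weights)
    z_scores = []
    for v in values:
        below = sum(w for x, w in zip(values, weights) if x < v)
        equal = sum(w for x, w in zip(values, weights) if x == v)
        # Mid-point convention (an assumption): keeps q strictly inside (0, 1).
        q = (below + 0.5 * equal) / total
        z_scores.append(NormalDist().inv_cdf(q))
    return z_scores

# Illustrative data: three CN observations and one MCI observation.
z = weighted_quantile_z([1, 2, 3, 8], ["CN", "CN", "CN", "MCI"])
```

Back-transformation would invert these two steps: apply the Gaussian cumulative distribution function to a simulated $z$-score, then look up the corresponding quantile of the observed scores.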
Bayesian estimation is performed via Markov Chain Monte Carlo (MCMC) sampling using the stan_mvmer function in the R package rstanarm [17]. Because the stan_mvmer function is limited to a maximum of three outcomes, we have coded our own version allowing up to twenty outcomes (available from github.com/mcdonohue/rstanarm).
3.2 Random forest algorithm for diagnosis of MCI
In order to simulate a clinician’s diagnosis of MCI or worse impairment, we first use ADNI data to learn an algorithm to approximate this decision. The random forest algorithm [18] is an ensemble learning method for classification and regression. It operates by generating several decision trees and aggregating them. It provides a reasonable and easily interpretable model when a large number of predictors are present in the data, and it accommodates mixed data types such as continuous and categorical data.
In our application, clinician diagnosis of normal cognition versus MCI or worse impairment is the binary outcome variable, and the seven continuous markers, age and education are the predictors. The model is fit using the R package randomForest [19]. The fitted model is then applied to simulated continuous outcomes to predict a clinician’s diagnosis.
3.3 Competing clinical trial models for continuous and time-to-event outcomes in simulation study
The simulated treatment effect on time-to-progression is modeled by the Cox proportional hazards model. For the continuous PACC, we consider MMRM and the constrained longitudinal data analysis (cLDA) proposed by Liang and Zeger [9]. Like most likelihood-based approaches for longitudinal data, all three models assume any missing data are missing at random (MAR).
The PACC is used as the continuous outcome measure for the PAD trials simulation study. The version of the PACC used in the study is a composite of four assessments: ADAS-DWR, LogMem, log transformation of Trails B, and MMSE. Each of the four component scores is first centered by subtracting the baseline sample mean and then divided by the baseline sample standard deviation of that component, to form standardized $z$-scores. These $z$-scores are averaged to form the composite.
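A minimal sketch of the composite calculation described above is given here. The function name, the component keys, and the example numbers are hypothetical. The passage does not describe any sign alignment of the components, so none is applied in this sketch.

```python
import math
from statistics import mean, stdev

COMPONENTS = ("adas_dwr", "logmem", "trails_b", "mmse")  # hypothetical keys

def pacc(baseline, visit):
    """Average of four component z-scores, standardized against baseline.

    `baseline` maps component -> list of baseline scores (one per subject);
    `visit` maps component -> one subject's score at a given visit.
    Trails B is log-transformed before standardizing.
    """
    z_scores = []
    for name in COMPONENTS:
        base = baseline[name]
        score = visit[name]
        if name == "trails_b":
            base = [math.log(x) for x in base]
            score = math.log(score)
        z_scores.append((score - mean(base)) / stdev(base))
    return mean(z_scores)

# Illustrative baseline sample of three subjects.
baseline = {
    "adas_dwr": [2.0, 3.0, 4.0],
    "logmem": [12.0, 13.0, 14.0],
    "trails_b": [50.0, 100.0, 200.0],
    "mmse": [28.0, 29.0, 30.0],
}
score = pacc(baseline, {"adas_dwr": 3.0, "logmem": 13.0, "trails_b": 100.0, "mmse": 29.0})
```

A subject who sits at the baseline mean on every component (on the log scale for Trails B) receives a composite of zero, as in the usage example.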
The MMRM treats change from baseline in the PACC score as the outcome and baseline PACC as a predictor. It treats time as a categorical variable, which allows general mean trends in each group. MMRM has been extensively used for testing treatment effects at specific time points in clinical trials, since participants are often evaluated at a fixed and relatively small number of time points [20]. In our simulation study, the within-subject dependence is modeled by a first-order autoregressive covariance structure.
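For readers unfamiliar with the first-order autoregressive, AR(1), structure, the sketch below builds the implied within-subject covariance matrix for equally spaced visits: the covariance between visits $i$ and $j$ is $\sigma^2\rho^{|i-j|}$. The function name and the example values of $\sigma^2$ and $\rho$ are illustrative assumptions, not estimates from the paper.

```python
def ar1_cov(n_visits, sigma2, rho):
    """AR(1) covariance matrix: entry (i, j) equals sigma2 * rho**|i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n_visits)]
            for i in range(n_visits)]

# Three visits with variance 2.0 and lag-one correlation 0.5 (illustrative).
cov = ar1_cov(3, 2.0, 0.5)
```

The structure uses only two parameters regardless of the number of visits, at the cost of assuming that correlation decays geometrically with the gap between visits.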
We also explore models that treat time as a continuous variable. In cLDA, the baseline outcome is treated as a response variable rather than a covariate, and constrained to have equal mean at baseline across treatment groups [21, 22]. We explore models with linear or quadratic time trends for each group.
3.4 Simulation set-up
We conduct a simulation study to evaluate the performance of the competing models described in Section 3.3. In each of 1000 simulated clinical trials, with visits every 6 months from 0 to 8 years, either 1000 or 1500 patients are randomized to treatment or placebo in a 1:1 ratio. We also assume the proportion of MCI progressors is 24% (based on ADNI data, as noted above).
For the placebo group, no changes will be made to the JMM fit to ADNI. For the treatment group, we will impose large (40% improvement on rate of change over the control), moderate (30% improvement), small (20% improvement) and null (same as the control) treatment effects on all outcomes. The PACC scores are calculated by taking the average of the four simulated component $z$-scores.
To simulate non-ignorable missing data, three dropout categories are considered: intolerability, inefficacy and missing completely at random (MCAR). Participants with intolerability or inefficacy drop out from the study immediately after six and twelve months, respectively. For MCAR, we assume a linear attrition rate of 5% per year for both the treatment and placebo groups. The simulated dropout rates are:
Treatment group:
- Null: inefficacy (15%), intolerability (10%), MCAR (5%/year attrition rate);
- Alternative: inefficacy (8%), intolerability (10%), MCAR (5%/year attrition rate);

Placebo group: inefficacy (15%), MCAR (5%/year attrition rate).
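One way the dropout rates listed above might be turned into simulated dropout reasons and times is sketched below. Several choices here are assumptions that the text does not state: the function name, that the three categories are mutually exclusive, that MCAR attrition applies only to subjects not already assigned to inefficacy or intolerability, and that MCAR dropout times are uniform so that 5% of these subjects leave per year.

```python
import random

def simulate_dropout(arm, effective, rng, max_years=4.5):
    """Return (reason, dropout_time_in_years); time is None for completers."""
    p_ineff = 0.08 if (arm == "treatment" and effective) else 0.15
    p_intol = 0.10 if arm == "treatment" else 0.0
    u = rng.random()
    if u < p_intol:
        return ("intolerability", 0.5)  # leaves immediately after 6 months
    if u < p_intol + p_ineff:
        return ("inefficacy", 1.0)      # leaves immediately after 12 months
    # Linear attrition of 5% per year: P(dropout by time t) = 0.05 * t.
    t = rng.random() / 0.05
    if t < max_years:
        return ("MCAR", t)
    return ("completer", None)

rng = random.Random(2024)
sample = [simulate_dropout("treatment", True, rng) for _ in range(5)]
```

The `max_years` argument reflects the 4.5-year analysis window of the continuous-outcome models; a longer window would be needed for the common close follow-up used by the Cox model.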
In order to assess bias due to missing data, we simulate complete data for every subject. The complete data is appropriately censored for the analysis of “observed” data, and left uncensored for analysis of the “complete” data. Completers and MCAR dropouts are assumed to have the same longitudinal mean profile within each treatment arm. Dropouts due to intolerability are simulated to have the expected benefit, on average, until dropout, followed by an “unobserved” benefit that is diminished by a factor of 15%. Dropouts due to inefficacy are simulated to have no benefit.
The four competing clinical trial models are MMRM, cLDA1 (linear) and cLDA2 (quadratic) for continuous PACC scores; and Cox for time-to-progression, with two baseline covariates: age at baseline and carriage of the APOE4 allele. The Cox model will use all data observed out to 8 years until the last subject reaches the final scheduled visit under the common close design. We assume a linear enrollment rate such that enrollment is completed in 4 years and about half the subjects contribute “extra” common close follow-up in the 4.5 to 8 year range to the Cox model. The MMRM, cLDA1 and cLDA2 will only use data up to the last scheduled visit, i.e., from 0 to 4.5 years.
We focus on “treatment policy” estimands of interest. The estimand will be the difference between randomized groups in the intention-to-treat population in terms of either: (I) rate (hazard ratio) of progression to MCI/dementia (Cox); (II) group difference in PACC at the final study time point (MMRM and cLDA1); or (III) area between mean PACC curves (cLDA2). We show how to carry out the hypothesis test of case (III) in the Appendix. Let $Y_{ijt}$ denote the simulated PACC score for subject $i$ randomized to group $j$ at time point $t$, where $i=1,\cdots,n_{j}$, $j=0,1$ and $t=0,1,\cdots,T$. Here $t=0$ represents the baseline time point, $j=1$ is the treatment group and $j=0$ is the placebo group. Suppose the estimand of interest is the change from baseline at time $T$, i.e., $Y_{ijT}-Y_{ij0}$. The objective is to estimate the between-treatment difference $\theta=\mu_{1}-\mu_{0}$, where $\mu_{j}=E(Y_{ijT}-Y_{ij0})$. A two-tailed test of $H_{0}:\theta=0$ versus $H_{1}:\theta\neq 0$ is carried out to evaluate whether treatment is different from placebo.
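For each simulated dataset, we apply all four competing models to calculate point estimates of the treatment effect $\theta$ using the observed data (i.e., $\hat{\theta}_{\text{obs}}$) and the complete data (i.e., $\hat{\theta}_{\text{comp}}$). For each model, “bias” is calculated as the median of the 1000 point estimates of $\hat{\theta}_{\text{obs}}$ minus $\hat{\theta}_{\text{comp}}$; “bias in percent” is computed as the median of the 1000 point estimates of $\hat{\theta}_{\text{obs}}$ minus $\hat{\theta}_{\text{comp}}$, divided by $\hat{\theta}_{\text{comp}}$. The first and third quartiles ($Q_{1}$ and $Q_{3}$) are also summarized.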
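The bias summaries described above could be computed along the following lines. This sketch reads the definition as the median of per-simulation differences between the observed-data and complete-data estimates; the text could also be read as a difference of medians, so this interpretation, the function name, and the example numbers are assumptions.

```python
from statistics import median, quantiles

def bias_summary(est_observed, est_complete):
    """Summarize bias of observed-data estimates relative to complete-data ones."""
    diffs = [o - c for o, c in zip(est_observed, est_complete)]
    pct = [100.0 * (o - c) / c for o, c in zip(est_observed, est_complete)]
    q1, _, q3 = quantiles(diffs, n=4)  # first and third quartiles of the bias
    return {"bias": median(diffs), "bias_pct": median(pct), "q1": q1, "q3": q3}

# Three hypothetical simulated trials.
summary = bias_summary([1.2, 1.1, 1.3], [1.0, 1.0, 1.0])
```

In the simulation study the two input lists would each hold 1000 estimates, one pair per simulated trial and analysis model.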
In a real clinical trial, the endpoint is measured for completers but is missing for those who drop out from the study due to inefficacy or intolerability, or who remain in the study after initiating rescue medication. Mehrotra, et al. [23] discussed that the commonly used MMRM with the embedded MAR assumption can deliver an exaggerated estimate of the aforementioned estimand of interest, in favor of the drug. This happens, in part, due to implicit imputation of an overly optimistic mean for dropouts in the treatment group. To remedy this, they proposed a formula-based two-step approach that treats the true endpoint distribution for the treatment group as a mixture of distributions (one each for the completers and dropouts) rather than a single distribution. Their approach reduces the bias associated with the traditional MMRM while maintaining power. To increase the precision in estimating the treatment effect $\theta$, we apply their method to the MMRM, cLDA1 and cLDA2 models in the simulation study.
4 Results
4.1 JMM and random forest fit to ADNI data
We fit a JMM for PAD participants who were observed to progress to MCI and a separate JMM for those who did not progress. The seven outcome measures described in Section 2 are included in the model. Fixed effect covariates for each outcome include age at baseline and carriage of the APOE4 allele. Three parallel Markov chains are run for 4000 iterations and the first 2000 warm-up iterations are discarded. Every fourth value of the remaining part of each chain is stored to reduce correlation, yielding a total of 1500 samples for posterior analysis. Table 2 shows the posterior means and 95% credible intervals of the covariate-effect parameters. Figure 1 shows the subject-level observations and predictions according to time in years of the seven markers for all individuals, in which the blue and red lines are the curves using the LOESS smoother. The bottom panel shows that the predictions provide reasonable trends of the observations. The posterior estimates from the JMM will later be used as the true parameter values to simulate the panel of continuous markers.
For the random forest, 500 trees are fitted and the number of variables selected at each split is 3. The node impurity of each tree is measured by the Gini index. The results show that CDRSB, LogMem and FAQ are the three most important outcomes for determining the diagnosis of MCI. The model has a 6.19% out-of-bag error rate and 93.81% out-of-bag accuracy rate. Using the fitted random forest, the simulated cognitive status can be obtained from the simulated continuous markers. Figure 2 shows the Kaplan-Meier estimated progression rate of the ADNI-PAD population (black solid line) along with the progression rate from one large simulated placebo group (red dots). The simulated progression yields closer concordance with the Kaplan-Meier estimates at the earlier stage. Although we observe discrepancies between the two lines in the middle and the right tail, the red line still lies within the 95% confidence intervals. Both the subject-level trajectories and the progression rate illustrate that the simulated data plausibly mimic the observed data.
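For reference, the Kaplan-Meier estimate underlying a comparison like the one in Figure 2 can be sketched in a few lines. This is a generic product-limit estimator written for illustration, with a hypothetical function name and toy data; it is not the code used to produce the figure.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate).

    `times` are follow-up times in years; `events` are 1 for progression to
    MCI and 0 for censoring.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_at_t = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_at_t += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_at_t
    return curve

# Toy example: four subjects, one censored at year 2.
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

The estimated progression rate at time $t$ is one minus the survival value, so the toy curve corresponds to progression rates of 0.25, 0.625 and 1.0 at years 1, 3 and 4.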
4.2 Simulation results
Figure 3 shows the results of one simulated clinical trial with a 20% treatment effect. The figure illustrates the group trends obtained by fitting the four different models.
Simulated power and Type I error are summarized in Table 3. Under the null hypothesis (no treatment effect), the MMRM exhibits smaller than expected Type I error (about 2%), whereas the other models are closer to the expected 5% error rate. The Cox model consistently exhibits the weakest power of the four models. MMRM has the next lowest power, followed in increasing order by the quadratic (cLDA2) and linear (cLDA1) models. For example, for a trial of 1,000 subjects and a drug with a 30% treatment effect, the simulated power is 32% for Cox, 79% for MMRM, 86% for cLDA2, and 96% for cLDA1. In comparing analysis of complete versus observed data, it seems the missing data does not increase Type I error, but it does inflate power. This suggests the bias is only an issue with an effective drug, in which case the effectiveness might appear inflated. Figure 4 shows the power in all scenarios.
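Simulated power and Type I error of this kind are typically computed as the proportion of simulated trials whose test rejects at the 5% level. The sketch below shows that calculation together with its Monte Carlo standard error; the function name and the example p-values are illustrative assumptions.

```python
import math

def simulated_power(p_values, alpha=0.05):
    """Proportion of simulated trials rejecting at level alpha, with its SE."""
    n = len(p_values)
    power = sum(p < alpha for p in p_values) / n
    se = math.sqrt(power * (1.0 - power) / n)  # binomial Monte Carlo error
    return power, se

# Four hypothetical trial p-values: two reject at the 5% level.
power, se = simulated_power([0.01, 0.20, 0.03, 0.50])
```

With 1000 simulated trials and a true rejection rate of 5%, the Monte Carlo standard error is about 0.007, which gives a sense of how far a simulated Type I error can drift from 0.05 by chance alone.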
Tables 4 and 5 further examine the bias induced by the missing data pattern. The tables summarize the median and interquartile ranges of the bias on the PACC scale (Table 4) and as a percent of the effect seen in complete data (Table 5). The Cox model seems to have smaller bias with a 20% treatment effect, but as the treatment effect grows, the bias is comparable for all models. The method proposed by Mehrotra, et al. [23] successfully shrinks the magnitude of bias, e.g. from 27% in favor of treatment to -4.4% in favor of placebo for MMRM with a 20% treatment effect. The method appears to overcorrect the bias in favor of placebo in these simulations.
5 Discussion
We use Bayesian joint mixed effects models fit using ADNI data to simulate correlated longitudinal data that might plausibly arise in a PAD clinical trial. We used a random forest algorithm, also fit using ADNI, to algorithmically diagnose MCI in the simulated data so that we could compare models of the PACC to the Cox model of time-to-progression. The models of PACC consistently provide at least twice the power of the Cox model even when the Cox model has the benefit of considerably more follow-up under a common close design. Given this inefficiency, the time-to-progression analysis should be avoided in PAD.
Some might still argue that the clinical meaningfulness of the time-to-progression is worth the cost of a larger, longer trial. However, the fact that the random forest provided a purely algorithmic diagnosis with 93.81% out-of-bag accuracy suggests that there is minimal additional value in the diagnosis. And again, while the progression outcome is more qualitative than the PACC on the subject level, the group level result is still quantitative (e.g. a hazard ratio) and requires additional interpretation to assign clinical meaning.
One might also argue that clinical diagnosis cannot be adequately modelled algorithmically using trial data. That is, clinical assessment and diagnosis by a trial site clinician may consider information not captured by trial measures. But the cognitive, clinical and functional assessments are designed to capture the relevant information, and clinicians generally rely on similar information obtained through less structured assessments. It seems questionable that a site clinician will gain much reliable information beyond the assessments; indeed, this is the justification for central expert panel adjudication of site diagnoses.
The Bayesian joint models are well-suited to simulating plausible panels of correlated longitudinal data necessary to compare clinical trial designs. This approach could be useful in many other contexts where one is interested in a fair comparison of different outcome measures, different combinations of correlated outcomes, or different models of treatment effect. Simulations which ignore the correlations among important outcomes will likely not provide reliable comparisons.
All of the models considered were susceptible to bias induced by a plausible missing data pattern. However, this bias seemed to only affect scenarios with an effective treatment and did not inflate Type I error under the null hypothesis. The Mehrotra method shows promise in correcting this bias, but it might overcorrect in favor of placebo, and it would be impossible to detect this overcorrection in practice. Given that Type I error is not inflated, we are inclined to suggest no change to the status quo approach, in which the primary analysis is based on likelihood-based methods that are robust to MAR and is accompanied by appropriate MNAR sensitivity analyses such as the delta method [24].
Conflicts of interest
The authors declare no potential conflicts of interest.
Acknowledgments
We are grateful to the ADNI study volunteers and their families.
This work was supported by National Institute on Aging grant R01-AG049750. Data collection and sharing for this project was funded by the ADNI (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research provided funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
Appendix
For the cLDA model with quadratic time effects, the fixed-effects part of the model can be written as
[TABLE]
The area between the mean curves of the active group and the placebo group is
[TABLE]
The null hypothesis is that this area is zero. We use the glht function from the R package multcomp to carry out the hypothesis test.
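As an illustration of the construction described above, one parameterization consistent with a cLDA model with quadratic time effects is sketched below. The coefficients $\beta_0, \dots, \beta_4$, the active-arm indicator $A_i$, and the follow-up duration $T$ are assumptions introduced here for illustration and may not match the authors' notation.

```latex
% Assumed fixed-effects part: common baseline mean (the cLDA constraint),
% a quadratic placebo trajectory, and a quadratic treatment deviation.
\mathrm{E}[Y_{ij}] = \beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2
                   + A_i \left( \beta_3 t_{ij} + \beta_4 t_{ij}^2 \right)

% Difference between the active and placebo mean curves at time t
d(t) = \beta_3 t + \beta_4 t^2

% Signed area between the curves over the follow-up interval [0, T]
\int_0^T d(t)\, dt = \frac{T^2}{2}\,\beta_3 + \frac{T^3}{3}\,\beta_4

% Null hypothesis of zero area
H_0 : \frac{T^2}{2}\,\beta_3 + \frac{T^3}{3}\,\beta_4 = 0
```

Under these assumptions the area is a linear combination of two fixed-effect coefficients, so the null hypothesis reduces to a single linear contrast with weights $(T^2/2,\ T^3/3)$, which is the kind of general linear hypothesis that glht is designed to test. Note that this is a signed area: if the two mean curves cross, positive and negative regions partially cancel.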
References
- [1] R. A. Sperling, P. S. Aisen, L. A. Beckett, D. A. Bennett, S. Craft, A. M. Fagan, T. Iwatsubo, C. R. Jack Jr, J. Kaye, T. J. Montine, et al., Toward defining the preclinical stages of Alzheimer’s disease: Recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease, Alzheimer’s & dementia 7 (3) (2011) 280–292.
- [2] R. A. Sperling, D. M. Rentz, K. A. Johnson, J. Karlawish, M. Donohue, D. P. Salmon, P. Aisen, The A4 study: stopping AD before symptoms begin?, Science translational medicine 6 (228) (2014) 228fs13–228fs13.
- [3] ClinicalTrials.gov, An efficacy and safety study of atabecestat in participants who are asymptomatic at risk for developing Alzheimer’s dementia (EARLY), Tech. rep., National Library of Medicine (US), Bethesda, MD (2015 Oct 6 - 2018 Sept 25). URL https://clinicaltrials.gov/ct2/show/NCT02569398
- [4] A. Caputo, A. Racine, I. Paule, E. P. Martens, P. Tariot, J. B. Langbaum, R. G. Thomas, S. Hendrix, J. M. Ryan, C. Lopez-Lopez, et al., Rationale for selection of primary endpoints in the Alzheimer Prevention Initiative Generation study in cognitively healthy APOE4 homozygotes, Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association 13 (7) (2017) P1452.
- [5] M. C. Donohue, R. A. Sperling, D. P. Salmon, D. M. Rentz, R. Raman, R. G. Thomas, M. Weiner, P. S. Aisen, The Preclinical Alzheimer Cognitive Composite: Measuring amyloid-related decline, JAMA neurology 71 (8) (2014) 961–970.
- [6] D. R. Cox, Regression models and life-tables, Journal of the Royal Statistical Society: Series B (Methodological) 34 (2) (1972) 187–202.
- [7] M. Donohue, A. Gamst, R. Thomas, R. Xu, L. Beckett, R. Petersen, M. Weiner, P. Aisen, A. D. N. Initiative, et al., The relative efficiency of time-to-threshold and rate of change in longitudinal data, Contemporary clinical trials 32 (5) (2011) 685–693.
- [8] C. H. Mallinckrodt, T. M. Sanger, S. Dubé, D. J. DeBrota, G. Molenberghs, R. J. Carroll, W. Z. Potter, G. D. Tollefson, Assessing and interpreting treatment effects in longitudinal clinical trials with missing data, Biological psychiatry 53 (8) (2003) 754–760.
- [9] K.-Y. Liang, S. L. Zeger, Longitudinal data analysis of continuous and discrete responses for pre-post designs, Sankhya: The Indian Journal of Statistics, Series B 62 (1) (2000) 134–148.
- [10] S. M. Landau, M. A. Mintun, A. D. Joshi, R. A. Koeppe, R. C. Petersen, P. S. Aisen, M. W. Weiner, W. J. Jagust, Amyloid deposition, hypometabolism, and longitudinal cognitive decline, Annals of Neurology 72 (4) (2012) 578–586.
- [11] M. C. Donohue, R. A. Sperling, R. Petersen, C.-K. Sun, M. W. Weiner, P. S. Aisen, Association between elevated brain amyloid and subsequent cognitive decline among cognitively normal persons, JAMA 317 (22) (2017) 2305–2316.
- [12] R. C. Mohs, L. Cohen, Alzheimer’s Disease Assessment Scale (ADAS), Psychopharmacology bulletin 24 (4) (1988) 627.
- [13] D. Wechsler, WMS-R: Wechsler Memory Scale–Revised: Manual (1987).
- [14] T. N. Tombaugh, Trail Making Test A and B: normative data stratified by age and education, Archives of clinical neuropsychology 19 (2) (2004) 203–214.
- [15] M. F. Folstein, S. E. Folstein, P. R. McHugh, Mini-mental state: a practical method for grading the cognitive state of patients for the clinician, Journal of psychiatric research 12 (3) (1975) 189–198.
- [16] J. C. Morris, The Clinical Dementia Rating (CDR): current version and scoring rules, Neurology.
- [17] B. Goodrich, J. Gabry, I. Ali, S. Brilleman, rstanarm: Bayesian applied regression modeling via Stan, R package version 2.17.4 (2018).
- [18] L. Breiman, Random forests, Machine Learning 45 (1) (2001) 5–32.
- [19] A. Liaw, M. Wiener, Classification and regression by randomForest, R News 2 (3) (2002) 18–22. URL https://CRAN.R-project.org/doc/Rnews/
- [20] O. Siddiqui, H. J. Hung, R. O’Neill, MMRM vs. LOCF: a comprehensive comparison based on simulation study and 25 NDA datasets, Journal of Biopharmaceutical Statistics 19 (2) (2009) 227–246.
- [21] G. F. Liu, K. Lu, R. Mogg, M. Mallick, D. V. Mehrotra, Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials?, Statistics in Medicine 28 (2009) 2509–2530.
- [22] K. Lu, On efficiency of constrained longitudinal data analysis versus longitudinal analysis of covariance, Biometrics 66 (3) (2010) 891–896.
- [23] D. V. Mehrotra, F. Liu, T. Permutt, Missing data in clinical trials: control-based mean imputation and sensitivity analysis, Pharmaceutical Statistics 16 (5) (2017) 378–392.
- [24] D. B. Rubin, Formalizing subjective notions about the effect of nonrespondents in sample surveys, Journal of the American Statistical Association 72 (359) (1977) 538–543.
