Disease Progression Modeling and Prediction through Random Effect   Gaussian Processes and Time Transformation

Marco Lorenzi; Maurizio Filippone; Daniel C. Alexander; Sebastien; Ourselin

arXiv:1701.01668·stat.AP·February 16, 2018

Disease Progression Modeling and Prediction through Random Effect Gaussian Processes and Time Transformation

Marco Lorenzi, Maurizio Filippone, Daniel C. Alexander, Sebastien, Ourselin

PDF

TL;DR

This paper introduces a novel Gaussian process-based model for disease progression that jointly analyzes imaging and clinical biomarkers, effectively handling missing data and providing uncertainty quantification for predictions in neurodegenerative disorders.

Contribution

The paper presents a flexible, Bayesian Gaussian process model that incorporates individual effects and time reparameterization for disease progression analysis.

Findings

01

Accurately models Alzheimer's pathology evolution over the disease span.

02

Demonstrates high predictive performance on clinical cohort data.

03

Effectively handles missing observations with uncertainty quantification.

Abstract

The development of statistical approaches for the joint modelling of the temporal changes of imaging, biochemical, and clinical biomarkers is of paramount importance for improving the understanding of neurodegenerative disorders, and for providing a reference for the prediction and quantification of the pathology in unseen individuals. Nonetheless, the use of disease progression models for probabilistic predictions still requires investigation, for example for accounting for missing observations in clinical data, and for accurate uncertainty quantification. We tackle this problem by proposing a novel Gaussian process-based method for the joint modeling of imaging and clinical biomarker progressions from time series of individual observations. The model is formulated to account for individual random effects and time reparameterization, allowing non-parametric estimates of the biomarker…

Tables3

Table 1. Table 1: Mean (sd) R 2 superscript 𝑅 2 R^{2} correlation coefficient across folds between estimated individual time-shifts and ground truth time reference.

		$N = 20$				$N = 100$
		$σ$				$σ$
		0.1	0.2	0.3	0.4	0.1	0.2	0.3	.4
$N_{b}$	4	.95 (.03)	.86 (.08)	.71 (.17)	.46 (.29)	.91 (.04)	.89(.04)	.76 (.17)	.75 (.12)
$N_{b}$	8	.97 (.01)	.91 (.06)	.86 (.06)	.66 (.3)	.94 (.04)	.94 (.02)	.88 (.06)	.84 (.07)

Table 2. Table 2: Baseline clinical and sociodemographic information for the study cohort. NA: no observations available for the considered group. NL: normal individuals, MCI: mild cognitive impairment, AD: Alzheimer’s patients.

Group	N	Age	Sex (% females)	ADAS13	Hippo volume ( $m m^{3}$ )	AV45
Training data
NL	76	75.8 (6)	53	9.5 (4.4)	7358 (762)	1.24 (0.2)
MCI conv	57	72.7 (7)	42	20.3 (6.8)	6464 (861)	1.44 (0.2)
AD	21	72.7 (10)	43	29.3 (8.7)	5872 (988)	1.4 (0.2)
Testing data
NL	30	77.5 (6)	43	11.1 (4.1)	7137 (800)	1.03 (0.14)
MCI stable	164	73.4 (6.9)	39	15.5 (6.19)	7028 (1009)	1.28 (0.2)
MCI conv	71	75.4 (6.7)	40	21.3 (5.4)	5882 (8644)	NA
AD	98	74.7 (8)	40	28.6 (8)	5709 (1105)	1.59 (0.1)

Table 3. Table 3: Average follow-up years and percentage of individuals with missing data for each biomarker (in parenthesis).

Ventr	Hippo	Ent	Whole Brain	ADAS13	FAQ	RAVLT	AV45	FDG
Training data
2.4 (0)	2.4 (0)	2.4 (0)	2.4 (0)	3.2 (0)	3.2 (0)	3.2 (0)	1.3 (0)	1.9 (0)
Testing data
1.8 (.1)	1.8 (.1)	1.8 (.1)	1.8 (.1)	2.4 (0)	2.5 (0)	2.4 (0)	1.3 (65)	1.6 (31)

Equations31

y^{j} (t) = (y_{b_{1}}^{j} (t), y_{b_{2}}^{j} (t), \dots, y_{b_{N_{b}}}^{j} (t))^{⊤}

y^{j} (t) = (y_{b_{1}}^{j} (t), y_{b_{2}}^{j} (t), \dots, y_{b_{N_{b}}}^{j} (t))^{⊤}

Σ_{G} (f, f)

Σ_{G} (f, f)

Σ_{b} (f_{b} (t_{1}), f_{b} (t_{2}))

Σ_{b} (f_{b} (t_{1}), f_{b} (t_{2}))

Σ_{S} (ν^{j}, ν^{j}) = diag [Σ_{b_{1}}^{j} (ν_{b_{1}}^{j}, ν_{b_{1}}^{j}), Σ_{b_{2}}^{j} (ν_{b_{2}}^{j}, ν_{b_{2}}^{j}), \dots, Σ_{b_{N_{b}}^{j}} (ν_{b_{N_{b}}}^{j}, ν_{b_{N_{b}}}^{j})],

Σ_{S} (ν^{j}, ν^{j}) = diag [Σ_{b_{1}}^{j} (ν_{b_{1}}^{j}, ν_{b_{1}}^{j}), Σ_{b_{2}}^{j} (ν_{b_{2}}^{j}, ν_{b_{2}}^{j}), \dots, Σ_{b_{N_{b}}^{j}} (ν_{b_{N_{b}}}^{j}, ν_{b_{N_{b}}}^{j})],

y^{j} (ϕ^{j} (τ)) = f (ϕ^{j} (τ)) + ν^{j} (ϕ^{j} (τ)) + ϵ .

y^{j} (ϕ^{j} (τ)) = f (ϕ^{j} (τ)) + ν^{j} (ϕ^{j} (τ)) + ϵ .

p (m (t) ∣ \dot{f} (t))

p (m (t) ∣ \dot{f} (t))

p (f, \dot{f}, ν^{j} ∣ y, m)

p (f, \dot{f}, ν^{j} ∣ y, m)

= p (f, \dot{f} ∣ t, t^{'}) p (ν ∣ t) p (y ∣ f, ν) k \prod l \prod Φ (\frac{1}{λ} \dot{f}_{b_{k}} (t_{l}^{'})),

p (f, \dot{f}, ν^{j} ∣ t, t^{'})

p (f, \dot{f}, ν^{j} ∣ t, t^{'})

f_{j o in t} = (f \dot{f})

q (f, \dot{f}, ν^{j} ∣ y^{j}, m)

q (f, \dot{f}, ν^{j} ∣ y^{j}, m)

\tilde{μ}_{j o in t} = (y \tilde{μ})

\tilde{μ}_{j o in t} = (y \tilde{μ})

q_{- k^{'} l^{'}} (\dot{f}_{b_{k^{'}}} (t_{l^{'}}^{'}))

q_{- k^{'} l^{'}} (\dot{f}_{b_{k^{'}}} (t_{l^{'}}^{'}))

lo g L

lo g L

μ^{*}

μ^{*}

Σ^{*}

p (t^{*} ∣ y^{*}, y, t, m, t^{'}) = p (y^{*} ∣ t^{*}, y, t, m, t^{'}) p (t^{*}) / p (y^{*} ∣ y, t, m, t^{'}),

p (t^{*} ∣ y^{*}, y, t, m, t^{'}) = p (y^{*} ∣ t^{*}, y, t, m, t^{'}) p (t^{*}) / p (y^{*} ∣ y, t, m, t^{'}),

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

11institutetext: Asclepios Research Project, Université Côte d’Azur, Inria, France 22institutetext: Centre for Medical Image Computing, University College London, UK 33institutetext: Department of Data Science, EURECOM, France

Disease Progression Modeling and Prediction through Random Effect Gaussian Processes and Time Transformation

Marco Lorenzi 1122

Maurizio Filippone 33

Daniel C. Alexander 22

Sebastien Ourselin for the Alzheimer’s Disease Neuroimaging Initiative Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (www.loni.usc.edu/ADNI). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: www.loni.usc.edu/ADNI/Collaboration/ADNI_Authorship_list.pdf22

Abstract

The development of statistical approaches for the joint modelling of the temporal changes of imaging, biochemical, and clinical biomarkers is of paramount importance for improving the understanding of neurodegenerative disorders, and for providing a reference for the prediction and quantification of the pathology in unseen individuals. Nonetheless, the use of disease progression models for probabilistic predictions still requires investigation, for example for accounting for missing observations in clinical data, and for accurate uncertainty quantification. We tackle this problem by proposing a novel Gaussian process-based method for the joint modeling of imaging and clinical biomarker progressions from time series of individual observations. The model is formulated to account for individual random effects and time reparameterization, allowing non-parametric estimates of the biomarker evolution, as well as high flexibility in specifying correlation structure, and time transformation models. Thanks to the Bayesian formulation, the model naturally accounts for missing data, and allows for uncertainty quantification in the estimate of evolutions, as well as for probabilistic prediction of disease staging in unseen patients. The experimental results show that the proposed model provides a biologically plausible description of the evolution of Alzheimer’s pathology across the whole disease time-span as well as remarkable predictive performance when tested on a large clinical cohort with missing observations.

1 Introduction

Neurodegenerative disorders (NDDs), such as Alzheimer’s disease (AD), are characterised by the progressive pathological alteration of the brain’s biochemical processes and morphology, and ultimately lead to the irreversible impairment of cognitive functions. The correct understanding of the relationship between the different pathological features is of paramount importance for improving the identification of pathological changes in patients, and for better treatment. To this end, the recent availability of collections of imaging and clinical data in NDDs is a unique opportunity to define statistical models for the joint modelling of the temporal changes of imaging, biochemical, and clinical biomarkers.

The goal of disease progression modeling in NDDs is twofolds: 1) quantifying the dynamics of NDDs along with the related temporal relationship between different biomarkers, and 2) staging patients based on individual observations for diagnostic and interventional purposes. The related challenge is in the definition of optimal methods to integrate and jointly analyze the heterogeneous and highly multi-modal information available to clinicians. Moreover, longitudinal clinical datasets of NDDs generally lack of a well-defined temporal reference, since the onset of the pathology may vary across individuals according to genetic and environmental factors [12]. Therefore, age or visit date information are biased time references for the individual longitudinal measurements.

To tackle this problem, in [3] the authors proposed to model the temporal biomarker trajectories as a random effect regression problem, building on the well established theory of self-modeling regression [6]. Practically, the trajectories are modelled by monotonic B-splines, and the estimation is performed by subsequent minimization of the partial residuals sum of squares associated with regression parameters and individual time shift, respectively. Based on the assumption of a logistic curve shape for the average biomarker trajectories, [11] frames the random effect regression model in a Riemannian setting, in which the random effects identify individual time-shift and acceleration factors. Finally, image-based progression models have been recently proposed [13, 2], based on time-reparameterization of voxel/mesh-based measures. While the main focus of current works mainly concerns the modeling of neurodegeneration, the use of disease progression models for predictive purposes is much less investigated. Predictive models of patient staging were proposed within the setting of event based models [4], or still through random effect modeling [5]. However, the event based model relies on the coarse binary discretization of the biomarker changes, and does not account for longitudinal observations, while the model proposed in [5] requires cohorts with known disease onset, and therefore does not easily generalise to the study of general clinical populations.

In this study we propose an unified approach to the consistent disease progression modeling and probabilistic prediction of clinical data, by introducing a Bayesian regression framework of imaging and clinical biomarker progressions from time series of individual observations. The model is based on Gaussian process (GP) regression, and is formulated to account for individual random effects and time reparameterization, as well to naturally account for missing observations. The methodological contribution of this study consists in reformulating disease progression modeling within a Bayesian context, allowing non-parametric estimates of the biomarker evolution, as well as high flexibility in specifying random effects structure, and time reparameterization models. The uniqueness of the time transformation is enforced by imposing a monotonicity constraint on the trajectories, via a prior on the temporal derivatives, while the model is computationally efficient thanks to the block-wise algebraic structure of the GP covariance function. From the application point of view, the proposed approach enables novel applications of disease progression modeling such as the probabilistic prediction of disease staging in unseen patients. Moreover, the model naturally accounts for missing data, and allows for uncertainty quantification of both evolutions and parameters.

Section 2 introduces the proposed GP regression model of joint temporal progression of biomarkers, along with the prior derivatives specification. The resulting posterior and approximated inference scheme through expectation propagation (EP) is described in Section 3. Finally, Section 4 illustrates the validation of our model on i) synthetic data and ii) clinical and multivariate imaging measurements from a cohort of 517 amyloid positive individuals of the ADNI database. The experimental results show that our model provides a biologically plausible description of the evolution of AD across the whole disease time-span, as well as statistically significant prediction of disease staging and high classification accuracy on unseen and incomplete testing data.

2 Gaussian process-based random effect modeling of longitudinal progressions

In what follows, longitudinal measurements of $N_{b}$ biomarkers $\{b_{1},\ldots,b_{N_{b}}\}$ over time are given for $N$ individuals.

We represent the longitudinal biomarker’s measures associated with each individual $j$ as a multidimensional array $(\mathbf{y}^{j}({t_{1}}),\mathbf{y}^{j}({t_{2}}),\ldots,\mathbf{y}^{j}({t_{k^{j}}}))^{\top}$ sampled at $k^{j}$ multiple time points $\mathbf{t}=\{{t}_{1},{t}_{2},\ldots,{t}_{k^{j}}\}$ . Although different biomarkers may be in reality sampled at different time-points, for sake of notation simplicity in what follows we will assume, without loss of generality, that the sampling time is common among them. The observations for individual $j$ at a single time point $t$ are thus a random sample from the following generative model:

[TABLE]

where $\mathbf{f}({t})=({{f}_{b_{1}}}({t}),{{f}_{b_{2}}}({t})\ldots,{{f}_{b_{N_{b}}}}({t}))^{\top}$ is the fixed effect function modelling the biomarker’s longitudinal evolution, $\boldsymbol{\nu}^{j}({t})=({\nu}_{b_{1}}^{j}({t}),{\nu}_{b_{2}}^{j}({t}),\ldots,{\nu}_{b_{N_{b}}}^{j}({t}))^{\top}$ is the individual random effect, and $\boldsymbol{\epsilon}=({\epsilon}_{b_{1}},{\epsilon}_{b_{2}},\ldots,{\epsilon}_{b_{N_{b}}})^{\top}$ is observational noise independent from time. The group-wise evolution is modelled as a zero-mean GP, $\mathbf{f}\sim\mathcal{GP}(0,\Sigma_{G})$ , the individual random effects are assumed to be Gaussian distributed correlated signal $\boldsymbol{\nu}^{j}\sim\mathcal{N}(0,\Sigma_{S})$ , while the observational noise is assumed to be a Gaussian heteroskedastic term $\boldsymbol{\epsilon}\sim\mathcal{N}(0,\Sigma_{\epsilon})$ , where $\Sigma_{\epsilon}$ is a diagonal matrix $\mbox{diag}[\boldsymbol{\sigma}_{b1}^{2},\boldsymbol{\sigma}_{b2}^{2},\ldots,\boldsymbol{\sigma}_{b_{N_{b}}}^{2}]$ .

Fixed Effect Process. The covariance function $\Sigma_{G}$ describes the biomarkers temporal variability, and is represented as a block-diagonal matrix

[TABLE]

where each block represents the within-biomarker temporal covariance expressed as a negative squared exponential function:

[TABLE]

and where the parameters $\eta_{b}$ and $l_{b}$ are the marginal variance and length-scale of the biomarker’s temporal evolution, respectively.

Individual Random Effects. The random covariance function $\Sigma_{S}$ models the individual deviation from the fixed effect, and is represented as a block-diagonal matrix

[TABLE]

where each block $\Sigma_{b}^{j}$ corresponds to the covariance function associated with the individual process $\boldsymbol{\nu}_{b}^{j}({t})$ . Thanks to the flexibility of the proposed generative model, any form of the random effect covariance $\Sigma_{S}$ can be easily specified in order to model the subject-specific biomarkers’ progression. In what follows we will use a linear covariance form $\Sigma_{b}^{j}(\boldsymbol{\nu}_{b}^{j}(t_{1}),\boldsymbol{\nu}_{b}^{j}(t_{2}))=(\sigma_{b}^{j})^{2}\left((t_{1}-\overline{\mathbf{t}})(t_{2}-\overline{\mathbf{t}})\right)$ , where $\overline{\mathbf{t}}$ is the average observational time for individual $j$ , if more than 4 measurements were available, and i.i.d. Gaussian covariance form $\Sigma_{b}^{j}(\boldsymbol{\nu}_{b}^{j}(t_{1}),\boldsymbol{\nu}_{b}^{j}(t_{2}))=(\sigma_{b}^{j})^{2}$ if 2 or 3 measurements were available, while assigning it to 0 otherwise. This choice is motivated by stability concerns, in order to keep the model complexity compatible with the generally limited number of measurements available for each individual.

Individual time transformation. The generative model (1) is based on the key assumption that the longitudinal observations across different individuals are defined with respect to the same temporal reference. This assumption may be invalid when the temporal alignment of the individual observations with respect to the common group-wise model is unknown, for instance in the typical scenario of a clinical trial in AD where the patients’ observational time is relative to the common baseline, and where the disease onset is a latent event (past or future) which is not directly measurable. We assume that each individual measurement is made with respect to an absolute time-frame $\tau$ through a time-warping function $t=\phi^{j}(\tau)$ that models the time-reparameterization with respect to the common group-wise evolution. Model (1) can thus be reparameterized as

[TABLE]

The present formulation allows the specification of any kind of time transformation, and in what follows we shall focus on the modelling of a linear reparameterization of the observational time $\phi^{j}(\tau)=\tau+d^{j}$ . This modeling assumption is mostly motivated by the choice of working with a reasonably limited number of parameters, compatibly with the generally short follow-up time available per individual (cfr. Table 3). Within this setting, the time-shift $d^{j}$ encodes the disease stage associated with the individual relatively to the group-wise model.

Overall, model (4) is identified by $(N_{j}+3)N_{b}+N_{j}$ parameters, represented by the fixed effects and noise $\boldsymbol{\theta}_{G}=\{\eta_{b_{k}},l_{b_{k}},\epsilon_{b_{k}}\}_{k=1}^{N_{b}}$ , by the individual random effects parameters $\boldsymbol{\theta}_{G}^{j}=\{\sigma_{b_{k}}^{j}\}_{k=1}^{N_{b}}$ and by the time-shifts $d^{j}$ .

Monotonic constraint in multimodal GP regression. Due to the non-parametric nature of Gaussian process regression, we need an additional constraint on model (4) in order to identify a unique solution for the time-warp. By assuming a steady temporal evolution of biomarkers from normal to pathological values, we shall assume that the biomarker trajectories described by (4) follow a (quasi) monotonic behaviour. This requirement can be implemented by imposing a prior positivity constraint on the derivatives of the GP function $f$ . Inspired by [10], we impose a monotonicity constraint by assuming a probit-likelihood for the derivative measurements $\mathbf{m}(t)$ associated with the derivative process $\mathbf{\dot{f}}(t)=\frac{\mathrm{d}\mathbf{f}(t)}{\mathrm{d}t}$ at time $t$ :

[TABLE]

with $\Phi(z)=\int_{-\infty}^{z}\mathcal{N}(x|0,1)\,dx$ . The quantity $\lambda>0$ is an additional model parameter controlling the degree of positivity enforced on the derivative process, with values approaching to zero for stronger monotonicity constraint. In what follows, the monotonicity of each biomarker is controlled by placing 10 derivative points equally spaced on the observation domain, and by fixing the $N_{b}$ derivative parameters $\{\lambda_{b_{k}}\}_{k=1}^{N_{b}}$ to the value of $e$ -6. This choice for the parameter is motivated by the need to enforce a strong monotonicity constraint, necessary for the stability of the initial time-shift estimation.

On adding a monotonic constraint to the individual observations. By following a similar construction, we could equally enforce a monotonic behaviour to the random effects associated with the individual trajectories. This additional constraint would however come with a cumbersome increase of the model complexity, since it would introduce an additional layer of virtual derivative parameters (with associated location) per individual. Moreover, while we are interested in modeling a globally monotonic biomarker trajectory on the fixed parameters, we relax this constraint at the individual level, since some subjects may be characterised by non strictly monotonic time-series due to specific clinical conditions.

3 Joint Model: marginal likelihood and inference

Given the sets of individual biomarker measurements $\mathbf{y}=\{(\mathbf{y}^{j}(t_{i}))_{i=1}^{k^{j}}\}_{j=1}^{N}$ , and of $D$ control derivatives $\mathbf{m}=\{m_{b_{k}}(t_{l}^{\prime})\}_{l=1}^{D}$ at points $t^{\prime}=\{t_{l}^{\prime}\}_{l=1}^{D}$ for the progression of each biomarker $b_{k}$ , the random effect GP model posterior is:

[TABLE]

where $\boldsymbol{\nu}=\{\mathbf{\nu}^{j}\}_{j=1}^{N}$ . Thanks to the linearity of GPs under derivation, we have that $Cov\left(\mathbf{f}(t),\mathbf{\dot{f}}(t^{\prime})\right)=\frac{\mathrm{d}Cov(\mathbf{f}(t),\mathbf{f}(t^{\prime}))}{\mathrm{d}t^{\prime}}$ , and that the joint distribution $p\left(\mathbf{f},\mathbf{\dot{f}}|t,t^{\prime}\right)$ is again a GP:

[TABLE]

3.1 Approximated inference with Expectation Propagation

Due to the non-Gaussianity of the derivative likelihood term, the direct inference on the posterior (6) is not possible due to its analytically intractable form. For this reason, we employ an approximate inference scheme based on expectation propagation (EP) [9, 8]. Following [10], we compute an approximated posterior distribution $q\left(\mathbf{f},\mathbf{\dot{f}},\boldsymbol{\nu}^{j}|\mathbf{y}^{j},\mathbf{m}\right)$ by replacing the derivative likelihood terms with local un-normalized Gaussian approximations:

[TABLE]

where $\prod_{k}\prod_{l}\tilde{Z}_{kl}\mathcal{N}(\dot{f}_{b_{k}}(t_{l}^{\prime})|\tilde{\mu}_{kl},\tilde{\sigma}^{2}_{kl})=\mathcal{N}(\boldsymbol{\tilde{\mu}},\tilde{\Sigma})\prod_{k,l}\tilde{Z}_{kl}$ , with $\boldsymbol{\tilde{\mu}}=[\tilde{\mu}_{kl}]$ , and $\tilde{\Sigma}$ is a diagonal matrix with elements $\tilde{\sigma}^{2}_{kl}$ . It follows that the marginal posterior has a Gaussian form, $q\left(\mathbf{f},\mathbf{\dot{f}},\boldsymbol{\nu}^{j}|\mathbf{y}^{j},\mathbf{m}\right)\sim\mathcal{N}(\boldsymbol{\mu},\Sigma)$ , with $\boldsymbol{\mu}=\Sigma\tilde{\Sigma}^{-1}\boldsymbol{\tilde{\mu}}_{joint}$ , and $\Sigma=(\Sigma_{joint}^{-1}+\tilde{\Sigma}_{joint}^{-1})^{-1}$ , where

[TABLE]

3.1.1 Estimating the EP parameters.

The EP update of the local Gaussian approximation parameters is classically done by iterative moment matching with respect to the product between the cavity distributions $q_{-k^{\prime}l^{\prime}}\left(\dot{f}_{b_{k^{\prime}}}(t^{\prime}_{l^{\prime}})\right)$ and the target likelihood term $\Phi\left(\frac{1}{\lambda}\dot{f}_{b_{k^{\prime}}}(t^{\prime}_{l^{\prime}})\right)$ . In the GP case the cavity distribution has a straightforward Gaussian form:

[TABLE]

As shown in [10] for univariate monotonic regression, moments and updates of the approximation parameters can be computed in an analogous manner as in the classical GP classification problem [9].

3.2 Marginal Likelihood and hyper-parameter estimation

The model’s log-marginal likelihood under the EP approximation is:

[TABLE]

In what follows, the optimal parameters are obtained by maximising $\log\mathcal{L}$ through conjugate gradient descent, via alternate optimization between the hyper-parameters $\boldsymbol{\theta}_{G}$ and $\boldsymbol{\theta}_{G}^{j}$ , and the individuals’ time-shifts $d^{j}$ . The position of the derivative points was updated at each iteration, according to the changes of the GP domain. Regularisation was also enforced by introducing Gaussian priors for the parameters $\boldsymbol{\theta}_{G}$ and $\boldsymbol{\theta}_{G}^{j}$ . We note that the block structure of the GP covariance allows the computation of the gradients with respect to the biomarkers’ and individual parameters by working on matrices of much smaller dimension than the one of the whole GP, thus considerably improving the numerical stability and the computational efficiency of the optimization procedure.

3.3 Prediction of observations and individual staging

Gaussian processes naturally allow probabilistic predictions given the observed data. At any given time point $t^{*}$ , the posterior biomarker distribution has the Gaussian form $p(\mathbf{f}^{*}|t^{*},\mathbf{y},t,\mathbf{m},t^{\prime})\sim\mathcal{N}(\mathbf{f}^{*}|\boldsymbol{\mu}^{*},\Sigma^{*})$ with parameters:

[TABLE]

We also derive a probabilistic model for the individual temporal staging given a set of biomarker observations $\mathbf{y}^{*}$ , thanks to the Bayes formula:

[TABLE]

which we compute by assuming an uniform distribution on $t^{*}$ , and by noting that $p(\mathbf{y}^{*}|t^{*},\mathbf{y},t,\mathbf{m},t^{\prime})\leavevmode\nobreak\ \sim\mathcal{N}(\boldsymbol{\mu}^{*},\Sigma^{*}+\Sigma_{\epsilon})$ . In particular, the joint covariance form $\Sigma_{G}(\mathbf{f}(t^{*}),\mathbf{f}(t^{*}))$ can be specified in order to account for incomplete data, and thus generalizes the GP model for predictions in presence of missing biomarker observations.

4 Experiments

4.1 Synthetic multivariate progressions

We benchmarked the model with respect to synthetic multivariate biomarker progressions. We generated random multivariate sigmoid functions for $N_{b}$ biomarkers, $\mathbf{f}(\tau)=(f_{b_{1}}(\tau),f_{b_{2}}(\tau),\ldots,f_{b_{N_{b}}}(\tau))^{\top}$ , with $f_{b_{k}}(\tau)=1/(1+\exp(-\alpha_{k}\tau))$ , $\tau\in[0,15]$ and $\alpha_{k}\sim\mathcal{N}(0,.06)$ , and we sampled $N$ individual noisy trajectories at time points $\tau_{k}^{j}$ : $\mathbf{y}_{k}^{j}(\tau_{k}^{j})=f_{k}(\tau_{k}^{j})+\epsilon$ , $\epsilon\sim\mathcal{N}(0,\sigma^{2})$ . For each individual we used the same initial sampling time point for every biomarker, while the number of samples per biomarker was allowed to independently vary between 1 and 4. The individual time points were subsequently centered by their mean $\mu_{k}^{j}$ to obtain shifted time-points $t_{k}^{j}=\tau_{k}^{j}-\mu_{k}^{j}$ defined in the interval $[-2,2]$ .

The model was applied to estimate biomarker progressions and individual time-shifts with respect to different combinations of trajectory noise $\sigma$ , sample size $N$ , and number of biomarkers $N_{b}$ . The accuracy of the model in reconstructing the original time series was quantified by Pearson’s correlation between the estimated time-shift $d^{j}$ and the original individual time reference. The experiments were repeated 10 times for each configuration of parameters $\sigma\in\{0.1,0.2,0.3,0.4\}$ , $N_{b}\in\{4,8\}$ , and $N\in\{20,100\}$ .

Results. Table 1 reports summary correlations between time-shift estimation and the ground truth individual sampling time. The correlation values are generally high, and increase with lower noise levels. Interestingly, the increase in number of modelled biomarkers is associated with a better performance in recovering the underlying disease staging. We also observe that larger sample sizes are associated with higher correlation values, especially with increasing noise levels. We note however an exception for the case $\sigma=0.1$ where, although the overall performance is still high, the correlation slightly decreases with $N=100$ .

4.2 Longitudinal modelling of Alzheimer’s disease progression

We collected longitudinal measurements for the ADNI individuals with baseline values of CSF A $\beta$ amyloid lower than the nominal values of 192 pg/ml. This preliminary selection is aimed to validate the model on a clinical population likely to represent the whole disease time-span.

The model was trained on a group including normal individuals, mild cognitive impairment subjects converted to AD (MCI conv), and AD patients having at least one measurement for each of the following biomarkers: volumetric measures (hippocampal, ventricular, entorhinal, and whole brain volumes), glucose metabolism (average normalized FDG uptake in prefrontal cortex, anterior cingulate, precuneus and parietal cortex), brain amyloidosys (average normalized AV45 uptake in frontal cortex, anterior cingulate, precuneus and parietal cortex), and cognitive function measured by common cognitive questionnaires (ADAS13, RAVLT learning score, FAQ). The testing set was composed by the remaining subjects with at least a missing biomarker, as well as by the subgroup of MCI non converted to AD during the observational time (MCI stable). The volumetric measures were scaled by the individual total intracranial volume, and all the biomarkers measurements were converted into quantile scores (0 to 1 for normal to abnormal values). Table 2 shows baseline clinical and sociodemographic information of the individuals used respectively in training and testing set, while in Table 3 we report the average follow-up time and the ratio of missing data of the pooled sample.

The model was applied in order to estimate the temporal biomarker evolution and the disease stage associated with each individual in training and testing set. The plausibility of the model was assessed by i) group-wise comparison of the predicted time-shift, ii) prediction of conversion to AD in the MCI testing group, and iii) correlation with respect to the time to AD diagnosis for the MCI individuals subsequently converted to AD. We finally compared the progression modelled with our approach with respect to the one estimated with the method proposed in [3]. The method was applied to the training data by using the standard parameters defined in the R package grace111https://mdonohue.bitbucket.io/grace/.

Results. The estimated biomarker progression (Figure 1) shows a biologically plausible description of the pathological evolution, compatible with previous findings in longitudinal studies in familial AD [1]. The progression is defined on a time scale spanning roughly 20 years, and is characterized at the initial stages by high-levels of AV45, followed by an increase in ventricles volume and abnormality of FDG uptake. These latter measures are however heterogeneously distributed across clinical groups, and with rather large variability. The evolution is further characterized by increasing abnormality of the volumetric measures, and finally by the steady worsening of neuropsychological scores such as FAQ.

Figure 2 shows the posterior predicted distribution of the individual time shift. Healthy individuals are associated with the early stages of the pathology in both training and testing data, while MCI and AD patients are characterized by respectively intermediate and late predicted progression stages. The group-wise comparison between the expected time-shifts is statistically significant between each group pairs ( $p$ $<$ 1e- $4$ ). Furthermore, the temporal positioning of the non converting MCI lies between controls and MCI converters: when using the temporal cut-off based on the $10^{th}$ quantile of the time-shift distribution in the training AD population ( $t=3.6$ years) we obtained an accuracy of $0.84$ for the discrimination between MCI converters and stable in the testing data. We also measured a negative correlation with respect to the time to AD diagnosis in training, testing, and pooled MCI converter groups, with $R^{2}$ respectively equal to $-0.20\,(p=0.1)$ , $-0.28\,(p=0.01)$ , and $-0.28\,(p=0.001)$ . Finally, when applying [3] to the training data we measured a strong correlation between the resulting individual time-shifts and those obtained with our method ( $R^{2}$ = $0.89$ , $p$ $<$ 1e- $16$ ). Nevertheless, our estimates provided consistently larger effect sizes for the group-wise separation: (1.96,1.36,0.57) with our methods and (1.74, 1.18, 0.47) with grace for AD vs NL, MCI vs NL, and AD vs MCI, respectively.

5 Conclusions

We proposed a unified GP-based approach to disease progression modeling from time-series of biomarker measurements enabling novel applications beyond the state-of-art, such as the probabilistic prediction of disease staging in unseen patients. Furthermore, the model naturally accounts for missing data, and provides uncertainty quantification of the biomarker evolutions. The model provided remarkable modeling and predictive performance when tested on a large clinical cohort, and thus represents a promising instrument for the analysis of clinical trials data.

Similarly to [3], in this work we focused on the modeling of disease staging represented by a time translation. However, the proposed framework can naturally account for more complex time transformations, provided that a sufficient number of time points is available for each individual. Future extensions of this model will focus on the quantification of the effect of each biomarkers in the predictive performance, for example by integrating feature selection based on automatic relevance determination. Moreover the present work can be extended to model differential progressions underlying sub-pathologies, by framing the proposed random effect regression within the realm of Gaussian process mixture models. Finally, thanks to the flexibility of our framework, further extension of the model will enable to integrate within (2) a spatio-temporal covariance model (such as the efficient Kronecker form of [7]), to provide a unified framework for jointly modelling time series of images and scalar biomarkers data in a coherent fully Bayesian setting.

6 Acknowledgments

EPSRC grants EP/J020990/01 and EP/M020533/1 support DCA and SO’s work on this topic. ML, DCA and SO also received support from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 666992 (EuroPOND) for this work. MF gratefully acknowledges support from the AXA Research Fund.

Bibliography13

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] Bateman, R.J., Xiong, C., Benzinger, T.L., et al.: Clinical and biomarker changes in dominantly inherited Alzheimer’s disease. New England Journal of Medicine 367(9), 795–804 (2012)
2[2] Bilgel, M., Jedynak, B., Wong, D.F., Resnick, S.M., Prince, J.L.: Temporal trajectory and progression score estimation from voxelwise longitudinal imaging measures: Application to amyloid imaging. In: IPMI. pp. 424–436. Springer (2015)
3[3] Donohue, M.C., Jacqmin-Gadda, H., Le Goff, M., et al.: Estimating long-term multivariate progression from short-term data. Alzheimer’s & Dementia 10(5), S 400–S 410 (2014)
4[4] Fonteijn, H.M., Clarkson, M.J., Modat, M., et al.: An event-based disease progression model and its application to familial Alzheimer’s disease. In: IPMI. pp. 748–759. Springer (2011)
5[5] Guerrero, R., Schmidt-Richberg, A., Ledig, C., Tong, T., Wolz, R., Rueckert, D.: Instantiated mixed effects modeling of Alzheimer’s disease markers. Neuro Image 142, 113–125 (2016)
6[6] Kneip, A., Gasser, T.: Convergence and consistency results for self-modeling nonlinear regression. The Annals of Statistics pp. 82–112 (1988)
7[7] Lorenzi, M., Ziegler, G., Alexander, D.C., Ourselin, S.: Efficient Gaussian process-based modelling and prediction of image time series. In: IPMI. pp. 626–637. Springer (2015)
8[8] Minka, T.P.: Expectation propagation for approximate bayesian inference. In: Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence. pp. 362–369. Morgan Kaufmann Publishers Inc. (2001)