Quantifying the recency of HIV infection using multiple longitudinal biomarkers

Loumpiana Koulai; Anne Presanis; Gary Murphy; Barbara Suligoi; Daniela De Angelis

arXiv:1706.02508·stat.AP·June 9, 2017

TL;DR

This paper introduces a Bayesian mixed effects modeling approach to quantify the recency of HIV infection using multiple biomarkers, improving estimation accuracy by joint modeling of correlated biomarkers.

Contribution

It develops a novel Bayesian framework for characterizing biomarker growth patterns and estimating infection recency at an individual level, incorporating multiple biomarkers for better accuracy.

Findings

1. Joint biomarker models outperform univariate models in recency estimation.
2. Biomarker growth rate significantly influences estimation accuracy.
3. Simulation results demonstrate improved predictive performance with combined biomarkers.

Abstract

Knowledge of the time at which an HIV-infected individual seroconverts, when the immune system starts responding to HIV infection, plays a vital role in the design and implementation of interventions to reduce the impact of the HIV epidemic. A number of biomarkers have been developed to distinguish between recent and long-term HIV infection, based on the antibody response to HIV. To quantify the recency of infection at an individual level, we propose characterising the growth of such biomarkers from observations from a panel of individuals with known seroconversion time, using Bayesian mixed effect models. We combine this knowledge of the growth patterns with observations from a newly diagnosed individual, to estimate the probability seroconversion occurred in the X months prior to diagnosis. We explore, through a simulation study, the characteristics of different biomarkers that affect…

Tables

Table 1: Parameter values for generating univariate and bivariate outcomes under the realistic scenario.

Model	Mean	Variance-covariance matrix of random effects	Measurement error
Antibody response
AR1	$\mu_{\beta^{AR1}} = (5, 2)$	$\Sigma_{\beta^{AR1}} = \begin{pmatrix} 0.5000 & -0.1900 \\ -0.1900 & 0.2000 \end{pmatrix}$	$\sigma_{\epsilon^{AR1}} = 0.1000$
AR2	$\mu_{\beta^{AR2}} = (0, -1, 1)$	$\Sigma_{\beta^{AR2}} = \begin{pmatrix} 0.0000 & 0.0000 & 0.0000 \\ 0.0000 & 0.2000 & -0.0850 \\ 0.0000 & -0.0850 & 0.4000 \end{pmatrix}$	$\sigma_{\epsilon^{AR2}} = 0.0500$
AR3	$\mu_{\beta^{AR3}} = (0, -1.5, 0.5)$	$\Sigma_{\beta^{AR3}} = \Sigma_{\beta^{AR2}}$	$\sigma_{\epsilon^{AR3}} = 0.0500$
AR4	$\mu_{\beta^{AR4}} = (1.5, -1.5, 0.8)$	$\Sigma_{\beta^{AR4}} = \begin{pmatrix} 0.0000 & 0.0000 & 0.0000 \\ 0.0000 & 0.4000 & -0.1470 \\ 0.0000 & -0.1470 & 0.6000 \end{pmatrix}$	$\sigma_{\epsilon^{AR4}} = 0.0500$
Joint model AR1 & AR4	$\mu_{\boldsymbol{\beta}} = (1.5, -1.5, 0.8, 5, 2)$	$\Sigma_{\boldsymbol{\beta}} = \begin{pmatrix} 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\ 0.0000 & 0.4000 & -0.1470 & 0.0450 & -0.0280 \\ 0.0000 & -0.1470 & 0.6000 & -0.0550 & 0.1730 \\ 0.0000 & 0.0450 & -0.0550 & 0.5000 & -0.1900 \\ 0.0000 & -0.0280 & 0.1730 & -0.1900 & 0.2000 \end{pmatrix}$	$\Sigma_{\epsilon} = \begin{pmatrix} 0.0025 & 0 \\ 0 & 0.0100 \end{pmatrix}$
Viral load
VL	$\mu_{\beta^{VL}} = (3, 2)$	$\Sigma_{\beta^{VL}} = \begin{pmatrix} 1.0000 & 0.3536 \\ 0.3536 & 0.5000 \end{pmatrix}$	$\sigma_{\epsilon^{VL}} = 0.2000$
Joint model AR4 & VL	$\mu_{\boldsymbol{\beta}} = (1.5, -1.5, 0.8, 3, 2)$	$\Sigma_{\boldsymbol{\beta}} = \begin{pmatrix} 0.0000 & 0.0000 & 0.0000 & 0.0000 & 0.0000 \\ 0.0000 & 0.4000 & -0.1470 & 0.0630 & 0.1340 \\ 0.0000 & -0.1470 & 0.6000 & 0.2320 & 0.0550 \\ 0.0000 & 0.0630 & 0.2320 & 1.0000 & 0.3536 \\ 0.0000 & 0.1340 & 0.0550 & 0.3536 & 0.5000 \end{pmatrix}$	$\Sigma_{\epsilon} = \begin{pmatrix} 0.0025 & 0 \\ 0 & 0.0400 \end{pmatrix}$

Equations

y_{ij}^{k}=g(t_{ij}+\tau_{i},\boldsymbol{\beta}_{i}^{k})+\epsilon_{ij}^{k}

g(t_{ij}+\tau_{i},\boldsymbol{\beta}_{i}^{k})=\beta_{1i}^{k}+\beta_{2i}^{k}(t_{ij}+\tau_{i})

g(t_{ij}+\tau_{i},\boldsymbol{\beta}_{i}^{k})=\beta_{1i}^{k}+(\beta_{2i}^{k}-\beta_{1i}^{k})\exp\big(-\exp(\beta_{3i}^{k})(t_{ij}+\tau_{i})\big)

g(t_{ij}+\tau_{i},\boldsymbol{\beta}_{i}^{k})=\beta_{1i}^{k}\Big(1+\exp\big(-\beta_{2i}^{k}(t_{ij}+\tau_{i})\big)\Big)

\begin{pmatrix}\boldsymbol{y}_{i}^{1}\\ \boldsymbol{y}_{i}^{2}\end{pmatrix}=\begin{pmatrix}\boldsymbol{g}_{1}(\tau_{i},\boldsymbol{t}_{i},\boldsymbol{\beta}_{i}^{1})\\ \boldsymbol{g}_{2}(\tau_{i},\boldsymbol{t}_{i},\boldsymbol{\beta}_{i}^{2})\end{pmatrix}+\begin{pmatrix}\boldsymbol{\epsilon}_{i}^{1}\\ \boldsymbol{\epsilon}_{i}^{2}\end{pmatrix}

\boldsymbol{g}_{k}(\tau_{i},\boldsymbol{t}_{i},\boldsymbol{\beta}_{i}^{k})=\begin{pmatrix}g_{k}(t_{i1}+\tau_{i},\boldsymbol{\beta}_{i}^{k})\\ g_{k}(t_{i2}+\tau_{i},\boldsymbol{\beta}_{i}^{k})\\ \vdots\\ g_{k}(t_{in_{i}}+\tau_{i},\boldsymbol{\beta}_{i}^{k})\end{pmatrix},\qquad k=1,2.

f(\boldsymbol{y}_{i}^{k}\mid\boldsymbol{\beta}_{i}^{k},\boldsymbol{t}_{i},\tau_{i},\sigma_{\epsilon^{k}})=(2\pi)^{-\frac{n_{i}}{2}}\lvert\Sigma_{i}\rvert^{-\frac{1}{2}}\exp\Big\{-\frac{1}{2}\big(\boldsymbol{y}_{i}^{k}-\boldsymbol{g}_{k}(\tau_{i},\boldsymbol{t}_{i},\boldsymbol{\beta}_{i}^{k})\big)^{T}\Sigma_{i}^{-1}\big(\boldsymbol{y}_{i}^{k}-\boldsymbol{g}_{k}(\tau_{i},\boldsymbol{t}_{i},\boldsymbol{\beta}_{i}^{k})\big)\Big\}

p(\boldsymbol{\Theta}_{1}\mid\boldsymbol{y}^{k})\propto\prod_{i=1}^{n}\Big\{f(\boldsymbol{y}_{i}^{k}\mid\boldsymbol{\beta}_{i}^{k},\boldsymbol{t}_{i},\tau_{i},\sigma_{\epsilon^{k}})\,\pi(\boldsymbol{\beta}_{i}^{k}\mid\mu_{\beta^{k}},\Sigma_{\beta^{k}})\,\pi(\tau_{i})\Big\}\times\pi(\mu_{\beta^{k}})\,\pi(\Sigma_{\beta^{k}})\,\pi(\sigma_{\epsilon^{k}})

f(\boldsymbol{y}_{i}^{1},\boldsymbol{y}_{i}^{2}\mid\boldsymbol{\beta}_{i}^{1},\boldsymbol{\beta}_{i}^{2},\boldsymbol{t}_{i},\tau_{i},\Sigma_{\epsilon})=(2\pi)^{-n_{i}}\lvert\Sigma_{\epsilon}\rvert^{-\frac{1}{2}}\exp\Big\{-\frac{1}{2}Q_{i}^{T}\Sigma_{\epsilon}^{-1}Q_{i}\Big\}

p(\boldsymbol{\Theta}_{2}\mid\boldsymbol{y}^{1},\boldsymbol{y}^{2})\propto\prod_{i=1}^{n}\Big\{f(\boldsymbol{y}_{i}^{1},\boldsymbol{y}_{i}^{2}\mid\boldsymbol{\beta}_{i}^{1},\boldsymbol{\beta}_{i}^{2},\boldsymbol{t}_{i},\tau_{i},\Sigma_{\epsilon})\,\pi(\tau_{i})\,\pi(\boldsymbol{\beta}_{i}^{1},\boldsymbol{\beta}_{i}^{2}\mid\mu_{\boldsymbol{\beta}},\Sigma_{\boldsymbol{\beta}})\Big\}\,\pi(\mu_{\boldsymbol{\beta}})\,\pi(\Sigma_{\boldsymbol{\beta}})\,\pi(\Sigma_{\epsilon})

p(\tau_{n},\boldsymbol{\Theta}\mid\boldsymbol{y}^{k},\boldsymbol{t},\tau_{1:(n-1)})\propto\prod_{i=1}^{n-1}\Big\{f(\boldsymbol{y}_{i}^{k}\mid\boldsymbol{\beta}_{i}^{k},\boldsymbol{t}_{i},\tau_{i},\sigma_{\epsilon^{k}}^{2})\,\pi(\boldsymbol{\beta}_{i}^{k}\mid\mu_{\beta^{k}},\Sigma_{\beta^{k}})\Big\}\times f(\boldsymbol{y}_{n}^{k}\mid\boldsymbol{\beta}_{n}^{k},\boldsymbol{t}_{n},\tau_{n},\sigma_{\epsilon^{k}}^{2})\,\pi(\boldsymbol{\beta}_{n}^{k}\mid\mu_{\beta^{k}},\Sigma_{\beta^{k}})\,\pi(\tau_{n})\,\pi(\mu_{\beta^{k}})\,\pi(\Sigma_{\beta^{k}})\,\pi(\sigma_{\epsilon^{k}})

Taxonomy

Topics: HIV Research and Treatment · HIV/AIDS Research and Interventions · HIV-related health complications and treatments

Full text

Quantifying the recency of HIV infection using multiple longitudinal biomarkers

Loumpiana Koulai, Anne Presanis, Gary Murphy, Barbara Suligoi and Daniela De Angelis

Knowledge of the time at which an HIV-infected individual seroconverts, when the immune system starts responding to HIV infection, plays a vital role in the design and implementation of interventions to reduce the impact of the HIV epidemic. A number of biomarkers have been developed to distinguish between recent and long-term HIV infection, based on the antibody response to HIV. To quantify the recency of infection at an individual level, we propose characterising the growth of such biomarkers from observations from a panel of individuals with known seroconversion time, using Bayesian mixed effect models. We combine this knowledge of the growth patterns with observations from a newly diagnosed individual, to estimate the probability seroconversion occurred in the X months prior to diagnosis. We explore, through a simulation study, the characteristics of different biomarkers that affect our ability to estimate recency, such as the growth rate. In particular, we find that predictive ability is improved by using joint models of two biomarkers, accounting for their correlation, rather than univariate models of single biomarkers.

1 Introduction

Following infection with the Human Immunodeficiency Virus (HIV), the immune system responds by producing anti-HIV antibodies of different types at different stages from infection [5], culminating in what is known as seroconversion, i.e. the time at which antibodies are detectable in blood serum. CD4 counts and viral load have traditionally been used as prognostic biomarkers of HIV progression [18, 22] but have been less successful for estimating time since infection, due to their non-monotonic behaviour and the difficulty of observing individuals at the early stages of infection. In recent years, focussing instead on the antibody response, a number of serological assays, able to detect different aspects of this diverse response, have been developed with the goal of distinguishing recent from long-standing infections. Typically, for a specified biomarker, a threshold is chosen and HIV-positive individuals with a measured optical density (OD) below the threshold are classified as recently infected (see [31, 26, 17, 13, 11, 12] and references therein). This classification has been used to estimate HIV incidence at the population level [10, 19]. At an individual level, however, this dichotomization does not allow a clear quantification of the recency of infection. Statements about recency can only be made on average, based on knowledge of the mean time taken to cross the chosen threshold from seroconversion.

The key question of interest is whether it is possible to make probabilistic statements at the individual level about the recency of HIV infection. Can biomarker measurements for a newly diagnosed individual, combined with knowledge of the natural evolution of the biomarker, be used to infer the probability, $P_{X}$, that an individual has seroconverted in the $X$ months prior to diagnosis? Antibody-response biomarkers increase monotonically and approach a plateau over time. An example is the Architect Avidity [31], whose growth pattern is shown in Figure 1. These data are routinely collected from HIV-positive patients attending clinics in Italy. These patients have known (or well-estimated) seroconversion times, and at each clinic visit following diagnosis, one or more biomarkers are measured. For such a panel of individuals, the growth pattern of each observed biomarker can be estimated. Figure 1 shows each individual’s OD values of Architect Avidity against time since seroconversion, with the estimated mean growth curve in blue. Given such growth curves and observations on a newly diagnosed individual, how well can we estimate the seroconversion probability $P_{X}$?

$P_{X}$ can be derived from the distribution of an individual’s seroconversion time. Up to now, little attention has been paid to developing methods for estimating individual seroconversion times. Traditionally, the midpoint between the last negative and the first positive HIV test date has been adopted as an estimate [1]. A number of authors have considered the use of markers of immune response to improve the estimation of seroconversion time. [23, 24] fit a Weibull model to the known seroconversion times of seroconverters, using CD4 counts as a covariate. The fitted model is then used to impute the unknown seroconversion times for seroprevalent individuals for whom CD4 counts are available. [7] develop a Bayesian model for estimating the conditional distribution of time since seroconversion given CD4 counts at the time of the first positive HIV test. More recently, [29] model the evolution of two non-linearly evolving biomarkers by using the same functional form for each biomarker, deriving parameter estimates through a maximum likelihood approach. Resulting estimates are then used via Bayes’ rule to estimate the distribution of infection time for each individual in their sample. Similarly, [15] uses a Bayesian bivariate non-linear mixed-effects model, with the same functional form for each antibody-response biomarker, to estimate the average time spent in the recent infection state. More recently, [2] models separately the level of a measured biomarker and presence/absence of recency, assuming that they are independent. These two sources of information are combined into one conditional probability that the time of infection is recent. However, in reality, levels or presence of different biomarkers may be correlated and this correlation should be taken into account in the estimation process.

Our aim is to explore the feasibility of using a limited number of serial measurements of one or more biomarkers to quantify the recency of HIV infection for any newly diagnosed patient. Univariate linear and non-linear mixed-effect models to describe the growth patterns of antibody response and viral load biomarkers are given in Section 2.1. Joint non-linear mixed-effects models of bivariate biomarkers are given in Section 2.2. We evaluate the performance of single and multiple intrinsically correlated biomarkers in estimating the probability $P_{X}$ of having seroconverted in the $X$ months prior to HIV diagnosis through a simulation study (Section 3). Biomarkers with different growth patterns are investigated to evaluate the impact of particular characteristics, such as the growth rate, on the accuracy of the estimation. Results are reported in Section 4 and we end with a discussion in Section 5.

2 Biomarker models

Let $y_{ij}^{k}$ denote the observed measurement of the random variable $Y^{k}$ representing the $k^{th}$ biomarker, for the $i^{th}$ individual at the $j^{th}$ observation time, $t_{ij}$ , where $i=1,\dots,n$ , $j=1,\dots,n_{i}$ , $k=1,\dots,K$ . Assume that the available data for $n$ individuals (see Figure 2) also include the dates of the last negative and the first positive HIV test, $t_{i}^{-ve}$ and $t_{i}^{+ve}$ respectively. The interval $[t_{i}^{-ve},t_{i}^{+ve}]$ is the interval within which individual $i$ has seroconverted, with length $sc_{i}=t_{i}^{+ve}-t_{i}^{-ve}$ . Note that $\tau_{i}$ is the time from seroconversion to $t_{i}^{+ve}$ and $T_{ij}^{*}=\tau_{i}+t_{ij}$ is the time from seroconversion to the $j^{th}$ measurement.

2.1 Single outcome models

Suppose that a single outcome $k$ is measured on each individual $i$ at each time point $t_{ij}$ from the first positive HIV test date. The observed longitudinal trajectories of biomarker $k$ can be modelled as

$$y_{ij}^{k}=g(t_{ij}+\tau_{i},\boldsymbol{\beta}_{i}^{k})+\epsilon_{ij}^{k}\tag{1}$$

where $\epsilon_{ij}^{k}\sim N(0,\sigma_{\epsilon^{k}}^{2})$ represent normally distributed measurement errors. The function $g(\cdot)$ represents the true underlying values of biomarker $k$ and depends on the time since seroconversion $(t_{ij}+\tau_{i})$ and on the random effects $\boldsymbol{\beta}_{i}^{k}$, which are normally distributed with mean $\mu_{\beta^{k}}$ and variance-covariance matrix $\Sigma_{\beta^{k}}$. Different functional forms of $g(\cdot)$ can be used to capture the underlying evolution of a biomarker of interest. In what follows, markers of antibody response and viral presence will be considered.

2.1.1 Antibody response

Antibody response may evolve linearly over time since seroconversion [14]. Such evolution can be represented by a linear mixed-effects model with random intercept $\beta_{1i}^{k}$ and random slope $\beta_{2i}^{k}$ [6, 27, 8]:

$$g(t_{ij}+\tau_{i},\boldsymbol{\beta}_{i}^{k})=\beta_{1i}^{k}+\beta_{2i}^{k}(t_{ij}+\tau_{i})\tag{2}$$

The intercept represents the value of the biomarker at seroconversion and the slope the growth rate.

Alternatively, and more commonly, antibody response follows a non-linear trajectory [suligoi2002, 31, 17, 13]. The three-parameter non-linear function used by Sweeting [30] could be adopted to describe the growth of monotonically increasing biomarkers:

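$$g(t_{ij}+\tau_{i},\boldsymbol{\beta}_{i}^{k})=\beta_{1i}^{k}+(\beta_{2i}^{k}-\beta_{1i}^{k})\exp\big(-\exp(\beta_{3i}^{k})(t_{ij}+\tau_{i})\big)\tag{3}$$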

This function has intercept $\beta_{2i}^{k}$ and approaches an asymptote $\beta_{1i}^{k}$ over a period of time. The parameter $\beta_{3i}^{k}$ is the logarithm of the rate constant, representing the growth rate for each individual $i$ .

2.1.2 Viral presence

Viral presence is thought to be exponentially decreasing and approaching a plateau within a short period after seroconversion [21, 22, 16, 28], as the immune response starts controlling the infection. A two-parameter exponential decay function could be used to model such trajectories:

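$$g(t_{ij}+\tau_{i},\boldsymbol{\beta}_{i}^{k})=\beta_{1i}^{k}\Big(1+\exp\big(-\beta_{2i}^{k}(t_{ij}+\tau_{i})\big)\Big)\tag{4}$$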

This non-linear function has decay rate $\beta_{2i}^{k}$ and plateau $\beta_{1i}^{k}$ .
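The three mean functions of Sections 2.1.1 and 2.1.2 can be sketched in Python as follows. This is an illustrative sketch only, not the authors' code (the paper's analysis uses OpenBUGS). The function names `g_linear`, `g_asymptotic` and `g_decay` are hypothetical, and `T` stands for the time since seroconversion, $t_{ij}+\tau_{i}$, in years. The parameter values are the means listed in Table 1.

```python
import math

def g_linear(T, b1, b2):
    """Linear growth: intercept b1 at seroconversion, slope b2."""
    return b1 + b2 * T

def g_asymptotic(T, b1, b2, b3):
    """Three-parameter growth: asymptote b1, intercept b2, log rate constant b3."""
    return b1 + (b2 - b1) * math.exp(-math.exp(b3) * T)

def g_decay(T, b1, b2):
    """Two-parameter exponential decay: plateau b1, decay rate b2."""
    return b1 * (1.0 + math.exp(-b2 * T))

# Mean trajectories at the Table 1 means, evaluated at seroconversion (T = 0)
ar1_at_0 = g_linear(0.0, 5.0, 2.0)            # equals the intercept b1
ar4_at_0 = g_asymptotic(0.0, 1.5, -1.5, 0.8)  # equals the intercept b2
vl_at_0 = g_decay(0.0, 3.0, 2.0)              # equals twice the plateau b1
```

Under this parameterisation the decay function starts at twice its plateau at seroconversion, since $g(0)=2\beta_{1i}^{k}$.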

2.2 Bivariate outcome models

Suppose now that two outcomes are measured on each individual $i$ over time from the first positive HIV test date. The response vector for individual $i$ at time $t_{ij}$ is $(y_{ij}^{1},y_{ij}^{2})^{T}$, with $\boldsymbol{y}_{i}^{k}=(y_{i1}^{k},y_{i2}^{k},\dots,y_{in_{i}}^{k})^{T}$ being the sequence of measurements for each biomarker $k$ and $\boldsymbol{t}_{i}=(t_{i1},t_{i2},\dots,t_{in_{i}})^{T}$ being the sequence of measurement times. A bivariate joint model for the response outcomes is:

$$\begin{pmatrix}\boldsymbol{y}_{i}^{1}\\ \boldsymbol{y}_{i}^{2}\end{pmatrix}=\begin{pmatrix}\boldsymbol{g}_{1}(\tau_{i},\boldsymbol{t}_{i},\boldsymbol{\beta}_{i}^{1})\\ \boldsymbol{g}_{2}(\tau_{i},\boldsymbol{t}_{i},\boldsymbol{\beta}_{i}^{2})\end{pmatrix}+\begin{pmatrix}\boldsymbol{\epsilon}_{i}^{1}\\ \boldsymbol{\epsilon}_{i}^{2}\end{pmatrix}\tag{5}$$

where $\boldsymbol{\epsilon}_{i}^{1},\boldsymbol{\epsilon}_{i}^{2}$ are the within-subject measurement errors of the first and second biomarker respectively and

$$\boldsymbol{g}_{k}(\tau_{i},\boldsymbol{t}_{i},\boldsymbol{\beta}_{i}^{k})=\begin{pmatrix}g_{k}(t_{i1}+\tau_{i},\boldsymbol{\beta}_{i}^{k})\\ g_{k}(t_{i2}+\tau_{i},\boldsymbol{\beta}_{i}^{k})\\ \vdots\\ g_{k}(t_{in_{i}}+\tau_{i},\boldsymbol{\beta}_{i}^{k})\end{pmatrix},\qquad k=1,2.$$

The measurement errors are assumed to be independent and normally distributed with mean $\boldsymbol{0}$ and variance-covariance matrix $\Sigma_{\epsilon}=\begin{pmatrix}\sigma_{\epsilon^{1}}^{2}\boldsymbol{I}_{n_{i}^{1}}&\boldsymbol{0}\\ \boldsymbol{0}&\sigma_{\epsilon^{2}}^{2}\boldsymbol{I}_{n_{i}^{2}}\end{pmatrix}$, where $\boldsymbol{I}_{n_{i}}$ denotes the $n_{i}\times n_{i}$ identity matrix. The random effects $\boldsymbol{\beta}_{i}^{1}$, $\boldsymbol{\beta}_{i}^{2}$ follow a joint multivariate normal distribution with mean vector $\mu_{\boldsymbol{\beta}}$ and variance-covariance matrix $\Sigma_{\boldsymbol{\beta}}=\begin{pmatrix}\Sigma_{\beta}^{1}&\Sigma_{\beta}^{12}\\ \Sigma_{\beta}^{21}&\Sigma_{\beta}^{2}\end{pmatrix}$, which is partitioned into four sub-matrices: (a) $\Sigma_{\beta}^{1}$ includes variances and covariances of the random effects for biomarker $1$, (b) $\Sigma_{\beta}^{2}$ includes variances and covariances of the random effects for biomarker $2$, (c) $\Sigma_{\beta}^{12}=\Sigma_{\beta}^{21}$ includes covariances between random effects of each biomarker, allowing for correlation between the two biomarkers.

We first consider a bivariate outcome consisting of a linearly and a non-linearly evolving antibody-response biomarker, for example two different avidity assays [4, 31]. Additionally, a bivariate outcome of a non-linearly evolving antibody-response biomarker and viral load is examined.

3 Simulation study

Assume data from a panel of 100 HIV-infected “in-sample” individuals with known seroconversion times $(\tau_{i})$ are available. For each individual, measurements of biomarkers of antibody response and viral load are taken at HIV diagnosis and regularly thereafter. The information provided by the “in-sample” individuals is used to model the dynamics of biomarkers of interest. A new “out-of-sample” individual, with unknown seroconversion time in the seroconversion interval $[t_{i}^{-ve},t_{i}^{+ve}]$ , is diagnosed in a healthcare facility. For this new individual, a number of biomarkers are measured at HIV diagnosis and every few weeks thereafter.

3.1 Generating simulated datasets

We simulate 100 datasets, each consisting of 100 “in-sample” and 5 “out-of-sample” individuals. All “in-sample” individuals are assumed to be observed every three months from the first positive HIV test date to two years thereafter, resulting in nine observed values of univariate and bivariate outcomes. For each new individual we generate single and bivariate measures at HIV diagnosis, two weeks and one month afterwards. We use smaller time intervals between consecutive measurements for new individuals to investigate the feasibility of recency quantification within a reasonably short period after HIV diagnosis. The length of the seroconversion interval is assumed to be one year for both the “in-sample” and “out-of-sample” individuals. The seroconversion time for the “in-sample” individuals is generated uniformly from the seroconversion interval, $\tau_{i}\sim U(0,1)$, $i=1,\ldots,100$. The seroconversion times for the five “out-of-sample” individuals are set to five days (0.014 years), three months (0.250 years), six months (0.500 years), nine months (0.750 years) and 360 days (0.986 years) before the first positive HIV test date, respectively.

Univariate and bivariate realizations of antibody response and viral load are generated by using equations (1) and (5) with values of growth parameters as in Table 1. In this “realistic scenario”, the mean and variance of the random effects for univariate outcomes are chosen so as to resemble the log-transformed trajectories of existing biomarkers of antibody response, such as the Avidity Index, LAg Avidity and viral load [1, suligoi2002, 31, 17, 13, 14]. We generate one linearly (AR1) and three non-linearly (AR2, AR3, AR4) evolving biomarkers of antibody response with their mean trajectories shown in Figure 3. The asymptote of each non-linearly evolving antibody-response biomarker is a fixed effect for all individuals. Biomarkers AR2, AR3, and AR4 differ in terms of their growth rates, asymptotes and intercepts. In particular, biomarker AR4 is steeper compared to biomarkers AR2 and AR3 and has a higher asymptote.
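As a rough illustration of this data-generating design, the sketch below simulates one “in-sample” panel for the linear biomarker AR1 using the Table 1 values. It is a minimal sketch under stated assumptions, not the authors' simulation code: the names `simulate_individual`, `VISIT_TIMES` and `NEW_TIMES` are hypothetical, the seed is arbitrary, and "two weeks" and "one month" are taken to be 14/365 and 1/12 years.

```python
import math
import random

random.seed(1)  # arbitrary seed, for reproducibility only

N_IN_SAMPLE = 100
VISIT_TIMES = [0.25 * j for j in range(9)]   # 0, 0.25, ..., 2 years after first positive test
NEW_TIMES = [0.0, 14 / 365, 1 / 12]          # assumed visit times for a new individual
OUT_OF_SAMPLE_TAU = [5 / 365, 0.25, 0.5, 0.75, 360 / 365]  # years before first positive test

# AR1 parameters from Table 1 (linear model)
MU = (5.0, 2.0)
VAR_B1, COV_B12, VAR_B2 = 0.5, -0.19, 0.2
SIGMA_EPS = 0.1

# Cholesky factor of the 2x2 random-effects covariance matrix
l11 = math.sqrt(VAR_B1)
l21 = COV_B12 / l11
l22 = math.sqrt(VAR_B2 - l21 ** 2)

def simulate_individual(tau, times):
    """Draw one individual's random intercept and slope, then noisy AR1 values."""
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    b1 = MU[0] + l11 * z1
    b2 = MU[1] + l21 * z1 + l22 * z2
    return [b1 + b2 * (t + tau) + random.gauss(0, SIGMA_EPS) for t in times]

# Seroconversion times drawn uniformly over the one-year interval
taus = [random.uniform(0, 1) for _ in range(N_IN_SAMPLE)]
in_sample = [simulate_individual(tau, VISIT_TIMES) for tau in taus]
out_of_sample = [simulate_individual(tau, NEW_TIMES) for tau in OUT_OF_SAMPLE_TAU]
```

The non-linear biomarkers could be simulated in the same way by swapping the mean function and drawing a third random effect.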

Bivariate outcomes of two antibody-response biomarkers are assumed to have positively correlated intercepts ( $\rho=0.1$ ) and growth rates ( $\rho=0.5$ , see Table 1). High initial viral load may trigger rapid growth of antibodies; conversely, high initial antibody response may induce a rapid decline in viral load. We therefore assume that viral load intercepts and antibody growth rates are positively correlated ( $\rho=0.3$ ); as are antibody intercepts and viral load declines ( $\rho=0.3$ ). Subject-specific trajectories of bivariate outcomes, generated using equation (5) and parameter values presented in Table 1, are shown in Figure 4.
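The correlations quoted above can be checked against the covariance entries in Table 1 via $\rho=\mathrm{cov}/(\mathrm{sd}_{1}\,\mathrm{sd}_{2})$. The sketch below does this in Python. The helper name `corr` is hypothetical, and the index order is an assumption: the random-effects vector is taken to follow the order of $\mu_{\boldsymbol{\beta}}$ in Table 1.

```python
import math

def corr(cov, i, j):
    """Correlation between components i and j of a covariance matrix."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])

# Table 1, joint model AR1 & AR4; assumed order:
# (AR4 asymptote, AR4 intercept, AR4 log rate, AR1 intercept, AR1 slope)
cov_ar = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.4, -0.147, 0.045, -0.028],
    [0.0, -0.147, 0.6, -0.055, 0.173],
    [0.0, 0.045, -0.055, 0.5, -0.19],
    [0.0, -0.028, 0.173, -0.19, 0.2],
]

# Table 1, joint model AR4 & VL; assumed order:
# (AR4 asymptote, AR4 intercept, AR4 log rate, VL plateau, VL decay rate)
cov_vl = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.4, -0.147, 0.063, 0.134],
    [0.0, -0.147, 0.6, 0.232, 0.055],
    [0.0, 0.063, 0.232, 1.0, 0.3536],
    [0.0, 0.134, 0.055, 0.3536, 0.5],
]

rho_intercepts = corr(cov_ar, 1, 3)      # AR4 intercept vs AR1 intercept
rho_growth_rates = corr(cov_ar, 2, 4)    # AR4 log rate vs AR1 slope
rho_vl_level_ab_rate = corr(cov_vl, 2, 3)       # AR4 log rate vs VL level parameter
rho_ab_intercept_vl_decay = corr(cov_vl, 1, 4)  # AR4 intercept vs VL decay rate
```

In the decay model the viral load intercept is $2\beta_{1i}^{k}$, so its correlation with any other random effect equals that of the plateau parameter $\beta_{1i}^{k}$.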

The growth model parameters may affect the ability of antibody-response biomarkers and viral load to quantify the recency of HIV infection. For linearly evolving biomarkers, where the slope defines the change in biomarker values over time, a steep slope results in mean values differing at consecutive time points. Thus, the mean at each time point may be strongly related to a particular seroconversion time. For non-linearly evolving biomarkers, given a fixed asymptote, the intercept along with the growth rate define the time that the asymptote will be approached. In particular, a rapidly evolving biomarker will approach the asymptote within a short period after seroconversion. After the asymptote is approached, such a biomarker will no longer be discriminative of recency. In contrast, a slowly evolving biomarker will approach the asymptote a long time after seroconversion. However, the mean of such a biomarker may be very similar at different time points, with values that are challenging to relate to particular seroconversion times.
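To make the trade-off between fast and slow growth concrete, the sketch below compares the mean curves of AR2, AR3 and AR4 on two summaries derived from the three-parameter growth function: the slope at seroconversion, $(\beta_{1}-\beta_{2})\exp(\beta_{3})$, and the time needed to close a given fraction of the gap between intercept and asymptote, $-\log(1-f)/\exp(\beta_{3})$. This is an illustrative calculation, not part of the paper's analysis. It uses mean parameters only and ignores between-individual variability, and the 95% threshold and the function names are assumptions.

```python
import math

# Mean parameters (asymptote b1, intercept b2, log rate constant b3) from Table 1
MEANS = {
    "AR2": (0.0, -1.0, 1.0),
    "AR3": (0.0, -1.5, 0.5),
    "AR4": (1.5, -1.5, 0.8),
}

def initial_slope(b1, b2, b3):
    """Derivative of the mean curve at seroconversion (T = 0)."""
    return (b1 - b2) * math.exp(b3)

def time_to_fraction(b3, fraction=0.95):
    """Years until the given fraction of the intercept-to-asymptote gap is closed."""
    return -math.log(1.0 - fraction) / math.exp(b3)

# name -> (initial slope per year, years to close 95% of the gap)
summary = {name: (initial_slope(*p), time_to_fraction(p[2])) for name, p in MEANS.items()}
```

On these summaries AR4 has the largest initial slope of the three mean curves, while AR3 takes the longest to approach its asymptote.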

3.2 Analysing simulated datasets

The analysis is conducted in a Bayesian framework, using a Markov chain Monte Carlo (MCMC) algorithm as implemented in OpenBUGS 3.2.3 [20] to obtain the joint posterior distribution of the parameters of interest. Let $\pi(\cdot)$ and $p(\cdot)$ denote the prior and posterior distribution respectively.

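For each individual $i$ we have a vector $\boldsymbol{y}_{i}^{k}=(y_{i1}^{k},y_{i2}^{k},\dots,y_{in_{i}}^{k})^{T}$ of $n_{i}$ responses of biomarker $k$ generated from equation (1). The joint probability density function for $Y_{i}^{k}=(Y_{i1}^{k},Y_{i2}^{k},\dots,Y_{in_{i}}^{k})^{T}$, conditional on parameters, is given by:

$$f(\boldsymbol{y}_{i}^{k}\mid\boldsymbol{\beta}_{i}^{k},\boldsymbol{t}_{i},\tau_{i},\sigma_{\epsilon^{k}})=(2\pi)^{-\frac{n_{i}}{2}}\lvert\Sigma_{i}\rvert^{-\frac{1}{2}}\exp\Big\{-\frac{1}{2}\big(\boldsymbol{y}_{i}^{k}-\boldsymbol{g}_{k}(\tau_{i},\boldsymbol{t}_{i},\boldsymbol{\beta}_{i}^{k})\big)^{T}\Sigma_{i}^{-1}\big(\boldsymbol{y}_{i}^{k}-\boldsymbol{g}_{k}(\tau_{i},\boldsymbol{t}_{i},\boldsymbol{\beta}_{i}^{k})\big)\Big\}$$

where $\Sigma_{i}=\sigma_{\epsilon^{k}}^{2}\boldsymbol{I}_{n_{i}^{k}}$.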
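Because $\Sigma_{i}$ is a multiple of the identity matrix, the multivariate normal density factorises into independent univariate normal terms, and its logarithm reduces to $-\frac{n_{i}}{2}\log(2\pi)-n_{i}\log\sigma_{\epsilon^{k}}-\frac{1}{2\sigma_{\epsilon^{k}}^{2}}\sum_{j}(y_{ij}^{k}-g_{ij})^{2}$. A minimal Python sketch of this log-density follows. The function name `log_density` and the example values are hypothetical.

```python
import math

def log_density(y, g, sigma):
    """Log of the N(g, sigma^2 I) density evaluated at y.

    y and g are equal-length lists of observed and mean values;
    sigma is the measurement-error standard deviation.
    """
    n = len(y)
    rss = sum((yi - gi) ** 2 for yi, gi in zip(y, g))
    return -0.5 * n * math.log(2 * math.pi) - n * math.log(sigma) - 0.5 * rss / sigma ** 2

# Hypothetical example: three AR1-like measurements against their mean values
example = log_density([5.1, 5.6, 6.2], [5.0, 5.5, 6.0], 0.1)
```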

For univariate mixed-effects models, the generic form of the joint posterior distribution is:

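$$p(\boldsymbol{\Theta}_{1}\mid\boldsymbol{y}^{k})\propto\prod_{i=1}^{n}\Big\{f(\boldsymbol{y}_{i}^{k}\mid\boldsymbol{\beta}_{i}^{k},\boldsymbol{t}_{i},\tau_{i},\sigma_{\epsilon^{k}})\,\pi(\boldsymbol{\beta}_{i}^{k}\mid\mu_{\beta^{k}},\Sigma_{\beta^{k}})\,\pi(\tau_{i})\Big\}\times\pi(\mu_{\beta^{k}})\,\pi(\Sigma_{\beta^{k}})\,\pi(\sigma_{\epsilon^{k}})$$

where $\boldsymbol{\Theta}_{1}=\big\{\boldsymbol{\beta}_{i}^{k},\mu_{\beta^{k}},\Sigma_{\beta^{k}},\sigma_{\epsilon^{k}},\tau_{i}\big\}$.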

When two outcomes are measured at the same time, for each individual $i$ we have a vector $Y_{i}=(Y_{i1}^{1},Y_{i2}^{1},\dots,Y_{in_{i}}^{1},Y_{i1}^{2},Y_{i2}^{2},\dots,Y_{in_{i}}^{2})^{T}$ of length $2n_{i}$. The joint probability density function for $Y_{i}$, conditional on parameters, can be expressed as:

$$f(\boldsymbol{y}_{i}^{1},\boldsymbol{y}_{i}^{2}\mid\boldsymbol{\beta}_{i}^{1},\boldsymbol{\beta}_{i}^{2},\boldsymbol{t}_{i},\tau_{i},\Sigma_{\epsilon})=(2\pi)^{-n_{i}}\lvert\Sigma_{\epsilon}\rvert^{-\frac{1}{2}}\exp\Big\{-\frac{1}{2}Q_{i}^{T}\Sigma_{\epsilon}^{-1}Q_{i}\Big\}$$

where $Q_{i}=\big(\boldsymbol{y}_{i}^{1}-\boldsymbol{g}_{1}(\tau_{i},\boldsymbol{t}_{i},\boldsymbol{\beta}_{i}^{1}),\boldsymbol{y}_{i}^{2}-\boldsymbol{g}_{2}(\tau_{i},\boldsymbol{t}_{i},\boldsymbol{\beta}_{i}^{2})\big)^{T}$ and $\Sigma_{\epsilon}=\begin{pmatrix}\sigma_{\epsilon^{1}}^{2}\boldsymbol{I}_{n_{i}^{1}}&\boldsymbol{0}\\ \boldsymbol{0}&\sigma_{\epsilon^{2}}^{2}\boldsymbol{I}_{n_{i}^{2}}\end{pmatrix}$.

The generic form of the joint posterior distribution for a bivariate outcome is given by:

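$$p(\boldsymbol{\Theta}_{2}\mid\boldsymbol{y}^{1},\boldsymbol{y}^{2})\propto\prod_{i=1}^{n}\Big\{f(\boldsymbol{y}_{i}^{1},\boldsymbol{y}_{i}^{2}\mid\boldsymbol{\beta}_{i}^{1},\boldsymbol{\beta}_{i}^{2},\boldsymbol{t}_{i},\tau_{i},\Sigma_{\epsilon})\,\pi(\tau_{i})\,\pi(\boldsymbol{\beta}_{i}^{1},\boldsymbol{\beta}_{i}^{2}\mid\mu_{\boldsymbol{\beta}},\Sigma_{\boldsymbol{\beta}})\Big\}\,\pi(\mu_{\boldsymbol{\beta}})\,\pi(\Sigma_{\boldsymbol{\beta}})\,\pi(\Sigma_{\epsilon})$$

where $\boldsymbol{\Theta}_{2}=\{\boldsymbol{\beta}_{i}^{1},\boldsymbol{\beta}_{i}^{2},\mu_{\boldsymbol{\beta}},\Sigma_{\boldsymbol{\beta}},\Sigma_{\epsilon},\tau_{i}\}$.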

For each simulated dataset, the first 100 individuals are considered as the “in-sample” individuals for whom the seroconversion time ( $\tau_{1:(n-1)}$ ) is known. The analysis is conducted by assuming that we have only one new individual ( $n=101$ ) in each dataset, with unknown seroconversion time, as shown in Figure 5.

The joint posterior distribution of the univariate model displayed in Figure 5 is therefore:

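$$p(\tau_{n},\boldsymbol{\Theta}\mid\boldsymbol{y}^{k},\boldsymbol{t},\tau_{1:(n-1)})\propto\prod_{i=1}^{n-1}\Big\{f(\boldsymbol{y}_{i}^{k}\mid\boldsymbol{\beta}_{i}^{k},\boldsymbol{t}_{i},\tau_{i},\sigma_{\epsilon^{k}}^{2})\,\pi(\boldsymbol{\beta}_{i}^{k}\mid\mu_{\beta^{k}},\Sigma_{\beta^{k}})\Big\}\times f(\boldsymbol{y}_{n}^{k}\mid\boldsymbol{\beta}_{n}^{k},\boldsymbol{t}_{n},\tau_{n},\sigma_{\epsilon^{k}}^{2})\,\pi(\boldsymbol{\beta}_{n}^{k}\mid\mu_{\beta^{k}},\Sigma_{\beta^{k}})\,\pi(\tau_{n})\,\pi(\mu_{\beta^{k}})\,\pi(\Sigma_{\beta^{k}})\,\pi(\sigma_{\epsilon^{k}})$$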

where $\boldsymbol{\Theta}=\{\boldsymbol{\beta}_{i}^{k},\mu_{\beta^{k}},\Sigma_{\beta^{k}},\sigma_{\epsilon^{k}}^{2}\}$ . Similarly, the joint posterior distribution for a bivariate outcome is obtained by replacing $\boldsymbol{y}^{k}$ with $\boldsymbol{y}^{1},\boldsymbol{y}^{2}$ and $\boldsymbol{\Theta}=\{\boldsymbol{\beta}_{i}^{1},\boldsymbol{\beta}_{i}^{2},\mu_{\boldsymbol{\beta}},\Sigma_{\boldsymbol{\beta}},\Sigma_{\epsilon}\}$ .
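To give a feel for how a new individual's measurement updates the seroconversion time, the sketch below computes an approximate posterior for $\tau_{n}$ in a deliberately simplified setting. It is not the paper's method: it replaces MCMC with a grid approximation, plugs in the Table 1 values for AR1 instead of estimating them from an in-sample panel, and uses a single measurement taken at diagnosis. For the linear model the random intercept and slope can be integrated out analytically, so a measurement taken $T$ years after seroconversion is marginally normal with mean $\mu_{1}+\mu_{2}T$ and variance $\Sigma_{11}+2T\Sigma_{12}+T^{2}\Sigma_{22}+\sigma_{\epsilon}^{2}$. The function names and the observed value are hypothetical.

```python
import math

MU = (5.0, 2.0)                      # AR1 mean intercept and slope (Table 1)
V11, V12, V22 = 0.5, -0.19, 0.2      # AR1 random-effects covariance entries
SIGMA_EPS = 0.1                      # AR1 measurement-error SD

def marginal_density(y, T):
    """Density of one AR1 measurement y taken T years after seroconversion,
    with the random intercept and slope integrated out analytically."""
    mean = MU[0] + MU[1] * T
    var = V11 + 2 * T * V12 + T ** 2 * V22 + SIGMA_EPS ** 2
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def tau_posterior(y, n_grid=200):
    """Grid approximation to p(tau_n | y) under a Uniform(0, 1) prior on tau_n,
    for a single measurement taken at diagnosis (t = 0)."""
    grid = [(k + 0.5) / n_grid for k in range(n_grid)]
    weights = [marginal_density(y, tau) for tau in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]

grid, post = tau_posterior(y=5.1)    # hypothetical observed value
posterior_mean_tau = sum(t * p for t, p in zip(grid, post))
```

A full analysis would also propagate uncertainty in the population parameters, which this plug-in sketch ignores.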

Priors

Each new individual is assumed to have unknown seroconversion time occurring in their seroconversion interval, $\tau_{n}\in[t_{n}^{-ve},t_{n}^{+ve}]$. Our a priori belief is therefore that $\tau_{n}\sim \mathrm{Uniform}(0,1)$, since each seroconversion interval is of length 1 year. Vague Gaussian priors, $N(0,10^{6})$, are placed on the means of the random effects $(\mu_{\beta^{k}},\mu_{\boldsymbol{\beta}})$, while $\sigma_{\epsilon^{k}}^{2}$ is given an inverse-Gamma prior, $\sigma_{\epsilon^{k}}^{2}\sim IG(2,0.01)$. Each variance-covariance matrix of the random effects ($\Sigma_{\beta^{k}},\Sigma_{\boldsymbol{\beta}}$) and of the measurement error ($\Sigma_{\epsilon}$) is given an inverse-Wishart prior with degrees of freedom equal to one plus the matrix dimension. These priors effectively place a uniform distribution on each of the correlation parameters [9].

For each simulated dataset, MCMC samples from the marginal posterior distribution of the seroconversion time $p(\tau_{n}|\boldsymbol{y},\boldsymbol{t},\tau_{1:(n-1)})$ are obtained. These samples are used to derive posterior probabilities, $P_{X}=Pr(\tau_{n}\leq X|\boldsymbol{y},\boldsymbol{t},\tau_{1:(n-1)})$ , of a new individual having seroconverted in the last $X$ years before HIV diagnosis. We evaluate the predictive ability of the proposed models in quantifying the recency of HIV infection by calculating these probabilities for $X=0.167,0.333$ and $0.5$ years before HIV diagnosis ( $P_{2},P_{4}$ and $P_{6}$ respectively, corresponding to 2, 4 and 6 months).
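Given posterior draws of $\tau_{n}$ , each probability $P_{X}$ is simply the proportion of draws not exceeding $X$ . A minimal Python sketch follows; the function name `prob_recent` and the example draws are hypothetical and are not output from the models described here.

```python
def prob_recent(tau_samples, x_years):
    """Monte Carlo estimate of P_X = Pr(tau_n <= X | data) from posterior draws."""
    return sum(1 for t in tau_samples if t <= x_years) / len(tau_samples)


# Cut-offs used in the text, in years before HIV diagnosis.
cutoffs = {"P2": 0.167, "P4": 0.333, "P6": 0.5}

# Hypothetical posterior draws of tau_n, for illustration only.
tau_samples = [0.05, 0.1, 0.2, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9, 0.95]
probs = {name: prob_recent(tau_samples, x) for name, x in cutoffs.items()}
```

In practice `tau_samples` would hold the retained MCMC draws of $\tau_{n}$ after burn-in.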

4 Results

4.1 Probability of recent seroconversion

The simulated data are initially analysed assuming that the new individual contributes only a single measurement taken at HIV diagnosis. The analysis is repeated including consecutive measurements of the new individual taken either two weeks or one month after HIV diagnosis. No significant differences are observed when we add consecutive measurements to the estimation process. We therefore present results based only on the measurements taken at HIV diagnosis.

The distributions of the probabilities $P_{2},P_{4},P_{6}$ are summarized over the 100 simulated datasets in Figure 6 for each of the 5 “out-of-sample” individuals. For a perfectly discriminatory biomarker, we would expect these probabilities to lie near 0 or 1 depending on the truth. For instance, $P_{2}$ should lie around 0 for all patients with true seroconversion occurring more than two months before HIV diagnosis.

4.1.1 Single outcome

The linearly evolving biomarker AR1 leads to very similar estimates of $P_{2},P_{4}$ and $P_{6}$ for each new patient. In particular, $P_{2}$ is estimated to lie below 0.05 even for a new patient with true seroconversion occurring five days before HIV diagnosis (see Figure 6). A possible explanation might be that biomarker AR1 is only gradually evolving, so that its observed values are too similar across time.

The non-linear biomarkers of antibody response with a low asymptote, such as AR2 and AR3, perform worse than AR4 (see Web Appendix C, Figure 8). They lead to flat posterior distributions of seroconversion time (see Web Appendix B, Figures 3-7). On the other hand, AR4 and viral load seem to be quite discriminative, providing strong information on seroconversion time, especially for seroconversions occurring a few days before HIV diagnosis. However, as both biomarkers approach their asymptotes, their ability to discriminate the seroconversion time vanishes. For instance, $P_{6}$ is greater than 0.6 for patients with long-standing infections, using the univariate model of viral load.

4.1.2 Bivariate outcomes

The quantification of recency is improved by using joint models of two biomarkers of interest. In particular, the joint model of AR4 and VL is able to accurately estimate $P_{2}$ for all new patients. It leads to estimates above 0.9 and below 0.1 for very recent and long-standing infections respectively. Similar results are obtained when we use a bivariate outcome of two antibody-response biomarkers. Notably, the combination of AR4 and AR1 leads to the most accurate estimates of $P_{2},P_{4}$ and $P_{6}$ for those individuals with long-standing infections (see Figure 6). More specifically, it gives estimates of $P_{6}$ below 0.05 for a new patient with $\tau_{n}=0.986$ years before HIV diagnosis, compared to 0.2 when a univariate model of AR4 is used. This improvement might be due to the fact that AR1 is linear and so does not plateau, providing some information on recency even if HIV diagnosis takes place a long time after infection.

The accuracy of the estimation clearly depends on particular characteristics of biomarkers of interest, as well as on the timing of the first measurement. If we had to choose only a single biomarker to quantify recency, we would prefer a rapidly evolving biomarker such as AR4. If viral load is available, a joint model of antibody response and viral load will lead to more accurate estimates of the probability of having seroconverted recently. However, even a joint model of two biomarkers lacks the ability to provide reliable estimates of the recency for all new patients.

4.2 Ideal scenario

A further simulation exercise (see Web Appendix A) reveals that the magnitude of between-individual heterogeneity has a significant effect on the discriminatory ability of each biomarker or combination of biomarkers. To investigate this effect, we generated and analysed data as described previously, but with the variances of all the random effects set to 0.01, in an “ideal scenario”.

For any new individual, we obtain more accurate estimates of the probabilities $P_{2},P_{4}$ and $P_{6}$ when univariate and bivariate outcomes are generated under the ideal scenario than under the realistic scenario (see Figure 7).

4.2.1 Single outcome

Univariate models of AR4 and viral load lead to high values of $P_{2}$ for recently infected individuals, and very low values for long-standing infections. The same pattern is observed for $P_{4}$ when univariate outcomes are used. However, for a patient with true seroconversion occurring exactly six months before HIV diagnosis, all univariate outcomes lead to $P_{6}$ below 0.6 (see Web Appendix Figure 9).

4.2.2 Bivariate outcomes

Bivariate joint models improve the estimates of $P_{2},P_{4}$ and $P_{6}$ compared to their univariate counterparts. In particular, for a new patient who has seroconverted a few days ( $\tau_{n}=0.014$ years) before HIV diagnosis, the bivariate joint models lead to $P_{2}\geq 0.95$ . Furthermore, for patients who have seroconverted more than two months before HIV diagnosis, both bivariate models lead to estimates of $P_{2}\leq 0.05$ .

For $\tau_{n}=0.5$ , all models give a small probability of having seroconverted in the last six months, with the joint models leading to the smallest estimates. A possible explanation might be that the non-linear biomarkers of antibody response usually approach their asymptote around the first six months from seroconversion [4, hargrove2008, 31, 30]. Therefore, all measurements taken six months after seroconversion are very similar and are indicative of long-standing infections, but cannot discriminate the actual time of seroconversion.

5 Discussion

We have investigated a fully Bayesian approach to quantify the recency of HIV infection for a newly diagnosed individual, using values of one or more biomarkers and information on biomarker evolution obtained from a panel of HIV-infected individuals. This is the first study to investigate the ability of biomarkers of both antibody response and viral presence to quantify recency at an individual level. We have also explored the characteristics that affect the ability of such biomarkers to provide reliable estimates of the probability of having recently seroconverted. Linear and non-linear mixed-effects models are used to describe the growth/decline trajectories of biomarkers. We have introduced a bivariate non-linear mixed-effects model which allows different non-linear trends to be modelled simultaneously.

To our knowledge, few studies whose main interest is the estimation of the seroconversion time have been published, and these usually use the CD4 T-cell count as the biomarker of interest [23, 24, 7]. Bivariate linear mixed-effects models have been proposed for markers of immunological and virological status [3, 25, 32], but examples of multivariate non-linear mixed-effects models are less common in the context of HIV [29, 15]. The bivariate non-linear mixed-effects models proposed in Section 2 can be used when the aim of the study is to explore the association between the evolutions of two non-linear outcomes. The proposed method can be easily extended to a multivariate non-linear model if more than two outcomes are available.

The results of the simulation study suggest that we are able to learn about the probability of having recently seroconverted from longitudinal data on biomarkers of recent infection. The accuracy of the estimation is highly influenced by particular characteristics of the markers, as well as by the time of HIV diagnosis. The magnitude of the growth or decline rate plays a vital role in the estimation, with rapidly-evolving biomarkers (e.g. over 3-6 months) providing more precise estimates of recency. The results also indicate that the level of the asymptote of the non-linear biomarkers affects their ability to discriminate the recency of infection.

In practice, physicians are interested in using a single biomarker to quantify the recency of HIV infection, especially if multiple biomarker measures are challenging to obtain due to time and cost restrictions. Biomarker AR4 seems to provide reliable estimates of the probability of having recently seroconverted when used in a univariate model. As shown in Web Appendix A, Figure 1, the distributions of AR4 at different time points overlap less than those of all the other single biomarkers. Therefore, we suggest using an antibody-response biomarker similar to AR4, such as LAg Avidity, if restricted to a single biomarker.

However, we have demonstrated that the use of bivariate joint models improves the quantification of recency. Having accounted for the correlation between biomarkers, the resulting posterior distributions of seroconversion time for each new individual have narrower 95% highest posterior density (HPD) intervals than those from the univariate models. A combination of antibody response and viral load seems to perform slightly better for those seroconverting up to six months before HIV diagnosis. By contrast, for seroconversions occurring nine months or almost one year before HIV diagnosis, we obtain marginally better estimates when two antibody-response biomarkers are used in the estimation process. This result may be due to both non-linear antibody-response biomarkers and viral load approaching their asymptotes six months after seroconversion, when they are no longer discriminative. On the other hand, the antibody-response biomarker AR1, which is linearly evolving, allows the bivariate joint model to discriminate the seroconversion time for long-standing infections. Overall, we recommend a combination of two antibody-response biomarkers with different growth patterns, such as AR1 and AR4, to quantify recency of HIV infection.
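For reference, an HPD interval can be approximated from posterior draws as the shortest interval containing the required share of the draws. The Python sketch below uses this common shortest-interval approach, which assumes a unimodal posterior; it is not necessarily the procedure used to produce the results reported here, and the function name `hpd_interval` and the example draws are hypothetical.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the draws.

    Approximates the HPD interval when the posterior is unimodal.
    """
    s = sorted(samples)
    n = len(s)
    # Number of draws the interval must contain (small offset guards
    # against floating-point error in mass * n).
    k = max(1, math.ceil(mass * n - 1e-12))
    # Slide a window of k consecutive sorted draws and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[best], s[best + k - 1]


# Hypothetical posterior draws of tau_n, for illustration only.
tau_samples = [0.0, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.9]
lower, upper = hpd_interval(tau_samples, mass=0.8)
```

Comparing `upper - lower` between a univariate and a bivariate fit for the same individual gives the interval-width comparison described above.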

Surprisingly, no significant differences were found in the quantification of recency when we used additional bivariate outcomes taken two weeks or one month after HIV diagnosis for each new individual. It seems that the first measurement of the bivariate outcome alone is adequate to distinguish recency, especially if it is taken soon after seroconversion. On the other hand, the univariate models become slightly more discriminative with additional information taken every two weeks after diagnosis (results not shown).

The crucial finding to emerge from the analysis is that the heterogeneity between individuals plays a vital role in the discriminatory ability of biomarkers of interest. When the between-subjects variability is reduced to a minimum, the values of the bivariate outcome are indicative of the seroconversion time. As we increase the between-subjects variability, the biomarkers of recency become less discriminative, leading to very flat posterior distributions of seroconversion time. However, it is challenging to find currently existing biomarkers that are as homogeneous as those generated under the ideal scenario. Researchers developing new and/or alternative biomarkers for recent infection should aim to find markers that have minimum heterogeneity, if they are to be valuable for estimation at individual level.

The proposed method is based on specific assumptions about the number and time of measurements, the length of seroconversion intervals, the distribution of random effects and our prior beliefs about the parameters of interest. Further work might consider different designs of observation times, leading to unbalanced longitudinal data, or wider seroconversion intervals. The seroconversion time for the new individual is given a uniform prior which reflects our belief that seroconversion is equally likely to occur at any time between the last negative and the first positive HIV test date. If information on testing behaviour is available, it may be incorporated in the choice of a different prior distribution.

Throughout this paper, we assume that all the information for the in-sample individuals is available. In practice, we may not have access to all data for the in-sample individuals but we may only know the posterior distribution of the growth model parameters. In this case, a two-stage analysis would be more applicable, where this posterior distribution is used as a prior distribution in the second stage.
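A two-stage analysis of this kind might be sketched as follows, under strong simplifying assumptions that are not part of the models in this paper: a single linearly evolving biomarker, one measurement at diagnosis, a Uniform(0,1) prior on the time since seroconversion, and stage-one posterior draws of the new individual's intercept, slope and residual standard deviation. The function name `two_stage_posterior` and all numerical values are hypothetical.

```python
import math
import random


def two_stage_posterior(y_new, stage1_draws, grid_size=200):
    """Grid approximation to p(tau | y_new) under a Uniform(0, 1) prior.

    stage1_draws: iterable of (intercept, slope, sigma) tuples drawn from
    the stage-one posterior, used here as the prior for the growth model.
    """
    draws = list(stage1_draws)
    grid = [(j + 0.5) / grid_size for j in range(grid_size)]
    weights = []
    for tau in grid:
        # Average the Gaussian likelihood over the stage-one draws
        # (Monte Carlo integration over the growth parameters).
        lik = 0.0
        for b0, b1, sigma in draws:
            z = (y_new - (b0 + b1 * tau)) / sigma
            lik += math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
        weights.append(lik / len(draws))
    total = sum(weights)
    return grid, [w / total for w in weights]


# Hypothetical stage-one draws and new measurement, for illustration only.
rng = random.Random(0)
stage1_draws = [(rng.gauss(0.0, 0.05), rng.gauss(1.0, 0.1), 0.05)
                for _ in range(500)]
grid, post = two_stage_posterior(0.3, stage1_draws)
p6 = sum(p for t, p in zip(grid, post) if t <= 0.5)  # Pr(tau <= 0.5 | y_new)
```

The grid approximation is workable here only because $\tau_{n}$ is one-dimensional and bounded; a full second-stage MCMC would be needed for the non-linear and bivariate models.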

Despite the further research required, we have provided a valuable proof-of-concept that fully Bayesian linear and non-linear mixed effects models for multiple biomarkers may be combined in joint models to improve estimation of the recency of HIV infection.

Acknowledgements

The authors would like to thank Dr Shaun Seaman, Dr Brian Tom, Dr Alex Welte, Dr Eduard Grebe, and the CEPHIA group for helpful discussions. This work was supported by the Medical Research Council [Unit Programme Number U105260566]; Public Health England; and the NIHR HPRU in Evaluation of Interventions.

Bibliography

[1] Alcabes, P., Munoz, A., Vlahov, D., & Friedland, G. H. (1993). Incubation period of human immunodeficiency virus. Epidemiologic Reviews, 15(2), 303-318.
[2] Borremans, B., Hens, N., Beutels, P., Leirs, H., & Reijniers, J. (2016). Estimating time of infection using prior serological and individual information can greatly improve incidence estimation of human and wildlife infections. PLOS Computational Biology, 12(5).
[3] Chakraborty, H., Helms, R. W., Sen, P. K., & Cohen, M. S. (2003). Estimating correlation by using a general linear mixed model: evaluation of the relationship between the concentration of HIV-1 RNA in blood and semen. Statistics in Medicine, 22(9), 1457-1464.
[4] Chawla, A., Murphy, G., Donnelly, C., Booth, C. L., Johnson, M., Parry, J. V., … & Geretti, A. M. (2007). Human immunodeficiency virus (HIV) antibody avidity testing to identify recent infection in newly diagnosed HIV type 1 (HIV-1)-seropositive persons infected with diverse HIV-1 subtypes. Journal of Clinical Microbiology, 45(2), 415-420.
[5] Chen, J., Wang, L., Chen, J. J. Y., Sahu, G. K., Tyring, S., Ramsey, K., … & Cloyd, M. W. (2002). Detection of antibodies to human immunodeficiency virus (HIV) that recognize conformational epitopes of glycoproteins 160 and 41 often allows for early diagnosis of HIV infection. Journal of Infectious Diseases, 186(3), 321-331.
[6] Diggle, P., Heagerty, P., Liang, K. Y., & Zeger, S. (2013). Analysis of longitudinal data. Oxford University Press.
[7] Dubin, N., Berman, S., Marmor, M., Tindall, B., Jarlais, D. D., & Kim, M. (1994). Estimation of time since infection using longitudinal disease marker data. Statistics in Medicine, 13(3), 231-244.
[8] Fitzmaurice, G., Davidian, M., Verbeke, G., & Molenberghs, G. (2008). Longitudinal data analysis. CRC Press.