Value of Information: Sensitivity Analysis and Research Design in Bayesian Evidence Synthesis

Christopher Jackson; Anne Presanis; Stefano Conti; Daniela De Angelis

arXiv:1703.08994·stat.AP·November 25, 2021


TL;DR

This paper develops methods for Value of Information analysis in Bayesian evidence synthesis, helping identify key parameters and optimal data collection strategies to improve decision-making accuracy.

Contribution

It extends VoI techniques to Bayesian evidence synthesis, providing a framework for prioritizing data collection and understanding parameter influence in complex models.

Findings

1. Identified key parameters affecting HIV prevalence estimates.
2. Quantified expected improvements from additional data collection.
3. Demonstrated applicability to real-world health data synthesis.


Equations

$$EVPI=E_{\bm{\theta}}(L(d^{*},\bm{\theta}))-E_{\bm{\theta}}(L(d^{*}_{\bm{\alpha}},\bm{\theta}))$$

$$EVPPI(\bm{\phi})=E_{\bm{\theta}}(L(d^{*},\bm{\theta}))-E_{\bm{\phi}}[E_{\bm{\theta}|\bm{\phi}}(L(d^{*}_{\bm{\phi}},\bm{\theta}))]$$

$$EVSI(\mathbf{y})=E_{\bm{\theta}}(L(d^{*},\bm{\theta}))-E_{\mathbf{y}}[E_{\bm{\theta}|\mathbf{y}}(L(d^{*}_{\mathbf{y}},\bm{\theta}))]$$

$$L(\hat{\bm{\alpha}},\bm{\alpha})=(\hat{\bm{\alpha}}-\bm{\alpha})^{T}H(\hat{\bm{\alpha}}-\bm{\alpha})$$

$$E_{\psi}(L(d^{*}_{\psi},\bm{\theta}))=h(E_{\psi}(\bm{\alpha}))$$

$$\bm{\alpha}=E_{\bm{\alpha}|\bm{\phi}}(\bm{\alpha}|\bm{\phi})+\epsilon=g(\bm{\phi})+\epsilon$$

$$E_{\bm{\phi}}[E_{\bm{\theta}|\bm{\phi}}(L(d^{*}_{\bm{\phi}},\bm{\theta}))]=E_{\bm{\phi}}[h(E_{\bm{\alpha}|\bm{\phi}}(\bm{\alpha}|\bm{\phi}))]\approx\frac{1}{K}\sum_{k=1}^{K}h(\hat{g}(\bm{\phi}^{(k)}))$$

$$E_{\bm{\phi}}[E_{\bm{\theta}|\bm{\phi}}(L(d^{*}_{\bm{\phi}},\bm{\theta}))]\approx\frac{1}{K}\sum_{k=1}^{K}\max_{d}\{\hat{g}_{d}(\bm{\phi}^{(k)})\}$$

$$\bm{\alpha}=E_{\bm{\alpha}|\mathbf{y}}(\bm{\alpha}|T(\mathbf{y}))+\epsilon=g(T(\mathbf{y}))+\epsilon$$

$$E_{\mathbf{y}}[E_{\bm{\theta}|\mathbf{y}}(L(d^{*}_{\mathbf{y}},\bm{\theta}))]=E_{\mathbf{y}}[h(E_{\bm{\alpha}|\mathbf{y}}(\bm{\alpha}|\mathbf{y}))]\approx\frac{1}{K}\sum_{k=1}^{K}h(\hat{g}(T(\mathbf{y}^{(k)})))$$

$$\overline{(\pi\delta)}_{G}=\pi^{(UN)}+\pi^{(OP)}$$

$$\pi^{(UN)}=\gamma_{1}(1-\gamma_{2})p^{(UN)}$$

$$\pi^{(OP)}=\gamma_{1}\gamma_{2}(1-\gamma_{3})(\gamma_{4}+a^{(EX)})$$

$$\pi^{(GA)}=(\overline{(\pi\delta)}_{G}+\pi^{(GD)})/\gamma_{1}$$

$$(\pi\delta)_{G}=(1-\gamma_{1})+\gamma_{1}\gamma_{2}\gamma_{3}\gamma_{4}$$


Code & Models

Repositories

chjackson/voibayes


Full text


Value of Information: Sensitivity Analysis and Research Design in Bayesian Evidence Synthesis

Christopher Jackson, Anne Presanis, Stefano Conti, Daniela De Angelis

MRC Biostatistics Unit, University of Cambridge; NHS England

This work was funded by the Medical Research Council, grant code U105260566

Abstract

Suppose we have a Bayesian model which combines evidence from several different sources. We want to know which model parameters most affect the estimate or decision from the model, or which of the parameter uncertainties drive the decision uncertainty. Furthermore we want to prioritise what further data should be collected. These questions can be addressed by Value of Information (VoI) analysis, in which we estimate expected reductions in loss from learning specific parameters or collecting data of a given design. We describe the theory and practice of VoI for Bayesian evidence synthesis, using and extending ideas from health economics, computer modelling and Bayesian design. The methods are general to a range of decision problems including point estimation and choices between discrete actions. We apply them to a model for estimating prevalence of HIV infection, combining indirect information from several surveys, registers and expert beliefs. This analysis shows which parameters contribute most of the uncertainty about each prevalence estimate, and provides the expected improvements in precision from collecting specific amounts of additional data.

Keywords: decision theory, research prioritisation, uncertainty

1 Introduction

Bayesian modelling is a natural paradigm for decision making, in the presence of uncertainty, based on multiple sources of evidence. However, as more data sources, parameters and assumptions are built into a model, it becomes harder to see the influence of each input or assumption. The modelling process should involve an investigation of where the weak parts of the model are, to identify which uncertainties in the model inputs contribute most to the uncertainty in the final result or decision (sensitivity analysis). We might then want to assess and compare the potential value of obtaining datasets of specific designs or sizes to strengthen different parts of the model. Furthermore, we may want to formally trade off the costs of sampling with the resulting expected improvement to decision making.

Annual estimation of HIV prevalence in the United Kingdom has, for several years, been based on a Bayesian synthesis of evidence from various surveillance systems and other surveys (Goubar et al., 2008; Presanis et al., 2010; De Angelis et al., 2014; Kirwan et al., 2016). This is an example of a class of problems called multiparameter evidence synthesis (e.g. Ades and Sutton, 2006), where the quantities of interest are not directly observable, but can be inferred from multiple indirect data sources linked through a network of model assumptions that can be expressed as a directed acyclic graph. Markov Chain Monte Carlo is typically required to estimate the posterior. The model is typically used to inform health policies, and in this context it is important to be able to assess sensitivity to uncertain model inputs and to indicate how the model could be strengthened with further data.

These dual aims can be achieved with value of information (VoI) analysis, a decision-theoretic framework based on expected reductions in loss from future information. The concepts of VoI were first set out in detail by Raiffa and Schlaifer (1961), while Parmigiani and Inoue (2009) give a more recent overview. The expected value of partial perfect information (EVPPI) is the expected reduction in loss if the exact value of a particular parameter or parameters $\bm{\theta}_{0}$ were learnt, also interpreted as the amount of decision uncertainty that is due to $\bm{\theta}_{0}$ . The expected value of sample information (EVSI) is the expected reduction in loss from a study of a specific design. These concepts have been applied in various forms in three distinct areas: health economics, computer modelling and Bayesian design.

In health economic modelling, there is a large literature on calculation and application of VoI, see, e.g. Felli and Hazen (1998); Willan and Pinto (2005); Claxton and Sculpher (2006); Welton et al. (2008). The model output in this case is the expected net benefit of each alternative policy, a known deterministic function $g(\bm{\theta})$ of uncertain inputs $\bm{\theta}$, and the decision problem is the choice of policy that maximises the expected value of $g(\bm{\theta})$. In computer modelling, see, e.g. Oakley and O’Hagan (2004) and Saltelli et al. (2004), the influence of a particular element $\bm{\theta}_{0}$ of $\bm{\theta}$ is calculated as the expected reduction in $\mbox{var}(g(\bm{\theta}))$ if we were to learn $\bm{\theta}_{0}$ exactly. This is equivalent to the EVPPI for $\bm{\theta}_{0}$ under a decision problem defined as point estimation of $g(\bm{\theta})$ with quadratic loss (Oakley and O’Hagan, 2004). The decision-theoretic view of Bayesian experimental design also has a long history, see, e.g. Lindley (1956); Bernardo and Smith (1994); Chaloner and Verdinelli (1995); Berger (2013), and a recent review of the computational challenges by Ryan et al. (2016).

However, the current tools in any one of these three areas cannot be applied directly to multiparameter evidence synthesis. For example, it is not always feasible or desirable to make a discrete decision with a quantifiable loss, as in health economic modelling, as often the aim of an evidence synthesis is simply to estimate one or more quantities. For a scalar quantity of interest, we might then define the “loss” as the posterior variance of this quantity, as in Oakley and O’Hagan (2004). In computer modelling, however, tools to estimate the expected value of a proposed study to learn a particular $\bm{\theta}_{0}$ more precisely have not been developed, and it is not clear what an appropriate loss for a vector of model outputs would be. Challenges also arise with computation. Current methods for computing the expected variance reduction in the computer modelling field (Sobol’, 2001; Saltelli et al., 2004) assume the output is a known function $g(\bm{\theta})$ of the inputs, and therefore do not apply in multiparameter evidence synthesis, where MCMC is required to obtain the output. For Bayesian design, Ryan et al. (2016) reviewed methods where evaluating the expected utility of a design (equivalent to the EVSI) is relatively inexpensive, so that maximising the utility over a complex design space is feasible. However, this can again be difficult with MCMC. Given a sample from the posterior $p(\theta|\mathbf{x})$, potential future datasets $\mathbf{y}$ under a specific design can be simulated cheaply from the posterior predictive distribution, but to obtain the expected utility the posterior $p(\theta|\mathbf{x},\mathbf{y})$ must then be updated repeatedly for different $\mathbf{y}$, which is only feasible with Monte Carlo for smaller problems (e.g. Han and Chaloner, 2004).

We describe a VoI framework for sensitivity analysis and research design in evidence syntheses based on graphical models, using and extending methods from health economics, computer modelling and Bayesian design. This is a broader class of models than those typically used in health economics or computer modelling, since the model “output” is not necessarily a known function of the inputs, but depends on the model parameters $\bm{\theta}$ and observed data $\mathbf{x}$ through a network of statistical models or deterministic functions, potentially with hierarchical relationships. We apply VoI methods to the part of the HIV prevalence estimation model that estimates prevalence in men who have sex with men (MSM), in London. Here the decision problem is point estimation of a single scalar or a vector of parameters. We use ideas from Bayesian design to choose appropriate loss functions in this context. We also show how methods of computing EVPPI (Strong et al., 2014) and EVSI (Strong et al., 2015) for finite choices in health economics, based on fitting a non-parametric regression to a sample from the posterior, can be generalised to a broader class of decision problems, including point estimation. The method for computing EVSI enables the expected utility over all potential $\mathbf{y}$ to be estimated cheaply without an additional level of simulation, assuming only that the information provided by $\mathbf{y}$ can be represented as a low-dimensional sufficient statistic $T(\mathbf{y})$ .

In Section 2 we describe the general multiparameter evidence synthesis model and define the expected value of information under different decision problems and loss functions, and in Section 3 we present methods to compute them. In Section 4 we describe the model for HIV prevalence estimation, and in Section 5 we use VoI to identify the areas of greatest uncertainty in this model and show where collecting specific data would improve the precision of the estimates of various subgroup-specific prevalences. Finally we discuss potential extensions to the methods and application and the associated challenges.

2 Theory and methods

2.1 Bayesian graphical modelling for evidence synthesis

In our motivating applications, the general model can be represented as a directed acyclic graph (Figure 1) in the standard way, see, e.g. Lauritzen (1996). Nodes in the graph may represent scalar or vector quantities. A set of datasets $\mathbf{x}=\{x_{1},\ldots,x_{n}\}$ is observed, most generally from $n$ different sources. These data are assumed to arise from statistical models with parameters $\mu_{1},\ldots,\mu_{n}$ respectively, collectively denoted $\bm{\mu}$ . The “founder nodes” of the graph are denoted $\bm{\phi}=(\phi_{1},\ldots,\phi_{p})$ and given a joint prior distribution $\bm{\phi}\sim p(.)$ which may also include substantive information. The full set of unknowns is denoted $\bm{\theta}$ . Most simply, the $\bm{\mu}$ could equal the $\bm{\phi}$ or be related to the $\bm{\phi}$ through deterministic functions, so that $\bm{\theta}=\bm{\phi}$ . More generally, some of the relationships in the graph could be stochastic, defining a hierarchical model, where the $\bm{\mu}$ themselves arise from a distribution with parameters given by the $\bm{\phi}$ or descendants of $\bm{\phi}$ . $\bm{\theta}$ would then comprise $\bm{\phi}$ and the stochastic descendants of $\bm{\phi}$ such as random effects.

We further denote $\bm{\alpha}$ as an intermediate node in the graph, the model “output”, which is used for decision-making. This could be any unknown quantity, including one of the $\bm{\mu}$ or $\bm{\phi}$ , a function of these, or a prediction of new data. We may also be in a position to collect additional data, either from the same source as one of the existing datasets (e.g. $y_{1}$ in Figure 1), or from a new source informing a parameter $\mu_{n+1}$ on which no direct data ( $y_{2}$ ) were available.

This DAG (Figure 1) is a generalisation of the typical structure (Figure 2) used in computer modelling (Oakley and O’Hagan, 2004) where the output $\bm{\alpha}$ is a known (usually complicated) deterministic function of uncertain model inputs $\bm{\phi}$ , which are given substantive priors that may be derived separately from data.

2.2 Expected value of information: definitions

In a general decision-theoretic framework, the purpose of the model is to choose a decision or action $d$ from a space of possible decisions $\mathcal{D}$ , to minimise an expected loss $E_{\bm{\theta}}(L(d,\bm{\theta}))$ , with the expectation taken with respect to the posterior distribution of $\bm{\theta}$ . Let $\bm{\alpha}=\bm{\alpha}(\bm{\theta})$ be the minimal subset or function of $\bm{\theta}$ necessary to make the decision, so that $E_{\bm{\theta}}(L(d,\bm{\theta}))=E_{\bm{\alpha}}(L(d,\bm{\theta}))$ , $\forall d\in\mathcal{D}$ . For example, the purpose could be the choice of decision $d$ among a finite set $\mathcal{D}=\{1,\ldots,D\}$ expected to minimise a loss defined as a function of the parameters, so that $\bm{\alpha}$ would be a vector with $D$ components $\alpha_{d}=f_{d}(\bm{\theta})$ say, with $L(d,\bm{\theta})=\alpha_{d}$ . This is the typical situation in health policy decisions (e.g. Claxton and Sculpher, 2006), where a treatment $d$ is chosen to maximise a measure of utility such as expected quality-adjusted survival. Alternatively, as in our examples, the decision could simply be the choice of a point estimate $\hat{\bm{\alpha}}$ of some parameter $\bm{\alpha}$ , in which case the decision space $\mathcal{D}$ is the (typically continuous) support of $\bm{\alpha}$ (see §2.3).

For general decision problems, let $d^{*}=\operatorname*{arg\,min}_{d}E_{\bm{\theta}}(L(d,\bm{\theta}))$ be the optimal decision under current knowledge about $\bm{\theta}$ , represented by the posterior distribution $p(\bm{\theta}|\mathbf{x})$ . Suppose now we are in a position to collect new information. Let $d^{*}_{\mathbf{y}}$ be the optimal decision given further knowledge of a quantity $\mathbf{y}$ (either parameters or potential data) that informs $\bm{\alpha}$ , so that the updated posterior would be $p(\bm{\theta}|\mathbf{x},\mathbf{y})$ .

The expected value of perfect information (EVPI) is the expected loss of the decision $d^{*}$ under current information, minus the expected loss for the decision $d^{*}_{\bm{\alpha}}$ we would make if we knew the true $\bm{\alpha}$ (Raiffa and Schlaifer, 1961).

$$EVPI=E_{\bm{\theta}}(L(d^{*},\bm{\theta}))-E_{\bm{\theta}}(L(d^{*}_{\bm{\alpha}},\bm{\theta}))$$

Since additional information is always expected to reduce the expected loss of the optimal decision (Parmigiani and Inoue, 2009), the EVPI is an upper bound on the expected gains from any new information.

The expected value of partial perfect information (EVPPI) for a particular (scalar or vector) parameter $\bm{\phi}$ is the expected reduction in loss if $\bm{\phi}$ were to be known precisely:

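$$EVPPI(\bm{\phi})=E_{\bm{\theta}}(L(d^{*},\bm{\theta}))-E_{\bm{\phi}}[E_{\bm{\theta}|\bm{\phi}}(L(d^{*}_{\bm{\phi}},\bm{\theta}))]\qquad(1)$$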

where $d^{*}_{\bm{\phi}}$ is the optimal decision if $\bm{\phi}$ were known. This is an upper bound on the potential value of data $\mathbf{y}$ which inform only $\bm{\phi}$. In a graphical model, this means data $\mathbf{y}$ that are conditionally independent of $\bm{\theta}$ given $\bm{\phi}$, for example $\mathbf{y}=y_{1}$ and $\bm{\phi}=\mu_{1}$ in Figure 1.

The expected value of sample information $EVSI(\mathbf{y})$ is the reduction in loss we would expect from collecting an additional dataset $\mathbf{y}$ of a specific design.

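$$EVSI(\mathbf{y})=E_{\bm{\theta}}(L(d^{*},\bm{\theta}))-E_{\mathbf{y}}[E_{\bm{\theta}|\mathbf{y}}(L(d^{*}_{\mathbf{y}},\bm{\theta}))]\qquad(2)$$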

The inner expectation is now with respect to the updated posterior distribution of $\bm{\theta}|\mathbf{y}$ , after learning $\mathbf{y}$ as well as the existing data $\mathbf{x}$ , or “preposterior” (Berger, 2013). If we can express the costs $C(\mathbf{y})$ of obtaining $\mathbf{y}$ using the same loss metric, we can further define the expected net benefit of sampling as $EVSI(\mathbf{y})-C(\mathbf{y})$ , and typically seek the sample size that maximises this (Parmigiani and Inoue, 2009).
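As a minimal illustration of the expected net benefit of sampling, consider a hypothetical conjugate normal problem (not the HIV model of this paper): a scalar $\alpha$ with prior variance `v0` is estimated under quadratic loss, and each of $n$ future observations has known sampling variance `s2`. In that setting the posterior variance does not depend on the observed values, so the EVSI is available in closed form as the prior variance minus the posterior variance. The cost per observation `cost_per_obs` is an assumed value, expressed in the same units as the loss, and all function names are illustrative.

```python
def evsi_normal(n, v0, s2):
    """EVSI under quadratic loss for a normal mean: prior variance minus posterior variance."""
    post_var = 1.0 / (1.0 / v0 + n / s2)
    return v0 - post_var


def best_sample_size(v0, s2, cost_per_obs, n_max):
    """Search n = 0..n_max for the sample size maximising EVSI(n) - C(n), with C(n) linear in n."""
    enbs = {n: evsi_normal(n, v0, s2) - cost_per_obs * n for n in range(n_max + 1)}
    n_best = max(enbs, key=enbs.get)
    return n_best, enbs[n_best]


# Assumed inputs, chosen only for illustration.
n_best, enbs_best = best_sample_size(v0=1.0, s2=4.0, cost_per_obs=0.01, n_max=200)
print(n_best, enbs_best)
```

With these assumed inputs the net benefit is $1-4/(4+n)-0.01n$, which is maximised at $n=16$, where the marginal gain $4/(4+n)^{2}$ equals the unit cost.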

2.3 Value of information in different decision problems

Finite-action decisions

For a choice of $d$ among a finite set $\{1,\ldots,D\}$ with loss $L(d,{\bm{\theta}})=\alpha_{d}$ and $\bm{\alpha}=\{\alpha_{1},\ldots,\alpha_{d},\ldots,\alpha_{D}\}$ , the expected loss with current information is $\min_{d}\{E_{\bm{\alpha}}(\alpha_{d})\}$ , so (Raiffa and Schlaifer, 1961)

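$$EVPI=\min_{d}\{E_{\bm{\alpha}}(\alpha_{d})\}-E_{\bm{\alpha}}[\min_{d}\{\alpha_{d}\}],$$

$$EVPPI(\bm{\phi})=\min_{d}\{E_{\bm{\alpha}}(\alpha_{d})\}-E_{\bm{\phi}}[\min_{d}\{E_{\bm{\alpha}|\bm{\phi}}(\alpha_{d})\}],$$

$$EVSI(\mathbf{y})=\min_{d}\{E_{\bm{\alpha}}(\alpha_{d})\}-E_{\mathbf{y}}[\min_{d}\{E_{\bm{\alpha}|\mathbf{y}}(\alpha_{d})\}].$$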

Point estimation of a parameter

When the decision is the choice of a point estimate $\hat{\bm{\alpha}}$ of a vector of parameters $\bm{\alpha}$ , with quadratic loss

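$$L(\hat{\bm{\alpha}},\bm{\alpha})=(\hat{\bm{\alpha}}-\bm{\alpha})^{T}H(\hat{\bm{\alpha}}-\bm{\alpha})\qquad(3)$$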

for a symmetric, positive-definite $H$ , the optimal estimate with current information is the posterior mean, $\hat{\bm{\alpha}}=E_{\bm{\alpha}}(\bm{\alpha})$ . For a scalar $\bm{\alpha}=\alpha$ and $H=1$ , the expected loss is $\mbox{var}(\alpha)$ under current information and zero under perfect information, so that $EVPI=\mbox{var}(\alpha)$ and

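$$EVPPI(\bm{\phi})=\mbox{var}(\alpha)-E_{\bm{\phi}}[\mbox{var}_{\alpha|\bm{\phi}}(\alpha|\bm{\phi})],\qquad(4)$$

$$EVSI(\mathbf{y})=\mbox{var}(\alpha)-E_{\mathbf{y}}[\mbox{var}_{\alpha|\mathbf{y}}(\alpha|\mathbf{y})],\qquad(5)$$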

the expected reduction in variance given new information. Expression (4) is used by Oakley and O’Hagan (2004) and Saltelli et al. (2004) as a measure of sensitivity of the output of a deterministic model $\alpha=g(\phi,\ldots)$ to an uncertain input $\phi$ , termed the main effect of $\phi$ , but this has not been extended to the EVSI of a potential dataset $\mathbf{y}$ in this context.
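The connection between the quadratic-loss EVPPI, $\mbox{var}(\alpha)-E_{\bm{\phi}}[\mbox{var}_{\alpha|\bm{\phi}}(\alpha|\bm{\phi})]$, and the main effect of $\phi$ can be seen in one step from the law of total variance. The following is a short worked sketch using only the quantities already defined:

$$\mbox{var}(\alpha)=E_{\bm{\phi}}[\mbox{var}_{\alpha|\bm{\phi}}(\alpha|\bm{\phi})]+\mbox{var}_{\bm{\phi}}(E_{\alpha|\bm{\phi}}(\alpha|\bm{\phi}))\quad\Longrightarrow\quad\mbox{var}(\alpha)-E_{\bm{\phi}}[\mbox{var}_{\alpha|\bm{\phi}}(\alpha|\bm{\phi})]=\mbox{var}_{\bm{\phi}}(E_{\alpha|\bm{\phi}}(\alpha|\bm{\phi})).$$

Both terms on the right of the decomposition are non-negative, so this EVPPI lies between $0$ and $EVPI=\mbox{var}(\alpha)$. The same argument with $\mathbf{y}$ in place of $\bm{\phi}$ applies to the EVSI.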

Alternatively, an absolute error loss (Bernardo and Smith, 1994) gives $\hat{\alpha}$ as the posterior median and value measures based on the mean absolute deviation.

Point estimation of multiple parameters

The purpose of a multiparameter evidence synthesis of the form in Figure 1 is typically to estimate several correlated parameters of interest, comprising a vector $\bm{\alpha}$ , say. Most simply, we could conduct independent value of information analyses for each component of $\bm{\alpha}$ . In more formal decision analyses we may want a scalar loss for the overall vector $\bm{\alpha}$ . There are various alternatives based on generalisations $v(\bm{\alpha})$ of the variance, which can be used instead of the scalar variance $\mbox{var}(\alpha)$ in equations (4)–(5) to define the expected value of information. These have been applied in the context of Bayesian study design, and we show how they can also be used for the EVPPI and EVSI in evidence synthesis models.

If $H=\mathbf{c}\mathbf{c}^{T}$ in the quadratic loss (3), for some vector of weights $\mathbf{c}$, then the expected loss is $v(\bm{\alpha})=\mathbf{c}^{T}\mbox{cov}(\bm{\alpha})\mathbf{c}=\mbox{var}(\mathbf{c}^{T}\bm{\alpha})$, corresponding to optimal estimation of the weighted sum of the parameters, $\mathbf{c}^{T}\bm{\alpha}$. For example, when the elements $\alpha_{s}$ of $\bm{\alpha}$ are weighted equally, the goal is to minimise the sum of all elements $(r,s)$ of the covariance matrix, $v(\bm{\alpha})=\sum_{r,s}\mbox{cov}(\bm{\alpha})_{r,s}$, or, if the $\alpha_{s}$ are also independent of each other, $v(\bm{\alpha})=tr(\mbox{cov}(\bm{\alpha}))=\sum_{s}\mbox{var}(\alpha_{s})$. The same absolute reductions in variance for different components of $\bm{\alpha}$ would then be valued equally. More generally, if $\mathbf{c}$ is given a prior, then loss (3) also arises (see Chaloner and Verdinelli (1995) and references therein). Designs that minimise (3) are Bayesian analogues of classical A-optimal designs. See also Lamboni et al. (2011) for similar measures of sensitivity for multivariate outputs in deterministic computer models.

A Bayesian D-optimal design, on the other hand, minimises the determinant $v(\bm{\alpha})=\det(\mbox{cov}(\bm{\alpha}))$ (Chaloner and Verdinelli, 1995; Ryan et al., 2016). This simplifies to the product of the $\mbox{var}(\alpha_{s})$ when the $\alpha_{s}$ are independent and equally-weighted. Equivalently, a standardised version $\det(\mbox{cov}(\bm{\alpha}))^{1/S}$ , where $S$ is the number of components of $\bm{\alpha}$ , represents a geometric average variance of the $\alpha_{s}$ , adjusted for their covariance.

Here the same relative reductions in variance for different components of $\bm{\alpha}$ would then be valued equally, which would be more appropriate when the output of interest $\bm{\alpha}$ comprises quantities on very different scales and/or with different interpretations.
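The two families of scalar summaries $v(\bm{\alpha})$ described above can be computed directly from a posterior sample of $\bm{\alpha}$. The sketch below is illustrative only: the function names `a_criterion` and `d_criterion` and the simulated two-component output (on deliberately different scales) are assumptions for the example, not part of the paper.

```python
import numpy as np


def a_criterion(cov, c):
    """Weighted-sum variance c^T cov c, equal to var(c^T alpha)."""
    c = np.asarray(c, dtype=float)
    return float(c @ cov @ c)


def d_criterion(cov, standardised=True):
    """Determinant of cov, optionally raised to the power 1/S (geometric average variance)."""
    det = float(np.linalg.det(cov))
    return det ** (1.0 / cov.shape[0]) if standardised else det


# Hypothetical posterior sample of a 2-component output on very different scales.
rng = np.random.default_rng(1)
alpha = rng.multivariate_normal(
    mean=[0.1, 5000.0],
    cov=[[0.0004, 0.5], [0.5, 250000.0]],
    size=20_000,
)
cov = np.cov(alpha, rowvar=False)

print(a_criterion(cov, [1.0, 1.0]))  # sum of all entries of the covariance matrix
print(np.trace(cov))                 # sum of the variances only
print(d_criterion(cov))              # standardised determinant
```

In this example the equally weighted sum is dominated by the large-scale component, whereas the standardised determinant responds to relative changes in either component, which is the distinction drawn in the text.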

3 Computation of value of information

3.1 Partial perfect information

Computation of the EVPPI in general is not straightforward. Given a sample from the posterior distribution, the first term in (1) can be calculated by a Monte Carlo mean. The double expectation in the second term is more challenging. Strong et al. (2014) presented a method for estimating the EVPPI which avoids an expensive nested Monte Carlo procedure. However this only applied to finite-choice decision problems. We extend the scope of this method to more general problems, including point estimation. Suppose that, given a state of knowledge about $\bm{\alpha}$ represented by a distribution $\psi(.)$ , the expected loss under the optimal decision is a known function $h$ of the mean of $\bm{\alpha}$ under that distribution.

$$E_{\psi}(L(d^{*}_{\psi},\bm{\theta}))=h(E_{\psi}(\bm{\alpha})).\qquad(6)$$

If $\psi(.)$ is the current posterior, this is $h(E_{\bm{\alpha}}(\bm{\alpha}))$ , and if we were to learn the value of $\bm{\phi}$ , the expected loss would be $h(E_{\bm{\alpha}|\bm{\phi}}(\bm{\alpha}|\bm{\phi}))$ . We can estimate $E_{\bm{\alpha}|\bm{\phi}}(\bm{\alpha}|\bm{\phi})$ by expressing

$$\bm{\alpha}=E_{\bm{\alpha}|\bm{\phi}}(\bm{\alpha}|\bm{\phi})+\epsilon=g(\bm{\phi})+\epsilon\qquad(7)$$

where $\epsilon$ is an error term with mean zero. Then using a Monte Carlo sample of $(\bm{\alpha}^{(k)},\bm{\phi}^{(k)}):k=1,\ldots,K$ , we estimate $g(\bm{\phi})$ by regression of $\bm{\alpha}$ on $\bm{\phi}$ . If $\bm{\phi}$ comprises $p$ parameters that could be learnt simultaneously, the regression will have $p$ predictors. Since the functional form of $g()$ will not be known in general, nonparametric regression methods are preferred. This produces a fitted value $\hat{g}(\bm{\phi}^{(k)})$ for each $k$ .

Then the second term in (1) is estimated by a Monte Carlo mean

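$$E_{\bm{\phi}}[E_{\bm{\theta}|\bm{\phi}}(L(d^{*}_{\bm{\phi}},\bm{\theta}))]=E_{\bm{\phi}}[h(E_{\bm{\alpha}|\bm{\phi}}(\bm{\alpha}|\bm{\phi}))]\approx\frac{1}{K}\sum_{k=1}^{K}h(\hat{g}(\bm{\phi}^{(k)})).$$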

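Strong et al. (2014) only presented this method for finite choices, where $\bm{\alpha}$ is a vector of net benefits to be maximised and $h(E(\bm{\alpha}))=\max_{d}\{E(\alpha_{d})\}$ (for losses $\alpha_{d}$ as in Section 2.3, the maximum is replaced by a minimum). Then a separate $g_{d}()$ is estimated to relate each $\alpha_{d}$ to $\bm{\phi}$, and

$$E_{\bm{\phi}}[E_{\bm{\theta}|\bm{\phi}}(L(d^{*}_{\bm{\phi}},\bm{\theta}))]\approx\frac{1}{K}\sum_{k=1}^{K}\max_{d}\{\hat{g}_{d}(\bm{\phi}^{(k)})\}.$$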
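The finite-choice estimator can be sketched in Python as follows. This is a minimal illustration on a hypothetical two-option toy problem, not the HIV model: a cubic polynomial fitted with `numpy.polyfit` stands in for a nonparametric regression, and the function name `evppi_finite` and the simulated net benefits are assumptions made for the example. The $\alpha_{d}$ are treated as net benefits to be maximised; with losses, `max` would be replaced by `min` and the sign of the difference reversed.

```python
import numpy as np


def evppi_finite(alpha, phi, degree=3):
    """Regression estimate of EVPPI(phi) for a choice among D options.

    alpha : (K, D) array, sampled net benefit of each option (to be maximised)
    phi   : (K,) array, sampled values of the parameter of interest
    """
    alpha = np.asarray(alpha, dtype=float)
    # One regression g_d per option; fitted[k, d] estimates E(alpha_d | phi^(k)).
    fitted = np.column_stack([
        np.polyval(np.polyfit(phi, alpha[:, d], degree), phi)
        for d in range(alpha.shape[1])
    ])
    value_current = alpha.mean(axis=0).max()   # max_d E(alpha_d)
    value_partial = fitted.max(axis=1).mean()  # mean over k of max_d g_d(phi^(k))
    return float(value_partial - value_current)


# Hypothetical toy problem: option 0 has net benefit 0, option 1 has phi + noise.
rng = np.random.default_rng(0)
K = 50_000
phi = rng.standard_normal(K)
alpha = np.column_stack([np.zeros(K), phi + rng.standard_normal(K)])
evppi = evppi_finite(alpha, phi)
print(evppi)
```

In this toy problem the exact value is $E[\max(0,\phi)]=1/\sqrt{2\pi}\approx 0.399$, so the printed estimate should be close to that, up to Monte Carlo error.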

Our more general formulation of this algorithm, which expresses the optimal loss as $h(E(\bm{\alpha}))$, can be used for point estimation problems. For estimation of a scalar $\alpha$ with quadratic loss, $h(E_{\alpha}(\alpha))=E[(\alpha-E_{\alpha}(\alpha))^{2}]=\mbox{var}(\alpha)$. We estimate $\mbox{var}(\alpha|\bm{\phi}^{(k)})$ by the squared residual $(\alpha^{(k)}-\hat{g}(\bm{\phi}^{(k)}))^{2}$, substitute this for $h(\hat{g}(\bm{\phi}^{(k)}))$ and estimate $E_{\bm{\phi}}\left[\mbox{var}_{\alpha|\bm{\phi}}(\alpha|\bm{\phi})\right]$ as the mean, over $k$, of the squared residuals. Equivalently we can estimate $\mbox{var}(\alpha)-E_{\bm{\phi}}\left[\mbox{var}_{\alpha|\bm{\phi}}(\alpha|\bm{\phi})\right]=\mbox{var}_{\bm{\phi}}(E_{\alpha|\bm{\phi}}(\alpha|\bm{\phi}))$ as the variance, over $k$, of the fitted values. Similarly, for vector $\bm{\alpha}$ and loss functions based on $\mbox{cov}(\bm{\alpha})$, we can fit regressions to get the marginal mean for each component $\alpha_{d}$, and calculate the empirical covariance matrix of the residuals.
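For point estimation of a scalar with quadratic loss, the regression estimate described above can be sketched as follows. This is a minimal illustration on a hypothetical toy model, $\alpha=\phi_{1}^{2}+\phi_{2}$ with independent standard normal inputs, and it uses a cubic polynomial (`numpy.polyfit`) as a simple stand-in for a nonparametric smoother. The function name `evppi_quadratic` is an assumption for the example.

```python
import numpy as np


def evppi_quadratic(alpha, phi, degree=3):
    """EVPPI of a scalar phi for point estimation of a scalar alpha with quadratic loss.

    Regress alpha on phi, then return var(alpha) minus the mean squared residual.
    For least squares with an intercept this equals the variance of the fitted values.
    """
    coefs = np.polyfit(phi, alpha, deg=degree)
    fitted = np.polyval(coefs, phi)
    resid_var = np.mean((alpha - fitted) ** 2)
    return float(np.var(alpha) - resid_var)


# Hypothetical toy model, standing in for an MCMC sample of (alpha, phi).
rng = np.random.default_rng(0)
K = 50_000
phi1, phi2 = rng.standard_normal((2, K))
alpha = phi1 ** 2 + phi2

evppi_phi1 = evppi_quadratic(alpha, phi1)
evppi_phi2 = evppi_quadratic(alpha, phi2)
print(evppi_phi1, evppi_phi2, np.var(alpha))
```

For this toy model the exact values are $\mbox{var}(\phi_{1}^{2})=2$ for $\phi_{1}$ and $\mbox{var}(\phi_{2})=1$ for $\phi_{2}$, out of a total variance of 3, so the estimates should be close to those values up to Monte Carlo error.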

Several methods of nonparametric regression have been suggested. For small $p$, Strong et al. (2014) used generalized additive models, with tensor products of spline smoothers to represent interactions between different components of $\bm{\phi}$. Where $\bm{\phi}$ included about $p=5$ or more components, Gaussian process regression was recommended as a more efficient way of modelling interactions, though the resulting matrix computations rapidly become impractical as the MCMC sample size $K$ increases. Heath et al. (2016) developed an integrated nested Laplace approximation for fitting Gaussian processes more efficiently in this context where $p\geq 2$. For the application in Section 4 (with $K=150000$, $p\leq 3$), we have found multivariate adaptive regression splines (Friedman, 1991) via the earth R package (Milborrow, 2011) to be more efficient. Standard errors for the EVPPI estimates can be calculated in general by simulating from the asymptotic normal distribution of the regression coefficients (Mandel, 2013).

For expected losses which are functions of the median or other quantiles, such as absolute error loss, a similar method based on nonparametric quantile regression could be devised.

3.2 Sample information

The regression method above can also be used to estimate the expected value of sample information $EVSI(\mathbf{y})$ . Strong et al. (2015) described the method for finite decision problems. Again we generalize this to any problem satisfying condition (6), including point estimation. The method requires that the information provided by the data $\mathbf{y}$ can be expressed as a low-dimensional sufficient statistic $T(\mathbf{y})$ , so that $E_{\alpha|\mathbf{y}}(\bm{\alpha}|\mathbf{y})=E_{\alpha|\mathbf{y}}(\bm{\alpha}|T(\mathbf{y}))$ . This could be a point estimator of the parameter $\mu$ (as in Figure 1) that $\mathbf{y}$ gives direct information on. As in (7), we can write

$$\bm{\alpha}=E_{\bm{\alpha}|\mathbf{y}}(\bm{\alpha}|T(\mathbf{y}))+\epsilon=g(T(\mathbf{y}))+\epsilon$$

and estimate $g()$ using a regression fitted to a Monte Carlo sample of $(\bm{\alpha}^{(k)},T(\mathbf{y}^{(k)})):k=1,\ldots,K$ , where $\mathbf{y}^{(k)}$ are drawn from their posterior predictive distribution. Then the fitted values $\hat{g}(T(\mathbf{y}^{(k)}))$ enable the double expectation to be estimated as

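$$E_{\mathbf{y}}[E_{\bm{\theta}|\mathbf{y}}(L(d^{*}_{\mathbf{y}},\bm{\theta}))]=E_{\mathbf{y}}[h(E_{\bm{\alpha}|\mathbf{y}}(\bm{\alpha}|\mathbf{y}))]\approx\frac{1}{K}\sum_{k=1}^{K}h(\hat{g}(T(\mathbf{y}^{(k)}))).$$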

Then, for example, for point estimation with quadratic loss, this is the estimated residual variance from the regression, as in Section 3.1.
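The EVSI procedure can be sketched on a hypothetical example where the answer is known exactly. Assume, for illustration only, that the current posterior for a prevalence $\alpha$ is Beta($a$, $b$) and that the proposed study observes $y\sim Bin(n,\alpha)$, with sufficient statistic $T(y)=y/n$. The values of `a`, `b` and `n`, the function name `evsi_quadratic`, and the cubic polynomial used in place of a nonparametric regression are all assumptions of this sketch.

```python
import numpy as np


def evsi_quadratic(alpha, t_stat, degree=3):
    """EVSI for point estimation of scalar alpha with quadratic loss.

    Regress alpha on the summary statistic T(y) of simulated future data and
    return the variance of the fitted values, i.e. var(alpha) minus residual variance.
    """
    coefs = np.polyfit(t_stat, alpha, deg=degree)
    fitted = np.polyval(coefs, t_stat)
    return float(np.var(fitted))


rng = np.random.default_rng(0)
K, a, b, n = 50_000, 2.0, 8.0, 50

alpha = rng.beta(a, b, size=K)   # sample from the (assumed) current posterior
y = rng.binomial(n, alpha)       # one future dataset per draw: posterior predictive
evsi = evsi_quadratic(alpha, y / n)

# Closed form for this conjugate case: var(alpha) * n / (a + b + n).
prior_var = a * b / ((a + b) ** 2 * (a + b + 1))
exact = prior_var * n / (a + b + n)
print(evsi, exact)
```

No posterior is refitted for any simulated dataset: the single regression of $\alpha^{(k)}$ on $T(y^{(k)})$ replaces the inner level of simulation, which is the computational saving the method is designed to provide.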

4 HIV prevalence estimation model

We consider the sub-model of the full HIV burden model (De Angelis et al., 2014; Kirwan et al., 2016) that estimates HIV prevalence in men who have sex with men (MSM), in London. We examine two subgroups of MSM: those who have attended a genitourinary medicine (GUM) clinic in the past year (GMSM) and those who have not (NGMSM), denoting the proportion of all men who are in these subgroups by $\rho_{G}$ and $\rho_{N}$ respectively. For each group $g\in\{G,N\}$, we aim to estimate simultaneously these subgroup proportions $\rho_{g}$, prevalence of HIV in this group $\pi_{g}$ and the proportion of infections that are diagnosed, $\delta_{g}$. Given these parameters, further important quantities are easily derived: the prevalence of diagnosed ($\pi_{g}\delta_{g}=(\pi\delta)_{g}$) and undiagnosed ($\pi_{g}(1-\delta_{g})=\overline{(\pi\delta)}_{g}$) infection; and the numbers of MSM living with diagnosed ($\mu_{Dg}=\mu_{pop}\rho_{g}(\pi\delta)_{g}$) and undiagnosed ($\mu_{Ug}=\mu_{pop}\rho_{g}\overline{(\pi\delta)}_{g}$) infection, where $\mu_{pop}$ is the number of men (MSM and non-MSM) living in London. Parts of the model refer to a third subgroup, previous MSM (PMSM), men who no longer have sex with men, but the prevalence among this group is much lower, and we do not describe this part in detail.

We construct a Bayesian model to link these quantities with the available evidence provided by various routinely-collected and survey datasets as well as expert belief. Figure 3 shows a directed acyclic graph representing this model, in the form of Figure 1, distinguishing founder nodes, observed data, and outputs of interest. The following sections explain in detail the quantities and relationships illustrated in Figure 3. All data and estimates refer to the year 2012 (unless indicated) and the Greater London area.

4.1 Subgroup membership

The total male population of London, $\mu_{pop}$ , is informed by published data $y_{pop}$ (Office for National Statistics, 2012), assumed to be a Poisson count: $y_{pop}\sim Po(\mu_{pop})$ . The estimated number of people in each group $g$ is therefore $r_{g}=\rho_{g}\mu_{pop}$ , where we assume a prior $\log(\mu_{pop})\sim N(0,1000^{2})$ . Estimates of the subgroup proportions $\rho_{g}$ are informed by data from the National Survey of Sexual Attitudes and Lifestyles (Mercer et al., 2013): $y_{G},y_{N},y_{P}$ , out of $y_{NAT}=824$ men, which we assume to come from a multinomial distribution with probabilities $\rho_{G},\rho_{N},\rho_{P}$ given a uniform Dirichlet prior. $P$ here refers to the PMSM subgroup, with $\rho_{P}$ being the corresponding proportion of men in the group. Thus the expected number of people with HIV (diagnosed or undiagnosed) in group $g$ is $\mu_{g}=\pi_{g}r_{g}$ .

4.2 Registry of diagnosed infections and diagnosed prevalence

Individuals diagnosed with HIV and accessing care in the UK are reported to the SOPHID registry (Surveillance of Prevalent HIV Infections Diagnosed) (Kirwan et al., 2016). From SOPHID we obtain the reported number of HIV diagnoses for MSM, $y_{M}\sim Po(\mu_{M})$ . We assume a small reporting bias of unknown direction, through $\log(\mu_{M})=a_{S}+\log(\mu_{D})$ where $\exp(a_{S})\sim N(1,0.018^{2})$ , giving a prior $90\%$ interval of about $(-3\%,3\%)$ for the adjustment to the number of MSM HIV diagnoses $\mu_{M}$ . After this adjustment, $\mu_{D}=\mu_{DG}+\mu_{DN}+\mu_{DP}$ is the expected number of diagnoses among MSM, summed from the expected numbers of diagnoses among GMSM, NGMSM and PMSM respectively. The following sections explain where $\mu_{DG},\mu_{DN}$ come from; $\mu_{DP}$ is modelled using similar techniques.
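As a quick check of the stated prior interval, the 5% and 95% quantiles of the $N(1,0.018^{2})$ prior on $\exp(a_{S})$ can be computed directly. The variable names below are illustrative only.

```python
from statistics import NormalDist

# Prior on the multiplicative reporting adjustment exp(a_S), as stated in the text.
adjustment = NormalDist(mu=1.0, sigma=0.018)

# Central 90% prior interval, expressed as a percentage change in mu_M relative to mu_D.
lower = adjustment.inv_cdf(0.05)
upper = adjustment.inv_cdf(0.95)
interval_pct = ((lower - 1.0) * 100.0, (upper - 1.0) * 100.0)
```

The interval comes out within a few hundredths of a percentage point of $(-3\%,3\%)$ , consistent with the description above.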

Since SOPHID does not record GUM clinic attendance, to strengthen the evidence on diagnosed prevalence in GMSM we include data from the HIV and AIDS New Diagnoses Database (HANDD) (Kirwan et al., 2016), recording how many of the $y_{M}$ prevalent diagnosed MSM were newly diagnosed in 2012 and reported to have been diagnosed initially in a GUM clinic. These new diagnoses, denoted $y_{H}$ , are modelled as $y_{H}\sim Bin(y_{M},p_{H})$ , where $p_{H}$ is assumed to be a lower bound for the proportion of prevalent diagnosed MSM who have attended a GUM clinic in 2012. This bound is expressed through $p_{H}=a_{H}\mu_{DG}/\mu_{D}$ , where $a_{H}\sim U(0,1)$ is the unknown probability that a prevalent diagnosed MSM who has attended a GUM clinic in 2012 was newly diagnosed that year. $y_{H}$ therefore gives us additional indirect information on $\mu_{DG}$ , the number of prevalent diagnosed GMSM.

The number of diagnosed infections is related to the total number of infections in each group $g=G,N$ as $\mu_{Dg}=\delta_{g}\mu_{g}$ . The proportion of infections that are diagnosed $\delta_{g}$ is not known, but given our inferences about the undiagnosed prevalence $\overline{(\pi\delta)}_{g}=\pi_{g}(1-\delta_{g})$ (explained in the subsequent sections), we can exploit the implicit constraint $1-\delta_{g}>\overline{(\pi\delta)}_{g}$ . Therefore we define $\delta_{g}=a_{\delta g}(1-\overline{(\pi\delta)}_{g})$ , with $a_{\delta g}\sim U(0,1)$ , and the diagnosed prevalence $(\pi\delta)_{g}=\pi_{g}\delta_{g}$ in each group follows.
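A small sketch may help show why this parameterisation keeps the implied prevalence $\pi_{g}$ inside $(0,1)$ . The function names and the value of the undiagnosed prevalence are assumptions made for illustration; the inversion $\pi_{g}=\overline{(\pi\delta)}_{g}/(1-\delta_{g})$ follows from the definition of undiagnosed prevalence.

```python
def diagnosed_fraction(a_delta: float, undiag_prev: float) -> float:
    """delta_g = a_delta * (1 - undiag_prev), so that delta_g < 1 - undiag_prev."""
    return a_delta * (1.0 - undiag_prev)


def total_prevalence(delta: float, undiag_prev: float) -> float:
    """Invert undiag_prev = pi_g * (1 - delta_g) for pi_g."""
    return undiag_prev / (1.0 - delta)


undiag_prev = 0.03  # hypothetical undiagnosed prevalence
grid = [i / 10 for i in range(10)]  # a_delta values 0.0, 0.1, ..., 0.9
prevalences = [
    total_prevalence(diagnosed_fraction(a, undiag_prev), undiag_prev) for a in grid
]
```

As $a_{\delta g}$ moves from 0 towards 1, the implied $\pi_{g}$ rises from the undiagnosed prevalence towards 1 without exceeding it, so a $U(0,1)$ prior on $a_{\delta g}$ respects the constraint automatically.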

4.3 Undiagnosed prevalence among GMSM

Information about undiagnosed infections in GMSM is obtained from GUMCAD (Genitourinary Medicine Clinic Activity Dataset) (Kirwan et al., 2016), a registry of attendance episodes in GUM clinics. HIV tests are offered routinely to previously-undiagnosed patients. Thus we have a sequence of observations $g_{i}$ , representing firstly the number of GUM clinic visits ( $g_{1}=35121$ ) and then the number of these where the patient has no previous HIV diagnosis ( $g_{2}$ ), an HIV test is offered ( $g_{3}$ ), an HIV test is accepted ( $g_{4}$ ), or a new HIV diagnosis is made ( $g_{5}$ ). For $i=2,\ldots,5$ , $g_{i}\sim Bin(g_{i-1},\gamma_{i-1})$ , with priors $\gamma_{1},\gamma_{2},\gamma_{3}\sim U(0,1)$ and $\gamma_{4}\sim U(0,0.15)$ (see below). An HIV infection may therefore remain undiagnosed if either a test is not offered or the patient opts out of testing. We can then decompose the prevalence of undiagnosed infection $\overline{(\pi\delta)}_{G}$ into “unoffered” $\pi^{(UN)}$ and “opt-out” $\pi^{(OP)}$ components.

[TABLE]

Both of those require strong prior assumptions to estimate, which will later be relaxed in a sensitivity analysis (§4.5). Firstly, the prevalence of infection that remains undiagnosed due to an unoffered test is

[TABLE]

where $\gamma_{1}(1-\gamma_{2})$ is the proportion of clinic attenders that are undiagnosed but not offered a test, and $p^{(UN)}$ is the probability that a test would be positive for these people. We assume the prevalence in this group is between 0.5 and 1.5 times the prevalence in people actually tested, expressed on the odds scale through $\mbox{logit}(p^{(UN)})=\mbox{logit}(\gamma_{4})+a^{(UN)}$ , with $a^{(UN)}\sim U(\log(0.5),\log(1.5))$ .
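The logit shift can be illustrated as follows. The helper names and the value of $\gamma_{4}$ are hypothetical. The sketch shows that the bounds 0.5 and 1.5 hold exactly for the odds, and only approximately for the prevalence itself when $\gamma_{4}$ is small.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def odds(p: float) -> float:
    return p / (1.0 - p)


def unoffered_positivity(gamma4: float, a_un: float) -> float:
    """p^(UN) from logit(p^(UN)) = logit(gamma_4) + a^(UN)."""
    return expit(logit(gamma4) + a_un)


gamma4 = 0.02  # hypothetical positivity among those actually tested
p_low = unoffered_positivity(gamma4, math.log(0.5))
p_high = unoffered_positivity(gamma4, math.log(1.5))

# Ratios on the odds scale hit the prior bounds exactly.
odds_ratio_low = odds(p_low) / odds(gamma4)
odds_ratio_high = odds(p_high) / odds(gamma4)
```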

Secondly, the prevalence of infection remaining undiagnosed due to refusing a test is

[TABLE]

$\gamma_{1}\gamma_{2}(1-\gamma_{3})$ is the proportion of clinic attenders that are undiagnosed and offered a test but opt out. We assume this group has an underlying HIV prevalence higher than those given tests, but not more than 15%, so that the excess prevalence in this group is $a^{(EX)}=a^{(OP)}(0.15-\gamma_{4})$ , where $a^{(OP)}\sim U(0,1)$ , and the prior on $\gamma_{4}$ is truncated above at 0.15.
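One reading of this construction is that the prevalence among those who opt out is $\gamma_{4}+a^{(EX)}$ , which lies between $\gamma_{4}$ and 0.15 for any $a^{(OP)}\in(0,1)$ . The sketch below encodes that reading; the function name and the interpretation of the sum are assumptions, not a transcription of the model code.

```python
def opt_out_prevalence(gamma4: float, a_op: float, cap: float = 0.15) -> float:
    """Prevalence among opt-outs as gamma_4 plus the excess a^(EX) = a^(OP) * (cap - gamma_4)."""
    if not 0.0 <= gamma4 <= cap:
        raise ValueError("gamma4 must lie in [0, cap]")
    if not 0.0 <= a_op <= 1.0:
        raise ValueError("a_op must lie in [0, 1]")
    excess = a_op * (cap - gamma4)
    return gamma4 + excess


# Hypothetical gamma_4; the two extremes of a^(OP) recover the stated bounds.
at_lower_bound = opt_out_prevalence(0.02, 0.0)
at_upper_bound = opt_out_prevalence(0.02, 1.0)
```

This also shows why the prior on $\gamma_{4}$ must be truncated at 0.15: otherwise the excess could become negative.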

A small amount of additional evidence on $\overline{(\pi\delta)}_{G}$ is available from another dataset, GUM Anon, a convenience survey of men not previously diagnosed with HIV who had attended a GUM clinic in the previous year. This gives direct information about the prevalence of HIV among previously undiagnosed GMSM,

[TABLE]

where $\pi^{(GD)}=\prod_{r=1}^{4}\gamma_{r}$ is the prevalence of newly-diagnosed infection among clinic attenders. The data in GUM Anon are $g^{(A)}\sim Bin(g^{(AN)},\pi^{(GA)})$ , where $g^{(AN)}=85$ .
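The testing cascade of this section, $g_{i}\sim Bin(g_{i-1},\gamma_{i-1})$ , and the product $\pi^{(GD)}$ can be sketched in terms of expected counts. Only $g_{1}=35121$ comes from the text; the $\gamma$ values and the function name are hypothetical.

```python
from math import prod


def expected_cascade(g1: int, gammas: list) -> list:
    """Expected counts E[g_1], ..., E[g_5] under g_i ~ Bin(g_{i-1}, gamma_{i-1})."""
    counts = [float(g1)]
    for gamma in gammas:
        counts.append(counts[-1] * gamma)
    return counts


# Hypothetical step probabilities: no previous diagnosis, test offered,
# test accepted, test positive.
gammas = [0.9, 0.8, 0.7, 0.02]
counts = expected_cascade(35121, gammas)

# Prevalence of newly diagnosed infection among all clinic attenders.
pi_gd = prod(gammas)
```

The last expected count equals $g_{1}\pi^{(GD)}$ , which is why the product of the four step probabilities can be read as a prevalence among attenders.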

4.4 Undiagnosed prevalence among NGMSM

To inform undiagnosed HIV prevalence in NGMSM, we use data from the Gay Men’s Sexual Health Survey (GMSHS) (Aghaizu et al., 2016), based on face-to-face interviews in selected venues where participants were offered anonymous HIV tests. While this group is likely to have a higher HIV prevalence than the general population, we assume that the relative odds of having HIV between NGMSM and GMSM is the same as in the general population. The GMSHS data provide the numbers $y^{(GM)}_{g}$ out of $n^{(GM)}_{g}$ previously-undiagnosed people in group $g$ who tested positive for HIV (493 GMSM and 452 NGMSM) so that $y^{(GM)}_{g}\sim Bin(n^{(GM)}_{g},p^{(GM)}_{g})$ , with $p^{(GM)}_{g}\sim U(0,1)$ . Defining the odds $o(p)=p/(1-p)$ , we apply the resulting odds ratio $or^{(GM)}=o(p^{(GM)}_{N})/o(p^{(GM)}_{G})$ to the baseline estimated from GUMCAD (Section 4.3), giving $o(\overline{(\pi\delta)}_{N})$ = $o(\overline{(\pi\delta)}_{G})or^{(GM)}$ .
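The transfer of the GMSHS odds ratio onto the GUMCAD baseline can be sketched as below. The function names and the example inputs are illustrative assumptions.

```python
def odds(p: float) -> float:
    """o(p) = p / (1 - p)."""
    return p / (1.0 - p)


def inv_odds(o: float) -> float:
    """Inverse of o(p): the probability with odds o."""
    return o / (1.0 + o)


def ngmsm_undiagnosed_prevalence(undiag_prev_g: float, p_n: float, p_g: float) -> float:
    """Apply or^(GM) = o(p_N) / o(p_G) to the GMSM undiagnosed prevalence."""
    or_gm = odds(p_n) / odds(p_g)
    return inv_odds(odds(undiag_prev_g) * or_gm)


# Hypothetical values: equal survey positivity leaves the baseline unchanged.
unchanged = ngmsm_undiagnosed_prevalence(0.03, p_n=0.04, p_g=0.04)
lower = ngmsm_undiagnosed_prevalence(0.03, p_n=0.02, p_g=0.04)
```

Working on the odds scale guarantees that the resulting prevalence stays in $(0,1)$ whatever the value of the odds ratio.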

4.5 Alternative assumptions

The results presented in Section 5 are for the above model assumptions, unless specified otherwise. Two alternative assumptions are also explored.

(a) Undiagnosed prevalence from GUM Anon only

To avoid the strong prior assumptions on prevalence among those not offered a test or refusing a test, which are necessary to use the GUMCAD data to infer $\overline{(\pi\delta)}_{G}$ , we could infer $\overline{(\pi\delta)}_{G}$ from GUM Anon alone. To construct this model, we replace equation (8) by a $U(0,1)$ prior on $\overline{(\pi\delta)}_{G}$ , although the GUMCAD data are still used to estimate the parameters $\pi^{(GD)}$ and $\gamma_{1}$ relating the prevalence in GUM Anon to $\overline{(\pi\delta)}_{G}$ .

(b) GUMCAD also informs diagnosed prevalence

Instead of being inferred indirectly through the graph, the diagnosed prevalence can be modelled directly as

[TABLE]

where $1-\gamma_{1}$ is the probability of a previous diagnosis, and $\gamma_{1}\gamma_{2}\gamma_{3}\gamma_{4}$ is the probability of newly-diagnosed infection, in GUMCAD. This is not done in the base case due to concerns about inconsistencies in reporting of diagnoses between GUMCAD and SOPHID/HANDD.

5 Value of information in HIV prevalence model

The model outputs of interest (as in Figures 1,3) are $\bm{\alpha}=$ ( $(\pi\delta)_{G}$ , $(\pi\delta)_{N}$ , $\overline{(\pi\delta)}_{G}$ , $\overline{(\pi\delta)}_{N}$ , $\mu_{DG}$ , $\mu_{DN}$ , $\mu_{UG}$ , $\mu_{UN}$ , $\mu$ ), the diagnosed and undiagnosed prevalences among both GMSM and NGMSM, and the corresponding absolute numbers of people living with HIV/AIDS (or “case-counts”), and the total number of MSM with HIV/AIDS $\mu=\mu_{DG}+\mu_{DN}+\mu_{UG}+\mu_{UN}$ . Samples from the posterior distributions are generated using Hamiltonian Monte Carlo methods in the Stan software (Stan Development Team, 2016). These are illustrated in Figure 8 along with the overall prevalence $\pi_{g}=(\pi\delta)_{g}+\overline{(\pi\delta)}_{g}$ in each group $g$ , and each of these quantities summed over the two groups $g$ . The estimates of diagnosed prevalence in all MSM (top panel) are reasonably precise, while the corresponding estimates for NGMSM and GMSM are more uncertain. Estimates of undiagnosed prevalence are lower and more precise. Full results under the two alternative assumptions are presented in the appendix.

5.1 Partial perfect information (EVPPI) for single outputs

Defining the decision problem as point estimation of $\bm{\alpha}$ with quadratic loss, we use EVPPI formula (4) to determine which parameters $\phi$ contribute most to the uncertainty about each component of $\bm{\alpha}$ , thus which $\phi$ may be worth learning more precisely. We will take $\phi$ to include the founder nodes of the graph illustrated in Figure 3. Since they are related to the $\bm{\alpha}$ through a network of deterministic functions, perfect knowledge of these implies perfect knowledge of $\bm{\alpha}$ . Each of the $\phi$ are either directly informed by data or given a substantive prior distribution based on belief. In the former case, EVPPI measures the maximum potential value of collecting more data from the same source. In the latter case, it will not necessarily be feasible to collect data to improve the precision of the belief, but EVPPI is still useful as a measure of how much of the uncertainty in $\bm{\alpha}$ is explained by the uncertainty in the parameter.

The results are presented in Figure 10 as a grid whose $r,s$ entry is colored according to $EVPPI_{\alpha_{s}}(\phi_{r})/\mbox{var}(\alpha_{s})$ , the proportion of variance in $\alpha_{s}$ which would be reduced if we learnt $\phi_{r}$ . The lighter cells correspond to $\phi_{r}$ with greater EVPPI. Standard errors in these and all following EVPPI and EVSI estimates, arising from uncertainty in the coefficients of the regression (7), were negligible, at less than 1% of the EVPPI or EVSI estimates.
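Under quadratic loss, the EVPPI of $\phi_{r}$ for a scalar output $\alpha_{s}$ is $\mbox{var}(E(\alpha_{s}|\phi_{r}))$ , so the plotted ratio is the share of variance explained by a regression of $\alpha_{s}$ on $\phi_{r}$ . The sketch below uses a straight-line fit as a deliberately simple stand-in for the regression (7), applied to a toy additive model; the function names, the toy model and the sample size are all assumptions.

```python
import random
from statistics import fmean


def variance(values):
    m = fmean(values)
    return fmean([(v - m) ** 2 for v in values])


def evppi_linear(phi, alpha):
    """Estimate var(E[alpha | phi]) using a least-squares straight line for E[alpha | phi]."""
    m_phi, m_alpha = fmean(phi), fmean(alpha)
    sxx = sum((x - m_phi) ** 2 for x in phi)
    sxy = sum((x - m_phi) * (y - m_alpha) for x, y in zip(phi, alpha))
    slope = sxy / sxx
    fitted = [m_alpha + slope * (x - m_phi) for x in phi]
    return variance(fitted)


# Toy "posterior sample": alpha = phi1 + phi2 with independent inputs.
rng = random.Random(1)
phi1 = [rng.gauss(0.0, 2.0) for _ in range(20000)]  # variance 4
phi2 = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # variance 1
alpha = [a + b for a, b in zip(phi1, phi2)]

share1 = evppi_linear(phi1, alpha) / variance(alpha)  # theoretical value 0.8
share2 = evppi_linear(phi2, alpha) / variance(alpha)  # theoretical value 0.2
```

In a real evidence synthesis the relationship between $\alpha_{s}$ and $\phi_{r}$ is rarely linear, so a flexible regression would normally replace the straight line; the linear version is exact here only because the toy model is additive.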

The parameters $a_{\delta G}$ and $a_{\delta N}$ , governing the proportions of HIV infections that are diagnosed in each of the two groups, and the probability $a_{H}$ that a GMSM is newly diagnosed in a GUM clinic, explain most of the uncertainty in the diagnosed prevalences $(\pi\delta)_{G},(\pi\delta)_{N}$ and the corresponding numbers of people diagnosed $\mu_{DG},\mu_{DN}$ . Direct data on any of these parameters would be difficult to obtain. However, if we were willing to make the assumption in (10), the estimates of diagnosed prevalence would become more precise: for example, the posterior median (SD) of $(\pi\delta)_{G}$ would change from 0.06 (0.13) to 0.051 (0.001), though the extent of uncertainty around $(\pi\delta)_{N},\mu_{DN}$ would not change substantively.

For the undiagnosed prevalences $\overline{(\pi\delta)}_{G}$ , $\overline{(\pi\delta)}_{N}$ and undiagnosed case count $\mu_{UG}$ , Figure 10 shows that more GUM Anon data (via $\pi^{(GA)}$ ), more GMSHS data (via $or^{(GM)}$ ) and more NATSAL data (via $\rho_{G}$ ) respectively would give the greatest uncertainty reductions. These outcomes, however, are already precisely estimated in absolute terms (Figure 8). The number of NGMSM $\mu_{UN}$ with undiagnosed HIV is more uncertain, with 95% CI (277,1446), and more GMSHS data would be potentially valuable to reduce this uncertainty.

If $\overline{(\pi\delta)}_{G}$ were informed only from the 85 people observed in GUM Anon (alternative assumption (a)), the estimates of undiagnosed prevalence or case counts become extremely uncertain, for example, $\mbox{var}(\mu_{UN})$ increases from $304^{2}$ to $2859^{2}$ . We could reduce this uncertainty by collecting more GUM Anon data — since $EVPPI_{\mu_{UN}}(\pi^{(GA)})$ is $p=62\%$ of $\mbox{var}(\mu_{UN})$ , more GUM Anon data could reduce $\mbox{var}(\mu_{UN})$ to a minimum of $2859^{2}(1-p)=1770^{2}$ (note that the square root of the expected variance after learning data is not the same as the expected standard deviation).
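The arithmetic linking the EVPPI share $p$ to the minimum remaining variance can be checked directly from the reported figures. The variable names are illustrative.

```python
import math

var_before = 2859.0 ** 2  # var(mu_UN) under alternative assumption (a)
sd_after_min = 1770.0     # reported square root of the minimum expected variance

# Share of variance removed, implied by the two reported standard deviations.
p_implied = 1.0 - sd_after_min ** 2 / var_before

# Going the other way with p rounded to two digits gives a slightly different value.
sd_from_rounded_p = math.sqrt(var_before * (1.0 - 0.62))
```

The implied share is a little under 62%, and using the rounded $p=0.62$ gives a standard deviation of about 1762 rather than 1770, so the small discrepancy is attributable to rounding of $p$ .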

5.2 Partial perfect information for multiple outputs

Staying with alternative assumption (a), suppose we wish to calculate the maximum potential value of extra GUM Anon data for jointly reducing the uncertainty about the number of GMSM, NGMSM and PMSM with undiagnosed HIV, so that $\bm{\alpha}$ is the vector $(\mu_{UG},\mu_{UN},\mu_{UP})$ . As described in Section 2.3, we could simply calculate the standard EVPPI based on a scalar output $\alpha$ redefined as their sum, $\mu_{U}=\mu_{UG}+\mu_{UN}+\mu_{UP}$ , the total number of MSM with undiagnosed HIV, whose posterior median is 5164 (SD 3271). This would ensure that any data expected to reduce the variance of any of these three outputs by the same (additive) amount would be valued equally. From this, we find that extra GUM Anon data would be expected to reduce $\mbox{var}(\mu_{U})$ from $3271^{2}$ to a minimum of $1803^{2}$ . Since $\mu_{U}$ is dominated by NGMSM (posterior median of $\mu_{UN}$ is 4190), this is mostly explained by an expected reduction in $\mbox{var}(\mu_{UN})$ from $2859^{2}$ to a minimum of $1770^{2}$ .

Alternatively, suppose both the prevalences and the case counts are of interest, for example in NGMSM, so that $\bm{\alpha}=(\overline{(\pi\delta)}_{N},\mu_{UN})$ . Since these two components are on very different scales, the Bayesian “D-optimality” criterion $v(\bm{\alpha})=\det(\mbox{cov}(\bm{\alpha}))$ would be a preferable measure of overall expected loss due to uncertainty. We use this criterion to compare the maximum expected value of extra GUM Anon data and extra GMSHS data, which combine to estimate the outcomes for NGMSM as described in Section 4.4. The EVPPI is interpreted as the expected reduction in the product of $\mbox{var}(\overline{(\pi\delta)}_{N})$ and $\mbox{var}(\mu_{UN})$ given by extra GUM Anon or GMSHS data, adjusted for their covariance. This is 421 and 132 respectively, favouring extra data from GUM Anon. In this example, however, examining expected reductions in $\mbox{var}(\overline{(\pi\delta)}_{N})$ or $\mbox{var}(\mu_{UN})$ separately would lead to the same conclusion: $\overline{(\pi\delta)}_{N}$ is defined as the proportion $\mu_{UN}/r_{N}$ of NGMSM with undiagnosed HIV, and GUM Anon and GMSHS are not informative about the number $r_{N}$ of NGMSM, so extra data inform $\mu_{UN}$ entirely through information on $\overline{(\pi\delta)}_{N}$ (or vice versa).
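For two outputs, the criterion $\det(\mbox{cov}(\bm{\alpha}))$ is the product of the two variances minus the squared covariance. A minimal sketch of computing it from a sample follows; the function names and the toy samples are assumptions, and the code evaluates only the criterion itself, not the EVPPI, which would compare its current value with its expectation after learning a parameter.

```python
from statistics import fmean


def covariance(x, y):
    """Population covariance of two equally long samples."""
    mx, my = fmean(x), fmean(y)
    return fmean([(a - mx) * (b - my) for a, b in zip(x, y)])


def d_criterion(alpha1, alpha2):
    """det(cov(alpha)) for a two-component output: var1 * var2 - cov12 ** 2."""
    v1 = covariance(alpha1, alpha1)
    v2 = covariance(alpha2, alpha2)
    c12 = covariance(alpha1, alpha2)
    return v1 * v2 - c12 ** 2


# Toy samples: uncorrelated outputs versus perfectly dependent outputs.
uncorrelated = d_criterion([1, -1, 1, -1], [1, 1, -1, -1])
dependent = d_criterion([1, 2, 3, 4], [2, 4, 6, 8])
```

The determinant shrinks towards zero as the two outputs become more strongly correlated, which is the sense in which the criterion adjusts the product of variances for their covariance.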

5.3 Sample information

We now estimate the expected value of data with specific sample sizes for improving the precision of the estimated number of people $\mu_{U}$ with undiagnosed HIV. Using the GUMCAD data and associated strong prior assumptions, the posterior median of $\mu_{U}$ is 804 (SD 323), compared to 5164 (SD 3271) with this information excluded. We compare the value of additional data from GUM Anon and additional data from GMSHS (on top of their original sample sizes of 85 and 945 respectively) for reducing these posterior standard deviations.

The expected value of sample information (EVSI) is computed for a series of sample sizes $n$ using the method in Section 3.2. For GUM Anon (Section 4.3), the sufficient statistic $T(\mathbf{y})$ consists of the empirical HIV prevalence $\mathbf{y}/n$ from an additional survey $\mathbf{y}\sim Bin(n,\pi^{(GA)})$ . For GMSHS (Section 4.4), given a sample size $n$ , $\mathbf{y}=(N^{(GM)}_{G},Y^{(GM)}_{G},Y^{(GM)}_{N})$ , where $N^{(GM)}_{G}$ is the number of previously-undiagnosed MSM in the future sample of $n$ who attend GUM clinics (the equivalent of the observed $n^{(GM)}_{G}=493$ ). Then $Y^{(GM)}_{G}$ and $Y^{(GM)}_{N}$ are the numbers of men out of denominators $N^{(GM)}_{G}$ and $N^{(GM)}_{N}=n-N^{(GM)}_{G}$ (GMSM and NGMSM respectively) who test positive for HIV, the equivalents of the observed $y^{(GM)}_{G}=20,y^{(GM)}_{N}=492$ . We take $T(\mathbf{y})=o(\hat{p}^{(GM)}_{N}(\mathbf{y}))/o(\hat{p}^{(GM)}_{G}(\mathbf{y}))$ , a point estimator of the odds ratio, where $\hat{p}^{(GM)}_{g}(\mathbf{y})$ is an estimator of the proportion of MSM in group $g$ who have HIV. To avoid zeros in the denominator $o(\hat{p}^{(GM)}_{G}(\mathbf{y}))$ , we use a Bayesian estimator $\hat{p}^{(GM)}_{G}(\mathbf{y})=(Y^{(GM)}_{G}+0.5)/(N^{(GM)}_{G}+1)$ , the posterior mean of a binomial proportion under a Jeffreys Beta(0.5,0.5) prior, rather than the empirical proportion $Y^{(GM)}_{G}/N^{(GM)}_{G}$ .
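The summary statistic for a simulated GMSHS dataset can be sketched as follows. Applying the Jeffreys-prior estimator to both groups is an assumption made here for symmetry, and the function names and example counts are hypothetical.

```python
def jeffreys_mean(y: int, n: int) -> float:
    """Posterior mean of a binomial proportion under a Beta(0.5, 0.5) prior."""
    return (y + 0.5) / (n + 1.0)


def odds(p: float) -> float:
    return p / (1.0 - p)


def odds_ratio_statistic(n_g: int, y_g: int, y_n: int, n_total: int) -> float:
    """T(y): estimated odds of HIV in NGMSM relative to GMSM in a future sample."""
    n_n = n_total - n_g  # NGMSM denominator, N_N = n - N_G
    p_g = jeffreys_mean(y_g, n_g)
    p_n = jeffreys_mean(y_n, n_n)
    return odds(p_n) / odds(p_g)


# Hypothetical simulated datasets of size 200.
balanced = odds_ratio_statistic(n_g=100, y_g=5, y_n=5, n_total=200)
zero_cell = odds_ratio_statistic(n_g=100, y_g=0, y_n=3, n_total=200)
```

The second call shows the purpose of the adjustment: with no positive tests among GMSM the empirical odds would be zero and the ratio undefined, whereas the adjusted statistic remains finite.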

Figure 6 shows $\mbox{var}(\mu_{U})-EVSI(\mathbf{y})$ , the expected variance remaining after data collection, under the two alternative assumptions. With the strong priors, $\mu_{U}$ is relatively well informed, and extra data from GUM Anon at realistic sample sizes (1000 or less) would not noticeably reduce $\mbox{var}(\mu_{U})$ . GMSHS data would be more valuable, through improving the estimate of $\mu_{UN}$ , the more uncertain contributor to $\mu_{U}=\mu_{UG}+\mu_{UN}$ . 1000 extra observations from GMSHS would be expected to reduce $\mbox{var}(\mu_{U})$ from $323^{2}$ to $282^{2}$ .

Without the strong prior information, $\mbox{var}(\mu_{U})=3271^{2}$ is substantially greater, and $\mu_{U}$ is only directly informed by the 85 observations from GUM Anon. Extra data from this source would be valuable, for example, another 500 observations would be expected to reduce this variance to $2183^{2}$ . Relative to these improvements, GMSHS data of the same size would be much less valuable. GMSHS data however would be expected to give around the same absolute reductions in $\mbox{var}(\mu_{U})$ , whether or not the strong priors are included.
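The quantity $\mbox{var}-EVSI$ discussed in this section can be illustrated with a self-contained toy problem: a single prevalence with a Beta distribution representing current uncertainty, and a future binomial survey summarised by $T(\mathbf{y})=\mathbf{y}/n$ . The Beta(2, 20) distribution, the sample sizes and all names below are assumptions; the straight-line regression is adequate here because the posterior mean is linear in $\mathbf{y}$ for this conjugate model, which also provides a closed form to compare against.

```python
import random
from statistics import fmean


def variance(values):
    m = fmean(values)
    return fmean([(v - m) ** 2 for v in values])


def simulate_evsi(a, b, n, draws=6000, seed=0):
    """Regression estimate of EVSI for a Beta(a, b) prevalence and a survey of size n."""
    rng = random.Random(seed)
    theta = [rng.betavariate(a, b) for _ in range(draws)]
    # Summary statistic T(y) = y / n of a simulated future survey y ~ Bin(n, theta).
    t = [sum(rng.random() < th for _ in range(n)) / n for th in theta]
    m_t, m_theta = fmean(t), fmean(theta)
    sxy = sum((x - m_t) * (y - m_theta) for x, y in zip(t, theta))
    sxx = sum((x - m_t) ** 2 for x in t)
    fitted = [m_theta + (sxy / sxx) * (x - m_t) for x in t]
    # EVSI is the variance of the fitted values; also return the current variance.
    return variance(fitted), variance(theta)


def exact_evsi(a, b, n):
    """Closed form for this conjugate toy model: var(theta) * n / (a + b + n)."""
    prior_var = a * b / ((a + b) ** 2 * (a + b + 1))
    return prior_var * n / (a + b + n)


evsi_hat, var_hat = simulate_evsi(a=2, b=20, n=50)
remaining = var_hat - evsi_hat  # expected variance left after the survey
```

Repeating the call over a grid of $n$ traces out a curve of expected remaining variance against sample size, the same kind of summary that is described for the HIV model above.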

6 Summary and potential further work

We have presented tools to find the most influential sources of uncertainty in a multiparameter evidence synthesis context and determine the expected value of extra data. We generalized methods, previously only applied in deterministic models, to complex graphical models, a class which also includes hierarchical models. We have shown how VoI methods developed for formal finite-choice decision problems can be extended to deal with estimation of single or multiple quantities. Conversely, the same methods can be used for formal decision problems based on graphical models, e.g. an HIV prevalence estimation model such as ours could be used to compare strategies for HIV testing. This would allow the optimal sample size of future data to be determined, through a health economic loss that trades off the cost of data collection with the expected health benefits gained from extra information that reduces the probability of choosing a sub-optimal policy.

In the HIV application, we found that structural assumptions, such as whether to include a particular piece of information, were influential to both the parameter estimates and the value of information. Such uncertainties might be parameterised (see, e.g. Strong et al., 2012), for example a particular prior or dataset of uncertain relevance could be discounted using an unknown weight (e.g. Neuenschwander et al., 2009). The EVPPI of the extra parameter would then quantify this uncertainty in the context of all other uncertainties, referred to as the “expected value of model improvement” by Strong and Oakley (2014).

Note that VoI refers to the expected value of potential future information, which differs from the observed value of a dataset $x_{i}$ currently included in the model. The latter could be computed as the observed reduction in loss when the model is refitted without $x_{i}$ . This could demonstrate the value of past data to the policymaker responsible for funding the collection of future data of the same type. For surveys or longitudinal studies conducted at regular intervals, VoI might be used to determine the expected value of future surveys or follow-up, although a full analysis would require modelling the expected changes through time in the quantities, such as disease prevalence or incidence, informed by the data.

While our method is broadly applicable, the details of computation for different decision problems and loss functions will be different. We discussed finite-action decisions and point estimation. A more general decision problem is to estimate the entire uncertainty distribution of $\bm{\theta}$ . The standard posterior $p(\bm{\theta}|\mathbf{y})$ is then optimal under a log scoring rule (Bernardo and Smith, 1994), and (following Lindley, 1956) standard Bayesian design theory aims to maximise the information gain from new data $\mathbf{y}$ , which we can write as $EVSI(\mathbf{y})=E_{\bm{\theta}}(-\log(p(\bm{\theta})))+E_{\mathbf{y}}E_{\bm{\theta}|\mathbf{y}}\{\log(p(\bm{\theta}|\mathbf{y}))\}$ . Under linear models (Chaloner and Verdinelli, 1995), this is equivalent to minimising $\det(\mbox{cov}(\bm{\theta}))$ , but more generally this is challenging to compute (Ryan et al., 2016).

Note that the VoI approach to sensitivity analysis is an example of the “global” approach, which examines the changes in model outputs given by varying parameters within the ranges of their belief distributions. The “local” approach is based on examining the posterior geometry resulting from small parameter perturbations around a base case, e.g. Roos et al. (2015) assess the robustness of hierarchical models to prior assumptions in this way. While the global approach is easier to interpret, as discussed by Oakley and O’Hagan (2004) and Roos et al. (2015), it conditions on one particular prior specification, and parameterising all potential prior beliefs or structural assumptions would be impractical.

The regression method for VoI computation that we described requires only an MCMC sample from the joint distribution of parameters of interest $\bm{\phi}$ and outputs $\bm{\alpha}$ . Additionally for EVSI it requires that the information in the new data $\mathbf{y}$ can be condensed into an analytic sufficient statistic $T(\mathbf{y})$ . Alternative methods which exploit particular analytic structures of $g()$ , where $\alpha$ is a known function $g(\bm{\phi})$ , thus avoiding a regression approximation, were discussed by Madan et al. (2014) for EVPPI and Ades et al. (2004) for EVSI. Menzies (2016) also presented an importance resampling method for EVSI computation which needs only a single MCMC sample and not a sufficient statistic.

In conclusion, the consideration of future evidence requirements is an often-neglected part of statistical analysis. The Value of Information methods we have presented provide a practicable set of tools for achieving this aim in the context of Bayesian evidence synthesis.

Appendix: Supplementary figures

Bibliography

Ades, A. E. and Sutton, A. J. (2006), ‘Multiparameter evidence synthesis in epidemiology and medical decision-making: current approaches’, Journal of the Royal Statistical Society, Series A 169(1), 5–35.
Ades, A., Lu, G. and Claxton, K. (2004), ‘Expected value of sample information calculations in medical decision modeling’, Medical Decision Making 24(2), 207–227.
Aghaizu, A., Wayal, S., Nardone, A., Parsons, V., Copas, A., Mercey, D., Hart, G., Gilson, R. and Johnson, A. (2016), ‘Sexual behaviours, HIV testing, and the proportion of men at risk of transmitting and acquiring HIV in London, UK, 2000-13: a serial cross-sectional study’, The Lancet HIV 3(9), e431–e440.
Berger, J. O. (2013), Statistical decision theory and Bayesian analysis, Springer.
Bernardo, J. M. and Smith, A. F. M. (1994), Bayesian Theory, Wiley, Chichester.
Chaloner, K. and Verdinelli, I. (1995), ‘Bayesian experimental design: A review’, Statistical Science pp. 273–304.
Claxton, K. P. and Sculpher, M. J. (2006), ‘Using value of information analysis to prioritise health research’, Pharmacoeconomics 24(11), 1055–1068.