Improved Confidence Regions in Meta-analysis of Diagnostic Test Accuracy
Tsubasa Ito, Shonosuke Sugasawa

Table 1: Averaged values of the heterogeneity measure, by number of studies (rows) and between-study variance scenario (columns).

| Number of studies | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 |
|---|---|---|---|---|---|---|---|---|
| 8 | 27.2 | 63.4 | 75.1 | 81.4 | 85.0 | 87.2 | 89.1 | 90.8 |
| 16 | 49.0 | 74.9 | 83.1 | 87.4 | 89.7 | 91.6 | 92.7 | 93.7 |
| 24 | 56.2 | 78.0 | 85.4 | 88.9 | 91.2 | 92.7 | 93.7 | 94.6 |
Tsubasa Ito1 and Shonosuke Sugasawa2
1M&D Data Science Center, Tokyo Medical and Dental University
2Center for Spatial Information Science, The University of Tokyo
Abstract
Meta-analyses of diagnostic test accuracy (DTA) studies have been gathering attention in clinical epidemiology and health technology development, and the bivariate random-effects model is becoming a standard tool. However, standard inference methods usually underestimate statistical errors and may provide highly overconfident results in realistic situations, since they ignore the variability in the estimation of variance parameters. To overcome this difficulty, we propose a new improved inference method, namely an accurate confidence region for the meta-analysis of DTA, obtained by asymptotically expanding the coverage probability of the standard confidence region. The advantage of the proposed confidence region is that it has a relatively simple expression and does not require any repeated calculations such as bootstrap or Monte Carlo methods, so the proposed method can be easily carried out in practical applications. The effectiveness of the proposed method is demonstrated through simulation studies and an application to a meta-analysis of screening test accuracy for alcohol problems.
Key words: Asymptotic expansion; Bias correction; Confidence region; Random-effects model
Introduction
Evidence synthesis methods have been gathering attention in diagnostic test accuracy (DTA) studies in clinical epidemiology and health technology development (Leeflang et al., 2008). In this type of meta-analysis, the summary statistics in each study are two correlated primary outcomes of a diagnostic test, sensitivity and false positive rate (that is, one minus specificity), and we are typically interested in the summary receiver operating characteristic curve. Moreover, DTA results from different studies are generally heterogeneous due to various factors, and this heterogeneity should be adequately addressed to avoid underestimation of statistical errors and misleading conclusions (Higgins and Green, 2011). Due to the potential correlation between the two summary measures and the potential heterogeneity, the bivariate random-effects model is adopted as the standard method for the meta-analysis (Reitsma et al., 2005; Harbord et al., 2007).
In bivariate random-effects meta-analyses, standard inference methods depend on large-sample approximations in the number of studies synthesized, for example the extended DerSimonian-Laird methods (Chen et al., 2012; Jackson et al., 2010, 2013) and restricted maximum likelihood (REML) estimation (Reitsma et al., 2005; Jackson et al., 2011), but the number of studies is often moderate or small in practice. In this situation, the validity of the inference methods can be violated, which may lead to overconfident results; that is, the coverage probabilities of the confidence regions or intervals fail to retain their nominal confidence levels, and the type-I error probabilities of the corresponding tests can be inflated. This problem with random-effects models is well recognized in the context of both univariate and multivariate meta-analysis, even when the models are completely specified (Veroniki et al., 2019). Recently, several refined methods have been proposed to improve confidence intervals in multivariate meta-analysis. For example, Noma et al. (2018) developed improved confidence intervals in network meta-analysis using Bartlett-type corrections, and Noma et al. (2020) and Sugasawa and Noma (2020) developed a unified method for computing accurate confidence intervals and regions in general random-effects meta-analysis. However, these methods rely on computationally intensive Monte Carlo or bootstrap procedures. In addition, they construct confidence intervals or regions by inverting statistical hypothesis tests, so feasible ways to construct confidence regions are not necessarily obvious. On the other hand, there are a few analytical approaches to improving the standard methods. Noma (2011) and Guolo (2012) considered higher-order likelihood inference in univariate meta-analysis, which is not directly applicable to more complicated multivariate meta-analysis. As a more general approach, Zucker et al. (2000) proposed an improved likelihood test in general linear mixed models through asymptotic expansions of the (restricted) maximum likelihood estimators, but the results include tedious algebraic expressions and are not useful in practice.
In this paper, we propose an improved confidence region for the bivariate random-effects meta-analysis of DTA, which does not require any repeated calculations and has relatively simple analytical expressions, so that the proposed method can be easily employed in practical applications. The key mathematical tool is the distributional relationship between the ordinary least squares estimator and the residuals, based on which we define a class of estimators of variance parameters in random-effects models. We then find a relatively simple formula for the asymptotic approximation of the coverage probability of the crude Wald-type confidence intervals and regions, and construct a second-order accurate confidence region. We carry out extensive simulation studies to compare the performance of the proposed confidence region with that of the standard REML method, and demonstrate that the proposed method attains more reasonable empirical coverage than REML, while the computational costs of the two methods are almost identical. We also demonstrate the proposed method through an application to a meta-analysis of screening test accuracy for alcohol problems.
This paper is set out as follows. In Section 2, we describe the proposed confidence region under bivariate random-effects models. In Section 3, we numerically demonstrate the proposed confidence region together with existing methods through extensive simulation studies and an application to a real dataset. We conclude with a short discussion in Section 4. R code implementing the proposed method is available in the GitHub repository (https://github.com/sshonosuke/CCR-BMA).
Improved Confidence Regions in Meta-analysis for Diagnostic Test Accuracy
Bivariate random-effects models and confidence region
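There has been increasing interest in systematic reviews and meta-analyses of data from diagnostic accuracy studies. For this purpose, a bivariate random-effects model (Reitsma et al., 2005; Harbord et al., 2007) is widely used. Following Reitsma et al. (2005), we define $\theta_{Ai}$ and $\theta_{Bi}$ as the logit-transformed true sensitivity and specificity, respectively, in the $i$th study ($i=1,\ldots,n$). Let $y_{Ai}$ and $y_{Bi}$ be the observed logit-transformed sensitivity and specificity, and let $s_{Ai}$ and $s_{Bi}$ be the associated standard errors. The bivariate model assumes that $(y_{Ai},y_{Bi})^\top$ and $(\theta_{Ai},\theta_{Bi})^\top$ follow bivariate normal distributions:
$$
\begin{pmatrix} y_{Ai}\\ y_{Bi}\end{pmatrix}\Big|\,\theta_i \sim N(\theta_i, S_i), \qquad \theta_i=\begin{pmatrix}\theta_{Ai}\\ \theta_{Bi}\end{pmatrix}\sim N(\beta,\Sigma), \tag{1}
$$
where $\beta=(\beta_A,\beta_B)^\top$ is a vector of the average logit-transformed sensitivity and specificity, and $S_i={\rm diag}(s_{Ai}^2,s_{Bi}^2)$. Note that there is no correlation between $y_{Ai}$ and $y_{Bi}$ given $\theta_i$ since sensitivity and specificity are calculated based on individuals identified as positive and negative, respectively. Here $\Sigma$ is unstructured, so that it allows correlation between $\theta_{Ai}$ and $\theta_{Bi}$. Let $y_i=(y_{Ai},y_{Bi})^\top$, with $y=(y_1^\top,\ldots,y_n^\top)^\top$ and $X=1_n\otimes I_2$. Then, the model (1) is equivalent to $y_i\sim N(\beta,\Sigma+S_i)$ independently for $i=1,\ldots,n$, so that $E[y]=X\beta$.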
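As an illustration of the inputs to the bivariate model, the following sketch computes a logit-transformed proportion and its standard error from the counts of a single study. The function name, the continuity correction of 0.5, and the example counts are assumptions made for illustration; they are not taken from the paper.

```python
import math


def logit_with_se(events, non_events, correction=0.5):
    """Return (logit of events / (events + non_events), its standard error).

    The standard error uses the usual delta-method formula
    sqrt(1/events + 1/non_events). A continuity correction is added to both
    cells only when one of them is zero (an assumed convention).
    """
    if events == 0 or non_events == 0:
        events += correction
        non_events += correction
    y = math.log(events / non_events)
    se = math.sqrt(1.0 / events + 1.0 / non_events)
    return y, se


# Hypothetical study: 45 true positives, 5 false negatives,
# 80 true negatives, 20 false positives.
y_a, s_a = logit_with_se(45, 5)    # logit sensitivity and its standard error
y_b, s_b = logit_with_se(80, 20)   # logit specificity and its standard error
print(y_a, s_a, y_b, s_b)
```

Sensitivity uses only the diseased group and specificity only the non-diseased group, which is why the two within-study estimates are treated as uncorrelated.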
Our primary interest is a confidence region of $\beta$. Hence, the variance-covariance matrix $\Sigma$ is a nuisance parameter. It is typically estimated via (restricted) maximum likelihood methods based on the model assumption (1). For summarizing the results of the meta-analysis, we typically employ a confidence region of $\beta$ rather than separate confidence intervals, since sensitivity and specificity could be highly correlated. Reitsma et al. (2005) suggested the confidence region for $\beta$ as the interior points of the ellipse defined as
$$
\big\{\beta:\ (\widehat{\beta}-\beta)^\top V(\widehat{\Sigma}_R)^{-1}(\widehat{\beta}-\beta)\le \chi^2_2(\alpha)\big\}, \tag{2}
$$
where $\widehat{\beta}$ is the generalized least squares estimator of $\beta$, $V(\Sigma)=\{\sum_{i=1}^n(\Sigma+S_i)^{-1}\}^{-1}$ is the variance-covariance matrix of $\widehat{\beta}$ (with $S_i$ denoting the within-study covariance matrix of the $i$th study), $\widehat{\Sigma}_R$ is the restricted maximum likelihood estimator of $\Sigma$, and $\chi^2_2(\alpha)$ is the upper $100\alpha\%$ point of the chi-squared distribution with $2$ degrees of freedom. The joint confidence region (2) is approximately valid, that is, the coverage error converges to $0$ as the number of studies $n$ goes to infinity. However, when $n$ is not sufficiently large, the coverage error is not negligible, and the region (2) would under-cover the true $\beta$.
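The naive Wald-type region can be sketched in a few lines for the two-dimensional case. The code below computes the generalized least squares estimate and its covariance for a given plug-in between-study covariance matrix, then checks whether a candidate point lies inside the ellipse. All function names and numerical values are hypothetical, and the plug-in matrix is simply supplied by hand rather than estimated by REML. The chi-squared quantile uses the closed form available for 2 degrees of freedom, $-2\log\alpha$.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def add2(m1, m2):
    return [[m1[i][j] + m2[i][j] for j in range(2)] for i in range(2)]


def matvec2(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def gls_fit(ys, ses, sigma):
    """GLS estimate of the mean vector and its covariance for a given sigma."""
    w_sum = [[0.0, 0.0], [0.0, 0.0]]
    wy_sum = [0.0, 0.0]
    for (ya, yb), (sa, sb) in zip(ys, ses):
        s_i = [[sa ** 2, 0.0], [0.0, sb ** 2]]   # within-study covariance
        w_i = inv2(add2(sigma, s_i))             # weight of study i
        w_sum = add2(w_sum, w_i)
        wy = matvec2(w_i, [ya, yb])
        wy_sum = [wy_sum[0] + wy[0], wy_sum[1] + wy[1]]
    v = inv2(w_sum)
    return matvec2(v, wy_sum), v


def in_wald_region(beta, beta_hat, v, alpha=0.05):
    """True if beta lies inside the naive 100(1 - alpha)% Wald ellipse."""
    d = [beta_hat[0] - beta[0], beta_hat[1] - beta[1]]
    q = matvec2(inv2(v), d)
    stat = d[0] * q[0] + d[1] * q[1]
    return stat <= -2.0 * math.log(alpha)  # upper alpha point of chi-squared(2)


# Hypothetical data: logit sensitivity/specificity and standard errors.
ys = [(2.0, 1.0), (1.5, 1.4), (2.4, 0.8), (1.9, 1.2)]
ses = [(0.3, 0.2), (0.4, 0.25), (0.35, 0.3), (0.25, 0.2)]
sigma_hat = [[0.2, -0.05], [-0.05, 0.1]]  # assumed plug-in estimate
beta_hat, v = gls_fit(ys, ses, sigma_hat)
print(beta_hat, in_wald_region([2.0, 1.0], beta_hat, v))
```

Because the plug-in matrix is treated as if it were known, the ellipse ignores the variability of the variance estimate, which is the source of under-coverage discussed in the text.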
Improved confidence region
In this work, we derive an improved confidence region whose coverage error is , which has higher order accuracy than the standard confidence region (2). The main idea is to derive an approximation formula of the coverage probabilities of the confidence region of the form (2) with a certain class of estimators for , and derive an improved confidence region in an analytical form.
We consider a class of estimators $\widehat{\Sigma}=\widehat{\Sigma}(y)$ of $\Sigma$, where $y$ is the stacked vector of observations and $X=1_n\otimes I_2$ is the corresponding design matrix, satisfying the following conditions:
- (C1) $\widehat{\Sigma}$ is an even function of $y$ and translation invariant, that is, $\widehat{\Sigma}(-y)=\widehat{\Sigma}(y)$, and $\widehat{\Sigma}(y+Xa)=\widehat{\Sigma}(y)$ for any $a\in\mathbb{R}^2$.
- (C2) $\widehat{\Sigma}$ is $\sqrt{n}$-consistent and is second-order unbiased, namely $\widehat{\Sigma}-\Sigma=O_p(n^{-1/2})$ and $E[\widehat{\Sigma}-\Sigma]=o(n^{-1})$.
- (C3) $\widehat{\Sigma}$ is a function of $Py$ with $P=I_{2n}-X(X^\top X)^{-1}X^\top$.
The condition (C1) is satisfied by typical estimators, including the (restricted) maximum likelihood estimator and moment-based estimators. The $\sqrt{n}$-consistency in (C2) is also a standard condition, but second-order unbiasedness of $\widehat{\Sigma}$ is not always satisfied. For example, the maximum likelihood (ML) estimator does not necessarily have this property. The condition (C3) requires that the estimator be a function of the residuals based on the ordinary least squares estimator of $\beta$, which is a key assumption in constructing the proposed confidence region, since it enables us to obtain a relatively simple form of the corrected confidence region. Note that typical estimators (e.g. REML) do not satisfy the condition (C3). As a specific estimator satisfying all the above conditions, we employ the following moment-based estimator:
[TABLE]
where is the ordinary least squares estimator. Since this estimator is not second-order unbiased, let be a bias-corrected version, that is, with , which satisfies all the conditions (C1)-(C3). We also note that, given an estimator of $\Sigma$, the parameter $\beta$ can be estimated via the generalized least squares estimator given by
[TABLE]
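To make the idea of a residual-based moment estimator concrete, here is a minimal sketch of one natural candidate: the sample covariance of the ordinary least squares residuals minus the average within-study covariance. This is an assumption chosen for illustration and is not claimed to be the exact estimator or bias correction used in the paper, whose formulas are not reproduced in this text. With a common mean vector, the ordinary least squares estimate is the componentwise sample mean.

```python
def moment_sigma(ys, ses):
    """Illustrative moment estimator of the 2x2 between-study covariance.

    ys  : list of (y_a, y_b) observed logit sensitivity/specificity
    ses : list of (s_a, s_b) within-study standard errors
    Returns the residual sample covariance (divisor n - 1) minus the
    average within-study covariance. The result is not guaranteed to be
    positive semi-definite in small samples.
    """
    n = len(ys)
    mean_a = sum(y[0] for y in ys) / n   # OLS estimate, first component
    mean_b = sum(y[1] for y in ys) / n   # OLS estimate, second component
    saa = sab = sbb = 0.0
    for ya, yb in ys:
        ra, rb = ya - mean_a, yb - mean_b  # OLS residuals
        saa += ra * ra
        sab += ra * rb
        sbb += rb * rb
    saa, sab, sbb = saa / (n - 1), sab / (n - 1), sbb / (n - 1)
    mean_sa2 = sum(sa ** 2 for sa, _ in ses) / n
    mean_sb2 = sum(sb ** 2 for _, sb in ses) / n
    # Within-study errors are uncorrelated, so only the diagonal is adjusted.
    return [[saa - mean_sa2, sab], [sab, sbb - mean_sb2]]


# Hypothetical data for four studies.
sig = moment_sigma([(0.0, 0.0), (2.0, 2.0), (0.0, 2.0), (2.0, 0.0)],
                   [(0.5, 0.5)] * 4)
print(sig)
```

The estimator depends on the data only through the residuals, so it is unchanged when the same constant vector is added to every study, and it is an even function of the observations.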
In order to improve the coverage accuracy of the confidence region (2), we consider a class of confidence regions of the form
[TABLE]
where is a function with order . When and , the confidence region (3) reduces to (2), so the function can be regarded as an adjustment function to achieve reasonable coverage properties. If $\widehat{\Sigma}$ satisfies the conditions (C1)-(C3), the approximation formula of the coverage probability of the confidence region (3) can be obtained in a relatively simple form, as summarized in the following theorem.
Theorem 1.
Suppose that $\widehat{\Sigma}$ satisfies the conditions (C1)-(C3), and is a function with order . Then, it follows that
[TABLE]
where and are the cumulative distribution and density function of the chi-squared distribution with degrees of freedom , respectively, and and are quantities given by
[TABLE]
with $K(\widehat{\Sigma},\Sigma)=\{V(\widehat{\Sigma})-V(\Sigma)\}V(\Sigma)^{-1}$.
From Theorem 1, it turns out that the coverage probability of the confidence region (3) is a simple functional of the adjustment function . Hence, to achieve higher accuracy of the confidence region, it suffices to choose such that
[TABLE]
Since and , the solution with respect to is given by
[TABLE]
We also note that since . Then, the confidence region given in (3) with given in (5) holds the second-order accuracy as shown in the following theorem.
Theorem 2.
Let be the confidence region of the form (3) with given in (5). Then, it follows that .
It is notable that the derived confidence region has analytical expressions, so that it does not require any computationally intensive methods such as bootstrap and Monte Carlo integration as used in Sugasawa and Noma (2020) and Noma et al. (2020). For practical implementation, we need to obtain the expressions of , and given in (4). We here provide approximation formulas. We can obtain which satisfy , where
[TABLE]
for and . The detailed derivation is given in the Supplementary Material. Note that using instead of in the derived confidence region does not change the coverage accuracy shown in Theorem 2 since the difference between and is only .
Numerical Studies
Simulation study
We carried out extensive simulation studies to assess the finite sample performance of the proposed confidence region (3) together with the approximate confidence region (2) by Reitsma et al. (2005). In this study, we do not consider the possible competitors by Noma et al. (2020) and Sugasawa and Noma (2020) for two reasons: their coverage performance has already been confirmed in their papers, and the calculation of their confidence regions is so intensive that it is not feasible to compute them repeatedly in our simulation study. Hence, the following simulation study compares the performance of the proposed and standard methods, both of which have almost the same computational time.
In the model (1), we set and and . We considered 8 scenarios of the between-study variances and 5 scenarios of the between-study correlations . Following Jackson and Riley (2014), for each simulation, two within-study variances and were simulated from a scaled chi-squared distribution with 1 degree of freedom, multiplied by 0.25, and truncated to lie within the interval , so the expected value of the variances is . We changed the number of studies $n$ over 8, 16 and 24, and set the nominal level to 95%. Based on 1000 replications, we evaluated empirical coverage probabilities of 95% confidence regions of the true parameter vector obtained from the proposed corrected (CCR) method as well as the standard naive (NCR) method. For simplicity, we evaluated coverage rates by assessing rejection rates of the test of the null hypothesis for the true parameters. Since the area of the corrected confidence region is approximately times larger than that of the naive one, we also computed median values of among 1000 replications. To see the degree of heterogeneity depending on and , we computed the heterogeneity measure given by with and in each iteration, which was averaged over 1000 replications. Note that and a larger value of indicates more significant heterogeneity in the data.
The averaged values of are reported in Table 1, which indicates that our simulation scenarios contain a wide range of heterogeneity. The obtained coverage probabilities and the median values of are shown in Figures 1 and 2, respectively. From Figure 1, it is observed that the simulated coverage probabilities of the standard NCR are seriously smaller than the nominal level (), especially in the case with the small number of studies (), possibly because of the naive approximation in (2). On the other hand, the proposed CCR provides considerably better performance than NCR, as its coverage probabilities are relatively close to the nominal level. Although the coverage probability of CCR tends to be larger than the nominal level when is small and/or is large, such a conservative property would be much more desirable than the overconfident property that NCR shows. From Figure 2, we can see that the area of CCR is much larger than that of NCR since CCR takes account of additional variability due to the estimation of the variance-covariance matrix, so it is quite reasonable that decreases as increases. Moreover, we can also observe that the area of CCR decreases as increases and increases as increases, which is consistent with the results for the coverage probabilities shown in Figure 1.
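The data-generating step described above can be sketched as follows. The truncation bounds, the true mean vector, and the between-study standard deviations and correlation are not given in this text, so the values passed below are hypothetical placeholders; truncation is implemented by redrawing until the value falls inside the interval, which is one possible reading of "truncated".

```python
import math
import random


def draw_within_variance(rng, lower, upper, scale=0.25):
    """Scaled chi-squared(1) draw, redrawn until it lies in [lower, upper]."""
    while True:
        v = scale * rng.gauss(0.0, 1.0) ** 2  # 0.25 * chi-squared with 1 df
        if lower <= v <= upper:
            return v


def simulate_studies(n, beta, tau_a, tau_b, rho, lower, upper, seed=1):
    """Simulate n studies from the bivariate random-effects model.

    Returns a list of ((y_a, y_b), (s2_a, s2_b)) with observed logit
    outcomes and within-study variances. tau_a and tau_b are between-study
    standard deviations and rho is the between-study correlation.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        # Study-level true effects via a hand-written 2x2 Cholesky factor.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        theta_a = beta[0] + tau_a * z1
        theta_b = beta[1] + tau_b * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        s2_a = draw_within_variance(rng, lower, upper)
        s2_b = draw_within_variance(rng, lower, upper)
        # Independent within-study sampling errors.
        y_a = theta_a + math.sqrt(s2_a) * rng.gauss(0.0, 1.0)
        y_b = theta_b + math.sqrt(s2_b) * rng.gauss(0.0, 1.0)
        data.append(((y_a, y_b), (s2_a, s2_b)))
    return data


# Hypothetical settings, chosen only so that the sketch runs.
data = simulate_studies(8, (1.0, 2.0), 0.5, 0.5, 0.3,
                        lower=0.009, upper=0.6, seed=1)
print(len(data))
```

Repeating this generation many times and recording how often a confidence region contains the true mean vector gives the empirical coverage probability.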
Example: screening test accuracy for alcohol problems
Here we provide a re-analysis of the dataset given in Kriston et al. (2008), including studies regarding a short screening test for alcohol problems. Following Reitsma et al. (2005), we used logit-transformed values of sensitivity and specificity, denoted by and , respectively, and associated standard errors and . For the bivariate summary data, we first fitted the bivariate model (1) using the restricted maximum likelihood method and found that , and the heterogeneity measures for sensitivity and specificity are respectively given by and , so there seems to be notable heterogeneity in the data. We then computed confidence regions (CRs) of based on NCR (2) given in Reitsma et al. (2005) and the proposed CCR. Following Reitsma et al. (2005), the obtained two CRs of were transformed to the scale , where and are the sensitivity and false positive rate, respectively. The obtained two CRs are presented in Figure 3 with a plot of the observed data, summary points , and the summary receiver operating characteristic curve. The approximate CR is smaller than the proposed CR, which may indicate that the approximation method underestimates the variability of estimating nuisance variance parameters. In Figure 3, we also report the confidence region based on Sugasawa and Noma (2020), which uses Monte Carlo simulation to compute accurate $p$-values of likelihood ratio statistics. The two regions based on the proposed method and Sugasawa and Noma (2020) are slightly different, but both are clearly wider than the naive confidence region. On the other hand, the computation time of the proposed method was less than 1 second while the inference method by Sugasawa and Noma (2020) took more than 12 hours, where the program was run on a PC with a 3 GHz 8-Core Intel Xeon E5 8 Core Processor with approximately 16GB RAM.
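The transformation of an elliptical region from the logit scale to the probability scale can be sketched as below. The code traces the boundary of an ellipse defined by a centre, a covariance matrix, and the chi-squared cutoff with 2 degrees of freedom, then maps each boundary point through the inverse logit. The centre and covariance values are hypothetical, and returning points as (false positive rate, sensitivity) is an assumed ordering chosen to match the usual axes of a receiver operating characteristic plot.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def region_boundary(beta_hat, v, alpha=0.05, num=200):
    """Points on the boundary of the ellipse
    (b - beta_hat)' v^{-1} (b - beta_hat) = -2 log(alpha)."""
    c = -2.0 * math.log(alpha)           # upper alpha point of chi-squared(2)
    # Cholesky factor L of v (v = L L'), written out for the 2x2 case.
    l11 = math.sqrt(v[0][0])
    l21 = v[0][1] / l11
    l22 = math.sqrt(v[1][1] - l21 ** 2)
    pts = []
    for k in range(num):
        t = 2.0 * math.pi * k / num
        u1 = math.sqrt(c) * math.cos(t)
        u2 = math.sqrt(c) * math.sin(t)
        pts.append((beta_hat[0] + l11 * u1,
                    beta_hat[1] + l21 * u1 + l22 * u2))
    return pts


def to_roc_scale(pts):
    """Map (logit sensitivity, logit specificity) to (FPR, sensitivity)."""
    return [(1.0 - expit(b_b), expit(b_a)) for b_a, b_b in pts]


# Hypothetical summary estimate and covariance on the logit scale.
beta_hat = (1.8, 1.2)
v = [[0.04, -0.01], [-0.01, 0.03]]
pts = region_boundary(beta_hat, v)
roc = to_roc_scale(pts)
print(roc[0])
```

A corrected region with a larger cutoff could be traced the same way by replacing the constant `c`, since only the radius of the ellipse changes.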
Discussion
In this paper, we presented an improved confidence region for random-effects meta-analysis of diagnostic test accuracy that does not use repeated calculations such as Monte Carlo or bootstrap methods. The proposed confidence region has a relatively simple form and is shown to have second-order accurate coverage probability, while the standard inference methods (e.g. REML) have significant coverage errors. In simulation studies, we demonstrated that the standard methods can suffer from under-coverage when the number of studies to be synthesized is small, while the proposed method provides reasonable coverage properties.
A possible limitation of the proposed method is that the coverage accuracy still depends on the number of studies. On the other hand, inference methods that do not rely on large-sample approximations have been recently proposed (e.g. Noma et al., 2020; Sugasawa and Noma, 2020), but they are computationally intensive, so they are not necessarily practical. Thus, the proposed method can be regarded as a reasonable compromise between methods with exact empirical coverage and computational efficiency.
Acknowledgments
This research was supported by the Japan Society for the Promotion of Science KAKENHI (grant number: 18K12757).
Key lemmas
We first introduce lemmas which play important roles in the proof of Theorem 1. The first lemma is used for deriving the conditional distribution of .
Lemma S1.
Under the conditions (C1)-(C3) given in the main document, is independent of for . Also, is a function of , and independent of .
Proof.
Let , which is distributed as . Since , it holds that
[TABLE]
Since is a full-rank matrix, we have , that is, the covariance of and is $0$, which, under the normality assumption, implies that is independent of . Now, we write as and as . Since and from (C3), we have
[TABLE]
which implies that is invariant with respect to the translation . Moreover, is maximal invariant with respect to the translation since and implies that for . Then, is a function of from Theorem 2 in Berger (1985), p.403. ∎
In the next lemma, we show that the first-order bias of the plug-in estimator is approximately the same as the negative covariance of .
Lemma S2.
Under the conditions (C1)-(C3), it holds that
[TABLE]
Proof.
We will show the Lemma by directly comparing both sides of the equation in the Lemma. Noting that and for some non-singular matrices and , we have
[TABLE]
Since and from the condition (C2), we have
[TABLE]
and
[TABLE]
Then, for we have
[TABLE]
where the last equality holds since is a second-order unbiased estimator of .
Next, we evaluate the first term of the right side of the equation in the Lemma. We can write as
[TABLE]
In order to approximate the covariance of up to the order , we expand and as
[TABLE]
The straightforward calculation shows that
[TABLE]
thereby we have
[TABLE]
which has the same expression as (S1). ∎
Proof of Theorem 1
From Lemma S1, the conditional distribution of given is . Let . It is noted that . Then, the conditional distribution of given is , and the Mahalanobis’ distance is approximated via Taylor series expansion as
[TABLE]
where
[TABLE]
From (S2), the characteristic function is approximated as
[TABLE]
because , and . From the law of iterated expectations and the conditional normality of , the above equation reduces to
[TABLE]
For some deterministic matrix and , it holds that
[TABLE]
Using these equalities, from the law of iterated expectations, we have
[TABLE]
For notational simplicity, let . Let , or . Then, can be written as
[TABLE]
We shall evaluate the moments in (S3). First, can be expressed as
[TABLE]
thereby it holds that
[TABLE]
for . Noting that the first term in (S4) is and the second term is , we can expand and as
[TABLE]
which lead to and . Thus,
[TABLE]
It can be also observed that
[TABLE]
Then, from Lemma S2 we have
[TABLE]
Combining (S5), (S6) and (S7), we can see that the characteristic function of can be written as
[TABLE]
for and are defined in the main document. From the fact that the characteristic function of the chi-squared distribution with degrees of freedom is given by , it follows that the asymptotic expansion of the cumulative distribution function of is
[TABLE]
where is the cumulative distribution function of the chi-squared distribution with degrees of freedom . Note that , where is the density function of the chi-squared distribution with degrees of freedom . Then, it holds that
[TABLE]
thereby, for a function with order , we have
[TABLE]
which completes the proof.
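The last steps of the proof rely on standard relations between chi-squared distribution and density functions with different degrees of freedom. One such relation, $G_k(x)-G_{k+2}(x)=2g_{k+2}(x)$, where $G_k$ and $g_k$ denote the distribution and density functions with $k$ degrees of freedom, can be checked numerically for $k=2$ using closed forms. That this is the identity invoked at this step is an assumption, since the corresponding formula is not legible in this text, and the notation $G_k$, $g_k$ is introduced here for illustration.

```python
import math


def cdf_chi2_2(x):
    """Distribution function of chi-squared with 2 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0)


def cdf_chi2_4(x):
    """Distribution function of chi-squared with 4 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0)


def pdf_chi2_4(x):
    """Density function of chi-squared with 4 degrees of freedom."""
    return (x / 4.0) * math.exp(-x / 2.0)


# Check G_2(x) - G_4(x) = 2 g_4(x) on a few points.
for x in (0.5, 2.0, 5.99, 10.0):
    lhs = cdf_chi2_2(x) - cdf_chi2_4(x)
    rhs = 2.0 * pdf_chi2_4(x)
    print(x, lhs, rhs)
```

Relations of this type allow differences of distribution functions in an expansion to be rewritten in terms of density functions.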
Derivation of the equation (6)
We write functions given in Section 2 as functions of since the unknown parameter is in this example. For and , can be expanded as
[TABLE]
Since the first term on the right side of the above equation is of order , we only need to consider this term to derive the expressions given in (6).
At first, we evaluate . It is noted that we have
[TABLE]
where are independently distributed as the standard normal distribution. Then, we have
[TABLE]
Next, we evaluate . It is noted that we have
[TABLE]
and that can be written as
[TABLE]
Then, for for which are independently distributed as the multivariate standard normal distribution, it holds that for ,
[TABLE]
for . Then, we have
[TABLE]
Finally, we evaluate . From the equation (S1), for we have
[TABLE]
The trace of the first term in the above equation is exactly the same as and is given in (S9). To evaluate the second term, it is noted that
[TABLE]
Then, the trace of the second term in the above equation is given by
[TABLE]
Equations (S8), (S9) and (S10) lead to the expression given in (4) in the main document.
References
- Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer, New York.
- Chen, H., A. K. Manning, and J. Dupuis (2012). A method of moments estimator for random effect multivariate meta-analysis. Biometrics 68, 1278–1284.
- Guolo, A. (2012). Higher-order likelihood inference in meta-analysis and meta-regression. Statistics in Medicine 31, 313–327.
- Harbord, R. M., J. J. Deeks, H. Egger, P. Whiting, and J. A. C. Sterne (2007). A unification of models for meta-analysis of diagnostic accuracy studies. Biostatistics 8, 239–251.
- Higgins, J. P. T. and S. Green (2011). Cochrane Handbook for Systematic Reviews of Interventions, Version 5.1.0. The Cochrane Collaboration.
- Jackson, D., R. Riley, and I. R. White (2011). Multivariate meta-analysis: potential and promise. Statistics in Medicine 30, 2481–2498.
- Jackson, D. and R. D. Riley (2014). A refined method for multivariate meta-analysis and meta-regression. Statistics in Medicine 33, 541–554.
- Jackson, D., I. R. White, and R. D. Riley (2013). A matrix-based method of moments for fitting the multivariate random effects model for meta-analysis and meta-regression. Biometrical Journal 55, 231–245.
- Jackson, D., I. R. White, and S. G. Thompson (2010). Extending DerSimonian and Laird's methodology to perform multivariate random effects meta-analyses. Statistics in Medicine 29, 1282–1297.
