Causal variance decompositions for institutional comparisons in healthcare
Bo Chen, Keith A. Lawson, Antonio Finelli, Olli Saarela

TL;DR
This paper introduces a causal variance decomposition method to analyze and compare healthcare institutions based on disease-specific quality indicators, focusing on overall variation and causal effects rather than pairwise hospital comparisons.
Contribution
It develops a novel three-way variance decomposition framework with causal interpretation, including model-based estimators for analyzing hospital performance variations.
Findings
Effective in decomposing variation into causal components
Model estimators perform well in simulations
Applied successfully to real healthcare data
Abstract
There is increasing interest in comparing institutions delivering healthcare in terms of disease-specific quality indicators (QIs) that capture processes or outcomes showing variations in the care provided. Such comparisons can be framed in terms of causal models, where adjusting for patient case-mix is analogous to controlling for confounding, and exposure is being treated in a given hospital, for instance. Our goal here is to help identifying good QIs rather than comparing hospitals in terms of an already chosen QI, and so we focus on the presence and magnitude of overall variation in care between the hospitals rather than the pairwise differences between any two hospitals. We consider how the observed variation in care received at patient level can be decomposed into that causally explained by the hospital performance adjusting for the case-mix, the case-mix itself, and residual…
| Process/outcome | ||||
|---|---|---|---|---|
| Covariate | Partial | Partial-CKD | MIS | Readmission |
| Male sex | 0.21 (2.81) | 0.22 (2.39) | 0.05 (0.67) | -0.02 (-0.18) |
| Age | -0.01 (-4.11) | -0.01 (-2.67) | -0.01 (-2.06) | 0.01 (2.74) |
| Income quintile | 0.03 (1.14) | 0.03 (1.07) | 0.01 (0.47) | <|0.01|(-0.05) |
| Rural vs urban | 0.18 (1.46) | 0.20 (1.37) | -0.14 (-1.17) | -0.18 (-1.33) |
| Charlson score | -0.14 (-4.93) | -0.16 (-5.02) | -0.11 (-4.37) | 0.12 (6.56) |
| ACG score | -0.01 (-1.20) | <|0.01|(-0.19) | 0.01 (1.61) | 0.02 (4.64) |
| log(dx to tx days+1) | 0.02 (1.02) | 0.03 (1.41) | 0.02 (0.65) | 0.09 (3.64) |
| Year of dx | 0.19 (21.87) | 0.18 (17.13) | 0.20 (15.28) | -0.01 (-1.43) |
| Tumor size (cm) | -0.82 (-17.25) | -0.75 (-12.87) | -0.16 (-7.21) | 0.02 (1.44) |
| CKD risk factors | -0.02 (-0.26) | - | -0.03 (-0.36) | 0.15 (1.31) |
| T2 vs T1 stage | - | - | -0.31 (-2.16) | -0.03 (-0.21) |
| T3 or 4 vs T1 stage | - | - | - | 0.19 (-1.60) |
| Source of variation | ||||
|---|---|---|---|---|
| Process/outcome | Patient case-mix | Between-hospital | Residual | |
| Partial | Variance | |||
| CI | ||||
| Proportion | ||||
| CI | ||||
| Partial-CKD | Variance | |||
| CI | ||||
| Proportion | ||||
| CI | ||||
| MIS | Variance | |||
| CI | ||||
| Proportion | ||||
| CI | ||||
| Readmission | Variance | |||
| CI | ||||
| Proportion | ||||
| CI | ||||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Causal variance decompositions for institutional comparisons in healthcare
Bo Chen
Dalla Lana School of Public Health, University of Toronto
Keith A. Lawson
Princess Margaret Cancer Centre, University Health Network
Antonio Finelli
Princess Margaret Cancer Centre, University Health Network
Olli Saarela Correspondence to: Olli Saarela, Dalla Lana School of Public Health, 155 College Street, Toronto, Ontario M5T 3M7, Canada. Email: [email protected] Dalla Lana School of Public Health, University of Toronto
Abstract
There is increasing interest in comparing institutions delivering healthcare in terms of disease-specific quality indicators (QIs) that capture processes or outcomes showing variations in the care provided. Such comparisons can be framed in terms of causal models, where adjusting for patient case-mix is analogous to controlling for confounding, and exposure is being treated in a given hospital, for instance. Our goal here is to help identifying good QIs rather than comparing hospitals in terms of an already chosen QI, and so we focus on the presence and magnitude of overall variation in care between the hospitals rather than the pairwise differences between any two hospitals. We consider how the observed variation in care received at patient level can be decomposed into that causally explained by the hospital performance adjusting for the case-mix, the case-mix itself, and residual variation. For this purpose, we derive a three-way variance decomposition, with particular attention to its causal interpretation in terms of potential outcome variables. We propose model-based estimators for the decomposition, accommodating different link functions and either fixed or random effect models. We evaluate their performance in a simulation study and demonstrate their use in a real data application.
Keywords: Causal inference, hospital profiling, intraclass correlation, quality indicators, variance decomposition
1 Introduction
1.1 Background
Striving for accountability and transparency in healthcare has increased the interest in quantifying and comparing the performance of institutions in terms of the quality of care that the patients receive. This is commonly done in terms of disease-specific quality indicators (QIs) that measure structural, process or outcome elements related to the care of a particular condition [1]. For instance, in the context of surgical care for kidney cancer, examples would be (i) proportion of partial versus radical nephrectomies for early-stage patients with chronic kidney disease risk factors, (ii) proportion of minimally invasive versus open radical nephrectomies for early-stage patients, (iii) average length of stay after a radical nephrectomy or (iv) proportion of unplanned readmissions within 30 days of a radical nephrectomy [2]. Process measures may be preferred for measuring quality for the reason that they are directly related to clinical decisions and thus are more actionable [3]. However, even before considering improving quality, the chosen indicator has to demonstrate existing differences in the quality of care. In doing so, it already answers a causal question, namely “would a patient receive different care if treated in a different hospital?”. The goal of the present work is developing metrics and statistical methods that can help to identify processes or outcomes that can capture variation in the quality of care between hospitals. Following Varewyck et al. [4], this can be framed in a causal inference framework, where adjusting for patient case-mix factors is analogous to controlling for confounding. However, since our focus is in choosing QIs, rather than comparing or ranking hospitals in terms of an already chosen metric, we are interested in the presence and magnitude of overall variation between the hospitals, rather than contrasts between any two hospitals. While we restrict the discussion to between hospital comparisons, without loss of generality the methods would also apply to other comparisons in the healthcare system, such as administrative subregions or individual providers (such as surgeons).
Variance decompositions have been used in the hospital profiling context to measure reliability of rankings in terms of a given quality indicator (rankability)[5, 6, 7], though this decomposition is constructed at aggregate (hospital) level rather than at individual patient level. This approach broadly corresponds to the random effect meta-analysis model, where the case-mix adjusted hospital-specific quality indicators are assumed to follow the model , where is used to index the hospital and stands for the total number of hospitals. In this model, is a normally distributed hospital effect (‘heterogeneity’) around the average performance , and reflects the sampling variation under the ‘null’ of no hospital effect (). For instance, following van Houwelingen and Brand [5] and Racz and Sedransk [8], to obtain an indirectly standardized proportion type quality indicator, one could take , where the overall proportion in the entire patient population is multiplied by the standardized ratio of observed to expected counts. The observed count for hospital is given by , where is the process/outcome measure of interest, and indicates the hospital where patient was treated, with being the size of the overall patient population serving as the standard population (e.g. combination of patient populations from hospitals in a given administrative region). The expected count is given by , where is a prediction from a logistic regression model adjusted for the case-mix factors , but not including the hospital indicators. Supposing that the standard population is large enough to ignore estimation uncertainty in the expected counts, we can take , with reflecting the between-hospital variation after adjusting for patient case-mix. The latter in turn can be estimated using the DerSimonian & Laird method [9] or maximum likelihood, with the meta-analysis statistic indicating the proportion of variation due to between-hospital heterogeneity.
Closely related, but defined at individual level, is the concept of intra-class correlation (ICC), commonly estimated through mixed effect models [10, 11, 12, 13, 14, 15, 16]. Consider a generalized linear mixed effect model (GLMM)
[TABLE]
where is the link function, and , , and are as before. The hospital level intercept terms are taken to follow , . For instance, with an identity link and residual variation distributed as , we have that the conditional correlation . That is, the conditional correlation of the outcomes of two individuals treated in the same hospital is equal to the proportion of between-hospital variance of the total variance left after controlling for the patient case-mix, and can be estimated by fitting the random intercept model to obtain estimates for and . However, the connection between the ICC and the random intercept variance is more complicated with non-identity links, as the variance decomposition is different at the linear predictor and link function scale [17, 18]. An attempt for interpretation of the between cluster variance in terms of more familiar effect measures are the median odds ratio and median hazard ratio for binary outcomes and time-to-event outcomes, respectively [19]. We opt to take another route of considering the causal interpretation of the variance decomposition, as well as estimation methods that are invariant to the choice of the link function.
Furthermore, two-way variance decompositions generalize mathematically into arbitrary multi-way decompositions, though these depend on the chosen order of factorization [20]. Aligned with our goal of identifying QIs that can capture between-hospital variation in quality without being completely driven by the case-mix, we are interested in decomposing the observed variation in care received to that explained by patient case-mix, causal between-hospital variation given the case-mix and unexplained (residual) variation.
1.2 Objectives
Summarizing the objectives motivated above, the structure of the paper is as follows. In Section 2.1 we recapitulate the potential outcomes (Rubin’s) causal model [21], the related assumptions, and its extensions to between hospital comparisons following Varewyck et al. (2014) [4]. In Section 2.2, we introduce the causal variance decomposition to partition observed variation in care received, and discuss its interpretation in Section 2.3. In Section 2.4, we investigate the connection between our causal variance decomposition and the ICC. In Section 3, we propose model-based estimation methods for the variance decompositions, and investigate their performance in a simulation study in Section 4 and illustrate them in a real data application in Section 5. We conclude with a brief discussion on limitations and future directions in Section 6.
2 Proposed measures
2.1 Notation and assumptions
Suppressing the individual-level indices until the discussion of estimators in Section 3, let represent the observed process or outcome experienced by a given patient, used to construct a quality indicator. Let represent the potential counterpart of this had the same patient received care in hospital . Let further be a vector of covariates relevant to the case-mix adjustment (including information on demographics, comorbidities, and disease progression), and let indicate the hospital in which the patient was actually treated. Under counterfactual consistency/stable unit treatment value assumption (SUTVA), the observed outcome is related to the potential outcomes by . We assume the strong ignorability of the hospital assignment mechanism, which states that (positivity) and Y(z)\mathchoice{\mathrel{\hbox to0.0pt{\displaystyle\perp\hss}\mkern 2.0mu{\displaystyle\perp}}}{\mathrel{\hbox to0.0pt{\textstyle\perp\hss}\mkern 2.0mu{\textstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptstyle\perp\hss}\mkern 2.0mu{\scriptstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptscriptstyle\perp\hss}\mkern 2.0mu{\scriptscriptstyle\perp}}}Z\mid X (conditional exchangeability) at all combinations of and [21, 22]. We note that as long as these conditions are satisfied, the actual mechanism which assigns patients to hospitals can be more complicated without biasing the inferences. For example, adapting from Moreno-Betancur et al.[23], the causal relationships in the case of a process type indicator could be as illustrated in the directed acyclic graph (DAG) of Figure 1. Here the additional variables in history reflect the history (including family history) of an individual contributing to that individual living in a particular area indicated by (e.g. postal code), which in turn in part contributes to the individual being treated in a particular hospital in that area. In this DAG, are sufficient to control for confounding of the hospital effects, while are instrumental variables rather than confounders, and adjusting for them would likely lead to positivity violations.
2.2 Three-way causal decomposition for observed variation in care received
We are interested in decomposing the observed variance in the care received, , to that causally explained by the quality of care differences between hospitals, by individual-level case-mix factors, and residual variation. Under the counterfactual consistency, , for which we can write the usual two-way variance decomposition
[TABLE]
For the first term, we can further write
[TABLE]
and for the latter term,
[TABLE]
Substituting (3) and (4) into Equation (2), we obtain
[TABLE]
Here, due to strong ignorability, we have
[TABLE]
Applying these, we get the following result:
[TABLE]
We will consider the second term on the right had side as the causal quantity of interest. In this, the hospital specific performance is compared to the the expected level of care in the system for a patient with covariate values , and then averaged over the covariate distribution of the standard population, similar to direct standardization. We note that this quantity is expressed in terms of potential outcomes and is a causal quantity in its own right, that is, it can be defined regardless of the causal assumptions. The causal assumptions are used to link this estimand to the observed data, working backwards to . For the decomposition, we get the interpretation
[TABLE]
The estimation of the components is discussed in Section 3. Briefly, the estimation of the variance decomposition can be based on modeling the components and combined with the empirical distribution of . Alternatively, because the decomposition can be also expressed as
[TABLE]
the estimation can also be based on modeling and combined with the empirical distribution of . In Section 3 we choose the former approach because it corresponds to the factorization of the likelihood, and thus enables approximate Bayesian inference. Before this, we further discuss the interpretation of the second term in the decomposition as a causal estimand.
2.3 Causal interpretation of the decomposition
To understand the causal interpretation of the variance decomposition, it is helpful to consider the special case of two hospitals, in which case the second term of the decomposition should reduce into a pairwise causal contrast. With two hospitals indexed by and denoting the propensity score , the first term of the decomposition becomes
[TABLE]
and the second term
[TABLE]
Now, we consider three different scenarios based on the relationships between , and in Figure 1.
Scenario 1.
In the absence of the arrow , which implies Y\mathchoice{\mathrel{\hbox to0.0pt{\displaystyle\perp\hss}\mkern 2.0mu{\displaystyle\perp}}}{\mathrel{\hbox to0.0pt{\textstyle\perp\hss}\mkern 2.0mu{\textstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptstyle\perp\hss}\mkern 2.0mu{\scriptstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptscriptstyle\perp\hss}\mkern 2.0mu{\scriptscriptstyle\perp}}}X\mid Z and , the first term is equal to . This implies that this term takes the variance contribution of the indirect (mediated) pathway , which is present only if has causal effect on . The second term is equal to . The difference in interpretation between these two terms is that the former represents the variance due to different kinds of patients being treated by different hospitals, while the latter represents the variation in care similar patients receive. If all hospitals treat similar patient populations, the former term disappears, while the latter is maximized. In the opposite case where the hospitals specialize in treating different patient populations, the positivity assumption is violated and the process/outcome cannot reveal performance differences between the hospitals. Thus, we focus on the second term as a measure of goodness of a proposed process/outcome as a QI.
Scenario 2.
In the absence of the arrows and , which implies Z\mathchoice{\mathrel{\hbox to0.0pt{\displaystyle\perp\hss}\mkern 2.0mu{\displaystyle\perp}}}{\mathrel{\hbox to0.0pt{\textstyle\perp\hss}\mkern 2.0mu{\textstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptstyle\perp\hss}\mkern 2.0mu{\scriptstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptscriptstyle\perp\hss}\mkern 2.0mu{\scriptscriptstyle\perp}}}X and (randomized assignment), the first term would be equal to . The two additive components inside the variance reflect the effect modification by and the predictive variance due to . In the absence of effect modification/additive interaction, that is, , it follows that is constant and only the predictive variance remains. The second term is equal to . In the absence of effect modification, this becomes , capturing the causal effect. Compared to the usual direct standardization formula , the between-hospital variance is similarly averaged over , but squared so that any contrast at the same level of contributes to the variance rather than canceling out. The reason effect modification contributes to both case-mix and between-hospital variance components is that it both increases the variance of the expected care level (non-causal) and the average squared difference from the expected care level (causal). When evaluating proposed processes/outcomes, we would again want the second term to be comparatively large compared to the first one, because even though the eventual QI will be case-mix adjusted, large case-mix variance indicates that the measure is less intervenable, as the variation in care largely reflects treatment choices necessitated by disease progression, comorbidities etc., and makes the indicator sensitive to the method of case-mix adjustment.
Scenario 3.
In the absence of the arrow , which implies Y\mathchoice{\mathrel{\hbox to0.0pt{\displaystyle\perp\hss}\mkern 2.0mu{\displaystyle\perp}}}{\mathrel{\hbox to0.0pt{\textstyle\perp\hss}\mkern 2.0mu{\textstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptstyle\perp\hss}\mkern 2.0mu{\scriptstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptscriptstyle\perp\hss}\mkern 2.0mu{\scriptscriptstyle\perp}}}Z\mid X and , the first term becomes , reflecting the predictive variance due to , while the second term is equal to zero as it should in the absence of causal effect of on .
2.4 Connection to intra-class correlation
We note that the previously derived variances were weighted by the quantities reflecting the hospital patient volumes; this is natural since we are decomposing the observed variance in the entire combined patient population, and larger centers carry more weight in this. However, to expand the proposed framework, we can consider variance decompositions under hypothetical assignment mechanisms that can differ from the actual one. For example, referring to the interpretation of Scenario 1 of Section 2.3, we can consider the variance decomposition under a hypothetical ‘randomized’ assignment mechanism where every hospital treats similar kind of patient population. This also allows us to derive a connection between the causal between-hospital variance in the previous section, and the more familiar intraclass correlation. Let represent a random draw from such a hypothetical assignment mechanism, with the mechanism chosen so that A\mathchoice{\mathrel{\hbox to0.0pt{\displaystyle\perp\hss}\mkern 2.0mu{\displaystyle\perp}}}{\mathrel{\hbox to0.0pt{\textstyle\perp\hss}\mkern 2.0mu{\textstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptstyle\perp\hss}\mkern 2.0mu{\scriptstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptscriptstyle\perp\hss}\mkern 2.0mu{\scriptscriptstyle\perp}}}Z\mid X, and strong ignorability and Y(a)\mathchoice{\mathrel{\hbox to0.0pt{\displaystyle\perp\hss}\mkern 2.0mu{\displaystyle\perp}}}{\mathrel{\hbox to0.0pt{\textstyle\perp\hss}\mkern 2.0mu{\textstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptstyle\perp\hss}\mkern 2.0mu{\scriptstyle\perp}}}{\mathrel{\hbox to0.0pt{\scriptscriptstyle\perp\hss}\mkern 2.0mu{\scriptscriptstyle\perp}}}A\mid X apply. This lets us consider random draws of potential outcomes , similar to the notation used by VanderWeele et al.[24]. Now, because
[TABLE]
we can write the variance decomposition
[TABLE]
If we choose here , the result is equivalent to before. However, if we choose to weight the hospitals equally as , and assume a linear model
[TABLE]
we can express the second term as
[TABLE]
If we take the hospital effects to represent a ‘sample’ from a distribution with variance , (2.4) converges to when . In reality, since our discussion is in the administrative data context, we observe all the hospitals in the administrative region of interest and don’t take them to be a sample from a superpopulation of hospitals. Nevertheless, the result serves to establish a connection to the ICC: because
[TABLE]
under a linear model the causal interpretation of is the proportion of case-mix conditional variance that is causally explained by the between-hospital variation in practices. However, this interpretation is specific to the linear model with equal hospital weights, which is why we proceed with the decomposition in Section 2.2 for the observed variance; this is general and can be estimated in the same way irrespective of the outcome model formulation (that is, irrespective of the choice of the link function, or whether the hospital effects are modeled as fixed or random effects).
3 Estimators
3.1 Point estimators
We present model-based direct standardization type estimators for the variance components. To model the outcomes, we specify a generalized linear model
[TABLE]
where . The hospital level intercept terms are either fixed, with , or follow (random intercept model). As noted in the previous section, the random effects are adopted as a means to apply shrinkage to the hospital effects, rather than a data generating mechanism. With random effects, the fitted values are obtained using empirical Bayes prediction for the random intercepts. We note that in the model we can also introduce hospital-case-mix interactions, as discussed by Varewyck et al.[25], but omit these here for notational simplicity. We estimate the parametrized variance explained by the patient case-mix as
[TABLE]
where and where the terms can be estimated by fitting a multinomial logistic regression model
[TABLE]
where . Similarly, the parametrized average variance explained by the hospital performance conditioning on patient case-mix can be estimated by
[TABLE]
The residual variance can then be estimated by subtracting the two estimated components from the empirical total variance, or using a distributional assumption such as in the case of binary indicator, and taking
[TABLE]
3.2 Variance estimators
To quantify the uncertainty in the variance components, we note that conditional on the empirical covariate distribution in the standard population the estimates are entirely model-based. Thus, we can produce approximate samples from the joint posterior distribution
[TABLE]
by sampling and separately from their respective posteriors, and recalculate the variance components and at each combination of and to quantify uncertainty in them. Under flat priors the sampling distribution variances of the parameter estimates approximate the respective posterior variances; in approximate sampling of we used parametric bootstrap to reflect the uncertainty in the random effect estimates; for example for a binary indicator we can repeatedly sample the outcomes from , , and refit the mixed model to produce a sample of the model-based fitted values . Compared to non-parametric bootstrap this avoids some hospital categories being absent in some of the replicates. For sampling of we used the normal approximation where is the maximum likelihood estimate of and its asymptotic variance-covariance matrix from the multinomial logistic model fit.
4 Simulation study
4.1 Generating mechanism
We demonstrate the performance of the estimators by simulating data from a generating mechanism simplified (omitting and ) from that in Figure 1, varying the total number of patients and number of hospitals . Our objectives in the simulation are (a) to demonstrate that the proposed estimators work both for linear models for continuous outcomes and logistic models for dichotomous outcomes, (b) to demonstrate that the estimators work both with fixed and random effect models, and (c) to obtain evidence of their asymptotic behaviour under increasing and . First, we generated two case-mix factors from and , given which the hospital assignment was generated from multinomial logistic model specified as in (7), drawing the intercepts from and the regression coefficients from and , . In our generating mechanism, we did not consider the hospitals themselves to be a sample from a superpopulation of hospitals, and thus we fixed the hospital-specific parameters across the replications (i.e. drawing these only once from the above distributions), while resampling the patients in each replication. This corresponds to conventional analysis of administrative data, where all the hospitals operating in a given region are compared, but the observed patient population is taken to represent a sample from an unobserved“long run”. Outcomes were generated from the mean structure
[TABLE]
with the hospital effects drawn from (again fixing these parameter values across the replications). Continuous potential outcomes were generated by taking , where , with the observed outcome given by . Binary outcomes were generated by dichotomizing at zero as . To calculate the estimators, correctly specified outcome and assignment models were fitted to each simulated dataset, with both fixed and random effect specifications used for the former, whereas the true decomposition was calculated using the known values fixed in the data generating mechanism.
4.2 Results
4.2.1 Continuous Outcomes
Figure 2 shows the simulated sampling distribution means for the three variance components under both fixed and random effect models for the continuous outcomes under different combinations of and , based on 1000 replications. Also shown are the 95% quantile interval for the sampling distribution, and the 95% confidence interval for the mean of the sampling distribution, reflecting the magnitude of the Monte Carlo error. The black dots indicate the true variances. The estimators capture the true value reasonably well in all cases, though with the random effect model giving slightly smaller estimates for the hospital variance component. Also of note is that the uncertainty in the variance component estimates is mainly driven by the overall number of patients (distributed across the hospitals) rather than the number of hospitals . This is also illustrated in Figure 3 which shows the convergence of the sampling distributions for the hospital and patient variance components as a function of with fixed . Finally, as suggested in Section 2.4, in Figure 4 we demonstrate that the random effect variance estimates a different quantity than the causal between-hospital variance in the decomposition; the difference is marked especially with small number of hospitals.
4.2.2 Binary Outcomes
Figure 5 shows the simulated sampling distribution means for two of the three variance components under both fixed and random effect models for the binary outcomes under different combinations of and , based on 1000 replications. The results are mostly similar to the continuous case, with the random effect model giving slightly smaller estimates for the hospital variance component compared to the fixed effect model. However, some small sample bias in the hospital variance component (fixed effect model overestimating and random effect model underestimating) is visible with small number of patients per hospital (top-right panel, on average patients per hospital), which disappears with increasing number of patients. The convergence of the sampling distributions with increasing is illustrated in the density plots of Figure 6.
5 Illustration in real data
We demonstrate the use of the proposed three-way causal decomposition in the context of QIs for surgical care of kidney cancer in the province of Ontario. We investigate four processes/outcomes that have previously been proposed as disease-specific QIs using the consensus/Delphi method by a panel of experts [2]. These are the proportion of partial (versus radical) nephrectomies among stage T1a nephrectomy patients (Partial), the same proportion restricted to the subpopulation of patients with chronic kidney disease (CKD) or its risk factors diabetes or hypertension (Partial-CKD), minimally invasive (versus open) surgery among T1-T2 radical nephrectomy patients (MIS), and readmission within 30 days of the surgery for T1-T4 radical nephrectomy patients (Readmission). The Partial, Partial-CKD and MIS indicators are process type, with higher proportion reflecting quality of care through kidney function conserving and less invasive treatment for early stage patients that can benefit from this. The Readmission indicator is outcome type, with lower proportion reflecting quality of care through reduced complications. Our objective in the illustration was to evaluate the processes/outcomes using the proposed variance decomposition approach, to quantify how much of the variation in these is between-hospital (reflecting performance and potentially intervenable), and how much is driven by the patient case-mix. The data were obtained from cross-linkage of Cancer registry, pathology reports, Hospital Discharge Abstract (DAD), Ontario Health Insurance Plan (OHIP) and other databases housed at the Institute for Clinical Evaluative Sciences (ICES). We included patients who had complete data on the case-mix factors age, sex, income quintile, Charlson comorbidity score, ACG comorbidity score, presence of CKD risk factors (for indicators other than Partial-CKD), days from diagnosis to treatment, year of diagnosis, tumor size, and T stage. Table 1 shows the number of patients identified and hospitals evaluated for each of the four indicators in the study period from 1995-2014.
To these data, using the R package VGAM [27], we fitted a multinomial logistic assignment models where, to deal with empty covariate categories for small volume hospitals, we estimated only intercept terms for hospitals that had treated less than 35 (40 for MIS) patients over the study period. This is justifiable as our estimators use the assignment model only for weighting the hospitals by their patient volumes, whereas the case-mix adjustment is achieved through the outcome model. For the processes/outcomes, we fitted mixed effects logistic models using the glmer function of R package lme4 [26], adjusting for the aforementioned covariates. The covariate effects (log-odds ratios) and their z-scores are reported in Table 1, along with the likelihood ratio test values for the random intercept variance parameter in the models (one degree of freedom). In all cases the likelihood ratio test indicated statistically significant between hospital variation, but this was much smaller for Readmission. Larger tumor size was a very strong predictor of radical nephrectomy, while both partial nephrectomies and minimally invasive surgeries had become significantly more common over calendar time. The comorbidity scores were strongest predictors of readmission.
The empirical proportions of the four indicators and the corresponding marginal variances were 44.7% and 0.447*(1-0.447)=0.247 (Partial), 43.8% and 0.246 (Partial-CKD), 63.7% and 0.231 (MIS), and 7.6% and 0.07 (Readmission). Table 2 shows the estimated three variance components using the proposed decomposition, along with the 95% credible intervals using the approximate Bayesian method described in Section 3. We also show the proportional variances along with their credible intervals, because these are more informative for comparing indicators with very different marginal variance. We note that the estimated components sum up to the empirical marginal variance given by the binomial calculation. The Partial indicator shows some between-hospital variation, but this is small compared to the case-mix variance. This does not change much when this indicator is restricted to the CKD risk factors subpopulation, because tumor size is the major driver of the large case-mix variance. These results suggest that even though the indicator was restricted to T4a disease, the treatment choice seems to be often determined by disease progression. However, because partial proportion has seen significant increase over calendar time, the case-mix variance will be smaller when such indicator is applied to contemporary data. In contrast, the MIS indicator shows between-hospital variance that is almost two times the case-mix variance, implying lot of variation in care for similar patients. This may be due to specialization or facilities available, which are intervenable. Because the calendar year was the biggest driver of the case-mix variance for MIS, the case-mix variance will be even smaller when indicator is applied to contemporary data. Finally, the Readmission indicator shows very little between-hospital variation, suggesting it is unlikely to be useful in capturing performance related variation.
6 Discussion
In this paper, we aimed to give a causal interpretation to variance decompositions used to quantify variations in hospital performance, and propose estimators that can accommodate different link functions, as well as fixed and random effect models for the hospital effects. The estimation of the hospital variance component worked reasonably well with the number of hospitals as small as 10. While the random effect models could introduce some bias to estimation of the variance components when the hospital-level effects and the patient case-mix are dependent [28], they are helpful for smoothing purposes when a large number of small volume hospitals are present. Fixed effect models can accommodate interactions between case-mix factors and hospital effects, but this leads to a large number of parameters to be estimated [25].
Addressing the variance decompositions in an explicit causal modeling framework helps to answer questions regarding the part of the variation in care received that is explained by different practices for similar patients. Furthermore, quantifying explained confounding due to observed case-mix factors into its own variance component helps to assess the usefulness of a given process or outcome in constructing a quality indicator. Also, we note that, similar to random effect meta-analysis, the estimation of the between-hospital variance component is weighted roughly in proportion to the hospital-specific patient volumes, meaning that the unstable estimates of small hospitals do not have undue influence on the estimation of this variance component.
Several extensions are possible. We split the total variance into that explained by hospital performance, case-mix, and unexplained residual variance. However, we did not separate between case-mix factors that capture disease progression and thus directly (and justifiably) influence the treatment decisions, and those capturing disparities in care, such as sociodemographic and socioeconomic factors. While the latter can be confounders under the causal model, any variation in care received that is explained by such factors may be of interest in itself, and thus could be split into a separate variance component. Of particular interest might be interactions between hospital effects and sociodemographic/economic factors, which would imply that disparities in care occur within hospitals.
In this paper we considered only a single layer of exposure hierarchy, such as hospitals. We are currently working on extending the causal variance decompositions to multiple layers of a hierarchy, such as surgeons within hospitals or referral networks within administrative subregions. This will require introduction of multiple levels of exposure, using notation similar to causal mediation analysis models [29], where the causal pathway leads, for example, from hospital to surgeon to outcome. This will enable us to further decompose the hospital effects to those due to institution level factors and those resulting from practitioner level differences. However, since the models with nested exposures are made identifiable through random effects, this will require further consideration of the role of random effects in causal models.
Funding
This work was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (to OS), a Catalyst Grant in Health Services and Economics Research from the Canadian Institutes of Health Research (to AF, KAL and OS) and the Ontario Institute for Cancer Research through funding provided by the Government of Ontario (to BC).
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Donabedian A. The quality of care. How can it be assessed? JAMA 1988; 260(12): 1743–1748.
- 2[2] Wood L, Bjarnason GA, Black PC et al. Using the Delphi technique to improve clinical outcomes through the development of quality indicators in renal cell carcinoma. J Oncol Pract 2013; 9(5): e 262–267.
- 3[3] Khuri SF, Daley J, Henderson W et al. The Department of Veterans Affairs’ NSQIP: the first national, validated, outcome-based, risk-adjusted, and peer-controlled program for the measurement and enhancement of the quality of surgical care. National VA Surgical Quality Improvement Program. Ann Surg 1998; 228: 491–507.
- 4[4] Varewyck M, Goetghebeur E, Eriksson M et al. On shrinkage and model extrapolation in the evaluation of clinical center performance. Biostatistics 2014; 15(4): 651–664.
- 5[5] van Houwelingen HC and Brand R. Empirical bayes methods for monitoring health care quality. In Bulletin of the ISI, ISI 99, Book 1 . pp. 75–78.
- 6[6] van Dishoeck AM, Lingsma HF, Mackenbach JP et al. Random variation and rankability of hospitals using outcome indicators. BMJ Quality and Safety 2011; 20(10): 869–874.
- 7[7] Henneman D, van Bommel ACM, Snijders A et al. Ranking and rankability of hospital postoperative mortality rates in colorectal cancer surgery. Annals of Surgery 2014; 259(5): 844–849.
- 8[8] Racz MJ and Sedransk J. Bayesian and frequentist methods for provider profiling using risk-adjusted assessments of medical outcomes. Journal of the American Statistical Association 2010; 105: 48–58.
