Hospital variation in surgical outcomes for gastric cancer: the impact of case-mix and treatment across a global cohort

Sander J. M. van Hootegem; Margrietha van der Linde; Marcel A. Schneider; Jeesun Kim; Felix Berlth; Yutaka Sugita; Peter P. Grimminger; Gian Luca Baiocchi; Giovanni De Manzoni; Maria Bencivenga; Suzanne S. Gisbertz; Souya Nunobe; Han-Kwang Yang; Christian A. Gutschow; Sjoerd M. Lagarde; Hester F. Lingsma; Bas P. L. Wijnhoven

PMC · DOI:10.1007/s10120-025-01696-6·January 13, 2026

Hospital variation in surgical outcomes for gastric cancer: the impact of case-mix and treatment across a global cohort

Sander J. M. van Hootegem, Margrietha van der Linde, Marcel A. Schneider, Jeesun Kim, Felix Berlth, Yutaka Sugita, Peter P. Grimminger, Gian Luca Baiocchi, Giovanni De Manzoni, Maria Bencivenga, Suzanne S. Gisbertz, Souya Nunobe, Han-Kwang Yang, Christian A. Gutschow

PDF

Open Access

TL;DR

This study finds that differences in patient characteristics and treatments don't fully explain variations in surgical outcomes for gastric cancer across hospitals globally.

Contribution

The study evaluates the impact of case-mix and treatment adjustments on hospital-level surgical outcomes for gastric cancer using a global cohort.

Findings

01

Adjusting for case-mix and treatment factors reduced hospital variation for 30-day mortality but not for other outcomes.

02

Case-mix factors explained more variance in 30-day mortality and negative resection margins than in other outcomes.

03

Treatment-related factors had a minimal impact on most surgical outcomes after adjustment.

Abstract

There is substantial global variation in demographics, disease burden, and treatment for gastric cancer patients. Benchmarking is an instrument to assess such variation and enables to investigate to which extent case-mix and treatments explain differences in outcomes. We aimed to evaluate hospital-level variation in surgical outcomes following gastrectomy for gastric cancer before and after adjusting for case-mix and treatment-related factors. Data were retrieved from the GastroBenchmark and GASTRODATA databases, including consecutive gastric cancer resections performed between 2017 and 2021 from 43 centers. Patients who underwent a (sub)total gastrectomy for adenocarcinoma were identified. Outcomes included 30-day mortality, severe complications (Clavien-Dindo grade ≥ 3a), > 15 lymph nodes retrieved, negative resection margin (R0), prolonged hospitalization (> 14 days), readmissions…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Homo sapiens(human · species)

Diseases2

gastric cancer adenocarcinoma

Figures3

Click any figure to enlarge with its caption.

The proportion of explained variance by each of the models according to the conditional pseudo-R^2^. A score of 1 is equal to 100% variance explained

Keywords

Stomach neoplasmsGastrectomyTreatment outcomeQuality of health care

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsGastric Cancer Management and Outcomes · Helicobacter pylori-related gastroenterology studies · Bariatric Surgery and Outcomes

Full text

Introduction

Gastric cancer (GC) ranks as the 5th most prevalent cancer and the 3rd leading cause of cancer-related mortality worldwide [1]. Surgery, with or without perioperative chemotherapy, remains the cornerstone of curative treatment. However, the complexity of a gastrectomy requires training and specialized skills, reflected by a learning curve and the association between case volume and surgical outcomes [2, 3]. Internationally, there is an increasing demand for monitoring and benchmarking the quality of surgical care at the hospital level. Various stakeholders, including government agencies, insurance companies, medical specialist associations, and patient organisations seek transparency on quality of care, in particular for low-volume and complex procedures [4–7]. In response, several projects have been initiated over the years to provide benchmark data for surgical cancer care [4, 6, 7]. Within this context, the GastroBenchmark and GASTRODATA consortiums were established, providing reference data for oncological gastrectomies derived from 43 centers in sixteen countries [8].

To serve its purpose, benchmark data should satisfy several conditions. One of these is that they should contain a large case-mix (i.e. patient and disease characteristics), reflecting a range of case complexity to represent real-life practice and enable unbiased evaluation [9]. For gastric cancer, it is well known that there are marked differences in demographics, disease burden and treatments between health care providers and regions [10, 11]. These differences may explain variation in surgical outcomes across hospitals, rather than solely being determined by the quality of surgical and perioperative care. Neglecting case-mix and treatment differences could lead to misleading or uninformative comparisons, defeating the purpose of benchmarking. However, the impact of case-mix and treatment approaches when comparing gastrectomy outcomes on a global scale remains unclear. Therefore, to assess the impact of these characteristics, we aimed to study the variation of surgical outcomes, both before and after adjusting for case-mix and treatment-related factors in a global cohort.

Methods

Patients and ethics

Patients were identified from the dataset collected as part of the GastroBenchmark and GASTRODATA collaborative, which includes data of 43 centers from sixteen countries, comprising consecutive patients who underwent an oncological gastric resection between 1st January 2017 and 31st December 2021 (Supplement 1). Further details on data collection have been previously described [8]. All patients with gastric adenocarcinoma who underwent an elective subtotal or total gastrectomy were eligible for inclusion. Patients in whom no Roux-en-Y or Billroth reconstruction was created, or no lymphadenectomy was performed were excluded.

The study committees of both consortiums approved the present study and provided the data for this analysis. Before the start of this study ethical approval was obtained in the Erasmus MC (MEC-2023-0696) and upon initiation of the database, approval was obtained in all participating centers.

Outcomes, case-mix and treatment characteristics

Outcomes assessed were > 15 lymph nodes retrieved; negative resection margin (R0); severe complications (Clavien-Dindo ≥ 3a); re-operations; escalation of care; >14 days of hospital admission; readmission within 30 days and 30-day mortality. Escalation of care was defined as unplanned readmission to a higher surveillance unit.

Based on expert opinion, literature and data availability, the following factors were included as case-mix factors: age, gender, body mass index, American Society of Anesthesiologists (ASA) score, number of comorbidities (none, one, multiple), previous abdominal surgery, tumor location and T- and N-category. Treatment-related factors included administration of neoadjuvant treatment, extent of lymphadenectomy (D1(+), D2(+), D3), total or subtotal gastrectomy, type of access (open, minimally invasive and robot-assisted), and conversion.

Statistical analysis

Missing data of baseline characteristics were imputed using multiple imputation then deletion with chained equations (five iterations) [12]. This means that all outcomes were included in the imputation regression model and imputed where missing, and subsequently, patients with initially missing outcomes were excluded in further analyses for that particular outcome.

We used mixed-effect multivariable logistic models to estimate the probability of a given outcome for each individual patient. Due to differences in case load and guidelines which the hospitals are subject to, a random intercept for hospitals was used to account for variation not captured in the covariates (i.e. unmeasured inter-hospital variation and random chance). Hospitals logging ten or fewer patients were excluded to improve representativeness of the data and limit the chance of bias, while ensuring statistical stability of the models Nonlinearity of continuous variables was addressed with restricted cubic splines (3 knots). Three models were used per outcome to assess variation: a crude model (random intercept for hospital only), a case-mix adjusted model, and a final model adjusted for case-mix and treatment-related factors. Models were tested with specific comorbidities (e.g. cardiovascular disease, pulmonary disease, endocrinological including diabetes, renal insufficiency and immunological disorders), but as this did not improve the Akaike Information Criterion (AIC) compared to simpler models using the number of comorbidities, we opted for the latter to preserve degrees of freedom without compromising model performance.

Observed variation between hospitals was displayed by calculating medians, range and interquartile range (IQR) on a hospital level. Crude and adjusted estimated probabilities with 95% confidence intervals of outcomes per hospital were presented in forest plots. The conditional pseudo-R^2^ was used to quantify the explained variance in outcomes by the models [13]. This measure reflects the explained variance in outcomes of the models by the fixed effects (i.e. the included variables) and random intercept, with 1 being the highest achievable score, representing 100% explanation of the variance. The marginal pseudo-R^2^ was calculated, reflecting the variance explained by the fixed effects alone, to assess the proportion of variance explained by the included variables in each model [13].

Meaningful inter-hospital comparison of outcomes rely on sufficient overlap in case-mix characteristics, such that patient populations treated in different hospitals have a shared range of of these variables [14]. For instance, if a specific category of a case-mix variable is absent in one center, the corresponding regression coefficient cannot be estimated. We therefore evaluated the case-mix balances across centers by estimating the probability of treatment in each center using logistic regression models, with hospital as the outcome and all case-mix variables as predictors. This led to sensitivity analyses in which we excluded patients from the two large Asian centers given their large case load (> 1000 patients per center) and distinct case-mix characteristics. Statistical analysis was performed in R version 4.3.2 (R Core Team, R Foundation for Statistical Computing, Boston, MA, USA) using the ‘lme4’, ‘rms’, ‘mice’, ‘mitools’ and ‘MuMin’ packages.

Results

A number of 9662 patients were identified in the database, of which 7829 patients were eligible for inclusion. Two of the 43 hospitals had a case load of less than ten patients, resulting in a total of 7818 patients from 41 hospitals included in the analysis (Supplementary Fig. 1). The total case load per center ranged from 12 to 2554 patients, with a median of 81 patients per hospital (IQR: 49–146).

There was large variability in case-mix characteristics and treatment across hospitals (Table 1). The mean age of patients was 65 years, with a range from 55 to 76.5 years (IQR: 5) per hospital and 26.5–57.9% (IQR: 9.6%) being female. The percentage of patients with an ASA 1–2 score varied per hospital from 3.5 to 100% (IQR: 26.8%), while the percentage of patients with multiple comorbidities ranged from 10.3 to 28.9% (IQR: 18.6%). The proportion of patients with cT3-4 tumors ranged from 53.1 to 70.3% (IQR: 17.2%), and those with clinical positive nodal disease (cN1-3) from 36.8 to 58.9% (IQR: 22.1%).

Table 1. Case-mix and treatment characteristicsVariableTotalPer hospitalMissingn (%)Median (range)Q1Q3IQRPatients/resections, n7818 (100%)81 (12–2554)49146970Age, years [mean (SD)]65.0 (12.4)67 (55–76.5)657059Female (vs. male)2842 (36.4%)37.8% (26.5–57.9)32.4%41.9%9.6%4ASA score 1–2 (vs. 3–4)5727 (73.3%)63.4% (3.45–100)50.7%77.4%26.8%464BMI, kg/m^2^ [mean (SD)]24.7 (4.52)24.9 (22.5–29)24.225.81.61359Previous thoracic or abdominal surgery No/minor5757 (73.6%)91.7% (0–97.8)84.4%93.0%8.6%1651 Medium/major410 (5.2%)8.2% (0–21.1)6.2%11.6%5.4%Comorbidities Cardiovascular624 (8%)7.4% (0–51.6)4.9%14.2%9.2%0 Respiratory135 (1.7%)0% (0–18.8)0.0%0.0%0.0% Renal37 (0.5%)0% (0–4.2)0.0%0.0%0.0% Endocrinological596 (7.6%)6.4% (0–37.5)3.1%8.7%5.6% Oncological372 (4.8%)3.9% (0–37.9)0.8%7.7%6.9% Multiple1347 (17.2%)19.8% (0–64)10.3%28.9%18.7%Number of comorbidities None4515 (57.8%)51.5% (14–80.0)37.8%62.1%24.3%0 One1956 (25%)25.4% (2.9–67.7)20.4%36.4%16.0% Multiple1347 (17.2%)19.8% (0–64)10.3%28.9%18.7%Tumor location Cardia and EGJ695 (8.9%)12.4% (0–58.3)3.7%19.5%15.8%233 Corpus and fundus3422 (43.8%)38.2% (18.2–70.6)29.9%49.6%19.8% Antrum and pylorus3039 (38.9%)40.7% (0–77.8)30.9%50.8%19.9% Whole stomach429 (5.5%0.4% (0–14.1)0.0%2.9%2.9%Tumor category cT1–21978 (25.3%)28.2% (0–64.6)22.5%37.5%15.1%2644 cT3-42972 (38.0%61.7% (0–82)53.1%70.3%17.2% cTx224 (2.9%)3.1% (0–17.7)0.0%10.2%10.2%Nodal category cN02506 (32.1%)37.6% (0–77.4)30.1%53.1%23.0%2640 cN1–32275 (29.1%)48.4% (0–83.3)36.8%58.9%22.1% cNx397 (5.1%)3.9% (0–59.8)0.0%14.7%14.7%Neoadjuvant chemotherapy1983 (25.4%)45.2% (4.3–96)31.3%66.7%35.3%737Total gastrectomy (vs. subtotal)2924 (37.4%)48.0% (22.3–91.7)35.4%62.0%26.6%0Open surgery (vs. MI)3402 (43.5%)74.2% (12.9–100)39.0%92.4%53.4%0Multivisceral resection1448 (18.5%)15.9% (0–96.6)10.9%25.0%14.1%0Lymph node dissection D1 (+)2578 (33.0%)15.5% (0–95.2)4.1%29.5%25.4%0 D2 (+)5194 (66.4%)84.2% (4.8–100)66.9%94.1%26.0% D346 (0.6%)0% (0–8.1)0.0%0.0%0.0%ASA: American society of anesthesiologists, SD: Standard deviation, EGJ: Esophagogastric junction, MI: Minimally invasive (either laparoscopic or robotic)

Neoadjuvant chemotherapy was administered in 4.3% up to 96% (IQR: 35.3%) of the patients across the hospitals and the rate of patients undergoing a total gastrectomy (versus subtotal) ranged from 22.3 to 91.7% (IQR: 26.6%). The proportion of patients undergoing a D2 lymph node dissection ranged per hospital from 4.8 to 100% of the patients (IQR: 26.0%), with a D1(+) lymph node dissection being performed in 0–95.2% of patients (IQR: 25.4%).

Hospital variation

The overall 30-day mortality rate was 1.2% and rate of severe complications (≥ CD 3a) was 12.1%, with substantial variation in surgical outcomes across hospitals (Table 2). Observed 30-day mortality and severe complications ranged from 0 to 9.7% (IQR: 3.2%) and 5.3–17.9% (IQR: 7.7%). Escalation of care was seen in 4.4–10.4% (IQR: 7.7%). Larger variation between hospitals was observed for retrieval of > 15 lymph nodes (IQR: 12.3%, range 33.3–100%), prolonged hospitalization (IQR: 14.4%, range 0−66.7%) and readmissions (IQR 11.3%, range 0–25%).

Table 2. Surgical outcomesVariableDenominatorTotalPer hospitalMissing%Median (%) (range)Q1 (%)Q3 (%)IQR (%)> 15 LN retrieved736288.191.5 (33.3–100)81.994.112.3456Negative margin (R0)781795.793.8 (74.1–100)90.691.06.41Severe complications (≥ CD 3a)781812.114.5 (5.3–31)10.317.97.70Escalation of care^a^75464.28.1 (0–17.8)4.410.46.0272Reoperation^b^52644.89.4 (0–27.6)5.512.97.52554Proportion > 14 DoH756219.221.0 (0–66.7)14.%26.914.4256Readmission (30-day)49905.611.8 (0–25)4.615.911.32828Mortality (30-day)78181.21.94 (0–9.7)0.03.23.20LN: Lymph nodes, DoH: Days of hospitalization, CD: Clavien-DindoValues are percentages, unless otherwise indicated.^a^ Unplanned readmission to a higher surveillance unit (either intermediate- or intensive care unit).^b^ Surgical intervention under general anesthesia.

The crude models reduced hospital variation. Adjusting for case-mix and treatment-related factors did not result in statistically significant changes in the hospital-specific estimated probabilities for any outcome, compared to the crude estimates (Figs. 1 and 2 and Supplementary Figs. 2–7).

Fig. 1A, B Estimated probabilities of negative resection margins per hospital (A) Crude vs. adjustment for case-mix (B) Crude vs. adjustment for case-mix and treatment-related factors (n = 7817)

Fig. 2A, B Estimated probabilities of 30-day mortality per hospital (A) Crude vs. adjustment for case-mix (B) Crude vs. adjustment for case-mix and treatment-related factors (n = 7818)

Explained variance - Pseudo-R2

Case-mix factors explained less than 10% of the variance for all outcomes, except for 30-day mortality (33.9%) and negative resection margins (31.7%) (Fig. 3 and supplementary Table 1) as determined by the marginal pseudo-R^2^. Adding treatment-related factors to the models increased the explained variance for 30-day mortality by 40.8% and for readmissions by 20.9%, but had a low to modest impact (< 10%) on the variance of other surgical outcomes. The only model that explained more than half of the variance was the final model for 30-day mortality, including both case-mix and treatment factors. In all other models, the majority of variance remained unexplained.

The probability of treatment in each hospital based on case-mix characteristics was different for the two large Asian centers compared to the other hospitals (Supplementary Fig. 8). However, in the consequently performed sensitivity analyses, excluding patients from the two large Asian centers did not alter findings, except for 30-day mortality (Supplementary Table 2). The marginal pseudo-R^2^ for case-mix factors increased to 47.4% (increase of 13.5%), but decreased for treatment factors to 18.3% (decrease of 22.5%).

Fig. 3. The proportion of explained variance by each of the models according to the conditional pseudo-R^2^. A score of 1 is equal to 100% variance explained

Discussion

In this study, we evaluated the impact of case-mix and treatment-related factors on variation in surgical outcomes following an oncological gastrectomy across 41 hospitals. There was substantial variation in surgical outcomes and adjustment for case-mix and treatment-related factors did not change the estimated hospital outcomes. Case-mix characteristics contributed to the explained variance for 30-day mortality and negative resection margins, but its contribution was low to moderate for other surgical outcomes and for the majority of the outcomes the random intercept contributed the most to the explained variance, indicating that some of the variation reflects statistical uncertainty. These results indicate that case-mix and treatment-related factors explain variation in outcomes to a limited extent, showing where risk adjustment is applicable and where caution is warranted.

Except for the model for 30-day mortality, the models including the case-mix and treatment factors performed poorly for the other outcomes. While the most comprehensive model explained most of the observed variance in 30-day mortality (81.2%), it did not significantly change estimated probabilities for any hospital, likely due to case-mix characteristics balancing each other out within each hospital. Moreover, this reflects a limitation of interpreting model fit statistics as proxies for real-world benchmarking impact. These findings align with a recent study that developed case-mix adjustment models for gastric cancer outcomes for the national Upper-GI registry in the Netherlands (DUCA) [15]. They reported that case-mix had limited effect on inter-hospital differences in outcomes besides 30-day mortality, implicating that extensive case-mix adjustment might unnecessarily complicate comparisons without improving validity for the other outcomes. Taken together, these results suggest that case-mix and treatment adjustment can improve the validity of comparisons for 30-day mortality, but is of limited use for comparing other investigated outcomes.

Unmeasured factors and statistical uncertainty probably contribute to a large extent in the observed variation between hospitals, underlined by the proportion of variance explained by the random intercept for the majority of the outcomes. Surgical proficiency, technical modifications and hospital-specific organization, such as the presence of specialized multidisciplinary teams and clinical care pathways, are often not documented but have a great impact on surgical outcomes. Technical proficiency of a surgeon has been associated with fewer complications, reoperations, readmissions and lower mortality for several surgical procedures, including other upper-GI procedures such as esophagectomy and bariatric surgery [16–18]. In 2016, a questionnaire demonstrated there are substantial differences in clinical pathways for oesophagogastric surgery across European centers [19]. The role and functionality of multidisciplinary teams, the support of clinical nurse specialists and the possibility of nutritional intervention by dietitians varied strongly across the 10 investigated countries, often due to resource constraints. The benefit of these clinical pathways has been established previously. A Cochrane meta-analysis of over 11.000 patients treated in randomized clinical trials found that implementation of clinical pathways was associated with reduced length of hospital admission and lower in-hospital complication rates, reducing healthcare costs [20, 21]. Moreover, high adherence to care pathways, most notably the ERAS protocol following gastric cancer surgery, is associated with improved outcomes, including reduced morbidity, lower rates of readmissions and shorter hospital admission [22–24].

A second notion from our findings is that the investigated outcomes may be inappropriate to compare and rank hospitals as the median case load was only 81 patients with low event rates for several outcomes. Given the typical annual caseload for European centers being less than 100 patients, accurately estimating true outcomes becomes challenging due to a low number of events, which increases the susceptibility to random variation and registration bias [25, 26]. This undermines the reliability of an outcome indicator, resulting in poor ‘rankability’. Previous studies have shown that for multiple oncological resections, the reliability of ranking with surgical outcome indicators is poor and most of the observed variation is attributable to chance [27]. This may also hold for gastric cancer outcomes, but further research is needed to establish this. Composite measures such as textbook outcome and failure to cure, have been proposed to improve the rankability of hospitals [28, 29]. These measures combine multiple quality metrics which can result in better reliability for hospital comparisons, but they lack specific information on areas requiring improvement [30, 31]. Hence, the development of an optimal benchmarking framework remains pending.

To our knowledge, this is the first study to investigate the role of case-mix and treatment related factors in global hospital variation following an oncological gastrectomy. Our dataset allowed assessment of its impact on the variation in surgical outcomes across different continents. However, our study has several limitations. Despite using extensive models to estimate outcome probabilities per hospital, residual confounding is a potential issue, as not all case-mix factors influencing outcomes may have been measured or adjusted for. In particular, specific comorbidities can affect some of the investigated outcomes [32]. However, we assessed whether individual comorbidities improved model performance based on the AIC, but as this was not the case we opted to include the number of comorbidities as variable instead. Additionally, preoperative weight loss, serving as surrogate for a patients’ metabolic state, have been shown to influence 30-day mortality [32, 33]. However, data on nutritional status or weight loss was not included in the dataset. Furthermore, we only included patients from centers with dedicated upper-GI teams. This restriction may have influenced the results as these centers are often better equipped to handle complex cases and have more robust quality control measures, also illustrated by the low mortality and severe complication rates of 1.2% and 12.1%, respectively. This could limit the effect of case-mix on outcomes as these centers may adequately tailor treatment based on case-mix characteristics and are better able to mitigate surgical risks. On the other hand, these centers are more likely to manage a diverse range of patients, including highly complex cases. Lastly, we did have data on surgical methods used for type of reconstruction and technique of anastomosis, but approximately 60% was missing with the missing data pattern being non-random. Hence, multiple imputation was not feasible and these variables could not be included as treatment factor.

In conclusion, this study highlights that case mix and treatment-related factors do not explain variation observed between hospitals to a large extent. The low contribution of case-mix to explained variance, the substantial impact of unmeasured effects, and the absence of significant changes in the estimated probabilities after adjustment indicate that case-mix and treatment characteristics are not the primary drivers of hospital variation. However, case-mix and treatment characteristics do seem essential for outcomes directly related to survival and oncologic adequacy (i.e. 30-day mortality and R0 resection). Rather than a limitation, the limited explanatory power of case-mix factors for other outcomes highlights the transparency of benchmarking efforts—showing where risk adjustment is informative and where caution is warranted. Additionally, for global comparisons of gastric cancer outcomes, accounting for statistical uncertainty with a random intercept is recommended.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1