# External validation and logistic recalibration of POSSUM and P-POSSUM for predicting postoperative morbidity and mortality after elective hepatic resection

**Authors:** Niklas Bogovic, Ann-Kathrin Fischer, Miklos Acs, Philipp Kreiner, Hans J. Schlitt, Markus Götz, Stefanie Hofmarksrichter, Paul Kupke, Stefan M. Brunner

PMC · DOI: 10.1186/s12893-026-03508-9 · BMC Surgery · 2026-01-26

## TL;DR

This study validates and recalibrates the POSSUM and P-POSSUM models to predict postoperative risks after liver surgery, showing they can be useful if locally adjusted.

## Contribution

The study provides external validation and recalibration of POSSUM and P-POSSUM for elective hepatic resection, demonstrating their clinical utility after local adjustment.

## Key findings

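- POSSUM showed fair discrimination for morbidity (AUC 0.696 for clinically relevant and 0.697 for major morbidity), and P-POSSUM showed higher discrimination for in-hospital mortality (AUC 0.755).
- Out-of-bag bootstrap validation (1,000 resamples) gave calibration slopes below 1 for all endpoints (β 0.837–0.907), indicating imperfect calibration and the need for local model updating.
- In decision curve analysis, the recalibrated models suggested a higher net benefit than treat-all or treat-none strategies across clinically relevant threshold probabilities.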

## Abstract

Accurate preoperative risk assessment remains critical in hepatobiliary surgery. Established prediction models, such as POSSUM and P-POSSUM, have shown variable performance when applied to specialized procedures. This study externally validated and recalibrated both models to predict postoperative morbidity and mortality after elective hepatic resection.

All consecutive adult patients who underwent elective hepatic resection at the University Hospital Regensburg between December 2020 and December 2023 were retrospectively analyzed. POSSUM and P-POSSUM scores were calculated using the original logistic equations. Major morbidity (Clavien–Dindo ≥ IIIa) and in-hospital mortality were the predefined outcomes. Model discrimination was assessed using the area under the receiver operating characteristic curve (AUC), and calibration was evaluated using the Brier score, calibration-in-the-large (intercept), calibration slope, and out-of-bag (OOB) calibration plots derived from 1,000 bootstrap resamples. Logistic recalibration was applied to adjust the model intercepts (α) and slopes (β). The clinical utility was evaluated using decision curve analysis.
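The recalibration and performance measures named in this paragraph can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' code: it assumes that logistic recalibration means refitting logit(p_recal) = α + β · logit(p_orig) by maximum likelihood, and all function names (`fit_recalibration`, `recalibrate`, `brier_score`, `auc`) are hypothetical.

```python
import math

def logit(p, eps=1e-9):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def expit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_recalibration(p_orig, y, max_iter=100, tol=1e-10):
    """Fit logit(P(y=1)) = alpha + beta * logit(p_orig) by Newton-Raphson."""
    x = [logit(p) for p in p_orig]
    alpha, beta = 0.0, 1.0  # start from "no recalibration needed"
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = expit(alpha + beta * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu            # score for the intercept
            g1 += (yi - mu) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            raise ValueError("singular information matrix (separable or constant data?)")
        d_alpha = (h11 * g0 - h01 * g1) / det
        d_beta = (h00 * g1 - h01 * g0) / det
        alpha += d_alpha
        beta += d_beta
        if abs(d_alpha) + abs(d_beta) < tol:
            break
    return alpha, beta

def recalibrate(p_orig, alpha, beta):
    """Apply the fitted intercept and slope to the original predictions."""
    return [expit(alpha + beta * logit(p)) for p in p_orig]

def brier_score(p, y):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((pi - yi) ** 2 for pi, yi in zip(p, y)) / len(y)

def auc(p, y):
    """Probability that a random event outranks a random non-event (ties count 0.5)."""
    pos = [pi for pi, yi in zip(p, y) if yi == 1]
    neg = [pi for pi, yi in zip(p, y) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

Because the recalibration is a monotone transform of the original risk whenever β > 0, it changes calibration and the Brier score but leaves the AUC unchanged.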

Of the 200 elective hepatectomies assessed, six were excluded due to missing required physiological inputs, yielding 194 patients with computable predictions. Clinically relevant morbidity (Clavien–Dindo ≥ II) occurred in 146/194 (75.3%) patients, major morbidity (≥ IIIa) in 73/194 (37.6%), and in-hospital mortality in 15/194 (7.7%). Discrimination was fair for morbidity and higher for mortality: AUC 0.696 (95% CI 0.595–0.789) for clinically relevant morbidity, AUC 0.697 (95% CI 0.620–0.764) for major morbidity, and AUC 0.755 (95% CI 0.647–0.851) for in-hospital mortality. OOB bootstrap calibration showed slopes below 1 for all endpoints (clinically relevant morbidity: α 0.16, β 0.837, Brier 0.172; major morbidity: α −0.051, β 0.907, Brier 0.215; mortality: α −0.34, β 0.843, Brier 0.068), supporting the need for local model updating.
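The out-of-bag (OOB) scheme behind these calibration estimates can be illustrated with a small resampling skeleton. This is a sketch under stated assumptions: the summary does not say how OOB results were aggregated, so simple averaging across resamples is assumed here, and `oob_bootstrap`, `oob_estimate`, `fit`, and `evaluate` are hypothetical names.

```python
import random

def oob_bootstrap(n, n_resamples=1000, seed=0):
    """Yield (in_bag, out_of_bag) index lists for each bootstrap resample."""
    rng = random.Random(seed)
    for _ in range(n_resamples):
        # Draw n patient indices with replacement.
        in_bag = [rng.randrange(n) for _ in range(n)]
        drawn = set(in_bag)
        # Patients never drawn form the out-of-bag evaluation set.
        out_of_bag = [i for i in range(n) if i not in drawn]
        yield in_bag, out_of_bag

def oob_estimate(n, fit, evaluate, n_resamples=1000, seed=0):
    """Average an OOB performance measure over bootstrap resamples.

    fit(in_bag) returns a model fitted on the in-bag indices;
    evaluate(model, out_of_bag) returns a number such as a calibration slope.
    Averaging is an assumption of this sketch.
    """
    values = []
    for in_bag, out_of_bag in oob_bootstrap(n, n_resamples, seed):
        if not out_of_bag:
            continue  # extremely unlikely for realistic n, but guard anyway
        model = fit(in_bag)
        values.append(evaluate(model, out_of_bag))
    return sum(values) / len(values)
```

In this setup, `fit` would refit the recalibration intercept and slope on the in-bag patients, and `evaluate` would compute the calibration slope or Brier score on the patients left out of that resample.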

POSSUM and P-POSSUM can support perioperative risk prediction after hepatic resection when they are locally recalibrated and internally validated. Bootstrap-corrected recalibration yielded stable performance without evidence of overfitting, and decision curve analysis suggested clinical utility across relevant threshold probabilities. These findings support the use of POSSUM-based models in hepatobiliary surgery, provided that centers perform local validation and model updating before implementation in clinical decision-making.
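For readers unfamiliar with decision curve analysis, the quantity compared across threshold probabilities is the net benefit. The sketch below uses the standard definition (true positives minus false positives weighted by the odds of the threshold, both per patient); the function names are hypothetical and the example is not taken from the paper's analysis code.

```python
def net_benefit(p, y, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y)
    tp = sum(1 for pi, yi in zip(p, y) if pi >= threshold and yi == 1)
    fp = sum(1 for pi, yi in zip(p, y) if pi >= threshold and yi == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

def net_benefit_treat_all(y, threshold):
    """Net benefit of treating everyone regardless of predicted risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y) / len(y)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

def net_benefit_treat_none():
    """Treating nobody yields zero net benefit by definition."""
    return 0.0
```

A model is considered useful at a given threshold when its net benefit exceeds both reference strategies; plotting the three quantities over a range of thresholds gives the decision curve.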

The online version contains supplementary material available at 10.1186/s12893-026-03508-9.

• The POSSUM and P-POSSUM scores were externally validated in a contemporary cohort of 194 patients who underwent elective hepatic resection at the University Hospital Regensburg and the certified German Liver Center.

• POSSUM showed fair discrimination for morbidity outcomes (AUC 0.697 for major morbidity, Clavien–Dindo ≥ IIIa, and AUC 0.696 for clinically relevant morbidity, ≥ II), whereas P-POSSUM achieved higher discrimination for in-hospital mortality (AUC 0.755).

• Bootstrap out-of-bag validation (1,000 resamples) indicated imperfect calibration with optimism-corrected slopes < 1 (β 0.907, 0.837, and 0.843 for major morbidity, clinically relevant morbidity, and mortality, respectively), supporting the need for local model updating.

• Decision curve analysis suggested a higher net benefit of the recalibrated models compared with “treat-all” and “treat-none” strategies across clinically relevant threshold probabilities, supporting their potential use for perioperative risk communication and institutional benchmarking.

• POSSUM-based risk models can be clinically useful in hepatobiliary surgery, provided that centers perform local validation and recalibration before implementation.


## Full-text entities

- **Genes:** PKLR (pyruvate kinase L/R) [NCBI Gene 5313] {aka CNSHA2, PK1, PKL, PKRL, RPK}
- **Diseases:** tumor (MESH:D009369), fibrosis (MESH:D005355), Stage Liver Disease (MESH:D058625), cirrhotic (MESH:D000094724), liver dysfunction (MESH:D017093), CD (MESH:D003424), frailty (MESH:D000073496), steatosis (MESH:D005234), complications (MESH:D008107), Deaths (MESH:D003643), COVID-19 (MESH:D000086382)
- **Chemicals:** urea (MESH:D014508), potassium (MESH:D011188)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12849102/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12849102/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC12849102/full.md

---
Source: https://tomesphere.com/paper/PMC12849102