# A Reproducible Post-Valve-Replacement EHR Cohort for Comparative AI Studies

**Authors:** Malte Blattmann, Mika Katalinic, Adrian Lindenmeyer, Stefan Franke, Thomas Neumuth, Daniel Schneider

PMC · DOI: 10.3390/diagnostics16030447 · 2026-02-01

## TL;DR

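This paper introduces a reproducible pipeline that builds an EHR cohort of 3890 valve replacement patients from MIMIC-IV, and uses ICU readmission prediction to benchmark sequential against non-sequential AI models of postoperative risk.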

## Contribution

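The contribution is a publicly released preprocessing and cohort-construction pipeline that derives two datasets from MIMIC-IV (a general-purpose dataset and a predictive benchmark) for longitudinal EHR analysis in post-valve replacement care. Reproducing the datasets requires MIMIC-IV access.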

## Key findings

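- ICU readmission stratified in-hospital and 100-day outcomes, including mortality, complications, and rehospitalization, which supports its use as a prediction target.
- The sequential Transformer achieved 0.87 AUROC and 0.69 AUPRC. Corrected resampled t-tests confirmed its improvement over the non-sequential Transformer; the comparison with XGBoost showed a favorable trend without conclusive evidence.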
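For readers less familiar with the two reported metrics, the following is a minimal sketch of how AUROC and AUPRC can be computed from binary labels and predicted scores. It is not the authors' evaluation code: the function names `auroc` and `average_precision` are illustrative, AUPRC is approximated here by average precision (an assumption, since the summary does not state which estimator the paper uses), and the toy labels and scores are made up.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of precision@rank taken at each positive.

    Simplification: tied scores are ordered arbitrarily rather than grouped.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Made-up toy data, not taken from the paper.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(auroc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

AUPRC is sensitive to class prevalence, so for a comparatively rare event such as ICU readmission it is usually read alongside AUROC rather than instead of it.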

## Abstract

Background/Objectives: Valve replacement (VR) patients are at high risk of postoperative complications, but reproducible Electronic Health Record (EHR) benchmarks for evaluating sequential AI models in this setting are lacking. We develop a reproducible pipeline that extracts two EHR datasets from MIMIC-IV (a general-purpose and a predictive benchmark dataset) capturing perioperative histories, high-resolution time-series, and clinically motivated outcome labels. Methods: The cohort comprises 3890 VR patients with clinician-guided feature selection across diagnoses, procedures, laboratory measurements, medications, and physiological monitoring. As an exemplary use case, we define ICU readmission at first ICU discharge as a surrogate for postoperative risk and derive a predictive benchmark under strict label-leakage control. We then compare a Transformer model trained on tokenized longitudinal EHR sequences with Transformer and XGBoost baselines trained on aggregated feature statistics, and assess performance differences using paired statistical tests across validation splits. Results: ICU readmission stratified in-hospital and 100-day outcomes, including mortality, complications, and rehospitalization, confirming the clinical relevance of the prediction target. The sequential Transformer achieved 0.87 AUROC and 0.69 AUPRC. Corrected resampled t-tests confirm improved performance over the non-sequential Transformer, while the comparison with XGBoost indicates a favorable trend without conclusive evidence. Conclusions: Our findings suggest that leveraging longitudinal EHR sequences yields higher predictive performance than static feature summaries for postoperative risk prediction. The publicly released preprocessing pipeline and cohort-construction code enable researchers with MIMIC-IV access to reproduce the datasets and provide a robust benchmark for developing and comparing time-series models in post-valve replacement care.
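The abstract compares models with corrected resampled t-tests across validation splits. A minimal sketch of that kind of test is shown below, assuming the common Nadeau and Bengio variance correction; the paper's exact configuration is not given in this summary. The function name `corrected_resampled_t`, the split sizes, and the per-split scores are hypothetical and serve only as illustration.

```python
import math
from statistics import mean, variance


def corrected_resampled_t(scores_a, scores_b, n_train, n_test):
    """Corrected resampled t-statistic for paired per-split scores.

    Assumes the Nadeau-Bengio correction: the variance of the per-split
    differences is scaled by (1/k + n_test/n_train) instead of 1/k, which
    accounts for the overlap between training sets across resampled splits.
    Returns the t-statistic and its degrees of freedom (k - 1).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models need one score per split")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    if k < 2:
        raise ValueError("need at least two splits")
    var = variance(diffs)  # sample variance (divides by k - 1)
    if var == 0:
        raise ValueError("zero variance in differences; statistic undefined")
    t_stat = mean(diffs) / math.sqrt((1 / k + n_test / n_train) * var)
    return t_stat, k - 1


# Hypothetical per-split AUROC values for two models (not results from the paper).
model_a = [0.80, 0.79, 0.81, 0.78, 0.82]
model_b = [0.77, 0.78, 0.78, 0.76, 0.79]
t_stat, dof = corrected_resampled_t(model_a, model_b, n_train=800, n_test=200)
print(t_stat, dof)
```

A two-sided p-value follows from a Student t distribution with `dof` degrees of freedom. Because the correction enlarges the variance term, the statistic is smaller than in a naive paired t-test, which is consistent with a comparison such as the one against XGBoost remaining inconclusive despite a favorable trend.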

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12896665/full.md

---
Source: https://tomesphere.com/paper/PMC12896665