SurvBench: A Standardised Preprocessing Pipeline for Multi-Modal Electronic Health Record Survival Analysis

Munib Mesinovic; Tingting Zhu

arXiv:2511.11935·cs.LG·May 13, 2026

TL;DR

SurvBench is an open-source, YAML-configurable preprocessing pipeline for multi-modal EHR survival analysis. It standardises data preparation across four critical-care databases so that deep-learning survival models can be compared on the same footing.

Contribution

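The paper introduces a YAML-configurable pipeline that converts raw PhysioNet exports into model-ready tensors. It covers four input modalities (time-series vitals and laboratory values, static demographics, ICD codes, and radiology report embeddings) and both single-risk and competing-risks endpoints, so that preprocessing is consistent across survival analysis studies.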

Findings

01. Supports four critical-care databases (MIMIC-IV, eICU, MC-MED, HiRID) and four input modalities.

02. Handles single-risk and competing-risks endpoints with proper censoring.

03. Facilitates external validation across datasets.
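To make the competing-risks finding concrete, here is a minimal sketch of how a single patient's outcome might be encoded as a (duration, event code) pair with right-censoring. It is not SurvBench's API: the function name `encode_competing_risks`, the cause names, and the convention that code 0 means "censored" are all assumptions chosen for illustration.

```python
from typing import Mapping, Optional, Sequence, Tuple

CENSORED = 0  # assumed convention: 0 = no event observed during follow-up


def encode_competing_risks(
    cause_times: Mapping[str, Optional[float]],
    causes: Sequence[str],
    follow_up_end: float,
) -> Tuple[float, int]:
    """Return (duration, event_code) for one patient.

    cause_times maps each cause to the time (hours since admission) at which
    it occurred, or None if it never occurred. event_code is 0 when the
    patient is censored, otherwise the 1-based index of the cause in `causes`.
    """
    # Keep only causes that actually occurred within the follow-up window.
    observed = [
        (cause_times[c], i + 1)
        for i, c in enumerate(causes)
        if cause_times.get(c) is not None and cause_times[c] <= follow_up_end
    ]
    if not observed:
        # No event seen: the patient is right-censored at the end of follow-up.
        return follow_up_end, CENSORED
    # The earliest event defines the outcome; ties go to the lower code.
    return min(observed)


# Hypothetical cause names, used only for this example.
causes = ["death", "discharge"]
print(encode_competing_risks({"death": 30.0, "discharge": 12.0}, causes, 48.0))
print(encode_competing_risks({"death": None, "discharge": 72.0}, causes, 48.0))
```

A single-risk endpoint is the special case where `causes` has one entry, so the event code reduces to a 0/1 indicator.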

Abstract

Deep-learning survival models for electronic health record (EHR) data are hard to compare across papers because the upstream preprocessing step, which includes cohort definition, time discretisation, missingness handling, and censoring rules, is typically undocumented and inconsistent. A reported difference in concordance between two mortality models can therefore reflect any of these choices rather than a modelling contribution. We present SurvBench, an open-source preprocessing pipeline that converts raw PhysioNet exports into model-ready tensors for survival analysis. SurvBench covers four critical-care databases (MIMIC-IV, eICU, MC-MED, HiRID) and four input modalities: time-series vitals and laboratory values, static demographics, International Classification of Diseases (ICD) codes, and radiology report embeddings. Every preprocessing decision is controlled through YAML…
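The abstract names time discretisation and missingness handling as choices that vary silently between papers. The sketch below shows one possible combination, assuming fixed-width bins, per-bin averaging, forward filling, and a zero default before the first observation. It is not taken from SurvBench, and `discretise_series` and its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def discretise_series(
    observations: Sequence[Tuple[float, float]],
    n_bins: int,
    bin_hours: float = 1.0,
) -> Tuple[List[float], List[int]]:
    """Bin irregular (time_in_hours, value) pairs onto a regular grid.

    Returns (values, mask): values holds the per-bin mean, forward-filled
    across empty bins; mask is 1 where the bin had a real measurement.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, value in observations:
        idx = int(t // bin_hours)
        if 0 <= idx < n_bins:  # drop measurements outside the window
            sums[idx] += value
            counts[idx] += 1

    values: List[float] = []
    mask: List[int] = []
    last = None  # most recent observed bin mean, used for forward fill
    for s, c in zip(sums, counts):
        if c:
            last = s / c
        # Assumed default: 0.0 until the first observation arrives.
        values.append(last if last is not None else 0.0)
        mask.append(1 if c else 0)
    return values, mask


# Heart-rate readings at 0.2 h, 0.7 h and 2.5 h, binned into four 1-hour bins.
values, mask = discretise_series([(0.2, 80.0), (0.7, 84.0), (2.5, 90.0)], n_bins=4)
print(values, mask)
```

Changing any one of these assumptions (bin width, aggregation, fill rule) changes the tensor a model sees, which is why the abstract argues for exposing such choices in configuration.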

Code & Models

Repositories

munibmesinovic/SurvBench (GitHub)
