TL;DR
SurvBench is an open-source, configurable preprocessing pipeline for multi-modal EHR survival analysis, standardizing data preparation across multiple critical-care datasets to improve comparability of deep-learning models.
Contribution
The paper introduces a YAML-configurable preprocessing pipeline that covers four input modalities and both single-risk and competing-risks endpoints, so that survival-analysis studies can share one documented preprocessing procedure.
Findings
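Supports four critical-care databases (MIMIC-IV, eICU, MC-MED, HiRID) and four input modalities (time-series vitals and laboratory values, static demographics, ICD codes, and radiology report embeddings).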
Handles single-risk and competing-risks endpoints with proper censoring.
Facilitates external validation across datasets.
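To illustrate what handling single-risk and competing-risks endpoints with censoring can look like, here is a minimal sketch. It is not SurvBench's code: the `Stay` record, the `encode_label` function, the event names, and the convention that code 0 means "censored" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Stay:
    """Hypothetical record for one ICU stay (not a SurvBench type)."""
    los_hours: float               # observed follow-up length in hours
    event_times: Dict[str, float]  # event name -> hours since admission


def encode_label(stay: Stay, horizon_hours: float,
                 event_codes: Dict[str, int]) -> Tuple[float, int]:
    """Return (time, event); event 0 means censored, k > 0 means cause k.

    Follow-up ends at the earlier of the stay length and the prediction
    horizon. An event counts only if it occurs within that window.
    """
    end_of_follow_up = min(stay.los_hours, horizon_hours)
    # Keep only the events this endpoint definition cares about.
    candidates = [(t, event_codes[name])
                  for name, t in stay.event_times.items()
                  if name in event_codes]
    if candidates:
        t, code = min(candidates)  # earliest event of interest wins
        if t <= end_of_follow_up:
            return t, code
    # No event of interest inside the window: censor at end of follow-up.
    return end_of_follow_up, 0


stay = Stay(los_hours=100.0, event_times={"death": 50.0, "discharge": 80.0})
competing = encode_label(stay, 72.0, {"death": 1, "discharge": 2})
truncated = encode_label(stay, 24.0, {"death": 1, "discharge": 2})
```

A single-risk endpoint is the same call with a one-entry `event_codes` mapping, in which case every other outcome is treated as censoring.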
Abstract
Deep-learning survival models for electronic health record (EHR) data are hard to compare across papers because the upstream preprocessing step, which includes cohort definition, time discretisation, missingness handling, and censoring rules, is typically undocumented and inconsistent. A reported difference in concordance between two mortality models can therefore reflect any of these choices rather than a modelling contribution. We present SurvBench, an open-source preprocessing pipeline that converts raw PhysioNet exports into model-ready tensors for survival analysis. SurvBench covers four critical-care databases (MIMIC-IV, eICU, MC-MED, HiRID) and four input modalities: time-series vitals and laboratory values, static demographics, International Classification of Diseases (ICD) codes, and radiology report embeddings. Every preprocessing decision is controlled through YAML…
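The abstract names time discretisation and missingness handling among the preprocessing choices that can change reported results. The sketch below shows one common way to make those two choices explicit: irregular measurements are averaged into fixed-width bins and a binary mask records which bins were observed. The `discretise` function, the bin width, and the mean aggregation are assumptions for this example, not SurvBench's documented behaviour.

```python
import math
from typing import List, Tuple


def discretise(measurements: List[Tuple[float, float]], n_bins: int,
               bin_hours: float = 1.0) -> Tuple[List[float], List[int]]:
    """Bin (hours_since_admission, value) pairs into fixed-width bins.

    Returns (values, mask): values holds the per-bin mean (NaN if the bin
    is empty) and mask holds 1 where at least one measurement was seen.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, v in measurements:
        idx = int(t // bin_hours)
        # Skip measurements outside the window and missing (NaN) values.
        if 0 <= idx < n_bins and not math.isnan(v):
            sums[idx] += v
            counts[idx] += 1
    values = [s / c if c else float("nan") for s, c in zip(sums, counts)]
    mask = [1 if c else 0 for c in counts]
    return values, mask


# Three heart-rate readings over a four-hour window, one-hour bins.
values, mask = discretise([(0.2, 80.0), (0.7, 90.0), (2.5, 70.0)], n_bins=4)
```

Keeping the mask separate from the values leaves the imputation strategy (forward fill, zero fill, or model-side handling) as a later, independently configurable decision.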
