CleanSurvival: Automated data preprocessing for time-to-event models using reinforcement learning

Yousef Koka; David Selby; Gerrit Großmann; Kathan Pandya; Sebastian Vollmer

arXiv:2502.03946·cs.LG·February 3, 2026

Open Access

TL;DR

CleanSurvival introduces a reinforcement learning framework to automate and optimize data preprocessing specifically for survival analysis models, significantly improving predictive performance and efficiency.

Contribution

It presents the first tailored reinforcement learning approach for automated data preprocessing in survival analysis, addressing a critical gap in current machine learning pipelines.

Findings

01. Outperforms standard preprocessing methods in predictive accuracy.

02. Achieves up to 10 times faster model training compared to grid search.

03. Effective across various missing data and noise conditions.

Abstract

Data preprocessing is a critical yet frequently neglected aspect of machine learning, despite its potentially significant impact on model performance. While automated machine learning pipelines are starting to integrate data preprocessing into their solutions for classification and regression tasks, this integration is lacking for more specialized tasks such as survival or time-to-event models. As a result, survival analysis not only faces the general challenges of data preprocessing but also lacks tailored, automated solutions. To address this gap, this paper presents 'CleanSurvival', a reinforcement-learning-based solution for optimizing preprocessing pipelines, extended specifically for survival analysis. The framework can handle continuous and categorical variables, using Q-learning to select which combination of…
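To make the idea of using Q-learning to choose a combination of preprocessing steps more concrete, here is a minimal sketch in Python. It is not the CleanSurvival implementation: the stage names, the options in `STAGES`, the hyperparameters, and the `toy_score` reward are all hypothetical stand-ins chosen for illustration. In a real setting the reward would presumably come from fitting a survival model on the preprocessed data and evaluating a metric such as the concordance index.

```python
import random
from collections import defaultdict

# Hypothetical preprocessing stages and options (illustrative names only,
# not taken from the paper).
STAGES = [
    ("impute", ["mean", "median", "knn"]),
    ("outliers", ["none", "iqr", "zscore"]),
    ("select", ["none", "variance", "lasso"]),
]


def toy_score(pipeline):
    """Stand-in reward. A real run would fit a survival model on the
    preprocessed data and return a metric such as the concordance index."""
    bonus = {"median": 0.10, "knn": 0.05, "iqr": 0.08, "lasso": 0.07}
    return 0.60 + sum(bonus.get(step, 0.0) for step in pipeline)


def q_learn(score_fn, episodes=5000, alpha=0.5, gamma=1.0, epsilon=0.3, seed=0):
    """Tabular Q-learning over pipelines built one stage at a time."""
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) -> estimated value

    for _ in range(episodes):
        state = ()  # state = tuple of options chosen so far
        for depth, (_, options) in enumerate(STAGES):
            # Epsilon-greedy choice among this stage's options.
            if rng.random() < epsilon:
                action = rng.choice(options)
            else:
                action = max(options, key=lambda a: q[(state, a)])
            next_state = state + (action,)
            if depth == len(STAGES) - 1:
                # Terminal step: the reward is the finished pipeline's score.
                target = score_fn(next_state)
            else:
                # Intermediate step: no reward yet, bootstrap from the
                # best estimated action at the next stage.
                next_options = STAGES[depth + 1][1]
                target = gamma * max(q[(next_state, a)] for a in next_options)
            # Standard Q-learning update toward the target.
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_pipeline(q):
    """Read off the pipeline that is greedy with respect to the learned Q-table."""
    state = ()
    for _, options in STAGES:
        state += (max(options, key=lambda a: q[(state, a)]),)
    return state


if __name__ == "__main__":
    learned = q_learn(toy_score)
    print(greedy_pipeline(learned))
```

By construction, `toy_score` is maximized by the combination `median`, `iqr`, `lasso`, so the sketch can be checked against a known optimum. The agent evaluates one pipeline per episode rather than enumerating all 27 combinations up front, which is the general reason a learned search can be cheaper than grid search when each evaluation requires training a model.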

Taxonomy

Topics: Simulation Techniques and Applications · Software System Performance and Reliability

Methods: Softmax · Attention Is All You Need · Q-Learning