CleanSurvival: Automated data preprocessing for time-to-event models using reinforcement learning
Yousef Koka, David Selby, Gerrit Großmann, Kathan Pandya, Sebastian Vollmer

TL;DR
CleanSurvival introduces a reinforcement learning framework to automate and optimize data preprocessing specifically for survival analysis models, significantly improving predictive performance and efficiency.
Contribution
It presents the first tailored reinforcement learning approach for automated data preprocessing in survival analysis, addressing a critical gap in current machine learning pipelines.
Findings
Outperforms standard preprocessing methods in predictive accuracy.
Achieves up to 10 times faster model training compared to grid search.
Effective across various missing data and noise conditions.
Abstract
Data preprocessing is a critical yet frequently neglected aspect of machine learning, despite its potentially significant impact on model performance. While automated machine learning pipelines are starting to recognize and integrate data preprocessing into their solutions for classification and regression tasks, this integration is lacking for more specialized tasks like survival or time-to-event models. As a result, survival analysis not only faces the general challenges of data preprocessing but also suffers from the lack of tailored, automated solutions in this area. To address this gap, this paper presents 'CleanSurvival', a reinforcement-learning-based solution for optimizing preprocessing pipelines, extended specifically for survival analysis. The framework can handle continuous and categorical variables, using Q-learning to select which combination of…
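To make the Q-learning idea in the abstract concrete, here is a minimal tabular sketch in Python. It is not the authors' implementation: the stage names, the options per stage, and the `toy_score` reward are all hypothetical stand-ins (in practice the reward would presumably come from fitting and validating a survival model on the preprocessed data). The agent picks one option per stage, receives a reward only once the pipeline is complete, and updates its Q-table with the standard rule Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',a') − Q(s,a)].

```python
import random
from collections import defaultdict

# Hypothetical preprocessing stages and options (assumptions, not from the paper).
STAGES = [
    ("impute", ["mean", "median", "knn"]),
    ("outliers", ["none", "iqr"]),
    ("scale", ["none", "standard"]),
]

# Toy reward table standing in for a validation metric of a survival model.
BONUS = {
    ("impute", "mean"): 0.00, ("impute", "median"): 0.02, ("impute", "knn"): 0.05,
    ("outliers", "none"): 0.00, ("outliers", "iqr"): 0.03,
    ("scale", "none"): 0.00, ("scale", "standard"): 0.02,
}


def toy_score(pipeline):
    """Hypothetical score of a complete pipeline (one option per stage)."""
    return 0.60 + sum(BONUS[(STAGES[i][0], opt)] for i, opt in enumerate(pipeline))


def train(episodes=5000, alpha=0.5, gamma=1.0, epsilon=0.3, seed=0):
    """Tabular Q-learning; a state is the tuple of options chosen so far."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state = ()
        while len(state) < len(STAGES):
            actions = STAGES[len(state)][1]
            if rng.random() < epsilon:  # explore
                action = rng.choice(actions)
            else:  # exploit the current estimates
                action = max(actions, key=lambda a: q[(state, a)])
            next_state = state + (action,)
            done = len(next_state) == len(STAGES)
            # Reward arrives only when the pipeline is complete.
            reward = toy_score(next_state) if done else 0.0
            future = 0.0 if done else max(
                q[(next_state, a)] for a in STAGES[len(next_state)][1]
            )
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


def greedy_pipeline(q):
    """Follow the highest-valued option at each stage."""
    state = ()
    while len(state) < len(STAGES):
        actions = STAGES[len(state)][1]
        state += (max(actions, key=lambda a: q[(state, a)]),)
    return state


if __name__ == "__main__":
    q_table = train()
    print(greedy_pipeline(q_table))
```

Unlike grid search, which scores every combination, the agent spends most of its episodes on pipelines that already look promising. With only twelve combinations the toy problem is too small for that to matter; the sketch is meant to show the update rule, not to reproduce the paper's results.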
Taxonomy
Topics: Simulation Techniques and Applications · Software System Performance and Reliability
Methods: Softmax · Attention Is All You Need · Q-Learning
