Survival Prediction from Imbalance colorectal cancer dataset using hybrid sampling methods and tree-based classifiers
Sadegh Soleimani, Mahsa Bahrami, Mansour Vali

TL;DR
This study develops a pipeline combining sampling techniques and tree-based classifiers to improve survival prediction accuracy for imbalanced colorectal cancer datasets, especially for 1-year survival prediction.
Contribution
Introduces a hybrid sampling pipeline with tree classifiers that enhances minority class prediction in highly imbalanced colorectal cancer survival datasets.
Findings
Proposed method with LGBM achieves 72.30% sensitivity for 1-year survival.
RENN combined with LGBM reaches 80.81% sensitivity for 3-year survival.
Hybrid sampling improves minority class prediction in imbalanced datasets.
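The findings above are reported as sensitivity (true positive rate), not accuracy. The following minimal sketch shows why that matters on imbalanced data. It assumes the minority class is labeled 1 and treated as the positive class; the labels and predictions are made-up illustration values, not data from the paper.

```python
def sensitivity(y_true, y_pred, positive=1):
    """True positive rate: TP / (TP + FN) for the chosen positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive samples")
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy labels: 2 minority (1) and 18 majority (0) samples.
y_true = [1, 1] + [0] * 18

# A classifier that always predicts the majority class.
always_majority = [0] * 20

# A classifier that finds one of the two minority samples
# at the cost of two false alarms.
balanced_guess = [1, 0, 1, 1] + [0] * 16

print(accuracy(y_true, always_majority), sensitivity(y_true, always_majority))
print(accuracy(y_true, balanced_guess), sensitivity(y_true, balanced_guess))
```

The always-majority classifier scores high accuracy while detecting no minority samples, which is why a minority-focused metric is the more informative one for this kind of task.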
Abstract
Background and Objective: Colorectal cancer is a high-mortality cancer. Clinical data analysis plays a crucial role in predicting the survival of colorectal cancer patients, enabling clinicians to make informed treatment decisions. However, utilizing clinical data can be challenging, especially when dealing with imbalanced outcomes. This paper focuses on developing algorithms to predict 1-, 3-, and 5-year survival of colorectal cancer patients using clinical datasets, with particular emphasis on the highly imbalanced 1-year survival prediction task. To address this issue, we propose a method that creates a pipeline of standard balancing techniques to increase the true positive rate. Evaluation is conducted on a colorectal cancer dataset from the SEER database. Methods: The pre-processing step consists of removing records with missing values and merging categories. The minority…
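The two pre-processing operations named in the abstract (removing records with missing values and merging categories) can be sketched as follows. This is only an illustration under stated assumptions: the field names (`grade`, `stage`, `survived_1y`), the category mapping, and the records are hypothetical and are not taken from the paper or from the SEER schema.

```python
def drop_missing(records):
    """Keep only records in which every field has a value (not None or empty)."""
    return [
        r for r in records
        if all(v is not None and v != "" for v in r.values())
    ]


def merge_categories(records, field, mapping):
    """Return copies of the records with values of `field` remapped via `mapping`.

    Values that do not appear in `mapping` are left unchanged.
    """
    return [{**r, field: mapping.get(r[field], r[field])} for r in records]


# Hypothetical records; field names and values are assumptions for illustration.
records = [
    {"grade": "I", "stage": "localized", "survived_1y": 1},
    {"grade": "II", "stage": None, "survived_1y": 1},       # missing stage
    {"grade": "III", "stage": "distant", "survived_1y": 0},
    {"grade": "IV", "stage": "regional", "survived_1y": 1},
]

# Hypothetical merge: collapse four grades into two coarser groups.
grade_groups = {"I": "low", "II": "low", "III": "high", "IV": "high"}

clean = merge_categories(drop_missing(records), "grade", grade_groups)
for row in clean:
    print(row)
```

Dropping incomplete records first keeps the category merge from having to handle missing values, at the cost of discarding data; whether the authors order the steps this way is not stated in the visible part of the abstract.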
Taxonomy
Topics: Artificial Intelligence in Healthcare · Machine Learning in Healthcare
