Comparison of the C4.5 and a Naive Bayes Classifier for the Prediction of Lung Cancer Survivability
George Dimitoglou, James A. Adams, Carol M. Jim

TL;DR
This study compares the effectiveness of C4.5 (J48) and Naive Bayes classifiers in predicting lung cancer survivability using 15 years of patient data, highlighting the importance of domain knowledge.
Contribution
It provides an empirical comparison of C4.5 and Naive Bayes classifiers on real-world lung cancer data, emphasizing data preprocessing and domain-specific insights.
Findings
J48 performs marginally better than Naive Bayes
Data preprocessing is crucial for accurate predictions
Domain knowledge enhances model performance
Abstract
Numerous data mining techniques have been developed to extract information, identify patterns, and predict trends from large data sets. In this study, two classification techniques, the J48 implementation of the C4.5 algorithm and a Naive Bayes classifier, are applied to predict lung cancer survivability from an extensive data set containing fifteen years of patient records. The purpose of the project is to verify the predictive effectiveness of the two techniques on real, historical data. Besides the performance outcome, in which J48 performs marginally better than the Naive Bayes technique, the paper gives a detailed description of the data and the required pre-processing activities. The performance results confirm expectations, while some of the issues that appeared during experimentation underscore the value of domain-specific understanding for leveraging any domain-specific characteristics…
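The abstract names the two techniques without showing how they differ. The following is a minimal, stdlib-only Python sketch, assuming purely categorical attributes. `gain_ratio` computes the split criterion C4.5 (and therefore J48) uses to pick the attribute for a decision-tree node. `CategoricalNaiveBayes` scores each class as its prior times the per-attribute conditional probabilities, under the naive independence assumption. The records, the attribute names (`stage`, `smoker`) and the labels are hypothetical and not taken from the study's data. The sketch also omits C4.5's pruning, continuous-attribute thresholds and missing-value handling, and it is not the authors' or Weka's code.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy records: attributes -> survived five years ("yes") or not ("no").
# These values are invented for illustration; they are NOT from the paper's data set.
ROWS = [
    ({"stage": "I", "smoker": "no"}, "yes"),
    ({"stage": "I", "smoker": "yes"}, "yes"),
    ({"stage": "II", "smoker": "no"}, "yes"),
    ({"stage": "II", "smoker": "yes"}, "no"),
    ({"stage": "III", "smoker": "no"}, "no"),
    ({"stage": "III", "smoker": "yes"}, "no"),
    ({"stage": "IV", "smoker": "no"}, "no"),
    ({"stage": "IV", "smoker": "yes"}, "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, attr):
    """C4.5 split criterion: information gain divided by split information."""
    groups = defaultdict(list)
    for x, y in rows:
        groups[x[attr]].append(y)
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy([y for _, y in rows]) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


class CategoricalNaiveBayes:
    """Naive Bayes over categorical attributes with Laplace (add-one) smoothing."""

    def fit(self, rows):
        self.class_counts = Counter(y for _, y in rows)
        self.n = len(rows)
        self.counts = Counter()          # counts[(attr, value, cls)]
        self.values = defaultdict(set)   # distinct values seen per attribute
        for x, y in rows:
            for attr, val in x.items():
                self.counts[(attr, val, y)] += 1
                self.values[attr].add(val)
        return self

    def predict(self, x):
        best, best_score = None, -math.inf
        for cls, cls_count in self.class_counts.items():
            # log P(cls) + sum of log P(x_i | cls), summed in log space to avoid underflow
            score = math.log(cls_count / self.n)
            for attr, val in x.items():
                k = len(self.values[attr])
                score += math.log((self.counts[(attr, val, cls)] + 1) / (cls_count + k))
            if score > best_score:
                best, best_score = cls, score
        return best


if __name__ == "__main__":
    for attr in ("stage", "smoker"):
        print(attr, round(gain_ratio(ROWS, attr), 3))  # stage 0.352, smoker 0.049
    model = CategoricalNaiveBayes().fit(ROWS)
    print(model.predict({"stage": "I", "smoker": "no"}))  # yes
```

On this toy data C4.5 would split on `stage` first because it has the higher gain ratio. Naive Bayes never builds a tree: it multiplies smoothed per-attribute likelihoods. That difference in how each model uses the attributes is one reason their accuracies can diverge on real records.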
Taxonomy
Topics: Data Mining Algorithms and Applications · Artificial Intelligence in Healthcare · Biomedical Text Mining and Ontologies
