Comparative Evaluation of Explainable Machine Learning Versus Linear Regression for Predicting County-Level Lung Cancer Mortality Rate in the United States
Soheil Hashtarkhani, Brianna M. White, Benyamin Hoseini, David L. Schwartz, Arash Shaban-Nejad

TL;DR
This study compares explainable machine learning models with linear regression for predicting county-level lung cancer mortality in the US, finding that random forest offers superior accuracy and insights into key risk factors.
Contribution
It demonstrates the effectiveness of explainable machine learning models, particularly random forest, over traditional linear regression in predicting and understanding lung cancer mortality rates.
Findings
Random forest achieved an R² of 41.9% and an RMSE of 12.8.
SHAP analysis identified smoking rate as the top predictor.
Spatial analysis revealed significant mortality hotspots in mid-eastern US counties.
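For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how R² and RMSE are conventionally computed from observed and predicted values. It is not the authors' code; the function names and the five-county sample values are hypothetical and chosen only for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: share of variance explained."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical mortality rates for five counties (made-up numbers).
observed = [40.0, 55.0, 62.0, 48.0, 70.0]
predicted = [45.0, 50.0, 60.0, 52.0, 63.0]

print(f"R^2  = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
```

RMSE is expressed in the same units as the outcome, so it reads as a typical error size, while R² is unitless and is often reported as a percentage, as in the finding above.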
Abstract
Lung cancer (LC) is a leading cause of cancer-related mortality in the United States. Accurate prediction of LC mortality rates is crucial for guiding targeted interventions and addressing health disparities. Although traditional regression-based models have been commonly used, explainable machine learning models may offer enhanced predictive accuracy and deeper insights into the factors influencing LC mortality. This study applied three models, random forest (RF), gradient boosting regression (GBR), and linear regression (LR), to predict county-level LC mortality rates across the United States. Model performance was evaluated using R-squared and root mean squared error (RMSE). Shapley Additive Explanations (SHAP) values were used to determine each variable's importance and the direction of its impact. Geographic disparities in LC mortality were analyzed through Getis-Ord (Gi*) hotspot analysis.…
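The Getis-Ord Gi* statistic mentioned in the abstract can be illustrated with a short sketch. The code below uses the standard textbook form of Gi*, which yields a z-score per location; it is not taken from the paper. The spatial weights (binary adjacency with each county counted as its own neighbor) and the five county values are assumptions made up for this example.

```python
import math

def getis_ord_gi_star(values, weights):
    """Return one Gi* z-score per location.

    values  : list of observed rates, one per location
    weights : n x n list of lists; weights[i][j] is the spatial weight
              between i and j, with weights[i][i] included (the "star")
    """
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation (population form) used by Gi*.
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(x * x for x in w)
        # Local weighted sum minus what the global mean would predict.
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

# Hypothetical example: five counties in a row, each adjacent to its
# immediate neighbors and to itself (assumed binary weights).
rates = [10.0, 12.0, 11.0, 30.0, 32.0]
adjacency = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]

gi_scores = getis_ord_gi_star(rates, adjacency)
for county, z in enumerate(gi_scores):
    print(f"county {county}: Gi* = {z:+.2f}")
```

Large positive scores mark clusters of high values (hotspots) and large negative scores mark clusters of low values (coldspots). The sketch does not handle the degenerate cases where every location neighbors every other or where all values are identical, since the denominator is then zero.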
Taxonomy
Topics: Statistical Methods in Epidemiology · Global Cancer Incidence and Screening · Lung Cancer Diagnosis and Treatment
