Prediction of Lung Metastasis from Hepatocellular Carcinoma using the SEER Database
Jeff J.H. Kim, George R. Nahass, Yang Dai, Theja Tulabandhula

TL;DR
This study develops machine learning models on SEER data to predict lung metastasis in hepatocellular carcinoma patients. The models achieve high sensitivity but low precision, and the results point to potential use in clinical risk stratification.
Contribution
Introduces an end-to-end machine learning pipeline with custom loss functions and ensemble methods for improved metastasis prediction in HCC using SEER data.
Findings
Random Forest and MLP achieved an AUROC of 0.82
Custom loss function improved model sensitivity
Ensemble approach increased recall
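The summary does not say which custom loss or ensemble rule the authors used. The sketch below shows one common way each idea can raise sensitivity, under stated assumptions: a binary cross-entropy whose positive-class term is scaled by a hypothetical `pos_weight`, and a hypothetical "flag if any model flags" vote. The function names and the toy labels are illustrative and are not taken from the paper.

```python
import math

def weighted_bce(y_true, p_pred, pos_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with the positive-class term scaled by pos_weight.

    Assumption: this is a generic class-weighted loss, not the paper's loss.
    A pos_weight above 1 makes a missed metastasis cost more than a false alarm.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def any_vote(*model_preds):
    """Union ensemble (assumed rule): positive if at least one model predicts 1."""
    return [int(any(votes)) for votes in zip(*model_preds)]

def recall(y_true, y_pred):
    """Fraction of true positives that were predicted positive."""
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy labels and predictions, invented for illustration only.
y_true  = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [1, 0, 0, 0, 1, 0, 0, 0]
model_b = [0, 1, 0, 0, 0, 0, 0, 0]
ensemble = any_vote(model_a, model_b)

print(recall(y_true, model_a), recall(y_true, model_b), recall(y_true, ensemble))
```

A union vote can never lower recall relative to its members, but every false alarm from any member is also kept, which is one way recall gains can come at the expense of precision.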
Abstract
Hepatocellular carcinoma (HCC) is a leading cause of cancer-related mortality, with lung metastases being the most common site of distant spread and significantly worsening prognosis. Despite the growing availability of clinical and demographic data, predictive models for lung metastasis in HCC remain limited in scope and clinical applicability. In this study, we develop and validate an end-to-end machine learning pipeline using data from the Surveillance, Epidemiology, and End Results (SEER) database. We evaluated three machine learning models (Random Forest, XGBoost, and Logistic Regression) alongside a multilayer perceptron (MLP) neural network. Our models achieved high AUROC values and recall, with the Random Forest and MLP models demonstrating the best overall performance (AUROC = 0.82). However, the low precision across models highlights the challenges of accurately predicting…
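To see why high recall can coexist with low precision when the outcome is rare, the following sketch computes precision from sensitivity, specificity, and prevalence using Bayes' rule. The numbers passed in are hypothetical and are not results from the paper; `precision_from_rates` is an illustrative helper name.

```python
def precision_from_rates(sensitivity, specificity, prevalence):
    """Precision (positive predictive value) implied by the three rates.

    All inputs are fractions in [0, 1]. Returns 0.0 if nothing is flagged.
    """
    true_pos = sensitivity * prevalence                   # flagged and truly positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # flagged but truly negative
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged > 0 else 0.0

# Hypothetical operating point: a sensitive classifier on a rare outcome.
rare = precision_from_rates(sensitivity=0.9, specificity=0.7, prevalence=0.05)

# Same classifier applied to a balanced population, for comparison.
balanced = precision_from_rates(sensitivity=0.9, specificity=0.7, prevalence=0.5)

print(f"precision at 5% prevalence:  {rare:.3f}")
print(f"precision at 50% prevalence: {balanced:.3f}")
```

With these assumed rates, precision is about 0.14 at 5% prevalence and 0.75 at 50% prevalence: the classifier is unchanged, but false positives from the much larger negative group dominate when the outcome is rare.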
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging
Methods: Sigmoid Activation · Batch Normalization · Dense Connections · 1x1 Convolution · Squeeze-and-Excitation Block · Convolution · Grouped Convolution · Average Pooling · Global Average Pooling
