Enhancing Lung Cancer Treatment Outcome Prediction through Semantic Feature Engineering Using Large Language Models
MunHwan Lee, Shaika Chowdhury, Xiaodi Li, Sivaraman Rajaganapathy, Eric W Klee, Ping Yang, Terence Sio, Liewei Wang, James Cerhan, Nansu NA Zong

TL;DR
This paper presents a novel framework using Large Language Models as Goal-oriented Knowledge Curators to generate high-quality, task-specific features from multimodal clinical data, significantly improving lung cancer outcome prediction accuracy.
Contribution
Introduces a scalable, interpretable LLM-based feature engineering method that outperforms traditional approaches in predicting lung cancer treatment outcomes.
Findings
Achieved a mean AUROC of 0.803, outperforming the baselines (expert-engineered features, direct text embeddings, and an end-to-end transformer).
Demonstrated the effectiveness of multimodal feature integration.
Confirmed the importance of semantic representation quality.
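For readers unfamiliar with the headline metric, the following is a minimal sketch of how a mean AUROC can be computed. The summary does not say what the mean is taken over; this sketch assumes repeated evaluation splits, and the `folds` data below are made-up toy values, not results from the paper.

```python
from statistics import mean

def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy splits: (true labels, predicted scores).
folds = [
    ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]),
    ([1, 1, 0, 0], [0.8, 0.4, 0.3, 0.5]),
]

per_fold = [auroc(y, s) for y, s in folds]
mean_auroc = mean(per_fold)
print(per_fold, mean_auroc)
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported 0.803 should be read.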
Abstract
Accurate prediction of treatment outcomes in lung cancer remains challenging due to the sparsity, heterogeneity, and contextual overload of real-world electronic health data. Traditional models often fail to capture semantic information across multimodal streams, while large-scale fine-tuning approaches are impractical in clinical workflows. We introduce a framework that uses Large Language Models (LLMs) as Goal-oriented Knowledge Curators (GKC) to convert laboratory, genomic, and medication data into high-fidelity, task-aligned features. Unlike generic embeddings, GKC produces representations tailored to the prediction objective and operates as an offline preprocessing step that integrates naturally into hospital informatics pipelines. Using a lung cancer cohort (N=184), we benchmarked GKC against expert-engineered features, direct text embeddings, and an end-to-end transformer. Our…
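The abstract describes GKC as an offline preprocessing step in which an LLM turns laboratory, genomic, and medication data into representations aligned with the prediction goal. The sketch below illustrates only that general shape. Every name in it (`build_curation_prompt`, `curate_patient`, `fake_llm`), the prompt wording, and the sample records are hypothetical; the paper's actual prompts, model, and feature encoding are not given in this summary.

```python
from typing import Callable, Dict, List

def build_curation_prompt(goal: str, modality: str, records: List[str]) -> str:
    """Assemble a goal-oriented prompt for one data modality (hypothetical wording)."""
    lines = "\n".join(f"- {r}" for r in records)
    return (
        f"Prediction goal: {goal}\n"
        f"Modality: {modality}\n"
        f"Patient records:\n{lines}\n"
        "Summarize only the information relevant to the goal."
    )

def curate_patient(
    goal: str,
    patient: Dict[str, List[str]],
    llm: Callable[[str], str],
) -> Dict[str, str]:
    """Run the curator once per modality; returns one curated text per modality."""
    return {
        modality: llm(build_curation_prompt(goal, modality, records))
        for modality, records in patient.items()
    }

def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return f"[curated summary of {len(prompt.splitlines())} prompt lines]"

# Made-up example records, for illustration only.
patient = {
    "laboratory": ["hemoglobin 11.2 g/dL", "albumin 3.1 g/dL"],
    "genomic": ["EGFR exon 19 deletion"],
    "medication": ["osimertinib 80 mg daily"],
}
curated = curate_patient("lung cancer treatment outcome", patient, fake_llm)
print(curated)
```

Because the curation runs once per patient ahead of model training, its outputs could be cached and then encoded into numeric features for a downstream classifier; how the paper performs that encoding is not stated here.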
Taxonomy
Topics: Machine Learning in Healthcare · Lung Cancer Diagnosis and Treatment · Artificial Intelligence in Healthcare and Education
