Classifying Cancer Stage with Open-Source Clinical Large Language Models
Chia-Hsuan Chang, Mary M. Lucas, Grace Lu-Yao, Christopher C. Yang

TL;DR
This study shows that open-source clinical large language models can extract cancer staging information from pathology reports without labeled data, performing comparably to or better than fine-tuned models on certain classification tasks.
Contribution
Demonstrates that open-source LLMs can extract cancer staging information from pathology reports without labeled training data, reducing the need for labor-intensive dataset preparation.
Findings
LLMs still underperform in Tumor (T) classification.
LLMs achieve comparable results in Metastasis (M) classification.
LLMs outperform fine-tuned models in Node (N) classification with prompting.
Abstract
Cancer stage classification is important for making treatment and care management plans for oncology patients. Information on staging is often included in unstructured form in clinical, pathology, radiology and other free-text reports in the electronic health record system, requiring extensive work to parse and obtain. To facilitate the extraction of this information, previous NLP approaches rely on labeled training datasets, which are labor-intensive to prepare. In this study, we demonstrate that without any labeled training data, open-source clinical large language models (LLMs) can extract pathologic tumor-node-metastasis (pTNM) staging information from real-world pathology reports. Our experiments compare LLMs and a BERT-based model fine-tuned using the labeled data. Our findings suggest that while LLMs still exhibit subpar performance in Tumor (T) classification, with the…
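The zero-shot setup described in the abstract can be illustrated with a minimal sketch: build a prompt that asks for one pTNM category, then map the model's free-text answer back to a label. This is not the authors' prompt or pipeline. The label sets, the prompt wording, and the helper names `build_prompt` and `parse_label` are assumptions made for illustration, and the call to an actual LLM is left out.

```python
import re

# Hypothetical label sets for illustration; the label sets used in the
# paper are not stated in this summary.
LABELS = {
    "T": ["T1", "T2", "T3", "T4"],
    "N": ["N0", "N1", "N2", "N3"],
    "M": ["M0", "M1"],
}


def build_prompt(report: str, axis: str) -> str:
    """Build a zero-shot prompt asking for a single pTNM category."""
    options = ", ".join(LABELS[axis])
    return (
        "You are reading a pathology report.\n"
        f"Report:\n{report.strip()}\n\n"
        f"Question: What is the pathologic {axis} category? "
        f"Answer with exactly one of: {options}.\n"
        "Answer:"
    )


def parse_label(response: str, axis: str):
    """Return the first valid label for `axis` in the model output, else None."""
    # Upper-casing lets "pN1", "n1", and "N1" all match the same pattern.
    for digit in re.findall(rf"{axis}([0-4])", response.upper()):
        label = f"{axis}{digit}"
        if label in LABELS[axis]:
            return label
    return None
```

In use, the string returned by `build_prompt` would be sent to whichever model is under evaluation, and `parse_label` would be applied to its reply. Because no labeled examples appear in the prompt, nothing here requires an annotated training set, which is the property the abstract emphasizes.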
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · Topic Modeling · Biomedical Text Mining and Ontologies
