Multi-Task LLM with LoRA Fine-Tuning for Automated Cancer Staging and Biomarker Extraction
Jiahao Shao, Anam Nawaz Khan, Christopher Brett, Tom Berg, Xueping Li, Bing Yao

TL;DR
This paper presents a parameter-efficient, multi-task LLM framework, fine-tuned with LoRA, for accurate and scalable extraction of cancer staging and biomarkers from pathology reports. It outperforms rule-based NLP, zero-shot LLMs, and single-task baselines.
Contribution
Introduces a multi-task, LoRA-fine-tuned Llama-3-8B-Instruct model that uses parallel classification heads, rather than free-text generation, to extract cancer data reliably from unstructured pathology reports.
Findings
Achieved a Macro F1 score of 0.976 on the extraction tasks (TNM staging, histologic grade, and biomarkers).
Successfully handled complex contextual ambiguities and heterogeneous report formats.
Outperformed rule-based NLP, zero-shot LLMs, and single-task baselines.
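For readers unfamiliar with the headline metric, the following is a minimal sketch of how Macro F1 is computed for a single classification task: F1 is calculated per class and then averaged with equal weight, so rare classes count as much as common ones. The grade labels and predictions are made-up illustrations, not data from the paper, and how the authors aggregate the score across tasks is not stated in this summary.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all classes seen."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical histologic-grade labels for four reports (illustration only)
y_true = ["G1", "G2", "G2", "G3"]
y_pred = ["G1", "G2", "G3", "G3"]
score = macro_f1(y_true, y_pred)
print(round(score, 4))  # per-class F1 is 1, 2/3, 2/3, so the mean is 7/9
```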
Abstract
Pathology reports serve as the definitive record for breast cancer staging, yet their unstructured format impedes large-scale data curation. While Large Language Models (LLMs) offer semantic reasoning, their deployment is often limited by high computational costs and hallucination risks. This study introduces a parameter-efficient, multi-task framework for automating the extraction of Tumor-Node-Metastasis (TNM) staging, histologic grade, and biomarkers. We fine-tune a Llama-3-8B-Instruct encoder using Low-Rank Adaptation (LoRA) on a curated, expert-verified dataset of 10,677 reports. Unlike generative approaches, our architecture utilizes parallel classification heads to enforce consistent schema adherence. Experimental results demonstrate that the model achieves a Macro F1 score of 0.976, successfully resolving complex contextual ambiguities and heterogeneous reporting formats that…
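The two ideas named in the abstract, a low-rank (LoRA) update on frozen weights and parallel classification heads that can only emit labels from a fixed schema, can be sketched in a few lines. This is a toy, dependency-free illustration and not the authors' implementation: the dimensions, the random stand-in weights, the class names `LoRALinear` and `MultiTaskHeads`, and the label sets in `TASKS` are all assumptions made for the example.

```python
import random


def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, d_in, d_out, r, alpha, rng):
        self.W = rand_matrix(d_out, d_in, rng)      # frozen weight (random stand-in)
        self.A = rand_matrix(r, d_in, rng)          # trainable, small random init
        self.B = [[0.0] * r for _ in range(d_out)]  # trainable, zero init: update starts at 0
        self.scale = alpha / r
        self.trainable_params = r * (d_in + d_out)  # versus d_in * d_out for full fine-tuning

    def __call__(self, x):
        base = matvec(self.W, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]


class MultiTaskHeads:
    """One linear head per task over a shared representation.

    Each head takes an argmax over its own label set, so the output always
    conforms to the schema, unlike free-text generation.
    """

    def __init__(self, d_hidden, task_classes, rng):
        self.labels = task_classes
        self.heads = {t: rand_matrix(len(c), d_hidden, rng)
                      for t, c in task_classes.items()}

    def predict(self, h):
        out = {}
        for task, W in self.heads.items():
            logits = matvec(W, h)
            out[task] = self.labels[task][logits.index(max(logits))]
        return out


# Hypothetical label schema; the paper's actual classes are not listed here.
TASKS = {
    "T": ["T1", "T2", "T3", "T4"],
    "N": ["N0", "N1", "N2", "N3"],
    "M": ["M0", "M1"],
    "grade": ["G1", "G2", "G3"],
    "ER": ["negative", "positive"],
}

rng = random.Random(0)
layer = LoRALinear(d_in=8, d_out=8, r=2, alpha=4, rng=rng)
heads = MultiTaskHeads(8, TASKS, rng)

x = [rng.gauss(0.0, 1.0) for _ in range(8)]  # stand-in for a pooled report embedding
h = layer(x)
prediction = heads.predict(h)
print(prediction)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only the `r * (d_in + d_out)` adapter parameters (plus the heads) would need training. In this toy setting that is 32 adapter parameters against 64 for the full matrix; the savings grow with layer width.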
