Extracting Breast Cancer Phenotypes from Clinical Notes: Comparing LLMs with Classical Ontology Methods

Abdullah Bin Faiz; Arbaz Khan Shehzad; Asad Afzal; Momin Tariq; Muhammad Siddiqi; Muhammad Usamah Shahid; Maryam Noor Awan; Muddassar Farooq

arXiv:2604.06208·cs.CL·April 9, 2026

TL;DR

This study develops an LLM-based framework for extracting breast cancer phenotypes from unstructured clinical notes. It reports accuracy comparable to classical ontology-based methods and can be adapted to other diseases.

Contribution

The paper introduces an LLM framework for phenotype extraction from clinical notes, showing it matches classical methods and can be easily adapted to other conditions.

Findings

01. LLM framework achieves accuracy comparable to ontology-based methods.

02. The framework can be fine-tuned for different cancer types and diseases.

03. Extracted phenotypes include treatment outcomes, biomarkers, and tumor characteristics.

Abstract

A significant amount of the data held in oncology Electronic Medical Records (EMRs) is contained in unstructured provider notes, including but not limited to a patient's chemotherapy (or other cancer treatment) outcome, biomarkers, and the tumor's location, size, and growth pattern. Clinical studies show that the majority of oncologists are comfortable recording these valuable insights in natural language in their notes rather than in the relevant structured fields of an EMR. The major contribution of this research is an LLM-based framework that processes provider notes and extracts the medical knowledge and phenotypes mentioned above, with a focus on the domain of oncology. In this paper, we focus on extracting phenotypes related to breast cancer using our LLM framework, and then compare its performance with earlier works that used knowledge-driven annotation system,…
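To make the idea of prompt-based phenotype extraction concrete, here is a minimal Python sketch. It is not the authors' framework: the field names in `PHENOTYPE_FIELDS`, the prompt wording, and the `llm` callable (any function that takes a prompt string and returns the model's text reply) are all assumptions introduced for illustration, loosely based on the phenotype categories the abstract lists.

```python
import json
from typing import Callable, Dict, Optional

# Hypothetical field names, loosely based on the categories named in the abstract.
PHENOTYPE_FIELDS = [
    "treatment_outcome",
    "biomarkers",
    "tumor_location",
    "tumor_size",
    "growth_pattern",
]


def build_prompt(note: str) -> str:
    """Ask the model for one JSON object containing exactly the expected keys."""
    fields = ", ".join(PHENOTYPE_FIELDS)
    return (
        "Extract the following fields from the oncology note and answer "
        f"with a single JSON object using exactly these keys: {fields}. "
        "Use null for anything the note does not state.\n\n"
        f"Note:\n{note}"
    )


def extract_phenotypes(
    note: str, llm: Callable[[str], str]
) -> Dict[str, Optional[object]]:
    """Run the (assumed) LLM callable and normalise its reply.

    Replies that are not a JSON object yield all-None fields, and keys
    outside PHENOTYPE_FIELDS are dropped.
    """
    raw = llm(build_prompt(note))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    return {field: parsed.get(field) for field in PHENOTYPE_FIELDS}
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM provider, and normalising the reply to a fixed set of keys makes the output easier to compare field by field against annotations from another system.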
