Can Language Models Identify Side Effects of Breast Cancer Radiation Treatments?
Natalie Seah, Danielle S. Bitterman, Daphna Spiegel, Thomas Hartvigsen

TL;DR
This paper evaluates the reliability of large language models in identifying breast cancer radiation therapy side effects, highlighting their limitations and proposing methods to improve safety and accuracy.
Contribution
It introduces a stress-testing framework for assessing LLMs in oncology. The framework compares model outputs to clinician-curated references and analyzes both the models' sensitivity to input changes and their recall of side effects.
Findings
LLMs show sensitivity to documentation changes.
Trade-offs exist between precision and recall.
Grounding outputs improves reliability.
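The precision and recall trade-off noted above can be illustrated with a minimal sketch that scores a generated side effect list against a reference list. This is not the paper's evaluation code. The function names, the exact-match normalization, and the example lists are all assumptions made for illustration, and the paper's matching procedure may differ.

```python
def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(term.lower().split())


def precision_recall(predicted: list[str], reference: list[str]) -> tuple[float, float]:
    """Score a generated list against a reference list using exact set matching.

    Assumption: exact string match after normalization. A real evaluation
    would likely need synonym or ontology-based matching.
    """
    pred = {normalize(t) for t in predicted}
    ref = {normalize(t) for t in reference}
    hits = pred & ref
    precision = len(hits) / len(pred) if pred else 0.0
    recall = len(hits) / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical data for illustration only, not taken from the paper.
reference = ["Fatigue", "Skin irritation", "Lymphedema", "Breast pain"]
short_list = ["fatigue", "skin irritation"]
long_list = ["fatigue", "skin irritation", "lymphedema", "hair loss", "nausea", "fever"]

short_scores = precision_recall(short_list, reference)
long_scores = precision_recall(long_list, reference)
print("short list (precision, recall):", short_scores)
print("long list (precision, recall):", long_scores)
```

In this toy case the short list is fully correct but misses half of the reference items, while the longer list recovers more reference items at the cost of including entries that are not in the reference.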
Abstract
Accurately communicating the side effects of cancer treatments to cancer survivors is critical, particularly in settings such as informed consent, where clinicians must clearly and comprehensively convey potential treatment toxicities. However, this task remains challenging due to clinical knowledge deficits about adverse treatment effects and fragmentation across electronic health record (EHR) systems. Large language models (LLMs) have the potential to assist in this task, though their reliability in oncology survivorship contexts remains poorly understood. We present a deployment-oriented stress-testing framework for evaluating LLM-generated radiation side effect lists in breast cancer treatment and survivorship care. Using 21 breast cancer patient profiles, we construct paired patient clinical scenarios that differ only in radiotherapy regimens to evaluate seven instruction-tuned…
