Enhancing Breast Cancer Prediction with LLM-Inferred Confounders

Debmita Roy

arXiv:2511.17662·cs.LG·November 25, 2025

TL;DR

This paper proposes using large language models to infer confounding health conditions from clinical data, improving breast cancer prediction accuracy and supporting noninvasive prescreening and clinical decision-making.

Contribution

The paper introduces a method that uses LLMs to generate confounder features (inferred likelihoods of diabetes, obesity, and cardiovascular disease) and adds them to predictive models for breast cancer detection.

Findings

01 Features inferred by Gemma and Llama improved Random Forest performance by 3.9% and 6.4%, respectively.

02 Inferred confounders contributed to better early detection.

03 The approach supports noninvasive prescreening and clinical decision-making.

Abstract

This study enhances breast cancer prediction by using large language models to infer the likelihood of confounding diseases, namely diabetes, obesity, and cardiovascular disease, from routine clinical data. These AI-generated features improved Random Forest model performance, particularly for LLMs like Gemma (3.9%) and Llama (6.4%). The approach shows promise for noninvasive prescreening and clinical integration, supporting improved early detection and shared decision-making in breast cancer diagnosis.
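
The feature-augmentation step described in the abstract might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the paper's implementation: the prompt wording, the JSON reply format, the measurement names (`age`, `bmi`, `glucose`), the 0.5 fallback, and the `fake_llm` stub are all hypothetical. A real pipeline would replace the stub with a call to a model such as Gemma or Llama.

```python
import json
from typing import Callable, Dict, List

# Confounding conditions named in the abstract.
CONFOUNDERS = ["diabetes", "obesity", "cardiovascular_disease"]


def build_prompt(record: Dict[str, float]) -> str:
    """Format one patient's routine measurements as a prompt (hypothetical wording)."""
    measurements = ", ".join(f"{k}={v}" for k, v in sorted(record.items()))
    return (
        "Given these clinical measurements: " + measurements + ". "
        "Return JSON with a likelihood between 0 and 1 for each of: "
        + ", ".join(CONFOUNDERS) + "."
    )


def parse_likelihoods(reply: str) -> List[float]:
    """Parse the reply; use 0.5 (an assumed neutral value) for anything unusable."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    values = []
    for name in CONFOUNDERS:
        try:
            p = float(data.get(name, 0.5))
        except (TypeError, ValueError):
            p = 0.5
        values.append(min(1.0, max(0.0, p)))  # clamp to [0, 1]
    return values


def augment(record: Dict[str, float], llm: Callable[[str], str]) -> List[float]:
    """Return the original features (sorted by name) plus inferred confounder likelihoods."""
    base = [record[k] for k in sorted(record)]
    return base + parse_likelihoods(llm(build_prompt(record)))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a deliberately messy reply."""
    return '{"diabetes": 0.7, "obesity": 1.4, "cardiovascular_disease": "n/a"}'


row = augment({"glucose": 92.0, "bmi": 31.5, "age": 54.0}, fake_llm)
print(row)
```

The augmented rows would then be passed to a Random Forest classifier (for example, scikit-learn's `RandomForestClassifier`) in place of the original feature vectors. Clamping and the neutral fallback are design choices made here so that malformed model output cannot break training; the paper may handle such cases differently.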

Taxonomy

Topics: AI in cancer detection · Artificial Intelligence in Healthcare · Cancer Risks and Factors