# Deep Learning Models to Screen Electronic Health Records for Breast and Colorectal Cancer Progression: Performance Evaluation Study

**Authors:** Pascal Lambert, Rayyan Khan, Marshall Pitz, Harminder Singh, Helen Chen, Kathleen Decker

PMC · DOI: 10.2196/63767 · JMIR AI · 2025-10-13

## TL;DR

This study compares three deep learning language models for identifying breast and colorectal cancer progression in electronic health records. It finds that Clinical-BigBird and Clinical-Longformer outperform Bio+ClinicalBERT, and that all three models can remove most charts from manual review.

## Contribution

The study evaluates and compares the performance of three deep learning models for detecting cancer progression in EHRs, highlighting their potential to reduce manual chart reviews.

## Key findings

- Clinical-BigBird and Clinical-Longformer models outperformed Bio+ClinicalBERT in accuracy (measured by scaled Brier score), sensitivity, and positive predictive value for detecting both breast and colorectal cancer progression.
- All models could eliminate over 84% of charts from manual review, with the word 'progression' being the most influential token in predictions.
- The authors suggest that performance could improve with larger training datasets and with sentence-level rather than whole-EHR analysis.
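This summary reports that every model removed more than 84% of charts from review, but it does not explain how the remaining charts were chosen for manual review. The sketch below illustrates one possible rule, which is an assumption and not the authors' procedure. Under this hypothetical two-threshold rule, charts the model scores confidently low or confidently high are labeled automatically, and only the uncertain middle band goes to a human reviewer. The function name and the `low`/`high` cutoffs are also hypothetical.

```python
from typing import Sequence


def fraction_removed_from_review(
    probs: Sequence[float], low: float = 0.1, high: float = 0.9
) -> float:
    """Share of charts that skip manual review under a HYPOTHETICAL rule:
    probability < low  -> auto-labeled "no progression",
    probability >= high -> auto-labeled "progression",
    anything in [low, high) is sent to a human reviewer."""
    if not probs:
        raise ValueError("need at least one chart")
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("require 0 <= low <= high <= 1")
    needs_review = sum(1 for p in probs if low <= p < high)
    return 1.0 - needs_review / len(probs)


# Toy predicted probabilities for ten charts (illustrative, not study data).
probs = [0.02, 0.05, 0.97, 0.50, 0.01, 0.93, 0.08, 0.99, 0.30, 0.03]
print(f"{fraction_removed_from_review(probs):.0%} of charts removed from review")
```

Widening the band sends more charts to reviewers and trades workload for safety. A real deployment would choose the cutoffs on validation data.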

## Abstract

Cancer progression is an important outcome in cancer research. However, it is frequently documented only in electronic health records (EHRs) as unstructured text, which requires lengthy and costly chart reviews to extract for retrospective studies.

This study aimed to evaluate the performance of 3 deep learning language models in determining breast and colorectal cancer progression in EHRs.

EHRs for individuals diagnosed with stage 4 breast or colorectal cancer between 2004 and 2020 in Manitoba, Canada, were extracted. A chart review was conducted to identify cancer progression in each EHR. Data were analyzed with pretrained deep learning language models (Bio+ClinicalBERT, Clinical-BigBird, and Clinical-Longformer). Sensitivity, positive predictive value, area under the curve, and scaled Brier scores were used to evaluate performance. Influential tokens were identified by removing tokens from and adding tokens to EHRs and examining the resulting changes in predicted probabilities.
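The evaluation metrics named above can all be computed from chart-level labels and predicted probabilities. The standard-library sketch below makes two assumptions. First, it classifies charts with a 0.5 threshold. Second, it uses the common definition of the scaled Brier score, 1 - Brier / Brier_max, where Brier_max = p(1 - p) for outcome prevalence p. The paper may have used a different threshold or scaling, and the function name is illustrative.

```python
from typing import Dict, Sequence


def evaluate(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, PPV, AUC and scaled Brier score for binary labels
    (1 = progression, 0 = no progression) and predicted probabilities."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("y_true and y_prob must be non-empty and equal length")
    n = len(y_true)
    y_pred = [1 if p >= threshold else 0 for p in y_prob]  # assumed cutoff
    tp = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 1)
    fp = sum(1 for y, yh in zip(y_true, y_pred) if y == 0 and yh == 1)
    fn = sum(1 for y, yh in zip(y_true, y_pred) if y == 1 and yh == 0)

    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need both positive and negative charts")

    # AUC = probability a random positive chart outscores a random negative
    # one, counting ties as half (the Mann-Whitney formulation).
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = wins / (len(pos) * len(neg))

    # Brier score, scaled against a model that always predicts the prevalence.
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    prevalence = len(pos) / n
    scaled_brier = 1.0 - brier / (prevalence * (1.0 - prevalence))

    return {
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "auc": auc,
        "scaled_brier": scaled_brier,
    }


# Toy labels and probabilities for six charts (illustrative, not study data).
metrics = evaluate([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.6, 0.1])
print({k: round(v, 3) for k, v in metrics.items()})
```

A scaled Brier score of 1 means perfect probabilistic predictions. A score of 0 means the model does no better than always predicting the prevalence.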

Clinical-BigBird and Clinical-Longformer models for breast and colorectal cancer cohorts demonstrated higher accuracy than the Bio+ClinicalBERT models (scaled Brier scores for breast cancer models: 0.70-0.79 vs 0.49-0.71; scaled Brier scores for colorectal cancer models: 0.61-0.65 vs 0.49-0.61). The same models also demonstrated higher sensitivity (breast cancer models: 86.6%-94.3% vs 76.6%-87.1%; colorectal cancer models: 73.1%-78.9% vs 62.8%-77.0%) and positive predictive value (breast cancer models: 77.9%-92.3% vs 80.6%-85.5%; colorectal cancer models: 81.6%-86.3% vs 72.9%-82.9%) compared with Bio+ClinicalBERT models. All models could remove more than 84% of charts from the chart review process. The most influential token was the word 'progression', whose influence depended on the presence of other tokens and on its position within an EHR.
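The token-influence analysis adds or removes a token and then observes how the predicted probability shifts, which is a form of occlusion analysis. The sketch below replaces the fine-tuned transformers with a hypothetical keyword-weighted logistic scorer. The weights, the negation rule, and all function names are illustrative assumptions. The sketch shows only how a token's influence can depend on neighboring tokens. It does not model the position effects the authors observed.

```python
import math
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[str]], float]

# Hypothetical stand-in for a fine-tuned classifier (NOT the study's models):
# a logistic score over hand-set keyword weights, where "no" directly before
# a keyword flips its sign so a token's effect depends on its neighbors.
WEIGHTS = {"progression": 3.0, "metastases": 1.5, "stable": -2.0}
BIAS = -1.0


def toy_model(tokens: Sequence[str]) -> float:
    score = BIAS
    for i, tok in enumerate(tokens):
        weight = WEIGHTS.get(tok, 0.0)
        if i > 0 and tokens[i - 1] == "no":
            weight = -weight
        score += weight
    return 1.0 / (1.0 + math.exp(-score))


def removal_influence(tokens: Sequence[str], model: Model) -> List[Tuple[str, float]]:
    """Drop each token in turn; influence = baseline prob - prob without it."""
    baseline = model(tokens)
    influences = []
    for i, tok in enumerate(tokens):
        reduced = list(tokens[:i]) + list(tokens[i + 1:])
        influences.append((tok, baseline - model(reduced)))
    # Largest absolute change in probability first.
    return sorted(influences, key=lambda item: abs(item[1]), reverse=True)


def addition_influence(
    tokens: Sequence[str], new_token: str, position: int, model: Model
) -> float:
    """Insert `new_token` at `position`; return the change in probability."""
    augmented = list(tokens[:position]) + [new_token] + list(tokens[position:])
    return model(augmented) - model(tokens)


note = "ct shows progression of liver metastases".split()
ranked = removal_influence(note, toy_model)
print(ranked[0])  # most influential token and its probability drop
print(addition_influence(note, "no", 2, toy_model))  # negating "progression"
```

With a real model, `toy_model` would be replaced by a function that tokenizes the text, runs the classifier, and returns the positive-class probability.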

The deep learning language models could help identify breast and colorectal cancer progression in EHRs and remove most charts from the chart review process. A limited number of tokens may influence model predictions. Improvements in model performance could be obtained by increasing the training dataset size and analyzing EHRs at the sentence level rather than at the EHR level.
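The authors suggest analyzing EHRs at the sentence level rather than the EHR level. One way to implement that is to split the note into sentences, score each sentence, and take the highest sentence probability as the chart-level prediction. The naive regex splitter, the max aggregation, and the toy keyword scorer below are all hypothetical choices made for illustration, not the authors' design. One practical advantage of this approach is that short sentences fit easily within a fixed input limit, such as BERT's 512 tokens.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def toy_sentence_model(sentence: str) -> float:
    """Hypothetical scorer: high if 'progression' appears without a preceding 'no'."""
    words = re.findall(r"[a-z]+", sentence.lower())
    for i, word in enumerate(words):
        if word == "progression" and (i == 0 or words[i - 1] != "no"):
            return 0.9
    return 0.05


def chart_probability(
    text: str, sentence_model: Callable[[str], float]
) -> Tuple[float, str]:
    """Score every sentence and keep the highest (max aggregation),
    returning the probability together with the sentence that produced it."""
    sentences = split_sentences(text)
    if not sentences:
        raise ValueError("empty note")
    return max((sentence_model(s), s) for s in sentences)


note = (
    "Patient seen in clinic. CT in March showed no progression. "
    "New CT shows progression in the liver."
)
prob, evidence = chart_probability(note, toy_sentence_model)
print(prob, "|", evidence)
```

Max aggregation flags a chart if any single sentence looks positive, and it also returns that sentence as evidence for a reviewer. Mean pooling is an alternative with a different sensitivity/PPV trade-off.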

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989), colorectal cancer (MONDO:0005575)

## Full-text entities

- **Diseases:** Cancer (MESH:D009369), Breast and Colorectal Cancer (MESH:D001943), colorectal cancer (MESH:D015179)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12559821/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12559821/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/PMC12559821/full.md

---
Source: https://tomesphere.com/paper/PMC12559821