# Performance of ChatGPT-4o in Determining Radiology–Pathology Concordance and Management Recommendations Following Image-Guided Breast Biopsies

**Authors:** Albert Lee, Belinda Curpen, Afsaneh Alikhassi

PMC · DOI: 10.3390/diagnostics15192536 · Diagnostics · 2025-10-08

## TL;DR

ChatGPT-4o performed comparably to breast-imaging radiologists in judging radiology–pathology concordance after image-guided breast biopsies, while recommending imaging follow-up more often and surgery less often.

## Contribution

In a retrospective single-center study of 244 image-guided breast biopsies, ChatGPT-4o's radiology–pathology concordance assessments were comparable to those of radiologists with breast imaging subspecialty training.

## Key findings

- ChatGPT-4o achieved a concordance rate of 98.8% vs. 98.0% for radiologists (p = 0.625).
- It showed high diagnostic agreement with the gold standard (kappa = 0.947, p < 0.001).
- ChatGPT-4o recommended imaging follow-up more often than radiologists (49.2% vs. 41.8%, p < 0.001) and surgical management less often (41.8% vs. 46.7%).
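For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's kappa is computed from a square agreement table. The function name `cohen_kappa` and the toy 2×2 table are illustrative assumptions; the table is not the study's data and does not reproduce the reported kappa of 0.947.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of cases rater A put in category i
    and rater B put in category j.
    """
    total = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: share of cases on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / total
    # Chance agreement: product of the two raters' marginal shares, summed over categories.
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total**2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy table (NOT from the paper): 100 cases, 90 on the diagonal.
toy_table = [[45, 5],
             [5, 45]]
kappa = cohen_kappa(toy_table)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually reported alongside, rather than instead of, simple percentage agreement.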

## Abstract

Background: Determining radiology–pathology concordance after breast biopsies is critical to ensuring appropriate patient management. However, expertise and multidisciplinary input are not universally accessible.

Purpose: To evaluate the performance of a large language model, ChatGPT-4o, in determining the radiology–pathology concordance of breast biopsies and suggesting subsequent management steps.

Methods: A retrospective single-center study analyzed 244 cases of image-guided breast biopsies of women. ChatGPT-4o assessed de-identified radiology and pathology reports for concordance and recommended management. Radiologist assessments served as the reference standard with final surgical pathology and 2-year imaging follow-up serving as gold standards when applicable. Concordance rates, management recommendations, and diagnostic agreement with the gold standard were compared using statistical tests, including McNemar’s, chi-square, Fisher–Freeman–Halton, and Cohen’s kappa.

Results: ChatGPT-4o achieved a concordance rate of 98.8% vs. 98.0% for radiologists (p = 0.625) and demonstrated high diagnostic agreement with the gold standard (kappa = 0.947, p < 0.001). ChatGPT-4o favored imaging follow-up more than radiologists (49.2% vs. 41.8%, p < 0.001) and surgical management less frequently (41.8% vs. 46.7%).

Conclusions: ChatGPT-4o demonstrated diagnostic performance comparable to radiologists with breast imaging subspecialities in evaluating breast biopsy concordance. Its slightly more conservative management approach may enhance shared decision-making in resource-limited settings.
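The percentages in the abstract can be converted back to approximate case counts, and the paired comparison can be illustrated with an exact McNemar test. The sketch below makes two assumptions that the abstract does not confirm: that p = 0.625 came from an exact (binomial) McNemar test, and that the discordant pairs split 3 vs. 1. That split is hypothetical; it is simply one split consistent with a two-case difference between the rounded counts. The helper names `count_from_percent` and `mcnemar_exact` are illustrative.

```python
from math import comb

N_CASES = 244  # number of biopsy cases stated in the abstract


def count_from_percent(percent, n=N_CASES):
    """Nearest whole number of cases corresponding to a reported percentage."""
    return round(percent / 100 * n)


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    Under the null hypothesis each discordant pair is equally likely to
    fall in either direction, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * lower_tail)


# Approximate counts recovered from the reported percentages.
gpt_concordant = count_from_percent(98.8)
radiologist_concordant = count_from_percent(98.0)
print(gpt_concordant, radiologist_concordant)

# Hypothetical discordant split (assumption, not reported in the abstract).
print(mcnemar_exact(3, 1))
```

Because McNemar's test uses only the cases where the two raters disagree, a very small number of discordant pairs yields a large p-value even when the headline percentages differ.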

## Linked entities

- **Diseases:** breast cancer (MONDO:0004989)

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12523907/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12523907/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/PMC12523907/full.md

---
Source: https://tomesphere.com/paper/PMC12523907