Is ChatGPT-5 Ready for Mammogram VQA?

Qiang Li; Shansong Wang; Mingzhe Hu; Mojtaba Safari; Zachary Eidex; Xiaofeng Yang

arXiv:2508.11628·cs.CV·August 18, 2025

TL;DR

This study evaluates GPT-5 on mammogram visual question answering tasks across four public datasets. GPT-5 outperforms the other GPT variants tested, including GPT-4o, but still falls short of human experts and domain-specific fine-tuned models, highlighting the need for domain-specific tuning.

Contribution

First comprehensive assessment of GPT-5 on mammography VQA tasks, demonstrating its relative strengths and limitations compared to domain-specific models and human experts.

Findings

01

GPT-5 outperforms earlier GPT models but lags behind experts.

02

Performance varies across datasets and tasks, with highest accuracy on BI-RADS and malignancy classification.

03

Significant performance improvements from GPT-4o to GPT-5 indicate potential for future development.

Abstract

Mammogram visual question answering (VQA) integrates image interpretation with clinical reasoning and has the potential to support breast cancer screening. We systematically evaluated the GPT-5 family and GPT-4o on four public mammography datasets (EMBED, InBreast, CMMD, CBIS-DDSM) for BI-RADS assessment, abnormality detection, and malignancy classification tasks. GPT-5 was consistently the best-performing model but lagged behind both human experts and domain-specific fine-tuned models. On EMBED, GPT-5 achieved the highest scores among GPT variants in density (56.8%), distortion (52.5%), mass (64.5%), calcification (63.5%), and malignancy (52.8%) classification. On InBreast, it attained 36.9% BI-RADS accuracy, 45.9% abnormality detection, and 35.0% malignancy classification. On CMMD, GPT-5 reached 32.3% abnormality detection and 55.0% malignancy accuracy. On CBIS-DDSM, it achieved…
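
Per-dataset, per-task accuracies like those quoted in the abstract could be tabulated roughly as follows. This is a minimal sketch, not the authors' evaluation code: the record layout `(dataset, task, predicted, truth)`, the function name `accuracy_by_task`, the case-insensitive exact-match scoring, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict


def accuracy_by_task(records):
    """Return {(dataset, task): accuracy in percent}.

    `records` is an iterable of (dataset, task, predicted, truth) tuples.
    Answers are compared case-insensitively after stripping whitespace
    (an assumed scoring rule, not one taken from the paper).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for dataset, task, predicted, truth in records:
        key = (dataset, task)
        total[key] += 1
        if predicted.strip().lower() == truth.strip().lower():
            correct[key] += 1
    return {key: 100.0 * correct[key] / total[key] for key in total}


# Hypothetical toy records, NOT data from the paper.
records = [
    ("EMBED", "density", "B", "B"),
    ("EMBED", "density", "C", "B"),
    ("CMMD", "malignancy", "malignant", "Malignant"),
    ("CMMD", "malignancy", "benign", "malignant"),
    ("CMMD", "malignancy", "benign", "benign"),
]

scores = accuracy_by_task(records)
for (dataset, task), acc in sorted(scores.items()):
    print(f"{dataset:10s} {task:12s} {acc:5.1f}%")
```

Grouping by `(dataset, task)` keeps each reported number tied to one dataset and one question type, which matches how the abstract presents its results.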
