Can GPT-5.0 Interpret Thyroid Ultrasound Images? A Comparative TI-RADS Analysis with an Expert Radiologist
Yunus Yasar, Sevde Nur Emir, Muhammet Rasit Er, Mustafa Demir

TL;DR
This study compares GPT-5.0's interpretation of thyroid ultrasound images with that of an expert radiologist using the TI-RADS system. The model recognizes some features but overestimates malignancy risk.
Contribution
The study evaluates GPT-5.0's performance in thyroid ultrasound interpretation using TI-RADS criteria and compares it to an expert radiologist.
Findings
GPT-5.0 showed substantial agreement with the radiologist for composition, shape, and margin but poor agreement for echogenic foci.
GPT-5.0 had lower sensitivity and specificity than the radiologist, with more false positives among benign nodules.
The model tends to overclassify nodules as malignant, suggesting a need for ultrasound-specific training.
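The agreement and accuracy terms used in these findings can be made concrete with a small sketch. It assumes agreement is summarized with Cohen's kappa (the visible text says only "substantial" and "poor" agreement, so the statistic is an assumption) and that sensitivity and specificity are computed against the cytology reference. The function names and the toy labels are hypothetical and are not data from the study.

```python
from typing import Hashable, Sequence, Tuple


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if both raters labelled independently at their own rates.
    expected = sum(
        (list(rater_a).count(lab) / n) * (list(rater_b).count(lab) / n)
        for lab in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def sensitivity_specificity(
    predicted_malignant: Sequence[bool], reference_malignant: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity and specificity of binary calls against a reference standard."""
    pairs = list(zip(predicted_malignant, reference_malignant))
    tp = sum(p and r for p, r in pairs)
    fn = sum((not p) and r for p, r in pairs)
    tn = sum((not p) and (not r) for p, r in pairs)
    fp = sum(p and (not r) for p, r in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference must contain both malignant and benign cases")
    return tp / (tp + fn), tn / (tn + fp)


# Toy illustration only (hypothetical labels, not study data):
# a reader that calls one benign nodule malignant keeps full sensitivity
# but loses specificity, which is the overclassification pattern described above.
toy_calls = [True, True, True, False]
toy_reference = [True, True, False, False]
sens, spec = sensitivity_specificity(toy_calls, toy_reference)
```

In this toy case the single false positive leaves sensitivity unchanged and halves specificity, which shows why a tendency to overcall malignancy mainly shows up in the specificity figure.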
Abstract
Background/Objectives: Multimodal large language models (LLMs) may directly interpret medical images, including thyroid ultrasounds (USs). Whether these models can reliably assess thyroid nodules—where subtle echogenic and morphological details are critical—remains uncertain. The American College of Radiology (ACR) TI-RADS system provides a structured framework for benchmarking artificial intelligence. This study evaluates GPT-5.0’s ability to interpret thyroid US images according to TI-RADS criteria and contextualizes its performance relative to expert radiologist assessment, using FNA cytology as the reference standard. Methods: This retrospective study included 100 patients (mean age 49.8 ± 12.6 years; 72 women) with cytology-confirmed diagnoses: Bethesda II (benign) or Bethesda V–VI (malignant). Each nodule had longitudinal and transverse US images acquired with high-frequency…
Taxonomy
Topics: Thyroid Cancer Diagnosis and Treatment · Artificial Intelligence in Healthcare and Education · AI in cancer detection
