# Performance of ChatGPT on questions from the Brazilian College of Radiology annual resident evaluation test

**Authors:** Cleverson Alex Leitão, Gabriel Lucca de Oliveira Salvador, Leda Maria Rabelo, Dante Luiz Escuissato

PMC · DOI: 10.1590/0100-3984.2023.0083-en · Radiologia Brasileira · 2024-03-25

## TL;DR

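This study evaluates how well ChatGPT answers radiology questions from the Brazilian College of Radiology's annual resident evaluation test. It finds that ChatGPT performs better on questions assessing lower-order cognitive skills and on physics questions than on higher-order and clinical ones.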

## Contribution

The study is the first to assess ChatGPT's performance on radiology questions from the Brazilian College of Radiology's resident evaluation test.

## Key findings

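- ChatGPT answered 88 of 165 radiology questions (53.3%) correctly.
- It performed significantly better on questions assessing lower-order cognitive skills (64.4% vs. 47.2% for higher-order) and on physics questions (90.0% vs. 48.3% for clinical questions).
- Performance did not vary significantly by subspecialty or by target academic year.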

## Abstract

This study aimed to test the performance of ChatGPT on radiology questions
formulated by the Colégio Brasileiro de Radiologia (CBR, Brazilian College of
Radiology) and to evaluate its failures and successes.

A total of 165 questions from the CBR annual resident assessment (2018, 2019,
and 2022) were presented to ChatGPT. For statistical analysis, the questions
were categorized by the type of cognitive skill assessed (lower or higher
order), by topic (physics or clinical), by subspecialty, by style (description
of a clinical finding or sign, clinical management of a case, application of a
concept, calculation/classification of findings, correlations between
diseases, or anatomy), and by target academic year (all years, second/third
year, or third year only).
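The categorization scheme above amounts to tagging each question with several independent labels and then computing accuracy within each group. A minimal Python sketch follows, assuming a hypothetical `GradedQuestion` record and `accuracy_by` helper (neither comes from the paper). The three-question `toy` list is illustrative only and is not study data.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Hashable, Iterable, Tuple


class Cognitive(Enum):
    LOWER = "lower-order"
    HIGHER = "higher-order"


class Topic(Enum):
    PHYSICS = "physics"
    CLINICAL = "clinical"


class TargetYear(Enum):
    ALL = "all years"
    SECOND_THIRD = "second/third year"
    THIRD_ONLY = "third year only"


@dataclass(frozen=True)
class GradedQuestion:
    """One exam question with its category labels (hypothetical schema)."""
    cognitive: Cognitive
    topic: Topic
    subspecialty: str  # free-text label, illustrative only
    style: str         # e.g. "anatomy", "clinical management of a case"
    target_year: TargetYear
    answered_correctly: bool


def accuracy_by(
    questions: Iterable[GradedQuestion],
    key: Callable[[GradedQuestion], Hashable],
) -> Dict[Hashable, Tuple[int, int, float]]:
    """Group questions by `key`; return (correct, total, percent correct) per group."""
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for q in questions:
        group = key(q)
        total[group] += 1
        correct[group] += int(q.answered_correctly)
    # Percent is rounded to one decimal place, matching the abstract's style.
    return {g: (correct[g], total[g], round(100 * correct[g] / total[g], 1)) for g in total}


# Illustrative toy data (NOT from the study).
toy = [
    GradedQuestion(Cognitive.LOWER, Topic.PHYSICS, "physics", "application of a concept",
                   TargetYear.ALL, True),
    GradedQuestion(Cognitive.HIGHER, Topic.CLINICAL, "neuroradiology", "clinical management of a case",
                   TargetYear.THIRD_ONLY, False),
    GradedQuestion(Cognitive.HIGHER, Topic.CLINICAL, "thoracic", "correlations between diseases",
                   TargetYear.SECOND_THIRD, True),
]

if __name__ == "__main__":
    print(accuracy_by(toy, lambda q: q.cognitive))
    print(accuracy_by(toy, lambda q: q.target_year))
```

Passing the grouping key as a function keeps one helper usable for every dimension (cognitive level, topic, subspecialty, style, target year) without duplicating the counting logic.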

ChatGPT answered 88 (53.3%) of the 165 questions correctly. It performed
significantly better on questions assessing lower-order cognitive skills than
on those assessing higher-order cognitive skills, answering 38 (64.4%) of 59
and 50 (47.2%) of 106 correctly, respectively (p = 0.01). Accuracy was also
significantly higher for physics questions than for clinical questions: 18
(90.0%) of 20 versus 70 (48.3%) of 145 (p = 0.02). Performance did not differ
significantly among subspecialties or among target academic years (p > 0.05).
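The percentages above follow directly from the reported counts, and each pair of complementary strata (lower- vs. higher-order, physics vs. clinical) should add up to the 88 correct answers out of 165 questions. The Python sketch below checks only that arithmetic. The abstract does not state which statistical test produced the p-values, so the sketch makes no attempt to reproduce them. The `REPORTED` table and the helper names are illustrative, not from the paper.

```python
# Counts as reported in the abstract: (correct, total) per stratum.
REPORTED = {
    "overall": (88, 165),
    "lower-order": (38, 59),
    "higher-order": (50, 106),
    "physics": (18, 20),
    "clinical": (70, 145),
}


def percent(correct: int, total: int) -> float:
    """Percentage correct, rounded to one decimal place as in the abstract."""
    return round(100 * correct / total, 1)


def strata_are_consistent(a: str, b: str) -> bool:
    """True if two complementary strata sum to the overall correct/total counts."""
    (ca, ta), (cb, tb) = REPORTED[a], REPORTED[b]
    overall_correct, overall_total = REPORTED["overall"]
    return ca + cb == overall_correct and ta + tb == overall_total


if __name__ == "__main__":
    for name, (c, t) in REPORTED.items():
        print(f"{name}: {c}/{t} = {percent(c, t)}%")
    print(strata_are_consistent("lower-order", "higher-order"))
    print(strata_are_consistent("physics", "clinical"))
```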

Even without dedicated training in radiology, ChatGPT shows reasonable
performance on radiology questions formulated by the CBR, although its score
is still insufficient to pass the test.

## Full-text entities

- **Diseases:** hallucinations (MESH:D006212)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11236413/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11236413/full.md

## References

16 references — full list in the complete paper: https://tomesphere.com/paper/PMC11236413/full.md

---
Source: https://tomesphere.com/paper/PMC11236413