# Examining the Performance of ChatGPT in Comprehensive Pre‐Internship Exam: The Potential of Artificial Intelligence in Medical Education

**Authors:** Michaeel Motaghi Niko, Zahra Karbasi, Maryam Kazemi, Maryam Zahmatkeshan

PMC · DOI: 10.1002/hsr2.71492 · Health Science Reports · 2026-01-08

## TL;DR

This study tests ChatGPT (GPT‐3.5) on Iran's national medical pre‐internship exam. It answered 68.6% of questions correctly, but accuracy varied by specialty (for example, 85.7% in pharmacology versus 42.9% in pulmonology), suggesting it could be a helpful but not fully reliable educational tool.

## Contribution

The study evaluates ChatGPT on a comprehensive, multi‐disciplinary national exam covering 23 medical specialties, a setting in which, according to the authors, its performance had not been comprehensively assessed.

## Key findings

- ChatGPT answered 68.6% of the exam questions correctly.
- Expert ratings for response quality averaged 4.23 out of 5 (SD = 1.21).
- Performance varied by specialty: highest in pharmacology (85.7%), otorhinolaryngology (83.3%), and dermatology (83.3%), and lower in epidemiology (50%) and pulmonology (42.9%).

## Abstract

ChatGPT is a popular large language model with potential educational applications in medicine. However, its performance in standardized, multi‐disciplinary medical exams has not been comprehensively assessed. This study evaluates ChatGPT's accuracy and quality in Iran's national medical pre‐internship exam.

We tested ChatGPT (GPT‐3.5, May 3rd version) on 195 multiple‐choice questions from the March 2022 Iranian pre‐internship exam, covering 23 medical specialties. Questions with visual content were excluded. Each question was asked in a new chat to avoid memory bias. Responses were evaluated by 55 experts using a 5‐point Likert scale and compared against the official answer key. Data were analyzed descriptively using SPSS.

ChatGPT answered 68.6% of questions correctly. Expert ratings averaged 4.23/5 (SD = 1.21), indicating good to excellent quality. Best‐performing specialties included pharmacology (85.7%), otorhinolaryngology (83.3%), and dermatology (83.3%). Lower performance was observed in pulmonology (42.9%) and epidemiology (50%).

ChatGPT shows promise as a supplemental educational tool in medical education, but its accuracy varies by specialty. Faculty guidance is essential to ensure responsible integration until further improvements and validations are made.
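
The scoring described in the abstract (comparison against an answer key, per‐specialty accuracy, and a mean and SD of Likert ratings) can be illustrated with a minimal Python sketch. This is not the authors' analysis, which was run in SPSS. The function names, the record layout, and the toy data below are all hypothetical, and the use of the sample standard deviation is an assumption about how the SD was computed.

```python
from collections import defaultdict
from statistics import mean, stdev


def overall_accuracy(records):
    """Percent of questions where the model answer matches the answer key.

    records: list of (specialty, model_answer, key_answer) tuples.
    This layout is hypothetical, not taken from the paper.
    """
    hits = sum(1 for _, model, key in records if model == key)
    return 100.0 * hits / len(records)


def accuracy_by_specialty(records):
    """Percent correct within each specialty."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for specialty, model, key in records:
        totals[specialty] += 1
        if model == key:
            correct[specialty] += 1
    return {s: 100.0 * correct[s] / totals[s] for s in totals}


def likert_summary(ratings):
    """Mean and sample SD of 1-5 Likert ratings (sample SD is an assumption)."""
    return mean(ratings), stdev(ratings)


# Toy data for illustration only; these are NOT the study's questions or ratings.
toy_records = [
    ("specialty_a", "B", "B"),
    ("specialty_a", "C", "A"),
    ("specialty_b", "D", "D"),
    ("specialty_b", "A", "A"),
]
toy_ratings = [5, 4, 3, 5]

print(overall_accuracy(toy_records))
print(accuracy_by_specialty(toy_records))
print(likert_summary(toy_ratings))
```

With only a handful of questions per specialty, each question shifts that specialty's percentage by a large step, so per‐specialty figures are much coarser than the overall score.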

## Full-text entities

- **Diseases:** dermatology (MESH:D000168)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12783691/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12783691/full.md

## References

21 references — full list in the complete paper: https://tomesphere.com/paper/PMC12783691/full.md

---
Source: https://tomesphere.com/paper/PMC12783691