Performance of ChatGPT in Israeli Arabic-language OBGYN national medical licensure exam

Adiel Cohen; Elior Eliasi; Raanan Meyer; Yoav Brezinov; Gabriel Levin

PMC · DOI: 10.1186/s12909-026-08753-3 · February 11, 2026

Open Access

TL;DR

This study evaluates ChatGPT-3.5 on Arabic-language OBGYN medical licensure exams in Israel and finds that it answered only about 44% of questions correctly.

Contribution

The study is the first to assess ChatGPT's performance on Arabic-language medical exams, and it shows lower accuracy than previously reported for English.

Findings

01

ChatGPT-3.5 correctly answered 43.9% of Arabic OBGYN exam questions, below the passing threshold.

02

Performance was consistently low across all exam subjects, with no significant differences between them.

03

Arabic performance (43.9%) was significantly lower than previously reported English performance (60.7%).
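
As a rough illustration of how two accuracy rates like these can be compared, the sketch below runs a pooled two-proportion z-test. This summary does not state which statistical test the authors used or how many questions each exam contained, so both the choice of test and the counts in the usage note are assumptions for illustration only. The function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(correct_a, total_a, correct_b, total_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Assumption: this is a generic textbook test, not necessarily the
    method used in the study.
    """
    p_a = correct_a / total_a
    p_b = correct_b / total_b
    # Pooled proportion under the null hypothesis of equal accuracy.
    pooled = (correct_a + correct_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (NOT from the study): 44 of 100 correct in Arabic
# versus 61 of 100 correct in English.
z, p = two_proportion_z_test(44, 100, 61, 100)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

With the real question counts substituted for the placeholders, the same function would indicate whether a gap of this size is larger than chance alone would be expected to produce.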

Abstract

Previous studies of ChatGPT's performance on medical exams have reached contradictory results. The performance of ChatGPT in languages other than English, including Arabic, which is the official language of medical education and practice in many countries, has yet to be explored. We aim to evaluate the performance of ChatGPT on Arabic-language Israeli OBGYN medical licensure exams for foreign university alumni. We conducted a performance study using a consecutive sample of text-based multiple-choice questions drawn from authentic Arabic-language Israeli OBGYN medical licensure exams for foreign university alumni. ChatGPT-3.5 (using a newly created account) answered all questions in Arabic. We compared the performance of ChatGPT across the different fields of the exam: Obstetrics, Reproductive Medicine and Infertility, Gynecology and Gynecologic Oncology, and also…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning