Vision-language model performance on the Japanese Nuclear Medicine Board Examination: high accuracy in text but challenges with image interpretation

Rintaro Ito; Keita Kato; Marina Higashi; Yumi Abe; Ryogo Minamimoto; Katsuhiko Kato; Toshiaki Taoka; Shinji Naganawa

PMC · DOI: 10.1007/s12149-025-02084-x · July 15, 2025

Open Access

TL;DR

Vision-language models perform well on text-based nuclear medicine questions but struggle with image interpretation, showing promise as assistants but not replacements for experts.

Contribution

First evaluation of state-of-the-art vision-language models on the Japanese Nuclear Medicine Board Examination, revealing performance trends and limitations.

Findings

01

ChatGPT o1 pro achieved the highest overall accuracy (83.3%) on the JNMBE questions.

02

All models performed significantly better on text-only questions than on image-based ones.

03

VLMs struggled with questions on Japanese regulations, and their accuracy declined across the 2022 to 2024 examinations.

Abstract

Vision-language models (VLMs) extend large language models (LLMs) with visual input. VLMs are developing rapidly, and their accuracy is improving quickly. How state-of-the-art VLMs, including reasoning models, perform in nuclear medicine is not yet clear. We evaluated state-of-the-art VLMs on questions from past Japan Nuclear Medicine Board Examinations (JNMBE) and assessed their strengths and limitations. We collected 180 multiple-choice questions from the JNMBE (2022–2024), about one-third of which included diagnostic images. We tested eight recent VLMs: ChatGPT o1 pro, ChatGPT o1, ChatGPT o3-mini, ChatGPT-4.5, Claude 3.7, Gemini 2.0 Flash thinking, Llama 3.2, and Gemma 3. Each model answered every question three times in a deterministic setting, and the final answer was set by majority vote. Two board-certified nuclear medicine physicians independently provided…
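The scoring procedure described in the abstract (three runs per question, final answer by majority vote) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `majority_vote` and `accuracy`, the choice labels, and the handling of a three-way disagreement (counted as no answer, hence incorrect) are all assumptions made for illustration, since the abstract does not say how such cases were resolved.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(answers: Sequence[str]) -> Optional[str]:
    """Return the answer given in more than half of the runs, else None.

    Assumption: when no answer reaches a strict majority (e.g. three
    different answers in three runs), the question has no final answer.
    """
    if not answers:
        return None
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer if count > len(answers) / 2 else None


def accuracy(final_answers: Sequence[Optional[str]],
             answer_key: Sequence[str]) -> float:
    """Fraction of questions whose final answer matches the key."""
    if not answer_key or len(final_answers) != len(answer_key):
        raise ValueError("final_answers and answer_key must be non-empty "
                         "and of equal length")
    correct = sum(f == k for f, k in zip(final_answers, answer_key))
    return correct / len(answer_key)


# Hypothetical toy data: three runs for each of four questions.
runs_per_question = [
    ["a", "a", "b"],
    ["c", "c", "c"],
    ["a", "b", "d"],  # no majority
    ["e", "b", "b"],
]
answer_key = ["a", "c", "d", "e"]

final = [majority_vote(runs) for runs in runs_per_question]
score = accuracy(final, answer_key)
```

In this toy example the first two questions are scored correct, the third has no majority, and the fourth has a majority on the wrong choice, so `score` is 0.5. With a truly deterministic setting the three runs would usually agree, and the vote mainly guards against residual run-to-run variation.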

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling · Interpreting and Communication in Healthcare