Evaluating Chat Generative Pretrained Transformer (GPT-4o) Problem-Solving Performance in the Japan Certificate Examination for Biomedical Engineering Class 1

Kai Ishida

PMC · DOI: 10.7759/cureus.81029 · March 23, 2025

Open Access

TL;DR

This study tested ChatGPT's ability to solve biomedical engineering exam questions and found it performed below human-level accuracy, especially in problem-solving and understanding complex concepts.

Contribution

The study evaluates ChatGPT's performance on Japan's Certificate Examination for Biomedical Engineering Class 1, revealing limitations in its accuracy and depth of knowledge.

Findings

01

ChatGPT scored below 70% on fundamental and applied knowledge questions.

02

Over 80% of incorrect answers were due to lack of knowledge or hallucinations.

03

Performance met the passing criteria in only one of the three exams tested.

Abstract

Introduction: Chat generative pretrained transformer (ChatGPT; OpenAI, San Francisco, CA) has developed rapidly and is used in various fields, including medical engineering. Japan’s Certificate Examination for Biomedical Engineering Class 1 (CEBM1) assesses comprehensive specialized knowledge and skills centered on the maintenance and safety management of medical devices, systems, and related equipment. This study evaluated the performance of ChatGPT (GPT-4o) on the CEBM1 against human-level expectations.

Methods: We targeted 171 questions from the 26th to 28th CEBM1s, covering fundamental knowledge, applied knowledge, and problem-solving ability. We entered the Japanese version of each question into ChatGPT (GPT-4o) and evaluated performance by question difficulty. No prompt optimization was used. We compared the responses provided by…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Biomedical and Engineering Education · AI in Service Interactions